Author Topic: Speeding Up The RFM69 Library  (Read 15497 times)

ChemE

  • Sr. Member
  • ****
  • Posts: 419
  • Country: us
Speeding Up The RFM69 Library
« on: January 12, 2017, 09:50:06 PM »
Has anyone fooled with writing smaller, tighter code to speed up Felix's library in order to save power?  Without any major heroics I was able to reduce the time needed to transmit a 6-byte payload from 1,300us to 1,000us by essentially just rewriting Select, Unselect, ReadReg, and WriteReg.  Once I'm happy with my version of the library I'll share it here for everyone, but it seems there is decent power-saving potential just in speeding up the code.

ChemE

Re: Speeding Up The RFM69 Library
« Reply #1 on: January 12, 2017, 11:28:23 PM »
Okay wow, now I've got the actual transmission down to 476us just by ditching the bloated and slow digitalRead() in sendFrame in favor of

Code: [Select]
PIND & _BV(_interruptPin)

I knew it was too good to be true.  It was just an error causing the while loop to go false immediately.
« Last Edit: January 13, 2017, 12:07:01 AM by ChemE »

joelucid

  • Hero Member
  • *****
  • Posts: 868
Re: Speeding Up The RFM69 Library
« Reply #2 on: January 13, 2017, 02:20:40 AM »
I did a lot of this stuff to make the library smaller to fit in the bootloader. BTW a good way to save power is to sleep the CPU while waiting for packetsent or payloaddone. This is particularly useful for coin cells where peak current limits battery life.

ChemE

Re: Speeding Up The RFM69 Library
« Reply #3 on: January 13, 2017, 07:31:31 AM »
Quote
I did a lot of this stuff to make the library smaller to fit in the bootloader. BTW a good way to save power is to sleep the CPU while waiting for packetsent or payloaddone. This is particularly useful for coin cells where peak current limits battery life.

Geez I bet you did then!  I forgot that the bootloader knows how to work the radio.  Thanks for the tip on sleeping.  It seems to take around 500us between switching to Tx and getting back a signal on DIO0.  I hope the radio isn't sucking down 17mA for that whole period or my calculated power budget is far from accurate.

ChemE

Re: Speeding Up The RFM69 Library
« Reply #4 on: January 13, 2017, 02:48:09 PM »
Quote
...Without any major heroics I was able to reduce the time needed to transmit a 6-byte payload from 1,300us to 1,000us by...

I spent some time reading Joe's OTA bootloader thread and decided like he did to dispense with everything.  All the waits, whiles, etc. and just bang some bits.  The same transmission (6-byte payload with 7 bytes of preamble/sync/len/address) now only takes 600 microseconds.  Not bad considering the actual transmission of 104 bits at 300 kbps should take about 347 microseconds and I have to write all those bits to the FIFO before the transmission can begin.  Shame I can't get the first byte or two into the FIFO, start the transmission and then continue to fill the FIFO while the transmission is in progress.  It takes around 28 microseconds to move a byte to the FIFO so I could in theory save another 300 microseconds if this were possible.  I need to clean this code up for a day or two but I'll be sharing this.  This could be extended back to do ACKs and whatnot but right now it just turns on the radio, fills the FIFO, and sends it out with no retry or ACK request.  I like to get down to bare bones before I add back.
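
For anyone checking the numbers in this post, here is a minimal sketch of the arithmetic. The function names are made up for illustration; the inputs (13 bytes on air at 300 kbps, about 28us per SPI transfer, and the 11 transfers that SendFrame performs for a 6-byte payload: 1 FIFO address byte, 4 header bytes, 6 payload bytes) all come from the thread.

```cpp
#include <cstdint>

// Time on air, in microseconds, for `bytes` bytes at `bitrate_bps` bits per second.
constexpr double airtime_us(uint32_t bytes, uint32_t bitrate_bps) {
  return bytes * 8.0 * 1e6 / bitrate_bps;
}

// Time spent clocking `transfers` SPI bytes at a measured cost per byte.
constexpr double fifo_fill_us(uint32_t transfers, double us_per_byte) {
  return transfers * us_per_byte;
}

// airtime_us(13, 300000)  -> roughly 346.7 us for 104 bits
// fifo_fill_us(11, 28.0)  -> 308 us spent filling the FIFO before Tx starts
```

So the 600us total is roughly the on-air time plus the FIFO fill, which is why overlapping the two looks attractive.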

Felix

  • Administrator
  • Hero Member
  • *****
  • Posts: 6866
  • Country: us
    • LowPowerLab
Re: Speeding Up The RFM69 Library
« Reply #5 on: January 13, 2017, 02:52:46 PM »
Quote
I'll be sharing this.

Very nice sir, sharing is caring!

WhiteHare

  • Hero Member
  • *****
  • Posts: 1300
  • Country: us
Re: Speeding Up The RFM69 Library
« Reply #6 on: January 13, 2017, 04:33:33 PM »
Quote
Shame I can't get the first byte or two into the FIFO, start the transmission and then continue to fill the FIFO while the transmission is in progress.

The datasheet refers to a similar technique for handling large packets, but I suppose it might (?) also work with shorter packets like yours:

Quote
5.5.6. Handling Large Packets
When Payload length exceeds FIFO size (66 bytes) whether in fixed, variable or unlimited length packet format, in addition to PacketSent in Tx and PayloadReady or CrcOk in Rx, the FIFO interrupts/flags can be used as described below:
For Tx:
FIFO can be prefilled in Sleep/Standby but must be refilled "on-the-fly" during Tx with the rest of the payload.
1) Prefill FIFO (in Sleep/Standby first or directly in Tx mode) until FifoThreshold or FifoFull is set
2) In Tx, wait for FifoThreshold or FifoNotEmpty to be cleared (i.e. FIFO is nearly empty)
3) Write bytes into the FIFO until FifoThreshold or FifoFull is set.
4) Continue to step 2 until the entire message has been written to the FIFO (PacketSent will fire when the last bit of the packet has been sent).
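
The four Tx steps quoted above can be sketched on a PC with a toy FIFO model. This is only an illustration of the control flow: SimFifo and sendStreaming are hypothetical names, not part of the RFM69 library, and the "radio" here only drains a byte when the sender is waiting, which real hardware does on its own. The flag is modelled as set when the fill level is strictly above the threshold, which is how I read the datasheet.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Host-side stand-in for the radio FIFO (hypothetical model, not the RFM69 API).
struct SimFifo {
  static constexpr std::size_t kSize = 66;  // FIFO size quoted from the datasheet
  std::size_t threshold;                    // plays the role of FifoThreshold
  std::deque<uint8_t> q;                    // bytes waiting in the FIFO
  std::vector<uint8_t> sent;                // bytes that have "gone on air"

  explicit SimFifo(std::size_t t) : threshold(t) {}
  bool fifoLevel() const { return q.size() > threshold; }  // strictly above threshold
  bool fifoFull() const { return q.size() >= kSize; }
  bool fifoNotEmpty() const { return !q.empty(); }
  void write(uint8_t b) { if (!fifoFull()) q.push_back(b); }
  void transmitOneByte() {
    if (!q.empty()) { sent.push_back(q.front()); q.pop_front(); }
  }
};

// Follows steps 1 to 4 of the quoted procedure and returns the bytes in on-air order.
std::vector<uint8_t> sendStreaming(SimFifo& f, const std::vector<uint8_t>& msg) {
  std::size_t next = 0;
  // 1) Prefill until the threshold flag or FifoFull is set.
  while (next < msg.size() && !f.fifoLevel() && !f.fifoFull()) f.write(msg[next++]);
  // (On real hardware the switch to Tx mode would happen here.)
  while (next < msg.size()) {
    // 2) Wait for the flag to clear; in this model waiting is what drains the FIFO.
    while (f.fifoLevel() || f.fifoFull()) f.transmitOneByte();
    // 3) Refill until the flag or FifoFull is set again.
    while (next < msg.size() && !f.fifoLevel() && !f.fifoFull()) f.write(msg[next++]);
  }  // 4) Repeat until the whole message has been written.
  // The radio empties the FIFO; PacketSent would fire after the last bit.
  while (f.fifoNotEmpty()) f.transmitOneByte();
  return f.sent;
}
```

With a small threshold the same loop works for a 10-byte frame as well as a 200-byte one, which is the case ChemE is interested in.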

ChemE

Re: Speeding Up The RFM69 Library
« Reply #7 on: January 13, 2017, 04:44:36 PM »
Okay, I've satisfied my #define code art urges at least with the most fundamental routines.

Code: [Select]
#define SS_PIN               PB2
#define interruptPin         2
#define SS_WRITE_LOW         PORTB &= ~(1<<SS_PIN)
#define SS_WRITE_HIGH        PORTB |= (1<<SS_PIN)
#define SELECT               noInterrupts(); SS_WRITE_LOW
#define UNSELECT             SS_WRITE_HIGH; interrupts()  //if (!_inISR) interrupts()
#define WAIT_WHILE_SPI_BUSY  asm volatile("nop"); while (!(SPSR & (1<<SPIF)))

uint8_t readReg(uint8_t addr) {
  SELECT;
  SPDR = ( addr & 0x7F );
  WAIT_WHILE_SPI_BUSY;
  SPDR = ( 0 );
  WAIT_WHILE_SPI_BUSY;
  UNSELECT;
  return SPDR;
}

void writeReg(uint8_t addr, uint8_t value) {
  SELECT;
  SPDR = ( addr | 0x80 );
  WAIT_WHILE_SPI_BUSY;
  SPDR = ( value );
  WAIT_WHILE_SPI_BUSY;
  UNSELECT;
}

static inline uint8_t SPI_XFER(uint8_t data) {
  SPDR = data;
  WAIT_WHILE_SPI_BUSY;
  return(SPDR);
}

This should more or less play ball with sendFrame in Felix's library except one needs to change SPI.transfer to SPI_XFER.  There was a lot of execution time given over to basically thrashing the SPI settings and jumping into and back from routines.  What Felix wrote is obviously much safer than this, but in a well controlled loop where one isn't changing SPI settings, there is no need to save them, change them, and restore them each time we read and write a byte.

ChemE

Re: Speeding Up The RFM69 Library
« Reply #8 on: January 13, 2017, 04:49:28 PM »
Full Code:

Code: [Select]
#include "HTU21D.h"
#include "LowPower.h"
#include <RFM69.h>
#include <RFM69registers.h>
#include <SPI.h>

// Define various ADC prescaler
#define    ADC_PS_16    (1 << ADPS2)
#define    ADC_PS_32    ((1 << ADPS2) | (1 << ADPS0))
#define    ADC_PS_64    ((1 << ADPS2) | (1 << ADPS1))
#define    ADC_PS_128   ((1 << ADPS2) | (1 << ADPS1) | (1 << ADPS0))
#define sbi(sfr, bit) (_SFR_BYTE(sfr) |= _BV(bit))

//*********************************************************************************************
// *********** IMPORTANT SETTINGS - YOU MUST CHANGE/CONFIGURE TO FIT YOUR HARDWARE ************
//*********************************************************************************************
#define         NETWORKID               100  //the same on all nodes that talk to each other - 170 is 10101010 DC free value
#define         RECEIVER                1    //unique ID of the gateway/receiver
#define         SENDER                  2
#define         NODEID                  SENDER  //change to "SENDER" if this is the sender node (the one with the button)
#define         FREQUENCY               RF69_915MHZ
#define         SS_PIN                  PB2
#define         interruptPin            2
#define         SS_WRITE_LOW            PORTB &= ~(1<<SS_PIN)   // Much faster and smaller version of digitalWrite(Pin, LOW)
#define         SS_WRITE_HIGH           PORTB |= (1<<SS_PIN)    // Much faster and smaller version of digitalWrite(Pin, HIGH)
#define         SELECT                  noInterrupts(); SS_WRITE_LOW
#define         UNSELECT                SS_WRITE_HIGH; interrupts()  //if (!_inISR) interrupts()
#define         WAIT_WHILE_SPI_BUSY     asm volatile("nop"); while (!(SPSR & (1<<SPIF)))

RFM69 radio;    //Create an instance of the object
bool _inISR=false;

uint8_t readReg(uint8_t addr) {
  SELECT;
  SPDR = ( addr & 0x7F );
  WAIT_WHILE_SPI_BUSY;
  SPDR = ( 0 );
  WAIT_WHILE_SPI_BUSY;
  UNSELECT;
  return SPDR;
}

void writeReg(uint8_t addr, uint8_t value) {
  SELECT;
  SPDR = ( addr | 0x80 );
  WAIT_WHILE_SPI_BUSY;
  SPDR = ( value );
  WAIT_WHILE_SPI_BUSY;
  UNSELECT;
}

static inline uint8_t SPI_XFER(uint8_t data) {
  SPDR = data;
  WAIT_WHILE_SPI_BUSY;
  return(SPDR);
}

static inline void SendFrame(uint8_t toAddress, const void* buffer, uint8_t bufferSize) {
  uint8_t FOO = (readReg(REG_OPMODE) & 0xE3);    // I see no reason not to cache this and save some reads
  writeReg(REG_OPMODE, FOO | RF_OPMODE_STANDBY); // turn off receiver to prevent reception while filling fifo
  while ((readReg(REG_IRQFLAGS1) & RF_IRQFLAGS1_MODEREADY) == 0x00); // wait for ModeReady
  writeReg(REG_DIOMAPPING1, RF_DIOMAPPING1_DIO0_00); // DIO0 is "Packet Sent"
 
  // write to FIFO
  SELECT;
  SPI_XFER(REG_FIFO | 0x80);
  SPI_XFER(bufferSize + 3);
  SPI_XFER(toAddress);
  SPI_XFER(NODEID);
  SPI_XFER(0x00);

  for (uint8_t i = 0; i < bufferSize; i++) SPI_XFER(((uint8_t*) buffer)[i]);
  UNSELECT;

  // no need to wait for transmit mode to be ready since its handled by the radio
  writeReg(REG_OPMODE, FOO | RF_OPMODE_TRANSMITTER);
  uint32_t txStart = millis();
  while (!(PIND & _BV(interruptPin)) && millis() - txStart < RF69_TX_LIMIT_MS);// wait for DIO0 to turn HIGH signalling transmission finish
  writeReg(REG_OPMODE, FOO | RF_OPMODE_STANDBY);
}

static inline void myInit(void) {
  sei();
 
  // Timer 0 initialization from wiring.c for a ATmega 328P (Arduino Uno rev 3) + 12 bytes to sketch size
  TCCR0A = _BV(WGM01) | _BV(WGM00);      // put timer 0 in fast PWM mode
  TCCR0B = _BV(CS01) | _BV(CS00);        // set timer 0 prescale factor to 64
  TIMSK0 = _BV(TOIE0);                 // enable timer 0 overflow interrupt
 
  // Timer 2 initialization from wiring.c for an ATmega 328P (Arduino Uno rev 3) + 20 bytes to sketch size
  TCCR2A |= _BV(COM2A1) | _BV(WGM20);    // Enable timer 2 so _delay_ms() works properly
  TCCR2B |= _BV(CS22);                   // set clkT2S/64 (From prescaler); CS22 is a bit number, so it needs _BV()
 
  // ADC Housekeeping
  ADMUX = _BV(REFS0) | _BV(MUX3) | _BV(MUX2) | _BV(MUX1);    // Set the multiplexer to read the internal bandgap voltage
  ADCSRA |= ADC_PS_32;    // set our own prescaler to 32
}

int main(void) {
  myInit();
  radio.initialize(FREQUENCY,NODEID,NETWORKID);
  radio.setPowerLevel(0);
  radio.sleep();
  Serial.begin(115200);
  initTWI();
 
  uint8_t data[6];  //16-bit temp, 16-bit RH, 16-bit Vcc
  uint16_t startT, elapsed;
 
  for(;;) {
    startT = micros();           // ==================== START THE CLOCK ====================
    sbi(ADCSRA, ADSC);  // start a conversion
    issueCommand(WRITE_USER_REGISTER, ELEVEN_BIT_TEMP);    // this conversation takes 88uS - plenty long enough for the ADC
    issueCommand(TRIGGER_TEMP_MEASURE_NOHOLD,0);
    data[4] = ADCL;    // Avoid 16-bit math in this loop since it adds 40us
    data[5] = ADCH;    // data[] has 6 elements, so Vcc lives in data[4..5]
    LowPower.powerDown(SLEEP_15MS, ADC_OFF, BOD_OFF);
    readRaw(&data[0]);
    issueCommand(WRITE_USER_REGISTER, EIGHT_BIT_RH);
    issueCommand(TRIGGER_HUMD_MEASURE_NOHOLD,0);
    LowPower.powerDown(SLEEP_15MS, ADC_OFF, BOD_OFF);
    readRaw(&data[2]);
    TWCR = (1<<TWINT)|(1<<TWEN)| (1<<TWSTO);  // stop the TWI

    // Send the data
    SendFrame(RECEIVER, data, 6);
    radio.sleep();
    elapsed = micros()-startT;  // ==================== STOP THE CLOCK ====================


    float Tamb = ((uint16_t) (data[0]<<8 | data[1])) * (316.296 / 65535.0) - 52.33;
    float RHamb = ((uint16_t) (data[2]<<8 | data[3])) * (125.0 / 65535.0) - 6.0;
    Serial.print("\nTemperature: ");
    Serial.print(Tamb);
    Serial.print("\tHumidity: ");
    Serial.print(RHamb);
    Serial.print("\t\tVcc: ");
    Serial.print(112296/(data[5]<<8 | data[4]));
    Serial.print("\t\tLoop took: ");
    Serial.print(elapsed);
    Serial.print(" us");
    _delay_ms(4);  // Give the UART time to get our output across
    Serial.flush();
   
    LowPower.powerDown(SLEEP_8S, ADC_OFF, BOD_OFF);    // Sleep for 8 seconds
  }  // end for
}  // end main

HTU21D Code (probably compatible with Si7021s?)
Code: [Select]
#include "Arduino.h"
#define   BAUD_RATE                     8000000ul
#define   TRIGGER_TEMP_MEASURE_NOHOLD   0xF3
#define   TRIGGER_HUMD_MEASURE_NOHOLD   0xF5
#define   WRITE_USER_REGISTER           0xE6
#define   ELEVEN_BIT_TEMP               B10000011
#define   EIGHT_BIT_RH                  B00000011
#define   SLA_W                         TWDR = (0x40 << 1)
#define   SLA_R                         TWDR = ((0x40 << 1) + 0x01)
#define   START_TWI                     TWCR = (1<<TWINT) | (1<<TWSTA) | (1<<TWEN); WAIT_FOR_TWI_INT
#define   RESTART_TWI                   TWCR = (1<<TWINT) | (1<<TWEN);              WAIT_FOR_TWI_INT
#define   RESTART_TWI_ACK               TWCR = (1<<TWINT) | (1<<TWEA) | (1<<TWEN);  WAIT_FOR_TWI_INT
#define   WAIT_FOR_TWI_INT              while (!(TWCR & (1<<TWINT)) && ++counter)
#define   STOP_TWI                      TWCR = (1<<TWINT)|(1<<TWEN)| (1<<TWSTO)//;    while ((TWCR & (1<<TWSTO)) && ++counter)
#define   NOT_READY                     (TWSR & 0xF8) == 0x48 && ++counter

static inline void initTWI() {
  DDRC |= (1<<PC3) | (1<<PC2);
  PORTC |= (1<<PC2) | (1<<PC4) | (1<<PC5);
  TWBR=1;  //TWBR = ((F_CPU / BAUD_RATE) - 16) / 2;
}

static inline void stopTWI() {
  uint16_t counter;
  STOP_TWI;
}

static inline void issueCommand(uint8_t comm, uint8_t res) {
  uint16_t counter = 0;  // timeout counter used by WAIT_FOR_TWI_INT; start from a known value
  START_TWI; 
  SLA_W;
  RESTART_TWI;
  TWDR = comm;    // Send the command
  RESTART_TWI;
  if (comm == WRITE_USER_REGISTER) {  // Send the new resolution
    TWDR = res;
    RESTART_TWI;
  } else {    // Issue a stop on the I2C bus so we can enter sleep
    STOP_TWI;
  }
}

static inline void readRaw(uint8_t *ptr) {
  uint16_t counter = 0;  // timeout counter used by WAIT_FOR_TWI_INT and NOT_READY
  do {  // Start + SLA(R) until we get an ACK
    START_TWI;
    SLA_R;
    RESTART_TWI;
  } while (NOT_READY);

  // Measurement is ready, read back 2 bytes and store them at the address of ptr
  RESTART_TWI_ACK;    // Set the ACK bit to let the transmitter know we need another byte
  *ptr++ = TWDR; // Write the MSB
  RESTART_TWI;    // Set the NACK bit to let the transmitter know we are done
  *ptr = TWDR & 0xFC; // Write the LSB with the status bits masked off
}

I need (want) to finish culling any references to SPI or RFM69 as I intend for this to be small and self-contained but for now I'd like it in the wild in case anyone else can benefit from this.

ChemE

Re: Speeding Up The RFM69 Library
« Reply #9 on: January 13, 2017, 04:52:23 PM »
Quote
The datasheet refers to a similar technique for handling large packets, but I suppose it might (?) also work with shorter packets like yours:

Thanks, I'll have to check out the datasheet to see how many bytes have to be in the FIFO to trigger FifoThreshold.  Hopefully it is less than 9 or I'm boned.

EDIT: Looks like one can program this threshold to be anything that 7 bits can hold: RegFifoThresh (0x3C).  Thanks WhiteHare, this may have legs!
« Last Edit: January 13, 2017, 05:02:40 PM by ChemE »
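
As a rough sketch of what that register write could look like: as I read the datasheet, RegFifoThresh (0x3C) keeps the 7-bit threshold in bits 6 to 0 and uses bit 7 to choose the Tx start condition (set means "start as soon as the FIFO is not empty"). The constant and function names below are made up for illustration and should be checked against the datasheet before use.

```cpp
#include <cstdint>

constexpr uint8_t kRegFifoThresh = 0x3C;       // RegFifoThresh address from the post above
constexpr uint8_t kTxStartFifoNotEmpty = 0x80; // bit 7: begin Tx once the FIFO holds any byte
constexpr uint8_t kFifoThresholdMask = 0x7F;   // bits 6..0: the 7-bit threshold

// Build the byte to write to RegFifoThresh; thresholds above 127 are truncated to 7 bits.
constexpr uint8_t fifoThreshValue(bool startOnNotEmpty, uint8_t threshold) {
  return static_cast<uint8_t>((startOnNotEmpty ? kTxStartFifoNotEmpty : 0) |
                              (threshold & kFifoThresholdMask));
}
```

With the writeReg() from the earlier post this would be used as writeReg(kRegFifoThresh, fifoThreshValue(true, 2)), which asks for an early Tx start with a threshold of 2 bytes.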

WhiteHare

Re: Speeding Up The RFM69 Library
« Reply #10 on: January 13, 2017, 06:07:47 PM »
If your motivation is merely to conserve energy, it might be simpler to just load the FIFO while the RFM69 sleeps, then initiate the Tx, then immediately sleep the atmega328p, and then have the radio wake the atmega328p with an interrupt when Tx finishes. 

If you're primarily wanting to reduce latency, though, your approach seems worthwhile. 
« Last Edit: January 13, 2017, 08:03:25 PM by WhiteHare »

perky

  • Hero Member
  • *****
  • Posts: 873
  • Country: gb
Re: Speeding Up The RFM69 Library
« Reply #11 on: January 13, 2017, 08:24:20 PM »
Also, the packet you send might contain static values that don't change from one packet to the next (packet type, server address, etc.), in which case create a 'template' packet in memory and change only the bytes that differ before sending. Make sure your SPI clock runs in the MHz range; even at 1MHz the time taken to send a single byte to the FIFO during a burst write is 8us or so plus a little time for polling. I think the 328P has a single data register for SPI; however, if you were able to use the USART in SPI mode you would get double buffering, which would allow back-to-back transfers.
Mark.
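
A minimal sketch of the template idea, assuming the frame layout that SendFrame earlier in the thread writes to the FIFO (length, to-address, from-address, control byte, then a 6-byte payload). Frame, makeTemplate, patchPayload and spiByteTime_us are hypothetical names used only for this example.

```cpp
#include <cstdint>
#include <cstring>

constexpr uint8_t kHeaderLen = 4;   // len, to, from, ctl, as written by SendFrame
constexpr uint8_t kPayloadLen = 6;  // temp, RH and Vcc, 16 bits each

struct Frame {
  uint8_t bytes[kHeaderLen + kPayloadLen];
};

// Build the static part once, e.g. at startup.
inline Frame makeTemplate(uint8_t to, uint8_t from) {
  Frame f{};
  f.bytes[0] = kPayloadLen + 3;  // length byte covers to/from/ctl plus payload
  f.bytes[1] = to;
  f.bytes[2] = from;
  f.bytes[3] = 0x00;             // control byte: no ACK requested
  return f;
}

// Per transmission, overwrite only the bytes that change.
inline void patchPayload(Frame& f, const uint8_t* payload) {
  std::memcpy(f.bytes + kHeaderLen, payload, kPayloadLen);
}

// Clocking time for one SPI byte at a given SCK frequency, ignoring polling overhead.
constexpr double spiByteTime_us(uint32_t sck_hz) {
  return 8.0 * 1e6 / sck_hz;
}
```

The whole Frame can then go out in one burst write. At a 1MHz clock spiByteTime_us gives the 8us per byte mentioned above, and 2us at 4MHz.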

TD22057

  • NewMember
  • *
  • Posts: 26
  • Country: us
Re: Speeding Up The RFM69 Library
« Reply #12 on: January 14, 2017, 11:40:28 AM »
I haven't had time to play with this library yet: https://github.com/iwanders/plainRFM69  but it uses the RFM auto-mode selection to handle rx/tx which in theory should be more efficient.  There are a couple of discussions (here and here) about use w/ the moteino.  Might be worth looking at if it really does improve efficiency.

joelucid

Re: Speeding Up The RFM69 Library
« Reply #13 on: January 14, 2017, 12:18:53 PM »
Another trick: the rfm69 doesn't overwrite the FIFO with any new data that comes in after a packet has been received. It waits until the FIFO is emptied before reading new packets. Given this, you can use the FIFO as the only data store for incoming data and eliminate the 61-byte DATA buffer the rfm69 lib uses. Also, you don't need to use interrupts to copy the FIFO there.

Joe

ChemE

Re: Speeding Up The RFM69 Library
« Reply #14 on: January 15, 2017, 03:09:04 PM »
Quote
I haven't had time to play with this library yet: https://github.com/iwanders/plainRFM69  but it uses the RFM auto-mode selection to handle rx/tx which in theory should be more efficient.  There are a couple of discussions (here and here) about use w/ the moteino.  Might be worth looking at if it really does improve efficiency.

Great links thank you!  Automatic mode seems like it might be wildly efficient for what I'm wanting to do with my TH nodes.  It will take me a while to digest this code and mode of operation but I suspect I'll be updating my code to make use of it.