Author Topic: Moteino SPIFlash Frozen after wake up  (Read 847 times)

Jason

  • Jr. Member
  • **
  • Posts: 57
Moteino SPIFlash Frozen after wake up
« on: June 04, 2021, 02:09:01 AM »
Hello,

I think this is similar to the post below, but a slightly deeper/different dive.
https://lowpowerlab.com/forum/moteino/one-solution-to-moteino-failure-to-return-from-sleep/msg21274/#msg21274

While troubleshooting a different issue, I decided to start over with fresh, out-of-the-box hardware. I am running the stock weather mote code, configured for my hardware and network.

Hardware:
Moteino Trace Antenna 915MHz with Flash Memory Chip
LowPowerLab BME280 Board
1 400mAh LiPo battery (from Adafruit)
1 MOhm resistor from VIN to A7 - Battery Monitor
1 MOhm resistor from A7 to Ground
1 Solderless breadboard
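
For anyone copying the battery monitor, here is a rough sketch of how the A7 reading maps back to battery voltage with the two 1 MOhm resistors. The 3.3V ADC reference and 10-bit resolution are my assumptions (they are not stated above), and batteryVoltsFromAdc is just an illustrative name, not something from the weather mote code.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values (not stated in the post): 10-bit ADC with a 3.3 V reference.
constexpr double kAdcRefVolts = 3.3;
constexpr double kAdcMaxCount = 1023.0;

// From the hardware list: 1 MOhm from VIN to A7, 1 MOhm from A7 to ground.
constexpr double kRTopOhms = 1.0e6;
constexpr double kRBottomOhms = 1.0e6;

// Convert a raw ADC count read on A7 into the battery voltage at VIN.
double batteryVoltsFromAdc(uint16_t rawCount) {
  assert(rawCount <= 1023);  // a 10-bit ADC cannot return more than this
  const double pinVolts = rawCount * kAdcRefVolts / kAdcMaxCount;
  // Undo the divider: Vpin = Vin * Rbottom / (Rtop + Rbottom)
  return pinVolts * (kRTopOhms + kRBottomOhms) / kRBottomOhms;
}
```

With equal resistors the pin sees half the battery voltage, so under these assumptions the largest measurable value is about 6.6V.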

I started it May 31, 2021 @ 10:15AM and it ran like a champ until June 2nd @ 4:58AM. See attached graph. I restarted it by cycling the power and it sent only one data point. I power cycled the device later and had the same result. I recharged the battery and had the same result. I did notice the battery voltage was dropping relatively fast. Before work I fully charged the battery to 4.15V and plugged it into the mote, which only sent one data point. 10 hours later the battery was down to 3.93V. To me this implies the mote was not sleeping. WHY?

I figured there are 3 suspects: the BME280 sensor, the RFM69HCW radio, and the flash chip. I left out the ATMEGA328P for now, but it could be guilty as well. I started checking voltages with a multimeter. The I2C lines to the BME280 were pulled high, which I took to mean they weren't stuck in use. The SPI lines were at 0.57V, so something is happening on the SPI lines. The radio CS (pin 10) was high as well, meaning the radio wasn't selected and shouldn't be talking on the SPI bus. The flash CS (pin 8) was at 1.3V, which per section 7.4 on page 38 of the flash chip datasheet is between the logic high and logic low levels, but much closer to low. This seems to be pretty good evidence that the flash chip is active and doing something.
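
To make the "between high and low" reading concrete, here is a small sketch that classifies a measured pin voltage. The thresholds (low at or below 0.3 x VCC, high at or above 0.7 x VCC) and the 3.3V supply are assumptions on my part; check them against the table in the datasheet before relying on them. classifyPin is a made-up helper name.

```cpp
#include <cassert>

enum class LogicLevel { Low, Undefined, High };

// Assumed thresholds (confirm against the datasheet's DC characteristics):
//   input low  <= 0.3 * VCC
//   input high >= 0.7 * VCC
LogicLevel classifyPin(double pinVolts, double vccVolts) {
  assert(vccVolts > 0.0);
  if (pinVolts <= 0.3 * vccVolts) return LogicLevel::Low;
  if (pinVolts >= 0.7 * vccVolts) return LogicLevel::High;
  return LogicLevel::Undefined;  // neither a valid low nor a valid high
}
```

Under these assumptions, classifyPin(1.3, 3.3) lands in the undefined band, while the 0.57V seen on the other SPI lines counts as a low.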

I gathered further evidence by enabling serial in the sketch and connecting the mote to the PC via an FTDI programmer. I added DEBUGln() statements, aka Serial.println(), where I suspected the issue was.

Code Snippet from WeatherNode Example (DEBUGln's Added):
Code: [Select]
    
#ifdef BLINK_EN
      Blink(LED_BUILTIN, 5);
    #endif
  }
  DEBUGln("Before SERIALFLUSH"); 
  SERIALFLUSH();
  DEBUGln("Before flash.sleep()");
  flash.sleep();
  DEBUGln("Before radio.sleep()");
  radio.sleep(); //you can comment out this line if you want this node to listen for wireless programming requests

Output:
Code: [Select]
WeatherMote - transmitting at: 915 Mhz...
BAT:5.05v F:75.36 H:43.08 P:29.99 (packet length:33)
Before SERIALFLUSH
Before flash.sleep()

Bing-Bango, it is not making it past sleeping the flash chip, which matches up with the readings from the multimeter. So what does flash.sleep() do, and why is it stuck there?

From the SPIFlash library, SPIFlash.cpp
Code: [Select]
void SPIFlash::sleep() {
  command(SPIFLASH_SLEEP);
  unselect();
}


Okay, looks simple enough. SPIFLASH_SLEEP is defined as 0xB9 (hexadecimal notation), which matches the command that sends the flash chip into power-down, as shown in section 6.2.2 on page 14 of the flash chip datasheet. Let's look at the command function.

Code: [Select]
/// Send a command to the flash chip, pass TRUE for isWrite when its a write command
void SPIFlash::command(uint8_t cmd, boolean isWrite){
#if defined(__AVR_ATmega32U4__) // Arduino Leonardo, MoteinoLeo
  DDRB |= B00000001;            // Make sure the SS pin (PB0 - used by RFM12B on MoteinoLeo R1) is set as output HIGH!
  PORTB |= B00000001;
#endif
  if (isWrite)
  {
    command(SPIFLASH_WRITEENABLE); // Write Enable
    unselect();
  }
  //wait for any write/erase to complete
  //  a time limit cannot really be added here without it being a very large safe limit
  //  that is because some chips can take several seconds to carry out a chip erase or other similar multi block or entire-chip operations
  //  a recommended alternative to such situations where chip can be or not be present is to add a 10k or similar weak pulldown on the
  //  open drain MISO input which can read noise/static and hence return a non 0 status byte, causing the while() to hang when a flash chip is not present
  if (cmd != SPIFLASH_WAKE) while(busy());
  select();
  SPI.transfer(cmd);
}

I'm not using an ATmega32U4 so the first if statement can be ignored.
isWrite was not passed to the function and therefore isWrite = false (default behavior per SPIFlash.h line 95) so the second if statement can be ignored.
(cmd != SPIFLASH_WAKE) would evaluate as true since cmd = SPIFLASH_SLEEP which means we get to while(busy());
Per the comments above this line in the library, noise or static on the SPI lines can cause the code to get stuck in this while loop. This sounds like what I am experiencing. Looking into the busy function.

Code: [Select]
/// check if the chip is busy erasing/writing
boolean SPIFlash::busy()
{
  /*
  select();
  SPI.transfer(SPIFLASH_STATUSREAD);
  uint8_t status = SPI.transfer(0);
  unselect();
  return status & 1;
  */
  return readStatus() & 1;
}

All this is doing is returning readStatus() & 1;
What is readStatus() doing?

Code: [Select]
/// return the STATUS register
uint8_t SPIFlash::readStatus()
{
  select();
  SPI.transfer(SPIFLASH_STATUSREAD);
  uint8_t status = SPI.transfer(0);
  unselect();
  return status;
}

It keeps on going deeper...

Code: [Select]
/// Select the flash chip
void SPIFlash::select() {
  //save current SPI settings
#ifndef SPI_HAS_TRANSACTION
  noInterrupts();
#endif
#if defined (SPCR) && defined (SPSR)
  _SPCR = SPCR;
  _SPSR = SPSR;
#endif

#ifdef SPI_HAS_TRANSACTION
  SPI.beginTransaction(_settings);
#else
  // set FLASH SPI settings
  SPI.setDataMode(SPI_MODE0);
  SPI.setBitOrder(MSBFIRST);
  SPI.setClockDivider(SPI_CLOCK_DIV4); //decided to slow down from DIV2 after SPI stalling in some instances, especially visible on mega1284p when RFM69 and FLASH chip both present
#endif
  digitalWrite(_slaveSelectPin, LOW);
}

TLDR: this function saves the current SPI settings and applies the flash chip's SPI settings. This doesn't look to be a problem area; I can dive back in later if I need to. Onto SPI.transfer(SPIFLASH_STATUSREAD); This is now going into the SPI library. This is getting too deep now. Let's back up to where the problem looks to be.

Code: [Select]
uint8_t SPIFlash::readStatus()
{
  select();
  SPI.transfer(SPIFLASH_STATUSREAD);
  uint8_t status = SPI.transfer(0);
  unselect();
  return status;
}

select(); -- this just sets up the SPI bus with the flash chip's settings.
SPI.transfer(SPIFLASH_STATUSREAD); -- this asks the flash chip for the contents of its status register.
uint8_t status = SPI.transfer(0); -- this clocks out a dummy byte and returns the byte the flash chip shifts back at the same time, i.e. the status register, whose lowest bit is the busy flag. (I think this is what's happening; I didn't dive into the SPI library.)
unselect(); -- returns the SPI bus to the settings it had before the select() call.
status is returned.
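
Here is a tiny sketch of what the `& 1` in busy() does to the returned byte. isBusy and kStatusBusyBit are illustrative names, not library code; the mask itself is the same one busy() applies.

```cpp
#include <cstdint>

// Bit 0 of the status byte is the flag that busy() tests with `& 1`.
constexpr uint8_t kStatusBusyBit = 0x01;

// Same test as `readStatus() & 1`, applied to an already-read status byte.
constexpr bool isBusy(uint8_t status) {
  return (status & kStatusBusyBit) != 0;
}

static_assert(!isBusy(0x00), "all bits clear: not busy");
static_assert(isBusy(0x01), "bit 0 set: busy");
static_assert(!isBusy(0x02), "only bit 1 set: not busy");
static_assert(isBusy(0xFF), "a line read as all ones looks busy forever");
```

The last case is the interesting one: if nothing drives MISO and it reads high, every byte comes back as 0xFF and the chip appears permanently busy.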

Here is the problem I see. The flash chip was sent to sleep in setup() of the WeatherNode sketch. This means the flash chip is powered down. While sleeping, the flash chip will only respond to a wakeup command. We are not sending that command; we are just asking to read the status register of the flash chip. The flash chip doesn't respond, so the response we receive is just whatever is on the MISO line. From SPIFlash.cpp line 187:
Quote
//open drain MISO input which can read noise/static and hence return a non 0 status byte, causing the while() to hang when a flash chip is not present
Since the flash chip is powered down, it is as if it weren't there. To verify this hypothesis, I checked the voltage of the MISO line of the Moteino (pin 12) using a multimeter: 0.35V. BUT the FTDI programmer immediately flashed when I touched the pin and continued flashing every ~8 seconds while I held my probes there. Checking the serial monitor:
Code: [Select]
WeatherMote - transmitting at: 915 Mhz...
BAT:5.05v F:75.36 H:43.08 P:29.99 (packet length:33)
Before SERIALFLUSH
Before flash.sleep()
Befor⸮YHYXYZ⸮i⸮⸮⸮⸮⸮⸮*R⸮⸮⸮ɕ⸮1⸮⸮A⸮ݕɹ⸮⸮ݕ⸮⸮ݹ5)Z-UA5)⸮⸮⸮ɕ⸮MI%11UM!5)&J⸮⸮K⸮⸮⸮flash.sleep()
Before radio.sleep()
Before LowPower.powerDown
WAKEUP
Before SERIALFLUSH
Before

It's ALIVE! Touching pin 12 on the Moteino must have lowered the voltage on MISO enough for it to read low and get out of the while(busy()); loop. After I removed the probes, the Moteino froze up again. When I measure the voltage on pin 12 again, I once again get a low voltage (0.34V) and the Moteino unfreezes and sends serial data. So, finally, that is where my code is getting stuck. That is all I have time for tonight, but now I am wondering WHY.
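
The behavior above can be modeled on a PC with a rough simulation. Everything here is hypothetical stand-in code (MockFlashBus and pollUntilNotBusy are my names, not the library's), and it assumes a sleeping chip simply leaves MISO undriven so the MCU reads whatever level the line floats to.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the flash chip plus the MISO wire; not library code.
struct MockFlashBus {
  bool chipAsleep;      // true after the power-down command (0xB9)
  bool misoFloatsHigh;  // level an undriven MISO line happens to read as
  uint8_t realStatus;   // status byte the chip would report when awake

  // Byte the MCU would get back from a status-register read.
  uint8_t readStatus() const {
    if (chipAsleep) {
      // A sleeping chip does not answer, so the MCU samples the floating line.
      return misoFloatsHigh ? 0xFF : 0x00;
    }
    return realStatus;
  }
};

// Mirrors `while (busy());` but gives up after maxPolls so the demo terminates.
// Returns the number of polls used, or -1 if the loop would have kept spinning.
int pollUntilNotBusy(const MockFlashBus& bus, int maxPolls) {
  assert(maxPolls > 0);
  for (int polls = 1; polls <= maxPolls; ++polls) {
    if ((bus.readStatus() & 1) == 0) return polls;
  }
  return -1;
}
```

In this model, a sleeping chip with MISO floating high never leaves the loop, while the same chip with MISO pulled low (for example by multimeter probes) gets out on the first poll.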

WHYs:
1) Why does pin 12 on the Moteino go high once the multimeter probes are removed? (I am assuming this from how the code acts: whenever I measure the pin I get a low value, but when I'm not measuring it, my code hangs waiting for a low on that line.)
2) Why did the Moteino work for 3 days before getting stuck here? Especially since now, even after 5+ power cycles, it still behaves exactly like this and gets stuck every time after sending one data point.
3) Why doesn't it get stuck in the flash.sleep(); in setup()? I think this is because the sketch first initializes the flash chip, which likely includes a wakeup command. This means the flash chip isn't sleeping for that first sleep command, so it actually responds to the status request instead of leaving noise on the MISO line.


SOLUTION:
1) Comment out the flash.sleep(); line in the main loop if the flash chip is not being used. (To be tested)
2) Try a 10k pull-down resistor on the MISO line. (Not sure of the consequences of this for other SPI devices)
3) See if the library could be made more robust against noise on the MISO line.
4) Figure out why the MISO line has started going high all of a sudden, and prevent that "noise" from getting onto the MISO line. (Maybe possible?)
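
For option 3, one possible direction is a busy-wait that gives up after a deadline. This is only a sketch under my own assumptions: waitWhileBusy is a made-up helper, it runs on a PC using std::chrono (on the Moteino something like millis() would take its place), and as the library comment warns, the timeout would have to be generous because some erase operations can take several seconds.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper, not part of SPIFlash: poll isBusy until it reports
// false or until the timeout elapses. Returns true if the chip became idle,
// false if we gave up waiting.
bool waitWhileBusy(const std::function<bool()>& isBusy,
                   std::chrono::milliseconds timeout) {
  assert(isBusy && "a busy-check callback is required");
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (isBusy()) {
    if (std::chrono::steady_clock::now() >= deadline) {
      return false;  // still "busy" at the deadline: stop instead of hanging
    }
  }
  return true;
}
```

The caller then decides what a false return means, for example skipping flash.sleep() for this cycle rather than freezing the whole node.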
« Last Edit: June 07, 2021, 09:32:26 AM by Felix »

Jason

Re: Moteino "Frozen" - Only sends one data point after power up
« Reply #1 on: June 04, 2021, 10:13:50 PM »
On GitHub, see SPIFlash issue #26 and pull request #27. I tested the change with the weather mote code, and it fixes the issue where the Moteino froze in the busy() function of the SPIFlash library when the SPI MISO line was noisy.

Felix

  • Administrator
  • Hero Member
  • *****
  • Posts: 6866
  • Country: us
    • LowPowerLab
Re: Moteino "Frozen" - Only sends one data point after power up
« Reply #2 on: June 07, 2021, 09:07:15 AM »
Sorry I haven't had a chance to reply properly yet. I believe the issue is that after a flash.sleep() command, the chip only responds to the device ID and wake() commands; see section 8.2.19 of the W25X40CL datasheet. This is normal behavior ;)
I chose not to pollute the library with variables that keep track of what the user is doing. I believe that is something the user should do in their firmware code. If you issue a sleep(), then the first thing to do upon wakeup is call flash.wake() before using the chip. Otherwise it is unresponsive, as per the datasheet.
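
One way a user could do that tracking in their own firmware is sketched below. SleepTrackingFlash and FakeFlash are hypothetical names for illustration only; the wrapper just assumes the wrapped object has the wake() and sleep() calls mentioned in this thread.

```cpp
// Hypothetical user-side wrapper; Flash is any type with wake() and sleep().
template <typename Flash>
class SleepTrackingFlash {
 public:
  explicit SleepTrackingFlash(Flash& flash) : flash_(flash) {}

  // Wake the chip only if this wrapper previously put it to sleep.
  void ensureAwake() {
    if (asleep_) {
      flash_.wake();
      asleep_ = false;
    }
  }

  // Safe to call every loop: wakes first so the chip can answer the
  // busy poll that runs inside sleep().
  void sleep() {
    ensureAwake();
    flash_.sleep();
    asleep_ = true;
  }

  bool isAsleep() const { return asleep_; }

 private:
  Flash& flash_;
  bool asleep_ = false;
};

// Minimal fake used only to demonstrate the call order on a PC.
struct FakeFlash {
  int wakeCalls = 0;
  int sleepCalls = 0;
  void wake() { ++wakeCalls; }
  void sleep() { ++sleepCalls; }
};
```

The simpler alternative is to skip the wrapper and just call flash.wake() right after the MCU wakes up, before any other flash call.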

Jason

Re: Moteino SPIFlash Frozen after wake up
« Reply #3 on: June 07, 2021, 11:48:29 PM »
Thanks Felix! You already fixed the weather node code before I got around to it!

I edited the comments of the SPIFlash library, which I hope might be helpful to some.
Do you find that helpful or not really?

Felix

Re: Moteino SPIFlash Frozen after wake up
« Reply #4 on: June 08, 2021, 09:22:43 AM »
Yes, it's good. Thanks for your patience with me on this one  :)