Author Topic: Over the air programming corrupted on reset on custom board [solved]  (Read 10774 times)

RatTrap

  • NewMember
  • *
  • Posts: 20
So I have a design that is extremely similar to the Moteino Mega and am using the DualOptiboot bootloader to facilitate over-the-air programming.  For some reason the bytes written to the flash memory before the reset are changed during the reset, and the programming is not executed.

Here are the differences in the board and the changes in the bootloader. This prototype moved a few pins around and foolishly uses a separate ldo from the mcu to power the flash memory.  That LDO is on Pin B0 and the flashSS is now on B1.  The changes to the bootloader are as follows

accounting for the new pin to power on the LDO
Code: [Select]
#elif defined (__AVR_ATmega1284P__) || defined (__AVR_ATmega644P__)
  #define FLASHSS_DDR     DDRB
  #define FLASHSS_PORT    PORTB
  #define FLASHSS         PINB1
  #define SS              PINB4
  ////////////////////////////////////////
  #define MEMON_DDR   DDRB
  #define MEMON_PORT   PORTB
  #define MEMON_PIN   PINB0
#endif   


and turning on the LDO
Code: [Select]
void CheckFlashImage() {
#ifdef DEBUG_ON
  putch('F');
#endif
  watchdogConfig(WATCHDOG_OFF);

// POWER ON THE FLASH MEMORY -ADDED BY CHARLIE
MEMON_DDR  |= _BV(MEMON_PIN);
MEMON_PORT |= _BV(MEMON_PIN);
/////////////////////////////////////////////////

The code has been delivered to the device over the air, verified intact and uncorrupted, and successfully and accurately written to the flash.  After the code is verified I write FLXIMG:<Size>: to the first 10 addresses on the flash and issue the reset.  For these tests I am triggering the reset manually.  The reset itself is caused by a watchdog timeout.

Here is the code to finish that final part, as I just described
Code: [Select]
// Write the 10-byte "FLXIMG:<Size>:" header to flash addresses 0-9.
flash.writeByte(0, (byte)'F'); while (flash.busy());
flash.writeByte(1, (byte)'L'); while (flash.busy());
flash.writeByte(2, (byte)'X'); while (flash.busy());
flash.writeByte(3, (byte)'I'); while (flash.busy());
flash.writeByte(4, (byte)'M'); while (flash.busy());
flash.writeByte(5, (byte)'G'); while (flash.busy());
flash.writeByte(6, (byte)':'); while (flash.busy());
flash.writeByte(7, (ProgramSize >> 8));   while (flash.busy());  // size, high byte
flash.writeByte(8, (ProgramSize & 0xFF)); while (flash.busy());  // size, low byte
flash.writeByte(9, (byte)':'); while (flash.busy());

// Read the header back and print it to confirm the write.
for (int i = 0; i < 7; i++) {
  Serial.print(char(flash.readByte(i)));
}
uint16_t Size = (flash.readByte(7) << 8) | (flash.readByte(8) & 0xFF);
Serial.print(Size);
Serial.println(char(flash.readByte(9)));

// Wait for the user, then stop feeding the watchdog so it times out and resets the MCU.
Serial.println("Press enter to reset");
Serial.readString();
while (!Serial.available()) {
  wdt_reset();
}
delay(30000);
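
For reference, the header layout described above (the text FLXIMG: at addresses 0-6, the size high byte at 7, the size low byte at 8, and a closing ':' at 9) can be sketched as plain C that runs on a PC. This is only an illustration of the layout as described in this post; build_header and parse_header are hypothetical helper names, not functions from the bootloader or the flash library.

```c
#include <stdint.h>
#include <string.h>

#define HEADER_LEN 10

/* Fill hdr[0..9] with "FLXIMG:" + size high byte + size low byte + ':'. */
static void build_header(uint8_t hdr[HEADER_LEN], uint16_t size) {
    memcpy(hdr, "FLXIMG:", 7);
    hdr[7] = (uint8_t)(size >> 8);
    hdr[8] = (uint8_t)(size & 0xFF);
    hdr[9] = ':';
}

/* Return 1 and store the image size if the header is well formed, else 0. */
static int parse_header(const uint8_t hdr[HEADER_LEN], uint16_t *size) {
    if (memcmp(hdr, "FLXIMG:", 7) != 0 || hdr[9] != ':') {
        return 0;
    }
    *size = (uint16_t)((hdr[7] << 8) | hdr[8]);
    return 1;
}
```

Reading the 10 bytes back after the reset and passing them through a check like parse_header would show immediately whether the header survived, instead of comparing hex values by eye.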

Here is the output from the serial monitor of the device.
The first line is the first 10 bytes read from the flash memory, cast to their correct data types, which verifies that the header was written correctly.  The second-to-last line copied is the same bytes read after the reset, and for good measure I also printed those same bytes' hex values in the final line.
Quote
04:17:38.557 -> FLXIMG:65534:
04:17:38.557 -> Press enter to reset
04:18:04.361 -> ------------------------------------------------------------------
04:18:04.361 -> Starting--------------------------
04:18:04.361 -> Firmware Version: 0x1
04:18:04.361 -> ------------------------------------------------------------------
04:18:09.347 -> SerialNumber: E C A 5
04:18:09.347 -> Report Time is: 16:30
04:18:10.965 -> Turning off Radio
04:18:11.001 -> NetID: 14. 12. 10. 4
04:18:11.001 -> NodeID: 3
04:18:11.001 -> Frequency: 915
04:18:11.367 -> Radio Started
04:18:11.367 ->  
04:18:11.367 -> 0 0 22 3 0 4 0 20 0 0 0 20 0 0 20 20


What I'm trying to understand is: what in the bootloader could be causing this?
I thought maybe the flash memory wasn't turned on correctly, so I jumped a wire from 3v3 to its enable pin so the flash memory never actually lost power, but the problem persisted.

If someone has some insight please please let me know.  I am really scratching my head here
« Last Edit: September 18, 2019, 03:04:51 PM by Felix »

Felix

  • Administrator
  • Hero Member
  • *****
  • Posts: 6867
  • Country: us
    • LowPowerLab
Re: Over the air programming corrupted on reset
« Reply #1 on: September 16, 2019, 10:28:44 AM »
Quote
foolishly uses a separate ldo from the mcu to power the flash memory.  That LDO is on Pin B0 and the flashSS is now on B1
Huh? LDO from MCU pin to power a flash memory chip?   ???
How exactly does that work? That could be the problem.

What flash chip is that? Did you verify it has the same commands used in the bootloader?

RatTrap

Re: Over the air programming corrupted on reset
« Reply #2 on: September 16, 2019, 01:45:21 PM »
Fair questions. I am using the same Winbond 4 Mbit W25X40CLSNIG that is on the Moteino Mega.

The separate power system has been a thorn in my side for a while and is being eliminated on the next run, but before they will produce the next run they want to make sure that I have the system fully functional. grrrrr :(

In any event, each of the peripherals is powered by its own individual LDO regulator, and each regulator's enable pin is connected to a different IO pin on the MCU.  To turn on a peripheral device, all one needs to do is write that pin high.  In the case of the flash and normal programming that is simply
Code: [Select]
pinMode(B0,OUTPUT);
digitalWrite(B0,HIGH);

Turning it off would be done by writing the pin low, but putting the flash to sleep actually uses less power.
Code: [Select]
//digitalWrite(B0,LOW);

flash.sleep();

In the case of the bootloader (and I'm not sure I got this correct, because I question my understanding)
Code: [Select]
MEMON_DDR  |= _BV(MEMON_PIN);
MEMON_PORT |= _BV(MEMON_PIN);
should turn it on at the very beginning of the CheckFlashImage() function in the bootloader.  Or at least that is what I'd like it to do.

RatTrap

Re: Over the air programming corrupted on reset
« Reply #3 on: September 16, 2019, 05:46:35 PM »
I forgot to make clear that the flash LDO is powered from the battery line.  It is only enabled from the MCU.  Perhaps I have a timing issue.  The LDO might take longer to turn on than the time between turning it on and the bootloader attempting to read the bytes.  That doesn't explain why the bytes are changing during the reset, but it is a theory, one that I will test as soon as I figure out how to add a slight delay to the bootloader.  Modifying the bootloader is way outside my wheelhouse and I wouldn't attempt it if I had any other choice.  Do you think that a timing issue may be the problem?

RatTrap

Re: Over the air programming corrupted on reset
« Reply #4 on: September 17, 2019, 02:18:13 AM »
So I have a theory, and I need a little help figuring out the best way to test it.  The LDO takes some time to start up.  Not a lot, but it for sure takes a few milliseconds to actually settle.  I'm wondering if the read is being done so quickly after power-on that the flash isn't fully powered, and whether the pullup that I tried wasn't actually connected.  Plus, the pullup won't do for my short-term purpose anyway.  I want to create a small delay, but of course the bootloader doesn't have tools like delay() in it, so I thought I would just write one.

Code: [Select]
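uint8_t z;
for(z=0;z<150;z+=1){
  uint16_t time0=TCNT1;
  while((TCNT1-time0)<100){}   // wait until Timer1 has advanced 100 counts (assumes Timer1 is running)
  while(TCNT1!=0){}            // then wait until the counter reads exactly 0
}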

TCNT1 is defined in the included header for the ATmega1284P, specifically <iom1284p.h>:
Code: [Select]
#define TCNT1 _SFR_MEM16(0x84)

#define TCNT1L _SFR_MEM8(0x84)
#define TCNT1L0 0
#define TCNT1L1 1
#define TCNT1L2 2
#define TCNT1L3 3
#define TCNT1L4 4
#define TCNT1L5 5
#define TCNT1L6 6
#define TCNT1L7 7

#define TCNT1H _SFR_MEM8(0x85)
#define TCNT1H0 0
#define TCNT1H1 1
#define TCNT1H2 2
#define TCNT1H3 3
#define TCNT1H4 4
#define TCNT1H5 5
#define TCNT1H6 6
#define TCNT1H7 7

but alas it just gives this error when building:

Quote
c:/winavr-20100110/bin/../lib/gcc/avr/4.3.3/../../../../avr/bin/ld.exe: address 0x2001a of optiboot_atmega1284p.elf section .text is not within region text
c:/winavr-20100110/bin/../lib/gcc/avr/4.3.3/../../../../avr/bin/ld.exe: address 0x2001a of optiboot_atmega1284p.elf section .text is not within region text
make: *** [optiboot_atmega1284p.elf] Error 1
rm optiboot.o


Any thoughts on how to create a delay, maybe?

Adding #include <util/delay.h> and calling _delay_ms(100); didn't work either, for what it's worth.
« Last Edit: September 17, 2019, 08:09:41 AM by RatTrap »

Felix

Re: Over the air programming corrupted on reset
« Reply #5 on: September 17, 2019, 10:24:11 AM »
LDO enable is entirely different; "uses a separate ldo from the mcu to power the flash memory" sounded like the MCU pin is powering the LDO.

An oscilloscope would be priceless for answering these questions about the LDO and its timing.
Do you have one?

RatTrap

Re: Over the air programming corrupted on reset
« Reply #6 on: September 17, 2019, 12:30:56 PM »
I do have an oscilloscope but it doesn't work very well.  I could use one of my Moteino Megas to make a makeshift one, but I'm not sure its sample rate and resolution would be enough.

On the upside, I've compared the bootloaders line by line and they are identical with the exception of the FLASHSS pin and port, and of course turning the LDO on.  So I'm pretty positive the LDO is my problem.

I haven't been able to get a delay to compile successfully though, which is really frustrating.  Just tried

Code: [Select]
uint16_t z;
for(z=0;z<1200;z++){
  putch('*');
}

which returns

Quote
c:/winavr-20100110/bin/../lib/gcc/avr/4.3.3/../../../../avr/bin/ld.exe: address 0x20028 of optiboot_atmega1284p.elf section .text is not within region text
c:/winavr-20100110/bin/../lib/gcc/avr/4.3.3/../../../../avr/bin/ld.exe: address 0x20028 of optiboot_atmega1284p.elf section .text is not within region text
make: *** [optiboot_atmega1284p.elf] Error 1

Ironically, I made a mistake and tried this, and it did compile, but I need a small delay, not an infinite one
Code: [Select]
uint8_t z;
for(z=0;z<1200;z++){
  putch('*');
}
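
A small PC-side sketch of why the uint8_t version never finishes: an 8-bit counter wraps from 255 back to 0, so it can never reach 1200 and the condition z<1200 is always true. The function names and the iteration cap below are made up for this illustration (the cap exists only so the demonstration itself terminates). One plausible reason this version fit in the bootloader is that the compiler can drop a comparison that is always true, but that is a guess.

```c
#include <stdint.h>

/* Count iterations of a loop driven by an 8-bit counter.
   Returns the iteration count, or -1 if the loop is still running
   after max_iters iterations (i.e. it would never finish). */
static long count_iterations_u8(uint16_t bound, long max_iters) {
    long n = 0;
    for (uint8_t z = 0; z < bound; z++) {   /* z wraps 255 -> 0 */
        if (++n > max_iters) {
            return -1;
        }
    }
    return n;
}

/* Same loop with a 16-bit counter, which can actually reach 1200. */
static long count_iterations_u16(uint16_t bound, long max_iters) {
    long n = 0;
    for (uint16_t z = 0; z < bound; z++) {
        if (++n > max_iters) {
            return -1;
        }
    }
    return n;
}
```

With a bound of 1200, the 8-bit version hits the cap and reports -1, while the 16-bit version stops after 1200 iterations.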


What is your recommendation?

Felix

Re: Over the air programming corrupted on reset
« Reply #7 on: September 17, 2019, 12:39:37 PM »
Maybe use NOPs or just eliminate that LDO.

RatTrap

Re: Over the air programming corrupted on reset
« Reply #8 on: September 17, 2019, 01:00:46 PM »
Believe me, I really really wish I could eliminate the LDOs, and on the next design I already have, but unfortunately with these prototypes I haven't gotten that choice.

I can confirm that the bootloader is setting the enable pin as an output and is writing it high.  So the flash memory is turned on by the time the system is booted, but I'm not sure it is fully powered up by the time the bootloader goes to read it.

I just tried 2 different NOP approaches to creating a delay and both gave me the same compile error that I've been getting all along.  I don't understand this error or what is actually causing it.

 

Felix

Re: Over the air programming corrupted on reset
« Reply #9 on: September 17, 2019, 01:32:58 PM »
That looks like the compiled size is over the maximum of 1KB that is allocated.
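
As a rough check of that reading, the arithmetic can be sketched in C on a PC. It assumes the bootloader occupies the top 1 KB of the ATmega1284P's 128 KB of flash (so flash ends at 0x20000) and that the address in the linker message is roughly where the .text section ends; the macro and function names are invented for the example.

```c
#include <stdint.h>

#define FLASH_END_EXCLUSIVE 0x20000UL  /* 128 KB of flash on the ATmega1284P */
#define BOOT_SECTION_BYTES  0x400UL    /* 1 KB allocated to the bootloader (assumed) */

/* First address of the assumed 1 KB boot section. */
static uint32_t boot_section_start(void) {
    return FLASH_END_EXCLUSIVE - BOOT_SECTION_BYTES;
}

/* Bytes by which an address reported by the linker lies past the end of
   flash, or 0 if the address is still inside flash. */
static uint32_t overflow_bytes(uint32_t reported_addr) {
    if (reported_addr > FLASH_END_EXCLUSIVE) {
        return reported_addr - FLASH_END_EXCLUSIVE;
    }
    return 0;
}
```

Under those assumptions, the reported 0x2001a is 26 bytes past the end of flash and 0x20028 is 40 bytes past it, which would mean the bootloader was already nearly full before any delay code was added.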

RatTrap

Re: Over the air programming corrupted on reset
« Reply #10 on: September 18, 2019, 03:19:30 AM »
Yeppers, that's exactly what it was.  In any event, I was able to confirm that the bootloader is turning on the flash properly, and that even if the flash is turned on and never turns off prior to the reset, it still doesn't reprogram.  So now I am down to trying to determine what else it could be.  Can you think of any reason that changing the port and pin for flashSS would cause a problem with the reprogramming?  That seems far less likely to me to be the problem, but as usual I wouldn't be messaging online unless I was grasping at straws.

I will keep contemplating the issue, and hit the Arduino forums and see if there is a problem I can help with.  Maybe the universe will grant me some insight with karma :)

PS: I found the source of the corruption.  It's a boot log that is being saved when the main code starts running.  The bootloader isn't corrupting the flash.  It just isn't installing the new code.  Just an FYI.
« Last Edit: September 18, 2019, 03:44:59 AM by RatTrap »

Felix

Re: Over the air programming corrupted on reset
« Reply #11 on: September 18, 2019, 09:34:53 AM »
Great, problem solved?
So was this in your code (not the bootloader)?

RatTrap

Re: Over the air programming corrupted on reset
« Reply #12 on: September 18, 2019, 12:25:37 PM »
The code on the device creates a log whenever it restarts.  Part of the OTA install for the device is to clear a bunch of flags and variables, like the addresses of the logs that it is saving.  Part of the code creates event logs and saves them to flash.  One of the events that gets logged is powering on, more or less so I can see if the device crashed or hung and the watchdog reset it.

So I figure that when it reboots, the record addresses are lost from the MCU EEPROM.  The default address is zero.  The boot log gets created and written to the flash memory at that address, and since that memory isn't erased first, the write should either do nothing or corrupt what is there.  Just my theory.
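
The "do nothing or corrupt" part of the theory matches how NOR flash generally behaves: programming can only change bits from 1 to 0, so writing over bytes that were not erased leaves the bitwise AND of the old and new data. A minimal PC-side model of that, assuming this general behavior applies here (program_without_erase is a hypothetical name, and this is not meant to reproduce the exact bytes in the log above):

```c
#include <stddef.h>
#include <stdint.h>

/* Model of programming NOR flash without erasing first:
   bits can only go from 1 to 0, so each stored byte becomes old & new. */
static void program_without_erase(uint8_t *cells, const uint8_t *data, size_t len) {
    for (size_t i = 0; i < len; i++) {
        cells[i] &= data[i];
    }
}
```

In this model, writing onto an erased byte (0xFF) stores the new value unchanged, writing 0xFF onto an existing byte changes nothing, and writing anything else onto an existing header byte leaves a mix of the two.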

That answers the question about the corruption, which is caused by my code, but it doesn't answer the question of why the bootloader didn't install the update before that occurred.  I'll send you more information as I get it. :)