
Hardware Flash Corruption Issue

Posted: Thu Jan 20, 2022 11:37 am
by Ritesh
Hello Team,

We are using the ESP32-WROVER module in one of our inverter-based products, and it was working fine until a few days ago.

Suddenly, however, the board started restarting continuously at the bootloader stage with the following log:
[2022-01-20 10:45:27.931] ets Jul 29 2019 12:21:46
[2022-01-20 10:45:27.931]
[2022-01-20 10:45:27.931] rst:0x10 (RTCWDT_RTC_RESET),boot:0x12 (SPI_FAST_FLASH_BOOT)
[2022-01-20 10:45:27.931] configsip: 0, SPIWP:0xee
[2022-01-20 10:45:27.931] clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
[2022-01-20 10:45:27.931] mode:DOUT, clock div:2
[2022-01-20 10:45:27.931] load:0x3fff0018,len:4
[2022-01-20 10:45:27.931] load:0x3fff001c,len:5796
[2022-01-20 10:45:27.931] load:0x40078000,len:7756
[2022-01-20 10:45:27.931] load:0x40080000,len:5876
[2022-01-20 10:45:27.931] csum err:0xb3!=0xab
[2022-01-20 10:45:27.931] ets_main.c 384
[2022-01-20 10:45:27.947] ets Jul 29 2019 12:21:46
[2022-01-20 10:45:27.947]
[2022-01-20 10:45:27.947] rst:0x10 (RTCWDT_RTC_RESET),boot:0x12 (SPI_FAST_FLASH_BOOT)
[2022-01-20 10:45:27.947] configsip: 0, SPIWP:0xee
[2022-01-20 10:45:27.947] clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
[2022-01-20 10:45:27.947] mode:DOUT, clock div:2
[2022-01-20 10:45:27.947] load:0x3fff0018,len:4
[2022-01-20 10:45:27.947] load:0x3fff001c,len:5796
[2022-01-20 10:45:27.947] load:0x40078000,len:7756
[2022-01-20 10:45:27.947] load:0x40080000,len:5876
[2022-01-20 10:45:27.947] csum err:0xa3!=0xab
[2022-01-20 10:45:27.947] ets_main.c 384
[2022-01-20 10:45:27.947] ets Jul 29 2019 12:21:46
[2022-01-20 10:45:27.947]
[2022-01-20 10:45:27.947] rst:0x10 (RTCWDT_RTC_RESET),boot:0x12 (SPI_FAST_FLASH_BOOT)
[2022-01-20 10:45:27.947] configsip: 0, SPIWP:0xee
[2022-01-20 10:45:27.947] clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
[2022-01-20 10:45:27.947] mode:DOUT, clock div:2
[2022-01-20 10:45:27.947] load:0x3fff0018,len:4
[2022-01-20 10:45:27.947] load:0x3fff001c,len:5796
[2022-01-20 10:45:27.947] load:0x40078000,len:7756
[2022-01-20 10:45:27.947] load:0x40080000,len:5876
[2022-01-20 10:45:27.947] csum err:0x23!=0xab
[2022-01-20 10:45:27.947] ets_main.c 384
[2022-01-20 10:45:27.947] ets Jul 29 2019 12:21:46
[2022-01-20 10:45:27.947]
[2022-01-20 10:45:27.947] rst:0x10 (RTCWDT_RTC_RESET),boot:0x12 (SPI_FAST_FLASH_BOOT)
[2022-01-20 10:45:27.947] configsip: 0, SPIWP:0xee
[2022-01-20 10:45:27.947] clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
[2022-01-20 10:45:27.947] mode:DOUT, clock div:2
[2022-01-20 10:45:27.947] load:0x3fff0018,len:4
[2022-01-20 10:45:27.947] load:0x3fff001c,len:5796
[2022-01-20 10:45:27.947] load:0x40078000,len:7756
[2022-01-20 10:45:27.947] load:0x40080000,len:5876
[2022-01-20 10:45:27.963] csum err:0xa3!=0xab
[2022-01-20 10:45:27.963] ets_main.c 384
[2022-01-20 10:45:27.963] ets Jul 29 2019 12:21:46
We were completely stuck because of this, and the board only recovered after we erased the complete flash.

Does anyone have an idea what the cause could be? Our best guess is that a power fluctuation corrupted a few sections of the flash memory, which then recovered after we erased the whole flash.
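The useful signal in the capture above is buried in repeated boot banners. To compare logs from several units, a minimal Python sketch can pull out the reset reasons and the two hex values from each "csum err" line. It only assumes the line formats shown in this log; the function name summarize_boot_log is hypothetical.

```python
import re
from collections import Counter

# Matches ROM lines such as "csum err:0xb3!=0xab" from the log above.
CSUM_RE = re.compile(r"csum err:0x([0-9a-fA-F]{2})!=0x([0-9a-fA-F]{2})")
# Matches reset-reason lines such as "rst:0x10 (RTCWDT_RTC_RESET),boot:0x12 ...".
RST_RE = re.compile(r"rst:0x([0-9a-fA-F]+) \((\w+)\)")


def summarize_boot_log(text: str) -> dict:
    """Collect reset reasons and checksum mismatches from a UART capture."""
    resets = Counter()
    mismatches = []
    for line in text.splitlines():
        m = RST_RE.search(line)
        if m:
            resets[m.group(2)] += 1
        m = CSUM_RE.search(line)
        if m:
            # Keep the two values in the order the ROM printed them.
            mismatches.append((int(m.group(1), 16), int(m.group(2), 16)))
    return {"resets": resets, "csum_mismatches": mismatches}


if __name__ == "__main__":
    sample = """\
[2022-01-20 10:45:27.931] rst:0x10 (RTCWDT_RTC_RESET),boot:0x12 (SPI_FAST_FLASH_BOOT)
[2022-01-20 10:45:27.931] csum err:0xb3!=0xab
[2022-01-20 10:45:27.947] rst:0x10 (RTCWDT_RTC_RESET),boot:0x12 (SPI_FAST_FLASH_BOOT)
[2022-01-20 10:45:27.947] csum err:0xa3!=0xab
"""
    summary = summarize_boot_log(sample)
    print(summary["resets"])           # Counter({'RTCWDT_RTC_RESET': 2})
    print(summary["csum_mismatches"])  # [(179, 171), (163, 171)]
```

Run over a longer capture, this makes it easy to see that the second value (0xab) stays fixed while the first one changes from boot to boot.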

Re: Hardware Flash Corruption Issue

Posted: Thu Jan 20, 2022 3:10 pm
by WiFive
It is calculating a different checksum on each boot but everything else is consistent so maybe some of the flash cells are unstable.
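WiFive's observation can be made concrete. Assuming the ROM verifies the same image checksum that esptool writes (an XOR of every segment data byte, seeded with 0xEF), a single flipped bit in the loaded data changes exactly one bit of the checksum, and XORing the two logged values shows which checksum bits disagree. The sketch also assumes the first printed value is the one computed at boot and 0xab is the stored value, since 0xab is the one that stays constant; the helper names are hypothetical.

```python
ESP_CHECKSUM_MAGIC = 0xEF  # seed value esptool uses for image checksums


def image_checksum(data: bytes, state: int = ESP_CHECKSUM_MAGIC) -> int:
    """XOR every byte into an 8-bit running state, as esptool does for image segments."""
    for b in data:
        state ^= b
    return state


def flipped_bits(computed: int, expected: int) -> list[int]:
    """Bit positions (0 = LSB) that differ between two 8-bit checksums."""
    diff = computed ^ expected
    return [bit for bit in range(8) if (diff >> bit) & 1]


if __name__ == "__main__":
    segment = bytes(range(16))  # stand-in for a segment read from flash
    good = image_checksum(segment)
    corrupted = bytearray(segment)
    corrupted[5] ^= 0x08  # simulate one unstable bit in the loaded data
    bad = image_checksum(bytes(corrupted))
    print(hex(good), hex(bad), flipped_bits(bad, good))  # 0xef 0xe7 [3]

    # The three distinct values from the log above, against the constant 0xab.
    for computed in (0xB3, 0xA3, 0x23):
        print(hex(computed), flipped_bits(computed, 0xAB))
```

For this log the differing bits come out as [3, 4], [3] and [3, 7], so bit 3 disagrees on every boot. That is consistent with a marginal bit, but an XOR checksum cannot say which byte the error sits in, and several errors can cancel out, so it is a hint rather than a diagnosis.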

Re: Hardware Flash Corruption Issue

Posted: Fri Jan 21, 2022 2:03 am
by ESP_Sprite
Either that, or possibly some periodic EMI on the data lines corrupting the signal between flash and ESP32. You could probably find out which it is by only powering the ESP32+flash, isolating it from the rest of the device, and seeing if it boots. If it does not and the issue is the same, I'd suggest using esptool to read out the entire flash a few times, then comparing the binaries.

You don't happen to write to that flash chip very often (directly or by using spiffs/fatfs/nvs/...) while running the device?
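ESP_Sprite's suggestion of reading out the entire flash a few times and comparing the binaries can be done with esptool's read_flash command (for example `esptool.py -p PORT read_flash 0 0x400000 dump1.bin`, repeated for dump2.bin and so on; the port and the 4 MB size are assumptions, so use your module's flash size) plus a short comparison script. The sketch below reports every byte offset where the dumps disagree. Bytes that change between reads of the same chip would point to unstable cells or signal problems rather than a bad image. The script and function names are hypothetical.

```python
import sys
from pathlib import Path

SECTOR = 4096  # assumed erase-sector size; only used to skip identical regions quickly


def differing_offsets(dumps: list[bytes]) -> list[int]:
    """Byte offsets where not all dumps agree, up to the length of the shortest dump."""
    length = min((len(d) for d in dumps), default=0)
    offsets = []
    for start in range(0, length, SECTOR):
        end = min(start + SECTOR, length)
        chunks = [d[start:end] for d in dumps]
        if all(c == chunks[0] for c in chunks[1:]):
            continue  # fast path: this sector reads the same in every dump
        for i in range(start, end):
            if len({d[i] for d in dumps}) > 1:
                offsets.append(i)
    return offsets


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python compare_dumps.py dump1.bin dump2.bin [dump3.bin ...]
    dumps = [Path(p).read_bytes() for p in sys.argv[1:]]
    unstable = differing_offsets(dumps)
    print(f"{len(unstable)} byte(s) differ between reads")
    for off in unstable[:32]:  # show the first few, with every dump's value
        print(f"0x{off:06x}: " + " ".join(f"{d[off]:02x}" for d in dumps))
```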

Re: Hardware Flash Corruption Issue

Posted: Sat Jan 22, 2022 7:17 am
by Ritesh
WiFive wrote (Thu Jan 20, 2022 3:10 pm):
"It is calculating a different checksum on each boot but everything else is consistent so maybe some of the flash cells are unstable."
Thanks for your quick response.

But I am wondering: how is it stable again after simply erasing the flash memory?

Do you have any idea how the flash could become unstable?

Re: Hardware Flash Corruption Issue

Posted: Sat Jan 22, 2022 7:21 am
by Ritesh
ESP_Sprite wrote (Fri Jan 21, 2022 2:03 am):
"Either that, or possibly some periodic EMI on the data lines corrupting the signal between flash and ESP32. You could probably find out which it is by only powering the ESP32+flash, isolating it from the rest of the device, and seeing if it boots. If it does not and the issue is the same, I'd suggest using esptool to read out the entire flash a few times, then comparing the binaries.

You don't happen to write to that flash chip very often (directly or by using spiffs/fatfs/nvs/...) while running the device?"
Thanks for your quick response.

Yes, that is possible. We write to the ESP32 flash memory through SPIFFS and through the flash write API, and we also read some data from flash whenever the state changes.

Will it have any impact if we read or write SPIFFS data frequently, though not at millisecond intervals?

I would also like to understand why the issue was resolved by erasing the flash memory.
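One way to reason about the "frequently, but not within milliseconds" question is a rough erase-wear estimate: reads do not consume erase cycles, erases do. The sketch below assumes a rated endurance of 100,000 erase cycles per 4 KB sector (a common SPI NOR datasheet figure; check the flash on your module) and that erases are spread evenly over some number of sectors, which is the best case a wear-levelling file system can reach. The function name and all example numbers are hypothetical.

```python
def estimated_lifetime_years(erases_per_day: float, sectors_in_rotation: int,
                             endurance_cycles: int = 100_000) -> float:
    """Years until the average sector reaches its rated erase count.

    Assumes erases are spread evenly over `sectors_in_rotation` sectors
    (ideal wear levelling); reads are ignored because they do not erase.
    """
    total_erases = endurance_cycles * sectors_in_rotation
    return total_erases / erases_per_day / 365


if __name__ == "__main__":
    # Hypothetical load: one sector erase per state change, 10 changes per hour.
    per_day = 10 * 24
    print(f"same sector every time: {estimated_lifetime_years(per_day, 1):.1f} years")
    print(f"spread over 64 sectors: {estimated_lifetime_years(per_day, 64):.1f} years")
```

With these made-up numbers, rewriting one fixed sector reaches its rated endurance in roughly a year (about 1.1 years), while spreading the same writes over 64 sectors (256 KB) stretches that to about 73 years. Direct writes through the flash write API to a fixed address would therefore be the first pattern worth checking.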

Re: Hardware Flash Corruption Issue

Posted: Sat Jan 22, 2022 11:30 am
by desp32fun
I'm also investigating flash corruption. It simply happens in the field, even after weeks.
(WROVER-E Modules with 16MB Flash)
I'm experiencing a reboot loop (?) with RTCWDT_RTC_RESET, too.

However, I was not able to recover the chip by deleting the flash.
Is there any way to recover? Or are the modules now toast?
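One way to check whether an erase actually took effect is to read the chip back after `esptool.py erase_flash` and look for bytes that are not 0xFF, since erased NOR flash reads as all ones. A sector that stays non-blank right after a full erase would suggest the flash (or the connection to it) is damaged, rather than the data simply being corrupted. A minimal sketch, assuming a dump produced with read_flash and 4 KB sectors; the function name is hypothetical.

```python
import sys
from pathlib import Path


def non_erased_sectors(dump: bytes, sector_size: int = 4096) -> list[tuple[int, int]]:
    """Return (sector_offset, non_blank_bytes) for each sector that is not all 0xFF."""
    result = []
    for start in range(0, len(dump), sector_size):
        chunk = dump[start:start + sector_size]
        non_blank = len(chunk) - chunk.count(0xFF)  # erased bytes read back as 0xFF
        if non_blank:
            result.append((start, non_blank))
    return result


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_erased.py after_erase.bin
    for offset, count in non_erased_sectors(Path(sys.argv[1]).read_bytes()):
        print(f"sector 0x{offset:06x}: {count} byte(s) not erased")
```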

Re: Hardware Flash Corruption Issue

Posted: Sat Jan 22, 2022 11:34 am
by Ritesh
Hello,

anyone has any idea like from where i can get those errors from code point of view? those errors are coming from boot loader source code or from library because i have tried to check into boot loader source code but didn;t find it yet.

Let me know if anyone has idea to trace that error from source code so that it will be easy for us to track it.

Re: Hardware Flash Corruption Issue

Posted: Sat Jan 22, 2022 2:21 pm
by WiFive
On how many units does the problem occur and after flash erase how often does it reappear?

Re: Hardware Flash Corruption Issue

Posted: Sun Jan 23, 2022 7:55 am
by Ritesh
desp32fun wrote (Sat Jan 22, 2022 11:30 am):
"I'm also investigating flash corruption. It simply happens in the field, even after weeks.
(WROVER-E Modules with 16MB Flash)
I'm experiencing a reboot loop (?) with RTCWDT_RTC_RESET, too.

However, I was not able to recover the chip by deleting the flash.
Is there any way to recover? Or are the modules now toast?"
Can you please send the complete logs of your device's continuous reboots?

Re: Hardware Flash Corruption Issue

Posted: Sun Jan 23, 2022 8:01 am
by Ritesh
WiFive wrote (Sat Jan 22, 2022 2:21 pm):
"On how many units does the problem occur and after flash erase how often does it reappear?"
So far we have seen the issue on one device, which was installed in the field at a customer site. The customer reported to us that the device was not working.

We got the device back from the customer for troubleshooting, and the UART logs showed that it was restarting continuously.

After some investigation we erased the complete flash and reprogrammed the firmware, after which the device worked fine. We have since sent the device back to the customer so it can resume operation at their site.
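The recovery described above (erase the complete flash, then reprogram the firmware) can be scripted with esptool. The sketch below only builds the commands; it adds a full read-back before the erase, so the corrupted contents are kept for the dump comparison ESP_Sprite suggested, and another read-back afterwards. The offsets 0x1000, 0x8000 and 0x10000 are the usual ESP-IDF defaults for the ESP32 bootloader, partition table and app; the port, 4 MB size and file names are placeholders, and the flash arguments your own build prints should take precedence. The function name is hypothetical.

```python
import shlex


def recovery_commands(port: str, flash_size: int, bootloader: str,
                      partition_table: str, app: str) -> list[list[str]]:
    """Build esptool.py invocations: back up, erase, reflash, read back."""
    base = ["esptool.py", "--chip", "esp32", "--port", port]
    size = hex(flash_size)
    return [
        base + ["read_flash", "0", size, "before_erase.bin"],  # keep the evidence
        base + ["erase_flash"],
        base + ["write_flash",
                "0x1000", bootloader,        # assumed ESP-IDF default offsets
                "0x8000", partition_table,
                "0x10000", app],
        base + ["read_flash", "0", size, "after_flash.bin"],
    ]


if __name__ == "__main__":
    # Hypothetical port, 4 MB flash and build output names.
    for cmd in recovery_commands("/dev/ttyUSB0", 0x400000, "bootloader.bin",
                                 "partition-table.bin", "app.bin"):
        print(shlex.join(cmd))
```

Each printed line can be run by hand or passed to `subprocess.run(cmd, check=True)`.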