Hardware Flash Corruption Issue

Ritesh
Posts: 1365
Joined: Tue Sep 06, 2016 9:37 am
Location: India

Postby Ritesh » Thu Jan 20, 2022 11:37 am

Hello Team,

We have used the ESP32-WROVER module in one of our inverter-based products, where it was working fine until a few days ago.

But suddenly the board has started restarting continuously, with the following output from the bootloader side:
[2022-01-20 10:45:27.931] ets Jul 29 2019 12:21:46
[2022-01-20 10:45:27.931]
[2022-01-20 10:45:27.931] rst:0x10 (RTCWDT_RTC_RESET),boot:0x12 (SPI_FAST_FLASH_BOOT)
[2022-01-20 10:45:27.931] configsip: 0, SPIWP:0xee
[2022-01-20 10:45:27.931] clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
[2022-01-20 10:45:27.931] mode:DOUT, clock div:2
[2022-01-20 10:45:27.931] load:0x3fff0018,len:4
[2022-01-20 10:45:27.931] load:0x3fff001c,len:5796
[2022-01-20 10:45:27.931] load:0x40078000,len:7756
[2022-01-20 10:45:27.931] load:0x40080000,len:5876
[2022-01-20 10:45:27.931] csum err:0xb3!=0xab
[2022-01-20 10:45:27.931] ets_main.c 384
[2022-01-20 10:45:27.947] ets Jul 29 2019 12:21:46
[2022-01-20 10:45:27.947]
[2022-01-20 10:45:27.947] rst:0x10 (RTCWDT_RTC_RESET),boot:0x12 (SPI_FAST_FLASH_BOOT)
[2022-01-20 10:45:27.947] configsip: 0, SPIWP:0xee
[2022-01-20 10:45:27.947] clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
[2022-01-20 10:45:27.947] mode:DOUT, clock div:2
[2022-01-20 10:45:27.947] load:0x3fff0018,len:4
[2022-01-20 10:45:27.947] load:0x3fff001c,len:5796
[2022-01-20 10:45:27.947] load:0x40078000,len:7756
[2022-01-20 10:45:27.947] load:0x40080000,len:5876
[2022-01-20 10:45:27.947] csum err:0xa3!=0xab
[2022-01-20 10:45:27.947] ets_main.c 384
[2022-01-20 10:45:27.947] ets Jul 29 2019 12:21:46
[2022-01-20 10:45:27.947]
[2022-01-20 10:45:27.947] rst:0x10 (RTCWDT_RTC_RESET),boot:0x12 (SPI_FAST_FLASH_BOOT)
[2022-01-20 10:45:27.947] configsip: 0, SPIWP:0xee
[2022-01-20 10:45:27.947] clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
[2022-01-20 10:45:27.947] mode:DOUT, clock div:2
[2022-01-20 10:45:27.947] load:0x3fff0018,len:4
[2022-01-20 10:45:27.947] load:0x3fff001c,len:5796
[2022-01-20 10:45:27.947] load:0x40078000,len:7756
[2022-01-20 10:45:27.947] load:0x40080000,len:5876
[2022-01-20 10:45:27.947] csum err:0x23!=0xab
[2022-01-20 10:45:27.947] ets_main.c 384
[2022-01-20 10:45:27.947] ets Jul 29 2019 12:21:46
[2022-01-20 10:45:27.947]
[2022-01-20 10:45:27.947] rst:0x10 (RTCWDT_RTC_RESET),boot:0x12 (SPI_FAST_FLASH_BOOT)
[2022-01-20 10:45:27.947] configsip: 0, SPIWP:0xee
[2022-01-20 10:45:27.947] clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
[2022-01-20 10:45:27.947] mode:DOUT, clock div:2
[2022-01-20 10:45:27.947] load:0x3fff0018,len:4
[2022-01-20 10:45:27.947] load:0x3fff001c,len:5796
[2022-01-20 10:45:27.947] load:0x40078000,len:7756
[2022-01-20 10:45:27.947] load:0x40080000,len:5876
[2022-01-20 10:45:27.963] csum err:0xa3!=0xab
[2022-01-20 10:45:27.963] ets_main.c 384
[2022-01-20 10:45:27.963] ets Jul 29 2019 12:21:46
So, we are completely stuck because of the above issue; the board recovered only after erasing the complete flash.

Does anyone have an idea what could cause the issue described above? Most probably it is due to a power fluctuation that corrupted one or more sections of flash memory, but the board recovered after erasing the whole flash memory.
Regards,
Ritesh Prajapati

WiFive
Posts: 3529
Joined: Tue Dec 01, 2015 7:35 am

Re: Hardware Flash Corruption Issue

Postby WiFive » Thu Jan 20, 2022 3:10 pm

It is calculating a different checksum on each boot but everything else is consistent so maybe some of the flash cells are unstable.
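
The differing checksums can be lined up with a short script. This is a minimal sketch that only parses the `csum err` lines from the log in the first post and reports which bits differ between the two printed values. The function names `checksum_mismatches` and `differing_bits` are made up for illustration, and it is an assumption that the first value is the one calculated at boot and the second the stored one (only the first one varies in the log).

```python
import re

# Matches the log line format seen above, e.g. "csum err:0xb3!=0xab".
CSUM_RE = re.compile(r"csum err:0x([0-9a-fA-F]+)!=0x([0-9a-fA-F]+)")


def checksum_mismatches(log_text):
    """Return (first, second, xor) for every 'csum err' line in the log."""
    results = []
    for match in CSUM_RE.finditer(log_text):
        first = int(match.group(1), 16)
        second = int(match.group(2), 16)
        results.append((first, second, first ^ second))
    return results


def differing_bits(xor_value):
    """List the bit positions (0 = least significant) set in xor_value."""
    return [bit for bit in range(8) if xor_value & (1 << bit)]


# The four checksum lines from the posted log, timestamps removed.
sample_log = """\
csum err:0xb3!=0xab
csum err:0xa3!=0xab
csum err:0x23!=0xab
csum err:0xa3!=0xab
"""

for first, second, xor_value in checksum_mismatches(sample_log):
    print(f"0x{first:02x} vs 0x{second:02x}: bits {differing_bits(xor_value)}")
```

For the four boots in the log, bit 3 differs every time, while bits 4 and 7 differ only on some boots.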

ESP_Sprite
Posts: 8884
Joined: Thu Nov 26, 2015 4:08 am

Re: Hardware Flash Corruption Issue

Postby ESP_Sprite » Fri Jan 21, 2022 2:03 am

Either that, or possibly some periodic EMI on the data lines corrupting the signal between flash and ESP32. You could probably find out which it is by only powering the ESP32+flash, isolating it from the rest of the device, and seeing if it boots. If it does not and the issue is the same, I'd suggest using esptool to read out the entire flash a few times, then comparing the binaries.

You don't happen to write to that flash chip very often (directly or by using spiffs/fatfs/nvs/...) while running the device?
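
The readout comparison suggested above could look roughly like the sketch below. It assumes the flash was dumped several times into separate files beforehand, for example with esptool's `read_flash` command (something like `esptool.py read_flash 0 0x400000 dump1.bin`, where the 4 MB size and the file names are assumptions and need to match the actual module). The helper names `differing_ranges` and `compare_dumps` are hypothetical.

```python
def differing_ranges(a, b):
    """Return (start, end) offset ranges, end exclusive, where a and b differ.

    Only the common prefix of the two byte strings is compared.
    """
    ranges = []
    start = None
    common = min(len(a), len(b))
    for offset in range(common):
        if a[offset] != b[offset]:
            if start is None:
                start = offset          # a new differing run begins here
        elif start is not None:
            ranges.append((start, offset))
            start = None
    if start is not None:
        ranges.append((start, common))  # run extends to the end of the data
    return ranges


def compare_dumps(paths):
    """Compare every dump against the first one and print differing ranges."""
    with open(paths[0], "rb") as f:
        reference = f.read()
    for path in paths[1:]:
        with open(path, "rb") as f:
            other = f.read()
        same_length = len(other) == len(reference)
        if not same_length:
            print(f"{path}: length {len(other)} differs from {len(reference)}")
        ranges = differing_ranges(reference, other)
        if not ranges and same_length:
            print(f"{path}: identical to {paths[0]}")
        for start, end in ranges:
            print(f"{path}: differs at 0x{start:06x}-0x{end - 1:06x}")
```

Calling `compare_dumps(["dump1.bin", "dump2.bin", "dump3.bin"])` prints the offsets that change between readouts. Offsets that change from one read to the next would point toward unstable reads (cells or signal integrity), while dumps that are identical to each other would suggest that whatever is stored in flash is at least being read back consistently.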

Ritesh
Posts: 1365
Joined: Tue Sep 06, 2016 9:37 am
Location: India

Re: Hardware Flash Corruption Issue

Postby Ritesh » Sat Jan 22, 2022 7:17 am

WiFive wrote:
Thu Jan 20, 2022 3:10 pm
It is calculating a different checksum on each boot but everything else is consistent so maybe some of the flash cells are unstable.
Thanks for your quick response.

But I wonder how it became stable again after just erasing the flash memory.

Do you have any idea how it could become unstable?
Regards,
Ritesh Prajapati

Ritesh
Posts: 1365
Joined: Tue Sep 06, 2016 9:37 am
Location: India

Re: Hardware Flash Corruption Issue

Postby Ritesh » Sat Jan 22, 2022 7:21 am

ESP_Sprite wrote:
Fri Jan 21, 2022 2:03 am
Either that, or possibly some periodic EMI on the data lines corrupting the signal between flash and ESP32. You could probably find out which it is by only powering the ESP32+flash, isolating it from the rest of the device, and seeing if it boots. If it does not and the issue is the same, I'd suggest using esptool to read out the entire flash a few times, then comparing the binaries.

You don't happen to write to that flash chip very often (directly or by using spiffs/fatfs/nvs/...) while running the device?
Thanks for your quick response.

Yes. There is a chance, since we write to the ESP32 flash memory via SPIFFS and the flash write API, and we also read some flash data whenever the state changes.

So, will it have any impact if we read or write SPIFFS data frequently, though not within milliseconds?

I also want to understand why the issue was resolved after erasing the flash memory.
Regards,
Ritesh Prajapati

desp32fun
Posts: 7
Joined: Sat Jan 22, 2022 10:44 am

Re: Hardware Flash Corruption Issue

Postby desp32fun » Sat Jan 22, 2022 11:30 am

I'm also investigating flash corruption. It simply happens in the field, even after weeks.
(WROVER-E Modules with 16MB Flash)
I'm experiencing a reboot loop (?) with RTCWDT_RTC_RESET, too.

However, I was not able to recover the chip by deleting the flash.
Is there any way to recover? Or are the modules now toast?

Ritesh
Posts: 1365
Joined: Tue Sep 06, 2016 9:37 am
Location: India

Re: Hardware Flash Corruption Issue

Postby Ritesh » Sat Jan 22, 2022 11:34 am

Hello,

Does anyone have an idea where these errors come from in the code? Do they come from the bootloader source code or from a library? I have checked the bootloader source code but have not found them yet.

Let me know if anyone knows how to trace these errors to the source code, so that it is easier for us to track them down.
Regards,
Ritesh Prajapati
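
One way to look at the `csum err` check offline is to recompute the image checksum from a flash readout. The sketch below assumes the firmware image layout described in esptool's documentation for the ESP32: an 8-byte common header starting with magic byte 0xE9, a 16-byte extended header, then segments that each carry an 8-byte (load address, length) header, and a checksum byte at the end of the padding that rounds the image up to 16 bytes, computed as the XOR of all segment data bytes with a seed of 0xEF. The function name `image_checksum` is hypothetical, and this does not claim to be the exact code that prints the error.

```python
import struct

CHECKSUM_SEED = 0xEF       # initial XOR value (assumed from the esptool image format docs)
COMMON_HEADER_LEN = 8      # magic, segment count, SPI mode, SPI speed/size, entry point
EXTENDED_HEADER_LEN = 16   # ESP32 extended header that follows the common header
SEGMENT_HEADER_LEN = 8     # load address and data length, both little-endian uint32


def image_checksum(image):
    """Return (computed, stored) checksum bytes for an ESP32 image blob."""
    magic, segment_count = image[0], image[1]
    if magic != 0xE9:
        raise ValueError(f"unexpected magic byte 0x{magic:02x}")
    offset = COMMON_HEADER_LEN + EXTENDED_HEADER_LEN
    checksum = CHECKSUM_SEED
    for _ in range(segment_count):
        _load_addr, length = struct.unpack_from("<II", image, offset)
        offset += SEGMENT_HEADER_LEN
        for byte in image[offset:offset + length]:
            checksum ^= byte            # XOR every data byte of the segment
        offset += length
    # The stored checksum is the last byte of the 16-byte block in which
    # the segment data ends.
    stored_offset = (offset // 16) * 16 + 15
    return checksum, image[stored_offset]
```

With a full flash dump in `dump`, the bootloader image would normally start at offset 0x1000 on the ESP32, so `image_checksum(dump[0x1000:])` returns the recomputed and stored values for comparison with the ones in the boot log.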

WiFive
Posts: 3529
Joined: Tue Dec 01, 2015 7:35 am

Re: Hardware Flash Corruption Issue

Postby WiFive » Sat Jan 22, 2022 2:21 pm

On how many units does the problem occur and after flash erase how often does it reappear?

Ritesh
Posts: 1365
Joined: Tue Sep 06, 2016 9:37 am
Location: India

Re: Hardware Flash Corruption Issue

Postby Ritesh » Sun Jan 23, 2022 7:55 am

desp32fun wrote:
Sat Jan 22, 2022 11:30 am
I'm also investigating flash corruption. It simply happens in the field, even after weeks.
(WROVER-E Modules with 16MB Flash)
I'm experiencing a reboot loop (?) with RTCWDT_RTC_RESET, too.

However, I was not able to recover the chip by deleting the flash.
Is there any way to recover? Or are the modules now toast?
Can you please send the complete logs of the continuous reboot of your device?
Regards,
Ritesh Prajapati

Ritesh
Posts: 1365
Joined: Tue Sep 06, 2016 9:37 am
Location: India

Re: Hardware Flash Corruption Issue

Postby Ritesh » Sun Jan 23, 2022 8:01 am

WiFive wrote:
Sat Jan 22, 2022 2:21 pm
On how many units does the problem occur and after flash erase how often does it reappear?
So far we have seen the issue on one device, which was deployed at a customer site. The customer reported to us that the device was not working.

So, we received the device back from the customer for troubleshooting. We checked the UART logs and found that the device was restarting continuously.

After some investigation, we erased the complete flash and reprogrammed the firmware, and then the device worked fine. So we sent the device back to the customer to resume operation at their end.
Regards,
Ritesh Prajapati
