ESP32 Forum

Posted: **Wed Apr 22, 2026 12:09 pm**

I am working on a product based on the ESP32-U4WDH using ESP-IDF v4.4.4. We have more than 2000 devices deployed with the same firmware, and recently one device failed in the field.
After retrieving the device, I found that a large region of flash had been overwritten with 0x00.

Observed Flash Corruption

- Flash contents from 0x9000 to 0xA2000 were all 0x00.
- Everything after 0xA2000 was intact and correct.
- Unfortunately, I could not read the contents before 0x9000 while the device was available to me.
- 0x9000 corresponds to the start of the NVS partition, but the corrupted region extends far beyond the NVS size.
- The boundary at 0xA2000 is clean: there is no partial corruption at the edge.

[Attachment: Comparative_bin.png]

(Left: expected content in flash. Right: content dumped from the ESP32's flash. The view starts at an offset, so it does not show 0xA2000.)
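For anyone who wants to check a dump for the same pattern, here is a minimal sketch in plain C (not ESP-IDF code) that finds the longest run of a given byte in a dumped image and tests whether its edges fall on sector boundaries. The names `byte_run_t`, `longest_run` and `is_sector_aligned` are made up for illustration, and the 4 KiB sector size is an assumption based on typical ESP32 SPI flash. Both reported boundaries, 0x9000 and 0xA2000, are multiples of 0x1000, and the span between them is 0x99000 bytes, i.e. 153 such sectors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB erase sectors, as is typical for ESP32 SPI flash. */
#define FLASH_SECTOR_SIZE 0x1000u

/* Half-open byte range [start, end) inside a dump buffer (hypothetical type). */
typedef struct {
    size_t start;
    size_t end;
} byte_run_t;

/* Return the longest contiguous run of `value` in buf[0..len).
 * If `value` never occurs, the result has start == end == 0. */
static byte_run_t longest_run(const uint8_t *buf, size_t len, uint8_t value)
{
    assert(buf != NULL || len == 0);
    byte_run_t best = {0, 0};
    size_t i = 0;
    while (i < len) {
        if (buf[i] != value) {
            i++;
            continue;
        }
        size_t j = i;
        while (j < len && buf[j] == value) {
            j++;                      /* extend the current run */
        }
        if (j - i > best.end - best.start) {
            best.start = i;
            best.end = j;
        }
        i = j;                        /* continue after this run */
    }
    return best;
}

/* Non-zero if addr is a multiple of the assumed sector size. */
static int is_sector_aligned(size_t addr)
{
    return (addr % FLASH_SECTOR_SIZE) == 0;
}
```

Calling `longest_run(dump, dump_len, 0x00)` on a full image read back from the device and then passing both ends to `is_sector_aligned` shows at a glance whether the zeroed region follows sector granularity, which is one hint as to whether it came from page/sector-sized operations rather than random bit damage.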

Firmware / System Context

- Flash writes are performed rarely, to addresses starting at 0x145000.
- There is an NVS write during brownout detection.
- No intentional large flash erase operations are performed in normal operation.
- There is no OTA in the system.
- This issue has been observed on only one device so far.

Reproduction Attempts
So far I have tried:

- Added an infinite loop inside the NVS write section during brownout, so that the write operation does not complete normally before the device powers down.
- Repeatedly powered the device ON and OFF at ~5 second intervals for about 2 days.

Even after these tests, I have not been able to reproduce the issue.

Power Environment

- The supply may be noisy in the system.
- Occasional voltage spikes of ~6–7 V lasting a few microseconds have been observed during startup.
- The system operates from a regulated 3.3 V supply.

Questions

What possible mechanisms could cause a large contiguous flash region to become 0x00?
Could this be caused by:
- Brownout or unstable supply during flash operations?
- Short voltage spikes?
- Anything other than the above?

What would be the recommended way to reproduce this type of failure in the lab?

Any suggestions or similar experiences would be very helpful.
Please let me know if any more details are needed.

Posted: **Thu Apr 23, 2026 1:30 am**

That's not an easy fault to get... assuming no hardware issues, something in the software had to load up those zeroes into the flash and write them, so that's multiple stages of failure (rather than e.g. flash erase, where you don't have to load the ff's). Do you know if the flash of the affected device otherwise is OK?

Posted: **Thu Apr 23, 2026 8:57 am**

I reflashed the firmware on the affected device, and it is now working normally again, so it appears that the flash hardware itself is still functional.

However, I am trying to understand what could have caused the contents to be overwritten with 0x00.

The firmware partition starts at 0x10000, and I observed that the flash contents were set to 0x00 up to 0xA2000. This means that more than the first half of the firmware image was overwritten with 0x00.
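To put a number on that, a small range-overlap helper is enough. This is only an illustrative sketch: `range_t` and `overlap_bytes` are hypothetical names, and the end address of the app partition is not stated in this thread, so the value used in the usage note below is a placeholder. The result does not depend on it as long as the partition extends past 0xA2000.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open flash address range [start, end) (hypothetical type). */
typedef struct {
    uint32_t start;
    uint32_t end;
} range_t;

/* Number of bytes shared by two half-open ranges; 0 if they are disjoint. */
static uint32_t overlap_bytes(range_t a, range_t b)
{
    assert(a.start <= a.end && b.start <= b.end);
    uint32_t lo = a.start > b.start ? a.start : b.start;  /* later start */
    uint32_t hi = a.end < b.end ? a.end : b.end;          /* earlier end */
    return hi > lo ? hi - lo : 0;
}
```

With the zeroed region taken as `{0x9000, 0xA2000}` and the app partition as `{0x10000, 0x140000}` (end address assumed), `overlap_bytes` returns 0x92000, i.e. 584 KiB of the app region reading back as 0x00.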

Given that the corrupted region includes part of the firmware itself, is it possible that the running firmware could have caused this kind of overwrite?

Posted: **Fri Apr 24, 2026 12:00 am**

> Given that the corrupted region includes part of the firmware itself, is it possible that the running firmware could have caused this kind of overwrite?

Not trivially. For one, most (all?) of the flash writing functions have guards that stop them from erasing/overwriting the app (see the CONFIG_SPI_FLASH_DANGEROUS_WRITE menuconfig option). Secondly, I'd expect any flash writes to happen through the partition API, which has its own checks to see if writes are in-bounds for the partition addressed.
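To illustrate the kind of in-bounds check meant here, below is a minimal, self-contained sketch. It is not the ESP-IDF implementation: `demo_partition_t`, `demo_range_in_partition` and `demo_partition_to_flash_addr` are hypothetical stand-ins, and the partition size in the usage note is made up. The point is that a partition-relative write is rejected before any flash address is formed if the offset and length do not fit inside the partition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a partition descriptor (not esp_partition_t). */
typedef struct {
    uint32_t address;  /* absolute start address in flash */
    uint32_t size;     /* partition length in bytes */
} demo_partition_t;

/* Non-zero if [offset, offset + len) lies entirely inside the partition.
 * Written so that offset + len cannot wrap around. */
static int demo_range_in_partition(const demo_partition_t *p,
                                   uint32_t offset, uint32_t len)
{
    assert(p != NULL);
    if (offset > p->size) {
        return 0;
    }
    return len <= p->size - offset;
}

/* Translate a partition-relative range to an absolute flash address.
 * Returns 0 and sets *flash_addr on success, -1 if out of bounds. */
static int demo_partition_to_flash_addr(const demo_partition_t *p,
                                        uint32_t offset, uint32_t len,
                                        uint32_t *flash_addr)
{
    assert(flash_addr != NULL);
    if (!demo_range_in_partition(p, offset, len)) {
        return -1;
    }
    *flash_addr = p->address + offset;
    return 0;
}
```

For a data partition at 0x145000 with an assumed size of 0x10000, a 0x1000-byte write at offset 0 maps to 0x145000, while a 0x2000-byte write at offset 0xF000 is refused. A write routed through a check like this has no path to addresses such as 0x9000 or 0x10000.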

Posted: **Fri Apr 24, 2026 11:51 am**

CONFIG_SPI_FLASH_DANGEROUS_WRITE is set to "Aborts" in my menuconfig, and I am only using standard ESP-IDF APIs for flash read/write — no low-level function calls. Given this, would it be reasonable to assume that unintended writes outside defined partitions are unlikely, or are there still scenarios where this protection could be bypassed?

Also, since the affected region contained 0x00 (not 0xFF), what mechanisms could cause such a large contiguous region to be written with 0x00?

Could this also be caused by a hardware-related fault, and if so, what specific types of hardware faults (e.g. power-related faults) could result in this behavior?

Are there any specific stress tests or fault-injection methods you would recommend to help reproduce this type of failure in the lab?

Thanks.

Posted: **Mon Apr 27, 2026 2:50 am**

> CONFIG_SPI_FLASH_DANGEROUS_WRITE is set to "Aborts" in my menuconfig, and I am only using standard ESP-IDF APIs for flash read/write (no low-level function calls). Given this, would it be reasonable to assume that unintended writes outside defined partitions are unlikely, or are there still scenarios where this protection could be bypassed?

I'd say so. It's hard to give an absolute 'software can never have been the issue here': you could, for instance, have a dangling callback pointer pointing to the middle of a flash write routine, which when called causes the routine to immediately start writing garbage. But that seems vanishingly unlikely to me.

> Also, since the affected region contained 0x00 (not 0xFF), what mechanisms could cause such a large contiguous region to be written with 0x00?

Flash writes with data of all 0s would do that. Nothing else to my knowledge.
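The reason is how NOR flash behaves: an erase sets every bit to 1 (bytes read 0xFF), and a program operation can only clear bits from 1 to 0. The sketch below is a deliberately simplified model of that behaviour in plain C. The names `nor_erase` and `nor_program` are made up for illustration, and it ignores page and sector granularity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Erase: every bit in the region goes back to 1, so bytes read as 0xFF. */
static void nor_erase(uint8_t *cells, size_t len)
{
    assert(cells != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        cells[i] = 0xFF;
    }
}

/* Program: bits can only change from 1 to 0, so the stored value becomes
 * the bitwise AND of the old contents and the data being written. */
static void nor_program(uint8_t *cells, const uint8_t *data, size_t len)
{
    assert((cells != NULL && data != NULL) || len == 0);
    for (size_t i = 0; i < len; i++) {
        cells[i] &= data[i];
    }
}
```

In this model, programming all-zero data over any existing contents yields 0x00 everywhere without a prior erase, whereas an erase alone can only ever produce 0xFF. That is why a region reading 0x00 points to program operations rather than an erase.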

> Could this also be caused by a hardware-related fault, and if so, what specific types of hardware faults (e.g. power-related faults) could result in this behavior?

Yes, for sure. Glitches in the power supply can make microcontroller cores do weird things. Reproducing it would generally be very hard, though, as a glitch would need to happen at exactly the right time.

Posted: **Mon Apr 27, 2026 12:24 pm**

If power glitches could cause unintended flash writes, what precautions would you recommend to prevent this — both from a hardware and firmware perspective?

Currently, I have about 100 µF capacitance on the 3.3 V rail for stabilization. Would increasing bulk capacitance or adding other protections (e.g., filtering or supervision) help in this case?

Thank you.

Posted: **Tue Apr 28, 2026 6:12 am**

> If power glitches could cause unintended flash writes, what precautions would you recommend to prevent this, both from a hardware and firmware perspective?
>
> Currently, I have about 100 µF capacitance on the 3.3 V rail for stabilization. Would increasing bulk capacitance or adding other protections (e.g., filtering or supervision) help in this case?
>
> Thank you.

Generally, I'd recommend good power supply design, where the power supply rails stay within spec at all times. In your particular case, saying that you have 100 µF of capacitance isn't really informative: you could have, e.g., a high-capacitance electrolytic on the far side of the board, which (if you take into account the high ESR of the cap and the impedance of the lines) would do very little to stop short glitches from happening. In general, I'd advise you to at the very least make sure that all the capacitors mentioned in the hardware design guidelines are there, are not derated (some ceramics lose 90% of their capacitance if you run them close to their rated voltage!), and are very close to the chip, with thick, direct lines connecting them to its power supply pins.

0x99000 Bytes of Flash Erased During Runtime