After retrieving the device, I found that a large region of flash had been overwritten with 0x00.
Observed Flash Corruption
- Flash contents from 0x9000 to 0xA2000 were all 0x00
- Everything after 0xA2000 was intact and correct
- Unfortunately, I could not read the contents below 0x9000 while the device was available
- 0x9000 corresponds to the start of the NVS partition, but the corrupted region extends far beyond the NVS size
- The boundary at 0xA2000 is clean, with no partial corruption at the edge. (Screenshot: left, the expected flash contents; right, the contents dumped from the ESP32's flash. The displayed addresses start at an offset, so 0xA2000 itself is not shown.)
- Flash writes are rare, and they target addresses starting at 0x145000.
- There is an NVS write during brownout detection
- No intentional large flash erase operations are performed in normal operation.
- There is no OTA in the system.
- This issue has been observed on only one device so far
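For reference, the size and alignment of the zeroed region can be worked out with a short script. This is only a sketch: the 4 KiB sector and 64 KiB block sizes are assumptions about the usual erase units of SPI NOR flash, and `describe_region` is a hypothetical helper name, not part of any SDK.

```python
# Assumed erase granularities for typical SPI NOR flash (not confirmed for this part).
SECTOR_SIZE = 0x1000    # 4 KiB sector
BLOCK_SIZE = 0x10000    # 64 KiB block


def describe_region(start: int, end: int) -> dict:
    """Summarise a flash region [start, end) in terms of size and alignment."""
    size = end - start
    return {
        "size_bytes": size,
        "size_kib": size // 1024,
        "sectors": size // SECTOR_SIZE,
        "start_sector_aligned": start % SECTOR_SIZE == 0,
        "end_sector_aligned": end % SECTOR_SIZE == 0,
        "start_block_aligned": start % BLOCK_SIZE == 0,
        "end_block_aligned": end % BLOCK_SIZE == 0,
    }


# Addresses taken from the observations above.
info = describe_region(0x9000, 0xA2000)
for key, value in info.items():
    print(f"{key}: {value}")
```

With these assumptions the region is 0x99000 bytes (612 KiB, 153 sectors). Both ends fall on 4 KiB sector boundaries, but neither falls on a 64 KiB block boundary.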
So far I have tried:
- Added an infinite loop inside the NVS write section during brownout, so that the write never completes normally before the device powers down.
- Repeatedly powered the device ON and OFF at ~5 second intervals for about 2 days.
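After a power-cycling run like this, a fresh flash dump could be checked automatically for zero-filled regions instead of by eye. The following is a minimal sketch, not the procedure I actually used: `find_fill_runs` is a hypothetical helper, the 4 KiB minimum run length is an assumption, and the demo uses synthetic data rather than a real dump.

```python
def find_fill_runs(data: bytes, fill: int = 0x00, min_len: int = 0x1000, base: int = 0):
    """Return (start, end) address pairs of runs of `fill` at least `min_len` bytes long.

    `end` is exclusive. `base` is added to every address, for dumps that
    do not start at flash address 0.
    """
    runs = []
    run_start = None
    for offset, value in enumerate(data):
        if value == fill:
            if run_start is None:
                run_start = offset          # a new run begins here
        elif run_start is not None:
            if offset - run_start >= min_len:
                runs.append((base + run_start, base + offset))
            run_start = None                # run ended
    # Handle a run that extends to the end of the data.
    if run_start is not None and len(data) - run_start >= min_len:
        runs.append((base + run_start, base + len(data)))
    return runs


# Synthetic example: 8 KiB of 0xFF, 12 KiB of 0x00, 4 KiB of 0xAA.
image = b"\xff" * 0x2000 + b"\x00" * 0x3000 + b"\xaa" * 0x1000
for start, end in find_fill_runs(image):
    print(f"zero-filled: 0x{start:X} to 0x{end:X} ({end - start} bytes)")
```

For a real device, the input would be a dump file read with `open(path, "rb").read()`, for example one produced by `esptool.py read_flash`.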
Power Environment
- The supply may be noisy in the system.
- Occasional voltage spikes of ~6–7V lasting a few microseconds have been observed during startup
- System operates from a regulated 3.3V supply
- What possible mechanisms could cause a large contiguous flash region to become 0x00?
- Could this be caused by:
  - Brownout or unstable supply during flash operations?
  - Short voltage spikes?
  - Anything else not mentioned above?
Any suggestions or similar experiences would be very helpful.
Please let me know if any more details are needed.
