Dealing with a bug that persists a restart

Posted: Tue May 24, 2022 11:23 am
by JosuGZ
I've witnessed this behavior 4 times in several years so it is very hard to debug:

My chip is trying to read from a UART and doesn't get a response. After a while, it assumes that something is wrong and crashes, which restarts it. Then the same thing happens again. And again.

Either the chip is not writing (hence not getting a response) or it is unable to read. And the big problem is that this survives my watchdog task restarting the chip.

Then, I trigger a restart with the enable pin, and the bug goes away.

Any suggestions? Perhaps this is a known bug fixed in some IDF version? It is as if some bit flipped and is not properly re-initialized after a restart.

The biggest problem is that if this happens, a manual restart is required.

Re: Dealing with a bug that persists a restart

Posted: Tue May 24, 2022 12:30 pm
by boarchuz
A software restart (including panic, abort, etc) doesn't reset the whole system.

Only power on, RTC WDT, and brownout will do that. You can do a 'full' reset in software by using the RTC WDT.

This is a good start: https://github.com/espressif/esp-idf/bl ... #L749-L754

(Although I should add that this might still be a bug. The UART should not break because of a restart. The above is a workaround to guarantee a clean reset.)
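For reference, a minimal sketch of arming the RTC WDT for a full reset. It assumes ESP-IDF v4.x on the original ESP32, where the rtc_wdt_* functions are declared in soc/rtc_wdt.h; the header location and availability differ between IDF versions and chips, so check your own tree. arm_full_reset is a hypothetical helper name, and the #else branch only holds host-side stand-ins so the sequence can be compiled off-target. It is not the real driver.

```c
#ifdef ESP_PLATFORM
#include "soc/rtc_wdt.h"  /* assumed location in ESP-IDF v4.x for the ESP32 */
#else
/* Host-side stand-ins (not the real driver) so the sketch builds off-target. */
typedef enum { RTC_WDT_STAGE0 = 0 } rtc_wdt_stage_t;
typedef enum { RTC_WDT_STAGE_ACTION_RESET_RTC = 4 } rtc_wdt_stage_action_t;
static int stub_enabled = 0;
static int stub_action = -1;
static unsigned int stub_timeout_ms = 0;
static void rtc_wdt_protect_off(void) {}
static void rtc_wdt_protect_on(void) {}
static void rtc_wdt_disable(void) { stub_enabled = 0; }
static void rtc_wdt_enable(void) { stub_enabled = 1; }
static void rtc_wdt_set_stage(rtc_wdt_stage_t stage, rtc_wdt_stage_action_t action)
{
    (void)stage;
    stub_action = (int)action;
}
static void rtc_wdt_set_time(rtc_wdt_stage_t stage, unsigned int timeout_ms)
{
    (void)stage;
    stub_timeout_ms = timeout_ms;
}
#endif

/* Hypothetical helper: arm stage 0 of the RTC WDT so that it resets the
 * whole chip, RTC domain included, once timeout_ms has elapsed. */
void arm_full_reset(unsigned int timeout_ms)
{
    rtc_wdt_protect_off();   /* unlock the write-protected WDT registers */
    rtc_wdt_disable();       /* stop the timer while reconfiguring */
    rtc_wdt_set_stage(RTC_WDT_STAGE0, RTC_WDT_STAGE_ACTION_RESET_RTC);
    rtc_wdt_set_time(RTC_WDT_STAGE0, timeout_ms);
    rtc_wdt_enable();        /* start counting; do not feed it afterwards */
    rtc_wdt_protect_on();    /* lock the registers again */
}
```

On the target you would call arm_full_reset() with a short timeout and then spin (for example in an empty while loop) until the watchdog fires.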

Re: Dealing with a bug that persists a restart

Posted: Wed May 25, 2022 9:14 am
by JosuGZ
I will investigate that so I can make a full restart (can I trigger the watchdog manually?). Would a full restart like the one you mention clear the RTC_NO_INIT memory?
boarchuz wrote:
Although I should add that this might still be a bug. The UART should not break because of a restart
Just for clarification: the UART does not break because of a restart. Rather, it does not "heal" after the restart (the original cause of the bug that breaks the UART is unknown). I'm on an old version so perhaps this is fixed, who knows.

Re: Dealing with a bug that persists a restart

Posted: Wed May 25, 2022 10:33 am
by boarchuz
JosuGZ wrote:
Wed May 25, 2022 9:14 am
can I trigger the watchdog manually?
The linked IDF snippet will do this - set the timeout to 0 to reset immediately.
JosuGZ wrote:
Wed May 25, 2022 9:14 am
Would a full restart like the one you mention clear the RTC_NO_INIT memory?
It shouldn't ever be cleared.
I'm guessing memory isn't powered down during an RTC WDT reset, so I would expect whatever is there to remain intact. (Maybe someone who knows a lot more about it can answer that for sure?)
Of course you'd use a CRC or similar to check contents on reset anyway.

Re: Dealing with a bug that persists a restart

Posted: Wed May 25, 2022 10:23 pm
by JosuGZ
boarchuz wrote:
Of course you'd use a CRC or similar to check contents on reset anyway.
I use a special 64-bit key; if it is there, I know I'm dealing with data from the previous run.
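The two checks mentioned here (a magic key and a CRC) can be combined. What follows is only a sketch under assumptions: the struct layout, the key value, and all function names are hypothetical, and the CRC-32 is a plain bitwise implementation rather than any IDF routine. On the target the variable would carry ESP-IDF's RTC_NOINIT_ATTR from esp_attr.h; the #else branch lets the same code build on a host.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#ifdef ESP_PLATFORM
#include "esp_attr.h"       /* provides RTC_NOINIT_ATTR */
#else
#define RTC_NOINIT_ATTR     /* host build: ordinary static storage */
#endif

/* Hypothetical 64-bit key marking "this memory was written by a previous run". */
#define STATE_MAGIC UINT64_C(0x5A17C0DEBADC0FFE)

typedef struct {
    uint64_t magic;
    uint32_t boot_count;    /* example payload */
    uint32_t crc;           /* CRC-32 over every field before this one */
} persisted_state_t;

static RTC_NOINIT_ATTR persisted_state_t g_state;

/* Accessor for the no-init instance. */
persisted_state_t *state_get(void) { return &g_state; }

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t crc32_bitwise(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

static uint32_t state_crc(const persisted_state_t *s)
{
    return crc32_bitwise(s, offsetof(persisted_state_t, crc));
}

/* True only if the key is present AND the payload matches its CRC. */
bool state_is_valid(const persisted_state_t *s)
{
    return s->magic == STATE_MAGIC && s->crc == state_crc(s);
}

/* Call after every change to the payload. */
void state_commit(persisted_state_t *s)
{
    s->magic = STATE_MAGIC;
    s->crc = state_crc(s);
}

/* Returns true if data from a previous run was kept, false if re-initialized. */
bool state_load_or_init(persisted_state_t *s)
{
    if (state_is_valid(s)) {
        return true;
    }
    s->boot_count = 0;
    state_commit(s);
    return false;
}
```

Early in startup you would call state_load_or_init(state_get()). The key alone tells you the memory was written before; the CRC additionally catches a payload that was corrupted while the key survived.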

Re: Dealing with a bug that persists a restart

Posted: Thu May 26, 2022 2:06 pm
by JosuGZ
My particular problem was a pin staying high after a reset. Something as simple as this can reproduce it:

Code:

    main:
        delay 10s
        set pin_32 high
        abort
It starts low, then goes high and never goes back to low, whether I use abort, esp_restart, or enter a critical section so that the interrupt watchdog causes a reset. Only the enable pin can return pin_32 to its initial configuration.

So I believe the solution with the watchdog won't work.

This is probably fixed in some newer version, I hope.

Re: Dealing with a bug that persists a restart

Posted: Thu May 26, 2022 5:41 pm
by boarchuz
If pin 32 is configured as an RTC GPIO, it should be unaffected by a software restart, so what you are seeing is the expected behaviour. If it's configured as a digital GPIO, it should be reset back to its default.

RTC WDT will reset everything. It's functionally equivalent to toggling the enable pin.
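If a full reset is not wanted, another option is to put the pin back into a known state at every boot instead of relying on the reset to do it. The following is a rough sketch, assuming the pin was left under RTC control and that the ESP-IDF driver calls rtc_gpio_hold_dis, rtc_gpio_deinit, gpio_reset_pin, gpio_set_direction and gpio_set_level are available in your IDF version. restore_pin_low is a hypothetical helper name, and the #else branch only contains host-side stand-ins so the sequence can be compiled off-target.

```c
#include <stdint.h>

#ifdef ESP_PLATFORM
#include "driver/gpio.h"
#include "driver/rtc_io.h"
#else
/* Host-side stand-ins (not the real drivers) so the sketch builds off-target. */
typedef int gpio_num_t;
typedef int gpio_mode_t;
#define GPIO_NUM_32      32
#define GPIO_MODE_OUTPUT 2
static int pin_stub_rtc_owned = 1;  /* pretend the pin survived a soft reset */
static int pin_stub_held = 1;
static int pin_stub_level = 1;
static int pin_stub_is_output = 0;
static void rtc_gpio_hold_dis(gpio_num_t pin) { (void)pin; pin_stub_held = 0; }
static void rtc_gpio_deinit(gpio_num_t pin) { (void)pin; pin_stub_rtc_owned = 0; }
static void gpio_reset_pin(gpio_num_t pin) { (void)pin; pin_stub_is_output = 0; }
static void gpio_set_direction(gpio_num_t pin, gpio_mode_t mode)
{
    (void)pin;
    pin_stub_is_output = (mode == GPIO_MODE_OUTPUT);
}
static void gpio_set_level(gpio_num_t pin, uint32_t level)
{
    (void)pin;
    pin_stub_level = (int)level;
}
#endif

/* Hypothetical helper: hand an RTC-capable pin back to the digital GPIO
 * matrix and drive it low, regardless of what the previous run left behind. */
void restore_pin_low(gpio_num_t pin)
{
    rtc_gpio_hold_dis(pin);   /* release any RTC hold on the pad */
    rtc_gpio_deinit(pin);     /* route the pad back to the digital GPIO */
    gpio_reset_pin(pin);      /* return the pin to the driver's default state */
    gpio_set_direction(pin, GPIO_MODE_OUTPUT);
    gpio_set_level(pin, 0);   /* the level the pin has at power-on in this thread */
}
```

Calling restore_pin_low(GPIO_NUM_32) at the very start of app_main would make the pin state independent of the kind of reset that preceded it.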