WDT Timeout on Core 0

berlinetta
Posts: 41
Joined: Tue May 21, 2019 8:33 pm

Re: WDT Timeout on Core 0

Postby berlinetta » Wed Jul 29, 2020 2:58 pm

Thanks for pointing out that this WDT timeout is specifically related to spending too long in an interrupt... that is a tremendous help!

I have configured the UART to trigger an interrupt on RX FIFO FULL, RX FIFO OVERFLOW or RX FIFO TIMEOUT (60). The original FIFO FULL threshold was set to 112 bytes, but testing showed we could run into FIFO overflow issues, so the threshold was backed down to 75 bytes and the interrupt priority was raised to '1'. PlatformIO has recently made improvements which allow us to use menuconfig, and the Interrupt WDT timeout is currently set to 300 ms.

I don't know how the interrupt WDT timer is reset... is it occurring periodically within a task? If so, how easy is it to starve that task and cause this issue?

We are operating at a baud rate of 500 kbps, the FIFO depth is limited to 128 bytes, and the interrupt has high priority. Could this be an issue with repeatedly entering the handler during periods of high traffic?
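As a rough sanity check on how often the handler can be entered, here is a small sketch of the arithmetic. It assumes 8N1 framing (10 bit times per byte) and that "500 kbps" means 500,000 baud; neither is confirmed above, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 8N1 framing, so each byte costs 10 bit times on the wire. */
#define BITS_PER_BYTE_8N1 10u

/* Bytes per second arriving at a given baud rate (hypothetical helper). */
static uint32_t uart_bytes_per_sec(uint32_t baud)
{
    assert(baud > 0);
    return baud / BITS_PER_BYTE_8N1;
}

/* Microseconds needed to receive `nbytes` back-to-back bytes. */
static uint32_t uart_fill_time_us(uint32_t baud, uint32_t nbytes)
{
    assert(baud > 0);
    return (uint32_t)(((uint64_t)nbytes * BITS_PER_BYTE_8N1 * 1000000u) / baud);
}
```

Under those assumptions, uart_fill_time_us(500000, 75) gives 1500 us between FIFO FULL interrupts at full line rate, and the headroom between the threshold and a 128-byte overflow is uart_fill_time_us(500000, 128 - 75) = 1060 us. With the original 112-byte threshold the headroom would have been only 320 us, which is consistent with the overflows seen in testing.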

A background task which consumes the UART data is packetizing it and sending it up to the web-server client in chunks of 4400 bytes. I am concerned that during high traffic times this may cause additional bottlenecks with the WiFi stack. You mentioned the fact that most of the work done in the stack is executed within a task rather than an interrupt context. Is there any part of that data transfer which may compound the interrupt WDT timeout problem?

My handler is instrumented with some GPIO signalling to indicate when I am in the routine and for what reason, so I can check what those signals indicate when the timeout scenario hits. This scenario is difficult to create, so I am very interested in determining what may be happening when a UART interrupt is serviced improperly and causes re-entry. It must be a corner case, but the instrumentation should help smoke it out.

Best Regards,
Mark

Re: WDT Timeout on Core 0

Postby berlinetta » Thu Jul 30, 2020 3:44 am

I was able to capture some good information with my instrumented code...

I determined that the httpd_resp_send_chunk() call from the esp_http_server appears to be the culprit. This call normally completes within milliseconds; however, it occasionally took much longer as I attempted to send the data chunks to the web-server client. On one occasion, I measured an execution time of 225 ms!

I also determined that this excessive delay was blocking my task execution which consumes the UART traffic, causing the UART circular buffer to fill to capacity. When the circular buffer became full, my interrupt handler was not able to pull any more data from the FIFO because it had no place to store it. I modified the handler to simply report a buffer overrun in this situation and flush the FIFO to prevent re-entry. The interrupt WDT timeouts no longer occur.
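The actual handler is not reproduced here, but the idea can be sketched in a host-runnable form. Everything below is illustrative: ring_buffer_t, fake_fifo_t, rx_handler and RB_CAPACITY are hypothetical names, and the hardware RX FIFO is replaced by a plain array so the logic can be exercised off-target. A real handler would also need to clear the UART interrupt status and protect state shared with the consumer task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RB_CAPACITY 8u  /* hypothetical; tiny so the overrun path is easy to hit */

typedef struct {
    uint8_t  data[RB_CAPACITY];
    size_t   head;      /* next write index */
    size_t   count;     /* bytes currently stored */
    uint32_t overruns;  /* overrun events, at most one per handler call */
} ring_buffer_t;

/* Stand-in for the hardware RX FIFO (assumption: not a real driver type). */
typedef struct {
    const uint8_t *bytes;
    size_t len;
    size_t pos;
} fake_fifo_t;

static bool fifo_has_data(const fake_fifo_t *f) { return f->pos < f->len; }
static uint8_t fifo_read(fake_fifo_t *f) { return f->bytes[f->pos++]; }

/* Drain the FIFO completely on every call. Bytes that do not fit in the
 * ring buffer are dropped and reported, so the FIFO never stays above its
 * threshold and the interrupt does not immediately fire again. */
static void rx_handler(fake_fifo_t *fifo, ring_buffer_t *rb)
{
    assert(fifo != NULL && rb != NULL);
    bool overrun = false;

    while (fifo_has_data(fifo)) {
        uint8_t b = fifo_read(fifo);        /* always pop the byte */
        if (rb->count < RB_CAPACITY) {
            rb->data[rb->head] = b;
            rb->head = (rb->head + 1u) % RB_CAPACITY;
            rb->count++;
        } else {
            overrun = true;                 /* no room: discard the byte */
        }
    }
    if (overrun) {
        rb->overruns++;
    }
}
```

The key design choice is that the handler always empties the FIFO, even when the data has nowhere to go. Losing bytes with an explicit overrun count is preferable to leaving the FIFO full, which would re-trigger the interrupt as soon as the handler returns.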

Now I need to understand why this httpd_resp_send_chunk() call is taking so long to execute. I could potentially increase my UART circular buffer size, but this execution delay is very large and worrisome. Any insight as to why this may be occurring and what I could do to alleviate the delay?
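To put a number on what riding out such a stall would cost in buffer space, here is a back-of-the-envelope sketch. It assumes 8N1 framing and a sender running at full line rate for the whole stall, and it treats the single 225 ms measurement as the worst case, which it may not be. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes that can arrive while the consumer task is blocked.
 * Assumption: 8N1 framing, i.e. 10 bit times per byte. */
static uint32_t bytes_during_stall(uint32_t baud, uint32_t stall_ms)
{
    assert(baud > 0);
    return (uint32_t)(((uint64_t)baud * stall_ms) / (10u * 1000u));
}

/* Smallest power of two that is >= n, convenient for ring buffer sizes. */
static uint32_t next_pow2(uint32_t n)
{
    assert(n > 0 && n <= 0x80000000u);
    uint32_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}
```

With these assumptions, bytes_during_stall(500000, 225) is 11250 bytes, so a power-of-two buffer would need to be at least next_pow2(11250) = 16384 bytes just to absorb one stall of that length, before adding any margin.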

Best Regards,
Mark
