In a recent build, I have noticed that 1 or 2 characters are occasionally lost from my debug log messages. It is somewhat random but generally happens when the CPU is quite busy. Missing characters are usually near the end of a log message.
The system is otherwise working fine (no crashes) but I am worried that this loss of data from the debug log might be a symptom of some serious problem.
Useful Info:
* Chip: ESP32
* ESP-IDF version 5.5.1
* I am monitoring the logs with a Windows 11 PC. ESP32 is connected via a USB programmer board and USB cable.
* Logging UART baud rate is 115,200.
* Dropped characters have been observed with the ESP-IDF monitor program (`idf.py monitor`) and also other serial terminals such as Teraterm.
Here is an example of a couple of common log messages being corrupted (other messages filtered out):
I (...) UI: Auto-off timer restarted. <==== Correct.
I (...) UI: Auto-off timer restartd. <==== Missing "e"!
I (...) TRANSMIT: Trigger press, Single read mode <==== Correct.
I (...) TRANSMIT: Trigger press, Single read ode <==== Borked!
I (...) TRANSMIT: Trigger press, Singleread mode <==== Borked!
I (...) TRANSMIT: Trigger press, Sinle read mode <==== Borked!
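To pin down exactly which characters went missing, the comparison I did by eye above can be sketched as a small PC-side helper. This is only a minimal sketch: `find_dropped_chars` is a hypothetical name, and it assumes the only corruption is deleted characters (nothing inserted or altered) and that the expected text of each message is known.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of ESP-IDF): reports which offsets of
 * `expected` are missing from `received`, assuming characters were only
 * deleted. Up to `max_dropped` offsets are written to `dropped`.
 * Returns the number of dropped characters, or -1 if `received` cannot be
 * produced from `expected` by deletions alone. */
int find_dropped_chars(const char *expected, const char *received,
                       size_t *dropped, size_t max_dropped)
{
    assert(expected != NULL && received != NULL && dropped != NULL);

    size_t e = 0;   /* index into expected */
    size_t r = 0;   /* index into received */
    int count = 0;

    while (expected[e] != '\0') {
        if (expected[e] == received[r]) {
            r++;    /* this character arrived intact */
        } else {
            /* treat the expected character as lost on the way */
            if ((size_t)count < max_dropped) {
                dropped[count] = e;
            }
            count++;
        }
        e++;
    }

    /* leftover received text means the damage was not a pure deletion */
    return (received[r] == '\0') ? count : -1;
}
```

Calling it with the correct and corrupted "Auto-off timer" lines above should report one dropped character. Where a character is repeated (for example "ee" received as "e"), the reported offset is only one of the possible positions.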
* Testing the serial port and cable: I made the "main" task of the app log "The quick brown fox jumped over the lazy dog" at 10 messages per second for a long time while the device wasn't doing much else. No lost characters were seen, so the USB serial cable seems OK (no random loss of data). When I made the ESP32 do more CPU-intensive work, I started to see dropped characters again. However, characters were not dropped from the main task's logging, only from tasks which were doing more work.
* Task profiling: I checked the stack usage of all tasks in case there is a stack overflow. It looks like all tasks have plenty of spare stack space.
* Heap corruption detection: I turned on the `HEAP_POISONING_COMPREHENSIVE` config option, and also called the `heap_caps_check_integrity_all` function in various places in my code. No heap corruption was detected.
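To put the fox test in perspective, here is a rough sketch of the link budget at 115,200 baud. The function names are hypothetical, and two figures are assumptions rather than measurements: 10 bits on the wire per byte (8N1 framing, which I believe is the console default) and about 64 bytes per logged line once the log prefix and line ending are added to the 44-character test sentence.

```c
/* Hypothetical helpers for a back-of-envelope UART link budget. */

/* Bytes per second the link can carry. `bits_per_frame` is the number of
 * bits sent per byte, e.g. 10 for 8N1 (start + 8 data + stop). */
unsigned uart_bytes_per_second(unsigned baud, unsigned bits_per_frame)
{
    if (bits_per_frame == 0) {
        return 0;   /* invalid framing, avoid dividing by zero */
    }
    return baud / bits_per_frame;
}

/* Percentage of the link used by `messages_per_second` messages of
 * `bytes_per_message` bytes each. Returns 0 for invalid framing or baud. */
double uart_utilisation_percent(unsigned baud, unsigned bits_per_frame,
                                unsigned bytes_per_message,
                                unsigned messages_per_second)
{
    if (baud == 0 || bits_per_frame == 0) {
        return 0.0;
    }
    double capacity = (double)baud / (double)bits_per_frame;
    double load = (double)bytes_per_message * (double)messages_per_second;
    return 100.0 * load / capacity;
}
```

With these assumptions, `uart_bytes_per_second(115200, 10)` gives 11,520 bytes per second, and `uart_utilisation_percent(115200, 10, 64, 10)` comes out at a little under 6%. So the idle test only used a small fraction of the link, and it says little about what happens when several busy tasks log at the same time.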
Could this problem result from the UART buffer overflowing? If that happened, how could I detect it? Would it cause an exception or error?
Is there anything else I should look for?