ESP32-S3 esp_flash_read() times out watchdog when running in multicore

Nerez83
Posts: 2
Joined: Tue Jun 23, 2026 1:29 pm

Postby Nerez83 » Tue Jun 23, 2026 3:07 pm

Hi,
for some reason, calling esp_flash_read() while CONFIG_FREERTOS_UNICORE is not set trips the task watchdog. The function gets called, for example, from esp_vfs_littlefs_register() or nvs_flash_init(). Before the backtrace, this log is printed:

Code: Select all

E (330090) task_wdt: Task watchdog got triggered. The following tasks/users did not reset the watchdog in time:
E (330090) task_wdt:  - IDLE1 (CPU 1)
E (330100) task_wdt: Tasks currently running:
E (330100) task_wdt: CPU 0: IDLE0
E (330100) task_wdt: CPU 1: main
E (330100) task_wdt: Print CPU 1 (current core) backtrace

The backtrace varies between timeouts. The ESP keeps running after the first timeout, but the watchdog trips again every 5 seconds and prints the errors and a backtrace each time.
The frames that appear in every backtrace are:

Code: Select all

--- 0x4202057e: task_wdt_timeout_handling at /home/andy/.espressif/v6.0/esp-idf/components/esp_system/task_wdt/task_wdt.c:436
--- (inlined by) task_wdt_isr at ~/.espressif/v6.0/esp-idf/components/esp_system/task_wdt/task_wdt.c:509
--- 0x40377971: _xt_lowint1 at ~/.espressif/v6.0/esp-idf/components/xtensa/xtensa_vectors.S:1240
--- 0x400559dd: _xtos_set_intlevel in ROM
(some FreeRTOS functions, always different)
--- 0x40375602: spi_flash_disable_interrupts_caches_and_other_cpu at ~/.espressif/v6.0/esp-idf/components/spi_flash/cache_utils.c:153
--- 0x4037548a: cache_disable at ~/.espressif/v6.0/esp-idf/components/spi_flash/spi_flash_os_func_app.c:78
--- 0x40375464: spi1_start at ~/.espressif/v6.0/esp-idf/components/spi_flash/spi_flash_os_func_app.c:136
--- 0x40385d01: spiflash_start_core at ~/.espressif/v6.0/esp-idf/components/spi_flash/esp_flash_api.c:226
--- 0x40385d24: spiflash_start_default at ~/.espressif/v6.0/esp-idf/components/spi_flash/esp_flash_api.c:237
--- 0x403856a3: esp_flash_read at ~/.espressif/v6.0/esp-idf/components/spi_flash/esp_flash_api.c:969

Some examples of the FreeRTOS functions that can appear in the backtrace (often none is there at all):

Code: Select all

--- 0x4037d497: vPortClearInterruptMaskFromISR at ~/.espressif/v6.0/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/include/freertos/portmacro.h:547
--- (inlined by) vPortExitCritical at ~/.espressif/v6.0/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:518
--- 0x420ca53d: vTaskSuspendAll at ~/.espressif/v6.0/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:2523

Code: Select all

--- 0x4037d4fd: vPortYieldOtherCore at ??:?

Code: Select all

--- 0x420c8701: vPortClearInterruptMaskFromISR at ~/.espressif/v6.0/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/include/freertos/portmacro.h:547
--- (inlined by) xTaskGetSchedulerState at ~/.espressif/v6.0/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:5031
--- (inlined by) xTaskResumeAll at ~/.espressif/v6.0/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:2597

Here are the logs the ESP prints when booting:

Code: Select all

I (24) boot: ESP-IDF v6.0 2nd stage bootloader
I (25) boot: compile time Jun 23 2026 16:51:49
I (25) boot: Multicore bootloader
I (25) boot: chip revision: v0.2
I (27) boot: efuse block revision: v1.4
I (31) boot.esp32s3: Boot SPI Speed : 80MHz
I (35) boot.esp32s3: SPI Mode       : DIO
I (39) boot.esp32s3: SPI Flash Size : 16MB
I (43) boot: Enabling RNG early entropy source...
I (47) boot: Partition Table:
I (50) boot: ## Label            Usage          Type ST Offset   Length
I (56) boot:  0 nvs              WiFi data        01 02 00009000 00005000
I (62) boot:  1 otadata          OTA data         01 00 0000e000 00002000
I (69) boot:  2 app0             OTA app          00 10 00010000 003f0000
I (75) boot:  3 app1             OTA app          00 11 00400000 003f0000
I (82) boot:  4 storage          Unknown data     01 83 007f0000 00800000
I (88) boot:  5 coredump         Unknown data     01 03 00ff0000 00010000
I (95) boot: End of partition table
I (98) esp_image: segment 0: paddr=00010020 vaddr=3c0d0020 size=38e24h (232996) map
I (148) esp_image: segment 1: paddr=00048e4c vaddr=3fc98100 size=04b3ch ( 19260) load
I (152) esp_image: segment 2: paddr=0004d990 vaddr=40374000 size=02688h (  9864) load
I (154) esp_image: segment 3: paddr=00050020 vaddr=42000020 size=cbde8h (835048) map
I (309) esp_image: segment 4: paddr=0011be10 vaddr=40376688 size=11a60h ( 72288) load
I (325) esp_image: segment 5: paddr=0012d878 vaddr=50000000 size=00024h (    36) load
I (334) boot: Loaded app from partition at offset 0x10000
I (334) boot: Disabling RNG early entropy source...
I (344) cpu_start: Multicore app
I (352) cpu_start: GPIO 44 and 43 are used as console UART I/O pins
I (353) cpu_start: Pro cpu start user code
I (353) cpu_start: cpu freq: 160000000 Hz
I (355) app_init: Application information:
I (359) app_init: Project name:     esp-project
I (363) app_init: App version:      1
I (366) app_init: Compile time:     Jun 23 2026 16:51:45
I (371) app_init: ELF file SHA256:  318171f59...
I (376) app_init: ESP-IDF:          v6.0
I (379) efuse_init: Min chip rev:     v0.0
I (383) efuse_init: Max chip rev:     v0.99 
I (387) efuse_init: Chip rev:         v0.2
I (391) heap_init: Initializing. RAM available for dynamic allocation:
I (397) heap_init: At 3FCA7CB0 len 00041A60 (262 KiB): RAM
I (402) heap_init: At 3FCE9710 len 00005724 (21 KiB): RAM
I (407) heap_init: At 3FCF0000 len 00008000 (32 KiB): DRAM
I (413) heap_init: At 600FE000 len 00001FE8 (7 KiB): RTCRAM
I (419) spi_flash: detected chip: boya
I (421) spi_flash: flash io: dio
I (425) sleep_gpio: Configure to isolate all GPIO pins in sleep state
I (431) sleep_gpio: Enable automatic switching of GPIO sleep configuration
I (0) main_task: Started on CPU1
I (40) main_task: Calling app_main()

The code is fairly long, but nvs_flash_init() is basically the first thing (apart from logging) that I call from app_main().

I am using ESP-IDF with an open-source build of VSCode on Linux. In sdkconfig I've basically only changed the partition table to my custom CSV file and set the flasher config to DIO, 40 MHz, 16 MB.

I've spent hours trying to get this working on both cores so that I can isolate some tasks (mainly WiFi on one core and the app on the other). I've tried moving the call into a separate task and pinning that task to one of the cores. I've also played with the priority and affinity settings in sdkconfig, but nothing worked. Should I just give up on multicore and run everything on a single core?

MicroController
Posts: 2669
Joined: Mon Oct 17, 2022 7:38 pm
Location: Europe, Germany

Re: ESP32-S3 esp_flash_read() times out watchdog when running in multicore

Postby MicroController » Wed Jun 24, 2026 11:17 am

Might be memory/stack corruption. Try increasing the main task's stack size.

Nerez83
Posts: 2
Joined: Tue Jun 23, 2026 1:29 pm

Re: ESP32-S3 esp_flash_read() times out watchdog when running in multicore

Postby Nerez83 » Wed Jun 24, 2026 9:31 pm

I've tried increasing the stack size of a bunch of tasks, but that's not it.
Here are some things I found out.
Even just this fails:

Code: Select all

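#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "nvs_flash.h"

void app_main(void) {
    // The watchdog starts tripping during this call.
    esp_err_t ret = nvs_flash_init();
    (void)ret; // result intentionally unused in this minimal repro
    while (true) {
        vTaskDelay(1000 / portTICK_PERIOD_MS);
    }
}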

The errors that the watchdog prints look weird. If I understand correctly, it says that the task "IDLE0" on CPU0 did not reset the watchdog, but the list of currently running tasks shows "main" on CPU0 and "IDLE1" on CPU1. That doesn't make sense to me: how can "IDLE0" be the cause if "main" is the task running on CPU0? Am I missing something?

Also, what was very weird to me is that after boot the watchdog triggers every 5 seconds and the backtrace changes each time. It always reaches spi_flash_disable_interrupts_caches_and_other_cpu(), but inside that function it goes through either xTaskResumeAll() or vTaskSuspendAll(), and sometimes through neither. This sparked a thought: it's just a timeout, not an assert or something "illegal", so it can fire basically anywhere. But it always fires somewhere "random" inside spi_flash_disable_interrupts_caches_and_other_cpu(), so there is probably an infinite loop in there, and when the watchdog trips it prints a backtrace from wherever in the loop execution happens to be. Lo and behold, the function contains this loop:

Code: Select all

do {
#if ( ( CONFIG_FREERTOS_SMP ) && ( !CONFIG_FREERTOS_UNICORE ) )
    //Note: Scheduler suspension behavior changed in FreeRTOS SMP
    vTaskPreemptionDisable(NULL);
#else
    // Disable scheduler on the current CPU
    vTaskSuspendAll();
#endif
    cpuid = xPortGetCoreID();
    other_cpuid = (cpuid == 0) ? 1 : 0;
#ifndef NDEBUG
    s_flash_op_cpu = cpuid;
#endif
    s_flash_op_can_start = false;
    ipc_call_was_send_to_other_cpu = esp_ipc_call_nonblocking(other_cpuid, &spi_flash_op_block_func, (void *) other_cpuid) == ESP_OK;
    if (!ipc_call_was_send_to_other_cpu) {
        // IPC call was not send to other cpu because another nonblocking API is running now.
        // Enable the Scheduler again will not help the IPC to speed it up
        // but there is a benefit to schedule to a higher priority task before the nonblocking running IPC call is done.
        #if ( ( CONFIG_FREERTOS_SMP ) && ( !CONFIG_FREERTOS_UNICORE ) )
            //Note: Scheduler suspension behavior changed in FreeRTOS SMP
            vTaskPreemptionEnable(NULL);
        #else
            xTaskResumeAll();
        #endif
    }
} while (!ipc_call_was_send_to_other_cpu);

The loop also shows that the function behaves differently when built for multicore. To double-check, I added a log statement inside the loop and recompiled, which confirmed that it loops forever. Digging deeper, I gave up trying to understand why the ipc_call_was_send_to_other_cpu variable never becomes true. In case it helps someone, here is the esp_ipc_call_nonblocking() function:

Code: Select all

esp_err_t esp_ipc_call_nonblocking(uint32_t cpu_id, esp_ipc_func_t func, void* arg) {
    if (cpu_id >= portNUM_PROCESSORS || s_ipc_task_handle[cpu_id] == NULL) {
        return ESP_ERR_INVALID_ARG;
    }
    if (cpu_id == xPortGetCoreID() && xTaskGetSchedulerState() != taskSCHEDULER_RUNNING) {
        return ESP_ERR_INVALID_STATE;
    }

    // Since it can be called from an interrupt or Scheduler is Suspened, it can not wait for a mutex to be released.
    if (esp_cpu_compare_and_set((volatile uint32_t *)&s_no_block_func[cpu_id], 0, (uint32_t)func)) {
        s_no_block_func_arg[cpu_id] = arg;
        s_no_block_func_and_arg_are_ready[cpu_id] = true;

        if (xPortInIsrContext()) {
            vTaskNotifyGiveFromISR(s_ipc_task_handle[cpu_id], NULL);
        } else {
#ifdef CONFIG_ESP_IPC_USES_CALLERS_PRIORITY
            vTaskPrioritySet(s_ipc_task_handle[cpu_id], IPC_MAX_PRIORITY);
#endif
            xTaskNotifyGive(s_ipc_task_handle[cpu_id]);
        }
        return ESP_OK;
    }

    // the previous call was not completed
    return ESP_FAIL;
}
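
One way to read the function above: the `== ESP_OK` test in the flash loop fails for any of its three error returns, and the ESP_FAIL path is taken whenever the per-core slot s_no_block_func[cpu_id] is still non-zero. The following is only a host-side sketch of that single-slot hand-off in plain C11, not ESP-IDF code. Every `model_*` name is hypothetical, and it assumes that the IPC task on the other core is what resets the slot to zero after running the function. The thread does not establish whether this is what actually happens on the affected board.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side model; none of these names are ESP-IDF APIs. */
typedef void (*model_func_t)(void *arg);

typedef struct {
    _Atomic(uintptr_t) pending_func; /* 0 means the slot is free */
    void *pending_arg;
} model_ipc_slot_t;

/* Mirrors the compare-and-set step: succeed only if the slot is empty. */
static bool model_ipc_post(model_ipc_slot_t *slot, model_func_t func, void *arg)
{
    uintptr_t expected = 0;
    if (atomic_compare_exchange_strong(&slot->pending_func, &expected,
                                       (uintptr_t)func)) {
        slot->pending_arg = arg;
        return true;
    }
    return false; /* the previous call has not been consumed yet */
}

/* Stands in for the IPC task on the other core (an assumption of this
 * model): run the pending call, then free the slot. */
static bool model_ipc_consume(model_ipc_slot_t *slot)
{
    uintptr_t f = atomic_load(&slot->pending_func);
    if (f == 0) {
        return false;
    }
    ((model_func_t)f)(slot->pending_arg);
    atomic_store(&slot->pending_func, 0);
    return true;
}

/* Bounded version of the retry loop. consumer_runs says whether the
 * consumer gets CPU time between attempts. Returns the number of
 * attempts used, or -1 if the call was never accepted. */
static int model_post_with_retry(model_ipc_slot_t *slot, model_func_t func,
                                 void *arg, bool consumer_runs,
                                 int max_attempts)
{
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (model_ipc_post(slot, func, arg)) {
            return attempt;
        }
        if (consumer_runs) {
            model_ipc_consume(slot);
        }
    }
    return -1;
}

static void model_noop(void *arg) { (void)arg; }
```

In this model, once a call is left pending and the consumer never runs, every retry is rejected, which would look like the endless loop described above. As soon as the consumer gets to run once, the next attempt is accepted.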

Sprite
Espressif staff
Posts: 10612
Joined: Thu Nov 26, 2015 4:08 am

Re: ESP32-S3 esp_flash_read() times out watchdog when running in multicore

Postby Sprite » Thu Jun 25, 2026 12:39 am

It's an odd problem in general. Would you be able to whittle down your program to the bare minimum that still does that, then post that here? That way we can reproduce it.

MicroController
Posts: 2669
Joined: Mon Oct 17, 2022 7:38 pm
Location: Europe, Germany

Re: ESP32-S3 esp_flash_read() times out watchdog when running in multicore

Postby MicroController » Thu Jun 25, 2026 5:33 am

Nerez83 wrote: If I understand correctly, it says that the task "IDLE0" on CPU0 did not reset the watchdog, but the list of currently running tasks shows "main" on CPU0 and "IDLE1" on CPU1. That doesn't make sense to me: how can "IDLE0" be the cause if "main" is the task running on CPU0? Am I missing something?

At any point in time, there can naturally be only one task actually executing on each core. While other tasks are running or runnable, the IDLE task does not get any CPU time to execute; it keeps "waiting" for unused CPU time. In this state it cannot reset its watchdog, so the watchdog detects that the IDLE task is not making progress and triggers.

In short, the watchdog triggering on an IDLE task indicates that one or more tasks are using 100% of that CPU for extended periods.

Nerez83 wrote: It's just a timeout, not an assert or something

Yes, it's basically just a hardware timer which triggers an interrupt if it's not reset before it expires; the interrupt occurs asynchronously.
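
To make the starvation argument concrete, here is a small host-side sketch in plain C. It is a simplified model, not ESP-IDF code: the `model_*` names are hypothetical, time is counted in abstract scheduler ticks, and it assumes the timer simply restarts after each trigger (which matches the repeated reports every 5 seconds mentioned earlier in the thread).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical host-side model of the accounting described above. */
typedef struct {
    int timeout_ticks;    /* watchdog period in scheduler ticks */
    int ticks_since_feed; /* ticks since the idle task last ran */
    int trigger_count;    /* how many times the watchdog fired */
} model_wdt_t;

/* Advance one tick. The idle task only runs (and feeds the watchdog)
 * when no other task is runnable on that core. */
static void model_tick(model_wdt_t *wdt, bool other_task_runnable)
{
    if (!other_task_runnable) {
        wdt->ticks_since_feed = 0; /* idle task got CPU time and fed */
        return;
    }
    wdt->ticks_since_feed++;
    if (wdt->ticks_since_feed >= wdt->timeout_ticks) {
        wdt->trigger_count++;
        wdt->ticks_since_feed = 0; /* timer restarts, so it fires again */
    }
}

/* Run for total_ticks. The busy task blocks for one tick every
 * block_every ticks; block_every == 0 means it never blocks. */
static int model_run(int timeout_ticks, int total_ticks, int block_every)
{
    model_wdt_t wdt = { timeout_ticks, 0, 0 };
    for (int t = 1; t <= total_ticks; ++t) {
        bool blocks_now = (block_every > 0) && (t % block_every == 0);
        model_tick(&wdt, !blocks_now);
    }
    return wdt.trigger_count;
}
```

With a 500-tick timeout (standing in for 5 s at an assumed 100 Hz tick rate), a task that never blocks makes the model fire once per period, while a task that blocks even briefly every 100 ticks never trips it. The task that appears in the report is the starved idle task, not the task that is hogging the core.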
