OTA Update Fails on First Attempt After Custom Code Deployment, Succeeds After Reboot – Heap/Stack Issue?

milan pipaliya
Posts: 21
Joined: Fri Oct 11, 2024 4:14 am

Postby milan pipaliya » Tue May 13, 2025 12:14 pm

Hello ESP RainMaker Community,

I’m facing a persistent issue with OTA updates on custom ESP32 devices using ESP RainMaker, and I’m hoping for some guidance or suggestions from the community.

Issue Summary
When deploying a new device with our custom firmware, the first OTA update attempt almost always fails.

After rebooting the device, OTA updates typically succeed on the next attempt.

We monitor heap status using logs like:
I (251986) MEM: Free heap: 70520, Largest block: 31744

We have already increased the task stack size, but the issue persists.

Our custom code stores data in heap and stack at runtime, which might be related.
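
A minimal sketch of how a log line like the one above could be produced, assuming the two numbers come from ESP-IDF's `heap_caps_get_free_size()` and `heap_caps_get_largest_free_block()` (the post does not say which calls are used). The `heap_snapshot_t` type, the function names, and the fragmentation percentage are hypothetical additions for illustration; the non-ESP branch only returns the values from the log line so that the sketch also compiles off-target.

```c
#include <stddef.h>
#include <stdio.h>

#ifdef ESP_PLATFORM
#include "esp_heap_caps.h"
#endif

/* The two numbers shown in the MEM log line (hypothetical helper type). */
typedef struct {
    size_t free_bytes;     /* total free heap */
    size_t largest_block;  /* largest contiguous free block */
} heap_snapshot_t;

/* Read the current heap state. MALLOC_CAP_DEFAULT is an assumption; the
 * original log may use different capabilities. Off-target, the values from
 * the posted log line are returned instead. */
heap_snapshot_t heap_snapshot_take(void)
{
    heap_snapshot_t s;
#ifdef ESP_PLATFORM
    s.free_bytes = heap_caps_get_free_size(MALLOC_CAP_DEFAULT);
    s.largest_block = heap_caps_get_largest_free_block(MALLOC_CAP_DEFAULT);
#else
    s.free_bytes = 70520;
    s.largest_block = 31744;
#endif
    return s;
}

/* Share of the free heap that lies outside the largest block, in percent.
 * 0 means all free memory is one contiguous block. */
unsigned heap_fragmentation_pct(heap_snapshot_t s)
{
    if (s.free_bytes == 0 || s.largest_block >= s.free_bytes) {
        return 0;
    }
    return (unsigned)(100u - (s.largest_block * 100u) / s.free_bytes);
}

/* Print one line per call site, e.g. before and after OTA starts. */
void heap_snapshot_log(const char *where)
{
    heap_snapshot_t s = heap_snapshot_take();
    printf("MEM[%s]: Free heap: %lu, Largest block: %lu, fragmentation: %u%%\n",
           where, (unsigned long)s.free_bytes, (unsigned long)s.largest_block,
           heap_fragmentation_pct(s));
}
```

Logging the largest block next to the total makes fragmentation visible: a heap can report plenty of free bytes while no single block is large enough for one big request.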

Error Logs and Behavior
Here’s what we observe during the failed OTA attempt:

OTA starts, firmware download is initiated.

Heap size appears stable and sufficient before and during OTA.

The OTA process fails with errors such as:

E (335568) Dynamic Impl: alloc(16749 bytes) failed
E (335568) esp-tls-mbedtls: read error :-0x7F00:
E (335568) transport_base: esp_tls_conn_read error, errno=Success
E (335578) HTTP_CLIENT: transport_read: error - -1 | ESP_FAIL
E (335588) esp-tls-mbedtls: read error :-0x7200:
E (335588) transport_base: esp_tls_conn_read error, errno=Success
E (335598) HTTP_CLIENT: transport_read: error - -1 | ESP_FAIL
E (335608) esp_https_ota: data read -1, errno 0
E (335608) esp_rmaker_ota: ESP_HTTPS_OTA upgrade failed ESP_FAIL
I (335618) esp_rmaker_ota: Reporting failed: OTA failed: Error ESP_FAIL
I (335618) esp_rmaker_mqtt: (D)CONFIG_ESP_RMAKER_MQTT_USE_BASIC_INGEST_TOPICS

and

E (291656) esp-tls-mbedtls: read error :-0x7F00:
E (291736) esp_rmaker_ota: ESP_HTTPS_OTA upgrade failed ESP_FAIL
E (292046) esp_image: Checksum failed. Calculated 0x5e read 0xf7
E (292056) esp_rmaker_ota: Image validation failed, image is corrupted

The device reports OTA failure and does not update.

After a manual reboot, the next OTA attempt usually works without any issues.

What We've Tried
Increased task stack size for OTA and main tasks.

Verified that heap remains stable and not fragmented during OTA.

Ensured the OTA binary is built correctly with an incremented version and matching project name.

Used the recommended API: esp_rmaker_ota_enable_default() for enabling OTA.

Suspected Cause
It appears that something in our custom code (possibly heap/stack usage or memory fragmentation) affects the OTA process after initial device provisioning. After a reboot, the memory is cleaned up, and OTA works as expected. This suggests some initialization or resource allocation issue that only resolves after a reboot.

Any insights, suggestions, or references to similar issues would be greatly appreciated!

Thank you,

Piyush
Espressif staff
Posts: 372
Joined: Wed Feb 20, 2019 7:02 am

Re: OTA Update Fails on First Attempt After Custom Code Deployment, Succeeds After Reboot – Heap/Stack Issue?

Postby Piyush » Tue May 13, 2025 2:02 pm

This seems to be a memory fragmentation issue, as the OTA module tries to allocate multiple large buffers on the heap. Can you try these config options to see if they help?

Reduce the number of Wi-Fi TX/RX buffers to half:

CONFIG_ESP32_WIFI_STATIC_RX_BUFFER_NUM=5
CONFIG_ESP32_WIFI_DYNAMIC_RX_BUFFER_NUM=16
CONFIG_ESP32_WIFI_DYNAMIC_TX_BUFFER_NUM=16

Some additional configs used internally for the ESP32-C2 SoC. Some of these may be useful for you too.

CONFIG_ESP32_WIFI_STATIC_RX_BUFFER_NUM=3
CONFIG_ESP32_WIFI_DYNAMIC_RX_BUFFER_NUM=6
CONFIG_ESP32_WIFI_DYNAMIC_TX_BUFFER_NUM=6
CONFIG_ESP32_WIFI_IRAM_OPT=n
CONFIG_ESP32_WIFI_RX_IRAM_OPT=n
CONFIG_ESP32_WIFI_ENABLE_WPA3_SAE=n
CONFIG_ESP32_WIFI_ENABLE_WPA3_OWE_STA=n
CONFIG_ESP_WIFI_STA_DISCONNECTED_PM_ENABLE=n

Additionally, there is probably a bug in ESP-IDF's OTA module that makes it try to allocate more memory than required. I am following that up internally to verify and fix it. I will update you on that too, but please check if the above helps.

milan pipaliya
Posts: 21
Joined: Fri Oct 11, 2024 4:14 am

Re: OTA Update Fails on First Attempt After Custom Code Deployment, Succeeds After Reboot – Heap/Stack Issue?

Postby milan pipaliya » Wed May 14, 2025 4:40 am

Thanks for the suggestion!

We will try reducing the Wi-Fi buffer sizes as recommended. Just to confirm:

Are these configurations available directly in menuconfig?
We're using ESP32-S3 with ESP-IDF v5.3, and we couldn't find all of these options under Component config → Wi-Fi or similar. Please guide us to the correct location if available.

If not in menuconfig, is it okay to set them directly in sdkconfig or via sdkconfig.defaults?

❓Additional Questions:
We're currently using a 4MB flash size. If we switch to a 16MB partition table, will the OTA process work more reliably (assuming firmware size remains the same)?

Will increasing flash size reduce the chances of OTA image corruption or is it mostly unrelated?

🔴 Current Error Logs:
We still see these OTA-related errors:

E (335568) Dynamic Impl: alloc(16749 bytes) failed
E (335568) transport_base: esp_tls_conn_read error, errno=Success
E (335578) HTTP_CLIENT: transport_read: error - -1 | ESP_FAIL
E (291736) esp_rmaker_ota: ESP_HTTPS_OTA upgrade failed ESP_FAIL
E (292056) esp_rmaker_ota: Image validation failed, image is corrupted
So far:

The OTA URL is valid and downloadable.

The image starts downloading but fails halfway due to memory allocation.

Then image checksum verification fails (corruption).

We appreciate any update regarding the esp_https_ota internal bug you mentioned (related to memory over-allocation).

Thanks again for your help!

milan pipaliya
Posts: 21
Joined: Fri Oct 11, 2024 4:14 am

Re: OTA Update Fails on First Attempt After Custom Code Deployment, Succeeds After Reboot – Heap/Stack Issue?

Postby milan pipaliya » Thu May 15, 2025 11:34 am

Hi, @Piyush,

In our code, we delete all running tasks just before starting the OTA update. This seems to reduce the overall stack usage and increase the available heap size. After doing this, the OTA update completes successfully.

Can you please confirm — is this the correct and recommended way to ensure successful OTA updates? Or is there a better approach to manage memory for OTA?

Thank you.
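
The approach described in this post could look roughly like the sketch below. This is an illustration under assumptions, not code from the thread: the registry (`app_task_register`, `app_tasks_stop_for_ota`, `MAX_APP_TASKS`) is hypothetical, and only `vTaskDelete()` and `TaskHandle_t` are real FreeRTOS names. The non-ESP branch provides stand-ins so the sketch compiles off-target.

```c
#include <stddef.h>

#ifdef ESP_PLATFORM
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#else
/* Host-only stand-ins so the sketch compiles off-target. */
typedef void *TaskHandle_t;
int g_deleted;  /* counts stand-in vTaskDelete() calls */
void vTaskDelete(TaskHandle_t t) { (void)t; g_deleted++; }
#endif

#define MAX_APP_TASKS 8  /* hypothetical limit */

static TaskHandle_t s_app_tasks[MAX_APP_TASKS];
static size_t s_app_task_count;

/* Remember a task created by the application so it can be stopped later.
 * Returns 0 on success, -1 if the handle is NULL or the table is full. */
int app_task_register(TaskHandle_t handle)
{
    if (handle == NULL || s_app_task_count >= MAX_APP_TASKS) {
        return -1;
    }
    s_app_tasks[s_app_task_count++] = handle;
    return 0;
}

/* Delete every registered task and return how many were deleted.
 * Must not be called from a registered task: deleting the caller would
 * stop the loop before the remaining tasks are handled. */
size_t app_tasks_stop_for_ota(void)
{
    size_t stopped = 0;
    for (size_t i = 0; i < s_app_task_count; i++) {
        vTaskDelete(s_app_tasks[i]);
        s_app_tasks[i] = NULL;
        stopped++;
    }
    s_app_task_count = 0;
    return stopped;
}
```

One caveat with this pattern: deleting a task releases its stack and task control block, but not heap buffers the task allocated itself or mutexes it holds, so each task would still need to release those before it is deleted.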

Piyush
Espressif staff
Posts: 372
Joined: Wed Feb 20, 2019 7:02 am

Re: OTA Update Fails on First Attempt After Custom Code Deployment, Succeeds After Reboot – Heap/Stack Issue?

Postby Piyush » Mon May 26, 2025 1:07 pm

You can check out the docs here to further assist in optimising RAM and ensuring better reliability of OTA. OTA depends not only on the free heap available, but also on the size of the largest free block available. Too many allocation/free calls can cause significant memory fragmentation, which may affect OTA.
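
The point about the largest free block can be sketched as a simple pre-OTA check. The threshold below is taken from the `alloc(16749 bytes) failed` line earlier in the thread. The safety margin and the function names are assumptions for illustration, not documented requirements of the OTA module.

```c
#include <stdbool.h>
#include <stddef.h>

#ifdef ESP_PLATFORM
#include "esp_heap_caps.h"
#endif

/* Largest single allocation seen failing in the posted log. */
#define OTA_LARGEST_ALLOC_BYTES 16749u
/* Extra headroom; an assumed value, tune it for the actual firmware. */
#define OTA_MARGIN_BYTES 4096u

/* True if one contiguous block can satisfy the largest known OTA
 * allocation plus the margin. Total free heap is deliberately ignored:
 * a fragmented heap can have plenty of free bytes and still fail. */
bool ota_heap_is_ready(size_t largest_block)
{
    return largest_block >= (size_t)OTA_LARGEST_ALLOC_BYTES + OTA_MARGIN_BYTES;
}

#ifdef ESP_PLATFORM
/* On-target wrapper; MALLOC_CAP_DEFAULT is an assumption. */
bool ota_heap_is_ready_now(void)
{
    return ota_heap_is_ready(heap_caps_get_largest_free_block(MALLOC_CAP_DEFAULT));
}
#endif
```

Such a check is only a snapshot taken before the download starts. Allocations made by other tasks while OTA is running can still shrink the largest block, so it reduces the risk rather than removing it.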
