Proper netif/ppp disconnect/close? TCP timeout retry is crashing my app

Posted: Mon Sep 04, 2023 9:44 pm
by tpbedford
Hi, we are using a Quectel EC21 modem in NB-IoT mode (slow: dial-up speeds, with latency measured in seconds).

What's the proper process for initiating a close/disconnect/destroy of our modem driver task?

Our issue is that if we close an active data connection and shut down the DTE, DCE, PPP etc., then about 5s later the TCPIP thread invariably attempts some kind of retry after a timeout and tries to access destroyed/freed objects.

We're using ESP-IDF 5.1 (lwIP, PPP, esp-netif), and our modem driver was built on some esp-modem example code that we found.

Our stack trace causing the error is this:

0x4014021d: pppos_low_level_output at C:/Espressif/frameworks/esp-idf/components/esp_netif/lwip/esp_netif_lwip_ppp.c:195
0x40135048: pppos_output_last at C:/Espressif/frameworks/esp-idf/components/lwip/lwip/src/netif/ppp/pppos.c:878
0x401351e9: pppos_write at C:/Espressif/frameworks/esp-idf/components/lwip/lwip/src/netif/ppp/pppos.c:241
0x40134afa: ppp_write at C:/Espressif/frameworks/esp-idf/components/lwip/lwip/src/netif/ppp/ppp.c:996
0x4013e07b: fsm_sdata at C:/Espressif/frameworks/esp-idf/components/lwip/lwip/src/netif/ppp/fsm.c:796
0x4013e0dc: fsm_timeout at C:/Espressif/frameworks/esp-idf/components/lwip/lwip/src/netif/ppp/fsm.c:282
0x40130f35: sys_check_timeouts at C:/Espressif/frameworks/esp-idf/components/lwip/lwip/src/core/timeouts.c:401
0x4012ac8a: tcpip_timeouts_mbox_fetch at C:/Espressif/frameworks/esp-idf/components/lwip/lwip/src/api/tcpip.c:109
0x4012ad46: tcpip_thread at C:/Espressif/frameworks/esp-idf/components/lwip/lwip/src/api/tcpip.c:142
0x4008cf12: vPortTaskWrapper at C:/Espressif/frameworks/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:162  
The top-most line there is:

esp_err_t ret = esp_netif_transmit(netif, data, len);
being

esp_err_t esp_netif_transmit(esp_netif_t *esp_netif, void* data, size_t len)
{
    return (esp_netif->driver_transmit)(esp_netif->driver_handle, data, len);
}
(which is clearly inlined so doesn't show in the stack trace). I believe the esp_netif->driver_transmit dereference is the issue, as the netif instance has been destroyed/freed.

Our close/disconnect sequence is this:

// we call this from our task
esp_modem_stop_ppp(dte); // which posts ESP_MODEM_EVENT_PPP_STOP
// the above event leads to these being executed by the driver:
esp_netif_stop()
esp_netif_stop_ppp()
// here we see   --> NETIF_PPP_PHASE_TERMINATE   --> NETIF_PPP_PHASE_NETWORK  --> NETIF_PPP_PHASE_ESTABLISH
// we then call deinit DCE
dce->deinit(dce);
ec21_deinit(dce); // frees DCE instance

// then we destroy netif
esp_modem_netif_clear_default_handlers(modem_netif_adapter);
esp_modem_netif_teardown(modem_netif_adapter);
esp_netif_destroy(esp_netif);
// finally clean up dte
dte->deinit(dte);
... and about 5s after the esp_netif_destroy() call we see the TCPIP thread hit a timeout and retry, and we crash with the stack trace from earlier.
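
One reading of the trace above: fsm_timeout is a PPP state-machine timer, so PPP may still have been retrying its termination handshake when the netif was destroyed. Below is a minimal host-side sketch of holding back the destroy until a dead phase has been reported after the stop request. Every name in it (ppp_phase_t, teardown_gate_t, the gate_* functions) is a hypothetical stand-in for illustration, not an esp-netif or esp-modem API; in a real driver the phase updates would presumably come from the NETIF_PPP_PHASE_* events.

```c
#include <stdbool.h>

/* Hypothetical mirror of the PPP phases reported during shutdown. */
typedef enum {
    PHASE_DEAD,
    PHASE_ESTABLISH,
    PHASE_NETWORK,
    PHASE_RUNNING,
    PHASE_TERMINATE
} ppp_phase_t;

/* Hypothetical gate: records whether destroying the netif is safe yet. */
typedef struct {
    bool stop_requested; /* set when a PPP stop has been asked for */
    bool dead_seen;      /* set once the dead phase arrives after the stop */
} teardown_gate_t;

void gate_init(teardown_gate_t *g)
{
    g->stop_requested = false;
    g->dead_seen = false;
}

/* Call right before asking PPP to stop. */
void gate_request_stop(teardown_gate_t *g)
{
    g->stop_requested = true;
    g->dead_seen = false;
}

/* Call from the phase-event handler for every reported phase. */
void gate_on_phase(teardown_gate_t *g, ppp_phase_t phase)
{
    /* A dead phase only counts if it follows our own stop request. */
    if (g->stop_requested && phase == PHASE_DEAD) {
        g->dead_seen = true;
    }
}

/* True only once PPP has fully shut down after the stop request. */
bool gate_can_destroy(const teardown_gate_t *g)
{
    return g->stop_requested && g->dead_seen;
}
```

In the close sequence, the task would wait (with a timeout) until gate_can_destroy() returns true before reaching the esp_netif_destroy() step, instead of proceeding as soon as the intermediate phases have been logged.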

Re: Proper netif/ppp disconnect/close? TCP timeout retry is crashing my app

Posted: Thu Feb 27, 2025 12:50 pm
by Sincap
Did you solve it?
I'm trying to create a proper PPP connection but I'm lost right now. Where did you find anything useful about it? Can you share it with us even if they are just documents?
I looked at how the esp-modem component does it, but I really didn't understand how it starts PPP communication. I know I can attach a driver using esp_netif_attach and create a config using ESP_NETIF_DEFAULT_PPP, but I just get panics due to NULL pointers. I really want to understand what I'm doing with these netif interfaces, but I haven't found anything useful about them. If you found anything, can you please share it?
Regards,

Re: Proper netif/ppp disconnect/close? TCP timeout retry is crashing my app

Posted: Wed Jun 04, 2025 2:19 am
by tpbedford
Hi @Sincap, yes, we did solve our problem, or at least found a workaround.

For entering PPP, see these methods in esp-modem:
  • esp_err_t esp_modem_dte_change_mode(modem_dte_t *dte, modem_mode_t new_mode)
  • void esp_handle_uart_data(esp_modem_dte_t *esp_dte)
Basically, the first calls the specific modem implementation to change mode (e.g. the ec21 methods that send +++, ATD, ATO and then set a state var to indicate MODEM_PPP_MODE).
The second, if the modem state is MODEM_PPP_MODE, pushes the received UART data into esp_modem_try_parse_ppp().
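
The dispatch described above can be sketched on a host machine roughly like this. It is only a model of the idea: the types and functions (dte_model_t, dte_change_mode, dte_handle_uart_data and the two handlers) are hypothetical names for illustration and are not the esp-modem signatures listed above. The handlers only count bytes, where a real driver would parse AT responses or feed the PPP parser.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-state model of the modem mode variable. */
typedef enum { MODE_COMMAND, MODE_PPP } modem_mode_model_t;

typedef struct {
    modem_mode_model_t mode;
    size_t at_bytes;  /* bytes routed to AT response handling */
    size_t ppp_bytes; /* bytes routed to the PPP parser */
} dte_model_t;

/* Stand-in for AT line handling: just counts what it was given. */
static void handle_at_bytes(dte_model_t *dte, const uint8_t *data, size_t len)
{
    (void)data;
    dte->at_bytes += len;
}

/* Stand-in for the PPP parser entry point: just counts what it was given. */
static void handle_ppp_bytes(dte_model_t *dte, const uint8_t *data, size_t len)
{
    (void)data;
    dte->ppp_bytes += len;
}

/* After the modem-specific mode switch succeeds, record the new mode. */
void dte_change_mode(dte_model_t *dte, modem_mode_model_t new_mode)
{
    dte->mode = new_mode;
}

/* Route every received UART chunk according to the current mode. */
void dte_handle_uart_data(dte_model_t *dte, const uint8_t *data, size_t len)
{
    if (dte->mode == MODE_PPP) {
        handle_ppp_bytes(dte, data, len);
    } else {
        handle_at_bytes(dte, data, len);
    }
}
```

The point of the single mode variable is that the UART receive path never has to guess whether incoming bytes are AT text or PPP frames.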

Re: Proper netif/ppp disconnect/close? TCP timeout retry is crashing my app

Posted: Wed Jun 04, 2025 2:26 am
by tpbedford
For the crash related to destroying the netif, the short answer is: don't destroy the netif instance if you can help it.

In sdkconfig you'll see references to mbox slots for TCP/UDP. In ESP-IDF these mbox slots alloc a pbuf, which captures a reference to the active netif instance.

If you destroy the netif instance, and then later (perhaps in a parallel task) poll a socket that has unread data in an mbox slot, that mbox has a reference to the netif that was active when the socket data was received. Polling the socket will clear the mbox and cause the pbuf to be freed, which in turn calls a callback into the ESP-IDF implementation that touches the netif instance (which has been destroyed, leading to a crash).

So, if you must destroy the netif instance, be sure to wait on any other task that might be polling sockets to process all unread data first. THEN you can destroy the netif instance.
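
As a rough illustration of that ordering, here is a small host-side model in which the destroy is deferred until the last unread buffer has been freed. It is a sketch only: iface_model_t and the iface_* functions are hypothetical names, not lwIP or esp-netif APIs, and the model is single-threaded, so real code shared between tasks would need a mutex or similar around these calls.

```c
#include <stdbool.h>

/* Hypothetical model of an interface that unread buffers still point at. */
typedef struct {
    int  pending_bufs;      /* unread buffers referencing this interface */
    bool destroy_requested; /* someone asked for the interface to go away */
    bool destroyed;         /* the interface has actually been torn down */
} iface_model_t;

void iface_init(iface_model_t *i)
{
    i->pending_bufs = 0;
    i->destroy_requested = false;
    i->destroyed = false;
}

/* A received buffer was queued on a socket and keeps a reference. */
void iface_buf_queued(iface_model_t *i)
{
    i->pending_bufs++;
}

/* Stand-in for the real teardown; here it only sets a flag. */
static void iface_destroy_now(iface_model_t *i)
{
    i->destroyed = true;
}

/* A socket reader consumed one buffer and it was freed. */
void iface_buf_freed(iface_model_t *i)
{
    if (i->pending_bufs > 0) {
        i->pending_bufs--;
    }
    /* The last reference is gone: run the deferred destroy, if any. */
    if (i->destroy_requested && i->pending_bufs == 0 && !i->destroyed) {
        iface_destroy_now(i);
    }
}

/* Returns true if destroyed immediately, false if deferred. */
bool iface_request_destroy(iface_model_t *i)
{
    i->destroy_requested = true;
    if (i->pending_bufs == 0) {
        iface_destroy_now(i);
        return true;
    }
    return false;
}
```

With two buffers queued, iface_request_destroy() returns false and the teardown only happens on the second iface_buf_freed() call, which is the same "drain first, THEN destroy" rule.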