Failure to provision with 4.x IDF but works with 3.3 IDF

jsam589
Posts: 67
Joined: Sat Aug 17, 2019 9:31 pm

Failure to provision with 4.x IDF but works with 3.3 IDF

Postby jsam589 » Mon Dec 16, 2019 4:19 am

Hi - I am porting a working Azure IoT application from v3.3 IDF to v4.x. I have tried 4.0-beta2 as well as 4.1 from Oct 15, 2019. I have struggled to find the reason why the 4.x fails during provisioning (during handshaking specifically), when essentially the same code built against v3.3 IDF works correctly.

I am using esp-azure 1.0 release, unchanged. Device is ESP32 WROVER. I have done many experiments but fail to find anything that I may be doing wrong. In hopes that someone can reproduce the issue and solve it, I am attaching a bare-bones example, built on the IDF's Wi-Fi "station" example. I have added esp-azure and just try to get SNTP and then request provisioning.

What is attached.
Basic project, including partitions.csv and sdkconfig.defaults files. Project has been built against v4.0-beta2 (CMake) and v3.3 (make). It does not include esp-azure (because of its size), but expects this to be symbolically linked to components/esp-azure/. This works for Linux or Windows and the build tools don't know the difference.

What is not included.
Dummy Wi-Fi credentials are in the sdkconfig.defaults file, so anyone wishing to reproduce the issue should substitute theirs in that place. Also, my cert and key files are not included. I use lines like the following to program them in from the command line:

Code: Select all

Ver 3.3 IDF:
Four (4) reset-button-sequence needed to get through all script steps.
parttool.py -p /dev/ttyUSB0 --partition-name cert write_partition --input partn_cfg/b4e62dd609_cert.bin
parttool.py -p /dev/ttyUSB0 --partition-name key  write_partition --input partn_cfg/b4e62dd609_key.bin

Wrote 8192 bytes (4113 compressed) at 0x00efd000 in 0.4 seconds (effective 178.1 kbit/s)...
Hash of data verified.
Wrote 4096 bytes (2497 compressed) at 0x00eff000 in 0.2 seconds (effective 146.4 kbit/s)...
Hash of data verified.

# v4.0_beta2 IDF:
parttool.py -p /dev/ttyUSB0 write_partition --partition-name cert --input partn_cfg/b4e62dd609_cert.bin

Written contents of file 'partn_cfg/b4e62dd609_cert.pem' at offset 0x930000

parttool.py -p /dev/ttyUSB0 write_partition --partition-name key --input partn_cfg/b4e62dd609_key.bin

Written contents of file 'partn_cfg/b4e62dd609_key.pem' at offset 0x932000
The console output when the provisioning / handshaking fails looks like this (more complete file attached):

Code: Select all

I (3680) wifi station: got ip:192.168.1.106
D (3680) wifi station: STA Connect bit received.
Initializing SNTP
ESP platform sntp inited!
Time is not set yet. Connecting to WiFi and getting time over NTP. timeinfo.tm_year:70
Waiting for system time to be set... tm_year:0[times:1]
Waiting for system time to be set... tm_year:70[times:2]
Waiting for system time to be set... tm_year:70[times:3]
I (9700) platform: The current date/time is: Mon Dec 16 01:51:23 2019
W (9700) wifi station: Attempt MQT INiT.
D (9700) nvs: nvs_open_from_partition storage 0
D (9700) nvs: nvs_close -2146599950
AAA AAA provis nvs = 0
E (9710) MQTT: Reading provision flag from NVS failed
Provisioning AbCdEf API Version: 1.2.14
Iothub API Version: 1.2.14f
AbCdEf custom HSM create
gen key: 3ffcede4
gen cert: 3ffccddc
Provisioning AbCdEf line 214
Provisioning AbCdEf line 219 trusted cert samples
Provisioning AbCdEf line 226 call reg dev 0x3ffccb40
Provisioning AbCdEf line 234 about to enter DO LOOP 40
D (9780) esp-tls: host:global.azure-devices-provisioning.net: strlen 37
D (10020) esp-tls: connecting...
D (10450) esp-tls: handshake in progress...
D (10500) esp-tls: handshake in progress...
D (10540) esp-tls: handshake in progress...
[b]E (10770) esp-tls: mbedtls_ssl_handshake returned -0x7500[/b]
I (10770) esp-tls: Certificate verified.
I would love to hear that others have seen this issue and have a solution. If not solved, I must continue to use v.3.3, even though I like 4.0 more.
Thanks!
Attachments
iotProvErr.tar.gz
(12.93 KiB) Downloaded 94 times
v40b2_provFail.txt
(19.67 KiB) Downloaded 109 times

ESP_Angus
Posts: 2182
Joined: Sun May 08, 2016 4:11 am

Re: Failure to provision with 4.x IDF but works with 3.3 IDF

Postby ESP_Angus » Mon Dec 16, 2019 11:25 pm

Hi jsam589,

Mbedtls error -0x7500 is MBEDTLS_ERR_SSL_CERTIFICATE_TOO_LARGE, meaning the client cert is too large to send in a single TLS message. Between v3.3 and v4.0 we enabled asymmetric mbedTLS buffer sizes by default so the transmit buffer is smaller than the receive buffer, this saves memory by default but limits the maximum TLS message size than can be sent.

Try either disabling asymmetric content length in menuconfig, or (better) increase the maximum outgoing fragment length (you may not need to increase it a lot, the default is 4096 bytes so try increasing it 1024 bytes at a time, for example - the smaller this number the lower the RAM overhead.) The problem should go away.

I'll check with the esp-azure team if this is a limit that's likely to apply to all Azure certificates, in which case we can check for it in the component.

jsam589
Posts: 67
Joined: Sat Aug 17, 2019 9:31 pm

Re: Failure to provision with 4.x IDF but works with 3.3 IDF

Postby jsam589 » Wed Dec 18, 2019 2:29 am

Thanks for the tips. I actually had previously tried using 16384 for the outgoing length and got a different error. No matter what config changes I try, I still get various errors during the handshaking phase. I just now tried again with outgoing length set for 8192. I am attaching my sdkconfig - there may be other settings that you will find incorrect or suspicious.

As I mentioned in OP, the same app code and same esp-azure component work if built against v3.3.1 IDF.

Here is the error I got this time. Notice that the error is followed by "Certificate verified". Does this shed any new light? Thanks!

Code: Select all

D (23848) esp-tls: handshake in progress...
E (23848) esp-tls: mbedtls_ssl_handshake returned -0x7280
I (23848) esp-tls: Certificate verified.
MORE CONTEXT:

Code: Select all

D (6138) wifi station: STA Connect bit received.
Initializing SNTP
ESP platform sntp inited!
Time is not set yet. Connecting to WiFi and getting time over NTP. timeinfo.tm_year:70
Waiting for system time to be set... tm_year:0[times:1]
Waiting for system time to be set... tm_year:70[times:2]
I (10158) platform: The current date/time is: Wed Dec 18 02:15:04 2019
W (10158) wifi station: Attempt MQT INiT.
D (10158) nvs: nvs_open_from_partition storage 0
D (10158) nvs: nvs_close -2146588714
AAA AAA provis nvs = 0
E (10168) MQTT: Reading provision flag from NVS failed
Provisioning AbCdEf API Version: 1.2.14
Iothub API Version: 1.2.14
AbCdEf custom HSM create
gen key: 3ffdde40
gen cert: 3ffdbe38
Provisioning AbCdEf line 214
Provisioning AbCdEf line 219 trusted cert samples
Provisioning AbCdEf line 226 call reg dev 0x3ffdbc0c
Provisioning AbCdEf line 234 about to enter DO LOOP 40
D (10238) esp-tls: host:global.azure-devices-provisioning.net: strlen 37
D (10418) esp-tls: connecting...
D (10908) esp-tls: handshake in progress...
D (10958) esp-tls: handshake in progress...
D (10998) esp-tls: handshake in progress...
D (23808) esp-tls: handshake in progress...
D (23848) esp-tls: handshake in progress...
E (23848) esp-tls: mbedtls_ssl_handshake returned -0x7280
I (23848) esp-tls: Certificate verified.
Attachments
sdkconfig.txt
(26.27 KiB) Downloaded 103 times

ESP_Angus
Posts: 2182
Joined: Sun May 08, 2016 4:11 am

Re: Failure to provision with 4.x IDF but works with 3.3 IDF

Postby ESP_Angus » Wed Dec 18, 2019 3:10 am

Hi jsam589,
jsam589 wrote:
Wed Dec 18, 2019 2:29 am
Here is the error I got this time. Notice that the error is followed by "Certificate verified". Does this shed any new light? Thanks!

Code: Select all

D (23848) esp-tls: handshake in progress...
E (23848) esp-tls: mbedtls_ssl_handshake returned -0x7280
I (23848) esp-tls: Certificate verified.
Interesting, this is MBEDTLS_ERR_SSL_CONN_EOF. It means the other end closed the TCP connection during the handshake, for some reason.

I don't know about Azure specifically, but a lot of cloud TLS providers will do this if they don't trust the client certificate. But you mentioned the same client certificate works in ESP-IDF v3.3?

Can you please try enabling mbedTLS debugging to give the full details of what happens during the handshake?

To do this, enable CONFIG_MBEDTLS_DEBUG . The esp-tls layer will automatically do the rest, and you should see a lot more log output.

EDIT: Changed directions as azure uses esp-tls.

jsam589
Posts: 67
Joined: Sat Aug 17, 2019 9:31 pm

Re: Failure to provision with 4.x IDF but works with 3.3 IDF

Postby jsam589 » Wed Dec 18, 2019 6:23 pm

Performed using v4.0-beta2. I activated the MBEDTLS debug feature and I have attached output files. One of the attached files shows my steps programming the board. Also attaching new sdkconfig file.

This is just so strange. Do you think it is some esoteric configuration param that differs from 3.3.x to 4.x?

EDIT: Added debug log from 3.3.1 IDF. This should be informative when compared against that from 4.0-beta2.
Attachments
sdkconfig2a.txt
(26.45 KiB) Downloaded 95 times
mbed_detail.txt
(32.93 KiB) Downloaded 94 times
flash_board.txt
(3.53 KiB) Downloaded 89 times

jsam589
Posts: 67
Joined: Sat Aug 17, 2019 9:31 pm

Re: Failure to provision with 4.x IDF but works with 3.3 IDF

Postby jsam589 » Wed Dec 18, 2019 6:44 pm

MBEDTLS log from v3.3.1 which has success.

First diff in the two debug files seems to be the root of the failure:

SUCCEEDS:

Code: Select all

I (3356) mbedtls: ssl_tls.c:2721 in_left: 0, nb_want: 5
I (3356) mbedtls: ssl_tls.c:2722 ssl->f_recv(_timeout)() returned 5 (-0xfffffffb)
FAILS:

Code: Select all

I (11044) mbedtls: ssl_tls.c:2721 in_left: 0, nb_want: 5
I (11054) mbedtls: ssl_tls.c:2722 ssl->f_recv(_timeout)() returned 0 (-0x0000)
W (11064) mbedtls: ssl_tls.c:4973 mbedtls_ssl_fetch_input() returned -29312 (-0x7280)
Attachments
v331_run_output.txt
(10.78 KiB) Downloaded 96 times

ESP_Angus
Posts: 2182
Joined: Sun May 08, 2016 4:11 am

Re: Failure to provision with 4.x IDF but works with 3.3 IDF

Postby ESP_Angus » Thu Dec 19, 2019 5:58 am

Thanks jsam589, good idea providing logs from the working session also.

So there is no difference in firmware/config between the working and non-working, except for ESP-IDF version?

I looked at the logs and it seems like the server is closing the connection after receiving the TLS "Client Hello" message (normally the server responds with a TLS Server Hello message). The weird thing is that the logs show the parameters for the Client Hello are the same in v3.3 and v4.0-beta2 versions, so unclear what the server doesn't like about the Client Hello.

Will pass this on to esp-azure folks to take a look at.

ESP_Mahavir
Posts: 36
Joined: Wed Jan 24, 2018 6:51 am

Re: Failure to provision with 4.x IDF but works with 3.3 IDF

Postby ESP_Mahavir » Thu Dec 19, 2019 12:42 pm

Can you please try once by enabling internal memory allocation strategy for mbedtls (please refer https://github.com/espressif/esp-idf/bl ... config#L22)?

I am just trying to see if it is not something related to issue mentioned at https://github.com/espressif/esp-idf/issues/3624

jsam589
Posts: 67
Joined: Sat Aug 17, 2019 9:31 pm

Re: Failure to provision with 4.x IDF but works with 3.3 IDF

Postby jsam589 » Fri Dec 20, 2019 1:36 am

This is a promising path to pursue! On the bare-bones example app I have been using, changing MBEDTLS to use internal memory seems to have eliminated the error (4.0-beta2 IDF). It successfully provisioned.

BUT... I now have a problem on my real application. It builds, but when it runs, I get error
#define MBEDTLS_ERR_SSL_ALLOC_FAILED -0x7F00

How much memory does MBEDTLS require?
What would you all suggest to free enough internal RAM so that MBEDTLS can run?

Note: I use static allocation methods to create most of my FreeRTOS objects (tasks, queues, semaphores). Are there attributes I can apply to force these into SPIRAM at build time?

I think we are getting close, but I obviously need to make it workable for my real app.

ESP_Mahavir
Posts: 36
Joined: Wed Jan 24, 2018 6:51 am

Re: Failure to provision with 4.x IDF but works with 3.3 IDF

Postby ESP_Mahavir » Fri Dec 20, 2019 9:22 am

Thanks for confirmation. So this is related to same toolchain SPIRAM issue being discussed at https://github.com/espressif/esp-idf/issues/3624. Fix for this is to upgrade toolchain to latest `xtensa-esp32-elf-gcc8_2_0-esp-2019r2` (ESP-IDF specific commit can be found at https://github.com/espressif/esp-idf/co ... 3c00dba496)

Typical mbedTLS connection with asymmetric content length (16K for RX buffer and 4K for TX buffer) takes 28KB to 30KB dynamic memory and roughly 5-6KB of stack memory. It is quite possible to run typical use-case with WiFi and 1/2 TLS connection entirely from internal memory, but again it depends on application requirement and its usage of internal memory.

For SPIRAM allocation, you may use dynamic allocation with capability `MALLOC_CAP_SPIRAM` or force BSS into external memory using attribute `EXT_RAM_ATTR`. Please note placing DATA section in external SPIRAM is not yet supported.

Who is online

Users browsing this forum: No registered users and 3 guests