Problem with xPortGetFreeHeapSize()/heap_caps_get_free_size()

RobMeades
Posts: 85
Joined: Thu Nov 29, 2018 1:12 pm

Problem with xPortGetFreeHeapSize()/heap_caps_get_free_size()

Postby RobMeades » Wed Nov 18, 2020 2:42 pm

EDIT: I think the issue here is simply that heap_caps_get_free_size(MALLOC_CAP_8BIT) reports the wrong number some of the time. Here's a really simple example to illustrate the problem:

Code: Select all

#include "freertos/FreeRTOS.h"
#include "freertos/semphr.h"

#include "esp_heap_trace.h"
#include "esp_heap_caps.h"

#define NUM_RECORDS 10
static heap_trace_record_t traceRecord[NUM_RECORDS]; 

void app_main(void)
{
    SemaphoreHandle_t mutex1;
    SemaphoreHandle_t mutex2;
    size_t heapUsed;
    size_t heapUsedAtStart;

    heap_trace_init_standalone(traceRecord, NUM_RECORDS);

    printf("Starting up...\n");

    heap_trace_start(HEAP_TRACE_ALL);

    heapUsedAtStart = heap_caps_get_free_size(MALLOC_CAP_8BIT);
    printf("Heap free at start %d.\n", heapUsedAtStart);

    heapUsed = heap_caps_get_free_size(MALLOC_CAP_8BIT);
    mutex1 = xSemaphoreCreateMutex();
    printf("Heap used by first mutex %d.\n", heapUsed - heap_caps_get_free_size(MALLOC_CAP_8BIT));

    heapUsed = heap_caps_get_free_size(MALLOC_CAP_8BIT);
    mutex2 = xSemaphoreCreateMutex();
    printf("Heap used by second mutex %d.\n", heapUsed - heap_caps_get_free_size(MALLOC_CAP_8BIT));

    heapUsed = heap_caps_get_free_size(MALLOC_CAP_8BIT);
    vSemaphoreDelete(mutex1);
    printf("Heap free'd after deleting just first mutex %d.\n", heap_caps_get_free_size(MALLOC_CAP_8BIT) - heapUsed);

    heap_trace_stop();
    heap_trace_dump();

    if (!heap_caps_check_integrity_all(true)) {
        printf("Heap is corrupt!\n");
    }

    heapUsed = heap_caps_get_free_size(MALLOC_CAP_8BIT);
    printf("Heap used at end %d, therefore total heap used %d.\n", heapUsed, heapUsedAtStart - heapUsed);

    while (1);
}
This produces the output:

Code: Select all

Starting up...
Heap free at start 286244.
Heap used by first mutex 84.
Heap used by second mutex 84.
Heap free'd after deleting just first mutex 80.
2 allocations trace (10 entry buffer)
80 bytes (@ 0x3ffb98f4) allocated CPU 0 ccount 0x0518e10c caller 0x40087b06:0x40087d5b
freed by 0x400881f7:0x400e1878
80 bytes (@ 0x3ffb9948) allocated CPU 0 ccount 0x051f3ca4 caller 0x40087b06:0x40087d5b
80 bytes alive in trace (1/2 allocations)
total allocations 2 total frees 1
Heap used at end 286156, therefore total heap used 88.
So even though only 84 (80 plus a 4 byte overhead) bytes of heap have been used, heap_caps_get_free_size(MALLOC_CAP_8BIT) reports that 88 bytes have been used, hence a leak on top of the actual leak of 4 bytes. I will raise this as an issue on Github.

Original post continues:
I'm adding memory leak checking to my tests, as measured by making calls to xPortGetFreeHeapSize() at the start and end of each test. In one particular test I have found that 40 bytes of heap are being "leaked" [see also a much simpler case of a similar problem in my reply below]. This test creates dynamic tasks and deletes them (blocking for the idle task to run after each deletion in order that resources can be cleaned-up) and installs and then uninstalls a pair of UART drivers that are connected back to back (though no characters are ever sent over either UART).

I have covered the intervening code in many more calls to xPortGetFreeHeapSize() but I am unable to get a consistent answer as to what's going on: basically blocks of four or eight bytes just seem to go missing. For instance I might create a mutex with a call to xSemaphoreCreateMutex(), costing 84 bytes (as measured by calls to xPortGetFreeHeapSize() either side of xSemaphoreCreateMutex()), but some time later when all my code is doing is deleting the mutex (no serial driver or other task [aside from idle] [should be] active at the time) with a call to vSemaphoreDelete() only 80 bytes come back (as measured by calls to xPortGetFreeHeapSize() either side of vSemaphoreDelete()).

I have run heap_trace in standalone/leak mode over this test and it only finds that 2 x 12 bytes have been allocated but not free()'d, both in get_desc_for_int()/esp_intr_alloc_intrstatus() at intr_alloc.c:230/intr_alloc.c:599, which I guess is the UART drivers doing something. It also says "total allocations 52 total frees 52". A total memory trace for the period is attached.

My question is: how can I find out where the other 16 bytes are going? And how can I get all 40 bytes back again?

The code that lies above the native ESP-IDF calls is also being run on NRF52 and STM32F4 and shows no memory leaks there, so whatever is going on must be connected with the underlying ESP32 code or how I'm using it. No tasks aside from idle and the main task are being run before or after the test. I'm not printf()ing anything other than integers. I have printf()'ed before this test starts so any initial stdio allocation for this task is not a part of the sums. This is on ESP-IDF release/v4.1 (573f5de99).
Attachments
total_mem_trace.txt
(10.9 KiB) Downloaded 311 times
Last edited by RobMeades on Thu Nov 19, 2020 10:04 am, edited 9 times in total.

RobMeades
Posts: 85
Joined: Thu Nov 29, 2018 1:12 pm

Re: Unable to trace a memory leak

Postby RobMeades » Wed Nov 18, 2020 6:10 pm

FYI, I have another fail case which does not involve the UART or creating/deleting any tasks and which also shows a mysterious additional 4 byte loss that I cannot understand [this happens to be the last test I run in a sequence, so not the first outing for stdio from this task]:

  • xPortGetFreeHeapSize() (and heap_caps_get_free_size(MALLOC_CAP_8BIT), I've tried both) reports 287,984 bytes free at the start.
  • I initialise myself, which involves creating a couple of semaphores: xPortGetFreeHeapSize() reports 287,904 and then 287,820, so for some reason only 80 bytes for the first and then 84 bytes for the second.
  • I call the three mbedtls crypto functions: mbedtls_sha256(), mbedtls_md_hmac() and mbedtls_aes_crypt_cbc(), of which mbedtls_sha256() holds onto some memory: xPortGetFreeHeapSize() reports 287,736 bytes free, so a loss of 84 bytes, which I guess is semaphore protection the crypto functions need.
  • I deinitialise myself, deleting the two semaphores I created at initialisation, and xPortGetFreeHeapSize() shows 287,896 bytes, so an overall loss, from the 287,984 at the start, of 88 bytes, whereas we know that the only thing holding memory is mbedtls_sha256() which cost 84 bytes, not 88.
And heap_trace shows:

Code: Select all

3 allocations trace (100 entry buffer)
80 bytes (@ 0x3ffb95cc) allocated CPU 0 ccount 0xacd27c20 caller xQueueCreateMutex at C:/esp32/esp-idf-latest/components/freertos/queue.c:498
freed by vQueueDelete at C:/esp32/esp-idf-latest/components/freertos/queue.c:1836 (discriminator 2)
80 bytes (@ 0x3ffb96c8) allocated CPU 0 ccount 0xacd2acd8 caller xQueueCreateMutex at C:/esp32/esp-idf-latest/components/freertos/queue.c:498
freed by vQueueDelete at C:/esp32/esp-idf-latest/components/freertos/queue.c:1836 (discriminator 2)
80 bytes (@ 0x3ffb971c) allocated CPU 0 ccount 0xacde0f38 caller sha_get_engine_state at C:/esp32/esp-idf-latest/components/mbedtls/port/esp32/sha.c:142 (discriminator 1)
80 bytes alive in trace (1/3 allocations)
total allocations 3 total frees 4
So xPortGetFreeHeapSize() over the entire function reports that I've lost 88 bytes, xPortGetFreeHeapSize() over just the bit that lost something reports that I've lost 84 bytes and heap_trace reports that I've lost 80 bytes. Now I see from other posts that allocations are accompanied by a 4 byte overhead which is fine but it must be consistent(?), it shouldn't be present or absent depending on some criteria I cannot be aware of (can it?).

My key question then is: how can I reliably detect memory leaks in my unit tests? I can't have heap_trace switched on and parse its output, there has to be a single number I can check with an assert (i.e. something like TEST_ASSERT(freeAtStart - freeAtEnd == lostToClib) where hopefully lostToClib is zero, kinda as you do in your TEST_TEARDOWN). And (secondarily) is there a way to reclaim the 24/40 (depending on who is right) and the 80/84/88 (depending on who is right) from the UART driver and mbedtls_sha256() respectively?

For completeness, heap_caps_check_integrity_all() reports no errors.

PeterR
Posts: 621
Joined: Mon Jun 04, 2018 2:47 pm

Re: Problem with xPortGetFreeHeapSize()/heap_caps_get_free_size()

Postby PeterR » Thu Nov 19, 2020 5:04 pm

My key question then is: how can I reliably detect memory leaks in my unit tests?
I run all unit tests cross platform. Visual Studio etc will then tell you the error of your ways.
If you use the C++11 stuff then quickly doable and much easier/better to step through!
Needs a bit/lot more work with drivers but you can get all your logic tested.
& I also believe that IDF CAN should be fixed.

RobMeades
Posts: 85
Joined: Thu Nov 29, 2018 1:12 pm

Re: Problem with xPortGetFreeHeapSize()/heap_caps_get_free_size()

Postby RobMeades » Thu Nov 19, 2020 6:04 pm

Agreed, but surely Visual Studio still has to use the same "how much heap do I have left now" functionality of the target code, which is the thing that isn't working here?

ESP_Angus
Posts: 2344
Joined: Sun May 08, 2016 4:11 am

Re: Problem with xPortGetFreeHeapSize()/heap_caps_get_free_size()

Postby ESP_Angus » Thu Nov 19, 2020 11:37 pm

RobMeades wrote:
Wed Nov 18, 2020 2:42 pm
EDIT: I think the issue here is simply that heap_caps_get_free_size(MALLOC_CAP_8BIT) reports the wrong number some of the time. Here's a really simple example to illustrate the problem:
Hi Rob,

I think what you are measuring here is the cost of heap fragmentation.

To give an overly simplified example, assume you start with one large contiguous 0x1000 byte heap.
  • Block 0 @0x3ff00004 size 0xffc (FREE)
Total free space: 0xffc

The block has a 4 byte header and is then free space.

Then, after allocating two semaphores you will have 3 blocks
  • Block 0 @0x3ff00004 size 0x54 (USED)
  • Block 1 @0x3ff00058 size 0x54 (USED)
  • Block 2 @0x3ff000b0 size 0xf4c (FREE)
Total free space: 0xf4c (0xffc - 0x58 - 0x58)

Note there are now 12 bytes of headers (4 * 3).

Then you free the first semaphore:
  • Block 0 @0x3ff00004 size 0x54 (FREE)
  • Block 1 @0x3ff00058 size 0x54 (USED)
  • Block 2 @0x3ff000b0 size 0xf4c (FREE)
Free space is now 0xfa (0xf4c + 0x54). There are still 3 blocks, so still 12 bytes of header overhead.

The heap implementation can't merge the two free blocks until Block 1 is also free, in which case it will go back to the "one free block" start state.

I can't say that this is the exact allocation pattern you're seeing, because that will depend on the exact firmware behaviour and the layout of the heap at the start of your test will be more complex than a single empty block.

If you're very keen then you can use the heap_caps_dump() function to actually dump every block in the heap, and see the real-world version of this simplified case.

(Note also that in ESP-IDF v4.3 - current master branch - we have switched heap implementations, and the specifics become a bit less straightforward because the implementation is optimised for allocation speed. But the general principle of heap fragmentation is the same.)
which mbedtls_sha256() holds onto some memory:

Code: Select all

80 bytes (@ 0x3ffb971c) allocated CPU 0 ccount 0xacde0f38 caller sha_get_engine_state at C:/esp32/esp-idf-latest/components/mbedtls/port/esp32/sha.c:142 (discriminator 1)
ESP-IDF currently "lazy initializes" some static locks (mutex semaphores) the first time certain code is executed. This is for implementation reasons (biggest one is that FreeRTOS static initialization feature is disabled by default to save code size). This SHA mutex is one of these. We know it's not ideal, but at least the total amount of memory used by these locks is fairly low.

One way to check between "lazy initialization" and "memory leak" is to run the sequence of code more than once and check the usage stabilizes after the first call (plus or minus some artifacts of heap fragmentation that may cause more or less free memory to be left after a certain pattern of allocates and frees).

These two factors are major reasons why the leak checks that you noted in our unit tests do not have a zero threshold for detecting a memory leak.

RobMeades
Posts: 85
Joined: Thu Nov 29, 2018 1:12 pm

Re: Problem with xPortGetFreeHeapSize()/heap_caps_get_free_size()

Postby RobMeades » Fri Nov 20, 2020 12:59 am

Thanks for the very comprehensive answer, much appreciated. To be clear, I don't mind the lazy block allocation, it presents a reasonable solution in many cases. The problem is that I need to know its impact in my tests so that I can determine whether a heap loss is a result of such an action by the underlying system versus a loss caused by my code.

The issue as I understand it is that the overhead in the allocation system, the mysterious four bytes, is not per "user allocation" it is per "block of contiguous user allocations". As there are more separated sets of these contiguous user allocations the number of these four byte overheads, which have to be taken account of in the "heap free" number, goes up.

However this is only a problem where a free() has not occurred, preventing blocks being merged. If this is down to me then it is a legitimate fail, if it is down to the underlying system performing a lazy allocation then I can potentially get that out of the way at start of day, before I do anything, so that it drops out of my heap sums

I can do that for the crypto stuff just by calling SHA whatever before my tests start. Where I have a problem is the stuff that is lost to the UART driver: that seems to just hang on to 12 bytes and prevent it being coalesced. It is not lazy in the sense of a once-only allocation, it occurs with each UART driver install/uninstall. Do you know if there is any way for me to do anything about this? Either get the driver to free those 12 bytes or not take them in the first place?

ESP_Angus
Posts: 2344
Joined: Sun May 08, 2016 4:11 am

Re: Problem with xPortGetFreeHeapSize()/heap_caps_get_free_size()

Postby ESP_Angus » Fri Nov 20, 2020 5:16 am

Hi Rob,
RobMeades wrote:
Fri Nov 20, 2020 12:59 am
The issue as I understand it is that the overhead in the allocation system, the mysterious four bytes, is not per "user allocation" it is per "block of contiguous user allocations". As there are more separated sets of these contiguous user allocations the number of these four byte overheads, which have to be taken account of in the "heap free" number, goes up.
Not quite. Every block in the heap (free or used) has a four byte header. This is the metadata that the heap uses to keep track of its own internal structure - ie how it knows how big each allocation is, and how much free space is available.

"Multiple contiguous user allocations" have one 4 byte header for each individual allocation, which tells the heap how long it is.

Allocating from the heap will usually split a free block in half, to leave you with one block allocated and a block with the remaining free space (the only time this doesn't happen is if the free block is already the right size, or close enough that the remaining free block would not contain any free space).

When you call free(), the heap allocator will merge any adjacent free blocks to create one large free block. If there's still an allocated block in between the two free blocks then it can't do this so you end up a smaller free block on each side - this is the concept of heap fragmentation.

The more fragmentation, the more total free heap size is also lost due to lots of non-contiguous free blocks that each need their own 4 byte header.
RobMeades wrote:
Fri Nov 20, 2020 12:59 am
I can do that for the crypto stuff just by calling SHA whatever before my tests start.
This is a good solution.
RobMeades wrote:
Fri Nov 20, 2020 12:59 am
Where I have a problem is the stuff that is lost to the UART driver: that seems to just hang on to 12 bytes and prevent it being coalesced. It is not lazy in the sense of a once-only allocation, it occurs with each UART driver install/uninstall. Do you know if there is any way for me to do anything about this? Either get the driver to free those 12 bytes or not take them in the first place?
If you enable the heap tracing feature in memory leak mode, do you see any allocations left after calling driver install/uninstall?

If not, then this might be overhead due to fragmentation - the same number of total bytes are allocated before and after, but due to the exact sequence of allocations and frees the addresses of some memory has changed, and the heap is more fragmented.

RobMeades
Posts: 85
Joined: Thu Nov 29, 2018 1:12 pm

Re: Problem with xPortGetFreeHeapSize()/heap_caps_get_free_size()

Postby RobMeades » Fri Nov 20, 2020 9:28 am

Understood, there is an additional overhead for each contiguous block.

On the UART thing, yes, each install/uninstall of the driver does leave 12 bytes unrecovered each time. The loss is always in get_desc_for_int()/esp_intr_alloc_intrstatus() at intr_alloc.c:230/intr_alloc.c:599: you'll see it if you look for esp_intr_alloc_intrstatus in the attachment:
total_mem_trace.txt
(10.9 KiB) Downloaded 319 times

ESP_Angus
Posts: 2344
Joined: Sun May 08, 2016 4:11 am

Re: Problem with xPortGetFreeHeapSize()/heap_caps_get_free_size()

Postby ESP_Angus » Tue Nov 24, 2020 6:40 am

Hi Rob,

Thanks for the extra info. I can't reproduce this on the master branch, could you please give some more information about which ESP-IDF version you're using and the exact arguments being passed to uart_driver_install() and uart_driver_delete() each time? Do both functions return ESP_OK each time they are called?

Thanks,

Angus

RobMeades
Posts: 85
Joined: Thu Nov 29, 2018 1:12 pm

Re: Problem with xPortGetFreeHeapSize()/heap_caps_get_free_size()

Postby RobMeades » Tue Nov 24, 2020 9:34 am

Thanks very much for trying it, I appreciate your efforts in looking into what is quite an esoteric problem here, not many company support organisations are so quick and helpful :-).

We call uart_driver_install(2, 1024, 0, 20, &queueA, 0) and then uart_driver_install(1, 1024, 0, 20, &queueB, 0), followed a little later by uart_driver_delete(2) then uart_driver_delete(1) (deliberately in the "wrong" order as that's one of the things in our code that we're testing here).

We always pull the very latest release/v4.1 before testing, currently #c3324a82a.

Now it _might_ be that this loss of 12 bytes for each UART only happens the first time we install and then delete the driver. Did you see no heap loss at all in your headrev tests when you tried it?

I will put together some test code to see if I can reproduce it in a simple way.

Who is online

Users browsing this forum: No registered users and 58 guests