heap corruption inconsistency

0xffff
Posts: 41
Joined: Tue Jun 19, 2018 1:53 am

Postby 0xffff » Thu Jul 19, 2018 4:59 am

Hi,

I'm chasing a heap corruption issue, so I enabled comprehensive heap poisoning and built with gdbstub to break when this occurs. When it happened I saw:

CORRUPT HEAP: Invalid data at 0x3ffdc0b8. Expected 0xfefefefe got 0xfefefeff
CORRUPT HEAP: Invalid data at 0x3ffdc190. Expected 0xfefefefe got 0xfefefeff
assertion "verify_fill_pattern(data, size, true, true, true)" failed: file "/dev/p/Firmware/esp-idf/components/heap/./multi_heap_poisoning.c", line 183, function: multi_heap_malloc
abort() was called at PC 0x400dfacb on core 0


However, in gdb I see:

(gdb) x/12x 0x3ffdc0b0
0x3ffdc0b0:	0xcececece	0xcececece	0xcececece	0xcececece
0x3ffdc0c0:	0xcececece	0xcececece	0xcececece	0xcececece
0x3ffdc0d0:	0xcececece	0xcececece	0xcececece	0xcececece

This seems inconsistent with the error message. What am I missing?

WiFive
Posts: 3529
Joined: Tue Dec 01, 2015 7:35 am

Re: heap corruption inconsistency

Postby WiFive » Thu Jul 19, 2018 5:59 am


ESP_Angus
Posts: 2344
Joined: Sun May 08, 2016 4:11 am

Re: heap corruption inconsistency

Postby ESP_Angus » Thu Jul 19, 2018 6:09 am

Hi 0xffff,

This had me scratching my head for a minute as well!

The reason is that verify_fill_pattern() swaps each word from 0xfefefefe (free memory) to 0xcececece (allocated memory) as it walks the memory region during allocation, checking and swapping in a single pass for performance. Even if it finds an invalid word, it finishes the sweep before aborting (the idea being to report all of the invalid data in the region). By the time gdb stops, every word in the region, including the corrupted ones, has already been overwritten with 0xcececece.
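
To make the single-pass behaviour concrete, here is a simplified model of it. This is only a sketch, not the ESP-IDF source: the function name model_verify_and_swap and the two macros are made up for illustration, and the real verify_fill_pattern() takes more arguments and handles more cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names for the two fill patterns discussed in this thread. */
#define FREE_FILL_WORD   0xfefefefeu  /* left behind when memory is freed */
#define MALLOC_FILL_WORD 0xcecececeu  /* written when memory is allocated */

/* Simplified model: check each word against the free pattern and rewrite it
 * to the malloc pattern in the same pass. Returns true only if every word
 * held the free pattern on entry. */
static bool model_verify_and_swap(uint32_t *words, size_t count)
{
    assert(words != NULL);
    bool valid = true;
    for (size_t i = 0; i < count; i++) {
        if (words[i] != FREE_FILL_WORD) {
            printf("CORRUPT HEAP: Invalid data at %p. Expected 0x%08x got 0x%08x\n",
                   (void *)&words[i], (unsigned)FREE_FILL_WORD, (unsigned)words[i]);
            valid = false;            /* remember the failure but keep sweeping */
        }
        words[i] = MALLOC_FILL_WORD;  /* overwrites the word, corrupt or not */
    }
    return valid;
}
```

Calling this on a region with one corrupted word prints the error for that word and returns false, yet the region reads as all 0xcececece afterwards, which is the state a debugger sees after the abort.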

Clearly this is a bit confusing when you go to do a post-mortem in gdb.

I'm going to benchmark the performance of checking all words before swapping them (I suspect it's OK on regular RAM but may have issues on PSRAM). If we can do this, I'll change the function (premature optimisation is the root of all evil, etc, etc).

If the performance impact is too high, we can at least stop swapping patterns from the invalid word onwards. You can make this change yourself by putting "swap_pattern = false;" underneath "valid = false;" in multi_heap_poisoning.c:149
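
The effect of that one-line change can be sketched with a simplified model. Again, this is an illustration under assumptions and not the actual code in multi_heap_poisoning.c: the function name model_verify_keep_evidence and the macros are hypothetical, and only the swap_pattern and valid variable names come from the suggestion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the two fill patterns discussed in this thread. */
#define FREE_FILL_WORD   0xfefefefeu
#define MALLOC_FILL_WORD 0xcecececeu

/* Simplified model: same single pass, but swapping stops at the first
 * invalid word so that word and everything after it stay untouched. */
static bool model_verify_keep_evidence(uint32_t *words, size_t count, bool swap_pattern)
{
    assert(words != NULL);
    bool valid = true;
    for (size_t i = 0; i < count; i++) {
        if (words[i] != FREE_FILL_WORD) {
            valid = false;
            swap_pattern = false;  /* the suggested extra line */
        }
        if (swap_pattern) {
            words[i] = MALLOC_FILL_WORD;
        }
    }
    return valid;
}
```

With this variant, words before the first bad one still read 0xcececece in gdb, while the bad word and the rest of the region keep their pre-allocation contents.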

(BTW, you can assume that every word in the buffer which doesn't trigger an error message was 0xfefefefe before it was 0xcececece, and the others held the values shown in the error messages. 0xfefefeff usually means something has done "var++" on a freed address.)
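
As a small illustration of where 0xfefefeff comes from, the sketch below fills a local word with the 0xfe free pattern and increments it once. It uses a local variable as a stand-in for the freed heap word, so nothing here actually touches freed memory; the function and macro names are made up for the example.

```c
#include <stdint.h>
#include <string.h>

#define FREE_FILL_BYTE 0xfe  /* hypothetical name for the free-memory fill byte */

/* Stand-in for a freed heap word that a stale pointer then increments. */
static uint32_t model_increment_freed_word(void)
{
    uint32_t word;
    memset(&word, FREE_FILL_BYTE, sizeof(word));  /* poisoning after free: 0xfefefefe */
    word++;                                       /* the stray "var++" via a stale pointer */
    return word;                                  /* 0xfefefeff */
}
```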
