Problems after updating to latest master (reproduced in IDF example code)

WiFive
Posts: 3529
Joined: Tue Dec 01, 2015 7:35 am

Postby WiFive » Sun Oct 08, 2017 9:53 pm

Are you running this on custom hardware or a devkit?

permal
Posts: 384
Joined: Sun May 14, 2017 5:36 pm

Postby permal » Sun Oct 08, 2017 10:41 pm

The same ESP32 Thing from SparkFun that I've been using for months. As I wrote above, nothing except the IDF and toolchain has changed.

ESP_Angus
Posts: 2344
Joined: Sun May 08, 2016 4:11 am

Postby ESP_Angus » Sun Oct 08, 2017 11:30 pm

permal wrote: The same ESP32 Thing from SparkFun that I've been using for months. As I wrote above, nothing except the IDF and toolchain has changed.
Fairly recently (pre-V3.0) we changed the default crystal frequency from "auto-detect" to 40MHz. The ESP32 Thing uses a 26MHz crystal (it's one of two development board models we know of that do). Have you set the crystal to 26MHz in your sdkconfig as well? It's possible for autodetection to get it wrong (which may cause weird behaviours when Wifi starts up.)

If you can, could you please post the full boot log output and the full sdkconfig file from the example somewhere? https://pastebin.com and https://gist.github.com/ are both good options.

Also, please double-check that you have the updated WiFi library submodules (running git submodule update --init --recursive is the easy way to ensure this).
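
For anyone who wants to confirm that the crystal setting really ended up in the sdkconfig file, here is a minimal sketch. It assumes the usual KEY=value sdkconfig layout and the CONFIG_ESP32_XTAL_FREQ key (the name that shows up in the rtc_clk warning quoted later in this thread). The helper names and the sample option are illustrative only and are not part of ESP-IDF.

```python
def parse_sdkconfig(text):
    """Return a dict of options from sdkconfig-style text (KEY=value lines)."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments, including "# CONFIG_X is not set".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            options[key.strip()] = value.strip().strip('"')
    return options


def xtal_freq_mhz(text):
    """Return the configured crystal frequency in MHz, or None if the key is absent."""
    value = parse_sdkconfig(text).get("CONFIG_ESP32_XTAL_FREQ")
    return int(value) if value is not None else None


# Hypothetical sdkconfig excerpt; CONFIG_EXAMPLE_OPTION is a made-up name.
sample = """
# CONFIG_EXAMPLE_OPTION is not set
CONFIG_ESP32_XTAL_FREQ=26
"""
print(xtal_freq_mhz(sample))
```

On a real project, the text would come from reading the sdkconfig file in the example directory instead of the inline sample.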

kolban
Posts: 1683
Joined: Mon Nov 16, 2015 4:43 pm
Location: Texas, USA

Postby kolban » Sun Oct 08, 2017 11:33 pm

As a data point, I've been running the latest ESP-IDF with the latest toolchain and have been "ok". I have all kinds of boards, including the SparkFun. I ran into a problem with the latest toolchain and C++ classes, but that is a very different story. My development environment is Ubuntu Linux. If the issue were grossly pervasive, I would have expected many more forum posts on it. If it were me, I'd check for environmental contamination: things like environment variables for ESP, or things in the PATH that are interfering. Maybe create a new VirtualBox image from the ground up and validate that it works?

permal
Posts: 384
Joined: Sun May 14, 2017 5:36 pm

Postby permal » Mon Oct 09, 2017 7:23 am

ESP_Angus wrote:
permal wrote: The same ESP32 Thing from SparkFun that I've been using for months. As I wrote above, nothing except the IDF and toolchain has changed.
Fairly recently (pre-V3.0) we changed the default crystal frequency from "auto-detect" to 40MHz. The ESP32 Thing uses a 26MHz crystal (it's one of two development board models we know of that do). Have you set the crystal to 26MHz in your sdkconfig as well? It's possible for autodetection to get it wrong (which may cause weird behaviours when Wifi starts up.)

If you can, could you please post the full boot log output and the full sdkconfig file from the example somewhere? https://pastebin.com and https://gist.github.com/ are both good options.

Also, please double-check that you have the updated WiFi library submodules (running git submodule update --init --recursive is the easy way to ensure this).
Yes, I have set it to 26MHz.
I've double-checked that the git submodules are up to date, but I'll triple-check when I get home tonight. I'll also post the full boot log and sdkconfig file.

permal
Posts: 384
Joined: Sun May 14, 2017 5:36 pm

Postby permal » Mon Oct 09, 2017 6:49 pm

I cleared out the entire IDF and xtensa-gcc folders, cloned/downloaded it all again, and set XTAL to 26MHz - still the same result.

Here are the entire log and sdkconfig:
Log: https://pastebin.com/T9s6yQ9q
sdkconfig: https://pastebin.com/zutyCGuu


Also, have a look at this paste from another run with the same codebase, line 165: https://pastebin.com/BRNLPgiV
D (14542) tcpip_adapter: if0 start ip lost tmr: enter
D (14552) tcpip_adapter: if0 start ip lost tmr: no need start because netif=0x3ffce760 interval=120 ip=0
D (14554) tcpip_adapt***ERROR*** A stack overflow in task tiT has been detected.
abort() was called at PC 0x40087a7c on core 0
0x40087a7c: vApplicationStackOverflowHook at /home/permal/esp/esp-idf/components/esp32/./panic.c:553
Task "tiT" is the lwip task, with a default stack of 2560. I increased that to ~10k, but the only difference is that it now hangs without garbage being printed.
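
When comparing several pasted logs, it can help to pull out the overflow reports automatically. The following is only a sketch: it assumes the message wording shown in the log excerpt above, and the function name is made up for illustration. The pattern is searched anywhere in a line, so it still matches when the message is glued onto another log line, as it is above.

```python
import re

# Wording taken from the log excerpt above; task names containing spaces would not match.
OVERFLOW_RE = re.compile(r"A stack overflow in task (\S+) has been detected")


def overflowing_tasks(log_text):
    """Return the task names reported in stack overflow messages, in order of appearance."""
    return OVERFLOW_RE.findall(log_text)


log = (
    "D (14542) tcpip_adapter: if0 start ip lost tmr: enter\n"
    "D (14554) tcpip_adapt***ERROR*** A stack overflow in task tiT has been detected.\n"
    "abort() was called at PC 0x40087a7c on core 0\n"
)
print(overflowing_tasks(log))
```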

permal
Posts: 384
Joined: Sun May 14, 2017 5:36 pm

Postby permal » Tue Oct 10, 2017 6:18 pm

Solved it!

So it turns out that running make in a new terminal window produces a functioning binary, both for iperf and my own code. I don't know why this matters though - would love to hear your theories as to why this was needed.

Just FYI: I'm doing development on a VM and neither it, nor the terminal windows, has been closed for at least three months or so; I've had no reason to. Until now apparently.


I take that back - issue still happening.

permal
Posts: 384
Joined: Sun May 14, 2017 5:36 pm

Postby permal » Tue Oct 10, 2017 10:28 pm

OK, so after some hours of chatting and troubleshooting on IRC, it turns out that FulminatingGhost also sees these problems. I'm attaching the entire night's chat log here for reference. The current conclusion is that this really is a stack overflow in the lwip task "tiT", or that some other task overwrites the stack of "tiT".

I ran some additional tests before heading off to bed and got as far as this: when the stack size of tiT is >=4000 things seem stable, while lowering it to the default value of 2560 causes immediate crashes again, so the sweet spot is somewhere in between. However, as already pointed out, it might be something else that is trashing the stack of the "tiT" task.
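
The search for that sweet spot could be organised as a bisection between the 2560 default and 4000. This is a rough sketch under two assumptions: a size that survives a number of repeated trials counts as stable, and stability only improves as the stack grows. The chat log below shows an occasional crash even at 10k, so the second assumption is a simplification. All function names here are hypothetical, and run_once stands in for a real build, flash and "sta" test cycle.

```python
def stable_over_trials(run_once, size, trials=20):
    """Treat a size as stable only if every one of `trials` runs survives."""
    return all(run_once(size) for _ in range(trials))


def smallest_stable_stack(low, high, is_stable):
    """Binary-search the smallest size in (low, high] for which is_stable(size) is True.

    Assumes is_stable(low) is False and is_stable(high) is True.
    """
    while high - low > 1:
        mid = (low + high) // 2
        if is_stable(mid):
            high = mid  # mid works, so the answer is mid or smaller
        else:
            low = mid   # mid crashes, so the answer is larger than mid
    return high


def fake_board(size):
    """Stand-in for a real test run: pretend anything below 3000 bytes crashes."""
    return size >= 3000


print(smallest_stable_stack(2560, 4000, lambda s: stable_over_trials(fake_board, s)))
```

Because the crash is intermittent, a higher trial count makes the result more trustworthy but also makes every bisection step slower.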

Code: Select all

Just had a WTF-moment: https://www.esp32.com/viewtopic.php?f=2&t=3250&start=10#p15382
<Yoari> ...now I can finally get some work done
<FulminatingGhost> very odd
<Yoari> yeah. I was even considering that I had broken my hardware somehow
<FulminatingGhost> have you looked at a core dump, when the crash happened?
<Yoari> There were no dump, it just did a hard-lock
<crashovrd> new terminal has different environment variables
<crashovrd> that is the first place i would start looking for the reason
<crashovrd> i use "stale" terminals for build too. have not encountered any issues
<Yoari> crashovrd, yeah, my thought too. I found this solution by accident - closed a window too many, so I have no possibility to determine what it actually was.
<Yoari> I've done many many updates of IDF and every toolchain without issues so far...
<Yoari> oh wtf. Same thing again :( Ok, so it is not an environment variable.
<FulminatingGhost> hmm
<Yoari> enabled cor dumping to console - hard locks, no dump
<FulminatingGhost> how much stack is allowed for the "tiT" Task ?
<FulminatingGhost> 10k now?
<Yoari> no, its back to default
<Yoari> I don't understand - how could it work for ~15 minutes, then break again?!
<Yoari> somewhat frustrated :P
<crashovrd> do a "printenv" on a new terminal
<crashovrd> when it stops working, do it again and diff it
<Yoari> huh, now it seems to work again. I wonder what I did...
<Yoari> yeah, will do
<crashovrd> alternatively, your build computer may be cursed and you will need to set it on fire and get a new one
<Yoari> crashovrd, Oh, sounds tempting
<Yoari> n/m its not working - it just got a little longer this time
<Yoari> it's not an enviroment variable that is to blame. OLDPWD and WINDOWID are the only one that have changed
<FulminatingGhost> well, something is happening in that task
<Yoari> yup
<Yoari> any time it starts the Wifi in STA mode
<crashovrd> does "make clean" cause a change?
<Yoari> nope
<crashovrd> does it happen on a different ESP32 device?
<Yoari> I have only one
<crashovrd> how about "make erase_flash" ?
<Yoari> tried that too
<crashovrd> well, thats all i got 4 u
<crashovrd> :P
<FulminatingGhost> have you tried a default sdkconfig?
<Yoari> yup
<Yoari> hm, this is interesting: W (54) rtc_clk: Possibly invalid CONFIG_ESP32_XTAL_FREQ setting (26MHz). Detected 26 MHz.
<FulminatingGhost> yeah thats normal
<FulminatingGhost> did you increase the wifi rx and tx buffer?
<FulminatingGhost> your sdkconfig shows them to be 64
<Yoari> no, I haven't.
<FulminatingGhost> and your ampdu window size seems too high
<Yoari> my what?
<FulminatingGhost> your wifi configuration seems a bit odd
<Yoari> oh, got a dump I think https://pastebin.com/NGtX6Efv Tells me nothing though...
<Yoari> I've barely touched the config, set 26Mhz for XTAL and that's it
<FulminatingGhost> delete your config and run make defconfig
<FulminatingGhost> then set your xtal
<Yoari> ok
<Yoari> done, flashing
<Yoari> what's that "ampdu window" to mentioned?
<Yoari> it hung, like previous
<Yoari> Seems setting it up as an AP also causes this, not only as STA
<FulminatingGhost> ampdu window is for frame aggregation, but I cannot really explain more about it
<FulminatingGhost> can you put your sdkconfig to pastebin again?
<Yoari> sure
<FulminatingGhost> is it happening in a task that you create?
<Yoari> no, this is the iperf example
<FulminatingGhost> oh, right
<Yoari> https://pastebin.com/gcwWy4aD
<Yoari> current sdkconfig
<Yoari> I would really like to get my hands on an older toolchain, just to see if this happens with that too
<Yoari> ..since this all started when I updated IDF and the toolchain
<FulminatingGhost> weird
<FulminatingGhost> your default sdkconfig very much differs from my default config
<Yoari> oh? interesting
<Yoari> past your and let me try it?
<FulminatingGhost> let me make a diff...
<FulminatingGhost> yeah, sec
<Yoari> what tool chain and IDF-version are you using? I'm on master and xtensa-esp32-elf-linux64-1.22.0-73-ge28a011-5.2.0
<FulminatingGhost> https://pastebin.com/UEvjws6t
<FulminatingGhost> toolchain -61
<FulminatingGhost> and latest idf
<FulminatingGhost> the master branch IDF
<Yoari> yeah, smae IDF then
<Yoari> but I've got a newer toolchain
<Yoari> would be awesome if you could try with the latest one too
<FulminatingGhost> but then you should have the same default sdkconfig
<Yoari> yeah, you'd think that...
<Yoari> trying your config now, with 26Mhz set
<Yoari> that resulted in a restart: rst:0x7 (TG0WDT_SYS_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
<Yoari> do you have a copy of your toolchain?
* miller7 (~miller7@unaffiliated/miller7) has joined
<FulminatingGhost> something in your environment seems to be weird
<FulminatingGhost> are you sure the IDF_PATH is set correctly?
<Yoari> yup
<FulminatingGhost> when you do "env | grep IDF_PATH" it shows the right path?^^
<Yoari> IDF_PATH=/home/permal/esp/esp-idf
<Yoari> yup, it is correct
<Yoari> looking at the diff between our sdkconfig-files...there are quite some diffs
<Yoari> do you really have commit de750e99214a51c1bcfdb42b5ada931652a3c531 of the IDF?
<Yoari> dated 6:th of October?
<FulminatingGhost> commit de750e99214a51c1bcfdb42b5ada931652a3c531
<FulminatingGhost> yup
<Yoari> huh. that is odd
<Yoari> how come we get different defaults then?!
<FulminatingGhost> have you tried restarting your VM yet?^^
<Yoari> yes
<FulminatingGhost> do you have an older idf somewhere in your system?
<Yoari> nope, I deleted that when I updated
<FulminatingGhost> let me try the example
<Yoari> please do
<FulminatingGhost> i see
<FulminatingGhost> in the iperf example sdkconfig.defaults the defaults are beeing set in the wifi section
<Yoari> oh. ofc
<FulminatingGhost> compiling on my slow machine ....
<Yoari> turn that handle :)
<Yoari> just type "sta your_ssid your_pwd" once it is running
<Yoari> that's all I do to cause it to hang
<FulminatingGhost> k
<Yoari> hm - this time it seems to have worked
<Yoari> All I did was logout then back in
<FulminatingGhost> connected
<Yoari> lets see if it works a second time..
<Yoari> well, ofc it works on your machine :P
<Yoari> flashing a second time...
<Yoari> nope, second flash -> hang
<Yoari> going to try and logout again
<FulminatingGhost> runs smoothly
<FulminatingGhost> then give the 61 toolchain a try
<FulminatingGhost> wait i'll get you a link
<Yoari> relogging didn't do anything, hung on first try this time
<FulminatingGhost> http://esp-idf.readthedocs.io/en/v2.1/get-started/linux-setup.html
<FulminatingGhost> try that toolchain
<Yoari> will do. Ah, you can read other than the latest version....didn't think of that when looking for an older toolchain :)
<FulminatingGhost> it works smoothly for me here
<Yoari> downloading
<Yoari> rub it in, rub it in :P
<FulminatingGhost> another thing you might ant to check is power
<FulminatingGhost> cable length
<Yoari> its the same hardware I've been using for months
<FulminatingGhost> it the toolchain does not work
<FulminatingGhost> oh i even have a 73 lying around here
<FulminatingGhost> i'll try that one
<Yoari> flashing...
<Yoari> dammit, same thing.
<FulminatingGhost> odd
<Yoari> oh. might have found something
<Yoari> hold on...
<FulminatingGhost> ooh ?
<Yoari> verifying, wait a bit
<Yoari> oh bow
<Yoari> boy*
<FulminatingGhost> ok... I'm listening
<FulminatingGhost> :P
<Yoari> I think I need to get my scope to verify this 100%, but it may actually be a power-issue. If I disconnect the two I2C lines it seems stable.
<Yoari> If I reconnect them, it crashes with a stack overflow.
<FulminatingGhost> ooops
<Yoari> I don't even have I2C running (since it is iperf that is loaded), but I guess the added load on the 3.3V line might be enough...
<Yoari> though I don't understand how it result in a stack overflow...
<Yoari> ...and this has been running fine previously...
<Yoari> oh wait - I might know the cause...
<FulminatingGhost> by I²C lines you mean the data or the power lines of those devices?
<FulminatingGhost> oh
<Yoari> I meant the data lines
<Yoari> but it wasn't them, had to disconnect power too now
<Yoari> probably isn't i2C - I think it is the added load of two potentiometers connected to and ADC that are the cuplrits
<Yoari> an* ADC
<FulminatingGhost> uuh, pots? added load?
<FulminatingGhost> how's that?
<Yoari> on the 3.3 rail
<Yoari> its all fed via USB atm.
<FulminatingGhost> what pots would load the 3v3 line so mucht?
<Yoari> those on my breadboard I'm using to test the ADC readings
<FulminatingGhost> how are they wired that they may put so much load on the rail?
<Yoari> but its not them either - even when disconnecting everything the ESP reboots itself when running the STA command, but only intermitently now
<FulminatingGhost> can you measure the USB voltage?
<FulminatingGhost> at the ESP
<Yoari> yeah, just about to get my USB plug for that
<FulminatingGhost> if possible best with a scope
<Yoari> yup. brb
<FulminatingGhost> then you can see the spikes from the wifi
* tavish has quit (Quit: Leaving)
<FulminatingGhost> looking at the "ESP32 Thing"s schematic, it does not have a very large cap on there to deal with that much wifi bandwidth
<Yoari> give me a few minutes too hooke up the scope
<FulminatingGhost> sure
<FulminatingGhost> there only a 2.2µF cap on the 3V3 rail
<SpeedEvil> 2.2uF is not meant to carry teh load for long
<SpeedEvil> it is made to cope with delta-spikes of ~300mA until your regulator or external cap kicks in
<FulminatingGhost> yeah
<FulminatingGhost> true
<Yoari> hm, I'm not getting any voltage drops, even when it reboots during init of Wifi
<Yoari> doesn't ever seem to get below 3.24V
<Yoari> and even that never last >500us
<FulminatingGhost> hm
<FulminatingGhost> another thing might be the radiation from the antenna itself causing EMV issues
<Yoari> well, yeah. but why would it start now?
<FulminatingGhost> you mean it works fine now?
<Yoari> every time it crashes I get this: ***ERROR*** A stack overflow in task tiT has been detected.
<Yoari> if this really was a power or EMI issue, wouldn't it be logical that some other task would be affected and not only this?
<Yoari> could it be that my flash is bad?
<Yoari> I have written to it quite alot of times
<FulminatingGhost> but the hash is verified every time ...
<Yoari> true
<Yoari> 3.24V, never goes below that, but still ccrashes/restarts every 2nd od third time I issue the STA command
* bvernoux has quit (Quit: Leaving)
<Yoari> 3.24V, never goes below that, but still ccrashes/restarts every 2nd od third time I issue the STA command
<Yoari> ops
<Yoari> goign to try with an external cap, just for the hell of it
<FulminatingGhost> okay
<Yoari> nope, same thing
<FulminatingGhost> so your esp32 thing is now just floating in the air
<FulminatingGhost> and the only thing connected to it is the usb cord?
<Yoari> It's mounted in an empty breadboard, with scope connected, but yeah...
<FulminatingGhost> do you have a different PC that you can plug it in to?
<Yoari> in the lab, yes. what are you thinking?
<FulminatingGhost> can you unmount the board to change the environment around the wifi antenna
<Yoari> certainly
<FulminatingGhost> I won't be able to sleep until this is fixed :P
<Yoari> hehe. Well, it's 23:04 here so I'll have to get some sleep soon
<Yoari> it's now unmounted and only connected to the USB cable, ~1m from its original location. Same thing happens
* Nedlinpopo (~james@12.246.170.150) has joined
<Yoari> disabling Wifi NVS...
<TheSeven> which peripheral would I use to measure the frequency of a square wave (fed into a GPIO) on esp32?
<Yoari> TheSeven, RMT
<TheSeven> I was wondering if there is something else, such as a simple input capture timer
<TheSeven> RMT seems kinda overpowered for just measuring a frequency ;)
<Yoari> well, you could use an ISR I guess
<Yoari> FulminatingGhost, disabling Wifi NVS didn't help.
* Nedlinpopo has quit (Quit: Leaving.)
* miller7 has quit (Ping timeout: 248 seconds)
<TheSeven> <Yoari> hm, this is interesting: W (54) rtc_clk: Possibly invalid CONFIG_ESP32_XTAL_FREQ setting (26MHz). Detected 26 MHz.
<TheSeven> ^ that's just what the code does if you set a fixed clock freq
<TheSeven> it always outputs that line at a baud rate that's based on an autodetected frequency, then switches to the configured one
<TheSeven> nothing specific to your setup
<Yoari> oh, that's an odd behaviour
<Yoari> considering that it detected the same as the configured one
<crashovrd> it does that on my boards too except its 40Mhz
<TheSeven> yeah, confused me at first as well, but that's just what it's programmed to do ;)
<crashovrd> so i guess its "normal" behavior in the current esp-idf
<TheSeven> I assume you have the brownout detector enabled? if that doesn't trip, it's quite unlikely to be a power issue
<TheSeven> I've had a ton of issues with unstable power here, but BOR tripping was always the first thing happening (if it was enabled)
<crashovrd> i would suggest using esptool.py to read back the flash and compare it with what was programmed
<crashovrd> best choice is to flash a diff esp32 though
<crashovrd> so you can eliminate hardware
<TheSeven> I haven't quite followed that long discussion - what kind of issues are you facing?
<TheSeven> mysterious stack overflow issues when enabling STA for the second time during one boot if I got that right?
<FulminatingGhost> its the iperf example
<Yoari> it looked that way, also happens on the first one too, just more seldom
<Yoari> yeah, iperf example
<Yoari> here's the forum thread: https://www.esp32.com/viewtopic.php?f=2&t=3250
* [Butch] has quit (Quit: I'm out . . .)
<FulminatingGhost> when it does work, it will run the iperf test without errors?
<Yoari> I haven't actually tried
<FulminatingGhost> would be interesting
<Yoari> trying
<Yoari> duh. from iperf on my Linux: terminate called after throwing an instance of 'std::bad_alloc'
<Yoari> it's running now, but the output is unexpected: https://pastebin.com/Sv3Bx4KG
<FulminatingGhost> that's not much :P
<FulminatingGhost> if it crashes, will it do so the instance you hit enter on that STA command?
<FulminatingGhost> or is there a delay?
<Yoari> it crashes when the connection to the AP succeeds, at leat that is my understanding
<Yoari> this is the other side: https://pastebin.com/RqXMxPr3
<Yoari> still really low thoughput
<FulminatingGhost> yes, very
<FulminatingGhost> i was getting ~ 20mbit/s
<Yoari> wow. I wonder if it is related? 
<Yoari> this is what I see when it crashes https://pastebin.com/rEWA41SJ
<Yoari> its always the same
<FulminatingGhost> and you have core dump to uart enabled and never get an actual dump?
<Yoari> I disabled stack checks, just to see the behaviour. it now crashes with this message instead: D (21706) event: SYSTEM_EVENT_STA_GOT_IP, ip:19/home/2permal/esp/esp-idf/components/freertos/./queue.c:1439 (xQueueGenericReceive)- assert failed
<FulminatingGhost> !!
<FulminatingGhost> i got the same error now
<TheSeven> is tiT a valid task name or some junk?
<Yoari> ^^
<Yoari> it is valid
<FulminatingGhost> when i run as AP
<Yoari> AP or STA doesn't matter for me
<Yoari> happens with both
<Yoari> Its a little comfort to know that I'm no longer alone with this problem!
<TheSeven> maybe try to enable heap poisoning? it might be something else that's overflowing into the stack from below
<Yoari> would be great if you could post a "me too" in the forum thread to let Espressif know that its not only me
<TheSeven> or, if you haven't already done that, enable the watchpoint-based stack overflow detection, which should pinpoint where it actually overflows
<TheSeven> (CONFIG_FREERTOS_WATCHPOINT_END_OF_STACK)
<Yoari> well, I've not managed to get the debugger to work yet (not for lack of trying) so watch points are of no use to me..
<FulminatingGhost> got one in sta mode now
<Yoari> FulminatingGhost, you have no idea how happy that makes me :)
<TheSeven> Yoari: they should at least just fault with the backtrace pinpointing where the overflow happened
<TheSeven> (if no debugger is attached)
<Yoari> TheSeven, well, I'll enable it
<crashovrd> oh man, the bug is contagious! i hope i dont get it too
<crashovrd> :P
<Yoari> hehe
* crashovrd uses extra hand sanitizer
<FulminatingGhost> https://pastebin.com/z5ayqLfc
<FulminatingGhost> haha @ crashocrd
<Yoari> with what TheSeven suggested the behaviour is quite different
<Yoari> FulminatingGhost, Oh, look the xQueueGenericReceive  again
<TheSeven> xQueueGenericReceive is basically some kind of blocking task operation, i.e. waiting on just about anything
<TheSeven> quite likely just a symptom of something in memory being trashed
<Yoari> yeah, but its the same one my trace pointed to
<Yoari> https://pastebin.com/9ukSf5Mj
* jstein_ (~quassel@gentoo/developer/jstein) has joined
* jstein_ is now known as jstein
<Yoari> that's with the breakpoint enabled
<Yoari> still points to task tiT though
<TheSeven> well that's the one that seems to overflow
<Yoari> yeah
<Yoari> at leat it is consistent P
<TheSeven> the backtrace looks trashed though... 0xa5a5a5a5 frame pointer, hmmm
<TheSeven> too bad that we don't know who called that printf thing
<Yoari>  0xa5a5a5a5 lloks like canary bytes...
<TheSeven> yeah but how does the unwinder run into that
<Yoari> could it be the stack trace checker that actually trashes the stack?!
<TheSeven> anyway this is in newlib and __sbprintf will likely allocate quite a bit of stack
<TheSeven> so it might indeed just be an undersized stack perhaps?
<TheSeven> what is this tiT task and how much stack does it have?
<Yoari> tiT is the lwip task, it has some 2k by default
<Yoari> I've increaded it to 10k, but that doesn't help
<Yoari> increased*
<TheSeven> hm, but perhaps it might change the backtrace of that stack overflow dump?
<Yoari> perhaps, but not that I noticed
<TheSeven> just something that might be worth a try... check with a bigger stack and overflow watchpoints
<TheSeven> if something changes, it might give a clue, if not, we either have some kind of endless recursion or it's in fact the allocation below the stack that's overflowing into it
<Yoari> sure, doing 15 k now
<TheSeven> that vfprintf stacktrace looked like it would be the stack itself overflowing though
<Yoari> hm, with 15k stack for the LWIP task it no longer ccrashes
<Yoari> I've run the STA command ~20 times now and still no crash.
<Yoari> never took more than 4 with the smaller stack, even at 10k
* tgaz has quit (Ping timeout: 276 seconds)
<TheSeven> so I guess that some iperfs stuff is running in some callback, called from the lwip thread, and is wasting a ton of stack by doing some weird kind of I/O
<TheSeven> if yo go for something like 10k stack size you'll probably get a backtrace of something that's closer to the point of peak stack usage
<Yoari> well, this isn't just iperf that has problems. I get the same issues in my code when Wifi is enabled
<Yoari> but yes, something breaks the callback from lwip
* tgaz (~tgaz@h83-209-159-183.cust.se.alltele.net) has joined
<TheSeven> hm I haven't had this so far, but I'm not using wifi extensively, and when I do, I'm using it through the socket API which probably moves a lot of application-level processing out of the lwip thread
<Yoari> I'll do one last run with 10k, then I need to get some sleep.
<Yoari> TheSeven, As am I. I don't think this has anything to do with data being sent/received - this happens shortly after the STA_CONNECTED (or whatever) event happens
<Yoari> pah, now it won't crash with 10k stack :P
<Yoari> goign down to 5
* jstein has quit (Remote host closed the connection)
<FulminatingGhost> I'll go and get some sleep now
<Yoari> FulminatingGhost, please make a post on the forums about you seeing this too <3
<Yoari> FulminatingGhost, and thank you so much for the help
<Yoari> actually, I can just post the entire chat log...
<FulminatingGhost> you're welcome, I'll do that tomorrow
<Yoari> sleep well
<FulminatingGhost> you too
* FulminatingGhost has quit (Quit: Leaving)
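
The "printenv, then diff" check suggested by crashovrd in the log can be sketched as follows. It assumes each variable sits on a single NAME=value line, so values containing newlines would be misparsed. The function names and the sample values are made up for illustration.

```python
def parse_env(dump):
    """Parse `printenv` output (one NAME=value per line) into a dict."""
    env = {}
    for line in dump.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            env[name] = value
    return env


def changed_vars(before, after):
    """Return the sorted names whose values differ, were added, or were removed."""
    a, b = parse_env(before), parse_env(after)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


# Hypothetical dumps from a fresh terminal and from one where the build misbehaves.
fresh = "IDF_PATH=/home/permal/esp/esp-idf\nOLDPWD=/tmp\nWINDOWID=1\n"
stale = "IDF_PATH=/home/permal/esp/esp-idf\nOLDPWD=/home\nWINDOWID=2\n"
print(changed_vars(fresh, stale))
```

With these sample dumps, only OLDPWD and WINDOWID are reported, which matches the kind of result Yoari describes in the log.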

ESP_Angus
Posts: 2344
Joined: Sun May 08, 2016 4:11 am

Postby ESP_Angus » Wed Oct 11, 2017 5:16 am

Hi permal,

I tried your sdkconfig with commit de750e9921 and I'm unable to reproduce the problem you're seeing with the iperf example.

The only difference is I had to set the crystal back to 40MHz, as I don't have a board with a 26MHz crystal.

Do you have any other ESP32 hardware at hand that you can test this same code with?
permal wrote: OK, so after some hours of chatting and troubleshooting on IRC, it turns out that FulminatingGhost also sees these problems.
Can you find out what hardware they are using? It didn't seem to be mentioned in the IRC log.
permal wrote: I'm attaching the entire night's chat log here for reference. The current conclusion is that this really is a stack overflow in the lwip task "tiT", or that some other task overwrites the stack of "tiT".
..snip...

Code: Select all

<Yoari> every time it crashes I get this: ***ERROR*** A stack overflow in task tiT has been detected.
Where are you seeing that log line? I haven't seen this in any of the logs posted yet.

If changing the stack size fixes the problem, can you try with the default stack size and the following modifications: I saw the second option mentioned in the IRC log, but I didn't see if it made any difference.
<Yoari> oh, got a dump I think https://pastebin.com/NGtX6Efv Tells me nothing though...
<Yoari> I've barely touched the config, set 26Mhz for XTAL and that's it
This dump looks like one core is dumping a debug trace while another core is still running and writing conflicting output to the UART. This shouldn't generally happen, so I'm at a loss to explain it.

If this additional log output only appeared after "Set a Debug Watchpoint at End of Stack" was set, it would appear that there is a stack overflow and this is the crash handler for it.
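
To illustrate how stack usage is typically measured with a fill pattern, here is a host-side model. It assumes, as guessed in the IRC log, that the 0xa5a5a5a5 value is the byte pattern the stack is painted with at task creation, and that the stack grows downward so the untouched fill sits at the low end. FreeRTOS offers a comparable measurement on the device through uxTaskGetStackHighWaterMark. The code below is only a simulation of the idea, not the FreeRTOS implementation.

```python
FILL_BYTE = 0xA5  # assumed fill pattern, seen as 0xa5a5a5a5 in the backtrace


def untouched_bytes(stack):
    """Count fill bytes from the low end of the stack, i.e. bytes never written by the task."""
    count = 0
    for byte in stack:
        if byte != FILL_BYTE:
            break
        count += 1
    return count


# Model a freshly painted 2560-byte stack, then simulate the task using the top 2000 bytes.
stack = bytearray([FILL_BYTE] * 2560)
stack[560:] = bytes(2000)
print(untouched_bytes(stack))
```

A result close to zero means the task came near the end of its stack at some point. A result of zero is consistent with an overflow, or with something else having overwritten the region.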

permal
Posts: 384
Joined: Sun May 14, 2017 5:36 pm

Postby permal » Wed Oct 11, 2017 6:49 am

ESP_Angus wrote: I tried your sdkconfig with commit de750e9921 and I'm unable to reproduce the problem you're seeing with the iperf example.
I'm not surprised. FulminatingGhost didn't succeed at first either. What did you attempt? Running "sta ssid password" once or sometimes multiple times seems to be a sure way to trigger it.
ESP_Angus wrote: Do you have any other ESP32 hardware at hand that you can test this same code with?
I managed to find another ESP32 Thing from SparkFun, but it is the same hardware as I'm currently running. Do you think it is a h/w problem?
ESP_Angus wrote: Can you find out what hardware they are using? It didn't seem to be mentioned in the IRC log.
I hope s/he will make a post here, otherwise I'll ask on IRC.
ESP_Angus wrote: Where are you seeing that log line? I haven't seen this in any of the logs posted yet.
Might not have been in any of the posted logs last night, but it is in the log in an earlier post. Line 165: https://pastebin.com/BRNLPgiV
ESP_Angus wrote: If changing the stack size fixes the problem, can you try with the default stack size and the following modifications: I saw the second option mentioned in the IRC log, but I didn't see if it made any difference.
No, I don't think it made any difference. I will try with Heap Corruption Detection set as suggested when I get home tonight.

For the record: It doesn't appear to be related to the xtensa-gcc version, since this also happens after downgrading to version -61.
