WDT 0 timeouts caused by critical sections deleting high resolution timers? Disabling the critical sections or going unicore fixes it.

jcsbanks
Posts: 305
Joined: Tue Mar 28, 2017 8:03 pm

Postby jcsbanks » Wed Mar 06, 2019 11:50 pm

	//portENTER_CRITICAL(&mux_timer);
	if (handle->timer_handle)
	{	
		esp_timer_stop(handle->timer_handle);
		esp_timer_delete(handle->timer_handle);
		handle->timer_handle = NULL;
	}
	//portEXIT_CRITICAL(&mux_timer);
I was using these (now commented-out) macros around the high resolution timer delete code because I wanted to avoid a race condition in which the timer could expire whilst being deleted. However, initial testing with them commented out suggests they were causing occasional WDT 0 timeouts when this code runs on core 1. What I do know is that switching to unicore mode fixed them.
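Some background on why a critical section can trip a watchdog. On the dual-core ESP32, portENTER_CRITICAL(&mux) is, as I understand the ESP-IDF port, a spinlock taken with interrupts disabled on the calling core, so a task on the other core that wants the same mux busy-waits for as long as the holder stays inside. ESP-IDF's FreeRTOS documentation advises keeping critical sections as short as possible. The host-side model below is a sketch, not ESP-IDF code: pthreads stand in for the two cores and a C11 atomic_flag stands in for the portMUX_TYPE. All names are hypothetical, and it does not model interrupts or the watchdog. It only shows the busy-wait, with a deliberately tiny protected region.

```c
#include <pthread.h>
#include <stdatomic.h>

#define ITERATIONS 100000

/* Hypothetical stand-in for a portMUX_TYPE: a busy-wait spinlock.
 * A real ESP32 critical section also disables interrupts; this does not. */
static atomic_flag model_mux = ATOMIC_FLAG_INIT;
static long shared_counter = 0;
static atomic_long wait_spins = 0; /* loop iterations spent waiting, i.e. wasted CPU */

static void model_enter_critical(void) {
    /* Keep retrying until the flag was previously clear; the waiter does no useful work. */
    while (atomic_flag_test_and_set_explicit(&model_mux, memory_order_acquire)) {
        atomic_fetch_add_explicit(&wait_spins, 1, memory_order_relaxed);
    }
}

static void model_exit_critical(void) {
    atomic_flag_clear_explicit(&model_mux, memory_order_release);
}

static void *model_core(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        model_enter_critical();
        shared_counter++; /* the protected region should stay about this small */
        model_exit_critical();
    }
    return NULL;
}

/* Runs two "cores" against the same lock and returns the final counter. */
long run_two_core_demo(void) {
    pthread_t core0, core1;
    shared_counter = 0;
    atomic_store(&wait_spins, 0);
    pthread_create(&core0, NULL, model_core, NULL);
    pthread_create(&core1, NULL, model_core, NULL);
    pthread_join(core0, NULL);
    pthread_join(core1, NULL);
    return shared_counter;
}

/* How many times the waiters spun during the last run. */
long model_wait_spins(void) {
    return atomic_load(&wait_spins);
}
```

Anything placed inside the lock, such as the esp_timer_stop()/esp_timer_delete() calls in the snippet above, lengthens the time the other core may spend in that spin loop.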

I was also using this in the timer callbacks so that they delete themselves:

	//portENTER_CRITICAL(&mux_timer);
	if (handle->timer_handle)
	{
		esp_timer_delete(handle->timer_handle);
		handle->timer_handle = NULL;
	}
	//portEXIT_CRITICAL(&mux_timer);
I need to narrow this down further, which is difficult in a large project with sporadic errors. However, the core dumps I see are often related to taking a mutex or entering a critical section, or they mention timers, and they occur on a variety of tasks on core 0. That gave me the hint that this might be the problem.

I suppose I could take the actual esp_timer_ calls out of the critical sections and just make sure that access to timer_handle is safe from race conditions. But if anyone has suggestions for a good way to safely delete a timer that has not yet expired when its own callback might race to delete it, please advise. I will report back with progress anyway.
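One lock-free alternative to the critical section is worth considering. It is a different technique from the one this thread ends up using. Make the shared handle an atomic pointer and claim it with C11 atomic_exchange(). Whoever swaps NULL in receives the old value, so only one of the racing tasks or callbacks ever sees a non-NULL handle and goes on to stop and delete it. This assumes the toolchain provides <stdatomic.h> for pointer-sized atomics. The sketch below runs on a desktop with pthreads. fake_timer_t, owner_t, claim_timer() and run_claim_race() are hypothetical names, and the real esp_timer calls are replaced by a counter.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for esp_timer_handle_t; only the pointer matters here. */
typedef struct { int id; } fake_timer_t;

/* Hypothetical owner record, mirroring handle->timer_handle in the post. */
typedef struct {
    _Atomic(fake_timer_t *) timer_handle;
} owner_t;

static atomic_int delete_calls = 0; /* how many racers reached the delete step */

/* Atomically take the handle and leave NULL behind.
 * Exactly one caller receives the non-NULL pointer; all others receive NULL. */
static fake_timer_t *claim_timer(owner_t *owner) {
    return atomic_exchange(&owner->timer_handle, (fake_timer_t *)NULL);
}

static void *racer(void *arg) {
    owner_t *owner = arg;
    fake_timer_t *mine = claim_timer(owner);
    if (mine != NULL) {
        /* Only the winner would call esp_timer_stop()/esp_timer_delete() here. */
        atomic_fetch_add(&delete_calls, 1);
    }
    return NULL;
}

/* Starts n racers (task or callback paths alike) against one timer; returns the delete count. */
int run_claim_race(int n) {
    static fake_timer_t timer = { 1 };
    owner_t owner;
    pthread_t threads[16];
    if (n > 16) n = 16;
    if (n < 0) n = 0;
    atomic_init(&owner.timer_handle, &timer);
    atomic_store(&delete_calls, 0);
    for (int i = 0; i < n; i++) pthread_create(&threads[i], NULL, racer, &owner);
    for (int i = 0; i < n; i++) pthread_join(threads[i], NULL);
    return atomic_load(&delete_calls);
}
```

However many racers are started, run_claim_race() reports a single delete. The slow esp_timer work then happens with no lock held at all.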

Postby jcsbanks » Thu Mar 07, 2019 9:57 pm

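			esp_timer_handle_t copyHandle = NULL;
			// Only the read-and-clear of the shared handle is protected, so whoever
			// takes it first (task or callback) becomes responsible for deleting it.
			portENTER_CRITICAL(&mux_timer);
			if (handle->timer_handle) {
				copyHandle = handle->timer_handle;
				handle->timer_handle = NULL;
			}
			portEXIT_CRITICAL(&mux_timer);
			// Stop and delete outside the critical section; esp_timer_stop() returns
			// an error (ignored here) if the timer is no longer running.
			if (copyHandle) {
				esp_timer_stop(copyHandle);
				esp_timer_delete(copyHandle);
			}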
This is working (as did having no critical section at all, but that scares me). The only thing done inside the critical section is reading handle->timer_handle and resetting it to NULL, so other tasks and callbacks doing the same thing cannot race. The timer is stopped and deleted outside the critical section. If the timer expires elsewhere between the end of the critical section and the stop and delete calls, the callback finds timer_handle already NULL. It knows the handle is no longer its to delete and can act appropriately on the expired timer.
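To check that reasoning away from the hardware, here is a host-side model of the same read-and-clear-then-delete pattern, assuming pthreads. A pthread_mutex_t stands in for mux_timer, and it is a blocking mutex rather than a spinlock because only handle ownership is being modelled. fake_timer_t stands in for esp_timer_handle_t, and a counter replaces the esp_timer_stop()/esp_timer_delete() calls. The function names are hypothetical. The model races one cancelling task against one expiring callback and counts deletions. It does not model esp_timer itself.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for esp_timer_handle_t. */
typedef struct { int id; } fake_timer_t;

typedef struct {
    pthread_mutex_t lock;       /* plays the role of mux_timer */
    fake_timer_t *timer_handle; /* plays the role of handle->timer_handle */
    atomic_int deletes;         /* how many times the timer was "deleted" */
} timer_owner_t;

/* Read and clear the handle under the lock; return whatever was there. */
static fake_timer_t *take_handle(timer_owner_t *owner) {
    fake_timer_t *copy = NULL;
    pthread_mutex_lock(&owner->lock);
    if (owner->timer_handle) {
        copy = owner->timer_handle;
        owner->timer_handle = NULL;
    }
    pthread_mutex_unlock(&owner->lock);
    return copy;
}

/* Task side: cancel the timer, doing the slow work outside the lock. */
static void *cancel_from_task(void *arg) {
    timer_owner_t *owner = arg;
    fake_timer_t *copy = take_handle(owner);
    if (copy != NULL) {
        /* Real code: esp_timer_stop(copy); esp_timer_delete(copy); */
        atomic_fetch_add(&owner->deletes, 1);
    }
    return NULL;
}

/* Callback side: a NULL handle means a task has already taken over deletion. */
static void *expiry_callback(void *arg) {
    timer_owner_t *owner = arg;
    fake_timer_t *copy = take_handle(owner);
    if (copy == NULL) {
        return NULL; /* treat the timer as cancelled and leave it alone */
    }
    /* Real code: esp_timer_delete(copy); the timer has already fired. */
    atomic_fetch_add(&owner->deletes, 1);
    return NULL;
}

/* Races one task against one callback; returns how many deletes happened. */
int run_cancel_race(void) {
    static fake_timer_t timer = { 1 };
    timer_owner_t owner;
    pthread_t task, callback;
    pthread_mutex_init(&owner.lock, NULL);
    owner.timer_handle = &timer;
    atomic_init(&owner.deletes, 0);
    pthread_create(&task, NULL, cancel_from_task, &owner);
    pthread_create(&callback, NULL, expiry_callback, &owner);
    pthread_join(task, NULL);
    pthread_join(callback, NULL);
    pthread_mutex_destroy(&owner.lock);
    return atomic_load(&owner.deletes);
}
```

Whichever thread wins, run_cancel_race() should report exactly one delete. The loser always sees NULL and does nothing further with the handle.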

Any suggestions or comments are welcome.
