Advice on application architecture - Sensors/WiFi/MQTT

Post by **meowsqueak** » Mon Sep 04, 2017 12:28 am

I'd like to ask if anyone can help me figure out the "best practice" architecture for my relatively simple ESP32 application please. I'm using the ESP-IDF with C, on a "DOIT" ESP32 board.

I have a OneWire bus with five DS18B20 temperature sensors attached, and my testing with these shows the set-up to be robust and reliable. I'm getting error-free readings and can reliably take measurements from all five sensors every second, sustained, for many hours. I detect CRC errors so I typically know when communication with a sensor has failed.
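For readers following along, the CRC check mentioned above can be sketched roughly as follows. This is a minimal illustration of the standard Dallas/Maxim 1-Wire CRC-8, not the code used in this project; the function names `onewire_crc8` and `scratchpad_is_valid` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Dallas/Maxim 1-Wire CRC-8: polynomial x^8 + x^5 + x^4 + 1,
 * processed LSB first (reflected form 0x8C), initial value 0. */
uint8_t onewire_crc8(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint8_t crc = 0;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            /* Shift right; fold in the polynomial when a 1 falls out. */
            crc = (crc & 0x01) ? (uint8_t)((crc >> 1) ^ 0x8C)
                               : (uint8_t)(crc >> 1);
        }
    }
    return crc;
}

/* Hypothetical helper: a 9-byte DS18B20 scratchpad (eight data bytes
 * followed by their CRC) is valid when the CRC over all nine bytes is 0. */
int scratchpad_is_valid(const uint8_t scratchpad[9])
{
    return onewire_crc8(scratchpad, 9) == 0;
}
```

A reading whose scratchpad fails this check would simply be discarded and retried on the next sampling period.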

What I'd like to do is integrate this simple read/print loop with an MQTT client so that it can do two new things:
[*] subscribe to a topic on an external MQTT broker and use values published on this topic to affect the operation, for example adjust the sampling period, or change the sampling resolution.
[*] publish the temperature readings to topics on the MQTT broker that higher-level applications can subscribe to and see.
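As a rough sketch of the publishing side, the snippet below converts a raw DS18B20 reading to degrees Celsius and formats a topic and payload for it. The topic layout `sensors/<index>/temperature` and the function names are assumptions made for illustration only; the actual publish call would depend on the MQTT component in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* The DS18B20 temperature register is a signed 16-bit value with
 * 1/16 degC per LSB at 12-bit resolution. */
double ds18b20_raw_to_celsius(uint8_t lsb, uint8_t msb)
{
    int16_t raw = (int16_t)(((uint16_t)msb << 8) | lsb);
    return raw / 16.0;
}

/* Hypothetical helper: fill in a topic and payload for one reading.
 * Returns 0 on success, -1 if either buffer is too small. */
int format_reading(int sensor_index, double celsius,
                   char *topic, size_t topic_len,
                   char *payload, size_t payload_len)
{
    assert(topic != NULL && payload != NULL);
    int t = snprintf(topic, topic_len, "sensors/%d/temperature", sensor_index);
    int p = snprintf(payload, payload_len, "%.4f", celsius);
    if (t < 0 || p < 0 || (size_t)t >= topic_len || (size_t)p >= payload_len) {
        return -1; /* output was truncated or formatting failed */
    }
    return 0;
}
```

Keeping the formatting separate from the sensor access means the time-critical bus code never has to touch the network stack.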

I've also managed to get the MQTT side of things working over WiFi, using this component:

https://github.com/tuanpmt/espmqtt

Where I'm now running into problems is integrating these two sides of the application. My initial thought was to run the temperature sensing routine as a task, and the MQTT side as a separate task (or at least a task to read from a queue written to by the temp sense task and call `mqtt_publish`). However my attempts to do this have failed because I've run into several problems:

[*] The MQTT task blocks on a socket read and does not yield the CPU, which seems to block all other tasks on that CPU.
[*] The one-wire protocol is timing sensitive so microsecond delays must be carefully honoured - but interruptions from the WiFi/MQTT side cause bad readings and therefore CRC errors (which I can see).

I've tried wrapping my temperature sensing code in vTaskSuspendAll()/xTaskResumeAll() to avoid interruptions during time-sensitive GPIO access. However, this causes the MQTT side to assert (and reset the micro) because it's not expecting the scheduler to be disabled. I assume the MQTT task is running on the other CPU, so it is not itself suspended, but its check for a non-suspended task scheduler fails and it asserts.

I also tried setting the priority of the temperature sensing task to be higher than the MQTT task, but this seems to cause random problems - sometimes the MQTT task cannot connect to the remote server, and sometimes the temperature sensing task doesn't even run! This seems very strange to me.

In some cases when there's clearly a clash between tasks, I've seen a simple "printf" loop from 0 to 4 simply not print anything for, say, values 3 and 4. But the application doesn't crash - so I have no idea where the output from that loop has gone - it vanished.

I suspect I'm going to need to add a timeout to the blocked MQTT read() so that it can check a queue for any outgoing MQTT publications, send them, then go back to reading the socket in case of incoming values.
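One way such a timeout might look, assuming the MQTT connection is an ordinary BSD-style socket descriptor and that `select()` is available (lwIP offers a BSD-style sockets API), is sketched below. `read_with_timeout` and `RX_TIMEOUT` are hypothetical names, not part of espmqtt.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#define RX_TIMEOUT (-2) /* hypothetical sentinel: nothing arrived in time */

/* Wait up to timeout_ms for fd to become readable, then read once.
 * Returns bytes read (0 means the peer closed the connection),
 * -1 on error, or RX_TIMEOUT if no data arrived within the timeout. */
long read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    assert(buf != NULL && timeout_ms >= 0);

    fd_set readable;
    FD_ZERO(&readable);
    FD_SET(fd, &readable);

    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int ready = select(fd + 1, &readable, NULL, NULL, &tv);
    if (ready < 0) {
        return -1;
    }
    if (ready == 0) {
        return RX_TIMEOUT; /* caller can now service its outgoing queue */
    }
    return (long)read(fd, buf, len);
}
```

The MQTT task's loop would then alternate between this call and a non-blocking check of the outgoing queue, so neither direction starves the other.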

What I'd like to know is what would be the best approach for this kind of application. Is it wise to split the workload into separate tasks? How can I ensure a task isn't interrupted for ~1 millisecond? Should I explicitly bind each task to a CPU to avoid random issues with the scheduler? I'd prefer to find a solution that would work regardless of which task runs on which CPU though, as I'd also like to eventually run this on a single CPU.

If anyone can offer me some advice on how to go about structuring this kind of application I'd really appreciate it.

Also, if anyone knows of a good FreeRTOS book that covers such topics, with good advice on how to organise such applications, especially with regards to sockets/LWIP and multi-CPU systems, please let me know.

Post by **ESP_Sprite** » Mon Sep 04, 2017 2:57 am

If anything, for the precise timing of the one-wire interface, maybe you could use the RMT peripheral? It is designed to generate and read signals with precise timing without relying on the CPU.

Post by **meowsqueak** » Mon Sep 04, 2017 4:36 am

That's an interesting idea. I looked at the TRM (chapter 12) to understand how it works.

Unfortunately, I'm not sure it's usable in this scenario, because the 1-Wire protocol requires the CPU to drive the wire for a period of time, and then immediately put the pin into input mode in anticipation of the device pulling it low or keeping it high (there's a pull-up resistor on the bus). Then it switches back to driving the wire for the next bit, etc. It does this very frequently and I don't see a way to interleave these directions in the RMT. It looks capable of generating sequences or receiving sequences but not interleaving them on a single pin.

Perhaps it would be possible to use three channels - one to transmit control levels, a second to provide a "gate" for some external electronics to drive or release the bus, and a third to simply monitor the bus and return the sequence to the CPU. I'll think about this some more - maybe it could work. This does feel a bit complicated though - is there really no way for a task to disable the scheduler on both CPUs for a millisecond or so?

If the problem is that tasks on the second CPU aren't always suspended when the scheduler is suspended, then maybe I can look at just using a single CPU. However to do that I need to solve that problem I mentioned in another thread where it seems that the LWIP read() call blocks the entire CPU. I'm still working to determine whether that's a real problem or something I've inadvertently created myself.

Post by **WiFive** » Mon Sep 04, 2017 5:06 am

https://github.com/nodemcu/nodemcu-firm ... /onewire.c

Post by **meowsqueak** » Mon Sep 04, 2017 5:13 am

WiFive wrote: https://github.com/nodemcu/nodemcu-firm ... /onewire.c

Thanks for that - looking at the code I can see that it uses the RMT to handle each "bit slot" - that makes a lot more sense as I was thinking that I needed to protect the entire byte as "time critical" - but in fact only each "bit" (or reset sequence) is. I'll see what I can do with this.
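To make the per-bit idea concrete, here is a small sketch of how each 1-Wire time slot can be described as a "drive low, then release" pair, which is the kind of sequence a timing peripheral could replay. The timings are the commonly quoted standard-speed values (6 µs start pulse, 60 µs write-0 low time, 70 µs slot); the struct, names and the 15 µs decode threshold are assumptions for illustration, not the RMT driver's types and not taken from the NodeMCU code.

```c
#include <assert.h>
#include <stdint.h>

/* One 1-Wire time slot as seen by the master. */
typedef struct {
    uint16_t low_us;     /* master holds the bus low */
    uint16_t release_us; /* bus released: pull-up or slave sets the level */
} ow_slot_t;

enum {
    OW_SHORT_LOW_US = 6,  /* start pulse for write-1 and read slots */
    OW_LONG_LOW_US  = 60, /* low time for write-0 */
    OW_SLOT_US      = 70, /* total slot length including recovery */
    OW_ZERO_MIN_US  = 15  /* assumed threshold: a slave sending 0 holds low at least this long */
};

/* Slot that writes a single bit. */
ow_slot_t ow_write_slot(int bit)
{
    assert(bit == 0 || bit == 1);
    ow_slot_t slot;
    slot.low_us = bit ? OW_SHORT_LOW_US : OW_LONG_LOW_US;
    slot.release_us = (uint16_t)(OW_SLOT_US - slot.low_us);
    return slot;
}

/* A read slot looks like a write-1 from the master's side: a short low
 * pulse, then release so the slave can stretch the low period. */
ow_slot_t ow_read_slot(void)
{
    return ow_write_slot(1);
}

/* Decode a read slot from the low time measured on the receive side. */
int ow_decode_read_slot(uint16_t measured_low_us)
{
    return measured_low_us < OW_ZERO_MIN_US ? 1 : 0;
}
```

Seen this way, only about 70 µs per bit needs exact timing; the gaps between slots can stretch without upsetting the sensor.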

Incidentally, I tried wrapping my one-wire transactions with taskENTER_CRITICAL/taskEXIT_CRITICAL and that seems to prevent task interruption (as expected) - and since it's < 1ms it doesn't seem to break the WiFi/LWIP - but I do need to test that more to be sure.

Post by **meowsqueak** » Mon Sep 04, 2017 5:34 am

WiFive wrote: https://github.com/nodemcu/nodemcu-firm ... /onewire.c

There's one thing I don't understand about that code - when reading bits around line 378 the pin is put into INPUT mode (via onewire_rmt_attach_pin, line 200), then at 381 the wire is put into open-drain mode, and then a TX channel is used to generate the read-bit timeslots. However at line 392 the RMT is instructed to write these out to the wire, just after starting the RX "record".

How is it possible to drive the timeslots in the TX direction when the pin is set to the input/RX direction? The port is in open-drain mode so I suppose as long as the driver is not pulling it high then the wire is floating during the read timeslot - is that how it works? But why does the TX direction even drive the pin if it's not in output mode to start with?

Post by **meowsqueak** » Mon Sep 04, 2017 6:08 am

After realising above that only the read/write bit transactions are timing sensitive and need protection, I moved my top-level taskENTER_CRITICAL calls to wrap just the bit read/write - this means that interrupts get disabled/enabled much more often, but for a significantly shorter period of time overall.

Unfortunately this causes the ESP32 to crash repeatedly and fairly randomly, or causes the sensor task to simply stop running. I've triple-checked my enter/exit pairs and there's also no recursive use, so I'm not sure what's going on here. Maybe toggling the interrupt enable control too often or too quickly is fatal?

Post by **WiFive** » Mon Sep 04, 2017 10:21 pm

Input and output can both be enabled on the same pin, so when the driver is open drain, an external device can pull the line low and generate useful input.

Post by **meowsqueak** » Wed Sep 06, 2017 2:40 am

I'm aware of the Input/Output mode that can be used for open-drain operation, but I can't see where the code sets that mode. It sets the pin to input mode as per line 200:

https://github.com/nodemcu/nodemcu-firm ... ire.c#L200

I note that the RMT is set for both TX and RX mode on the same pin - perhaps that implies that it's also in output mode.

Just trying to understand how the RMT interacts with the normal GPIO drive mode.

Post by **WiFive** » Wed Sep 06, 2017 3:06 am

https://github.com/espressif/esp-idf/issues/809
