Page 1 of 1

Solution needed for a custom bit banging protocol.

Posted: Thu Nov 13, 2025 9:17 am
by G2AURAV
Ok we are working on a device that had custom protocol from that had life span of 1995 to 2015 (This device is discontinued long time ago), this was a custom bit banging protocol, Similar SPI, but slave can also low the clock for ACK/BUSY similar to I2C clock stretching. What we are making is slave device of custom SPI.
• DCL/MOSI (C. device → L. device), input-only
• DLC/MISO (L. device → C. device), open-drain output
• LCLK/CLK (shared), input +open-drain
And many more behaviour, that cannot be filled.
The problem is that we are using ESP32 S3, no matter what changes we make, we are not able to control MISO, means we are not able to send desired byte required for communication. E.g. if master is sending 0x0A we want to reply 0xAA.

Short technical code summary
This project implements a small, timing-sensitive bit-engine that emulates a simple three-wire synchronous device. It receives bits on one wire, drives reply bits on another wire (open-drain), and observes a shared clock line. Key behaviour and components:

Overall architecture
o Edge ISR on the shared clock line — rising edges sample the inbound data line; falling edges prepare the outbound data bit.
o A tiny state machine tracks byte assembly (MSB-first), per-byte ACK timing, and reply shifting.

GPIO behaviour
o Inbound line configured as input with pull-up.
o Outbound data line configured open-drain (drive LOW for 0, release for 1).
o Clock line configured input + open-drain so the engine can also pull it LOW for ACKs.

Reply path (robust, low-latency)
o A lock-free single-producer/single-consumer FIFO (SPSC) is provided so the main application can enqueue reply bytes quickly (producer = main task, consumer = ISR).
o FIFO is power-of-two sized (default 8). When full it overwrites the oldest entry to guarantee the ISR always has something to read (configurable behaviour).
o If FIFO is empty, the ISR falls back to a single-staged atomic next_tx_byte or a forced one-shot reply created by a timer.

Timing & handshake
o When a full byte is assembled, the engine schedules an ACK pulse on the clock line after a configurable delay (ACK start / ACK low duration).
o A “first-edge one-shot” timer: on the very first observed edge the engine waits a short delay, pulls clock LOW for a longer hold (3 Ms default), then optionally forces a one-time reply value. This is implemented carefully so it does not stomp user-staged replies (uses a separate force_next_tx_byte flag).

ISR safety & ordering
o ISR latches reply bytes only at a byte boundary (rx_bitcount == 0) to avoid mid-byte misalignment.
o All cross-context shared state uses __atomic memory ops to avoid torn writes and subtle reorder races between ISR/timers and the main task.

Debugging & observability
o Lightweight counters exposed and logged once per second: total bytes driven, FIFO-consumed count, fallback-consumed count, no-reply count, enqueue-overwrite count, and current FIFO depth.
o Logging is optional and should be disabled during timing-sensitive tests.

APIs provided
o ef_be_init(...) — initialize engine with pin assignments and RX callback.
o ef_be_poll() — call from the main task to invoke the higher-level RX callback when a byte completes.
o ef_be_set_reply(byte) — atomic single-byte staging fallback.
o ef_be_enqueue_reply(byte) — fast, non-blocking enqueue into the reply FIFO (preferred).

Usage recommendations
o Keep the RX callback small and call ef_be_enqueue_reply() immediately (defer heavy processing to another task) so replies are staged with minimal latency.
o Run the task that polls ef_be_poll() at a high priority or in a tight loop to reduce callback latency.
o For timing debugging, disable verbose logging and use the built-in counters plus a logic analyser on the three signal lines.

Bit_engine.h

Code: Select all


#pragma once
#include <stdint.h>
#include <stdbool.h>
#include "driver/gpio.h"
#include "esp_err.h"

// RX callback: byte received + timestamp (microseconds)
typedef void (*ef_be_rx_cb_t)(uint8_t byte, uint32_t ts_us);

// Public API
esp_err_t ef_be_init(gpio_num_t dcl_mosi_in,
                     gpio_num_t dlc_miso_od,
                     gpio_num_t lclk_od,
                     ef_be_rx_cb_t on_rx);

void ef_be_enable(bool en);
void ef_be_poll(void);

// Stage a reply atomically (single-byte staging)
void ef_be_set_reply(uint8_t byte);

// Fast non-blocking enqueue for replies (producer: main/task, consumer: ISR)
bool ef_be_enqueue_reply(uint8_t b);

// Debug helper
void ef_be_debug_fifo(void);

Engine.c

Code: Select all

#include "bit_engine.h"
#include "esp_timer.h"
#include "esp_attr.h"
#include "esp_log.h"
#include <string.h>
#include "esp_rom_sys.h" // for esp_rom_delay_us()
#include "driver/gpio.h"
#include "esp_err.h"
#include <stdatomic.h>

// ===== Config =====
#define TAG "EF_BE"

// Logging toggle — set to 0 for timing runs
#ifndef EF_BE_LOGGING
#define EF_BE_LOGGING 1
#endif

#if EF_BE_LOGGING
  #define BE_LOGI(...) ESP_LOGI(TAG, __VA_ARGS__)
  #define BE_LOGD(...) ESP_LOGD(TAG, __VA_ARGS__)
#else
  #define BE_LOGI(...) do{}while(0)
  #define BE_LOGD(...) do{}while(0)
#endif

// Timing (microseconds) — tune to your hardware
static const uint32_t T_ACK_LOW_US        = 25;
static const uint32_t T_ACK_START_US      = 14;
static const uint32_t T_RELEASE_DLC_US    = 5;

static const uint32_t T_FIRST_LOW_DELAY_US = 250;
static const uint32_t T_FIRST_LOW_HOLD_US  = 3000;

typedef enum { BE_IDLE=0, BE_RECV, BE_ACKING } be_state_t;

static struct {
    gpio_num_t pin_dcl_in;   // DCL/MOSI (C.device→L.device), input-only
    gpio_num_t pin_dlc_od;   // DLC/MISO (L.device→C.device), open-drain output
    gpio_num_t pin_lclk_io;  // LCLK/CLK  (shared),    input+open-drain

    volatile be_state_t state;
    volatile bool enabled;

    volatile uint8_t  rx_shift;
    volatile uint8_t  rx_bitcount;
    volatile uint8_t  last_rx_byte;
    volatile uint32_t last_rise_ts_us;
    volatile bool     cb_pending;

    // single staged reply (legacy fallback)
    volatile uint8_t  next_tx_byte;
    volatile uint8_t  cur_tx_byte;
    volatile int8_t   tx_bit_idx;
    volatile bool     tx_armed;

    // forced-one-shot from timer (separate from next_tx_byte)
    volatile bool     force_next_tx_once;
    volatile uint8_t  force_next_tx_byte;

    ef_be_rx_cb_t rx_cb;

    // timers
    esp_timer_handle_t t_release_dlc;
    esp_timer_handle_t t_ack_start;
    esp_timer_handle_t t_ack_end;
    esp_timer_handle_t t_first_low;

    // first-edge one-shot state
    volatile bool first_edge_seen;
    volatile bool in_first_low;
} be;

// ---------- reply FIFO (SPSC) ----------
#define REPLY_FIFO_SZ 4
_Static_assert((REPLY_FIFO_SZ & (REPLY_FIFO_SZ - 1)) == 0, "REPLY_FIFO_SZ must be power of two");

static volatile uint8_t reply_fifo[REPLY_FIFO_SZ];
static volatile uint8_t reply_head; // producer index (main)
static volatile uint8_t reply_tail; // consumer index (ISR)

// ---------- helpers ----------
static inline uint32_t now_us(void) { return (uint32_t)esp_timer_get_time(); }
static inline void od_pull_low(gpio_num_t pin) { gpio_set_level(pin, 0); }
static inline void od_release (gpio_num_t pin) { gpio_set_level(pin, 1); }

// FIFO helpers
static inline uint8_t reply_count(void) {
    uint8_t h = __atomic_load_n(&reply_head, __ATOMIC_ACQUIRE);
    uint8_t t = __atomic_load_n(&reply_tail, __ATOMIC_ACQUIRE);
    return (uint8_t)(h - t) & (REPLY_FIFO_SZ - 1);
}

bool ef_be_enqueue_reply(uint8_t b) {
    uint8_t h = __atomic_load_n(&reply_head, __ATOMIC_RELAXED);
    uint8_t next = (uint8_t)((h + 1) & (REPLY_FIFO_SZ - 1));
    uint8_t t = __atomic_load_n(&reply_tail, __ATOMIC_ACQUIRE);
    if (next == t) {
        // FIFO full
        return false;
    }
    reply_fifo[h] = b;
    __atomic_store_n(&reply_head, next, __ATOMIC_RELEASE);
    return true;
}

// ISR-side dequeue (inline)
static inline bool ef_be_dequeue_reply_from_isr(uint8_t *out) {
    uint8_t t = __atomic_load_n(&reply_tail, __ATOMIC_RELAXED);
    uint8_t h = __atomic_load_n(&reply_head, __ATOMIC_ACQUIRE);
    if (t == h) return false; // empty
    *out = reply_fifo[t];
    __atomic_store_n(&reply_tail, (uint8_t)((t + 1) & (REPLY_FIFO_SZ - 1)), __ATOMIC_RELEASE);
    return true;
}

// atomic staged setter (task context)
void ef_be_set_reply(uint8_t byte) {
    // store value then arm; use atomic ops to avoid torn writes / reordering
    __atomic_store_n(&be.next_tx_byte, byte, __ATOMIC_RELAXED);
    __atomic_store_n(&be.tx_armed, true, __ATOMIC_RELEASE);
    BE_LOGD("ef_be_set_reply: staged 0x%02X", byte);
}

void ef_be_debug_fifo(void) {
    BE_LOGI("fifo head=%u tail=%u cnt=%u", reply_head, reply_tail, reply_count());
}

// ===== Timer callbacks =====
static void t_release_dlc_cb(void* arg) {
    od_release(be.pin_dlc_od);
    BE_LOGD("t_release_dlc_cb: released DLC");
}

static void t_ack_end_cb(void* arg) {
    od_release(be.pin_lclk_io);
    be.state = BE_IDLE;
    BE_LOGD("t_ack_end_cb: released LCLK");
}

static void t_ack_start_cb(void* arg) {
    if (!be.enabled) return;
    be.state = BE_ACKING;
    BE_LOGD("t_ack_start_cb: pulling LCLK low for ACK");
    od_pull_low(be.pin_lclk_io);
    esp_timer_start_once(be.t_ack_end, T_ACK_LOW_US);
}

// t_first_low_cb: perform first-edge one-shot, but do NOT overwrite user's next_tx_byte.
// Instead set force_next_tx_byte/flag so ISR consumes it only if FIFO empty and at byte boundary.
static void t_first_low_cb(void* arg) {
    if (!be.enabled) return;

    BE_LOGI("t_first_low_cb: executing first-edge one-shot");
    be.in_first_low = true;

    // ensure DLC released while we hold LCLK low
    od_release(be.pin_dlc_od);
    esp_rom_delay_us(10);

    od_pull_low(be.pin_lclk_io);
    esp_rom_delay_us(T_FIRST_LOW_HOLD_US);
    od_release(be.pin_lclk_io);

    be.in_first_low = false;

    // Force next TX to 0xAA once (write to force flag only)
    be.force_next_tx_byte = 0xAA;
    be.force_next_tx_once = true;
    BE_LOGI("t_first_low_cb: scheduled force_next_tx_byte=0xAA");
}

// ===== LCLK edge ISR (IRAM) =====
static void IRAM_ATTR isr_lclk_edge(void* arg) {
    if (!be.enabled) return;

    int level = gpio_get_level(be.pin_lclk_io);
    uint32_t ts = now_us();

    // schedule the one-shot first-edge action the very first time we see a (C.device) edge.
    if (!be.first_edge_seen && be.state != BE_ACKING) {
        be.first_edge_seen = true;
        esp_timer_start_once(be.t_first_low, T_FIRST_LOW_DELAY_US);
    }

    // ignore edges caused by our own ACK pulse
    if (be.state == BE_ACKING) return;

    if (level == 1) {
        // Rising edge (sample)
        uint32_t bit = (uint32_t)gpio_get_level(be.pin_dcl_in) & 0x1;
        be.rx_shift = (be.rx_shift << 1) | (uint8_t)bit;
        be.rx_bitcount++;
        be.last_rise_ts_us = ts;

        // schedule DLC release after the last rising edge
        esp_timer_stop(be.t_release_dlc);
        esp_timer_start_once(be.t_release_dlc, T_RELEASE_DLC_US);

        if (be.rx_bitcount >= 8) {
            be.last_rx_byte = be.rx_shift;
            be.rx_bitcount = 0;
            be.rx_shift = 0;
            be.cb_pending = true;
            be.tx_bit_idx = -1; // byte finished, mark tx idle

            // schedule ACK
            esp_timer_stop(be.t_ack_start);
            esp_timer_start_once(be.t_ack_start, T_ACK_START_US);
        }
    } else {
        // Falling edge - prepare DLC for the next rising sample
        if (be.in_first_low) {
            // keep DLC released during one-shot
            return;
        }

        // Latch reply at beginning of a byte only if rx_bitcount == 0
        if (be.tx_bit_idx == -1 && be.rx_bitcount == 0) {
            uint8_t from_fifo;
            if (ef_be_dequeue_reply_from_isr(&from_fifo)) {
                // use FIFO-provided reply
                be.cur_tx_byte = from_fifo;
                be.tx_bit_idx  = 7;
            } else {
                // No FIFO entry — prefer forced-one-shot if available, otherwise staged next_tx_byte
                if (be.force_next_tx_once) {
                    be.cur_tx_byte = be.force_next_tx_byte;
                    be.tx_bit_idx = 7;
                    be.force_next_tx_once = false;
                } else if (be.tx_armed) {
                    be.cur_tx_byte = be.next_tx_byte;
                    be.tx_bit_idx  = 7;
                    be.tx_armed = false;
                } else {
                    be.tx_bit_idx = -1;
                }
            }
        }

        if (be.tx_bit_idx >= 0) {
            uint8_t outbit = (be.cur_tx_byte >> be.tx_bit_idx) & 0x1;
            if (outbit == 0) od_pull_low(be.pin_dlc_od);
            else             od_release(be.pin_dlc_od);
            be.tx_bit_idx--;
        } else {
            od_release(be.pin_dlc_od);
        }
    }
}

// ===== GPIO config =====
static void gpio_conf_input(gpio_num_t pin) {
    gpio_config_t io = {
        .pin_bit_mask = 1ULL << pin,
        .mode = GPIO_MODE_INPUT,
        .pull_up_en = GPIO_PULLUP_ENABLE,
        .pull_down_en = GPIO_PULLDOWN_DISABLE,
        .intr_type = GPIO_INTR_ANYEDGE
    };
    gpio_config(&io);
}

static void gpio_conf_od(gpio_num_t pin) {
    gpio_config_t io = {
        .pin_bit_mask = 1ULL << pin,
        .mode = GPIO_MODE_OUTPUT_OD,
        .pull_up_en = GPIO_PULLUP_DISABLE,
        .pull_down_en = GPIO_PULLDOWN_DISABLE,
        .intr_type = GPIO_INTR_DISABLE
    };
    gpio_config(&io);
    gpio_set_level(pin, 1);
}

static void gpio_conf_od_input_output(gpio_num_t pin) {
    gpio_config_t io = {
        .pin_bit_mask = 1ULL << pin,
        .mode = GPIO_MODE_INPUT_OUTPUT_OD,
        .pull_up_en = GPIO_PULLUP_ENABLE,   // enable internal pull-up for LCLK to ensure edges
        .pull_down_en = GPIO_PULLDOWN_DISABLE,
        .intr_type = GPIO_INTR_ANYEDGE
    };
    gpio_config(&io);
    gpio_set_level(pin, 1);
}

static void make_timers(void) {
    const esp_timer_create_args_t t1 = { .callback = &t_release_dlc_cb, .arg = NULL, .name = "be_relDLC" };
    const esp_timer_create_args_t t2 = { .callback = &t_ack_start_cb,   .arg = NULL, .name = "be_ackStart" };
    const esp_timer_create_args_t t3 = { .callback = &t_ack_end_cb,     .arg = NULL, .name = "be_ackEnd" };
    const esp_timer_create_args_t t4 = { .callback = &t_first_low_cb,   .arg = NULL, .name = "be_firstLow" };
    esp_timer_create(&t1, &be.t_release_dlc);
    esp_timer_create(&t2, &be.t_ack_start);
    esp_timer_create(&t3, &be.t_ack_end);
    esp_timer_create(&t4, &be.t_first_low);
}

// ===== Public API =====
esp_err_t ef_be_init(gpio_num_t dcl_mosi_in,
                     gpio_num_t dlc_miso_od,
                     gpio_num_t lclk_od,
                     ef_be_rx_cb_t on_rx)
{
    memset(&be, 0, sizeof(be));
    be.pin_dcl_in  = dcl_mosi_in;
    be.pin_dlc_od  = dlc_miso_od;
    be.pin_lclk_io = lclk_od;
    be.rx_cb       = on_rx;
    be.enabled     = false;

    be.next_tx_byte = 0x01;
    be.cur_tx_byte  = be.next_tx_byte;
    be.tx_bit_idx   = -1;
    be.tx_armed     = true;

    be.force_next_tx_once = false;
    be.force_next_tx_byte = 0x00;

    be.first_edge_seen = false;
    be.in_first_low = false;

    // FIFO init
    __atomic_store_n(&reply_head, 0, __ATOMIC_RELAXED);
    __atomic_store_n(&reply_tail, 0, __ATOMIC_RELAXED);

    // Configure pins
    gpio_conf_input(be.pin_dcl_in);               // DCL input (internal pull-up)
    gpio_conf_od(be.pin_dlc_od);                  // DLC open-drain output
    gpio_conf_od_input_output(be.pin_lclk_io);    // LCLK input+OD with internal pull-up

    // Install ISR service (ignore if already installed)
    esp_err_t err = gpio_install_isr_service(0);
    if (err != ESP_OK && err != ESP_ERR_INVALID_STATE) {
        BE_LOGI("gpio_install_isr_service failed: %d", err);
        return err;
    }
    gpio_isr_handler_add(be.pin_lclk_io, isr_lclk_edge, NULL);

    make_timers();

    // Ensure lines are released initially
    od_release(be.pin_dlc_od);
    od_release(be.pin_lclk_io);

    be.state = BE_IDLE;
    be.enabled = true;

    BE_LOGI("ef_be_init done: DCL=%d DLC=%d LCLK=%d", be.pin_dcl_in, be.pin_dlc_od, be.pin_lclk_io);
    return ESP_OK;
}

void ef_be_enable(bool en) {
    be.enabled = en;
    if (!en) {
        od_release(be.pin_dlc_od);
        od_release(be.pin_lclk_io);
        be.state = BE_IDLE;
    }
}

void ef_be_poll(void) {
    if (be.cb_pending) {
        be.cb_pending = false;
        if (be.rx_cb) be.rx_cb(be.last_rx_byte, be.last_rise_ts_us);
        BE_LOGD("ef_be_poll: rx_cb called for 0x%02X", be.last_rx_byte);
    }
}
Main.c

Code: Select all

#include <stdio.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_log.h"
#include "driver/gpio.h"
#include "bit_engine.h"

#define PIN_DCL GPIO_NUM_18   // C.device-> L.device (MOSI)
#define PIN_DLC GPIO_NUM_19   // L.device-> C.device (MISO, open-drain)
#define PIN_LCLK GPIO_NUM_21  // shared clock (LCLK)

static const char *TAG = "MAIN_EX";

// Simple reply decision: echo 0xAA for pings (0x0A), else reply 0x55
static void on_rx(uint8_t byte, uint32_t ts_us) {
    ESP_LOGI(TAG, "on_rx: got 0x%02X @%u", byte, ts_us);
    uint8_t reply = 0x55;
    if (byte == 0x0A || byte == 0x00) reply = 0xAA;

    // attempt fast enqueue; if FIFO full fallback to set_reply
    if (!ef_be_enqueue_reply(reply)) {
        ef_be_set_reply(reply);
        ESP_LOGW(TAG, "reply fifo full — used ef_be_set_reply()");
    }
}

void be_task(void *arg) {
    // init engine
    if (ef_be_init(PIN_DCL, PIN_DLC, PIN_LCLK, on_rx) != ESP_OK) {
        ESP_LOGE(TAG, "ef_be_init failed");
        vTaskDelete(NULL);
        return;
    }

    while (1) {
        ef_be_poll(); // calls on_rx via cb_pending
        vTaskDelay(pdMS_TO_TICKS(1)); // poll every 1ms (tune as desired)
    }
}

void app_main(void) {
    xTaskCreatePinnedToCore(be_task, "be_task", 4096, NULL, 5, NULL, 0);
}

Re: Solution needed for a custom bit banging protocol.

Posted: Wed Jan 14, 2026 9:42 pm
by real_mon2
TLDR;

Why not use a low density FPGA and marry to the ESP32 or BEKEN (cheaper) with WIFI / BLE interface over I2C / SMBUS / SPI?

FPGAs are very inexpensive and and kick arse on any custom spec like you have raised. Try the Gowin devices.

Try the LILY-GO T-FPGA board based on the ESP32-S3 (can run Arduino) + Gowin FPGA and this toolkit is < $50 USD from Aliexpress. We absolutely love this combination that allows for quick prototyping of our crazy ideas. We are using Verilog for the FPGA coding.

Re: Solution needed for a custom bit banging protocol.

Posted: Thu Jan 15, 2026 12:20 am
by Sprite
Moving this topic to the hardware forum, as it has little to do with the IoT solution libraries we provide.

Can you expand a little bit more on the requirements of the protocol? How fast does it run, what specifically is on the line, what latencies are acceptable etc? I can potentially infer it from your code but I really don't want to.

You say your issue is that you're 'not able to control MISO, means we are not able to send desired byte required for communication'. Can you expand a bit further on this: what do you expect to see, what do you actually see?

Re: Solution needed for a custom bit banging protocol.

Posted: Thu Jan 15, 2026 12:21 am
by Sprite
FPGAs are very inexpensive and and kick arse on any custom spec like you have raised. Try the Gowin devices.
Offtopic, but where do you get the bare Gowin chips for your production runs? I'm not having an easy time tracking down a reliable source.

Re: Solution needed for a custom bit banging protocol.

Posted: Thu Jan 15, 2026 7:40 pm
by real_mon2
Locally - DBlectro (Quebec, Canada) for us.

USA - you have a mix of vendors selling the line like Edge Electronics. If really stuck, there is Mouser but $$ higher than others.

With some effort, Shenzhen, CN. Reach out to those who build kits using the same parts - ride them for silicon that they are already consuming. Regardless, the Gowin devices are easier (for us) to work with, the official toolchain compiles in seconds, IP is mainly stable albeit we have head bashed against Gowin; Efinix and also Lattice while learning about the fpga devices. Each stated it was us and not them. In the end, after too much time consumed by us, each vendor corrected the bugs and agreed that it was them who had the faults. We love the FPGA devices and see a huge potential. Sucks when the documentation has typos and/or broken IP examples and worse yet, encrypted IP that is broken. How can an end-user correct an encrypted IP?? The public Easy CDR public example on Gowin is still broken. We have the working version from the factory which consumed 3 months of complaints before they finally listened to our raised concerns. In summary, the IP works at slow speed and does not at the reported 1 Gbps using a simple wire loop-back test. Too many vendors want you to buy containers of their broken garbage and do so quickly. Nasty.

Re: Solution needed for a custom bit banging protocol.

Posted: Fri Jan 16, 2026 1:11 am
by Sprite
Locally - DBlectro (Quebec, Canada) for us.

USA - you have a mix of vendors selling the line like Edge Electronics. If really stuck, there is Mouser but $$ higher than others.
Thanks - didn't see Mouser stocked them by now. The usual suspects (lcsc, farnell/element14, digikey) didn't have them which made me a bit suspicious about availability.