ESP32-S3 GP-SPI2 (FSPI) DMA Configurable Segmented Transfer

Posted: Tue Sep 23, 2025 1:46 am
by ryancog
Hello,

I've something of an X/Y problem, but I'll explain the Y first (as it's the title of this post and a matter of curiosity at this point, at the very least), and follow up with the X.

The Title (Y) Problem

I'm trying to set up the SPI peripheral with the configurable segmented transfer (in master mode), and am a bit confused. As far as I can tell, the only difference between a normal transfer and the segmented transfer is the expected SPI_BIT_MAP_WORD (and any specified trailing words), the slave.usr_conf value being set, and the slave.dma_seg_magic_value.

I've read through Chapter 30 of the TRM a few times (section 30.5.8.5 in particular), and between that and dissecting the ESP-IDF SPI API, I've come up with the following minimal master example:

    constexpr auto& SPI_CH{GPSPI2};
    constexpr auto& DMA_CH{GDMA.channel[3]};

    SYSTEM.perip_clk_en0.spi2_clk_en = 1;
    SYSTEM.perip_rst_en0.spi2_rst = 1;
    SYSTEM.perip_rst_en0.spi2_rst = 0;

    DMA_CH.out.conf0.out_rst = 1;
    DMA_CH.out.conf0.out_rst = 0;
    DMA_CH.out.peri_sel.sel = 0;

    REG_SET_FIELD(IO_MUX_GPIO0_REG + (GPIO_NUM_15 * 4), MCU_SEL, PIN_FUNC_GPIO);
    REG_SET_FIELD(IO_MUX_GPIO0_REG + (GPIO_NUM_17 * 4), MCU_SEL, PIN_FUNC_GPIO);
    REG_SET_FIELD(IO_MUX_GPIO0_REG + (GPIO_NUM_16 * 4), MCU_SEL, PIN_FUNC_GPIO);

    esp_rom_gpio_connect_out_signal(GPIO_NUM_15, FSPICLK_OUT_IDX, false, false);
    esp_rom_gpio_connect_out_signal(GPIO_NUM_17, FSPID_OUT_IDX, false, false);
    esp_rom_gpio_connect_out_signal(GPIO_NUM_16, FSPIQ_OUT_IDX, false, false);

    SPI_CH.user1.cs_setup_time = 0;
    SPI_CH.user1.cs_hold_time = 0;

    SPI_CH.user.usr_miso_highpart = 0;
    SPI_CH.user.usr_mosi_highpart = 0;

    SPI_CH.slave.val = 0;
    SPI_CH.user.val = 0;

    SPI_CH.clk_gate.mst_clk_active = 1;
    SPI_CH.clk_gate.mst_clk_sel = 1;

    SPI_CH.dma_conf.val = 0;
    SPI_CH.dma_conf.tx_seg_trans_clr_en = 1;
    SPI_CH.dma_conf.rx_seg_trans_clr_en = 1;
    SPI_CH.dma_conf.dma_seg_trans_en = 0;

    SPI_CH.dma_int_ena.val = 0xFFFFFFFF;

    SPI_CH.ctrl.d_pol = false;

    // Update and wait for update...
    SPI_CH.cmd.update = 1;
    while (SPI_CH.cmd.update);

    array<uint8, 1023> dataBuffer;
    dataBuffer.fill(0b10100101);

    while (true) {
        SPI_CH.misc.master_cs_pol = 0;

        SPI_CH.clock.clk_equ_sysclk = 0;
        SPI_CH.clock.clkdiv_pre = 0;
        SPI_CH.clock.clkcnt_n = 3;
        SPI_CH.clock.clkcnt_l = 3;
        SPI_CH.clock.clkcnt_h = 1;

        SPI_CH.ctrl.wr_bit_order = 0;

        // These correspond to SPI "modes"
        SPI_CH.misc.ck_idle_edge = 0;
        SPI_CH.user.ck_out_edge = 0;

        SPI_CH.user.doutdin = 0;

        SPI_CH.user.sio = 0;

        SPI_CH.user.cs_hold = 0;
        SPI_CH.user.cs_setup = 0;

        SPI_CH.misc.cs0_dis = 0;
        SPI_CH.misc.cs1_dis = 1;
        SPI_CH.misc.cs2_dis = 1;
        SPI_CH.misc.cs3_dis = 1;
        SPI_CH.misc.cs4_dis = 1;
        SPI_CH.misc.cs5_dis = 1;

        SPI_CH.clk_gate.mst_clk_sel = 1;

        SPI_CH.dma_int_clr.trans_done = 1;
        assert(SPI_CH.cmd.usr == 0);

        SPI_CH.ctrl.val &= ~SPI_LL_ONE_LINE_CTRL_MASK;
        SPI_CH.user.val &= ~SPI_LL_ONE_LINE_USER_MASK;
        SPI_CH.user.fwrite_dual = 1;

        SPI_CH.ms_dlen.ms_data_bitlen = (1023 * 8) - 1;

        SPI_CH.user.usr_addr = 0;
        SPI_CH.user.usr_command = 0;

        array<Hardware::DMA::Outlink, 1> outlinks;
        outlinks[0].size = dataBuffer.size();
        outlinks[0].length = dataBuffer.size();
        outlinks[0].buffer = dataBuffer.data();
        outlinks[0].suc_eof = 1;
        outlinks[0].ownedByDMA = 1;
        outlinks[0].next = nullptr;

        DMA_CH.out.conf0.out_rst = 1;
        DMA_CH.out.conf0.out_rst = 0;

        SPI_CH.dma_conf.dma_afifo_rst = 1;
        SPI_CH.dma_conf.dma_afifo_rst = 0;

        SPI_CH.dma_int_clr.val = 0xFFFFFFFF;

        SPI_CH.dma_conf.dma_tx_ena = 1;
        SPI_CH.dma_conf.dma_rx_ena = 1;

        DMA_CH.out.link.addr = reinterpret_cast<uint32>(outlinks.data());
        DMA_CH.out.link.start = 1;

        SPI_CH.user.usr_mosi = 1;

        // Update and wait for update...
        SPI_CH.cmd.update = 1;
        while (SPI_CH.cmd.update);

        SPI_CH.cmd.usr = 1;

        while (not SPI_CH.dma_int_raw.trans_done);

        Log::dbug("Core0", "Trans Done: ", Log::HEX, SPI_CH.dma_int_raw.val);
        vTaskDelay(10);
    }

This is more or less what spi_bus_initialize, spi_bus_add_device, and (in the while loop) spi_device_polling_transmit decomposes to, as best as I can tell. As usual (I've deconstructed a few ESP-IDF APIs this way, for different reasons), the flow doesn't quite match the TRM, but generally it's pretty close, and obviously it works.

I just have this directly within my app_main.

Here, the magic value has been cleared (the entire slave register has been), and so if I don't want anything to be changed for the conf transaction (with the purpose of a minimal example), it's my understanding I should simply be able to make sure user.usr_conf_nxt is clear, have a zeroed SPI_BIT_MAP_WORD at the beginning of my data buffer, and set slave.usr_conf, and things should "Just Work."

I've enabled all interrupts and poll dma_int_raw for any of them, yet nothing seems to happen after I set cmd.usr: the while loop spins until the watchdog triggers, and I see nothing on the outputs on the oscilloscope. This is the modified code:

    constexpr auto& SPI_CH{GPSPI2};
    constexpr auto& DMA_CH{GDMA.channel[3]};

    SYSTEM.perip_clk_en0.spi2_clk_en = 1;
    SYSTEM.perip_rst_en0.spi2_rst = 1;
    SYSTEM.perip_rst_en0.spi2_rst = 0;

    DMA_CH.out.conf0.out_rst = 1;
    DMA_CH.out.conf0.out_rst = 0;
    DMA_CH.out.peri_sel.sel = 0;

    REG_SET_FIELD(IO_MUX_GPIO0_REG + (GPIO_NUM_15 * 4), MCU_SEL, PIN_FUNC_GPIO);
    REG_SET_FIELD(IO_MUX_GPIO0_REG + (GPIO_NUM_17 * 4), MCU_SEL, PIN_FUNC_GPIO);
    REG_SET_FIELD(IO_MUX_GPIO0_REG + (GPIO_NUM_16 * 4), MCU_SEL, PIN_FUNC_GPIO);

    esp_rom_gpio_connect_out_signal(GPIO_NUM_15, FSPICLK_OUT_IDX, false, false);
    esp_rom_gpio_connect_out_signal(GPIO_NUM_17, FSPID_OUT_IDX, false, false);
    esp_rom_gpio_connect_out_signal(GPIO_NUM_16, FSPIQ_OUT_IDX, false, false);

    SPI_CH.user1.cs_setup_time = 0;
    SPI_CH.user1.cs_hold_time = 0;

    SPI_CH.user.usr_miso_highpart = 0;
    SPI_CH.user.usr_mosi_highpart = 0;

    SPI_CH.slave.val = 0;
    SPI_CH.user.val = 0;

    SPI_CH.clk_gate.mst_clk_active = 1;
    SPI_CH.clk_gate.mst_clk_sel = 1;

    SPI_CH.dma_conf.val = 0;
    SPI_CH.dma_conf.tx_seg_trans_clr_en = 1;
    SPI_CH.dma_conf.rx_seg_trans_clr_en = 1;
    SPI_CH.dma_conf.dma_seg_trans_en = 0;

    SPI_CH.dma_int_ena.val = 0xFFFFFFFF;

    SPI_CH.ctrl.d_pol = false;

    // Update and wait for update...
    SPI_CH.cmd.update = 1;
    while (SPI_CH.cmd.update);

    array<uint8, 1023 + 4> dataBuffer;
    dataBuffer.fill(0b10100101);
    dataBuffer[0] = 0;
    dataBuffer[1] = 0;
    dataBuffer[2] = 0;
    dataBuffer[3] = 0;

    while (true) {
        SPI_CH.misc.master_cs_pol = 0;

        SPI_CH.clock.clk_equ_sysclk = 0;
        SPI_CH.clock.clkdiv_pre = 0;
        SPI_CH.clock.clkcnt_n = 3;
        SPI_CH.clock.clkcnt_l = 3;
        SPI_CH.clock.clkcnt_h = 1;

        SPI_CH.ctrl.wr_bit_order = 0;

        // These correspond to SPI "modes"
        SPI_CH.misc.ck_idle_edge = 0;
        SPI_CH.user.ck_out_edge = 0;

        SPI_CH.user.doutdin = 0;

        SPI_CH.user.sio = 0;

        SPI_CH.user.cs_hold = 0;
        SPI_CH.user.cs_setup = 0;

        SPI_CH.misc.cs0_dis = 0;
        SPI_CH.misc.cs1_dis = 1;
        SPI_CH.misc.cs2_dis = 1;
        SPI_CH.misc.cs3_dis = 1;
        SPI_CH.misc.cs4_dis = 1;
        SPI_CH.misc.cs5_dis = 1;

        SPI_CH.clk_gate.mst_clk_sel = 1;

        SPI_CH.dma_int_clr.trans_done = 1;
        assert(SPI_CH.cmd.usr == 0);

        SPI_CH.ctrl.val &= ~SPI_LL_ONE_LINE_CTRL_MASK;
        SPI_CH.user.val &= ~SPI_LL_ONE_LINE_USER_MASK;
        SPI_CH.user.fwrite_dual = 1;

        SPI_CH.ms_dlen.ms_data_bitlen = (1023 * 8) - 1;

        SPI_CH.user.usr_addr = 0;
        SPI_CH.user.usr_command = 0;

        SPI_CH.user.usr_conf_nxt = 0;
        SPI_CH.slave.usr_conf = 1;

        array<Hardware::DMA::Outlink, 1> outlinks;
        outlinks[0].size = dataBuffer.size();
        outlinks[0].length = dataBuffer.size();
        outlinks[0].buffer = dataBuffer.data();
        outlinks[0].suc_eof = 1;
        outlinks[0].ownedByDMA = 1;
        outlinks[0].next = nullptr;

        DMA_CH.out.conf0.out_rst = 1;
        DMA_CH.out.conf0.out_rst = 0;

        SPI_CH.dma_conf.dma_afifo_rst = 1;
        SPI_CH.dma_conf.dma_afifo_rst = 0;

        SPI_CH.dma_int_clr.val = 0xFFFFFFFF;

        SPI_CH.dma_conf.dma_tx_ena = 1;
        SPI_CH.dma_conf.dma_rx_ena = 1;

        DMA_CH.out.link.addr = reinterpret_cast<uint32>(outlinks.data());
        DMA_CH.out.link.start = 1;

        SPI_CH.user.usr_mosi = 1;

        // Update and wait for update...
        SPI_CH.cmd.update = 1;
        while (SPI_CH.cmd.update);

        SPI_CH.cmd.usr = 1;

        while (not SPI_CH.dma_int_raw.val);

        Log::dbug("Core0", "Trans Done: ", Log::HEX, SPI_CH.dma_int_raw.val);
        vTaskDelay(10);
    }

I've been looking at this for a while, so there's a good chance I'm overlooking something relatively simple...

The X of the problem

Ultimately, my goal is to drive a shift register to generate PWM outputs. The shift register is a type with an additional output/storage register, so there's a total of 3 inputs: Serial Data In, Data Clock, and Storage Clock. The storage clock moves data from the shift register into a register which drives the IC outputs, so I want it to pulse every time I've sent out 8 bits (the width of the shift register).
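
To make the three-signal behaviour described above concrete, here is a minimal host-side sketch. It is a software model only, not driver code: the class `LatchedShiftRegister` and the helper `sendByte` are hypothetical names, and the MSB-first bit order is an assumption rather than something stated in the post.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software model of an 8-bit shift register with a separate
// storage (output) register. Not tied to any specific part number.
class LatchedShiftRegister {
public:
    // Data clock edge: shift one bit into the shift stage.
    void clockData(bool bit) {
        shift_ = static_cast<std::uint8_t>((shift_ << 1) | (bit ? 1u : 0u));
    }

    // Storage clock edge: copy the shift stage to the output register.
    void clockStorage() { outputs_ = shift_; }

    // Value currently driven on the outputs.
    std::uint8_t outputs() const { return outputs_; }

private:
    std::uint8_t shift_ = 0;
    std::uint8_t outputs_ = 0;
};

// Send one byte MSB first (assumed order), then pulse the storage clock once.
// This is the "one storage pulse per 8 data bits" pattern the post asks for.
inline void sendByte(LatchedShiftRegister& reg, std::uint8_t value) {
    for (int i = 7; i >= 0; --i) {
        reg.clockData(((value >> i) & 1u) != 0);
    }
    reg.clockStorage();
}
```

In this model the outputs only change when `clockStorage()` runs, which is why the storage clock has to pulse once per 8 data clocks for every PWM step to reach the pins.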

My original idea was that the shift register could be driven by the SPI peripheral, with the storage clock triggered by CS, however CS only toggles between each "transaction," and as far as I can tell the only way to get it to trigger every 8 bits like I want requires the segmented transfer feature (and at least a word/4 bytes for each 8 bits)?

In any case it seems I'll need the segmented transfer feature, as it seems it's the only way to cause infinite output with SPI.

However, to save memory, I figured I could use Dual SPI, with only the first bit of data1's bits set (for every 8 bits of data0), which would be satisfactory for 1023 samples (possible duty cycle increments). 16-bits for every 8 bits out, multiplied by 1023 is still ~2KB, which seems reasonable. Then I would only need a single bitmap word (to allow "restarting" the transaction), and loop the DMA outlinks.
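
As a rough illustration of the sample table this plan implies, the sketch below builds one 8-bit shift-register sample per PWM step and checks the ~2KB figure. The names `kSteps`, `kChannels`, `makeSamples` and `kDualBytes` are hypothetical, and the rule "a channel is high while the step index is below its duty value" is an assumed PWM scheme, not something the post specifies.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kSteps = 1023;   // samples per PWM period, as in the post
constexpr std::size_t kChannels = 8;   // one bit per shift register output

// Build one 8-bit sample per step: bit c is set while step < duty[c].
// duty[c] may range from 0 (always low) to kSteps (always high).
inline std::array<std::uint8_t, kSteps> makeSamples(
        const std::array<std::uint16_t, kChannels>& duty) {
    std::array<std::uint8_t, kSteps> samples{};
    for (std::size_t step = 0; step < kSteps; ++step) {
        std::uint8_t bits = 0;
        for (std::size_t c = 0; c < kChannels; ++c) {
            if (step < duty[c]) {
                bits = static_cast<std::uint8_t>(bits | (1u << c));
            }
        }
        samples[step] = bits;
    }
    return samples;
}

// In Dual SPI every 8-bit sample occupies 16 bits of buffer (8 data bits plus
// 8 bits on the second line), so the whole table takes 2 bytes per step.
constexpr std::size_t kDualBytes = kSteps * 2;
static_assert(kDualBytes == 2046, "1023 steps at 16 bits each is about 2KB");
```

With 1023 steps a duty value can take 1024 distinct levels (0 through 1023), and the Dual SPI buffer comes to 2046 bytes before any bitmap word is added.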

That brought up another uncertainty: how does alignment work with the bitmap word? Say I have a bitmap word followed by a number of data bits (equal to ms_dlen + 1). Should the next bitmap word be aligned to the next word? The next byte? How is that read? Otherwise I'm not sure how multiple conf buffers would work (which, as far as I can tell, is essentially what looping the outlinks would look like to the hardware).

Anyways, SPI still seems to be the best peripheral for this, assuming the capabilities I think the hardware has, and provided I'm able to get answers to those uncertainties.

Perhaps it is not though, in which case I'm open to suggestions for what peripherals may be better to drive this shift register which requires a clock, serial data, and essentially another clock signal with a x8 divider.

For clarity, I'm using the shift register in the first place because I'll already be using 4 PWM channels for other hardware (so a total of 12 PWM channels is my end goal), and I've only 4 GPIO pins left after all my other hardware is connected. I'm on a custom-designed PCB using an ESP32-S3 SoC.

Re: ESP32-S3 GP-SPI2 (FSPI) DMA Configurable Segmented Transfer

Posted: Fri Sep 26, 2025 9:16 am
by Sprite
I'm trying to set up the SPI peripheral with the configurable segmented transfer (in master mode), and am a bit confused. As far as I can tell, the only difference between a normal transfer and the segmented transfer is the expected SPI_BIT_MAP_WORD (and any specified trailing words), the slave.usr_conf value being set, and the slave.dma_seg_magic_value.
It's a bit more complicated. In segmented transfer, for each actual SPI transaction, you need two buffers (and as such 2 or more descriptors): the second one is your actual data, like you have, but the first one needs to be the special CONF buffer type. In your case, as you're not changing the config, it only needs to be the SPI_BIT_MAP_WORD and otherwise a bitmap of 0. But it does need to be a separate buffer with a separate descriptor pointing to it.
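
A minimal sketch of the layout described above: one descriptor pointing at a separate CONF buffer, chained to a second descriptor pointing at the data. Everything here is illustrative. `Outlink` is a hypothetical host-side struct that only borrows the field names used in the first post, not the real hardware descriptor layout; `SegmentChain` and `buildSegment` are made-up names; and setting `suc_eof` only on the last descriptor is an assumption the thread does not confirm.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical descriptor type. Field names follow the post's usage of
// Hardware::DMA::Outlink; the real hardware layout is not modelled here.
struct Outlink {
    std::uint16_t size = 0;       // buffer capacity in bytes
    std::uint16_t length = 0;     // valid bytes in the buffer
    bool suc_eof = false;         // marks the end of the chain
    bool ownedByDMA = false;
    const void* buffer = nullptr;
    Outlink* next = nullptr;
};

// One segment: a CONF buffer and its descriptor, followed by a data descriptor.
// The object must stay in place after buildSegment(), because the descriptors
// hold pointers into it.
struct SegmentChain {
    std::array<std::uint32_t, 1> conf{};  // zeroed SPI_BIT_MAP_WORD, nothing updated
    std::array<Outlink, 2> links{};
};

inline void buildSegment(SegmentChain& chain, const std::uint8_t* data,
                         std::uint16_t dataLen) {
    // First descriptor: the CONF buffer, kept separate from the data.
    chain.links[0].size = sizeof(chain.conf);
    chain.links[0].length = sizeof(chain.conf);
    chain.links[0].buffer = chain.conf.data();
    chain.links[0].ownedByDMA = true;
    chain.links[0].suc_eof = false;
    chain.links[0].next = &chain.links[1];

    // Second descriptor: the actual payload.
    chain.links[1].size = dataLen;
    chain.links[1].length = dataLen;
    chain.links[1].buffer = data;
    chain.links[1].ownedByDMA = true;
    chain.links[1].suc_eof = true;   // assumption: EOF only on the last link
    chain.links[1].next = nullptr;
}
```

The point of the sketch is the structure, not the field values: compared with the single-descriptor attempt in the first post, the bitmap word moves out of the data buffer into its own buffer with its own descriptor.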
My original idea was that the shift register could be driven by the SPI peripheral, with the storage clock triggered by CS, however CS only toggles between each "transaction," and as far as I can tell the only way to get it to trigger every 8 bits like I want requires the segmented transfer feature (and at least a word/4 bytes for each 8 bits)?
Even more... Off the top of my head (as in don't take the values for granted), you'd need 2 descriptors of 12 bytes each, plus one byte data, plus one CONF field of 4 bytes, meaning 29 bytes per PWM step (so 10-bit PWM takes 29K of RAM).
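
The estimate above can be reproduced with a few constants. This only restates the figures quoted in the reply (which are given from memory there); the constant and function names are hypothetical, and taking 10-bit PWM to mean 1024 steps is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Figures as quoted in the reply; treat them as approximate.
constexpr std::size_t kDescriptorBytes = 12;    // one DMA descriptor
constexpr std::size_t kDescriptorsPerStep = 2;  // CONF descriptor + data descriptor
constexpr std::size_t kConfBytes = 4;           // the CONF field (bitmap word only)
constexpr std::size_t kDataBytes = 1;           // 8 bits for the shift register

// RAM needed for a single PWM step.
constexpr std::size_t bytesPerStep() {
    return kDescriptorsPerStep * kDescriptorBytes + kConfBytes + kDataBytes;
}

// RAM needed for a full PWM period of 2^pwmBits steps (assumed step count).
constexpr std::size_t totalBytes(unsigned pwmBits) {
    return bytesPerStep() * (std::size_t{1} << pwmBits);
}

static_assert(bytesPerStep() == 29, "2 * 12 + 4 + 1");
static_assert(totalBytes(10) == 29696, "29 bytes * 1024 steps, i.e. 29K");
```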
However, to save memory, I figured I could use Dual SPI, with only the first bit of data1's bits set (for every 8 bits of data0), which would be satisfactory for 1023 samples (possible duty cycle increments). 16-bits for every 8 bits out, multiplied by 1023 is still ~2KB, which seems reasonable. Then I would only need a single bitmap word (to allow "restarting" the transaction), and loop the DMA outlinks.
This is true. Note that the PWM values need to be interleaved, as in: pwm channel 0 is bit 0, pwm channel 1 is bit 2, pwm channel 2 is bit 4 etc.
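
The interleaving can be sketched as a small packing function: each 8-bit sample is spread over the even bits of a 16-bit word, leaving the odd bits for the second line (the storage clock). `interleave` and `kLatchBit` are hypothetical names, and the choice of bit 1 for the latch pulse is a placeholder. Which odd bit is the right one, which physical line carries the odd bits, and the byte order in the DMA buffer all depend on the hardware and are not settled in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder position for the storage-clock pulse; any odd bit index.
constexpr unsigned kLatchBit = 1;

// Spread an 8-bit sample over the even bits of a 16-bit word:
// channel 0 -> bit 0, channel 1 -> bit 2, channel 2 -> bit 4, and so on.
// If latch is true, one odd bit is also set for the second line.
constexpr std::uint16_t interleave(std::uint8_t sample, bool latch) {
    std::uint16_t word = 0;
    for (unsigned ch = 0; ch < 8; ++ch) {
        if (sample & (1u << ch)) {
            word = static_cast<std::uint16_t>(word | (1u << (2 * ch)));
        }
    }
    if (latch) {
        word = static_cast<std::uint16_t>(word | (1u << kLatchBit));
    }
    return word;
}
```

For example, with all eight channels high and no latch pulse the word is 0x5555, every even bit set and every odd bit clear.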

Re: ESP32-S3 GP-SPI2 (FSPI) DMA Configurable Segmented Transfer

Posted: Fri Sep 26, 2025 6:06 pm
by ryancog
In segmented transfer, for each actual SPI transaction, you need two buffers (and as such 2 or more descriptors): the second one is your actual data, like you have, but the first one needs to be the special CONF buffer type.
I see. I did read that in the TRM, and I thought I had set it up, but to no avail. Since I was under the impression that the outlinks didn’t make any difference to the peripheral (which, it seems, is not the case, at least for SPI in this mode?), I got rid of the extra outlink and buffer, which led to the example above. I’m wondering where I went wrong initially, because I did try with two outlinks.

When you say special CONF buffer type, is there anything special in the outlink about that? Or do you simply mean it contains the magic word and subsequent fields?
you'd need 2 descriptors of 12 bytes each, plus one byte data, plus one CONF field of 4 bytes, meaning 29 bytes per PWM step (so 10-bit PWM takes 29K of RAM).
Hah, yeah, that’s why I considered dual SPI instead, a lot of overhead that way.
Note that the PWM values need to be interleaved, as in: pwm channel 0 is bit 0, pwm channel 1 is bit 2, pwm channel 2 is bit 4 etc.
I did notice that, thank you.

In the meantime I set up I2S to do this, but depending on how I want to use my peripherals, I might try to revisit SPI in the future.

I’ll report back if I do, success or otherwise. I appreciate your clarifications. I find it curious that the DMA outlinks influence how the peripheral reads the data; I didn’t realize that the peripheral “noticed” the outlink boundaries/buffers.