Output >1MByte Buffer continuously from SPI-Buffer at high speeds

Posted: Tue Aug 26, 2025 1:53 pm
by mburger
Project-Goal:
I am working on a project where I need to drive an ADV7391 video encoder to generate an analog PAL signal.
I know that neither the ESP32-S3 nor the ESP32-P4 supports the BT.656 protocol out of the box, but I can generate this data myself (with SAV/EAV codes, porches, YCbCr color space and so on).
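
For readers unfamiliar with the framing mentioned above, here is a minimal sketch of how one line with EAV/SAV codes could be built in software. It assumes the usual 625-line BT.656 layout (EAV, 280 blanking bytes, SAV, 1440 active bytes, 1728 bytes in total) and is not the poster's actual code; `bt656_xy` and `bt656_fill_line` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    BT656_LINE_BYTES   = 1728, /* total bytes per line, as stated in the post */
    BT656_ACTIVE_BYTES = 1440, /* assumed: 720 pixels * 2 bytes (Cb Y Cr Y ...) */
    BT656_HBLANK_BYTES = BT656_LINE_BYTES - BT656_ACTIVE_BYTES - 8 /* 280 */
};

/* Build the fourth byte ("XY") of a timing reference code.
 * Bit 7 is always 1, followed by F (field), V (vertical blanking) and
 * H (0 = SAV, 1 = EAV), then four protection bits derived from F, V, H. */
uint8_t bt656_xy(unsigned f, unsigned v, unsigned h)
{
    f &= 1u; v &= 1u; h &= 1u;
    unsigned p3 = v ^ h;
    unsigned p2 = f ^ h;
    unsigned p1 = f ^ v;
    unsigned p0 = f ^ v ^ h;
    return (uint8_t)(0x80u | (f << 6) | (v << 5) | (h << 4) |
                     (p3 << 3) | (p2 << 2) | (p1 << 1) | p0);
}

/* Fill one 1728-byte line: EAV, horizontal blanking, SAV, active video.
 * If `active` is NULL or v != 0, the active part is filled with the
 * blanking level (Cb/Cr = 0x80, Y = 0x10) instead of picture data. */
void bt656_fill_line(uint8_t *line, unsigned f, unsigned v,
                     const uint8_t *active)
{
    size_t i = 0;

    /* EAV: FF 00 00 XY with H = 1 */
    line[i++] = 0xFF; line[i++] = 0x00; line[i++] = 0x00;
    line[i++] = bt656_xy(f, v, 1);

    /* Horizontal blanking */
    for (size_t n = 0; n < (size_t)BT656_HBLANK_BYTES; n += 2) {
        line[i++] = 0x80;
        line[i++] = 0x10;
    }

    /* SAV: FF 00 00 XY with H = 0 */
    line[i++] = 0xFF; line[i++] = 0x00; line[i++] = 0x00;
    line[i++] = bt656_xy(f, v, 0);

    /* Active video, or blanking level on vertical-blanking lines */
    for (size_t n = 0; n < (size_t)BT656_ACTIVE_BYTES; n += 2) {
        if (active != NULL && v == 0) {
            line[i++] = active[n];
            line[i++] = active[n + 1];
        } else {
            line[i++] = 0x80;
            line[i++] = 0x10;
        }
    }
}
```

Which lines carry V = 1 or F = 1 is left to the caller, since that depends on the exact field timing being generated.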

On the data side:
A video line consists of 1728 bytes.
A full video frame consists of 625 lines.
This results in 1,080,000 bytes of data per frame.
The whole data transmission is clocked at 27 MHz.
So the whole ~1 MByte of data needs to be sent out 25 times per second, which results in a read rate from the SPI RAM (and a write rate to the peripheral, of course) of about 27 MByte/s.
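
The figures above can be checked with a few lines of C. This is only a sketch of the arithmetic, assuming one byte is output per clock cycle on an 8-bit bus; the function names are made up for illustration.

```c
#include <stdint.h>

enum { LINE_BYTES = 1728, LINES_PER_FRAME = 625, FRAMES_PER_SECOND = 25 };

/* Bytes in one full frame, including all blanking */
uint32_t frame_bytes(void)
{
    return (uint32_t)LINE_BYTES * LINES_PER_FRAME;
}

/* Sustained output rate in bytes per second */
uint32_t bytes_per_second(void)
{
    return frame_bytes() * FRAMES_PER_SECOND;
}

/* Duration of `lines` video lines in microseconds, assuming one byte
 * leaves the bus per pixel-clock cycle */
uint32_t lines_duration_us(uint32_t lines, uint32_t pclk_hz)
{
    return (uint32_t)((uint64_t)lines * LINE_BYTES * 1000000u / pclk_hz);
}
```

With a 27 MHz clock, one line lasts 64 µs and a full frame 40 ms, which matches the 25 frames per second of PAL.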

Technical half-Successes:
I successfully used the LCD RGB peripheral on the ESP32-S3 to feed a framebuffer (in SPI RAM) to the ADV7391 (more or less continuously).
But changing the framebuffer on the fly is not easy, and rewriting the whole buffer takes quite some time.
The double-buffering feature never fully worked for me. I would wish for a function like "switchframebuffer(...)".
I tried "esp_lcd_panel_draw_bitmap", which should have the effect of switching the framebuffer, but this always took far too long and the data stream was no longer continuous.

Technical Difficulties:
I used the parlio_tx driver on the P4 to do the same. Here I split the buffer into packages of 25 lines (25 * 1728 bytes each); 25 of these packages make up one full image.
25 lines correspond to more than 1 ms of data, which is long enough to handle from task scope.
But no real success so far.
I also saw the "loop_transmission" flag in parlio_transmit_config_t, which exists since 5.5. But this only loops one buffer, which is not enough, and the maximum buffer size parlio offers is too small to hold the full image.

Open for Ideas:
Does anyone have any ideas how to solve this problem?
- Has Espressif tried something like this?
- Generating BT.656 data in general
- Writing a huge (>1 MByte) amount of data continuously over a parallel bus
- Does anyone else have experience with this?
- Any ideas for other peripherals to use for this?
- Any ideas for other configurations to use?

Best regards and thanks for any input/advice
Martin

Re: Output >1MByte Buffer continuously from SPI-Buffer at high speeds

Posted: Wed Aug 27, 2025 1:58 am
by Sprite
27MHz at (I assume) 8 bit is 27MByte/second, or 216MBit/second. Assuming you write to the framebuffer at the same speed, this gets multiplied by 3 (because a write from the CPU reads the rest of the cache line first) for 648MBit/second. What type of PSRAM were you using? Quad PSRAM can be clocked at 120MHz, for a datarate of 120*4=480MBit, which is not enough. Octal PSRAM is a lot faster, it runs at 8 bit, 80MHz, DDR, so 80*2*8=1280MBit, which should be enough. The P4 uses 16-bit DDR PSRAM at 200MHz, which gives you 6400MBit/sec to play with.
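
The bandwidth estimate above can be reproduced with a small sketch. It computes peak link rates only and ignores command/address overhead and any other traffic on the bus; the function names are hypothetical, and the factor of 3 is the one given in the post.

```c
#include <stdbool.h>
#include <stdint.h>

/* Peak PSRAM link rate in Mbit/s: clock * bus width, doubled for DDR */
uint32_t psram_peak_mbit(uint32_t clock_mhz, uint32_t bus_bits, bool ddr)
{
    return clock_mhz * bus_bits * (ddr ? 2u : 1u);
}

/* Required rate in Mbit/s for a given pixel clock and output width.
 * traffic_factor = 3 models the case from the post: the DMA reads the
 * frame once, and CPU writes at the same rate cost a cache-line read
 * plus a write. */
uint32_t required_mbit(uint32_t pclk_mhz, uint32_t out_bits,
                       uint32_t traffic_factor)
{
    return pclk_mhz * out_bits * traffic_factor;
}

/* True if the peak link rate covers the requirement */
bool psram_is_sufficient(uint32_t peak_mbit, uint32_t needed_mbit)
{
    return peak_mbit >= needed_mbit;
}
```

For example, `required_mbit(27, 8, 3)` gives 648, which exceeds the 480 of quad PSRAM at 120 MHz but fits within the 1280 of octal DDR PSRAM at 80 MHz.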

Have you tried porting your LCD code to the P4? The P4 has the same LCD peripheral, so it should mostly just compile. If at LCD driver initialization, you ask it for two framebuffers and then you use esp_lcd_panel_draw_bitmap with the memory address of the 2nd, the switch should be instantaneous; if it's not, feel free to post your code and we'll take a look.
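
The two-framebuffer approach described above can be modelled on a host machine with a small sketch. It only tracks which of two buffers is being scanned out and which one may be written; `fb_pair_t`, `fb_pair_back` and `fb_pair_swap` are hypothetical names. On the target, the two pointers would presumably be obtained from the driver (ESP-IDF provides `esp_lcd_rgb_panel_get_frame_buffer` for this), and the pointer returned by the swap would be the one handed to `esp_lcd_panel_draw_bitmap`; that mapping is an assumption and is not exercised here.

```c
#include <stddef.h>
#include <stdint.h>

/* Two frame buffers plus the index of the one currently scanned out */
typedef struct {
    uint8_t *fb[2];
    int      front;
} fb_pair_t;

/* Buffer that is safe to render into (not being scanned out) */
uint8_t *fb_pair_back(const fb_pair_t *p)
{
    return p->fb[1 - p->front];
}

/* Make the back buffer the new front buffer and return it, so the
 * caller can present exactly this pointer to the output driver */
uint8_t *fb_pair_swap(fb_pair_t *p)
{
    p->front = 1 - p->front;
    return p->fb[p->front];
}
```

The point of the pattern is that the image is rendered directly into the back buffer, so presenting it should not require a full-frame copy. The former front buffer should only be written again once the output has actually switched, for example after the next vsync event.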

Re: Output >1MByte Buffer continuously from SPI-Buffer at high speeds

Posted: Wed Aug 27, 2025 9:21 pm
by MicroController
For the S3 see also https://github.com/project-x51/esp32-s3 ... /README.md

@Sprite: We still can't DMA from PSRAM to peripheral on an S3, can we?

Re: Output >1MByte Buffer continuously from SPI-Buffer at high speeds

Posted: Thu Aug 28, 2025 12:13 am
by Sprite
MicroController wrote:
@Sprite: We still can't DMA from PSRAM to peripheral on an S3, can we?
No, that should work. There are some restrictions wrt alignment, and Bad Things happen if you use more bandwidth than the PSRAM link can provide, but generally it works well.

Re: Output >1MByte Buffer continuously from SPI-Buffer at high speeds

Posted: Thu Aug 28, 2025 5:57 am
by ok-home
A different problem arises there.
PSRAM and Flash are on the same SPI bus, and cache swapping/flushing has the highest priority. Because of this, dropouts can occur during DMA operations on the PSRAM. Even if you have octal PSRAM, the flash on the ESP32-S3 is, if I remember correctly, quad.

In practice, dropouts in DMA already start to occur at speeds above 10 MB/s.

Therefore, if reliable transfer is needed, the fastest option is ping-pong DMA to RAM and then memcpy to PSRAM.

Re: Output >1MByte Buffer continuously from SPI-Buffer at high speeds

Posted: Tue Oct 21, 2025 4:07 pm
by mburger
Thank you for the answers.
Meanwhile I am quite a bit further with the project. Some stuff is even working! 8-)
The concept itself definitely works.

I have switched over to an ESP32-P4 with the much improved 200 MHz 16-bit SPI RAM interface and more computing power in general.

Although the principle of writing BT656 Data via RGB Display Interface worked on the S3, it was already at its limit just with outputting a static image.

On the P4 I work with the ESP32-P4-FUNCTION-EV-BOARD including the SC2336 MIPI CSI camera.
I have set up an ISP pipeline to prepare the camera data and convert it to YUV422. From there I send it over to the bt656rgbdriver to continuously output it to the ADV7391.

There are some problems I encountered on the way.
==> I tried to use the PPA unit to resize, rotate and mirror my image, but YUV422 seems not to be supported yet (as stated in the sources). This is unfortunate, as YUV420 and YUV444 are supported. In the code there is a remark that 422 should be supported in P4 ECO2 in the future. (What is P4 ECO2?) I found this in the ppa_srm_color_mode_t typedef.
==> As the camera image is still a lot of data, filling the image buffer takes some time. In numbers: filling my image buffer (with EAV/SAV and camera data) takes about 22 ms. Writing the buffer to the driver with esp_lcd_panel_draw_bitmap takes a whopping 64 ms. This is a lot and results in a lower frame rate than expected.
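
As a quick check of what these two timings imply, here is a minimal sketch of the arithmetic. It assumes the fill and the draw run back to back in the same task, so their durations simply add up; the function names are made up for illustration.

```c
/* Frames per second when fill and draw run one after the other */
double achievable_fps(double fill_ms, double draw_ms)
{
    return 1000.0 / (fill_ms + draw_ms);
}

/* Time available per frame, in milliseconds, for a target frame rate */
double frame_budget_ms(double fps)
{
    return 1000.0 / fps;
}
```

With 22 ms and 64 ms, one update takes 86 ms, which is a little under 12 fps. A 25 fps output leaves a budget of 40 ms per frame, so the draw step alone already exceeds it.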

I'm not sure if I did something wrong in the config of the RGB LCD module.

Here is my setup:

esp_lcd_rgb_panel_config_t panel_config = {
        .data_width = 8,
        .dma_burst_size = 64,
        .num_fbs = 2,

        .clk_src = LCD_CLK_SRC_DEFAULT,
        .disp_gpio_num = -1,
        .pclk_gpio_num = GPIO_PCLK,
        .vsync_gpio_num = -1,
        .hsync_gpio_num = -1,
        .de_gpio_num = -1,
        .data_gpio_nums = {
            GPIO_D0,
            GPIO_D1,
            GPIO_D2,
            GPIO_D3,
            GPIO_D4,
            GPIO_D5,
            GPIO_D6,
            GPIO_D7,
        },
        .timings = {
            .pclk_hz = BT656_PCLK_HZ,
            .h_res = BT656_RES_H,
            .v_res = BT656_RES_V,
            .hsync_back_porch = 1,
            .hsync_front_porch = 1,
            .hsync_pulse_width = 0,
            .vsync_back_porch = 1,
            .vsync_front_porch = 0,
            .vsync_pulse_width = 0,
            .flags = {
                .pclk_active_neg = true,
            },
        },
        .flags.fb_in_psram = true,
        .flags.double_fb = true,
    };
    ESP_ERROR_CHECK(esp_lcd_new_rgb_panel(&panel_config, &panel_handle));

    ESP_LOGI(TAG, "Initialize RGB LCD panel");
    ESP_ERROR_CHECK(esp_lcd_panel_reset(panel_handle));
    ESP_ERROR_CHECK(esp_lcd_panel_init(panel_handle));

    image0 = malloc(BT656_RES_H*BT656_RES_V);
    fillBlanking(image0);
    esp_lcd_panel_draw_bitmap(panel_handle,0,0,BT656_RES_H, BT656_RES_V, image0);
    ESP_LOGI(TAG, "Register event callbacks");
    esp_lcd_rgb_panel_event_callbacks_t cbs = {
        // .on_color_trans_done = screenupdate_ready,
        // .on_frame_buf_complete = screenupdate_ready,
        .on_vsync = screenupdate_ready,
    };
    ESP_ERROR_CHECK(esp_lcd_rgb_panel_register_event_callbacks(panel_handle, &cbs, NULL));
And with this function I write new data to the RGB LCD driver from the camera task:

void writeYUV422ImageToADV(uint8_t* source, uint16_t hres, uint16_t vres) {
    if(mtx_imgswitch==NULL) return;
    uint8_t (*cambuffer)[1024 * 2] = (uint8_t (*)[1024 * 2])source;
    xSemaphoreTake(mtx_imgswitch, portMAX_DELAY);
    TickType_t stamp = xTaskGetTickCount();
    fillImage(image0, &cambuffer[0][0], 600, 1024*2);
    // TickType_t is unsigned, so cast and print with %u instead of %i
    ESP_LOGI(TAG, "fillimage took %u ticks", (unsigned)(xTaskGetTickCount()-stamp));
    stamp = xTaskGetTickCount();
    esp_lcd_panel_draw_bitmap(panel_handle,0,0,BT656_RES_H, BT656_RES_V, image0);
    ESP_LOGI(TAG, "draw image took %u ticks", (unsigned)(xTaskGetTickCount()-stamp));
    xSemaphoreGive(mtx_imgswitch);
}
The mutex is not necessary at the moment, but does not hurt either.
The camera driver gets new data in a task and calls this writeYUV422ImageToADV function with the new buffer pointer.
The camera runs with a double-buffer setup and sends the buffer pointer via a queue from the s_camera_get_finished_trans interrupt.

The whole setup works, but it is slower than expected and desired: about 12 fps. 25 fps would be standard for PAL analog video.
Why is the esp_lcd_panel_draw_bitmap function so slow? Shouldn't this basically be a glorified memcpy in the background?

Any Ideas on how to improve the performance?
Thank you anyway for the support! It is highly appreciated!

Best regards
Martin