Some assembly required... seeing trap when using ee.vld.128.ip instruction

Posted: Mon Oct 17, 2022 11:09 am
by aaronw
I am seeing a trap when I use the ee.vld.128.ip instruction on an ESP32-S3. As far as I can tell, my usage should be fine: before the instruction executes, a12 holds a valid memory address, 0x40378d40. The trap reports that the address being accessed is invalid, yet the preceding l32i instruction, which I added as a sanity check, reads the same address without trouble. While I have extensive assembly language experience, it's mostly 64-bit MIPS with some ARM, so I'm new to Xtensa.

My goal for this is to speed up the FastLED code by using a lookup table for each nibble rather than doing a comparison for each bit. My benchmarking of my current method shows over a 25% speedup, but I want to see how far I can take it. The ee.vld.128.ip instruction looks ideal since I need to load 16 bytes of data then perform 4 word writes to the RMT memory buffer.
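For readers following along, the nibble-table idea can be sketched in plain C. This is only an illustration under assumptions: `ITEM_ZERO`, `ITEM_ONE`, `nibble_table`, `build_nibble_table` and `encode_byte` are hypothetical names, the item values are placeholders rather than real RMT timings, and the MSB-first word order is a guess that may not match the actual table layout.

```c
#include <stdint.h>
#include <string.h>

/* Placeholder encodings for a 0 bit and a 1 bit (not real RMT timing values). */
#define ITEM_ZERO 0x00000000u
#define ITEM_ONE  0x00000001u

/* One row per nibble value; each row holds four 32-bit items (16 bytes). */
static uint32_t nibble_table[16][4];

/* Fill the table once: row n holds the items for the four bits of n, MSB first. */
static void build_nibble_table(void)
{
    for (unsigned n = 0; n < 16; n++) {
        for (unsigned bit = 0; bit < 4; bit++) {
            nibble_table[n][bit] = (n & (8u >> bit)) ? ITEM_ONE : ITEM_ZERO;
        }
    }
}

/* Expand one byte into eight items with two 16-byte row copies,
 * instead of eight per-bit comparisons. */
static void encode_byte(uint8_t p, uint32_t *out)
{
    memcpy(out,     nibble_table[p >> 4],    sizeof nibble_table[0]);
    memcpy(out + 4, nibble_table[p & 0x0fu], sizeof nibble_table[0]);
}
```

Each `memcpy` of one row is the 16-byte load followed by 4 word writes described above; the row offset `(p >> 4) * 16` is what the `srli`/`slli` pair in the inline assembly computes.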

What am I doing wrong here? The l32i instruction accessing the exact same address works fine and loads the expected value.

I have also added code to set the wur.sar_byte and wur.accx_x registers to 0, but this makes no difference, nor do I see why this would be necessary.

Code:

entry 0x403c98d8
[ 83][D][esp32-hal-cpu.c:244] setCpuFrequencyMhz(): PLL: 480 / 6 = 80 Mhz, APB: 80000000 Hz
Guru Meditation Error: Core 1 panic'ed (LoadStoreError). Exception was unhandled.

Core 1 register dump:
PC : 0x403752b8 PS : 0x00060830 A0 : 0x80375558 A1 : 0x3fce2bc0
A2 : 0x3fc94088 A3 : 0x40378d40 A4 : 0x00000030 A5 : 0x00000000
A6 : 0x02ce3644 A7 : 0x00ffffff A8 : 0x60016800 A9 : 0x00000120
A10 : 0x00000000 A11 : 0x00000000 A12 : 0x40378d40 A13 : 0x00000000
A14 : 0x0028800a A15 : 0x40378d40 SAR : 0x00000010 EXCCAUSE: 0x00000003
EXCVADDR: 0x40378d40 LBEG : 0x400570e8 LEND : 0x400570f3 LCOUNT : 0xffffffff
Backtrace: 0x403752b5:0x3fce2bc0 0x40375555:0x3fce2be0 0x4037558f:0x3fce2c00 0x40375a05:0x3fce2c20 0x42001486:0x3fce2c40 0x42001729:0x3fce2c60 0x42001849:0x3fce2cb0 0x4200166c:0x3fce2cf0 0x4200260d:0x3fce2d20

My code up until the crash looks like:

Code:

    "   srli            %[tmp], %[p], 4                 \n"
    "   slli            %[tmp], %[tmp], 4               \n"
    "   add.n           %[tmp], %[tmp], %[bitTable]     \n"
    "   mov.n           a15, %[tmp]                     \n"
    "   l32i            a14, %[tmp], 0                  \n"
    "   ee.vld.128.ip   q0, %[tmp], 0                   \n"

And the compiled code according to objdump is:

Code:

403752a2:       72b8            l32i.n  a11, a2, 28
403752a4:       fc6131          l32r    a3, 40374428 <_iram_text_start+0x8>
403752a7:       bbaa            add.n   a11, a11, a10
403752a9:       000bb2          l8ui    a11, a11, 0
403752ac:       41c4b0          srli    a12, a11, 4
403752af:       11ccc0          slli    a12, a12, 4
403752b2:       cc3a            add.n   a12, a12, a3
403752b4:       0cfd            mov.n   a15, a12
403752b6:       0ce8            l32i.n  a14, a12, 0
403752b8:       8300c4          ee.vld.128.ip   q0, a12, 0

My total inline code looks like:

Code:

    register uint8_t pData = mPixelData[mCur];
    register rmt_item32_t *bitTablePtr = &bitTable[0][0];
    __asm__ __volatile__(
        /* High nibble: tmp = bitTable + (p >> 4) * 16 */
        "   srli            %[tmp], %[p], 4                 \n"
        "   slli            %[tmp], %[tmp], 4               \n"
        "   add.n           %[tmp], %[tmp], %[bitTable]     \n"
        /* Debug only: keep the address in a15 and do a plain 32-bit load from it */
        "   mov.n           a15, %[tmp]                     \n"
        "   l32i            a14, %[tmp], 0                  \n"
        /* 128-bit load of the table row into q0 (the trap is raised here) */
        "   ee.vld.128.ip   q0, %[tmp], 0                   \n"
        /* Low nibble: tmp = bitTable + (p & 0xf) * 16, loaded into q1 */
        "   extui           %[tmp], %[p], 0, 4              \n"
        "   slli            %[tmp], %[tmp], 4               \n"
        "   add.n           %[tmp], %[tmp], %[bitTable]     \n"
        "   ee.vld.128.ip   q1, %[tmp], 0                   \n"
        /* Copy the four 32-bit words of q0 to RMT memory, word 3 first */
        "   ee.movi.32.a    q0, %[tmp], 3                   \n"
        "   s32i            %[tmp], %[pRmtMem], 0x0         \n"
        "   ee.movi.32.a    q0, %[tmp], 2                   \n"
        "   s32i            %[tmp], %[pRmtMem], 0x4         \n"
        "   ee.movi.32.a    q0, %[tmp], 1                   \n"
        "   s32i            %[tmp], %[pRmtMem], 0x8         \n"
        "   ee.movi.32.a    q0, %[tmp], 0                   \n"
        "   s32i            %[tmp], %[pRmtMem], 0xc         \n"
        /* Same for the four words of q1 */
        "   ee.movi.32.a    q1, %[tmp], 3                   \n"
        "   s32i            %[tmp], %[pRmtMem], 0x10        \n"
        "   ee.movi.32.a    q1, %[tmp], 2                   \n"
        "   s32i            %[tmp], %[pRmtMem], 0x14        \n"
        "   ee.movi.32.a    q1, %[tmp], 1                   \n"
        "   s32i            %[tmp], %[pRmtMem], 0x18        \n"
        "   ee.movi.32.a    q1, %[tmp], 0                   \n"
        "   s32i            %[tmp], %[pRmtMem], 0x1c        \n"
        /* Advance the RMT pointer past the eight items just written */
        "   addi            %[pRmtMem], %[pRmtMem], 0x20    \n"
        "   memw                                            \n"
        : [tmp] "=&r"(tmp), [pRmtMem] "+r"(pItem)
        : [bitTable] "r"(bitTablePtr), [p] "r"(pData)
        : "a14", "a15");
    mCur++;

where pItem points to RMT memory. a14 and a15 are currently just for debugging.

Any help would be appreciated.

-Aaron

Re: Some assembly required... seeing trap when using ee.vld.128.ip instruction

Posted: Mon Oct 17, 2022 7:54 pm
by ESP_igrr
Hi Aaron,
I'll check this with the hardware designers, but the issue you are seeing is most likely caused by the source address being mapped to the instruction bus of the CPU rather than to the data bus. The ESP32-S3 has a 128-bit data bus, which the instruction extensions use. However, the CPU has no hardware for performing a 128-bit access over the instruction bus.
Please try moving the array into data memory: either RAM (.data) or Flash (.rodata).

(A similar issue occurs with FPU instructions: loads and stores to or from the FPU registers can't use pointers in the instruction bus range.)
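As a rough way to see which bus an address from the register dump belongs to, here is a small host-side sketch. The window boundaries are assumptions recalled from the ESP32-S3 memory map and should be checked against the Technical Reference Manual; the macro and function names are hypothetical and not part of ESP-IDF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed internal SRAM address windows on the ESP32-S3 (verify against the TRM). */
#define S3_IBUS_SRAM_LOW   0x40370000u
#define S3_IBUS_SRAM_HIGH  0x403E0000u  /* exclusive upper bound */
#define S3_DBUS_SRAM_LOW   0x3FC88000u
#define S3_DBUS_SRAM_HIGH  0x3FD00000u  /* exclusive upper bound */

/* True if addr falls in the window reached through the instruction bus. */
static bool s3_addr_on_ibus(uint32_t addr)
{
    return addr >= S3_IBUS_SRAM_LOW && addr < S3_IBUS_SRAM_HIGH;
}

/* True if addr falls in the window reached through the data bus. */
static bool s3_addr_on_dbus(uint32_t addr)
{
    return addr >= S3_DBUS_SRAM_LOW && addr < S3_DBUS_SRAM_HIGH;
}
```

Under these assumed ranges, the faulting address 0x40378d40 (EXCVADDR and A12 in the dump) lands in the instruction-bus window, while the stack pointer 0x3fce2bc0 (A1) lands in the data-bus window.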

Re: Some assembly required... seeing trap when using ee.vld.128.ip instruction

Posted: Wed Oct 19, 2022 9:18 am
by aaronw
This may be the case. It looks like the static allocation of the table is being placed in instruction memory.
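One possible way to keep such a table out of instruction memory is sketched below. It assumes an ESP-IDF build, where `DRAM_ATTR` from `esp_attr.h` requests placement in data RAM; on any other platform the macro is defined as a no-op so the sketch still compiles. `rmt_word_t`, `bit_table_dram` and `row_for_nibble` are hypothetical names, and the 16-byte alignment is an assumption about what a 128-bit load wants rather than something confirmed in this thread.

```c
#include <stdint.h>

#if defined(ESP_PLATFORM)
#include "esp_attr.h"   /* ESP-IDF header that provides DRAM_ATTR */
#else
#define DRAM_ATTR       /* no-op so the sketch also compiles on a host */
#endif

/* Hypothetical stand-in for rmt_item32_t: one 32-bit RMT item. */
typedef uint32_t rmt_word_t;

/* 16 nibble values x 4 items = 16 bytes per row, aligned to a row boundary. */
static DRAM_ATTR rmt_word_t bit_table_dram[16][4] __attribute__((aligned(16)));

/* Return the 16-byte row for a nibble; the pointer stays 16-byte aligned. */
static const rmt_word_t *row_for_nibble(unsigned nibble)
{
    return bit_table_dram[nibble & 0x0fu];
}
```

Printing the table address at startup and comparing it with the address in the register dump would show whether the linker actually moved it.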