Bug in FPU Coprocessor float division?

michal1
Posts: 2
Joined: Tue May 14, 2019 7:26 pm

Postby michal1 » Tue May 14, 2019 8:03 pm

Hello,

I have noticed that using the native FPU division instructions leads to unexpected results when dividing two floating-point numbers with significantly different exponents.

It seems that when the division operator is used in C code, gcc inserts a call to __divsf3(), which does not use the native FPU division instructions but relies on a software implementation, and it gives the correct result. As I am working on computationally intensive DSP code, performance is really important to me and I would like to use the native instructions instead. However, the FPU instruction sequence for division documented in the ISA Reference Manual and implemented in divsf() (https://github.com/espressif/esp-idf/bl ... /test_fp.c) gives wrong results (typically 0.0) when the two exponents differ a lot, even though the documentation says the FPU division should be IEEE compliant.
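
Because a correctly rounded IEEE 754 division has exactly one right answer for a given rounding mode, one way to check the compliance claim is to compare results bit for bit. The following is a minimal sketch, not anything taken from the ESP-IDF sources: `float_to_ordered()` and `ulp_distance()` are hypothetical helper names, and the code assumes 32-bit IEEE 754 floats and finite, non-NaN inputs.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: map a float's bit pattern to an integer whose
 * ordering matches the ordering of the float values (+0 and -0 both map
 * to 0). Assumes float is a 32-bit IEEE 754 type. */
static int32_t float_to_ordered(float f)
{
    int32_t i;
    memcpy(&i, &f, sizeof i);   /* reinterpret the bits without aliasing issues */
    return (i < 0) ? (int32_t)(INT32_MIN - i) : i;
}

/* Hypothetical helper: number of representable floats between a and b.
 * 0 means bit-identical values (or +0 versus -0). */
static int64_t ulp_distance(float a, float b)
{
    int64_t d = (int64_t)float_to_ordered(a) - (int64_t)float_to_ordered(b);
    return (d < 0) ? -d : d;
}
```

A distance of 0 between `x / y` and `divsf(x, y)` would mean the two agree exactly. A very large distance, as with the 0.0 results described here, points to something other than a rounding difference.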

An example code:

Code:

float x = 24e9;
float y = 4;

printf("C: %.2f/%.2f = %.2f\n", x,y, x/y);
printf("ASM: %.2f/%.2f = %.2f\n", x,y, divsf(x,y));

x = 17;
y = 4;

printf("C: %.2f/%.2f = %.2f\n", x,y, x/y);
printf("ASM: %.2f/%.2f = %.2f\n", x,y, divsf(x,y));

Output:

Code:

C: 24000000000.00/4.00 = 6000000000.00
ASM: 24000000000.00/4.00 = 0.00
C: 17.00/4.00 = 4.25
ASM: 17.00/4.00 = 4.25

I was wondering whether anybody has come across these issues, and whether this is indeed a hardware bug in the FPU or perhaps a limitation of the native FPU.
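
To find out which exponent combinations are affected, a sweep along these lines may help. It is only a sketch under assumptions: `count_div_mismatches()`, `reference_div()` and `broken_div()` are hypothetical names, `broken_div()` merely imitates the 0.0 symptom so the sweep can be tried on a host machine, and on the ESP32 the inline-assembly divsf() would be passed in instead.

```c
#include <math.h>
#include <stdlib.h>

typedef float (*div_fn)(float, float);

/* Hypothetical stand-in that always agrees with the C operator. */
static float reference_div(float a, float b)
{
    return a / b;
}

/* Hypothetical stand-in that imitates the reported symptom: it returns
 * 0.0 once the operand exponents are far apart. */
static float broken_div(float a, float b)
{
    int ea, eb;
    (void)frexpf(a, &ea);
    (void)frexpf(b, &eb);
    return (abs(ea - eb) > 20) ? 0.0f : a / b;
}

/* Sweep operand exponents and count the pairs for which the candidate
 * differs from the C '/' operator by more than a relative tolerance. */
static int count_div_mismatches(div_fn candidate, float tol)
{
    int bad = 0;
    for (int ea = -30; ea <= 40; ea += 5) {
        for (int eb = -30; eb <= 40; eb += 5) {
            float a = ldexpf(1.5f, ea);    /* 1.5  * 2^ea */
            float b = ldexpf(1.25f, eb);   /* 1.25 * 2^eb */
            float expected = a / b;
            float got = candidate(a, b);
            if (fabsf(got - expected) > tol * fabsf(expected)) {
                bad++;
            }
        }
    }
    return bad;
}
```

Calling `count_div_mismatches(divsf, 1e-6f)` on the target and printing the failing pairs inside the loop would show whether the errors really depend on the exponent difference.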

Thank you very much!

michal1
Posts: 2
Joined: Tue May 14, 2019 7:26 pm

Re: Bug in FPU Coprocessor float division?

Postby michal1 » Wed May 15, 2019 4:33 pm

It turns out there is a bug in divsf() implementation in https://github.com/espressif/esp-idf/bl ... /test_fp.c

I believe the correct implementation should be:

Code:

/* Single-precision division using the native FPU division instruction sequence. */
float divsf(float a, float b)
{
    float result;
    asm volatile (
        "wfr f0, %1\n"
        "wfr f1, %2\n"
        "div0.s f3, f1 \n"
        "nexp01.s f4, f1 \n"
        "const.s f5, 1 \n"
        "maddn.s f5, f4, f3 \n"
        "mov.s f6, f3 \n"
        "mov.s f7, f1 \n"
        "nexp01.s f8, f0 \n"
        "maddn.s f6, f5, f3 \n"
        "const.s f5, 1 \n"
        "const.s f2, 0 \n"
        "neg.s f9, f8 \n"
        "maddn.s f5,f4,f6 \n"
        "maddn.s f2, f9, f3 \n" /* Original was "maddn.s f2, f0, f3 \n" */
        "mkdadj.s f7, f0 \n"
        "maddn.s f6,f5,f6 \n"
        "maddn.s f9,f4,f2 \n"
        "const.s f5, 1 \n"
        "maddn.s f5,f4,f6 \n"
        "maddn.s f2,f9,f6 \n"
        "neg.s f9, f8 \n"
        "maddn.s f6,f5,f6 \n"
        "maddn.s f9,f4,f2 \n"
        "addexpm.s f2, f7 \n"
        "addexp.s f6, f7 \n"
        "divn.s f2,f9,f6\n"
        "rfr %0, f2\n"
        :"=r"(result):"r"(a), "r"(b)
    );
    return result;
}

Another question is: why does gcc use __divsf3() by default when the performance of the native FPU is significantly better?
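
For readers unfamiliar with this kind of sequence: as far as I understand it, the general pattern is to normalize the operand exponents, refine a reciprocal estimate with Newton-Raphson steps, and then restore the exponent. The portable C sketch below illustrates that general technique only. It does not model the exact semantics of the FPU instructions, `nr_div_sketch()` is a hypothetical name, the linear seed is a textbook choice rather than whatever div0.s computes, and it assumes finite, nonzero inputs and makes no claim of correct rounding.

```c
#include <math.h>

/* Hypothetical illustration of division by reciprocal refinement.
 * Not the FPU sequence itself; assumes finite, nonzero a and b. */
static float nr_div_sketch(float a, float b)
{
    int ea, eb;
    float ma = frexpf(a, &ea);          /* a   = ma * 2^ea, 0.5 <= |ma| < 1 */
    float mb = frexpf(fabsf(b), &eb);   /* |b| = mb * 2^eb, 0.5 <= mb  < 1 */

    /* Linear seed for 1/mb on [0.5, 1). */
    float r = 48.0f / 17.0f - (32.0f / 17.0f) * mb;

    /* Newton-Raphson refinement: r <- r + r * (1 - mb * r). */
    for (int i = 0; i < 3; i++) {
        float err = 1.0f - mb * r;
        r += r * err;
    }

    /* Quotient of the normalized operands, plus one residual correction. */
    float q = ma * r;
    q += (ma - q * mb) * r;

    /* Restore the exponent difference and the sign of b. */
    return copysignf(1.0f, b) * ldexpf(q, ea - eb);
}
```

In this sketch, multiplying by the raw `a` instead of the normalized `ma` in the quotient step would count the exponent of `a` twice once `ldexpf()` is applied. That kind of mistake only becomes visible when the exponents differ a lot, which may be similar to what the one-line change above addresses.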

ESP_Angus
Posts: 1577
Joined: Sun May 08, 2016 4:11 am

Re: Bug in FPU Coprocessor float division?

Postby ESP_Angus » Thu May 16, 2019 2:15 am

Thanks for pointing this out, michal.

EDIT: The libgcc __divsf3() implementation in the toolchain uses FPU registers. IDF is accidentally linking the version in ROM, which does not use FPU registers. We'll fix this so that you get the FPU version when building a project that uses floating point division.

EDIT 2: Fix is in internal review now.
