ESP32 Forum

Posted: **Thu Dec 12, 2024 3:02 pm**

I have written some code to benchmark the integer performance of the ESP32-series chips I have (ESP32, ESP32-S2, ESP32-S3 and ESP32-C3). I know RISC-V is still being optimized, but the benchmark shows the ESP32-C3 @ 160 MHz (DIO) performing better than the ESP32-S2 @ 160 MHz (QIO). I don't know whether my code is faulty or the C3 really is as fast as it seems.

The ESP32-S2 gets 12 MIops when clocked at 160 MHz.

ESP32-C3 (DIO) @ 160 MHz --- Add -> 17.6 MIops

ESP32 (QIO) @ 240 MHz --- Add -> 18.39 MIops

ESP32-S2 (QIO) @ 240 MHz --- Add -> 18.40 MIops

ESP32-S3 (QIO) @ 240 MHz --- Add -> 23.88 MIops

Code I used to test them:

void Add()
{
  // Loops, mtime and Iops are defined elsewhere in the sketch (not shown in the post).
  uint32_t AInt = 0;
  uint32_t Num0 = 314;
  Serial.println(" *** Add Time ***");
  unsigned long clock0 = micros();

  for (int i = 0; i < Loops; i++)
  {
    asm volatile("" : : : "memory"); // Prevent the compiler from removing the loop
    AInt = AInt + Num0;
  }

  float pclock = (micros() - clock0); // Elapsed time in microseconds
  Iops = mtime / pclock;

  AInt -= 10;

  //Serial.print(pclock);
  Serial.print("Final Value: "); // Printing the result keeps AInt live
  Serial.println(AInt);
  Serial.print(Iops);
  Serial.println(" MIops");
}

The code measures the time in microseconds each chip takes to do 1 million operations, then works out how many operations it could do in 1 second.

Posted: **Thu Dec 12, 2024 3:34 pm**

This kind of "nano"-benchmark is hard to draw conclusions from. For one, it heavily depends on the optimizations the compiler is able to do, with different levels of sophistication between architectures. Then, you may be measuring some implicit instruction sequence(s) more than you are aware of, which may hugely impact the results. For example, an additional 1 clock cycle latency when loading data from memory makes quite a difference when one loop iteration only takes like 3 clock cycles.
The "asm ..." in the loop also makes gcc not use the Xtensa's "zero-overhead loop", which otherwise would speed things up by about 3-4x in this case. So, how meaningful is it to compare the result of an implicitly optimization-inhibited Xtensa benchmark to the RISC-V without this inhibition?

But yes, in another, somewhat more complex scenario I also got about 5-10% more throughput per MHz out of a -C3 compared to an -S3. It all depends on the workload. Try integer division or floats, for example, or let the compiler make use of the ZOLs, and things can look quite different.

Posted: **Thu Dec 12, 2024 4:59 pm**

I didn't want the compiler to optimize away the loop. Without "asm volatile("" : : : "memory")" the compiler just removes the loop, so micros() reports 0. I also found a better way to keep the loop, so I removed "asm volatile("" : : : "memory")"; at 160 MHz the S2 went from 12 to 14 MIops, but the C3 is still faster.
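The post does not say which alternative was used. One common GCC/Clang technique, shown here purely as an assumption and not as the poster's method, is an empty asm statement that names the accumulator as a read-write register operand. The function name add_loop and its parameters are hypothetical.

```cpp
#include <cstdint>

// Hypothetical variant of the add loop (not the poster's code).
// The empty asm statement is volatile, so it must be executed once per
// iteration, and it lists acc as a read-write register operand ("+r"),
// so the compiler has to treat acc as possibly modified each time and
// cannot fold the additions into a single multiplication.
uint32_t add_loop(uint32_t loops, uint32_t step) {
    uint32_t acc = 0;
    for (uint32_t i = 0; i < loops; i++) {
        acc += step;
        asm volatile("" : "+r"(acc)); // GNU extension: opaque use of acc
    }
    return acc;
}
```

Whether Xtensa GCC still emits a zero-overhead loop around a statement like this is not something the sketch establishes; inspecting the generated assembly would show it.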

Esp32 C3 seems to perform better than s2 at same clock speed
