I've been "cannibalizing" a copy of the SPI code that uses the 64-byte FIFO.
The original code was not too bad, but it had a lot of overhead and locking that was not needed. The first iteration moved some of the small functions inline and removed (via some #defines) the locking and checking.
A lot of the code had...
if (!spi)
{
return;
}
... however, not all bits of the code did this check. The thing is, spi is a pointer that is always used and should never be NULL. Returning early just passes the problem somewhere else, or gives the calling function unexpected results because some operation silently didn't happen. Much better for all of these would be an ASSERT in debug mode, so it blows up with elegance.
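To show what I mean by an ASSERT in debug mode, here is a minimal sketch. The names SPI_ASSERT, SPI_DEBUG, spi_dev and spi_tx are made up for illustration and are not from the original SPI code. It reports through stderr and abort(), which suits a desktop test build; on a real board you would probably halt or flash an LED instead.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for the driver's SPI device struct.
struct spi_dev {
    volatile uint32_t data;   // pretend data register
};

// SPI_ASSERT is an illustrative name, not part of any real core.
// Debug builds (SPI_DEBUG defined): report the failed check and stop.
// Release builds: the check compiles away to nothing.
#ifdef SPI_DEBUG
#define SPI_ASSERT(cond)                                             \
    do {                                                             \
        if (!(cond)) {                                               \
            std::fprintf(stderr, "SPI_ASSERT failed: %s (%s:%d)\n",  \
                         #cond, __FILE__, __LINE__);                 \
            std::abort();                                            \
        }                                                            \
    } while (0)
#else
#define SPI_ASSERT(cond) ((void)0)
#endif

// Instead of "if (!spi) return;", state the precondition and fail loudly.
static inline void spi_tx(spi_dev *spi, uint8_t val) {
    SPI_ASSERT(spi != nullptr);
    spi->data = val;
}
```

With SPI_DEBUG defined, a NULL pointer stops right at the call that caused it. Without it, the function is just the register write with no per-call check.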
Also... because of the way the layers are coded... a simple SPI.write(val) might take 3+ calls to do the actual write.
SPI.write() -> layer1() -> layer2() -> actual write.
I moved a lot of the 1 and 2 line functions to inline, so the call above is now :-
SPI.write() (does the actual write, since layer1(), layer2() and the actual write are all inlined). That makes a huge difference when the code is called 1000+ times in a loop, especially since the check of spi on every call has been removed.
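Roughly what the flattened layers look like, as a sketch only. The names spi_regs, spi_reg_write, spi_dev_tx and FastSPI are hypothetical, and the register block is a plain struct here rather than a real memory-mapped peripheral.

```cpp
#include <cstdint>

// Hypothetical register block; on hardware this would be memory-mapped.
struct spi_regs {
    volatile uint32_t DR;   // data register
    volatile uint32_t SR;   // status register (unused in this sketch)
};

// "layer2": the lowest level, touches the register.
static inline void spi_reg_write(spi_regs *regs, uint8_t val) {
    regs->DR = val;
}

// "layer1": device-level wrapper, one line, so a good inlining candidate.
static inline void spi_dev_tx(spi_regs *regs, uint8_t val) {
    spi_reg_write(regs, val);
}

// Top level: what the sketch calls as SPI.write(val).
class FastSPI {
public:
    explicit FastSPI(spi_regs *regs) : regs_(regs) {}

    // Defined in the class body, so it is implicitly inline as well.
    void write(uint8_t val) { spi_dev_tx(regs_, val); }

private:
    spi_regs *regs_;
};
```

Keep in mind that inline is only a hint to the compiler. If the compiler still emits calls at your optimization level, GCC's __attribute__((always_inline)) is one way to force it.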
Updating a 128*64 OLED went from a few hundred frames a second to almost 1000 frames a second, while decoding an MP3, as a result of these minor changes.
The next part I plan on doing is to allow write operations to return before the SPI transfer has completed, so other tasks can be interleaved while SPI transfers complete. I implemented this on the Arduino Due, where it allows each pixel scanline to be processed while the previous one is being sent.
Can now do :-
SPI.write(val, SPI_NO_WAIT);
do_small_stuff();
SPI.waitBusy();
If SPI_NO_WAIT is not passed (i.e. SPI.write(val); ) then the default is to wait.
Also ... SPI.waitBusy() now returns a value that indicates whether the bus was still busy when it was called. In a video decoder, I found I was able to process an entire new scanline before the SPI operation had completed (including all the loop overheads).
I'm going to see how far I can push the SPI FIFO before looking at DMA, because at the moment it's looking very promising. The other thing is that at a certain point the SPI bus will get maxed out, because the device is already receiving bits at the maximum speed. However, being able to do other work while waiting could easily give 20%+ extra performance. (In my previous video code, 100% of my loop processing was done during transfers.)
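A sketch of how the no-wait write and the waitBusy() return value fit together. This is a simulation, not the real driver: SimSPI, SPI_WAIT, isBusy() and the busy_polls counter are assumptions for illustration, and the "hardware" simply finishes a transfer after three status polls.

```cpp
#include <cstdint>

// SPI_NO_WAIT is the flag from the post; SPI_WAIT is an assumed name
// for the default blocking behaviour.
enum spi_wait_mode { SPI_WAIT, SPI_NO_WAIT };

// Simulated peripheral: a transfer "completes" after a few status polls.
struct SimSPI {
    int busy_polls = 0;   // polls remaining until the transfer is done
    uint8_t last = 0;     // last byte handed to the "hardware"

    // One status poll; each poll moves the simulated transfer forward.
    bool isBusy() {
        if (busy_polls > 0) { --busy_polls; return true; }
        return false;
    }

    // Blocks until idle. Returns true if the bus was still busy on entry,
    // false if the transfer had already finished.
    bool waitBusy() {
        bool was_busy = false;
        while (isBusy()) { was_busy = true; }
        return was_busy;
    }

    // Default is to wait; pass SPI_NO_WAIT to return right after starting.
    void write(uint8_t val, spi_wait_mode mode = SPI_WAIT) {
        waitBusy();          // never start on top of a previous transfer
        last = val;
        busy_polls = 3;      // pretend the transfer takes 3 polls
        if (mode == SPI_WAIT) { waitBusy(); }
    }
};
```

If waitBusy() comes back false after your interleaved work, the transfer finished first and the bus sat idle, so there is room to fit more work in. If it comes back true, the work fit inside the transfer.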