I heard on some hacker forums that Bitluni simply ripped that code off from someone else's project. I'm not sure, but the code looks a bit sloppy and is heavily mixed with Arduino. If you look at the I2S technical reference, there is a lot of functionality that most of the code out there leaves untouched. I'm not even sure the I2S peripheral is used properly in most ESP projects, because it chokes at 20MHz+.
Definitely some updates and code polishing are needed. Also if you just go with inline asm for direct register write to multiple pins you will probably achieve speeds faster than I2S. Maybe Espressif will show us more examples in the future.
I've already tried the GPIO w1ts/w1tc byte permutation trick with a lookup table that some YouTuber made, and it's not really fast. It tops out at 38FPS and the tearing is very noticeable and ugly. It also doesn't use DMA; it's all just bitbanging. I don't think assembly would help there, since the C/C++ compiler most probably already optimizes that code about as far as it can go; it isn't digitalWrite-style bloat or anything like that. I could be wrong, and maybe the compiler doesn't turn the code into a direct register write, but I don't know how to get the disassembly out of already compiled C/C++ code.
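For anyone who hasn't seen the lookup-table trick, here is a minimal sketch of how I understand it, not the YouTuber's actual code. The pin numbers in `DATA_PINS` are made up, and `w1ts()`/`w1tc()` are stand-ins that update a plain variable so the thing compiles on a PC; on the ESP32 they would be writes to the GPIO write-1-to-set and write-1-to-clear registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wiring: display data bits D0..D7 on these GPIO numbers. */
static const uint8_t DATA_PINS[8] = {12, 13, 26, 25, 17, 16, 27, 14};

static uint32_t set_mask[256];   /* per byte value: GPIO bits that must go high */
static uint32_t all_pins_mask;   /* OR of all eight data pin bits */

/* Simulated output latch and register writes (assumption: stand-ins for
   the real W1TS / W1TC registers so the sketch runs off-target). */
static uint32_t gpio_out;
static void w1ts(uint32_t mask) { gpio_out |= mask; }
static void w1tc(uint32_t mask) { gpio_out &= ~mask; }

/* Precompute the scattered GPIO mask for every possible byte. */
void build_table(void)
{
    all_pins_mask = 0;
    for (int bit = 0; bit < 8; bit++) {
        assert(DATA_PINS[bit] < 32);          /* must fit one 32-bit register */
        all_pins_mask |= 1u << DATA_PINS[bit];
    }
    for (int value = 0; value < 256; value++) {
        uint32_t mask = 0;
        for (int bit = 0; bit < 8; bit++) {
            if (value & (1 << bit))
                mask |= 1u << DATA_PINS[bit];
        }
        set_mask[value] = mask;
    }
}

/* Put one byte on the bus: one lookup, one set write, one clear write. */
void write_byte(uint8_t value)
{
    uint32_t high = set_mask[value];
    w1ts(high);                        /* raise the pins that must be 1 */
    w1tc(all_pins_mask & ~high);       /* lower the remaining data pins */
}
```

Even in this form every byte costs two register writes plus toggling the write strobe, all done by the CPU, which is probably part of why it doesn't get anywhere near what DMA could do.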
So the idea inevitably comes down to using I2S with DMA. As far as I can see, the key is that Bitluni used a feature in his I2S implementation that splits the 32-bit buffer into a pair of 16-bit buffers, which effectively increases the bandwidth to 40MHz. Unfortunately that gave him jittery pixels until he used the precision clock for calibration at 500MHz and above ("out of spec values" @ 580MHz), which got him far higher speeds and better results.
So what we'd need to do is write code that does this precision clock + DMA + I2S binding, but without looking at Bitluni's code, so that we don't get "stained" by the virality of the ShareAlike license. Could you tell me roughly what code connects this clock with the I2S bus? Also, I couldn't read through everything that was written here and, frankly, I don't want to overwhelm myself with all of it. I just want to start fresh on doing this myself, with a summary of things to be cautious of and possibly some help from you. Things like which bytes are sent in which order under which conditions. This is important for me because I don't have any electronic measurement equipment, only microcontrollers, displays, resistors, capacitors and buttons. You guys have oscilloscopes and other gear, and you've probably already reached some of these conclusions. My findings here will most probably help some of you too, so you could use them for your own testing, and that way we can build this thing all together.
Now, I'm sure the LCD has some maximum speed, and that by checking the datasheet and trying out different speeds I could find what the fastest one is. Maybe we don't need 580MHz like Bitluni used for his VGA implementation. If my calculations are correct, 320*240*2*60=9216000 bytes per second, which means we'd need a bit less than 10MHz while utilizing the "pair of 16-bit buffers" thing in order to have a 320x240x16-bit@60FPS display. This could be a great contribution to the LittleVGL library, making it superfast and much less dependent on HSPI, the waits and the rendering buffers.
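To double-check that arithmetic, here is a small sketch. The function names are mine, and it only counts raw pixel data: command bytes, address-window setup and any idle time between lines are ignored, so the real clock would need some headroom on top of this.

```c
#include <assert.h>
#include <stdint.h>

/* Raw pixel payload per second for a given mode. */
uint32_t bytes_per_second(uint32_t width, uint32_t height,
                          uint32_t bytes_per_pixel, uint32_t fps)
{
    return width * height * bytes_per_pixel * fps;
}

/* Bus transfers per second (i.e. minimum write clock in Hz) needed to
   move that payload over a parallel bus of the given width. */
uint32_t transfers_per_second(uint32_t bytes_per_sec, uint32_t bus_width_bits)
{
    assert(bus_width_bits == 8 || bus_width_bits == 16);
    uint32_t bytes_per_transfer = bus_width_bits / 8;
    /* Round up so a partial transfer still counts as one. */
    return (bytes_per_sec + bytes_per_transfer - 1) / bytes_per_transfer;
}
```

For 320x240 at 2 bytes per pixel and 60FPS, `bytes_per_second` gives 9216000. Over an 8-bit parallel bus that is a 9.216MHz write clock, and over a 16-bit bus it is 4.608MHz, so either way it stays under 10MHz before overhead.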
As for my game console project ideas, I saw an NES emulator written in C++ and SDL2 under the MIT license. I could use that code to rewrite the emulator for my console, so that it has less bloat than the GPL'd nofrendo emulator and renders faster. For my own fantasy console, the rendering engine idea is very simple. It's a tile-based engine akin to the SNES/GBA's, with the 16-bit RGAB5515 color format, many 8x8/16x16/32x32 sprites per scanline that can be rotated 8 ways, 16-colors-per-tile tilesets with 32 palette sets, 4 scrollable and matrix-skewable nametables, etc. The difference is that my engine renders the layers one by one for each scanline into a scanline buffer, along with the sprites, without checking which pixel is transparent and which isn't (which the SNES had to do, but we don't, since the ESP32 is something like 20x faster and has more memory). Then the DMA copies the scanline buffer to the screen while the next scanline is being rendered, and the process repeats until the whole screen is drawn. I've been into NES and SNES emulators for about a decade and I can't resist finally making something useful out of it.
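Here is a rough sketch of the scanline ping-pong I have in mind, written so it runs on a PC. Everything in it is a placeholder: `render_layer()` just fills a span with a made-up color instead of fetching tiles, and `dma_send_line()` is a plain `memcpy` into a fake framebuffer standing in for a real DMA transfer to the LCD, so the "overlap" is only indicated in comments.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SCREEN_W   320
#define SCREEN_H   240
#define NUM_LAYERS 4

static uint16_t line_buf[2][SCREEN_W];        /* two scanline buffers */
static uint16_t screen[SCREEN_H][SCREEN_W];   /* stand-in for the LCD */

/* Hypothetical layer: layer n covers columns [n*40, SCREEN_W) and just
   overwrites whatever is there, with no per-pixel transparency test. */
static void render_layer(int layer, int y, uint16_t *dst)
{
    for (int x = layer * 40; x < SCREEN_W; x++)
        dst[x] = (uint16_t)((y << 8) | layer);
}

/* Draw all layers back to front into one scanline buffer. */
static void render_scanline(int y, uint16_t *dst)
{
    for (int layer = 0; layer < NUM_LAYERS; layer++)
        render_layer(layer, y, dst);
}

/* Stand-in for starting a DMA transfer of one line to the display. */
static void dma_send_line(int y, const uint16_t *src)
{
    assert(y >= 0 && y < SCREEN_H);
    memcpy(screen[y], src, sizeof(screen[y]));
}

void render_frame(void)
{
    int cur = 0;
    render_scanline(0, line_buf[cur]);             /* prime the first buffer */
    for (int y = 0; y < SCREEN_H; y++) {
        dma_send_line(y, line_buf[cur]);           /* hardware: returns at once */
        if (y + 1 < SCREEN_H)
            render_scanline(y + 1, line_buf[cur ^ 1]); /* render during transfer */
        /* hardware: wait here until the DMA for line y has finished */
        cur ^= 1;                                  /* swap buffers */
    }
}
```

The point is that only two 640-byte line buffers are needed instead of a full 150KB framebuffer, as long as rendering one scanline takes no longer than sending one.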
I think that a full 60FPS 320x240 16-bit color ESP32 game console that doesn't waste too much time and memory on video is the holy grail of microcontrollers. Let's go for it!