In this post I’ll talk more about the upcoming software renderer, including progress, performance and future plans. Note that this work is happening in parallel with the version 0.20 support, so the software renderer won’t be available until after the next build.
As I’ve been working on the next release, I’ve occasionally fiddled with the software renderer. Currently I’m writing it in a separate test app, to avoid integration issues until after version 0.20 is released. Until recently, screenshots would have been really boring since it was all boilerplate code, but I’ll go ahead and show the first screenshot now:
Statistics for images shown:
- 32 bit renderer (8 bit will also be supported with all the palette effects)
- Perspective correct texturing
- 16 bit Z-Buffer (will probably be upgraded to 32 bit)
- Sub-pixel/sub-texel accuracy (no wobbling edges or textures)
- Per polygon color, used for lighting (supports RGB, only intensity shown here)
- Z-Fogging using a fog table.
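To illustrate what perspective correct texturing involves, here is a minimal sketch of interpolating one texture coordinate across a span. It relies on the standard fact that u/z and 1/z vary linearly in screen space while u itself does not. The names `SpanEnd` and `perspectiveCorrectU` are made up for this example and this is not the actual DaggerXL code; a real rasterizer would typically use fixed point and avoid a divide on every pixel.

```cpp
#include <cassert>
#include <vector>

// One end of a horizontal span: texture coordinate u and view-space depth z.
// (Hypothetical type for illustration only.)
struct SpanEnd {
    float u;
    float z;
};

// Returns the perspective-correct u for each of `count` pixels from a to b.
// u/z and 1/z are interpolated linearly; u is recovered by dividing per pixel.
std::vector<float> perspectiveCorrectU(SpanEnd a, SpanEnd b, int count) {
    assert(count >= 2 && a.z > 0.0f && b.z > 0.0f);
    const float invZa = 1.0f / a.z;
    const float invZb = 1.0f / b.z;
    const float uOverZa = a.u * invZa;
    const float uOverZb = b.u * invZb;

    std::vector<float> out(count);
    for (int i = 0; i < count; ++i) {
        const float t = float(i) / float(count - 1);
        const float invZ = invZa + (invZb - invZa) * t;
        const float uOverZ = uOverZa + (uOverZb - uOverZa) * t;
        out[i] = uOverZ / invZ;  // undo the division by z
    }
    return out;
}
```

For a span running from (u = 0, z = 1) to (u = 1, z = 3), the middle pixel gets u = 0.25 rather than the 0.5 that plain linear (affine) interpolation would give, which is the difference that shows up as warped textures in affine renderers.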
And of course it supports 2D blits (including arbitrary scaling):
Currently performance isn’t where I want it to be for multiple reasons:
- It’s using GDI for the final blit, which can be slow. Obviously it should use DirectDraw and/or Direct3D when available (just for the final blit though).
- The bottleneck is the actual rasterization. In the future this will be optimized using SSE/SSE2 when available, but that won’t happen until it has all the base features.
- Other parts of the pipeline (transform, clipping, etc.) haven’t been optimized yet either; they will get SSE/SSE2 optimizations as needed.
Some of the additional features required before integration (post-0.20):
- Dynamic lights (using the proper radial attenuation).
- Mipmapping (probably with mip selection done per scanline).
- Various pipeline improvements, backface culling and so on.
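As a rough illustration of per-scanline mip selection, the sketch below picks a single mip level for a whole scanline from the average number of base-level texels crossed per screen pixel. The function name, its parameters and the choice of rounding down are assumptions made for this example, not a description of the planned implementation. With perspective the texel step is not constant along a scanline, so one level per scanline is an approximation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Picks one mip level for a scanline. (u0, v0) and (u1, v1) are the texture
// coordinates at the two ends of the scanline, measured in texels of the
// base (level 0) texture. Hypothetical helper for illustration only.
int mipLevelForScanline(float u0, float v0, float u1, float v1,
                        int pixelCount, int mipCount) {
    assert(pixelCount > 0 && mipCount > 0);
    const float du = (u1 - u0) / float(pixelCount);
    const float dv = (v1 - v0) / float(pixelCount);
    const float texelsPerPixel = std::sqrt(du * du + dv * dv);

    // Magnified or 1:1 mapping: use the full-resolution level.
    if (texelsPerPixel <= 1.0f) return 0;

    // Each mip level halves the resolution, so the level is log2 of the step,
    // rounded down here to favor sharpness over blur.
    const int level = int(std::floor(std::log2(texelsPerPixel)));
    return std::min(level, mipCount - 1);
}
```

For example, a 64 pixel scanline that crosses 256 texels steps 4 texels per pixel and would select level 2, where each texel covers a 4×4 block of the base texture.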
Finally I plan on adding hierarchical z-buffer based occlusion culling – at the batch, polygon and maybe even span level in order to reduce overdraw to near zero. This, combined with an optimized rasterizer, should allow the software renderer to run at very high resolutions on modern CPUs. This is more practical with software rendering (or at least simpler to get right) due to the lack of latency and also the fact that it handles small batches very well, which means that rendering can be sorted from front to back at a finer granularity without hurting batch performance or state switching performance.
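The core idea of a hierarchical z-buffer can be sketched with just two levels: the full-resolution depth buffer plus the farthest depth per 8×8 tile. If the nearest point of a batch or polygon is no closer than the farthest depth in every tile its screen rectangle touches, none of its pixels can pass the depth test, so it can be skipped. The class name, the tile size, the rectangle-based `fillRect` stand-in for real rasterization and the "smaller z is closer" convention are all assumptions for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Two-level depth buffer: 16 bit depth per pixel plus the farthest depth per
// tile. Smaller z means closer. Rectangles are half-open: [x0, x1) x [y0, y1).
class HiZBuffer {
public:
    static constexpr int kTile = 8;

    HiZBuffer(int width, int height)
        : w_(width), h_(height),
          tilesX_((width + kTile - 1) / kTile),
          tilesY_((height + kTile - 1) / kTile),
          depth_(std::size_t(width) * height, 0xFFFF),
          tileMax_(std::size_t(tilesX_) * tilesY_, 0xFFFF) {
        assert(width > 0 && height > 0);
    }

    // Stand-in for rasterizing an occluder: depth-tested write of constant z,
    // followed by refreshing the coarse level for every touched tile.
    void fillRect(int x0, int y0, int x1, int y1, std::uint16_t z) {
        clampRect(x0, y0, x1, y1);
        if (x0 >= x1 || y0 >= y1) return;
        for (int y = y0; y < y1; ++y)
            for (int x = x0; x < x1; ++x) {
                std::uint16_t& d = depth_[std::size_t(y) * w_ + x];
                d = std::min(d, z);
            }
        for (int ty = y0 / kTile; ty * kTile < y1; ++ty)
            for (int tx = x0 / kTile; tx * kTile < x1; ++tx)
                refreshTile(tx, ty);
    }

    // Conservative test: true only if nothing inside the rectangle with depth
    // >= nearestZ could pass a "less than" depth test.
    bool isOccluded(int x0, int y0, int x1, int y1, std::uint16_t nearestZ) const {
        clampRect(x0, y0, x1, y1);
        if (x0 >= x1 || y0 >= y1) return true;  // entirely off screen
        for (int ty = y0 / kTile; ty * kTile < y1; ++ty)
            for (int tx = x0 / kTile; tx * kTile < x1; ++tx)
                if (nearestZ < tileMax_[std::size_t(ty) * tilesX_ + tx])
                    return false;  // might be visible in this tile
        return true;
    }

private:
    void clampRect(int& x0, int& y0, int& x1, int& y1) const {
        x0 = std::max(x0, 0);  y0 = std::max(y0, 0);
        x1 = std::min(x1, w_); y1 = std::min(y1, h_);
    }

    // Recomputes the farthest depth stored in one tile.
    void refreshTile(int tx, int ty) {
        std::uint16_t farthest = 0;
        const int yEnd = std::min((ty + 1) * kTile, h_);
        const int xEnd = std::min((tx + 1) * kTile, w_);
        for (int y = ty * kTile; y < yEnd; ++y)
            for (int x = tx * kTile; x < xEnd; ++x)
                farthest = std::max(farthest, depth_[std::size_t(y) * w_ + x]);
        tileMax_[std::size_t(ty) * tilesX_ + tx] = farthest;
    }

    int w_, h_, tilesX_, tilesY_;
    std::vector<std::uint16_t> depth_;
    std::vector<std::uint16_t> tileMax_;
};
```

The test only pays off when geometry is drawn roughly front to back, so that near occluders fill the tiles before the things behind them are tested. A tile that is only partly covered keeps its far value, which keeps the test conservative: it can fail to cull something hidden, but it never culls something visible.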
Ultimately the goal is 60fps at 1024×768 on a 1 GHz CPU. Setting aside the final blit (which currently takes 11ms at 1680×1050 because of GDI), I’m close to that goal now, but the renderer still has to get faster using the above methods in order to leave time for gameplay (collision detection, AI, combat, etc.).
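To put those numbers in perspective, the small sketch below works out the frame budget they imply. The helper names are invented for this example, and the cycles-per-pixel figure is only an average that ignores overdraw and everything else the CPU has to do in a frame.

```cpp
#include <cassert>

// Milliseconds available per frame at a target frame rate.
inline double frameBudgetMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Average CPU cycles available per pixel per frame if the entire frame were
// spent on pixels (it is not: gameplay needs its share too).
inline double cyclesPerPixel(double clockHz, int width, int height, double fps) {
    assert(width > 0 && height > 0 && fps > 0.0);
    return clockHz / (double(width) * double(height) * fps);
}
```

At 60fps, `frameBudgetMs(60.0)` is about 16.7ms, so an 11ms blit on its own would use roughly two thirds of a frame. `cyclesPerPixel(1.0e9, 1024, 768, 60.0)` comes out at about 21 cycles per pixel, which has to cover rasterization, the final blit and gameplay combined.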
SSE will only be used when available, potentially widening the CPU support if you run at low resolutions (e.g. 320×200 or 640×480).
Hopefully this will allow anyone with a semi-modern computer (i.e. a computer purchased within the last 11 years) to play DaggerXL at good framerates, at least with the basic feature set. In addition, the hardware renderer will go through another round of optimizations, along with general program speed improvements and faster loading at startup.