Another year of working on rendering and game data formats in Clover’s Toy Box.
There were 191 code revisions adding 16,000 new lines of code (including comments, blank lines, and 2,048 lines for a vertex normal table). That is about 20% of Clover’s Toy Box’s ~83,000 lines of code (not including third-party code). There are an additional 9,000 lines of uncommitted code, going back to at least 2018, for unfinished features and debug code.
Quake 3 BSP Rendering
I continued working on Quake 3 BSP support. Building on what I wrote last year, I polished some things up and added a proper BSP rendering API. I was over-complicating it but the API is still missing features. BSP files now have a model handle that can be used with the generic AddModelToScene() and the BSP-specific AddWorldModelToScene(), which culls surfaces based on the view point.
I added initial support for BSP curved bézier patches but there is still more work to do. I fixed Wolfenstein: Enemy Territory external lightmaps to apply r_mapOverBrightBits. I added support for Wolf:ET instanced foliage meshes. In testing on Wolf:ET’s radar map there was better performance when uploading all foliage vertexes and triangle indexes each frame instead of drawing each foliage instance with a separate draw call with pre-uploaded vertexes/triangle indexes. (Maybe it would benefit from actual instanced rendering but it wasn’t supported by the 2008 Intel graphics I was using at the time. I also don’t have Wolf:ET foliage distance culling.)
“Toy Box renderer for Spearmint” is indistinguishable from the Spearmint opengl1 renderer, aside from curved patches, once some opengl1 features are disabled (Quake 3 shader files, world lighting on models, dynamic lights, wallmarks, sky, TrueType fonts). A screenshot comparison does show minor differences over textures everywhere, though; maybe texture filtering differences(?), the post-process framebuffer object, or something else.
Texture coordinates are incorrect on curved bézier patches if the control points are not equally spaced. (This is the case in Turtle Arena’s subway map for no particular reason.) I think the issue is that the texture coords at a point on the curve need to use the fraction between the control points instead of applying the same quadratic formula as the 3D position.
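For reference, the quadratic formula mentioned above can be sketched for a single curve of three control points. This is a generic illustration, not the Toy Box code: the names BezierWeights, BezierScalar, and BezierPoint are made up here, and a Quake 3 patch applies the same idea in two directions over a 3×3 grid of control points. Whether texture coordinates should go through the same function is exactly the open question.

```c
#include <assert.h>

typedef float vec3_t[3];

/* Weights of the three control points at parameter t (0 to 1). */
static void BezierWeights( float t, float w[3] ) {
	float inv = 1.0f - t;

	w[0] = inv * inv;
	w[1] = 2.0f * inv * t;
	w[2] = t * t;
}

/* Evaluate one component of the curve (for example one coordinate). */
static float BezierScalar( float p0, float p1, float p2, float t ) {
	float w[3];

	assert( t >= 0.0f && t <= 1.0f );
	BezierWeights( t, w );
	return w[0] * p0 + w[1] * p1 + w[2] * p2;
}

/* Evaluate a 3D position on the curve defined by p0, p1, p2. */
static void BezierPoint( const vec3_t p0, const vec3_t p1, const vec3_t p2, float t, vec3_t out ) {
	int i;

	for ( i = 0; i < 3; i++ ) {
		out[i] = BezierScalar( p0[i], p1[i], p2[i], t );
	}
}
```

With unevenly spaced control values 0, 1, 4, the curve at t = 0.5 evaluates to 1.5 rather than to the middle control value, which shows how uneven spacing shifts whatever is pushed through this formula.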
I added support for uploading dynamic textures. It’s used by “Toy Box renderer for Spearmint” to display cinematic videos and my Doom fork (based on the original Doom source release) to upload the software rendered Doom frame to use with OpenGL.
I added support for drawing debug lines for model vertex normals (lighting direction). I’m still missing it for BSP drawing and other non-model primitives though.
I added a fallback for decoding DXTn compressed images in software if the OpenGL driver does not support them (mobile hardware usually doesn’t support DXTn).
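A fallback of this kind boils down to expanding each compressed block on the CPU before upload. Below is a minimal sketch for DXT1 only, decoding one 8-byte block into 16 RGBA pixels. The function names are hypothetical, and the rounding of interpolated colors is one common choice; drivers and other decoders may round slightly differently.

```c
#include <assert.h>
#include <stdint.h>

/* Expand a 16-bit RGB565 color to opaque 8-bit RGBA. */
static void Rgb565ToRgba( uint16_t c, uint8_t out[4] ) {
	uint8_t r = ( c >> 11 ) & 31, g = ( c >> 5 ) & 63, b = c & 31;

	out[0] = (uint8_t)( ( r << 3 ) | ( r >> 2 ) );
	out[1] = (uint8_t)( ( g << 2 ) | ( g >> 4 ) );
	out[2] = (uint8_t)( ( b << 3 ) | ( b >> 2 ) );
	out[3] = 255;
}

/* Decode one 8-byte DXT1 block into 16 RGBA pixels (4 rows of 4). */
static void DecodeDXT1Block( const uint8_t block[8], uint8_t rgba[16][4] ) {
	uint16_t c0, c1;
	uint8_t pal[4][4];
	int i, ch;

	assert( block && rgba );

	/* Two little-endian RGB565 endpoint colors. */
	c0 = (uint16_t)( block[0] | ( block[1] << 8 ) );
	c1 = (uint16_t)( block[2] | ( block[3] << 8 ) );
	Rgb565ToRgba( c0, pal[0] );
	Rgb565ToRgba( c1, pal[1] );

	for ( ch = 0; ch < 4; ch++ ) {
		if ( c0 > c1 ) {
			/* Four-color mode: two interpolated colors. */
			pal[2][ch] = (uint8_t)( ( 2 * pal[0][ch] + pal[1][ch] ) / 3 );
			pal[3][ch] = (uint8_t)( ( pal[0][ch] + 2 * pal[1][ch] ) / 3 );
		} else {
			/* Three-color mode: midpoint plus transparent black. */
			pal[2][ch] = (uint8_t)( ( pal[0][ch] + pal[1][ch] ) / 2 );
			pal[3][ch] = 0;
		}
	}

	/* One byte per row, 2 bits per pixel, lowest bits are the leftmost pixel. */
	for ( i = 0; i < 16; i++ ) {
		int index = ( block[4 + i / 4] >> ( 2 * ( i % 4 ) ) ) & 3;

		for ( ch = 0; ch < 4; ch++ ) {
			rgba[i][ch] = pal[index][ch];
		}
	}
}
```

A full image decoder would loop over the blocks of the image and copy each 4×4 result into place; DXT3 and DXT5 add a separate 8-byte alpha block in front of the same color block.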
I did some OpenGL rendering optimization. I combined the GLSL shader text with defines for different variations (such as animation type) so it’s easier to modify. I used shader variations to disable things when not needed (clip plane, colorizing, alpha test, etc) to improve performance; this was a known issue, I just finally got around to it. I changed shader uniform values to only be updated when they change; it had way more impact on performance than I expected.
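One way to only send uniforms when they change is to keep a CPU-side copy of the last uploaded value and compare before calling the driver. The sketch below assumes vec4 uniforms and a small fixed table; uniformCache_t, SetUniform4f, and UploadUniform4f are illustrative names, and UploadUniform4f is a stand-in for the place where glUniform4fv() would be called so the example runs without an OpenGL context.

```c
#include <assert.h>
#include <string.h>

#define MAX_UNIFORMS 8

typedef struct {
	float value[MAX_UNIFORMS][4]; /* last value sent to the driver */
	int   valid[MAX_UNIFORMS];    /* 0 until the first upload */
	int   uploads;                /* count of real uploads, for profiling */
} uniformCache_t;

/* Stand-in for the real driver call (hypothetical; no OpenGL needed). */
static void UploadUniform4f( uniformCache_t *cache, int location, const float v[4] ) {
	(void)location;
	(void)v;
	cache->uploads++;
}

/* Upload only if the value differs from what was last uploaded. */
static void SetUniform4f( uniformCache_t *cache, int location, const float v[4] ) {
	assert( location >= 0 && location < MAX_UNIFORMS );

	if ( cache->valid[location]
		&& memcmp( cache->value[location], v, sizeof( float ) * 4 ) == 0 ) {
		return; /* unchanged, skip the driver call */
	}

	memcpy( cache->value[location], v, sizeof( float ) * 4 );
	cache->valid[location] = 1;
	UploadUniform4f( cache, location, v );
}
```

Uniform values belong to a shader program, so a real cache would be kept per program (or per shader variation).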
I looked at OpenGL function calls in apitrace, which helped track down excessive setting of shader uniforms. It also showed that draw calls for text in the (Toy Box) menu were not combined very well. The text is drawn as opaque with a transparent black shadow. I did not combine opaque and forced transparent drawing (blending is slower). This led to each menu item’s text and text shadow being a separate draw call, for every menu item. 20 text buttons? 40 draw calls (this slows down rendering).
The text image always uses alpha blending, so I corrected it to merge regardless of forced transparency when the image has alpha blending. Now 20 text buttons can be 1 draw call. It’s still not perfect: if the menu item has a checkbox (image) it breaks up merging text with the next menu item.
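The merging rule can be sketched as a pass over a list of draw commands. The struct fields and function names here are assumptions for illustration, not the actual Toy Box data structures. Two adjacent commands merge when they use the same image, end up with the same blending state, and reference contiguous index ranges; a command whose image already needs alpha blending counts as blended whether or not transparency was forced.

```c
#include <assert.h>

typedef struct {
	int image;            /* texture handle */
	int imageHasAlpha;    /* the image itself requires blending */
	int forceTransparent; /* caller asked for blending (e.g. a faded shadow) */
	int firstIndex;
	int numIndexes;
} drawCmd_t;

static int UsesBlending( const drawCmd_t *cmd ) {
	return cmd->imageHasAlpha || cmd->forceTransparent;
}

static int CanMerge( const drawCmd_t *a, const drawCmd_t *b ) {
	return a->image == b->image
		&& UsesBlending( a ) == UsesBlending( b )
		&& a->firstIndex + a->numIndexes == b->firstIndex;
}

/* Merge adjacent compatible commands in place; returns the new count. */
static int MergeDrawCmds( drawCmd_t *cmds, int count ) {
	int i, out = 0;

	assert( count >= 0 );

	for ( i = 0; i < count; i++ ) {
		if ( out > 0 && CanMerge( &cmds[out - 1], &cmds[i] ) ) {
			cmds[out - 1].numIndexes += cmds[i].numIndexes;
		} else {
			cmds[out++] = cmds[i];
		}
	}
	return out;
}
```

Only adjacent commands are merged, so draw order is preserved. A command with a different image in the middle of the list (such as a checkbox) still splits the batch, which matches the remaining limitation described above.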
I looked at performance in the Linux tool “perf” and found a couple of areas to make minor CPU optimizations.
For a less than 1% CPU-time improvement, I changed how each vertex’s lightmap texcoords are set to the base texcoords: the code now directly sets the same value as base (vert[i].st[ST_LIGHTMAP] = s) instead of reading it back from the base texcoords (vert[i].st[ST_LIGHTMAP] = vert[i].st[ST_BASE]). It’s done for all vertexes, even those that are not using a lightmap. I may change it from “lightmap” to “filter” in the future for blending a texture over text.
For another less than 1% CPU-time improvement, I changed a lot of vertexes[base+i].xyz, etc. to use a vert pointer in the functions that loop through setting up all the vertexes for dynamic geometry (like 2D drawing and sprites).
Ideas for performance improvement don’t always work. My renderer was built around 16-bit vertex indexes because 32-bit indexes may not be supported by OpenGL ES. I thought desktop OpenGL might be optimized for 32-bit indexes so I added dual support for 16- and 32-bit vertex indexes, selected at run-time. For my GTX 750 Ti at least, it didn’t make a difference.
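Run-time selection between 16- and 32-bit indexes can be as small as a size tag carried next to the index buffer. This is a hypothetical sketch (indexSize_t, ChooseIndexSize, and StoreIndex are invented names); the values 2 and 4 are just the byte sizes, and a real renderer would map them to GL_UNSIGNED_SHORT or GL_UNSIGNED_INT at draw time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { INDEX_16 = 2, INDEX_32 = 4 } indexSize_t; /* bytes per index */

/* Use 32-bit indexes only when supported and requested. */
static indexSize_t ChooseIndexSize( int have32BitIndexes, int prefer32BitIndexes ) {
	return ( have32BitIndexes && prefer32BitIndexes ) ? INDEX_32 : INDEX_16;
}

/* Write one vertex index into a buffer of the chosen index size. */
static void StoreIndex( void *buffer, indexSize_t size, size_t slot, uint32_t vertex ) {
	if ( size == INDEX_32 ) {
		( (uint32_t *)buffer )[slot] = vertex;
	} else {
		assert( vertex <= 0xFFFFu ); /* must fit in 16 bits */
		( (uint16_t *)buffer )[slot] = (uint16_t)vertex;
	}
}
```

Keeping the choice in one value means the geometry setup code does not need two copies of every loop, only of the final store.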
I started developing a game engine and OpenGL ES 2 renderer under the working title Clover’s Toy Box at the beginning of 2017. After 5 years I’ve finally caught up with the performance of the ioquake3/Spearmint opengl1 renderer. There is still a lot missing so I can’t quite brag about outperforming a 20-year-old renderer. It has been way more difficult than I expected though.
My renderer is still missing many features for Quake 3 (Quake 3 .shader files, lighting, wallmarks, sky, …) and I disabled some features in Toy Box renderer because they currently have very poor performance (curves and overbright post-process). I disabled some features in ioquake3/Spearmint to make it more comparable. There will be more performance comparisons in the future (hopefully better documented) after more features are implemented.
So yes, these are preliminary tests that may be misrepresentative.
2008 Intel (ioquake3)
Due to life circumstances, 2 years of the development was done on a 2010 Dell Optiplex 780 with a 2008 Intel CPU with integrated graphics (OpenGL 2.1). Tested with a 1280×1024 display because I’m too lazy to hook a 1080p monitor back up to it.
I ran timedemo 1; demo four in ioquake3 and measured frames per-second at the initial spawn in map tvy-bench (which has a lot of geometry). Higher frames per-second (FPS) is better.
timedemo 1; demo four:
12.1 seconds 104.5 fps – Toy Box renderer for ioquake3
14.5 seconds 86.9 fps – opengl1
43.3 seconds 29.1 fps – opengl2
map tvy-bench (initial spawn):
30 FPS – Toy Box renderer for ioquake3
30 FPS – opengl1
16 FPS – opengl2
opengl2 = 0.3× speed of opengl1
Toy Box renderer = 1× to 1.2× speed of opengl1 (3× to 3.6× speed of opengl2)
ioquake3’s opengl2 renderer is known to perform better on NVIDIA and worse on Intel and AMD graphics. This comparison does not play to its strengths.
2014 NVIDIA (Spearmint)
I finally got a new CPU/motherboard this year so I could use my NVIDIA graphics card again. Ryzen 5800x + NVIDIA GTX 750 Ti at 1920×1080 resolution.
timedemo 1; demo four is 1.3 seconds 995~1000 FPS for both opengl1 and Toy Box renderer for ioquake3. (I did not include opengl2 in my notes for this test.)
I tested the performance of Raph MD3 (vertex animation) and IQM (skeletal animation) in Turtle Arena running on Spearmint. The milliseconds to render a frame are listed, but the minimum is 1 due to Spearmint engine limitations. Higher frames per-second (FPS) is better.
Turtle Arena (map team1) with 15 MD3 players:
1 millisecond – 1000 FPS – Toy Box renderer for Spearmint
4 milliseconds – 250 FPS – opengl2
7 milliseconds – 150 FPS – opengl1
Turtle Arena (map team1) with 15 IQM players:
1.1 milliseconds – 900 FPS – Toy Box renderer for Spearmint
5 milliseconds – 200 FPS – opengl2
8 milliseconds – 125 FPS – opengl1
Turtle Arena (map team1) with 63 MD3 players:
8 milliseconds – 125 FPS – Toy Box renderer for Spearmint
25 milliseconds – 40 FPS – opengl2
50 milliseconds – 20 FPS – opengl1
Turtle Arena (map team1) with 63 IQM players:
10 milliseconds – 100 FPS – Toy Box renderer for Spearmint
33 milliseconds – 30 FPS – opengl2
66 milliseconds – 15 FPS – opengl1
opengl2 = 2× speed of opengl1
Toy Box renderer = 6× speed of opengl1 (3× speed of opengl2)
It may be interesting to note: The Toy Box renderer still only uses OpenGL features supported by 2008 Intel graphics (Mesa i965 OpenGL driver). Though some of the features are part of OpenGL 4.x. There is still room for improvement using newer hardware features.
I made various model and image handling improvements. Saving images can convert between DXTn compression and uncompressed formats. The model loader intermediate state (used by many unoptimized formats) can be saved to IQE and OBJ. More work is needed for general model exporting support.
I added loading additional archive formats:
- Doom (.WAD)
- Sonic Robo Blast 2’s compressed .WAD (ZWAD, rarely used)
- Sin (.SIN)
- Daikatana (.PAK)
- Anachronox (.DAT)
I added loading additional model formats:
- Anachronox: .MD2
- Heavy Metal F.A.K.K.²: .TAN
- Return to Castle Wolfenstein: .MDC
- Inter-Quake Export: .IQE — animation incomplete
- Misfit Model 3D: .MM3D — material/animation incomplete
- Sonic R: .BIN — animation incomplete
- Doom level — very incomplete
I added loading additional image formats:
- DirectDraw Surface .DDS (DXTn only) — There is special handling for S3Quake3 and Iron Grip: Warlord DDS files.
- Doom graphics (flat, patch, DeePSea tall patches)
- Serious Sam 2 (demo) textures (DXTn, uncompressed)
- A.J. Freda’s .BUZ (used by Wizard, Roly Poly Putt, and SRBX but only Wizard is fully supported)
I also added saving .DDS, Doom flats, and Serious Sam 2 (demo) textures.
Anachronox .MD2 is similar to Quake 2 MD2 but adding support for it was quite involved. The only documentation that I could find online was incomplete and/or inaccurate. It has an odd vertex mode with 11-10-11 bit position encoding. I spent a lot of time trying to figure out the vertex normal encoding. In the end I created a table of 2,048 vertex normal directions by loading a custom Anachronox MD2 model in the Anachronox particle application with vertex normal debug lines enabled and using apitrace to get the line end points.
I tried to scan the Anachronox EXE for the floating point vertex normal table but all I could find was the Quake 2 MD2 vertex normal table, so either it’s not stored as floating point or it’s generated somehow. (X values clearly go from approximately -1 to 1 in the table but Y and Z values jump around in a way that doesn’t make sense to me.) I think I was only able to complete Anachronox MD2 thanks to Anachronox’s particle application, which displays model information (like 11-10-11 bit XYZ encoding) and vertex directions. It doesn’t make me excited to try figuring out other model formats but I’m doing it anyway (though currently less obscure formats).
Anachronox MD2 has “tagged surfaces” which are named triangles(?). I haven’t figured out what to do with them yet (are they a position/rotation like MD3 tags? or for applying an effect to the mesh?).
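Unpacking an 11-10-11 bit position from a 32-bit value takes a few shifts and masks. The sketch below is an assumption-heavy illustration: it guesses that X is in the lowest 11 bits, then Y in 10 bits, then Z in the top 11 bits, and that the integers are then scaled and translated per frame the way Quake 2 MD2 does it. The actual bit order and scaling in Anachronox files are not confirmed here, and the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Split a packed value into X (11 bits), Y (10 bits), Z (11 bits).
   ASSUMPTION: X occupies the lowest bits; the real order may differ. */
static void UnpackXYZ111011( uint32_t packed, uint32_t xyz[3] ) {
	assert( xyz );

	xyz[0] = packed & 0x7FFu;           /* bits 0-10 */
	xyz[1] = ( packed >> 11 ) & 0x3FFu; /* bits 11-20 */
	xyz[2] = ( packed >> 21 ) & 0x7FFu; /* bits 21-31 */
}

/* ASSUMPTION: per-frame scale and translate, as in Quake 2 MD2 frames. */
static void DecodePosition( uint32_t packed, const float scale[3],
		const float translate[3], float out[3] ) {
	uint32_t xyz[3];
	int i;

	UnpackXYZ111011( packed, xyz );
	for ( i = 0; i < 3; i++ ) {
		out[i] = (float)xyz[i] * scale[i] + translate[i];
	}
}
```

Note that Y gets one bit less than X and Z, so under this layout its range is 0 to 1023 while the others reach 2047.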
Clover Resource Utility
Clover Resource Utility (a Qt 5 application) gained High DPI support (for PinePhone), checkered grey background visible for transparent images, and work to keep up with other development (new BSP rendering, option to display vertex normals).
I also fixed a long-running issue where resizing the window while viewing an image would turn the image black. The framebuffer object (FBO) recreation on window resize set the bound image to 0 without updating my OpenGL state tracker, so my renderer thought the displayed image was still bound and did not bind it again. Instead, 0 (no image) was used.
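The bug pattern is easy to reproduce in miniature. In this sketch glState_t and the functions are hypothetical, and DriverBind stands in for glBindTexture() so it runs without OpenGL. The tracker skips binds it believes are redundant, so any code that changes the binding behind its back leaves it with a stale value.

```c
#include <assert.h>

typedef struct {
	unsigned boundImage;  /* what the state tracker believes is bound */
	unsigned driverImage; /* stand-in for the real driver-side binding */
	int      bindCalls;   /* number of real bind calls made */
} glState_t;

/* Stand-in for the real bind call (hypothetical; no OpenGL needed). */
static void DriverBind( glState_t *state, unsigned image ) {
	state->driverImage = image;
	state->bindCalls++;
}

/* Bind through the tracker; redundant binds are skipped. */
static void BindImage( glState_t *state, unsigned image ) {
	assert( state );

	if ( state->boundImage == image ) {
		return;
	}
	DriverBind( state, image );
	state->boundImage = image;
}

/* Buggy resize path: changes the binding without telling the tracker. */
static void RecreateFramebufferBuggy( glState_t *state ) {
	DriverBind( state, 0 );
}

/* Fixed resize path: the tracker sees the change. */
static void RecreateFramebufferFixed( glState_t *state ) {
	BindImage( state, 0 );
}
```

After the buggy path, a later BindImage() for the displayed image is skipped and the driver keeps using image 0; after the fixed path, the same call rebinds the image.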
I don’t know why it took so many years to think of how to make it easier to pass vec3 as separate arguments without copying and pasting it a bunch of times.
/* Expand a vec3 or quaternion into separate arguments. */
#define VEC3_ARGS( x ) (x)[0], (x)[1], (x)[2]
#define QUAT_ARGS( x ) (x)[0], (x)[1], (x)[2], (x)[3]

Com_Printf( "Translate %f,%f,%f, Rotate %f,%f,%f,%f, Scale %f,%f,%f.\n", VEC3_ARGS( joint.translate ), QUAT_ARGS( joint.rotate ), VEC3_ARGS( joint.scale ) );
I don’t really have plans I’m very focused on right now. I started adding support for additional model formats. I still want to eventually reach feature parity with Spearmint opengl1 renderer. I may also try to work on some other area this year.
March 1, 2022: I corrected the article to state the renderer has been under development for 5 years, not 4 years as originally written.