Clover’s Toy Box development: improved performance 650%*, fixed curved surfaces, mirrors/portals, and large levels.
* 650% improvement in one level on one set of hardware.
Performance
It took a couple of years to get my reimplementation of the Quake 3 renderer (“Toy Box renderer for Spearmint”) to the performance of the ioquake3/Spearmint “opengl1” renderer. It was very challenging to match that performance despite using more modern OpenGL features, and my renderer was still missing many features. I also had to disable curved surfaces for it to be faster.
My renderer fell behind again after upgrading to newer, better hardware and adding additional features (notably Quake 3 materials).
Most of the official Quake 3 maps ran at 1000 frames per-second (FPS). The slowest case that I was aware of was at the center of the Quake 3 add-on level ct3ctf2. It ran at only 100 FPS. The Spearmint opengl1 renderer runs the same spot at somewhere between 500 and 666 FPS and the OpenGL2 renderer at somewhere between 800 and 1000 FPS. That’s kind of disappointing for my renderer. (Higher frames per-second is better.)
After making several changes I’ve clawed my way from 100 FPS to 650 FPS at the center of ct3ctf2. A 650% improvement. (These changes will improve other levels as well but it’s not 650% everywhere.) This is hopefully only the tip of the performance iceberg.
The main goal of improving the frame rate is to have headroom to render more content (at a lower frame rate) or to use less power to render the same content (which is particularly relevant for mobile devices).
BeginPerformanceQuery
I reviewed what it was drawing for the ct3ctf2 level. It was uploading 300,000 vertexes and issuing 1,000 draw calls per-frame at 100 FPS.
The curved surfaces were using materials that required the CPU vertex shader and uploaded 3x the vertexes (one copy for each material layer) each frame. Additionally, the surface order interleaved flat surfaces (using a static vertex buffer) and curves (using a dynamic vertex buffer). This prevented surfaces from being merged into shared draw calls despite using the same materials.
For example, a material had a base texture, an environment map (using the CPU vertex shader), and then a standalone lightmap (using the CPU vertex shader).
1. BSP surface merging
I changed the Quake 3 level surface sorting to sort flat (static vertex buffer) and curved (dynamic vertex buffer) surfaces together so they can be merged into single flat or curve draw calls.
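As a rough illustration of the idea (not the actual Toy Box code; the `Surface` struct and the function names are made up for this sketch), sorting by material and then by buffer type puts mergeable surfaces next to each other, so a draw call only has to start when the material or buffer type changes:

```c
#include <stdlib.h>

/* Hypothetical surface record; field names are illustrative only. */
typedef struct {
    int materialId; /* which material the surface uses */
    int isCurve;    /* 0 = flat (static buffer), 1 = curve (dynamic buffer) */
} Surface;

/* Order by material first, then by buffer type, so mergeable surfaces are adjacent. */
static int CompareSurfaces(const void *a, const void *b) {
    const Surface *sa = (const Surface *)a;
    const Surface *sb = (const Surface *)b;
    if (sa->materialId != sb->materialId)
        return (sa->materialId > sb->materialId) - (sa->materialId < sb->materialId);
    return sa->isCurve - sb->isCurve;
}

static void SortSurfaces(Surface *surfaces, int count) {
    qsort(surfaces, (size_t)count, sizeof(Surface), CompareSurfaces);
}

/* A new draw call starts whenever the material or buffer type changes. */
static int CountDrawCalls(const Surface *surfaces, int count) {
    int calls = 0;
    for (int i = 0; i < count; i++) {
        if (i == 0 ||
            surfaces[i].materialId != surfaces[i - 1].materialId ||
            surfaces[i].isCurve != surfaces[i - 1].isCurve)
            calls++;
    }
    return calls;
}
```

With six surfaces that alternate flat/curve across two materials, the interleaved order needs six draw calls while the sorted order needs four (one flat and one curve call per material).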
2. Standalone lightmap
Lightmaps are often merged with an adjacent layer to use multi-texture. Both layers are drawn in a single draw call. However multiply blended lightmaps cannot be merged with the additive blended environment map and some other effects. This separate lightmap layer used the CPU vertex shader to copy the lightmap texture coordinate from the multi-texture slot to the base slot.
I could do the same thing in modern OpenGL shaders by adding a texcoord source or an additional shader type (both of which have more overhead). For OpenGL 1.x fixed-function, I would potentially need to rebind the vertex attributes per draw call for the world. It seemed like a mess to do both of these in the same backend.
Instead I made standalone lightmap layers use multi-texture with a white base image. No CPU vertex shader or complicated re-architecture of the backend is needed. (OpenGL 1.2 is needed for multi-texture so OpenGL 1.1 still falls back to the CPU vertex shader.)
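The reason a white base image works: when the two texture units are multiplied together per channel, multiplying by white (255) leaves the lightmap value unchanged. A tiny CPU-side sketch of that math, assuming 8-bit channels and a modulate combine (the function names are hypothetical):

```c
/* Per-channel modulate on 8-bit values: result = a * b / 255, rounded. */
static unsigned char Modulate(unsigned char a, unsigned char b) {
    return (unsigned char)((a * b + 127) / 255);
}

/* Two-unit multi-texture: unit 0 samples the base image,
   unit 1 multiplies the lightmap into it. */
static unsigned char ShadeTexel(unsigned char baseTexel, unsigned char lightmapTexel) {
    return Modulate(baseTexel, lightmapTexel);
}
```

With a white base texel, `ShadeTexel(255, lightmap)` returns the lightmap value as-is, which is the same result a lightmap-only layer would produce.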
3. Duplicate vertexes
The CPU vertex shader has to add draw calls for each layer. This added vertexes for each layer with a TODO for separating vertex upload from layers. As a quick solution, I made layers that do not require the CPU vertex shader share the same unmodified vertexes. This allowed the base and standalone lightmap layers to use the same vertexes.
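To put numbers on the saving, here is a small sketch (hypothetical `Layer` struct and function names, not the real backend) that counts uploaded vertexes when every layer gets its own copy versus when all layers that skip the CPU vertex shader share one unmodified copy:

```c
/* Hypothetical material layer; only the flag relevant to this sketch. */
typedef struct {
    int needsCpuVertexShader; /* 1 if the layer modifies vertexes on the CPU */
} Layer;

/* Every layer uploads its own copy of the surface vertexes. */
static int VertexesPerLayerCopy(int numLayers, int numVertexes) {
    return numLayers * numVertexes;
}

/* Layers that do not need the CPU vertex shader share one unmodified copy;
   each CPU-processed layer still uploads its own modified copy. */
static int VertexesShared(const Layer *layers, int numLayers, int numVertexes) {
    int cpuCopies = 0;
    int sharedCopy = 0;
    for (int i = 0; i < numLayers; i++) {
        if (layers[i].needsCpuVertexShader)
            cpuCopies++;
        else
            sharedCopy = 1;
    }
    return (cpuCopies + sharedCopy) * numVertexes;
}
```

For a base layer, a CPU-processed environment map, and a standalone lightmap on a 100 vertex surface, this goes from 300 uploaded vertexes to 200.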
4. Hardware tcGen
In the previous article on adding Quake 3 material support I talked about texture coordinate generation (tcGen) and how it can be done in hardware. I added support to OpenGL shaders so it doesn’t need the CPU vertex shader. (I haven’t implemented it for OpenGL fixed-function and Wii yet.)
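For reference, the environment map tcGen in the released Quake 3 source reflects the direction to the viewer about the vertex normal and maps two components of the result into texture space. The sketch below writes that math as plain C so it can be read as what a vertex shader would compute per vertex; the `Vec3` type and function name are assumptions for this example, and both input vectors are expected to be unit length already (a shader would call `normalize()`):

```c
typedef struct { float x, y, z; } Vec3;

/* normal:   unit surface normal at the vertex
   toViewer: unit vector from the vertex toward the camera
   st:       receives the generated texture coordinates */
static void EnvironmentTexCoords(Vec3 normal, Vec3 toViewer, float st[2]) {
    float d = normal.x * toViewer.x + normal.y * toViewer.y + normal.z * toViewer.z;

    /* Reflect the viewer direction about the normal. */
    float reflectedY = normal.y * 2.0f * d - toViewer.y;
    float reflectedZ = normal.z * 2.0f * d - toViewer.z;

    /* Map the reflection from [-1, 1] into [0, 1] texture space. */
    st[0] = 0.5f + reflectedY * 0.5f;
    st[1] = 0.5f - reflectedZ * 0.5f;
}
```

Because the result only depends on the vertex position, normal, and camera origin, nothing here needs to run on the CPU each frame.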
5. View frustum culling
I’ve supported the Quake 3 level’s Potentially Visible Set (PVS) for a while to skip rendering areas of the map that can’t be seen. However this doesn’t work well for large open areas.
I added view frustum culling to skip drawing level surfaces and models that are not in front of the camera within the view angle. I wrote most of this code in 2021 or earlier but I didn’t merge it due to a rendering issue that has apparently been resolved since then (a few specific surfaces in some maps disappeared when clearly on screen).
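A common way to do this kind of test, shown here as a minimal sketch rather than the actual Toy Box code, is to check a bounding box against each frustum plane and skip the object if the box is entirely behind any one of them. The `Plane` and `Bounds` types and the plane convention (inside means `dot(normal, p) - dist >= 0`) are assumptions for this example:

```c
typedef struct {
    float normal[3]; /* points toward the inside of the frustum */
    float dist;      /* dot(normal, p) == dist for points on the plane */
} Plane;

typedef struct {
    float mins[3];
    float maxs[3];
} Bounds;

/* Returns 1 if the box is completely behind at least one plane
   (safe to skip drawing), otherwise 0. */
static int BoundsCulled(const Bounds *b, const Plane *planes, int numPlanes) {
    for (int i = 0; i < numPlanes; i++) {
        const Plane *p = &planes[i];
        /* Signed distance of the box corner farthest along the plane normal. */
        float d = -p->dist;
        for (int axis = 0; axis < 3; axis++) {
            float corner = (p->normal[axis] >= 0.0f) ? b->maxs[axis] : b->mins[axis];
            d += p->normal[axis] * corner;
        }
        if (d < 0.0f)
            return 1; /* even the farthest corner is behind this plane */
    }
    return 0;
}
```

This test is conservative: a box near a frustum corner can pass every plane check while still being off screen, which only costs a little extra drawing and never removes something visible.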
6. Curve tessellation
The curved surfaces in the Quake 3 level formats are Bézier curves that have a 3 by 3 patch of control points to define the mathematical curve. I convert it to triangles (tessellate) in order to draw it.
I originally implemented it as tessellating each frame and at a fixed number of triangles, even if, say, it’s a flat square and only needs 2 triangles. I knew it was slow but it seemed easier at the time when I was trying to get it to work at all. There was an issue that some vertical rounded corners had incorrect lightmap texture coordinates. I tried unsuccessfully to fix it two or three times over the last 3 years. Improving performance was kind of on hold for this reason.
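As a sketch of the underlying math (the `Vec3` type and function names are made up for this example), a point on a 3 by 3 patch can be evaluated by weighting the nine control points with quadratic Bernstein weights in each direction. Tessellating then means evaluating a grid of (u, v) values and connecting neighbors into triangles:

```c
typedef struct { float x, y, z; } Vec3;

/* Quadratic Bernstein weights for parameter t in [0, 1]. */
static void QuadraticWeights(float t, float w[3]) {
    float s = 1.0f - t;
    w[0] = s * s;
    w[1] = 2.0f * s * t;
    w[2] = t * t;
}

/* Evaluate a 3x3 biquadratic Bezier patch at (u, v).
   control[row][col]: u runs along columns, v along rows. */
static Vec3 EvaluatePatch(Vec3 control[3][3], float u, float v) {
    float wu[3], wv[3];
    Vec3 p = { 0.0f, 0.0f, 0.0f };
    QuadraticWeights(u, wu);
    QuadraticWeights(v, wv);
    for (int row = 0; row < 3; row++) {
        for (int col = 0; col < 3; col++) {
            float w = wv[row] * wu[col];
            p.x += w * control[row][col].x;
            p.y += w * control[row][col].y;
            p.z += w * control[row][col].z;
        }
    }
    return p;
}
```

The surface passes through the four corner control points but is only pulled toward the other five, so the middle row and column shape the curve without lying on it.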
I found the lightmap texture coordinate issue while experimenting with limiting the number of rows/columns. It turns out Quake 3 levels have invalid lightmap texcoords if the control points are not equally spaced. The vertical rounded corners are flat vertically and rounded horizontally. Quake 3 doesn’t add any rows; it’s just single triangles from the top to bottom. It doesn’t use the middle row of control points with invalid lightmap texture coordinates.
I completely rewrote the tessellation code. Curves are now tessellated at level load instead of each frame. The curves are now subdivided based on how curved they are (fewer triangles in most cases). Curves could have gaps between touching patches due to how they’re subdivided now, so curves are now stitched together by detecting common edges and adding rows and columns to adjacent patches.
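One simple way to decide how much to subdivide, given here as an assumption-laden sketch and not necessarily what Toy Box or Quake 3 does, is to measure how far the middle of a quadratic curve sits from the straight line between its end points. Splitting a quadratic curve in half quarters that deviation, so the segment count doubles until the error is under a tolerance. The `Vec3` type, function name, and tolerance parameter are hypothetical:

```c
typedef struct { float x, y, z; } Vec3;

/* Number of straight segments for the quadratic Bezier curve (a, b, c) so
   that it stays within `tolerance` units of the segments, up to maxSegments. */
static int SegmentsForCurve(Vec3 a, Vec3 b, Vec3 c, float tolerance, int maxSegments) {
    /* Offset of the curve midpoint from the chord midpoint: (2b - a - c) / 4. */
    float dx = (2.0f * b.x - a.x - c.x) * 0.25f;
    float dy = (2.0f * b.y - a.y - c.y) * 0.25f;
    float dz = (2.0f * b.z - a.z - c.z) * 0.25f;

    /* Compare squared lengths to avoid a square root. */
    float deviationSq = dx * dx + dy * dy + dz * dz;
    float toleranceSq = tolerance * tolerance;

    int segments = 1;
    while (deviationSq > toleranceSq && segments < maxSegments) {
        deviationSq /= 16.0f; /* deviation drops to 1/4, so its square to 1/16 */
        segments *= 2;
    }
    return segments;
}
```

A row of evenly spaced control points in a straight line gets a single segment, while a 64 unit rounded corner with a 1 unit tolerance gets 8. Note that this measure treats control points that are in a line but unevenly spaced as curved, so a real implementation would likely want a stricter flatness check.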
Curves are now faster to draw and match Quake 3 visually. (Though I haven’t added dynamic lower detail far away yet.)
EndPerformanceQuery
The center of ct3ctf2 has moved from uploading 300,000 vertexes per-frame to only 3,500 vertexes and from 1,000 draw calls to 500 draw calls. 100 FPS to now 650 FPS. I still have more ideas for improving performance but I got sidetracked on adding features again.
Mirrors and Portals
I previously added mirror and portal rendering support in Toy Box but it has fallen into disrepair. I hadn’t ever hooked it up for Quake 3 maps or “Toy Box renderer for Spearmint”.
I fixed the mirrors in Toy Box to handle the framebuffer object support that was added ages ago and fixed the OpenGL 1 clip plane being rotated by whatever GL_MODELVIEW matrix was previously set.
However remember that new view frustum culling? Yeah, the culling for the main view applied to the mirror views so mirrors didn’t draw anything behind the player. Sprites also faced the main view instead of the mirror view. I was in mirror hell for a month and a half. I didn’t want to work on it for whatever reason but felt like I shouldn’t do something else so I just didn’t work on Toy Box very much as a result.
When the level model was added to the scene it immediately performed culling and added entities for the visible geometry. I was able to add mirror/portal entities here.
In my mirror system, it drew a model to the stencil buffer and only drew the mirror view in the marked area. (This allows multiple mirrors in the main view without issues.) I was hung up for a while on how to draw the surface for Quake 3 mirrors. As an initial hack I just used the Quake 3 explosion model (it’s just a square) so I could continue working on it.
If I add the mirror/portal after the CPU vertex shader processes the material, it may have the wrong surface normal for the camera. So the mirror needs to use the source surface. However, the material could move the surface around, so limiting the mirror to the source surface area is not correct. Quake 3 only draws one mirror/portal view; it draws it on the whole screen and then draws the level over it. That’s ultimately what I decided to do. It’s essentially what I was doing with the square model scaled up but with fewer steps. This allowed mirrors to work but with the wrong view culling and sprite orientation.
I changed adding the level model to just create an entity with the information and later, when processing entities for a scene, actually add the entities for the visible geometry and mirror/portal surfaces. This was not entirely straightforward but it had been on my TODO list for a while.
After this I made processing entities for a scene end with looping through the mirrors and adding mirror view entities, which have their own list of entities to draw. This included culling and adding level models and generating sprite vertexes for the mirror view.
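Culling and sprite orientation for a mirror both depend on the mirror view’s own camera rather than the main one. The post doesn’t spell out how that camera is derived, so the following is only the textbook approach: reflect the main camera position and view direction across the mirror plane. The `Vec3` and `Plane` types and function names are assumptions for this sketch, and the plane normal is expected to be unit length:

```c
typedef struct { float x, y, z; } Vec3;

typedef struct {
    Vec3 normal; /* unit length */
    float dist;  /* dot(normal, p) == dist for points on the plane */
} Plane;

static float Dot(Vec3 a, Vec3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Mirror a position across the plane. */
static Vec3 ReflectPoint(Vec3 p, Plane plane) {
    float d = Dot(plane.normal, p) - plane.dist; /* signed distance to the plane */
    Vec3 r = { p.x - 2.0f * d * plane.normal.x,
               p.y - 2.0f * d * plane.normal.y,
               p.z - 2.0f * d * plane.normal.z };
    return r;
}

/* Mirror a direction across the plane (directions ignore the plane offset). */
static Vec3 ReflectDirection(Vec3 v, Plane plane) {
    float d = Dot(plane.normal, v);
    Vec3 r = { v.x - 2.0f * d * plane.normal.x,
               v.y - 2.0f * d * plane.normal.y,
               v.z - 2.0f * d * plane.normal.z };
    return r;
}
```

For a mirror on a wall at x = 5, a camera at (2, 3, 4) looking along +x ends up at (8, 3, 4) looking along -x, which is the view frustum and facing direction the mirror’s entity list would be built from.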
In Quake 3 levels mirror/portal surfaces do not directly specify the destination view. This is specified by the game code at run-time but it doesn’t directly specify which surface it’s for. How/when to connect this was a logistical problem. However the bigger problem is it just specifies a vector for the view direction and a bunch of options for how to set up the view axis (even though it literally passes the view direction in a view axis).
I still haven’t entirely implemented it. One option is the roll for the camera and it’s mainly used to just fix Quake 3 being terrible at setting it correctly. So currently there is an inconsistency in an add-on level where the view is upside-down in Quake 3 but right-side up in Toy Box (it doesn’t look like it’s intentionally upside-down).
This was working pretty well but things were getting unexpectedly culled in mirrors. I thought it might be a problem with the matrix math for modifying the mirror view by the main view or with the view frustum for culling. I spent a fair amount of time trying to debug it by drawing the camera location. This was an annoying problem: how do you draw the mirror view location in the main view and vice versa when I don’t easily have access to it with how this is structured? Two static variables and flipping which one you set and read.
However this wasn’t the problem at all. It turns out I was using the mirror camera location modified by the current main view location for the Quake 3 level’s Potentially Visible Set. That location moved through walls into obscured parts of the level, and so parts of the level were not drawn. Using the actual mirror location solved the issue.
I added support for only drawing models in the main view or in portal views so that Quake 3 player models draw in mirrors when using the first person model. Now I’m just disappointed I went through all this work and Quake 3 only has like 7 unique mirrors/portals in it.
Whatever, I’m out of mirror hell for now.
Large level support
Rendering has a maximum distance. I set it fairly high but it cuts off some large levels (such as the Quake 3: Team Arena mpterra[1-3] maps) and the q3map2 _skybox entity. Setting it higher reduces depth precision and (for Quake 3 support) I don’t have control over, or the ability to review, all of the content to set the max depth distance to fit it.
I added dynamic near and far depth planes by tracking the bounds of the 3D scene and then calculating the minimum and maximum distance from the camera. I haven’t added bounds tracking for all render commands yet, though I also need it for adding view frustum culling for everything.
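A minimal sketch of fitting the depth planes to the scene, assuming the scene bounds are an axis-aligned box and measuring depth along the camera’s forward direction (the types, function name, and `minNear` clamp are assumptions for this example, not the Toy Box API):

```c
typedef struct { float x, y, z; } Vec3;
typedef struct { Vec3 mins, maxs; } Bounds;

/* Compute near/far plane distances that enclose the scene bounds.
   forward must be unit length; minNear keeps the near plane positive. */
static void DepthPlanesForBounds(Bounds scene, Vec3 cameraOrigin, Vec3 forward,
                                 float minNear, float *zNear, float *zFar) {
    float nearest = 0.0f, farthest = 0.0f;

    /* Project all 8 box corners onto the view direction. */
    for (int i = 0; i < 8; i++) {
        Vec3 corner = { (i & 1) ? scene.maxs.x : scene.mins.x,
                        (i & 2) ? scene.maxs.y : scene.mins.y,
                        (i & 4) ? scene.maxs.z : scene.mins.z };
        float depth = (corner.x - cameraOrigin.x) * forward.x +
                      (corner.y - cameraOrigin.y) * forward.y +
                      (corner.z - cameraOrigin.z) * forward.z;
        if (i == 0 || depth < nearest) nearest = depth;
        if (i == 0 || depth > farthest) farthest = depth;
    }

    /* Geometry behind or at the camera cannot pull the near plane below minNear. */
    *zNear = (nearest > minNear) ? nearest : minNear;
    /* Keep the far plane beyond the near plane even if the scene is behind the camera. */
    *zFar = (farthest > *zNear) ? farthest : *zNear + 1.0f;
}
```

When the camera is inside the scene the near plane stays at the minimum and only the far plane adapts; when the camera is outside, both planes tighten around the content, which is where the depth precision gain comes from.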
I had to rework the skybox drawing as it needs to have the geometry inside the max depth but also have a depth value farther than everything else. I use glDepthRange( 1, 1 ) to set it to the max depth value and expand the scene bounds to include the skybox size. Though recently there seem to be some issues in mirrors. (Mirror hell doesn’t end.)
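The glDepthRange trick works because OpenGL maps normalized device depth in [-1, 1] linearly onto the range given to glDepthRange, so when both ends of the range are 1 every fragment lands on the maximum depth value. The mapping written out as a small C function (the function name is made up for this sketch):

```c
/* Window-space depth for a normalized device z in [-1, 1]
   under glDepthRange(rangeNear, rangeFar). */
static double WindowDepth(double ndcZ, double rangeNear, double rangeFar) {
    return (rangeFar - rangeNear) * 0.5 * ndcZ + (rangeNear + rangeFar) * 0.5;
}
```

With the default range of (0, 1) the near plane maps to 0 and the far plane to 1; with (1, 1) any input maps to 1, which puts the skybox behind everything else regardless of where its geometry actually sits between the near and far planes.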