Open Source Status https://clover.moe/2024/10/01/open-source-status/ Tue, 01 Oct 2024 07:26:41 +0000 https://clover.moe/?p=1855 I’m going to do my own things in my own time rather than acting as an open source software maintainer.

I’ve completed most of the things I want to do in my open source projects (Spearmint, Maverick, etc). There were of course a lot of other ideas, and many are now redirected at Clover’s Toy Box. At this point I mainly work on Maverick and Quake 3 related open source projects for the sake of other people. While I like assisting, it’s not satisfying and can be stressful.

I’m withdrawing from providing user support and fulfilling requests for open source projects. I’m not interested in continuing to discuss these projects. I’ve made the Clover.moe Community Discord server read-only after giving a month’s notice. I’ve thought about this for some time; I decided to do this in March.

I will continue to contribute to Quake 3 open source projects if it’s something I personally want. There are still a few loose ends I want to deal with and I may find other things in the future. So it’s not entirely the end of me working on these projects.

Clover’s Toy Box 24.09 https://clover.moe/2024/09/29/clovers-toy-box-24-09/ Sun, 29 Sep 2024 09:40:24 +0000 https://clover.moe/?p=1847 Clover’s Toy Box development: improved performance 650%*, fixed curved surfaces, mirrors/portals, and large levels.

* 650% improvement in one level on one set of hardware.

Performance

It took a couple years to get my reimplementation of the Quake 3 renderer (“Toy Box renderer for Spearmint”) to the performance of the ioquake3/Spearmint “opengl1” renderer. It was very challenging to match that performance despite using more modern OpenGL features, and my renderer was still missing many features. I also had to disable curved surfaces for it to be faster.

My renderer fell behind again after upgrading to new better hardware and adding additional features (notably Quake 3 materials).

Most of the official Quake 3 maps ran at 1000 frames per-second (FPS). The slowest case that I was aware of was at the center of the Quake 3 add-on level ct3ctf2, which ran at only 100 FPS. Using the Spearmint opengl1 renderer, ct3ctf2 runs somewhere between 500 and 666 FPS, and the OpenGL2 renderer somewhere between 800 and 1000 FPS. That’s kind of disappointing for my renderer. (Higher frames per-second is better.)

After making several changes I’ve clawed my way from 100 FPS to 650 FPS at the center of ct3ctf2. A 650% improvement. (These changes will improve other levels as well but it’s not 650% everywhere.) This is hopefully only the tip of the performance iceberg.

The main goal of improving the frame rate is being able to render more content while keeping an acceptable frame rate, or to use less power to render the same content (which is particularly relevant for mobile devices).

BeginPerformanceQuery

I reviewed what it was drawing for the ct3ctf2 level. It was uploading 300,000 vertexes and issuing 1,000 draw calls per-frame at 100 FPS.

The curved surfaces were using materials that required the CPU vertex shader and uploaded the vertexes three times (once for each material layer) every frame. Additionally the surface order interleaved flat surfaces (using a static vertex buffer) and curves (using a dynamic vertex buffer). This prevented merging them into the same draw call despite them using the same materials.

Materials had, for example, a base texture, an environment map (using the CPU vertex shader), and then a standalone lightmap (also using the CPU vertex shader).

1. BSP surface merging

I changed the Quake 3 level surface sorting to sort flat (static vertex buffer) and curved (dynamic vertex buffer) surfaces together so they can be merged into single flat or curve draw calls.
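
As a sketch of the idea (hypothetical names, not Toy Box’s actual code), sorting primarily by material and then by buffer type makes compatible surfaces adjacent so the backend can merge them:

    #include <stdlib.h>

    typedef struct {
        int material;    /* sort primarily by material */
        int bufferType;  /* e.g. 0 = static (flat), 1 = dynamic (curve) */
    } Surface;

    /* Group surfaces sharing a material and buffer type so adjacent
       compatible surfaces can be merged into a single draw call. */
    static int CompareSurfaces(const void *a, const void *b) {
        const Surface *s1 = a, *s2 = b;
        if (s1->material != s2->material)
            return (s1->material < s2->material) ? -1 : 1;
        return s1->bufferType - s2->bufferType;
    }

    void SortWorldSurfaces(Surface *surfaces, size_t count) {
        qsort(surfaces, count, sizeof(Surface), CompareSurfaces);
    }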

2. Standalone lightmap

Lightmaps are often merged with an adjacent layer to use multi-texture. Both layers are drawn in a single draw call. However multiply blended lightmaps cannot be merged with the additive blended environment map and some other effects. This separate lightmap layer used the CPU vertex shader to copy the lightmap texture coordinate from the multi-texture slot to the base slot.

I could do the same thing in modern OpenGL shaders by adding a texcoord source or an additional shader type (both of which have more overhead). For OpenGL 1.x fixed-function, I would potentially need to rebind the vertex attributes per draw call for the world. It seemed like a mess to do both of these in the same backend.

Instead I made standalone lightmap layers use multi-texture with a white base image. No CPU vertex shader needed and no complicated re-architecture of the backend. (OpenGL 1.2 is needed for multi-texture so OpenGL 1.1 still falls back to the CPU vertex shader.)

3. Duplicate vertexes

The CPU vertex shader has to add a draw call for each layer. This uploaded separate vertexes for each layer, with a TODO for separating vertex upload from layers. As a quick solution, I made layers that do not require the CPU vertex shader share the same unmodified vertexes. This allowed the base and standalone lightmap layers to use the same vertexes.

4. Hardware tcGen

In the previous article on adding Quake 3 material support I talked about texture coordinate generation and how it can be done in hardware. I added support to the OpenGL shaders so it doesn’t need the CPU vertex shader. (I haven’t implemented it for OpenGL fixed-function and Wii yet.)

5. View frustum culling

I’ve supported the Quake 3 level’s Potentially Visible Set for a while to skip rendering areas of the map. However this doesn’t work well for large open areas.

I added view frustum culling to skip drawing level surfaces and models that are not in front of the camera within the view angle. I wrote most of this code in 2021 or earlier but I didn’t merge it due to a rendering issue (a few specific surfaces in some maps disappeared while clearly on screen) that had apparently since been resolved.
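
For illustration, a minimal bounding-sphere-versus-frustum test (assuming inward-facing planes; a sketch, not Toy Box’s actual code):

    /* A surface is skipped when its bounding sphere is entirely behind
       any frustum plane. Plane normals point inward. */
    typedef struct { float x, y, z, dist; } Plane; /* normal (x,y,z), distance */

    int SphereInFrustum(const Plane planes[4], const float center[3], float radius) {
        for (int i = 0; i < 4; i++) {  /* left, right, top, bottom */
            float d = planes[i].x * center[0]
                    + planes[i].y * center[1]
                    + planes[i].z * center[2]
                    - planes[i].dist;
            if (d < -radius)
                return 0; /* fully outside this plane: cull */
        }
        return 1; /* inside or intersecting: draw */
    }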

6. Curve tessellation

The curved surfaces in the Quake 3 level formats are Bézier patches with a 3 by 3 grid of control points that defines the mathematical curve. I convert them to triangles (tessellate) in order to draw them.
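
For reference, evaluating a point on a 3×3 quadratic Bézier patch at parameters (u, v) can be sketched like this (not the engine’s actual code):

    /* Quadratic Bezier: B(t) = (1-t)^2 P0 + 2t(1-t) P1 + t^2 P2 */
    static void Bezier3(const float p0[3], const float p1[3], const float p2[3],
                        float t, float out[3]) {
        float a = (1.0f - t) * (1.0f - t);
        float b = 2.0f * t * (1.0f - t);
        float c = t * t;
        for (int i = 0; i < 3; i++)
            out[i] = a * p0[i] + b * p1[i] + c * p2[i];
    }

    /* Evaluate the tensor-product patch: each row at u, then the column at v.
       ctrl[row][col] are the 9 control points. */
    void EvaluatePatch(const float ctrl[3][3][3], float u, float v, float out[3]) {
        float rows[3][3];
        for (int r = 0; r < 3; r++)
            Bezier3(ctrl[r][0], ctrl[r][1], ctrl[r][2], u, rows[r]);
        Bezier3(rows[0], rows[1], rows[2], v, out);
    }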

I originally implemented it by tessellating each patch every frame at a fixed number of triangles; even if, say, it’s a flat square and only needs 2 triangles. I knew it was slow but it seemed easier at the time when I was trying to get it to work at all. There was an issue that some vertical rounded corners had incorrect lightmap texture coordinates. I tried unsuccessfully to fix it two or three times over the last 3 years. Improving performance was kind of on hold for this reason.

I found the lightmap texture coordinate issue while experimenting with limiting the number of rows/columns. It turns out Quake 3 levels have invalid lightmap texcoords if the control points are not an equal distance apart. The vertical rounded corners are flat vertically and rounded horizontally. Quake 3 doesn’t add any rows; it’s just single triangles from the top to bottom. It doesn’t use the middle row of control points with the invalid lightmap texture coordinates.

I completely rewrote the tessellation code. Curves are now tessellated at level load instead of each frame, and subdivided based on how curved they are (fewer triangles in most cases). Curves could have gaps between touching patches due to how they’re subdivided now, so curves are stitched together by detecting common edges and adding rows and columns to adjacent patches.

Curves are now faster to draw and match Quake 3 visually. (Though I haven’t added dynamic lower detail far away yet.)

EndPerformanceQuery

The center of ct3ctf2 has moved from uploading 300,000 vertexes per-frame to only 3,500 vertexes and from 1,000 draw calls to 500 draw calls. 100 FPS to now 650 FPS. I still have more ideas for improving performance but I got sidetracked on adding features again.

Mirrors and Portals

I previously added mirror and portal rendering support in Toy Box but it had fallen into disrepair. I had never hooked it up for Quake 3 maps or “Toy Box renderer for Spearmint”.

I fixed the mirrors in Toy Box to handle framebuffer object support that was added ages ago and fixed OpenGL 1 clip plane rotating based on whatever GL_MODELVIEW matrix was previously set.

However remember that new view frustum culling? Yeah, the culling for the main view applied to the mirror views so mirrors didn’t draw anything behind the player. Sprites also faced the main view instead of the mirror view. I was in mirror hell for a month and a half. I didn’t want to work on it for whatever reason but felt like I shouldn’t do something else so I just didn’t work on Toy Box very much as a result.

(If this shows up in Inksie’s analytics: this is just an unprofitable game programming blog where I talk to myself, and I really like your shorts. And maybe mirrors are the problem.)

a: want to know who is to blame for your low self-esteem?
b: yes!
a: close your eyes and I’ll show you.
a: open.
a: that’s right-
b: mirrors!
a: no.

When the level model was added to the scene it immediately performed culling and added entities for the visible geometry. I was able to add mirror/portal entities here.

In my mirror system, it drew a model into the stencil buffer and only drew the mirror view in the marked area. (This allows multiple mirrors in the main view without issues.) I was hung up for a while on how to draw the surface for Quake 3 mirrors. As an initial hack I just used the Quake 3 explosion model (it’s just a square) so I could continue working on it.
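
A minimal sketch of that stencil approach in classic OpenGL, where DrawMirrorSurface and DrawSceneFromMirrorView are hypothetical placeholders for the engine’s own draw functions:

    #include <GL/gl.h>

    void DrawMirrorSurface(void);       /* placeholder */
    void DrawSceneFromMirrorView(void); /* placeholder */

    void DrawMirror(void) {
        glEnable(GL_STENCIL_TEST);

        /* 1. Mark the mirror surface in the stencil buffer (no color writes). */
        glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
        glStencilFunc(GL_ALWAYS, 1, 0xFF);
        glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE);
        DrawMirrorSurface();
        glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);

        /* 2. Draw the mirrored view only where the stencil was marked. */
        glStencilFunc(GL_EQUAL, 1, 0xFF);
        glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
        DrawSceneFromMirrorView();

        glDisable(GL_STENCIL_TEST);
    }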

If I add the mirror/portal after the CPU vertex shader processes the material, it may have the wrong surface normal for the camera. So the mirror needs to use the source surface. However the material could move around, so limiting the mirror to the source surface area is not correct. Quake 3 only draws one mirror/portal view; it draws it on the whole screen and then draws the level over it. That’s ultimately what I decided to do. It’s essentially what I was doing with the square model scaled up, but with fewer steps. This allowed mirrors to work but with the wrong view culling and sprite orientation.

I changed adding the level model to just create an entity with the information; later, when processing entities for a scene, it actually adds the entities for the visible geometry and mirror/portal surfaces. This was not entirely straightforward but it had been on my TODO list for a while.

After this I made processing entities for a scene end with looping through the mirrors and adding mirror view entities, which have their own list of entities to draw. This included culling, adding level models, and generating sprite vertexes for the mirror view.

In Quake 3 levels mirror/portal surfaces do not directly specify the destination view. This is specified by the game code at run-time but it doesn’t directly specify which surface it’s for. How/when to connect these was a logistics problem. However the bigger problem is it just specifies a vector for the view direction and a bunch of options for how to set up the view axis (even though it literally passes the view direction in a view axis).

I still haven’t entirely implemented it. One option is the roll for the camera and it’s mainly used just to fix Quake 3 being terrible at setting it correctly. So currently there is an inconsistency in an add-on level where the view is upside-down in Quake 3 but right-side up in Toy Box (the upside-down view doesn’t look intentional).

This was working pretty well but things were getting unexpectedly culled in mirrors. I thought it might be a problem with the matrix math for modifying the mirror view by the main view or the view frustum for culling. I spent a fair amount of time trying to debug it by drawing the camera location. This was an annoying problem: how do you draw the mirror view location in the main view and vice versa when I don’t easily have access to it with how this is structured? Two static variables and flipping which one you set and read.

However this wasn’t the problem at all. It turned out I was using the mirror camera location modified by the current main view location for the Quake 3 level’s Potentially Visible Set; it moved through walls and obscured parts of the level so they were not drawn. Using the actual mirror location solved the issue.

I added support for only drawing models in the main view or in portal views so that Quake 3 player models draw in mirrors when using the first person model. Now I’m just disappointed I went through all this work and Quake 3 only has like 7 unique mirrors/portals in it.

Whatever, I’m out of mirror hell for now.

Large level support

Rendering has a maximum distance. I set it fairly high but it cuts off some large levels (such as the Quake 3: Team Arena mpterra[1-3] maps) and the q3map2 _skybox entity. Setting it higher reduces depth precision, and (for Quake 3 support) I don’t have the control or ability to review all of the content to set the max depth distance to fit the content.

I added dynamic depth near and far planes by tracking the bounds of the 3D scene and then calculating the minimum and maximum distance from the camera. I haven’t added bounds tracking for all render commands yet, though I also need it for adding view frustum culling for everything.
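
A sketch of deriving the depth range from tracked bounds, assuming an axis-aligned bounding box and a camera position and forward vector (names are illustrative):

    #include <float.h>
    #include <math.h>

    void CalcDepthRange(const float mins[3], const float maxs[3],
                        const float camPos[3], const float camForward[3],
                        float *zNear, float *zFar) {
        float lo = FLT_MAX, hi = -FLT_MAX;

        /* project each of the 8 box corners onto the view direction */
        for (int i = 0; i < 8; i++) {
            float corner[3], d = 0.0f;
            corner[0] = (i & 1) ? maxs[0] : mins[0];
            corner[1] = (i & 2) ? maxs[1] : mins[1];
            corner[2] = (i & 4) ? maxs[2] : mins[2];
            for (int j = 0; j < 3; j++)
                d += (corner[j] - camPos[j]) * camForward[j];
            if (d < lo) lo = d;
            if (d > hi) hi = d;
        }

        *zNear = fmaxf(lo, 1.0f);         /* clamp to a sane minimum */
        *zFar  = fmaxf(hi, *zNear + 1.0f);
    }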

I had to rework the skybox drawing as it needs to have the geometry inside the max depth but also have a depth value farther than everything else. I use glDepthRange( 1, 1 ) to set it to the max depth value and expand the scene bounds to include the skybox size. Though recently there seem to be some issues in mirrors. (Mirror hell doesn’t end.)
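
The glDepthRange trick looks roughly like this (DrawSkybox is a hypothetical placeholder for the engine’s sky drawing):

    #include <GL/gl.h>

    void DrawSkybox(void); /* placeholder */

    void DrawSkyboxAtMaxDepth(void) {
        glDepthRange(1.0, 1.0); /* force all sky fragments to the far depth */
        DrawSkybox();
        glDepthRange(0.0, 1.0); /* restore the default range for the scene */
    }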

Clover’s Toy Box 24.05 https://clover.moe/2024/05/28/clovers-toy-box-24-05/ Wed, 29 May 2024 04:31:18 +0000 https://clover.moe/?p=1658 Clover’s Toy Box development: adding material support and changing future plans.

Background: Clover’s Toy Box (2017—) is my private 3D game engine from scratch project. It’s a continuation of ideas for Spearmint (2008—), my enhanced version of the Quake 3 (1999) engine. I cut my teeth modifying SRB2 / the Doom (1993) engine in 2006-2008. I’ve done computer programming for about 18 years.

Materials

About a year ago I started working on adding support for Quake 3 materials (*.shader files) to Clover’s Toy Box and Toy Box renderer for Spearmint (my private reimplementation of the Spearmint renderer using Toy Box).

Quake 3 materials define how to draw an image on a 3D model, game level surface, or in the menu. They allow for multiple blended images, animated image sequences, and dynamic effects such as scrolling an image, flashing color, or changing the position of the surface.

It’s difficult to implement the Quake 3 material system as there isn’t a complete definition of how it works, it’s complicated, and it interacts with all rendering. It kind of spiraled out into implementing or fixing all the rest of the Quake 3 renderer features in Toy Box renderer for Spearmint.

Most of the Quake 3 material features are supported by Toy Box now. It’s missing the sky dome, fog, and a few position modifiers (deformVertexes). It has some of the additions made in Spearmint but I haven’t focused on that. However there are still many issues for rendering Quake 3 that are not directly part of the material system.

My Toy Box renderer—including the implementation of the Quake 3 material system—runs on OpenGL, OpenGL ES, and WebGL as well as the Wii console. Though the Wii console runs out of memory loading level textures (only 88 MB of RAM) and it’s missing an implementation of polygonOffset for preventing decals from flickering.

It runs on modern and legacy OpenGL. (OpenGL is a programming interface for hardware accelerated graphics rendering on a graphics card.) Toy Box is compatible with modern OpenGL 3.2+ Core Profile and streams geometry using OpenGL 4.4 persistent mapped buffers. I’m particularly proud (amused?) of the Quake 3 material system being fully implemented on legacy OpenGL 1.1 which Quake 3 supported in 1999.

(I also fixed ioquake3 and Spearmint to fully support a sky box using OpenGL 1.1 which may be useful for some Intel graphics under Windows 10 stuck with Microsoft’s generic OpenGL 1.1 driver.)

Implementation

Some parts of the material system were straightforward to add. Others not so much. I spent a lot of time testing Quake 3 format levels (official, add-ons, and other games). I found issues and made a list of them to look into. Looking into an issue often revealed it was a different manifestation of a known issue that I hadn’t fixed yet.

I have a list of like 50 issues and not everything made it onto the list. It’s honestly not that exciting to recount. Things were broken and then I fixed them.

The material system can kind of be broken down into five categories:

  • Material file parsing, implicit keyword behavior, and sort order.
  • Changing OpenGL rendering state.
  • Changing vertex attributes (position, normal, color, texture coordinates).
  • Changing vertex attributes but behavior very specific to Quake 3.
  • Special handling for the sky and fog volumes.

Material parsing

The Toy Box Quake 3 material loader handles various implicit behavior. If a material has “rgbGen vertex”, it defaults to using “alphaGen vertex”. However “rgbGen exactVertex” uses opaque alpha, which is inconsistent. Using image blending disables writing depth, which is to prevent surfaces behind it drawing over it. There are several things that affect the implicit sort order. It’s just a lot of random stuff to fill out the full Quake 3 material definitions correctly.

The Quake 3 material definitions are converted to a separate material system that has some additional features and different implementation of some features. My intention is to create a new material file format that doesn’t depend on Quake 3’s implicit behavior.

OpenGL state

Changing the OpenGL rendering state was mostly straight forward to add. Many of the material keywords are easy to infer how they directly map to the OpenGL API. Face culling, blend modes, alpha test, depth test, depth write, and polygon offset.

Though colors being floating-point (e.g., 1.0) in the material definition and converted to an 8-bit integer (e.g., 255) when loaded was not obvious, and it affected alpha testing for conditional transparency.

A material for an unofficial Quake 3 add-on level (xccc_dm4) specified using alpha 0.5 and set the alpha test to greater-or-equal to 0.5. This caused transparency to flicker because OpenGL rendering doesn’t have perfect precision and alpha values vary slightly above and below 0.5.

Quake 3 converts the alpha 0.5 to 8-bit integer (0.5 × 255 = 127.5) and rounds it to 127 and then the OpenGL driver compares 127 ÷ 255 as a floating-point value 0.498039216 with alpha test reference 0.5. This way there is no flickering. Though it doesn’t draw at all as the alpha is always lower than 0.5. This is apparently what the creator intended as it’s not visible in a video they made of the level either.

So I convert the floating-point value from the Quake 3 material to an 8-bit integer and then back to a floating-point value as that’s what I’m using for colors in Toy Box.
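
That round trip can be sketched in a few lines (illustrative, not the exact Quake 3 or Toy Box code; truncation matches the 127 result described above):

    /* Quantize a material alpha through an 8-bit integer and back, so the
       alpha test behaves like Quake 3's. 0.5 -> 127 -> 0.498039216f. */
    float QuantizeAlpha(float alpha) {
        int byteAlpha = (int)(alpha * 255.0f); /* 0.5 * 255 = 127.5 -> 127 */
        return byteAlpha / 255.0f;             /* 127 / 255 = 0.498039216f */
    }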

Vertex attributes

There are vertex attribute generators/modifiers for vertex position, normal (direction), texture coordinates, and color. I implemented all of them in the CPU vertex shader (it’s just a function that outputs new vertexes). This way they work on all graphics APIs (OpenGL fixed-function and shaders, Wii, …).

The CPU vertex shader which I started quite some time ago (see Toy Box 22.04 § “OpenGL 1.1 fixed-function rendering”) had to be expanded to allow for multiple material layers with separate generated/modified vertex attributes and to support many new effects.

I also finally replaced the initial CPU vertex shader specific to OpenGL 1.x (low-level, non-VBO compatible) with the API-independent code. Now OpenGL fixed-function rendering can use persistent mapped buffers and vertex buffer objects.

I mark material layers for which attributes need to be processed on the CPU and then tell the GPU to just use the submitted values for that attribute. This allows mixing CPU and GPU vertex attribute handling. For example, processing the position modifiers on the CPU and using the GPU to apply the texture matrix for texture coordinate modifiers.

Most of the Quake 3 position modifiers need to be applied per-vertex (opposed to a matrix multiply). This is always done in the CPU vertex shader as it isn’t compatible with graphics APIs for the Wii or OpenGL 1. They could be implemented using OpenGL 2 GLSL shaders but it’s very specific to each position modifier. I’m trying to keep the graphics backends kind of generic.

Currently if any attribute requires the CPU vertex shader it results in animating the model on the CPU instead of using OpenGL 2 GPU skeletal or frame animation. This is kind of a disappointment for “skeletal models with mixed CPU and GPU vertex attributes” as the performance is probably largely affected by animation. Though I haven’t considered what cases it would be beneficial to optimize this for; a lot of the CPU-only effects use the vertex position.

Vertex colors

I split up some of the Quake 3 material keywords so they could be handled in a more general way.

Spearmint has twelve RGB vertex color generators. In Toy Box the color generators are split up into four base color generators (white, vertex, one minus vertex, model lighting) and five color modifiers (constant color, waveform for color intensity, entity color, one minus entity color, and underbright to counteract Quake 3 scene overbright).

This replaces having lightingDiffuse and lightingDiffuseEntity for model lighting without and with entity color, and const, wave, and colorWave for white with constant color, a waveform for color intensity, and both.

It’s easy to tell the Wii to use white or vertex color, but one minus vertex and model lighting use the CPU vertex shader to generate the color arrays. The color modifiers can all be combined and applied using the Wii GPU. Most of the color generators are GPU accelerated on the Wii. One minus vertex is rarely used (I haven’t actually checked if it’s used by anything). Eventually I need to figure out GPU model lighting for supporting normal maps (per-pixel light direction).

OpenGL 2 GLSL can do all of the base types + apply the color modifier on the GPU. Though I currently have one minus vertex disabled because I’m not sure it’s worth supporting it in the backend.

OpenGL 1 can only do white and vertex color (like the Wii) but it can only apply color modifiers on the GPU if it uses white and not vertex colors or model lighting. So the renderer just marks the material layers using vertex colors or model lighting to use the CPU vertex shader for color modifiers. Model lighting already uses the CPU vertex shader so it doesn’t really affect much.

Vertex texture coordinates

Spearmint has six texture coordinate generators (base texture, lightmap, vector w/ matrix, cel-shading, two types of reflection-mapping) and in Toy Box they’re split up into five base generators (base texture, lightmap, position, normal, and reflect) and a matrix if needed. Reflect is shared by the two reflection mapping methods for Quake 3 and Raven Software’s Quake 3 based games.

This doesn’t reduce a lot but it does make it easier to implement GPU support as a base type + texture matrix. I haven’t added them yet but OpenGL 1.3 fixed-function and 2.0 GLSL can support all of them. The Wii has GPU support for all generators except reflect.

I tested generating reflect vectors and storing them in the vertex normals on OpenGL for using reflect via the normal generator and it works. This may be a way for the Wii to utilize CPU reflect with GPU texture matrix. Eventually Wii can have at least partial GPU support for all of the texture coordinate generators.

Vertex attribute conclusion

The Toy Box material handling is being designed around GPU color multiply and texture matrix multiply. Quake 3 has more specific handling for combining some generators and modifiers to allow for better CPU optimization. Scrolling a texture only needs two additions per-vertex, not a full matrix multiply.

My understanding is that the Wii always does a texture matrix multiply. It can be set to an identity matrix but it can’t be disabled. So it may not require extra processing to use a texture matrix multiply aside from the data transfer of the matrix to the GPU.
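For illustration, a scroll effect expressed as a 3×3 texture matrix might look like this (a hypothetical sketch; column-major layout as OpenGL expects, while the per-vertex form is just two additions):

    /* Build a texture matrix that translates texcoords over time. */
    void ScrollMatrix(float sSpeed, float tSpeed, float timeSec, float m[9]) {
        /* wrap offsets into [0,1) to keep texcoords small */
        float sOfs = sSpeed * timeSec; sOfs -= (int)sOfs;
        float tOfs = tSpeed * timeSec; tOfs -= (int)tOfs;

        m[0] = 1; m[3] = 0; m[6] = sOfs; /* row 0: s' = s + sOfs */
        m[1] = 0; m[4] = 1; m[7] = tOfs; /* row 1: t' = t + tOfs */
        m[2] = 0; m[5] = 0; m[8] = 1;    /* row 2: homogeneous w  */
    }
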

Very specific behavior

Some of the material keywords are very specific to Quake 3 and it would be very difficult to accurately remake them without basing it on the math from Quake 3. Math can be patented but not copyrighted. So it seems that using these equations would not be copyright infringement. Though they aren’t really necessary for creating a new game and I’d probably leave them out of a new game.

Keywords such as tcMod turb for turbulent lava texture scroll and alphaGen lightingSpecular for slightly non-standard specular with a hard coded light origin. I haven’t decided if I should try to remake the sky dome texture coordinate generator and so the sky dome is currently missing. Quake 3 uses these on basically every official level.

Future plans

I had thought about releasing closed-source commercial software (content tools, game engine) compatible with Quake 3 data formats as a way to try to monetize Toy Box. Though it’s very unrealistic to generate the level of revenue I would like it to and it would probably just become a burden.

I’m no longer planning to release any software based on Toy Box supporting Quake 3 data formats. I’m not planning to pursue selling or releasing a content application or game engine. I’m also not planning to reprogram and release Turtle Arena (without bots) on Toy Box.

I’m satisfied with what I’ve accomplished in Clover’s Toy Box so far and I intend to continue developing it. Currently the only planned software to release based on Toy Box is original games (which I don’t actually have plans for).

I want to move in the direction of focusing on art with the end goal being images and videos instead of a full game. Though that’s not necessarily related to Toy Box and there is a lot of stuff I need to deal with before that.

]]>
Open Source in 2023 https://clover.moe/2024/01/02/open-source-in-2023/ Wed, 03 Jan 2024 02:59:06 +0000 https://clover.moe/?p=1582 My open source software contributions in 2023 to SDL, Quake 3 based games, and Maverick Model 3D.

SDL

Simple DirectMedia Layer is a cross-platform abstraction layer for window creation, graphics initialization, audio, input, etc. that is used by most games on Linux. (SDL source code)

I mentioned custom window decoration “hit testing” resizing issues at the end of my Toy Box on Wayland post. Someone fixed Linux (X11 and Wayland) to display cursors when hovering over resizable areas. It assumed the cursors were double arrows like on Windows and KDE, but on GNOME resize left and right both had a single arrow pointing right. I fixed it to have the correct cursors on GNOME using 8 cursors, one for each resize direction.

I helped explain an issue with Nintendo GameCube Adapter controller mapping so it could be fixed on Windows and Android.

I try to read most of the SDL 3 commits by following an RSS feed. I pointed out a few minor issues as commit comments.

ioquake3

ioquake3 is a project to maintain and extend the source code for the 1999 first-person shooter Quake III Arena. (ioq3 source code)

I made 25 posts to help solve issues on the ioquake3 forum.

Platform

I updated SDL libraries for Windows and macOS from SDL 2.0.14 to SDL 2.24.0. I cross-compiled mingw-w64 and macOS libraries on Linux. I did have to use Windows to build the MSVC libraries.

I made it so macOS can be built separately for the “legacy” Universal Bundle (x86/x86_64/powerpc) and “modern” Universal Bundle 2 (x86_64/Apple Silicon). The legacy macOS App Bundle was updated to SDL 2.0.22 as newer versions (jumping to SDL 2.24.0) dropped support for macOS 10.6. (Building for Apple Silicon was added by Tom Kidd in 2022.)

I fixed ioquake3 failing to start from macOS terminal due to how the URI scheme handler support was added.

I updated the Windows “NSIS installer” script. ioquake3 and games based on it tend to just use a .zip file instead of an installer. It would be useful to use the installer as it adds integration for the “quake3://connect/127.0.0.1” URI scheme handler to Windows.

QVM

QVM (Quake Virtual Machine) is a cross-platform bytecode format used by Quake 3 that allows game modifications to be compiled on one platform and run on all platforms. (A similar recent technology being WebAssembly.)

I fixed compiling QVMs on Linux if the source code has Windows line endings. MS-DOS used CR+LF to end lines in text files (carriage return, line feed; commands for a text printer) while Unix-like platforms use only LF. The Windows file API in text mode automatically reads CR+LF as LF but other platforms do not, which caused the QVM tools to fail with strange syntax errors.
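
One way to normalize line endings while reading source text can be sketched like this (a hypothetical illustration, not the actual QVM tool fix):

    #include <stdio.h>

    /* Read one character, treating CR and CR+LF both as a single LF. */
    int ReadChar(FILE *f) {
        int c = fgetc(f);
        if (c == '\r') {
            int next = fgetc(f);
            if (next != '\n' && next != EOF)
                ungetc(next, f); /* lone CR: keep the following character */
            return '\n';
        }
        return c;
    }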

I made it so QVMs are compiled on all platforms by default even if they do not have a run-time QVM just-in-time compiler. The QVM interpreter works on all platforms. This made QVMs be built on Linux/macOS ARM64 by default.

OpenGL2 renderer

Compared to the opengl1 renderer—which closely resembles the original Quake 3 renderer—the ioquake3 OpenGL2 renderer has a lot of issues and (in my opinion) poor design decisions. I think it’s too much work to fix it. However when prompted about issues, it’s not that I can’t fix them…

I fixed the edge border for smaller view sizes (cg_viewsize) being drawn to a framebuffer object and not blitted to the screen when using HDR or framebuffer multisample anti-alias with post-process. This was fixed to draw to the screen directly like other 2D drawing.

I fixed updating the loading screen with r_cubeMapping 1. The changes for the previous issue caused Quake 3’s drawing of all images to the screen—to ensure the driver loads them—to be visible. However it shouldn’t be visible because the loading screen draws over it. It turns out generating cube maps for the level left the culling state set to hide the front of polygons, so the loading screen was culled until the level finished loading and reset the culling state. This was broken for years but it just seemed like the loading screen went by really fast after loading the level.

I fixed the developer option to clear the screen to magenta (r_clear 1) before rendering when using HDR or framebuffer multisample anti-alias. This makes it obvious when some part of the screen isn’t drawn or is transparent and the previous frame is visible (hall of mirrors effect or random flashing between two different frames; it’s just bad).

I fixed framebuffer multisample anti-alias on AMD Windows driver. The driver incorrectly requires GL_EXT_direct_state_access extension to bind a renderbuffer for it to be valid. This should only be required for OpenGL Core context direct state access or GL_ARB_direct_state_access extension. Though yes, ioquake3 should probably stop using the EXT extension in an OpenGL Core context.

I fixed q3map2 lightstyles effects for dynamic pulsing lightmaps with r_mergeLightmaps 1 and r_sunlightMode 1.

r_mergeLightmaps 1 combines internal 128×128 pixel lightmaps into a larger atlas and modifies the geometry texture coordinates. This broke materials using both internal and external lightmaps and materials using texture coordinate offsets with internal lightmaps (two obscure things I didn’t know q3map2 did). I corrected external lightmaps to apply a transform to convert the texture coordinates back to the original and made offsets for internal lightmaps use the scale of the texture atlas.

r_sunlightMode 1 changed lightmap stages in materials to support cascading sun shadows by using a white image modulated by the lightmap texture. However it didn’t work with some blend modes that use the alpha of the lightmap texture. I fixed it to only apply to normal lightmap stage blend modes.

I fixed parsing q3gl2_sun without two additional “optional” tokens. Quake 3’s text parser has an option to not allow line breaks but it still sets the text pointer to after the line break. So effectively it can only check for an optional token once, and then must not check for any more because it will be on the next line. I fixed parsing q3gl2_sun to only check for additional optional tokens if the previous optional tokens existed. (I previously fixed the parser in Spearmint to stop at the end of the line so it doesn’t have this problem, but I’m more concerned about compatibility with weird content in ioquake3.)

World of Padman

World of Padman is a freeware first-person shooter based on the ioquake3 engine. (WoP source code)

My widescreen HUD support from “ZTM’s Flexible HUD for ioq3” was merged into World of Padman for version 1.7. I helped fix two widescreen issues with health station icons (distance independent 2D sprites on the HUD) and lens flares.

Some of the OpenGL2 renderer changes in ioquake3 were for World of Padman or at request of one of the World of Padman developers. q3map2 lightstyles effects are used by the wop_trashmap level.

Q3Rally

Q3Rally is a freeware racing and third-person car shooter based on the ioquake3 engine. (Q3Rally source code)

Some changes I made include:

  • Bots can drive around more race maps now (“Fix bots going in reverse for no reason in races”)
  • Fixed spectator observer camera to rotate smoothly
  • Fixed intermission view angles
  • Fixed players in races having an invisible chainsaw
  • Fixed client error dropping to menu when player dies if sv_maxclients is too high (“Fix out of range death events”)
  • Updated to latest ioquake3
  • Various other bug fixes

I fixed game network compatibility and enabled ARM64 support for the new Q3Rally Flatpak package for Linux.

Spearmint

Spearmint is my enhanced engine and game-logic based on ioquake3. (engine source code, game-logic source code)

opengl1 renderer

I fixed two issues in the opengl1 renderer due to adding changes from Wolfenstein: Enemy Territory.

Wolfenstein: Enemy Territory added far plane culling. The level BSP node bounds doesn’t include surfaces for q3map2 skybox portal hack (_skybox entity) so the far plane wasn’t set far enough and the skybox scene was cut off. Instead the bounds of surfaces in the BSP node should be used, which Wolfenstein: Enemy Territory also added.

Wolfenstein: Enemy Territory fixed surface culling for two sided level surfaces but this broke snowflakes in rgoer_seasons, a user-created level for Quake 3 using the “deformVertexes move” material feature to change the surface position. I changed back to the Quake 3 behavior of not using exact view culling for two sided surfaces.

OpenGL2 renderer

I fixed bullet/explosion marks on doors and moving platforms in splitscreen player views. (It was pointed out to me that I fixed this in opengl1 years ago but I forgot to apply it to the OpenGL2 renderer.)

Game-Logic

I added an option to disable view kick when receiving damage.

I added setting the frame rate in the graphics options menu.

Maverick Model 3D

Maverick Model 3D is a 3D model editor and animator that I maintain that supports some game specific formats. It’s based on Misfit Model 3D that ceased development. (Maverick source code)

I developed/released Maverick Model 3D 1.3.14 in April.

In addition to that:

I fixed a couple issues for macOS and made GitHub Actions build and upload a macOS build. If you’re logged into GitHub, the macOS build can be downloaded from the bottom of the “Actions summary” page for each commit. I haven’t tested it as it requires a newer macOS than I have access to.

I fixed a couple issues that I found through the Debian package tracker several years ago. CXXFLAGS are now respected by configure and it compiles for GNU Hurd. Though OpenGL—required for displaying the model—didn’t work in the Debian 2023 GNU Hurd virtual machine image.

I added support for exporting a Quake 3 player model with three separate models (head, upper, lower) to IQE format which can be converted to IQM for use with ioquake3 and derivative projects.

IQM uses skeletal animation which allows for less memory usage and better animation quality compared to Quake 3’s MD3 format that stores lower precision vertex positions for each frame and interpolates vertexes in a straight line between frames which may cause meshes to deform. Though IQM may be slower to draw.

I got IQE Quake 3 player model export working in 2018 (as referenced in this ioquake3 forum thread where I was working on improving performance and fixing issues with ioquake3’s IQM support). I wanted to move Quake 3 player model export out of the individual format exporters but that never happened. So now it’s in the IQE exporter.

P.S.

Most of these changes are a result of interacting with people rather than my own ideas or done for the sake of hypothetical other people.

If you found this useful, consider donating on Ko-fi (no account required).

Toy Box on Wii https://clover.moe/2023/09/29/toy-box-on-wii/ Fri, 29 Sep 2023 21:42:21 +0000 https://clover.moe/?p=1485 Clover’s Toy Box development: porting to the Wii console.

Wii

I ported Clover’s Toy Box to the Nintendo Wii console. It supports rendering, audio output, networking, Wii/GameCube controllers, and USB keyboard and mouse.

I used the Wii homebrew SDK (devkitPPC, libogc, and satellite libraries) which does most of the difficult work.

Wii Remote pointing on the screen (IR sensor) is supported but I haven’t integrated support for the accelerometer motion sensor or Wii Motion Plus (gyro sensor).

The Wii Remote speaker is not supported by the Wii homebrew SDK. Using Wii Motion Plus with the Nunchuk extension isn’t supported by the homebrew SDK either. So adding support for those would be more involved.

Getting the renderer working was easier than I expected. I mainly had to implement the functions to draw triangles/lines and upload textures (not obvious how for RGBA8) to get something to display. There wasn’t any set up needed beyond the example Wii video init. The GX graphics API is very similar to fixed-function OpenGL. In some cases separate OpenGL functions are combined as a single GX function or slightly lower-level.

I added support for blend modes, depth test/write, alpha test, cull modes, multi-texture (lightmaps), texture-less (color only) rendering, vertex color, color multiply, and texture coordinates matrix. There is no support for normal maps and frame post-processing yet.

The groundwork started in 2020 with getting Toy Box to compile for GameCube/Wii with stubbed out functionality, and adding OpenGL 1.1 support in 2022 partially to make it easier to add fixed-function rendering on the Wii.

I need to improve performance. In some parts of the Turtle Arena Subway level it runs at 60 frames per-second. If it renders the whole map it drops to 20 frames per-second which is way too low for the level size.

I need to better handle allocating memory and out of memory errors. The Wii only has 88 MB of RAM (which is split into two separate parts) and I basically pretend Toy Box will never run out of memory.

GameCube

The GameCube and Wii consoles are similar. The graphics features are actually identical between the two consoles. The GameCube CPU/GPU are slower and it only has 27 MB of RAM. The Wii has a bunch of extra stuff added like internal flash memory, Bluetooth, etc.

I keep Toy Box able to compile for GameCube as well. However I don’t know an easy way to add data files for the GameCube build like placing them on the Wii’s SD card. So it just draws flat colored triangles and lines in the menu. I don’t have a way to run homebrew on a physical GameCube but it runs on the Dolphin emulator. (Technically the GameCube build should run on the Wii but I haven’t tested it.)

GX RGBA8 format

GameCube/Wii 8-bit RGBA format uses 4×4 pixel block tiles (similar to S3TC) with alpha and red stored in 32 bytes and then green and blue in next 32 bytes. I think I found this referenced somewhere in YAGCD (Yet Another GameCube Documentation) and had to dig into some asset converter to figure it out.

4×4 tile linear sequential memory (spaces / line break for readability):

ARARARAR ARARARAR ARARARAR ARARARAR

GBGBGBGB GBGBGBGB GBGBGBGB GBGBGBGB

It’s the pixel colors left-to-right top-to-bottom for the 4×4 block with alpha and red and then separately for green and blue. Additional blocks continue after this to make a 8×4 image or whatever size.
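
Based on that layout, a conversion from a linear RGBA8 image to the GX tiled format can be sketched like this (assuming width and height are multiples of 4; not the actual Toy Box code):

    #include <stdint.h>

    /* Each 4x4 tile is 64 bytes: 32 bytes of A,R pairs then 32 bytes of
       G,B pairs, pixels left-to-right top-to-bottom within the tile. */
    void LinearToGXRGBA8(const uint8_t *src, uint8_t *dst, int width, int height) {
        for (int tileY = 0; tileY < height; tileY += 4) {
            for (int tileX = 0; tileX < width; tileX += 4) {
                for (int y = 0; y < 4; y++) {
                    for (int x = 0; x < 4; x++) {
                        const uint8_t *p = src + ((tileY + y) * width + tileX + x) * 4;
                        int i = (y * 4 + x) * 2;
                        dst[i + 0]  = p[3]; /* A */
                        dst[i + 1]  = p[0]; /* R */
                        dst[i + 32] = p[1]; /* G */
                        dst[i + 33] = p[2]; /* B */
                    }
                }
                dst += 64;
            }
        }
    }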

Other formats like RGB565 seem to work as commonly expected (not using tile blocks).

Doom

I have a private fork of the original Doom source code for Linux that uses Clover’s Toy Box. I updated it to run on the Wii.

My Doom fork is mainly for messing with “porting an old game” to use code from Clover’s Toy Box, but Doom is already really portable to many platforms so there wasn’t much to do. There are already multiple versions of Doom for the Wii available; this isn’t anything new.

I fixed compiling the Doom big endian byte-swap functions, disabled Doom networking and audio which don’t compile for Wii, added uploading the Doom software rendered game frame as a GX texture, and added remapping controller buttons to Doom key values. And it basically worked.

Pressing the key to move backward moved forward, and it was faster than the forward key. The forward_move and side_move input commands were “char” values. It turns out that on the Wii, char defaults to unsigned (range 0 to 255, instead of -128 to 127) so forward_move = -25 became 231 (256 – 25).

To avoid potential issues with using the homebrew SDK libraries, I opted to change Doom input to use “signed char” instead of overriding the default for “char” with a compiler option.
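
A tiny self-contained example of the signedness pitfall:

    #include <stdio.h>

    int main(void) {
        char c = -25;         /* 231 where plain char is unsigned (e.g. Wii/PPC) */
        signed char s = -25;  /* -25 on every platform */
        printf("%d %d\n", (int)c, (int)s);
        return 0;
    }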

It supports Wii and GameCube controller joysticks/buttons and USB mouse and keyboard but pointing the Wii Remote for aiming isn’t supported.

For Doom on all platforms, I would need to fix audio and improve input integration.

Interlude

Following a Twitter trend, I made a top 25 games list.

It was hard to think of 25. There are other games but it’s been a long time since I played them or thought about them. I mostly replaced playing games with software dev and watching TV shows a decade ago.

(For whatever reason I included the demo of Metroid Prime Hunters that was bundled with the Nintendo DS. It’s kind of a weird game selection.)

Image created on topsters.org

I first played seven of the games on the GameCube or Wii.

  • PC: 9 games
  • GameCube: 4 games
  • Wii: 3 games
  • PlayStation 3: 3 games
  • PS Vita: 2 games
  • Nintendo DS: 1 game
  • Nintendo 3DS: 1 game
  • SEGA Genesis: 1 game
  • SEGA Dreamcast: 1 game

What this doesn’t represent well is that I may have played 75 games across the GameCube and Wii consoles compared to 5 or less per console since then (PS3, PS4, PS Vita, 3DS), and I haven’t played as many new games on PC (Steam) since then either.

Why?

The simple answer is I ported Toy Box to the Wii because I wanted to even though there is no real use case anymore.

[This post was originally titled Clover’s Toy Box 23.04 but I put off rewriting the Why? section for months so it wouldn’t be very long and off-topic.]

Summer of 2006

I started programming in the summer of 2006 by modifying Sonic Robo Blast 2 based on the Doom engine. I followed the online hype of the GameCube successor—the Revolution—that became the Wii released in November 2006. I was also looking forward to the Ninja Turtles 2003 TV series resuming in the fall, continuations of Sonic and TMNT 2003 game series on the Wii, and some other things eventually happening.

On the one hand, I was bored in the summer of 2006 so I started trying to make a game. On the other hand, I was excited looking forward to several things.

However nothing I was looking forward to met my expectations and some didn’t happen at all.

I wanted to make my own successor to the Ninja Turtle GameCube games that would run on the Wii. This is a subject that I have continued to work on over the years; whether it be Turtle Arena, controller support on desktop computers, or trying to run software on the Wii.

The Eternal Summer of 2006

I sarcastically refer to Clover’s Toy Box as taking place in “the eternal summer of 2006”. It was a point of excitement and looking forward to the possibility of the future.

And so, I ported Clover’s Toy Box (and Doom) to the Wii because I can.

Though I’m kind of bored.

Toy Box on Wayland https://clover.moe/2023/08/23/toy-box-on-wayland/ Wed, 23 Aug 2023 20:40:32 +0000 https://clover.moe/?p=1550 Clover’s Toy Box development: Fumbling around with transparency on Wayland.

Wayland is a newer application display/input protocol on Linux and some other operating systems. I recently got an AMD graphics card that works with Wayland and switched from using GNOME with X11 to Wayland.

Clover’s Toy Box and “Toy Box renderer for Spearmint” work on GNOME with Wayland using SDL 2 library.

However my Qt app Clover Resource Utility’s 3D model background was transparent and failed to clear the model between drawn frames, which created a hall of mirrors effect.

It took 8 hours to understand and fix the Resource Utility issue yesterday. Though I did take a break at one point and was distracted by watching YouTube the entire time.

Fix it

What do I know:

  • I had previously read that OpenGL on Wayland (and also unrelated, Vulkan in general without enabling an extension) display alpha instead of displaying the window as opaque.
  • The Clover Resource Utility background has been broken for a while—displaying as black—unrelated to Wayland.
  • The 3D scene is rendered to a framebuffer object and then drawn to the widget framebuffer (either blended or copied directly).

So combining these things, it seemed I had broken the background to be transparent black. I got the background to work fairly easily by changing the 3D scene to be “transparent”, which caused it to blend with the background instead of overwriting the background alpha with the 0 alpha of the 3D scene. This fixed the background color as well. This was the older behavior of the application.

Fix it again

The 3D scene was specifically not transparent/blended because of issues with Quake 3 map materials writing 0 alpha and expecting it to be opaque.

I previously added a hack to make non-blended materials write 1.0 alpha (only necessary when a transparent/blended framebuffer object was used). However this detection failed in simple cases like blending a texture over a lightmap. It basically only fixed textures without a Quake 3 shader.

Testing this older behavior however had a very strange effect that I couldn’t guess the cause of, and it behaved differently on X11 and Wayland. On Wayland the transparent materials were accumulatively added to the window each frame with a hall of mirrors effect until the surface was white. However I clear the 3D scene and widget framebuffers each frame. It kind of seemed like premultiplied alpha without clearing the widget framebuffer, but that doesn’t make any sense either. Where was it coming from?

I tried to fix it by forcing writing 1.0 alpha when drawing the 3D scene framebuffer to the widget framebuffer. However I changed the color multiplier in the vertex shader, not the actual alpha of the 3D scene framebuffer texture in the fragment shader, and I didn’t realize it for hours.

Fix it again, correctly

This left me with an issue I could not explain the cause of, and searching online led to nothing (despite finding and reading about other issues related to Qt Wayland not displaying OpenGL content, general OpenGL transparency, etc).

Qt problem? GNOME with Wayland problem?

I was comparing using both the Qt X11 and Wayland backends and testing changing various things, semi-randomly because it made no sense. I occasionally thought I had fixed it but then realized I needed to test maps besides q3dm1, because my “non-blended shader hack” fixed those but I hadn’t managed to fix the alpha. There were a lot of truly strange results.

Eventually I tried enabling alpha for the Qt OpenGL context. (I had code for it commented out and I was just semi-randomly testing things.) This fixed the weird “accumulatively added to the window each frame” behavior and made X11 and Wayland behave the same.

After that I was able to understand that I had not correctly fixed it to write 1.0 alpha when drawing the 3D scene framebuffer texture to the widget framebuffer. Correcting that to actually work fixed the issue.

Conclusion

Make sure to write 1.0 alpha to the OpenGL framebuffer on Wayland if OpenGL alpha buffer isn’t enabled.
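
The kind of fragment shader fix involved can be sketched like this (GLSL embedded as a C string; the names are hypothetical, not the actual Clover Resource Utility code):

    /* Blit the 3D scene framebuffer texture to the widget framebuffer,
       forcing opaque alpha regardless of what the scene wrote. */
    static const char *blitFragmentShader =
        "#version 120\n"
        "uniform sampler2D sceneTexture;\n"
        "varying vec2 texCoord;\n"
        "void main() {\n"
        "    vec3 color = texture2D(sceneTexture, texCoord).rgb;\n"
        "    gl_FragColor = vec4(color, 1.0); /* ignore the scene's alpha */\n"
        "}\n";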

That’s how I spent 8 hours fixing the transparent HOM background while still correctly ignoring Quake 3 map materials writing weird alpha.

I thought about trying to use RenderDoc to inspect it at one point and I don’t really have a reason for why I didn’t.

Maybe if I had taken a break for the day, I would have realized that in order for non-1.0 alpha to misbehave when I’m clearing the 3D scene and widget framebuffers each frame, non-1.0 alpha must be making it to the final framebuffer.

If having OpenGL alpha issues on Wayland (or maybe in general), enabling alpha in the OpenGL context so it’s displayed (correctly) may be helpful.

I tried to capture the Quake 3 map bug here by recording a video: surface is opaque, screenshot: surface is transparent. I love undefined behavior.

Future

The titlebar and resizing on GNOME with Wayland depend on the application to implement them. SDL supports this using the libdecor library. (libdecor will be added in the Flatpak freedesktop 23.08 SDK releasing around the end of the month.) I would want to ship libdecor along with SDL in my application for fully self-contained Wayland support.

(Outside of Flatpak) I would prefer not to deal with shipping libdecor and its plugins that assume cairo or GTK 3 are installed. It seems like it would be simpler to add support for handling the titlebar and resizing in my application.

However SDL seems to need improvement for “hit testing” (determining if the cursor is over the titlebar/edge) to work correctly. (In my test, resize doesn’t work on Windows, resize cursors don’t appear when hovering over edges on Linux (X11/Wayland), and I haven’t tested macOS.)

SDL 3 has gained support for popup menu and tooltip windows and displaying the default titlebar context menu across Windows, Linux, and macOS. I would like to build on it to make a simple GUI framework to replace using Qt for Clover Resource Utility.

Clover’s Toy Box 22.12 https://clover.moe/2023/01/26/clovers-toy-box-22-12/ Thu, 26 Jan 2023 08:25:28 +0000 https://clover.moe/?p=1467 Clover’s Toy Box development: adding an audio mixer, expanding DDS image support, and year’s end recap.

Audio

I added support for playing multiple audio files at once. My initial goal is more-or-less feature parity with Spearmint’s audio mixer. I haven’t completed that yet.

Quake 3 has a pretty basic audio mixer; OpenAL 1.1 and SDL_mixer support most of the features directly. The main complex features that would need to be implemented on top of the mixer are ambient sounds (volume affected by multiple locations but only one playing sound with one max volume) and game-specific logic for stopping sounds when starting a sound (like a limit per-object).

In Spearmint, I added support for multiple listeners for splitscreen. OpenAL 1.1 and SDL_mixer do not directly support this. It should be possible using OpenAL with multiple AL contexts to have “multiple listeners” by duplicating all audio buffers and sources. It would be possible to manually spatialize sounds for SDL_mixer.

OpenAL?

I originally planned to use OpenAL 1.1 (specifically mojoAL and OpenAL-Soft). The main reasons to use OpenAL were to not have to deal with multi-threading myself and (using OpenAL-Soft) to get up to 7.1 surround sound mixing. (Though I didn’t find an OpenAL implementation for GameCube/Wii, which I pretend to target.)

Using OpenAL doesn’t seem too difficult for Spearmint features (though I’m not sure ambient sounds can be implemented correctly) but I think additional features would become very complicated or use unnecessary memory.

Using OpenAL wasn’t as simple as I expected. I still had to write the framework of an audio mixer, just without the literal audio mix function. Having to deal with multiple OpenAL devices and contexts with duplicated audio buffers is complicated and uses unnecessary memory. I plan to add basic effects like volume fading … which OpenAL 1.1 does not support. These kind of push me toward streaming the audio for each source to OpenAL. (I would handle the actual full audio buffers and preprocessed effects for the source, and OpenAL would just do spatializing/mixing with short temporary per-source audio buffers.)

I had a basic incomplete OpenAL mixer working and then I ripped it apart to just add my own audio mix function, which at its core is just adding audio samples scaled by volume.
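
The core of such a mix function is small; a sketch with 16-bit samples, per-source volume, and clamping (not the actual Toy Box code):

    #include <stdint.h>

    /* Add one source's samples into the output buffer, scaled by volume
       and clamped to the 16-bit range. */
    void MixSamples(int16_t *out, const int16_t *in, int count, float volume) {
        for (int i = 0; i < count; i++) {
            int sample = out[i] + (int)(in[i] * volume);
            if (sample >  32767) sample =  32767;
            if (sample < -32768) sample = -32768;
            out[i] = (int16_t)sample;
        }
    }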

Toy Box audio mixer for Spearmint

I modified Spearmint to support an external audio mixer library similar to external renderer library. I made a small library to implement Spearmint sound interface over Toy Box mixer. This is how I did most of the testing of the audio mixer.

This hasn’t been merged into the public Spearmint git repo as it’s not really useful outside this test and I didn’t convert the built in mixers to use it.

Toy Box audio mixer

I implemented audio sources with instance IDs so that individual sources can be modified or stopped. (The instance IDs prevent reallocated sources being affected by old out-of-date source handles.) There are also object IDs to allow updating the position of all sounds by object like in Quake 3.
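
A common way to implement this kind of handle is to pack a slot index with a generation counter that is bumped whenever the slot is reused; a hypothetical sketch:

    #include <stdint.h>

    typedef uint32_t SourceHandle; /* low 16 bits: slot, high 16 bits: generation */

    typedef struct {
        uint16_t generation; /* bumped every time the slot is reallocated */
        /* ... per-source mixer state ... */
    } Source;

    SourceHandle MakeHandle(uint16_t slot, uint16_t generation) {
        return ((uint32_t)generation << 16) | slot;
    }

    /* A stale handle to a reallocated slot fails the generation check. */
    int HandleIsValid(const Source *sources, SourceHandle h) {
        uint16_t slot = (uint16_t)(h & 0xFFFF);
        uint16_t gen  = (uint16_t)(h >> 16);
        return sources[slot].generation == gen;
    }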

I made a list of about 50 features in the Spearmint audio mixer and reviewed the RTCW/ET features I haven’t implemented in Spearmint yet. I implemented most of them.

I implemented some long wished for features: pausing audio sources and (RTCW/ET) per-source volume and range. It would be possible to pause audio sources when the game is paused now. Though to be more practical I probably need to add some setting for whether it’s a game or menu sound.

Spearmint has four separate audio source systems for playing sounds (sfx, looping sfx / real looping sfx, music / streaming sounds, and raw samples from videos/VoIP). Toy Box has one audio source system. The somewhat strange “stop looping sound if not added this frame” is not directly supported; it’s the application’s job, so it’s implemented in the “Toy Box audio mixer for Spearmint” library.

Missing features compared to Spearmint:

  • Audio resampling (needed for playing sounds at correct speed/pitch)
  • Doppler (dynamic audio resampling)
  • Stop sound when starting sound (limit count per-object, free oldest when out of sources).
  • Limit total volume of looping sounds by sfx
  • Audio recording
  • Audio recording volume meter

DDS Images

I added support for uncompressed DDS image formats to have feature parity with image support in the Spearmint opengl1 renderer. I added a few other DDS features that are supported by the ioquake3/Spearmint opengl2 renderer.

  • Add support for uncompressed image formats, in addition to the “ABGR8” format.
  • Add support for uncompressed mipmaps.
  • Add support for cube maps.
  • Add support for red and red-green channel-only compressed images (BC4/BC5).

(ioquake3 opengl2 renderer supports some additional compressed formats—BC4S, BC5S, BC6H and BC7—that Spearmint does not and I haven’t added support to Toy Box.)


The DDS image format (DirectDraw Surface) is a container format used by Windows DirectX SDK that supports image data in many different formats. Toy Box had support for DDS images with DirectX texture compression 1-5 (DXT1-5, also known as Block Compression 1-3) and uncompressed 8-bit RGBA.

I added support to Toy Box for loading DDS images containing uncompressed RGBA image data with various bit depths and color channels present, such as RGB565, L8A8 (luminance-alpha), and RGB10A2. All are converted to RGBA8 internally; known as “ABGR8” in DDS terms as the names use reversed channel order. (Spearmint also converts them to RGBA8.)
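
As one example, expanding an RGB565 pixel to RGBA8 with bit replication can be sketched like this (the actual loader handles many formats generically; names are illustrative):

    #include <stdint.h>

    /* Replicate the high bits into the low bits so the full 5/6-bit
       ranges map to 0..255. */
    void RGB565ToRGBA8(uint16_t pixel, uint8_t out[4]) {
        uint8_t r5 = (pixel >> 11) & 0x1F;
        uint8_t g6 = (pixel >> 5)  & 0x3F;
        uint8_t b5 =  pixel        & 0x1F;

        out[0] = (uint8_t)((r5 << 3) | (r5 >> 2)); /* 5 bits -> 8 bits */
        out[1] = (uint8_t)((g6 << 2) | (g6 >> 4)); /* 6 bits -> 8 bits */
        out[2] = (uint8_t)((b5 << 3) | (b5 >> 2));
        out[3] = 255;                              /* opaque */
    }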

Mipmaps are a series of 50% size copies of the image that are used to improve image quality further away from the camera.

I added support to Toy Box for using the uncompressed RGBA mipmaps from the DDS file instead of generating new mipmaps. This can be used for faster loading or effects purposely using different images for the mipmap levels. (Spearmint incorrectly generates new mipmaps but ioquake3 opengl2 renderer uses the mipmaps from uncompressed RGBA DDS images.)

Cube maps are 6 images for the sides of a cube that can be used for reflections and the sky. I added support for loading DDS cube maps and added basic cube map reflection rendering.

There are variants of Block Compression for red and red-green channel-only compression (known as Block Compression 4/5, OpenGL RGTC1/2, DDS ATI1/2, and 3Dc+/3Dc). They use the same compression format as the DXT5/BC3 alpha block, but for red and green blocks. Red-green compression is useful for normal-map compression: it offers better precision for gradients than RGB compression (DXT1/BC1), and the normal-map direction vector is normalized, so the third, missing component can be calculated in a shader. Red and red-green compression are supported by ioquake3's opengl2 renderer but not Spearmint (they have different DDS loaders).
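
The standard reconstruction (not necessarily what any of these renderers do verbatim) remaps the two stored channels to [-1, 1] and solves for the third component from the unit-length constraint:

#include <math.h>

// r and g are the decoded red/green channels in [0, 1]
static void ReconstructNormal( float r, float g, float n[3] ) {
    n[0] = r * 2.0f - 1.0f; // X from the red channel
    n[1] = g * 2.0f - 1.0f; // Y from the green channel
    // unit length: x*x + y*y + z*z = 1; tangent-space normals point
    // away from the surface, so take the positive root
    float zz = 1.0f - n[0] * n[0] - n[1] * n[1];
    n[2] = ( zz > 0.0f ) ? sqrtf( zz ) : 0.0f;
}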

I added support for the red and red-green compression formats to Toy Box. I upload R/RG compressed images to OpenGL directly, or first manually decompress the image if the format isn't supported by the OpenGL driver. My image writing/converting code will also decompress them when writing an image format besides DDS (such as PNG or JPG).

ioquake3's opengl2 renderer's DDS loader also supports BC6H and BC7 compression. I'm not planning to add them to Toy Box at this point. Just passing the data to OpenGL isn't complicated, but decompression is. I want to be able to decompress all images so I can load them on OpenGL 1.1 or write them to a PNG image.

I'm only targeting feature parity with the Spearmint opengl1 renderer, but BC4/5 red/red-green compression wasn't very difficult to add to Toy Box. I'll be ignoring the other formats for now.

Year’s End

There were 406 code revisions adding 27,000 new lines of code (including comment and blank lines). That's about 25% of Clover's Toy Box's ~106,000 lines of code (not including third-party code).

Changes of the year:

  • Improved rendering performance and fixed various issues.
  • Added support for seven more skeletal model formats; Quake 3 MD4, Doom 3 MD5, and some derivatives of them.
  • Added support for OpenGL 1.1 and OpenGL ES 1.1, and made Windows prefer OpenGL ES 2 via ANGLE over OpenGL 1.1.
  • Added an audio mixer and loading six audio file formats.
  • Improved game text input and added support for buttons on newer controllers.
  • Added browsing archive files and viewing text files to Clover Resource Utility.
  • Added basic support for running libretro cores.
  • Added support for controlling PC “RGB lighting” using OpenRGB network protocol.
  • Initial Android port.
  • Fixed running on Windows XP and ReactOS.

Year six of development ends. Year seven begins.

]]>
Clover’s Toy Box 22.10 https://clover.moe/2022/11/02/clovers-toy-box-22-10/ Wed, 02 Nov 2022 16:02:49 +0000 https://clover.moe/?p=1463 Clover’s Toy Box development: adding Wolfenstein model formats, MP3 with metadata, and vertex tangents.

After posts covering 4 months (22.04), 3 months (22.07), and 2 months (22.09) of development it seems only natural for this post to cover 1 month (22.10).

Compared to Spearmint, Toy Box now supports all audio codecs (no audio mixer yet), all image formats (except uncompressed DDS), and has partial support for all model formats. There are still a few difficult issues for models, but it seems like these areas (audio, images, models) could have feature parity with Spearmint in the near future. There will be some unsupported model behavior that is unlikely to be used.

The other main areas that need work are Quake 3 .BSP (levels), collision, and networking. Toy Box is also missing many rendering features and the actual game-logic. So yeah, there is still a long way to go but it’s nice to feel like there is progress.

Models

  • Added support for Return to Castle Wolfenstein .MDS skeletal models.
  • Added support for Wolfenstein: Enemy Territory .MDM skeletal mesh and .MDX skeletal animation.
  • Added support for Wolfenstein: Enemy Territory .TAG models.
  • Added support for Kingpin: Life of Crime .MDX models (unrelated to Wolf:ET).
  • Added more support for vertex tangents.
  • For the Enemy Territory: Quake Wars mod kit (SDK): support for vertex colors and a missing bind pose in .MD5MESH.
  • Added support for Misfit Model 3D .MM3D skeletal “points” (tags in Quake terms).
  • Added support for vertex tangents and colors in Inter-Quake Export .IQE models.

RTCW / Wolf:ET

I added support for Return to Castle Wolfenstein .MDS. It's another version of MD4 with a unique method of bone joint storage, and it uses a collapse map for level-of-detail support. I don't support the dynamic torso rotation, torso animation, or collapse map yet.

I added support for Wolfenstein: Enemy Territory .MDM and .MDX. They are RTCW .MDS split into mesh and animation files, similar to FAKK/Alice .SKB and .SKA. MDM adds tags that are relative to a bone; bones are only in MDX.

For the Wolf:ET model formats, I decided to try to properly support separate mesh (without bind pose joints) and skeletal animation models. (For FAKK/Alice .SKB/.SKA I cheat and load the other from the same directory, though this may not be correct.) There were some pretty intrusive changes to support skeletal mesh models without bind pose joints. I have to support models with no joints rendering using the joints in the (animation) frameModel, and support MDM adding additional tag joints on top. The number of joints/tags is based on the combination of the model and (animation) frameModel, not either model on its own.

The Wolf:ET models work surprisingly well but there are some visible issues. I don't correctly handle MDM vertexes with multiple bone joint influences yet, due to the missing bind pose joints (and I don't want to improperly derive them from an animation).

I want to figure out how to calculate usable bind pose matrices using the vertex bone-space positions, as that would make model support less intrusive throughout the engine and allow for GPU bone skinning. Though I currently only support 4 joint influences per-vertex for GPU bone skinning, and the Wolf:ET MDM models use up to 5. I may change to 8 in the future (it requires additional GLSL vertex attributes and shader variants). Worst case, I have to deal with storing and using the bone-space positions when rendering using CPU model animation.

I also added Wolf:ET .TAG "models", which are Quake 3 .MD3 models reduced to just one frame of MD3 tags. They're used by the Wolf:ET server game-logic for attachment points on vehicles; the Wolf:ET client renderer doesn't support them. In Spearmint I specifically decided not to add it to the engine because the game-logic can load it itself (even though Spearmint allows the server game-logic to load models through the engine and get the tags). I changed my mind for Toy Box; it's easy to add in the model code, and I can also render debug axes for the tags and view the .TAG models in Clover Resource Utility.

Kingpin

A year ago I was looking into formats used by Quake 2-based games and started adding Kingpin: Life of Crime .MDX, which is based on Quake 2 .MD2. MDX adds a couple of sections I don't support: effects connected to a vertex, and per-mesh bounding boxes. Quake 2 MD2 has separate vertex texture coords for software and OpenGL rendering. Kingpin MDX however removed the software texture coords, which are what I was using in Toy Box. So I had to parse the MD2 OpenGL command list of triangle fans and strips (see the sketch below) and convert it to indexed triangles using each triangle's per-vertex texture coords.
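
A sketch of walking the MD2 GL command list based on the documented MD2 layout (the struct name and EmitTriangle() are mine, purely illustrative): each block is a strip (positive count) or fan (negative count) of { s, t, vertexIndex } entries, unrolled here into triangles.

typedef struct { float s, t; int vertexIndex; } GLCmdVertex;

// cmds points at the MD2 glcmds data: an int count followed by |count|
// vertex entries; positive count = triangle strip, negative = fan, 0 = end
static void WalkGLCommands( const int *cmds ) {
    for ( ;; ) {
        int count = *cmds++;
        if ( count == 0 )
            break;
        int isFan = ( count < 0 );
        if ( isFan )
            count = -count;
        const GLCmdVertex *verts = (const GLCmdVertex *)cmds;
        for ( int i = 2; i < count; i++ ) {
            if ( isFan )
                EmitTriangle( &verts[0], &verts[i - 1], &verts[i] );
            else if ( i & 1 ) // flip odd strip triangles to keep winding
                EmitTriangle( &verts[i], &verts[i - 1], &verts[i - 2] );
            else
                EmitTriangle( &verts[i - 2], &verts[i - 1], &verts[i] );
        }
        cmds += count * 3; // each vertex entry is 3 ints wide
    }
}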

Vertex Tangents

Normal maps allow per-texel lighting direction to give the appearance of more detail on a model. It's one of the latest and greatest rendering features of ~2003. Vertex tangent vectors are needed for normal map support to know the up and side orientation of the texture in 3D space. (The vertex normal is the forward orientation.)

IQM and IQE may include vertex tangents; for other formats (and for IQE files without them) they must be generated. Vertex tangents are generated by Doom 3 for .MD5MESH and by the ioquake3/Spearmint opengl2 renderers for the model formats they support (excluding IQM, which should already have them).

I already supported vertex tangents for IQM models (and applying one normal map to all models as a test). I added vertex tangents to the generic vertex so that it’s possible for dynamic geometry, sprites, and CPU animated models to have vertex tangents. I added debug lines for vertex tangents which is very useful when generating them.

I added support to my general model handling (MeshBuilder API) for explicit and generated vertex tangents and vertex colors. This generates vertex tangents for all skeletal formats except IQM (MD4, MDR, MDS, MDM, SKB, MD5MESH, MM3D); a sketch of the standard calculation follows below. I still need to add vertex tangent generation for the formats that don't use MeshBuilder: IQM and the formats based on MD2 and MD3.
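
For reference, the common per-triangle tangent calculation from positions and texture coords looks something like this (a generic sketch, not MeshBuilder's actual code; per-vertex results are typically accumulated across triangles and then normalized):

static void TriangleTangent( const float p0[3], const float p1[3], const float p2[3],
                             const float uv0[2], const float uv1[2], const float uv2[2],
                             float tangent[3] ) {
    float e1[3], e2[3];
    for ( int i = 0; i < 3; i++ ) {
        e1[i] = p1[i] - p0[i];
        e2[i] = p2[i] - p0[i];
    }
    float du1 = uv1[0] - uv0[0], dv1 = uv1[1] - uv0[1];
    float du2 = uv2[0] - uv0[0], dv2 = uv2[1] - uv0[1];
    float det = du1 * dv2 - du2 * dv1;
    float r = ( det != 0.0f ) ? 1.0f / det : 0.0f; // degenerate UV mapping
    for ( int i = 0; i < 3; i++ )
        tangent[i] = ( e1[i] * dv2 - e2[i] * dv1 ) * r;
}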

Conveniently, Wolf:ET .MDM contains vertex texture coords and normals, so it's possible to generate tangents even though I don't know the vertex model-space positions.

ET:QW

I previously added support for loading the Enemy Territory: Quake Wars variant of .MD5MESH (version 11). (The ET:QW game doesn’t use them but the ET:QW SDK contains them, albeit without any .MD5ANIM animations.)

I added support for vertex colors in ET:QW .MD5MESH. All models in the SDK have vertex colors but all the colors are white (“1 1 1 1”). It’s not very exciting but it reused the work I did for .IQE vertex color support.

I added support for loading "*_lod3.md5mesh" files, which are lower level-of-detail (LOD) models that do not contain the bind pose joints. This reused the work I did for Wolf:ET .MDM without joints. The LOD models can be drawn using the joints from the separate "*.md5mesh" file with the same base name. I handle this in Clover Resource Utility by treating the base model as the animation frameModel. This does not support animations yet (the ET:QW SDK doesn't have any, though) as it needs to connect the LOD model to the base model to get the joints instead of treating the base model as an animation.

Misfit

Misfit Model 3D .MM3D models contain skeletal points (tags) connected to joints, similar to Wolf:ET .MDM but with multiple joint influences like vertexes. I added support for this along with the Wolf:ET .MDM tags, which made it more complicated.

Inter-Quake Export

Inter-Quake Export .IQE, the text version of Inter-Quake Model .IQM, supports vertex colors and vertex tangents. I added support for them using my new common code in my MeshBuilder API.

Audio

  • Added .MP3 playback using public domain dr_mp3 library.
  • Added support for ID3v2 metadata in .MP3 and .WAV.
  • Cleaned up my .WAV loading.
  • Added support for LIST INFO metadata in .WAV.
  • Fixed a large .Opus file crashing the game.

MP3

I got sidetracked processing the ID3 metadata that is typically in MP3 files before I added the MP3 audio support itself. I thought it would be fairly simple key-value pairs, but it turned out to be pretty complex.

ID3v1 ("TAG" data identifier) is 128 bytes at the end of the file. There are fixed-length fields for title, artist, etc. (Values for specific keys, if you will.) Most (all of my?) MP3 files have this and it's very simple to load, but it's limited.
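
The whole ID3v1 layout fits in one struct (a sketch; the field sizes are from the ID3v1 spec):

#include <string.h>

// ID3v1 is a fixed 128-byte block at the very end of the file
typedef struct {
    char tag[3]; // "TAG"
    char title[30];
    char artist[30];
    char album[30];
    char year[4];
    char comment[30];
    unsigned char genre;
} ID3v1;

static int ReadID3v1( const unsigned char *file, long fileSize, ID3v1 *out ) {
    if ( fileSize < 128 )
        return 0;
    memcpy( out, file + fileSize - 128, 128 );
    return memcmp( out->tag, "TAG", 3 ) == 0;
}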

I detect audio files by their content. MP3 files usually start with ID3v2 metadata ("ID3" data identifier). The ID3v2 spec boasts that it can be used with other audio codecs. I thought I could parse the ID3v2 metadata to skip it and then check the audio data. It turned out MP3 files are frame-based and decoders scan for the next frame, so arbitrary data can be mixed in (possibly in error); I can't just check the data immediately after the ID3v2 metadata to detect MP3. ID3v2 in WAV files is stored as a RIFF chunk, not simply at the start of the file like in MP3 files. So my idea of general ID3v2 handling at the start of the file turned out to not be useful, and I moved it into MP3- and WAV-specific code.

I had some difficulty understanding the ID3v2 specifications. As I understood more, it became clear that ID3v2 is kind of a full-blown archive format with entries. It has separate text encoding per-entry and even offers deflate compression like .zip files. It also has an "unsynchronization" option for when the entry data has to be altered to avoid looking like an MP3 frame header. (I haven't added support for deflate compression or the "unsynchronization" flag.) There are three incompatible versions of ID3v2—I have all three in my meager library of MP3s.
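
The one genuinely simple part is the 10-byte header; a sketch of parsing it (my function name, following the ID3v2 spec):

// The tag size is "syncsafe": four bytes with the high bit clear
// (7 useful bits each, 28 bits total) so the size bytes can never
// look like an MP3 frame sync.
static int ParseID3v2Header( const unsigned char h[10], unsigned int *tagSize ) {
    if ( h[0] != 'I' || h[1] != 'D' || h[2] != '3' )
        return 0;
    if ( ( h[6] | h[7] | h[8] | h[9] ) & 0x80 )
        return 0; // size bytes aren't syncsafe; corrupt header
    // the size counts everything after this header, including the
    // extended header if the flags byte (h[5]) says one is present
    *tagSize = ( h[6] << 21 ) | ( h[7] << 14 ) | ( h[8] << 7 ) | h[9];
    return 1;
}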

I spent a bunch of time trying to understand it and also converting text encodings (UTF-16 big and little endian to UTF-8). After a few hours of working on ID3 support, I realized I probably should have looked for a library for ID3 support, but I was already too deep to want to stop.

I found several weird cases when testing my library of MP3s. Such as placing the string terminator at the beginning of the entry… possibly a side effect of ID3 using a string terminator as a separator between values in the entry, but these only had one value. Some files include a second ID3 metadata block (with the same content?). In some files there is garbage between the ID3 metadata and the MP3 audio data (in one file it's ID3v1 metadata, which is supposed to be at the end of the file).

So yeah, ID3v2 is complicated, has weird edge cases, and is not really a joy to implement. I worked on it for three days and didn't implement the full specifications.

Adding MP3 audio support using dr_mp3 was really easy though. It took like 2 hours to add using the dr_mp3 "pull API" to request samples with read and seek callbacks. I may change it later to work with MP3 frames so I can handle ID3v2 tags in the middle of the stream, though that is probably only likely to happen in Internet radio streams.
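
The pull API usage is roughly this (a sketch using dr_mp3's file helper; the callback variant takes read/seek functions instead of a path, and exact signatures may differ between dr_mp3 versions):

#define DR_MP3_IMPLEMENTATION
#include "dr_mp3.h"

int DecodeMP3( const char *path ) {
    drmp3 mp3;
    if ( !drmp3_init_file( &mp3, path, NULL ) )
        return 0;

    float pcm[4096];
    drmp3_uint64 framesRead;
    while ( ( framesRead = drmp3_read_pcm_frames_f32(
                  &mp3, 4096 / mp3.channels, pcm ) ) > 0 ) {
        // consume framesRead frames of mp3.channels-channel audio
        // at mp3.sampleRate here
    }

    drmp3_uninit( &mp3 );
    return 1;
}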

WAV

My .WAV loading somewhat bizarrely read 20 KiB of the file and then processed the byte array to find the audio format and the start of the audio sample data. If the sample data started more than 20 KiB in (very unlikely), it would fail. Now it streams the file and processes it byte by byte using the sizes defined in the .WAV file.
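
Streaming RIFF chunks is straightforward since each chunk is self-describing (a generic sketch, not the Toy Box code):

#include <stdio.h>
#include <string.h>

// each chunk is a 4-byte id, a 4-byte little-endian size, then the data
static void WalkWavChunks( FILE *f ) {
    unsigned char riff[12], chunk[8];
    if ( fread( riff, 1, 12, f ) != 12 || memcmp( riff, "RIFF", 4 ) != 0 ||
         memcmp( riff + 8, "WAVE", 4 ) != 0 )
        return;

    while ( fread( chunk, 1, 8, f ) == 8 ) {
        unsigned int size = chunk[4] | ( chunk[5] << 8 ) |
                            ( chunk[6] << 16 ) | ( (unsigned int)chunk[7] << 24 );
        if ( memcmp( chunk, "fmt ", 4 ) == 0 ) {
            // the audio format description is here
        } else if ( memcmp( chunk, "data", 4 ) == 0 ) {
            // the sample data starts at the current file offset
        }
        fseek( f, size + ( size & 1 ), SEEK_CUR ); // chunks are word aligned
    }
}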

I added support for the WAV metadata in Microsoft's WAV specification (a "LIST" chunk with an "INFO" identifier) and then proceeded to add support for the "id3 " chunk that contains ID3v2 metadata. Audacity exports both, which gives me a way to test it. Audacity also includes the ID3v2 "extended header", showing me yet another way I was incorrectly parsing ID3v2: the ID3v2 header size includes the extended header size. I glossed over the wording in the specification; it's there.

Opus

I tried loading an 8-hour, 488 MB Opus file but it crashed. The 64-bit sample data size was 6.0 GB, which is over the max signed 32-bit integer value (2.1 GB). The size was passed to the opusfile library's op_read_stereo() as 32-bit, overflowed to negative, and crashed inside the opusfile library. After I fixed my code to limit the read size, it took 2 minutes to load the 8-hour Opus file, then reported an error because my sound effect loader only supports 32-bit sizes. So I moved the error up to before allocating and loading 6.0 GB of audio sample data. Large files should be streamed—and I support streamed reading for all audio codecs—but I don't support playing streamed audio yet.

]]>
Clover’s Toy Box 22.09 https://clover.moe/2022/09/30/clovers-toy-box-22-09/ Fri, 30 Sep 2022 20:01:46 +0000 https://clover.moe/?p=1456 Clover’s Toy Box development: rendering Heavy Metal FAKK2 models, supporting more SDL features, fixing platforms I don’t usually test, and more.

Models

  • Added support for .SKB models and .SKA animations (versions 1 to 3) from Heavy Metal: F.A.K.K.² / American McGee's Alice and (version 4) from Star Trek Elite Force 2.
  • Added support for loading animations from .IQE models.
  • Added support for drawing skeletal models with separate animation files (works with all supported skeletal formats: IQM, IQE, MD4, MD5, MDR, SKB).
  • Clover Resource Utility now automatically loads a mesh model (.md5mesh, .SKB) in the same directory when an animation-only file is viewed (.md5anim, .SKA). It makes browsing animation files more fun.

SKB support is missing dynamic legs rotation, collapse map level-of-detail support, and an API to get the object movement specified for each animation frame.

Quake 3's MD4 skeletal format was split into separate skeletal base mesh (.SKB) and animation (.SKA) files. They use quaternions, similar to Doom 3's MD5 format, though material names have been removed as they are in a separate file.

The SKB/SKA files are not standalone; they both require information from the other to be usable.

To fit with my model system, when loading .SKB (FAKK/Alice versions) the renderer looks for a .SKA in the same directory to get an animation pose to convert vertexes from bone space to model space (this is needed for GPU bone skinning, and it seems annoying to deal with supporting per-bone vertex positions). The EF2 version of SKB includes the skeleton bind pose, so loading a .SKA is not needed.

When loading .SKA the renderer has to load .SKB from the same directory to get the joint hierarchy / names. EF2 has to match SKA joints to SKB using names instead of listed order.

SDL

  • Added SDL 2.24.0 libraries for Windows, Linux, and macOS.
  • Added support for PS4 touchpad button (and other controller buttons added in SDL 2.0.14).
  • I can get data for touching PS4 touchpad and moving accel/gyro sensors but I don’t support using them yet.
  • Added support for SDL’s Input Method Editor (IME) composition API. (I still need to add support for rendering more text glyphs for this to be useful.)
  • SDL text input is now only enabled when needed, for IME support. So on-screen keyboard should only show when it’s needed and not immediately at start up (on Android / PinePhone).
  • Added support for pasting primary selection (Linux middle click). I’m still missing setting primary selection.

Dynamic geometry streaming

Fixed macOS running Clover’s Toy Box menu at 5 FPS and Raspberry Pi 3B crashing.

  • Added support for client-side vertex/index arrays.
  • Changed vertex/index buffer objects from quad-buffer to double-buffer due to out of memory issues / performance.

Current preference order:

  1. Persistent mapped buffer (GL_ARB_buffer_storage, core in OpenGL 4.4).
  2. Client-side arrays (OpenGL 1.1, not supported by OpenGL Core contexts and WebGL).
  3. Double-buffered vertex/index buffer objects (OpenGL 1.5).
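
A minimal sketch of option 1 above (assuming GL_ARB_buffer_storage; not Toy Box's actual code, and the entry points come from an OpenGL function loader):

static void *CreateStreamingVertexBuffer( GLsizeiptr size, GLuint *outVbo ) {
    const GLbitfield flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT |
                             GL_MAP_COHERENT_BIT;
    glGenBuffers( 1, outVbo );
    glBindBuffer( GL_ARRAY_BUFFER, *outVbo );
    glBufferStorage( GL_ARRAY_BUFFER, size, NULL, flags ); // immutable storage
    // the returned pointer stays valid for the buffer's lifetime; reuse of
    // in-flight regions is synchronized with glFenceSync()/glClientWaitSync()
    return glMapBufferRange( GL_ARRAY_BUFFER, 0, size, flags );
}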

I tried Clover's Toy Box on my MacBook Pro (2008, OpenGL 4.1) and Raspberry Pi 3B (2016, OpenGL ES 2.0) for the first time in two years. On macOS the menu ran at 5 frames per-second (instead of the desired 60+) and on the Raspberry Pi it crashed.

This was a result of increasing the size of the dynamic geometry buffer. Reverting the size fixed them but I want to be able to draw more.

On macOS and Raspberry Pi my renderer used round-robin cycling between vertex/index buffers for 4 frames (to avoid stalling if the buffer is still in use).

Raspberry Pi threw GL_OUT_OF_MEMORY for the fourth vertex buffer and crashed when trying to use it. macOS didn’t give a clear indication what the problem was (from testing; I only know it was the result of increased memory usage).

Using vertex/index buffer objects for dynamic geometry streaming is kind of known to potentially have performance issues.

I added support for client-side arrays (a feature added in OpenGL 1.1). You issue draw calls using a CPU memory buffer instead of copying to a GPU buffer whose memory and reuse the OpenGL driver has to manage. The driver probably has to make its own internal buffer of the client-side array data (how is this different?), but I guess old OpenGL drivers are more optimized for client-side arrays.

Client-side arrays worked fine on Raspberry Pi and macOS (when using legacy OpenGL 2.1 context). To use modern features on macOS you have to use an OpenGL Core context; which doesn’t support client-side arrays.

Buffer orphaning, double-, and triple-buffering worked on Raspberry Pi and GTX 750 Ti. Buffer orphaning didn’t work on macOS. Three buffers worked but two was faster. (This matches Apple’s OpenGL documentation that recommends double-buffering.)

I changed vertex/index buffer objects for dynamic geometry (including 2D quads for menu images/text) from quad-buffering to double-buffering.

OpenGL ES 2.0

My renderer now only requires 8 shader vertex attributes, the minimum that OpenGL ES 2.0 guarantees.

  • Merged GLSL attributes for texture coordinates and lightmap texture coordinates.
  • Use separate vertex attribute index lists for skeletal and vertex morph animation attributes. Before, the attributes all used unique indexes even though they were not used simultaneously.
  • I switched ARM from uint32 indexes to uint16. Mainly to silence Raspberry Pi GL_KHR_debug messages about converting uint32 to uint16.

After I got past Clover's Toy Box crashing on the Raspberry Pi 3B, I was met with the level being rendered black. This was due to the lightmap texture coords GLSL vertex attribute being index 10, but only 8 attributes are supported on the Raspberry Pi 3B / required by OpenGL ES 2.0.

I also fixed MD3 vertex morph animation to use attribute indexes less than 8 by remapping them to skeletal attributes that aren’t used with vertex morph shaders.

ReactOS

  • Replaced WideCharToMultiByte( WC_ERR_INVALID_CHARS ) which requires Windows Vista, and MultiByteToWideChar(), with custom UTF-16 conversion functions.
  • Fixed crash due to NULL pointer dereference using my fixed-function rendering. It crashed on my main system as well.
  • Fixed out of bounds read for IQM triangle indexes when the model is loaded. It crashed on ReactOS but not my main system.

I fixed running Clover’s Toy Box on ReactOS—a Windows XP/2003 clone—and probably Windows XP.

It runs at 1 frame per-second in a virtual machine running ReactOS with the default OpenGL 1.1 software implementation. There are also a lot of warnings about my internal loopback system dropping packets. My internal loopback doesn't buffer enough for the client to run at less than 20 FPS… which I should probably fix.

Misc

  • Added support for controlling LEDs (e.g., PC case) via OpenRGB network protocol and SDL joystick API (e.g., PS4 controller).
  • Added basic support for running emulators via the libretro API.

Controlling LEDs and running other games isn’t really useful for making a game but it’s entertaining.

I can now have external flashing lights like the arcade game Hatsune Miku: Project DIVA Arcade Future Tone. Held controller buttons affect lights. I can run it in the background to control lights while playing Hatsune Miku: Project DIVA Mega Mix+ on Steam.

RetroArch providing a single user-interface for multiple emulators is cool but I don’t like RetroArch’s user experience. I started adding support for the libretro API so I can run the emulators with my own UI/input/window-management. It’s surprising how little is needed to get SEGA Genesis and SNES emulators working. I haven’t started working on a UI yet though.
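
The core of a minimal libretro frontend really is small (a rough sketch; MyEnvironment and the other My* callbacks are hypothetical frontend functions, and a real frontend also sets the single-sample audio callback, queries the core for video/audio timing, and handles errors):

#include <dlfcn.h>
#include "libretro.h"

// resolve a core symbol into a function pointer of the right type
#define CORE_SYM( var, type, name ) type var = (type)dlsym( core, name )

int RunCore( const char *corePath, const char *romPath ) {
    void *core = dlopen( corePath, RTLD_LAZY );

    CORE_SYM( set_environment, void (*)(retro_environment_t), "retro_set_environment" );
    CORE_SYM( set_video, void (*)(retro_video_refresh_t), "retro_set_video_refresh" );
    CORE_SYM( set_audio, void (*)(retro_audio_sample_batch_t), "retro_set_audio_sample_batch" );
    CORE_SYM( set_input_poll, void (*)(retro_input_poll_t), "retro_set_input_poll" );
    CORE_SYM( set_input_state, void (*)(retro_input_state_t), "retro_set_input_state" );
    CORE_SYM( init, void (*)(void), "retro_init" );
    CORE_SYM( load_game, bool (*)(const struct retro_game_info *), "retro_load_game" );
    CORE_SYM( run, void (*)(void), "retro_run" );

    set_environment( MyEnvironment ); // answers the core's capability queries
    init();
    set_video( MyVideoRefresh );      // receives each finished video frame
    set_audio( MyAudioBatch );        // receives interleaved int16 audio
    set_input_poll( MyInputPoll );
    set_input_state( MyInputState );  // reports button/axis state to the core

    struct retro_game_info game = { romPath, NULL, 0, NULL };
    if ( !load_game( &game ) )
        return 0;

    for ( ;; )
        run(); // one emulated frame per call; pace this to the core's FPS
}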

]]>
Clover’s Toy Box 22.07 https://clover.moe/2022/07/28/clovers-toy-box-22-07/ Fri, 29 Jul 2022 00:14:30 +0000 https://clover.moe/?p=1417 Clover’s Toy Box development: rendering Star Trek Voyager—Elite Force MDR models, Doom 3 MD5 models, browsing archives, and various issues I ran into.

  1. Miscellaneous.
  2. CPU vertex shader support.
  3. MDR Model Support.
  4. MDR Frame Interpolation.
  5. Model Memory Corruption?
  6. C++ Memory Corruption?
  7. Clover Resource Utility.
  8. MD5 Model Support.
  9. Toy Box renderer on a typical Windows PC.
  10. re: Can Turtle Arena go any faster?

1. Miscellaneous

I reworked vector length, normalize, distance, etc. to replace 1.0f / sqrt( value ) with an rsqrt( value ) function using the vrsqrtss assembly instruction. There was no noticeable performance difference though, so I disabled it.
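
For reference, the approach was along these lines (a sketch, not the exact disabled code; vrsqrtss is the AVX encoding of the SSE rsqrtss instruction, and one Newton-Raphson step refines the ~12-bit hardware estimate):

#include <xmmintrin.h>

static inline float rsqrt( float x ) {
    // hardware estimate of 1/sqrt(x), about 12 bits of precision
    float est = _mm_cvtss_f32( _mm_rsqrt_ss( _mm_set_ss( x ) ) );
    // one Newton-Raphson step roughly doubles the precision
    return est * ( 1.5f - 0.5f * x * est * est );
}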

Use a CPU buffer with persistent mapped buffers. Direct access—even when never reading it—is slower than using a separate CPU buffer and memcpy()’ing to the persistent mapped buffer.

I finally committed the changes to cross-compile Qt model viewer for macOS from Linux. (I did it at the same time as for Maverick last year.)

I moved the website from the shared hosting it had been on since 2009 to a VPS. This way I can have a VPS to test the Toy Box server without increasing overall cost too much.

I got Hatsune Miku: Project DIVA Future Tone and Mega Mix+.

2. CPU Vertex Shader Support

I did a lot of work to support the ‘framework’ of generating vertexes for effects on the CPU but nothing to really show yet.

There can now be dynamic vertexes set up on the CPU for models and are uploaded all at once before drawing begins. It’s done on a per-mesh basis. If it’s a view dependent shader effect, it will be done separately for each scene it appears in. Otherwise it will be done once and reused in all scenes that specific object appears in. It prefers merging low vertex count surfaces (mainly sprites) to reduce drawcalls over reusing them in other scenes.

Vertexes that are passed in to be drawn are now stored in the entity buffer instead of copied to the frame’s vertex buffer directly. This allows for culling (still not implemented) and per-scene CPU vertex effects. I also had to change the surface vertex generation for all built-in primitive types (rectangle, circle, etc), CPU animated model mesh, and debug lines (for model bones, normals, etc) to write to entity buffer.

I'm considering reusing generated vertexes for the same model/material within the scene as well. The main concern is how to handle lighting. It will ideally use GLSL for modern OpenGL, but for OpenGL 1 I had planned to use the CPU vertex shader for lighting like Quake 3, which prevents reusing the same geometry with different lighting, at least for OpenGL 1. I may look into OpenGL 1 light support, as Quake 3 models only use a single averaged light source anyway. I'm also a little concerned with the performance/memory cost of scanning the list of entities to see if the same mesh/material was already uploaded, or of keeping a specific list for this.

I still need to add support for multiple layers (uploading vertexes for each rendering pass). Mirrors/portals turned out to be broken already, so I'm currently ignoring that these changes would have broken them. I may have caused Doom-like multisided sprites to be invisible; I just noticed it anyway.

3. MDR Model Support

Lilium Voyager with Toy Box renderer.
(Models are full bright, missing player shadow, and missing Quake 3 shader support.)

I added support for rendering MDR models in Toy Box renderer. MDR models are used for player models in Star Trek Voyager — Elite Force. I added support for Lilium Voyager renderer API to “Toy Box renderer for Spearmint”; it doesn’t support additional Elite Force render entity types, like bézier lines.

MDR is a skeletal model format based on Quake 3's MD4 format. I also added MD4 support, but no models exist due to the incomplete implementation in Quake 3. (Unlike what the Wikipedia id Tech 3 article says, the MD4 format data structures in the Quake 3 source code define a complete working format.) Just about every game based on Quake 3 uses a modified version of MD4. Adding MDR is kind of a base for adding other model formats in the future.

I started working on MDR and a few other MD4 based model formats last year. In the last three months I cleaned up MD4/MDR support and added support for animations. (I don’t load/use the lower level-of-detail meshes though.)

In Toy Box, MDR is effectively converted into IQM format at load but has to use linear matrix lerp for bone joint interpolation instead of quaternion spherical lerp (more on that in the next section).

The performance should be more or less the same as IQM in Toy Box due to using the same rendering code. This also means that MDR uses GPU bone skinning in Toy Box renderer, unlike ioquake3/Spearmint OpenGL2 renderer where only IQM has GPU bone skinning. Under OpenGL 1/CPU vertex shader, MDR also uses CPU bone skinning with optimized deduplicated influence list that I implemented for IQM; similar to what I added for IQM in ioquake3/Spearmint opengl1 renderer. (I haven’t looked at performance comparison though.)

I added support for model meshes with more than 20 bones to use CPU bone skinning to fix a few Elite Force single-player models. (This applies to IQM and other formats as well). (The 20 bones per-mesh limit for GPU bone skinning is due to OpenGL ES minimum required shader uniform variable size.)

A few models have vertexes with 5 joint influences, and I drop the lowest influence to fit the 4 influences for GPU bone skinning. Some (or maybe all?) of the fifth joint influences are 1%, so it seems like a mishap rather than utilizing a feature. IQM doesn't support more than 4 joint influences for a vertex and I'm not excited to add that. I'd like to store all the weights and use CPU bone skinning instead of dropping the influence, but I haven't yet.

4. MDR Frame Interpolation

MD4/MDR uses a 3×3 rotation/scale matrix for bone joints. IQM uses rotation quaternion and scale vector so I convert the MD4/MDR bone joint matrix into that form. This should be fine but interpolating bones has issues.

The MD4 format was kind of strange to me but is actually straightforward. The joints are absolute position matrixes, and vertexes contain the vertex position in bone local space for each bone that influences the vertex (single vertex, multiple positions). To get the animated vertex position: multiply each influencing bone's joint matrix by the vertex's bone local space position, scale by the influence weight, and add all these influenced positions for the vertex together.
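
In code, that accumulation looks roughly like this (the struct names are hypothetical; the joint matrix here is the 3×4 absolute form, rotation/scale plus a translation column):

typedef struct { int boneIndex; float weight; float localPos[3]; } MD4Influence;
typedef struct { int numInfluences; const MD4Influence *influences; } MD4Vertex;

static void SkinVertex( const MD4Vertex *vert, const float joints[][3][4],
                        float out[3] ) {
    out[0] = out[1] = out[2] = 0.0f;
    for ( int i = 0; i < vert->numInfluences; i++ ) {
        const MD4Influence *inf = &vert->influences[i];
        const float ( *m )[4] = joints[inf->boneIndex];
        for ( int axis = 0; axis < 3; axis++ ) {
            float v = m[axis][0] * inf->localPos[0] +
                      m[axis][1] * inf->localPos[1] +
                      m[axis][2] * inf->localPos[2] +
                      m[axis][3]; // translation column
            out[axis] += inf->weight * v; // influence weights sum to 1
        }
    }
}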

IQM (and most newer formats) have vertexes in a bind pose (vertex has a single position for all influences) with relative hierarchy of joints which can be modified and then calculate pose matrixes that are the joint offset from the bind pose. (I call it a pose matrix, I’m not sure what the correct name for “pose matrix” is.)

I was initially working on another MD4 based format, SKB format used by Heavy Metal FAKK2 and American McGee’s Alice, and vertexes are indeed in bone local space. (There is no default bone pose in the SKB file and animations are in separate SKA files so it’s actually not possible to sanely display the mesh-only file by itself.)

MDR models in Elite Force, however… have vertexes in a bind pose (duplicating the same position in all of a vertex's influence 'bone local space' positions), and the 'absolute bone joint matrixes' in the file are actually pose matrixes that offset from the vertex bind pose. Using quaternion spherical lerp interpolation with pose matrixes is a problem: arms pop off the torso between some frames, among other animation issues. (Converting a pose matrix back to an absolute joint is not possible.)

I expected there could be differences with MDR joint interpolation but I thought quaternion spherical lerp would be better than linear matrix lerp. I replaced IQM linear matrix lerp with quaternion spherical lerp in ioquake3/Spearmint due to linear matrix lerp badly deforming the model (though if I remember correctly this was the absolute joint matrix, not the pose matrix).

I kind of wonder if MDR's linear matrix lerp on the pose matrix has less skewing than linear matrix lerp on the absolute joint matrix. Though I would guess it's impractical when dynamic joint rotation is supported, as it would have to calculate pose matrixes for two model frames and then lerp between them (instead of just lerping joints a few steps earlier in the process).

I made MD4/MDR in Toy Box use linear matrix lerp like Quake 3 / Elite Force instead of quaternion spherical lerp. This is the only MDR rendering difference compared to IQM in Toy Box. (Unlike Spearmint where every model format is completely separate, resulting in copy-and-pasting a lot of code for every model format I add.)

(I still store MD4/MDR bone joints as rotation quaternion and scale vector instead of 3×3 matrix which is potentially a problem if skewed/shear matrixes are used. And yes, still missing MDR lower level-of-detail meshes.)

5. Model Memory Corruption?

Lilium Voyager with Toy Box renderer.
Left: Broken, Right: Not broken.

I ran into an issue that seemed like memory corruption. It wasn't. The CPU bone skinning code expected per-mesh bone reference indexes, but my loaded MDR had real joint indexes, which caused out-of-bounds bone joint matrix accesses. This resulted in vertexes with NaNs (drawn at the center of the screen) and vertexes far away (which is not captured well in a single screenshot).

joints[ meshBoneReference[ vertexInfluence->boneIndex ] ]
The meshBoneReference[] array size is 20, but vertexInfluence->boneIndex incorrectly held joints[] indexes instead of meshBoneReference[] indexes.

IQM models were set up correctly but remapping influence bone joint indexes was unintentionally lost when I copy-and-pasted the code for MDR. The IQM code remapped bone joint indexes in-place but the code for MDR did not so it referenced the original real bone joint indexes.

The per-mesh bone references are only needed for GPU bone skinning (to upload only the referenced bones to the uniform buffer, because OpenGL ES has a smaller uniform buffer that may only fit 20 bone joints). I ended up switching CPU bone skinning to use real bone joint indexes, which removed the unneeded bone reference lookup (so I had to change the IQM influences).

I ran into this while testing Toy Box renderer for Lilium Voyager (Star Trek Voyager — Elite Force Holomatch). Species 8472 player model had the issue. I think because it has more bones than other player models.

6. C++ Memory Corruption?

After I fixed the Elite Force bone joint issue I immediately ran into another issue that seemed like memory corruption. It wasn’t. A struct was initialized to 0 and then somehow crashed trying to free a non-NULL pointer in that struct. (I had issues trying to watch the memory in GDB because it was a class member and it went out of scope; but evidently the answer for that was watch -l <local variable>.)

I had re-applied a bunch of unfinished code I carry out of tree, and the Qt C++ GUI "Clover Resource Utility" crashed in changes related to loading file archives. The next day I figured out that my FS_FileSystem struct was different sizes in C++ and C, resulting in C reading/writing past the end of the shorter C++ struct.

I thought it was related to field alignment adding padding between fields (which didn't really make sense as a C++ versus C difference). The field order was 32-bit, 64-bit, … so 32-bit padding was inserted to make the 64-bit field 64-bit aligned. I moved a 32-bit field before the 64-bit field; that removed the extra padding in C and made the sizes the same in both C++ and C.

However, looking at the offsets of all fields in the struct, I found that my boolean type was 8-bit in C++ and 32-bit in C. The order 32-bit, 32-bit, 64-bit, boolean, boolean, 64-bit incidentally had padding in C++ that made the struct the same size with 8-bit and 32-bit booleans, but reading/writing the booleans would still be wrong. I spent some time searching for why my boolean enum size would be different in C++. I found the -fno-short-enums GCC compiler flag, but it didn't fix the issue.

I forgot that my boolean type used C++ bool (8-bit) in C++ and enum (32-bit) in C resulting in different data sizes. I knew this could be an issue but I only thought about it in the context of reading/writing files and networking, not C++ vs C ABI… Making C++ use enum like C fixed the issue.
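
The mismatch in miniature (hypothetical names; the point is that the same header gave C and C++ different boolean sizes, shifting every later field offset):

#ifdef __cplusplus
typedef bool myBool;                     // 1 byte in C++ (the broken setup)
#else
typedef enum { myFalse, myTrue } myBool; // typically 4 bytes in C
#endif

typedef struct {
    int       a, b;  // 32-bit, 32-bit
    long long c;     // 64-bit
    myBool    flag1; // 8-bit in C++, 32-bit in C: offsets diverge here
    myBool    flag2;
    long long d;
} Example;

// the fix: give C++ the same 32-bit enum as C so sizeof(Example) and
// every field offset match on both sides of the language boundary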

7. Clover Resource Utility

Viewing MD5 model in Doom 3’s base/pak002.pk4.

A year ago I made a prototype for Clover Resource Utility that could view files in archives and extract files. I finally integrated it. The file browser side-panel is resizable and hide-able and now supports directories in addition to archives. Unlike the prototype, the application supports opening regular files as well as archives.

I added support for file passed on command-line to reference a file in an archive (e.g., “clover-resource-utility pak0.pk3/maps/q3dm1.bsp”) which will load the archive and display the file.

I've done a lot for loading and displaying content, but there was one area I ignored: freeing content. Models and their textures were not entirely freed, so eventually Resource Utility silently failed to load any new models as the GPU vertex buffers were full. This has been fixed by freeing all model GPU data when a new model is loaded in Resource Utility. Materials/textures are now reference counted and freed when no longer used.

I added support for viewing text files.

8. MD5 Model Support

When working on various MD4-based model formats last year, I also added support for loading meshes from Doom 3 MD5 .md5mesh models. I ran into .md5anim files crashing Resource Utility when testing archive support.

I fixed .md5anim crashing the .md5mesh handler (missing MD5 joints/meshes) and added support for loading joints from .md5mesh files, so it's possible to move vertexes using the skeleton. This was more difficult for me than loading .md5anim files because I'm dumb and it took a while to convert the absolute joints in .md5mesh to be relative to the parent joint.

I had to fix a few issues with rendering skeletal model with no animations.

I added loading .md5anim animation files. There isn't a way to connect one to a .md5mesh model in Resource Utility, so it just draws the bones. I use the same function to load .md5mesh and .md5anim even though there isn't much shared behavior. (This will be relevant when I load files based on content instead of file extension, as both start with "MD5Version".) I also added a weird feature allowing a combined .md5mesh + .md5anim file so I could test that animation works correctly.

I haven’t done a full review but several MD5 models have vertexes with 5 to 7 joint influences. So like with MDR, I need to add support for more than 4 influences per-vertex.

9. Toy Box renderer on a typical Windows PC

I tested Toy Box renderer for ioquake3 on Windows for the first time, on a typical Windows PC running Windows XP with an NVIDIA GeForce 6800 graphics card (2004, OpenGL 2.1). I fixed some issues to get it to run and disabled slow code so it runs faster, but it still leaves a lot to be desired. (It does have a way better frame rate than the 2008 Intel integrated graphics I used in 2019-2020.)

Compiling WinMain() into a DLL failed to link. A WinMain() wrapper in my Windows platform code was used to call main(). It's not needed for the renderer DLL, but instead of having the DLL opt out, I made the Toy Box client opt in. I had to rework the CFLAGS handling in the Makefile, as the Toy Box renderer DLL also used it. I changed from setting flags specific to the server, client, and Qt utility (reusing the client flags for the renderer DLL) to flags for CLI executable, GUI executable, GUI DLL, and Qt executable. (GUI here basically means SDL, but the unusable Wii port doesn't use SDL.)

After it compiled, there was a fatal error at start up. The wide character string copy and concatenate functions wcscpy() and wcscat() were changed to the "secure" wcscpy_s() and wcscat_s() due to me compiling with _FORTIFY_SOURCE, but the secure functions don't exist in msvcrt.dll on Windows XP. (I wasn't specifically aware it was changing function calls.) From what I read, linking with the mingwex library might add the missing functions (I haven't tested it), but I just implemented simple wcscpy() and wcscat(). (I'm just copying a couple of constant strings for the Windows UNC prefix "\\?\" or the list files suffix "\*".)

After it ran, I was greeted with the screen flashing black as it tried different OpenGL contexts until it found a working one (the window has to be created and destroyed for each try). After going through that a few times, I was avoiding looking at the screen. I got rid of the flashing by adding SDL_WINDOW_HIDDEN flag to the created windows and calling SDL_ShowWindow() after a supported OpenGL context is found. This also seemed to be faster.

Initially timedemo four ran at 44 FPS while ioquake3's opengl1 renderer ran at 260 FPS. Not good. Testing r_overbrightBits 0 (to disable the post process and speed up rendering) broke rendering and only went up to 50 FPS. Overbright uses a framebuffer to render to a texture and then draws to the default framebuffer with additional brightness applied. The console printed the default framebuffer color, depth, and stencil bits as all 0. I later realized the issue was that the default framebuffer's depth buffer was not cleared. I use the framebuffer depth/stencil bits to determine what needs to be cleared, mainly to shut up debug mode/context GL_KHR_debug warnings about clearing stencil when there is no stencil.

I used the following to get the bits (GL_ARB_framebuffer_object / OpenGL 3.0):

glGetFramebufferAttachmentParameteriv( GL_FRAMEBUFFER, GL_DEPTH, GL_FRAMEBUFFER_ATTACHMENT_DEPTH_SIZE, &depth );

I looked into it a little and found that GL_EXT_framebuffer_object doesn’t have this. The Windows XP computer was turned off; I wasn’t sure whether it had the EXT or ARB framebuffer object extension but the handling for EXT was clearly wrong. I fixed GL_EXT_framebuffer_object to use glGetIntegerv( GL_DEPTH_BITS, &depth ) like already used for OpenGL ES 2.0.

However, the Windows XP computer does have the ARB framebuffer object extension. Forcing it to use glGetIntegerv( GL_DEPTH_BITS, &depth ) worked. That's valid everywhere except the OpenGL Core profile, so I added a fallback: if all color, depth, and stencil bits are zero, use glGetIntegerv(). This fixed the rendering issue caused by not clearing the depth buffer, but didn't really improve performance.

It may be that using glGetFramebufferAttachmentParameteriv() for the default framebuffer requires OpenGL 3.0, but I haven't checked the OpenGL spec for it. I did find people talking about an old NVIDIA driver being broken: not supporting getting the default framebuffer bits using the attachment parameter, and in the Core profile also not supporting getting the bits from glGetIntegerv(). I added a fallback for the fallback: if the bits are still all zero, go ahead and clear depth/stencil anyway. Not clearing the depth/stencil buffers is, after all, just to shut up debug warnings that aren't enabled by default.
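
Put together, the fallback chain looks roughly like this (a sketch; haveARBFramebufferObject is a hypothetical capability flag):

static GLint GetDefaultFramebufferDepthBits( void ) {
    GLint depth = 0;
    if ( haveARBFramebufferObject ) {
        glBindFramebuffer( GL_FRAMEBUFFER, 0 );
        glGetFramebufferAttachmentParameteriv( GL_FRAMEBUFFER, GL_DEPTH,
                GL_FRAMEBUFFER_ATTACHMENT_DEPTH_SIZE, &depth );
    }
    if ( depth == 0 ) // EXT framebuffer object, OpenGL ES 2.0, broken drivers
        glGetIntegerv( GL_DEPTH_BITS, &depth ); // invalid in Core profile
    return depth; // if still 0, clear depth/stencil anyway to be safe
}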

I tested using only OpenGL 1.x features and it still ran timedemo four at 50 FPS. (So it’s not modern features / shaders messing it up.) I later remembered that my curved surface code is slow as hell and only my beast Ryzen 5800x can handle it. So I disabled curved surfaces and it moved up to 60 FPS. Also disabling overbright bits jumps the timedemo result to 170 FPS.

I tested with some things disabled (world, entities, HUD, …) but it's still not super obvious what's slow. Disabling drawing the world changes the timedemo from 170 FPS to 260 FPS (comparable to the opengl1 renderer but, you know, missing the world). Disabling other things was a much smaller change. It seems like I need to improve several things, a lot, to match opengl1 performance.

As I was testing I noticed a couple of rendering issues: vertex-lit world surfaces are displaying the lightmap (but vid_restart fixes it…), and in q3dm4 a random color is getting applied to non-world models based on player location(?). (Oh joy. Is this memory corruption?)

ioquake3 timedemo four on an old Windows XP computer with an NVIDIA GeForce 6800 (2004, OpenGL 2.1). (Higher frames per second is better.)

  • 3.8 milliseconds – 260 fps – opengl1
  • 5.8 milliseconds – 170 fps – Toy Box renderer for ioquake3
  • 50 milliseconds – 20 fps – opengl2

My Toy Box renderer is missing features so it’s doing less work. I disabled poor performing features (curves, overbright). I’m using modern features that improve performance (I tested only using OpenGL 1.x features and it’s slower) but it’s still no match for Quake 3’s opengl1 renderer on this hardware.

10. re: Can Turtle Arena go any faster?

A day or two after the post Can Turtle Arena go any faster?, I decided to actually look at improving performance.

Turtle Arena connected to server (Toy Box renderer for Spearmint, map team1, 63 IQM bots) has gone from 2.5 milliseconds per-frame (400 frames per-second) to 1.5 milliseconds (666 FPS). Compared to 1.4 milliseconds (714 FPS) for 63 MD3 bots—which was unaffected by these changes.

It turns out when testing performance it helps to compile with optimizations enabled instead of a debug build (~2.5 -> ~2.0 msec).

It also helps to review where it's slow and do that less. I added skeleton pose caching to the Toy Box renderer so it only sets up each bot's legs and torso model poses once. Before, the skeleton was set up by each call to get an attachment point (which is done multiple times) and again when the model was added to the scene. (~2.0 -> ~1.5 msec.)

Using IQM models is in the range of 1.3 to 1.7 milliseconds (mainly 1.5) and overlaps with MD3 model’s range of 1.2 to 1.5 milliseconds (mainly 1.4). Watching over the whole Turtle Arena team1 map, IQM seems to be the same speed or faster than MD3 but IQM is slightly slower when following a bot in the midst of sea of bots. IQM and MD3 are closer in performance than I expected to get.

Compiling with optimization enabled improved Spearmint renderers as well.

Turtle Arena (map team1) connected to server:
Client/server running on Ryzen 5800x with NVIDIA GTX 750 Ti.

Before: No optimization enabled – 63 IQM players:
2.5 milliseconds – 400 FPS – Toy Box renderer for Spearmint
12 milliseconds – 80 FPS – opengl2
20 milliseconds – 50 FPS – opengl1

Now: Compiler optimizations enabled + Toy Box skeleton pose caching – 63 IQM players:
1.5 milliseconds – 666 FPS – Toy Box renderer for Spearmint
10 milliseconds – 100 FPS – opengl2
12 milliseconds – 80 FPS – opengl1

The skeleton caching is a hacky solution to support skeletal models with the Quake 3 renderer API. (Quake 3 uses model frame numbers instead of passing skeleton poses around.) The number of skeleton poses to cache must be greater than the number of skeletal models attached to the player so the legs and torso models don't get pushed out of the cache. If all models were skeletal it would need to cache like 8 models or something (legs, torso, head, weapon, weapon barrel, CTF flag, Team Arena persistent powerup, etc.), and it would need to be increased if mods added more.

The Toy Box renderer API (which isn't exposed to Spearmint game logic) supports getting the model's skeletal pose (for checking attachment points and applying dynamic changes) and then specifying that the model be drawn with that pose, so the skeleton only needs to be set up for the object once per-frame.

]]>
Clover’s Toy Box 22.04 https://clover.moe/2022/05/01/clovers-toy-box-22-04/ https://clover.moe/2022/05/01/clovers-toy-box-22-04/#comments Sun, 01 May 2022 23:59:23 +0000 https://clover.moe/?p=1390 Clover’s Toy Box development: A 10th anniversary, Turtle Arena performance, ANGLE, OpenGL 1.1, and Android.

  1. April 13th.
  2. Can Turtle Arena go any faster?
  3. OpenGL on Windows.
  4. OpenGL 1.1 fixed-function rendering.
  5. Android.

1. April 13th.

April 13th was the 10th anniversary of the release of Turtle Arena 0.6 (April 13th 2012).

Turtle Arena 0.6 was planned to be the final release under the title Turtle Arena before replacing the turtles with new original characters (including the character Clover) and renaming the game to EBX (Extraordinary Beat X). I was interested in releasing it as a commercial game. I later picked up the OUYA and GameStick Android micro-consoles to try to market it on.

None of that happened but Turtle Arena 0.6 introduced the Clover red team icon.

In Turtle Arena 0.7 (2017) I changed it back to a red sai icon. The Clover icon can be re-enabled in Turtle Arena 0.7 using the console commands “g_redteam Clover; g_blueteam Shell;”. (Open the console using shift+esc.)

2. Can Turtle Arena go any faster?

I made some minor performance improvements to Clover’s Toy Box renderer which I felt pretty good about. I was interested to see how the performance in Turtle Arena compared to the Clover’s Toy Box 2021 report.

When testing Turtle Arena (running on the Spearmint engine using Toy Box renderer for Spearmint, map team1 with 63 IQM bots) the frame times were inconsistent. It was hitting lows of what I reported (100 FPS) but also much higher at some points (200 FPS) while just sitting with the camera still and viewing the whole level. I checked the 2021 version of “Toy Box renderer for Spearmint” and it behaves the same. So in essence, none of my recent improvements are really making a difference in this situation and I was conservative at reporting the performance.

Looking at the CPU usage (using the Linux perf profiler), most of the time is spent in the server functions SV_AreaEntities() and SV_AddEntitiesVisibleFromPoint(). Pausing the 63 bot players (using the bot_pause 1 cvar) improves the frame time a lot and solves the very inconsistent frame times. (The frame time spiked every time the server ran the game logic at 20 Hz.) I think SV_AreaEntities() is called by melee attacking. Part of the problem (unconfirmed) may be that the level is one giant room, so there is only one "area" and it has to check all objects to see if they're in the attack box.

It seems like I'm kind of at the point where, in order to improve the frame rate, I would need to optimize the server and game logic rather than the renderer. Though to take an easier route for testing, I ran a Turtle Arena dedicated server with 63 bots and then connected to it. That way the client can just render without running the server-side game logic. (Run on GNU/Linux with an AMD Ryzen 5800x and NVIDIA GTX 750 Ti.) Higher frames per-second (FPS) is better.

Turtle Arena (map team1)

Connected to server – 63 MD3 players:
1.4 milliseconds – 714 FPS – Toy Box renderer for Spearmint
9 milliseconds – 111 FPS – opengl2
17 milliseconds – 59 FPS – opengl1

Connected to server – 63 IQM players:
2.5 milliseconds – 400 FPS – Toy Box renderer for Spearmint
12 milliseconds – 80 FPS – opengl2
20 milliseconds – 50 FPS – opengl1

The 2021 report – when running the server and game logic in the same process:

Single player – 63 MD3 players:
8 milliseconds – 125 FPS – Toy Box renderer for Spearmint
25 milliseconds – 40 FPS – opengl2
50 milliseconds – 20 FPS – opengl1

Single player – 63 IQM players:
10 milliseconds – 100 FPS – Toy Box renderer for Spearmint
33 milliseconds – 30 FPS – opengl2
66 milliseconds – 15 FPS – opengl1

MD3 format has pre-calculated vertex positions for all frames and interpolates vertexes (in a straight line) between model frames.

IQM is a skeletal animated format which requires more computation (setting up bone transforms and matrix multiply of bone influences on each vertex) but it has better visual result (vertexes truly rotate when interpolated instead of a straight line between model frames), less memory usage, and allows for dynamic joint rotation.

Based on this and other testing it appears that the average frame time for the server logic is 3.5 to 7 milliseconds (explaining the inconsistent frame rate), average for the client logic is ~1 millisecond, and average for Toy Box renderer is ~0.4 milliseconds (MD3) or ~1.5 milliseconds (IQM).

Single player team1 map with 63 MD3 bots is 8 milliseconds per-frame (125 FPS). The renderer time is 0.4 milliseconds so I assume with no rendering it would only improve to 7.6 milliseconds per-frame (131 FPS). Only a 6 FPS increase if no rendering? That’s pretty fast rendering. Improving IQM would be nice though.

The dedicated server is running on the same computer. I suppose this would be a reason to run the server-side logic simultaneously in a separate thread from the client. I haven't worked with multi-threading, but for Turtle Arena (Spearmint) this is probably difficult due to a lot of shared infrastructure between the client and server. The Toy Box client/server is functional but does nearly nothing, and I tried to make more things self-contained (like the virtual filesystem), so it might be easier to try to make it run the server in a separate thread.

The strange thing is that the time delta isn’t consistent in all the renderers. When running the server and game logic; Toy Box renderer takes ~7 milliseconds longer, opengl2 takes ~17 milliseconds longer, opengl1 takes ~35 milliseconds longer.

The results for the opengl1 and opengl2 renderers are based on the Quake 3’s FPS displayed on the HUD which is rounded to whole milliseconds, averaged over a short period, and not very accurate. (For Toy Box renderer’s “714 FPS” the Quake 3 HUD was flashing between 666, 800, and 1000 FPS.) Toy Box renderer measures time in microseconds and averages over a 2 second period (with a graph so I can see inconsistent frame times).

In the future, I should measure all renderers with microsecond precision to get more accurate frames per-second / frame times. It's kind of a pain to integrate displaying the information.

tl;dr (too long; didn’t read)

  • Turtle Arena is slow with 63 bots unrelated to rendering.
  • Clover’s Toy Box renderer is fast.
  • I could use better performance measuring for other renderers.
  • I wrote a long post that performance hasn’t visibly changed since the last post.

3. OpenGL on Windows.

The OpenGL API (Open Graphics Library application programming interface) allows software to use the graphics card or integrated graphics to improve rendering performance. OpenGL is an open standard that runs on multiple platforms. Microsoft has their own Direct3D library, which is the native graphics API on Windows.

The OpenGL 1.1 API (1997) hasn’t been relevant since like the year 2001. Applications frequently require OpenGL 3.0 (2008) which has a more modern feature set. Even GZDoom, an enhanced version of a 1993 MS-DOS video game, requires OpenGL 3.0. Graphics cards that only offers OpenGL 1.1 are unlikely to be used in the present day.

However, OpenGL support on Windows is kind of terrible. The OpenGL driver Microsoft provides only implements OpenGL 1.1. (To their credit for backward compatibility, it still exists.) The graphics hardware developers (NVIDIA, AMD, Intel) can make an OpenGL driver available that supports newer versions (and better accelerates OpenGL 1.1).

However, for Windows in a virtual machine or for many computers with Intel integrated graphics running Windows 10, only OpenGL 1.1 is available. There are some OpenGL software renderers, such as LLVMpipe, and Microsoft's OpenCL and OpenGL compatibility pack for Windows 10—implemented using Direct3D 12—but I failed to get either to work in a Windows 10 virtual machine. There may be unofficial workarounds for physical Windows 10 computers with Intel graphics, as there were drivers for older Windows versions.

I've been unable to run my software in some situations due to only OpenGL 1.1 being available: Spearmint 1.0.3 (requires OpenGL 1.2) and Maverick Model 3D 1.3.13 (Qt 5's QOpenGLWidget crashes the application on OpenGL 1.1).

Adding OpenGL 1.1 support to Toy Box renderer would not be the best way to solve Windows’ poor OpenGL support. Libraries such as ANGLE can convert OpenGL ES 2.0 (OpenGL for Embedded Systems 2.0) (2007) to Windows’ well supported native graphics APIs Direct3D 9 (2002) and Direct3D 11 (2008). Google Chrome and Mozilla Firefox use ANGLE for WebGL (OpenGL in a Web-browser) on Windows.

OpenGL ES 2.0 has more modern features than OpenGL 1.1 and is already supported by Toy Box renderer. I was able to add a check if it’s OpenGL 1.x to prefer OpenGL ES 2.0 (ANGLE) instead.

When using SDL 2.0.x for window/OpenGL context creation, you just add the ANGLE libEGL.dll and libGLESv2.dll into the application directory ("just", but there are no official downloads; build from source or do like I did and find a random build someone made using GitHub Actions), create a dummy window with the SDL_WINDOW_HIDDEN flag, check the OpenGL version, destroy the window, and then, if it was OpenGL 1.x, tell SDL to use OpenGL ES 2, create your game window, and get the OpenGL functions using SDL_GL_GetProcAddress(). Done. (Implementing an OpenGL ES 2.0 compatible renderer is an exercise for the reader.)
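
A sketch of that probe with SDL2 (error handling and Windows calling-convention decoration omitted; the window/context calls are standard SDL2):

#include <SDL.h>

#define PROBE_GL_VERSION 0x1F02 // GL_VERSION enum, to avoid needing a GL header

static void PreferANGLEOnLegacyGL( void ) {
    SDL_Window *probe = SDL_CreateWindow( "probe", 0, 0, 64, 64,
                                          SDL_WINDOW_OPENGL | SDL_WINDOW_HIDDEN );
    SDL_GLContext ctx = SDL_GL_CreateContext( probe );

    typedef const unsigned char *( *GetStringFn )( unsigned int );
    GetStringFn getString = (GetStringFn)SDL_GL_GetProcAddress( "glGetString" );
    const char *version = (const char *)getString( PROBE_GL_VERSION );
    int legacy = version && version[0] == '1'; // only OpenGL 1.x available

    SDL_GL_DeleteContext( ctx );
    SDL_DestroyWindow( probe );

    if ( legacy ) {
        // ANGLE's libEGL.dll/libGLESv2.dll service these attributes on Windows
        SDL_GL_SetAttribute( SDL_GL_CONTEXT_PROFILE_MASK, SDL_GL_CONTEXT_PROFILE_ES );
        SDL_GL_SetAttribute( SDL_GL_CONTEXT_MAJOR_VERSION, 2 );
        SDL_GL_SetAttribute( SDL_GL_CONTEXT_MINOR_VERSION, 0 );
        // create the real game window after this and fetch GL entry points
        // with SDL_GL_GetProcAddress() as usual
    }
}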

Now Toy Box "just works" in a Windows 10 virtual machine; it even exposes OpenGL ES 3.0 and supports almost all the OpenGL extensions that I optionally use (except GL_EXT_buffer_storage, one of the five extensions needed for persistent mapped buffers). The virtual machine ran Toy Box with Turtle Arena's Subway map and turtle player model at 40 frames per-second at 720p. It's not great (compared to 1500 FPS on GNU/Linux natively) but a whole lot better than failing to run at all.

I kind of wonder if I should have OpenGL 2.x on Windows default to ANGLE as well. It likely offers more features. In some ways, it's easy to understand why most "PC games" (i.e., Windows games) target Direct3D—Windows' native graphics API—as opposed to OpenGL. It will be there and it will work.

4. OpenGL 1.1 fixed-function rendering.

OpenGL 1.1 used fixed-function rendering where there is a bunch of options to change to control rendering (such as the color or being affected by a dynamic light with a set origin and color). Modern OpenGL uses OpenGL Shader Language (GLSL) shaders that give free form control to write instructions in code which opens up many more possibilities for effects.

When I started developing Toy Box I targeted OpenGL ES 2.0 with GLSL shaders and put most additional features behind a check for the OpenGL version and/or extension that added the functionality. I kept telling myself not to add OpenGL 1.x (fixed-function) support because it’s a waste of time. Naturally, before I tried to get Toy Box running on ANGLE, I went ahead and got it running on OpenGL 1.1. It ran at 7 frames per-second in a Windows 10 virtual machine. That’s worse performance and fewer available features than using ANGLE.

The four missing pieces for running on OpenGL 1.1 were fixed-function rendering (instead of GLSL shaders, introduced in OpenGL 2.0), client-side vertex arrays (instead of vertex buffer objects, OpenGL 1.5), mipmap generation (OpenGL 1.4), and rendering multiple layers for a material as separate passes (instead of multi-texture, OpenGL 1.3). A sketch of the drawing side follows below.
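For reference, this is roughly what the OpenGL 1.1 drawing path looks like: client-side vertex arrays read straight from CPU memory, with fixed-function texturing. This is a generic sketch with made-up names, not Toy Box’s actual code.

#include <GL/gl.h>

typedef struct {
    float xyz[3];
    float st[2];
    unsigned char rgba[4];
} DrawVertex;

/* Draw indexed triangles using only OpenGL 1.1 features. */
void DrawTrianglesGL11( const DrawVertex *verts, const GLushort *indexes,
                        int numIndexes, GLuint texture )
{
    glEnable( GL_TEXTURE_2D );
    glBindTexture( GL_TEXTURE_2D, texture );

    glEnableClientState( GL_VERTEX_ARRAY );
    glEnableClientState( GL_TEXTURE_COORD_ARRAY );
    glEnableClientState( GL_COLOR_ARRAY );

    /* The pointers reference client memory directly; there are no
       buffer objects in OpenGL 1.1. */
    glVertexPointer( 3, GL_FLOAT, sizeof( DrawVertex ), verts->xyz );
    glTexCoordPointer( 2, GL_FLOAT, sizeof( DrawVertex ), verts->st );
    glColorPointer( 4, GL_UNSIGNED_BYTE, sizeof( DrawVertex ), verts->rgba );

    glDrawElements( GL_TRIANGLES, numIndexes, GL_UNSIGNED_SHORT, indexes );

    glDisableClientState( GL_COLOR_ARRAY );
    glDisableClientState( GL_TEXTURE_COORD_ARRAY );
    glDisableClientState( GL_VERTEX_ARRAY );
}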

Getting it working wasn’t too difficult, albeit with terrible mipmap generation (scaled-down textures to use further away), and I haven’t added an additional draw pass for Quake 3 BSP lightmaps (as an alternative to multi-texture) because the code refactoring is a pain. BSP currently just uses vertex lighting instead of lightmaps.

The main problem was how to implement a custom “fixed-function vertex shader” (akin to a GLSL vertex shader) to handle dynamic vertex effects like model animation, lighting, texture scrolling/rotation, etc.

I had fixed-function rendering check whether the geometry/material needs vertex shader effects when it’s about to be drawn. If so, I set the result up in a temporary vertex array and drew it as a client-side vertex array. (This follows the general idea of Quake 3’s OpenGL 1.1 renderer.) I also had fixed-function rendering disable all vertex buffer object support, as the source data before applying the fixed-function vertex shader was stored in the vertex buffer and would be needlessly uploaded. This makes it unusable in some modern situations, but the point is supporting legacy fixed-function rendering.

It seemed like I was adding a large maintenance burden; the OpenGL 1.1 API support isn’t much, but the whole “fixed-function vertex shader” that needs to be kept in sync with the GLSL shaders is. Though part of my motivation for adding fixed-function rendering is that I want to run my 3D software on the Nintendo Wii, and its “GX” API is similar to OpenGL 1.1.

While I was working on Toy Box’s OpenGL 1.1 fixed-function rendering, it occurred to me that Quake 3’s “fixed-function vertex shader” for dynamic vertex effects is really just a “CPU vertex shader” and could be streamed to the GPU like any other dynamic vertexes (2D menu, HUD, sprites, etc). It should have been obvious, but there is sort of an ideology in OpenGL tutorials/discussion that passing as little data to the GPU as possible and putting everything in GLSL shaders is required for modern rendering and will automatically give you great performance.
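In other words, the output of the CPU vertex shader can be streamed into a dynamic vertex buffer like everything else. A minimal sketch of the idea (illustrative names; assumes the OpenGL 1.5 buffer object entry points are available, via a loader where needed):

#define GL_GLEXT_PROTOTYPES 1 /* Linux/Mesa; use a GL loader on Windows */
#include <GL/gl.h>
#include <GL/glext.h>
#include <stddef.h>

/* Upload CPU vertex shader output (animated/deformed vertexes) the same
   way as other per-frame dynamic geometry (2D menu, HUD, sprites). */
void StreamDynamicVertexes( GLuint dynamicVbo, const void *data, size_t bytes )
{
    glBindBuffer( GL_ARRAY_BUFFER, dynamicVbo );
    /* Orphan the old storage so the driver doesn't stall waiting for
       the previous frame to finish reading it. */
    glBufferData( GL_ARRAY_BUFFER, bytes, NULL, GL_STREAM_DRAW );
    glBufferSubData( GL_ARRAY_BUFFER, 0, bytes, data );
}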

This is kind of the first time I feel like I have an interesting solution for improving the Spearmint renderer situation: implement fixed-function features (alpha test, clip plane, etc) using a basic GLSL uber shader with permutations so only the needed features are enabled, and use Quake 3’s CPU vertex shader to set the vertex positions, texture coords, and colors. GLSL shader support could be added for common cases to improve performance (light-mapped geometry, model lighting, etc), but other features could work the same as they do now in the opengl1 renderer (fewer regressions and no slower than it already is).
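A sketch of the uber shader idea (hypothetical names; the GLSL is illustrative, not Spearmint code): one shader source with optional features, compiled into permutations by prepending #defines, so a variant only pays for what it enables.

#define GL_GLEXT_PROTOTYPES 1 /* Linux/Mesa; use a GL loader on Windows */
#include <GL/gl.h>
#include <GL/glext.h>

static const char *uberFragmentSource =
    "uniform sampler2D u_texture;\n"
    "varying vec2 v_texCoord;\n"
    "varying vec4 v_color;\n"
    "#ifdef USE_ALPHA_TEST\n"
    "uniform float u_alphaRef;\n"
    "#endif\n"
    "void main() {\n"
    "    vec4 color = texture2D( u_texture, v_texCoord ) * v_color;\n"
    "#ifdef USE_ALPHA_TEST\n"
    "    if ( color.a < u_alphaRef ) { discard; }\n"
    "#endif\n"
    "    gl_FragColor = color;\n"
    "}\n";

/* Compile one permutation; 'defines' is e.g. "#define USE_ALPHA_TEST\n"
   or an empty string. */
GLuint CompileShaderVariant( GLenum type, const char *defines, const char *source )
{
    const char *strings[2];
    GLuint shader = glCreateShader( type );

    strings[0] = defines;
    strings[1] = source;
    glShaderSource( shader, 2, strings, NULL );
    glCompileShader( shader );
    return shader;
}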

Combined with persistent mapped buffers—falling back to vertex buffer objects—it could be used in all modern situations (OpenGL 3.2 Core Profile (macOS), OpenGL ES 2, WebGL) and open the door to adding features that require GLSL shaders such as GPU skeletal animation. I wouldn’t say I have a full vision of how Spearmint opengl1 itself could be modernized yet.

For instance, I buffer all vertex data for the frame in Toy Box before issuing any draw calls, for best performance. However, I need to figure out how to only generate vertex data when the render entity (model, sprite, etc) is visible in a 3D scene (or mirror/portal), and in some cases to generate separate vertexes for multiple scenes (like sprites). Though this mainly seems like changing the architecture of when vertexes are generated and storing references for drawing later.

Now I’m moving Toy Box’s “CPU vertex shader” support from only being usable by OpenGL 1.x (low-level, non-VBO compatible) to being usable by modern OpenGL. My intention is to implement features in the “CPU vertex shader” before GLSL vertex shaders, so I can focus on the feature itself and only later (if needed) work out how to properly implement/optimize it for GLSL and pass the data into GLSL shaders. In particular I plan to add support for Quake 3 materials (.shader files), which expose a lot of flexibility that is rarely used, if at all. Accuracy for weird edge cases is more important than speed.

And that’s the story of how “OpenGL 1.1 fixed-function rendering”, a perceived pointless maintenance burden that no one will use, became a core feature that no one will use. The CPU vertex shader will be used but OpenGL 1.1 is unlikely to be used. In particular due to preferring ANGLE over OpenGL 1.1 on Windows.

As a side note, I also added OpenGL ES 1.1 support as it’s very similar to OpenGL 1.x.

5. Android.

Previously

Back in 2012 I wanted to rename Turtle Arena, replace the characters, and release it as a commercial game. The PC gaming space seemed crowded, and game controller support (for local multiplayer) did not seem easily accessible or well supported. I wanted to release it for the announced Android micro-consoles, such as the OUYA and GameStick, which utilize game controllers and support local multiplayer. I also got a Motorola Droid phone (Android 2.3.3) because there was an ioquake3 port for it. The new character models and expanded content for a commercial release never happened.

I tried to port Spearmint (the engine Turtle Arena runs on) to Android in 2013 using SDL 2. I got Spearmint to build for Android, but running it—from what I remember—just showed a black screen on the Motorola Droid and immediately exited on the OUYA. I never solved it. I couldn’t get text output from the game or render anything. The renderer port from OpenGL 1 to OpenGL ES 1 was also untested. The SDL OpenGL ES test program worked, so it was something wrong with my Spearmint port.

I have OpenGL ES 1.1 and 2.0+ working in Clover’s Toy Box but hadn’t revisited Android.

Present day

I haven’t felt like working on the CPU vertex shader support for a while (which needs to be done as a new core feature for the “OpenGL 1.1 fixed-function renderer”…). One day I decided, hey, why don’t I try to port Clover’s Toy Box to Android?

I used the SDL 2 Android README file as a guide. I installed Android Studio (an IDE for developing Android apps). In 2013, I only had command-line tools. In Android Studio I used the SDK manager and installed the latest Android SDK (software development kit, for Java), NDK (native development kit, for C compatibility), and emulator. I used SDL’s build-scripts/androidbuild.sh script to set up SDL’s OpenGL ES test for building on Android.

./androidbuild.sh org.libsdl.testgles ../test/testgles.c

I opened the SDL project in Android Studio (the “SDL/build/org.libsdl.testgles/” directory). It went through some project processing and then sat at “configuring” with no additional information for several minutes. I gave up waiting. This was off to a great start. I tried the command-line (“cd SDL/build/org.libsdl.testgles/ && ./gradlew installDebug”) to see if it would give any more information. It did not. I tried again in Android Studio. It was always stuck at “configuring”.

At some point as I was messing with it I saw a message like “NDK at ./ndk/21.xx did not have a source.properties file”. Installing the NDK had completed without error, but I decided to look at the NDK files. It turns out I had installed the latest NDK (24), while the Gradle build configuration in the SDL Android project defaulted to NDK 21. The “configuring” step was silently downloading NDK 21 in the background.

I deleted the NDK 21 directory and used the Android Studio’s SDK manager to install NDK 21. The download speed was terrible so I disabled and re-enabled Wi-Fi. The download speed increased. After it finished, it was able to “configure” and I was able to build the SDL OpenGL ES test app.

For whatever reason, instead of trying to run it in the Android emulator, I tried to run it on my moto e (2019) phone. After some slightly confusing documentation and difficulty finding options in Android Settings, I managed to enable USB debugging, which is required for installing/launching the application from Android Studio.

I was able to launch and run the SDL OpenGL ES test app on my moto e (2019) phone. In my brief time messing with it, I found that it starts out running as fast as possible, and if you switch to another app and back it runs slower (I later confirmed it’s 60 FPS v-sync).

Later that Day

Equipped with my new knowledge for building and running an SDL application on Android with wonky v-sync, I set out to build Clover’s Toy Box for Android.

Android NDK uses CMake or custom (GNU Make) Makefiles for the application’s build system. (I prefer to use GNU Make.) The NDK Makefile system has you define “modules” that are each a static or dynamic library. The SDL library module lists its source files and header path, declares that it needs to link to the Android NDK’s cpufeatures library, and that’s it. It’s really quite simple.

The application to run (SDL OpenGL ES test) is done the same way: a module defining a “libmain.so”. The SDL Java glue loads libmain.so to call the main function (later I found that it actually calls SDL_main). I needed to build Toy Box into a “libmain.so” library.

In the Toy Box Makefile, I list all the source files (as opposed to the generated object files like ioquake3/Spearmint) and set up libraries similar to NDK’s modules (with dependencies, CFLAGS, exported CFLAGS for the executable/library using it, etc). It seemed like a fairly good match for setting it up as NDK modules.

I started out trying to build it all as one module (missing library-private CFLAGS and per-file CFLAGS for enabling particular SIMD instructions…). I ran into issues with missing fseeko() and ftello(). On some architectures, such as 32-bit ARM and x86, the file read/write offset is a signed 32-bit value, which limits files to 2GB. There is a compile flag, _LARGEFILE_SOURCE, that enables using 64-bit offsets.

However, support for 64-bit offsets was not introduced until Android 7.0 (API level 24) and I am targeting Android 4.1 (API level 16)—the oldest that recent Android NDK and SDL releases support. Instead of ignoring _LARGEFILE_SOURCE and using 32-bit offsets for fseeko() and ftello(), the NDK headers disable the functions entirely, resulting in compiler errors. So okay, I disabled _LARGEFILE_SOURCE in my compile flags.

minizip 1.1 was still failing to build. It turns out the minizip source code was also adding _LARGEFILE_SOURCE. Disabling that left it asking for fseek64() / ftell64(), which are also not present. I found that I could add USE_FILE32API to the compile options to use fseek() instead of fseek64(). I changed minizip’s ioapi.h so that USE_FILE32API would disable adding _LARGEFILE_SOURCE.

libpng leaves implementing a function to check whether the ARM NEON (SIMD) extension is supported as an exercise for the user. Mine failed to compile because the getauxval() function isn’t supported by Android 4.1 (API level 16). In the past I used SDL’s SDL_HasNEON(), but I stopped so I could use the same libpng build for SDL (the game) and Qt (model viewer) applications. I looked at how SDL handles it… which is complicated and also uses the NDK cpufeatures library (Apache-2.0 license, which isn’t mentioned in the SDL Android README). I went ahead and re-enabled using SDL_HasNEON() on Android, since I don’t plan to build CLI or Qt applications for Android.

I ran into strange FreeType build errors, so I added it as a separate NDK “static library” module that used the correct FreeType private CFLAGS. Alright, the build continues. opusfile fails to build. Missing fseeko() / ftello() again. Like, okay, I’m over this, so I just added -Dfseeko=fseek -Dftello=ftell to the compile flags so it calls the available functions. I ran into more compile issues and just started disabling libraries. Eventually I ran into a missing link to SDL and, after adding that, it built libmain.so successfully.

Did I try to test it? Nah. Instead I decided to rip out the NDK module stuff and just have my Makefile set the compiler path (CC, CXX), compile flags (CFLAGS), and linker flags (LDFLAGS) that the NDK was adding. Instead of finding where they were in the NDK Makefiles, I just built the SDL library from the command-line with “./SDL/build-scripts/androidbuildlib.sh V=1” for verbose output and copied the information (for all four CPU architectures it’s built for). After that I was able to get Toy Box building with all libraries enabled. (And yes, this isn’t ideal, given that different NDK versions may add different compile flags.)

New problem. How do I add the libmain.so I built (for arm, arm64, x86, x86_64) into the Android package (.apk)? In the Android NDK documentation I found a module type for a pre-built shared library. Perfect. Except I could not get it to work. I was stuck here for a while.

I eventually found someone on Stack Overflow saying it doesn’t work and to use jniLibs in the Gradle project to set a directory that contains the libraries. SDL already added it (add libraries in the “app/jni/libs/<ARCH>/” directory). I added the libraries (as “armeabi/libmain.so”, “arm64-v8a/libmain.so”, etc) and they were added to the APK. Great!

I tried to run it on my moto e (2019) phone. It showed a message box whose point was “SDL_main not found”. C programs have a main() function; SDL uses a define to rename main to SDL_main. However, I don’t use this. Windows needs special handling (the entry point is WinMain(), which needs to call our main() and convert arguments from UTF-16 to UTF-8), and I do it myself so that it works for the dedicated server without SDL. SDL_main does nothing on Linux and macOS, so I haven’t had a reason to use the SDL_main system and link to the SDLmain library. I changed Toy Box’s main to be SDL_main on Android. However, it still errored about a missing SDL_main. I was stuck here for a while.

After some searching I found the cause (from someone in an SDL bug report saying this solution didn’t work for them). In my compile flags, I disable exporting functions and symbols by default (“-fvisibility=hidden”). So I had to export the SDL_main function by adding “__attribute__((visibility(“default”)))” before “int main( int argc, char **argv )”. I was then able to run Toy Box on Android (without assets): the title screen with a white/gray square cursor, a rotating white/gray quad, and red, green, and blue lines for the model axes.
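For anyone hitting the same thing, the whole fix amounts to this (a sketch; SDL 2’s SDL_main.h renames main to SDL_main on platforms that need it):

#include <SDL_main.h> /* #defines main to SDL_main on Android */

/* With -fvisibility=hidden in the compile flags, SDL's Java glue cannot
   find SDL_main via dlsym() unless the symbol is explicitly exported. */
__attribute__((visibility("default")))
int main( int argc, char **argv )
{
    /* ... engine entry point ... */
    return 0;
}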

It had been about 12 hours since I decided to try to get it working on Android. (I did some other things in the middle though.)

Several days later

Drawing a white/gray texture and red, green, and blue model axis lines is great, but I really need fonts. So I decided to add support for loading data files from the Android application package (.apk). The SDL Android README tells you which directory to add your files to, and I was able to get files into the APK easily. Now, how to read them?

SDL has functions for getting the Android internal and external storage paths. I added those to my virtual filesystem (VFS) search paths. Except it didn’t work. On the Android device, I manually browsed to the internal path using a Files app and it was empty. I was expecting the assets that were added to the apk.

Referring back to the SDL Android README, it says “you can load [apk assets] using the standard functions in SDL_rwops.h”. SDL_RWops (read/write operations) is something I’ve never paid much attention to; I hadn’t heard of a reason to use it and I also don’t want file access to require SDL. Digging into the SDL source code (SDL_android.c), I found that APK assets don’t have a directory name and require reading through the Android-specific AAssetManager instead of fopen() / open().

Fortunately I already have an abstraction layer over file access (instead of it being spread over the whole code base and using libraries built-in file support). I did this so that Windows can use the Win32 API and everything else can use POSIX file API (fopen(), fread(), fseeko(), fclose(), etc). SDL_RWops does this as well but I have some extra options.

I made it so that Android uses SDL_RWops when opening a file in a fake “/apk-assets/” directory (so I can sanely handle the APK asset paths) and passes the relative path to SDL_RWFromFile() so it can be read from the APK. For other paths it uses my file opening code (with options to not overwrite an existing file and to make the file only readable by this user account) and passes the FILE handle to SDL_RWFromFP() to allow SDL_RWops to use the file I opened. On Android, all file reading, seeking, etc. goes through SDL_RWops so I don’t have to deal with a handle being either a FILE or an SDL_RWops.
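A minimal sketch of that routing (illustrative names and simplified options, not Toy Box’s actual VFS code):

#include <SDL.h>
#include <stdio.h>
#include <string.h>

/* Route the fake "/apk-assets/" prefix to SDL_RWFromFile(), which reads
   from the APK via AAssetManager on Android; open everything else
   ourselves and wrap the FILE handle in an SDL_RWops. */
SDL_RWops *VFS_OpenRead( const char *path )
{
    static const char prefix[] = "/apk-assets/";

    if ( strncmp( path, prefix, sizeof( prefix ) - 1 ) == 0 ) {
        /* SDL expects a path relative to the APK assets directory. */
        return SDL_RWFromFile( path + sizeof( prefix ) - 1, "rb" );
    } else {
        FILE *fp = fopen( path, "rb" ); /* the real code has extra open options */

        if ( !fp )
            return NULL;
        return SDL_RWFromFP( fp, SDL_TRUE ); /* SDL_TRUE: fclose() on close */
    }
}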

I didn’t expect file access to be complicated, though admittedly SDL_RWops does the hard part. However, SDL doesn’t have a way to list files in directories, so I’ll need to implement that myself in order to find archives to load. I haven’t tried to add a pk3 (zip) archive into the APK yet either. It sounds like something needs to be done to keep the archive uncompressed in the APK.

After adding network permissions to the AndroidManifest.xml and fixing a general issue in my game protocol TCP parsing, I got Clover’s Toy Box to run on Android—connecting to network servers and single player.

Clover’s Toy Box — moto e (2019, Android 10)
Black bar on right side due to camera notch.

It also runs on the OUYA (2012, Android 4.1). It runs at only 30 FPS with basically nothing happening, so it’s going to be hard to make it usable, let alone with four-player splitscreen.

There is a lot of work needed to actually make it work well on Android. (Ignoring the fact that the game logic barely exists. All it can do is move the player and collide with the map, and it doesn’t do those well.) There are things pointed out in the SDL Android README (don’t use exit(), handle app suspend/resume, etc), touch screens would need on-screen controls, and the Android keyboard opens at startup due to calling SDL_StartTextInput() at startup instead of when a text field is selected.

I haven’t focused on making Clover’s Toy Box user friendly on desktop computers yet either. It requires using a keyboard to open the console in order to start or join a game, and most “options” are things I change in the code and recompile. For testing on Android, I just hard-code joining a server and recompile it… (The Toy Box server is just running on my desktop computer.)

Clover’s Toy Box now runs on GNU/Linux (desktop, Raspberry Pi, PinePhone), macOS, Windows, web browsers, and Android.

6. Summary

In this article:

  1. I remembered the 10 year anniversary of Turtle Arena 0.6.
  2. I took another look at “Toy Box renderer for Spearmint” in Turtle Arena.
  3. I explained how to use ANGLE on Windows with SDL 2 in order to use OpenGL ES 2.0 instead of OpenGL 1.1.
  4. I went over porting Clover’s Toy Box renderer to OpenGL 1.1 and that a CPU vertex shader can be used with modern OpenGL.
  5. I detailed the initial port of Clover’s Toy Box to Android.

]]>
https://clover.moe/2022/05/01/clovers-toy-box-22-04/feed/ 1
Clover’s Toy Box 2021 https://clover.moe/2022/01/14/clovers-toy-box-2021/ https://clover.moe/2022/01/14/clovers-toy-box-2021/#comments Sat, 15 Jan 2022 00:39:32 +0000 https://clover.moe/?p=1348 Another year of working on rendering and game data formats in Clover’s Toy Box.

There were 191 code revisions adding 16,000 new lines of code (including comment and blank lines, and a 2,048-line vertex normal table). That’s about 20% of Clover’s Toy Box’s ~83,000 lines of code (not including third party code). There are an additional 9,000 lines of uncommitted code (unfinished features and debug code) going back to at least 2018.

Quake 3 BSP Rendering

I continued working on Quake 3 BSP support, building on what I wrote last year: I polished some things up and added a proper BSP rendering API. I was over-complicating it, and the API is still missing features. BSP files now have a model handle that can be used with the generic AddModelToScene() and the BSP-specific AddWorldModelToScene(), which has surface culling based on the view point.

I added initial support for BSP curved bézier patches, but there is still more work to do. I fixed Wolfenstein: Enemy Territory external lightmaps to apply r_mapOverBrightBits. I added support for Wolf:ET instanced foliage meshes. In testing on Wolf:ET’s radar map, there was better performance when uploading all foliage vertexes and triangle indexes each frame instead of drawing each foliage instance as a separate draw call with pre-uploaded vertexes/triangle indexes. (Maybe it would benefit from actual instanced rendering, but that wasn’t supported by the 2008 Intel graphics I was using at the time. I also don’t have Wolf:ET foliage distance culling.)

“Toy Box renderer for Spearmint” compared to the Spearmint opengl1 renderer—with some features disabled (Quake 3 shader files, world lighting on models, dynamic lights, wallmarks, sky, TrueType fonts)—is indistinguishable aside from curved patches. Though a screenshot comparison shows minor differences over textures everywhere; maybe texture filtering differences(?), the post-process framebuffer object, or something.

Texture coordinates are incorrect on curved bézier patches when the control points are not equal distances apart. (This is the case in Turtle Arena’s subway map for no particular reason.) I think the issue is that the texture coords at a point on the curve need to use the fraction between the control points instead of applying the same quadratic formula as the 3D position.
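To illustrate the suspected fix (this is the hypothesis from the paragraph above written out, not verified code): evaluate positions with the quadratic Bernstein basis as usual, but derive the texcoords from the fraction along the span instead.

/* Evaluate one quadratic bézier span. The position uses the Bernstein
   basis; the suspected texcoord fix interpolates linearly by the
   fraction along the control points instead of reusing the quadratic
   formula. */
void EvalPatchSpan( const float pos[3][3], const float st[3][2], float t,
                    float outPos[3], float outSt[2] )
{
    float b0 = ( 1.0f - t ) * ( 1.0f - t );
    float b1 = 2.0f * t * ( 1.0f - t );
    float b2 = t * t;
    int i;

    for ( i = 0; i < 3; i++ ) {
        outPos[i] = b0 * pos[0][i] + b1 * pos[1][i] + b2 * pos[2][i];
    }

    /* Hypothesis: linear fraction between the end control points. */
    for ( i = 0; i < 2; i++ ) {
        outSt[i] = ( 1.0f - t ) * st[0][i] + t * st[2][i];
    }
}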

Other Rendering

I added support for uploading dynamic textures. It’s used by “Toy Box renderer for Spearmint” to display cinematic videos, and by my Doom fork (based on the original Doom source release) to upload the software-rendered Doom frame for use with OpenGL.

I added support for drawing debug lines for model vertex normals (lighting direction). I’m still missing it for BSP drawing and other non-model primitives though.

I added a fallback for decoding DXTn-compressed images if the OpenGL driver does not support them (mobile hardware usually doesn’t support DXTn).

I did some OpenGL rendering optimization. I combined the GLSL shader text with defines for the different variations (such as animation type) so it’s easier to modify. I used shader variations to disable things when not needed (clip plane, colorizing, alpha test, etc) to improve performance; this was a known issue, I just finally got around to it. I changed shader uniform values to only be updated when they change; it had way more impact on performance than I expected.
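The uniform change is conceptually simple; a sketch of it (illustrative names; assumes the OpenGL 2.0 entry points are available, via a loader where needed):

#define GL_GLEXT_PROTOTYPES 1 /* Linux/Mesa; use a GL loader on Windows */
#include <GL/gl.h>
#include <GL/glext.h>
#include <string.h>

typedef struct {
    GLint location;
    float value[4];
    int valid; /* has 'value' been set at least once? */
} CachedUniform4;

/* Only call into the driver when the uniform value actually changed. */
void SetUniform4f( CachedUniform4 *u, const float v[4] )
{
    if ( u->valid && memcmp( u->value, v, sizeof( u->value ) ) == 0 )
        return; /* redundant update skipped */

    memcpy( u->value, v, sizeof( u->value ) );
    u->valid = 1;
    glUniform4fv( u->location, 1, v );
}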

I looked at the OpenGL function calls in apitrace, which helped track down excessive setting of shader uniforms. It also showed that draw calls for text in the (Toy Box) menu were not combined very well. The text is drawn as opaque with a transparent black shadow, and I did not combine opaque and forced-transparent drawing (blending is slower). This led to each menu item’s text and text shadow being a separate draw call, for every menu item. 20 text buttons? 40 draw calls (which slows down rendering).

The text image always uses alpha blending, so I corrected it to merge regardless of forced transparency when the image has alpha blending. Now 20 text buttons can be 1 draw call. It’s still not perfect: if a menu item has a checkbox (image), it breaks up merging text with the next menu item.

I looked at performance in the Linux tool “perf” and found a couple of areas for minor CPU optimizations.

For a less than 1% CPU-time improvement, I changed setting a vertex’s lightmap texcoords to directly set the same value as the base texcoords (vert[i].st[ST_LIGHTMAP][0] = s) instead of reading back from the base texcoords (vert[i].st[ST_LIGHTMAP][0] = vert[i].st[ST_BASE][0]). It’s done for all vertexes, even when they aren’t using a lightmap. Though I may change it from “lightmap” to “filter” in the future for blending a texture over text.

For another less than 1% CPU-time improvement, I changed a lot of vertexes[base+i].xyz (etc) accesses to use a vert pointer in the functions that loop through setting up all the vertexes for dynamic geometry (like 2D drawing and sprites).

Ideas for performance improvement don’t always work out. My renderer was built around 16-bit vertex indexes because 32-bit indexes may not be supported by OpenGL ES. I thought desktop OpenGL might be optimized for 32-bit indexes, so I added dual support for 16- and 32-bit vertex indexes, selected at run-time. For my GTX 750 Ti at least, it didn’t make a difference.

Performance

I started developing a game engine and OpenGL ES 2 renderer under the working title Clover’s Toy Box at the beginning of 2017. After 5 years I’ve finally caught up with the performance of the ioquake3/Spearmint opengl1 renderer. There is still a lot missing, so I can’t quite brag about out-performing a 20-year-old renderer. It has been way more difficult than I expected.

My renderer is still missing many features for Quake 3 (Quake 3 .shader files, lighting, wallmarks, sky, …), and I disabled some features in the Toy Box renderer because they currently have very poor performance (curves and overbright post-process). I disabled some features in ioquake3/Spearmint to make it more comparable. There will be more performance comparisons in the future—hopefully better documented—after more features are implemented.

So yes, these are preliminary tests that may be misrepresentative.

2008 Intel (ioquake3)

Due to life circumstances, 2 years of the development was done on a 2010 Dell OptiPlex 780 with a 2008 Intel CPU with integrated graphics (OpenGL 2.1). Tested with a 1280×1024 display because I’m too lazy to hook a 1080p monitor back up to it.

Results are from “timedemo 1; demo four” in ioquake3 and the frames per-second at the initial spawn in map tvy-bench (which has a lot of geometry). Higher frames per-second (FPS) is better.

demo four

12.1 seconds 104.5 fps – Toy Box renderer for ioquake3
14.5 seconds 86.9 fps – opengl1
43.3 seconds 29.1 fps – opengl2

map tvy-bench

30 FPS – Toy Box renderer for ioquake3
30 FPS – opengl1
16 FPS – opengl2

opengl2 = 0.3× speed of opengl1
Toy Box renderer = 1× to 1.2× speed of opengl1 (3× to 3.6× speed of opengl2)

ioquake3’s opengl2 renderer is known to perform better on NVIDIA and worse on Intel and AMD graphics. This comparison does not play to its strengths.

2014 NVIDIA (Spearmint)

I finally got a new CPU/motherboard this year so I could use my NVIDIA graphics card again: a Ryzen 5800X + NVIDIA GTX 750 Ti at 1920×1080 resolution.

ioquake3’s “timedemo 1; demo four” runs in 1.3 seconds at 995~1000 FPS for both opengl1 and the Toy Box renderer for ioquake3. (I did not include opengl2 in my notes for this test.)

I tested the performance of the Raph player model in MD3 (vertex animation) and IQM (skeletal animation) formats in Turtle Arena running on Spearmint. Milliseconds to render the frame are listed, but the minimum is 1 due to Spearmint engine limitations. Higher frames per-second (FPS) is better.

Turtle Arena (map team1) with 15 MD3 players:
1 millisecond – 1000 FPS – Toy Box renderer for Spearmint
4 milliseconds – 250 FPS – opengl2
7 milliseconds – 150 FPS – opengl1

Turtle Arena (map team1) with 15 IQM players:
1.1 milliseconds – 900 FPS – Toy Box renderer for Spearmint
5 milliseconds – 200 FPS – opengl2
8 milliseconds – 125 FPS – opengl1

Turtle Arena (map team1) with 63 MD3 players:
8 milliseconds – 125 FPS – Toy Box renderer for Spearmint
25 milliseconds – 40 FPS – opengl2
50 milliseconds – 20 FPS – opengl1

Turtle Arena (map team1) with 63 IQM players:
10 milliseconds – 100 FPS – Toy Box renderer for Spearmint
33 milliseconds – 30 FPS – opengl2
66 milliseconds – 15 FPS – opengl1

opengl2 = 2× speed of opengl1
Toy Box renderer = 6× speed of opengl1 (3× speed of opengl2)

It may be interesting to note: the Toy Box renderer still only uses OpenGL features supported by 2008 Intel graphics (Mesa i965 OpenGL driver), though some of those features are part of OpenGL 4.x. There is still room for improvement using newer hardware features.

Data

I made various model and image handling improvements. Saving images can now convert between DXTn compression and uncompressed formats. The model loader’s intermediate state (used by many unoptimized formats) can be saved to IQE and OBJ. More work is needed for general model exporting support.

I added loading additional archive formats:

  1. Doom (.WAD)
  2. Sonic Robo Blast 2’s compressed .WAD (ZWAD, rarely used)
  3. Sin (.SIN)
  4. Daikatana (.PAK)
  5. Anachronox (.DAT)

I added loading additional model formats:

  1. Anachronox: .MD2
  2. Heavy Metal F.A.K.K.²: .TAN
  3. Return to Castle Wolfenstein: .MDC
  4. Inter-Quake Export: .IQE — animation incomplete
  5. Misfit Model 3D: .MM3D — material/animation incomplete
  6. Sonic R: .BIN — animation incomplete
  7. Doom level — very incomplete

I added loading additional image formats:

  1. DirectDraw Surface .DDS (DXTn only) — There is special handling for S3Quake3 and Iron Grip: Warlord DDS files.
  2. Doom graphics (flat, patch, DeePSea tall patches)
  3. Serious Sam 2 (demo) textures (DXTn, uncompressed)
  4. A.J. Freda’s .BUZ (used by Wizard, Roly Poly Putt, and SRBX but only Wizard is fully supported)

I also added saving .DDS, Doom flats, and Serious Sam 2 (demo) textures.

Anachronox .MD2 is similar to Quake 2 MD2, but adding support for it was quite involved. The only documentation that I could find online was incomplete and/or inaccurate. It has an odd vertex mode with 11-10-11 bit position encoding. I spent a lot of time trying to figure out the vertex normal encoding. In the end I created a 2,048-entry table of vertex normal directions by using a custom Anachronox MD2 model in the Anachronox particle application with vertex normal debug lines enabled, and using apitrace to get the line end points.
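To show what an 11-10-11 bit position encoding means in practice, here is a sketch of unpacking one packed vertex; the exact bit order and the per-frame scale/translate are assumptions modeled on the Quake 2 MD2 layout, not verified against Anachronox.

/* Unpack an 11-10-11 bit vertex position from one 32-bit value, then
   apply the frame's scale and translate (the way Quake 2 MD2 scales
   its 8-bit-per-component positions). */
void UnpackVertex11_10_11( unsigned int packed, const float scale[3],
                           const float translate[3], float out[3] )
{
    unsigned int x = ( packed       ) & 0x7FF; /* 11 bits */
    unsigned int y = ( packed >> 11 ) & 0x3FF; /* 10 bits */
    unsigned int z = ( packed >> 21 ) & 0x7FF; /* 11 bits */

    out[0] = (float)x * scale[0] + translate[0];
    out[1] = (float)y * scale[1] + translate[1];
    out[2] = (float)z * scale[2] + translate[2];
}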

I tried to scan the Anachronox EXE for the floating point vertex normal table, but all I could find was the Quake 2 MD2 vertex normal table, so either it’s not stored as floating point or it’s generated somehow (X values clearly go from approximately -1 to 1 in the table, but the Y and Z values jump around without making sense to me). I think I was only able to complete Anachronox MD2 support thanks to Anachronox’s particle application, which displays model information (like the 11-10-11 bit XYZ encoding) and vertex directions. It doesn’t make me excited to try figuring out other model formats, but I’m doing it anyway (though currently less obscure formats).

Anachronox MD2 has “tagged surfaces”, which are named triangles(?). I haven’t figured out what to do with them yet (are they a position/rotation like MD3 tags? or for applying an effect to the mesh?).

Clover Resource Utility

Clover Resource Utility (a Qt 5 application) gained High DPI support (for the PinePhone), a checkered grey background visible behind transparent images, and work to keep up with other development (new BSP rendering, an option to display vertex normals).

I also fixed a long-running issue where resizing the window while viewing an image would turn the image black. The framebuffer object (FBO) recreation on window resize set the bound texture to 0 without updating my OpenGL state tracker, so my renderer thought the displayed image was still bound; instead 0 (no image) was used.
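The fix is the usual state tracker pattern (a sketch with illustrative names): route binds through the tracker, and invalidate the cached binding after anything that binds behind the tracker’s back.

#include <GL/gl.h>

static GLuint s_boundTexture2D; /* cached GL_TEXTURE_2D binding */

void Tracker_BindTexture2D( GLuint texture )
{
    if ( texture == s_boundTexture2D )
        return; /* a stale cache here is exactly what caused the black image */

    s_boundTexture2D = texture;
    glBindTexture( GL_TEXTURE_2D, texture );
}

void Tracker_InvalidateTexture2D( void )
{
    /* Call after code that binds textures without the tracker, such as
       framebuffer object recreation on window resize. */
    s_boundTexture2D = (GLuint)0xFFFFFFFFu; /* force the next real bind */
}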

Misc

I don’t know why it took so many years to think of how to make it easier to pass a vec3 as separate arguments without copying and pasting it a bunch of times:

#define VEC3_ARGS( x ) (x)[0], (x)[1], (x)[2]
#define QUAT_ARGS( x ) (x)[0], (x)[1], (x)[2], (x)[3]

Com_Printf( "Translate %f,%f,%f, Rotate %f,%f,%f,%f, Scale %f,%f,%f.\n",
  VEC3_ARGS( joint.translate ),
  QUAT_ARGS( joint.rotate ),
  VEC3_ARGS( joint.scale ) );

Future

I don’t really have plans I’m very focused on right now. I started adding support for additional model formats. I still want to eventually reach feature parity with Spearmint opengl1 renderer. I may also try to work on some other area this year.

March 1, 2022: I corrected the article to state the renderer has been under development for 5 years, not 4 years as originally written.

]]>
https://clover.moe/2022/01/14/clovers-toy-box-2021/feed/ 2