Kaiser

Threads for rendering & game loop


I've been researching how most game engines handle rendering and gameplay in the main loop, and noticed that a lot of these engines nowadays use two separate threads for the renderer and the gameplay code. During my reverse-engineering experiments with PSXDoom and Doom64, I noticed that they're set up the exact same way. Even the iOS version of Classic Doom uses separate threads for rendering and gameplay.

My question is: has anyone considered using this approach for their ports? Would there be any trade-offs, advantages, or disadvantages? I've been considering doing this for the next version of Doom64EX and thought I'd ask what everyone else thinks of this method.

Kaiser said:

My question is: has anyone considered using this approach for their ports? Would there be any trade-offs, advantages, or disadvantages? I've been considering doing this for the next version of Doom64EX and thought I'd ask what everyone else thinks of this method.


I have probably experimented with multithreaded rendering in Doom for longer than anyone, and I considered several different models. However, in all of them the gameplay "thread", so to speak, is the main thread, and only at the end of a gameplay tic does the multithreaded rendering start, with all the possible variations (e.g. parallel walls + floors, two threads for walls and one for floors) and workload strategies (e.g. split walls by columns or segs, split floors by columns or visplanes, etc.) that may apply. The problem with all of these approaches is that rendering is totally synchronous with the gameplay code, as there is a serial dependency between them.

The model I used was a serial-fork-join one: Gameplay code runs first, ensuring that everything is in its proper place on the map. Then rendering code forks into as many rendering threads as it has been set to, those threads join back into the main thread, and a new gameplay tic cycle starts. There's no concurrency between gameplay and rendering, and since it's required to know where everything is before rendering a frame, there seems to be no escaping that...or is there?
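The serial-fork-join model above can be sketched roughly like this (a minimal illustration with hypothetical names, using POSIX threads; a real port would split by segs or visplanes rather than a flat column buffer):

```c
#include <assert.h>
#include <pthread.h>

#define SCREEN_W    320
#define NUM_THREADS 4

static int framebuffer[SCREEN_W];

struct span { int first_col, last_col; };

/* Each worker renders its own slice of columns; slices don't overlap,
   so no locking is needed during the render phase. */
static void *render_slice(void *arg)
{
    struct span *s = arg;
    for (int x = s->first_col; x <= s->last_col; x++)
        framebuffer[x] = x;   /* stand-in for rendering column x */
    return NULL;
}

static void run_tic_then_render(void)
{
    /* 1. The gameplay tic runs serially on the main thread (omitted). */

    /* 2. Fork: split the screen into column ranges, one per thread. */
    pthread_t   threads[NUM_THREADS];
    struct span spans[NUM_THREADS];
    int per = SCREEN_W / NUM_THREADS;
    for (int i = 0; i < NUM_THREADS; i++) {
        spans[i].first_col = i * per;
        spans[i].last_col  = (i == NUM_THREADS - 1) ? SCREEN_W - 1
                                                    : (i + 1) * per - 1;
        pthread_create(&threads[i], NULL, render_slice, &spans[i]);
    }

    /* 3. Join: all slices must finish before the next gameplay tic. */
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
}
```

The join at the end is exactly the serial dependency being described: gameplay and rendering never overlap, only the render workers overlap each other.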

I recently hypothesized in another thread that a different "asynchronous tic/rendering" strategy might be possible: at the end of each gameplay tic, you copy ALL information required for rendering a frame starting from the map status to a temporary place, and start running a NEW gameplay tic IMMEDIATELY. Kind of like a "double buffering" of the map's objects' status. This copying strategy is already used in ports that do inter-frame interpolation, and it has the disadvantage that you need double the memory for map objects and their status. In large maps, this can be quite significant.

While this happens, you try rendering the screen based on the map info you just stored, on a separate thread. This allows actual gameplay-rendering concurrency.
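The "double buffering" of map state might look something like the following sketch (names are hypothetical; a real port would copy mobj positions, sector heights, scrolling offsets and so on, not a toy struct):

```c
#include <assert.h>
#include <string.h>

#define MAX_THINGS 4

struct thing { int x, y, angle; };

/* Live state, mutated by the gameplay tic... */
static struct thing live[MAX_THINGS];

/* ...and a frozen copy that the render thread reads from. */
static struct thing snapshot[MAX_THINGS];

/* Stand-in for a gameplay tic: move everything one unit. */
static void run_gameplay_tic(void)
{
    for (int i = 0; i < MAX_THINGS; i++)
        live[i].x += 1;
}

/* At end-of-tic, copy ALL render-relevant state in one go. The next
   tic can then start immediately while the renderer, on its own
   thread, reads only from 'snapshot'. */
static void snapshot_for_renderer(void)
{
    memcpy(snapshot, live, sizeof snapshot);
}
```

The cost is visible right in the declarations: every render-relevant structure exists twice, which is where the "double the memory for map objects" caveat comes from.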

How well might this work? If rendering and gameplay take more or less the same time, you might see up to a 2x speedup (in theory). If one of them is significantly slower, though, it will dominate total tic time and the improvement will be much smaller. Worst case: no difference from the serial model, or slightly slower, because running two very different workloads at once hurts cache coherency. Best case: the theoretical 2x speedup, if rendering and logic are equal in cost and there are no cache-coherency effects.
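The estimate above boils down to one line of arithmetic: with gameplay and rendering fully overlapped, total frame time drops from t_game + t_render to max(t_game, t_render), so:

```c
/* Theoretical speedup from overlapping gameplay and rendering,
   ignoring copy overhead and cache effects. */
static double overlap_speedup(double t_game, double t_render)
{
    double serial   = t_game + t_render;
    double parallel = (t_game > t_render) ? t_game : t_render;
    return serial / parallel;   /* 2.0 at best, ~1.0 when one side dominates */
}
```

For equal times (say 10 ms each) this gives 2.0; for a 10 ms tic against a 90 ms render it gives only about 1.11, matching the "one of them dominates" observation.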

You could STILL have multithreaded rendering in combination with asynchronous tic/rendering, which would be "asynchronous tic/multithreaded rendering", possibly making things a bit faster if rendering is the actual bottleneck.

Using such a model also changes Doom's speed-throttling strategy dramatically (IMO it simplifies it), and you still need to arrange a way for tic & rendering to sync up at the end of a tic. A simple thread barrier would force a render EVERY tic and destroy speed throttling. A semaphore would have to be used instead, so that e.g. if rendering takes too long, gameplay is allowed to 'run ahead' for an extra tic, and the renderer only receives new data once it has finished rendering the previous content.
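The semaphore idea can be sketched with a POSIX semaphore and a non-blocking wait (hypothetical names; the renderer side, which would sem_post when a frame completes, is elided):

```c
#include <assert.h>
#include <semaphore.h>

static sem_t renderer_idle;     /* value 1 = renderer ready for new data */
static int   frames_published;  /* snapshots actually handed to the renderer */
static int   tics_run;          /* gameplay tics completed */

static void init_handoff(void)
{
    sem_init(&renderer_idle, 0, 1);   /* renderer starts out idle */
}

static void end_of_tic(void)
{
    tics_run++;
    /* Non-blocking: if the renderer is still busy, gameplay simply
       'runs ahead' to the next tic instead of waiting at a barrier. */
    if (sem_trywait(&renderer_idle) == 0) {
        frames_published++;
        /* ...the renderer would sem_post(&renderer_idle) when done... */
    }
}
```

Because sem_trywait never blocks, a slow renderer just means fewer published frames, not a stalled game loop, which is exactly the throttling behavior described above.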


I think you need to be careful comparing modern-ish game engines with DOOM, since they generally have a client/server architecture where the server periodically sends the client everything that is currently visible. Since the client (renderer) is separate from the server (game code), having them run in parallel in separate threads makes sense: they are not really simultaneously accessing/modifying shared state.

In DOOM, the renderer and game code ARE both accessing/modifying the same state, and this is where all the headaches from trying to parallelize them are going to come from. Even something simple like animated flats can be a problem when the animation changes in the middle of a refresh -- it doesn't look good.

Personally I'd want to start with a source port (like Odamex, yay) which has already converted to a client/server architecture before attempting that.


I kinda figured. I plan on switching to client/server using ENet for Doom64EX anyway. Maybe I should look into the threading stuff later, once I've fully implemented client/server support...

andrewj said:

Personally I'd want to start with a source port (like Odamex, yay) which has already converted to a client/server architecture before attempting that.


Odamex (and indeed any "C/S" source port) does not actually split into a "game client" and "game server" unless you are playing on a multiplayer server. Technically it supports both ways of playing the game, but it uses the old game loop for single player and .lmp demo playback, and the C/S pathway for multiplayer.

A true C/S port would remove the old pathway completely. It's feasible, but .lmp demo compatibility might be a little tricky... perhaps by sending the demo to the server for playback and sending clients a view of it.


I don't know about "true", maybe "pure". EE's C/S branch supports singleplayer, client/server and the old, ticcmd-based netcode. There's really no reason to get rid of any of the functionality. The only benefit you really get from "singleplayer is just a local server" is testing. Your goal is actually to trick the player into believing there is no server... so why have a server?

More to the point, C/S Doom ports basically just drain their network message buffers at the beginning/end of the main loop. You need the network messages to set up game state for a TIC, and you need game state for a TIC before you can render the frame, so there's your sequence. You don't really need to go way out of your way to ensure that you don't update stuff during a render - sectors or things for example - because Doom is serial and you can't get away with that anyway.
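That serial sequence can be written out as a loop skeleton (the function bodies here are just instrumented stand-ins for the port's real routines, so the ordering is visible):

```c
static int order[8];   /* records which phase ran in which position */
static int steps;

static void drain_net_messages(void) { order[steps++] = 1; }  /* net -> game state */
static void run_gameplay_tic(void)   { order[steps++] = 2; }  /* P_Ticker etc. */
static void render_frame(void)       { order[steps++] = 3; }  /* D_Display */

static void main_loop_iteration(void)
{
    drain_net_messages();  /* messages must land before the tic... */
    run_gameplay_tic();    /* ...and the tic must finish before the frame */
    render_frame();
}
```

Nothing here is concurrent, which is the point: each phase consumes the previous one's output, so there is no window in which a render could observe a half-updated sector or thing.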

All this threading stuff is kind of over my head, but maybe this is relevant: Doom effectively implements its own scheduler by running (and forgive me if Port X doesn't do it exactly this way) D_Display, G_Ticker, etc. etc. EE has a "d_fastrefresh" (uncapped framerate) option that unlatches the renderer refresh, so that even if a TIC hasn't elapsed, a frame is rendered instead of just sleeping.
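An uncapped-framerate option of that kind amounts to a very small change in the loop: render every pass, and run a tic only when one is actually due. A minimal sketch (the timer check is reduced to a flag for illustration):

```c
static int tics_done, frames_done;

/* One pass of a "d_fastrefresh"-style loop: 'tic_elapsed' would come
   from comparing I_GetTime()-style timers against the 35 Hz tic rate. */
static void loop_step(int tic_elapsed)
{
    if (tic_elapsed)
        tics_done++;    /* a gametic was due: run G_Ticker and friends */
    frames_done++;      /* render a frame every pass instead of sleeping */
}
```

With the latch removed, frames can outnumber tics whenever the renderer is fast enough, which is all "unlatching the renderer refresh" means.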

Conceivably (and I think Maes suggested this, as PrBoom does it this way?) you could remember the previous gamestate, and have the "gamestate thread" calculating the next TIC's gamestate while the renderer continues to render the previous scene. I don't know anything about interpolation, particularly how you interpolate from the last frame to the one being built without buffering an additional frame, but I assume Pr+ holds the key... or buffers an additional frame.

Maes' work on multiple renderer threads is really interesting too. I know that there are some common cases where the renderer isn't the bottleneck, but even my Core2Duo 3.16GHz chugs over 1280x720 in some situations. If only there were... some way to use both my cores...

Ladna said:

I don't know about "true", maybe "pure". EE's C/S branch supports singleplayer, client/server and the old, ticcmd-based netcode. There's really no reason to get rid of any of the functionality. The only benefit you really get from "singleplayer is just a local server" is testing. Your goal is actually to trick the player into believing there is no server... so why have a server?


Single-player "net"-demos would be a huge advantage. .lmp demos are handy in the same way that unit tests are for ensuring vanilla compatibility, but if you actually want to record a demo it would be a much better idea to do it in an unambiguous way that won't break into pieces if you fix a buffer overflow.
