
Easy-to-read DOOM 1 source code


Hey everyone,

I was looking at the original DOOM source code as well as the source code of the various ports (specifically chocolate-doom).

I was wondering if there's an easy-to-read version of the code anywhere, especially for the graphics renderer.

I'm working on a project to port the DOOM rendering engine to CUDA, but I'm having some trouble understanding the different components of the engine. I was hoping someone could point me to some easy-to-read documentation.

 

Thanks!


For the most part, the codebase is pretty well-structured. Rather than looking for an "easy to read" version of the source code, my recommendation would be to use the Doom Wiki as a reference point for figuring things out. Those of us with knowledge of it have put a fair amount of work into documenting how the code works. Specifically:

If you're entirely lost and don't know anything about how BSP rendering engines work at all, maybe start here.


Thanks! Those are some valuable resources...

 

I had a few questions regarding the engine.

 

1. What do you think the bottleneck is in the DOOM renderer? I figure it's the BSP traversal, but I'd like some expert opinion on this.

2. What would be the most parallelizable part of the rendering engine? As in, if it could take advantage of a massive GPU with lots of execution units, what part of the engine would gain the largest boost in performance?

3. I'm assuming most of the rendering engine is in the files starting with the 'r_' prefix (e.g. r_bsp.c, r_main.c). Just to be sure, are there any other files to dig through to modify the renderer?

8 hours ago, machine6 said:

2. What would be the most parallelizable part of the rendering engine? As in, if it could take advantage of a massive GPU with lots of execution units, what part of the engine would gain the largest boost in performance?

I think the gzdoom/qzdoom folks have looked at this and the answer is column drawing.

2 hours ago, Jon said:

I think the gzdoom/qzdoom folks have looked at this and the answer is column drawing.

In addition, you can parallelize flat rendering with respect to wall rendering: once the BSP traversal is done, you have all the information you need, and there are no overlaps or dependencies between flats and regular walls. Masked textures and sprites are another matter, but they too can be internally parallelized as column draws.
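To make the independence concrete, here's a hedged C sketch of running the two passes concurrently once traversal is done. The buffer, pass names, and the even/odd pixel split are all mine; the even/odd ownership is just a toy stand-in for the disjoint wall/flat pixel sets that the engine's clipping guarantees.

```c
#include <pthread.h>
#include <stdint.h>

/* Toy framebuffer; walls "own" even pixels, flats "own" odd ones,
 * standing in for the disjoint pixel sets produced by BSP clipping. */
enum { NPIX = 320 * 200 };

static uint8_t framebuf[NPIX];

static void *wall_pass(void *arg) {         /* fill wall-owned pixels */
    (void)arg;
    for (int i = 0; i < NPIX; i += 2) framebuf[i] = 1;
    return NULL;
}

static void *flat_pass(void *arg) {         /* fill flat-owned pixels */
    (void)arg;
    for (int i = 1; i < NPIX; i += 2) framebuf[i] = 2;
    return NULL;
}

/* Because the two passes write disjoint pixels, no locking is needed. */
void render_frame(void)
{
    pthread_t tw, tf;
    pthread_create(&tw, NULL, wall_pass, NULL);
    pthread_create(&tf, NULL, flat_pass, NULL);
    pthread_join(tw, NULL);
    pthread_join(tf, NULL);
}
```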

 

Point is, this is far from being a clean or trivially parallelizable process.


Thanks for the answers! I'm slowly digging through the rendering source code. 

Are there any other files I should peruse besides the ones with the 'r_' prefix?

15 hours ago, machine6 said:

1. What do you think the bottleneck is in the DOOM renderer? I figure it's the BSP traversal, but I'd like some expert opinion on this.

"BSP traversal" describes the entire process so I'm not sure what you mean. Ultimately I'd expect the bottleneck to be in just getting pixels into the screen buffer. The original Doom source recognizes this and uses optimized versions of the R_DrawColumn / R_DrawSpan functions which are written in assembly (these are missing from the released Doom source but can be found in the Heretic source).
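For intuition, here is a hedged C sketch of a Doom-style column inner loop, simplified in the spirit of R_DrawColumn (the names, the fixed 128-texel wrap, and passing the screen pointer explicitly are my simplifications, not the engine's exact interface). The point is the tight per-pixel loop with a strided write, which is exactly what the assembly versions optimized:

```c
#include <stdint.h>
#include <stddef.h>

enum { SCREENWIDTH = 320 };  /* illustrative row-major stride */

/* Draw one vertical strip at column x, rows y1..y2 inclusive, sampling
 * a 128-texel texture column with 16.16 fixed-point stepping. */
void draw_column(uint8_t *screen, int x, int y1, int y2,
                 const uint8_t *texcol, uint32_t frac, uint32_t fracstep)
{
    uint8_t *dest = screen + (size_t)y1 * SCREENWIDTH + x;
    for (int y = y1; y <= y2; y++) {
        *dest = texcol[(frac >> 16) & 127];  /* wrap at 128-texel height */
        dest += SCREENWIDTH;   /* strided write: each pixel lands a full
                                  row apart, the cache-unfriendly part */
        frac += fracstep;
    }
}
```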

 

15 hours ago, machine6 said:

2. What would be the most parallelizable part of the rendering engine? As in, if it could take advantage of a massive GPU with lots of execution units, what part of the engine would gain the largest boost in performance?

Depends on how you want to implement it. Are you talking about running the software renderer itself on a GPU? That could be an interesting project. One idea could be to divide the screen into areas and assign a core to render each area. How you divide up the screen needs to be thought through carefully with respect to things like CPU caching - for example, it might be inefficient to have multiple cores all writing to the same linear screen buffer, but I don't know a lot about how cache invalidation works with multiple cores, especially on something like a GPU.
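A minimal sketch of one such division, assuming contiguous per-core column bands (the function name and scheme are mine, not anything from the Doom source). Note the caching caveat above: in a row-major framebuffer, adjacent bands share cache lines at their boundaries, so per-core scratch buffers (or GPU shared memory) may beat writing the final linear buffer directly.

```c
/* Partition screen columns [0, width) among ncores workers: worker
 * 'core' gets the contiguous half-open band [*x0, *x1). The first
 * (width % ncores) workers get one extra column so the split is exact. */
void column_band(int width, int ncores, int core, int *x0, int *x1)
{
    int base  = width / ncores;   /* columns every worker gets */
    int extra = width % ncores;   /* leftover columns to distribute */
    *x0 = core * base + (core < extra ? core : extra);
    *x1 = *x0 + base + (core < extra ? 1 : 0);
}
```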

 

One idea I had a while back was to convert the software renderer to use GPU-based versions of the basic drawing functions (R_DrawColumn / R_DrawSpan) - similar to what DOS Doom does, but nowadays "do it in GPU hardware" is the optimization rather than "write optimized assembly". This would have the advantage that some old maps which exploit tricks in the software rendering engine could be rendered the same (check out Requiem MAP31 for an example).

 

Realistically though if you're looking to make performance gains then you should do your own profiling and use it to carefully guide what you attempt to do.

 

15 hours ago, machine6 said:

3. I'm assuming most of the rendering engine is in the files starting with the 'r_' prefix (eg: r_bsp.c, r_main.c, etc). Just to be sure are there any other files to dig through to modify the renderer?

That's pretty much it, barring perhaps some common data structures you'll find in header files. It's pretty self-contained. Maybe p_setup.c to see how some of the structures are loaded from disk?

 

7 hours ago, Jon said:

+1 to this. Fabien's write-ups are great.

1 hour ago, fraggle said:

Are you talking about running the software renderer itself on a GPU? That could be an interesting project.

That's sort of what I had in mind: convert the software renderer to CUDA and run it on a GPU. It's definitely a fun side project, but I'm curious whether it will result in any worthwhile performance gains.

 

1 hour ago, fraggle said:

One idea could be to divide the screen into areas and assign a core to render each area.

Yep, we were planning on doing something similar after studying the high level design of the renderer.

Just one last question, given an area on the screen for the core to render, what work would the core do from start to finish (theoretically at least)?

1 hour ago, fraggle said:

"BSP traversal" describes the entire process 

Think about just the BSP recursive calls with their occlusion logic etc. minus any actual screen writes. That part is typically not tackled in parallelization attempts, even though it can be surprisingly time-consuming, and the level of recursion can go pretty deep.

15 hours ago, machine6 said:

Yep, we were planning on doing something similar after studying the high level design of the renderer.

Just one last question, given an area on the screen for the core to render, what work would the core do from start to finish (theoretically at least)?

It should perform all the BSP traversals and occlusion tests that pertain to its portion of the screen, rendering as it goes. In that model, every core would actually run a full Doom renderer on a limited, horizontally divided portion of the screen.

 

Practical multi-threaded renderers, however, don't actually parallelize the BSP part: it runs serially, producing all the necessary occlusion and rendering-order information, which is then divided among rendering cores.
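A hedged sketch of that two-phase scheme (the type and function names are mine, not the engine's): phase 1 would walk the BSP serially and emit an ordered list of draw jobs with all occlusion resolved; phase 2 hands jobs to workers. Since clipping is already done, jobs touch disjoint pixels and workers need no locking. Here worker w statically takes jobs w, w+N, w+2N, ...; each such loop would be its own thread or CUDA block in a real port.

```c
/* One clipped vertical strip, fully determined by the serial BSP phase. */
typedef struct { int x, y1, y2; } DrawJob;

/* Phase 2, one worker's share: interleaved static assignment of jobs.
 * Returns the number of pixels this worker would draw, as a stand-in
 * for actually drawing them. */
int worker_pixels(const DrawJob *jobs, int njobs, int nworkers, int w)
{
    int pixels = 0;
    for (int i = w; i < njobs; i += nworkers)      /* jobs w, w+N, ... */
        pixels += jobs[i].y2 - jobs[i].y1 + 1;     /* inclusive span */
    return pixels;
}
```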

On 18/4/2017 at 1:56 PM, Jon said:

I think the gzdoom/qzdoom folks have looked at this and the answer is column drawing.

Actually, it depends on the map. For simple maps, it is the drawers (column and span). For complex maps it is the BSP traversal and drawer setup costs. If we take a map like Frozen Time, then walking the BSP with 20k+ subsectors, iterating over every line segment, and then looking up textures and calculating texture coordinates begins to be a lot more expensive than drawing it. This is the primary reason why some ports run that map so much better than others - the more advanced line features a port supports, the worse it runs Frozen Time, unless significant time has been spent optimizing the traversal and setup.


Column drawing is just one part of the rendering process, and even then you must specify whether it's walls, masked textures, or sprites, as those are rendered at different times, with different prerequisites, and with different degrees of symmetry. E.g., it's pretty safe to say that, given any random Doom scene, splitting the screen into N parts will put on average 1/Nth of the total wall surface, 1/Nth of the total floor surface, etc. in each screen fraction, so their "pixel volume" will be roughly equal. Of course it's possible to deliberately craft maps/scenes where this is not true and heavy asymmetry is deliberately caused, but those are pathological cases, not average ones.

 

As dpJudas said, the BSP traversal can be quite expensive in some maps, and the more complex operations it's supposed to do, the harder it will be to parallelize. It may also not be trivial to "split" it into N parts as with a screen's wall or floor surface, so that part may always need to run serially, essentially limiting the maximum potential parallel speedup even if you manage to run the pixel-rendering part in zero time.
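That ceiling on speedup from a serial stage is just Amdahl's law, and a one-liner makes the limit concrete. Here s is a hypothetical serial fraction of frame time spent in traversal - a number you would have to get from profiling, not one stated anywhere in this thread:

```c
/* Amdahl's law: best-case speedup on n cores when a fraction s of the
 * work (here, the serial BSP traversal) cannot be parallelized.
 * As n grows, speedup is capped at 1/s no matter how many cores. */
double amdahl(double s, int n)
{
    return 1.0 / (s + (1.0 - s) / n);
}
```

E.g. if profiling showed 25% of frame time in serial traversal, four cores could give at most about a 2.3x speedup, and no core count could ever exceed 4x.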

 

FWIW, here's an old post I made on the subject, showing how much time the BSP can take even on a map (NUTS.WAD) which has very little in the way of geometry (though I suspect the monsters do play a role).

 

A completely parallel BSP would probably require more total CPU time than its single-threaded version in order to sort out situations like a single linedef being encountered in both branches of a parallel recursion, or potential data dependencies/deadlocks/race conditions. A simple way to deal with that would be to allow parallelism only on paired binary recursions, which again would require some pretty clever thread-pool and maximum-recursion-depth limiting in order not to over-saturate the available CPU cores, and would result in a pretty "stiff" parallelism.

Edited by Maes


I didn't say column drawing was the most expensive part of the process; I said it was the most parallelizable.

4 hours ago, Jon said:

I didn't say column drawing was the most expensive part of the process; I said it was the most parallelizable.

I don't know if that was aimed at me, but I don't think anything I wrote implied otherwise. I just said there might be some constraints on which columns can be parallelized and when. E.g. you have to draw those of unmasked walls first, then those of sprites and masked textures (with floors/spans being parallelizable with respect to the former, but not the latter).

 

Besides that, there are several strategies for pipelining a set of columns between the available threads or cores. E.g. split the screen into N horizontal zones and assign each core only the columns in a specific zone, or split the rendering job into "segs" instead (an entire wall segment) and divide those among cores, regardless of where they will appear on the screen. So in essence you're deciding whether to pipeline column-drawing or segment-drawing operations (the latter has potentially less overhead, especially at high resolutions or with complex geometry, where many independent column draws may be required for each column of vertical resolution). But discussing such aspects might be a bit premature at this point, especially if the OP has still got to learn the basics of Doom's rendering algorithms.

 

If I could sum it up in one paragraph, I'd say that the "classic" vanilla algorithm has the rendering of WALLS interspersed into BSP traversal, and the two phases are not separate. Drawing of floors and sprites/masked textures however is done separately with respect to the combined BSP/rendering phase. Parallelization typically requires isolating/deferring actual rendering from BSP traversal (hence introducing a pipelining process for individual columns or for walls), for one, and then finding a way to include floor and sprites/masked textures into the process.

Edited by Maes

2 hours ago, Maes said:

I don't know if that was aimed at me, but I don't think anything I wrote implied otherwise.

 

No, it was a response to dpJudas, I should have explicitly indicated that.


I'm curious about getting the data in and out of the graphics card. That seems like a time-consuming ordeal... unless you plan to somehow get the level data itself onto the graphics card. I can't claim to know much about what your possibilities are, but I wish you good luck!

