DOOM hardware renderer

machine6 · April 12, 2017

For the final project of a parallel architecture course, my friend and I were thinking of writing a DOOM 1 renderer using NVIDIA's CUDA.

The plan is to write the renderer in CUDA, maybe parallelize the BSP construction, and then benchmark it against Chocolate Doom by comparing unlocked framerates and overall performance gains.

For instance, run both renderers on large, compute-intensive maps like NUTS.WAD and compare performance results.

We have roughly a month for the entire project, and I was wondering if anyone has done anything similar or if this is something that's feasible.

It doesn't matter if the gains aren't that big, we were just looking to explore something related to parallel programming and this idea came up.

I'd really appreciate any thoughts, opinions, or general advice about our idea.

EDIT:

After exploring this forum, I see that someone made a multi-threaded renderer for MochaDoom.

While that was a software renderer, any experiences/tips learned while implementing it could be useful to us.

Edited April 13, 2017 by machine6

Linguica · April 13, 2017

IIRC, QZDoom was / is experimenting with multi-core rendering by splitting the viewport into multiple sections and rendering each on a separate thread or what have you.

dpJudas · April 13, 2017

The QZDoom/GZDoom master branch does indeed have two approaches to multi-core software rendering:

1a) In the traditional software renderer it interlaces the rendering so that every Nth row of pixels is drawn by a specific core. It works by first collecting which drawers to call on the main thread, then calling all those drawers on worker threads (one for each core), with each drawer processing only the lines relevant for that core.

1b) The softpoly renderer (r_polyrenderer 1) essentially works the same way, except it collects whole triangles to draw.

Pros: Simple to implement. The workload is roughly evenly distributed between the cores.

Cons: The cores are limited by the fact that the main thread has to walk the BSP and collect which drawers to call. If the scene is very complex the time spent walking the BSP becomes a significant portion of the frame rendering time, leaving the other cores waiting for work to do.
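
To make the interlacing in 1a concrete, here is a minimal C++ sketch, not the actual QZDoom code. `ColumnCommand`, `run_worker` and `draw_interlaced` are hypothetical names, and a "drawer" is reduced to filling part of one screen column with a flat color. It assumes `y_top >= 0` and that the command list was fully collected on the main thread before the workers start.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical draw command: fill one screen column between two rows.
struct ColumnCommand {
    int x;               // screen column
    int y_top;           // first row, inclusive (assumed >= 0)
    int y_bottom;        // last row, inclusive
    std::uint8_t color;  // palette index to write
};

// Every worker walks the whole command list but only writes rows where
// y % num_workers == worker_id, so no two workers touch the same pixel.
void run_worker(const std::vector<ColumnCommand>& commands,
                std::vector<std::uint8_t>& framebuffer,
                int width, int worker_id, int num_workers) {
    for (const ColumnCommand& cmd : commands) {
        // Distance from y_top to the first row owned by this worker.
        int offset = (worker_id - cmd.y_top % num_workers + num_workers) % num_workers;
        for (int y = cmd.y_top + offset; y <= cmd.y_bottom; y += num_workers) {
            framebuffer[static_cast<std::size_t>(y) * width + cmd.x] = cmd.color;
        }
    }
}

// Main-thread side: launch one worker per core and wait for all of them.
void draw_interlaced(const std::vector<ColumnCommand>& commands,
                     std::vector<std::uint8_t>& framebuffer,
                     int width, int num_workers) {
    std::vector<std::thread> workers;
    for (int id = 0; id < num_workers; ++id) {
        workers.emplace_back(run_worker, std::cref(commands),
                             std::ref(framebuffer), width, id, num_workers);
    }
    for (std::thread& t : workers) t.join();
}
```

The workers need no locking because their rows never overlap, but nothing can start until the command list exists, which is the main-thread bottleneck described in the cons above.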

2) Splitting the viewport into multiple horizontal sections (r_scene_multithreaded 1). Each section is drawn by clipping everything outside it, thereby effectively reducing the field of view for each so that BSP node walking will skip nodes that cannot be seen by that section. That allows it to do the BSP walking work on multiple cores because each section will walk a subset of the total subsectors visible.

Pros: If the scene complexity is evenly split across the entire viewport, then this should ideally give an N-times speed improvement, where N is the number of cores.

Cons: Difficult to implement. The original Doom software renderer used globals to track state when walking the BSP - all that has to be modified to be thread local. The scene complexity is also rarely completely evenly split. The speed improvement will be limited by the section taking the longest time to render.
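
As a rough illustration of why the slowest section sets the limit, here is a small C++ sketch. `Section`, `split_viewport` and `ideal_speedup` are hypothetical names, the sketch assumes the viewport is divided into ranges of screen columns, and the per-section timings are made-up inputs. It ignores setup costs and any BSP nodes that several sections walk redundantly, so it gives an upper bound only.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Half-open range of screen columns [x_begin, x_end) owned by one thread.
struct Section {
    int x_begin;
    int x_end;
};

// Split the viewport into num_sections ranges of near-equal width.
std::vector<Section> split_viewport(int width, int num_sections) {
    std::vector<Section> sections;
    for (int i = 0; i < num_sections; ++i) {
        sections.push_back({width * i / num_sections,
                            width * (i + 1) / num_sections});
    }
    return sections;
}

// Upper bound on speedup: a single thread would take the sum of all
// section times, while the parallel frame ends when the slowest one does.
double ideal_speedup(const std::vector<double>& section_ms) {
    if (section_ms.empty()) return 0.0;
    double total = std::accumulate(section_ms.begin(), section_ms.end(), 0.0);
    double slowest = *std::max_element(section_ms.begin(), section_ms.end());
    return total / slowest;
}
```

With four sections costing 25, 5, 5 and 5 ms, the bound is 40 / 25 = 1.6x rather than 4x.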

On my system I'm using a Haswell i7 with 4 cores (8 threads) - theoretically that should give up to 4x-8x the speed of the original ZDoom renderer. However, because of the cons listed above, keeping all cores busy is very difficult. The speed improvement depends a lot on the scene, but roughly half the potential speed increase is lost in setup costs and cores waiting for work.

Writing a Doom 1 renderer in CUDA will primarily give you problems like the above. Walking the BSP to collect visible subsectors involves updating the 1D occlusion buffer (the clipper) when culling which BSP nodes are visible and which are not.

It is therefore not possible to simply put the BSP walk into a CUDA/OpenCL kernel and then have them process a subset of it.
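
A simplified C++ sketch of that dependency, under stated assumptions: `Clipper`, `WallSpan` and `collect_visible` are hypothetical names, walls are assumed to be already sorted front to back and projected to screen columns, and the buffer is one flag per column for brevity rather than the structure any particular port uses.

```cpp
#include <vector>

// Simplified 1D occlusion buffer: one "covered" flag per screen column.
class Clipper {
public:
    explicit Clipper(int width) : solid_(width, false), remaining_(width) {}

    // True if at least one column in [x_begin, x_end) is still open.
    bool is_range_visible(int x_begin, int x_end) const {
        for (int x = x_begin; x < x_end; ++x) {
            if (!solid_[x]) return true;
        }
        return false;
    }

    // Mark [x_begin, x_end) as covered by a solid wall.
    void add_solid_range(int x_begin, int x_end) {
        for (int x = x_begin; x < x_end; ++x) {
            if (!solid_[x]) {
                solid_[x] = true;
                --remaining_;
            }
        }
    }

    bool is_full() const { return remaining_ == 0; }

private:
    std::vector<bool> solid_;
    int remaining_;
};

// Hypothetical solid wall, already projected to columns [x_begin, x_end).
struct WallSpan {
    int x_begin;
    int x_end;
};

// Returns the indices of the walls that are at least partly visible.
// Whether wall i is visible depends on every wall processed before it,
// which is what makes this loop hard to split across threads.
std::vector<int> collect_visible(const std::vector<WallSpan>& front_to_back,
                                 int width) {
    Clipper clipper(width);
    std::vector<int> visible;
    for (int i = 0; i < static_cast<int>(front_to_back.size()); ++i) {
        if (clipper.is_full()) break;  // whole screen covered, stop early
        const WallSpan& span = front_to_back[i];
        if (clipper.is_range_visible(span.x_begin, span.x_end)) {
            visible.push_back(i);
            clipper.add_solid_range(span.x_begin, span.x_end);
        }
    }
    return visible;
}
```

Handing different walls to different threads would require each thread to know the clipper state left behind by all nearer walls, so the loop cannot simply be cut into independent chunks.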

Method #2 I listed above attempts to do this by limiting what the occlusion buffer will clip, but there are diminishing returns involved here because each such section will still walk some of the same nodes (cores doing the same work, reducing the gains of parallelism). If you reduce the width of the section to 1 pixel, then you have effectively turned the whole thing into a ray tracer.

Anyhow, hope that helps a bit. Good luck with it!

Edit: thinking about it a bit, treating the whole thing as a ray tracing exercise might actually be the best way to deal with it. The kernel would walk the BSP tree for one screen column of pixels, while using 'cliptop' and 'clipbottom' variables for plane visibility clipping.
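
A minimal sketch of that per-column idea, written as plain C++ so it can be tested on the host. It is not an actual kernel: `ColumnHit`, `ColumnResult` and `trace_column` are hypothetical names, the BSP walk is replaced by a precomputed front-to-back list of line hits for one column, and drawing is left out so only the clipping logic is shown.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical result of the column's ray crossing one line, projected to
// screen rows. Two-sided lines leave an opening; solid walls leave none.
struct ColumnHit {
    int open_top;     // first row of the see-through opening, inclusive
    int open_bottom;  // last row of the opening, inclusive
    bool solid;       // one-sided wall: nothing behind it is visible
};

struct ColumnResult {
    int hits_processed;  // how many hits were reached before the column closed
    int cliptop;         // first row still open
    int clipbottom;      // last row still open (closed when cliptop > clipbottom)
};

// Processes one screen column front to back. Each column is independent of
// every other column, so one GPU thread per column could run this.
ColumnResult trace_column(const std::vector<ColumnHit>& hits, int screen_height) {
    int cliptop = 0;
    int clipbottom = screen_height - 1;
    int processed = 0;
    for (const ColumnHit& hit : hits) {
        if (cliptop > clipbottom) break;  // column fully covered
        ++processed;
        // A renderer would draw this hit's walls and planes here, limited
        // to the rows in [cliptop, clipbottom].
        if (hit.solid) {
            clipbottom = cliptop - 1;  // close the column
            break;
        }
        // Narrow the open window to what is visible through the opening.
        cliptop = std::max(cliptop, hit.open_top);
        clipbottom = std::min(clipbottom, hit.open_bottom);
    }
    return {processed, cliptop, clipbottom};
}
```

In a CUDA port this function would presumably become device code with one thread per column, at the cost of neighbouring columns repeating much of the same traversal.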

Edited April 13, 2017 by dpJudas

machine6 · April 13, 2017

6 hours ago, dpJudas said:

On my system I'm using a Haswell i7 with 4 cores (8 threads) - theoretically that should give up to 4x-8x the speed of the original ZDoom renderer. However, because of the cons listed above, keeping all cores busy is very difficult. The speed improvement depends a lot on the scene, but roughly half the potential speed increase is lost in setup costs and cores waiting for work.

We certainly didn't expect this to be easy. I guess managing the work distribution properly is going to be one of the difficulties of this project. But even if we could get a 10-15% improvement in performance, that'd be awesome.

Ray tracing also seems like an interesting approach to take; will post updates as we progress.

NinjaLiquidator · April 13, 2017

Linguica: Is it actually working or is it just a theory? Let's say I split the viewport in my game into a 2x2 grid (and somehow do a frustum clip to determine the objects for each portion) and do the rendering in 4 separate threads. Will OpenGL really gain 4x performance (minus the additional visibility tests and shit), or just 1x 'cause there is just one GPU and one CPU? Or are you somehow able to say which GPU cores are doing which viewport?
