Csonicgo

How optimizable is the Doom renderer?


(note to mods: if this belongs in sourceports, please move it there)

I've been dabbling in assembly lately, and my assembly professor (who is also famous for writing the chess program "Crafty") made the claim that if Doom were written in assembly language, it would have been fast even on a budget 386 (but hard to port).

So I ask: has anyone ever attempted to multithread any parts of the Doom renderer, or convert any parts to ASM? Have there been any other serious optimization efforts?

Csonicgo said:

So I ask: has anyone ever attempted to multithread any parts of the Doom renderer, or convert any parts to ASM? Have there been any other serious optimization efforts?


Affirmative on the first account. I have experimented with multithreading the drawing of wall and sky columns, sky and floor flats, and even masked columns (sprites, transparent textures) in Mocha Doom for at least a year now. You can find posts and reports about this all over DW.

I also tried several multithreading/work distribution strategies. In general, you get a measurable performance boost only in very specific situations, and then rarely over 10-15% at regular resolutions, since the time spent actually rendering to the screen is usually only a fraction of total frame processing. Higher resolutions make multithreading more appealing (I'm always talking about software rendering here), but still nothing to write home about.

Such situations include e.g. rendering many separate visible walls at high resolutions (provided you parallelize by walls and not by columns), or drawing thousands of sprites in something like NUTS.WAD (provided you have devised an efficient method for splitting work/dealing with overlapping/masked textures and avoiding overdrawing).
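A minimal sketch of the "parallelize by walls rather than columns" idea in C with POSIX threads. Everything here is hypothetical and invented for illustration (`wall_t`, `draw_wall`, the strided work split); it is not taken from Mocha Doom or any other port:

```c
#include <pthread.h>
#include <stddef.h>

typedef struct {
    int x1, x2;   /* screen column span of the wall */
    int drawn;    /* set once rendered (for illustration only) */
} wall_t;

typedef struct {
    wall_t *walls;
    size_t  count;
    size_t  stride;   /* total number of worker threads */
    size_t  offset;   /* this worker's index */
} wall_job_t;

/* Hypothetical per-wall renderer; a real port would texture-map the
 * wall's columns here. */
static void draw_wall(wall_t *w)
{
    w->drawn = 1;
}

/* Each worker takes every stride-th wall, so no two workers ever touch
 * the same wall and the wall list itself needs no locking. */
static void *wall_worker(void *arg)
{
    wall_job_t *job = arg;
    for (size_t i = job->offset; i < job->count; i += job->stride)
        draw_wall(&job->walls[i]);
    return NULL;
}

void render_walls_parallel(wall_t *walls, size_t count, size_t nthreads)
{
    pthread_t  tid[8];
    wall_job_t job[8];

    if (nthreads > 8)
        nthreads = 8;
    for (size_t t = 0; t < nthreads; t++) {
        job[t] = (wall_job_t){ walls, count, nthreads, t };
        pthread_create(&tid[t], NULL, wall_worker, &job[t]);
    }
    for (size_t t = 0; t < nthreads; t++)
        pthread_join(tid[t], NULL);
}
```

Note that this only splits the wall list; it does nothing about walls that overlap in screen columns or about masked/sprite ordering, which is exactly the hard part described above.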

With all the possible variations and experimentations I've tried in Mocha Doom, I never achieved speedups over 25% on a quad core, because in maps where multithreaded rendering would help, there are often non-rendering related bottlenecks as well (e.g. NOT rendering NUTS.WAD's sprites AT ALL results in 90 FPS at a given resolution, serial rendering results in 60 fps, best possible multithreaded method for rendering sprites resulted in around 70-72 fps, after I got around all stability/display glitches). Even if I could render in zero-time with infinitely many cores, there would still be at most a speedup of 90/60 = 1.5 aka 50%. An overall speedup of 16.7% with just 4 cores seems about right.

Read Amdahl's law about that.
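Plugging the NUTS.WAD numbers above into Amdahl's law: 90 fps with sprite rendering disabled vs 60 fps with it means sprite rendering (the parallelizable part in this experiment) accounts for 1 - 60/90 = 1/3 of the frame. A minimal sketch of the formula:

```c
#include <math.h>

/* Amdahl's law: overall speedup with n workers when a fraction p of
 * the frame is parallelizable and the remaining 1-p stays serial. */
double amdahl_speedup(double p, double n)
{
    return 1.0 / ((1.0 - p) + p / n);
}
```

With p = 1/3, four cores give at most about 1.33x even before any synchronization overhead, and infinitely many cores give the 1.5x ceiling quoted above; the measured ~1.17x (70 vs 60 fps) sits below the 4-core bound, as expected.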

Other people who work on ZDoom/prBoom/etc. (C/C++ based ports) occasionally reported maximum gains of 10% or so with their own multithreaded renderers, so the general feeling is that they are not worth the effort.

As for the second part of your question... it has been debated to death over here. The general opinion (and probably quite close to the truth) is that, from a language standpoint, Doom wouldn't gain much from being coded entirely in ASM instead of C: there would be next to zero benefits for the complex gameplay code (other than making it fucking hard to debug), and the final renderer functions for the DOS version were written in ASM anyway, so none the richer.


ZDoom features some optimized assembler functions for drawing columns. Unfortunately, they can only handle power-of-two texture heights (like the vanilla renderer) and produce tutti-frutti errors on NPO2 textures, something that has been fixed in every other port since Boom.
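The reason the power-of-two restriction causes tutti-frutti: the classic column inner loop wraps the texture coordinate with a bitmask (`& 127` for 128-texel-high textures), which only works when the height is a power of two. A simplified sketch of the two sampling strategies (function names are mine, not from any port):

```c
/* Vanilla-style texel lookup: the bitmask assumes a 128-texel-high
 * texture. For an NPO2 height such as 72 it can index past the end of
 * the column - that out-of-range read is the tutti-frutti artifact. */
static int sample_masked(int texel)
{
    return texel & 127;
}

/* Boom-style fix: a true modulo works for any column height (the
 * double-modulo keeps the result non-negative for negative inputs). */
static int sample_modulo(int texel, int h)
{
    return ((texel % h) + h) % h;
}
```

The mask survives in vanilla-style code because `&` is far cheaper than `%` on a 386, which is also why the restriction was there in the first place.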

Csonicgo said:

(note to mods: if this belongs in sourceports, please move it there)

I've been dabbling in assembly lately, and my assembly professor (who is also famous for writing the chess program "Crafty") made the claim that if Doom were written in assembly language, it would have been fast even on a budget 386 (but hard to port)


Carmack himself said it would have been about 15% faster, so forget about budget anything.

Gez said:

ZDoom features some optimized assembler functions for drawing columns. Unfortunately, they can only handle power-of-two texture heights (like the vanilla renderer) and produce tutti-frutti errors on NPO2 textures, something that has been fixed in every other port since Boom.


Errr?? I was under the impression that ZDoom artificially blows up NPO2 textures to PO2 with black banding (a method I also adopted in Mocha Doom), that it also safeguards masked textures against being drawn on single-sided sidedefs, and that it's prBoom/prBoom+ instead which has no such safeguards and still displays tutti-frutti à la vanilla Doom. I never saw tutti-frutti in ZDoom, either of the "NPO2" kind or of the "drawing masked textures on single-sided linedefs" kind.
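A hypothetical sketch of the black-banding safeguard described above: pad each NPO2 column up to the next power of two with palette index 0, so the renderer's bitmask wrap lands on a solid band instead of whatever follows the texture in memory. None of these names come from ZDoom or Mocha Doom:

```c
#include <stdlib.h>
#include <string.h>

/* Round n up to the next power of two (valid for 1..65536). */
static int next_pow2(int n)
{
    int p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Copy the real texels and pad the remainder with palette index 0.
 * The padding is the visible "black band"; the caller takes ownership
 * of the returned buffer. */
unsigned char *pad_column(const unsigned char *src, int h, int *out_h)
{
    int p = next_pow2(h);
    unsigned char *dst = malloc((size_t)p);

    memcpy(dst, src, (size_t)h);
    memset(dst + h, 0, (size_t)(p - h));
    *out_h = p;
    return dst;
}
```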

Porsche Monty said:

Carmack himself said it would have been about 15% faster, so forget about budget anything.

At the end of the day, you're only as fast as you can write to video memory. Pentium Pros actually sucked at writing to video memory at the speed Doom required, so on some setups a plain Pentium could outperform a PPro. This has always been a weird "fact", but Romero said it himself during the production of Quake.


A fun fact is that Carmack himself experimented with different core rendering functions during DOOM's development. The version of R_DrawColumn in the pre-beta EXE is totally different from either the C version or the ASM version in the final source release - it is much larger and seems to handle many different cases (or lengths) of columns as separate subroutines, indexing into them via a jump table.

Clearly this approach didn't work, either because it put too much limitation on the size of textures, or because the code didn't fit into cache.
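A toy illustration of such a jump-table scheme (emphatically not the actual pre-beta code): one fully unrolled routine per column length, dispatched through a function-pointer table. It trades the per-pixel loop test for a much larger code footprint, which is the cache problem suggested above. Real code would texture-map; this just fills with a solid color to stay short.

```c
#define MAXLEN 4

static unsigned char *col_dest;   /* destination pixels */
static unsigned char  col_color;  /* fill color for this sketch */

static void draw0(void) {}
static void draw1(void) { col_dest[0] = col_color; }
static void draw2(void) { col_dest[0] = col_color; col_dest[1] = col_color; }
static void draw3(void) { draw2(); col_dest[2] = col_color; }
static void draw4(void) { draw3(); col_dest[3] = col_color; }

/* One pre-built routine per column length, 0..MAXLEN. */
static void (*const column_jump[MAXLEN + 1])(void) =
    { draw0, draw1, draw2, draw3, draw4 };

void draw_column_len(unsigned char *dest, unsigned char color, int len)
{
    col_dest  = dest;
    col_color = color;
    column_jump[len]();   /* one indirect call instead of a loop */
}
```

A real 200-pixel-tall screen needs 200 such routines, each several instructions per pixel, which quickly exceeds a 386/486's few KB of cache - a plausible reason the approach was abandoned.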

Gez said:


Looks like the black-band safeguard I described kicks in sometimes, but not always O_o I must admit I never saw ZDoom doing that, or at least I never noticed. In most cases, the problem is masked by the black bands, rather than highlighted. That's what I call "tutti-frutti of the first kind", aka due to rendering NPO2 textures where they can't be clipped properly.

There's also the example of tutti-frutti on the wiki, for MAP01 of R.WAD, but that's what I call "tutti-frutti of the second kind", aka due to rendering masked textures on a single-sided sidedef.


Back then, if more of the game had been written in assembly, it would have been faster. Nowadays it's completely different: compilers are now so good that you would likely spend countless hours writing assembly only to find that your 'optimized' code is slower than the compiled C code.

I have tried multithreading the thinker code, but I had synchronization issues and I'm not familiar enough with the code base, so I gave up. The times it didn't crash, it didn't seem to make any real difference on a dual core with nuts.wad. Before that, all I had done was make rockets bounce off walls and ceilings, inverting their direction on impact and only exploding when hitting things. It was kind of fun.


These days, the conversion needed to get 24 bpp or 32 bpp screen draws through SDL, or some other graphics library, would take close to the texture draw time itself, limiting your possible speedup. You would have to convert the palette-based draws to screen bpp directly and write directly to video memory. But on some machines you cannot get direct access to video memory, so there is added video buffer copy time.
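The conversion pass in question is essentially one table lookup per screen pixel; a minimal sketch (function and parameter names are mine, not DoomLegacy's or SDL's):

```c
#include <stdint.h>
#include <stddef.h>

/* Expand an 8-bit paletted framebuffer to 32 bpp using a 256-entry
 * lookup table of packed ARGB values. This touches every pixel of the
 * frame once, which is why it can rival the column-drawing time. */
void blit_8to32(const uint8_t *src, uint32_t *dst, size_t npixels,
                const uint32_t pal[256])
{
    for (size_t i = 0; i < npixels; i++)
        dst[i] = pal[src[i]];
}
```

At 640x480 that is ~307k lookups and writes per frame on top of the actual rendering, which is the overhead being described.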

I have seen assembly in DoomLegacy (and other ports), but it is turned off by default.


The Doom renderer is quite clever in that it avoids overdraw to an anal degree, plus the vanilla renderer has pretty low limits after which it simply stops drawing or crashes. In particular, at vanilla resolution, the renderer will actually draw almost exactly 64000 pixels on the screen (give or take, due to minimal single-pixel overdraw caused by roundoff errors; I doubt there's even as little as 0.5% overdraw).

What really slows things down in maps like nuts.wad are the sheer engine calculations, before anything is even drawn to the screen: even if you turn on the automap or use -noblit, it will still be slow as molasses, and even if you had an infinitely optimized renderer that could draw everything in 1 clock cycle (a hardware DOOM ASIC, perhaps?), updating and garbage-collecting the 10000 monsters of nuts.wad AND all of those spawning and vanishing projectiles would make this moot. Maps with ultra-complex architectural detail also bog down the BSP traverser and the column-rendering code (again, before actually blitting to the screen).

Parallelizing the map objects code is an even more daunting task, and it's not possible unless you don't care about demo compatibility, or even about the ability to record demos at all. In fact, you'd need to introduce very frequent forced thread barriers, which would destroy concurrency, just to be able to record ANY sort of demo that would play back with a semblance of consistency, let alone play back vanilla demos. Even with demo compatibility/playback/recording out of the way, you'd still need to add a lot of safeguard checks (e.g. what if a monster A attacking another monster B is killed by B first on another thread? More synchronization losses, and goodbye concurrency).


I'm not sure how modern games do this; my initial approach would be to create an "action queue" where actors put actions into a synchronized queue, then a single thread processes as much of the queue as it can every tic. While on the one hand you might get significant gains because most actors aren't doing anything 99% of the time, a lot of these threads focus on nuts.wad or similar WADs where you have thousands of monsters active at a time, so you still have the bottleneck of the single queue-processing thread. Additionally, maintaining demo compatibility would probably still be a challenge. Overall, demo compat, complexity, and the effort required probably mean it's not worth it.
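A minimal sketch of that action-queue idea in C with a POSIX mutex; all names are hypothetical and the "actions" are plain function pointers for illustration:

```c
#include <pthread.h>

#define QUEUE_MAX 1024

typedef struct {
    void (*fn)(void *);
    void  *arg;
} action_t;

static action_t        queue[QUEUE_MAX];
static int             queue_len;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

/* Called by actor threads; returns 0 if the queue is full. */
int enqueue_action(void (*fn)(void *), void *arg)
{
    int ok = 0;

    pthread_mutex_lock(&queue_lock);
    if (queue_len < QUEUE_MAX) {
        queue[queue_len++] = (action_t){ fn, arg };
        ok = 1;
    }
    pthread_mutex_unlock(&queue_lock);
    return ok;
}

/* Called once per tic on the single processing thread, preserving the
 * one serialization point the post describes. */
void run_queued_actions(void)
{
    pthread_mutex_lock(&queue_lock);
    for (int i = 0; i < queue_len; i++)
        queue[i].fn(queue[i].arg);
    queue_len = 0;
    pthread_mutex_unlock(&queue_lock);
}

/* Example action used below: increments an int. */
void bump_counter(void *arg)
{
    (*(int *)arg)++;
}
```

The single consumer keeps action order deterministic (good for demos) but, as noted, it is exactly the bottleneck that nuts.wad-style maps hit.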


If you want optimization, you should consider just rendering with OpenGL or DirectX. The software renderer is worthless and really should not even be worked on.


I'm in shock that someone would say software is worthless D: The DOOM software renderer is a marvel for being so fast and, as Maes stated, psychotically avoiding overdraw. I personally prefer using vanilla DOOM in DOSBox, with DOSBox blitting/stretching/aspect-correcting via OpenGL. Regardless, optimizing the DOOM renderer is useless if you're going to do a hardware renderer, because that depends purely on the implementation, and if your implementation sucks, you're screwed. Not only that, but how far can you optimize a hardware renderer?


Something I always wondered is whether the software Doom renderer could be accelerated by using 2D drawing primitives: if static graphics, but also texture/sprite columns or even entire sprites, could be scaled and blitted directly by the hardware, provided you could organize the rendering appropriately. Some things like the automap or console, which use solid-color fills, custom fonts and simple line graphics, would lend themselves well to this sort of rendering.

For 3D rendering, it would require at least per-column pipelining before actually drawing anything (something which is done anyway for parallelized renderers, in one form or another), but if done right it could give hardware acceleration with a "pure" software look and make certain tricks like enhanced color depth easier. The question is whether the overhead would kill any performance gains: zerg-rushing the low-level drawing subsystem with a million span- and column-draw requests per second probably does not fall into "intended use" territory.
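The per-column pipelining could look something like this sketch: accumulate column-draw commands during BSP traversal, then flush the whole batch to a (hypothetical) 2D blitter in one pass instead of a million tiny calls. Every name here is invented for illustration:

```c
#include <stddef.h>

#define BATCH_MAX 4096

/* One scaled column-blit request. */
typedef struct {
    int   screen_x, y_top, y_bottom;   /* destination span */
    const unsigned char *texels;       /* source texture column */
    int   tex_h;                       /* source column height */
} col_cmd_t;

typedef struct {
    col_cmd_t cmd[BATCH_MAX];
    size_t    len;
} col_batch_t;

/* Returns 0 when the batch is full; the caller must flush first. */
int batch_push(col_batch_t *b, col_cmd_t c)
{
    if (b->len == BATCH_MAX)
        return 0;
    b->cmd[b->len++] = c;
    return 1;
}

/* Hand all accumulated commands to the blitter; returns how many. */
size_t batch_flush(col_batch_t *b, void (*blit)(const col_cmd_t *))
{
    size_t n = b->len;

    for (size_t i = 0; i < n; i++)
        blit(&b->cmd[i]);
    b->len = 0;
    return n;
}

/* Example blitter used below: just counts invocations. */
static size_t blits_done;
static void count_blit(const col_cmd_t *c)
{
    (void)c;
    blits_done++;
}
```

The flush point is where a real implementation would submit to the 2D hardware; whether one submission per frame amortizes the per-call overhead is exactly the open question in the post.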


Another question, why would you want Doom to even have that "pure" software look? Nostalgia? It looks like garbage.

TIHan said:

Another question, why would you want Doom to even have that "pure" software look? Nostalgia? It looks like garbage.


You think it looks like garbage; others may not. Pretty simple, really.

TIHan said:

If you want optimization, you should consider just rendering with OpenGL or DirectX. The software renderer is worthless and really should not even be worked on.


Implying DirectX is at all optimized is a pretty big point against you imo, so I'm just going to have to call you out right about now as one of those in the "uneducated" class. I'm frankly pretty surprised that you think your opinion - that what has been a core part of this, a videogame with a dedicated mod community that has lasted longer than some of its members' lifetimes, should be done away with entirely - should be ranked above everyone else's opinions and treated as fact.

So in other words to me you rank about the same as an average GvH player and hackers on Skulltag Zombie Escape servers

TIHan said:

Another question, why would you want Doom to even have that "pure" software look? Nostalgia? It looks like garbage.


imo the software renderer for vanilla Doom looks 1000000000000x better than shitty blurry openGL bilinear quadrilateral parallelofuck rendering and it doesn't remind me of a shitty "high resolution pack" project on ZDF in the hq2x or 3x or whatever the fuck filter modes in GZDoom.


I love how "blurry" is used by both the OpenGL aficionados and the software purists to describe the look of the other renderer.

Gez said:

I love how "blurry" is used by both the OpenGL aficionados and the software purists to describe the look of the other renderer.


I don't get at all what's blurry about Doom in 1366x768 with no x-linear filtering.

Gez said:

I love how "blurry" is used by both the OpenGL aficionados and the software purists to describe the look of the other renderer.


I think it's funny how they use that term, because it precisely describes the opposite of how software rendering looks. There is absolutely nothing "blurry" about pixels. Pick another word or shut up! Pixelated, sharp, whatever, just don't call it blurry!


I have no dog in this fight (nor an opinion one way or the other), but using intermediary colors to smooth a line seems like a pretty clear-cut (no pun intended) case of "blurry" to me.
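For what it's worth, those "intermediary colors" are just weighted averages: a linear blend between two texels produces shades neither texel contains, which is exactly the perceived blur. A minimal one-dimensional sketch (a nearest-neighbor sampler, i.e. the "software look", would only ever return `a` or `b`):

```c
#include <stdint.h>

/* Blend two 8-bit texel values; t in [0,1] picks the mix.
 * t = 0 gives a, t = 1 gives b, anything between is a new shade. */
uint8_t lerp_texel(uint8_t a, uint8_t b, double t)
{
    return (uint8_t)(a + (b - a) * t + 0.5);
}
```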


"BUT QONCEPT, L0L, THE FAEC ON TEH RIGHT SI MORE PHOTORELASITC AND DETAILD COZ ITS HIGHER RESOLTION"


I love the original look; it has a certain character. Same with 2D MAME games - never understood the hard-on for N64-ing the crazy good artwork of the old titles. Beware of smear campaigns.

TIHan said:

OpenGL
(image)

Software
(image)

i fail to see the difference (other than the pinkies)

HavoX said:

i fail to see the difference (other than the pinkies)

Then ignorance will forever be you. :)


Looks like OpenGL without any blurring/texture filtering on, which is quite an unusual mode. It's well-known (?) that you can do that and have "crisp OpenGL", OK.

Some other giveaways are the proper 3D perspective (look at how the door's "glass brick" pillars appear slanted in the 1st screenshot; you CAN'T have that with a pure vanilla-like renderer), and the lack of distance-lighting discoloration on the pinkies.

The problem is that you don't always have that degree of control over the settings, and then there are always subtle or even major differences in the way distance/sector lighting works etc. which may not be everybody's cup of tea.

Plus when people speak of "OpenGL" they usually mean the filtered "non pixelated" kind, not the one that tries too hard to look like software mode ;-)


So I guess we need a new term for OpenGL "pixelated". Perhaps just "pixelated OpenGL" would do fine.

Yes, we can see these differences between the two, which Maes noted: "'glass brick' pillars appear slanted in the 1st screenshot; lack of distance-lighting discoloration on the pinkies; major differences in the way distance/sector lighting works". All of which OpenGL is seriously superior to in every way. Actually, I take that back - there was an engine modification a while back where someone made the software renderer true-color. Pretty cool, and a great exercise.

But the main point at the beginning of this thread is optimization of the software renderer, which, as was pointed out, wouldn't gain much across multiple CPUs. OpenGL, according to these screenshots, gives you almost a 250-300% performance boost, compared to a potential max of 20% for a multicore software renderer; oh, and OpenGL looks great, with (potentially) a more maintainable renderer code base. Now, this performance boost percentage will fluctuate in a lot of maps, but even these screenshots give us a good indication that OpenGL is far more optimized than any software renderer could be. :)

Actually, some good multicore optimization should go into the AI.

