VGA

Curious about the engine's speed

I watched the unplayably low framerate in that and since there are some people here that are familiar with the engine's innards, I thought I'd ask.

With the potential optimizations in the various ports and the general knowledge about the engine like, for example, that bug which caused extra frames to be rendered, could a DOS demo-compatible version be made that would run playably on lowspec machines of the time?

I guess there could be an additional advantage in using a more modern compiler and a better DOS extender like DOS/32.


I once calculated that at least on a 486 CPU, Doom needed about 32 CPU instructions per pixel rendered. The formula is quite accurate on other CPUs with a similar efficiency (1 instruction per clock), if using the vanilla engine, and a tad more efficient on some console ports, at the expense of static limits and complexity.

Without any sort of hardware assistance, I don't think it could get much better than that. Scene demos from the same era seldom had as complex visuals as Doom, and many competing FPS games struggled to get playable frame rates with visuals and engines that often didn't hold a candle to Doom.
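To put the 32-instructions-per-pixel figure into numbers, here is a back-of-the-envelope sketch. It assumes the whole 320x200 screen counts as rendered pixels, one instruction per clock, and no time spent on game logic or sound, so it is an optimistic ceiling rather than a measurement. The function name and the clock speeds are only illustrative.

```python
def estimated_fps(clock_hz, width=320, height=200,
                  instructions_per_pixel=32, instructions_per_clock=1.0):
    """Rough ceiling on frame rate if the CPU did nothing but draw pixels."""
    instructions_per_frame = instructions_per_pixel * width * height
    return clock_hz * instructions_per_clock / instructions_per_frame


# Illustrative 486-class clock speeds (assumed, not taken from the thread).
for mhz in (25, 33, 66):
    fps = estimated_fps(mhz * 1_000_000)
    print(f"{mhz} MHz: {fps:.1f} fps ceiling at 320x200")
```

Under these assumptions a 33 MHz part tops out at roughly 16 fps, which leaves very little headroom for a slower CPU.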


Not convinced you can get much better than what Vanilla Doom already achieves. There are probably small optimizations you can make to improve things slightly but the overall experience isn't likely to get much better.

fraggle said:

Not convinced you can get much better than what Vanilla Doom already achieves. There are probably small optimizations you can make to improve things slightly but the overall experience isn't likely to get much better.


It'd be an interesting experiment nonetheless if someone were to try.


Also, focusing optimizations only on the renderer (e.g. unrolling, parallelization etc.) yields at best an average improvement of a few percent, and is only noticeable at super-vanilla resolutions. But there are other sources of inefficiency that are more deeply ingrained in the engine, e.g. the thinker and BSP code. I remember once debugging the traversal code and finding that the player shooting an imp in the E4M2 demo resulted in a ridiculous number of recursive calls, like 20 in a tic just for ONE shot O_o


That reminds me, were Boom / MBF actually any faster than vanilla on old 486s and so forth, or was their talk of speeding up the code just bloviation?


AFAIK the only part of the code which claimed to have a "3x speedup" was replacing the O(N) lump search function with a hashtable. For the rest of the code, I don't know. I wouldn't be surprised if it was actually slower on a 386/486, as by the time Boom was released, those machines weren't exactly mainstream anymore, and compilers were more likely to optimize for the (not so) newfangled Pentium and derivatives.
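For anyone wondering what the lump lookup change amounts to, here is a minimal sketch of the idea, not Boom's actual code. The hash function, the helper names and the sample lump list are all made up for illustration. The linear version scans from the end so that later lumps override earlier ones, and the chained version keeps that behaviour by putting newer lumps at the head of each chain.

```python
def find_lump_linear(names, wanted):
    """O(N) scan from the end, so a later lump overrides an earlier one."""
    wanted = wanted.upper()
    for index in range(len(names) - 1, -1, -1):
        if names[index] == wanted:
            return index
    return -1


def lump_hash(name, bucket_count):
    """Hypothetical string hash; any reasonable hash would do here."""
    h = 0
    for ch in name.upper():
        h = (h * 3 + ord(ch)) & 0xFFFFFFFF
    return h % bucket_count


def build_chains(names):
    """Build per-bucket chains: first[bucket] -> index, nxt[index] -> index."""
    count = len(names)
    first = [-1] * count
    nxt = [-1] * count
    for index, name in enumerate(names):
        bucket = lump_hash(name, count)
        nxt[index] = first[bucket]
        first[bucket] = index  # newest lump becomes the head of its chain
    return first, nxt


def find_lump_hashed(names, first, nxt, wanted):
    """Walk only the chain of one bucket instead of the whole directory."""
    if not names:
        return -1
    wanted = wanted.upper()
    index = first[lump_hash(wanted, len(names))]
    while index != -1 and names[index] != wanted:
        index = nxt[index]
    return index


# Made-up directory with a duplicate name, stored in upper case.
lumps = ["PLAYPAL", "E1M1", "THINGS", "E1M2", "THINGS"]
first, nxt = build_chains(lumps)
print(find_lump_linear(lumps, "things"), find_lump_hashed(lumps, first, nxt, "things"))
```

Both lookups return the same index; the difference is only in how many names get compared per call.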

Linguica said:

That reminds me, were Boom / MBF actually any faster than vanilla on old 486s and so forth, or was their talk of speeding up the code just bloviation?

They were but it was initially squandered by two things:

  • Use of Allegro which didn't have support for page flipping like vanilla used. So BOOM was running in vanilla Mode 13h (like Heretic and Hexen do) and as a result the game "felt" slower visually and had screen tearing artifacts.
  • The optimizer was not fully deployed.
Lee Killough fixed both of these for MBF, and as a result it seemed much faster on my 486.


OK, I read some threads and I understand a bit more.

So a user called gerwin at the dosbox forum started investigating the situation of the DOS source ports and found that vanilla Doom used this fabled Mode X method, which gives it an advantage over the ports (at least on 1993 hardware), because the linuxdoom source doesn't contain the magic code.
http://www.vogons.org/viewtopic.php?f=7&t=40699
https://groups.google.com/forum/?hl=en#!msg/alt.games.doom/3tMB2UmEBK0/m1VR6LiJRQMJ

Then he made a lot of improvements to MBF and released an unofficial update:
http://www.vogons.org/viewtopic.php?f=24&t=40857

Seems he is still working on it.


For the fastest possible .exe, I'd consider these points first:

Demo-preserving:
. Resolve the proper video mode, and use proper double/quad buffering
. Unrolled, and/or self-modifying repeated inner-loop rendering of walls, sprites, flats.
. SoMs flat math improvement, eliminating a shift/addition step
. chained hash lump lookup
. precache all textures, flats, sprites, and sounds (bump up heap size)
. Custom, optimized sound mixing code
. quicker visplane lookup/hash
. better sprite sort (quicksort/heapsort)

Demo-breaking:
. Align structures to allow aligned memory reads/writes
. Use fixed version of Heretic's sight algorithm
. Move mobj allocation out of Z_Malloc, into a separate custom block allocator
. line iterator optimization/possible split into specialized routines

MBF does some of these things already.
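As an illustration of the "better sprite sort" item, here is a small sketch. It assumes, from memory, that the vanilla approach repeatedly picks the smallest-scale sprite out of an unsorted list, which is O(n^2). The replacement shown uses Python's built-in sort (Timsort, not the quicksort/heapsort named above), and the sprite tuples are invented for the example. Because both versions keep ties in their original order, the draw order is identical.

```python
def selection_sort_by_scale(sprites):
    """O(n^2): repeatedly pull out the sprite with the smallest scale."""
    remaining = list(sprites)
    ordered = []
    while remaining:
        best = 0
        for i in range(1, len(remaining)):
            # Strict comparison: the first of several equal scales wins.
            if remaining[i][1] < remaining[best][1]:
                best = i
        ordered.append(remaining.pop(best))
    return ordered


def fast_sort_by_scale(sprites):
    """O(n log n) stable sort on the same key."""
    return sorted(sprites, key=lambda sprite: sprite[1])


# Hypothetical (name, scale) pairs; smaller scale means farther away.
visible = [("imp", 0.50), ("barrel", 0.25), ("medikit", 0.50), ("demon", 0.10)]
print(selection_sort_by_scale(visible) == fast_sort_by_scale(visible))
```

The gain only matters in scenes with many visible sprites, since for a handful of entries the quadratic loop is already cheap.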

VGA said:

OK, I read some threads and I understand a bit more.

So a user called gerwin at the dosbox forum started investigating the situation of the DOS source ports and found that vanilla Doom used this fabled Mode X method...

Not Mode X; Mode Y. Mode Y is a variant of Mode 13h with VGA memory unchained. What this implies is 4 pages of non-linear VRAM, rather than the one page of linear VRAM that normal Mode 13h has. So while blitting to VRAM is more complicated just as it is in Mode X, you can do very fast page flipping just by changing the framebuffer address register.


The main difference between Mode X and Mode Y is the resolution. Mode X is 320x240. Mode Y is 320x200.
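The unchained layout described above can be modelled in a few lines. This is a toy simulation, not VGA register code: it assumes four 64 KB planes, with pixel x stored in plane x & 3 at byte offset y * 80 + (x >> 2), and it packs the pages back to back at 16000-byte intervals purely for simplicity. The class and method names are invented for the sketch.

```python
PLANES = 4
PLANE_SIZE = 65536                   # bytes per plane on a 256 KB VGA
WIDTH, HEIGHT = 320, 200
BYTES_PER_ROW = WIDTH // PLANES      # 80 bytes per scanline in each plane
PAGE_SIZE = BYTES_PER_ROW * HEIGHT   # 16000 bytes per plane for one page


class UnchainedVGA:
    """Toy model of a 320x200 unchained (planar) 256-colour mode."""

    def __init__(self):
        self.planes = [bytearray(PLANE_SIZE) for _ in range(PLANES)]
        self.start_address = 0  # stands in for the display start address

    def locate(self, x, y, page):
        """Return (plane, byte offset) of a pixel on the given page."""
        return x & 3, page * PAGE_SIZE + y * BYTES_PER_ROW + (x >> 2)

    def put_pixel(self, x, y, color, page):
        plane, offset = self.locate(x, y, page)
        self.planes[plane][offset] = color

    def flip_to(self, page):
        """Page flip: nothing is copied, only the start address changes."""
        self.start_address = page * PAGE_SIZE

    def visible_pixel(self, x, y):
        offset = self.start_address + y * BYTES_PER_ROW + (x >> 2)
        return self.planes[x & 3][offset]


vga = UnchainedVGA()
vga.put_pixel(5, 1, 42, page=1)          # draw on the hidden page
before = vga.visible_pixel(5, 1)         # page 0 is still displayed
vga.flip_to(1)                           # show page 1
print(PLANE_SIZE // PAGE_SIZE, before, vga.visible_pixel(5, 1))
```

Four 16000-byte pages fit in each 64 KB plane, which is where the "4 pages" figure comes from, and the flip costs the same no matter how much was drawn.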

Quasar said:

The main difference between Mode X and Mode Y is the resolution. Mode X is 320x240. Mode Y is 320x200.

I never heard of that distinction before. Every documentation I read calls any resolution of that kind mode x. See the setup tool (bsetup.exe) for the Build editor for instance. It reads:
Chain mode (Mode X) | VGA compatible | 256*200 to 400*540
Also not selectable in the game setup tool, those can be used nevertheless, by editing the well commented .cfg file.

LogicDeLuxe said:

I never heard of that distinction before. Every documentation I read calls any resolution of that kind mode x. See the setup tool (bsetup.exe) for the Build editor for instance. It reads:
Chain mode (Mode X) | VGA compatible | 256*200 to 400*540
Also not selectable in the game setup tool, those can be used nevertheless, by editing the well commented .cfg file.

I guess the usage is limited to certain domains then. The most generic term is "VGA tweaking." Mode X as originally invented by Abrash specifically referred to a 320x240 variant he found which had very desirable qualities. I suppose like most terms ("source port" being a homegrown variety) it was prone to grow in meaning to cover the entire tweaked video mode phenomenon.


So that special mode they used is the fastest possible for most hardware combinations around 1993?

Why was it changed for the linux version?

If someone could reimplement it in the linuxdoom source, would it finally be at the same speed as 1.9?

VGA said:

So that special mode they used is the fastest possible for most hardware combinations around 1993?


It was "the fastest possible" for certain kinds of visuals, e.g. it allowed writing up to 4 pixels at once with a single write (ideal for fast solid polygon fills, or for Doom's low-detail mode), smooth scrolling (useless for Doom), and double/triple buffering.
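The "up to 4 pixels per write" point comes from the plane write mask: one byte written by the CPU lands in every plane the mask enables. A rough simulation of a solid span fill, with the mask modelled as a plain 4-bit integer and all function names invented for the sketch:

```python
BYTES_PER_ROW = 80  # 320 pixels spread over 4 planes


def write_byte(planes, offset, color, map_mask):
    """One CPU write: the byte is stored in every plane enabled in the mask."""
    for plane in range(4):
        if map_mask & (1 << plane):
            planes[plane][offset] = color


def fill_span(planes, y, x_start, x_end, color):
    """Fill pixels x_start..x_end-1 on row y; return the number of writes."""
    if x_end <= x_start:
        return 0
    writes = 0
    for column in range(x_start >> 2, ((x_end - 1) >> 2) + 1):
        mask = 0
        for plane in range(4):
            # Enable only the planes whose pixel falls inside the span.
            if x_start <= column * 4 + plane < x_end:
                mask |= 1 << plane
        write_byte(planes, y * BYTES_PER_ROW + column, color, mask)
        writes += 1
    return writes


planes = [bytearray(65536) for _ in range(4)]
print(fill_span(planes, 0, 0, 320, 7))   # full row
print(fill_span(planes, 1, 2, 9, 9))     # short span with partial edges
```

A full 320-pixel row takes 80 writes instead of 320. That only helps when neighbouring pixels share a colour, which is why it suits solid fills and doubled pixels far better than textured columns.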

VGA said:

Why was it changed for the linux version?


Simply because on a "proper" OS, you just can't bang on the hardware directly as you could do on DOS with your own ASM functions: you have to be a good boy and draw on a framebuffer, and let the OS take care of it. There would need to be a Mode X driver for linux which would allow apps to "see" the VGA RAM directly in order to use multi-pixel writes, and provide functions for page flipping etc.

VGA said:

If someone could reimplement it in the linuxdoom source, would it finally be at the same speed as 1.9?


It would regain some speed, but it still wouldn't be as fast as DOS on the same hardware and at screen resolution parity for a variety of reasons. To begin with, most variants of Linux (or UNIX) wouldn't even run on a DOS gaming PC that would run Doom, so how could they ever be "as fast"? That'd be like expecting Doom95 to be as fast as the DOS version of Doom, on similar hardware.

Let's face it, Doom was (almost) as fast as it could have been. No other FPS game with comparable visuals could run as smoothly on spec parity.


No, I meant if someone could reimplement that mode and create a DOS version, it would be a good start for a DOS source port optimized for weaker CPUs, like the 386SX, still with a 4mb mem requirement of course.

Boom, even with the optimizations, is still slower than 1.9, so it's an uphill battle.

VGA said:

No, I meant if someone could reimplement that mode and create a DOS version, it would be a good start for a DOS source port optimized for weaker CPUs, like the 386SX, still with a 4mb mem requirement of course.


If someone could re-implement the DOS version of Doom it would be exactly as fast as the original DOS version of Doom. I don't know if a minimal DOS source port which is just the linuxdoom source code + a VGA Mode X driver similar to what vanilla Doom used exists, but it could be done.

However, no amount of optimization can bridge that gap, at least not without starting to sacrifice demo compatibility (e.g. by simplifying data structures, lowering static limits, and trying to reduce memory usage and cache thrashing).

Since I mentioned cache, keep in mind that those weaker CPUs also had other handicapping factors: the 386SX had only a 16-bit memory bus and no on-chip cache, not even support for onboard cache, and mobos for it only had plain ISA VGA, no PCI or even VESA Local Bus. Shaving off a few CPU cycles here and there won't make a difference under those conditions. You'd have quite literally half the memory bandwidth to work with, and no cache-induced acceleration even in small loops like rendering or sound.

