Redneckerz

FastDoom: DOS Vanilla Doom optimized for 386/486 processors

7 hours ago, zokum said:

A decent EGA mode should use dithering to improve the graphics. A palette hack would look pretty bad in comparison to what would be possible to do if an algorithm used dithering.

 

Dithering only really works well if you are willing to trade off resolution for color depth (which I guess you are, if you're looking into such solutions...). This can be mitigated somewhat if you can increase the resolution to compensate (EGA has a 640 x 200 16-color mode, so you could say that two horizontal pixels in that mode "count" as a normal pixel), but there goes that pesky planar display problem again :-)

 

If you are trying to do such a dithering mode for the heck of it, to see how it would look, OK. But if you're hoping for any speed gains or a "What if DOOM supported EGA back in the day?" inquiry, well, in the latter case the answer is that it wouldn't: it would just be too damn slow. :-)

 

Interestingly enough, CGA actually had packed pixels instead of bit planes. It's still a chore to single out a specific pixel within a byte (they are arranged somewhat like p1|p1|p2|p2|p3|p3|p4|p4, at 2 bits per pixel in 320 x 200 mode), but doing a bit of masking is certainly better than writing to 2 or 4 different planes every time...

 

...but still, it will be slower than just writing a whole byte-wide pixel to memory, exactly where you want it :-)
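
To make the masking concrete, here is a minimal sketch of a single-pixel write into a packed 2-bits-per-pixel row. The function name is made up for illustration, and it assumes the leftmost pixel of each byte sits in the two highest bits, as in the CGA 320 x 200 mode. It works on a plain memory buffer and ignores the card's interleaved scanline layout.

```c
#include <stdint.h>

/* Hypothetical helper: write one 2-bit pixel into a packed row
   (4 pixels per byte, leftmost pixel in bits 7-6). */
void cga_put_pixel(uint8_t *row, int x, uint8_t color)
{
    int shift = (3 - (x & 3)) * 2;            /* bit position of pixel x in its byte */
    uint8_t mask = (uint8_t)(0x03 << shift);  /* the two bits that belong to pixel x */
    uint8_t *p = &row[x >> 2];                /* byte that holds pixel x */

    /* Read-modify-write: clear the old pixel, then OR in the new one. */
    *p = (uint8_t)((*p & ~mask) | ((color & 0x03) << shift));
}
```

The read-modify-write on every pixel is exactly the extra cost compared with a byte-per-pixel mode, where the write is a single store.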

Edited by Maes


There's also CGA analogue colour... 16 colours through 1-bit dithered output


You can do even more (up to 1K) with CGA composite output/artifact colors, as the 8088 MPH demo shows:

 

 

The problem is that, just like Amiga HAM mode, you're constrained to specific colour transitions, so it's not a suitable method for rendering moving images of arbitrary complexity (except for scrolling/rolling effects). It has more in common with an image compression method than with a display mode.

 

So, once again, if you wanted to use this mode for Doom, you'd have to pay a hefty image processing "fee" for every frame.

Edited by Maes


You're right @Maes, CGA and EGA native modes will be slower than VGA, but I think it's possible to get decent performance by using a backbuffer: render everything onto it and then transfer all the data to the video card. The main problem with CGA and EGA cards is that the 8-bit ISA bus bandwidth is very limited, and the 320x200 resolution is too much for those cards. Even for 16-bit ISA VGA cards the 320x200 resolution is too much. Maybe the non-standard CGA 160x100 mode will be fast enough. I will finish the text mode first, and then try to implement one of those modes.

 

@zokum Dithering can be done, but it will be slower for sure. Maybe it's doable with a fast 486, but I'm pretty sure that it is a no-no for 386 processors. Some CGA tricks would also work, but they limit the number of compatible setups (only composite NTSC monitors are supported).

 

BTW I've tested the new modes with a very old Trident EGA card and a "fast" processor (486DX@50), and it runs pretty smoothly. The major factor limiting the framerate is still the processor.

 

 

Edit: the text mode can be faster. For now I'm using a translation table that converts the 256-color output to 16 colors, and that table is recalculated every time the palette changes. The color mapping can also be better; I'm struggling to find the best conversion formula.

Edited by viti95


Well with text mode, dithering is even easier since you have 25%, 50% and 75% shades (extended ASCII characters 0xB0-0xB2) to play with (I forget if you're allowed to use high-intensity colors for the background - you can't for ENDOOM IIRC, the intensity bit for BG is repurposed to mean "this text flashes" - so this might be more limited than it initially looks). To figure out what each dithered color looks closest to, first you need to take the two colors you're dithering and raise each red, green and blue value to the power of your gamma (usually 2.2 for IBM PCs, probably different if you're futzing with Doom's built-in gamma control but I don't know what values it adjusts the gamma to), which takes them into linear space. Then do your blending there (so for 50%, you'd add 0.5*R of one color to 0.5*R of the second, and likewise for G and B; for 25% you'd use 0.25*R/G/B of the foreground color with 0.75*R/G/B of the background; etc). Finally, go back to gamma-encoded space by taking the [gamma]-root of the resulting red/green/blue values (i.e. the 2.2-root of the value, or the value to the power of 1/2.2).
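
As a rough sketch of that blend, here is an integer-only version for one color channel. Note the loud assumption: it uses gamma 2.0 (square, then square root) as a stand-in for 2.2 so that no pow() call is needed. The function names are invented for this example, and `weight4` is the foreground coverage in quarters (1, 2 or 3 for the 25%, 50% and 75% shade characters).

```c
#include <stdint.h>

/* Integer square root (floor), digit-by-digit method.
   Serves as the inverse of the gamma-2.0 curve below. */
uint32_t isqrt32(uint32_t v)
{
    uint32_t r = 0;
    uint32_t bit = (uint32_t)1 << 30;   /* highest power of 4 in 32 bits */

    while (bit > v)
        bit >>= 2;
    while (bit != 0) {
        if (v >= r + bit) {
            v -= r + bit;
            r = (r >> 1) + bit;
        } else {
            r >>= 1;
        }
        bit >>= 2;
    }
    return r;
}

/* Blend one 8-bit channel: weight4 quarters of fg over bg,
   mixed in (approximately) linear light. ASSUMPTION: gamma 2.0, not 2.2. */
uint8_t blend_channel(uint8_t fg, uint8_t bg, unsigned weight4)
{
    uint32_t lin_fg = (uint32_t)fg * fg;   /* encoded -> linear, scaled by 255*255 */
    uint32_t lin_bg = (uint32_t)bg * bg;
    uint32_t mixed = (lin_fg * weight4 + lin_bg * (4 - weight4)) / 4;

    return (uint8_t)isqrt32(mixed);        /* linear -> encoded */
}
```

With this approximation a 50% mix of 255 and 0 comes out as 180 instead of the naive average of 127; a true 2.2 gamma would give roughly 186.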

 

As for how to match up the resulting blended colors with the actual Doom palette, nearest-neighbor search, of course. Distance from one color to the next is sqrt((R1-R2)^2+(G1-G2)^2+(B1-B2)^2), find the smallest resulting value (though I don't know if this is best done in gamma-corrected space or not; experimentation probably required). A naïve implementation would be O(3S), where S is the number of resulting combinations of colors (at most 784 - 16 EGA colors plus 256 combinations of those 16 colors at 25%, 50% and 75% each - but realistically way less because of redundancy, like combinations that use the same input as color 1 and color 2 are just one of the original 16 colors, a 50% combination of color 1 and color 2 is the same as a 50% combination of color 2 and color 1, a 25% combination of color 1 and color 2 is the same as a 75% combination of color 2 and color 1, etc). All of this is to say, you're best off calculating this once at startup and then referencing the results from a table somewhere. All you'll really need is what two colors to use (since both are 4-bit, you can do that in a single byte - use the high nybble for color 1 and low nybble for color 2) and what character to use (probably can't get around using a whole byte here sadly).
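
A minimal sketch of that lookup, under a few assumptions: the type and function names are hypothetical, the candidate list (the 16 base colors plus whatever dithered blends you keep) is built elsewhere, and the comparison uses the squared distance, since skipping the square root does not change which candidate is closest.

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b; } rgb_t;

/* One precalculated table entry: attribute byte (color 1 in the high
   nybble, color 2 in the low nybble) plus the character to draw. */
typedef struct { uint8_t attr; uint8_t ch; } cell_t;

/* Squared Euclidean distance between two colors. */
uint32_t dist2(rgb_t a, rgb_t b)
{
    int32_t dr = (int32_t)a.r - b.r;
    int32_t dg = (int32_t)a.g - b.g;
    int32_t db = (int32_t)a.b - b.b;

    return (uint32_t)(dr * dr + dg * dg + db * db);
}

/* Linear scan: index of the candidate closest to target (count >= 1). */
int nearest_index(rgb_t target, const rgb_t *cands, int count)
{
    int best = 0;
    uint32_t best_d = dist2(target, cands[0]);
    int i;

    for (i = 1; i < count; i++) {
        uint32_t d = dist2(target, cands[i]);
        if (d < best_d) {
            best_d = d;
            best = i;
        }
    }
    return best;
}

/* Pack two 4-bit colors into one attribute byte. */
uint8_t pack_attr(uint8_t color1, uint8_t color2)
{
    return (uint8_t)(((color1 & 0x0F) << 4) | (color2 & 0x0F));
}
```

Running `nearest_index` once per Doom palette entry at startup would fill a 256-entry array of `cell_t`, which the renderer could then index directly.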

 

A lot of this is hypothetical on my part, though, I've not put any of it to the test (except the part about calculating what a dithered color would look like as a solid color by gamma-correcting both input colors and calculating the output - I've done that elsewhere and can assert the results look great). It is veering very hard away from "let's make Doom playable on 386es!" into "Let's make text-mode Doom for science!", but as long as you're okay with that scope creep, I'm interested in the results.


@viti95 Well, it goes without saying that any rendering would have to be done in a backbuffer before transferring to video RAM. Or were you thinking of performing all of the plane-splitting / pixel packing operations directly on the video card? :-p

 

A better question is whether it'd be more efficient to use column rendering functions that deal directly with bitplanes/pixel packing, or rendering normally to an 8-bit chunky buffer and then do the c2p conversion/pixel packing/color dithering all at once. The bummer is that on your target machines you have no vector/MMX instructions to accelerate such a process, and often not even (enough) cache.

 

As for bandwidth, dunno... certainly the ISA bus itself can handle transferring 16KB or 32KB of data 35 times per second (just above 0.5MB/sec and 1MB/sec respectively). The cards themselves, that's another matter. Even with VGAs, not all of them were created equal.
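
For the record, the arithmetic behind those figures can be sketched like this. It assumes the 16KB and 32KB numbers stand for a full 320x200 frame at 2 and 4 bits per pixel (my reading, not something stated above) and uses Doom's 35 Hz tic rate as the frame rate. The function names are made up.

```c
#include <stdint.h>

/* Bytes needed for one full frame at the given size and bits per pixel. */
uint32_t frame_bytes(uint32_t width, uint32_t height, uint32_t bpp)
{
    return width * height * bpp / 8;
}

/* Bytes per second needed to push a whole frame fps times per second. */
uint32_t bytes_per_second(uint32_t frame, uint32_t fps)
{
    return frame * fps;
}
```

For example, `bytes_per_second(frame_bytes(320, 200, 2), 35)` gives 560000 bytes per second, and the 4 bpp case gives 1120000.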

 

Edit: some more thoughts... on CGA you might be able to considerably speed up 4-px wide writes (and multiples) of the same color, by simply writing a single byte, no masking required. Of course you'll have to check for scaling. A minor speedup may be obtained for 2-px and 3-px wide writes, by merging them into a single bit-masking operation.

 

Of course the above will mean moving all of the pixel-packing to the column renderers... hard to say what will be more efficient.

 

Then again, since you are dealing with only 4 colours, it's possible to construct a 32-to-8 bit mapping for a super-efficient pixel packing function. Read 4 chunky pixels in a row, find out which of the 256 possible sequences it is, and write a single packed byte with 4 pixels at once. This of course presumes that you applied a 4-colour COLORMAP beforehand....
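
One way to sketch that packing step, assuming the chunky buffer already holds values 0..3 (one byte per pixel) and the row width is a multiple of 4. This variant uses shifts and ORs rather than a literal lookup table, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack a row of chunky pixels (one byte each, values 0..3) into 2 bpp:
   4 pixels per output byte, leftmost pixel in the highest bits.
   ASSUMPTION: width is a multiple of 4. */
void pack_row_2bpp(const uint8_t *chunky, uint8_t *packed, size_t width)
{
    size_t x;

    for (x = 0; x < width; x += 4) {
        packed[x / 4] = (uint8_t)(((chunky[x]     & 3) << 6) |
                                  ((chunky[x + 1] & 3) << 4) |
                                  ((chunky[x + 2] & 3) << 2) |
                                   (chunky[x + 3] & 3));
    }
}
```

Whether a table-driven version beats the shifts on a 386 would have to be measured; this is only meant to show the data flow from 4 source bytes to 1 packed byte.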

 

Anyway, all this mumbo-jumbo just confirms that those older graphics adapters were shitty/broken for more reasons than just having few colors :,-)

 

Edited by Maes


@Shadow Hog the main idea of the text modes is to make FastDoom playable on really slow 386 processors (cacheless 386SX-25 and below); in my tests these modes are 2x faster than potato mode at the same screen size. Also, 386 processors were launched in 1985 and VGA cards in 1987, so I thought that adding support for older video cards would be fun and challenging, and period-correct for earlier 386 processors. I'm using the same color approximation formula that you described, but avoiding the SQRT function as it's really slow, although if it's only used at game start that will be no problem. I'll try to implement the idea of precalculating all the color tables at startup, instead of every time the palette changes, as it should be faster.

 

@Maes I'll check that CGA idea, it looks pretty promising, but first I have to fully understand how CGA works. And yeah, old video cards are a real pain in the a** to program. The problem with the column and span renderers is that usually what is fast for column rendering is slow for span rendering and vice versa; until I develop it we won't know if it's faster or not.


Push comes to shove, if you're doing no dithering at all (to avoid the gamma-related pow() calls), then you could probably get away with using Manhattan distance (the sum of the absolute values of each dimension's difference, for those who didn't know) instead of Euclidean (the actual value you'd get with a measuring stick between point A and point B, gotten with the formula in my previous post). Generally speaking they both go up and down at the same time, just at different rates, so what's closest in Euclidean is probably close enough for government work in Manhattan.
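
For comparison, a Manhattan distance function is about as small as it gets. The name and the flat parameter list are just for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Manhattan distance between two RGB colors: sum of per-channel
   absolute differences. No multiplications, no square root. */
uint32_t manhattan(uint8_t r1, uint8_t g1, uint8_t b1,
                   uint8_t r2, uint8_t g2, uint8_t b2)
{
    return (uint32_t)(abs(r1 - r2) + abs(g1 - g2) + abs(b1 - b2));
}
```

The two metrics can disagree on the winner: measured from black, (10, 0, 0) is nearer than (6, 6, 0) in Manhattan terms (10 vs. 12) but farther in Euclidean terms (squared distances 100 vs. 72), so "close enough" is the right way to think about it.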

 

Heck, technically the dithered EGA colors aren't liable to change from run to run, so you can precalculate the lot of 'em offline and then run the distance function (whichever you opt for) on those at runtime/bootup.


To what extent can text-mode show two colours per cell? I'm wondering if you could have two "pixels" per cell (one half one colour, and the other half another), to give you a not-unreasonable 160 "px" wide resolution.


Text mode always has two colors per cell - foreground and background. So, to a total extent, to answer that initial question.

 

As for two pixels per cell: yes! 0xDD and 0xDE fill in the left and right halves with the foreground respectively, doubling the horizontal resolution for a screen size of 160x25, while 0xDC and 0xDF fill in the lower and the top halves of the cell with the foreground color respectively, allowing you to double vertical resolution to get a screen size of 80x50. The former is probably more useful, as you can already impact the vertical resolution by adjusting how many scanlines of each text row are displayed by the Motorola 6845 CRTC (as mentioned here on Great Hierophant's Nerdly Pleasures blog). You can get all the way up to 160x100 this way - Paku Paku uses this method to manage a 16-color 160x100 display on CGA cards. This does mean no dithering at all, but so it goes I guess.
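
A small sketch of the horizontal variant, to show how two "pixels" end up in one cell. It assumes the usual text-mode layout of character byte followed by attribute byte (so the character is the low byte of a 16-bit cell), and that blinking has been switched off so all 16 colors can be used as background; otherwise the left color is limited to 0-7. The function name is invented.

```c
#include <stdint.h>

#define CH_RIGHT_HALF 0xDE  /* CP437: right half of the cell in the foreground color */

/* Build one text cell showing `left` and `right` as two side-by-side
   pixels: background = left, foreground = right. */
uint16_t half_cell(uint8_t left, uint8_t right)
{
    uint8_t attr = (uint8_t)(((left & 0x0F) << 4) | (right & 0x0F));

    return (uint16_t)((attr << 8) | CH_RIGHT_HALF);
}
```

Since the character byte never changes, a renderer built this way only really has to update the attribute bytes after the screen has been filled with 0xDE once.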

12 hours ago, Kroc said:

To what extent can text-mode show two colours per cell? I'm wondering if you could have two "pixels" per cell (one half one colour, and the other half another), to give you a not-unreasonable 160 "px" wide resolution.

 

Sure you can. It's an old weird trick of the masters.


What do you think about the new status bar for the text modes? Should I make it different? (I know it's missing the arms and the Doomguy face)

2021-03-30 22_52_41-DOSBox-X 0.83.10, 100%, FDOOM.png


Looks pretty darn good to me, but I think it might be a little bit cleaner if the HUD didn't display leading zeroes (i.e., 84% instead of 084%, 4 Shells instead of 004 Shells, etc.). Probably could just code a leading 0 character to be a black cell char, or something?
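
Something along these lines would do it. This is only a sketch with a made-up function name, not how the FastDoom status bar is actually coded; it writes spaces where the leading zeroes would be.

```c
/* Write value (0..999) right-aligned into 3 characters, with spaces
   instead of leading zeroes. out must hold at least 4 bytes. */
void format_hud_number(unsigned value, char out[4])
{
    out[0] = (value >= 100) ? (char)('0' + value / 100) : ' ';
    out[1] = (value >= 10) ? (char)('0' + (value / 10) % 10) : ' ';
    out[2] = (char)('0' + value % 10);   /* last digit is always shown */
    out[3] = '\0';
}
```

In a text mode the space would simply become a cell whose character is blank, so nothing else in the drawing code would need to change.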

 

HUD face would definitely be important, since it actually does provide visual feedback for the direction of an attack. But it might also be too hard to make out given the low resolution. Maybe some kind of arrow graphic, if not?


I have released a new dev build, so everyone can test the new 80x25 video mode (virtual 80x50 resolution, 16 colors and CGA and EGA support!). This release includes the Doom shareware WAD so everyone can test this directly. There are some bugs to be fixed, but it can give you an idea of how the new video modes work.

 

How to use:

  • Option 1 (original Doom color palette, 256 colors to 16 color conversion): fdoom.exe -fixcolors
  • Option 2 (using an optimized color palette, precalculated colors): fdoom.exe -file ega_pal.wad

If you have a CGA video card you have to add the command line parameter "-cga" for it to work properly.

 

Changelog:

 

  • Better RAM usage (also reduced memory footprint)
  • More code optimizations
  • Compiled with the latest OpenWatcom v2 version
  • Stripped episode finale texts from the executable; they are now stored in external text files
  • Support for Doom II BFG edition. Use the command line parameter "-bfg"
  • New option to render the status bar background a little bit faster. Use the command line "-simplestatusbar". It replaces the status bar background with a simple grey color.
  • New command line parameter "-cga", this lets the new video modes run properly with CGA video cards
  • New command line parameter "-fixcolors", this is needed to correct the 256 to 16 colors conversion, as the original Doom palette is too dark for a direct conversion
  • New video modes based on text modes: 80x25 (CGA, EGA and VGA supported, 16 colors and virtual resolution of 80x50) and 80x50 (VGA only, 16 colors). Both support triple buffering. Those video modes will be released as separate executables, in order not to slow down the original Mode Y version. The new executables also have reduced load times and reduced RAM usage.

https://github.com/viti95/FastDoom/releases/tag/0.8_DEV1

 

Executable: https://github.com/viti95/FastDoom/releases/download/0.8_DEV1/FastDoom_0.8_dev1.zip

Edited by viti95


Just a little inquiry:

void V_DrawPatchDirect(int x, int y, patch_t *patch)
{
    int count;
    int col;
    column_t *column;
    byte *desttop;
    byte *dest;
    byte *source;
    int w;

    #if (EXE_VIDEOMODE == EXE_VIDEOMODE_80X25) || (EXE_VIDEOMODE == EXE_VIDEOMODE_80X50)
        return;
    #endif

When switched to text mode (VIDEOMODE_80x25 or VIDEOMODE_80x50), there is no way to draw graphical patches right on the screen, is there?


Well, as I mentioned before you could, in theory, draw precalculated blocks of ASCIIart/colorized text and pretend they are fixed-resolution graphics/patches, and do an on-the-fly substitution between "proper" graphics and their text-mode representations.


@Maes I meant the old Commodore or ZX Spectrum way, where you could write text on the screen and then directly redraw over it with graphical (bitmap) sprites in the screen buffer. It would be nice to have the marine face drawn over it, or even, let's say, sprites/things.


@AnotherGrunt Nope, unfortunately we're talking about pure text modes here, not a "mixed mode" like those 8-bit machines often had, where both a character buffer and a pixel-addressable screen could somehow coexist. You could however switch to an actual graphical mode and have a mix of text/normal graphical elements, but then you'd lose the main advantage of a pure text mode: speed.

 

You could approximate graphics with ASCIIart "patches", as I mentioned, or even abuse redefinable character sets (only on EGA and above, unfortunately).

12 hours ago, AnotherGrunt said:

Just a little inquiry:


void V_DrawPatchDirect(int x, int y, patch_t *patch)
{
    int count;
    int col;
    column_t *column;
    byte *desttop;
    byte *dest;
    byte *source;
    int w;

    #if (EXE_VIDEOMODE == EXE_VIDEOMODE_80X25) || (EXE_VIDEOMODE == EXE_VIDEOMODE_80X50)
        return;
    #endif

When switched to text mode (VIDEOMODE_80x25 or VIDEOMODE_80x50), there is no way to draw graphical patches right on the screen, is there?

 

I have developed two functions to render the patches in text mode (both 80x25 and 80x50): V_DrawPatchDirectText8025 and V_DrawPatchDirectText8050. These two functions allow drawing the original patches onto the screen while maintaining their original size (which leads to a loss in resolution).

 

As @Maes said, it's not possible to mix text modes and graphical modes in any CGA, EGA or VGA card, but it's possible to redefine the EGA and VGA fonts on the fly.


Well, time for a new release. FastDoom 0.8. As always, here is the final changelog:

  • Better RAM usage (also reduced memory footprint)
  • More code optimizations
  • Compiled with the latest OpenWatcom v2 version
  • Stripped episode finale texts from the executable; they are now stored in external text files
  • Support for Doom II BFG edition. Use the command line parameter "-bfg"
  • Smaller executable thanks to UPX compression tool
  • New option to render the status bar background a little bit faster. Use the command line "-simplestatusbar". It replaces the status bar background with a simple grey color.
  • Two new executables: FDOOMT25, which renders in text mode at 80x25 resolution (CGA, EGA and VGA supported, 16 colors and virtual resolution of 80x50) and FDOOMT50, which renders in text mode at 80x50 resolution (VGA only, 16 colors). Both support triple buffering. These executables have an even smaller memory footprint, as multiple graphics don't need to be loaded (fonts for example). The automap feature isn't supported. This release also includes the WAD ega_pal.wad, which replaces the original colormaps with ones better suited to text modes.
  • New command line parameter "-cga", this lets the new video modes run properly with CGA video cards
  • New command line parameter "-fixcolors", this is needed to correct the 256 to 16 colors conversion, as the original Doom palette is too dark for a direct conversion.
  • Unified the 386 and 486 executables, the 486 executables were always slower and bigger than the 386 ones. Never figured out why this happens, maybe you can blame OpenWatcom.
  • Unified executables between different versions of Doom onto a single one. The supported wads are the following:
    • DOOM.WAD -> DOOM Registered (3 episodes)
    • DOOM1.WAD -> DOOM Shareware (1 episode)
    • DOOMU.WAD -> Ultimate DOOM (4 episodes)
    • DOOM2.WAD -> DOOM II
    • PLUTONIA.WAD -> Final DOOM The Plutonia Experiment
    • TNT.WAD -> Final DOOM TNT Evilution
  • New SETUP program! I've figured out how to edit and compile the original IDSETUP program, and modified it to create a custom version for FastDoom.
  • Renamed the configuration file from default.cfg to fdoom.cfg. Also renamed save files from doomsav*.dsg to fdoomsv*.dsg.

 

https://github.com/viti95/FastDoom/releases/tag/0.8

 

Executable: https://github.com/viti95/FastDoom/releases/download/0.8/FastDoom_0.8.zip

Uncompressed executable: https://github.com/viti95/FastDoom/releases/download/0.8/FastDoom_0.8_uncompressed.zip

Edited by viti95

9 hours ago, viti95 said:

Well, time for a new release. FastDoom 0.8. As always, here is the final changelog:

Amazing. I'll update the wiki as soon as possible.

Two things:

  • You mention 80x25 support in 16 colors for CGA, EGA and VGA at a virtual resolution, and native 80x50 support in 16 colors for VGA. Outside the obvious, how much of a difference is there between virtual 80x50 res and native? The reason I ask is because you market the native mode for VGA as a separate feature, but fdoomt25 does it very similarly, and for CGA and EGA as well. I feel I'm missing something obvious here outside the virtual vs. native difference. What's the catch? ;)
  • The only thing that would perfect this port is DeHacked support. Unless I'm dumb, because I did get BTSX running (but without the DeHacked patch).

Also, provided it had enough space and RAM, FastDoom would now be performant enough to run on an Igel Etherminal 3X from 1994/1995 (386-SX40, Cirrus Logic CL-GD5428 VGA, 2 MB VRAM, 4/8 MB RAM), one of the first X terminals using text mode. (The 2X was the first and had a 386 at 16 MHz.)

 

Just to put it in perspective :)

 


There are some differences between the 80x25 and the 80x50 modes:

  • The 80x25 mode supports CGA, EGA and VGA cards, while the 80x50 mode only works with VGA cards
  • The 80x50 mode is faster as it doesn't need to calculate the half-height characters, and it avoids reading from VRAM when drawing single pixels
  • The main issue with the 80x50 mode is that the font is very small, which makes the menus look a bit weird
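
To illustrate why the 80x25 mode needs that read-back, here is a sketch of a single-pixel write on a virtual 80x50 screen. This is not FastDoom's actual code: it assumes the half-height character is 0xDF (top half in the foreground color), works on a plain buffer of character/attribute byte pairs, and uses a made-up function name.

```c
#include <stdint.h>

#define COLS 80
#define CH_TOP_HALF 0xDF  /* CP437: top half of the cell in the foreground color */

/* Set one pixel of a virtual 80x50 screen stored as 80x25 text cells
   (2 bytes per cell: character, then attribute). */
void put_virtual_pixel(uint8_t *cells, int x, int y, uint8_t color)
{
    uint8_t *cell = &cells[((y >> 1) * COLS + x) * 2];
    uint8_t attr = cell[1];   /* read-back: the other half shares this byte */

    if (y & 1)
        attr = (uint8_t)((attr & 0x0F) | ((color & 0x0F) << 4)); /* bottom = background */
    else
        attr = (uint8_t)((attr & 0xF0) | (color & 0x0F));        /* top = foreground */

    cell[0] = CH_TOP_HALF;
    cell[1] = attr;
}
```

In a native 80x50 mode each pixel has a cell of its own, so the attribute can be written outright without first reading the neighbouring half.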

Don't know much about how DeHacked works, maybe it's possible to implement it.

8 hours ago, viti95 said:

There are some differences between the 80x25 and the 80x50 modes:

  • The 80x25 mode supports CGA, EGA and VGA cards, while the 80x50 mode only works with VGA cards
  • The 80x50 mode is faster as it doesn't need to calculate the half-height characters, and it avoids reading from VRAM when drawing single pixels
  • The main issue with the 80x50 mode is that the font is very small, which makes the menus look a bit weird

Yeah, I got that, but the explanation in your readme makes it seem obfuscated rather than obvious. As in: it says exactly what you are telling me here, but not what the obvious differences are.

8 hours ago, viti95 said:

Don't know much about how DeHacked works, maybe it's possible to implement it.

Vanilla, outside of a source port/scripting, basically had DeHacked as its only companion in the 90s when Greg Lewis made it, before the source code release. See the Wiki for a full explanation on what it does. DeHacked is a common standard in the Doom world and quite a lot of Vanilla WADS use it to introduce new monsters or new behavior.

 

One thing I am thinking about is how much overhead DeHacked would add to FastDoom. With DeHacked, things like Batman Doom become reality, and Back To Saturn X would also be fully supported. (FastDoom can currently run BTSX, but not its DEH file.) It would be fun to see how far you can go with third-party wadsets and TCs. I'd love to see HacX run in text mode like that. Perhaps @Doomkid may want to see this as well.

 

You did mention that because FastDoom executables differ from stock Vanilla, DeHacked behavior might be different as well. But you know, this could be fun.

 

For what it is worth, the FastDoom wiki page is updated. I have also added it to the text mode wiki page, as FastDoom has the distinction of being one of very few ports that have a native text mode renderer. (The other is SMMU.)

 


Yep, the readme is very badly written, I should make a better one. Also I should update the GitHub readme xD. Again, thanks for updating the wiki @Redneckerz.

 

Anyway, I've started implementing VGA Mode 13h; it's much faster under certain configurations (fast CPUs and some video cards)

 

 

Results with a Pentium III at 550 MHz and an ISA Cirrus Logic GD-5422:

  • Mode Y: 22.361 fps
  • Mode 13h: 51.351 fps (229% of the Mode Y framerate, about 2.3x as fast!)

 

13 minutes ago, viti95 said:

Yep, the readme is very badly written, I should make a better one. Also I should update the GitHub readme xD. Again, thanks for updating the wiki @Redneckerz.

 

Anyway, I've started implementing VGA Mode 13h; it's much faster under certain configurations (fast CPUs and some video cards)

 

 

Results with a Pentium III at 550 MHz and an ISA Cirrus Logic GD-5422:

  • Mode Y: 22.361 fps
  • Mode 13h: 51.351 fps (229% of the Mode Y framerate, about 2.3x as fast!)

 

Did some testing of the terminal builds:

FDOOMT25 loads fine, but whether or not the EGA WAD is loaded, it does not look as sharp as in your screenshots.

FDOOMT50 flickers heavily and is completely unplayable.

 

I am sure I'll have to adjust config files; this was run on bog-standard DOSBox 0.74-3.


FDOOMT50 flickers that way when you're using an EGA card, because EGA cards support that mode but at a different resolution (80x43). That causes the video pages to be allocated at different addresses than the VGA ones.

 

I recommend using DOSBox-X with aspect ratio correction enabled and the scaler disabled. For me it's much better than the original DOSBox. I've attached my dosbox-x conf so you can test with exactly the same setup as mine.

dosbox-x.zip

4 hours ago, viti95 said:

Results with a Pentium III at 550 MHz and an ISA Cirrus Logic GD-5422:

  • Mode Y: 22.361 fps
  • Mode 13h: 51.351 fps (229% of the Mode Y framerate, about 2.3x as fast!)

 

 

Imagine being in 1994 and someone comes up with a hacked executable that runs twice as fast! It's amazing to see this. I assume Heretic ran faster than Doom because of this? Doom used Mode Y while Heretic used Mode 13h.

9 hours ago, axdoomer said:

 

Imagine being in 1994 and someone comes up with a hacked executable that runs twice as fast! It's amazing to see this. I assume Heretic ran faster than Doom because of this? Doom used Mode Y while Heretic used Mode 13h.

 

Well, the test was done in a circumstance where the "CPU speed" factor is dramatically minimized/nearly eliminated, by 1994 standards, but otherwise yeah, we're talking about a more than 2x improvement at least in the video output itself. I guess that explains why Heretic felt somewhat smoother than Doom, but certainly not 2x faster.

9 hours ago, axdoomer said:

 

Imagine being in 1994 and someone comes up with a hacked executable that runs twice as fast! It's amazing to see this. I assume Heretic ran faster than Doom because of this? Doom used Mode Y while Heretic used Mode 13h.

Doom used Mode 13h in early development, and switched to Mode Y for better performance on low-end systems. Heretic had higher system requirements since it was released later than Doom.


In hindsight, it's unclear if that was the correct choice, after all. Heretic seemed to run noticeably smoother than Doom on my 486 DX/40 that had a Cirrus Logic GD-series video card similar to the one used by viti95 (albeit with a VESA bus), so it raises the question of exactly what they had in mind as "low end systems". Maybe specific video cards, or specific kinds of bottlenecks.
