Redneckerz

FastDoom: DOS Vanilla Doom optimized for 386/486 processors


12 hours ago, AnotherGrunt said:

@Maes It probably isn't ISA bus speed itself but how transfer instructions are structured. Look at this. Is it really done by 8-bit ISA transfer?

 

Interesting read...so the ISA bus's main problem seems to be that it's just not all that efficient. At best, with single-block transfers that hog the entire memory bus, the theoretical maximum taking into account DMA controller timings seems to be a measly 2 MB/sec on an AT (16-bit) and 1 MB/sec on an XT (8-bit), basically an eighth of the theoretical bandwidth based on frequency and bus width alone. And in practice, getting 50% of that 1/8th is considered "pretty good optimization" O_o

 

So with that in mind... updating 64000 pixels (320 x 200) with 4 bpp 35 times a second should be barely doable, at least on an AT and a 16-bit card (1.120 MB/sec required, just a tad more than 50% of the theoretical 2 MB/sec an AT should be capable of). Assuming of course the video card itself isn't a complete dog and doesn't introduce unnecessary delays of its own, the main system's RAM also has plenty of bandwidth to spare etc.
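The arithmetic above is easy to check. A minimal sketch in C (the function name is made up for illustration, it is not from FastDoom) that computes how many bytes have to cross the bus per second for a full-screen update:

```c
#include <stdint.h>

/* Bytes that must be transferred each second to refresh the whole
 * screen: pixels per frame * bits per pixel / 8 * frames per second. */
uint32_t bytes_per_second(uint32_t width, uint32_t height,
                          uint32_t bits_per_pixel, uint32_t fps)
{
    /* Round up to whole bytes in case the bit count is not a multiple of 8. */
    uint32_t bytes_per_frame = (width * height * bits_per_pixel + 7) / 8;
    return bytes_per_frame * fps;
}
```

With 320 x 200 at 4 bpp and 35 fps this gives 1,120,000 bytes/sec, the 1.120 MB/sec figure quoted above. The same mode at 8 bpp needs twice that, which is already past the 2 MB/sec theoretical limit.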

On 3/24/2021 at 6:10 PM, viti95 said:

You're right @Maes CGA and EGA native modes will be slower than the VGA, but I think it's possible to have decent performance by using a backbuffer, rendering everything onto it and then transferring all the data to the video card. The main problem with CGA and EGA cards is that the 8-bit ISA bus bandwidth is very limited, and the 320x200 resolution is too much for those cards. Even for 16-bit ISA VGA cards the 320x200 resolution is too much. Maybe the non-standard CGA 160x100 mode will be fast enough. I will finish the text mode first, and then try to implement any of those modes.

 

@zokum Dithering can be done, but it will be slower for sure. Maybe it's doable with a fast 486, but i'm pretty sure that it is a no-no for 386 processors. Also using some CGA tricks it's doable but limits the number of compatible setups (only composite NTSC monitors are supported)

 

BTW i've tested the new modes with a very old Trident EGA card and a "fast" processor (486DX@50), it runs pretty smooth. The major factor limiting the framerate is still the processor.

 

 

Edit: the text mode can be faster. For now I'm using a translate table that converts the 256 color output to 16 colors, and that table is recalculated every time the palette changes. The color mapping can also be better, I'm struggling hard to find the best conversion formula.
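A translate table like the one described in the edit above could be rebuilt along these lines. This is only a sketch: it assumes the standard 16 CGA/EGA text colors and uses plain squared RGB distance as the matching rule, which is an assumption and not necessarily the formula FastDoom uses. The function names are hypothetical.

```c
#include <stdint.h>

/* Standard 16-colour CGA/EGA text palette, 8 bits per component. */
static const uint8_t cga16[16][3] = {
    {0x00,0x00,0x00}, {0x00,0x00,0xAA}, {0x00,0xAA,0x00}, {0x00,0xAA,0xAA},
    {0xAA,0x00,0x00}, {0xAA,0x00,0xAA}, {0xAA,0x55,0x00}, {0xAA,0xAA,0xAA},
    {0x55,0x55,0x55}, {0x55,0x55,0xFF}, {0x55,0xFF,0x55}, {0x55,0xFF,0xFF},
    {0xFF,0x55,0x55}, {0xFF,0x55,0xFF}, {0xFF,0xFF,0x55}, {0xFF,0xFF,0xFF}
};

/* Index of the CGA colour closest to (r, g, b) by squared distance. */
uint8_t nearest_cga(uint8_t r, uint8_t g, uint8_t b)
{
    long best_dist = -1;
    uint8_t best = 0;
    for (int i = 0; i < 16; i++) {
        long dr = (long)r - cga16[i][0];
        long dg = (long)g - cga16[i][1];
        long db = (long)b - cga16[i][2];
        long dist = dr * dr + dg * dg + db * db;
        if (best_dist < 0 || dist < best_dist) {
            best_dist = dist;
            best = (uint8_t)i;
        }
    }
    return best;
}

/* Rebuild the 256 -> 16 table; call whenever the 256-colour palette
 * changes. playpal holds 256 RGB triplets (768 bytes). */
void build_translate_table(const uint8_t *playpal, uint8_t table[256])
{
    for (int i = 0; i < 256; i++)
        table[i] = nearest_cga(playpal[i * 3], playpal[i * 3 + 1],
                               playpal[i * 3 + 2]);
}
```

Per pixel the renderer then only needs `table[backbuffer[i]]`, so the cost of the matching formula is paid once per palette change rather than once per frame.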

Just don't attempt a slaughter map with these graphics enabled. :D

11 hours ago, Maes said:

 

Interesting read...so the ISA bus's main problem seems to be that it's just not all that efficient. At best, with single-block transfers that hog the entire memory bus, the theoretical maximum taking into account DMA controller timings seems to be a measly 2 MB/sec on an AT (16-bit) and 1 MB/sec on an XT (8-bit), basically an eighth of the theoretical bandwidth based on frequency and bus width alone. And in practice, getting 50% of that 1/8th is considered "pretty good optimization" O_o

 

So with that in mind... updating 64000 pixels (320 x 200) with 4 bpp 35 times a second should be barely doable, at least on an AT and a 16-bit card (1.120 MB/sec required, just a tad more than 50% of the theoretical 2 MB/sec an AT should be capable of). Assuming of course the video card itself isn't a complete dog and doesn't introduce unnecessary delays of its own, the main system's RAM also has plenty of bandwidth to spare etc.

 

I'll create a separate open source benchmark tool for all types of video cards, this way we'll see the real throughput those cards can achieve. Using SpeedSys the maximum transfer rate I've seen for an ISA card is nearly 7MB/s (a Cirrus Logic GD-5429). That card can achieve almost 55 fps with a Pentium III in FastDoom mode 13h, that's 3.35MB/s. I guess the ISA bus depends on the chipset and the videocard, maybe if DMA is used (sound cards?) the bus becomes more bottlenecked.

Spoiler

#ifdef MODE_ATI640
void ATI640_DrawBackbuffer(void)
{
    int x;

    unsigned char *vramScanline1 = (unsigned char *)0xB0000;
    unsigned char *vramScanline2 = (unsigned char *)0xB2000;
    unsigned char *vramScanline3 = (unsigned char *)0xB4000;
    unsigned char *vramScanline4 = (unsigned char *)0xB6000;

    unsigned int base = 0;
    unsigned char color0, color1, color2, color3;

    for (base = 0; base < SCREENHEIGHT * 320;)
    {
        // 1st scanline

        for (x = 0; x < 160; x++, base += 2, vramScanline1++)
        {
            color0 = ptrlutcolors00[backbuffer[base]];
            color1 = ptrlutcolors01[backbuffer[base]];
            color2 = ptrlutcolors00[backbuffer[base + 1]];
            color3 = ptrlutcolors01[backbuffer[base + 1]];
            *(vramScanline1) = (color0 & 3) << 6 | (color1 & 3) << 4 | (color2 & 3) << 2 | (color3 & 3);
            *(vramScanline1 + 0x8000) = (color0 & 12) << 4 | (color1 & 12) << 2 | (color2 & 12) | (color3 & 12) >> 2;
        }

        // 2nd scanline

        for (x = 0; x < 160; x++, base += 2, vramScanline2++)
        {
            color0 = ptrlutcolors10[backbuffer[base]];
            color1 = ptrlutcolors11[backbuffer[base]];
            color2 = ptrlutcolors10[backbuffer[base + 1]];
            color3 = ptrlutcolors11[backbuffer[base + 1]];
            *(vramScanline2) = (color0 & 3) << 6 | (color1 & 3) << 4 | (color2 & 3) << 2 | (color3 & 3);
            *(vramScanline2 + 0x8000) = (color0 & 12) << 4 | (color1 & 12) << 2 | (color2 & 12) | (color3 & 12) >> 2;
        }

        // 3rd scanline

        for (x = 0; x < 160; x++, base += 2, vramScanline3++)
        {
            color0 = ptrlutcolors00[backbuffer[base]];
            color1 = ptrlutcolors01[backbuffer[base]];
            color2 = ptrlutcolors00[backbuffer[base + 1]];
            color3 = ptrlutcolors01[backbuffer[base + 1]];
            *(vramScanline3) = (color0 & 3) << 6 | (color1 & 3) << 4 | (color2 & 3) << 2 | (color3 & 3);
            *(vramScanline3 + 0x8000) = (color0 & 12) << 4 | (color1 & 12) << 2 | (color2 & 12) | (color3 & 12) >> 2;
        }

        // 4th scanline

        for (x = 0; x < 160; x++, base += 2, vramScanline4++)
        {
            color0 = ptrlutcolors10[backbuffer[base]];
            color1 = ptrlutcolors11[backbuffer[base]];
            color2 = ptrlutcolors10[backbuffer[base + 1]];
            color3 = ptrlutcolors11[backbuffer[base + 1]];
            *(vramScanline4) = (color0 & 3) << 6 | (color1 & 3) << 4 | (color2 & 3) << 2 | (color3 & 3);
            *(vramScanline4 + 0x8000) = (color0 & 12) << 4 | (color1 & 12) << 2 | (color2 & 12) | (color3 & 12) >> 2;
        }
    }
}
#endif

 

 

This part is the culprit. It simply cannot be fast. Isn't there some ASM example of how to write into ATI's VRAM?


The ATI 640x200 mode is very rare. It's a mix between planar and packed pixel modes, and uses 4 different scanlines. The Commodore PC10 Advanced Graphics Adapter manual describes it very well. I've been optimizing this method over the last few days, but it's still slow. Even with a very fast processor the framerate is low, so I guess the videocard is limiting the performance.

 

[Images: two screenshots from the Commodore PC10/PC20 Advanced Graphics Adapter manual]

 

Commodore_PC10-PC20_Advanced_Graphics_Adapter.zip

Edited by viti95


@viti95 Well, as I see it, it is intentional. The Red and Green bits are the most important colors, so my wild guess is that the first bitplane (Plane 0) is more important than the second bitplane. Make the update a 2-pass write function. In the first pass, update the "Plane 0" bitplane and then "Plane 1", for even-numbered scanlines only, and return from ATI640_DrawBackbuffer(). Let R_RenderView() update the backbuffer, and then do the same thing for the odd-numbered scanlines. If you're lucky you'll catch the CRT beam as it is updating the screen, and the color artifacting won't be crazy.
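A rough sketch of that field-by-field update, run against a plain byte array instead of real VRAM so it can be tried anywhere. The bank and plane offsets are taken from the code in the spoiler (banks 0x2000 apart, second plane at +0x8000). The packing is simplified: it assumes the backbuffer already holds 4-bit colors and just doubles each pixel, where the original uses the ptrlutcolors tables. The names update_field and pack_line are made up.

```c
#include <stdint.h>

#define SRC_WIDTH   320      /* backbuffer pixels per line */
#define SRC_HEIGHT  200
#define LINE_BYTES  160      /* two source pixels per output byte, per plane */
#define BANK_SIZE   0x2000   /* distance between the four scanline banks */
#define PLANE_SIZE  0x8000   /* distance between plane 0 and plane 1 */

/* Pack one backbuffer line into one plane (simplified stand-in for the
 * LUT-based packing: low two bits go to plane 0, high two to plane 1). */
static void pack_line(const uint8_t *src, uint8_t *dst, int plane)
{
    for (int x = 0; x < LINE_BYTES; x++) {
        uint8_t a = src[2 * x] & 15;
        uint8_t b = src[2 * x + 1] & 15;
        if (plane == 0)
            dst[x] = (uint8_t)((a & 3) << 6 | (a & 3) << 4 |
                               (b & 3) << 2 | (b & 3));
        else
            dst[x] = (uint8_t)((a & 12) << 4 | (a & 12) << 2 |
                               (b & 12) | (b & 12) >> 2);
    }
}

/* Write one field (0 = even scanlines, 1 = odd scanlines):
 * all of plane 0 first, then all of plane 1. */
void update_field(const uint8_t *backbuffer, uint8_t *vram, int field)
{
    for (int plane = 0; plane < 2; plane++) {
        for (int y = field; y < SRC_HEIGHT; y += 2) {
            /* Line y lives in bank (y mod 4) at row (y / 4). */
            uint8_t *dst = vram + plane * PLANE_SIZE
                                + (y & 3) * BANK_SIZE
                                + (y >> 2) * LINE_BYTES;
            pack_line(backbuffer + y * SRC_WIDTH, dst, plane);
        }
    }
}
```

The caller would alternate the field on every rendered frame, for example `update_field(backbuffer, vram, frame & 1)` after each R_RenderView(). Each call then writes half as many bytes as the full update; whether that hides the artifacting on real hardware is something only a test on the card can tell.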

 

Happy computing,

COMMODORE ;-)

Edited by AnotherGrunt


Some dithering stuff could be precomputed. This would help with the banding one often sees in low-res modes. One could upscale the textures to twice the height and width and use up to 4 different colors to represent the original color better. You would need more sprite/texture memory, but with only 4 bits of data per pixel, that would halve the memory needed per pixel. You could even go for upscaling the width only and use the same amount of memory.
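To make the precomputation idea concrete, here is a small sketch under a big simplifying assumption: the 16 target colors are an even grey ramp (level i stands for brightness i * 17), which is not the real 16-color palette. For every 8-bit brightness it precomputes a 2x2 block of 4-bit levels whose average approximates the original value, using a 2x2 Bayer ordering. All names are hypothetical.

```c
#include <stdint.h>

/* 2x2 Bayer order: the pixels with the lowest rank are brightened first. */
static const uint8_t bayer2[4] = {0, 2, 3, 1};

/* dither_lut[v] = the four 4-bit levels of the 2x2 block for value v. */
uint8_t dither_lut[256][4];

void build_dither_lut(void)
{
    for (int v = 0; v < 256; v++) {
        int base = v / 17;                 /* nearest level at or below v */
        int rem  = v % 17;                 /* distance to that level */
        int bumped = (rem * 4 + 8) / 17;   /* pixels raised by one level, 0..4 */
        for (int i = 0; i < 4; i++)
            dither_lut[v][i] = (uint8_t)(base + (bayer2[i] < bumped ? 1 : 0));
    }
}
```

The table is built once (or once per texture conversion), so the per-pixel cost at render time is a lookup. For example, brightness 25 sits halfway between levels 1 and 2, and its block comes out as two pixels of each.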

Also, the original Doom setup.exe used a hack to change the blue into 'Romero blue' that was a different shade of blue. What would the game look like in text mode if the mode used had 16 custom colors instead of the default ones? You can certainly redefine the two magenta shades and two cyan shades. You might do fine without the white as well, and replace it with a slightly darker light grey. Making the greens less intense and more like the green in the game would also improve the look of things.

[Attached image: image.png]

 


@zokum yeah VGA cards can change the palette of 16 color modes and the text font, so it's possible to make text based modes even better. Using shade characters (or custom ones) it's possible to simulate more colors (as SMMU does), even change the palette in realtime as the 256 color modes do. I'll try to implement those techniques in the 80x50 text mode and 160x200 VGA mode. A custom colormap for 16 color modes would make them look much better.

 

Also i'll reenable the "Romero blue" in the setup program, the code is still there but was disabled for whatever reason.


A small update, got VGA 16 color palette changes working. It's a little bit weird how it works (the 16 colors aren't mapped to the first DAC entries, 0-15), but it works.
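For anyone wondering about the weird mapping: as far as I know, in the standard 16-color modes the VGA keeps the EGA-compatible Attribute Controller palette, so colors 6 and 8-15 point at DAC entries 0x14 and 0x38-0x3F. The sketch below assumes those default register values (a program can reprogram them) and only builds the list of DAC entries to write. The struct and function names are made up, and this is not FastDoom's code.

```c
#include <stdint.h>

/* Default Attribute Controller palette registers after a BIOS mode set
 * in 16-colour modes: colour number -> DAC entry. */
static const uint8_t default_dac_index[16] = {
    0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x14, 0x07,
    0x38, 0x39, 0x3A, 0x3B, 0x3C, 0x3D, 0x3E, 0x3F
};

typedef struct {
    uint8_t index;    /* DAC entry to program */
    uint8_t r, g, b;  /* 6-bit components, as the VGA DAC expects */
} DacEntry;

/* rgb holds 16 colours as 8-bit RGB triplets (48 bytes). */
void build_dac_entries(const uint8_t *rgb, DacEntry out[16])
{
    for (int c = 0; c < 16; c++) {
        out[c].index = default_dac_index[c];
        out[c].r = (uint8_t)(rgb[c * 3]     >> 2);  /* 8-bit -> 6-bit */
        out[c].g = (uint8_t)(rgb[c * 3 + 1] >> 2);
        out[c].b = (uint8_t)(rgb[c * 3 + 2] >> 2);
    }
}
```

On real hardware each entry would then be sent to the DAC: write the index to port 0x3C8, followed by the three components to port 0x3C9 (with outp() from conio.h under Watcom, for instance).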

 

This is the VGA 160x200 16 color mode with a custom palette based on the original one. I know, it can be done better ^^

 

 


OK...will I sound weird if I mentioned that I was mostly impressed by that pitching-down sound during the intro screen? I never heard this before O_o

2 hours ago, Maes said:

OK...will I sound weird if I mentioned that I was mostly impressed by that pitching-down sound during the intro screen? I never heard this before O_o

 

I'm using an X2GS midi module (got it and an Orpheus sound card thanks to keropi from Vogons), it sounds very similar to a Roland SC-88. Add a pair of good speakers, and you have the ultimate experience.

 

15 hours ago, VGA said:

Runs fast!

 

It should run even faster: the backbuffer is still rendered at 320x200, but the VGA 160x200 mode only uses half of it (the performance uplift should be close to 2x). This mode is really good because you only have to update ~16 KB of VRAM per frame. ISA 8-bit VGA cards really love this mode.


Maybe a 16 shades of grey-version would be the one with the best overall look... Textures would keep a lot of their fidelity with this many shades to use. Also good for B/W monitors!


@zokum that's a good idea, 16 shades of grey are more than enough visually. But I guess someone in the forum with more artistic ability than me can create a much better version of this limited 16 color palette (playpal + colormap).
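As a starting point for a grey version, something like the following could map any RGB color to one of 16 evenly spaced greys. It is a sketch only: the luminance weights are the usual integer approximation of 0.299 R + 0.587 G + 0.114 B, the even ramp (grey i = i * 17) is an assumption, and a hand-tuned palette would probably look better. The function names are hypothetical.

```c
#include <stdint.h>

/* Integer approximation of 0.299 R + 0.587 G + 0.114 B. */
uint8_t luminance(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)((77u * r + 150u * g + 29u * b) >> 8);
}

/* Index (0..15) of the closest grey on an even ramp of 16 shades. */
uint8_t grey_index(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)((luminance(r, g, b) + 8) / 17);
}

/* Brightness (0..255) of grey shade i on that ramp. */
uint8_t grey_level(uint8_t i)
{
    return (uint8_t)(i * 17);
}
```

Running every PLAYPAL entry through grey_index() gives the 256 to 16 mapping, and grey_level() gives the 16 palette entries themselves.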

 

mode16v.zip


A "16 Shades of Grey" wad that only contains a palette and colormap with 16 shades of grey sounds like a cool idea for a community project.

19 hours ago, jerk-o said:

A "16 Shades of Grey" wad that only contains a palette and colormap with 16 shades of grey sounds like a cool idea for a community project.

No STARTAN allowed. Only STARGREY.


Ay caramba!  Esto es fantastico :)

 

Seriously though I've been following this really closely.  It's amazing that you're keeping up with this.  The close bond between software and hardware is 'almost' a lost art nowadays.  This is really a great port, keep it up!


Looks really good!!! I guess on a real CRT it should look even better. It's a shame we can't use full text characters on these modes, the HUD really suffers with low resolutions (and not much can be done to solve that)

 

As always, feel free to create a pull request in the github repo, i'll be very happy to add this new mode :D

 

Thanks @Frenkel !

14 minutes ago, viti95 said:

Looks really good!!! I guess on a real CRT it should look even better. It's a shame we can't use full text characters on these modes, the HUD really suffers with low resolutions (and not much can be done to solve that)

 

Well, it's possible to both get more pseudocolors AND give a semblance of proper shape/contours if you use encoding techniques such as the ones used to make 8088 corruption. Now, pre-processing every (full resolution) frame to give it the best possible representation using the entire gamut of ASCII characters and colors is not trivial.

 

Doing that for every frame on the entire viewport would likely kill off any speed gains obtained from the rest of the hacks, if somehow you tried to make a real-time version of the encoder used to produce the "text video" used in that demo. However, it might be just about viable if you restrict it selectively to a few key areas -e.g. just the HUD or even just parts of it.

7 hours ago, viti95 said:

As always, feel free to create a pull request in the github repo, i'll be very happy to add this new mode :D

 

Sure, but I don't know if I should include a separate mode136.wad. I've based the 80x100 136 color mode on MODE_CGA16. Like that mode I created a separate wad with a PLAYPAL and a copy of the COLORMAP from mode16.wad. I used IrfanView to match the 136 colors to Doom's palette, but the result is similar if I don't supply the PLAYPAL and just let I_ProcessPalette in i_ibm.c do its work. Although I think the chain gun looks better with IrfanView's PLAYPAL.

If I don't supply the COLORMAP I think the game looks too dark.

 

And I don't like the all grey ceiling in most of the video. I don't know where that's coming from. The ceiling isn't just one color when I run the MODE_CGA16 executable and all 16 colors are also available in the 136 color mode.


I think you would be better off with a text mode version handcoded to do the status bar than to convert the graphics into text mode. It should also be faster.


Something like this could be a bit more readable. It uses only 80x4, and I added glyphs for all the numbers as well. Took me about 10 minutes to hack together. To be honest another solution is needed for the face. Maybe just the eyes and then adding the looking left and right would convey the right tone. Perhaps a mini marine in 6x4 chars that turns around and animates a bit would be a usable twist?

The game should be faster if this rather simple status bar can be rendered at the bottom, leaving more cycles for the 3d graphics. This could perhaps only be updated when there are changes to the data.
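The "only update when there are changes" part could look roughly like this. It is a sketch with made-up names: StatusData and its fields are hypothetical, and the real status bar tracks more values (keys, weapons, face).

```c
/* Hypothetical snapshot of the values shown on the status bar. */
typedef struct {
    int health;
    int armor;
    int ammo;
} StatusData;

static StatusData last_drawn;   /* what is currently on screen */
static int have_drawn = 0;      /* 0 until the first draw */
int redraw_count = 0;           /* only here to show how often we redraw */

/* Redraws the bar only if something changed. Returns 1 if it redrew. */
int update_status_bar(const StatusData *now)
{
    if (have_drawn &&
        now->health == last_drawn.health &&
        now->armor  == last_drawn.armor &&
        now->ammo   == last_drawn.ammo)
        return 0;               /* nothing changed, skip the VRAM writes */

    /* The actual text-mode drawing of the 80x4 bar would go here. */
    last_drawn = *now;
    have_drawn = 1;
    redraw_count++;
    return 1;
}
```

On most frames nothing on the bar changes, so the call costs three comparisons and no writes to the video card.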
[Image: mock-up of the 80x4 text mode status bar]

 


@zokum The idea is good but won't work with hacked CGA modes. The "high res" 80x100 16 color modes use text mode, but only two rows are drawn per character, so no full characters are available. It's possible though to composite hand crafted graphics and maybe simulate fonts with those glyphs. For example:

 

[Image: 1k03_16c_cga_ansi_from_hell.png]

 

As for basic 80x25, it's definitely usable, but with such a low resolution it's preferable to make the HUD as small as possible.

 

@Frenkel It's fine to add mode136.wad as a separate file. I guess the colormap needs to be tuned to make better use of the available color palette. There are some colormaps out there that make the game brighter, maybe those can help.


Mine was more for the 80x25. As for the quarter character mode, it should still be possible to work out something sensible even with the default char set. The UI i made was meant more as a template. I could probably whip up various status bar looks for that cga mode as well.

Still, the UI is important and even in 80*25 using 16% of the gfx for the status bar is on par with the original game really as it uses a similar amount of real estate on the screen. The original game uses 32/200 pixels, which happens to be 16% as well.

I could probably make an 80*50 version that was more detailed and better represented the "feel" of the game.


I've also added code for a similar VGA 80x200 136 color mode.

And I'm supplying a PLAYPAL, because with that palette you can compare this mode to Potato mode, which also runs in a resolution of 80x200.

 

Have you seen MagiDuck? It's a scrolling platform game that runs in CGA 80x50 16 color mode and it has a readable font.

 

BTW, there are actually only 122 different colors instead of 136, because different combinations of colors merge into the same pseudocolor, unfortunately.
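The 136 comes from counting unordered pairs of the 16 colors, a color paired with itself included: 16 * 17 / 2 = 136. The sketch below counts those pairs and then counts how many distinct blends remain under a deliberately crude model, where the blend of two colors is just the sum of their RGB components in the standard CGA palette. That model is an assumption, so the number it produces need not match the 122 found in practice. It does show why merging happens: black + white and dark grey + light grey give the same sum.

```c
#include <stdint.h>

/* Standard 16-colour CGA palette, 8 bits per component. */
static const uint8_t cga16[16][3] = {
    {0x00,0x00,0x00}, {0x00,0x00,0xAA}, {0x00,0xAA,0x00}, {0x00,0xAA,0xAA},
    {0xAA,0x00,0x00}, {0xAA,0x00,0xAA}, {0xAA,0x55,0x00}, {0xAA,0xAA,0xAA},
    {0x55,0x55,0x55}, {0x55,0x55,0xFF}, {0x55,0xFF,0x55}, {0x55,0xFF,0xFF},
    {0xFF,0x55,0x55}, {0xFF,0x55,0xFF}, {0xFF,0xFF,0x55}, {0xFF,0xFF,0xFF}
};

/* Unordered pairs of n colours, a colour paired with itself included. */
int count_pairs(int n)
{
    return n * (n + 1) / 2;
}

/* Number of distinct component-wise sums over all unordered pairs. */
int count_distinct_blends(void)
{
    uint32_t seen[136];
    int count = 0;

    for (int i = 0; i < 16; i++) {
        for (int j = i; j < 16; j++) {
            /* Each sum is at most 510, so 10 bits per component suffice. */
            uint32_t key =
                ((uint32_t)(cga16[i][0] + cga16[j][0]) << 20) |
                ((uint32_t)(cga16[i][1] + cga16[j][1]) << 10) |
                 (uint32_t)(cga16[i][2] + cga16[j][2]);
            int k;
            for (k = 0; k < count; k++)
                if (seen[k] == key)
                    break;
            if (k == count)
                seen[count++] = key;
        }
    }
    return count;
}
```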


I've been playing around all day with the Color corrections options in IrfanView to improve the palette of the CGA 80x100 136 color mode. I settled on increasing contrast by 64, gamma correction by 2.50 and saturation by 45.

 

This is what it looks like:

 

