jval

32 bit software renderer


Hi,

After some (quick) changes to my source port, I've managed to
get a working version of a 32-bit color SOFTWARE renderer.

Well, now I have some questions:


1. Is it worth it?
There are ports like PrBoom, GZDoom, Doomsday, etc. that
use OpenGL, and even the old Doom3D that uses Direct3D.

2. Performance

After a lot of programming tricks for speed optimization:

Running on a Pentium 4 3 GHz:
35 fps (limited by the game engine) at 800x600
17-20 fps at 1280x1024 (<- SLOW!!!)

Running on an AMD Athlon 2600+:
35 fps (limited by the game engine) at 800x600
12-15 fps at 1280x1024 (<- SLOW!!!)

Is that performance acceptable?

Back in the '90s, classic Doom ran fine on a 486DX, which is about 50 or 100 times slower than a P4!


3. Invisibility

I don't use the fuzz column effect; instead I use a lighter-color transparency effect:





4. Blood and explosions

I make explosions transparent like this:




Please tell me your opinion.

Thanks


Let's see your code, so we can decide whether it can be improved in terms of speed. 32-bit inevitably requires more data to be transferred, but this can certainly be optimized.

Which port did you use as a base for your work? I managed to do something similar 8 years ago with the long-gone WinDoom port, but unfortunately I can't get it to run anymore because I can't reconstruct the data files that version requires...


The code and the executables are here:

EDIT: (Updated link to the newest version)
http://delphidoom.sitesled.com/downloads.html




I'm using the DelphiDoom source port.
The performance stats are from compiling with Delphi; if I use FPC it's a little bit slower...
When I look at the disassembly there is not much room for optimization (even though I'm no assembly guru).

Anyway, there are a few new column- and span-drawing routines,
located in r_draw.pas. For column drawing I use 2 dc_sources
(dc_source, as in original Doom, and dc_source2) and then
call the function R_ColorAverage() (in r_hires.pas), which averages
the 32-bit color values depending on a variable called dc_mod (which corresponds to the weights of the 2 dc_sources that create the column).
For spans I use a next-pixel averaging technique with a varying weight parameter.

R_DrawFuzzColumnHi averages the background with white, using a constant factor.


R_DrawColumnAverageHi is used for blood and explosions; it averages the explosion/blood sprite with the background.
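For concreteness, the blending described above might look roughly like this (a sketch in C; R_ColorAverage and dc_mod are the names from the post, but the function body, the 0x00BBGGRR layout and the 16.16 fixed-point weight are my assumptions, not DelphiDoom's actual code):

```c
#include <stdint.h>

#define FRACBITS 16
#define FRACUNIT (1 << FRACBITS)   /* Doom's 16.16 fixed point */

/* Weighted average of two 0x00BBGGRR colors; frac in [0..FRACUNIT].
   A guess at what R_ColorAverage does with dc_mod as the weight. */
static uint32_t R_ColorAverage(uint32_t c1, uint32_t c2, uint32_t frac)
{
    uint32_t inv = FRACUNIT - frac;
    uint32_t r = ((c1 & 0xFF) * inv + (c2 & 0xFF) * frac) >> FRACBITS;
    uint32_t g = (((c1 >> 8) & 0xFF) * inv + ((c2 >> 8) & 0xFF) * frac) >> FRACBITS;
    uint32_t b = (((c1 >> 16) & 0xFF) * inv + ((c2 >> 16) & 0xFF) * frac) >> FRACBITS;
    return r | (g << 8) | (b << 16);
}
```

Per the descriptions above, R_DrawFuzzColumnHi would call something like this with a white constant and a fixed factor, and R_DrawColumnAverageHi with the background pixel as the second operand.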

jval said:

Back in the '90s, classic Doom ran fine on a 486DX, which is about 50 or 100 times slower than a P4!

Considering you're calculating about 20 times as many pixels and writing 80 times as much data to video RAM, it has to be. Not being able to use palettes for acceleration, plus filtered textures and transparency, increases the calculations further. So the performance doesn't seem that bad for a software renderer.

However, the first reason for a 32-bit renderer I can think of is getting rid of those ugly light transitions. Look at the floor in your first screenshot. That's what I call the 8-bit limitation in its full glory. I find it more distracting than unfiltered textures.

LogicDeLuxe said:

However, the first reason for a 32-bit renderer I can think of is getting rid of those ugly light transitions. Look at the floor in your first screenshot. That's what I call the 8-bit limitation in its full glory. I find it more distracting than unfiltered textures.



Well, the reason I was initially thinking of a 32-bit renderer was to get rid of the blocky walls when you are right next to a wall:

Normal detail (original)



Ultra detail (32 bit weight averaging)


I haven't worked much on the 32-bit software renderer in my port,
just a couple of days. Then I found that I could very easily create
the new transparent blood/explosion effect and the new replacement for
the fuzz column (which looks really bad, especially at higher screen
resolutions).
I focused so much on those that I totally forgot
the 'clumsy' colormaps!!!! It's certainly worth checking this out!
Thanks!!!!!


I think rendering the Spectre as a lighter-colored silhouette of the Demon looks odd. It's just my opinion, but I'd either leave it in its original black static form or render it as a semi-transparent Demon, like in PSX Doom.

jval said:

Well, the reason I was initially thinking of a 32-bit renderer was to get rid of the blocky walls when you are right next to a wall

You can compare your ultra quality with the quality of PrBoom 2.3.1's 32-bit software renderer with filters:
http://prboom-plus.sourceforge.net/prboom-filter.png

P.S. I have checked the speed of prboom from trunk in 32-bit software mode at 1280x1024 on my P4 3GHz. I got more than 100 fps in most cases, and 250 fps at 800x600 (without filters). PrBoom 2.3.1 in the same mode works much slower, approximately like your port. So you could look at revisions r1578, r1823 and r2246 of the prboom SVN.


Having a quick look at the PrBoom source, I managed to mimic the filter technique,
and after some (very fast and totally unoptimized) changes I managed
to get the following result:




The result is, let's say, acceptable! I like it!

The problem now is performance: it's about 25% slower than my previous routine (still unoptimized).

entryway said:

heh, you are fast


I just applied my filter routine not only to a second
dc_source, but also to the top/bottom pixel neighbours
of each column. Just 3-4 lines of code :-)
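The extra blend with the vertical neighbour can be sketched like this (my reconstruction, not the port's actual code; the names, the vertical tiling and the 16.16 texture coordinate are assumptions):

```c
#include <stdint.h>

static const uint32_t demo_col[2] = { 0x000000, 0x0000FE };  /* tiny demo column */

/* Linear filtering along a texture column: blend texel y with texel y+1
   by the fractional part of the 16.16 texture coordinate 'frac'. */
static uint32_t SampleColumnFiltered(const uint32_t *col, int height, uint32_t frac)
{
    int y  = (int)(frac >> 16) % height;
    int y2 = (y + 1) % height;            /* Doom columns tile vertically */
    uint32_t f   = frac & 0xFFFF;         /* weight of the next texel */
    uint32_t inv = 0x10000 - f;
    uint32_t r = ((col[y] & 0xFF) * inv + (col[y2] & 0xFF) * f) >> 16;
    uint32_t g = (((col[y] >> 8) & 0xFF) * inv + ((col[y2] >> 8) & 0xFF) * f) >> 16;
    uint32_t b = (((col[y] >> 16) & 0xFF) * inv + ((col[y2] >> 16) & 0xFF) * f) >> 16;
    return r | (g << 8) | (b << 16);
}
```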


Your work is totally admirable and I respect it, as it finally makes something requested long ago available, in some form.

OK, IMHO it's not very easy to tell the difference from the 8-bit renderer, except under particular lighting conditions/rooms; however, it's obvious there is no "1-bit" effect when looking at monsters placed far away.

What you really need is some speed optimization, perhaps by changing the way the screen is drawn. I think that performing pixel averaging etc. on each frame is really overkill; you need a simpler method, even at the expense of decreased visualization accuracy.


Nice work, interesting to see someone finally attempt that within the original DOOM software renderer.

Maes said:

What you really need is some speed optimization, perhaps by changing the way the screen is drawn. I think that performing pixel averaging etc. on each frame is really overkill; you need a simpler method, even at the expense of decreased visualization accuracy.


It's possible to get a good speed optimization by using precalculated tables for averaging pixel values,
but right now my averaging function has an accuracy of FRACUNIT (i.e. 0xFFFF). If I use tables with an
accuracy of only 16 values I need a table of size 256x256x16 = 1 MB, or
if I use a table with an accuracy of 256 values I need a table of 256x256x256 = 16 MB of memory!!

LogicDeLuxe said:

However, the first reason for a 32-bit renderer I can think of is getting rid of those ugly light transitions. Look at the floor in your first screenshot. That's what I call the 8-bit limitation in its full glory. I find it more distracting than unfiltered textures.


I tried to get rid of the colormaps that produce the ugly light transitions by calculating the light value depending on the sector's
lightlevel and the view distance. I didn't succeed in getting a correct result; the lighting is neither accurate nor correct, but it looks good:











Can anyone post a simple function/algorithm for the light-level
calculation depending on the sector's light level and the view distance?
I currently use this for the light calculation:

llevel = (sector->lightlevel + extralight << 4) * 256 - distance >> 11;
if (llevel < 0) llevel = 0;
if (llevel > 0xFFFF) llevel = 0xFFFF;


which calculates the light level in the range [0..0xFFFF].
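For reference, vanilla Doom doesn't use a closed formula at all; it picks one of 32 colormaps from a precomputed zlight table. A continuous analogue, loosely paraphrasing the constants in r_main.c (the R_LightLevel name, the simplified integer 'scale' argument and the final mapping to [0..0xFFFF] are my assumptions), could look like:

```c
#include <stdint.h>

#define LIGHTLEVELS   16
#define LIGHTSEGSHIFT  4
#define NUMCOLORMAPS  32
#define DISTMAP        2

/* Returns a light level in [0..0xFFFF], 0xFFFF = full bright.
   'scale' plays the role of vanilla's projected scale (larger = closer). */
static int R_LightLevel(int sectorlight, int extralight, int scale)
{
    int lightnum = (sectorlight >> LIGHTSEGSHIFT) + extralight;
    if (lightnum < 0) lightnum = 0;
    if (lightnum >= LIGHTLEVELS) lightnum = LIGHTLEVELS - 1;

    /* vanilla: level = startmap - scale/DISTMAP, clamped to a colormap index */
    int startmap = ((LIGHTLEVELS - 1 - lightnum) * 2) * NUMCOLORMAPS / LIGHTLEVELS;
    int level = startmap - scale / DISTMAP;
    if (level < 0) level = 0;
    if (level >= NUMCOLORMAPS) level = NUMCOLORMAPS - 1;

    /* colormap 0 is brightest; invert and stretch to [0..0xFFFF] */
    return (NUMCOLORMAPS - 1 - level) * 0xFFFF / (NUMCOLORMAPS - 1);
}
```

Like vanilla, this gets brighter as the projected scale grows (close walls), and the sector light only selects the starting colormap band.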

jval said:

I currently use this for the light calculation:

llevel = (sector->lightlevel + extralight << 4) * 256 - distance >> 11;
if (llevel < 0) llevel = 0;
if (llevel > 0xFFFF) llevel = 0xFFFF;


which calculates the light level in the range [0..0xFFFF].


You could explicitly optimize the *256 part by performing a left shift by 8 positions (<< 8, shl 8), instead of hoping the compiler will do so.
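This is plain strength reduction; the two forms are interchangeable for unsigned integers (a trivial illustrative check):

```c
#include <stdint.h>

/* Multiplying by a power of two equals a left shift, which is
   what the suggested hand-optimization relies on. */
static uint32_t times256_mul(uint32_t x)   { return x * 256; }
static uint32_t times256_shift(uint32_t x) { return x << 8; }
```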

jval said:

It's possible to get a good speed optimization by using precalculated tables... if I use a table with an accuracy of 256 values I need a table of 256x256x256 = 16 MB of memory!!


You could actually store only half of these values, since e.g. average(246, 137) = average(137, 246), but you will need an extra ordering step before consulting the table. In any case, the "table" would be "jagged", i.e. rows would have different lengths depending on the first operand (which is always greater than or equal to the second operand).

E.g.:

average: array[0..255] of array of byte;
then allocate a variable length for each sub-array, so the final result looks like this:
average[0][0] := 0
average[1][0] := 0, average[1][1] := 1
average[2][0] := 1, average[2][1] := 2, average[2][2] := 2
etc.

To use the table for lookups, the first index must always be greater than or equal to the second. This helps cut down the size.

You may, however, have noticed something... and that is 0+0 = 0, 0+1 = 1, 1+0 = 1, 2+0 = 2, 2+1 = 3, etc.

That is, each combination of two operands, no matter their order, results in a unique index, so if you create a one-dimensional array (again, with the above structure), you can immediately get the average by performing a single addition, e.g. average(123,127) = average[123+127], provided the array is organized as above.

You may think you need an extra addition, but that's really pointer arithmetic, which would have been done anyway by the compiler when using a multidimensional array.
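The sum-indexed idea can be sketched as follows (my reconstruction; for a plain (a+b)/2 average a one-dimensional table of 511 entries is enough, since the result depends only on a+b):

```c
#include <stdint.h>

static uint8_t avg_table[511];       /* index = a + b, with a, b in 0..255 */
static int avg_table_ready = 0;

static void BuildAvgTable(void)
{
    for (int s = 0; s < 511; s++)
        avg_table[s] = (uint8_t)(s / 2);  /* (a+b)/2 depends only on a+b */
    avg_table_ready = 1;
}

static uint8_t Average(uint8_t a, uint8_t b)
{
    if (!avg_table_ready)
        BuildAvgTable();
    return avg_table[a + b];         /* one addition, no ordering check needed */
}
```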


The issue is not optimizing the lightlevel calculation (which has a small overhead) but getting a correct calculation.

llevel = (sector->lightlevel + extralight << 4) * 256 - distance >> 11; -> does not give correct results!!!


The "ultra detail" is so blurry up close that it rapes my eyes and makes them bleed.

(read: It needs to be A LOT sharper to look good; that's just ugly)


Something else... I assumed so far you're "averaging" between 8-bit values, right?

In that case you only need about 128*256 bytes, nowhere close to the 1 MB or 16 MB you mentioned, and that's with a full range of 256 values.

In case you're averaging between FRACUNITs, which would effectively be 16-bit shorts, then yeah, that would be overkill. The "half memory" trick still applies, but unfortunately not the "cheap index" one :-(

Another note: since you originally mentioned only a 256*256 array, you are only considering 8-bit values, or at least only an 8-bit part of the FRACUNIT value, which in turn only makes sense mapped to 8-bit values anyway, and not e.g. the average of 64K values against another 64K values, which yeah, would be tremendous and way larger than 16 MB.

The average of two 8-bit integers is an 8-bit integer itself. You can map it to a 16-bit value too, if you wish, but that doesn't change much.
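The "half memory" layout can be flattened into a triangular array of 256*257/2 = 32,896 bytes, close to the 128*256 figure above (a sketch; the index formula is the standard triangular-number layout, not code from the thread):

```c
#include <stdint.h>

static uint8_t tri_table[256 * 257 / 2];   /* ~32 KB instead of 64 KB */
static int tri_ready = 0;

/* Row a holds entries for b = 0..a, so row a starts at a*(a+1)/2. */
static int TriIndex(int a, int b)          /* requires a >= b */
{
    return a * (a + 1) / 2 + b;
}

static void BuildTriTable(void)
{
    for (int a = 0; a < 256; a++)
        for (int b = 0; b <= a; b++)
            tri_table[TriIndex(a, b)] = (uint8_t)((a + b) / 2);
    tri_ready = 1;
}

static uint8_t TriAverage(uint8_t a, uint8_t b)
{
    if (!tri_ready)
        BuildTriTable();
    if (a < b) { uint8_t t = a; a = b; b = t; }  /* the extra ordering check */
    return tri_table[TriIndex(a, b)];
}
```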


Yes, I can reduce the size of the table, but then there is an extra check
for the greater byte value.

Maes said:

That is, each combination of two operands, no matter their order, results in a unique index, so if you create a one-dimensional array (again, with the above structure), you can immediately get the average by performing a single addition, e.g. average(123,127) = average[123+127], provided the array is organized as above.



This technique, as far as I understand it, gives an average "div 2"
of two byte values. I need an average(123, 127, factor) function,
where the factor can take an acceptable range of values (after
some tests, the result is good enough if I use 16 or 32 different
factor values).

e.g. if a range of 16 factor values is used:
average(1, 64, 0) = 0
average(1, 64, 1) = 4
average(1, 64, 2) = 8
average(1, 64, 3) = 12
average(1, 64, 4) = 16
average(1, 64, 5) = 20
average(1, 64, 6) = 24
average(1, 64, 7) = 28
average(1, 64, 8) = 32
average(1, 64, 9) = 36
average(1, 64, 10) = 40
average(1, 64, 11) = 44
average(1, 64, 12) = 48
average(1, 64, 13) = 52
average(1, 64, 14) = 56
average(1, 64, 15) = 60
average(1, 64, 16) = 64

but
average(32, 33, 0) = 32 != average(1, 64, 0);
with the sum-index trick both would map to the same entry,
average[32 + 33] = average[1 + 64] = average[65], which cannot hold both results.
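The objection can be demonstrated directly, using the weight definition given later in the thread (weight of x is 16 - f, weight of y is f): two pairs with the same sum produce different results, so a table indexed by a + b alone cannot hold both:

```c
/* Weighted average with a 0..16 factor, per the definition in the thread. */
static int WeightedAvg(int x, int y, int f)
{
    return (x * (16 - f) + y * f) / 16;
}
```

Note that for f = 8 this collapses to the plain (x+y)/2 average, which is the only case where the sum-index trick would still work.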

Maes said:

Something else... I assumed so far you're "averaging" between 8-bit values, right?

In that case you only need about 128*256 bytes, nowhere close to the 1 MB or 16 MB you mentioned, and that's with a full range of 256 values.

In case you're averaging between FRACUNITs, which would effectively be 16-bit shorts, then yeah, that would be overkill. The "half memory" trick still applies, but unfortunately not the "cheap index" one :-(

Another note: since you originally mentioned only a 256*256 array, you are only considering 8-bit values, or at least only an 8-bit part of the FRACUNIT value, which in turn only makes sense mapped to 8-bit values anyway, and not e.g. the average of 64K values against another 64K values, which yeah, would be tremendous and way larger than 16 MB.

The average of two 8-bit integers is an 8-bit integer itself. You can map it to a 16-bit value too, if you wish, but that doesn't change much.


I'm considering the 8-bit values of each R, G, B component of
curpal (the current palette, i.e. the 'PLAYPAL' lump as transformed by the
gamma correction tables).

Jodwin said:

The "ultra detail" is so blurry up close that it rapes my eyes and makes them bleed.

(read: It needs to be A LOT sharper to look good; that's just ugly)


The blurry detail is the result of using a wide range of factors.

If I reduce the factor to 16 values it looks like this:



If I reduce the factor to 4 values it looks like this:



Note: both images are rendered using a lower averaging factor
for the next column / next pixel, but with a FRACUNIT-accuracy factor for the (wrong) lighting without colormaps.


When I render the view using colormaps (and not "dynamic" lighting) and a range of 4 averaging-factor values:



It's not so blurry, and it needs a 256x256x4 = 256 KB memory table for averaging.


You could explicitly optimize the *256 part by performing a left shift by 8 positions (<<8, shl 8), instead of hoping the compiler will do so.

All modern compilers without exception (and even Delphi) know much more than you might suppose.

i := GetTickCount * 48 =>

  call GetTickCount
  mov ebx, eax
  shl ebx, $04
  lea ebx, [ebx+ebx*2]
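The emitted instructions compute x * 48 as (x << 4) * 3, with the lea performing the multiply by 3; this is easy to verify in C (illustrative only):

```c
#include <stdint.h>

/* Mirrors the compiler's output: shl ebx,4 then lea ebx,[ebx+ebx*2],
   i.e. multiply by 16, then by 3: 16 * 3 = 48. */
static uint32_t times48_lea(uint32_t x)
{
    uint32_t t = x << 4;        /* shl ebx, $04 */
    return t + t * 2;           /* lea ebx, [ebx+ebx*2] */
}
```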


I do use Delphi, but the bulk of my programming tasks (and my area of expertise) is in Java. I've learned not to trust the language's JIT "compiler": usually you have to do really gross things like manually inlining code or hoisting loop variables instead of hoping the JIT compiler does it for you... and these things DO cause tremendous performance differences.

In any case:

@jval: I can't quite understand what kind of function you're trying to implement. The standard definition of the average of two numbers is (a+b)/2. There is also the weighted average, which requires two extra factors c and d: (ac+bd)/(c+d) if c and d are integers, or xa+yb, where x=c/(c+d) and y=d/(c+d).

The only case where "your" function matches the normal definition of average is:

average(1, 64, 8) = 32

In any case, no matter what you're trying to implement, I see you need a third parameter... and I'm asking: is it really necessary? First of all, is the filtering between columns absolutely necessary? It's like trying to emulate bilinear hardware texture filtering in software... is that really what your port was meant to do?

I thought it was only about bringing colormap-free lighting variations and freedom from the 8-bit palette, so if it was my project I'd strive to get that part done first: a pure software, unfiltered renderer which eliminates the need for colormaps and is able to apply lighting effects that look similar to Doom's but with 32-bit depth.

Texture filtering done in software? People spend hundreds of dollars exactly to avoid doing that... I mean, it should not be a priority in your project, IMHO. Try getting the light levels correct first and leave the engine with a crisp, unfiltered look for now.

More importantly, try adding support for 24-bit and 32-bit textures and sprites, which is after all what a 32-bit rendering port should be all about: I'd like to be able to load a custom texture/sprite without using the restrictive Doom palette, yet see the same cool Doom lighting effects applied to it, as if they were extended from 8-bit to 16- or 32-bit (AFAIK, only PSX Doom has a true 16-bit renderer which extends Doom's own).

Maes said:

and these things DO cause tremendous performance differences.


A better algorithm can also cause tremendous performance differences.
I used a precalculated table for color averaging and noticed an FPS
increase of 40-45%!!!! But unfortunately nothing is free: the
table needs 4 MB of RAM! I decreased it to 1 MB without any noticeable
loss, but a 32-bit software renderer needs a Pentium 4 class computer
anyway, so there is a little room for a 4 MB memory "waste"... Anyway, it can
be changed very easily; I just need to change a constant DEFINE
(AKA const), and now is not the time for me to find the
happy medium between memory usage and rendering accuracy. The
No. 1 priority for me right now is the correct lighting calculation
without the colormaps.

Maes said:

I can't quite understand what kind of function you're trying to implement.


My function average(x, y, f) implements weighted pixel averaging. The x and y parameters are palette indexes. The value of the f parameter corresponds to the weight of each of the two pixels: if f is in the range 0..16, then the weight of x is (16 - f) and the weight of y is f.

e.g.:

average(100, 200, 10) = Average32(currentpalette[100], currentpalette[200], 10)

#define FACTORRANGE 16

typedef unsigned long ulong;
typedef unsigned char byte;

ulong Average32(ulong c1, ulong c2, int factor)
{
  byte r1, g1, b1;
  byte r2, g2, b2;
  ulong r, g, b;
  int factor1;

  r1 = c1;
  g1 = c1 >> 8;
  b1 = c1 >> 16;
  r2 = c2;
  g2 = c2 >> 8;
  b2 = c2 >> 16;

  factor1 = FACTORRANGE - factor;
  r = ((r2 * factor) + (r1 * factor1)) / FACTORRANGE;
  g = ((g2 * factor) + (g1 * factor1)) / FACTORRANGE;
  b = ((b2 * factor) + (b1 * factor1)) / FACTORRANGE;
  return r + (g << 8) + (b << 16);
}
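The 4 MB precalculated table mentioned earlier matches caching this function for every pair of palette indexes and every factor: 16 * 256 * 256 entries of 4 bytes is exactly 4 MB. A sketch of such a cache (the grayscale palette is a stand-in for curpal, and restricting factors to 0..15 is my assumption to make the arithmetic come out to 4 MB):

```c
#include <stdint.h>

#define FACTORRANGE 16

static uint32_t curpal[256];   /* stand-in for the real palette */

static uint32_t Average32(uint32_t c1, uint32_t c2, int factor)
{
    int f1 = FACTORRANGE - factor;
    uint32_t r = (((c2 & 0xFF) * factor) + ((c1 & 0xFF) * f1)) / FACTORRANGE;
    uint32_t g = ((((c2 >> 8) & 0xFF) * factor) + (((c1 >> 8) & 0xFF) * f1)) / FACTORRANGE;
    uint32_t b = ((((c2 >> 16) & 0xFF) * factor) + (((c1 >> 16) & 0xFF) * f1)) / FACTORRANGE;
    return r + (g << 8) + (b << 16);
}

/* 16 factors x 256 x 256 pairs x 4 bytes = 4 MB, matching the figure above. */
static uint32_t avg_cache[FACTORRANGE][256][256];
static int cache_ready = 0;

static void BuildAvgCache(void)
{
    for (int i = 0; i < 256; i++)            /* toy grayscale palette */
        curpal[i] = i | (i << 8) | (i << 16);
    for (int f = 0; f < FACTORRANGE; f++)
        for (int x = 0; x < 256; x++)
            for (int y = 0; y < 256; y++)
                avg_cache[f][x][y] = Average32(curpal[x], curpal[y], f);
    cache_ready = 1;
}

/* The hot path becomes a single lookup instead of six multiplies. */
static uint32_t AverageLookup(int x, int y, int f)
{
    if (!cache_ready)
        BuildAvgCache();
    return avg_cache[f][x][y];
}
```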

Now about colormap issues:

32-bit software rendering using colormaps:



32-bit software rendering with dynamic lighting (the light calculation is wrong, sorry):



Direct3D rendering (from a project I made a long time ago; sorry, no lighting at all!) with texture filtering:




If you take a close look at the above pictures you will notice that the
Direct3D renderer does slightly better texture filtering than
the software renderer, but during gameplay it's not noticeable.

Maes said:

...try adding support for 24-bit and 32-bit textures and sprites...


This could be a little easier in a Direct3D/OpenGL renderer; in a software renderer it could be more complex, especially when dealing with the R_GenerateComposite function that builds composite columns. It would be easier for flats, and for textured walls that don't need composited images built from WALL lumps.

jval said:

(thorough explanation)


OK, now it makes more sense, thanks for the explanation. However, IMHO 4 MB or even 16 MB of overhead is really not much nowadays, when the average system has 1 GB of RAM. I'd trade it in any day for a speed boost (which you can never have enough of).

jval said:

This could be a little easier in a Direct3D/OpenGL renderer; in a software renderer it could be more complex, especially when dealing with the R_GenerateComposite function that builds composite columns. It would be easier for flats, and for textured walls that don't need composited images built from WALL lumps.


It seems like the whole point of having a 32-bit software renderer is being missed, if the above is true: why not implement a clean, "unfiltered" renderer first? If somebody wanted filtering, they'd use something hardware-accelerated, IMHO. But maybe it's just me... I'm only "complaining" because the filtering seems to be taking up a significant portion of the development of your otherwise very cool project, while it's something you'd not expect in a "software renderer".

So... in practice what you're accomplishing so far is a software renderer similar to the standard one in its inner workings (colormaps, sprites, textures) but expanded to 32-bit color depth, with some effects implemented through real-time/precomputed formulas, and with a 32-bit image-filtering layer "on top" (correct me if I'm wrong).

I don't know how much of the inner workings is actually performed in 32-bit color depth (e.g., do you convert loaded sprites and textures to 32-bit "without palette" before drawing them? Are all lighting effects like e.g. Berserk implemented by altering e.g. the red component of screen pixels instead of applying a palette? Etc.)


why not implement a clean, "unfiltered" renderer first?

What is the sense of a 32-bit software renderer without filtering for a compatible port? Transparency?

If somebody wanted filtering, he'd use something hardware accelerated IMHO

A 32-bit software renderer makes sense because there are no OpenGL ports which can emulate all the software tricks and lighting. GZDoom comes closest, but is still far from vanilla.


Maes: In fact, what I've accomplished so far is a 32-bit software renderer
"playing" with texture filtering and light effects. A TRUE 32-bit
software renderer would also render 32-bit color images; possible,
but quite difficult for me at this time. Flats can be replaced
by true-color images quite easily, but walls and sprites need work!!!
I don't filter the columns when loading them from the wad; the renderer
does the filtering and the lighting effects.
Also, a better 32-bit software renderer should not use a z-axis shift
for up/down look, but should render as a true 3D engine!! (months of work...)

entryway: A 32-bit software renderer should do filtering! Even
Direct3D without filtering looks bad!!

Latest samples (fixed colormaps):

Normal detail:



Ultra detail: (no colormap tables)




As LogicDeLuxe said, the "ugly light transitions" of the "8-bit limitation" are gone!! Notice the smooth light transition
in the second image.

entryway said:

What is the sense of 32-bit software renderer without filtration for compatible port? Transparency?


Quite obviously it's to allow people creating custom graphics, for whatever reason, to use all 32 bits for their images. If you just implement filtering but don't add support for 32-bit sprites, textures and other images, the whole thing is pretty much pointless.

Personally, I'd much rather have a renderer with support for 32-bit images while still emulating vanilla lighting behaviour (not necessarily the "ugly transitions" though), without any kind of filtering.

Jodwin said:

If you just implement filtering but don't add support for 32 bit sprites, textures and other images, the whole thing is pretty much pointless.

Why? It already looks perfect with the original game data. The only problem is speed.

