
# Dynamic wiggle/Tall Sector fix for fixed-point software renderer


If you don't like the idea of having precalculated seg lengths, you can calculate them on the fly with a slightly modified R_PointToDist that returns the distance between two points instead of between a point and (viewx,viewy).

Well, I couldn't get my super-duper mind-blowing code to work. I was going to impose an opposite error upon the calculation to cancel out the wobble. I gave it a half-assed effort and didn't have luck.

```
//	rw_distance = FixedMul(hyp, sineval);
rw_distance = (int)(hyp * sin((distangle * .00000000146291808f)));
```
That eliminates the long wall error, and is probably pretty fast. Of course, it uses that dirty floating point :)

The point is that, it seems that the only real problem is in the finesine table. If you look at the middle entries, you'll see quite a few angles all map to the same 16-bit sine value. finesine stores 32-bit fixed sine values, but only 16 of those bits are set (the upper 16 bits are all zero). I am considering shifting the finesine values left 8 or 16 bits or so, and filling in the low half properly, to see if the table can still be used.

If I read the code correctly, hyp can be no larger than 8192 (see question below), so I could use 2 extra finesine bits right away without overflow, if they were available.

Linguica's approach is the most mathematically correct approach, but I think this might be faster, especially with a more precise finesine. Actually, I am going to create a new finesine table: "finersine" :), still fixed-point, but maybe shifted 8 bits left in comparison to finesine. This new table will be the same size, will be indexed the same way, and will only be used for rendering.

Can anyone verify that R_PointToDist cannot return values higher than 8192 fixed-point?

Calculating the inverse square root of the length of segs is the perfect opportunity to use this famous function:

```
float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                       // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//  y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

    return y;
}
```

Linguica said:

Calculating the inverse square root of the length of segs is the perfect opportunity to use this famous function:

Yeah, I've been wanting to use that formula ever since I saw it. It's one of those things that shouldn't work, but does.

I guess some profiling is in order:
. Original w/long wall error
. Linguica's fix
. Linguica's fix w/Entryway's optimizations and pre-calc inv_length
. Linguica's fix w/Entryway's optimizations and magic inv sqrt
. kb's original code + one-line floating-point sin() cheat
. kb's original code + more precise "finersine[]" table

Maybe I'll get some time to try them all, but I may need to wait for the weekend.

Quasar said:...Sticking to it religiously for Doom is an exercise in frustration beyond a certain point...

If this stuff is just an intellectual exercise, that's one thing... being religious about fixed point - then I don't think that's a good idea at all...

Linguica said:

...I'm partial to thinking about optimizations / improvements that *could* have been done in the original DOS engine in 1993. It's more interesting that way...

Of course, you both have very valid points. Here's a third viewpoint, for what it's worth (actually a bunch of small points):

. Doom can be compiled to run on a lot of hardware, with mixed support for floating-point, some very good, some very slow.
. Conversion to/from fixed/float must be done properly, so it's not slow.
. If you rewrite the whole renderer with floating-point in mind, you can create an awesome, fast renderer (like Cardboard). But it's sort of all or nothing - you don't want to be doing conversions everywhere.
. Yes, computers are very fast now, but Doom must do a lot more than it did in '93: 100x the pixels, massive limit-removing levels, 15K+ monster levels, etc.

If I'm in a peer-to-peer coop game with, say, 2 others, playing a huge level, if that single float calculation is even a millisecond slower, multiplied by 500 walls, 35 fps, that just might push me past that 35 fps boundary, where I lose a frame. It matters.

In other words, fixing these old renderer bugs better not slow me down much. I have gotten used to my renderer's speed. And, yes, it's kinda neat to see what could have been done back in '93.

Having said all that, yes, a lot of these fixes are in the realm of creating what I call "poor man's floating point" with fixed-point math, and it can get quite ridiculous. For me, this is an intermediate step towards full conversion to floating-point, which I'm not ready for yet.

And, finally, these old bugs have frustrated me and others for years. I want to know why they occur, and I want to finally put them down!

Can anyone verify that R_PointToDist will never return a value higher than 8192? Thanks in advance.

Another issue with long walls:

Looks like precise value for rw_offset does help:

```
// rw_offset = FixedMul (hyp, -finesine[offsetangle >>ANGLETOFINESHIFT]);
double dx = viewx - curline->v1->x;
double dy = viewy - curline->v1->y;
double hyp = sqrt(dx*dx+dy*dy);
double a = (double)offsetangle/(1<<19)*2*M_PI/8192;
rw_offset = -sin(a)*hyp;
```

Linguica said:

Calculating the inverse square root of the length of segs is the perfect opportunity to use this famous function:
*snip*

Curious, does Q_rsqrt still work without -fno-strict-aliasing in GCC and clang?

Quasar said:

Curious, does Q_rsqrt still work without -fno-strict-aliasing in GCC and clang?

```
$ gcc -O2 -Wall -o check check.c
check.c: In function ‘Q_rsqrt’:
check.c:19:2: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
i  = * ( long * ) &y;                       // evil floating point bit level hacking
```
Of course, the Doom engine is hardly one to talk, since its entire functionality is built around type punning and void pointers...
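For reference, one common way to avoid that warning is to copy the bits with `memcpy` into a fixed-width integer instead of casting pointers. This is a sketch, not code from any port: `rsqrt_memcpy` is a hypothetical name, and it assumes `float` is a 32-bit IEEE 754 type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same approximation as Q_rsqrt, but the float bits are moved with memcpy,
   which is allowed under the strict-aliasing rules. */
static float rsqrt_memcpy(float number)
{
    uint32_t i;
    float y = number;
    const float x2 = number * 0.5f;

    assert(sizeof(float) == sizeof(uint32_t)); /* assumption: 32-bit float */
    assert(number > 0.0f);

    memcpy(&i, &y, sizeof i);        /* reinterpret the float bits as an integer */
    i = 0x5f3759dfu - (i >> 1);      /* initial guess via the magic constant */
    memcpy(&y, &i, sizeof y);        /* back to float */
    y = y * (1.5f - (x2 * y * y));   /* one Newton-Raphson iteration */
    return y;
}
```

Using `uint32_t` instead of `long` also sidesteps platforms where `long` is 64 bits wide, where the original cast would read past the float.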

entryway said:

Another issue with long walls:

Looks like precise value for rw_offset does help:

```
// rw_offset = FixedMul (hyp, -finesine[offsetangle >>ANGLETOFINESHIFT]);
double dx = viewx - curline->v1->x;
double dy = viewy - curline->v1->y;
double hyp = sqrt(dx*dx+dy*dy);
double a = (double)offsetangle/(1<<19)*2*M_PI/8192;
rw_offset = -sin(a)*hyp;
```

What code were you using when you made the video? Does the code quoted above fix the video's issue, or does it cause the issue to occur?

kb1 said:

What code were you using when you made the video? Does the code quoted above fix the video's issue, or does it cause the issue to occur?

Yes, the brute-force code above fixes the issue. The original calculation of rw_offset is commented out in the first line.

entryway said:

Yes, the brute-force code above fixes the issue. The original calculation of rw_offset is commented out in the first line.

What happens if you use my first formula?

I believe that the call to R_PointToDist is ok to calculate hyp. I think the only thing that causes long wall error is bad finesine values.

So, what happens in that area in the map you posted, if you change your code to this below?

```
// // rw_offset = FixedMul (hyp, -finesine[offsetangle >>ANGLETOFINESHIFT]);
// double dx = viewx - curline->v1->x;
// double dy = viewy - curline->v1->y;
// double hyp = sqrt(dx*dx+dy*dy);
// double a = (double)offsetangle/(1<<19)*2*M_PI/8192;
// rw_offset = -sin(a)*hyp;

rw_distance = (int)(hyp * sin((distangle * .00000000146291808f)));

```
EDIT: Or, if you prefer:
```
#define DOOM_ANG_TO_RAD   ((2 * M_PI / 8192) / (1<<19))
...

rw_distance = (int)(hyp * sin((distangle * DOOM_ANG_TO_RAD)));
```
Does look nicer. If this works, I'll try to make a more precise finesine table, which, if that also works, should be quite fast.

kb1 said:

What happens if you use my first formula?

I don't know what distangle is in your code. Anyway, I fixed it with nicer code.

```
const int shift_bits = 1;

int_64_t dx = (curline->v2->x - curline->v1->x) >> shift_bits;
int_64_t dy = (curline->v2->y - curline->v1->y) >> shift_bits;
int_64_t dx1 = (viewx - curline->v1->x) >> shift_bits;
int_64_t dy1 = (viewy - curline->v1->y) >> shift_bits;

int_64_t distance = (dy * dx1 - dx * dy1) / (curline->length >> shift_bits);
int_64_t offset = (dx*dx1 + dy*dy1) / (curline->length >> shift_bits);

rw_distance = (fixed_t)(distance << shift_bits);
rw_offset = (fixed_t)(offset << shift_bits);
```
You can even remove this shift_bits stuff. It's hard to get an int64 overflow there. The code becomes very simple.
```
int_64_t dx = curline->v2->x - curline->v1->x;
int_64_t dy = curline->v2->y - curline->v1->y;
int_64_t dx1 = viewx - curline->v1->x;
int_64_t dy1 = viewy - curline->v1->y;

rw_distance = (fixed_t)((dy * dx1 - dx * dy1) / curline->length);
rw_offset = (fixed_t)((dx*dx1 + dy*dy1) / curline->length);
```

I understood. You meant

```
rw_distance = (int)(hyp * cos((offsetangle * .00000000146291808)));
rw_offset = (int)(hyp * -sin((offsetangle * .00000000146291808)));
```
It helps with long walls, but not with extremely long walls (30k) on my test level above.

entryway said:

I understood. You meant

```
rw_distance = (int)(hyp * cos((offsetangle * .00000000146291808)));
rw_offset = (int)(hyp * -sin((offsetangle * .00000000146291808)));
```
It helps with long walls, but not with extremely long walls (30k) on my test level above.

Ah, ok then. Nice fix!

entryway said:

You can even remove this shift_bits stuff. It's hard to get an int64 overflow there. The code becomes very simple.

```
int_64_t dx = curline->v2->x - curline->v1->x;
int_64_t dy = curline->v2->y - curline->v1->y;
int_64_t dx1 = viewx - curline->v1->x;
int_64_t dy1 = viewy - curline->v1->y;

rw_distance = (fixed_t)((dy * dx1 - dx * dy1) / curline->length);
rw_offset = (fixed_t)((dx*dx1 + dy*dy1) / curline->length);
```

This is beautiful, thank you!

What about segs with zero length? Are they possible? Are they possible in R_StoreWallRange? Then we should check before dividing or skip them.

Segs are defined by their start and end vertices so if you have a map with 2 vertices on top of each other then yes, I guess. IIRC the game would never try to draw them anyway, since such a seg would always span 0 pixels.

Linguica said:

IIRC the game would never try to draw them anyway, since such a seg would always span 0 pixels.

Correct. Segs with zero length should be skipped in R_AddLine:

```
x1 = viewangletox[angle1];
x2 = viewangletox[angle2];

// Does not cross a pixel?
if (x1 == x2)
    return;
```

Alright, now that this thread has produced the fix for the wiggling lines and the long wall bug, here's the next curiosity that needs to get fixed. Look at the alignment of the floor flat tiles when the light flickers:

My hypothesis is that this has again to do with insufficient angle calculations: When the lights are off, all the floor has the same height, flat and lighting -- so the engine considers it as one visplane. When the lights are on, in turn, the floor turns into two different visplanes, so the engine needs to recalculate the floor texture offset from the point on where the lighting changes. Since this point is a lot closer to the player than the starting point of the formerly combined visplane, the alignment shifts by a single pixel and the floor wiggles. Can someone confirm that this makes sense?

Isn't that already fixed in prboom-plus? I definitely notice that when I play normal prboom at high resolutions I can see floor textures jittering around a lot when I turn slowly. I assumed it had to do with the affine mapping the Doom engine uses and that prboom-plus does something more perspective-correct. Or am I totally wrong?

It still flickers in PrBoom+, at least with the software renderer. Tested with SVN rev 4403, though.

I just went to that exact spot in 2.5.1.3, 8-bit rendering, 1280x960, and didn't see anything.

Sorry, but could you please try the latest 2.5.1.4.test version provided here:
http://prboom-plus.sourceforge.net/history.html

I just tried at 1600x900 with both the 8-bit and 32-bit software renderers, and the floor tiles jump visibly!

edit: it happens when I change rendering quality from "Quality" to "Speed". Looking at the source this changes the behavior of R_MapPlane():

```
// e6y
//
// [RH]Instead of using the xtoviewangle array, I calculated the fractional values
// at the middle of the screen, then used the calculated ds_xstep and ds_ystep
// to step from those to the proper texture coordinate to start drawing at.
// That way, the texture coordinate is always calculated by its position
// on the screen and not by its position relative to the edge of the visplane.
//
// Visplanes with the same texture now match up far better than before.
//
// See cchest2.wad/map02/room with sector #265
```

Linguica said:

edit: it happens when I change rendering quality from "Quality" to "Speed". Looking at the source this changes the behavior of R_MapPlane():

Ah, I see, I had this set to "Quality". Now we just need someone to translate this routine into fixed-point arithmetic. ;)

I would love to see an update to PrBoom+ with this fix applied sometime.

Breezeep said:

I would love to see an update to PrBoom+ with this fix applied sometime.

It has been applied for ages; you just need to enable it by setting Options->General->Rendering Quality from "Speed" to "Quality".

This is how it looks in fixed-point math:

```
distance = FixedMul(planeheight, yslope[y]);

ds_xstep = FixedMul(viewsin, planeheight) / abs(centery - y);
ds_ystep = FixedMul(viewcos, planeheight) / abs(centery - y);

ds_xfrac =  viewx + FixedMul(viewcos, distance) + (x1 - centerx) * ds_xstep;
ds_yfrac = -viewy - FixedMul(viewsin, distance) + (x1 - centerx) * ds_ystep;
```
Edit: Don't forget to return() early if (y == centery)!
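The point of anchoring the start coordinate at centerx can be shown in isolation: a span that starts at column x1 and steps forward lands on exactly the same texture coordinate as a span that starts directly at the later column, so splitting one visplane into two no longer shifts the texture. The sketch below covers only the x coordinate, and `plane_view_t`, `span_xfrac` and `xfrac_after_steps` are hypothetical names wrapping the formulas above, with `distance` passed in rather than taken from yslope[].

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef int32_t fixed_t;
#define FRACBITS 16
#define FRACUNIT (1 << FRACBITS)

/* 16.16 multiply with a 64-bit intermediate (relies on arithmetic right
   shift for negative values, as the engine's FixedMul does). */
static fixed_t FixedMul(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int64_t)a * b) >> FRACBITS);
}

/* Hypothetical bundle of the per-frame values used by the formulas above. */
typedef struct {
    fixed_t viewx, viewcos, viewsin, planeheight, distance;
    int centerx, centery;
} plane_view_t;

static fixed_t span_xstep(const plane_view_t *v, int y)
{
    assert(y != v->centery);   /* the caller must return early for this row */
    return FixedMul(v->viewsin, v->planeheight) / abs(v->centery - y);
}

/* Texture x coordinate at the first pixel of a span starting at column x1. */
static int64_t span_xfrac(const plane_view_t *v, int y, int x1)
{
    return (int64_t)v->viewx + FixedMul(v->viewcos, v->distance)
         + (int64_t)(x1 - v->centerx) * span_xstep(v, y);
}

/* Texture x coordinate at column x when the span started at column x1. */
static int64_t xfrac_after_steps(const plane_view_t *v, int y, int x1, int x)
{
    return span_xfrac(v, y, x1) + (int64_t)(x - x1) * span_xstep(v, y);
}
```

Because both paths reduce to the same integer expression, `xfrac_after_steps(&v, y, a, b)` equals `span_xfrac(&v, y, b)` for any columns a and b on the same row.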

Nice!

I feel like I am at a crossroads between fixed-point and floating-point. I wonder if it's worth it to dynamically switch functions to/from fixed/float, maybe based on map size. Or just go float and be done with it. I know that modern float is very fast, so should I abandon the older processors? I have a feeling that, on modern CPUs, float may even get parallelized, incurring very low cost, if written carefully.

I guess some profiling is in order. Has anyone done any timings in these areas?

There were/are source ports to choose from for older cpus, no need to cater to them anymore in 2015.

I don't think that float math should be considered slow per se anymore -- at least since the rise of the 486-DX ;) -- but converting back and forth between float and int types wastes CPU time, cf.
http://stereopsis.com/FPU.html