entryway

Sunder.wad, map11 - FPS

Yeah, the corpses were a poor example, influenced by another thread. Still, replace the word "corpses" with "living monsters" and you get the picture.

Gez said:

If I have understood the whole deal correctly anyway, the reject table speeds up line-of-sight checks, and I don't think there are many such checks performed by corpses. AFAIK, it's only used in AI routines for active monsters, such as A_Look or A_Chase, isn't it? If not, where else is it used?


A_Explode also uses it - and some special functions for Raven's monsters.

But yes, it's mostly A_Look/A_Chase.

A corpse is an inert object that won't ever call it unless it performs some special actions like dealing some radius damage periodically.

Maes said:

Yeah, the corpses were a poor example, influenced by another thread. Still, replace the word "corpses" with "living monsters" and you get the picture.


But I still don't think it'd matter much. Since it's part of the AI loop rather than the rendering loop, it runs at 35 Hz. In other words, the 100 FPS that GZDoom gained on Entryway's computer were gained because the simulation left more free time for the renderer to render useless frames, not because the frames were rendered faster. So I'm not sure the maths are that easy to extrapolate if you change the number of active monsters.
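
The argument can be put into a rough model: in a single-threaded, uncapped loop, every second has to fit 35 tics of simulation, and whatever time is left over goes to drawing frames. The sketch below assumes constant per-tic and per-frame costs, which no real map has; the function name and all numbers are made up for illustration and do not come from any port.

```c
#include <assert.h>

#define TICRATE 35  /* Doom's simulation runs 35 tics per second */

/* Estimated frames per second for a single-threaded, uncapped loop.
 * sim_ms:    assumed cost of one game tic, in milliseconds
 * render_ms: assumed cost of drawing one frame, in milliseconds
 * Returns 0 if the simulation alone already uses up the whole second. */
double estimated_fps(double sim_ms, double render_ms)
{
    assert(render_ms > 0.0);

    double left_ms = 1000.0 - TICRATE * sim_ms;  /* time left for rendering */

    if (left_ms <= 0.0)
        return 0.0;

    return left_ms / render_ms;
}
```

With a 2 ms frame, halving the tic cost from 20 ms to 10 ms moves the estimate from 150 to 325 FPS, although no individual frame became any cheaper to draw.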


Still, it is indeed overall a useful thing because it does improve speed; but it's not a very useful thing except in some rather specific scenarios. If you've got, say, fewer than 400 monsters on the map, and a fast computer, you probably won't notice anything. Sunder MAP11 has 5295 monsters when you start it.

Just as a little experiment, I made myself a small ZDoom mod that can decimate the number of monsters in a map, in a completely random way, thanks to skill-based replacement with RandomSpawners.

Here's a quick benchmark of what I got. (Yeah, my computer is a lot weaker than Entryway's. :p)
5295 monsters (normal): 90 FPS
2707 monsters (halved): 116 FPS
1309 monsters (quartered): 200 FPS
572 monsters (roughly a tenth): 215 FPS

If you're curious, said mod is here. Requires ZDoom obviously. Adds three skill levels, based on Ultra-Violence, which just randomly remove more and more monsters, each one having a random chance to just not be there. No two playthroughs will be the same -- though they will all be made ridiculously easy compared to the normal skill levels. (Not advised for use with Ultimate Doom boss levels because you'll look very stupid if the boss monsters are randomly not there for you to fight. Also, as a side effect, pain elementals are seriously nerfed since their attack has a chance not to spawn a lost soul. This wasn't deliberate but I didn't feel like bothering to fix it.)

Graf Zahl said:

A_Explode also uses it - and some special functions for Raven's monsters.

But it's still all actor logic, not rendering logic.


And this is why there are so many source ports doing it differently, and I am glad of that. Choose the one you like. (I shall duck now while you continue the war).


Interestingly, we were debating with Jodwin whether an implementation based on adjacency lists could bring some advantages.

In theory, on maps with very large numbers of sectors (e.g. 5K or more) but low connectivity (each sector sees 10 others at most, and far fewer on average), this would allow for enormous memory savings and improved cache performance, which could offset the fact that looking up a list wouldn't be an O(1) operation anymore. Keeping reject information for 5K sectors compacted into a few kB rather than spreading it over roughly 3 MB of RAM surely ain't the same from the point of view of the cache.
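
One way such a list could be laid out, as a minimal sketch only: all per-sector lists are stored back to back in one array, with an offset table saying where each sector's list starts. The type and function names are hypothetical, the index type assumes fewer than 65536 sectors, and the lookup uses a binary search over a sorted list (O(log k)) instead of the linear scan the discussion has in mind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compact visibility table. The sectors possibly visible
 * from sector s are visible[first[s]] .. visible[first[s + 1] - 1],
 * sorted in ascending order. */
typedef struct
{
    size_t          numsectors;
    const uint32_t *first;    /* numsectors + 1 offsets into visible[] */
    const uint16_t *visible;  /* concatenated, sorted per-sector lists */
} sightlist_t;

/* Returns 1 if s2 may be visible from s1, 0 if the pair can be rejected
 * without running a line-of-sight trace. */
int sightlist_may_see(const sightlist_t *sl, size_t s1, uint16_t s2)
{
    assert(s1 < sl->numsectors);

    size_t lo = sl->first[s1];
    size_t hi = sl->first[s1 + 1];

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (sl->visible[mid] == s2)
            return 1;

        if (sl->visible[mid] < s2)
            lo = mid + 1;
        else
            hi = mid;
    }

    return 0;
}
```

Under these assumptions, 5,000 sectors with 10 entries each need about 100 KB of list data plus 20 KB of offsets, against 3,125,000 bytes for a 5,000 x 5,000 bit matrix; with shorter average lists the gap grows further.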

Graf Zahl said:

Nobody denies that it helps. However, if the difference is 300 fps vs 400 fps the actual benefit is rather questionable when your monitor can only do 60. You just get more tearing artifacts...

Speeding something up from 70 to 200, that's a different matter though, because it lowers the minimum required hardware specs to play the level.

Speeding up the engine is ALWAYS beneficial. Consider this scenario:
. 2-player coop session
. relatively fast computers
. software rendering
. 1920x1080 resolution
. 500+ monster map
. blood+smoke effects
. wall splats

Now, consider that Player 1 gets hit in the face by 4 imp shots. That's 4-level deep translucency, and, there's no way that isn't going to lag. Also, at the same time, Player 2 drops a lift containing 150 corpses, to reveal a wall with overlaid splats. Possibly not anywhere as bad, but significant. Combine those with the extra blood and smoke effects, and you've got the potential for an unpleasant amount of lag. Not necessarily the sight checker's fault, but it should be obvious that, no matter how fast the computer, FPS can always be dropped to unacceptable rates.

Having said that, it should be obvious that anything that can be done to speed up operations should be considered, and the REJECT map can provably do just that. And it's simple to create - just wait a little longer.
Sure, a fixed version of the old sight checking is awesome, and fast. And even better when combined with a valid REJECT lump.
Had id understood the issue with their <=1.2 sight checking, would REJECT still exist? Yes, because it was implemented in 1.2 and below.

tempun said:

Suppose that a map has 10 000 sectors. Then the size of the REJECT table for that one map will be 12 MB. Ouch. REJECT just doesn't scale.

I'd rather it "not scale" at design time, than during a game. Besides, unless your port of choice allows a 0-length REJECT (ZDoom and friends), you will still eat up 12 MB, but, if the REJECT lump was not built, it is sitting there providing no benefit. Thanks for nothing.
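
For reference, the size figure follows from the lump holding one bit per ordered pair of sectors. The sketch below computes that size and performs the bit test in the same layout the vanilla check in P_CheckSight uses; the two function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of a full REJECT lump: one bit per ordered sector pair,
 * rounded up to a whole byte. */
size_t reject_size(size_t numsectors)
{
    return (numsectors * numsectors + 7) / 8;
}

/* Returns 1 if the lump marks s2 as never visible from s1, meaning the
 * line-of-sight trace can be skipped; 0 otherwise. Pair (s1, s2) is bit
 * number s1 * numsectors + s2, least significant bit first in each byte. */
int reject_blocks_sight(const unsigned char *rejectmatrix,
                        size_t numsectors, size_t s1, size_t s2)
{
    assert(s1 < numsectors && s2 < numsectors);

    size_t pnum = s1 * numsectors + s2;

    return (rejectmatrix[pnum >> 3] >> (pnum & 7)) & 1;
}
```

For 10,000 sectors, reject_size returns 12,500,000 bytes, which is where the "12 MB" above comes from. An all-zero lump of that size makes reject_blocks_sight return 0 for every pair, so it costs the memory without ever skipping a trace.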


Heh. I want to make a map with a large open area and yet a terrain subdivided in a ridiculous number of sectors (something that will blow the REJECT size to at least the full size of the cache), and then throw a nuts.wad horde at it ;-)

It will be interesting to compare the various methods, including no reject use, dynamic computation of reject (that would be trivial, but not always consistent, depending on which monster did the check. But then again, REJECT assumes that another sector is visible from ANYWHERE within another particular sector, or not at all), and an adjacency list, to see if the O(k) search time can still beat a LOS check and provide better cache efficiency than a large reject.

kb1 said:

Speeding up the engine is ALWAYS beneficial. Consider this scenario:


The thing is that a lot of other things could be optimized in the same way, with large static tables of precomputed data. But they can create problems as well; mostly in the way that they are static and prebuilt, meaning that if the engine ever becomes more dynamic than the algorithm expected, they'd have to be ignored or potentially create strange glitches.

Real-time sector movement and deformation (as seen, for example, in the latter half of this Shadow Warrior tech demo), if implemented in the Doom engine, could invalidate the REJECT lump. What to do then? Recalculate the reject table in real-time? Ignore it altogether? The former would be prohibitive for maps with a lot of sectors; and the latter would just make them dead weight.

And yeah, this example is from a different engine with a different architecture, but mimicking Build's possibilities is not entirely impossible: if you look at ZDoom's real-time building of mini-BSP trees, you have what could maybe serve as the first step towards this goal.

And besides, it's a speedup that only happens 35 times per second, and not once per frame. Meaning that you shouldn't necessarily expect a large boost in rendering performance, since indeed it's not a rendering optimization at all. Heck, suppose you have done the work of entirely separating the renderer logic from the simulation logic so that each can be run in its own thread. Suddenly and magically, you no longer gain any FPS increase from the REJECT lump -- because all it speeds up is simulation logic, which remains capped at 35 Hz, so provided you managed to reach this number it no longer changes anything.


Sure, any optimization is good to take, but that doesn't prevent a cost/benefit analysis from being made on them to see if they still actually are optimizations. The circumstances can change. Multicore computers weren't even thought of when Doom was being developed, for example.


Hey, as a freebie, another cost/benefit study of optimization. See this? Do you know why it happens? Because they're drawn with hyper-optimized assembly routines that only work with powers of two... Getting rid of this optimization would decrease performance a bit, but fix an annoying glitch.

Maes said:

Heh. I want to make a map with a large open area and yet a terrain subdivided in a ridiculous number of sectors (something that will blow the REJECT size to at least the full size of the cache), and then throw a nuts.wad horde at it ;-)


For fun, design the level so that each sector can actually see all others -- in other words, the REJECT lump will be filled with zeroes.

For REJECT to be very efficient, you need three things:
1. Lots of monsters (really tons of them)
2. Spread all over several sectors (not all concentrated in the same area)
3. Which cannot see each other (so that the test-skipping bit does happen)

Remove any of these elements and the boost plummets. Remove two or more and the lump is as good as useless.
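
The third condition can be measured directly from a map's lump by counting how many sector pairs are actually marked as rejected. A minimal sketch, with a made-up function name and assuming the vanilla bit layout (pair (s1, s2) is bit s1 * numsectors + s2, least significant bit first in each byte):

```c
#include <assert.h>
#include <stddef.h>

/* Fraction (0.0 to 1.0) of ordered sector pairs that a REJECT lump marks
 * as "cannot see each other". */
double reject_density(const unsigned char *rejectmatrix, size_t numsectors)
{
    assert(numsectors > 0);

    size_t pairs = numsectors * numsectors;
    size_t rejected = 0;

    for (size_t pnum = 0; pnum < pairs; pnum++)
    {
        if ((rejectmatrix[pnum >> 3] >> (pnum & 7)) & 1)
            rejected++;
    }

    return (double)rejected / (double)pairs;
}
```

A density of 0 is the all-zero lump described above, where every sight check still runs the full trace. The number says nothing about conditions 1 and 2, which depend on where the monsters actually are.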

Gez said:

For REJECT to be very efficient, you need three things:
1. Lots of monsters (really tons of them)
2. Spread all over several sectors (not all concentrated in the same area)
3. Which cannot see each other (so that the test-skipping bit does happen)

Remove any of these elements and the boost plummets. Remove two or more and the lump is as good as useless.


Hmm, you're right. A better design would probably be a sort of complex concentric maze populated with monsters, with lots of obstructions. Something like that:



(and, dammit, I'm reminded again of how cool it would be to have a program capable of turning CAD, vector drawings or edge detection images into Doom maps, even ones needing a lot of manual retouching).

Gez said:

The thing is that a lot of other things could be optimized in the same way, with large static tables of precomputed data...

If so, I'd be interested to know what they are - they might make sense to implement.

Gez said:

But they can create problems as well; mostly in the way that they are static and prebuilt, meaning that if the engine ever becomes more dynamic than the algorithm expected, they'd have to be ignored or potentially create strange glitches.

Yeah - with some as-of-yet-theoretical major engine changes, one would theoretically need to develop many different optimizations to support such changes. What I was discussing was primarily Doom with Doom's way of doing things.

Gez said:

...
For REJECT to be very efficient, you need three things:
1. Lots of monsters (really tons of them)
2. Spread all over several sectors (not all concentrated in the same area)
3. Which cannot see each other (so that the test-skipping bit does happen)

Remove any of these elements and the boost plummets. Remove two or more and the lump is as good as useless.

Or, as useless as with an empty REJECT table. The whole point is that the situation you describe is *exactly* when REJECT shines. This "300/400 FPS" debate might seem silly - when playing Doom II Map01 there's no benefit, but it's those large populated maps that really need all the performance you can throw at them. More or less, it effectively eliminates the need to calculate line-of-sight everywhere except the area the player is in.

I tend to play a lot of coop. I find it cumbersome to have to maintain two versions of each WAD - the original, and a version I've rebuilt the REJECT tables on, but, I definitely notice a difference, especially in a net game. And, yes, my home port has a menu setting for toggling out the sight check: Doom 1.9's vs. PrBoom Plus's (heretic?) adaptation, with ZDoom's "through the center" bug fix.

REJECT does make noticeable differences, in most cases.

Maes said:

...dynamic computation of reject...

I'd be interested to see that in action. (Obvious comment.) It would be especially helpful if it were enabled when a zeroed REJECT lump was loaded.


After rotating MAP11 I have only 4 fps in prboom with "-complevel 9" :)

270 fps with "-complevel 0", 135 fps without complevel and 250 fps in gzdoom.

200 fps with "-complevel 9" after building reject table with ZenNode

entryway said:

270 fps with "-complevel 0", 135 fps without complevel


Interesting that complevel 0 is exactly twice as fast as the default mode.


What do you think makes the biggest difference when the complevel is used? Does that enable the MBF AI stuff?

kb1 said:

What do you think makes the biggest difference when the complevel is used? Does that enable the MBF AI stuff?

Here - the LoS calculation. MBF AI stuff is enabled on complevels 11 (MBF) and above.

kb1 said:

What do you think makes the biggest difference when the complevel is used? Does that enable the MBF AI stuff?

At least it is not related to P_CrossSubsector. After replacing P_CrossSubsector with copy from Boom, prboom is still 100 fps with "-complevel 17" instead of 5 fps with "-complevel 9"


Entryway: Maybe try the "REJECT as 0xFF" setting to see the maximum difference in speed!

I recently added a menu option to my home source port that toggles the 1.9 vs. 1.2 sight checking, however I incorporated the ZDoom diagonal trace fix to the 1.2 code. It seems to work ok, and it's quite fast, but, having played quite a few coop games using the modified 1.2 sight-checking, I started to notice a pattern: Sometimes, whole groups of monsters failed to spawn in on those maps. The monsters never woke up. They were placed into those off-the-map spawn closets with teleport lines.
I haven't analyzed the maps yet, but I assume the map designers fiddled with the spawn box's sector references, or joined sectors, or something, specifically to get the monsters to awaken, and I suspect that that doesn't work too well with the 1.2 sight checking.
Has anyone else noticed this?

I am considering making a new setting that uses the 1.2 checking, say, 90% of the time, but occasionally calls the slower 1.9 code 10% of the time, simply to get the map designer's intended effects to work, while still being faster. This is, of course, not useful for old demo compatibility - I am trying to improve multiplayer performance, without losing the original Doom feel.
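
A minimal sketch of how such a mix could be chosen, assuming a plain call counter instead of a random roll so that every peer in a net game makes the same choice on the same call. The struct and function names are hypothetical and not taken from any port.

```c
#include <assert.h>

/* Hypothetical selector state: every period-th call picks the slow check. */
typedef struct
{
    unsigned period;  /* e.g. 10 for a 90% fast / 10% slow split */
    unsigned calls;   /* calls made since the last slow check     */
} sightmix_t;

/* Returns 1 when the current sight check should use the slower code,
 * 0 when the faster code should be used. */
int sightmix_use_slow(sightmix_t *sm)
{
    assert(sm->period > 0);

    sm->calls++;

    if (sm->calls == sm->period)
    {
        sm->calls = 0;
        return 1;
    }

    return 0;
}
```

A caller would then pick between its two sight routines based on the return value. Keeping one counter per monster rather than a single global one is probably the safer variant, since it guarantees that each monster gets a slow check now and then; whether that is enough to restore the map designers' intended behaviour is untested here.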


I duplicated some map data into separate struct to reduce cache misses in P_CrossSubsector

Timedemo on sunder.wad map11 (without shooting of course) in comparison with 2.5.1.1

doom2  (-complevel 2)  85.9 fps -> 109.1 fps = +27%
boom   (-complevel 9)  51.4 fps ->  62.9 fps = +22%
prboom (-complevel 17) 87.5 fps -> 107.3 fps = +23%
I am still not absolutely sure that the new code has no bugs and can't lead to desyncs, so the changes are not committed yet, but you can download win32 binaries from the changelog page and check.
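
The general technique, copying only the fields a hot loop reads into a small, densely packed array, can be sketched as follows. This is not the actual PrBoom-Plus change: both structs are simplified stand-ins, and the field selection is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t fixed_t;

/* Simplified stand-in for the engine's full line structure. The real one
 * carries many more fields that a sight check never reads. */
typedef struct
{
    fixed_t  x1, y1, x2, y2;
    uint16_t flags;
    char     other_data[64];  /* placeholder for specials, sidedefs, etc. */
} fullline_t;

/* Hypothetical compact copy holding only what the sight check reads, so
 * more lines fit into each cache line. */
typedef struct
{
    fixed_t  x1, y1, x2, y2;
    uint16_t flags;
} sightline_t;

/* Fills dst[0 .. count - 1] from src. Meant to run once after level load;
 * it has to be repeated if any copied field changes later. */
void build_sightlines(sightline_t *dst, const fullline_t *src, size_t count)
{
    for (size_t i = 0; i < count; i++)
    {
        dst[i].x1 = src[i].x1;
        dst[i].y1 = src[i].y1;
        dst[i].x2 = src[i].x2;
        dst[i].y2 = src[i].y2;
        dst[i].flags = src[i].flags;
    }
}
```

The price is duplicated state: the copy must stay in sync with the original, which is one place where bugs or desyncs of the kind mentioned above could creep in.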

kb1 said:

Has anyone else noticed this?

If you have a fully functional and bug-free replacement for LOS based on doom 1.2, then post it here. It will make sense for the next complevels (if they ever happen).

entryway said:

I duplicated some map data into separate struct to reduce cache misses in P_CrossSubsector

Timedemo on sunder.wad map11 (without shooting of course) in comparison with 2.5.1.1

doom2  (-complevel 2)  85.9 fps -> 109.1 fps = +27%
boom   (-complevel 9)  51.4 fps ->  62.9 fps = +22%
prboom (-complevel 17) 87.5 fps -> 107.3 fps = +23%

Very nice! How does it hold up during a game? Yes, the memory caching is so problematic for Doom, especially the renderer. I've been considering prefetch and no-cache commands, seems like they could provide some real benefit in isolated cases.

entryway said:

If you have a fully functional and bug-free replacement for LOS based on doom 1.2, then post it here. It will make sense for the next complevels (if they ever happen).


Well, I don't know if it's bug free (see my post above). Once again, upon playing a few coop games, I sometimes noticed that monsters were still in their teleport closets at the end of the maps, so I don't know if those maps rely on the 1.4 - 1.9 sight checking or not. I need to look into this further.

Basically, the code I use is almost directly lifted from ZDoom. However, I started with the PrBoom Plus code, and "adapted it" with ZDoom's fixes, so it felt like I was a part of it :) It's posted below (I use tabs, and a rather strange indent style, I must say...)

#define MAX_SIGHT_COUNT		64

int P_SightPathTraverse_New(fixed_t x1, fixed_t y1, fixed_t x2, fixed_t y2)
{
	fixed_t		xt1;
	fixed_t		yt1;
	fixed_t		xt2;
	fixed_t		yt2;
	fixed_t		xstep;
	fixed_t		ystep;
	fixed_t		partialx;
	fixed_t		partialy;
	fixed_t		xintercept;
	fixed_t		yintercept;
	
	int		mapx;
	int		mapy;
	int		mapxstep;
	int		mapystep;
	int		count;

	validcount++;
	intercept_p = intercepts;

	if (((x1 - bmaporgx) & (MAPBLOCKSIZE - 1)) == 0)
		x1 += FRACUNIT;        // don't side exactly on a line
  
	if (((y1 - bmaporgy) & (MAPBLOCKSIZE - 1)) == 0)
		y1 += FRACUNIT;        // don't side exactly on a line
  
	trace.x = x1;
	trace.y = y1;
	trace.dx = x2 - x1;
	trace.dy = y2 - y1;

	x1 -= bmaporgx;
	y1 -= bmaporgy;
	xt1 = x1 >> MAPBLOCKSHIFT;
	yt1 = y1 >> MAPBLOCKSHIFT;

	x2 -= bmaporgx;
	y2 -= bmaporgy;
	xt2 = x2 >> MAPBLOCKSHIFT;
	yt2 = y2 >> MAPBLOCKSHIFT;

	// points should never be out of bounds, but check once instead of
	// each block
	if (xt1 < 0 || yt1 < 0 || xt1 >= bmapwidth || yt1 >= bmapheight	||
		xt2 < 0 || yt2 < 0 || xt2 >= bmapwidth || yt2 >= bmapheight)
	{
		return 0;
	}

	if (xt2 > xt1)
	{
		mapxstep = 1;
		partialx = FRACUNIT - ((x1 >> MAPBTOFRAC) & (FRACUNIT - 1));
		ystep = FixedDiv(y2 - y1, abs(x2 - x1));
	}
  
	else if (xt2 < xt1)
	{
		mapxstep = -1;
		partialx = (x1 >> MAPBTOFRAC) & (FRACUNIT - 1);
		ystep = FixedDiv(y2 - y1, abs(x2 - x1));
	}
  
	else
	{
		mapxstep = 0;
		partialx = FRACUNIT;
		ystep = 256 * FRACUNIT;
	}

	yintercept = (y1 >> MAPBTOFRAC) + FixedMul(partialx, ystep);


	if (yt2 > yt1)
	{
		mapystep = 1;
		partialy = FRACUNIT - ((y1 >> MAPBTOFRAC) & (FRACUNIT - 1));
		xstep = FixedDiv(x2 - x1, abs(y2 - y1));
	}

	else if (yt2 < yt1)
	{
		mapystep = -1;
		partialy = (y1 >> MAPBTOFRAC) & (FRACUNIT - 1);
		xstep = FixedDiv(x2 - x1, abs(y2 - y1));
	}

	else
	{
		mapystep = 0;
		partialy = FRACUNIT;
		xstep = 256 * FRACUNIT;
	}
	
	xintercept = (x1 >> MAPBTOFRAC) + FixedMul(partialy, xstep);

	// [RH] Fix for traces that pass only through blockmap corners. In that case,
	// xintercept and yintercept can both be set ahead of mapx and mapy, so the
	// for loop would never advance anywhere.

	if (abs(xstep) == FRACUNIT && abs(ystep) == FRACUNIT)
	{
		if (ystep < 0)
		{
			partialx = FRACUNIT - partialx;
		}

		if (xstep < 0)
		{
			partialy = FRACUNIT - partialy;
		}

		if (partialx == partialy)
		{
			xintercept = xt1 << FRACBITS;
			yintercept = yt1 << FRACBITS;
		}
	}

	//
	// step through map blocks
	// Count is present to prevent a round off error from skipping the break
  
	mapx = xt1;
	mapy = yt1;


	for (count = 0; count < MAX_SIGHT_COUNT; count++)
	{
		if (!P_SightBlockLinesIterator(mapx, mapy))
		{
			return 0;  // early out
		}
		
		if ((mapxstep | mapystep) == 0)
			break;

		switch ((((yintercept >> FRACBITS) == mapy) << 1) | ((xintercept >> FRACBITS) == mapx))
		{
			case 0:		// neither xintercept nor yintercept match!
				// Continuing won't make things any better, so we might as well stop right here
				count = MAX_SIGHT_COUNT;
				break;

			case 1:		// xintercept matches
				xintercept += xstep;
				mapy += mapystep;
			
				if (mapy == yt2)
					mapystep = 0;
			
				break;

			case 2:		// yintercept matches
				yintercept += ystep;
				mapx += mapxstep;
			
				if (mapx == xt2)
					mapxstep = 0;
				
				break;

			case 3:		// xintercept and yintercept both match
			
				// The trace is exiting a block through its corner. Not only does the block
				// being entered need to be checked (which will happen when this loop
				// continues), but the other two blocks adjacent to the corner also need to
				// be checked.
			
				if (!P_SightBlockLinesIterator(mapx + mapxstep, mapy) ||
					!P_SightBlockLinesIterator(mapx, mapy + mapystep))
				{
					return 0;
				}
			
				xintercept += xstep;
				yintercept += ystep;
				mapx += mapxstep;
				mapy += mapystep;
			
				if (mapx == xt2)
					mapxstep = 0;
			
				if (mapy == yt2)
					mapystep = 0;
			
				break;
		}
	}
		
	// couldn't early out, so go through the sorted list
	return P_SightTraverseIntercepts();
}
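
For readers unfamiliar with the shifts used above: once bmaporgx/bmaporgy have been subtracted, a 16.16 fixed-point coordinate splits into a blockmap cell index and a position inside that cell. The constants below match the Doom source (blocks are 128 map units wide); the two helper functions are made up for this sketch and assume non-negative input, which the bounds check in the routine guarantees.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed_t;

#define FRACBITS      16
#define FRACUNIT      (1 << FRACBITS)
#define MAPBLOCKSHIFT (FRACBITS + 7)               /* 128 map units per block */
#define MAPBTOFRAC    (MAPBLOCKSHIFT - FRACBITS)

/* Blockmap column or row containing a blockmap-relative coordinate. */
int block_index(fixed_t rel)
{
    assert(rel >= 0);
    return rel >> MAPBLOCKSHIFT;
}

/* Position inside that block, as a 16.16 fraction of one block
 * (0 .. FRACUNIT - 1). This is the value partialx/partialy start from. */
fixed_t block_fraction(fixed_t rel)
{
    assert(rel >= 0);
    return (rel >> MAPBTOFRAC) & (FRACUNIT - 1);
}
```

A point 200 units from the blockmap origin lands in block 1 with a fraction of 36864, i.e. 72/128 of the way across it.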

kb1 said:

I haven't analyzed the maps yet, but I assume the map designers fiddled with the spawn box's sector references, or joined sectors, or something, specifically to get the monsters to awaken, and I suspect that that doesn't work too well with the 1.2 sight checking.
Has anyone else noticed this?



That shouldn't be possible. If the monsters are in a separated area any sight check will fail, both with the fixed 1.2 and 1.9 algorithm. Anything else would mean that sight checking is not working.

Such monsters are always awoken by sound.

Graf Zahl said:

...Such monsters are always awoken by sound.

Ah, that's right - thanks, Graf! I guess I must be having an issue with sound flooding. One example off the top of my head is map 4 of DV.wad. In the back of the "mouth" room, sometimes I can't get the monsters to spawn easily. I haven't studied how that's supposed to work though, could be totally unrelated. I seem to remember that, typically, one wall of DV spawn closets references a sector near the spawn destination - I assume that's how the monsters are supposed to wake up.

But, I've noticed many more examples of spawn failure in different wads - I'll have to research it somewhat to find a definite repeatable failure case for my home port. Kinda hard to do research when you're hosting a coop game with your friends...

After Graf cleared that up for me, I can say that the code I posted probably does work pretty well. It has no demo version check; basically it would be called instead of the 1.2-compatible version when speed vs. 1.2-demo-compatibility was desired.

One day, I'll be ready to upload v1.0 of my home port so I can refer to it instead of these code extracts.

entryway said:

I think it is noticeably smoother. You can test it yourself.

I will! Nice to see such improvements - keep up the good work!

entryway said:

If you have a fully functional and bug-free replacement for LOS based on doom 1.2, then post it here. It will make sense for the next complevels (if they ever happen).

Why not create a new complevel and disallow recording on it?

tempun said:

Why not create a new complevel and disallow recording on it?

Based on MBF+ compatibility options hell? nooooooo

