entryway

Sunder.wad, map11 - FPS


Sunder.wad map11 at start position:

-complevel 0 - 212 fps
-complevel 2 - 150 fps
-complevel 9 - 77 fps
-complevel 11 - 150 fps

Doom v1.2 (complevel 0) is nearly 3x faster than Boom (complevel 9)

Just because of

dboolean P_CheckSight(mobj_t *t1, mobj_t *t2)
{
  if (compatibility_level == doom_12_compatibility)
    return P_CheckSight_12(t1, t2);

Look at the profiler statistics for the Boom complevel:



P_CrossSubsector eats 50% of CPU time.


entryway said:

LOL, P_CrossSubsector eats 50% of CPU time.


Let's see if there is room for optimization in the P_CrossSubsector() function.

In the original source there is the following comment:

// stop because it is not two sided anyway
// might do this after updating validcount?
if (!(line->flags & ML_TWOSIDED))
return false;

EDIT: Just figured out that PrBoom+ corrects this :)


PROTIP: Sunder.wad doesn't have REJECT tables

Maes said:

PROTIP: Sunder.wad doesn't have REJECT tables

200+ fps with REJECT lump instead of 77, heh

So a REJECT table is strongly recommended for MAP11


Well that pretty much invalidates Graf's older argument that the REJECT table is no longer useful in modern ports, then.

Maes said:

Well that pretty much invalidates Graf's older argument that the REJECT table is no longer useful in modern ports, then.

P_CheckSight from (G)ZDoom is similar to doom 1.2.

After building a REJECT table, FPS increases from 300 to 400 for GZDoom on MAP11. GZDoom is much faster than glboom-plus on this stress level. It is 1.6x slower on map10, but 1.9x faster on map11.


I'm not surprised by this.

The old blockmap based CheckSight algorithm is quite a bit better than the BSP based 1.9 version.

That's why ZDoom uses this code, too, albeit a fixed version of it with no 'holes' the monsters can look through.

Maes said:

Well that pretty much invalidates Graf's older argument that the REJECT table is no longer useful in modern ports, then.


Only with the BSP algorithm. The blockmap algorithm would profit a lot less from it because by its very nature it operates on much more localized map data.

So bottom line, unless demo compatibility is important, changing the sight algorithm should always be the first step of optimization, not throwing huge chunks of data at a badly designed checking method that takes forever to generate.


Well in that older discussion I didn't realize that you were also proposing an equipotent alternative. I thought you were just advocating ditching any and all precomputed speedup tricks and just brute-forcing our way out of sight checks.

Graf Zahl said:

I'm not surprised by this.

The old blockmap based CheckSight algorithm is quite a bit better than the BSP based 1.9 version.

That's why ZDoom uses this code, too, albeit a fixed version of it with no 'holes' the monsters can look through.



Only with the BSP algorithm. The blockmap algorithm would profit a lot less from it because by its very nature it operates on much more localized map data.

So bottom line, unless demo compatibility is important, changing the sight algorithm should always be the first step of optimization, not throwing huge chunks of data at a badly designed checking method that takes forever to generate.


http://zdoom.org/wiki/REJECT

This says that REJECT is obsolete, but doesn't give a "why" other than "computers are fast now." I'd say that using a much more efficient sight check algorithm is part of the justification; perhaps the wiki page should be edited.


Feel free to do so. It's a wiki after all...

Sunder's maps are an extreme case though. On anything resembling 'normal' there won't be enough sight checks to make any noticeable difference.


That article is hardly neutral, and the very first thing it does is dismiss the REJECT table as an obsolete construct (even though it later mentions the possibility of using it for special effects, which makes it part of a level).


The article was written in 2004 by Cyb, so feel free to direct your anger at him. :p


FWIW my opinion of the REJECT table has always been that it was an optimization afterthought rather than an integral part of the solution. Personally I think that the PVIS set produced by Vavoom is of far more use; however, the time required to build it makes it rather undesirable, especially given that "modern" (i.e., open and highly detailed) maps like those in Sunder amplify the problems even more.

It is my opinion that neither solution is optimal. The problem with modern DOOM maps is that they can verge on treating sectors as the lowest geometric construct available (not explicitly, it's a by-product of the node builder). Pushing that kind of geometric detail while using the BSP algorithm for sorting is always going to be slow (especially in GL).

IMO GL ports need to forget about crunching larger numbers of sectors and subsectors and instead look at entirely different solutions.


BSP is fine for rendering where you only have to traverse the tree once. Tell me a method that's faster. Build's algorithm is the only other one I ever tried and it can't even remotely compete on large open maps.

But with sight checking, especially if there's thousands of monsters, it will definitely show some problems.

Graf Zahl said:

So bottom line, unless demo compatibility is important, changing the sight algorithm should always be the first step of optimization, not throwing huge chunks of data at a badly designed checking method that takes forever to generate.

Badly designed? It's a raw bitmap! Hard to be more tightly designed than that. And it's hard to beat an almost free early exit on the sight checking algorithm, no matter what algorithm you use. The OP's numbers prove that.

And, it's not just about getting triple-digit frames-per-second either. If you're partial to coop "zoo" maps, the REJECT table is a must - in network games, timing is critical, and a slightly slower routine can push you past the threshold of, say, a vertical sync wait, which has a cascading effect on lag.

You should always include a REJECT table, there's no excuse not to.

kb1 said:

Badly designed? It's a raw bitmap! Hard to be more tightly designed than that.

I think Graf was talking about sight checking using BSP.

kb1 said:

And it's hard to beat an almost free early exit on the sight checking algorithm, no matter what algorithm you use. The OP's numbers prove that.

You should always include a REJECT table, there's no excuse not to.

Suppose that a map has 10,000 sectors. Then the REJECT table for that one map takes 10,000 × 10,000 bits, which is about 12 MB. Ouch. REJECT just doesn't scale.
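To put a number on that estimate: the REJECT lump stores one bit per ordered pair of sectors, rounded up to whole bytes, so its size grows with the square of the sector count. A minimal sketch in C (the function name `reject_lump_size` is made up for illustration and not taken from any port):

```c
#include <stddef.h>

/* Size in bytes of a REJECT lump for a map with num_sectors sectors.
   One bit per ordered sector pair, rounded up to a whole byte.
   The function name is hypothetical, used only for this example. */
size_t reject_lump_size(size_t num_sectors)
{
    size_t bits = num_sectors * num_sectors;
    return (bits + 7) / 8;
}
```

For 10,000 sectors this gives 12,500,000 bytes, in line with the roughly 12 MB figure. Doubling the sector count quadruples the size.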

tempun said:

REJECT just doesn't scale.



That. And also consider the time to generate it. In nearly all cases the benefit-to-cost ratio is just way too low, especially if you want to distribute the maps via the internet.

What I find really ironic about the whole situation is that REJECT can speed up the v1.9 sight checking algorithm but still not make it match the older v1.2 version. And the only reason why they changed it was a trivial bug that could have been fixed with a few hours work back then.

But no, somebody apparently thought that the algorithm itself was flawed and replaced it with something far less efficient...

Graf Zahl said:

What I find really ironic about the whole situation is that REJECT can speed up the v1.9 sight checking algorithm but still not make it match the older v1.2 version.

According to entryway's data, it still helps the v1.2 version, which is what ZDoom uses. BTW, I'm also interested in map11 fps on a low-end computer. I tried it once on a netbook and it was 2 fps in PrBoom+ with default compatibility, and ~60 after the kill-monsters cheat. I then decided not to try ZDoom.

entryway said:

P_CheckSight from (G)ZDoom is similar to doom 1.2.

After building a REJECT table, FPS increases from 300 to 400 for GZDoom on MAP11. GZDoom is much faster than glboom-plus on this stress level. It is 1.6x slower on map10, but 1.9x faster on map11.


+100 FPS is a good thing.

But if you delete 90% of the monsters in Sunder, I'm pretty sure the FPS difference between REJECT and no REJECT will be a lot less noticeable. :p

tempun said:

Suppose that a map has 10 000 sectors. Then the size of the REJECT table for that one map will be 12 mb. Ouch. REJECT just doesn't scale.

As long as the inclusion of a REJECT table gives a significant enough performance boost, the size of the table is no issue these days. If you're concerned about RAM, well, 12 MB extra hasn't been a problem for a long time now. If it's hard drive space, again no problem with 500 GB drives around, plus you can keep your wads zipped since some ports (such as ZDoom and GZDoom) can load zips directly and for other ports there are launchers that temporarily extract wads from zips for playing.

Similarly download sizes are no issue with today's connections and, again, zipping makes things even better. Not to mention that if you're in such a hurry to play the latest wads that you can't wait a little bit for the download to finish there's some serious problems here.

tempun said:

According to entryway's data, it still helps v1.2 version which is in ZDoom. BTW, I'm also interested in map11 fps on a low-end computer. I tried it once on a netbook and it was 2 fps in prb+ w/default compatibility, and ~60 after kill monsters cheat. I then decided not to try ZDoom.



Nobody denies that it helps. However, if the difference is 300 fps vs 400 fps the actual benefit is rather questionable when your monitor only can do 60. You just get more tearing artifacts...

Speeding something up from 70 to 200, that's a different matter though, because it lowers the minimum required hardware specs to play the level.

Graf Zahl said:

Nobody denies that it helps. However, if the difference is 300 fps vs 400 fps the actual benefit is rather questionable when your monitor only can do 60. You just get more tearing artifacts...


Not wishing to reopen the old "how many FPS can you really see?" can of worms, which seems to be a sort of taboo around here, but ANY speedup that gives you the theoretical ability to run at more FPS means that you have the computational leeway to do more complex, non-monster calculations instead, like handling bigger maps or higher resolutions. So any speedup is welcome (BTW, even the example you used actually implies a hefty 33% speedup, which simply cannot be considered a negligible thing, by any standard).

Now, if it was a matter of e.g. 78 vs 81 fps then yeah, that would fall squarely in the "matters only for old shitty computers" category. Even then, optimization nuts may be inclined to disagree "3 WHOLE FPS??? That's like, almost 10% of the target base framerate of 35 FPS! YOU'RE optimization suxx0rz!!!!" ;-)

Maes said:

Not wishing to reopen the old "how many FPS can you really see?" can of worms, which seems to be a sort of taboo around here, but ANY speedup that gives you the theoretical ability to run at more FPS means that you have the computational leeway to do more complex, non-monster calculations instead, like handling bigger maps or higher resolutions. So any speedup is welcome (BTW, even the example you used actually implies a hefty 33% speedup, which simply cannot be considered a negligible thing, by any standard).


Graf's point of view is:
(1000/60+1000/300-1000/400) / (1000/60) = 1.05 (only)

FPS above 200 is approximated (should be rounded) to infinity.

Maes said:

Not wishing to reopen the old "how many FPS can you really see?" can of worms, which seems to be a sort of taboo around here, but ANY speedup that gives you the theoretical ability to run at more FPS means that you have the computational leeway to do more complex, non-monster calculations instead, like handling bigger maps or higher resolutions. So any speedup is welcome (BTW, even the example you used actually implies a hefty 33% speedup, which simply cannot be considered a negligible thing, by any standard).

Now, if it was a matter of e.g. 78 vs 81 fps then yeah, that would fall squarely in the "matters only for old shitty computers" category. Even then, optimization nuts may be inclined to disagree "3 WHOLE FPS??? That's like, almost 10% of the target base framerate of 35 FPS! YOU'RE optimization suxx0rz!!!!" ;-)


You are making the same old mistake of treating fps as a linear unit of measurement.

With 300 vs 400 fps we are talking about a 0.8 millisecond gain per frame. 78 vs 200 is 7.8 ms, almost 10 times as much.
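The conversion behind these figures is simply frame time = 1000 / fps. A small C sketch, with hypothetical helper names, that turns a pair of frame rates into milliseconds saved per frame:

```c
/* Milliseconds spent on one frame at the given frame rate. */
double frame_ms(double fps)
{
    return 1000.0 / fps;
}

/* Milliseconds saved per frame when the frame rate rises
   from fps_before to fps_after. Helper names are illustrative only. */
double ms_saved(double fps_before, double fps_after)
{
    return frame_ms(fps_before) - frame_ms(fps_after);
}
```

`ms_saved(300, 400)` comes out at about 0.83 ms, while `ms_saved(78, 200)` is about 7.82 ms, which is where the "almost 10 times" comes from.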

Graf Zahl said:

You are making the same old mistake of treating fps as a linear unit of measurement.


Funny, and you are making the mistake of treating time as an absolute measure of performance which has a "good enough" threshold that never needs to be improved upon, once reached.

So if we agree that e.g. taking 1/300 s = 3.33 ms to render a frame is already "OK", then yeah, taking 1/400 s = 2.5 ms instead doesn't sound that impressive...or does it? It's still a 33% improvement, no matter what.

If I render 300 of those 2.5 ms "faster frames", that means I have 250 whole ms left to do other processing (or, on a multitasking OS, to let other programs do their stuff), period. I never heard of a computer scientist or computer company saying "Yeah, that's fast/big enough, so let's cap improvements of this particular concern at this point, and focus on other things instead". Well, there's that infamous quote about 640 KB of RAM, and we all know how well that turned out ;-)

OK, there ARE some real-life engineering scenarios where at a certain point it pays to call it "quits" on improving a certain aspect.

Why? Because the rig DOES get the job done or meets a certain quota as it is. In the CS domain, in those applications where you have to deal with fixed hardware, single-tasking OSes, fixed-function software and/or very precisely delimited expectations of performance, then yeah, once you reach a certain critical spot of "awesomeness" you can sort of call it a day and let the marketing team take over. This philosophy works great for e.g. dedicated hardware arcade games, game consoles, embedded systems, etc. and any fixed-hardware platform and single-purpose, single-task software in general.

Now, a program that has to perform in a multi-tasked environment and be extended to do arbitrary stuff...that's another story. Simply put, there's no "good enough" theoretical absolute excellence limit where you can simply call it a day and rest on your laurels forever (well...unless you can design a CPU specially optimized to run Doom and map 1 game operation to 1 opcode running in 1 CPU cycle ;-) ).

And it's one thing having no known way to improve performance of a particular aspect, and another knowing of a way but not applying it or dismissing it as unnecessary (OK, in the particular case of ZDoom it's actually a more complex situation, I understand that, and sometimes there are other compromises to take into account too, e.g. memory-CPU tradeoffs).

About the REJECT table's size:

Graph theory is relentless in this case. The REJECT table is simply a special type of adjacency matrix, the simplest form of checking whether two nodes of a graph are connected (in Doom terms, that would mean that there's mutual or partial LOS visibility between them), and also the fastest method (checking is an O(1) operation). However space is O(n^2), and that doesn't scale well no matter what.
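For reference, the O(1) check boils down to a single bit test. The sketch below follows the bit layout used by P_CheckSight in the released Doom source (pair index `s1 * numsectors + s2`, eight pairs per byte, least significant bit first); the function and parameter names are invented for illustration. Note that in the lump a set bit means the pair is rejected (no sight possible), so it is the complement of the "connected" matrix described above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when the REJECT bit for the ordered sector pair (s1, s2)
   is set, i.e. the full line-of-sight test can be skipped because the
   pair is known to be mutually invisible.
   reject      - raw bytes of the REJECT lump
   num_sectors - number of sectors in the map
   Names are hypothetical; only the bit layout mirrors the Doom source. */
bool reject_blocks_sight(const uint8_t *reject, size_t num_sectors,
                         size_t s1, size_t s2)
{
    size_t pnum = s1 * num_sectors + s2;   /* index of the pair's bit */
    size_t bytenum = pnum >> 3;            /* byte holding that bit   */
    unsigned bitmask = 1u << (pnum & 7);   /* position inside byte    */

    return (reject[bytenum] & bitmask) != 0;
}
```

A sight routine would call this first and return "not visible" immediately on `true`, falling through to the expensive BSP or blockmap trace only when the bit is clear.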

There are more space-efficient ways such as adjacency lists, which however trade off space with search speed (still, they would probably be faster than a brute force unconditional LOS check most of the time, depending on how many sectors would be visible from a particular sector).

Maybe an interesting variation would be to cache LOS checks in a real-time generated REJECT structure, so that checks would be done with the engine's LOS check method (and that's fool-proof, right?) but would not need to be performed again and again. This would, eventually, lead to almost perfect in-memory REJECT structures (and an adjacency list approach would also make more sense).

Maes said:

Funny, and you are making the mistake of treating time as an absolute measure of performance which has a "good enough" threshold that never needs to be improved upon, once reached.

So if we agree that e.g. taking 1/300 s = 3.33 ms to render a frame is already "OK", then yeah, taking 1/400 s = 2.5 ms instead doesn't sound that impressive...or does it? It's still a 33% improvement, no matter what.

If I render 300 of those 2.5 ms "faster frames", that means I have 250 whole ms left to do other processing (or, on a multitasking OS, to let other programs do their stuff), period. I never heard of a computer scientist or computer company saying "Yeah, that's fast/big enough, so let's cap improvements of this particular concern at this point, and focus on other things instead". Well, there's that infamous quote about 640 KB of RAM, and we all know how well that turned out ;-)


That whole argument doesn't pan out. The engine runs at 100% CPU core load so the impact it has on other processes is precisely zero.

It also won't help you with game performance. All you end up with is more passes through the rendering code, nothing more, nothing less. The game still runs at 35 fps and will easily manage those. So this falls squarely into the 'not worth bothering about' kind of optimization. I'm much more concerned to speed something up from 60 to 70 fps than from 300 to 400 because that will actually show as a real improvement.

Graf Zahl said:

That whole argument doesn't pan out. The engine runs at 100% CPU core load so the impact it has on other processes is precisely zero.


Sadly, you're right on this one (and I wouldn't risk having a hard real-time application like a video game yield away thread scheduling time just to be "nice"). Still, it exists as an option e.g. in MAME, and can be helpful on battery-powered devices or where you want to keep the heat down (quite literally) rather than busy-waiting for the next frame.

Graf Zahl said:

All you end up with is more passes through the rendering code, nothing more, nothing less. The game still runs at 35 fps and will easily manage those.


OK, in this particular case there IS an absolute benchmark: attaining 35 fps. Yet, with all that demand for uncapped framerates, motion interpolation and stuff, you know full well that it's an argument that has lost its vigour.

Don't forget that those 300-400 "easy" FPS we were talking about are just indicative of a particular resolution or map complexity.

Cue the occasional 10000 monster fest or otherwise super-complex map or ridiculously high resolution or other more exotic processing (e.g. anaglyph 3D displays, which essentially have to do double the rendering work). In those cases, you REALLY need to squeeze out every bit of performance that you can, and citing an "absolute" constant-time performance metric achieved under different circumstances is meaningless.

Then there's that whole uncapped framerate story which is also skewing our discussion: sure, you shouldn't need to render more than the display can show (but syncing and time sampling issues dictate that you do). But that doesn't mean that optimizing a source port should stop the moment it reaches 60 fps under one arbitrary circumstance or test map.

And, BTW, to produce visual/movement interpolation, you don't actually need to run the actors code multiple times, now, do you? So improvements that affect the actors code alone will only be run 35 times per second anyway (not counting timedemos or fastdemos here).

However, an improvement that helps you bring a timedemo from 300 to 400 could, in most cases, also help you bring a "normal" uncapped framerate visual from 60 to 70, unless I'm missing something here.

In other words: being able to go faster and/or do things with less effort is always better. Even if you don't actually need to ;-)

Maes said:

OK, in this particular case there IS an absolute benchmark: attaining 35 fps. Yet, with all that demand for uncapped framerates, motion interpolation and stuff, you know full well that it's an argument that has lost its vigour.



No, it hasn't. Show me the monitor that can display 400 fps. I can't. Mine does 60 and not one more.

So all you get is 6.6 rendering passes per screen refresh instead of 5, meaning more tearing artifacts. Nothing gained at all.

I run games with VSync on anyway because those ultra-high frame rates actually make the game more jerky instead of less: frame times are much more uneven and vary too much between frames.


Yes, I know that. I'm also among the first to pour concrete down the FPS e-peeners'/griefers' asses, so there's no need to remind me how pointless it is to actually display 400 FPS.

However, that's not the point. The point is that the "power" to render those 400 fps (a futile task, as you aptly put it) can be used to do something else, like e.g. render 100000 corpses @35 fps, instead of 8750@400. Or, worse, instead of 10000@35 before starting to lose speed.


If I have understood the whole deal correctly anyway, the REJECT table speeds up line-of-sight checks, and I don't think there are many such checks performed by corpses. AFAIK, it's only used in AI routines for active monsters, such as A_Look or A_Chase, isn't it? If not, where else is it used?
