Jump to content
Search In
  • More options...
Find results that contain...
Find results in...
jval

Potential for improvement in vissprites sorting

Recommended Posts

2 hours ago, Gez said:

Because optimization is a fickle goddess, and focusing too much on an abnormal special case can result in losing sight of what's needed in the general case.

 

Of course a new better algorithm, could open new possibilities as mentioned above.

 

The only fps slowdown on any level, it is, literally, the slowdown of a couple of if(numvissprites < X)  can cause once on each frame, just to find what sort function to call.

 

Also made some real vissprite data and tested some of the sort speed.
The data were collected in random frames of the mentioned maps.

The test application can be downloaded from here

You can also create your own data, save then on a text file (containing the integer values to be sorted), and load them to see the results.

--------------------------------------------
Doom2.wad / MAP18
Sort Test 132 Items (Mean time from 1000 tests)
--------------------------------------------
Algorithm     Time      Validation    Stable
--------------------------------------------
QuickSort     0,0000015 OK            Undefined
MergeSort     0,0000019 OK            Undefined
RadixSort #2  0,0000098 OK            Undefined
--------------------------------------------

--------------------------------------------
sunlust.wad / MAP28
Sort Test 174 Items (Mean time from 1000 tests)
--------------------------------------------
Algorithm     Time      Validation    Stable
--------------------------------------------
QuickSort     0,0000020 OK            No
MergeSort     0,0000029 OK            Yes
RadixSort #2  0,0000093 OK            Yes
--------------------------------------------

--------------------------------------------
epic.wad / MAP05
Sort Test 319 Items (Mean time from 1000 tests)
--------------------------------------------
Algorithm     Time      Validation    Stable
--------------------------------------------
QuickSort     0,0000049 OK            No
MergeSort     0,0000073 OK            Yes
RadixSort #2  0,0000110 OK            Yes
--------------------------------------------

--------------------------------------------
sunder.wad / MAP14
Sort Test 876 Items (Mean time from 1000 tests)
--------------------------------------------
Algorithm     Time      Validation    Stable
--------------------------------------------
QuickSort     0,0000433 OK            No
MergeSort     0,0000238 OK            Yes
RadixSort #2  0,0000178 OK            Yes
--------------------------------------------

--------------------------------------------
DVII-1i.wad / MAP03
Sort Test 967 Items (Mean time from 1000 tests)
--------------------------------------------
Algorithm     Time      Validation    Stable
--------------------------------------------
QuickSort     0,0000492 OK            No
MergeSort     0,0000372 OK            Yes
RadixSort #2  0,0000187 OK            Yes
--------------------------------------------

--------------------------------------------
sunder.wad / MAP08
Sort Test 1241 Items (Mean time from 1000 tests)
--------------------------------------------
Algorithm     Time      Validation    Stable
--------------------------------------------
QuickSort     0,0000616 OK            No
MergeSort     0,0000419 OK            Yes
RadixSort #2  0,0000254 OK            Yes
--------------------------------------------

--------------------------------------------
nuts.wad / MAP01
Sort Test 3840 Items (Mean time from 1000 tests)
--------------------------------------------
Algorithm     Time      Validation    Stable
--------------------------------------------
QuickSort     0,0001019 OK            No
MergeSort     0,0000899 OK            Yes
RadixSort #2  0,0000545 OK            Yes
--------------------------------------------

 

Share this post


Link to post

Way back in October 2015 I experimented with the vissprite sorting using nuts.wad as a test file.

One method that I tried was to keep the array always sorted. Instead of adding each vissprite to the end of the array I slotted it into the correct location using a binary search. Trouble was, updating the list of pointers each time got expensive.

Share this post


Link to post
On 9/22/2018 at 8:40 AM, Gez said:

That's a point that should be emphasized.

 

nuts.wad is a special corner case. Let's remind everyone that its author created it on a lark, as a test file for the idgames upload process, so that a mess up wouldn't result in the loss of an actual serious work.

 

As a result, nuts.wad is a very atypical level, and that makes it a poor choice for stress-testing the engine...  ...What does it mean for your port if you optimize it to run maps like this instead of maps that look like actual Doom maps? What if you gain 2 FPS on nuts.wad but lose 5 FPS on any map with more than two rooms, do you claim victory?

 

On 9/22/2018 at 9:48 AM, Cacodemon345 said:

Nuts.wad frankly makes 0 sense for any vissprites performance optimization. It was just a joke map.

Sorry folks, but these comments are ridiculous. Performance of the sorter is inconsequential when there's not much to sort. Sort time is directly (and typically exponentially) related to the number of items being sorted. Nuts is an excellent test case.

 

Consider multiplayer coop, over the web. Consider that this game adds blood splats, chunks, gibs, monster chunks, etc. Vissprites can add up very quickly, even without too many monsters. In this scenario, one dead monster can produce 10 vissprites, or more.

 

So, in this game, your connected over the web, so there's going to be some lag. But you also get lag when a single PC can't keep up. Maybe their computer is a bit weak, or maybe it's running 10 background processes, whatever. Anything you can do to keep the frame rate up is probably a good thing, unless the optimization makes the code crazy to manage. Maps like nuts.wad provide a consistent heavy load, allowing the developer to see just how well an optimization improves performance, when it counts. And, in this case, "typical" maps don't count - so what if you have to sort 150 sprites? You might as well use an insertion sort for that - heh. The intricacies of the map layout simply don't matter. The number of vissprites does matter. And, the more you have, the more you need a better sort.

 

 

On 9/22/2018 at 1:26 AM, jval said:

@kb1 Indeed, it's not that much memory in modern computing terms, even in the case that the array holds the actual data, not only pointers. But merge-sort needs only half of the array size as extra memory. Could more extra memory lead to cache hit misses? I 'll try to test this using an lower cache CPU...

 

@zokum Multi threading seems an excellent idea, and I thing the right place to do it is while rendering the Planes (R_DrawPlanes). lI'll try this ASAP!

 

@anotak & @scifista42 I've added merge-sort to the test-case and here are the results:

 

vissprites_sort_chart2.png.a88d0d2007251fb146c2be026ff326ec.png

Nice charts! And, very informative. I'll definitely be checking out your algorithms. I wouldn't worry too much about the cache, especially with pointers. With the way radix sort works, you're reducing memory hits so much that it balances out the cache misses. You'd have more cache misses with quick sort, I would think. Your chart shows that. I think the size of each cache entry has been pretty much standardized to 64 bytes for a long time. My point is, you'd have to have a pretty crappy PC for cache to be a serious issue here.

 

1.5 milliseconds for 16K entries, vs. 0.3 huh? That actually is significant, as far as I'm concerned. For 35 fps, you only have 28 milliseconds to do all rendering, network, sound, game logic, etc. So, saving 1.2 milliseconds out of 28 is a big deal. And, if you like running higher frame rates, 70 fps only allows for 14 ms. This is good stuff. Thanks for posting your results. This motivates me to implement radix sort in my port. Good stuff! Doing it in a separate thread? Even better!

Share this post


Link to post
1 hour ago, kb1 said:

Sorry folks, but these comments are ridiculous. Performance of the sorter is inconsequential when there's not much to sort. Sort time is directly (and typically exponentially) related to the number of items being sorted. Nuts is an excellent test case.

 

One of the first things they teach at university in sorting class is that generic algorithms can lose out heavily against domain specific ones. In this case, specifically, sorting a single huge sector of unsorted objects is highly atypical to what you'll encounter in real maps. In real maps the general rule is that when you have a big open viewable space you will typically have many sectors, or at least many subsectors, and you will have many segments. Because if you didn't, the map would be plain and boring. Plain and boring fits Nuts well, by the way.

 

Locking yourself into the mindset of finding the best generic sorting algorithm for something you will almost never encounter, including your multiplayer case, unless it is Nuts you're playing, you spent lots of time gaining very little while missing the obvious advantage that in most cases Doom's sprites are already generally well sorted. Your multiplayer example even further helps proving the point as any gibs and gore all over the map will be static and not moving - there's absolutely no reason to sort that frame after frame.

 

If you look closer at jval's numbers for non-Nuts stuff, you'll see the total sorting time is very low. In the typical case you'll be shaving off maybe 0.3 milliseconds of a frame. With a 60 fps render deadline, we are optimizing 1.6% of the total time, in one of the worst non-joke examples from jval's test set. I'm sorry, but the ridiculous part here is to make an optimization clearly meant for Nuts and pretty much nothing else. If your general point is that faster is faster, then why not go all-in on fast and use the domain specific knowledge available and use the BSP. Even Carmack commented this is the right approach.

 

1 hour ago, kb1 said:

Good stuff! Doing it in a separate thread? Even better!

 

Thread synchronization has a cost. For 0.3 milliseconds run time it will probably make it overall slower to use threads for this.

Share this post


Link to post
3 hours ago, dpJudas said:

 

If you look closer at jval's numbers for non-Nuts stuff, you'll see the total sorting time is very low. In the typical case you'll be shaving off maybe 0.3 milliseconds of a frame. With a 60 fps render deadline, we are optimizing 1.6% of the total time, in one of the worst non-joke examples from jval's test set. I'm sorry, but the ridiculous part here is to make an optimization clearly meant for Nuts and pretty much nothing else. If your general point is that faster is faster, then why not go all-in on fast and use the domain specific knowledge available and use the BSP. Even Carmack commented this is the right approach.

 

Please consider the fact, that the absolute values are in a relatively fast modern core i7 processor. The absolute and relative gain will be higher in a lower-end CPU. I'll post some tests as soon as I can reach a lower - spec PC.

I can not use the method using the BSP since I can not fully understand it.

One other approach I've considered is front to back sprite drawing using dephbuffer, to avoid overdraw.

 

 

3 hours ago, dpJudas said:

Thread synchronization has a cost. For 0.3 milliseconds run time it will probably make it overall slower to use threads for this.

 

I've already tested this with good results, since the sorting is performed right after the bsp traverse, in paraller with the column and flat drawers. Since the sorting is finished in less time than the drawers, I'm considering the possibility  to use the extra thread for other stuff also, to improve multiple-core expoiltation.

Share this post


Link to post

@jval I have a low spec laptop, I can do some benchmarks for you :) These are the specs https://support.hp.com/us-en/document/c05987723 I think I know of a couple of wads I can use to do the benchmarks hehe

 

edit: um how do I add a wad for testing?

 

edit2: here is my log for the tests it comes with:

 

 OS: Windows  6.2 (Build 9200)
 MMX extentions detected
 Multi-core system detected (2 CPUs)
 TDDTimer.Create(): Timer frequency = 1068751

--------------------------------------------
Doom2.wad / MAP18
Sort Test 132 Items (Mean time from 1000 tests)
--------------------------------------------
Algorithm     Time      Validation    Stable
--------------------------------------------
QuickSort     0.0000071 OK            Undefined
MergeSort     0.0000085 OK            Undefined
RadixSort #2  0.0000386 OK            Undefined
--------------------------------------------

--------------------------------------------
sunlust.wad / MAP28
Sort Test 174 Items (Mean time from 1000 tests)
--------------------------------------------
Algorithm     Time      Validation    Stable
--------------------------------------------
QuickSort     0.0000058 OK            No
MergeSort     0.0000070 OK            Yes
RadixSort #2  0.0000229 OK            Yes
--------------------------------------------

--------------------------------------------
epic.wad / MAP05
Sort Test 319 Items (Mean time from 1000 tests)
--------------------------------------------
Algorithm     Time      Validation    Stable
--------------------------------------------
QuickSort     0.0000120 OK            No
MergeSort     0.0000131 OK            Yes
RadixSort #2  0.0000214 OK            Yes
--------------------------------------------

--------------------------------------------
sunder.wad / MAP14
Sort Test 876 Items (Mean time from 1000 tests)
--------------------------------------------
Algorithm     Time      Validation    Stable
--------------------------------------------
QuickSort     0.0000632 OK            No
MergeSort     0.0000446 OK            Yes
RadixSort #2  0.0000461 OK            Yes
--------------------------------------------

--------------------------------------------
DVII-1i.wad / MAP03
Sort Test 967 Items (Mean time from 1000 tests)
--------------------------------------------
Algorithm     Time      Validation    Stable
--------------------------------------------
QuickSort     0.0000591 OK            No
MergeSort     0.0000507 OK            Yes
RadixSort #2  0.0000393 OK            Yes
--------------------------------------------

--------------------------------------------
sunder.wad / MAP08
Sort Test 1241 Items (Mean time from 1000 tests)
--------------------------------------------
Algorithm     Time      Validation    Stable
--------------------------------------------
QuickSort     0.0000764 OK            No
MergeSort     0.0000597 OK            Yes
RadixSort #2  0.0000500 OK            Yes
--------------------------------------------

--------------------------------------------
nuts.wad / MAP01
Sort Test 3840 Items (Mean time from 1000 tests)
--------------------------------------------
Algorithm     Time      Validation    Stable
--------------------------------------------
QuickSort     0.0001910 OK            No
MergeSort     0.0001609 OK            Yes
RadixSort #2  0.0001194 OK            Yes
--------------------------------------------

 

edit3: results of a "sort 1MB" test:

OS: Windows  6.2 (Build 9200)
 MMX extentions detected
 Multi-core system detected (2 CPUs)
 TDDTimer.Create(): Timer frequency = 1068751

Random Data
Sort Test 1024 Items (Mean time from 1024 tests)
Algorithm     Time      Validation
QuickSort     0.0000888 OK     
QuickSort #2  0.0000922 OK     
MergeSort     0.0000923 OK     
RadixSort     0.0000477 OK     
RadixSort #2  0.0000464 OK     
RadixSort MT  0.0000832 OK     
HybridSort    0.0004941 OK      (requires special key destribution)
HybridSort #2 0.0000963 OK      (requires special key destribution)
HybridSort MT 0.0001784 OK      (requires special key destribution)
SelectionSort 0.0013584 OK     

Visplane Data
Sort Test 1024 Items (Mean time from 1024 tests)
Algorithm     Time      Validation
QuickSort     0.0000719 OK     
QuickSort #2  0.0000762 OK     
MergeSort     0.0000661 OK     
RadixSort     0.0000419 OK     
RadixSort #2  0.0000418 OK     
RadixSort MT  0.0000840 OK     
HybridSort    0.0004543 OK      (requires special key destribution)
HybridSort #2 0.0000764 OK      (requires special key destribution)
HybridSort MT 0.0001670 OK      (requires special key destribution)
SelectionSort 0.0012949 OK     

Arranged Visplane Data
Sort Test 1024 Items (Mean time from 1024 tests)
Algorithm     Time      Validation
QuickSort     0.0000701 OK     
QuickSort #2  0.0000738 OK     
MergeSort     0.0000622 OK     
RadixSort     0.0000413 OK     
RadixSort #2  0.0000420 OK     
RadixSort MT  0.0000768 OK     
HybridSort    0.0004167 OK      (requires special key destribution)
HybridSort #2 0.0000762 OK      (requires special key destribution)
HybridSort MT 0.0001467 OK      (requires special key destribution)
SelectionSort 0.0011240 OK     

Zero Data
Sort Test 1024 Items (Mean time from 1024 tests)
Algorithm     Time      Validation
QuickSort     0.0000352 OK     
QuickSort #2  0.0000354 OK     
MergeSort     0.0000345 OK     
RadixSort     0.0000470 OK     
RadixSort #2  0.0000490 OK     
RadixSort MT  0.0000824 OK     
HybridSort    0.0004098 OK      (requires special key destribution)
HybridSort #2 0.0000441 OK      (requires special key destribution)
HybridSort MT 0.0001188 OK      (requires special key destribution)
SelectionSort 0.0009095 OK     

Ascending Data
Sort Test 1024 Items (Mean time from 1024 tests)
Algorithm     Time      Validation
QuickSort     0.0000187 OK     
QuickSort #2  0.0000170 OK     
MergeSort     0.0000358 OK     
RadixSort     0.0000430 OK     
RadixSort #2  0.0000441 OK     
RadixSort MT  0.0000771 OK     
HybridSort    0.0003888 OK      (requires special key destribution)
HybridSort #2 0.0000279 OK      (requires special key destribution)
HybridSort MT 0.0000961 OK      (requires special key destribution)
SelectionSort 0.0009115 OK     

Descending Data
Sort Test 1024 Items (Mean time from 1024 tests)
Algorithm     Time      Validation
QuickSort     0.0000210 OK     
QuickSort #2  0.0000185 OK     
MergeSort     0.0000555 OK     
RadixSort     0.0000410 OK     
RadixSort #2  0.0000449 OK     
RadixSort MT  0.0000795 OK     
HybridSort    0.0003890 OK      (requires special key destribution)
HybridSort #2 0.0000290 OK      (requires special key destribution)
HybridSort MT 0.0000973 OK      (requires special key destribution)
SelectionSort 0.0009071 OK     

Random Data
Sort Test 1048576 Items (Mean time from 4 tests)
Algorithm     Time      Validation
QuickSort     0.4283945 OK     
QuickSort #2  0.4252225 OK     
MergeSort     0.3650591 OK     
RadixSort     0.3955379 OK     
RadixSort #2  0.3783201 OK     
RadixSort MT  0.3304509 OK     
HybridSort    0.4443121 OK      (requires special key destribution)
HybridSort #2 0.4431413 OK      (requires special key destribution)
HybridSort MT 0.4724321 OK      (requires special key destribution)

Visplane Data
Sort Test 1048576 Items (Mean time from 4 tests)
Algorithm     Time      Validation
QuickSort     0.2349740 OK     
QuickSort #2  0.2376861 OK     
MergeSort     0.2024410 OK     
RadixSort     0.1390976 OK     
RadixSort #2  0.1424502 OK     
RadixSort MT  0.1512586 OK     
HybridSort    0.1761807 OK      (requires special key destribution)
HybridSort #2 0.2019460 OK      (requires special key destribution)
HybridSort MT 0.1952192 OK      (requires special key destribution)

Arranged Visplane Data
Sort Test 1048576 Items (Mean time from 4 tests)
Algorithm     Time      Validation
QuickSort     0.1437758 OK     
QuickSort #2  0.1439879 OK     
MergeSort     0.1503334 OK     
RadixSort     0.1102479 OK     
RadixSort #2  0.1116324 OK     
RadixSort MT  0.1165094 OK     
HybridSort    0.1132397 OK      (requires special key destribution)
HybridSort #2 0.1364878 OK      (requires special key destribution)
HybridSort MT 0.1232825 OK      (requires special key destribution)

Zero Data
Sort Test 1048576 Items (Mean time from 4 tests)
Algorithm     Time      Validation
QuickSort     0.1075548 OK     
QuickSort #2  0.1061959 OK     
MergeSort     0.0514893 OK     
RadixSort     0.0584514 OK     
RadixSort #2  0.0563623 OK     
RadixSort MT  0.0627253 OK     
HybridSort    0.1237737 OK      (requires special key destribution)
HybridSort #2 0.1217629 OK      (requires special key destribution)
HybridSort MT 0.1188411 OK      (requires special key destribution)

Ascending Data
Sort Test 1048576 Items (Mean time from 4 tests)
Algorithm     Time      Validation
QuickSort     0.0524823 OK     
QuickSort #2  0.0521408 OK     
MergeSort     0.0517174 OK     
RadixSort     0.1115473 OK     
RadixSort #2  0.0915000 OK     
RadixSort MT  0.0995190 OK     
HybridSort    0.0701134 OK      (requires special key destribution)
HybridSort #2 0.0639433 OK      (requires special key destribution)
HybridSort MT 0.1108268 OK      (requires special key destribution)

Descending Data
Sort Test 1048576 Items (Mean time from 4 tests)
Algorithm     Time      Validation
QuickSort     0.0548909 OK     
QuickSort #2  0.0540987 OK     
MergeSort     0.0736647 OK     
RadixSort     0.1632838 OK     
RadixSort #2  0.1498176 OK     
RadixSort MT  0.1525292 OK     
HybridSort    0.0674762 OK      (requires special key destribution)
HybridSort #2 0.0619812 OK      (requires special key destribution)
HybridSort MT 0.0640573 OK      (requires special key destribution)

Random Data
Sort Test 1048576 Items (Mean time from 4 tests)
Algorithm     Time      Validation
QuickSort     0.4546765 OK     
QuickSort #2  0.4547556 OK     
MergeSort     0.3649901 OK     
RadixSort     0.4002803 OK     
RadixSort #2  0.3662736 OK     
RadixSort MT  0.3702160 OK     
HybridSort    0.4258462 OK      (requires special key destribution)
HybridSort #2 0.4208597 OK      (requires special key destribution)
HybridSort MT 0.4402999 OK      (requires special key destribution)

Visplane Data
Sort Test 1048576 Items (Mean time from 4 tests)
Algorithm     Time      Validation
QuickSort     0.2252414 OK     
QuickSort #2  0.2336576 OK     
MergeSort     0.1966024 OK     
RadixSort     0.1380272 OK     
RadixSort #2  0.1416443 OK     
RadixSort MT  0.1450525 OK     
HybridSort    0.1746038 OK      (requires special key destribution)
HybridSort #2 0.2004885 OK      (requires special key destribution)
HybridSort MT 0.1838501 OK      (requires special key destribution)

Arranged Visplane Data
Sort Test 1048576 Items (Mean time from 4 tests)
Algorithm     Time      Validation
QuickSort     0.1432773 OK     
QuickSort #2  0.1449795 OK     
MergeSort     0.1463830 OK     
RadixSort     0.1113309 OK     
RadixSort #2  0.1112624 OK     
RadixSort MT  0.1146256 OK     
HybridSort    0.1119915 OK      (requires special key destribution)
HybridSort #2 0.1343435 OK      (requires special key destribution)
HybridSort MT 0.1225454 OK      (requires special key destribution)

Zero Data
Sort Test 1048576 Items (Mean time from 4 tests)
Algorithm     Time      Validation
QuickSort     0.0997835 OK     
QuickSort #2  0.1025110 OK     
MergeSort     0.0510498 OK     
RadixSort     0.0564874 OK     
RadixSort #2  0.0569761 OK     
RadixSort MT  0.0574547 OK     
HybridSort    0.1182205 OK      (requires special key destribution)
HybridSort #2 0.1192455 OK      (requires special key destribution)
HybridSort MT 0.1155569 OK      (requires special key destribution)

Ascending Data
Sort Test 1048576 Items (Mean time from 4 tests)
Algorithm     Time      Validation
QuickSort     0.0517913 OK     
QuickSort #2  0.0503646 OK     
MergeSort     0.0507054 OK     
RadixSort     0.1051169 OK     
RadixSort #2  0.0853938 OK     
RadixSort MT  0.0922289 OK     
HybridSort    0.0644753 OK      (requires special key destribution)
HybridSort #2 0.0589564 OK      (requires special key destribution)
HybridSort MT 0.0610098 OK      (requires special key destribution)

Descending Data
Sort Test 1048576 Items (Mean time from 4 tests)
Algorithm     Time      Validation
QuickSort     0.0528264 OK     
QuickSort #2  0.0515176 OK     
MergeSort     0.0725585 OK     
RadixSort     0.1597631 OK     
RadixSort #2  0.1501966 OK     
RadixSort MT  0.1479498 OK     
HybridSort    0.0655272 OK      (requires special key destribution)
HybridSort #2 0.0612327 OK      (requires special key destribution)
HybridSort MT 0.0605859 OK      (requires special key destribution)

 

Edited by therektafire

Share this post


Link to post

Verdict: if the zero data, ascending data, and descending data tests are anything to go by, radix sort sucks ass with presorted stuff (at least I assume that the data in these tests is presorted), and slow when there are a small number of sprites, but fast when there are a large amount of sprites like in Nuts. Meaning that it would probably be the best to implement multiple sort algorithms and use them conditionally based on the number of sprites on screen

Share this post


Link to post
2 hours ago, therektafire said:

Meaning that it would probably be the best to implement multiple sort algorithms and use them conditionally based on the number of sprites on screen

 

That's what I have in mind. Radix sort seems to behave better that mergesort & quicksort when we have more than 500-1000 vissprites to draw.

Share this post


Link to post
21 hours ago, dpJudas said:

 

One of the first things they teach at university in sorting class is that generic algorithms can lose out heavily against domain specific ones. In this case, specifically, sorting a single huge sector of unsorted objects is highly atypical to what you'll encounter in real maps.

I've been told that I don't have to play along, but you've been very polite, so I'll be brief:


Your use of the terms "atypical" and "real map" tend to confuse the issue. Those things that you were taught are pretty good generic advice on how to typically get generically decent results, generically speaking. This describes the Doom we all know and currently enjoy.

 

However, this thread is, in fact, striving for an atypical case: extreme performance. (Yes, I apologize for using your terms to make my point.)

 

21 hours ago, dpJudas said:

 

In real maps the general rule is that when you have a big open viewable space you will typically have many sectors, or at least many subsectors, and you will have many segments. Because if you didn't, the map would be plain and boring. Plain and boring fits Nuts well, by the way.

There's that term "real map", suggesting that, again, in general, most non-Nuts maps will, based on their "realness", have some effect upon the performance of the sort routine: They won't. It's all about the raw numbers of vissprites, and that's it.

 

Now, that's not to say that, devising some method that allows you to avoid sorting that many vissprites is not the right thing to do, but that's not what's being discussed. Typical/atypical...who cares? Yes, faster is faster. And, faster is faster, even if you render a subsector at a time, or not.

 

You say that the sprites are "generally well sorted"...what does that mean? That it's ok for a sliding corpse to jump in front of, then in back of, say, a fireball? You know that, when you turn 180 degrees, many sprites will reverse order, but for 90 degrees, some will, and some won't. How do you propose to tell the renderer which sprites to care about? They're either in proper order, or they're not.

 

And then the "it's only 1.6%" or "0.3%" argument. With that approach, why optimize anything?

 

21 hours ago, dpJudas said:

If your general point is that faster is faster, then why not go all-in on fast and use the domain specific knowledge available and use the BSP. Even Carmack commented this is the right approach.

Fully agree - why not?

 

21 hours ago, dpJudas said:

Thread synchronization has a cost. For 0.3 milliseconds run time it will probably make it overall slower to use threads for this.

Typically? In general? How in the world do you know that? On what PC? For how many sprites? Where did 0.3 come from? Your definitive answer is that it's going to be slower to periodically ask "Is the sort finished?"

 

17 hours ago, jval said:

 

Please consider the fact, that the absolute values are in a relatively fast modern core i7 processor. The absolute and relative gain will be higher in a lower-end CPU. I'll post some tests as soon as I can reach a lower - spec PC.

I can not use the method using the BSP since I can not fully understand it.

I've seen some of your brilliant work. Maybe leave this for a future improvement :)

 

17 hours ago, jval said:

One other approach I've considered is front to back sprite drawing using dephbuffer, to avoid overdraw.

I think Quake 1's software renderer used a per-object Z-buffer to handle hidden surface removal when rendering a model, but that's not the same thing. A full screen depth buffer would fix a lot of the anomalies of the software renderer, like when you can see slivers of sprites sticking out from the edge of staircases. Unfortunately, you're still doing overdraw...this time, to a depth buffer vs. a screen buffer, with the additional overhead of having to read before conditional write.

 

17 hours ago, jval said:

I've already tested this with good results, since the sorting is performed right after the bsp traverse, in paraller with the column and flat drawers. Since the sorting is finished in less time than the drawers, I'm considering the possibility to use the extra thread for other stuff also, to improve multiple-core expoiltation.

Yes, especially with a sort whose runtime speed can be trusted.

 

13 hours ago, therektafire said:

Verdict: if the zero data, ascending data, and descending data tests are anything to go by, radix sort sucks ass with presorted stuff (at least I assume that the data in these tests is presorted), and slow when there are a small number of sprites, but fast when there are a large amount of sprites like in Nuts. Meaning that it would probably be the best to implement multiple sort algorithms and use them conditionally based on the number of sprites on screen

Radix sort does not "suck ass". You've got it a bit backwards:

Unlike most other general sort methods, Radix sort speed is very predictable, regardless of pre-sort state. But you do make a good point: Having the engine switch out the sort function, based on the vissprite count, is a great idea.

 

Radix sort has some additional overhead, in setup time, and in memory usage, compared with some other sort methods. As with most things, the key is to find the most appropriate tool for the job.

 

Share this post


Link to post

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×