jval Posted September 22, 2018 (edited) 2 hours ago, Gez said: Because optimization is a fickle goddess, and focusing too much on an abnormal special case can result in losing sight of what's needed in the general case. Of course a new better algorithm, could open new possibilities as mentioned above. The only fps slowdown on any level, it is, literally, the slowdown of a couple of if(numvissprites < X) can cause once on each frame, just to find what sort function to call. Also made some real vissprite data and tested some of the sort speed. The data were collected in random frames of the mentioned maps. The test application can be downloaded from here You can also create your own data, save then on a text file (containing the integer values to be sorted), and load them to see the results. -------------------------------------------- Doom2.wad / MAP18 Sort Test 132 Items (Mean time from 1000 tests) -------------------------------------------- Algorithm Time Validation Stable -------------------------------------------- QuickSort 0,0000015 OK Undefined MergeSort 0,0000019 OK Undefined RadixSort #2 0,0000098 OK Undefined -------------------------------------------- -------------------------------------------- sunlust.wad / MAP28 Sort Test 174 Items (Mean time from 1000 tests) -------------------------------------------- Algorithm Time Validation Stable -------------------------------------------- QuickSort 0,0000020 OK No MergeSort 0,0000029 OK Yes RadixSort #2 0,0000093 OK Yes -------------------------------------------- -------------------------------------------- epic.wad / MAP05 Sort Test 319 Items (Mean time from 1000 tests) -------------------------------------------- Algorithm Time Validation Stable -------------------------------------------- QuickSort 0,0000049 OK No MergeSort 0,0000073 OK Yes RadixSort #2 0,0000110 OK Yes -------------------------------------------- -------------------------------------------- sunder.wad / MAP14 Sort Test 876 Items (Mean time from 1000 tests) -------------------------------------------- Algorithm Time Validation Stable -------------------------------------------- QuickSort 0,0000433 OK No MergeSort 0,0000238 OK Yes RadixSort #2 0,0000178 OK Yes -------------------------------------------- -------------------------------------------- DVII-1i.wad / MAP03 Sort Test 967 Items (Mean time from 1000 tests) -------------------------------------------- Algorithm Time Validation Stable -------------------------------------------- QuickSort 0,0000492 OK No MergeSort 0,0000372 OK Yes RadixSort #2 0,0000187 OK Yes -------------------------------------------- -------------------------------------------- sunder.wad / MAP08 Sort Test 1241 Items (Mean time from 1000 tests) -------------------------------------------- Algorithm Time Validation Stable -------------------------------------------- QuickSort 0,0000616 OK No MergeSort 0,0000419 OK Yes RadixSort #2 0,0000254 OK Yes -------------------------------------------- -------------------------------------------- nuts.wad / MAP01 Sort Test 3840 Items (Mean time from 1000 tests) -------------------------------------------- Algorithm Time Validation Stable -------------------------------------------- QuickSort 0,0001019 OK No MergeSort 0,0000899 OK Yes RadixSort #2 0,0000545 OK Yes -------------------------------------------- 0 Share this post Link to post
jeff-d Posted September 22, 2018 Way back in October 2015 I experimented with the vissprite sorting using nuts.wad as a test file. One method that I tried was to keep the array always sorted. Instead of adding each vissprite to the end of the array I slotted it into the correct location using a binary search. Trouble was, updating the list of pointers each time got expensive. 1 Share this post Link to post
kb1 Posted September 25, 2018 On 9/22/2018 at 8:40 AM, Gez said: That's a point that should be emphasized. nuts.wad is a special corner case. Let's remind everyone that its author created it on a lark, as a test file for the idgames upload process, so that a mess up wouldn't result in the loss of an actual serious work. As a result, nuts.wad is a very atypical level, and that makes it a poor choice for stress-testing the engine... ...What does it mean for your port if you optimize it to run maps like this instead of maps that look like actual Doom maps? What if you gain 2 FPS on nuts.wad but lose 5 FPS on any map with more than two rooms, do you claim victory? On 9/22/2018 at 9:48 AM, Cacodemon345 said: Nuts.wad frankly makes 0 sense for any vissprites performance optimization. It was just a joke map. Sorry folks, but these comments are ridiculous. Performance of the sorter is inconsequential when there's not much to sort. Sort time is directly (and typically exponentially) related to the number of items being sorted. Nuts is an excellent test case. Consider multiplayer coop, over the web. Consider that this game adds blood splats, chunks, gibs, monster chunks, etc. Vissprites can add up very quickly, even without too many monsters. In this scenario, one dead monster can produce 10 vissprites, or more. So, in this game, your connected over the web, so there's going to be some lag. But you also get lag when a single PC can't keep up. Maybe their computer is a bit weak, or maybe it's running 10 background processes, whatever. Anything you can do to keep the frame rate up is probably a good thing, unless the optimization makes the code crazy to manage. Maps like nuts.wad provide a consistent heavy load, allowing the developer to see just how well an optimization improves performance, when it counts. And, in this case, "typical" maps don't count - so what if you have to sort 150 sprites? You might as well use an insertion sort for that - heh. The intricacies of the map layout simply don't matter. The number of vissprites does matter. And, the more you have, the more you need a better sort. On 9/22/2018 at 1:26 AM, jval said: @kb1 Indeed, it's not that much memory in modern computing terms, even in the case that the array holds the actual data, not only pointers. But merge-sort needs only half of the array size as extra memory. Could more extra memory lead to cache hit misses? I 'll try to test this using an lower cache CPU... @zokum Multi threading seems an excellent idea, and I thing the right place to do it is while rendering the Planes (R_DrawPlanes). lI'll try this ASAP! @anotak & @scifista42 I've added merge-sort to the test-case and here are the results: Nice charts! And, very informative. I'll definitely be checking out your algorithms. I wouldn't worry too much about the cache, especially with pointers. With the way radix sort works, you're reducing memory hits so much that it balances out the cache misses. You'd have more cache misses with quick sort, I would think. Your chart shows that. I think the size of each cache entry has been pretty much standardized to 64 bytes for a long time. My point is, you'd have to have a pretty crappy PC for cache to be a serious issue here. 1.5 milliseconds for 16K entries, vs. 0.3 huh? That actually is significant, as far as I'm concerned. For 35 fps, you only have 28 milliseconds to do all rendering, network, sound, game logic, etc. So, saving 1.2 milliseconds out of 28 is a big deal. And, if you like running higher frame rates, 70 fps only allows for 14 ms. This is good stuff. Thanks for posting your results. This motivates me to implement radix sort in my port. Good stuff! Doing it in a separate thread? Even better! 4 Share this post Link to post
dpJudas Posted September 26, 2018 1 hour ago, kb1 said: Sorry folks, but these comments are ridiculous. Performance of the sorter is inconsequential when there's not much to sort. Sort time is directly (and typically exponentially) related to the number of items being sorted. Nuts is an excellent test case. One of the first things they teach at university in sorting class is that generic algorithms can lose out heavily against domain specific ones. In this case, specifically, sorting a single huge sector of unsorted objects is highly atypical to what you'll encounter in real maps. In real maps the general rule is that when you have a big open viewable space you will typically have many sectors, or at least many subsectors, and you will have many segments. Because if you didn't, the map would be plain and boring. Plain and boring fits Nuts well, by the way. Locking yourself into the mindset of finding the best generic sorting algorithm for something you will almost never encounter, including your multiplayer case, unless it is Nuts you're playing, you spent lots of time gaining very little while missing the obvious advantage that in most cases Doom's sprites are already generally well sorted. Your multiplayer example even further helps proving the point as any gibs and gore all over the map will be static and not moving - there's absolutely no reason to sort that frame after frame. If you look closer at jval's numbers for non-Nuts stuff, you'll see the total sorting time is very low. In the typical case you'll be shaving off maybe 0.3 milliseconds of a frame. With a 60 fps render deadline, we are optimizing 1.6% of the total time, in one of the worst non-joke examples from jval's test set. I'm sorry, but the ridiculous part here is to make an optimization clearly meant for Nuts and pretty much nothing else. If your general point is that faster is faster, then why not go all-in on fast and use the domain specific knowledge available and use the BSP. Even Carmack commented this is the right approach. 1 hour ago, kb1 said: Good stuff! Doing it in a separate thread? Even better! Thread synchronization has a cost. For 0.3 milliseconds run time it will probably make it overall slower to use threads for this. 1 Share this post Link to post
jval Posted September 26, 2018 3 hours ago, dpJudas said: If you look closer at jval's numbers for non-Nuts stuff, you'll see the total sorting time is very low. In the typical case you'll be shaving off maybe 0.3 milliseconds of a frame. With a 60 fps render deadline, we are optimizing 1.6% of the total time, in one of the worst non-joke examples from jval's test set. I'm sorry, but the ridiculous part here is to make an optimization clearly meant for Nuts and pretty much nothing else. If your general point is that faster is faster, then why not go all-in on fast and use the domain specific knowledge available and use the BSP. Even Carmack commented this is the right approach. Please consider the fact, that the absolute values are in a relatively fast modern core i7 processor. The absolute and relative gain will be higher in a lower-end CPU. I'll post some tests as soon as I can reach a lower - spec PC. I can not use the method using the BSP since I can not fully understand it. One other approach I've considered is front to back sprite drawing using dephbuffer, to avoid overdraw. 3 hours ago, dpJudas said: Thread synchronization has a cost. For 0.3 milliseconds run time it will probably make it overall slower to use threads for this. I've already tested this with good results, since the sorting is performed right after the bsp traverse, in paraller with the column and flat drawers. Since the sorting is finished in less time than the drawers, I'm considering the possibility to use the extra thread for other stuff also, to improve multiple-core expoiltation. 0 Share this post Link to post
therektafire Posted September 26, 2018 (edited) @jval I have a low spec laptop, I can do some benchmarks for you :) These are the specs https://support.hp.com/us-en/document/c05987723 I think I know of a couple of wads I can use to do the benchmarks hehe edit: um how do I add a wad for testing? edit2: here is my log for the tests it comes with: OS: Windows 6.2 (Build 9200) MMX extentions detected Multi-core system detected (2 CPUs) TDDTimer.Create(): Timer frequency = 1068751 -------------------------------------------- Doom2.wad / MAP18 Sort Test 132 Items (Mean time from 1000 tests) -------------------------------------------- Algorithm Time Validation Stable -------------------------------------------- QuickSort 0.0000071 OK Undefined MergeSort 0.0000085 OK Undefined RadixSort #2 0.0000386 OK Undefined -------------------------------------------- -------------------------------------------- sunlust.wad / MAP28 Sort Test 174 Items (Mean time from 1000 tests) -------------------------------------------- Algorithm Time Validation Stable -------------------------------------------- QuickSort 0.0000058 OK No MergeSort 0.0000070 OK Yes RadixSort #2 0.0000229 OK Yes -------------------------------------------- -------------------------------------------- epic.wad / MAP05 Sort Test 319 Items (Mean time from 1000 tests) -------------------------------------------- Algorithm Time Validation Stable -------------------------------------------- QuickSort 0.0000120 OK No MergeSort 0.0000131 OK Yes RadixSort #2 0.0000214 OK Yes -------------------------------------------- -------------------------------------------- sunder.wad / MAP14 Sort Test 876 Items (Mean time from 1000 tests) -------------------------------------------- Algorithm Time Validation Stable -------------------------------------------- QuickSort 0.0000632 OK No MergeSort 0.0000446 OK Yes RadixSort #2 0.0000461 OK Yes -------------------------------------------- -------------------------------------------- DVII-1i.wad / MAP03 Sort Test 967 Items (Mean time from 1000 tests) -------------------------------------------- Algorithm Time Validation Stable -------------------------------------------- QuickSort 0.0000591 OK No MergeSort 0.0000507 OK Yes RadixSort #2 0.0000393 OK Yes -------------------------------------------- -------------------------------------------- sunder.wad / MAP08 Sort Test 1241 Items (Mean time from 1000 tests) -------------------------------------------- Algorithm Time Validation Stable -------------------------------------------- QuickSort 0.0000764 OK No MergeSort 0.0000597 OK Yes RadixSort #2 0.0000500 OK Yes -------------------------------------------- -------------------------------------------- nuts.wad / MAP01 Sort Test 3840 Items (Mean time from 1000 tests) -------------------------------------------- Algorithm Time Validation Stable -------------------------------------------- QuickSort 0.0001910 OK No MergeSort 0.0001609 OK Yes RadixSort #2 0.0001194 OK Yes -------------------------------------------- edit3: results of a "sort 1MB" test: OS: Windows 6.2 (Build 9200) MMX extentions detected Multi-core system detected (2 CPUs) TDDTimer.Create(): Timer frequency = 1068751 Random Data Sort Test 1024 Items (Mean time from 1024 tests) Algorithm Time Validation QuickSort 0.0000888 OK QuickSort #2 0.0000922 OK MergeSort 0.0000923 OK RadixSort 0.0000477 OK RadixSort #2 0.0000464 OK RadixSort MT 0.0000832 OK HybridSort 0.0004941 OK (requires special key destribution) HybridSort #2 0.0000963 OK (requires special key destribution) HybridSort MT 0.0001784 OK (requires special key destribution) SelectionSort 0.0013584 OK Visplane Data Sort Test 1024 Items (Mean time from 1024 tests) Algorithm Time Validation QuickSort 0.0000719 OK QuickSort #2 0.0000762 OK MergeSort 0.0000661 OK RadixSort 0.0000419 OK RadixSort #2 0.0000418 OK RadixSort MT 0.0000840 OK HybridSort 0.0004543 OK (requires special key destribution) HybridSort #2 0.0000764 OK (requires special key destribution) HybridSort MT 0.0001670 OK (requires special key destribution) SelectionSort 0.0012949 OK Arranged Visplane Data Sort Test 1024 Items (Mean time from 1024 tests) Algorithm Time Validation QuickSort 0.0000701 OK QuickSort #2 0.0000738 OK MergeSort 0.0000622 OK RadixSort 0.0000413 OK RadixSort #2 0.0000420 OK RadixSort MT 0.0000768 OK HybridSort 0.0004167 OK (requires special key destribution) HybridSort #2 0.0000762 OK (requires special key destribution) HybridSort MT 0.0001467 OK (requires special key destribution) SelectionSort 0.0011240 OK Zero Data Sort Test 1024 Items (Mean time from 1024 tests) Algorithm Time Validation QuickSort 0.0000352 OK QuickSort #2 0.0000354 OK MergeSort 0.0000345 OK RadixSort 0.0000470 OK RadixSort #2 0.0000490 OK RadixSort MT 0.0000824 OK HybridSort 0.0004098 OK (requires special key destribution) HybridSort #2 0.0000441 OK (requires special key destribution) HybridSort MT 0.0001188 OK (requires special key destribution) SelectionSort 0.0009095 OK Ascending Data Sort Test 1024 Items (Mean time from 1024 tests) Algorithm Time Validation QuickSort 0.0000187 OK QuickSort #2 0.0000170 OK MergeSort 0.0000358 OK RadixSort 0.0000430 OK RadixSort #2 0.0000441 OK RadixSort MT 0.0000771 OK HybridSort 0.0003888 OK (requires special key destribution) HybridSort #2 0.0000279 OK (requires special key destribution) HybridSort MT 0.0000961 OK (requires special key destribution) SelectionSort 0.0009115 OK Descending Data Sort Test 1024 Items (Mean time from 1024 tests) Algorithm Time Validation QuickSort 0.0000210 OK QuickSort #2 0.0000185 OK MergeSort 0.0000555 OK RadixSort 0.0000410 OK RadixSort #2 0.0000449 OK RadixSort MT 0.0000795 OK HybridSort 0.0003890 OK (requires special key destribution) HybridSort #2 0.0000290 OK (requires special key destribution) HybridSort MT 0.0000973 OK (requires special key destribution) SelectionSort 0.0009071 OK Random Data Sort Test 1048576 Items (Mean time from 4 tests) Algorithm Time Validation QuickSort 0.4283945 OK QuickSort #2 0.4252225 OK MergeSort 0.3650591 OK RadixSort 0.3955379 OK RadixSort #2 0.3783201 OK RadixSort MT 0.3304509 OK HybridSort 0.4443121 OK (requires special key destribution) HybridSort #2 0.4431413 OK (requires special key destribution) HybridSort MT 0.4724321 OK (requires special key destribution) Visplane Data Sort Test 1048576 Items (Mean time from 4 tests) Algorithm Time Validation QuickSort 0.2349740 OK QuickSort #2 0.2376861 OK MergeSort 0.2024410 OK RadixSort 0.1390976 OK RadixSort #2 0.1424502 OK RadixSort MT 0.1512586 OK HybridSort 0.1761807 OK (requires special key destribution) HybridSort #2 0.2019460 OK (requires special key destribution) HybridSort MT 0.1952192 OK (requires special key destribution) Arranged Visplane Data Sort Test 1048576 Items (Mean time from 4 tests) Algorithm Time Validation QuickSort 0.1437758 OK QuickSort #2 0.1439879 OK MergeSort 0.1503334 OK RadixSort 0.1102479 OK RadixSort #2 0.1116324 OK RadixSort MT 0.1165094 OK HybridSort 0.1132397 OK (requires special key destribution) HybridSort #2 0.1364878 OK (requires special key destribution) HybridSort MT 0.1232825 OK (requires special key destribution) Zero Data Sort Test 1048576 Items (Mean time from 4 tests) Algorithm Time Validation QuickSort 0.1075548 OK QuickSort #2 0.1061959 OK MergeSort 0.0514893 OK RadixSort 0.0584514 OK RadixSort #2 0.0563623 OK RadixSort MT 0.0627253 OK HybridSort 0.1237737 OK (requires special key destribution) HybridSort #2 0.1217629 OK (requires special key destribution) HybridSort MT 0.1188411 OK (requires special key destribution) Ascending Data Sort Test 1048576 Items (Mean time from 4 tests) Algorithm Time Validation QuickSort 0.0524823 OK QuickSort #2 0.0521408 OK MergeSort 0.0517174 OK RadixSort 0.1115473 OK RadixSort #2 0.0915000 OK RadixSort MT 0.0995190 OK HybridSort 0.0701134 OK (requires special key destribution) HybridSort #2 0.0639433 OK (requires special key destribution) HybridSort MT 0.1108268 OK (requires special key destribution) Descending Data Sort Test 1048576 Items (Mean time from 4 tests) Algorithm Time Validation QuickSort 0.0548909 OK QuickSort #2 0.0540987 OK MergeSort 0.0736647 OK RadixSort 0.1632838 OK RadixSort #2 0.1498176 OK RadixSort MT 0.1525292 OK HybridSort 0.0674762 OK (requires special key destribution) HybridSort #2 0.0619812 OK (requires special key destribution) HybridSort MT 0.0640573 OK (requires special key destribution) Random Data Sort Test 1048576 Items (Mean time from 4 tests) Algorithm Time Validation QuickSort 0.4546765 OK QuickSort #2 0.4547556 OK MergeSort 0.3649901 OK RadixSort 0.4002803 OK RadixSort #2 0.3662736 OK RadixSort MT 0.3702160 OK HybridSort 0.4258462 OK (requires special key destribution) HybridSort #2 0.4208597 OK (requires special key destribution) HybridSort MT 0.4402999 OK (requires special key destribution) Visplane Data Sort Test 1048576 Items (Mean time from 4 tests) Algorithm Time Validation QuickSort 0.2252414 OK QuickSort #2 0.2336576 OK MergeSort 0.1966024 OK RadixSort 0.1380272 OK RadixSort #2 0.1416443 OK RadixSort MT 0.1450525 OK HybridSort 0.1746038 OK (requires special key destribution) HybridSort #2 0.2004885 OK (requires special key destribution) HybridSort MT 0.1838501 OK (requires special key destribution) Arranged Visplane Data Sort Test 1048576 Items (Mean time from 4 tests) Algorithm Time Validation QuickSort 0.1432773 OK QuickSort #2 0.1449795 OK MergeSort 0.1463830 OK RadixSort 0.1113309 OK RadixSort #2 0.1112624 OK RadixSort MT 0.1146256 OK HybridSort 0.1119915 OK (requires special key destribution) HybridSort #2 0.1343435 OK (requires special key destribution) HybridSort MT 0.1225454 OK (requires special key destribution) Zero Data Sort Test 1048576 Items (Mean time from 4 tests) Algorithm Time Validation QuickSort 0.0997835 OK QuickSort #2 0.1025110 OK MergeSort 0.0510498 OK RadixSort 0.0564874 OK RadixSort #2 0.0569761 OK RadixSort MT 0.0574547 OK HybridSort 0.1182205 OK (requires special key destribution) HybridSort #2 0.1192455 OK (requires special key destribution) HybridSort MT 0.1155569 OK (requires special key destribution) Ascending Data Sort Test 1048576 Items (Mean time from 4 tests) Algorithm Time Validation QuickSort 0.0517913 OK QuickSort #2 0.0503646 OK MergeSort 0.0507054 OK RadixSort 0.1051169 OK RadixSort #2 0.0853938 OK RadixSort MT 0.0922289 OK HybridSort 0.0644753 OK (requires special key destribution) HybridSort #2 0.0589564 OK (requires special key destribution) HybridSort MT 0.0610098 OK (requires special key destribution) Descending Data Sort Test 1048576 Items (Mean time from 4 tests) Algorithm Time Validation QuickSort 0.0528264 OK QuickSort #2 0.0515176 OK MergeSort 0.0725585 OK RadixSort 0.1597631 OK RadixSort #2 0.1501966 OK RadixSort MT 0.1479498 OK HybridSort 0.0655272 OK (requires special key destribution) HybridSort #2 0.0612327 OK (requires special key destribution) HybridSort MT 0.0605859 OK (requires special key destribution) Edited September 26, 2018 by therektafire 1 Share this post Link to post
therektafire Posted September 26, 2018 (edited) Verdict: if the zero data, ascending data, and descending data tests are anything to go by, radix sort sucks ass with presorted stuff (at least I assume that the data in these tests is presorted), and slow when there are a small number of sprites, but fast when there are a large amount of sprites like in Nuts. Meaning that it would probably be the best to implement multiple sort algorithms and use them conditionally based on the number of sprites on screen 1 Share this post Link to post
jval Posted September 26, 2018 2 hours ago, therektafire said: Meaning that it would probably be the best to implement multiple sort algorithms and use them conditionally based on the number of sprites on screen That's what I have in mind. Radix sort seems to behave better that mergesort & quicksort when we have more than 500-1000 vissprites to draw. 2 Share this post Link to post
kb1 Posted September 26, 2018 21 hours ago, dpJudas said: One of the first things they teach at university in sorting class is that generic algorithms can lose out heavily against domain specific ones. In this case, specifically, sorting a single huge sector of unsorted objects is highly atypical to what you'll encounter in real maps. I've been told that I don't have to play along, but you've been very polite, so I'll be brief: Your use of the terms "atypical" and "real map" tend to confuse the issue. Those things that you were taught are pretty good generic advice on how to typically get generically decent results, generically speaking. This describes the Doom we all know and currently enjoy. However, this thread is, in fact, striving for an atypical case: extreme performance. (Yes, I apologize for using your terms to make my point.) 21 hours ago, dpJudas said: In real maps the general rule is that when you have a big open viewable space you will typically have many sectors, or at least many subsectors, and you will have many segments. Because if you didn't, the map would be plain and boring. Plain and boring fits Nuts well, by the way. There's that term "real map", suggesting that, again, in general, most non-Nuts maps will, based on their "realness", have some effect upon the performance of the sort routine: They won't. It's all about the raw numbers of vissprites, and that's it. Now, that's not to say that, devising some method that allows you to avoid sorting that many vissprites is not the right thing to do, but that's not what's being discussed. Typical/atypical...who cares? Yes, faster is faster. And, faster is faster, even if you render a subsector at a time, or not. You say that the sprites are "generally well sorted"...what does that mean? That it's ok for a sliding corpse to jump in front of, then in back of, say, a fireball? You know that, when you turn 180 degrees, many sprites will reverse order, but for 90 degrees, some will, and some won't. How do you propose to tell the renderer which sprites to care about? They're either in proper order, or they're not. And then the "it's only 1.6%" or "0.3%" argument. With that approach, why optimize anything? 21 hours ago, dpJudas said: If your general point is that faster is faster, then why not go all-in on fast and use the domain specific knowledge available and use the BSP. Even Carmack commented this is the right approach. Fully agree - why not? 21 hours ago, dpJudas said: Thread synchronization has a cost. For 0.3 milliseconds run time it will probably make it overall slower to use threads for this. Typically? In general? How in the world do you know that? On what PC? For how many sprites? Where did 0.3 come from? Your definitive answer is that it's going to be slower to periodically ask "Is the sort finished?" 17 hours ago, jval said: Please consider the fact, that the absolute values are in a relatively fast modern core i7 processor. The absolute and relative gain will be higher in a lower-end CPU. I'll post some tests as soon as I can reach a lower - spec PC. I can not use the method using the BSP since I can not fully understand it. I've seen some of your brilliant work. Maybe leave this for a future improvement :) 17 hours ago, jval said: One other approach I've considered is front to back sprite drawing using dephbuffer, to avoid overdraw. I think Quake 1's software renderer used a per-object Z-buffer to handle hidden surface removal when rendering a model, but that's not the same thing. A full screen depth buffer would fix a lot of the anomalies of the software renderer, like when you can see slivers of sprites sticking out from the edge of staircases. Unfortunately, you're still doing overdraw...this time, to a depth buffer vs. a screen buffer, with the additional overhead of having to read before conditional write. 17 hours ago, jval said: I've already tested this with good results, since the sorting is performed right after the bsp traverse, in paraller with the column and flat drawers. Since the sorting is finished in less time than the drawers, I'm considering the possibility to use the extra thread for other stuff also, to improve multiple-core expoiltation. Yes, especially with a sort whose runtime speed can be trusted. 13 hours ago, therektafire said: Verdict: if the zero data, ascending data, and descending data tests are anything to go by, radix sort sucks ass with presorted stuff (at least I assume that the data in these tests is presorted), and slow when there are a small number of sprites, but fast when there are a large amount of sprites like in Nuts. Meaning that it would probably be the best to implement multiple sort algorithms and use them conditionally based on the number of sprites on screen Radix sort does not "suck ass". You've got it a bit backwards: Unlike most other general sort methods, Radix sort speed is very predictable, regardless of pre-sort state. But you do make a good point: Having the engine switch out the sort function, based on the vissprite count, is a great idea. Radix sort has some additional overhead, in setup time, and in memory usage, compared with some other sort methods. As with most things, the key is to find the most appropriate tool for the job. 1 Share this post Link to post