Sorting algorithms for sprites

Recommended Posts

fabian

As an anecdote, I have just changed sprite ordering in Crispy Doom from Vanilla's linear sorting algorithm to stdlib's native qsort(), and it turns out to make a huge difference in scenes with many sprites (obviously), but not so much in others. What I want to say with this is: the bottleneck is not "just there". Sometimes it is sprite ordering, sometimes it is collision tracking, sometimes it is low-level rendering. There is no single "the bottleneck" in Doom's rendering pipeline.

fabian said:

As an anecdote, I have just changed sprite ordering in Crispy Doom from Vanilla's linear sorting algorithm to stdlib's native qsort()



Be careful with that; there are maps that depend on a stable sorting order, which qsort does not provide.


qsort, if implemented using actual quicksort, is a potentially bad choice for other reasons, too. The keys (the sprites' projected z positions) are largely in order already, thanks to the BSP traversal adding sectors in a front-to-back manner. Sorting mostly ordered keys is one of the situations that can trigger worst-case behavior in quicksort, in which it decays from the ideal O(n log n) to O(n^2) time. Unfortunately it's unpredictable, because it also depends on the pivot selection pattern, which, when you compile against a standard library implementation, is out of your control (or even out of your knowledge, in the case of a closed-source libc).

A sorting implementation you control will almost certainly outperform this, first of all because you can inline your comparator function, something qsort is incapable of doing: it must call the comparator through a function pointer for every single comparison.
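To illustrate the comparator point, here is a minimal sketch (my own illustration, not SVE's actual code; the real replacement is linked below). qsort() has to reach every comparison through a function pointer, which cannot be inlined across the libc boundary, while even a trivial hand-written sort compares inline:

```c
#include <stdlib.h>

/* What qsort() sees: an opaque callback, called once per comparison.
 * Usage would be: qsort(v, n, sizeof(float), cmp_dist); */
static int cmp_dist(const void *a, const void *b)
{
    float da = *(const float *)a, db = *(const float *)b;
    return (da > db) - (da < db);
}

/* A hand-rolled insertion sort: the comparison is a plain expression
 * the compiler can inline, and mostly sorted input is its best case. */
static void sort_dists(float *v, int n)
{
    for (int i = 1; i < n; i++)
    {
        float key = v[i];
        int j = i - 1;

        while (j >= 0 && v[j] > key)
        {
            v[j + 1] = v[j];
            j--;
        }
        v[j + 1] = key;
    }
}
```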

Strife: Veteran Edition was using qsort for sprites initially in its GL renderer. It showed up as one of the top uses of time during profiling, so I replaced it with a customized mergesort and it was faster by a factor of 10, removing the sprite sorting process as a profiling hotspot entirely.

https://github.com/svkaiser/strife-ve/blob/63ac6f880aa17da63f3b204896a6744c394855ef/strife-ve-src/src/opengl/rb_drawlist.c#L81


I thought that MSVC's version of qsort, at least, was guaranteed to be stable (?).

Graf Zahl said:

Be careful with that; there are maps that depend on a stable sorting order, which qsort does not provide.


Could you please point me to a situation where stable sorting order actually matters? I am currently unable to make one up in my mind. :/

Quasar said:

qsort, if implemented using actual quicksort, is a potentially bad choice for other reasons, too.


GNU glibc actually uses a mergesort algorithm in qsort(), and it performs incredibly well (the article below discusses the O(n^2) scenario that you mentioned in your post):

http://calmerthanyouare.org/2013/05/31/qsort-shootout.html

I should probably add a CPP check for __GLIBC__ and use the qsort() approach only then.
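Something like this, perhaps; a minimal sketch of such a guard, where vissprite_t, CompareVisSprites and SortVisSpritesVanilla are hypothetical stand-ins rather than Crispy Doom's actual identifiers:

```c
#include <stdlib.h>

/* Hypothetical stand-ins -- not Crispy Doom's actual declarations. */
typedef struct { int scale; } vissprite_t;

extern void SortVisSpritesVanilla(vissprite_t *sprites, size_t count);

static int CompareVisSprites(const void *a, const void *b)
{
    const vissprite_t *va = a, *vb = b;
    return (va->scale > vb->scale) - (va->scale < vb->scale);
}

void SortVisSprites(vissprite_t *sprites, size_t count)
{
#ifdef __GLIBC__
    /* glibc's qsort() is the merge-sort-based one discussed above. */
    qsort(sprites, count, sizeof(*sprites), CompareVisSprites);
#else
    /* Everywhere else, keep the original sorting routine. */
    SortVisSpritesVanilla(sprites, count);
#endif
}
```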

Edit: How does the original Vanilla sorting algorithm scale? It seems to me that it scales as O(n^2) in general, so any implementation that has a chance to scale better is already an improvement, isn't it?

fabian said:

Could you please point me to a situation where stable sorting order actually matters? I am currently unable to make one up in my mind. :/

All the maps that are listed there as getting the "invert sprite sorting" option.

fabian said:

Could you please point me to a situation where stable sorting order actually matters? I am currently unable to make one up in my mind. :/


Well, the worst that can happen is some sprites flickering in and out of their proper position, especially if they are very close to walls or masked textures (which are somehow also affected by the render sorting).


... and that's exactly the problem in the mentioned maps, which deliberately overlay some sprites to create special effects.

The sorting order was inverted by accident in Boom, and this propagated to nearly all other source ports as well. But since we do not know how many maps out there depend on Boom's ordering, the only option left is to make it a toggle.

ZDoom uses std::stable_sort for sorting sprites, btw.

fabian said:

I should probably add a CPP check for __GLIBC__ and use the qsort() approach only then.


The sorting algorithm is also specific to the libc you are using.

For example:

A) Architecture A uses sorting algorithm A because it is faster on said CPU; it is unstable.
B) Architecture B uses sorting algorithm B because it is faster on said CPU; it is stable.

Since it is unspecified whether sorting is to be stable or not, this allows specific optimizations for each architecture. So while your level might work on one system, it may break on others.


I think the way I ended up fixing this results in a stable sort, is universally portable, and is independent of the actual qsort() implementation:

https://github.com/fabiangreffrath/crispy-doom/commit/06364231fd3d1432240ff8d485241b1bf5b26384
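The general trick is to break ties on the sprite's original position, so the comparator defines a total order and any qsort() implementation produces the same, effectively stable result. A sketch with assumed field names (not necessarily what the commit above does):

```c
typedef struct
{
    int scale;  /* projected distance -- the primary sort key */
    int index;  /* position in the original submission order */
} vissprite_t;

/* Equal scales fall back to the original order, so the output is
 * identical no matter which qsort() implementation runs this. */
static int CompareVisSprites(const void *a, const void *b)
{
    const vissprite_t *va = a, *vb = b;

    if (va->scale != vb->scale)
        return (va->scale < vb->scale) ? -1 : 1;

    return (va->index < vb->index) ? -1 : 1;
}
```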

Also, I think that changing to qsort(), although it scales as O(n^2) under catastrophic circumstances, is never a regression compared to the naive sorting algorithm used in Vanilla Doom. Would someone confirm this? ;)

fabian said:

Also, I think that changing to qsort(), although it scales as O(n^2) under catastrophic circumstances, is never a regression compared to the naive sorting algorithm used in Vanilla Doom. Would someone confirm this? ;)


Well, a common trick question for CS students is the problem of determining whether Bob, who claims that his O(n^2) algorithm can be faster than Alice's O(n log n) one under some circumstances, can possibly be right.


In the actual problem, the complexities were given in a more specific form (e.g. O(n^2+3) and O(5n(1+log(2n)))). Up to some small value of n, it could really turn out that the "naive" quadratic-time algorithm is faster than the "smarter" one, precisely because there is less logic and there are fewer borderline cases to handle, and fewer operations to perform when actually moving an element: just a comparison and a swap, instead of branching/recursion/pivoting etc., plus linear behavior on already sorted lists. In real-life scenarios there might be other constraints, e.g. a space/speed tradeoff, significant penalties for using more complex code, and a sufficiently small number of elements, which may make a naive algorithm adequate. In Dooming terms, I wouldn't be surprised to discover that for typical vanilla levels, and with the memory constraints of the DOS machines of the era, a more advanced sort would bring only marginal benefits, if any. For something like nuts.wad, of course, the rules change completely.
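As a quick check of the example costs above (my own arithmetic, not part of the original problem), the break-even point is easy to compute:

```c
#include <math.h>
#include <stdio.h>

/* Find where n^2 + 3 stops beating 5n(1 + log2(2n)). */
int main(void)
{
    for (int n = 2; n <= 100; n++)
    {
        double quad  = (double)n * n + 3.0;
        double nlogn = 5.0 * n * (1.0 + log2(2.0 * n));

        if (quad > nlogn)
        {
            /* Prints a crossover around n = 36: below that, the
             * "naive" quadratic algorithm is the cheaper one. */
            printf("quadratic loses from n = %d on\n", n);
            break;
        }
    }
    return 0;
}
```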

Maes said:

Well, a common trick question for CS students is the problem of determining whether Bob, who claims that his O(n^2) algorithm can be faster than Alice's O(n log n) one under some circumstances, can possibly be right.

Mmm yeah except I got the idea to try merge sort for SVE from MBF, which was profiled on a 486 (thanks Lee).

Quasar said:

Mmm yeah except I got the idea to try merge sort for SVE from MBF, which was profiled on a 486 (thanks Lee).


Is that in-place? ;-)

It would be interesting to see if the increased requirements of an algorithm (both in terms of code complexity and storage) could end up outweighing any performance benefits gained from it, if n is sufficiently small. OK, when sorting 1000 sprites it would be much better to do 3000 ops rather than 10000000, even if that means using some extra memory, but what if, e.g., a complete quicksort op is on average 5 times slower than a selection sort one, due to cache and other considerations? In that case, there might be a "break-even" point that's surprisingly high, and well beyond the range of "typical" vanilla sprite counts.

This list on Wikipedia is very interesting:

https://en.wikipedia.org/wiki/Sorting_algorithm#Comparison_of_algorithms

Other than the theoretical Ω(n log n) lower bound on comparison-based algorithms, it's interesting to see that no algorithm offers all of the following at the same time:

  1. Stable sorting
  2. Constant/unitary memory usage
  3. Worst-case execution time better than O(n^2)
  4. Simple implementation (code-wise)
  5. Easy to parallelize

Block sort has all but 5.

Edit: Now that I think about it, seeing how Doom was not always successful at choosing proper fixed static limits, I wonder how a sorting algorithm that requires additional O(n) storage would have worked. Unlike displaying the sprites, you cannot trivially reject some of them from the sort a priori. And doing a malloc/free each time wouldn't be desirable for other, obvious reasons.


If you go by the academic definition and interpretation of big-O notation, yeah, bubble sort will always be an algorithm with worst-case (and average-case) O(n^2) execution time, and it'd be crazy to use it "for a sufficiently big n" (another academic notion). But it is also true that for a "sufficiently small n" a specific implementation of bubble sort can be faster than a specific implementation of, e.g., quicksort.

In general, almost all O(n^2) sorting algorithms are also simpler to code (they only have loops, no recursion, and very few checks for special cases). More compact code means fewer CPU-memory fetches, and the O(1) memory requirement means less cache pollution, which may be a more important factor on a restricted system than using an algorithm that is theoretically faster but whose conditions for benefiting from it (e.g. large n, ease of allocating memory, etc.) are not there.

As I said before, I would really like to see how vanilla Doom could accommodate a sorting algorithm with variable, n-dependent storage requirements without resorting to arbitrarily removing sprites from view before they are even sorted.


You could also use a red-black tree, so that you sort on insertion.

If you have a sprite limit, you can knock off any sprites that are farther away or of lower priority. ReMooD does this, and it is noticeable in NUTS.WAD: only the sprites closest to you are drawn when there are a few thousand. Given a choice between a monster and a corpse, you are shown the monster. You have to drop priority with distance, because otherwise very distant high-priority objects will make lower-priority nearby objects invisible.


A long-distance "z-cutoff" would be the trivial approach, but it would result in annoying "popping" effects, of which Doom was -mostly- free.

Sure, vanilla Doom does limit the number of sprites it displays, but not the number it sorts. The trouble with setting any arbitrary "z-cutoff" is that this z value should be variable: if a scene contains "a few" sprites, below the sorting limit, it should be unrestricted. What if they are numerous but all fairly far away, though? What if they are more or less evenly distributed? That z value must be frame-adaptive, and that means an extra O(n) step during each frame to assess where to place the cutoff distance. At least you don't need O(n) extra storage for that ;-)
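That O(n) step could be a quickselect pass, for instance (a sketch of the idea, not any port's actual code): it partitions the distance keys around the k-th nearest in linear average time and constant extra space, and the value it returns is the frame's cutoff. Run on the sprite array itself before the real sort, the reshuffling it causes is harmless, since the surviving part gets fully sorted afterwards anyway:

```c
static void swap_float(float *a, float *b)
{
    float t = *a; *a = *b; *b = t;
}

/* Hoare-style quickselect: on return, v[k] holds the k-th smallest
 * distance, i.e. the frame-adaptive cutoff value. */
static float select_kth(float *v, int n, int k)  /* 0 <= k < n */
{
    int lo = 0, hi = n - 1;

    while (lo < hi)
    {
        float pivot = v[lo + (hi - lo) / 2];
        int i = lo, j = hi;

        while (i <= j)
        {
            while (v[i] < pivot) i++;
            while (v[j] > pivot) j--;
            if (i <= j)
                swap_float(&v[i++], &v[j--]);
        }

        if (k <= j)
            hi = j;      /* the k-th element is in the left part */
        else if (k >= i)
            lo = i;      /* ...or in the right part */
        else
            break;       /* j < k < i: v[k] already equals the pivot */
    }
    return v[k];
}
```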

Maes said:

Is that in-place? ;-)

It would be interesting to see if the increased requirements of an algorithm (both in terms of code complexity and storage) could end up outweighing any performance benefits gained from it, if n is sufficiently small. OK, when sorting 1000 sprites it would be much better to do 3000 ops rather than 10000000, even if that means using some extra memory, but what if, e.g., a complete quicksort op is on average 5 times slower than a selection sort one, due to cache and other considerations? In that case, there might be a "break-even" point that's surprisingly high, and well beyond the range of "typical" vanilla sprite counts.

It's possible to do hybrid sorts that roll over to a simpler algorithm for runs that are shorter than some "N" where this overhead barrier you are talking about kicks in. Really good std::sort implementations do this in C++, usually using either quicksort or mergesort as the primary algorithm and then devolving to a simpler O(N^2) algorithm when N is small, because for very small N those algorithms run faster. It wasn't worth optimizing for in SVE, though: mergesort's performance on small N still kept it off the profiling hotspot list, and there were almost always enough sprites in a scene anyway for it not to be a concern.
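A sketch of that hybrid shape (plain ints for brevity; the cutoff of 16 is a guess, real implementations tune it per platform):

```c
#include <string.h>

#define CUTOFF 16  /* assumed threshold; real implementations tune this */

static void insertion_sort(int *v, int lo, int hi)  /* sorts v[lo..hi) */
{
    for (int i = lo + 1; i < hi; i++)
    {
        int key = v[i];
        int j = i - 1;

        while (j >= lo && v[j] > key)
        {
            v[j + 1] = v[j];
            j--;
        }
        v[j + 1] = key;
    }
}

static void merge(int *v, int *tmp, int lo, int mid, int hi)
{
    int i = lo, j = mid, k = lo;

    while (i < mid && j < hi)                        /* taking the left side */
        tmp[k++] = (v[j] < v[i]) ? v[j++] : v[i++];  /* on ties keeps it stable */
    while (i < mid) tmp[k++] = v[i++];
    while (j < hi)  tmp[k++] = v[j++];

    memcpy(v + lo, tmp + lo, (size_t)(hi - lo) * sizeof(int));
}

/* Mergesort carries the bulk of the work; runs at or below CUTOFF go
 * to insertion sort, whose low overhead wins for very small N. */
static void hybrid_sort(int *v, int *tmp, int lo, int hi)
{
    if (hi - lo <= CUTOFF)
    {
        insertion_sort(v, lo, hi);
        return;
    }

    int mid = lo + (hi - lo) / 2;
    hybrid_sort(v, tmp, lo, mid);
    hybrid_sort(v, tmp, mid, hi);
    merge(v, tmp, lo, mid, hi);
}
```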


I did some very naive benchmarking on my wife's Windows 10 PC: How many FPS are drawn for the initial view of NUTS.WAD?

For this I compiled Crispy Doom once with vissprite ordering done by Vanilla's naive algorithm and once with qsort(), which for Win32 programs compiled with MinGW uses the implementation from the MSVCRT.dll runtime. The tests were carried out with an all-default configuration, except for uncapped framerate being enabled, and the following command line: "./crispy-doom.exe -iwad /z/doom2.wad -file /z/nuts.wad -warp 1 -1 -window".

The results, as reported by the SHOWFPS cheat, were about 40 FPS with the naive sorting algorithm and about 160 FPS with qsort(). I did not test Killough's msort() implementation, though.

Since NUTS.WAD is probably one of the most extreme examples for the task of ordering roughly pre-ordered sets of sprites, I deduce from this little test that using the system's native qsort() will never cause a performance regression compared to Vanilla's naive algorithm. Maybe. ;)


I'm not sure nuts.wad is as good a test as you think, since sprites in the same sector (and the entire first room of the map consists of a single sector, AFAIK) will be added in sector-linking order. Since none of them are moving, that's probably the same order they spawned in, which to the game is arbitrary, since it's the order in which the mapper placed the monsters.

However, what are we talking about when we say a "naive algorithm"? Exactly how bad is the original code, and what algorithm is it using?


Without the ability to plug code in and out of the vanilla doom.exe, and with the constraint of having to work within its static memory limitations, such benchmarks are moot.

Quasar said:

Sorting mostly ordered keys is one of the situations that can trigger worst-case behavior in quicksort

That depends on how you choose your pivot, though. If you do it right, with the assumption that the input will be mostly ordered, it can potentially be very efficient.


I changed DoomLegacy to use a merge sort some years ago. The previous code would sort all the sprites before making any decisions. The merge sort saves some intermediate steps in that it is done directly into the draw list, avoiding a temporary sprite sort structure. It also has a sprite draw limit that is applied during the sort.

When the number of drawn sprites reaches the limit, the near sprites are drawn, but the farther sprites are semi-randomly dropped. This happens during the merge sort, which immediately drops sprites that are not going to be drawn. A pointer to the last near sprite is maintained. If the next sprite is closer than the last one, it is merge-sorted into the near sprites and the last sprite is kicked out to the far sprites. The far sprites are semi-randomly dropped as they are entered into the draw list, to limit the size of the far draw list. Sorting the next sprite into smaller lists speeds up the sorting: regardless of which sort you use, reducing n as fast as possible helps.
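A much-simplified sketch of the "limit during the sort" idea (nearest-K only, without DoomLegacy's near/far split or the semi-random dropping; the cap and the names are made up):

```c
#define MAX_DRAWN 128  /* assumed cap -- DoomLegacy's actual limit differs */

typedef struct
{
    float dist;    /* distance used as the sort key */
    void *sprite;  /* whatever the port uses for a vissprite */
} drawent_t;

static drawent_t drawlist[MAX_DRAWN];
static int drawcount;

/* Insert a sprite into the sorted draw list, keeping only the nearest
 * MAX_DRAWN entries; anything farther is rejected before any sorting
 * work is spent on it, which is what keeps n small. */
static void AddSprite(float dist, void *sprite)
{
    int i;

    if (drawcount == MAX_DRAWN && dist >= drawlist[drawcount - 1].dist)
        return;  /* farther than everything we keep: drop immediately */

    if (drawcount < MAX_DRAWN)
        i = drawcount++;
    else
        i = MAX_DRAWN - 1;  /* the farthest kept entry gets overwritten */

    while (i > 0 && drawlist[i - 1].dist > dist)
    {
        drawlist[i] = drawlist[i - 1];  /* shift farther entries down */
        i--;
    }

    drawlist[i].dist = dist;
    drawlist[i].sprite = sprite;
}
```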

