Jump to content
Search In
  • More options...
Find results that contain...
Find results in...
jval

Potential for improvement in vissprites sorting

Recommended Posts

Last days I'm exploring various sorting algorithms and their performance.

I'm focusing in vissprites sorting, a task that could affect performance in scenarios with high number of vissprites. In such a case a slow down can occur, and this is regardless the screen resolution, or rendering effects.

 

So, I've created a test case, by using random (but vissprite - like) data, and tried some algorithms, including selection-sort, quick-sort and 2 radix-sort variants.

 

Here are the (interesting) results (bench values in msecs):

 

vissprites_sort.png.c233595a6f84bf97e328d10842526523.png

 

The tests were performed in a mobile core-i7 7700HQ, using critical CPU priority, without other applications running in the background.

 

As expected using an O(n^2) method, such as selection-sort will be catastrophic, when we have more than a few hundred of vissprites.

 

To make things more clear, here is a graph comparison between quick-sort and the 2 radix-sort implementations:

 

vissprites_sort_chart.png.d03896cd01a33f8e5cc60df7ea9fac42.png

 

As we can see, there is definitely a performance gain in favor of radix-sort compared to the fast quick-sort.

On the other hand, the cons of radix-sort is higher memory usage.

 

As a real world example I've tried nuts.wad with the following fps performance (DelphiDoom - 960x600x32 bit color software multi-core).

The opening scene draws 3744 vissprites:

vissprites_fps.png.fd7a9c9919a4283c6b735ac9d661566e.png

 

Other methods were tried in the test case (counting sort variants, multithreaded variants, hybrid variants etc) but better performance occured only in an insane number of vissprites (more than a few hundred thousand).

 

As a conclusion, we can see that despite the fact that radix-sort is about 4 times faster than quick-sort, the fps gain is poor (but not negligible).

Radix-sort pros include that is also stable sorting algorithm, and cons the memory usage.

 

Further work hopefully will include merge-sort, bucket-sort etc...

 

vissprites_selection.jpg.0eefa727de4836689a52d5ca5d1fe178.jpg

 

vissprites_quicksort.jpg.08319c935a76afe5eb748aa76a478b2e.jpg

 

vissprites_radixsort.jpg.1690ddedce69ed7329a22a70f162ecdb.jpg

 

Edited by jval

Share this post


Link to post

What do you use as the pivot for quicksort? I'm guessing that vissprites are "mostly sorted" already by virtue of the fact that the BSP tree's distance ordering. So picking something from the middle of the list probably makes the most sense.

 

Although, you probably want to use mergesort instead of quicksort because it has better worst-case performance (nlogn instead of n^2)

Share this post


Link to post

I'm using the middle of the partition as pivot for qsort. Haven't tried merge-sort yet, but I'm planning to.

The vissprites are branches of almost sorted sub-arrays, for example here are the first keys (scale) of an unsorted vissprite table, as created by the bsp (real data from a random drawing frame):

4648 <- 1rst subsector start
4648
140429 <- 2nd subector start
140432
140436
140440
140443
89364 <- 3rd subsector start
89371
89368
89367
89365
65533 <- 4th subsector start
65533
65534
65541
65539
65538
65537
65537
65536
65535
etc...

 

How can we take advantage of such this pattern of the unsorted data? Bitonic sort maybe?

 

 

Edited by jval : Change 'sector start' to 'subsector start'

Share this post


Link to post

What exactly is the number by which they are sorted? What does it represent, where/when is it computed, where/how is it stored. (EDIT: I just want to understand how does the pattern in the data emerge, whether there would be any advantage in computing/storing additional data along with computing them to be used for the sorting later, also where/how the data is stored may affect performance of different sorting algorithms differently.)

Edited by scifista42

Share this post


Link to post
6 minutes ago, scifista42 said:

What exactly is the number by which they are sorted? What does it represent, where/when is it computed, where/how is it stored.

 

The array is sorted by the scale field. It is calculated during bsp traverse (stack is R_ProjectSprite/R_AddSprites/R_Subsector) . 

It's a fixed_t type which represents the scale that we will draw the sprite in screen. Larger values are near the player/view, lower values are far away.
It is stored in a vissprite_t struct.

The vissprites that are drawn at each frame are stored in the vissprites array.

The vissprites array is sorted ascending, and the drawing is performed from the 1rst element to the last (back to front).

Vanilla declaration is vissprite_t vissprites[MAXVISSPRITES]; (MAXVISSPRITES = 128).

Source ports change this declaration to remove the 128 limit.

Share this post


Link to post

for the record, i'm not sure how much this'll help, because even in the bad case i don't think the sorting is a particularly expensive part of rendering a scene in doom?

 

you may be able to save time by being aware of how sprites are added, as in, per-sector, and doing the sorting within a sector and then you have separate lists that you can merge more easily by checking their start/end

 

this wont rly help on nuts.wad which is just 1 big sector though but i think nuts.wad is a poor test case for doom in general;

if you want to check the many-sprites cases, stuff like sunder.wad is more realistic imo

Share this post


Link to post

@jval I was originally under an impression that you were doing these tests to improve performance of a specific source port, but since you didn't provide any source port specific info in your last post, I guess you're after solving the problem in general. I'm going to look at it more abstractly, then. The pattern in the data looks very exploitable and makes me think of an algorithm like this:

  • As vissprites are being added, store a list of sectors that have at least one vissprite each, and for each such sector, store a list of its vissprites and also store the minimum and maximum vissprite scale of all the sector's vissprites
  • Sort the lists of vissprites within each sector, the algorithm may vary for each list depending on its number of vissprites and on the difference between the maximum and minimum vissprite scale in the sector and even on how well the list was already sorted
  • Check for pairs of sectors (in the list of sectors with vissprites) where the first sector's maximum vissprite scale is lesser or equal than the second sector's minimum vissprite scale, append the second sector's list of vissprites to the end of the first sector's list, change the first sector's maximum vissprite scale to the second sector's one, and remove the second sector from the sector list
  • Merge the vissprite lists of the remaining sectors by mergesort's merging logic

Basically mergesort with an intermediate step of simply appending sorted lists whose ranges of vissprite scales do not overlap.

 

EDIT: Nevermind, anotak already said it, and with less words, too.

48 minutes ago, anotak said:

you may be able to save time by being aware of how sprites are added, as in, per-sector, and doing the sorting within a sector and then you have separate lists that you can merge more easily by checking their start/end

Edited by scifista42

Share this post


Link to post
2 hours ago, anotak said:

for the record, i'm not sure how much this'll help, because even in the bad case i don't think the sorting is a particularly expensive part of rendering a scene in doom?

 

It depends, but in a scene like the infamous Frozen Time the sprite rendering takes a very significant part of the total - something in ballpark of 40% if I remember correctly.

 

In general I have to agree with you that using nuts is probably not the best benchmark for sprite performance as its high sprite count in a single sector could easily lead to the wrong conclusions. The benchmark numbers here seem to indicate they are all based on the start location of the player in nuts, which is problematic as the modder that created the map may have placed all the monsters systemically in an array-like fashion in the wad file. What that means is the array to be sorted might not be typical and we are looking at it from a specific angle - the numbers might shift in other sort algorithms favor if you view the scene from a different direction. For example if we look at it from exactly the opposite direction it is very likely the arrays will be initially sorted exactly opposite to Fraggle's cautious "mostly sorted" assumption.

 

I think it is also worth mentioning that partly what makes sprites rendering in Doom silly slow isn't so much the initial sorting of the sprites, but rather the way it applies sprite clipping with regard to segments. I can't say for sure if this was a problem in the original Doom version, but at least the version in ZDoom is horribly inefficient as it constantly seeks in the sorted sprite array for segments that may be in front of any given sprite. This problem won't at all be revealed by a nuts test as you basically have about 20 segments in total for the scene.

 

In my opinion if you really want to improve sprite performance in general, as opposed for just nuts, you really should consider changing Doom into rendering sprites at the subsector level. Ideally by tracking which subsector a sprite belongs to instead of just which sector. This gives you two advantages. First, you only have to depth sort within a single subsector, the initial BSP front-to-back pass will already have sprites sorted across subsector boundaries. Secondly, it allows you to skip the segment search entirely. It no longer need to compare each sprite's location to all segments in front of it - subsectors guarantee that it will always be behind them all.

Share this post


Link to post

that's what SoM mentioned the other day too, regarding adding sprites by subsectors.

 

i had an idea for an alternate approach i was starting to implement last week before i got sick (still sick, unfortunately) where you iterate over the vissprites & drawsegs both in front-to-back order, and generate clipping arrays that can merge between sprites (like visplanes). ideally this should greatly reduce the cubic nature of the existing clipping algorithm (vissprites * drawsegs * columns). i'm not sure how practical/efficient this actually is; but i thought it was worth exploring

Share this post


Link to post

@jvalThanks for posting these interesting results! It always amazes me how one small aspect of the Doom engine can bring rendering speed to its knees!

Nuts.wad *is* a good test scenario, simply because it demonstrates the case where you need quick sorting. In "normal" scenes, there's no enough sprites to need a fast sorting algorithm Sure, nuts.wad is not typical, but typical scenes sort fast anyway. In other words, if you can make nuts.wad sort (and render) quickly, every Doom scene involving lots of sprites has a good chance of performing well.

 

I don't think there's any way to reliably depend on the order of the vissprites, because this order regularly changes. But, of course, if there's a way to reduce the number of items to sort, you'll improve sorting performance.

 

So, how much memory are we talking about, for radix sort? Does the key consist of some upper bits of the distance (inverse scale)? You could always do a rough pre-sort with, say, radixsort, with possibly a smaller memory footprint, that separates items into groups. Then, you could sort each group with a simple, non-memory intensive sort that performs ok with reduced item counts. This could afford you the best of both worlds: Lesser memory usage and faster speed, at the cost of complexity (requiring a multi-step, multiple sort algorithm approach).

 

Something to think about with the multiple sort algorithm approach: For close-up sprites, sorting by scale makes sense, but for further-away sprites, it may make sense to use distance instead. The correlation between the two is non-linear. As distance increases, scale decreases very little, which has a negative effect on the grouping aspect of radixsort. Profiling some specific Doom scenarios will shed some light on this phenomenon:

  • Scenario 1: Lots of nearby monsters
  • Scenario 2: Lots of far-away monsters
  • Scenario 3: A combination of both
  • Scenario 4: Lots of monsters spread out evenly along the line-of-sight

Ideally, for Scenario 4, you want the first sort, the radix sort, to "fill each bucket" with the same number of monsters, to be able to exert some control on the number of elements the second sort has to deal with. Using scale might be appropriate here. Perhaps the second sort is also a radix sort, but uses distance vs. scale, to provide sufficient precision and "dynamic range", allowing it to also evenly fill each division. It may require a lot of trial and error, and a lot of profiling to produce best results, but it's worth it to keep those frames coming :)

 

@anotak: When you feel better, can you elaborate a little bit on your idea? I don't know if I understand what you're describing, but I think I had a similar idea. Are you describing clipping overlapping sprite spans to avoid sprite overdraw? If so, this has some real potential.

 

In some experiments I did a while back, I found that sprite overdraw is a *big* performance killer...way bigger than I expected. I think it's a cache invalidation issue. One of the worst culprits is when you're standing at some threshold, picking off monsters, causing their corpses to fall at the same place. Some map layouts promote this.

 

Once you get a couple dozen bodies piled up on top of each other, you can watch the frame rate plummet as you approach the pile. Because of the holes in sprite spans, it's not so easy to build clipping spans, as it is with textures, but if you could, it could improve a majority of Doom scenes, including that Nuts screenshot.

Share this post


Link to post

it has little to do with overdraw and entirely to do with clipping-sprites-by-drawsegs, though it may be possible to extend the method i'm working on to prevent *some* overdraw in certain very specialized cases

Share this post


Link to post
32 minutes ago, anotak said:

it has little to do with overdraw and entirely to do with clipping-sprites-by-drawsegs, though it may be possible to extend the method i'm working on to prevent *some* overdraw in certain very specialized cases

 

PrBoom+ uses multiple lists of drawsegs, split into ranges (Halves & the entire list - 3 ranges total) to improve O(vissprites * drawsegs) complexity.

A similar approach I use in DelphiDoom (Quarters, halves & the entire list - 7 ranges total).

 

Share this post


Link to post
5 minutes ago, jval said:

 

PrBoom+ uses multiple lists of drawsegs, split into ranges (Halves & the entire list - 3 ranges total) to improve O(vissprites * drawsegs) complexity.

A similar approach I use in DelphiDoom (Quarters, halves & the entire list - 7 ranges total).

 

i do have this un-accepted pull request for EE sitting around that sorts drawsegs by screenspace X and then binary searches the list for each sprite. it offered major improvements on wads like comatose & the given.

 

but this other method i was investigating i suspect would be even better

Share this post


Link to post
9 hours ago, kb1 said:

@jvalSo, how much memory are we talking about, for radix sort? Does the key consist of some upper bits of the distance (inverse scale)? You could always do a rough pre-sort with, say, radixsort, with possibly a smaller memory footprint, that separates items into groups. Then, you could sort each group with a simple, non-memory intensive sort that performs ok with reduced item counts. This could afford you the best of both worlds: Lesser memory usage and faster speed, at the cost of complexity (requiring a multi-step, multiple sort algorithm approach). 

 

 

My implementation uses twice the size of vissprites array (2 auxiliary arrays with same size as the unsorted array ) plus a constant array of 5120 * sizeof(int). But ...not always - most of the time will use only one auxiliary array... The optimized RadixSort of the test case will use most of the time one extra array and some times 2. The non optimized version uses always 2.

In DelphiDoom vissprites array is an array of pointers to the actual data, not an array of structs, so I exchange/read/write only pointers, not the actual struct data. Also this means less extra memory usage. If a port uses an array of vissprites, not an array of vissprite pointers, then the memory usage will rise.

 

 

Share this post


Link to post
20 hours ago, anotak said:

i do have this un-accepted pull request for EE sitting around that sorts drawsegs by screenspace X and then binary searches the list for each sprite. it offered major improvements on wads like comatose & the given.

 

but this other method i was investigating i suspect would be even better

I re-read your description, and I think I understand better. It'd be really cool if you could toggle between the original and new methods runtime, and measure the difference, but that'd probably be a nightmare. Very interesting. I'd love to know what performance gains you get, and how difficult it was to implement.

 

When I first read your description, I immediately thought of overdraw, due to the fact that I was getting major, unexpected performance losses when the player would approach a mass grave. In some cases, the frame rate would drop to single digits, with only a relatively small number of corpses.

 

We've basically solved the hi-res Doom rendering performance issue in the general cases. It's these diabolic edge cases that require new clever techniques to keep the frame rate going. It seems like Doom still has a lot of room for improvement after all :)

 

12 hours ago, jval said:

 

My implementation uses twice the size of vissprites array (2 auxiliary arrays with same size as the unsorted array ) plus a constant array of 5120 * sizeof(int). But ...not always - most of the time will use only one auxiliary array... The optimized RadixSort of the test case will use most of the time one extra array and some times 2. The non optimized version uses always 2.

In DelphiDoom vissprites array is an array of pointers to the actual data, not an array of structs, so I exchange/read/write only pointers, not the actual struct data. Also this means less extra memory usage. If a port uses an array of vissprites, not an array of vissprite pointers, then the memory usage will rise.

 

 

That doesn't really seem like enough memory usage to really worry about, especially if it profiles well. I'm a fan of big memory usage to gain performance. If you have the memory, why not use it? Nowadays, you have to worry about read/write patterns to keep the cache happy, but in this case, it's one big usage per frame, which seems quite acceptable.

 

I'm glad you brought this up - I've long been worried about the quicksort worst case possibilities. It's right in the name "quick", but it ain't always so :) It's nice to see a robust solution that works in all cases.

Share this post


Link to post

Without having looked into this problem very much, here are some ideas:

I would assume that in most cases, the drawing order doesn't change much, any algorithm that has good performance on mostly sorted data could excel. You could use this algorithm in most cases and switch to a special case algorithm the when view / position has changed drastically (teleport).

The sorting algorithm should take into account the accuracy of the z-buffer or whatever means you use in software for clipping. Sorting more accuratly than what is possible to render might be pointless. Lower precision might lead to flickering, but that would only happen on somewhat distant sprites.

Do you really need to update the sprite order on object that are really far away every frame? If their rendering ends up being only 2 pixels tall, I'd rather have a better frame rate than theoretical high accuracy. Split your sprites into three categories. Close, distant, really distant. Always sort close and one of the other categories for each update. There might be need for some trickery for adjusting the partition between distant and really distant, to even out the load.

Multi threading this could be a possibility, the sorting can run while other things in the engine are done. This won't work on ancient machins, but relatively few people run "horde" maps on low-end machines so old they do not have some form for threading/multi core-support on the cpu.

Share this post


Link to post

@zokumThere might be some merit in assuming that sprite sorting may remain somewhat accurate across frames, especially when far away. The beauty of radix sort is that it automatically creates something like your "Close, distant, really distant" categories, by virtue of what it does. And, it's run time remains pretty constant, regardless of pre-sort order.

 

But, yes, radix sort could be altered to treat far away sprites specially, in a way that's kinda clever. It could sweep from the most-significant to least-significant bits of the sort key, across multiple frames. Effectively, it could take, say 8 frames to put the distant sprites in exact sort order (if they don't move). This basically follows radix sort's algorithm, split into multiple frames. Personally, I'd be worried about the flickers, and about trying to maintain state between frames, with sprites changing from visible to not-visible, and moving through different distance categories. If not implemented exactly right, there could be lots of subtle bugs occurring in ways difficult to track down. If you could make it work, it might provide a nice boost.

 

I think I'd prefer a straight-forward simpler approach that performs well in all conditions. Radix sort is a solid performer, and a good choice here.

Share this post


Link to post

@kb1 Indeed, it's not that much memory in modern computing terms, even in the case that the array holds the actual data, not only pointers. But merge-sort needs only half of the array size as extra memory. Could more extra memory lead to cache hit misses? I 'll try to test this using an lower cache CPU...

 

@zokum Multi threading seems an excellent idea, and I thing the right place to do it is while rendering the Planes (R_DrawPlanes). lI'll try this ASAP!

 

@anotak & @scifista42 I've added merge-sort to the test-case and here are the results:

 

vissprites_sort_chart2.png.a88d0d2007251fb146c2be026ff326ec.png

 

As we can see, merge-sort is faster than quick-sort, but not that fast as the radix-sort. Next step will be to test this in a lower-end CPU as I mentioned above, to make sure that smaller CPU cache does not bottleneck radix-sort.

 

And @anotak since I was curious about this comment of Lee Killough , I've made a comparison of merge-sort in random data vs vissprite-like unsorted data. The results are obvious:

 

vissprites_merge_sort.png.f11670e63354e1d3be9b59ed25e4136c.png

 

 

Share this post


Link to post
3 hours ago, jval said:

I've added merge-sort to the test-case and here are the results:

As we can see, merge-sort is faster than quick-sort, but not that fast as the radix-sort.

 

Did you do the intermediate step of appending lists with non-overlapping ranges? Because of course that merge-sort won't be as fast as radix-sort if you didn't. When used on vissprite-like data, that step should significantly reduce the number of lists to be merged by the standard merge-sort logic, speeding up the most critical part.

Edited by scifista42

Share this post


Link to post
On vendredi 21 septembre 2018 at 12:38 AM, dpJudas said:

In my opinion if you really want to improve sprite performance in general, as opposed for just nuts

That's a point that should be emphasized.

 

nuts.wad is a special corner case. Let's remind everyone that its author created it on a lark, as a test file for the idgames upload process, so that a mess up wouldn't result in the loss of an actual serious work.

 

As a result, nuts.wad is a very atypical level, and that makes it a poor choice for stress-testing the engine. Consider playing nuts.wad with -nomonsters, are you still stress-testing anything? I mean, here's the map:

hCUwWY7.png

How many real Doom maps look like this? What does it mean for your port if you optimize it to run maps like this instead of maps that look like actual Doom maps? What if you gain 2 FPS on nuts.wad but lose 5 FPS on any map with more than two rooms, do you claim victory?

Share this post


Link to post
7 hours ago, scifista42 said:

 

Did you do the intermediate step of appending lists with non-overlapping ranges? Because of course that merge-sort won't be as fast as radix-sort if you didn't. When used on vissprite-like data, that step should significantly reduce the number of lists to be merged by the standard merge-sort logic, speeding up the most critical part.

 

No, I did 't include the intermediate step. I used the a pattern of unsorted vissprites as created by the bsp.

 

 

3 hours ago, Gez said:

nuts.wad is a special corner case. Let's remind everyone that its author created it on a lark, as a test file for the idgames upload process, so that a mess up wouldn't result in the loss of an actual serious work.

 

As a result, nuts.wad is a very atypical level, and that makes it a poor choice for stress-testing the engine. Consider playing nuts.wad with -nomonsters, are you still stress-testing anything? I mean, here's the map:

 

How many real Doom maps look like this? What does it mean for your port if you optimize it to run maps like this instead of maps that look like actual Doom maps? What if you gain 2 FPS on nuts.wad but lose 5 FPS on any map with more than two rooms, do you claim victory?


 

2 hours ago, Cacodemon345 said:

Nuts.wad frankly makes 0 sense for any vissprites performance optimization. It was just a joke map.

 

The tests were not performed in nuts.wad, it was just picked as an easy choice for screenshot, since it has many vissprites at start of the map.

The benefit stands in all maps, since we're trying to optimize a task that needs constant time of CPU time regardless the screen resolution.

That's why the benefit also is more observable in lower-end CPUs.

@Gez Why lose fps?

 

Share this post


Link to post

maybe you slipped up when you said the word "stress testing", but:

nuts.wad is actually a good test case for stress testing doom software specifically because it is so strange. because it is an "edge case". you always test edge cases.

who cares if it was a joke? that's completely irrelevant, some people play it seriously, and that's what matters. and even if they didn't, why shouldn't we let folks enjoy their joke in 60fps?

 

however, testing it is different than what you do with the information.

my earlier comment was that it is not the general case that should be optimized for if it sacrifices other maps.

especially because all the ports mentioned in this thread already run it at 60+FPS, so who cares. though i suppose the people with 144hz monitors might want an improvement there.

the thing is that testing it gives you a "canary in the coal mine" if something will be wrong in other situations. a serious performance problem handling nuts might lead you to investigate and find other issues elsewhere.

 

there's a reason my set of test wads typically includes "atypical" works like nuts, planisf2 & choz

Share this post


Link to post
4 hours ago, Gez said:

As a result, nuts.wad is a very atypical level, and that makes it a poor choice for stress-testing the engine. Consider playing nuts.wad with -nomonsters, are you still stress-testing anything?

Let's remind ourselves real quick that engines can have more than just a single bottleneck that needs dealing with.

 

Maps like nuts are good for stresstests that involve lots of active "thinkers" at once, which is not something many other maps do to such a degree. That said, for anything that involves active monsters, it is a good "benchmark WAD" to check how well the engine handles the workload.

 

Map geometry is a whole different deal, it occupies a different "space" in the engine and clearly that is not something you would test with nuts. That's where maps like frozen time come in handy, because they put a lot of stress on how the engine handles map geometry, in particular lots of 2-sided linedefs.

 

4 hours ago, Gez said:

What if you gain 2 FPS on nuts.wad but lose 5 FPS on any map with more than two rooms, do you claim victory?

This is a load of bollocks for reasons I pointed out above.

Share this post


Link to post
1 hour ago, jval said:

@Gez Why lose fps?

 

Because optimization is a fickle goddess, and focusing too much on an abnormal special case can result in losing sight of what's needed in the general case.

 

3 minutes ago, Nine Inch Heels said:

Maps like nuts are good for stresstests that involve lots of active "thinkers" at once, which is not something many other maps do to such a degree. That said, for anything that involves active monsters, it is a good "benchmark WAD" to check how well the engine handles the workload.

The thread isn't about optimizing active thinkers, it's about optimizing sprite sorting.

 

The geometry becomes relevant for that because sprite sorting depends on the BSP tree.

Share this post


Link to post
On 9/20/2018 at 3:38 PM, dpJudas said:

In my opinion if you really want to improve sprite performance in general, as opposed for just nuts, you really should consider changing Doom into rendering sprites at the subsector level. Ideally by tracking which subsector a sprite belongs to instead of just which sector. This gives you two advantages. First, you only have to depth sort within a single subsector, the initial BSP front-to-back pass will already have sprites sorted across subsector boundaries. Secondly, it allows you to skip the segment search entirely. It no longer need to compare each sprite's location to all segments in front of it - subsectors guarantee that it will always be behind them all.

 

Funny that you say this, for two reasons:

 

  1. Rendering sprites at the subsector level is critical to changing the renderer to do The Right Thing, as elaborated upon by John D. Carmack, December 23, 1997:
     
    Quote

    The way the rendering proceded from walls to floors to sprites could be collapsed into a single front-to-back walk of the bsp tree to collect information, then draw all the contents of a subsector on the way back up the tree.  It requires treating floors and ceilings as polygons, rather than just the gaps between walls, and it requires clipping sprite billboards into subsector fragments, but it would be The Right Thing.

     

  2. @Kaiser has recently been rewriting the Doom64 EX renderer to draw everything in a single pass (in other words, doing The Right Thing), and the key for him was to split up sprite rendering by subsector.
     

Share this post


Link to post

The softpoly software renderer in GZDoom also renders its sprites by subsector, although it doesn't really deal with clipping the proper way. The Doom software renderer in GZDoom also sorts by subsector when models are present, although I left out clipping the sprites across subsector boundaries (can cause render order problems).

 

The really tricky part is supporting all the render hacks and Boom extensions - I don't know how much of that had to be done for a Doom64 solution. But nice to hear other people, even Carmack himself, favors this approach. :)

Share this post


Link to post

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×