Altazimuth Posted April 28, 2023 (edited)

UPDATE: It's live! Get any devbuild from May or later to use it! https://devbuilds.drdteam.org/eternity/

Well hi there. I just got done putting the final touches on my multithreaded renderer branch to make it actually usable! While I'd love to just sit on these changes forever and keep my mental state and blood pressure in check, I need people to actually test that I haven't introduced any extreme weirdness or crashes by making all these changes. To that end, it's time for a public beta! You may find the files at the bottom of the post, and the source is on the multithreaded-renderer-fixes branch if you wish to compile it yourself. In the meantime, here's a brief explanation of the new settings and how to mess with them.

r_numcontexts: Video Options > Page 2 > Renderer Threads. This does what it says on the tin. The higher the number, the more render contexts there are. This number ranges from 1 to however many threads your CPU supports. The optimal number of threads will vary based on your CPU and how complex the scene you're rendering is. Do not assume that the highest number is the best. Note that this number is technically off by one if you're talking about threads. It spawns one fewer thread than the number of contexts, because one context is actually run on the main thread. Separately from this, if you only have one render thread then the whole system acts as it used to: everything is executed on the main thread.

r_sprprojstyle: Video Options > Page 2 > Sprite Projection Style. This has three settings: Default, Fast, and Thorough. Default means Fast for 1 thread, or Thorough for more than 1 thread. Fast is the classic Doom sprite projection style: if a subsector isn't visible in a given render context then it won't be rendered. This may cause sprites to have a cut-off appearance at the boundaries between render context windows. As the setting implies though, it is a fair bit faster than the Thorough setting.
Thorough will (generally) eliminate the issues seen in Fast. It works by not just considering sprites that have their centre in the sector being rendered, but anything whose hitbox (yes, it's based on hitbox radius instead of sprite size) is within the rendered sector. Sprites that have already been drawn that frame are then cached to avoid redrawing. Though this can cause a reduction in performance in heavier scenes compared to Fast, in your average vanilla WAD you're not likely to notice much of a difference.

It's not in the menus, but if you want to compare FPS between settings you will want to set `d_drawfps` to `on`.

If you have any scenes where the rendering glitches out where it didn't previously, then the ideal thing to provide would be your resolution, number of contexts, any savegames that have the exact frame where the glitch is happening, and the WADs you're running it with. Crashes are much the same as above, though obviously you can't provide a save; the crash report application will give you further instruction on how best to report crashes.

BUILDS:
2023-05-01: Windows (x64)
2023-04-30: Windows (x64)
2023-04-29: macOS (x64), macOS (Apple Silicon)
2023-04-28: Windows (x64)

NOTE: There's no load balancing yet because I'm so tired, dude, just so tired. I just wanna show this off now that it seems mostly done. I was worried this was gonna take another year or so of off-and-on work but nope, it's seemingly working just fine. You can thank GooberMan for this existing at all. Without Rum & Raisin Doom I wouldn't ever have managed this.

Edited May 7, 2023 by Altazimuth
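To make the two settings above a bit more concrete, here is a minimal C++ sketch of the rules as described in the post: contexts range from 1 to the CPU's thread count, one fewer thread than contexts gets spawned, and Default picks Fast for a single context and Thorough otherwise. All the names here (`SprProjStyle`, `clampContexts`, `workerThreadsFor`, `resolveStyle`) are hypothetical and are not taken from the Eternity source.

```cpp
#include <algorithm>
#include <thread>

// Hypothetical names throughout; this only mirrors the behaviour described
// in the post, not Eternity's actual implementation.
enum class SprProjStyle { Default, Fast, Thorough };

// r_numcontexts ranges from 1 to however many threads the CPU supports.
int clampContexts(int requested)
{
    // hardware_concurrency() may report 0 when the count is unknown,
    // so treat that case as a single hardware thread.
    const unsigned hw = std::thread::hardware_concurrency();
    const int maxContexts = hw == 0 ? 1 : static_cast<int>(hw);
    return std::clamp(requested, 1, maxContexts);
}

// One context always runs on the main thread, so the number of spawned
// threads is one fewer than the number of contexts.
int workerThreadsFor(int contexts)
{
    return contexts - 1;
}

// Default means Fast for 1 context, or Thorough for more than 1.
SprProjStyle resolveStyle(SprProjStyle requested, int contexts)
{
    if (requested != SprProjStyle::Default)
        return requested;
    return contexts == 1 ? SprProjStyle::Fast : SprProjStyle::Thorough;
}
```

With these rules, a setting of 1 spawns no extra threads at all, which matches the note that everything then runs on the main thread as it used to.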
Edward850 Posted April 28, 2023 fp! Props to all logged in clang ThreadSanitizers.
skillsaw Posted April 28, 2023 Wow, awesome. I haven't had much time to play with this but I ran around Heartland MAP05 a bit. For comparison, I noticed that when using a single thread (on an i9-10900k with 32GB RAM), I was getting between 60 and 144 FPS (144 being my monitor's refresh rate), roughly scaling with the complexity of the rendered view, with only about 7% CPU utilization (according to Task Manager). With 20 threads, it's running at 144 FPS more consistently, and utilizing over 90% of the CPU. It still dips a bit into the 130s. Still, pretty crazy. I'll play with this more later. One possible issue with the menu: When using the left arrow key to set renderer threads to the maximum number from single threaded mode, renderer threads is set to 19 (max threads - 1) rather than maximum threads (20).
Altazimuth Posted April 28, 2023 10 hours ago, skillsaw said: One possible issue with the menu: When using the left arrow key to set renderer threads to the maximum number from single threaded mode, renderer threads is set to 19 (max threads - 1) rather than maximum threads (20). Thanks for the report, fixed. Good to know it's running far better in Heartland. That was one of my targets, and some scenes really tested the performance. I won't generate a new build for solely this change just to keep things simpler in the case that there are crash reports to deal with soon. I'm half-tempted to make the contexts a number entry field to reduce the window recreation spam, but then there's no exact way to know the max thread count, and also number entry fields don't work if you're going controller-only which is something I've been trying to improve the experience for recently.
Altazimuth Posted April 29, 2023 There are now macOS builds (both x64 and ARM) courtesy of printz. There are no major differences between them and the builds I made yesterday. The only fix present is for the off-by-one that skillsaw pointed out, and that's extremely minor.
esselfortium Posted April 29, 2023 This is awesome, thank you for your hard work!
Redneckerz Posted April 29, 2023 This definitely is a great upcoming feature for Eternity. Thanks to Altazimuth for even starting this and GooberMan for his pioneering work in multithreading Doom.
Darkcrafter07 Posted April 29, 2023 I noticed the following FPS progression on Heartland: 1 thread - 98 FPS, 2 threads - 110 FPS, 4 threads - 117 FPS, and the more threads it has beyond that, the lower the performance gets. Good job! Especially since I prefer capped framerates in software modes.
Altazimuth Posted April 29, 2023 Thanks for all the kind words. I've uploaded a new build that fixes a crash if you changed the number of contexts after a swirling flat had been rendered (it'd crash as soon as another swirling flat was drawn).
Darkcrafter07 Posted April 30, 2023 That's cool, and such a nice addition to the renderer. I'm wondering if this could be of great help for rendering maps with 3D models, if those were ever implemented.
Edward850 Posted April 30, 2023 I don't believe that was ever a plan, and performance is not the limitation.
Darkcrafter07 Posted April 30, 2023 3D models are hard to fit in Doom but sometimes there is nothing that can replace them, like beautiful trees, which are way too inconvenient to model in a map editor. Tree sprites don't really have that depth effect, and if such an actor were enlarged, it could create nice forest effects. Yeah, I definitely want too much, sorry.
Midway64 Posted May 1, 2023 Well, from what I see in others' comments, this is actually real! I almost thought for a moment it was another hoax just like the other one that ended up in Post-Hell. Gonna give it a try once I can.
Meerschweinmann Posted May 1, 2023 Wow, the multithreaded renderer is a big thing. Thanks for the hard work, Altazimuth. I get a whopping bump from 50 fps to 80 fps when I increase the threads from 1 to 4. 4 seems to be the sweet spot for my notebook. More than 4 threads are possible, but then the cooling system just gets louder and the fps rises only a little bit.
Altazimuth Posted May 1, 2023 Glad to hear of the perf improvement! Sadly as long as the synchronisation time increase per thread is so large it'll end up outweighing the gradually-decreasing gains from increasing the number of render threads. If there's any way that people think I might be able to improve this then I'd owe you a debt of gratitude if you informed me of it. In other news I uploaded another build, though this only really folds in fixes from master. Most of the rendering changes were to tidy things up and make more things const.
Meerschweinmann Posted May 1, 2023 (edited) I have now tested on my good old i7-3770K desktop. The frames in my test scene go from 88 fps with 1 thread to 127 fps with 3 threads. 4 threads give nearly the same fps as 3 threads. With 5 threads the fps goes backwards. But hey, that is a bump of around 45% more frames on this old CPU. My test scene is a map with 5 edge portals on screen, standing on one of them.
Andromeda Posted May 1, 2023 Good stuff! Maybe you could make it so it autodetects which amount of threads is needed for the best performance, although I suppose that would be a tall ask for ultimately little net benefit.
Kyka Posted May 1, 2023 I have nothing to add, except to say well done. Amazing work.
Midway64 Posted May 1, 2023 I couldn't think of a better wad to try with, so I just resorted to the ol' NUTS.WAD. I'm running this on an Intel machine: 11th Gen Intel(R) Core(TM) i3-1115G4 @ 3.00GHz (4 CPUs), ~3.0GHz; 8192MB RAM; DirectX 12; Intel(R) UHD Graphics. I get 20-30 FPS on average when everything is fucking going after me in the middle of the combat. It's a steady framerate for a slaughtermap, so I guess it does make an improvement! (For example, it usually gets a worse framerate in the start sector when playing with Woof!) Not too informative, not too casual.
Meerschweinmann Posted May 1, 2023 (edited) After benchmarking the new multithreaded renderer this morning, I played DOOM for some hours today with the 2023-05-01 build and I am impressed by the performance on my old computer and notebook. I have found no visual glitches so far that are caused by the multithreaded renderer. Good work! Edited May 1, 2023 by Meerschweinmann
Altazimuth Posted May 2, 2023 12 hours ago, Andromeda said: Maybe you could make it so it autodetects which amount of threads is needed for the best performance Not sure how I'd figure out how to do this. It would require some wide-scale rewrites and even with that done I'm not sure I'd be confident in any sort of heuristics I could use to figure out what the best thread count is. 10 hours ago, ValveMercenary said: I couldn't think of a better wad to try with, so I just resorted to the ol' NUTS.WAD I think NUTS is largely slow due to just how many monsters are thinking, rather than being rendered. It's definitely a combo but IIRC the thinking takes up way more time, meaning that faster rendering isn't gonna help toooo much.
Midway64 Posted May 2, 2023 9 hours ago, Altazimuth said: I think NUTS is largely slow due to just how many monsters are thinking, rather than being rendered. It's definitely a combo but IIRC the thinking takes up way more time, meaning that faster rendering isn't gonna help toooo much. Well, my bad. I couldn't quite think of something to try it with, and I forgot about the fact that monster thinking is separate from rendering. Is there some WAD that pushes the limits of rendering which you could recommend, so I could try it with this? Thanks.
dpJudas Posted May 2, 2023 On 5/1/2023 at 11:05 AM, Altazimuth said: Glad to hear of the perf improvement! Sadly as long as the synchronisation time increase per thread is so large it'll end up outweighing the gradually-decreasing gains from increasing the number of render threads. What are you synchronizing? Resource (texture) access? In GZDoom's software renderer I first check the pointer for a resource without locking. If the pointer is not null then it is already loaded and I can safely use it. If the pointer is null then I perform a mutex lock since only one thread can safely load it. See GetSoftwareTexture for an example of this strategy. Basically this removes virtually all synchronization in the gzd multithreaded renderer, except for when the threads wait for a new frame to begin or the first time something needs to load.
Altazimuth Posted May 2, 2023 4 hours ago, dpJudas said: What are you synchronizing? Just the start/end of threaded rendering. I might be using terminology incorrectly here, to be fair. Any synchronising within the render threads should be at a minimum. All texture caching and such were moved outside rendering. Allocations are all for thread-local heaps, last I checked. There is a mutex for the global zone heap, but basically nothing uses it from within the render threads.
Edward850 Posted May 2, 2023 I believe you are referring to thread joining. Waiting for each thread to terminate has the typical overheads you describe.
Altazimuth Posted May 2, 2023 Yeah, honestly, for the guy who ironed out all these multithreading kinks I really don't know what I'm doing to a large degree. Initial set-up was based on Rum & Raisin code and then the vast majority of it was coming up with novel solutions to issues reported to me by ThreadSanitizer. It should be borne in mind there's no thread joining here, just setting an atomic bool to true and releasing a semaphore (at which point the threads will spin). The whole communication between the threads happens on the render end here, and on the main thread's end here.
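For readers unfamiliar with join-free frame synchronisation, here is a rough C++ sketch of the general idea of a long-lived worker that is woken through a semaphore and reports completion through an atomic bool. This is a guess at the shape of such a scheme and not Eternity's code: `Semaphore`, `RenderWorker`, `renderFrame`, and the decision that the main thread is the one spinning on the flag are all assumptions. A small condition-variable semaphore stands in for whatever semaphore the engine really uses.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

// Minimal counting semaphore (stand-in; C++20 has std::counting_semaphore).
class Semaphore
{
public:
    void release()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++count_;
        }
        cv_.notify_one();
    }

    void acquire()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int count_ = 0;
};

// Hypothetical long-lived render worker: created once, never joined per frame.
struct RenderWorker
{
    Semaphore start;                // released by the main thread each frame
    std::atomic<bool> done{false};  // set by the worker when its frame is done
    std::atomic<bool> quit{false};  // set once at shutdown
    int framesRendered = 0;         // stand-in for real rendering work
    std::thread thread;             // declared last so the rest is ready first

    RenderWorker() : thread([this] { run(); }) {}

    ~RenderWorker()
    {
        quit.store(true, std::memory_order_release);
        start.release();            // wake the worker so it can see quit
        thread.join();              // the only join happens at shutdown
    }

    void run()
    {
        for (;;)
        {
            start.acquire();        // sleep until a frame is requested
            if (quit.load(std::memory_order_acquire))
                return;
            ++framesRendered;       // render this worker's context here
            done.store(true, std::memory_order_release);
        }
    }
};

// Main-thread side: kick the worker, do our own share, then spin on the flag.
void renderFrame(RenderWorker &worker)
{
    worker.done.store(false, std::memory_order_relaxed);
    worker.start.release();
    // ... the main thread would render its own context here ...
    while (!worker.done.load(std::memory_order_acquire))
        std::this_thread::yield();
}
```

The per-frame cost in a design like this is the wake-up latency plus the spin at the end, and both have to be paid once per worker, which is consistent with the earlier observation that synchronisation time grows with the thread count.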
Altazimuth Posted May 3, 2023 I've decided I'm pretty happy with how things are, even without load balancing. I plan on merging this into master in 1.5 days, unless anybody has any major reports.
dpJudas Posted May 3, 2023 (edited) Okay, if it is the frame start/end part of the threads then there's not much more you can do to improve that. The only thing really would be to further distribute the work between the threads (some finish earlier than others due to a simpler BSP subtree), but that is really difficult with this method. I did some testing with GZD some years back where I partially implemented the other thing from R&R: draw the walls as spans. That gave a massive speed improvement even for simple maps just because it reduces the pressure on the caches so much, especially at higher resolutions. I never finished it for GZD due to too many drawers to port for me to bother, but I can highly recommend implementing that optimization too if you're up to it. :)
Altazimuth Posted May 4, 2023 7 hours ago, dpJudas said: I never finished it for GZD due to too many drawers to port for me to bother, but I can highly recommend implementing that optimization too if you're up to it. :) I gave a stab at an incredibly simplistic attempt but couldn't quite figure out how to resolve rendering issues easily enough to bother. It seemed like some sort of persistent data was causing sprites to not render in the zones between the render context where the load balancing was happening. I'll probably pester GooberMan when he's freer.
Altazimuth Posted May 4, 2023 @ceski Found an issue with r_sprprojstyle which I have since fixed. I'm not going to upload a new build since I plan on merging into master tomorrow.