Everything posted by GooberMan

  1. GooberMan

    [Wolfenstein: Blade of Agony] Achievement Ideas? (p13)

    So the response to content that's deemed anti-Semitic and transphobic is to, uh, go Soviet? That's a weird flex.
  2. GooberMan

    Rum and Raisin Doom - haha software render go BRRRRR

    https://github.com/GooberMan/rum-and-raisin-doom

    So I heard you like performance.

    Currently tested on the following operating systems:

      • Windows 10
      • Linux Mint 19 Cinnamon
      • Ubuntu 20 MATE
      • Raspbian/Raspberry Pi OS
      • OSX (thanks to Linguica)

    And with the following development hardware:

      • i7-6700HQ, 16GiB DDR4 RAM (main dev machine, laptop)
      • i7-3930K, 16GiB DDR3 RAM
      • Raspberry Pi 4B, Cortex-A72 (ARM v8), 8GiB RAM

    This is source-only for now, with the exact same build steps as Chocolate Doom. I have no time to provide support to anyone (so don't ask), and the code certainly isn't release quality yet. Until that changes, you can consider this project to be academic in nature, with real and testable results.

    What is Rum & Raisin Doom?

    R&R Doom is a vanilla-compatible source port, using Chocolate Doom as a base. It focuses exclusively on optimising the software renderer for modern hardware. And since it's pointless to attempt these optimisations without high-resolution support, all work is currently being tested with a 2560x1600 backbuffer. I'm otherwise taking a preservationist approach, sticking as close to the original software renderer as I can (I even scale the HUD/end text flats/interpics etc. correctly, and the screen melt is 100% proportional to the original code). But I will be fixing bugs like wobbly walls and potato-quality flats at the near plane.

    This entire exercise is just a way for me to relax. I'm working at Housemarque these days on Returnal, and since I haven't been able to switch off from programming, I decided to sit down and finally do this. I've preferred to spread knowledge over the years rather than write code, especially since contracts with previous employers have restricted what I own outside of the office. So now that I can write code, this will also help out those people who look at my resume (AAA engine programmer, specialising in low-level optimisations and multithreading) and demand code anyway whenever I speak.

    Progress is constantly being pushed to GitHub... which means it can occasionally be in a broken state, like that time everything went all acid-trip melty. Honestly, breaking the Doom renderer is fun. I've got many screenshots and videos saved locally. I'm also writing articles explaining why things are done a certain way and what benefits they give, whenever I have results. The wiki on the GitHub repository will be updated whenever I feel like it.

    What hardware shouldn't you run this on?

    Yeah, don't expect this to make Doom run faster on VGA hardware. Most of what I'm doing will outright run worse on it. This is explicitly targeting modern systems.

    What's already being done better?

      • Transposed backbuffer. It's certainly not a new idea, but it's a very obvious first step. It also immediately gives wins... and makes everything else planned possible.
      • Pre-lit textures and flats. Reads are bad. Indirect reads are worse. We end up using 33 times as much memory, but whatever - it shaves some time off and reduces code complexity, again making some things possible that previously weren't.

    What's currently being worked on?

    SIMD wall rendering. Yep. If you think things through, there are some very solid optimisations you can make with SIMD. Short story though: you can't have just one column render function and call it optimised. For any given 16-byte output block, you will need to do anywhere from 0 to 16 unique texture data reads. You know what's great about reading 0-15 texels from the source texture? They're in the same 16-byte location in your texture. So to account for wrapping, you only need to do two SIMD reads for any given output block at most. Shuffle the bytes the right way, write. Boom. (There's a rough sketch of this idea at the end of this post.)

    Currently, I've got the 0-1 texel read function up and running (with one very annoying visual bug that I'm so very close to solving). Want some performance analysis? Here's a scene where a good chunk of the screen calls that particular column function, versus one that doesn't. So 5-10% off the frame on average already (and remember, this is already after my transposed backbuffer optimisations - a scene like this will look much, much better compared to the stock renderer). There's plenty of room to improve on this, actually; the target for just this one function is at least another half a millisecond off the time taken. My algorithm for inflating a value from 0-15 into a 128-bit mask is rubbish, as it turns out. I'll have to go find some binary gods out there that used to do this stuff with their eyes closed back in the day. And I'm not even finished yet - this is just my first attempt at getting an algorithm running, and I already know theoretically better ways to do it. The next step, however, will be to fill out the rest of those functions so that I have a complete working implementation that I can improve on. But I might take a break from SIMD first to do something else. I've done plenty of SIMD over the years but barely touched integer SIMD, so I need a little break before finishing it.

    What's planned?

      • Multithreaded rendering. Yep. I have a solid plan of attack here. More information when I try it and get some results.
      • SIMD flat rendering. Interestingly, transposing the backbuffer resulted in roughly the same performance for flats as the non-transposed backbuffer. I will rewrite the flat renderer to render by column first, and once that's up and running, make it SIMD.

    Thanks are in order for the people that have been commenting/testing/etc. (in alphabetical order): AlexMax, Altazimuth, Edward850, fraggle, Linguica, Quasar, sponge. If I accidentally forgot you, you know how to harass me directly and make me update this post.

    DISCLAIMER: Do you suffer from the following symptoms?

      • You think software renderers are pointless
      • You think it will have a limited audience
      • You don't think anything I'm doing is technically possible or worthwhile

    Then by all means, direct your concerns to the correct part of the internet. This is just a way for me to relax. I really don't care if anyone's honor gets offended/it invalidates lies pushed around for years/etc. This work is all open and explained, so it can only benefit anyone interested in software rendering. Take your negativity elsewhere.
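
    And since I promised a sketch: here's roughly what the two-loads-and-a-shuffle idea looks like in SSSE3 intrinsics. Treat this as an illustration of the technique, not the port's actual code - it assumes a transposed backbuffer (so 16 output pixels of a column are one contiguous 16-byte write), pre-lit textures (so a texel is the final output byte), a step of at most one texel per pixel, and it ignores wrapping at the end of the texture column for clarity.

      #include <tmmintrin.h> // SSSE3: _mm_shuffle_epi8
      #include <cstdint>

      // Render 16 output pixels of one wall column with at most two
      // 16-byte reads of the source texture column.
      void DrawColumnBlock16( uint8_t* dest, const uint8_t* sourcecolumn,
                              uint32_t frac, uint32_t step ) // 16.16 fixed point
      {
          const uint32_t base = ( frac >> 16 ) & ~15u;

          // Two reads cover every texel this block can possibly touch.
          const __m128i lo = _mm_loadu_si128( (const __m128i*)( sourcecolumn + base ) );
          const __m128i hi = _mm_loadu_si128( (const __m128i*)( sourcecolumn + base + 16 ) );

          // Byte index into the 32-byte window for each output pixel (0-30).
          alignas( 16 ) uint8_t index[ 16 ];
          for( int curr = 0; curr < 16; ++curr )
          {
              index[ curr ] = (uint8_t)( ( frac >> 16 ) - base );
              frac += step;
          }
          const __m128i indices = _mm_load_si128( (const __m128i*)index );

          // pshufb zeroes any lane whose index has its high bit set, so
          // select from each half separately and OR the results together.
          const __m128i fromlo = _mm_shuffle_epi8( lo,
              _mm_or_si128( indices, _mm_cmpgt_epi8( indices, _mm_set1_epi8( 15 ) ) ) );
          const __m128i fromhi = _mm_shuffle_epi8( hi,
              _mm_sub_epi8( indices, _mm_set1_epi8( 16 ) ) );

          _mm_storeu_si128( (__m128i*)dest, _mm_or_si128( fromlo, fromhi ) );
      }

    The scalar loop building that index vector is the bit that actually hurts - that's the "inflating a value into a 128-bit mask" problem mentioned above, and solving it properly is part of where the remaining half millisecond will come from.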
  3. GooberMan

    Rum and Raisin Doom - haha software render go BRRRRR

    Update on this: There is no update. Why? So I stepped up to Technical Director a few months ago at Housemarque. My focus right now is on shipping the game. I don't expect to get back to R&R Doom until late May at the earliest.
  4. GooberMan

    Rum and Raisin Doom - haha software render go BRRRRR

    Well, it was inevitable. But real life has gotten in the way of progress over the last couple of weeks. Next thing to come will be an article on the threading. And before I go any further with features, I think I'll need to re-architect a few things so that I don't lose some vanilla compatibility I want to keep.
  5. GooberMan

    dsda-doom source port [v0.19.7]

    I started programming a "suicide if pacifist" feature into Rum and Raisin. And also Tyson functionality. My approach is more player-focused than demo-authentication though, so :+1:.
  6. GooberMan

    Rum and Raisin Doom - haha software render go BRRRRR

    https://github.com/GooberMan/rum-and-raisin-doom/wiki/Rendering-Visplanes-by-Column New article. It attempts to give a ground-up understanding of all the concepts involved in rendering visplanes by column.
  7. GooberMan

    Rum and Raisin Doom - haha software render go BRRRRR

    Just leaving some 21:9 shots. And also: I'll use this scene as a guidepost to see if my visplane lookup code is working as intended. At 21:9 and 4 render contexts, the middle 2 overlap the normal HUD. And the other two hit blocking walls fairly quickly, hence their disproportionately small times. Fixing the render load balancing code to not be a colossal fustercluck will also help spread the cost, but as I always say in a professional environment: Threading is not a silver bullet, your slow code is still slow on another thread.
  8. GooberMan

    Rum and Raisin Doom - haha software render go BRRRRR

    Oh, nah, sorry mate. Context, I guess. Short story is that I am bipolar, so sooner or later I'll just flat out lose interest in this. The fact that people like Altazimuth are waiting on my results, and that it will subsequently benefit a large part of the community, means that my focus will stay for far longer than it did for, say, BSP2DOOM - which I couldn't get enough people proper interested in. The code still exists, and the theory is still about the same even after digging deep into the renderer, but there's no real incentive to continue the work. And also my Demon Workers Unite! mapset, which I had grand plans of being an industry-wide relaxation effort, but no one (outside of Kaiser, for an intro map) seemed interested in.

    A bunch of other things, including professional concerns, would need to go a certain way before I commit to a full-blown source port effort with users and all that. I do know that my kind of attitude and experience would be a good thing in general for source ports. See the bottom of my first post, and the nochicken reference a few posts above? This work is also, in part, my response to that kind of insanity. I can describe theory until the cows come home, but now, for the first time, thanks to very favorable professional conditions, I can also supply code - which is a very effective STFU mechanism. But actually committing? I've got a PlayStation 5 game that needs my attention, and when I get to the point where relaxing doesn't mean "thinking about code 24/7", any dedicated source port efforts I make will take the hit.
  9. GooberMan

    Rum and Raisin Doom - haha software render go BRRRRR

    Being a real source port means I'd isolate the playsims into their own dynamic library. Vanilla. Boom. Even ZDoom, if you wanted, would go right into a switchable-at-runtime library. All dependent on render support, of course. Whatever - playsim features can be their own thing. Even up to MBF level, you don't need to do a whole lot different at the rendering level to support the advanced features. Again, I really want profiles from Eviternity, since that's a gold-standard mapset and a first stop for many people new to the Doom modding community. It's not an unusual idea - pretty sure Ling wants to do the same thing with his dream port.
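
    To sketch what I mean (and this is strictly a napkin drawing - nothing like it exists in Rum and Raisin, and every name here is invented for illustration), the host would resolve one entry point per playsim module and drive the sim through a plain C-style interface:

      #include <cstdint>

      extern "C" {

      // Hypothetical interface a playsim module (vanilla, Boom, MBF, ...)
      // would export. The host resolves Playsim_GetAPI with dlopen/dlsym
      // (LoadLibrary/GetProcAddress on Windows) and can swap modules at
      // runtime.
      struct playsim_api_t
      {
          uint32_t    version;  // host/module handshake
          const char* name;     // "vanilla", "boom", "mbf", ...

          void (*Init)( void );
          void (*LoadLevel)( int episode, int map, int skill );
          void (*Tick)( const void* playercmds, int numplayers );
          void (*Shutdown)( void );
      };

      const playsim_api_t* Playsim_GetAPI( void );

      } // extern "C"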
  10. GooberMan

    Rum and Raisin Doom - haha software render go BRRRRR

    Well, this is the thing. If it's a real source port, I have to provide support to users. And if it's a real source port, it's harder for any other source port maintainers to see the ideas I'm employing and pull them into their own ports. This is ultimately the point of the articles I write: I want everyone to understand what I'm doing.

    Seriously, redefining visplanes is a big thing. This is something that hasn't changed since 1993, and this work has allowed a new understanding of what they actually are. If I can put that in a format that anyone can understand from the ground up and employ in whatever source port they're developing, then that's far more valuable than a reference implementation, in my opinion - for example, Linux Doom is a reference implementation, and yet everyone's still using spans for software rendering. And it's already giving results: just from my articles and questions on Discord, Altazimuth has implemented the backbuffer transpose in Eternity. I really want to see some results with Eviternity, to get an idea of what else is deficient.

    This is a resource for all Doomers. Being a real source port would detract from that.
  11. GooberMan

    Rum and Raisin Doom - haha software render go BRRRRR

    I honestly need to tread lightly around jailbroken/modded devices. I mean, back in the day we used modded Xboxes as devkits, and I did work on our PSP engine with my own hacked PSP. But it's a different industry these days. Doing it on a Raspberry Pi is the closest I'll get without professional concerns coming in. Back-and-forth development on someone else's Switch, with them reporting results, is not something I want to do either. It's already a ballache trying to get ImGui playing nice with GL3 on Ling's mac (I've given up for now and just #if 0'd the offending code out). Subsequently, I'm hunting for a cheap Mac that I will literally only be using to compile and test on - it can be an i3 for all I care, it just needs to compile and run code.
  12. GooberMan

    Rum and Raisin Doom - haha software render go BRRRRR

    I keep saying that I don't want this to become a real source port - that I'm happy to have this as an academic project and provide articles and explanations for what I'm doing, so that anyone else interested can grok it and try it out themselves. But it is dangerously close to becoming a real source port after all.

    Implementing widescreen support, and I'm all "It's 2020, I expect to drag my window out to 42:9 and have it work". The reality of the Doom renderer is that the trig calculations start breaking down past a horizontal FOV of 165 degrees, so I probably can't get it to go that far without rewriting the projection functions entirely. First time I tried Doom in 21:9 though, and ooh yeah. Lean in close to my monitor (27") so that it fills my peripheral vision. It's gooooooooooood.

    But widescreen though, there's a thing I wanted to profile here. 3 cores, pixel density just a tiny bit more than a proper 1080p render buffer. Three threads. Raspberry Pi 4. Clean out the spikes, and this is basically "if I had a Switch I could make Doom render at 60FPS on it" territory right here (ignore that bit about being a debug build, that's just me getting the defines wrong on not-Windows). I've shipped games on 11 platforms - 12 when Returnal releases - and worked on several other platforms besides. So yeah, making things work well across multiple platforms is one of my things, and being an engine programmer, gaming hardware especially interests me.

    Maybe I'll finally do something about visplane merging when I'm done fixing all the widescreen bugs. Although "use a hash map" for what's there, like every other port, is probably the best I'll do, short of actually doing what I said and dealing exclusively in rasterlines instead of visplanes and walls.
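
    For the curious, the napkin maths on why the projection gives out: Doom's focal length works out as centerx / tan(fov/2), and as the FOV heads towards 180 the tangent explodes and the focal length collapses towards zero - at which point the fixed-point projection has nothing left to work with. Plain floats below purely for illustration; the engine does all of this in 16.16 fixed point.

      #include <cmath>
      #include <cstdio>
      #include <initializer_list>

      int main()
      {
          const double pi      = 3.14159265358979323846;
          const double centerx = 2560.0 / 2.0; // half the render width

          for( double fov : { 90.0, 120.0, 150.0, 165.0, 175.0 } )
          {
              const double halfangle = fov * 0.5 * pi / 180.0;
              std::printf( "fov %5.1f: tan(fov/2) = %7.3f, focallength = %8.2f\n",
                           fov, std::tan( halfangle ), centerx / std::tan( halfangle ) );
          }
          return 0;
      }

    At 90 degrees the focal length is a comfortable 1280 pixels; by 165 it's down around 168, with the same fixed-point precision stretched across a far wider range of angles.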
  13. GooberMan

    Rum and Raisin Doom - haha software render go BRRRRR

    I think you'll find literally every post I've made in this thread emphasises that you did not need to say this.
  14. GooberMan

    Rum and Raisin Doom - haha software render go BRRRRR

    Yeah mate, the full context is "ditching the original function at high resolutions is just plain necessary" - which it sounds like basically every high-res source port either does, or fixes the problem another way. All good. This reminds me: it was glitchy with the original code, and I haven't checked with the new code.

    Chocolate Doom, so entirely untainted by my code: [screenshot]
    Original code, 32-bit sample indexing at 2560x1600: [screenshot]
    And my new renderer: [screenshot]

    I can spot a bad pixel, but that line down the centre of the Icon's visage is otherwise eliminated. The pixel artefact also appears more at lower Log2 samples. So clearly there's more work to be done.
  15. GooberMan

    Rum and Raisin Doom - haha software render go BRRRRR

    Can the community at large say offhand what exactly the precision issue is? Or do you really mean source port authors at large? The average person around here doesn't know how or why. The Doom wiki won't tell you. Even reading the Doom black book won't tell you. A video like that, I'll use later when documenting everything to illustrate the example. Note that I describe visually what transposed means in the first article I list on the GitHub wiki - something that anyone who's ever looked at how Doom texture data is stored knows, but that most people don't need to know how or why it's like that. But I'll give a ground-up understanding for whoever wants to know.

    (Short story, since it's now a Thing: that original sampler recorded in that video actually isn't purely the original. I modified the span function to use the full 32-bit values to sample the texture. The issue comes from a precalculated scale value, based on the centre column, that is used to adjust the X and Y integration values for span rendering. These values become more and more inaccurate the further away from the centre of the screen you get. And it's entirely avoided with my function going vertically along the screen and self-correcting after N pixels, depending on the backbuffer resolution.)

    Needless to say, this next bit should be addressed as a separate thing: no, and I have no intention to. Again, this is a way for me to relax. More pragmatically, as a wider community, this is also a way for anyone that's interested to see what someone with my knowledge and experience would do when given a blank slate. Using ZDoom's headers as an example, the flat plane still wants to draw by span instead of by visplane column. I would gain nothing from that, since transposing the backbuffer means that's not a good idea. So I had to look at exactly what a visplane is, going in knowing that it stores top and bottom pixel values for screen columns. Which led to the realisation that these are exactly rasterlines for a texture mapper, and that the span translation is really quite unnecessary. Which leads to a further realisation: every piece of render data in Rum and Raisin Doom is now actually a rasterline for a texture mapper. I'm seriously considering the benefit of deleting visplanes, deleting the column renderer, and generating and sorting raster fragments in a list for absolutely everything. Sprites, walls, whatever. They're all exactly rasterlines now. At which point, I'll basically have accidentally converted vanilla Doom into a full 3D renderer - and be more efficient while I'm at it. And keep slime trails too.

    Where exactly am I going to get those kinds of realisations by reading another port's code? Maybe Vavoom? Except that started life as porting the Quake renderer to Doom, and I've already separately dug into Quake's renderer. With visplanes rendering faster, I'm approaching the limits of what I can do at a raw pixel level. Once SIMD is up and running properly, it's all algorithmic from here. And I'm going to get more speed wins by similarly abandoning conventional wisdom.
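
    To put that realisation in concrete terms, this is roughly the shape of the common currency everything reduces to (field names mine, for illustration - not my actual structs):

      #include <cstdint>

      // One vertical strip of screen, plus whatever the perspective-correct
      // texture mapper needs to recover u/v along it. With a transposed
      // backbuffer, the strip is contiguous memory.
      struct rasterline_t
      {
          int16_t x;      // screen column
          int16_t top;    // first row to draw
          int16_t bottom; // last row to draw
          // ...source texture/flat, light level, and the plane or wall
          // parameters needed to derive texture coordinates at 'top'...
      };

      // A visplane decomposes into exactly one of these per column it spans:
      //
      //   for( int x = plane->minx; x <= plane->maxx; ++x )
      //       EmitRaster( { (int16_t)x, plane->top[ x ], plane->bottom[ x ] } );
      //
      // Do the same for walls and sprites, sort the fragments, and you have
      // the "everything is a rasterline" renderer described above.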
  16. GooberMan

    Rum and Raisin Doom - haha software render go BRRRRR

    So here's a bit of fun. You can actually very easily break my new code by using it at low resolutions. So I auto-select the right function depending on resolution. But you can just plain override it anyway for strange visuals. Also, ditching the original function at high resolutions is just plain necessary. This version of the original function even upgrades the sampling coordinates to the full 32 bits, and it's still plain awful. But unnoticeable at the original 320x200.
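
    The auto-selection itself is nothing fancy - something along these lines, with illustrative names and a placeholder threshold rather than a measured one:

      #include <cstdint>

      struct spancontext_t; // per-span draw state, defined elsewhere

      using spanfunc_t = void (*)( spancontext_t* );

      void R_DrawSpan_Original( spancontext_t* ); // vanilla-style sampling
      void R_DrawSpan_HighRes( spancontext_t* );  // the new self-correcting path

      // The new code breaks down at low resolutions, the original breaks
      // down at high ones, so pick per resolution.
      spanfunc_t R_SelectSpanFunc( int renderheight )
      {
          return renderheight > 400 ? R_DrawSpan_HighRes : R_DrawSpan_Original;
      }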
  17. GooberMan

    Rum and Raisin Doom - haha software render go BRRRRR

    Actually, the above got me thinking. I'm testing on a 2560x1600 target buffer. What happens if I decrease the accuracy a bit more? So there are very definitely clear gains from going with less accuracy on a target of that size. Which then leads to the question: how inaccurate is the end result? A subtractive blend in GIMP tells quite a bit. There are inaccuracies alright - off in the distance, where the relative distances of each pixel change quicker, and especially with already-noisy textures - but will you notice them? Eeeeeehhhhhhhhhhhhh, probably not. So I'll go ahead and choose constants for the texture sampler based on view height, which means writing multiple copies of that function. Which means I _really_ would like the code to be in a language like C/C++/D, where I can constexpr branch with template parameters and not have to deal with multiple copies of the same code everywhere/messy defines/etc.

    EDIT: Updating the first post with this latest little thing. That's stats compared to a straight-uprezzed vanilla renderer. Doing miles better at the moment.
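
    As a taste of what I mean - C++ purely for illustration, since the codebase itself is C - one templated sampler covers what would otherwise be several hand-maintained copies, with the accuracy constant baked in per instantiation:

      #include <cstdint>

      // PixelLeap is how many pixels go by between exact coordinate
      // refreshes; it folds out at compile time, so each instantiation
      // is as tight as a hand-written copy.
      template < int PixelLeap >
      void SampleRaster64( uint8_t* dest, const uint8_t* flat, int count,
                           uint32_t u, uint32_t v, uint32_t ustep, uint32_t vstep,
                           void (*exactuv)( int pixel, uint32_t* u, uint32_t* v ) )
      {
          for( int curr = 0; curr < count; ++curr )
          {
              if constexpr( PixelLeap > 1 )
              {
                  if( ( curr & ( PixelLeap - 1 ) ) == 0 )
                  {
                      exactuv( curr, &u, &v ); // exact every PixelLeap pixels
                  }
              }
              else
              {
                  exactuv( curr, &u, &v ); // fully exact path
              }

              // 16.16 fixed-point sample from a 64x64 flat
              dest[ curr ] = flat[ ( ( ( v >> 16 ) & 63 ) << 6 ) + ( ( u >> 16 ) & 63 ) ];
              u += ustep;
              v += vstep;
          }
      }

      // Chosen once per view size, e.g.:
      //   spanfunc = viewheight >= 1200 ? SampleRaster64< 16 > : SampleRaster64< 4 >;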
  18. GooberMan

    Rum and Raisin Doom - haha software render go BRRRRR

    Yep, that's essentially what it's doing. The data has basically always been clipped rasterlines for a perspective-correct texture mapper, but Carmack either missed this (unlikely, I guess) or did tests, realised the multiplications involved were annoyingly slow, and went with the method of estimating horizontal interpolations. Either way, everyone has been rolling with it ever since.

    As I started writing this post, though, I wasn't sure of the benefit of this particular implementation for a 320x200 output. Short story: the method used is accurate every 16 pixels in a column, from the start of the raster line to the end. You can change the PLANE_PIXELLEAP and PLANE_PIXELLEAP_LOG2 constants to 8/3 or 4/2 or 2/1 (removing the unrolled loop elements to match) to make it more accurate and subsequently slower. Then I realised: it's easy to change some constants myself and recompile. Getting comparable visual results at that resolution is best with 4/2. But. Here's the results with 16/4. Yep. We run slower (4/2 is even slower still), and I'd say the image quality isn't anywhere near as accurate. This is, of course, the complete opposite of high-res rendering, where it's faster and we fix inaccuracies introduced by the horizontal line scaling of the original code.

    This doesn't surprise me at all, really. Most of the work I'm doing only starts seeing tangible benefits at higher resolutions than the original renderer - which is entirely the point of this work. I mean, the scene renders in 0.26 milliseconds here on a single thread. The problem with Doom on modern systems is that people want it to look good and run well, which is pretty much the opposite philosophy of Fast Doom, where you just want it to run well on contemporaneous hardware. I am still optimising this new code though, and stripping out parts of the old code that no longer make sense. I could very well get it faster at low resolutions, but it's really not my focus. Sorry this one won't work out for you.
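
    In case the prose is too abstract, the scheme reads something like this - floats and a callback standing in for the exact perspective-correct calculation, where the real code is fixed point with the inner loop unrolled:

      #include <algorithm>
      #include <cstdint>

      constexpr int PLANE_PIXELLEAP      = 16; // try 8, 4 or 2 for more accuracy
      constexpr int PLANE_PIXELLEAP_LOG2 = 4;  // matching log2, for shift-divides

      // Exact u/v only at block boundaries; linear interpolation between
      // them trades divides for accuracy.
      void DrawRasterSegment( uint8_t* dest, const uint8_t* flat64x64, int count,
                              void (*exactuv)( int pixel, float* u, float* v ) )
      {
          float u, v;
          exactuv( 0, &u, &v );

          for( int base = 0; base < count; base += PLANE_PIXELLEAP )
          {
              const int block = std::min( PLANE_PIXELLEAP, count - base );

              float unext, vnext;
              exactuv( base + block, &unext, &vnext ); // exact at block end

              const float ustep = ( unext - u ) / block;
              const float vstep = ( vnext - v ) / block;

              for( int curr = 0; curr < block; ++curr )
              {
                  const int tx = (int)u & 63;
                  const int ty = (int)v & 63;
                  dest[ base + curr ] = flat64x64[ ( ty << 6 ) + tx ];
                  u += ustep;
                  v += vstep;
              }

              u = unext;
              v = vnext;
          }
      }

    Shrink PLANE_PIXELLEAP and the exact endpoints move closer together, so the interpolation error shrinks while the divide count grows - exactly the slower-but-more-accurate trade described above.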
  19. GooberMan

    Rum and Raisin Doom - haha software render go BRRRRR

    Ah, finally. I figured someone had to have done it somewhere - it's way too obvious an idea for it to be nochickened into oblivion for anyone interested in software rendering. I'll definitely be writing an article about visplanes and flat rendering. The above was my second attempt at making them render by column, but I also did this work in the space of a few hours (six total according to Discord logs, including an attempt at brute-force optimising the existing code to only render visplane columns), so I'm still slotting all the pieces together in my mind from everything I've done and everything the codebase is doing. But I'll leave this little hint: a visplane generates exactly the raster lines for a perspective-correct texture mapper.
  20. GooberMan

    Rum and Raisin Doom - haha software render go BRRRRR

    Guess who just halved the flat rendering time in single-threaded mode. Tell me again why no one tried transposing the render buffer before? EDIT: And interesting stats too. My Linux box is showing a performance degradation, with the ARM showing no change. Hurmmmmmm. I do have an unrolled loop in there, lessee what happens after I play with everything some more... EDIT 2: Got some performance back on ARM by cleaning up. REALLY need to sort visplanes by distance. Over to Linux x64 now then. EDIT 3: Jeebus, this box ain't happy. Might be time to update the build scripts to always use Clang.
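
    For anyone who hasn't internalised what the transpose buys: the address arithmetic below is the entire trick (names mine). Doom's renderers overwhelmingly walk columns, and in a transposed buffer a column walk is a linear walk.

      #include <cstdint>

      // Conventional layout: pixel (x, y) lives at buffer[y * width + x],
      // so stepping down a column strides by the whole pitch every pixel.
      inline uint8_t* ColumnPixelRowMajor( uint8_t* buffer, int width, int x, int y )
      {
          return &buffer[ y * width + x ];
      }

      // Transposed layout: pixel (x, y) lives at buffer[x * height + y],
      // so stepping down a column is a one-byte step.
      inline uint8_t* ColumnPixelTransposed( uint8_t* buffer, int height, int x, int y )
      {
          return &buffer[ x * height + y ];
      }

    Every cache line now serves a run of consecutive column pixels instead of one pixel per line, which is broadly where that flat-time halving comes from.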
  21. GooberMan

    Rum and Raisin Doom - haha software render go BRRRRR

    Alright, so I got a "good enough for now" vanilla-style high-res fuzz. Slightly worse performance with optimisations turned on. Because I am making a memcpy of a chunk of the buffer. Still some tuning to do then, and one branch I still want to get rid of. You might think the switchover is severe - until you compare it to vanilla/Chocolate Doom.
  22. GooberMan

    Rum and Raisin Doom - haha software render go BRRRRR

    This did remind me that I have some code in there from way back when I first started doing this (a whole month or two ago, I know, right?). Wrapped in the ADJUSTED_FUZZ define. It never looked right, because I was still reading from the next column instead of the next row. So I turned it back on. Basically, it doesn't increment the fuzz offset until the next sprite column is reached. Hmm. Okay. Let's see what we can do about that.

    How about we finally increase the fuzz table like I was saying, and not adjust the sample position until the next sprite pixel is reached. Not quite there yet. And I managed to make it worse - blergh, this is trash. Hmm. Okay.

    So we were discussing this on Discord the other day, and I was saying that the errors you get at high resolution when trying to adjust it for low-resolution sampling are because you're reading from the same buffer that you're writing to. This does mean that fuzz is implicitly tied to the results of the last fuzz pixel. But to remove that error, you need to sample from a different buffer to the one you're writing to. So how about we just copy the backbuffer out to a temporary array and sample from that. oooooooh. It's pretty subtle on this shot. Nowhere near dark enough. So let me Youtube that for you.

    Ya know what? I think we've had it wrong all this time. It's not a fuzz effect. It's a heatwave effect. We can replicate the accumulated darkening by randomly choosing a darker or lighter colormap to sample from, but I'm pretty sure with a little bit more thought and effort this will actually look really fuckin' nice.
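
    In code terms, the snapshot idea is about this simple - a sketch under my usual caveats (transposed buffer so a screen column is contiguous, illustrative names, and you'd obviously use a per-context scratch buffer rather than allocating per column):

      #include <cstdint>
      #include <vector>

      // Fuzz one screen column, sampling from a copy so that no fuzz pixel
      // ever reads the output of another fuzz pixel. colormap6 stands in
      // for the darkening colormap lookup vanilla fuzz uses.
      void FuzzColumnFromSnapshot( uint8_t* column, int height,
                                   const uint8_t* colormap6,
                                   const int8_t* fuzzoffsets, int fuzztablesize,
                                   int& fuzzpos )
      {
          const std::vector< uint8_t > snapshot( column, column + height );

          for( int y = 0; y < height; ++y )
          {
              int sample = y + fuzzoffsets[ fuzzpos ]; // one row up or down
              sample = sample < 0 ? 0 : ( sample >= height ? height - 1 : sample );

              column[ y ] = colormap6[ snapshot[ sample ] ];

              fuzzpos = ( fuzzpos + 1 ) % fuzztablesize;
          }
      }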
  23. GooberMan

    Rum and Raisin Doom - haha software render go BRRRRR

    Well, that's one thing I noticed when the atomics were going full-on across every thread. The fuzz can look good at high resolutions without touching the actual algorithm - the fuzz table just needs to be bigger, with a better random distribution. Pay attention to the fuzz down the middle of the screen (and ignore the missing fuzz line - bad code). Notice how it just completely skips that weird line thing and actually looks like noise. This is likely the original intention of the programmer: fine at potato resolutions, but inadequate when scaled up. But my question to Romero is more about the original intention of the designer.
  24. GooberMan

    Rum and Raisin Doom - haha software render go BRRRRR

    Making myself some convenience features. No Skill and Pistol Starts are things I want to test with in continuous play. I'm sure people have considered making options like this available in the UI before. Right?
  25. GooberMan

    Rum and Raisin Doom - haha software render go BRRRRR

    Bit of random fun to show that I still screw the pooch from time to time. I was getting abysmal performance when rendering fuzz. Even on my i7, but it was particularly egregious on the Pi. Nope. Nope. Ignore the black status bar - that's just in-progress work to use one render buffer per thread. But nope. Context 2 is full of nope.

    Turns out that the way I was using atomics to track the fuzz counter was terrible. You can see the atomic increment at the bottom of that screenshot. Previously, I was incrementing on every fuzzpos access. This is baaaaaaaaaaaad. I decided to try atomics to track the fuzz position across threads. This did result in the fuzz looking like fuzz when every thread was accessing the value, and the usual high-res artifacts when not. But performance was full of nope. So a single increment patches that up for now, but clearly I'm going to need to stop treating the fuzz index as an atomic value.

    I also screwed up the fuzz sampling. The original code would sample either one row higher or one row lower than the current pixel. I was sampling one column to the left or one column to the right. Oops. Fixed that.

    But beyond even that: fuzz is one of those things where it seems the intention is more valuable than preserving vanilla code. The original fuzz table consists of 50 offsets. This loops really quickly, and its random distribution isn't that great. I asked John Romero a question on Twitter based on a conversation about this subject; let's see if he responds. But as far as I'm concerned, the fuzz sample randomiser needs to be greatly expanded or rethought to match the intention of the vanilla codebase at higher resolutions. It's one of those things where not sticking to the original code will probably give an actual vanilla experience at higher resolutions (weird, right?).
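
    For illustration, expanding the table and de-atomic-ing the counter could look something like this - the size and the RNG are my guesses, not measured choices:

      #include <cstdint>
      #include <random>

      constexpr int EXPANDED_FUZZTABLE = 512; // vanilla's FUZZTABLE is 50
      static int8_t fuzzoffset[ EXPANDED_FUZZTABLE ];

      // A longer table with an even up/down distribution loops far less
      // visibly at high resolutions than vanilla's 50 entries.
      void GenerateFuzzTable( uint32_t seed )
      {
          std::mt19937 rng( seed );
          std::bernoulli_distribution updown( 0.5 );

          for( int curr = 0; curr < EXPANDED_FUZZTABLE; ++curr )
          {
              fuzzoffset[ curr ] = updown( rng ) ? 1 : -1; // one row up or down
          }
      }

      // And rather than a shared atomic, each render context owns its cursor:
      struct rendercontext_t
      {
          int fuzzpos = 0;
          // ...the rest of the per-thread render state...
      };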