Everything posted by GooberMan

  1. This is intentionally broken until I finish player interpolation. So here's how it currently works. For all mobjs, the previous and current positions are interpolated according to where the display frame sits between simulation frames. This does technically mean that you're guaranteed to see slightly-past data, except on those rare display frames that line up exactly with a simulation tic. The one exception to this rule is the player. The angle is not interpolated at all. Instead, the most recent angle is used, and for each display frame we peek ahead into the input command queue and add any mouse rotation found. This was the quick "it works" method that let me get everything up and running. Here's how it should work: exactly as above, except do for the display player's movement exactly what we do for mouse look. The full solution will require properly decoupling the player view from the simulation. At that point, it will be the decoupler's responsibility to provide correct position and rotation to the renderer instead of the renderer doing the job. There's some more setup I need to do for that to work correctly, but it will also cover all keyboard inputs once working. It may not seem like much, but I got really sensitive to input lag when implementing 144Hz support in Quantum Break. In fact, I tested a few other ports to see how they feel too. prBoom has the worst feel by far to me, with every other port I tried feeling about the same. My intention is to ensure there's basically zero effective lag between when input is read and when it is displayed (ie input is read on the same frame you render), and thus remove input lag as a source of error in skilled play. There's also one thing I've realised after playing Doom at 35Hz for so long: it cannot be overstated how much of an effect frame interpolation has had on raising the Doom skill ceiling.
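The interpolation described here is, at heart, a fixed-point lerp between the previous and current simulation positions, weighted by how far the display frame sits into the current tic. A minimal sketch, assuming Doom-style 16.16 fixed point; the names are illustrative rather than the actual R&R code:

```cpp
#include <cstdint>

// Doom-style 16.16 fixed point (illustrative; not the actual R&R types).
typedef int32_t fixed_t;
constexpr int32_t FRACBITS = 16;
constexpr fixed_t FRACUNIT = 1 << FRACBITS;

// Interpolate between the previous and current simulation positions.
// 'frac' is how far the display frame sits into the current tic, in [0, FRACUNIT].
inline fixed_t LerpFixed( fixed_t prev, fixed_t curr, fixed_t frac )
{
    // Widen to 64-bit for the multiply so large coordinates don't overflow.
    return prev + (fixed_t)( ( (int64_t)( curr - prev ) * frac ) >> FRACBITS );
}
```

At frac == 0 you see the previous tic's position exactly, which is why (barring the player special case) the view is always showing slightly-past data.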
  2. https://github.com/GooberMan/rum-and-raisin-doom Rum and Raisin Doom is a limit-removing port focusing on both vanilla accuracy by default and performance on modern systems. Features include:
     - Startup screen that looks like ye olde DOS Doom's text startup
     - Limit removing when using the -removelimits command line parameter
     - A multithreaded renderer
     - 64-bit software rendering
     - Frame interpolation
     - Support for unsigned values in map formats, as well as DeepBSP and ZDoom nodes
     - Flats and wall textures can be used anywhere
     - Dashboard functionality implemented with Dear ImGui
     - Full support for Raspberry Pi on Debian-based operating systems
     Latest release: 0.2.1 https://github.com/GooberMan/rum-and-raisin-doom/releases/tag/rum-and-raisin-doom-0.2.1 Still the same deal as the last release: it's semi-supported. I want limit-removing maps that break this port so I can work out why and tighten it up. This release has some null pointer bug fixes, plus fixes for some oddities I encountered when trying to -merge Alien Vendetta instead of -file. The big one y'all will be interested in though: I decided it was well past time I implemented frame interpolation. Now it hits whatever your video card can handle. As it's borderless fullscreen on Windows, it'll be limited to your desktop refresh rate.
  3. Scientists prove that AAA gaming sucks.

    This game will kick your ass, ya filthy casul. The entire industry is expecting Elden Ring to take the statue next year.
  4. Scientists prove that AAA gaming sucks.

    Having worked on a BAFTA-award-winning AAA game with no microtransactions and a reputation for requiring a high degree of skill, I feel I should point out that the AAA game likely to win the same BAFTA next year can be described in the same manner - and it has many players saying how rewarding it is to progress after grinding out better weapons and stat points to get past their progression blocker. You basically can't release a mobile game without the practices highlighted here - unless you expect to make $bugger-all from your game. These practices continue to make money. The gnashing of teeth that they suck is a sentiment I share, and entirely expected from a community based around a nearly-29-year-old game where satisfaction for most comes from freely-available user-made content.
  5. Maybe it's time for a Planisphere update too, because I've been seeing threads drop below 9ms lately. But maybe more impressively: The Given. At 2560x1200, the original screenshots showed >40ms per render thread back on July 6. That's basically playable in software now - even more so if you drop it to a Crispy-style resolution. Still todo: fixing the load balancing code to not pile on the last thread when thread count is greater than 4. But I'm chasing something else right now: vissprites and masked textures. I decided to open up Comatose yesterday (it runs, but seems to require some Boom line types, so you can't leave the first room without noclipping). It's something of a dog on software renderers, disproportionately so on sprite draws. Running -skill 0 shows very reasonable render times, so I wanted to know what was going on. I threw some more profile markers in to see where the time was going. I'm seeing two problems here: 1) Sprite clipping is awful - it does a ton of work just to render nothing. 2) Sprite clipping is awful - it does a ton of work, and when it does draw stuff the rendering routines aren't ideal, but they aren't really the performance bottleneck here. So I'm currently grokking how sprite clipping works. I already have ideas on what I want to do to it, but I need to understand a few more bits of the code before I can dive in and do what I want with it.
  6. Been going through the column rendering routines to get speed back on UI/sprite/etc elements. You know what that means: it's glitchcore time! Also I guess this shows off frame interpolation and all that. Some issues with SDL being unable to detect the highest refresh rate a duplicated display is running at mean I can't get 120FPS footage just yet. But it'll come.
  7. I've been working on a little something in my spare time over the last few weeks. Many years ago, when I was doing 3D floor layouts in Prime Directive, it really annoyed me that the tools were so obtuse. They required essentially a hacker's knowledge of exactly how the feature works to place them. Draw a 2D sector, draw a control sector, set up the lines. Doing anything reasonably complicated with them required thinking in full 3D in a 2D space. The community has long used a baked binary format - the WAD - as both its source and target data, and it annoyed me that GZDoom Builder didn't support custom source data, allow arbitrary placement of 3D floors, and bake that data down to a WAD during the save operation. Of course, adapting GZDoom Builder or even writing a new editor is a big task. There's a bit of a different solution out there. Enter BSP2Doom. Quake has existed for a while now. There are full 3D editors and everything. And Doom engines have supported 3D floors for a while now. Translating Quake maps to Doom isn't the impossible task it used to be. Honestly, I know the community loves building these big sprawling vista maps at the moment, but it really felt like overkill looking at last year's Cacoward winners and seeing that every one had a vista screenshot. So let's see what kind of gameplay we can get out of Doom if we start thinking about it with verticality as an actual thing. Besides, I spend my work day knee-deep in Unreal Engine doing everything from bug fixes and optimisations to finishing half-implemented features the UE team left about. This is relaxing and far less insane in comparison. The workflow goes something like the following:
     - Use another tool in this package - Doom2QWad - to create a texture WAD that Quake editors can read (this will also spit out configurations for Quake editors in the future)
     - Load up an editor like Trenchbroom
     - Create a Quake level using Doom assets
     - Compile your map
     EricW's tools are the gold standard in the Quake community at the moment. Do the following:
     - Run qbsp. You don't get geometry otherwise.
     - Run light. Your map will be fullbright otherwise.
     - Don't build vis data. This is normally where you'd do such a thing for a Quake map, but it's 100% unnecessary here.
     - Run BSP2Doom. This will spit out Doom-compatible data from your newly compiled Quake map.
     - Run your normal node builder on the new map WAD.
     - Run your map.
     The resulting map is not meant to be edited in a Doom editor. Depending on the options given to the BSP2Doom utility, the map could look anywhere between normal and nightmarish when loaded in a Doom editor. The WAD here is exactly a cooked data format. Only ever edit the source format. The code isn't ready for public release. It does already live on GitHub in a private repository though, so all I'll need to do is flick a switch when it's ready. Regardless, I plan on releasing it with a mapset rather than just dumping it on the community. I have a couple of maps in mind, but I'll also be on the look-out for experienced mappers who want to explore what Doom can do when properly exploiting a 3D playspace. No slaughter, no everything-is-a-trap, no epic vistas. Drop the cliches and let's see what else can be squeezed out of Doom. Also of note: I'm resurrecting Calamity with this tool. The old code I wrote will basically be ignored. Much of the code I'm writing now will make it into Calamity with an optimisation pass or ten. It's all being written ground-up in D, which is resulting in some really clean code (the UDMF parser is ultra clean; the BSP parser reasonably clean; the WAD parser less so, because it's an insane format wholly reliant on the original implementation).
     Progress:
     - Quake BSPs - Geometry shell done. 3D floors about 80% done. Doors and platforms TBD. Lightmaps working but constantly being tweaked. Coloured lighting (ie .lit files) supported. Quake overbrights supported.
     - GoldSrc (Half-Life, Counter-Strike) BSPs - Preliminary. Data structures almost identical to Quake BSPs; I just need to build some test maps.
     - Quake 3 BSPs - Preliminary. Data structures defined, but I need to handle bezier patches before I can attempt to support it.
     A big question I have is how to support doors and platforms. I haven't decided if I want to bake them into the map geometry, or use all the polyobject tricks I whipped up a few years back. I will likely try both approaches.
     Engines supported:
     - GZDoom - Working off 3.6.0 as a base; probably works fine on earlier builds.
     - Eternity - The UDMF namespace is fully supported at the least, but since the GZDoom implementation heavily relies on shaders, this will probably need the software fallbacks I'm coding.
     - k8vavoom - Same deal with the shaders. A way to hook into its own lightmap system with my pre-baked lightmaps would be fab and mostly eliminate the need for shader work.
     Output formats:
     - Map: UDMF
     - Textures: PNG, Doom lumps
     - Packaging: Doom WAD, ZDoom folder structure (run 7zip or equivalent in your build pipeline if you need a PK3 - this is designed to be a scriptable tool, not an all-in-one solution)
     Remaining sections of the original post:
     - The theory behind translating geometry
     - Things? Entities?
     - Quake style lighting is quite different - how do you handle it?
     - Lighting system implementation details
     - Screenshots
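For anyone curious what the parsing side involves: a Quake BSP starts with a 32-bit version number (29 for Quake, 30 for GoldSrc) followed by a fixed directory of 15 lumps, per the publicly documented format. A hedged C++ sketch of reading that header - the actual tool is written in D, and these struct and function names are mine, not BSP2Doom's:

```cpp
#include <cstdint>
#include <cstdio>

// One directory entry: where a lump lives in the file.
struct bsplump_t
{
    int32_t offset;
    int32_t length;
};

// Quake/GoldSrc BSP header: a version followed by 15 lump directory
// entries (entities, planes, textures, vertices, visibility, nodes, ...).
struct bspheader_t
{
    int32_t   version;      // 29 = Quake, 30 = GoldSrc
    bsplump_t lumps[ 15 ];
};

// Reads the header and checks it's a format we claim to understand.
bool ReadBSPHeader( std::FILE* file, bspheader_t& header )
{
    if( std::fread( &header, sizeof( header ), 1, file ) != 1 ) return false;
    return header.version == 29 || header.version == 30;
}
```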
  8. Latest release: 0.2.1 https://github.com/GooberMan/rum-and-raisin-doom/releases/tag/rum-and-raisin-doom-0.2.1 Still the same deal as the last release: it's semi-supported. I want limit-removing maps that break this port so I can work out why and tighten it up. This release has some null pointer bug fixes, plus fixes for some oddities I encountered when trying to -merge Alien Vendetta instead of -file. The big one y'all will be interested in though: I decided it was well past time I implemented frame interpolation. Now it hits whatever your video card can handle. As it's borderless fullscreen on Windows, it'll be limited to your desktop refresh rate.
  9. And also the ARM used in the Raspberry Pi. But I think I'm going to do a deep dive on how to handle division anyway. You can turn on faster division at compile time on ARM, and there are also things like libdivide. I don't think it'll be a massive win at this point, but it'll still shave a bit of time off. My next focus on ARM though is just what in the heck is going on with thread time consistency. Only the final thread is performing in a consistent manner; every other thread wildly fluctuates in execution time. Based on screenshots, I can rule out weirdness with the load balancing algorithm too. With load balancing: And with no load balancing: Getting those to level out and not fluctuate should let the load balancer work better, and bring the total frame time down.
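The libdivide trick mentioned above boils down to trading a hardware divide for a precomputed reciprocal multiply. A toy sketch of the idea - this is not libdivide's actual API, and it's only safe for the modest value ranges a software renderer deals with; libdivide does the full edge-case analysis this skips:

```cpp
#include <cstdint>

// Divide-by-constant via a precomputed reciprocal: value / d becomes
// (value * ceil(2^32 / d)) >> 32. One multiply and shift per divide,
// with the expensive division done once up front.
struct Reciprocal
{
    uint64_t recip;

    static Reciprocal Make( uint32_t divisor )
    {
        // ceil(2^32 / divisor)
        return { ( ( 1ull << 32 ) + divisor - 1 ) / divisor };
    }

    uint32_t Divide( uint32_t value ) const
    {
        return (uint32_t)( ( value * recip ) >> 32 );
    }
};
```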
  10. https://github.com/GooberMan/rum-and-raisin-doom/releases/tag/rum-and-raisin-doom-0.2.0 Release is out. Preliminary support is in for using flats and textures on any surface, which means Vanilla Sky renders as intended. But it's not perfect - the rendering routines assume powers-of-two texture sizes. And MSVC absolutely cannot deal with the template shenanigans I'm doing; it takes half an hour to compile that file now, topkek. Clang just does not give a fuck, even when compiling on my Raspberry Pi. Still got some work to do though: Vanilla Sky isn't exactly playable, thanks to bad blockmap accesses. Still, this 0.2.0 release is the "break my port with some maps" release.
  11. Getting dangerously close to being a real port now... I'll do a 0.2.0 release once I've done a bit more work on the limit-removing side. I was incidentally pointed towards Vanilla Sky on Discord the other day. It needs a limit-removing port - and, yeah, Rum and Raisin doesn't break a sweat playing it. It's the complete opposite of Planisphere in that regard: it's a big city map where the entire map isn't visible half the time. What it does do, however, is mix flats and textures. Hacking something together isn't a huge task. Both are already stored as full textures in memory, although I will need to transpose flats and update code to match. And it's easy enough to use the index values there to indicate whether to look for a flat or a texture. But as I'm converting the code to C++, I can do it better. So I'll do that, then call it a build.
  12. There's nothing technically standing in the way of doing Switch homebrew myself. Professionally, however, is a different matter. I'm a professional in the video game industry, and that means I need to play by those rules. Having said that, it's unlikely I'll ever actually use a Switch devkit myself. But still.
  13. This one's just for fun - I've used fraggle's text screen library to render the traditional vanilla loading screen messages.
  14. I got curious and decided to see how much of a performance difference it made. Change some defines in m_fixed.h so that rend_fixed_t is just an alias for fixed_t (itself an alias for int32_t), and: about 10 milliseconds saved on that scene... at the expense of reintroducing rendering bugs that have no good solution at 32 bits, short of using 32-bit floats. Still, it does confirm to me that there's value in my intended approach: providing a renderer at 32-bit precision by default, and only using the 64-bit precision renderer when -removelimits is running. I'm 100% curious as to what some of the worst-performing 100% vanilla maps are now. One of my stated goals here is to get Doom's renderer running at 1080p on a Switch, and since I can't exactly do Switch homebrew, the Raspberry Pi is the next best thing. I suppose the obvious test WADs will be whatever's been released on the Unity port, since those currently do run on a Switch. EDIT: Well, how about NUTS.WAD. Yeah, I definitely need the 44.20 renderer for that.
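The switch described is roughly the following shape. The names echo those in the post (fixed_t, rend_fixed_t, the 44.20 layout), but the real defines in m_fixed.h will differ in detail - treat this as a sketch of the idea, not the port's code:

```cpp
#include <cstdint>

typedef int32_t fixed_t;              // Doom's classic 16.16 fixed point
#define FRACBITS 16

#if defined( RENDER_REDUCED_PRECISION )
typedef fixed_t rend_fixed_t;         // alias back to 16.16: faster, glitchier
#define RENDFRACBITS FRACBITS
#else
typedef int64_t rend_fixed_t;         // 44.20 fixed point for the renderer
#define RENDFRACBITS 20
#endif

// Converting playsim precision to render precision.
inline rend_fixed_t FixedToRendFixed( fixed_t val )
{
    return (rend_fixed_t)val << ( RENDFRACBITS - FRACBITS );
}
```

With the reduced-precision define set, the shift amount collapses to zero and the conversion becomes a no-op, which is why flipping the aliases is such a cheap experiment.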
  15. Loaded up the ol' Pi 4 and ran latest on it. That's with a pixel count slightly higher than 1080p. The IWADs should all be able to run at 60, just need to focus on some usability issues and do that "wake threads" thing I was talking about. The real question though is "What about Planisphere???" Not great. Not terrible. But actually playable in a "This reminds me of playing Doom on my 486SX 33 back in the day" way. Another way to look at it - that's "GTA4 on consoles" performance territory. I dropped down to 1706x800 renderbuffer and gave it a bit of a play. Not bad. Some scenes absolutely murder the Pi though:
  16. Question about framerates

    I'm writing a source port specifically focused on performance concerns for modern computers. I know you know this, because you've replied in the thread. I've got two challenges for you. One: explain to me how Continuous Collision Detection works. Two: explain why GZDoom runs way slower than other ports on, say, a slaughter map (let's use Woof as a comparison). If you can't do either, then I suggest you accept what people are saying in this thread instead of continuing this combative approach.
  17. Question about framerates

    Alright, so it is clear that this is a troll thread if you're going to skip over Continuous Collision Detection.
  18. Question about framerates

    I'm just going to quote this, since it is an accurate statement. Even the big-boy physics engines (Havok, PhysX) tell you this and prefer you keep things to a fixed timestep. Accounting for variable timesteps means using an integration algorithm that can handle such things, which is more computationally expensive. And when you deal with complex physics worlds like those simulations do, they earn their paycheck/reputation by making things as fast as possible. As an example: why go to all the trouble of a time-correcting algorithm when simple Euler will suffice? If the timestep is constant, then you don't need to worry about accelerative forces going out of whack. There's often no real need to run your physics sim higher than 30Hz. I've worked on million-plus sellers that do this. And while implementing 144Hz support for one of them, one conclusion was obvious: as long as objects are interpolated between those 30Hz steps and the user's camera controls are framerate independent, they will not notice. The simple fact about physics detections in a gaming space is that most collisions do not need to operate at a higher frame rate. Any simple object (sphere, box, etc) that moves less than its bounds each simulation frame will not benefit from a higher update rate. Take something bouncing off a wall. Finding the point where the object penetrates the wall and resolving that collision is simple: find the penetration point, divide the distance to that point from the object's previous position by the distance the object traveled that frame, and you get a fraction that lets you work out every value you need. This works just fine regardless of the frame rate. Add accelerative forces to your object (such as, say, gravity) and you can make a case that the sim can be more accurate at a higher framerate. And it will be imperceptibly more accurate to boot.
    It's like saying 0.333333333333 multiplied by 3 equals 0.999999999999 is more accurate than saying it equals 1 - but who cares? We've spent all this computing power on a result that was already good enough for a video game. For the objects where this isn't good enough, there's a feature in modern physics engines known as Continuous Collision Detection. It's especially useful for objects that travel farther than their bounds each frame. It will collect objects that can potentially collide with another object in a frame and perform more rigorous calculations on them to determine the exact collision points of both. These calculations are more computationally expensive, obviously - but they exist because it's computationally expensive to run a simulation frame. Physics at 120Hz versus physics at 30Hz with interpolation and smart use of CCD is a no-brainer: the 30Hz model will win. It will take less time to provide results that are good enough in virtually all cases. And you know what that means - it means designers are free to add more things to their game. Let's bring it back to Doom though. Doom's physics operates on that simple collision model. Virtually all objects move less than their bounds each frame. There's a handful of objects that need something like CCD to eliminate glitches - mancubus fireballs and the player come to mind. Just about every other object moves less than its radius per frame. There's no advantage to running that simulation at a higher framerate versus both re-implementing the collision resolution algorithms (they're very crude by today's standards, and are full of glitches such as wall running and elastic collisions) and implementing CCD for fast-moving objects. And at that point you'll break vanilla compatibility. Save games will work fine, but demos will no longer work.
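That fraction-of-travel calculation can be sketched in a few lines. This is a deliberately 1D toy of the idea, assuming motion toward a single wall plane; real engines do the same thing per axis or per collision plane:

```cpp
// Returns t in [0, 1]: how far along this frame's motion the wall is hit.
// A t of 0.5 means the object should be placed halfway along its movement
// (and the rest of the frame's motion resolved from there).
double CollisionFraction( double prevPos, double newPos, double wallPos )
{
    double travelled = newPos - prevPos;
    if( travelled == 0.0 ) return 1.0;   // no movement, nothing to resolve

    double toWall = wallPos - prevPos;
    double t = toWall / travelled;

    // Clamp: wall behind us or beyond this frame's travel means no hit.
    if( t < 0.0 ) t = 0.0;
    if( t > 1.0 ) t = 1.0;
    return t;
}
```

Note that nothing here depends on the timestep: the fraction is the same whether the frame covered 8ms or 33ms of movement, which is the post's point.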
  19. R&R currently runs faster than the GL renderers I tried, yes. I'd need to get frame interpolation in, and implement a better way for worker threads to sleep after a frame, before I can illustrate this fully. I'm also only testing with 4 threads - the amount I'd run on a Pi/Switch - and the load balancer keeps pushing work disproportionately to the final thread when running with more than 4 threads. But tl;dr - yes. I would need to look at the code for the ports in question to get a clearer idea of what the bottlenecks are. I also have a solid view of how I'd implement a traditional hardware renderer myself, although my current thinking on the matter is somewhat less traditional.
  20. So. Since I made that post about 24 hours ago, I shaved 7 milliseconds off the rendering of that Planisphere 2 scene. How, you might ask? Well. Let's go back to a previous optimisation/discovery. I rewrote the flat renderer a while back. The original flat renderer worked great for a non-transposed buffer; it kept cache coherency on the output thanks to the span rendering moving left to right across the screen/backbuffer. But that's no good for R&R's transposed backbuffer - the same kind of cache misses you were getting on the wall renderer just moved over to the flat renderer. Obviously I needed to render visplanes going down the screen, just like walls do, in order to retain the performance benefits of a transposed backbuffer. The discovery I made is that the span renderer is kinda unnecessary. I mean, it totally was necessary back in 1993. It precalculated some values that are constant for horizontal lines, and converting visplane lines to spans resulted in faster rendering than trying to render visplane lines one by one. But it occurred to me when writing the new code: visplanes are actually just a collection of raster lines for a perspective-correct texture mapper. And that's the code I implemented. So anyway. Visplanes are literally just screen-width arrays of rasterlines. There's one thing you can say about most visplanes though: they do not cover the entire screen. Thus, most visplanes are massive wastes of memory - especially when you get to high resolution rendering. I've hit the delete key on the old visplane code. It's gone. You won't find it in R&R unless you roll back to an earlier revision. In its place, I've put raster regions. They superficially function like visplanes - a collection of rasterlines. Their storage is temporary, and obtained from a pool that gets reset at the start of every render frame. Every time I want to add new lines, I do not try to match with previous raster groups.
    I just grab a new group and storage for the lines, store it in a singly-linked list, and off I go. And as you can see, 7 milliseconds fell off my profiling. There are a few other things worth pointing out. My visplane structures for one thread were clocking in at over 100 megabytes, because I needed to continually increase the number of visplanes thanks to these limit-removing maps I'm testing. I allocate for 8 threads by default, so that's a good chunk of memory spent just on visplanes. In my new code, on that Planisphere 2 scene, it reserves (checks notes) just over 2 megabytes per frame. That's down from 800+ megabytes to support the multithreaded renderer. No visplane overflows means I've been able to look at that scene when rendering on a single thread for the first time without crashing. I broke something while I was doing all this though. Check out the top of these walls, for example: Once I work out what I did wrong there, the next target in my sights is another one of those vanilla limitations that keeps screen-width arrays. I also had a look at how Planisphere 2 performs in other ports. Let's just say that when comparing both software and hardware renderers, Rum and Raisin Doom is going to be the only port that can keep 60FPS on this map... on my 2.6GHz Skylake i7. I haven't tried this on my Raspberry Pi 4 yet. But needless to say, at this point this renderer is at a stage where I should be able to keep 60FPS on a Nintendo Switch at 1080p for all currently released maps on the Unity port.
  21. Now let's talk about profiling and tracking down performance issues. Sample-based profiling on Windows is in a bit of a sad state. The tools available to you (including the ones built in to Visual Studio) are all bound by the kernel's profiling report rate, which only allows a maximum of 8KHz. Now, this was a sensible value back in, say, the 90s, when home PCs were only able to hit 100MHz if you were rich. IPC counts were not great, and all the delays in the system meant that 8KHz would get you a good idea of what your program was actually doing. So anyway, I'm on a 2.6GHz processor. Let's look at it another way: if you're trying to keep to 60FPS, 8KHz means Windows will only give you around 133 samples per frame. That's bonkers. There is another form of profiling commonly used - instrumented profiling. This means inserting hooks into your code that send out markers for a profiler. The common profiling tools all support this. And since I want to dig in and see where my bottlenecks are, I now support this too. I've also built a bare-bones UI into R&R, because that's what I do. Here's an example from something I was profiling. I test-rendered the sky twice with two different functions to compare and - wait, hold on, why is my limit-removing column drawing function running 7 times slower than the standard one? And what is a limit-removing column drawing function? First up, the what: it's a column renderer that can handle arbitrary texture sizes instead of the hard-coded 128-tall textures from the original Doom code. Cool. But why is it such a dog?

     static INLINE pixel_t Sample( colcontext_t* context, rend_fixed_t frac, int32_t textureheight = 0 )
     {
         return context->source[ ( frac >> RENDFRACBITS ) % textureheight ];
     }

     Well, there's your problem right there. The modulo operator. Or, in English, the remainder of a divide operation. And it's happening on Every. Single. Sample. Welp. We can do better here.
     I avoid branches as much as possible, but to get some speed back we can throw one in. A bit of hand tuning later and:

     static INLINE pixel_t Sample( colcontext_t* context, rend_fixed_t& frac, const int32_t& textureheight )
     {
         rend_fixed_t texfixed = (rend_fixed_t)textureheight << RENDFRACBITS;
         if( frac >= texfixed )
         {
             frac -= texfixed;
         }
         return context->source[ frac >> RENDFRACBITS ];
     }

     Much better. I'm still not happy with it, but I've minimised operations as much as possible, so that's one positive. (EDIT: That's a lie, I only need to do that texture conversion to fixed once. Derp.) Short story: it got back down to acceptable ranges. I can live with that. This is a 1080p-equivalent render buffer, so that's acceptably slower than the standard function. Now, there are tricks I can do so that this function essentially never gets called... but that's for another time. Having my profiler running means that I've been able to track down other things that aren't ideal and can be trimmed away. So let's do some screenshot comparisons. Before I implemented the limit-removing column renderer: After implementing it: And after optimising it and a couple of other things: Nice. I'm at a net win compared to before the limit-removing column renderer. Of course, we're far from finished here. Implementing instrumentation means I've been drilling down on exactly what the problems are and thinking of ways to deal with them. And I need to point out: having this API on does slow your program down. You might think "inaccurate times are useless". Not necessarily. We can determine the proportion of time a function takes accurately enough, and work out what to optimise from there. I have some clear targets here: (And yes, that extra time you're seeing on the render graphs is the current overhead of the profiler.)
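Instrumented profiling hooks of the kind described are often implemented as RAII scope markers. A minimal sketch follows - hypothetical names, not the actual R&R profiler API, and a real implementation would record timestamps into a buffer for the UI rather than printf them:

```cpp
#include <chrono>
#include <cstdio>

// Emits how long a scope took when the object is destroyed.
class ProfileScope
{
    const char* name;
    std::chrono::steady_clock::time_point start;

public:
    explicit ProfileScope( const char* n )
        : name( n )
        , start( std::chrono::steady_clock::now() )
    {
    }

    ~ProfileScope()
    {
        auto elapsed = std::chrono::steady_clock::now() - start;
        long long micros = std::chrono::duration_cast<std::chrono::microseconds>( elapsed ).count();
        std::printf( "%s: %lld us\n", name, micros );
    }
};

// Two-step concatenation so __LINE__ expands before token pasting,
// giving each marker a unique variable name.
#define PROFILE_CONCAT2( a, b ) a##b
#define PROFILE_CONCAT( a, b ) PROFILE_CONCAT2( a, b )
#define PROFILE_SCOPE( n ) ProfileScope PROFILE_CONCAT( profscope_, __LINE__ )( n )
```

Usage is just `PROFILE_SCOPE( "R_RenderPlayerView" );` at the top of a function; the marker fires automatically when the scope exits, which is why the overhead the post mentions scales with how densely you instrument.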
  22. Here's a video of Planisphere 2. I've got some optimisations I want to try before I make another point release.
  23. So. I accidentally ZDoom node support. Planisphere 2 is basically working (I'll upload a playthrough video), so I asked Ling for a new map to test R&R on. The Given was suggested. And immediately it wouldn't load. I was a bit puzzled at first, but it turns out that I can't read and it needs a port that implements ZDoom nodes. So I cleanboxed it, with nothing but the ZDoom wiki to tell me the format of the nodes. I'd already done some work on the loading code to template it, allowing me to write code once and just swap in input types as necessary. This has worked for limit-removing (ie interpreting vanilla data as unsigned) and DeepBSP (lots of size changes all over the place) node types. ZDoom nodes are a bit special though - they put several lump types in one lump, and the structures are wildly different in some cases. So it's not exactly templated code to load its NODES lump, but it does reuse much of the existing code. The main thing you need to know to load ZDoom nodes and jam them into a vanilla renderer is the data the format doesn't include in the file itself: P_ApproxDist, R_PointToAngle2, and making sure you get the front and back sectors correct when converting the data. Everything else is very straightforward. Wrote the code, it compiled. Hit run, and the map loaded and rendered first time. So that note in the Cacowards about the performance wasn't fucking around. That's with a backbuffer of 2560x1200, which is more pixels than 1080p. Lowering it to 1706x800 still won't get you a 60FPS renderer on my Skylake i7 @ 2.6GHz. This will be a good testbed for optimisations indeed. So. What's taking all the time? Visplanes: And preparing walls for render: So basically the two areas I was going to focus on with Planisphere 2. tl;dr: my challenge now is to get this running at an acceptable rate on my Raspberry Pi. This shall be fun.
EDIT: Fixed my "remove limits" check to not hammer M_CheckParam all the time. Got some tasty framerate back. 3 milliseconds is 3 milliseconds, so I'll take it.
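For reference, the vanilla distance approximation mentioned above (named P_AproxDistance in the released Doom source, p_maputl.c) is the sort of thing you have to recompute for data that ZDoom nodes omit. Reproduced here from memory of the released source; it trades accuracy for avoiding a square root:

```cpp
#include <cstdint>
#include <cstdlib>

typedef int32_t fixed_t; // 16.16 fixed point

// Vanilla Doom's distance approximation: max + min/2, a cheap
// stand-in for sqrt(dx*dx + dy*dy).
fixed_t P_AproxDistance( fixed_t dx, fixed_t dy )
{
    dx = std::abs( dx );
    dy = std::abs( dy );
    if( dx < dy )
        return dx + dy - ( dx >> 1 );
    return dx + dy - ( dy >> 1 );
}
```

For a 3-4-5 triangle this yields 5.5 rather than 5 - demonstrating why it only works where the playsim doesn't care about exact distances.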
  24. So. I've finally got this Planisphere 2 blockmap figured out. The first index for the blockmap list after the header and the table is 92,460. If you do a modulo operation of that index with 65,536 (ie the number of entries you can have before integer overflow ruins your day), you get 26,924. So, end of the day, the blockmap just didn't account for integer overflow. What this means though is that we can apply some corrective steps to the table:
     - Initialise an offset value to (<first entry offset> % 65,536)
     - Check if the current list offset in the table is less than the previous one. If so, increment the offset value by 65,536
     - Add the offset value to the current index
     - If, however, the index is equal to (<first entry offset> % 65,536), then set it to that first entry
     Which now leaves us with an almost-likely-correct blockmap table. I say "almost likely" because each time you detect an integer wraparound, there's a chance that you'll encounter a 26,924 value that isn't meant to point to the first list. Add a bit of a map rendering hack, and we can iterate through blockmap cells to show which lines are used. Which looks about correct, yeah. I haven't verified if there are blockmap gaps thanks to that first-list offset hack, but what I know now is that we very definitely have a blockmap that reports the expected lines for the expected cells. And running around the map, there are parts that are doing blockmap lookups correctly: And parts that, well, aren't. Like the starting room: So that's going to be the previously-mentioned issue about the math resulting in bad values for the lookup. Now. I'm keeping the playsim vanilla-accurate. But to fix that - either via the Maes method or some other method - means making the playsim do things that the vanilla playsim does not. So I've got a decision to make. I already need people to specify the -removelimits command line to load this map. Maybe the playsim could read that and do correct blockmap lookups for large maps?
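One plausible reading of those corrective steps in code, assuming the stored table is a flat array of wrapped 16-bit offsets. This is illustrative only, not the actual R&R implementation, and it inherits the same misfire risk noted above for entries that happen to equal the wrapped first offset:

```cpp
#include <cstdint>
#include <vector>

// 'raw' holds the stored (wrapped) 16-bit offsets; 'firstListOffset' is
// the true offset of the first list (92,460 for Planisphere 2).
std::vector<int64_t> CorrectBlockmap( const std::vector<uint16_t>& raw,
                                      int64_t firstListOffset )
{
    const uint16_t wrappedFirst = (uint16_t)( firstListOffset % 65536 );

    std::vector<int64_t> corrected;
    corrected.reserve( raw.size() );
    if( raw.empty() ) return corrected;

    int64_t base = firstListOffset - wrappedFirst; // accumulated wraparounds
    uint16_t prev = raw[ 0 ];

    for( uint16_t entry : raw )
    {
        // A decrease in stored offsets means the table wrapped past 16 bits.
        if( entry < prev ) base += 65536;
        prev = entry;

        // Entries matching the wrapped first offset are assumed to point
        // at the first list; this heuristic can misfire, as noted above.
        if( entry == wrappedFirst )
            corrected.push_back( firstListOffset );
        else
            corrected.push_back( base + entry );
    }
    return corrected;
}
```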