sqpat Posted September 21, 2023 RealDOOM is a port of vanilla DOOM (forked from PCDoomv2) made to run in Real Mode. (Coincidentally, Doom8088 was being worked on at the same time, with a similar goal but a different starting point.) It uses EMS to access memory beyond 640 KB.

I've been working on this since June, and the project now sort of runs in 16-bit mode. Timedemo 2 seems to work right, but demos 1 and 3 have desyncs or memory-related crashing bugs. It's kind of in an alpha state. It'll run pretty okay on 233 MHz hardware and up, using EMM386, whether on real hardware or in 86box. DOSBox isn't recommended; it seems to struggle with 16-bit applications or EMS. It will of course also run on 16-bit computers. You probably want close to 620 KB free and 3-4 MB of EMS minimum right now.

Basically, you can't just take the original codebase and build it for 16-bit, for many reasons. Slowly, the code was rewritten with more and more 16-bit style restrictions, until it became possible to actually build it with a 16-bit compiler. You can still build and run the code in 32-bit mode, and it will use a sort of EMS emulator that simulates what the 16-bit code is doing. The 32-bit build is a lot more stable than the 16-bit one, but eventually they should end up pretty equal.

The goal for RealDOOM is to make the port run as fast as possible as a 16-bit executable with the same level of quality as the original game. It may turn out to be futile to try to get this running at smooth speeds on 16-bit processors, but I'll try to take it as far as reasonably possible. Once that's done, it can always be forked and modified with some tradeoffs between quality and speed. I don't want to start making those tradeoffs earlier than necessary, though. I haven't made major efforts on optimization yet, as it took a few months just to get the game to work as a 16-bit executable at all.
It only started working last week and I haven't worked out all the bugs yet. To be honest, I had hoped to clean up the 16-bit build a bit more before posting this here, but I'm going on a month-long trip starting tomorrow and might not have much time to work on this in the near future... oh well.

Known (Major) Issues
- Savegames don't work
- No sound (need to find a 16-bit compatible library or write one from scratch?)
- Untested outside of DOOM 1 shareware for now.
- 16-bit mode has some desyncs and memory bugs, but it's almost there.

Work that has been done
- Removal of some features (multiplayer, joystick...)
- Lots of optimizations, especially to lower conventional memory usage and the size of the executable
- Zone memory manager rewritten to use EMS, with "MEMREFs" passed around between functions instead of pointers; lots and lots of code rewritten to support this.
- Lots of changed types, explicitly declared bit sizes, etc. so that the 16-bit and 32-bit builds both work off the same codebase.

RealDOOM on real hardware (286-20):
RealDOOM on 86box (Pentium MMX 233):

I want to shout out Viti95, who of course wrote FastDOOM, which I referenced for a lot of code removal and optimizations; he also personally contributed a couple of optimizations to RealDOOM.
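To make the MEMREF idea concrete, here is a minimal host-side sketch (not RealDOOM's actual code): an allocation is identified by a small integer handle, and every access goes through a lookup that first makes sure the right 16 KB logical page is mapped into one of four physical page-frame slots. The names `memref_t`, `Z_SetAllocation`, `Z_Map` and the round-robin replacement are hypothetical, and EMS is simulated with plain arrays; on real hardware the mapping step would be an EMS call (INT 67h, function 44h) instead of a copy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE      16384u  /* EMS pages are 16 KB */
#define NUM_PHYS_PAGES 4       /* a classic 64 KB page frame: four slots */
#define NUM_LOG_PAGES  64      /* simulated EMS: 1 MB */
#define MAX_MEMREFS    256

typedef uint16_t memref_t;     /* handle passed around instead of a pointer */

typedef struct {
    uint16_t logical_page;     /* 16 KB EMS page that holds the block */
    uint16_t offset;           /* byte offset of the block in that page */
} allocation_t;

static uint8_t ems_backing[NUM_LOG_PAGES][PAGE_SIZE];
static uint8_t page_frame[NUM_PHYS_PAGES][PAGE_SIZE];
static int     frame_owner[NUM_PHYS_PAGES] = {-1, -1, -1, -1};
static int     next_victim;    /* hypothetical round-robin replacement */
static allocation_t alloc_table[MAX_MEMREFS];

void Z_SetAllocation(memref_t ref, uint16_t logical_page, uint16_t offset) {
    assert(ref < MAX_MEMREFS && logical_page < NUM_LOG_PAGES && offset < PAGE_SIZE);
    alloc_table[ref].logical_page = logical_page;
    alloc_table[ref].offset = offset;
}

/* Real EMS remaps a page into the frame without copying; the copies here
   only emulate that on a flat-memory host. */
static void map_page(int slot, int logical) {
    if (frame_owner[slot] >= 0)
        memcpy(ems_backing[frame_owner[slot]], page_frame[slot], PAGE_SIZE);
    memcpy(page_frame[slot], ems_backing[logical], PAGE_SIZE);
    frame_owner[slot] = logical;
}

/* Resolve a MEMREF to a pointer. The pointer stays valid only until the
   next call that maps a different page into the same slot. */
void *Z_Map(memref_t ref) {
    assert(ref < MAX_MEMREFS);
    allocation_t a = alloc_table[ref];
    for (int slot = 0; slot < NUM_PHYS_PAGES; slot++)
        if (frame_owner[slot] == a.logical_page)
            return &page_frame[slot][a.offset];     /* already mapped */
    int victim = next_victim;
    next_victim = (next_victim + 1) % NUM_PHYS_PAGES;
    map_page(victim, a.logical_page);
    return &page_frame[victim][a.offset];
}
```

The trade-off is that every dereference becomes a function call, but the handle stays meaningful no matter which page is currently mapped, which is what lets the same data live in more than 640 KB.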
Dark Pulse Posted September 21, 2023 When the Doom community runs out of things to port Doom to, they just invent new things to port Doom to. Can't wait until someone figures out how to get the game running on some crusty 8-bit CPU like a Z80.
viti95 Posted September 24, 2023 (edited) I think I'll be sharing ideas here, which I've recently discovered on FastDoom, to reduce the memory footprint (this is the most important part, to avoid unnecessary paging). It's possible to unify the STBAR and STARMS lumps, as there will be no multiplayer. This lowers memory usage by 1 KB, and makes the code a bit faster since there is no need to update the status bar twice. With some optimizations and downgrades I think fast 286s will be able to run Doom at somewhat playable framerates (similar to a 386SX). Edited September 24, 2023 by viti95
Scyon Posted September 24, 2023 On 9/21/2023 at 2:00 PM, Dark Pulse said: When the Doom community runs out of things to port Doom to, they just invent new things to port Doom to. Can't wait until someone figures out how to get the game running on some crusty 8-bit CPU like a Z80. Already done:
viti95 Posted September 24, 2023 That's not based on vanilla Doom source code, so not Doom at all.
Scyon Posted September 24, 2023 30 minutes ago, viti95 said: That's not based on vanilla Doom source code, so not Doom at all. Obviously. That was a joke comment, the same as Dark Pulse's one that I quoted.
Grieferus Posted September 24, 2023 And I thought this was about 16-bit color mode, like in Alphas.
sqpat Posted September 24, 2023 11 hours ago, viti95 said: I think I'll be sharing ideas here, which I've recently discovered on FastDoom, to reduce the memory footprint (this is the most important part, to avoid unnecessary paging). It's possible to unify the STBAR and STARMS lumps, as there will be no multiplayer. This lowers memory usage by 1 KB, and makes the code a bit faster since there is no need to update the status bar twice. Oh interesting, I'll have to take a look! I changed some of the code in there recently and was able to pull out some code related to multiplayer (frags, I think), and I also removed all the boolean pointer logic and kind of hardcoded it. It's much simpler now, but I haven't looked super closely at any lump optimizations. 11 hours ago, viti95 said: With some optimizations and downgrades I think fast 286s will be able to run Doom at somewhat playable framerates (similar to a 386SX). I am starting to think so too. There are some big possibilities if 1:1 vanilla Doom compatibility is dropped, like a fixed viewport width that fits in 1 byte (240? 250? 255?), which would reduce visplane memory usage a bit and change a number of 2-byte variables to 1 byte. Maybe also trying out lower-precision fixed-point; I've done this in a few instances, like variables holding heights and also fine angles. I've been wondering about 24-bit ints/structs too.

I tried looking into EMS 4.0 support to try to get 8 or 10 physical pages instead of just 4, but I'm having trouble getting it to work with any software driver. Most drivers don't seem to want to give you more physical pages: EMM386 only supports 4 for sure, and QEMM386 might support it, but I'm fighting with the configuration options right now. I'm traveling right now so I can't test on real hardware.

The current main issue continues to be that there is some sort of memory bug I haven't found yet in the 16-bit build, which manifests itself as lock-ups (usually when the player fires a weapon).
It's making it difficult to do comparative 16-bit optimizations and benchmarking right now. I'm sure I'll eventually find it, but it's a cat-and-mouse game where adding debug code makes the problem go elsewhere. Meanwhile I've been trying to add some other features and optimizations and hoping for the bug to show up in a way that makes it easier to fix.
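On the lower-precision fine angles mentioned above: in vanilla DOOM only the top 13 bits of a 32-bit `angle_t` (shifted down by `ANGLETOFINESHIFT`, 19) index the 8192-entry fine tables, so a fine angle fits comfortably in 16 bits. The sketch below assumes a hypothetical `fineangle_t` type; the constants match vanilla's tables.h. It only works for values whose low 19 bits never matter, so applying it to something like the player's facing angle would very likely break demo sync.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t angle_t;       /* vanilla DOOM's binary angle */
typedef uint16_t fineangle_t;   /* hypothetical 16-bit fine angle */

#define FINEANGLES       8192
#define FINEMASK         (FINEANGLES - 1)
#define ANGLETOFINESHIFT 19
#define ANG90            0x40000000u

/* Keep only the 13 bits actually used to index finesine/finetangent. */
static fineangle_t to_fine(angle_t a) {
    fineangle_t f = (fineangle_t)(a >> ANGLETOFINESHIFT);
    assert(f < FINEANGLES);
    return f;
}

/* Fine angles wrap like angle_t does, as long as sums are masked. */
static fineangle_t fine_add(fineangle_t a, fineangle_t b) {
    return (fineangle_t)((a + b) & FINEMASK);
}
```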
deathz0r Posted September 25, 2023 I've been following this closely since I saw the comment in the Doom8088 post on the VCFed forums; good to see a thread here on DW! On 9/21/2023 at 2:12 PM, sqpat said: You probably want close to 620 KB free and 3-4 MB of EMS minimum right now. From reading the GitHub commit history, I've been a bit unsure whether it's possible to get in-game yet, but if it is, the conventional RAM requirement explains why I've been having difficulty: I can't get UMA and my hardware EMS card to work at the same time on my Protech PM286 mobo, which means that using something like DOSMAX is out of the question. The best I can get is about 615 KB of free conventional RAM.
sqpat Posted September 25, 2023 1 hour ago, deathz0r said: I've been following this closely since I saw the comment in the Doom8088 post on the VCFed forums; good to see a thread here on DW! From reading the GitHub commit history, I've been a bit unsure whether it's possible to get in-game yet, but if it is, the conventional RAM requirement explains why I've been having difficulty: I can't get UMA and my hardware EMS card to work at the same time on my Protech PM286 mobo, which means that using something like DOSMAX is out of the question. The best I can get is about 615 KB of free conventional RAM. It's possible to get it running with less, using as little as 400-450 KB of conventional memory with everything else in EMS. I've recently been adding code to let more and more variables optionally live in conventional memory, to figure out which combination works best, so it's a little messy right now. Basically, if you go into z_zone.c and set all the #defined block sizes to 1 (STATIC_CONVENTIONAL_BLOCK_SIZE_1, STATIC_CONVENTIONAL_BLOCK_SIZE_2, STATIC_CONVENTIONAL_SPRITE_SIZE, etc.), then no big memory blocks will be reserved for conventional allocations. (They have to be 1; zero will cause boot issues.) One of my 286s got in-game and rendered about 30-40 frames before crashing last week; there's a recording I posted above. Ultimately, though, once things are really optimized, you will want your machine configured for as much conventional memory as possible to reduce EMS paging.
slowfade Posted October 2, 2023 It's nice to get some information about this project too, thanks. Good luck! I have a feeling we'll have something running at least 10 fps on a 286 within a year. Maybe even more fps. With decades of accumulated optimization tricks and algorithms, it feels inevitable.
Drum Posted October 2, 2023 Interesting looking source port. Also I pretty much have the same profile pic as you LOL.
sqpat Posted October 25, 2023 Okay, I'm back from a month of traveling and hoping to get back to work on this. The main outstanding issue is that there is still some sort of random bug (probably memory corruption) in the 16-bit build. Because of this, I still can't really do benchmark comparisons of code and feature changes. Fixing this bug is the #1 priority for now. Once that's done, I think I can make a bunch of easy improvements and benchmark some ideas.

One funny thing I've noticed about memory management for this project: while at first I moved everything to EMS, the recent trend has been to use the available conventional memory to move things back there and avoid EMS as much as possible. In the shareware version, it's probably possible to put just about everything but textures in conventional memory. I don't think there is enough space in the retail versions to do this, though. (Also, we still aren't using sound. That will take up a lot of memory too... I wonder if there will be a safe way to page sound code in and out of EMS.)
viti95 Posted October 26, 2023 How do you debug RealDoom? I tried to use the Watcom debugger for FastDoom but it was very prone to crashing, so I ended up using the classic log files and real-time second-screen MDA/Hercules output.
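For readers unfamiliar with second-screen debugging: MDA (and Hercules in text mode) keeps an 80x25 text buffer at segment B000h, separate from the VGA memory the game draws into, with each cell stored as a character byte followed by an attribute byte (07h is normal text). The sketch below is only an illustration, not FastDoom's code; the helper name `mda_puts` is hypothetical, and a host array stands in for the far pointer a DOS compiler such as Open Watcom would build with `MK_FP(0xB000, 0)`.

```c
#include <assert.h>
#include <stdint.h>

#define MDA_COLS        80
#define MDA_ROWS        25
#define MDA_ATTR_NORMAL 0x07

/* Stand-in for the MDA text buffer at B000:0000. */
static uint8_t mda_text[MDA_ROWS * MDA_COLS * 2];

/* Write a string at (row, col): even bytes hold characters, odd bytes
   hold attributes. Text running past the end of the row is clipped. */
static void mda_puts(int row, int col, const char *s) {
    assert(row >= 0 && row < MDA_ROWS && col >= 0);
    for (; *s && col < MDA_COLS; s++, col++) {
        uint8_t *cell = &mda_text[(row * MDA_COLS + col) * 2];
        cell[0] = (uint8_t)*s;
        cell[1] = MDA_ATTR_NORMAL;
    }
}
```

Because writing a cell is just two memory stores, this kind of output can be left on every frame without the slowdown of file logging, and it doesn't disturb the game's own display.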
sqpat Posted October 26, 2023 12 hours ago, viti95 said: How do you debug RealDoom? I tried to use the Watcom debugger for FastDoom but it was very prone to crashing, so I ended up using the classic log files and real-time second-screen MDA/Hercules output. Yep... up till now it's always been output to log files or to the screen. This came up a lot during the transition to the EMS memory manager, as thousands of lines were rewritten to use paged variables and there were many bugs. Usually I'd find that the player x/y or some other field diverged from a normal timedemo. I'd figure out the tic/frame it happened on, use debug code to find out which thinker caused the issue, then turn on some static flags when frame X and thinker Y were active, and print out various values to figure out what diverged from a standard build of PCDoomV2. Usually there was another bad variable, and I'd work backwards in frames to when that variable became wrong, and so on, until I found the bad line of code that started the bug. The current bug I'm dealing with bounces around to different areas every time I make any code change, though, so I can't just change the debug code, rebuild and run again to find the source. I'm trying to create the simplest possible reproduction of the bug. It's almost surely memory corruption (or maybe even a bug in the timer code).
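The desync hunt described above boils down to two pieces: a cheap gate so that expensive logging only runs on the suspect tic and thinker, and a comparison of per-tic traces from a known-good build against the build under test. Here is a minimal sketch of both; `trace_enabled`, `first_divergent_tic` and the globals are hypothetical names, and the traces are assumed to be one logged value (such as player x) per gametic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t fixed_t;

/* Hypothetical debug gate: extra logging only fires on the suspect tic
   and thinker, so the rest of the timedemo runs at normal speed. */
static long debug_tic = -1;
static int  debug_thinker = -1;

static int trace_enabled(long tic, int thinker_index) {
    return tic == debug_tic && thinker_index == debug_thinker;
}

/* Compare one logged value per gametic from a known-good build (e.g.
   PCDoomV2) against the build under test. Returns the first differing
   tic, or -1 if the traces match. */
static long first_divergent_tic(const fixed_t *reference,
                                const fixed_t *test, size_t tics) {
    assert(reference != NULL && test != NULL);
    for (size_t tic = 0; tic < tics; tic++)
        if (reference[tic] != test[tic])
            return (long)tic;
    return -1;
}
```

Once the first bad tic is known, `debug_tic` and `debug_thinker` can be set to it and the search repeated one variable further back.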
sqpat Posted November 1, 2023 Phew, I was finally able to fix the main bug causing all the desyncs in the 16-bit build. A mobj pointer was being paged out and then continued to be used in P_DamageMobj. After fixing this bug, it seems all three demos play back. I'm going to continue some work on conventional memory allocation and then do some comparative benchmarks to see what helps most when it's in conventional memory.
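This class of bug is easy to reproduce in miniature. In the simulation below (hypothetical code, not RealDOOM's P_DamageMobj), EMS has a single physical slot. A pointer into the page frame is taken, then something else maps another page into that slot, and the write through the old pointer lands in the wrong object. The fix is to re-resolve the pointer after any call that might remap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MOBJS_PER_PAGE 16
#define NUM_LOG_PAGES  8

typedef struct { int16_t health; } mobj_t;   /* cut down for the sketch */

/* Simulated EMS with a single physical slot: enough to show the bug. */
static mobj_t ems_pages[NUM_LOG_PAGES][MOBJS_PER_PAGE];
static mobj_t frame[MOBJS_PER_PAGE];
static int    mapped = -1;

/* Make 'page' visible in the frame. Any pointer into the frame refers
   to whichever page is mapped right now, not when it was taken. */
static mobj_t *map_page(int page) {
    assert(page >= 0 && page < NUM_LOG_PAGES);
    if (mapped != page) {
        if (mapped >= 0)
            memcpy(ems_pages[mapped], frame, sizeof frame);
        memcpy(frame, ems_pages[page], sizeof frame);
        mapped = page;
    }
    return frame;
}

/* Buggy: touching the source's page remaps the frame, so the write
   through 'target' hits the source's object instead. */
static void damage_buggy(int target_page, int source_page, int16_t dmg) {
    mobj_t *target = map_page(target_page);
    (void)map_page(source_page);          /* e.g. reading source->type */
    target->health -= dmg;                /* stale pointer */
}

/* Fixed: resolve the target after everything that may remap. */
static void damage_fixed(int target_page, int source_page, int16_t dmg) {
    (void)map_page(source_page);
    mobj_t *target = map_page(target_page);
    target->health -= dmg;
}

static int16_t health_of(int page) { return map_page(page)->health; }
```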
slowfade Posted November 1, 2023 (edited) That's really good! Keep us updated on the project.
sqpat Posted November 2, 2023 Some good and bad news, I suppose. The bad news is that there are still some bugs (two that I can count) popping up from time to time and causing crashes during gameplay and demos; the good news is that they are somewhat uncommon and also pretty deterministic, so they should be easier to fix, and they don't get in the way of benchmarking too much. In the short term, in addition to fixing bugs, my focus is on maximizing performance with regard to what gets stored in EMS versus what little conventional memory we have.

Comparative Benchmarks with EMS/Conventional Allocations

Timedemo 3, run on 86box with a Pentium 233, high detail and a very small window, yielded the following results (percentages are relative to the everything-in-EMS baseline):

Everything in EMS:
Realtics 2134 (100% - yes, it ran exactly 1:1 with gametics)
6618473 reads from EMS memory manager (100%)
1746208 EMS page swaps (100%)

65535 bytes made available in conventional memory for sectors, lines, vectors, etc. (not enough for everything, but enough for a few large fields):
Realtics 1593 (74.6%)
4674431 reads from EMS memory manager (70.6%)
409409 EMS page swaps (23.4%)

Texture info cached in conventional memory (requires around 48000 bytes in shareware DOOM):
Realtics 1942 (91.0%)
5312437 reads from EMS memory manager (80.2%)
1363269 EMS page swaps (78.1%)

Sprite info cached in conventional memory (requires 7000 bytes in shareware DOOM):
Realtics 2122 (99.4%)
6528690 reads from EMS memory manager (98.6%)
1728079 EMS page swaps (98.9%)

Thinkers cached in conventional memory (I gave it around 50000 bytes, but supporting max thinkers = 1000 would require as much as 97000):
Realtics 2022 (94.8%)
4736768 reads from EMS memory manager (71.5%)
1518549 EMS page swaps (86.9%)

So it's pretty clear that caching the level variables like sectors causes the biggest speedup... Texture info is also good to cache. Sprite info seems like it may not be worth caching.
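The gap between "reads" and "page swaps" in these numbers comes from the page frame acting as a small cache: every access counts as a read, but only a miss costs a remap. A minimal counter sketch follows, assuming four physical slots and least-recently-used eviction; LRU is one plausible policy and not necessarily what RealDOOM's memory manager does, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SLOTS 4

static int      slot_page[NUM_SLOTS] = {-1, -1, -1, -1};
static uint32_t slot_last_use[NUM_SLOTS];
static uint32_t clock_tick;
static uint32_t ems_reads, ems_swaps;   /* the two counters reported */

/* Every access is a read; only a miss remaps a slot (a page swap).
   The least recently used slot is evicted; empty slots go first. */
static int ems_access(int page) {
    int victim = 0;
    assert(page >= 0);
    ems_reads++;
    clock_tick++;
    for (int s = 0; s < NUM_SLOTS; s++) {
        if (slot_page[s] == page) {
            slot_last_use[s] = clock_tick;
            return s;                      /* hit: no swap */
        }
        if (slot_last_use[s] < slot_last_use[victim])
            victim = s;
    }
    ems_swaps++;                           /* miss: remap a slot */
    slot_page[victim] = page;
    slot_last_use[victim] = clock_tick;
    return victim;
}
```

An access pattern that keeps revisiting the same few pages, as thinker code tends to, racks up reads while swaps stay low, which matches the thinker row above: reads dropped much more than swaps did.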
I'd like to make around 128 KB available for level data and also have the texture info cached eventually. Right now there's something like 50-100 KB available, depending on build settings. Another 20-30 KB at least can be freed up if overlays work (hard to tell while there are unrelated bugs), and I can probably save about the same from the stack eventually (it currently uses a 32 KB stack, but it should be easy to make some code changes and get this down to 8 KB or maybe less; that will require testing). And in theory a well-configured machine will have as much as 96 KB extra lying around in the upper memory area, so I think there will be a lot to work with once I write some code to make use of that space too. I have to consider that at some point sounds and music will require a lot of memory too, though.

Some data, like texture info, is pretty static and doesn't need to be changed after game start, while things like level data (sectors, etc.) don't really change in size after level load. That means you can just give them space in a contiguous memory blob and not get any fragmentation during the level; then between levels you just clear it out and re-allocate. So they don't even need complicated allocation management.

Thinkers are kind of a pain because they are constantly being created and destroyed, which causes some fragmentation. I made a really basic allocator that was kind of wasteful on memory and good enough to run a demo, and it seems it doesn't improve run speed much. I think what's going on is that while a lot of reads are being made for thinker objects, many of them are 'cache hits' on something already in the EMS page frame, meaning that the page swaps don't actually decrease much. It's probably best to leave these in EMS, though maybe I can allocate them a dedicated range of pages, which might lead to an even better cache hit rate.
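The "contiguous blob, cleared between levels" approach for level data is essentially an arena (bump) allocator. Here is a minimal sketch under that assumption; the `arena_t` type and function names are hypothetical, and the 2-byte rounding is just a reasonable choice for 16-bit x86, where word-aligned accesses are faster.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical level-data arena: bump allocation during level load and
   a single reset between levels. There are no per-object frees, so
   there is nothing to fragment. */
typedef struct {
    uint8_t *base;
    size_t   size;
    size_t   used;
} arena_t;

static void arena_init(arena_t *a, uint8_t *block, size_t size) {
    assert(block != NULL || size == 0);
    a->base = block;
    a->size = size;
    a->used = 0;
}

/* Returns NULL when the data does not fit (the caller could fall back
   to EMS). Sizes are rounded up to 2-byte alignment. */
static void *arena_alloc(arena_t *a, size_t bytes) {
    size_t rounded = (bytes + 1u) & ~(size_t)1u;
    if (rounded > a->size - a->used)
        return NULL;
    void *p = a->base + a->used;
    a->used += rounded;
    return p;
}

static void arena_reset(arena_t *a) { a->used = 0; }   /* between levels */
```

Bookkeeping is one size counter, compared with a header per block in a general-purpose zone allocator.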
I think the player object is still in EMS; I'd like to have it in conventional memory, but that might require some code rewriting. Hybrid visplanes are still active. I'm sure I can experiment with how many are in conventional memory and how many are in EMS to get better average performance. Currently 60 are in conventional memory, and any beyond that become EMS visplanes.

I think once this EMS performance is sort of "maxed out", we'll end up with about twice the framerate we had before. That's a nice start, but not enough: we will still be in 'fast Pentium or 486' territory for requirements at that point. I'm just starting to get back into the feel of everything in the codebase after being away for so long, but I should be able to more or less work on this every day for the rest of the year. Hopefully I can get a pretty stable build that is compatible with the DOOM shareware WAD sometime this month.
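A hybrid visplane scheme like the one described can be as simple as an index split: low indices resolve to a conventional-memory array, higher ones go through the EMS path. The sketch below uses vanilla's `MAXVISPLANES` limit of 128 and the 60/EMS split from the post; the cut-down `visplane_t`, the `get_visplane` name and the plain array standing in for EMS are all assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MAXVISPLANES        128   /* vanilla DOOM's limit */
#define CONVENTIONAL_PLANES 60    /* split mentioned in the post */

typedef struct {                  /* cut-down, hypothetical fields */
    int16_t height, picnum, lightlevel;
    int16_t minx, maxx;
    /* top[] and bottom[] arrays omitted in this sketch */
} visplane_t;

static visplane_t conventional_planes[CONVENTIONAL_PLANES];
/* Stand-in for EMS-resident planes; a real build would map the page
   holding the plane before returning a pointer into the page frame. */
static visplane_t ems_planes[MAXVISPLANES - CONVENTIONAL_PLANES];
static int ems_visplane_lookups;  /* how often the EMS path was taken */

static visplane_t *get_visplane(int index) {
    assert(index >= 0 && index < MAXVISPLANES);
    if (index < CONVENTIONAL_PLANES)
        return &conventional_planes[index];
    ems_visplane_lookups++;
    return &ems_planes[index - CONVENTIONAL_PLANES];
}
```

Since most views use fewer planes than the split point, tuning `CONVENTIONAL_PLANES` trades conventional memory against how often the slower path is hit.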
sqpat Posted November 5, 2023 I managed to free up a lot of conventional memory using overlays and reducing the stack (from 32 KB to 3 KB), and I also removed a bunch of debugging code that was slowing down runtime. Currently all level data (vertexes, sectors, sidedefs, linedefs, linebuffer, subsectors, nodes, segs) is in conventional memory, as well as texture and sprite info. This adds up to about 150 KB of the most-used data shoved into that space.

Using 86box, with every machine using an ISA Tseng ET4000AX, here are some demo3 results (in realtics) for various processors. Fullscreen Low Detail is screenblocks 10; Small Hi Detail is screenblocks 5. Demo3 is 2134 gametics, so 2134 realtics corresponds to 35 FPS.

Processor   Fullscreen/Low Detail   Small/Hi Detail   FPS (Fullscreen / Small)
P-233       1967                    1246              37.97 / 59.94
P-100       3507                    2325              21.30 / 32.12
P-75        4365                    3033              17.11 / 24.62
AMD P90     4336                    2854              17.22 / 23.92
AMD-120     5451                    3633              13.7 / 20.55
DX4-75      6350                    4241              11.76 / 17.6
DX-33       17225                   11968             4.34 / 6.24
386-40      21800                   15302             3.42 / 4.88
286-25      51107                   36074             1.46 / 2.07

I'd say the P-75/AMD P90 numbers correspond to "acceptable speeds" and the P-100 number corresponds to "good speeds". To get a 286-25 to those acceptable speeds, we're about 12x away (51107 vs. 4365 realtics), assuming 86box is accurate. There might be another big performance jump from freeing up enough conventional memory to put more visplanes there. I'm sure there's another big jump in optimizing the ASM for drawing, EMS code, and math. Potato detail might help some too. It's good to get some real numbers, though.

Short term, I may be moving code around to make overlays reduce memory more. (For example, in w_wad.c, we can pull the WAD initialization code out into a different file, because it's not needed during the rest of gameplay.)

I haven't committed the code for this build yet, but the exe is somewhat stable and attached below. Make sure you are using it in conjunction with the shareware DOOM WAD and the attached dstrings.txt, and use a very minimal DOS setup (the attached autoexec and config.sys are included).
MEM should show 619 KB or more available. This build is mainly configured to barely work with demo3. You can probably try other levels, but their data probably won't fit in conventional memory and will go into EMS. doom16.7z
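The FPS column in the results above follows directly from the realtics. In timedemo mode DOOM renders exactly one frame per gametic, and both gametics and realtics are 1/35 s, so FPS = gametics × 35 / realtics. The helpers below are hypothetical, but they reproduce the posted numbers, for example the P-233 fullscreen result and the roughly 12x gap between the 286-25 and the P-75.

```c
#include <assert.h>

#define TICRATE 35   /* DOOM runs game logic at 35 tics per second */

/* fps = gametics * TICRATE / realtics, since a timedemo renders one
   frame per gametic and realtics are also counted at 35 Hz. */
static double timedemo_fps(long gametics, long realtics) {
    assert(gametics > 0 && realtics > 0);
    return (double)gametics * TICRATE / (double)realtics;
}

/* How many times faster a machine has to get to match a target run. */
static double speedup_needed(long realtics_now, long realtics_target) {
    assert(realtics_now > 0 && realtics_target > 0);
    return (double)realtics_now / (double)realtics_target;
}
```

For example, `timedemo_fps(2134, 1967)` gives about 37.97, and `speedup_needed(51107, 4365)` gives about 11.7.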
sqpat Posted November 6, 2023 A few small improvements:

Using overlays and putting initialization code in them, so that it gets paged out, saved around 14000 bytes of conventional memory. It may have caused some bugs in level intermissions and in the 32-bit build, so I'm going to have to revisit this. I can still save some space by putting a few other things in the overlay, like shutdown code.

I cached the player mobj in conventional memory. It's hard to measure a speed improvement because it's within run-to-run noise, but I assume it was a small (1% at best) improvement. EMS paging went down 3 or 4 percent and a few hundred bytes were saved.

Thinkers were packed into their own dedicated EMS pages. Since most thinker code interacts with other thinkers, it's important to have them in the same pages, rather than fragmented and interspersed with other allocation types, to reduce paging in and out. The new thinker allocations do not have as much storage overhead as generic EMS allocations, so I was able to save 6000 bytes here too. Performance got around 2% faster, and EMS paging dropped 3 or 4 percent.

I've freed up 20000 bytes here, but honestly I'm running out of ideas on how to use this extra memory to improve performance further. Most of what's left in EMS is now data that gets used very infrequently, once per frame at most. We will eventually want more memory available for level data in bigger levels, or for texture info etc. in the commercial versions of the game, as well as memory for sound down the road, so maybe it's just going to be for those things.
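One way to get lower per-allocation overhead for thinkers is a fixed-slot pool: every slot has the same size, so there's no per-block header, and free slots are chained through their own first two bytes. This is a sketch of that general technique, not RealDOOM's allocator; `MAX_THINKERS = 1000` and the 64-byte slot size are assumptions. At those numbers the pool is 64000 bytes, which fits in four 16 KB EMS pages (65536 bytes).

```c
#include <assert.h>
#include <stdint.h>

#define MAX_THINKERS  1000      /* hypothetical cap */
#define THINKER_BYTES 64        /* hypothetical fixed slot size */
#define NO_SLOT       0xFFFFu

/* A free slot reuses its storage as the next-free link, so the pool
   needs no per-allocation header or size field. */
typedef union {
    uint16_t next_free;
    uint8_t  data[THINKER_BYTES];
} thinker_slot_t;

static thinker_slot_t pool[MAX_THINKERS];
static uint16_t free_head;

static void pool_init(void) {
    for (uint16_t i = 0; i < MAX_THINKERS; i++)
        pool[i].next_free = (uint16_t)(i + 1 < MAX_THINKERS ? i + 1 : NO_SLOT);
    free_head = 0;
}

/* Returns a slot index, or NO_SLOT when the pool is full. */
static uint16_t pool_alloc(void) {
    uint16_t i = free_head;
    if (i != NO_SLOT)
        free_head = pool[i].next_free;
    return i;
}

static void pool_free(uint16_t i) {
    assert(i < MAX_THINKERS);
    pool[i].next_free = free_head;
    free_head = i;
}
```

Freed slots are reused first, which keeps active thinkers clustered toward the start of the pool and therefore on fewer pages.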
viti95 Posted November 10, 2023 With FastDoom I learned that the biggest speedups came from optimizing the rendering code (ASM), so I guess that's the first thing to do after fixing all the bugs. Also, converting 32-bit variables to 16-bit or 8-bit will help a lot (I think @Frenkel is doing this on Doom8088).
Frenkel Posted November 10, 2023 Using 16-bit variables instead of 32-bit helps, but I'm not sure about 8-bit though. I've looked at some disassembled code and I've seen that it converts bytes to words. Using 32-bit variables as indices into arrays sometimes causes the compiler to crash :). I got a nice speed improvement by not bit-shifting 64-bit variables in FixedMul and FixedDiv. Not using FixedMul/FixedDiv at all, and thus not using 64-bit variables, also helps performance. I'm now working on potato mode in Doom8088. The view window is 240x128 pixels, but every 4 horizontal pixels are the same, so you effectively get a 60x128 graphics mode. Using flat walls, flat sky, and flat floors and ceilings, I get 6.5 FPS in 86Box emulating a 286 @ 25 MHz. What's next for Doom8088? Replacing info.c with getters ;)
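"Replacing info.c with getters" means turning a big table of structs that sits in data memory into functions that return each field, so the information becomes code that can live in an overlay or be paged out, and common default values cost almost nothing. A sketch follows; the name `getSpawnHealth` is hypothetical (neither port's code is shown in the thread). The four enum entries are vanilla's first four `mobjtype_t` values with their real spawn health, and a real getter would list every type that differs from the default.

```c
#include <assert.h>
#include <stdint.h>

/* The first four entries of vanilla's mobjtype_t, for illustration. */
enum { MT_PLAYER, MT_POSSESSED, MT_SHOTGUY, MT_VILE };

/* Replaces mobjinfo[type].spawnhealth. Only values that differ from
   the common default need a case. */
static int16_t getSpawnHealth(int type) {
    assert(type >= 0);
    switch (type) {
    case MT_PLAYER:    return 100;
    case MT_POSSESSED: return 20;
    case MT_SHOTGUY:   return 30;
    case MT_VILE:      return 700;
    default:           return 1000;  /* value many vanilla entries use */
    }
}
```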
sqpat Posted November 10, 2023 I've thought about comparing 8- and 16-bit variables but haven't done it yet. I've already converted as much as I could from 32 to 16 bits. In theory, if I were willing to accept rounding errors that made demos play back incorrectly, I could drop precision further on some items, but I'm trying to keep things 1:1 as much as possible. I have a custom fixed_point union where I can move 8- and 16-bit fields around instead of shifting, but this was done a long time ago, so I don't have any data on its performance. Bit shifting is slow before the 386, though, so I have to imagine it helps.

Yeah, I finished replacing info.c with getters a couple of days ago. There are also a number of structs I pushed into overlaid function getters so they get paged out after initialization. I've saved another 10 KB or so since the last post, and tried a variety of ways to use that memory to cache different things: most-used textures, flats, patches, etc. It really only adds up to around a 1% speed improvement for 32 KB of memory, so I think I've more or less hit the limit of reasonable speed improvements from caching. (Pulling from EMS doesn't take that long; it's mostly the overhead involved in page management. We've already reduced paging by over 95% from when nothing was cached, so there's not much further to take it.)

For sure, ASM improvements to the drawing and math functions will help a lot, and maybe just a lot of work on the math functions in general. At some point I will have to go through an ASM phase. Maybe some key functions can also be compiled with different, faster compilers and reintroduced into the codebase down the road. Lowering quality (untextured flats or potato mode) might be the next thing to work on, but I already know that these alone won't improve performance enough to get us to playable framerates on the fastest 16-bit CPUs. Maybe that's just how things will be, though. It sure would be nice to have a proper profiler.
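The fixed_point union trick relies on x86 being little-endian: in a 16.16 `fixed_t`, the low word is the fraction and the high word is the integer part, so reading or writing that word replaces a shift by 16. On a 16-bit target, a compiler may implement a 32-bit shift with several instructions or a helper call, while the union access is a single word load or store. The field and function names below are hypothetical, not RealDOOM's.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed_t;   /* 16.16 fixed point, FRACUNIT = 1 << 16 */

/* Assumes a little-endian layout (x86): low word = fraction,
   high word = integer part. */
typedef union {
    fixed_t w;
    struct {
        uint16_t fracbits;
        int16_t  intbits;
    } h;
} fixed_union_t;

/* Same result as x >> 16 with an arithmetic shift (floor). */
static int16_t fixed_int_part(fixed_t x) {
    fixed_union_t u;
    u.w = x;
    return u.h.intbits;
}

/* Same as i * FRACUNIT: write the integer word, clear the fraction. */
static fixed_t int_to_fixed(int16_t i) {
    fixed_union_t u;
    u.h.fracbits = 0;
    u.h.intbits = i;
    return u.w;
}
```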
viti95 Posted November 10, 2023 (edited) 1 hour ago, Frenkel said: Using 16-bit variables instead of 32-bit helps, but I'm not sure about 8-bit though. I've looked at some disassembled code and I've seen that it converts bytes to words. Yep, compilers aren't very smart. Under some conditions, with ASM coding, it's possible to process two 8-bit values packed into one 16-bit register at the same time with a single instruction (like SIMD). I use this trick extensively in FastDoom to convert 256-color backbuffered modes to other video modes. Also, 8-bit variables are better for the 8088, as its data bus is 8 bits wide. 1 hour ago, sqpat said: Lowering quality (untextured flats or potato mode) might be the next thing to work on, but I already know that these alone won't improve performance enough to get us to playable framerates on the fastest 16-bit CPUs. Maybe that's just how things will be, though. Potato mode is easy to implement, and it greatly reduces the number of OUT instructions issued per frame. If graphical fidelity is not an issue, it's possible to modify the visplane rendering to use flat colors without color diminishing; it's much faster as it doesn't require a conversion from columns to rows (I took this idea from @Optimus's OptiDoom). The main problem after optimizing the graphics routines is optimizing the game logic while maintaining demo compatibility at the same time; there are lots of optimizations that break it quite easily.
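The "two 8-bit values in one instruction" trick is a form of SWAR (SIMD within a register). As a small illustration, not FastDoom's actual conversion code, here are two 8-bit pixels packed into a 16-bit word and shifted right by 4 in one operation, for example to reduce each pixel to its top 4 bits. The shift moves the upper pixel's low nibble into the lower pixel's high nibble, and the mask removes exactly those stray bits, so both pixels come out as if shifted independently. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Shift two packed 8-bit pixels right by 4 with one shift and one AND.
   The 0x0F0F mask clears the bits that crossed the byte boundary. */
static uint16_t shr4_two_pixels(uint16_t two_pixels) {
    return (uint16_t)((two_pixels >> 4) & 0x0F0Fu);
}
```

For example, 0xABCD becomes 0x0A0C, the same as shifting 0xAB and 0xCD separately.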
Frenkel Posted November 12, 2023 I've refactored FixedMul so it doesn't use 64-bit integers, and I've replaced FixedDiv in the drawing routines with FixedMul(a, 1 / b) so they also don't use 64-bit integers. Now I get more than 10 FPS in timedemo 3 in 86box emulating a 286 @ 25 MHz.
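Frenkel's refactored FixedMul isn't shown in the thread, but a standard way to drop the 64-bit intermediate is to split each operand into a signed high word and an unsigned low word and sum the partial products. The derivation follows:

a = ah·65536 + al and b = bh·65536 + bl

so

(a·b) >> 16 = ah·bh·65536 + ah·bl + al·bh + ((al·bl) >> 16)

exactly, including the floor behaviour of an arithmetic shift. Every partial product has 16-bit operands, the size a 286's multiply instructions handle natively (the signed high words need extra care in hand-written assembly). The sketch below checks the 32-bit version against a 64-bit reference. It assumes `>>` on negative values is an arithmetic shift, as it is on mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed_t;   /* 16.16 fixed point */

/* Reference: vanilla-style FixedMul with a 64-bit intermediate. */
static fixed_t FixedMul64(fixed_t a, fixed_t b) {
    return (fixed_t)(((int64_t)a * b) >> 16);
}

/* Same result using only 32-bit arithmetic. Unsigned math wraps
   modulo 2^32, matching the final truncation of the 64-bit version. */
static fixed_t FixedMul32(fixed_t a, fixed_t b) {
    uint32_t ah = (uint32_t)(a >> 16), al = (uint32_t)a & 0xFFFFu;
    uint32_t bh = (uint32_t)(b >> 16), bl = (uint32_t)b & 0xFFFFu;
    uint32_t r = ((ah * bh) << 16) + ah * bl + al * bh + ((al * bl) >> 16);
    return (fixed_t)r;
}
```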
sqpat Posted November 12, 2023 My focus these past couple of days has been on taking as much as possible of what was in EMS and pulling it back into conventional memory, but also removing the EMS backup functionality, which means I don't need to go through an accessor and can just directly access pointers again. I'm guessing we can probably fit everything in memory after all, as I've freed up quite a bit. (Though once we're working on, say, DOOM 2 and not shareware DOOM, it might be tough.) It's looking like a solid runtime improvement (3-5%?), but it'll take a few days to work out some bugs. Funnily enough, code is mostly being reverted to how it used to be, with pointers and not MEMREFs being passed around everywhere.

Alongside this I'm refactoring mobj_t to be a lot smaller. Some fields, like the nightmare respawn data, are huge and very rarely used, and can stay in EMS; stuff like the player pointer is dumb and wasteful, since we can just compare mt_type or something instead. But if I want to fit thinkers in conventional memory after all, then there's a lot of code I can rewrite to be more compact again too. I used to have mobj_t at 97 bytes; I'd like to get it into the 60s. Multiplied by many hundreds or even a thousand (MAX_THINKERS), that will add up. I haven't brought upper memory blocks into play yet, but I'm guessing 64 KB from there will be enough. I don't even want to think about where sound code and data will fit.

Once this is all done, hopefully in a few days, I'll probably start on some baseline benchmarks again and then work on ASM math function improvements. Since the earlier benchmarks (Nov 4), something like 25-30 KB of memory has been freed up, and things are around 4-5% faster. There are definitely bigger and easier speed gains to be had with ASM down the road, but I want to get the core memory management code and patterns as tight as possible first.
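The mobj_t slimming described here is a hot/cold split. Vanilla's mobj_t carries a full `mapthing_t spawnpoint` (five shorts, 10 bytes) that is only read for nightmare respawns, plus a `player` pointer that is only valid when the type is MT_PLAYER. The sketch below moves the spawnpoint to a side table that could live in EMS and replaces the pointer with a type check. Apart from `mapthing_t` and `MT_PLAYER`, all names and the field selection are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed_t;

enum { MT_PLAYER = 0 };   /* vanilla's first mobjtype_t entry */

/* Vanilla's mapthing_t: five shorts, 10 bytes. */
typedef struct {
    int16_t x, y, angle, type, options;
} mapthing_t;

/* Hypothetical slimmed mobj: only fields touched every tic stay inline. */
typedef struct {
    fixed_t  x, y, z;
    fixed_t  momx, momy, momz;
    int16_t  health;
    uint8_t  type;            /* replaces the player pointer */
    uint8_t  movedir;
    uint16_t spawn_index;     /* index into the cold table below */
} mobj_hot_t;

#define MAX_SPAWNPOINTS 1000  /* hypothetical */
static mapthing_t spawnpoints[MAX_SPAWNPOINTS];   /* cold data */

static int is_player(const mobj_hot_t *mo) { return mo->type == MT_PLAYER; }

static const mapthing_t *spawnpoint_of(const mobj_hot_t *mo) {
    assert(mo->spawn_index < MAX_SPAWNPOINTS);
    return &spawnpoints[mo->spawn_index];
}
```

The inline cost of respawn data drops from 10 bytes to a 2-byte index, and the player pointer's 2 or 4 bytes disappear entirely.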
sqpat Posted November 20, 2023 Just a bit of an update. I've worked on various little improvements over the past week, like combining fields and removing certain cached data fields in lines, sectors, etc. that I decided didn't give enough speed benefit to be worth the memory. The biggest improvement ultimately was putting mobj_t, ceil_t, floor_t, etc. in conventional memory and then combining them with the thinker allocation list, which removed all the cross-referencing between the thinker list and the actual data (thinkers no longer need a pointer, and mobj etc. no longer need a thinker pointer). It was a somewhat noticeable speed jump: I got my first P133 demo3 realtics score in the high 1600s with screenblocks 5 and full quality, after starting at around 1850 a few weeks back.

The memory bugs are starting to pop up again, so I need to go on a bugfixing spree for now. I'm also going to update the readme on GitHub soon with something of a roadmap, but I want to work towards a 0.1 release by the end of the year that is 'mostly' stable for shareware DOOM. That 0.1 release will probably not have any ASM or significant math code improvements yet. I'd really prefer to do the 'memory stuff' before I start on the 'ASM stuff', because I don't want to have to rewrite ASM later after changing my structs.

Crazy Long Term Idea - RealDOOM is the OS?

I looked into EMS 4.0 functions a bit earlier, and I'm starting to look into them more now. Basically, EMS 4.0 is mostly an update to the EMS spec to support multitasking. The spec describes all kinds of possibilities, but in practice the main thing the hardware tends to support is mapping the 256-640 KB region of main memory. During in-game gameplay, DOOM has two main phases at runtime: the 'physics' portion (running thinkers, for the most part) and the render portion.
There are also a lot of variables that are only used in one of those phases: visplanes are never used during physics, and thinker data (mobj etc.) is never used during rendering. If you want to go really deep, there are certain fields inside sectors, lines, etc. that are only used in one or the other. Ideally, this data can be split into two blocks that map to the same region, and you can just swap from physics to render data as you switch between the two at runtime, akin to how a multitasker switches between two separate programs. It's a little complicated because EMS mapping is done in 16 KB pages aligned to 16 KB memory boundaries, and at runtime you have to get these memory blocks lined up dynamically, which feels like fighting the OS to get exact physical memory regions.

But thinking about it even more, there's a lot more potential here. Not only is there a lot of data that is only used during one phase or the other; there is also a lot of code that is only used during one phase or the other. Multitaskers using EMS 4.0 dynamically switch tasks in that EMS region, which obviously includes code. If a bunch of render code and physics code could also live in the same mappable memory region and similarly be swapped with a near-instant EMS call, even more memory becomes available. Now maybe your map's flats and the most-used textures will fit in conventional memory during the render phase. Maybe when the music or SFX interrupts are called, they do an EMS swap to load their data in and swap back before the interrupt ends, with sound and music also mapped into this region. It's not exactly simple for a real mode application to tell DOS (whether at link time, compile time, or runtime) "load this function's code at 0x40000". There's probably some way to do this with startup code that does run-time linking to put things where they need to be. This is approaching OS functionality, though.
And I think RealDOOM, in some idealized final form, is more or less becoming the operating system in one way or another, so that it has complete control of the address space and of where things are loaded, and can use the EMS multitasking functions to dynamically, 'instantly' and seamlessly swap big memory regions in and out at once. This isn't any sort of near-term goal, and I'm guessing the project will probably never get that far. So as a much more reasonable halfway step, I'm wondering if I can manage to allocate a 64 KB block at the end of conventional memory (576-640 KB) at runtime with some mallocs/reallocs/etc. Then I could make that 64 KB region work as the swappable data region between the two phases, which should already go a long way toward freeing up even more memory. Before that, I'll probably just pull 64 KB from upper memory and enjoy that "free" memory region first.
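The alignment problem mentioned above is easiest to see with real-mode segments. A segment counts 16-byte paragraphs, so a 16 KB EMS page is 0x400 paragraphs, and a mappable block must start on a segment that is a multiple of 0x400. 256 KB is segment 0x4000, 576 KB is 0x9000, and 640 KB is 0xA000, which gives 24 mappable pages for the whole 256-640 KB range and exactly 4 for the 576-640 KB block. A buffer allocated at an arbitrary segment has to be rounded up, which wastes up to almost 16 KB. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PARAS_PER_PAGE 0x400u   /* 16 KB / 16-byte paragraphs */
#define SEG_256K       0x4000u
#define SEG_576K       0x9000u
#define SEG_640K       0xA000u

/* Round a segment up to the next 16 KB EMS page boundary. */
static uint16_t align_to_ems_page(uint16_t seg) {
    return (uint16_t)((seg + (PARAS_PER_PAGE - 1)) & ~(PARAS_PER_PAGE - 1));
}

/* Whole 16 KB pages between two page-aligned segments. */
static unsigned pages_between(uint16_t start_seg, uint16_t end_seg) {
    assert(start_seg % PARAS_PER_PAGE == 0 && end_seg >= start_seg);
    return (end_seg - start_seg) / PARAS_PER_PAGE;
}
```

For example, a block that malloc places at segment 0x4001 can only start mapping at 0x4400, so nearly a full page is lost.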
viti95 Posted November 21, 2023 Another idea to reduce memory usage: compress the IWADs better. @fraggle has updated wadptr and fixed old issues; this reduces the amount of memory used for certain graphics, and especially the size of the sidedefs.
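The sidedef saving comes from sidedef packing: many linedefs use byte-for-byte identical sidedefs, so duplicates can be merged and the linedefs pointed at the shared copy. The sketch below shows only that idea and is not wadptr's algorithm. The struct is vanilla's 30-byte on-disk `mapsidedef_t`. A real packer also has to avoid merging sidedefs that the game changes at runtime, such as switch textures, because a change to a shared sidedef would show up on every line that uses it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Vanilla's on-disk sidedef: 30 bytes, no padding. */
typedef struct {
    int16_t textureoffset, rowoffset;
    char    toptexture[8], bottomtexture[8], midtexture[8];
    int16_t sector;
} mapsidedef_t;

/* Collapse identical sidedefs in place. remap[i] receives the new index
   of old sidedef i so linedefs can be rewritten; returns the new count.
   O(n^2) for clarity. */
static int pack_sidedefs(mapsidedef_t *sides, int count, int *remap) {
    int packed = 0;
    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        int j;
        for (j = 0; j < packed; j++)
            if (memcmp(&sides[j], &sides[i], sizeof sides[i]) == 0)
                break;
        if (j == packed)
            sides[packed++] = sides[i];
        remap[i] = j;
    }
    return packed;
}
```

Each merged sidedef saves 30 bytes in the WAD and one sidedef's worth of memory once the level is loaded.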