Jump to content
Search In
  • More options...
Find results that contain...
Find results in...
SaladBadger

Doom 0.5 reverse engineering project

Recommended Posts

1 hour ago, SaladBadger said:

https://github.com/InsanityBringer/Doom05RE

 

I've finally gotten my Doom 0.5 RE project to a point where it's basically feature complete, so I've created a repo for it. The port I've developed is extremely barebones, and mostly exists just to verify that things are functional, but it does the job.

 

I dunno what else to do with this. My main intentions are to write an article similar to Fabien's article on how the BSP based Doom renderer works, and try to delve deep into why the original recursive algorithm was so slow. Additionally, I'd like to also possibly port it on top of Chocolate Doom's gameplay code and see if it would be possible to make it work with features like Build-like moving sectors. In practice, any serious port would be better off adopting Eternity like portals and clever use of portals and polyobjects for complicated moving things, but it might be neat to see what was originally possible. Beyond that, there's not much of interest in an engine derived from alpha-quality game code, beyond being a time capsule of what Doom's code was once like.

Epic Sauce.

 

Are you aware of Newdhack? It was an attempt to recreate the Alpha version.  Later on there was a recreation of 0.8 in a beta engine. See here.

Share this post


Link to post

I'm somewhat familiar with xttl and Quasar's work, and I remember many years ago Quasar talking about architectural differences in the pre-beta release. It's all very neat, and I admire the things the two have managed to pull off. So far as I'm aware though there's never been much work done with any of the earlier alpha versions, which is a shame since they bury lots of interesting secrets and tell a tale of the game's evolution, especially on the rendering front.

Share this post


Link to post

doom_05_renderer_demo.gif.3e5b3d1e3642c9fa8dd980d44b1efdbc.gif

Trying some ways to visualize the recursive rendering for the Article I Want to Write On The Alpha Doom Renderer. I think this works out quite nicely, and indicates some of the fun aspects of the recursive stepping problems John Romero brought up, like how sector 10 is stepped into twice and sector 1 is stepped into 4 times in this view.

 

The lines that appear before each iteration are the clipping bounds for that pass.

Share this post


Link to post

Very interesting. I don't know if this old version of E1M2 still exists in the game files, but would you be able to test the renderer on it? Romero said it "slowed the game way down" during his Doom Post-Mortem.

 

image.png.7c162665bd61917e48e2c8bbb5ed480c.png


 

 

Share this post


Link to post

Do you have any notes on your workflow for analysing the alpha binaries? I found some scattered hints in the exe hacking thread on things like how to make Ghidra more useful with programs that use dos extenders but it'd be great to have something more condensed.

 

In case you're still looking for additional references for your article, I noticed that there's some loading and saving code for the alpha's map format in the DoomEd source release.

Share this post


Link to post

TBH being my first time around my approach wasn't particularly rocket science, so I don't know how helpful I can be, but I'll gather up all I did here and some related notes.

 

To get started, I was pointed to ghidra and the LX loader addon, and I recalled xttl's tip from the exe hacking thread on how to strip the dos extender: Delete all bytes in the executable up to the second MZ header. (I was also pointed out to the fact that there's a Watcom binding tool that can do it present by mistake in one of the earliest Descent shareware releases, but I couldn't find it. It's not needed, though).

 

I chose Doom 0.5 specifically because it was built with full debugging information (though this does have some side effects, since it means stack checking is enabled, which interferes with the decompiler in some weird ways), which accelerated things a lot because it contained full information on all of the structures in the executable, as well as complete listings of all stack and register variables in every function. Sadly, I don't think there's been a lot of effort on figuring out the formats for this data, only the more limited symbol tables that are present in 0.4 up to the PRB. I just kinda brute forced it, since before each field name in a structure, there's a byte that starts at A, followed by a byte that contains the offset of the field. The A seems to be part of the offset, as it becomes B on some larger structures. This shouldn't be too hard to figure out in detail.

 

With structures and function names in hand, I set out to map the functions and global variables, which was an adventure. This, in theory, could be done much faster with a script, since xttl has figured out the format of the symbol tables. Even then, though, Watcom's linker is simple and things simply appear in the order they appear in the symbol table (which is the order they appear in the core source code), so that's how I ended up mapping things. I started from the first couple of functions in the executable (though the ASM texture mappers are the first bits of code actually present), and did a first pass of the entire executable by simply naming functions and globals while filling in rough params, and then I went through to assign and verify types and clean things up on another pass.

 

I've been pointed to this language spec file which implements Watcom's unusual calling convention (first four params are eax, edx, ebx, ecx, and then the remainders are passed RTL on the stack like usual), which helped make the work go faster. From there it was just smooth sailing as I figured out how to work the program.

 

One thing I did have to get used to was identifying patterns. FixedMul and FixedDiv are inlined, unsurprisingly, which leads to some nasty looking C code:

    lVar1 = (longlong)(extraout_EDX << 3) * (longlong)cosines[temp_37f9af4087b];
    player->momx = (uint)lVar1 >> 16 | (int)((ulonglong)lVar1 >> 32) << 16;
    lVar1 = (longlong)(extraout_EDX << 3) * (longlong)sines[temp_37f9af4087b];
    player->momy = (uint)lVar1 >> 16 | (int)((ulonglong)lVar1 >> 32) << 16;

FixedMuls can be identified as a IMUL reg; SHRD EAX, EDX, 0x10; in the disassembly, but eventually I just got used to identifying the pattern in the C code. (also the register work throws it off, in this function extraout_EDX is just one of the params to the function)

 

Once it was time to port things, I mostly ported things by parts. Ported the low level code to my new SDL backend, ported the asm drawing routines, ported the level loader, then ported the renderer, ported the play loop, and then finally ported the overall game and demo loop. All throughout this I did cleanup on things that weren't quite correct or I messed up on.

 

This is about all of the specific details I remember encountering on this project.  Also, it's pretty cool that level IO code for this format is present in the DoomEd source. The general format of the level files has been known for a while, but it's pretty interesting to see that the level editor at one point was capable of reading/writing usable maps directly. It should also be a little helpful for the level convertor I'm slowly working on, since it shows how the variable length structures are created (which yeah, I can figure out myself, it's not hard, but given the option to be lazy and read something I'll take it)

Share this post


Link to post

Thanks again, really good writeup! I hope you continue to document this, I'm glad people are interested in preserving/exploring this part of history.

 

I had also been wondering about the "missing link" of the Doom renderer's history - the SNES port of Wolf3d that had Carmack's first iteration of using BSP. The best info I have found so far is https://eludevisibility.org/2018/super-noahs-ark-3d-source-code which should be close (e.g. some familiar algorithms), but this is probably better for a separate thread.

Share this post


Link to post

For whatever it's worth, I had some time so I started cleaning up the code. There's a lot of super raw straight-from-the-decompiler-and-then-cleaned-up-to-function-but-not-very-pretty code in the thing, which was bothering me, so I went ahead and used the debugging information present in the executable to add accurate stack variable names where I can and also went ahead and changed while loops back to for loops when they were originally fors and so on. I've done this for the playsim only so far, but I want to do it to all the game's modules, and 0.5 is the only version where we can get such accurate information.

 

One thing I'm a little on the fence about is whether or not I should heavily revise functions to match what they look like in the actual doom source. compare my version of EV_VerticalDoor with the one in the official Doom source release. The flow for things how it handles a door sector already having a thinker is somewhat different, and this can be noticed with a lot of these funcs. I kinda want to try reconstructing the source code as closely as possible, but it's a lot of work for little gain.

 

I'm also still eyeballing getting this working with Open Watcom or a similar DOS compiler because I can't make this the "definitive" source because certain bugs would cause segfaults on modern operating systems if I kept them. Ah well, at least the gameplay logic and rendering seem accurate, accurate enough to run the original demos, for whatever it's worth. At least they desync in the same way as the original.

Share this post


Link to post

Honestly, I should have done this the moment I started this project all the way back in 2020, but I've finally decided to go back and do two things to the code. The first is code cleanup, I realized that since the code was compiled without any optimizations and full debugging information, I can get something very close to the exact statements that would have been on Carmack's monitor 30 years ago, you can basically see it in the assembly. This means I finally have an accurate version of MergeSort, rather than some barely functioning monstrosity cobbled together from ghidra beating at 3 different versions of it, and all local variable names are accurate. A lot of this has been committed to the github, but I've been dragging my feet on the rest of it..

 

Secondly, I've finally brought the code back to DOS.

dosbox_005.png.b2603f9da1ea4e73d5f1d1558240e8a2.png

It's functioning, and about 99% complete. I mostly need to get two routines back to assembly (the high color span drawer, which is unused since the high color column drawer is bugged in the 5/22 alpha and goes into an infinite loop, and the assembly version of IO_BlitBlocks which since it is unrolled I want to set up a macro for). Other than that all the DOS specific code has been reconstructed. The build environment is currently using Open Watcom since I've been too lazy to find the version originally used, and TASM 3.1 since Open Watcom's assembler is basically unusable for me. The dos4gw stub program is also compiled using the 16-bit compiler, but it seems like the one the alphas actually used was compiled with Borland C.

 

I would like to try to get a perfect executable like with the gamesrc recreations, but I dunno if I will. It's a lot of work for a half finished prerelease version of a game, and it would involve poking at DMX a lot. The version of DMX included in Alpha 0.5 is very different than any other version of DMX seen, and isn't represented anywhere within the leaked DMX sources. This version doesn't contain any device-specific code in the library itself, instead it defined a driver interface where a real mode driver would perform the device specific functions and communicate with the DMX library in the protected mode library with a block of memory registered in a real mode interrupt vector... so pointless exposition aside, the only way I could get an accurate executable would be to somehow extract those functions out of the executable and stuff them back into a static library (I've seen it done before), or reverse engineer all of that version of DMX and recompile it. Even though I can't test it without any of these driver programs which were never released or leaked.

Share this post


Link to post

Frankly, I'm just glad to see this effort is still alive and hasn't been abandoned. It's still hella neat in my book.

Share this post


Link to post
3 hours ago, SaladBadger said:

It's functioning, and about 99% complete. I mostly need to get two routines back to assembly (the high color span drawer, which is unused since the high color column drawer is bugged in the 5/22 alpha and goes into an infinite loop, and the assembly version of IO_BlitBlocks which since it is unrolled I want to set up a macro for). Other than that all the DOS specific code has been reconstructed. The build environment is currently using Open Watcom since I've been too lazy to find the version originally used, and TASM 3.1 since Open Watcom's assembler is basically unusable for me. The dos4gw stub program is also compiled using the 16-bit compiler, but it seems like the one the alphas actually used was compiled with Borland C.

 

I would like to try to get a perfect executable like with the gamesrc recreations, but I dunno if I will. It's a lot of work for a half finished prerelease version of a game, and it would involve poking at DMX a lot. The version of DMX included in Alpha 0.5 is very different than any other version of DMX seen, and isn't represented anywhere within the leaked DMX sources. This version doesn't contain any device-specific code in the library itself, instead it defined a driver interface where a real mode driver would perform the device specific functions and communicate with the DMX library in the protected mode library with a block of memory registered in a real mode interrupt vector... so pointless exposition aside, the only way I could get an accurate executable would be to somehow extract those functions out of the executable and stuff them back into a static library (I've seen it done before), or reverse engineer all of that version of DMX and recompile it. Even though I can't test it without any of these driver programs which were never released or leaked.

Thank you for all your effort you have put into reverse engineering Doom 0.5! I feel that the pre-release versions of Doom games are much underappreciated in the community (last time I asked here about a possible decompilation of the October 1993 build, I was just told "there's nothing interesting left to find" even though I do find how these builds tick under the hood to be interesting in and of itself) so every time I see them getting attention I have to point it out. Hope your efforts lead to a full gamesrc-like decompilation (though it would take a lot of hard work to get there).

Edited by Individualised

Share this post


Link to post
On 11/17/2020 at 7:12 AM, SaladBadger said:

and try to delve deep into why the original recursive algorithm was so slow

From the gifs, the algorithm seems very similar to how Build renders the world.

 

Main reason it is slow, compared to Doom anyway, is that it revisits the same parts of the map repeatedly (for each recursive horizontal range).  Years ago I implemented the Build algorithm in an OpenGL port of Doom and found it is fine for simple geometry, but scales very poorly as map geometry gets more complex.  Hence I abandoned that experiment.

Share this post


Link to post

indeed, from what little I understand of the Build architecture it's fairly similar. find visible portals, recurse into them keeping a list of subsectors that it wants to check out, using clipping to restrain them to the bounds of that portal. It's pretty cool that the limits of the rendering code allow for perfect clipping during the recursion, but it's also not hard to see why a new algorithm for iterating sectors was desired. Build had way more time to beat at it to make it perform well.

Share this post


Link to post

Looks like another project that I've missed and should be aware of! I've also let @nukeykt know.

 

Even if I did spot this project beforehand, I was surely not aware of the recent (re)introduction of DOS support until the previous day or night. 6 months preceding Doom shareware v0.99/1.0 were surely a significant development. When it comes to the reverse-engineering department and getting things accurate, I can totally understand the technical difficulties that ones may have to go through. I misremembered or wrongly assumed the early versions to not use DMX at all, given the lack of sound output; Well, that was apparently the case for versions of Doom preceding 0.5, while v0.5 had quite early DMX code. Having non-optimized binaries surely assists, and so does debugging information.

Share this post


Link to post

Ah, I was hoping to have all the code commited to the repo by now, but I noticed there's a bunch of functions that slipped through my fingers with regards to clean up, so I'm busy finishing that up. I did commit some of it though.

 

As a cop out, I decided to see if I could make the demos play in sync correctly, and I seem to have managed it. here's a video.

 

Share this post


Link to post

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×