64Doom: Classic Doom on N64


jnmartin84 said:

I've picked up working on this again. I am implementing "network" multiplayer using the USB port on the 64Drive to communicate with a network bridge based on ZeroMQ. I'm working out the communication protocol now and have stubbed out the USB send/recv in the 64Doom codebase. If the Everdrive 64 allows programmer access to send and receive data over USB, I should be able to implement it for that cart as well, and it would work with the 64Drive version out of the box, since the PC bridge interface is identical for any platform that can send and receive arbitrary buffers of data over some comms interface to a PC. More soon.



Great to hear!


USB multiplayer went nowhere.

However, I pulled together much faster (roughly 5x) versions of memcpy and memset that made a huge performance difference. Gameplay in low detail mode is now faster than any pre-Xbox home console port and rivals playing Doom on a Pentium PC. I'm working on releasing a new binary after I fix the broken Controller Pak save game code.
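
To give a rough idea of where the speedup comes from (this is just a sketch of the general approach, not the actual 64Doom routine), a copy loop that moves a word at a time issues a quarter of the loads and stores of a byte-at-a-time loop:

# Sketch only, not the real code: assumes 4-byte-aligned src/dst and a
# length that is a multiple of 4. $a0 = dst, $a1 = src, $a2 = length in bytes.
copy_loop:
lw    $t0, 0($a1)        # move 4 bytes per iteration instead of 1
addiu $a1, $a1, 4
addiu $a2, $a2, -4       # independent work also hides the load delay
sw    $t0, 0($a0)
bgtz  $a2, copy_loop
addiu $a0, $a0, 4        # destination bump rides in the branch delay slot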

Just grabbed a fresh-from-GitHub copy of g_game.c, rewriting the entire save game stack.

Again.


Interesting, is there any recent video of it running on modern hardware?

Also, if you merge vanilla megawads into doom2.wad it should just work, right? So you can make roms with Plutonia, BTSX E1 etc :D


A new release is available on GitHub now. Only the ROM Builder and 64DOOM_README.TXT files have been updated. Source will follow in the next couple of weeks.

https://github.com/jnmartin84/64doom/blob/master/64DOOM_BUILDER.zip?raw=true

NEWEST 64DOOM ROM BUILDER TOOLKIT RELEASE DATE: 2016 NOVEMBER 23
"Newer, much faster Doom engine binary with mostly-fixed save game handling. There is a known bug where if you save a game, start a New Game from the Doom menu, then try to load your saved game, the game crashes. Loading games from the initial Doom main menu works fine however (reset the console after starting a new game if you wish to re-load your last saved game instead)."

VGA said:

Interesting, is there any recent video of it running on modern hardware?

Also, if you merge vanilla megawads into doom2.wad it should just work, right? So you can make roms with Plutonia, BTSX E1 etc :D


No to the first question.

Doubtful on the second. Plutonia and TNT are supported as "first-class citizens" by the Doom engine: they are separate WADs with unique filenames and can be built into 64Doom with the builder toolkit.


Save games are still iffy: sometimes an exception is thrown when you save a game and later try to load one (you'll know this has happened because you'll see a blue screen with a full register dump in white text). If that happens, reset the console and try loading again. Something about the way save and load interact is corrupting memory, and I haven't finished tracking down the issue yet.


I think I'll dig out the N64 and fire this up on the 64drive and record video again this weekend.

The performance improvements sound excellent as that was the biggest issue with the port as it was for me.

ReFracture said:

I think I'll dig out the N64 and fire this up on the 64drive and record video again this weekend.

The performance improvements sound excellent as that was the biggest issue with the port as it was for me.


Since I do not own an N64, this would be more than welcome.

ReFracture said:

I think I'll dig out the N64 and fire this up on the 64drive and record video again this weekend.

The performance improvements sound excellent as that was the biggest issue with the port as it was for me.

Play Plutonia on it!

Or TNT's Stronghold please!


I went down a bunch of different rabbit holes over the past 18 months or so.

Most of my wild ideas didn't pan out. Networking via USB was a bust since I never even got my PC-only prototype to work.

The RDP rendering was a little different. I did make some headway with performance improvements (by not acquiring and releasing a lock on the RDP and by not doing a SYNC_PIPE before each individual quad), but I could never quite figure out how to scale the column textures correctly without first software texture mapping the original column into a texture before sending it to the RDP. At that point it just seemed saner to do it all in software, so I stopped going down that route.

That is not to say I haven't made some significant improvements recently however...


I took a week or two to review the VR4300 documentation to really get intimate with its performance characteristics, especially the behaviors that negatively impact instruction throughput (the goals being to avoid pipeline stalls and to avoid going to memory wherever possible).

GCC's code generation back end for MIPS is pretty good, but it isn't perfect, and it turns out that it generates sub-optimal code in a lot of cases regardless of the optimization settings you pass it.

I already had results from prior profiling and knew where the hottest call sites were in the Doom engine when running with the software renderer.

Improvements were available by wringing out code changes in R_DrawColumn, R_DrawSpan, FixedMul and FixedDiv.

FixedMul and FixedDiv were spilling to memory even though they only use their input parameters and constants that fit into "IMMED"-type instructions to compute their return values. I was able to rewrite them so that they never execute a single LW/SW instruction. The prolog/epilog don't need to spill/restore or touch the stack pointer.
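
To give an idea of the shape of the rewrite (a sketch, not necessarily the exact code in the repo; the register choices are just for illustration), here is a FixedMul that computes (a*b)>>16 entirely in registers, with no stack frame and no LW/SW anywhere:

# Sketch of a register-only 16.16 fixed-point multiply.
FixedMul:
mult  $a0, $a1           # 32x32 signed multiply, 64-bit result in HI:LO
mflo  $t0                # low 32 bits of the product
mfhi  $t1                # high 32 bits of the product
srl   $t0, $t0, 16       # bits 16..31 of the product
sll   $t1, $t1, 16       # bits 32..47 of the product
jr    $ra
or    $v0, $t1, $t0      # combine the halves in the delay slot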

In the case of R_DrawColumn and R_DrawSpan, neither function takes in any arguments or returns a value.

I was able to write them from scratch in MIPS assembly without touching a single callee-saved register (that is, to only use temporary registers, the argument registers and the return value registers). GCC was unable to emit similar code. I was able to entirely remove the prolog/epilog code that touched the stack pointer and loaded/stored registers to/from memory on each and every call (up to 2,000 calls total per frame from my profiling). The code I wrote also has about 20 fewer instructions in the body of the inner texture-mapping loop in each function compared to the best assembly output GCC would produce regardless of -O settings.
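
Schematically (this is not GCC's literal output, just the general shape of what gets removed), the compiler-generated versions pay for a stack frame on every call:

# Schematic of a typical per-call prolog/epilog, not actual compiler output:
addiu $sp, $sp, -32      # set up a stack frame
sw    $s0, 16($sp)       # spill a callee-saved register
# ... function body ...
lw    $s0, 16($sp)       # restore it
jr    $ra
addiu $sp, $sp, 32       # tear the frame down
# The hand-written versions use only $t/$a/$v registers, so none of this
# is needed and $sp is never touched.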

In the case of both functions I also took time to make sure they were scheduled as optimally as possible to avoid pipeline hazards/stalls.
 


lw $t1, 0($t0)
add $t2, $t1, $a0


is an example of code that introduces a bubble into the pipeline to deal with the fact that the result of the MEM stage of the LW instruction isn't available to the EX stage of the ADD instruction regardless of register forwarding in the pipeline ($t1 in the case of the example above).

I was able to re-order EVERY INSTANCE of the LW/xxx instruction pairs in my hand-rolled R_DrawColumn and R_DrawSpan functions to avoid this hazard. Given that these instruction pairs showed up in the inner texture-mapping loop of each function, removing these hazards was huge for instruction throughput.
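
As a sketch of the kind of reordering (the filler instruction here is just an example), sliding any independent instruction between the load and its consumer is enough to hide the one-cycle load delay:

lw    $t1, 0($t0)
addiu $t0, $t0, 4        # independent work fills the load-delay slot
add   $t2, $t1, $a0      # no interlock: $t1 is ready by now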

I was able to find useful instructions to put in the delay slot of the branches in both functions in almost every case.
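
For example (schematic, not the actual loop): the instruction in a branch delay slot always executes on the VR4300, so instead of letting it go to waste as a NOP, a real instruction from the loop body can live there, as long as it is needed (or harmless) on both paths:

# wasted slot:
bne   $t0, $t1, loop
nop

# useful slot:
bne   $t0, $t1, loop
addiu $t2, $t2, 2        # executes whether or not the branch is taken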

Putting these improvements in place gave a significant rendering performance boost.

The output in high detail mode is now as fast / smooth as it is in low detail mode.

One final improvement to be made to the software renderer, in the category of somewhat low-hanging fruit, is to modify it to output directly to a 16bpp framebuffer instead of updating the Doom-internal 8bpp framebuffer and then blitting it to the N64 CFB at the end of each frame. That would save 76,800 byte reads per frame, or 2,688,000 byte reads per second (76,800 bytes per frame, one frame per update, 35 updates per second). In other words, rendering to the CFB / changing Doom to a true-color renderer would save roughly 2.5 MB/sec in memory reads. That would probably have a significant positive performance impact. Just a guess though. ;-)
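
The idea, as a sketch (the register names and colormap layout here are assumptions, not the real code): instead of storing a palette index byte into the 8bpp buffer and paying for the blit later...

sb    $t2, 0($t3)        # $t2 = palette index, $t3 = ptr into Doom's 8bpp buffer

...the column/span drawers would look the color up once and store the 16-bit RGBA5551 value straight into the CFB, and the end-of-frame blit (with its 76,800 byte reads per frame) disappears:

sll   $t4, $t2, 1        # index * 2: colormap entries are 16 bits wide
addu  $t4, $t4, $t5      # $t5 = base of an RGBA5551 colormap
lhu   $t6, 0($t4)
addiu $t7, $t7, 2        # advance the CFB pointer (the stride is an assumption here)
sh    $t6, -2($t7)       # store the true-color pixel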

A lot of this new assembly code is in the 64Doom GitHub already. 

I am going to do a new push of the whole code base in the next week or two.

31 minutes ago, jnmartin84 said:

Improvements were available by wringing out code changes in R_DrawColumn, R_DrawSpan, FixedMul and FixedDiv.

These are the usual suspects. DOS Doom had assembly versions of each of these, I think.

 

31 minutes ago, jnmartin84 said:


FixedMul and FixedDiv were spilling to memory even though they only use their input parameters and constants that fit into "IMMED"-type instructions to compute their return values. I was able to rewrite them so that they never execute a single LW/SW instruction. The prolog/epilog don't need to spill/restore or touch the stack pointer.

You didn't just make them into inline functions?

 

31 minutes ago, jnmartin84 said:

In the case of both functions I also took time to make sure they were scheduled as optimally as possible to avoid pipeline hazards/stalls.


lw $t1, 0($t0)
add $t2, $t1, $a0

is an example of code that introduces a bubble into the pipeline to deal with the fact that the result of the MEM stage of the LW instruction isn't available to the EX stage of the ADD instruction regardless of register forwarding in the pipeline ($t1 in the case of the example above).

I was able to re-order EVERY INSTANCE of the LW/xxx instruction pairs in my hand-rolled R_DrawColumn and R_DrawSpan functions to avoid this hazard. Given that these instruction pairs showed up in the inner texture-mapping loop of each function, removing these hazards was huge for instruction throughput.

 

I'm almost surprised gcc isn't smart enough to do this kind of thing automatically nowadays. Maybe I have too much faith in modern compilers. Do you have -march=vr4300 on the command line?


Personally, I'd be really surprised if all of a compiler's back ends were maintained as well as the mainstream ones. Off-beat platforms tend to get treated like red-headed stepchildren.

4 hours ago, fraggle said:

You didn't just make them into inline functions?

One thing at a time. :-D

 

But seriously, I only just got around to looking at these improvements. Any time I try to make more than one leap at once, I invariably fail and break my source tree.

 

One issue I can see with the assembly versions is that, unless the compiler transforms them somehow (maybe by adding back in the prolog/epilog code I ripped out), I don't think it would even be safe to inline them.

 

Another thing is that inlining would make my debugging more difficult.

 

The most I have besides "I_Error" output is the debugger built into MESS. The version I have doesn't even support loading in symbols, and the features it does support are iffy. Anything that changes the code too much between the C and the assembly/machine code output is a major no bueno for my purposes.


4 hours ago, fraggle said:

I'm almost surprised gcc isn't smart enough to do this kind of thing automatically nowadays. Maybe I have too much faith in modern compilers. Do you have -march=vr4300 on the command line?

For the most part the compiler will schedule instructions with hazards in mind. Sometimes, though, it does a weird job with register allocation, then a weird job of using those registers, and ends up in a situation where the only way to keep the program semantically correct is to let the pipeline bubble. Sometimes it even slaps a NOP in between, but not always.

 

Also, I've never seen it put anything but a NOP in the delay slot of a control-transfer instruction.

 

p.s.


CFLAGS = -DTRUECOLOR -std=gnu99 -march=vr4300 -mtune=vr4300 (... snip ...)



I'm making a fork of my code right now to start working on adding true-color software rendering, to see if those roughly 2.5 MB/sec worth of redundant LB byte reads in the blitter are slowing things down significantly.


This is a minor issue in the grand scheme of things, but how does the N64 deal with color space conversion, if at all? I assume everything on the N64 is converted to Rec.601 as opposed to the more modern Rec.709, and since sRGB uses the same color space as Rec.709, stuff might come out with a slight green or magenta cast to it. And that's not even mentioning whether or not the N64 uses "TV-safe" colors where anything below 16 is pure black and anything above 235 is pure white.

17 minutes ago, Linguica said:

This is a minor issue in the grand scheme of things, but how does the N64 deal with color space conversion, if at all? I assume everything on the N64 is converted to Rec.601 as opposed to the more modern Rec.709, and since sRGB uses the same color space as Rec.709, stuff might come out with a slight green or magenta cast to it. And that's not even mentioning whether or not the N64 uses "TV-safe" colors where anything below 16 is pure black and anything above 235 is pure white.

If you are asking in relation to the videos I posted, everything looks green because the CRT television I'm playing on was stored next to a 4x12" speaker cabinet for years and has terrible color reproduction due to damage from the magnets.


*shrug*

I just colormap from the original 8bpp pixels into a 16bpp RGBA5551 colormap and let the video hardware handle it.
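
For reference, a sketch of how one palette entry packs into RGBA5551 (the bit layout is the standard N64 16-bit format; the register choices are just for illustration): red in bits 15..11, green in 10..6, blue in 5..1, and the alpha bit at bit 0.

# $a0/$a1/$a2 = 8-bit R/G/B from the Doom palette, packed result in $v0
srl $a0, $a0, 3          # keep the top 5 bits of red
srl $a1, $a1, 3          # ...of green
srl $a2, $a2, 3          # ...of blue
sll $v0, $a0, 11         # red into bits 15..11
sll $a1, $a1, 6          # green into bits 10..6
or  $v0, $v0, $a1
sll $a2, $a2, 1          # blue into bits 5..1
or  $v0, $v0, $a2
ori $v0, $v0, 1          # set the alpha bit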


Also, I'm not really sure why we're talking about HDTV-related things, considering nobody was playing a Nintendo 64 on one when it came out in 1996.


On the other hand, I'd wager the majority of people playing a hardware N64 these days will be doing so on an HDTV. I know I've kept an LCD TV around with SCART and composite inputs for the N64 and a Megadrive, and I don't have any CRTs left.


True, but I feel like color reproduction issues on HDTVs are beyond the scope of the color conversion in a homebrew Nintendo 64 port. I convert from the palette format meant for DOS VGA mode into N64-native 16-bit RGBA5551 colors for later lookup. Anything after that is a concern for the end user, I think. :-)


Excellent work - I really like the assembly "details". I have zero experience with this processor, but it is quite intriguing to hear about code on non-x86 hardware powering an interesting application.

The color reproduction "issue" strikes me as a non-issue for a project like this, but maybe someone knowledgeable (and picky) enough could "touch up" the port once it has a stable release.


I forgot to mention in my recent posts, but if anyone is interested in following my day-to-day / week-to-week musings about 64Doom and the updates I'm making to it, I created a public Facebook group named (wait for it) 64Doom.

 

It recently attracted new membership from someone who was following the project through a YouTube video.

 

The group (well, really it is me) is gladly and graciously accepting all newcomers as long as they don't:

1) Ask me why I'm not working on (anything).

2) Be overly / overtly negative about the project or anything seen in the group.

3) SPAM.

