
64Doom: Classic Doom on N64


As for changes to 64Doom in this release:

 

* very fast assembly implementations of memcpy and memset lifted from MIPS Technologies code in GNU libc and Android

* optimal hand-rolled assembly routines for endianness conversion (SwapSHORT and SwapLONG) that I cannot find any way to make better or shorter; they are essentially one MIPS instruction per C operation at this point (see "m_swap.S")

* optimal-ish hand-rolled assembly routine for FixedMul; I didn't get around to FixedDiv yet but will push a version soon (see "m_fixedmul.S")

* hand-rolled assembly for R_DrawColumn and R_DrawSpan, the two functions responsible for drawing the entire in-game display except for the status bar. I took a lot of time to write these so that they have no prologue/epilogue and never spill registers. They use only caller-saved registers plus the argument and return-value registers, so nothing has to be saved or restored. They have early-return conditions that can be met after executing 4 to 5 instructions for columns and spans smaller than a pixel. Also, their texture-mapping loops (the "do { ... } while(count--);" loops at the end of the C versions) are shorter than any GCC-generated code at any optimization level, and the functions as a whole are 20 to 30 instructions shorter than any GCC-generated code (see "R_DrawColumn.S" and "R_DrawSpan.S")
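For reference, here is a sketch of the C semantics that the swap and FixedMul routines have to reproduce, modeled on the vanilla Doom versions but written with <stdint.h> types. This is not 64Doom's code; the `ref_` prefix marks the function names as hypothetical.

```c
#include <stdint.h>

typedef int32_t fixed_t;   /* 16.16 fixed point, as in Doom */
#define FRACBITS 16

/* Byte-swap a 16-bit value: two shifts and an OR. */
uint16_t ref_SwapSHORT(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}

/* Byte-swap a 32-bit value: move each byte to its mirrored position. */
uint32_t ref_SwapLONG(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}

/* 16.16 multiply: widen to 64 bits, multiply, then shift out the
   extra 16 fraction bits. A 64-bit CPU can do this with a 64-bit
   multiply followed by a 16-bit arithmetic right shift. */
fixed_t ref_FixedMul(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int64_t)a * (int64_t)b) >> FRACBITS);
}
```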

Edited by jnmartin84


Props!

 

Do you use a profiler to measure hotspots and gauge efficiency of your implementation?

 

On 12/13/2017 at 4:42 PM, _bruce_ said:

Props!

 

Do you use a profiler to measure hotspots and gauge efficiency of your implementation?

 

I had some test code to benchmark the memcpy/memset code. The two functions ran roughly 5x faster than the previous versions when executing on a real Nintendo 64 console. Benchmarks running in MESS are wildly misleading :-D

 

As for the video code, I'm going by instruction count, main-memory accesses, and knowledge of the pipeline to avoid stalls. Same for the word-swap and FixedMul code.

 

Also, regarding hotspots: I profiled the original linuxdoom code a decade or so ago, and profiled the renderer again a couple of months back when trying to write an RDP hardware renderer. There are also profiling results available online from other developers that I've referenced as needed.
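For readers unfamiliar with the routine being hand-optimized, here is a sketch of a column drawer modeled on vanilla Doom's C R_DrawColumn, including its early return and the "do { ... } while(count--);" texture-mapping loop. Vanilla keeps these parameters in globals (dc_yl, dc_yh, dc_iscale, dc_texturemid, dc_source, dc_colormap); the struct, its field names, and the `pitch` parameter below are hypothetical, and this is not 64Doom's assembly.

```c
#include <stdint.h>

typedef int32_t fixed_t;
#define FRACBITS 16

/* Hypothetical bundle of the values vanilla Doom passes through globals. */
typedef struct {
    int yl, yh;               /* first and last screen row of the column */
    int centery;              /* screen row of the view center */
    fixed_t texturemid;       /* texture row at centery, 16.16 */
    fixed_t iscale;           /* texture step per screen row, 16.16 */
    const uint8_t *source;    /* 128-texel column of texture data */
    const uint8_t *colormap;  /* 256-entry light-level remap table */
} draw_column_args_t;

/* dest points at row 0 of the target screen column; pitch is the row stride. */
void draw_column(const draw_column_args_t *dc, uint8_t *dest, int pitch)
{
    int count = dc->yh - dc->yl;
    if (count < 0)            /* early return: column shorter than a pixel */
        return;

    fixed_t frac = dc->texturemid + (dc->yl - dc->centery) * dc->iscale;
    dest += dc->yl * pitch;

    /* Per pixel: shift and mask to find the texel, two table lookups,
       one store, and two adds. Runs count + 1 times. */
    do {
        *dest = dc->colormap[dc->source[(frac >> FRACBITS) & 127]];
        dest += pitch;
        frac += dc->iscale;
    } while (count--);
}
```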


Got FixedDiv implemented in assembly (and working...) but have not attempted any optimization yet:

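FixedDiv:
        .global FixedDiv
        .set    noreorder
        .set    nomacro

        sra     t0,     a0,     31              # t0 = sign mask of a
        xor     t1,     a0,     t0
        subu    t1,     t1,     t0              # t1 = |a| (subu: no overflow trap on INT_MIN)
        sra     t2,     a1,     31              # t2 = sign mask of b
        xor     t3,     a1,     t2
        subu    t3,     t3,     t2              # t3 = |b|
        srl     t1,     t1,     14              # t1 = |a| >> 14
        slt     t4,     t1,     t3              # t4 = (|a| >> 14) < |b|
        beq     t4,     zero,   _FixedDiv_test  # (|a| >> 14) >= |b|: saturate (also catches b == 0)
        xor     t0,     a0,     a1              # delay slot: t0 = a ^ b
        sll     a0,     a0,     0               # sign-extend a to 64 bits
        sll     a1,     a1,     0               # sign-extend b to 64 bits
        dsll    a0,     a0,     16              # 64-bit a << 16
        ddiv    a0,     a1
        nop
        nop
        mflo    v0

_FixedDiv_end:
        jr      ra
        nop

_FixedDiv_test:
        bltz    t0,     _FixedDiv_return_INT_MIN
        lui     v0,     0x7FFF                  # delay slot: v0 = 0x7FFF0000

_FixedDiv_return_INT_MAX:
        ori     v0,     v0,     0xFFFF          # v0 = 0x7FFFFFFF (addiu would sign-extend 0xFFFF to -1)
        jr      ra
        nop

_FixedDiv_return_INT_MIN:
        lui     v0,     0x8000                  # v0 = 0x80000000
        jr      ra
        nop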
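The assembly above implements the same contract as vanilla Doom's FixedDiv: saturate when the 16.16 quotient cannot fit, otherwise divide with 64-bit intermediate precision. A C sketch of that contract, with a hypothetical `ref_` name; it takes |x| in unsigned arithmetic so INT32_MIN cannot overflow, and multiplies instead of left-shifting a possibly negative value.

```c
#include <stdint.h>

typedef int32_t fixed_t;
#define FRACBITS 16

/* |x| computed without signed overflow, even for INT32_MIN. */
static uint32_t ref_abs32(int32_t x)
{
    return x < 0 ? 0u - (uint32_t)x : (uint32_t)x;
}

fixed_t ref_FixedDiv(fixed_t a, fixed_t b)
{
    /* Saturate when the quotient would not fit in 16.16;
       this also covers b == 0. */
    if ((ref_abs32(a) >> 14) >= ref_abs32(b))
        return ((a ^ b) < 0) ? INT32_MIN : INT32_MAX;

    /* Widen to 64 bits before scaling so the fraction bits survive. */
    return (fixed_t)(((int64_t)a * (1 << FRACBITS)) / b);
}
```

When (|a| >> 14) < |b|, the quotient's magnitude stays below 2^30, so the cast back to 32 bits is safe. Because this sketch compares unsigned magnitudes, a divisor of INT32_MIN may not match vanilla Doom's abs()-based test or the signed slt in the assembly bit for bit.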


gcc's take on FixedDiv ... looks familiar :-o

 


        .file 1 "m_fixed.c"
        .set    nomips16
        .set    nomicromips
        .ent    FixedDiv
        .type   FixedDiv, @function
FixedDiv:
        .frame  $sp,0,$31               # vars= 0, regs= 0/0, args= 0, gp= 0
        .mask   0x00000000,0
        .fmask  0x00000000,0
        .set    noreorder
        .set    nomacro
        sra     $2,$4,31
        sra     $6,$5,31
        xor     $7,$2,$4
        subu    $7,$7,$2
        xor     $2,$6,$5
        sra     $7,$7,14
        subu    $6,$2,$6
        slt     $6,$7,$6
        bne     $6,$0,$L2
        dsll    $3,$4,16

        xor     $4,$4,$5

        bltz    $4,$L4
        nop

        li      $2,2147418112                   # 0x7fff0000
        j       $31
        ori     $2,$2,0xffff

 

$L2:
        move    $4,$5
        ddiv    $0,$3,$4
        teq     $4,$0,7
        mflo    $2
        j       $31
        sll     $2,$2,0

        sll     $2,$2,0

 

$L4:
        j       $31
        li      $2,-2147483648                  # 0xffffffff80000000

        .set    macro
        .set    reorder
        .end    FixedDiv


I feel like MIPS is one of those processor architectures where there aren't very many opportunities for "exotic" optimizations, just clever register usage and reuse (kind of like those old C tricks such as swapping without a temp variable, etc.).
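As an illustration of the kind of old C trick meant here (not code from 64Doom), the classic XOR swap exchanges two values without a temporary. Once both values are in registers, each step is a single MIPS xor. The function name is hypothetical.

```c
#include <stdint.h>

/* Swap two values without a temporary by using XOR.
   If a and b alias the same object, the sequence would zero it,
   so that case returns early. */
void xor_swap(uint32_t *a, uint32_t *b)
{
    if (a == b)
        return;
    *a ^= *b;   /* a = a ^ b           */
    *b ^= *a;   /* b = b ^ (a ^ b) = a */
    *a ^= *b;   /* a = (a ^ b) ^ a = b */
}
```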

 

In most cases, my code ends up nearly identical to gcc -O2 output after my first or second cleanup pass.


To be fair to GCC, the MIPS architecture (the MIPS III ISA in the case of the Nintendo 64, not counting the RSP's vector instruction extensions) is so simple that most things in C map directly to small sequences of MIPS instructions, sometimes one-to-one. Unlike the x86, a particularly egregious example (the whole RISC vs. CISC thing...), there aren't a half-dozen different ways to do something like copy a string. I'm not an Intel expert, but I can think of two different x86 opcodes that do just that: copy a string with a single instruction, rather than a dozen-ish MIPS instructions with a control-flow transfer.


Can you be more specific than that? Are you saying the ROM couldn't be made at all, or that it could be made but didn't work on your favorite emulator?

3 hours ago, Danfun64 said:

Can you be more specific than that? Are you saying the ROM couldn't be made at all, or that it could be made but didn't work on your favorite emulator?

The ROM cannot be made.


I'm seeing reports on another forum that there might be line-ending issues with the shell script I packaged in the toolkit.

On 12/15/2017 at 7:40 AM, jnmartin84 said:

I feel like MIPS is one of those processor architectures where there aren't very many opportunities for "exotic" optimizations, just clever register usage / reusage (kind of like those old C tricks like swapping without a temp variable, etc).

Makes sense really - it's a RISC architecture (one of the original ones), and the central idea behind RISC is to have fewer, simpler instructions that can be better optimized in the CPU design. Most of the time there really should be pretty much only one way to do things.

On 1/3/2018 at 10:23 AM, fraggle said:

Makes sense really - it's a RISC architecture (one of the original ones), and the central idea behind RISC is to have fewer, simpler instructions that can be better optimized in the CPU design. Most of the time there really should be pretty much only one way to do things.

The antithesis of things like the VAX which could compute a polynomial with a single CPU instruction. :-D
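For comparison, the polynomial evaluation the VAX POLY instruction performs in one step is essentially Horner's rule, which on a RISC machine becomes a short multiply-add loop. A minimal sketch in C; the function name and the highest-degree-first coefficient order are choices made for this example.

```c
#include <stddef.h>

/* Evaluate c[0]*x^(n-1) + c[1]*x^(n-2) + ... + c[n-1] with Horner's rule:
   one multiply and one add per coefficient. */
double horner(const double *c, size_t n, double x)
{
    double r = 0.0;
    for (size_t i = 0; i < n; i++)
        r = r * x + c[i];
    return r;
}
```

For example, the coefficients {2, -3, 1} describe 2x^2 - 3x + 1, which evaluates to 3 at x = 2.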


This should be moved to the Console section.

 

By the way, in the videos I've seen this runs smoothly. Wish I had an N64...

 

 


Thanks for the bump, I had no idea about this. I have an N64 flash cart, so I'd definitely like to try to get this running. The videos from years back make it look pretty great, all things considered.


Just started working on this again a week or two ago.

 

Have it running at 640x400 at a playable frame rate with sound and music. Unstable / buggy though. Going to try to do another release by April.

18 minutes ago, jnmartin84 said:

Just started working on this again a week or two ago.

 

Have it running at 640x400 at a playable frame rate with sound and music. Unstable / buggy though. Going to try to do another release by April.

I have loved this ever since it was announced.

 

Running at near Expansion Pak resolution is mighty impressive.

 

Are there any ideas to visually enhance the engine with new effects?

7 hours ago, jnmartin84 said:

Just started working on this again a week or two ago.

 

Have it running at 640x400 at a playable frame rate with sound and music. Unstable / buggy though. Going to try to do another release by April.

Cool! I'll update my thread to reflect the development status of this backport now.


The enhancement is that it's pushing 4x as many pixels around. There's not much CPU time left to do anything else unless I take another stab at drawing columns with the RDP.

1 hour ago, jnmartin84 said:

The enhancement is that it's pushing 4x as many pixels around. There's not much CPU time left to do anything else unless I take another stab at drawing columns with the RDP.

I can imagine, with it running near native N64 Expansion Pak resolution and at twice the resolution of vanilla Doom. Of the cross-platform source ports, 64Doom (and 64Doom2) is easily one of the more technically impressive ones, since you did a lot of under-the-hood work to make it sing with the N64 hardware :)

 

Impressive, once more!

Share this post


Link to post

Calling it "Expansion resolution" or "Expansion Pak resolution" is misleading. You can have two 640x480 16-bit color framebuffers on an unexpanded N64 and still have about 2.8MB of memory left for code and data.
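The framebuffer arithmetic is easy to check. A minimal sketch, assuming 4 MiB of RDRAM on an unexpanded system and 2 bytes per pixel, and ignoring anything else that libraries or the boot code reserve; the function names are hypothetical.

```c
#include <stdint.h>

#define RDRAM_BYTES     (4u * 1024u * 1024u)  /* unexpanded N64: 4 MiB */
#define BYTES_PER_PIXEL 2u                    /* 16-bit color */

/* Bytes needed for `count` framebuffers of width x height at 16 bpp. */
uint32_t framebuffer_bytes(uint32_t width, uint32_t height, uint32_t count)
{
    return width * height * BYTES_PER_PIXEL * count;
}

/* Bytes left for code and data after the framebuffers are allocated. */
uint32_t bytes_remaining(uint32_t width, uint32_t height, uint32_t count)
{
    return RDRAM_BYTES - framebuffer_bytes(width, height, count);
}
```

One 640x480 buffer is 614,400 bytes, so two take 1,228,800 bytes and leave 2,965,504 bytes (roughly 2.8 MiB) of the 4 MiB.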

Share this post


Link to post
1 hour ago, jnmartin84 said:

Calling it "Expansion resolution" or "Expansion pak resolution" is misleading.

It's a generic descriptor, relax. I only say this because Expansion Pak resolution is 640x480 and you aim for 640x400, so, generally speaking, that's "near N64 Expansion resolution", given that most Expansion Pak-enabled titles raised the resolution to 640x480. Hence the inclusion of "near".

 

There isn't some deeper meaning behind it.

11 hours ago, Redneckerz said:

It's a generic descriptor, relax. I only say this because Expansion Pak resolution is 640x480 and you aim for 640x400, so, generally speaking, that's "near N64 Expansion resolution", given that most Expansion Pak-enabled titles raised the resolution to 640x480. Hence the inclusion of "near".

 

There isn't some deeper meaning behind it.

And I'm trying to say that 640x480 high-res on the N64 and the Expansion Pak are two entirely unrelated things. I'm not going to go in circles over it, but they have absolutely nothing to do with each other. Zelda MM and DK64 are both Expansion Pak titles that run at 320x240.

NFL QB Club ran at 640x480 on an unexpanded system. That's the point I was making.

 

Also, "aiming for 640x400" is ignoring the built-in aspect ratio assumption of vanilla Doom. I'm running the N64 at 640x480 and rendering the game display in the middle of that.
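The numbers behind "4x as many pixels" and "in the middle of 640x480" work out as follows. This sketch assumes the 640x400 game view is centered vertically in a 640x480, 16-bit framebuffer. All names are hypothetical and not taken from 64Doom.

```c
#include <stdint.h>

/* Row at which a view of view_h lines starts when centered
   vertically on a screen of screen_h lines. */
uint32_t letterbox_top(uint32_t screen_h, uint32_t view_h)
{
    return (screen_h - view_h) / 2;
}

/* Byte offset of the first rendered pixel in a 16-bit framebuffer. */
uint32_t view_start_offset(uint32_t screen_w, uint32_t screen_h, uint32_t view_h)
{
    return letterbox_top(screen_h, view_h) * screen_w * 2u;
}

/* How many times more pixels resolution 1 has than resolution 0. */
uint32_t pixel_ratio(uint32_t w1, uint32_t h1, uint32_t w0, uint32_t h0)
{
    return (w1 * h1) / (w0 * h0);
}
```

With these assumptions, the view starts 40 rows down (byte offset 51,200), and 640x400 has 256,000 pixels versus 64,000 for vanilla's 320x200, a ratio of 4.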

2 hours ago, jnmartin84 said:

And I'm trying to say that 640x480 high-res on the N64 and the Expansion Pak are two entirely unrelated things. I'm not going to go in circles over it, but they have absolutely nothing to do with each other. Zelda MM and DK64 are both Expansion Pak titles that run at 320x240.

NFL QB Club ran at 640x480 on an unexpanded system. That's the point I was making.

What you are going for is an exact definition, including pointing out that there are titles that run at 640x480. This is all very true.

 

That was not what my original post was about. It was about a generic definition, not an exact one.

 

2 hours ago, jnmartin84 said:

Also, "aiming for 640x400" is ignoring the built-in aspect ratio assumption of Vanilla Doom.

Oh, come on, this is nitpicking if I ever saw it. You do not have to focus so intently on the word "aim" to have this response.


Your generic definition is wrong.

 

Anyway, the new release is up on the 64doom GitHub:

 

github.com/jnmartin84/64doom

1 hour ago, jnmartin84 said:

Your generic definition is wrong.

There is no inherent point in saying this. Besides, the post you are responding to is over a month old.

 

Quote

Anyway, the new release is up on the 64doom GitHub:

 

github.com/jnmartin84/64doom

Congratulations on the new release. Here it is linked: 64Doom.

 

Oh, there is no new release under the Releases tab yet; it still points to the 2017 version labeled 1.0. Is the new release still uploading?

 

Changes (for those who are interested):

  • New version of 64Doom that now runs at 640x400 in high detail mode (system resolution 640x480).
  • Low detail is 320x400.
  • High detail is kinda slow; low detail is fast.
  • Sound and music rendering and mixing are now entirely fixed-point.
  • Optimizations in many places.
  • Updated builder.
  • Pulled in the toolkit line-ending fix, so hopefully no more complaints about that.
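The release notes don't show the mixer itself, so the following is only a hypothetical sketch of what "entirely fixed-point" mixing can look like. It uses 16.16 read positions for resampling, integer volume scaling, and clamping to 16-bit output. The channel struct, the 0..127 volume range, and the mono output are assumptions; the 8-bit unsigned input matches the format of Doom's sound effects.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-channel state. */
typedef struct {
    const uint8_t *data;   /* 8-bit unsigned samples */
    uint32_t length;       /* number of samples */
    uint32_t pos;          /* 16.16 fixed-point read position */
    uint32_t step;         /* 16.16 increment per output sample */
    int32_t volume;        /* 0..127 */
} channel_t;

/* 16.16 step for resampling src_rate to dst_rate (11025 -> 22050 gives 0x8000). */
uint32_t resample_step(uint32_t src_rate, uint32_t dst_rate)
{
    return (uint32_t)(((uint64_t)src_rate << 16) / dst_rate);
}

static int16_t clamp16(int32_t s)
{
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (int16_t)s;
}

/* Mix nch channels into `frames` mono 16-bit samples using only integer math. */
void mix_fixed(channel_t *ch, size_t nch, int16_t *out, size_t frames)
{
    for (size_t f = 0; f < frames; f++) {
        int32_t acc = 0;
        for (size_t c = 0; c < nch; c++) {
            channel_t *p = &ch[c];
            uint32_t idx = p->pos >> 16;                   /* integer part of position */
            if (p->data == NULL || idx >= p->length)
                continue;                                  /* channel finished */
            int32_t s = ((int32_t)p->data[idx] - 128) * 256; /* unsigned 8-bit -> signed 16-bit */
            acc += (s * p->volume) / 128;                  /* volume as a 7-bit fraction */
            p->pos += p->step;
        }
        out[f] = clamp16(acc);                             /* avoid wraparound on loud mixes */
    }
}
```

Keeping the accumulator at 32 bits and clamping only once per output sample means several loud channels saturate cleanly instead of wrapping around.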
