jnmartin84 Posted December 13, 2017 I just pushed updated code along with the dependencies that had been missing for the last three years and an updated 64Doom ROM Builder Toolkit to GitHub: https://github.com/jnmartin84/64doom Enjoy.
jnmartin84 Posted December 13, 2017 (edited)
As for changes to 64Doom in this release:
* very fast assembly implementations of memcpy and memset lifted from MIPS Technologies code in GNU Lib C and Android
* optimal hand-rolled assembly routines for endianness conversion -- SwapSHORT and SwapLONG -- that I cannot find any way to make better / shorter; they are basically one MIPS instruction per C operation at this point (see "m_swap.S")
* optimal-ish hand-rolled assembly routine for FixedMul; didn't get around to FixedDiv yet but will push a version soon (see "m_fixedmul.S")
* hand-rolled assembly for R_DrawColumn and R_DrawSpan, the two functions responsible for drawing the entire in-game display except for the status bar. I took a lot of time to write these such that they have no prologue/epilogue and spill no registers at any time. They only use caller-saved registers along with the argument and return value registers, so nothing has to be saved or restored. They have early-return conditions that can be met after executing 4 to 5 instructions for columns and spans smaller than a pixel. Also, their texture-mapping loops (the "do { ... } while(count--);" loops at the end of the C versions) are shorter than any GCC-generated code at any optimization level, and the entire functions themselves are shorter than any GCC-generated code by 20 - 30 instructions each (see "R_DrawColumn.S" and "R_DrawSpan.S")
Edited December 15, 2017 by jnmartin84
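For readers who have not looked at the C side, here is a minimal sketch of what the swap and FixedMul routines compute. It is written from the usual Doom definitions, not copied from "m_swap.S" or "m_fixedmul.S"; the fixed-width types are an assumption made for portability, and the right shift in FixedMul assumes an arithmetic shift for negative values, as GCC provides.

```c
#include <stdint.h>

typedef int32_t fixed_t;          /* 16.16 fixed point, as in Doom */
#define FRACBITS 16

/* Swap the two bytes of a 16-bit value. */
uint16_t SwapSHORT(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}

/* Reverse the four bytes of a 32-bit value. */
uint32_t SwapLONG(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}

/* Multiply two 16.16 values through a 64-bit intermediate.
   Assumes >> on a negative int64_t is an arithmetic shift. */
fixed_t FixedMul(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int64_t)a * (int64_t)b) >> FRACBITS);
}
```

Each shift, mask and OR here corresponds to roughly one MIPS instruction, which is the sense in which the assembly is "one MIPS instruction per C operation".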
_bruce_ Posted December 13, 2017 Props! Do you use a profiler to measure hotspots and gauge efficiency of your implementation?
jnmartin84 Posted December 14, 2017 (edited)
On 12/13/2017 at 4:42 PM, _bruce_ said: Props! Do you use a profiler to measure hotspots and gauge efficiency of your implementation?
I had some test code to benchmark the memcpy/memset code. The two functions ran something like 5x faster than the previous versions when executing on a real Nintendo 64 console. Benchmarks running in MESS are wildly misleading :-D . As for the video code, I'm going by instruction count, accesses to main memory, and pipeline knowledge to avoid stalls. Same for the word swap and FixedMul code. Also, in regard to hotspots, I had profiled the original linuxdoom code a decade or so ago and profiled the renderer again a couple of months back when trying to code an RDP hardware renderer. There are also profiling results available online from other developers that I've referenced as needed.
Edited December 20, 2017 by jnmartin84
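To make the "instruction count and memory accesses" point concrete, here is a rough C sketch of the kind of column loop being hand-optimized. It is not the 64Doom source: the `column_t` struct and `draw_column` name are hypothetical (the real renderer uses globals), an 8-bit destination and a 128-texel column are assumed, and `count` here is the number of pixels rather than the "count minus one" convention of the original C.

```c
#include <stdint.h>

#define FRACBITS 16

/* Hypothetical parameter block for illustration only. */
typedef struct {
    const uint8_t *source;    /* texture column, assumed 128 texels tall */
    const uint8_t *colormap;  /* 256-entry light-level remap table */
    uint8_t *dest;            /* first destination pixel */
    int pitch;                /* bytes between screen rows */
    int count;                /* number of pixels to draw, may be <= 0 */
    int32_t frac;             /* 16.16 starting texture coordinate */
    int32_t fracstep;         /* 16.16 step per screen row */
} column_t;

/* Draws one vertical column and returns the number of pixels written. */
int draw_column(const column_t *c)
{
    uint8_t *dest = c->dest;
    int32_t frac = c->frac;
    int n = c->count;

    if (n <= 0)               /* early return: nothing visible */
        return 0;

    do {
        /* two loads (texel, colormap entry) and one store per pixel */
        *dest = c->colormap[c->source[(frac >> FRACBITS) & 127]];
        dest += c->pitch;
        frac += c->fracstep;
    } while (--n);

    return c->count;
}
```

Per pixel the loop does two loads, one store and a handful of ALU operations, so counting instructions and memory accesses in the loop body is a reasonable proxy for its cost.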
jnmartin84 Posted December 15, 2017 (edited)
Got FixedDiv implemented in assembly (and working...) but have not attempted any optimization yet:

FixedDiv:
    .global FixedDiv
    .set noreorder
    .set nomacro
    sra   t0, a0, 31                  # t0 = sign mask of a
    xor   t1, a0, t0
    sub   t1, t1, t0                  # t1 = abs(a)
    sra   t2, a1, 31                  # t2 = sign mask of b
    xor   t3, a1, t2
    sub   t3, t3, t2                  # t3 = abs(b)
    srl   t1, t1, 14                  # t1 = abs(a) >> 14
    slt   t4, t3, t1                  # t4 = abs(b) < (abs(a) >> 14)
    bne   t4, zero, _FixedDiv_test    # quotient would overflow
    xor   t0, a0, a1                  # delay slot: sign of the result
    dadd  a0, a0, zero
    dadd  a1, a1, zero
    dsll  a0, a0, 16                  # a << 16 as a 64-bit value
    ddiv  a0, a1
    nop
    nop
    mflo  v0                          # v0 = quotient
_FixedDiv_end:
    jr    ra
    nop
_FixedDiv_test:
    bltz  t0, _FixedDiv_return_INT_MIN
    lui   v0, 0x7FFF                  # delay slot: v0 = 0x7FFF0000
_FixedDiv_return_INT_MAX:
    addiu v0, v0, 0xFFFF              # meant to fill in the low half of INT_MAX
    jr    ra
    nop
_FixedDiv_return_INT_MIN:
    addi  v0, zero, 0x8000
    sll   v0, v0, 16                  # meant to produce 0x80000000
    jr    ra
    nop

Edited December 15, 2017 by jnmartin84
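For comparison, a C sketch of the behaviour FixedDiv is expected to have, based on the well-known Doom definition. This is an illustration, not the file from the repository: the `magnitude` helper is hypothetical and replaces `abs()` so that the most negative value is handled without undefined behaviour, and the 64-bit integer divide stands in for whatever the port's C fallback uses.

```c
#include <stdint.h>

typedef int32_t fixed_t;          /* 16.16 fixed point */
#define FRACUNIT 65536

/* Hypothetical helper: absolute value as unsigned, safe for INT32_MIN. */
static uint32_t magnitude(int32_t x)
{
    return x < 0 ? 0u - (uint32_t)x : (uint32_t)x;
}

fixed_t FixedDiv(fixed_t a, fixed_t b)
{
    /* If the quotient cannot fit in 32 bits, clamp by sign.
       This test is also true whenever b == 0. */
    if ((magnitude(a) >> 14) >= magnitude(b))
        return ((a ^ b) < 0) ? INT32_MIN : INT32_MAX;

    /* Scale the dividend to 48.16 and divide in 64 bits. */
    return (fixed_t)(((int64_t)a * FRACUNIT) / b);
}
```

For example, `FixedDiv(3 * FRACUNIT, 2 * FRACUNIT)` yields 1.5 in 16.16 form, and any division by zero clamps to the largest or smallest 32-bit value depending on the sign of `a`.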
jnmartin84 Posted December 15, 2017 (edited)
gcc's take on FixedDiv ... looks familiar :-o

    .file 1 "m_fixed.c"
    .set nomips16
    .set nomicromips
    .ent FixedDiv
    .type FixedDiv, @function
FixedDiv:
    .frame $sp,0,$31 # vars= 0, regs= 0/0, args= 0, gp= 0
    .mask 0x00000000,0
    .fmask 0x00000000,0
    .set noreorder
    .set nomacro
    sra $2,$4,31
    sra $6,$5,31
    xor $7,$2,$4
    subu $7,$7,$2
    xor $2,$6,$5
    sra $7,$7,14
    subu $6,$2,$6
    slt $6,$7,$6
    bne $6,$0,$L2
    dsll $3,$4,16
    xor $4,$4,$5
    bltz $4,$L4
    nop
    li $2,2147418112 # 0x7fff0000
    j $31
    ori $2,$2,0xffff
$L2:
    move $4,$5
    ddiv $0,$3,$4
    teq $4,$0,7
    mflo $2
    j $31
    sll $2,$2,0
    sll $2,$2,0
$L4:
    j $31
    li $2,-2147483648 # 0xffffffff80000000
    .set macro
    .set reorder
    .end FixedDiv

Edited December 15, 2017 by jnmartin84
jnmartin84 Posted December 15, 2017 (edited) I feel like MIPS is one of those processor architectures where there aren't very many opportunities for "exotic" optimizations, just clever register usage / reusage (kind of like those old C tricks like swapping without a temp variable, etc). I often find my code nearly identical to gcc -O2 output on my first or second cleanup pass. Edited December 15, 2017 by jnmartin84
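The "swap without a temp variable" trick mentioned above is usually the XOR swap. A minimal sketch, with a hypothetical function name, including the guard that the trick needs when both pointers refer to the same object:

```c
#include <stdint.h>

/* Exchange *a and *b without a temporary, using three XORs. */
void xor_swap(uint32_t *a, uint32_t *b)
{
    if (a == b)       /* XOR-swapping an object with itself would zero it */
        return;
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}
```

On a load/store architecture with plenty of registers this rarely beats a plain temporary, which fits the point that register allocation matters more than exotic tricks.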
jnmartin84 Posted December 16, 2017 (edited) To be fair to GCC, the MIPS architecture (the MIPS III ISA in the case of the Nintendo 64, not counting the RSP's vector instruction extensions) is so simple that most things in C map directly to small sequences of MIPS instructions, sometimes one-to-one. Unlike x86, a particularly egregious example (the whole RISC vs CISC thing...), there aren't a half-dozen different ways to do something like copy a string. I'm not an Intel expert, but I can think of two different opcodes that will do just that: copy a string with a single instruction rather than a dozen-ish MIPS instructions with control flow transfer. Edited December 16, 2017 by jnmartin84
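As an illustration of the string-copy case, this is the sort of C loop that a MIPS compiler has to turn into an explicit load, store, pointer increments and a branch. The function name is hypothetical and the sketch makes no claim about how any particular libc implements `strcpy`.

```c
/* Copy a NUL-terminated string byte by byte, including the terminator. */
char *copy_string(char *dst, const char *src)
{
    char *out = dst;
    while ((*out++ = *src++) != '\0') {
        /* each iteration: one load, one store, two increments, one branch */
    }
    return dst;
}
```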
Teivman Posted December 22, 2017 I have some issues here. I tried to make the rom and it would not work.
Danfun64 Posted December 22, 2017 Can you be more specific than that? Are you saying the rom couldn't be made at all, or that it could be made but didn't work on your favorite emulator?
Teivman Posted December 22, 2017 3 hours ago, Danfun64 said: Can you be more specific than that? Are you saying the rom couldn't be made at all, or that it could be made but didn't work on your favorite emulator? The rom cannot be made.
jnmartin84 Posted January 2, 2018 Can you be any more specific? Error messages? I would like to help.
jnmartin84 Posted January 2, 2018 I'm seeing reports on another forum that there might be line-ending issues with the shell script I packaged in the toolkit.
fraggle Posted January 3, 2018
On 12/15/2017 at 7:40 AM, jnmartin84 said: I feel like MIPS is one of those processor architectures where there aren't very many opportunities for "exotic" optimizations, just clever register usage / reusage (kind of like those old C tricks like swapping without a temp variable, etc).
Makes sense really - it's a RISC architecture (one of the original ones) and the central idea behind RISC is to have fewer, simpler instructions which can be better optimized in the CPU design. Most of the time there really should be pretty much only one way to do things.
jnmartin84 Posted January 10, 2018
On 1/3/2018 at 10:23 AM, fraggle said: Makes sense really - it's a RISC architecture (one of the original ones) and the central idea behind RISC is to have fewer, simpler instructions which can be better optimized in the CPU design. Most of the time there really should be pretty much only one way to do things.
The antithesis of things like the VAX, which could compute a polynomial with a single CPU instruction. :-D
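For anyone curious what that single instruction has to do, polynomial evaluation is normally done with Horner's method. Below is a minimal C sketch; the function name and the highest-order-first coefficient layout are choices made for this example, not a description of the VAX operand format.

```c
/* Evaluate c[0]*x^degree + c[1]*x^(degree-1) + ... + c[degree]
   using Horner's method: one multiply and one add per coefficient. */
double poly_eval(double x, const double *c, int degree)
{
    double result = c[0];
    for (int i = 1; i <= degree; i++)
        result = result * x + c[i];
    return result;
}
```

On a RISC machine this is a short loop of multiply, add, load and branch instructions, which is exactly the contrast being drawn.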
VGA Posted October 15, 2018 This should be moved to the Console section. By the way, in the videos I've seen this runs smoothly, wish I had an N64...
amackert Posted October 16, 2018 Thanks for the bump, I had no idea about this. I have a N64 flash cart so I'd definitely like to try to get this running. The videos from years back make it look pretty great, all things considered.
jnmartin84 Posted February 26, 2020 Just started working on this again a week or two ago. Have it running at 640x400 at a playable frame rate with sound and music. Unstable / buggy though. Going to try to do another release by April.
Redneckerz Posted February 26, 2020 18 minutes ago, jnmartin84 said: Just started working on this again a week or two ago. Have it running at 640x400 at a playable frame rate with sound and music. Unstable / buggy though. Going to try to do another release by April. I have loved this ever since it was announced. It running at near the Expansion Pak resolution is mighty impressive. Are there ideas to visually enhance the engine with new effects or?
taufan99 Posted February 27, 2020 7 hours ago, jnmartin84 said: Just started working on this again a week or two ago. Have it running at 640x400 at a playable frame rate with sound and music. Unstable / buggy though. Going to try to do another release by April. Cool! I'll update my thread to reflect the development status of this backport now.
jnmartin84 Posted February 27, 2020 The enhancement is that it is pushing 4x as many pixels around; there's not much CPU time left to do anything else unless I take another stab at drawing columns with the RDP.
Redneckerz Posted February 27, 2020
1 hour ago, jnmartin84 said: The enhancement is that it is pushing 4x as many pixels around; there's not much CPU time left to do anything else unless I take another stab at drawing columns with the RDP.
I can imagine, running near native N64 Expansion resolution and twice the resolution of Vanilla Doom. Of the cross sourced ports, 64Doom (And 64Doom2) is easily one of the more technically impressive ones since you did a lot of under the hood work to make it sing with the N64 hardware :) Impressive, once more!
jnmartin84 Posted February 28, 2020 Calling it "Expansion resolution" or "Expansion pak resolution" is misleading. You can have two 640x480 16-bit color framebuffers on an un-expanded N64 and still have 3.4MB of memory left for code and data.
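The memory claim is easy to check with a little arithmetic. The sketch below assumes 4 MiB of base RDRAM and 2 bytes per pixel; the function names are made up for the example. Under those assumptions one 640x480 buffer takes 614,400 bytes, so a single buffer leaves about 3.4 MiB and a double-buffered setup leaves about 2.8 MiB. Either way the high-res mode fits without the Expansion Pak, which is the point being made.

```c
#include <stdint.h>

#define RDRAM_BYTES (4u * 1024u * 1024u)   /* assumed: 4 MiB without Expansion Pak */

/* Total bytes used by `buffers` framebuffers of the given size and depth. */
uint32_t framebuffer_bytes(uint32_t width, uint32_t height,
                           uint32_t bytes_per_pixel, uint32_t buffers)
{
    return width * height * bytes_per_pixel * buffers;
}

/* Bytes of base RDRAM left over for code and data. */
uint32_t bytes_left(uint32_t width, uint32_t height,
                    uint32_t bytes_per_pixel, uint32_t buffers)
{
    return RDRAM_BYTES - framebuffer_bytes(width, height, bytes_per_pixel, buffers);
}
```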
Redneckerz Posted February 28, 2020
1 hour ago, jnmartin84 said: Calling it "Expansion resolution" or "Expansion pak resolution" is misleading.
It's a generic descriptor, relax. I only say this because Expansion pak res is 640x480 and you aim for 640x400, so in a general fashion, that's ''Near N64 Expansion resolution'' given most Expansion Pak enabled titles raised the resolution to 640x480. Hence the inclusion of ''Near''. There isn't some deeper meaning behind it.
jnmartin84 Posted February 29, 2020 (edited)
11 hours ago, Redneckerz said: It's a generic descriptor, relax. I only say this because Expansion pak res is 640x480 and you aim for 640x400, so in a general fashion, that's ''Near N64 Expansion resolution'' given most Expansion Pak enabled titles raised the resolution to 640x480. Hence the inclusion of ''Near''. There isn't some deeper meaning behind it.
And I'm trying to say that 640x480 high res on N64 and the expansion pak are two entirely unrelated things. I'm not going to go in circles over it but they have absolutely nothing to do with each other. Zelda MM and DK64 are both expansion pak titles that run at 320x240. NFL QB Club ran at 640x480 on an unexpanded system. That's the point I was making. Also, "aiming for 640x400" is ignoring the built-in aspect ratio assumption of Vanilla Doom. I am running the N64 at 640x480 and rendering the game display in the middle of that.
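"Rendering the game display in the middle" comes down to a fixed offset into the framebuffer. A small sketch of that calculation, with a hypothetical function name and assuming a linear framebuffer indexed in pixels:

```c
#include <stddef.h>

/* Pixel index of the top-left corner of a view centered in the screen. */
size_t centered_offset(int screen_w, int screen_h, int view_w, int view_h)
{
    int x0 = (screen_w - view_w) / 2;   /* columns skipped on the left */
    int y0 = (screen_h - view_h) / 2;   /* rows skipped at the top */
    return (size_t)y0 * (size_t)screen_w + (size_t)x0;
}
```

For a 640x400 view in a 640x480 mode this skips 40 rows at the top, that is 25,600 pixels, with matching 40-row borders above and below the game display.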
Redneckerz Posted February 29, 2020
2 hours ago, jnmartin84 said: And I'm trying to say that 640x480 high res on N64 and the expansion pak are two entirely unrelated things. I'm not going to go in circles over it but they have absolutely nothing to do with each other. Zelda MM and DK64 are both expansion pak titles that run at 320x240. NFL QB Club ran at 640x480 on an unexpanded system. That's the point I was making.
What you are going for is an exact definition, including mentioning that there are titles that run in 640x480. This is all very true. That was not what my original post is about. It is about a generic definition, not an exact one.
2 hours ago, jnmartin84 said: Also, "aiming for 640x400" is ignoring the built-in aspect ratio assumption of Vanilla Doom.
Oh, come on, this is nitpicking if I ever saw any. You do not have to stare so intently at the word "aim" to have this response.
jnmartin84 Posted March 24, 2020 Your generic definition is wrong. Anyway, new release is up on the 64doom github github.com/jnmartin84/64doom
Redneckerz Posted March 24, 2020 (edited)
1 hour ago, jnmartin84 said: Your generic definition is wrong.
There is no inherent point in saying this. Besides, the post you are responding to is close to a month old.
1 hour ago, jnmartin84 said: Anyway, new release is up on the 64doom github github.com/jnmartin84/64doom
Congratulations on the new release. Here it is linked: 64Doom. Oh, there is no new release under the Releases tab as that still points to the 2017 version labeled 1.0. Is the new release still uploading?
Changes (for those who are interested):
* New version of 64Doom that now runs in 640x400 in high detail mode (system resolution 640x480). Low detail is 320x400. High detail is kinda slow, low detail is fast.
* Sound and music rendering and mixing is now entirely fixed point.
* Optimizations in many places.
* Updated builder.
* Pulled in toolkit line ending issue fix so hopefully no more complaints about that.
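As a rough idea of what "entirely fixed point" mixing can look like, here is a sketch that mixes two signed 16-bit samples with 16.16 volume factors and clamps the result. This is an assumption-laden illustration, not the 64Doom mixer: the function name, the 16.16 volume format and the two-voice signature are all invented for the example.

```c
#include <stdint.h>

#define VOL_UNIT 65536   /* 1.0 in 16.16 fixed point */

/* Mix two samples, each scaled by a 16.16 volume, and saturate to 16 bits. */
int16_t mix_sample(int16_t a, int32_t vol_a, int16_t b, int32_t vol_b)
{
    /* Accumulate in 64 bits so the products cannot overflow. */
    int64_t acc = ((int64_t)a * vol_a + (int64_t)b * vol_b) / VOL_UNIT;

    if (acc > INT16_MAX) acc = INT16_MAX;   /* clamp instead of wrapping */
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```

No floating-point operations are involved, only integer multiplies, adds and a divide by a power of two.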
Immorpher Posted March 26, 2020 @jnmartin84 I appreciate all your work and all of your testing and experiments! I enjoy playing your port on my N64. :)