Doomworld Forums : Powered by vBulletin version 2.2.5 Doomworld Forums > Classic Doom > Source Ports > Which is the fastest/most efficient Hexen port?
 
All times are GMT.
printz
CRAZY DUMB ZEALOT


Posts: 8826
Registered: 06-06


I'm trying to play a NUTS.WAD setup converted to Hexen. Which is the fastest port for it? ZDoom? GZDoom? DOSBox probably isn't efficient, since it's an emulator. I have an NVIDIA video card, so I can use GZDoom.

We can't use PrBoom+ here, since it doesn't support Hexen.

__________________
Automatic Wolfenstein - Version 1.0 - also on Android

Posted 02-17-13 15:40
Tosi
Warming Up


Posts: 11
Registered: 02-13


Since you have a video card, GZDoom or PrBoom with the OpenGL renderer would probably be faster, since the software renderer would just eat up CPU cycles. Any other port with an OpenGL/DirectX renderer would probably be fast too.

Posted 02-17-13 16:38
Graf Zahl
Why don't I have a custom title by now?!


Posts: 7712
Registered: 01-03


Since this is about a Nuts-style map, other factors are also important. ZDoom has been known for performance issues on extremely monster-heavy maps, though.

For Hexen this means you'll probably have trouble finding a suitable port, because there just isn't anything basic available.

You may also want to try Doomsday with all light effects switched off if GZDoom doesn't work.

Posted 02-17-13 17:01
entryway
Forum Staple


Posts: 2714
Registered: 01-04



Graf Zahl said:
You may also want to try Doomsday with all light effects switched off if GZDoom doesn't work.

I think Doomsday with any settings will be 100x slower than GZDoom. Last time I tried it on nuts.wad it ran at <0.1 fps.

Last edited by entryway on 02-17-13 at 17:35

Posted 02-17-13 17:14
Graf Zahl
Why don't I have a custom title by now?!


Posts: 7712
Registered: 01-03


OK, good to know. I thought it might be better on the gameplay side, but of course I forgot how much the renderer's performance sucks...

Posted 02-17-13 17:18
Gez
Why don't I have a custom title by now?!


Posts: 11063
Registered: 07-07


Basically, your choice is between Chocolate Hexen, Doomsday, GZDoom, and ZDoom. Other Hexen ports are dead. (I'm not sure whether Vavoom is entirely dead or not, but it kinda looks dead.)

Doomsday is very slow and will remain so until 2.0 is finalized and they start working on optimizing the code again.


Tosi said:
PrBoom with the opengl renderer would probably be faster

Hexen.

Posted 02-17-13 17:20
Maes
I like big butts!


Posts: 12396
Registered: 07-06


Exactly what makes some ports so slow when handling NUTS-like maps? I didn't make any particular effort or optimization for Mocha, and it runs nearly as fast as PrBoom+ on such levels *wtf*

Posted 02-19-13 17:44
exp(x)


Posts: 2595
Registered: 04-04



Maes said:
Exactly what makes some ports so slow when handling NUTS-like maps? I didn't make any particular effort or optimization for Mocha, and it runs nearly as fast as prBoom+ on such levels *wtf*

I don't know anything about the Doom engine, but I imagine that advanced features add a bunch of conditionals to the code. Multiply that by thousands of monsters, and it can have a large impact.

Posted 02-19-13 18:16
kb1
Member


Posts: 337
Registered: 11-06


I seem to remember Entryway fixing a bug in PrBoom+ in MBF code that caused way too many monster-to-monster interactions to happen, but I've never tracked it down. Maybe Entryway can enlighten us.

Other culprits are:
* Added conditionals as exp(x) mentioned.
* slow line-of-sight checks
* slow memory allocation (missiles)
* believe it or not, slow sound mixing
* sprite overdraw. On modern CPUs, this is a BIG problem, thrashing memory cache many times over per frame.

Posted 02-19-13 23:04
Quasar
Moderator


Posts: 6006
Registered: 08-00



kb1 said:
I seem to remember Entryway fixing a bug in PrBoom+ in MBF code that caused way too many monster-to-monster interactions to happen, but I've never tracked it down. Maybe Entryway can enlighten us.

Other culprits are:
* Added conditionals as exp(x) mentioned.
* slow line-of-sight checks
* slow memory allocation (missiles)
* believe it or not, slow sound mixing
* sprite overdraw. On modern CPUs, this is a BIG problem, thrashing memory cache many times over per frame.


It was one line of code from MBF, never adapted into the PrBoom codebase, that removes dead monsters from the th_friends or th_enemies thinker class lists. Without it, corpses stayed in those lists, which kept the search time for enemy targets excessively high on maps with large numbers of monsters. It was also responsible for the only known MBF demo desync in PrBoom-Plus.
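To illustrate why that one line matters, here is a minimal sketch assuming a simplified model in which every monster sits in a doubly linked list for its class, loosely modeled on MBF's th_friends/th_enemies lists. The types and functions (`actor_t`, `class_list_t`, `kill_actor`, `count_scanned`) are hypothetical, not MBF or PrBoom-Plus code. If a kill leaves the corpse linked, every later target search keeps walking over it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified actor: not MBF's mobj_t. */
typedef struct actor {
    struct actor *prev, *next; /* links within its class list */
    bool alive;
} actor_t;

/* Circular doubly linked list with a sentinel head node. */
typedef struct {
    actor_t head;
} class_list_t;

void list_init(class_list_t *l) {
    l->head.prev = l->head.next = &l->head;
}

/* Append an actor at the tail of the class list. */
void list_add(class_list_t *l, actor_t *a) {
    a->next = &l->head;
    a->prev = l->head.prev;
    l->head.prev->next = a;
    l->head.prev = a;
}

/* O(1) unlink; the node points to itself afterwards. */
static void list_unlink(actor_t *a) {
    a->prev->next = a->next;
    a->next->prev = a->prev;
    a->prev = a->next = a;
}

/* Kill an actor. If unlink_corpse is false, the corpse stays in the
   list and every later target search has to step over it. */
void kill_actor(actor_t *a, bool unlink_corpse) {
    a->alive = false;
    if (unlink_corpse)
        list_unlink(a);
}

/* Walk the list as a target search would and return how many nodes
   were visited, live or dead. */
size_t count_scanned(const class_list_t *l) {
    size_t n = 0;
    for (const actor_t *a = l->head.next; a != &l->head; a = a->next)
        n++;
    return n;
}
```

On a NUTS-style map where thousands of monsters die, the difference is between scanning every monster that was ever in the list and scanning only the ones still alive.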

Posted 02-20-13 04:45
Maes
I like big butts!


Posts: 12396
Registered: 07-06



kb1 said:
Other culprits are:
* Added conditionals as exp(x) mentioned.



That one is plausible, especially if they lead to complexity higher than O(n) (where n is the number of monsters). However, if they "simply" lead to O(cn) increases (where c is some constant << n), it's harder to swallow that the difference can span orders of magnitude.
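A back-of-the-envelope count shows why the distinction matters. This sketch assumes c = 5 extra checks per monster per tic for the linear case and a naive all-pairs target search for the quadratic one; both figures are illustrative, not measured from any port, and the function names are made up.

```c
#include <stdint.h>

/* Work per tic if each of n monsters pays a constant c extra checks. */
uint64_t linear_work(uint64_t n, uint64_t c) {
    return c * n;
}

/* Work per tic if each of n monsters scans every other monster,
   as in a naive target search: n * (n - 1) comparisons. */
uint64_t quadratic_work(uint64_t n) {
    return n == 0 ? 0 : n * (n - 1);
}
```

For n = 1000, `linear_work(1000, 5)` gives 5000 while `quadratic_work(1000)` gives 999000. Going from 1000 to 10000 monsters multiplies the linear cost by 10 but the quadratic cost by roughly 100.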


kb1 said:
* slow line-of-sight checks



Same as above. I'd expect this to add a (fairly large) constant overhead to each thinker's processing time, but not enough to justify a slowdown of an order of magnitude or more. Then again, ZDoom completely eschews REJECT map calculations, so it replaces an O(1) or O(c) operation with one of much higher complexity (a BSP search... dunno, O(n log n) where n = number of nodes in a map?)
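For reference, the REJECT check itself is a single bit lookup. This sketch follows the way vanilla Doom's P_CheckSight indexes the REJECT lump (one bit per ordered sector pair, bit index s1 * numsectors + s2, least significant bit first); the function name `reject_blocks_sight` is made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the REJECT table says sector s1 can never see s2.
   The table holds one bit per (s1, s2) pair, packed 8 per byte. */
bool reject_blocks_sight(const uint8_t *reject, size_t numsectors,
                         size_t s1, size_t s2) {
    size_t pnum = s1 * numsectors + s2;        /* bit index of the pair */
    return (reject[pnum >> 3] >> (pnum & 7)) & 1;
}
```

The cost is the same on any map, which is the O(1) case mentioned above; a pair that REJECT does not rule out still needs a real line-of-sight trace.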


kb1 said:
* slow memory allocation (missiles)



I think that's the only truly strong point of Mocha over all other ports: superior garbage collection and lazy deallocation taken to the extreme. As my tests in NUTS.WAD have shown, stuff doesn't get collected until memory runs really low (I had to force a low heap) or you force it with GC trickery. Compare this with having to pay for a free or dealloc operation for every object in a NUTS.WAD map (the rate of projectile spawning/death in that map can easily run into the 1000s per second).


kb1 said:
* believe it or not, slow sound mixing


More like taxing the sound channel allocation with thousands of requests that never get played back, but that depends on the channel management strategy used. Of course, the fewer spurious/brief requests reach the actual mixing stage, the better.
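A minimal sketch of the kind of channel management meant here, assuming a fixed pool of channels and a numeric priority where a lower value means a more important sound (similar in spirit to vanilla Doom's S_getChannel, but the types and functions here are hypothetical). A request only gets a channel if one is free or if it outranks the least important sound already playing; otherwise it is dropped before it ever reaches the mixer.

```c
#define NUM_CHANNELS 8

/* Hypothetical channel state: priority < 0 means the channel is free.
   Lower priority value = more important sound. */
typedef struct {
    int priority;
} channel_t;

void channels_init(channel_t ch[NUM_CHANNELS]) {
    for (int i = 0; i < NUM_CHANNELS; i++)
        ch[i].priority = -1;
}

/* Try to start a sound. Returns the channel index, or -1 if the
   request is dropped without ever being mixed. */
int start_sound(channel_t ch[NUM_CHANNELS], int priority) {
    int worst = -1;
    for (int i = 0; i < NUM_CHANNELS; i++) {
        if (ch[i].priority < 0) {          /* free channel: take it */
            ch[i].priority = priority;
            return i;
        }
        if (worst < 0 || ch[i].priority > ch[worst].priority)
            worst = i;                     /* least important so far */
    }
    /* All busy: evict only if the new sound is more important. */
    if (priority < ch[worst].priority) {
        ch[worst].priority = priority;
        return worst;
    }
    return -1;
}
```

Under a flood of equal-priority requests, everything past the eighth is rejected after one pass over the pool, so the mixer's load stays bounded by the channel count rather than by the number of requests.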


kb1 said:
* sprite overdraw. On modern CPUs, this is a BIG problem, thrashing memory cache many times over per frame.


Understandable, although, as I said, Mocha seems to handle that just as well as PrBoom+. It really becomes a bottleneck when trying to parallelize, though.

Posted 02-20-13 10:37
tempun
Member


Posts: 597
Registered: 08-09


What's sizeof(mobj_t) in ZDoom? I think mobj_t or whatever it's called is INSANELY bloated in ZDoom. No wonder Nuts runs so poorly.

Posted 02-20-13 19:52
Graf Zahl
Why don't I have a custom title by now?!


Posts: 7712
Registered: 01-03


That has nothing to do with it.
The main problem is that the enemy logic is a lot more complex than the original one and there's probably some bug in it. These are hard to find though.

If cache misses caused this kind of slowdown, most software would run at a crawl.

Posted 02-20-13 20:16
Maes
I like big butts!


Posts: 12396
Registered: 07-06



Graf Zahl said:
If cache misses caused this kind of slowdown, most software would run at a crawl.


Fire up memtest. Divide the transfer rate of the level 1 or level 2 cache by the main memory's. That's the amount of slowdown you can expect from a complete cache miss ;-)

On modern CPUs, the ratio of L1 cache to main memory speed is about 10:1. It can be MUCH higher for older CPUs and memory technologies though.

Posted 02-21-13 09:34
Graf Zahl
Why don't I have a custom title by now?!


Posts: 7712
Registered: 01-03



Maes said:


Fire up memtest. Divide the transfer rate of the level 1 or level 2 cache by the main memory's. That's the amount of slowdown you can expect from a complete cache miss ;-)

On modern CPUs, the ratio of L1 cache to main memory speed is about 10:1. It can be MUCH higher for older CPUs and memory technologies though.




Sure, but the question isn't how big the effect of a single cache miss is, but how much certain operations increase the number of cache misses.

Code that has to check lots of separate data already has lots of cache misses by default, so what happens here isn't going from 0 to 10 misses but maybe from 100 to 110, which has a far less pronounced effect.

For example, I was once toying in GZDoom's renderer with precalculating and caching some render data. Ultimately it caused a 10% speed decrease due to caching behavior - so saying that code that runs reasonably fast suddenly slows down to a crawl just by adding more cache misses is dubious. Sure, it may get slower but not by such large factors.
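The argument can be put into numbers with a deliberately crude cost model. This sketch assumes each access is either a hit costing 1 unit or a miss costing 10 units (roughly the 10:1 ratio quoted above) and ignores prefetching, overlapping misses, and everything else real hardware does; the function names are made up.

```c
/* Cost of a batch of memory accesses under a two-level model:
   hits cost 1 unit, misses cost miss_cost units.
   Assumes misses <= accesses. */
double access_cost(unsigned accesses, unsigned misses, double miss_cost) {
    return (double)(accesses - misses) * 1.0 + (double)misses * miss_cost;
}

/* Slowdown factor when the miss count grows from base to base + extra. */
double slowdown(unsigned accesses, unsigned base, unsigned extra,
                double miss_cost) {
    return access_cost(accesses, base + extra, miss_cost) /
           access_cost(accesses, base, miss_cost);
}
```

With 1000 accesses, going from 0 to 10 misses makes the batch about 9% slower, and going from 100 to 110 about 4.7% slower; the model only reaches a 10x slowdown when every single access misses.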

Posted 02-21-13 11:49
Maes
I like big butts!


Posts: 12396
Registered: 07-06



Graf Zahl said:
Sure, but the question isn't how big the effect of a single cache miss is, but how much certain operations increase the number of cache misses.


It depends a lot on the memory layout of the data structures used, cache line length, cache size, main memory-to-cache size ratio and cache associativity. If someone REALLY has no life, they could work out exactly in what sequences/patterns data is accessed, and lay it out in memory beforehand in the optimal way for a specific set of cache parameters, kinda like The Story of Mel on steroids.

Of course, almost nobody does such a thing AFAIK, not even hyper-specific compilers. However, some things in Doom are glaringly anti-cache, e.g. the column-based rendering is a killer when the screen buffer is row-first (and even if it wasn't, you'd have to pay for an expensive transpose operation at the end of each tic, unless you have column-first video hardware as well).
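To see why column drawing hurts, count the distinct cache lines touched when writing 200 pixels down a column versus along a row of a row-first, 320-pixel-wide, 8-bit framebuffer. The 64-byte line size is an assumption (common on current x86 CPUs), the buffer is assumed to start on a line boundary, and `lines_touched` is a made-up helper.

```c
#include <stddef.h>

#define LINE_SIZE 64 /* assumed cache line size in bytes */

/* Count distinct cache lines touched by `count` one-byte writes starting
   at byte offset `start` and advancing by `step` bytes (step > 0). */
size_t lines_touched(size_t start, size_t step, size_t count) {
    size_t distinct = 0;
    size_t last_line = (size_t)-1;
    for (size_t i = 0; i < count; i++) {
        size_t line = (start + i * step) / LINE_SIZE;
        if (line != last_line) { /* addresses only grow, so a change
                                    of line always means a new line */
            distinct++;
            last_line = line;
        }
    }
    return distinct;
}
```

`lines_touched(0, 320, 200)` (a 200-pixel column) touches 200 separate lines, while `lines_touched(0, 1, 200)` (the same number of pixels along a row) touches 4.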

The only thing a "general" programmer can do is use some common sense, e.g. don't try to perform a matrix-vector multiplication starting from the last row and column and going backwards (that just fucks up cache locality), precompute commonly used constant values outside loops, etc. In general, the less you access main memory and the less you go "against the grain", the better.

Posted 02-21-13 12:17
Graf Zahl
Why don't I have a custom title by now?!


Posts: 7712
Registered: 01-03



Maes said:

(and even if it wasn't, you'd have to pay for an expensive transpose operation at the end of each tic, unless you have column-first video hardware as well).



You could let the graphics hardware do that.

But it wouldn't help. It's only walls and sprites that are drawn vertically, not flats. Of course flats will cause other types of cache misses because they aren't accessed sequentially.


It's all academic anyway. Yes, larger data structures will decrease cache performance to a degree, but I've yet to find an example where this decrease exceeds a few percentage points unless using deliberately constructed examples.

Posted 02-21-13 12:32
printz
CRAZY DUMB ZEALOT


Posts: 8826
Registered: 06-06


In Hexen (and Heretic) more performance is needed than in Doom or Strife, because the player is more likely to play in fast mode. In that case every monster keeps shooting, resulting in an immensely larger number of actors at any given time.


Posted 02-21-13 13:15
Memfis
Forum Spammer


Posts: 5517
Registered: 04-07



Graf Zahl said:
The main problem is that the enemy logic is a lot more complex than the original one

What does that mean exactly? Are the ZDoom monsters smarter?

Posted 02-21-13 13:52
Graf Zahl
Why don't I have a custom title by now?!


Posts: 7712
Registered: 01-03


No, not smarter, but the logic contains quite a bit of code to support new ZDoom features, e.g. following a predefined path.

If you compare ZDoom's A_Look and A_Chase functions with the originals you'll see that ZDoom's versions are considerably larger.

Posted 02-21-13 14:28
kb1
Member


Posts: 337
Registered: 11-06



Quasar said:
It was one line of code from MBF that had never been adapted into the PrBoom codebase which removes dead monster corpses from the th_friends or th_enemies thinkerclass lists. This kept the search time for enemy targets excessively high on maps with large amounts of monsters. It was also responsible for the only known demo desync for an MBF demo in PrBoom-Plus.

Very interesting, and good to know!


Maes said:
...Then again, ZDoom completely eschews REJECT map calculations, so it replaces an O(1) or O(c) operation with one of much higher complexity (a BSP search...dunno, O(nlogn) where n= number of nodes in a map?)

Nah, ZDoom actually does something very smart here: it still uses REJECT, but if REJECT fails to rule out a sight check, ZDoom uses a fixed version of the old Doom 1.2 sight-checking code, which happened to make its way over to Heretic (or Hexen??). This code is generally much faster than the final version in later Doom releases, but it originally had an ugly bug that takes careful study to spot (see the ZDoom source). Randy found and fixed it, and has been using it ever since.



Maes said:
...I think that's the only truly strong point of Mocha against all other ports: superior garbage collection and lazy deallocation to the extreme, as my tests have shown in NUTS.WAD: stuff doesn't get collected until memory runs really low (I had to force a low heap) or you force it with GC trickery. Compare this with having to pay for a free or dealloc operation for every object in a NUTS.WAD map (the rate of spawning/death of projectiles in that map can easily run in the 1000s per second)...

My port uses a freelist for mobj_ts, so I enjoy a half-dozen or so allocations per game! Hard to beat.
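A freelist like the one described can be sketched as follows. The `thing_t` type and the functions are hypothetical stand-ins, not kb1's actual code: freed objects are pushed onto a singly linked list and handed back on the next spawn, so `malloc` is only called when the list is empty.

```c
#include <stdlib.h>

/* Hypothetical map object; a real mobj_t has many more fields. */
typedef struct thing {
    struct thing *next_free; /* link used only while on the freelist */
    int x, y, health;
} thing_t;

static thing_t *freelist = NULL;
static size_t mallocs = 0; /* how many times we fell back to malloc */

thing_t *thing_alloc(void) {
    thing_t *t = freelist;
    if (t) {
        freelist = t->next_free;      /* reuse a previously freed object */
    } else {
        t = malloc(sizeof *t);        /* freelist empty: really allocate */
        if (!t)
            return NULL;
        mallocs++;
    }
    t->next_free = NULL;
    return t;
}

void thing_free(thing_t *t) {
    t->next_free = freelist;          /* O(1) push, no free() call */
    freelist = t;
}
```

In a spawn/despawn loop, `malloc` is only called until the pool reaches its peak size; after that every spawn is a single pointer pop.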


Maes said:
...However some things in Doom are glaringly anti-cache e.g. the column-based rendering is a killer, when the screen buffer is row-first (and even if it wasn't, you'd have to pay for an expensive transpose operation at the end of each tic, unless you have column-first video hardware as well)...

Oh, you're right, it's horrible. If you want a REALLY diabolical case, set your horizontal resolution to a multiple of cache size, like a power of 2 (e.g. 1024x768); then you guarantee a cache write/flush cycle on each pixel write! Adding extra bytes to each line of your frame buffer will make a huge difference in this case (1028 bytes vs. 1024).

Now, add sprite overdraw to that. Now you're invalidating cache that was written quite some time ago - now you're flushing MULTIPLE cache layers.
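A rough way to see the effect of a 1024- vs. 1028-byte pitch is to count how many distinct cache sets a 768-pixel vertical column maps to. This sketch assumes a hypothetical cache with 64-byte lines and 64 sets (e.g. a 32 KiB, 8-way L1) and an 8-bit framebuffer; real caches differ, and physical vs. virtual indexing is ignored. The function name is made up.

```c
#include <stdbool.h>
#include <stddef.h>

#define LINE_SIZE 64 /* assumed bytes per cache line */
#define NUM_SETS  64 /* assumed number of cache sets */

/* Number of distinct cache sets hit when writing one byte per row,
   going down a single column of `rows` rows with `pitch` bytes per row. */
size_t sets_used_by_column(size_t pitch, size_t rows) {
    bool seen[NUM_SETS] = { false };
    size_t distinct = 0;
    for (size_t y = 0; y < rows; y++) {
        size_t set = (y * pitch / LINE_SIZE) % NUM_SETS;
        if (!seen[set]) {
            seen[set] = true;
            distinct++;
        }
    }
    return distinct;
}
```

With a 1024-byte pitch the whole column competes for only 4 sets (32 lines at 8 ways), so lines get evicted long before the column is finished; with 1028 bytes the same column spreads across all 64 sets.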

ZDoom was the first (I think) port to try to write 4 horizontal pixels at once before switching lines. It used a mind-boggling algorithm to attempt to align 4 separate vertical runs horizontally. It's way more complicated than the original renderer, but, amazingly, can double renderer performance in some cases.

Eternity followed with its quad-renderer, which is a similar idea, but very different approach.
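The basic idea of writing several adjacent columns' pixels per row can be shown in a much simplified form. This sketch assumes the four adjacent columns share the same top and bottom (the real ZDoom and Eternity code do far more work to line up runs that don't), and the functions are hypothetical; it is not either port's algorithm. Each row's four pixels are assembled and written together instead of making four separate passes down the screen.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference: draw one column of texels at screen x, rows [top, bottom). */
void draw_column(uint8_t *fb, size_t pitch, size_t x,
                 size_t top, size_t bottom, const uint8_t *texels) {
    for (size_t y = top; y < bottom; y++)
        fb[y * pitch + x] = texels[y - top];
}

/* Draw 4 adjacent columns x..x+3 that share the same vertical span,
   row by row. cols[k] holds the texels for column x + k. */
void draw_quad_columns(uint8_t *fb, size_t pitch, size_t x,
                       size_t top, size_t bottom,
                       const uint8_t *const cols[4]) {
    for (size_t y = top; y < bottom; y++) {
        uint8_t quad[4];
        for (int k = 0; k < 4; k++)
            quad[k] = cols[k][y - top];
        memcpy(&fb[y * pitch + x], quad, 4); /* one 4-byte write per row */
    }
}
```

Compilers typically turn the fixed-size 4-byte `memcpy` into a single store, so each row costs one write to one cache line instead of four writes spread over four column passes.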

Modern CPUs have write instructions that deliberately avoid the cache, but, unless you write the assembly code yourself, it's tricky (if even possible) to get compilers to use the instructions. Of course, there's no portable way to code it anyway. A shame.
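On x86 there is one route short of hand-written assembly: compiler intrinsics. The sketch below is an illustration under stated assumptions, not something any Doom port is known to use: it copies a finished frame with SSE2's `_mm_stream_si128` (non-temporal stores that bypass the cache) and falls back to plain `memcpy` when the compiler doesn't define `__SSE2__`, which also shows the portability problem. It assumes a 16-byte-aligned destination and a size that is a multiple of 16; `copy_frame_nt` is a made-up name.

```c
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Copy n bytes (n a multiple of 16, dst 16-byte aligned) using
   non-temporal stores where SSE2 is available. */
void copy_frame_nt(void *dst, const void *src, size_t n) {
#if defined(__SSE2__)
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < n / 16; i++)
        _mm_stream_si128(d + i, _mm_loadu_si128(s + i)); /* bypass cache */
    _mm_sfence(); /* order the streaming stores before later writes */
#else
    memcpy(dst, src, n); /* portable fallback: ordinary cached stores */
#endif
}
```

Note that `__SSE2__` is a GCC/Clang convention; MSVC signals SSE2 support differently, so even this guard isn't fully portable.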

Posted 02-22-13 01:30
