Doomworld Forums > Classic Doom > Source Ports > Software vs Hardware timedemos
All times are GMT.
DuckReconMajor
Forum Legend


Posts: 4226
Registered: 01-09


So the other day I decided to run some timedemos to see whether prboom-plus or glboom-plus was running faster on my ThinkPad T61P. I was surprised at the results.
code:
prboom-plus on doom.wad DEMO1: 200.2
glboom-plus on doom.wad DEMO1: 63.2
prboom-plus on nuts.wad: 62.1
glboom-plus on nuts.wad: 44.8
I'm guessing it's because I've got a mobile GPU, but I never realized the difference would be that much.

Post your hardware vs software timedemos here. It'd be interesting to see what runs faster on what.

P.S. I tried disabling all the filtering on glboom-plus and it made no difference in framerate. Anything else I should try turning off?

edit: system specs: Core2Duo T7700 @2.4 and NVIDIA Quadro FX 570M (256.0 MB)

Last edited by DuckReconMajor on 01-09-10 at 19:00

Old Post 01-09-10 18:13 #
entryway
Forum Staple


Posts: 2733
Registered: 01-04


glboom-plus.exe -geom 640x480w -nosound -timedemo demo1 -config testp.cfg
2636 fps

prboom-plus.exe -geom 640x480w -nosound -timedemo demo1 -config testp_soft.cfg
1172 fps

glboom-plus.exe nuts.wad -geom 640x480w -nosound -timedemo nuts -config testp.cfg
164 fps

prboom-plus.exe nuts.wad -geom 640x480w -nosound -timedemo nuts -config testp_soft.cfg
132 fps

demos and configs: http://prboom-plus.sourceforge.net/test_demos.zip
system specs: Core2Duo E6750 2.66 @ 3.0, NVidia 8800GTS (320mb), prboom-plus 2.5.0.6 release

P.S. The configs are set up so that the results are comparable with the prboom/glboom renderers.
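
The four runs above differ only in the executable, the optional PWAD, the demo name and the config, so they can be generated from one helper. A minimal Python sketch, assuming the executables and the configs from test_demos.zip sit in the current directory; the name `timedemo_cmd` is made up for illustration, and only flags already shown in this post are used:

```python
def timedemo_cmd(exe, demo, config, pwad=None, geom="640x480w"):
    """Build the argument list for one timedemo run (flags as in the post above)."""
    cmd = [exe]
    if pwad:
        cmd.append(pwad)  # e.g. nuts.wad, placed right after the exe as above
    cmd += ["-geom", geom, "-nosound", "-timedemo", demo, "-config", config]
    return cmd


RUNS = [
    timedemo_cmd("glboom-plus.exe", "demo1", "testp.cfg"),
    timedemo_cmd("prboom-plus.exe", "demo1", "testp_soft.cfg"),
    timedemo_cmd("glboom-plus.exe", "nuts", "testp.cfg", pwad="nuts.wad"),
    timedemo_cmd("prboom-plus.exe", "nuts", "testp_soft.cfg", pwad="nuts.wad"),
]

for cmd in RUNS:
    print(" ".join(cmd))
    # To actually launch a run: import subprocess; subprocess.run(cmd, check=True)
```

Keeping every flag except the renderer-specific config identical is what makes the GL and software numbers comparable.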

Last edited by entryway on 01-10-10 at 00:32

Old Post 01-09-10 18:40 #
DuckReconMajor
Forum Legend


Posts: 4226
Registered: 01-09


revised specs:

glboom-plus.exe -geom 640x480w -nosound -timedemo demo1 -config testp.cfg
540 fps

prboom-plus.exe -geom 640x480w -nosound -timedemo demo1 -config testp_soft.cfg
458 fps

glboom-plus.exe nuts.wad -geom 640x480w -nosound -timedemo nuts -config testp.cfg
78 fps

prboom-plus.exe nuts.wad -geom 640x480w -nosound -timedemo nuts -config testp_soft.cfg
90 fps

system specs: Core2Duo T7700 @2.4 and NVIDIA Quadro FX 570M (256.0 MB)

weird

Old Post 01-09-10 19:21 #
Never_Again
knows his birth month


Posts: 981
Registered: 04-03


Last edited by Never_Again on 06-13-10 at 17:12

Old Post 01-09-10 23:59 #
entryway
Forum Staple


Posts: 2733
Registered: 01-04



Never_Again said:
glboom-plus.exe nuts.wad -geom 640x480w -nosound -timedemo nuts -config testp.cfg
185.6 fps

system specs: Core2Duo 3.0, NVidia 9600GT (512mb), prboom-plus 2.5.0.7-test (today's build)


Hmm, I thought the 9600 should be slower than the 8800 GTS. 200 fps with the 2.5.0.6 release? I get only 152 with the beta and 164 with the release.


Never_Again said:
Oh, and the DEMO1 is from DOOM.WAD, not DOOM2, right?

My results are for doom2.wad. For demo1 @ doom.wad I get 3150 (GL) and 1173 (soft). After tweaking the cfg (no filtering, no aniso, no statusbar/hud, no messages) I got 3990 fps for demo1 @ doom.wad, heh
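
Fps figures this high are easier to compare as time per frame. A small Python sketch using only the three demo1 @ doom.wad numbers from this post (the function name is just for illustration):

```python
def frame_ms(fps):
    """Average time per frame in milliseconds for a given timedemo fps."""
    return 1000.0 / fps


results = {"soft": 1173, "GL": 3150, "GL tweaked cfg": 3990}

for name, fps in results.items():
    print(f"{name:15s} {fps:5d} fps = {frame_ms(fps):.3f} ms/frame")
```

Seen this way, the cfg tweak that lifts GL from 3150 to 3990 fps saves roughly 0.07 ms per frame.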

Last edited by entryway on 01-10-10 at 00:47

Old Post 01-10-10 00:21 #
HackNeyed
Member


Posts: 383
Registered: 08-04


My eeePC Netbook system specs: Intel Atom N270 @ 1.6GHz, Intel GMA 950 (224mb), prboom-plus 2.5.0.6 release in XP 32bit
code:
DOOM II
demo1-GL 113
demo1-PR 130
nuts-GL 27
nuts-PR 27
DV04-GL 79
DV04-PR 72


My ACER Notebook system specs: AMD Athlon64 X2 TK-57 @ 1.9GHz, ATI X1250 (256mb), prboom-plus 2.5.0.6 release in Vista 32bit
code:
DOOM II
demo1-GL 89
demo1-PR 194
nuts-GL 31
nuts-PR 45
DV04-GL 71
DV04-PR 118


My friend's HP desktop-replacement laptop system specs: AMD Turion II M600 @ 2.4GHz, ATI HD 4200 (320mb), prboom-plus 2.5.0.6 release in Win7 64bit
code:
DOOM II
demo1-GL 383
demo1-PR 409
nuts-GL 66
nuts-PR 76
DV04-GL 185
DV04-PR 213
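
To see at a glance which renderer wins on each of the three machines, the numbers above can be put side by side. A Python sketch using only the figures from this post; the short machine labels and the helper name `faster` are made up here:

```python
# (GL fps, PR fps) per demo, copied from the three result blocks above
results = {
    "eeePC (Atom N270, GMA 950)": {"demo1": (113, 130), "nuts": (27, 27), "DV04": (79, 72)},
    "Acer (Athlon64 X2 TK-57, X1250)": {"demo1": (89, 194), "nuts": (31, 45), "DV04": (71, 118)},
    "HP (Turion II M600, HD 4200)": {"demo1": (383, 409), "nuts": (66, 76), "DV04": (185, 213)},
}


def faster(gl, pr):
    """Name the faster renderer for one pair of fps values."""
    if gl == pr:
        return "tie"
    return "GL" if gl > pr else "PR"


for machine, demos in results.items():
    for demo, (gl, pr) in demos.items():
        print(f"{machine:32s} {demo:6s} GL/PR = {gl / pr:.2f} ({faster(gl, pr)})")
```

On these three integrated-graphics machines the software renderer wins or ties every run except DV04 on the eeePC.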


mmm, I want a fast Core i7 or at least an i5 with a sweet NVIDIA when I can afford it.

Old Post 01-10-10 02:53 #
Super Jamie
Forum Staple


Posts: 2722
Registered: 03-08


It's probably worth pointing out that you also need to be using the same IWAD version as each other, and I'm pretty sure the demos differ between Shareware and Registered.

The old Doombench website tests are made with Shareware v1.9 DEMO3; the author of that page wrote somewhere that he believed the other DEMOs produced inconsistent or artificially inflated benchmark figures.


entryway said:
3990 fps for demo1 @ doom.wad

Holy shit! :D

Old Post 01-11-10 02:53 #
Mike.Reiner
Senior Member


Posts: 1187
Registered: 01-05


AMD 64 X2 4200+ (939, Toledo), 2x1GB (Mushkin, Corsair) DDR 400 (Cas 3), eVGA GeForce 8800GT

glboom-plus.exe nuts.wad -geom 640x480w -nosound -timedemo nuts -config testp.cfg
76.6 fps

prboom-plus.exe nuts.wad -geom 640x480w -nosound -timedemo nuts -config testp_soft.cfg
62.8 fps


I need to get off of socket 939.. ugh.

Old Post 01-11-10 03:18 #
entryway
Forum Staple


Posts: 2733
Registered: 01-04



Mike.Reiner said:
AMD 64 X2 4200+ (939, Toledo)

Does AMD have cache problems similar to the Pentium 4's?

What do you have in stdout.txt for prboom-plus -geom 1024x768w?

Core2:
test case for pitch=1024 is processed 28294 times for 100 msec
test case for pitch=1056 is processed 28896 times for 100 msec
optimized screen pitch is 1056

Pentium4:
test case for pitch=1024 is processed 1130 times for 100 msec
test case for pitch=1056 is processed 18550 times for 100 msec
optimized screen pitch is 1056

As you can see, the Pentium 4 is 16x slower at pitch 1024 than at 1056, while there is no real difference on Core2. I think this is one reason why PrBoom is ~4x faster on a Core2 than on a P4 at the same frequency.
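
One plausible reading of the cache remark: the software renderer draws walls as vertical columns, so consecutive writes are one screen pitch apart, and with a power-of-two pitch those addresses can keep landing in the same few cache sets. The Python sketch below counts the distinct sets touched by one walk down a column. The cache geometry (64-byte lines, 32 sets) is a made-up example, not the real layout of any CPU named in this thread, and the function name is hypothetical:

```python
def sets_touched(pitch, rows=200, line_bytes=64, num_sets=32):
    """Count distinct cache sets hit when stepping down one screen column.

    line_bytes and num_sets describe a hypothetical cache, chosen only
    to illustrate the aliasing effect.
    """
    return len({(y * pitch // line_bytes) % num_sets for y in range(rows)})


for pitch in (1024, 1056):
    print(pitch, sets_touched(pitch))
```

With these assumed parameters a pitch of 1024 keeps reusing 2 sets while 1056 spreads over all 32, which is the kind of difference a padded pitch is meant to buy.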

Old Post 01-11-10 07:26 #
Mike.Reiner
Senior Member


Posts: 1187
Registered: 01-05


test case for pitch=1024 is processed 1618 times for 100 msec
test case for pitch=1056 is processed 8539 times for 100 msec
optimized screen pitch is 1056

Old Post 01-11-10 09:45 #
entryway
Forum Staple


Posts: 2733
Registered: 01-04



Mike.Reiner said:
test case for pitch=1024 is processed 1618 times for 100 msec
test case for pitch=1056 is processed 8539 times for 100 msec


Hence the AMD 64 X2 4200 is also shit, although it is better than the P4: the difference is only 5x instead of 16x. Core2 for the win!
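
The 16x and 5x figures come from dividing the two counts in each stdout.txt excerpt posted in this thread. A short Python check (the dictionary layout and function name are just for illustration):

```python
# Times the test case was processed in 100 msec, per pitch, as posted in this thread
pitch_tests = {
    "Core2": {1024: 28294, 1056: 28896},
    "Pentium4": {1024: 1130, 1056: 18550},
    "AMD 64 X2 4200+": {1024: 1618, 1056: 8539},
}


def pitch_penalty(counts):
    """How many times slower pitch 1024 is than pitch 1056."""
    return counts[1056] / counts[1024]


for cpu, counts in pitch_tests.items():
    print(f"{cpu:16s} {pitch_penalty(counts):.1f}x")
```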

Last edited by entryway on 01-11-10 at 10:08

Old Post 01-11-10 09:51 #
Maes
I like big butts!


Posts: 12666
Registered: 07-06


Probably there's a rendering limit in the software engine, which doesn't actually attempt to render all 4000+ fps, but is always in sync with the screen and thus will never draw more than 85 or 100 frames, picking only a fixed amount per second.

This gives it an advantage over the OpenGL engine, which actually tries to fully render all frames and thus introduces an additional bottleneck. Just my opinion though.

Old Post 01-11-10 10:18 #
entryway
Forum Staple


Posts: 2733
Registered: 01-04



Maes said:
Probably there's a rendering limit in the software engine, which doesn't actually attempt to render all 4000+ fps, but is always in sync with the screen and thus will never draw more than 85 or 100 frames, picking only a fixed amount per second.

Vsync is not implemented for the "windib" SDL video driver, and windib is the default on non-9x platforms.

Old Post 01-11-10 10:21 #
Maes
I like big butts!


Posts: 12666
Registered: 07-06


Probably there's a rendering limit in the software engine, which doesn't actually attempt to render all 4000+ fps, but is always in sync with the screen and thus will never draw more than 85 or 100 frames, picking only a fixed amount per second.

Either that, or the rendering overhead is so trivial in some cases (like in Doom.wad timedemos, which have very simple levels and few monsters) that using OpenGL actually makes things worse in terms of overhead (more probable). In the case of nuts.wad the difference is diminished, but is still dramatic. Seems that OpenGL doesn't do much good when only billboard sprites are involved (perhaps trying something with ultra-complex architecture and BSP trees would make the scale tip?).

In any case, this proves that there are situations where a software renderer may have the edge over an accelerated one: if there's so little visual processing that the OpenGL overhead is not "paid back" by the acceleration.

Old Post 01-11-10 10:23 #
Jodwin
Forum Staple


Posts: 3441
Registered: 02-05



Maes said:
In any case, this proves that there are situations where a software renderer may have the edge over an accelerated one: if there's so little visual processing that the OpenGL overhead is not "paid back" by the acceleration.

Or when your GPU sux. :P One proof that OpenGL is just better is that recording videos with glboom+ gets better frame rates than recording with prboom+, given a good enough gpu. If you don't believe it, you can try recording a Long Days demo with both exes using any screen recorder of your choice.

Old Post 01-11-10 10:28 #
entryway
Forum Staple


Posts: 2733
Registered: 01-04


GL has an advantage over software at high resolutions. On my work computer (Dual Core E5200 @ 2.5, GeForce 9500 GT) I see only a 1.2x slowdown going from 640x480 to 1600x1200 for GL, versus 2x for software. Most people now have LCDs and are obliged to use high resolutions

Old Post 01-11-10 10:42 #
Maes
I like big butts!


Posts: 12666
Registered: 07-06



Jodwin said:

Or when your GPU sux.



If the GPU is slower in rendering a sufficiently complex scene than you can do in pure software with the same quality (as was the case with the S3 "accelerators" back in 1996-1997) then yeah, you can safely say the GPU sucks.

But that's hardly the case today, if you use anything with 1 or more dedicated pixel and vertex pipelines (aka anything better than an S3 Savage and pre-4000 Intel GMA).

What doesn't change is that issuing an OpenGL or Direct3D directive has some software overhead, that's a hard and fast fact.

As with parallel processing, if the problem at hand is simple/small enough, a non-threaded non-parallel version may outperform the multithreaded one, because the overhead of using threads will not be paid back in extreme cases, unless you have a particularly optimized OS with VERY low latency for this sort of stuff.

This explains perfectly why Doom.wad timedemos often beat their GL counterparts: there's so little visual processing that just invoking anything OpenGL is slower than actually rendering the scene in software, before a single thing is even rendered in hardware.


entryway said:
GL has an advantage over software at high resolutions. On my work computer (Dual Core E5200 @ 2.5, GeForce 9500 GT) I see only a 1.2x slowdown going from 640x480 to 1600x1200 for GL, versus 2x for software. Most people now have LCDs and are obliged to use high resolutions


Exactly.

The overhead of calling OpenGL is pretty much constant whether you render nothing, a single pixel, or a Doom 3 scene, and largely independent of the screen resolution. Of course, it's better to have enough stuff for the GPU to do, rather than invoking it needlessly/for little reason. So more complex scenes and higher resolutions will have higher efficiencies over lower ones, which will instead waste most of their time in calling overheads.
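
A rough way to put numbers on this with entryway's 1.2x and 2x figures: assume frame time is a fixed cost plus a cost proportional to the pixel count. That linear model is a simplification made up for this sketch, not something measured in the thread. Under it, the observed slowdown tells you what share of the 640x480 frame time actually depends on resolution:

```python
def per_pixel_share(slowdown, low=(640, 480), high=(1600, 1200)):
    """Share of low-resolution frame time that scales with pixel count.

    Assumes frame_time = fixed + k * pixels, so
    slowdown = (1 - share) + share * pixel_ratio.
    """
    pixel_ratio = (high[0] * high[1]) / (low[0] * low[1])  # 6.25 for these modes
    return (slowdown - 1.0) / (pixel_ratio - 1.0)


for name, slowdown in (("GL", 1.2), ("software", 2.0)):
    print(f"{name:8s} {per_pixel_share(slowdown) * 100:.1f}% of 640x480 frame time scales with pixels")
```

Under that assumption only about 4% of the GL frame time at 640x480 depends on resolution, against about 19% for software; the rest is per-frame cost that does not shrink at low resolution.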

This is more of an OS/driver issue: if the drivers and OS are particularly optimized for dealing with a bazillion micro-OpenGL calls, then maaaaybe there will be little difference, and OpenGL may even slowly recuperate (still, not impressively so). Early Direct3D suffered very badly from this, and needed large execution buffers to be assembled in order to make it actually worth using.

I guess that Linux and nVidia should perform better under these circumstances, while Windows and ATI should be at a disadvantage.


Jodwin said:
One proof that OpenGL is just better is that recording videos with glboom+ gets better frame rates than recording with prboom+, given a good enough gpu. If you don't believe it, you can try recording a Long Days demo with both exes using any screen recorder of your choice.


Could be, but the circumstances are entirely different: in this case OpenGL is NOT required to needlessly fire away a bazillion frames per second, and the better decoupling/parallelism between game engine and rendering allows smoother game engine/input recording, which is harder to achieve in a single-threaded loop that renders in software and updates the game state in the same cycle. Plus, you don't record demos at an accelerated pace ;-)

Last edited by Maes on 01-11-10 at 10:54

Old Post 01-11-10 10:45 #
myk
volveré y seré millones


Posts: 15226
Registered: 04-02



Super Jamie said:
The old Doombench website tests are made with Shareware v1.9 DEMO3; the author of that page wrote somewhere that he believed the other DEMOs produced inconsistent or artificially inflated benchmark figures.
The only difference, from the other demos in the shareware, must be that E1M7 is a bit more complex, so it may tax some aspects more than the other demos. This doesn't mean the effects of the others are artificial. All benchmarks depend on contexts because some systems are better at certain things.

I noticed -nosound lowers realtics a little bit, and I get slightly lower realtics under full DOS than in Windows 98.


Jodwin said:
Or when your GPU sux. :P
And when it sucks, there's an additional reason why OpenGL is not beneficial: visual glitches. In this sense, from personal experience, the only engine that has OpenGL that's good on older hardware is Risen3D... or on my graphics card, at least.


entryway said:
Most people now have LCDs and are obliged to use high resolutions
They are?

Old Post 01-11-10 11:36 #