PrBoom-Plus running slow

Sorry if this has been addressed previously; I tried to do some searches but couldn't find anything.

I'm using PrBoom-Plus a lot these days, and although I really like it, I've noticed that it runs a great deal slower than ZDoom. I'm using the latest version, btw. I have it configured to NOT be graphics intensive (no glBoom) and to emulate original DOOM graphics and behaviors, with "uncapped framerate" selected. However, no matter what wad I play using PrBoom-Plus, including any of the iwads, it is quite a bit slower and laggier than ZDoom. I seriously doubt that it's my system: I have a fast system for starters, and ZDoom is never slow, even with very large wads. Is there anything I can do to speed up PrBoom-Plus, or might there be some odd option that I could disable to make it run quicker?


ZDoom has magical 999 FPS. It can be running a level with 50000 monsters all infighting and doesn't even blip. Nobody can scientifically explain it :P

Vordakk said:

I have it configured to NOT be graphics intensive (no glBoom)

Actually gl mode should be much faster on most systems.

But if you want software rendering, then perhaps you could post your cfg (or at least the business end of it), and it might be possible to offer more specific help. You might want to check if you have enhanced monster AI enabled, and things like that which slow things down rather needlessly. If you are using a vanilla or Boom complevel (recommended for maps that don't use MBF features), then these will be automatically disabled anyway.


I use -complevel 9 with 1680x1050 resolution. I mostly use it to test wads I'm making or to play old stuff like AV or Eternal DOOM.

Quasar said:

ZDoom has magical 999 FPS. It can be running a level with 50000 monsters all infighting and doesn't even blip. Nobody can scientifically explain it :P


Strange, PrBoom-Plus lags much less than ZDoom for me when playing PWADs like Nuts.wad.


IIRC, ZDoom has a garbage collector, which somehow speeds things up significantly by removing redundant data in the RAM, or something along those lines, whereas other ports don't have anything like that. You'd have to ask either Graf, Randy, Gez, Blzut3, or any people like that who understand it better than I to explain it to you in more detail/accuracy.


Actor processing is more intensive in ZDoom since there are a bazillion features that ZDoom has to take into account that PrBoom+ doesn't.

However, other things are much faster. For example, running traces uses a quick blockmap algorithm instead of traversing the BSP tree. PrBoom+ cannot use that faster method because it'd desync demos. Try complevel 0 to force use of that method and see if you get some speedup. If so, that's probably it.
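
To give a rough idea of what a blockmap-based trace involves, here is a minimal sketch that lists the 128x128 blockmap cells a segment passes through, so that only the lines stored in those cells need checking. This is not ZDoom's or PrBoom+'s code: the name `trace_cells`, its parameters, and the integer-only stepping are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define BLOCK_SIZE 128  /* Doom blockmap cells are 128x128 map units */

/* Hypothetical helper: list the blockmap cells crossed by the segment
 * (x1,y1)-(x2,y2). Coordinates are non-negative and relative to the
 * blockmap origin. Writes up to max_cells (bx, by) pairs into cells and
 * returns how many were written. */
int trace_cells(int x1, int y1, int x2, int y2, int cells[][2], int max_cells)
{
    assert(x1 >= 0 && y1 >= 0 && x2 >= 0 && y2 >= 0 && cells && max_cells > 0);

    int bx = x1 / BLOCK_SIZE, by = y1 / BLOCK_SIZE;
    const int ex = x2 / BLOCK_SIZE, ey = y2 / BLOCK_SIZE;
    const int stepx = (x2 >= x1) ? 1 : -1;
    const int stepy = (y2 >= y1) ? 1 : -1;
    const long long adx = llabs((long long)x2 - x1);
    const long long ady = llabs((long long)y2 - y1);
    int nx = abs(ex - bx), ny = abs(ey - by);  /* boundary crossings left */

    /* Distance from the start point to the next cell boundary on each axis. */
    long long distx = (stepx > 0) ? (bx + 1) * BLOCK_SIZE - x1 : x1 - bx * BLOCK_SIZE;
    long long disty = (stepy > 0) ? (by + 1) * BLOCK_SIZE - y1 : y1 - by * BLOCK_SIZE;

    int n = 0;
    cells[n][0] = bx; cells[n][1] = by; n++;

    while ((nx > 0 || ny > 0) && n < max_cells) {
        /* Cross whichever boundary the segment reaches first:
         * distx/adx < disty/ady, compared without division. */
        if (ny == 0 || (nx > 0 && distx * ady < disty * adx)) {
            bx += stepx; distx += BLOCK_SIZE; nx--;
        } else {
            by += stepy; disty += BLOCK_SIZE; ny--;
        }
        cells[n][0] = bx; cells[n][1] = by; n++;
    }
    return n;
}
```

The cost depends on the length of the trace rather than on the size of the map, which is presumably where the speedup over a full BSP walk comes from.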


Wow, thanks! I had no idea there was so much behind-the-scenes disparity between the two source ports. This explains a lot.

Vordakk said:

I use -complevel 9 with 1680x1050 resolution. I mostly use it to test wads I'm making or to play old stuff like AV or Eternal DOOM.

I think ZDoom is the fastest port at high resolutions on levels that aren't stress tests like nuts.wad, sunder.wad, etc. I've tested av.wad map01 and dv.wad map03 at 1920x1080, and ZDoom is ~15% faster than PrBoom-Plus on both. At 640x480 PrBoom-Plus is faster.

BTW, for some reason ZDoom does not show my native 1920x1080 resolution in its resolution list, so I forced it manually through the cfg.


If the speedup is in rendering (pure game logic has no reason to be affected by screen resolution changes), then it must be thanks to Randy's hyper-optimized assembly code.


SDL_Flip can be slow as hell, that's for sure. Nothing like seeing 80% of program execution time being spent in a library call.

The difference when running with the GL backend with EE is pretty amazing. When the ARB PBO extension is enabled, it's even possible to have asynchronous screen updates, so the call returns immediately and some of the work of pushing it down to the card and out to the screen at the next refresh happens on some system thread I don't have to be concerned with.

Ladna said:

I say we blame SDL.


Seconded. SDL may seem OK on a normal machine, but on an older machine (or a slower CPU with lower IPC) SDL is the biggest bottleneck ever.


Quasar said:
The difference when running with the GL backend with EE is pretty amazing. When the ARB PBO extension is enabled, it's even possible to have asynchronous screen updates, ...


Ah, you use a pixel buffer, interesting. I'm assuming that speeds things up quite a bit. Something I'd be interested in doing for other ports.


So the 'flip' stuff is where all the time gets lost. Before I measured the cycles of functions, I always thought the upscaling of the 320x200 screen (as used in choco) was slow, and the speedup via SSE2 in the even cases (x2, x4, x6, x8) was negligible. But after some checking I discovered to my dismay that SDL has a serious handbrake somewhere in the API.
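
For anyone wondering what the upscaling step amounts to, here is a plain scalar sketch of an integer nearest-neighbour upscale of an 8-bit screen. It is not Chocolate Doom's actual code and uses no SSE2; the function name `upscale_nearest` and its parameters are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nearest-neighbour upscale of an 8-bit indexed image
 * by an integer factor. dst must hold (w * scale) * (h * scale) bytes. */
void upscale_nearest(const uint8_t *src, int w, int h, uint8_t *dst, int scale)
{
    assert(src && dst && w > 0 && h > 0 && scale > 0);

    const size_t dst_pitch = (size_t)w * (size_t)scale;

    for (int y = 0; y < h; y++) {
        uint8_t *row = dst + (size_t)y * (size_t)scale * dst_pitch;

        /* Expand one source row horizontally. */
        for (int x = 0; x < w; x++)
            memset(row + (size_t)x * (size_t)scale,
                   src[(size_t)y * (size_t)w + (size_t)x], (size_t)scale);

        /* Duplicate the expanded row for the remaining scale-1 lines. */
        for (int i = 1; i < scale; i++)
            memcpy(row + (size_t)i * dst_pitch, row, dst_pitch);
    }
}
```

For a 320x200 source at x4 this touches 1280x800 bytes per frame, which is small next to what the final screen update has to push.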

Quasar said:

SDL_Flip can be slow as hell, that's for sure. Nothing like seeing 80% of program execution time being spent in a library call

This is what SDL does for SDL_Flip():

HDC hdc, mdc;
int i;

/* Get the window's device context and select the palette, if any. */
hdc = GetDC(SDL_Window);
if ( screen_pal ) {
  SelectPalette(hdc, screen_pal, FALSE);
}
/* Wrap the screen bitmap in a temporary memory DC. */
mdc = CreateCompatibleDC(hdc);
SelectObject(mdc, screen_bmp);
/* Blit every updated rectangle from the memory DC to the window. */
for ( i=0; i<numrects; ++i ) {
  BitBlt(hdc, rects[i].x, rects[i].y, rects[i].w, rects[i].h,
    mdc, rects[i].x, rects[i].y, SRCCOPY);
}
/* Release both device contexts. */
DeleteDC(mdc);
ReleaseDC(SDL_Window, hdc);

I have these values for the software renderer at 1600x1200 without status bar on map01:

         software    GL backend
8bit     140 fps     85 fps
32bit     54 fps     72 fps

Just tested ZDoom on my home computer: 120 fps. On my work computer ZDoom was faster at 1920x1080.


I wonder if using a shader for palette conversion can make GL backend faster.

tempun said:

I wonder if using a shader for palette conversion can make GL backend faster.


I replaced the correct filling of the w*h*4 buffer for GL with memcpy(buffer, pixels, w*h), and there is no fps improvement at either 1600x1200 or 640x480. Even without shaders at all.

void UpdatePixels(unsigned char* dst)
{
  int x, y;

  unsigned int *pal = (unsigned int*)(vid_8ingl.colours +
    256 * vid_8ingl.palette * 4);

  if (V_GetMode() == VID_MODE8)
  {
#if 1
    // Experiment: copy the raw 8-bit indices, skipping palette conversion.
    memcpy(dst, (byte*)vid_8ingl.screen->pixels,
        vid_8ingl.screen->pitch * REAL_SCREENHEIGHT);
#else
    // Correct path: expand each 8-bit index to a 32-bit color via the palette.
    for (y = 0; y < REAL_SCREENHEIGHT; y++)
    {
      byte *px = (((byte*)vid_8ingl.screen->pixels) + y * vid_8ingl.screen->pitch);
      int *py = ((int*)dst) + y * vid_8ingl.width;
      for (x = 0; x < REAL_SCREENWIDTH; x++)
      {
        *(int*)py = pal[*(byte*)px];
        px += 1;
        py += 1;
      }
    }
#endif
  } else if (V_GetMode() == VID_MODE15 || V_GetMode() == VID_MODE16)
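
As a standalone illustration of the palette conversion being skipped in that experiment, the loop boils down to one table lookup per pixel. This is a sketch with made-up names, not PrBoom-Plus code: `expand_indexed` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: expand 8-bit palette indices to 32-bit pixels.
 * src_pitch is in bytes, dst_pitch is in 32-bit pixels, pal has 256 entries. */
void expand_indexed(const uint8_t *src, size_t src_pitch,
                    uint32_t *dst, size_t dst_pitch,
                    int w, int h, const uint32_t pal[256])
{
    assert(src && dst && pal && w >= 0 && h >= 0);

    for (int y = 0; y < h; y++) {
        const uint8_t *in = src + (size_t)y * src_pitch;
        uint32_t *out = dst + (size_t)y * dst_pitch;

        for (int x = 0; x < w; x++)
            out[x] = pal[in[x]];  /* one table lookup per pixel */
    }
}
```

The output buffer is four times the size of the input, which is the w*h*4 versus w*h difference mentioned above.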


I wish there were a free software, professional-grade abstraction layer that does what SDL claims to do.

Video:
- OpenGL
- Linux: DGA/X11
- OS X: Quartz2D
- Windows: DirectDraw

Audio:
- PortAudio/PortMidi

Networking:
- TCP: Steal SDL_Net
- UDP: ENet

Threading:
- Steal SDL_Thread

Input (keyboard/mouse/joystick):
- Linux: XInput2
- OS X: Cocoa
- Windows: Message Loop/XInput

Filesystem:
- Simple wrappers

Stupid C/C++ API differences:
- strcasecmp vs. stricmp, etc.

It would be a fair amount of work but Jesus, don't you want it so badly? I swear to God if I have to :%s/stricmp/strcasecmp/g one more time I'll probably just explode.

Ladna said:

I swear to God if I have to :%s/stricmp/strcasecmp/g one more time I'll probably just explode.

I do agree; however, for this particular problem I prefer a different solution ;)

#ifdef STUPID_PLATFORM
#define strcasecmp  _stricmp  
#define strncasecmp _strnicmp 
#endif
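
A variant of the same idea is to hide the difference behind one small wrapper and key off the compiler's own platform macro. This is only a sketch: `_WIN32`, `_stricmp`, and POSIX `strcasecmp` from `<strings.h>` are real, but the wrapper name `str_casecmp` is invented here.

```c
#include <assert.h>
#include <string.h>
#if !defined(_WIN32)
#include <strings.h>  /* POSIX home of strcasecmp */
#endif

/* Hypothetical wrapper: case-insensitive compare with strcmp-style result. */
int str_casecmp(const char *a, const char *b)
{
    assert(a && b);
#if defined(_WIN32)
    return _stricmp(a, b);
#else
    return strcasecmp(a, b);
#endif
}
```

Callers then use `str_casecmp` everywhere, and the search-and-replace never has to happen again.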


stricmp makes more sense than strcasecmp. The i stands for insensitive, while the case seems to imply it's the case-sensitive version and that the "normal" version strcmp is case-insensitive.

entryway said:

I replaced the correct filling of the w*h*4 buffer for GL with memcpy(buffer, pixels, w*h), and there is no fps improvement at either 1600x1200 or 640x480.

Replacing GL_BGRA with GL_LUMINANCE (the mapped buffer is 4x smaller) roughly doubles the FPS (85 -> 160), and it becomes faster than plain software (140 fps).
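
The 4x figure follows directly from the bytes per pixel. A back-of-the-envelope sketch, assuming unsigned byte components (4 bytes per pixel for GL_BGRA, 1 for GL_LUMINANCE); the helper name `upload_bytes` is made up:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes uploaded per frame for a w x h texture. */
size_t upload_bytes(int w, int h, int bytes_per_pixel)
{
    assert(w > 0 && h > 0 && bytes_per_pixel > 0);
    return (size_t)w * (size_t)h * (size_t)bytes_per_pixel;
}
```

At 1600x1200, `upload_bytes(1600, 1200, 4)` is 7,680,000 bytes per frame and `upload_bytes(1600, 1200, 1)` is 1,920,000 bytes per frame.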

