Vordakk Posted July 26, 2012
Sorry if this has been addressed previously; I tried searching but couldn't find anything. I'm using PrBoom-Plus a lot these days, and although I really like it, I've noticed that it runs a great deal slower than ZDoom. I'm using the latest version, btw. I have it configured NOT to be graphics-intensive (no GLBoom), to emulate the original DOOM graphics and behaviors, and with "uncapped framerate" selected. However, no matter what wad I play using PrBoom-Plus, including any of the iwads, it is quite a bit slower and laggier than ZDoom. I seriously doubt that it's my system: for starters I have a fast machine, and secondly ZDoom is never slow, even with very large wads. Is there anything I can do to speed up PrBoom-Plus, or might there be some odd option I could disable to make it run quicker?
Quasar Posted July 26, 2012
ZDoom has magical 999 FPS. It can be running a level with 50000 monsters all infighting and doesn't even blip. Nobody can scientifically explain it :P
Grazza Posted July 26, 2012
Vordakk said: "I have it configured to NOT be graphics intensive (no glBoom)"
Actually, GL mode should be much faster on most systems. But if you want software rendering, then perhaps you could post your cfg (or at least the business end of it), and it might be possible to offer more specific help. You might want to check whether you have enhanced monster AI enabled, and things like that which slow things down rather needlessly. If you are using a vanilla or Boom complevel (recommended for maps that don't use MBF features), these will be disabled automatically anyway.
Vordakk Posted July 26, 2012
I use -complevel 9 with 1680x1050 resolution. I mostly use it to test wads I'm making or to play old stuff like AV or Eternal DOOM.
Archy Posted July 26, 2012
Quasar said: "ZDoom has magical 999 FPS. It can be running a level with 50000 monsters all infighting and doesn't even blip. Nobody can scientifically explain it :P"
Strange, PrBoom-Plus lags much less than ZDoom for me when playing PWADs like nuts.wad.
Blastfrog Posted July 26, 2012
IIRC, ZDoom has a garbage collector, which somehow speeds things up significantly by removing redundant data from RAM, or something along those lines, whereas other ports don't have anything like that. You'd have to ask Graf, Randy, Gez, Blzut3, or other people who understand it better than I do to explain it to you in more detail/accuracy.
Gez Posted July 26, 2012
Actor processing is more intensive in ZDoom, since there are a bazillion features that ZDoom has to take into account that PrBoom+ doesn't. However, other things are much faster; for example, running traces uses a quick blockmap algorithm instead of traversing the BSP tree. PrBoom+ cannot use that faster method because it'd desync demos. Try complevel 0 to force use of that method and see if you get some speedup. If so, that's probably it.
Vordakk Posted July 26, 2012
Wow, thanks! I had no idea there was so much behind-the-scenes disparity between the two source ports. This explains a lot.
entryway Posted July 27, 2012
Vordakk said: "I use -complevel 9 with 1680x1050 resolution. I mostly use it to test wads I'm making or to play old stuff like AV or Eternal DOOM."
I think ZDoom is the fastest port at high resolutions on levels that aren't stress tests like nuts.wad, sunder.wad, etc. I've tested on av.wad MAP01 and dv.wad MAP03 at 1920x1080, and ZDoom is ~15% faster than PrBoom-Plus on both. At 640x480, PrBoom-Plus is faster. BTW, for some reason ZDoom does not show my native 1920x1080 resolution in the resolutions list, and I had to force it manually through the cfg.
Gez Posted July 27, 2012
If the speedup is in rendering (pure game logic has no reason to be affected by screen resolution changes), then it must be thanks to Randy's hyper-optimized assembly code.
Quasar Posted July 31, 2012
SDL_Flip can be slow as hell, that's for sure. Nothing like seeing 80% of program execution time being spent in a library call. The difference when running with the GL backend with EE is pretty amazing. When the ARB PBO extension is enabled, it's even possible to have asynchronous screen updates, so the call returns immediately and some of the work of pushing it down to the card and out to the screen at the next refresh happens on some system thread I don't have to be concerned with.
Csonicgo Posted August 1, 2012
Ladna said: "I say we blame SDL."
Seconded. SDL may seem OK on a normal machine, but on an older machine (or a lower-speed CPU with lower IPC) SDL is the biggest bottleneck ever.
Coraline Posted August 1, 2012
Quasar said: "The difference when running with the GL backend with EE is pretty amazing. When the ARB PBO extension is enabled, it's even possible to have asynchronous screen updates, ..."
Ah, you use a pixel buffer, interesting. I'm assuming that speeds things up quite a bit. It's something I'd be interested in doing for other ports.
_bruce_ Posted August 1, 2012
So the 'flip' stuff is where all the time gets lost. Before I measured the cycles of functions, I always thought the upscaling of the 320x200 screen (as used in Choco) was slow and that the speedup via SSE2 in the even cases (x2, x4, x6, x8) was negligible. But after some checking, I discovered to my dismay that SDL has a serious handbrake somewhere in the API.
entryway Posted August 2, 2012
Quasar said: "SDL_Flip can be slow as hell, that's for sure. Nothing like seeing 80% of program execution time being spent in a library call."
This is what SDL does for SDL_Flip():

    HDC hdc, mdc;
    int i;

    hdc = GetDC(SDL_Window);
    if ( screen_pal ) {
        SelectPalette(hdc, screen_pal, FALSE);
    }
    mdc = CreateCompatibleDC(hdc);
    SelectObject(mdc, screen_bmp);
    for ( i = 0; i < numrects; ++i ) {
        BitBlt(hdc, rects[i].x, rects[i].y, rects[i].w, rects[i].h,
               mdc, rects[i].x, rects[i].y, SRCCOPY);
    }
    DeleteDC(mdc);
    ReleaseDC(SDL_Window, hdc);

I have these values for software 1600x1200 without status bar on MAP01:

              software    GL backend
    8-bit     140 fps     85 fps
    32-bit     54 fps     72 fps

Just tested ZDoom on my home computer — 120 fps. On my work computer ZDoom was faster at 1920x1080.
tempun Posted August 4, 2012
I wonder if using a shader for palette conversion could make the GL backend faster.
entryway Posted August 4, 2012
tempun said: "I wonder if using a shader for palette conversion can make GL backend faster."
I replaced the correct filling of the w*h*4 buffer for GL with memcpy(buffer, pixels, w*h) and there is no FPS improvement at 1600x1200 or 640x480. Even without shaders at all.

    void UpdatePixels(unsigned char* dst)
    {
        int x, y;
        unsigned int *pal = (unsigned int*)(vid_8ingl.colours + 256 * vid_8ingl.palette * 4);

        if (V_GetMode() == VID_MODE8)
        {
    #if 1
            memcpy(dst, (byte*)vid_8ingl.screen->pixels,
                   vid_8ingl.screen->pitch * REAL_SCREENHEIGHT);
    #else
            for (y = 0; y < REAL_SCREENHEIGHT; y++)
            {
                byte *px = ((byte*)vid_8ingl.screen->pixels) + y * vid_8ingl.screen->pitch;
                int *py = ((int*)dst) + y * vid_8ingl.width;

                for (x = 0; x < REAL_SCREENWIDTH; x++)
                {
                    *(int*)py = pal[*(byte*)px];
                    px += 1;
                    py += 1;
                }
            }
    #endif
        }
        else if (V_GetMode() == VID_MODE15 || V_GetMode() == VID_MODE16)
Ladna Posted August 4, 2012
I wish there were a free software, professional-grade abstraction layer that does what SDL claims to do.

Video:
- OpenGL
- Linux: DGA/X11
- OS X: Quartz2D
- Windows: DirectDraw

Audio:
- PortAudio/PortMidi

Networking:
- TCP: Steal SDL_Net
- UDP: ENet

Threading:
- Steal SDL_Thread

Input (keyboard/mouse/joystick):
- Linux: XInput2
- OS X: Cocoa
- Windows: Message Loop/XInput

Filesystem:
- Simple wrappers

Stupid C/C++ API differences:
- strcasecmp vs. stricmp, etc.

It would be a fair amount of work, but Jesus, don't you want it so badly? I swear to God, if I have to :%s/stricmp/strcasecmp/g one more time I'll probably just explode.
Quasar Posted August 5, 2012
Ladna said: "I swear to God if I have to :%s/stricmp/strcasecmp/g one more time I'll probably just explode."
I do agree; however, for this particular problem I prefer a different solution ;)

    #ifdef STUPID_PLATFORM
    #define strcasecmp  _stricmp
    #define strncasecmp _strnicmp
    #endif
Gez Posted August 5, 2012
stricmp makes more sense than strcasecmp. The i stands for insensitive, while the case seems to imply it's the case-sensitive version and that the "normal" version strcmp is case-insensitive.
entryway Posted August 5, 2012
entryway said: "I replaced correct filling of w*h*4 buffer for GL with memcpy(buffer, pixels, w*h) and there is no any fps improvement at 1600x1200 and 640x480."
Replacing GL_BGRA with GL_LUMINANCE (the mapped buffer is 4x smaller) doubles the FPS (85 -> 160), and it becomes faster than clean software rendering (140 fps).