
Prboom+ Signal 11: The Story Continues

Back in 2011 I was experiencing problems using prboom+: the engine would randomly crash due to signal 11. I reported the bug here, but before I could post any useful information, the bug stopped appearing. I don't remember what I did to stop it (or if it just stopped on its own) but recently it reappeared. I switched to the newer prboom+ 2.5.1.4 and the crashes still occur.

My OS is Windows 7.

stdout.txt:

M_LoadDefaults: Load system defaults.
 default file: C:\Users\radioshack\Desktop\wad\prboom-plus-2.5.1.4.test/prboom-plus.cfg
 found C:\Users\radioshack\Desktop\wad\prboom-plus-2.5.1.4.test/prboom-plus.wad

PrBoom-Plus v2.5.1.4 (http://prboom-plus.sourceforge.net/)
 found C:\Users\radioshack\Desktop\wad\Doom.wad
IWAD found: C:\Users\radioshack\Desktop\wad\Doom.wad
PrBoom-Plus (built May 26 2013 23:39:20), playing: The Ultimate DOOM
PrBoom-Plus is released under the GNU General Public license v2.0.
You are welcome to redistribute it under certain conditions.
It comes with ABSOLUTELY NO WARRANTY. See the file COPYING for details.
V_Init: allocate screens.
V_InitMode: using 32 bit video mode
I_CalculateRes: trying to optimize screen pitch
 test case for pitch=5120 is processed 5742 times for 100 msec
 test case for pitch=5152 is processed 5740 times for 100 msec
 optimized screen pitch is 5120
I_InitScreenResolution: Using resolution 1280x768
 found C:\Users\radioshack\Desktop\wad\prboom-plus-2.5.1.4.test/prboom-plus.wad
 found C:\Users\radioshack\Desktop\wad\Megawads\DTWID.wad
 found C:\Users\radioshack\Desktop\wad\Doom.wad
D_InitNetGame: Checking for network game.
W_Init: Init WADfiles.
 adding C:\Users\radioshack\Desktop\wad\Doom.wad
 adding C:\Users\radioshack\Desktop\wad\prboom-plus-2.5.1.4.test/prboom-plus.wad
 adding C:\Users\radioshack\Desktop\wad\Megawads\DTWID.wad
W_InitCache

Loading DEH lump from C:\Users\radioshack\Desktop\wad\Megawads\DTWID.wad
Loading DEH file C:\Users\radioshack\Desktop\wad\Megawads\DTWID.deh
M_Init: Init miscellaneous info.
R_Init: Init DOOM refresh daemon - 
R_LoadTrigTables: Endianness...ok.
R_InitData: Textures Flats Sprites 
R_Init: R_InitPlanes R_InitLightTables R_InitSkyMap R_InitTranslationsTables R_InitPatches 
P_Init: Init Playloop state.
I_Init: Setting up machine state.
I_InitSound:  configured audio device with 1024 samples/slice
Fluidplayer: Fluidsynth version 1.1.3
fl_init: error loading soundfont SGM-V2.01.sf2
portmidiplayer device list:
  MMSystem:Microsoft MIDI Mapper
  MMSystem:Microsoft GS Wavetable Synth
  MMSystem:Timidity++ Driver
  MMSystem:BASSMIDI Driver
portmidiplayer: Opening device MMSystem:Microsoft MIDI Mapper for output
I_InitSound: sound module ready
S_Init: Setting up sound.
S_Init: default sfx volume 15
HU_Init: Setting up heads up display.
I_InitGraphics: 1280x768
I_UpdateVideoMode: 0xe0000000, SDL buffer, direct access
SetRatio: width/height parameters 1280x768
SetRatio: storage aspect ratio 5:3
SetRatio: assuming square pixels
SetRatio: display aspect ratio 5:3
SetRatio: overruled by user configuration setting
SetRatio: revised display aspect ratio 4:3
SetRatio: gl_ratio 1.600000
SetRatio: multiplier 1/1
ST_Init: Init status bar.
vorb_registersong: failed
mad_registersong failed: input buffer too small (or EOF)
db_registersong: couldn't load as tracker
Exp_RegisterSongEx: Using player portmidi midi player
vorb_registersong: failed
mad_registersong failed: input buffer too small (or EOF)
db_registersong: couldn't load as tracker
Exp_RegisterSongEx: Using player portmidi midi player
P_GetNodesVersion: using normal BSP nodes
P_GetNodesVersion: using normal BSP nodes
vorb_registersong: failed
mad_registersong failed: input buffer too small (or EOF)
db_registersong: couldn't load as tracker
Exp_RegisterSongEx: Using player portmidi midi player
vorb_registersong: failed
mad_registersong failed: input buffer too small (or EOF)
db_registersong: couldn't load as tracker
Exp_RegisterSongEx: Using player portmidi midi player
P_GetNodesVersion: using normal BSP nodes
vorb_registersong: failed
mad_registersong failed: input buffer too small (or EOF)
db_registersong: couldn't load as tracker
Exp_RegisterSongEx: Using player portmidi midi player
P_GetNodesVersion: using normal BSP nodes
vorb_registersong: failed
mad_registersong failed: input buffer too small (or EOF)
db_registersong: couldn't load as tracker
Exp_RegisterSongEx: Using player portmidi midi player
P_GetNodesVersion: using normal BSP nodes
I_SignalHandler: Exiting on signal: signal 11
I_ShutdownSound: 


stdout.txt

I_CalculateRes: trying to optimize screen pitch
 test case for pitch=5120 is processed 5742 times for 100 msec
 test case for pitch=5152 is processed 5740 times for 100 msec
 optimized screen pitch is 5120
I_InitScreenResolution: Using resolution 1280x768

This is interesting...are you timing the memory cache system by increasing the horizontal resolution? If so, cool!

kb1 said:

This is interesting...are you timing the memory cache system by increasing the horizontal resolution? If so, cool!

Sometimes it makes sense:

Core2 (1x)
test case for pitch=1024 is processed 28294 times for 100 msec
test case for pitch=1056 is processed 28896 times for 100 msec

AMD 64 X2 4200 (5x)
test case for pitch=1024 is processed 1618 times for 100 msec
test case for pitch=1056 is processed 8539 times for 100 msec

Pentium4 (16x)
test case for pitch=1024 is processed 1130 times for 100 msec
test case for pitch=1056 is processed 18550 times for 100 msec


IIRC, old versions of prboom run faster at 1600x1200 than at 1024x768.

PrBoom has a simple check:

if (!(SCREENPITCH % 1024))
  SCREENPITCH += 32;
PrBoom+ uses a test function instead, because 1024 is not the only pitch that is noticeably slower on some old hardware.
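
One plausible reason a pitch that is a multiple of 1024 hurts column drawing is cache set aliasing: every row of the column maps to the same few cache sets. The sketch below only models that effect; it is not PrBoom+ code. The cache geometry (64-byte lines, 64 sets) and the helper name `distinct_sets` are assumptions for illustration, and `adjust_pitch` simply wraps the PrBoom check quoted above.

```c
#include <stddef.h>

#define LINE_BYTES 64u   /* assumed cache line size, not measured */
#define NUM_SETS   64u   /* assumed number of cache sets, not measured */

/* Count how many distinct cache sets are touched by the first byte of
   each row when a column of `rows` pixels is drawn with the given
   pitch (bytes per screen row). */
unsigned distinct_sets(size_t pitch, unsigned rows)
{
    unsigned char seen[NUM_SETS] = {0};
    unsigned count = 0;

    for (unsigned y = 0; y < rows; y++) {
        size_t set = ((size_t)y * pitch / LINE_BYTES) % NUM_SETS;
        if (!seen[set]) {
            seen[set] = 1;
            count++;
        }
    }
    return count;
}

/* The simple PrBoom check from the post, wrapped in a function. */
size_t adjust_pitch(size_t pitch)
{
    if (!(pitch % 1024))
        pitch += 32;
    return pitch;
}
```

With these assumed numbers, a 200-row column at pitch 1024 lands in only 4 sets, while pitch 1056 spreads over all 64. Real hardware differs, which is presumably why timing the candidates is more robust than a fixed rule.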

GreyGhost said:

It might be my old nemesis the SDL_mixer, see if disabling music makes a difference.

IIRC, SDL_mixer causes crashes even without a SIGSEGV message. If he got "signal 11", then something is wrong with prboom-plus, and I need the address of the crash (-devparm), plus the exe and map used.


Important:
Is it a random crash at unpredictable times, that seems to only happen with PrBoom ?

Is it predictable where it crashes when starting PrBoom, always at the same place ?
- immediately restart in exactly the same way, and report if it fails in exactly the same way

Does it depend on which PWAD or IWAD is loaded, or which game is being run ?
- does the failure change when a different PWAD or game is selected

Test another comparable program with many ptrs ?
video game
Linux kernel compile
another unrelated Doom port

Run PrBoom under a debugger to get exactly which instruction is segfaulting.
Save segfault location for three failures.
If location is not consistent, then suspect memory failure.
So much data are memory ptrs that random memory errors will hit one sooner than you will notice an odd pixel or draw on the screen.
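
To illustrate why a flipped bit in a pointer is so much more visible than one in pixel data, here is a minimal sketch. The function names and the example values are made up for illustration; nothing here comes from PrBoom.

```c
#include <stdint.h>

/* Return `value` with bit number `bit` inverted, as a single-bit
   memory error would leave it. */
uint32_t flip_bit(uint32_t value, unsigned bit)
{
    return value ^ (UINT32_C(1) << bit);
}

/* Absolute distance between two 32-bit values. */
uint32_t abs_diff(uint32_t a, uint32_t b)
{
    return a > b ? a - b : b - a;
}
```

Flipping bit 20 of the hypothetical address 0x00400000 gives 0x00500000, a full megabyte away and quite possibly unmapped, so the next dereference segfaults. Flipping bit 2 of a pixel value 0x80 gives 0x84, a slightly different shade nobody will notice.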


Signal 11: Segmentation fault

- random: overclocking, heat, low virtual memory,
memory problem (bit flip hitting a ptr address)

- PrBoom only, for all wads: blame PrBoom

- PrBoom only on certain wads: blame the wad, and maybe PrBoom is not checking adequately for corrupt or ZDoom wads.

Almost every cause of Sig11 (it is about Linux kernel compiles, but it also covers sig11 problems in general rather well):
http://www.bitwizard.nl/sig11/


it is random
it does not depend on iwad/pwad
it happens only in software mode, probably in R_DrawColumn
[/vanga mode off]


If you're still using that SDL_mixer postmix callback from PrBoom 2.x, then you're crashing because the channels[] array is not protected with a semaphore. Get some proper multithreading in there.


Thanks for all the responses. I'll try playing with -devparm and updating when the game crashes again. Also I might try cleaning my heat sinks.

Interestingly enough, while the crashes almost always occur inconsistently in random places, a while back there was one place where the problem kept occurring. During the first trek into the central courtyard of Coffee Break Map11, the game crashed several times while I was fighting the cyberdemon. After a while this stopped happening, and I can now play the map without the bug occurring.

Quasar said:

If you're still using that SDL_mixer postmix callback from PrBoom 2.x, then you're crashing because the channels[] array is not protected with a semaphore. Get some proper multithreading in there.

Are you directing this post at me? Honestly I have no idea what any of this means.


Still would help to narrow the possibilities.
It is likely more than one segfault source exists in PrBoom and SDL mixer. Two people getting a segfault does not mean it is the same cause. Unless you are exchanging information privately, there has not been enough to exclude these possibilities.

I find the report that it went away and then came back again later to be most suspicious. Software faults do not react that way without some environment change (changing your config settings would do it though, or selecting different options).

Tests for the SDL mixer suspect.
1. Faults in the mixer should vary with different music and sound (long vs. short) because that affects contention for mixer resources.
The fault should vary with different wads.
2. Software draw mode should not affect software faults in the sound mixer.
3. Can be tested by turning off sound effects and music. Does the segfault stop ?
4. If the SDL mixer faults this much, it should also fault the same with other SDL mixer programs. Try DoomLegacy, it uses SDL mixer too.
(It might also segfault if there is a memory failure just due to similarity to PrBoom layout). Try other SDL games too.
5. A debugger can verify the segfault is in SDL mixer code.

Tests for drawer faults.
1. Usually caused by releasing a texture from memory while status bar drawing is still using it. Some textures have multiple uses.
In DoomLegacy, I had to resort to locking all status bar textures so they cannot be released.
2. Test by disabling in code all the releasing and purging of memory allocation. Mostly, this can be done by modifying Z_Free (or the equivalent). Does the segfault stop ?
3. Some other user gets the same fault on a different machine.
4. Run the program in a debugger and record the exact failure location for three failures. Software failures will be consistent in some way, like the address, or the instruction that fails.
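
The locking idea from item 1 can be sketched as a lock count that the purge pass respects. This is only an illustration under assumed names (`cached_tex_t`, `tex_lock`, `tex_purge`); it is not how DoomLegacy or PrBoom actually implement their zone memory.

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical cache entry, not taken from any real port. */
typedef struct {
    void *pixels;  /* NULL when the texture is not resident */
    int   locks;   /* > 0 while some drawer still uses the texture */
} cached_tex_t;

/* Make the texture resident; returns nonzero on success. */
int tex_load(cached_tex_t *t, size_t size)
{
    if (!t->pixels)
        t->pixels = calloc(1, size);
    return t->pixels != NULL;
}

void tex_lock(cached_tex_t *t)   { t->locks++; }
void tex_unlock(cached_tex_t *t) { if (t->locks > 0) t->locks--; }

/* Purge pass: release only entries nobody holds a lock on.
   Returns the number of entries freed. */
int tex_purge(cached_tex_t *tab, int count)
{
    int freed = 0;
    for (int i = 0; i < count; i++) {
        if (tab[i].pixels && tab[i].locks == 0) {
            free(tab[i].pixels);
            tab[i].pixels = NULL;
            freed++;
        }
    }
    return freed;
}
```

A status bar drawer would call `tex_lock` once at init and never unlock, so a purge triggered by a level load cannot pull the patch out from under it.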

Tests for memory failure faults.
1. Unfortunately, the existence of only one program, or even just software draw triggering the segfault does not prove anything. It is a matter of putting a memory pointer in the failing location while toggling the neighbor bits in a contrary way.
The particular program that fails due to a particular memory fault is not related to its size, nor is it predictable.
2. Memory test programs cannot find all kinds of memory failure.
I have more than once written a memory test program to try to find a fault. None of them ever found anything. In one case it was an operating system problem, and for the other I changed the memory chips.
3. If anyone can verify the same segfault on a different machine, then it cannot be memory failure. Just having segfaults (like due to SDL mixer) by itself does not prove this segfault is not memory failure. It wastes time to try chasing down in software a fault that is hardware based. However, software modifications can move the fault location around and temporarily mask memory failure.
Same fault, different machines, is the best discriminant.
4. Make sure all your memory is the same brand. With mixed brands you must adjust memory settings by hand (some BIOSes do not do this well).
5. Swap the memory chip locations and retest. Does the segfault change character ?
6. Run with only one memory module enabled at a time. Test for segfault.
7. Alter the BIOS memory settings, wait states, and retest.
8. Disable cache and retest.
9. Change all the memory out and retest. Most drastic.
One person reported that all his chips had same failure pattern, and that swapping positions did not affect failure. Swapping out all chips did cure it.
10. Start another execution intensive program first, leave it running, and retest PrBoom.
This moves the PrBoom execution location (but not so much in cache).
Does the segfault change in any way ? If it is software, it should not be affected in any way. If it is memory failure, it should change. It might still segfault, but in a different location.
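
For reference, the kind of simple memory test program mentioned in item 2 might look like the sketch below. It is an assumption-laden illustration (the function names are invented), and it shares the limitation described above: it only exercises whatever memory the OS hands to the process, with one access pattern, so a clean result proves very little.

```c
#include <stddef.h>
#include <stdint.h>

/* Write a pattern (XORed with the word index so that neighbouring
   words differ), read it back, and count mismatches. */
size_t pattern_test(volatile uint32_t *buf, size_t words, uint32_t pattern)
{
    size_t errors = 0;

    for (size_t i = 0; i < words; i++)
        buf[i] = pattern ^ (uint32_t)i;
    for (size_t i = 0; i < words; i++)
        if (buf[i] != (pattern ^ (uint32_t)i))
            errors++;
    return errors;
}

/* Run one pass per bit position, each pass based on a single set bit.
   Returns the total number of mismatching words seen. */
size_t walking_bit_test(volatile uint32_t *buf, size_t words)
{
    size_t errors = 0;

    for (unsigned bit = 0; bit < 32; bit++)
        errors += pattern_test(buf, words, UINT32_C(1) << bit);
    return errors;
}
```

A dedicated boot-time memory tester covers far more than this, and even those miss pattern-sensitive or heat-related failures, which is why swapping modules (items 5 to 9) is the stronger test.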

Processingcontrol said:

Are you directing this post at me? Honestly I have no idea what any of this means.

Nope, that was toward entryway. He earlier rebuffed my advice to add concurrency protection to the channel structs in PrBoom-Plus's i_sound.c, and I have never checked since then to see if it was ever implemented.

The SDL_audio core's audiospec callback is registered as the "run" function for an SDL thread. This means that, with respect to the main application, it runs asynchronously. SDL_mixer in turn calls its postmix from the function it registers as the SDL audiospec callback.

If the audiospec callback preempts the main application thread during I_StartSound or I_StopSound, or the main application thread preempts the audiospec callback in order to change any of the data in the channels structure, then a race condition is absolutely inevitable.

Any amount of simply wishing this wasn't the case won't prevent it, and setting an affinity flag does not (and let me emphatically stress this as much as possible) prevent threads of the same application from preempting each other. Affinity only restricts those threads to all sharing the same CPU or core for scheduling.
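
A minimal sketch of the kind of protection being described, assuming a pthread mutex and an invented `channel_t` layout. This is not the contents of i_sound.c; in an SDL program the same effect is normally obtained by wrapping the game-thread side in SDL_LockAudio() and SDL_UnlockAudio(), which keep the audio callback from running in between.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_CHANNELS 8

/* Hypothetical channel record; field names are illustrative only. */
typedef struct {
    const unsigned char *data;  /* next sample to mix, NULL when idle */
    size_t remaining;           /* samples left to play */
} channel_t;

static channel_t channels[MAX_CHANNELS];
static pthread_mutex_t channels_lock = PTHREAD_MUTEX_INITIALIZER;

/* Game thread: start a sound. Both fields change under the lock, so
   the mixer can never see a new length paired with an old pointer. */
void start_sound(int ch, const unsigned char *data, size_t len)
{
    pthread_mutex_lock(&channels_lock);
    channels[ch].data = data;
    channels[ch].remaining = len;
    pthread_mutex_unlock(&channels_lock);
}

/* Game thread: stop a sound. */
void stop_sound(int ch)
{
    pthread_mutex_lock(&channels_lock);
    channels[ch].data = NULL;
    channels[ch].remaining = 0;
    pthread_mutex_unlock(&channels_lock);
}

/* Audio thread: consume up to `frames` samples from every channel.
   Returns the sum of the samples as a stand-in for real mixing. */
long mix_chunk(size_t frames)
{
    long acc = 0;

    pthread_mutex_lock(&channels_lock);
    for (int i = 0; i < MAX_CHANNELS; i++) {
        channel_t *c = &channels[i];
        size_t n = c->remaining < frames ? c->remaining : frames;

        for (size_t k = 0; k < n; k++)
            acc += c->data[k];
        if (n > 0) {
            c->data += n;
            c->remaining -= n;
        }
        if (c->remaining == 0)
            c->data = NULL;
    }
    pthread_mutex_unlock(&channels_lock);
    return acc;
}
```

Without the lock, the mixer can be preempted between reading `remaining` and reading `data`, then resume with a length that belongs to one sound and a pointer that belongs to another (or to none), which is exactly the sort of read past a buffer that ends in signal 11.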

wesleyjohnson said:

Tests for memory failure faults.


Speculation about memory faults is useless. I've used machines with memory faults. They BSOD and reset constantly. This is not a memory fault. Never blame the end user's hardware when something is obviously a software error. Lazy and shameful to even think about it until everything else has been absolutely ruled out.


I have run PrBoom 2.5 on XP and Linux, with SDL, and have not seen segfault problems.
A software fault in PrBoom should not depend on the OS so strongly that my machines would remain completely unaffected.

I keep memory faults open as a possibility because they are difficult to differentiate from software, and the reported fault is suspiciously like memory failure. It is usually the first to get eliminated. The tests only have to show any characteristic that excludes the memory fault possibility.

Memory faults that do not BSOD have been reported frequently. I have had machines with them, and they did not BSOD. Linux has proven capable of finding faulty memory where Windows would not notice.
It is easy to suspect that PrBoom could find faulty memory where Windows would not BSOD. It only requires that the failing bit be outside of the Windows OS itself, and that is most of memory.

It is impossible to exclude all software possibilities first because no software is ever bug free. That time will never occur and no one could ever prove that it had arrived. To not check the memory possibility until software has been looked at leads to never checking the memory.

As far as I am concerned, you can do both in parallel.

I usually do one test from one possibility and one from another until something important happens.

You should make some attempt to exclude the memory failure possibility or confirm a connection. While a comprehensive memory test is impossible and there will be no absolute answer of memory perfection ... it is worth at least a half hour's effort.


First time this happened was 2011 (what time of year was that?).
Then it stopped.
Now it is happening again.
Does that correspond to summer, winter, and then summer again ???
Should give the HEAT possibility some more consideration (in addition to the above efforts).
