Maes

Only on DW: Mochadoom Techdemo [now with some fixes]


It's very hard to debug in this way, but seeing how the problem occurs by running and stopping the program repeatedly makes me think of memory leaks in the JVM somewhere -and those can only occur in native code- as well as some hard to detect memory corruption.

BTW, I fixed the wiper effect in the 1.2 release of the techdemo, maybe you want to run that and see if it progresses any more -or at least crashes in a more indicative manner ;-)

Also, since I suspect you're the only person here with a single-core CPU, maybe the multithreaded renderer enters some sort of race condition -which perversely I can't debug on any of my dev machines. This would also explain why you get lockups rather than a clear Exception or JVM crash message -what you describe sounds more like a deadlock of sorts.

So I made a special version which uses the serial renderer for the "game", just to eliminate -or confirm that hypothesis.

Edit: I couldn't confirm that on a single core Pentium III Celeron 1200 I had lying around, since it ran the parallel version just fine. However the benchmark results were interesting:

Serial spinner: 35 fps
Parallel spinner: 34 fps.
Game demo: loaded just fine and played at near normal speed.

If you consider that the RAM was 384 MB of PC-100 SDRAM and the graphics card an AGP 2x S3 Virge (yes, AGP 2x!) I think that's pretty amazing for 960 x 600 resolution, especially a Java program.
Sadly that means that the crash you're experiencing might be related to something more subtle like bad memory or hardware conflicts :-/


It still locks in the same place. It starts the screen wipe, then locks and munches 50% CPU. Benchmarks work fine, as always.


*sigh* I don't know what else to do. I've already reinstalled Java and checked the version and all that.

Edit: going -Xint and warping to a map works. I can move around, but slowly.


Well, since it doesn't crash outright you could run java with the -Xprofile switch, start a new game and let it run -even in its hung state- for one minute straight and post the whole console output including the thread loggers so I can see where most of the time is spent (on bad video cards, a lot of time is spent flushing the video buffers).

Just remember that the -warp argument must go after mochadoom.bat, while -Xprofile (and any VM arguments) right after java

e.g.

java -Xprofile -cp mochadoom.jar i/Main > logfile.txt
will generate a nice logfile that can be uploaded to pastebin.

Also, sorry to ask you but have you tried focusing the window and moving around right after the wipe?

It would also be interesting to see the whole specs of your computer (unless you're using the same exact Gateway box as Creaphis), however as I said it ran successfully in a really dismal configuration.

Also, try -warping to a level without Xint.

EDIT:

Another thing to try, directly related to video is to run java with the -Dsun.java2d.noddraw=true parameter e.g.
java -Dsun.java2d.noddraw=true -cp mochadoom.jar i/Main
and see what happens. Strangely, it gave me a noticeable framerate boost with an ATI 4250 integrated videocard; I expected quite the opposite to happen.


Seems there's some sort of deadlock in the wiper -something is apparently too magic on your PC and somehow it manages to start a single wall rendering thread without starting a floor thread, thus deadlocking on one of the barriers.

I have no idea why only you are getting this anomalous situation, and I can't really debug by proxy -plus you said you're getting the same lockup with the serial version, so somehow the "main" thread gets trapped in the wiper and never rejoins with the rest of the program, which would explain why it never gets a chance to fire up the floor rendering thread.
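To illustrate the kind of stall described above, here is a minimal Java sketch. It assumes the renderer synchronises a wall worker and a floor worker on a two-party CyclicBarrier; BarrierStallDemo and loneWorkerTimesOut are hypothetical names, not Mochadoom code. The only point is that a timed await() turns a silent hang into a detectable timeout when one party never shows up.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class; names are illustrative and not taken from Mochadoom.
final class BarrierStallDemo {

    /**
     * Starts only the "wall" worker against a barrier that expects two parties
     * (wall + floor). Returns true if the lone worker gave up with a timeout
     * instead of blocking forever.
     */
    static boolean loneWorkerTimesOut(long timeoutMillis) {
        final CyclicBarrier frameDone = new CyclicBarrier(2);
        final boolean[] timedOut = {false};

        Thread wall = new Thread(() -> {
            try {
                // An untimed await() would block here indefinitely,
                // because the second party is never started.
                frameDone.await(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                timedOut[0] = true;
            } catch (InterruptedException | BrokenBarrierException e) {
                // Not expected in this demo; fall through and report "no timeout".
            }
        }, "wall-renderer");

        wall.start();
        try {
            wall.join(); // join() makes the worker's write to timedOut visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return timedOut[0];
    }
}
```

Calling `BarrierStallDemo.loneWorkerTimesOut(50)` should come back after roughly 50 ms rather than locking up, which is the behaviour one would want from a renderer that can log the problem and recover.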

If you wish, we can arrange finding a common time via PM where I can send you specially modified debug versions and you can mail back observations, or hope that I "smarten up" enough to detect this vexing bug myself.

The only real difference between the spinners and the game is that the wiper never gets called. Similarly, starting a game with -warp should prevent the wiper from showing up and ruining shit -unless you end a game with F7 or die on nukage and restart a level.


I tried -warp e1m1 and got this:


http://pastebin.ca/2019856

I still see "Called!" in there...

EDIT: Figured it out. Setting the CPU affinity to one core before starting the wiper actually makes it behave as intended. Setting the CPU affinity back to normal doesn't crash the game. calling the wipe code again normally will lock the game.


what the hell?

So yeah, running the wipe with one core does the trick! can that be fixed somehow?


Since you got this crash even with the serial version -which doesn't explicitly use any threading tools-, the CPU affinity itself only masks the actual problem.

I will look into it and, if necessary, I'll introduce a forced timeout in the wiper which would cause it to end if it happens to lag behind. I'll try setting some timing-sensitive variables to "volatile" and see if that changes the behavior for the better.
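A forced timeout of that kind could look roughly like the sketch below. It assumes the wipe is driven by a step function that returns true once the effect has finished; WipeWatchdog, runWipe and the injected millisecond clock are illustrative names and not the actual wiper code.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

// Hypothetical watchdog around a wipe loop; not the actual Mochadoom wiper.
final class WipeWatchdog {

    /**
     * Repeats wipeStep until it reports completion or until maxMillis have
     * elapsed on the supplied clock. Returns true if the wipe finished on its
     * own, false if the watchdog had to cut it short.
     */
    static boolean runWipe(BooleanSupplier wipeStep, LongSupplier millisClock, long maxMillis) {
        final long deadline = millisClock.getAsLong() + maxMillis;
        while (!wipeStep.getAsBoolean()) {
            if (millisClock.getAsLong() >= deadline) {
                return false; // lagging behind: abandon the effect instead of hanging
            }
        }
        return true;
    }
}
```

With the real clock this would be called as `WipeWatchdog.runWipe(step, System::currentTimeMillis, 2000)`, where the 2000 ms budget is an arbitrary choice. The clock is passed in so the behaviour can be tested without waiting.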

The ideal would be for me to be able to debug step-by-step on your machine -deadlocking bugs are hard to reproduce, but we'll have to settle for the next best thing :-)

Edit: it might possibly be related to this glitch with Java's System.nanoTime() (also explained here), which is JVM and even OS/CPU specific: if even a single thread is allowed to "drift" from CPU to CPU it's possible to get negative elapsed times, since each core keeps its own time. On some OSes like Linux this is accounted for, and I've never had this problem on any of the Intel CPUs (and on one quad-core Athlon II X4 640).
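One rough way to check whether a given machine shows this effect is to sample the clock many times and count how often a reading is earlier than the previous one. The sketch below is only a diagnostic under assumptions: ClockProbe and countBackwardSteps are made-up names, and a single busy thread may never migrate between cores during the run, so a result of zero proves nothing.

```java
import java.util.function.LongSupplier;

// Hypothetical diagnostic helper; not part of Mochadoom.
final class ClockProbe {

    /**
     * Reads the clock `samples` times and counts how many readings went
     * backwards relative to the one before.
     */
    static int countBackwardSteps(LongSupplier clock, int samples) {
        int backward = 0;
        long previous = clock.getAsLong();
        for (int i = 1; i < samples; i++) {
            long current = clock.getAsLong();
            // Compare via the difference, as recommended for nanoTime values.
            if (current - previous < 0) {
                backward++;
            }
            previous = current;
        }
        return backward;
    }
}
```

For a real run, `ClockProbe.countBackwardSteps(System::nanoTime, 1_000_000)` would be the obvious call; any non-zero result means elapsed times computed from that clock can come out negative.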

I take it you have a socket 939 Athlon X2 installed?

I will try three "soft" fixes:

  1. Changing a timeout check to ticks-- > 0 from ticks-- != 0 in the wiper code alone, with no other changes. (Variation 12a)
  2. As 12a, plus using System.currentTimeMillis() instead of System.nanoTime(). This has the side effect of limiting maximum timer resolution to 15-16 ms on Windows, so the maximum frame rate under any circumstance can only be about 67 FPS, but it should not require any particular processor affinity. Linux systems should be less affected, as that function has true millisecond resolution there. (Variation 12b)
  3. As 12a, plus using a "sanitized" getTime method: if a discrepancy is detected, such as attempting to return a number of elapsed tics smaller than the previously returned time, then that previous time is returned instead. You will also get warning messages in the console saying "Timer discrepancies detected :" and a count of how many have occurred. (Variation 12c)

Please try all three of them without setting affinity manually, especially if you have an older Athlon 64 X2 multicore, and tell me which one works best for you (perversely, I have one at my hometown but can't test it right now -_- )
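The "sanitized" getTime idea from variation 12c can be sketched as follows. This is a guess at the general shape and not the code that shipped in the test builds: SanitizedTimer and its raw tic source are hypothetical, and only the warning text is taken from the description above.

```java
import java.util.function.LongSupplier;

// Hypothetical wrapper illustrating the 12c idea; not the actual Mochadoom timer.
final class SanitizedTimer {
    private final LongSupplier rawTics;   // underlying (possibly glitchy) tic source
    private long last = Long.MIN_VALUE;   // highest value handed out so far
    private int discrepancies = 0;

    SanitizedTimer(LongSupplier rawTics) {
        this.rawTics = rawTics;
    }

    /** Returns elapsed tics, never smaller than the previously returned value. */
    long getTime() {
        long now = rawTics.getAsLong();
        if (now < last) {
            // Time appears to have gone backwards: report it and repeat the old value.
            discrepancies++;
            System.err.println("Timer discrepancies detected : " + discrepancies);
            return last;
        }
        last = now;
        return now;
    }

    int discrepancyCount() {
        return discrepancies;
    }
}
```

Note that this only hides backward jumps. While the raw source sits below the last good value the timer stands still, so callers see a stall followed by a catch-up rather than a negative elapsed time.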

Edit 2:

Apparently AMD has released a Windows-only utility for AMD dual cores found on http://support.amd.com/us/Pages/dynamicDetails.aspx?ListID=c5cd2c08-1432-4756-aafa-4d9dc646342f&ItemID=153

"The AMD Dual-Core Optimizer helps to correct the resulting video performance effects or other incorrect timing effects that these applications may experience on dual-core processor systems, by periodically adjusting the core time-stamp-counters, so that they are synchronized."


Also, try installing the latest CPU driver as well (it doesn't come built-in with SP3, nor is it obtainable from auto updates).

In any case, I will decide the best course of action (e.g. leave everything as is and advise using external fixes, revert to millisecond timers or try to write a special AMD-only timing function that will try to account for two different timing bases).


Friggin' AMD!

Alright i'll test all of these.
edit: 12a works!
edit: 12c works!
edit: 12b works, but it is a little slow.

so all the soft fixes work. 12c seems to be the best since I do get the wipe.


EDIT!!!!: Installing that AMD utility makes the original work perfectly! THANKS!

Csonicgo said:

Friggin' AMD!

Alright i'll test all of these.
edit: 12a works!
edit: 12c works!


I'm more concerned that the 12a fix alone merely masks the negative elapsed time problem. Is the 12c version spitting out discrepancy errors as you go?

Try invoking the wiper repeatedly by starting/ending games to see if you get discrepancies some or all of the time -the code is running on singletics, so GetTick() is only called by the wiper, really.

Csonicgo said:

so all the soft fixes work. 12c seems to be the best since I do get the wipe.


So that means that 12a alone ends it prematurely or somesuch? How does 12b behave? Do you get time discrepancy warnings with 12c at the console?

Csonicgo said:

EDIT!!!!: Installing that AMD utility makes the original work perfectly! THANKS!


Then that's the only right way to go. I had in mind another trick like e.g. making a special set of timers specifically for AMD, taking into account that the timebase can be one of two values.

However this little adventure has prompted me to make the ticker function customizable -I'll include the default plus some alternate implementations ;-)
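A customizable ticker along those lines might be shaped like the sketch below. Everything here is an assumption for illustration: the GameTicker interface, the two implementations and the injected clocks are invented names, and the only fixed point is Doom's rate of 35 tics per second.

```java
import java.util.function.LongSupplier;

/** Hypothetical ticker abstraction: elapsed game tics at Doom's 35 tics per second. */
interface GameTicker {
    int TICRATE = 35;

    int getTime();
}

/** Nanosecond-based ticker: fine resolution, but exposed to per-core counter drift. */
final class NanoGameTicker implements GameTicker {
    private final LongSupplier nanos;
    private final long base;

    NanoGameTicker(LongSupplier nanos) {
        this.nanos = nanos;
        this.base = nanos.getAsLong();
    }

    @Override
    public int getTime() {
        return (int) ((nanos.getAsLong() - base) * TICRATE / 1_000_000_000L);
    }
}

/** Millisecond-based ticker: coarser, but read from the OS wall clock. */
final class MilliGameTicker implements GameTicker {
    private final LongSupplier millis;
    private final long base;

    MilliGameTicker(LongSupplier millis) {
        this.millis = millis;
        this.base = millis.getAsLong();
    }

    @Override
    public int getTime() {
        return (int) ((millis.getAsLong() - base) * TICRATE / 1000L);
    }
}
```

The game would then pick one at startup, for example `GameTicker ticker = new NanoGameTicker(System::nanoTime);` or `new MilliGameTicker(System::currentTimeMillis)`, without the rest of the code caring which clock sits underneath.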

Again, thank you a lot for testing this out for me :-)


12c didn't spit out much of anything but the errors were of a different "formatting". only thing I saw "new" was this:

RWI Buffer resized. Actual capacity 5760
yeah 12a ended the wipe prematurely. 12b was a little slow and jerked around a bit.


EDIT: after installing the AMD hotfix you linked, now everything works. they all wipe.


EDIT2: I quit and started a new game multiple times and the wipe stays smooth and consistent now with no timing issues. I feel that installing that thing actually may have fixed this nonsense. :v

EDIT3: Maes I want to hug you. Updating those CPU drivers and installing that fix made my YAMAHA Softsynth work again. Bromance, dude.


I'm surprised that 12b caused slowdown -maybe currentTimeMillis() is extremely slow for this task. Then again it has the advantage of being consistent. By "slow and jerky", do you mean the wipe, or did it actually cause the game itself to be slow and jerky?

12a does not account for screwed up timebases -it just takes into account that a negative time can throw the tick counter below zero.
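Why the comparison matters, as a minimal sketch. It assumes the wiper counts its work down with a post-decrement loop, as the two conditions quoted for variation 12a suggest; WipeCountdown is a made-up class, and the cap parameter exists only so the demonstration terminates.

```java
// Hypothetical demonstration of the two loop conditions; not the actual wiper code.
final class WipeCountdown {

    /** Old-style check: keeps going while the counter is non-zero (capped for the demo). */
    static long iterationsNotEqualZero(int ticks, long cap) {
        long iterations = 0;
        while (ticks-- != 0 && iterations < cap) {
            iterations++;
        }
        return iterations;
    }

    /** 12a-style check: stops immediately if the counter is zero or negative. */
    static long iterationsGreaterThanZero(int ticks) {
        long iterations = 0;
        while (ticks-- > 0) {
            iterations++;
        }
        return iterations;
    }
}
```

With a sane tick count both loops run the same number of times. With a negative count the `!= 0` version only stops once the int has wrapped all the way around to zero, which is roughly four billion iterations per call without the cap, while the `> 0` version does nothing at all. That matches the behaviour reported in this thread: a lockup before the fix and a wipe that ends early with it.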

12c actually does check for errors vs the timebase, but that means it still has 1/2 chance to "stall" the timer until it gets scheduled on the "correct" core, and would cause random speedups/slowdown if used in a correctly timed game of Doom.

There are ways to improve this behavior, but since it's AMD's fault for screwing up -and even official games come with the AMD utility- it's not my job to try and fix them ;-)

The RWI message is from the rendering subsystem -it just means it allocated more resources for parallel column drawing dynamically.

Maes said:

I'm surprised that 12b caused slowdown -maybe currentTimeMillis() is extremely slow for this task. Then again it has the advantage of being consistent. By "slow and jerky", do you mean the wipe, or did it actually cause the game itself to be slow and jerky?

the wipe was. it was like a cracked record, honestly, it would smoothly wipe, then skip, then smoothly wipe. doing it now it is smooth... Daaaaaaaaaamn.

Edit: I would link to that fix in your documentation, Maes, it is a lifesaver for your program and pretty much every program that bypasses the OS for timing.


Yeah, probably it was AMD again screwing with the system timer itself -nanoTime() gets read from the core's internal timer and is the most vulnerable, currentTimeMillis() is reported directly from Windows, and even THAT is struggling to keep consistent ;-)

Csonicgo said:

Hey Maes in the little demos you made me, some of them had some extra features you were working on! :P


I can only think of the weapon bobbing as a major change, plus there's a delay at the start while the translucency color map is being computed -dunno if I left translucency in the demos though ;-)

I will release a new demo version as soon as I create a customized timer system, and perhaps get some linedef actions to work. Also on the list: complete scaling up for menus and fixing the status bar and messages.

