Maes

On the timing precision of source ports

The subject was raised in another thread, but I thought it merits its own discussion here.

The seminal post was Gez's:

Gez said:

A tic lasts 1/35th of a second, right? So how many milliseconds does that make? 28.571428571428571428571428571429... milliseconds. Problem is, not many OSes/libraries let you use floating point values for milliseconds in waiting and synchronizing functions, so a tic ends up lasting exactly 28 milliseconds. So 35 tics end up lasting only 0.98 seconds.


I also heard the "28 ms" figure being quoted for ZDoom and derivatives, but from my experience with Windows programming, it's a well-known fact that many basic timing functions only allow RTC accuracy (aka 64 Hz, or 15.625 ms), so even the "millisecond" timers can only actually measure times with a general formula of

t = 15m + 16n
where m and n are non-negative integers. Anyone who has experimented with Windows process timing knows what I'm talking about: all times measured with "stopwatch"-type code seem to consist of a sum of integer multiples of 15 or 16 ms, and may even be rounded down to zero if too short. In any case, the room for error is always at least +/- 8 ms.
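
To see where the 15s and 16s come from, here is a small sketch. It assumes a clock that advances every 1/64 s (15.625 ms) but is reported in whole milliseconds. The function names are made up for illustration and no Windows API is called.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model: a 64 Hz clock whose value is reported in whole
// milliseconds, i.e. reading number `step` is floor(step * 1000 / 64).
std::uint32_t coarse_reading_ms(std::uint32_t step)
{
    return static_cast<std::uint32_t>(step * 1000ull / 64);
}

// Differences between consecutive readings for the first `count` steps.
std::vector<std::uint32_t> reading_deltas(std::uint32_t count)
{
    assert(count > 0);
    std::vector<std::uint32_t> deltas;
    for (std::uint32_t k = 0; k < count; ++k)
        deltas.push_back(coarse_reading_ms(k + 1) - coarse_reading_ms(k));
    return deltas;
}
```

Under that assumption every delta is 15 or 16 ms, and any longer measured interval is a sum of such deltas, which is the t = 15m + 16n pattern.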

This has implications for Doom source ports, as it's not possible to measure either 28 or 29 ms exactly on Windows (unless one uses the more accurate nanosecond-grade timers, which however are not present in all APIs). The closest one can get in practice is two timer periods, which read as 31 or 32 ms. This means that a second in a port using the "millisecond accuracy" Windows timer can have anything from 31.25 to 32.26 tics, roughly 3 to 4 tics short of the intended value of 35. That's a pace about 8-11% slower, more than enough to affect gameplay and player performance by giving an unfair advantage (that, alone, would be a good reason not to accept non-vanilla demos, even if everything else was absolutely equal).
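
The arithmetic behind those figures can be checked with a couple of helper functions. This is only a sketch: the names are hypothetical and 35 tics/sec is taken as the nominal rate.

```cpp
#include <cassert>

// Tics per second if every tic lasts tic_ms milliseconds.
double tics_per_second(double tic_ms)
{
    assert(tic_ms > 0.0);
    return 1000.0 / tic_ms;
}

// Slowdown relative to the nominal 35 tics/sec, in percent.
// Negative values mean the game runs faster than nominal.
double slowdown_percent(double tic_ms)
{
    return (35.0 - tics_per_second(tic_ms)) / 35.0 * 100.0;
}
```

A 32 ms tic gives 31.25 tics/sec (about 10.7% slow) and a 31 ms tic gives about 32.26 tics/sec (about 7.8% slow). A fixed 28 ms tic comes out about 2% fast.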

Vanilla Doom used the 8253 PIT to derive a reasonably accurate timing (thanks, Quasar!), which would work out to 35.094 tics/sec. It could be made even more accurate, but let's say that this is the "canon", and it works out to almost exactly 28.5 ms.

On OSes and libraries that actually give you millisecond accuracy, using a fixed 28 or 29 ms interval as a substitute would be a poor choice, as the effective ticrate would be 35.7 or 34.5, respectively. Alternating between 28 and 29 ms for each timing interval would give an average of 28.5 ms, getting closer to vanilla, while, in theory, libraries and OSes with better-than-ms accuracy timers should have no problems.
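
The alternation idea generalizes: carry the rounding remainder from one tic to the next, and tic lengths in whole milliseconds average out to the target rate. A minimal sketch follows, with a hypothetical class name and the nominal 35 Hz as the target. It is not taken from any particular port.

```cpp
#include <cassert>
#include <cstdint>

// Hands out tic lengths in whole milliseconds so that `rate` consecutive
// tics add up to exactly 1000 ms. The remainder of 1000 / rate is carried
// over to the next tic instead of being dropped.
class TicLengthScheduler
{
public:
    explicit TicLengthScheduler(std::uint32_t rate) : rate_(rate), carry_(0)
    {
        assert(rate > 0);
    }

    // Length of the next tic in milliseconds.
    std::uint32_t next_ms()
    {
        carry_ += 1000;
        const std::uint32_t len = carry_ / rate_;
        carry_ -= len * rate_;
        return len;
    }

private:
    std::uint32_t rate_;
    std::uint32_t carry_;
};
```

With a rate of 35, each tic comes out as 28 or 29 ms, and 35 tics from a fresh scheduler sum to exactly 1000 ms. Strict 28/29 alternation would instead give 35 tics in 997 or 998 ms.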

It would be interesting to compile a table to see which timing functions are used by which source ports on which OSes (some libraries, e.g. SDL, may map to different precision timers depending on the OS).

Maes said:

(that, alone, would be a good reason not to accept non-vanilla demos, even if everything else was absolutely equal)

Or to make an 8253 category, considering that otherwise you'd have to dump a lot of demos and speedrunners simply for not using outdated hardware.

I don't think it's very wise to limit speedrunning to only those who have disposable income that greatly eclipses the actual game's cost. Even if ~10% more reaction time isn't insignificant, it still seems like a silly move considering it may not just be Windows that has this issue.


Well, COMPET-N still only accepts demos recorded with doom.exe and doom2.exe regardless of platform, so one already has to jump through some hoops to get them working on modern hardware. That includes emulators, VMs, and, surprise-surprise, booting to actual DOS, at least for as long as PCs remain "100% IBM PC compatibles" ;-)

I doubt many people boot to actual DOS nowadays, however, so VMs and emulators will necessarily receive the lion's share of the attention. It all boils down to how accurate 8253 PIT emulation is under each emulator, or whether it's done at all.

I think that DOSBox goes to great lengths to be as accurate as possible in emulating the 8253 and other subsystems (to the point of pixel-accurate timing), while others like OracleVM, QEMU etc. might not be that accurate (and in fact, many VMs are inadequate for real-time gaming, especially if a single-tasking OS or exclusive runtime with strict timing assumptions is used).

For non-vanilla ports, I pointed to some possible solutions if a timing problem is indeed identified: switching to higher-precision timers when possible (though that may affect OS/version support), or using averaging techniques to mitigate systematic deviations from the "ideal" timing.

As to whether a 10% slowdown could be considered "cheating" or not, remember that human reaction time is considered to be close to 0.1 sec, so in a demo which is intended to showcase real-time skill, I'd say that's quite a non-negligible factor. Though calling it "cheating" would not be exactly accurate: it's more similar to how certain Olympic records were attributed to high altitude. Yet none of them were invalidated, and many were broken afterwards, so it's just something we have to live with ;-)


I don't think any self-respecting Windows program that needs precise timing is still using GetTickCount which may be the only function still being affected by RTC accuracy.

The multimedia timer is millisecond precise and that's what most software uses.
The only exception may be old NT versions (pre-XP)


The error doesn't have to be accumulative - you can correct for it in the next frame. If you're really cool, you can use the sound mix buffer pointer for timing, and enjoy 1/44,100th timing precision!

Graf Zahl said:

I don't think any self-respecting Windows program that needs precise timing is still using GetTickCount which may be the only function still being affected by RTC accuracy.


Even the so-called "Multimedia Timers" default to the "classic" 15-16 ms, unless explicitly changed. So just a "MM Timer" doesn't guarantee that you get the "full" millisecond resolution (which would, by itself, still be inadequate for Dooming).

kb1 said:

The error doesn't have to be accumulative - you can correct for it in the next frame.


Yup, if you can get actual millisecond accuracy you could use an alternation of 28 and 29 ms to get an average of 28.5 ms in the long run. If you can't get better than Windows's "traditional" 15-16 ms however, there's nothing you can do, short of allowing a VERY short 15 or 16 ms tic once in a while to compensate for all the other, overlong tics before it.

kb1 said:

If you're really cool, you can use the sound mix buffer pointer for timing, and enjoy 1/44,100th timing precision!


How exactly can you do that on Windows or Linux? Even on oldschool DOS, with direct Sound Blaster programming, the "DSP" generated interrupts only when DMA transfers began and ended, not during each ADC/DAC read/write cycle, and then the "timing" you'd get could be quite coarse, depending on how large the DMA buffer/packet size was.

Maes said:

Even the so-called "Multimedia Timers" default to the "classic" 15-16 ms, unless explicitly changed. So just a "MM Timer" doesn't guarantee that you get the "full" millisecond resolution (which would, by itself, still be inadequate for Dooming).



That info is completely out of date. I have been using the MM timer for many, many years on XP, Vista, 7 and 8.1 and not even once did it show such a low timing resolution. As I said, that may have been true on older NT-based Windows systems predating XP.


Here's what I get when running this code:

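	// Needs <windows.h> and linking against winmm.lib for timeGetTime().
	// FStringf is presumably ZDoom's printf-style string helper.
	for (int i = 0; i < 50; i++)
	{
		OutputDebugString(FStringf("%u\n", timeGetTime()));
		Sleep(0);	// give up the rest of the time slice between readings
	}
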
10850627
10850627
10850628
10850628
10850628
10850628
10850629
10850629
10850629
10850629
10850629
10850629
10850629
10850629
10850629
10850630
10850630
10850630
10850630
10850630
10850630
10850630
10850630
10850630
10850630
10850630
10850630
10850630
10850630
10850631
10850631
10850631
10850631
10850631
10850631
10850631
10850631
10850631
10850631
10850631
10850632
10850632
10850632
10850632
10850632
10850632
10850633
10850633
10850633
10850633

It's clearly in 1ms resolution, and it has been like that for more than 10 years in my programs.


Indeed, timeGetTime has existed since Windows 2000 Professional.

Sadly, many programming languages, when implemented for the Win32 environment, still have their own timing functions piggybacking on GetTickCount. I can cite Java's System.currentTimeMillis() and Fortran's (both in GCC and Intel's implementations) various DATE_AND_TIME, SYSTEM_CLOCK etc. subroutines. It's ofc possible to use System.nanoTime() in Java or call the more accurate Win32 API timing functions directly in Fortran, but try explaining the latter to a lab class full of freshmen ;-)

My point is that this Windows limitation is still around, ready to rear its ugly head every now and then (especially as a novice developer's trap).

Also, any Win32 source ports made pre-XP (or compiled with settings that allow compatibility with Windows 9x) might still be affected by it, even if they are running under a more capable environment. It all depends on how the availability (or lack thereof) of the more accurate timers is handled at the compiler and/or runtime level.


QueryPerformanceCounter and QueryPerformanceFrequency are used to work out epochs on Microsoft platforms in every engine I've worked with professionally. It requires a bit of extra math on your part to get the time value you want, as they're literally a performance counter and the processor's speed respectively. The best granularity I've seen on it when converting to a double-precision float is around 525 cycles (but that's also purely because I don't ever use the raw values returned).
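
The "extra math" could look roughly like the sketch below. It takes the raw counter values and the counts-per-second figure as plain integers, as one would obtain them from QueryPerformanceCounter and QueryPerformanceFrequency, but it calls neither, so it compiles on any platform. The function name is hypothetical, and 35 Hz is assumed as the tic rate.

```cpp
#include <cassert>
#include <cstdint>

// Converts a span of raw counter ticks into whole game tics at 35 Hz.
// `start` and `now` are raw counter values, `frequency` is counts per second.
std::int64_t counter_to_tics(std::int64_t start, std::int64_t now,
                             std::int64_t frequency)
{
    assert(frequency > 0 && now >= start);
    const std::int64_t elapsed = now - start;
    // Split into whole seconds and a remainder so that the multiplication
    // by 35 stays small even after a long uptime.
    const std::int64_t seconds = elapsed / frequency;
    const std::int64_t rest = elapsed % frequency;
    return seconds * 35 + rest * 35 / frequency;
}
```

Staying in integers avoids the precision loss that can appear when a large raw counter value is converted to floating point first.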

You can get close to the correct 35Hz granularity using that and a combination of Sleep. Kinda. Sleep is a tricky one. It does not mean "Halt execution of this thread for this many milliseconds", it means "Halt execution of this thread for at least this many milliseconds and resume execution whenever the OS thread scheduler damn well feels like it". If your system or process is under high load, a Sleep( 0 ) can in fact schedule you milliseconds (or even full seconds) into the future.

Maes said:

I can cite Java's System.currentTimeMillis()



That'd be lack of reading the docs, not a real problem with the function. It never claims to be millisecond precise.

From the documentation of this function:

Returns the current time in milliseconds. Note that while the unit of time of the return value is a millisecond, the granularity of the value depends on the underlying operating system and may be larger. For example, many operating systems measure time in units of tens of milliseconds.


And from past experience developing for old JavaMobile phones I can confirm that you can't take anything for granted here. Among different devices I got precisions ranging from 1 ms up to 50(!) ms.


I wonder what Doom95 or the early WinBoom, WinDoom etc. ports use ;-)


I think they use timeGetTime as well. That function originates from 16 bit Windows and had been present since forever.

AFAIK it was only pre-XP NT-versions which did not have 1ms precision, but not Win9x.


Well...




Turns out that both are used...somewhere. And I should probably get a life.


GetTickCount is used by the C runtime library, i.e., it's always present in the import table, so this isn't saying anything.
timeGetTime, on the other hand must be explicitly used to get a reference to it.


Well....then we need to see which ports with access to a true millisecond-accuracy timer use only "short tics" (28 ms), only "long tics" (29 ms), or an averaging technique. The Way Doom Was Meant To Be Played (TM) must be preserved, for great justice!

Maes said:

Yup, if you can get actual millisecond accuracy you could use an alternation of 28 and 29 ms to get an average of 28.5 ms in the long run. If you can't get better than Windows's "traditional" 15-16 ms however, there's nothing you can do, short of allowing a VERY short 15 or 16 ms tic once in a while to compensate for all the other, overlong tics before it.

What I meant was, if you subtract last time from current time, you may be off a few ms. from frame to frame, but you'll still achieve an average of exactly 35 fps. You may be off 15 ms., but, even after 10 minutes, you'll still be off by only 15 ms.
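
A sketch of that idea: derive the current tic from the total elapsed time rather than adding up per-frame waits. The function names are hypothetical, the rate is the nominal 35 Hz, and the coarse variant simply models a clock that only advances in steps of `step_ms`.

```cpp
#include <cassert>
#include <cstdint>

// Game tic that should be current `elapsed_ms` after the start, at 35 Hz.
// Because the tic number comes from total elapsed time, an error in one
// clock reading does not carry over into later readings.
std::uint64_t tic_at(std::uint64_t elapsed_ms)
{
    return elapsed_ms * 35 / 1000;
}

// Same, but with the clock reading rounded down to a coarse step first
// (for example step_ms = 15 for a timer that ticks every 15 ms).
std::uint64_t tic_at_coarse(std::uint64_t elapsed_ms, std::uint64_t step_ms)
{
    assert(step_ms > 0);
    return tic_at(elapsed_ms / step_ms * step_ms);
}
```

Even ten minutes in (600000 ms), the coarse version is never more than one tic behind the exact one, since the reading itself is never more than one step stale.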

Maes said:

How exactly can you do that on Windows or Linux? Even on oldschool DOS, with direct Sound Blaster programming, the "DSP" generated interrupts only when DMA transfers began and ended, not during each ADC/DAC read/write cycle, and then the "timing" you'd get could be quite coarse, depending on how large the DMA buffer/packet size was.


In Windows, you can do your own audio mixing/buffer management via winmm.dll. If you use buffers of 44100/35 = 1260 samples (1260 bytes for 8-bit mono), you'd get a callback every 1/35th of a second (and your next buffer had better be ready).
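
The buffer size depends on the sample format as well as the rate. A small sketch of the calculation, with a hypothetical function name and no audio API involved:

```cpp
#include <cassert>
#include <cstdint>

// Bytes of PCM audio that play for exactly one game tic.
// Assumes sample_rate is a whole multiple of tic_rate (44100 / 35 = 1260).
std::uint32_t bytes_per_tic(std::uint32_t sample_rate, std::uint32_t tic_rate,
                            std::uint32_t channels, std::uint32_t bits_per_sample)
{
    assert(tic_rate > 0 && sample_rate % tic_rate == 0);
    assert(bits_per_sample % 8 == 0);
    const std::uint32_t frames = sample_rate / tic_rate; // sample frames per tic
    return frames * channels * (bits_per_sample / 8);
}
```

For 44100 Hz at 35 tics/sec this gives 1260 bytes for 8-bit mono and 5040 bytes for 16-bit stereo.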

Back to the multimedia timers: you can use timeGetTime(), but to get 1 ms precision you must first call timeBeginPeriod(1) once, and this is global to that Windows session :(. However, I would guess that almost anyone setting timeBeginPeriod is probably setting it to 1 anyway :)


Graf Zahl said:

That info is completely out of date. I have been using the MM timer for many, many years on XP, Vista, 7 and 8.1 and not even once did it show such a low timing resolution. As I said, that may have been true on older NT-based Windows systems predating XP.


Here's what I get when running this code:

	for (int i = 0; i < 50; i++)
	{
		OutputDebugString(FStringf("%u\n", timeGetTime()));
		Sleep(0);
	}
10850627
10850627
10850628
10850628
10850628
10850628
10850629
10850629

etc.

It's clearly in 1ms resolution, and it has been like that for more than 10 years in my programs.

That's because GetTickCount is using the same counter as timeGetTime, and some program on your machine (or a library you're using) is calling timeBeginPeriod(1). To prove it, re-run your code above after calling timeBeginPeriod(50). (But some other app on your box may set timeBeginPeriod frequently.)
Even on your machine, the first GetTickCount is most likely only accurate to 50 ms or so, but, after timeBeginPeriod(1), each subsequent call to timeGetTime is accurate, relative to the last one. Again, perhaps some app on your machine is calling timeBeginPeriod(1); I'd almost guarantee it.

GooberMan said:

QueryPerformanceCounter and QueryPerformanceFrequency are used to work out epochs on Microsoft platforms in every engine I've worked with professionally. It requires a bit of extra math on your part to get the time value you want, as they're literally a performance counter and the processor's speed respectively. The best granularity I've seen on it when converting to a double-precision float is around 525 cycles (but that's also purely because I don't ever use the raw values returned).

You can get close to the correct 35Hz granularity using that and a combination of Sleep. Kinda. Sleep is a tricky one. It does not mean "Halt execution of this thread for this many milliseconds", it means "Halt execution of this thread for at least this many milliseconds and resume execution whenever the OS thread scheduler damn well feels like it". If your system or process is under high load, a Sleep( 0 ) can in fact schedule you milliseconds (or even full seconds) into the future.

Yes, if your processor and OS support it, the QueryPerformance functions are the best by far, as they tick based on the processor clock, as you mentioned. You can get accuracy to within a few nanoseconds on some machines. But you must have fallback code in case the processor/OS does not support it. Read the docs. Supposedly, it can get weird on multiprocessor systems running at different clock speeds, but that's kinda to be expected.

The Sleep(0) is not needed for the QueryPerformance funcs directly. Rather, you need the Sleep call to allow the rest of Windows to "breathe": input/output device threads, OS threads, file system threads, etc. If your port always reads 100% / number_of_cores in Windows, it probably is not yielding to Windows. Not yielding to Windows causes strange things to happen outside the port: Other apps timeout, and/or cause REALLY long pauses when they finally grab the CPU, if ever.
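
One common way to combine a precise clock with yielding is to sleep while there is plenty of time left and only poll the clock for the final stretch. The sketch below uses the portable std::chrono and std::this_thread facilities as a stand-in for the QueryPerformance functions and Sleep. The function name and the `spin_margin` parameter are assumptions for illustration, and the margin would need tuning per system.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Waits until `deadline` and returns the time at which the wait ended.
// While more than `spin_margin` remains, the thread sleeps in 1 ms requests
// (which may overshoot, as any sleep can). Closer to the deadline it only
// yields and re-checks the clock.
Clock::time_point wait_until(Clock::time_point deadline,
                             std::chrono::milliseconds spin_margin)
{
    assert(spin_margin.count() >= 0);
    for (;;)
    {
        const Clock::time_point now = Clock::now();
        if (now >= deadline)
            return now;
        if (deadline - now > spin_margin)
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        else
            std::this_thread::yield();
    }
}
```

The function never returns early, but it can still return late if the scheduler wakes the thread after the deadline, so the caller should derive the tic count from the returned time rather than assume exactly one tic has passed.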

I do not know the equivalent Linux/Mac functions, but the processor clock performance counters should be available in both, running modern 80x86 CPUs.

kb1 said:

If your port always reads 100% / number_of_cores in Windows, it probably is not yielding to Windows.

That should only happen if you've done something silly like set the thread priority to real time, or you're not processing messages correctly. The thread scheduler otherwise will shuffle your thread out according to the thread priority of other threads in the system. I can't remember the exact time the scheduler gives you for your thread slice (or if it's common knowledge, it may have been something I saw at one of Microsoft's confidential tech events) but that's still not stuff you should care about under normal circumstances.

At the very least, yeah, if you want to not give yourself up to the hands of fate and make sure you hit the exact timestep you're after, Sleep is not the way to go. It in every way is the "play nice with the other children" way to go though. I really wish there was a better way to give up your thread timeslice though without letting the scheduler go wild.

kb1 said:

What I meant was, if you subtract last time from current time, you may be off a few ms. from frame to frame, but you'll still achieve an average of exactly 35 fps. You may be off 15 ms., but, even after 10 minutes, you'll still be off by only 15 ms.


This is actually a perfect example of why averages aren't always meaningful: as you said, in the long run, having an initial error offset or even a systematic uncertainty of +/- 8 ms doesn't mean much, but the devil is in the details: if you systematically run "ultralong tics" using the crappy RTC timer, you will be missing 2-4 tics every second. Sometimes 2, sometimes 3, sometimes 4.

Periodically, you might get a second in which somehow these "ultralong" tics even out ( e.g. first tic occurs right at the beginning of this new second, last tic occurs right at the end of this second, etc. ) so you get a "perfect second" with 35 tics or even a "run like Hell second" where more than 35 tics elapse, and the engine will have to catch up.

So, in the end you might get something close to 35 tics in the long run, but you will have a significant fluctuation as the engine plays yo-yo with the grossly coarse timer.

As an even more extreme example of that, imagine a platform where the timers are so coarse, that you can't get better than 1/10 second, and yet you still try running Doom with 35 tics/sec. You will get frequent situations where the engine will realize that more than one "game tic" has elapsed in the real world (up to 5, if you're unlucky), so it will have to furiously process 5 game tics in a row and slow down rendering to the timer's resolution (if there's no better source of timing), otherwise the framerate and game speed will be like a yo-yo.
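
The burstiness can be sketched with an idealized model: a timer that advances exactly every `step_ms`, and an engine that runs all the tics that have become due at each update. The names are hypothetical, and a real system would add scheduling delays on top.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// For a timer that only advances every step_ms, returns how many 35 Hz
// game tics become due at each of the first `steps` timer updates.
std::vector<std::uint32_t> tics_due_per_step(std::uint32_t step_ms,
                                             std::uint32_t steps)
{
    assert(step_ms > 0);
    std::vector<std::uint32_t> due;
    std::uint64_t done = 0; // tics already run
    for (std::uint32_t i = 1; i <= steps; ++i)
    {
        // Total tics that should have run by the i-th timer update.
        const std::uint64_t target = std::uint64_t(i) * step_ms * 35 / 1000;
        due.push_back(static_cast<std::uint32_t>(target - done));
        done = target;
    }
    return due;
}
```

With a 100 ms step, the engine runs 3 or 4 tics back to back at every update and nothing in between, still totalling 35 per second. A late update would make the burst longer.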


I decided to do a little empirical test:



DOS on left, Chocolate Doom on right.

I made a long hallway for a Demon to constantly run down and teleport back to the start (with an approx. 25 second period). Then I loaded up DOS on my ole ancient Dell laptop (I put the Windows 98 boot disk on a flash drive and used that version of DOS, if it matters). Then I started them both up (also with recording a demo on both, in case that somehow mattered) and filmed it for about 5 minutes.

Results:

Choco: teleport at frame 942 (00:31.431)
DOS: teleport at frame 946 (00:31.565)

*fast forward*

Choco: teleport at frame 8784 (04:53.093)
DOS: teleport at frame 8788 (04:53.226)
Result: 4 frame difference at beginning, 4 frame difference at end. Timer is so close to perfect as to make no difference.
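
The comparison can be written out as a small calculation on the video frame numbers above. The helper name is hypothetical.

```cpp
#include <cassert>

// Relative speed difference between two recordings of the same pair of
// events, given the video frame numbers of the first and last event in each.
// Returns (b_span - a_span) / a_span; 0 means identical speed within the
// resolution of the video.
double relative_drift(long a_first, long a_last, long b_first, long b_last)
{
    const long a_span = a_last - a_first;
    const long b_span = b_last - b_first;
    assert(a_span > 0 && b_span > 0);
    return static_cast<double>(b_span - a_span) / static_cast<double>(a_span);
}
```

Both recordings span 7842 video frames (8784 - 942 and 8788 - 946), so the drift is zero. The measurement cannot resolve less than one video frame, so any real difference in overall speed is below about 1/7842, or roughly 0.013%.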


If anything, the interesting thing here is why there would be a 4-frame difference with such a simple map, using the port that's supposed to be the most accurate in terms of vanilla-fidelity. I presume you didn't move, shoot or otherwise do anything that could affect the RNG in either port?

Regardless of how the real wall-clock timers work, the inner workings should be absolutely synchronized on the same frame.

Also, what kb1 noticed before: in the long run the timer may appear to be OK, but if it's too coarse it will cause noticeable fluctuations in the visual frame rate and perceived smoothness of gameplay.

A better test would be a map with a lot of action and noticeable movement (e.g. put doomguy in a cage, surround it with pinkies or other monsters, protect doomguy from projectiles with impassable linedefs, and have him rotate constantly, so that the view changes in every frame): if there are significant timer imprecisions, they should be noticeable in a side-by-side test as one of the two ports exhibiting a "rubber band" effect. They will be "in sync" in the long run, yeah, but when examined on a frame-by-frame basis, there should be perceivable differences, with the more inaccurate one showing a "Tiramolla" effect.

Maes said:

If anything, the interesting thing here is why there would be a 4-frame difference with such a simple map, using the port that's supposed to be the most accurate in terms of vanilla-fidelity. I presume you didn't move, shoot or otherwise do anything that could affect the RNG in either port?

Because they started out with a 4-frame difference, of course. There was no point killing myself trying to sync up the starts precisely when I could define my own start and end points.


So in layman's terms after 4 and a half hours they were frame-by-frame identical.

Linguica said:

Because they started out with a 4-frame difference, of course. There was no point killing myself trying to sync up the starts precisely when I could define my own start and end points.


I thought you were reporting frame numbers/gametics as indicated by the engines (which should be identical for identical events) plus a time you measured yourself.

VGA said:

So in layman's terms after 4 and a half hours they were frame-by-frame identical.

No, after 4 and a half minutes.

Linguica said:

Result: 4 frame difference at beginning, 4 frame difference at end. Timer is so close to perfect as to make no difference.

This only shows that Chocolate Doom and DOOM.EXE do not go out of sync frame-wise. But this thread was about the exact amount of time that a single frame took up, or not?

fabian said:

This only shows that Chocolate Doom and DOOM.EXE do not go out of sync frame-wise. But this thread was about the exact amount of time that a single frame took up, or not?


Yup, and that "exact" amount can vary wildly between source ports. Even a tic of DOOM.EXE doesn't last exactly 1/35th of a second, actually not even as close as it possibly could (though I suspect that it's tied to the VGA's refresh rate, which must also not be exactly 70 Hz, either). Hey after all, NTSC's frame rate isn't exactly 30 Hz either, but rather 29.97 ;-)

fabian said:

This only shows that Chocolate Doom and DOOM.EXE do not go out of sync frame-wise. But this thread was about the exact amount of time that a single frame took up, or not?

I thought empirical confirmation that DOS Doom and a modern Windows port ran at the same overall speed was relevant but I guess I was wrong then

Linguica said:

I thought empirical confirmation that DOS Doom and a modern Windows port ran at the same overall speed was relevant but I guess I was wrong then

We've got to obsess over the possibility of a few machine clock ticks' worth of difference, so we can convince ourselves that all demos not recorded on specific DOS machines with verified interrupt dispatch response times in nanoseconds must have asterisks on Compet-n :P
