Maes

Audio frame rate in Doom?


Exactly how often is the audio engine in Doom games supposed to react to newly submitted sound sources and to status updates of existing ones? Since updates are submitted after each (real) tic, it should in theory be able to react as fast as the standard ticrate of 35 fps.

By examining the linuxdoom source code, however, I found two conflicting, unclear and indirect definitions of the audio update rate. From i_sound.c:

// Update all 30 millisecs, approx. 30fps synchronized.
// Linux resolution is allegedly 10 millisecs,
//  scale is microseconds.
#define SOUND_INTERVAL     500
OK, so a period of 30 millisecs would give an audio update rate of nearly 33 fps. Not exactly 35 fps, but reasonably close. However that "500" number seems arbitrary, and I don't see how it could be used to generate a period of 30 ms (unless it's in the time units of a particular timer chip?)
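
To make the mismatch concrete, here is a minimal sketch of the arithmetic. The helper name is made up for illustration and is not from linuxdoom; the only assumption is that the interval is expressed in microseconds, as the source comment claims.

```c
/* Hypothetical helper (not part of linuxdoom): convert a timer period
   given in microseconds into the number of updates per second. */
double updates_per_second(long interval_usec)
{
    return 1000000.0 / (double)interval_usec;
}
```

A 30 ms period is 30000 microseconds, which gives roughly 33.3 updates per second. A value of 500 read as microseconds would instead give 2000 updates per second, so under that assumption the constant cannot describe a 30 ms period on its own.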

However, the mixing routine in sound.c defines the following as the smallest audio buffer length that can be processed (presumably without it being interrupted and without the channel data being altered during mixing):
// Needed for calling the actual sound output.
#define SAMPLECOUNT		512
#define NUM_CHANNELS		8
// It is 2 for 16bit, and 2 for two channels.
#define BUFMUL                  4
#define MIXBUFFERSIZE		(SAMPLECOUNT*BUFMUL)

#define SAMPLERATE		11025	// Hz
#define SAMPLESIZE		2   	// 16bit
In particular, SAMPLERATE/SAMPLECOUNT = 21.53, which means that in a single second there can be only 21 complete and unique sound buffers getting mixed, if each is 512 samples long (to achieve a rate of 35, they would have to be smaller, namely 315 samples per buffer).

If a synchronous sound update is forced after every tic, then at a rate of 35 fps there will be 17920 samples mixed per second (instead of 11025), so the mixer produces audio faster than the output can play it, and the excess either has to be dropped or ends up sounding weird.
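
The three figures above can be checked with a few lines of C. This is only a sketch of the arithmetic: the constants are the linuxdoom defines quoted earlier plus the standard ticrate, and the function names are invented for illustration.

```c
#define SAMPLERATE  11025  /* Hz, as in the linuxdoom defines */
#define SAMPLECOUNT 512    /* samples per mixed buffer */
#define TICRATE     35     /* game tics per second */

/* How many full buffers per second the output can actually consume. */
double buffers_per_second(int samplerate, int samplecount)
{
    return (double)samplerate / samplecount;
}

/* Samples produced per second if one full buffer is mixed every tic. */
int samples_mixed_per_second(int samplecount, int ticrate)
{
    return samplecount * ticrate;
}

/* Buffer length that makes one buffer per tic match the output rate. */
int samples_per_tic(int samplerate, int ticrate)
{
    return samplerate / ticrate;
}
```

With the values above, buffers_per_second(SAMPLERATE, SAMPLECOUNT) is about 21.53, samples_mixed_per_second(SAMPLECOUNT, TICRATE) is 17920, and samples_per_tic(SAMPLERATE, TICRATE) is 315.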

These problems mostly affect synchronous mixing, if you are doing your own sound mixing, of course, but they can also affect asynchronous sound servers to a degree, depending on their minimum sound "chunk".

These discrepancies and the lack of a clear guideline made me adopt an adjustable "audio rate" model in Mocha Doom's new sound engine, so that SAMPLECOUNT=SAMPLERATE/AUDIO_RATE, combined with a forced timer that decouples the sound engine from the rest of the program, essentially working as an internal audio server/pseudo-interrupt system.
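
As a rough illustration of the SAMPLECOUNT=SAMPLERATE/AUDIO_RATE idea, here is a sketch in C (to match the linuxdoom snippets; Mocha Doom itself is not written in C). The struct, the function name and the remainder carry are assumptions added for the example, not a description of the actual engine. The carry only matters when the sample rate is not an exact multiple of the update rate.

```c
/* Hypothetical state for an adjustable update rate; names are illustrative. */
typedef struct {
    int samplerate;  /* output rate in Hz, e.g. 11025 */
    int audio_rate;  /* mixer updates per second, e.g. 35 */
    int acc;         /* accumulated remainder, in 1/audio_rate sample units */
} chunker_t;

/* Returns how many samples to mix on this update. The remainder of the
   division is carried into the next call, so that audio_rate consecutive
   calls always add up to exactly samplerate samples. */
int next_samplecount(chunker_t *c)
{
    int n;
    c->acc += c->samplerate;
    n = c->acc / c->audio_rate;
    c->acc -= n * c->audio_rate;
    return n;
}
```

At 11025 Hz and 35 updates per second every call returns 315. At 30 updates per second the calls alternate between 367 and 368, which still sums to 11025 over one second.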

Of course, the linuxdoom code is not the best way to draw conclusions about sound, but isn't the linuxdoom mixing routine actually used in vanilla, with DMX doing only the output? If so, with what parameters for buffer size, interval rate etc?

Also, exactly what approaches are used in other source ports? Is the sound rate controllable/controlled, or is it something left to the implementation of the sound library being used?


I wouldn't use Linux Doom as the reference on anything sound related, considering that it has a hacked-up external process serve as its sound engine and was whipped up by Dave Taylor in what must have been, judging by the quality of the code, about 15 minutes.

In DOS everything was interrupt driven. DMX cached/buffered the sounds as soon as they were handed to it and then started pushing them out on a hardware sound channel as soon as the next timer event occurred. I am not aware of the exact frequency at which the DMX Timer Service Module (functions beginning with TSM) set up the system timer, but it was capable of registering and running various callbacks off that interrupt-driven system and sound was one such.

Since the audio sample rate was set to 22050 Hz, you can assume that the callback was made often enough so that the ideal bitrate of the audio stream was satisfied by the length of the buffer times the number of times it was called per second.

Because to do otherwise would mean the audio would start to skip, and I certainly never had that problem in vanilla DOOM ;)


Why 22050 and not 11025?

If anything, mixing at 22050 would be pointless unless the samples themselves are supersampled to 22 KHz (and post-filtered) or, at the very least, some channels are delayed by an odd number of samples so that you actually get a little signal dithering and -a bit- smoother sound. Otherwise you'll simply be turning the samples into nasty time-stretched rectangles.

Unless you are referring to the total data rate for two 8-bit channels under DOS. That would balloon to 44100 bytes/sec for 11 KHz/16-bit/stereo.
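
For reference, the byte rates mentioned here follow from a one-line formula for uncompressed PCM. The function below is just an illustrative helper with a made-up name, not something from DMX or linuxdoom.

```c
/* Bytes per second of uncompressed PCM audio.
   bits_per_sample is expected to be a multiple of 8. */
long pcm_bytes_per_second(long samplerate, int bits_per_sample, int channels)
{
    return samplerate * (bits_per_sample / 8) * channels;
}
```

For 11025 Hz this gives 22050 bytes/sec with two 8-bit channels and 44100 bytes/sec with 16-bit stereo.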

Anyway, in my earliest sound attempts with Mocha I tried to play 512-sample buffers 35 times a second, straight. Rather than skips, I got a weird "hustling" effect where the samples sounded shorter and "hurried up" as the output tried to match the overflowing mixing rate of 17 KHz with just 11 KHz. It sounded like a time compression effect, with no pitch variation ;-)

Maes said:

Why 22050 and not 11025?

If anything, mixing at 22050 would be pointless unless the samples themselves are supersampled to 22 KHz (and post-filtered) or, at the very least, some channels are delayed by an odd number of samples so that you actually get a little signal dithering and -a bit- smoother sound. Otherwise you'll simply be turning the samples into nasty time-stretched rectangles.


Super shotgun sounds are 22khz.

As far as upsampling goes, linear interpolation is simple (easily accomplished in realtime on the hardware they had, hardly even slower than nearest neighbor), and doesn't produce enough artifacts to bother most people. The 8 bit sampling depth gives a noise floor of -50dB, so it's not like you'll get quality whatever you do.
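
To show how little work linear interpolation is, here is a minimal sketch that doubles the rate of an unsigned 8-bit mono sound (e.g. 11025 Hz to 22050 Hz). It is only an illustration of the technique, with an invented function name, and makes no claim about how DMX actually does it.

```c
#include <stddef.h>

/* Upsample unsigned 8-bit mono PCM by a factor of two using linear
   interpolation. dst must have room for 2 * n samples. The last input
   sample has no right-hand neighbour, so it is simply repeated. */
void upsample2x_linear(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t i;
    for (i = 0; i < n; i++) {
        unsigned int a = src[i];
        unsigned int b = (i + 1 < n) ? src[i + 1] : a;
        dst[2 * i]     = (unsigned char)a;             /* original sample */
        dst[2 * i + 1] = (unsigned char)((a + b) / 2); /* midpoint */
    }
}
```

Nearest-neighbour upsampling would write a in both slots, so the only extra cost here is one add and one shift per output pair.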

You have to support arbitrary sampling rates in the sound lumps anyway, so no point in doing something that depends too much on assumptions on the incoming data.


SSG @ 22 KHz? LOL and I wondered why the opening sounds sounded so much cooler in Mocha Doom all of a sudden :-)

Is there any hard data for this (e.g. that mixing in vanilla actually occurred at 22 KHz and that sounds were interpolated/expanded/whatever in memory)? Did this occur even in early versions, when there were no 22 KHz sounds at all? Because it sure doesn't show in the linux code at all, not even as some funky exception for the SSG sounds, let's say.

Maes said:

SSG @ 22 KHz? LOL and I wondered why the opening sounds sounded so much cooler in Mocha Doom all of a sudden :-)

Is there any hard data for this (e.g. that mixing in vanilla actually occurred at 22 KHz and that sounds were interpolated/expanded/whatever in memory)? Did this occur even in early versions, when there were no 22 KHz sounds at all? Because it sure doesn't show in the linux code at all, not even as some funky exception for the SSG sounds, let's say.

Yes, I have partially reverse-engineered DMX, and it uses the constant value 22050 in numerous places as a divisor. The original version of DMX used in DOOM used an 11025 Hz rate. The oft-referenced "sample rate increase", which occurred some time around Doom 1.4, was accompanied by the breaking of pitch shifting support and GUS support. GUS support was eventually restored IIRC, but pitch shifting never did work again after that.

No offense but you should probably know I wouldn't mention 22050 as the sample rate unless I had an empirical reason to do so ;)

Quasar said:

pitch shifting never did work again after that.

Technically, they did restore it for the Xbox port at least. ;) Though I presume they got rid of DMX entirely and replaced it with DirectSound.

(One can regard it as the latest evolution of the Doom code since it was made by Id from their original codebase; rather than made by other people or from linuxdoom through some other port.)


The latest interface-ports code in ReMooD basically does this:

1. In D_DoomLoop(), after some tics are run, call S_UpdateSounds() which updates the audio buffer.
2. Start of S_UpdateSounds()
3. Call the interface code to see if the existing buffer has finished playing. If it is still playing, return to D_DoomLoop().
4. If it is finished, start mixing.
5. Clear the entire buffer piece (memset)
6. Go through each channel mixing everything for whatever sound there is
7. When finished filling the buffer, tell the interface code that mixing is done.
8. Return to D_DoomLoop()
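
The steps above can be sketched roughly as follows. Everything here is hypothetical: the types, the field names and update_sounds() are invented for the example and are not ReMooD's actual code, and the mixer is simplified to mono, signed 8-bit sources summed into a 16-bit buffer.

```c
#include <string.h>

#define BUF_SAMPLES  1024
#define MAX_CHANNELS 8

typedef struct {
    const signed char *data; /* signed 8-bit source, NULL if channel is idle */
    int length;              /* number of samples in data */
    int pos;                 /* next sample to mix */
} channel_t;

typedef struct {
    int playing;               /* nonzero while the buffer is still playing */
    short buffer[BUF_SAMPLES]; /* mix target handed to the interface code */
} iface_t;

static channel_t channels[MAX_CHANNELS];

/* Returns 1 if a new buffer was mixed, 0 if the old one is still playing. */
int update_sounds(iface_t *io)
{
    int c, i;
    if (io->playing)                           /* step 3: still busy */
        return 0;
    memset(io->buffer, 0, sizeof io->buffer);  /* step 5: clear to silence */
    for (c = 0; c < MAX_CHANNELS; c++) {       /* step 6: mix every channel */
        channel_t *ch = &channels[c];
        if (!ch->data)
            continue;
        for (i = 0; i < BUF_SAMPLES && ch->pos < ch->length; i++)
            io->buffer[i] += ch->data[ch->pos++];
        if (ch->pos >= ch->length)
            ch->data = NULL;                   /* sound finished */
    }
    io->playing = 1;                           /* step 7: mixing is done */
    return 1;
}
```

In this sketch the interface code would reset playing to 0 once the buffer has been consumed, and the game loop simply calls update_sounds() after each batch of tics.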

By default the audio is at 11KHz, 8-bit, Stereo with the buffer at 1024 samples (no stuttering or noticeable delay). It can be toggled between 8/16-bit, Mono, Stereo, Surround (as of this writing, surround sound does not work), 5.5KHz, 22KHz, 44KHz, etc.

In general if the game requests a sound buffer that the interface can not handle then sound will fail to work until a correct setting is chosen. For example Allegro lacks surround sound, so if you try to create a buffer with 4 channels (FL, FR, BL, BR) then it will fail.

Linear interpolation of sounds is done at mixing time, so if a sound is 22KHz but the game is using 11KHz for sound, then the sound will just play twice as fast. Random sound pitch does the same thing also, for simplicity (and is capped between 0.75 and 1.25).
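
One way to read the paragraph above is that the mixer advances through the source by a per-output-sample step that combines the rate ratio and the pitch factor. The following sketch shows that reading only. The function name is invented and this is not ReMooD's code; the 0.75 to 1.25 clamp is taken from the post.

```c
/* Hypothetical: how far to advance in the source sound for each output
   sample. A 22050 Hz sound mixed at 11025 Hz advances 2.0 source samples
   per output sample; pitch scales that step and is clamped to the
   0.75..1.25 range mentioned above. */
double mix_step(int sound_rate, int mix_rate, double pitch)
{
    if (pitch < 0.75) pitch = 0.75;
    if (pitch > 1.25) pitch = 1.25;
    return (double)sound_rate / mix_rate * pitch;
}
```

With a fractional step, the mixer can interpolate linearly between the two source samples on either side of the current position.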


In my earliest implementation of the "classic" sound mixer, I tried to do synchronous updates after each frame (I mixed 1 audio buffer after each Doomloop tic, and submitted them to the audio output when I had enough of them, e.g. 3 or 5, depending on how much I wanted to buffer). The mixing routine was on the same thread as the main DoomLoop, and was executed at its end with SubmitSound. The actual sound playback was done on a separate thread which kept track of completed mixing buffers in a "buffer queue". It's also possible to keep the mixing routine and the output thread entirely silent if no mixing has taken place, saving CPU cycles and memory bandwidth.

This had the disadvantage that audio update rate was tied to the actual frame rate, so on slower systems this caused stuttering as there might not be 3 or 5 continuous audio chunks to send to the output.

I fixed this by adding an independent timer: now the audio engine (and the timer) run on their own threads, and are isolated from the main game (the timer keeps ticking at a selected rate, no matter what the main game does). When sounds are submitted, they are converted into messages which are enqueued to the sound threads, and processed with priority (messages can specify starting a new channel, updating an existing one, or stopping one). All messages stacked up in the queue are processed at each timer tick.
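
The message-passing part can be sketched like this, in C to match the linuxdoom snippets (Mocha Doom itself is not written in C). All names are invented for the example. The sketch is single-threaded, so a real version shared between the game thread and the sound thread would need a lock or a thread-safe queue around push and drain.

```c
#define QUEUE_SIZE   64
#define NUM_CHANNELS 8

typedef enum { MSG_START, MSG_UPDATE, MSG_STOP } msg_type_t;

typedef struct {
    msg_type_t type;
    int channel;
    int volume;
} sound_msg_t;

typedef struct {
    sound_msg_t items[QUEUE_SIZE];
    int head;  /* next slot to read */
    int tail;  /* next slot to write */
} msg_queue_t;

typedef struct { int active; int volume; } chan_state_t;

/* Called by the game side. Returns 0 if the queue is full. */
int queue_push(msg_queue_t *q, sound_msg_t m)
{
    int next = (q->tail + 1) % QUEUE_SIZE;
    if (next == q->head)
        return 0;
    q->items[q->tail] = m;
    q->tail = next;
    return 1;
}

/* Called once per audio timer tick: apply every pending message in order.
   Returns the number of messages applied. */
int queue_drain(msg_queue_t *q, chan_state_t *chans)
{
    int n = 0;
    while (q->head != q->tail) {
        sound_msg_t m = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_SIZE;
        if (m.channel < 0 || m.channel >= NUM_CHANNELS)
            continue;  /* ignore malformed messages */
        switch (m.type) {
        case MSG_START:
            chans[m.channel].active = 1;
            chans[m.channel].volume = m.volume;
            break;
        case MSG_UPDATE:
            if (chans[m.channel].active)
                chans[m.channel].volume = m.volume;
            break;
        case MSG_STOP:
            chans[m.channel].active = 0;
            break;
        }
        n++;
    }
    return n;
}
```

The game never touches channel state directly in this scheme. It only enqueues messages, and the timer-driven side applies them right before mixing the next chunk.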

I can change the actual mixing rate to anything I want, without this affecting pitch effects. However, for efficiency's sake, it's more convenient for all sound effects to be the same format and sample rate (the mixing routine however works with 8-bit mono effects which are converted to 16-bit stereo ones using Doom's classic method during mixing, but it should be possible to generalize it).

Again, I would like to know how vanilla Doom handled those mixed sound effects: did they simply "rectangularize" 11 KHz ones to play them at 22 KHz?

A bit of useless trivia: Doom's pitching effects allow for a -/+ 400% pitch variation, according to the pre-built step tables. Also, that's where the 64K audio sample limit comes from: they are using 16.16 fixed point "indices" into samples to achieve pitch variations, of which of course only the "integer" part is retained. However, it should be possible to extend this to e.g. 48.16 or 32.32, allowing larger samples and/or finer pitch control.
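
A small sketch of what a 16.16 index looks like in practice, and why it caps the addressable sample length at 64K. The function is written for illustration only and is not the actual Doom mixing code. The step is a 16.16 value, so 0x10000 means normal pitch.

```c
#include <stdint.h>

#define FRACBITS 16

/* Copy src to dst while stepping through src with a 16.16 fixed-point
   position. Only the integer part (upper 16 bits) is used as the index,
   so no more than 65536 source samples can ever be addressed.
   Returns the number of samples written to dst. */
unsigned int resample_fixed(const unsigned char *src, unsigned int src_len,
                            uint32_t step,
                            unsigned char *dst, unsigned int dst_len)
{
    uint32_t pos = 0;
    unsigned int written = 0;
    while (written < dst_len && (pos >> FRACBITS) < src_len) {
        uint32_t next = pos + step;
        dst[written++] = src[pos >> FRACBITS];
        if (next < pos)  /* position wrapped past the 16.16 range */
            break;
        pos = next;
    }
    return written;
}
```

A step of 0x8000 plays the sound at half pitch (each source sample is used twice), and 0x20000 plays it at double pitch (every other sample is skipped). Widening pos to 64 bits would lift the 64K limit without changing the loop.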
