Published May 16, 2022
| Updated May 17, 2022
Reading time: 16 minutes
I develop a game audio library called Kira. Here are some of the hard parts I've figured out. If you decide to make an audio library for some reason, learn from my experimentation!
When it comes to graphics, different games have different acceptable framerates. Most people would consider 60 FPS to look "smooth". But even if the framerate dips a little bit below 60 FPS, it's not the end of the world. The game will keep displaying the previously rendered frame until the next one is ready. If it's a small enough frame drop, the player might not even notice.
But if your game can't produce audio as quickly as the operating system wants (this is called a buffer underrun), the operating system has no choice but to fill the gaps with silence. And the player will notice.
If you can't listen to the example audio, imagine that when a game had frame drops, the monitor would display a black screen for all the frames the game couldn't produce quickly enough. Audio stuttering is the equivalent of that.
When writing audio code, you want to avoid underruns at all costs. This means you have to do your audio processing on a separate thread. If you tried to do audio processing on the same thread as your graphics and input, your audio would stutter when the graphics rendering becomes too demanding.
You also can't block the audio thread. If something could cause the audio thread to pause for an unknown amount of time, you shouldn't do it. If the audio thread pauses for too long, it won't be able to process audio fast enough, leading to buffer underruns.
Notably, this means you can't allocate or deallocate memory on the audio thread. When you ask the operating system to allocate or deallocate memory, you have to pause the thread until the OS is ready to get around to your request. Usually it'll do it quickly, but if the system is taxed, the OS might deprioritize the audio thread, leading to a long period where you can't process any audio.
Keeping these two constraints in mind, let's look at some problems that come up when creating a game audio library.
A game audio library provides functions for playing and modifying audio that you can call from gameplay code. It'll look vaguely like this:
```rust
let mut sound_handle = audio.play(sound_data);
// later...
sound_handle.set_volume(0.5);
```
When we play a new sound, we need to load audio data from a file on the gameplay thread and send it to the audio thread. (We can't load the audio data on the audio thread because that takes too long and could lead to buffer underruns.) To send the audio data to the audio thread, we can use a ringbuffer: a queue that one thread can push to and another thread can pop from without locking or allocating.
```rust
// on the gameplay thread
audio_producer.push(sound);

// on the audio thread
while let Some(sound) = audio_consumer.pop() {
    sounds.push(sound);
}
```
But how do we tell the audio thread to modify an existing sound (e.g. setting the volume of a sound that's already playing or setting the playback state of a sound)?
We could allow the gameplay thread to control data on the audio thread directly by giving it shared ownership of the data via an `Arc<Mutex>`. On the audio thread, we'll store the sound state as an `Arc<Mutex<Sound>>`, and the sound handle on the gameplay thread will have a clone of that `Arc`. To access audio data on the gameplay thread, we just have to lock the `Mutex`.

Of course, the audio thread also has to lock the data before it can access it. Waiting for other threads to unlock the data does, in fact, block the audio thread, which is one of the things we definitely shouldn't do. So `Mutex`es are out.
There is a way we can share data among multiple threads without having to lock it: atomics. Atomics are special versions of primitive types that the CPU knows how to keep synchronized between threads.
We can make each modifiable field of the sound atomic, give the sound handle clones of each field, and set those fields from the gameplay thread.
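A sketch of how that might look (the field names and the `u8` playback-state encoding are illustrative, not Kira's actual API). Since there's no atomic `f64` in the standard library, the volume is stored as the bit pattern of an `f64` in an `AtomicU64`:

```rust
use std::sync::{
    atomic::{AtomicU64, AtomicU8, Ordering},
    Arc,
};

const PLAYING: u8 = 0;
const PAUSED: u8 = 1;

// Lives on the audio thread.
pub struct Sound {
    volume: Arc<AtomicU64>,
    playback_state: Arc<AtomicU8>,
}

impl Sound {
    fn volume(&self) -> f64 {
        f64::from_bits(self.volume.load(Ordering::SeqCst))
    }
}

// Lives on the gameplay thread, holding clones of each atomic field.
pub struct SoundHandle {
    volume: Arc<AtomicU64>,
    playback_state: Arc<AtomicU8>,
}

impl SoundHandle {
    pub fn set_volume(&mut self, volume: f64) {
        // store the f64's bit pattern in the shared atomic
        self.volume.store(volume.to_bits(), Ordering::SeqCst);
    }

    pub fn pause(&mut self) {
        self.playback_state.store(PAUSED, Ordering::SeqCst);
    }
}
```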
Using atomics won't block the audio thread, but it does have some limitations. The largest atomic is 64 bits. That's enough space for a volume level or a playback state, but what if we want to send a more complex command to the audio thread? For example, what if we want to smoothly adjust the volume of a sound over a period of time? Maybe even with a user-specified easing curve?
If we represented all the needed information for that command as a struct, it would look something like this:
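Something along these lines (the exact fields and easing variants are assumptions for illustration):

```rust
// the curve to use for the volume fade
pub enum Easing {
    Linear,
    InPowi(i32),
    OutPowi(i32),
}

// everything the audio thread needs to know to fade a sound's volume
pub struct SetVolumeCommand {
    /// the target volume
    pub volume: f64,
    /// how long the fade should last, in seconds
    pub duration: f64,
    /// the user-specified easing curve
    pub easing: Easing,
}
```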
That's more than we can fit in one atomic. We could store the command in multiple atomics, but then we have to keep them synced up. If we limited the maximum duration, maybe we could store it in 16 bits. I'm sure this is a solvable problem, but the solution won't be very ergonomic. So what else can we do?
Why not just send commands via a ringbuffer? We're already using them for sending audio data.
We can describe all of the possible commands with an enum:
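For example (the variants here are hypothetical, not Kira's actual command set):

```rust
// every way the gameplay thread can modify a playing sound
pub enum Command {
    SetVolume(f64),
    SetVolumeOverTime { volume: f64, duration: f64 },
    Pause,
    Resume,
    Stop,
}
```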
The sound handle will own a command producer that it can push to, and the audio thread will own a command consumer that it can pop from.
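A sketch of both sides, using `std::sync::mpsc` as a stand-in for the ringbuffer (a real audio library would use a preallocated, lock-free ringbuffer instead, since `mpsc` channels allocate):

```rust
use std::sync::mpsc::{Receiver, Sender};

pub enum Command {
    SetVolume(f64),
    Stop,
}

// Gameplay-thread side: pushes commands.
pub struct SoundHandle {
    command_producer: Sender<Command>,
}

impl SoundHandle {
    pub fn set_volume(&mut self, volume: f64) {
        // if the audio thread is gone, there's nothing useful to do
        self.command_producer.send(Command::SetVolume(volume)).ok();
    }
}

// Audio-thread side: pops commands during each processing pass.
pub struct Sound {
    volume: f64,
    command_consumer: Receiver<Command>,
}

impl Sound {
    pub fn process(&mut self) {
        // poll for new commands before producing audio
        while let Ok(command) = self.command_consumer.try_recv() {
            match command {
                Command::SetVolume(volume) => self.volume = volume,
                Command::Stop => { /* stop playback */ }
            }
        }
    }
}
```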
There is a downside to this approach: every sound has to periodically poll for new commands. Most sounds will not be changed at any one time, so it seems wasteful that they all have to poll for commands. (And in my unscientific benchmarking, all of the polling does make a noticeable difference in performance.)
Maybe we can just use one ringbuffer to collect the commands for every sound? We already have a ringbuffer for sending audio data, so let's just expand that to send a command enum.
Of course, we need a way to tell the audio thread which sound we want to change, so let's add some unique identifiers to those commands.
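Something like this (the variants and the `u64` inside `SoundId` are illustrative):

```rust
// uniquely identifies one sound on the audio thread
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct SoundId(u64);

// placeholder for whatever data a sound needs to play
pub struct SoundData;

// one unified command enum for the single ringbuffer
pub enum Command {
    PlaySound(SoundId, SoundData),
    SetVolume(SoundId, f64),
    Stop(SoundId),
}
```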
Every sound handle will need the ID of the sound it's meant to control. Also, every sound handle will need to push commands to the same command producer, so we'll need to wrap it in an `Arc<Mutex>`. Unlike the last time we tried using `Mutex`es, this time the `Mutex` is only shared on the gameplay thread, so there's no risk of blocking the audio thread.
On the audio thread, we'll have some code along these lines:
```rust
while let Some(command) = command_consumer.pop() {
    match command {
        Command::SetVolume(id, volume) => {
            if let Some(sound) = sounds.get_mut(id) {
                sound.set_volume(volume);
            }
        }
        // ...handle the other commands
    }
}
```
This works! It's reasonably efficient, and we can send arbitrarily complex commands to the audio thread without blocking anything.
There's only one problem: where does the `SoundId` come from?
We need to store resources (sounds, mixer tracks, etc.) on the audio thread in a way that provides fast access to individual resources, fast iteration, and a fixed memory footprint (remember, we can't allocate on the audio thread).
The arena data structure is a natural fit. An arena is essentially a `Vec` of slots that can be occupied or empty. When we insert an item into the arena, the arena picks an empty slot to insert the item into and returns a key that contains the index of that slot. Accessing individual items is as fast as indexing into a `Vec`. Iterating over items is slow if you loop through every slot and filter out the empty ones, but you can use a linked list to make iteration much faster.
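A minimal sketch of the idea (real arena implementations, including Kira's, also store a generation counter in each key so a stale key can't access a reused slot, and use a linked list for fast iteration; both are omitted here):

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Key(usize);

// A Vec of slots plus a list of which slots are empty.
struct Arena<T> {
    slots: Vec<Option<T>>,
    free: Vec<usize>, // indices of empty slots
}

impl<T> Arena<T> {
    fn with_capacity(capacity: usize) -> Self {
        Self {
            slots: (0..capacity).map(|_| None).collect(),
            free: (0..capacity).collect(),
        }
    }

    // Inserts into an empty slot and returns a key containing
    // that slot's index, or gives the item back if the arena is full.
    fn insert(&mut self, item: T) -> Result<Key, T> {
        match self.free.pop() {
            Some(index) => {
                self.slots[index] = Some(item);
                Ok(Key(index))
            }
            None => Err(item),
        }
    }

    // Accessing an item is just a Vec index.
    fn get(&self, key: Key) -> Option<&T> {
        self.slots[key.0].as_ref()
    }
}
```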
So this will be the flow of sending resources to the audio thread:

1. The gameplay thread sends the resource to the audio thread via a ringbuffer.
2. The audio thread inserts the resource into the arena and gets back a key.
3. The audio thread sends the key back to the gameplay thread via another ringbuffer.
So we'll have to wait a bit for the audio thread to return the key, but it shouldn't take too long, right?
When you hear audio coming from speakers, you're hearing an analog representation of digital samples that are evenly distributed over time. But an application does not produce those digital samples at a constant rate. If it did, then if any of the samples took too long to calculate, the application would fall behind and wouldn't be able to produce audio fast enough, leading to underruns. Instead, the operating system periodically asks the application to produce a batch of samples at a time.
Let's say the operating system wants to output audio at 48,000 Hz (or samples per second), and it requests 512 samples at a time. The audio thread will produce 512 samples of audio, then sleep until the operating system wakes it up for the next batch of samples. The operating system might not need to wake it up for another 10 milliseconds, since that's the amount of audio it has queued up.
If the gameplay thread sends a command to play a sound right after the audio thread falls asleep, the audio thread won't receive it and send back the sound ID until 10ms later. To put that into perspective, in a 60 FPS game, 10ms is more than half a frame. So if we played two sounds in a row, blocking the gameplay thread each time to wait for the audio thread to send back a sound ID, we could end up with a frame drop. That's not acceptable performance for an audio library.
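A quick sanity check on those numbers:

```rust
fn main() {
    let sample_rate = 48_000.0; // samples per second
    let buffer_size = 512.0; // samples per batch

    // how much audio one batch covers
    let batch_ms = buffer_size / sample_rate * 1000.0;
    println!("one batch = {batch_ms:.1} ms"); // ≈ 10.7 ms

    // length of one frame at 60 FPS
    let frame_ms = 1000.0 / 60.0;
    println!("one frame = {frame_ms:.1} ms"); // ≈ 16.7 ms

    // waiting out one batch eats most of a frame
    println!("{:.0}% of a frame", batch_ms / frame_ms * 100.0);
}
```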
So arenas are out.
If we store resources in a hash map, we can create keys on the gameplay thread and just send them to the audio thread along with the command to add a resource. The standard library's `HashMap` isn't very quick to iterate over, but the `indexmap` crate solves that problem for us.
Here's the new flow:

1. The gameplay thread generates a new key.
2. The gameplay thread sends the resource and the key to the audio thread via a ringbuffer.
3. The audio thread inserts the resource into the map using that key.
Problem solved! We don't have to wait for the audio thread to send back an ID.
There are some downsides to this approach, though:
`IndexMap`s lose capacity over time... wait, what?

If you're like me, you'd be surprised to learn that. But I'll prove it to you!
Here's a small example where I add items to an `IndexMap`. Every 5 items added, I remove 3 items at arbitrary indices. Every time I add an item, I print the length and total capacity of the map. You can run this code snippet yourself.
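Here's an equivalent sketch using the standard library's `HashMap` instead of `IndexMap` (both are backed by the same hashbrown algorithm, so the capacity behavior is the same; removal is by arbitrary key rather than index):

```rust
use std::collections::HashMap;

// Returns a log of (len, capacity) after each insertion.
fn run_experiment(rounds: usize) -> Vec<(usize, usize)> {
    let mut map: HashMap<u32, u32> = HashMap::new();
    let mut log = Vec::new();
    let mut next_key = 0;
    for _ in 0..rounds {
        // add 5 items...
        for _ in 0..5 {
            map.insert(next_key, next_key);
            next_key += 1;
            log.push((map.len(), map.capacity()));
        }
        // ...then remove 3 items with arbitrary keys
        for _ in 0..3 {
            let key = *map.keys().next().unwrap();
            map.remove(&key);
        }
    }
    log
}

fn main() {
    for (len, capacity) in run_experiment(10) {
        println!("{len} / {capacity}");
    }
}
```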
We end up with a result something like this:
```
28 / 28
26 / 26 // capacity decreases
27 / 27
28 / 28
29 / 56
30 / 56
```
Notice the dip from a capacity of 28 to a capacity of 26? This isn't a bug; it's just how the hashbrown algorithm works, and hashbrown backs both `IndexMap` and the standard library's `HashMap`.
It turns out there is a workaround: the capacity will never decrease if you don't exceed 50% of the capacity. So we could just allocate twice as much space as we need to avoid the problem. Kira v0.5 uses this approach, but I didn't feel comfortable relying on an unspoken implementation detail of a library.
Maybe an arena can work; it would just need to let us generate new keys from the gameplay thread. A suggestion from the Rust Audio discord got me thinking about how atomics could be used to serve this purpose. I eventually came up with an arena that has two components: the arena itself, which lives on the audio thread, and an arena controller, which can be cloned, sent to the gameplay thread, and used to reserve keys atomically.
When we want to add an item to an arena, we first reserve a key from the arena controller. If too many keys have been reserved, the controller will tell us the arena is full and not give us a key. If there are slots available, we can send the key along with the command to add an item and return the key immediately to the caller.
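A sketch of the reservation half of the idea (Kira's real arena controller also tracks *which* specific slots are free via a lock-free list; here, a single atomic counter just enforces the capacity limit):

```rust
use std::sync::{
    atomic::{AtomicUsize, Ordering},
    Arc,
};

pub struct ArenaFull;

// Cloneable handle that lives on the gameplay thread.
#[derive(Clone)]
pub struct ArenaController {
    free_slots: Arc<AtomicUsize>,
}

impl ArenaController {
    pub fn new(capacity: usize) -> Self {
        Self {
            free_slots: Arc::new(AtomicUsize::new(capacity)),
        }
    }

    /// Tries to reserve a slot, returning an error if the arena is full.
    pub fn try_reserve(&self) -> Result<(), ArenaFull> {
        let mut free = self.free_slots.load(Ordering::SeqCst);
        loop {
            if free == 0 {
                return Err(ArenaFull);
            }
            // decrement the free-slot count, retrying if another
            // thread changed it in the meantime
            match self.free_slots.compare_exchange(
                free,
                free - 1,
                Ordering::SeqCst,
                Ordering::SeqCst,
            ) {
                Ok(_) => return Ok(()),
                Err(actual) => free = actual,
            }
        }
    }
}
```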
I find this to be a really elegant solution, since it solves multiple problems at once: we can return keys to the caller immediately without waiting for the audio thread, and the arena's memory all lives in one preallocated `Vec`, so there won't be any surprises with the capacity.
This is the solution I'm using for Kira v0.6. You can see my implementation of the arena here.
Making audio libraries is hard. I don't know the best way to do it. This is just what I've tried and how it went for me.