How to design a dynamic game audio mix

Illustration: Daniel Zender

A dynamic game audio mix can transform a good game into a truly immersive experience.

Let’s begin today’s discussion on game audio by revisiting the gameplay scenario from a previous article. The player is walking through a rainforest. They hear the ambient noise, with one-shots scattered throughout the soundscape. They hear their footsteps as they move over the terrain. Suddenly, their mission objective is updated and a UI notifier is both seen and heard on-screen. As the player ventures deeper into the rainforest, the ambient music assigned to the region starts to play.

As the player continues to walk, they hear their footsteps brush against foliage and different surface types. Suddenly, enemy footsteps are heard in the near distance before a firefight breaks out. Characters are shouting at each other, running around everywhere, and gunshots are echoing through the rainforest. These diegetic effects all appear to emanate from the same physical space, thanks to a reverb plugin working behind the scenes.

Suddenly, the soundscape has become significantly more complex and expansive. Though this is an example that an audio designer could be faced with in a standard open-world game, the principle can be applied to multiple game types. All of these sounds that are being triggered in the world need to be controlled and relayed to the player in a logical way that, above all, sounds great. The first way that this can be tackled is through a simple strategy – organization.

Organizing mixer busses

When creating a song inside of a DAW, the basic method for organizing mixer groups is usually by instrument or function. The same principle can be applied when working on game audio. Audio middleware, as well as game engines, have mixer functionalities that behave in a very similar way to what you would find in a DAW.


The FMOD Mixer view in the Celeste walkthrough project

The principle here is the same as when you’re structuring the mix for a song. Sound effects of the same type should be placed within the same mixer bus structure. And much like in a large arrangement, these larger categories of sound can be broken down into smaller, more specific categories.

Usually, when working on a mix bus structure for a game, it’s best to use a hierarchical, ‘top-down’ approach. This allows for sweeping changes to larger categories of sound later on. Below is a diagram showing an example mixer bus dedicated to character sounds.

An example of a mixer group system for dynamic game audio

As outlined in the diagram, all of the associated character foley and footstep sounds could easily be placed within the same parent bus. Within this parent category, we can then separate character sounds into several different subcategories, such as player sounds, enemy sounds, player foley, player footsteps, enemy foley, enemy footsteps, and more.

How to think about organizing your mixer busses

If desired, the designer could make groups even more granular by separating foley and footsteps by specific characteristics like surface types. With a layout such as this, any mix rules or plugins applied to the top-level ‘Character’ bus will have an effect on all subsequent child busses. Likewise, rulesets and plugins can be added to these child busses to affect more specific behaviors related to smaller subgroups.
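
To make this concrete, here’s a minimal sketch using the FMOD Studio C++ API of how a hierarchical layout pays off at runtime. The bus paths are hypothetical and would mirror whatever hierarchy has been built in the authoring tool.

```cpp
#include "fmod_studio.hpp"

// Hypothetical bus paths mirroring the 'Character' hierarchy described above.
void AdjustCharacterBusses(FMOD::Studio::System* studioSystem)
{
    FMOD::Studio::Bus* characterBus = nullptr;
    FMOD::Studio::Bus* enemyFootstepsBus = nullptr;

    // Look up the parent bus and one of its child busses by path.
    studioSystem->getBus("bus:/Character", &characterBus);
    studioSystem->getBus("bus:/Character/Enemy/Footsteps", &enemyFootstepsBus);

    // A change on the parent affects every child routed beneath it...
    if (characterBus)
    {
        characterBus->setVolume(0.8f);
    }

    // ...while a child bus can still be adjusted independently on top of that.
    if (enemyFootstepsBus)
    {
        enemyFootstepsBus->setVolume(0.5f);
    }
}
```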

There are a number of different ways to organize mixer busses. There are some tried-and-true methods, but the right bus structure can depend entirely on the game’s genre. In the example above, it’s up to the designer to dictate what’s considered a ‘character sound.’ For instance, in our example the designer has opted not to include weapon sounds within the character category, and might instead place weapon effects within their own mixer category.

Organization matters

However, there could very well be a scenario in a game where it makes sense to house weapons or player abilities under a Character bus. It’s ultimately up to the game audio designer to make the choice that’s the best fit for the project they’re working on. As you’ve probably started to gather, the organization method is an important decision. Once the hierarchies are in place, the designer can start to add rulesets and effects to large parts of the game with ease.

Using plugins on a mixer group

Much like a music producer would place a reverb plugin on a mixer bus in a DAW, a game audio designer will often want to add a reverb plugin to different mixer groups to alter how those sounds emanate throughout the world. This is where organization starts to become very important. Reverb will affect large groups of sounds based on the player’s location. In the diagram below, you can see that there’s a new mixer group called ‘World’ that contains additional child categories.

An example of a mixer group system for dynamic game audio

This larger group houses the majority of diegetic sounds (sounds that occur within the game’s world) – in this case, the sounds that should be affected by the world’s reverb settings. Now, a reverb plugin can be added to the World bus or, alternatively, an auxiliary send that houses the effect can be created. With either method, the result is the same: all sounds on the parent bus, as well as the contained subgroups, will be affected by the reverb.
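
In practice, the reverb itself would usually be set up on the World bus (or on a return bus fed by sends) inside the middleware’s authoring tool. Purely to illustrate the routing idea, here’s a hedged sketch that attaches a reverb DSP to a hypothetical ‘bus:/World’ bus at runtime using the FMOD Studio and Core APIs.

```cpp
#include "fmod_studio.hpp"
#include "fmod.hpp"

// Sketch: attach a reverb effect to an assumed 'World' bus so that every
// child bus routed into it inherits the effect.
void AddReverbToWorldBus(FMOD::Studio::System* studioSystem)
{
    FMOD::Studio::Bus* worldBus = nullptr;
    studioSystem->getBus("bus:/World", &worldBus);
    if (!worldBus) return;

    // Force the bus's underlying channel group to exist before touching it.
    worldBus->lockChannelGroup();
    studioSystem->flushCommands();

    FMOD::ChannelGroup* worldGroup = nullptr;
    worldBus->getChannelGroup(&worldGroup);

    // Create a reverb DSP with the core system and insert it on the bus.
    FMOD::System* coreSystem = nullptr;
    studioSystem->getCoreSystem(&coreSystem);

    FMOD::DSP* reverb = nullptr;
    coreSystem->createDSPByType(FMOD_DSP_TYPE_SFXREVERB, &reverb);

    if (worldGroup && reverb)
    {
        worldGroup->addDSP(0, reverb); // affects the World bus and all of its subgroups
    }
}
```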

Reverb behavior in games

The reverb applied to sounds in a game is usually dictated by a game object known as a reverb zone. Reverb zones can be thought of and applied in an engine in a number of ways. The simplest interpretation, however, is that they act as a trigger zone. Once a player enters a trigger zone, a message is sent to the audio engine to enable the necessary reverb behavior.
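
As a simple illustration of that idea, the sketch below treats a reverb zone as an axis-aligned box; when the player’s position first enters the box, a placeholder callback notifies the audio engine which reverb behavior (or mix state) to enable. All of the names here are hypothetical.

```cpp
#include <string>

// Hypothetical, engine-agnostic reverb zone: an axis-aligned box plus the
// name of the reverb behavior (mix state) it should enable.
struct ReverbZone
{
    float minX, minY, minZ;
    float maxX, maxY, maxZ;
    std::string mixStateName; // e.g. "Rainforest" or "Cave"
};

bool Contains(const ReverbZone& zone, float x, float y, float z)
{
    return x >= zone.minX && x <= zone.maxX &&
           y >= zone.minY && y <= zone.maxY &&
           z >= zone.minZ && z <= zone.maxZ;
}

// Called once per frame with the player's position. OnReverbZoneEntered is a
// placeholder for whatever message the game sends to the audio engine.
void UpdateReverbZone(const ReverbZone& zone,
                      float playerX, float playerY, float playerZ,
                      bool& wasInside,
                      void (*OnReverbZoneEntered)(const std::string&))
{
    const bool isInside = Contains(zone, playerX, playerY, playerZ);
    if (isInside && !wasInside)
    {
        OnReverbZoneEntered(zone.mixStateName); // tell the audio engine to switch reverb
    }
    wasInside = isInside;
}
```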

An example of a reverb zone in a game engine can be seen in the image below:


A trigger zone in Unity denoting a reverb area for a cave in the Wwise Adventure Game

Once the player enters a reverb zone, the reverb plugin needs to load the settings that correspond to that location. These settings are usually dictated by a mix state.

Using mix states

In game audio middleware, a mix state can be thought of as a preset that saves the plugin settings on a mixer bus. When the player enters the reverb zone for the rainforest in our gameplay example, the mix state associated with the rainforest environment will be triggered. Likewise, if the player enters a different location containing a different reverb zone, such as an interior warehouse, the mix state associated with the warehouse would be triggered. This allows a single reverb plugin to correspond to many different presets, and a mixer group to route audio through different reverb sends.
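
In FMOD Studio, for example, a mix state like this is typically authored as a snapshot, which stores bus and plugin settings and is started and stopped like an event; in Wwise, the rough equivalent would be a State driving the bus settings. Below is a minimal sketch, assuming snapshots named ‘Rainforest’ and ‘Warehouse’ exist in the project.

```cpp
#include "fmod_studio.hpp"

// Keep track of whichever snapshot (mix state) is currently active.
static FMOD::Studio::EventInstance* g_activeMixState = nullptr;

// Called when the player enters a reverb zone, e.g. with
// "snapshot:/Rainforest" or "snapshot:/Warehouse" (hypothetical names).
void ActivateMixState(FMOD::Studio::System* studioSystem, const char* snapshotPath)
{
    // Stop the previous zone's snapshot so only one mix state is active.
    if (g_activeMixState)
    {
        g_activeMixState->stop(FMOD_STUDIO_STOP_ALLOWFADEOUT);
        g_activeMixState->release();
        g_activeMixState = nullptr;
    }

    FMOD::Studio::EventDescription* description = nullptr;
    studioSystem->getEvent(snapshotPath, &description);
    if (description)
    {
        description->createInstance(&g_activeMixState);
        g_activeMixState->start(); // applies the saved bus and plugin settings
    }
}
```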

The versatility of mix states

The use cases for mix states can be expanded beyond reverb as well, to any situation where we might want to associate different real-time effects with specific game states. For example, let’s say that when the player’s health is low, a designer wants to add a low-pass filter to World effects and fade in a heartbeat sound effect. There are a few ways this can be accomplished. In the example below, a low-pass frequency value on a filter plugin is tied to a ‘player health’ parameter in the World mixer bus.


A low-pass filter on the World bus that becomes more pronounced as player health decreases

Additionally, volume modulation can be put in place on a looping heartbeat sound so that it’s only audible when a certain mix state is active. By putting these sorts of rules in place, a game audio designer can create a dynamic mix that constantly adjusts itself to best serve the current gameplay scenario.
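
On the game side, this usually boils down to feeding the current health value into a parameter that the middleware reads. Here’s a short sketch using an FMOD Studio global parameter; the parameter name ‘PlayerHealth’, its range, and its mappings to the low-pass cutoff and heartbeat volume are all assumptions that would be set up in the authoring tool.

```cpp
#include "fmod_studio.hpp"

// Push the player's current health (an assumed 0-100 range) into a global
// parameter each frame. The low-pass cutoff on the World bus and the volume
// of the looping heartbeat are mapped to this parameter in the FMOD project.
void UpdateHealthMix(FMOD::Studio::System* studioSystem, float playerHealth)
{
    studioSystem->setParameterByName("PlayerHealth", playerHealth);
}
```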

Voice limiting and prioritization

Proper mix group organization and mix state utilization do a lot of the legwork in transforming a good-sounding game into a great-sounding game. However, in the gameplay scenario outlined at the beginning of this article, we have numerous sound groups emanating from tens (if not hundreds) of emitters simultaneously. Remember, each character and each weapon in the game has its own emitter and corresponding instance of sound. If the player is going up against several enemies, each with their own emitter, the soundscape multiplies quickly. Furthermore, if additional groups like weapons or vehicles come into the mix, things can quickly start to get out of hand.

The need to keep CPU efficiency in mind

Once the mixer hierarchy is organized, it’s time to think about how voices will be limited. This is important because every instance of sound that occurs in the game has a direct impact on the amount of CPU being used by the audio engine. If CPU usage gets too high, sounds will become delayed, and if things get even more severe, it can slow down the frame rate of the game. Typically, a profiler in game audio middleware will help you monitor CPU usage so you know which sounds would benefit from voice limiting. Even if you’re not using middleware, many game engines have their own built-in profilers as well.


The FMOD profiler from the Celeste walkthrough project

By profiling the different gameplay loops that exist in a game, a designer can start to identify offending mix groups that have an excessive number of audio instances and begin limiting them.

Voices and the law of two-and-a-half

In the context of game audio, a voice refers to a single instance of sound. When confronted with a soundscape containing an excessive number of voices, it can be difficult to discern which ones should be eliminated for the sake of performance.

The most important question a designer can ask themselves in these situations is, “What is the player most focused on during this instance of gameplay?” A helpful way to answer this question is to turn to Walter Murch’s law of two-and-a-half. To summarize, the law states that when there are two visible subjects on-screen, the sounds corresponding to those subjects’ movements need to be perfectly in sync. However, when there are more than two subjects, any sync point is as good as any other. This concept is most often discussed in relation to footsteps, but it can also be applied to other circumstances, especially in video games.

With this in mind, a game audio designer can determine how many sounds from each mix group need to be audible at a given moment. In a gunfight, for instance, how many enemy weapons does the player really need to hear simultaneously? If there are three or more enemies in a given battle, the likely answer is that the two most important enemy weapons need focused localization. Additional weapon sounds, in this case, would only serve to populate the soundscape. This theory of limiting can also be applied to mix groups such as vehicles. How many vehicles should be clearly audible at the same time?

Across the larger scope of a game, this principle can be applied to numerous sounds that exist in the world. Limiting voices not only improves the overall performance of the game, but also has the additional benefit of making the mix sound clearer.
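
Inside middleware this is usually handled with a per-group playback limit (Wwise’s playback limit or FMOD’s max instances setting, for example), but the underlying idea can be sketched in a few lines: keep only the N most relevant voices in a group and cull the rest. Scoring by distance alone is an assumption here; a real game might also weight priority, loudness, or recency.

```cpp
#include <algorithm>
#include <vector>

// A single playing sound instance within one mix group.
struct Voice
{
    int   id;
    float distanceToListener; // used here as a simple relevance score
};

// Return the ids of voices to cull so that at most maxAudible voices remain
// in this group (closest voices win).
std::vector<int> CullVoices(std::vector<Voice> voices, size_t maxAudible)
{
    std::vector<int> culled;
    if (voices.size() <= maxAudible)
    {
        return culled; // already within the limit
    }

    // Sort so the most relevant (closest) voices come first.
    std::sort(voices.begin(), voices.end(),
              [](const Voice& a, const Voice& b)
              { return a.distanceToListener < b.distanceToListener; });

    // Everything past the limit gets culled (stopped or virtualized).
    for (size_t i = maxAudible; i < voices.size(); ++i)
    {
        culled.push_back(voices[i].id);
    }
    return culled;
}
```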

Prioritization

As already mentioned, in any given gameplay scenario there’s the possibility of hundreds of emitters being triggered at the same time. Each of those emitters contains its own voices and additional layers of sound. Setting limits on mixer groups as well as individual sound effect types definitely helps, but it’s only part of the overall picture. To create a clear and cohesive mix that doesn’t overload the CPU, a game audio designer needs to define the most important sounds in their game. This is where prioritization comes into play.


The prioritization options for a sound in Wwise

Without proper prioritization, a sound engine will start to limit voices indiscriminately once it reaches the maximum number of voices it can feasibly play in a given frame. To prevent accidental cutoffs, it’s important for a designer to define the sounds that are most important to their game. This can vary widely based on game type, but there are some rules of thumb to help define the highest-priority sounds in a game.

How to determine your priorities

The first priority is often soundtrack music. Because soundtrack music is ‘2D’ and ever-present in the game world, it would be awful for music to be among the sounds that are unintentionally cut out during a dense action sequence. Therefore, soundtrack music should always sit at the highest priority. The same can be said for highly important UI notifiers, from both a gameplay and accessibility standpoint. Something like a ping system should always be audible in the heat of combat.

With the easy sounds out of the way, things start to get a bit more nuanced. For one thing, sounds that relate directly to the player character and are visible on-screen should always be high priority. In a third-person game with a fixed avatar, for example, this can include everything from footsteps to spellcasting and weapon fire sounds, since their absence would be disorienting to the player and break immersion.

Once player sounds are taken care of, it’s time to turn our attention towards the other aspects of the game. Again, the parameters given here can depend heavily on gameplay. For example, the player might be engaging in a perfectly-tuned combat scenario where they hear a clear mix of the right amount of gunshots, enemy footsteps, and vocalizations. Then, a large object starts to move in the distance and catches the player’s attention. How important is that object to the combat scenario? Is the object an important narrative device for the mission? Is destruction occurring? These are all very nuanced and difficult questions to grapple with in terms of the mix, but it all relates back to proper mix bus organization.

For example, in the above scenario, a designer might establish two separate ‘Object’ mixer groups, one designated to narrative-driven objects and the other to world objects. In this case, the narrative-driven objects might be designated to a higher priority category, to ensure that they’ll always trigger when they need to. Conversely, less important world objects might be removed from the soundscape in the heat of battle if the objects in question are off in the distance.
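
One way to think about prioritization in code is to give each mix group a priority tier and, once the engine hits its global voice cap, drop the lowest-priority and most distant voices first. The tiers below simply mirror the rules of thumb above and are purely illustrative; real engines expose this as a per-sound priority setting, like the Wwise option shown earlier.

```cpp
#include <algorithm>
#include <vector>

// Illustrative priority tiers; lower values are more important.
enum class Priority
{
    Music = 0,           // soundtrack and critical UI: never cut
    Player = 1,          // on-screen player actions
    NarrativeObject = 2, // story-critical world objects
    WorldObject = 3      // distant world detail: first to go
};

struct ActiveVoice
{
    int      id;
    Priority priority;
    float    distanceToListener;
};

// When over the global voice cap, return the ids to drop, starting with the
// lowest-priority and most distant voices.
std::vector<int> ChooseVoicesToDrop(std::vector<ActiveVoice> voices, size_t voiceCap)
{
    std::vector<int> dropped;
    if (voices.size() <= voiceCap)
    {
        return dropped;
    }

    // Most important (and closest) voices sort to the front.
    std::sort(voices.begin(), voices.end(),
              [](const ActiveVoice& a, const ActiveVoice& b)
              {
                  if (a.priority != b.priority)
                      return a.priority < b.priority;
                  return a.distanceToListener < b.distanceToListener;
              });

    for (size_t i = voiceCap; i < voices.size(); ++i)
    {
        dropped.push_back(voices[i].id);
    }
    return dropped;
}
```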

Virtualization

We’ve talked a lot about limiting sounds for performance reasons, but not about the different types of limiting that can be used. There are cases where a sound won’t be allowed to play at all, others where it’ll be interrupted, and even times when a sound will be virtualized.

Virtualization is a process where all of the data relating to a sound is tracked, but the sound itself isn’t played. When a sound is cued up, its limitation parameters are first examined. If it shouldn’t be played, it’s then virtualized. Similarly, if a sound is already playing but is meant to be taken over by a higher priority sound, it can be virtualized so that its simulation continues where it left off, but the actual sound is no longer heard.

This saves a large amount of processing overhead, but it still occupies some percentage of the CPU. This is because the timeline position of the sound is still being tracked, and extra operations are needed to resume playback if the sound suddenly no longer needs to be limited. Therefore, this type of optimization is best used on looping sounds that have a high probability of moving in and out of limitation states – things like looping ambiences or vehicle engines.
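
A bare-bones sketch of the bookkeeping involved, assuming a hypothetical looping voice with a known loop length: while the voice is virtual, only its timeline position advances; when it’s allowed back into the mix, playback resumes from wherever the simulation says it should be.

```cpp
// Hypothetical looping voice that can be virtualized: while virtual it keeps
// advancing its timeline but produces no audio.
struct LoopingVoice
{
    double positionSeconds = 0.0; // simulated playback position
    double lengthSeconds   = 4.0; // loop length (assumed known)
    bool   isVirtual       = false;
};

// Called every frame, whether or not the voice is audible.
void UpdateVoice(LoopingVoice& voice, double deltaSeconds)
{
    // The timeline keeps moving even while the voice is virtual...
    voice.positionSeconds += deltaSeconds;
    while (voice.positionSeconds >= voice.lengthSeconds)
    {
        voice.positionSeconds -= voice.lengthSeconds;
    }

    // ...but only a real (non-virtual) voice would actually be rendered.
    if (!voice.isVirtual)
    {
        // MixAudio(voice) would render the loop from voice.positionSeconds.
    }
}

// When the limitation no longer applies, resume audibly from the tracked position.
void Devirtualize(LoopingVoice& voice)
{
    voice.isVirtual = false; // playback picks up where the simulation left off
}
```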

In the case of one-shot sounds, because they’re short bursts that are only heard once, it’s best to limit the sound entirely by interrupting it or not playing it at all. A designer would not want to spend CPU resources tracking the timeline of a sound that will end in a moment; by the time the limitation conditions expire, there’s a high probability that a new instance of the sound would be triggered regardless. In this case, the player would likely not notice its restart, interruption, or complete absence during gameplay.

Game audio engines are sometimes built with this design in mind, where in the case of limitation, one-shot sounds will be eliminated from the playback queue while looping sounds will be virtualized. Sometimes, though, the optimization choice is left up to the designer. This allows a designer to implement optimization that’s specific to their game.


The virtualization options for a sound in Wwise

Conclusion

When creating a cohesive game audio soundscape, it’s important to remember that each element of the soundscape will interact with the player at some point during gameplay. Much like when mixing a song, what’s most important to think about is whether the parts benefit the whole. A collection of sounds that sound good on their own but don’t fit the rest of the game will be a detriment to the game’s overall audio. What matters most is that the audio is in service to the gameplay. The audio, mix, and implementation strategies should always serve to complement what the player is experiencing, and feel like a natural part of the overall immersion.



February 8, 2021

Ronny Mraz Ronny is a Technical Sound Designer at Avalanche Studios in New York City. There, he has worked on the recently released Just Cause 4, as well as the game’s three downloadable content packs: Dare Devils of Destruction, Los Demonios, and Danger Rising. He's also a member of the Adjunct Faculty at NYU Steinhardt where he teaches the class, "Scoring Techniques: Video Games."