How to create an audio soundscape for video games

Illustration: Daniel Zender

Consider the scenario where a player has just booted up their favorite open world game and walked into the middle of a lush rainforest.

Suddenly they’re enveloped by sound – a rich bed of insects and wildlife occupies the central focus of the soundscape, while various birds and other animals can be heard in brief punctuations in the distance. As the player navigates their way through the rainforest, they can feel their feet cross over the terrain. Though subtle, each footstep sounds unique.

The situation described above is an incredibly common audio scenario in video games at large, not just open world games. To break down the principle at work here: the player is entering a scenario that requires a dynamic and active soundscape, where different aspects of an atmosphere work together cohesively. As sound designers, we can use several basic principles of game audio to bring the space to life.

Creating looping sound effects

When distilling the different audio elements in this gameplay scenario, the first thing that’s needed is an ambient bed of sound that persists for the player when they enter the rainforest. In order to do that, we need a sound effect that seamlessly loops.

The first thing required, of course, is some content. If you don’t have any of your own field recordings, you can source a lot of common (and obscure) sounds on Splice.

Once you have your material, load it up in a DAW. A good first step is to extract any protruding sound effects from the general bed, because sounds that stick out to the listener make a loop more noticeable. If a player sits in the rainforest for a few minutes, hearing the same bird call every 15 seconds will slowly break the immersion – the loop should sound as seamless as possible.

Protruding sounds being removed from an ambient bed

It’s good practice to save any content you’re removing from your basic loop, since it could be used later on.

After removing any protruding elements, the next step is to bring the remaining pieces of the base ambience layer back together. This can be done by crossfading across each cut to smooth over the reconnections.

The repaired ambience bed with protruding sounds removed

Once the content is pieced together, it needs to be turned into a loop. For starters, a loop length of about 10 – 30 seconds should be more than sufficient for our purposes here, but loop lengths can vary widely based on context. To turn the region into a loop, split it at around the halfway point (or really wherever the least amount of activity is). Then, as outlined in the image below, switch the positions of section B and section A and reconnect them via crossfade. The new start and end of the file now come from a single continuous cut, so they line up when the loop wraps, while the mismatched original start and end sit in the middle where the crossfade can hide them.

A split looping region

The split looping region, inverted and repaired with a crossfade
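As a rough illustration of the swap-and-crossfade step outside of a DAW, here is a minimal Python sketch. It assumes the region is a plain list of mono sample values and uses a simple linear crossfade; the function name and the fade_len parameter are hypothetical, and a DAW may well use a different fade curve.

```python
def make_seamless_loop(samples, fade_len):
    """Swap the two halves of a region and crossfade the new interior join.

    Hypothetical helper: `samples` is a list of mono sample values and
    `fade_len` is the crossfade length in samples.
    """
    mid = len(samples) // 2
    section_a, section_b = samples[:mid], samples[mid:]
    if fade_len < 0 or fade_len > len(section_a):
        raise ValueError("fade_len must fit inside the shorter section")

    # Section B now comes first. Its first sample was cut from the middle
    # of the recording, so it continues naturally from the end of section A
    # when the loop wraps around.
    body_b = section_b[:len(section_b) - fade_len]
    tail_b = section_b[len(section_b) - fade_len:]
    head_a = section_a[:fade_len]
    body_a = section_a[fade_len:]

    # Overlap the tail of B with the head of A using a linear crossfade.
    crossfade = []
    for i in range(fade_len):
        t = (i + 1) / (fade_len + 1)
        crossfade.append(tail_b[i] * (1.0 - t) + head_a[i] * t)

    return body_b + crossfade + body_a


# Usage: a 10-sample ramp stands in for real audio.
region = [float(i) for i in range(10)]
loop = make_seamless_loop(region, fade_len=2)
```

Because the two sections overlap during the crossfade, the result is shorter than the input by fade_len samples. For noisy material like ambience, an equal-power curve is a common alternative to the linear fade shown here.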

Finally, the waveform needs to sit at zero amplitude at both the starting point and the end point of the loop. If the region begins or ends on a non-zero sample instead of at a zero crossing, the jump in level will cause an audible click when looping actually occurs. These discontinuities can be removed by adding very quick fades on either end of the region, as demonstrated below.

A fade bringing the waveform to zero at the beginning of a region
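The effect of those quick fades can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the region is a list of mono sample values, the fades are linear, and apply_edge_fades and fade_len are hypothetical names rather than anything from a DAW.

```python
def apply_edge_fades(samples, fade_len):
    """Apply a short linear fade-in and fade-out so both ends reach zero.

    Hypothetical helper: `fade_len` is the fade length in samples, typically
    just a few milliseconds of audio.
    """
    n = len(samples)
    if fade_len < 0 or fade_len * 2 > n:
        raise ValueError("fades must not overlap")

    out = list(samples)
    for i in range(fade_len):
        gain = i / fade_len          # 0.0 at the very edge, rising toward 1.0
        out[i] *= gain               # fade-in at the start
        out[n - 1 - i] *= gain       # mirrored fade-out at the end
    return out


# Usage: a constant signal makes the fade shape easy to see.
faded = apply_edge_fades([1.0] * 10, fade_len=4)
```

The first and last samples come out at exactly zero, so the wrap from the end of the loop back to the start has no jump in level.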

By doing this, a listener won’t notice the fade out / fade in or hear an audible click, and now the looping region can be rendered. For this purpose, either a stereo or mono .WAV file will work as an export format.

Creating one-shot texture sounds

Now that the initial looping bed is in place, we want to ensure that there are some additional details for the soundscape. These details are otherwise known as one-shot scatter sounds. These will ensure that the soundscape remains dynamic, no matter how much time the player spends in the rainforest.

Out in a real natural environment, these types of sounds would commonly take the form of isolated bird, insect, or other animal calls. They can also be accentuated by other types of movement that sound at a distant point from the listener.

Recording isolated animal sounds is difficult, but some can be found on Splice. Alternatively, this is where we can make use of the protruding sound effects that were previously removed from the looping sound bed. These samples just need to be edited so that they blend in with the environment: trim each one tightly around its transient and fade its tail so that it starts and ends cleanly, as outlined below. Additional effects or plugins (such as reverbs) can also be applied to help them blend with the initial sound bed.

One-shots extracted from an ambience bed being edited

Building the soundscape in middleware

Next, these audio files need to be added into the middleware that we’re working in. The example here will use FMOD.

Audio actions like this are held within an FMOD event. Inside of an FMOD project, a new event can be created, which holds an encapsulated instance of sound.

First, the initial bed needs to be added to the first track of the event. This will serve as the anchor in the soundscape and give a nice base layer to work with. Once the sound is added, the timeline needs to loop over the duration that the sound occupies on the timeline.

The base ambience bed within a loop region in FMOD

Next, the one-shot sounds need to be added. Remember – these sounds need to be sporadic over the lifetime of the ambience. There are many different ways that this can be achieved, and I’m sure that you’re thinking of a few possibilities already. I encourage you to try out many different methods to hear the results, since there’s more than one correct way to accomplish this.

Since we’re working in FMOD for this example, we can use a great tool it provides called the Scatter Instrument. This helpful tool takes a collection of sound effects and repeatedly plays a single audio sample from the collection, at varying intervals of time and at different distances from the listener. The ranges for these parameters are dictated by the designer.

The base ambient bed and the Scatter Instrument in FMOD
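To make the scattering behavior more concrete, here is a small Python simulation of the idea. It is not FMOD's implementation or API: the function, its parameters, and the sample names are all hypothetical, and it assumes that both the wait between spawns and the spawn distance are drawn uniformly from designer-set ranges.

```python
import random


def schedule_scatter(sample_names, duration, min_interval, max_interval,
                     min_distance, max_distance, seed=None):
    """Return a list of one-shot spawn events for an ambience of `duration` seconds.

    Hypothetical sketch of scatter-style playback, not FMOD's own logic.
    """
    if min_interval <= 0 or max_interval < min_interval:
        raise ValueError("intervals must be positive and ordered")
    if max_distance < min_distance:
        raise ValueError("distances must be ordered")

    rng = random.Random(seed)
    events = []
    time = rng.uniform(min_interval, max_interval)
    while time < duration:
        events.append({
            "time": time,                                        # seconds from start
            "sample": rng.choice(sample_names),                  # which one-shot plays
            "distance": rng.uniform(min_distance, max_distance), # from the listener
        })
        time += rng.uniform(min_interval, max_interval)          # wait before the next spawn
    return events


# Usage: one minute of rainforest detail built from three one-shots.
birds = ["bird_call_01", "bird_call_02", "insect_chirp_01"]
for event in schedule_scatter(birds, 60.0, 2.0, 6.0, 5.0, 40.0, seed=1):
    print(f"{event['time']:6.2f}s  {event['sample']:<16} {event['distance']:5.1f} m")
```

Widening the interval range makes the soundscape sparser and less predictable, while narrowing the distance range keeps the detail closer to the player.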

This is the perfect way to add dynamic elements to the rainforest; by using the collection of one-shots that were edited from our initial ambience bed, texture and additional life can be added to the environment. Now that the main components of the soundscape are in place, the layers can be tuned to better fit the needs of the situation.

Single-channel and multi-channel files

You may have noticed that the different audio contexts above are rendered with different channel formats. As discussed in the last article, a 3D sound in a video game can change based on the player’s distance and orientation relative to an emitter. As a result, a sound emitting from a point source may end up playing from a single speaker in a surround sound configuration. In this instance, we would be wasting data and memory by having unique audio content in multiple channels, since at a distance all of this data would fold down to mono regardless.

2D sounds operate a bit differently. Because there are no distance calculations done on this type of sound, we can assume that the way we hear the audio file outside of the game will be the way that we hear the audio file behave when placed in game. So, if the player is navigating a menu, for instance, we can add panning information or extra details in separate channels and can reliably expect the player to experience that effect.
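The memory cost of extra channels is easy to estimate with some quick arithmetic. The sketch below assumes uncompressed PCM audio at 48 kHz and 16-bit, which are illustrative defaults rather than values from this project; shipped games usually compress their audio, so treat the numbers as relative comparisons only.

```python
def pcm_size_bytes(seconds, sample_rate=48000, bit_depth=16, channels=1):
    """Size of uncompressed PCM audio: duration x rate x bytes per sample x channels."""
    bytes_per_sample = bit_depth // 8
    return int(seconds * sample_rate * bytes_per_sample * channels)


# Usage: compare a 30-second loop rendered in different channel formats.
for label, channels in [("mono", 1), ("stereo", 2), ("5.1", 6)]:
    size = pcm_size_bytes(30, channels=channels)
    print(f"{label:>6}: {size:,} bytes")
```

Under these assumptions the mono loop takes 2,880,000 bytes, the stereo loop 5,760,000 bytes, and a 5.1 render 17,280,000 bytes, so every extra channel on a point-source sound is spent on data the player may never hear.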

Using automation to change the behavior of a sound

In our case, there’s a bed of ambience that needs to envelop the player when they’re in the designated area. When the player’s far away, however, they should hear the sound as a localized point source so that they can hear the rainforest accurately in the distance. This is what’s known as a 2D close / 3D far attenuation setting. When the player’s inside the rainforest, or close enough to the source of the sound, they’ll hear the rainforest in its full multi-channel glory. Conversely, when the player’s far away from the rainforest, they’ll hear the ambience bed as a point source emitting directly from the rainforest area, and audio will only emit from the appropriate surround speaker(s).

Demonstrated in FMOD below, panning settings can be automated over a sound’s distance from the listener. As the player moves closer to the rainforest, the sound slowly transforms from a point source to a sound that totally surrounds the player.

The ‘pan override’ automation on a single track in FMOD
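The shape of this kind of distance-driven automation can be sketched as a simple mapping from distance to a blend amount. The function and its near_radius and far_radius parameters are hypothetical, and the linear ramp is an assumption; in FMOD the curve is drawn by hand, so this only illustrates the idea.

```python
def spatial_blend(distance, near_radius, far_radius):
    """Map listener distance to a 0.0-1.0 blend amount.

    0.0 means fully 2D (the bed surrounds the player) and 1.0 means fully 3D
    (the bed is heard as a point source). Hypothetical linear ramp.
    """
    if far_radius <= near_radius:
        raise ValueError("far_radius must be greater than near_radius")
    t = (distance - near_radius) / (far_radius - near_radius)
    return max(0.0, min(1.0, t))   # clamp so values outside the ramp stay at the extremes


# Usage: walking toward the rainforest from 100 units away.
for distance in (100, 50, 30, 10, 0):
    print(distance, spatial_blend(distance, near_radius=10, far_radius=50))
```

Inside near_radius the player hears the full multi-channel bed, beyond far_radius the bed behaves as a point source, and the ramp in between avoids an audible switch.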

Adding variation to repetitive sound effects

Now that the layers are in place and automation has been added to the base layer of our ambience, additional modulation needs to be added to the Scatter Instrument. As you may have gathered, even with several different variations of a sound, over a long period of time the player will begin to notice repeated samples. The more sample repetition there is, the more likely the audio will pull the player out of immersion. Luckily, middleware has some built-in tools to allow for further variation on samples that already exist in a project.

Within the Scatter Instrument, sounds are already being spawned at different distances from the listener and at varying intervals by default. This is a great place to start, but there’s even more that can be done. Randomization can also be added to the pitch and the volume of each voice that the Scatter Instrument creates. This way, the same audio sample sounds slightly different every time it’s used, and exact repeats become unlikely.

This can be done very easily inside of FMOD, as seen below.

Pitch and volume randomization on a Scatter Instrument in FMOD

This technique can be applied at many different levels within FMOD, all the way up to the event level. This way, multiple layers of variation can be applied to the same instance of audio.

It’s important to note that when working with pitch modulation, it’s usually best to keep the randomization range between zero and two semitones, depending on the content. Modulation outside of that range is usually used as a sound design tool in its own right and isn’t ideal for pure variation purposes. This technique is also available in other middleware, and the general principle can be applied to many different types of sound effects in your game.
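For a sense of what these ranges mean numerically, the following Python sketch converts a random pitch offset in semitones into a playback-rate ratio and a random volume offset in decibels into a linear gain. The conversions are standard (a semitone is a factor of 2^(1/12), and gain is 10^(dB/20)), but the function names, the 3 dB volume range, and the choice to apply the two-semitone limit in both directions are assumptions made for illustration.

```python
import random


def semitones_to_ratio(semitones):
    """Playback-rate ratio for a pitch shift; 12 semitones doubles the rate."""
    return 2.0 ** (semitones / 12.0)


def db_to_gain(db):
    """Linear amplitude gain for a level change in decibels."""
    return 10.0 ** (db / 20.0)


def randomize_voice(rng, max_semitones=2.0, max_volume_cut_db=3.0):
    """Pick a random pitch and volume offset for one triggered sample."""
    pitch = rng.uniform(-max_semitones, max_semitones)
    volume_db = rng.uniform(-max_volume_cut_db, 0.0)   # only ever attenuate
    return {
        "pitch_semitones": pitch,
        "playback_rate": semitones_to_ratio(pitch),
        "volume_db": volume_db,
        "gain": db_to_gain(volume_db),
    }


# Usage: five differently modulated plays of the same sample.
rng = random.Random(7)
for _ in range(5):
    voice = randomize_voice(rng)
    print(f"{voice['pitch_semitones']:+.2f} st -> x{voice['playback_rate']:.3f}, "
          f"{voice['volume_db']:+.2f} dB -> gain {voice['gain']:.3f}")
```

A two-semitone shift changes the playback rate by only about 12 percent, which helps explain why offsets in this range read as natural variation rather than as a different sound.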

Handling footsteps

Now that the basic ambience for the rainforest is in place, there’s still one aspect of the soundscape outlined at the beginning of the article to account for – the player’s footsteps.

Much like the ambient one-shot samples in the Scatter Instrument, footstep sounds are typically sourced from a collection of unique footstep samples. For this case, the footstep sounds will be rendered as mono; the additional information that two channels would provide is not needed in this instance, so sticking with mono will save a lot of memory.

In a new FMOD event created specifically for footsteps, the samples are added along with the same 2D close / 3D far falloff used for the ambience bed. This way, when the camera is in its primary position, the footsteps will sound 2D. However, if the camera moves away from the player, the footsteps will be placed in the world.

It’s important to keep in mind that footsteps can be implemented in a variety of formats. For example, the technique for footstep implementation in a first-person shooter will be different from a real-time strategy game with an overhead camera.

Creating footstep variations

The variation principles that we used within the Scatter Instrument can be expanded to a new footstep event in FMOD. In the image below, you’ll notice that instead of the Scatter Instrument, this event makes use of the Multi Instrument. Here, a collection of footstep samples has been added to this instrument, much like before. But as opposed to the ambience event, where the one-shots will repeat infinitely while the event is playing, in this case, footsteps are played once when the FMOD event is triggered from the game engine.

Despite this slight difference in implementation, the same principles of variation can be applied to the footstep sounds. Here, volume and pitch can be modulated with each sample that’s triggered. Even though there are a few samples in the Multi Instrument, each sample will have random modulation attached to its pitch and volume.

Pitch and volume randomization on a Multi Instrument in FMOD

With a sound like footsteps that will be consistently present during the gameplay experience, continued variation is important to prevent ear fatigue for the player. For footsteps in particular, further variation can be added by adding different events that correspond to different surfaces. Then, there will be a collection of sounds for each of the different surface types the player walks on, each with their own set of randomization, creating a very dynamic soundscape that can persist throughout gameplay.
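The per-surface idea can be sketched in Python as a small picker that keeps one pool of samples per surface. Everything here is hypothetical, including the class, the surface names, and the file names; avoiding an immediate repeat of the previous sample is a design choice made for this sketch, not a claim about how the Multi Instrument selects its entries.

```python
import random


class FootstepPicker:
    """Choose a footstep sample for a surface, avoiding immediate repeats."""

    def __init__(self, samples_by_surface, seed=None):
        self._samples = {surface: list(pool)
                         for surface, pool in samples_by_surface.items()}
        self._last = {}                      # last sample played per surface
        self._rng = random.Random(seed)

    def pick(self, surface):
        pool = self._samples[surface]
        # Exclude the previous sample; fall back to the full pool if that
        # would leave nothing to choose from (a pool with a single sample).
        candidates = [s for s in pool if s != self._last.get(surface)] or pool
        choice = self._rng.choice(candidates)
        self._last[surface] = choice
        return choice


# Usage: the game engine reports the surface under the player on each step.
picker = FootstepPicker({
    "mud": ["mud_01", "mud_02", "mud_03"],
    "leaves": ["leaves_01", "leaves_02", "leaves_03"],
}, seed=3)
for surface in ["mud", "mud", "leaves", "mud", "leaves", "leaves"]:
    print(surface, "->", picker.pick(surface))
```

Combined with the pitch and volume randomization described above, even a pool of three or four samples per surface can go a long way.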

Conclusion

Designing audio for games often involves utilizing clever techniques that allow a designer to do a lot with just a few resources. In this example, we used a small collection of sounds and techniques to create a dynamic soundscape that could persist over many hours of gameplay.

When scaled out to a full production, the number of assets in the game as a whole can increase dramatically, and large changes often need to be made to groups of sounds for either mix or performance reasons. In the next article, we’ll discuss how game audio techniques scale to the size of a full game and how the scope can be managed.

Do you have any questions on the concepts covered in the article? Let us know in the comments below.


January 15, 2021

Ronny Mraz

Ronny is a Technical Sound Designer at Avalanche Studios in New York City. There, he has worked on the recently released Just Cause 4, as well as the game’s three downloadable content packs: Dare Devils of Destruction, Los Demonios, and Danger Rising. He’s also a member of the Adjunct Faculty at NYU Steinhardt where he teaches the class, "Scoring Techniques: Video Games."