How procedural audio brings sounds to life in video games


Illustration: Daniel Zender

When listening to a video game’s soundscape, a player is often hearing the playback of pre-recorded audio samples that interact with (and change based on) gameplay.

These sounds are created by the sound designer to fit the player’s experience, and can take many different forms such as looping sounds or one-shots, as we outlined in a previous article. However, there are some instances of sound in video games that are created procedurally at runtime, without a pre-rendered source.

Procedural audio is sound generated at runtime, typically using synthesis driven by gameplay parameters. To put it simply, procedurally generated audio creates the sound effects that the player hears on the fly, based on a set of pre-defined behaviors.
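As a minimal sketch of the idea (written in Python rather than an actual engine’s audio code, with an invented parameter mapping), imagine a tone whose pitch is re-derived on every update from a live gameplay value such as engine RPM:

```python
import numpy as np

SAMPLE_RATE = 44100

def engine_tone(rpm: float, duration: float = 0.1) -> np.ndarray:
    """Render a short buffer whose pitch tracks a live gameplay parameter (engine RPM)."""
    # Pre-defined behavior: map the gameplay value to a synthesis parameter.
    frequency = 60.0 + rpm * 0.05  # hypothetical mapping, tuned by ear
    t = np.arange(int(SAMPLE_RATE * duration)) / SAMPLE_RATE
    # A bare sine oscillator stands in for a real engine model.
    return 0.3 * np.sin(2 * np.pi * frequency * t)

# Each update, the audio system re-renders the sound from the current game state.
buffer = engine_tone(rpm=3200.0)
```

The key point is that no audio file exists on disk; the waveform is produced from game state whenever it’s needed.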

This isn’t too different from the procedural generation found in other disciplines of game development, such as level design or environment art. At the cost of some additional CPU overhead, these sounds can have a remarkably low RAM footprint while allowing a sound designer to build a soundscape that’s closely intertwined with the mechanics of the game.

That said, procedurally generated content has its trade-offs. The more common procedural audio techniques offer a more limited creative palette than traditional sound design with recorded samples, unless a significant investment is made in the toolset. The creative and technical benefits of these techniques must therefore be weighed against their creative limitations and the engineering cost.

In this article, we’re going to discuss some of the procedural audio techniques used in video games today. You’ve probably already heard some of them in your favorite games without noticing any difference between them and the pre-rendered sounds playing alongside them, despite the trade-offs the designers had to weigh.

Procedural vocalization techniques in No Man’s Sky

In No Man’s Sky, the audio team had the unique challenge of creating a dynamic soundtrack in a completely procedurally generated world. This meant that the majority of the assets in the game—even down to the types of creatures that were spawned in the world—were a result of algorithmic generation.

In order to create a soundscape as diverse as the game’s universe, the team created a bespoke plugin for the audio middleware, Wwise, which they dubbed “VocAlien.” With VocAlien, they were able to synthesize the vocalizations for the unique creatures of the universe and closely tie those vocalizations to parameters associated with each creature’s characteristics.

The result became a sort of ‘instrument’ the designer could use to perform a sound based on the fundamentals of a creature’s base animations and behaviors. When added to the game, these performances are then warped by the creatures’ real-time gameplay data and their behaviors in the world. At its core, the system acts as a filter that simulates mouth movement. In the 2017 GDC session seen below, sound designer Paul Weir breaks down the design process of the VocAlien tool, as well as the decision to focus on procedural vocal simulation.
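VocAlien itself is proprietary, but the “filter that simulates mouth movement” idea maps onto a classic source-filter model. Below is a rough, hedged Python sketch of that general approach (not the actual plugin): a buzzy source is shaped by a resonant filter whose centre frequency a hypothetical, animation-driven mouth_openness value controls.

```python
import numpy as np
from scipy import signal

SAMPLE_RATE = 44100

def vocalize(pitch_hz: float, mouth_openness: float, duration: float = 0.4) -> np.ndarray:
    """Source-filter sketch: a harmonically rich source shaped by a 'mouth' resonance."""
    t = np.arange(int(SAMPLE_RATE * duration)) / SAMPLE_RATE
    # Source: a pulse train standing in for a creature's vocal cords.
    source = signal.square(2 * np.pi * pitch_hz * t, duty=0.3)
    # Filter: a band-pass resonance whose centre rises as the mouth opens (0.0 to 1.0).
    centre = 300.0 + mouth_openness * 1500.0
    b, a = signal.butter(2, [centre * 0.7, centre * 1.3], btype="bandpass", fs=SAMPLE_RATE)
    return 0.5 * signal.lfilter(b, a, source)

# Animation and behavior data would drive mouth_openness (and pitch) frame by frame.
grunt = vocalize(pitch_hz=110.0, mouth_openness=0.6)
```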

Runtime synthesis in Just Cause 4

When driving a vehicle through the world in Just Cause 4, there’s an extra layer of sound that comes in when passing a non-player vehicle in traffic. This creates a ‘whoosh-by’ effect that simulates the wind between two vehicles passing each other at high speed. This layer of sound is generated by a noise synth written for the audio middleware, FMOD.

Using a combination of brown noise and white noise, the output of the synthesizer is controlled by a number of variables on the vehicle. The envelope of the sound, noise filtering, gain settings, and mix between noise types are controlled by factors like player vehicle speed, distance to non-player vehicles, and non-player vehicle speed. The azimuth of the listener in relation to the non-player vehicle as well as the orientation of the non-player vehicle also have a role to play in the sound’s behavior.

These parameters converge at an inflection point when the vehicles pass each other in traffic, creating a whooshing sound that’s tailored to the player’s exact driving scenario. In this case, the character of a whoosh is easy to create through synthesis, which allows for more control than recorded samples.
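As a hedged illustration of the approach described above (a plain Python sketch rather than the actual FMOD plugin, with invented parameter names, units, and mappings), a pass-by whoosh can be approximated by mixing brown and white noise, filtering it, and shaping it with an envelope that peaks at the moment of closest approach:

```python
import numpy as np
from scipy import signal

SAMPLE_RATE = 44100

def whoosh(relative_speed: float, closest_distance: float, duration: float = 1.0) -> np.ndarray:
    """Pass-by sketch: filtered noise whose envelope peaks at the closest approach."""
    n = int(SAMPLE_RATE * duration)
    white = np.random.randn(n)
    brown = np.cumsum(white)
    brown /= np.max(np.abs(brown)) + 1e-9  # keep the integrated noise in range
    # Faster relative speed -> brighter mix (more white noise) and a higher cutoff.
    speed = np.clip(relative_speed / 60.0, 0.0, 1.0)
    mix = (1.0 - speed) * brown + speed * white
    b, a = signal.butter(2, 500.0 + 3500.0 * speed, btype="low", fs=SAMPLE_RATE)
    filtered = signal.lfilter(b, a, mix)
    # Envelope: a bump centred on the pass; closer passes are narrower and louder.
    proximity = 1.0 - np.clip(closest_distance / 10.0, 0.0, 1.0)
    width = 0.5 - 0.4 * proximity
    t = np.linspace(-1.0, 1.0, n)
    envelope = (0.5 + 0.5 * proximity) * np.exp(-(t / width) ** 2)
    return filtered * envelope

pass_by = whoosh(relative_speed=45.0, closest_distance=2.0)  # hypothetical units: m/s and metres
```

In the real system, values such as azimuth and vehicle orientation would modulate these same controls continuously rather than rendering a single fixed buffer.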

Using Pure Data in Two Dots

Each time the team for the mobile game Two Dots releases a new theme (or world), they also release a new set of sound effects specific to that theme. These sound sets manifest as synthesizer tones that respond to player input and gameplay, which then build upon each other to create the soundtrack that the player hears in game.

This poses two problems. First, each new theme requires a new set of sound effect files, which occupy more and more disk space: a real concern on mobile, where an application’s download size is incredibly important. Second, keeping that footprint down means compressing those files, so the more themes are added to the game, the lower the overall sound quality becomes as ever heavier compression is needed.

As a solution, the team utilized Pure Data inside of Unity to synthesize their sound effects in real time. Pure Data is a visual programming language that allows users to generate audio through graph-like ‘patches’ using a series of built-in oscillators, filters, and effects. This allows the composers to script audio behavior right inside of Pure Data, using parameters that change the audio output at runtime based on gameplay.

This has myriad advantages: it saves disk space and RAM, and it removes the need to render out a new set of sound effects for each new theme. The composers can create a musical world for a new theme simply by establishing a set of behaviors. This also leaves more freedom to use pre-rendered assets for other, more specialized purposes, if needed. You can learn more about how the team on Two Dots utilized Pure Data in this video.
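The actual Two Dots behaviors live in Pure Data patches, but the underlying idea, tones defined by a per-theme set of rules rather than by audio files, can be sketched in plain Python (the theme names and mappings below are invented for illustration):

```python
import numpy as np

SAMPLE_RATE = 44100

# Hypothetical per-theme behaviors: each theme maps a player's combo to a scale degree and timbre.
THEMES = {
    "forest": {"scale": [0, 3, 5, 7, 10], "base_hz": 220.0, "decay": 4.0},
    "ocean":  {"scale": [0, 2, 4, 7, 9],  "base_hz": 261.6, "decay": 2.5},
}

def dot_cleared(theme: str, combo: int, duration: float = 0.5) -> np.ndarray:
    """Synthesize the tone for clearing a dot: pitch climbs with the player's combo."""
    cfg = THEMES[theme]
    degree = cfg["scale"][combo % len(cfg["scale"])] + 12 * (combo // len(cfg["scale"]))
    freq = cfg["base_hz"] * 2 ** (degree / 12.0)
    t = np.arange(int(SAMPLE_RATE * duration)) / SAMPLE_RATE
    return 0.4 * np.sin(2 * np.pi * freq * t) * np.exp(-cfg["decay"] * t)

# Shipping a new theme means shipping a new parameter set, not a new folder of compressed audio.
blip = dot_cleared("forest", combo=3)
```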

Runtime synthesis in Grand Theft Auto 5

The team on Grand Theft Auto 5 utilized a synthesis toolkit called AMP (similar to Pure Data or Max) that was custom built inside of their own game engine to create the behaviors for procedural audio assets that are used in game. In the 2014 GDC session below, Alastair MacGregor details the synthesis engine that the team built for Grand Theft Auto 5 as well as the technical advantages and opportunities the system provided. From a creative standpoint, the system was utilized for content design on noise-associated assets, which allowed them to win back RAM on air conditioning units placed throughout the world as well as bicycle movement sounds.

When working with procedural audio, it’s always important to keep performance in mind. Saving memory is great, but if the tradeoff isn’t beneficial, it can ultimately be advantageous to use more traditional sound design techniques. In the video above, MacGregor details the technical steps that the team took to ensure that the use of the AMP system was always resulting in performance wins for the overall game. As MacGregor states in his talk, “There is no point in creating something that’s cool and looks great but is too expensive to use in game.”

Working within a small technical budget in Peggle Blast

In Peggle Blast, the team at PopCap had a huge challenge to tackle: creating a high-fidelity mobile audio experience within the confines of an incredibly small technical budget. To achieve this, sampled audio files were only used in special cases (as previously discussed, these take up more memory), while the bulk of the sound effects in the game were generated procedurally.

The team utilized the real-time synthesis and digital processing capabilities available out of the box in Wwise to generate sounds live while occupying a very small memory footprint. The Wwise toolset allows a highly granular level of detail to be applied to synthesized sounds on the fly, which provided quick iteration in tandem with gameplay.

The tradeoff? Creating detailed sounds from scratch is difficult and requires a lot of voices, or instances of audio generated in-game. In a typical scenario that involves an audio file exported from a DAW, there could be many layers of detail present in just one sample. However, when building a sound from scratch at runtime, all of those layers now exist as separate voices.

Though this occupies a negligible amount of memory, each of those additional layers adds CPU overhead to be processed, which is especially important to consider on a mobile platform. This becomes not only a matter of memory optimization, but also of CPU optimization and the ability to ‘do a lot with a little.’ In the 2015 GDC session below, the team talks about how they overcame the creative and technical challenges of building a soundscape that’s largely generated at runtime.
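As a toy illustration of this memory-for-CPU trade (a hedged sketch, not Wwise or PopCap code), here a sound that a DAW would export as a single file is instead rebuilt at runtime from several live voices:

```python
import numpy as np

SAMPLE_RATE = 44100

def render_voice(freq: float, decay: float, duration: float = 0.3) -> np.ndarray:
    """One synthesized voice: a single oscillator with its own envelope."""
    t = np.arange(int(SAMPLE_RATE * duration)) / SAMPLE_RATE
    return np.sin(2 * np.pi * freq * t) * np.exp(-decay * t)

# Each layer costs almost no memory, but every layer adds per-sample CPU work.
layers = [
    render_voice(880.0, decay=8.0),    # bright 'ping' transient
    render_voice(440.0, decay=4.0),    # body an octave down
    render_voice(1760.0, decay=20.0),  # short sparkle on top
]
peg_hit = 0.3 * np.sum(layers, axis=0)
print(f"voices used: {len(layers)}")   # the voice count is the budget to watch on mobile
```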

These types of memory-saving techniques weren’t exclusive to sound design on the project; alternative techniques were also used for the music. In this case, the team opted to create the entire score using a set of MIDI sample banks inside of Wwise. In this way, memory could be optimized, as opposed to dealing with previously exported music files (which could occupy a lot of disk space). This allowed the team to optimize on a very granular level—not only by instrument selection, but also via the number of samples within each instrument, the number of musical voices being used at any given moment, etc.

The implementation of this kind of music strategy required careful thinking and tracking; you can learn more about the creative and technical process behind the soundtrack at this point in the video. Although the soundtrack isn’t an example of procedural audio in the traditional sense, the modularity that it provides is worth noting alongside our discussion—there are huge performance wins that can be had from a modular system where different pieces of pre-rendered audio are put together and combined with procedural elements at runtime.

Modular weapon sound design in Borderlands 3

Borderlands 3 boasts a crafting system where different weapon parts can be paired together to create an entirely new weapon. Though also not a traditional example of procedural audio, the highly modular weapon audio system in Borderlands 3 effectively complements the unique level of variability in weapon choice offered by the game.

With the crafting system in mind, the audio team built a collection of sound effects with characteristics based on the different components of a weapon. For example, a snub-nose barrel and a long barrel would each impact the sonic character of a pistol even when every other part is the same. The team therefore took a modular approach, so that the pieces of a weapon come together to create the overall weapon sound.

A system like this has clear advantages: creating bespoke sound effects from scratch for every possible weapon combination in the game would not only be incredibly time-consuming, but would also require a lot of memory overhead.

In order to create this modular system, the team recorded hundreds of different types of weapon assets that they would then design and mix together in a potential pool of combinations. The challenge was that every weapon element eligible to be combined needed to blend with the other sounds in a natural way. In this video, the team breaks down how they were able to overcome these challenges and design the weapon system’s audio.
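As a hedged sketch of the modular idea (plain Python with invented component names, not Gearbox’s actual pipeline), each crafted weapon’s shot is mixed from one layer per component:

```python
import numpy as np

SAMPLE_RATE = 44100

def load_layer(name: str) -> np.ndarray:
    """Stand-in for loading a pre-designed layer; real layers would be recorded assets on disk."""
    rng = np.random.default_rng(abs(hash(name)) % (2**32))
    n = int(0.2 * SAMPLE_RATE)
    return rng.standard_normal(n) * np.exp(-np.linspace(0.0, 12.0, n))

# Hypothetical pools of layers keyed by weapon component.
BARRELS   = {"snub": "barrel_snub_crack", "long": "barrel_long_boom"}
BODIES    = {"light": "body_light_mech",  "heavy": "body_heavy_mech"}
MAGAZINES = {"drum": "mag_drum_rattle",   "box": "mag_box_click"}

def weapon_shot(barrel: str, body: str, magazine: str) -> np.ndarray:
    """Mix one layer per component so every crafted combination gets a distinct composite sound."""
    layers = [load_layer(BARRELS[barrel]), load_layer(BODIES[body]), load_layer(MAGAZINES[magazine])]
    mixed = np.sum(layers, axis=0)
    return 0.8 * mixed / (np.max(np.abs(mixed)) + 1e-9)  # normalize so any combination stays in range

shot = weapon_shot(barrel="snub", body="light", magazine="drum")
```

The design challenge the team describes, making every layer blend naturally with every other, is exactly what a simple mix like this glosses over.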

The power and potential of procedural audio

Procedural audio offers both versatility and power when it comes to sound design. With the wide variety of information that game audio designers have access to in any modern gameplay scenario, the potential for closely tying those data points to audio feels limitless. However, procedural audio isn’t without its trade-offs. At least for the immediate future, I think we’ll find that sampled audio continues to yield the highest ‘fidelity.’

However, as the technical capabilities of platforms continue to evolve, it wouldn’t be far-fetched to say that procedural techniques will further augment traditional video game sound design as a whole, allowing for more dynamic soundscapes and an experience more closely tied to a player’s unique behaviors.

Do you have any questions about how procedural audio is used in video games? What sorts of topics in game audio would you like to see us cover next? Let us know in the comments below.

May 13, 2021

Ronny Mraz Ronny is a Technical Sound Designer at Avalanche Studios in New York City. There, he has worked on the recently released Just Cause 4, as well as the game’s three downloadable content packs: Dare Devils of Destruction, Los Demonios, and Danger Rising. He's also a member of the Adjunct Faculty at NYU Steinhardt where he teaches the class, "Scoring Techniques: Video Games."