Taos: Building a Modern WebGPU Game Engine

Chapter 14: Audio

Audio is a first-class engine subsystem living in src/audio/. It wraps the Web Audio API behind a small, game-oriented API — buses, triggered sound effects, spatial positioning, background music, an insertable effects chain, microphone capture, and analysis for visualization. The central object is AudioEngine: it is standalone (not tied to the renderer), so the game (crafty/) and the standalone samples drive it the same way. This chapter covers the engine API; Chapter 22 shows how Crafty wires it into footsteps, digging, and ambient music.

14.1 The AudioEngine and the Web Audio Graph

Two routes through the AudioContext: spatial SFX (Buffer → BufferSource → Panner → destination) and music (Buffer → BufferSource → Gain → destination), with the listener and bootstrap notes on the side

The Web Audio API exposes an AudioContext — a graph of AudioNodes that runs on a dedicated audio thread. Sources (decoded buffers, oscillators, the microphone) connect through processing nodes (gain, panner, filters) to the destination (the speakers). AudioEngine owns one context and builds the standard plumbing on top of it:

import { AudioEngine } from '../src/audio/index.js';

const audio = new AudioEngine();
audio.resumeOnGesture();                 // unlock on first click / key / touch
const clip = await audio.load('/sfx/hit.ogg');
audio.play(clip, { bus: 'sfx' });        // one-shot

Browsers block audio until a user gesture, so the context starts suspended. resumeOnGesture() attaches one-shot listeners that resume it on the first interaction; resume() does it explicitly from inside an existing handler. The context is created lazily the first time it is needed, so constructing an AudioEngine at startup is cheap.
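
One way the gesture unlock could be wired up is sketched below. This is not the engine's source: the helper name unlockOnGesture, the event list, and the two structural interfaces are assumptions made for illustration. Only state and resume() mirror the real AudioContext, and the listener methods mirror the DOM's EventTarget, so the sketch should accept browser objects as well as test doubles.

```typescript
// Minimal shapes the sketch needs; modelled on AudioContext and EventTarget.
interface ResumableContext {
  readonly state: string;            // 'suspended' | 'running' | 'closed' on a real context
  resume(): Promise<void>;
}
interface GestureTarget {
  addEventListener(type: string, listener: () => void): void;
  removeEventListener(type: string, listener: () => void): void;
}

// Hypothetical helper: resume the context on the first user gesture, then detach.
function unlockOnGesture(
  ctx: ResumableContext,
  target: GestureTarget,
  events: readonly string[] = ['pointerdown', 'keydown', 'touchstart'],
): void {
  const unlock = (): void => {
    // Detach from every event type so the handler runs at most once overall.
    for (const type of events) target.removeEventListener(type, unlock);
    if (ctx.state === 'suspended') void ctx.resume();
  };
  for (const type of events) target.addEventListener(type, unlock);
}
```

Registering the same handler for several event types and removing it from all of them on the first hit keeps the unlock idempotent regardless of which input device the player touches first.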

Clips are loaded and decoded once and cached by URL — AudioLoader dedupes concurrent loads of the same file so a preloading screen and a gameplay trigger share a single fetch:

const clip: AudioClip = await audio.load(url);   // cached + deduped
console.log(clip.duration, clip.channels, clip.sampleRate);
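
The cache-plus-dedupe behaviour can be sketched with a single map of promises. The class name DedupingLoader and the injected fetchAndDecode callback are hypothetical; the chapter does not show AudioLoader's internals, so treat this as one plausible shape rather than the engine's implementation.

```typescript
// Hypothetical cache: one shared promise per URL covers both "already decoded"
// and "still in flight", so every caller for the same URL gets the same result.
class DedupingLoader<T> {
  private readonly entries = new Map<string, Promise<T>>();
  private readonly fetchAndDecode: (url: string) => Promise<T>;

  constructor(fetchAndDecode: (url: string) => Promise<T>) {
    this.fetchAndDecode = fetchAndDecode;
  }

  load(url: string): Promise<T> {
    const cached = this.entries.get(url);
    if (cached) return cached;

    const pending = this.fetchAndDecode(url).catch((err: unknown) => {
      // Forget failed loads so a later call can retry instead of caching the error.
      this.entries.delete(url);
      throw err;
    });
    this.entries.set(url, pending);
    return pending;
  }
}
```

In a browser, fetchAndDecode would typically fetch the file as an ArrayBuffer and hand it to AudioContext.decodeAudioData(). Storing the promise rather than the decoded result is what lets a second caller arriving mid-download join the first request.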

14.2 Buses and Volume

Every sound routes through a named AudioBus: a mixing group with its own gain, mute, and optional effect chain. A fresh engine has master plus sfx, music, ui, and voice, all routed into master, which feeds the destination. Because the buses form a tree of gain nodes, volumes cascade: a voice on the sfx bus is scaled by sfx.volume × master.volume, on top of the voice's own volume.

Tree of gain nodes: the sfx, music, ui, and voice buses each route into master, which feeds the audio destination; effective level cascades as bus.volume times master.volume

audio.masterVolume = 0.5;          // shorthand for audio.master.volume
audio.bus('music').volume = 0.4;
audio.bus('sfx').fade(0.0, 0.5);   // ramp the SFX bus to silence over 0.5 s
audio.muted = true;                // silence everything; volumes are preserved

Muting a bus sets its gain to zero but keeps the logical volume, so unmuting restores the previous mix. This bus model is what the UI volume sliders and the global mute toggle drive (see Chapter 15).
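
The cascade and the mute behaviour can be modelled with plain numbers, without any audio nodes. BusModel and its members are hypothetical names for this sketch; the real AudioBus drives GainNodes, but the arithmetic is assumed to be the same.

```typescript
// Hypothetical stand-in for a bus: just the numbers, no audio nodes.
class BusModel {
  volume = 1;        // logical volume; survives muting
  muted = false;
  readonly parent: BusModel | null;

  constructor(parent: BusModel | null = null) {
    this.parent = parent;
  }

  // What the bus's own gain node would be set to.
  get nodeGain(): number {
    return this.muted ? 0 : this.volume;
  }

  // Product of gains from this bus up to the root.
  get effectiveGain(): number {
    const upstream = this.parent ? this.parent.effectiveGain : 1;
    return this.nodeGain * upstream;
  }
}

const masterModel = new BusModel();
const sfxModel = new BusModel(masterModel);
masterModel.volume = 0.5;
sfxModel.volume = 0.8;
console.log(sfxModel.effectiveGain);   // 0.8 × 0.5

sfxModel.muted = true;                 // gain drops to 0, volume stays 0.8
sfxModel.muted = false;                // previous mix is restored
```

Keeping volume and muted as separate fields, and deriving the node gain from both, is what makes unmuting lossless.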

14.3 Triggered Sound Effects

Event flow: game events on the left feed playAt(), which spins up a one-shot BufferSource → Panner → destination chain that disposes itself on "ended"

play() (flat) and playAt() (spatial) both spin up a short-lived voice — a BufferSource → Gain → [Panner] → bus chain — and return a SoundHandle. The handle is the control surface for that one sound: stop it, fade it, repitch it, or move it. Finished voices are pruned automatically by update(), called once per frame.

const shot = audio.play(clip, { bus: 'sfx', volume: 0.9, playbackRate: 1.2 });
shot?.fadeOut(0.3);

// looping voice you keep a handle to:
const engineHum = audio.play(loopClip, { bus: 'sfx', loop: true });
// later: engineHum?.stop();
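
The per-frame pruning that update() performs might look roughly like the sketch below. VoiceLike and VoicePool are hypothetical names; the engine's SoundHandle carries far more state, and how it detects completion is not shown in this chapter.

```typescript
// Hypothetical voice record; the engine's SoundHandle is richer than this.
interface VoiceLike {
  finished: boolean;     // set when the underlying source has ended or was stopped
  dispose(): void;       // disconnect nodes so they can be garbage-collected
}

class VoicePool {
  private voices: VoiceLike[] = [];

  add(voice: VoiceLike): void {
    this.voices.push(voice);
  }

  get activeCount(): number {
    return this.voices.length;
  }

  // Call once per frame: dispose finished voices and keep the rest.
  update(): void {
    const alive: VoiceLike[] = [];
    for (const v of this.voices) {
      if (v.finished) v.dispose();
      else alive.push(v);
    }
    this.voices = alive;
  }
}
```

In a browser the finished flag would typically be flipped by the source node's ended event, which leaves the actual cleanup to the next update() call on the main loop.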

VoiceOptions cover the common per-sound knobs — volume, playbackRate (which also shifts pitch), detune (cents), loop, fadeIn, and a start offset. Slightly randomizing playbackRate per trigger is the cheapest way to keep repeated effects (footsteps, gunfire) from sounding mechanical.
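
A small helper makes the randomization concrete. Both function names are hypothetical, and the ±50 cent default is an arbitrary starting point rather than an engine setting. The cents-to-rate conversion itself is standard: 1200 cents is one octave, which doubles the playback rate.

```typescript
// Convert a detune in cents to a playback-rate multiplier (1200 cents = one octave).
function centsToRate(cents: number): number {
  return Math.pow(2, cents / 1200);
}

// Hypothetical helper: pick a rate within ±spreadCents of `baseRate`.
// `rand` is injectable so the result can be made deterministic in tests.
function jitteredRate(
  baseRate = 1,
  spreadCents = 50,
  rand: () => number = Math.random,
): number {
  const cents = (rand() * 2 - 1) * spreadCents;   // uniform in [-spread, +spread)
  return baseRate * centsToRate(cents);
}
```

The result can be passed straight through as the playbackRate option, for example audio.play(clip, { bus: 'sfx', playbackRate: jitteredRate() }). Jittering in cents rather than in raw rate keeps the variation perceptually symmetric above and below the original pitch.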

14.4 Spatial Audio

Top-down listener with three sources at different angles; HRTF panning produces per-ear gain values driven by the listener's orientation

Passing a position (or calling playAt) inserts a PannerNode into the voice. The panner uses the HRTF model for convincing directional cues and an inverse distance rolloff so sounds fade with range:

Inverse distance rolloff curve: gain stays at 1.0 inside refDistance (5 m), then falls off as 5/d, clamping at maxDistance (50 m, gain ≈ 0.1)

const voice = audio.playAt(clip, enemy.position, {
  bus: 'sfx',
  spatial: { refDistance: 2, maxDistance: 30, rolloffFactor: 1 },
});
voice?.follow(enemy);   // keep the panner glued to a moving object each frame
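
The rolloff curve is easy to evaluate by hand. The function below uses the inverse distance formula from the Web Audio specification, refDistance / (refDistance + rolloffFactor × (d − refDistance)). The clamp at maxDistance follows the curve in the figure and is an assumption about the engine: the specification itself applies maxDistance only to the linear model. The names inverseRolloffGain and RolloffParams are hypothetical.

```typescript
interface RolloffParams {
  refDistance: number;
  maxDistance: number;
  rolloffFactor: number;
}

function inverseRolloffGain(distance: number, p: RolloffParams): number {
  // Inside refDistance the gain stays at 1; beyond maxDistance it stops falling (assumed clamp).
  const d = Math.min(Math.max(distance, p.refDistance), p.maxDistance);
  return p.refDistance / (p.refDistance + p.rolloffFactor * (d - p.refDistance));
}

// Parameters taken from the figure: refDistance 5 m, maxDistance 50 m.
const figureCurve: RolloffParams = { refDistance: 5, maxDistance: 50, rolloffFactor: 1 };
for (const d of [1, 5, 10, 25, 50, 100]) {
  console.log(d, inverseRolloffGain(d, figureCurve).toFixed(2));
}
```

With rolloffFactor set to 1 the expression reduces to refDistance / d, which is the 5/d shape shown in the figure. Larger factors make sounds die away faster with range.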

follow(target) binds the voice to anything exposing a position; update() pushes the latest position into the panner. The other half of spatial audio is the listener — the "ears" — which must track the camera:

// every frame, from your own loop:
audio.updateListener(camPos, camForward, camUp);
audio.update(dt);

For Scene/Component-driven apps the engine ships two components (src/engine/components/) that do this automatically: AudioListener (attach to the camera GameObject) and AudioSource (attach to any object to play a sound that tracks its world position):

cameraGO.addComponent(new AudioListener(audio));
const src = bee.addComponent(new AudioSource(audio, { clip: buzz, loop: true }));
src.play();
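
Underneath, syncing the listener amounts to copying the camera pose into nine parameters. The sketch below is written against a structural type shaped like the Web Audio AudioListener, which exposes positionX/Y/Z, forwardX/Y/Z, and upX/Y/Z as AudioParams. The helper name syncListener and the Vec3 tuple type are assumptions; the engine's own vector type and update path may differ.

```typescript
type Vec3 = readonly [number, number, number];

interface ParamLike { value: number; }

// Shaped after the nine AudioParams on the Web Audio AudioListener.
interface ListenerLike {
  positionX: ParamLike; positionY: ParamLike; positionZ: ParamLike;
  forwardX: ParamLike;  forwardY: ParamLike;  forwardZ: ParamLike;
  upX: ParamLike;       upY: ParamLike;       upZ: ParamLike;
}

// Hypothetical helper: copy the camera pose into the listener.
function syncListener(listener: ListenerLike, pos: Vec3, forward: Vec3, up: Vec3): void {
  listener.positionX.value = pos[0];
  listener.positionY.value = pos[1];
  listener.positionZ.value = pos[2];
  listener.forwardX.value = forward[0];
  listener.forwardY.value = forward[1];
  listener.forwardZ.value = forward[2];
  listener.upX.value = up[0];
  listener.upY.value = up[1];
  listener.upZ.value = up[2];
}
```

The forward and up vectors should be perpendicular, as they are when taken from a camera's basis. The panner derives left and right from the pair, so a stale or skewed orientation shows up as sounds arriving from the wrong side.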

14.5 Music

Background music loops through the music bus and supports crossfades. Only one track is current at a time: starting a new one replaces the old, and when a fade duration is given the two overlap briefly while the old track fades out and the new one fades in:

await audio.playMusic('/music/exploration.ogg', { fade: 1.5, volume: 0.8 });
// transition to combat:
await audio.playMusic('/music/combat.ogg', { fade: 2 });   // crossfades
audio.fadeOutMusic(2);                                      // back to silence
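
The chapter does not say which fade curve the engine uses. As an illustration only, the sketch below computes an equal-power crossfade, a common choice because the combined loudness stays roughly constant through the transition; a plain linear ramp is the simpler alternative. The function name crossfadeGains is hypothetical.

```typescript
// Equal-power crossfade: at progress t in [0, 1], the outgoing track gets cos(t·π/2)
// and the incoming track gets sin(t·π/2), so the summed power stays constant.
function crossfadeGains(
  elapsed: number,
  duration: number,
): { outgoing: number; incoming: number } {
  // A zero or negative duration means an immediate cut to the new track.
  const t = duration <= 0 ? 1 : Math.min(Math.max(elapsed / duration, 0), 1);
  return {
    outgoing: Math.cos((t * Math.PI) / 2),
    incoming: Math.sin((t * Math.PI) / 2),
  };
}
```

At the midpoint both tracks sit at about 0.707 rather than 0.5, which avoids the audible dip that a linear crossfade produces between two uncorrelated tracks.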

14.6 Effects, Microphone, and Analysis

Effects are self-contained sub-graphs (an input/output pair) inserted into a bus's chain between its gain and its parent. The engine ships reverb (a convolver with a generated impulse response), biquad filters, delay/echo, distortion, and a compressor/limiter:

import { createReverb, createFilter } from '../src/audio/index.js';

const sfx = audio.bus('sfx');
const verb = sfx.addEffect(createReverb(audio.context, { seconds: 2.5, wet: 0.4 }));
sfx.removeEffect(verb);   // disposes it and rewires the chain
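
The rewiring step can be sketched independently of any particular effect. NodeLike is shaped after AudioNode's connect() and disconnect(), and EffectLike after the input/output pair described above; the helper name rewireChain is hypothetical and the engine's actual bookkeeping is not shown in this chapter.

```typescript
interface NodeLike {
  connect(destination: NodeLike): unknown;
  disconnect(): void;
}
interface EffectLike {
  input: NodeLike;
  output: NodeLike;
}

// Hypothetical helper: rebuild gain -> fx[0] -> fx[1] -> ... -> parent from scratch.
function rewireChain(gain: NodeLike, effects: readonly EffectLike[], parent: NodeLike): void {
  // Drop the old outgoing links of every node that stays in the chain.
  gain.disconnect();
  for (const fx of effects) fx.output.disconnect();

  // Connect each stage to the next, ending at the parent bus.
  let tail = gain;
  for (const fx of effects) {
    tail.connect(fx.input);
    tail = fx.output;
  }
  tail.connect(parent);
}
```

Rebuilding the whole chain on every add or remove is simpler than patching individual links, and chains are short enough that the cost is negligible. An effect that has been removed is no longer in the list, so the caller still has to disconnect its output before disposing it.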

Microphone capture wraps getUserMedia. The raw input is not routed to the speakers by default (to avoid feedback) — connect it where you want it, and read its level/spectrum from the built-in analyser:

const mic = await audio.requestMicrophone();
mic.connect(audio.bus('voice').input);   // monitor through the voice bus
const level = mic.getLevel();            // 0..1 RMS
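
An RMS level is computed from a block of time-domain samples, such as the Float32Array that AnalyserNode.getFloatTimeDomainData() fills. The function below is a generic sketch of that calculation; the name rmsLevel is hypothetical, and getLevel() may apply extra smoothing or scaling that is not shown here.

```typescript
// Root-mean-square of time-domain samples in [-1, 1]; returns 0 for an empty buffer.
function rmsLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}
```

RMS tracks perceived loudness better than the peak sample does, which is why it is the usual choice for level meters and voice-activity thresholds.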

An AudioAnalyser is a transparent tap for visualization. Attach one to any bus (or the master) and read the waveform, spectrum, or RMS level each frame — reuse the output arrays to avoid per-frame allocation:

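const a = audio.master.analyser({ fftSize: 2048 });
const wave = new Float32Array(a.node.fftSize);   // allocated once, reused every frame
function draw() { a.getWaveform(wave); /* … render … */ requestAnimationFrame(draw); }
requestAnimationFrame(draw);                     // start the per-frame loop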
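
When drawing the spectrum it helps to know which frequency each bin represents. An FFT of size N over audio sampled at sampleRate yields N/2 bins spaced sampleRate/N apart. The two helpers below are hypothetical conveniences built on that relationship and are not part of the engine API described here.

```typescript
// Centre frequency in Hz of spectrum bin `index`.
function binFrequency(index: number, sampleRate: number, fftSize: number): number {
  return (index * sampleRate) / fftSize;
}

// Inverse lookup: the bin nearest to `hz`, clamped to the valid range [0, fftSize/2 - 1].
function frequencyToBin(hz: number, sampleRate: number, fftSize: number): number {
  const bin = Math.round((hz * fftSize) / sampleRate);
  return Math.min(Math.max(bin, 0), fftSize / 2 - 1);
}
```

At a 48 kHz sample rate with fftSize 2048 the bins are about 23.4 Hz apart. That spacing is linear, so a visualizer that wants evenly spaced octaves has to group bins logarithmically.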

The audio_test sample exercises all of this — triggers, an orbiting spatial emitter, music, the effects rack, the microphone, and a live visualizer — with no renderer at all.

14.7 Summary

The audio subsystem is a thin, game-oriented layer over the Web Audio API:

  • AudioEngine: standalone owner of the AudioContext; lazy creation, gesture-based unlock, per-frame update().
  • Buses: master / sfx / music / ui / voice mixing groups with cascading volume, mute, fades, and effect chains.
  • Clips & loading: decode-once, URL-cached, in-flight deduped.
  • Voices (SoundHandle): one-shot or looping; stop/fade/setVolume/setPlaybackRate/setPosition/follow.
  • Spatial audio: HRTF PannerNode voices, inverse distance rolloff, listener sync; AudioSource/AudioListener components.
  • Music: looping, single-track, crossfaded.
  • Effects / mic / analysis: insertable reverb, filter, delay, distortion, compressor; microphone capture; analyser taps.

Further reading:

  • src/audio/audio_engine.ts — the standalone AudioEngine: buses, loading, playback, music, mic, analysis
  • src/audio/audio_bus.ts, src/audio/audio_effects.ts — mixing groups and the insertable effects chain
  • src/audio/sound_handle.ts — a live voice (stop/fade/setPosition/follow)
  • src/engine/components/audio_source.ts, audio_listener.ts — Scene components over the engine
  • samples/audio_test.ts — end-to-end demo of every capability
  • Chapter 22 — how Crafty uses this API for footsteps, digging, and ambient music