Chapter 14: Audio
Audio is a first-class engine subsystem living in src/audio/. It wraps the Web Audio API behind a small, game-oriented API — buses, triggered sound effects, spatial positioning, background music, an insertable effects chain, microphone capture, and analysis for visualization. The central object is AudioEngine: it is standalone (not tied to the renderer), so the game (crafty/) and the standalone samples drive it the same way. This chapter covers the engine API; Chapter 22 shows how Crafty wires it into footsteps, digging, and ambient music.
14.1 The AudioEngine and the Web Audio Graph
The Web Audio API exposes an AudioContext — a graph of AudioNodes that runs on a dedicated audio thread. Sources (decoded buffers, oscillators, the microphone) connect through processing nodes (gain, panner, filters) to the destination (the speakers). AudioEngine owns one context and builds the standard plumbing on top of it:
import { AudioEngine } from '../src/audio/index.js';
const audio = new AudioEngine();
audio.resumeOnGesture(); // unlock on first click / key / touch
const clip = await audio.load('/sfx/hit.ogg');
audio.play(clip, { bus: 'sfx' }); // one-shot
Browsers block audio until a user gesture, so the context starts suspended. resumeOnGesture() attaches one-shot listeners that resume it on the first interaction; resume() does it explicitly from inside an existing handler. The context is created lazily the first time it is needed, so constructing an AudioEngine at startup is cheap.
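The unlock pattern described above can be sketched in a few lines. This is a minimal illustration, not the engine's actual resumeOnGesture(): the helper name unlockOnGesture, the two small interfaces, and the choice of event types are assumptions made so the sketch runs without a browser.

```typescript
// Minimal shapes so the sketch runs without a browser; in a page these
// would be a real AudioContext and the window object.
interface ResumableContext {
  state: string;
  resume(): Promise<void>;
}
interface GestureTarget {
  addEventListener(type: string, listener: () => void): void;
  removeEventListener(type: string, listener: () => void): void;
}

// Assumed set of "first interaction" events (click / key / touch).
const GESTURE_EVENTS = ['pointerdown', 'keydown', 'touchstart'];

// Hypothetical helper, not the engine's resumeOnGesture().
function unlockOnGesture(
  getContext: () => ResumableContext, // called lazily, on the first gesture
  target: GestureTarget,
): void {
  const onGesture = (): void => {
    // Detach every listener first so the unlock runs at most once.
    for (const type of GESTURE_EVENTS) target.removeEventListener(type, onGesture);
    const ctx = getContext();
    if (ctx.state === 'suspended') void ctx.resume();
  };
  for (const type of GESTURE_EVENTS) target.addEventListener(type, onGesture);
}
```

Passing a factory rather than a context keeps creation lazy: nothing audio-related is constructed until the user actually interacts with the page.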
Clips are loaded and decoded once and cached by URL — AudioLoader dedupes concurrent loads of the same file so a preloading screen and a gameplay trigger share a single fetch:
const clip: AudioClip = await audio.load(url); // cached + deduped
console.log(clip.duration, clip.channels, clip.sampleRate);
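One common way to get both caching and in-flight deduplication is to cache the promise rather than the decoded result. The sketch below illustrates that idea; DedupingLoader and its fetchAndDecode callback are hypothetical names, and the real AudioLoader may be structured differently.

```typescript
// Hypothetical cache, not the engine's AudioLoader: one promise per URL.
class DedupingLoader<T> {
  private readonly cache = new Map<string, Promise<T>>();

  constructor(private readonly fetchAndDecode: (url: string) => Promise<T>) {}

  load(url: string): Promise<T> {
    const cached = this.cache.get(url);
    if (cached) return cached; // already resolved, or still in flight

    const pending = this.fetchAndDecode(url).catch((err: unknown) => {
      this.cache.delete(url); // let a later call retry after a failure
      throw err;
    });
    this.cache.set(url, pending);
    return pending;
  }
}
```

Because the promise is stored before it settles, a second load() of the same URL made while the first is still fetching receives the same promise and triggers no extra request.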
14.2 Buses and Volume
Every sound routes through a named AudioBus — a mixing group with its own gain, mute, and optional effect chain. A fresh engine has master plus sfx, music, ui, and voice, all routed into master, which feeds the destination. Because the buses form a tree of gain nodes, volumes cascade: the effective level of an SFX voice is sfx.volume × master.volume.
audio.masterVolume = 0.5; // shorthand for audio.master.volume
audio.bus('music').volume = 0.4;
audio.bus('sfx').fade(0.0, 0.5); // ramp the SFX bus to silence over 0.5 s
audio.muted = true; // silence everything; volumes are preserved
Muting a bus sets its gain to zero but keeps the logical volume, so unmuting restores the previous mix. This bus model is what the UI volume sliders and the global mute toggle drive (see Chapter 15).
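The cascade and the mute behavior can be modeled without any audio nodes. The following is a sketch under the assumption that each bus has a logical volume, a mute flag, and an optional parent; BusModel, nodeGain, and effectiveGain are illustrative names, not part of the engine API.

```typescript
// Hypothetical model of a bus: logical volume, mute flag, optional parent.
interface BusModel {
  volume: number;
  muted: boolean;
  parent?: BusModel;
}

// Gain applied at this bus's own node: zero while muted, volume otherwise.
// The logical volume itself is never overwritten by muting.
function nodeGain(bus: BusModel): number {
  return bus.muted ? 0 : bus.volume;
}

// Effective level of anything routed into `bus`: the product of the
// node gains from this bus up to the root.
function effectiveGain(bus: BusModel): number {
  let gain = 1;
  for (let b: BusModel | undefined = bus; b !== undefined; b = b.parent) {
    gain *= nodeGain(b);
  }
  return gain;
}

const master: BusModel = { volume: 0.5, muted: false };
const sfx: BusModel = { volume: 0.8, muted: false, parent: master };
console.log(effectiveGain(sfx)); // sfx.volume × master.volume
```

Keeping the mute flag separate from the volume is what lets an unmute restore the previous mix exactly.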
14.3 Triggered Sound Effects
play() (flat) and playAt() (spatial) both spin up a short-lived voice — a BufferSource → Gain → [Panner] → bus chain — and return a SoundHandle. The handle is the control surface for that one sound: stop it, fade it, repitch it, or move it. Finished voices are pruned automatically by update(), called once per frame.
const shot = audio.play(clip, { bus: 'sfx', volume: 0.9, playbackRate: 1.2 });
shot?.fadeOut(0.3);
// looping voice you keep a handle to:
const engineHum = audio.play(loopClip, { bus: 'sfx', loop: true });
// later: engineHum?.stop();
VoiceOptions cover the common per-sound knobs — volume, playbackRate (which also shifts pitch), detune (cents), loop, fadeIn, and a start offset. Slightly randomizing playbackRate per trigger is the cheapest way to keep repeated effects (footsteps, gunfire) from sounding mechanical.
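A small helper makes the randomization concrete. In Web Audio, a detune of 1200 cents corresponds to one octave, so cents convert to a rate multiplier as 2^(cents / 1200). The helper names centsToRate and jitteredRate, and the idea of expressing the spread in cents, are assumptions for illustration only.

```typescript
// Convert a detune in cents to the equivalent playback-rate multiplier:
// 1200 cents is one octave, i.e. a doubling of the rate.
function centsToRate(cents: number): number {
  return Math.pow(2, cents / 1200);
}

// Hypothetical helper: pick a rate within ±spreadCents of baseRate.
// `rng` returns a number in [0, 1); it is injectable for deterministic tests.
function jitteredRate(
  baseRate: number,
  spreadCents: number,
  rng: () => number = Math.random,
): number {
  const cents = (rng() * 2 - 1) * spreadCents;
  return baseRate * centsToRate(cents);
}
```

The result would be passed as the playbackRate option on each trigger. A spread of a few tens of cents is usually enough to break up repetition without making the pitch change obvious.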
14.4 Spatial Audio
Passing a position (or calling playAt) inserts a PannerNode into the voice. The panner uses the HRTF model for convincing directional cues and an inverse distance rolloff so sounds fade with range:
const voice = audio.playAt(clip, enemy.position, {
  bus: 'sfx',
  spatial: { refDistance: 2, maxDistance: 30, rolloffFactor: 1 },
});
voice?.follow(enemy); // keep the panner glued to a moving object each frame
follow(target) binds the voice to anything exposing a position; update() pushes the latest position into the panner. The other half of spatial audio is the listener — the "ears" — which must track the camera:
// every frame, from your own loop:
audio.updateListener(camPos, camForward, camUp);
audio.update(dt);
For Scene/Component-driven apps the engine ships two components (src/engine/components/) that do this automatically: AudioListener (attach to the camera GameObject) and AudioSource (attach to any object to play a sound that tracks its world position):
cameraGO.addComponent(new AudioListener(audio));
const src = bee.addComponent(new AudioSource(audio, { clip: buzz, loop: true }));
src.play();
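To get a feel for the spatial parameters, it helps to compute the attenuation by hand. The function below reproduces the gain formula the Web Audio specification defines for the "inverse" distance model; the function name is illustrative, and the sketch covers only distance attenuation, not HRTF panning.

```typescript
// Gain of the Web Audio "inverse" distance model for a source at `distance`.
// Inside refDistance the gain stays at 1.
function inverseDistanceGain(
  distance: number,
  refDistance: number,
  rolloffFactor: number,
): number {
  const d = Math.max(distance, refDistance);
  return refDistance / (refDistance + rolloffFactor * (d - refDistance));
}

// Attenuation at a few distances for refDistance 2, rolloffFactor 1.
for (const d of [1, 2, 4, 10, 30]) {
  console.log(d, inverseDistanceGain(d, 2, 1).toFixed(3));
}
```

With rolloffFactor set to 1 the gain reduces to refDistance / distance, so it halves each time the distance doubles. Note that in the specification's formulas maxDistance clamps only the linear model; how the engine applies the maxDistance option with inverse rolloff is not covered by this sketch.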
14.5 Music
Background music loops through the music bus and supports crossfades. Only one track plays at a time; starting a new one replaces (and optionally fades) the old:
await audio.playMusic('/music/exploration.ogg', { fade: 1.5, volume: 0.8 });
// transition to combat:
await audio.playMusic('/music/combat.ogg', { fade: 2 }); // crossfades
audio.fadeOutMusic(2); // back to silence
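A crossfade is two gain ramps running in opposite directions over the same interval. The sketch below assumes a linear curve and an outgoing gain expressed relative to the old track's own volume; the chapter does not state which curve the engine uses, and crossfadeGains is a hypothetical name.

```typescript
interface CrossfadeGains {
  outgoing: number; // gain multiplier for the track being replaced
  incoming: number; // gain for the new track
}

// Linear crossfade: `elapsed` seconds into a fade lasting `duration` seconds.
function crossfadeGains(
  elapsed: number,
  duration: number,
  targetVolume = 1,
): CrossfadeGains {
  // A zero-length fade is a hard cut to the new track.
  const t = duration <= 0 ? 1 : Math.min(Math.max(elapsed / duration, 0), 1);
  return { outgoing: 1 - t, incoming: t * targetVolume };
}
```

A linear curve can produce a slight dip in perceived loudness at the midpoint; an equal-power curve (cosine for the outgoing track, sine for the incoming one) is a common alternative when that dip is audible.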
14.6 Effects, Microphone, and Analysis
Effects are self-contained sub-graphs (an input/output pair) inserted into a bus's chain between its gain and its parent. The engine ships reverb (a convolver with a generated impulse response), biquad filters, delay/echo, distortion, and a compressor/limiter:
import { createReverb, createFilter } from '../src/audio/index.js';
const sfx = audio.bus('sfx');
const verb = sfx.addEffect(createReverb(audio.context, { seconds: 2.5, wet: 0.4 }));
sfx.removeEffect(verb); // disposes it and rewires the chain
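A generated impulse response is commonly built from noise shaped by a decaying envelope. The following sketch shows that general technique only; it is not the engine's generator, and makeImpulseResponse, its decay parameter, and the exponential envelope are assumptions.

```typescript
// Hypothetical generator: white noise shaped by an exponential decay envelope.
function makeImpulseResponse(
  sampleRate: number,
  seconds: number,
  decay = 6, // larger values make the tail die away faster
  rng: () => number = Math.random, // injectable for deterministic tests
): Float32Array {
  const length = Math.max(1, Math.floor(sampleRate * seconds));
  const ir = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    const envelope = Math.exp((-decay * i) / length); // 1 at the start, e^-decay at the end
    ir[i] = (rng() * 2 - 1) * envelope; // noise in [-1, 1) scaled by the envelope
  }
  return ir;
}
```

In a browser, the samples would be copied into the channels of an AudioBuffer, which is then assigned to a ConvolverNode's buffer property; using a separately generated array per channel gives a wider stereo tail.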
Microphone capture wraps getUserMedia. The raw input is not routed to the speakers by default (to avoid feedback) — connect it where you want it, and read its level/spectrum from the built-in analyser:
const mic = await audio.requestMicrophone();
mic.connect(audio.bus('voice').input); // monitor through the voice bus
const level = mic.getLevel(); // 0..1 RMS
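For reference, an RMS level is the square root of the mean of the squared samples. The sketch below computes it over a block of time-domain samples such as an AnalyserNode fills via getFloatTimeDomainData(); the function name rmsLevel is illustrative, and whether getLevel() applies extra smoothing or scaling is not stated here.

```typescript
// Root-mean-square level of a block of samples in the range [-1, 1].
function rmsLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumOfSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumOfSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumOfSquares / samples.length);
}
```

RMS tracks perceived loudness better than the peak value: a full-scale sine wave has a peak of 1 but an RMS of about 0.707, and only a full-scale square wave reaches 1.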
An AudioAnalyser is a transparent tap for visualization. Attach one to any bus (or the master) and read the waveform, spectrum, or RMS level each frame — reuse the output arrays to avoid per-frame allocation:
const a = audio.master.analyser({ fftSize: 2048 });
const wave = new Float32Array(a.node.fftSize);
function draw() { a.getWaveform(wave); /* … render … */ }
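When drawing a spectrum, each bin index has to be mapped to a frequency. An analyser with a given fftSize reports fftSize / 2 bins spanning 0 Hz up to half the sample rate, so the mapping is a simple proportion. The helper names below are illustrative and not part of AudioAnalyser.

```typescript
// Frequency in Hz at the start of spectrum bin `index` for a given FFT size.
function binToHz(index: number, sampleRate: number, fftSize: number): number {
  return (index * sampleRate) / fftSize;
}

// Nearest bin for a frequency, clamped to the fftSize / 2 bins an analyser reports.
function hzToBin(hz: number, sampleRate: number, fftSize: number): number {
  const bin = Math.round((hz * fftSize) / sampleRate);
  return Math.min(Math.max(bin, 0), fftSize / 2 - 1);
}
```

At a 48 kHz sample rate with fftSize 2048, each bin is 48000 / 2048 Hz wide, which is fine for treble but coarse for bass; a larger fftSize trades time resolution for frequency resolution.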
The audio_test sample exercises all of this — triggers, an orbiting spatial emitter, music, the effects rack, the microphone, and a live visualizer — with no renderer at all.
14.7 Summary
The audio subsystem is a thin, game-oriented layer over the Web Audio API:
- AudioEngine: standalone owner of the AudioContext; lazy creation, gesture-based unlock, per-frame update().
- Buses: master / sfx / music / ui / voice mixing groups with cascading volume, mute, fades, and effect chains.
- Clips & loading: decode-once, URL-cached, in-flight deduped.
- Voices (SoundHandle): one-shot or looping; stop / fade / setVolume / setPlaybackRate / setPosition / follow.
- Spatial audio: HRTF PannerNode voices, inverse distance rolloff, listener sync; AudioSource / AudioListener components.
- Music: looping, single-track, crossfaded.
- Effects / mic / analysis: insertable reverb, filter, delay, distortion, compressor; microphone capture; analyser taps.
Further reading:
- src/audio/audio_engine.ts: the standalone AudioEngine (buses, loading, playback, music, mic, analysis)
- src/audio/audio_bus.ts, src/audio/audio_effects.ts: mixing groups and the insertable effects chain
- src/audio/sound_handle.ts: a live voice (stop / fade / setPosition / follow)
- src/engine/components/audio_source.ts, audio_listener.ts: Scene components over the engine
- samples/audio_test.ts: end-to-end demo of every capability
- Chapter 22: how Crafty uses this API for footsteps, digging, and ambient music