Taos Engine ▦ Taos: Building a Modern WebGPU Game Engine

Volumetric Fluid — Technical Deep Dive

The Volumetric Fluid sample is a GPU grid-based (Eulerian) fluid simulator that renders fire, smoke and water as a participating medium. A 3D "stable fluids" solver advances velocity / density / temperature fields entirely in compute shaders, and a fullscreen ray-march composites the density field as volumetric emission and absorption over a procedural sky and ground. It exercises the engine's render graph (compute and render passes), the RenderContext device / timing plumbing, the Camera + CameraController fly camera, and the shared BloomPass / TonemapPass post-processing passes.


1. Overview

You fly a free camera (WASD + mouse, no pointer lock) around a single simulation box that sits on a checkerboard ground plane. A control panel on the right switches between four templates and three grid resolutions, and a slider bar at the bottom tunes three live multipliers.

The four templates (all the same solver, re-tuned):

| Key | Template | Render mode | Emitter | What you see |
|---|---|---|---|---|
| 1 | Campfire | fire | plume | Continuous hot plume, black-body emission, flicker |
| 2 | Fireball | fire | burst | Spherical bursts re-fired every 3.2 s |
| 3 | Smoke | smoke | plume | Cool buoyant plume, sun-lit and self-shadowed |
| 4 | Splashing Water | water | drip | Drops fall under gravity into a churning pool |

Controls:

  • WASD + mouse — fly the camera (CameraController, pointerLock: false).
  • 1–4 — select template; R — reset the current template; T — trigger (re-fire a burst/drip immediately).
  • G — toggle the render-graph visualization overlay.
  • Panel buttons mirror the keys, plus Low / Medium / High resolution buttons.
  • Sliders: Speed (0–2x), Opacity (0.3–2x), Detail (0–2x).

All panel state (template, resolution, three sliders) is persisted to localStorage under the key crafty.fluid.settings and restored on reload.
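A minimal sketch of how this kind of persistence could look. Only the storage key crafty.fluid.settings comes from the sample; the FluidSettings field names, the default values, and the loadSettings / saveSettings helpers are hypothetical and are not taken from fluid_test.ts.

```typescript
// Hypothetical shape of the persisted panel state; field names are assumptions.
interface FluidSettings {
  template: string;   // e.g. "campfire"
  resolution: number; // index into Low / Medium / High
  speed: number;      // Speed slider multiplier
  opacity: number;    // Opacity slider multiplier
  detail: number;     // Detail slider multiplier
}

// Minimal subset of the Web Storage API, so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const SETTINGS_KEY = "crafty.fluid.settings"; // key named in the text

// Assumed defaults, used when nothing has been stored yet.
const DEFAULT_SETTINGS: FluidSettings = {
  template: "campfire",
  resolution: 1,
  speed: 1,
  opacity: 1,
  detail: 1,
};

function loadSettings(store: KeyValueStore): FluidSettings {
  try {
    const raw = store.getItem(SETTINGS_KEY);
    if (raw === null) return { ...DEFAULT_SETTINGS };
    // Merge over defaults so a partial or older record still yields every field.
    return { ...DEFAULT_SETTINGS, ...(JSON.parse(raw) as Partial<FluidSettings>) };
  } catch {
    return { ...DEFAULT_SETTINGS }; // corrupt JSON or blocked storage
  }
}

function saveSettings(store: KeyValueStore, settings: FluidSettings): void {
  store.setItem(SETTINGS_KEY, JSON.stringify(settings));
}
```

In a browser, window.localStorage satisfies KeyValueStore structurally and can be passed in directly. Merging over defaults keeps a reload working even if a later build adds a new slider.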


2. Architecture

File map for the sample (samples/fluid_test.html, samples/fluid_test.ts, and everything under samples/fluid/):

| File | Responsibility |
|---|---|
| fluid_test.html | Page shell: canvas, info HUD, the (empty) #panel populated by JS, the slider bar, and the source_viewer.ts include. |
| fluid_test.ts | App entry. Owns camera, UI, persisted settings, per-frame parameter packing, and wires the compute + render-graph passes each frame. |
| fluid/presets.ts | The four FluidPreset configs, the fixed GRID resolution, and the world-space box (BOX_MIN / BOX_SIZE). |
| fluid/fluid_sim.ts | FluidSim class: owns the field 3D textures and six compute pipelines, records the solve into its own command buffer. |
| fluid/fluid_sim.wgsl | The six solver compute kernels (advect, vorticity, forces, divergence, pressure, project). |
| fluid/fluid_render_pass.ts | FluidRenderPass: a render-graph Pass that ray-marches the density field into an HDR target. |
| fluid/fluid_render.wgsl | Vertex (fullscreen triangle) + fragment (volumetric ray-march) shader. |

The compute simulation (FluidSim) is deliberately not a render-graph pass. It submits its own command buffer first; the render graph then consumes the density texture FluidSim produced. Only the ray-march, bloom and tonemap go through the graph.


3. The simulation

3.1 The grid and its fields

The reference grid is GRID = [64, 96, 64] cells (presets.ts) — tall in Y to give plumes vertical room. It maps to a world-space box BOX_MIN = [-6, 0, -6], BOX_SIZE = [12, 18, 12], sitting on the ground at y = 0.

FluidSim (fluid/fluid_sim.ts) allocates these 3D textures, each dimension: '3d' with TEXTURE_BINDING | STORAGE_BINDING usage:

| Field | Format | Count | Holds |
|---|---|---|---|
| _vel | rgba16float | 2 (ping-pong) | velocity xyz |
| _den | rgba16float | 2 (ping-pong) | r = density, g = temperature |
| _prs | r32float | 2 (ping-pong) | pressure |
| _div | r32float | 1 | velocity divergence |
| _vort | rgba16float | 1 | curl (vorticity) of velocity |

Density and temperature share one rgba16float texture (.x / .y), so advection moves both in a single trilinear sample. Pressure and divergence use r32float — scalar fields needing the precision for the Jacobi relaxation.

The host owns the ping-pong: _vi and _di index the slot holding the current velocity / density state. Each step that produces a new field reads slot i and writes slot i ^ 1, then flips the index. _prs ping-pongs internally inside the pressure loop. _div and _vort are single-buffered (written then immediately consumed within the same pass).
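The host-side ping-pong described above can be sketched as a two-slot holder with a flipping index. The PingPong class and its member names are hypothetical; FluidSim is only described as keeping plain indices (_vi, _di), so treat this as an illustration of the pattern and not the sample's code.

```typescript
// Hypothetical helper: two slots plus an index naming the "current" one.
class PingPong<T> {
  private readonly slots: [T, T];
  private index = 0;

  constructor(a: T, b: T) {
    this.slots = [a, b];
  }

  // Slot i: the current state, read by the next kernel.
  get read(): T {
    return this.slots[this.index];
  }

  // Slot i ^ 1: the target the next kernel writes into.
  get write(): T {
    return this.slots[this.index ^ 1];
  }

  // Call after a kernel has filled `write`; it becomes the new current state.
  flip(): void {
    this.index ^= 1;
  }
}
```

A step that produces a new velocity field would bind read as its input texture and write as its storage target, then call flip() once the dispatch has been recorded.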

3.2 The solver pipeline

fluid_sim.wgsl is a single module with six @compute entry points, all @workgroup_size(4, 4, 4) (the constant WG = 4 in fluid_sim.ts; dispatch is ceilDiv(grid, 4) per axis). The order each step() records is:

advect  →  vorticity  →  forces/emit  →  divergence  →  pressure (×N)  →  project

This is a classic Stam "stable fluids" loop. Each kernel is wired by its own explicit GPUBindGroupLayout (LayoutSet), so the WGSL binding list (@binding(0) through @binding(11)) is a superset: any one entry point only touches the subset its layout declares.
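For reference, the dispatch arithmetic mentioned above (ceilDiv(grid, 4) per axis) can be sketched as follows. WG = 4 and the ceilDiv name come from the text; dispatchSize is a hypothetical wrapper added for illustration.

```typescript
const WG = 4; // matches @workgroup_size(4, 4, 4)

// Integer ceiling division: smallest count of groups of size d that covers n.
function ceilDiv(n: number, d: number): number {
  return Math.floor((n + d - 1) / d);
}

// Workgroup counts per axis for a given grid (hypothetical helper).
function dispatchSize(grid: readonly [number, number, number]): [number, number, number] {
  return [ceilDiv(grid[0], WG), ceilDiv(grid[1], WG), ceilDiv(grid[2], WG)];
}
```

For the reference grid [64, 96, 64] this gives [16, 24, 16] workgroups. All three shipped resolutions are multiples of 4 on every axis, so no invocations fall outside the grid; a grid that is not a multiple of 4 would need a bounds check at the top of each kernel.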

1. Advection — cs_advect. Semi-Lagrangian: for cell c, trace backward back = pos - velocity * dt and trilinearly resample both velocity and density at that point. Unconditionally stable for any dt.

let back = pos - loadVel(c) * dt;
let newVel = sampleVelField(back);
let newDen = sampleDenField(back);   // r = density, g = temperature

sampleVelField / sampleDenField convert cell-center coordinates to texture UVW with (p + 0.5) / grid and use a linear-filtering, clamp-to-edge sampler.
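To make the sampling convention concrete, here is a CPU reference sketch of the same backtrace-and-resample step for a scalar field. With UVW = (p + 0.5) / grid and a linear sampler, an integer p lands exactly on a cell center, so the hardware filter is equivalent to interpolating between floor(p) and floor(p) + 1 with clamped indices. The Field layout and the function names are assumptions for illustration; the real kernels sample 3D textures on the GPU.

```typescript
type Vec3 = [number, number, number];

// Scalar field stored x-fastest: index = x + nx * (y + ny * z).
interface Field {
  size: Vec3;
  data: Float32Array;
}

const clampInt = (v: number, lo: number, hi: number): number =>
  Math.min(Math.max(v, lo), hi);

// Integer load with clamp-to-edge addressing.
function load(f: Field, x: number, y: number, z: number): number {
  const [nx, ny, nz] = f.size;
  const cx = clampInt(x, 0, nx - 1);
  const cy = clampInt(y, 0, ny - 1);
  const cz = clampInt(z, 0, nz - 1);
  return f.data[cx + nx * (cy + ny * cz)];
}

const lerp = (a: number, b: number, t: number): number => a + (b - a) * t;

// Trilinear sample at cell-space point p; integer p is a cell center.
function sampleTrilinear(f: Field, p: Vec3): number {
  const x0 = Math.floor(p[0]), y0 = Math.floor(p[1]), z0 = Math.floor(p[2]);
  const fx = p[0] - x0, fy = p[1] - y0, fz = p[2] - z0;
  const row = (y: number, z: number): number =>
    lerp(load(f, x0, y, z), load(f, x0 + 1, y, z), fx);
  const slice = (z: number): number => lerp(row(y0, z), row(y0 + 1, z), fy);
  return lerp(slice(z0), slice(z0 + 1), fz);
}

// Semi-Lagrangian update for one cell: trace back along the velocity, resample.
function advectCell(f: Field, cell: Vec3, vel: Vec3, dt: number): number {
  const back: Vec3 = [
    cell[0] - vel[0] * dt,
    cell[1] - vel[1] * dt,
    cell[2] - vel[2] * dt,
  ];
  return sampleTrilinear(f, back);
}
```

Because the result is always a weighted average of existing cell values, it can never exceed the field's current range, which is where the unconditional stability comes from. The same averaging is also the source of the numerical smearing that vorticity confinement later compensates for.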

2. Vorticity — cs_vorticity. Computes the curl of the (advected) velocity field via central differences of the six neighbors and stores it per cell into _vort. This is precomputed so the forces pass can apply vorticity confinement cheaply.

3. Forces & emission — cs_forces. Everything that changes the fields outside advection and projection:

  • Buoyancy: vel.y += (buoyancy * (temperature - ambientTemp) - weight * density) * dt. Hot fluid rises; smoke weight drags its mass back down a little.
  • Gravity: vel.y -= gravity * density * dt — pulls dense fluid down (the water preset's main force).
  • Vorticity confinement: if vorticityStrength > 0, takes the gradient of |vort|, normalizes it, and adds eps * cross(N, vort) * dt. This re-injects the small swirls that semi-Lagrangian advection numerically smears away.
  • Source injection: when the emitter is active, emitterFalloff(c) gives a soft weight (flat disk for shape=0 plumes, soft sphere for shape=1 bursts/drips). Density is added, temperature is max'd up to the emitter temperature, and velocity is mix'd toward the injected velocity (plus an optional outward radial component for fireballs).
  • Cooling / dissipation: temperature decays linearly (-cooling * dt, clamped at 0); density and velocity decay exponentially (density *= exp(-densityDissipation * dt), likewise velocity).
  • Top fade: a smoothstep over the top ~6 cells multiplies density down by up to 70%, so fluid softly leaves the open top of the box instead of piling up against the (closed-pressure) ceiling.
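The scalar parts of the list above (buoyancy, gravity, cooling, dissipation) can be sketched on the CPU for a single cell. The formulas are the ones quoted in the bullets; the order in which they are applied, the ForceParams / CellState types, and the applyForces name are assumptions, and vorticity confinement, source injection and the top fade are left out. Only the vertical velocity component is shown.

```typescript
// Parameter names follow the SimParams fields quoted in the text.
interface ForceParams {
  buoyancy: number;
  weight: number;
  gravity: number;
  cooling: number;
  densityDissipation: number;
  velocityDissipation: number;
  ambientTemp: number;
}

// Hypothetical per-cell state: vertical velocity, density, temperature.
interface CellState {
  velY: number;
  density: number;
  temperature: number;
}

function applyForces(c: CellState, p: ForceParams, dt: number): CellState {
  let velY = c.velY;
  // Buoyancy: hot fluid rises, smoke weight drags it back down.
  velY += (p.buoyancy * (c.temperature - p.ambientTemp) - p.weight * c.density) * dt;
  // Gravity: pulls dense fluid down (dominant for the water preset).
  velY -= p.gravity * c.density * dt;
  // Cooling: linear decay, clamped at zero.
  const temperature = Math.max(c.temperature - p.cooling * dt, 0);
  // Dissipation: exponential decay of density and velocity.
  const density = c.density * Math.exp(-p.densityDissipation * dt);
  velY *= Math.exp(-p.velocityDissipation * dt);
  return { velY, density, temperature };
}
```

With the Campfire numbers (buoyancy 22, weight 0.9, gravity 0) a hot, dense cell gains upward velocity; with the Water numbers (buoyancy 0, gravity 30) the same cell accelerates downward.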

4. Divergence — cs_divergence. Central-difference divergence of the post-force velocity into _div: 0.5 * ((r-l) + (u-d) + (f-b)).

5. Pressure — cs_pressure. One Jacobi relaxation step of the pressure Poisson equation:

prsOut = (l + r + d + u + b + f - div) / 6.0;

The host runs this pipeline _iterations times in a loop, ping-ponging _prs[0] ↔ _prs[1] via two pre-built bind groups (pingA / pingB). The iteration count is forced even (iterations + (iterations & 1)) so the final relaxed pressure always lands back in _prs[0], which the projection pass then reads. iterationsFor() in fluid_test.ts scales it with grid size, round(20 * gridX / GRID[0]), which gives 20 at Medium, 15 at Low (16 after the even rounding) and 30 at High, because a bigger grid needs more relaxation sweeps to propagate pressure across it.
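The iteration arithmetic can be checked with a few lines. iterationsFor and both expressions are taken from the text; forceEven and finalSlot are hypothetical helper names used here only to show why an even count ends in slot 0.

```typescript
const GRID: readonly [number, number, number] = [64, 96, 64]; // reference (Medium) grid

// Scale the Jacobi sweep count with grid width, as described in the text.
function iterationsFor(gridX: number): number {
  return Math.round((20 * gridX) / GRID[0]);
}

// Round up to the next even number (hypothetical name for the quoted expression).
function forceEven(iterations: number): number {
  return iterations + (iterations & 1);
}

// Slot holding the result after n sweeps, when the first sweep reads slot 0.
function finalSlot(n: number): number {
  let current = 0;
  for (let i = 0; i < n; i++) {
    current ^= 1; // each sweep reads `current` and writes the other slot
  }
  return current;
}
```

An odd count would leave the newest pressure in _prs[1] while cs_project still reads _prs[0], so the last sweep would be silently discarded; rounding up to even avoids that at the cost of at most one extra sweep.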

6. Projection — cs_project. Subtracts the pressure gradient to make the velocity field divergence-free:

var vel = loadVel(c) - 0.5 * vec3<f32>(r - l, u - d, f - b);

It then enforces boundary conditions: zero normal velocity on the four side walls (x and z extremes), and a floor that fluid cannot sink through (if (c.y == 0) { vel.y = max(vel.y, 0.0); }). The top is left open. Reads of out-of-domain cells everywhere in the shader go through clampCoord, giving Neumann-style "no flux through the wall" sampling.
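A CPU sketch of these boundary rules, under stated assumptions: the floor test mirrors the quoted WGSL line, but zeroing the normal component on the outermost layer of cells is only one plausible way to get "zero normal velocity on the side walls", and the body of clampCoord is inferred from its name. The function names on the TypeScript side are hypothetical.

```typescript
type Vec3 = [number, number, number];

// Clamp an integer cell coordinate into the grid (assumed behaviour of clampCoord).
function clampCoord(c: Vec3, grid: Vec3): Vec3 {
  const clampAxis = (v: number, n: number): number => Math.min(Math.max(v, 0), n - 1);
  return [clampAxis(c[0], grid[0]), clampAxis(c[1], grid[1]), clampAxis(c[2], grid[2])];
}

// Apply wall, floor and open-top rules to the projected velocity of cell c.
function applyBoundaries(vel: Vec3, c: Vec3, grid: Vec3): Vec3 {
  const out: Vec3 = [vel[0], vel[1], vel[2]];
  if (c[0] === 0 || c[0] === grid[0] - 1) out[0] = 0; // x walls: no flow through
  if (c[2] === 0 || c[2] === grid[2] - 1) out[2] = 0; // z walls: no flow through
  if (c[1] === 0) out[1] = Math.max(out[1], 0);       // floor: cannot sink below y = 0
  return out;                                         // top (y = grid[1] - 1) stays open
}
```

Clamping neighbor reads means a cell on the wall sees its own value as the out-of-domain neighbor, so the finite difference across the wall is zero, which is the Neumann-style behavior the text describes.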

3.3 The SimParams uniform

fluid_test.ts packs SIM_PARAM_FLOATS = 28 floats (7 × vec4, 112 bytes) each frame and FluidSim.step() uploads them. The WGSL SimParams struct maps them as:

| vec4 | Contents |
|---|---|
| grid | grid x, y, z, _ |
| timing | dt, time, vorticityStrength, _ |
| emitterPos | emitter center xyz (grid space), radius |
| emitterVel | injected velocity xyz, radial burst speed |
| emitter | rate, temperature, shape (0 = plume, 1 = sphere), active |
| forces | buoyancy, weight, gravity, cooling |
| dissipation | densityDissipation, velocityDissipation, ambientTemp, _ |
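A sketch of packing that layout into a Float32Array. The slot order follows the table above and SIM_PARAM_FLOATS = 28 comes from the text; the SimInputs interface and the packSimParams name are hypothetical, and the real packing code in fluid_test.ts may be organized differently.

```typescript
const SIM_PARAM_FLOATS = 28; // 7 x vec4 = 112 bytes

type Vec3 = [number, number, number];

// Hypothetical input record; field names mirror the table columns.
interface SimInputs {
  grid: Vec3;
  dt: number;
  time: number;
  vorticityStrength: number;
  emitterPos: Vec3;
  emitterRadius: number;
  emitterVel: Vec3;
  radialSpeed: number;
  rate: number;
  temperature: number;
  shape: 0 | 1; // 0 = plume, 1 = sphere
  active: boolean;
  buoyancy: number;
  weight: number;
  gravity: number;
  cooling: number;
  densityDissipation: number;
  velocityDissipation: number;
  ambientTemp: number;
}

function packSimParams(s: SimInputs): Float32Array {
  const out = new Float32Array(SIM_PARAM_FLOATS);
  out.set([...s.grid, 0], 0);                                     // grid
  out.set([s.dt, s.time, s.vorticityStrength, 0], 4);             // timing
  out.set([...s.emitterPos, s.emitterRadius], 8);                 // emitterPos
  out.set([...s.emitterVel, s.radialSpeed], 12);                  // emitterVel
  out.set([s.rate, s.temperature, s.shape, s.active ? 1 : 0], 16); // emitter
  out.set([s.buoyancy, s.weight, s.gravity, s.cooling], 20);      // forces
  out.set([s.densityDissipation, s.velocityDissipation, s.ambientTemp, 0], 24); // dissipation
  return out;
}
```

Keeping every group a full vec4 sidesteps WGSL's 16-byte alignment rules for vec3 members, at the cost of a few unused padding floats.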

3.4 Reset and resize

reset() destroys and re-allocates every field texture (all zeroed by WebGPU) — used on template switch and R. resize() additionally updates gridX/Y/Z, the dispatch dimensions, and the pressure iteration count, so the grid can change resolution at runtime.


4. Presets

presets.ts defines four FluidPreset objects. Each is the same solver tuned differently, plus a renderMode for the ray-marcher. Key differentiators:

| Param | Campfire | Fireball | Smoke | Water |
|---|---|---|---|---|
| renderMode | fire | fire | smoke | water |
| emitterMode | plume | burst | plume | drip |
| buoyancy | 22 | 14 | 10 | 0 |
| weight | 0.9 | 0.6 | 0.4 | 0 |
| gravity | 0 | 0 | 0 | 30 |
| cooling | 1.15 | 0.9 | 0.6 | 0 |
| densityDissipation | 0.38 | 0.5 | 0.12 | 0.32 |
| velocityDissipation | 0.22 | 0.5 | 0.2 | 0.12 |
| vorticity | 9 | 13 | 6 | 3 |
| emitterRadius | 9 | 8 | 6 | 6 |
| emitterHeight | 9 | 36 | 8 | 80 |
| emitterRate | 2.7 | 16 | 1.9 | 24 |
| emitterTemp | 1.75 | 2.0 | 0.9 | 0 |
| emitterVelY | 16 | 7 | 9 | −22 |
| emitterRadialSpeed | 0 | 27 | 0 | 0 |
| cycleSeconds | 0 | 3.2 | 0 | 0.85 |
| densityScale | 9 | 8 | 11 | 16 |
| emissiveStrength | 2.7 | 3.1 | 0 | 0 |
| tint | warm gray | dark gray | light gray | blue |

Reading the table:

  • Fire presets have high buoyancy and cooling, non-zero emissiveStrength, and the ray-marcher emits via the black-body fireRamp. Campfire is a steady mid-height plume; Fireball is a high (emitterHeight 36) spherical burst with strong outward emitterRadialSpeed and a short cycleSeconds re-fire.
  • Smoke has low dissipation (long-lived density, 0.12), no emission, a light-gray albedo, and relies on lightMarch self-shadowing for shape.
  • Water has zero buoyancy/cooling and a large gravity of 30; the emitter drips from emitterHeight 80 with a negative emitterVelY (−22, downward), and the blue tint plus high densityScale make a dense glossy medium.

GRID, BOX_MIN, BOX_SIZE are also exported here so both the sim and the renderer agree on the domain.


5. Rendering

5.1 The pass

FluidRenderPass (fluid/fluid_render_pass.ts) is a render-graph Pass<void, FluidRenderOutputs>. Its create() builds a render pipeline with a fullscreen-triangle vertex shader, a 256-byte uniform buffer, a linear sampler, and a bind-group layout of { uniform, texture_3d<float>, sampler }. The volume texture is not a graph resource — it is owned by FluidSim and bound directly in the execute callback via setVolume(sim.densityView).

addToGraph creates a transient FluidHDR texture (HDR_FORMAT = rgba16float, full canvas size), declares it as a cleared attachment, and in the execute callback binds the uniform + volume + sampler and issues enc.draw(3) — one fullscreen triangle.

5.2 The ray-march

fluid_render.wgsl's fs_main does the volumetric integration:

  1. Ray reconstruction. The fullscreen triangle's NDC is unprojected with invViewProj at depths 0 and 1; ro is the camera position, rd the normalized world-space ray.

  2. Background. skyColor(rd) is a vertical gradient with a sharp sun disc (pow(dot, 250)) and soft halo (pow(dot, 6)). If the ray points down, it intersects the y = 0 plane and shades groundColor — a checkerboard with distance fade and a Lambert term from the sun. bgDist records that hit.

  3. Box intersection. intersectBox is a slab test returning (tNear, tFar). If the ray misses, or the box is fully behind the ground, it returns just the background.

  4. March. STEP_COUNT = 64 primary steps between tEnter and tExit = min(boxFar, bgDist), step size dh. The entry point is jittered by a static hash12(frag.pos.xy) dither to hide slice banding without TAA. The loop accumulates front-to-back and early-outs when transmittance < 0.01.

  5. Per-sample shading. Density field.x becomes extinction sigma = density * densityScale. The shading branches on renderMode:

    • Fire (mode 0): radiance = fireRamp(temperature) * emissiveStrength * density + scatter * sigma, where fireRamp is a 5-stop HDR black-body ramp (values up to ~3.2, so the hot core blows out past 1.0 for the bloom pass). scatter adds a little ambient + sun-lit smoke around the flame.
    • Smoke (mode 1): a light-scattering medium — radiance = (ambient + sunColor * shadow) * tint * sigma, where shadow comes from lightMarch.
    • Water (mode 2): the same scattering term, plus when the march first crosses a dense iso-surface (density > 0.4) it shades a one-off glossy splash highlight: a gradient-of-density surface normal feeds a Blinn-style spec (pow(dot(n,h), 56)) and a Fresnel-weighted sky reflection.
  6. Self-shadowing — lightMarch. From each marched point it casts a short secondary ray of LIGHT_STEPS = 4 samples toward the sun, accumulates density, and returns exp(-sum * dh * densityScale * shadowDensity) — the light that survives to the sample. This gives smoke and water volumetric self-shadowing.

  7. Composite. accum + bg * transmittance — the integrated volume over the remaining background visibility.
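Steps 3, 4 and 7 can be sketched on the CPU as a slab test followed by a front-to-back accumulation loop. This is an illustration under assumptions: the text gives the early-out threshold (0.01), the jittered entry and the final accum + bg * transmittance composite, but not the exact per-step update, so the Beer-Lambert form used here is a common choice and not necessarily what fs_main does. The sample callback, the single-channel radiance and both function signatures are hypothetical.

```typescript
type Vec3 = [number, number, number];

// Slab test: returns [tNear, tFar], or null when the ray misses the box.
function intersectBox(ro: Vec3, rd: Vec3, bmin: Vec3, bmax: Vec3): [number, number] | null {
  let tNear = -Infinity;
  let tFar = Infinity;
  for (let i = 0; i < 3; i++) {
    if (rd[i] === 0) {
      // Ray parallel to this slab: it must start between the two planes.
      if (ro[i] < bmin[i] || ro[i] > bmax[i]) return null;
      continue;
    }
    const t0 = (bmin[i] - ro[i]) / rd[i];
    const t1 = (bmax[i] - ro[i]) / rd[i];
    tNear = Math.max(tNear, Math.min(t0, t1));
    tFar = Math.min(tFar, Math.max(t0, t1));
  }
  return tNear <= tFar && tFar >= 0 ? [tNear, tFar] : null;
}

interface MediumSample {
  sigma: number;    // extinction = density * densityScale
  radiance: number; // emitted + in-scattered light (one channel for brevity)
}

// Front-to-back emission/absorption along one ray, composited over bg.
function march(
  tEnter: number, tExit: number, steps: number, jitter: number,
  bg: number, sample: (t: number) => MediumSample,
): number {
  const dh = (tExit - tEnter) / steps;
  let accum = 0;
  let transmittance = 1;
  for (let i = 0; i < steps; i++) {
    const s = sample(tEnter + (i + jitter) * dh); // jitter in [0, 1) offsets the slices
    accum += transmittance * s.radiance * dh;     // light added at this step, attenuated
    transmittance *= Math.exp(-s.sigma * dh);     // Beer-Lambert absorption (assumed form)
    if (transmittance < 0.01) break;              // early-out once nearly opaque
  }
  return accum + bg * transmittance;
}
```

For a ray from [0, 9, -20] along +z through the sample's box (BOX_MIN = [-6, 0, -6], BOX_SIZE = [12, 18, 12]), intersectBox returns [14, 26], and marching an empty medium returns the background unchanged.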

5.3 The RenderParams uniform

fluid_test.ts packs RENDER_PARAM_FLOATS = 44 floats (a mat4 + 7 × vec4, 176 bytes; the buffer itself is 256 bytes). The mat4 is the camera's inverse view-projection; the trailing vec4s carry camPos/time, boxMin/renderMode, boxSize/STEP_COUNT, sunDir/densityScale, sunColor/LIGHT_STEPS, tint/emissiveStrength, and misc (SHADOW_DENSITY, AMBIENT). Fixed scene constants in fluid_test.ts: SUN_DIR = [0.413, 0.731, 0.543], SUN_COLOR = [1.15, 1.05, 0.9], AMBIENT = 0.09, SHADOW_DENSITY = 1.0.

5.4 Post-processing

After FluidRenderPass produces the HDR target, the shared BloomPass blooms it (threshold 1.1, knee 0.6, strength 0.55 for fire vs 0.18 otherwise — so fire glows much harder), and TonemapPass applies exposure 1.1 with the ACES curve to the canvas backbuffer.


6. Editing / interaction

The control panel is built procedurally in fluid_test.ts into the #panel div: a Template section (one button per PRESET_ORDER entry), a Resolution row of three buttons, and Trigger / Reset action buttons. refreshButtons() keeps the on/off CSS classes and the mode description in sync.

Emitter. computeEmitter(elapsed) builds the per-frame EmitterState (position, radius, velocity, rate, temperature, shape, active) from the active preset. All grid-space quantities are multiplied by scale (see §7). Behavior by emitterMode:

  • plume — a continuous source at the box base. For fire it also adds a flicker: two beat-frequency sines pulse the rate and temperature and wander the source position and injected velocity.
  • burst — a spherical pulse. Every cycleSeconds, emitStart is reset; the emitter is active only for the first 0.13 s of each cycle.
  • drip — like burst but picks a fresh random (dropX, dropZ) offset each cycle and is active for the first 0.1 s.

T / Trigger sets emitStart = -1e9. Because computeEmitter re-fires when elapsed - emitStart >= cycleSeconds, that huge negative value forces an immediate burst/drip on the next frame.
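The cycle and trigger logic can be sketched as a small state machine. The re-fire test (elapsed - emitStart >= cycleSeconds), the -1e9 sentinel and the 0.13 s / 0.1 s active windows come from the text; the CycleState type and the updateCycle / trigger function names are hypothetical.

```typescript
// Hypothetical holder for the time at which the current cycle started.
interface CycleState {
  emitStart: number;
}

// Advance the cycle and report whether the emitter is active at `elapsed` seconds.
function updateCycle(
  state: CycleState, elapsed: number, cycleSeconds: number, activeWindow: number,
): boolean {
  if (elapsed - state.emitStart >= cycleSeconds) {
    state.emitStart = elapsed; // re-fire: a new cycle starts now
  }
  return elapsed - state.emitStart < activeWindow;
}

// T / Trigger: push emitStart far into the past so the next update re-fires.
function trigger(state: CycleState): void {
  state.emitStart = -1e9;
}
```

For the Fireball preset this would be called with cycleSeconds = 3.2 and activeWindow = 0.13; for the water drip, 0.85 and 0.1. Using a sentinel instead of a separate "force" flag keeps the trigger path identical to the normal re-fire path.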

R / Reset calls selectPreset(currentKey), which calls sim.reset() (zeros all fields) and forces an immediate emit.

Sliders are multipliers layered on top of the active preset, read fresh each frame in frame():

| Slider | Range | Effect |
|---|---|---|
| Speed | 0–2x | Scales dt: dt = min(deltaTime, 1/30) * speedMul. |
| Opacity | 0.3–2x | Scales the ray-march densityScale (densityScale * opacityMul). |
| Detail | 0–2x | Scales the simulation vorticity strength. |
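The Speed row's formula, written out as a function. The expression is the one quoted in the table; MAX_DT and simDt are hypothetical names.

```typescript
const MAX_DT = 1 / 30; // clamp: never simulate more than 1/30 s of real time per frame

// Simulation time step: clamp first, then apply the Speed slider multiplier.
function simDt(deltaTime: number, speedMul: number): number {
  return Math.min(deltaTime, MAX_DT) * speedMul;
}
```

Clamping before multiplying means the slider can still double the step up to 2/30 s, while a half-second frame hitch at 1x is limited to 1/30 s. A Speed of 0 freezes the simulation but leaves rendering and the camera running.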

Resolution buttons call setResolution(i), which sets activeGrid, recomputes scale = activeGrid[0] / GRID[0], calls sim.resize(...) with a new iteration count, and forces an emit. The three resolutions are Low [48,72,48], Medium GRID = [64,96,64], High [96,144,96].


7. Frame flow

frame() in fluid_test.ts, once per requestAnimationFrame:

  1. ctx.update() advances frame timing; on a canvas resize it returns true and cache.trimUnused() is called. The FPS HUD shows ctx.fps and the grid.
  2. cameraController.update(...) then camera.updateRender(ctx); ctx.activeCamera = camera.
  3. Read the three slider multipliers; clamp dt to 1/30 s and scale by speed.
  4. computeEmitter(elapsed) → pack the 28-float simParams → sim.step(simParams). FluidSim.step() writes the uniform, records all six kernels (with the pressure loop) into a FluidSimEncoder compute pass, and submits its own command buffer immediately.
  5. Pack the 44-float renderParams; fluidRenderPass.setVolume(sim.densityView) and updateParams(...). Update bloom and tonemap params.
  6. Build the render graph for the frame:
const graph = new RenderGraph(ctx, cache);
const backbuffer = graph.setBackbuffer('canvas');
const volume = fluidRenderPass.addToGraph(graph);
const bloom = bloomPass.addToGraph(graph, { hdr: volume.hdr });
tonemapPass.addToGraph(graph, { hdr: bloom.result, backbuffer });
const compiled = graph.compile();
graphViz.setGraph(graph, compiled);
void graph.execute(compiled);

So each frame is two submissions: first the compute solve (outside the graph), then the graph's single command buffer for ray-march → bloom → tonemap. The render graph never sees the simulation; it only consumes the density texture the compute pass already produced.

The scale factor. Because the templates are tuned for the Medium grid, fluid_test.ts multiplies every cell-space quantity by scale = activeGrid[0] / GRID[0] before packing: emitter position/radius/velocity, buoyancy, weight, gravity and vorticity. World-space rendering quantities (BOX_MIN, BOX_SIZE) do not scale — the box is the same size at every resolution; only the cell count (and thus detail and cost) changes.
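A sketch of that scaling, assuming a flat record of the cell-space parameters. The three grid sizes and the scale = activeGrid[0] / GRID[0] formula come from the text; the RESOLUTIONS table, the CellSpaceParams type and the scaleFor / scaleCellSpace helpers are hypothetical names, and the real code may scale the values inline while packing.

```typescript
type Grid = readonly [number, number, number];

const GRID: Grid = [64, 96, 64]; // reference (Medium) grid the presets are tuned for

const RESOLUTIONS: Record<"low" | "medium" | "high", Grid> = {
  low: [48, 72, 48],
  medium: [64, 96, 64],
  high: [96, 144, 96],
};

function scaleFor(activeGrid: Grid): number {
  return activeGrid[0] / GRID[0];
}

// Hypothetical subset of preset values expressed in cells (or cells per second).
interface CellSpaceParams {
  emitterRadius: number;
  emitterHeight: number;
  emitterVelY: number;
  buoyancy: number;
  weight: number;
  gravity: number;
  vorticity: number;
}

// Multiply every cell-space quantity by scale; world-space values are untouched.
function scaleCellSpace(p: CellSpaceParams, scale: number): CellSpaceParams {
  return {
    emitterRadius: p.emitterRadius * scale,
    emitterHeight: p.emitterHeight * scale,
    emitterVelY: p.emitterVelY * scale,
    buoyancy: p.buoyancy * scale,
    weight: p.weight * scale,
    gravity: p.gravity * scale,
    vorticity: p.vorticity * scale,
  };
}
```

The scale is 0.75 at Low and 1.5 at High, so the Campfire emitter radius of 9 cells becomes 13.5 cells at High and covers the same world-space footprint, because each cell is proportionally smaller.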


8. Notable techniques and gotchas

  • Compute is not in the render graph. FluidSim records and submits its own command buffer. The graph only handles the screen passes. The density texture crosses that boundary as a plain GPUTextureView passed to FluidRenderPass.setVolume(), relying on WebGPU's automatic ordering between the two submissions on the same queue.

  • Even pressure iterations. The Jacobi loop ping-pongs _prs[0] ↔ _prs[1]. The iteration count is rounded up to even so the relaxed pressure always ends in _prs[0], which cs_project is hard-wired to read. iterationsFor() scales iterations with grid size so relaxation quality is roughly constant.

  • Shared density/temperature texture. Packing both into one rgba16float (.x / .y) means advection transports them together with a single sample, and the ray-marcher reads both at once via sampleDensity.

  • Unfilterable-float pressure. The pressure bind-group layout marks prsIn and divIn as unfilterable-float (tex(4, true) / tex(5, true) in fluid_sim.ts) because r32float is not filterable — those kernels only ever use integer textureLoad, never the linear sampler.

  • Open-top boundary. Projection closes the four sides and the floor but leaves the top open, and cs_forces additionally fades density in the top ~6 cells so plumes dissipate gracefully instead of stacking against the ceiling.

  • Vorticity confinement. Semi-Lagrangian advection is stable but numerically diffusive — it smears small eddies. cs_vorticity + the confinement term in cs_forces re-inject that lost rotational detail; the Detail slider scales its strength, so turning Detail down visibly smooths the motion.

  • HDR fire ramp drives bloom. fireRamp returns values up to ~3.2, so flame cores exceed 1.0 and the BloomPass (threshold 1.1) picks them up as glow. Bloom strength is raised to 0.55 for fire vs 0.18 for smoke/water.

  • Dither instead of TAA. A static hash12 jitter on the ray entry point hides the banding from only 64 march steps without needing temporal accumulation.

  • dt clamping. dt is clamped to 1/30 s before the speed multiplier, so a frame-rate hitch cannot blow up the simulation (semi-Lagrangian advection stays stable, but forces and emission would over-inject on a huge dt).

  • scale keeps templates resolution-independent. Cell-space forces and emitter parameters are multiplied by scale so a campfire looks like a campfire at Low, Medium or High — just with more or less detail and GPU cost.