Chapter 9: GPU Particle System
Particle effects bring a scene to life — rain streaking past the player, snow drifting through the air, smoke rising from an explosion, sparks flying from a mining pick. Taos implements a fully GPU-driven particle system where every phase (spawn, update, compaction, render) runs on the GPU via compute shaders. There is no CPU readback of particle state and no per-particle CPU work at runtime.
9.1 Architecture Overview
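The particle system is built around four compute stages executed each frame (spawn, update, compact, indirect write), followed by an optional translate pass for mesh particles and the render pass: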
Spawn ──► Update ──► Compact ──► Indirect Write ──► (Translate) ──► Render
Each stage is a compute (or render) pass that operates on GPU-resident buffers. The host CPU only uploads a small uniform block (time, camera matrices, emitter transform, forward-shader params) once per frame.
| Stage | Shader | Purpose |
|---|---|---|
| Spawn | Generated per-config | Initialize new particles from the emitter |
| Update | Generated per-config | Apply modifiers (gravity, drag, noise, etc.) |
| Compact | particle_compact.wgsl | Build aligned list of alive particles |
| Indirect Write | particle_compact.wgsl | Write alive count into indirect draw buffer |
| Translate (mesh mode only) | particle_instance_translate.wgsl | Build per-instance {model, normalMatrix, colorTint} from the alive list |
| Render | particle_render.wgsl / particle_render_forward.wgsl / instanced PBR | Draw billboard quads or instanced meshes via indirect draw |
The spawn and update shaders are generated at pipeline creation time from a ParticleGraphConfig description, producing specialized WGSL that inlines only the needed modifiers — no runtime branching over uniforms.
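To make the specialization idea concrete, here is a minimal sketch of per-config code generation. The `Modifier` union and the `f32`, `emitModifier`, and `buildUpdateBody` helpers are hypothetical names for illustration, not the engine's builder API; only the gravity and drag formulas are taken from the update shader listing in 9.6.

```typescript
// Hypothetical, trimmed-down modifier descriptions (not the engine's real types).
type Modifier =
  | { type: 'gravity'; strength: number }
  | { type: 'drag'; coefficient: number };

// Format a number as a WGSL float literal (always with a decimal point).
function f32(v: number): string {
  return Number.isInteger(v) ? `${v}.0` : `${v}`;
}

// Emit one WGSL statement per modifier, with its parameters baked in as literals.
function emitModifier(m: Modifier): string {
  switch (m.type) {
    case 'gravity':
      return `p.velocity.y -= ${f32(m.strength)} * uniforms.dt;`;
    case 'drag':
      return `p.velocity -= p.velocity * ${f32(m.coefficient)} * uniforms.dt;`;
  }
}

// Concatenate snippets in config order; a modifier that is absent emits no code at all.
function buildUpdateBody(modifiers: Modifier[]): string {
  return modifiers.map(emitModifier).join('\n');
}

const body = buildUpdateBody([{ type: 'gravity', strength: 9.8 }]);
console.log(body);
```

Because parameters become literals in the shader source, changing a modifier value in this scheme means regenerating the shader, which is the trade-off for avoiding branches over uniforms.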
9.2 Particle Graph Config
The particle system is described declaratively through the ParticleGraphConfig interface (src/particles/particle_types.ts):
// ── from src/particles/particle_types.ts ──
interface ParticleGraphConfig {
emitter: EmitterNode;
modifiers: ModifierNode[];
renderer: RenderNode;
events?: EventNode[];
splits?: SplitNode[]; // sub-emitters, see 9.12
}
EmitterNode
The emitter controls how many particles exist, how fast they spawn, their initial properties, and the geometric region they emerge from:
// ── from src/particles/particle_types.ts ──
interface EmitterNode {
position: [x, y, z]; // local offset from the GameObject transform
maxParticles: number; // GPU buffer capacity
spawnRate: number; // particles per second
lifetime: [min: number, max: number];
shape: SpawnShape; // sphere | cone | box | sdf_surface (see 9.13)
initialSpeed: [min: number, max: number];
initialColor: [r, g, b, a];
initialSize: [min: number, max: number];
roughness: number; // PBR for deferred / mesh renderer
metallic: number;
}
ModifierNode
Modifiers are per-frame behaviors applied in order during the update compute pass. Each type produces a different WGSL code snippet:
| Modifier | Effect |
|---|---|
| gravity | Constant downward acceleration |
| drag | Velocity-proportional deceleration |
| force | Constant directional acceleration |
| swirl_force | Circular force rotating over time |
| vortex | Solid-body rotation around the emitter axis |
| curl_noise | Divergence-free turbulent flow via curl of a noise field (FBM) |
| turbulence | Plain vector noise; cheaper than curl noise, not divergence-free |
| wind | Directional force whose strength is modulated by low-frequency noise (gusts) |
| point_attractor | Pulls particles toward a world-space point with linear falloff over radius |
| radial_force | Pushes particles outward from a world-space center (negative strength = pull) |
| speed_limit | Clamps length(velocity) to a maximum |
| size_random | Stable random size per particle slot |
| size_over_lifetime | Linear size interpolation |
| size_by_speed | Size mapped from length(velocity) range to a size range |
| rotation_rate | Constant angular velocity (rad/s) on the sprite rotation |
| rotation_over_lifetime | Linear rotation interpolation (radians) |
| color_over_lifetime | Linear color + alpha interpolation |
| alpha_over_lifetime | Linear alpha-only interpolation (cheaper than full color) |
| box_color | While inside an AABB, set RGB to the normalized position within the box (alpha preserved) |
| velocity_color | Set RGB to normalize(abs(velocity)) to visualize per-axis motion (alpha preserved) |
| bounds_kill | Kill particles that leave a world- or emitter-space AABB |
| plane_collision | Half-space collider through center/normal; bounce or kill on contact |
| sphere_collision | Spherical collider; bounce or kill on contact |
| block_collision | Terrain heightmap collider; bounce, kill, or stick on contact (stick freezes the particle on the surface; used for snow accumulation) |
| sdf_attractor | Pulls particles toward the surface of the bound SDF along its gradient (see 9.13) |
| sdf_collision | Pushes particles out of the bound SDF along its gradient; bounce or kill on contact |
| sdf_stick | Snaps particles to the SDF surface and zeroes velocity on contact (see 9.13) |
| expression | Runs an ExpressionGraph or ExprNodeGraph, a composable scripting layer over the modifier chain (see 9.14) |
The four collider modifiers (plane_collision, sphere_collision, block_collision, sdf_collision) share a kill: boolean flag. When true the particle dies on contact and bounce/friction are ignored; otherwise the velocity is reflected with coefficient of restitution bounce and tangential damping friction (both in [0, 1]). block_collision adds a third contact mode via an optional stick: true flag that freezes the particle on the ground (see 9.6); kill takes precedence over stick.
Modifiers that take a position or center (point_attractor, radial_force, box_color, bounds_kill, plane_collision, sphere_collision) carry a space: 'world' | 'emitter' field. In 'emitter' mode the literal is treated as an offset from the emitter's effective origin (the GameObject transform plus the emitter's local position) — so a fountain or attractor follows the system around without per-frame CPU bookkeeping.
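The kill / bounce / friction response shared by the colliders can be sketched on the CPU as follows. This is an illustrative reconstruction, not the generated WGSL: `ColliderParams` and `resolveContact` are hypothetical names, the (1 - friction) tangential scaling follows the block_collision description in 9.6, and leaving an already-separating particle untouched is an assumption.

```typescript
type Vec3 = [number, number, number];

const dot = (a: Vec3, b: Vec3): number => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const scale = (a: Vec3, s: number): Vec3 => [a[0] * s, a[1] * s, a[2] * s];
const add = (a: Vec3, b: Vec3): Vec3 => [a[0] + b[0], a[1] + b[1], a[2] + b[2]];
const sub = (a: Vec3, b: Vec3): Vec3 => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];

// Hypothetical parameter bag mirroring the shared collider fields.
interface ColliderParams {
  kill?: boolean;
  bounce: number;   // coefficient of restitution, in [0, 1]
  friction: number; // tangential damping, in [0, 1]
}

// Returns the post-contact velocity, or null when the particle should die.
// `normal` must be unit length and point away from the collider surface.
function resolveContact(velocity: Vec3, normal: Vec3, c: ColliderParams): Vec3 | null {
  if (c.kill) return null;            // kill wins; bounce and friction are ignored
  const vn = dot(velocity, normal);
  if (vn >= 0) return velocity;       // assumption: already moving away, leave as-is
  const normalPart = scale(normal, vn);
  const tangentPart = sub(velocity, normalPart);
  // Flip and damp the normal component, damp the tangential component.
  return add(scale(tangentPart, 1 - c.friction), scale(normalPart, -c.bounce));
}
```

With bounce = 1 and friction = 0 this reduces to a perfect mirror reflection; with bounce = 0 the particle slides along the surface.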
RenderNode
Controls how particles are rasterized. Four top-level variants:
// ── from src/particles/particle_types.ts ──
type RenderNode =
| {
type: 'sprites';
blendMode: 'additive' | 'alpha';
billboard: 'camera' | 'velocity';
shape?: 'soft' | 'pixel' | 'line' | 'fire' | 'smoke' | 'ember';
renderTarget?: 'gbuffer' | 'hdr';
emit?: number; // HDR emission multiplier (default 4.0)
stretch?: number; // velocity streak length per unit speed (default 0.04)
thickness?: number; // velocity billboard cross-axis width (default 1.0)
size?: number; // camera billboard uniform scale (default 1.0)
sunDirection?: [number, number, number]; // for `smoke` fake lighting
sunColor?: [number, number, number];
}
| { type: 'points' }
| {
type: 'trail'; // ribbon threaded through each particle's path
segments: number; // max committed history points per particle (ring size)
width: number; // ribbon width at the head, world units
timeInterval?: number;// commit a point at least this often, seconds (default 0.03)
minDistance?: number; // ...or once moved this far, world units (default 0.1)
emit?: number; // HDR emission multiplier (default 4.0)
}
| { type: 'mesh'; mesh: Mesh; material: Material;
align?: 'world' | 'velocity'; renderTarget?: 'gbuffer' | 'hdr' };
Two billboard modes exist:
- camera: quad always facing the camera (snow, smoke, fire).
- velocity: quad aligned with the velocity vector, stretched by speed (rain, sparks, embers).
Six sprite shapes are supported, each backed by its own fragment shader in src/shaders/particles/particle_render_forward.wgsl:
- soft: radial alpha falloff (default; snow, dust, generic puffs).
- pixel: hard-edged square, no falloff, no HDR boost (blocky debris).
- line: bright HDR-emissive rectangle with round caps. Pair with billboard: 'velocity'; the visible "glow" around the line is produced by the downstream BloomPass picking up the high emissive intensity.
- fire: animated procedural flame. Vertical FBM noise carves an organic silhouette; color ramps hot → cool over lifetime. No texture needed.
- smoke: animated procedural puff with noise-modulated alpha and fake side-lighting along sunDirection/sunColor (a half-Lambert lookup against the billboard's view direction; no shadowing).
- ember: small bright HDR glow with per-particle twinkle (sin(time * 9 + perParticlePhase)). Pair with velocity for trailing embers or camera for stationary sparkles.
Renderer knobs that tune the forward sprite shaders (all optional, with sensible defaults):
| Knob | Effect |
|---|---|
| emit | Multiplies the fragment-shader output color. Crank to ~12–24 for hot sparks/fire so they punch through bloom; drop to 1.0 for ordinary alpha sprites. |
| stretch | Velocity-aligned streak length grows by 1 + stretch * speed. Use ~0.3–1.0 for sparks; default 0.04 is barely visible. |
| thickness | Cross-axis width multiplier for velocity-aligned billboards. Lets sparks pick a thin wire (~0.4) or a chunky bolt (~2.0) independently of initialSize. |
| size | Uniform quad-scale multiplier for camera-aligned billboards (pixel/soft/fire/smoke/stationary ember). Tunes apparent size without retuning the emitter. |
| sunDirection / sunColor | Read by the smoke shape only; toward-sun unit vector and color used for the fake lit edge. |
Two render targets are supported for sprites:
- hdr: forward alpha-blended rendering into the HDR color buffer (transparent effects). All six shapes use this path.
- gbuffer: deferred opaque rendering into the four-attachment G-Buffer (albedo, normal, emissive, depth) for hard billboards. The emissive attachment is new: opaque particles now contribute to the emissive channel alongside the rest of the scene.
Mesh particles
{ type: 'mesh', mesh, material } renders each live particle as an instance of a real PBR mesh — useful for arrows, debris cubes, magic projectiles, anything that wants depth, lighting, and silhouette but still benefits from GPU-driven spawn/update/recycle. The pass owns a per-instance buffer of { mat4 model, mat4 normalMatrix, vec4 colorTint } (144 bytes) populated each frame by particle_instance_translate.wgsl, which reads the alive list and packs transforms compacted at slots [0, aliveCount) so the indexed indirect draw never touches dead slots.
align chooses how the mesh frame is built from particle state:
- 'world': identity rotation + particle.rotation spin around world +Y. The mesh's authored axes face world axes.
- 'velocity': mesh local +Y aligns with particle.velocity direction (world +Y when velocity ≈ 0), and particle.rotation spins the mesh around that local +Y. Useful for arrows, sparks, debris.
The material must support the HAS_INSTANCING shader variant (see PbrMaterial); its WGSL is compiled with that define and its bind-group layout for MATERIAL_GROUP is reused as-is. In HDR forward mode the pass borrows the ForwardPass lighting/IBL bind group so mesh particles share the same lights as the rest of the forward-lit scene.
Trail particles
{ type: 'trail', segments, width } draws each live particle as a variable-width, camera-facing ribbon threaded through a short history of its own past positions — comet tails, tracers, energy beams. Unlike the velocity-aligned line sprite (which only reflects instantaneous velocity), a trail follows the particle's actual path, so it curves through arcs and noise.
Each particle owns a ring buffer of at most segments committed points (a separate GPU buffer, maxParticles × segments × 16 bytes). The update compute shader appends the live position to the ring whenever either condition fires:
- the global time-tick elapses (timeInterval seconds; driven CPU-side so the cadence is frame-rate independent), or
- the particle has moved minDistance world units since its last committed point (decided per-particle in the shader).
So the trail's length is bounded by both time and distance; set minDistance: 0 to rely purely on the time cadence, or a large timeInterval to rely purely on distance. The per-particle write cursor + point count are packed into the otherwise-unused trail_meta word of the particle struct, and a trail_commit flag rides in the spare slot of the compute uniforms — so trails add one storage buffer (bound at group(0) binding(4) of the compute data group, clear of the four-bind-group ceiling that heightmap/SDF/splits compete for) and no new per-particle struct growth.
Rendering is a single drawIndirect of one triangle-strip per particle (2 × (segments + 1) vertices; WebGPU restarts the strip between instances). particle_trail.wgsl builds the ribbon in the vertex stage: the head vertex tracks the live position, older vertices read the ring, and a camera-facing cross axis (cross(tangent, viewDir)) gives the strip width. Width tapers from width at the head to zero at the tail and alpha fades along the same axis, so the trail dissolves as it thins. It renders additively into the HDR target, so the BloomPass makes hot tracers glow. See the Particle Trails sample.
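The commit rule and ring-buffer bookkeeping described above can be sketched on the CPU like this. It is an illustration under stated assumptions, not the shader's code: the names `TrailState`, `makeTrail`, and `maybeCommit` are hypothetical, the bit layout of trail_meta (cursor in the low 16 bits, count in the high 16) is a guess, and treating minDistance = 0 as "distance trigger disabled" is inferred from the text.

```typescript
type Vec3 = [number, number, number];

interface TrailState {
  ring: Vec3[];  // `segments` committed points; the oldest is overwritten first
  meta: number;  // packed u32 (assumed layout): count << 16 | cursor
  last: Vec3;    // last committed point
}

const cursorOf = (meta: number): number => meta & 0xffff;
const countOf = (meta: number): number => meta >>> 16;

function makeTrail(segments: number, start: Vec3): TrailState {
  const ring = Array.from({ length: segments }, (): Vec3 => [0, 0, 0]);
  return { ring, meta: 0, last: start };
}

// Commit `pos` when the global time tick fired or the particle moved far enough.
function maybeCommit(t: TrailState, pos: Vec3, timeTick: boolean, minDistance: number): boolean {
  const moved = Math.hypot(pos[0] - t.last[0], pos[1] - t.last[1], pos[2] - t.last[2]);
  const distanceHit = minDistance > 0 && moved >= minDistance; // 0 disables this trigger
  if (!timeTick && !distanceHit) return false;

  const segments = t.ring.length;
  const cursor = cursorOf(t.meta);
  t.ring[cursor] = pos;                                  // overwrite the oldest slot
  const nextCursor = (cursor + 1) % segments;
  const nextCount = Math.min(countOf(t.meta) + 1, segments);
  t.meta = ((nextCount << 16) | nextCursor) >>> 0;       // repack as an unsigned word
  t.last = pos;
  return true;
}
```

Once the ring is full, the count saturates at `segments` while the cursor keeps wrapping, so the renderer always sees the most recent `segments` points.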
SplitNode
A sub-emitter that spawns a separate child system from the parent's particles. Useful for effects like fireworks, flares, and projectile trails. See 9.12 for the GPU pipeline.
// ── from src/particles/particle_types.ts ──
interface SplitNode {
trigger: 'on_alive' | 'on_death';
rate?: number; // for on_alive: avg splits/sec per parent particle
count: number; // children spawned per fire event
inherit: {
velocity?: number; // scale parent velocity into child initial velocity
color?: boolean; // child takes parent's color instead of emitter.initialColor
};
child: ParticleGraphConfig; // recursive — children can have splits too
queueCapacity?: number; // GPU queue size (default 1024); overflow drops
}
on_alive fires per frame with probability rate * dt; on_death fires once when the parent crosses life >= max_life. Position is always inherited (the child's emitter origin becomes parent_position + child.emitter.position).
9.3 The Particle Struct
Every particle is a fixed-size 64-byte struct stored in a GPU storage buffer:
// ── from the particle shaders ──
struct Particle {
position : vec3<f32>, // offset 0
life : f32, // offset 12 (-1 = dead)
velocity : vec3<f32>, // offset 16 (when STUCK: stored impact-velocity dir, unit length)
max_life : f32, // offset 28
color : vec4<f32>, // offset 32
size : f32, // offset 48
rotation : f32, // offset 52
flags : u32, // offset 56 (bit 0 = STUCK)
_pad : u32, // offset 60
} // total: 64 bytes
The life field serves double duty: a value of -1.0 marks the slot as dead (available for recycling), while a non-negative value counts upward toward max_life. When life >= max_life the particle is killed in the update shader.
The flags field is a bitfield for per-particle state. Bit 0 (PARTICLE_FLAG_STUCK) is set by block_collision with stick: true on first ground contact (see 9.6): the update shader then skips all modifiers and position integration for that particle, and the renderer reinterprets velocity as the stored impact-velocity direction (unit length) and orients the sprite to lie along the surface it landed on. Single-bit values keep room for future per-particle state without growing the struct beyond 64 bytes.
On construction, the entire particle buffer is initialized with life = -1.0 so every slot starts dead.
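A minimal sketch of building that initial "all dead" contents on the CPU, using the 64-byte stride and the life field at byte offset 12 from the struct above. The function name `makeDeadParticleData` is hypothetical; the resulting array could then be uploaded with `queue.writeBuffer`.

```typescript
const PARTICLE_STRIDE_BYTES = 64;
const FLOATS_PER_PARTICLE = PARTICLE_STRIDE_BYTES / 4; // 16 four-byte words per particle
const LIFE_FLOAT_INDEX = 12 / 4;                       // life sits at byte offset 12

// Build a zeroed particle array whose every slot has life = -1.0 (dead).
function makeDeadParticleData(maxParticles: number): Float32Array {
  const data = new Float32Array(maxParticles * FLOATS_PER_PARTICLE);
  for (let i = 0; i < maxParticles; i++) {
    data[i * FLOATS_PER_PARTICLE + LIFE_FLOAT_INDEX] = -1.0;
  }
  return data;
}

const initial = makeDeadParticleData(1024);
console.log(initial.byteLength); // 1024 particles * 64 bytes
```

All other fields stay zero, which is harmless because the update and compact shaders return early for any slot with life < 0.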
9.4 GPU Buffers
Each ParticlePass owns five GPU buffers:
| Buffer | Size | Usage |
|---|---|---|
| particleBuffer | maxParticles × 64 | STORAGE |
| aliveList | maxParticles × 4 | STORAGE |
| counterBuffer | 4 | STORAGE |
| indirectBuffer | 16 | INDIRECT |
| computeUniforms | 80 | UNIFORM |
The indirect draw buffer layout is [vertexCount, instanceCount, firstVertex, firstInstance]. It is initialized to [6, 0, 0, 0] — six vertices (two triangles for a quad) with zero instances. The compact pass writes the live particle count into the instanceCount field.
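As a worked example of the sizes in the table, the sketch below computes the per-pass footprint for a given capacity and builds the initial indirect arguments. The helper name `particleBufferSizes` is hypothetical; the byte counts are the ones listed above.

```typescript
// Byte sizes of the five buffers for a given particle capacity.
function particleBufferSizes(maxParticles: number) {
  const sizes = {
    particleBuffer: maxParticles * 64, // one 64-byte Particle per slot
    aliveList: maxParticles * 4,       // one u32 index per slot
    counterBuffer: 4,                  // single atomic u32
    indirectBuffer: 16,                // four u32 draw arguments
    computeUniforms: 80,               // 20 floats
  };
  const total = Object.values(sizes).reduce((a, b) => a + b, 0);
  return { ...sizes, total };
}

// [vertexCount, instanceCount, firstVertex, firstInstance]
const initialIndirectArgs = new Uint32Array([6, 0, 0, 0]);

const sizes = particleBufferSizes(10_000);
console.log(sizes.total, initialIndirectArgs.byteLength);
```

For 10,000 particles the particle buffer dominates at 640,000 bytes; the fixed-size buffers add only 100 bytes in total.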
9.5 The Spawn Stage
The spawn shader is generated by buildSpawnShader() in src/particles/particle_builder.ts. It inlines:
- The emitter's spawn shape code (sphere, cone, or box).
- Initial value ranges (lifetime, speed, size, color).
- Any on_spawn event actions.
Each workgroup thread handles one new particle:
// ── from src/particles/particle_builder.ts (generated) ──
@compute @workgroup_size(64)
fn cs_main(@builtin(global_invocation_id) gid: vec3<u32>) {
if (gid.x >= uniforms.spawn_count) { return; }
let idx = (uniforms.spawn_offset + gid.x) % uniforms.max_particles;
let seed = pcg_hash(uniforms.spawn_offset + gid.x);
let speed = rand_range(speedMin, speedMax, seed + 1u);
var p: Particle;
p.life = 0.0;
p.max_life = rand_range(lifeMin, lifeMax, seed + 2u);
p.color = vec4<f32>(cr, cg, cb, ca);
p.size = rand_range(sizeMin, sizeMax, seed + 3u);
p.rotation = rand_f32(seed + 4u) * 6.28318530717958647;
// Inline spawn shape code:
// sphere: sample cone, rotate by emitter quaternion, offset by radius
// cone: sample cone, rotate by emitter quaternion, offset by radius
// box: uniform random in half-extents, rotate by emitter quaternion
particles[idx] = p;
}
The number of particles spawned this frame is computed on the CPU by accumulating spawnRate × dt and then rounding down. This accumulator-based approach handles variable frame rates smoothly — particles are not created or lost when the frame rate fluctuates.
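The accumulator can be sketched in isolation as below. `makeSpawnAccumulator` is a hypothetical standalone helper; the three arithmetic steps mirror the ParticlePass.update() excerpt in 9.9.

```typescript
// Returns a per-frame function that converts elapsed time into a whole spawn count,
// carrying the fractional remainder over to the next frame.
function makeSpawnAccumulator(spawnRate: number, maxParticles: number) {
  let accum = 0;
  return (dt: number): number => {
    accum += spawnRate * dt;                                  // fractional particles owed
    const count = Math.min(Math.floor(accum), maxParticles);  // whole particles to spawn now
    accum -= count;                                           // keep the remainder
    return count;
  };
}

// 90 particles/s at 64 frames/s is 1.40625 particles per frame.
const step = makeSpawnAccumulator(90, 10_000);
let total = 0;
for (let frame = 0; frame < 64; frame++) total += step(1 / 64);
console.log(total); // whole particles spawned over one simulated second
```

Flooring each frame independently, without the carried remainder, would spawn only one particle per frame here (64 per second instead of 90), which is the drift the accumulator avoids.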
Spawn Shapes
Sphere. Samples a direction from a spherical cap (solid angle θ), rotates it by the emitter's world rotation, and places the particle at the emitter position plus that direction times the sphere radius.
Cone. Identical math to sphere — the cone is a spherical cap with half-angle θ; the particle emerges from the cap surface.
Box. Uniformly samples [-hx, hx] × [-hy, hy] × [-hz, hz], rotates the offset by the emitter quaternion, and adds it to the emitter position. The velocity direction is always world_up (used for rain/snow falling from a horizontal plane).
9.6 The Update Stage
The update shader is likewise generated, by buildUpdateShader(). It dispatches one invocation per particle slot, alive or dead; dead slots return immediately, and live particles have the configured modifiers applied in order:
// ── from src/particles/particle_builder.ts (generated) ──
@compute @workgroup_size(64)
fn cs_main(@builtin(global_invocation_id) gid: vec3<u32>) {
let idx = gid.x;
if (idx >= uniforms.max_particles) { return; }
var p = particles[idx];
if (p.life < 0.0) { return; }
p.life += uniforms.dt;
if (p.life >= p.max_life) {
// on_death event actions
particles[idx].life = -1.0;
return;
}
let t = p.life / p.max_life;
// Inline modifier code (selected examples):
// gravity: p.velocity.y -= strength * dt;
// drag: p.velocity -= p.velocity * coefficient * dt;
// force: p.velocity += direction * strength * dt;
// curl_noise: p.velocity += curl_noise_fbm(p.position * scale + time * timeScale, octaves) *
// strength * dt;
// wind: p.velocity += direction * (strength * (1 + noise_gust * turbulence)) * dt;
// point_attractor: p.velocity += (target - p.position).normalized * strength * falloff * dt;
// speed_limit: if (length(v) > max) { v = v * (max / length(v)); }
// color_over_lifetime: p.color = mix(startColor, endColor, t);
// alpha_over_lifetime: p.color.a = mix(start, end, t);
// size_over_lifetime: p.size = mix(start, end, t);
// size_by_speed: p.size = mix(sizeMin, sizeMax, clamp01((|v|-speedMin)/(speedMax-speedMin)));
// rotation_rate: p.rotation += rate * dt;
// bounds_kill: if (outside(p.position, min, max)) { p.life = -1.0; return; }
// plane_collision: signed_dist < 0 -> kill or reflect with bounce/friction
// sphere_collision: inside sphere -> kill or push out + reflect with bounce/friction
// block_collision: below heightmap -> kill, stick, or bounce off a (0,1,0) ground plane
p.position += p.velocity * uniforms.dt;
particles[idx] = p;
}
Curl Noise Turbulence
The curl_noise modifier generates turbulent flow patterns by taking the curl of a noise field — this produces divergence-free velocity fields that look like natural wind or water turbulence. The implementation computes the curl via finite differences of three decorrelated Perlin noise potentials:
// ── from the particle update shader ──
fn curl_noise(p: vec3<f32>) -> vec3<f32> {
// Finite difference: sample noise at ±ε on each axis
// curl(F) = ∇ × F = (dFz/dy - dFy/dz, dFx/dz - dFz/dx, dFy/dx - dFx/dy)
}
FBM (fractal Brownian motion) sums multiple octaves for richer detail.
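The finite-difference curl can be sketched on the CPU as follows. This is an illustration only: the `potential` function is a smooth analytic stand-in (an assumption, since the engine's Perlin noise is not shown), and `offset`, `partial`, `curlNoise`, and `divergence` are hypothetical helper names. The `divergence` function is included just to check numerically that the resulting field is divergence-free.

```typescript
type Vec3 = [number, number, number];
type Axis = 0 | 1 | 2;

// Stand-in vector potential: three decorrelated smooth functions.
// (Assumption for illustration; the engine uses Perlin noise here.)
function potential(p: Vec3): Vec3 {
  const [x, y, z] = p;
  return [
    Math.sin(1.3 * y + 0.7) * Math.cos(0.9 * z),
    Math.sin(1.1 * z + 1.9) * Math.cos(1.7 * x),
    Math.sin(0.8 * x + 3.1) * Math.cos(1.2 * y),
  ];
}

// Offset p by h along one axis.
function offset(p: Vec3, axis: Axis, h: number): Vec3 {
  return [
    p[0] + (axis === 0 ? h : 0),
    p[1] + (axis === 1 ? h : 0),
    p[2] + (axis === 2 ? h : 0),
  ];
}

// Central difference of a vector field along one axis: (f(p+eps) - f(p-eps)) / 2eps.
function partial(f: (p: Vec3) => Vec3, p: Vec3, axis: Axis, eps: number): Vec3 {
  const a = f(offset(p, axis, eps));
  const b = f(offset(p, axis, -eps));
  return [(a[0] - b[0]) / (2 * eps), (a[1] - b[1]) / (2 * eps), (a[2] - b[2]) / (2 * eps)];
}

// curl(F) = (dFz/dy - dFy/dz, dFx/dz - dFz/dx, dFy/dx - dFx/dy)
function curlNoise(p: Vec3, eps = 1e-3): Vec3 {
  const dx = partial(potential, p, 0, eps);
  const dy = partial(potential, p, 1, eps);
  const dz = partial(potential, p, 2, eps);
  return [dy[2] - dz[1], dz[0] - dx[2], dx[1] - dy[0]];
}

// Numerical divergence of a vector field, used only to verify the result.
function divergence(f: (p: Vec3) => Vec3, p: Vec3, h = 1e-3): number {
  return partial(f, p, 0, h)[0] + partial(f, p, 1, h)[1] + partial(f, p, 2, h)[2];
}

const samplePoint: Vec3 = [0.3, -1.2, 2.5];
console.log(curlNoise(samplePoint), divergence(curlNoise, samplePoint));
```

Each curl evaluation costs six potential samples (two per axis), and an FBM version multiplies that by the octave count, which is why the plain turbulence modifier is the cheaper option.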
Heightmap Collision for Terrains
The block_collision modifier checks whether a particle has hit the terrain surface. It samples a heightmap, a 2D array of HEIGHTMAP_RES × HEIGHTMAP_RES (default 128×128) float heights centered on the emitter, bound as a single storage buffer to the update compute pass. The collision math is the same whichever terrain system produced the heights: project the particle's world XZ into the heightmap's local UV, look up the cell, compare against the particle's Y.
What's terrain-specific is how the heightmap buffer gets populated. The shared block_collision modifier doesn't care what produced the heights — it just reads the buffer — so the same particle pipeline serves both terrain systems shipped with the engine:
| Terrain | Baker | Source |
|---|---|---|
| Crafty voxel block world | WorldRainHeightmapPass | Samples a GPU-resident world heightmap texture (WorldHeightmapGpu), which the CPU refreshes per-chunk as columns load/change |
| CDLOD terrain (samples/terrain/) | RainHeightmapPass | Samples the terrain's virtual-texture atlas directly; reuses the same atlas + page table the geometry pass reads, so the heightmap matches the rendered ground exactly |
Both bakers run as compute passes that dispatch one thread per heightmap cell, sample the underlying surface representation, and write into the particle pass's heightmap storage buffer:
// ── from samples/terrain/terrain_rain_heightmap.wgsl ──
@compute @workgroup_size(8, 8)
fn cs_main(@builtin(global_invocation_id) gid: vec3u) {
if (gid.x >= params.rain_res || gid.y >= params.rain_res) { return; }
let step = (params.rain_extent * 2.0) / f32(params.rain_res - 1u);
let wx = params.rain_origin_xz.x - params.rain_extent + f32(gid.x) * step;
let wz = params.rain_origin_xz.y - params.rain_extent + f32(gid.y) * step;
heightmap[gid.y * params.rain_res + gid.x] = vt_sample_height(vec2f(wx, wz));
}
At 128×128 = 16k cells with one texture lookup each, the bake is effectively free on GPU — replacing a per-frame CPU loop (terrainHeightFast / world.getTopBlockY) that previously ran ~10 ms even after biome short-circuiting. The fallback CPU path is still supported via ParticlePass.updateHeightmap() for hosts that don't have a GPU baker wired up, but the bundled passes have been converted.
The render graph stitches the two passes together via the ParticleDeps.heightmapWriter?: ResourceHandle dep: the baker pass declares a storage-write on the heightmap buffer (imported as an external graph resource) and returns the versioned output handle; the particle pass forwards that handle as heightmapWriter, which becomes a read edge on the particle compute. The graph then orders the baker before the particle update and inserts the necessary barriers automatically — no manual sync.
heightmapWriter is optional: if the host still bakes the heightmap on the CPU (via updateHeightmap), it's simply left undefined and the particle pass treats the buffer as already-current at execute time.
When a particle's Y falls below the sampled terrain height, one of three contact modes fires — checked in order kill → stick → bounce, so kill wins over stick if both are set:
- With kill: true, set life = -1.0 and return (the particle dies). Used by rain.
- With stick: true, snap the particle onto the surface, store its impact-velocity direction (normalized) back into velocity, set PARTICLE_FLAG_STUCK in flags, write back, and return. The early return is critical: it skips the remaining modifiers and the trailing position integration for the rest of this frame, and the update-loop guard skips them every frame after. Without it, gravity would re-add velocity and the flake would slide off the ground next frame. Used by snow to accumulate flakes on terrain (see snowConfig in 9.11).
- By default (no kill, no stick), push the position up to the surface and reflect the velocity as if hitting a (0, 1, 0) ground plane, scaling the normal component by bounce and the tangential components by (1 - friction).
// ── from the particle update shader (stick = true case shown) ──
let _bc_uv = (p.position.xz - vec2<f32>(hm.origin_x, hm.origin_z)) / (hm.extent * 2.0) + 0.5;
if (all(_bc_uv >= vec2<f32>(0.0)) && all(_bc_uv <= vec2<f32>(1.0))) {
let _bc_xi = clamp(u32(_bc_uv.x * f32(hm.resolution)), 0u, hm.resolution - 1u);
let _bc_zi = clamp(u32(_bc_uv.y * f32(hm.resolution)), 0u, hm.resolution - 1u);
let _bc_h = hm_data[_bc_zi * hm.resolution + _bc_xi];
if (p.position.y <= _bc_h) {
// stick:
let _bc_spd = length(p.velocity);
let _bc_dir = select(vec3<f32>(0.0, -1.0, 0.0),
p.velocity / max(_bc_spd, 0.0001),
_bc_spd > 0.0001);
p.velocity = _bc_dir;
p.position.y = _bc_h;
p.flags = p.flags | PARTICLE_FLAG_STUCK;
particles[idx] = p;
return;
// (or kill: life = -1.0; return;)
// (or bounce: y = _bc_h, v.y = -v.y * bounce, ...)
}
}
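The UV projection and cell lookup in the listing above can be mirrored on the CPU for debugging or unit testing. The sketch below is a hypothetical port: the `Heightmap` interface and the `sampleHeight` and `hitsTerrain` helpers are illustrative names, with fields chosen to match the hm.* uniforms used by the shader.

```typescript
interface Heightmap {
  originX: number;    // world-space centre of the map (x)
  originZ: number;    // world-space centre of the map (z)
  extent: number;     // half-width of the covered square, world units
  resolution: number; // cells per side
  data: Float32Array; // resolution * resolution heights, indexed [z * resolution + x]
}

// Returns the stored terrain height under (x, z), or null when outside the map.
function sampleHeight(hm: Heightmap, x: number, z: number): number | null {
  const u = (x - hm.originX) / (hm.extent * 2) + 0.5;
  const v = (z - hm.originZ) / (hm.extent * 2) + 0.5;
  if (u < 0 || u > 1 || v < 0 || v > 1) return null; // outside: particle passes through
  const xi = Math.min(Math.floor(u * hm.resolution), hm.resolution - 1);
  const zi = Math.min(Math.floor(v * hm.resolution), hm.resolution - 1);
  return hm.data[zi * hm.resolution + xi];
}

// True when a particle at (x, y, z) is at or below the terrain surface.
function hitsTerrain(hm: Heightmap, x: number, y: number, z: number): boolean {
  const h = sampleHeight(hm, x, z);
  return h !== null && y <= h;
}
```

The clamp to resolution - 1 matters only on the far edge (u or v exactly 1), where the floor would otherwise index one cell past the end of the row.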
The update shader's main body opens with an early-out for stuck particles so they age normally but skip every modifier and the position integration:
// ── from src/particles/particle_builder.ts ──
p.life += uniforms.dt;
if (p.life >= p.max_life) { /* on_death + recycle */ }
// Stuck particles freeze on the surface for the rest of their lifetime.
if ((p.flags & PARTICLE_FLAG_STUCK) != 0u) {
particles[idx] = p;
return;
}
// ... modifiers + position integration ...
Only particles within the heightmap's axis-aligned rectangle are tested — particles outside pass through unchanged. The two analytic colliders, plane_collision and sphere_collision, use the same kill / bounce / friction scheme but with arbitrary plane (point + normal) or sphere (center + radius) parameters baked into the shader at codegen time. They do not currently support stick — it's specific to block_collision.
9.7 The Compact Stage
After update, the particle buffer contains a mix of alive and dead particles. The compact stage rebuilds a dense alive_list array and writes the count into the indirect draw buffer — all on the GPU with no CPU involvement.
First dispatch (cs_compact): every thread checks one particle slot. If alive, it atomically increments the counter and writes its index:
// ── from src/shaders/particles/particle_compact.wgsl ──
@compute @workgroup_size(64)
fn cs_compact(@builtin(global_invocation_id) gid: vec3<u32>) {
let idx = gid.x;
if (idx >= uniforms.max_particles) { return; }
if (particles[idx].life < 0.0) { return; }
let slot = atomicAdd(&counter, 1u);
alive_list[slot] = idx;
}
Second dispatch (cs_write_indirect): a single workgroup copies the atomic counter into the indirect buffer's instanceCount field:
// ── from src/shaders/particles/particle_compact.wgsl ──
@compute @workgroup_size(1)
fn cs_write_indirect() {
indirect[1] = atomicLoad(&counter);
}
The counter buffer is reset to zero at the start of each frame (written by the CPU before the compute pass).
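For reference, the two dispatches can be emulated serially on the CPU. This is only a sketch for reasoning about the data flow: `compactAlive` is a hypothetical name, and it takes just the per-slot life values rather than full particle structs.

```typescript
// Serial emulation of cs_compact followed by cs_write_indirect.
function compactAlive(life: Float32Array): { aliveList: Uint32Array; indirect: Uint32Array } {
  const aliveList = new Uint32Array(life.length);
  let counter = 0;                      // the atomic counter, reset to zero each frame

  for (let idx = 0; idx < life.length; idx++) {
    if (life[idx] < 0) continue;        // dead slot: contributes nothing
    aliveList[counter] = idx;           // equivalent of alive_list[atomicAdd(&counter, 1)] = idx
    counter++;
  }

  // [vertexCount, instanceCount, firstVertex, firstInstance]
  const indirect = new Uint32Array([6, 0, 0, 0]);
  indirect[1] = counter;                // cs_write_indirect
  return { aliveList, indirect };
}

const { aliveList, indirect } = compactAlive(new Float32Array([0.5, -1, 2, -1, 0]));
console.log(Array.from(aliveList.slice(0, indirect[1])), indirect[1]);
```

One difference from the GPU version: this loop yields indices in ascending order, whereas the order in which parallel invocations win the atomicAdd is not defined, so the GPU alive list is dense but not necessarily sorted.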
9.8 The Render Stage
Particles are drawn via indirect draw — the indirectBuffer is bound as the draw parameters, so the GPU decides how many instances to render without CPU intervention.
Forward HDR (Transparent)
For alpha-blended effects like rain, snow, sparks, fire, smoke, and embers, the particle_render_forward.wgsl shader writes directly into the HDR color buffer with depth read-only. The shader exposes two vertex entry points (vs_main for velocity-aligned, vs_camera for camera-aligned) and a family of fragment entry points keyed by the renderer's shape field (fs_main, fs_snow, fs_pixel, fs_line, fs_fire, fs_smoke, fs_ember). The pipeline picks one vertex + one fragment at construction time — no runtime branching.
A small ForwardParams uniform (group 2, 48 bytes) carries emit, stretch, time, thickness, sun_dir, sun_color, and size_mul to both stages. Only the time slot is updated every frame (so fire/smoke/ember can animate their noise fields); the rest is written once at construction from the renderer config.
// ── from src/renderer/render_graph/passes/particle_pass.ts ──
// Forward HDR pipeline: alpha blend, no depth write
const renderPipeline = device.createRenderPipeline({
vertex: { module: renderModule, entryPoint: vsEntry },
fragment: {
module: renderModule,
entryPoint: fsEntry,
targets: [{
format: HDR_FORMAT,
blend: {
color: { srcFactor: 'src-alpha', dstFactor: 'one-minus-src-alpha', operation: 'add' },
},
}],
},
depthStencil: { format: 'depth32float', depthWriteEnabled: false, depthCompare: 'less' },
});
Velocity billboard (vs_main). The quad is aligned with the particle's velocity direction. The long axis stretches proportionally to speed, creating streak-shaped raindrops:
// ── from src/shaders/particles/particle_render_forward.wgsl ──
let vel_dir = normalize(velocity);
let right = normalize(cross(vel_dir, cam_dir));
let stretch = 1.0 + speed * 0.04;
let world_pos = p.position
+ right * ofs.x * p.size
+ vel_dir * ofs.y * p.size * stretch;
The fragment shader fades the alpha at the tips of the streak and multiplies the color by EMIT_SCALE (4×) to produce bright, visible raindrops against the dark sky.
Camera billboard (vs_camera). The quad normally faces the camera, creating a soft disc. Used for snow and smoke:
// ── from src/shaders/particles/particle_render_forward.wgsl ──
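// WGSL matrices are column-major, so the camera's world-space axes are the rows of view.
let right = vec3<f32>(camera.view[0][0], camera.view[1][0], camera.view[2][0]); // world-space right
let up = vec3<f32>(camera.view[0][1], camera.view[1][1], camera.view[2][1]); // world-space up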
let world_pos = p.position + right * ofs.x * p.size + up * ofs.y * p.size;
When the particle's PARTICLE_FLAG_STUCK bit is set (snow that hit the ground via block_collision with stick: true), vs_camera switches to a surface-aligned basis instead. The particle's velocity field has been repurposed by the update shader to hold the impact direction as a unit-length vector, so the face normal of the quad is -normalize(velocity), the direction the flake came from. A stable tangent is chosen (world +X, falling back to world +Z when the normal is nearly parallel to +X, i.e. abs(n.x) > 0.95) to avoid a degenerate cross product:
// ── from src/shaders/particles/particle_render_forward.wgsl ──
var right: vec3<f32>;
var up: vec3<f32>;
if ((p.flags & PARTICLE_FLAG_STUCK) != 0u) {
let n = -normalize(p.velocity); // face back the way the flake came from
let t = select(vec3<f32>(1, 0, 0), vec3<f32>(0, 0, 1), abs(n.x) > 0.95);
right = normalize(cross(t, n));
up = cross(n, right);
} else {
right = vec3<f32>(camera.view[0][0], camera.view[1][0], camera.view[2][0]);
up = vec3<f32>(camera.view[0][1], camera.view[1][1], camera.view[2][1]);
}
For snow that fell nearly straight down, this yields a quad lying roughly flat on the ground, tilted slightly into whatever lateral drift the flake had at impact — accumulated snowflakes appear to lie on the terrain instead of standing up edge-on. The per-particle rotation continues to apply as in-plane spin in either branch, so neighboring stuck flakes don't look identical.
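The surface-aligned basis can be checked on the CPU with a direct port of the stuck branch. `stuckBasis` and the small vector helpers are hypothetical names for this sketch; the tangent selection mirrors the WGSL select() call, which returns its second argument when the condition is true.

```typescript
type Vec3 = [number, number, number];

const cross = (a: Vec3, b: Vec3): Vec3 => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];
const dot = (a: Vec3, b: Vec3): number => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const normalize = (a: Vec3): Vec3 => {
  const len = Math.hypot(a[0], a[1], a[2]);
  return [a[0] / len, a[1] / len, a[2] / len];
};

// Build the quad basis for a stuck particle from its stored impact direction.
function stuckBasis(impactDir: Vec3): { normal: Vec3; right: Vec3; up: Vec3 } {
  const n = normalize([-impactDir[0], -impactDir[1], -impactDir[2]]); // face back along the path
  // Use world +X as the tangent unless the normal is nearly parallel to it.
  const t: Vec3 = Math.abs(n[0]) > 0.95 ? [0, 0, 1] : [1, 0, 0];
  const right = normalize(cross(t, n));
  const up = cross(n, right);
  return { normal: n, right, up };
}

// A flake that fell straight down: both quad axes end up horizontal.
console.log(stuckBasis([0, -1, 0]));
```

For a vertical impact the normal is world +Y and the two quad axes lie in the XZ plane, which is the "lying flat on the ground" case described above.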
The snow fragment shader applies a radial alpha falloff from the center, producing circular flakes:
// ── from src/shaders/particles/particle_render_forward.wgsl ──
let uv = in.uv * 2.0 - 1.0;
let d2 = dot(uv, uv);
if (d2 > 1.0) { discard; }
let alpha = in.color.a * (1.0 - d2);
Deferred G-Buffer (Opaque)
For opaque billboard particles (debris, solid projectiles), the particle_render.wgsl shader writes into the G-Buffer (albedo + normal + emissive) with full depth testing:
// ── from src/renderer/render_graph/passes/particle_pass.ts ──
// GBuffer pipeline: writes albedo+normal+emissive, depth write on
targets: [
{ format: GBUF_ALBEDO_FORMAT }, // albedo_roughness (rgba8unorm)
{ format: GBUF_NORMAL_FORMAT }, // normal_metallic (rgba16float)
{ format: GBUF_EMISSIVE_FORMAT }, // emissive (rgba16float)
],
depthStencil: { format: GBUF_DEPTH_FORMAT, depthWriteEnabled: true, depthCompare: 'less' },
ParticleDeps.gbuffer correspondingly requires all four handles (albedo, normal, emissive, depth) in deferred and mesh-into-gbuffer modes; the pass throws at addToGraph time if any is missing.
The vertex shader constructs a camera-facing quad (identical to the camera billboard path). The fragment shader clips to a circle and encodes the face normal (camera-to-particle direction) into the G-Buffer:
// ── from src/shaders/particles/particle_render.wgsl ──
@fragment
fn fs_main(in: VertexOutput) -> GBufferOutput {
  var out: GBufferOutput;
  let d = length(in.uv - 0.5) * 2.0;
  if (d > 1.0) { discard; } // clip the quad to a circle
  let N = normalize(in.face_norm);
  out.albedo_roughness = vec4<f32>(in.color.rgb, mat_params.roughness);
  out.normal_metallic = vec4<f32>(N * 0.5 + 0.5, mat_params.metallic);
  // ... emissive output elided ...
  return out;
}
This allows opaque particles to receive lighting from the deferred shading pass naturally.
9.9 Per-Frame CPU Upload#
The CPU's per-frame work is limited to a handful of writeBuffer calls (compute uniforms, atomic counter reset, camera uniforms, and — when the renderer is a forward sprite — a 4-byte patch into ForwardParams for the animated time slot). The ParticlePass.update() method:
- Advances the internal clock (_time += dt).
- Accumulates spawn count from spawnRate × dt.
- Decomposes the emitter's world transform into position + rotation quaternion.
- Packs a ComputeUniforms struct (20 floats) and writes it to the GPU.
- Resets the atomic counter and updates the spawn ring-buffer offset.
- Packs CameraUniforms (72 floats: view, proj, viewProj, invViewProj, camera position, near/far) and writes them to the GPU.
- For forward sprite renderers, patches the 4-byte time slot of ForwardParams so the fire/smoke/ember shaders animate.
// ── from src/renderer/render_graph/passes/particle_pass.ts ──
update(ctx, dt, view, proj, viewProj, invViewProj, camPos, near, far, worldTransform): void {
this._time += dt;
this._spawnAccum += this._config.emitter.spawnRate * dt;
this._spawnCount = Math.min(Math.floor(this._spawnAccum), this._maxParticles);
this._spawnAccum -= this._spawnCount;
// ... decompose world transform, write uniforms ...
ctx.queue.writeBuffer(this._computeUniforms, 0, cu.buffer as ArrayBuffer);
ctx.queue.writeBuffer(this._counterBuffer, 0, this._resetArr);
ctx.queue.writeBuffer(this._cameraBuffer, 0, camData.buffer as ArrayBuffer);
}
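The accumulator in update() carries fractional particles across frames, so a rate that is not a whole number of particles per frame still averages out correctly. A minimal standalone sketch of that logic (`stepSpawn` and `SpawnState` are illustrative names, not engine API):

```typescript
interface SpawnState {
  accum: number; // fractional particles carried between frames
}

// One frame of the accumulator: returns how many particles to spawn this frame.
function stepSpawn(state: SpawnState, spawnRate: number, dt: number, maxParticles: number): number {
  state.accum += spawnRate * dt;
  const count = Math.min(Math.floor(state.accum), maxParticles);
  state.accum -= count; // keep only the fractional remainder
  return count;
}

// 6 particles/s at 0.25 s per frame is 1.5 particles per frame.
const state: SpawnState = { accum: 0 };
const perFrame = [0, 1, 2, 3].map(() => stepSpawn(state, 6, 0.25, 1000));
console.log(perFrame); // [1, 2, 1, 2]
```

The counts alternate between 1 and 2, so four frames (one second) emit exactly 6 particles.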
9.10 Runtime Spawn Rate Adjustment#
The setSpawnRate() method allows changing the particle emission rate without rebuilding the persistent pass instances (the per-frame render graph is always rebuilt, but it just picks up the new value via addToGraph()):
// ── from src/renderer/render_graph/passes/particle_pass.ts ──
setSpawnRate(rate: number): void {
this._config.emitter.spawnRate = rate;
}
This is used by the weather system to adjust rain/snow intensity dynamically. Changing the rate only writes a new float in the config object — no shader regeneration, no pipeline rebuild, no buffer reallocation. The new rate takes effect on the next frame's spawn accumulation.
9.11 Bundled Particle Configurations#
Crafty ships several pre-defined particle configurations in crafty/config/particle_configs.ts:
| Config | Emitter | Notes |
|---|---|---|
| rainConfig | wide flat box, 24 000 particles/s | shape: 'line' velocity streaks, sized like Minecraft's classic rain lines; killed on ground contact |
| snowConfig | wide flat box, 1 500 particles/s | camera billboards, curl-noise drift, long lifetimes |
| blockBreakConfig | sphere burst (spawnRate: 0) | shape: 'pixel' cubes; driven by ParticlePass.burst() on block break |
| explosionConfig | sphere burst (spawnRate: 0) | soft fire-colored sprites with color + size fall-off |
| sparkConfig | sphere burst (spawnRate: 0) | shape: 'line', emit: 16, gravity + drag; for impact/anvil/magic discharges |
| fireConfig | narrow sphere, 90 particles/s | shape: 'fire' with upward force, curl noise, size/alpha fall-off |
| smokeConfig | narrow sphere, 28 particles/s | shape: 'smoke' with sunDirection/sunColor for fake lit edges |
| emberConfig | sphere burst (spawnRate: 0) | shape: 'ember', buoyant + mild gravity, curl-noise drift |
| campfireConfig | inherits fireConfig | parent flames feed on_alive ember and on_death smoke sub-emitters via splits |
Rain#
// ── from crafty/config/particle_configs.ts ──
export const rainConfig: ParticleGraphConfig = {
emitter: {
position: [0, 0, 0],
maxParticles: 80000,
spawnRate: 24000,
lifetime: [2.0, 3.5],
shape: { kind: 'box', halfExtents: [35, 0.1, 35] },
initialSpeed: [0, 0],
initialColor: [0.75, 0.88, 1.0, 0.9],
initialSize: [0.012, 0.02],
roughness: 0.1,
metallic: 0.0,
},
modifiers: [
{ type: 'gravity', strength: 9.0 },
{ type: 'drag', coefficient: 0.05 },
{ type: 'color_over_lifetime', startColor: [0.75, 0.88, 1.0, 0.9],
endColor: [0.75, 0.88, 1.0, 0.0] },
{ type: 'block_collision', bounce: 0.4, friction: 0.2, kill: true },
],
// Spark shape with emit=1 gives flat (non-HDR) velocity-aligned streaks,
// sized like Minecraft's classic rain lines.
renderer: {
type: 'sprites', blendMode: 'alpha', billboard: 'velocity', shape: 'line',
renderTarget: 'hdr',
emit: 1.0, // non-HDR; rain shouldn't bloom
stretch: 0.35, // streak length per unit of fall speed
thickness: 0.9, // keep the line thin
},
};
Rain uses a wide, flat box emitter (70×0.2×70 blocks), velocity-aligned billboards for streak effect, gravity at 9 m/s², mild drag, and fades out via alpha over lifetime. Particles that hit the terrain are immediately removed via block_collision (kill: true). The line shape is used here with emit: 1.0 so the streaks read as solid rain lines rather than glowing sparks — the same shader covers both ends of the spectrum just by tuning the emission knob.
Fire / smoke / embers / campfire#
The fire, smoke, and ember shapes use procedural fragment shaders — no textures required. fireConfig and smokeConfig can be dropped onto any emitter to get a flame or rising puff; emberConfig is burst-only and pairs naturally with explosions or impacts.
campfireConfig is the composite showcase: it spreads fireConfig as the parent system, then attaches two SplitNodes (an on_alive ember spawn at ~0.4 splits/sec/particle and an on_death smoke spawn), so a single ParticlePass produces a complete flame + drifting embers + rising smoke column from one config. See section 9.12 for how splits feed their children through a GPU spawn-request queue.
Snow#
// ── from crafty/config/particle_configs.ts ──
export const snowConfig: ParticleGraphConfig = {
emitter: {
maxParticles: 80000,
spawnRate: 1500,
lifetime: [30.0, 105.0],
shape: { kind: 'box', halfExtents: [35, 0.1, 35] },
initialSpeed: [0, 0],
initialColor: [0.92, 0.96, 1.0, 0.85],
initialSize: [0.025, 0.055],
roughness: 0.1,
metallic: 0.0,
},
modifiers: [
{ type: 'gravity', strength: 1.5 },
{ type: 'drag', coefficient: 0.8 },
{ type: 'curl_noise', scale: 1.0, strength: 1.0, timeScale: 1.0, octaves: 1 },
// Stick on terrain: snow accumulates on whatever block it lands on, with
// the sprite reoriented along its impact-velocity direction so flakes
// appear to lie ~flat on the surface.
{ type: 'block_collision', bounce: 0, friction: 1, kill: false, stick: true },
],
renderer: { type: 'sprites', blendMode: 'alpha', billboard: 'camera', renderTarget: 'hdr' },
};
Snow uses camera-facing billboards (soft discs), very slow fall speed (gravity 1.5 m/s², high drag), long lifetimes (30–105 seconds), and curl-noise turbulence for drifting motion. The block_collision modifier uses stick: true rather than kill: true, so each flake freezes on the terrain it lands on and the camera-billboard vertex shader reorients its quad along the impact direction (see 9.8). Flakes appear to settle flat on the ground and remain visible for the rest of their (long) lifetime, producing a visual snow accumulation effect with no separate accumulation buffer. Because stuck particles skip the modifier chain entirely, they cost essentially nothing in the update pass beyond the early-out branch.
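A quick way to sanity-check maxParticles against spawnRate and lifetime is a steady-state estimate: rate times mean lifetime. The sketch below assumes lifetimes are drawn uniformly from the configured range and that no particle dies early; both are assumptions for illustration, and `steadyStateCount` is not an engine function.

```typescript
// Upper bound on live particles if none die early: rate x mean lifetime.
// Assumes a uniform lifetime distribution over [min, max]; the engine's may differ.
function steadyStateCount(spawnRate: number, lifetime: [number, number]): number {
  const meanLifetime = (lifetime[0] + lifetime[1]) / 2;
  return spawnRate * meanLifetime;
}

const rainEstimate = steadyStateCount(24000, [2.0, 3.5]);   // 66000
const snowEstimate = steadyStateCount(1500, [30.0, 105.0]); // 101250
```

Under these assumptions rain stays below its 80000-slot budget even before ground kills are counted, while the snow estimate exceeds it, which suggests that long-lived flakes may be overwritten by the spawn ring buffer before their lifetime ends.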
9.12 Sub-emitters (Splits)#
A SplitNode in a parent config creates a child ParticlePass fed by parent particles. Each split owns a fixed-size GPU spawn-request queue plus an atomic counter. The parent's update shader appends (position, velocity, color) triples into the queue when its trigger fires; the child's queue-spawn shader consumes the queue each frame.
Queue entry layout#
// ── from src/particles/particle_builder.ts (PARTICLE_HEADER_WGSL) ──
struct SpawnRequest {
position : vec3<f32>, // parent position at trigger
_sr_pad0 : f32,
velocity : vec3<f32>, // parent velocity (for inherit.velocity)
_sr_pad1 : f32,
color : vec4<f32>, // parent color (for inherit.color)
} // 48 bytes / 16-byte aligned
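The padded layout maps onto 12 consecutive floats per entry. As a hedged illustration (for example, for building test data or decoding a debug copy of the queue), the sketch below packs one request on the CPU; `writeSpawnRequest` is a hypothetical helper, since the engine fills the queue on the GPU.

```typescript
const SPAWN_REQUEST_FLOATS = 12; // 48 bytes / 4 bytes per f32

type Vec3 = [number, number, number];
type Vec4 = [number, number, number, number];

// Writes one request at `index` using the padded layout of the struct above.
function writeSpawnRequest(
  queue: Float32Array, index: number, position: Vec3, velocity: Vec3, color: Vec4,
): void {
  const base = index * SPAWN_REQUEST_FLOATS;
  queue.set(position, base);     // floats 0..2, float 3 is _sr_pad0
  queue.set(velocity, base + 4); // floats 4..6, float 7 is _sr_pad1
  queue.set(color, base + 8);    // floats 8..11
}

const queue = new Float32Array(SPAWN_REQUEST_FLOATS * 4); // room for 4 requests
writeSpawnRequest(queue, 1, [1, 2, 3], [0, -1, 0], [1, 0.5, 0.25, 1]);
```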
Parent: trigger codegen#
The parent's update shader gains a fourth bind group containing each split's queue + counter. For on_alive, a per-particle PRNG draw gates a rate * dt probability test; for on_death, the trigger fires inside the existing death branch. Both reserve count slots with a single atomicAdd, then write:
// ── generated by splitWriteWgsl() in particle_builder.ts ──
{
let _sp_base = atomicAdd(&split_0_counter, ${count}u);
if (_sp_base + ${count}u <= ${capacity}u) {
for (var _sp_j = 0u; _sp_j < ${count}u; _sp_j++) {
var _sp_req: SpawnRequest;
_sp_req.position = p.position;
_sp_req.velocity = p.velocity;
_sp_req.color = p.color;
split_0_queue[_sp_base + _sp_j] = _sp_req;
}
}
}
Queue overflow is silent (the bounds check guards against partial writes).
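The reserve-then-write pattern can be modeled on the CPU to see what "silent" means in practice. This is a single-threaded sketch with illustrative names (`reserveAndWrite`); it only mimics the control flow of the generated WGSL.

```typescript
// CPU model of the generated block: atomicAdd always advances the counter,
// but entries are written only when the whole batch fits in the queue.
function reserveAndWrite(
  counter: { value: number }, count: number, capacity: number, written: number[],
): boolean {
  const base = counter.value; // value returned by atomicAdd
  counter.value += count;     // the add happens even on overflow
  if (base + count > capacity) return false; // batch dropped, nothing written
  for (let j = 0; j < count; j++) written.push(base + j);
  return true;
}

const counter = { value: 0 };
const written: number[] = [];
const results = [3, 3, 3].map((n) => reserveAndWrite(counter, n, 8, written));
// results: [true, true, false]; written slots: 0..5; counter.value: 9
```

Note that in this model the counter ends above the capacity after an overflow, so any consumer has to bound its reads by the queue capacity as well as by the counter.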
Child: queue-spawn shader#
buildQueueSpawnShader() emits a parallel spawn shader that dispatches over the queue capacity, exits early past the live counter, and seeds children using the parent state:
// ── generated by buildQueueSpawnShader() ──
@compute @workgroup_size(64)
fn cs_main(@builtin(global_invocation_id) gid: vec3<u32>) {
let live = atomicLoad(&queue_count);
if (gid.x >= live) { return; }
let req = queue[gid.x];
let slot = atomicAdd(&queue_offset, 1u);
let idx = slot % uniforms.max_particles;
// Child emitter origin = parent particle position + child's local offset
let emitter_origin = req.position + vec3<f32>(epx, epy, epz);
// ... apply child's spawn shape, inherit.velocity, inherit.color ...
particles[idx] = p;
}
The child emitter's shape, initialSpeed, lifetime, etc. still drive the spawn; only the origin comes from the parent. inherit.velocity adds a scaled copy of the parent's velocity to the child's initial velocity (rocket-trail behavior); inherit.color replaces the child emitter's initialColor with the parent's current color.
A persistent per-child atomic queue_offset cycles through the child's particle ring buffer for slot allocation. Rare same-frame collisions with normal rate-driven spawn are tolerated — they simply overwrite, matching the existing cyclic-recycling semantics.
Pass orchestration#
ParticlePass.create() is recursive: when config.splits is non-empty the parent allocates one queueBuffer + counterBuffer per split and recursively constructs each child pass with a reference to its parent queue. Children may have their own splits — nesting is free.
parent.compute (spawn → update → writes queue) ─┐
▼
child.compute (queue-spawn → normal spawn → update)
▼
grandchild.compute (...)
The render graph orders these passes automatically — the parent declares writes on the queue buffers, each child declares reads on its parent's queue, and the graph's resource-dependency tracking takes care of the rest. Children's render passes thread HDR outputs sequentially after the parent's, so post-process effects (bloom, tonemap) see the composite.
Authoring tip#
A child emitter typically wants spawnRate: 0 so it only emits via the parent's queue. The Firework template in the editor sample (samples/particle_graph_editor.ts) shows this end-to-end: a slow rocket parent → on_death Split with count: 60 → a spawnRate: 0 sphere-burst child with gravity, drag, and a fading color curve.
9.13 Signed Distance Fields (SDFs)#
A signed distance field is a 3D texture where each texel stores the shortest distance from that point in space to a mesh surface, signed negative inside and positive outside. The gradient points away from the surface, so a single sample tells a particle "how far am I from the surface, and which way do I step to get there?" — exactly the data structure a particle system wants for surface emission and arbitrary-shape collision.
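To make the sign convention and the gradient concrete, here is a small sketch that uses an analytic sphere in place of a baked texture. The function names are illustrative; a real SdfVolume is sampled from a 3D texture instead.

```typescript
type Vec3 = [number, number, number];

// Analytic SDF of a sphere centered at the origin: negative inside, positive outside.
const sphereSdf = (p: Vec3, radius: number): number =>
  Math.hypot(p[0], p[1], p[2]) - radius;

// Central-difference gradient: two samples per axis, six in total.
function gradient(sdf: (p: Vec3) => number, p: Vec3, h = 1e-3): Vec3 {
  const g: Vec3 = [0, 0, 0];
  for (let axis = 0; axis < 3; axis++) {
    const hi: Vec3 = [p[0], p[1], p[2]];
    const lo: Vec3 = [p[0], p[1], p[2]];
    hi[axis] += h;
    lo[axis] -= h;
    g[axis] = (sdf(hi) - sdf(lo)) / (2 * h);
  }
  return g;
}

const unitSphere = (p: Vec3): number => sphereSdf(p, 1);
const inside = unitSphere([0, 0, 0]);           // -1: one unit below the surface
const outside = unitSphere([2, 0, 0]);          //  1: one unit above the surface
const outward = gradient(unitSphere, [2, 0, 0]); // approximately [1, 0, 0]
```

Stepping from `[2, 0, 0]` by `-outside * outward` lands on the surface at `[1, 0, 0]`, which is the "which way do I step" property the particle nodes rely on.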
SDFs are no longer a particle-only feature; the bake-from-mesh resource (SdfVolume), the r32float storage format, and the compute bake live in the shared src/sdf/ subsystem and are documented in §4.8 (Mesh Signed Distance Fields). This section covers only how the particle system consumes a baked SdfVolume: a graph binds at most one, and four node types use it — an emitter shape that spawns particles on the surface, an attractor that pulls particles toward it, a collider that pushes them out, and a "stick" modifier that snaps particles onto the surface on contact.
You bake one the same way any consumer does — SdfVolume.fromMesh(ctx, mesh, { worldOffset }) against a Mesh created with { keepData: true } (see §4.8) — then bind the result to the graph.
Binding to a particle graph#
A graph references its SDF via the top-level sdf field — mirroring how a heightmap is implicitly bound when block_collision is present:
// ── from src/particles/particle_types.ts ──
interface ParticleGraphConfig {
emitter: EmitterNode;
modifiers: ModifierNode[];
renderer: RenderNode;
events?: EventNode[];
splits?: SplitNode[];
/** Required when sdf_attractor / sdf_collision / sdf_surface are used. */
sdf?: SdfVolume;
}
One SDF per particle system. Multiple SDF modifiers in the same graph all sample the same bound volume; using two different SDF volumes (an attractor pulling toward shape A, a collider pushing out of shape B) is currently expressed as two particle systems. The constraint keeps the bind-group story simple; lifting it would require either an array binding or a per-modifier bind group.
ParticlePass detects the SDF-using nodes via hasSdf(config) and slots a dedicated bind group into the spawn, update, and queue-spawn pipelines (group 2 normally, group 3 when a heightmap is also present). The pass throws at construction time if a config combines block_collision + sdf_* + splits — that's three optional bind groups beyond data/uniforms, which would exceed WebGPU's 4-bind-group core limit.
The four SDF nodes#
sdf_surface spawn shape — Emits particles on the surface of the bound SDF. The spawn shader picks a uniform-random point inside the SDF's AABB, samples the distance, and projects the point onto the surface along the gradient:
// ── from spawnShapeWgsl() / 'sdf_surface' case ──
let _ss_p = mix(sdf_uni.world_min, sdf_uni.world_max, vec3<f32>(rand, rand, rand));
let _ss_d = sdf_sample(_ss_p);
let _ss_g = sdf_gradient(_ss_p);
let _ss_jit = (rand * 2.0 - 1.0) * thickness;
p.position = _ss_p - _ss_g * (_ss_d - _ss_jit);
p.velocity = _ss_g * speed;
thickness ≥ 0 jitters the projected position along the gradient so the spawn band has finite width rather than being a strict isosurface. Initial velocity is the outward normal times initialSpeed — particles spray off the surface in the direction their texel was nearest. Distribution isn't perfectly uniform over surface area (a texel deep inside maps to the same surface point as one just inside), but it's good enough for visible effects, and the sample-and-project approach is allocation-free in the shader.
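The sample-and-project step can be traced with a stand-in SDF. The sketch below uses a unit sphere with its exact gradient instead of the baked volume, and `projectToSurface` is an illustrative name for the position update shown in the WGSL.

```typescript
type Vec3 = [number, number, number];

// Stand-in for the baked volume: unit sphere and its exact gradient
// (assumes p is not the origin, where the gradient is undefined).
const sdfSample = (p: Vec3): number => Math.hypot(p[0], p[1], p[2]) - 1;
const sdfGradient = (p: Vec3): Vec3 => {
  const len = Math.hypot(p[0], p[1], p[2]);
  return [p[0] / len, p[1] / len, p[2] / len];
};

// p' = p - g * (d - jitter): move along the gradient until the distance equals `jitter`.
function projectToSurface(p: Vec3, jitter: number): Vec3 {
  const d = sdfSample(p);
  const g = sdfGradient(p);
  const step = d - jitter;
  return [p[0] - g[0] * step, p[1] - g[1] * step, p[2] - g[2] * step];
}

const onSurface = projectToSurface([0, 3, 0], 0); // lands on the isosurface
const inBand = projectToSurface([0.5, 0, 0], 0.1); // lands about 0.1 outside it
```

With jitter 0 the point lands on the isosurface; a non-zero jitter places it at that signed distance, which is how thickness widens the spawn band.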
sdf_attractor modifier — Pulls particles toward the surface. The acceleration is -strength · gradient · (distance − offset), so the pull weakens to zero at the offset surface and dampens as the particle approaches:
// ── from modifierWgsl() / 'sdf_attractor' case ──
let _sa_d = sdf_sample(p.position) - offset;
let _sa_g = sdf_gradient(p.position);
p.velocity -= _sa_g * _sa_d * strength * uniforms.dt;
Treating (distance − offset) as a scalar potential gives convergence without an explicit falloff radius. offset lets the attractor target a parallel surface (positive: a halo standing off the mesh; negative: a target inside).
sdf_collision modifier — Particles whose sampled distance is negative are pushed back along the gradient with the same bounce / friction / kill semantics as the other colliders:
// ── from modifierWgsl() / 'sdf_collision' case ──
let _scd_d = sdf_sample(p.position);
if (_scd_d < 0.0) {
let _scd_n = sdf_gradient(p.position);
p.position -= _scd_n * _scd_d; // push to surface
let _scd_vn = dot(p.velocity, _scd_n);
if (_scd_vn < 0.0) {
let _scd_norm_v = _scd_vn * _scd_n;
let _scd_tan_v = p.velocity - _scd_norm_v;
p.velocity = _scd_tan_v * (1.0 - friction) - _scd_norm_v * bounce;
}
}
bounce is the coefficient of restitution along the surface normal, friction damps the tangential component. kill: true skips the bounce and recycles the particle — useful for impact effects where the contact spawns a sub-emitter via on_death.
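The velocity response is easy to verify by hand. The following TypeScript sketch repeats the same decomposition on the CPU; `collide` is an illustrative helper and omits the position push-out and the kill path.

```typescript
type Vec3 = [number, number, number];
const dot = (a: Vec3, b: Vec3): number => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];

// Same decomposition as the WGSL: split velocity into normal and tangential parts,
// damp the tangent by friction and reflect the normal part scaled by bounce.
function collide(velocity: Vec3, normal: Vec3, bounce: number, friction: number): Vec3 {
  const vn = dot(velocity, normal);
  if (vn >= 0) return velocity; // already moving away from the surface
  const out: Vec3 = [0, 0, 0];
  for (let i = 0; i < 3; i++) {
    const normV = vn * normal[i];
    const tanV = velocity[i] - normV;
    out[i] = tanV * (1 - friction) - normV * bounce;
  }
  return out;
}

// Particle sliding along +x while falling onto a floor with normal +y.
const bounced = collide([1, -2, 0], [0, 1, 0], 0.5, 0.25); // [0.75, 1, 0]
```

The tangential speed drops from 1 to 0.75 (friction 0.25) and the downward speed of 2 becomes an upward speed of 1 (bounce 0.5).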
sdf_stick modifier — Snaps particles onto the surface on contact and pins their velocity, so they stay glued for the rest of their lifetime. When the sampled signed distance falls to threshold or below the particle is stepped onto the isosurface along the gradient and velocity is zeroed:
// ── from modifierWgsl() / 'sdf_stick' case ──
let _st_local = (sdf_uni.transform_inv * vec4<f32>(p.position, 1.0)).xyz;
if (all(_st_local >= sdf_uni.world_min) && all(_st_local <= sdf_uni.world_max)) {
let _st_d = sdf_sample(p.position);
if (_st_d <= ${threshold}) {
let _st_g = sdf_gradient(p.position);
p.position -= _st_g * _st_d;
p.velocity = vec3<f32>(0.0);
}
}
The AABB-inside guard avoids a subtle bug: sdf_sample clamps UVW to [0, 1], so a query outside the AABB returns the nearest boundary texel's value (an underestimate of the real distance). Without the guard a particle raining down well outside the AABB could see a near-zero boundary sample and stick to an invisible AABB face. Place sdf_stick after sdf_attractor and any integration-affecting modifiers (gravity, drag, noise) in the modifier list so the snap wins for the frame; with velocity pinned at zero, the trailing position-integration step is a no-op. A small positive threshold (~0.005) catches fast-moving particles that would otherwise tunnel past the surface in a single step. Other modifiers (color/size over lifetime) keep applying — the particle is frozen in place but still animates its appearance.
sdf_collision and sdf_stick are usually mutually exclusive: stick fully clamps motion on contact, so a bounce running before it would be wasted work (and the bounce velocity would just be zeroed this frame).
Runtime transform#
The SDF is baked once in a local frame and positioned at runtime (the general mechanism is described in §4.8). Concretely, the particle bind-group uniforms carry the forward transform matrix (used by sdf_surface to lift random local-AABB samples into world space), its inverse transform_inv (used by every sdf_sample call to pull world particle positions back into the SDF's local frame), and a uniform scale that converts local-frame distances back to world units:
fn sdf_sample(p: vec3<f32>) -> f32 {
let local_p = (sdf_uni.transform_inv * vec4<f32>(p, 1.0)).xyz;
// ... existing AABB lookup + trilinear interpolation on local_p ...
return result * sdf_uni.scale;
}
Host code drives the transform via ParticlePass.setSdfTransform(ctx, transform, scale) — typically called once per frame. The transform is independent of the bake, so the same SdfVolume can be moved, rotated, and uniformly scaled without re-baking. Non-uniform scale isn't supported in this path (distances would no longer match the geometry); for that, re-bake the SDF with the desired non-uniform shape baked in.
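The role of the scale factor is easiest to see with a simplified transform. The sketch below assumes translation plus uniform scale only (no rotation) so the inverse can be written inline, and it uses an analytic unit sphere as the local-frame SDF; the engine uses a full matrix and a baked texture.

```typescript
type Vec3 = [number, number, number];

// Local-frame SDF standing in for the baked texture: unit sphere at the local origin.
const localSdf = (p: Vec3): number => Math.hypot(p[0], p[1], p[2]) - 1;

// Simplified transform: uniform scale, then translation (rotation omitted).
function sdfSampleWorld(p: Vec3, translation: Vec3, scale: number): number {
  const local: Vec3 = [
    (p[0] - translation[0]) / scale,
    (p[1] - translation[1]) / scale,
    (p[2] - translation[2]) / scale,
  ];
  return localSdf(local) * scale; // convert the local distance back to world units
}

// Sphere moved to x = 10 and scaled to radius 2; the query point sits at x = 14.
const worldDistance = sdfSampleWorld([14, 0, 0], [10, 0, 0], 2); // 2
```

Without the final multiplication the function would return 1 (the local-frame distance), half the true world-space distance to the scaled sphere.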
Why this fits the particle pipeline#
SDFs slot into the existing pipeline without disturbing it:
- Spawn shape is just another WGSL snippet generated by spawnShapeWgsl(), called once per spawned particle.
- Attractor / collider are ordinary modifiers in the per-frame update loop: they read p.position, write p.velocity (and p.position for collision), exactly like every other modifier.
- Sampling is bounded: one texture lookup per sdf_sample (eight textureLoads for the manual trilinear), one gradient = six samples. Compared to a brute-force triangle test against a mesh, that's a constant cost regardless of mesh complexity.
- One bind group: texture + sampler + uniform; no per-modifier state, no specialized pipelines beyond the existing spawn/update shader codegen.
The cost model is uniform across all four uses: a particle that touches the SDF pays for a handful of texture loads per frame. The expensive work is the one-time bake, not the per-frame query.
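The "eight loads" figure comes from manual trilinear filtering. A minimal CPU sketch of that filter follows; the grid layout, the texel-corner mapping of [0, 1] to indices, and all names are assumptions for illustration (each dimension must have at least two texels), not the engine's sampling code.

```typescript
interface SdfGrid { nx: number; ny: number; nz: number; data: Float32Array }

// Texel fetch, x-fastest layout (a stand-in for textureLoad on the 3D texture).
const loadTexel = (g: SdfGrid, x: number, y: number, z: number): number =>
  g.data[(z * g.ny + y) * g.nx + x];

const lerp = (a: number, b: number, t: number): number => a + (b - a) * t;

// Splits a clamped [0, 1] coordinate into a base texel index and a blend weight.
function cell(u: number, n: number): [number, number] {
  const f = Math.min(Math.max(u, 0), 1) * (n - 1);
  const i = Math.min(Math.floor(f), n - 2);
  return [i, f - i];
}

// Manual trilinear filter: eight loads, blended along x, then y, then z.
function sampleTrilinear(g: SdfGrid, u: number, v: number, w: number): number {
  const [x, tx] = cell(u, g.nx);
  const [y, ty] = cell(v, g.ny);
  const [z, tz] = cell(w, g.nz);
  const c00 = lerp(loadTexel(g, x, y, z), loadTexel(g, x + 1, y, z), tx);
  const c10 = lerp(loadTexel(g, x, y + 1, z), loadTexel(g, x + 1, y + 1, z), tx);
  const c01 = lerp(loadTexel(g, x, y, z + 1), loadTexel(g, x + 1, y, z + 1), tx);
  const c11 = lerp(loadTexel(g, x, y + 1, z + 1), loadTexel(g, x + 1, y + 1, z + 1), tx);
  return lerp(lerp(c00, c10, ty), lerp(c01, c11, ty), tz);
}

// 2x2x2 grid whose value equals the x index, so the filtered value equals u.
const grid: SdfGrid = { nx: 2, ny: 2, nz: 2, data: new Float32Array([0, 1, 0, 1, 0, 1, 0, 1]) };
const filtered = sampleTrilinear(grid, 0.25, 0.7, 0.3); // 0.25
```

A six-sample gradient built on this filter therefore costs 48 texel loads, independent of how many triangles the source mesh had.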
9.14 Expression Graphs#
Most modifiers (gravity, drag, color-over-lifetime, …) are fixed-shape primitives — each one is a single WGSL snippet templated on a small handful of parameters. That covers the common cases but stops short of "I want this particle to follow a Lissajous curve" or "fade alpha when the particle leaves this radius." For everything in between, the engine ships a small expression-graph IR that compiles down to the same WGSL emission path as every other modifier — slotted in as a new expression ModifierNode that runs in spawn, update, or on_death event scope.
The IR has two layers. Every value is an Expr node (constants, attribute reads, math/vector ops, time, random); every side effect is an Action node (setAttribute, setVariable, if/else). Two equivalent surface forms — a compact recursive AST and a flat node-and-edge graph — both lower to the same WGSL block.
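As a rough illustration of the two-layer idea, the toy compiler below lowers a value tree and a list of set-actions to a WGSL block. It is a deliberately tiny stand-in with made-up types (`MiniExpr`, `MiniAction`), not the engine's Expr/Action definitions or compileExpressionGraph().

```typescript
// Toy two-layer IR: values (MiniExpr) and side effects (MiniAction).
type MiniExpr =
  | { kind: 'const'; value: number }
  | { kind: 'attr'; name: string }
  | { kind: 'add' | 'mul'; a: MiniExpr; b: MiniExpr };

type MiniAction = { kind: 'set'; attr: string; value: MiniExpr };

// WGSL float literals need a decimal point or exponent; integers get ".0".
const literal = (v: number): string => (Number.isInteger(v) ? v.toFixed(1) : String(v));

function emitExpr(e: MiniExpr): string {
  switch (e.kind) {
    case 'const': return literal(e.value);
    case 'attr': return `p.${e.name}`;
    case 'add': return `(${emitExpr(e.a)} + ${emitExpr(e.b)})`;
    case 'mul': return `(${emitExpr(e.a)} * ${emitExpr(e.b)})`;
  }
}

// Each action becomes one statement inside a braced block.
function emitActions(actions: MiniAction[]): string {
  const lines = actions.map((a) => `  p.${a.attr} = ${emitExpr(a.value)};`);
  return ['{', ...lines, '}'].join('\n');
}

// velocity = velocity * 0.5
const wgsl = emitActions([
  { kind: 'set', attr: 'velocity',
    value: { kind: 'mul', a: { kind: 'attr', name: 'velocity' }, b: { kind: 'const', value: 0.5 } } },
]);
```

Because the output is an ordinary braced block, it can sit next to the snippets of the fixed-shape modifiers in the generated shader.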
Two equivalent forms#
The system has two authoring layers that round-trip through the same compiler:
| Form | File | When to use |
|---|---|---|
Recursive AST (ExpressionGraph) |
src/particles/particle_expr.ts | Hand-coding. expr and action builder helpers keep call sites terse. |
Node graph (ExprNodeGraph) |
src/particles/particle_expr_graph.ts | What the visual editor saves. Preserves wiring (one output → many consumers) for visualization. |
The expression modifier accepts either form: they are disambiguated structurally ('actions' in graph vs 'nodes' in graph), and both flow through compileExpressionGraph().
Building the AST form#
expr.* constructors build value nodes; action.* constructors build side-effecting statements. Number literals are auto-wrapped in constF32, so the code reads more like math than IR construction:
// ── AST form (illustrative — compare with the node-graph version below) ──
import { expr, action } from '../src/particles/index.js';
const onSpawn: ExpressionGraph = {
variables: [
// angle: ring phase + a tiny per-spawn jitter so consecutive particles
// don't stack on a single point when omega is small.
{ name: 'angle', type: 'f32',
initial: expr.add(
expr.mul(expr.time(), state.omega),
expr.random({ mode: 'perSpawn', min: -0.04, max: 0.04, seed: 1 })) },
// Pre-compute cos/sin once — used by both position and velocity.
{ name: 'ca', type: 'f32', initial: expr.cos(expr.variable('angle')) },
{ name: 'sa', type: 'f32', initial: expr.sin(expr.variable('angle')) },
],
actions: [
action.set('position', expr.add(
expr.emitterOrigin(),
expr.vec3(
expr.mul(expr.variable('ca'), state.radius),
expr.mul(expr.variable('sa'), state.radius),
0))),
action.set('velocity', expr.vec3(
expr.mul(expr.neg(expr.variable('sa')), state.flingSpeed),
expr.mul(expr.variable('ca'), state.flingSpeed),
expr.random({ mode: 'perSpawn', min: -state.spread, max: state.spread, seed: 2 }))),
],
};
Building the node-graph form#
ExprGraphBuilder is a small mutable builder that returns each node handle, so chained connect() calls don't juggle raw ids. Inline literals on a port (inputValues) replace what would otherwise be standalone Constant nodes:
// ── from samples/spinning_sparks_graph.ts ──
import { ExprGraphBuilder } from '../src/particles/index.js';
const g = new ExprGraphBuilder();
// angle = time * omega + random[-0.04, 0.04]
const time = g.add('time');
const omegaT = g.add('mul', { inputValues: { b: state.omega } });
g.connect(time, 'value', omegaT, 'a');
const jitter = g.add('random', { params: { mode: 'perSpawn', min: -0.04, max: 0.04, seed: 1 } });
const angle = g.add('add');
g.connect(omegaT, 'value', angle, 'a');
g.connect(jitter, 'value', angle, 'b');
// One compound Set writes both attributes in a single execution step.
const setSpawn = g.add('setAttribute', {
params: { entries: [
{ attr: 'position', mode: 'overwrite' },
{ attr: 'velocity', mode: 'overwrite' },
] },
});
// ... wire `position` and `velocity` Expr outputs into the matching ports ...
Particle attributes#
Every expression-graph block runs against a single live particle (p). Attributes available for read and write:
| Attribute | WGSL type | Notes |
|---|---|---|
| position, velocity | vec3<f32> | |
| color | vec4<f32> | rgb / alpha are sub-accessors (writing rgb preserves alpha) |
| size, rotation | f32 | |
| age, max_age | f32 | Lifetime in seconds; age is the current life |
| normalized_age | f32 | Read-only: age / max(max_age, 1e-6), clamped to avoid /0 on spawn |
setAttribute supports six composition modes: overwrite (default), add, subtract, multiply, min, max. A compound setAttribute with an entries array writes several attributes in one execution step (each one getting its own input port named after the attribute).
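A scalar model of the six modes makes their intent explicit. The sketch assumes that the current attribute value is the left operand (so subtract means current minus value); that operand order is an assumption, and `compose` is not an engine function.

```typescript
type SetMode = 'overwrite' | 'add' | 'subtract' | 'multiply' | 'min' | 'max';

// How each mode might combine the current attribute value with the incoming one.
function compose(mode: SetMode, current: number, value: number): number {
  switch (mode) {
    case 'overwrite': return value;
    case 'add': return current + value;
    case 'subtract': return current - value; // assumed operand order
    case 'multiply': return current * value;
    case 'min': return Math.min(current, value);
    case 'max': return Math.max(current, value);
  }
}

// Example: damp a size of 2.0 by 25% with the multiply mode.
const damped = compose('multiply', 2.0, 0.75); // 1.5
```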
Sources, math, and effects#
Every node kind is listed in NODE_KINDS (particle_expr_graph.ts), grouped by category for the editor palette:
- Sources (no inputs): constF32/constVec3/constVec4, time, deltaTime, particleId, emitterOrigin, attribute (read a particle attribute, optionally compound with one output per entry), random, variableGet.
- Math (unary): neg, abs, sign, trig (sin/cos/tan/asin/acos/atan), exp/log/sqrt, floor/ceil/fract, saturate.
- Math (binary): add/sub/mul/div/mod, min/max/pow/atan2/step.
- Vector: length, normalize, dot, cross, makeVec3, swizzle (e.g. 'xz', 'rgb', 'xyzw').
- Composite: mix, clamp, smoothstep, select (branchless ternary).
- Compare → bool: lt/le/gt/ge/eq/neq, and/or.
- Effects (statements): setAttribute (compound), setVariable. Action.if provides if/else control-flow in the AST form.
Random modes#
The random node draws from a pcg_hash PRNG; the seed source depends on mode:
| Mode | Seed source | When to use |
|---|---|---|
| perParticle | idx (slot index) | Stable per particle every frame: random size, hue, lifetime bias |
| perFrame | idx ^ frame_seed | Re-draws each frame: stochastic forces, twinkle |
| perSpawn | seed (cumulative spawn index) | Only valid in spawn scope: random initial spread |
Each random node also carries a small integer salt (seed param) that disambiguates several random calls in the same graph, so they don't all draw the same value.
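The behavior of the three modes can be reproduced on the CPU. The sketch below uses the widely published PCG hash (RXS-M-XS variant) and picks the seed per mode as in the table; how the salt is mixed in (`Math.imul(salt, 0x9e3779b9)`) and the names `random01` / `SeedContext` are assumptions, not the generated WGSL.

```typescript
type RandomMode = 'perParticle' | 'perFrame' | 'perSpawn';
interface SeedContext { idx: number; frameSeed: number; spawnSeed: number }

// 32-bit PCG hash (RXS-M-XS); Math.imul and >>> 0 keep the arithmetic in u32.
function pcgHash(input: number): number {
  const state = (Math.imul(input, 747796405) + 2891336453) >>> 0;
  const word = Math.imul((state >>> ((state >>> 28) + 4)) ^ state, 277803737) >>> 0;
  return ((word >>> 22) ^ word) >>> 0;
}

// Seed source per mode, following the table above.
function seedFor(mode: RandomMode, ctx: SeedContext): number {
  switch (mode) {
    case 'perParticle': return ctx.idx >>> 0;
    case 'perFrame': return (ctx.idx ^ ctx.frameSeed) >>> 0;
    case 'perSpawn': return ctx.spawnSeed >>> 0;
  }
}

// Uniform value in [0, 1); the salt separates several random nodes in one graph.
function random01(mode: RandomMode, ctx: SeedContext, salt: number): number {
  const mixed = (seedFor(mode, ctx) ^ Math.imul(salt, 0x9e3779b9)) >>> 0; // assumed mixing
  return pcgHash(mixed) / 4294967296;
}

const frameA: SeedContext = { idx: 7, frameSeed: 100, spawnSeed: 42 };
const frameB: SeedContext = { idx: 7, frameSeed: 101, spawnSeed: 42 };
const stable = random01('perParticle', frameA, 1) === random01('perParticle', frameB, 1);
const redrawn = random01('perFrame', frameA, 1) !== random01('perFrame', frameB, 1);
```

Here `stable` and `redrawn` are both true: the per-particle draw ignores the frame seed, while the per-frame draw changes with it.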
Compilation scopes#
A graph compiles differently depending on where it runs:
- 'update': regular per-frame modifier list and on_death actions. Has idx (slot index), p (mutable Particle), t (normalized age), emitter_origin, uniforms.time, uniforms.dt.
- 'spawn': on_spawn event actions inside the spawn shader. Has idx, p (just-initialized), seed (per-spawn PCG seed), emitter_origin, uniforms.time, uniforms.dt. No t (always zero).
particle_builder.ts picks the scope when emitting the modifier WGSL, so the same ExpressionGraph can be reused in either context — the only difference is which random-seed expression is generated.
Why it composes cleanly#
Expression graphs don't bypass the modifier pipeline — they slot into it. Order in the surrounding modifiers list matters: an expression placed after gravity sees the updated velocity. A graph that reads velocity after a drag modifier sees the post-drag value. This means complex behaviors can be expressed as small expression-graph snippets composed with the bundled primitives, rather than as a monolithic "do everything" graph.
9.15 Visual Node-Graph Editor#
The expression IR is meant to be authored visually as well as in code. The particle_graph_editor sample is a live editor that builds a ParticleGraphConfig from canvas nodes, recompiles the GPU passes on every change, and runs the resulting system in the background scene so the effect updates in real time.
Two-tier visual structure#
The editor distinguishes two flavors of wiring, drawn differently so the chain reads at a glance:
- Execution edges: thick white arrows with triangular ports on the node headers. They sequence Emitter → Modifier → … → Renderer, run an on_spawn/on_death trigger into its action chain, and feed a Split into its child Emitter.
- Data edges: thin colored wires between expression-node value ports. The stroke color is the inferred output type (TYPE_COLOR in particle_expr_graph_types.ts): f32 green, vec2 teal, vec3 blue, vec4 pink, bool lavender. The same palette colors the port circles.
Expression nodes draw with the same per-row input/output port layout shown in the standalone graph illustration above — input ports stack down the left edge, output ports stack down the right edge, and a port with no incoming edge shows its inline literal (or the kind-wide default) right next to the port label.
Templates#
A Template dropdown swaps the entire graph for a pre-built showcase. Each template seeds state.nodes/state.edges and triggers a particle-pass rebuild:
| Template | What it shows |
|---|---|
| empty | Blank canvas; start from scratch |
| default | Simple Emitter + gravity + color/size ramps + sprite Renderer |
| firework | Parent rocket → on_death Split bursting child sparks + on_alive trail Split |
| fire, smoke, campfire | Procedural-shape sprite renderers from §9.11 |
| mesh | Mesh renderer (instanced PBR cube) with align: 'velocity' |
| sdf | SDF attractor + collision against a baked mesh SDF |
| sdf_sticky | SDF attractor + sdf_stick (particles freeze on contact) |
| expression_demo | Spinning sparks built from an ExprGraphBuilder ring-math graph |
| spinning_sparks | Same as above but assembled inline as the editor's serialized node form |
Type inference#
inferGraphTypes() walks the graph in reverse-topological order and assigns a concrete WGSL type (bool/f32/vec2/vec3/vec4) to every output port. Polymorphic kinds (most binary math is any → any) take the widest input type — so mul(vec3, f32) resolves to vec3 like WGSL does. The editor uses this to color ports and edges, giving visual feedback when a connection would change the result type.
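The "widest input wins" rule for polymorphic nodes can be sketched in a few lines. The ranking below (bool lowest, vec4 highest) and the `widest` helper are assumptions for illustration; the real inferGraphTypes() may handle more cases.

```typescript
type WgslType = 'bool' | 'f32' | 'vec2' | 'vec3' | 'vec4';

// Assumed ordering from narrowest to widest.
const WIDTH: Record<WgslType, number> = { bool: 0, f32: 1, vec2: 2, vec3: 3, vec4: 4 };

// Result type of a polymorphic node: the widest of its (non-empty) input types.
function widest(inputs: WgslType[]): WgslType {
  return inputs.reduce((acc, t) => (WIDTH[t] > WIDTH[acc] ? t : acc));
}

const mulResult = widest(['vec3', 'f32']); // 'vec3', as for mul(vec3, f32)
```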
Read-only overlay#
The same node layout is also available as a read-only overlay (samples/lib/particle_graph_overlay.ts) — pressing P in any sample that opted in pops up the editor's visualization for the currently-running ParticleGraphConfig. Expression modifiers are expanded inline as a layered node cluster rather than collapsed to an opaque "Expression" block, so a hand-built ExprNodeGraph (e.g. spinning_sparks_graph.ts) shows up with the same wiring the editor would have produced.
9.16 Summary#
The GPU-driven particle system features:
- Five- (or six-) stage pipeline: Spawn → Update → Compact → Indirect Write → optional Translate → Render, all on GPU
- Configurable graphs:
ParticleGraphConfigwith emitter, modifier, and renderer nodes; per-modifierspace: 'world' | 'emitter'so attractors/colliders can follow the GameObject transform - Rich modifier library: forces (gravity, drag, force, swirl, vortex, curl noise, turbulence, wind, point attractor, radial force, speed limit), appearance (size/rotation/color/alpha curves, size-by-speed,
box_color,velocity_color), and collision (bounds-kill, plane / sphere / heightmap / SDF colliders with kill-or-bounce semantics, plusblock_collision'sstick: truemode for snow-on-terrain accumulation andsdf_stickfor snap-and-pin behavior) - Signed distance fields: bind a mesh SDF baked by the shared
src/sdf/subsystem (§4.8) and use it as a surface emitter, attractor, collider, or stick target — see 8.13 - Expression graphs: composable scripting layer (
expressionModifierNode) with two equivalent surface forms — recursive AST (expr/action) and flat node graph (ExprGraphBuilder). Sources, math, vector, composite, compare, and effect nodes; per-particle / per-frame / per-spawn random; runs in spawn, update, oron_deathscope — see 8.14 - Shader generation: specialized WGSL emitted per config — no runtime branching on uniforms
- Indirect rendering: Compact pass produces dense alive lists; indirect draw eliminates CPU round-trips
- Three render paths:
  - Forward HDR sprites with six shape shaders (soft, pixel, line, fire, smoke, ember), tunable via per-renderer emit/stretch/thickness/size/sunDirection/sunColor
  - Deferred G-Buffer sprites (albedo + normal + emissive)
  - Instanced PBR meshes: each live particle drawn as an instance of a real Mesh + Material with align: 'world' | 'velocity', shared with the forward lighting/IBL bind group in HDR mode
- Sub-emitters: SplitNode lets a parent system spawn child sub-systems via a GPU spawn-request queue, triggered on_alive (rate-based) or on_death, with optional velocity/color inheritance; children recurse naturally for nested effects (fireworks, flares, trails, campfireConfig)
- Live node-graph editor: the particle_graph_editor sample composes multiple particle systems from one graph (one ParticlePass per renderer node) with eleven templates (Empty/Default/Firework/Fire/Smoke/Campfire/Mesh/SDF/SDF Sticky/Expression/Spinning Sparks), an "Add Node" menu covering every modifier and expression kind, color-coded data wires, white execution arrows, and a read-only overlay reusable from any sample (see 9.15)
- Weather integration: Rain and snow configs driven by the weather system
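
The Compact and Indirect Write stages summarized above can be pictured with a small CPU reference. This is only a sketch under stated assumptions: the function names, the "life greater than zero means alive" rule, and the six-vertices-per-quad default are illustrative and are not taken from particle_compact.wgsl. The four-value argument layout is the one WebGPU defines for a non-indexed indirect draw.

```typescript
// CPU reference for the idea behind the Compact and Indirect Write passes.
// Names and buffer layouts are assumptions for illustration; the actual
// shaders run in parallel on the GPU and may be organized differently.

/** Pack the indices of live particles into a dense list; returns the alive count. */
function compactAlive(life: Float32Array, aliveList: Uint32Array): number {
  let count = 0;
  life.forEach((remaining, index) => {
    // Assumed liveness rule: a particle is alive while its remaining life is positive.
    if (remaining > 0) aliveList[count++] = index;
  });
  return count;
}

/**
 * Fill the arguments of a non-indexed indirect draw, in WebGPU's order:
 * [vertexCount, instanceCount, firstVertex, firstInstance].
 */
function writeIndirectArgs(
  args: Uint32Array,
  aliveCount: number,
  verticesPerQuad = 6, // assumption: each billboard is two triangles
): void {
  args[0] = verticesPerQuad;
  args[1] = aliveCount; // one instance per live particle
  args[2] = 0;
  args[3] = 0;
}

// Usage: five particle slots, three of them alive.
const lifeBuffer = new Float32Array([1.0, 0.0, 0.5, -1.0, 2.0]);
const aliveIndices = new Uint32Array(lifeBuffer.length);
const drawArgs = new Uint32Array(4);

const aliveCount = compactAlive(lifeBuffer, aliveIndices);
writeIndirectArgs(drawArgs, aliveCount);
// aliveIndices now starts with 0, 2, 4 and drawArgs holds 6, 3, 0, 0.
```

Since the instance count lives in a GPU buffer, the render pass can draw exactly the live particles without the CPU ever reading the count back. A parallel GPU version would typically reserve list slots with an atomic counter, so unlike this sequential sketch it would not necessarily preserve particle order.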
File Reference
| File | Purpose |
|---|---|
| src/particles/particle_types.ts | ParticleGraphConfig, EmitterNode, ModifierNode, RenderNode type definitions |
| src/particles/particle_builder.ts | WGSL code generation for spawn and update shaders |
| src/particles/particle_expr.ts | Recursive AST for the expression modifier: Expr / Action / ExpressionGraph + expr / action builder helpers |
| src/particles/particle_expr_compile.ts | compileExpressionGraph(graph, scope): lowers an ExpressionGraph to a WGSL { ... } block |
| src/particles/particle_expr_graph.ts | Node-graph form (ExprNodeGraph), NODE_KINDS registry, ExprGraphBuilder, compound setAttribute/attribute entry helpers |
| src/particles/particle_expr_graph_compile.ts | compileExprNodeGraph(): lowers the node-graph form to an ExpressionGraph AST |
| src/particles/particle_expr_graph_types.ts | Type inference for the node graph; TYPE_COLOR palette used by the editor and overlay |
| src/shaders/particles/particle_compact.wgsl | Compact and indirect-write compute shaders |
| src/shaders/particles/particle_render.wgsl | Deferred G-Buffer billboard render shader (albedo + normal + emissive) |
| src/shaders/particles/particle_render_forward.wgsl | Forward HDR billboard render shader: two vertex stages (vs_main, vs_camera) and shape-specific fragment stages (fs_main, fs_snow, fs_pixel, fs_line, fs_fire, fs_smoke, fs_ember) |
| src/shaders/particles/particle_instance_translate.wgsl | Mesh-mode compute that packs {model, normalMatrix, colorTint} into the per-instance buffer for indexed indirect draws |
| src/renderer/render_graph/passes/particle_pass.ts | ParticlePass: full pipeline create/update/execute/destroy; exposes optional heightmapWriter dep so an external compute pass can bake the block_collision heightmap on GPU |
| crafty/game/world_heightmap_gpu.ts | Crafty's GPU-resident world top-Y texture; CPU refreshes per-chunk patches as the player streams chunks in |
| crafty/game/world_rain_heightmap_pass.ts | Per-frame compute pass that samples WorldHeightmapGpu to bake the rain/snow particle heightmap (crafty voxel terrain) |
| samples/terrain/rain_heightmap_pass.ts | Equivalent bake pass that samples the CDLOD terrain's VT atlas directly |
| samples/terrain/terrain_rain_heightmap.wgsl / crafty/game/world_rain_heightmap.wgsl | Compute shaders for the two bakers |
| src/sdf/sdf_volume.ts | SdfVolume runtime resource + bakeFromMesh / fromMesh helpers (shared subsystem, see §4.8) |
| src/sdf/sdf_math.ts | Pure CPU helpers: computeAabb, closestPointOnTri, bakeSdfCpu (test oracle) |
| src/shaders/sdf/sdf_bake.wgsl | Per-texel SDF bake compute shader (point-to-triangle, signed by face normal) |
| samples/particle_graph_editor.ts | Live visual editor: builds a ParticleGraphConfig from canvas nodes, recompiles on every edit |
| samples/lib/particle_graph_overlay.ts | Read-only graph overlay reusable from any sample (P to toggle) |
| samples/spinning_sparks.ts / samples/spinning_sparks_graph.ts | AST-form and node-graph-form versions of the same ring-orbit on_spawn expression |
| crafty/config/particle_configs.ts | Rain and snow particle configurations |
Further reading:
- src/renderer/render_graph/passes/particle_pass.ts: Complete particle pass implementation
- src/particles/particle_builder.ts: WGSL shader generation
- crafty/config/particle_configs.ts: Rain and snow configs
- crafty/game/weather_system.ts: Weather integration with particle passes