Chapter 24: Heightmap Terrain System
Chapter 17 describes how Crafty builds a Minecraft-style world out of 16×256×16 voxel chunks. That's one terrain style — a discrete cube grid that the player can dig and place blocks into. The engine ships a second, totally different terrain implementation alongside it: a continuous, heightmap-based renderer designed for very large outdoor worlds where the player just walks around. The two implementations don't share code; they don't even share an art style. This chapter is about the second one.
The heightmap terrain lives in src/terrain/ and is exercised by the mountain_fox, grassy_hills, and bounded_heightmap samples. Its feature set targets the same ground as a modern AAA landscape system: continuous LOD with no popping or cracks, virtually-textured heightmaps streamed across a world larger than memory, four-layer PBR splat materials, cascade-shadow integration, and GPU-driven indirect rendering with no per-frame CPU readback. The terrain's heights can come from procedural noise or from a supplied heightmap image (§24.10), and the world can run as a large camera-streamed expanse or be constrained to a fixed tile area — a finite island or arena (§24.11). All of it is one orchestrator class — TerrainSystem — plus five WGSL files and three render-graph passes.
24.1 What TerrainSystem Owns
TerrainSystem is the only stateful object the host needs to construct. Its create(ctx) factory builds every persistent GPU resource the three render-graph passes need, and its update(camera) does the per-frame uniform upload and page-streaming work. The host then adds TerrainLodPass, TerrainShadowPass, and TerrainGeometryPass to the render graph in that order.
// ── from samples/mountain_fox.ts (typical wiring) ──
const terrain = TerrainSystem.create(ctx, { heightScale: 130, quality: TerrainQuality.Medium });
await terrain.loadLayerTextures({ albedo: [...], normal: [...], ... });
const lodPass = new TerrainLodPass(terrain);
const shadowPass = TerrainShadowPass.create(ctx, terrain);
const geomPass = TerrainGeometryPass.create(ctx, terrain);
// per frame:
terrain.update(camera);
const lod = lodPass.addToGraph(graph);
const shadow = shadowPass.addToGraph(graph, { cascades, shadowMap, ...lod });
const gbuf = geomPass.addToGraph(graph, { gbuffer, ...lod });
The breakdown of what TerrainSystem owns falls into four buckets:
| Bucket | Resources |
|---|---|
| Virtual texture | atlas_tex (rgba16f, 256² × numSlots layers, default 16), page_table SSBO (pagesPerSide² u32s, default 1024), page-gen compute pipeline + per-slot bind groups, a height-source texture (the procedural-mode dummy or a supplied heightmap), VirtualPageManager (CPU LRU). |
| LOD + draw state | camera UBO (80 B), params UBO (48 B — world_rect + geom_params + misc_params), patches[] SSBO (8192 × 16 B), indirect_args (20 B), the persistent 33² patch index buffer (6 144 indices), LOD compute pipeline. |
| PBR layer textures | Four texture_2d_array<rgba8> arrays (1024² × 4 slices each) for albedo / normal / roughness / displacement, plus a repeating layer sampler and a clamp-to-edge atlas sampler. |
| Shared bind groups | geomBindGroup (11 entries — used by the geometry pass), shadowGeomBindGroup (6 entries — depth-only subset for the shadow pass), lodBindGroup (4 entries — for the LOD compute). |
Two things to notice from the table. First, the patches buffer and the indirect-args buffer are both sized statically at startup; the LOD compute will rewrite them every frame but never reallocates. Second, the geometry pass and the shadow pass share the same patches[] and the same indirect_args — the LOD compute runs once per frame and feeds both. We come back to this when we look at the render-graph wiring in §24.8.
The world rect, page-grid size, and resident-slot count shown as defaults above are all create() options. Leaving them unset reproduces the classic streamed 8192-unit world; setting them shrinks the system to a fixed region (§24.11). The height-source texture defaults to a 1×1 placeholder that the procedural path never reads; setHeightmap() swaps in a real image (§24.10).
24.2 The Patch Quadtree and CDLOD
The world is one big square — 8192 × 8192 world units centered on the origin by default, or whatever worldOrigin / worldExtent the host passes to create(). It is rendered as a continuous distance-LOD quadtree (CDLOD) — a tree of square patches, each rasterized as the same shared 33×33 vertex grid, with per-vertex morphing between adjacent LOD levels to hide the discrete jumps.
The quadtree root is the world rect: the LOD shader reads params.world_rect for the root origin and size rather than a baked-in constant, so the same shader drives both an 8192-unit expanse and a 1 km island. MAX_DEPTH is fixed at 8, so the finest patch is always worldExtent / 256 world units across — 32 m in the default world, 4 m in a 1 km one.
A patch close to the camera is small (and therefore high-detail per world meter); a patch far from the camera is large. The quadtree is implicit — there is no tree data structure in memory. At every frame the GPU rebuilds the cut, picking the right LOD for every region of the world based on where the camera is.
Selection: one thread per max-depth leaf
LOD selection happens entirely on the GPU in terrain_lod.wgsl. The shader is dispatched as (1 << MAX_DEPTH)² / 8² workgroups of 8×8 threads each. With MAX_DEPTH = 8 that's 32×32 workgroups, which gives a 256×256 thread grid: one thread per potential finest-LOD leaf cell.
Each thread walks the quadtree from the root downward. At each depth it computes:
- level_size: the side length of a patch at this depth (ROOT_SIZE / 2^depth)
- dist: the camera-to-node nearest-point distance (XZ only; height is ignored)
- The subdivision rule:
dist < (level_size * 0.5) * lod_distance_scale
If the rule says "subdivide", the thread descends to the next level. If it says "stop", the thread is on its final LOD — but it emits a patch only if it is the representative thread of that node (the bottom-left thread of the node's gid range). All other threads inside the same node bail out without emitting. This way each cell of the quadtree cut is written exactly once with no duplicates, no atomics for deduplication, and no inter-thread communication.
// ── from src/shaders/terrain/terrain_lod.wgsl ──
let should_subdivide = depth < MAX_DEPTH && dist < (level_size * 0.5) * lod_scale;
if (!should_subdivide) {
  let mask = (1u << shift) - 1u;
  if ((gid.x & mask) == 0u && (gid.y & mask) == 0u) {
    let idx = atomicAdd(&indirect_args.instance_count, 1u);
    if (idx < MAX_PATCHES) {
      patches[idx].offset_x = origin_x;
      patches[idx].offset_z = origin_z;
      patches[idx].size = level_size;
    }
  }
  return;
}
The single atomicAdd into indirect_args.instance_count is the only synchronization point in the whole pass. It returns the slot index where the patch descriptor goes, and it's the count drawIndexedIndirect will consume one pass later — no CPU readback, no compaction.
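The representative-thread rule can be checked on the CPU. The following is a minimal TypeScript simulation of the selection loop, not engine code: `selectPatches`, the `Patch` interface, and the parameter names are hypothetical, and the nearest-point distance is assumed to be a plain XZ point-to-box distance. It visits the same thread grid the shader would run and collects what each thread would emit.

```typescript
// Hypothetical CPU simulation of the GPU selection pass (illustration only).
interface Patch { offsetX: number; offsetZ: number; size: number; }

function selectPatches(
  camX: number, camZ: number,
  originX: number, originZ: number, extent: number,
  maxDepth: number, lodScale: number,
): Patch[] {
  const grid = 1 << maxDepth;           // threads per side (256 for maxDepth = 8)
  const patches: Patch[] = [];
  for (let gy = 0; gy < grid; gy++) {
    for (let gx = 0; gx < grid; gx++) {
      // Each "thread" walks from the root towards its own leaf cell.
      for (let depth = 0; depth <= maxDepth; depth++) {
        const shift = maxDepth - depth; // log2(threads per node side)
        const levelSize = extent / (1 << depth);
        const ox = originX + (gx >> shift) * levelSize;
        const oz = originZ + (gy >> shift) * levelSize;
        // Nearest-point distance from the camera to the node's XZ square.
        const dx = Math.max(ox - camX, 0, camX - (ox + levelSize));
        const dz = Math.max(oz - camZ, 0, camZ - (oz + levelSize));
        const dist = Math.hypot(dx, dz);
        const subdivide = depth < maxDepth && dist < levelSize * 0.5 * lodScale;
        if (subdivide) continue;
        // Only the thread at the node's minimum corner emits the patch.
        const mask = (1 << shift) - 1;
        if ((gx & mask) === 0 && (gy & mask) === 0) {
          patches.push({ offsetX: ox, offsetZ: oz, size: levelSize });
        }
        break;
      }
    }
  }
  return patches;
}
```

Every thread inside a node reaches the same decision at every level, so the emitted squares cover the world rect exactly once. That is why the shader needs no deduplication step beyond the single counter.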
Per-vertex morph
Each emitted patch is a 33×33 grid of vertices. The vertex shader derives (i, j) from vertex_index, the world XZ from (offset + uv * size), and the height from a virtual-texture sample. The trick that makes the seams between adjacent LOD patches invisible is the morph:
let camp = camera.position.xyz;
let probe = vec3f(fine_xz.x, camp.y, fine_xz.y);
let d = distance(camp, probe);
let lod_max = tile.size * lod_dist_scale;
let lod_start = lod_max * (1.0 - morph_zone);
let t = clamp((d - lod_start) / max(lod_max - lod_start, 1e-4), 0.0, 1.0);
let world_xz = mix(fine_xz, even_xz, t);
Each odd-indexed vertex of the 33² grid morphs smoothly toward its nearest even-indexed neighbor as the camera distance approaches this patch's lod_max. By the time the patch is about to age out (its size doubles in the next LOD), every odd vertex has reached its even neighbor — the rasterized grid now exactly matches the coarser-LOD grid that's about to replace it. The next-coarser patch starts at t = 0 (un-morphed), so its vertices live on positions every fine patch was also sampling at the seam.
The constants matter. For the seam to close, the coarse patch's far corner (the vertex furthest from the camera) must still be short of its own morph zone, so that the whole patch sits at t = 0 even when it is as close to the camera as a leaf of that size can get. A coarse patch is emitted as a leaf when its nearest-point distance is at least 0.5 * size * scale. Its far corner is at most √2 * size further. So the constraint is:
0.5·size·scale + √2·size < (1 − morph_zone)·size·scale
scale · (0.5 − morph_zone) > √2 ≈ 1.414
scale = 5, morph_zone = 0.15 gives 1.75 > √2 — comfortable margin. Earlier values (scale = 3, morph_zone = 0.30) gave 0.60 and produced visible 1-texel boundary cracks at every cross-LOD seam. The mountain_fox deep dive walks through the derivation in more detail.
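The inequality can be checked numerically. The sketch below ports the morph factor from the vertex-shader snippet above and evaluates it at the worst-case far corner of a leaf; the function names (`morphFactor`, `farCornerAtClosest`, `seamCloses`) are hypothetical helpers written for this illustration.

```typescript
// Morph factor as in the shader snippet: 0 = fine grid, 1 = snapped to the coarser grid.
function morphFactor(dist: number, patchSize: number, lodScale: number, morphZone: number): number {
  const lodMax = patchSize * lodScale;
  const lodStart = lodMax * (1 - morphZone);
  const t = (dist - lodStart) / Math.max(lodMax - lodStart, 1e-4);
  return Math.min(Math.max(t, 0), 1);
}

// Upper bound on the far-corner distance of a leaf at its closest allowed
// position: nearest point at 0.5 * size * scale, plus the patch diagonal.
function farCornerAtClosest(patchSize: number, lodScale: number): number {
  return 0.5 * patchSize * lodScale + Math.SQRT2 * patchSize;
}

// The closed-form condition from the text: scale * (0.5 - morphZone) > sqrt(2).
function seamCloses(lodScale: number, morphZone: number): boolean {
  return lodScale * (0.5 - morphZone) > Math.SQRT2;
}

// Current constants: the far corner has not started morphing yet.
const tCurrent = morphFactor(farCornerAtClosest(64, 5), 64, 5, 0.15);
// Earlier constants: the far corner is already well into the morph zone.
const tEarlier = morphFactor(farCornerAtClosest(64, 3), 64, 3, 0.30);
console.log(seamCloses(5, 0.15), tCurrent, seamCloses(3, 0.30), tEarlier);
```

The patch size cancels out of the condition, so the same two constants hold for every LOD level.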
24.3 GPU-Driven Indirect Drawing
Once the LOD compute has finished, two things are true:
- patches[0 .. N-1] contains N patch descriptors (world offset + side length).
- indirect_args.instance_count = N.
The render passes can now consume that without ever asking the CPU what N is.
// ── from src/renderer/render_graph/passes/terrain_geometry_pass.ts ──
enc.setPipeline(this._pipeline);
enc.setBindGroup(0, this._terrain.geomBindGroup);
enc.setIndexBuffer(this._terrain.indexBuffer, 'uint32');
enc.drawIndexedIndirect(this._terrain.indirectBuffer, 0);
There is no vertex buffer. The 33×33 grid is described entirely by the persistent index buffer (PATCH_INDEX_COUNT = 32² · 6 = 6144 indices, built once at startup). The vertex shader derives positions from vertex_index and reads the per-instance patch descriptor by instance_index:
// ── from src/shaders/terrain/terrain.wgsl ──
fn compute_world_pos(vid: u32, iid: u32) -> vec3f {
  let tile = patches[iid];
  let verts = u32(params.geom_params.y); // 33
  let i = vid % verts;
  let j = vid / verts;
  let uv = vec2f(f32(i), f32(j)) / f32(verts - 1u);
  // ... morph + height sample ...
}
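For reference, an index buffer with this layout could be built as follows. This is a sketch under assumptions, not the engine's builder: `buildPatchIndices` and `vertexUv` are hypothetical names, vertices are assumed to be numbered row-major to match `i = vid % verts; j = vid / verts`, and the triangle winding is an arbitrary choice.

```typescript
// Hypothetical builder for the shared patch index buffer (row-major vertex ids).
function buildPatchIndices(vertsPerSide = 33): Uint32Array {
  const cells = vertsPerSide - 1;                 // 32 quads per side
  const indices = new Uint32Array(cells * cells * 6);
  let k = 0;
  for (let j = 0; j < cells; j++) {
    for (let i = 0; i < cells; i++) {
      const v00 = j * vertsPerSide + i;           // this quad's minimum corner
      const v10 = v00 + 1;
      const v01 = v00 + vertsPerSide;
      const v11 = v01 + 1;
      // Two triangles per quad; the winding here is an assumption.
      indices[k++] = v00; indices[k++] = v01; indices[k++] = v10;
      indices[k++] = v10; indices[k++] = v01; indices[k++] = v11;
    }
  }
  return indices;
}

// CPU mirror of the vid -> uv mapping in compute_world_pos.
function vertexUv(vid: number, vertsPerSide = 33): [number, number] {
  const i = vid % vertsPerSide;
  const j = Math.floor(vid / vertsPerSide);
  return [i / (vertsPerSide - 1), j / (vertsPerSide - 1)];
}
```

With 33 vertices per side this yields 32² · 6 = 6144 indices referencing vertex ids 0 to 1088, and no vertex buffer is needed because the shader reconstructs the position from the id alone.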
Render-graph edges for persistent buffers
The patches and indirect-args buffers are both persistent: they outlive any single frame. The render graph normally only tracks resources it allocates itself, so a producer→consumer edge between the LOD compute pass (which writes them) and the render passes (which read them) wouldn't exist. Both accesses happen via bind groups the graph doesn't know about.
The fix is to import the persistent buffers as graph resources:
// ── from src/renderer/render_graph/passes/terrain_lod_pass.ts ──
const patchesImported = graph.importExternalBuffer(this._terrain.patchBuffer, {
  label: 'TerrainPatches', size: MAX_PATCHES * PATCH_BYTES,
});
const indirectImported = graph.importExternalBuffer(this._terrain.indirectBuffer, {
  label: 'TerrainIndirect', size: INDIRECT_ARGS_BYTES,
});
graph.addPass('TerrainLOD', 'compute', (b) => {
  patchesAfterLod = b.write(patchesImported, 'storage-write');
  indirectAfterLod = b.write(indirectImported, 'storage-read-write');
  // ...
});
The geometry pass then declares b.read(deps.patches, 'storage-read') and b.read(deps.indirectArgs, 'indirect'). With both ends declared, the graph compiler sees the dependency, refuses to cull the LOD pass, and orders the compute dispatch ahead of the indirect draw. WebGPU's auto-sync inserts the actual GPU barrier from there.
24.4 Virtual Texturing
The 8192² world is far too much heightmap data to keep in GPU memory at full resolution. At ~1 m per texel a single mip-0 heightmap would be 8192² texels, or about 537 MB as rgba16f (8 B per texel), most of which is invisible at any given moment. The system instead uses a virtual-texture atlas: by default 1024 virtual pages (pagesPerSide² = 32²) of 256 world units each, of which only 16 (numSlots) are physically resident in GPU memory at a time. Both counts are create() options; a bounded world shrinks the grid until every page fits resident at once (§24.11).
Three pieces:
| Piece | What it is |
|---|---|
| Atlas | texture_2d_array<rgba16float>, 256² per layer, 16 layers. .r = height (world units); .g/.b = normal.x and normal.z. normal.y is reconstructed at sample time as √(1 − x² − z²). The format is filterable and storage-capable, which is rare. |
| Page table | storage<u32>[1024]. One entry per virtual page; value is the atlas layer index (0..15) or 0xFFFFFFFF ("not resident"). |
| Page-gen compute | One workgroup-grid per slot. Each thread writes one rgba16f texel: it evaluates the height function five times (center + four neighbors) — procedural noise or a heightmap-image sample, depending on the source mode (§24.10) — computes height + normal, and textureStores into the assigned atlas slot. |
Sampling through the indirection
The terrain fragment shader looks up a world XZ position by going through the page table:
// ── from src/shaders/terrain/terrain.wgsl ──
fn vt_lookup(world_xz: vec2f) -> VtCoord {
  let origin = params.world_rect.xy;
  let extent = params.world_rect.zw;
  let vu = clamp((world_xz - origin) / extent, vec2f(0.0), vec2f(0.9999));
  let scaled = vu * VT_PAGES_PER_SIDE;
  let vp = floor(scaled);
  let page_idx = u32(vp.y) * u32(VT_PAGES_PER_SIDE) + u32(vp.x);
  var c: VtCoord;
  c.slot = page_table[page_idx];
  c.page_uv = scaled - vp;
  return c;
}
fn sample_atlas(world_xz: vec2f) -> vec4f {
  let c = vt_lookup(world_xz);
  let invalid = c.slot == VT_INVALID_SLOT;
  let safe_slot = select(c.slot, 0u, invalid); // avoid OOB index in branch
  let v = textureSample(atlas, atlas_sampler, c.page_uv, i32(safe_slot));
  return select(v, vec4f(0.0), invalid);
}
The select-then-mask pattern (rather than an if-return) is mandatory. textureSample requires uniform control flow to compute derivatives for filtering, and an early-return on a per-fragment slot would diverge fragment-to-fragment. textureSampleLevel (used in the vertex shader, where divergence is fine) doesn't have this restriction.
When the slot is 0xFFFFFFFF the function returns vec4(0) — the splat falls back to grass at h = 0 and the visible result is a flat green plain at the edge of the streamed window. A real game would dedicate one slot to a permanently-resident coarse world overview so distance always has something to draw; that's an explicit future-work item in the mountain_fox deep dive.
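The address arithmetic in vt_lookup is easy to mirror on the CPU for debugging. The sketch below is a hypothetical TypeScript port (`vtLookup` and the `VtCoordCpu` shape are illustrative names, and the world rect is assumed square so a single `extent` suffices); it applies the same clamp, scale, floor sequence as the WGSL.

```typescript
const VT_INVALID_SLOT = 0xffffffff;

// Hypothetical CPU-side result type mirroring the shader's VtCoord.
interface VtCoordCpu { pageIndex: number; slot: number; pageU: number; pageV: number; }

function vtLookup(
  worldX: number, worldZ: number,
  originX: number, originZ: number, extent: number,
  pagesPerSide: number, pageTable: Uint32Array,
): VtCoordCpu {
  // Same clamp as the shader: keeps the far world edge inside the last page.
  const clampUnit = (v: number) => Math.min(Math.max(v, 0), 0.9999);
  const sx = clampUnit((worldX - originX) / extent) * pagesPerSide;
  const sz = clampUnit((worldZ - originZ) / extent) * pagesPerSide;
  const px = Math.floor(sx);
  const pz = Math.floor(sz);
  const pageIndex = pz * pagesPerSide + px;       // row-major page table
  return { pageIndex, slot: pageTable[pageIndex], pageU: sx - px, pageV: sz - pz };
}

// Default world: 32 x 32 pages, nothing resident except page 496 in slot 3.
const table = new Uint32Array(32 * 32).fill(VT_INVALID_SLOT);
table[496] = 3;
const hit = vtLookup(100, -50, -4096, -4096, 8192, 32, table);
const miss = vtLookup(0, 0, -4096, -4096, 8192, 32, table);
console.log(hit, miss.slot === VT_INVALID_SLOT);
```

A lookup that lands on a non-resident page returns the sentinel slot, which is the case sample_atlas masks to vec4(0).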
24.5 Streaming the Resident Window
Page streaming is what makes the virtual-texture system useful: as the camera walks across the world, pages that move into the player's vicinity are generated on demand, and pages that move out are evicted to make room.
Per-frame flow
Each frame, TerrainSystem.update(camera):
- Computes the camera's containing virtual page (cvx, cvz) by integer-dividing camera XZ by the page size (256 world units).
- Builds the desired set: a 4×4 window of pages centered around (cvx, cvz). That's 16 pages, exactly the number of atlas slots, so steady-state residency reaches 100 % occupancy.
- Hands the desired set to VirtualPageManager.setDesired(), which decides what to keep, what to evict, and what to fetch.
- Re-uploads the CPU mirror of page_table only if anything actually changed.
- Submits a page-gen compute pass for each newly-assigned (virtual_page, slot) pair.
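The first two steps of that flow can be sketched as below. The function name `desiredWindow` is hypothetical, and two details are assumptions rather than facts from the source: the camera position is taken relative to the world origin before dividing, and the even-sized window spans offsets −ring to ring − 1 with out-of-range pages simply skipped.

```typescript
// Hypothetical sketch of the desired-set computation (steps 1 and 2 above).
function desiredWindow(
  camX: number, camZ: number,
  originX: number, originZ: number,
  pageSize: number, pagesPerSide: number,
  ring = 2,                                  // ring = 2 gives a 4 x 4 window
): number[] {
  // Virtual page containing the camera.
  const cvx = Math.floor((camX - originX) / pageSize);
  const cvz = Math.floor((camZ - originZ) / pageSize);
  const desired: number[] = [];
  for (let dz = -ring; dz < ring; dz++) {
    for (let dx = -ring; dx < ring; dx++) {
      const vx = cvx + dx;
      const vz = cvz + dz;
      // Assumption: pages outside the grid are dropped, not clamped.
      if (vx < 0 || vz < 0 || vx >= pagesPerSide || vz >= pagesPerSide) continue;
      desired.push(vz * pagesPerSide + vx);  // row-major page index
    }
  }
  return desired;
}

const centerWindow = desiredWindow(10, 10, -4096, -4096, 256, 32);
const cornerWindow = desiredWindow(-4096, -4096, -4096, -4096, 256, 32);
console.log(centerWindow.length, cornerWindow.length);
```

Under these assumptions the window never holds more than (2 · ring)² pages, which is what lets it match the slot count exactly.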
VirtualPageManager: a CPU LRU
The page manager is a small data structure:
// ── from src/terrain/terrain_system.ts ──
class VirtualPageManager {
  private readonly _residency = new Map<number, number>(); // vp → slot, insertion order = LRU
  private readonly _freeSlots: number[];
  readonly pageTable: Uint32Array;
  private readonly _pending: PageGenJob[] = [];
  // ...
}
JavaScript Map iterates in insertion order, which we exploit as the LRU order: "touching" a resident page is delete + re-set to move it to the tail; evicting takes the resident page closest to the head that is not in the desired set. New assignments push onto _pending, which the renderer drains on the same frame into per-slot page-gen dispatches.
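A stripped-down version of that insertion-order trick might look like this. It is an illustrative sketch, not the engine's VirtualPageManager: the class name `LruPageManager`, the `PageJob` shape, and the choice to return the new jobs from `setDesired` (instead of queueing them internally) are all assumptions.

```typescript
const INVALID_SLOT = 0xffffffff;

interface PageJob { page: number; slot: number; }   // hypothetical job shape

class LruPageManager {
  // page -> slot; Map iteration order doubles as least-recently-used order.
  private readonly residency = new Map<number, number>();
  private readonly freeSlots: number[];
  readonly pageTable: Uint32Array;

  constructor(numPages: number, numSlots: number) {
    this.pageTable = new Uint32Array(numPages).fill(INVALID_SLOT);
    // Reversed so that pop() hands out slot 0 first.
    this.freeSlots = Array.from({ length: numSlots }, (_, i) => numSlots - 1 - i);
  }

  setDesired(desired: readonly number[]): PageJob[] {
    const want = new Set(desired);
    const jobs: PageJob[] = [];
    // Touch pages that are already resident: delete + set moves them to the tail.
    for (const page of desired) {
      const slot = this.residency.get(page);
      if (slot !== undefined) {
        this.residency.delete(page);
        this.residency.set(page, slot);
      }
    }
    // Assign slots to missing pages, evicting from the head when none are free.
    for (const page of desired) {
      if (this.residency.has(page)) continue;
      let slot = this.freeSlots.pop();
      if (slot === undefined) slot = this.evictOldest(want);
      if (slot === undefined) break;               // more desired pages than slots
      this.residency.set(page, slot);
      this.pageTable[page] = slot;
      jobs.push({ page, slot });
    }
    return jobs;
  }

  private evictOldest(want: Set<number>): number | undefined {
    for (const [page, slot] of this.residency) {
      if (want.has(page)) continue;                // never evict a desired page
      this.residency.delete(page);
      this.pageTable[page] = INVALID_SLOT;
      return slot;
    }
    return undefined;
  }
}
```

Each returned job corresponds to one page-gen dispatch, and an empty result means the page table does not need to be re-uploaded that frame.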
The streaming radius (VT_STREAMING_RING = 2 → 4×4 window) is tuned to match the physical slot count: the desired set never exceeds 16 pages, so once the world has been visited at least once, eviction churn is bounded to the few pages the camera crosses each frame. A camera teleport that wants 16 fresh pages all at once will regenerate them in a single frame (~ms scale on a desktop GPU); a real production game would throttle that.
In bounded mode this whole window dance is skipped: the desired set is simply every page in the grid, computed once and camera-independent, so the entire region stays baked at full atlas detail with no eviction churn at all (§24.11).
Submit ordering and barriers
Page-gen runs in its own command buffer, submitted before the render graph's command buffer. WebGPU's queue ordering guarantees that queue.submit([pageGen]) followed by queue.submit([graph]) produces the same observable ordering as recording both in one encoder, so the atlas writes are visible to the render passes without any explicit barrier on our part. The page-gen pass binds the atlas as a write-only storage texture (texture_storage_2d_array in WGSL, access: 'write-only' in the bind group layout), and the geometry pass reads it through textureSample on a texture_2d_array<f32> view; the driver handles the layout transition.
24.6 The Page-Gen Compute Pass
Each page-gen dispatch handles one of the 16 atlas slots. The shader writes one texel per thread, 256² texels per slot, dispatched as 32×32 workgroups of 8×8.
terrain_height() opens with a branch on source_mode: when an image has been bound it samples that instead (covered in §24.10); otherwise it runs the procedural path below. Everything after — the 5-tap normal, the textureStore — is source-agnostic, which is exactly why a heightmap image drops in without touching any other stage.
The procedural height function is straightforward fBm + ridged mountain noise:
// ── from src/shaders/terrain/terrain_page_gen.wgsl ──
fn terrain_height(world_xz: vec2f) -> f32 {
  let base = fbm(world_xz * 0.0011);
  let plains = fbm(world_xz * 0.004) * 0.35;
  let mtn = pow(ridged(world_xz * 0.0065), 1.7) * 1.4;
  let mtn_mask = smoothstep(0.40, 0.72, base);
  var h = plains + mtn_mask * mtn;
  h = h * mix(0.45, 1.0, base);
  return h * params.height_scale;
}
base is a very-low-frequency "continentalness" field. plains is a smaller fBm that gives low-altitude variation everywhere. mtn is ridged noise raised to a power to sharpen the ridges. mtn_mask (a smoothstep on the base) lets mountains appear only in regions where base is high, so the world has clearly mountain-bearing and clearly flat regions instead of mountains everywhere.
Normals come from finite-differences of the noise itself, not from sampling neighbor texels:
let h = terrain_height(world_xz);
let h_xm = terrain_height(world_xz - eps_x);
let h_xp = terrain_height(world_xz + eps_x);
let h_zm = terrain_height(world_xz - eps_z);
let h_zp = terrain_height(world_xz + eps_z);
let dx = (h_xp - h_xm) / (2.0 * params.texel_size.x);
let dz = (h_zp - h_zm) / (2.0 * params.texel_size.y);
let n = normalize(vec3f(-dx, 1.0, -dz));
textureStore(atlas_out, vec2i(gid.xy), i32(params.slot_index),
vec4f(h, n.x, n.z, 0.0));
The cost is five noise evaluations per texel instead of one, but the gain is that the normal is valid everywhere. Sampling neighbor texels would introduce a discontinuity at page boundaries — the normal at texel (0, y) in one page would be computed from a neighbor in a different page, and if those pages were generated at different times with different params, the gradient could shift by a few percent of a texel and create a visible crease.
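The same five-tap scheme is straightforward to express on the CPU. The sketch below is a generic central-difference normal over an arbitrary height function; `heightAndNormal` and `HeightFn` are hypothetical names, and a single `texelSize` is assumed for both axes.

```typescript
type HeightFn = (x: number, z: number) => number;

// Central differences over the height function itself (five evaluations),
// mirroring the page-gen snippet above.
function heightAndNormal(height: HeightFn, x: number, z: number, texelSize: number) {
  const h = height(x, z);
  const dx = (height(x + texelSize, z) - height(x - texelSize, z)) / (2 * texelSize);
  const dz = (height(x, z + texelSize) - height(x, z - texelSize)) / (2 * texelSize);
  const len = Math.hypot(dx, 1, dz);          // length of (-dx, 1, -dz)
  return { h, nx: -dx / len, ny: 1 / len, nz: -dz / len };
}

// A plane rising 0.5 units per unit of x tilts the normal towards -x.
const slope = heightAndNormal((x, _z) => 0.5 * x + 3, 10, 0, 1);
// The atlas stores only nx and nz; ny is rebuilt at sample time.
const rebuiltNy = Math.sqrt(1 - slope.nx * slope.nx - slope.nz * slope.nz);
console.log(slope, rebuiltNy);
```

Because the taps evaluate the height function directly, a texel on a page edge gets the same gradient regardless of whether the neighboring page is resident.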
Bit-identical CPU port
A subtle correctness issue is that the player-controller (WalkerController in grassy_hills) needs to know the terrain height at the player's XZ every frame to keep the camera glued to the ground. Doing a GPU readback every frame would stall the pipeline; doing it asynchronously would lag the camera by a frame.
The solution is terrain_noise.ts — a CPU port of the page-gen procedural noise. It must produce bit-identical heights to the WGSL version. The first attempt used fract(sin(...)) for the hash, which broke because Math.sin (f64) and WGSL sin (f32) round differently for non-trivial arguments — the CPU and GPU heights drifted several meters apart at distant world coords, and the walker would sink into the ground when you walked there.
The fix is an integer hash (PCG-style mixing) that the CPU port reproduces with Math.imul. The WGSL side:
fn hash_u32(x_in: u32) -> u32 {
  var v = x_in;
  v = (v ^ 2747636419u) * 2654435769u;
  v = (v ^ (v >> 16u)) * 2654435769u;
  v = (v ^ (v >> 16u)) * 2654435769u;
  return v;
}
fn hash21i(ix: i32, iy: i32) -> f32 {
  let h = hash_u32(u32(ix) * 374761393u + hash_u32(u32(iy)));
  return f32(h & 0x00ffffffu) / 16777216.0;
}
The mask to 24 bits before the divide is the second piece of the fix: values up to 2^24 - 1 round-trip through f32 exactly, so the result is bit-identical to the JS version regardless of whether the host runs the hash in f32 or f64.
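A TypeScript counterpart of those two functions could be written as below. This is a sketch of how such a port might look, not a copy of terrain_noise.ts: the names `hashU32` and `hash21i` are chosen to mirror the WGSL, and the only techniques used are `Math.imul` for the wrapping 32-bit multiply and `>>> 0` to reinterpret the result as unsigned.

```typescript
// Wrapping 32-bit integer hash, mirroring the WGSL hash_u32 step by step.
function hashU32(x: number): number {
  let v = x >>> 0;                                    // reinterpret as u32
  v = Math.imul(v ^ 2747636419, 2654435769) >>> 0;    // imul keeps the low 32 bits
  v = Math.imul(v ^ (v >>> 16), 2654435769) >>> 0;
  v = Math.imul(v ^ (v >>> 16), 2654435769) >>> 0;
  return v;
}

// Lattice hash in [0, 1) built from a 24-bit integer, so the value is exact in f32.
function hash21i(ix: number, iy: number): number {
  // The sum can exceed 32 bits in a double; >>> 0 wraps it like u32 addition.
  const h = hashU32((Math.imul(ix, 374761393) + hashU32(iy)) >>> 0);
  return (h & 0x00ffffff) / 16777216;
}
```

Since the result is a 24-bit integer divided by 2^24, `Math.fround(hash21i(ix, iy))` returns the value unchanged, which is the property the bit-identical claim relies on for the hash itself.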
24.7 PBR Materials and Splat Blending
The terrain shader does no actual lighting. Its job is to fill the engine G-Buffer (albedo+roughness, normal+metallic, emissive, depth) — the engine's DeferredLightingPass then shades it the same way it shades every other opaque object. This is the integration point that lets the terrain participate in image-based lighting, screen-space global illumination, screen-space ambient occlusion, and cascade shadow sampling without any terrain-specific code paths.
Each fragment is blended from four layers:
| Slot | Layer | Where it appears |
|---|---|---|
| 0 | grass | low altitude, gentle slope |
| 1 | dirt | mid altitude |
| 2 | rock | steep slope, or high altitude |
| 3 | snow | high altitude, gentle slope |
The four layer arrays (albedo, normal, roughness, displacement) are loaded asynchronously by loadLayerTextures — JPG/PNG go through createImageBitmap + copyExternalImageToTexture; HDR (Radiance RGBE) is decoded on the CPU and uploaded as 8-bit unorm. Until each map resolves, the default initialization (white albedo, flat normal, mid-gray roughness, mid-gray displacement) is in place — so the first few frames shade legitimately as the textures stream in.
The splat weights come from height-and-slope smoothsteps modulated by a low-frequency fbm2 jitter so the seams meander naturally instead of forming clean horizontal contour lines. Weights are normalized to sum to 1 before they're used.
Anti-tile sampling
The PBR maps repeat every 32 world units. Seen from altitude, the regular grid is easy to pick out; "every patch shows the same tile" is the first thing that breaks the illusion of a real landscape. The shader's anti_tile() does two things on top of every textureSample:
- A low-frequency UV warp drives a non-uniform offset every ~1.4 tile-periods.
- Two rotated phases (90° rotation, irrational scale × 0.527) are blended with a hash at a different scale than the warp.
Per fragment that's 4 maps × 4 layers × 2 phases = up to 32 texture samples. Splat weights are commonly zero on 2–3 of the 4 layers, so the hardware cache hides most of the wasted reads on desktop; mobile GPUs see it. A future fix is dual-channel anti-tile (min(a, b) blend instead of hash-mix) which collapses the phase count to one sample per map.
Parallax mapping
Each layer's displacement map drives a one-sample parallax offset on the UV fed to the other three maps. The cost is one extra textureSample per layer, and the result is convincing low-relief texture detail without an actual raymarch:
let view_ts = vec3f(dot(t, view_world), dot(b, view_world), dot(n_world, view_world));
let h = anti_tile(layer_disp_tex, layer_sampler, uv, i32(idx)).r - 0.5;
let offset = (view_ts.xy / max(view_ts.z, 0.3)) * h * (PARALLAX_STRENGTH / LAYER_TEXTURE_WORLD_REPEAT);
return uv - offset;
The max(.., 0.3) clamp on the denominator stops grazing-angle views from pushing the UV by tens of texels. Subtracting 0.5 re-centers the displacement from [0, 1] to [−0.5, 0.5], so regions at the mid-gray reference height get no offset, and the effect symmetrically lifts ridges and deepens cracks instead of pushing everything in one direction.
Tangent-space normals
layer_world_normal builds a TBN per fragment by Gram-Schmidting world-X against the atlas-derived surface normal (the third axis falls out as a cross product). Because the XZ-UV mapping aligns texture.x with world.x, this gives a consistent tangent basis across the whole terrain without storing a per-vertex tangent attribute. The sampled normal map is decoded as map * 2 - 1 (GL convention), rotated by the TBN, and renormalized.
Per-vertex normals (the obvious first attempt) were dropped because the finer patch interpolates the normal over more samples along a shared edge than the coarse patch does, producing a visible shading discontinuity at every LOD seam. Per-fragment resampling sidesteps the problem entirely: both sides resolve the same world XZ to the same atlas normal.
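The per-fragment basis construction described in this section can be sketched in a few lines. The code below is an illustration under assumptions, not a port of layer_world_normal: `buildTbn` and `decodeNormal` are hypothetical names, and the bitangent sign (cross(t, n), which points along +Z on flat ground) is a guess at the handedness.

```typescript
type Vec3 = [number, number, number];

const dot = (a: Vec3, b: Vec3): number => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const cross = (a: Vec3, b: Vec3): Vec3 => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];
const normalize = (a: Vec3): Vec3 => {
  const len = Math.hypot(a[0], a[1], a[2]);
  return [a[0] / len, a[1] / len, a[2] / len];
};

// Gram-Schmidt: remove the normal's component from world-X to get the tangent.
// A heightfield normal always has y > 0, so it is never parallel to world-X.
function buildTbn(n: Vec3): { t: Vec3; b: Vec3; n: Vec3 } {
  const d = n[0];                                   // dot(worldX, n) with worldX = (1, 0, 0)
  const t = normalize([1 - n[0] * d, -n[1] * d, -n[2] * d]);
  const b = cross(t, n);                            // assumed handedness
  return { t, b, n };
}

// Decode a [0, 1] normal-map texel (map * 2 - 1) and rotate it into world space.
function decodeNormal(rgb: Vec3, tbn: { t: Vec3; b: Vec3; n: Vec3 }): Vec3 {
  const m: Vec3 = [rgb[0] * 2 - 1, rgb[1] * 2 - 1, rgb[2] * 2 - 1];
  return normalize([
    tbn.t[0] * m[0] + tbn.b[0] * m[1] + tbn.n[0] * m[2],
    tbn.t[1] * m[0] + tbn.b[1] * m[1] + tbn.n[1] * m[2],
    tbn.t[2] * m[0] + tbn.b[2] * m[1] + tbn.n[2] * m[2],
  ]);
}
```

A flat normal-map texel (0.5, 0.5, 1) decodes to the surface normal itself, so an unloaded layer leaves the atlas normal untouched.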
24.8 The Three Render-Graph Passes
Three passes consume the TerrainSystem state. The patches[] / indirect_args produced by the LOD pass are shared by both the shadow pass and the geometry pass — LOD selection runs once and feeds both.
TerrainLodPass: compute
The simplest of the three. It imports the persistent patches and indirect-args buffers and dispatches the LOD compute. There are no other dependencies; the camera UBO + params UBO are referenced via the system's bind group.
// ── from src/renderer/render_graph/passes/terrain_lod_pass.ts ──
graph.addPass('TerrainLOD', 'compute', (b) => {
  patchesAfterLod = b.write(patchesImported, 'storage-write');
  indirectAfterLod = b.write(indirectImported, 'storage-read-write');
  b.setExecute((pctx) => {
    const enc = pctx.computePassEncoder!;
    enc.setPipeline(this._terrain.lodPipeline);
    enc.setBindGroup(0, this._terrain.lodBindGroup);
    enc.dispatchWorkgroups(Math.ceil(LOD_GRID / 8), Math.ceil(LOD_GRID / 8), 1);
  });
});
TerrainShadowPass: render, one draw per cascade
The cascade shadow map produced by the engine's ShadowPass is a 4-layer depth32float array texture (one layer per CSM cascade). TerrainShadowPass appends terrain depth to each cascade in turn — one render pass per cascade, each rendering into a single-layer view of the same shadow map.
// ── from src/renderer/render_graph/passes/terrain_shadow_pass.ts ──
for (let c = 0; c < N; c++) {
  graph.addPass(`TerrainShadowPass.cascade${c}`, 'render', (b) => {
    b.read(deps.patches, 'storage-read');
    b.read(deps.indirectArgs, 'indirect');
    nextShadow = b.write(inHandle, 'depth-attachment', {
      depthLoadOp: 'load', depthStoreOp: 'store',
      view: { dimension: '2d', baseArrayLayer: c, arrayLayerCount: 1 },
    });
    // ...
  });
}
Two notable choices:
- Front-face culling (cullMode: 'front'). Rendering only back-facing geometry into the shadow map keeps self-shadowing acne off sun-facing slopes without slope-scaled depth bias.
- Same patches[] as the camera view. The shadow pass doesn't rerun LOD selection for the light frustum. Some patches outside the camera's view but inside the light frustum get over-drawn or missed; the trade-off is one LOD compute instead of two. For a directional sun and a typical CSM frustum that follows the camera, the loss is small.
TerrainGeometryPass: render, fills the G-Buffer
The visible-frame pass. Either loads an existing engine G-Buffer (when terrain is co-rendered with other deferred geometry) or creates a fresh one. Writes to the three G-Buffer color attachments + depth, then the engine's DeferredLightingPass consumes them downstream.
// ── from src/renderer/render_graph/passes/terrain_geometry_pass.ts ──
outAlbedo = b.write(albedo, 'attachment', { loadOp: load, storeOp: 'store', clearValue: [0,0,0,1] });
outNormal = b.write(normal, 'attachment', { loadOp: load, storeOp: 'store', clearValue: [0,0,0,0] });
outEmissive = b.write(emissive, 'attachment', { loadOp: load, storeOp: 'store', clearValue: [0,0,0,0] });
outDepth = b.write(depth, 'depth-attachment', { depthLoadOp: load, depthStoreOp: 'store', depthClearValue: 1.0 });
There's also a fourth pass — terrain_point_spot_shadow_pass.ts — that renders terrain into the point/spot VSM cube/2D shadow maps the engine maintains for forward dynamic lights, with a separate shader that emits moments rather than depth. Same patches[] / indirect_args as the other two render passes.
24.9 Two Bind Group Layouts, One Bind Group
The shadow shader doesn't need the four PBR layer arrays because it only writes depth. The full geometry shader binds 11 entries; the shadow shader needs only 6. WebGPU fixes a pipeline's bind-group layouts when the pipeline is created, so the split is enforced at the layout level:
// ── from src/terrain/terrain_system.ts ──
readonly geomBgl: GPUBindGroupLayout; // 11 entries: camera, params, patches,
// atlas + page table + atlas sampler,
// 4 layer textures + layer sampler
readonly geomBindGroup: GPUBindGroup;
readonly shadowGeomBgl: GPUBindGroupLayout; // 6 entries: the first 6 only
readonly shadowGeomBindGroup: GPUBindGroup; // same underlying resources
Each pipeline asks for the BGL it needs; both bind groups point at the same underlying GPU resources (camera UBO, params UBO, patches SSBO, atlas, page table, atlas sampler). The smaller BGL keeps the shadow pipeline layout compact and avoids binding the PBR arrays where they aren't sampled.
24.10 Driving Heights from a Heightmap Image
Procedural noise is only one way to fill the atlas. Because the page-gen shader is the single place a height is ever created — and everything downstream (the quadtree, streaming, splat surfacing, shadows, deferred lighting) only ever reads heights back out of the atlas — swapping the source is a localized change. TerrainSystem.setHeightmap() does exactly that: it flips source_mode to 1, binds an image, and rebakes every resident page.
The shader branch is the whole mechanism. terrain_height() maps the world XZ into the heightmap's rect, samples the red channel as a normalized height, and scales it; outside the rect it returns a floor_height so a finite island sits in open water rather than ending in a vertical wall:
// ── from src/shaders/terrain/terrain_page_gen.wgsl ──
fn terrain_height(world_xz: vec2f) -> f32 {
  if (params.source_mode == 1u) {
    let uv = (world_xz - params.hm_origin_xz) / params.hm_extent_xz;
    if (uv.x < 0.0 || uv.x > 1.0 || uv.y < 0.0 || uv.y > 1.0) {
      return params.floor_height;
    }
    return textureSampleLevel(height_src, height_src_sampler, uv, 0.0).r
         * params.height_scale;
  }
  // ... procedural fBm + ridged path (§24.6) ...
}
The 5-tap normal computation in cs_main is unchanged — it calls terrain_height() for the center and four neighbors regardless of source, so normals come out correct at page seams in both modes. That source-agnostic seam handling is the reason image heights "just work" with the existing LOD, morph, and shadow machinery.
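A CPU mirror of the image branch is useful for the same reason the procedural port is: a walker controller needs heights without a readback. The sketch below is hypothetical (`heightFromImage`, `HeightmapRect`, and the `Sample01` callback are illustrative names); the texture fetch is abstracted behind a callback that returns the normalized red channel, so filtering behavior is left to the caller.

```typescript
// Hypothetical parameter block mirroring the shader's hm_* uniforms.
interface HeightmapRect {
  originX: number; originZ: number;
  extentX: number; extentZ: number;
  heightScale: number;
  floorHeight: number;
}

// Stand-in for the texture fetch: returns the normalized height at (u, v).
type Sample01 = (u: number, v: number) => number;

function heightFromImage(worldX: number, worldZ: number, p: HeightmapRect, sample: Sample01): number {
  const u = (worldX - p.originX) / p.extentX;
  const v = (worldZ - p.originZ) / p.extentZ;
  // Outside the mapped rect the terrain sits at the floor height.
  if (u < 0 || u > 1 || v < 0 || v > 1) return p.floorHeight;
  return sample(u, v) * p.heightScale;
}

// 1 km island centered on the origin; the fake image is a ramp along u.
const island: HeightmapRect = {
  originX: -512, originZ: -512, extentX: 1024, extentZ: 1024,
  heightScale: 200, floorHeight: -5,
};
const ramp: Sample01 = (u, _v) => u;
console.log(heightFromImage(0, 0, island, ramp), heightFromImage(600, 0, island, ramp));
```

Matching GPU heights exactly would also require reproducing the sampler's bilinear filtering and the r16float quantization, which this sketch does not attempt.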
Host-side, the caller hands over a texture and the rect it maps onto:
// ── from samples/bounded_heightmap.ts ──
const hm = await loadHeightmapTexture(device, HEIGHTMAP_URL, { resolution: 2048 });
terrain.setHeightmap(hm.texture, {
  origin: [-WORLD_EXTENT / 2, -WORLD_EXTENT / 2],
  extent: [WORLD_EXTENT, WORLD_EXTENT],
  heightScale: HEIGHT_SCALE,
  floorHeight: 0, // sea level outside the image
});
The texture is an r16float (filterable, ~11-bit mantissa — plenty for a normalized height) whose R channel is the height. loadHeightmapTexture in heightmap_loader.ts decodes a grayscale JPG/PNG at a modest internal resolution, runs a couple of box-blur passes, and resamples up before upload. That denoising matters: a raw JPG carries per-pixel and block-compression noise that, amplified by the height scale, turns into a forest of spikes — the blur averages it into smooth landforms. Callers with their own field can skip the image path and upload directly via heightmapDataToTexture.
The payoff is everything the system already does, now applied to an authored shape. The standalone heightmap_terrain sample (Chapter intro) builds a single full-resolution CPU mesh and bakes its surfacing into a stock PbrMaterial precisely because it has no LOD system to plug into — so it gives up continuous LOD and can only cast, not receive, the engine's cascaded shadows. Feeding the same image through TerrainSystem instead gets CDLOD, virtual-texture streaming, the four-layer splat, and full deferred lighting for free.
24.11 Bounded Terrain: a Fixed Tile Area
The default world streams a 4×4 page window around the camera across a 32×32 grid, which reads as "endless" even though the extent is finite. The opposite regime — a small, fully-resident region with a hard edge — is what you want for an island, an arena, or any handcrafted playfield. Three create() options switch into it.
// ── from samples/bounded_heightmap.ts ──
const terrain = TerrainSystem.create(ctx, {
quality: TerrainQuality.Medium,
heightScale: 200,
worldOrigin: [-512, -512],
worldExtent: 1024, // 1 km island — drives the quadtree root AND the page grid
pagesPerSide: 8, // 8 × 8 = 64 pages
bounded: true, // every page resident; no camera-centered streaming
});
Each option targets one of the three places the old design assumed an infinite-feeling world:
- worldExtent / worldOrigin → the quadtree root. Since the LOD shader now reads params.world_rect (§24.2), shrinking the extent shrinks the root. Patches are generated only inside the rect, so the terrain is a finite slab: beyond its edge there are simply no patches and the sky shows through.
- pagesPerSide → the virtual-texture grid. A smaller grid means fewer, larger pages. Eight per side over 1024 m is 128 m per page; at Medium (256 texels/page) that's 0.5 m per heightmap texel.
- bounded → residency. With it set, the desired-page set is the entire grid, computed once (§24.5). numSlots defaults to pagesPerSide² so all 64 pages fit resident simultaneously: the whole island is baked once at startup and never streams again.
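The arithmetic in the list above, and the difference between the streamed and bounded desired-page sets, can be sketched as follows. This is an illustration under stated assumptions: PageGrid, pageMetrics, and desiredPages are hypothetical names, and the way the 4×4 window is centered and clamped is a guess at one reasonable policy rather than the engine's actual code.

```typescript
// Illustrative sketch: page footprint and desired-page set.
interface PageGrid {
  worldExtent: number;  // meters along one side of the world rect
  pagesPerSide: number; // virtual-texture grid resolution
  pageTexels: number;   // texels along one side of a page
}

function pageMetrics(grid: PageGrid): { metersPerPage: number; metersPerTexel: number } {
  const metersPerPage = grid.worldExtent / grid.pagesPerSide;
  return { metersPerPage, metersPerTexel: metersPerPage / grid.pageTexels };
}

// Returns row-major page indices that should be resident.
function desiredPages(
  grid: PageGrid,
  bounded: boolean,
  cameraPage: [number, number],
  windowSize = 4, // streamed mode: 4x4 window, per the default described above
): number[] {
  const n = grid.pagesPerSide;
  if (bounded) {
    // Bounded mode: the whole grid, independent of the camera.
    return Array.from({ length: n * n }, (_, i) => i);
  }
  // Streamed mode (assumed policy): center the window on the camera page
  // and clamp it so it never leaves the grid.
  const size = Math.min(windowSize, n);
  const start = (c: number) => Math.min(Math.max(c - Math.floor(size / 2), 0), n - size);
  const x0 = start(cameraPage[0]);
  const z0 = start(cameraPage[1]);
  const pages: number[] = [];
  for (let z = z0; z < z0 + size; z++) {
    for (let x = x0; x < x0 + size; x++) {
      pages.push(z * n + x);
    }
  }
  return pages;
}
```

For the island configuration, pageMetrics reports 128 m per page and 0.5 m per texel, and desiredPages with bounded set returns all 64 indices regardless of where the camera is.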
The cost is atlas memory: numSlots layers of pageTexels² rgba16f. The 8×8 Medium island above is 64 × 256² × 8 B ≈ 34 MB; the same grid at High (512 texels) would be ~134 MB, so bounded worlds favor a modest page-texel count over a huge slot pool. create() throws if numSlots < pagesPerSide² in bounded mode rather than silently leaving distant pages unbaked.
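To budget a bounded world before creating it, the memory formula and the slot check can be reproduced in a few lines. This is a sketch of the arithmetic described above, not the engine's implementation; atlasBytes and resolveBoundedSlots are hypothetical helper names.

```typescript
// Illustrative sketch: atlas memory budget and bounded-mode slot validation.
const RGBA16F_BYTES_PER_TEXEL = 8; // four channels x 16 bits each

function atlasBytes(numSlots: number, pageTexels: number): number {
  return numSlots * pageTexels * pageTexels * RGBA16F_BYTES_PER_TEXEL;
}

// In bounded mode every page must be resident, so the slot pool has to
// cover the full grid. Defaulting to pagesPerSide^2 follows the text above.
function resolveBoundedSlots(pagesPerSide: number, numSlots?: number): number {
  const required = pagesPerSide * pagesPerSide;
  const slots = numSlots ?? required;
  if (slots < required) {
    throw new Error(
      `bounded terrain needs at least ${required} slots, got ${slots}`,
    );
  }
  return slots;
}

// The 8x8 island from the sample, at two page resolutions.
const slots = resolveBoundedSlots(8);
const mediumMB = atlasBytes(slots, 256) / 1e6; // roughly 34 MB
const highMB = atlasBytes(slots, 512) / 1e6;   // roughly 134 MB
console.log(mediumMB.toFixed(1), highMB.toFixed(1));
```

Doubling pageTexels quadruples the cost, while doubling pagesPerSide at fixed pageTexels also quadruples it, so the two knobs trade against each other at a fixed memory budget.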
Pairing the two features — a heightmap image inside a bounded rect, with floorHeight at sea level — is the island recipe the bounded_heightmap sample demonstrates: a finite, authored landmass that still participates in the engine's shadows, AO, and image-based lighting like any other deferred geometry.
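To make the role of floorHeight at the edge of the image concrete, here is a CPU-side sketch of an image-height lookup. It assumes that positions outside the mapped rect return floorHeight unchanged, and that positions inside return the bilinearly filtered sample times heightScale (matching the shader excerpt in §24.10). The HeightmapPlacement type and sampleHeight function are hypothetical; the real lookup lives in WGSL.

```typescript
// Illustrative sketch: world-space height lookup against a square,
// row-major, normalized height image. Names are assumptions.
interface HeightmapPlacement {
  origin: [number, number]; // world x/z of the image's minimum corner
  extent: [number, number]; // world size covered by the image
  heightScale: number;
  floorHeight: number;      // returned outside the image rect
}

function sampleHeight(
  data: Float32Array,
  res: number,
  p: HeightmapPlacement,
  x: number,
  z: number,
): number {
  const u = (x - p.origin[0]) / p.extent[0];
  const v = (z - p.origin[1]) / p.extent[1];
  if (u < 0 || u > 1 || v < 0 || v > 1) {
    return p.floorHeight; // e.g. sea level around an island
  }
  // Bilinear filter with clamp-to-edge; texel centers sit at (i + 0.5) / res.
  const fx = Math.min(Math.max(u * res - 0.5, 0), res - 1);
  const fz = Math.min(Math.max(v * res - 0.5, 0), res - 1);
  const x0 = Math.floor(fx);
  const z0 = Math.floor(fz);
  const x1 = Math.min(x0 + 1, res - 1);
  const z1 = Math.min(z0 + 1, res - 1);
  const tx = fx - x0;
  const tz = fz - z0;
  const top = data[z0 * res + x0] * (1 - tx) + data[z0 * res + x1] * tx;
  const bottom = data[z1 * res + x0] * (1 - tx) + data[z1 * res + x1] * tx;
  return (top * (1 - tz) + bottom * tz) * p.heightScale;
}
```

With the rect centered on the world origin, as in the sample's origin and extent values, the landmass fills the bounded area and any lookup past its edge falls to the floor height.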
24.12 Summary
The heightmap terrain renderer is a worked example of a few WebGPU patterns that compose well together:
- CDLOD on an implicit quadtree, GPU-driven. One compute thread per max-depth leaf, per-thread quadtree walk, per-thread subdivision test. The representative-thread trick eliminates a deduplication step.
- Per-vertex morph with a derived constant constraint. The morph factor t is computed per vertex from camera distance, and the constants (scale, morph_zone) are forced by the algebra of cross-LOD boundary matching.
- Virtual texturing with a CPU LRU. Indirection through a page table, a tiny atlas slot pool, and a per-frame desired-set window keep the working set in memory. Page-gen is a separate command-buffer submit ahead of the render graph; queue ordering is the only barrier.
- Indirect draw, persistent buffers, imported into the render graph. importExternalBuffer is what makes the compute → draw dependency visible to the graph compiler; without it, the LOD pass would be culled.
- Integration as a G-Buffer producer, not a special-case material. The terrain pass writes the engine G-Buffer and DeferredLightingPass shades it like everything else. No terrain-specific lighting or shadow code in the engine's hot path.
- A single, swappable height source. Heights are produced by exactly one shader function (terrain_height), so a heightmap image replaces procedural noise with a one-branch change and nothing downstream notices (§24.10).
- A configurable world rect, not a baked-in constant. Feeding the quadtree root and page grid from uniforms lets the same code render an endless streamed expanse or a fully-resident bounded island (§24.11).
The sample-level deep dive, "Terrain - CDLOD GPU terrain rendering", covers material beyond the scope of this chapter: the anti-tile sampling derivation, parallax mapping math, the sun arc, the time-of-day cycle, aerial perspective and height fog post-effects, walker mode, and a full known-limitations table.