Taos Engine ▦ Taos: Building a Modern WebGPU Game Engine

Terrain — Technical Deep Dive

A small WebGPU terrain renderer built from scratch over the course of this sample, aiming at the feature set of UE-style landscape: continuous LOD with no popping or cracks, virtually-textured heightmaps with per-camera streaming across a world larger than memory, PBR splat-blended materials with anti-tile sampling + parallax mapping, a moving sun + moon with directional shadows, forward dynamic point lights, G-Buffer outputs feeding GTAO that attenuates only the ambient term, aerial perspective + height fog, and TAA. The goal was a clean reference for each technique with explicit derivations of the failure modes you run into and how the math forces particular constants.

The core renderer is one orchestrator class (TerrainRenderer) plus five WGSL files. A handful of small sample-local passes (AO composite, aerial perspective, height fog, debug points) layer post effects on top, and the page's mountain_fox.ts glues it all together. No engine RenderFeature integration, no scene graph — the sample directly drives the render graph.

Try it. Run npm run dev, then open samples/mountain_fox.html. Free-fly is the default: WASD + mouse drag. F toggles between free-fly and a first-person walker that snaps to the terrain surface (gravity + jump on Space). B toggles the LOD-colored tile-boundary overlay. The collapsible panel on the top-left toggles every effect (TAA / GTAO / bloom / DoF / motion blur / stars / rain / point lights / clouds) and exposes sliders for aerial perspective, height fog, clouds, and the time-of-day arc. Camera position / yaw / pitch are visible in the HUD; the URL params ?pos=x,y,z&yaw=deg&pitch=deg reproduce an exact view.

Files#

| File | Responsibility |
|------|----------------|
| samples/mountain_fox.html/.ts | Page shell + render-loop wiring (camera + walker controllers, sun/moon arc, full post chain, effects panel widget, HUD, B / F key toggles). |
| terrain/terrain_renderer.ts | TerrainRenderer class: owns the atlas, page table, page-gen / LOD / shadow / render pipelines, the per-layer PBR texture arrays + async loader, the CPU streaming page manager, and the addToGraph flow. |
| terrain/terrain.wgsl | Main render: VS (camera + light variants), PBR FS with splat-blend over 4 textured layers, anti-tile sampler, parallax mapping, virtual-texture sampler, PCF shadow lookup, multi-light direct (sun + moon + point lights). Emits an HDR + G-Buffer-normal + ambient-only triple. |
| terrain/terrain_lod.wgsl | GPU-driven CDLOD compute: per-thread quadtree walk that emits visible patches + atomically bumps the indirect draw count. |
| terrain/terrain_page_gen.wgsl | Procedural page-gen: one slot of the atlas, height + normal packed into rgba16f. |
| terrain/terrain_debug_lines.wgsl | LOD-color-coded patch boundary overlay (B key). |
| terrain/terrain_noise.ts | CPU port of the page-gen procedural noise, bit-identical to the WGSL hash. Used by the walker to ground the player without a GPU readback. |
| terrain/walker_controller.ts | First-person walker (gravity + jump + ridge-aware eye lift). Uses terrain_noise.ts for the height lookup. |
| terrain/ao_apply_pass.ts | Composites a GTAO factor into the lit HDR; attenuates only the ambient component (hdr − ambient · (1 − AO)). |
| terrain/atmospheric_apply_pass.ts + .wgsl | Aerial-perspective post: blends terrain toward an inline Rayleigh + Mie scattering color by camera distance. |
| terrain/height_fog_pass.ts | Exponential height fog, exp-falloff with altitude, sky-pixel-aware. Tints with the active sky color. |
| terrain/debug_points_pass.ts | RGB axis-cross overlay at the walker's ridge-sample points (debugHeightSamples toggle). |
| terrain/day_cycle.ts, terrain/fox_controller.ts, terrain/grass.wgsl, terrain/grass_pass.ts, terrain/instanced_geometry_pass.ts, terrain/shadow_instanced.wgsl, terrain/fox_touch_controls.ts, terrain/terrain.ts, terrain/fox_explorer.md | Shared with the grassy_hills.html sample; not used by mountain_fox.html. |
| terrain/textures/<material>/ | Per-material PBR maps used by the splat-blended FS; see Material textures. |

Frame timeline#

update(camera, sun, sky, moon, ...)              ── per frame ──
 ├─ updateSunFromTimeOfDay()                     (sun arc + moon antipode + sky tint)
 ├─ TerrainRenderer.update():
 │   ├─ Stream pages around camera (CPU LRU)
 │   ├─ writeBuffer: cameraUBO, paramsUBO (sun+moon+point lights+lightVP),
 │   │                 indirect-args, page-table (if dirty)
 │   └─ submit: page-gen compute (only slots newly assigned this frame)
 ├─ atmosphere.update / cloud.update / star.update / rain emit transform

addToGraph(graph) for each pass, in this order:
 ├─ AtmospherePass            → HDR cleared with sky scatter
 ├─ TerrainRenderer:
 │   ├─ LOD compute            (writes patches[], atomically bumps indirect.instance_count)
 │   ├─ Shadow pass            (drawIndexedIndirect → depth from sun POV)
 │   ├─ Main render pass       (drawIndexedIndirect → 3 attachments: HDR + normal + ambient)
 │   └─ [optional] Debug boundary lines
 ├─ GTAOPass + AOApplyPass     (normal + depth → AO, composited into ambient only)
 ├─ AtmosphericApplyPass       (camera-distance haze, depth-aware)
 ├─ HeightFogPass              (exp altitude falloff, sky-pixel aware)
 ├─ CloudPass                  (volumetric raymarched, overlay on HDR)
 ├─ StarsPass                  (depth==1 gating, fades with sun altitude)
 ├─ ParticlePass (rain)        (HDR sprite forward, depth gated)
 ├─ TAAPass                    (sub-pixel jitter resolve)
 ├─ MotionBlurPass             (camera-only velocity, after TAA)
 ├─ DofPass                    (focal blur)
 ├─ BloomPass                  (HDR highlight extract + blur)
 ├─ DebugPointsPass            (walker ridge-sample crosses, gated to walker mode)
 └─ TonemapPass                (HDR → backbuffer)

Every effect after the terrain render pass is wired through the Effects panel so the user can toggle each one and explore the chain interactively. The sample runs at ~140-160 FPS at 1280×800 on an M-series-class GPU, with frame cost scaling sub-linearly with patch count.

CDLOD on a quadtree, GPU-driven#

The terrain is a heightfield rendered as a continuous distance-LOD quadtree (CDLOD): a tree of square patches, each rasterized as the same shared 33×33 vertex grid, with per-vertex morphing toward the next-coarser LOD's grid as the camera distance approaches the LOD's transition range.

Selection#

Patch selection happens once per frame in terrain_lod.wgsl. The world is divided into a (1 << MAX_DEPTH)² grid of cells (for Phase 4: MAX_DEPTH = 8, 256² = 65 536 cells, dispatched as 32² workgroups). Each thread:

  1. Walks the quadtree from the root downward, hypothetically considering the node it belongs to at each depth.
  2. At each level, applies the subdivision test dist < (level_size * 0.5) * lod_distance_scale, where dist is the camera-to-node nearest-point distance (XZ).
  3. Stops at the first level that doesn't subdivide — that's its LOD.
  4. Emits the patch only if this thread is the "representative" of the node (the lower-left thread of the node's gid range). All other threads in the node bail out, so each leaf is written exactly once with no duplicates and no synchronization.

The shader atomicAdds into indirect_args.instance_count; the count is later consumed by drawIndexedIndirect. There's no CPU readback in the loop.
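The four steps above can be mirrored on the CPU to check that the walk produces a gap-free, duplicate-free set of leaves. This is a minimal sketch, not the contents of terrain_lod.wgsl: the names (selectPatches, nearestDistXZ, PatchSketch), the world origin at (0, 0), and the 8192 m world size used in the usage note are assumptions made for illustration.

```typescript
// CPU sketch of the per-cell quadtree walk. One loop iteration stands in for one GPU thread.
interface PatchSketch { x: number; z: number; size: number; depth: number }

const MAX_DEPTH = 8; // 256 x 256 cells, matching the Phase 4 numbers in the text

// XZ distance from the camera to the nearest point of a square node.
function nearestDistXZ(camX: number, camZ: number, x0: number, z0: number, size: number): number {
  const dx = Math.max(x0 - camX, 0, camX - (x0 + size));
  const dz = Math.max(z0 - camZ, 0, camZ - (z0 + size));
  return Math.sqrt(dx * dx + dz * dz);
}

function selectPatches(worldSize: number, camX: number, camZ: number, lodDistanceScale: number): PatchSketch[] {
  const cells = 1 << MAX_DEPTH;
  const cellSize = worldSize / cells;
  const out: PatchSketch[] = [];
  for (let gz = 0; gz < cells; gz++) {
    for (let gx = 0; gx < cells; gx++) {
      for (let depth = 0; depth <= MAX_DEPTH; depth++) {
        const span = cells >> depth;          // node side length, in cells
        const nx = gx - (gx % span);          // node origin, in cells
        const nz = gz - (gz % span);
        const size = span * cellSize;
        const dist = nearestDistXZ(camX, camZ, nx * cellSize, nz * cellSize, size);
        const subdivide = depth < MAX_DEPTH && dist < size * 0.5 * lodDistanceScale;
        if (subdivide) continue;              // descend one level
        // Only the lower-left cell of the node emits it; every other cell bails out.
        if (gx === nx && gz === nz) {
          out.push({ x: nx * cellSize, z: nz * cellSize, size, depth });
        }
        break;
      }
    }
  }
  return out;
}
```

Calling selectPatches(8192, 100, 100, 5) returns leaves whose areas sum to exactly 8192², which is a quick way to confirm that the representative-thread rule covers every cell once. On the GPU the out.push would be the atomicAdd into indirect_args.instance_count plus a write into the patches buffer.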

Indirect drawing#

Two render passes (shadow + main) both bind:

  • A persistent 33×33 patch index buffer (6 144 indices, never changes).
  • The patches storage buffer (filled by the LOD compute, one descriptor per visible patch: world offset + size + pad).
  • The indirect-args buffer the LOD compute populated.

The draw call in both passes is literally:

enc.setIndexBuffer(this._indexBuffer, 'uint32');
enc.drawIndexedIndirect(this._indirectBuffer, 0);

No vertex buffer at all — the VS derives (i, j) and the UV from vertex_index, then reads the per-instance patch descriptor by instance_index.
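As a rough illustration of how a shared 33×33 grid yields 6 144 indices and how a vertex shader can recover (i, j) and the UV from vertex_index alone, here is a CPU-side sketch. The triangle winding and the helper names (buildPatchIndices, decodeVertex) are assumptions; the source does not show the actual index layout.

```typescript
const GRID = 33; // vertices per patch side

// Two triangles per quad over a 32 x 32 quad grid.
function buildPatchIndices(): Uint32Array {
  const quads = GRID - 1;
  const indices = new Uint32Array(quads * quads * 6);
  let k = 0;
  for (let j = 0; j < quads; j++) {
    for (let i = 0; i < quads; i++) {
      const v00 = j * GRID + i;
      const v10 = v00 + 1;
      const v01 = v00 + GRID;
      const v11 = v01 + 1;
      // Winding order is an assumption; flip it to match the pipeline's frontFace.
      indices[k++] = v00; indices[k++] = v01; indices[k++] = v10;
      indices[k++] = v10; indices[k++] = v01; indices[k++] = v11;
    }
  }
  return indices;
}

// What the VS would do with vertex_index: recover grid coordinates and a 0..1 UV.
function decodeVertex(vertexIndex: number): { i: number; j: number; u: number; v: number } {
  const i = vertexIndex % GRID;
  const j = Math.floor(vertexIndex / GRID);
  return { i, j, u: i / (GRID - 1), v: j / (GRID - 1) };
}
```

Because the index values themselves encode the grid position, the same buffer serves every patch at every LOD; only the per-instance patch descriptor changes.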

The boundary-cracks problem#

This is the part that needed the most thought, so it's worth walking through in detail.

What goes wrong with naive CDLOD#

In CDLOD, each LOD has a max view distance lod_max = size * scale. A vertex morphs toward its even-grid neighbor by a factor t = clamp((d - lod_start) / (lod_max - lod_start), 0, 1) where lod_start = lod_max * (1 - morph_zone) and d is the camera-to-vertex distance.

The intent is: when this LOD reaches the end of its range, all its vertices are fully morphed (t = 1) to the next-coarser grid. The next-coarser LOD takes over with t = 0 (no morph), so the seam between them lies on the coarser grid — vertices match.
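The morph factor defined above is easy to evaluate numerically. The sketch below follows the formulas as stated (lod_max = size · scale, lod_start = lod_max · (1 − morph_zone)); the function names and the one-dimensional morphGridCoord helper, which slides odd grid vertices onto their even neighbor, are illustrative assumptions rather than the shader's actual code.

```typescript
// Per-vertex morph factor t for a patch of the given size at camera distance d.
function morphFactor(d: number, size: number, scale: number, morphZone: number): number {
  const lodMax = size * scale;
  const lodStart = lodMax * (1 - morphZone);
  const t = (d - lodStart) / (lodMax - lodStart);
  return Math.min(Math.max(t, 0), 1);
}

// Move a grid coordinate toward the next-coarser (even) grid.
// Even vertices stay put; odd vertices slide onto the even neighbor below them as t goes to 1.
function morphGridCoord(i: number, t: number): number {
  return i - (i % 2) * t;
}
```

With size = 32, scale = 5 and morph_zone = 0.15, morphing starts at about 136 m and completes at 160 m, so morphFactor(148, 32, 5, 0.15) is roughly 0.5 and morphGridCoord(3, 1) lands on vertex 2 of the fine grid, which coincides with a vertex of the coarser grid.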

The problem appears when a finer patch (LOD D+1) sits next to a coarser patch (LOD D) — which happens whenever the quadtree subdivides unequally. The finer patch's edge that faces the coarser patch needs:

  • finer-side t = 1 (fully morphed, lies on the coarser grid)
  • coarser-side t = 0 (un-morphed, at native coarser positions)

The finer-side condition is automatic: the boundary distance from the camera is greater than lod_max_fine, so t saturates to 1. The coarser-side condition isn't — it requires that every vertex on the coarser patch's edge that faces the finer patch have t = 0, i.e. distance <= lod_start_coarse.

Where the bound comes from#

A coarser patch is emitted as a leaf when its nearest-point distance nearest_dist >= 0.5 * size * scale (the subdivision threshold). Its far-corner distance can be up to nearest_dist + √2 * size (the patch's diagonal pointed away from the camera).

For the seam to hold at every vertex on the coarse edge, we need

nearest_dist + √2 · size  <  lod_start
0.5 · size · scale + √2 · size  <  (1 - morph_zone) · size · scale
0.5 + √2 / scale  <  1 - morph_zone
scale · (0.5 - morph_zone)  >  √2  ≈ 1.414

This is the CDLOD seam constraint — solve it for valid (scale, morph_zone) pairs:

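| scale | morph_zone | scale·(0.5−morph_zone) | Safe? |
|-------|------------|------------------------|-------|
| 3.0 | 0.30 | 0.60 | ❌ (initial, broken) |
| 5.0 | 0.15 | 1.75 | ✅ (final values) |
| 5.0 | 0.30 | 1.00 | borderline |
| 7.0 | 0.30 | 1.40 | ≈ √2 |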

The first row is what the renderer shipped with originally. Every configuration of adjacent-different-LOD patches produced cracks, because the coarse patch's far corners landed inside its own morph zone (t > 0) while the adjacent finer patch was already fully morphed to the coarse grid (t = 1). A 1-texel vertical mismatch at thousands of patch boundaries is exactly the pattern of sub-pixel holes through which the sky bleeds.
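The seam constraint scale · (0.5 − morph_zone) > √2 can be checked mechanically for any candidate pair. The helper names below (seamMargin, seamSafe, minScaleFor) are made up for this sketch; the inequality itself is the one derived above.

```typescript
// Left-hand side of the CDLOD seam constraint.
function seamMargin(scale: number, morphZone: number): number {
  return scale * (0.5 - morphZone);
}

// True when the worst-case far corner of a coarse leaf stays outside its own morph zone.
function seamSafe(scale: number, morphZone: number): boolean {
  return seamMargin(scale, morphZone) > Math.SQRT2;
}

// Smallest scale that satisfies the constraint for a given morph zone (requires morphZone < 0.5).
function minScaleFor(morphZone: number): number {
  return Math.SQRT2 / (0.5 - morphZone);
}
```

For morph_zone = 0.15 the minimum scale comes out near 4.04, so scale = 5 leaves some headroom, while morph_zone = 0.30 would need a scale above roughly 7.07.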

What about per-patch morph?#

A natural-looking alternative is to compute t once per patch (from the patch's nearest-point distance) and reuse it for every vertex. This does fix the cross-LOD case — when the coarse patch is "not yet morphing," none of its vertices move.

But it breaks the same-LOD case: two adjacent same-LOD patches at slightly different center distances get slightly different per-patch t values. The shared-edge vertex morphs differently on each side → fresh cracks at every same-LOD boundary, which is much more common than cross-LOD.

Per-vertex morph + correct constants is the only configuration where both boundary types work, and scale = 5, morph_zone = 0.15 is what we ended up with.

Stitching across mip levels#

A subtler issue: the heightmap is sampled at a different mip level per LOD (finest patches at mip 0, coarsest at mip 5). At a cross-LOD boundary, the two sides would otherwise sample different mips at the same XZ → different heights → vertical mismatch.

Solution: blend mips by the morph factor:

let mip_native = log2(tile.size) - 5.0;
let h = sample_height_mip(world_xz, mip_native + t);

When the finer side is fully morphed (t = 1), it samples at mip_native + 1 — exactly the coarser side's mip_native. Same XZ, same mip → bit-identical height → seam closes.
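A two-line numeric check makes the mip-matching argument concrete. The sketch assumes, consistent with log2(tile.size) − 5, that the finest patch is 32 m wide; the function names are illustrative.

```typescript
// Native mip for a patch: 32 m patches sample mip 0, 64 m patches mip 1, and so on.
function mipNative(patchSize: number): number {
  return Math.log2(patchSize) - 5;
}

// Mip actually sampled by a vertex with morph factor t.
function sampledMip(patchSize: number, t: number): number {
  return mipNative(patchSize) + t;
}
```

A fully morphed 32 m patch (t = 1) and an un-morphed 64 m neighbor (t = 0) both evaluate to mip 1, which is why the two sides of the seam read the same height.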

Phase 3+ note. After moving to the VT atlas (no mip chain on the atlas), this mip-blending becomes a no-op (mip = 0 either way). The seam still holds because the CDLOD geometry constraints above guarantee vertex-position matching independent of the texture path.

Virtual texturing#

Phase 3 swaps the single mip-chained heightmap for an indirected virtual-texture atlas.

| Component | Detail |
|-----------|--------|
| Atlas | texture_2d_array<rgba16float>, 256×256 per layer, 16 layers. r = height; g/b = normal.x / .z (normal.y reconstructed at sample time). |
| Page table | storage<u32>[1024] (Phase 4 size). One entry per virtual page; value = atlas layer index, or 0xFFFFFFFF ("not resident"). |
| Page-gen compute | Per-slot dispatch. Generates 256² rgba16f texels from the same procedural noise the VS would otherwise have sampled directly; height and normal come from the same noise call so they're always in sync. |
| Page-gen params | One UBO per atlas slot, carrying (page_origin_xz, texel_size, height_scale, slot_index). Rewritten when the streamer reassigns a slot to a new virtual page. |

Indirection in the shader#

fn sample_atlas(world_xz: vec2f) -> vec4f {
  let c = vt_lookup(world_xz);                      // virtual UV → (slot, page UV)
  let invalid = c.slot == VT_INVALID_SLOT;
  let safe_slot = select(c.slot, 0u, invalid);      // avoid OOB index in branch
  let v = textureSample(atlas, atlas_sampler, c.page_uv, i32(safe_slot));
  return select(v, vec4f(0.0), invalid);
}

The select-then-mask pattern (rather than if-return) is mandatory because textureSample requires uniform control flow to compute consistent derivatives: an early return on a per-fragment slot would diverge fragment-to-fragment, and the validator (correctly) rejects it. textureSampleLevel in the VS, and textureSampleCompareLevel for shadows, don't have this restriction; we picked them deliberately where divergent flow is hard to avoid.

World-tile streaming#

Phase 4 expands the virtual world to 8192 m × 8192 m (64× the area Phase 3 had) while keeping the physical atlas at 16 slots. Only the pages near the camera are resident; the rest are paged out.

Page manager (CPU)#

VirtualPageManager is a small LRU layer above the atlas:

  • Map<vp, slot> keyed in insertion order = LRU. "Touching" a page is delete + re-set, which moves it to the tail.
  • _freeSlots: number[] pop from the back; eviction grabs the head of the Map iteration (oldest non-desired page).
  • A CPU mirror of the page table (Uint32Array(1024)) is uploaded only when a frame actually changed the mapping.
  • Newly-assigned slots accumulate in a _pending list that the renderer drains into per-slot page-gen dispatches.

The "desired" set each frame is a (2 × VT_STREAMING_RING)² = 4 × 4 = 16 page window around the camera's containing virtual page — exactly the size of the physical atlas, so steady-state has no eviction churn; only camera motion across a page boundary triggers a single evict + gen cycle.
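The bullets above describe a Map-as-LRU scheme that fits in a few dozen lines. The following is a minimal sketch of that idea, not the actual VirtualPageManager: the class name PageManagerSketch, the drainPending method, the desiredWindow helper, and the row-major page numbering are all assumptions for illustration.

```typescript
const VT_INVALID_SLOT = 0xffffffff;

interface PendingPage { vp: number; slot: number }

class PageManagerSketch {
  private resident = new Map<number, number>(); // vp -> slot; insertion order doubles as LRU order
  private freeSlots: number[] = [];
  private pending: PendingPage[] = [];
  readonly pageTable: Uint32Array;              // CPU mirror of the GPU page table
  dirty = false;                                // upload the mirror only when this is set

  constructor(slotCount: number, pageCount: number) {
    for (let s = slotCount - 1; s >= 0; s--) this.freeSlots.push(s);
    this.pageTable = new Uint32Array(pageCount).fill(VT_INVALID_SLOT);
  }

  update(desired: number[]): void {
    const want = new Set<number>(desired);
    for (let i = 0; i < desired.length; i++) {
      const vp = desired[i];
      const slot = this.resident.get(vp);
      if (slot !== undefined) {                 // touch: delete + re-set moves it to the tail
        this.resident.delete(vp);
        this.resident.set(vp, slot);
        continue;
      }
      let target = this.freeSlots.pop();
      if (target === undefined) target = this.evictOldest(want);
      if (target === undefined) continue;       // every resident page is still desired
      this.resident.set(vp, target);
      this.pageTable[vp] = target;
      this.pending.push({ vp, slot: target });
      this.dirty = true;
    }
  }

  // Hands the newly assigned slots to the caller, which would issue one page-gen dispatch each.
  drainPending(): PendingPage[] {
    const out = this.pending;
    this.pending = [];
    return out;
  }

  private evictOldest(want: Set<number>): number | undefined {
    const keys = Array.from(this.resident.keys()); // head of the iteration = least recently used
    for (let i = 0; i < keys.length; i++) {
      const vp = keys[i];
      if (want.has(vp)) continue;
      const slot = this.resident.get(vp);
      this.resident.delete(vp);
      this.pageTable[vp] = VT_INVALID_SLOT;
      this.dirty = true;
      return slot;
    }
    return undefined;
  }
}

// (2 * ring)^2 window of virtual page ids around the camera's page, clipped to the world.
function desiredWindow(camPageX: number, camPageZ: number, ring: number, pagesPerSide: number): number[] {
  const out: number[] = [];
  for (let dz = -ring; dz < ring; dz++) {
    for (let dx = -ring; dx < ring; dx++) {
      const px = camPageX + dx;
      const pz = camPageZ + dz;
      if (px < 0 || pz < 0 || px >= pagesPerSide || pz >= pagesPerSide) continue;
      out.push(pz * pagesPerSide + px);
    }
  }
  return out;
}
```

In this sketch, with 16 slots and a 32×32 page world, the first update queues 16 page-gen jobs; stepping the camera one page along X then evicts the trailing column of four pages and queues the four pages of the leading column, with no other churn.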

Submit ordering#

Page-gen is its own command buffer submitted before the render graph's buffer. WebGPU's queue ordering ensures that queue.submit([pageGen]) followed by queue.submit([graph]) produces the same observable ordering as recording both in one encoder — the atlas writes are visible to the render pass without any explicit barrier.

Beyond the streamed window, VT_INVALID_SLOT returns vec4(0), so the FS falls into the grass + low-slope splat at h=0, producing a flat green plain. That's the visible "streaming edge" in screenshots. A real game would keep a permanently-resident coarse world overview in a reserved atlas slot so distance has something to draw; that reserved-slot strategy is left as future work.

PBR materials#

Four layers (splat order: low-altitude → mid → steep → peaks), splat-blended per fragment with height + slope weights and a low-frequency fBm jitter so layer boundaries meander naturally instead of forming clean contour lines. Each layer is a real-world PBR scan (1024² JPG/PNG/HDR) packed into three texture_2d_array<f32> resources (albedo + normal + roughness), plus a fourth array for the parallax displacement map. A constant per-layer LAYER_METALLIC rounds out the BRDF inputs.

Material textures#

Source maps live under samples/terrain/textures/<material>/:

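| Slot | Material | Albedo | Normal | Roughness | Displacement |
|------|----------|--------|--------|-----------|--------------|
| 0 | rocky_terrain_02 (low alt) | JPG | HDR (gl) | HDR | PNG |
| 1 | rocky_terrain (mid) | JPG | HDR (gl) | HDR | PNG |
| 2 | rock_face (steep) | JPG | HDR (gl) | HDR | PNG |
| 3 | snow_02 (peaks) | JPG | HDR (gl) | JPG | PNG |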

Loaded asynchronously by TerrainRenderer.loadLayerTextures() immediately after create(). JPG / PNG go through createImageBitmap + copyExternalImageToTexture (hardware decode); HDR (Radiance RGBE) is fetched and run through the engine's parseHdr, then a CPU-side RGBE → 8-bit unorm decode lands the values in a matching rgba8unorm array slice — so every layer ends up in the same format and the shader can sample uniformly. RENDER_ATTACHMENT is requested on each array texture because Chrome implements copyExternalImageToTexture via an internal render pass.

Until an individual map resolves, the layer falls back to the defaults written into the array at create time: albedo = white, normal = flat (0.5, 0.5, 1.0), roughness = 0.5, displacement = 0.5. So the first frames shade legitimately as the textures stream in rather than rendering black.
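For readers unfamiliar with Radiance RGBE, the decode to 8-bit unorm mentioned above might look like the following. This is a sketch under assumptions: it takes raw RGBE bytes per pixel, whereas the engine's parseHdr may already return floats, and the function name rgbeToUnorm8 is invented here. The shared-exponent formula (mantissa · 2^(e − 136)) is the standard RGBE decode.

```typescript
// Decode one RGBE pixel to three 8-bit unorm channel values (0..255).
// Values above 1.0 are clamped, which is acceptable for normal / roughness data stored in [0, 1].
function rgbeToUnorm8(r: number, g: number, b: number, e: number): [number, number, number] {
  if (e === 0) return [0, 0, 0];            // RGBE convention: zero exponent means black
  const scale = Math.pow(2, e - 136);       // 2^(e - 128) / 256
  const to8 = (mantissa: number): number => {
    const linear = Math.min(Math.max(mantissa * scale, 0), 1);
    return Math.round(linear * 255);
  };
  return [to8(r), to8(g), to8(b)];
}
```

The decoded bytes can then be written into the matching rgba8unorm array slice, so HDR-sourced maps end up in the same format as the JPG / PNG ones.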

Splat weights#

let h_norm = clamp(world_pos.y / max(height_scale * 0.9, 1e-3), 0, 1);
let jitter = (fbm2(world_pos.xz * 0.018) - 0.5) * 0.10;

let grass = smoothstep(0.30 + jitter, 0.0,  h_norm)
          * smoothstep(0.50, 0.18, slope);
let dirt  = smoothstep(0.06, 0.20, h_norm)
          * smoothstep(0.55, 0.30, h_norm + jitter)
          * smoothstep(0.55, 0.25, slope);
let rock  = max(
  smoothstep(0.32 - jitter, 0.55, slope),
  smoothstep(0.40 + jitter, 0.70, h_norm) * 0.7,
);
let snow  = smoothstep(0.62 + jitter, 0.82, h_norm)
          * smoothstep(0.55, 0.30, slope);

The fBm jitter is wide-period (0.018 m⁻¹, so ~55 m period) and modulated into the smoothstep edges, not the result — that means each transition moves spatially rather than fading, which reads as a natural meander instead of a thresholded animation. Weights are normalized so they always sum to 1.
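To see how the four weights behave before and after normalization, the WGSL above can be ported to the CPU almost verbatim. The port below is a sketch: splatWeights and its smoothstep helper are illustrative names, the jitter is passed in rather than computed from fbm2, and the 1e-5 floor on the sum is an assumption to avoid dividing by zero.

```typescript
// Hermite smoothstep; like the shader code, it is also called with descending edges.
function smoothstep(edge0: number, edge1: number, x: number): number {
  const t = Math.min(Math.max((x - edge0) / (edge1 - edge0), 0), 1);
  return t * t * (3 - 2 * t);
}

// Returns normalized [grass, dirt, rock, snow] weights.
function splatWeights(hNorm: number, slope: number, jitter: number): [number, number, number, number] {
  const grass = smoothstep(0.30 + jitter, 0.0, hNorm) * smoothstep(0.50, 0.18, slope);
  const dirt = smoothstep(0.06, 0.20, hNorm)
    * smoothstep(0.55, 0.30, hNorm + jitter)
    * smoothstep(0.55, 0.25, slope);
  const rock = Math.max(
    smoothstep(0.32 - jitter, 0.55, slope),
    smoothstep(0.40 + jitter, 0.70, hNorm) * 0.7,
  );
  const snow = smoothstep(0.62 + jitter, 0.82, hNorm) * smoothstep(0.55, 0.30, slope);
  const sum = Math.max(grass + dirt + rock + snow, 1e-5); // assumed guard against a zero sum
  return [grass / sum, dirt / sum, rock / sum, snow / sum];
}
```

For a low, gentle fragment (hNorm = 0.1, slope = 0.1) the grass layer dominates; for a high, flat one (hNorm = 0.95, slope = 0.05) snow wins over the altitude-driven rock term.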

Anti-tile sampling — reducing the tile pattern#

With a 1024² texture repeating every LAYER_TEXTURE_WORLD_REPEAT = 32 m, a naïve textureSample on world_xz / 32 would put the same tile on every patch of every material. The repetition is easy to spot at altitude (a regular grid of high-contrast features) and undermines the natural look that motivated using PBR scans in the first place.

The fix is a two-step attack on the regular grid, all inside anti_tile() in terrain.wgsl:

  1. Low-frequency UV warp. A smooth_cell_hash (value-noise built from a sin-hash) drives a non-uniform offset every ~1.4 tile-periods. Adjacent patches sample warped-and-shifted versions of the texture, so the regularity of the underlying grid is hidden under a wobble.

  2. Two rotated phases, hash-blended. A second textureSample reads a 90° rotation of the warped UV scaled by a non-integer factor (× 0.527) plus an arbitrary offset. The two phases are blended by a third smooth_cell_hash at a different spatial scale than the warp, so the two mechanisms don't reinforce each other into a visible pattern of their own.

fn anti_tile(t: texture_2d_array<f32>, s: sampler, uv: vec2f, layer: i32) -> vec4f {
  let warp = vec2f(
    smooth_cell_hash(uv * 0.7 + vec2f(2.13, 5.41)),
    smooth_cell_hash(uv * 0.7 + vec2f(7.71, 1.97)),
  ) - 0.5;
  let uv_w = uv + warp * 0.55;

  let a = textureSample(t, s, uv_w, layer);
  let uv2 = vec2f(-uv_w.y, uv_w.x) * 0.527 + vec2f(0.327, 0.713);
  let b = textureSample(t, s, uv2, layer);

  let h = smooth_cell_hash(uv * 0.21 + vec2f(11.3, 4.7));
  return mix(a, b, smoothstep(0.2, 0.8, h));
}

Every per-layer sample (albedo, normal, roughness, displacement) goes through this — two textureSamples per map, so per fragment the inner loop fans out to 4 maps × 4 layers × 2 phases = 32 texture samples at full splat occupancy. The cost stays acceptable because the LOD compute keeps patch counts modest and the splat weights commonly zero out 2-3 layers per fragment in hardware (the compiler doesn't elide them, but the cache makes the wasted reads cheap).

Two alternatives that didn't survive: per-tile detail-noise modulation of albedo was tried first; it is cheaper, but doesn't actually hide the grid, it just smears it. Triplanar sampling (the canonical way to deal with slanted UVs on cliffs) multiplies the texture-sample count by 3 across all maps; with the layer count and anti-tile-phase count we'd be at ~96 samples/fragment, and the rocky source textures stretch acceptably on slopes anyway.

Parallax mapping#

Each layer's displacement map drives a one-sample parallax offset on the UV fed to the other three maps:

fn layer_uv_parallax(idx: i32, world_pos: vec3f, n_world: vec3f, view_world: vec3f) -> vec2f {
  let uv = layer_uv(world_pos);
  // Build TBN with world-X as the tangent reference (gram-schmidt).
  ...
  let view_ts = vec3f(dot(t, view_world), dot(b, view_world), dot(n_world, view_world));
  let h = anti_tile(disp_tex, samp, uv, idx).r - 0.5;
  let offset = (view_ts.xy / max(view_ts.z, 0.3))
             * h * (PARALLAX_STRENGTH / LAYER_TEXTURE_WORLD_REPEAT);
  return uv - offset;
}

The (view_ts.xy / view_ts.z) term undoes perspective on the offset so grazing angles don't push the UV by tens of texels; the max(.., 0.3) clamps the asymptote (at view_ts.z → 0 the divide would explode). Centering the displacement around 0.5 (rather than 0 → height) means flat regions get no offset and the effect symmetrically lifts ridges / deepens cracks instead of pushing everything one direction. PARALLAX_STRENGTH = 1.0 and the 1/repeat factor scale the offset to a small fraction of one tile.

This is the cheapest end of parallax — no occlusion, no raymarch. Visually convincing for low-relief rocky detail; grazing-angle artifacts are visible but mild. The same parallax-offset UV is used by every layer's albedo, normal, and roughness sample within a fragment, so a steep view doesn't cause the layers to "slide" against each other.
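The offset formula is small enough to evaluate by hand, which helps when tuning PARALLAX_STRENGTH. The sketch below reproduces only the arithmetic of the offset (not the TBN construction or the texture fetch); parallaxOffset is an illustrative name and the tangent-space view vector is passed in directly.

```typescript
const PARALLAX_STRENGTH = 1.0;
const LAYER_TEXTURE_WORLD_REPEAT = 32;

// UV offset for a tangent-space view vector and a displacement sample in [0, 1].
function parallaxOffset(viewTs: [number, number, number], displacement: number): [number, number] {
  const h = displacement - 0.5;                        // centered: 0.5 means "no offset"
  const invZ = 1 / Math.max(viewTs[2], 0.3);           // clamp the grazing-angle asymptote
  const k = h * (PARALLAX_STRENGTH / LAYER_TEXTURE_WORLD_REPEAT);
  return [viewTs[0] * invZ * k, viewTs[1] * invZ * k];
}
```

With these constants the offset magnitude can never exceed 0.5 / 0.3 / 32 ≈ 0.052 of a tile for a unit view vector, which matches the "small fraction of one tile" description.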

Tangent-space normals#

layer_world_normal builds a TBN per fragment by Gram-Schmidt'ing world-X against the atlas-derived surface normal (Z falls out as the cross product). Because the XZ-UV mapping aligns texture.x with world.x, this gives a consistent tangent basis without needing a stored tangent attribute. The sampled normal map is decoded as map * 2 - 1 (gl convention), then rotated by the TBN and renormalized.

Per-vertex normals were tried first and dropped: along a shared edge a finer patch interpolates the normal over more samples than the coarser side, which produces a visible shading discontinuity at every LOD seam. Per-fragment resampling of the atlas normal sidesteps that entirely — both sides resolve the same world XZ to the same sampled normal.
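The Gram-Schmidt construction described at the start of this section can be sketched on the CPU as follows. This is an assumption-laden illustration, not the body of layer_world_normal: the bitangent handedness (cross(t, n), which gives +Z on flat ground) and the function names are choices made here, and the basis degenerates if the surface normal is parallel to world X.

```typescript
type Vec3 = [number, number, number];

const dot = (a: Vec3, b: Vec3): number => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const cross = (a: Vec3, b: Vec3): Vec3 => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];
function normalize(v: Vec3): Vec3 {
  const len = Math.sqrt(dot(v, v));
  return [v[0] / len, v[1] / len, v[2] / len];
}

// Tangent = world X with its component along n removed; bitangent from the cross product.
function buildTbn(n: Vec3): { t: Vec3; b: Vec3; n: Vec3 } {
  const d = n[0]; // dot(worldX, n)
  const t = normalize([1 - n[0] * d, -n[1] * d, -n[2] * d]);
  const b = cross(t, n); // handedness is an assumption
  return { t, b, n };
}

// Decode a [0, 1] normal-map texel (gl convention) and rotate it into world space.
function tangentToWorld(texel: Vec3, n: Vec3): Vec3 {
  const { t, b } = buildTbn(n);
  const m: Vec3 = [texel[0] * 2 - 1, texel[1] * 2 - 1, texel[2] * 2 - 1];
  return normalize([
    t[0] * m[0] + b[0] * m[1] + n[0] * m[2],
    t[1] * m[0] + b[1] * m[1] + n[1] * m[2],
    t[2] * m[0] + b[2] * m[1] + n[2] * m[2],
  ]);
}
```

A flat texel (0.5, 0.5, 1.0) maps back to the surface normal itself, which is the same default the loader writes before the real normal maps arrive.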

Direct lighting (multi-light BRDF)#

The blended material is shaded once per direct light by a clamp-protected Cook-Torrance GGX (brdf_direct). Three light terms add into the final direct contribution:

  1. Sun: params.sun_color multiplied by the shadowed BRDF.
  2. Moon: params.moon_color multiplied by the unshadowed BRDF (the single cascade only tracks the sun; the moon contribution is cheap fill light).
  3. Forward point lights: a loop over up to MAX_POINT_LIGHTS = 8 dynamic lights, each with position + radius + linear color + intensity. Fast attenuation (1 − d²/r²)² / (d² + 1), clamped to 0, with a hard cutoff at the radius. The sample uses three by default: a "torch" that follows the camera and two static fires near spawn (heights pulled from terrainHeight() so they sit on the surface).
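The point-light falloff in item 3 is a windowed inverse-square curve, and it is worth plotting a few values to see how the radius cutoff behaves. The function name below is illustrative; the formula is the one quoted in the list.

```typescript
// (1 - d^2/r^2)^2 / (d^2 + 1), clamped to 0, with a hard cutoff at the radius.
function pointLightAttenuation(d: number, radius: number): number {
  if (d >= radius) return 0;
  const d2 = d * d;
  const windowTerm = Math.max(1 - d2 / (radius * radius), 0);
  return (windowTerm * windowTerm) / (d2 + 1);
}
```

The + 1 in the denominator keeps the value finite at d = 0 (it evaluates to exactly 1 there), and the squared window term brings both the value and its slope to zero at the radius, so lights fade out without a visible ring.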

Numerical care that matters:

  • All dot products are wrapped in clamp(..., 0, 1): dot(unit, unit) can land just past 1 from f32 rounding, and pow(1 - vdh, 5) of a slightly negative input is undefined per the WGSL spec. That was a NaN source that TAA history feedback spread into expanding black regions early in development.
  • Specular capped at 50 — GGX's D-term diverges as roughness drops and ndh → 1, and a single overflow pixel in the rgba16f HDR target poisons its neighborhood through TAA's bilinear sampler.
  • Final clamp(direct + ambient, 0, 100) as a defensive last line.

Ambient is still a placeholder albedo * sky_color * 0.3. Real IBL is future work — but with the ambient now written to its own G-Buffer attachment, slotting in an irradiance LUT would touch only the FS, not the chain that consumes the ambient downstream.

G-Buffer outputs & AO compositing#

The terrain FS writes three color attachments, not just an HDR:

struct FsOut {
  @location(0) hdr:     vec4f,   // direct + ambient (full lit), rgba16float
  @location(1) normal:  vec4f,   // xyz = shading normal in world space
  @location(2) ambient: vec4f,   // rgb = ambient-only contribution
}

normal is the shading normal after the per-layer normal map perturbation + Gram-Schmidt rotation, i.e. the normal an SSR / motion-blur pass would want, not the raw geometric normal. ambient is exactly the albedo · sky_color · 0.30 term, separated out so downstream effects know how much of the HDR is "indirect" light.

The reason for the split is GTAO compositing. AO in proper deferred pipelines attenuates indirect / ambient light only — direct sun shouldn't get darker in a crevice; the BRDF + shadow map already handle that correctly. With ambient as its own buffer the composite (AOApplyPass) just evaluates:

final = direct + ambient · AO
      = (hdr − ambient) + ambient · AO
      = hdr − ambient · (1 − AO)

so neither the FS nor the AO pass needs to know the direct component separately. gtao.addToGraph consumes the normal + depth from the terrain pass and emits a single-channel AO image; aoApply then runs the subtraction in one fullscreen draw. The whole story stays inside two texture reads and one MAD.

The normal and ambient attachments are cleared to zero per frame so non-terrain pixels (sky written by AtmospherePass) drop out of the AO and ambient-only subtraction cleanly.
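The composite identity above is simple enough to verify per pixel on the CPU. The sketch below uses an illustrative aoApply function operating on RGB triples; the real AOApplyPass does the same arithmetic in a fullscreen fragment shader.

```typescript
type Rgb = [number, number, number];

// final = hdr - ambient * (1 - AO), evaluated per channel.
function aoApply(hdr: Rgb, ambient: Rgb, ao: number): Rgb {
  const k = 1 - ao;
  return [hdr[0] - ambient[0] * k, hdr[1] - ambient[1] * k, hdr[2] - ambient[2] * k];
}
```

Two properties fall out directly: with AO = 1 the HDR passes through unchanged, and wherever the ambient attachment is zero (sky pixels cleared at the start of the frame) the subtraction is a no-op regardless of the AO value.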

Sun arc, moon, and time of day#

The Effects panel exposes timeOfDay ∈ [0, 1] mapping to a 24-hour clock (0 = sunrise, 0.25 = noon, 0.5 = sunset, 0.75 = midnight). Each frame updateSunFromTimeOfDay() recomputes:

  • Sun direction along an east → up → west arc, sweeping through full elevation sin(angle) so the sun goes below the horizon for half the cycle.
  • Sun color ramped from black to warm white across the first ~15° of elevation, so dawn / dusk read warm and night turns direct lighting off entirely.
  • Moon direction as the antipode of the sun (azimuth flipped, elevation negated). Comes up exactly when the sun goes down.
  • Moon color a pale cool blue scaled by a MOON_INTENSITY = 0.18 fudge factor — the real-world sun:moon ratio is ~10⁵, but at the exposures the tonemap uses, 0.18 reads as "just enough surface detail at night."
  • Sky color linearly blended from [0.55, 0.72, 0.92] daytime to [0.02, 0.03, 0.08] near-black-indigo nighttime. Used by both the terrain FS ambient term and the cloud-pass ambient input.

The moon is fed through the same params.moon_dir / moon_color slots the terrain shader already reads and treated as a second unshadowed directional light. Setting moon_color = (0, 0, 0) (which the host does during the day) elides its contribution at no shader cost — the BRDF result gets multiplied by zero.
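The mapping from timeOfDay to light directions can be sketched as below. Several details are assumptions rather than facts from the sample: the arc lies in the XY plane with +X as east, the ~15° ramp is modeled as a smoothstep on the sine of the elevation, and the moon reuses the same ramp. Only the 0 = sunrise / 0.25 = noon convention, the antipodal moon, and MOON_INTENSITY = 0.18 come from the text.

```typescript
type Vec3 = [number, number, number];

const MOON_INTENSITY = 0.18;
const RAMP_TOP = Math.sin((15 * Math.PI) / 180); // sine of the ~15 degree ramp elevation

function smooth01(x: number): number {
  const t = Math.min(Math.max(x, 0), 1);
  return t * t * (3 - 2 * t);
}

function sunMoonFromTimeOfDay(timeOfDay: number): {
  sunDir: Vec3; moonDir: Vec3; sunStrength: number; moonStrength: number;
} {
  const angle = timeOfDay * 2 * Math.PI;                        // 0 = sunrise, 0.25 = noon
  const sunDir: Vec3 = [Math.cos(angle), Math.sin(angle), 0];   // east -> up -> west (axes assumed)
  const moonDir: Vec3 = [-sunDir[0], -sunDir[1], -sunDir[2]];   // antipode of the sun
  const sunStrength = smooth01(sunDir[1] / RAMP_TOP);           // 0 below the horizon
  const moonStrength = smooth01(moonDir[1] / RAMP_TOP) * MOON_INTENSITY;
  return { sunDir, moonDir, sunStrength, moonStrength };
}
```

In this sketch the host would multiply a warm-white sun color by sunStrength and a pale blue moon color by moonStrength before writing params.sun_color and params.moon_color; at noon the moon term is exactly zero, which matches the "multiplied by zero" behavior described above.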

The atmosphere pass (AtmospherePass) takes the same sun direction every frame, so the sky color rebuilds physically as the sun arcs over.

Aerial perspective & height fog#

Two sample-local post-process passes layer atmospheric haze onto the lit terrain:

  • AtmosphericApplyPass — distance-based aerial perspective. Inlines the engine's Rayleigh + Mie + ozone scatter math for a horizontal ray, uses the camera-to-fragment distance as the blend weight along smoothstep(begin, end), and mixes the lit color toward the scattered fog color. The fog color is recomputed each frame from the current sun direction, so distant terrain tints warm at dusk and cool at midday without any extra inputs.

  • HeightFogPass — exponential fog with altitude falloff. Density at fragment Y = base_density · exp(-falloff · max(0, h_avg − fog_base)), so the haze concentrates in valleys and clears with elevation. The fog color is set to a tint of the active sky color every frame so dusk fog reads warm and night fog reads near-black-blue.

Both passes skip sky pixels (depth == 1.0) — AtmospherePass already filled those with the physically-correct horizon color, and re-blending would double-tint them. The two passes layer cleanly because aerial perspective is distance-only and height fog is altitude-only — orthogonal axes that compose by chained mix.
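The two fog terms reduce to a pair of scalar functions, sketched here on the CPU. The density formula and the smoothstep(begin, end) blend are taken from the bullets above; the function names, the intensity multiplier, and the per-pixel applyAerial wrapper with its depth test are illustrative assumptions.

```typescript
type Rgb = [number, number, number];

// Height-fog density: constant below fogBase, exponential falloff above it.
function heightFogDensity(hAvg: number, baseDensity: number, falloff: number, fogBase: number): number {
  return baseDensity * Math.exp(-falloff * Math.max(0, hAvg - fogBase));
}

// Aerial-perspective blend weight from camera distance.
function aerialBlend(distance: number, begin: number, end: number): number {
  const t = Math.min(Math.max((distance - begin) / (end - begin), 0), 1);
  return t * t * (3 - 2 * t);
}

// Mix the lit color toward the scattered fog color; sky pixels (depth == 1.0) are skipped.
function applyAerial(lit: Rgb, fog: Rgb, depth: number, distance: number,
                     begin: number, end: number, intensity: number): Rgb {
  if (depth === 1.0) return lit;
  const w = Math.min(aerialBlend(distance, begin, end) * intensity, 1);
  return [
    lit[0] + (fog[0] - lit[0]) * w,
    lit[1] + (fog[1] - lit[1]) * w,
    lit[2] + (fog[2] - lit[2]) * w,
  ];
}
```

How the height-fog density is turned into an opacity along the view ray is not covered here, since the text does not specify it.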

The order in mountain_fox.ts is AO composite → aerial perspective → height fog → clouds. Clouds composite on top of fogged terrain so cloud shadows / silhouettes sit at full saturation against the hazy backdrop, as they would in real atmosphere.

Walker mode & CPU/GPU height sync#

Pressing F swaps the active controller between the engine's CameraController (free-fly) and WalkerController. Walker mode clamps the eye to terrainHeight(x, z) + EYE_HEIGHT, applies gravity + jump on Space, and lifts the eye above any nearby ridge so the camera doesn't poke into the terrain on a steep upslope. Both controllers share yaw/pitch conventions so the look direction is preserved across mode switches.

Grounding the walker means querying the terrain height at the player's XZ every frame, with no GPU readback. That's done by terrain/terrain_noise.ts, a CPU port of the procedural noise in terrain_page_gen.wgsl. The constraint is that the hash underlying both implementations must be bit-identical: fract(sin(...)) was tried first and broke, because the JS Math.sin and the WGSL sin round differently for non-trivial arguments, putting CPU and GPU heights several meters apart at distant world coords. The current port uses a 32-bit integer PCG-style hash (via Math.imul) that exactly mirrors the WGSL hash21i, then masks the result to 24 bits (a range exactly representable in f32) before dividing by 2²⁴. Same hash, same fBm, same ridged noise, same combine: the hash values match bit for bit, and the resulting CPU and GPU heights agree to within floating-point rounding.
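To illustrate why an integer hash ports cleanly where fract(sin(...)) does not, here is a sketch of a PCG-style 32-bit hash written with Math.imul. The mixing constants are those of the widely used pcg_hash variant and the way the two coordinates are combined is a guess; the sample's actual hash21i may differ in both, so treat every name and constant here as an assumption.

```typescript
// PCG-style 32-bit hash. Math.imul and >>> 0 reproduce wrapping u32 arithmetic exactly.
function pcgHash(v: number): number {
  const state = (Math.imul(v, 747796405) + 2891336453) >>> 0;
  const word = Math.imul((state >>> ((state >>> 28) + 4)) ^ state, 277803737) >>> 0;
  return ((word >>> 22) ^ word) >>> 0;
}

// Hypothetical 2D -> 1D combine over integer lattice coordinates.
function hash21iSketch(x: number, z: number): number {
  return pcgHash((x | 0) + pcgHash(z | 0));
}

// Keep 24 bits so the result is exactly representable in f32, then scale to [0, 1).
function hashToUnit(h: number): number {
  return (h & 0x00ffffff) / 16777216;
}
```

Every intermediate is an integer operation with defined wraparound, so a WGSL version using u32 math produces the same bits; the only floating-point step is the final divide by 2²⁴, and a 24-bit numerator survives the conversion to f32 without rounding.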

The HUD prints the CPU-side height at the camera's XZ next to the walker's feet position; if those numbers drift apart the noise functions have gone out of sync (usually means the shader needs a hard refresh after an edit). The optional Debug Heights toggle drops RGB axis crosses at the 17 footprint points the walker samples, so a mismatch is visible in the viewport as well.

Effects panel#

The top-left widget the HTML page ships is a vanilla-DOM checkbox + slider panel that mutates a single fx state object the frame loop reads. Toggles cover every post-FX pass (TAA / GTAO / Bloom / DoF / Stars / Rain / Motion Blur / Point Lights / Clouds) plus a Debug Heights overlay (only while in walker mode). Sliders expose:

  • Aerial perspective intensity (0 – 1.5).
  • Height fog enable + intensity + base altitude + per-unit density.
  • Cloud coverage / density / altitude (the cloud top auto-tracks altitude + 120 to keep the layer thickness constant).
  • Time of day (formatted as HH:MM on the readout — the slider itself drives the [0, 1] timeOfDay value the sun arc consumes).

Building widgets in plain DOM (no React / no Lit / no engine UI lib) keeps the sample dependency-free at runtime; the panel is just three helper functions in mountain_fox.ts (checkbox, slider, section) plus the markup already in the HTML page shell.

Shadows#

Single-cascade directional shadow map, 4096² depth, with 3×3 PCF and a camera-following light-space ortho frustum.

  • The shadow pass reuses the same patches buffer + indirect args as the main render pass — just a different VP and a depth-only pipeline.
  • terrain.wgsl factors the shared VS work into compute_world_pos(vid, iid) (CDLOD morph + height sample); vs_main projects with camera.view_proj, vs_shadow_main projects with params.light_view_proj. Same geometry, guaranteed identical.
  • The shadow pipeline uses cullMode: 'front' (back-face shadow rendering) so only "behind" geometry writes the shadow map; combined with a constant receiver-side bias of 0.0008, self-shadowing acne on sun-facing slopes stays under control without slope-scaled bias.
  • The FS PCF uses textureSampleCompareLevel rather than textureSampleCompare for the same uniform-control-flow reason the VT lookup does: the out-of-frustum early-out would otherwise diverge.

The shadow camera follows the player. The frustum half-width (SHADOW_DISTANCE = 1024) matches the streamed VT window so the shadow map roughly covers the detailed-terrain area; outside the frustum, fragments treat themselves as lit.
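The receiver-side test described above can be mirrored on the CPU as a reference. The sketch below is an illustration under assumptions, not the shader: shadowVisibility and DepthLookup are hypothetical names, the depth lookup is a plain callback standing in for the shadow texture, and the comparison direction assumes a smaller depth is closer to the light.

```typescript
const SHADOW_MAP_SIZE = 4096; // 4096² depth map, as stated above
const SHADOW_BIAS = 0.0008;   // constant receiver-side bias, as stated above

/** Hypothetical stand-in for the shadow texture: returns stored depth at (u, v). */
type DepthLookup = (u: number, v: number) => number;

/**
 * 3×3 PCF visibility in [0, 1]: 1 = fully lit, 0 = fully shadowed.
 * (u, v, receiverDepth) is the fragment position in light space, all in [0, 1].
 */
function shadowVisibility(
  u: number,
  v: number,
  receiverDepth: number,
  depthAt: DepthLookup,
): number {
  // Outside the light frustum there is no shadow data, so treat as lit.
  const outside =
    u < 0 || u > 1 || v < 0 || v > 1 || receiverDepth < 0 || receiverDepth > 1;
  if (outside) return 1.0;

  const texel = 1 / SHADOW_MAP_SIZE;
  let lit = 0;
  for (let dy = -1; dy <= 1; dy++) {
    for (let dx = -1; dx <= 1; dx++) {
      const occluder = depthAt(u + dx * texel, v + dy * texel);
      // Pull the receiver toward the light by the bias before comparing.
      if (receiverDepth - SHADOW_BIAS <= occluder) lit += 1;
    }
  }
  return lit / 9;
}
```

In the shader the early-out is the reason for the explicit-LOD compare sample; on the CPU the branch is harmless, which makes this form convenient for checking the bias constant against known depths.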

Atmosphere & clouds#

Two stock engine passes drop into the chain directly:

  • AtmospherePass — physically-based atmospheric scattering (Rayleigh + Mie + ozone, optional precomputed transmittance / multiscatter LUTs). Run as the first color writer so it clears the HDR with sky; the terrain pass loads that as its starting HDR and the sky shows through where terrain isn't drawn. The pass needs the inverse camera VP and the sun direction every frame.
  • CloudPass — volumetric raymarched clouds with Henyey-Greenstein phase + extinction. Composited on top of the lit terrain HDR using the same depth buffer the terrain wrote, so clouds occlude correctly behind mountain ridges. Settings expose a base/top altitude, coverage, density, anisotropy, extinction, wind vector and ambient color.

Cloud altitudes have to be picked relative to the terrain — the engine's defaults (cloudBase=5, cloudTop=15) are sized for the meter-scale chunks in grassy_hills. Our terrain peaks around y=130, so the layer sits at cloudBase=140, cloudTop=260 — just above the mountain tops where you'd expect cumulus. The Effects panel's Altitude slider drives cloudBase directly, with cloudTop = altitude + 120 to hold the layer thickness constant; sliding the altitude below ground level collapses the layer into a thin valley-fog if you also dial coverage down.

The cloud noise field drifts each frame via cloudSettings.windOffset += wind * dt, so the clouds visibly move overhead. See Frame timeline for where atmosphere / clouds sit in the full pass chain.
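The altitude coupling and the wind drift described above amount to two small state updates. A minimal sketch follows, assuming a two-component horizontal wind vector; the CloudState shape and the function names are hypothetical, and only cloudBase, cloudTop, windOffset and the + 120 thickness come from the text.

```typescript
const CLOUD_LAYER_THICKNESS = 120; // cloudTop = altitude + 120, as stated above

/** Hypothetical subset of the cloud settings touched by the panel and frame loop. */
interface CloudState {
  cloudBase: number;
  cloudTop: number;
  windOffset: { x: number; z: number };
}

/** Altitude slider handler: move the base and keep the layer thickness constant. */
function setCloudAltitude(state: CloudState, altitude: number): void {
  state.cloudBase = altitude;
  state.cloudTop = altitude + CLOUD_LAYER_THICKNESS;
}

/** Per-frame drift: windOffset += wind * dt (dt in seconds). */
function advanceWind(
  state: CloudState,
  wind: { x: number; z: number },
  dt: number,
): void {
  state.windOffset.x += wind.x * dt;
  state.windOffset.z += wind.z * dt;
}
```

With the sample's values, setCloudAltitude(state, 140) gives the 140 to 260 layer quoted above.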

TAA#

Standard reprojection-based TAA with sub-pixel jitter:

  • TAAPass.updateCamera(ctx) applies a Halton-sequence jitter to the camera's projection matrix.
  • The terrain's per-frame uniform upload uses camera.jitteredViewProjectionMatrix() so the rasterized geometry has the jitter the TAA pass expects.
  • The shadow VP doesn't take the jitter: shadow geometry is reprojection-agnostic, and jittering it would just push the depth-acne pattern around.
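For readers who have not built the jitter before, here is a minimal sketch of a Halton (2, 3) sub-pixel offset and how it can be folded into a projection matrix. It is not TAAPass's implementation: the function names, the 8-sample cycle and the column-major matrix layout are all assumptions made for illustration.

```typescript
/** Radical-inverse Halton value in [0, 1) for a 1-based index and a prime base. */
function halton(index: number, base: number): number {
  let fraction = 1;
  let result = 0;
  let i = index;
  while (i > 0) {
    fraction /= base;
    result += fraction * (i % base);
    i = Math.floor(i / base);
  }
  return result;
}

/** Sub-pixel jitter for a frame, in NDC units (one pixel spans 2 / size). */
function jitterOffset(
  frame: number,
  width: number,
  height: number,
  sampleCount = 8, // assumed cycle length
): { x: number; y: number } {
  const i = (frame % sampleCount) + 1; // skip index 0, which is always 0
  const px = halton(i, 2) - 0.5; // pixel units, centered on the pixel
  const py = halton(i, 3) - 0.5;
  return { x: (2 * px) / width, y: (2 * py) / height };
}

/**
 * Return a copy of a column-major 4×4 perspective matrix with the jitter
 * added to the third column's x/y terms. After the perspective divide this
 * becomes a constant NDC shift; its sign depends on the projection
 * convention, which is fine as long as the resolve uses the same offset.
 */
function applyJitter(proj: Float32Array, jx: number, jy: number): Float32Array {
  const out = new Float32Array(proj);
  out[8] += jx;
  out[9] += jy;
  return out;
}
```

Only the camera projection gets this treatment; the light view-projection is left untouched, matching the last bullet above.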

TAA also smooths out the residual specular pinpricks from the mip-averaged atlas normals — even after min(spec, 50), the spatial aliasing of a steep mid-distance normal aligning with the sun creates single-pixel hot spots that TAA's variance clamping averages out across frames.

Render-graph integration#

A couple of the patterns used here aren't obvious if you're new to the sample's render graph and are worth calling out:

Imported external buffers#

The patches storage buffer and the indirect-args buffer are persistent GPU buffers held by TerrainRenderer. The graph wouldn't normally see a producer→consumer edge between the LOD compute (which writes them via its bind group) and the render pass (which reads them via drawIndexedIndirect and its bind group), because both touches happen outside the graph's declared-resources system.

graph.importExternalBuffer(buf, desc) imports a persistent buffer as a virtual resource. The LOD pass declares b.write(..., 'storage-write'), the render pass declares b.read(..., 'storage-read' / 'indirect'), and the graph compiler now sees a real edge — culling no longer drops the LOD pass, and the graph orders the dispatches correctly. WebGPU's auto-sync handles the actual barriers once the relative order is right.

The shadow map gets the same treatment: persistent texture (importExternalTexture), depth-attachment write in the shadow pass, sampled read in the render pass.
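Why the declaration matters can be seen in a toy model of graph culling. The sketch below is not the engine's render graph or its API: PassDecl and cullPasses are hypothetical, resources are plain strings, and the only behavior modeled is "a pass survives if something downstream needs what it declares it writes".

```typescript
/** Toy pass declaration: names of the resources a pass reads and writes. */
interface PassDecl {
  name: string;
  reads: string[];
  writes: string[];
}

/** Keep only passes that (transitively) contribute to the requested outputs. */
function cullPasses(passes: PassDecl[], outputs: string[]): string[] {
  const needed = new Set<string>(outputs);
  const kept: string[] = [];
  // Walk in reverse submission order so consumers mark their inputs first.
  for (const pass of [...passes].reverse()) {
    if (pass.writes.some((resource) => needed.has(resource))) {
      kept.unshift(pass.name);
      for (const resource of pass.reads) needed.add(resource);
    }
  }
  return kept;
}

// Without imports: the LOD compute touches the buffers only through its bind
// group, so the graph sees no declared writes and drops the pass.
const undeclared: PassDecl[] = [
  { name: "lod", reads: [], writes: [] },
  { name: "terrain", reads: [], writes: ["hdr"] },
];

// With imported buffers: both sides declare the persistent buffers.
const declared: PassDecl[] = [
  { name: "lod", reads: [], writes: ["patches", "indirectArgs"] },
  { name: "terrain", reads: ["patches", "indirectArgs"], writes: ["hdr"] },
];

const keptUndeclared = cullPasses(undeclared, ["hdr"]);
const keptDeclared = cullPasses(declared, ["hdr"]);
```

In the first case only the terrain pass survives; in the second the declared edge keeps the LOD pass alive and fixes its position ahead of the draw.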

Two BGLs for one bind group#

The shadow pipeline reads the same patches / atlas / page-table bindings the main render pipeline does, but cannot bind the shadow map (it's writing it as the depth attachment). The debug-line pipeline doesn't need the shadow map either.

Rather than duplicating the underlying resources, the renderer maintains two bind group layouts over them:

  • renderBgl — 8 entries, including the shadow texture + comparison sampler.
  • geomBgl — 6 entries, no shadow stuff. Used by the shadow + debug pipelines.

Each pipeline takes the BGL it needs, and the renderer holds two bind groups (_renderBindGroup, _geomBindGroup) over the same underlying resources.

Known limitations / future work#

No async page-gen cap. If the camera teleports, all 16 pages can regenerate in one frame (~ms scale — fine for the demo, throttle in production).
No mip chain on the heightmap atlas. Distant pages alias on detail; TAA's temporal averaging hides most of it. The mip-blended height sampling that closed the original cracks is also a no-op now, so the CDLOD seam constants alone hold the seam. Bringing back per-page mip chains would let the VS pick LOD-appropriate detail through textureSampleLevel (textureSample itself is fragment-only) and avoid the variance-loss specular pinpricks. The per-layer PBR texture arrays do request mips but never get them generated; WebGPU has no built-in mip generation call, so a small blit or compute downsample pass after upload would be needed.
No "world overview" fallback page. Beyond the streaming radius the terrain returns h=0. Real games dedicate one atlas slot to a permanently-resident, very-coarse global page so distance has something to draw.
No cascaded shadow mapping. Single cascade only — shadow quality drops at the edges of the streamed region. Moon contribution is unshadowed for the same reason.
Crude ambient. Just albedo · sky_color · 0.3, even though the FS now emits it as its own G-Buffer attachment. Real IBL (irradiance from the engine's atmosphere LUTs, already baked nearby) would land cleanly without touching the AO composite or anything downstream of the terrain pass.
Texture sample fan-out. Anti-tile + 4 splat layers + parallax means up to 32 textureSample calls per fragment at full splat occupancy. The cache hides most of it on desktop, but mobile / integrated GPUs will see it. Likely fix: dual-channel anti-tile (min(a, b) blend instead of hash-mix) which collapses the phase count to 1 sample per map.
Normal map antialiasing. The few stubborn bright pixels at world positions with steep mip-averaged geometric normals are a known PBR issue. Toksvig anti-aliasing — store `
Point lights don't cast shadows. Forward-only, no shadow maps. Acceptable for the campfire-glow scale we use them at; not enough for any light that ought to occlude geometry behind it.
Quadtree LOD cap. MAX_PATCHES = 8192 in the LOD compute. With MAX_DEPTH = 8 the worst-case grid is 65 536, but pruning keeps actual emitted patches modest. If the cap is ever hit you'll see "missing patches" — bump it and the storage-buffer size.
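The numbers in the quadtree item follow from the tree shape: each level splits a node into four, so a fully subdivided tree of depth 8 has 4^8 leaves. A short sketch of that arithmetic and of the buffer sizing that has to change with the cap; the function names and the 16-byte patch stride are assumptions for illustration, not the sample's actual record layout.

```typescript
const MAX_DEPTH = 8;      // as stated above
const MAX_PATCHES = 8192; // as stated above

/** Leaf count of a fully subdivided quadtree: 4^depth. */
function worstCaseLeafCount(depth: number): number {
  return Math.pow(4, depth);
}

/** Storage-buffer size for the patch list; bytesPerPatch is a hypothetical stride. */
function patchBufferBytes(maxPatches: number, bytesPerPatch: number): number {
  return maxPatches * bytesPerPatch;
}

const worstCase = worstCaseLeafCount(MAX_DEPTH);   // 65536
const headroomFactor = worstCase / MAX_PATCHES;    // 8
const bufferBytes = patchBufferBytes(MAX_PATCHES, 16);
```

The cap is therefore an eighth of the theoretical worst case, which is why raising MAX_PATCHES must go together with resizing the storage buffer.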

What lives where (full path index)#

Function / class Where
TerrainRenderer.create / update / addToGraph / destroy samples/terrain/terrain_renderer.ts
TerrainRenderer.loadLayerTextures (PBR map loader) terrain_renderer.ts
Per-layer texture array decode (JPG/PNG/HDR → rgba8) terrain_renderer.ts loadIntoLayer
VirtualPageManager (LRU) bottom of terrain_renderer.ts
Per-thread quadtree walk terrain_lod.wgsl cs_main
Page-gen (heights + normals) terrain_page_gen.wgsl cs_main
compute_world_pos (CDLOD morph + VT height sample) terrain.wgsl
Splat weights / parallax / TBN / BRDF / shadow PCF terrain.wgsl FS helpers
anti_tile (reduce tile patterns) terrain.wgsl
Multi-light direct (sun + moon + point lights) terrain.wgsl fs_main
LOD-colored boundary overlay terrain_debug_lines.wgsl + _debugLinePipeline
Camera-following light VP terrain_renderer.ts update()
Sun arc + moon antipode + sky tint mountain_fox.ts updateSunFromTimeOfDay
AO compositor (hdr − ambient·(1 − AO)) terrain/ao_apply_pass.ts
Aerial-perspective post terrain/atmospheric_apply_pass.ts + atmospheric_apply.wgsl
Exp height fog post terrain/height_fog_pass.ts (inline WGSL)
Walker (gravity / jump / ridge eye-lift) terrain/walker_controller.ts
CPU port of the procedural noise terrain/terrain_noise.ts
Effects panel widget mountain_fox.ts buildEffectsPanel
Debug height-sample crosses terrain/debug_points_pass.ts