Chapter 7: Lighting

Lighting is the heart of any renderer. Taos implements a full physically-based shading pipeline supporting directional (sun), point, and spot lights, plus image-based lighting from environment maps.

7.1 Physically-Based Rendering Theory

Taos uses the Cook-Torrance microfacet BRDF (bidirectional reflectance distribution function), which models a surface as a collection of microscopic facets. Every PBR shading calculation revolves around four vectors at the surface point — the surface normal, the directions to the light and the viewer, and the half-vector between them:

BRDF vectors at a surface point

The BRDF has three terms, each evaluated from dot products of these vectors:

Normal distribution function (GGX/Trowbridge-Reitz) — describes the statistical orientation of microfacets relative to the surface normal:

// ── from the PBR lighting shader ──
fn D_GGX(NdotH: f32, roughness: f32) -> f32 {
  let a = roughness * roughness;   // perceptual roughness remapped to alpha
  let a2 = a * a;
  let denom = NdotH * NdotH * (a2 - 1.0) + 1.0;
  return a2 / (PI * denom * denom);
}

Geometry function (Smith GGX with Schlick-GGX) — describes microfacet self-shadowing:

// ── from the PBR lighting shader ──
fn G_SmithGGX(NdotV: f32, NdotL: f32, roughness: f32) -> f32 {
  let a = roughness * roughness;
  // G_GGX is the single-direction Schlick-GGX term, defined elsewhere in the shader.
  return G_GGX(NdotV, a) * G_GGX(NdotL, a);
}

Fresnel function (Schlick approximation) — describes how reflectance varies with viewing angle:

// ── from the PBR lighting shader ──
fn F_Schlick(cosTheta: f32, F0: vec3f) -> vec3f {
  return F0 + (1.0 - F0) * pow(1.0 - cosTheta, 5.0);
}

The full BRDF for a single light is:

// ── from the PBR lighting shader ──
let NdotL = max(dot(N, L), 0.0);
let NdotV = max(dot(N, V), 0.0);
let H = normalize(L + V);
let NdotH = max(dot(N, H), 0.0);
let HdotV = max(dot(H, V), 0.0);

let D = D_GGX(NdotH, roughness);
let G = G_SmithGGX(NdotV, NdotL, roughness);
let F = F_Schlick(HdotV, F0);

let specular = (D * G * F) / (4.0 * NdotV * NdotL + 0.0001);
let diffuse = (1.0 - F) * (1.0 - metallic) * albedo / PI;
return (diffuse + specular) * radiance * NdotL;

The metallic parameter blends between dielectric behavior (specular highlights on a diffuse base) and metallic behavior (no diffuse, colored specular). F0 is 0.04 for dielectrics and albedo for metals, interpolated by metallic:

// ── from the PBR lighting shader ──
let F0 = mix(vec3f(0.04), albedo, metallic);
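
For unit tests it can help to mirror these terms on the CPU. The following is a minimal TypeScript sketch, not part of Taos: the function names are hypothetical, and because the shader's G_GGX helper is not shown above, the Schlick-GGX form with k = a / 2 used here is an assumption.

```typescript
// CPU-side reference of the Cook-Torrance terms from this section (test helper only).
type RGB = [number, number, number];

// GGX / Trowbridge-Reitz normal distribution, same arithmetic as D_GGX.
function dGGX(nDotH: number, roughness: number): number {
  const a = roughness * roughness;
  const a2 = a * a;
  const denom = nDotH * nDotH * (a2 - 1.0) + 1.0;
  return a2 / (Math.PI * denom * denom);
}

// Single-direction Schlick-GGX term. ASSUMPTION: k = a / 2; the shader's G_GGX is not shown.
function gSchlickGGX(nDotX: number, a: number): number {
  const k = a / 2.0;
  return nDotX / (nDotX * (1.0 - k) + k);
}

// Smith geometry term: product of the view and light directions.
function gSmithGGX(nDotV: number, nDotL: number, roughness: number): number {
  const a = roughness * roughness;
  return gSchlickGGX(nDotV, a) * gSchlickGGX(nDotL, a);
}

// Schlick Fresnel, evaluated per color channel.
function fSchlick(cosTheta: number, f0: RGB): RGB {
  const w = Math.pow(1.0 - cosTheta, 5.0);
  return [f0[0] + (1 - f0[0]) * w, f0[1] + (1 - f0[1]) * w, f0[2] + (1 - f0[2]) * w];
}

// F0 = mix(0.04, albedo, metallic), as in the shader line above.
function baseReflectance(albedo: RGB, metallic: number): RGB {
  const mix = (x: number, y: number, t: number): number => x * (1 - t) + y * t;
  return [mix(0.04, albedo[0], metallic), mix(0.04, albedo[1], metallic), mix(0.04, albedo[2], metallic)];
}
```

Two quick checks fall out of the formulas: at normal incidence (cosTheta = 1) the Fresnel term returns F0 unchanged, and at grazing incidence (cosTheta = 0) it rises to 1 in every channel regardless of the material.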

7.2 The Directional Light (Sun)

Taos supports three light types — directional, point, and spot — each with a distinct geometry and falloff model:

Directional, point, and spot lights

The directional light represents the sun — an infinitely distant light source with parallel rays. It is defined by the DirectionalLight interface (src/renderer/directional_light.ts):

// ── from src/renderer/directional_light.ts ──
export interface DirectionalLight {
  direction: Vec3;
  intensity: number;
  color: Vec3;
  castShadows: boolean;
  lightViewProj?: Mat4;
  shadowMap?: GPUTextureView;
}

The direction is a unit vector pointing toward the light (from the surface toward the sun). intensity controls the overall brightness, and color tints the light.

Directional Light in the Lighting Pass

The deferred lighting pass evaluates the directional light in a fullscreen shader. The light direction and cascade data are uploaded per frame:

// ── from lighting_pass.ts updateLight() ──
updateLight(
  ctx: RenderContext,
  light: DirectionalLight,
  cascadeData: CascadeData[],
  cascadeCount: number,
  shadowTexView: GPUTextureView | null,
  debugCascades: boolean,
  shadowSoftness: number,
): void {
  // Pack direction, intensity, color, cascade data into lightBuffer
  const data = this._lightScratch;
  data.set(light.direction.toArray(), 0);
  data[3] = light.intensity;
  data.set(light.color.toArray(), 4);
  data[7] = cascadeCount;
  // ... cascade view-proj matrices, split depths, texel sizes ...
  ctx.queue.writeBuffer(this.lightBuffer, 0, data.buffer as ArrayBuffer);
}

7.3 Point Lights

A point light emits light equally in all directions from a position in space. It is defined by the PointLight interface (src/renderer/point_light.ts):

// ── from src/renderer/point_light.ts ──
export interface PointLight {
  position: Vec3;
  range: number;
  color: Vec3;
  intensity: number;
  castShadows?: boolean;
}

The range field limits the light's influence. The attenuation is a windowed inverse-square — pow(saturate(1 − (d/range)⁴), 2) / d² — which follows the physical 1/d² law but ramps smoothly to exactly zero at the range boundary so a light never leaves a hard disc on the floor:

// ── from src/shaders/point_spot_lighting.wgsl ──
fn point_attenuation(dist: f32, radius: f32) -> f32 {
  let r = dist / radius;
  return pow(saturate(1.0 - r * r * r * r), 2.0) / max(dist * dist, 0.0001);
}
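
To see the window at work, the same function can be ported to the CPU. This is a small TypeScript sketch for experimentation; the helper names are hypothetical, and only the arithmetic is taken from the WGSL above.

```typescript
// Clamp to [0, 1], matching WGSL saturate().
function saturate(x: number): number {
  return Math.min(Math.max(x, 0), 1);
}

// CPU port of point_attenuation: windowed inverse-square falloff.
function pointAttenuation(dist: number, range: number): number {
  const r = dist / range;
  const w = saturate(1 - r * r * r * r); // window: 1 near the light, 0 at and beyond range
  return (w * w) / Math.max(dist * dist, 0.0001);
}
```

Close to the light the window is nearly 1, so the result tracks 1/d² (at d = 1 with range = 10 the window only removes about 0.02%). At d = range the window is exactly 0, and it stays 0 beyond that, which is what lets the shader skip lights with a simple dist >= range test without a visible seam.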

Point lights are processed in the PointSpotLightPass, which renders additive lighting (srcFactor: 'one', dstFactor: 'one') over the HDR target produced by the deferred sun pass, for all active point and spot lights:

// ── from point_spot_light_pass.ts updateLights() ──
interface PackedPointLight {
  position: [number, number, number, 0];     // vec4 (w unused)
  color: [number, number, number, number];   // rgb + intensity
  range: number;
}

Up to MAX_POINT_LIGHTS = 32 point and MAX_SPOT_LIGHTS = 32 spot lights can be active per frame (point_spot_shadow_pass.ts:11-12). The deferred pass does not frustum- or tile-cull these lights — its fragment shader loops over every uploaded light at every pixel, rejecting each only by the cheap per-pixel range test if (dist >= range) { continue; }. That is affordable for a few dozen lights; the scalable path for hundreds of lights is the tiled Forward+ culling described in §7.9, which trims each tile's light list on the GPU before shading.

Point and spot light shadows — including how the PointSpotShadowPass bakes them into VSM moment arrays — are covered in Chapter 8 (§8.3–§8.5).

7.4 Spot Lights

A spot light emits light in a cone from a position in a specific direction. Taos's SpotLight class (src/renderer/spot_light.ts) includes a lazy view-projection matrix computation:

// ── from src/renderer/spot_light.ts ──
export class SpotLight {
  position: Vec3;
  range: number;
  direction: Vec3;
  innerAngle: number;  // Full brightness cone half-angle (degrees)
  color: Vec3;
  outerAngle: number;  // Falloff cone half-angle (degrees)
  intensity: number;
  castShadows?: boolean;

  // Lazily computed from position + direction + outerAngle + range
  get lightViewProj(): Mat4;
  computeLightViewProj(near?: number): Mat4;
  markDirty(): void;
}

The view-projection matrix is computed from the light's parameters:

// ── from src/renderer/spot_light.ts ──
private _compute(near = 0.1): void {
  // Build a lookAt view from the light's position and direction
  const up = Math.abs(this.direction.y) > 0.99
    ? new Vec3(1, 0, 0)    // Avoid a degenerate lookAt basis when pointing straight up/down
    : new Vec3(0, 1, 0);
  const view = Mat4.lookAt(this.position, this.position.add(this.direction), up);
  // Perspective projection matching the spot's cone angle
  const proj = Mat4.perspective(this.outerAngle * 2 * Math.PI / 180, 1.0, near, this.range);
  this._cachedLvp = proj.multiply(view);
  this._dirty = false;
}

A dirty flag avoids recomputation when the light's parameters haven't changed. Call markDirty() after mutating position, direction, outerAngle, or range.
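
The dirty-flag pattern itself is easy to isolate. The sketch below is a generic TypeScript illustration under stated assumptions: CachedValue is a hypothetical helper, not the Taos SpotLight class, and the cached "matrix" is reduced to a single number (the projection's field of view) to keep the example short.

```typescript
// Generic dirty-flag cache: recompute only on first access or after markDirty().
class CachedValue<T> {
  private cached: T | undefined = undefined;
  private dirty = true;
  private readonly compute: () => T;
  recomputeCount = 0; // exposed only so the example can show when work happens

  constructor(compute: () => T) {
    this.compute = compute;
  }

  get value(): T {
    if (this.dirty) {
      this.cached = this.compute();
      this.dirty = false;
      this.recomputeCount++;
    }
    return this.cached as T;
  }

  markDirty(): void {
    this.dirty = true;
  }
}

// Usage: the cached quantity is the full cone angle in radians, as passed to the projection.
const spotParams = { outerAngle: 30 };
const spotFov = new CachedValue(() => (spotParams.outerAngle * 2 * Math.PI) / 180);
```

The cost of the pattern is visible here too: mutating spotParams.outerAngle without calling markDirty() leaves spotFov.value stale, which is exactly why the text above asks you to call markDirty() after changing position, direction, outerAngle, or range.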

Spot Light Attenuation

The GPU evaluates spot light falloff using the inner and outer angles. A smoothstep between cos(outer) and cos(inner) gives a soft transition from the bright inner cone to the dark exterior:

Spot light cone and smoothstep falloff

// ── from the spot light shader ──
// Spot cone attenuation
let cosAngle = dot(normalize(lightDirection), -toLightDir);
let cosInner = cos(spotLight.innerAngle * PI / 180);
let cosOuter = cos(spotLight.outerAngle * PI / 180);
let spotFactor = smoothstep(cosOuter, cosInner, cosAngle);
// Inverse-square distance falloff (biased to avoid a divide by zero) × spot cone factor
let attenuation = spotFactor / (distSq + 0.01);
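
The cone term is easy to check numerically. Here is a minimal TypeScript sketch of the same smoothstep, assuming half-angles in degrees as on SpotLight; spotConeFactor is a hypothetical name, and smoothstep is written out because JavaScript has no built-in equivalent.

```typescript
// Same definition as the WGSL/GLSL built-in smoothstep.
function smoothstep(edge0: number, edge1: number, x: number): number {
  const t = Math.min(Math.max((x - edge0) / (edge1 - edge0), 0), 1);
  return t * t * (3 - 2 * t);
}

// angleDeg is the angle between the spot axis and the direction from the light
// to the shaded point; innerDeg and outerDeg are cone half-angles.
function spotConeFactor(angleDeg: number, innerDeg: number, outerDeg: number): number {
  const rad = (d: number): number => (d * Math.PI) / 180;
  return smoothstep(Math.cos(rad(outerDeg)), Math.cos(rad(innerDeg)), Math.cos(rad(angleDeg)));
}
```

Note that the comparison happens in cosine space, so the edges are ordered (cosOuter, cosInner): a larger angle means a smaller cosine. With inner = 20° and outer = 30°, the factor is 1 anywhere inside 20°, 0 at or beyond 30°, and eases between the two in the band.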

Light Cookies (Projection Textures)

A spot light can carry an optional projection texture — a "cookie" or gobo — that is projected down the cone, tinting and masking the light the way a stained-glass window, a leaf canopy, or a film projector's gate would. The texture lives on the component (SpotLight.projectionTexture) and is uploaded into a shared rgba8unorm 2D array (PROJ_TEX_SIZE = 256, slot projTexIdx).

The cookie reuses the same light-space projection as the shadow map: the surface point is transformed by lightViewProj, the resulting clip coordinates become a [0,1]² UV, and the cookie is sampled there and multiplied into the light's radiance. Because the same UV drives both, a single in-frustum test guards the cookie and the shadow, and anything outside the cone's projected rectangle contributes nothing:

// ── from src/shaders/point_spot_lighting.wgsl ──
let ls = sl.lightViewProj * vec4<f32>(world_pos, 1.0);
let sc = ls.xyz / ls.w;
let uv = vec2<f32>(sc.x * 0.5 + 0.5, -sc.y * 0.5 + 0.5);
let in_frustum = all(uv >= vec2f(0.0)) && all(uv <= vec2f(1.0)) && sc.z >= 0.0 && sc.z <= 1.0;

var modulator = vec3<f32>(1.0);
if (in_frustum) {
  if (sl.shadowIdx >= 0) {                                       // VSM shadow
    let m = textureSampleLevel(vsm_spot, vsm_sampler, uv, sl.shadowIdx, 0.0).rg;
    modulator *= vec3f(vsm_shadow(m, sc.z));
  }
  if (sl.projTexIdx >= 0) {                                      // RGB cookie
    modulator *= textureSampleLevel(proj_tex, proj_sampler, uv, sl.projTexIdx, 0.0).rgb;
  }
} else {
  modulator = vec3<f32>(0.0);                                    // outside the cone gate
}
let radiance = sl.color * sl.intensity * att * cone * modulator;

Because the cookie is an RGB multiplier it can both shape (alpha-like masking in any channel) and colorize the beam — a single white spot light becomes a rose window by swapping its texture, with no extra draw cost. Spot shadows themselves use the same VSM scheme as point lights (Chapter 8, §8.3), but stored in a 2D array (VSM_SPOT_SIZE = 512) rather than a cube, since a spot only ever looks down one direction.
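
The clip-to-UV step is the part that most often goes wrong (a missed Y flip mirrors the cookie vertically), so it is worth checking on the CPU. This TypeScript sketch mirrors the first four shader lines above; the type and function names are hypothetical, and the clip-space input is assumed to already be lightViewProj × (worldPos, 1).

```typescript
type ClipPos = [number, number, number, number];

interface CookieLookup {
  u: number;
  v: number;
  depth: number;      // light-space depth in [0, 1] when inside the frustum
  inFrustum: boolean; // shared gate for both the cookie and the shadow lookup
}

function cookieLookup(clip: ClipPos): CookieLookup {
  const [x, y, z, w] = clip;
  const sx = x / w; // perspective divide to normalized device coordinates
  const sy = y / w;
  const sz = z / w;
  const u = sx * 0.5 + 0.5;
  const v = -sy * 0.5 + 0.5; // flip Y: NDC is Y-up, texture coordinates are Y-down
  const inFrustum = u >= 0 && u <= 1 && v >= 0 && v <= 1 && sz >= 0 && sz <= 1;
  return { u, v, depth: sz, inFrustum };
}
```

A point on the spot's axis lands at (0.5, 0.5), the top-left corner of the light's view (NDC x = −1, y = +1) lands at UV (0, 0), and anything whose UV leaves the unit square is rejected, which is where the shader sets the modulator to zero.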

7.5 Area Lights

Directional, point, and spot lights are all punctual — they emit from a single point (or, for the sun, a single direction), so their specular highlight is a pinpoint no matter how large the real light fixture is. A fluorescent tube, a softbox, or a glowing sphere produces a soft, shape-aware highlight that stretches and spreads with the source. Taos approximates these with representative-point area lights — sphere, tube (capsule), and rectangle emitters — implemented in src/shaders/modules/area_lights.wgsl and shared by the deferred, forward, and forward+ paths.

Sphere, tube, and rect area lights and the representative-point trick

A light is described by the AreaLight interface (src/renderer/area_light.ts); the same 64-byte struct serves all three shapes, with fields reinterpreted by kind:

// ── from src/renderer/area_light.ts ──
export interface AreaLight {
  kind: 'sphere' | 'tube' | 'rect';
  position: Vec3;   // world-space center
  color: Vec3;
  intensity: number;
  range: number;    // windowed inverse-square cutoff
  radius?: number;  // sphere / tube radius
  axisHalf?: Vec3;  // tube: half the end-to-end vector; rect: half-width edge
  halfUp?: Vec3;    // rect: half-height edge
}
export const MAX_AREA_LIGHTS = 8;

The Representative-Point Approximation

The key idea is cheap and table-free: integrating the BRDF over the whole surface of the light is expensive, so instead the shader picks one point on the source — the representative point — that best stands in for the highlight, and shades a single punctual sample there. The trick is choosing that point per-lobe:

Specular uses the point on the source closest to the reflection ray R = reflect(−V, N). For a sphere that is the closest point on the sphere to the ray; for a tube, the closest point on the segment to the ray (then treated as a sphere); for a rect, the reflection ray's intersection with the panel, clamped to its bounds. This is what bends the highlight toward the geometry of the source.
Diffuse uses the direction to the source center (sphere/rect) or the closest point on the segment (tube), with the same windowed inverse-square falloff as a point light.

// ── from src/shaders/modules/area_lights.wgsl ──
// Closest point on a sphere (center L, radius `radius`) to the reflection ray R.
fn closest_point_on_sphere(L: vec3f, R: vec3f, radius: f32) -> vec3f {
  let center_to_ray = dot(L, R) * R - L;
  return L + center_to_ray * clamp(radius / max(length(center_to_ray), 1e-4), 0.0, 1.0);
}

Pointing the reflection ray at the representative point effectively widens the specular lobe, which would brighten the highlight unphysically as the source grows. To compensate, the GGX distribution is renormalized by (α / α′)², where α′ is the roughness widened by the source's angular size — so total reflected energy stays roughly constant as the sphere or tube gets bigger:

// ── from src/shaders/modules/area_lights.wgsl ──
fn area_ndf_normalization(roughness: f32, radius: f32, dist: f32) -> f32 {
  let a  = roughness * roughness;
  let aP = clamp(a + radius / (2.0 * max(dist, 1e-4)), 0.0, 1.0);
  let n  = a / max(aP, 1e-4);
  return n * n;
}
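
Both helpers are small enough to verify by hand. The TypeScript sketch below ports them for experimentation; the names are hypothetical, L is assumed to be the vector from the shaded point to the sphere center, and R is assumed to be a unit vector, matching how the WGSL uses them.

```typescript
type V3 = [number, number, number];

const dot3 = (a: V3, b: V3): number => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const clamp = (x: number, lo: number, hi: number): number => Math.min(Math.max(x, lo), hi);

// Point on the sphere (center L, given radius) closest to the ray from the origin along R.
function closestPointOnSphere(L: V3, R: V3, radius: number): V3 {
  const d = dot3(L, R);
  const c: V3 = [d * R[0] - L[0], d * R[1] - L[1], d * R[2] - L[2]]; // center -> ray
  const len = Math.hypot(c[0], c[1], c[2]);
  const t = clamp(radius / Math.max(len, 1e-4), 0, 1);
  return [L[0] + c[0] * t, L[1] + c[1] * t, L[2] + c[2] * t];
}

// (alpha / alpha')^2, where alpha' is alpha widened by the source's angular size.
function areaNdfNormalization(roughness: number, radius: number, dist: number): number {
  const a = roughness * roughness;
  const aP = clamp(a + radius / (2 * Math.max(dist, 1e-4)), 0, 1);
  const n = a / Math.max(aP, 1e-4);
  return n * n;
}
```

As a worked case: with roughness 0.5 (α = 0.25), a sphere of radius 1 at distance 2 widens α to 0.5, so the normalization is (0.25 / 0.5)² = 0.25. With radius 0 the factor is 1, and the light degenerates to an ordinary punctual light.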

The Three Shapes

A single dispatcher routes each light to its evaluator, so every lighting path shares one loop body:

// ── from src/shaders/modules/area_lights.wgsl ──
fn shade_area_light(al: AreaLight, P: vec3f, N: vec3f, V: vec3f,
                    F0: vec3f, albedo: vec3f, roughness: f32, metallic: f32) -> vec3f {
  if (al.kind < 0.5) {        // sphere
    return area_light_sphere(/* center, radius, ... */);
  } else if (al.kind < 1.5) { // tube — endpoints = position ± axisHalf
    return area_light_tube(/* p0, p1, radius, ... */);
  }
  return area_light_rect(/* center, halfRight, halfUp, ... */);  // one-sided
}

Sphere — a round, soft highlight that grows with radius: a glowing orb, a frosted bulb.
Tube (capsule) — the highlight stretches into a streak aligned with the segment position ± axisHalf: a fluorescent strip, a light-sabre.
Rectangle — a one-sided panel emitting from normalize(cross(halfRight, halfUp)); surfaces behind the panel receive nothing. The representative point is the reflection ray's hit on the panel clamped to its extents — a softbox or a glowing window.

Two deliberate limits keep area lights cheap: they cast no shadows, and they are evaluated with the base Cook-Torrance lobe only — the glTF extension lobes (clearcoat, sheen, iridescence, transmission) from §6.6 are skipped for them. The sphere/tube/rect above are the table-free representative-point approximations. For rectangles (and disks) there is also a reference-quality alternative — LTC, linearly-transformed cosines — described next.

Reference-Quality Rect & Disk: Linearly Transformed Cosines (LTC)

The representative-point rect is cheap and reads well as a soft panel, but its specular is a single-point approximation: on rough surfaces and at grazing angles the highlight's shape, intensity, and the diffuse form factor drift from ground truth, and a true polygonal "the light's outline reflected in the surface" never appears. Linearly Transformed Cosines (Heitz, Dupuy, Hill & Neubelt, SIGGRAPH 2016) is the reference-quality fix that UE and Frostbite ship for rect lights, and Taos offers it as an opt-in path for rect and disk area lights.

The idea: a GGX specular lobe is approximated by a clamped-cosine distribution warped by a 3×3 matrix M that depends on (roughness, NdotV). A cosine distribution has a closed-form integral over a polygon (a sum of per-edge terms), so you can integrate the BRDF over the actual light polygon by (1) transforming the polygon's corner directions by M⁻¹ into the cosine's space, (2) evaluating the analytic clamped-cosine polygon irradiance there, and (3) scaling by a precomputed magnitude/Fresnel term. Diffuse is the same integral with M = identity (Lambert is a clamped cosine). The result is the correctly-shaped reflection of the quad at any roughness — a crisp rectangle on a mirror, a smoothly blurred one on a rough surface — with a correct diffuse form factor.
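
Step (2), the analytic clamped-cosine integral, is the classic per-edge sum: for each polygon edge between unit directions v_i and v_j, accumulate acos(v_i · v_j) times the z component of normalize(v_i × v_j), then divide by 2π. The TypeScript sketch below shows only that step, under simplifying assumptions: the corners are already expressed in the cosine's local frame (z is the normal), the polygon lies entirely above the horizon (no clipping), and the function names are hypothetical rather than the contents of ltc.wgsl.

```typescript
type V3 = [number, number, number];

const dot = (a: V3, b: V3): number => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const cross = (a: V3, b: V3): V3 => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];
const normalize = (v: V3): V3 => {
  const len = Math.hypot(v[0], v[1], v[2]);
  return [v[0] / len, v[1] / len, v[2] / len];
};

// Contribution of one edge: arc length times the z component of the edge plane's normal.
function edgeIntegral(v1: V3, v2: V3): number {
  const theta = Math.acos(Math.min(Math.max(dot(v1, v2), -1), 1));
  const n = cross(v1, v2);
  const len = Math.hypot(n[0], n[1], n[2]);
  return len > 1e-8 ? (theta * n[2]) / len : 0;
}

// Cosine-weighted solid angle of the polygon divided by pi (1.0 = the whole hemisphere).
function polygonFormFactor(corners: V3[]): number {
  const dirs = corners.map(normalize);
  let sum = 0;
  for (let i = 0; i < dirs.length; i++) {
    sum += edgeIntegral(dirs[i], dirs[(i + 1) % dirs.length]);
  }
  return Math.abs(sum) / (2 * Math.PI); // abs() makes the result winding-independent
}
```

A convenient sanity check is the spherical triangle with corners on the three axes: it covers one quarter of the upper hemisphere symmetrically, and the sum evaluates to exactly 1/4. The specular path runs the same sum after multiplying each corner by M⁻¹; a production integrator also clips the polygon to the horizon first, which this sketch omits.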

The matrix M⁻¹ and the magnitude/Fresnel terms cannot be computed at runtime or hand-authored; they are the output of an offline per-texel fitting and must be shipped as data. Taos vendors the well-known reference tables (Hill/Heitz, MIT — the same data three.js ships) as two 64×64 lookup textures in src/assets/ltc_tables.ts, base64-packed and uploaded as rgba16float (half-float is filterable in core WebGPU, so the linear table sampling needs no optional feature). The shading math lives in the standalone module src/shaders/modules/ltc.wgsl:

// ── from src/shaders/modules/ltc.wgsl ── (rect; disk approximates the ellipse as an octagon)
let uv  = ltc_uv(NdotV, roughness);              // (roughness, sqrt(1−NdotV))
let t1  = textureSampleLevel(ltc1, samp, uv, 0.0);
let t2  = textureSampleLevel(ltc2, samp, uv, 0.0);
let mInv = mat3x3<f32>(vec3(t1.x, 0.0, t1.y),    // sparse inverse transform
                       vec3(0.0,  1.0, 0.0),
                       vec3(t1.z, 0.0, t1.w));
let spec = (F0 * t2.x + (vec3(1.0) - F0) * t2.y) // Hill's LTC Fresnel approximation
         * ltc_integrate(N, V, P, mInv, c0, c1, c2, c3);
let diff = albedo * (1.0 - metallic)
         * ltc_integrate(N, V, P, /*identity*/ I, c0, c1, c2, c3);

It is opt-in per lighting path because it adds two table textures plus a sampler to the lighting bind group: pass areaLightLtc: true to ForwardLitFeature, ForwardPlusFeature, or PointSpotLightFeature (deferred). Enabling it compiles an LTC_AREA_LIGHTS shader variant (the same conditional-binding pattern as the SH-diffuse variant in §7.17) and routes rect/disk lights through ltc_shade_area while sphere/tube stay representative-point. A runtime flag in each pass's lighting uniform then lets you toggle LTC on and off live (via feature.setAreaLightLtc(...)) without rebuilding the pass, which is handy for A/B comparison; area_light_test exposes the toggle on the L key. The disk shape is a fourth AreaLight.kind: its halfRight/halfUp are the ellipse's half-axes, and the integrator approximates it as an octagon (visually indistinguishable here while reusing the same edge integral); without LTC a disk falls back to its bounding rect.

One unit caveat: LTC integrates the source's solid angle directly, so there is no inverse-square term — intensity is interpreted as the panel's radiance (and range is unused for LTC rect/disk), matching three.js's RectAreaLight. The representative-point rect, by contrast, applies the windowed 1/d² falloff, so the same intensity reads brighter or dimmer between the two models. That is expected: they are different lighting models, not a bug to reconcile. AreaLight.twoSided makes an LTC panel emit from both faces (the representative-point rect is always one-sided).

Physical Light Units (lux · candela · lumens)

The four light types above all carry an intensity number. By default it is an artist-chosen "looks right" value, but it can equally be a real-world photometric quantity — because the engine's BRDF is already radiometrically consistent. The diffuse term is Lambert (albedo/π), and the punctual falloff attenuate_inverse_square is a true 1/d² whose reference sits at d = 1 unit (§7.3). So a directional intensity already behaves as illuminance and a point/spot intensity already behaves as luminous intensity — there is no separate "physical mode" and no shader change. Authoring in real units is just a matter of using the right numbers (and a physical exposure to map them, §11.1).

The convention (world distances treated as meters, so candela = lux at 1 m):

Light	`intensity` is…	Author it as
Directional / sun	illuminance, lux	`light.lux = 100000` (direct sun)
Point	luminous intensity, candela	`point.lumens = 800` → `cd = lm/(4π)`
Spot	luminous intensity, candela	set `outerAngle`, then `spot.lumens` → `cd = lm/(2π(1−cosθ))`
Emissive / area	luminance, cd·m⁻²	`material.emissiveFactor`

Because bulbs are rated in lumens (total flux) but the inverse-square law works in candela (intensity), the lumens accessors on PointLight/SpotLight do the conversion (src/renderer/photometric_units.ts); lux on DirectionalLight is just an alias, since a directional light's intensity already is illuminance. The physical_lighting_test sample authors four bulbs by lumen rating, and because each lights an equal patch of wall from an equal distance, the wall brightness reads the lumen ratio directly. Real values only look right once exposure is physical too — covered in §11.1.
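
The conversions in the table are one-liners. The TypeScript sketch below spells them out; the function names are hypothetical (the actual accessors live in photometric_units.ts and their signatures are not shown here), and the spot formula assumes the flux is spread uniformly over the cone of half-angle outerAngle.

```typescript
// Isotropic point light: flux spread over the full sphere (4π steradians).
function pointLumensToCandela(lumens: number): number {
  return lumens / (4 * Math.PI);
}

// Spot light: flux spread over a cone whose solid angle is 2π(1 − cos θ),
// where θ is the outer half-angle in degrees.
function spotLumensToCandela(lumens: number, outerAngleDeg: number): number {
  const theta = (outerAngleDeg * Math.PI) / 180;
  return lumens / (2 * Math.PI * (1 - Math.cos(theta)));
}

// Illuminance (lux) on a surface facing the light at distMeters, by the inverse-square law.
function illuminanceAt(candela: number, distMeters: number): number {
  return candela / (distMeters * distMeters);
}
```

An 800 lm bulb as a point light is 800 / (4π) ≈ 63.7 cd, and therefore about 63.7 lux on a facing surface at 1 m and a quarter of that at 2 m. Focusing the same 800 lm into a narrower cone raises the candela value, which is why a spot looks brighter than a bare bulb of equal rating.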

Light Layers (Per-Object Light Exclusion)

Sometimes a light should illuminate the scene but spare one object. The motivating case is a rectangular area light at a window pouring warm "sunlight" into a room: because the emitter plane sits centimeters from the wooden window frame, the frame's reveals blow out to white under the near-field inverse-square term. Moving or dimming the light fixes the frame at the cost of the effect everywhere else. What you actually want is this light, but not that object.

Taos handles this with light-exclusion bitmasks — an 8-bit "layer" tag carried by both surfaces and lights, with exclusion semantics so the default is always "affects everything":

A surface (a MeshRenderer) carries lightExcludeMask (default 0).
A light carries excludeMask (default 0).
A light skips a surface when the two masks share a bit: (surfaceMask & lightMask) != 0.

Because both default to 0 and 0 & x == 0, every light affects every surface until something opts in — there is no "layer 0 is the default layer" gotcha and nothing to initialize. You only set a bit on the handful of objects and lights that need to diverge, and any geometry path that never writes the mask is automatically unaffected.

The mask is a deferred-path mechanism. The fullscreen lighting pass sees only pixels, not objects, so the surface tag has to be baked into a pixel. Taos stores it in the otherwise-unused alpha channel of the emissive G-Buffer target (rgba16float, so all 256 mask values resolve cleanly) — the geometry pass writes emissive.a = lightExcludeMask / 255, and nothing samples emissive alpha as light:

// ── from src/shaders/openpbr_geometry.wgsl ──
// emissive.a carries the 8-bit light-exclusion mask; .rgb is the emission.
out.emissive = vec4<f32>(emission, model.misc.x / 255.0);

The per-light mask rides the existing light buffers at no extra cost: for area lights it is packed into the spare high bits of the kind slot (code = kind + mask·4, exact in f32), so the 64-byte AreaLight struct is unchanged. The deferred PointSpotLightPass reads the surface mask once per pixel and skips any area light it shares a bit with:

// ── from src/shaders/point_spot_lighting.wgsl ──
let obj_exclude = u32(textureLoad(emissiveTex, coord, 0).a * 255.0 + 0.5);
for (var i = 0u; i < lightCounts.numArea && i < MAX_AREA_LIGHTS; i++) {
  if ((obj_exclude & area_light_exclude_mask(areaLights[i])) != 0u) {
    continue;   // this light excludes this surface
  }
  accum += shade_area_light(areaLights[i], world_pos, N, V, F0, albedo, roughness, metallic);
}

Application code is just two assignments — tag the object and the light with the same bit:

frameRenderer.lightExcludeMask = 1;   // the window frame opts into layer 1
windowLight.excludeMask        = 1;   // the window "sun" skips layer 1

The excluded object still receives every other light, plus IBL and reflection probes — only the tagged light is withheld, so the frame reads as normal lit wood while the rest of the room still catches the window light. The mask is honored today by the deferred area-light loop (the point and spot loops are a one-line extension of the same test); the forward paths ignore it. With eight independent bits a scene can carve out eight overlapping exclusion groups.
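
The encodings involved are simple enough to model on the CPU. The TypeScript sketch below covers the three pieces: the surface mask stored as mask / 255, the area-light code = kind + mask·4, and the exclusion test. All function names are hypothetical, and the unpacking shown is an assumed inverse of the packing described above (the shader's area_light_exclude_mask body is not shown in this chapter).

```typescript
// Surface side: the mask is stored in emissive.a as a normalized value.
function encodeSurfaceMask(mask: number): number {
  return (mask & 0xff) / 255;
}
function decodeSurfaceMask(alpha: number): number {
  return Math.floor(alpha * 255 + 0.5); // same rounding as u32(a * 255.0 + 0.5)
}

// Light side: kind in the low two bits' worth of range, mask above it.
function packAreaCode(kind: number, excludeMask: number): number {
  return kind + (excludeMask & 0xff) * 4;
}
function unpackAreaCode(code: number): { kind: number; excludeMask: number } {
  const c = Math.round(code);
  return { kind: c % 4, excludeMask: Math.floor(c / 4) };
}

// A light skips a surface when the two masks share at least one bit.
function lightSkipsSurface(surfaceMask: number, lightMask: number): boolean {
  return (surfaceMask & lightMask) !== 0;
}
```

The largest packed code is 3 + 255·4 = 1023, far below the 2²⁴ limit up to which f32 represents integers exactly, so the round trip through a float buffer is lossless. The exclusion semantics also compose: a surface tagged with bits 1 and 2 (mask 3) is skipped by a light that excludes either bit, while an untagged surface (mask 0) is never skipped.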

7.6 Image-Based Lighting (IBL)

Image-based lighting uses an HDR environment map to illuminate surfaces with distant light. This provides ambient lighting that matches the sky and surrounding environment.

IBL requires three textures derived from the HDR sky cubemap — one for diffuse, one for specular at varying roughness, and a 2D table that captures the Fresnel integral:

IBL: irradiance map, prefilter mip chain, BRDF LUT

// ── from src/assets/ibl.ts ──
interface IblTextures {
  irradianceMap: GPUTexture;      // Diffuse irradiance (low-frequency)
  prefilterMap: GPUTexture;       // Specular prefilter (mipmapped)
  brdfLut: GPUTexture;            // BRDF integration lookup table (2D)
}

Irradiance map. A heavily blurred version of the sky cubemap (lowest mip). Sampled by the surface normal to give diffuse ambient light:

// ── from the lighting shader ──
let irradiance = textureSample(irradianceMap, ibl_samp, normal).rgb;
let diffuseIBL = irradiance * albedo;

Prefilter map. A mipmapped cubemap where each mip level represents a different roughness. Sampled by the reflection direction and roughness level:

// ── from the lighting shader ──
let roughnessLevel = roughness * MAX_PREFILTER_MIP_LEVEL;
let prefiltered = textureSampleLevel(prefilterMap, ibl_samp, reflection, roughnessLevel).rgb;

BRDF LUT. A 2D lookup table encoding the Fresnel-integral term of the split-sum approximation. Sampled by NdotV and roughness:

// ── from the lighting shader ──
let brdf = textureSample(brdfLut, ibl_samp, vec2f(NdotV, roughness)).rg;
let specularIBL = prefiltered * (F0 * brdf.r + brdf.g);

The complete IBL contribution is:

// ── from the lighting shader ──
let ibl = (1.0 - metallic) * diffuseIBL + specularIBL;
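
Once the three texture lookups are done, the rest is plain arithmetic. The TypeScript sketch below assembles the same formula from already-sampled values; the function names are hypothetical, and the inputs stand in for the results of the irradiance, prefilter, and BRDF LUT fetches.

```typescript
type RGB = [number, number, number];

// Roughness selects a mip of the prefilter cube: 0 = sharpest, maxMip = blurriest.
function prefilterMipLevel(roughness: number, maxMip: number): number {
  return roughness * maxMip;
}

// ibl = (1 - metallic) * irradiance * albedo + prefiltered * (F0 * brdf.r + brdf.g)
function composeIbl(
  irradiance: RGB, prefiltered: RGB, albedo: RGB, f0: RGB,
  metallic: number, brdf: [number, number],
): RGB {
  const out: RGB = [0, 0, 0];
  for (let i = 0; i < 3; i++) {
    const diffuse = irradiance[i] * albedo[i];
    const specular = prefiltered[i] * (f0[i] * brdf[0] + brdf[1]);
    out[i] = (1 - metallic) * diffuse + specular;
  }
  return out;
}
```

For a grey dielectric (albedo 0.5, F0 0.04) under unit irradiance, a prefiltered sample of 2.0, and a LUT value of (0.5, 0.1), the diffuse part is 0.5 and the specular part is 2 × (0.02 + 0.1) = 0.24, for a total of 0.74 per channel. Setting metallic to 1 removes the diffuse part entirely.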

Refining the Specular Reflection

The reflection sampled by the raw R = reflect(−V, N) is an idealization in two ways the deferred shader corrects before the result reaches the screen.

Anisotropic bending. A brushed-metal or hair surface (§6.6, Anisotropy) reflects the environment as a streak, not a point — the same directional highlight the direct lobe produces, but for the whole sky. Sampling the prefilter cube along the plain mirror direction R would give a round reflection that contradicts the stretched direct highlight. So when a pixel is anisotropic, the reflection vector is bent toward the bitangent (Filament's approximation) before the cube is sampled:

// ── from src/shaders/deferred_lighting.wgsl ──
var R_spec = R;
if (aniso_active) {
  R_spec = reflect(-V, anisotropic_bent_normal(N, aniso_B, V, anisotropy));
}
let global_pf = textureSampleLevel(prefilter_cube, ibl_samp, R_spec, roughness * (IBL_MIP_LEVELS - 1.0)).rgb;

The same bent R_spec is reused for the reflection probes (§7.11), so a brushed surface streaks consistently against both the global sky and a local probe. Only the base specular lobe is bent — the clearcoat lobe keeps the un-bent R, because a clearcoat is an isotropic film over the anisotropic base.

Specular horizon occlusion. Normal-mapped and curved surfaces can produce a reflection vector that points below the real geometric surface — physically impossible, but the cubemap will happily return sky for it, leaking light through the object. The fix fades the reflection as R dips beneath the geometric horizon. The G-Buffer stores only the perturbed shading normal, so the shader reconstructs a geometric normal from neighboring depth taps and uses it as the horizon reference:

// ── from src/shaders/deferred_lighting.wgsl ──
// Geometric normal from screen-space depth neighbors (derivatives are illegal
// here — this runs after the sky `discard`, i.e. non-uniform control flow).
let wR = reconstruct_world_pos(/* coord + (1,0) */);
let wU = reconstruct_world_pos(/* coord + (0,1) */);
var Ng = normalize(cross(wU - world_pos, wR - world_pos));
if (dot(Ng, V) < 0.0) { Ng = -Ng; }
let spec_horizon = specular_horizon_occlusion(R_spec, Ng);   // 1 = open, → 0 below horizon

The neighbor taps are skipped at silhouette edges (where a neighbor lands on the far plane), so a depth discontinuity can't fabricate a bad geometric normal and darken the rim. spec_horizon then multiplies into specular_ibl — and, like AO, it occludes only the environment specular, never the direct lobe.

Screen-space reflections. IBL — even with probes — only reflects the baked environment; it cannot show a nearby object reflected in a wet floor or a polished table, because that object was never in the cubemap. For those high-frequency, scene-dependent reflections Taos adds a separate screen-space reflection (SSR) pass that ray-marches the depth buffer and composites the hit color over the IBL specular. SSR is part of the water/reflective-surface pipeline rather than the core lighting pass — see Chapter 17, §17.11 for its ray march, fade, and fallback-to-IBL logic.

7.7 The BRDF

§7.1 presents the textbook Cook-Torrance form. The shipped implementation lives in src/shaders/modules/pbr_brdf.wgsl — a single module that every lit shader pulls in with #import "pbr_brdf.wgsl" rather than re-deriving the terms. It is a metallic-roughness Cook-Torrance model with two modern upgrades and a few supporting helpers:

Function	Role
`distribution_ggx(NdotH, roughness)`	GGX/Trowbridge-Reitz NDF; alpha = roughness² remap lives here.
`visibility_smith_ggx(NdotV, NdotL, roughness)`	Height-correlated Smith visibility — returns `V = G / (4·NdotV·NdotL)` with the specular denominator folded in.
`fresnel_schlick(cosTheta, F0)`	Schlick Fresnel for direct lighting.
`fresnel_schlick_roughness(cosTheta, F0, roughness)`	Roughness-clamped Fresnel for IBL — kills energy gain at grazing angles on rough surfaces.
`energy_compensation(F0, env_brdf)`	Restores energy lost to un-modeled multiple scattering inside the microsurface (`1 + F0·(1/Ess − 1)`).
`specular_horizon_occlusion(R, Ng)`	Fades IBL specular as the reflection vector dips below the geometric horizon (normal-mapped surfaces).
`attenuate_inverse_square(dist, range)`	Windowed `1/d²` falloff for punctual lights; reaches exactly zero at `range`.
`ambient_floor(albedo, metallic)`	A tiny non-IBL diffuse floor so surfaces never crush to pure black.

The key convention: because visibility_smith_ggx already folds in the 1/(4·NdotV·NdotL) denominator, callers combine the terms directly as D * Vis * F — and reuse F for the diffuse kD = (1 − F)(1 − metallic) weight. A single direct light therefore evaluates as:

// ── the shipped form (pbr_brdf.wgsl terms, assembled at the call site) ──
let Ecomp = energy_compensation(F0, env_brdf);
let F     = fresnel_schlick(VdotH, F0);
let D     = distribution_ggx(NdotH, roughness);
let Vis   = visibility_smith_ggx(NdotV, NdotL, roughness);
let spec  = D * Vis * F * Ecomp;
let kD    = (vec3f(1.0) - F) * (1.0 - metallic);
let diff  = kD * albedo / PI;
return (diff + spec) * radiance * NdotL;

The module owns PI and forbids importers from redeclaring it — a guard against the duplicate-symbol traps the textual #import system would otherwise allow.
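
To make the folded-denominator convention concrete, here is a TypeScript sketch of a height-correlated Smith-GGX visibility term. It uses the widely published formulation; whether pbr_brdf.wgsl matches it term for term is an assumption, and the function name is hypothetical.

```typescript
// Height-correlated Smith-GGX visibility, V = G / (4 · NdotV · NdotL).
// ASSUMPTION: standard published form; the shipped WGSL body is not shown in this chapter.
function visibilitySmithGGXCorrelated(nDotV: number, nDotL: number, roughness: number): number {
  const a = roughness * roughness;
  const a2 = a * a;
  const lambdaV = nDotL * Math.sqrt(nDotV * nDotV * (1 - a2) + a2);
  const lambdaL = nDotV * Math.sqrt(nDotL * nDotL * (1 - a2) + a2);
  return 0.5 / Math.max(lambdaV + lambdaL, 1e-5); // guard against 0/0 at grazing angles
}
```

At normal incidence (NdotV = NdotL = 1) both square roots collapse to 1 and the result is 0.25 for any roughness, which is exactly G / (4 · 1 · 1) with G = 1. That is the practical payoff of the convention: the caller multiplies D · Vis · F and never divides by 4 · NdotV · NdotL, so the + 0.0001 bias from the textbook form in §7.1 is not needed.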

The Extension Lobes

The base lobe above is only the default surface. Taos's PBR materials add a stack of secondary BRDF lobes from the glTF KHR_materials_* set — clear-coat, sheen, iridescence, anisotropy, and the non-refractive translucency lobes (diffuse transmission and subsurface). Each one is a material feature: what it models physically, and how it is parsed from glTF, encoded, stored, decoded, and evaluated in the shader, is covered end-to-end in Chapter 6, §6.6, which owns the per-lobe code. This section covers only what belongs to lighting — where the lobe math lives and how a single light's radiance feeds it.

How the extension lobes stack on top of the base Cook-Torrance lobe: clearcoat over the top, sheen at grazing angles, iridescence shifting F0, anisotropy stretching the highlight, and transmission/subsurface gathering light from behind.

The direct-light lobes live in src/shaders/modules/pbr_extensions.wgsl (function shade_direct_ext), the thin-film math in iridescence.wgsl, and the anisotropic NDF/visibility in pbr_brdf.wgsl. The forward, forward+, and point/spot paths all #import that module so they match bit-for-bit; the deferred pass keeps a synced inline copy interleaved with its IBL code. Each lobe is gated on its parameter — reading per-material constants from the MaterialParams SSBO (indexed by the G-Buffer materialId) and per-pixel multipliers/direction from the params2 / params3 attachments (§6.1) — so a default material pays for none of them.

Composing the Lobes

shade_direct_ext evaluates the full lobe stack (§6.6) for one light and returns its outgoing radiance. The order matters: the base lobe (anisotropic when applicable) and clear-coat are computed first into diff + spec and scaled by radiance · NdotL; sheen, diffuse transmission, and subsurface are then added on top.

// ── from src/shaders/modules/pbr_extensions.wgsl ──
fn shade_direct_ext(
  N: vec3f, V: vec3f, L: vec3f, NdotV: f32, roughness: f32, metallic: f32,
  F0: vec3f, albedo: vec3f, env_brdf: vec2f,
  radiance: vec3f, radiance_trans: vec3f, mp: MaterialParams,
  aniso_dir: vec3f, anisotropy: f32,
) -> vec3f { /* base + clearcoat + sheen + transmission + subsurface */ }

The two radiance inputs are the subtle part. radiance folds in light color · intensity · attenuation · shadow for the reflective lobes. radiance_trans is the same quantity for the transmission lobes, but callers pass it un-shadowed for the directional sun — a front-face shadow must not extinguish back-lit translucency, or a leaf would go dark exactly when the sun is behind it. For punctual lights radiance_trans equals radiance, because their visibility is geometric (cone, cookie, range) and does gate the transmitted light. This split is what makes a backlit leaf glow correctly under the sun while still respecting a spotlight's cone.
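The split can be sketched on the CPU side in TypeScript. This is only an illustration of the rule described above: the LightSample record and the splitRadiance helper are hypothetical names, not part of the Taos source, and the real callers are WGSL.

```typescript
type Vec3 = [number, number, number];

const scaleVec3 = (v: Vec3, s: number): Vec3 => [v[0] * s, v[1] * s, v[2] * s];

// Hypothetical per-light inputs at one shading point.
interface LightSample {
  color: Vec3;            // linear RGB light color
  intensity: number;
  attenuation: number;    // range/cone falloff; 1 for the sun
  shadow: number;         // shadow visibility in [0, 1]
  isDirectional: boolean; // true for the sun
}

// Builds the two radiance inputs handed to the lobe stack.
function splitRadiance(light: LightSample): { radiance: Vec3; radianceTrans: Vec3 } {
  const unshadowed = scaleVec3(light.color, light.intensity * light.attenuation);
  const radiance = scaleVec3(unshadowed, light.shadow);
  // Sun: transmission ignores the front-face shadow.
  // Punctual lights: transmission uses the same gated radiance.
  const radianceTrans = light.isDirectional ? unshadowed : radiance;
  return { radiance, radianceTrans };
}
```

With a fully shadowed sun (shadow = 0), radiance is black while radianceTrans keeps the full color · intensity, which is the backlit-leaf case.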

7.8 The Deferred Lighting Pass

The DeferredLightingPass (src/renderer/render_graph/passes/deferred_lighting_pass.ts) is the core of the deferred renderer. It renders a fullscreen triangle that samples the G-Buffer and all shadow/lighting inputs, producing an HDR result and exposing persistent camera + light uniform buffers as handles that downstream passes (composite, godrays) can bind:

// ── from src/renderer/render_graph/passes/deferred_lighting_pass.ts ──
export class DeferredLightingPass extends Pass<DeferredLightingDeps, DeferredLightingOutputs> {
  readonly name = 'DeferredLightingPass';
  // Pipelines, BGLs, and samplers — created once in static create(ctx).
}

export interface DeferredLightingOutputs {
  hdr: ResourceHandle;          // direct lighting composited over input HDR
  cameraBuffer: ResourceHandle; // persistent uniform buffer handle
  lightBuffer: ResourceHandle;  // persistent uniform buffer handle
}

The fullscreen triangle approach avoids a vertex buffer — three vertices cover the entire clip space:

// ── from src/shaders/deferred_lighting.wgsl ──
@vertex
fn vs_main(@builtin(vertex_index) vi: u32) -> @builtin(position) vec4f {
  // Fullscreen triangle: covers NDC without a vertex buffer
  let uv = vec2f(f32((vi << 1) & 2), f32(vi & 2));
  return vec4f(uv * 2.0 - 1.0, 0.0, 1.0);
}
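To see why three vertices are enough, the bit trick can be replayed on the CPU. The TypeScript below is a sketch that mirrors vs_main; the helper names are illustrative only.

```typescript
// Mirrors vs_main: vertex index -> clip-space xy.
function fullscreenTriangleVertex(vi: number): [number, number] {
  const u = (vi << 1) & 2; // 0, 2, 0 for vi = 0, 1, 2
  const v = vi & 2;        // 0, 0, 2 for vi = 0, 1, 2
  return [u * 2 - 1, v * 2 - 1];
}

// The triangle is bounded by x = -1, y = -1 and the hypotenuse x + y = 2.
function insideFullscreenTriangle(x: number, y: number): boolean {
  return x >= -1 && y >= -1 && x + y <= 2;
}

const corners = [0, 1, 2].map(fullscreenTriangleVertex);
// corners is [[-1, -1], [3, -1], [-1, 3]]

const ndcSquare: [number, number][] = [[-1, -1], [1, -1], [1, 1], [-1, 1]];
const coversScreen = ndcSquare.every(([x, y]) => insideFullscreenTriangle(x, y));
// coversScreen is true: all four NDC corners lie inside the oversized triangle
```

The parts of the triangle outside the [-1, 1] square are clipped by the rasterizer, so only the visible screen area is shaded.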

Each fragment samples the G-Buffer, reconstructs the world position from depth, evaluates the directional light with cascaded shadow maps, applies AO, adds SSGI and IBL, and writes the HDR result:

// ── from the deferred lighting shader ──
// Lighting pass fragment shader (conceptual flow):
// 1. Sample G-Buffer: albedo/roughness, normal/metallic, emissive, depth,
//    plus the material params (specular, occlusion, materialId) and the
//    extension channels (params2 multipliers, params3 anisotropy tangent)
// 2. Reconstruct world position from depth + inverse view-proj
// 3. Look up extended params in the MaterialParams SSBO via materialId
// 4. Evaluate directional light (sun) with shadow cascade — base lobe plus
//    any active extension lobes (clear-coat, sheen, iridescence, anisotropy,
//    diffuse transmission, subsurface; see §6.6)
// 5. Fetch ambient occlusion (from SSAO texture) and baked occlusion (params.b);
//    these multiply the ambient terms rather than adding light
// 6. Add indirect light (from SSGI texture)
// 7. Add IBL diffuse + specular
// 8. Add emissive; write HDR color

Refractive transmission is not handled here — see-through glass needs the lit scene as input, so it renders in a separate forward TransmissionPass after this pass completes (Chapter 6, §6.7).

7.9 The Forward Lighting Path

Transparent objects cannot use deferred shading (the G-Buffer stores only one surface per pixel). Taos's ForwardPass evaluates the same PBR lighting model but in a forward rendering path.

The forward pass handles:

Directional light with cascade shadow sampling (same as deferred).
Point lights sampled in a loop over the active point light list.
Spot lights with spotlight cone attenuation and shadow maps.
IBL from the same irradiance / prefilter / BRDF textures.

The forward pass uses the same camera and light uniform buffers as the deferred passes, ensuring consistent lighting between opaque and transparent objects. Refractive transmission is a further specialization of the forward path — TransmissionPass (§6.7) — which samples the lit scene to bend it through glass rather than blending over it.

Forward+ (Tiled Light Culling)

The plain forward pass loops over every point light at every fragment — fine for a handful of lights, but it scales as pixels × lights and collapses under a scene lit by hundreds of small lights (torches, embers, windows). The deferred path (§7.8) has the same problem: its PointSpotLightPass also tests all lights per pixel (§7.3). Forward+ (tiled forward) fixes this by culling the light list per screen tile on the GPU before any shading happens, so each fragment only iterates the few lights that actually reach its tile.

ForwardPlusPass (forward_plus_pass.ts) runs three graph passes per frame:

Depth pre-pass (render) — rasterizes geometry depth only, giving the cull step tight per-tile depth bounds. (When a deferred preset hands its G-Buffer depth in as externalDepth, this pre-pass is skipped entirely.)
Light culling (compute) — one workgroup per 16×16 tile builds the tile's view-space frustum and tests every point light's bounding sphere against it, writing a per-tile index list.
Shading (render) — re-renders the geometry with full PBR; each fragment reads only its tile's light list.

Forward+ tiled light culling: depth bounds → tile frustum → sphere test → per-tile list

The culling shader (light_culling.wgsl) is the heart of it. Each tile's 256 threads cooperatively reduce the tile's min/max depth (skipping background pixels so the depth slab hugs real geometry), unproject the tile's four screen-space corners into view-space rays to form four side planes through the eye, and clamp the slab to the reduced depth range:

// ── from src/shaders/light_culling.wgsl ──
// Side planes pass through the eye; inward-pointing normals from corner rays.
planes[0] = normalize(cross(c0, c1));
planes[1] = normalize(cross(c1, c2));
planes[2] = normalize(cross(c2, c3));
planes[3] = normalize(cross(c3, c0));
// Depth slab from the tile's actual geometry (view-space z is negative down -Z).
let zNear = viewFromNdc(vec3f(0.0, 0.0, minD)).z;   // closest geometry
let zFar  = viewFromNdc(vec3f(0.0, 0.0, maxD)).z;   // farthest geometry

Threads then stride across the light list, accepting a light when its bounding sphere (center lp, radius = range) is inside all four planes and overlaps the depth slab — a sphere-vs-frustum test that is just four dot products plus a slab check:

// ── from src/shaders/light_culling.wgsl ──
var inside = true;
for (var p = 0u; p < 4u; p = p + 1u) {
  if (dot(planes[p], lp) < -r) { inside = false; }     // outside a side plane
}
if (inside && hasGeometry) {
  if (lp.z - r > zNear || lp.z + r < zFar) { inside = false; }  // outside the slab
}
if (inside) {
  let slot = atomicAdd(&wgCount, 1u);                   // append to the tile list
  if (slot < MAX_LIGHTS_PER_TILE) { wgIndices[slot] = i; }
}
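The plane construction and the sphere test can be checked on the CPU with a small TypeScript sketch. It assumes the four corner rays are ordered bottom-left, top-left, top-right, bottom-right (so that each cross product points into the frustum); the shader's actual corner order is not shown above, and the function names here are illustrative.

```typescript
type Vec3 = [number, number, number];

const cross = (a: Vec3, b: Vec3): Vec3 => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];
const dot = (a: Vec3, b: Vec3): number => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const normalize = (v: Vec3): Vec3 => {
  const len = Math.hypot(v[0], v[1], v[2]);
  return [v[0] / len, v[1] / len, v[2] / len];
};

// Side planes through the eye (origin), camera looking down -Z.
// Assumed corner order: bottom-left, top-left, top-right, bottom-right.
function tileSidePlanes(c0: Vec3, c1: Vec3, c2: Vec3, c3: Vec3): Vec3[] {
  return [cross(c0, c1), cross(c1, c2), cross(c2, c3), cross(c3, c0)].map(normalize);
}

// Sphere-vs-tile test: four dot products plus the depth slab check.
function sphereInTile(
  planes: Vec3[], center: Vec3, r: number,
  zNear: number, zFar: number, hasGeometry: boolean,
): boolean {
  for (const n of planes) {
    if (dot(n, center) < -r) return false; // entirely outside a side plane
  }
  // View-space z is negative in front of the camera, so zNear > zFar numerically.
  if (hasGeometry && (center[2] - r > zNear || center[2] + r < zFar)) return false;
  return true;
}

// A 90-degree "tile" whose geometry spans view-space z in [-6, -4].
const planes = tileSidePlanes([-1, -1, -1], [-1, 1, -1], [1, 1, -1], [1, -1, -1]);
const centered = sphereInTile(planes, [0, 0, -5], 1, -4, -6, true);   // accepted
const offToSide = sphereInTile(planes, [10, 0, -5], 1, -4, -6, true); // rejected by a side plane
const tooDeep = sphereInTile(planes, [0, 0, -20], 1, -4, -6, true);   // rejected by the slab
```

The test is conservative: a sphere near a frustum corner can pass all four plane checks without touching the tile, which costs a few wasted light evaluations but never drops a light that matters.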

The surviving indices are written as numTiles blocks of MAX_LIGHTS_PER_TILE + 1 u32 — slot 0 is the count, the rest are light indices. The shading pass then loops only over tileLights[tile], so a fragment in an unlit tile does zero light work no matter how many lights exist globally.

A few constants bound the cost (forward_plus_pass.ts:24-28):

Constant	Value	Role
`TILE_SIZE`	16	Tile edge in pixels; one compute workgroup (256 threads) per tile
`MAX_LIGHTS`	256	Total culled point-light pool the storage buffer is sized for
`MAX_LIGHTS_PER_TILE`	64	Per-tile list capacity — a fragment touches at most this many lights
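As a rough illustration of what these constants imply, the TypeScript sketch below sizes the tile-list buffer and reads one tile's list back. It assumes tiles are stored row-major and that a reader clamps the stored count to the cap; both are assumptions of this sketch, and the helper names are not from forward_plus_pass.ts.

```typescript
const TILE_SIZE = 16;
const MAX_LIGHTS_PER_TILE = 64;
const TILE_STRIDE = MAX_LIGHTS_PER_TILE + 1; // slot 0 = count, then light indices

function tileGrid(width: number, height: number): { tilesX: number; tilesY: number } {
  return { tilesX: Math.ceil(width / TILE_SIZE), tilesY: Math.ceil(height / TILE_SIZE) };
}

// Size in bytes of the per-tile list buffer (u32 entries).
function tileListBytes(width: number, height: number): number {
  const { tilesX, tilesY } = tileGrid(width, height);
  return tilesX * tilesY * TILE_STRIDE * 4;
}

// Light indices visible to the pixel at (px, py). Row-major tile order is assumed.
function lightsForPixel(buf: Uint32Array, tilesX: number, px: number, py: number): number[] {
  const tile = Math.floor(py / TILE_SIZE) * tilesX + Math.floor(px / TILE_SIZE);
  const base = tile * TILE_STRIDE;
  // The atomic counter may overshoot the cap, so clamp before reading.
  const count = Math.min(buf[base], MAX_LIGHTS_PER_TILE);
  return Array.from(buf.subarray(base + 1, base + 1 + count));
}

const grid = tileGrid(1920, 1080);       // 120 x 68 tiles
const bytes = tileListBytes(1920, 1080); // 2,121,600 bytes
```

At 1080p the list buffer is about 2 MB regardless of how many lights are in the scene, since its size depends only on the tile count and the per-tile cap.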

Only the sun (with cascaded shadows, Chapter 8) and IBL ambient are evaluated globally; just the point lights are tile-culled. This is what lets "many hundreds of point lights stay cheap": the per-tile cap, not the global count, governs the worst-case fragment cost. Skinned transparents take a parallel route through the SkinnedForward / SkinnedForwardPlus material pass types (Chapter 6), the latter sharing this same culling machinery.

7.10 GPU-Based IBL Pre-Computation

The three IBL textures — irradiance map, GGX prefiltered environment map, and BRDF LUT — could be pre-computed offline and shipped as assets, but Taos computes them at runtime on the GPU. This allows the IBL to adapt to the current sky (procedural or HDR) without managing additional texture assets per environment.

IBL baking pipeline: equirectangular sky → compute shaders → irradiance cube + prefiltered cube + BRDF LUT

BRDF LUT (CPU)

The split-sum BRDF lookup table is environment-independent (it depends only on NdotV and roughness), so it is computed once on the CPU and cached per device. For each texel (NdotV, roughness), the function importance-samples the GGX distribution using a Hammersley low-discrepancy sequence and integrates the Smith G₂ visibility term weighted by the Fresnel coefficient:

// ── from src/assets/ibl.ts ──
function computeBrdfLutData(outW: number, outH: number, samples: number): Float32Array {
  for (let py = 0; py < outH; py++) {
    for (let px = 0; px < outW; px++) {
      const NdotV = (px + 0.5) / outW;
      const roughness = (py + 0.5) / outH;
      // Importance-sample GGX, accumulate scale (A) and bias (B)
      A += G_vis * (1 - Fc);
      B += G_vis * Fc;
    }
  }
}

The result is a 64×64 rgba16float texture (A in R, B in G), built from 512 importance samples per texel. Because it depends only on the BRDF model and not on the environment, it is computed exactly once per GPUDevice. The cache is a WeakMap<GPUDevice, GPUTexture>: multiple renderers sharing a device share the one LUT, it is never explicitly destroyed (IblTextures.destroy() releases the irradiance and prefilter cubes but deliberately leaves the LUT), and when the device is garbage-collected the WeakMap entry goes with it. Across the many IBL rebuilds a day/night cycle triggers (§10.7), the LUT is never recomputed.
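The per-texel integral elided in the excerpt can be sketched as follows. This is the widely used split-sum formulation, written here as a self-contained illustration rather than a copy of src/assets/ibl.ts: the function names are hypothetical, and the Schlick-GGX remapping k = α/2 is an assumption about which Smith variant the engine uses.

```typescript
// Van der Corput radical inverse (bit reversal): the second Hammersley coordinate.
function radicalInverse(i: number): number {
  let bits = i >>> 0;
  bits = ((bits << 16) | (bits >>> 16)) >>> 0;
  bits = (((bits & 0x55555555) << 1) | ((bits & 0xaaaaaaaa) >>> 1)) >>> 0;
  bits = (((bits & 0x33333333) << 2) | ((bits & 0xcccccccc) >>> 2)) >>> 0;
  bits = (((bits & 0x0f0f0f0f) << 4) | ((bits & 0xf0f0f0f0) >>> 4)) >>> 0;
  bits = (((bits & 0x00ff00ff) << 8) | ((bits & 0xff00ff00) >>> 8)) >>> 0;
  return bits * 2.3283064365386963e-10; // divide by 2^32
}

// Schlick-GGX G1 term; k = alpha / 2 is an assumed remapping.
function g1(nDotX: number, k: number): number {
  return nDotX / (nDotX * (1 - k) + k);
}

// Integrates the split-sum scale (A) and bias (B) for one (NdotV, roughness) texel.
function integrateBrdf(nDotV: number, roughness: number, samples: number): [number, number] {
  const a = roughness * roughness;
  const k = a / 2;
  // View vector in tangent space with N = +Z: V = (vx, 0, vz).
  const vx = Math.sqrt(Math.max(0, 1 - nDotV * nDotV));
  const vz = nDotV;
  let A = 0;
  let B = 0;
  for (let i = 0; i < samples; i++) {
    const u1 = i / samples;
    const u2 = radicalInverse(i);
    // GGX importance sample of the half-vector H.
    const phi = 2 * Math.PI * u1;
    const cosTheta = Math.sqrt((1 - u2) / (1 + (a * a - 1) * u2));
    const sinTheta = Math.sqrt(Math.max(0, 1 - cosTheta * cosTheta));
    const hx = sinTheta * Math.cos(phi);
    const hz = cosTheta; // H.y is not needed because V.y = 0 and N = +Z
    const vDotH = vx * hx + vz * hz;
    const nDotL = 2 * vDotH * hz - vz; // z component of L = 2 (V.H) H - V
    if (nDotL > 0) {
      const gVis = (g1(nDotV, k) * g1(nDotL, k) * vDotH) / (hz * nDotV);
      const fc = Math.pow(1 - vDotH, 5);
      A += gVis * (1 - fc);
      B += gVis * fc;
    }
  }
  return [A / samples, B / samples];
}

// Fills an RGBA float image: A in R, B in G, texel centers as in the excerpt.
function computeBrdfLutSketch(outW: number, outH: number, samples: number): Float32Array {
  const out = new Float32Array(outW * outH * 4);
  for (let py = 0; py < outH; py++) {
    for (let px = 0; px < outW; px++) {
      const [A, B] = integrateBrdf((px + 0.5) / outW, (py + 0.5) / outH, samples);
      const o = (py * outW + px) * 4;
      out[o] = A;
      out[o + 1] = B;
      out[o + 3] = 1;
    }
  }
  return out;
}
```

The lighting shader then reconstructs the specular environment term as F0 · A + B, which is why the two channels are called scale and bias. For a smooth surface viewed head-on, A approaches 1 and B approaches 0.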

Irradiance Map (GPU Compute)

The diffuse irradiance map is a heavily blurred version of the HDR sky that stores the cosine-weighted hemisphere integral at every direction. The cs_irradiance compute shader (src/shaders/ibl_baking.wgsl) is dispatched once per cube face (6 dispatches), with each thread computing one output texel:

// ── from src/shaders/ibl_baking.wgsl ──
@compute @workgroup_size(8, 8, 1)
fn cs_irradiance(@builtin(global_invocation_id) id: vec3u) {
  let uv = (vec2f(id.xy) + 0.5) / f32(IRR_SIZE);
  let dir = cube_face_dir(u32(params.face), uv * 2.0 - 1.0);
  var irradiance = vec3f(0.0);
  for (var i = 0u; i < SAMPLES; i++) {
    let xi = hammersley(i, SAMPLES);
    let local_dir = cosine_sample_hemisphere(xi);
    let world_dir = tangent_frame(dir) * local_dir;
    irradiance += textureSampleLevel(sky_tex, sky_samp, equirect_uv(world_dir), 0).rgb;
  }
  textureStore(out_tex, id.xy, vec4f(irradiance / f32(SAMPLES) * params.exposure, 1.0));
}

Each output direction dir is the center of a cube face texel transformed to a unit vector. A tangent frame is built around that vector and 256 cosine-weighted hemisphere samples are taken from the equirectangular sky texture. The result is a 32×32 rgba16float cube map — low resolution since irradiance is very low-frequency.
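The estimator is easy to sanity-check on the CPU. The TypeScript sketch below mirrors the loop in cs_irradiance with a procedural sky callback instead of a texture; the helper names, the tangent-frame construction, and the omission of the exposure factor are all choices of this sketch, not of the shader.

```typescript
type Vec3 = [number, number, number];

const cross = (a: Vec3, b: Vec3): Vec3 => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];
const normalize = (v: Vec3): Vec3 => {
  const len = Math.hypot(v[0], v[1], v[2]);
  return [v[0] / len, v[1] / len, v[2] / len];
};

// Van der Corput radical inverse: the second Hammersley coordinate.
function radicalInverse(i: number): number {
  let bits = i >>> 0;
  bits = ((bits << 16) | (bits >>> 16)) >>> 0;
  bits = (((bits & 0x55555555) << 1) | ((bits & 0xaaaaaaaa) >>> 1)) >>> 0;
  bits = (((bits & 0x33333333) << 2) | ((bits & 0xcccccccc) >>> 2)) >>> 0;
  bits = (((bits & 0x0f0f0f0f) << 4) | ((bits & 0xf0f0f0f0) >>> 4)) >>> 0;
  bits = (((bits & 0x00ff00ff) << 8) | ((bits & 0xff00ff00) >>> 8)) >>> 0;
  return bits * 2.3283064365386963e-10;
}

// Cosine-weighted direction around +Z; pdf = cos(theta) / PI.
function cosineSampleHemisphere(u1: number, u2: number): Vec3 {
  const r = Math.sqrt(u1);
  const phi = 2 * Math.PI * u2;
  return [r * Math.cos(phi), r * Math.sin(phi), Math.sqrt(1 - u1)];
}

// Tangent and bitangent perpendicular to n.
function tangentFrame(n: Vec3): [Vec3, Vec3] {
  const up: Vec3 = Math.abs(n[2]) < 0.999 ? [0, 0, 1] : [1, 0, 0];
  const t = normalize(cross(up, n));
  return [t, cross(n, t)];
}

// Averages the sky over the hemisphere around n. Because the samples are
// already cosine-distributed, a plain mean equals irradiance / PI.
function bakeIrradianceTexel(n: Vec3, sky: (dir: Vec3) => Vec3, samples: number): Vec3 {
  const [t, b] = tangentFrame(n);
  const sum: Vec3 = [0, 0, 0];
  for (let i = 0; i < samples; i++) {
    const l = cosineSampleHemisphere(i / samples, radicalInverse(i));
    const d: Vec3 = [
      t[0] * l[0] + b[0] * l[1] + n[0] * l[2],
      t[1] * l[0] + b[1] * l[1] + n[1] * l[2],
      t[2] * l[0] + b[2] * l[1] + n[2] * l[2],
    ];
    const c = sky(d);
    sum[0] += c[0];
    sum[1] += c[1];
    sum[2] += c[2];
  }
  return [sum[0] / samples, sum[1] / samples, sum[2] / samples];
}
```

A sky whose radiance equals max(dir.z, 0) should average to about 2/3 for an upward-facing texel, which is the analytic value of the cosine-weighted integral; a constant sky should come back unchanged.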

GGX Prefiltered Environment Map (GPU Compute)

The specular prefiltered cube follows the same pattern but uses importance sampling of the GGX distribution. Each mip level corresponds to a different roughness value — [0.04, 0.25, 0.5, 0.75, 1.0] — allowing the lighting shader to sample a mip level matching the surface roughness:

// ── from src/shaders/ibl_baking.wgsl ──
@compute @workgroup_size(8, 8, 1)
fn cs_prefilter(@builtin(global_invocation_id) id: vec3u) {
  let uv = (vec2f(id.xy) + 0.5) / f32(mipSize);
  let dir = cube_face_dir(u32(params.face), uv * 2.0 - 1.0);
  var color = vec3f(0.0); var weight = 0.0;
  for (var i = 0u; i < SAMPLES; i++) {
    let xi = hammersley(i, SAMPLES);
    let h  = ggx_importance_sample(xi, params.roughness);
    let l  = reflect(-dir, h);
    let ndotl = max(dot(dir, l), 0.0);
    if (ndotl > 0.0) {
      color += textureSampleLevel(sky_tex, sky_samp, equirect_uv(l), 0).rgb * ndotl;
      weight += ndotl;
    }
  }
  textureStore(out_tex, id.xy, vec4f(color / weight * params.exposure, 1.0));
}

The bake issues 6 faces × 5 roughness levels = 30 dispatches, each sampling 256 GGX-importance-sampled directions per texel. The base mip is 128×128 per face, halving at each roughness level down to 8×8 at roughness 1.0.
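The bookkeeping behind those numbers can be written out as a short TypeScript sketch. The roughness list, base size, and 8×8 workgroup size come from the text above; the PrefilterDispatch record and the function name are illustrative only.

```typescript
const ROUGHNESS_LEVELS = [0.04, 0.25, 0.5, 0.75, 1.0];
const BASE_SIZE = 128; // mip 0 edge length per face
const WORKGROUP = 8;   // matches @workgroup_size(8, 8, 1)
const FACES = 6;

interface PrefilterDispatch {
  face: number;
  mip: number;
  roughness: number;
  size: number;   // edge length of this mip in texels
  groups: number; // workgroups along each axis
}

// One dispatch per (mip, face) pair.
function prefilterDispatches(): PrefilterDispatch[] {
  const out: PrefilterDispatch[] = [];
  ROUGHNESS_LEVELS.forEach((roughness, mip) => {
    const size = BASE_SIZE >> mip; // 128, 64, 32, 16, 8
    const groups = Math.ceil(size / WORKGROUP);
    for (let face = 0; face < FACES; face++) {
      out.push({ face, mip, roughness, size, groups });
    }
  });
  return out;
}

const dispatches = prefilterDispatches();
const totalWorkgroups = dispatches.reduce((n, d) => n + d.groups * d.groups, 0);
// 30 dispatches; 6 * (256 + 64 + 16 + 4 + 1) = 2046 workgroups in total
```

Mip 0 dominates the cost: its 16×16 workgroups per face account for three quarters of the total.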

Note that mip 0 uses roughness 0.04, not 0. At α² = 0 the GGX PDF collapses to a Dirac delta and D = α²/(π·denom²) becomes 0/0; the filtered-importance-sampling step (below) would then select a 1×1 source mip and the entire cube face would collapse to a single color. 0.04 is the engine's minimum useful roughness: at this prefilter resolution a true perfect mirror is visually indistinguishable from it, and the genuinely sharp end of reflection is anyway better served by screen-space reflections (Chapter 17) than by a 128×128 cube.

Firefly Suppression

A 256-sample Monte Carlo integral is vulnerable to fireflies: bright sparkle artifacts that appear when one of the few sample directions happens to land on a very high-luminance HDR texel — the sun disc, a light fixture, a specular highlight reflected in the sky. The unlucky pixel inherits that texel's full brightness; its neighbor, sampling a direction one texel over, gets a dim sky reading. The result is per-texel sparkle in the prefiltered cube — most visible on smooth metals, where the lighting shader samples the low-roughness mips that are most likely to contain the artifact.

The baking shader (src/shaders/ibl_baking.wgsl) applies two complementary filters to suppress them. Both run inside each sample loop, on top of the naïve estimators shown above.

1. Filtered importance sampling (FIS). Rather than always sampling source mip 0, each sample picks a mip level whose texel footprint roughly matches the sample's solid angle. The PDF of the sample (cosθ/π for the cosine hemisphere, D/4 for GGX) gives a per-sample solid angle Ωs = 1/(N·pdf); comparing that against the source's per-texel solid angle Ωp = 4π/(W·H) yields the right mip. Each +1 of mip averages 4× more source pixels, trading negligible blur on smooth surfaces for far less variance in the integral:

// ── from src/shaders/ibl_baking.wgsl (cs_prefilter) ──
let omegaP = 4.0 * PI / (srcDim.x * srcDim.y);   // per source-texel solid angle
let pdf    = max(D * 0.25, 1e-6);                // GGX sampling pdf: D/4 when V=N
let omegaS = 1.0 / (f32(SAMPLES) * pdf);         // per-sample solid angle
let mip    = clamp(0.5 * log2(omegaS / omegaP) + FIS_MIP_BIAS, 0.0, mipMax);

let sample = textureSampleLevel(sky_tex, sky_samp, equirect_uv(L), mip).rgb;

FIS_MIP_BIAS = 2.0 adds two extra mip levels on top of the analytic choice — the belt to FIS's braces — so even a narrow GGX lobe at low roughness still averages a small patch of source pixels rather than a single texel. The irradiance shader does the same calculation with pdf = cosθ/π.

2. Per-sample luminance clamp. Even with FIS, a single very bright HDR texel can still dominate the average when it survives the mip pre-filter. The shader caps each sample's luminance to FIREFLY_LUM_CLAMP = 10.0 while preserving its chroma:

// ── from src/shaders/ibl_baking.wgsl ──
fn clamp_firefly(rgb: vec3<f32>, maxLum: f32) -> vec3<f32> {
  let L = luminance(rgb);
  return select(rgb, rgb * (maxLum / max(L, 1e-6)), L > maxLum);
}

This is biased — it darkens the brightest lights — but the bias is far less perceptually objectionable than the sparkle it removes. A similar threshold around 10 is common; higher values preserve punchy suns at the cost of reintroducing sparkles on high-contrast HDRs.

The two filters target different failure modes. FIS reduces variance by averaging more source area per sample; the luminance clamp caps the magnitude of any single contribution that survives FIS. Together they keep the irradiance and prefilter cubes clean at 256 samples — without them the same dispatch would need an order of magnitude more samples to look acceptable on smooth metals.
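Both filters are small enough to reproduce on the CPU. The TypeScript below is a sketch under stated assumptions: the Rec. 709 luminance weights are a guess at what the shader's luminance() does, and the function names and the 2048×1024 source size in the example are hypothetical.

```typescript
type Vec3 = [number, number, number];

const FIS_MIP_BIAS = 2.0;
const FIREFLY_LUM_CLAMP = 10.0;

// Filtered importance sampling: pick the source mip whose texel footprint
// matches the solid angle one sample is responsible for.
function fisMip(pdf: number, samples: number, srcW: number, srcH: number, mipMax: number): number {
  const omegaP = (4 * Math.PI) / (srcW * srcH);       // per source-texel solid angle
  const omegaS = 1 / (samples * Math.max(pdf, 1e-6)); // per-sample solid angle
  const mip = 0.5 * Math.log2(omegaS / omegaP) + FIS_MIP_BIAS;
  return Math.min(Math.max(mip, 0), mipMax);
}

// Assumed Rec. 709 weights; the shader's luminance() is not shown in the text.
function luminance(c: Vec3): number {
  return 0.2126 * c[0] + 0.7152 * c[1] + 0.0722 * c[2];
}

// Caps luminance at maxLum while keeping the RGB ratios (chroma) intact.
function clampFirefly(c: Vec3, maxLum: number): Vec3 {
  const lum = luminance(c);
  if (lum <= maxLum) return c;
  const s = maxLum / Math.max(lum, 1e-6);
  return [c[0] * s, c[1] * s, c[2] * s];
}

// A uniform pdf of 1 / (4 PI) over a 2048x1024 source with 256 samples:
// each sample covers 8192 texels, so 0.5 * log2(8192) + 2 = 8.5.
const wideLobeMip = fisMip(1 / (4 * Math.PI), 256, 2048, 1024, 10);

// A sun-like texel keeps its color ratios but lands at luminance 10.
const tamedSun = clampFirefly([50000, 40000, 30000], FIREFLY_LUM_CLAMP);
```

A very peaked pdf drives the analytic mip negative, so the clamp to 0 leaves only sharp lobes reading the full-resolution source, which is where the luminance clamp becomes the main line of defense.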

These compute dispatches run whenever the sky changes. src/assets/ibl.ts exposes two drivers for them. computeIblGpu() is the one-shot path — it allocates fresh textures, submits the work, and awaits onSubmittedWorkDone() before returning the ready-to-use textures; it suits a sky baked once at load. Taos's game instead uses IblBaker, which allocates its cubes and bind groups once and then re-records the convolution into a caller-supplied command encoder on demand — with no per-bake allocation and no CPU/GPU sync — so the IBL can be rebaked continuously as the procedural sky tracks the time of day (see §10.7).

The diffuse half of this output — the irradiance cube — is so low-frequency that it can be stored far more compactly as nine spherical-harmonic coefficients instead of a 32×32×6 texture; §7.17 covers that representation and how it slots into the same bake and shading code.

7.11 Reflection Probes

Global IBL (§7.6 / §7.10) lights everything in the world with the same environment — fine for outdoor scenes where the sky is the only meaningful indirect light, but unconvincing indoors or inside any enclosed volume. A mirror inside a stone room shouldn't reflect the open sky; the inside of a portal arena shouldn't pick up the same horizon as the open plain it sits in. Reflection probes solve this by capturing the environment from a chosen point in the world into a cubemap, convolving that cube into a localized IBL pair, and blending it over the global IBL within a bounded influence volume.

The implementation uses a per-probe oriented bounding box that doubles as both the capture-culling volume and the spatial weighting volume, and reflections are parallax-corrected by intersecting against that box before sampling the cube.

Reflection probe pipeline: probe → 6 cube faces → irradiance + prefiltered cube-arrays → sampled in lighting

Component and Manager

The user-facing surface is the ReflectionProbe component (src/engine/components/reflection_probe.ts). It is added to a GameObject whose world transform locates the probe and orients its box:

// ── from src/engine/components/reflection_probe.ts ──
export class ReflectionProbe extends Component {
  captureContent: ReflectionProbeContent = 'sky-and-clouds';  // 'sky' | 'sky-and-clouds' | 'scene'
  updateMode: ReflectionProbeUpdateMode = 'once';             // 'once' | 'every-frame' | 'on-demand'
  extentHalfSize: Vec3 = new Vec3(8, 4, 8);                   // local-space AABB half-extents
  boxProjection = true;                                       // parallax-correct reflections
  influenceFalloff = 0.25;                                    // linear fade band (fraction of half-extent)
  requestUpdate(): void { this.needsBake = true; }            // force re-bake on next frame
}

The three capture modes trade fidelity for cost. 'sky' evaluates the procedural sky shader per direction; it is by far the cheapest, since no geometry is involved. 'sky-and-clouds' adds a stylized 2D cloud layer projected onto the upper hemisphere. 'scene' draws the sky and then forward-renders every MeshRenderer whose world AABB intersects the probe's capture extent. It is the most expensive option, but the only one that captures nearby walls, props, or characters.

Update modes match those tradeoffs. 'once' bakes on the first frame the probe is rendered and never again, which is the right choice for static scenes. 'every-frame' bakes every frame, paying the cost for moving probes or animated geometry. 'on-demand' bakes only after requestUpdate() is called, which is useful when the lighting changes occasionally (a door opens, a fixture is destroyed) but not every frame.

The ReflectionProbeFeature (src/renderer/features/reflection_probe_feature.ts) discovers probes in the scene each frame, assigns each one a slot in a fixed-size cube-array (MAX_PROBES = 8), determines which slots need re-baking, and uploads the per-probe uniform data the lighting shader consumes:

// ── from src/renderer/features/reflection_probe_feature.ts ──
// Three persistent cube-array textures shared across all probes
this.captureArray     = ctx.device.createTexture({ format: CAPTURE_FORMAT, size: [256, 256, 6 * MAX_PROBES], /* ... */ });  // raw env capture
this.irradianceArray  = ctx.device.createTexture({ format: 'rgba16float',  size: [ 32,  32, 6 * MAX_PROBES], /* ... */ });  // diffuse
this.prefilteredArray = ctx.device.createTexture({ format: 'rgba16float',  size: [128, 128, 6 * MAX_PROBES],
                                                   mipLevelCount: IBL_LEVELS, /* ... */ });                                  // specular (5 mips)

Sharing one cube-array per texture type — rather than allocating individual cubes per probe — avoids per-bake texture churn and lets the lighting shader index into a single binding by slotIndex rather than juggling N bind groups.

Capture Pass

For each probe that needs baking, the ReflectionProbeCapturePass (src/renderer/render_graph/passes/reflection_probe_capture_pass.ts) renders six 256×256 faces around the probe's world position. The cube-face view-projection matrices are built by cubeCaptureViewProjs(position) — six lookAt matrices oriented along ±X, ±Y, ±Z, each with a 90° perspective.

Two pipelines render into every face. The sky pipeline runs first as a fullscreen triangle; its fragment shader reconstructs a view ray from NDC via the inverse view-projection and evaluates the procedural sky — a three-stop gradient (ground → horizon → zenith) with a soft sun disc and, in 'sky-and-clouds' mode, a sine-based fbm cloud layer. The sky pass writes depth as well, effectively clearing the face. The mesh pipeline runs second when captureContent === 'scene': a standard forward shader with simple Lambert + hemispheric ambient draws every mesh whose world AABB survives a per-probe cull, with normal depth testing.

The procedural sky used here is intentionally cheaper than the full atmosphere shader (§10.2) — IBL convolution is so bandlimited that the simplified gradient is indistinguishable from the real sky after blurring, and the savings let probes rebake every frame on modest GPUs.

IBL Convolution

The capture cube on its own is not useful for shading: the lighting shader needs an irradiance map (diffuse, cos-weighted hemisphere integral) and a prefiltered map (GGX specular at varying roughness). The ReflectionProbeIblPass (src/renderer/render_graph/passes/reflection_probe_ibl_pass.ts) runs two compute shaders (src/shaders/reflection_probe_ibl.wgsl) immediately after capture.

The math is identical to the global IBL bake (§7.10) — 256 cosine-weighted samples for irradiance, 256 GGX-importance samples per roughness level for the prefilter mip chain — but reads from a cube-array slice rather than a single cubemap. The output is written to the matching slot in the persistent irradiance and prefiltered arrays. Firefly suppression (§7.10) is not run here: the capture cube has no mips for filtered importance sampling to step through, and the procedural sky avoids the extreme HDR luminance peaks that cause fireflies in the first place.

Box Projection

Sampling a cubemap with the raw reflection direction R = reflect(-V, N) assumes the environment is infinitely far away — fine for the sky, wrong for an enclosed room. The mirror should reflect the actual wall, not what is straight along R at infinity. Box-projected reflections fix this by intersecting the reflection ray with the probe's bounding box, then sampling the cube along the direction from the probe center to the hit point:

With and without box projection: the reflection direction is bent to "stick" the cube sample to the box geometry

The shader does the intersection in the probe's local space, so the box is always axis-aligned regardless of how the GameObject is rotated:

// ── from src/shaders/deferred_lighting.wgsl ──
fn box_project_direction(p: ProbeData, worldPos: vec3<f32>, worldR: vec3<f32>) -> vec3<f32> {
  if ((p.flags & 1u) == 0u) { return worldR; }            // disabled — return original ray
  let localPos = (p.worldToLocal * vec4<f32>(worldPos, 1.0)).xyz;
  let localR   = (p.worldToLocal * vec4<f32>(worldR,  0.0)).xyz;
  let invR = vec3<f32>(
    select(1.0 / localR.x, 1e20, abs(localR.x) < 1e-5),   // guard division by zero
    select(1.0 / localR.y, 1e20, abs(localR.y) < 1e-5),
    select(1.0 / localR.z, 1e20, abs(localR.z) < 1e-5),
  );
  let tMax = ( p.halfExtentLS - localPos) * invR;          // t to +halfExtent planes
  let tMin = (-p.halfExtentLS - localPos) * invR;          // t to -halfExtent planes
  let tPos = max(tMax, tMin);                              // forward t per axis
  let t    = max(min(min(tPos.x, tPos.y), tPos.z), 0.0);   // nearest forward hit
  let hitWorld = worldPos + worldR * t;
  return normalize(hitWorld - p.positionWS);               // re-aim from probe center
}

The closed-form slab test takes the nearest positive t across all three axis pairs — that is the first face the reflected ray exits through. Re-aiming from the probe center to the hit point gives a direction the cubemap was actually captured along, so the resulting sample lines up with the room geometry. Toggling boxProjection to false skips the intersection and falls back to the as-if-infinite assumption; the raw capture is unmodified either way, so the cost is paid only at sampling time.
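A worked example makes the re-aiming concrete. The TypeScript sketch below ports the slab test under a simplifying assumption: the probe is unrotated and unscaled, so worldToLocal reduces to subtracting the probe position. The ProbeBox record and function name are illustrative, not the shader's ProbeData.

```typescript
type Vec3 = [number, number, number];

interface ProbeBox {
  positionWS: Vec3;   // probe (capture) center in world space
  halfExtentLS: Vec3; // box half-extents
  boxProjection: boolean;
}

// Reciprocal with the same near-zero guard as the shader.
const safeInv = (x: number): number => (Math.abs(x) < 1e-5 ? 1e20 : 1 / x);

function boxProjectDirection(p: ProbeBox, worldPos: Vec3, worldR: Vec3): Vec3 {
  if (!p.boxProjection) return worldR;
  // Assumption: no rotation or scale, so local space is a pure translation.
  const lp: Vec3 = [
    worldPos[0] - p.positionWS[0],
    worldPos[1] - p.positionWS[1],
    worldPos[2] - p.positionWS[2],
  ];
  let t = Infinity;
  for (let a = 0; a < 3; a++) {
    const inv = safeInv(worldR[a]);
    const tMax = (p.halfExtentLS[a] - lp[a]) * inv;
    const tMin = (-p.halfExtentLS[a] - lp[a]) * inv;
    t = Math.min(t, Math.max(tMax, tMin)); // forward t on this axis, nearest across axes
  }
  t = Math.max(t, 0);
  const hit: Vec3 = [worldPos[0] + worldR[0] * t, worldPos[1] + worldR[1] * t, worldPos[2] + worldR[2] * t];
  const d: Vec3 = [hit[0] - p.positionWS[0], hit[1] - p.positionWS[1], hit[2] - p.positionWS[2]];
  const len = Math.hypot(d[0], d[1], d[2]);
  return [d[0] / len, d[1] / len, d[2] / len];
}

// Shading point 2 units right of the probe center, reflecting straight up
// inside a 4x4x4 half-extent box: the ray exits the ceiling at (2, 4, 0).
const room: ProbeBox = { positionWS: [0, 0, 0], halfExtentLS: [4, 4, 4], boxProjection: true };
const corrected = boxProjectDirection(room, [2, 0, 0], [0, 1, 0]);
// corrected is approximately [0.447, 0.894, 0] rather than the raw [0, 1, 0]
```

The raw direction would sample the texel directly above the probe center; the corrected one samples the part of the ceiling that is actually above the shading point.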

Influence Falloff

A probe's effect is bounded by its box (an AABB in the probe's local space). Points well inside the box should be lit entirely by the probe; points outside should fall back to the global IBL; points in a thin band near the surface should blend smoothly between the two to avoid a visible seam where the probe's domain ends:

// ── from src/shaders/deferred_lighting.wgsl ──
fn probe_weight(p: ProbeData, worldPos: vec3<f32>) -> f32 {
  let local = (p.worldToLocal * vec4<f32>(worldPos, 1.0)).xyz;
  let d = abs(local) - p.halfExtentLS;
  if (d.x > 0.0 || d.y > 0.0 || d.z > 0.0) { return 0.0; }    // outside the box
  let interior = -max(max(d.x, d.y), d.z);                    // distance to nearest face
  let minHalf  = min(min(p.halfExtentLS.x, p.halfExtentLS.y), p.halfExtentLS.z);
  let fade     = max(p.influenceFalloff, 0.0) * minHalf;      // fade band width
  if (fade <= 0.0) { return 1.0; }
  return clamp(interior / fade, 0.0, 1.0);                    // linear blend
}

influenceFalloff (default 0.25) is expressed as a fraction of the smallest half-extent — keying off the smallest axis prevents a very thin box from having a fade band wider than the box itself. Set the falloff to zero for hard-edged probes (matching cubes in a level-of-detail grid, say); set it higher when overlapping probes need to cross-fade smoothly.
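The weight function is simple enough to evaluate by hand. The following TypeScript sketch ports probe_weight, taking a position that is already in the probe's local space (the worldToLocal transform is left out for brevity); the function name is illustrative.

```typescript
type Vec3 = [number, number, number];

// Influence weight in [0, 1] for a point given in probe-local space.
function probeWeight(local: Vec3, halfExtent: Vec3, influenceFalloff: number): number {
  const dx = Math.abs(local[0]) - halfExtent[0];
  const dy = Math.abs(local[1]) - halfExtent[1];
  const dz = Math.abs(local[2]) - halfExtent[2];
  if (dx > 0 || dy > 0 || dz > 0) return 0;        // outside the box
  const interior = -Math.max(dx, dy, dz);          // distance to the nearest face
  const minHalf = Math.min(halfExtent[0], halfExtent[1], halfExtent[2]);
  const fade = Math.max(influenceFalloff, 0) * minHalf; // fade band width
  if (fade <= 0) return 1;                         // hard-edged probe
  return Math.min(Math.max(interior / fade, 0), 1);
}

// Default extents (8, 4, 8) with falloff 0.25: the fade band is 0.25 * 4 = 1 unit wide.
const half: Vec3 = [8, 4, 8];
const atCenter = probeWeight([0, 0, 0], half, 0.25);     // 1
const nearCeiling = probeWeight([0, 3.5, 0], half, 0.25); // 0.5, half a unit below the top face
const outsideBox = probeWeight([9, 0, 0], half, 0.25);    // 0
```

Because the band width is derived from the smallest half-extent, the same one-unit fade applies near the walls as near the ceiling in this example, even though the box is twice as wide as it is tall.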

Multi-Probe Blending

The deferred lighting pass loops over every active probe and accumulates a weighted average:

// ── from src/shaders/deferred_lighting.wgsl ──
var probe_irr_sum = vec3<f32>(0.0);
var probe_pf_sum  = vec3<f32>(0.0);
var probe_w_sum   = 0.0;
let pf_mip = roughness * (IBL_MIP_LEVELS - 1.0);
for (var pi = 0u; pi < probes.count && pi < MAX_PROBES; pi = pi + 1u) {
  let p  = probes.probes[pi];
  let w  = probe_weight(p, world_pos);
  if (w <= 0.0) { continue; }
  let sampleR = box_project_direction(p, world_pos, R);
  let irr_s = textureSampleLevel(probe_irr_cube, ibl_samp, N,       i32(pi), 0.0).rgb;
  let pf_s  = textureSampleLevel(probe_pf_cube,  ibl_samp, sampleR, i32(pi), pf_mip).rgb;
  probe_irr_sum = probe_irr_sum + irr_s * w;
  probe_pf_sum  = probe_pf_sum  + pf_s  * w;
  probe_w_sum   = probe_w_sum   + w;
}

Irradiance is sampled along the surface normal N (diffuse lighting does not depend on the view direction), while the prefiltered cube is sampled along the box-projected reflection direction with a mip level chosen by roughness. This is exactly the same split-sum sampling pattern as the global IBL, just per probe. After the loop the normalized probe contribution is blended over the global IBL with t = clamp(probe_w_sum, 0, 1): full probe lighting where coverage saturates, full global lighting outside any probe, and a smooth crossfade in between.

Because the cube-array binding is fixed size and probes.count controls the loop length, disabling all reflection probes costs nothing in the shader — the loop simply runs zero iterations and the global IBL path wins.
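The final blend described above can be sketched in TypeScript. The function name and the early-out for a zero weight sum are choices of this sketch; the shader's exact post-loop code is not shown in this section.

```typescript
type Vec3 = [number, number, number];

const mix = (a: Vec3, b: Vec3, t: number): Vec3 => [
  a[0] + (b[0] - a[0]) * t,
  a[1] + (b[1] - a[1]) * t,
  a[2] + (b[2] - a[2]) * t,
];

// Blends the accumulated, weighted probe samples over the global IBL value.
function blendProbeOverGlobal(globalIbl: Vec3, probeSum: Vec3, weightSum: number): Vec3 {
  if (weightSum <= 0) return globalIbl; // no probe covers this point
  const probe: Vec3 = [probeSum[0] / weightSum, probeSum[1] / weightSum, probeSum[2] / weightSum];
  const t = Math.min(Math.max(weightSum, 0), 1);
  return mix(globalIbl, probe, t);
}

// One probe with color (0, 0, 2) at weight 0.5: halfway between global and probe.
const inFadeBand = blendProbeOverGlobal([1, 1, 1], [0, 0, 1], 0.5); // [0.5, 0.5, 1.5]

// Two overlapping probes at full weight: their average replaces the global IBL.
const overlapped = blendProbeOverGlobal([1, 1, 1], [2, 2, 0], 2); // [1, 1, 0]
```

Dividing by the weight sum before blending keeps overlapping probes from brightening the result: two full-weight probes average rather than add.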

Pipeline Position

The two probe passes are inserted into the render graph before DeferredLightingPass, so the freshly-baked IBL arrays are visible by the time the lighting shader binds them:

// ── from src/renderer/features/reflection_probe_feature.ts ──
// 1. Capture probes that need baking → write into captureArray
ReflectionProbeCapturePass.addToGraph(graph, { captureArray, requests, /* ... */ });
// 2. Convolve captureArray → irradianceArray + prefilteredArray (compute)
ReflectionProbeIblPass.addToGraph(graph, { captureArray, irradianceArray, prefilteredArray, slots });
// 3. (later) DeferredLightingPass samples both arrays

All three cube-array textures and the probe UBO are persistent across frames: graph.importPersistentTexture() ensures the same physical resource is reused, so 'once'-mode probes survive between frames without rebaking. When no probe needs a bake this frame, the capture and IBL passes are still added to the graph, but their request list is empty, so they record no work and the render graph culls them.

Usage Example

samples/reflection_probe_test.ts is the reference setup — a single scene-mode probe centered on a room, rebaked every frame so interactive geometry changes show up in the reflections:

// ── from samples/reflection_probe_test.ts ──
const probeGO = new GameObject({ name: 'ReflectionProbe' });
probeGO.setPosition(0, 1.4, 0);
const probe = probeGO.addComponent(new ReflectionProbe());
probe.extentHalfSize = new Vec3(8, 4, 6);
probe.captureContent = 'scene';        // capture nearby meshes
probe.updateMode = 'every-frame';      // rebake every frame
probe.boxProjection = true;            // parallax-correct
probe.influenceFalloff = 0.2;          // 20% fade band near the box surface
engine.scene.add(probeGO);

The sample's controls (1/2/3 cycle capture content, M cycles update mode, R forces re-bake, B toggles parallax) make the visual effect of each parameter directly observable — the easiest way to build intuition before placing probes in a real level.

Reflection Probe Settings

Field	Default	Description
`captureContent`	`'sky-and-clouds'`	What to render into the cube: sky only, sky+clouds, or sky+scene meshes
`updateMode`	`'once'`	When to rebake: first frame, every frame, or only after `requestUpdate()`
`extentHalfSize`	`(8, 4, 8)`	Local-space AABB half-extents — culls scene-mode meshes and defines influence volume
`boxProjection`	`true`	If false, sample cube with raw `R` (infinite-distance assumption)
`influenceFalloff`	`0.25`	Width of the linear fade band, as a fraction of the smallest half-extent

MAX_PROBES = 8 is a shader-side constant; raising it requires bumping the cube-array layer count, the ProbesUniform array size, and the loop bound in src/shaders/deferred_lighting.wgsl.

7.12 Ambient Occlusion and Screen-Space Techniques

The next three sections cover Taos's three interchangeable ambient occlusion algorithms — SSAO, HBAO+, and GTAO. They share so much machinery that it's worth establishing the common ground first; the per-technique sections then focus on what actually differs between them.

Screen-space techniques#

All three are screen-space techniques: they run as full-screen passes after the geometry has been rasterized, reading only the G-Buffer — the per-pixel depth and world-space normal produced by the geometry pass (Chapter 4). They never touch the scene graph, meshes, or materials. This buys two things and costs one:

Cost scales with screen resolution, not scene complexity. A million-triangle mesh and a quad cost the same per pixel, because the pass only ever sees the flattened depth/normal buffers. (Taos leans on this further by running AO at half resolution.)
They are renderer-agnostic. Any pass that fills the depth and normal targets feeds them — deferred or forward, blocks or glTF meshes.
They only know what's on screen. Geometry off the edge of the frame, or hidden behind a nearer surface, simply isn't in the buffers, so screen-space methods approximate. This is the root cause of the view-dependent artifacts each technique has to fight (haloing, banding, thickness errors). The same family includes screen-space global illumination (§7.16) and screen-space reflections (Chapter 17).

What ambient occlusion approximates#

The IBL ambient term (§7.6) lights every surface as if the whole hemisphere of sky were visible to it. That's wrong in any crevice: the inside corner of a step, the gap under a rock, the seam where two walls meet all receive less ambient light because nearby geometry blocks part of the sky. Ambient occlusion estimates that fraction — a scalar in [0, 1], where 1 is fully open and 0 is fully enclosed — by examining how much surrounding geometry crowds the hemisphere above each pixel.

AO multiplies the ambient/indirect term only. Direct lighting is not scaled by it — that's the shadow map's job, and darkening contact regions twice would double-count the occlusion. The result is the soft contact darkening in corners and cavities that makes a scene read as grounded rather than floating.

"Ambient term" here means the whole environment contribution: both the IBL diffuse irradiance and the IBL specular reflection. It is a common simplification to occlude only the diffuse half, but Taos applies the AO factor to both, because a crevice that can't see the sky can't see the sky's reflection either. In the deferred shader the two are summed, and the sum is then multiplied by the AO factor alongside two further occlusion factors:

// ── from src/shaders/deferred_lighting.wgsl ──
let ambient = (diffuse_ibl + specular_ibl) * ao * horizon_fade
            * light.iblIntensity * mat_occlusion
            + ambient_floor(albedo, metallic);

ao — the screen-space AO factor from this section (SSAO / HBAO+ / GTAO).
mat_occlusion — a per-material baked occlusion channel (params.b in the G-Buffer), from an artist-authored AO map; it multiplies on top of the screen-space term, so a texture's painted cavities darken even where the screen-space pass sees an open hemisphere.
a separate specular horizon occlusion is folded into specular_ibl before this line — derived from the geometric surface normal, it fades a reflection that dips below the true surface, which screen-space AO (a single scalar) cannot express. It is covered in §7.6.

Naïvely occluding specular has a failure mode — a mirror-smooth metal would get dark blotches where AO is strong, since a perfect reflection has no business being dimmed by a diffuse occlusion estimate. The split-sum specular is already roughness-aware, so in practice the visible effect is confined to rough/curved surfaces where it reads as correct contact shading; the ambient_floor term keeps even fully-occluded pixels from crushing to pure black.

The three algorithms#

Taos implements three, all consuming the identical { normal, depth } G-Buffer pair, all writing a half-resolution r8unorm AO target, all cleaned up by the same bilateral/box blur shader, and all selected at runtime through a single aoMethod flag (§7.15). They differ only in the integral each uses to turn nearby depth into an occlusion factor:

Three ways to estimate ambient occlusion from the G-Buffer

Technique	Core idea	Strengths	Weaknesses
SSAO (§7.13)	Scatter a hemisphere of sample points around the pixel; count how many fall inside geometry	Simplest and cheapest; well understood; few parameters	Binary in/out test is noisy; prone to silhouette haloing; least physically grounded
HBAO+ (§7.14)	March a few screen-space directions; keep the steepest horizon angle in each	Sharper contact shadows; far less noise than SSAO; moderate cost	Approximate integral (no cosine weighting, no projected-normal term) → mild directional banding at grazing angles
GTAO (§7.15, default)	Analytically integrate the cosine-weighted visible sky between two horizons per slice	Most physically accurate; smooth falloff; no haloing	Most trig per pixel; needs careful thickness handling to avoid phantom halos

The progression is roughly cheapest-to-most-accurate: SSAO is the textbook baseline, HBAO+ trades a little fidelity for horizon-based sharpness at moderate cost, and GTAO is the ground-truth reference Taos defaults to. Because the inputs, output, and blur are identical, switching between them is a one-line change — the rest of this discussion treats them as drop-in alternatives.

7.13 Screen-Space Ambient Occlusion (SSAO)#

SSAO estimates ambient light occlusion by sampling the depth buffer around each pixel and counting how many samples fall inside nearby geometry. The SSAOPass (src/renderer/render_graph/passes/ssao_pass.ts) implements classic Crytek-style view-space hemisphere SSAO, descended from the technique Crytek introduced in the original Crysis (2007). It's the simplest of the three (§7.12): there's no horizon traversal, no analytical occlusion integral, and no temporal accumulation. Just a per-pixel hemisphere of samples, an in/out test, and a bilateral cleanup blur.

SSAO: hemisphere of samples around the surface

The pass runs at half resolution to keep the per-pixel cost manageable — the AO factor varies smoothly enough that the blur recovers most of the detail loss.

Per-Frame Setup: Kernel and Noise#

Two static inputs are built once at pass creation and uploaded as uniforms:

1. The hemisphere kernel — 16 random points in the unit upper hemisphere (z >= 0):

// ── from src/renderer/render_graph/passes/ssao_pass.ts ──
function generateKernel(): Float32Array {
  const k = new Float32Array(KERNEL_SIZE * 4);
  for (let i = 0; i < KERNEL_SIZE; i++) {
    const cosT = Math.random();
    const phi = Math.random() * Math.PI * 2;
    const sinT = Math.sqrt(1 - cosT * cosT);
    const scale = 0.1 + 0.9 * (i / KERNEL_SIZE) ** 2;
    k[i * 4 + 0] = sinT * Math.cos(phi) * scale;
    k[i * 4 + 1] = sinT * Math.sin(phi) * scale;
    k[i * 4 + 2] = cosT * scale;
    k[i * 4 + 3] = 0;
  }
  return k;
}

Sample directions are uniformly distributed over the hemisphere (cosT = U(0, 1) is uniform in solid angle; a cosine-weighted kernel would use cosT = √U instead). Each sample is then radially scaled by 0.1 + 0.9 * (i/N)² so most samples sit close to the origin, where they contribute most to AO. Far samples still exist for crevice detection but they're a minority of the kernel.

2. The rotation noise: a 4×4 rgba8unorm texture of random 2D unit vectors (cos θ, sin θ) encoded as 0.5 + 0.5 * v. The shader tiles this across the screen by indexing it with the half-res pixel coordinate modulo 4 and uses the decoded vector to rotate the kernel per pixel. Without per-pixel rotation, every pixel would sample the same 16 directions and the result would show visible kernel banding; the 4×4 dither breaks that up at the cost of high-frequency noise the blur then smooths out.
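
The text describes the noise tile but does not list its generator, so here is a minimal sketch of what one might look like, written in the style of the HBAO+ generateNoise() shown in §7.14. The names generateRotationNoise and decodeNoise are hypothetical, and the contents of the blue and alpha channels are an assumption.

```typescript
// Hypothetical generator for the 4×4 rotation-noise tile described above.
// Each texel stores a random unit vector (cos θ, sin θ) remapped from
// [-1, 1] to [0, 255] so it fits an rgba8unorm texture.
const NOISE_DIM = 4;

function generateRotationNoise(): Uint8Array {
  const data = new Uint8Array(NOISE_DIM * NOISE_DIM * 4);
  for (let i = 0; i < NOISE_DIM * NOISE_DIM; i++) {
    const theta = Math.random() * Math.PI * 2;
    data[i * 4 + 0] = Math.round((0.5 + 0.5 * Math.cos(theta)) * 255);
    data[i * 4 + 1] = Math.round((0.5 + 0.5 * Math.sin(theta)) * 255);
    data[i * 4 + 2] = 0;   // unused (assumed)
    data[i * 4 + 3] = 255; // unused (assumed)
  }
  return data;
}

// Mirrors the shader-side decode: texel * 2 - 1.
function decodeNoise(data: Uint8Array, texel: number): [number, number] {
  return [
    (data[texel * 4 + 0] / 255) * 2 - 1,
    (data[texel * 4 + 1] / 255) * 2 - 1,
  ];
}
```

Eight-bit quantization means a decoded vector is only approximately unit length, which is one reason the shader normalizes T after the Gram-Schmidt step.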

The AO Pass#

For each half-resolution pixel (src/shaders/ssao.wgsl, fs_ssao):

1. Reconstruct view-space position and normal. Read the G-Buffer depth and world-space normal at the matching full-res texel, transform the normal into view space, and reconstruct view-space P from depth via the inverse projection.

2. Build a per-pixel TBN frame. Use Gram-Schmidt to construct an orthonormal basis from the tiled noise vector and the surface normal:

// ── from src/shaders/ssao.wgsl ──
let noise_coord = vec2<u32>(half_coord) % vec2<u32>(4u);
let rnd         = textureLoad(noise_tex, noise_coord, 0).rg * 2.0 - 1.0;

let rand_vec = vec3<f32>(rnd, 0.0);
let T = normalize(rand_vec - N * dot(rand_vec, N));
let B = cross(N, T);
let tbn = mat3x3<f32>(T, B, N);  // tangent space → view space

This re-orients the hemisphere kernel so its +z axis aligns with the surface normal — every kernel sample is now guaranteed to be on the visible side of the surface.

3. Test 16 samples. For each kernel sample, transform it into view space, project it back to screen UV, read the depth there, and reconstruct that point's view-space Z:

// ── from src/shaders/ssao.wgsl ──
for (var i = 0; i < KERNEL_SIZE; i++) {
  // View-space sample: kernel.z aligns with N (hemisphere above surface)
  let sample_vs = P + (tbn * ssao.kernel[i].xyz) * ssao.radius;

  // Project sample to screen UV, sample depth there
  let clip       = ssao.proj * vec4<f32>(sample_vs, 1.0);
  let ndc_xy     = clip.xy / clip.w;
  let sample_uv  = vec2<f32>(ndc_xy.x * 0.5 + 0.5, -ndc_xy.y * 0.5 + 0.5);
  let ref_depth  = textureLoad(depth_tex, /* ... */, 0);
  let ref_z      = view_pos(sample_uv, ref_depth).z;

  // Range falloff: ignore hits on geometry far from this surface
  let range_check = 1.0 - smoothstep(0.0, ssao.radius, abs(P.z - ref_z));

  // Occluded when the real surface is closer to the camera than the sample
  occlusion += select(0.0, 1.0, ref_z > sample_vs.z + ssao.bias) * range_check;
}

let ao = clamp(1.0 - (occlusion / f32(KERNEL_SIZE)) * ssao.strength, 0.0, 1.0);

Two details worth calling out:

Slope-invariant bias. The occlusion test compares ref_z > sample_vs.z + bias, not ref_z > P.z + bias. Comparing against sample_vs.z makes the bias work the same on flat and steep surfaces — since the kernel sample already sits "above" the surface in tangent space, same-surface pixels always satisfy ref_z < sample_vs.z regardless of tilt. Comparing against P.z would need a tilt-dependent bias to avoid self-occlusion on steep faces.
Inverted-smoothstep range check. Many SSAO implementations use 1 / (1 + d/r) for the range falloff, but that form has a spike near d = 0 that produces small bursts of self-occlusion on flat surfaces. 1 - smoothstep(0, radius, |ΔZ|) is monotonic and falls smoothly to zero at the radius boundary.
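
To make the two details above concrete, the following CPU-side sketch reproduces the per-sample test from the shader loop. The function names are hypothetical, and the bias value used in the usage note (0.025) is an assumed figure, since the text does not state the default.

```typescript
// WGSL-style smoothstep, reproduced on the CPU for illustration.
function smoothstep(edge0: number, edge1: number, x: number): number {
  const t = Math.min(Math.max((x - edge0) / (edge1 - edge0), 0), 1);
  return t * t * (3 - 2 * t);
}

// Range falloff as used in the SSAO loop: 1 at zero depth difference,
// 0 once the difference reaches the sampling radius.
function rangeCheck(pZ: number, refZ: number, radius: number): number {
  return 1 - smoothstep(0, radius, Math.abs(pZ - refZ));
}

// Occlusion contribution of one kernel sample. View space looks down -z,
// so a larger z means closer to the camera.
function sampleOcclusion(
  pZ: number,       // view-space z of the shaded pixel
  sampleZ: number,  // view-space z of the kernel sample
  refZ: number,     // view-space z of the real surface at the sample's UV
  radius: number,
  bias: number,
): number {
  const occluded = refZ > sampleZ + bias ? 1 : 0;
  return occluded * rangeCheck(pZ, refZ, radius);
}
```

With a radius of 0.5, rangeCheck(-5, -5.25, 0.5) evaluates to 0.5 and rangeCheck(-5, -6, 0.5) to 0, so a surface a full radius away in depth contributes nothing even if it passes the occlusion comparison.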

The final AO factor is written to a half-res r8unorm target.

Bilateral Blur#

The raw output is noisy (16 samples × per-pixel rotation = high-frequency dither). The blur pass (src/shaders/ssao_blur.wgsl) supports two quality modes selectable at construction time:

Quality (default) — two-pass separable bilateral Gaussian:

// ── from src/shaders/ssao_blur.wgsl ──
const GAUSS: array<f32, 4> = array<f32, 4>(
  0.19638062, 0.17469900, 0.12161760, 0.06706740,
);

fn blur(uv: vec2<f32>, step: vec2<f32>) -> vec4<f32> {
  let depth0 = depth_load(uv);
  var accum  = 0.0;
  var weight = 0.0;
  for (var i: i32 = -3; i <= 3; i++) {
    let uv_off  = uv + step * f32(i);
    let depth_s = depth_load(uv_off);
    let ao_s    = textureSampleLevel(ao_tex, samp, uv_off, 0.0).r;
    let d_wt    = exp(-abs(depth_s - depth0) * 1000.0);
    let w       = GAUSS[abs(i)] * d_wt;
    accum  += w * ao_s;
    weight += w;
  }
  return vec4<f32>(accum / max(weight, 1e-5), 0.0, 0.0, 1.0);
}

Two passes (horizontal then vertical), 7 taps each, with the spatial Gaussian (GAUSS[]) multiplied by a depth-aware term exp(-|Δdepth| * 1000). The depth term collapses to zero across silhouettes, so AO from a near surface doesn't bleed onto a distant one — a classic bilateral edge stop.

Performance — a single-pass unweighted 4×4 box average. Cheaper but ignores depth, so it can mush AO across edges.
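
The edge-stopping behaviour of the quality mode is easy to check on the CPU. The sketch below applies one separable pass to a single row of AO values; bilateralBlurRow is a hypothetical name, and clamping out-of-range taps to the row edge is an assumption, since the shader's edge handling is not shown here.

```typescript
const GAUSS_WEIGHTS = [0.19638062, 0.174699, 0.1216176, 0.0670674];

// One 7-tap bilateral pass over a 1D row of AO values with matching depths.
function bilateralBlurRow(ao: number[], depth: number[]): number[] {
  const out: number[] = [];
  for (let x = 0; x < ao.length; x++) {
    let accum = 0;
    let weight = 0;
    for (let i = -3; i <= 3; i++) {
      // Assumed edge handling: clamp the tap index to the row.
      const xs = Math.min(Math.max(x + i, 0), ao.length - 1);
      // Depth-aware term: collapses toward zero across a depth discontinuity.
      const depthWeight = Math.exp(-Math.abs(depth[xs] - depth[x]) * 1000);
      const w = GAUSS_WEIGHTS[Math.abs(i)] * depthWeight;
      accum += w * ao[xs];
      weight += w;
    }
    out.push(accum / Math.max(weight, 1e-5));
  }
  return out;
}
```

Feeding it a row whose AO steps from 0 to 1 at the same texel where depth jumps from 0.2 to 0.9 leaves the step essentially intact, whereas the same AO row over constant depth is smoothed across the boundary.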

Pipeline Position#

SSAOPass.addToGraph(graph, { normal, depth }) takes the two G-Buffer handles already produced by the geometry passes and returns a half-res r8unorm AO handle:

// ── from crafty/renderer_setup.ts ──
const ssao = ssaoPass.addToGraph(graph, { normal: gbuf.normal, depth: gbuf.depth });
// ... later, the lighting pass consumes the AO:
const lit = lightingPass.addToGraph(graph, { gbuffer: gbuf, ao: ssao.ao, /* ... */ });

Downstream, two passes read the AO:

DeferredLightingPass multiplies the ambient term — IBL diffuse and specular, plus SSGI indirect — by AO (see §7.12's What ambient occlusion approximates). Direct lighting is not scaled by AO — direct shadowing is the shadow map's job, and applying AO on top would darken contact regions twice.
CompositePass uses AO when blending depth fog, so deep crevices remain visibly darker through the fog.

Both intermediate (raw) and final (blurred) AO targets are transient per-frame resources; only the kernel uniform buffer and the noise texture persist on the pass instance.

7.14 Horizon-Based Ambient Occlusion (HBAO+)#

HBAO+ sits between SSAO and GTAO, both historically and in behavior. It descends from NVIDIA's Image-Space Horizon-Based Ambient Occlusion (Bavoil et al., 2008); the "+" denotes the later production variant that adds per-pixel jitter and an angle bias. Like GTAO it tracks horizons rather than counting in/out samples — but it keeps the original, cheaper integral: for each of several screen-space directions it finds the single steepest horizon angle above the surface's tangent plane and sums max(sin h − sin bias, 0). The HBAOPlusPass (src/renderer/render_graph/passes/hbao_plus_pass.ts) implements it, running at half resolution like the other two.

HBAO+: eight jittered directions per pixel

Per-Frame Setup: Jitter Noise#

HBAO+ needs only one static input, and it's simpler than SSAO's: a 4×4 rgba8unorm tile of uniform-random values in the red and green channels (blue/alpha are constant). There is no sample kernel — the directions are generated analytically in the shader. The two noise channels are tiled across the screen and used to jitter two things per pixel:

// ── from src/renderer/render_graph/passes/hbao_plus_pass.ts ──
function generateNoise(): Uint8Array {
  const noise = new Uint8Array(16 * 4);
  for (let i = 0; i < 16; i++) {
    noise[i * 4 + 0] = Math.round(Math.random() * 255);  // rnd.x → start angle
    noise[i * 4 + 1] = Math.round(Math.random() * 255);  // rnd.y → march offset
    noise[i * 4 + 2] = 128;
    noise[i * 4 + 3] = 255;
  }
  return noise;
}

Without the jitter, every pixel would probe the same eight directions at the same step distances, and the AO would show coarse directional banding; the 4×4 dither breaks that into high-frequency noise the blur removes.

The AO Pass#

For each half-resolution pixel (src/shaders/hbao_plus.wgsl, fs_hbao):

1. Reconstruct view-space position and normal. Identical to SSAO and GTAO — read depth and the world normal at the matching full-res texel (coord = half_coord * 2), reconstruct view-space P via the inverse projection, and rotate the normal into view space.

2. Convert the world-space radius to a screen-space march length. Same projection-scale trick GTAO uses, so the sampling footprint stays a fixed world size regardless of distance:

// ── from src/shaders/hbao_plus.wgsl ──
let proj_scale = u.proj[1][1] * 0.5 * tex_size.y;
let radius_px  = clamp(u.radius * proj_scale / max(-P.z, 0.01), 4.0, 256.0);
let step_px    = max(radius_px / f32(STEPS_PER_DIR), 1.0);

3. March each direction and track its steepest horizon. Eight directions (NUM_DIRS = 8), each jittered by rnd.x; within each, four steps (STEPS_PER_DIR = 4), each offset jittered by rnd.y. For every sample the shader measures the sine of the angle above the tangent plane — that's exactly (D · N) / |D| for the offset vector D = sample − P — and keeps the maximum:

// ── from src/shaders/hbao_plus.wgsl ──
for (var d: i32 = 0; d < NUM_DIRS; d++) {
  let phi   = (f32(d) + rnd.x) * (TWO_PI / f32(NUM_DIRS));
  let omega = vec2<f32>(cos(phi), sin(phi));

  var sin_h: f32 = -1.0;                        // steepest horizon along this direction
  for (var step_i: i32 = 1; step_i <= STEPS_PER_DIR; step_i++) {
    let off  = (f32(step_i) - 0.5 + rnd.y) * step_px;
    let uv_s = clamp(in.uv + omega * off / tex_size, vec2<f32>(0.0), vec2<f32>(1.0));
    let ps   = view_pos(uv_s, depth_load_uv(uv_s));
    let D    = ps - P;
    let len  = length(D);
    if (len > u.radius * 2.0 || len < 1e-5) { continue; }   // ignore far / coincident hits

    let s    = dot(D, N) / len;                 // sin(angle above tangent plane)
    let attn = 1.0 - smoothstep(0.0, u.radius * 2.0, len);  // soften with distance
    sin_h = max(sin_h, s * attn);
  }

  // Visible extent above the tangent plane, minus a bias; clamp negatives to zero.
  ao_accum += max(sin_h - sin_bias, 0.0);
}

let occlusion = (ao_accum / f32(NUM_DIRS)) * u.strength;
let ao_factor = clamp(1.0 - occlusion, 0.0, 1.0);

HBAO+: the steepest horizon along one direction

Three details distinguish HBAO+ from its neighbors:

One-sided horizons, worked in the sine domain. GTAO traces two horizons per slice and integrates the cosine-weighted sky between them with acos/sin. HBAO+ keeps the single steepest angle per direction and works directly with sin h = (D·N)/|D| — no inverse trig, no per-slice integral. It's cheaper, and the per-direction max makes contact creases crisp, at the cost of the cosine weighting and projected-normal correction that make GTAO smoother on grazing surfaces.
The bias is an angle, not a depth. bias is in radians; horizons within bias of the tangent plane are treated as unoccluded (max(sin_h − sin(bias), 0)). This suppresses the self-occlusion shimmer flat surfaces would otherwise pick up from their own near-tangent samples — a different mechanism from SSAO's depth-comparison bias.
Distance handling does double duty. Samples farther than 2·radius are dropped outright, and a 1 − smoothstep(0, 2·radius, len) attenuation fades a horizon's influence as it approaches that limit, so a distant wall doesn't snap an occluding horizon onto a foreground pixel.
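
The three details above can be sketched on the CPU as a single function that evaluates one marched direction. The names directionOcclusion and distanceAttenuation are hypothetical; the arithmetic follows the shader listing, with the offset vectors D = sample − P passed in directly instead of being reconstructed from depth.

```typescript
type V3 = [number, number, number];

function dot3(a: V3, b: V3): number {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// 1 - smoothstep(0, 2·radius, len): full weight near the pixel, zero at the cutoff.
function distanceAttenuation(len: number, radius: number): number {
  const t = Math.min(Math.max(len / (2 * radius), 0), 1);
  return 1 - t * t * (3 - 2 * t);
}

// Occlusion contributed by one marched direction.
// `offsets` holds D = sample − P for each step, in view space.
function directionOcclusion(
  offsets: V3[],
  normal: V3,
  radius: number,
  biasRadians: number,
): number {
  let sinH = -1; // steepest horizon found so far
  for (const d of offsets) {
    const len = Math.sqrt(dot3(d, d));
    if (len > radius * 2 || len < 1e-5) continue; // far or coincident sample
    const s = dot3(d, normal) / len;              // sin(angle above tangent plane)
    sinH = Math.max(sinH, s * distanceAttenuation(len, radius));
  }
  // Horizons within the bias angle of the tangent plane count as open.
  return Math.max(sinH - Math.sin(biasRadians), 0);
}
```

On a flat floor every offset lies in the tangent plane, so the sine is 0 and the bias removes it entirely. A nearby wall sample at D = (0.3, 0, 0.4) with normal (0, 0, 1), radius 1 and bias 0.1 gives roughly 0.575.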

Blur and Pipeline Position#

The raw output is noisy (few directions × per-pixel jitter), so it goes through the exact same blur shader as SSAO and GTAO (src/shaders/ssao_blur.wgsl). HBAOPlusPass builds both blur variants at construction and picks one per the AOBlurQuality flag: 'quality' runs the two-pass separable bilateral Gaussian, 'performance' runs the single-pass box blur (see §7.13's Bilateral Blur).

Because the pass emits the same half-res r8unorm handle as the other two, the host selects it with the same runtime ternary shown in §7.15 (aoMethod === 'hbao+'), and the AO is consumed downstream exactly as the SSAO section described — the DeferredLightingPass multiplies the ambient term by it, and the CompositePass darkens fogged crevices with it.

HBAO+ Parameters#

Name	Value	Role
`NUM_DIRS`	8	Screen-space directions per pixel (shader constant)
`STEPS_PER_DIR`	4	March steps per direction (shader constant)
`radius`	1.0	World-space sampling radius
`bias`	0.1	Tangent angle bias in radians — horizons nearer than this count as open
`strength`	2.0	Multiplier on averaged occlusion before subtraction from 1 (higher → darker)

NUM_DIRS and STEPS_PER_DIR are compile-time constants in src/shaders/hbao_plus.wgsl; radius, bias, and strength are runtime uniforms set via HBAOPlusPass.updateParams().

7.15 Ground-Truth Ambient Occlusion (GTAO)#

SSAO answers a binary question per sample — is this point inside geometry? — and averages the yes/no votes. GTAO (Jiménez et al., "Practical Realtime Strategies for Accurate Indirect Occlusion", SIGGRAPH 2016) asks a sharper one: along this direction, at what angle does the sky get blocked? Like HBAO+ (§7.14) it traces horizons instead of probing in/out, but where HBAO+ keeps a single steepest angle per direction and sums it in the sine domain, GTAO traces both horizons of each slice and evaluates the visible cone of sky between them with a closed-form, cosine-weighted integral. The result has a smoother falloff, no light-bleed haloing at silhouettes, and a more physically meaningful occlusion term.

Slices and Horizons#

GTAO splits the hemisphere integral into slices. A slice is a plane that contains the view vector V and a screen-space direction ω; the occlusion is integrated within each slice and averaged across them. Taos traces NUM_SLICES = 2 slices per pixel, each rotated by a per-pixel jittered angle so that the 4×4 noise dither plus the cleanup blur recover the slices that weren't traced:

// ── from src/shaders/gtao.wgsl ──
for (var s: i32 = 0; s < NUM_SLICES; s++) {
  // Slice angle in [0, π) jittered per pixel.
  let phi   = (f32(s) + rnd.x) * (PI / f32(NUM_SLICES));
  let omega = vec2<f32>(cos(phi), sin(phi));

  // View-space slice tangent: reconstruct a same-depth neighbor along omega.
  let near_uv = clamp(in.uv + omega / tex_size, vec2<f32>(0.0), vec2<f32>(1.0));
  let slice_t = normalize(view_pos(near_uv, depth) - P);
  let axis    = normalize(cross(V, slice_t));
  // ...

Within one slice the shader marches STEPS_PER_DIR = 4 steps in each direction (+ω and −ω), reconstructing each depth sample's view-space position. For every sample it tracks the horizon — the steepest direction in which nearby geometry rises toward V. The horizon is stored as the largest cosine of the angle from V: a higher cosine means a direction closer to straight-up, i.e. a tighter (more occluding) horizon.

// ── from src/shaders/gtao.wgsl ──
for (var step_i: i32 = 1; step_i <= STEPS_PER_DIR; step_i++) {
  let off  = (f32(step_i) - 0.5 + rnd.y) * step_px;
  let uv_p = clamp(in.uv + omega * off / tex_size, vec2<f32>(0.0), vec2<f32>(1.0));
  let uv_n = clamp(in.uv - omega * off / tex_size, vec2<f32>(0.0), vec2<f32>(1.0));

  let pp = view_pos(uv_p, depth_load_uv(uv_p));   // +omega sample
  let pn = view_pos(uv_n, depth_load_uv(uv_n));   // -omega sample
  let dp = pp - P;  let lp = length(dp);
  let dn = pn - P;  let ln = length(dn);

  // Thickness heuristic: samples farther than max_thick don't lower the
  // horizon, so distant background geometry can't "haunt" foreground AO.
  if (lp < max_thick && lp > 1e-5) { cos_h_pos = max(cos_h_pos, dot(dp, V) / lp); }
  if (ln < max_thick && ln > 1e-5) { cos_h_neg = max(cos_h_neg, dot(dn, V) / ln); }
}

The two rnd channels from the tiled 4×4 noise are used distinctly: rnd.x jitters the slice angle phi, and rnd.y jitters the step offset off so neighboring pixels don't probe identical depths.

The Projected Normal#

Classic SSAO orients its kernel around the surface normal so every sample lands on the visible side. GTAO does the equivalent analytically: it projects the surface normal N onto the slice plane and measures that projected normal's signed angle from V.

// ── from src/shaders/gtao.wgsl ──
// Project N onto the slice plane (remove the component along the slice axis).
let proj_N     = N - axis * dot(N, axis);
let proj_N_len = length(proj_N);
if (proj_N_len < 1e-4) { continue; }

// Signed angle of the projected normal from V toward slice_t.
let n_angle = atan2(dot(proj_N, slice_t), dot(proj_N, V));

n_angle defines where the surface's true visible hemisphere sits within the slice — it spans [n_angle − π/2, n_angle + π/2]. proj_N_len (how much of N lies in this slice plane) becomes the slice's weight: a slice nearly perpendicular to N contributes almost nothing and is rightly down-weighted.

The Analytical Slice Integral#

With both horizons traced and the projected normal known, the slice's visibility is an integral, not a sample count. The horizon cosines are turned into signed angles, clamped to the projected normal's visible hemisphere, and the cosine-weighted visibility between them is evaluated in closed form:

Clamping horizons and evaluating the cosine-weighted integral

// ── from src/shaders/gtao.wgsl ──
// Convert horizon cosines to signed angles measured from V.
let h_pos = acos(clamp(cos_h_pos, -1.0, 1.0));   //  0..π
let h_neg = -acos(clamp(cos_h_neg, -1.0, 1.0));  // -π..0

// Clamp to the visible hemisphere of the projected normal.
let h_pos_c = min(h_pos, n_angle + HALF_PI);
let h_neg_c = max(h_neg, n_angle - HALF_PI);

// Cosine-weighted slice integral, normalized so an open slice yields 1.
let v_slice = 0.5 * (sin(h_pos_c - n_angle) - sin(h_neg_c - n_angle));
visibility += v_slice * proj_N_len;

The integral ∫ cos(α − n_angle) dα / 2 over the clamped horizon span has the exact antiderivative (sin(h_pos_c − n_angle) − sin(h_neg_c − n_angle)) / 2 — one sin() per horizon, no Monte-Carlo loop. This closed form is what makes the technique "ground truth": given the traced horizons, the per-slice visibility is exact, not estimated.
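
For readers who want to check the closed form numerically, here is a CPU-side transcription of the slice integral. sliceVisibility is a hypothetical name; the body mirrors the shader listing above line for line.

```typescript
const HALF_PI = Math.PI / 2;

// Visibility of one slice given the two horizon cosines (measured from V)
// and the signed angle of the projected normal.
function sliceVisibility(cosHPos: number, cosHNeg: number, nAngle: number): number {
  const clampCos = (c: number): number => Math.min(Math.max(c, -1), 1);
  const hPos = Math.acos(clampCos(cosHPos));   //  0..π
  const hNeg = -Math.acos(clampCos(cosHNeg));  // -π..0
  // Clamp both horizons to the projected normal's visible hemisphere.
  const hPosC = Math.min(hPos, nAngle + HALF_PI);
  const hNegC = Math.max(hNeg, nAngle - HALF_PI);
  // Closed-form cosine-weighted integral, normalized so an open slice yields 1.
  return 0.5 * (Math.sin(hPosC - nAngle) - Math.sin(hNegC - nAngle));
}
```

An unobstructed slice (both cosines at −1) evaluates to 1 for any n_angle. Raising one horizon to 45° from V with n_angle = 0 gives 0.5 × (sin 45° + 1) ≈ 0.854, and two horizons that both coincide with V give 0.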

After all slices are accumulated, the average is raised to the strength exponent to control AO contrast:

// ── from src/shaders/gtao.wgsl ──
visibility = visibility / f32(NUM_SLICES);
let ao = clamp(pow(max(visibility, 0.0), u.strength), 0.0, 1.0);

Radius, Thickness, and Half-Res Stepping#

Like SSAO, GTAO runs at half resolution — the raw pass writes a half-res target, and addresses the full-res G-Buffer by doubling its coordinate (coord = half_coord * 2). The world-space radius is converted to a screen-space march length once per pixel, using the vertical projection scale and the pixel's view-space depth, then clamped so very near or very far surfaces stay sane:

// ── from src/shaders/gtao.wgsl ──
let proj_scale = u.proj[1][1] * 0.5 * tex_size.y;
let radius_px  = clamp(u.radius * proj_scale / max(-P.z, 0.01), 4.0, 256.0);
let step_px    = max(radius_px / f32(STEPS_PER_DIR), 1.0);
let max_thick  = u.radius * (1.0 + u.bias * 10.0);

max_thick is the thickness cutoff used by the marching loop above: a depth sample farther than radius·(1 + bias·10) from the surface is treated as unrelated background and is not allowed to pull the horizon down. Without it, a distant wall behind a thin railing would cast a phantom occlusion halo onto the foreground.
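
A worked example helps show how the clamps behave. The sketch below mirrors the four shader lines on the CPU; marchFootprint is a hypothetical name, and the camera values in the usage note (60° vertical field of view, a 540-row target) are assumptions chosen only for the arithmetic.

```typescript
const STEPS_PER_DIR = 4;

interface MarchFootprint {
  radiusPx: number;
  stepPx: number;
  maxThick: number;
}

// proj11 is the projection matrix's [1][1] entry, which is 1 / tan(fovY / 2)
// for a standard perspective matrix. viewZ is negative in front of the camera.
function marchFootprint(
  radius: number,
  bias: number,
  proj11: number,
  texHeight: number,
  viewZ: number,
): MarchFootprint {
  const projScale = proj11 * 0.5 * texHeight;
  const radiusPx = Math.min(Math.max((radius * projScale) / Math.max(-viewZ, 0.01), 4), 256);
  const stepPx = Math.max(radiusPx / STEPS_PER_DIR, 1);
  const maxThick = radius * (1 + bias * 10);
  return { radiusPx, stepPx, maxThick };
}
```

Under those assumed camera values and the default radius = 1, bias = 0.1, a surface 10 units away marches about 46.8 pixels in steps of about 11.7, with a thickness cutoff of 2 world units. A surface half a unit away hits the 256-pixel ceiling, and one 1000 units away is held at the 4-pixel floor.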

Blur and Pipeline Position#

The raw GTAO output is noisy for the same reason SSAO's is — few samples, per-pixel jitter — so it is cleaned up by the exact same blur shader (src/shaders/ssao_blur.wgsl). GTAOPass builds both blur variants at construction and picks one per the AOBlurQuality flag: 'quality' runs the two-pass separable bilateral Gaussian, 'performance' runs the single-pass box blur (see §7.13's Bilateral Blur).

Per-frame state is uploaded through updateCamera() (view / projection / inverse-projection matrices) and updateParams():

// ── from src/renderer/render_graph/passes/gtao_pass.ts ──
updateParams(ctx: RenderContext, radius = 1.0, bias = 0.1, strength = 2.0): void {
  this._paramsScratch[0] = radius;
  this._paramsScratch[1] = bias;
  this._paramsScratch[2] = strength;
  ctx.device.queue.writeBuffer(this._uniformBuffer, 192, this._paramsScratch.buffer as ArrayBuffer);
}

The host selects the active algorithm when building the graph. Because every AO pass produces an identical handle, the choice collapses to a single ternary, and the AO is disabled (without removing the pass) by passing strength = 0:

// ── from crafty/renderer_setup.ts ──
const aoInputs = { normal: gbuf.normal, depth: gbuf.depth };
const ao = aoMethod === 'gtao'
  ? gtaoPass.addToGraph(graph, aoInputs).ao
  : aoMethod === 'hbao+'
    ? hbaoPass.addToGraph(graph, aoInputs).ao
    : ssaoPass.addToGraph(graph, aoInputs).ao;

Downstream the resulting AO factor is consumed exactly as the SSAO section described — the DeferredLightingPass multiplies the ambient term by it, and the CompositePass darkens fogged crevices with it.

GTAO Parameters#

Name	Value	Role
`NUM_SLICES`	2	Slices traced per pixel (shader constant)
`STEPS_PER_DIR`	4	March steps per direction, per slice (shader constant)
`radius`	1.0	World-space sampling radius
`bias`	0.1	Thickness bias — `max_thick = radius·(1 + bias·10)`
`strength`	2.0	Exponent applied to visibility (higher → more contrast)

NUM_SLICES and STEPS_PER_DIR are compile-time constants in src/shaders/gtao.wgsl; radius, bias, and strength are runtime uniforms set via GTAOPass.updateParams().

7.16 Screen-Space Global Illumination (SSGI)#

Screen-space global illumination approximates one-bounce indirect light from the scene itself rather than from a pre-computed environment map. The SSGIPass (src/renderer/render_graph/passes/ssgi_pass.ts) casts stochastic rays in screen space against the previous frame's lit radiance, then accumulates the result temporally using a reprojected history:

SSGI pipeline: ray march → temporal accumulation → copy to history

Ray March Pass#

For each visible pixel, the shader reconstructs the world-space normal and view-space position from the G-Buffer, then casts several rays in a cosine-weighted hemisphere oriented around the surface normal. Each ray steps through view space; at each step the current position is projected back to screen space and compared against the stored depth buffer:

SSGI ray sampling in screen space

// ── from src/shaders/ssgi.wgsl ──
let vp = view_pos_at(in.uv, depth);
let N_vs = normalize((u.view * vec4<f32>(N_world, 0.0)).xyz);

for (var i = 0u; i < u.numRays; i++) {
  let phi = 6.28318530 * fract(f32(i) / f32(u.numRays) + f32(u.frameIndex) * 0.618033988);
  let ur = fract(f32(u.frameIndex * u.numRays + i) * 0.381966011);
  let cos_theta = sqrt(ur);
  let sin_theta = sqrt(max(0.0, 1.0 - cos_theta * cos_theta));
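  let ray_local = vec3<f32>(sin_theta * cos(phi), sin_theta * sin(phi), cos_theta);
  // ... ray_vs (used below): ray_local rotated into view space by the per-pixel tangent frame (T, B, N_vs) ...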

  for (var s = 0u; s < u.numSteps; s++) {
    let t = (f32(s) + 1.0) / f32(u.numSteps);
    let p = vp + ray_vs * (u.radius * t);
    // Project to screen UV, compare with stored depth
    let clip = u.proj * vec4<f32>(p, 1.0);
    let inv_w = 1.0 / clip.w;
    let ray_uv = vec2<f32>(clip.x * inv_w * 0.5 + 0.5, -clip.y * inv_w * 0.5 + 0.5);
    // ... read depth at ray_uv, test if ray passed behind surface ...
    if (p.z < stored_z && stored_z - p.z < u.thickness) {
      accum += textureSampleLevel(prev_radiance, lin_samp, ray_uv, 0.0).rgb;
      break;
    }
  }
}

Key details of the ray march:

Cosine-weighted distribution. Rays are distributed according to a cosine-weighted hemisphere, which matches the Lambertian diffuse BRDF — directions near the surface normal contribute more energy, while grazing-angle directions contribute less. The weighting is embedded in the ray distribution itself so every hit contributes equally and no explicit weight factor is needed.
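
The claim that the weighting lives in the distribution itself can be checked numerically. The sketch below uses the same cos θ = √u mapping as the shader; the function names and the stratified input sequence are assumptions made for this illustration.

```typescript
// Cosine-weighted direction in the local tangent frame (+z is the surface normal).
function cosineSampleHemisphere(u1: number, u2: number): [number, number, number] {
  const phi = 2 * Math.PI * u1;
  const cosTheta = Math.sqrt(u2);
  const sinTheta = Math.sqrt(Math.max(0, 1 - cosTheta * cosTheta));
  return [sinTheta * Math.cos(phi), sinTheta * Math.sin(phi), cosTheta];
}

// Average cos θ over many samples. A cosine-weighted set converges to 2/3,
// whereas a uniform hemisphere (cos θ = u2) would converge to 1/2.
function meanCosTheta(sampleCount: number): number {
  let sum = 0;
  for (let i = 0; i < sampleCount; i++) {
    const u1 = (i * 0.618033988) % 1;     // spread azimuths around the circle
    const u2 = (i + 0.5) / sampleCount;   // stratified elevation input
    sum += cosineSampleHemisphere(u1, u2)[2];
  }
  return sum / sampleCount;
}
```

Because the sample density is already proportional to cos θ, a plain average of the hit radiance estimates the Lambertian integral, which is why the shader's accumulation loop carries no per-ray weight.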

Golden-ratio temporal jitter. The azimuth angle phi is offset each frame by 1/φ ≈ 0.618 of a full turn, where φ ≈ 1.618 is the golden ratio. Because that fraction is irrational, the ray pattern fills the hemisphere over successive frames without ever repeating the same direction, so temporal accumulation converges to a dense sampling pattern:

Frame 0: phi = 2π × fract(0 + 0 × 0.618)
Frame 1: phi = 2π × fract(0 + 1 × 0.618)
Frame 2: phi = 2π × fract(0 + 2 × 0.618)  →  ~222.5° rotation each frame (equivalently −137.5°, the golden angle)
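
The per-frame azimuths are easy to tabulate. The helper below (azimuthDegrees, a hypothetical name) mirrors the phi expression from the ray-march loop and returns degrees so the rotation is easier to read.

```typescript
const GOLDEN_FRACTION = 0.618033988; // 1/φ, the fractional part of the golden ratio

// Azimuth in degrees for ray `i` of `numRays` on frame `frameIndex`,
// mirroring: phi = 2π · fract(i / numRays + frameIndex · 0.618033988).
function azimuthDegrees(i: number, numRays: number, frameIndex: number): number {
  const turn = (i / numRays + frameIndex * GOLDEN_FRACTION) % 1;
  return 360 * turn;
}
```

For ray 0 with four rays per pixel, frames 0, 1 and 2 land at 0°, about 222.5° and about 85.0°: each frame advances the pattern by 0.618 of a turn, and the fract() wraps it back into [0°, 360°).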

Tangent frame rotation. A 4×4 random noise texture provides a per-pixel rotation angle for the tangent frame, decorrelating rays between adjacent pixels. Without this, nearby pixels would cast rays in nearly identical directions, producing visible banding in the raw output:

// ── from src/shaders/ssgi.wgsl ──
let noise_val = textureLoad(noise_tex, coord % 4, 0).rg;
let cos_a = noise_val.x * 2.0 - 1.0;
let sin_a = noise_val.y * 2.0 - 1.0;
let T = cos_a * T_raw - sin_a * B_raw;
let B = sin_a * T_raw + cos_a * B_raw;

Hit test. An intersection is recorded when the ray's view-space Z has stepped past the stored depth at the projected screen position and the distance behind the surface is within the thickness threshold (default 0.5 view-space units). The thickness parameter treats each depth sample as a slab of finite depth, so a ray that passes far behind thin geometry such as leaves or wires is not counted as hitting it.
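
The hit test and the linear march can be sketched on the CPU in a few lines. Both function names are hypothetical, and the depthAt callback stands in for the projection and depth-buffer read that the shader performs at each step.

```typescript
// View space looks down -z, so "behind the stored surface" means a smaller z.
function isRayHit(rayZ: number, storedZ: number, thickness: number): boolean {
  const behind = storedZ - rayZ; // how far the ray point sits behind the surface
  return behind > 0 && behind < thickness;
}

// Marches numSteps evenly spaced points along the ray's z component and
// returns the index of the first step that registers a hit, or -1 if none does.
function firstHitStep(
  startZ: number,
  dirZ: number,
  radius: number,
  numSteps: number,
  thickness: number,
  depthAt: (t: number) => number, // stand-in for the depth-buffer lookup
): number {
  for (let s = 0; s < numSteps; s++) {
    const t = (s + 1) / numSteps;
    const rayZ = startZ + dirZ * (radius * t);
    if (isRayHit(rayZ, depthAt(t), thickness)) return s;
  }
  return -1;
}
```

A ray starting at z = −5 and heading straight away from the camera over a radius of 2 in 8 steps first passes behind a surface at z = −6 on step index 4, where it is 0.25 units behind. If the stored surface were instead at z = −5.1, the coarse first step would already land 0.15 units behind it and still register, but a thickness of 0.1 would reject that hit.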

Temporal Accumulation Pass#

The raw SSGI output from a single frame is extremely noisy — only 4 rays per pixel. The temporal pass accumulates samples over many frames by reprojecting each pixel's SSGI into the previous frame and blending:

// ── from src/shaders/ssgi_temporal.wgsl ──
// Reproject to previous frame
let world_pos = reconstructWorld(in.uv, depth);
let prev_clip = u.prevViewProj * vec4<f32>(world_pos, 1.0);
let prev_uv = vec2<f32>(
  prev_clip.x / prev_clip.w * 0.5 + 0.5,
  -prev_clip.y / prev_clip.w * 0.5 + 0.5,
);

// AABB clamp: 3×3 neighborhood of raw SSGI
var nb_min = vec3<f32>(1e9);
var nb_max = vec3<f32>(-1e9);
for (var dy = -1; dy <= 1; dy++) {
  for (var dx = -1; dx <= 1; dx++) {
    let s = textureLoad(raw_ssgi, clamp(coord + vec2<i32>(dx, dy), ...), 0).rgb;
    nb_min = min(nb_min, s);
    nb_max = max(nb_max, s);
  }
}
let history_clamped = clamp(history, nb_min, nb_max);
let result = mix(history_clamped, current, 0.1);

Reprojection. Each pixel is converted from screen-space UV and depth back to world position using the inverse view-projection matrix, then transformed into the previous frame's clip space using the stored prevViewProj matrix. If the previous UV falls outside the screen boundaries, the point was off-screen last frame and has no history, so the pixel trusts only the current frame.
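The listing above elides the off-screen check, so here is a minimal sketch of how it could look. The helper names are hypothetical, and the clip-to-UV mapping simply mirrors the prev_uv computation in the shader.

```typescript
type Rgb = [number, number, number];
type Uv = [number, number];

// Clip space to UV, with the Y flip used in the shader listing.
function clipToUv(clip: [number, number, number, number]): Uv {
  const w = clip[3];
  return [(clip[0] / w) * 0.5 + 0.5, (-clip[1] / w) * 0.5 + 0.5];
}

// True when the reprojected UV lands on screen.
function historyValid(prevUv: Uv): boolean {
  return prevUv[0] >= 0 && prevUv[0] <= 1 && prevUv[1] >= 0 && prevUv[1] <= 1;
}

// Blend history and current; with no usable history the current frame is used alone.
function resolveTemporal(prevUv: Uv, history: Rgb, current: Rgb, alpha = 0.1): Rgb {
  if (!historyValid(prevUv)) return current;
  return [
    history[0] + (current[0] - history[0]) * alpha,
    history[1] + (current[1] - history[1]) * alpha,
    history[2] + (current[2] - history[2]) * alpha,
  ];
}
```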

Neighborhood clamping. The raw SSGI has high variance — a few bright rays can cause temporal ghosting when the camera moves. A 3×3 neighborhood AABB around the current pixel's raw SSGI clamps the history sample to prevent stale bright values from persisting after a geometry change.

Blend factor. The 10% blend (mix(history_clamped, current, 0.1)) means each new frame's raw estimate contributes one tenth of the result. With 4 rays per frame, the effective sample count after N frames is 4 × (1 − 0.9^N) / 0.1, reaching ~35 effective rays after 20 frames and ~40 rays at convergence.
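The formula is easy to evaluate for other blend factors or ray counts. The sketch below only restates the arithmetic from the paragraph above; the function names are illustrative.

```typescript
// Evaluates the formula from the text: rays * (1 - (1 - alpha)^N) / alpha.
function effectiveRays(frames: number, raysPerFrame = 4, alpha = 0.1): number {
  return (raysPerFrame * (1 - Math.pow(1 - alpha, frames))) / alpha;
}

// Weight that the raw estimate from `age` frames ago still carries in the
// blended result (age 0 is the current frame).
function frameWeight(age: number, alpha = 0.1): number {
  return alpha * Math.pow(1 - alpha, age);
}
```

The count approaches raysPerFrame / alpha (40 with the defaults) as N grows. Lowering alpha raises that ceiling but makes the history respond more slowly to lighting changes.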

Pass Wiring in the Render Graph

SSGIPass.addToGraph() declares three sub-passes in the graph for one logical SSGI step. The first two are render passes; the third is a 'transfer' pass that copies the resolved frame into a persistent history texture for the next frame to read:

// ── from src/renderer/render_graph/passes/ssgi_pass.ts ──
addToGraph(graph: RenderGraph, deps: SSGIDeps): SSGIOutputs {
  const { ctx } = graph;
  const fullDesc: TextureDesc = { format: HDR_FORMAT, width: ctx.width, height: ctx.height };

  const history = graph.importPersistentTexture(SSGI_HISTORY_KEY, {
    ...fullDesc, label: 'SSGIHistory',
  });

  let raw!: ResourceHandle;
  let result!: ResourceHandle;

  // 1. Ray march → raw
  graph.addPass('SSGIPass.rayMarch', 'render', (b) => {
    raw = b.createTexture({ label: 'SSGIRaw', ...fullDesc });
    raw = b.write(raw, 'attachment', { loadOp: 'clear', /* ... */ });
    b.read(deps.depth,        'sampled');
    b.read(deps.normal,       'sampled');
    b.read(deps.prevRadiance, 'sampled');   // ← TAA history, see below
    b.setExecute(/* ... draws fullscreen triangle with ssgi.wgsl ... */);
  });

  // 2. Temporal accumulate (raw + history → result)
  graph.addPass('SSGIPass.temporal', 'render', (b) => {
    result = b.createTexture({
      label: 'SSGIResult', ...fullDesc,
      extraUsage: GPUTextureUsage.RENDER_ATTACHMENT
                | GPUTextureUsage.TEXTURE_BINDING
                | GPUTextureUsage.COPY_SRC,
    });
    result = b.write(result, 'attachment', { loadOp: 'clear', /* ... */ });
    b.read(raw,       'sampled');
    b.read(history,   'sampled');
    b.read(deps.depth, 'sampled');
    b.setExecute(/* ... draws fullscreen triangle with ssgi_temporal.wgsl ... */);
  });

  // 3. Copy result → history for next frame
  graph.addPass('SSGIPass.copyHistory', 'transfer', (b) => {
    b.read(result,   'copy-src');
    b.write(history, 'copy-dst');
    b.setExecute((pctx, res) => {
      pctx.commandEncoder.copyTextureToTexture(
        { texture: res.getTexture(result) },
        { texture: res.getTexture(history) },
        { width: ctx.width, height: ctx.height },
      );
    });
  });

  return { result };
}

A handful of details from this layout are worth pulling out:

The history texture is persistent. graph.importPersistentTexture(SSGI_HISTORY_KEY, ...) (with the key 'ssgi:history') returns a handle backed by a single physical GPUTexture in the PhysicalResourceCache. The same physical texture is bound across frames, so what the copy pass writes today is what the temporal pass reads tomorrow. See §3.3 Persistent and External Resources. Note this is a different texture from the bounce source discussed below: 'ssgi:history' is SSGI's own accumulation buffer (the temporally-converged indirect light), while the ray march reads 'taa:history' for the previous frame's lit scene color. SSGI maintains the former; it only borrows the latter.
The copy participates in the dependency graph. copyHistory is type: 'transfer', declares result as 'copy-src' and history as 'copy-dst', and its execute callback issues a copyTextureToTexture. Because the write bumps the persistent history's version, compile-time culling treats this pass as a sink — even though nothing else in the current frame reads the new history version, the copy is never dropped.
Raw and result are transient. The raw ray-march output and the accumulated result are both created via b.createTexture() and live for the frame only. The pool keeps them across frames keyed by descriptor, so the actual GPUTexture allocations happen once and are reused — no per-frame churn.

Pipeline Position

updateCamera() uploads the per-frame uniforms — current matrices, plus the previous frame's view-projection for reprojection and the rolling frame counter that drives the golden-ratio jitter:

// ── from src/renderer/render_graph/passes/ssgi_pass.ts ──
updateCamera(ctx: RenderContext): void {
  const camera = ctx.activeCamera;
  // ...
  data.set(camera.viewMatrix().data, 0);
  data.set(camera.projectionMatrix().data, 16);
  data.set(camera.inverseProjectionMatrix().data, 32);
  data.set(camera.inverseViewProjectionMatrix().data, 48);
  data.set(camera.previousViewProjectionMatrix().data, 64);  // ← from Camera
  // ... settings, frame counter ...
  u32[88] = this._frameIndex++;
  ctx.queue.writeBuffer(this._uniformBuffer, 0, data.buffer as ArrayBuffer);
}
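The offsets in that listing follow from packing 16-float matrices back to back, with the frame counter written through a u32 view of the same memory. A minimal sketch of that layout, assuming a 92-slot buffer (the real size is not shown) and hypothetical names throughout:

```typescript
// Float offsets of the five matrices, as used by the data.set(...) calls above.
const MAT4_FLOATS = 16;
const OFFSETS = {
  view: 0 * MAT4_FLOATS,
  projection: 1 * MAT4_FLOATS,
  inverseProjection: 2 * MAT4_FLOATS,
  inverseViewProjection: 3 * MAT4_FLOATS,
  previousViewProjection: 4 * MAT4_FLOATS,
} as const;
const FRAME_INDEX_SLOT = 88; // u32 slot used for the frame counter in the listing

// Hypothetical total size: the real buffer length is not shown in the listing.
const TOTAL_SLOTS = 92;

// One ArrayBuffer, two typed views: floats for the matrices, u32 for the counter.
const uniformData = new ArrayBuffer(TOTAL_SLOTS * 4);
const f32View = new Float32Array(uniformData);
const u32View = new Uint32Array(uniformData);

function writeFrame(prevViewProj: ArrayLike<number>, frameIndex: number): void {
  f32View.set(prevViewProj, OFFSETS.previousViewProjection);
  u32View[FRAME_INDEX_SLOT] = frameIndex; // stored as an integer bit pattern, not as a float
}
```

Writing the counter through the integer view matters: storing it through the float view would convert it to a floating-point bit pattern, which a shader declaring the field as u32 would misread.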

camera.previousViewProjectionMatrix() is snapshotted at the top of Camera.updateRender() each frame and falls back to the current VP on the first frame (so the very first reproject sees zero motion rather than a singularity). See §11.3 Camera.jitteredViewProj and Camera.prevViewProj for the full story.
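The snapshot-with-fallback behaviour can be modelled in a few lines. This is a toy model, not the Camera class: the class and method names are hypothetical, and matrices are plain 16-element arrays.

```typescript
// Toy model of snapshotting the previous view-projection matrix.
class PrevViewProjTracker {
  private current: number[] | null = null;
  private previous: number[] | null = null;

  // Call once per frame with the new view-projection matrix.
  update(viewProj: number[]): void {
    this.previous = this.current;      // snapshot last frame's VP before overwriting it
    this.current = viewProj.slice();
  }

  // On the first frame no snapshot exists yet, so the current VP is returned
  // and reprojection sees zero motion.
  previousViewProj(): number[] {
    const m = this.previous ?? this.current;
    if (m === null) throw new Error('update() has not been called yet');
    return m;
  }
}
```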

The host wires updateCamera() conditionally on the ssgi setting, then adds the pass to the graph. The interesting line is the prevRadiance argument:

// ── from crafty/renderer_setup.ts ──
// 6. SSGI uses last frame's TAA history as previous-radiance source.
// The TAA pass owns the persistent key; we import it here so SSGI reads
// the v=0 (previous frame's) contents before TAA bumps it later this frame.
let ssgi: ResourceHandle | undefined;
if (effects.ssgi) {
  const taaHistory = graph.importPersistentTexture('taa:history', {
    label: 'TAAHistory', format: 'rgba16float',
    width: ctxArg.width, height: ctxArg.height,
  });
  ssgi = ssgiPass.addToGraph(graph, {
    prevRadiance: taaHistory,
    normal: gbuf.normal,
    depth: gbuf.depth,
  }).result;
}

This is the trick that makes SSGI work cheaply in Taos:

The bounce source is the TAA history texture, not a separate buffer. SSGI needs a "what color was the scene at this screen position last frame?" image to read from. TAA already maintains exactly that — a temporally-accumulated, anti-aliased copy of the final HDR — and keeps it in the 'taa:history' persistent slot. SSGI just borrows it. No extra texture, no extra copy.
Read-old / write-new ordering is enforced by handle versions. Earlier in §3 we saw that each write to a virtual resource bumps its version. When the SSGI sub-graph imports 'taa:history', it observes the texture at version 0 — the previous frame's contents. The TAA pass later in the same frame bumps that version (via its own copyHistory). The graph compiler validates the order: SSGI's read of v0 must come before TAA's write of v1. Get the order wrong and the build fails rather than silently reading this-frame data and producing a feedback loop.
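The versioning rule can be illustrated with a toy model. This is not the Taos render graph and does not reproduce how its compiler validates ordering; every name below is hypothetical. It only shows why a read declared before a write observes v0, while a read declared after it does not.

```typescript
// Toy model of versioned resource handles.
interface Access { pass: string; kind: 'read' | 'write'; version: number }

class VersionedResource {
  private version = 0;
  readonly log: Access[] = [];

  // A read observes whichever version is current when it is declared.
  read(pass: string): number {
    this.log.push({ pass, kind: 'read', version: this.version });
    return this.version;
  }

  // A write produces the next version.
  write(pass: string): number {
    this.version += 1;
    this.log.push({ pass, kind: 'write', version: this.version });
    return this.version;
  }
}

// Throws if `reader` did not observe version `expected`.
function assertReadsVersion(res: VersionedResource, reader: string, expected: number): void {
  const access = res.log.find((a) => a.pass === reader && a.kind === 'read');
  if (!access || access.version !== expected) {
    throw new Error(`${reader} must read v${expected}`);
  }
}
```

Declaring the SSGI read first and the TAA write second passes the check. Swapping them makes the read observe v1, and the check throws instead of letting this-frame data feed back into the bounce source.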

The host frame loop also threads the updateCamera calls in the order their consumers need them:

// ── from crafty/main.ts ──
passes.ssaoPass!.updateCamera(ctx);
passes.ssaoPass!.updateParams(ctx, 1.0, 0.005, effects.ssao ? 2.0 : 0.0);
passes.ssgiPass!.updateSettings({ strength: effects.ssgi ? 1.0 : 0.0 });
if (effects.ssgi) {
  passes.ssgiPass!.updateCamera(ctx);
}

Disabling SSGI at runtime takes two steps together: setting strength = 0 (so the shader output is multiplied to zero even if it runs) and not adding the pass to the graph. Skipping addToGraph is what actually keeps the ray-march cost off the GPU; setting strength to zero is the belt-and-braces path used by samples that don't conditionally rebuild the graph.

Downstream, the DeferredLightingPass consumes the resolved SSGI as the indirect-diffuse contribution to the ambient term:

// ── from crafty/renderer_setup.ts ──
const lit = lightingPass.addToGraph(graph, {
  gbuffer: gbuf,
  shadowMap,
  ao: ssao.ao,
  hdr: skyHdr,
  ssgi,                  // ← SSGI result
  iblTextures: frame.iblTextures,
});

When ssgi is undefined (the effect is disabled or the pass wasn't added), the lighting pass falls back to its IBL-only ambient term. When present, the SSGI radiance is added on top of the IBL diffuse, scaled by AO so contact crevices stay dark.

SSGI Settings

Parameter	Default	Description
`numRays`	4	Stochastic rays per pixel per frame
`numSteps`	16	March steps along each ray
`radius`	3.0	Maximum march distance (view-space units)
`thickness`	0.5	Hit acceptance depth tolerance
`strength`	1.0	Output intensity multiplier

These are exposed through the SSGISettings interface and can be adjusted at runtime via SSGIPass.updateSettings().
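As a sketch of how partial updates such as updateSettings({ strength: 0.0 }) can behave, the snippet below merges a patch over the defaults from the table. The field names and defaults come from the table; the interface name and merge helper are hypothetical stand-ins, since the real SSGISettings definition is not reproduced here.

```typescript
// Field names and defaults follow the table above; everything else is illustrative.
interface SSGISettingsSketch {
  numRays: number;
  numSteps: number;
  radius: number;
  thickness: number;
  strength: number;
}

const DEFAULT_SSGI_SETTINGS: SSGISettingsSketch = {
  numRays: 4,
  numSteps: 16,
  radius: 3.0,
  thickness: 0.5,
  strength: 1.0,
};

// Partial update: fields missing from the patch keep their current values.
function mergeSettings(
  current: SSGISettingsSketch,
  patch: Partial<SSGISettingsSketch>,
): SSGISettingsSketch {
  return { ...current, ...patch };
}

// Example: halve the march radius and zero the output without touching the rest.
const tweaked = mergeSettings(DEFAULT_SSGI_SETTINGS, { radius: 1.5, strength: 0.0 });
```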

7.17 Spherical-Harmonic Diffuse IBL

The irradiance cube from §7.10 stores the diffuse half of the IBL split-sum as a 32×32×6 rgba16float cubemap — about 98 KB per environment. But look at what that texture actually holds: a cosine-convolved hemisphere integral. Convolving any environment against the broad cos θ Lambert lobe annihilates almost all of its angular detail; the result is one of the smoothest signals in the whole renderer. Storing something that low-frequency in thousands of texels is wasteful. Spherical harmonics are the standard compact alternative: project the environment onto the first three frequency bands and keep just nine RGB coefficients — 144 bytes — that reconstruct the same diffuse irradiance to within a rounding error.

Spherical-harmonic diffuse IBL: project the environment to 9 coefficients, evaluate per-normal; 144 bytes vs the 98 KB irradiance cube

This is the same representation Filament, Frostbite, and Unreal use for diffuse IBL, and Taos implements it as an opt-in alternative to the irradiance cube (src/math/spherical_harmonics.ts, src/shaders/modules/sh.wgsl, src/shaders/sh_project.wgsl).

The basis and the math

Real spherical harmonics are an orthonormal basis for functions on the sphere, organized by frequency band l. Order-3 means bands l = 0, 1, 2 — one constant function, three linear "dipole" lobes, and five quadratic "quadrupole" lobes, nine in total. Any spherical function is approximated as a weighted sum of them; the more bands you keep, the higher the frequencies you can represent.

The nine order-3 SH basis functions arranged by band, with the Lambertian cosine-transfer weight for each band

Two operations matter. Projection integrates the environment radiance L(ω) against each basis function to get that coefficient:

c_lm = ∫  L(ω) · Y_lm(ω) dω          (one RGB integral per basis function)

Reconstruction of the diffuse irradiance for a surface normal N̂ is the weighted sum of the basis evaluated at N̂. The trick that makes SH the right tool for diffuse lighting is the per-band weight Â_l: convolving with the cosine lobe is, in the SH domain, just a multiply per band (Ramamoorthi & Hanrahan 2001). The factors fall off fast: Â₀/π = 1, Â₁/π = 2⁄3, Â₂/π = 1⁄4, band 3 is exactly zero, and the higher bands contribute almost nothing (Â₄/π = −1⁄24). That is exactly why three bands suffice:

E(N̂)/π  =  Σ_lm  (Â_l/π) · c_lm · Y_lm(N̂)

Taos stores the coefficients so that E(N̂)/π comes out of the evaluator directly — the same quantity the irradiance cube stores (mean radiance, not raw irradiance). That makes the two interchangeable: the shading code keeps doing irradiance · albedo with no extra 1/π, whichever source it reads.
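Both operations fit in a short CPU sketch. It handles a single colour channel, uses one common Fibonacci-sphere construction for the quadrature, and applies the band weights at reconstruction time; the function names are hypothetical and this is not the code in spherical_harmonics.ts. The basis constants are the standard real-SH normalization factors.

```typescript
type Vec3 = [number, number, number];

// The nine real SH basis functions for bands 0, 1 and 2.
function shBasis([x, y, z]: Vec3): number[] {
  return [
    0.282095,
    0.488603 * y, 0.488603 * z, 0.488603 * x,
    1.092548 * x * y, 1.092548 * y * z,
    0.315392 * (3 * z * z - 1),
    1.092548 * x * z, 0.546274 * (x * x - y * y),
  ];
}

// Direction i of n on a Fibonacci sphere (approximately uniform coverage).
function fibonacciDir(i: number, n: number): Vec3 {
  const z = 1 - (2 * i + 1) / n;
  const r = Math.sqrt(Math.max(0, 1 - z * z));
  const phi = i * Math.PI * (3 - Math.sqrt(5)); // golden angle
  return [r * Math.cos(phi), r * Math.sin(phi), z];
}

// Projection: c_lm is approximated by (4π/N) Σ L(ω_i) · Y_lm(ω_i).
function projectSh(radiance: (d: Vec3) => number, n = 4096): number[] {
  const c = new Array<number>(9).fill(0);
  for (let i = 0; i < n; i++) {
    const d = fibonacciDir(i, n);
    const L = radiance(d);
    const Y = shBasis(d);
    for (let k = 0; k < 9; k++) c[k] += L * Y[k] * ((4 * Math.PI) / n);
  }
  return c;
}

// Reconstruction of E(N)/π with the band weights 1, 2/3, 1/4.
const BAND_WEIGHT = [1, 2 / 3, 2 / 3, 2 / 3, 1 / 4, 1 / 4, 1 / 4, 1 / 4, 1 / 4];
function shIrradianceOverPi(c: number[], normal: Vec3): number {
  const Y = shBasis(normal);
  let e = 0;
  for (let k = 0; k < 9; k++) e += BAND_WEIGHT[k] * c[k] * Y[k];
  return e;
}
```

A useful sanity check is a constant environment L(ω) = 1: only the band-0 coefficient survives, and the reconstruction returns 1 for every normal, which is the "mean radiance" convention described above.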

Baking with compute

Projection is a single integral over the sphere, so the bake is one compute dispatch rather than the irradiance cube's six face dispatches of 256 samples each. sh_project.wgsl runs one workgroup that walks a deterministic Fibonacci-sphere sample set, evaluates the nine basis functions per sample, and tree-reduces the partial sums into the output buffer:

// ── from src/shaders/sh_project.wgsl ──
for (var i = lid; i < N_SH_SAMPLES; i += WG_SIZE) {
  let dir = fibonacci_dir(i);                                   // uniform sphere sample
  let rad = clamp_firefly(sample_sky(dir), FIREFLY_LUM_CLAMP);  // FIS mip + luminance clamp (§7.10)
  let Y   = sh_basis(dir);                                      // 9 basis values Y_lm(dir)
  for (var c = 0u; c < 9u; c++) {
    acc[c] += rad * Y[c];                                       // accumulate c_lm = Σ L·Y·(4π/N)
  }
}
// ... workgroup tree-reduce each of the 9 coefficients, write 9 × vec4 (144 B) ...

It reuses the same firefly defenses as the global IBL bake (§7.10) — filtered importance sampling picks a source mip per sample, and a per-sample luminance clamp tames bright HDR texels. The clamp matters more for SH than for the cube: a single very bright sample injected into the low bands produces ringing — the classic dark "negative-lobe" halo opposite a bright sun. A second, optional defense (covered under the caveats below) windows the higher bands at store time. The convolution is driven from IblBaker, so a procedural sky that rebakes as the day advances (§7.10, §10.7) refreshes its SH coefficients in the same command buffer.
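The body of clamp_firefly is not shown in the listing, so the following is only a plausible sketch of a per-sample luminance clamp: it assumes Rec. 709 luminance weights and scales the colour uniformly so that hue is preserved. The names are hypothetical.

```typescript
type Rgb = [number, number, number];

// Rec. 709 luminance weights (an assumption about which weights the shader uses).
function luminance([r, g, b]: Rgb): number {
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Hypothetical stand-in for clamp_firefly: scale the colour so its luminance
// never exceeds maxLum. Samples at or below the limit pass through unchanged.
function clampFirefly(rgb: Rgb, maxLum: number): Rgb {
  const lum = luminance(rgb);
  if (lum <= maxLum) return rgb;
  const s = maxLum / lum;
  return [rgb[0] * s, rgb[1] * s, rgb[2] * s];
}
```

Scaling all three channels by the same factor keeps the sun's colour while capping how much energy a single sample can push into the coefficients.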

Evaluating in the shader

At shading time the cube's texture fetch becomes a handful of fused multiply-adds — no sampler, no memory traffic. sh.wgsl folds the Â_l/π band weights and the basis normalization constants into the evaluation:

// ── from src/shaders/modules/sh.wgsl ──
fn sh_irradiance(c: array<vec4<f32>, 9>, n: vec3<f32>) -> vec3<f32> {
  var r = c[0].rgb * (1.0       * 0.282095);            // band 0, Â₀/π = 1
  r += c[1].rgb * ((2.0/3.0) * 0.488603 * n.y);         // band 1, Â₁/π = 2/3
  r += c[2].rgb * ((2.0/3.0) * 0.488603 * n.z);
  r += c[3].rgb * ((2.0/3.0) * 0.488603 * n.x);
  r += c[4].rgb * ((1.0/4.0) * 1.092548 * (n.x * n.y)); // band 2, Â₂/π = 1/4
  r += c[5].rgb * ((1.0/4.0) * 1.092548 * (n.y * n.z));
  r += c[6].rgb * ((1.0/4.0) * 0.315392 * (3.0 * n.z * n.z - 1.0));
  r += c[7].rgb * ((1.0/4.0) * 1.092548 * (n.x * n.z));
  r += c[8].rgb * ((1.0/4.0) * 0.546274 * (n.x * n.x - n.y * n.y));
  return max(r, vec3<f32>(0.0));                         // clamp residual ringing
}

In the deferred path this is wired behind an IBL_DIFFUSE_SH shader variant. A single global_diffuse_irradiance(dir) helper replaces every diffuse-cube lookup — the front-facing N̂ sample and the back-facing −N̂ samples that feed the diffuse-transmission and subsurface terms — so flipping the variant swaps all of them at once:

// ── from src/shaders/deferred_lighting.wgsl ──
fn global_diffuse_irradiance(dir: vec3<f32>) -> vec3<f32> {
#ifdef IBL_DIFFUSE_SH
  return sh_irradiance(shCoeffs.c, dir);                 // 9 FMAs, no fetch
#else
  return textureSampleLevel(irradiance_cube, ibl_samp, dir, 0.0).rgb;
#endif
}

DeferredLightingFeature exposes useSphericalHarmonics; because the choice is a pipeline-build-time #define rather than a uniform, the feature lazily builds and caches both pass variants and swaps the active one when the flag is toggled. The sh_ibl_test sample drives this from an on-screen toggle so the two can be compared side by side.

Why SH, and the caveats

The wins line up with the storage difference. Memory drops ~700× per environment, which matters most for reflection probes (§7.11): a scene with dozens of probes trades megabytes of irradiance cube-array slices for kilobytes of coefficients. Blending two environments — probe crossfades, time-of-day transitions — becomes a nine-coefficient lerp instead of sampling and mixing two cubes, and the reconstruction is continuous, so there are no cube-face seams in the smooth gradient.
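The nine-coefficient lerp is short enough to show in full. The storage layout here (nine [r, g, b] triples) is an assumption made for illustration, and the function name is hypothetical.

```typescript
// Nine RGB coefficients stored as 9 x [r, g, b]; the layout is an assumption.
type ShCoeffs = number[][];

// Blend two environments directly in coefficient space (t = 0 gives a, t = 1 gives b).
function lerpSh(a: ShCoeffs, b: ShCoeffs, t: number): ShCoeffs {
  return a.map((ca, i) => ca.map((v, ch) => v + (b[i][ch] - v) * t));
}
```

Because the reconstruction is linear in the coefficients, evaluating the blended set gives the same result as blending the two evaluated irradiances, up to the final clamp to zero in the evaluator.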

The cost is bounded representational power. Three bands cannot hold high-frequency angular detail — but diffuse irradiance has none to lose, which is the entire premise. The one real failure mode is ringing on extreme-contrast environments (a hard sun in an otherwise dark sky), where the truncated series overshoots into faint negative lobes. The firefly clamp suppresses the worst of it; the standard further mitigation, which Taos also implements, is band windowing — scaling each coefficient down by a per-band factor at bake time so the higher, more oscillatory bands contribute less. Taos uses the Hann (raised-cosine) window from Sloan's Stupid Spherical Harmonics Tricks, parameterized by a window width w (smaller w windows more aggressively; w ≤ 0 is the identity, leaving the coefficients un-windowed):

// ── from src/math/spherical_harmonics.ts (mirrored in sh_project.wgsl) ──
function shBandWindow(l: number, w: number): number {
  if (w <= 0) return 1;          // windowing disabled
  if (l >= w) return 0;
  return 0.5 * (1 + Math.cos((Math.PI * l) / w));   // band 0 → 1, higher bands ↓
}

The width is plumbed through computeIblGpu(…, shWindow) and the IblBaker constructor, applied to the coefficients as the bake writes them out, and exposed as a strength slider in the sh_ibl_test sample. It is off by default: it trades a touch of directional contrast for the removal of the dark halo, so it earns its keep only on the bright-sun skies that actually ring. How well order-3 SH reconstructs the true diffuse irradiance — and how much the window helps — is pinned down by spherical_harmonics_parity.test.ts, which compares the nine-coefficient evaluation against a brute-force cosine-convolved reference: the match is within an RMSE of 0.01 on a smooth sky, and windowing measurably shrinks the negative-lobe dip on a hard-sun sky while leaving the average (band-0) irradiance untouched.
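To make the effect of the width concrete, the sketch below applies the window from the listing above to a coefficient set. shBandWindow is copied from that listing; the band lookup table and the other two helpers are hypothetical, as is the 9 x [r, g, b] storage layout.

```typescript
// Same window as the listing above.
function shBandWindow(l: number, w: number): number {
  if (w <= 0) return 1;
  if (l >= w) return 0;
  return 0.5 * (1 + Math.cos((Math.PI * l) / w));
}

// Band index of each of the nine coefficients (1 + 3 + 5).
const SH_BAND = [0, 1, 1, 1, 2, 2, 2, 2, 2];

// Hypothetical helper: scale each RGB coefficient by its band's window factor.
function windowCoeffs(coeffs: number[][], w: number): number[][] {
  return coeffs.map((c, i) => c.map((v) => v * shBandWindow(SH_BAND[i], w)));
}

// Per-band factors for bands 0, 1 and 2 at a given width.
function bandFactors(w: number): number[] {
  return [0, 1, 2].map((l) => shBandWindow(l, w));
}
```

With w = 3, band 0 keeps its full weight while bands 1 and 2 are scaled to three quarters and one quarter. Band 0 always has a factor of 1, which is why the average irradiance is unaffected by windowing.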

In practice the visual payoff is subtle. On the sh_ibl_test spheres, pushing the slider to full strength gives the shadowed undersides slightly more contrast and reads a touch cleaner — but the change is only really perceptible across the very top of the range (≈0.9 → 1.0), and below that, moderate windowing is visually indistinguishable from none. That matches the math: diffuse irradiance is low-frequency, so the higher bands the window touches carry little energy except when a hard sun rings them.

Note also that SH only replaces the diffuse term — specular IBL still needs the prefiltered cube and the BRDF LUT (§7.10) regardless.

In Taos, SH diffuse is wired through every path that reads diffuse irradiance, all behind the same IBL_DIFFUSE_SH define: the deferred lighting pass, all four forward shaders (forward_pbr, skinned_forward_pbr, forward_plus, skinned_forward_plus), and per-probe diffuse (each reflection probe projects to its own nine coefficients, replacing the per-probe irradiance cube-array slice). Crucially, when that variant is active the engine stops baking and allocating the irradiance cubes: both the global 32×32×6 cube and the per-probe cube-array shrink to 1×1 placeholders and their convolution dispatches are skipped, so the memory and bake-time savings are real rather than a coefficient buffer sitting next to a cube it duplicates. The irradiance cube stays the default (SH is opt-in per renderer, since a renderer that live-toggles SH↔cube still needs the cube present), and the prefiltered specular cube + BRDF LUT are untouched either way.

7.18 Summary

Lighting is a composition of several systems:

System	Pass	Purpose
Directional (sun)	`DeferredLightingPass`	Main light, CSM shadows
Point lights	`PointSpotLightPass`	Additive deferred, VSM cube shadows (≤4 casters)
Spot lights	`PointSpotLightPass`	Additive deferred, VSM 2D-array shadows + RGB cookies (≤8 casters)
Area lights	`PointSpotLightPass` (deferred) / `ForwardPass` / `ForwardPlusPass`	Representative-point sphere / tube / rect; opt-in LTC rect + disk (no shadows)
IBL	`DeferredLightingPass`	Environment-based ambient + specular, anisotropic + horizon-occluded
Reflection probes	`ReflectionProbeCapturePass` + `ReflectionProbeIblPass`	Localized box-projected IBL, blended over global
SSAO	`SSAOPass`	Crytek hemisphere ambient occlusion from depth buffer
HBAO+	`HBAOPlusPass`	Horizon-based AO — steepest horizon per screen direction
GTAO	`GTAOPass`	Default AO — slice-based horizon integration
SSGI	`SSGIPass`	One-bounce indirect light via screen-space ray marching
Forward transparency	`ForwardPass`	PBR for transparent surfaces
Forward+ (tiled)	`ForwardPlusPass`	Per-tile light culling — hundreds of point lights

Ambient occlusion is interchangeable: SSAOPass, GTAOPass, and HBAOPlusPass all consume the same G-Buffer and emit the same half-res r8unorm AO target, selected at runtime by aoMethod.

All paths share the same PBR BRDF functions, ensuring consistent appearance regardless of rendering path.

Further reading:

src/shaders/deferred_lighting.wgsl — Deferred lighting shader (full PBR evaluation)
src/shaders/forward_pbr.wgsl — Forward PBR shader
src/shaders/ibl_baking.wgsl — IBL sampling functions and baking compute shaders
src/assets/ibl.ts — GPU-based IBL pre-computation pipeline (irradiance + prefilter + SH)
src/math/spherical_harmonics.ts — order-3 SH projection / reconstruction (CPU reference + GPU buffer packing)
src/shaders/sh_project.wgsl — SH projection compute shader (sky → 9 coefficients)
src/shaders/modules/sh.wgsl — SH diffuse-irradiance evaluator
samples/sh_ibl_test.ts — SH-vs-cube diffuse IBL comparison sample
src/engine/components/reflection_probe.ts — Reflection probe component
src/renderer/features/reflection_probe_feature.ts — Probe manager and per-frame update
src/renderer/render_graph/passes/reflection_probe_capture_pass.ts — Cube capture pass
src/renderer/render_graph/passes/reflection_probe_ibl_pass.ts — Probe IBL convolution
src/shaders/reflection_probe_capture.wgsl — Sky + scene capture shader
src/shaders/reflection_probe_ibl.wgsl — Irradiance + prefiltered convolution compute
src/renderer/render_graph/passes/deferred_lighting_pass.ts — Deferred lighting pass
src/renderer/render_graph/passes/forward_pass.ts — Forward lighting pass
src/renderer/render_graph/passes/point_spot_light_pass.ts — Additive point/spot pass
src/renderer/render_graph/passes/point_spot_shadow_pass.ts — VSM point/spot shadow + cookie baking
src/renderer/area_light.ts + src/shaders/modules/area_lights.wgsl — Representative-point area lights
src/assets/ltc_tables.ts + src/shaders/modules/ltc.wgsl — Opt-in LTC (linearly-transformed-cosines) rect & disk area lights
samples/area_light_test.ts — Sphere/tube/rect/disk demo with a live LTC toggle (key L)
src/renderer/render_graph/passes/forward_plus_pass.ts + src/shaders/light_culling.wgsl — Forward+ tiled culling
src/renderer/render_graph/passes/ssao_pass.ts — Screen-space ambient occlusion
src/renderer/render_graph/passes/hbao_plus_pass.ts — Horizon-based ambient occlusion (HBAO+)
src/shaders/hbao_plus.wgsl — HBAO+ horizon-march shader
src/renderer/render_graph/passes/gtao_pass.ts — Ground-truth ambient occlusion (default AO)
src/shaders/gtao.wgsl — GTAO slice/horizon shader
src/renderer/render_graph/passes/ssgi_pass.ts — Screen-space global illumination
src/shaders/ssgi.wgsl — SSGI ray march shader
src/shaders/ssgi_temporal.wgsl — SSGI temporal accumulation shader