HQ Volumetric Clouds — Deep Dive
This is the long-form companion to hq_cloud_test.ts. It walks through the architecture of HQCloudPass — a half-resolution multi-scatter cone-light volumetric cloud ray-march with temporal reprojection. The pass was originally part of the Taos engine; it is now a sample-local reference because the artifacts described in §"Open issues" below make it unsuitable for the game pipeline.
Status: WIP
The biggest visible problem is axis-aligned rows of cloud puffs when the camera looks up from the ground at a low sun. The cone-sample light march and the half-res Halton jitter were tuned for a top-down flythrough view (samples/cloud_test.ts used to ship this pass as a toggle), where the issue is hidden by the steep view angle. The standalone sample exposes it on purpose — it is the right reproduction case for whoever picks the pass back up.
The pass is otherwise feature-complete: spherical-shell ray-march, three-octave multi-scattering, five-tap cone-sampled light march, Cornette-Shanks + back-lobe phase, temporal reprojection through the cloud-depth field, and an optional transmittance-LUT coupling for sunset coloring.
Files
The HQ cloud pass is fully self-contained under samples/hq_cloud/:
| File | Purpose |
|---|---|
| hq_cloud_pass.ts | HQCloudPass — four-sub-pass orchestrator: half-res raymarch, temporal reproject, composite, history copy |
| hq_cloud_feature.ts | HQCloudFeature — RenderFeature wrapper that wires the pass into the engine pipeline |
| clouds_hq.wgsl | Half-res raymarch shader (fs_main MRT: scatter + cloud-depth) |
| cloud_reproject.wgsl | Bilateral upsample + reprojection + variance clip |
| cloud_composite.wgsl | Premultiplied-alpha composite over HDR + godray-glow-on-cloud term |
The shaders stay in lockstep with the engine for the two modules they import: cloud_density.wgsl (procedural noise field, also used by CloudPass and GodrayPass) and atmosphere_luts.wgsl (transmittance LUT for sunset coloring). The shader-block preprocessor in the engine's ShaderBlockManager resolves both regardless of where the consuming .wgsl file lives.
Architecture: four sub-passes
A single fullscreen ray-march at full resolution with 96+ primary steps and a 6-tap light cone per step would cost ~40 ms on a mid GPU — unshippable. HQCloudPass instead splits the work over four sub-passes wired into the render graph from a single addToGraph call:
1. Half-res ray-march (clouds_hq.wgsl): writes premultiplied scatter and cloud-depth.
2. Temporal reproject (cloud_reproject.wgsl): bilateral upsample, history reprojection, variance clip.
3. Composite (cloud_composite.wgsl): premultiplied-alpha blend over the HDR target.
4. History copy: stores the resolved frame for the next frame.
Steps 1–2 do the heavy ray-marching at half resolution and upsample with temporal supersampling, step 3 lays the result over the lit scene HDR, and step 4 stashes the resolved frame as next frame's history. The persistent history texture is registered with graph.importPersistentTexture('hq_cloud:history', …) so it survives across frames — the same mechanism TAAPass and SSGIPass use.
Spherical-shell ray-march
CloudPass does a flat-slab intersection (ray_slab(ro, rd, y_min, y_max)). For a player standing on the ground and looking up that is fine, but clouds 5 km away should be visibly lower than clouds overhead because the earth curves out from under them. HQCloudPass does a spherical-shell intersection instead, the same one PlanetCloudPass uses. The same cloudBase / cloudTop fields are reinterpreted as world-radial radii (planetRadius + altitude) when planetRadius > 0:
// ── from clouds_hq.wgsl ──
fn march_range(ro: vec3<f32>, rd: vec3<f32>) -> vec2<f32> {
  if (cloud.planetRadius > 0.0) {
    return shell_range(ro, rd, cloud.cloudBase, cloud.cloudTop);
  }
  return ray_slab(ro, rd, cloud.cloudBase, cloud.cloudTop);
}
shell_range enumerates the four cases — camera below the shell (typical), camera inside the shell, camera above it, ray misses entirely — and returns the single near-side (t_start, t_end) traversal interval. The mirror implementation in planet_clouds.wgsl is the source of truth for the case analysis; clouds_hq.wgsl keeps the same names so the two stay in sync.
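The case analysis above can be sketched on the CPU side. This is a minimal TypeScript illustration, not the code in planet_clouds.wgsl: the names `raySphere` and `shellRange` are hypothetical, the planet is assumed to be centred at the origin, and `rd` is assumed to be normalised.

```typescript
type Vec3 = [number, number, number];

const dot = (a: Vec3, b: Vec3): number => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];

// Ray/sphere intersection for a sphere centred at the origin.
// Returns [tNear, tFar], or null when the ray's line misses the sphere.
function raySphere(ro: Vec3, rd: Vec3, radius: number): [number, number] | null {
  const b = dot(ro, rd);
  const c = dot(ro, ro) - radius * radius;
  const disc = b * b - c;
  if (disc < 0) return null;
  const s = Math.sqrt(disc);
  return [-b - s, -b + s];
}

// First traversal interval through the shell rInner..rOuter, or null on a miss.
function shellRange(ro: Vec3, rd: Vec3, rInner: number, rOuter: number): [number, number] | null {
  const outer = raySphere(ro, rd, rOuter);
  if (outer === null || outer[1] <= 0) return null; // misses, or the shell is behind the camera
  const inner = raySphere(ro, rd, rInner);
  const r = Math.sqrt(dot(ro, ro));

  if (r < rInner) {
    // Camera below the shell: leave the inner sphere, then leave the outer sphere.
    if (inner === null) return null; // cannot happen geometrically; keeps the types honest
    return [inner[1], outer[1]];
  }
  // Camera inside the shell starts at t = 0; camera above starts at the outer entry point.
  const tStart = r <= rOuter ? 0 : outer[0];
  // Stop at the inner sphere if the ray dips down to it, otherwise at the outer exit.
  if (inner !== null && inner[0] > 0) return [tStart, inner[0]];
  return [tStart, outer[1]];
}
```

With a planet radius of 1000 and a shell from 1100 to 1200, a camera on the surface looking straight up traverses the shell between t = 100 and t = 200.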
A consequence of the spherical model: the shell_height(p) helper that drives the density's vertical profile uses length(p) in planet mode instead of p.y, so a cloud's top/bottom is measured along the radial axis. A 60 m thunderhead 8 km from the camera is still a 60 m thunderhead — it just leans away from the camera as the world curves.
The procedural density field is the same cd_* module (src/shaders/modules/cloud_density.wgsl) the standalone flat-slab pass and the godray ray-march use. A hole in the HQ clouds is therefore the same hole in the standalone cloud pass and the same hole in the godrays.
Multi-scattering octaves
Beer's law transmittance T = exp(-σ·d) is a single-scatter approximation. Real clouds are 95% multi-scattering — light bounces between droplets dozens of times before reaching the camera, which is why the side of a cumulus cloud away from the sun is not black: it is lit by light that has already scattered off the sun-facing side. A single-scatter march can only fake this with a generous ambient term, which is what CloudPass does.
The Wrenninge / Hillaire energy-conserving trick is to re-evaluate Beer's law and the phase function three times per primary step, geometrically attenuating the extinction coefficient, sun energy, and phase eccentricity by 0.5 each octave. Octave 0 is single-scatter; octaves 1 and 2 approximate second- and third-order in-scatter without shooting new rays:
// ── from clouds_hq.wgsl ──
fn multi_scatter_octaves(cos_theta: f32, g: f32,
                         optical_depth_step: f32, optical_depth_sun: f32) -> vec2<f32> {
  var a = 1.0; // sun-energy scale
  var b = 1.0; // extinction scale
  var c = 1.0; // phase eccentricity scale
  var sum_lum = 0.0;
  var sum_w = 0.0;
  for (var i = 0; i < 3; i++) {
    let trans_sun = exp(-optical_depth_sun * b * cloud.extinction);
    let trans_step = exp(-optical_depth_step * b * cloud.extinction);
    let phase = dual_phase(cos_theta, g * c);
    sum_lum += phase * trans_sun * a * (1.0 - trans_step);
    sum_w += a * (1.0 - trans_step);
    a *= 0.5; b *= 0.5; c *= 0.5;
  }
  return vec2<f32>(sum_lum, sum_w);
}
The multiScatter setting (HQCloudSettings.multiScatter) blends between the multi-octave term and the single-scatter term so the artist can dial back the brightening if it overshoots. With multiScatter = 1.0 the shaded side of the cloud reads as a soft gray-blue (the third-order term tints with the ambient sky); with multiScatter = 0.0 it falls back to the same single-scatter look as CloudPass.
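To see why the extra octaves brighten the shaded side, it helps to run the octave sum on the CPU for a sample buried deep in the cloud. The sketch below is a simplified TypeScript port that assumes an isotropic phase (phase = 1) so that only the attenuation terms matter. It also assumes the multiScatter blend is a plain linear interpolation, which the source text does not confirm; `octaveLuminance` and `blendedLuminance` are hypothetical names.

```typescript
// Sum of the scattering octaves for one primary step, isotropic phase assumed.
function octaveLuminance(odStep: number, odSun: number, extinction: number, octaves: number): number {
  let a = 1.0; // sun-energy scale
  let b = 1.0; // extinction scale
  let sum = 0.0;
  for (let i = 0; i < octaves; i++) {
    const transSun = Math.exp(-odSun * b * extinction);
    const transStep = Math.exp(-odStep * b * extinction);
    sum += transSun * a * (1.0 - transStep);
    a *= 0.5;
    b *= 0.5;
  }
  return sum;
}

// Assumed blend: linear interpolation between single-scatter and three octaves.
function blendedLuminance(odStep: number, odSun: number, extinction: number, multiScatter: number): number {
  const single = octaveLuminance(odStep, odSun, extinction, 1);
  const multi = octaveLuminance(odStep, odSun, extinction, 3);
  return single + (multi - single) * multiScatter;
}
```

For a large sun-side optical depth the first octave is almost fully extinguished, while the halved extinction of octaves 1 and 2 still lets energy through, so the three-octave sum is much larger than the single-scatter term.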
Cone-sampled light march
The shadow march in CloudPass is two straight-line taps from the current sample toward the sun. It works but produces a flat, hard self-shadow — every point along the cloud's shaded interior gets exactly the same density estimate, so the shading falls off in bands tied to the two sample positions rather than a continuous gradient.
The cone-sample march replaces those two taps with five taps arranged in a widening cone around the sun ray, plus one long "to-infinity" tap that catches towering self-shadow from thunderhead-style clouds:
// ── from clouds_hq.wgsl ──
od += cone_tap(p, sun_dir, vec3<f32>( 0.38, 0.16, 0.81), step, step * 0.25, 0.0);
od += cone_tap(p, sun_dir, vec3<f32>(-0.42, 0.59, -0.19), step, step * 0.50, 1.0);
od += cone_tap(p, sun_dir, vec3<f32>( 0.13, -0.78, 0.45), step, step * 0.75, 2.0);
od += cone_tap(p, sun_dir, vec3<f32>(-0.71, -0.22, -0.50), step, step * 1.00, 3.0);
od += cone_tap(p, sun_dir, vec3<f32>( 0.61, 0.74, -0.12), step, step * 1.25, 4.0);
Each cone_tap samples the cheap cd_coverage field (the variant of the density without the high-frequency detail erosion — same field the godrays sample, so self-shadow and godray shadow agree). The cone radius scales linearly with sample distance so the far taps sample a wider neighborhood and the self-shadow term reads as a soft volumetric occlusion rather than a single hard sampling line.
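The widening cone can be illustrated by computing where each tap lands. The body of cone_tap is not shown in this document, so the sketch below makes two assumptions that are labeled in the code: each tap advances one more step along the sun ray, and the fixed offset vector is scaled by the per-tap radius. `coneTapPosition` is a hypothetical name; the offsets and radii are copied from the excerpt above.

```typescript
type Vec3 = [number, number, number];

// Offsets copied from the clouds_hq.wgsl excerpt.
const CONE_OFFSETS: Vec3[] = [
  [0.38, 0.16, 0.81],
  [-0.42, 0.59, -0.19],
  [0.13, -0.78, 0.45],
  [-0.71, -0.22, -0.5],
  [0.61, 0.74, -0.12],
];

// Position of cone tap `index` (0..4) for a sample at p.
function coneTapPosition(p: Vec3, sunDir: Vec3, step: number, index: number): Vec3 {
  const o = CONE_OFFSETS[index];
  if (o === undefined) throw new RangeError("index must be 0..4");
  const along = step * (index + 1);         // ASSUMPTION: one step further per tap
  const radius = step * 0.25 * (index + 1); // step * 0.25 .. step * 1.25, as in the excerpt
  return [
    p[0] + sunDir[0] * along + o[0] * radius,
    p[1] + sunDir[1] * along + o[1] * radius,
    p[2] + sunDir[2] * along + o[2] * radius,
  ];
}
```

Under these assumptions each successive tap sits farther from the sun ray than the previous one, which is what turns the shadow estimate into a neighborhood average rather than a single line of samples.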
Phase function: Cornette-Shanks + back lobe
Henyey-Greenstein (used in CloudPass) is the cheapest phase function that has a forward lobe — it correctly puts most scattered light close to the sun's direction, which is what produces the silver lining when a cloud is between you and the sun. Cornette-Shanks is a more accurate variant of the same idea — for the same asymmetry parameter g it has a slightly sharper forward peak, which is what makes the silver lining read as a thin bright rim rather than a soft glow.
The HQ pass uses a dual-lobe blend: 60% Cornette-Shanks forward (g = 0.85) for the silver lining, 40% Henyey-Greenstein with g = -0.18 for soft back-scatter. The back lobe stops the side of the cloud away from the sun from going pitch black when single-scatter would otherwise dominate, and it matches the look of real clouds at sunrise/sunset where the back-lit side has a soft amber fill:
// ── from clouds_hq.wgsl ──
fn dual_phase(cos_theta: f32, g: f32) -> f32 {
return 0.6 * cornette_shanks(cos_theta, g) + 0.4 * hg(cos_theta, -0.18);
}
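The two lobe functions are not reproduced in this document. As a reference point, the sketch below uses the standard normalized forms of Henyey-Greenstein and Cornette-Shanks in TypeScript; whether clouds_hq.wgsl uses exactly these normalizations is an assumption. The `integrateOverSphere` helper is only there to check that each function integrates to 1 over the sphere.

```typescript
// Henyey-Greenstein phase function, normalized over the sphere.
function hg(cosTheta: number, g: number): number {
  const g2 = g * g;
  return (1 - g2) / (4 * Math.PI * Math.pow(1 + g2 - 2 * g * cosTheta, 1.5));
}

// Cornette-Shanks phase function, normalized over the sphere.
function cornetteShanks(cosTheta: number, g: number): number {
  const g2 = g * g;
  const num = (1 - g2) * (1 + cosTheta * cosTheta);
  const den = (2 + g2) * Math.pow(1 + g2 - 2 * g * cosTheta, 1.5);
  return (3 / (8 * Math.PI)) * (num / den);
}

// Dual-lobe blend with the 60/40 weights and fixed back lobe from the text.
function dualPhase(cosTheta: number, g: number): number {
  return 0.6 * cornetteShanks(cosTheta, g) + 0.4 * hg(cosTheta, -0.18);
}

// Midpoint-rule integral of a phase function over the full sphere.
function integrateOverSphere(phase: (cosTheta: number) => number, n: number = 200000): number {
  const d = 2 / n;
  let sum = 0;
  for (let i = 0; i < n; i++) sum += phase(-1 + (i + 0.5) * d);
  return 2 * Math.PI * sum * d;
}
```

At cos θ = 1 the ratio of Cornette-Shanks to Henyey-Greenstein is 3 / (2 + g²), which is greater than 1 for any |g| < 1. That is the sharper forward peak the text describes.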
Half-res raymarch + sub-pixel jitter
A 64-step primary march at full 4K resolution is ~30 ms of GPU time. Rendering at half resolution is 4× cheaper — but a naive half-res cloud layer reads as a blurry mush, and the silhouettes against geometry tear. HQCloudPass ray-marches at half resolution, jitters each frame's sub-pixel offset along a Halton (2, 3) sequence (the same 16-sample pattern TAA uses), then has the temporal reproject pass re-converge the missing detail across roughly 4–8 frames of history.
// ── from hq_cloud_pass.ts ──
const hi = (this._frameIndex % 16) + 1;
const jx = halton(hi, 2) - 0.5;
const jy = halton(hi, 3) - 0.5;
// ...
data[16] = jx; // cloud.jitter.x — in half-res texels
data[17] = jy; // cloud.jitter.y
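The halton helper itself is not shown in the excerpt. A minimal sketch of the standard radical-inverse construction, which is what a function with this signature usually computes, looks like the following. The body and the `cloudJitter` wrapper are assumptions, not code from hq_cloud_pass.ts.

```typescript
// Radical inverse of `index` in the given base (index starts at 1).
function halton(index: number, base: number): number {
  let f = 1;
  let r = 0;
  let i = index;
  while (i > 0) {
    f /= base;
    r += f * (i % base);
    i = Math.floor(i / base);
  }
  return r;
}

// Sub-pixel jitter for a frame, mirroring the 16-sample cycle in the excerpt.
function cloudJitter(frameIndex: number): [number, number] {
  const hi = (frameIndex % 16) + 1;
  return [halton(hi, 2) - 0.5, halton(hi, 3) - 0.5];
}
```

Starting the index at 1 rather than 0 avoids the degenerate first sample (0, 0) and keeps every offset strictly inside the half-res texel.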
Two important details:
- Un-jittered invViewProj. The HQ pass does not use the camera's jitteredViewProjectionMatrix(). TAA reuses the camera-level jitter because the GBuffer depth is what reconstructs world-pos for reprojection, and that depth is fed by the same jittered VP. The HQ cloud pass has its own jitter applied inside the half-res shader on the per-pixel ray direction, so the camera matrix it uploads is the plain inverseViewProjectionMatrix(). This keeps the HQ pass working when TAA is disabled and prevents double-jittering when it is.
- MRT output. Each half-res pixel writes two attachments: rgba16f premultiplied scatter (RGB = color * (1 - T), A = 1 - T) and r32f cloud-depth (linear camera-space distance to the scattering-weighted hit). The depth is not the first cloud sample's t: it is the weighted average t over the integration, with the weight being (1 - T_step) * T_accumulated, i.e., the contribution of each sample to the final color. That number is what the reproject pass uses to look up the previous frame's world position, and a contribution-weighted depth lands on whatever surface the eye actually sees rather than the leading edge of the cloud.
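The contribution-weighted depth from the MRT bullet can be sketched as a small accumulation loop. This is an illustrative TypeScript version, not the shader code: `MarchSample` and `weightedCloudDepth` are hypothetical names, and each sample is assumed to carry the optical depth of its own step.

```typescript
interface MarchSample {
  t: number;            // distance along the ray
  opticalDepth: number; // extinction * density * step length for this sample
}

// Returns the contribution-weighted depth and the final opacity (1 - T).
function weightedCloudDepth(samples: MarchSample[]): { depth: number; opacity: number } {
  let transmittance = 1.0; // T_accumulated
  let sumW = 0.0;
  let sumWT = 0.0;
  for (const s of samples) {
    const tStep = Math.exp(-s.opticalDepth);  // T_step
    const w = (1.0 - tStep) * transmittance;  // share of the final color from this sample
    sumW += w;
    sumWT += w * s.t;
    transmittance *= tStep;
  }
  // With no cloud along the ray there is no meaningful depth; 0 is a placeholder here.
  return { depth: sumW > 0 ? sumWT / sumW : 0, opacity: 1.0 - transmittance };
}
```

A thin wisp at t = 100 followed by a dense core at t = 200 yields a depth close to 200, because the core contributes nearly all of the color. The weights also sum to the final opacity, so the same loop produces the alpha channel.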
Temporal reproject: bilateral upsample + variance clip
The full-res reproject pass (cloud_reproject.wgsl) does three things in one fragment shader: it upsamples the half-res raymarch to full res with a depth-bilateral filter, it reprojects last frame's history using the per-pixel cloud-depth, and it variance-clips the history against the current frame's 3×3 neighborhood to suppress ghosting.
Bilateral upsample. A plain bilinear half→full upsample tears at silhouettes — the four enclosing half-res taps for a full-res pixel on a foreground object can mix in cloud color from adjacent "background" taps, producing a fringe of cloud along the silhouette. The bilateral upsample weights each tap by exp(-|d_half - d_scene| / σ) where d_half is the half-res cloud-depth and d_scene is the full-res scene depth, so a half-res tap that landed on the cloud bank past a foreground tree contributes almost nothing to the full-res pixel on the tree itself:
// ── from cloud_reproject.wgsl ──
let sigma = max(scene_lin * 0.05, 1.0);
let b00 = exp(-abs(d00 - scene_lin) / sigma);
let b10 = exp(-abs(d10 - scene_lin) / sigma);
let b01 = exp(-abs(d01 - scene_lin) / sigma);
let b11 = exp(-abs(d11 - scene_lin) / sigma);
let W = w00*b00 + w10*b10 + w01*b01 + w11*b11;
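The excerpt stops at the weight sum W. A minimal TypeScript sketch of the full weighted average follows, for a single scalar channel. `HalfResTap` and `bilateralUpsample` are hypothetical names, and the fallback to zero when every tap is rejected is an assumption about how the shader might handle that case.

```typescript
interface HalfResTap {
  bilinear: number;   // bilinear weight (w00, w10, w01, w11)
  cloudDepth: number; // half-res cloud depth (d00, ...)
  value: number;      // one channel of the half-res scatter
}

function bilateralUpsample(taps: HalfResTap[], sceneLin: number): number {
  const sigma = Math.max(sceneLin * 0.05, 1.0);
  let W = 0;
  let sum = 0;
  for (const tap of taps) {
    // Depth-similarity weight, same form as b00..b11 in the excerpt.
    const b = Math.exp(-Math.abs(tap.cloudDepth - sceneLin) / sigma);
    const w = tap.bilinear * b;
    W += w;
    sum += w * tap.value;
  }
  // ASSUMPTION: when every tap is rejected, output no cloud rather than dividing by zero.
  return W > 0 ? sum / W : 0;
}
```

Dividing by W rather than by the plain bilinear sum keeps the result a true average of the taps that survived the depth test, so rejecting two of four taps does not darken the pixel.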
Reprojection through cloud depth, not scene depth. TAA reprojects through the scene depth because that is what gives the geometry's apparent motion. Clouds and terrain have very different apparent motion under camera translation: terrain a few metres away can slide several pixels per frame while clouds kilometres away barely move. Reprojecting cloud history through scene depth would therefore smear cloud color with whatever terrain happens to be in front. The HQ pass instead reprojects through the cloud depth it wrote in step 1:
// ── from cloud_reproject.wgsl ──
let cloud_t = sample_cloud_depth(in.uv, scene_lin);
let world_pos = cam_pos + ray_dir * cloud_t;
let prev_clip = u.prevViewProj * vec4<f32>(world_pos, 1.0);
let prev_ndc = prev_clip.xyz / prev_clip.w;
let prev_uv = vec2<f32>(prev_ndc.x * 0.5 + 0.5, -prev_ndc.y * 0.5 + 0.5);
This keeps the history sample tracking the cloud surface even when the player is moving past terrain — turn 90° in a flight sim and the clouds visibly stay put as the foreground rotates, exactly like real clouds.
Variance clip. A 3×3 neighborhood of the current upsampled signal gives a (min, max) AABB. The reprojected history is clipped (not clamped) toward the AABB along the line from the box center. This is the clip-to-AABB step popularized by Playdead's INSIDE temporal AA and commonly paired with Marco Salvi's variance clipping; it is the same one taa.wgsl uses. Clipping preserves hue and only rectifies genuinely out-of-range history (cloud broke up under the previous-frame sample point, weather changed, etc.) while leaving stable converged pixels untouched.
// ── from cloud_reproject.wgsl ──
fn clip_aabb4(mn: vec4<f32>, mx: vec4<f32>, hist: vec4<f32>) -> vec4<f32> {
  let p = 0.5 * (mx + mn);
  let e = max(0.5 * (mx - mn), vec4<f32>(1e-5));
  let v = hist - p;
  let a = abs(v / e);
  let ma = max(max(a.x, a.y), max(a.z, a.w));
  if (ma > 1.0) { return p + v / ma; }
  return hist;
}
The blend weight (reprojectBlend in HQCloudSettings) is the share of the current frame mixed into the resolved output, so lower values lean harder on history: 0.05 keeps the cleanest image but adds visible lag under fast camera motion, 0.2 is more responsive but noisier, and 0.08 is a reasonable default.
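The trade-off between lag and noise follows from treating the resolve as an exponential moving average. The sketch below assumes the shader mixes clipped history and the current frame with a plain linear interpolation, which is not shown in this document. `resolveBlend` and `framesToReach` are hypothetical helpers.

```typescript
// One resolve step: reprojectBlend is the weight of the current frame.
function resolveBlend(history: number, current: number, reprojectBlend: number): number {
  return history + (current - history) * reprojectBlend;
}

// Frames needed for an unclipped history to close `fraction` of the gap
// to a new steady value. After n frames the residual is (1 - blend)^n.
function framesToReach(fraction: number, reprojectBlend: number): number {
  return Math.ceil(Math.log(1 - fraction) / Math.log(1 - reprojectBlend));
}
```

This only models the unclipped case. When the history falls outside the neighborhood AABB, the clip step pulls it back immediately, so visible lag is usually shorter than the pure moving-average estimate.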
Composite and history copy
The reproject pass's output is premultiplied — RGB is color * (1 - T), alpha is 1 - T. The composite pass (cloud_composite.wgsl) samples the resolved texture, optionally adds the godray glow term proportional to cloud opacity (the same conservation trick documented in chapter 9 §9.10), and the pipeline's blend state does the actual work:
// ── from hq_cloud_pass.ts ──
blend: {
  color: { srcFactor: 'one', dstFactor: 'one-minus-src-alpha', operation: 'add' },
  alpha: { srcFactor: 'one', dstFactor: 'one-minus-src-alpha', operation: 'add' },
}
This is the textbook premultiplied-alpha "over" operator: dst = src + dst * (1 - src.a). It composites the cloud volume over whatever the existing HDR pipeline put down — typically deferred lighting + the atmosphere sky — without re-tonemapping or rescaling.
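The same arithmetic can be written out on the CPU to check a pixel by hand. This is a small TypeScript sketch of the operator only; `premultiply` and `over` are hypothetical helper names, and the fixed-function blend unit does this work in the actual pass.

```typescript
type RGB = [number, number, number];
type RGBA = [number, number, number, number];

// Build the premultiplied cloud sample from a color and a transmittance T.
function premultiply(color: RGB, transmittance: number): RGBA {
  const a = 1 - transmittance;
  return [color[0] * a, color[1] * a, color[2] * a, a];
}

// Premultiplied-alpha "over": dst = src + dst * (1 - src.a), per channel.
function over(src: RGBA, dst: RGBA): RGBA {
  const k = 1 - src[3];
  return [src[0] + dst[0] * k, src[1] + dst[1] * k, src[2] + dst[2] * k, src[3] + dst[3] * k];
}
```

A fully transparent cloud sample (T = 1) leaves the destination untouched, and a fully opaque one (T = 0) replaces it, with no separate multiply by alpha in the shader.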
Step 4 is a transfer pass that copies the resolved texture into the persistent history slot for the next frame. This is the same copyTextureToTexture pattern TAA and SSGI use; the cost is roughly one full-res 4-channel half-float memcpy per frame.
Open issues
These are the things the next pass at this code should look at, ranked by visibility:
- Axis-aligned cloud-puff rows (the reason this sample exists). Looking straight up at a low sun, the procedural density field appears to align cloud puffs along ~8 m (detail rate 0.12) and ~80 m (base medium) grids. The hq_cloud_pass.ts source has a long comment around the noise sampler explaining the issue: the tileable cloud noise (generated by perlinGradFbmTile / worleyTile in src/assets/cloud_noise.ts) is not C¹-continuous at integer tile boundaries when sampled with repeat addressing, so it folds there. The standard CloudPass blurs the resulting folds away with its lower step count; the sharper HQ march resolves them as visible bands. The cure is probably one of: (a) a true non-tiling 3D noise (or a gradient-noise variant that wraps cleanly), (b) a domain-warping pass that rotates the noise lookup per-sample, or (c) two decorrelated noise lookups blended by altitude so a single mis-aligned axis can't dominate.
- Reproject latency on fast pans. With reprojectBlend = 0.08 the cloud silhouette lags by 4–6 frames during a 90°/s pan, then catches up as variance clip kicks in. Adjustable via the slider, but the default is tuned for a flight sim; for a first-person walker it should probably be ~0.15.
- No clipmap for the noise lookup. Every primary sample does a 3D texture read at the same mip level regardless of distance. Far samples should sample a higher mip to reduce shimmer; closer samples should sample the base. The temporal history hides this partially, but a clipmap-style mip selection would let the half-res march be even more aggressive on the primary step count.
Where the pass would slot in
HQCloudFeature reads and writes frame.hdr and reads frame.depth, so it can go anywhere a CloudFeature could. The pass takes ~3–5 ms on a mid-range GPU at 1080p — about 3× the cost of the flat-slab CloudPass — and trades that frame budget for the silhouette quality of multi-scatter octaves, the soft shadows of the cone-sample march, the horizon curve of the spherical shell, and the visual stability of temporal reprojection. Most of that cost is the half-res ray-march itself; the reproject + composite + copy steps together are under 1 ms because they are bandwidth-bound rather than compute-bound.
The sample wires the feature into a deferred pipeline (Shadow → Geometry → Atmosphere → Deferred lighting → Godrays → HQCloud → AutoExposure → Composite) so the cloud composite sits over a fully-lit HDR target, including the godray fog texture for the glow-on-cloud term.