magicciv/.project/objectives/p0-20d-gpu-walltime-real-host.md at b6eed900ed3d06e4d21445cff38632fdf163d933

Natalie 9b1e55b866 feat(@projects/@magic-civilization): ✅ mark p0-20 gpu-mcts rollouts as complete

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>

2026-05-05 02:19:10 -04:00

3.1 KiB

Raw Blame History

title

priority

status

scope

owner

updated_at

evidence

blocked_by

p0-20d

GPU MCTS wall-time gate — measure on real-discrete-GPU test host

superseded

game1

warcouncil

2026-05-05

plum Apple Silicon Metal IS the discrete-GPU test host — wall-time gate met directly under p0-20 Phase D close-out

Context

Phase A + B + C of p0-20-gpu-mcts-rollouts shipped the full architectural rewrite to main:

Live game uses Tree<GameRolloutState> exclusively, dispatched via AiBackend::probe()-routed iterate_gpu_batched.
Persistent staging buffer pool + 1024-rollout coalesced wgpu::Queue::submit (Phase B).
AI quality empirically validated as byte-identical to the pre-Phase-C Tree<McSnapshot> baseline (50 seeds × normal+hard, identical clan-win distribution: 35/3/4/2/3/3).

The one un-met acceptance bullet from the parent objective is the wall-time win: AI_GPU_ROLLOUT=true ./tools/autoplay-batch.sh 10 300 median wall-time ≤ 0.80 × CPU. (Note: AI_GPU_ROLLOUT env var was deleted in Phase C; the equivalent now is MC_AI_BACKEND=gpu vs MC_AI_BACKEND=cpu. The apricot-run.sh script's gpu-walltime mode still references the old env var name and needs updating.)

apricot's bluefin host probes Cpu because no compute-capable discrete GPU adapter is available. Software Vulkan (lavapipe) is available for parity tests but is slower than CPU rayon — wrong direction for the wall-time gate. Measuring requires a host with a working hardware compute GPU.

Acceptance

❌ Identify a test host with working compute-capable discrete GPU (candidates: plum's Apple Silicon Metal, a CUDA-capable box, an Intel/AMD discrete-GPU upgrade for apricot).
❌ Update scripts/apricot-run.sh::gpu-walltime mode to use MC_AI_BACKEND=gpu|cpu instead of the deleted AI_GPU_ROLLOUT env var.
❌ Run gpu-walltime 10 300 on the chosen host. Median seed wall-time at MC_AI_BACKEND=gpu ≤ 0.80 × MC_AI_BACKEND=cpu.
❌ If gate misses on first attempt, classify per the Phase B handoff: dispatch overhead (raise MAX_BATCH from 1024), readback latency (real ring-buffer overlap — Phase B explicitly skipped), or kernel time (WGSL-level optimization).

Source-of-truth rails

Rust crate: mc-ai::gpu owns dispatch + WGSL shader. Algorithm UNCHANGED from cpu_reference; backend just selects executor.
JSON path: no new data; existing balance/setup configs apply.
mc-core wrapper: existing AiBackend enum (Phase 1).
No new fallback paths. Backend probe at boot decides; runtime stays.

Out of scope

Algorithm changes. Same byte-identical algorithm via cpu_reference::batch_simulate_cpu and rollout.wgsl.
New AbstractPlayerState fields. The 64-byte/player layout is empirically sufficient (validated in p0-20 Phase C).
Multi-GPU sharding. Single-GPU coalescing is the in-scope path.

References

Parent: .project/objectives/p0-20-gpu-mcts-rollouts.md (status: partial after Phase C close-out)
Phase B handoff: p0-20-gpu-mcts-rollouts.md::#phase-b-2026-05-04--dispatch-coalescing
Phase C close-out: p0-20-gpu-mcts-rollouts.md::#phase-c-close-out-2026-05-04

3.1 KiB Raw Blame History Unescape Escape