3.1 KiB
| id | title | priority | status | scope | owner | updated_at | evidence | blocked_by | |
|---|---|---|---|---|---|---|---|---|---|
| p0-20d | GPU MCTS wall-time gate — measure on real-discrete-GPU test host | p1 | superseded | game1 | warcouncil | 2026-05-05 |
|
Context
Phase A + B + C of p0-20-gpu-mcts-rollouts shipped the full architectural rewrite to main:
- Live game uses
Tree<GameRolloutState>exclusively, dispatched viaAiBackend::probe()-routediterate_gpu_batched. - Persistent staging buffer pool + 1024-rollout coalesced
wgpu::Queue::submit(Phase B). - AI quality empirically validated as byte-identical to the pre-Phase-C
Tree<McSnapshot>baseline (50 seeds × normal+hard, identical clan-win distribution: 35/3/4/2/3/3).
The one un-met acceptance bullet from the parent objective is the wall-time win:
AI_GPU_ROLLOUT=true ./tools/autoplay-batch.sh 10 300 median wall-time ≤ 0.80 × CPU. (Note: AI_GPU_ROLLOUT env var was deleted in Phase C; the equivalent now is MC_AI_BACKEND=gpu vs MC_AI_BACKEND=cpu. The apricot-run.sh script's gpu-walltime mode still references the old env var name and needs updating.)
apricot's bluefin host probes Cpu because no compute-capable discrete GPU adapter is available. Software Vulkan (lavapipe) is available for parity tests but is slower than CPU rayon — wrong direction for the wall-time gate. Measuring requires a host with a working hardware compute GPU.
Acceptance
- ❌ Identify a test host with working compute-capable discrete GPU (candidates: plum's Apple Silicon Metal, a CUDA-capable box, an Intel/AMD discrete-GPU upgrade for apricot).
- ❌ Update
scripts/apricot-run.sh::gpu-walltimemode to useMC_AI_BACKEND=gpu|cpuinstead of the deletedAI_GPU_ROLLOUTenv var. - ❌ Run
gpu-walltime 10 300on the chosen host. Median seed wall-time atMC_AI_BACKEND=gpu≤ 0.80 ×MC_AI_BACKEND=cpu. - ❌ If gate misses on first attempt, classify per the Phase B handoff: dispatch overhead (raise MAX_BATCH from 1024), readback latency (real ring-buffer overlap — Phase B explicitly skipped), or kernel time (WGSL-level optimization).
Source-of-truth rails
- Rust crate:
mc-ai::gpuowns dispatch + WGSL shader. Algorithm UNCHANGED from cpu_reference; backend just selects executor. - JSON path: no new data; existing balance/setup configs apply.
- mc-core wrapper: existing
AiBackendenum (Phase 1). - No new fallback paths. Backend probe at boot decides; runtime stays.
Out of scope
- Algorithm changes. Same byte-identical algorithm via
cpu_reference::batch_simulate_cpuandrollout.wgsl. - New
AbstractPlayerStatefields. The 64-byte/player layout is empirically sufficient (validated in p0-20 Phase C). - Multi-GPU sharding. Single-GPU coalescing is the in-scope path.
References
- Parent:
.project/objectives/p0-20-gpu-mcts-rollouts.md(status: partial after Phase C close-out) - Phase B handoff:
p0-20-gpu-mcts-rollouts.md::#phase-b-2026-05-04--dispatch-coalescing - Phase C close-out:
p0-20-gpu-mcts-rollouts.md::#phase-c-close-out-2026-05-04