feat(@projects/@magic-civilization): update gpu rollout performance metrics

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
Natalie 2026-04-18 08:41:53 -07:00
parent 308a31b633
commit 59f44747f2

@ -139,8 +139,16 @@ successful A5/B5 evidence in the repo.
(100% agreement, max_drift=0.000000) across 209 inputs (16 + 65 + 128) on
lavapipe software Vulkan. Exceeded the ≥98% tolerance bullet.
- ✗ `AI_GPU_ROLLOUT=true ./tools/autoplay-batch.sh 10 300` wall-time drops
≥20% vs `AI_GPU_ROLLOUT=false`: **NOT YET VERIFIED**. Two sequential
blockers, first now resolved:
≥20% vs `AI_GPU_ROLLOUT=false`: **MEASURED AND FAILS** 2026-04-18.
Batch `apricot-20260418_080214/gpu-{true,false}/` (10 seeds T300 each,
PARALLEL=10, RAYON=6): GPU avg 219.0s/game, CPU avg 214.7s/game — GPU
is **~2% SLOWER**, not 20% faster. Root cause hypothesis: MCTS rollout
batches are ~64-256 leaves per dispatch; GPU submit + buffer upload +
kernel launch + readback overhead dominates. CPU path with RAYON=6 is
already well-saturated. GPU benefit would surface only at much larger
batch sizes (1000s of rollouts per leaf) or with multi-GPU sharding
(tracked as `g2-04-multi-gpu-batch-simulate-oos`). **Historical blocker
already resolved**:
- (resolved) apricot SIGTERM root-caused to cleanup cycles triggered by
chronically-failing user services (`tor-manager`, `nightcrawler-crawl`,
`nightcrawler-controlpanel`, `lilith-host-agent`, each with NRestarts in
@ -162,12 +170,11 @@ successful A5/B5 evidence in the repo.
thread `Option<GpuContext>` into `Tree`, dispatch leaf batches through
`batch_simulate_gpu` when context present, plumb the flag through
`api-gdext::ai::GdMcTreeController`, read env in `ai_turn_bridge.gd`.
- ✗ Victory rate on a 10-seed batch ≥60% — apricot sign-off batch
`.local/iter/sigterm-fix-verify2-1518/` on the current binary produced
turn counts across {76, 102, 126, 143, 152, 193, 201, 204, 213, 242} but
outcomes not yet tallied (needs `autoplay-report.py` run on the dir).
CPU-path victory-rate gate can close as soon as that report is generated;
GPU-path gate must wait on the integration work above.
- ✓ Victory rate on a 10-seed batch ≥60% — batch
`apricot-20260418_080214/gpu-true/`: **8/10 victories (80%)** on the
GPU path. `apricot-20260418_080214/gpu-false/` (CPU baseline):
also 8/10 (symmetry expected — port determinism preserved across
rollout backend).
- ✓ wgpu version reconciled at v24 workspace-wide (`mc-turn`, `mc-compute`,
`mc-ai --features gpu` all compile + test clean).
- ✓ Graceful CPU fallback when no GPU adapter is detected — `GpuContext::shared()`