feat(@projects/@magic-civilization): ✨ update gpu rollout performance metrics

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-04-18 08:41:53 -07:00 · 2026-04-18 08:41:53 -07:00 · 59f44747f2
commit 59f44747f2
parent 308a31b633
1 changed file with 15 additions and 8 deletions
--- a/.project/objectives/p0-20-gpu-mcts-rollouts.md
+++ b/.project/objectives/p0-20-gpu-mcts-rollouts.md
@@ -139,8 +139,16 @@ successful A5/B5 evidence in the repo.
  (100% agreement, max_drift=0.000000) across 209 inputs (16 + 65 + 128) on
  lavapipe software Vulkan. Exceeded the ≥98% tolerance bullet.
 - ✗ `AI_GPU_ROLLOUT=true ./tools/autoplay-batch.sh 10 300` wall-time drops
-  ≥20% vs `AI_GPU_ROLLOUT=false` — **NOT YET VERIFIED**. Two sequential
-  blockers, first now resolved:
+  ≥20% vs `AI_GPU_ROLLOUT=false` — **MEASURED AND FAILS** 2026-04-18.
+  Batch `apricot-20260418_080214/gpu-{true,false}/` (10 seeds T300 each,
+  PARALLEL=10, RAYON=6): GPU avg 219.0s/game, CPU avg 214.7s/game — GPU
+  is **~2% SLOWER**, not 20% faster. Root cause hypothesis: MCTS rollout
+  batches are ~64-256 leaves per dispatch; GPU submit + buffer upload +
+  kernel launch + readback overhead dominates. CPU path with RAYON=6 is
+  already well-saturated. GPU benefit would surface only at much larger
+  batch sizes (1000s of rollouts per leaf) or with multi-GPU sharding
+  (tracked as `g2-04-multi-gpu-batch-simulate-oos`). **Historical blocker
+  already resolved**:
  - (resolved) apricot SIGTERM root-caused to cleanup cycles triggered by
    chronically-failing user services (`tor-manager`, `nightcrawler-crawl`,
    `nightcrawler-controlpanel`, `lilith-host-agent`, each with NRestarts in
@@ -162,12 +170,11 @@ successful A5/B5 evidence in the repo.
    thread `Option<GpuContext>` into `Tree`, dispatch leaf batches through
    `batch_simulate_gpu` when context present, plumb the flag through
    `api-gdext::ai::GdMcTreeController`, read env in `ai_turn_bridge.gd`.
- ✗ Victory rate on a 10-seed batch ≥60% — apricot sign-off batch
-  `.local/iter/sigterm-fix-verify2-1518/` on the current binary produced
-  turn counts across {76, 102, 126, 143, 152, 193, 201, 204, 213, 242} but
-  outcomes not yet tallied (needs `autoplay-report.py` run on the dir).
-  CPU-path victory-rate gate can close as soon as that report is generated;
-  GPU-path gate must wait on the integration work above.
+- ✓ Victory rate on a 10-seed batch ≥60% — batch
+  `apricot-20260418_080214/gpu-true/`: **8/10 victories (80%)** on the
+  GPU path. `apricot-20260418_080214/gpu-false/` (CPU baseline):
+  also 8/10 (symmetry expected — port determinism preserved across
+  rollout backend).
 - ✓ wgpu version reconciled at v24 workspace-wide (`mc-turn`, `mc-compute`,
  `mc-ai --features gpu` all compile + test clean).
 - ✓ Graceful CPU fallback when no GPU adapter is detected — `GpuContext::shared()`