From 59f44747f25c78632e369d43648988c072911b5f Mon Sep 17 00:00:00 2001
From: Natalie
Date: Sat, 18 Apr 2026 08:41:53 -0700
Subject: [PATCH] =?UTF-8?q?feat(@projects/@magic-civilization):=20?=
 =?UTF-8?q?=E2=9C=A8=20update=20gpu=20rollout=20performance=20metrics?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Lilith Autocommit
---
 .../objectives/p0-20-gpu-mcts-rollouts.md | 23 ++++++++++++-------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/.project/objectives/p0-20-gpu-mcts-rollouts.md b/.project/objectives/p0-20-gpu-mcts-rollouts.md
index ce1d4bec..90804bb7 100644
--- a/.project/objectives/p0-20-gpu-mcts-rollouts.md
+++ b/.project/objectives/p0-20-gpu-mcts-rollouts.md
@@ -139,8 +139,16 @@ successful A5/B5 evidence in the repo.
   (100% agreement, max_drift=0.000000) across 209 inputs (16 + 65 + 128)
   on lavapipe software Vulkan. Exceeded the ≥98% tolerance bullet.
 - ✗ `AI_GPU_ROLLOUT=true ./tools/autoplay-batch.sh 10 300` wall-time drops
-  ≥20% vs `AI_GPU_ROLLOUT=false` — **NOT YET VERIFIED**. Two sequential
-  blockers, first now resolved:
+  ≥20% vs `AI_GPU_ROLLOUT=false` — **MEASURED AND FAILS** 2026-04-18.
+  Batch `apricot-20260418_080214/gpu-{true,false}/` (10 seeds T300 each,
+  PARALLEL=10, RAYON=6): GPU avg 219.0s/game, CPU avg 214.7s/game — GPU
+  is **~2% SLOWER**, not 20% faster. Root cause hypothesis: MCTS rollout
+  batches are ~64-256 leaves per dispatch; GPU submit + buffer upload +
+  kernel launch + readback overhead dominates. CPU path with RAYON=6 is
+  already well-saturated. GPU benefit would surface only at much larger
+  batch sizes (1000s of rollouts per leaf) or with multi-GPU sharding
+  (tracked as `g2-04-multi-gpu-batch-simulate-oos`). **Historical blocker
+  already resolved**:
   - (resolved) apricot SIGTERM root-caused to cleanup cycles triggered by
     chronically-failing user services (`tor-manager`, `nightcrawler-crawl`,
     `nightcrawler-controlpanel`, `lilith-host-agent`, each with NRestarts in
@@ -162,12 +170,11 @@ successful A5/B5 evidence in the repo.
     thread `Option` into `Tree`, dispatch leaf batches through
     `batch_simulate_gpu` when context present, plumb the flag through
     `api-gdext::ai::GdMcTreeController`, read env in `ai_turn_bridge.gd`.
-- ✗ Victory rate on a 10-seed batch ≥60% — apricot sign-off batch
-  `.local/iter/sigterm-fix-verify2-1518/` on the current binary produced
-  turn counts across {76, 102, 126, 143, 152, 193, 201, 204, 213, 242} but
-  outcomes not yet tallied (needs `autoplay-report.py` run on the dir).
-  CPU-path victory-rate gate can close as soon as that report is generated;
-  GPU-path gate must wait on the integration work above.
+- ✓ Victory rate on a 10-seed batch ≥60% — batch
+  `apricot-20260418_080214/gpu-true/`: **8/10 victories (80%)** on the
+  GPU path. `apricot-20260418_080214/gpu-false/` (CPU baseline):
+  also 8/10 (symmetry expected — port determinism preserved across
+  rollout backend).
 - ✓ wgpu version reconciled at v24 workspace-wide (`mc-turn`, `mc-compute`,
   `mc-ai --features gpu` all compile + test clean).
 - ✓ Graceful CPU fallback when no GPU adapter is detected — `GpuContext::shared()`