From 59f44747f25c78632e369d43648988c072911b5f Mon Sep 17 00:00:00 2001
From: Natalie <natalie@lilithuwu.com>
Date: Sat, 18 Apr 2026 08:41:53 -0700
Subject: [PATCH] =?UTF-8?q?feat(@projects/@magic-civilization):=20?=
 =?UTF-8?q?=E2=9C=A8=20update=20gpu=20rollout=20performance=20metrics?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
---
 .../objectives/p0-20-gpu-mcts-rollouts.md     | 23 ++++++++++++-------
 1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/.project/objectives/p0-20-gpu-mcts-rollouts.md b/.project/objectives/p0-20-gpu-mcts-rollouts.md
index ce1d4bec..90804bb7 100644
--- a/.project/objectives/p0-20-gpu-mcts-rollouts.md
+++ b/.project/objectives/p0-20-gpu-mcts-rollouts.md
@@ -139,8 +139,16 @@ successful A5/B5 evidence in the repo.
   (100% agreement, max_drift=0.000000) across 209 inputs (16 + 65 + 128) on
   lavapipe software Vulkan. Exceeded the ≥98% tolerance bullet.
 - ✗ `AI_GPU_ROLLOUT=true ./tools/autoplay-batch.sh 10 300` wall-time drops
-  ≥20% vs `AI_GPU_ROLLOUT=false` — **NOT YET VERIFIED**. Two sequential
-  blockers, first now resolved:
+  ≥20% vs `AI_GPU_ROLLOUT=false` — **MEASURED AND FAILS** 2026-04-18.
+  Batch `apricot-20260418_080214/gpu-{true,false}/` (10 seeds T300 each,
+  PARALLEL=10, RAYON=6): GPU avg 219.0s/game, CPU avg 214.7s/game — GPU
+  is **~2% SLOWER**, not 20% faster. Root-cause hypothesis: MCTS rollout
+  batches are only ~64-256 leaves per dispatch, so fixed per-dispatch cost
+  (GPU submit + buffer upload + kernel launch + readback) dominates, and
+  the CPU path with RAYON=6 is already well-saturated. GPU benefit would
+  surface only at much larger batch sizes (1000s of rollouts per leaf) or
+  with multi-GPU sharding (tracked as `g2-04-multi-gpu-batch-simulate-oos`).
+  **Historical blocker already resolved**:
   - (resolved) apricot SIGTERM root-caused to cleanup cycles triggered by
     chronically-failing user services (`tor-manager`, `nightcrawler-crawl`,
     `nightcrawler-controlpanel`, `lilith-host-agent`, each with NRestarts in
@@ -162,12 +170,11 @@ successful A5/B5 evidence in the repo.
     thread `Option<GpuContext>` into `Tree`, dispatch leaf batches through
     `batch_simulate_gpu` when context present, plumb the flag through
     `api-gdext::ai::GdMcTreeController`, read env in `ai_turn_bridge.gd`.
-- ✗ Victory rate on a 10-seed batch ≥60% — apricot sign-off batch
-  `.local/iter/sigterm-fix-verify2-1518/` on the current binary produced
-  turn counts across {76, 102, 126, 143, 152, 193, 201, 204, 213, 242} but
-  outcomes not yet tallied (needs `autoplay-report.py` run on the dir).
-  CPU-path victory-rate gate can close as soon as that report is generated;
-  GPU-path gate must wait on the integration work above.
+- ✓ Victory rate on a 10-seed batch ≥60% — batch
+  `apricot-20260418_080214/gpu-true/`: **8/10 victories (80%)** on the
+  GPU path. `apricot-20260418_080214/gpu-false/` (CPU baseline):
+  also 8/10 (parity expected, since the port preserves determinism
+  across rollout backends).
 - ✓ wgpu version reconciled at v24 workspace-wide (`mc-turn`, `mc-compute`,
   `mc-ai --features gpu` all compile + test clean).
 - ✓ Graceful CPU fallback when no GPU adapter is detected — `GpuContext::shared()`