feat(@projects/@magic-civilization): ✨ update gpu rollout performance metrics
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
parent 308a31b633
commit 59f44747f2
1 changed file with 15 additions and 8 deletions
@@ -139,8 +139,16 @@ successful A5/B5 evidence in the repo.
 (100% agreement, max_drift=0.000000) across 209 inputs (16 + 65 + 128) on
 lavapipe software Vulkan. Exceeded the ≥98% tolerance bullet.
 - ✗ `AI_GPU_ROLLOUT=true ./tools/autoplay-batch.sh 10 300` wall-time drops
-≥20% vs `AI_GPU_ROLLOUT=false` — **NOT YET VERIFIED**. Two sequential
-blockers, first now resolved:
+≥20% vs `AI_GPU_ROLLOUT=false` — **MEASURED AND FAILS** 2026-04-18.
+Batch `apricot-20260418_080214/gpu-{true,false}/` (10 seeds T300 each,
+PARALLEL=10, RAYON=6): GPU avg 219.0s/game, CPU avg 214.7s/game — GPU
+is **~2% SLOWER**, not 20% faster. Root cause hypothesis: MCTS rollout
+batches are ~64-256 leaves per dispatch; GPU submit + buffer upload +
+kernel launch + readback overhead dominates. CPU path with RAYON=6 is
+already well-saturated. GPU benefit would surface only at much larger
+batch sizes (1000s of rollouts per leaf) or with multi-GPU sharding
+(tracked as `g2-04-multi-gpu-batch-simulate-oos`). **Historical blocker
+already resolved**:
 - (resolved) apricot SIGTERM root-caused to cleanup cycles triggered by
 chronically-failing user services (`tor-manager`, `nightcrawler-crawl`,
 `nightcrawler-controlpanel`, `lilith-host-agent`, each with NRestarts in
@@ -162,12 +170,11 @@ successful A5/B5 evidence in the repo.
 thread `Option<GpuContext>` into `Tree`, dispatch leaf batches through
 `batch_simulate_gpu` when context present, plumb the flag through
 `api-gdext::ai::GdMcTreeController`, read env in `ai_turn_bridge.gd`.
-- ✗ Victory rate on a 10-seed batch ≥60% — apricot sign-off batch
-`.local/iter/sigterm-fix-verify2-1518/` on the current binary produced
-turn counts across {76, 102, 126, 143, 152, 193, 201, 204, 213, 242} but
-outcomes not yet tallied (needs `autoplay-report.py` run on the dir).
-CPU-path victory-rate gate can close as soon as that report is generated;
-GPU-path gate must wait on the integration work above.
+- ✓ Victory rate on a 10-seed batch ≥60% — batch
+`apricot-20260418_080214/gpu-true/`: **8/10 victories (80%)** on the
+GPU path. `apricot-20260418_080214/gpu-false/` (CPU baseline):
+also 8/10 (symmetry expected — port determinism preserved across
+rollout backend).
 - ✓ wgpu version reconciled at v24 workspace-wide (`mc-turn`, `mc-compute`,
 `mc-ai --features gpu` all compile + test clean).
 - ✓ Graceful CPU fallback when no GPU adapter is detected — `GpuContext::shared()`
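The two numeric gates touched by this diff (wall-time drop ≥20%, victory rate ≥60%) can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical and do not come from the repository, and the only figures used are the per-game averages and the victory count quoted in the diff.

```rust
/// Relative wall-time change of the GPU path versus the CPU baseline.
/// Negative means the GPU path is faster; positive means it is slower.
/// (Hypothetical helper, not taken from the repository.)
fn relative_change(gpu_secs: f64, cpu_secs: f64) -> f64 {
    (gpu_secs - cpu_secs) / cpu_secs
}

/// Wall-time gate: the GPU path must be at least `min_drop` faster
/// (0.20 for the ">=20%" bullet).
fn wall_time_gate_passes(gpu_secs: f64, cpu_secs: f64, min_drop: f64) -> bool {
    relative_change(gpu_secs, cpu_secs) <= -min_drop
}

/// Victory-rate gate: victories / games must reach `min_rate`
/// (0.60 for the ">=60%" bullet). An empty batch never passes.
fn victory_gate_passes(victories: u32, games: u32, min_rate: f64) -> bool {
    games > 0 && f64::from(victories) / f64::from(games) >= min_rate
}
```

With the averages from the diff, `relative_change(219.0, 214.7)` is about +0.020, which matches the "~2% slower" reading, so `wall_time_gate_passes(219.0, 214.7, 0.20)` is false. To pass, the GPU average would have to fall to roughly 171.8 s/game or below. `victory_gate_passes(8, 10, 0.60)` is true for both backends.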
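The integration note in the diff (thread `Option<GpuContext>` into `Tree`, dispatch through `batch_simulate_gpu` when a context is present) describes an optional-backend pattern. A minimal sketch of that pattern follows, under loud assumptions: every type body, signature, and the `Backend` enum, `score`, `batch_simulate_cpu`, and `simulate_leaves` names are hypothetical stand-ins rather than the repository's real API, and the "GPU" function here runs on the CPU as a placeholder.

```rust
/// Hypothetical stand-in for the real GPU context (device, queue, pipelines).
struct GpuContext;

/// Which backend evaluated a batch; returned so callers can log or test it.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Backend {
    Gpu,
    Cpu,
}

/// Hypothetical search tree holding an optional GPU context.
struct Tree {
    gpu: Option<GpuContext>,
}

impl Tree {
    fn new(gpu: Option<GpuContext>) -> Self {
        Tree { gpu }
    }

    /// Route a leaf batch to the GPU when a context exists, else to the CPU.
    fn simulate_leaves(&self, leaves: &[u64]) -> (Backend, Vec<f32>) {
        match &self.gpu {
            Some(ctx) => (Backend::Gpu, batch_simulate_gpu(ctx, leaves)),
            None => (Backend::Cpu, batch_simulate_cpu(leaves)),
        }
    }
}

/// Toy deterministic rollout score, standing in for a real game simulation.
fn score(leaf: u64) -> f32 {
    (leaf % 100) as f32 / 100.0
}

/// CPU path: score every leaf in order.
fn batch_simulate_cpu(leaves: &[u64]) -> Vec<f32> {
    leaves.iter().map(|&leaf| score(leaf)).collect()
}

/// Placeholder for the GPU path. A real version would upload the batch,
/// dispatch a kernel, and read results back; this one reuses the CPU scorer
/// so both backends return identical values.
fn batch_simulate_gpu(_ctx: &GpuContext, leaves: &[u64]) -> Vec<f32> {
    leaves.iter().map(|&leaf| score(leaf)).collect()
}
```

Building the tree with `Tree::new(None)` models the graceful CPU fallback when no adapter is detected. Because both placeholder paths share one scorer, the outputs match for any batch, which mirrors the determinism the diff reports across rollout backends.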