From d65d89791540c2a13f194bef75a4bc4b74d61c04 Mon Sep 17 00:00:00 2001
From: Natalie
Date: Sat, 25 Apr 2026 18:24:12 -0700
Subject: [PATCH] =?UTF-8?q?fix(@projects/@magic-civilization):=20?=
 =?UTF-8?q?=F0=9F=90=9B=20update=20hung=20seed=20observation=20in=20batch?=
 =?UTF-8?q?=20progress=20log?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Lilith Autocommit
---
 .../20260425_warcouncil-cycle1-batches.md | 30 ++++++++++++++-----
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/.project/handoffs/20260425_warcouncil-cycle1-batches.md b/.project/handoffs/20260425_warcouncil-cycle1-batches.md
index 1ed1cbca..e9050945 100644
--- a/.project/handoffs/20260425_warcouncil-cycle1-batches.md
+++ b/.project/handoffs/20260425_warcouncil-cycle1-batches.md
@@ -100,22 +100,38 @@ After all three batches land successfully and status updates are committed, run
 - **Correct binary hash** (deployed to addon, confirmed `set_budget_ms` registered): `0d127464096539475ae7fd9786eab8af545aeec4a39900234c3a4dfe5e9f07d7`
 - **TODO for build infra**: `build-gdext.sh:find_build` should use `$CARGO_TARGET_DIR` when set, not hardcode `.local/build/rust/`
 
-### Step 2: p1-22 batch — IN PROGRESS (third attempt)
+### Step 2: p1-22 batch — IN PROGRESS (third attempt, hung seed observed)
 
 History:
 1. **First attempt** (`p1-22-budget-20260425_174041`) — aborted, stale binary, `set_budget_ms` errors on every AI turn. Results invalid.
-2. **Second attempt** (`p1-22-budget-20260425_180000`) via tmux session `p1-22-batch2` — tmux server died mid-run on multi-tenant apricot before any seeds completed. Seeds 1+2 ran in isolation; orchestrator gone, no further seeds dispatched. Killed and discarded.
+2. **Second attempt** (`p1-22-budget-20260425_180000`) via tmux session `p1-22-batch2` — tmux server died mid-run on multi-tenant apricot. Killed and discarded.
 3. **Third attempt (current)** — `nohup` launch (no tmux), parent PID 1237118 on apricot. Output dir `.local/iter/p1-22-budget-20260425_180742`. Started 2026-04-25 18:07 PDT.
    - Pass criteria: ≥5/10 victories, ≥2 distinct winners
-   - Monitor: `/tmp/p122_status.py`, polled every 5 min from local
    - Status flag: `/tmp/p1-22-done.flag` written by `/tmp/p1-22-launch.sh` after `huge-map-5clan.sh` exits
+   - Background task `bv7obsvf5` polls until done flag exists or parent PID dies; emits one notification
+
+**Mid-run observation (18:20 PDT, ~13 min elapsed):**
+- Seed 1 progressing healthy: T184, 3.93 s/turn, ~22 min ETA to T500
+- **Seed 2 HUNG**: stuck at T97 with wcs=219s for 8+ minutes. game.log mtime stale, godot-bin at 101% CPU sustained but no log writes
+- This matches the pre-fix slow-seed pattern in p1-22.md:22 ("some maps produce game states where each MCTS decision takes 30-60+ seconds")
+- The MCTS budget caps MCTS only; tactical AI / formation handling / combat resolution are NOT bounded by `MCTS_DECISION_BUDGET_MS`
+- Will hit 3600s safety timeout ~19:08 PDT, autoplay-batch.sh records timeout outcome, dispatches seeds 3+4
+- **Implication**: each hung seed costs 1 hr of wall clock. If seeds 4, 6, 8, 10 also hang, batch alone consumes 5+ hours.
 
 Lessons learned (recorded for future coordinators):
-- **Don't use tmux on multi-tenant apricot for long-running batches** — server can die unrelated to your job. Use `nohup ... log 2>&1 & disown` instead.
-- **`build-gdext.sh` ignores `CARGO_TARGET_DIR`** — see Step 1 ISSUE FOUND. After running with `CARGO_TARGET_DIR=/tmp/...`, always `sha256sum` the deployed addon `.so` against the build's `.so` and `cp` manually if they differ.
-- **Monitor inline-python via SSH gives stale/cached results** — write the script to a file on apricot first, then `ssh apricot python3 /tmp/script.py`. Direct evaluation through nested quoting in monitor commands produces unreliable readings.
+- **Don't use tmux on multi-tenant apricot for long-running batches** — server can die unrelated to your job. Use `nohup ... log 2>&1 & disown`.
+- **`build-gdext.sh` ignores `CARGO_TARGET_DIR`** — see Step 1 ISSUE FOUND. After running with `CARGO_TARGET_DIR=/tmp/...`, always `sha256sum` the deployed addon `.so` against the build's `.so` and `cp` manually if they differ. Build script needs `find_build()` patched to honor `$CARGO_TARGET_DIR` first.
+- **Don't use the Monitor tool for long batch polling on apricot** — produces fabricated/stale events at irregular intervals. Use Bash `run_in_background` with an `until` loop for single completion notification.
+- **autoplay-report.py schema rejects new MCTS instrumentation fields** — `tools/schemas/autoplay/turn-stats-line.json` patched to allow optional `mcts_action`, `mcts_root_found`, `mcts_root_idle`, `mcts_root_spawn` properties. Synced to apricot.
+- **p1-22 budget is incomplete** — caps MCTS but not tactical/formation. Hung seed 2 demonstrates the gap. Follow-up work needed.
 
-### Steps 3-4: p0-02 + p0-01 — PENDING p1-22 completion
+### Step 3: p0-02 launcher — STAGED at `/tmp/p0-02-launch.sh` (apricot)
+
+Wraps `tools/matchup-grid.sh` with `COUNT=10 TURN_LIMIT=300 PARALLEL=2`. Writes `/tmp/p0-02-done.flag` on completion. Launch with `nohup /tmp/p0-02-launch.sh /tmp/p0-02-nohup.out 2>&1 & disown` AFTER p1-22 passes.
+
+### Step 4: p0-01 launcher — STAGED at `/tmp/p0-01-launch.sh` (apricot)
+
+Wraps `tools/autoplay-batch.sh 10 300` with `PARALLEL=10`. Writes `/tmp/p0-01-done.flag` on completion. Launch AFTER p0-02 completes.
 
 ---
 
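
Reviewer note: the Step 2 completion poll (wait until the done flag exists or the parent PID dies, then emit a single notification) could be sketched roughly as follows. The patch does not show how background task `bv7obsvf5` is implemented and the notes recommend a Bash `until` loop, so this Python version is only an equivalent illustration. The names `pid_alive` and `wait_for_batch`, the return strings, and the 300 s default interval are assumptions and are not taken from the repository.

```python
import os
import time
from pathlib import Path


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence; nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def wait_for_batch(flag: Path, parent_pid: int, poll_s: float = 300.0) -> str:
    """Block until the done flag appears or the parent process is gone.

    Returns "done" or "parent-died" (hypothetical labels), so the caller
    can emit exactly one notification.
    """
    while True:
        if flag.exists():
            return "done"
        if not pid_alive(parent_pid):
            # Re-check the flag: the launcher may have written it just before exiting.
            return "done" if flag.exists() else "parent-died"
        time.sleep(poll_s)
```

With the values from the notes, a call would look like `wait_for_batch(Path("/tmp/p1-22-done.flag"), 1237118)`, run on apricot itself so that the PID check refers to the right host.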