From 412dbb3ebf738f59fb4e7152bfbfa80f19fa8d6f Mon Sep 17 00:00:00 2001
From: Natalie
Date: Fri, 24 Apr 2026 19:18:53 -0700
Subject: [PATCH] =?UTF-8?q?fix(@projects/@magic-civilization):=20?=
 =?UTF-8?q?=F0=9F=90=9B=20update=20stress-test=20status=20to=20reflect=20i?=
 =?UTF-8?q?ronhold=20balance=20failure?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Lilith Autocommit
---
 .../p0-22-ultimate-ai-stress-test.md          | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/.project/objectives/p0-22-ultimate-ai-stress-test.md b/.project/objectives/p0-22-ultimate-ai-stress-test.md
index 663d353c..b6d982ef 100644
--- a/.project/objectives/p0-22-ultimate-ai-stress-test.md
+++ b/.project/objectives/p0-22-ultimate-ai-stress-test.md
@@ -72,11 +72,13 @@ a foregone conclusion; the grid is the precondition.
   - Standard: `2 × TURN_LIMIT + 300` (1300s)
   - Override: `SAFETY_TIMEOUT_OVERRIDE=` for manual control.
   Unblocks huge-map-5clan batch execution (was timing out at 1300s; MCTS lookahead needs ~1800s).
-- 🟡 **`tools/matchup-grid.sh` → `matchup_balance: PASS`** — IN PROGRESS 2026-04-24.
-  Prior batch `matchup-grid-20260419_000018` no longer exists on apricot (cleaned up or on different host).
-  Fresh full run started 2026-04-24 on apricot (PID 2016984, log: `/tmp/p0-22-matchup-grid-fresh.log`),
-  with `LAUNCH_COOLDOWN=15 COUNT=5 TURN_LIMIT=300 PARALLEL=4 AI_USE_MCTS=true`.
-  Verdict pending full 10/10 completion.
+- 🔴 **`tools/matchup-grid.sh` → `matchup_balance: PASS`** — FAIL 2026-04-25.
+  Full run `matchup-grid-20260424_165224` (10/10 pairs, 5 seeds each, 50 total games) completed.
+  Verdict: `pass: false`. Single reason: `ironhold has 7 appearances but 0 wins in the grid`.
+  Ironhold won 0/7 games — all other clans won at least once. This is a balance issue in ironhold's
+  personality parameters, not a tooling problem. Fix: tune ironhold's personality axes in
+  `public/games/age-of-dwarves/data/ai_personalities.json` so it wins at least 1/7 games.
+  Full verdict: blackhammer 40%, deepforge 9.1%, goldvein 11.1%, ironhold 0%, runesmith 25%.
 - 🔴 **`tools/huge-map-5clan.sh` → `ultimate_stress: PASS`** — BLOCKED: three root causes, two fully diagnosed.

   **Root cause 1 (fixed, confirmed):** Batch `000049` used a stale `.so` lacking `GdAiController` registration —
@@ -112,8 +114,11 @@ a foregone conclusion; the grid is the precondition.
 1. ~~**Game binary reads `MAP_SIZE` and `NUM_PLAYERS` env.**~~ DONE 2026-04-18.
 2. ~~**Wall-clock timeout sufficient for MCTS on huge maps.**~~ DONE 2026-04-23.
    `autoplay-batch.sh` SAFETY_TIMEOUT now auto-scales to 3× TURN_LIMIT when MCTS enabled.
-3. **Complete matchup-grid** — Fresh full run in progress on apricot (PID 2016984).
-   Run `checklist-report.py matchup_balance` across full grid dir once 10/10 done.
+3. ~~**Complete matchup-grid**~~ DONE 2026-04-25 — all 10/10 pairs ran. Result: FAIL (ironhold 0 wins).
+   Fix ironhold personality balance (see item below), then re-run with `LAUNCH_COOLDOWN=15 COUNT=5 TURN_LIMIT=300 PARALLEL=4 AI_USE_MCTS=true`.
+3b. **Fix ironhold balance** — ironhold won 0/7 games in matchup-grid. Likely has too-conservative personality axes
+   (military/expansion too low). Tune `public/games/age-of-dwarves/data/ai_personalities.json` ironhold entry.
+   Cross-check against the other 4 clans' axes. Re-run matchup-grid after tuning.
 4. **Fix score victory race in `auto_play.gd`** — `_on_victory` signal from `victory_manager.gd`
    may fire on the same frame that `_state = "done"` / `_finalize_run` writes `max_turns` to turn_stats.
    Check the `done` branch at line 539-545: it writes `_outcome = "victory" if _victory else "max_turns"` which is correct,