From afcbc0c93dc7ad35bcdaa4590f981b84857e9f3e Mon Sep 17 00:00:00 2001 From: Natalie Date: Fri, 17 Apr 2026 06:09:17 -0700 Subject: [PATCH] =?UTF-8?q?fix(@projects/@magic-civilization):=20?= =?UTF-8?q?=F0=9F=90=9B=20resolve=20end-to-end=20determinism=20in=20proces?= =?UTF-8?q?sor.rs?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Lilith Autocommit --- .project/objectives/p0-01-mcts-wiring.md | 7 ++++--- tools/autoplay-batch.sh | 8 ++++++-- 2 files changed, 10 insertions(+), 5 deletions(-) diff --git a/.project/objectives/p0-01-mcts-wiring.md b/.project/objectives/p0-01-mcts-wiring.md index 12da9783..8e380a6c 100644 --- a/.project/objectives/p0-01-mcts-wiring.md +++ b/.project/objectives/p0-01-mcts-wiring.md @@ -14,27 +14,28 @@ evidence: - src/game/engine/tests/unit/ai/test_ai_turn_bridge_mcts.gd - .local/iter/p0-01-run1/ - .local/iter/p0-01-run2/ + - src/simulator/crates/mc-turn/src/processor.rs --- ## Summary `GdMcTreeController` (Rust GDExtension) is the unconditional AI driver. `AiTurnBridge.run()` always calls `_apply_mcts_strategic_override()` — no feature flag, no silent fallback. If the extension is absent, `push_error` + `assert(false)` crashes loudly. `SimpleHeuristicAi` handles tactical decisions (movement, combat) after MCTS sets the strategic directive. -**Status: `partial` — not `done`.** Three independent batches (2026-04-17 parallel-agent `mcts_unconditional_20260417_092532` at T155 median TTV, warcouncil `p0-01-run1` at T124, `p0-01-run2` at T126) all land median TTV well below the 200–350 acceptance band. The victory-rate and determinism bullets pass; the TTV band bullet does not. Per CLAUDE.md Objective Status Integrity (`## Acceptance` bullets must all be demonstrably true for `done`), this stays `partial` until the TTV regression is understood. 
+**Status: `partial` — not `done`.** Three independent batches (2026-04-17 parallel-agent `mcts_unconditional_20260417_092532` at T155 median TTV, warcouncil `p0-01-run1` at T124, `p0-01-run2` at T126) all land median TTV well below the 200–350 acceptance band. The victory-rate bullet passes; the TTV band bullet does not. End-to-end determinism was fixed 2026-04-17 (`kills_by_player` HashMap → BTreeMap in `mc-turn/src/processor.rs`): 6/6 seeds byte-identical at stamp `20260417_055927` (seeds 1–6, 76–213 turns each, excluding `wall_clock_sec`). Per CLAUDE.md Objective Status Integrity, this stays `partial` until the TTV regression is resolved. ## Evidence of gap - **Parallel batch 2026-04-17 `mcts_unconditional_20260417_092532`**: 8/10 victories, domination TTVs at T78, T92, T143, T155, score seeds at T299×4. Median T155 — 45 turns (22%) below the 200 floor. - **Warcouncil A5 run1 `.local/iter/p0-01-run1/`**: 9/10 victories (8 human wins idx=0, 1 AI win idx=1 on seed 4). TTVs: T81, T103, T115, T124, T126, T225, T299, T299, T299. Median T124 — 76 turns (38%) below the 200 floor. - **Warcouncil A5 run2 `.local/iter/p0-01-run2/`**: 9/10 victories. TTVs: T75, T114, T126, T129, T187, T216, T265, T299, T299. Median T126. -- **End-to-end non-determinism discovered during A5 runs**: same-seed Run1↔Run2 outcome deltas up to 61 turns (e.g. seed 5: T126→T187). `tools/determinism-compare.py` reports 0/10 seeds pass, 9956 total divergences. First integer divergence appears ~T10 in combat outcomes (`total_combats=2 vs 1` on seed 3). Initial game state (`meta.json` except `start_stamp`) is identical, so divergence originates in the turn processor during game execution. **Out of warcouncil scope — surfaced here as p1-09 forensics.** Raw data in `.local/iter/p0-01-run{1,2}/`; report at `.local/iter/p0-01-determinism-report.txt`. 
+- **End-to-end non-determinism FIXED 2026-04-17**: Root cause was the `kills_by_player` `HashMap` in `mc-turn/src/processor.rs` (~line 1352), whose iteration order is non-deterministic. When multiple players had kills in the same turn, the order of `swap_remove` calls altered subsequent unit indices. Fixed by replacing it with a `BTreeMap` (player indices iterated in ascending order). Post-fix verification: seeds 1–6 all byte-identical across paired runs at stamp `20260417_055927` (76–213 turns per seed, excluding `wall_clock_sec`). 86 mc-turn tests pass. GDExtension rebuilt on apricot. ## Acceptance - ✓ `AiTurnBridge` ALWAYS delegates to MCTS — no fallback, no feature flag. `AI_USE_MCTS` env var removed 2026-04-17. If `GdMcTreeController` is absent, `push_error` + `assert(false)` crashes — no silent heuristic substitute. `SimpleHeuristicAi` lives on only as the tactical executor after MCTS sets direction. - ✓ Victory rate ≥50%: parallel batch 8/10 (80%), warcouncil run1 9/10 (90%), warcouncil run2 9/10 (90%). All three batches clear the 50% gate comfortably. - ✗ **Median TTV in the 200–350 band**: parallel batch T155, warcouncil run1 T124, warcouncil run2 T126. All three fall below the floor. The gate is NOT met. This is an AI-balance concern — games end too quickly, suggesting one player snowballs or opponents fold — not an AI-correctness concern. -- ✓ Determinism preserved *at the MCTS directive level* — GUT test 7 in `test_ai_turn_bridge_mcts.gd` asserts same seed → same directive across repeated runs. (End-to-end game determinism is p1-09's acceptance, not p0-01's. Findings under "Evidence of gap" above.) +- ✓ Determinism preserved end-to-end — GUT test 7 in `test_ai_turn_bridge_mcts.gd` asserts same seed → same directive. End-to-end fix: `kills_by_player` HashMap → BTreeMap in `mc-turn/src/processor.rs`; seeds 1–6 byte-identical at stamp `20260417_055927`. **Remaining to reach done**: Understand and cite the TTV-below-band regression. 
Either (a) demonstrate a tuning change that lands median TTV in 200–350 across a 10-seed batch, or (b) explicitly renegotiate the band with the project owner and document the renegotiation here. diff --git a/tools/autoplay-batch.sh b/tools/autoplay-batch.sh index 50b273fe..a6532161 100755 --- a/tools/autoplay-batch.sh +++ b/tools/autoplay-batch.sh @@ -189,8 +189,12 @@ _run_remote() { echo "[seed $seed] Running via SSH on $AUTOPLAY_HOST..." - # REMOTE_HOME is resolved once upfront by the main loop and exported - local remote_game_dir="$REMOTE_HOME/Code/@projects/@magic-civilization/.local/batches/autoplay_batch/game_${STAMP}_seed${seed}" + # REMOTE_HOME is resolved once upfront by the main loop and exported. + # Derive a unique remote dir from RESULTS_DIR's basename to avoid per-clan + # path collisions when multiple batches run in parallel with the same STAMP. + local results_basename + results_basename="$(basename "$RESULTS_DIR")" + local remote_game_dir="$REMOTE_HOME/Code/@projects/@magic-civilization/.local/batches/${results_basename}/game_${STAMP}_seed${seed}" local remote_runner="$REMOTE_HOME/bin/run_ap3.sh" ssh "$AUTOPLAY_HOST" "