diff --git a/.project/objectives/p1-22a-huge-map-ai-quality.md b/.project/objectives/p1-22a-huge-map-ai-quality.md
index a14119b7..67e50e14 100644
--- a/.project/objectives/p1-22a-huge-map-ai-quality.md
+++ b/.project/objectives/p1-22a-huge-map-ai-quality.md
@@ -172,10 +172,45 @@ improvement on top of that.
   (gate ≤4 close miss by 1), per-game tech counts P0=45 P1=30 P2=35
   P3=30 P4=14 in seed1 (vs 1/1/1/1/1 prior). All 5 personality clans
   now progress through eras independently. `decisive_rate ≥ 5/10` still 0/10
-  (games stop at wall-clock ~960s with `outcome=in_progress` — separate
-  infra signal cuts games off before natural victory; systemd unit
-  `Result=success` so it isn't a unit-level timeout). That's the
-  remaining work for next iterator.
+  on this batch (games stop at wall-clock ~960s with `outcome=in_progress`
+  — fetch was reading mid-run snapshots because `flatpak run` detaches
+  Godot into systemd user scopes and `autoplay-batch.sh`'s `wait`
+  returns while games are still alive — see next cycle).
+
+  **2026-05-16 cycle 4**: ROOT-CAUSED the apparent zero-victory regression.
+  `scripts/apricot-run.sh status` reported `complete` based solely on
+  `completion.marker` presence, but `bash tools/autoplay-batch.sh`
+  touches that marker after its parallel `wait` returns — and `wait`
+  returns when `flatpak run` exits (immediately, since flatpak
+  detaches Godot into a `systemd --user` scope), not when the actual
+  Godot games finish. FIX: status probe now also counts live
+  `godot --path ...//...` processes; `state=running` until both
+  the marker is set AND zero matching procs remain. Commits
+  `b362039c9` + `f3187282d`. Plus a separate one-character bug in
+  `tools/checklist-report.py:360` reading `r.get("turn", 0)` where
+  `_collect` stores `"turns"` (plural) — always returned median 0.
+  Commit `b55943ba6`.
+
+  Result on batch `20260516_222844` (10 seeds, T=500, fresh apricot):
+  All quality gates PASS — median `winner_tier_peak=10`, median
+  `tier_peak_gap=3`, `max_peak_unit≥3 = 10/10`, `wonders≥1 = 10/10`,
+  median `total_combats = 80`. `decisive_rate = 5/10` so far (1 real
+  domination at T214, 4 score-fallback at T500). Remaining 5 seeds
+  still in late-game MCTS at T280-T387 — crawling at ~5-min/turn due
+  to state explosion (50+ units per side); will eventually
+  score-fallback at T500 raising the count to 10/10.
+
+  **Remaining gate failure**: `checklist-report.py ultimate_stress`
+  flags "only 1 distinct clan(s) won across 5 victories (['ironhold'])".
+  All 5 victories accrue to P0=ironhold because `auto_play.gd` only
+  impersonates the P0 slot (rush-buy gold, attack-phase commitment,
+  formation orders), giving that one clan a structural military
+  advantage the other 4 don't get. Research is now symmetric (fix #3
+  above) but strategic action selection still isn't. Next iterator
+  needs to either (a) move auto_play's strategic helpers into
+  `turn_processor.gd` so all 5 players get the boost, or (b) rotate
+  `AI_PIN_PERSONALITY_P0` across seeds so each clan gets equal
+  autoplay-shaped opportunity.
 - [x] Path A implemented: `MAX_PLAYERS` raised 4→5, `AbstractPlayerState`
   expanded to 72 bytes (was 64), `AbstractRolloutState` to 360 bytes (was 256).
   `force_rel[u16;5]`, `relations[i8;5]`, new padding fields `_pad_fr`/`_pad_rel`.