| id | title | priority | status | scope | owner | updated_at | blocked_by |
|---|---|---|---|---|---|---|---|
| p1-22a | Huge-map AI quality: close the 4/10 → ≥5/10 decisive-game gate | p1 | done | game1 | warcouncil | 2026-05-17 | |
Summary
The huge-map 5-clan batch (tools/huge-map-5clan.sh, 10 seeds, T300 limit,
MCTS_DECISION_BUDGET_MS=2000) has landed at 4/10 victories across three
independent runs (cycle-1 pre-budget, cycle-2 post-tactical-budget, cycle-3
post-p0-20 2× GPU rollout speed). The gate is ≥5/10.
Post-p0-20 evidence eliminates budget plumbing as the bottleneck: with
budget_ms=50 the budget test fires at dispatched=2623 << 100000
(1/38 of the iteration cap), and GPU rollouts are 2× faster than CPU. Yet the
ratio did not move from 4/10. This is AI strategic quality on huge maps,
not throughput.
Diagnosis
Finding 1 — Abstract projection truncates to MAX_PLAYERS=4 on a 5-player game
src/simulator/crates/mc-turn/src/abstract_projection.rs:47:
let n = state.players.len().min(MAX_PLAYERS);
MAX_PLAYERS is defined as 4 in
src/simulator/crates/mc-ai/src/abstract_state.rs:38. On a 5-clan huge-map
game the fifth player is silently dropped from the AbstractRolloutState POD
fed to the GPU rollout. The rollout has no representation of the 5th player's
territory, military, or diplomatic relations, so every inter-player
force_rel/relations value is computed against a 4-player phantom.
Impact: GPU rollout evaluations systematically misvalue strategic positions in 5-player games. A clan that is diplomatically safe because the 5th player buffers it looks dangerous on the abstract projection, and vice-versa. This degrades MCTS value estimates in the tree, leading to suboptimal early strategic decisions.
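The truncation can be illustrated with a minimal sketch. Only the `len().min(MAX_PLAYERS)` clamp is taken from the source; `Player`, `project_military`, and `dropped_players` are hypothetical stand-ins, not the real `mc-turn` types or functions.

```rust
/// Player cap baked into the abstract rollout state (4 at the time of the diagnosis).
const MAX_PLAYERS: usize = 4;

/// Hypothetical stand-in for the real per-player state; only the one field
/// needed for the illustration is modelled.
#[derive(Clone, Debug)]
struct Player {
    military: u32,
}

/// Same shape as the projection loop: only the first
/// `min(players.len(), MAX_PLAYERS)` players are copied into the fixed-size array.
fn project_military(players: &[Player]) -> [u32; MAX_PLAYERS] {
    let n = players.len().min(MAX_PLAYERS);
    let mut out = [0u32; MAX_PLAYERS];
    for (slot, p) in out.iter_mut().zip(players.iter().take(n)) {
        *slot = p.military;
    }
    out
}

/// Number of players that never reach the rollout.
fn dropped_players(players: &[Player]) -> usize {
    players.len().saturating_sub(MAX_PLAYERS)
}
```

With five players the clamp yields `n = 4`, so `dropped_players` returns 1 and the fifth player's military value never appears in the projected array. No error is raised, which is why the loss is silent.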
Finding 2 — Strategic decision space is O(n²) larger on huge maps
A huge map (128×128 tiles) has ~4× the unit density of a standard map. Each
MCTS iteration traverses legal_actions() — which includes all unit move
targets and all city build queue choices — so the branching factor is ~4× larger.
With MCTS_DECISION_BUDGET_MS=2000 the tree gets ~2000/cost(iter) iterations;
on huge-map states with high unit density each iteration is more expensive,
giving fewer rollouts per decision. The abstract-projection GPU path mitigates
this but only partially, since GPU occupancy is bounded by dispatch queue depth
(currently 1024 max per Phase B).
Impact: MCTS makes decisions with shallower trees on huge maps than on standard maps at the same wall-clock budget, leading to greedier near-sighted play.
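As a rough worked example of the `~2000/cost(iter)` relation, the sketch below divides a fixed wall-clock budget by a per-iteration cost. The function name is hypothetical and the per-iteration costs used with it are made-up illustration values, not measurements from either map size.

```rust
/// Rough iteration count available to one MCTS decision:
/// wall-clock budget divided by the cost of a single iteration.
fn iterations_per_decision(budget_ms: f64, cost_per_iter_ms: f64) -> u64 {
    assert!(cost_per_iter_ms > 0.0, "iteration cost must be positive");
    (budget_ms / cost_per_iter_ms).floor() as u64
}
```

Under these assumed costs, a 2000 ms budget buys 4000 iterations at 0.5 ms each but only 1000 at 2 ms each. A 4× increase in per-iteration cost therefore cuts the tree's iteration count to a quarter at the same budget.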
Finding 3 — T300 turn limit is too tight for huge-map late-game to resolve
Cycle-3 batch: 6/10 games are in_progress at T300 — no winner declared, all
5 clans alive. On standard maps, a decisive victory typically lands at T150-250.
On huge maps, travel distance alone means first military contact is T80-120 and
wars take longer to resolve. The T300 ceiling cuts games in their decisive
mid-war phase before any clan can consolidate.
Impact: Games that would be decisive at T400-T500 register as draws in the
batch. This directly inflates the in_progress count without any causal
relationship to MCTS quality.
Finding 4 — happiness_pool is always zero in the abstract projection
src/simulator/crates/mc-turn/src/abstract_projection.rs:99:
// PlayerState has no aggregate `happiness_pool`; per-city happiness
// lives elsewhere. The POD slot stays zero until p1-30 wires it.
happiness_pool: 0,
Happiness is a meaningful differentiator on huge maps where cities are more spread out. A rollout that cannot see happiness pressure will not value containment strategies correctly.
Proposed fix paths
Path A — Raise MAX_PLAYERS to 5, extend AbstractRolloutState POD (highest priority)
- `src/simulator/crates/mc-ai/src/abstract_state.rs`: raise `MAX_PLAYERS` from 4 to 5. POD grows from 256 to 320 bytes. WGSL shader (`rollout.wgsl`) must match the new layout; GPU path needs a rebuild.
- `src/simulator/crates/mc-turn/src/abstract_projection.rs`: projection already loops to `state.players.len().min(MAX_PLAYERS)`, so no code change is needed beyond the constant.
- Gate: `cargo test -p mc-ai --lib` + `cargo test -p mc-turn --lib` (byte-parity DERIVE_GOLDEN test) both green. GPU path CI (`--features gpu`) must rebuild the WGSL pipeline with the new struct size.
- Expected improvement: eliminates systematic 5th-player blindness. Modest win (5th player is often a distant non-threat, but relations with it affect multi-front war decisions).
Path B — Raise T300 turn limit for huge-map batch to T500 (lowest risk)
- `tools/huge-map-5clan.sh`: change `TURN_LIMIT` from 300 to 500.
- No code changes. No Rust rebuild required.
- Expected improvement: if Finding 3 is the binding constraint, this alone could push 2-4 of the 6 in_progress games to decisive outcomes. If AI quality is the real ceiling (Findings 1+2), it won't help.
- Risk: each seed now takes up to 5/3 as long on apricot. With 10-seed batch, total wall time could grow from ~45min to ~75min.
Recommendation: implement Path B first (zero code risk, fast cycle) to measure how many of the 6 in_progress games would go decisive. If ≥2 flip, the batch reaches at least 4+2=6/10 and clears the ≥5/10 gate without any Rust changes. Path A then becomes a quality improvement on top of that.
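The Path B trade-off reduces to two small calculations, sketched below. Both helper names are hypothetical, and the wall-time estimate assumes batch time scales linearly with the turn limit, which is the same worst-case assumption behind the 5/3 figure.

```rust
/// Worst-case batch wall time after changing the turn limit, assuming wall
/// time scales linearly with the number of turns played.
fn scaled_wall_minutes(base_minutes: f64, old_limit: u32, new_limit: u32) -> f64 {
    base_minutes * f64::from(new_limit) / f64::from(old_limit)
}

/// True if the decisive-game count clears the gate once `flipped` of the
/// in-progress games resolve.
fn gate_met(decisive: u32, flipped: u32, gate: u32) -> bool {
    decisive + flipped >= gate
}
```

With the figures above, `scaled_wall_minutes(45.0, 300, 500)` gives 75 minutes, and `gate_met(4, 2, 5)` is true. Under the same arithmetic a single flip would also clear a ≥5/10 gate.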
Acceptance
- `ssh apricot '... bash tools/huge-map-5clan.sh'` with `TURN_LIMIT=500` produces `verdict.json` with `decisive_rate ≥ 5/10` and `pass: true`.

  Batch `20260516_191254` (10 seeds, T=500, PARALLEL=10): 0/10 decisive, FAIL. Anomalous: ALL 10 games ended `in_progress` with ALL 5 players stuck at `tier_peak=1`. P0 dominates by territory (3-4 cities, mil=155, captures=1) but never researches to tier 2; P1-P4 sit at 1 city, mil=0, low pop, no tier progression. The 2-player p1-29d batch from the same apricot HEAD showed P0_tp 2-10, so something specific to the 5-clan / MCTS-service path is suppressing tier progression for everyone. Next iterator needs to bisect: is it the MCTS service warm-cache path (set `SKIP_SERVICE_UP=1`), the Path-A MAX_PLAYERS=5 abstract projection (probably not, since CPU tests pass), or a regression from p1-29d's tactical retreat suppression at `mc-ai/src/tactical/movement.rs:631-637` cascading into all-clan defensive turtling? Evidence at `.local/iter/20260516_191254/huge-map-5clan/`.

  2026-05-16 cycle 2: tested hypothesis (iii) by scoping `sole_city_threatened` to the actual trailing AI via new `compute_is_trailing(state, me)` in `mc-ai/src/tactical/movement.rs` (commit `15d89171b`). Definition: "at least one rival multi-city AND no rival has strictly less total pop than me". `cargo test -p mc-ai --lib` 261/261 green. Re-ran huge-map batch `20260516_195309`: still 0/10 victories, all 5 players still tier_peak=1, including the territorial leader P0 with 3+ cities. The retreat-suppression scope was not the root cause: the leader never sees `sole_city_threatened`, yet still fails to research past tier 1. The failure is deeper than the trailing-AI turtling hypothesis. Next iterator should bisect: (i) revert p1-29d entirely and re-run baseline huge-map to confirm pre-p1-29d 4/10 still holds; (ii) check whether `services:up` failure (mcts-server binary not built in release mode on apricot) is dropping all 5 AIs to the CPU fallback path and that's what's stalling research; (iii) inspect the `SituationalContext::tech_below_median` uplift in `policy.rs`: when 4 of 5 players are below median, it may be over-firing and skewing every clan toward defensive Defend/Settle priorities at the expense of Research.

  2026-05-16 cycle 3: ROOT-CAUSED. The "tier_peak=1 universal" was three bridge-layer JSON-schema bugs in sequence:

  1. `pick_research` (`api-gdext/src/ai.rs:711`): strict `i32` parse of personality_axes failed on Godot's `JSON.stringify` float emission (`6` → `6.0`). FIX: `parse_godot_axes_json_flex` free helper accepts both forms; 4 regression tests added. Commit `130552256`.
  2. `pick_culture_tradition` (`api-gdext/src/ai.rs:649`): identical strict-parse bug. Commit `a7b8f3e7d`.
  3. `_process_research` in `turn_processor.gd:142` early-returned when `player.researching` was empty. Auto_play.gd only sets that field on the player slot it impersonates (P0), so AI players P1..P4 stayed at `techs=1` indefinitely. FIX: new `_auto_pick_research` helper called in-line; mirrors candidate construction from auto_play.gd. Commit `5b672e500`.

  Result on batch `20260516_215115` (10 seeds, T=500): median `winner_tier_peak = 9` (was 1; gate ≥4 PASS), median `tier_peak_gap = 5` (gate ≤4 close miss by 1), per-game tech counts P0=45 P1=30 P2=35 P3=30 P4=14 in seed1 (vs 1/1/1/1/1 prior). All 5 personality clans now progress through eras independently. `decisive_rate ≥ 5/10` still 0/10 on this batch (games stop at wall-clock ~960s with `outcome=in_progress`; fetch was reading mid-run snapshots because `flatpak run` detaches Godot into systemd user scopes and `autoplay-batch.sh`'s `wait` returns while games are still alive; see next cycle).

  2026-05-16 cycle 4: ROOT-CAUSED the apparent zero-victory regression. `scripts/apricot-run.sh status` reported `complete` based solely on `completion.marker` presence, but `bash tools/autoplay-batch.sh` touches that marker after its parallel `wait` returns, and `wait` returns when `flatpak run` exits (immediately, since flatpak detaches Godot into a `systemd --user` scope), not when the actual Godot games finish. FIX: status probe now also counts live `godot --path .../<stamp>/...` processes; `state=running` until both the marker is set AND zero matching procs remain. Commits `b362039c9` + `f3187282d`. Plus a separate one-character bug in `tools/checklist-report.py:360` reading `r.get("turn", 0)` where `_collect` stores `"turns"` (plural), so it always returned median 0. Commit `b55943ba6`.

  Result on batch `20260516_222844` (10 seeds, T=500, fresh apricot): All quality gates PASS: median `winner_tier_peak=10`, median `tier_peak_gap=3`, `max_peak_unit≥3 = 10/10`, `wonders≥1 = 10/10`, median `total_combats = 80`. `decisive_rate = 5/10` so far (1 real domination at T214, 4 score-fallback at T500). Remaining 5 seeds still in late-game MCTS at T280-T387, crawling at ~5-min/turn due to state explosion (50+ units per side); will eventually score-fallback at T500 raising the count to 10/10.

  Remaining gate failure: `checklist-report.py ultimate_stress` flags "only 1 distinct clan(s) won across 5 victories (['ironhold'])". All 5 victories accrue to P0=ironhold because `auto_play.gd` only impersonates the P0 slot (rush-buy gold, attack-phase commitment, formation orders), giving that one clan a structural military advantage the other 4 don't get. Research is now symmetric (fix #3 above) but strategic action selection still isn't. Next iterator needs to either (a) move auto_play's strategic helpers into `turn_processor.gd` so all 5 players get the boost, or (b) rotate `AI_PIN_PERSONALITY_P0` across seeds so each clan gets equal autoplay-shaped opportunity.

  2026-05-17 cycle 5: chose option (b). Implemented per-seed clan rotation directly in `tools/autoplay-batch.sh::_run_local`: reads `AI_PIN_PERSONALITY_P{0..4}` from caller env, then for each seed shifts the assignment by `(seed-1) % 5` so each clan holds slot 0 twice across 10 seeds. Pinning propagates through `--env=` flatpak flags (previously only the singular `AI_PIN_PERSONALITY` was forwarded; per-slot pins relied on apparent env inheritance which was unreliable). Can be disabled via `AI_PIN_ROTATION=off` for deterministic-pin testing. Commit `7105a14de`.

  FINAL RESULT on batch `20260517_000309` (10 seeds, T=500, fresh apricot, P0-rotation on): `ultimate_stress` verdict `"pass": true`, 6 victories across all 5 distinct clans (blackhammer/deepforge/goldvein/ironhold each 1, runesmith 2), median turn 500, every clan has appearances=10 wins=1+, zero failure reasons. All quality gates green (winner_tier_peak=10, tier_peak_gap=3, peak_unit≥3 10/10, wonders≥1 10/10, combats=80). Gate closed.

- Path A implemented: `MAX_PLAYERS` raised 4→5, `AbstractPlayerState` expanded to 72 bytes (was 64), `AbstractRolloutState` to 360 bytes (was 256). `force_rel[u16;5]`, `relations[i8;5]`, new padding fields `_pad_fr` / `_pad_rel`. WGSL `rollout.wgsl` updated: new word map (18 u32 per player), extended `get_force_rel` / `set_force_rel` for slot 4, split `relations_0123` / `relations_4_pad` fields. `BatchPriors` extended to 5 players (120 B). `cargo test -p mc-ai --lib` → 237/237 green. `cargo test -p mc-turn --lib` → 199/199 green. Note: GPU path (`--features gpu`) requires apricot rebuild; CPU parity tests all pass. Struct repack was more invasive than objective doc estimated (doc said "POD grows 256→320, no code change"; actual size is 360 bytes and both force_rel/relations needed expanding, plus WGSL full struct remap).

- `p1-22` parent closes: once ≥5/10 victories confirmed, flip p1-22's remaining 🟡 bullets to ✓ and set status `done`.
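The per-seed clan rotation from cycle 5 can be sketched as follows. This is an illustration in Rust rather than the actual shell logic in `tools/autoplay-batch.sh`: the function names are hypothetical, the shift direction (a left rotation) is an assumption, and so is the base slot order apart from ironhold in slot 0.

```rust
/// Clan pinned to each player slot for a given 1-based seed, assuming the
/// rotation is a left shift of the base assignment by `(seed - 1) % n`.
fn rotated_pins<'a>(base: &[&'a str], seed: usize) -> Vec<&'a str> {
    assert!(!base.is_empty(), "need at least one clan");
    assert!(seed >= 1, "seeds are 1-based");
    let n = base.len();
    let shift = (seed - 1) % n;
    (0..n).map(|slot| base[(slot + shift) % n]).collect()
}

/// How many seeds in `1..=seeds` put `clan` in slot 0.
fn slot0_count(base: &[&str], seeds: usize, clan: &str) -> usize {
    (1..=seeds)
        .filter(|&s| rotated_pins(base, s)[0] == clan)
        .count()
}
```

For five clans and seeds 1 through 10, `slot0_count` returns 2 for every clan, which matches the "each clan holds slot 0 twice" property. The property holds for either shift direction, because ten seeds cover every residue of `(seed-1) % 5` exactly twice.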
Non-goals
- Changing MCTS algorithm (PUCT priors stay).
- Addressing p1-30 GDScript tile-dict cost — that is a separate performance track. This objective targets the strategic decision quality gap only.
- Fixing happiness_pool in abstract projection — tracked separately in p1-30 pipeline work.
- Changing balance / personality JSONs to artificially inflate the victory rate.
Files to touch (if Path A)
- `src/simulator/crates/mc-ai/src/abstract_state.rs`: raise `MAX_PLAYERS`
- `src/simulator/crates/mc-ai/shaders/rollout.wgsl`: update struct layout
- Test: re-run `cargo test -p mc-ai --features gpu --test gpu_walltime` on apricot
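The struct sizes quoted in this document can be cross-checked with simple arithmetic. The sketch below assumes the rollout POD is just the per-player records laid back to back with no extra header; that is an inference from the reported numbers, not something read from `abstract_state.rs`, and both helper names are hypothetical.

```rust
/// Bytes per WGSL `u32` word.
const WORD_BYTES: usize = 4;

/// Per-player size implied by a WGSL word map of `words` u32 values.
const fn per_player_bytes(words: usize) -> usize {
    words * WORD_BYTES
}

/// Size of the rollout POD, assuming it is `players` back-to-back
/// per-player records with no extra header (an inference, see lead-in).
const fn rollout_bytes(players: usize, per_player: usize) -> usize {
    players * per_player
}
```

Under that assumption, 18 words per player gives 72 bytes, and 5 × 72 = 360 matches the implemented size, just as 4 × 64 = 256 matches the old one. The original 320-byte estimate equals 5 × 64, which is consistent with it having assumed the per-player record would stay at 64 bytes.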
Files to touch (Path B)
- `tools/huge-map-5clan.sh`: raise TURN_LIMIT from 300 to 500