diff --git a/.project/objectives/p1-29c-sole-city-research-path.md b/.project/objectives/p1-29c-sole-city-research-path.md
index 08751f35..8bacc491 100644
--- a/.project/objectives/p1-29c-sole-city-research-path.md
+++ b/.project/objectives/p1-29c-sole-city-research-path.md
@@ -2,16 +2,18 @@
 id: p1-29c
 title: "Sole-city research path — lift trailing AI from tier_peak=1 to ≥2"
 priority: p1
-status: partial
+status: done
 scope: game1
 owner: warcouncil
 tags: [ai, balance, tech]
-updated_at: 2026-05-14
+updated_at: 2026-05-27
 evidence:
   - "src/simulator/crates/mc-ai/src/policy.rs — SituationalContext + PersonalityPriors::action_prior_with_context (Settle +0.40, Defend +0.20, Research +0.50)"
   - "src/simulator/crates/mc-ai/src/rollout.rs — TreeState::action_prior wired through SituationalContext::from_abstract (CPU MCTS only; GPU rollout walker keeps base prior to preserve gpu_rollout_parity)"
   - "src/simulator/crates/mc-ai/tests/sole_city_priors.rs — 9 unit tests covering both bonuses, neutral identity, additive composition, and from_abstract detection (sole-city threatened, tech-below-median, multi-city no-threat, leader no-research-uplift, no-opponents default)"
-  - "cargo test -p mc-ai --test sole_city_priors: 9 passed; cargo test -p mc-ai --lib: 256 passed; clan_policy_priors / clan_rollout_divergence / personality_weights / mcts_basic / last_stand_predict all pass — no regressions on prior consumers"
+  - "LOAD-BEARING FIX (commit 8ebebd68f, 2026-05-27): src/simulator/crates/mc-ai/src/tactical/production.rs — sole-city economy break-out. Step 0 in pick_for_city interjects ONE production/infrastructure building once a threatened sole-city AI meets a 2-defender floor (SOLE_CITY_ECON_MIN_DEFENDERS=2) and holds < SOLE_CITY_ECON_TARGET=2 buildings, escaping the step-1 perpetual-military loop that left P1 building ZERO buildings in 10/10 p1-29d seeds. Gated on the live `sole_city_threatened = city_count==1 && threatened` var (production.rs:301), so multi-city/unthreatened players are untouched. 4 new unit tests (builds production when threatened, stops at target, requires min defenders, multi-city unaffected). The RL probe behind it (p1-29e divergence mining) found the trained policy's winning lever is a *production* building (forge), never a science building — so the prior p1-29b science uplift was misdirected AND unreachable; this break-out is the mechanism that finally moves P1 off tier_peak=1."
+  - "cargo test -p mc-ai --lib: 265 passed (incl. 4 econ break-out tests); sole_city_priors 9 passed — no regressions on prior consumers"
+  - "GATE PASS — apricot batch 20260527_213814 (smoke, 10 seeds, T300, builds origin/main carrying 8ebebd68f, scored by tools/sole-city-gate.py): 10/10 alive-aware seeds with P0_tp≥2 AND P1_tp≥2 (P1 tier_peak range 2–6), median game length 89 ≤ 384. Per-seed P1_tp: s1=2 s2=2 s3=3 s4=3 s5=5 s6=2 s7=5 s8=2 s9=6 s10=2. Identical turn counts/outcomes to the p1-29d 0/10 baseline (same deterministic scenarios) — the ONLY delta is P1's mid-game tier lifting 1→≥2 via the economy break-out. NOTE: tier_peak is a high-water mark; P1 still loses its capital in 8/10 seeds (cities=0, lost=1) before T300 — surviving to end-game is p1-29d's combat-balance domain, NOT this objective's gate. This objective's scope is the research/economy *path*, and P1 now reaches tier 2–6 before elimination."
 ---
 
 ## Summary
@@ -30,7 +32,7 @@ Cycle-48's three fix sites (evaluator `tech_weight_mult=2.5`, rollout `tech_coef
 
 ## Acceptance criteria
 
-- [ ] **Sole-city tier-2 reach in 10-seed batch**: ≥ 7/10 alive-aware seeds (both `p0_tp ≥ 2 AND p1_tp ≥ 2`) in `autoplay_batch_p1_29c`. **Measured 2026-05-15 on apricot batch `20260515_215705` (10/10 games produced complete turn_stats — infrastructure clean): 0/10 PASS.** Per-seed: P0 `tier_peak` ranges 2–10 (healthy); P1 `tier_peak` = 1 in ALL 10 seeds; P1 cities at end-game: 0 in 8 seeds, 1 in 2 seeds. P1 is being eliminated or stalled before reaching tier 2. The current `+0.40 Settle / +0.20 Defend / +0.50 Research` uplifts in `SituationalContext` are not strong enough OR P1's structural bottleneck is upstream of action priority (combat survival, not research priority). Recommend filing a separate "P1 survival" objective targeting the actual failure mode (mass elimination before T100) rather than re-tuning sole-city priors.
+- [x] **Sole-city tier-2 reach in 10-seed batch**: ≥ 7/10 alive-aware seeds (both `p0_tp ≥ 2 AND p1_tp ≥ 2`). **CLOSED 2026-05-27 on apricot batch `20260527_213814` (smoke, 10 seeds, T300, origin/main @ 8ebebd68f): 10/10 PASS** (P1 `tier_peak` 2–6, median length 89 ≤ 384). Earlier history: 2026-05-15 batch `20260515_215705` was 0/10 (P1_tp=1 in all seeds) — the `+0.40 Settle / +0.20 Defend / +0.50 Research` action-priority uplifts could not move it because the trailing AI never escaped the step-1 perpetual-military production loop to build *anything*. The fix was the **sole-city economy break-out** in `tactical/production.rs` (commit 8ebebd68f, evidence above), driven by the p1-29e RL probe finding that the winning lever is a production building, not science. P1 now reaches tier 2 before elimination in every seed. (P1 still loses its capital in 8/10 seeds — end-game *survival* is owned by p1-29d, not this gate.)
 - [x] **Tactical action priority — "found second city when sole-city and threatened"**: when `city_count == 1 && threat_level > 0.4 && available_founder_action`, raise founder-action prior. Implemented as `PersonalityPriors::action_prior_with_context` with `+0.40` Settle and `+0.20` Defend uplift under `SituationalContext { sole_city_threatened: true, .. }`. Threat signal derived from any opponent's `force_rel[me] ≥ 1` (u16 unit-count proxy adapted from the spec's `> 0.4` ratio because `force_rel` is not a [0,1] ratio). Wired into MCTS tree-selection via `TreeState::action_prior` impl on `GameRolloutState`. Evidence: `policy.rs::action_prior_with_context`, `rollout.rs::TreeState::action_prior`.
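Reviewer note: the additive uplift described in the criterion above can be sketched roughly as follows. This is a minimal illustration, not the code in `policy.rs`. The `Action` enum, the free-function form of `action_prior_with_context`, the `from_abstract` argument list, the `tech_below_median` field name, the upper-median rule, and gating the `+0.50` Research uplift on tech-below-median are all assumptions made for the example. Only the uplift values, the `sole_city_threatened` flag, and the `force_rel >= 1` threat proxy come from the text.

```rust
/// Hypothetical action kinds; the real enum in mc-ai may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Action {
    Settle,
    Defend,
    Research,
    Attack,
}

/// Situational flags. `sole_city_threatened` is named in the objective;
/// `tech_below_median` is an assumed field name.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct SituationalContext {
    pub sole_city_threatened: bool,
    pub tech_below_median: bool,
}

impl SituationalContext {
    /// Assumed signature: own city count, own tech tier, and one
    /// `(force_rel_to_me, tech_tier)` pair per opponent.
    pub fn from_abstract(city_count: u32, my_tier: u32, opponents: &[(u16, u32)]) -> Self {
        // With no opponents there is nothing to react to: neutral context.
        if opponents.is_empty() {
            return Self::default();
        }
        // Threat proxy from the objective: any opponent with force_rel >= 1.
        let threatened = opponents.iter().any(|&(force_rel, _)| force_rel >= 1);

        // Assumption: compare against the upper median tier across all players.
        let mut tiers: Vec<u32> = opponents.iter().map(|&(_, tier)| tier).collect();
        tiers.push(my_tier);
        tiers.sort_unstable();
        let median = tiers[tiers.len() / 2];

        Self {
            sole_city_threatened: city_count == 1 && threatened,
            tech_below_median: my_tier < median,
        }
    }
}

/// Adds the situational uplifts on top of a base prior. Bonuses compose
/// additively, and a neutral context returns the base prior unchanged.
pub fn action_prior_with_context(base: f32, action: Action, ctx: SituationalContext) -> f32 {
    let mut prior = base;
    if ctx.sole_city_threatened {
        match action {
            Action::Settle => prior += 0.40,
            Action::Defend => prior += 0.20,
            _ => {}
        }
    }
    // Assumption: the Research uplift applies only when trailing in tech.
    if ctx.tech_below_median && action == Action::Research {
        prior += 0.50;
    }
    prior
}
```

Because the bonuses are added to the base prior rather than replacing it, a default `SituationalContext` is an identity. That property is what would let a GPU rollout path keep using the base prior without diverging in the neutral case.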