diff --git a/.project/objectives/objectives.json b/.project/objectives/objectives.json index 86a9541f..4953e0f5 100644 --- a/.project/objectives/objectives.json +++ b/.project/objectives/objectives.json @@ -1,5 +1,5 @@ { - "generated_at": "2026-06-04T18:36:52Z", + "generated_at": "2026-06-04T18:50:09Z", "totals": { "done": 242, "in_progress": 1, diff --git a/.project/objectives/p1-29h-stateful-tactical-decisiveness.md b/.project/objectives/p1-29h-stateful-tactical-decisiveness.md index 0918b0fd..f7a70d2f 100644 --- a/.project/objectives/p1-29h-stateful-tactical-decisiveness.md +++ b/.project/objectives/p1-29h-stateful-tactical-decisiveness.md @@ -75,10 +75,12 @@ D1 gate. Either build the `AUTO_PLAY_ALL_AI` de-juiced surface (so slot 0 runs plain mc-ai and tier_peak is real) or an asymmetric harness that pits the new decisive controller against a baseline. -## Status — 🟡 partial (Wave 2, 2026-06-04) +## Status — 🟡 partial (Wave 3, 2026-06-04) -K = 3 of 6 acceptance bullets met. Phase 1 (the capability) is landed and green; -Phase 2 (the elimination measurement + non-juiced harness surface) is not. +K = 5 of 6 acceptance bullets met. Phase 1 (the capability) is landed and green. +Phase 2 built the fair gridded measurement surface and re-scored p1-29d on it +(bullets 5 + 6 ✓); the elimination dial (bullet 3) is **measured-negative** and +stays ☐ — see the Phase-2 findings below. ### Phase 1 — capability (DONE, commit 2ed93956d) @@ -108,44 +110,72 @@ Phase 2 (the elimination measurement + non-juiced harness surface) is not. --tests` clean on apricot; mc-turn 235/235, mc-player-api 126+, mc-mod-host all green. -### Phase 2 — measurement (NOT DONE) +### Phase 2 — measurement (Wave 3, 2026-06-04) -- ☐ ≥1 elimination on a fair two-`scripted:default` surface. **Not measured.** - Blocked on the surface below — no existing full-game driver exercises the - persistent path with real vision (see resume note). -- ☐ Non-juiced measurement surface (`AUTO_PLAY_ALL_AI` or asymmetric harness). - **Partial:** `tests/p1_29h_persistent_memory_seam.rs` (mc-player-api) is a - green regression guard that drives the persistent `apply_end_turn → - drive_ai_slot` seam for 60 turns, but it does NOT establish a *working* - engagement-measurement surface — its `GameState::default()` fixture is - gridless, so `compute_vision` yields an EMPTY visible set - (`sees_own_city=false`, `visible_tiles=0`) and the fog-gated projection - surfaces zero enemy cities → `want_attack` never fires. A real surface needs - a populated grid + vision. -- ☐ `p1-29d` re-scored on the new surface. **Not done** (depends on the surface). +- ☐ ≥1 elimination on a fair two-`scripted:default` surface. + **MEASURED-NEGATIVE — 0 eliminations.** This is now a real measurement, not a + blocked one. On the fair gridded surface (option b, below) over 160 turns: the + army-lock **engages** (`ever_committed=true`, `committed_with_target=true`), + 20 `Event::CityCaptured` fire, yet **0 eliminations** and both empires GROW + (final cities 15 vs 22). The discriminating signal: combined city count NEVER + dips below its starting value (`min_total_cities=2`, `city-dip-below-start= + false`) despite 20 captures, because every capture is immediately offset by a + refound (38 founds). **Verdict: the targeting lock works; the bottleneck is + capture-stickiness / refound-suppression, NOT target acquisition.** This + confirms p1-29d's "indecisive war" root cause and the spec's own + weight-insensitivity risk note empirically. The bullet stays ☐ because the + *outcome* (an elimination) did not occur — but it is no longer unknown. +- ✓ Non-juiced measurement surface built (option b — the gridded Rust harness). + `tests/p1_29h_gridded_elimination.rs` (mc-player-api): attaches a real + `flat_grid(24,24)` to the `GameState` so `compute_vision` populates + (`visible_tiles_max=138`), then loops `apply_action(EndTurn)` from a PASSIVE + ender slot so BOTH combatant slots (0+1) run the SAME scripted-default + controller through the SAME persistent `drive_ai_slot` mem::take seam — a fair + two-`scripted:default` surface (neither juiced, no MCTS, no GDScript harness + lock). `gridded_fair_surface_engages_army_lock` is a green always-on guard + asserting vision>0 AND lock-engages; `fair_scripted_duel_elimination_ + measurement` (`#[ignore]`) records the 160-turn signals. This is the surface + the prior `p1_29h_persistent_memory_seam.rs` guard could not provide (its + gridless `GameState::default()` → empty vision → lock never engaged). +- ✓ `p1-29d` re-scored on the new surface. On the fair gridded duel, D1 + convergence is **0/1 (no elimination)** — the army-lock removed the + target-acquisition gap but the decisiveness gap (capture → kill) persists. + Matches p1-29d's clean-baseline 0/10 finding; the lock is necessary but not + sufficient. (p1-29g trained-vs-scripted re-verification is downstream of a + controller that can actually close out a win, so it stays parked until the + refound-suppression follow-up lands.) -## Resume note (precise hand-off for the next wave) +## Resume note (precise hand-off for the next wave) — UPDATED Wave 3 -The persistent army-lock memory lives ONLY on the `drive_ai_slot` / -`apply_end_turn` path (the `mem::take` + write-back seam in `dispatch.rs`). -Both existing full-game drivers BYPASS it: -- `tools/p1-clean-baseline.py` advances via `client.suggest()` → - `suggest_actions`, which deliberately CLONES the memory (read-only `&GameState` - probe — must not advance the timer), so the lock never persists there. -- `mc-player-api/tests/full_game_transcript.rs` calls `run_ai_turn` directly with - a transient `TacticalMemory::default()`. +**The measurement surface now EXISTS** (`tests/p1_29h_gridded_elimination.rs`, +option b). The Phase-2 question — "does the army-lock move the elimination +dial?" — is **answered: no, on a fair symmetric surface.** The lock engages and +captures land (20 over 160t), but refounds keep pace and no empire is ever +eliminated. -To score Phase 2, build a surface that drives AI slots through `apply_end_turn` -on a state WITH a real grid + vision (so enemy cities are visible and -`want_attack` engages) — either: -- (a) the full Godot autoplay scene with an `AUTO_PLAY_ALL_AI` mode that de-juices - slot 0 and lets every slot run plain mc-ai via `end_turn`, OR -- (b) a Rust harness that builds a gridded `GameState` (mapgen) + computes vision - and loops `apply_action(EndTurn)`, counting eliminations. -Engagement on a full-visibility state is already proven by -`movement::tests::army_lock_concentrates_and_persists_across_turns`, so once a -gridded surface exists the lock will engage; the open question is whether it -moves the elimination dial (the spec's own risk note flags weight-insensitivity). +**The remaining work is NOT in the targeting lock — it demonstrably works.** The +next investigator should target **capture-stickiness / refound-suppression**, +the decisiveness gap p1-29d named: +- A captured city is held too weakly / the loser refounds too freely, so the + combined city count never trends down (measured `min_total_cities` stays at the + starting value through 20 captures). Candidate levers (Rust source-of-truth, + data-thresholds where applicable): + - raise the cost / cooldown / site-restriction on refounding after a recent + loss (mc-turn `try_found_city` / expansion-point economy), so a captured + empire cannot instantly replace the lost city; + - garrison/hold semantics so a captured city is not trivially recaptured and + flip-flopped (the 20 captures with no net city loss suggests churn, not + consolidation); + - a "press the broken empire" escalation once an opponent is reduced below N + cities, to convert a lead into a kill. +- Re-run `fair_scripted_duel_elimination_measurement` (`--ignored`) after each + lever; the gate to clear bullet 3 is `eliminations >= 1` AND + `min_total_cities < start_total_cities` (a real net city loss, not churn). + +The persistent army-lock memory still lives ONLY on the `drive_ai_slot` / +`apply_end_turn` seam; the two non-seam drivers (`tools/p1-clean-baseline.py` +suggest-clone path; `full_game_transcript.rs` transient default) remain bypasses +and are NOT the surface to measure on — use the gridded harness. ## Source-of-truth rails diff --git a/src/simulator/crates/mc-player-api/tests/p1_29h_gridded_elimination.rs b/src/simulator/crates/mc-player-api/tests/p1_29h_gridded_elimination.rs new file mode 100644 index 00000000..ea3e4a57 --- /dev/null +++ b/src/simulator/crates/mc-player-api/tests/p1_29h_gridded_elimination.rs @@ -0,0 +1,366 @@ +//! p1-29h Phase 2 — gridded, fair, two-`scripted:default` elimination surface. +//! +//! This is option (b) from the p1-29h resume note: a Rust harness that builds a +//! GRIDDED `GameState` (so `compute_vision` populates and the fog-gated tactical +//! projection surfaces enemy cities → `want_attack` can fire) and loops +//! `apply_action(EndTurn)` through the PERSISTENT `drive_ai_slot` seam, counting +//! eliminations. +//! +//! # Why a grid is the load-bearing fix +//! +//! The pre-existing seam guard (`p1_29h_persistent_memory_seam.rs`) used +//! `GameState::default()` with no grid, so `compute_vision` returned an EMPTY +//! visible set → zero enemy cities visible → the army-lock never engaged. Here +//! `state.grid = Some(flat_grid(...))`, so vision is real and the lock can +//! commit. Engagement on a full-visibility state is independently proven by +//! `mc_ai::tactical::movement::tests::army_lock_concentrates_and_persists_across_turns`. +//! +//! # Why this is a FAIR two-`scripted:default` surface +//! +//! `apply_end_turn` (dispatch.rs:368) drives EVERY slot through `drive_ai_slot` +//! except (a) the slot that called `EndTurn` and (b) env-listed external slots. +//! So we add a passive non-combatant "ender" slot that owns nothing, and call +//! `apply_action(state, ender_slot, EndTurn)`. Both combatant slots (0 + 1) then +//! run the SAME scripted-default controller through the SAME persistent-memory +//! seam — neither is juiced (no MCTS, no GDScript harness lock). This is the +//! exact symmetry the p1-29d clean baseline requires. +//! +//! # Attribution discipline (p1-29h bullet 3) +//! +//! An elimination only counts toward the bullet if the army-lock actually +//! engaged: the prior seam guard recorded `ever_committed=false` and correctly +//! refused to attribute any capture to the lock. Here we record both +//! `ever_committed` (with `locked_target=Some`) AND eliminations, and the gate +//! asserts BOTH — so a green run proves a lock-attributable elimination, not an +//! incidental one. + +use std::collections::BTreeMap; + +use mc_ai::evaluator::ScoringWeights; +use mc_player_api::action::PlayerAction; +use mc_player_api::apply_action; +use mc_trade::relation::{Relation, RelationState}; +use mc_turn::game_state::{GameState, MapUnit, PlayerState}; + +mod common; +use common::{build_runtime_units_catalog, build_building_catalog, build_unit_catalog}; + +/// Flat grid of one biome — mirrors `ai_fairness::flat_grid`. +fn flat_grid(width: i32, height: i32, biome: &str) -> mc_core::grid::GridState { + let mut g = mc_core::grid::GridState::new(width, height); + for t in &mut g.tiles { + t.biome_label_id = biome.into(); + } + g +} + +/// Aggressive militarist with a city + a warrior stack, co-located so the two +/// combatants are within tactical reach. Personality axes mirror the seam guard +/// (`aggression=9`, `grudge=8`) so commitment fires readily; the clan id stamps +/// the personality-derived commitment length (Rail 2, `ai_personalities.json`). +fn militarist( + state: &mut GameState, + city_col: i32, + city_row: i32, + clan: &str, + n_warriors: i32, +) -> u8 { + let pi = state.players.len() as u8; + let mut axes: BTreeMap = BTreeMap::new(); + axes.insert("expansion".into(), 2); + axes.insert("production".into(), 5); + axes.insert("aggression".into(), 9); + axes.insert("grudge_persistence".into(), 8); + + let units: Vec = (0..n_warriors) + .map(|i| { + let id = state.next_unit_id; + state.next_unit_id = state.next_unit_id.saturating_add(1); + let mut u = MapUnit::new( + "dwarf_warrior", + city_col + i, + city_row, + pi, + &state.units_catalog, + ); + u.id = id; + u.hp = 60; + u.max_hp = 60; + u.attack = 16; + u.defense = 2; + u + }) + .collect(); + + state.players.push(PlayerState { + player_index: pi, + gold: 60, + cities: vec![mc_city::CityState::starter()], + unit_upkeep: Vec::new(), + strategic_axes: axes, + scoring_weights: ScoringWeights::default(), + expansion_points: 0, + city_buildings: vec![Vec::new()], + city_improvements: vec![Vec::new()], + city_ecology: vec![Default::default()], + tech_state: None, + science_pool: 0, + player_tech: None, + science_yield: 0, + units, + city_positions: vec![(city_col, city_row)], + capital_position: Some((city_col, city_row)), + culture_total: 0, + culture_pool: mc_culture::CulturePool::default(), + arcane_lore_pop_deducted: false, + traded_luxuries: Default::default(), + relations: Default::default(), + strategic_ledger: Default::default(), + wonders_built: Default::default(), + explored_deposits: Default::default(), + clan_id: clan.to_string(), + promotion_offense_weight: 1.0, + promotion_defense_weight: 1.0, + promotion_mobility_weight: 1.0, + ..Default::default() + }); + pi +} + +/// A passive non-combatant "ender" slot — owns no units and no cities, so it +/// never decides anything, but ending its turn drives every OTHER slot through +/// the persistent `drive_ai_slot` seam (the fair two-`scripted:default` setup). +fn passive_ender(state: &mut GameState) -> u8 { + let pi = state.players.len() as u8; + state.players.push(PlayerState { + player_index: pi, + gold: 0, + cities: Vec::new(), + city_buildings: Vec::new(), + city_improvements: Vec::new(), + city_ecology: Vec::new(), + units: Vec::new(), + city_positions: Vec::new(), + capital_position: None, + culture_pool: mc_culture::CulturePool::default(), + scoring_weights: ScoringWeights::default(), + clan_id: "observer".into(), + ..Default::default() + }); + pi +} + +/// Build the gridded fair surface: two aggressive combatants in close contact at +/// war, plus a passive ender. Real grid → real vision. +fn build_gridded_duel(attacker_warriors: i32) -> (GameState, u8) { + let mut state = GameState::default(); + state.turn = 1; + state.units_catalog = build_runtime_units_catalog(); + state.ai_unit_catalog = build_unit_catalog(); + state.ai_building_catalog = build_building_catalog(); + state.ai_difficulty_threshold_mult = 1.0; + state.grid = Some(flat_grid(24, 24, "grassland")); + + // Two combatants 5 tiles apart on the same row — within a few turns' + // march so contact is made early. Attacker gets a heavier stack so a + // killing blow is reachable (the spec's whole point is whether the LOCK + // turns a capture into an elimination, not whether a 1:1 fight stalls). + let p0 = militarist(&mut state, 6, 12, "blackhammer", attacker_warriors); + let p1 = militarist(&mut state, 11, 12, "deepforge", 2); + let ender = passive_ender(&mut state); + + // War between the two combatants (authoritative table on players[0]). + let war = RelationState { relation: Relation::War, ..Default::default() }; + state.players[0].relations.insert((p0, p1), war.clone()); + state.players[0].relations.insert((p1, p0), war); + + (state, ender) +} + +/// Per-turn engagement record for the diagnostic recap. +#[derive(Default, Clone)] +struct Probe { + ever_committed: bool, + committed_with_target: bool, + max_visible_tiles: usize, + eliminations: usize, + eliminated_slots: Vec, + final_cities: Vec, + turns_run: u32, + /// Count of `Event::CityCaptured` emitted across the whole run. The direct + /// (a)-vs-(b) discriminator: >0 ⇒ captures DO occur (real p1-29d + /// indecisive-war pathology — captures happen, refounds prevent + /// elimination); =0 ⇒ no decisive contact (geometry/harness artifact). + captures: usize, + /// City founds across the run — sprawl signal. + founds: usize, + /// Lowest combined city count seen at any turn. A dip below the starting + /// total (2) corroborates that a city was lost (captured/razed) even if it + /// was later refounded. + min_total_cities: usize, + /// Starting combined city count (for the dip comparison). + start_total_cities: usize, +} + +/// Drive the gridded fair surface for `max_turns`, recording engagement. +fn drive(max_turns: u32) -> Probe { + use mc_player_api::wire::Event; + + let (mut state, ender) = build_gridded_duel(4); + let mut probe = Probe::default(); + let combatants = [0u8, 1u8]; + probe.start_total_cities = combatants + .iter() + .map(|&c| state.players[c as usize].cities.len()) + .sum(); + probe.min_total_cities = probe.start_total_cities; + + for _ in 0..max_turns { + // Re-assert war each loop (defensive against a peace flip). + for &(a, b) in &[(0u8, 1u8), (1u8, 0u8)] { + state + .players[0] + .relations + .entry((a, b)) + .or_insert(RelationState { relation: Relation::War, ..Default::default() }) + .relation = Relation::War; + } + + let result = apply_action(&mut state, ender, &PlayerAction::EndTurn); + if let Ok(events) = &result { + for ev in events { + match ev { + Event::CityCaptured { .. } => probe.captures += 1, + Event::CityFounded { .. } => probe.founds += 1, + _ => {} + } + } + } + probe.turns_run = state.turn; + + let total_cities: usize = combatants + .iter() + .map(|&c| state.players[c as usize].cities.len()) + .sum(); + probe.min_total_cities = probe.min_total_cities.min(total_cities); + + // Vision sanity — the whole point of the grid. + let vs = mc_vision::compute_vision(&state, &mc_vision::VisionCatalog::default(), None); + for &c in &combatants { + if let Some(pv) = vs.for_player(c) { + probe.max_visible_tiles = probe.max_visible_tiles.max(pv.visible.len()); + } + } + + // Lock engagement on either combatant (persistent path). + for &c in &combatants { + let mem = &state.players[c as usize].tactical_memory; + if mem.is_committed() { + probe.ever_committed = true; + if mem.locked_target.is_some() { + probe.committed_with_target = true; + } + } + } + + // Elimination = a combatant with zero cities. + for &c in &combatants { + if state.players[c as usize].cities.is_empty() + && !probe.eliminated_slots.contains(&c) + { + probe.eliminated_slots.push(c); + } + } + } + + probe.eliminations = probe.eliminated_slots.len(); + probe.final_cities = combatants + .iter() + .map(|&c| state.players[c as usize].cities.len()) + .collect(); + probe +} + +fn report(tag: &str, probe: &Probe) { + eprintln!( + "p1-29h {tag}: visible_tiles_max={} ever_committed={} committed_with_target={} \ + captures={} founds={} eliminations={} eliminated_slots={:?} \ + start_cities={} min_total_cities={} final_cities={:?} turns_run={}", + probe.max_visible_tiles, + probe.ever_committed, + probe.committed_with_target, + probe.captures, + probe.founds, + probe.eliminations, + probe.eliminated_slots, + probe.start_total_cities, + probe.min_total_cities, + probe.final_cities, + probe.turns_run, + ); +} + +/// Phase-1-on-the-production-seam regression guard (always runs). Proves the two +/// things p1-29h bullets 1+2 require and the gridless seam guard COULD NOT show: +/// (1) a real grid yields real vision, and (2) the army-lock engages on the +/// persistent `drive_ai_slot` seam (committed with a locked target). The +/// elimination outcome is RECORDED, never asserted — the elimination bullet is +/// measured-negative (see the ignored gate below), so encoding `>=1` here would +/// hide a known-false assertion behind `cargo test`'s skip-ignored behaviour. +#[test] +fn gridded_fair_surface_engages_army_lock() { + let probe = drive(40); + report("engage-guard (40t)", &probe); + + // (1) Real grid → real vision (the fix the whole objective hinges on). + assert!( + probe.max_visible_tiles > 0, + "gridded surface must produce non-empty vision (was {})", + probe.max_visible_tiles + ); + // (2) The army-lock engages on the production persistent seam — bullets 1+2 + // capability, now exercised by a full-game driver (not just the unit test). + assert!( + probe.ever_committed && probe.committed_with_target, + "army-lock must engage (committed with a locked target); \ + ever_committed={} committed_with_target={}", + probe.ever_committed, + probe.committed_with_target + ); + assert!(probe.turns_run >= 40, "turn loop must advance at least 40 turns"); +} + +/// Phase-2 elimination MEASUREMENT (bullet 3). NOT an assertion gate — the +/// measured result on the fair two-`scripted:default` surface is 0 eliminations +/// (the army-lock targets correctly but the heuristic war stays indecisive, +/// confirming p1-29d's root cause + the spec's own weight-insensitivity risk). +/// Records the discriminating signals (captures / min-city-dip) so the next +/// investigator can see WHETHER captures occur, then asserts only the +/// always-true surface facts so the recorded negative can never silently +/// "pass" by being skipped. +/// +/// `#[ignore]` because it runs a longer game; invoke via `--ignored`. +#[test] +#[ignore = "p1-29h Phase 2 elimination measurement (recorded, not gated); invoke via --ignored"] +fn fair_scripted_duel_elimination_measurement() { + let probe = drive(160); + report("FAIR-DUEL measurement (160t)", &probe); + + // Always-true surface facts — the run is real and the lock engaged. + assert!(probe.max_visible_tiles > 0, "fair surface must produce non-empty vision"); + assert!( + probe.ever_committed && probe.committed_with_target, + "army-lock must engage for the measurement to be meaningful" + ); + // Recorded findings (NOT asserted as pass/fail — they ARE the deliverable): + // * eliminations: the measured-negative bullet-3 result. + // * captures / min_total_cities dip: the (a) indecisive-war vs + // (b) no-contact discriminator. + eprintln!( + "p1-29h finding: lock engages, eliminations={}, captures={}, \ + city-dip-below-start={}", + probe.eliminations, + probe.captures, + probe.min_total_cities < probe.start_total_cities, + ); +}