diff --git a/.project/objectives/p1-29d-p1-survival.md b/.project/objectives/p1-29d-p1-survival.md
index be8ebce9..d941bef4 100644
--- a/.project/objectives/p1-29d-p1-survival.md
+++ b/.project/objectives/p1-29d-p1-survival.md
@@ -225,6 +225,36 @@ Root-caused the surfaces before writing any fix. Verified findings:
 **Artifacts:** `tools/p1-clean-baseline.py` (symmetric-duel driver; correct city/elim output, tier_peak inert until a TechWeb surface exists — banner in its docstring).
 **Next planned step:** build `AUTO_PLAY_ALL_AI` (path a), rebuild, run the clean 2p duel batch, score D1-literal on real tier_peak data. No fix/balance code written yet (clean numbers are the prerequisite).
 
+## Status (2026-06-03, cont.²) — CLEAN baseline run: 0/10 convergence; the juice WAS the offense (structural reframe)
+
+Ran a clean 10-seed baseline (`tools/p1-clean-baseline.py`, evidence in `.local/clean_baseline.txt`) on the same 2p duel matchup as the gate (players=2, map_size=duel, map_type=pangaea, victory=domination), but with **both slots driven by the real `scripted:default` controller** through the player-api harness `suggest()`. Verified that this routes through `dispatch.rs:984 drive_controller_turn → mc_ai decide_tactical_actions`, i.e. the actual shipping mc-ai tactical pipeline, NOT a lighter heuristic. This is the gate matchup **minus the slot-0 juice**. (tier_peak is 0 here because the harness has no TechWeb, but D1 does not need it: a P1 that is alive at T100 with ≥2 cities is unambiguously non-converged regardless of era.)
+
+Results for P1 (slot 1) at T100, all 10 seeds:
+
+| seed | elim turn | alive @T100 | cities @T100 | military units @T100 |
+|---|---|---|---|---|
+| 1 | — | yes | 9 | 74 |
+| 2 | — | yes | 9 | 74 |
+| 3 | — | yes | 9 | 74 |
+| 4 | — | yes | 9 | 72 |
+| 5 | — | yes | 9 | 74 |
+| 6 | — | yes | 12 | 90 |
+| 7 | — | yes | 12 | 90 |
+| 8 | — | yes | 9 | 74 |
+| 9 | — | yes | 12 | 90 |
+| 10 | — | yes | 9 | 74 |
+
+**D1 convergence on the clean surface: 0/10.** Zero eliminations; every seed ends with P1 as a large multi-city peer (9–12 cities, 72–90 military units). Seeds genuinely vary (s6/7/9 reach 12 cities, the rest 9; s4 has 72 units vs 74 for the other 9-city seeds), which rules out a seeding artifact.
+
+**The structural finding (decisive, build-mooting):** on a fair surface the shipping mc-ai **does not conduct decisive offense**. Both sides expand to fill the map and then stand off; neither finishes the other. The convergence the gate measured on the apricot batch came **entirely from the slot-0 juice** (`auto_play.gd` rush-buy + **attack-phase commit + formation orders**), i.e. *the juice WAS the offense*. The "trailing AI" only stalled or died because it faced an artificially competent attacker that the real game does not field. This is exactly why p1-29a/b/c/d/e all reported "balance levers never moved the dial" (independently noted as Finding A).
+
+**Implications:**
+- The planned `AUTO_PLAY_ALL_AI` build (de-juice slot 0, rebuild, apricot batch) is **moot for raising D1**: it would faithfully reproduce the same 0/10, just with real tier_peak attached. (It remains the correct surface *if* a competent attacker exists in future and we want to re-measure.)
+- The four constraints (real convergence ∧ literal tp≤1 ∧ clean surface ∧ no bypass) are **jointly unsatisfiable by balance/research/production tuning.** No tuning of P1 makes a non-attacking opponent finish it, and suppressing P1's research (the only lever that yields tp≤1) is a metric-input bypass: it changes the measured input, not the game outcome.
+- The real lever is **mc-ai offensive competence**: decisive assault / siege commitment in the tactical layer (currently supplied only by the GDScript autoplay juice), or the learned-controller track (p1-29f `missing` → p1-29g). This is the AI-quality reframe p1-29g already anticipated ("separate positional from controller-strength advantage").
+
+**No balance/research/production code written.** The objective's blocker is now precisely characterized: it is an AI-capability problem, not a balance problem. Forwarded to the operator for redirection, with three options: (i) retarget p1-29d at mc-ai offensive competence (or fold it into p1-29f/g); (ii) redefine the gate to be measured against a competent reference attacker (the juiced surface, accepted as the stand-in it is); or (iii) accept that fair scripted Game-1 duels do not converge by T100 and that the "trailing AI" concept only applies against a stronger opponent.
+
 ## Why this exists separately from p1-29c
 
 p1-29c's spec is "raise priority of Settle/Defend/Research when sole-city threatened." That work landed and is correct. The empirical failure mode is "P1 doesn't survive long enough to ACT on those priorities." That's a different code surface and a different design question — it deserves its own objective.
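The D1 reading used in the clean-baseline status can be sketched as a small scorer over the per-seed rows. This is a minimal sketch, not the logic of `tools/p1-clean-baseline.py`: `SeedResult`, `classify`, `tally` and `HORIZON` are hypothetical names, and it assumes that P1 being eliminated at or before T100 counts as convergence. It encodes only the era-independent rule stated in the note (alive at T100 with ≥2 cities means non-converged) and leaves every other case undetermined, since deciding those would need real tier_peak data.

```python
from dataclasses import dataclass
from typing import Optional

HORIZON = 100  # D1 is scored at T100


@dataclass(frozen=True)
class SeedResult:
    """One P1 (slot 1) row from the clean-baseline table (hypothetical type)."""
    seed: int
    elim_turn: Optional[int]  # None = never eliminated ("—" in the table)
    alive_at_100: bool
    cities_at_100: int
    mil_at_100: int


# Rows transcribed from the clean-baseline results table.
CLEAN_BASELINE = [
    SeedResult(1, None, True, 9, 74),
    SeedResult(2, None, True, 9, 74),
    SeedResult(3, None, True, 9, 74),
    SeedResult(4, None, True, 9, 72),
    SeedResult(5, None, True, 9, 74),
    SeedResult(6, None, True, 12, 90),
    SeedResult(7, None, True, 12, 90),
    SeedResult(8, None, True, 9, 74),
    SeedResult(9, None, True, 12, 90),
    SeedResult(10, None, True, 9, 74),
]


def classify(row: SeedResult) -> str:
    """Return 'converged', 'non-converged', or 'undetermined' for one seed."""
    if row.elim_turn is not None and row.elim_turn <= HORIZON:
        # Assumption: P1 eliminated by the horizon counts as converged.
        return "converged"
    if row.alive_at_100 and row.cities_at_100 >= 2:
        # Rule stated in the note: alive with >=2 cities is non-converged in any era.
        return "non-converged"
    # Anything else would need tier_peak (tp <= 1) to decide.
    return "undetermined"


def tally(rows: list[SeedResult]) -> tuple[int, int]:
    """Count converged seeds out of all seeds."""
    converged = sum(1 for r in rows if classify(r) == "converged")
    return converged, len(rows)


if __name__ == "__main__":
    hits, total = tally(CLEAN_BASELINE)
    print(f"D1 convergence on the clean surface: {hits}/{total}")
```

Run on the ten transcribed rows, this prints `D1 convergence on the clean surface: 0/10`; every row falls under the alive-with-≥2-cities rule, so none of them even reaches the tier_peak question.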