docs(objectives): 📝 Update survival objective documentation with baseline run status and findings for P1-29D-P1
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
parent a5cf1073bd
commit f87d9ef52a
1 changed files with 30 additions and 0 deletions
@@ -225,6 +225,36 @@ Root-caused the surfaces before writing any fix. Verified findings:
**Artifacts:** `tools/p1-clean-baseline.py` (symmetric-duel driver; city and elimination output is correct, but `tier_peak` stays inert until a TechWeb surface exists, as the banner in its docstring states). **Next planned step:** build `AUTO_PLAY_ALL_AI` (path a), rebuild, run the clean 2p duel batch, and score D1-literal on real `tier_peak` data. No fix/balance code written yet (clean numbers are the prerequisite).
## Status (2026-06-03, cont.²) — CLEAN baseline run: 0/10 convergence; the juice WAS the offense (structural reframe)
Ran a clean 10-seed baseline (`tools/p1-clean-baseline.py`, evidence in `.local/clean_baseline.txt`): the same 2p duel matchup as the gate (players=2, map_size=duel, map_type=pangaea, victory=domination), but with **both slots driven by the real `scripted:default` controller** via the player-api harness `suggest()`. This was verified to route through `dispatch.rs:984 drive_controller_turn → mc_ai decide_tactical_actions`, i.e. the actual shipping mc-ai tactical pipeline, NOT a lighter heuristic. This is the gate matchup **minus the slot-0 juice**. (`tier_peak` is 0 here because the harness has no TechWeb, but D1 does not need it: a P1 alive at T100 with ≥2 cities is unambiguously non-converged regardless of era.)

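The "gate matchup minus the slot-0 juice" relationship can be sketched as a pair of configurations that differ in exactly one field. This is only an illustration: `DuelMatchup`, `differing_fields`, and the `slot0_driver`/`slot1_driver` field names are hypothetical and are not taken from `tools/p1-clean-baseline.py` or the harness. It also assumes that the gate's slot 1 already runs `scripted:default`, and labels the gate's slot-0 driver with the `auto_play.gd` script.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class DuelMatchup:
    """Hypothetical description of a 2p duel run (not the real harness type)."""
    players: int = 2
    map_size: str = "duel"
    map_type: str = "pangaea"
    victory: str = "domination"
    slot0_driver: str = "scripted:default"
    slot1_driver: str = "scripted:default"


def differing_fields(a: DuelMatchup, b: DuelMatchup) -> set[str]:
    """Return the names of the fields on which two matchups disagree."""
    return {f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)}


# Clean baseline: both slots use the same scripted controller.
clean = DuelMatchup()

# Gate surface (assumed): identical, except slot 0 is driven by the juiced autoplay.
gate = replace(clean, slot0_driver="auto_play.gd")

print(differing_fields(clean, gate))
```

If the two surfaces really differ only in the slot-0 driver, any change in convergence between them is attributable to that driver alone.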
Result (P1, slot 1, state at T100, all 10 seeds):

| seed | elimT | alive@100 | cities@100 | mil@100 |
|---|---|---|---|---|
| 1 | none | yes | 9 | 74 |
| 2 | none | yes | 9 | 74 |
| 3 | none | yes | 9 | 74 |
| 4 | none | yes | 9 | 72 |
| 5 | none | yes | 9 | 74 |
| 6 | none | yes | 12 | 90 |
| 7 | none | yes | 12 | 90 |
| 8 | none | yes | 9 | 74 |
| 9 | none | yes | 12 | 90 |
| 10 | none | yes | 9 | 74 |

**D1 convergence on the clean surface: 0/10.** Zero eliminations; every seed ends with P1 as a large multi-city peer (9–12 cities, 72–90 units). Outcomes genuinely vary by seed (s6/7/9 reach 12 cities, the rest 9; s4 has mil=72 vs 74 for the other 9-city seeds), ruling out a seeding artifact.

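As a cross-check, the 0/10 figure can be re-derived from the table using only the rule stated above (a P1 alive at T100 with ≥2 cities is non-converged regardless of era). The sketch below is not the scoring code in `tools/p1-clean-baseline.py`; the `SeedResult` type and the helper name are hypothetical, and the rows are copied by hand from the table. Seeds the rule does not exclude would still need real `tier_peak` data to score, so their count is only an upper bound on D1 convergence.

```python
from typing import NamedTuple, Optional


class SeedResult(NamedTuple):
    """One table row; hypothetical container, not the harness output type."""
    seed: int
    elim_turn: Optional[int]  # None means P1 was never eliminated
    alive_at_100: bool
    cities_at_100: int
    mil_at_100: int


# Rows transcribed from the table above.
ROWS = [
    SeedResult(1, None, True, 9, 74),
    SeedResult(2, None, True, 9, 74),
    SeedResult(3, None, True, 9, 74),
    SeedResult(4, None, True, 9, 72),
    SeedResult(5, None, True, 9, 74),
    SeedResult(6, None, True, 12, 90),
    SeedResult(7, None, True, 12, 90),
    SeedResult(8, None, True, 9, 74),
    SeedResult(9, None, True, 12, 90),
    SeedResult(10, None, True, 9, 74),
]


def is_clearly_non_converged(r: SeedResult) -> bool:
    """The era-independent rule: alive at T100 with at least 2 cities."""
    return r.alive_at_100 and r.cities_at_100 >= 2


# Seeds the rule does not exclude; only these could count toward D1.
unresolved = [r.seed for r in ROWS if not is_clearly_non_converged(r)]

# Distinct end states, to confirm that the seeds are not all identical.
distinct_outcomes = sorted({(r.cities_at_100, r.mil_at_100) for r in ROWS})

print(f"D1 convergence upper bound: {len(unresolved)}/{len(ROWS)}")
print(f"distinct (cities, mil) outcomes: {distinct_outcomes}")
```

The three distinct (cities, mil) end states are what the seed-variation argument rests on: a run that ignored the seed would produce a single outcome.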
**The structural finding (decisive, build-mooting):** on a fair surface the shipping mc-ai **does not conduct decisive offense**: both sides expand to fill the map and stand off, and neither finishes the other. The convergence the gate measured on the apricot batch came **entirely from the slot-0 juice** (`auto_play.gd` rush-buy + **attack-phase commit + formation orders**), i.e. *the juice WAS the offense*. The "trailing AI" only stalled or died because it faced an artificially competent attacker that the real game does not field. p1-29a/b/c/d/e all reported "balance levers never moved the dial" for exactly this reason (independently noted as Finding A).

**Implications:**
- The planned `AUTO_PLAY_ALL_AI` build (de-juice slot 0, rebuild, apricot batch) is **moot for raising D1**: it would faithfully reproduce this same 0/10, just with real `tier_peak` attached. (It remains the correct surface *if* a future competent attacker exists and we want to re-measure.)
- The four constraints (real convergence ∧ literal tp≤1 ∧ clean surface ∧ no bypass) are **jointly unsatisfiable by balance/research/production tuning.** No tuning of P1 makes a non-attacking opponent finish it, and suppressing P1's research (the only lever that yields tp≤1) is a metric-input bypass that changes no game state.
- The real lever is **mc-ai offensive competence**: decisive assault / siege commitment in the tactical layer (currently supplied only by the GDScript autoplay juice), or the learned-controller track (p1-29f `missing` → p1-29g). This is the AI-quality reframe p1-29g already anticipated ("separate positional from controller-strength advantage").

**No balance/research/production code written.** The objective's blocker is now precisely characterized: it is an AI-capability problem, not a balance gate. Forwarded to operator for redirection: (i) retarget p1-29d at mc-ai offensive competence (or fold it into p1-29f/g), (ii) redefine the gate to be measured against a competent reference attacker (the juiced surface, accepted as the stand-in it is), or (iii) accept that fair scripted Game-1 duels do not converge by T100 and that the "trailing AI" concept only applies vs a stronger opponent.

## Why this exists separately from p1-29c
p1-29c's spec is "raise priority of Settle/Defend/Research when sole-city threatened." That work landed and is correct. The empirical failure mode is "P1 doesn't survive long enough to ACT on those priorities." That's a different code surface and a different design question — it deserves its own objective.