docs(objectives): 📝 revise survival and learned controller bridge objectives documentation

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-06-03 04:06:43 -07:00 · f2494b6f6f
commit f2494b6f6f
parent 05795cc3f3
2 changed files with 67 additions and 7 deletions
--- a/.project/objectives/p1-29d-p1-survival.md
+++ b/.project/objectives/p1-29d-p1-survival.md
@@ -255,6 +255,20 @@ Result — P1 (slot 1) state at T100, all 10 seeds:

 **No balance/research/production code written.** The objective's blocker is now precisely characterized and is an AI-capability problem, not a balance gate. Forwarded to operator for redirection: (i) retarget p1-29d at mc-ai offensive competence (or fold into p1-29f/g), (ii) redefine the gate to be measured against a competent reference attacker (the juiced surface, accepted as the stand-in it is), or (iii) accept that fair scripted Game-1 duels do not converge by T100 and the "trailing AI" concept only applies vs a stronger opponent.

+### Refinement (2026-06-03) — it is NOT "the AI won't attack"; the war is INDECISIVE (contact+combat probe, omniscient)
+
+Adversarial check: does contact/combat happen at all, or is the 0/10 result a no-contact map artifact? An omniscient probe of seeds 1/6/7 tracked inter-player distance, unit losses, and city captures (evidence: `.local/p1-29d-contact-evidence.txt`):
+
+| seed | capdist | min inter-player dist ever | P0 unit loss | P1 unit loss | P0 lost a city | P1 lost a city |
+|---|---|---|---|---|---|---|
+| 1 | 12.0 | 0.0 | yes | yes | no | no |
+| 6 | 10.6 | 0.0 | yes | yes | no | **yes** |
+| 7 | 9.4 | 0.0 | yes | yes | no | **yes** |
+
+So on the clean surface the two civs **DO make contact** (adjacent/same-tile, dist→0 every seed), **DO fight** (both lose units every seed), and cities **DO get captured** (P1 lost a city in s6/s7) — yet **neither is ever eliminated**: the loser refounds/recovers and both empires keep growing to 8–10+ cities by T90. The earlier "AI doesn't conduct offense" phrasing is therefore **wrong/too strong**; the accurate finding is: **the fair scripted mc-ai wages an *indecisive* war — it skirmishes and even captures, but cannot deliver a killing blow before the loser recovers.** The slot-0 juice (rush-buy + attack-phase commit + formation orders) supplied the *decisiveness / siege-conversion tempo* that turns a capture into an elimination; remove it and captures are traded, not finished. That is why the whole p1-29 family's P1-side balance levers "never moved the dial": no buff to P1 induces convergence when the *opponent* cannot close out a win.
+
+**Scope discipline:** measured against the `scripted:default` tactical controller via `suggest()` (dispatch.rs:984), **corroborated** by the apricot batch's slot-1 (plain-AI) behaviour — it never eliminated the juiced P0 and survived only as an ignored zombie. Not a blanket claim about the full MCTS+tactical pipeline (the batch AI slots also run MCTS; `mcts_*` stats), which this probe did not isolate. **Caveat:** the harness map is not fully gate-faithful — `player_api_main.gd:127 gen.generate(seed_v, map_size)` ignores `map_type` (so it is NOT pangaea) and uses spaced capitals; a map-faithful confirmation still wants the `AUTO_PLAY_ALL_AI` build on pangaea. But pangaea is *more* connected than the spaced-capital harness map, so it would if anything produce *more* contact, not less — the "indecisive war, no elimination" conclusion is unlikely to flip. **The lever is mc-ai siege/assault decisiveness, not balance.**
+
 ## Why this exists separately from p1-29c

 p1-29c's spec is "raise priority of Settle/Defend/Research when sole-city threatened." That work landed and is correct. The empirical failure mode is "P1 doesn't survive long enough to ACT on those priorities." That's a different code surface and a different design question — it deserves its own objective.
--- a/.project/objectives/p1-29f-learned-controller-bridge.md
+++ b/.project/objectives/p1-29f-learned-controller-bridge.md
@@ -2,11 +2,11 @@
 id: p1-29f
 title: "learned:* controller bridge — make the trained RL policy playable in-engine"
 priority: p1
-status: missing
+status: done
 scope: game1
 owner: simulator-infra
 tags: [ai, rl, controller, infra]
-updated_at: 2026-05-27
+updated_at: 2026-06-03
 relates_to: [p1-29e, p2-67]
 blocked_by: []
 ---
@@ -38,7 +38,7 @@ a registered `learned:*` `AiController`.

 ## Acceptance

-- [ ] **Runtime-loadable policy artifact.** Export `duel-v4-encfix-s7` from the
+- [x] **Runtime-loadable policy artifact.** Export `duel-v4-encfix-s7` from the
  SB3 `.zip` to a form the engine can evaluate without a Python runtime
  (recommended: ONNX of the policy network, loaded by `mc-ai` inference; or a
  documented sidecar process if in-proc inference is rejected). The encoder
@@ -47,19 +47,19 @@ a registered `learned:*` `AiController`.
  F1) must be reimplemented or shared so the in-engine observation matches
  training exactly — a `PlayerView` → obs-tensor mapping with a parity test
  against the Python encoder.
-- [ ] **`AiController` impl + registration.** A controller keyed
+- [x] **`AiController` impl + registration.** A controller keyed
  `learned:duel-v4-encfix-s7` implementing the `AiController` trait, registered
  via `register_controller` at engine init (alongside `scripted:default`), so
  it appears in `registered_ids()` / `GdGameState.registered_controller_ids()`.
-- [ ] **Per-slot selection from autoplay.** An env hook (e.g.
+- [x] **Per-slot selection from autoplay.** An env hook (e.g.
  `AI_CONTROLLER_P0` / `AI_CONTROLLER_P1`, or a generalised `AI_CONTROLLERS`
  list) that populates `game_settings.ai_controllers` so a batch can assign a
  learned controller to any slot. Default unchanged (`scripted:default`).
-- [ ] **Parity / determinism test.** Given a fixed `PlayerView`, the in-engine
+- [x] **Parity / determinism test.** Given a fixed `PlayerView`, the in-engine
  learned controller produces the same action distribution (argmax + top-k) as
  the Python policy on the same observation. Headless GUT-compatible or a Rust
  unit test against a recorded fixture.
-- [ ] **Smoke run.** One trained-vs-scripted autoplay game completes headless
+- [x] **Smoke run.** One trained-vs-scripted autoplay game completes headless
  with `learned:duel-v4-encfix-s7` in a slot and emits valid `turn_stats.jsonl`
  (proves the dispatch path works end-to-end — not a quality claim).

@@ -81,3 +81,49 @@ a registered `learned:*` `AiController`.
  programmatic, one-action-at-a-time player can drive a real game; that is a
  *different* external-process surface, but the action-application plumbing may
  be reusable.
+
+## Verification (landed 2026-06-03, branch `work/p1-29f`)
+
+All five acceptance bullets verified end-to-end. Evidence:
+
+1. **Runtime-loadable artifact + encoder parity.** ONNX at
+   `public/games/age-of-dwarves/data/ai/duel-v4-encfix-s7.onnx`, loaded by
+   `mc-player-api/src/learned/inference.rs` via pure-Rust `tract-onnx` (no
+   Python). Encoder reimpl in `learned/encoder.rs`. `cargo test -p
+   mc-player-api --test learned_parity` → `learned_encoder_parity` checks 60
+   fixtures: obs match to within 1e-4, action mask **bit-exact** vs the Python
+   encoder.
+2. **Registration in `registered_ids()`.** `register_learned_controllers()`
+   fires in `MagicCivPhysicsExtension::on_level_init(InitLevel::Scene)`. A
+   headless-Godot boot probe (`GdGameState.registered_controller_ids()`)
+   returns `learned:duel-v4-encfix-s7, scripted:default`.
+3. **Per-slot env selection.** Implemented via the **generalised-list** form
+   `CP_PLAYER_CONTROLLERS` (comma list, slot-ordered) → `set_player_controller`
+   in `player_api_main.gd`. Smoke boot assigned slot 1 → `learned:…`, slot 2 →
+   `scripted:default`; with no env set, slots default to `scripted:default`
+   (default unchanged). *Caveat:* this stamps `players[slot].controller_id`
+   directly rather than populating `game_settings.ai_controllers` (the
+   game-setup-UI field); the learned controller runs only in the
+   `mc_player_api` dispatch world, which is the path that actually executes the
+   policy — `auto_play.gd`'s `GdAiController` path can't, since the learned
+   controller's one-shot `decide_turn` is identity-only by design.
+4. **Parity / determinism.** `learned_policy_parity` → 60/60 decisive fixtures,
+   argmax + top-5 ordering match the Python policy exactly, logits within 1e-3;
+   `learned_decide_action_end_to_end_determinism` → 60 fixtures stable +
+   argmax-correct. Full crate suite: `cargo test -p mc-player-api -p mc-ai`
+   all green (265 lib + every integration bin, incl. `learned_parity` 3/3).
+5. **Trained-vs-scripted smoke.** 3-player headless game via the player-API
+   harness (slot 0 external pass-driver, slot 1 `learned:duel-v4-encfix-s7`,
+   slot 2 `scripted:default`), 30 turns, no crash. Learned slot **loaded its
+   policy** (no `[learned] policy load failed`) and **applied real actions on 9
+   of 30 turns**; per-turn stats emitted as JSONL. *Caveat:* the artifact is a
+   player-API-native per-turn JSONL, not `auto_play.gd`'s
+   `autoplay-result-schema.json` shape — learned controllers are architecturally
+   a player-API-world feature, so `auto_play.gd` (the canonical
+   `turn_stats.jsonl` emitter) cannot host one. The bullet's stated purpose —
+   "proves the dispatch path works end-to-end" — is met.
+
+*Landing note:* `scripts/player-api-server.sh` gained `MC_DATA_ROOT` /
+`MC_LEARNED_POLICY_PATH` flatpak `--env` passthrough — without it the in-sandbox
+`.so` cannot resolve the ONNX path and the learned slot silently passes every
+turn. Unblocks **p1-29g**.
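
Reviewer sketch (not part of the commit): item 3 of the Verification section describes `CP_PLAYER_CONTROLLERS` as a comma-separated, slot-ordered list, with unset slots falling back to `scripted:default`. The Rust below is a minimal illustration of that resolution rule only. The real hook lives in `player_api_main.gd` and calls `set_player_controller`; the names `resolve_slot_controllers` and `DEFAULT_CONTROLLER`, the explicit `slot_count` parameter, and the choice to treat an empty entry as "keep the default" are all assumptions made for this example.

```rust
/// Fallback controller id, as stated in the verification notes.
const DEFAULT_CONTROLLER: &str = "scripted:default";

/// Hypothetical helper: expand a comma-separated, slot-ordered list into one
/// controller id per slot. Slots without an entry keep the default.
/// Assumption: an empty entry (e.g. a leading comma) also keeps the default,
/// and entries beyond `slot_count` are ignored.
fn resolve_slot_controllers(raw: Option<&str>, slot_count: usize) -> Vec<String> {
    // Start with every slot on the default controller.
    let mut ids = vec![DEFAULT_CONTROLLER.to_string(); slot_count];

    if let Some(list) = raw {
        // Position in the list is the slot index.
        for (slot, entry) in list.split(',').enumerate().take(slot_count) {
            let id = entry.trim();
            if !id.is_empty() {
                ids[slot] = id.to_string();
            }
        }
    }
    ids
}
```

Under these assumptions, a caller would pass `std::env::var("CP_PLAYER_CONTROLLERS").ok().as_deref()` together with the number of player slots, then apply each resolved id to its slot. Keeping the default as the starting state means an unset variable leaves behaviour unchanged, which matches the "default unchanged" requirement in the acceptance list.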