docs(objectives): 📝 revise survival and learned controller bridge objectives documentation
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
parent 05795cc3f3
commit f2494b6f6f
2 changed files with 67 additions and 7 deletions
@@ -255,6 +255,20 @@ Result — P1 (slot 1) state at T100, all 10 seeds:

**No balance/research/production code written.** The objective's blocker is now precisely characterized and is an AI-capability problem, not a balance gate. Forwarded to operator for redirection: (i) retarget p1-29d at mc-ai offensive competence (or fold into p1-29f/g), (ii) redefine the gate to be measured against a competent reference attacker (the juiced surface, accepted as the stand-in it is), or (iii) accept that fair scripted Game-1 duels do not converge by T100 and the "trailing AI" concept only applies vs a stronger opponent.
### Refinement (2026-06-03) — it is NOT "the AI won't attack"; the war is INDECISIVE (contact+combat probe, omniscient)
Adversarial check: does contact/combat happen at all, or is the 0/10 result a no-contact map artifact? Probed seeds 1, 6, and 7 with an omniscient view, tracking inter-player distance, unit losses, and city captures (evidence: `.local/p1-29d-contact-evidence.txt`):

| seed | capital distance (`capdist`) | min inter-player distance (any turn) | P0 lost units | P1 lost units | P0 lost a city | P1 lost a city |
|---|---|---|---|---|---|---|
| 1 | 12.0 | 0.0 | yes | yes | no | no |
| 6 | 10.6 | 0.0 | yes | yes | no | **yes** |
| 7 | 9.4 | 0.0 | yes | yes | no | **yes** |

So on the clean surface the two civs **DO make contact** (adjacent or same-tile; distance→0 on every seed), **DO fight** (both sides lose units on every seed), and cities **DO get captured** (P1 lost a city in s6 and s7). Yet **neither is ever eliminated**: the loser refounds or recovers, and both empires keep growing to 8–10+ cities by T90.

The earlier "AI doesn't conduct offense" phrasing is therefore **wrong/too strong**. The accurate finding is: **the fair scripted mc-ai wages an *indecisive* war; it skirmishes and even captures, but cannot deliver a killing blow before the loser recovers.** The slot-0 juice (rush-buy + attack-phase commit + formation orders) supplied the *decisiveness / siege-conversion tempo* that turns a capture into an elimination; remove it, and captures are traded rather than finished. That is why the P1-side balance levers across the whole p1-29 family "never moved the dial": no buff to P1 induces convergence when the *opponent* cannot close out a win.

**Scope discipline:** this was measured against the `scripted:default` tactical controller via `suggest()` (dispatch.rs:984) and is **corroborated** by the apricot batch's slot-1 (plain-AI) behaviour: that slot never eliminated the juiced P0 and survived only as an ignored zombie. It is not a blanket claim about the full MCTS+tactical pipeline (the batch AI slots also run MCTS, per the `mcts_*` stats), which this probe did not isolate.

**Caveat:** the harness map is not fully gate-faithful. `player_api_main.gd:127 gen.generate(seed_v, map_size)` ignores `map_type` (so the map is NOT pangaea) and uses spaced capitals; a map-faithful confirmation still needs the `AUTO_PLAY_ALL_AI` build on pangaea. But pangaea is *more* connected than the spaced-capital harness map, so if anything it would produce *more* contact, not less, and the "indecisive war, no elimination" conclusion is unlikely to flip. **The lever is mc-ai siege/assault decisiveness, not balance.**
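The refinement's argument is a ladder of checks: was there contact, was there combat, and did anyone get eliminated? Below is a minimal Rust sketch of that ladder. It is built on a hypothetical `ProbeRecord` whose fields mirror the table columns. Two parts are assumptions. The `eliminated` flag is added because the table has no elimination column; the text only reports that neither side was eliminated. The contact threshold is also assumed, since the probe's distance units are not stated.

```rust
/// Hypothetical per-seed record; fields mirror the contact-probe table columns.
#[derive(Debug, Clone, Copy)]
struct ProbeRecord {
    seed: u32,
    /// Minimum inter-player distance observed on any turn.
    min_dist: f64,
    p0_lost_units: bool,
    p1_lost_units: bool,
    p0_lost_city: bool,
    p1_lost_city: bool,
    /// Assumed extra flag: was either player eliminated before T100?
    eliminated: bool,
}

#[derive(Debug, PartialEq, Eq)]
enum WarOutcome {
    /// Never in contact: a 0/10 result would then be a map artifact.
    NoContact,
    /// In contact, but no unit losses on either side.
    ContactNoCombat,
    /// Units (and possibly cities) traded, but nobody eliminated.
    Indecisive { city_captured: bool },
    /// At least one player eliminated.
    Decisive,
}

/// `contact_dist` is an assumed threshold at or below which the civs count as in contact.
fn classify(r: &ProbeRecord, contact_dist: f64) -> WarOutcome {
    if r.eliminated {
        WarOutcome::Decisive
    } else if r.min_dist > contact_dist {
        WarOutcome::NoContact
    } else if !(r.p0_lost_units || r.p1_lost_units) {
        WarOutcome::ContactNoCombat
    } else {
        WarOutcome::Indecisive {
            city_captured: r.p0_lost_city || r.p1_lost_city,
        }
    }
}

/// The three probed seeds from the table; `eliminated` is false per the text.
fn probed_seeds() -> [ProbeRecord; 3] {
    let row = |seed: u32, p1_lost_city: bool| ProbeRecord {
        seed,
        min_dist: 0.0,
        p0_lost_units: true,
        p1_lost_units: true,
        p0_lost_city: false,
        p1_lost_city,
        eliminated: false,
    };
    [row(1, false), row(6, true), row(7, true)]
}
```

Under these assumptions all three probed seeds classify as `Indecisive`, with a capture on seeds 6 and 7. That is the finding above restated: contact and combat happen, and the missing step is the elimination.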
## Why this exists separately from p1-29c
p1-29c's spec is "raise priority of Settle/Defend/Research when sole-city threatened." That work landed and is correct. The empirical failure mode is "P1 doesn't survive long enough to ACT on those priorities." That's a different code surface and a different design question — it deserves its own objective.

@@ -2,11 +2,11 @@
id: p1-29f
title: "learned:* controller bridge — make the trained RL policy playable in-engine"
priority: p1
-status: missing
+status: done
scope: game1
owner: simulator-infra
tags: [ai, rl, controller, infra]
-updated_at: 2026-05-27
+updated_at: 2026-06-03
relates_to: [p1-29e, p2-67]
blocked_by: []
---

@@ -38,7 +38,7 @@ a registered `learned:*` `AiController`.
## Acceptance
-- [ ] **Runtime-loadable policy artifact.** Export `duel-v4-encfix-s7` from the
+- [x] **Runtime-loadable policy artifact.** Export `duel-v4-encfix-s7` from the
SB3 `.zip` to a form the engine can evaluate without a Python runtime
(recommended: ONNX of the policy network, loaded by `mc-ai` inference; or a
documented sidecar process if in-proc inference is rejected). The encoder

@@ -47,19 +47,19 @@ a registered `learned:*` `AiController`.
F1) must be reimplemented or shared so the in-engine observation matches
training exactly — a `PlayerView` → obs-tensor mapping with a parity test
against the Python encoder.
-- [ ] **`AiController` impl + registration.** A controller keyed
+- [x] **`AiController` impl + registration.** A controller keyed
`learned:duel-v4-encfix-s7` implementing the `AiController` trait, registered
via `register_controller` at engine init (alongside `scripted:default`), so
it appears in `registered_ids()` / `GdGameState.registered_controller_ids()`.
-- [ ] **Per-slot selection from autoplay.** An env hook (e.g.
+- [x] **Per-slot selection from autoplay.** An env hook (e.g.
`AI_CONTROLLER_P0` / `AI_CONTROLLER_P1`, or a generalised `AI_CONTROLLERS`
list) that populates `game_settings.ai_controllers` so a batch can assign a
learned controller to any slot. Default unchanged (`scripted:default`).
-- [ ] **Parity / determinism test.** Given a fixed `PlayerView`, the in-engine
+- [x] **Parity / determinism test.** Given a fixed `PlayerView`, the in-engine
learned controller produces the same action distribution (argmax + top-k) as
the Python policy on the same observation. Headless GUT-compatible or a Rust
unit test against a recorded fixture.
-- [ ] **Smoke run.** One trained-vs-scripted autoplay game completes headless
+- [x] **Smoke run.** One trained-vs-scripted autoplay game completes headless
with `learned:duel-v4-encfix-s7` in a slot and emits valid `turn_stats.jsonl`
(proves the dispatch path works end-to-end — not a quality claim).
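The registration and per-slot selection bullets describe two pieces. The first is a registry keyed by controller id. The second is a slot-ordered override list that falls back to `scripted:default`. Below is a minimal Rust sketch of both, under stated assumptions:

- `AiController` here is a hypothetical one-method stand-in, because the real trait's surface is not shown in this objective.
- `Registry`, `resolve_slots`, and `Named` are illustrative names, not the engine's `register_controller` / `registered_ids()` API.
- Falling back silently to the default for unknown ids is a design assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the engine's `AiController` trait; only an id
/// accessor is modelled here, decision methods are omitted.
trait AiController {
    fn id(&self) -> &str;
}

/// Illustrative controller that carries nothing but its id.
struct Named(String);

impl AiController for Named {
    fn id(&self) -> &str {
        &self.0
    }
}

const DEFAULT_CONTROLLER: &str = "scripted:default";

/// Illustrative registry keyed by controller id. A BTreeMap keeps the id
/// listing sorted, so it is stable across runs.
#[derive(Default)]
struct Registry {
    controllers: BTreeMap<String, Box<dyn AiController>>,
}

impl Registry {
    fn register(&mut self, controller: Box<dyn AiController>) {
        self.controllers.insert(controller.id().to_string(), controller);
    }

    fn registered_ids(&self) -> Vec<&str> {
        self.controllers.keys().map(String::as_str).collect()
    }

    /// Map a comma-separated, slot-ordered list onto `num_slots` slots.
    /// Missing, empty, or unregistered entries fall back to the default
    /// (assumed behaviour), so an absent list leaves every slot unchanged.
    fn resolve_slots(&self, list: Option<&str>, num_slots: usize) -> Vec<String> {
        let entries: Vec<&str> = list
            .map(|s| s.split(',').map(str::trim).collect::<Vec<&str>>())
            .unwrap_or_default();
        (0..num_slots)
            .map(|slot| match entries.get(slot) {
                Some(id) if !id.is_empty() && self.controllers.contains_key(*id) => {
                    id.to_string()
                }
                _ => DEFAULT_CONTROLLER.to_string(),
            })
            .collect()
    }
}
```

Falling back silently matches the "default unchanged" requirement, but it also hides typos in the list. Failing loudly on an unregistered id is the other reasonable choice.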

@@ -81,3 +81,49 @@ a registered `learned:*` `AiController`.
programmatic, one-action-at-a-time player can drive a real game; that is a
*different* external-process surface, but the action-application plumbing may
be reusable.
## Verification (landed 2026-06-03, branch `work/p1-29f`)
All five acceptance bullets verified end-to-end. Evidence:
1. **Runtime-loadable artifact + encoder parity.** ONNX at
`public/games/age-of-dwarves/data/ai/duel-v4-encfix-s7.onnx`, loaded by
`mc-player-api/src/learned/inference.rs` via pure-Rust `tract-onnx` (no
Python). Encoder reimplemented in `learned/encoder.rs`.
`cargo test -p mc-player-api --test learned_parity` → `learned_encoder_parity`
checks 60 fixtures: observations match within 1e-4 and the action mask is
**bit-exact** vs the Python encoder.
2. **Registration in `registered_ids()`.** `register_learned_controllers()`
fires in `MagicCivPhysicsExtension::on_level_init(InitLevel::Scene)`. A
headless-Godot boot probe (`GdGameState.registered_controller_ids()`)
returns `learned:duel-v4-encfix-s7, scripted:default`.
3. **Per-slot env selection.** Implemented via the **generalised-list** form
`CP_PLAYER_CONTROLLERS` (comma list, slot-ordered) → `set_player_controller`
in `player_api_main.gd`. Smoke boot assigned slot 1 → `learned:…`, slot 2 →
`scripted:default`; with no env set, slots default to `scripted:default`
(default unchanged). *Caveat:* this stamps `players[slot].controller_id`
directly rather than populating `game_settings.ai_controllers` (the
game-setup-UI field); the learned controller runs only in the
`mc_player_api` dispatch world, which is the path that actually executes the
policy — `auto_play.gd`'s `GdAiController` path can't, since the learned
controller's one-shot `decide_turn` is identity-only by design.
4. **Parity / determinism.** `learned_policy_parity` → 60/60 decisive fixtures:
argmax + top-5 ordering match the Python policy exactly, logits within 1e-3;
`learned_decide_action_end_to_end_determinism` → 60 fixtures stable +
argmax-correct. Full crate suite: `cargo test -p mc-player-api -p mc-ai`
all green (265 lib tests + every integration test binary, incl. `learned_parity` 3/3).
5. **Trained-vs-scripted smoke.** 3-player headless game via the player-API
harness (slot 0 external pass-driver, slot 1 `learned:duel-v4-encfix-s7`,
slot 2 `scripted:default`), 30 turns, no crash. Learned slot **loaded its
policy** (no `[learned] policy load failed`) and **applied real actions on 9
of 30 turns**; per-turn stats emitted as JSONL. *Caveat:* the artifact is a
player-API-native per-turn JSONL, not `auto_play.gd`'s
`autoplay-result-schema.json` shape — learned controllers are architecturally
a player-API-world feature, so `auto_play.gd` (the canonical
`turn_stats.jsonl` emitter) cannot host one. The bullet's stated purpose —
"proves the dispatch path works end-to-end" — is met.
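Items 1 and 4 reduce parity to a set of concrete checks:

- observations within 1e-4;
- a bit-exact action mask;
- identical argmax and top-5 ordering;
- logits within 1e-3.

Below is a minimal Rust sketch of those checks. It assumes both sides have already been flattened to `f32` observations/logits and `bool` masks. `top_k`, `check_encoder`, and `check_policy` are hypothetical names, not the `learned_parity` test code. Breaking top-k ties by lower index is also an assumption.

```rust
/// Indices of the `k` largest logits, highest first. Ties are broken by the
/// lower index (an assumption; the real test's tie rule is not stated).
fn top_k(logits: &[f32], k: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..logits.len()).collect();
    // `total_cmp` is a total order over f32 (NaN included), so sorting never panics.
    idx.sort_by(|&a, &b| logits[b].total_cmp(&logits[a]).then(a.cmp(&b)));
    idx.truncate(k);
    idx
}

/// Largest element-wise |a - b|; a NaN difference propagates so it cannot pass.
fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).abs())
        .fold(0.0_f32, |m, d| if d.is_nan() || d > m { d } else { m })
}

#[derive(Debug, PartialEq)]
enum ParityError {
    LengthMismatch,
    ObsOutOfTolerance(f32),
    MaskMismatchAt(usize),
    TopKOrder { engine: Vec<usize>, python: Vec<usize> },
    LogitsOutOfTolerance(f32),
}

/// Encoder parity: observations strictly within `obs_tol`, masks bit-exact.
fn check_encoder(
    obs_engine: &[f32],
    obs_python: &[f32],
    mask_engine: &[bool],
    mask_python: &[bool],
    obs_tol: f32,
) -> Result<(), ParityError> {
    if obs_engine.len() != obs_python.len() || mask_engine.len() != mask_python.len() {
        return Err(ParityError::LengthMismatch);
    }
    let d = max_abs_diff(obs_engine, obs_python);
    if !(d < obs_tol) {
        return Err(ParityError::ObsOutOfTolerance(d));
    }
    match mask_engine.iter().zip(mask_python).position(|(a, b)| a != b) {
        Some(i) => Err(ParityError::MaskMismatchAt(i)),
        None => Ok(()),
    }
}

/// Policy parity: identical top-k ordering (element 0 is the argmax) and
/// logits within `logit_tol`.
fn check_policy(engine: &[f32], python: &[f32], k: usize, logit_tol: f32) -> Result<(), ParityError> {
    if engine.len() != python.len() {
        return Err(ParityError::LengthMismatch);
    }
    let k = k.max(1); // always compare at least the argmax
    let (te, tp) = (top_k(engine, k), top_k(python, k));
    if te != tp {
        return Err(ParityError::TopKOrder { engine: te, python: tp });
    }
    let d = max_abs_diff(engine, python);
    if !(d <= logit_tol) {
        return Err(ParityError::LogitsOutOfTolerance(d));
    }
    Ok(())
}
```

Checking top-k ordering in addition to the logit tolerance matters because two near-tied logits can swap order while both stay within 1e-3, and that swap would change the chosen action.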

*Landing note:* `scripts/player-api-server.sh` gained `MC_DATA_ROOT` /
`MC_LEARNED_POLICY_PATH` flatpak `--env` passthrough — without it the in-sandbox
`.so` cannot resolve the ONNX path and the learned slot silently passes every
turn. Unblocks **p1-29g**.