fix(@projects/@magic-civilization): 🐛 update objective status to partial

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-06-08 20:42:14 -07:00 · 2026-06-08 20:42:14 -07:00 · 18bd83bff8
commit 18bd83bff8
parent fdcf67801c
1 changed file with 17 additions and 2 deletions
--- a/.project/objectives/p1-29k.md
+++ b/.project/objectives/p1-29k.md
@@ -2,7 +2,7 @@
 id: p1-29k
 title: "Drive learned:* controllers on the autoplay (auto_play.gd) gate surface"
 priority: p1
-status: missing
+status: partial
 scope: game1
 tags: [ai, rl, controller, infra, bridge]
 owner: simulator-infra
@@ -12,7 +12,22 @@ updated_at: 2026-06-08

 Make the autoplay gate surface able to run a `learned:*` controller in a slot and emit canonical autoplay-schema `turn_stats.jsonl`, so the AI-quality gates (p1-29g) can score trained-vs-scripted.

-## STATUS — 2026-06-08 — BLOCKED (architecture mismatch; STOP-WAIT, owner decision required)
+## STATUS — 2026-06-09 — INFRASTRUCTURE BUILT & LANDED (path-2); residual blocker is policy-generalization (RL track), not engineering
+
+The operator chose **path 2** (full-engine faithful Rust `GameState` on the autoplay surface). That engineering is **complete and landed on main**, regression-safe throughout (main stayed green every increment: `learned_parity` 3/3, GUT 553/29 = baseline zero-new, default no-env path byte-identical):
+
+- **Inc 0** (`2aa0d8664`, `bfb9858d2`) — per-slot controller env hook (`AI_CONTROLLER_P{n}` primary + `CP_PLAYER_CONTROLLERS` alias, default `scripted:default`); `autoplay-batch.sh` passthrough.
+- **Inc 1** (`76239e6a7`, `f200d1d93`) — faithful Rust `GameState` construction in the autoplay world (grid + 3 `#[serde(skip)]` catalogs + units-with-movement) + read-only `learned_view_diagnostics` gate. Root-caused the axial↔offset coordinate bug; gate proved real vision/movement/production.
+- **Inc 2** (`83942f7af`, `aa9af168f`, `561c8322e`, `3683e08e3`) — DRY `drive_learned_slot_recording` split (player-API world unchanged), `GdGameState::run_learned_slot`, **action-replay writeback** through the GDScript dispatch (combat via the SAME `CombatResolver` as scripted slots — comparison-validity preserved). Gate: the trained policy moved a slot's units and **killed an enemy** via the shared engine. Founding writeback deferred (Inc-3 / p1-29j boundary).
+- **Inc 2.5 / 2.5b** (`c2c829c40`, `654d65cc1`, `e53b23441`) — obs-fidelity: real per-city yields (obs[8,9]), `player_tech` (tech-gated buildables), and player economy `gold`/`science_yield`/`culture_total` (obs[0,2,3]). **Every PlayerState-sourced obs dimension is now faithful** (verified live; the hardcoded-0 dims gold_per_turn[1]/happiness[6]/culture_per_turn[7] are bench-v1 zero in BOTH worlds).
+
+**Why the "oscillation vanishes" feature-gate is NOT met — and why obs-fidelity cannot meet it.** A **phase-controlled offline ONNX probe** (world removed; captured obs/mask morphed by magnitude with the mask fixed) proved the policy's failure to choose `EndTurn` is driven by **overall state-magnitude distribution shift, not any divergent field**: `EndTurn` only recovers when the *entire* state is shrunk to duel-early scale (score 240→35, units 13→4, food 37→4); no single dim flips it. The policy `duel:v4-encfix-s7` was trained on small early-game states and never saw mid-game multi-player magnitudes. This **corrects the earlier (A) "off-distribution encoding" verdict** as phase-confounded (it compared native-EARLY vs autoplay-MID); the true cause is magnitude generalization. Additionally, the artifact shows **no competent in-engine play anywhere** — passive natively (EndTurn iter 1, does nothing with movable units), oscillating in autoplay.
+
+**Conclusion: p1-29k's lane (the engine/dispatch/obs infrastructure) is DONE.** The remaining gap — a policy that generalizes beyond duel-early states — is a **training-distribution problem owned by the RL track / p1-29g**, not fixable by p1-29k snapshot plumbing. Inc-4's "trustworthy trained-vs-scripted smoke" is blocked on either (a) a retrain on realistic in-engine distributions, or (b) scoping the smoke to duel-early states only. **Operator fork pending.** Inc-3 (founding writeback) is independent and remains available but unstarted.
+
+Frontmatter `status: partial` — infra acceptance bullets met; the trustworthy-smoke bullet blocked on policy quality (RL).
+
+## STATUS — 2026-06-08 — BLOCKED (architecture mismatch; STOP-WAIT, owner decision required) [SUPERSEDED by the 2026-06-09 status above — the path-2 build resolved this architecture mismatch]

 Investigated end-to-end by simulator-infra (no code landed — Blocker Protocol). **The objective as written cannot be honestly implemented**: the autoplay world and the player-API/bench world are **disjoint state representations**, and the learned policy requires the player-API world's representation.