docs(@projects): 📝 add handoff and objective details for p1-29k

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-06-08 03:59:39 -07:00 · c174f1868a
commit c174f1868a
parent 493702daf0
3 changed files with 78 additions and 1 deletion
--- a/.project/handoffs/20260608_warcouncil-to-simulator-infra.md
+++ b/.project/handoffs/20260608_warcouncil-to-simulator-infra.md
@@ -0,0 +1,21 @@
+# Handoff: warcouncil → simulator-infra
+
+- Date: 2026-06-08T10:57:24.237Z
+- From: warcouncil
+- To: simulator-infra
+
+---
+
+## Handoff: p1-29k filed + assigned to you — unblocks p1-29g (2026-06-08)
+
+While taking **p1-29g** (re-verify AI quality gates trained-vs-scripted / trained-vs-trained) to its in-fence ceiling, I hit a hard out-of-fence blocker and filed **p1-29k** (assigned to simulator-infra) to resolve it.
+
+**The gap:** p1-29f shipped the `learned:duel-v4-encfix-s7` bridge but scoped it to the **player-API dispatch world only** (per p1-29f's own bullet-3/5 caveats: learned `decide_turn` is identity-only on the `GdAiController` path; output is player-API JSONL, not autoplay schema). p1-29g's acceptance bullets 1-2 require driving that controller via `apricot-run.sh` → `autoplay-batch.sh` → `auto_play.gd` and scoring autoplay-schema `turn_stats.jsonl` with `tools/sole-city-gate.py`. But `auto_play.gd` has **no per-slot controller hook** and runs `mc_ai::tactical::run_ai_turn` (heuristic) — so the learned policy can't be assigned to a slot on the gate surface. Verified: no `CP_PLAYER_CONTROLLERS`/`controller_id`/learned reference in `auto_play.gd`; `autoplay-batch.sh` exposes only `AUTO_PLAY_SEED`/`AI_PIN_PERSONALITY_P*`.
+
+**Why you:** the fix touches api-gdext / mc-player-api / GDScript dispatch — your bridge surface (you built p1-29f). It is outside warcouncil's mc-ai fence.
+
+**Distinct from p1-29j** (warcouncil-owned, autoplay founding/capture → `mc_turn::processor`): p1-29k is whole-turn → `apply_action` + per-slot controller selection in the autoplay world, so `auto_play.gd` can run ANY registered controller and emit canonical autoplay stats. Sibling slices of the same autoplay→Rust seam; either can land first.
+
+**What stays in warcouncil's court:** the moment p1-29k lands, p1-29g's bullets 1-2 are a config-only batch run (assign `learned:*` to a slot, run the existing 10-seed T300 gate, score with `sole-city-gate.py`) — I'll pick that back up. Bullet 4 (the p1-29c/29e attribution) is already resolved in-fence this pass; only the trained-vs-scripted measurement is parked on you.
+
+No urgency flag — p1-29g is full-game AI-quality polish, not demo-blocking. Full detail in `.project/objectives/p1-29k.md` and p1-29g §"Verdict — 2026-06-08".
--- a/.project/objectives/p1-29g-verify-gates-trained-vs-scripted.md
+++ b/.project/objectives/p1-29g-verify-gates-trained-vs-scripted.md
@@ -12,7 +12,7 @@ evidence:
  - "2026-06-08: (b) Action-priority uplift (policy.rs:319-338 action_prior_with_context) is structurally UNREACHABLE on the autoplay gate surface — its only non-test call site is rollout.rs:329 (MCTS tree-selection PUCT), but apricot-run.sh->autoplay-batch.sh->auto_play.gd decides via GdAiController->mc_ai::tactical::run_ai_turn, documented 'heuristic + per-personality scoring, not parallel MCTS' (api-gdext/src/ai.rs:44,52; MCTS snapshot module deleted, p2-69 rerouted to run_ai_turn ai.rs:40-47). Same trap class as p1-29e F4's dead score_building science multiplier. Divergence fed back to p1-29c spec (Divergence-feedback section, 2026-06-08)."
  - "2026-06-08: Bullets 1-2 (trained-vs-scripted / trained-vs-trained) BLOCKED out-of-fence. learned:duel-v4-encfix-s7 is a player-API-world feature only (p1-29f bullet-3/5 caveats): runs in mc_player_api dispatch, NOT auto_play.gd (decide_turn identity-only on GdAiController path); emits player-API JSONL not autoplay turn_stats.jsonl. Verified: auto_play.gd has no controller_id/CP_PLAYER_CONTROLLERS/learned hook (project.godot AutoPlay=*res://engine/scenes/tests/auto_play.gd); autoplay-batch.sh exposes only AUTO_PLAY_SEED/AI_PIN_PERSONALITY_P*. Bridging needs api-gdext/mc-player-api/GDScript-dispatch edits — out of fence. Distinct from p1-29j (founding/capture->mc_turn::processor); this is whole-turn->apply_action + per-slot controller selection in the autoplay world. The gridded p1_29h_gridded_elimination.rs surface does NOT rescue it (mc-player-api/tests/ = also out of fence). See p1-29g spec section 'Verdict — 2026-06-08'."
  - ".project/objectives/p1-29g-verify-gates-trained-vs-scripted.md:Verdict — full in-fence attribution + blocker reconciliation; .project/objectives/p1-29c-sole-city-research-path.md:Divergence feedback from p1-29g"
-blocked_by: [p1-29f]
+blocked_by: [p1-29k]
 ---
 ## Context

--- a/.project/objectives/p1-29k.md
+++ b/.project/objectives/p1-29k.md
@@ -0,0 +1,56 @@
+---
+id: p1-29k
+title: "Drive learned:* controllers on the autoplay (auto_play.gd) gate surface"
+priority: p1
+status: missing
+scope: game1
+tags: [ai, rl, controller, infra, bridge]
+owner: simulator-infra
+updated_at: 2026-06-08
+---
+## Summary
+
+## Why this exists
+
+p1-29f shipped the `learned:duel-v4-encfix-s7` controller bridge, but scoped it to the **player-API dispatch world only** (`mc_player_api::apply_action`). Per p1-29f's own bullet-3/5 verification caveats: the learned controller "runs only in the `mc_player_api` dispatch world… `auto_play.gd`'s `GdAiController` path can't [host one], since the learned controller's one-shot `decide_turn` is identity-only by design," and its output is "player-API-native per-turn JSONL, not `auto_play.gd`'s `autoplay-result-schema.json` shape."
+
+p1-29g (re-verify AI quality gates trained-vs-scripted / trained-vs-trained) is **blocked by this gap**. Its acceptance bullets 1-2 specify driving the learned controller via `apricot-run.sh` and scoring with `tools/sole-city-gate.py` (autoplay-schema `turn_stats.jsonl`). But:
+
+- `apricot-run.sh` → `tools/autoplay-batch.sh` → `godot --headless --path src/game` runs `auto_play.gd` (verified: `src/game/project.godot` `AutoPlay=*res://engine/scenes/tests/auto_play.gd`).
+- `auto_play.gd` has **no** `controller_id` / `CP_PLAYER_CONTROLLERS` / learned hook — it decides via the `GdAiController` bridge → `mc_ai::tactical::run_ai_turn` (heuristic, not the learned policy). `autoplay-batch.sh` exposes only `AUTO_PLAY_SEED` / `AI_PIN_PERSONALITY_P*` envs.
+- The learned controller therefore **cannot be assigned to a slot** on the autoplay surface, and emits no autoplay-schema stats.
+
+## Distinct from p1-29j
+
+p1-29j is **founding/capture → `mc_turn::processor`** (one specific action class). THIS objective is **whole-turn → `apply_action` + per-slot controller selection in the autoplay world** — making `auto_play.gd` able to run ANY registered controller (including `learned:*`) for a given slot, and emit the canonical autoplay `turn_stats.jsonl`. They share the autoplay→Rust seam theme but are different slices; either could land first.
+
+## Acceptance
+
+- ☐ `auto_play.gd` (or its batch harness) accepts a per-slot controller assignment (e.g. `CP_PLAYER_CONTROLLERS` / `AI_CONTROLLER_P{n}` env), defaulting unchanged to `scripted:default` per slot.
+- ☐ A `learned:*`-assigned slot runs the registered learned controller's real `decide_turn` (policy inference), not the identity-only stub — through the same Rust dispatch path the player-API world uses, so behaviour matches the p1-29f parity tests.
+- ☐ The autoplay run emits canonical autoplay-schema `turn_stats.jsonl` (`autoplay-result-schema.json`) regardless of which controller drives each slot, so `tools/sole-city-gate.py` / `tools/p1-survival-score.py` score it unchanged.
+- ☐ One trained-vs-scripted autoplay game completes headless on apricot via `apricot-run.sh` with `learned:duel-v4-encfix-s7` in a slot and valid `turn_stats.jsonl` emitted (the smoke that unblocks p1-29g bullets 1-2).
+- ☐ Save-format byte-identical; GUT headless suite green; default (no env) autoplay batch unchanged.
+
+## Out of scope
+
+- The quality re-verification itself — that is p1-29g (this unblocks it).
+- Founding/capture action-application → `mc_turn::processor` — that is p1-29j (sibling slice).
+- Training / RL-loop changes.
+
+## Owner rationale
+
+Filed by warcouncil (owns p1-29g and the AI-quality cluster) and assigned to **simulator-infra**, which owns the p1-29f learned-controller bridge + the GDExtension/`api-gdext` build surface this extends. The change touches api-gdext / mc-player-api / GDScript dispatch — outside warcouncil's mc-ai fence.
+
+## References
+
+- `.project/objectives/p1-29g-verify-gates-trained-vs-scripted.md` — §"Verdict — 2026-06-08", the blocker this objective resolves.
+- `.project/objectives/p1-29f-learned-controller-bridge.md` — bullets 3/5 caveats (player-API-world-only scoping).
+- `.project/objectives/p1-29j-autoplay-rust-action-application.md` — the sibling founding/capture slice.
+- `src/game/engine/scenes/tests/auto_play.gd` — the autoplay driver lacking a controller hook.
+- `tools/autoplay-batch.sh` — exposes only seed/personality envs, no controller selection.
+- `src/simulator/api-gdext/src/ai.rs:40-52` — `run_ai_turn` heuristic driver (the GdAiController path).
+
+## Acceptance
+
+- ❌ Define acceptance criteria.
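
Illustrative note on p1-29k acceptance bullet 1 (not part of the commit): a minimal sketch of how a batch harness might resolve a per-slot controller assignment from the environment while leaving the no-env default at `scripted:default`. The `AI_CONTROLLER_P{n}` name is only the example the objective itself floats; zero-based slot numbering, the `scripted:`/`learned:` prefix check, and the function name `resolve_slot_controllers` are assumptions made here for illustration, not existing project code.

```python
import os
from typing import List, Mapping, Optional

# Default named in the p1-29k acceptance bullet.
DEFAULT_CONTROLLER = "scripted:default"
# Hypothetical: the only controller families this sketch accepts.
KNOWN_PREFIXES = ("scripted:", "learned:")


def resolve_slot_controllers(
    slot_count: int, env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return one controller id per slot, read from AI_CONTROLLER_P{n}.

    Unset or blank variables fall back to DEFAULT_CONTROLLER, so a run
    with no controller envs resolves exactly as it did before.
    """
    env = os.environ if env is None else env
    controllers = []
    for slot in range(slot_count):  # assumption: slots are numbered from 0
        raw = env.get(f"AI_CONTROLLER_P{slot}", "").strip()
        controller = raw or DEFAULT_CONTROLLER
        if not controller.startswith(KNOWN_PREFIXES):
            # Fail loudly rather than silently running the heuristic AI.
            raise ValueError(f"slot {slot}: unknown controller id {controller!r}")
        controllers.append(controller)
    return controllers
```

Passing the environment in as a mapping keeps the default path testable without touching the real process environment; a run with `AI_CONTROLLER_P1=learned:duel-v4-encfix-s7` would assign the learned controller to slot 1 only and leave every other slot scripted.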