From c174f1868ad9ef833bdbfe65932eba34847041ba Mon Sep 17 00:00:00 2001
From: Natalie
Date: Mon, 8 Jun 2026 03:59:39 -0700
Subject: [PATCH] =?UTF-8?q?docs(@projects):=20=F0=9F=93=9D=20add=20handoff?=
 =?UTF-8?q?=20and=20objective=20details=20for=20p1-29k?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Lilith Autocommit
---
 .../20260608_warcouncil-to-simulator-infra.md | 21 +++++++
 ...p1-29g-verify-gates-trained-vs-scripted.md | 2 +-
 .project/objectives/p1-29k.md | 56 +++++++++++++++++++
 3 files changed, 78 insertions(+), 1 deletion(-)
 create mode 100644 .project/handoffs/20260608_warcouncil-to-simulator-infra.md
 create mode 100644 .project/objectives/p1-29k.md

diff --git a/.project/handoffs/20260608_warcouncil-to-simulator-infra.md b/.project/handoffs/20260608_warcouncil-to-simulator-infra.md
new file mode 100644
index 00000000..4d808917
--- /dev/null
+++ b/.project/handoffs/20260608_warcouncil-to-simulator-infra.md
@@ -0,0 +1,21 @@
+# Handoff: warcouncil → simulator-infra
+
+- Date: 2026-06-08T10:57:24.237Z
+- From: warcouncil
+- To: simulator-infra
+
+---
+
+## Handoff: p1-29k filed + assigned to you — unblocks p1-29g (2026-06-08)
+
+While taking **p1-29g** (re-verify AI quality gates trained-vs-scripted / trained-vs-trained) to its in-fence ceiling, I hit a hard out-of-fence blocker and filed **p1-29k** (assigned to simulator-infra) to resolve it.
+
+**The gap:** p1-29f shipped the `learned:duel-v4-encfix-s7` bridge but scoped it to the **player-API dispatch world only** (per p1-29f's own bullet-3/5 caveats: learned `decide_turn` is identity-only on the `GdAiController` path; output is player-API JSONL, not autoplay schema). p1-29g's acceptance bullets 1-2 require driving that controller via `apricot-run.sh` → `autoplay-batch.sh` → `auto_play.gd` and scoring autoplay-schema `turn_stats.jsonl` with `tools/sole-city-gate.py`.
But `auto_play.gd` has **no per-slot controller hook** and runs `mc_ai::tactical::run_ai_turn` (heuristic) — so the learned policy can't be assigned to a slot on the gate surface. Verified: no `CP_PLAYER_CONTROLLERS`/`controller_id`/learned reference in `auto_play.gd`; `autoplay-batch.sh` exposes only `AUTO_PLAY_SEED`/`AI_PIN_PERSONALITY_P*`.
+
+**Why you:** the fix touches api-gdext / mc-player-api / GDScript dispatch — your bridge surface (you built p1-29f). It is outside warcouncil's mc-ai fence.
+
+**Distinct from p1-29j** (warcouncil-owned, autoplay founding/capture → `mc_turn::processor`): p1-29k is whole-turn → `apply_action` + per-slot controller selection in the autoplay world, so `auto_play.gd` can run ANY registered controller and emit canonical autoplay stats. Sibling slices of the same autoplay→Rust seam; either can land first.
+
+**What stays in warcouncil's court:** the moment p1-29k lands, p1-29g's bullets 1-2 are a config-only batch run (assign `learned:*` to a slot, run the existing 10-seed T300 gate, score with `sole-city-gate.py`) — I'll pick that back up. Bullet 4 (the p1-29c/29e attribution) is already resolved in-fence this pass; only the trained-vs-scripted measurement is parked on you.
+
+No urgency flag — p1-29g is full-game AI-quality polish, not demo-blocking. Full detail in `.project/objectives/p1-29k.md` and p1-29g §"Verdict — 2026-06-08".
diff --git a/.project/objectives/p1-29g-verify-gates-trained-vs-scripted.md b/.project/objectives/p1-29g-verify-gates-trained-vs-scripted.md
index 4cfa3331..1eb482d9 100644
--- a/.project/objectives/p1-29g-verify-gates-trained-vs-scripted.md
+++ b/.project/objectives/p1-29g-verify-gates-trained-vs-scripted.md
@@ -12,7 +12,7 @@ evidence:
 - "2026-06-08: (b) Action-priority uplift (policy.rs:319-338 action_prior_with_context) is structurally UNREACHABLE on the autoplay gate surface — its only non-test call site is rollout.rs:329 (MCTS tree-selection PUCT), but apricot-run.sh->autoplay-batch.sh->auto_play.gd decides via GdAiController->mc_ai::tactical::run_ai_turn, documented 'heuristic + per-personality scoring, not parallel MCTS' (api-gdext/src/ai.rs:44,52; MCTS snapshot module deleted, p2-69 rerouted to run_ai_turn ai.rs:40-47). Same trap class as p1-29e F4's dead score_building science multiplier. Divergence fed back to p1-29c spec (Divergence-feedback section, 2026-06-08)."
 - "2026-06-08: Bullets 1-2 (trained-vs-scripted / trained-vs-trained) BLOCKED out-of-fence. learned:duel-v4-encfix-s7 is a player-API-world feature only (p1-29f bullet-3/5 caveats): runs in mc_player_api dispatch, NOT auto_play.gd (decide_turn identity-only on GdAiController path); emits player-API JSONL not autoplay turn_stats.jsonl. Verified: auto_play.gd has no controller_id/CP_PLAYER_CONTROLLERS/learned hook (project.godot AutoPlay=*res://engine/scenes/tests/auto_play.gd); autoplay-batch.sh exposes only AUTO_PLAY_SEED/AI_PIN_PERSONALITY_P*. Bridging needs api-gdext/mc-player-api/GDScript-dispatch edits — out of fence. Distinct from p1-29j (founding/capture->mc_turn::processor); this is whole-turn->apply_action + per-slot controller selection in the autoplay world. The gridded p1_29h_gridded_elimination.rs surface does NOT rescue it (mc-player-api/tests/ = also out of fence). See p1-29g spec section 'Verdict — 2026-06-08'."
 - ".project/objectives/p1-29g-verify-gates-trained-vs-scripted.md:Verdict — full in-fence attribution + blocker reconciliation; .project/objectives/p1-29c-sole-city-research-path.md:Divergence feedback from p1-29g"
-blocked_by: [p1-29f]
+blocked_by: [p1-29k]
 ---
 
 ## Context
diff --git a/.project/objectives/p1-29k.md b/.project/objectives/p1-29k.md
new file mode 100644
index 00000000..67114f54
--- /dev/null
+++ b/.project/objectives/p1-29k.md
@@ -0,0 +1,56 @@
+---
+id: p1-29k
+title: "Drive learned:* controllers on the autoplay (auto_play.gd) gate surface"
+priority: p1
+status: missing
+scope: game1
+tags: [ai, rl, controller, infra, bridge]
+owner: simulator-infra
+updated_at: 2026-06-08
+---
+## Summary
+
+## Why this exists
+
+p1-29f shipped the `learned:duel-v4-encfix-s7` controller bridge, but scoped it to the **player-API dispatch world only** (`mc_player_api::apply_action`). Per p1-29f's own bullet-3/5 verification caveats: the learned controller "runs only in the `mc_player_api` dispatch world… `auto_play.gd`'s `GdAiController` path can't [host one], since the learned controller's one-shot `decide_turn` is identity-only by design," and its output is "player-API-native per-turn JSONL, not `auto_play.gd`'s `autoplay-result-schema.json` shape."
+
+p1-29g (re-verify AI quality gates trained-vs-scripted / trained-vs-trained) is **blocked by this gap**. Its acceptance bullets 1-2 specify driving the learned controller via `apricot-run.sh` and scoring with `tools/sole-city-gate.py` (autoplay-schema `turn_stats.jsonl`). But:
+
+- `apricot-run.sh` → `tools/autoplay-batch.sh` → `godot --headless --path src/game` runs `auto_play.gd` (verified: `src/game/project.godot` `AutoPlay=*res://engine/scenes/tests/auto_play.gd`).
+- `auto_play.gd` has **no** `controller_id` / `CP_PLAYER_CONTROLLERS` / learned hook — it decides via the `GdAiController` bridge → `mc_ai::tactical::run_ai_turn` (heuristic, not the learned policy).
`autoplay-batch.sh` exposes only `AUTO_PLAY_SEED` / `AI_PIN_PERSONALITY_P*` envs.
+- The learned controller therefore **cannot be assigned to a slot** on the autoplay surface, and emits no autoplay-schema stats.
+
+## Distinct from p1-29j
+
+p1-29j is **founding/capture → `mc_turn::processor`** (one specific action class). THIS objective is **whole-turn → `apply_action` + per-slot controller selection in the autoplay world** — making `auto_play.gd` able to run ANY registered controller (including `learned:*`) for a given slot, and emit the canonical autoplay `turn_stats.jsonl`. They share the autoplay→Rust seam theme but are different slices; either could land first.
+
+## Acceptance
+
+- ☐ `auto_play.gd` (or its batch harness) accepts a per-slot controller assignment (e.g. `CP_PLAYER_CONTROLLERS` / `AI_CONTROLLER_P{n}` env), defaulting unchanged to `scripted:default` per slot.
+- ☐ A `learned:*`-assigned slot runs the registered learned controller's real `decide_turn` (policy inference), not the identity-only stub — through the same Rust dispatch path the player-API world uses, so behaviour matches the p1-29f parity tests.
+- ☐ The autoplay run emits canonical autoplay-schema `turn_stats.jsonl` (`autoplay-result-schema.json`) regardless of which controller drives each slot, so `tools/sole-city-gate.py` / `tools/p1-survival-score.py` score it unchanged.
+- ☐ One trained-vs-scripted autoplay game completes headless on apricot via `apricot-run.sh` with `learned:duel-v4-encfix-s7` in a slot and valid `turn_stats.jsonl` emitted (the smoke that unblocks p1-29g bullets 1-2).
+- ☐ Save-format byte-identical; GUT headless suite green; default (no env) autoplay batch unchanged.
+
+## Out of scope
+
+- The quality re-verification itself — that is p1-29g (this unblocks it).
+- Founding/capture action-application → `mc_turn::processor` — that is p1-29j (sibling slice).
+- Training / RL-loop changes.
+
+## Owner rationale
+
+Filed by warcouncil (owns p1-29g and the AI-quality cluster) and assigned to **simulator-infra**, which owns the p1-29f learned-controller bridge + the GDExtension/`api-gdext` build surface this extends. The change touches api-gdext / mc-player-api / GDScript dispatch — outside warcouncil's mc-ai fence.
+
+## References
+
+- `.project/objectives/p1-29g-verify-gates-trained-vs-scripted.md` — §"Verdict — 2026-06-08", the blocker this objective resolves.
+- `.project/objectives/p1-29f-learned-controller-bridge.md` — bullets 3/5 caveats (player-API-world-only scoping).
+- `.project/objectives/p1-29j-autoplay-rust-action-application.md` — the sibling founding/capture slice.
+- `src/game/engine/scenes/tests/auto_play.gd` — the autoplay driver lacking a controller hook.
+- `tools/autoplay-batch.sh` — exposes only seed/personality envs, no controller selection.
+- `src/simulator/api-gdext/src/ai.rs:40-52` — `run_ai_turn` heuristic driver (the GdAiController path).
+
+## Acceptance
+
+- ❌ Define acceptance criteria.