From 1c47c006407824e11e78203b0e006a570d91f438 Mon Sep 17 00:00:00 2001
From: autocommit
Date: Wed, 27 May 2026 23:07:16 -0700
Subject: [PATCH] =?UTF-8?q?docs(objectives):=20=F0=9F=93=9D=20Clarify=20va?=
 =?UTF-8?q?lidation=20criteria=20differences=20between=20trained=20and=20s?=
 =?UTF-8?q?cripted=20objectives?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Lilith Autocommit
---
 ...p1-29g-verify-gates-trained-vs-scripted.md | 64 +++++++++++++++++++
 1 file changed, 64 insertions(+)
 create mode 100644 .project/objectives/p1-29g-verify-gates-trained-vs-scripted.md

diff --git a/.project/objectives/p1-29g-verify-gates-trained-vs-scripted.md b/.project/objectives/p1-29g-verify-gates-trained-vs-scripted.md
new file mode 100644
index 00000000..666dffdc
--- /dev/null
+++ b/.project/objectives/p1-29g-verify-gates-trained-vs-scripted.md
@@ -0,0 +1,64 @@
+---
+id: p1-29g
+title: "Re-verify Game-1 AI quality gates trained-vs-scripted (and trained-vs-trained)"
+priority: p1
+status: missing
+scope: game1
+owner: warcouncil
+tags: [ai, rl, balance, verification]
+updated_at: 2026-05-27
+relates_to: [p1-29c, p1-29d, p1-29e, p1-29f]
+blocked_by: [p1-29f]
+---
+
+## Context
+
+Every Game-1 AI-quality gate to date — p1-29c (sole-city research path),
+p1-29d (trailing-AI survival), p1-29a (last-stand), p1-29b (tier_peak_gap),
+p0-24 (difficulty distributions) — has been measured **scripted-vs-scripted**
+(`scripted:default` in both slots, MCTS rollouts). That is the product-correct
+matchup for what Game 1 ships (5 scripted clan personalities), but it tells us
+nothing about how those gates behave against the trained RL policy, which is
++5.5 mean reward stronger (p1-29e) and makes structurally different decisions
+(production-only economy, no research action — p1-29e F1/F2).
+
+Two reasons this matters:
+1. **Validity of the scripted gates.** p1-29c closed on the *outcome* gate
+   (10/10 P1_tp≥2 on current main) but p1-29e showed the lift is research-driven
+   main-branch drift, NOT the shipped intervention. A trained opponent is a
+   sharper, independent probe of whether the trailing-AI failure modes are
+   really resolved or just masked by scripted-vs-scripted symmetry.
+2. **Asymmetry source.** In the scripted duel, P0 wins the capital rush in
+   8/10 seeds purely from spawn/turn-order, not skill (both brains identical).
+   A trained P0 vs scripted P1 (and vice-versa) separates *positional* advantage
+   from *controller-strength* advantage.
+
+## Acceptance
+
+- [ ] **Trained-vs-scripted batch.** With `learned:duel-v4-encfix-s7` in slot 0
+  and `scripted:default` in slot 1 (and the mirror), run the p1-29c/p1-29d
+  10-seed T300 gate via `apricot-run.sh` and score with
+  `tools/sole-city-gate.py`. Report per-seed P0_tp/P1_tp, cities, survival.
+- [ ] **Trained-vs-trained batch.** Both slots `learned:duel-v4-encfix-s7`,
+  same gate. Establishes the trained-vs-trained baseline the p1-29 cluster has
+  never had.
+- [ ] **Reconcile against scripted gates.** State explicitly which p1-29c/d/a/b
+  conclusions hold, which flip, and which were scripted-symmetry artifacts.
+  Feed any divergence back to the owning objective.
+- [ ] **Clean attribution for p1-29c/29e (carried over).** The HEAD-vs-HEAD+patch
+  before/after that p1-29e deferred — run it here on a controlled build so the
+  economy break-out and action-priority uplifts get a yes/no causal verdict
+  instead of "symptom resolved by drift".
+
+## Non-goals
+
+- Building the controller bridge — that is **p1-29f** (this is blocked on it).
+- Retuning scripted heuristics based on trained-AI behaviour — that would be a
+  fresh divergence-mining cycle (p1-29e lineage), filed separately if findings
+  warrant.
+
+## Notes
+
+- The trained policy has no research action (p1-29e F1); interpret a
+  learned-slot `tier_peak` accordingly (it reflects engine-auto research, not
+  policy choice).