docs(objectives): 📝 Clarify validation criteria differences between trained and scripted objectives

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
autocommit 2026-05-27 23:07:16 -07:00
parent 91346a7ae9
commit 1c47c00640

@@ -0,0 +1,64 @@
---
id: p1-29g
title: "Re-verify Game-1 AI quality gates trained-vs-scripted (and trained-vs-trained)"
priority: p1
status: missing
scope: game1
owner: warcouncil
tags: [ai, rl, balance, verification]
updated_at: 2026-05-27
relates_to: [p1-29c, p1-29d, p1-29e, p1-29f]
blocked_by: [p1-29f]
---
## Context
Every Game-1 AI-quality gate to date — p1-29c (sole-city research path),
p1-29d (trailing-AI survival), p1-29a (last-stand), p1-29b (tier_peak_gap),
p0-24 (difficulty distributions) — has been measured **scripted-vs-scripted**
(`scripted:default` in both slots, MCTS rollouts). That is the product-correct
matchup for what Game 1 ships (5 scripted clan personalities), but it tells us
nothing about how those gates behave against the trained RL policy, which is
stronger by +5.5 mean reward (p1-29e) and makes structurally different
decisions (production-only economy, no research action; see p1-29e F1/F2).
Two reasons this matters:
1. **Validity of the scripted gates.** p1-29c closed on the *outcome* gate
(10/10 P1_tp≥2 on current main), but p1-29e showed that the lift comes from
research-driven drift on main, NOT from the shipped intervention. A trained
opponent is a sharper, independent probe of whether the trailing-AI failure
modes are really resolved or just masked by scripted-vs-scripted symmetry.
2. **Asymmetry source.** In the scripted duel, P0 wins the capital rush in
8/10 seeds purely from spawn/turn-order, not skill (both brains identical).
A trained P0 vs scripted P1 (and vice-versa) separates *positional* advantage
from *controller-strength* advantage.
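
The seat-versus-controller separation in point 2 can be sketched as a small calculation over one mirrored pair of batches. This is an illustrative sketch, not the project's scoring code: `MirrorPair` and `decompose` are hypothetical names, and it assumes every game ends with exactly one winner (no draws) and that a P0 win fraction is available for each seating.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MirrorPair:
    """P0 win fractions for one controller pair, played in both seatings.

    Hypothetical container; the real gate output format is not specified here.
    """
    a_as_p0: float  # P0 win fraction with controller A in slot 0, B in slot 1
    b_as_p0: float  # P0 win fraction with controller B in slot 0, A in slot 1


def decompose(pair: MirrorPair) -> dict[str, float]:
    """Split a mirrored result into a seat effect and a controller effect.

    Assumes no draws, so P1's win fraction is 1 - P0's win fraction.
    """
    # Seat effect: how often slot 0 wins, averaged over who sits there.
    seat_advantage = (pair.a_as_p0 + pair.b_as_p0) / 2 - 0.5

    # Controller effect: how often A wins, averaged over both seats.
    a_wins_as_p1 = 1.0 - pair.b_as_p0
    controller_a_advantage = (pair.a_as_p0 + a_wins_as_p1) / 2 - 0.5

    return {
        "seat_advantage": seat_advantage,
        "controller_a_advantage": controller_a_advantage,
    }
```

For two identical controllers, `controller_a_advantage` is zero by construction and any nonzero `seat_advantage` is purely positional. For a trained/scripted mirror, a nonzero `controller_a_advantage` is the part of the result that seating cannot explain.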
## Acceptance
- [ ] **Trained-vs-scripted batch.** With `learned:duel-v4-encfix-s7` in slot 0
and `scripted:default` in slot 1 (and the mirror), run the p1-29c/p1-29d
10-seed T300 gate via `apricot-run.sh` and score with
`tools/sole-city-gate.py`. Report per-seed P0_tp/P1_tp, cities, survival.
- [ ] **Trained-vs-trained batch.** Both slots `learned:duel-v4-encfix-s7`,
same gate. Establishes the trained-vs-trained baseline the p1-29 cluster has
never had.
- [ ] **Reconcile against scripted gates.** State explicitly which p1-29c/d/a/b
conclusions hold, which flip, and which were scripted-symmetry artifacts.
Feed any divergence back to the owning objective.
- [ ] **Clean attribution for p1-29c/29e (carried over).** Run the
HEAD-vs-HEAD+patch before/after comparison that p1-29e deferred, on a
controlled build, so the economy break-out and action-priority uplifts get a
yes/no causal verdict instead of "symptom resolved by drift".
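
The batches above amount to a small run matrix, which could be enumerated along these lines. This is a sketch under stated assumptions: `build_run_matrix` is a hypothetical helper, the seeds are assumed to be numbered 0 to 9 (the text only says "10-seed"), and "T300" is read as a 300-turn cap. It deliberately does not invoke `apricot-run.sh` or `tools/sole-city-gate.py`, whose arguments are not specified here.

```python
TRAINED = "learned:duel-v4-encfix-s7"
SCRIPTED = "scripted:default"

SEEDS = range(10)  # assumption: seeds 0..9; the objective only says "10-seed"
TURN_CAP = 300     # assumption: "T300" means a 300-turn cap


def build_run_matrix(include_scripted_baseline: bool = False) -> list[dict]:
    """Enumerate (slot0, slot1, seed) runs for the acceptance batches.

    Hypothetical helper: it only builds the list of runs and launches nothing.
    """
    matchups = [
        (TRAINED, SCRIPTED),  # trained-vs-scripted
        (SCRIPTED, TRAINED),  # the mirror
        (TRAINED, TRAINED),   # trained-vs-trained baseline
    ]
    if include_scripted_baseline:
        # Optional re-run of the existing scripted-vs-scripted gate,
        # useful as a same-build reference when reconciling.
        matchups.append((SCRIPTED, SCRIPTED))

    return [
        {"slot0": slot0, "slot1": slot1, "seed": seed, "turn_cap": TURN_CAP}
        for slot0, slot1 in matchups
        for seed in SEEDS
    ]
```

Keeping the same seed list across all matchups means each per-seed row can be compared directly across matchups when reconciling against the scripted gates.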
## Non-goals
- Building the controller bridge: that is **p1-29f**, and this objective is blocked on it.
- Retuning scripted heuristics based on trained-AI behaviour — that would be a
fresh divergence-mining cycle (p1-29e lineage), filed separately if findings
warrant.
## Notes
- The trained policy has no research action (p1-29e F1); interpret a
learned-slot `tier_peak` accordingly (it reflects engine-auto research, not
policy choice).