docs(objectives): 📝 Clarify validation criteria differences between trained and scripted objectives
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
parent 91346a7ae9
commit 1c47c00640
1 changed file with 64 additions and 0 deletions
@@ -0,0 +1,64 @@
---
id: p1-29g
title: "Re-verify Game-1 AI quality gates trained-vs-scripted (and trained-vs-trained)"
priority: p1
status: missing
scope: game1
owner: warcouncil
tags: [ai, rl, balance, verification]
updated_at: 2026-05-27
relates_to: [p1-29c, p1-29d, p1-29e, p1-29f]
blocked_by: [p1-29f]
---

## Context

Every Game-1 AI-quality gate to date has been measured **scripted-vs-scripted**
(`scripted:default` in both slots, MCTS rollouts): p1-29c (sole-city research
path), p1-29d (trailing-AI survival), p1-29a (last-stand), p1-29b
(tier_peak_gap), and p0-24 (difficulty distributions). That is the
product-correct matchup for what Game 1 ships (5 scripted clan personalities),
but it tells us nothing about how those gates behave against the trained RL
policy, which is stronger by +5.5 mean reward (p1-29e) and makes structurally
different decisions (production-only economy, no research action; see p1-29e
F1/F2).

Two reasons this matters:
1. **Validity of the scripted gates.** p1-29c closed on the *outcome* gate
(10/10 P1_tp≥2 on current main), but p1-29e showed that the lift comes from
research-driven main-branch drift, NOT from the shipped intervention. A
trained opponent is a sharper, independent probe of whether the trailing-AI
failure modes are really resolved or just masked by scripted-vs-scripted
symmetry.
2. **Asymmetry source.** In the scripted duel, P0 wins the capital rush in
8/10 seeds purely from spawn/turn-order, not skill (both brains are
identical). Running a trained P0 against a scripted P1 (and vice versa)
separates *positional* advantage from *controller-strength* advantage.

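One way to make the positional-versus-controller split concrete is a simple additive decomposition over the four slot assignments. The sketch below is hypothetical and not project code: the `decompose` helper, the choice of "fraction of seeds in which P0 wins the capital rush" as the metric, and every rate except the 8/10 scripted figure are assumptions for illustration only.

```python
# Hypothetical sketch: helper name and most numbers are illustrative, not project code.
TRAINED = "learned:duel-v4-encfix-s7"
SCRIPTED = "scripted:default"


def decompose(p0_rate):
    """Split P0's edge into a seat effect and a controller effect.

    p0_rate maps (slot0, slot1) -> fraction of seeds in which P0 wins
    the capital rush. Assumes every seed has exactly one winner.
    """
    # Seat effect: how far P0 sits above 50% when both brains are identical.
    mirror = (p0_rate[(SCRIPTED, SCRIPTED)] + p0_rate[(TRAINED, TRAINED)]) / 2
    positional = mirror - 0.5

    # Controller effect: trained success rate averaged over both seats, minus 50%.
    trained_as_p0 = p0_rate[(TRAINED, SCRIPTED)]
    trained_as_p1 = 1.0 - p0_rate[(SCRIPTED, TRAINED)]
    controller = (trained_as_p0 + trained_as_p1) / 2 - 0.5
    return positional, controller


rates = {
    (SCRIPTED, SCRIPTED): 0.8,  # the 8/10 figure reported for the scripted duel
    (TRAINED, TRAINED): 0.7,    # placeholder, not a measured result
    (TRAINED, SCRIPTED): 0.9,   # placeholder, not a measured result
    (SCRIPTED, TRAINED): 0.4,   # placeholder, not a measured result
}
positional, controller = decompose(rates)
print(f"positional={positional:+.2f} controller={controller:+.2f}")
```

With only 10 seeds per cell, any such split is coarse; it is meant as a reading aid for the batches, not as a statistical test.
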
## Acceptance

- [ ] **Trained-vs-scripted batch.** With `learned:duel-v4-encfix-s7` in slot 0
and `scripted:default` in slot 1 (and the mirror), run the p1-29c/p1-29d
10-seed T300 gate via `apricot-run.sh` and score with
`tools/sole-city-gate.py`. Report per-seed P0_tp/P1_tp, cities, and survival.
- [ ] **Trained-vs-trained batch.** Run the same gate with
`learned:duel-v4-encfix-s7` in both slots. This establishes the
trained-vs-trained baseline the p1-29 cluster has never had.
- [ ] **Reconcile against scripted gates.** State explicitly which p1-29c/d/a/b
conclusions hold, which flip, and which were scripted-symmetry artifacts.
Feed any divergence back to the owning objective.
- [ ] **Clean attribution for p1-29c/29e (carried over).** Run the
HEAD-vs-HEAD+patch before/after comparison that p1-29e deferred, on a
controlled build, so that the economy break-out and action-priority uplifts
get a yes/no causal verdict instead of "symptom resolved by drift".

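As a planning aid, the batches above can be enumerated and tallied roughly as follows. This is a minimal sketch under stated assumptions: it does not invoke `apricot-run.sh` or `tools/sole-city-gate.py`, whose interfaces are not described here; `planned_runs`, `outcome_gate`, the `p1_tp` row key, and the use of `range(10)` as seed values are all hypothetical.

```python
from itertools import product

TRAINED = "learned:duel-v4-encfix-s7"
SCRIPTED = "scripted:default"


def planned_runs(seeds):
    """Yield (slot0, slot1, seed) for every matchup this objective asks for."""
    matchups = [
        (TRAINED, SCRIPTED),  # trained-vs-scripted
        (SCRIPTED, TRAINED),  # the mirror
        (TRAINED, TRAINED),   # trained-vs-trained baseline
    ]
    for (slot0, slot1), seed in product(matchups, seeds):
        yield slot0, slot1, seed


def outcome_gate(rows, min_tp=2):
    """Return (passed, total): seeds whose P1 tier peak reaches min_tp."""
    passed = sum(1 for row in rows if row["p1_tp"] >= min_tp)
    return passed, len(rows)


# Seed values are an assumption; the real gate's seed list may differ.
runs = list(planned_runs(range(10)))
print(len(runs), "runs planned")

# Made-up per-seed rows, only to show the tally shape (e.g. "2/3").
passed, total = outcome_gate([{"p1_tp": 2}, {"p1_tp": 1}, {"p1_tp": 3}])
print(f"{passed}/{total} seeds meet P1_tp>=2")
```

Reporting the tally per matchup, rather than pooled, keeps the scripted-symmetry question separable when reconciling against the earlier gates.
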
## Non-goals

- Building the controller bridge. That is **p1-29f** (this objective is blocked
on it).
- Retuning scripted heuristics based on trained-AI behaviour. That would be a
fresh divergence-mining cycle (p1-29e lineage), filed separately if findings
warrant.

## Notes

- The trained policy has no research action (p1-29e F1); interpret a
learned-slot `tier_peak` accordingly (it reflects engine-auto research, not
policy choice).