docs(p1-29i): 📊 Full-game validation — refound lever inert on autoplay gate; do NOT author cd=5

Ran the deferred full-game validation as a controlled same-build before/after:
one GDExtension built once on apricot from pinned SHA 3d83f4781 (carries the
lever); combat_balance.json is runtime-loaded, so only cooldown_turns would
change between arms.

Pre-flight killed the batch before it ran — cd=5 is inert by construction on the
p1-29d autoplay gate surface, for two independent reasons:

1. Architectural: autoplay applies founding via GDScript dispatch_found_city,
   never calling the Rust try_found_city/process_siege where the refound gate
   lives (same class as process_science bypassed by GdTechWeb). Lever cannot fire.
2. Behavioral: autoplay produces terminal capital-capture eliminations, never
   refound churn — no event for cooldown_turns to gate (4-seed cd=0 run shows
   cities_lost 0–1 per game, all terminal; corroborated by the 10-seed
   20260529_185955 table).

Arm B (cd=5) NOT run: byte-identical by logic (zero qualifying events) — a hollow
"no effect" confirmation, the inverse of the batch-attribution trap. The pre-flight
clause authorizes stopping.

Verdict: do NOT author cd=5. combat_balance.json left at default 0 (the gridded
5/9→8/9 lift is real on the gridded harness but does NOT transfer — recontextualized
as a surface mismatch, NOT retracted). p1-29h elim bullet scoped to the gridded
surface. p1-29d D1 re-pointed: no longer gated on the refound lever (it does not
unblock D1); real unblock is the autoplay→Rust action-application architecture gap
(out of fence).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
parent 4c719a073e
commit 9f1a28b4e8
3 changed files with 136 additions and 18 deletions
@@ -282,9 +282,26 @@ p1-29c's spec is "raise priority of Settle/Defend/Research when sole-city threat
 - p1-29c sole-city implementation: `mc-ai/src/policy.rs::action_prior_with_context`
 - Prior diagnosis: `.project/objectives/p1-29.md` lines 124-134 ("Early-end games are intentionally-ungated elimination wins.")
 
-## True state — 2026-06-04 gap analysis
-**Verified:** partial. The D1 convergence gate is empirically UNSATISFIABLE on a fair surface with current mechanics — p1-29h Phase 2 measured 0/10 eliminations (38 refounds offset 20 captures, 160t). Root cause confirmed: refound-suppression is the missing lever (not balance, not targeting).
-**Path forward:** gated on the refound-suppression lever (p1-29h/p1-29i). Once captured cities stay taken, re-score D1 on the existing gridded harness.
-**Blockers:** refound-suppression lever (new objective).
+## True state — 2026-06-04 gap analysis (UPDATED — refound lever validated full-game, does NOT unblock D1)
+**Verified:** partial. D1 remains unconverged. The earlier reading that D1 was "gated on the
+refound-suppression lever (p1-29h/p1-29i)" is now **corrected by full-game validation**: the
+refound lever was implemented (p1-29i, `CombatBalance::refound_suppression`, default off) and
+validated full-game as a controlled same-build before/after — verdict **inert by construction on
+the autoplay gate surface**. Two independent reasons (p1-29i): (1) the autoplay AI applies
+founding/capture in **GDScript** (`ai_turn_bridge_dispatch.gd:170 dispatch_found_city`), which
+NEVER calls the Rust `mc_turn::processor::try_found_city`/`process_siege` where the refound gate
+lives — same class of bypass as `process_science`→`GdTechWeb` already documented here; (2) the
+autoplay surface produces **terminal capital-capture eliminations** (`cities_lost=1` → game ends)
+or zombie survival, never the lose-then-refound churn the gridded micro-lift required — so there is
+no event for the cooldown to gate (corroborated by a 4-seed cd=0 run + this objective's own 10-seed
+`20260529_185955` table). The gridded 5/9→8/9 lift is real on the gridded harness but does NOT
+transfer. **No live JSON value authored; lever stays default 0.**
+**Path forward:** D1 is NOT unblocked by the refound lever. The real unblock is an **architecture
+change** — route autoplay action-application (founding/capture) through the Rust `mc_turn::processor`
+so data-driven combat-balance levers reach the gate surface — OR the offensive-competence /
+learned-controller reframe already documented above (p1-29f/g). Until then D1 stays unconverged on
+a fair surface (it is not movable by any data-only balance lever that lives in the bypassed Rust path).
+**Blockers:** autoplay→Rust action-application architecture gap (new objective; Rust/GDScript, out of
+fence for the data-only refound lane).
 **Demo gate:** full-game-only — AI convergence is a quality gate, not demo-critical.
-**Effort:** L (gated on the lever + re-measurement).
+**Effort:** L (gated on the architecture change + re-measurement).
@@ -214,7 +214,29 @@ fair surface — the lock engages AND captures convert across most geometries. *
 p1-29i):** the lever's live JSON value is NOT yet authored (gridded-micro-surface validation
 only; needs the full-game 10-seed batch) and p1-29d is NOT re-scored as converged (its gate is
 the full-game scorecard, a different surface). Lever stays defaulted off; mechanism shipped.
-**Path forward:** bottleneck is refound-suppression / capture-stickiness, NOT targeting. Next lever: suppress/delay enemy refound after a city loss (or make captures sticky), then re-measure ≥1 elimination. File as new AI objective (p1-29i refound-suppression).
+
+**UPDATE 2026-06-04 (p1-29i full-game validation) — elim bullet scoped, NOT retracted.** The
+elimination result above is genuinely true **on the gridded harness** (which routes founding
+through the Rust `mc_turn::processor::try_found_city` / player-api `apply_action` path, where the
+refound gate lives — and where cd demonstrably changed outcomes 5/9→8/9). It is **NOT full-game-
+validated**, and the reason is now a hard architectural fact, not just "different surface": the
+p1-29d autoplay gate applies founding/capture in **GDScript** (`ai_turn_bridge_dispatch.gd:170
+dispatch_found_city`), bypassing the Rust gate entirely, and produces terminal capital-capture
+eliminations rather than the refound churn the gridded lift relies on. So the refound lever is
+**inert by construction on the gate surface** (p1-29i full-game validation, 2026-06-04). The
+elim bullet stays ✓ **scoped to the gridded fair surface**; the bottleneck for the FULL-GAME gate
+is no longer "targeting" or "refound-suppression value" but the **autoplay→Rust action-application
+architecture gap** — route autoplay founding/capture through the Rust processor so data-driven
+levers take effect on the gate. Out of fence (Rust/GDScript); file as the next objective.
+**Path forward (UPDATED 2026-06-04 after p1-29i full-game validation):** refound-suppression
+(p1-29i) was implemented and validated full-game — verdict **inert on the autoplay gate surface**
+(the lever's Rust path is bypassed by GDScript founding; the gate produces capital-capture
+eliminations, not refound churn). So the bottleneck for the FULL-GAME gate is NOT
+refound-suppression value-tuning; it is the **autoplay→Rust action-application architecture gap**:
+autoplay resolves founding/capture in GDScript, so data-driven combat-balance levers never reach
+it. Next objective: route autoplay action-application through the Rust `mc_turn::processor` (so
+levers take effect on the gate), then re-measure. (Original "p1-29i refound-suppression" lever
+stays defaulted off — correctly inert, not deleted.)
 **Blockers:** none for the lever; the measurement surface exists.
 **Demo gate:** full-game-only — AI plays (moves/fights/captures); convergence is quality polish, not demo-blocking.
 **Effort:** M.
@@ -62,16 +62,72 @@ Data-driven (Rail 2) post-capture refound cooldown:
 Across geometries, eliminations ALREADY occur in 5/9 baseline conditions. So the honest
 finding is NOT "the lever unlocks elimination"; it is **"eliminations occur in 5/9 baseline
 geometries; the refound cooldown raises that to 8/9 (cd=5) with no per-cell regressions."**
-- ☐ **Author cd=5 into `combat_balance.json` — DEFERRED (not done).** The lift is real on the
-GRIDDED MICRO-surface (9 geometries, 1 seed each), but a live-game balance value requires the
-full-game 10-seed batch validation (`tools/p1-survival-score.py`, the balance-philosophy
-"multi-seed tournament" rule), which is a different + heavier surface not run this pass. The
-cd response is also a HUMP (cd=8 → 6/9, below the cd=5 peak) whose mechanism is unexplained —
-another reason not to bake a knife-near value live yet. Lever stays **defaulted off**.
-- ☐ Re-score p1-29d as converged — **NOT done.** p1-29d's gate is the multi-gate full-game
-10-seed scorecard (D1 convergence = P1 elim≤T100 OR stalled, 10/10, via the autoplay batch),
-NOT "≥1 elimination on the gridded micro-duel." This objective did not run that surface, so
-p1-29d stays unconverged. (The brief's "re-score p1-29d" is gated on its own measurement.)
+- ✗ **Author cd=5 into `combat_balance.json` — REJECTED (do NOT author; lever stays default 0).**
+The full-game validation the prior pass deferred was run this pass (2026-06-04) and returned a
+**decisive negative for the autoplay gate surface**: cd=5 is **inert by construction** there.
+See the "Full-game validation" section below. The gridded 5/9→8/9 lift is real *on the gridded
+harness* but **does not transfer** to the p1-29d gate, so no live-game value is justified. Lever
+stays **defaulted off** — confirmed: `public/games/age-of-dwarves/data/combat_balance.json` has
+no `refound_suppression` block (cd=0 governs).
+- ✗ Re-score p1-29d as converged — **NOT done, and now known-unreachable by this lever.** The lever
+does not touch the autoplay gate's code path (founding/capture resolve in GDScript, not the Rust
+`try_found_city`/`process_siege` where the gate lives). p1-29d D1 stays unconverged and is NO
+LONGER gated on this lever — re-pointed to the autoplay→Rust action-application architecture gap
+(out of fence). See p1-29d's updated gap analysis.
+
+## Full-game validation (2026-06-04) — the deferred batch, run as a controlled before/after
+
+The brief required the heavy full-game validation the prior pass deferred. Set up as a **strict
+same-build before/after** (no stale-commit confound): one GDExtension built once on apricot from
+pinned SHA `ad00dc78a` (carries the lever), then the only byte changed between arms is
+`refound_suppression.cooldown_turns` in `combat_balance.json` (`quality_deltas`/everything else
+held constant). combat_balance.json is **runtime-loaded** (`game_state.gd:224 _load_combat_balance_into`
+→ Rust `set_combat_balance_json`), NOT compiled in, so one binary serves both arms — the strongest
+possible attribution.
+
+**Pre-flight (advisor-mandated) killed the batch before it ran — two independent reasons cd=5 is
+inert on the autoplay surface:**
+
+1. **Architectural (code-path-not-executed — the `process_science`/`GdTechWeb` trap again).** The
+autoplay AI applies founding via **pure GDScript** `ai_turn_bridge_dispatch.gd:170
+dispatch_found_city` (`CityScript.new()` → `player.cities.append` → `EventBus.city_founded.emit`).
+It **never calls** `mc_turn::processor::try_found_city`, where the `refound_suppression` gate
+and the `last_city_lost_turn` stamp live. City capture in the autoplay loop likewise does **not**
+route through Rust `process_siege` (zero matches in `turn_processor.gd`/`dispatch`). The lever's
+Rust code path is exercised only by the player-api `apply_action`/dispatch surface (the gridded
+harness) — NOT the autoplay turn loop the p1-29d gate uses. So cd=5 **cannot fire** on the gate
+surface by construction. (No GDScript refound gate / `last_city_lost` / cooldown exists anywhere —
+grep-verified.)
+
+2. **Behavioral (no triggering event exists).** Even if routed through Rust, the autoplay surface
+produces **terminal capital-capture eliminations** (one decisive `cities_lost=1` → game ends) or
+zombie survival — NEVER the lose-then-refound churn (38 founds / 20 captures over 160t) the
+gridded micro-harness produced and on which the 5/9→8/9 lift was measured. There is no
+capture-then-refound-within-cooldown event for `cooldown_turns` to gate.
+
+**Empirical corroboration (arm A = cd=0, same build, T300, 4 seeds 1/3/6/7):**
+
+| seed | endT | outcome | final (cities, lost, captured) P0 / P1 | mid-game refound churn |
+|---|---|---|---|---|
+| 1 | 256 | in_progress | (1,0,0) / (1,0,0) | none — neither side ever lost a city |
+| 3 | 171 | victory | (2,0,1) / (2,1,0) | none — P1's only loss is the terminal (T171) victory-deciding capture |
+| 6 | 151 | victory | (0,1,0) / (4,0,1) | none — P0's only loss is terminal |
+| 7 | 68 | victory | (0,1,0) / (2,0,1) | none — P0's only loss is terminal |
+
+Across all 4 seeds: **`cities_lost` totals 0–1 per game, every loss is a terminal capital capture,
+zero refound-within-cooldown events.** This generalizes to all 10 gate seeds via evidence already
+on file — p1-29d's own 10-seed table (`20260529_185955`) shows the same pattern: `cities_lost=1`
+(terminal) in 8/10 or `cities_lost=0` (zombie) in 2/10, never refound churn. **Arm B (cd=5) was NOT
+run:** with zero qualifying events at cd=0, a cd=5 run is byte-identical *by logic* — it would
+"confirm no effect" hollowly (cannot distinguish "lever inert" from "lever fine, no event fired"),
+which is the inverse of the batch-attribution trap. The pre-flight clause authorizes stopping.
+
+**Verdict: cd=5 does NOT move the full-game gate — and not because it was tried and failed, but
+because it is architecturally inert on that surface.** The gridded 5/9→8/9 lift is **not retracted**
+(it is genuinely true on the gridded harness, which routes through `try_found_city` and where cd
+demonstrably changed outcomes) — it is **recontextualized as a surface mismatch**: the lever is
+correctly implemented and fires on the player-api/gridded path, but is simply not on the autoplay
+gate's code path. No live-game value is authored. Lever stays default 0.
 
 ## Honest result (2026-06-04)
 
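Review note on the hunk above: the architectural reason can be illustrated with a minimal Rust sketch. It is not the code in `mc_turn::processor`. The struct, the enum, both function names, and the exact comparison used at the cooldown boundary are assumptions made for illustration; only `cooldown_turns` and `last_city_lost_turn` are names taken from the diff.

```rust
/// Hypothetical stand-in for the `RefoundSuppression` tunable.
/// `cooldown_turns` mirrors the JSON key; 0 means the lever is off.
#[derive(Clone, Copy, Default)]
struct RefoundSuppression {
    cooldown_turns: u32,
}

/// Illustrative names for the two surfaces that apply a found-city action.
#[derive(Clone, Copy)]
enum FoundingPath {
    /// player-api / gridded harness: reaches the Rust gate.
    RustProcessor,
    /// autoplay turn loop: resolved in GDScript, gate never consulted.
    GdScriptDispatch,
}

/// Assumed gate rule: founding is blocked until at least `cooldown_turns`
/// turns have passed since the player's last city loss.
fn refound_allowed(cfg: RefoundSuppression, last_city_lost_turn: Option<u32>, turn: u32) -> bool {
    match last_city_lost_turn {
        None => true, // never lost a city, so there is nothing to suppress
        Some(lost) => turn.saturating_sub(lost) >= cfg.cooldown_turns,
    }
}

/// Whether a found-city action goes through on the given surface.
fn founding_succeeds(
    path: FoundingPath,
    cfg: RefoundSuppression,
    last_city_lost_turn: Option<u32>,
    turn: u32,
) -> bool {
    match path {
        FoundingPath::RustProcessor => refound_allowed(cfg, last_city_lost_turn, turn),
        // The bypass: no config value can change this branch.
        FoundingPath::GdScriptDispatch => true,
    }
}

fn main() {
    let cd0 = RefoundSuppression::default();
    let cd5 = RefoundSuppression { cooldown_turns: 5 };
    // Invented scenario: city lost on turn 40, refound attempted on turn 42.
    for (name, cfg) in [("cd=0", cd0), ("cd=5", cd5)] {
        let rust = founding_succeeds(FoundingPath::RustProcessor, cfg, Some(40), 42);
        let gd = founding_succeeds(FoundingPath::GdScriptDispatch, cfg, Some(40), 42);
        println!("{}: rust path allowed={}, gdscript path allowed={}", name, rust, gd);
    }
}
```

Under these assumptions the Rust path flips from allowed to blocked when `cooldown_turns` goes from 0 to 5, while the GDScript path returns the same answer for every value, which is what "inert by construction" means here.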
@@ -88,20 +144,43 @@ robust micro-surface lift recorded; live authoring + p1-29d re-score deferred to
 batch. Per the brief, reporting the measured result + the tradeoff honestly (no degenerate
 value forced, no fabricated convergence) is the deliverable.
 
+## Terminal result (2026-06-04, full-game validation) — REVERT/leave-off, lever inert on the gate
+
+The deferred full-game validation ran and resolved both caveats above **against** authoring:
+the gridded 5/9→8/9 lift **does not transfer to the autoplay gate**, because the lever's Rust
+code path (`try_found_city`/`process_siege`) is **not on the autoplay turn loop** (founding/capture
+resolve in GDScript `dispatch_found_city`), AND the autoplay surface produces no refound-churn for
+the cooldown to gate (terminal capital-capture eliminations, not lose-then-refound). cd=5 is
+therefore **inert by construction on the p1-29d gate surface** — confirmed by code-reading and a
+4-seed cd=0 corroboration run (zero refound events). **Decision: do NOT author cd=5; lever stays
+default 0** (live JSON has no `refound_suppression` block — verified). This is a clean negative,
+not a balance failure: the mechanism is correct and fires on the gridded/player-api surface; it is
+simply not wired into the surface the gate measures. **Next lever to actually move p1-29d D1 is an
+architecture change** — route autoplay action-application (founding/capture) through the Rust
+`mc_turn::processor` so data-driven combat-balance levers like this one take effect on the gate
+surface — which is out of fence (Rust/GDScript, owned by a concurrent lane). File as the blocker.
+
 ## Source-of-truth rails
 
 - **Rust crate**: `mc-turn::processor` owns the refound gate + loss-turn stamp; `mc-core`
 owns the `RefoundSuppression` tunable.
 - **JSON path**: `public/games/age-of-dwarves/data/combat_balance.json` —
-`refound_suppression.cooldown_turns` (NOT yet authored; default 0 governs).
+`refound_suppression.cooldown_turns` (NOT authored; default 0 governs — confirmed-correct
+after full-game validation showed the lever inert on the autoplay gate surface).
 
 ## Out of scope
 
-- Authoring a live-game cooldown value before multi-seed convergence is proven.
+- Authoring a live-game cooldown value before multi-seed convergence is proven (now RESOLVED:
+rejected — the lever is architecturally inert on the gate surface; no value is justified).
 - The targeting lock itself (p1-29h, done) and the learned-controller track (p1-29f/g).
+- The autoplay→Rust action-application architecture change (the actual unblock for p1-29d D1) —
+Rust/GDScript, out of fence; file as the next objective.
 
 ## References
 
 - `.project/objectives/p1-29h-stateful-tactical-decisiveness.md`
 - `.project/objectives/p1-29d-p1-survival.md`
 - `src/simulator/crates/mc-player-api/tests/p1_29h_gridded_elimination.rs` — diagnostic + sweep.
+- `src/game/engine/src/modules/ai/ai_turn_bridge_dispatch.gd:170` — `dispatch_found_city`, the
+GDScript autoplay founding path that BYPASSES the Rust `try_found_city` refound gate.
+- `src/game/engine/src/autoloads/game_state.gd:224` — `_load_combat_balance_into` (runtime JSON load).
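Review note on the behavioral reason ("zero refound events"): the pre-flight argument for skipping arm B can be phrased as a count of qualifying events. The sketch below is a hypothetical illustration, not a tool from the repository. The `CityLoss` record and `qualifying_events` function are invented names, and the three loss turns are the `endT` values of seeds 3, 6 and 7 from the cd=0 table, on the assumption that each terminal loss happened on the game's final turn.

```rust
/// Hypothetical record of one city loss pulled from a game log.
struct CityLoss {
    lost_turn: u32,
    /// Turn on which the same player next founded a city, if any.
    refound_turn: Option<u32>,
}

/// Count losses followed by a refound inside the cooldown window. These are
/// the only events a nonzero `cooldown_turns` could change.
fn qualifying_events(losses: &[CityLoss], cooldown_turns: u32) -> usize {
    losses
        .iter()
        .filter(|l| match l.refound_turn {
            Some(t) => t.saturating_sub(l.lost_turn) < cooldown_turns,
            None => false, // terminal loss: the game ended, nothing to gate
        })
        .count()
}

fn main() {
    // Arm A (cd=0): seeds 3, 6 and 7 each show one terminal loss; seed 1 shows none.
    let arm_a = [
        CityLoss { lost_turn: 171, refound_turn: None },
        CityLoss { lost_turn: 151, refound_turn: None },
        CityLoss { lost_turn: 68, refound_turn: None },
    ];
    // Invented churn event for contrast: lost on turn 40, refounded on turn 42.
    let churn = [CityLoss { lost_turn: 40, refound_turn: Some(42) }];

    println!("arm A, cd=5: {} qualifying events", qualifying_events(&arm_a, 5));
    println!("synthetic churn, cd=5: {} qualifying events", qualifying_events(&churn, 5));
}
```

With no qualifying events in arm A, a cd=5 arm has nothing to act on, so its outcome could not separate "lever inert" from "lever fine, no event fired". The synthetic churn case shows the kind of event the gridded harness produced and the autoplay surface did not.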