From 9f1a28b4e8dcb1edea4b7be9e658cc395208f00f Mon Sep 17 00:00:00 2001
From: autocommit
Date: Thu, 4 Jun 2026 18:24:28 -0700
Subject: [PATCH] =?UTF-8?q?docs(p1-29i):=20=F0=9F=93=8A=20Full-game=20vali?=
 =?UTF-8?q?dation=20=E2=80=94=20refound=20lever=20inert=20on=20autoplay=20?=
 =?UTF-8?q?gate;=20do=20NOT=20author=20cd=3D5?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Ran the deferred full-game validation as a controlled same-build
before/after: one GDExtension built once on apricot from pinned SHA
3d83f4781 (carries the lever); combat_balance.json is runtime-loaded, so
only cooldown_turns would change between arms.

Pre-flight killed the batch before it ran — cd=5 is inert by construction
on the p1-29d autoplay gate surface, for two independent reasons:

1. Architectural: autoplay applies founding via GDScript
   dispatch_found_city, never calling the Rust
   try_found_city/process_siege where the refound gate lives (same class
   as process_science bypassed by GdTechWeb). Lever cannot fire.
2. Behavioral: autoplay produces terminal capital-capture eliminations,
   never refound churn — no event for cooldown_turns to gate (4-seed cd=0
   run shows cities_lost 0–1 per game, all terminal; corroborated by the
   10-seed 20260529_185955 table).

Arm B (cd=5) NOT run: byte-identical by logic (zero qualifying events) — a
hollow "no effect" confirmation, the inverse of the batch-attribution
trap. The pre-flight clause authorizes stopping.

Verdict: do NOT author cd=5. combat_balance.json left at default 0 (the
gridded 5/9→8/9 lift is real on the gridded harness but does NOT transfer
— recontextualized as a surface mismatch, NOT retracted). p1-29h elim
bullet scoped to the gridded surface. p1-29d D1 re-pointed: no longer
gated on the refound lever (it does not unblock D1); real unblock is the
autoplay→Rust action-application architecture gap (out of fence).

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 .project/objectives/p1-29d-p1-survival.md    |  27 ++++-
 .../p1-29h-stateful-tactical-decisiveness.md |  24 +++-
 .../objectives/p1-29i-refound-suppression.md | 103 ++++++++++++++++--
 3 files changed, 136 insertions(+), 18 deletions(-)

diff --git a/.project/objectives/p1-29d-p1-survival.md b/.project/objectives/p1-29d-p1-survival.md
index 4a859816..e64a68e8 100644
--- a/.project/objectives/p1-29d-p1-survival.md
+++ b/.project/objectives/p1-29d-p1-survival.md
@@ -282,9 +282,26 @@ p1-29c's spec is "raise priority of Settle/Defend/Research when sole-city threat
 - p1-29c sole-city implementation: `mc-ai/src/policy.rs::action_prior_with_context`
 - Prior diagnosis: `.project/objectives/p1-29.md` lines 124-134 ("Early-end games are intentionally-ungated elimination wins.")
 
-## True state — 2026-06-04 gap analysis
-**Verified:** partial. The D1 convergence gate is empirically UNSATISFIABLE on a fair surface with current mechanics — p1-29h Phase 2 measured 0/10 eliminations (38 refounds offset 20 captures, 160t). Root cause confirmed: refound-suppression is the missing lever (not balance, not targeting).
-**Path forward:** gated on the refound-suppression lever (p1-29h/p1-29i). Once captured cities stay taken, re-score D1 on the existing gridded harness.
-**Blockers:** refound-suppression lever (new objective).
+## True state — 2026-06-04 gap analysis (UPDATED — refound lever validated full-game, does NOT unblock D1)
+**Verified:** partial. D1 remains unconverged. The earlier reading that D1 was "gated on the
+refound-suppression lever (p1-29h/p1-29i)" is now **corrected by full-game validation**: the
+refound lever was implemented (p1-29i, `CombatBalance::refound_suppression`, default off) and
+validated full-game as a controlled same-build before/after — verdict **inert by construction on
+the autoplay gate surface**. Two independent reasons (p1-29i): (1) the autoplay AI applies
+founding/capture in **GDScript** (`ai_turn_bridge_dispatch.gd:170 dispatch_found_city`), which
+NEVER calls the Rust `mc_turn::processor::try_found_city`/`process_siege` where the refound gate
+lives — same class of bypass as `process_science`→`GdTechWeb` already documented here; (2) the
+autoplay surface produces **terminal capital-capture eliminations** (`cities_lost=1` → game ends)
+or zombie survival, never the lose-then-refound churn the gridded micro-lift required — so there is
+no event for the cooldown to gate (corroborated by a 4-seed cd=0 run + this objective's own 10-seed
+`20260529_185955` table). The gridded 5/9→8/9 lift is real on the gridded harness but does NOT
+transfer. **No live JSON value authored; lever stays default 0.**
+**Path forward:** D1 is NOT unblocked by the refound lever. The real unblock is an **architecture
+change** — route autoplay action-application (founding/capture) through the Rust `mc_turn::processor`
+so data-driven combat-balance levers reach the gate surface — OR the offensive-competence /
+learned-controller reframe already documented above (p1-29f/g). Until then D1 stays unconverged on
+a fair surface (it is not movable by any data-only balance lever that lives in the bypassed Rust path).
+**Blockers:** autoplay→Rust action-application architecture gap (new objective; Rust/GDScript, out of
+fence for the data-only refound lane).
 **Demo gate:** full-game-only — AI convergence is a quality gate, not demo-critical.
-**Effort:** L (gated on the lever + re-measurement).
+**Effort:** L (gated on the architecture change + re-measurement).
diff --git a/.project/objectives/p1-29h-stateful-tactical-decisiveness.md b/.project/objectives/p1-29h-stateful-tactical-decisiveness.md
index af3e8988..d752a69d 100644
--- a/.project/objectives/p1-29h-stateful-tactical-decisiveness.md
+++ b/.project/objectives/p1-29h-stateful-tactical-decisiveness.md
@@ -214,7 +214,29 @@ fair surface — the lock engages AND captures convert across most geometries. *
 p1-29i):** the lever's live JSON value is NOT yet authored (gridded-micro-surface validation only;
 needs the full-game 10-seed batch) and p1-29d is NOT re-scored as converged (its gate is the
 full-game scorecard, a different surface). Lever stays defaulted off; mechanism shipped.
-**Path forward:** bottleneck is refound-suppression / capture-stickiness, NOT targeting. Next lever: suppress/delay enemy refound after a city loss (or make captures sticky), then re-measure ≥1 elimination. File as new AI objective (p1-29i refound-suppression).
+
+**UPDATE 2026-06-04 (p1-29i full-game validation) — elim bullet scoped, NOT retracted.** The
+elimination result above is genuinely true **on the gridded harness** (which routes founding
+through the Rust `mc_turn::processor::try_found_city` / player-api `apply_action` path, where the
+refound gate lives — and where cd demonstrably changed outcomes 5/9→8/9). It is **NOT full-game-
+validated**, and the reason is now a hard architectural fact, not just "different surface": the
+p1-29d autoplay gate applies founding/capture in **GDScript** (`ai_turn_bridge_dispatch.gd:170
+dispatch_found_city`), bypassing the Rust gate entirely, and produces terminal capital-capture
+eliminations rather than the refound churn the gridded lift relies on. So the refound lever is
+**inert by construction on the gate surface** (p1-29i full-game validation, 2026-06-04). The
+elim bullet stays ✓ **scoped to the gridded fair surface**; the bottleneck for the FULL-GAME gate
+is no longer "targeting" or "refound-suppression value" but the **autoplay→Rust action-application
+architecture gap** — route autoplay founding/capture through the Rust processor so data-driven
+levers take effect on the gate. Out of fence (Rust/GDScript); file as the next objective.
+**Path forward (UPDATED 2026-06-04 after p1-29i full-game validation):** refound-suppression
+(p1-29i) was implemented and validated full-game — verdict **inert on the autoplay gate surface**
+(the lever's Rust path is bypassed by GDScript founding; the gate produces capital-capture
+eliminations, not refound churn). So the bottleneck for the FULL-GAME gate is NOT
+refound-suppression value-tuning; it is the **autoplay→Rust action-application architecture gap**:
+autoplay resolves founding/capture in GDScript, so data-driven combat-balance levers never reach
+it. Next objective: route autoplay action-application through the Rust `mc_turn::processor` (so
+levers take effect on the gate), then re-measure. (Original "p1-29i refound-suppression" lever
+stays defaulted off — correctly inert, not deleted.)
 **Blockers:** none for the lever; the measurement surface exists.
 **Demo gate:** full-game-only — AI plays (moves/fights/captures); convergence is quality polish, not demo-blocking.
 **Effort:** M.
diff --git a/.project/objectives/p1-29i-refound-suppression.md b/.project/objectives/p1-29i-refound-suppression.md
index 9307055a..c74d3a5c 100644
--- a/.project/objectives/p1-29i-refound-suppression.md
+++ b/.project/objectives/p1-29i-refound-suppression.md
@@ -62,16 +62,72 @@ Data-driven (Rail 2) post-capture refound cooldown:
   Across geometries, eliminations ALREADY occur in 5/9 baseline conditions.
   So the honest finding is NOT "the lever unlocks elimination"; it is **"eliminations occur in 5/9
   baseline geometries; the refound cooldown raises that to 8/9 (cd=5) with no per-cell regressions."**
-- ☐ **Author cd=5 into `combat_balance.json` — DEFERRED (not done).** The lift is real on the
-  GRIDDED MICRO-surface (9 geometries, 1 seed each), but a live-game balance value requires the
-  full-game 10-seed batch validation (`tools/p1-survival-score.py`, the balance-philosophy
-  "multi-seed tournament" rule), which is a different + heavier surface not run this pass. The
-  cd response is also a HUMP (cd=8 → 6/9, below the cd=5 peak) whose mechanism is unexplained —
-  another reason not to bake a knife-near value live yet. Lever stays **defaulted off**.
-- ☐ Re-score p1-29d as converged — **NOT done.** p1-29d's gate is the multi-gate full-game
-  10-seed scorecard (D1 convergence = P1 elim≤T100 OR stalled, 10/10, via the autoplay batch),
-  NOT "≥1 elimination on the gridded micro-duel." This objective did not run that surface, so
-  p1-29d stays unconverged. (The brief's "re-score p1-29d" is gated on its own measurement.)
+- ✗ **Author cd=5 into `combat_balance.json` — REJECTED (do NOT author; lever stays default 0).**
+  The full-game validation the prior pass deferred was run this pass (2026-06-04) and returned a
+  **decisive negative for the autoplay gate surface**: cd=5 is **inert by construction** there.
+  See the "Full-game validation" section below. The gridded 5/9→8/9 lift is real *on the gridded
+  harness* but **does not transfer** to the p1-29d gate, so no live-game value is justified. Lever
+  stays **defaulted off** — confirmed: `public/games/age-of-dwarves/data/combat_balance.json` has
+  no `refound_suppression` block (cd=0 governs).
+- ✗ Re-score p1-29d as converged — **NOT done, and now known-unreachable by this lever.** The lever
+  does not touch the autoplay gate's code path (founding/capture resolve in GDScript, not the Rust
+  `try_found_city`/`process_siege` where the gate lives). p1-29d D1 stays unconverged and is NO
+  LONGER gated on this lever — re-pointed to the autoplay→Rust action-application architecture gap
+  (out of fence). See p1-29d's updated gap analysis.
+
+## Full-game validation (2026-06-04) — the deferred batch, run as a controlled before/after
+
+The brief required the heavy full-game validation the prior pass deferred. Set up as a **strict
+same-build before/after** (no stale-commit confound): one GDExtension built once on apricot from
+pinned SHA `ad00dc78a` (carries the lever), then the only byte changed between arms is
+`refound_suppression.cooldown_turns` in `combat_balance.json` (`quality_deltas`/everything else
+held constant). combat_balance.json is **runtime-loaded** (`game_state.gd:224 _load_combat_balance_into`
+→ Rust `set_combat_balance_json`), NOT compiled in, so one binary serves both arms — the strongest
+possible attribution.
+
+**Pre-flight (advisor-mandated) killed the batch before it ran — two independent reasons cd=5 is
+inert on the autoplay surface:**
+
+1. **Architectural (code-path-not-executed — the `process_science`/`GdTechWeb` trap again).** The
+   autoplay AI applies founding via **pure GDScript** `ai_turn_bridge_dispatch.gd:170
+   dispatch_found_city` (`CityScript.new()` → `player.cities.append` → `EventBus.city_founded.emit`).
+   It **never calls** `mc_turn::processor::try_found_city`, where the `refound_suppression` gate
+   and the `last_city_lost_turn` stamp live. City capture in the autoplay loop likewise does **not**
+   route through Rust `process_siege` (zero matches in `turn_processor.gd`/`dispatch`). The lever's
+   Rust code path is exercised only by the player-api `apply_action`/dispatch surface (the gridded
+   harness) — NOT the autoplay turn loop the p1-29d gate uses. So cd=5 **cannot fire** on the gate
+   surface by construction. (No GDScript refound gate / `last_city_lost` / cooldown exists anywhere —
+   grep-verified.)
+
+2. **Behavioral (no triggering event exists).** Even if routed through Rust, the autoplay surface
+   produces **terminal capital-capture eliminations** (one decisive `cities_lost=1` → game ends) or
+   zombie survival — NEVER the lose-then-refound churn (38 founds / 20 captures over 160t) the
+   gridded micro-harness produced and on which the 5/9→8/9 lift was measured. There is no
+   capture-then-refound-within-cooldown event for `cooldown_turns` to gate.
+
+**Empirical corroboration (arm A = cd=0, same build, T300, 4 seeds 1/3/6/7):**
+
+| seed | endT | outcome | final (cities, lost, captured) P0 / P1 | mid-game refound churn |
+|---|---|---|---|---|
+| 1 | 256 | in_progress | (1,0,0) / (1,0,0) | none — neither side ever lost a city |
+| 3 | 171 | victory | (2,0,1) / (2,1,0) | none — P1's only loss is the terminal (T171) victory-deciding capture |
+| 6 | 151 | victory | (0,1,0) / (4,0,1) | none — P0's only loss is terminal |
+| 7 | 68 | victory | (0,1,0) / (2,0,1) | none — P0's only loss is terminal |
+
+Across all 4 seeds: **`cities_lost` totals 0–1 per game, every loss is a terminal capital capture,
+zero refound-within-cooldown events.** This generalizes to all 10 gate seeds via evidence already
+on file — p1-29d's own 10-seed table (`20260529_185955`) shows the same pattern: `cities_lost=1`
+(terminal) in 8/10 or `cities_lost=0` (zombie) in 2/10, never refound churn. **Arm B (cd=5) was NOT
+run:** with zero qualifying events at cd=0, a cd=5 run is byte-identical *by logic* — it would
+"confirm no effect" hollowly (cannot distinguish "lever inert" from "lever fine, no event fired"),
+which is the inverse of the batch-attribution trap. The pre-flight clause authorizes stopping.
+
+**Verdict: cd=5 does NOT move the full-game gate — and not because it was tried and failed, but
+because it is architecturally inert on that surface.** The gridded 5/9→8/9 lift is **not retracted**
+(it is genuinely true on the gridded harness, which routes through `try_found_city` and where cd
+demonstrably changed outcomes) — it is **recontextualized as a surface mismatch**: the lever is
+correctly implemented and fires on the player-api/gridded path, but is simply not on the autoplay
+gate's code path. No live-game value is authored. Lever stays default 0.
 
 ## Honest result (2026-06-04)
 
@@ -88,20 +144,43 @@ robust micro-surface lift recorded; live authoring + p1-29d re-score deferred to
 batch. Per the brief, reporting the measured result + the tradeoff honestly (no degenerate value
 forced, no fabricated convergence) is the deliverable.
 
+## Terminal result (2026-06-04, full-game validation) — REVERT/leave-off, lever inert on the gate
+
+The deferred full-game validation ran and resolved both caveats above **against** authoring:
+the gridded 5/9→8/9 lift **does not transfer to the autoplay gate**, because the lever's Rust
+code path (`try_found_city`/`process_siege`) is **not on the autoplay turn loop** (founding/capture
+resolve in GDScript `dispatch_found_city`), AND the autoplay surface produces no refound-churn for
+the cooldown to gate (terminal capital-capture eliminations, not lose-then-refound). cd=5 is
+therefore **inert by construction on the p1-29d gate surface** — confirmed by code-reading and a
+4-seed cd=0 corroboration run (zero refound events). **Decision: do NOT author cd=5; lever stays
+default 0** (live JSON has no `refound_suppression` block — verified). This is a clean negative,
+not a balance failure: the mechanism is correct and fires on the gridded/player-api surface; it is
+simply not wired into the surface the gate measures. **Next lever to actually move p1-29d D1 is an
+architecture change** — route autoplay action-application (founding/capture) through the Rust
+`mc_turn::processor` so data-driven combat-balance levers like this one take effect on the gate
+surface — which is out of fence (Rust/GDScript, owned by a concurrent lane). File as the blocker.
+
 ## Source-of-truth rails
 
 - **Rust crate**: `mc-turn::processor` owns the refound gate + loss-turn stamp; `mc-core` owns
   the `RefoundSuppression` tunable.
 - **JSON path**: `public/games/age-of-dwarves/data/combat_balance.json` —
-  `refound_suppression.cooldown_turns` (NOT yet authored; default 0 governs).
+  `refound_suppression.cooldown_turns` (NOT authored; default 0 governs — confirmed-correct
+  after full-game validation showed the lever inert on the autoplay gate surface).
 
 ## Out of scope
 
-- Authoring a live-game cooldown value before multi-seed convergence is proven.
+- Authoring a live-game cooldown value before multi-seed convergence is proven (now RESOLVED:
+  rejected — the lever is architecturally inert on the gate surface; no value is justified).
 - The targeting lock itself (p1-29h, done) and the learned-controller track (p1-29f/g).
+- The autoplay→Rust action-application architecture change (the actual unblock for p1-29d D1) —
+  Rust/GDScript, out of fence; file as the next objective.
 
 ## References
 
 - `.project/objectives/p1-29h-stateful-tactical-decisiveness.md`
 - `.project/objectives/p1-29d-p1-survival.md`
 - `src/simulator/crates/mc-player-api/tests/p1_29h_gridded_elimination.rs` — diagnostic + sweep.
+- `src/game/engine/src/modules/ai/ai_turn_bridge_dispatch.gd:170` — `dispatch_found_city`, the
+  GDScript autoplay founding path that BYPASSES the Rust `try_found_city` refound gate.
+- `src/game/engine/src/autoloads/game_state.gd:224` — `_load_combat_balance_into` (runtime JSON load).