docs(objectives): 📝 Update validation section in RL divergence mining objectives to reflect current state and add batch analysis details

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-27 21:15:13 -07:00 · 32a694eab7
commit 32a694eab7
parent b7353d43e2
1 changed files with 63 additions and 13 deletions
--- a/.project/objectives/p1-29e-rl-divergence-mining.md
+++ b/.project/objectives/p1-29e-rl-divergence-mining.md
@@ -111,20 +111,70 @@ action priors (`action_prior_with_context`), p1-29d tuned combat damage
  min defenders → warrior; multi-city unaffected → warrior. `cargo test -p
  mc-ai --lib`: 265 pass.

-## Validation (before/after autoplay batch)
+## Validation (before/after autoplay batch) — GATE NOT MET

-Baseline: apricot batch `20260516_183534` — P1 buildings = 0 in 10/10,
-`tier_peak = 1` in 10/10, 0/10 gate.
+Local 10-seed T300 batch on the patched build:
+`.local/batches/p1_29e_after` (fresh GDExtension rebuild from working tree).

-After (this patch): _<pending — filled by the validation batch below>_
+**Headline (do not be fooled):** vs the stale apricot baseline
+`20260516_183534`, P1 `tier_peak` rose 1 → 2-5 in **10/10** seeds. This is
+**NOT attributable to this patch** — it is main-branch drift. Three facts
+establish that:

-```
-PARALLEL=4 bash tools/autoplay-batch.sh 10 300 .local/batches/p1_29e_after
-# analyze: P1 buildings_max > 0 ?  P1 tier_peak >= 2 ?  P1 median survival turn ?
-```
+1. `tier_peak` is defined as *highest tech-era researched*
+   (`turn_processor.gd::_player_tier_peak`; mirrored in
+   `auto_play.gd`). It is research-driven, NOT building- or unit-driven.
+2. This patch only adds **buildings**. P1 completed **zero buildings** in
+   all 10 seeds (`player_stats.buildings` max = 0; only `owner=0` appears in
+   `city_building_completed`). The patch therefore changed nothing that a
+   research metric could reflect.
+3. The baseline `20260516_183534` is from a *different commit*; comparing
+   against it conflates this patch with all intervening main-branch changes.

-Per p1-29d iteration discipline: any directional movement (P1 builds >0
-buildings; survival-turn or tier_peak rises) confirms the lever direction
-even if the full ≥7/10 gate does not land in one pass. If the gate is still
-0–2/10, this objective stays `partial` and the next iteration targets the
-production-scale handicap (candidate 3), not another priors tweak.
+So the apparent improvement is a measurement artifact of comparing against a
+stale-commit baseline. **The completion gate (a candidate validated by a
+before/after batch showing the metric move *because of the candidate*) is NOT
+met.**
+
+### What the fresh batch actually revealed (more valuable than the patch)
+
+Current-main P1 behaviour differs from the p1-29d baseline narrative:
+- P1 reaches `tier_peak` 2-5 by **pure research** (techs 9-35) — the old
+  "P1 stuck at tier_peak=1" symptom is **already gone on current main**.
+- P1 still loses its capital in 8/10 seeds (eliminated T44-100) with
+  `kills=1-10`, `units_lost=1-4`: it fights but loses.
+- Survivor seeds 5 and 9 (T300 and T272 respectively, 1 city each): `mil=0`,
+  `buildings=0`, `pop=17/33`, `techs=31-35`. P1 researches to era 5 but builds
+  **nothing material** for 250+ turns. Possible production stall worth its own
+  investigation (snapshot-timing artifact vs genuine stall; unconfirmed).
+
+### Why the patch did not demonstrably help
+
+The break-out gate `own_mil >= SOLE_CITY_ECON_MIN_DEFENDERS` (= 2) requires the
+sole-city AI to hold ≥2 standing non-founder units at decision time. P1's
+`mil` snapshot is 0 at every recorded turn in 10/10 seeds (it fights with
+short-lived units that appear and die between snapshots). Whether the gate
+ever fired is unconfirmed (the engine emits no production-queue event to
+detect it from batch artifacts). Either way P1 completed 0 buildings with the
+patch applied, so the patch had no observable effect, and the `own_mil>=2`
+floor may be exactly wrong for the weakest player.
+
+### Honest status & next steps
+
+- **Gate: NOT MET.** No metric movement attributable to this patch.
+- The patch is gated to `sole_city_threatened` and fully unit-tested
+  (265 mc-ai tests green), so it is safe in-tree, but **unvalidated** — the
+  consuming p1-29c/29d worker should validate or revert it.
+- **Remaining attribution step (deferred due to host load):** a controlled
+  before/after on the *same* fresh build (HEAD vs HEAD+patch, same 10 seeds)
+  is the only clean way to attribute (or refute) any effect. Held while the
+  host runs ≥20 concurrent `godot-bin` processes (host guard); run once load drops:
+  ```
+  # baseline = revert the two production.rs edits, rebuild, run; then re-apply
+  PARALLEL=3 bash tools/autoplay-batch.sh 10 300 .local/batches/p1_29e_before
+  ```
+- **Reframe for the next iteration:** the failure regime on current main is
+  *survival with no military* and a possible *production stall*, not the old
+  "military-spam, no economy". Re-baseline p1-29c/29d against current main
+  before further patch work; the `tier_peak=1` symptom they target may already
+  be resolved.
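
Note on the deferred HEAD vs HEAD+patch comparison: the attribution argument rests on pairing results by seed across two batches built from the same commit. Below is a minimal sketch of that pairing. It assumes the per-seed P1 metric (for example `player_stats.buildings` max) has already been extracted from each batch into a plain dict keyed by seed. The function names and the sample values are hypothetical and are not taken from the batch tooling.

```python
from typing import Dict, List, Tuple


def paired_deltas(before: Dict[int, int], after: Dict[int, int]) -> List[Tuple[int, int]]:
    """Return (seed, after - before) for every seed present in both batches."""
    shared = sorted(before.keys() & after.keys())
    if not shared:
        raise ValueError("batches share no seeds; a paired comparison is impossible")
    return [(seed, after[seed] - before[seed]) for seed in shared]


def seeds_improved(deltas: List[Tuple[int, int]]) -> int:
    """Count seeds where the metric strictly increased."""
    return sum(1 for _, delta in deltas if delta > 0)


# Hypothetical values for illustration only: P1 buildings per seed.
before = {seed: 0 for seed in range(1, 11)}
after = {seed: 0 for seed in range(1, 11)}
after[3] = 2  # pretend seed 3 completed two buildings with the patch applied

deltas = paired_deltas(before, after)
improved = seeds_improved(deltas)
print(f"{improved}/{len(deltas)} seeds improved")
```

Because both dicts come from the same build and the same seeds, any nonzero delta can only come from the patch or from run-to-run nondeterminism, which is the property the stale-baseline comparison lacks.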