docs(p0-26b): record post-port tier_peak batch; bullet 5 stays open (no baseline)

Ran the 10-seed T300 autoplay smoke batch on apricot against the committed build (origin/main @ 9ce1269c8, the live pick_research_via_bridge dispatch). Recorded the per-seed tier_peak table as cited evidence: all 5 clans present, all games resolved to victory with both sides active (seed5's 37-turn win is a legitimate tinkersmith domination rush), and tier_peak spans T1-T6 with no collapse. This is a healthy, clan-diverse post-port distribution.

Bullet 5 remains [ ] / status partial: the acceptance is comparative ("unchanged-or-better vs the pre-port baseline") and no recorded pre-port baseline exists. The nearest pre-port SHA is confounded by ~3 weeks of unrelated changes; the only confound-free before/after (port commits reverted) isn't reachable by the apricot launcher (forge-fetch only, revert branch is ACS-unpushed). Decision-correctness is already pinned by the bullet-4 parity test (8/8).

tier_peak was extracted directly from player_stats[].tier_peak because autoplay-report.py aborted on a pre-existing autoplay-validate.py bug (unhashable type: list), which was left unfixed (out of fence).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 21:23:20 -07:00 · 2026-06-04 21:23:20 -07:00 · d1b7ff8efc
commit d1b7ff8efc
parent 4b0b661907
1 changed file with 58 additions and 13 deletions
--- a/.project/objectives/p0-26b-pick-research-rust-port.md
+++ b/.project/objectives/p0-26b-pick-research-rust-port.md
@@ -72,14 +72,59 @@ updated off `missing`. Audited and re-scored this session:
    (`cargo test -p mc-ai --test pick_research_parity`, 2026-06-04): all 5 clans
    open on the expected pillar, ≥3 distinct openers, behind-clan catch-up flip,
    availability filter.
- **Bullet 5 (regression batch) NOT verified.** Needs a 10-seed T300 autoplay
-  batch comparing tier_peak vs the pre-port baseline. Deferred: requires the
-  main-tree path + `tools/autoplay-batch.sh`, and apricot's Godot was occupied
-  by the concurrent balance lane this session. **Resume:** run the batch once
-  apricot is free; if tier_peak distributions are unchanged-or-better, mark
-  bullet 5 ✓ and flip status → done. Workspace was green (`cargo check
-  --workspace` exit 0) before and after this audit; this session made NO source
-  changes to p0-26b — only the objective-file re-score.
+- **Bullet 5 (regression batch) — post-port batch RUN, but no rigorous
+  before/after, so bullet stays open.** A 10-seed T300 autoplay smoke batch was
+  run on apricot against the committed build (`origin/main` @ `9ce1269c8`, the
+  SHA carrying the live `pick_research_via_bridge` dispatch). Per-seed results
+  (tier_peak read directly from `player_stats[].tier_peak` in
+  `turn_stats.jsonl` — `autoplay-report.py` aborted on a pre-existing
+  `autoplay-validate.py` bug, `TypeError: unhashable type: 'list'`, out of
+  fence so not fixed):
+
+  | seed | turns | outcome | p0 tier_peak | p1 tier_peak | p1 clan | winner |
+  |---|---|---|---|---|---|---|
+  | 1 | 300 | victory | 6 | 6 | ironhold | ironhold |
+  | 2 | 207 | victory | 6 | 6 | runesmith | (p0) |
+  | 3 | 171 | victory | 6 | 6 | blackhammer | (p0) |
+  | 4 | 300 | victory | 6 | 6 | ironhold | ironhold |
+  | 5 | 37 | victory (domination) | 1 | 1 | tinkersmith | tinkersmith |
+  | 6 | 151 | victory | 2 | 3 | runesmith | runesmith |
+  | 7 | 68 | victory | 2 | 2 | tinkersmith | tinkersmith |
+  | 8 | 185 | victory | 4 | 3 | goldvein | (p0) |
+  | 9 | 167 | victory | 3 | 3 | blackhammer | (p0) |
+  | 10 | 300 | victory | 6 | 6 | ironhold | ironhold |
+
+  **Health read (no-regression signal):** all 5 clan personalities appear
+  (ironhold / runesmith / blackhammer / goldvein / tinkersmith); every game
+  resolved to a `victory` with both sides active — no stalls or one-sided
+  no-develop outcomes (seed5's 37-turn win is a legitimate tinkersmith
+  domination rush: both founded capitals, 18 techs researched, 106 combats,
+  p1 captured a city); tier_peak spans T1–T6 with no collapse to a single
+  value. tier_peak across all 20 player-instances: median 5.0, mean 4.2,
+  dist `{1:2, 2:3, 3:4, 4:1, 6:10}`.
+
+  **Why this does NOT close bullet 5 / flip to `done`.** The acceptance text is
+  *comparative* — "tier_peak distributions unchanged-or-better **vs the
+  pre-port baseline**." No recorded pre-port tier_peak baseline exists (checked
+  `.project/reports/simulation/{baseline,balance}/`, `experiment-log.md`, and
+  apricot `~/.cache/mc-batches/`; none predate the port). The nearest pre-port
+  SHA (`a8760bb50`, 2026-05-14, last commit before the delegation bridge
+  `f9197ba86`) is confounded by ~3 weeks of unrelated terrain/combat/economy/AI
+  changes, so a delta against it would not isolate the port. The only
+  confound-free "before" is current `main` with the two port commits
+  (`f9197ba86`, `431e58092`) reverted — but that revert would live on a branch
+  ACS does not push to the forge, and apricot's launcher only builds
+  forge-fetchable refs (`origin/main` / `BUILD_REF`), so it is not reachable in
+  this environment without extra plumbing — disproportionate for a bullet the
+  objective itself tags verification-only. Note also that tier_peak here is
+  dominated by game *length* (every 300-turn game hits the T6 cap; short
+  games cap lower), so even a clean before/after would be a coarse signal;
+  the decision-correctness of the port is already pinned by bullet-4's parity
+  test (8/8, all 5 clans → identical tech ordering → identical researched
+  techs → identical tier_peak by construction).
+
+  Workspace was green before and after; this session made NO source changes to
+  p0-26b — only this objective-file re-score plus the post-port batch.

 ## Non-goals

@@ -93,9 +138,9 @@ updated off `missing`. Audited and re-scored this session:
  in the Rust controller, the action-priority uplift and the actual research
  decision share one source and can be measured together.

-## True state — 2026-06-04 gap analysis
-**Verified:** 4/5. ✓ pick_tech (`mc-ai/src/evaluator.rs`), ✓ `GdAiController::pick_research` (`api-gdext/src/ai.rs`), ✓ `auto_play.gd::_pick_research` dispatch-only (zero scoring), ✓ parity test `pick_research_parity.rs` 8/8 green. ✗ bullet 5: 10-seed T300 regression batch not run.
-**Path forward:** run the 10-seed T300 autoplay batch on apricot vs current main; if tier_peak unchanged-or-better, flip to done.
-**Blockers:** none — ready (needs only an apricot batch + free host).
+## True state — 2026-06-04 gap analysis (updated 2026-06-05)
+**Verified:** 4/5. ✓ pick_tech (`mc-ai/src/evaluator.rs`), ✓ `GdAiController::pick_research` (`api-gdext/src/ai.rs`), ✓ `auto_play.gd::_pick_research` dispatch-only (zero scoring), ✓ parity test `pick_research_parity.rs` 8/8 green. ◑ bullet 5: 10-seed T300 post-port batch RUN (`9ce1269c8`, healthy clan-diverse distribution, evidence table above) but NO pre-port baseline to compare against — bullet stays open.
+**Path forward:** a confound-free flip to `done` needs a port-isolating before/after — current `main` with `f9197ba86` + `431e58092` reverted, vs `main` — built through a forge-fetchable ref. Not reachable via the apricot launcher in this environment (revert branch is ACS-unpushed; launcher builds `origin/main`/`BUILD_REF` only). Decision-correctness is already covered by the bullet-4 parity test; closing bullet 5 is the only outstanding item and is verification-only.
+**Blockers:** no recorded pre-port tier_peak baseline; port-isolating revert-build not reachable by the apricot launcher (forge-only fetch).
 **Demo gate:** post-demo polish — research dispatch already works in-game; the batch is verification only.
-**Effort:** S (one batch).
+**Effort:** S (one isolating before/after build, blocked on launcher forge-fetch plumbing).
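
The summary statistics quoted in the health read (median, mean, and the value distribution over all 20 player-instances) can be rechecked from the per-seed table alone. The following is a minimal sketch, not part of the repository tooling: the `TIER_PEAKS` mapping and the `summarize` helper are hypothetical names introduced here, and the values are transcribed by hand from the table in the diff rather than parsed from `turn_stats.jsonl`, whose full schema is not shown in this commit.

```python
from collections import Counter
from statistics import mean, median

# (p0 tier_peak, p1 tier_peak) per seed, transcribed from the table in the diff.
# Hypothetical structure for illustration; the real data lives in
# player_stats[].tier_peak inside turn_stats.jsonl.
TIER_PEAKS = {
    1: (6, 6), 2: (6, 6), 3: (6, 6), 4: (6, 6), 5: (1, 1),
    6: (2, 3), 7: (2, 2), 8: (4, 3), 9: (3, 3), 10: (6, 6),
}


def summarize(peaks):
    """Flatten per-seed pairs into one list and compute summary statistics."""
    values = [v for pair in peaks.values() for v in pair]
    return {
        "n": len(values),
        "median": float(median(values)),
        "mean": round(mean(values), 2),
        # Sorted so the distribution reads from lowest to highest tier.
        "dist": dict(sorted(Counter(values).items())),
    }


summary = summarize(TIER_PEAKS)
print(summary)
```

Running the same helper on a future pre-port batch would give a like-for-like summary, though as the objective notes, tier_peak is dominated by game length, so any such comparison remains a coarse signal.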