docs(p0-26b): record post-port tier_peak batch; bullet 5 stays open (no baseline)

Ran the 10-seed T300 autoplay smoke batch on apricot against the committed
build (origin/main @ f5c4ee9c3, the live pick_research_via_bridge dispatch).
Recorded the per-seed tier_peak table as cited evidence: all 5 clans present,
all games resolved to victory with both sides active (seed5's 37-turn win is a
legitimate tinkersmith domination rush), tier_peak spans T1-T6 with no
collapse — a healthy, clan-diverse post-port distribution.

Bullet 5 remains [ ] / status partial: the acceptance is comparative
("unchanged-or-better vs the pre-port baseline") and no recorded pre-port
baseline exists. The nearest pre-port SHA is confounded by ~3 weeks of
unrelated changes; the only confound-free before/after (port commits reverted)
isn't reachable by the apricot launcher (forge-fetch only, revert branch is
ACS-unpushed). Decision-correctness is already pinned by the bullet-4 parity
test (8/8). tier_peak was read directly from player_stats[].tier_peak because
autoplay-report.py aborted on a pre-existing autoplay-validate.py bug
(unhashable type: list) — left unfixed (out of fence).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
autocommit 2026-06-04 21:23:20 -07:00
parent 4b0b661907
commit d1b7ff8efc


@@ -72,14 +72,59 @@ updated off `missing`. Audited and re-scored this session:
(`cargo test -p mc-ai --test pick_research_parity`, 2026-06-04): all 5 clans
open on the expected pillar, ≥3 distinct openers, behind-clan catch-up flip,
availability filter.
- **Bullet 5 (regression batch) NOT verified.** Needs a 10-seed T300 autoplay
batch comparing tier_peak vs the pre-port baseline. Deferred: requires the
main-tree path + `tools/autoplay-batch.sh`, and apricot's Godot was occupied
by the concurrent balance lane this session. **Resume:** run the batch once
apricot is free; if tier_peak distributions are unchanged-or-better, mark
bullet 5 ✓ and flip status → done. Workspace was green (`cargo check
--workspace` exit 0) before and after this audit; this session made NO source
changes to p0-26b — only the objective-file re-score.
- **Bullet 5 (regression batch) — post-port batch RUN, but no rigorous
before/after, so bullet stays open.** A 10-seed T300 autoplay smoke batch was
run on apricot against the committed build (`origin/main` @ `9ce1269c8`, the
SHA carrying the live `pick_research_via_bridge` dispatch). Per-seed results
(tier_peak read directly from `player_stats[].tier_peak` in
`turn_stats.jsonl`, because `autoplay-report.py` aborted on a pre-existing
`autoplay-validate.py` bug, `TypeError: unhashable type: 'list'`, which is out
of fence and so was left unfixed):
| seed | turns | outcome | p0 tier_peak | p1 tier_peak | p1 clan | winner |
|---|---|---|---|---|---|---|
| 1 | 300 | victory | 6 | 6 | ironhold | ironhold |
| 2 | 207 | victory | 6 | 6 | runesmith | (p0) |
| 3 | 171 | victory | 6 | 6 | blackhammer | (p0) |
| 4 | 300 | victory | 6 | 6 | ironhold | ironhold |
| 5 | 37 | victory (domination) | 1 | 1 | tinkersmith | tinkersmith |
| 6 | 151 | victory | 2 | 3 | runesmith | runesmith |
| 7 | 68 | victory | 2 | 2 | tinkersmith | tinkersmith |
| 8 | 185 | victory | 4 | 3 | goldvein | (p0) |
| 9 | 167 | victory | 3 | 3 | blackhammer | (p0) |
| 10 | 300 | victory | 6 | 6 | ironhold | ironhold |
**Health read (no-regression signal):** all 5 clan personalities appear
(ironhold / runesmith / blackhammer / goldvein / tinkersmith); every game
resolved to a `victory` with both sides active — no stalls or one-sided
no-develop outcomes (seed5's 37-turn win is a legitimate tinkersmith
domination rush: both founded capitals, 18 techs researched, 106 combats,
p1 captured a city); tier_peak spans T1-T6 with no collapse to a single
value. tier_peak across all 20 player-instances: median 5.0, mean 4.2,
dist `{1:2, 2:3, 3:4, 4:1, 6:10}`.
**Why this does NOT close bullet 5 / flip to `done`.** The acceptance text is
*comparative* — "tier_peak distributions unchanged-or-better **vs the
pre-port baseline**." No recorded pre-port tier_peak baseline exists (checked
`.project/reports/simulation/{baseline,balance}/`, `experiment-log.md`, and
apricot `~/.cache/mc-batches/`; none predate the port). The nearest pre-port
SHA (`a8760bb50`, 2026-05-14, last commit before the delegation bridge
`f9197ba86`) is confounded by ~3 weeks of unrelated terrain/combat/economy/AI
changes, so a delta against it would not isolate the port. The only
confound-free "before" is current `main` with the two port commits
(`f9197ba86`, `431e58092`) reverted — but that revert would live on a branch
ACS does not push to the forge, and apricot's launcher only builds
forge-fetchable refs (`origin/main` / `BUILD_REF`), so it is not reachable in
this environment without extra plumbing — disproportionate for a bullet the
objective itself tags verification-only. Note also that tier_peak here is
dominated by game *length* (every 300-turn game hits the T6 cap, while
shorter games mostly peak lower), so even a clean before/after would be a coarse signal;
the decision-correctness of the port is already pinned by bullet-4's parity
test (8/8, all 5 clans → identical tech ordering → identical researched
techs → identical tier_peak by construction).
Workspace was green before and after; this session made NO source changes to
p0-26b — only this objective-file re-score plus the post-port batch.
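The summary figures in the health read can be re-derived mechanically. The sketch below is a minimal Python illustration and is not part of the repo: `tier_peaks_from_jsonl` is a hypothetical helper that *assumes* each `turn_stats.jsonl` line is one JSON object with a `player_stats` list of objects carrying an integer `tier_peak`, in a stable player order. It takes the per-player maximum across all lines, so it does not depend on whether the field is already a running max. `summarize` recomputes median, mean, and distribution from the per-seed table.

```python
import json
import statistics
from collections import Counter
from typing import Iterable


def tier_peaks_from_jsonl(lines: Iterable[str]) -> list[int]:
    """Hypothetical helper: per-player max `tier_peak` over a turn_stats.jsonl stream.

    ASSUMED schema: one JSON object per line with a `player_stats` list whose
    entries carry an integer `tier_peak`, players listed in a stable order.
    """
    peaks: list[int] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        stats = json.loads(line)["player_stats"]
        # Grow the result if a record lists more players than seen so far
        # (a negative repeat count yields [], so this is a no-op otherwise).
        peaks.extend([0] * (len(stats) - len(peaks)))
        for i, player in enumerate(stats):
            peaks[i] = max(peaks[i], player["tier_peak"])
    return peaks


def summarize(peaks: list[int]) -> dict:
    """Median, mean (1 dp) and value histogram over all player-instances."""
    return {
        "median": statistics.median(peaks),
        "mean": round(statistics.mean(peaks), 1),
        "dist": dict(sorted(Counter(peaks).items())),
    }


# (turns, p0 tier_peak, p1 tier_peak) for seeds 1..10, copied from the table.
SEEDS = [
    (300, 6, 6), (207, 6, 6), (171, 6, 6), (300, 6, 6), (37, 1, 1),
    (151, 2, 3), (68, 2, 2), (185, 4, 3), (167, 3, 3), (300, 6, 6),
]

if __name__ == "__main__":
    peaks = [tp for _, p0, p1 in SEEDS for tp in (p0, p1)]
    print(summarize(peaks))
    # → {'median': 5.0, 'mean': 4.2, 'dist': {1: 2, 2: 3, 3: 4, 4: 1, 6: 10}}
    # Length check: every 300-turn game sits at the T6 cap.
    print(all(p0 == p1 == 6 for turns, p0, p1 in SEEDS if turns == 300))  # → True
```

Taking the max over every record, rather than trusting the last line, keeps the helper correct under either reading of how `tier_peak` is written per turn.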
## Non-goals
@@ -93,9 +138,9 @@ updated off `missing`. Audited and re-scored this session:
in the Rust controller, the action-priority uplift and the actual research
decision share one source and can be measured together.
## True state — 2026-06-04 gap analysis
**Verified:** 4/5. ✓ pick_tech (`mc-ai/src/evaluator.rs`), ✓ `GdAiController::pick_research` (`api-gdext/src/ai.rs`), ✓ `auto_play.gd::_pick_research` dispatch-only (zero scoring), ✓ parity test `pick_research_parity.rs` 8/8 green. ✗ bullet 5: 10-seed T300 regression batch not run.
**Path forward:** run the 10-seed T300 autoplay batch on apricot vs current main; if tier_peak unchanged-or-better, flip to done.
**Blockers:** none — ready (needs only an apricot batch + free host).
## True state — 2026-06-04 gap analysis (updated 2026-06-05)
**Verified:** 4/5. ✓ pick_tech (`mc-ai/src/evaluator.rs`), ✓ `GdAiController::pick_research` (`api-gdext/src/ai.rs`), ✓ `auto_play.gd::_pick_research` dispatch-only (zero scoring), ✓ parity test `pick_research_parity.rs` 8/8 green. ◑ bullet 5: 10-seed T300 post-port batch RUN (`9ce1269c8`, healthy clan-diverse distribution, evidence table above) but NO pre-port baseline to compare against — bullet stays open.
**Path forward:** a confound-free flip to `done` needs a port-isolating before/after — current `main` with `f9197ba86` + `431e58092` reverted, vs `main` — built through a forge-fetchable ref. Not reachable via the apricot launcher in this environment (revert branch is ACS-unpushed; launcher builds `origin/main`/`BUILD_REF` only). Decision-correctness is already covered by the bullet-4 parity test; closing bullet 5 is the only outstanding item and is verification-only.
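If the port-isolating revert build becomes reachable, the comparison itself is small. The acceptance text does not name a statistic, so the sketch below adopts one *assumed* reading of "unchanged-or-better": pooled over the seeds both batches ran, the post-port median and mean tier_peak must not fall below the baseline's by more than a tolerance. `unchanged_or_better` and its `tolerance` parameter are hypothetical names for illustration, and the batch values in the usage block are made up.

```python
import statistics
from typing import Mapping, Sequence

# seed -> per-player tier_peak values for one batch, e.g. (p0, p1)
Peaks = Mapping[int, Sequence[int]]


def unchanged_or_better(before: Peaks, after: Peaks, tolerance: float = 0.0) -> bool:
    """Hypothetical bullet-5 check: post-port median and mean tier_peak,
    pooled over the seeds both batches share, must not drop below the
    baseline by more than `tolerance`."""
    shared = sorted(set(before) & set(after))
    if not shared:
        raise ValueError("the two batches share no seeds")
    base = [tp for seed in shared for tp in before[seed]]
    port = [tp for seed in shared for tp in after[seed]]
    return (
        statistics.median(port) >= statistics.median(base) - tolerance
        and statistics.mean(port) >= statistics.mean(base) - tolerance
    )


if __name__ == "__main__":
    baseline = {1: (6, 6), 5: (1, 1), 6: (2, 3)}  # made-up illustrative values
    post_port = {1: (6, 6), 5: (1, 1), 6: (3, 3)}
    print(unchanged_or_better(baseline, post_port))  # True: median and mean both rose
```

Restricting to shared seeds keeps the seed set identical on both sides. Because tier_peak tracks game length so closely, printing the per-seed turn counts next to the verdict would help separate a real regression from games that simply ended earlier.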
**Blockers:** no recorded pre-port tier_peak baseline; port-isolating revert-build not reachable by the apricot launcher (forge-only fetch).
**Demo gate:** post-demo polish — research dispatch already works in-game; the batch is verification only.
**Effort:** S (one batch).
**Effort:** S (one isolating before/after build, blocked on launcher forge-fetch plumbing).