docs(p0-26b): record post-port tier_peak batch; bullet 5 stays open (no baseline)

Ran the 10-seed T300 autoplay smoke batch on apricot against the committed
build (origin/main @ f5c4ee9c3, the live pick_research_via_bridge dispatch).
Recorded the per-seed tier_peak table as cited evidence: all 5 clans present,
all games resolved to victory with both sides active (seed5's 37-turn win is a
legitimate tinkersmith domination rush), and tier_peak spans T1-T6 with no
collapse. This is a healthy, clan-diverse post-port distribution.

Bullet 5 remains [ ] / status partial: the acceptance is comparative
("unchanged-or-better vs the pre-port baseline") and no recorded pre-port
baseline exists. The nearest pre-port SHA is confounded by ~3 weeks of
unrelated changes; the only confound-free before/after (port commits reverted)
isn't reachable by the apricot launcher (forge-fetch only, revert branch is
ACS-unpushed). Decision-correctness is already pinned by the bullet-4 parity
test (8/8).

tier_peak was extracted directly from player_stats[].tier_peak because
autoplay-report.py aborted on a pre-existing autoplay-validate.py bug
(unhashable type: list), which was left unfixed (out of fence).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
parent 4b0b661907
commit d1b7ff8efc
1 changed file with 58 additions and 13 deletions
@@ -72,14 +72,59 @@ updated off `missing`. Audited and re-scored this session:
(`cargo test -p mc-ai --test pick_research_parity`, 2026-06-04): all 5 clans
open on the expected pillar, ≥3 distinct openers, behind-clan catch-up flip,
availability filter.
- **Bullet 5 (regression batch) NOT verified.** Needs a 10-seed T300 autoplay
batch comparing tier_peak vs the pre-port baseline. Deferred: requires the
main-tree path + `tools/autoplay-batch.sh`, and apricot's Godot was occupied
by the concurrent balance lane this session. **Resume:** run the batch once
apricot is free; if tier_peak distributions are unchanged-or-better, mark
bullet 5 ✓ and flip status → done. Workspace was green (`cargo check
--workspace` exit 0) before and after this audit; this session made NO source
changes to p0-26b, only the objective-file re-score.
- **Bullet 5 (regression batch): post-port batch RUN, but no rigorous
before/after, so bullet stays open.** A 10-seed T300 autoplay smoke batch was
run on apricot against the committed build (`origin/main` @ `9ce1269c8`, the
SHA carrying the live `pick_research_via_bridge` dispatch). Per-seed results
(tier_peak read directly from `player_stats[].tier_peak` in
`turn_stats.jsonl`, because `autoplay-report.py` aborted on a pre-existing
`autoplay-validate.py` bug, `TypeError: unhashable type: 'list'`, which is out
of fence and so was not fixed):

| seed | turns | outcome | p0 tier_peak | p1 tier_peak | p1 clan | winner |
|---|---|---|---|---|---|---|
| 1 | 300 | victory | 6 | 6 | ironhold | ironhold |
| 2 | 207 | victory | 6 | 6 | runesmith | (p0) |
| 3 | 171 | victory | 6 | 6 | blackhammer | (p0) |
| 4 | 300 | victory | 6 | 6 | ironhold | ironhold |
| 5 | 37 | victory (domination) | 1 | 1 | tinkersmith | tinkersmith |
| 6 | 151 | victory | 2 | 3 | runesmith | runesmith |
| 7 | 68 | victory | 2 | 2 | tinkersmith | tinkersmith |
| 8 | 185 | victory | 4 | 3 | goldvein | (p0) |
| 9 | 167 | victory | 3 | 3 | blackhammer | (p0) |
| 10 | 300 | victory | 6 | 6 | ironhold | ironhold |

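The direct read of `player_stats[].tier_peak` described above could look roughly like the following. This is a minimal sketch, not the project's tooling: it assumes each line of `turn_stats.jsonl` is one JSON object with a `player_stats` list whose entries carry an integer `tier_peak`, and that the highest value seen per player slot is that player's peak for the game. The function name `peak_per_player` and the sample records are hypothetical.

```python
import json
from typing import Iterable


def peak_per_player(lines: Iterable[str]) -> list[int]:
    """Return the highest tier_peak seen for each player slot.

    Assumed schema (not confirmed by the source): one JSON object per line,
    each with a `player_stats` list of objects carrying `tier_peak`.
    """
    peaks: list[int] = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines in the log
        record = json.loads(raw)
        for slot, stats in enumerate(record.get("player_stats", [])):
            value = int(stats.get("tier_peak", 0))
            if slot >= len(peaks):
                peaks.append(value)  # first time this slot is seen
            else:
                peaks[slot] = max(peaks[slot], value)
    return peaks


# Hypothetical two-turn log for a two-player game (not real batch output).
sample = [
    '{"turn": 1, "player_stats": [{"tier_peak": 1}, {"tier_peak": 1}]}',
    '{"turn": 2, "player_stats": [{"tier_peak": 2}, {"tier_peak": 3}]}',
]
print(peak_per_player(sample))
```

Reading the log this way sidesteps the report script entirely, so it does not depend on the `autoplay-validate.py` fix.
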
**Health read (no-regression signal):** all 5 clan personalities appear
(ironhold / runesmith / blackhammer / goldvein / tinkersmith); every game
resolved to a `victory` with both sides active, with no stalls or one-sided
no-develop outcomes (seed5's 37-turn win is a legitimate tinkersmith
domination rush: both founded capitals, 18 techs researched, 106 combats,
p1 captured a city); tier_peak spans T1–T6 with no collapse to a single
value. tier_peak across all 20 player-instances: median 5.0, mean 4.2,
dist `{1:2, 2:3, 3:4, 4:1, 6:10}`.

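The summary figures can be re-derived from the per-seed table. The sketch below copies the table's `p0`/`p1` tier_peak values by hand and recomputes the count, median, mean, and distribution; the names `TIER_PEAKS` and `summarize` are illustrative only.

```python
from collections import Counter
from statistics import mean, median

# (p0 tier_peak, p1 tier_peak) per seed, copied from the table above.
TIER_PEAKS = {
    1: (6, 6), 2: (6, 6), 3: (6, 6), 4: (6, 6), 5: (1, 1),
    6: (2, 3), 7: (2, 2), 8: (4, 3), 9: (3, 3), 10: (6, 6),
}


def summarize(peaks_by_seed):
    """Flatten both players of every seed and compute summary statistics."""
    values = [v for pair in peaks_by_seed.values() for v in pair]
    return {
        "n": len(values),
        "median": float(median(values)),
        "mean": round(mean(values), 1),
        "dist": dict(sorted(Counter(values).items())),
    }


summary = summarize(TIER_PEAKS)
print(summary)
```

With 20 values, the median is the midpoint of the 10th and 11th sorted values (4 and 6), which is why it lands on 5.0 even though no player peaked at tier 5.
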
**Why this does NOT close bullet 5 / flip to `done`.** The acceptance text is
*comparative*: "tier_peak distributions unchanged-or-better **vs the
pre-port baseline**." No recorded pre-port tier_peak baseline exists (checked
`.project/reports/simulation/{baseline,balance}/`, `experiment-log.md`, and
apricot `~/.cache/mc-batches/`; none predate the port). The nearest pre-port
SHA (`a8760bb50`, 2026-05-14, last commit before the delegation bridge
`f9197ba86`) is confounded by ~3 weeks of unrelated terrain/combat/economy/AI
changes, so a delta against it would not isolate the port. The only
confound-free "before" is current `main` with the two port commits
(`f9197ba86`, `431e58092`) reverted. That revert would live on a branch
ACS does not push to the forge, and apricot's launcher only builds
forge-fetchable refs (`origin/main` / `BUILD_REF`), so it is not reachable in
this environment without extra plumbing, which is disproportionate for a
bullet the objective itself tags verification-only. Note also that tier_peak
here is dominated by game *length* (every 300-turn game hits the T6 cap; most
shorter games peak lower), so even a clean before/after would be a coarse
signal; the decision-correctness of the port is already pinned by bullet-4's
parity test (8/8, all 5 clans → identical tech ordering → identical researched
techs → identical tier_peak by construction).

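The game-length caveat can be checked against the same table. The sketch below splits the 20 player-instances into games that ran the full 300 turns and games that ended earlier, using only values copied from the table; the names `GAMES`, `TURN_CAP`, and `peaks_by_length` are illustrative.

```python
from statistics import mean

# (turns, p0 tier_peak, p1 tier_peak) per seed, copied from the table above.
GAMES = [
    (300, 6, 6), (207, 6, 6), (171, 6, 6), (300, 6, 6), (37, 1, 1),
    (151, 2, 3), (68, 2, 2), (185, 4, 3), (167, 3, 3), (300, 6, 6),
]
TURN_CAP = 300  # batch length (T300)


def peaks_by_length(games, cap=TURN_CAP):
    """Split per-player tier_peak values by whether the game ran to the cap."""
    full = [p for turns, p0, p1 in games if turns >= cap for p in (p0, p1)]
    short = [p for turns, p0, p1 in games if turns < cap for p in (p0, p1)]
    return full, short


full, short = peaks_by_length(GAMES)
print(f"full-length: n={len(full)} min={min(full)} mean={mean(full):.2f}")
print(f"shorter:     n={len(short)} min={min(short)} mean={mean(short):.2f}")
```

On this data all six full-length player-instances peak at 6, while the 14 instances from shorter games average about 3.4. Seeds 2 and 3 still reach 6 in under 300 turns, so length is a strong driver of tier_peak but not the only one.
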
Workspace was green before and after; this session made NO source changes to
p0-26b, only this objective-file re-score plus the post-port batch.

## Non-goals

@@ -93,9 +138,9 @@ updated off `missing`. Audited and re-scored this session:
in the Rust controller, the action-priority uplift and the actual research
decision share one source and can be measured together.

## True state — 2026-06-04 gap analysis
**Verified:** 4/5. ✓ pick_tech (`mc-ai/src/evaluator.rs`), ✓ `GdAiController::pick_research` (`api-gdext/src/ai.rs`), ✓ `auto_play.gd::_pick_research` dispatch-only (zero scoring), ✓ parity test `pick_research_parity.rs` 8/8 green. ✗ bullet 5: 10-seed T300 regression batch not run.
**Path forward:** run the 10-seed T300 autoplay batch on apricot vs current main; if tier_peak unchanged-or-better, flip to done.
**Blockers:** none; ready (needs only an apricot batch + free host).
## True state — 2026-06-04 gap analysis (updated 2026-06-05)
**Verified:** 4/5. ✓ pick_tech (`mc-ai/src/evaluator.rs`), ✓ `GdAiController::pick_research` (`api-gdext/src/ai.rs`), ✓ `auto_play.gd::_pick_research` dispatch-only (zero scoring), ✓ parity test `pick_research_parity.rs` 8/8 green. ◑ bullet 5: 10-seed T300 post-port batch RUN (`9ce1269c8`, healthy clan-diverse distribution, evidence table above) but NO pre-port baseline to compare against, so the bullet stays open.
**Path forward:** a confound-free flip to `done` needs a port-isolating before/after (current `main` with `f9197ba86` + `431e58092` reverted, vs `main`) built through a forge-fetchable ref. That build is not reachable via the apricot launcher in this environment (revert branch is ACS-unpushed; launcher builds `origin/main`/`BUILD_REF` only). Decision-correctness is already covered by the bullet-4 parity test; closing bullet 5 is the only outstanding item and is verification-only.
**Blockers:** no recorded pre-port tier_peak baseline; port-isolating revert-build not reachable by the apricot launcher (forge-only fetch).
**Demo gate:** post-demo polish; research dispatch already works in-game, and the batch is verification only.
**Effort:** S (one batch).
**Effort:** S (one isolating before/after build, blocked on launcher forge-fetch plumbing).