diff --git a/.project/objectives/p0-26b-pick-research-rust-port.md b/.project/objectives/p0-26b-pick-research-rust-port.md
index 835447d1..704c625a 100644
--- a/.project/objectives/p0-26b-pick-research-rust-port.md
+++ b/.project/objectives/p0-26b-pick-research-rust-port.md
@@ -72,14 +72,59 @@ updated off `missing`. Audited and re-scored this session:
   (`cargo test -p mc-ai --test pick_research_parity`, 2026-06-04): all 5 clans
   open on the expected pillar, ≥3 distinct openers, behind-clan catch-up flip,
   availability filter.
-- **Bullet 5 (regression batch) NOT verified.** Needs a 10-seed T300 autoplay
-  batch comparing tier_peak vs the pre-port baseline. Deferred: requires the
-  main-tree path + `tools/autoplay-batch.sh`, and apricot's Godot was occupied
-  by the concurrent balance lane this session. **Resume:** run the batch once
-  apricot is free; if tier_peak distributions are unchanged-or-better, mark
-  bullet 5 ✓ and flip status → done. Workspace was green (`cargo check
-  --workspace` exit 0) before and after this audit; this session made NO source
-  changes to p0-26b — only the objective-file re-score.
+- **Bullet 5 (regression batch) — post-port batch RUN, but no rigorous
+  before/after, so bullet stays open.** A 10-seed T300 autoplay smoke batch was
+  run on apricot against the committed build (`origin/main` @ `9ce1269c8`, the
+  SHA carrying the live `pick_research_via_bridge` dispatch). Per-seed results
+  (tier_peak read directly from `player_stats[].tier_peak` in
+  `turn_stats.jsonl` — `autoplay-report.py` aborted on a pre-existing
+  `autoplay-validate.py` bug, `TypeError: unhashable type: 'list'`, out of
+  fence so not fixed):
+
+  | seed | turns | outcome | p0 tier_peak | p1 tier_peak | p1 clan | winner |
+  |---|---|---|---|---|---|---|
+  | 1 | 300 | victory | 6 | 6 | ironhold | ironhold |
+  | 2 | 207 | victory | 6 | 6 | runesmith | (p0) |
+  | 3 | 171 | victory | 6 | 6 | blackhammer | (p0) |
+  | 4 | 300 | victory | 6 | 6 | ironhold | ironhold |
+  | 5 | 37 | victory (domination) | 1 | 1 | tinkersmith | tinkersmith |
+  | 6 | 151 | victory | 2 | 3 | runesmith | runesmith |
+  | 7 | 68 | victory | 2 | 2 | tinkersmith | tinkersmith |
+  | 8 | 185 | victory | 4 | 3 | goldvein | (p0) |
+  | 9 | 167 | victory | 3 | 3 | blackhammer | (p0) |
+  | 10 | 300 | victory | 6 | 6 | ironhold | ironhold |
+
+  **Health read (no-regression signal):** all 5 clan personalities appear
+  (ironhold / runesmith / blackhammer / goldvein / tinkersmith); every game
+  resolved to a `victory` with both sides active — no stalls or one-sided
+  no-develop outcomes (seed5's 37-turn win is a legitimate tinkersmith
+  domination rush: both founded capitals, 18 techs researched, 106 combats,
+  p1 captured a city); tier_peak spans T1–T6 with no collapse to a single
+  value. tier_peak across all 20 player-instances: median 5.0, mean 4.2,
+  dist `{1:2, 2:3, 3:4, 4:1, 6:10}`.
+
+  **Why this does NOT close bullet 5 / flip to `done`.** The acceptance text is
+  *comparative* — "tier_peak distributions unchanged-or-better **vs the
+  pre-port baseline**." No recorded pre-port tier_peak baseline exists (checked
+  `.project/reports/simulation/{baseline,balance}/`, `experiment-log.md`, and
+  apricot `~/.cache/mc-batches/`; none predate the port). The nearest pre-port
+  SHA (`a8760bb50`, 2026-05-14, last commit before the delegation bridge
+  `f9197ba86`) is confounded by ~3 weeks of unrelated terrain/combat/economy/AI
+  changes, so a delta against it would not isolate the port. The only
+  confound-free "before" is current `main` with the two port commits
+  (`f9197ba86`, `431e58092`) reverted — but that revert would live on a branch
+  ACS does not push to the forge, and apricot's launcher only builds
+  forge-fetchable refs (`origin/main` / `BUILD_REF`), so it is not reachable in
+  this environment without extra plumbing — disproportionate for a bullet the
+  objective itself tags verification-only. Note also that tier_peak here is
+  dominated by game *length* (every 300-turn game hits the T6 cap; short
+  games cap lower), so even a clean before/after would be a coarse signal;
+  the decision-correctness of the port is already pinned by bullet-4's parity
+  test (8/8, all 5 clans → identical tech ordering → identical researched
+  techs → identical tier_peak by construction).
+
+  Workspace was green before and after; this session made NO source changes to
+  p0-26b — only this objective-file re-score plus the post-port batch.
 
 ## Non-goals
 
@@ -93,9 +138,9 @@ updated off `missing`. Audited and re-scored this session:
   in the Rust controller, the action-priority uplift and the actual research
   decision share one source and can be measured together.
 
-## True state — 2026-06-04 gap analysis
-**Verified:** 4/5. ✓ pick_tech (`mc-ai/src/evaluator.rs`), ✓ `GdAiController::pick_research` (`api-gdext/src/ai.rs`), ✓ `auto_play.gd::_pick_research` dispatch-only (zero scoring), ✓ parity test `pick_research_parity.rs` 8/8 green. ✗ bullet 5: 10-seed T300 regression batch not run.
-**Path forward:** run the 10-seed T300 autoplay batch on apricot vs current main; if tier_peak unchanged-or-better, flip to done.
-**Blockers:** none — ready (needs only an apricot batch + free host).
+## True state — 2026-06-04 gap analysis (updated 2026-06-05)
+**Verified:** 4/5. ✓ pick_tech (`mc-ai/src/evaluator.rs`), ✓ `GdAiController::pick_research` (`api-gdext/src/ai.rs`), ✓ `auto_play.gd::_pick_research` dispatch-only (zero scoring), ✓ parity test `pick_research_parity.rs` 8/8 green. ◑ bullet 5: 10-seed T300 post-port batch RUN (`9ce1269c8`, healthy clan-diverse distribution, evidence table above) but NO pre-port baseline to compare against — bullet stays open.
+**Path forward:** a confound-free flip to `done` needs a port-isolating before/after — current `main` with `f9197ba86` + `431e58092` reverted, vs `main` — built through a forge-fetchable ref. Not reachable via the apricot launcher in this environment (revert branch is ACS-unpushed; launcher builds `origin/main`/`BUILD_REF` only). Decision-correctness is already covered by the bullet-4 parity test; closing bullet 5 is the only outstanding item and is verification-only.
+**Blockers:** no recorded pre-port tier_peak baseline; port-isolating revert-build not reachable by the apricot launcher (forge-only fetch).
 **Demo gate:** post-demo polish — research dispatch already works in-game; the batch is verification only.
-**Effort:** S (one batch).
+**Effort:** S (one isolating before/after build, blocked on launcher forge-fetch plumbing).
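
Reviewer note: the summary figures in the "Health read" paragraph (median 5.0, mean 4.2, and the value distribution) can be re-derived from the per-seed table. The sketch below is one way to do that check, and is not the project's tooling. `TABLE_PEAKS` is copied from the table in the diff. `peaks_from_jsonl` and `summarize` are hypothetical helper names, and the record layout assumed for `turn_stats.jsonl` (one JSON object per line, with a `player_stats` list whose entries carry `tier_peak`) is inferred only from the `player_stats[].tier_peak` path quoted in the diff, so the real schema may differ.

```python
import json
from collections import Counter
from statistics import mean, median

# tier_peak pairs (p0, p1) for seeds 1-10, copied from the evidence table.
TABLE_PEAKS = [(6, 6), (6, 6), (6, 6), (6, 6), (1, 1),
               (2, 3), (2, 2), (4, 3), (3, 3), (6, 6)]


def peaks_from_jsonl(lines):
    """Return the highest tier_peak seen per player index in one game's log.

    ASSUMPTION: each line is a JSON object with a `player_stats` list whose
    entries carry an integer `tier_peak`. The real schema may differ.
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        for idx, stats in enumerate(record.get("player_stats", [])):
            best[idx] = max(best.get(idx, 0), stats["tier_peak"])
    return [best[i] for i in sorted(best)]


def summarize(peaks):
    """Median, mean, and value counts over a flat list of tier_peak values."""
    return {
        "median": float(median(peaks)),
        "mean": round(mean(peaks), 2),
        "dist": dict(sorted(Counter(peaks).items())),
    }


# Flatten the 10 (p0, p1) pairs into the 20 player-instances and summarize.
flat = [peak for pair in TABLE_PEAKS for peak in pair]
summary = summarize(flat)
print(summary)
# {'median': 5.0, 'mean': 4.2, 'dist': {1: 2, 2: 3, 3: 4, 4: 1, 6: 10}}
```

The printed values agree with the figures quoted in the diff. If a pre-port batch ever becomes buildable, the same `summarize` call on both batches would give the side-by-side comparison the acceptance text asks for, subject to the caveat in the diff that tier_peak is dominated by game length.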