| id | title | priority | status | scope | owner | updated_at | evidence | assigned_by |
|---|---|---|---|---|---|---|---|---|
| p1-05-followup-shipwright-batch | Shipwright autoplay-batch sign-off — luxury variance + personality win balance | p2 | done | game1 | shipwright | 2026-06-23 | | shipwright |
Summary
Two p1-05 acceptance bullets cannot close inside p1-05's JSON-tuning-only scope. Both depend on upstream warcouncil work and require a fresh 10-seed (or 50-game) autoplay batch on apricot once that upstream lands:
- Luxury variance ≥ 3 distinct luxuries per seed. The un-gating experiment (`apricot-20260418_062941`) falsified the JSON-tuning hypothesis; the true blocker is game length (median domination ~T85), not the tech gate. p2-54d landed the `mc-ai::evaluator::score_tech` luxury-unlock scoring (211/211 mc-ai tests pass), but full revalidation requires a batch after `p0-08` domination tempo lengthens median game past T250.
- `personality_win_balance` PASS per p0-02 acceptance. Warcouncil owns the 50-game sample under `p0-02`; Shipwright signs off on the batch run + report once warcouncil delivers the personality tunes.
Both are autoplay-batch sign-off tasks, not new design or tuning. They are tracked separately so p1-05 itself can close on the in-scope JSON tuning that already shipped (pop_peak median 69, worker_improvements min 15, techs median 39, combats median 808, strategic_gate_rejections 1670).
Acceptance
- Post-`p0-08` 10-seed T300 batch on apricot shows per-seed `luxury_variance ≥ 3` (baseline 3,1,3,1,2,1,8,3,3,0 from `apricot-20260418_062941`).
- Post-`p0-02` 50-game batch shows `personality_win_balance` PASS per p0-02 acceptance.
- Both batch reports filed under `.project/reports/batches/`.
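A minimal worked check of the first bullet against its own baseline: the sketch below lists which seeds of `apricot-20260418_062941` already clear `luxury_variance ≥ 3`, reading "per-seed" as "every seed must pass". The helper name `failing_seeds` is illustrative only, not an existing tool under `tools/`.

```python
# Per-seed luxury_variance baseline from apricot-20260418_062941 (seeds 1..10),
# copied from the acceptance bullet.
BASELINE = [3, 1, 3, 1, 2, 1, 8, 3, 3, 0]
THRESHOLD = 3  # acceptance: luxury_variance >= 3 on every seed


def failing_seeds(variance_per_seed, threshold=THRESHOLD):
    """Return the 1-based seed numbers whose variance is below the threshold."""
    return [seed for seed, value in enumerate(variance_per_seed, start=1) if value < threshold]


if __name__ == "__main__":
    fails = failing_seeds(BASELINE)
    passed = len(BASELINE) - len(fails)
    print(f"{passed}/{len(BASELINE)} seeds pass; below threshold: {fails}")
```

On the baseline this reports 5/10 seeds passing, with seeds 2, 4, 5, 6 and 10 below 3.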
Dependencies
- `p0-08`: domination tempo (warcouncil).
- `p0-02`: personality win balance 50-game sample (warcouncil).
- `p2-54d`: `luxury_unlock_scores` AI bridge (landed, awaiting batch).
2026-06-04 batch sign-off attempt (bridge-cse lane, committed-build) → stays stub
Lane constraints: committed-build only (BUILD_REF=origin/main), zero source
edits, fence = this .md only. apricot.lan reachable; canonical checkout +
docker + systemd --user verified live.
origin/main SHA at check time: e9b14f1a9 (2026-06-03 23:28).
Ground-truth evidence: a committed-build smoke batch already on apricot,
built from origin/main (launcher.log: build_ref=origin/main), 10 seeds T300,
stamp 20260529_185955/smoke (game stamp 20260530_010036). Per-seed final-turn
rows of turn_stats.jsonl (P0 = winning human slot, P1 = pinned AI):
| seed | turn | outcome | victory | P0 luxuries | P1 luxuries |
|---|---|---|---|---|---|
| 1 | 63 | victory | domination | 1 | 0 |
| 2 | 44 | victory | domination | 5 | 0 |
| 3 | 153 | victory | domination | 7 | 0 |
| 4 | 100 | victory | domination | 5 | 0 |
| 5 | 300 | victory | score | 8 | 2 |
| 6 | 78 | victory | domination | 4 | 0 |
| 7 | 203 | victory | domination | 10 | 0 |
| 8 | 65 | victory | domination | 4 | 0 |
| 9 | 286 | in_progress | — | 16 | 6 |
| 10 | 56 | victory | domination | 3 | 0 |
Bullet 1 — luxury variance ≥ 3 distinct luxuries per seed: FAILS (precondition unmet + metric not extractable).
- Game-length precondition unmet. The bullet is explicitly gated on "after `p0-08` domination tempo lengthens median game past T250". Observed median game length on current `origin/main` is ~T89 (sorted: 44,56,63,65,78,100,153,203,286,300). Early domination victories (T44–T78, 5 of 10 seeds) still set the pace. The root-caused blocker (short games depress luxury accumulation), not the tech gate, persists in the committed build.
- Metric not extractable. The `luxuries` field in `turn_stats.jsonl` is a scalar count, not the set of distinct luxury types per seed that the acceptance demands. The baseline 3,1,3,1,2,1,8,3,3,0 came from a one-off un-gating analysis (`apricot-20260418_062941`); `grep -rn luxury_variance tools/` returns nothing, so no current report tool reproduces a distinct-type variance metric.
- Even read generously as winner scalar luxuries ≥ 3, seed 1 (1 luxury) fails, and domination wipes the loser to 0 luxuries in 8/10 seeds: the exact short-game failure mode the bullet's own dependency note predicted.
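The numeric claims in this bullet can be rechecked straight from the table above. The sketch below is a hypothetical helper, not an existing report tool: it transcribes the table by hand and, following the table caption, treats P0 as the winner of every resolved seed. Like the text, it counts the in_progress seed 9 at its last recorded turn.

```python
from statistics import median

# Final-turn rows transcribed from the 20260529_185955/smoke table above:
# (seed, end_turn, outcome, p0_luxuries, p1_luxuries).
# Per the table caption, P0 is the winning slot on every resolved seed.
ROWS = [
    (1, 63, "victory", 1, 0),
    (2, 44, "victory", 5, 0),
    (3, 153, "victory", 7, 0),
    (4, 100, "victory", 5, 0),
    (5, 300, "victory", 8, 2),
    (6, 78, "victory", 4, 0),
    (7, 203, "victory", 10, 0),
    (8, 65, "victory", 4, 0),
    (9, 286, "in_progress", 16, 6),
    (10, 56, "victory", 3, 0),
]
REQUIRED_MEDIAN_TURN = 250  # precondition: median game past T250
LUXURY_THRESHOLD = 3


def median_end_turn(rows):
    """Median final turn; an in_progress seed counts at its last recorded turn."""
    return median(row[1] for row in rows)


def winner_below_threshold(rows, threshold=LUXURY_THRESHOLD):
    """Resolved seeds whose winner (P0 here) holds fewer than `threshold` luxuries."""
    return [seed for seed, _turn, outcome, p0_lux, _p1_lux in rows
            if outcome == "victory" and p0_lux < threshold]


def loser_wiped_count(rows):
    """Number of seeds where P1 ends with 0 luxuries."""
    return sum(1 for row in rows if row[4] == 0)


if __name__ == "__main__":
    m = median_end_turn(ROWS)
    print(f"median end turn T{m:g}; precondition met: {m > REQUIRED_MEDIAN_TURN}")
    print(f"winner below {LUXURY_THRESHOLD} luxuries: seeds {winner_below_threshold(ROWS)}")
    print(f"P1 wiped to 0 luxuries: {loser_wiped_count(ROWS)}/{len(ROWS)} seeds")
```

It reports a median of T89 (precondition not met), seed 1 as the only resolved seed whose winner holds fewer than 3 luxuries, and P1 at 0 luxuries in 8 of 10 seeds.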
Bullet 2 — personality_win_balance PASS per p0-02: already PASSED upstream (no new batch needed).
- p0-02 is `status: done`. Its acceptance carries the signed-off 50-game sample (p0-02 line 69, 2026-04-18, post-p0-26 binary, stamps `apricot-20260418_224038`–`224050`): ironhold 8/10, goldvein 9/10, blackhammer 9/10, deepforge 8/10, runesmith 9/10. Every clan wins ≥1/10 when pinned, the spread is 80–90%, no clan is shut out, and none is dominant. That IS the `personality_win_balance` sample this bullet's dependency points at.
- A new committed-build run cannot improve on this: the default `smoke` mode is a 2-player game (`meta.json` `player_clans: {"0":"", "1":"<clan>"}`) where the human slot wins via domination and `winner_personality` is empty, so smoke does not surface a multi-clan win distribution. The pinned-clan / matchup-grid batches are the correct harness and were already run + signed off under p0-02.
- This bullet's evidence therefore exists and passes; it simply lives in p0-02, not in a batch I needed to re-run from this lane.
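The balance criteria quoted above (every clan wins ≥1/10 when pinned, an 80–90% spread, no clan shut out) reduce to simple arithmetic over the five pinned-clan results. This sketch only restates those quoted numbers; it does not reproduce p0-02's actual acceptance rule, which is not quoted here, and `summarize` is a hypothetical name.

```python
# Pinned-clan wins out of 10 games each, as quoted from the signed-off p0-02 sample.
WINS = {"ironhold": 8, "goldvein": 9, "blackhammer": 9, "deepforge": 8, "runesmith": 9}
GAMES_PER_CLAN = 10


def summarize(wins, games_per_clan=GAMES_PER_CLAN):
    """Total sample size, shut-out clans, and min/max pinned win rate."""
    rates = {clan: count / games_per_clan for clan, count in wins.items()}
    return {
        "total_games": games_per_clan * len(wins),
        "shut_out": sorted(clan for clan, count in wins.items() if count == 0),
        "min_rate": min(rates.values()),
        "max_rate": max(rates.values()),
    }


if __name__ == "__main__":
    print(summarize(WINS))
```

For the quoted sample this gives 50 games, no shut-out clan, and pinned win rates between 0.8 and 0.9.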
Bullet 3 — file both reports under .project/reports/batches/: OUT OF FENCE.
- This lane's strict fence is the two follow-up `.md` files only; writing under `.project/reports/batches/` is outside it. Even if bullets 1 and 2 both passed, this bullet could not be satisfied from this lane.
Conclusion: status remains stub. Bullet 2's underlying gate is met
(via p0-02), but bullet 1 genuinely misses (median game ~T89, far below the
required T250; the distinct-luxury-variance metric has no extraction path), and
bullet 3 is out of fence. Per feedback_balance_philosophy, the luxury miss is an
outcome of game-length tempo (short domination games), not a tunable this lane
may touch. Closing it requires p0-08-class tempo lengthening to actually move
the median past T250 AND a luxury_variance extraction tool, neither of which
exists on origin/main @ e9b14f1a9.
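What the missing extraction tool would need to do is small once the data exists. The sketch below is entirely hypothetical: it assumes a turn record carrying a `seed` and a `luxury_types` list, fields that `turn_stats.jsonl` does not have today (it only records a scalar `luxuries` count). It also assumes "distinct luxuries per seed" means the union of types seen over the whole game; a final-turn-only reading would change the aggregation.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Set

# HYPOTHETICAL schema: one JSON object per line with "seed" and a "luxury_types"
# list. turn_stats.jsonl does not carry such a list today (only a scalar
# "luxuries" count), which is why no current tool can compute this metric.


def distinct_luxuries_per_seed(lines: Iterable[str]) -> Dict[int, int]:
    """Count distinct luxury types seen per seed across all turns (assumed reading)."""
    seen: Dict[int, Set[str]] = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the stream
        record = json.loads(line)
        seen[record["seed"]].update(record.get("luxury_types", []))
    return {seed: len(types) for seed, types in sorted(seen.items())}


def luxury_variance_passes(per_seed: Dict[int, int], threshold: int = 3) -> bool:
    """True only if there is data and every seed reaches the threshold."""
    return bool(per_seed) and all(count >= threshold for count in per_seed.values())


if __name__ == "__main__":
    # Made-up sample records; luxury names are placeholders, not game data.
    sample = [
        '{"seed": 1, "turn": 10, "luxury_types": ["silk", "wine"]}',
        '{"seed": 1, "turn": 11, "luxury_types": ["wine", "gems"]}',
        '{"seed": 2, "turn": 10, "luxury_types": ["silk"]}',
    ]
    per_seed = distinct_luxuries_per_seed(sample)
    print(per_seed, luxury_variance_passes(per_seed))
```

Returning a plain per-seed count keeps the pass/fail rule separate, so the same extractor could feed either a strict "every seed" gate or a looser summary.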
2026-06-04 collect-and-analyze sweep (bridge-cse lane, fresh committed-build smoke batch) → stays stub
Re-verification, not re-run. A fresh committed-build smoke batch completed on
apricot this session (~/.cache/mc-batches/20260604_011524/smoke,
completion.marker present). Launcher: build_ref=origin/main, detached HEAD
e9b14f1a9, built_sha=e9b14f1a9, 10 seeds T300, games stamped 20260604_082815,
finished 2026-06-04 08:28 UTC. This is the same SHA and same smoke harness the
2026-06-04 attempt above analyzed; it re-confirms that verdict on fresh seeds rather
than reconstructing a dropped analysis.
Per-seed final turn_stats.jsonl (P0 = human slot, P1 = pinned AI clan):
| seed | turn | outcome | victory | winner | winner pop_peak | P0 lux | P1 lux | P1 clan |
|---|---|---|---|---|---|---|---|---|
| 1 | 300 | victory | score | P1 | 69 | 1 | 7 | ironhold |
| 2 | 201 | victory | domination | P0 | 139 | 13 | 6 | ironhold |
| 3 | 179 | victory | domination | P0 | 62 | 9 | 3 | ironhold |
| 4 | 300 | victory | score | P1 | 77 | 10 | 9 | ironhold |
| 5 | 45 | victory | domination | P1 | 16 | 0 | 1 | tinkersmith |
| 6 | 300 | victory | score | P1 | 36 | 9 | 4 | runesmith |
| 7 | 300 | victory | score | P1 | 70 | 12 | 8 | tinkersmith |
| 8 | 136 | victory | domination | P0 | 28 | 4 | 0 | ironhold |
| 9 | 214 | in_progress | — | — | — | 13 | 4 | blackhammer |
| 10 | 300 | victory | score | P1 | 75 | 3 | 10 | ironhold |
Bullet 1 — luxury variance ≥ 3 distinct luxuries per seed: STILL FAILS.
- Precondition still unmet. The bullet is gated on "after `p0-08` lengthens median game past T250." This batch's end turns sort to 45,136,179,201,214,300,300,300,300,300 (median ~T257). That median is not evidence that p0-08 landed: it comes from the same `e9b14f1a9` build as the prior attempt (which saw median ~T89 on its seeds), so the swing is stochastic MCTS variance across seeds, not a tempo property of the build (per feedback_batch_attribution_discipline: one batch's median is not a build property, and same-SHA divergence is itself the proof). p0-08 has not landed in `origin/main`.
- Metric still not extractable. `grep -rn luxury_variance tools/` returns nothing (rc=1, re-verified 2026-06-04). No tool reproduces the distinct-luxury-type variance the acceptance demands; the `luxuries` field in `turn_stats.jsonl` is a scalar count.
- Even read generously as scalar luxuries ≥ 3, the bullet misses: seed 5's winner (P1) holds only 1 luxury and seed 9 never resolves (in_progress); on a per-player read, seed 1's P0 (1) and seed 8's P1 (0) also fall short.
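The attribution argument can be made concrete with the two batches this file attributes to `e9b14f1a9`. The sketch below transcribes the end turns from both tables and the winning slot's luxuries from the 20260604 table; all names are illustrative, and with only 10 seeds per batch the median itself is a noisy statistic.

```python
from statistics import median

# End turns for seeds 1..10 of the two smoke batches the text attributes to e9b14f1a9.
END_TURNS = {
    "20260529_185955": [63, 44, 153, 100, 300, 78, 203, 65, 286, 56],
    "20260604_011524": [300, 201, 179, 300, 45, 300, 300, 136, 214, 300],
}
# 20260604 batch: luxuries held by the winning slot per seed; None = in_progress.
WINNER_LUX_20260604 = [7, 13, 9, 9, 1, 4, 8, 4, None, 10]
LUXURY_THRESHOLD = 3


def winner_read(winner_lux, threshold=LUXURY_THRESHOLD):
    """Split seeds into (winner below threshold, unresolved) on the scalar winner read."""
    below = [s for s, lux in enumerate(winner_lux, start=1) if lux is not None and lux < threshold]
    unresolved = [s for s, lux in enumerate(winner_lux, start=1) if lux is None]
    return below, unresolved


if __name__ == "__main__":
    medians = {stamp: median(turns) for stamp, turns in END_TURNS.items()}
    for stamp, m in medians.items():
        print(f"{stamp}: median end turn T{m:g}")
    spread = max(medians.values()) - min(medians.values())
    print(f"same-build median gap: {spread:g} turns")
    print("winner read (below, unresolved):", winner_read(WINNER_LUX_20260604))
```

It prints medians of T89 and T257, a 168-turn gap on one build, and flags seed 5 (winner holds 1 luxury) plus the unresolved seed 9 on the winner read.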
Bullet 2 — personality_win_balance PASS per p0-02: still satisfied upstream.
`p0-02` re-verified `status: done` (2026-06-04). Its signed-off 50-game pinned-clan sample is the `personality_win_balance` evidence this bullet points at; smoke mode cannot reproduce it (when the P0 human slot wins, `winner_personality` is empty, e.g. seeds 2, 3, 8 here). The correct harness ran + signed off under p0-02.
Bullet 3 — file reports under .project/reports/batches/: STILL OUT OF FENCE.
- This lane edits only the two follow-up `.md` files.
Conclusion: status remains stub. Bullet 2's gate is met (via p0-02); bullet 1
genuinely misses (no p0-08 tempo in main; no distinct-luxury-variance extraction path;
even the scalar read fails on seeds 1, 5, 9); bullet 3 is out of fence. The miss is an
outcome of game-length tempo and missing tooling — not a tunable this lane may touch
(feedback_balance_philosophy).
Out of scope
- New JSON tuning passes — p1-05 closed those.
- New AI scoring logic — p2-54d closed the luxury_unlock_scores path.
True state — 2026-06-04 gap analysis
Verified: stub. Re-verified against today's apricot batch (20260604_011524, smoke, e9b14f1a9). Bullet 1 (luxury variance ≥3 distinct/seed) FAILS: precondition p0-08 (domination tempo) is not in main; seed 5's winner holds only 1 luxury; no extraction tool exists. Bullet 2 (personality_win_balance) is satisfied upstream (p0-02 done). Bullet 3 (report filing) is out of fence. Path forward: blocked until p0-08 lands its tempo change in main; then run a non-smoke batch and score distinct luxuries per seed. Blockers: p0-08 must land in main. Demo gate: full-game-only; this is a balance sign-off, not demo-critical. Effort: S (once p0-08 lands).