magicciv/.project/objectives/p1-29d-p1-survival.md at 0349a4e8fd9ec0b036f0862e324b76bb20a26e5a

autocommit 9f1a28b4e8 docs(p1-29i): 📊 Full-game validation — refound lever inert on autoplay gate; do NOT author cd=5

Ran the deferred full-game validation as a controlled same-build before/after:
one GDExtension built once on apricot from pinned SHA 3d83f4781 (carries the
lever); combat_balance.json is runtime-loaded, so only cooldown_turns would
change between arms.

Pre-flight killed the batch before it ran — cd=5 is inert by construction on the
p1-29d autoplay gate surface, for two independent reasons:

1. Architectural: autoplay applies founding via GDScript dispatch_found_city,
   never calling the Rust try_found_city/process_siege where the refound gate
   lives (same class as process_science bypassed by GdTechWeb). Lever cannot fire.
2. Behavioral: autoplay produces terminal capital-capture eliminations, never
   refound churn — no event for cooldown_turns to gate (4-seed cd=0 run shows
   cities_lost 0–1 per game, all terminal; corroborated by the 10-seed
   20260529_185955 table).

Arm B (cd=5) NOT run: byte-identical by logic (zero qualifying events) — a hollow
"no effect" confirmation, the inverse of the batch-attribution trap. The pre-flight
clause authorizes stopping.

Verdict: do NOT author cd=5. combat_balance.json left at default 0 (the gridded
5/9→8/9 lift is real on the gridded harness but does NOT transfer — recontextualized
as a surface mismatch, NOT retracted). p1-29h elim bullet scoped to the gridded
surface. p1-29d D1 re-pointed: no longer gated on the refound lever (it does not
unblock D1); real unblock is the autoplay→Rust action-application architecture gap
(out of fence).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-04 18:24:28 -07:00


title

priority

status

scope

owner

updated_at

evidence

blocked_by

p1-29d-p1-survival

P1 (trailing AI) eliminated or stalled before T100 in 10/10 seeds — upstream of action priority

partial

game1

warcouncil

2026-06-03

2026-06-03 (decisiveness investigation, operator-forced attempt): ROOT-CAUSED the 'indecisive war' to a NAMED, inherently-STATEFUL capability that mc-ai's tactical layer cannot host. The decisiveness the juiced slot-0 supplies is army-level TARGET-LOCK + multi-turn ATTACK-COMMITMENT HYSTERESIS + PRESS-ON-CAPTURE, implemented in src/game/engine/src/entities/auto_play.gd:1109-1133 (`_attack_commitment_turns = 5` holds the whole army on `_locked_target` through scoring wobble; refreshes the 5-turn commit when the target falls so a capture is immediately pressed into the next city → elimination; `_target_stuck_turns >= 20` re-targets). mc-ai's decide_military_action (movement.rs:524-689) is PER-UNIT GREEDY: each unit independently picks nearest-city/nearest-stray every turn with no army-wide locked target and no cross-turn commitment, so captures are traded not finished — exactly the spec's 'indecisive war'.

2026-06-03 ARCHITECTURE BLOCKER (verified, not assumed): the missing capability needs cross-turn persistent state (`_locked_target`, `_attack_commitment_turns`, `_target_stuck_turns` live as instance vars on the auto_play node). mc-ai::tactical::state::TacticalState is a per-turn snapshot rebuilt fresh from the bridge each turn — it carries NO memory/scratch field (struct read in full). A persistent tactical-commitment channel would have to be threaded through api-gdext (the GdAiController bridge), which is OUT OF FENCE for this session and is genuinely an architecture/design decision, not a movement.rs edit. A stateless approximation (all-units → single nearest enemy city, recomputed each turn) reproduces force CONCENTRATION but NOT the commitment hysteresis — it keeps the easy half and drops the load-bearing anti-flip half, i.e. a degraded alternative (rejected per Complete-Code/Blocker discipline).

2026-06-03 MEASUREMENT INCOHERENCE (why no in-fence lever is even validatable): the only clean surface, tools/p1-clean-baseline.py, drives BOTH slots through the same scripted:default controller (symmetric). Any decisiveness lever added to mc-ai applies to both sides, so it credits slot-0 and slot-1 eliminations ~50/50 while D1 counts only slot-1 elimination — structurally caps near ~5/10, never the 10/10 D1 wants. An asymmetric / per-slot-controller measurement surface does not exist and is itself out-of-fence harness work. Conclusion: D1 cannot be closed by a mc-ai/tactical edit this pass; needs design input (a stateful tactical channel through api-gdext AND a per-slot competence-asymmetry measurement surface). NO mc-ai code changed for p1-29d this pass — investigation + verified root-cause only. Stays partial.

RE-BASELINE 2026-05-29 apricot batch 20260529_185955 (current main e22d78fa5, builds p1-29e production patch; 10 seeds T300): P1 tier_peak now 2-6 in ALL 10 seeds — the old 'P1 stuck at tier_peak=1' symptom is GONE (confirms p1-29e F-reframe). Scored via tools/p1-survival-score.py.

RE-BASELINE verdict: BOTH gates fail on current main. GATE A (survival, file: >=7/10 alive-aware P1 tp>=2) = 2/10 FAIL (P1 dies in 8/10; peak-ever tp>=2 doesn't help a dead P1). GATE B (convergence, title/dispatch: P1 elim<=T100 OR stalled, 10/10) = 6/10 FAIL (misses s3/s7 elim AFTER T100; s5/s9 alive AND developing). Gates are anti-correlated on s5/s9.

tools/p1-survival-score.py — reusable both-ways scorer for this objective family (raw per-seed table + GATE A survival + GATE B convergence). Scans full turn_stats for P1 elimination turn.

FINDING: p1-29e sole-city production break-out is INERT for its target population — gated own_mil>=2, but P1 mil snapshot = 0 in all 10 seeds (P1 fights via transient units between snapshots). Refiles to p1-29e.

RATIFIED 2026-05-29 (operator): acceptance is now a multi-gate scorecard, not a single gate. p1-29d owns D1 (convergence, 6/10), D2 (competitive survival = old single gate, 2/10), D3 (no-zombie, 0/2). Family gates F1 development-reach 10/10 ✓, F2 game-length ✓, F3 decisiveness ✓ owned by p1-29c/a/b. Stays partial: D1/D2/D3 all fail. No balance code changed this pass (D2 at 2/10 is in the 'do not iterate' band).

src/simulator/crates/mc-core/src/combat_balance.rs:57-94 — SoloCityGrace block added with JSON-driven defaults (1.0/0 inert when omitted)

public/games/age-of-dwarves/data/combat_balance.json:7-10 — canonical magnitude defense_mult=1.75, turns=80

src/simulator/crates/mc-combat/src/resolver.rs:264-275, 597-606 — defender_solo_city_grace_mult field on CombatParams, composes multiplicatively with last_stand, clamped to >=1.0

src/simulator/crates/mc-combat/src/resolver.rs — 3 new tests: solo_city_grace_default_is_inert, solo_city_grace_reduces_defender_damage, solo_city_grace_clamped_below_one_inert (cargo test -p mc-combat: 142/142 pass)

src/simulator/crates/mc-turn/src/processor.rs:2230-2244, 3014-3026 — both PvP call sites compute grace_active = at_last_city && cities_lost_total==0 && state.turn<cb.turns

src/simulator/crates/mc-ai/src/tactical/movement.rs:553-571, 631-637 — sole-city-threatened retreat-threshold uplift (+0.30 cap 0.90) plus step-8 march-on-enemy-city suppression

cargo test -p mc-ai -p mc-core -p mc-turn: all green on mac (mc-ai 261 pass, mc-core 249 pass, mc-turn 222 pass + 1 ignored)

BLOCKED: apricot 10-seed batch verification — autocommit (commits-tray LLM gen) returning Errno 61 Connection refused for @projects/@magic-civilization across all cycles 04:46–05:08; apricot-run.sh builds from origin/main, so apricot cannot pick up the new code until LLM service recovers and ACS resumes pushing

Context

p1-29c shipped sole-city research-priority uplift (SituationalContext::sole_city_threatened adds +0.40 Settle / +0.20 Defend / +0.50 Research) and was apricot-verified on batch 20260515_215705 (10/10 games produced complete turn_stats; infrastructure clean after commits e200634df + 8820ce04a). The gate result:

Per-seed: P0 tier_peak 2–10 (healthy)
          P1 tier_peak = 1 in ALL 10 seeds
          P1 cities at end-game: 0 in 8 seeds, 1 in 2 seeds
Gate ≥7/10 alive-aware seeds with P1 tier_peak ≥ 2: 0/10 PASS

P1 is eliminated (8/10 seeds) or stalled at tier 1 (2/10 seeds) before reaching tier 2 in every seed. The current uplift sites in mc-ai/src/policy.rs::SituationalContext cannot move this: they tune action priority, but in most seeds P1 doesn't survive long enough to act on it.

The actual failure mode

The 8 seeds where P1 ends with 0 cities indicate cities_lost == cities_founded == 1 (or P1 lost its capital). This is upstream of research-priority — P1 is losing combat encounters before getting to tier 2.

Likely contributors (any/all):

Combat balance — P1's tier-1 units cannot defend against a P0 that already has tier-2 unit access by mid-game.
Map placement — P1's capital might consistently be in a hostile region (no chokepoint, no fortifiable terrain).
AI tactical decisions — P1 may be over-attacking (losing units offensively) instead of fortifying.
Personality skew — most clans bias toward "Conquest" axis early, even from a weaker position.

Acceptance (historical single-gate; SUPERSEDED by the ratified scorecard below)

Retained as the work record of the 2026-05-15/16 single-gate iteration. The "≥7/10 alive-aware P1 tier_peak ≥ 2" gate below is now gate D2 in the ratified scorecard. Canonical acceptance is the multi-gate scorecard.

✓ Diagnose which contributor(s) dominate — performed on apricot batch 20260515_215705. Already-shipped player_stats fields (cities_lost, units_lost, kills, mil, pop) tell the story without new instrumentation:

Pattern	Count	Detail
P1 capital captured	8/10	`cities_lost=1`, units_lost 1–3, kills 1–10 (P1 fights but loses). Game-end turn 44–203 (median ~70).
P1 isolated, P0 doesn't attack	2/10	T300/277, P1=1 city, mil=0, kills=0, pop=13/33. P1 builds civilians, P0 ignored it.

Conclusion: P1 is being attacked and losing its capital in 80% of games. This is a combat balance / capital-rush problem, NOT a research-priority problem. P1 already prioritizes defense (kills 1-10 attackers) but loses the mechanical contest.

✓ Identify the load-bearing lever. The diagnosis confirmed two independent failure modes:
1. Combat balance hole: last_stand_defense_multiplier(true, 0) = 1.0 — defender gets ZERO last-stand help on its starting capital because the formula multiplies by cities_lost, which is 0 for the initial city. The "last stand" never fires until P1 has already lost a city, by which point P1 is dead. This is the load-bearing flaw.
2. AI suicide-attack vector: tactical/movement.rs step 8 makes every wandering unit march on the nearest enemy city even when own field is outnumbered, so P1's army gets fed into P0 piecemeal outside P1's capital.
✓ Implement levers 1+2 combined:
- Lever 1 (mc-combat): new defender_solo_city_grace_mult field on CombatParams, computed in mc-turn::processor as cb.solo_city_grace.defense_mult when at_last_city && cities_lost_total == 0 && state.turn < cb.solo_city_grace.turns. Composes multiplicatively with last-stand + terrain + walls so defender-on-hills-with-walls stacks every layer.
- Lever 2 (mc-ai): sole_city_threatened = me.cities.len()==1 && enemy_mil_count >= own_mil_count — raises retreat HP threshold by +0.30 (cap 0.90) AND suppresses the "march on nearest enemy city" fallback. Units garrison home instead of feeding P0.
- JSON magnitudes: defense_mult: 1.75, turns: 80 in combat_balance.json. Tunable from data without recompile per Rail 2.
✓ Apricot batch verification: run 2026-05-16 18:41 PDT at apricot:/var/home/lilith/.cache/mc-batches/20260516_183534/smoke/ (10 seeds, T=300, 5-clan AI, builds from cache stamp mc-src-20260516_183534 which carries this objective's code).
✗ Gate ≥7/10 seeds with P1 tier_peak ≥ 2: 0/10 PASS — unchanged from pre-fix baseline 20260515_215705. Per-seed result (seed → P0 tp / P1 tp / P1 cities / outcome):
- s1 t=63 P0_tp=2 P1_tp=1 P1_c=0 lost=1 victory(domination)
- s2 t=44 P0_tp=2 P1_tp=1 P1_c=0 lost=1 victory
- s3 t=152 P0_tp=6 P1_tp=1 P1_c=0 lost=1 victory
- s4 t=100 P0_tp=2 P1_tp=1 P1_c=0 lost=1 victory
- s5 t=300 P0_tp=3 P1_tp=1 P1_c=1 lost=0 victory (P1 alive but stalled at tp=1)
- s6 t=76 P0_tp=2 P1_tp=1 P1_c=0 lost=1 victory
- s7 t=203 P0_tp=7 P1_tp=1 P1_c=0 lost=1 victory
- s8 t=66 P0_tp=2 P1_tp=1 P1_c=0 lost=1 victory
- s9 t=278 P0_tp=10 P1_tp=1 P1_c=1 lost=0 in_progress (P1 alive but stalled at tp=1)
- s10 t=57 P0_tp=2 P1_tp=1 P1_c=0 lost=1 victory
- 8/10 still end in P1 capital captured (cities_lost=1); 2/10 P1 survives but never reaches tier 2. SoloCityGrace (defense_mult=1.75, turns=80) + retreat-threshold uplift (+0.30 cap 0.90) + step-8 suppression were insufficient to alter the dominant failure mode.
✗ Closes p1-29c's bullet 1 and p1-29a's blocker: NOT closed — same root regime persists.
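As a rough illustration of Lever 1 from the list above, the grace condition and its clamp can be sketched in Python. This is not the Rust code in mc-combat or mc-turn: the names `solo_city_grace_mult` and `defender_multiplier` and their argument layout are assumptions made for the example. Only the activation condition, the inert defaults (1.0 / 0), the clamp to >=1.0, and the canonical magnitudes (1.75, 80) are taken from the notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoloCityGrace:
    # Inert defaults (1.0 / 0), as when the JSON block is omitted.
    defense_mult: float = 1.0
    turns: int = 0


def solo_city_grace_mult(grace, at_last_city, cities_lost_total, turn):
    """Defender multiplier contributed by the solo-city grace window."""
    active = at_last_city and cities_lost_total == 0 and turn < grace.turns
    if not active:
        return 1.0
    # Clamp to >= 1.0 so a misconfigured value can never weaken the defender.
    return max(1.0, grace.defense_mult)


def defender_multiplier(last_stand, terrain, walls, grace_mult):
    # Hypothetical composition: every layer stacks multiplicatively.
    return last_stand * terrain * walls * grace_mult


# Canonical magnitudes from combat_balance.json.
CANONICAL = SoloCityGrace(defense_mult=1.75, turns=80)

if __name__ == "__main__":
    for turn in (10, 79, 80):
        print(turn, solo_city_grace_mult(CANONICAL, True, 0, turn))
```

The sketch makes the limits of the lever visible: the multiplier drops back to 1.0 at turn 80 or as soon as any city has been lost, and it only scales per-defender damage, so it cannot compensate for a garrison that is outnumbered on volume.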

Status (2026-05-16) — apricot batch FAIL, intervention insufficient

Code changes shipped; apricot 10-seed batch run; gate 0/10 PASS (unchanged from pre-fix baseline). Per the objective's iteration discipline ("Gate regresses or unchanged (0-2/10): stop, do NOT iterate"), this objective stays partial. The two grace-mechanics levers (combat damage reduction + AI retreat threshold) do not move the dial on the dominant failure mode: P0 still captures P1's capital in 8/10 seeds, in 5 of them (s1, s2, s6, s8, s10) before the 80-turn grace window expires and in 6 of them (adding s4) by turn 100.

Diagnosis for next iteration (filed for whoever picks this up): the 1.75× defender multiplier is being applied but P0's tier-1 spam still wins on volume against P1's single-unit garrison. The bottleneck is P1's production of defenders, not the per-defender damage math — a sole-city AI with 1 settler + 1 warrior cannot out-build a P0 with 3 cities already producing units, regardless of damage multipliers. Future work should target either (a) auto-spawning a free defender for the trailing AI when threatened (mechanical handicap, not balance), (b) tightening map placement so P1's capital is not adjacent to P0's start region, or (c) accepting that 4X games legitimately end in early elimination of the weakest player and revising the gate to be "P1 reaches tier 2 OR survives ≥150 turns".

Files modified:

public/games/age-of-dwarves/data/combat_balance.json — added solo_city_grace block.
src/simulator/crates/mc-core/src/combat_balance.rs — added SoloCityGrace struct.
src/simulator/crates/mc-core/src/lib.rs — re-export SoloCityGrace.
src/simulator/crates/mc-combat/src/resolver.rs — added defender_solo_city_grace_mult field + wiring in damage calc + 3 unit tests.
src/simulator/crates/mc-combat/src/lib.rs — re-export default_solo_city_grace_mult.
src/simulator/crates/mc-turn/src/processor.rs — compute grace at both PvP combat call sites.
src/simulator/crates/mc-ai/src/tactical/movement.rs — sole-city-threatened retreat + step-8 suppression.

Local test results:

cargo check --workspace: clean.
cargo test -p mc-combat: 142/142 lib + 10/10 predict + others — all green, including 3 new solo-city tests.
cargo test -p mc-ai -p mc-core -p mc-turn: all green.

Top hypothesis if the apricot gate still misses 7/10 after batch runs:

Tuning bump: raise defense_mult from 1.75 → 2.25 and/or extend turns from 80 → 120. The last_stand cap of 3.0× is precedent for "hard but not impossible" — 2.25 keeps us well below domination-blocking territory while substantially raising P1's survival odds against the dominant clan's tier-2 army wave.
Secondary lever: in decide_military_action, when me.cities.len()==1 && enemy_mil_count > own_mil_count, also bias unit movement TOWARD me.cities[0].hex instead of just suppressing step 8 — actively reel scattered units home.
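The threat test, the retreat-threshold uplift, and the proposed "reel units home" bias can be sketched together as follows. This is a Python illustration, not the movement.rs code. The axial hex coordinates, the `reel_home` helper, and the candidate-move list are assumptions; the `>=` comparison follows the shipped Lever 2 definition, while the secondary-lever text above uses a strict `>`.

```python
def sole_city_threatened(city_count, enemy_mil_count, own_mil_count):
    # Shipped Lever 2 condition: one city and not ahead on military count.
    return city_count == 1 and enemy_mil_count >= own_mil_count


def retreat_threshold(base, threatened, uplift=0.30, cap=0.90):
    # Raise the retreat HP threshold by +0.30, capped at 0.90, when threatened.
    if not threatened:
        return base
    return min(cap, base + uplift)


def hex_distance(a, b):
    # Distance between two hexes in axial (q, r) coordinates (assumed layout).
    dq, dr = a[0] - b[0], a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


def reel_home(candidate_moves, home_hex):
    # Hypothetical secondary lever: pick the move that ends closest to home.
    return min(candidate_moves, key=lambda h: hex_distance(h, home_hex))


if __name__ == "__main__":
    threatened = sole_city_threatened(1, enemy_mil_count=3, own_mil_count=1)
    print(threatened, retreat_threshold(0.5, threatened))
    print(reel_home([(2, 0), (4, 0), (3, 1)], home_hex=(0, 0)))
```

Suppressing step 8 only removes a bad move; a bias such as `reel_home` would replace it with a move toward the capital, which is the difference the secondary lever is pointing at.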

Status (2026-05-29) — re-baseline on current main; BOTH gates fail → resolved into the ratified multi-gate scorecard (see Acceptance below)

Per p1-29e's explicit recommendation ("re-baseline p1-29c/29d against current main before further patch work"), ran a clean apricot batch 20260529_185955 on current main (e22d78fa5, which carries the p1-29e sole-city production break-out). Scored with tools/p1-survival-score.py.

Per-seed (P1 = trailing slot 1):

seed	endT	outcome	P0 tp	P1 tp	P1 cities	P1 elimT	P1 alive
1	63	victory	2	2	0	63	no
2	44	victory	2	2	0	44	no
3	153	victory	6	3	0	153	no
4	100	victory	2	3	0	100	no
5	300	victory	7	5	1	—	yes
6	78	victory	2	2	0	78	no
7	203	victory	7	5	0	203	no
8	65	victory	2	2	0	65	no
9	286	in_progress	10	6	1	—	yes
10	56	victory	2	2	0	56	no

Key change vs the 2026-05-16 baseline: P1 tier_peak rose from 1 in all 10 seeds to 2-6 in all 10 — the old "stuck at tier 1" symptom is resolved on current main (pure-research drift, not the production patch — P1 still completes 0 buildings; see p1-29e). Game-length/elimination-timing dynamics are otherwise unchanged from baseline.

Both candidate gates FAIL, in opposite ways:

GATE A — survival (this file's Acceptance bullet): ≥7/10 alive-aware seeds with P1 tier_peak ≥ 2. Result 2/10 FAIL (only s5, s9 end alive). The tier_peak symptom is gone, but P1 still loses its capital in 8/10 — a dead P1 with peak-ever tp≥2 does not satisfy an alive-aware gate. This matches p1-29a's "structural, territory problem" conclusion.
GATE B — convergence (this objective's TITLE + the dispatch): P1 eliminated OR stalled before T100, in 10/10. Result 6/10 FAIL. Misses: s3 (elim T153) and s7 (elim T203) — P1 eliminated but after T100 (lingering wildcard); s5 and s9 — P1 alive and developing to tp 5/6 (a genuine surviving contender, not "converged").

The two gates are anti-correlated: s5/s9 are the only GATE A passes and the GATE B failures. They demand opposite engineering (make P1 stronger vs. make the game converge), so the objective cannot be closed — or even correctly iterated — until the canonical gate is chosen. Question forwarded to the orchestrator (Clare); per-seed data scored both ways above so the decision can be made in one shot.
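The two readings can be checked against the per-seed table with a few lines of Python. This is a minimal sketch, not tools/p1-survival-score.py: the row layout and the function names `gate_a` and `gate_b` are assumptions, and the data is copied from the table above.

```python
# (seed, p1_tier_peak, p1_elim_turn); an elim turn of None means P1 ended alive.
ROWS = [
    (1, 2, 63), (2, 2, 44), (3, 3, 153), (4, 3, 100), (5, 5, None),
    (6, 2, 78), (7, 5, 203), (8, 2, 65), (9, 6, None), (10, 2, 56),
]


def gate_a(tier_peak, elim_turn):
    # Survival: P1 ends alive with tier_peak >= 2.
    return elim_turn is None and tier_peak >= 2


def gate_b(tier_peak, elim_turn):
    # Convergence: eliminated by T100, or alive but stalled at tier_peak <= 1.
    if elim_turn is not None:
        return elim_turn <= 100
    return tier_peak <= 1


a_pass = [seed for seed, tp, elim in ROWS if gate_a(tp, elim)]
b_pass = [seed for seed, tp, elim in ROWS if gate_b(tp, elim)]

if __name__ == "__main__":
    print(f"GATE A: {len(a_pass)}/10 {a_pass}")
    print(f"GATE B: {len(b_pass)}/10 {b_pass}")
    print("overlap:", sorted(set(a_pass) & set(b_pass)))
```

No seed passes both gates, which is the anti-correlation in concrete form: a seed can satisfy GATE A only by ending alive and developed, and that is exactly what GATE B counts as a miss.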

Discipline note: under GATE A the result is 2/10, inside the objective's own "stop, do NOT iterate (0-2/10)" band — so no balance tuning was attempted. Under GATE B the result is 6/10 with a concrete 4-seed miss list, which would invite targeted work, but GATE B is not yet a written acceptance criterion. No code changed in this pass — re-baseline + measurement only.

Secondary finding (refiled to p1-29e): the p1-29e sole-city production break-out is inert for its target population. Its gate own_mil >= SOLE_CITY_ECON_MIN_DEFENDERS (2) never fires because P1's mil snapshot is 0 in all 10 seeds (P1 fights via transient units that exist between snapshots). The lever cannot help the very player it targets until that floor is reconsidered.

Acceptance — multi-gate scorecard (RATIFIED 2026-05-29 by operator)

This scorecard is the canonical acceptance structure for p1-29d, ratified by the operator on 2026-05-29 ("it sounds like we should have many gates"). It supersedes the single-gate ## Acceptance (historical …) section above, which is retained as the work record. p1-29d closes done only when its three owned gates (D1, D2, D3) all pass on a clean apricot batch.

Per operator steer, p1-29d is not a single pass/fail — trailing-AI health is multi-dimensional, and the binary survival-vs-convergence framing hid a third mode (zombie survivors: s5/s9 end alive but inert, mil=0, never a threat). The family already owns most dimensions; p1-29d uniquely owns the END-GAME FATE of the trailing AI. Scored on batch 20260529_185955:

Family-level gates (owned elsewhere; reported for context):

#	Dimension	Threshold	Provenance	Current	Pass
F1	Development reach	P1 `tier_peak ≥ 2` ever, ≥7/10	p1-29c (done)	10/10	✓
F2	Game length	median ≤ T500	p1-29 (user 2026-04-26)	median 89	✓
F3	Decisiveness	victory by cap	p1-29 cycle-4	9/10 + 1 in_prog	✓

p1-29d-owned gates (end-game fate of trailing AI):

#	Dimension	Threshold	Source	Current	Pass
D1	Convergence	P1 eliminated≤T100 OR stalled(alive,tp≤1), 10/10	title + dispatch	6/10	✗
D2	Competitive survival	P1 ends ALIVE with tp≥2, ≥7/10	body Acceptance (liveness-strict)	2/10	✗
D3	No-zombie	among ALIVE survivors, P1 non-inert (`mil>0` or >1 city)	ratified 2026-05-29 (this re-baseline)	0/2 survivors healthy	✗

The 4 non-converged seeds split into two distinct pathologies:

Late elimination (s3 @T153, s7 @T203): P1 lingers as a wildcard well past T100 before dying — fails D1.
Zombie tail (s5 @T300, s9 @T286): P1 alive, researched to tp 5/6, but mil=0, 1 city, kills=0 — an ignored bystander, not a competitor. Passes D2's letter (alive+developed) but fails D3 (inert). These are the only D2 "successes," and they are not healthy survivals.

Net trailing-AI fate on current main: 6/10 clean fast convergence, 2/10 late elimination, 2/10 inert zombie tail, 0/10 genuine competitive survival. The headline win from this re-baseline is F1 (development reach now 10/10 — the old "stuck at tier 1" symptom is fixed). The remaining unhealth is concentrated in 4 seeds with two named, distinct causes.

Thresholds for D1/D2 are lifted verbatim from existing sources; D3 and its threshold were newly minted in this re-baseline and ratified alongside D1/D2 on 2026-05-29.
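The D3 test and the split into pathologies can be expressed as a small classifier. This is an illustrative sketch only: the category labels and the `classify` function are assumptions, and the inputs are the elimination turns and the survivor snapshots (mil, cities) reported above for batch 20260529_185955.

```python
from collections import Counter

# P1 elimination turn per seed (seeds missing here ended alive).
ELIM_TURN = {1: 63, 2: 44, 3: 153, 4: 100, 6: 78, 7: 203, 8: 65, 10: 56}
# End-of-game snapshot for the surviving P1s: (mil, cities).
SURVIVORS = {5: (0, 1), 9: (0, 1)}


def is_inert(mil, cities):
    # D3: a survivor is healthy only if mil > 0 or it holds more than 1 city.
    return not (mil > 0 or cities > 1)


def classify(seed):
    if seed in ELIM_TURN:
        if ELIM_TURN[seed] <= 100:
            return "fast convergence"
        return "late elimination"
    mil, cities = SURVIVORS[seed]
    return "zombie tail" if is_inert(mil, cities) else "competitive survival"


fates = Counter(classify(seed) for seed in range(1, 11))
healthy_survivors = [s for s, (m, c) in SURVIVORS.items() if not is_inert(m, c)]

if __name__ == "__main__":
    print(dict(fates))
    print(f"D3: {len(healthy_survivors)}/{len(SURVIVORS)} survivors healthy")
```

Under these inputs the "competitive survival" bucket stays empty, matching the 6/2/2/0 breakdown in the text.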

Next worker — direction (no gate targeted yet; operator ratified the panel but did not pick a column to drive): the two pathologies are independent and have distinct levers. Late elimination (D1, s3/s7) is a combat/pacing problem — P1 dies after lingering, so the lever is either faster resolution or a turn-floored convergence. Zombie tail (D3, s5/s9) is an AI-behaviour problem — P1 survives but builds no military (mil=0) and never contests; the lever is making an isolated, un-threatened sole-city AI either expand/militarise or get found-and-finished by P0. Note the p1-29e production break-out is inert here (own_mil>=2 floor never fires; P1 mil=0 in 10/10) — reconsider that floor before relying on it. Per the objective's discipline, D2 at 2/10 is in the "do not iterate" band; do not tune balance code until a specific gate is chosen as the target.

Two architecture-level findings (2026-05-29, during D3 root-cause dig) — STOP/REPORT

While diagnosing D3 (zombie survivors s5/s9), tracing which code actually drives the batch surfaced two findings that recontextualize this whole objective. Both were verified, not assumed.

Finding A — the autoplay matchup is asymmetric (P0 is a juiced harness player)

auto_play.gd controls only slot 0 (it runs _play_turn for GameState.get_current_player(), then _end_turn presses the "End Turn" button and waits; the other players resolve through TurnManager's real AI pipeline → ai_turn_bridge → mc-ai). tools/autoplay-batch.sh (lines ~200-210) states this verbatim:

"auto_play.gd impersonates slot 0 with extra strategic helpers (rush-buy gold, attack-phase commit, formation orders) that one clan wins every game. Rotating which clan holds slot 0 across seeds spreads the autoplay-shaped opportunity."

So P0 (slot 0) is a deliberately-boosted scripted player; P1 (the "trailing AI") is the plain game AI (mc-ai). The gate "P1 survives/converges vs P0" has been measuring the real AI against an opponent engineered to win every game. This is the most likely reason the entire p1-29 family of mc-ai P1-buffs "never moved the dial" (p1-29a/b/c/d/e all report this): no amount of buffing P1 makes it reliably beat a slot-0 player with bespoke combat helpers the AI pipeline doesn't get. The family documents auto_play's research helpers (p1-29 cycles 3-4) but not this combat-side asymmetry as the load-bearing cause.

Finding B — P1 has a genuine production stall (independent of A)

In the two seeds where P0 ignores P1 (s5/s9: P1 alive at T300/T286, never attacked), P1 still produces nothing for 250+ turns: unit_created owner=1 fires exactly once (the turn-0 starter), city_building_completed owner=1 = 0, units_lost=0, kills=0, while pop grows 8→33 and research climbs to tp 5/6. units_lost=0 combined with only one unit ever created rules out bankruptcy/disband. P1 founds its city and then its production never converts for the rest of the game. This is a real defect in the mc-ai production path (or in how its EnqueueBuild actions are applied, or a near-zero production-yield tile assignment). Not yet root-caused to a single line; the mc-ai production.rs break-out (p1-29e) is downstream of this and irrelevant until it's fixed.

Implication for the gate

These mean the D1/D2/D3 columns may be partly measuring harness asymmetry (A), not trailing-AI quality. Before tuning P1 further, the design question is whether the gate should: (i) fix the production stall (B) and re-measure; (ii) symmetrize the harness (P0 = plain AI, or P1 also gets the helpers) so the matchup is fair; or (iii) redefine the trailing-AI gate to be measured in a same-controller context (cf. p1-29g's trained-vs-scripted framing). Held for operator decision — no code changed.

Status (2026-06-03) — D1 re-scored at turn≤100; the stale `tp≤1` proxy is the binding constraint, not balance

Dispatch reaffirmed D1 (convergence: P1 eliminated OR stalled before T100, 10/10) as the canonical gate for the Pre-Wave-1 pacing requirement. Re-scored the intact 20260529_185955 batch (still on apricot; HEAD 7f2eef48c has only moved on RL-export/comms paths since the e22d78fa5 baseline, so the convergence regime is unchanged). New tool tools/p1-convergence-lens.py samples P1's state at the last turn ≤100 instead of end-of-game, and reports two readings of "stalled":

seed	endT	P1 elimT	@T100: cities / mil / kills / tp	D1-literal (tp≤1)	D1-nonfactor (mil=0,c≤1,kills=0)
1	63	63	dead	OK	OK
2	44	44	dead	OK	OK
3	153	153	1 / 0 / 2 / 3	NO	NO
4	100	100	dead	OK	OK
5	300	—	1 / 0 / 0 / 5	NO	OK
6	78	78	dead	OK	OK
7	203	203	1 / 0 / 3 / 5	NO	NO
8	65	65	dead	OK	OK
9	286	—	1 / 0 / 0 / 3	NO	OK
10	56	56	dead	OK	OK

D1-literal: 6/10 (unchanged — confirms baseline holds on current main).
D1-nonfactor: 8/10. The zombie tail s5/s9 is alive at T100 with mil=0, 1 city, kills=0 — genuinely inert by the objective's own D3 definition. They fail D1 only because research-drift inflated tier_peak to 5/3 (> 1); tier_peak is engine-auto research, not a policy/threat signal (p1-29e F1, p1-29g note). Recognizing functional stall fixes both with zero code.
The 2 residual misses are s3/s7 only. At T100 both are alive, mil=0, 1 city, but kills=2/3 — P1 is mid-siege, actively killing attackers, and falls at T153/T203. These are genuine non-convergences under either reading. They are not zombies; they are a trailing AI legitimately resisting capture for 50–100 turns past T100.
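The two readings of "stalled" differ only in which field of the T100 snapshot they inspect, as the following sketch shows. It is a simplified illustration rather than tools/p1-convergence-lens.py: the dictionary layout and function names are assumptions, and the values are copied from the table above.

```python
# P1 state at the last turn <= 100: None if already eliminated,
# otherwise (cities, mil, kills, tier_peak).
AT_T100 = {
    1: None, 2: None, 3: (1, 0, 2, 3), 4: None, 5: (1, 0, 0, 5),
    6: None, 7: (1, 0, 3, 5), 8: None, 9: (1, 0, 0, 3), 10: None,
}


def d1_literal(state):
    # Converged if eliminated, or alive with tier_peak <= 1.
    return state is None or state[3] <= 1


def d1_nonfactor(state):
    # Converged if eliminated, or alive but inert: no military,
    # at most one city, and no kills.
    if state is None:
        return True
    cities, mil, kills, _tier_peak = state
    return mil == 0 and cities <= 1 and kills == 0


literal = [s for s, st in AT_T100.items() if d1_literal(st)]
nonfactor = [s for s, st in AT_T100.items() if d1_nonfactor(st)]
residual = [s for s in AT_T100 if s not in nonfactor]

if __name__ == "__main__":
    print(f"D1-literal:   {len(literal)}/10")
    print(f"D1-nonfactor: {len(nonfactor)}/10")
    print("residual misses:", residual)
```

Seeds 5 and 9 flip between the two readings because only their tier_peak is non-zero evidence of activity; seeds 3 and 7 fail both because their kill counts are non-zero.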

Implication. The binding constraint on D1 is a gate-semantics question, not a balance lever: is "stalled" (a) the literal tp≤1, or (b) a functional non-factor (mil=0 & cities≤1 & kills=0)? Under (a) the gate is 6/10; under (b) it is 8/10. The only seeds needing actual engineering are s3/s7, and per Finding A (confirmed verbatim: auto_play.gd slot-0 is the juiced harness player) "converging" them faster means tuning the non-shipping P0 siege, the exact dead end the whole p1-29 family keeps hitting. Alternatively s3/s7 satisfy a 4X-legitimacy reading ("trailing AI may legitimately resist ≥150 turns") already floated in the 2026-05-16 status.

No code changed; no balance tuned (D2 still 2/10, inside the "do not iterate" band). Decision forwarded to orchestrator: (Q1) which reading of "stalled" is canonical for D1: literal tp≤1, or functional non-factor at T100? (Q2) the 6/10 and 8/10 numbers are both on the juiced-P0 autoplay surface (Finding A); accept that surface, or re-measure clean via p1-29g's trained/symmetric mechanism before closing? Scorer: tools/p1-convergence-lens.py <results_dir>.

Status (2026-06-03, cont.) — operator unblocked (literal gate + clean surface); root-cause of the clean-measurement path

Operator ruling: Q1 = literal gate stands (stalled := tier_peak<=1; the research-drift that inflates tp on inert civs is a real defect to fix, not to redefine around). Q2 = re-measure on a CLEAN surface (trained/symmetric), the juiced-P0 numbers do not count. Directive: "no bypasses, no metric-redefinition, no juiced-harness numbers — real convergence under the literal gate, verified on a clean surface."

Root-caused the surfaces before writing any fix. Verified findings:

The gate matchup is a 2-PLAYER DUEL, not 5-player. meta.json of batch 20260529_185955: num_players:2, map_size:duel, map_type:pangaea, wrap:sphere, landmass:continents, slot0 controller:"" (autoplay), slot1:scripted:default. The "5-clan" framing is clan rotation across the 10 seeds; each game is 2p. The faithful clean surface is therefore the same 2p duel minus the slot-0 juice.
Research is auto-drift, confirmed (operator's framing validated). mc-turn processor.rs::process_science: science_per_turn = culture^1.5 × min(cities,12) × 15 — depends only on city count and the clan culture axis, NOT on pop/buildings/production/military. Research-target selection is fully automatic (topo.find(can_research)). A 1-city inert civ accrues science every turn forever and auto-climbs eras with zero policy involvement. (Note: the full autoplay game drives research via the GDScript TurnManager.get_tech_web() → GdTechWeb path; process_science is the mc-turn/dispatch equivalent. Both make tp a pure function of passive existence, not development.)
STOP/REPORT — no existing surface can measure a CLEAN tier_peak. The symmetric player-api harness (mc-player-api, used by p1-29e's mine_divergence) does not load a TechWeb (projection.rs TRACKED comments confirm). Driven locally (both slots scripted:default via suggest()), research.researched is empty and tier_peak==0 at all turns even with 4+ cities — so it cannot measure the D1 metric. The only surface that produces tier_peak is the full Godot autoplay scene, whose slot 0 is the juiced player. Therefore a clean tier_peak baseline requires NEW infra: either (a) an AUTO_PLAY_ALL_AI mode that runs slot 0 through the plain mc-ai pipeline (de-juice; that scene already loads the tech web) + rebuild + apricot batch — the faithful path; or (b) wire a TechWeb into the harness dispatch + surface researched techs in the projection (lower fidelity: harness worldgen/capital-placement differs from the autoplay scene).
Adversarial check (held for operator): the research-drift "fix" alone is a metric-input bypass. Suppressing auto-research so inert civs read tp<=1 changes only the metric's input, not the game state — s5/s9 still sit alive to T300, s3/s7 still die T153/T203; the game converges zero turns sooner. That satisfies the literal tp<=1 letter while delivering none of the dispatch's stated purpose ("stable late-game pacing requires this convergence"). Real convergence for the zombie/late-elim seeds needs game-STATE change: P0 finding-and-finishing the isolated P1 (excluded — juiced path), a map/placement change, or fixing the production stall (Finding B) — which is the more plausible root of the pacing pathology but pulls toward a stronger P1 (anti-convergence) unless it routes to "expand → get found → eliminated." These four constraints (real convergence ∧ literal tp≤1 ∧ clean surface ∧ no bypass) are in tension and may not be jointly satisfiable by a small lever; flagged for the operator alongside the clean baseline.
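The research-drift point in the list above follows directly from the shape of the science formula, which can be restated as a short Python function. This is a transcription of the formula quoted from process_science, not the Rust source; the culture value of 1.0 used in the example is a placeholder, since the actual range of the clan culture axis is not given here.

```python
def science_per_turn(culture, cities):
    # culture^1.5 x min(cities, 12) x 15, as quoted from process_science.
    # Note the inputs: no pop, buildings, production, or military.
    return culture ** 1.5 * min(cities, 12) * 15


def accumulated_science(culture, cities, turns):
    # With constant inputs, science accrues linearly with turns survived.
    return science_per_turn(culture, cities) * turns


if __name__ == "__main__":
    # A 1-city civ that builds nothing still accrues science every turn.
    print(science_per_turn(1.0, 1))
    print(accumulated_science(1.0, 1, 300))
    # City count is capped at 12.
    print(science_per_turn(1.0, 20) == science_per_turn(1.0, 12))
```

Because nothing P1 does (or fails to do) with production or military appears in the inputs, tier_peak under this formula tracks time alive and city count rather than development, which is why the zombie seeds s5/s9 can climb eras while remaining inert.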

Artifacts: tools/p1-clean-baseline.py (symmetric-duel driver; correct city/elim output, tier_peak inert until a TechWeb surface exists — banner in its docstring). Next planned step: build AUTO_PLAY_ALL_AI (path a), rebuild, run the clean 2p duel batch, score D1-literal on real tier_peak data. No fix/balance code written yet (clean numbers are the prerequisite).

Status (2026-06-03, cont.²) — CLEAN baseline run: 0/10 convergence; the juice WAS the offense (structural reframe)

Ran a clean 10-seed baseline (tools/p1-clean-baseline.py, evidence .local/clean_baseline.txt): same 2p duel matchup as the gate (players=2, map_size=duel, map_type=pangaea, victory=domination) but both slots driven by the real scripted:default controller via the player-api harness suggest() — verified to route through dispatch.rs:984 drive_controller_turn → mc_ai decide_tactical_actions, i.e. the actual shipping mc-ai tactical pipeline, NOT a lighter heuristic. This is the gate matchup minus the slot-0 juice. (tier_peak is 0 here — harness has no TechWeb — but D1 does not need it: a P1 alive at T100 with ≥2 cities is unambiguously non-converged regardless of era.)

Result — P1 (slot 1) state at T100, all 10 seeds:

seed	elimT	alive@100	cities@100	mil@100
1	—	yes	9	74
2	—	yes	9	74
3	—	yes	9	74
4	—	yes	9	72
5	—	yes	9	74
6	—	yes	12	90
7	—	yes	12	90
8	—	yes	9	74
9	—	yes	12	90
10	—	yes	9	74

D1 convergence on the clean surface: 0/10. Zero eliminations; every seed ends with P1 a large multi-city peer (9–12 cities, 72–90 units). Seeds genuinely vary (s6/7/9 reach 12 cities, the rest 9; s4 mil=72 vs 74), ruling out a seeding artifact.
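The tally above can be re-derived from the table with a few lines of Python. This is a minimal sketch, not the logic in tools/p1-clean-baseline.py: the row layout, the `classify` helper, and its label names are hypothetical. The one rule taken from the text is that a P1 alive at T100 with ≥2 cities is non-converged regardless of era. Counting an elimination by T100 as convergence is an assumption made for illustration.

```python
from collections import Counter

# (seed, elim_turn or None, alive_at_100, cities_at_100, mil_at_100),
# copied from the clean-baseline table above.
ROWS = [
    (1, None, True, 9, 74),
    (2, None, True, 9, 74),
    (3, None, True, 9, 74),
    (4, None, True, 9, 72),
    (5, None, True, 9, 74),
    (6, None, True, 12, 90),
    (7, None, True, 12, 90),
    (8, None, True, 9, 74),
    (9, None, True, 12, 90),
    (10, None, True, 9, 74),
]


def classify(elim_turn, alive, cities, horizon=100):
    """Hypothetical three-way label for P1 at the horizon turn."""
    if elim_turn is not None and elim_turn <= horizon:
        return "eliminated"       # assumed to count as converged
    if alive and cities >= 2:
        return "non_converged"    # rule stated in the text; tier_peak not needed
    return "needs_tier_peak"      # alive with <2 cities: era data would decide


labels = Counter(classify(e, a, c) for _seed, e, a, c, _mil in ROWS)
eliminated = labels["eliminated"]

print(f"eliminated by T100: {eliminated}/{len(ROWS)}")
print("distinct city counts:", sorted({row[3] for row in ROWS}))
print("distinct mil counts:", sorted({row[4] for row in ROWS}))
```

The third label exists only so that a one-city survivor is not silently counted either way; no seed in this batch falls into it.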

The structural finding (decisive, build-mooting): on a fair surface the shipping mc-ai does not conduct decisive offense — both sides expand to fill the map and stand off; neither finishes the other. The convergence the gate measured on the apricot batch came entirely from the slot-0 juice (auto_play.gd rush-buy + attack-phase commit + formation orders) — i.e. the juice WAS the offense. The "trailing AI" only stalled/died because it faced an artificially-competent attacker the real game does not field. p1-29a/b/c/d/e all reported "balance levers never moved the dial" for exactly this reason (independently noted as Finding A).

Implications:

The planned AUTO_PLAY_ALL_AI build (de-juice slot 0, rebuild, apricot batch) is moot for raising D1 — it would faithfully reproduce this same 0/10, just with real tier_peak attached. (It remains the correct surface if a future competent attacker exists and we want to re-measure.)
The four constraints (real convergence ∧ literal tp≤1 ∧ clean surface ∧ no bypass) are jointly unsatisfiable by balance/research/production tuning. No tuning of P1 makes a non-attacking opponent finish it; suppressing P1's research (the only lever that yields tp≤1) is a metric-input bypass that changes no game state.
The real lever is mc-ai offensive competence — decisive assault / siege commitment in the tactical layer (currently supplied only by the GDScript autoplay juice), or the learned-controller track (p1-29f missing → p1-29g). This is the AI-quality reframe p1-29g already anticipated ("separate positional from controller-strength advantage").

No balance/research/production code written. The objective's blocker is now precisely characterized and is an AI-capability problem, not a balance gate. Forwarded to operator for redirection: (i) retarget p1-29d at mc-ai offensive competence (or fold into p1-29f/g), (ii) redefine the gate to be measured against a competent reference attacker (the juiced surface, accepted as the stand-in it is), or (iii) accept that fair scripted Game-1 duels do not converge by T100 and the "trailing AI" concept only applies vs a stronger opponent.

Refinement (2026-06-03) — it is NOT "the AI won't attack"; the war is INDECISIVE (contact+combat probe, omniscient)

Adversarial check (does contact/combat even happen, or is 0/10 a no-contact map artifact?). Probed seeds 1/6/7 omniscient, tracking inter-player distance, unit losses, and city captures (evidence .local/p1-29d-contact-evidence.txt):

seed	capdist	P0 unit loss	P1 unit loss	P0 lost a city	P1 lost a city
1	12.0	yes	yes	no	no
6	10.6	yes	yes	no	yes
7	9.4	yes	yes	no	yes

So on the clean surface the two civs DO make contact (adjacent/same-tile, dist→0 every seed), DO fight (both lose units every seed), and cities DO get captured (P1 lost a city in s6/s7) — yet neither is ever eliminated: the loser refounds/recovers and both empires keep growing to 8–10+ cities by T90. The earlier "AI doesn't conduct offense" phrasing is therefore wrong/too strong; the accurate finding is: the fair scripted mc-ai wages an indecisive war — it skirmishes and even captures, but cannot deliver a killing blow before the loser recovers. The slot-0 juice (rush-buy + attack-phase commit + formation orders) supplied the decisiveness / siege-conversion tempo that turns a capture into an elimination; remove it and captures are traded, not finished. That is why the whole p1-29 family's P1-side balance levers "never moved the dial": no buff to P1 induces convergence when the opponent cannot close out a win.
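The distinction between "no offense" and "indecisive war" can be made mechanical. The sketch below encodes the three probed seeds from the table and labels each one. The `Probe` record, the `war_label` helper, and the label strings are hypothetical, and the `eliminated` field is set to False for every seed because the text states that neither side is ever eliminated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    seed: int
    capdist: float
    p0_unit_loss: bool
    p1_unit_loss: bool
    p0_lost_city: bool
    p1_lost_city: bool
    eliminated: bool  # False for all probed seeds, per the text


# Values copied from the contact/combat table above.
PROBES = [
    Probe(1, 12.0, True, True, False, False, False),
    Probe(6, 10.6, True, True, False, True, False),
    Probe(7, 9.4, True, True, False, True, False),
]


def war_label(p: Probe) -> str:
    """Hypothetical label: how far did the war get on this seed?"""
    fought = p.p0_unit_loss or p.p1_unit_loss
    captured = p.p0_lost_city or p.p1_lost_city
    if p.eliminated:
        return "decisive"
    if fought and captured:
        return "indecisive: capture not converted to elimination"
    if fought:
        return "indecisive: skirmish only"
    return "no combat"


for p in PROBES:
    print(p.seed, war_label(p))
```

None of the three seeds lands in "no combat", which is the case the adversarial check set out to rule out.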

Scope discipline: measured against the scripted:default tactical controller via suggest() (dispatch.rs:984), corroborated by the apricot batch's slot-1 (plain-AI) behaviour — it never eliminated the juiced P0 and survived only as an ignored zombie. Not a blanket claim about the full MCTS+tactical pipeline (the batch AI slots also run MCTS; mcts_* stats), which this probe did not isolate. Caveat: the harness map is not fully gate-faithful — player_api_main.gd:127 gen.generate(seed_v, map_size) ignores map_type (so it is NOT pangaea) and uses spaced capitals; a map-faithful confirmation still wants the AUTO_PLAY_ALL_AI build on pangaea. But pangaea is more connected than the spaced-capital harness map, so it would if anything produce more contact, not less — the "indecisive war, no elimination" conclusion is unlikely to flip. The lever is mc-ai siege/assault decisiveness, not balance.

Why this exists separately from p1-29c

p1-29c's spec is "raise priority of Settle/Defend/Research when sole-city threatened." That work landed and is correct. The empirical failure mode is "P1 doesn't survive long enough to ACT on those priorities." That's a different code surface and a different design question — it deserves its own objective.

References

Apricot batch evidence: apricot:~/.cache/mc-batches/20260515_215705/smoke/
p1-29c sole-city implementation: mc-ai/src/policy.rs::action_prior_with_context
Prior diagnosis: .project/objectives/p1-29.md lines 124-134 ("Early-end games are intentionally-ungated elimination wins.")

True state — 2026-06-04 gap analysis (UPDATED — refound lever validated full-game, does NOT unblock D1)

Verified: partial. D1 remains unconverged. The earlier reading that D1 was "gated on the refound-suppression lever (p1-29h/p1-29i)" is now corrected by full-game validation: the refound lever was implemented (p1-29i, CombatBalance::refound_suppression, default off) and validated full-game as a controlled same-build before/after. Verdict: inert by construction on the autoplay gate surface, for two independent reasons (p1-29i):

(1) Architectural: the autoplay AI applies founding/capture in GDScript (ai_turn_bridge_dispatch.gd:170 dispatch_found_city), which never calls the Rust mc_turn::processor::try_found_city/process_siege where the refound gate lives. This is the same class of bypass as process_science→GdTechWeb already documented here.
(2) Behavioral: the autoplay surface produces terminal capital-capture eliminations (cities_lost=1 → game ends) or zombie survival, never the lose-then-refound churn the gridded micro-lift required, so there is no event for the cooldown to gate (corroborated by a 4-seed cd=0 run and this objective's own 10-seed 20260529_185955 table).

The gridded 5/9→8/9 lift is real on the gridded harness but does not transfer. No live JSON value authored; the lever stays at default 0.

Path forward: D1 is not unblocked by the refound lever. The real unblock is either an architecture change (route autoplay action-application, i.e. founding/capture, through the Rust mc_turn::processor so data-driven combat-balance levers reach the gate surface) or the offensive-competence / learned-controller reframe already documented above (p1-29f/g). Until then D1 stays unconverged on a fair surface; it is not movable by any data-only balance lever that lives in the bypassed Rust path.

Blockers: autoplay→Rust action-application architecture gap (new objective; Rust/GDScript, out of fence for the data-only refound lane).

Demo gate: full-game-only. AI convergence is a quality gate, not demo-critical.

Effort: L (gated on the architecture change + re-measurement).
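Both reasons for inertness can be shown with a toy model. This is a sketch under stated assumptions, not the project's code: `gated_found_city` and `bypass_found_city` are hypothetical stand-ins for a founding path that enforces a refound cooldown and a dispatch path that never consults it, and the turn numbers are made up.

```python
def gated_found_city(turn, last_loss_turn, cooldown_turns):
    """Stand-in for a founding path that enforces a refound cooldown."""
    if last_loss_turn is not None and turn - last_loss_turn < cooldown_turns:
        return False  # refound blocked while the cooldown is running
    return True


def bypass_found_city(turn, last_loss_turn, cooldown_turns):
    """Stand-in for a dispatch path that never checks the gate."""
    return True


def refounds(found_fn, cooldown_turns, loss_turn=50, attempts=range(51, 61)):
    """Count successful founding attempts over a window of turns."""
    return sum(found_fn(t, loss_turn, cooldown_turns) for t in attempts)


for cd in (0, 5):
    print(
        f"cd={cd}",
        "gated:", refounds(gated_found_city, cd),
        "bypass:", refounds(bypass_found_city, cd),
        "gated, no city lost:", refounds(gated_found_city, cd, loss_turn=None),
    )
```

In this toy, the lever changes the count only on the gated path and only when a loss precedes the attempts. The bypass column models reason (1) and the no-loss column models reason (2); either alone is enough to make a cd=0 versus cd=5 comparison come out identical.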
