feat(@projects/@magic-civilization): ✨ add post-smoke clan evidence analysis

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-04-18 10:02:32 -07:00 · 2026-04-18 10:02:32 -07:00 · 8c5612914d
commit 8c5612914d
parent e71cd3ca92
2 changed files with 132 additions and 0 deletions
--- a/.project/experiments/p0-26-p1-inert.md
+++ b/.project/experiments/p0-26-p1-inert.md
@ -144,6 +144,43 @@ Same CLASS of bug as Round 2 (WeatherEvent i32 fields) but for u8 fields in mc-t

 **Predicted outcome**: E2E gate passes. Games run full length with both players active. Other u8 fields (PlayerSnap.wealth/expansion_axis/production_axis, McSnapshot.victory_city_count, combat_event structs) may still need annotation — but `choose_action_with_stats` only parses `GameState` directly, so the PlayerState fix should be sufficient for that specific call path.

+## Round 8 — post-smoke clan evidence (p0-02 / p0-01 quality gates)
+
+**Setup**: 5× clan batches (10 seeds T300 each) on the post-fix binary.
+
+**Results** (2026-04-18):
+| Clan | Wins | Avg | Winner TP | TP Gap | PUT | Wonders |
+|---|---|---|---|---|---|---|
+| ironhold | 8/10 | 202s | 3.0 | 4.0 | 1.0 | 7/10 |
+| goldvein | 9/10 | 161s | 3.0 | 3.5 | 1.0 | 8/10 |
+| blackhammer | 9/10 | 158s | 3.0 | 3.5 | 1.0 | 8/10 |
+| deepforge | 8/10 | 204s | 2.5 | 3.0 | 1.0 | 7/10 |
+| runesmith | 9/10 | 161s | 3.0 | 3.5 | 1.0 | 8/10 |
+
+**p0-01 state-at-end gates FAIL**:
+- `winner_tier_peak ≥ 6` needs median=6, got 2.5-3.0 → FAIL
+- `tier_peak_gap ≤ 2` needs median≤2, got 3.0-4.0 → FAIL
+- `peak_unit_tier ≥ 6 in ≥7/10 games` — ALL clans show PUT=1.0 (warriors only) → FAIL
+- `≥1 wonder per player in ≥5/10 games` — 7-8/10 had any wonder → likely PASS (interpret loose)
+
+**p0-02 reframed gate FAILS** (median tier_peak differs by ≥1 era between divergent pairs): only deepforge (2.5) differs from the cluster at 3.0 — max delta 0.5.
+
+**Observational diagnosis**: The tactical port functions (both players play, found cities, build militaries, fight). But games end at T39-T100 via early domination before tech tree progresses past tier 1. p1 founds but gets overrun by p0's warrior rush before archers/pikemen/etc. are researched.
+
+**Clan pair signature** (matches pre-reframe TTV data):
+- {goldvein, runesmith, blackhammer} fast-aggressive cluster: ~160s/game, 9/10 wins
+- {ironhold, deepforge} slower-defensive cluster: ~202s/game, 8/10 wins
+
+**Not a port bug — a GAMEPLAY BALANCE issue**. The port faithfully executes the heuristic, but the heuristic (1:1 from simple_heuristic_ai.gd) produces rush-dominated play at current map sizes. Fixing needs either:
+1. Longer-horizon MCTS (budget > 20 turns — currently limits tree depth)
+2. Tune DOMINANCE_FACTOR higher so aggression only triggers at bigger advantage
+3. Larger maps (tile count / player distance) — forces more buildup before contact
+4. Era-pacing tuning in tech tree to slow tier-1 → tier-2 transition
+
+Those are p0-01 / p0-02 / p0-24 balance-tuning scope, not p0-26 port scope. p0-26 itself closes on the smoke-gate evidence (port works); the quality sub-gates re-point to the balance objectives.
+
+---
+
 **RESULT (batch bdncm5x7y, in-flight mid-batch snapshot)**:

 | Seed | Outcome | Turn | p1 cities |
--- a/tools/batch-quality-metrics.sh
+++ b/tools/batch-quality-metrics.sh
@ -0,0 +1,95 @@
+#!/usr/bin/env bash
+# batch-quality-metrics.sh — aggregate post-p0-25 quality metrics per batch.
+#
+# Usage:
+#   tools/batch-quality-metrics.sh <batch_dir>
+#   tools/batch-quality-metrics.sh apricot:<remote_batch_dir>
+#
+# Prints:
+#   median_winner_tier_peak=N
+#   median_peak_unit_tier=N
+#   median_wonder_count=N
+#   median_tier_peak_gap=N
+#   games_with_any_wonder=M/N
+#   ...
+
+set -euo pipefail
+
+TARGET="${1:?usage: tools/batch-quality-metrics.sh <batch_dir | apricot:/path>}"
+
+read -r -d '' QUERY <<'EOF' || true
+set -e
+: "${DIR:?DIR must be set}"
+python3 - "$DIR" <<'PY'
+import json, pathlib, sys, statistics
+root = pathlib.Path(sys.argv[1])
+games = sorted(root.glob("game_*"))
+
+winner_tier_peaks = []
+peak_unit_tiers = []
+wonder_counts = []
+tier_peak_gaps = []
+any_wonder_games = 0
+
+for g in games:
+    stats_path = g / "turn_stats.jsonl"
+    if not stats_path.is_file() or stats_path.stat().st_size == 0:
+        continue
+    last = None
+    with open(stats_path) as f:
+        for line in f:
+            if line.strip():
+                last = line
+    if not last:
+        continue
+    try:
+        d = json.loads(last)
+    except Exception:
+        continue
+    ps = d.get("player_stats", {})
+    winner_idx = d.get("winner_index")
+    all_tp = []
+    for k, v in ps.items():
+        tp = v.get("tier_peak")
+        put = v.get("peak_unit_tier")
+        wc = v.get("wonder_count", 0)
+        if isinstance(tp, int) and tp >= 0:
+            all_tp.append(tp)
+        if isinstance(put, int) and put >= 0:
+            peak_unit_tiers.append(put)
+        if isinstance(wc, int):
+            wonder_counts.append(wc)
+            if wc >= 1:
+                any_wonder_games += 0  # game-level check below
+    # Winner tier_peak: winner_idx's tier_peak, only for decided games
+    if winner_idx is not None and winner_idx >= 0:
+        winner_stats = ps.get(str(winner_idx), {})
+        wtp = winner_stats.get("tier_peak")
+        if isinstance(wtp, int) and wtp >= 0:
+            winner_tier_peaks.append(wtp)
+    # tier_peak_gap = max - min across players
+    if len(all_tp) >= 2:
+        tier_peak_gaps.append(max(all_tp) - min(all_tp))
+    # game-level "any wonder"
+    if any(isinstance(v.get("wonder_count"), int) and v["wonder_count"] >= 1 for v in ps.values()):
+        any_wonder_games += 1
+
+def fmt_median(xs):
+    return f"{statistics.median(xs):.1f}" if xs else "(no data)"
+
+print(f"total_games_with_stats={sum(1 for g in games if (g/'turn_stats.jsonl').is_file())}")
+print(f"decided_games={len(winner_tier_peaks)}")
+print(f"median_winner_tier_peak={fmt_median(winner_tier_peaks)}  (n={len(winner_tier_peaks)})")
+print(f"median_peak_unit_tier={fmt_median(peak_unit_tiers)}  (n={len(peak_unit_tiers)})")
+print(f"median_tier_peak_gap={fmt_median(tier_peak_gaps)}  (n={len(tier_peak_gaps)})")
+print(f"median_wonder_count_per_player={fmt_median(wonder_counts)}  (n={len(wonder_counts)})")
+print(f"games_with_any_wonder={any_wonder_games}/{len(games)}")
+PY
+EOF
+
+if [[ "$TARGET" == apricot:* ]]; then
+    REMOTE_PATH="${TARGET#apricot:}"
+    ssh apricot "DIR='${REMOTE_PATH}' bash -s" <<< "$QUERY"
+else
+    DIR="$TARGET" bash -c "$QUERY"
+fi