feat(@projects/@magic-civilization): ✨ add post-smoke clan evidence analysis
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
This commit is contained in:
parent
e71cd3ca92
commit
8c5612914d
2 changed files with 132 additions and 0 deletions
|
|
@ -144,6 +144,43 @@ Same CLASS of bug as Round 2 (WeatherEvent i32 fields) but for u8 fields in mc-t
|
|||
|
||||
**Predicted outcome**: E2E gate passes. Games run full length with both players active. Other u8 fields (PlayerSnap.wealth/expansion_axis/production_axis, McSnapshot.victory_city_count, combat_event structs) may still need annotation — but `choose_action_with_stats` only parses `GameState` directly, so the PlayerState fix should be sufficient for that specific call path.
|
||||
|
||||
## Round 8 — post-smoke clan evidence (p0-02 / p0-01 quality gates)
|
||||
|
||||
**Setup**: 5× clan batches (10 seeds T300 each) on the post-fix binary.
|
||||
|
||||
**Results** (2026-04-18):
|
||||
| Clan | Wins | Avg | Winner TP | TP Gap | PUT | Wonders |
|
||||
|---|---|---|---|---|---|---|
|
||||
| ironhold | 8/10 | 202s | 3.0 | 4.0 | 1.0 | 7/10 |
|
||||
| goldvein | 9/10 | 161s | 3.0 | 3.5 | 1.0 | 8/10 |
|
||||
| blackhammer | 9/10 | 158s | 3.0 | 3.5 | 1.0 | 8/10 |
|
||||
| deepforge | 8/10 | 204s | 2.5 | 3.0 | 1.0 | 7/10 |
|
||||
| runesmith | 9/10 | 161s | 3.0 | 3.5 | 1.0 | 8/10 |
|
||||
|
||||
**p0-01 state-at-end gates FAIL**:
|
||||
- `winner_tier_peak ≥ 6` needs median=6, got 2.5-3.0 → FAIL
|
||||
- `tier_peak_gap ≤ 2` needs median≤2, got 3.0-4.0 → FAIL
|
||||
- `peak_unit_tier ≥ 6 in ≥7/10 games` — ALL clans show PUT=1.0 (warriors only) → FAIL
|
||||
- `≥1 wonder per player in ≥5/10 games` — 7-8/10 had any wonder → likely PASS (interpret loose)
|
||||
|
||||
**p0-02 reframed gate FAILS** (median tier_peak differs by ≥1 era between divergent pairs): only deepforge (2.5) differs from the cluster at 3.0 — max delta 0.5.
|
||||
|
||||
**Observational diagnosis**: The tactical port functions (both players play, found cities, build militaries, fight). But games end at T39-T100 via early domination before tech tree progresses past tier 1. p1 founds but gets overrun by p0's warrior rush before archers/pikemen/etc. are researched.
|
||||
|
||||
**Clan pair signature** (matches pre-reframe TTV data):
|
||||
- {goldvein, runesmith, blackhammer} fast-aggressive cluster: ~160s/game, 9/10 wins
|
||||
- {ironhold, deepforge} slower-defensive cluster: ~202s/game, 8/10 wins
|
||||
|
||||
**Not a port bug — a GAMEPLAY BALANCE issue**. The port faithfully executes the heuristic, but the heuristic (1:1 from simple_heuristic_ai.gd) produces rush-dominated play at current map sizes. Fixing needs either:
|
||||
1. Longer-horizon MCTS (budget > 20 turns — currently limits tree depth)
|
||||
2. Tune DOMINANCE_FACTOR higher so aggression only triggers at bigger advantage
|
||||
3. Larger maps (tile count / player distance) — forces more buildup before contact
|
||||
4. Era-pacing tuning in tech tree to slow tier-1 → tier-2 transition
|
||||
|
||||
Those are p0-01 / p0-02 / p0-24 balance-tuning scope, not p0-26 port scope. p0-26 itself closes on the smoke-gate evidence (port works); the quality sub-gates re-point to the balance objectives.
|
||||
|
||||
---
|
||||
|
||||
**RESULT (batch bdncm5x7y, in-flight mid-batch snapshot)**:
|
||||
|
||||
| Seed | Outcome | Turn | p1 cities |
|
||||
|
|
|
|||
95
tools/batch-quality-metrics.sh
Executable file
95
tools/batch-quality-metrics.sh
Executable file
|
|
@ -0,0 +1,95 @@
|
|||
#!/usr/bin/env bash
|
||||
# batch-quality-metrics.sh — aggregate post-p0-25 quality metrics per batch.
|
||||
#
|
||||
# Usage:
|
||||
# tools/batch-quality-metrics.sh <batch_dir>
|
||||
# tools/batch-quality-metrics.sh apricot:<remote_batch_dir>
|
||||
#
|
||||
# Prints:
|
||||
# median_winner_tier_peak=N
|
||||
# median_peak_unit_tier=N
|
||||
# median_wonder_count=N
|
||||
# median_tier_peak_gap=N
|
||||
# games_with_any_wonder=M/N
|
||||
# ...
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
TARGET="${1:?usage: tools/batch-quality-metrics.sh <batch_dir | apricot:/path>}"
|
||||
|
||||
read -r -d '' QUERY <<'EOF' || true
|
||||
set -e
|
||||
: "${DIR:?DIR must be set}"
|
||||
python3 - "$DIR" <<'PY'
|
||||
import json, pathlib, sys, statistics
|
||||
root = pathlib.Path(sys.argv[1])
|
||||
games = sorted(root.glob("game_*"))
|
||||
|
||||
winner_tier_peaks = []
|
||||
peak_unit_tiers = []
|
||||
wonder_counts = []
|
||||
tier_peak_gaps = []
|
||||
any_wonder_games = 0
|
||||
|
||||
for g in games:
|
||||
stats_path = g / "turn_stats.jsonl"
|
||||
if not stats_path.is_file() or stats_path.stat().st_size == 0:
|
||||
continue
|
||||
last = None
|
||||
with open(stats_path) as f:
|
||||
for line in f:
|
||||
if line.strip():
|
||||
last = line
|
||||
if not last:
|
||||
continue
|
||||
try:
|
||||
d = json.loads(last)
|
||||
except Exception:
|
||||
continue
|
||||
ps = d.get("player_stats", {})
|
||||
winner_idx = d.get("winner_index")
|
||||
all_tp = []
|
||||
for k, v in ps.items():
|
||||
tp = v.get("tier_peak")
|
||||
put = v.get("peak_unit_tier")
|
||||
wc = v.get("wonder_count", 0)
|
||||
if isinstance(tp, int) and tp >= 0:
|
||||
all_tp.append(tp)
|
||||
if isinstance(put, int) and put >= 0:
|
||||
peak_unit_tiers.append(put)
|
||||
if isinstance(wc, int):
|
||||
wonder_counts.append(wc)
|
||||
if wc >= 1:
|
||||
any_wonder_games += 0 # game-level check below
|
||||
# Winner tier_peak: winner_idx's tier_peak, only for decided games
|
||||
if winner_idx is not None and winner_idx >= 0:
|
||||
winner_stats = ps.get(str(winner_idx), {})
|
||||
wtp = winner_stats.get("tier_peak")
|
||||
if isinstance(wtp, int) and wtp >= 0:
|
||||
winner_tier_peaks.append(wtp)
|
||||
# tier_peak_gap = max - min across players
|
||||
if len(all_tp) >= 2:
|
||||
tier_peak_gaps.append(max(all_tp) - min(all_tp))
|
||||
# game-level "any wonder"
|
||||
if any(isinstance(v.get("wonder_count"), int) and v["wonder_count"] >= 1 for v in ps.values()):
|
||||
any_wonder_games += 1
|
||||
|
||||
def fmt_median(xs):
|
||||
return f"{statistics.median(xs):.1f}" if xs else "(no data)"
|
||||
|
||||
print(f"total_games_with_stats={sum(1 for g in games if (g/'turn_stats.jsonl').is_file())}")
|
||||
print(f"decided_games={len(winner_tier_peaks)}")
|
||||
print(f"median_winner_tier_peak={fmt_median(winner_tier_peaks)} (n={len(winner_tier_peaks)})")
|
||||
print(f"median_peak_unit_tier={fmt_median(peak_unit_tiers)} (n={len(peak_unit_tiers)})")
|
||||
print(f"median_tier_peak_gap={fmt_median(tier_peak_gaps)} (n={len(tier_peak_gaps)})")
|
||||
print(f"median_wonder_count_per_player={fmt_median(wonder_counts)} (n={len(wonder_counts)})")
|
||||
print(f"games_with_any_wonder={any_wonder_games}/{len(games)}")
|
||||
PY
|
||||
EOF
|
||||
|
||||
if [[ "$TARGET" == apricot:* ]]; then
|
||||
REMOTE_PATH="${TARGET#apricot:}"
|
||||
ssh apricot "DIR='${REMOTE_PATH}' bash -s" <<< "$QUERY"
|
||||
else
|
||||
DIR="$TARGET" bash -c "$QUERY"
|
||||
fi
|
||||
Loading…
Add table
Reference in a new issue