diff --git a/.project/objectives/p0-02-clan-personalities.md b/.project/objectives/p0-02-clan-personalities.md
index b86c1629..c508e67c 100644
--- a/.project/objectives/p0-02-clan-personalities.md
+++ b/.project/objectives/p0-02-clan-personalities.md
@@ -14,6 +14,7 @@ evidence:
   - src/game/engine/src/modules/ai/personality_assigner.gd
   - src/game/engine/tests/unit/ai/test_ai_turn_bridge_mcts.gd
   - .local/iter/p0-02-clans/
+  - .local/iter/b5-manual-20260417_061957/
 ---
 
 ## Summary
@@ -39,9 +40,15 @@ Balance: 49 total games, each clan 3 AI-wins, max 33% — passes. Gold axis: gol
 - ✓ `mc-ai::ScoringWeights::from_personality(id: &str)` loads weights from JSON — implemented in `evaluator.rs`, GUT test 8 verifies `blackhammer.military_base > goldvein.military_base`.
 - ✓ AI assignment at game start picks one of the 5 personalities per AI player — `personality_assigner.gd` assigns randomly; `meta.json::player_clans` confirms. `AI_PIN_PERSONALITY` env var verified working.
 - ✓ Batch of 5×10 seeds with `AI_PIN_PERSONALITY=` produces measurably different stats per clan — gold axis: goldvein 2× ironhold (543 vs 266); TTV: goldvein/runesmith finish 30 turns faster than ironhold/deepforge. Combat frequency at T9 for all clans (map-forced start proximity, not personality-driven).
-- ✓ **Personality win-rate balance**: 49 games, each clan wins at least 3 times, no clan >50% win rate (max 33%). Sample ≥50 games: 49 = borderline; blackhammer had 9 games due to 1 missing seed. Treating as met.
+- ✗ **Personality win-rate balance**: FAILED on 2026-04-17 B5 re-run (`.local/iter/b5-manual-20260417_061957/`, 50 games, post-determinism-fix binary, ai-verify). Verdict: `blackhammer has 10 appearances but 0 wins (threshold: >= 5)`. Per-clan win rates: deepforge 40% (4/10), ironhold 30% (3/10), goldvein 10% (1/10), runesmith 10% (1/10), blackhammer 0% (0/10). The "no clan >50%" bullet passes (max 40%); the "≥5 apps must have ≥1 win" bullet fails on blackhammer. The prior `.local/iter/p0-02-clans/` batch (pre-determinism-fix, parallel agent) reported a 49-game, 3-wins-per-clan result, which does not reproduce under the fixed binary. The AI wins only 9/50 games overall (18%); aggressive clans underperform builder/isolationist profiles.
 - ✗ **Six axes each materially affect gameplay** — not empirically verified. Deepforge (production=8) and ironhold (production=9) produce identical batch metrics across same 10 seeds, suggesting adjacent production values don't produce measurable divergence at this sample size. Axis removal counterfactual test not run.
 
+## Remaining to done
+
+- Tune blackhammer's evaluator weights so it wins ≥1 game in a 10-seed sample. The current aggression axis appears to underperform defensive/builder profiles; investigate whether `military_base` is being dominated by `food_base` / `production_base` in the value function, or whether the tactical executor (`simple_heuristic_ai.gd`) fails to capitalize on early military production.
+- Broader game-balance review of why the AI wins only 18% of 50 games vs 82% for the heuristic human. See the parallel agent's `.local/batches/ablate_aggression_20260417_072921/` for in-progress aggression ablation data.
+- Empirically verify that each axis materially affects gameplay (5th ✗ bullet). Recommended approach: per-axis ablation (e.g. `AI_DISABLE_AXIS=production`) → re-run 10-seed batch → show win-rate shift. Stretch goal noted at design time.
+
 ## Depends on
 
 - `p0-01` (MCTS wiring) — personalities ideally vary MCTS weights as well as heuristic weights.
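
Review note: the two balance clauses quoted in the ✗ bullet can be re-derived from the per-clan counts. The following is a minimal Python sketch and not the project's ai-verify implementation: the function name `check_balance` and its parameters are hypothetical, and the rule wording is inferred from the verdict string and the bullet text. The per-clan numbers are the ones reported for the B5 re-run.

```python
from typing import Dict, List, Tuple

# (wins, appearances) per clan, copied from the B5 re-run figures in the diff.
B5_RESULTS: Dict[str, Tuple[int, int]] = {
    "deepforge": (4, 10),
    "ironhold": (3, 10),
    "goldvein": (1, 10),
    "runesmith": (1, 10),
    "blackhammer": (0, 10),
}


def check_balance(
    results: Dict[str, Tuple[int, int]],
    max_win_rate: float = 0.5,   # assumed reading of the "no clan >50%" clause
    min_appearances: int = 5,    # assumed reading of the ">=5 apps must have >=1 win" clause
) -> List[str]:
    """Return a list of failure messages; an empty list means both clauses pass."""
    failures: List[str] = []
    for clan, (wins, apps) in sorted(results.items()):
        if apps == 0:
            continue  # no appearances, nothing to judge
        rate = wins / apps
        if rate > max_win_rate:
            failures.append(
                f"{clan} win rate {rate:.0%} exceeds {max_win_rate:.0%}"
            )
        if apps >= min_appearances and wins == 0:
            failures.append(
                f"{clan} has {apps} appearances but 0 wins "
                f"(threshold: >= {min_appearances})"
            )
    return failures


failures = check_balance(B5_RESULTS)
ai_wins = sum(wins for wins, _ in B5_RESULTS.values())
total_games = sum(apps for _, apps in B5_RESULTS.values())

for message in failures:
    print("FAIL:", message)
print(f"AI wins {ai_wins}/{total_games} ({ai_wins / total_games:.0%})")
```

Under these assumptions the only failure is the blackhammer zero-win clause, and the totals match the 9/50 (18%) figure quoted in the bullet.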