feat(@projects/@magic-civilization): ✨ derive tactical thresholds from personality axes

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-04-18 10:58:03 -07:00 · 2026-04-18 10:58:03 -07:00 · 331a07a773
commit 331a07a773
parent 2c0071e721
2 changed files with 190 additions and 0 deletions
--- a/.project/objectives/p0-37-personality-emergent-tactical-thresholds.md
+++ b/.project/objectives/p0-37-personality-emergent-tactical-thresholds.md
@@ -0,0 +1,101 @@
+---
+id: p0-37
+title: Personality-emergent tactical thresholds (lift 7 hardcoded constants into axis-derived functions)
+priority: p0
+status: stub
+scope: game1
+owner: warcouncil
+updated_at: 2026-04-18
+evidence:
+  - src/simulator/crates/mc-ai/src/tactical/movement.rs
+  - src/simulator/crates/mc-ai/src/tactical/production.rs
+  - src/simulator/crates/mc-ai/src/evaluator.rs
+---
+
+## Summary
+
+The p0-26 tactical port faithfully copied 7 tuning constants from
+`simple_heuristic_ai.gd` into Rust. They're currently flat globals that ignore
+personality axes and difficulty tier, which means:
+
+- **Rail-2 violation**: gameplay tuning hardcoded in Rust instead of derived
+  from JSON-owned data (`ai_personalities.json::strategic_axes`).
+- **Personality suppression**: every clan uses the same posture-flip threshold,
+  so aggression / grudge_persistence / wealth axes only affect production
+  scoring, not commit-to-assault decisions. Clan flavor flattens on the
+  tactical layer.
+- **Downstream gate failures**: p0-01 tier_peak, p0-02 era-divergence, and p0-22
+  median-turn all share the same root cause: games resolve between T39 and
+  T100 via rush-domination because one global factor governs every clan's
+  rush-commit decision.
+
+The existing `ScoringWeights::apply_axes` (evaluator.rs:180-204) already
+proves the pattern works: `aggression` scales `military_base`, `expansion`
+scales `site_food`. That pattern stops at scoring; it should continue into
+posture / retreat / chase / siege thresholds.
+
+Research basis (2024-2025):
+- **Sims 3 / Richard Evans (Game AI Pro)**: axis-shaped utility → NPCs diverge
+  in identical states. 16 years of production evidence.
+- **Tactical Troops: Anthracite Shift**: utility-AI-scored orders feed MCTS
+  priors. Our axis-derived thresholds are the utility layer.
+- **Vox Deorum (Civ-V, arXiv 2512.18564, Dec 2025)**: validates
+  macro/tactical decoupling across 2,327 games; our MCTS-strategic +
+  axis-driven-tactical layering sits in the sweet spot.
+
+## Scope
+
+Replace 7 `const` values with pure functions of `&StrategicAxes`:
+
+| Current constant | Axis driver | Proposed range (axis 0 → 10) |
+|---|---|---|
+| `DOMINANCE_FACTOR = 1.25` | `aggression` (0-10) | 1.6 - 1.1 (cautious → rush) |
+| `CAPITAL_APPROACH_HEX = 16` | `aggression` + `grudge_persistence` | 10 - 22 |
+| `RETREAT_HP_FRACTION = 0.4` | `aggression` (inverse) | 0.55 - 0.25 |
+| `DEFENSIVE_CHASE_RANGE = 12` | `aggression` | 6 - 18 |
+| `FINAL_PUSH_ENEMY_CITY_COUNT = 1` | `grudge_persistence` | 1 - 3 |
+| `CAPITAL_WALLS_MIN_AGE_TURNS = 20` | derived `defense` | 10 - 30 |
+| `DOMINANCE_GOLD_FLOOR = 50` | `wealth` | 25 - 100 |
+
+Function signatures live in a new `mc_ai::tactical::thresholds` module with
+one pure fn per threshold. Callsites in `movement.rs` + `production.rs` pass
+through the `ScoringWeights::axes` already on the decision path.
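+
+A minimal sketch of three of the seven functions, not the final
+implementation. It assumes `StrategicAxes` exposes `aggression` and `wealth`
+as `f32` on the 0-10 scale (the real struct may differ); `lerp3` is a
+hypothetical helper. A straight line between the range ends would put
+`dominance_factor` at 1.35 and `dominance_gold_floor` at 62.5 for axis=5,
+not the current 1.25 and 50, so the sketch interpolates piecewise-linearly
+through three anchors: the two range ends plus the current constant at
+axis=5. The wealth values in `main` are placeholders; only the aggression
+values come from this objective.
+
+```rust
+/// Hypothetical stand-in for the real `StrategicAxes` (assumed f32, 0-10).
+#[derive(Debug, Clone, Copy)]
+pub struct StrategicAxes {
+    pub aggression: f32,
+    pub wealth: f32,
+}
+
+/// Piecewise-linear map through (0, at_0), (5, at_5), (10, at_10).
+/// Out-of-range axis values are clamped to 0-10.
+fn lerp3(axis: f32, at_0: f32, at_5: f32, at_10: f32) -> f32 {
+    let a = axis.clamp(0.0, 10.0);
+    if a <= 5.0 {
+        at_0 + (at_5 - at_0) * (a / 5.0)
+    } else {
+        at_5 + (at_10 - at_5) * ((a - 5.0) / 5.0)
+    }
+}
+
+/// Strength ratio required before committing to an assault (1.6 down to 1.1).
+pub fn dominance_factor(axes: &StrategicAxes) -> f32 {
+    lerp3(axes.aggression, 1.6, 1.25, 1.1)
+}
+
+/// HP fraction below which a unit retreats (0.55 down to 0.25).
+pub fn retreat_hp_fraction(axes: &StrategicAxes) -> f32 {
+    lerp3(axes.aggression, 0.55, 0.40, 0.25)
+}
+
+/// Gold reserve kept before a dominance push (25 up to 100).
+pub fn dominance_gold_floor(axes: &StrategicAxes) -> u32 {
+    lerp3(axes.wealth, 25.0, 50.0, 100.0).round() as u32
+}
+
+fn main() {
+    // Aggression values are from this objective; wealth values are placeholders.
+    let blackhammer = StrategicAxes { aggression: 9.0, wealth: 3.0 };
+    let goldvein = StrategicAxes { aggression: 4.0, wealth: 9.0 };
+
+    for (name, axes) in [("blackhammer", blackhammer), ("goldvein", goldvein)] {
+        println!(
+            "{}: dominance_factor={:.2} retreat_hp_fraction={:.2} gold_floor={}",
+            name,
+            dominance_factor(&axes),
+            retreat_hp_fraction(&axes),
+            dominance_gold_floor(&axes)
+        );
+    }
+
+    // Behavioral assertion: the more aggressive clan commits at a lower ratio.
+    assert!(dominance_factor(&blackhammer) < dominance_factor(&goldvein));
+}
+```
+
+Anchoring the mid-point on the current constant keeps an axis=5 clan on
+today's behavior, which is what a continuity unit test would pin.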
+
+## Acceptance
+
+- ✗ `mc_ai::tactical::thresholds::{dominance_factor, capital_approach_hex, retreat_hp_fraction, defensive_chase_range, final_push_enemy_city_count, capital_walls_min_age_turns, dominance_gold_floor}` implemented as pure functions of `&StrategicAxes`. Each has a unit test pinning the extremes (axis=0 and axis=10 map to the two ends of the proposed range) and the mid-point (axis=5 ≈ current hardcoded value for continuity).
+- ✗ Tactical regression suite migrated: behavioral assertions replace constant-pin assertions (e.g. "blackhammer aggression=9 yields factor < goldvein aggression=4" not "factor == 1.25"). Current regression count 79 stays green.
+- ✗ Callsites updated: `movement.rs` lines 443, 449, ~780, ~868, ~906, 1049-1054 and `production.rs` lines 318, 446. No remaining `const` references to the 7 lifted names. `cargo test -p mc-ai tactical` stays green.
+- ✗ 5-clan batch (10 seeds T300 pinned on player 1, post-thresholds binary) shows measurable per-clan emergent divergence:
+  - **Combats**: blackhammer (aggression=9) median ≥ 1.5× goldvein (aggression=4)
+  - **Median turn**: goldvein games ≥ 1.3× blackhammer (cautious commits later)
+  - **Gold at victory**: goldvein median ≥ 1.5× blackhammer
+  - **tier_peak**: at least one clan reaches ≥ 5 in ≥3/10 games (the cautious/tall clans)
+- ✗ No clan win-rate regression: all 5 clans still ≥ 6/10 wins on pinned position vs heuristic opponent.
+- ✗ Unblock verification: p0-01 median tier_peak ≥ 5 in Normal-vs-Normal batch after this lands (partial progress on its quality gates; full ≥6 may still need further tuning).
+
+## Non-goals
+
+- MCTS prior injection — tracked separately as `p0-38`.
+- Difficulty-tier compose layer — tracked as p0-24 architecture update.
+- Per-clan threshold overrides in `ai_personalities.json` — if derived-from-axes
+  is insufficient for specific clans, per-clan overrides are a later lever
+  (adds JSON field). Start with derived functions.
+
+## Depends on
+
+- `p0-26` (tactical AI Rust port) — ✅ done; this refactors the ported layer.
+
+## Blocks
+
+- `p0-01` tier_peak gate (partially — this is necessary but may not be sufficient)
+- `p0-02` era-divergence gate (this is the core lever for clan-tier_peak spread)
+- `p0-08` domination tempo (raising DOMINANCE_FACTOR median pushes median game-length up)
+- `p0-22` ultimate stress median-turn gate
+- `p0-24` difficulty composition architecture
+
+## Research references
+
+- Richard Evans, Sims 3 utility axes. *Game AI Pro* ch. 10 (Merrill, "Building Utility Decisions into Your Existing Behavior Tree"): http://www.gameaipro.com/GameAIPro/GameAIPro_Chapter10_Building_Utility_Decisions_into_Your_Existing_Behavior_Tree.pdf
+- Tactical Troops: Anthracite Shift — utility-AI + MCTS hybrid: https://www.researchgate.net/publication/358095717
+- Vox Deorum (Civ-V hybrid AI, 2,327-game validation): https://arxiv.org/abs/2512.18564
--- a/.project/objectives/p0-38-mcts-personality-priors.md
+++ b/.project/objectives/p0-38-mcts-personality-priors.md
@@ -0,0 +1,89 @@
+---
+id: p0-38
+title: Inject personality-utility scores as MCTS UCB1 priors
+priority: p0
+status: stub
+scope: game1
+owner: warcouncil
+updated_at: 2026-04-18
+evidence:
+  - src/simulator/crates/mc-ai/src/mcts_tree.rs
+  - src/simulator/crates/mc-ai/src/evaluator.rs
+  - src/simulator/crates/mc-ai/src/abstract_state.rs
+---
+
+## Summary
+
+Current MCTS selection uses classical UCB1 at tree nodes — all actions start
+with equal prior, exploration is driven only by visit count. `ScoringWeights`
+and `strategic_axes` feed the *tactical executor* and *leaf evaluator* but
+NOT the tree-selection step. This means MCTS explores the same branches for
+every clan; divergence only appears at the leaf.
+
+AlphaGo's core contribution was **learned priors** seeded into the tree. We
+don't need learning — we have personality utility. Inject it as the `P(s,a)`
+term in the PUCT / UCB1-with-prior formula:
+
+```
+score(a) = Q(s,a) + c_puct × P(s,a) × sqrt(N(s)) / (1 + N(s,a))
+```
+
+Where `P(s,a) = softmax(personality_utility(state, action) / temperature)`
+and `personality_utility` is the same `ScoringWeights`-driven evaluator used
+at the leaf.
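+
+As a rough illustration of the softmax step, the sketch below turns raw
+utility scores for sibling actions into a prior distribution. The function
+name `softmax_priors` and the utility values are hypothetical; in the real
+tree the scores would come from the `ScoringWeights`-driven evaluator.
+
+```rust
+/// Convert utility scores for sibling actions into priors that sum to 1.
+/// Subtracting the maximum before `exp` avoids overflow on large scores.
+fn softmax_priors(utilities: &[f32], temperature: f32) -> Vec<f32> {
+    assert!(temperature > 0.0, "temperature must be positive");
+    if utilities.is_empty() {
+        return Vec::new();
+    }
+    let max = utilities.iter().copied().fold(f32::NEG_INFINITY, f32::max);
+    let exps: Vec<f32> = utilities
+        .iter()
+        .map(|u| ((u - max) / temperature).exp())
+        .collect();
+    let sum: f32 = exps.iter().sum();
+    exps.iter().map(|e| e / sum).collect()
+}
+
+fn main() {
+    // Placeholder utilities for three sibling actions.
+    let utilities = [2.0, 1.0, 0.5];
+    for t in [0.5, 1.0, 4.0] {
+        println!("T={}: {:?}", t, softmax_priors(&utilities, t));
+    }
+}
+```
+
+A lower temperature concentrates the prior on the highest-utility action; a
+higher one flattens it toward uniform, which weakens the personality signal.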
+
+Effect: blackhammer's MCTS tree spends more branches on early assault
+variants; goldvein's tree spends more branches on tech-up + defend variants.
+Without the prior, both clans' trees have an identical shape. Only the leaf
+evaluator differs, and leaf evaluation happens after 20+ turns of rollout,
+by which point the differentiating choice has already been washed out.
+
+## Scope
+
+1. Extend `mcts_tree::Node` with `prior: f32` (action-selection weight).
+2. On node expansion, compute `prior[action]` via softmax over personality
+   utility scores for each child action. Reuse `ScoringWeights` +
+   `evaluator::evaluate_state` (or a lighter personality_policy_score fn).
+3. Replace UCB1 `score = Q + c × sqrt(ln N / n)` with PUCT
+   `score = Q + c_puct × prior × sqrt(N) / (1 + n)`.
+4. Tune `c_puct` and softmax temperature empirically — start with
+   `c_puct=1.0, T=1.0` per AlphaGo convention.
+5. Gate behind `AI_MCTS_PRIORS=true` initially so the baseline is one commit
+   behind; flip to default-on after validation.
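+
+Steps 1 and 3 could be sketched as follows. `Child`, `puct_score`,
+`select_child`, and `fresh_children` are hypothetical names for
+illustration; the real `mcts_tree::Node` layout and selection code may
+differ.
+
+```rust
+/// Hypothetical per-child statistics.
+struct Child {
+    prior: f32,     // P(s,a), fixed at expansion time
+    visits: u32,    // N(s,a)
+    value_sum: f32, // sum of backed-up values
+}
+
+impl Child {
+    /// Mean backed-up value Q(s,a); 0 for an unvisited child.
+    fn q(&self) -> f32 {
+        if self.visits == 0 { 0.0 } else { self.value_sum / self.visits as f32 }
+    }
+}
+
+/// score = Q + c_puct * prior * sqrt(N) / (1 + n)
+fn puct_score(child: &Child, parent_visits: u32, c_puct: f32) -> f32 {
+    let explore = (parent_visits as f32).sqrt() / (1.0 + child.visits as f32);
+    child.q() + c_puct * child.prior * explore
+}
+
+/// Index of the best child; ties go to the lowest index so that selection
+/// stays deterministic for a given seed.
+fn select_child(children: &[Child], parent_visits: u32, c_puct: f32) -> Option<usize> {
+    let mut best: Option<(usize, f32)> = None;
+    for (i, child) in children.iter().enumerate() {
+        let score = puct_score(child, parent_visits, c_puct);
+        if best.map_or(true, |(_, b)| score > b) {
+            best = Some((i, score));
+        }
+    }
+    best.map(|(i, _)| i)
+}
+
+/// Build unvisited children from a prior vector.
+fn fresh_children(priors: &[f32]) -> Vec<Child> {
+    priors
+        .iter()
+        .map(|&p| Child { prior: p, visits: 0, value_sum: 0.0 })
+        .collect()
+}
+
+fn main() {
+    // Same visit statistics, different (placeholder) priors per clan.
+    let assault_heavy = fresh_children(&[0.6, 0.3, 0.1]);
+    let tech_heavy = fresh_children(&[0.1, 0.3, 0.6]);
+    println!("assault-heavy picks {:?}", select_child(&assault_heavy, 1, 1.0));
+    println!("tech-heavy picks {:?}", select_child(&tech_heavy, 1, 1.0));
+}
+```
+
+With identical visit counts the two prior vectors send the first selection
+down different branches, which is the divergence the new unit test in
+Acceptance is meant to detect.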
+
+## Acceptance
+
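+- ✗ `Node::prior: f32` field + `expand_with_priors(&ScoringWeights)` method on tree; UCB1 formula swapped for PUCT. Parity tests for the `prior=uniform` case confirm that selection is then personality-independent, as under classical UCB1 (regression-safety). PUCT's `sqrt(N) / (1 + n)` exploration term is not numerically equal to UCB1's `sqrt(ln N / n)`, so parity here means identical choices across clans, not identical scores.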
+- ✗ 4/4 existing mcts_tree unit tests green. +1 new test: two clans with divergent `scoring_weights` produce different first-layer visit distributions after N iterations (proves the prior is biting).
+- ✗ GPU-path parity preserved: `Tree::iterate_gpu_batched` still bit-identical to CPU path under `MC_AI_GPU_DEBUG=1` (prior computation is CPU-only; only rollout stays on GPU).
+- ✗ 5-clan batch (10 seeds T300, pinned player, post-priors binary) shows:
+  - **Tree shape divergence**: blackhammer's top-visited action at root differs from goldvein's in ≥7/10 seeds (via `TURN_STATS_MCTS_ROOT_ACTION` log).
+  - **Build-order divergence**: blackhammer's first-5-builds include ≥3 military units; goldvein's first-5-builds include ≥2 markets/buildings. Median across seeds.
+- ✗ No win-rate regression: victory rate stays ≥ 8/10 per pinned clan.
+- ✗ Determinism preserved: same seed + same scoring_weights → byte-identical action trace.
+
+## Non-goals
+
+- Learned priors (AlphaZero policy net). Pure utility-function priors only.
+- GPU-side prior computation. Priors are CPU; GPU handles rollouts.
+- Hand-tuned per-action, per-player prior bonuses. Priors come from
+  softmax-with-temperature over utility scores only.
+
+## Depends on
+
+- `p0-37` (axis-derived thresholds) — priors should consume the same
+  personality signal that shapes thresholds, for consistency across layers.
+- `p0-20` (GPU rollouts integration) — ✅ structural work done; priors ride on
+  the existing Tree::iterate_gpu_batched pipeline.
+
+## Blocks
+
+- `p0-01` residual tier_peak gap (if thresholds alone don't push median to ≥6)
+- `p0-22` ultimate_stress divergent-strategy gate (priors are what make
+  5-clan games evolve down different branches)
+
+## Research references
+
+- AlphaGo / PUCT prior formulation: Silver et al., *Nature* 2016.
+- Tactical Troops: utility-scored MCTS priors for tactical games. https://www.researchgate.net/publication/358095717
+- Neural MCTS survey: Świechowski et al., *Applied Intelligence* 2023. https://link.springer.com/article/10.1007/s10489-023-05240-w