diff --git a/.project/objectives/p0-37-personality-emergent-tactical-thresholds.md b/.project/objectives/p0-37-personality-emergent-tactical-thresholds.md
new file mode 100644
index 00000000..7e5354ec
--- /dev/null
+++ b/.project/objectives/p0-37-personality-emergent-tactical-thresholds.md
@@ -0,0 +1,101 @@
+---
+id: p0-37
+title: Personality-emergent tactical thresholds (lift 7 hardcoded constants into axis-derived functions)
+priority: p0
+status: stub
+scope: game1
+owner: warcouncil
+updated_at: 2026-04-18
+evidence:
+  - src/simulator/crates/mc-ai/src/tactical/movement.rs
+  - src/simulator/crates/mc-ai/src/tactical/production.rs
+  - src/simulator/crates/mc-ai/src/evaluator.rs
+---
+
+## Summary
+
+The p0-26 tactical port faithfully copied 7 tuning constants from
+`simple_heuristic_ai.gd` into Rust. They're currently flat globals that ignore
+personality axes and difficulty tier, which means:
+
+- **Rail-2 violation**: gameplay tuning hardcoded in Rust instead of derived
+  from JSON-owned data (`ai_personalities.json::strategic_axes`).
+- **Personality suppression**: every clan uses the same posture-flip threshold,
+  so aggression / grudge_persistence / wealth axes only affect production
+  scoring, not commit-to-assault decisions. Clan flavor flattens on the
+  tactical layer.
+- **Downstream gate failures**: the p0-01 tier_peak, p0-02 era-divergence, and
+  p0-22 median-turn gates all share the same root — games resolve T39-T100 via
+  rush-domination because one global factor governs every clan's
+  rush-commit decision.
+
+The existing `ScoringWeights::apply_axes` (evaluator.rs:180-204) already
+proves the pattern works: `aggression` scales `military_base`, `expansion`
+scales `site_food`. That pattern stops at scoring; it should continue into
+posture / retreat / chase / siege thresholds.
+
+Research basis (2024-2025):
+- **Sims 3 / Richard Evans (Game AI Pro)**: axis-shaped utility → NPCs diverge
+  in identical states. 16 years of production evidence.
+- **Tactical Troops: Anthracite Shift**: utility-AI-scored orders feed MCTS
+  priors. Our axis-derived thresholds are the utility layer.
+- **Vox Deorum (Civ-V, arxiv 2512.18564, Dec 2025)**: validates
+  macro/tactical decoupling across 2,327 games — our MCTS-strategic +
+  axis-driven-tactical layering sits in the sweet spot.
+
+## Scope
+
+Replace 7 `const` values with pure functions of `&StrategicAxes`:
+
+| Current constant | Axis driver | Proposed range |
+|---|---|---|
+| `DOMINANCE_FACTOR = 1.25` | `aggression` (0-10) | 1.6 - 1.1 (cautious → rush) |
+| `CAPITAL_APPROACH_HEX = 16` | `aggression` + `grudge_persistence` | 10 - 22 |
+| `RETREAT_HP_FRACTION = 0.4` | `aggression` (inverse) | 0.55 - 0.25 |
+| `DEFENSIVE_CHASE_RANGE = 12` | `aggression` | 6 - 18 |
+| `FINAL_PUSH_ENEMY_CITY_COUNT = 1` | `grudge_persistence` | 1 - 3 |
+| `CAPITAL_WALLS_MIN_AGE_TURNS = 20` | derived `defense` | 10 - 30 |
+| `DOMINANCE_GOLD_FLOOR = 50` | `wealth` | 25 - 100 |
+
+Function signatures live in a new `mc_ai::tactical::thresholds` module with
+one pure fn per threshold. Callsites in `movement.rs` + `production.rs` pass
+through the `ScoringWeights::axes` already on the decision path.
+
+## Acceptance
+
+- ✗ `mc_ai::tactical::thresholds::{dominance_factor, capital_approach_hex, retreat_hp_fraction, defensive_chase_range, final_push_enemy_city_count, capital_walls_min_age_turns, dominance_gold_floor}` implemented as pure functions of `&StrategicAxes`. Each has a unit test pinning the extremes (axis=0 and axis=10 map to the two ends of the proposed range, whichever direction the mapping runs) and the mid-point (axis=5 ≈ current hardcoded value for continuity; for `DOMINANCE_FACTOR`, `FINAL_PUSH_ENEMY_CITY_COUNT`, and `DOMINANCE_GOLD_FLOOR` the range midpoint is not the current value, so those curves cannot be linear across the full range).
+- ✗ Tactical regression suite migrated: behavioral assertions replace constant-pin assertions (e.g. "blackhammer aggression=9 yields factor < goldvein aggression=4", not "factor == 1.25"). All 79 current regression tests stay green.
+- ✗ Callsites updated: `movement.rs:443, 449, 780-ish, 868-ish, 906-ish, 1049-1054` + `production.rs:318, 446`. No remaining `const` references to the 7 lifted names. `cargo test -p mc-ai tactical` stays green.
+- ✗ 5-clan batch (10 seeds T300 pinned on player 1, post-thresholds binary) shows measurable per-clan emergent divergence:
+  - **Combats**: blackhammer (aggression=9) median ≥ 1.5× goldvein (aggression=4)
+  - **Median turn**: goldvein games ≥ 1.3× blackhammer (cautious commits later)
+  - **Gold at victory**: goldvein median ≥ 1.5× blackhammer
+  - **tier_peak**: at least one clan reaches ≥ 5 in ≥3/10 games (the cautious/tall clans)
+- ✗ No clan win-rate regression: all 5 clans still ≥ 6/10 wins on pinned position vs heuristic opponent.
+- ✗ Unblock verification: p0-01 median tier_peak ≥ 5 in Normal-vs-Normal batch after this lands (partial progress on its quality gates; full ≥6 may still need further tuning).
+
+## Non-goals
+
+- MCTS prior injection — tracked separately as `p0-38`.
+- Difficulty-tier composition layer — tracked as a p0-24 architecture update.
+- Per-clan threshold overrides in `ai_personalities.json` — if derived-from-axes
+  is insufficient for specific clans, per-clan overrides are a later lever
+  (adds a JSON field). Start with derived functions.
+
+## Depends on
+
+- `p0-26` (tactical AI Rust port) — ✅ done; this refactors the ported layer.
+
+## Blocks
+
+- `p0-01` tier_peak gate (partially — this is necessary but may not be sufficient)
+- `p0-02` era-divergence gate (this is the core lever for clan-tier_peak spread)
+- `p0-08` domination tempo (raising DOMINANCE_FACTOR median pushes median game-length up)
+- `p0-22` ultimate stress median-turn gate
+- `p0-24` difficulty composition architecture
+
+## Research references
+
+- Richard Evans, Sims 3 utility axes. *Game AI Pro* ch. 10 (Merrill, "Building Utility Decisions into Your Existing Behavior Tree"): http://www.gameaipro.com/GameAIPro/GameAIPro_Chapter10_Building_Utility_Decisions_into_Your_Existing_Behavior_Tree.pdf
+- Tactical Troops: Anthracite Shift — utility-AI + MCTS hybrid: https://www.researchgate.net/publication/358095717
+- Vox Deorum (Civ-V hybrid AI, 2,327-game validation): https://arxiv.org/abs/2512.18564
diff --git a/.project/objectives/p0-38-mcts-personality-priors.md b/.project/objectives/p0-38-mcts-personality-priors.md
new file mode 100644
index 00000000..221583b2
--- /dev/null
+++ b/.project/objectives/p0-38-mcts-personality-priors.md
@@ -0,0 +1,89 @@
+---
+id: p0-38
+title: Inject personality-utility scores as MCTS UCB1 priors
+priority: p0
+status: stub
+scope: game1
+owner: warcouncil
+updated_at: 2026-04-18
+evidence:
+  - src/simulator/crates/mc-ai/src/mcts_tree.rs
+  - src/simulator/crates/mc-ai/src/evaluator.rs
+  - src/simulator/crates/mc-ai/src/abstract_state.rs
+---
+
+## Summary
+
+Current MCTS selection uses classical UCB1 at tree nodes — all actions start
+with an equal prior and the exploration bonus depends only on visit counts. `ScoringWeights`
+and `strategic_axes` feed the *tactical executor* and *leaf evaluator* but
+NOT the tree-selection step. This means MCTS explores the same branches for
+every clan; divergence only appears at the leaf.
+
+AlphaGo's core contribution was **learned priors** seeded into the tree. We
+don't need learning — we have personality utility. Inject it as the `P(s,a)`
+term in the PUCT / UCB1-with-prior formula:
+
+```
+score(a) = Q(s,a) + c_puct × P(s,a) × sqrt(N(s)) / (1 + N(s,a))
+```
+
+Where `P(s,a) = softmax(personality_utility(state, action) / temperature)`
+and `personality_utility` is the same `ScoringWeights`-driven evaluator used
+at the leaf.
+
+Effect: blackhammer's MCTS tree spends more branches on early assault
+variants; goldvein's tree spends more branches on tech-up + defend variants.
+Without the prior, both clans' trees have identical shape — only the leaf
+evaluator differs, and leaf evaluation happens after 20+ turns of rollout, by
+which point the differentiating choice has already been washed out.
+
+## Scope
+
+1. Extend `mcts_tree::Node` with `prior: f32` (action-selection weight).
+2. On node expansion, compute `prior[action]` via softmax over personality
+   utility scores for each child action. Reuse `ScoringWeights` +
+   `evaluator::evaluate_state` (or a lighter `personality_policy_score` fn).
+3. Replace UCB1 `score = Q + c × sqrt(ln N / n)` with PUCT
+   `score = Q + c_puct × prior × sqrt(N) / (1 + n)`.
+4. Tune `c_puct` and softmax temperature empirically — start with
+   `c_puct=1.0, T=1.0` as neutral starting values.
+5. Gate behind `AI_MCTS_PRIORS=true` initially so the classical-UCB1 baseline
+   stays reachable with the flag off; flip to default-on after validation.
+
+## Acceptance
+
+- ✗ `Node::prior: f32` field + `expand_with_priors(&ScoringWeights)` method on tree; UCB1 formula swapped for PUCT. Parity test for the `prior=uniform` case: no action is favored a priori, so selection depends only on Q and visit counts. (Uniform-prior PUCT does not reproduce classical UCB1 numerically, because its exploration term is `sqrt(N) / (1 + n)` rather than `sqrt(ln N / n)`; regression safety against the old behavior comes from the `AI_MCTS_PRIORS` gate.)
+- ✗ All 4 existing mcts_tree unit tests stay green, plus 1 new test: two clans with divergent `scoring_weights` produce different first-layer visit distributions after N iterations (proves the prior has an effect).
+- ✗ GPU-path parity preserved: `Tree::iterate_gpu_batched` still bit-identical to CPU path under `MC_AI_GPU_DEBUG=1` (prior computation is CPU-only; only rollouts run on the GPU).
+- ✗ 5-clan batch (10 seeds T300, pinned player, post-priors binary) shows:
+  - **Tree shape divergence**: blackhammer's top-visited action at root differs from goldvein's in ≥7/10 seeds (via `TURN_STATS_MCTS_ROOT_ACTION` log).
+  - **Build-order divergence**: blackhammer's first 5 builds include ≥3 military units; goldvein's first 5 builds include ≥2 markets/buildings (median across seeds).
+- ✗ No win-rate regression: victory rate stays ≥ 8/10 per pinned clan.
+- ✗ Determinism preserved: same seed + same `scoring_weights` → byte-identical action trace.
+
+## Non-goals
+
+- Learned priors (AlphaZero policy net). Pure utility-function priors only.
+- GPU-side prior computation. Priors are computed on the CPU; the GPU handles rollouts.
+- Per-action prior normalization per player. Use softmax-with-temperature, not
+  hand-tuned per-action bonuses.
+
+## Depends on
+
+- `p0-37` (axis-derived thresholds) — priors should consume the same
+  personality signal that shapes thresholds, for consistency across layers.
+- `p0-20` (GPU rollouts integration) — ✅ structural work done; priors ride on
+  the existing `Tree::iterate_gpu_batched` pipeline.
+
+## Blocks
+
+- `p0-01` residual tier_peak gap (if thresholds alone don't push median to ≥6)
+- `p0-22` ultimate_stress divergent-strategy gate (priors are what make
+  5-clan games evolve down different branches)
+
+## Research references
+
+- AlphaGo / PUCT prior formulation: Silver et al., *Nature* 2016.
+- Tactical Troops: utility-scored MCTS priors for tactical games. https://www.researchgate.net/publication/358095717
+- Neural MCTS survey: Świechowski et al., *Applied Intelligence* 2023. https://link.springer.com/article/10.1007/s10489-023-05240-w
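A minimal, hypothetical Rust sketch of the PUCT selection step that p0-38 describes: priors come from `softmax(utility / temperature)`, and selection maximizes `Q(s,a) + c_puct × P(s,a) × sqrt(N(s)) / (1 + N(s,a))`. `ChildStats`, `softmax_priors`, `puct_score`, `select_child`, and `fresh_children` are illustrative names, not the real `mcts_tree` API. It assumes unvisited children have Q = 0, uses `f64` rather than the proposed `f32`, and the utilities in `main` are made-up numbers, not `ScoringWeights` output.

```rust
/// Hypothetical per-child statistics for illustration only; this is not the
/// real `mcts_tree::Node` layout.
#[derive(Debug, Clone)]
pub struct ChildStats {
    pub prior: f64,     // P(s,a), produced by `softmax_priors`
    pub visits: u32,    // N(s,a)
    pub value_sum: f64, // sum of backed-up leaf values for this child
}

impl ChildStats {
    /// Mean value Q(s,a). Unvisited children are assumed to have Q = 0; real
    /// implementations choose this "first-play" value in different ways.
    pub fn q(&self) -> f64 {
        if self.visits == 0 {
            0.0
        } else {
            self.value_sum / self.visits as f64
        }
    }
}

/// P(s,a) = softmax(utility / temperature). Subtracting the maximum utility
/// before exponentiating avoids overflow and does not change the result.
pub fn softmax_priors(utilities: &[f64], temperature: f64) -> Vec<f64> {
    assert!(temperature > 0.0, "temperature must be positive");
    let max_u = utilities.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = utilities
        .iter()
        .map(|u| ((u - max_u) / temperature).exp())
        .collect();
    let total: f64 = exps.iter().sum();
    exps.iter().map(|e| e / total).collect()
}

/// PUCT: Q(s,a) + c_puct * P(s,a) * sqrt(N(s)) / (1 + N(s,a)).
pub fn puct_score(child: &ChildStats, parent_visits: u32, c_puct: f64) -> f64 {
    let exploration = (parent_visits as f64).sqrt() / (1.0 + child.visits as f64);
    child.q() + c_puct * child.prior * exploration
}

/// Index of the highest-scoring child. Ties keep the lowest index, so the
/// choice is deterministic for a given tree.
pub fn select_child(children: &[ChildStats], parent_visits: u32, c_puct: f64) -> Option<usize> {
    let mut best: Option<(usize, f64)> = None;
    for (i, child) in children.iter().enumerate() {
        let score = puct_score(child, parent_visits, c_puct);
        let better = match best {
            None => true,
            Some((_, best_score)) => score > best_score,
        };
        if better {
            best = Some((i, score));
        }
    }
    best.map(|(i, _)| i)
}

/// Unvisited children carrying the given priors (a freshly expanded node).
pub fn fresh_children(priors: &[f64]) -> Vec<ChildStats> {
    priors
        .iter()
        .map(|&p| ChildStats { prior: p, visits: 0, value_sum: 0.0 })
        .collect()
}

fn main() {
    // Made-up utilities for three root actions: [assault, tech-up, defend].
    let aggressive = softmax_priors(&[2.0, 0.5, 0.0], 1.0);
    let cautious = softmax_priors(&[0.0, 1.5, 2.0], 1.0);
    // With every Q at 0, the prior alone picks the first child to explore.
    println!("{:?}", select_child(&fresh_children(&aggressive), 1, 1.0)); // Some(0)
    println!("{:?}", select_child(&fresh_children(&cautious), 1, 1.0)); // Some(2)
}
```

With a uniform prior every child gets the same P(s,a), so nothing is favored a priori, but the exploration term is still `sqrt(N) / (1 + n)` rather than UCB1's `sqrt(ln N / n)`, and an unvisited child gets a finite score here instead of UCB1's unbounded one. As visits accumulate, the `1 + N(s,a)` denominator lets a low-prior child overtake a heavily visited high-prior one.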