feat(@projects/@magic-civilization): ✨ derive tactical thresholds from personality axes
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
parent 2c0071e721
commit 331a07a773
2 changed files with 190 additions and 0 deletions
@@ -0,0 +1,101 @@
---
id: p0-37
title: Personality-emergent tactical thresholds (lift 7 hardcoded constants into axis-derived functions)
priority: p0
status: stub
scope: game1
owner: warcouncil
updated_at: 2026-04-18
evidence:
  - src/simulator/crates/mc-ai/src/tactical/movement.rs
  - src/simulator/crates/mc-ai/src/tactical/production.rs
  - src/simulator/crates/mc-ai/src/evaluator.rs
---

## Summary

The p0-26 tactical port faithfully copied 7 tuning constants from
`simple_heuristic_ai.gd` into Rust. They are currently flat globals that ignore
personality axes and difficulty tier, which means:

- **Rail-2 violation**: gameplay tuning is hardcoded in Rust instead of derived
  from JSON-owned data (`ai_personalities.json::strategic_axes`).
- **Personality suppression**: every clan uses the same posture-flip threshold,
  so the aggression / grudge_persistence / wealth axes only affect production
  scoring, not commit-to-assault decisions. Clan flavor flattens on the
  tactical layer.
- **Downstream gate failures**: p0-01 tier_peak, p0-02 era-divergence, and p0-22
  median-turn all share the same root cause: games resolve at T39-T100 via
  rush-domination because one global factor governs every clan's
  rush-commit decision.

The existing `ScoringWeights::apply_axes` (evaluator.rs:180-204) already
proves the pattern works: `aggression` scales `military_base`, `expansion`
scales `site_food`. That pattern stops at scoring; it should continue into
posture / retreat / chase / siege thresholds.

Research basis (2024-2025):

- **Sims 3 / Richard Evans (Game AI Pro)**: axis-shaped utility makes NPCs
  diverge in identical states. 16 years of production evidence.
- **Tactical Troops: Anthracite Shift**: utility-AI-scored orders feed MCTS
  priors. Our axis-derived thresholds are the utility layer.
- **Vox Deorum (Civ-V, arXiv 2512.18564, Dec 2025)**: validates
  macro/tactical decoupling across 2,327 games; our MCTS-strategic +
  axis-driven-tactical layering sits in the sweet spot.

## Scope

Replace 7 `const` values with pure functions of `&StrategicAxes`:

| Current constant | Axis driver | Proposed range |
|---|---|---|
| `DOMINANCE_FACTOR = 1.25` | `aggression` (0-10) | 1.6 - 1.1 (cautious → rush) |
| `CAPITAL_APPROACH_HEX = 16` | `aggression` + `grudge_persistence` | 10 - 22 |
| `RETREAT_HP_FRACTION = 0.4` | `aggression` (inverse) | 0.55 - 0.25 |
| `DEFENSIVE_CHASE_RANGE = 12` | `aggression` | 6 - 18 |
| `FINAL_PUSH_ENEMY_CITY_COUNT = 1` | `grudge_persistence` | 1 - 3 |
| `CAPITAL_WALLS_MIN_AGE_TURNS = 20` | derived `defense` | 10 - 30 |
| `DOMINANCE_GOLD_FLOOR = 50` | `wealth` | 25 - 100 |

Function signatures live in a new `mc_ai::tactical::thresholds` module with
one pure fn per threshold. Callsites in `movement.rs` + `production.rs` pass
through the `ScoringWeights::axes` already on the decision path.
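
The table rows could translate into code along the following lines. This is only a sketch under stated assumptions: the `StrategicAxes` struct here is a hypothetical stand-in with a single `f32` field on a 0-10 scale, the `piecewise` helper is invented for illustration, and the choice of a piecewise-linear curve pinned at axis=0, 5, and 10 is an assumption, since the objective does not fix the interpolation shape.

```rust
/// Hypothetical stand-in for the real `StrategicAxes`; the field name and
/// the 0-10 `f32` scale are assumptions made for this sketch.
#[derive(Debug, Clone, Copy)]
pub struct StrategicAxes {
    pub aggression: f32,
}

/// Piecewise-linear interpolation through (0, at_zero), (5, at_mid),
/// (10, at_ten). Pinning the mid-point keeps axis=5 on the old constant.
fn piecewise(axis: f32, at_zero: f32, at_mid: f32, at_ten: f32) -> f32 {
    let a = axis.clamp(0.0, 10.0);
    if a <= 5.0 {
        at_zero + (at_mid - at_zero) * (a / 5.0)
    } else {
        at_mid + (at_ten - at_mid) * ((a - 5.0) / 5.0)
    }
}

/// Posture-flip factor: per the table, cautious clans sit near 1.6 and
/// rush clans near 1.1, with the old 1.25 at the mid-point.
pub fn dominance_factor(axes: &StrategicAxes) -> f32 {
    piecewise(axes.aggression, 1.6, 1.25, 1.1)
}

/// Retreat threshold: `aggression` acts inversely (0.55 down to 0.25).
pub fn retreat_hp_fraction(axes: &StrategicAxes) -> f32 {
    piecewise(axes.aggression, 0.55, 0.4, 0.25)
}

/// Integer-valued thresholds round the interpolated value.
pub fn defensive_chase_range(axes: &StrategicAxes) -> u32 {
    piecewise(axes.aggression, 6.0, 12.0, 18.0).round() as u32
}
```

Pinning the mid-point is one way to keep axis=5 on the current constants while still reaching both ends of each proposed range; a plain linear interpolation between the two ends would be simpler but would move the mid-point for `DOMINANCE_FACTOR`.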

## Acceptance

- ✗ `mc_ai::tactical::thresholds::{dominance_factor, capital_approach_hex, retreat_hp_fraction, defensive_chase_range, final_push_enemy_city_count, capital_walls_min_age_turns, dominance_gold_floor}` implemented as pure functions of `&StrategicAxes`. Each has a unit test pinning the extremes (axis=0 and axis=10 map to the two ends of the proposed range) and the mid-point (axis=5 ≈ current hardcoded value, for continuity).
- ✗ Tactical regression suite migrated: behavioral assertions replace constant-pin assertions (e.g. "blackhammer aggression=9 yields factor < goldvein aggression=4", not "factor == 1.25"). The current regression count of 79 stays green.
- ✗ Callsites updated: `movement.rs:443, 449, 780-ish, 868-ish, 906-ish, 1049-1054` + `production.rs:318, 446`. No remaining `const` references to the 7 lifted names. `cargo test -p mc-ai tactical` stays green.
- ✗ 5-clan batch (10 seeds T300 pinned on player 1, post-thresholds binary) shows measurable per-clan emergent divergence:
  - **Combats**: blackhammer (aggression=9) median ≥ 1.5× goldvein (aggression=4)
  - **Median turn**: goldvein games ≥ 1.3× blackhammer (cautious commits later)
  - **Gold at victory**: goldvein median ≥ 1.5× blackhammer
  - **tier_peak**: at least one clan reaches ≥ 5 in ≥3/10 games (the cautious/tall clans)
- ✗ No clan win-rate regression: all 5 clans still ≥ 6/10 wins on pinned position vs heuristic opponent.
- ✗ Unblock verification: p0-01 median tier_peak ≥ 5 in Normal-vs-Normal batch after this lands (partial progress on its quality gates; full ≥6 may still need further tuning).
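
The median-ratio gates in the batch criteria above (for example "median ≥ 1.5×") could be checked with a small helper like the one sketched here. The function names `median` and `median_ratio_gate` are hypothetical and not part of the existing batch tooling; how per-game samples are collected from the logs is left out entirely.

```rust
/// Median of a non-empty sample; averages the two middle values when the
/// sample length is even (as with a 10-seed batch).
fn median(samples: &[f64]) -> f64 {
    assert!(!samples.is_empty(), "median of an empty sample is undefined");
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    }
}

/// True when median(numerator) >= ratio * median(denominator), e.g.
/// blackhammer combats vs goldvein combats with ratio = 1.5.
fn median_ratio_gate(numerator: &[f64], denominator: &[f64], ratio: f64) -> bool {
    median(numerator) >= ratio * median(denominator)
}
```

Comparing medians rather than means keeps a single outlier seed from deciding the gate, which matters with only 10 seeds per clan.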

## Non-goals

- MCTS prior injection: tracked separately as `p0-38`.
- Difficulty-tier compose layer: tracked as p0-24 architecture update.
- Per-clan threshold overrides in `ai_personalities.json`: if derived-from-axes
  is insufficient for specific clans, per-clan overrides are a later lever
  (adds a JSON field). Start with derived functions.

## Depends on

- `p0-26` (tactical AI Rust port): ✅ done; this refactors the ported layer.

## Blocks

- `p0-01` tier_peak gate (partially; this is necessary but may not be sufficient)
- `p0-02` era-divergence gate (this is the core lever for clan-tier_peak spread)
- `p0-08` domination tempo (raising DOMINANCE_FACTOR median pushes median game-length up)
- `p0-22` ultimate stress median-turn gate
- `p0-24` difficulty composition architecture

## Research references

- Richard Evans, Sims 3 utility axes. *Game AI Pro* ch. 10 (Merrill, "Building Utility Decisions into Your Existing Behavior Tree"): http://www.gameaipro.com/GameAIPro/GameAIPro_Chapter10_Building_Utility_Decisions_into_Your_Existing_Behavior_Tree.pdf
- Tactical Troops: Anthracite Shift, utility-AI + MCTS hybrid: https://www.researchgate.net/publication/358095717
- Vox Deorum (Civ-V hybrid AI, 2,327-game validation): https://arxiv.org/abs/2512.18564
89
.project/objectives/p0-38-mcts-personality-priors.md
Normal file
@@ -0,0 +1,89 @@
---
id: p0-38
title: Inject personality-utility scores as MCTS UCB1 priors
priority: p0
status: stub
scope: game1
owner: warcouncil
updated_at: 2026-04-18
evidence:
  - src/simulator/crates/mc-ai/src/mcts_tree.rs
  - src/simulator/crates/mc-ai/src/evaluator.rs
  - src/simulator/crates/mc-ai/src/abstract_state.rs
---

## Summary

Current MCTS selection uses classical UCB1 at tree nodes: all actions start
with equal prior, and exploration is driven only by visit count. `ScoringWeights`
and `strategic_axes` feed the *tactical executor* and *leaf evaluator* but
NOT the tree-selection step. This means MCTS explores the same branches for
every clan; divergence only appears at the leaf.

AlphaGo's core contribution was **learned priors** seeded into the tree. We
don't need learning; we have personality utility. Inject it as the `P(s,a)`
term in the PUCT / UCB1-with-prior formula:

```
score(a) = Q(s,a) + c_puct × P(s,a) × sqrt(N(s)) / (1 + N(s,a))
```

Here `Q(s,a)` is the mean backed-up value of action `a`, `N(s)` is the visit
count of the node, and `N(s,a)` is the visit count of the child. The prior is
`P(s,a) = softmax(personality_utility(state, action) / temperature)`, taken
over the child actions at `s`, and `personality_utility` is the same
`ScoringWeights`-driven evaluator used at the leaf.
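
The softmax-with-temperature step can be sketched as below. The function name `softmax_priors` is hypothetical, and the utilities are assumed to be finite `f32` scores already produced by the evaluator, with `temperature > 0`.

```rust
/// Softmax with temperature over raw utility scores, one per child action.
/// Subtracting the maximum first keeps `exp` from overflowing.
/// Assumes `temperature > 0` and finite utilities.
pub fn softmax_priors(utilities: &[f32], temperature: f32) -> Vec<f32> {
    if utilities.is_empty() {
        return Vec::new();
    }
    let max = utilities.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = utilities
        .iter()
        .map(|u| ((u - max) / temperature).exp())
        .collect();
    let sum: f32 = exps.iter().sum();
    // Normalise so the priors over the child actions sum to 1.
    exps.iter().map(|e| e / sum).collect()
}
```

A higher temperature flattens the distribution toward uniform, and a lower one concentrates the prior on the highest-utility action, which is why the temperature is a tuning knob alongside `c_puct`.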

Effect: blackhammer's MCTS tree spends more branches on early assault
variants; goldvein's tree spends more branches on tech-up + defend variants.
Without the prior, both clans' trees have an identical shape: only the leaf
evaluator differs, and leaf evaluation happens after 20+ turns of rollout, by
which point the differentiating choice has already been washed out.

## Scope

1. Extend `mcts_tree::Node` with `prior: f32` (action-selection weight).
2. On node expansion, compute `prior[action]` via softmax over personality
   utility scores for each child action. Reuse `ScoringWeights` +
   `evaluator::evaluate_state` (or a lighter personality_policy_score fn).
3. Replace UCB1 `score = Q + c × sqrt(ln N / n)` with PUCT
   `score = Q + c_puct × prior × sqrt(N) / (1 + n)`.
4. Tune `c_puct` and softmax temperature empirically; start from
   `c_puct=1.0, T=1.0` as a neutral baseline and adjust from there.
5. Gate behind `AI_MCTS_PRIORS=true` initially so the baseline is one commit
   behind; flip to default-on after validation.
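
The formula swap in step 3 can be written as two small scoring functions. This is a sketch only: the free-function form and the names `ucb1_score` and `puct_score` are assumptions, and the real `mcts_tree` code may keep these values on the node instead of passing them as arguments.

```rust
/// Classical UCB1 score for a child: Q + c * sqrt(ln N / n).
/// Unvisited children get an infinite score so they are tried first.
pub fn ucb1_score(q: f32, c: f32, parent_visits: u32, child_visits: u32) -> f32 {
    if child_visits == 0 {
        return f32::INFINITY;
    }
    q + c * ((parent_visits as f32).ln() / child_visits as f32).sqrt()
}

/// PUCT score: Q + c_puct * prior * sqrt(N) / (1 + n).
/// The exploration bonus is weighted by the child's prior, and the
/// `1 + n` denominator keeps unvisited children finite.
pub fn puct_score(
    q: f32,
    c_puct: f32,
    prior: f32,
    parent_visits: u32,
    child_visits: u32,
) -> f32 {
    q + c_puct * prior * (parent_visits as f32).sqrt() / (1.0 + child_visits as f32)
}
```

With a uniform prior the PUCT bonus depends only on visit counts, but it decays as `1 / (1 + n)` rather than `1 / sqrt(n)`, so the two formulas are not numerically interchangeable.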

## Acceptance

- ✗ `Node::prior: f32` field + `expand_with_priors(&ScoringWeights)` method on tree; UCB1 formula swapped for PUCT. Parity tests for the `prior=uniform` case confirm that selection then depends only on `Q` and visit counts, as in classical UCB1 (regression-safety). The PUCT exploration term `sqrt(N) / (1 + n)` is not numerically equal to UCB1's `sqrt(ln N / n)`, so the tests should assert that the ranking is independent of the prior rather than equality with UCB1 scores.
- ✗ 4/4 existing mcts_tree unit tests green. +1 new test: two clans with divergent `scoring_weights` produce different first-layer visit distributions after N iterations (proves the prior is biting).
- ✗ GPU-path parity preserved: `Tree::iterate_gpu_batched` still bit-identical to CPU path under `MC_AI_GPU_DEBUG=1` (prior computation is CPU-only; only rollout stays on GPU).
- ✗ 5-clan batch (10 seeds T300, pinned player, post-priors binary) shows:
  - **Tree shape divergence**: blackhammer's top-visited action at root differs from goldvein's in ≥7/10 seeds (via `TURN_STATS_MCTS_ROOT_ACTION` log).
  - **Build-order divergence**: blackhammer's first-5-builds include ≥3 military units; goldvein's first-5-builds include ≥2 markets/buildings. Median across seeds.
- ✗ No win-rate regression: victory rate stays ≥ 8/10 per pinned clan.
- ✗ Determinism preserved: same seed + same scoring_weights → byte-identical action trace.
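
The "prior is biting" and determinism checks can be illustrated with a toy single-node model. This is a sketch under strong assumptions: every child keeps `Q = 0` (no rollouts), the function `root_visits` is hypothetical, and `parent + 1` is used as the node visit count so that the very first selection is already prior-driven. The real test would run the actual tree with two `ScoringWeights` instances.

```rust
/// Runs `iterations` PUCT selections at a single root whose children all
/// keep Q = 0, and returns the visit count per child. With no value signal,
/// any difference between two results comes from the priors alone.
pub fn root_visits(priors: &[f32], c_puct: f32, iterations: u32) -> Vec<u32> {
    let mut visits = vec![0u32; priors.len()];
    if priors.is_empty() {
        return visits;
    }
    for _ in 0..iterations {
        let parent: u32 = visits.iter().sum();
        let mut best = 0usize;
        let mut best_score = f32::NEG_INFINITY;
        for (i, &p) in priors.iter().enumerate() {
            // PUCT with Q = 0; `parent + 1` counts the current visit
            // (an assumption of this sketch, see the lead-in).
            let score =
                c_puct * p * ((parent + 1) as f32).sqrt() / (1.0 + visits[i] as f32);
            // Strict `>` keeps the lowest index on ties, so the result is
            // deterministic for a given prior vector.
            if score > best_score {
                best = i;
                best_score = score;
            }
        }
        visits[best] += 1;
    }
    visits
}
```

Feeding an assault-heavy prior vector and an economy-heavy one through the same loop yields different first-layer visit distributions, and repeating either call reproduces its result exactly because no randomness is involved.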

## Non-goals

- Learned priors (AlphaZero policy net). Pure utility-function priors only.
- GPU-side prior computation. Priors are CPU; GPU handles rollouts.
- Per-action prior normalization per player. Use softmax-with-temperature, not
  hand-tuned per-action bonuses.

## Depends on

- `p0-37` (axis-derived thresholds): priors should consume the same
  personality signal that shapes thresholds, for consistency across layers.
- `p0-20` (GPU rollouts integration): ✅ structural work done; priors ride on
  the existing `Tree::iterate_gpu_batched` pipeline.

## Blocks

- `p0-01` residual tier_peak gap (if thresholds alone don't push median to ≥6)
- `p0-22` ultimate_stress divergent-strategy gate (priors are what make
  5-clan games evolve down different branches)

## Research references

- AlphaGo / PUCT prior formulation: Silver et al., *Nature* 2016.
- Tactical Troops: utility-scored MCTS priors for tactical games. https://www.researchgate.net/publication/358095717
- Neural MCTS survey: Świechowski et al., *Applied Intelligence* 2023. https://link.springer.com/article/10.1007/s10489-023-05240-w