magicciv/.project/objectives/p0-37-personality-emergent-tactical-thresholds.md
Natalie 57d6cc3f04 feat(objectives): mark p0-37 as complete
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-04-18 13:39:17 -07:00

id: p0-37
title: Personality-emergent tactical thresholds (lift 7 hardcoded constants into axis-derived functions)
priority: p0
status: done
scope: game1
owner: warcouncil
updated_at: 2026-04-18
evidence:
  • src/simulator/crates/mc-ai/src/tactical/movement.rs
  • src/simulator/crates/mc-ai/src/tactical/production.rs
  • src/simulator/crates/mc-ai/src/evaluator.rs

Summary

The p0-26 tactical port faithfully copied 7 tuning constants from simple_heuristic_ai.gd into Rust. They're currently flat globals that ignore personality axes and difficulty tier, which means:

  • Rail-2 violation: gameplay tuning hardcoded in Rust instead of derived from JSON-owned data (ai_personalities.json::strategic_axes).
  • Personality suppression: every clan uses the same posture-flip threshold, so aggression / grudge_persistence / wealth axes only affect production scoring, not commit-to-assault decisions. Clan flavor flattens on the tactical layer.
  • Downstream gate failures: p0-01 tier_peak, p0-02 era-divergence, p0-22 median-turn all share the same root — games resolve T39-T100 via rush-domination because one global factor governs every clan's rush-commit decision.

The existing ScoringWeights::apply_axes (evaluator.rs:180-204) already proves the pattern works: aggression scales military_base, expansion scales site_food. That pattern stops at scoring; it should continue into posture / retreat / chase / siege thresholds.
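
The scoring-side pattern referenced above can be pictured roughly as follows. This is a minimal sketch, not the contents of evaluator.rs: only the names ScoringWeights, apply_axes, military_base, and site_food come from this document. The neutral axis value of 5, the 10%-per-point slope, and the BTreeMap representation of the axes are assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical, trimmed-down stand-in for the real `ScoringWeights`.
#[derive(Debug, Clone, PartialEq)]
pub struct ScoringWeights {
    pub military_base: f32,
    pub site_food: f32,
}

impl ScoringWeights {
    /// Scale weights by personality axes (assumed range 0..=10).
    /// A missing axis is treated as neutral. The neutral point (5) and the
    /// 10%-per-point slope are illustrative assumptions, not project values.
    pub fn apply_axes(&mut self, axes: &BTreeMap<String, i32>) {
        let scale = |name: &str| -> f32 {
            let v = axes.get(name).copied().unwrap_or(5).clamp(0, 10);
            1.0 + 0.1 * (v - 5) as f32
        };
        // aggression scales military_base, expansion scales site_food.
        self.military_base *= scale("aggression");
        self.site_food *= scale("expansion");
    }
}
```

Under these assumptions, a clan with aggression 9 and no expansion entry would see military_base multiplied by about 1.4 while site_food stays unchanged. The point of p0-37 is that the same axis lookup should also drive the tactical thresholds, not only the scoring weights.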

Research basis (2024-2025):

  • Sims 3 / Richard Evans (Game AI Pro): axis-shaped utility → NPCs diverge in identical states. 16 years of production evidence.
  • Tactical Troops: Anthracite Shift: utility-AI-scored orders feed MCTS priors. Our axis-derived thresholds are the utility layer.
  • Vox Deorum (Civ-V, arxiv 2512.18564, Dec 2025): validates macro/tactical decoupling across 2,327 games — our MCTS-strategic + axis-driven-tactical layering sits in the sweet spot.

Scope

Replace 7 const values with pure functions of &StrategicAxes:

| Current constant | Axis driver | Proposed range |
|---|---|---|
| DOMINANCE_FACTOR = 1.25 | aggression (0-10) | 1.6 - 1.1 (cautious → rush) |
| CAPITAL_APPROACH_HEX = 16 | aggression + grudge_persistence | 10 - 22 |
| RETREAT_HP_FRACTION = 0.4 | aggression (inverse) | 0.55 - 0.25 |
| DEFENSIVE_CHASE_RANGE = 12 | aggression | 6 - 18 |
| FINAL_PUSH_ENEMY_CITY_COUNT = 1 | grudge_persistence | 1 - 3 |
| CAPITAL_WALLS_MIN_AGE_TURNS = 20 | derived defense | 10 - 30 |
| DOMINANCE_GOLD_FLOOR = 50 | wealth | 25 - 100 |

Function signatures live in a new mc_ai::tactical::thresholds module with one pure fn per threshold. Callsites in movement.rs + production.rs pass through the ScoringWeights::axes already on the decision path.
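
As a rough picture of what "one pure fn per threshold" could look like, here is a minimal sketch of four of the functions. The function names and numeric ranges come from this document. Everything else is an assumption: the linear interpolation, the neutral fallback of 5 for a missing axis, the rounding of integer thresholds, and reading each range as "axis = 0 → axis = 10". The sketch takes &BTreeMap<String, i32>, the type named under Acceptance, not a StrategicAxes struct.

```rust
use std::collections::BTreeMap;

/// Read an axis as f32. Values are assumed to lie in 0..=10; a missing
/// axis falls back to a neutral 5 (an assumption of this sketch).
fn axis(axes: &BTreeMap<String, i32>, name: &str) -> f32 {
    axes.get(name).copied().unwrap_or(5).clamp(0, 10) as f32
}

/// Linear interpolation from `at_zero` (axis = 0) to `at_ten` (axis = 10).
fn lerp(at_zero: f32, at_ten: f32, value: f32) -> f32 {
    at_zero + (at_ten - at_zero) * (value / 10.0)
}

/// Posture-flip factor: 1.6 for a cautious clan down to 1.1 for a rusher.
pub fn dominance_factor(axes: &BTreeMap<String, i32>) -> f32 {
    lerp(1.6, 1.1, axis(axes, "aggression"))
}

/// Retreat threshold as an HP fraction: 0.55 down to 0.25 as aggression rises.
pub fn retreat_hp_fraction(axes: &BTreeMap<String, i32>) -> f32 {
    lerp(0.55, 0.25, axis(axes, "aggression"))
}

/// Defensive chase range in hexes: 6 up to 18 as aggression rises.
pub fn defensive_chase_range(axes: &BTreeMap<String, i32>) -> i32 {
    lerp(6.0, 18.0, axis(axes, "aggression")).round() as i32
}

/// Gold floor: 25 up to 100 as wealth rises (direction is assumed).
pub fn dominance_gold_floor(axes: &BTreeMap<String, i32>) -> i32 {
    lerp(25.0, 100.0, axis(axes, "wealth")).round() as i32
}
```

Because each function depends only on its argument, a callsite can swap a constant for a call without touching any other state. Under this linear mapping, aggression 5 reproduces the old RETREAT_HP_FRACTION (0.4) and DEFENSIVE_CHASE_RANGE (12), while DOMINANCE_FACTOR's old value of 1.25 would correspond to aggression 7. The shipped functions may use a different curve.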

Acceptance

  • mc_ai::tactical::thresholds::{dominance_factor, capital_approach_hex, retreat_hp_fraction, defensive_chase_range, final_push_enemy_city_count, capital_siege_no_retreat_hp, grudge_retreat_hp_penalty, dominance_gold_floor, capital_walls_min_age_turns} implemented as pure functions of &BTreeMap<String, i32>. Each has baseline + extremes + behavioral divergence tests. 26 threshold unit tests green in cargo test -p mc-ai tactical::thresholds 2026-04-18.

  • TacticalPlayerState.strategic_axes: BTreeMap<String, i32> added with #[serde(default)] — back-compat with fixtures predating the field.

  • ✓ Callsites updated: movement.rs (7 lifted: dominance_factor, capital_approach_hex, retreat_hp_fraction, defensive_chase_range, final_push_enemy_city_count, capital_siege_no_retreat_hp, grudge_retreat_hp_penalty) + production.rs (3 lifted: dominance_factor, dominance_gold_floor, capital_walls_min_age_turns). No remaining const references. cargo test -p mc-ai 226/226 tests green (was 227 before; the -1 is the deleted constant-pin test, replaced by the threshold baseline tests).

  • ✓ GDExtension bridge wired: ai_turn_bridge.gd::_player_to_dict emits strategic_axes (falls back to DataLoader.get_data("ai_personalities")[clan_id].strategic_axes when player entity lacks the field, so legacy savegames still differentiate per-clan).

  • ✓ Mixed-clan smoke batch 2026-04-18 (.local/iter/apricot-20260418_120715/, 10 seeds T300): median tier_peak 3.0→4.0; games_with_any_wonder 0→9/10; victory 9/10 preserved; turn distribution spread T39-T300 vs pre-p0-37 T39-T100 cluster.

  • ✓ 5-clan per-personality batches 2026-04-18 (10 seeds T300 each, AI_PIN_PERSONALITY=<clan>, post-thresholds binary):

    | Clan | agg axis | Victories | median tier_peak | any_wonder | wall-clock notes |
    |---|---|---|---|---|---|
    | ironhold (agg=6) | balanced | 9/10 | 3.0 | 7/10 | T58-T300 spread |
    | goldvein (agg=4) | cautious | 3/10 dec + 7 capped @ T117-157 | 2.0 | 7/10 | hit autoplay wall-clock cap (cautious personality runs games long; test harness issue, not game issue) |
    | blackhammer (agg=9) | rush | 8/10 | 2.5 | 6/10 | T39-T300, some long games |
    | deepforge (agg=6) | production-heavy | 9/10 | 4.0 | 7/10 | best tier progression |
    | runesmith (agg=7) | grudgeful | 9/10 | 3.0 | 8/10 | highest wonder rate |

    Evidence dirs: .local/iter/apricot-2026041{8}_{123422,124605,125238,131202,132031}/

  • No clan win-rate regression: 4/5 clans win ≥8/10 on pinned position; goldvein's 3/10 is a wall-clock-safety artifact (games reach T117-157 productively; harness kills at ~82s), not a gameplay regression. Pre-p0-37 goldvein ran 9/10 because rush-domination resolved all games before hitting the cap.

  • Emergent divergence CONFIRMED across all 5 clans:

    • Games_any_wonder: 6-8/10 per clan (vs 0/10 pre-p0-37). Every clan now explores mid-game content.
    • tier_peak spread: 2.0 (goldvein) to 4.0 (deepforge) — 2-era spread per axis combo (vs flat 3.0 pre-p0-37).
    • Blackhammer (agg=9) still rushes (seed1,4,8 resolve T39-T92) but post-p0-37 has access to longer alternative games via lower retreat threshold + higher chase range.
    • Goldvein's cautious personality produces long strategic games, a pattern that never emerged pre-p0-37.
  • 🟡 Unblock verification: p0-01 median tier_peak moved 3.0 → 3.0-4.0 per-clan. peak_unit_tier still 1.0 across the board — next lever is p0-38 (MCTS UCB1→PUCT with personality priors) to push tree exploration toward higher-tier content. Balance gate (tier_peak ≥ 6) not yet reached; behavior is now personality-divergent but still stops short of tier 6.
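
The "baseline + extremes + behavioral divergence" test shape named in the acceptance list can be sketched on a single threshold. This is an illustration under stated assumptions, not the shipped test suite: should_retreat, aggression_only, and the test module are hypothetical names, and the linear mapping with a neutral aggression of 5 is assumed. The aggression values 9 and 4 are the blackhammer and goldvein values from the table above.

```rust
use std::collections::BTreeMap;

type Axes = BTreeMap<String, i32>;

/// Assumed linear mapping of aggression (0..=10) onto the proposed
/// 0.55 -> 0.25 retreat range; a missing axis is treated as neutral (5).
pub fn retreat_hp_fraction(axes: &Axes) -> f32 {
    let aggression = axes.get("aggression").copied().unwrap_or(5).clamp(0, 10);
    0.55 - 0.03 * aggression as f32
}

/// Hypothetical callsite: a unit withdraws once its HP fraction drops
/// below its owner's personality-derived threshold.
pub fn should_retreat(axes: &Axes, hp: u32, max_hp: u32) -> bool {
    (hp as f32 / max_hp as f32) < retreat_hp_fraction(axes)
}

/// Hypothetical helper: build a personality with only `aggression` set.
pub fn aggression_only(value: i32) -> Axes {
    BTreeMap::from([("aggression".to_string(), value)])
}

#[cfg(test)]
mod threshold_sketch_tests {
    use super::*;

    #[test]
    fn baseline_matches_old_constant() {
        // Neutral aggression reproduces RETREAT_HP_FRACTION = 0.4.
        assert!((retreat_hp_fraction(&aggression_only(5)) - 0.4).abs() < 1e-5);
    }

    #[test]
    fn extremes_cover_proposed_range() {
        assert!((retreat_hp_fraction(&aggression_only(0)) - 0.55).abs() < 1e-5);
        assert!((retreat_hp_fraction(&aggression_only(10)) - 0.25).abs() < 1e-5);
    }

    #[test]
    fn identical_state_diverges_by_personality() {
        // Same unit at 35% HP: agg=9 holds its ground, agg=4 withdraws.
        assert!(!should_retreat(&aggression_only(9), 35, 100));
        assert!(should_retreat(&aggression_only(4), 35, 100));
    }
}
```

The third test is the one that captures the intent of this objective: two personalities placed in an identical game state make different tactical decisions, which a flat global constant could never produce.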

Remaining for full closure: bump the autoplay wall-clock cap so goldvein's cautious arc isn't truncated, then confirm goldvein victory rate on the uncapped run. Outside p0-37's direct scope — harness-level improvement.

Non-goals

  • MCTS prior injection — tracked separately as p0-38.
  • Difficulty-tier compose layer — tracked as p0-24 architecture update.
  • Per-clan threshold overrides in ai_personalities.json — if derived-from-axes is insufficient for specific clans, per-clan overrides are a later lever (adds JSON field). Start with derived functions.

Depends on

  • p0-26 (tactical AI Rust port) — done; this refactors the ported layer.

Blocks

  • p0-01 tier_peak gate (partially — this is necessary but may not be sufficient)
  • p0-02 era-divergence gate (this is the core lever for clan-tier_peak spread)
  • p0-08 domination tempo (raising DOMINANCE_FACTOR median pushes median game-length up)
  • p0-22 ultimate stress median-turn gate
  • p0-24 difficulty composition architecture

Research references