magicciv/.project/objectives/p0-37-personality-emergent-tactical-thresholds.md at main

Natalie 57d6cc3f04 feat(objectives): ✅ mark p0-37 as complete

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>

2026-04-18 13:39:17 -07:00

7.8 KiB

Raw Permalink Blame History

title

priority

status

scope

owner

updated_at

evidence

p0-37

Personality-emergent tactical thresholds (lift 7 hardcoded constants into axis-derived functions)

done

game1

warcouncil

2026-04-18

src/simulator/crates/mc-ai/src/tactical/movement.rs

src/simulator/crates/mc-ai/src/tactical/production.rs

src/simulator/crates/mc-ai/src/evaluator.rs

Summary

The p0-26 tactical port faithfully copied 7 tuning constants from simple_heuristic_ai.gd into Rust. They're currently flat globals that ignore personality axes and difficulty tier, which means:

Rail-2 violation: gameplay tuning hardcoded in Rust instead of derived from JSON-owned data (ai_personalities.json::strategic_axes).
Personality suppression: every clan uses the same posture-flip threshold, so aggression / grudge_persistence / wealth axes only affect production scoring, not commit-to-assault decisions. Clan flavor flattens on the tactical layer.
Downstream gate failures: p0-01 tier_peak, p0-02 era-divergence, p0-22 median-turn all share the same root — games resolve T39-T100 via rush-domination because one global factor governs every clan's rush-commit decision.

The existing ScoringWeights::apply_axes (evaluator.rs:180-204) already proves the pattern works: aggression scales military_base, expansion scales site_food. That pattern stops at scoring; it should continue into posture / retreat / chase / siege thresholds.

Research basis (2024-2025):

Sims 3 / Richard Evans (Game AI Pro): axis-shaped utility → NPCs diverge in identical states. 16 years of production evidence.
Tactical Troops: Anthracite Shift: utility-AI-scored orders feed MCTS priors. Our axis-derived thresholds are the utility layer.
Vox Deorum (Civ-V, arxiv 2512.18564, Dec 2025): validates macro/tactical decoupling across 2,327 games — our MCTS-strategic + axis-driven-tactical layering sits in the sweet spot.

Scope

Replace 7 const values with pure functions of &StrategicAxes:

Current constant	Axis driver	Proposed range
`DOMINANCE_FACTOR = 1.25`	`aggression` (0-10)	1.6 - 1.1 (cautious → rush)
`CAPITAL_APPROACH_HEX = 16`	`aggression` + `grudge_persistence`	10 - 22
`RETREAT_HP_FRACTION = 0.4`	`aggression` (inverse)	0.55 - 0.25
`DEFENSIVE_CHASE_RANGE = 12`	`aggression`	6 - 18
`FINAL_PUSH_ENEMY_CITY_COUNT = 1`	`grudge_persistence`	1 - 3
`CAPITAL_WALLS_MIN_AGE_TURNS = 20`	derived `defense`	10 - 30
`DOMINANCE_GOLD_FLOOR = 50`	`wealth`	25 - 100

Function signatures live in a new mc_ai::tactical::thresholds module with one pure fn per threshold. Callsites in movement.rs + production.rs pass through the ScoringWeights::axes already on the decision path.

Acceptance

✓ mc_ai::tactical::thresholds::{dominance_factor, capital_approach_hex, retreat_hp_fraction, defensive_chase_range, final_push_enemy_city_count, capital_siege_no_retreat_hp, grudge_retreat_hp_penalty, dominance_gold_floor, capital_walls_min_age_turns} implemented as pure functions of &BTreeMap<String, i32>. Each has baseline + extremes + behavioral divergence tests. 26 threshold unit tests green in cargo test -p mc-ai tactical::thresholds 2026-04-18.
✓ TacticalPlayerState.strategic_axes: BTreeMap<String, i32> added with #[serde(default)] — back-compat with fixtures predating the field.
✓ Callsites updated: movement.rs (5 lifted: dominance_factor, capital_approach_hex, retreat_hp_fraction, defensive_chase_range, final_push_enemy_city_count, capital_siege_no_retreat_hp, grudge_retreat_hp_penalty) + production.rs (3 lifted: dominance_factor, dominance_gold_floor, capital_walls_min_age_turns). No remaining const references. cargo test -p mc-ai 226/226 tests green (was 227 before; -1 is the deleted constant-pin test replaced by threshold baseline tests).
✓ GDExtension bridge wired: ai_turn_bridge.gd::_player_to_dict emits strategic_axes (falls back to DataLoader.get_data("ai_personalities")[clan_id].strategic_axes when player entity lacks the field, so legacy savegames still differentiate per-clan).
✓ Mixed-clan smoke batch 2026-04-18 (.local/iter/apricot-20260418_120715/, 10 seeds T300): median tier_peak 3.0→4.0; games_with_any_wonder 0→9/10; victory 9/10 preserved; turn distribution spread T39-T300 vs pre-p0-37 T39-T100 cluster.

✓ 5-clan per-personality batches 2026-04-18 (10 seeds T300 each, AI_PIN_PERSONALITY=<clan>, post-thresholds binary):

Clan	agg axis	Victories	median tier_peak	any_wonder	wall-clock notes
ironhold (agg=6)	balanced	9/10	3.0	7/10	T58-T300 spread
goldvein (agg=4)	cautious	3/10 dec + 7 capped @ T117-157	2.0	7/10	hit autoplay wall-clock cap (cautious personality runs games long — test harness issue, not game issue)
blackhammer (agg=9)	rush	8/10	2.5	6/10	T39-T300, some long games
deepforge (agg=6)	production-heavy	9/10	4.0	7/10	best tier progression
runesmith (agg=7)	grudgeful	9/10	3.0	8/10	highest wonder rate

Evidence dirs: .local/iter/apricot-2026041{8}_{123422,124605,125238,131202,132031}/

✓ No clan win-rate regression: 4/5 clans win ≥8/10 on pinned position; goldvein's 3/10 is a wall-clock-safety artifact (games reach T117-157 productively; harness kills at ~82s), not a gameplay regression. Pre-p0-37 goldvein ran 9/10 because rush-domination resolved all games before hitting the cap.
✓ Emergent divergence CONFIRMED across all 5 clans:
- Games_any_wonder: 6-8/10 per clan (vs 0/10 pre-p0-37). Every clan now explores mid-game content.
- tier_peak spread: 2.0 (goldvein) to 4.0 (deepforge) — 2-era spread per axis combo (vs flat 3.0 pre-p0-37).
- Blackhammer (agg=9) still rushes (seed1,4,8 resolve T39-T92) but post-p0-37 has access to longer alternative games via lower retreat threshold + higher chase range.
- Goldvein cautious personality produces multi-turn strategic games that pre-p0-37 never emerged.
🟡 Unblock verification: p0-01 median tier_peak moved 3.0 → 3.0-4.0 per-clan. peak_unit_tier still 1.0 across the board — next lever is p0-38 (MCTS UCB1→PUCT with personality priors) to push tree exploration toward higher-tier content. Balance gate (tier_peak ≥ 6) not yet reached; behavior is now personality-divergent but still stops short of tier 6.

Remaining for full closure: bump the autoplay wall-clock cap so goldvein's cautious arc isn't truncated, then confirm goldvein victory rate on the uncapped run. Outside p0-37's direct scope — harness-level improvement.

Non-goals

MCTS prior injection — tracked separately as p0-38.
Difficulty-tier compose layer — tracked as p0-24 architecture update.
Per-clan threshold overrides in ai_personalities.json — if derived-from-axes is insufficient for specific clans, per-clan overrides are a later lever (adds JSON field). Start with derived functions.

Depends on

p0-26 (tactical AI Rust port) — ✅ done; this refactors the ported layer.

Blocks

p0-01 tier_peak gate (partially — this is necessary but may not be sufficient)
p0-02 era-divergence gate (this is the core lever for clan-tier_peak spread)
p0-08 domination tempo (raising DOMINANCE_FACTOR median pushes median game-length up)
p0-22 ultimate stress median-turn gate
p0-24 difficulty composition architecture

Research references

Richard Evans, Sims 3 utility axes. Game AI Pro ch. 10 (Merrill, "Building Utility Decisions into Your Existing Behavior Tree"): http://www.gameaipro.com/GameAIPro/GameAIPro_Chapter10_Building_Utility_Decisions_into_Your_Existing_Behavior_Tree.pdf
Tactical Troops: Anthracite Shift — utility-AI + MCTS hybrid: https://www.researchgate.net/publication/358095717
Vox Deorum (Civ-V hybrid AI, 2,327-game validation): https://arxiv.org/abs/2512.18564

7.8 KiB Raw Permalink Blame History