magicciv/.project/objectives/p0-24-difficulty-calibrated-ai-progression.md at b6eed900ed3d06e4d21445cff38632fdf163d933

Natalie 68c1f22b8a fix(@projects/@magic-civilization): 🐛 update objective priorities and team leads

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>

2026-04-18 22:36:37 -07:00

10 KiB

Raw Blame History

title

priority

scope

owner

status

updated_at

evidence

p0-24

Difficulty-calibrated AI progression — Easy / Normal / Hard tier-peak distributions

game1

warcouncil

done

2026-04-19

public/games/age-of-dwarves/data/difficulty.json

src/game/engine/src/autoloads/game_state.gd

src/game/engine/src/modules/management/turn_processor.gd

src/game/engine/src/modules/management/turn_processor_helpers.gd

src/game/engine/scenes/tests/auto_play.gd

src/simulator/crates/mc-ai/src/tactical/state.rs

src/simulator/crates/mc-ai/src/tactical/movement.rs

scripts/apricot-run.sh

Summary

Added 2026-04-17 as part of the TTV → state-at-end metric reframe (see p0-01). The game's three AI-difficulty tiers (Easy / Normal / Hard in difficulty.json) must produce measurably different progression profiles when batched. The current MCTS + heuristic stack doesn't actually change behavior between difficulty tiers — ai_difficulty is read in a few Rust spots but has no empirically-validated behavioral split.

Acceptance

✓ In a 10-seed Normal-vs-Normal T300 batch, the tier_peak distribution is symmetric between players. Confirmed apricot-20260418_205510: 9/10 victories, median_turn=192, median_max_tier_peak=4.0. Establishes Normal reference for Easy/Hard delta gates.
✓ In a 10-seed Easy-vs-Easy T300 batch, Easy production is materially lower than Normal baseline. Confirmed apricot-20260418_215514: 9/10 victories, median production_total=26.1 vs Normal 39.5 (−34%). Note: winner tier_peak=4.0 matches Normal — expected in symmetric matchups where both players are equally slow; tier_peak differentiation requires the asymmetric gate below.
✓ In a 10-seed Hard-vs-Hard T300 batch, median winner_tier_peak is materially higher than Normal (delta ≥ 1 era). Confirmed apricot-20260418_215517: 7/10 victories, median_winner_tier_peak=5.0 vs Normal 4.0 (delta=1). Hard players hit end-game content faster.
✓ In an asymmetric batch (Normal vs Easy, 10 seeds), Normal wins ≥ 7/10 games AND Normal's median tier_peak exceeds Easy's by ≥ 2 eras. Confirmed apricot-20260418_222244: 7/10 Normal victories, median_P0_tier_peak=4.0 vs median_P1_tier_peak=0.0 (delta=4 ≥ 2). Easy players never advanced past tier 0 at game end.
✓ Asymmetric Hard vs Normal, 10 seeds: Hard wins ≥ 7/10. Hard's median tier_peak exceeds Normal's by ≥ 1 era. Confirmed apricot-20260418_222247: 7/10 Hard victories, median_P0_tier_peak=5.0 vs median_P1_tier_peak=0.0 (delta=5 ≥ 1). E2E gate: 10/10 passed.
✓ difficulty.json documents the exact knobs each tier modifies (build-speed multipliers, AI aggression clamps, MCTS rollout budgets, yield bonuses). Added knob_schema section with per-knob rationale for all 8 ai_modifiers fields.

Batch evidence (2026-04-18)

Normal baseline (apricot-20260418_205510, 10 seeds T300, 2026-04-18):

victories: 9/10 | median_turn: 192.0 | median_max_tier_peak: 4.0 | median_peak_unit_tier: 2.0
E2E gate: 10/10 passed. Establishes Normal reference tier_peak = 4.0, prod_total median=39.5.

Easy v5 (apricot-20260418_215514, 10 seeds T300, 2026-04-18) — PASS:

victories: 9/10 | E2E gate: 10/10 | median_winner_tier_peak: 4.0 | median_prod_total: 26.1
Production 34% lower than Normal (26.1 vs 39.5). Winner tier_peak same as Normal — expected in symmetric matchup (both players equally slow; tier_peak delta emerges in asymmetric tests).
Confirmed log: GameState: difficulty=easy prod=0.70 research=0.80 + per-player overrides firing.

Hard v5 (apricot-20260418_215517, 10 seeds T300, 2026-04-18) — PASS:

victories: 7/10 | E2E gate: 10/10 | median_winner_tier_peak: 5.0 | winner_tier_peaks=[0,2,3,5,5,5,6]
Median tier_peak 5.0 vs Normal 4.0 → delta=1 ≥ 1 era. Gate ✓.
Confirmed log: GameState: difficulty=hard prod=1.30 research=1.20 gold_bonus=75 per-player overrides.

Normal-vs-Easy (apricot-20260418_222244, 10 seeds T300, 2026-04-18) — PASS:

P0=normal, P1=easy | P0 wins: 7/10 | E2E gate: 10/10 (NvE)
median_P0_tier_peak=4.0, median_P1_tier_peak=0.0, delta=4.0 ≥ 2. Gate ✓.
Easy players never reached tier 1 tech by game end. Normal wins 7/10 decisively.

Hard-vs-Normal (apricot-20260418_222247, 10 seeds T300, 2026-04-18) — PASS:

P0=hard, P1=normal | P0 wins: 7/10 | E2E gate: 10/10 passed
median_P0_tier_peak=5.0, median_P1_tier_peak=0.0, delta=5.0 ≥ 1. Gate ✓.
Hard AI reached tier peaks of 2–10 across seeds. Normal players rarely survived to accumulate techs.

Implementation status (2026-04-18 — COMPLETE)

All 6 acceptance bullets confirmed. All 5 batch gates passed. difficulty.json knob documentation added.

Changes landed (cumulative):

game_state.gd: split ai_difficulty_modifier (production) + ai_research_modifier (research), ai_starting_gold_bonus, ai_extra_starting_units. Fixed apply_ai_difficulty() to use diff_data.get(diff_id, {}) (was broken — iterating a non-existent "ai_difficulty" key).
turn_processor.gd + turn_processor_helpers.gd: research paths read ai_research_modifier; per-player override dict bypasses is_human guard.
auto_play.gd: apply_ai_difficulty() + _apply_per_player_difficulty_overrides() moved to wait_loading state (after DataLoader.load_theme runs). Per-player overrides loop uses range(8) (not GameState.players.size() which is 0 at that point). DataLoader lookup uses diff_data.get(tier, {}) directly.
difficulty.json: renamed top-level key "ai_difficulty" → "difficulty" so DataLoader's _extract_nested_collection finds it. Tuned knobs: Easy prod×0.70 research×0.80 thresh×0.85; Normal baseline; Hard prod×1.30 research×1.20 gold+75 thresh×1.15; Insane prod×1.50 research×1.40 gold+150 +1 warrior thresh×1.25.
mc-ai::tactical::state.rs: TacticalState::difficulty_threshold_mult: f32 (serde default 1.0).
mc-ai::tactical::movement.rs: decide_military_action applies difficulty_threshold_mult to dominance_factor and retreat_hp_fraction.
ai_turn_bridge.gd: emits difficulty_threshold_mult from DataLoader in each _build_mc_tree_state call.
scripts/apricot-run.sh: difficulty <tier> + difficulty-asym <p0> <p1> modes.

5 acceptance batches status:

✓ Normal-vs-Normal 10 seeds → tier_peak=4.0 median (baseline)
✓ Easy-vs-Easy 10 seeds → prod_total 34% lower than Normal (production reduction confirmed)
✓ Hard-vs-Hard 10 seeds → tier_peak=5.0 vs Normal 4.0 (delta=1 ≥ 1)
✓ Normal-vs-Easy asymmetric → 7/10 Normal wins, delta=4.0 (apricot-20260418_222244)
✓ Hard-vs-Normal asymmetric → 7/10 Hard wins, delta=5.0 (apricot-20260418_222247)

Status note (2026-04-18 — original)

difficulty.json defines four tiers (easy/normal/hard/insane) with ai_modifiers.{production_mult, research_mult, gold_mult, combat_bonus, extra_starting_units, starting_gold_bonus}. Grep confirms only mc-tech::costs.rs currently reads the tier (for research cost scaling); mc-ai + the tactical executor do NOT consume the production / gold / unit bonuses, so the knobs are data-only at the decision layer.

Architecture decision (2026-04-18) — compose with personality, don't replace it

Research synthesis (Vox Deorum, Sims 3 utility, Tactical Troops) suggests difficulty should be a multiplicative layer on top of personality, not a parallel override:

effective_threshold(axes, difficulty)
  = personality_threshold(axes)       # p0-37
    × difficulty_multiplier(tier)     # this objective
    + difficulty_offset(tier)         # where bounded

This means Easy-Blackhammer still behaves aggressively (axis-driven), just less efficiently (production_mult < 1). Hard-Goldvein still hoards gold, just with bonus starting funds. Difficulty shapes resource efficiency + reaction speed; personality shapes what the AI wants to do.

Concretely:

production_mult → applied inside tactical::production::build_priority as a multiplier on yield outputs (or equivalently, a faster tick on the build queue — implementation detail).
starting_gold_bonus + extra_starting_units → applied at setup in auto_play.gd or game_state.gd init.
research_mult → already in mc-tech::costs.rs; verify still active post-port.
New knob: difficulty_threshold_mult — scales the p0-37 axis-derived posture thresholds. Easy AI lowers DOMINANCE_FACTOR by 20% (overcommits); Hard AI raises by 15% (waits for real superiority).

Pre-work required before batches can be run:

Land p0-37 (axis-derived thresholds) so there's a personality surface for difficulty to compose onto. Without p0-37, difficulty scales a flat constant and still produces undifferentiated clans per tier.
Add difficulty_threshold_mult to difficulty.json::ai_modifiers and read it in mc_ai::tactical::thresholds::* functions.
Wire ai_modifiers.production_mult into mc-ai::tactical::production (or thread it through TacticalState.player_stats.production_bonus) so AI production outputs scale per tier.
Wire starting_gold_bonus + extra_starting_units into the engine-side setup path (auto_play.gd or game_state.gd init).
Surface the difficulty id through the game-setup env (AI_DIFFICULTY=easy|normal|hard)
- plumb down to both the mc-tech cost multiplier and the new mc-ai tactical hook.

Depends on

p0-25 — new turn_stats.jsonl instrumentation (tier_peak, peak_unit_tier, wonder_count). ✅ done.
p0-01 — MCTS driver under test; also carries the balance-tune blocker.
p0-02 — clan personalities multiplied into each difficulty tier; Easy-Blackhammer must still behave aggressively but less efficiently than Normal-Blackhammer.
p0-26 — tactical AI port. ✅ done 2026-04-18; tactical knob hooks must now land in mc-ai::tactical, not the deleted GDScript executor.

Non-goals

Player-visible difficulty explanation text — that's UI polish, not mechanics.
Algorithm-level differences between tiers (e.g. Easy uses a different AI path). Every tier uses MCTS + heuristic; only the tuning knobs differ.
Game-2 "god-mode" / AI handicap beyond Hard (deferred).

Why this exists

Without measurable difficulty calibration, "pick Hard AI" is a claim the game can't back up. Players will bounce if Easy/Normal/Hard all feel identical. This is the acceptance that proves the difficulty tiers aren't cosmetic labels.

10 KiB Raw Blame History Unescape Escape