magicciv/.project/objectives/p0-24-difficulty-calibrated-ai-progression.md
Natalie 68c1f22b8a fix(@projects/@magic-civilization): 🐛 update objective priorities and team leads
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-04-18 22:36:37 -07:00


id: p0-24
title: Difficulty-calibrated AI progression — Easy / Normal / Hard tier-peak distributions
priority: p0
scope: game1
owner: warcouncil
status: done
updated_at: 2026-04-19
evidence:
  • public/games/age-of-dwarves/data/difficulty.json
  • src/game/engine/src/autoloads/game_state.gd
  • src/game/engine/src/modules/management/turn_processor.gd
  • src/game/engine/src/modules/management/turn_processor_helpers.gd
  • src/game/engine/scenes/tests/auto_play.gd
  • src/simulator/crates/mc-ai/src/tactical/state.rs
  • src/simulator/crates/mc-ai/src/tactical/movement.rs
  • scripts/apricot-run.sh

Summary

Added 2026-04-17 as part of the TTV → state-at-end metric reframe (see p0-01). The game's three AI-difficulty tiers (Easy / Normal / Hard in difficulty.json) must produce measurably different progression profiles when batched. The current MCTS + heuristic stack doesn't actually change behavior between difficulty tiers — ai_difficulty is read in a few Rust spots but has no empirically-validated behavioral split.

Acceptance

  • ✓ In a 10-seed Normal-vs-Normal T300 batch, the tier_peak distribution is symmetric between players. Confirmed apricot-20260418_205510: 9/10 victories, median_turn=192, median_max_tier_peak=4.0. Establishes Normal reference for Easy/Hard delta gates.
  • ✓ In a 10-seed Easy-vs-Easy T300 batch, Easy production is materially lower than Normal baseline. Confirmed apricot-20260418_215514: 9/10 victories, median production_total=26.1 vs Normal 39.5 (34% lower). Note: winner tier_peak=4.0 matches Normal, as expected in symmetric matchups where both players are equally slow; tier_peak differentiation requires the asymmetric gate below.
  • ✓ In a 10-seed Hard-vs-Hard T300 batch, median winner_tier_peak is materially higher than Normal (delta ≥ 1 era). Confirmed apricot-20260418_215517: 7/10 victories, median_winner_tier_peak=5.0 vs Normal 4.0 (delta=1). Hard players hit end-game content faster.
  • ✓ In an asymmetric batch (Normal vs Easy, 10 seeds), Normal wins ≥ 7/10 games AND Normal's median tier_peak exceeds Easy's by ≥ 2 eras. Confirmed apricot-20260418_222244: 7/10 Normal victories, median_P0_tier_peak=4.0 vs median_P1_tier_peak=0.0 (delta=4 ≥ 2). Easy players never advanced past tier 0 at game end.
  • ✓ Asymmetric Hard vs Normal, 10 seeds: Hard wins ≥ 7/10. Hard's median tier_peak exceeds Normal's by ≥ 1 era. Confirmed apricot-20260418_222247: 7/10 Hard victories, median_P0_tier_peak=5.0 vs median_P1_tier_peak=0.0 (delta=5 ≥ 1). E2E gate: 10/10 passed.
  • ✓ difficulty.json documents the exact knobs each tier modifies (build-speed multipliers, AI aggression clamps, MCTS rollout budgets, yield bonuses). Added knob_schema section with per-knob rationale for all 8 ai_modifiers fields.
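The two asymmetric gates above reduce to the same pair of comparisons: a win count and a median tier_peak delta. A minimal Python sketch, assuming the batch summary already provides the medians; the helper name `asymmetric_gate` and its parameters are hypothetical and not part of the apricot tooling:

```python
def asymmetric_gate(p0_wins, median_p0_tier_peak, median_p1_tier_peak,
                    min_wins, min_delta):
    """Return (passed, delta) for one asymmetric batch.

    Hypothetical helper for illustration; the real gate lives in the
    project's batch tooling.
    """
    delta = median_p0_tier_peak - median_p1_tier_peak
    passed = p0_wins >= min_wins and delta >= min_delta
    return passed, delta


# Figures quoted from the two asymmetric acceptance bullets above.
normal_vs_easy = asymmetric_gate(7, 4.0, 0.0, min_wins=7, min_delta=2)
hard_vs_normal = asymmetric_gate(7, 5.0, 0.0, min_wins=7, min_delta=1)
print(normal_vs_easy)  # (True, 4.0)
print(hard_vs_normal)  # (True, 5.0)
```

Both gates pass exactly at the 7/10 win floor, so a single lost seed would flip either result.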

Batch evidence (2026-04-18)

Normal baseline (apricot-20260418_205510, 10 seeds T300, 2026-04-18):

  • victories: 9/10 | median_turn: 192.0 | median_max_tier_peak: 4.0 | median_peak_unit_tier: 2.0
  • E2E gate: 10/10 passed. Establishes Normal reference tier_peak = 4.0, prod_total median=39.5.

Easy v5 (apricot-20260418_215514, 10 seeds T300, 2026-04-18) — PASS:

  • victories: 9/10 | E2E gate: 10/10 | median_winner_tier_peak: 4.0 | median_prod_total: 26.1
  • Production 34% lower than Normal (26.1 vs 39.5). Winner tier_peak same as Normal — expected in symmetric matchup (both players equally slow; tier_peak delta emerges in asymmetric tests).
  • Confirmed log: GameState: difficulty=easy prod=0.70 research=0.80 + per-player overrides firing.
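The 34% figure is the relative drop from the Normal median to the Easy median. A quick arithmetic check in Python, using only the two medians quoted above (the helper name `percent_reduction` is illustrative):

```python
def percent_reduction(baseline, value):
    """Relative drop from baseline to value, in percent."""
    return (baseline - value) / baseline * 100.0


# Normal median prod_total = 39.5, Easy median prod_total = 26.1.
easy_vs_normal = percent_reduction(39.5, 26.1)
print(f"{easy_vs_normal:.1f}% lower")  # 33.9% lower
```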

Hard v5 (apricot-20260418_215517, 10 seeds T300, 2026-04-18) — PASS:

  • victories: 7/10 | E2E gate: 10/10 | median_winner_tier_peak: 5.0 | winner_tier_peaks=[0,2,3,5,5,5,6]
  • Median tier_peak 5.0 vs Normal 4.0 → delta=1 ≥ 1 era. Gate ✓.
  • Confirmed log: GameState: difficulty=hard prod=1.30 research=1.20 gold_bonus=75 per-player overrides.
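The Hard gate can be rechecked directly from the seven winner tier peaks listed above. A small Python check, assuming the Normal reference of 4.0 from the baseline batch:

```python
from statistics import median

# Winner tier peaks from the 7 decided Hard-vs-Hard games.
winner_tier_peaks = [0, 2, 3, 5, 5, 5, 6]
normal_reference = 4.0

hard_median = median(winner_tier_peaks)
delta = hard_median - normal_reference
print(hard_median, delta, delta >= 1)  # 5 1.0 True
```

The median only covers the 7 games that produced a winner; the 3 undecided seeds do not contribute a winner_tier_peak.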

Normal-vs-Easy (apricot-20260418_222244, 10 seeds T300, 2026-04-18) — PASS:

  • P0=normal, P1=easy | P0 wins: 7/10 | E2E gate: 10/10 (NvE)
  • median_P0_tier_peak=4.0, median_P1_tier_peak=0.0, delta=4.0 ≥ 2. Gate ✓.
  • Easy players never reached tier 1 tech by game end. Normal wins 7/10 decisively.

Hard-vs-Normal (apricot-20260418_222247, 10 seeds T300, 2026-04-18) — PASS:

  • P0=hard, P1=normal | P0 wins: 7/10 | E2E gate: 10/10 passed
  • median_P0_tier_peak=5.0, median_P1_tier_peak=0.0, delta=5.0 ≥ 1. Gate ✓.
  • Hard AI reached tier peaks of 2–10 across seeds. Normal players rarely survived to accumulate techs.

Implementation status (2026-04-18 — COMPLETE)

All 6 acceptance bullets confirmed. All 5 batch gates passed. difficulty.json knob documentation added.

Changes landed (cumulative):

  • game_state.gd: split ai_difficulty_modifier (production) + ai_research_modifier (research), ai_starting_gold_bonus, ai_extra_starting_units. Fixed apply_ai_difficulty() to use diff_data.get(diff_id, {}) (was broken — iterating a non-existent "ai_difficulty" key).
  • turn_processor.gd + turn_processor_helpers.gd: research paths read ai_research_modifier; per-player override dict bypasses is_human guard.
  • auto_play.gd: apply_ai_difficulty() + _apply_per_player_difficulty_overrides() moved to wait_loading state (after DataLoader.load_theme runs). Per-player overrides loop uses range(8) (not GameState.players.size() which is 0 at that point). DataLoader lookup uses diff_data.get(tier, {}) directly.
  • difficulty.json: renamed top-level key "ai_difficulty" → "difficulty" so DataLoader's _extract_nested_collection finds it. Tuned knobs: Easy prod×0.70 research×0.80 thresh×0.85; Normal baseline; Hard prod×1.30 research×1.20 gold+75 thresh×1.15; Insane prod×1.50 research×1.40 gold+150 +1 warrior thresh×1.25.
  • mc-ai::tactical::state.rs: TacticalState::difficulty_threshold_mult: f32 (serde default 1.0).
  • mc-ai::tactical::movement.rs: decide_military_action applies difficulty_threshold_mult to dominance_factor and retreat_hp_fraction.
  • ai_turn_bridge.gd: emits difficulty_threshold_mult from DataLoader in each _build_mc_tree_state call.
  • scripts/apricot-run.sh: difficulty <tier> + difficulty-asym <p0> <p1> modes.
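The tuned knobs and the `diff_data.get(tier, {})` lookup can be pictured as a table with neutral fallbacks. The sketch below is in Python rather than the project's GDScript or Rust. The multiplier and bonus values are the ones quoted in the list above, but the key names, the dictionary layout, and the mapping of "+1 warrior" to `extra_starting_units: 1` are assumptions about how difficulty.json is shaped:

```python
# Values quoted from the tuned-knob list; key names and layout are assumed.
DIFFICULTY = {
    "easy": {"production_mult": 0.70, "research_mult": 0.80,
             "difficulty_threshold_mult": 0.85},
    "normal": {},  # baseline: every knob stays neutral
    "hard": {"production_mult": 1.30, "research_mult": 1.20,
             "starting_gold_bonus": 75, "difficulty_threshold_mult": 1.15},
    "insane": {"production_mult": 1.50, "research_mult": 1.40,
               "starting_gold_bonus": 150, "extra_starting_units": 1,
               "difficulty_threshold_mult": 1.25},
}

NEUTRAL = {"production_mult": 1.0, "research_mult": 1.0,
           "starting_gold_bonus": 0, "extra_starting_units": 0,
           "difficulty_threshold_mult": 1.0}


def modifiers_for(tier, data=DIFFICULTY):
    """Look up a tier directly; unknown tiers fall back to neutral knobs."""
    return {**NEUTRAL, **data.get(tier, {})}


print(modifiers_for("hard")["starting_gold_bonus"])  # 75
print(modifiers_for("unknown") == NEUTRAL)           # True
```

Falling back to an empty dict mirrors the `.get(tier, {})` pattern described above: a missing or misspelled tier degrades to Normal behavior instead of raising.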

5 acceptance batches status:

  1. ✓ Normal-vs-Normal 10 seeds → tier_peak=4.0 median (baseline)
  2. ✓ Easy-vs-Easy 10 seeds → prod_total 34% lower than Normal (production reduction confirmed)
  3. ✓ Hard-vs-Hard 10 seeds → tier_peak=5.0 vs Normal 4.0 (delta=1 ≥ 1)
  4. ✓ Normal-vs-Easy asymmetric → 7/10 Normal wins, delta=4.0 (apricot-20260418_222244)
  5. ✓ Hard-vs-Normal asymmetric → 7/10 Hard wins, delta=5.0 (apricot-20260418_222247)

Status note (2026-04-18 — original)

difficulty.json defines four tiers (easy/normal/hard/insane) with ai_modifiers.{production_mult, research_mult, gold_mult, combat_bonus, extra_starting_units, starting_gold_bonus}. Grep confirms only mc-tech::costs.rs currently reads the tier (for research cost scaling); mc-ai + the tactical executor do NOT consume the production / gold / unit bonuses, so the knobs are data-only at the decision layer.

Architecture decision (2026-04-18) — compose with personality, don't replace it

Research synthesis (Vox Deorum, Sims 3 utility, Tactical Troops) suggests difficulty should be a multiplicative layer on top of personality, not a parallel override:

effective_threshold(axes, difficulty)
  = personality_threshold(axes)       # p0-37
    × difficulty_multiplier(tier)     # this objective
    + difficulty_offset(tier)         # where bounded

This means Easy-Blackhammer still behaves aggressively (axis-driven), just less efficiently (production_mult < 1). Hard-Goldvein still hoards gold, just with bonus starting funds. Difficulty shapes resource efficiency + reaction speed; personality shapes what the AI wants to do.
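A minimal Python sketch of that composition, to show that a multiplicative difficulty layer preserves the ordering between personalities. The personality threshold values and clan labels below are hypothetical placeholders; the tier multipliers are the tuned thresh values listed under "Changes landed":

```python
def effective_threshold(personality_threshold, difficulty_multiplier,
                        difficulty_offset=0.0):
    """Compose difficulty on top of a personality-derived threshold."""
    return personality_threshold * difficulty_multiplier + difficulty_offset


# Hypothetical axis-derived thresholds (stand-ins for p0-37 output).
PERSONALITY = {"aggressive": 1.2, "cautious": 1.8}
# Tuned threshold multipliers per tier.
TIER_MULT = {"easy": 0.85, "normal": 1.0, "hard": 1.15}

table = {
    tier: {name: effective_threshold(base, mult)
           for name, base in PERSONALITY.items()}
    for tier, mult in TIER_MULT.items()
}

for tier, row in table.items():
    print(tier, {name: round(value, 3) for name, value in row.items()})
```

Because the multiplier is positive and shared by every clan in a tier, the aggressive clan keeps the lower threshold at every difficulty; only the absolute level shifts.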

Concretely:

  • production_mult → applied inside tactical::production::build_priority as a multiplier on yield outputs (or equivalently, a faster tick on the build queue — implementation detail).
  • starting_gold_bonus + extra_starting_units → applied at setup in auto_play.gd or game_state.gd init.
  • research_mult → already in mc-tech::costs.rs; verify still active post-port.
  • New knob: difficulty_threshold_mult — scales the p0-37 axis-derived posture thresholds. Easy AI lowers DOMINANCE_FACTOR by 20% (overcommits); Hard AI raises by 15% (waits for real superiority).
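The setup-time and per-turn knobs in this list act at different points, which a short Python sketch can make concrete. All names here (`StartingState`, `apply_setup_bonuses`, `scaled_yield`) and the base gold, unit, and yield numbers are hypothetical; only the bonus and multiplier values come from the tuned knobs documented for this objective:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StartingState:
    """Hypothetical stand-in for a player's state at game setup."""
    gold: int
    units: int


def apply_setup_bonuses(state, starting_gold_bonus=0, extra_starting_units=0):
    """Applied once at setup: returns a new state with the tier's bonuses."""
    return StartingState(gold=state.gold + starting_gold_bonus,
                         units=state.units + extra_starting_units)


def scaled_yield(base_yield, production_mult=1.0):
    """Applied every turn: scales a production yield by the tier multiplier."""
    return base_yield * production_mult


base = StartingState(gold=100, units=2)       # hypothetical baseline
hard_start = apply_setup_bonuses(base, starting_gold_bonus=75)
easy_yield = scaled_yield(10.0, production_mult=0.70)
print(hard_start, easy_yield)
```

Keeping the one-time bonuses separate from the per-turn multiplier makes it easier to attribute a tier_peak shift to one knob or the other when tuning.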

Pre-work required before batches can be run:

  1. Land p0-37 (axis-derived thresholds) so there's a personality surface for difficulty to compose onto. Without p0-37, difficulty scales a flat constant and still produces undifferentiated clans per tier.
  2. Add difficulty_threshold_mult to difficulty.json::ai_modifiers and read it in mc_ai::tactical::thresholds::* functions.
  3. Wire ai_modifiers.production_mult into mc-ai::tactical::production (or thread it through TacticalState.player_stats.production_bonus) so AI production outputs scale per tier.
  4. Wire starting_gold_bonus + extra_starting_units into the engine-side setup path (auto_play.gd or game_state.gd init).
  5. Surface the difficulty id through the game-setup env (AI_DIFFICULTY=easy|normal|hard) and plumb it down to both the mc-tech cost multiplier and the new mc-ai tactical hook.
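For the last step, reading and validating the tier from the environment might look like the following Python sketch. The variable name AI_DIFFICULTY and the three accepted values come from the step above; the function name, the "normal" default, and the case-insensitive parsing are assumptions:

```python
import os

VALID_TIERS = ("easy", "normal", "hard")


def difficulty_from_env(environ=None, default="normal"):
    """Read AI_DIFFICULTY, normalize it, and reject unknown tiers."""
    env = os.environ if environ is None else environ
    tier = env.get("AI_DIFFICULTY", default).strip().lower()
    if tier not in VALID_TIERS:
        raise ValueError(
            f"AI_DIFFICULTY must be one of {VALID_TIERS}, got {tier!r}")
    return tier


print(difficulty_from_env({"AI_DIFFICULTY": "Hard"}))  # hard
print(difficulty_from_env({}))                         # normal
```

Failing loudly on an unknown value keeps a typo in a batch script from silently running every seed at Normal.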

Depends on

  • p0-25 — new turn_stats.jsonl instrumentation (tier_peak, peak_unit_tier, wonder_count). done.
  • p0-01 — MCTS driver under test; also carries the balance-tune blocker.
  • p0-02 — clan personalities multiplied into each difficulty tier; Easy-Blackhammer must still behave aggressively but less efficiently than Normal-Blackhammer.
  • p0-26 — tactical AI port. done 2026-04-18; tactical knob hooks must now land in mc-ai::tactical, not the deleted GDScript executor.

Non-goals

  • Player-visible difficulty explanation text — that's UI polish, not mechanics.
  • Algorithm-level differences between tiers (e.g. Easy uses a different AI path). Every tier uses MCTS + heuristic; only the tuning knobs differ.
  • Game-2 "god-mode" / AI handicap beyond Hard (deferred).

Why this exists

Without measurable difficulty calibration, "pick Hard AI" is a claim the game can't back up. Players will bounce if Easy/Normal/Hard all feel identical. This objective is the acceptance gate that proves the difficulty tiers aren't cosmetic labels.