| id | title | priority | scope | owner | status | updated_at | evidence |
|---|---|---|---|---|---|---|---|
| p0-24 | Difficulty-calibrated AI progression: Easy / Normal / Hard tier-peak distributions | p0 | game1 | warcouncil | done | 2026-04-19 | |
Summary
Added 2026-04-17 as part of the TTV → state-at-end metric reframe (see p0-01). The game's three AI-difficulty tiers (Easy / Normal / Hard in difficulty.json) must produce measurably different progression profiles when batched. The current MCTS + heuristic stack doesn't actually change behavior between difficulty tiers — ai_difficulty is read in a few Rust spots but has no empirically-validated behavioral split.
Acceptance
- ✓ In a 10-seed Normal-vs-Normal T300 batch, the tier_peak distribution is symmetric between players. Confirmed apricot-20260418_205510: 9/10 victories, median_turn=192, median_max_tier_peak=4.0. Establishes Normal reference for Easy/Hard delta gates.
- ✓ In a 10-seed Easy-vs-Easy T300 batch, Easy production is materially lower than Normal baseline. Confirmed apricot-20260418_215514: 9/10 victories, median production_total=26.1 vs Normal 39.5 (−34%). Note: winner tier_peak=4.0 matches Normal; this is expected in symmetric matchups where both players are equally slow. tier_peak differentiation requires the asymmetric gate below.
- ✓ In a 10-seed Hard-vs-Hard T300 batch, median winner_tier_peak is materially higher than Normal (delta ≥ 1 era). Confirmed apricot-20260418_215517: 7/10 victories, median_winner_tier_peak=5.0 vs Normal 4.0 (delta=1). Hard players hit end-game content faster.
- ✓ In an asymmetric batch (Normal vs Easy, 10 seeds), Normal wins ≥ 7/10 games AND Normal's median tier_peak exceeds Easy's by ≥ 2 eras. Confirmed apricot-20260418_222244: 7/10 Normal victories, median_P0_tier_peak=4.0 vs median_P1_tier_peak=0.0 (delta=4 ≥ 2). Easy players never advanced past tier 0 at game end.
- ✓ Asymmetric Hard vs Normal, 10 seeds: Hard wins ≥ 7/10. Hard's median tier_peak exceeds Normal's by ≥ 1 era. Confirmed apricot-20260418_222247: 7/10 Hard victories, median_P0_tier_peak=5.0 vs median_P1_tier_peak=0.0 (delta=5 ≥ 1). E2E gate: 10/10 passed.
- ✓ difficulty.json documents the exact knobs each tier modifies (build-speed multipliers, AI aggression clamps, MCTS rollout budgets, yield bonuses). Added knob_schema section with per-knob rationale for all 8 ai_modifiers fields.
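The two asymmetric gates above share one shape: a minimum win count for P0 combined with a minimum median tier_peak delta. A minimal Rust sketch of that check follows. The `AsymmetricBatch` and `Gate` types and their field names are hypothetical, introduced only for illustration; they are not taken from the batch tooling.

```rust
/// Summary of one asymmetric batch (hypothetical type; field names are
/// illustrative, not the real batch output schema).
struct AsymmetricBatch {
    p0_wins: u32,
    median_p0_tier_peak: f64,
    median_p1_tier_peak: f64,
}

/// Thresholds for one acceptance gate (hypothetical type).
struct Gate {
    min_p0_wins: u32,
    min_tier_delta: f64,
}

impl Gate {
    /// The gate passes only when BOTH conditions hold: enough P0 wins
    /// and a large enough median tier_peak gap in P0's favor.
    fn passes(&self, batch: &AsymmetricBatch) -> bool {
        let delta = batch.median_p0_tier_peak - batch.median_p1_tier_peak;
        batch.p0_wins >= self.min_p0_wins && delta >= self.min_tier_delta
    }
}

/// Normal-vs-Easy figures as reported for apricot-20260418_222244.
fn normal_vs_easy() -> AsymmetricBatch {
    AsymmetricBatch { p0_wins: 7, median_p0_tier_peak: 4.0, median_p1_tier_peak: 0.0 }
}

/// Hard-vs-Normal figures as reported for apricot-20260418_222247.
fn hard_vs_normal() -> AsymmetricBatch {
    AsymmetricBatch { p0_wins: 7, median_p0_tier_peak: 5.0, median_p1_tier_peak: 0.0 }
}
```

With the reported numbers, `Gate { min_p0_wins: 7, min_tier_delta: 2.0 }` accepts the Normal-vs-Easy batch and `Gate { min_p0_wins: 7, min_tier_delta: 1.0 }` accepts the Hard-vs-Normal batch. Both sit exactly on the 7/10 win floor, so one fewer win would fail either gate.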
Batch evidence (2026-04-18)
Normal baseline (apricot-20260418_205510, 10 seeds T300, 2026-04-18):
- victories: 9/10 | median_turn: 192.0 | median_max_tier_peak: 4.0 | median_peak_unit_tier: 2.0
- E2E gate: 10/10 passed. Establishes Normal reference tier_peak = 4.0, prod_total median=39.5.
Easy v5 (apricot-20260418_215514, 10 seeds T300, 2026-04-18) — PASS:
- victories: 9/10 | E2E gate: 10/10 | median_winner_tier_peak: 4.0 | median_prod_total: 26.1
- Production 34% lower than Normal (26.1 vs 39.5). Winner tier_peak same as Normal — expected in symmetric matchup (both players equally slow; tier_peak delta emerges in asymmetric tests).
- Confirmed log: GameState: difficulty=easy prod=0.70 research=0.80 + per-player overrides firing.
Hard v5 (apricot-20260418_215517, 10 seeds T300, 2026-04-18) — PASS:
- victories: 7/10 | E2E gate: 10/10 | median_winner_tier_peak: 5.0 | winner_tier_peaks=[0,2,3,5,5,5,6]
- Median tier_peak 5.0 vs Normal 4.0 → delta=1 ≥ 1 era. Gate ✓.
- Confirmed log: GameState: difficulty=hard prod=1.30 research=1.20 gold_bonus=75 per-player overrides.
Normal-vs-Easy (apricot-20260418_222244, 10 seeds T300, 2026-04-18) — PASS:
- P0=normal, P1=easy | P0 wins: 7/10 | E2E gate: 10/10 (NvE)
- median_P0_tier_peak=4.0, median_P1_tier_peak=0.0, delta=4.0 ≥ 2. Gate ✓.
- Easy players never reached tier 1 tech by game end. Normal wins 7/10 decisively.
Hard-vs-Normal (apricot-20260418_222247, 10 seeds T300, 2026-04-18) — PASS:
- P0=hard, P1=normal | P0 wins: 7/10 | E2E gate: 10/10 passed
- median_P0_tier_peak=5.0, median_P1_tier_peak=0.0, delta=5.0 ≥ 1. Gate ✓.
- Hard AI reached tier peaks of 2–10 across seeds. Normal players rarely survived to accumulate techs.
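The symmetric-batch figures above can be re-derived from the raw values quoted in this section. The sketch below recomputes the Hard median from the listed winner_tier_peaks and the Easy production drop from the two medians. The helper names `median` and `relative_change` are illustrative and not part of the batch tooling.

```rust
/// Median of a list of tier peaks; returns None for an empty batch.
fn median(values: &[u32]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_unstable();
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 1 {
        // Odd count: the middle element is the median.
        Some(f64::from(sorted[mid]))
    } else {
        // Even count: average the two middle elements.
        Some((f64::from(sorted[mid - 1]) + f64::from(sorted[mid])) / 2.0)
    }
}

/// Signed relative change of `value` against `baseline` (e.g. -0.34 = 34% lower).
fn relative_change(value: f64, baseline: f64) -> f64 {
    (value - baseline) / baseline
}

/// Winner tier peaks reported for the Hard-vs-Hard batch (7 decided games).
const HARD_WINNER_TIER_PEAKS: [u32; 7] = [0, 2, 3, 5, 5, 5, 6];
```

The seven Hard winner peaks have a middle element of 5, which gives the reported delta of 1 against the Normal reference of 4.0. For production, (26.1 − 39.5) / 39.5 ≈ −0.339, which rounds to the reported −34%.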
Implementation status (2026-04-18 — COMPLETE)
All 6 acceptance bullets confirmed. All 5 batch gates passed. difficulty.json knob documentation added.
Changes landed (cumulative):
- game_state.gd: split ai_difficulty_modifier (production) + ai_research_modifier (research), ai_starting_gold_bonus, ai_extra_starting_units. Fixed apply_ai_difficulty() to use diff_data.get(diff_id, {}) (was broken: it iterated a non-existent "ai_difficulty" key).
- turn_processor.gd + turn_processor_helpers.gd: research paths read ai_research_modifier; per-player override dict bypasses is_human guard.
- auto_play.gd: apply_ai_difficulty() + _apply_per_player_difficulty_overrides() moved to wait_loading state (after DataLoader.load_theme runs). Per-player overrides loop uses range(8) (not GameState.players.size(), which is 0 at that point). DataLoader lookup uses diff_data.get(tier, {}) directly.
- difficulty.json: renamed top-level key "ai_difficulty" → "difficulty" so DataLoader's _extract_nested_collection finds it. Tuned knobs: Easy prod×0.70 research×0.80 thresh×0.85; Normal baseline; Hard prod×1.30 research×1.20 gold+75 thresh×1.15; Insane prod×1.50 research×1.40 gold+150 +1 warrior thresh×1.25.
- mc-ai::tactical::state.rs: TacticalState::difficulty_threshold_mult: f32 (serde default 1.0).
- mc-ai::tactical::movement.rs: decide_military_action applies difficulty_threshold_mult to dominance_factor and retreat_hp_fraction.
- ai_turn_bridge.gd: emits difficulty_threshold_mult from DataLoader in each _build_mc_tree_state call.
- scripts/apricot-run.sh: difficulty <tier> + difficulty-asym <p0> <p1> modes.
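For reference, the tuned knob values listed for difficulty.json can be written as a small Rust lookup table. This is only a sketch: the `Tier` and `AiModifiers` types are hypothetical, the field set is a subset of the eight ai_modifiers fields, and the real values live in difficulty.json rather than in Rust code.

```rust
/// Difficulty tiers named in difficulty.json.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Easy,
    Normal,
    Hard,
    Insane,
}

/// Subset of the per-tier knobs (hypothetical struct; names mirror the
/// ai_modifiers fields mentioned in this document).
#[derive(Debug, Clone, Copy, PartialEq)]
struct AiModifiers {
    production_mult: f32,
    research_mult: f32,
    starting_gold_bonus: u32,
    extra_starting_units: u32,
    difficulty_threshold_mult: f32,
}

impl Default for AiModifiers {
    /// Normal is the neutral baseline: multipliers of 1.0 and no bonuses.
    /// This matches the documented default of 1.0 for the threshold knob.
    fn default() -> Self {
        AiModifiers {
            production_mult: 1.0,
            research_mult: 1.0,
            starting_gold_bonus: 0,
            extra_starting_units: 0,
            difficulty_threshold_mult: 1.0,
        }
    }
}

/// Tuned values as listed in the changelog entry for difficulty.json.
fn tuned_modifiers(tier: Tier) -> AiModifiers {
    let base = AiModifiers::default();
    match tier {
        Tier::Easy => AiModifiers {
            production_mult: 0.70,
            research_mult: 0.80,
            difficulty_threshold_mult: 0.85,
            ..base
        },
        Tier::Normal => base,
        Tier::Hard => AiModifiers {
            production_mult: 1.30,
            research_mult: 1.20,
            starting_gold_bonus: 75,
            difficulty_threshold_mult: 1.15,
            ..base
        },
        Tier::Insane => AiModifiers {
            production_mult: 1.50,
            research_mult: 1.40,
            starting_gold_bonus: 150,
            extra_starting_units: 1, // the "+1 warrior" entry
            difficulty_threshold_mult: 1.25,
        },
    }
}
```

Building every tier from the Normal default keeps the neutral values in one place, so a tier only spells out the knobs it actually changes.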
5 acceptance batches status:
- ✓ Normal-vs-Normal 10 seeds → tier_peak=4.0 median (baseline)
- ✓ Easy-vs-Easy 10 seeds → prod_total 34% lower than Normal (production reduction confirmed)
- ✓ Hard-vs-Hard 10 seeds → tier_peak=5.0 vs Normal 4.0 (delta=1 ≥ 1)
- ✓ Normal-vs-Easy asymmetric → 7/10 Normal wins, delta=4.0 (apricot-20260418_222244)
- ✓ Hard-vs-Normal asymmetric → 7/10 Hard wins, delta=5.0 (apricot-20260418_222247)
Status note (2026-04-18 — original)
difficulty.json defines four tiers (easy/normal/hard/insane) with
ai_modifiers.{production_mult, research_mult, gold_mult, combat_bonus, extra_starting_units, starting_gold_bonus}. Grep confirms only
mc-tech::costs.rs currently reads the tier (for research cost scaling);
mc-ai + the tactical executor do NOT consume the production / gold / unit
bonuses, so the knobs are data-only at the decision layer.
Architecture decision (2026-04-18) — compose with personality, don't replace it
Research synthesis (Vox Deorum, Sims 3 utility, Tactical Troops) suggests difficulty should be a multiplicative layer on top of personality, not a parallel override:
effective_threshold(axes, difficulty)
= personality_threshold(axes) # p0-37
× difficulty_multiplier(tier) # this objective
+ difficulty_offset(tier) # where bounded
This means Easy-Blackhammer still behaves aggressively (axis-driven), just less efficiently (production_mult < 1). Hard-Goldvein still hoards gold, just with bonus starting funds. Difficulty shapes resource efficiency + reaction speed; personality shapes what the AI wants to do.
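The composition rule above can be sketched as a single Rust function. This is an illustration under stated assumptions, not the mc-ai implementation: the function signature is hypothetical, and "where bounded" is interpreted here as clamping the result into a caller-supplied range.

```rust
/// Composes a personality-derived threshold with the difficulty layer:
/// personality × multiplier + offset, clamped into `bounds`.
///
/// Assumptions (not confirmed by the source): the offset is additive after
/// scaling, and the final value is clamped. `bounds` must satisfy
/// min <= max, otherwise `f32::clamp` panics.
fn effective_threshold(
    personality_threshold: f32, // axis-derived value from p0-37
    difficulty_multiplier: f32, // per-tier multiplier (this objective)
    difficulty_offset: f32,     // per-tier additive offset
    bounds: (f32, f32),         // (min, max) allowed for this threshold
) -> f32 {
    let raw = personality_threshold * difficulty_multiplier + difficulty_offset;
    raw.clamp(bounds.0, bounds.1)
}
```

As a usage example with a hypothetical personality threshold of 1.5 and bounds of (0.0, 10.0): the tuned Easy multiplier of 0.85 yields about 1.275 and the tuned Hard multiplier of 1.15 yields about 1.725. Both tiers start from the same personality value, so the clan's preferences are preserved while the commitment point shifts.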
Concretely:
- production_mult → applied inside tactical::production::build_priority as a multiplier on yield outputs (or equivalently, a faster tick on the build queue; this is an implementation detail).
- starting_gold_bonus + extra_starting_units → applied at setup in auto_play.gd or game_state.gd init.
- research_mult → already in mc-tech::costs.rs; verify still active post-port.
- New knob: difficulty_threshold_mult, which scales the p0-37 axis-derived posture thresholds. Easy AI lowers DOMINANCE_FACTOR by 20% (overcommits); Hard AI raises by 15% (waits for real superiority).
Pre-work required before batches can be run:
- Land p0-37 (axis-derived thresholds) so there's a personality surface for difficulty to compose onto. Without p0-37, difficulty scales a flat constant and still produces undifferentiated clans per tier.
- Add difficulty_threshold_mult to difficulty.json::ai_modifiers and read it in mc_ai::tactical::thresholds::* functions.
- Wire ai_modifiers.production_mult into mc-ai::tactical::production (or thread it through TacticalState.player_stats.production_bonus) so AI production outputs scale per tier.
- Wire starting_gold_bonus + extra_starting_units into the engine-side setup path (auto_play.gd or game_state.gd init).
- Surface the difficulty id through the game-setup env (AI_DIFFICULTY=easy|normal|hard) and plumb it down to both the mc-tech cost multiplier and the new mc-ai tactical hook.
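The last pre-work item, reading the tier from AI_DIFFICULTY, could look roughly like the sketch below. Only the variable name and its three values come from this document. The `Difficulty` enum, the fallback to Normal when the variable is unset, and the rejection of any other value are assumptions made for illustration.

```rust
use std::str::FromStr;

/// Tiers accepted through the AI_DIFFICULTY environment variable.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Difficulty {
    Easy,
    Normal,
    Hard,
}

impl FromStr for Difficulty {
    type Err = String;

    /// Parses case-insensitively and ignores surrounding whitespace.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "easy" => Ok(Difficulty::Easy),
            "normal" => Ok(Difficulty::Normal),
            "hard" => Ok(Difficulty::Hard),
            other => Err(format!("unknown AI_DIFFICULTY value: {other:?}")),
        }
    }
}

/// Resolves the tier from an optional raw value.
/// Assumption: an unset variable means Normal (the baseline tier).
fn difficulty_from_value(raw: Option<&str>) -> Result<Difficulty, String> {
    match raw {
        None => Ok(Difficulty::Normal),
        Some(value) => value.parse(),
    }
}

/// Reads AI_DIFFICULTY from the process environment.
fn difficulty_from_env() -> Result<Difficulty, String> {
    difficulty_from_value(std::env::var("AI_DIFFICULTY").ok().as_deref())
}
```

Keeping the parsing in `difficulty_from_value` separate from the environment read makes the mapping testable without mutating process state. The resolved tier can then be handed to both the research-cost path and the tactical hook.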
Depends on
- p0-25: new turn_stats.jsonl instrumentation (tier_peak, peak_unit_tier, wonder_count). ✅ done.
- p0-01: MCTS driver under test; also carries the balance-tune blocker.
- p0-02: clan personalities multiplied into each difficulty tier; Easy-Blackhammer must still behave aggressively but less efficiently than Normal-Blackhammer.
- p0-26: tactical AI port. ✅ done 2026-04-18; tactical knob hooks must now land in mc-ai::tactical, not the deleted GDScript executor.
Non-goals
- Player-visible difficulty explanation text — that's UI polish, not mechanics.
- Algorithm-level differences between tiers (e.g. Easy uses a different AI path). Every tier uses MCTS + heuristic; only the tuning knobs differ.
- Game-2 "god-mode" / AI handicap beyond Hard (deferred).
Why this exists
Without measurable difficulty calibration, "pick Hard AI" is a claim the game can't back up. Players will bounce if Easy/Normal/Hard all feel identical. This is the acceptance that proves the difficulty tiers aren't cosmetic labels.