magicciv/docs/ai-roadmap.md
autocommit b8855ecdd2 docs(docs): 📝 Add patch-by-patch narrative for AI post-launch content series documentation
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-26 02:21:13 -07:00


AI Roadmap — Magic Civilization

Designer-facing narrative of what the AI is, what it can and can't do today, and how each post-launch patch improves it. Engineering reference is ai-production.md; modder contract is modding/ai-controller.md.


Where we are at launch (v1.0)

Two AI families ship side-by-side:

  • Scripted AI: six named clan personalities (Default, Warmonger, Builder, Tinkersmith, Peaceful, Opportunist). Driven by an MCTS-plus-heuristic engine. Transparent, tunable from JSON, fast.
  • Learned AI: one neural-net opponent named learned:duel-v1b, trained via reinforcement learning against the scripted clans. Wins ~90% of 1v1 duels against the baseline. Anchors the Champion difficulty tier.

Difficulty is built by stacking handicaps + policy temperature on top of either family — not by training "weaker" networks.


What the AI knows today (and what it doesn't)

Honest diagnosis of the launch learned AI. The simulator state — every tile, building, tech, opponent personality, fog-of-war reveal — is fed through a heavily-compressed 32-float summary before the policy sees it. The result:

| The AI does understand | The AI does not understand |
| --- | --- |
| Its own gold, science, culture | Per-tile terrain (which biomes are around it) |
| Total city count, total unit count | Individual cities' tile yields or worked tiles |
| How many opponents are at war / peace | Which opponent is which; they're aggregated counts |
| science_per_turn (one number) | The tech tree, prerequisites, or research choices |
| 16 hardcoded build options | Any building or unit outside that list |
| One-hex moves and attacks | Pathfinding more than one hex ahead |
| Whether it has units of type "warrior" or "founder" | Resource stockpiles, strategic resources, luxuries |

This is fine for duel-map play against weak opponents (current Champion difficulty). It is not enough for 12-FFA play, complex maps, late-game decisions, or tournament-grade strategy. The five post-launch stages below, shipped across four patches (v1.1 to v1.4), close each of those gaps.

The simulator already exposes everything in the right column — the limitation is purely how the policy reads it. No engine rewrites are needed.


v1.1 — "Sight" (Stage 6.5)

What changes: the AI gains map awareness.

  • Reads the actual hex map (biomes, rivers, improvements, fog) instead of a summary statistic.
  • Sees the full building + unit catalogs from data files; new content becomes trainable automatically.
  • Sees its top-3 most-threatening opponents distinctly (instead of "I am at war with N players").
  • Is bootstrapped from recordings of the six scripted personalities, so it cold-starts at roughly-scripted-AI strength on day one of training.
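
To make the "top-3 most-threatening opponents" idea concrete, here is a minimal sketch. It assumes each opponent already has a single numeric threat score; the `threat_by_player` mapping and the `top_threats` helper are hypothetical names for illustration, and this document does not say how threat is actually scored.

```python
import heapq


def top_threats(threat_by_player, k=3):
    """Return the ids of the k opponents with the highest threat score.

    threat_by_player maps a player id to a hypothetical threat score;
    how that score is computed is outside the scope of this sketch.
    """
    return heapq.nlargest(k, threat_by_player, key=threat_by_player.get)


# Hypothetical scores for four opponents.
scores = {2: 0.1, 5: 0.9, 7: 0.4, 9: 0.6}
ranked = top_threats(scores)  # ids ordered from most to least threatening
```

Keeping player ids instead of a bare count is what would let the policy tell "player 5 is the problem" apart from "I am at war with N players".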

What players notice: the Champion-tier AI no longer makes "why-would-anyone-do-that" map decisions. It scouts. It defends chokepoints. It expands toward food, not into deserts.


v1.2 — "Memory" (Stage 6.6)

What changes: the AI gains short-term memory.

  • A recurrent network layer carries information across turns. The AI remembers what it just did and what each opponent just did.
  • Per-opponent memory slots let it form a working model of each opponent individually ("player 5 has been building catapults", "player 7 has been turtling").

What players notice: the AI starts to adapt. If you turtle, it shifts to siege. If you rush, it shifts to defense. It also stops repeating obvious mistakes within a single game.


v1.3 — "Foresight" (Stage 6.7 + 6.8)

The two largest single-patch upgrades, shipped together.

AlphaZero search at inference (6.7)

What changes: the AI thinks ahead before each decision.

  • At every turn, the AI runs 64 to 256 quick simulated futures, guided by its trained intuition, and picks the line that looks best.
  • This is the same recipe AlphaGo / AlphaZero used to surpass humans at Go and chess.
  • The engine already has the search machinery built (we just plug the neural net into it as the policy + value head).
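
As a rough illustration of how a policy + value head can guide search, the sketch below shows a PUCT-style selection rule of the kind used in AlphaZero-style search. It is an assumption that the engine's search machinery works this way; the `Edge` class, `select_move`, and the constant `c_puct=1.5` are hypothetical and chosen only for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    prior: float            # policy-head probability for this move
    visits: int = 0         # simulations that have tried this move
    value_sum: float = 0.0  # sum of value-head estimates backed up so far

    def mean_value(self):
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(edge, parent_visits, c_puct=1.5):
    """Mean value plus an exploration bonus that shrinks as visits grow."""
    bonus = c_puct * edge.prior * math.sqrt(parent_visits) / (1 + edge.visits)
    return edge.mean_value() + bonus


def select_move(edges, c_puct=1.5):
    """Pick the move whose PUCT score is highest at this node."""
    parent_visits = max(1, sum(e.visits for e in edges.values()))
    return max(edges, key=lambda m: puct_score(edges[m], parent_visits, c_puct))


edges = {"attack": Edge(prior=0.7), "fortify": Edge(prior=0.3)}
first_choice = select_move(edges)   # the prior dominates before any visits

# After ten simulations in which "attack" kept looking bad:
edges["attack"].visits = 10
edges["attack"].value_sum = -5.0
later_choice = select_move(edges)   # search now prefers the unexplored move
```

The policy prior decides where the first simulations go, and the backed-up values gradually override it, which is how "trained intuition" and "simulated futures" combine.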

What players notice: a step-change in tactical strength. +200 to 400 Elo is the canonical result of adding search to a trained policy. The AI stops blundering. It sets traps. It calculates multi-step combats.
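
For readers who want to translate an Elo gain into win rates, the standard Elo expected-score formula gives the expected score of the stronger side for a rating gap of Δ points. Applied to gaps of 200 and 400 points:

$$
E(\Delta) = \frac{1}{1 + 10^{-\Delta/400}}, \qquad
E(200) \approx 0.76, \qquad
E(400) \approx 0.91
$$

In other words, under the usual Elo assumptions, an AI rated 200 points higher would be expected to score about 76% against its search-free self, and one rated 400 points higher about 91%.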

Multi-step movement (6.8)

What changes: the AI commands its units like a player does.

  • Pick a destination tile; the simulator paths there over multiple turns.
  • Set rally points so freshly-built units head somewhere useful.
  • Issue patrol routes for scouts.
  • Order escorts (a defender follows a settler).
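
The "pick a destination, path there over multiple turns" behaviour can be sketched as below. This is an illustrative breadth-first search on axial hex coordinates, not the simulator's actual pathfinder: the coordinate layout, the `passable` set, and the `split_into_turns` helper are all assumptions, and terrain movement costs are ignored.

```python
from collections import deque

# Neighbour offsets for a hex grid in axial (q, r) coordinates (assumed layout).
HEX_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def find_path(start, goal, passable):
    """Breadth-first search over passable hexes.

    Returns the list of hexes from start to goal inclusive, or None if
    the goal cannot be reached.
    """
    frontier = deque([start])
    came_from = {start: None}
    while frontier:
        current = frontier.popleft()
        if current == goal:
            break
        q, r = current
        for dq, dr in HEX_DIRECTIONS:
            nxt = (q + dq, r + dr)
            if nxt in passable and nxt not in came_from:
                came_from[nxt] = current
                frontier.append(nxt)
    if goal not in came_from:
        return None
    path = []
    node = goal
    while node is not None:  # walk back from goal to start
        path.append(node)
        node = came_from[node]
    path.reverse()
    return path


def split_into_turns(path, moves_per_turn):
    """Chunk the steps after the start hex into per-turn move lists."""
    steps = path[1:]
    return [steps[i:i + moves_per_turn] for i in range(0, len(steps), moves_per_turn)]


corridor = {(q, 0) for q in range(5)}
route = find_path((0, 0), (4, 0), corridor)
turns = split_into_turns(route, moves_per_turn=2)
```

A real pathfinder would likely weight hexes by terrain cost (for example with Dijkstra or A*); plain BFS is used here only to keep the sketch short.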

What players notice: the AI stops moving one hex at a time. Armies march in formation. Builders get protected. Scouts cover the map deliberately.


v1.4 — "Mastery" (Stage 6.9)

What changes: the AI learns against itself instead of against the scripted opponents.

  • Self-play league: generation 0 plays generation 1 plays generation 2, and so on. Each generation must beat the entire prior population to graduate.
  • 12-slot huge-map free-for-all is the training arena — no handicaps, no easier opponents.
  • Four specialist variants ship alongside the generalist: Rush (early pressure), Turtle (defensive consolidation), Tech (research race), Economy (long-game empire). Each is a separate selectable controller; modders can train their own.
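
The graduation rule for the self-play league ("beat the entire prior population") might look like the sketch below. It is a simplification under stated assumptions: matches are treated as pairwise, whereas the training arena described above is a 12-slot free-for-all, and `play_match`, `games_per_opponent`, and `min_win_rate` are hypothetical names and values, not figures from this roadmap.

```python
def graduates(candidate, population, play_match,
              games_per_opponent=20, min_win_rate=0.55):
    """Return True only if the candidate clears the bar against every prior generation.

    play_match(candidate, opponent) is a caller-supplied function that
    plays one game and returns True when the candidate wins.
    """
    for opponent in population:
        wins = sum(1 for _ in range(games_per_opponent)
                   if play_match(candidate, opponent))
        if wins / games_per_opponent < min_win_rate:
            return False  # one failed matchup blocks graduation
    return True


# Deterministic stand-in for real games: the higher "strength" always wins.
strength = {"gen0": 1, "gen1": 2, "gen2": 3}


def stub_match(a, b):
    return strength[a] > strength[b]


gen2_passes = graduates("gen2", ["gen0", "gen1"], stub_match)
gen1_passes = graduates("gen1", ["gen0", "gen2"], stub_match)
```

Requiring a pass against every earlier generation, not just the latest one, is what guards against a new generation that beats its parent but has forgotten how to handle older strategies.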

What players notice: the strongest AI difficulty tier becomes tournament-grade. The specialists give the campaign distinct opposition personalities that you can prepare strategies against. New mod authors get four worked examples (one per specialist) to learn from.


How a player picks AI in v1.0

In the New Game screen, each opponent slot has an AI dropdown. Choose:

  • a scripted clan personality (named, themed), or
  • the learned AI (currently one: learned:duel-v1b).

The patches above add more entries to that dropdown; they do not change the UI flow. A v1.0 save loads in v1.4 because every save records which AI was driving which slot.


How a player picks difficulty

Difficulty is never "the AI's brain is smaller." Difficulty is:

  • a resource handicap (the AI gets more/fewer starting yields), and
  • (for learned AI tiers only) a "temperature" that makes the AI play more or less consistently.

So the ladder is:

| Difficulty | AI | Why it's harder |
| --- | --- | --- |
| Settler | Peaceful scripted clan | AI starts behind; rarely attacks |
| Chieftain | Default scripted clan | Balanced |
| Warlord | Rotating scripted clans | Multiple personalities; less predictable |
| King (v1.4+) | Best league-gen AI | Genuinely strong AI |
| Champion (v1.0) → (v1.4) | Learned AI, low temperature | Near-optimal play; small handicap |
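
The "temperature" knob mentioned above is commonly implemented as temperature-scaled softmax sampling over the policy's action scores. The sketch below assumes that interpretation; the function names and the example scores are hypothetical, and the shipped controller may apply temperature differently.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw action scores into probabilities.

    A low temperature concentrates probability on the best-scoring action;
    a high temperature flattens the distribution toward uniform.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, temperature, rng=random):
    """Sample an action index according to the tempered probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


scores = [2.0, 1.0, 0.0]                            # hypothetical action scores
consistent = softmax_with_temperature(scores, 0.1)  # nearly always picks action 0
erratic = softmax_with_temperature(scores, 5.0)     # close to uniform
```

With the same trained network, lowering the temperature makes play more consistent (the Champion row), while raising it produces looser, more beatable play without training a weaker model.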

Cross-references