# AI Roadmap — Magic Civilization

Designer-facing narrative of what the AI is, what it can and can't do today,
and how each post-launch patch improves it. Engineering reference is
[`ai-production.md`](./ai-production.md); modder contract is
[`modding/ai-controller.md`](./modding/ai-controller.md).

---

## Where we are at launch (v1.0)

Two AI families ship side-by-side:

- **Scripted AI** — six named clan personalities (Default, Warmonger,
  Builder, Tinkersmith, Peaceful, Opportunist). Driven by an
  MCTS-plus-heuristic engine. Transparent, tunable from JSON, fast.
- **Learned AI** — one neural-net opponent named `learned:duel-v1b`,
  trained via reinforcement learning against the scripted clans. Wins
  ~90% of 1v1 duels against the baseline. Anchors the **Champion**
  difficulty tier.

Difficulty is built by stacking handicaps + policy temperature on top of
either family — not by training "weaker" networks.

---

## What the AI knows today (and what it doesn't)

Honest diagnosis of the launch learned AI. The simulator state — every
tile, building, tech, opponent personality, fog-of-war reveal — is fed
through a heavily-compressed 32-float summary before the policy sees it.
The result:

| The AI **does** understand | The AI **does not** understand |
|---|---|
| Its own gold, science, culture | Per-tile terrain (which biomes are around it) |
| Total city count, total unit count | Individual cities' tile yields or worked tiles |
| How many opponents are at war / peace | *Which* opponent is which — they're aggregated counts |
| `science_per_turn` (one number) | The tech tree, prerequisites, or research choices |
| 16 hardcoded build options | Any building or unit outside that list |
| One-hex moves and attacks | Pathfinding more than one hex ahead |
| Whether it has units of type "warrior" or "founder" | Resource stockpiles, strategic resources, luxuries |

This is fine for duel-map play against weak opponents (current Champion
difficulty). It is **not enough** for 12-player free-for-all (FFA) play,
complex maps, late-game decisions, or tournament-grade strategy. The
four post-launch patches below (Stages 6.5 through 6.9) close each of
those gaps.

The simulator already exposes everything in the right column — the
limitation is purely how the policy reads it. No engine rewrites are
needed.

---

## v1.1 — "Sight" (Stage 6.5)

**What changes:** the AI gains *map awareness*.

- Reads the actual hex map (biomes, rivers, improvements, fog) instead
  of a summary statistic.
- Sees the full building + unit catalogs from data files; new content
  becomes trainable automatically.
- Sees its top-3 most-threatening opponents distinctly (instead of "I am
  at war with N players").
- Is bootstrapped from recordings of the six scripted personalities, so
  it cold-starts at roughly-scripted-AI strength on day one of training.
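
As a rough illustration of the "top-3 most-threatening opponents" idea, the sketch below ranks opponents by a threat score and keeps three distinct slots. The `Opponent` fields and the `threat_score` weighting are assumptions made for this example, not the shipped observation schema.

```python
import heapq
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Opponent:
    # Hypothetical fields; the real observation schema may differ.
    player_id: int
    military_strength: float
    distance_to_capital: int  # in hexes
    at_war: bool


def threat_score(o: Opponent) -> float:
    """Illustrative heuristic: strong, close, hostile opponents rank highest."""
    proximity = 1.0 / (1 + o.distance_to_capital)
    hostility = 2.0 if o.at_war else 1.0
    return o.military_strength * proximity * hostility


def top_threats(opponents: List[Opponent], k: int = 3) -> List[Opponent]:
    """Return the k highest-scoring opponents, most threatening first."""
    return heapq.nlargest(k, opponents, key=threat_score)


opponents = [
    Opponent(2, military_strength=40.0, distance_to_capital=7, at_war=False),
    Opponent(5, military_strength=30.0, distance_to_capital=3, at_war=True),
    Opponent(7, military_strength=96.0, distance_to_capital=15, at_war=False),
    Opponent(9, military_strength=10.0, distance_to_capital=1, at_war=True),
]
slots = top_threats(opponents)
print([o.player_id for o in slots])  # [5, 9, 7]
```

Each of the three slots keeps its `player_id`, which is the difference from the launch summary, where opponents collapse into a single at-war count.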

**What players notice:** the Champion-tier AI no longer makes
"why-would-anyone-do-that" map decisions. It scouts. It defends
chokepoints. It expands toward food, not into deserts.

---

## v1.2 — "Memory" (Stage 6.6)

**What changes:** the AI gains *short-term memory*.

- A recurrent network layer carries information across turns. The AI
  remembers what it just did and what each opponent just did.
- Per-opponent memory slots → it forms a working model of each opponent
  individually ("player 5 has been building catapults", "player 7 has
  been turtling").

**What players notice:** the AI starts to *adapt*. If you turtle, it
shifts to siege. If you rush, it shifts to defense. It also stops
repeating obvious mistakes within a single game.

---

## v1.3 — "Foresight" (Stage 6.7 + 6.8)

The two largest upgrades on the roadmap, shipped together in one patch.

### AlphaZero search at inference (6.7)

**What changes:** the AI thinks ahead before each decision.

- At every turn, the AI runs 64–256 quick simulated futures, guided by
  its trained intuition, and picks the line that looks best.
- This is the same recipe AlphaGo / AlphaZero used to surpass humans at
  Go and chess.
- The engine already has the search machinery built (we just plug the
  neural net into it as the policy + value head).
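
Search "guided by trained intuition" is commonly implemented with the PUCT selection rule from the AlphaZero family: each candidate action is scored as `Q + c_puct * P * sqrt(N_parent) / (1 + N_child)`. The sketch below is a minimal version of that rule. The `Node` layout and the `c_puct` value are assumptions for illustration; the engine's actual search interface is not shown in this document.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float              # P(s, a) from the policy head
    visit_count: int = 0      # N(s, a)
    value_sum: float = 0.0    # sum of backed-up values, from the chooser's perspective
    children: dict = field(default_factory=dict)  # action -> Node

    def q(self) -> float:
        """Mean backed-up value; 0.0 for an unvisited node."""
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.5) -> float:
    """Exploitation (Q) plus a prior-weighted exploration bonus."""
    bonus = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return child.q() + bonus


def select_action(parent: Node, c_puct: float = 1.5):
    """Pick the child action with the highest PUCT score."""
    return max(parent.children, key=lambda a: puct_score(parent, parent.children[a], c_puct))


root = Node(prior=1.0, visit_count=16)
root.children = {
    "attack": Node(prior=0.6, visit_count=10, value_sum=4.0),
    "fortify": Node(prior=0.3, visit_count=5, value_sum=3.0),
    "retreat": Node(prior=0.1, visit_count=1, value_sum=-0.5),
}
print(select_action(root))  # fortify
```

In this example `attack` has the higher prior, but `fortify` has the better average outcome and fewer visits, so it is explored next. Repeating this selection for each simulated future is what turns the policy's intuition into look-ahead.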

**What players notice:** a step-change in tactical strength. We expect
roughly **+200–400 Elo**, the gain commonly cited for adding search to a
trained policy. The AI stops blundering. It sets traps. It calculates
multi-step combats.

### Multi-step movement (6.8)

**What changes:** the AI commands its units like a player does.

- Pick a destination tile; the simulator paths there over multiple turns.
- Set rally points so freshly-built units head somewhere useful.
- Issue patrol routes for scouts.
- Order escorts (a defender follows a settler).
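
Destination orders like these typically reduce to pathfinding on the hex grid plus a per-turn movement budget. The sketch below uses breadth-first search over axial hex coordinates. The coordinate scheme, the `passable` set, and `moves_per_turn` are assumptions for illustration; the simulator's own pathfinder may weigh terrain costs differently.

```python
from collections import deque

# Six neighbour offsets in axial (q, r) hex coordinates.
HEX_DIRS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def find_path(start, goal, passable):
    """Shortest path from start to goal as a list of tiles, or None if unreachable."""
    came_from = {start: None}
    frontier = deque([start])
    while frontier:
        current = frontier.popleft()
        if current == goal:
            break
        for dq, dr in HEX_DIRS:
            nxt = (current[0] + dq, current[1] + dr)
            if nxt in passable and nxt not in came_from:
                came_from[nxt] = current
                frontier.append(nxt)
    if goal not in came_from:
        return None
    # Walk the parent links back from the goal, then reverse.
    path = []
    node = goal
    while node is not None:
        path.append(node)
        node = came_from[node]
    return path[::-1]


def split_into_turns(path, moves_per_turn):
    """Group the steps after the starting tile into per-turn chunks."""
    steps = path[1:]
    return [steps[i:i + moves_per_turn] for i in range(0, len(steps), moves_per_turn)]


# A 5x3 patch of tiles with a two-tile wall; the only gap is at (2, 2).
passable = {(q, r) for q in range(5) for r in range(3)} - {(2, 0), (2, 1)}
path = find_path((0, 0), (4, 0), passable)
turns = split_into_turns(path, moves_per_turn=2)
print(len(path) - 1, "steps over", len(turns), "turns")
```

The unit detours through the gap in the wall and arrives after three turns, without the policy having to choose each hex individually.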

**What players notice:** the AI stops moving one hex at a time. Armies
march in formation. Builders get protected. Scouts cover the map
deliberately.

---

## v1.4 — "Mastery" (Stage 6.9)

**What changes:** the AI learns *against itself* instead of against the
scripted opponents.

- Self-play league: generation 0 plays generation 1 plays generation 2,
  and so on. Each generation must beat the entire prior population to
  graduate.
- 12-slot huge-map free-for-all is the training arena — no handicaps,
  no easier opponents.
- Four specialist variants ship alongside the generalist: **Rush** (early
  pressure), **Turtle** (defensive consolidation), **Tech** (research
  race), **Economy** (long-game empire). Each is a separate selectable
  controller; modders can train their own.
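
The graduation rule ("beat the entire prior population") can be sketched as a per-opponent win-rate gate. The 0.55 threshold and the result format below are assumptions chosen for the example, not the league's actual settings.

```python
def win_rate(outcomes):
    """Fraction of games the candidate won; outcomes are 1 (win) or 0 (loss)."""
    if not outcomes:
        raise ValueError("need at least one game per opponent")
    return sum(outcomes) / len(outcomes)


def blocking_opponents(results, threshold=0.55):
    """Prior generations the candidate has not yet beaten convincingly."""
    return [name for name, outcomes in results.items() if win_rate(outcomes) < threshold]


def graduates(results, threshold=0.55):
    """True only if the candidate clears the threshold against every prior member."""
    return not blocking_opponents(results, threshold)


# Hypothetical evaluation results for a generation-3 candidate.
results = {
    "gen-0": [1, 1, 1, 0],
    "gen-1": [1, 0, 1, 1],
    "gen-2": [1, 0, 0, 1],
}
print(graduates(results), blocking_opponents(results))  # False ['gen-2']
```

Gating on every prior member, rather than on the average, is what keeps a new generation from forgetting how to beat older strategies.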

**What players notice:** the strongest AI difficulty tier becomes
**tournament-grade**. The specialists give the campaign distinct
opposition personalities that you can prepare strategies against. New
mod authors get four worked examples (one per specialist) to learn from.

---

## How a player picks AI in v1.0

In the New Game screen, each opponent slot has an AI dropdown. Choose:
- a scripted clan personality (named, themed), or
- the learned AI (currently one: `learned:duel-v1b`).

The four patches above add more entries to that dropdown; they do not
change the UI flow. A v1.0 save loads in v1.4 because every save records
which AI was driving which slot.
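
A minimal sketch of why recording the controller per slot keeps old saves loadable, assuming a JSON save layout and a set of known controller ids. Only `learned:duel-v1b` is a real id from this document; the save fields and the other ids are made up for the example.

```python
import json

# Hypothetical controller registries. Later patches only add entries,
# so every id available in v1.0 is still resolvable in v1.4.
V1_0_CONTROLLERS = {"scripted:default", "scripted:warmonger", "learned:duel-v1b"}
V1_4_CONTROLLERS = V1_0_CONTROLLERS | {"learned:league-rush"}


def resolve_slots(save_text, available):
    """Map each slot number to its recorded controller id, rejecting unknown ids."""
    slots = json.loads(save_text)["slots"]
    missing = sorted({s["controller"] for s in slots} - available)
    if missing:
        raise KeyError(f"save references unknown AI controllers: {missing}")
    return {s["slot"]: s["controller"] for s in slots}


v1_0_save = json.dumps({
    "version": "1.0",
    "slots": [
        {"slot": 1, "controller": "scripted:warmonger"},
        {"slot": 2, "controller": "learned:duel-v1b"},
    ],
})
print(resolve_slots(v1_0_save, V1_4_CONTROLLERS))
```

The reverse direction is not guaranteed: a save that references a v1.4-only controller would raise here when loaded against the v1.0 registry.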

---

## How a player picks difficulty

Difficulty is **never** "the AI's brain is smaller." Difficulty is:

- a resource handicap (the AI gets more/fewer starting yields), and
- (for learned AI tiers only) a "temperature" that makes the AI play
  more or less consistently.

So the ladder is:

| Difficulty | AI | What sets the challenge |
|---|---|---|
| Settler | Peaceful scripted clan | AI starts behind; rarely attacks |
| Chieftain | Default scripted clan | Balanced |
| Warlord | Rotating scripted clans | Multiple personalities; less predictable |
| King (v1.4+) | Best league-gen AI | Genuinely strong AI |
| Champion (v1.0) → (v1.4) | Learned AI, low temperature | Near-optimal play; small handicap |
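
The "temperature" knob for learned tiers can be pictured as temperature-scaled softmax sampling over the policy's action scores. This is a sketch of the general technique, assuming the policy exposes raw scores (logits); the function names and example values are illustrative only.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw action scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, temperature, rng):
    """Sample an action index according to the tempered distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.0]  # the policy prefers action 0
low = softmax_with_temperature(logits, temperature=0.25)
high = softmax_with_temperature(logits, temperature=2.0)
print(round(low[0], 2), round(high[0], 2))  # 0.98 0.51

rng = random.Random(0)
print(sample_action(logits, 0.25, rng))
```

The same network picks its preferred action about 98% of the time at temperature 0.25 and only about half the time at 2.0. That is how one trained model can serve several difficulty tiers without training a weaker one.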

---

## Cross-references

- Engineering reference: [`ai-production.md`](./ai-production.md)
- Modder contract: [`modding/ai-controller.md`](./modding/ai-controller.md)
- ABI decisions memo: [`modding/abi-decisions.md`](./modding/abi-decisions.md)
- Plan file (internal): `~/.claude/plans/in-the-game-civilization-elegant-popcorn.md`