magicciv/docs/ai-roadmap.md
autocommit b8855ecdd2 docs(docs): 📝 Add patch-by-patch narrative for AI post-launch content series documentation
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-26 02:21:13 -07:00

# AI Roadmap — Magic Civilization
Designer-facing narrative of what the AI is, what it can and can't do today,
and how each post-launch patch improves it. Engineering reference is
[`ai-production.md`](./ai-production.md); modder contract is
[`modding/ai-controller.md`](./modding/ai-controller.md).
---
## Where we are at launch (v1.0)
Two AI families ship side-by-side:
- **Scripted AI** — six named clan personalities (Default, Warmonger,
Builder, Tinkersmith, Peaceful, Opportunist). Driven by an
MCTS-plus-heuristic engine. Transparent, tunable from JSON, fast.
- **Learned AI** — one neural-net opponent named `learned:duel-v1b`,
trained via reinforcement learning against the scripted clans. Wins
~90% of 1v1 duels against the baseline. Anchors the **Champion**
difficulty tier.
Difficulty is built by stacking handicaps + policy temperature on top of
either family — not by training "weaker" networks.
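
A minimal sketch of how a policy temperature changes play, assuming the learned AI samples actions from a softmax over its action scores. The function names and example scores are hypothetical, not taken from the engine.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw action scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, temperature, rng=random):
    """Sample one action index from the temperature-adjusted distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical scores for three candidate actions.
logits = [2.0, 1.0, 0.0]
low_t = softmax_with_temperature(logits, 0.25)   # near-greedy: consistent play
high_t = softmax_with_temperature(logits, 4.0)   # flatter: more erratic play
```

With the same network, a low temperature puts almost all probability on the top-scored action, while a high temperature spreads it across weaker moves. That is why one trained network can serve several difficulty tiers.
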
---
## What the AI knows today (and what it doesn't)
An honest diagnosis of the launch learned AI: the simulator state (every
tile, building, tech, opponent personality, and fog-of-war reveal) is
compressed into a 32-float summary before the policy sees it. The result:
| The AI **does** understand | The AI **does not** understand |
|---|---|
| Its own gold, science, culture | Per-tile terrain (which biomes are around it) |
| Total city count, total unit count | Individual cities' tile yields or worked tiles |
| How many opponents are at war / peace | *Which* opponent is which — they're aggregated counts |
| `science_per_turn` (one number) | The tech tree, prerequisites, or research choices |
| 16 hardcoded build options | Any building or unit outside that list |
| One-hex moves and attacks | Pathfinding more than one hex ahead |
| Whether it has units of type "warrior" or "founder" | Resource stockpiles, strategic resources, luxuries |
This is fine for duel-map play against weak opponents (current Champion
difficulty). It is **not enough** for 12-player free-for-all (FFA) play,
complex maps, late-game decisions, or tournament-grade strategy. The four
post-launch patches below (five engineering stages, 6.5 through 6.9) close
each of those gaps.
The simulator already exposes everything in the right column — the
limitation is purely how the policy reads it. No engine rewrites are
needed.
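
To illustrate why aggregation loses information, here is a sketch of a summary encoder. `PlayerView`, its fields, and the slot order are hypothetical; only the 32-float size and the kinds of aggregate listed in the table come from this document.

```python
from dataclasses import dataclass, field

SUMMARY_SIZE = 32  # the launch policy reads a 32-float summary


@dataclass
class PlayerView:
    """Hypothetical slice of simulator state for one player (names are illustrative)."""
    gold: float
    science_per_turn: float
    culture: float
    city_count: int
    unit_count: int
    at_war_with: set = field(default_factory=set)    # opponent ids
    at_peace_with: set = field(default_factory=set)  # opponent ids


def summarize(view):
    """Compress the view into a fixed-length vector of aggregate scalars."""
    features = [
        view.gold,
        view.science_per_turn,
        view.culture,
        float(view.city_count),
        float(view.unit_count),
        float(len(view.at_war_with)),    # a count only: identities are dropped
        float(len(view.at_peace_with)),
    ]
    # Pad the remaining slots; the real layout of the 32 floats is not shown here.
    return features + [0.0] * (SUMMARY_SIZE - len(features))


# Two situations that differ only in *which* opponent is hostile.
a = summarize(PlayerView(120, 8, 3, 2, 5, at_war_with={5}, at_peace_with={7}))
b = summarize(PlayerView(120, 8, 3, 2, 5, at_war_with={7}, at_peace_with={5}))
```

Here `a == b`: the policy cannot tell the two situations apart, which is the "aggregated counts" limitation in the table above.
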
---
## v1.1 — "Sight" (Stage 6.5)
**What changes:** the AI gains *map awareness*.
- Reads the actual hex map (biomes, rivers, improvements, fog) instead
of a summary statistic.
- Sees the full building + unit catalogs from data files; new content
becomes trainable automatically.
- Sees its top-3 most-threatening opponents distinctly (instead of "I am
at war with N players").
- Is bootstrapped from recordings of the six scripted personalities, so
on day one of training it already plays at roughly scripted-AI strength.
**What players notice:** the Champion-tier AI no longer makes
"why-would-anyone-do-that" map decisions. It scouts. It defends
chokepoints. It expands toward food, not into deserts.
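
One way to picture the "top-3 most-threatening opponents" input is a ranking over per-opponent threat scores. This is a sketch under assumptions: how threat is scored is not specified here, and the scores and player ids below are made up for illustration.

```python
import heapq


def top_threats(threat_by_opponent, k=3):
    """Return the k opponent ids with the highest threat score, strongest first."""
    return heapq.nlargest(k, threat_by_opponent, key=threat_by_opponent.get)


# Hypothetical threat scores keyed by player id.
threats = {2: 0.10, 5: 0.85, 7: 0.40, 9: 0.65, 11: 0.05}
slots = top_threats(threats)
```

Each id in `slots` would then get its own block of features in the observation, so the policy sees distinct opponents rather than a single war count.
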
---
## v1.2 — "Memory" (Stage 6.6)
**What changes:** the AI gains *short-term memory*.
- A recurrent network layer carries information across turns. The AI
remembers what it just did and what each opponent just did.
- Per-opponent memory slots let it form a working model of each opponent
individually ("player 5 has been building catapults", "player 7 has
been turtling").
**What players notice:** the AI starts to *adapt*. If you turtle, it
shifts to siege. If you rush, it shifts to defense. It also stops
repeating obvious mistakes within a single game.
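
The idea of a recurrent layer with per-opponent slots can be sketched with a tiny Elman-style cell in plain Python. The weights, the two-feature observation, and the player ids are illustrative assumptions; the actual architecture is not described in this document.

```python
import math


def rnn_step(hidden, observation, w_hidden, w_input):
    """One Elman-style update: h' = tanh(W_h h + W_x x)."""
    new_hidden = []
    for row_h, row_x in zip(w_hidden, w_input):
        pre = sum(w * h for w, h in zip(row_h, hidden))
        pre += sum(w * x for w, x in zip(row_x, observation))
        new_hidden.append(math.tanh(pre))
    return new_hidden


# Illustrative fixed weights; a trained network would learn these.
W_HIDDEN = [[0.9, 0.0], [0.0, 0.9]]
W_INPUT = [[1.0, 0.0], [0.0, 1.0]]

# One memory slot per opponent id. Observation = [built_siege, stayed_home].
memory = {5: [0.0, 0.0], 7: [0.0, 0.0]}
turns = [
    {5: [1.0, 0.0], 7: [0.0, 1.0]},
    {5: [1.0, 0.0], 7: [0.0, 1.0]},
    {5: [0.0, 0.0], 7: [0.0, 1.0]},
]
for observed in turns:
    for player, obs in observed.items():
        memory[player] = rnn_step(memory[player], obs, W_HIDDEN, W_INPUT)
```

After three turns, player 5's slot still carries a fading trace of the siege builds even though the last turn showed none, while player 7's slot has saturated toward "turtling". The policy can condition on those slots instead of only the current turn.
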
---
## v1.3 — "Foresight" (Stage 6.7 + 6.8)
The two largest single-patch upgrades, shipped together.
### AlphaZero search at inference (6.7)
**What changes:** the AI thinks ahead before each decision.
- At every turn, the AI runs 64 to 256 quick simulated futures, guided by
its trained intuition, and picks the line that looks best.
- This is the same recipe AlphaGo / AlphaZero used to surpass humans at
Go and chess.
- The engine already has the search machinery built (we just plug the
neural net into it as the policy + value head).
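
For readers who want the mechanics, AlphaZero-style search picks which future to explore next with the PUCT rule: the value estimate so far plus an exploration bonus weighted by the policy prior. The sketch below shows that rule in isolation; the `Child` class, the action names, and the `c_puct` constant are assumptions, not the engine's API.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    prior: float            # policy head's probability for this action
    visits: int = 0         # how many simulations went through this action
    value_sum: float = 0.0  # sum of value-head estimates from those simulations

    @property
    def q(self):
        """Mean value so far; 0.0 for an unvisited action."""
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(child, parent_visits, c_puct=1.5):
    """Q plus an exploration bonus that shrinks as the action gets visited."""
    bonus = c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visits)
    return child.q + bonus


def select_action(children, c_puct=1.5):
    """Pick the action to simulate next."""
    parent_visits = sum(c.visits for c in children.values())
    return max(children, key=lambda a: puct_score(children[a], parent_visits, c_puct))


# Hypothetical node partway through a search.
children = {
    "attack": Child(prior=0.6, visits=10, value_sum=2.0),
    "fortify": Child(prior=0.3, visits=2, value_sum=1.2),
    "retreat": Child(prior=0.1),
}
next_action = select_action(children)
```

In this example the search turns to `"fortify"`: the prior favoured `"attack"`, but the simulations so far rate `"fortify"` higher and it has been explored less.
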
**What players notice:** a step-change in tactical strength. A gain of
**+200 to 400 Elo** is the canonical result of adding search to a trained
policy. The AI stops blundering. It sets traps. It calculates multi-step
combats.
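
To put an Elo gain in concrete terms, the standard Elo expected-score formula converts a rating gap into an expected result. The sketch assumes the gain is measured against the same policy without search.

```python
def expected_score(elo_advantage):
    """Expected score (win = 1, draw = 0.5) for the side with the given Elo advantage."""
    return 1.0 / (1.0 + 10.0 ** (-elo_advantage / 400.0))


for gain in (200, 400):
    print(gain, round(expected_score(gain), 3))
# prints:
# 200 0.76
# 400 0.909
```

So a 200-point gain corresponds to an expected score of about 0.76 against the search-free policy, and a 400-point gain to about 0.91.
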
### Multi-step movement (6.8)
**What changes:** the AI commands its units like a player does.
- Pick a destination tile; the simulator paths there over multiple turns.
- Set rally points so freshly-built units head somewhere useful.
- Issue patrol routes for scouts.
- Order escorts (a defender follows a settler).
**What players notice:** the AI stops moving one hex at a time. Armies
march in formation. Builders get protected. Scouts cover the map
deliberately.
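
"Pick a destination tile; the simulator paths there" can be illustrated with a breadth-first search over hexes in axial coordinates. This is a generic sketch, not the simulator's pathfinder: it ignores terrain costs, and `find_path` and the example map are hypothetical.

```python
from collections import deque

# The six neighbours of a hex in axial (q, r) coordinates.
HEX_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def find_path(start, goal, passable):
    """Shortest path from start to goal over passable hexes, or None if unreachable."""
    came_from = {start: None}
    frontier = deque([start])
    while frontier:
        current = frontier.popleft()
        if current == goal:
            path = []
            while current is not None:  # walk back to the start
                path.append(current)
                current = came_from[current]
            return path[::-1]
        q, r = current
        for dq, dr in HEX_DIRECTIONS:
            nxt = (q + dq, r + dr)
            if nxt in passable and nxt not in came_from:
                came_from[nxt] = current
                frontier.append(nxt)
    return None


# Hypothetical map: a 7x7 patch of hexes with two impassable tiles in the way.
wall = {(1, 0), (1, -1)}
passable = {(q, r) for q in range(-3, 4) for r in range(-3, 4)} - wall
path = find_path((0, 0), (2, 0), passable)
```

A unit with one move per turn would then consume `path[1:]` one hex per turn, which is the difference between "go to that tile" and re-deciding a single hex move every turn.
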
---
## v1.4 — "Mastery" (Stage 6.9)
**What changes:** the AI learns *against itself* instead of against the
scripted opponents.
- Self-play league: generation 0 plays generation 1 plays generation 2,
and so on. Each generation must beat the entire prior population to
graduate.
- 12-slot huge-map free-for-all is the training arena — no handicaps,
no easier opponents.
- Four specialist variants ship alongside the generalist: **Rush** (early
pressure), **Turtle** (defensive consolidation), **Tech** (research
race), **Economy** (long-game empire). Each is a separate selectable
controller; modders can train their own.
**What players notice:** the strongest AI difficulty tier becomes
**tournament-grade**. The specialists give the campaign distinct
opposition personalities that you can prepare strategies against. New
mod authors get four worked examples (one per specialist) to learn from.
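
The graduation rule for the self-play league ("must beat the entire prior population") can be sketched as follows. The 0.5 win-rate threshold, the `evaluate` callback, and the toy strength numbers are assumptions for illustration only.

```python
def graduates(win_rates, threshold=0.5):
    """A candidate graduates only if it beats every prior generation.

    win_rates maps each prior generation's id to the candidate's win rate
    against it; threshold is an assumed cut-off, not a documented value.
    """
    return bool(win_rates) and all(rate > threshold for rate in win_rates.values())


def run_league(candidates, evaluate, threshold=0.5):
    """Grow the population one graduate at a time."""
    population = [candidates[0]]  # generation 0 seeds the league
    for candidate in candidates[1:]:
        results = {prior: evaluate(candidate, prior) for prior in population}
        if graduates(results, threshold):
            population.append(candidate)
    return population


# Toy stand-in for real matches: win rate derived from made-up strengths.
strength = {"gen0": 1.0, "gen1": 1.4, "gen2": 1.2, "gen3": 2.0}


def toy_evaluate(a, b):
    return strength[a] / (strength[a] + strength[b])


league = run_league(["gen0", "gen1", "gen2", "gen3"], toy_evaluate)
```

In this toy run `gen2` beats `gen0` but loses to `gen1`, so it does not graduate. Checking against the whole population rather than only the latest generation guards against strategies that beat the newest opponent while regressing against older ones.
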
---
## How a player picks AI in v1.0
In the New Game screen, each opponent slot has an AI dropdown. Choose:
- a scripted clan personality (named, themed), or
- the learned AI (currently one: `learned:duel-v1b`).
The four patches above add more entries to that dropdown; they do not
change the UI flow. A v1.0 save loads in v1.4 because every save records
which AI was driving which slot.
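
As an illustration of why recording the controller per slot keeps old saves loadable, here is a sketch that reads slot assignments from a save. The JSON layout and the `scripted:warmonger` id are hypothetical; only `learned:duel-v1b` appears in this document.

```python
import json


def parse_controller_id(controller_id):
    """Split an id such as 'learned:duel-v1b' into (family, name)."""
    family, _, name = controller_id.partition(":")
    if not family or not name:
        raise ValueError(f"malformed controller id: {controller_id!r}")
    return family, name


# Hypothetical save fragment: slot number -> controller id.
save_text = json.dumps({"slots": {"1": "learned:duel-v1b", "2": "scripted:warmonger"}})

slots = {
    int(slot): parse_controller_id(controller_id)
    for slot, controller_id in json.loads(save_text)["slots"].items()
}
```

Because the save names the exact controller rather than "the learned AI", a later version that ships additional controllers can still resolve the one the game was started with.
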
---
## How a player picks difficulty
Difficulty is **never** "the AI's brain is smaller." Difficulty is:
- a resource handicap (the AI gets more/fewer starting yields), and
- (for learned AI tiers only) a "temperature" that makes the AI play
more or less consistently.
So the ladder is:
| Difficulty | AI | What defines the tier |
|---|---|---|
| Settler | Peaceful scripted clan | AI starts behind; rarely attacks |
| Chieftain | Default scripted clan | Balanced |
| Warlord | Rotating scripted clans | Multiple personalities; less predictable |
| King (v1.4+) | Best league-gen AI | Genuinely strong AI |
| Champion (v1.0) → (v1.4) | Learned AI, low temperature | Near-optimal play; small handicap |
---
## Cross-references
- Engineering reference: [`ai-production.md`](./ai-production.md)
- Modder contract: [`modding/ai-controller.md`](./modding/ai-controller.md)
- ABI decisions memo: [`modding/abi-decisions.md`](./modding/abi-decisions.md)
- Plan file (internal): `~/.claude/plans/in-the-game-civilization-elegant-popcorn.md`