# AI Roadmap: Magic Civilization

Designer-facing narrative of what the AI is, what it can and can't do today,
and how each post-launch patch improves it. Engineering reference is
[`ai-production.md`](./ai-production.md); modder contract is
[`modding/ai-controller.md`](./modding/ai-controller.md).

---

## Where we are at launch (v1.0)

Two AI families ship side-by-side:

- **Scripted AI**: six named clan personalities (Default, Warmonger,
  Builder, Tinkersmith, Peaceful, Opportunist). Driven by an
  MCTS-plus-heuristic engine. Transparent, tunable from JSON, fast.
- **Learned AI**: one neural-net opponent named `learned:duel-v1b`,
  trained via reinforcement learning against the scripted clans. Wins
  ~90% of 1v1 duels against the baseline. Anchors the **Champion**
  difficulty tier.

Difficulty is built by stacking handicaps + policy temperature on top of
either family, not by training "weaker" networks.

---

## What the AI knows today (and what it doesn't)

An honest diagnosis of the launch learned AI: the simulator state (every
tile, building, tech, opponent personality, fog-of-war reveal) is fed
through a heavily compressed 32-float summary before the policy sees it.
The result:

| The AI **does** understand | The AI **does not** understand |
|---|---|
| Its own gold, science, culture | Per-tile terrain (which biomes are around it) |
| Total city count, total unit count | Individual cities' tile yields or worked tiles |
| How many opponents are at war / peace | *Which* opponent is which (they are aggregated counts) |
| `science_per_turn` (one number) | The tech tree, prerequisites, or research choices |
| 16 hardcoded build options | Any building or unit outside that list |
| One-hex moves and attacks | Pathfinding more than one hex ahead |
| Whether it has units of type "warrior" or "founder" | Resource stockpiles, strategic resources, luxuries |

This is fine for duel-map play against weak opponents (current Champion
difficulty). It is **not enough** for 12-player free-for-all (FFA) play,
complex maps, late-game decisions, or tournament-grade strategy. The four
post-launch patches below (five engineering stages, 6.5 through 6.9)
close each of those gaps.
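
To make the compression concrete, here is a minimal Python sketch of how a rich game state might collapse into a fixed-length summary. The `GameState` fields, the slot order, and the `summarize` function are hypothetical and chosen only for illustration; the shipped encoder is described in the engineering reference and may look quite different.

```python
from dataclasses import dataclass, field

SUMMARY_SIZE = 32  # the launch policy sees exactly 32 floats


@dataclass
class GameState:
    # Hypothetical, simplified stand-in for the simulator state.
    gold: float = 0.0
    science_per_turn: float = 0.0
    culture: float = 0.0
    city_count: int = 0
    unit_count: int = 0
    # opponent id -> "war" or "peace"
    relations: dict = field(default_factory=dict)
    # tile coordinate -> biome name (never reaches the policy)
    tiles: dict = field(default_factory=dict)


def summarize(state: GameState) -> list[float]:
    """Collapse the state into a fixed-length vector of aggregates."""
    at_war = sum(1 for r in state.relations.values() if r == "war")
    at_peace = sum(1 for r in state.relations.values() if r == "peace")
    features = [
        state.gold,
        state.science_per_turn,
        state.culture,
        float(state.city_count),
        float(state.unit_count),
        float(at_war),    # a count only: opponent identity is lost here
        float(at_peace),
    ]
    # Pad the remaining (illustrative) slots with zeros.
    return features + [0.0] * (SUMMARY_SIZE - len(features))


# Two different worlds: the enemies are swapped and the terrain differs.
a = GameState(gold=50, relations={1: "war", 2: "peace"}, tiles={(0, 0): "desert"})
b = GameState(gold=50, relations={1: "peace", 2: "war"}, tiles={(0, 0): "grassland"})
print(summarize(a) == summarize(b))  # True: the policy cannot tell them apart
```

The two states differ in exactly the ways the right-hand column lists (which opponent is hostile, what the terrain is), yet they produce the same summary.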

The simulator already exposes everything in the right column; the
limitation is purely how the policy reads it. No engine rewrites are
needed.

---

## v1.1: "Sight" (Stage 6.5)

**What changes:** the AI gains *map awareness*.

- Reads the actual hex map (biomes, rivers, improvements, fog) instead
  of a summary statistic.
- Sees the full building + unit catalogs from data files; new content
  becomes trainable automatically.
- Sees its top-3 most-threatening opponents distinctly (instead of "I am
  at war with N players").
- Is bootstrapped from recordings of the six scripted personalities, so
  it cold-starts at roughly-scripted-AI strength on day one of training.

**What players notice:** the Champion-tier AI no longer makes
"why-would-anyone-do-that" map decisions. It scouts. It defends
chokepoints. It expands toward food, not into deserts.

---

## v1.2: "Memory" (Stage 6.6)

**What changes:** the AI gains *short-term memory*.

- A recurrent network layer carries information across turns. The AI
  remembers what it just did and what each opponent just did.
- Per-opponent memory slots let it form a working model of each opponent
  individually ("player 5 has been building catapults", "player 7 has
  been turtling").
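
As a rough picture of what "carrying information across turns" means, the sketch below keeps one small hidden vector per opponent and updates it each turn with an Elman-style recurrence. It is deliberately simplified: the weights are fixed scalars rather than learned matrices, and the two observation flags, `MEMORY_SIZE`, `rnn_step`, and `update_memory` are all hypothetical names for illustration, not the planned architecture.

```python
import math

MEMORY_SIZE = 2  # illustrative: one entry per observed behaviour


def rnn_step(hidden, observation, w_hidden, w_input):
    """One simplified recurrent update, applied element-wise:
    h' = tanh(w_hidden * h + w_input * x)."""
    return [
        math.tanh(w_hidden * h + w_input * x)
        for h, x in zip(hidden, observation)
    ]


def update_memory(memory, observations, w_hidden=0.9, w_input=1.0):
    """Advance the memory slot of every observed opponent by one turn."""
    return {
        pid: rnn_step(memory.get(pid, [0.0] * MEMORY_SIZE), obs, w_hidden, w_input)
        for pid, obs in observations.items()
    }


memory = {}
# Per-opponent observation: [built_siege_unit, kept_army_home] as 0/1 flags.
for _ in range(3):
    memory = update_memory(memory, {5: [1.0, 0.0], 7: [0.0, 1.0]})

print(memory[5][0] > memory[5][1])  # True: player 5 looks like a siege builder
print(memory[7][1] > memory[7][0])  # True: player 7 looks like a turtle
```

Because each slot feeds its previous value back in, repeated behaviour accumulates, which is the property the policy needs in order to tell opponents apart over several turns.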

**What players notice:** the AI starts to *adapt*. If you turtle, it
shifts to siege. If you rush, it shifts to defense. It also stops
repeating obvious mistakes within a single game.

---

## v1.3: "Foresight" (Stages 6.7 + 6.8)

The two largest single-patch upgrades, shipped together.

### AlphaZero search at inference (6.7)

**What changes:** the AI thinks ahead before each decision.

- At every turn, the AI runs 64–256 quick simulated futures, guided by
  its trained intuition, and picks the line that looks best.
- This is the same recipe AlphaGo / AlphaZero used to reach superhuman
  play at Go and chess.
- The engine already has the search machinery built (we just plug the
  neural net into it as the policy + value head).
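
The way a network "guides" simulated futures can be sketched with the PUCT selection rule that AlphaZero-style search uses: each candidate action is scored by its average value so far plus an exploration bonus weighted by the policy prior. The `Edge` class, the action names, and the constant `c_puct = 1.5` below are assumptions for illustration; the engine's existing search may organise this differently.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    prior: float            # policy head's probability for this action
    visits: int = 0         # simulations that passed through this action
    value_sum: float = 0.0  # sum of value-head estimates backed up so far

    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(edge: Edge, parent_visits: int, c_puct: float = 1.5) -> float:
    """Average value plus a prior-weighted exploration bonus."""
    bonus = c_puct * edge.prior * math.sqrt(parent_visits) / (1 + edge.visits)
    return edge.mean_value() + bonus


def select_action(edges: dict[str, Edge]) -> str:
    """Pick the action the next simulated future should explore."""
    parent_visits = max(1, sum(e.visits for e in edges.values()))
    return max(edges, key=lambda a: puct_score(edges[a], parent_visits))


edges = {
    "attack": Edge(prior=0.6, visits=10, value_sum=2.0),
    "fortify": Edge(prior=0.3, visits=2, value_sum=1.2),
    "retreat": Edge(prior=0.1),
}
print(select_action(edges))  # fortify
```

Here "fortify" wins the next simulation even though the policy prefers "attack": it has looked better in the few futures tried so far and has been explored less.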

**What players notice:** a step-change in tactical strength. **+200–400
Elo** is the gain usually cited for adding search to a trained policy. The
AI stops blundering. It sets traps. It calculates multi-step combats.
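
For a sense of scale, the standard Elo expected-score formula converts a rating gap into an expected result. The helper name `expected_score` is illustrative, and the calculation only translates the quoted range; it does not predict what the patch will actually achieve.

```python
def expected_score(elo_gap: float) -> float:
    """Expected score of the higher-rated side under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / 400.0))


for gap in (200, 400):
    print(gap, round(expected_score(gap), 2))
# 200 0.76
# 400 0.91
```

In other words, a version with search would be expected to score roughly 76% to 91% against the same network playing without it.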

### Multi-step movement (6.8)

**What changes:** the AI commands its units like a player does.

- Pick a destination tile; the simulator paths there over multiple turns.
- Set rally points so freshly-built units head somewhere useful.
- Issue patrol routes for scouts.
- Order escorts (a defender follows a settler).
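
The first bullet, "pick a destination and let the simulator path there", can be sketched as a breadth-first search over hexes followed by cutting the route into per-turn chunks. This assumes axial `(q, r)` hex coordinates and a uniform movement cost of one per hex; `find_path` and `split_into_turns` are hypothetical names, and the real simulator presumably weighs terrain and fog as well.

```python
from collections import deque

# The six neighbours of a hex in axial (q, r) coordinates.
HEX_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def find_path(start, goal, passable):
    """Breadth-first search; returns the hexes from start to goal, or None."""
    came_from = {start: None}
    frontier = deque([start])
    while frontier:
        current = frontier.popleft()
        if current == goal:
            path = []
            while current is not None:  # walk back to the start
                path.append(current)
                current = came_from[current]
            return path[::-1]
        q, r = current
        for dq, dr in HEX_DIRECTIONS:
            nxt = (q + dq, r + dr)
            if nxt in passable and nxt not in came_from:
                came_from[nxt] = current
                frontier.append(nxt)
    return None  # destination unreachable


def split_into_turns(path, moves_per_turn):
    """Group the steps after the start hex into one list per turn."""
    steps = path[1:]
    return [steps[i:i + moves_per_turn] for i in range(0, len(steps), moves_per_turn)]


corridor = {(q, 0) for q in range(5)}  # a straight five-hex corridor
path = find_path((0, 0), (4, 0), corridor)
print(split_into_turns(path, moves_per_turn=2))
# [[(1, 0), (2, 0)], [(3, 0), (4, 0)]]
```

The policy only has to name the destination; the route and its turn-by-turn execution fall out of the search.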

**What players notice:** the AI stops moving one hex at a time. Armies
march in formation. Builders get protected. Scouts cover the map
deliberately.

---

## v1.4: "Mastery" (Stage 6.9)

**What changes:** the AI learns *against itself* instead of against the
scripted opponents.

- Self-play league: generation 0 plays generation 1 plays generation 2,
  and so on. Each generation must beat the entire prior population to
  graduate.
- 12-slot huge-map free-for-all is the training arena: no handicaps,
  no easier opponents.
- Four specialist variants ship alongside the generalist: **Rush** (early
  pressure), **Turtle** (defensive consolidation), **Tech** (research
  race), **Economy** (long-game empire). Each is a separate selectable
  controller; modders can train their own.
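
The graduation rule in the first bullet can be written down in a few lines. The sketch assumes that "beat" means a head-to-head win rate above 50%; that threshold, the function names, and the sample numbers are illustrative assumptions, since the actual criterion is not stated here.

```python
def blocking_generations(win_rates: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Prior generations the candidate has not yet beaten."""
    return [name for name, rate in win_rates.items() if rate <= threshold]


def graduates(win_rates: dict[str, float], threshold: float = 0.5) -> bool:
    """True only if the candidate beats every prior generation."""
    return not blocking_generations(win_rates, threshold)


# Hypothetical win rates of a generation-3 candidate against its predecessors.
candidate = {"gen0": 0.81, "gen1": 0.64, "gen2": 0.48}
print(graduates(candidate))             # False
print(blocking_generations(candidate))  # ['gen2']
```

Checking against the whole population, rather than only the latest generation, is what stops a new generation from forgetting how to beat older styles of play.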

**What players notice:** the strongest AI difficulty tier becomes
**tournament-grade**. The specialists give the campaign distinct
opposition personalities that you can prepare strategies against. New
mod authors get four worked examples (one per specialist) to learn from.

---

## How a player picks AI in v1.0

In the New Game screen, each opponent slot has an AI dropdown. Choose:

- a scripted clan personality (named, themed), or
- the learned AI (currently one: `learned:duel-v1b`).

The four patches above add more entries to that dropdown; they do not
change the UI flow. A v1.0 save loads in v1.4 because every save records
which AI was driving which slot.

---

## How a player picks difficulty

Difficulty is **never** "the AI's brain is smaller." Difficulty is:

- a resource handicap (the AI gets more/fewer starting yields), and
- (for learned AI tiers only) a "temperature" that makes the AI play
  more or less consistently.
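
The "temperature" knob is the usual softmax temperature: the policy's raw action scores are divided by it before being turned into probabilities. A minimal sketch follows, with made-up scores and hypothetical function names; the tier-to-temperature mapping used in the game is not shown here.

```python
import math
import random


def action_probabilities(logits: list[float], temperature: float) -> list[float]:
    """Softmax over logits / temperature (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(logits: list[float], temperature: float, rng=random) -> int:
    """Draw one action index according to the tempered distribution."""
    probs = action_probabilities(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.0]  # illustrative scores; action 0 is the network's favourite
low = action_probabilities(logits, temperature=0.25)
high = action_probabilities(logits, temperature=4.0)
print(low[0] > high[0])  # True: low temperature commits to the favourite
```

With the same network, a low temperature plays its preferred move almost every time, while a high temperature spreads choices across weaker moves, which is how one trained model can serve several tiers.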

So the ladder is:

| Difficulty | AI | Why it's harder |
|---|---|---|
| Settler | Peaceful scripted clan | AI starts behind; rarely attacks |
| Chieftain | Default scripted clan | Balanced |
| Warlord | Rotating scripted clans | Multiple personalities; less predictable |
| King (v1.4+) | Best league-gen AI | Genuinely strong AI |
| Champion (v1.0) → (v1.4) | Learned AI, low temperature | Near-optimal play; small handicap |

---

## Cross-references

- Engineering reference: [`ai-production.md`](./ai-production.md)
- Modder contract: [`modding/ai-controller.md`](./modding/ai-controller.md)
- ABI decisions memo: [`modding/abi-decisions.md`](./modding/abi-decisions.md)
- Plan file (internal): `~/.claude/plans/in-the-game-civilization-elegant-popcorn.md`