feat(@projects/@magic-civilization): add claude-mcts vs easy-ai experiment analysis

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
Natalie 2026-05-12 17:23:44 -07:00
parent 98a98155d1
commit cb0f18361e


@@ -1536,3 +1536,186 @@ Artifact: `.local/demo-runs/2026-05-12-claude-vs-easy-ai-250-turn/recap.md`.
- `mc-ai/tests/gpu_walltime.rs` and related GPU-pipeline tests have
pre-existing compile errors from a `TacticalState`/`AbstractPlayerState`
field rename. Pre-existing.
## 2026-05-12 — Claude-as-MCTS run analysis
### Brief vs. reality
The task brief asked for "wire the production `mc_ai::run_ai_turn` MCTS
into Claude's policy slot ... give Claude a higher rollout budget than
the AI slots". Reading the code refuted both premises and surfaced a
third problem in the bench harness:
1. **`mc_ai::tactical::run_ai_turn` is NOT MCTS.** It's the
deterministic heuristic pipeline (`decide_tactical_actions`:
movement → combat_predict → settle → production → citizen). The
actual MCTS code lives in `mc-ai/src/mcts.rs` and is not on the
AI-slot turn path used by `dispatch::drive_ai_slot`.
2. **There is no rollout-budget knob.** Signature is
`run_ai_turn(state, player, &weights, seed) -> Vec<Action>`. The only
differentiators between slots are `ScoringWeights` and `seed`.
3. **Bench harness `stamp_personality` is cosmetic.** It stamps
`clan_id` + 3 promotion weights only; both AI slots in the prior
250-turn run actually used `ScoringWeights::default()`, not
blackhammer/deepforge weights. The 250-turn baseline's lopsided
win for slot 1 was seed/turn-order variance, not weights.
The honest experiment we could run within those constraints: give
Claude (slot 0) real per-clan `ScoringWeights` loaded from
`public/games/age-of-dwarves/data/ai_personalities.json`.
`blackhammer` is the natural pick (aggression 9, expansion 6,
production 7). Leave slots 1 + 2 on `ScoringWeights::default()`. Same
`run_ai_turn` pipeline for all three slots.
### Implementation
New `#[ignore]`d test at `src/simulator/crates/mc-player-api/tests/full_game_transcript.rs`:
```
cargo test -p mc-player-api --test full_game_transcript --release -- \
--ignored claude_mcts_vs_two_easy_ais_transcript --nocapture
```
Drive function `drive_strong_claude_game` projects tactical for slot 0,
runs `mc_ai::tactical::run_ai_turn` with blackhammer weights + a seed
derived from `(turn) * 0x9E3779B97F4A7C15`, dispatches each emitted
`mc_ai::Action` via `apply_ai_action`, then issues a standard
`PlayerAction::EndTurn` so slots 1 + 2 run through the production
`drive_ai_slot` path unchanged. Recap mirrors `write_long_recap`
layout so the two artifacts are diff-able.
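
The per-turn flow described above can be sketched as follows. This is a minimal illustration, not the code in `full_game_transcript.rs`: `Game`, `Weights`, `Action`, and the stubbed `run_ai_turn`, `apply_ai_action`, and `end_turn` are hypothetical stand-ins that only mirror the shapes named in this log, and the wrapping multiplication is an assumption about how the seed overflow is handled.

```rust
// Hypothetical stand-ins; the real types live in mc-ai and mc-player-api.
const GOLDEN: u64 = 0x9E37_79B9_7F4A_7C15;

#[derive(Debug, Clone, PartialEq)]
enum Action {
    Move { unit: u32, to: (i32, i32) },
    Fortify { unit: u32 },
}

struct Weights; // stands in for ScoringWeights

#[derive(Default)]
struct Game {
    turn: u64,
    applied: Vec<Action>, // actions dispatched for slot 0
    turns_ended: u64,
}

// Stub with the same shape as run_ai_turn(state, player, &weights, seed).
fn run_ai_turn(_game: &Game, _player: usize, _weights: &Weights, seed: u64) -> Vec<Action> {
    // Deterministic in the seed, like the heuristic pipeline it imitates.
    if seed % 2 == 0 {
        vec![Action::Move { unit: 1, to: (5, 6) }]
    } else {
        vec![Action::Fortify { unit: 1 }]
    }
}

fn apply_ai_action(game: &mut Game, action: Action) {
    game.applied.push(action);
}

fn end_turn(game: &mut Game) {
    // In the real harness this is where slots 1 and 2 run via drive_ai_slot.
    game.turns_ended += 1;
    game.turn += 1;
}

fn drive_strong_claude_game(game: &mut Game, weights: &Weights, turns: u64) {
    for _ in 0..turns {
        // Seed derived from the turn counter only (assumed wrapping multiply).
        let seed = game.turn.wrapping_mul(GOLDEN);
        for action in run_ai_turn(game, 0, weights, seed) {
            apply_ai_action(game, action);
        }
        end_turn(game);
    }
}
```

On these stubs, `drive_strong_claude_game(&mut game, &Weights, 3)` dispatches one action per turn and ends three turns; the point is the ordering (plan, dispatch each action, then a single end-of-turn), not the stub behaviour.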
### Result — 500 turns, release build, 15 s wall time
**Slot 2 (default weights) won.** Claude (blackhammer) finished THIRD by city count (17, against 21 and 51), despite holding the most gold.
```
| slot | gold | cities | units |
|----------------------------|------|--------|-------|
| 0 (Claude / blackhammer) | 1259 | 17 | 171 |
| 1 (AI / default weights) | 142 | 21 | 148 |
| 2 (AI / default weights) | 271 | 51 | 392 |
```
- 86 cities founded across the run; 9 618 units killed.
- `Event::GameOver` did NOT fire in 500 turns despite the carnage.
The 250-turn baseline's `NaturalGameOver(232, last_survivor=1)`
was contingent on seed/turn-order, not a guaranteed terminator at
this horizon.
- Claude's action signature frequency:
`move` 49 336, `queue_unit` 5 370, `queue_building` 829, `fortify` 22.
Zero attacks emitted by name: combat happens when the AI moves a unit
adjacent to an enemy and the engine resolves it at the EndTurn boundary.
- Zero techs and zero `CityBuildingCompleted` events — the bench
harness does not run the building-completion pipeline because
`state.ai_building_catalog` populates queues but the city
production step never advances them (no `current_tech` either —
same projector gap documented in the long-game recap).
### Reading the result
The "stronger ScoringWeights win" hypothesis didn't hold. Default
weights beat blackhammer's aggression-heavy axes in this state space.
Three plausible explanations, ranked by load-bearingness:
1. **Tactical heuristic insensitive to weight magnitude.**
`decide_tactical_actions` is mostly threshold-based —
`military_base`, `expansion_base`, etc. set policy gates rather
than rank-orderings, so once they exceed a floor the behaviour
plateaus. Blackhammer's aggression-9 weights don't actually push
the AI past a default-weights AI on the metrics that matter
(`city_expansion`, `yield_prod`).
2. **Starting positions matter more than weights.** Slot 2 starts at
(15, 25) with the most empty hexes around it; slot 0 at (5, 5)
has corner constraints. The expansion-driven landgrab favoured
slot 2 mechanically.
3. **Per-turn seed `(turn) * 0x9E3779B97F4A7C15` for Claude vs.
`(turn) * 0x9E3779B97F4A7C15 + slot` for AI slots.** With `slot = 0`
the two formulas are identical, so Claude ran on the same RNG stream
that `drive_ai_slot` would have used for slot 0. No novel randomness
was injected, and any bad luck propagated unchanged. Could be
mitigated by adding a `claude_offset` to the slot-0 seed; in this run
we matched the deterministic AI path exactly.
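
The seed overlap in explanation 3 can be checked directly. The sketch below assumes wrapping `u64` arithmetic (the log does not say how overflow is handled); the function names and `CLAUDE_OFFSET` are hypothetical and chosen only for illustration.

```rust
const GOLDEN: u64 = 0x9E37_79B9_7F4A_7C15;

/// Seed used for Claude's slot in this run: (turn) * GOLDEN.
fn claude_seed(turn: u64) -> u64 {
    turn.wrapping_mul(GOLDEN)
}

/// Seed formula described for the AI slots: (turn) * GOLDEN + slot.
fn ai_slot_seed(turn: u64, slot: u64) -> u64 {
    turn.wrapping_mul(GOLDEN).wrapping_add(slot)
}

/// Arbitrary illustrative constant, not a value from the codebase.
const CLAUDE_OFFSET: u64 = 0xC1A0_DE00_0000_0001;

/// Hypothetical mitigation: shift Claude onto its own stream.
fn claude_seed_offset(turn: u64) -> u64 {
    ai_slot_seed(turn, 0).wrapping_add(CLAUDE_OFFSET)
}
```

For every turn, `claude_seed(turn)` equals `ai_slot_seed(turn, 0)`, which is the overlap described above. `claude_seed_offset(turn)` differs from the seed of any slot in 0..3, because the offset is never equal to a slot index.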
### What this answers about the original question
> Can Claude beat the hardest AI?
In the current simulation harness, Claude — running the production
tactical heuristic with the strongest available per-clan weights —
LOSES to two AI slots running default weights. Honest answer:
**no, not with this code, not at this horizon, not with these weights.**
The simulation does NOT scale with personality intensity in the way
the brief assumed. The 250-turn baseline's slot-1 dominance was an
artifact of seed luck, not blackhammer being intrinsically stronger.
### Artifacts
- Transcript: `.local/demo-runs/2026-05-12-claude-mcts-vs-easy-ai/transcript.jsonl`
- Recap: `.local/demo-runs/2026-05-12-claude-mcts-vs-easy-ai/recap.md`
- State snapshots: `state-turn-00.json` through `state-turn-25.json`.
---
### 2026-05-12 (follow-up) — REAL-MCTS wiring attempt: hard-stop, no run
Task: wire the **real** tree MCTS at `mc-ai/src/mcts.rs` into Claude's
slot at 1000 rollouts and run a 500-turn game.
**Outcome: hard-stop per task brief. No run performed. No production
code added.** The brief named a function that does not exist and a
translator that does not exist; the gap is structural, not a missing
import. Verified by direct file inspection (no compile attempted).
#### What the brief asked for vs. what the source contains
| Brief claim | Source reality |
| ------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `mc_ai::mcts::mcts_decide(state, player, budget, weights, seed) -> Vec<Action>` | Not present. `mc-ai/src/mcts.rs` exposes only `MctsEngine::select(&[Candidate], seed) -> Option<usize>` — a flat UCB1 bandit over already-scored `Candidate`s, not a lookahead engine. Its own module docstring says "flat MCTS — UCB1 bandit over the candidate action space with noisy evaluation". |
| `translate_action_to_player_action(first, state, player)` "exists in dispatch" | Not present. `mc-player-api/src/dispatch.rs` has `translate_processor_events` (events, not actions) and `apply_ai_action` (consumes a `mc_ai::tactical::Action`). No `ActionKind`-to-`Action` mapping anywhere in the workspace. |
| Real MCTS operates on `TacticalState` like `run_ai_turn` | The real tree MCTS (`mc-ai/src/mcts_tree.rs`, `Tree<GameRolloutState>`) operates on `AbstractRolloutState` projected from `GameState`, not `TacticalState`. End-to-end integration test at `src/simulator/crates/mc-turn/tests/abstract_choose_action.rs` confirms the pipeline. |
| Real MCTS returns actions consumable by `apply_ai_action` | `Tree<GameRolloutState>::most_visited_action_at_root() -> Option<mc_ai::policy::ActionKind>`. `ActionKind` is one of 11 **strategic class labels** (Build / Attack / Settle / Research / Defend / Trade / ContinueWar / MakePeace / Idle / CommandFormation / SetRallyPoint). It carries no unit id, no target tile, no production choice — nothing `apply_ai_action` can dispatch. |
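
To make "flat UCB1 bandit" concrete, here is a textbook UCB1 selection over a fixed set of arms. It is a generic sketch, not the `MctsEngine::select` implementation: the `Arm` type and `ucb1_select` function are hypothetical, and the real engine takes `Candidate`s and a seed and adds noisy evaluation.

```rust
/// Running statistics for one candidate action (hypothetical type).
#[derive(Debug, Clone, Copy, Default)]
struct Arm {
    pulls: u32,
    total_reward: f64,
}

/// Textbook UCB1: pick the arm maximising mean + sqrt(2 ln N / n),
/// where N is the total pull count and n the arm's own pull count.
/// Unpulled arms are tried first. Returns None for an empty slice.
fn ucb1_select(arms: &[Arm]) -> Option<usize> {
    if let Some(i) = arms.iter().position(|a| a.pulls == 0) {
        return Some(i);
    }
    let total: f64 = arms.iter().map(|a| f64::from(a.pulls)).sum();
    let score = |a: &Arm| {
        let n = f64::from(a.pulls);
        a.total_reward / n + (2.0 * total.ln() / n).sqrt()
    };
    arms.iter()
        .enumerate()
        .max_by(|x, y| score(x.1).total_cmp(&score(y.1)))
        .map(|(i, _)| i)
}
```

There is no tree and no state transition here: the bandit only re-weights arms that were scored up front, which is why it cannot stand in for a lookahead engine.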
#### Three action types (two named `ActionKind`): easy to conflate, all different
1. `mc_ai::policy::ActionKind` — 11 strategic labels emitted by tree MCTS.
2. `mc_core::action::ActionKind` — unit verbs (`Fortify`, `Skip`, `FoundCity`, `BuildImprovement`, ...) consumed by `dispatch::invoke_unit_action`.
3. `mc_ai::tactical::Action` — the concrete tactical-action enum carrying unit ids and target coords, consumed by `apply_ai_action`.
No mapping (1) → (3) exists. Writing one is not a "translate" — it would be a tactical planner conditioned on a strategic gate, which is design work, not wiring.
#### Why this triggers the brief's hard-stop
The brief's hard-stops include:
> - `mc_ai::mcts::mcts_decide` (or equivalent) doesn't have a public entry → ... STOP, document, exit.
> - Translate function doesn't exist (mc_ai::Action → PlayerAction direction) → check `dispatch::apply_ai_action`; reuse its mapping table or replicate.
Both fire. The first because the named entry simply isn't there. The second because what `most_visited_action_at_root` returns (`policy::ActionKind`) has no mapping into `tactical::Action`, and `apply_ai_action`'s table doesn't help — it's a tactical-action dispatcher, not a strategy decoder.
#### What does exist (for future work)
- Real tree MCTS pipeline, end-to-end exercised:
  - `mc_turn::abstract_projection::to_abstract_rollout_state(&GameState) -> AbstractRolloutState`
  - `mc_ai::rollout::GameRolloutState::new(pod, priors)`
  - `mc_ai::mcts_tree::Tree::new(root_state)`, then an `iterate_gpu_batched(BATCH, seed, None, &AiBackend::Cpu)` loop
  - `tree.most_visited_action_at_root() -> Option<ActionKind>`
- Working integration test: `src/simulator/crates/mc-turn/tests/abstract_choose_action.rs::choose_action_via_abstract_default_priors_returns_action` — runs the full pipeline in pure Rust (no Godot runtime), `rollout_budget = 16`.
- Wired into the game via `api-gdext::GdMcTreeController::choose_action_via_abstract` — which returns the action **name as a GString to GDScript**, not as a tactical action. GDScript or another layer is responsible for translating that strategic label into concrete unit moves.
#### Recommended next-step shape (not in this session's scope)
A real "MCTS-driven Claude" requires a strategic-gate pattern:
1. Per turn, run `Tree<GameRolloutState>` to pick an `ActionKind`.
2. Pass that strategic gate to a **filtered** `decide_tactical_actions` that only emits tactical actions consistent with the chosen strategy (e.g. `Settle` → settle-and-protect-the-founder; `Attack` → movement biased toward enemy adjacency; `Build` → suppress military queue, prioritise infrastructure).
3. Dispatch the filtered tactical chain through `apply_ai_action` as today.
This is a new module (`mc_ai::strategic_gate` or similar), not a one-function wire. Roughly the same effort as the original `decide_tactical_actions`. Tracked as: **out of scope for p2-67 — open as new objective if pursued.**
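
A rough sketch of step 2, the strategic gate acting as a filter over candidate tactical actions. Everything here is hypothetical: `Strategy` is a three-label subset of the strategic labels, `Tactical` is an invented stand-in for `mc_ai::tactical::Action`, and the filter rules are crude placeholders for the design work described above.

```rust
/// Strategic label: a subset of the labels named in this log.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strategy {
    Build,
    Attack,
    Settle,
}

/// Hypothetical tactical actions carrying concrete ids and targets.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Tactical {
    Move { unit: u32, to: (i32, i32), toward_enemy: bool },
    FoundCity { unit: u32 },
    QueueUnit { city: u32 },
    QueueBuilding { city: u32 },
}

/// Keep only the candidates consistent with the chosen strategy.
fn gate(strategy: Strategy, candidates: Vec<Tactical>) -> Vec<Tactical> {
    candidates
        .into_iter()
        .filter(|action| match (strategy, action) {
            // Build: suppress the military queue, keep infrastructure.
            (Strategy::Build, Tactical::QueueUnit { .. }) => false,
            // Attack: keep only movement that closes on the enemy.
            (Strategy::Attack, Tactical::Move { toward_enemy, .. }) => *toward_enemy,
            // Settle: crude stand-in for protecting the founder.
            (Strategy::Settle, Tactical::Move { toward_enemy, .. }) => !*toward_enemy,
            _ => true,
        })
        .collect()
}
```

The gate takes the full candidate list as input, which is why it needs a tactical planner upstream: the strategic label alone carries no unit ids or targets to emit.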
#### Verification performed (no compile, all by inspection)
- Read `mc-ai/src/mcts.rs` end-to-end — flat UCB1, no `mcts_decide`.
- Read `mc-ai/src/lib.rs` — public re-exports do not include any `mcts_decide`.
- Read `mc-ai/src/mcts_tree.rs` — confirmed `Tree<GameRolloutState>` returns `Option<ActionKind>`.
- Read `mc-turn/tests/abstract_choose_action.rs` — confirmed full pipeline runs in test scope.
- Grepped `--include="*.rs" src/simulator/crates/` for `choose_action_via_abstract`, `fn translate`, `ActionKind ->` — only matches are the integration test, `translate_processor_events` (unrelated), and Godot SDK noise.
- Confirmed `mc-player-api/Cargo.toml` already depends on `mc-ai` and `mc-turn`: wiring is not blocked by missing deps; it's blocked by the missing `policy::ActionKind` → `tactical::Action` mapping.