feat(@projects/@magic-civilization): ✨ add claude-mcts vs easy-ai experiment analysis

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-12 17:23:44 -07:00 · 2026-05-12 17:23:44 -07:00 · cb0f18361e
commit cb0f18361e
parent 98a98155d1
1 changed files with 183 additions and 0 deletions
--- a/.project/objectives/p2-67-claude-player-api.md
+++ b/.project/objectives/p2-67-claude-player-api.md
@@ -1536,3 +1536,186 @@ Artifact: `.local/demo-runs/2026-05-12-claude-vs-easy-ai-250-turn/recap.md`.
 - `mc-ai/tests/gpu_walltime.rs` and related GPU-pipeline tests have
  pre-existing compile errors from a `TacticalState`/`AbstractPlayerState`
  field rename. Pre-existing.
+
+## 2026-05-12 — Claude-as-MCTS run analysis
+
+### Brief vs. reality
+
+The task brief asked for "wire the production `mc_ai::run_ai_turn` MCTS
+into Claude's policy slot ... give Claude a higher rollout budget than
+the AI slots". Reading the code refuted both premises (items 1 and 2)
+and surfaced a third problem in the bench harness (item 3):
+
+1. **`mc_ai::tactical::run_ai_turn` is NOT MCTS.** It's the
+   deterministic heuristic pipeline (`decide_tactical_actions` —
+   movement → combat_predict → settle → production → citizen). The
+   actual MCTS code lives in `mc-ai/src/mcts.rs` and is not on the
+   AI-slot turn path used by `dispatch::drive_ai_slot`.
+2. **There is no rollout-budget knob.** Signature is
+   `run_ai_turn(state, player, &weights, seed) -> Vec<Action>`. The only
+   differentiators between slots are `ScoringWeights` and `seed`.
+3. **Bench harness `stamp_personality` is cosmetic.** It stamps
+   `clan_id` + 3 promotion weights only; both AI slots in the prior
+   250-turn run actually used `ScoringWeights::default()`, not
+   blackhammer/deepforge weights. The 250-turn baseline's lopsided
+   win for slot 1 was seed/turn-order variance, not weights.
+
+The honest experiment we could run within those constraints: give
+Claude (slot 0) real per-clan `ScoringWeights` loaded from
+`public/games/age-of-dwarves/data/ai_personalities.json` —
+`blackhammer` is the natural pick (aggression 9, expansion 6,
+production 7). Leave slots 1 + 2 on `ScoringWeights::default()`. Same
+`run_ai_turn` pipeline for all three slots.
+
+### Implementation
+
+New `#[ignore]`d test at `src/simulator/crates/mc-player-api/tests/full_game_transcript.rs`:
+
+```
+cargo test -p mc-player-api --test full_game_transcript --release -- \
+    --ignored claude_mcts_vs_two_easy_ais_transcript --nocapture
+```
+
+Drive function `drive_strong_claude_game` projects tactical for slot 0,
+runs `mc_ai::tactical::run_ai_turn` with blackhammer weights + a seed
+derived from `(turn) * 0x9E3779B97F4A7C15`, dispatches each emitted
+`mc_ai::Action` via `apply_ai_action`, then issues a standard
+`PlayerAction::EndTurn` so slots 1 + 2 run through the production
+`drive_ai_slot` path unchanged. Recap mirrors `write_long_recap`
+layout so the two artifacts are diff-able.
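The seed derivation described above can be sketched in isolation. This is a minimal illustration, not the test's actual code: `SEED_MUL` is the constant quoted in the text, while the function names, the explicit `wrapping_mul`, and `CLAUDE_OFFSET` are assumptions introduced here. It also shows why the per-slot formula quoted later in this note (`turn * C + slot`) gives the same value for slot 0.

```rust
/// Multiplier quoted in the write-up.
const SEED_MUL: u64 = 0x9E37_79B9_7F4A_7C15;

/// Hypothetical offset used only to illustrate decorrelating the Claude slot.
const CLAUDE_OFFSET: u64 = 0xC1A0_DE00_0000_0001;

/// Claude-slot seed: `turn * SEED_MUL`.
/// Assumption: the multiplication wraps. A plain `*` would panic on
/// overflow in a debug build, so the wrap is made explicit here.
fn claude_seed(turn: u64) -> u64 {
    turn.wrapping_mul(SEED_MUL)
}

/// Per-slot seed as quoted in the text: `turn * SEED_MUL + slot`.
fn ai_slot_seed(turn: u64, slot: u64) -> u64 {
    turn.wrapping_mul(SEED_MUL).wrapping_add(slot)
}

/// Illustrative mitigation: add a fixed offset so the Claude slot
/// no longer shares a stream with `ai_slot_seed(turn, 0)`.
fn claude_seed_decorrelated(turn: u64) -> u64 {
    ai_slot_seed(turn, 0).wrapping_add(CLAUDE_OFFSET)
}
```

Under these assumptions `claude_seed(t)` equals `ai_slot_seed(t, 0)` for every turn `t`, which is the collision the analysis describes; `claude_seed_decorrelated(t)` differs from it by a constant.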
+
+### Result — 500 turns, release build, 15 s wall time
+
+**Slot 2 (default weights) won.** Claude (blackhammer) finished THIRD on
+city count (17 vs. 21 and 51), although it ended with the most gold.
+
+```
+| slot                       | gold | cities | units |
+|----------------------------|------|--------|-------|
+| 0 (Claude / blackhammer)   | 1259 |   17   |  171  |
+| 1 (AI / default weights)   |  142 |   21   |  148  |
+| 2 (AI / default weights)   |  271 |   51   |  392  |
+```
+
+- 86 cities founded across the run; 9 618 units killed.
+- `Event::GameOver` did NOT fire in 500 turns despite the carnage.
+  The 250-turn baseline's `NaturalGameOver(232, last_survivor=1)`
+  was contingent on seed/turn-order, not a guaranteed terminator at
+  this horizon.
+- Claude's action signature frequency:
+  `move` 49 336, `queue_unit` 5 370, `queue_building` 829, `fortify` 22.
+  Zero attacks were emitted by name: the AI moves a unit adjacent to
+  its target and the engine resolves combat at the EndTurn boundary.
+- Zero techs and zero `CityBuildingCompleted` events. The bench
+  harness never completes buildings: `state.ai_building_catalog`
+  populates the queues, but the city production step never advances
+  them. There is no `current_tech` either (the same projector gap
+  documented in the long-game recap).
+
+### Reading the result
+
+The "stronger ScoringWeights win" hypothesis didn't hold. Default
+weights beat blackhammer's aggression-heavy axes in this state space.
+Three plausible explanations, ranked by how much of the result each
+likely explains:
+
+1. **Tactical heuristic insensitive to weight magnitude.**
+   `decide_tactical_actions` is mostly threshold-based —
+   `military_base`, `expansion_base`, etc. set policy gates rather
+   than rank-orderings, so once they exceed a floor the behaviour
+   plateaus. Blackhammer's aggression-9 weights don't actually push
+   the AI past a default-weights AI on the metrics that matter
+   (`city_expansion`, `yield_prod`).
+2. **Starting positions matter more than weights.** Slot 2 starts at
+   (15, 25) with the most empty hexes around it; slot 0 at (5, 5)
+   has corner constraints. The expansion-driven landgrab favoured
+   slot 2 mechanically.
+3. **Seed collision.** Claude used the per-turn seed
+   `(turn) * 0x9E3779B97F4A7C15`; AI slots use
+   `(turn) * 0x9E3779B97F4A7C15 + slot`. With `slot = 0` the two are
+   identical, so slot 0 ran on the same RNG stream that
+   `drive_ai_slot` would have used for it and no novel randomness was
+   injected. Bad luck propagated. Could be mitigated with
+   `(turn, slot=0) + claude_offset`; in this run we matched the
+   deterministic AI path exactly.
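The first explanation (gates versus rank-orderings) is easier to see with a toy contrast. The sketch below is not taken from `decide_tactical_actions`: `Axis`, both functions, and the floor value are hypothetical names chosen only to show how a threshold-style use of a weight plateaus while a multiplicative use keeps responding.

```rust
/// Hypothetical stand-in for one `ScoringWeights` axis on a 0..=10 scale.
#[derive(Clone, Copy)]
struct Axis(u8);

/// Gate-style use: the weight only decides whether a policy is switched on.
/// Every value at or above `floor` produces the same behaviour.
fn gated_attack_enabled(aggression: Axis, floor: u8) -> bool {
    aggression.0 >= floor
}

/// Rank-style use: the weight scales a candidate's score, so a larger
/// weight can still reorder candidates after the gate is open.
fn ranked_attack_score(aggression: Axis, base_score: f32) -> f32 {
    base_score * f32::from(aggression.0)
}
```

With an illustrative floor of 5, `Axis(6)` and `Axis(9)` are indistinguishable to `gated_attack_enabled`, whereas `ranked_attack_score` still separates them. If most of the heuristic uses weights in the first way, an aggression of 9 would buy little over a weight that merely clears the floor.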
+
+### What this answers about the original question
+
+> Can Claude beat the hardest AI?
+
+In the current simulation harness, Claude — running the production
+tactical heuristic with the strongest available per-clan weights —
+LOSES to two AI slots running default weights. Honest answer:
+**no, not with this code, not at this horizon, not with these weights.**
+The simulation does NOT scale with personality intensity in the way
+the brief assumed. The 250-turn baseline's slot-1 dominance was an
+artifact of seed luck, not blackhammer being intrinsically stronger.
+
+### Artifacts
+
+- Transcript: `.local/demo-runs/2026-05-12-claude-mcts-vs-easy-ai/transcript.jsonl`
+- Recap: `.local/demo-runs/2026-05-12-claude-mcts-vs-easy-ai/recap.md`
+- State snapshots: `state-turn-00.json` through `state-turn-25.json`.
+
+---
+
+### 2026-05-12 (follow-up) — REAL-MCTS wiring attempt: hard-stop, no run
+
+Task: wire the **real** tree MCTS at `mc-ai/src/mcts.rs` into Claude's
+slot at 1000 rollouts and run a 500-turn game.
+
+**Outcome: hard-stop per task brief. No run performed. No production
+code added.** The brief named a function that does not exist and a
+translator that does not exist; the gap is structural, not a missing
+import. Verified by direct file inspection (no compile attempted).
+
+#### What the brief asked for vs. what the source contains
+
+| Brief claim                                                              | Source reality                                                                                                                                                                              |
+| ------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `mc_ai::mcts::mcts_decide(state, player, budget, weights, seed) -> Vec<Action>` | Not present. `mc-ai/src/mcts.rs` exposes only `MctsEngine::select(&[Candidate], seed) -> Option<usize>` — a flat UCB1 bandit over already-scored `Candidate`s, not a lookahead engine. Its own module docstring says "flat MCTS — UCB1 bandit over the candidate action space with noisy evaluation". |
+| `translate_action_to_player_action(first, state, player)` "exists in dispatch" | Not present. `mc-player-api/src/dispatch.rs` has `translate_processor_events` (events, not actions) and `apply_ai_action` (consumes a `mc_ai::tactical::Action`). No `ActionKind`-to-`Action` mapping anywhere in the workspace. |
+| Real MCTS operates on `TacticalState` like `run_ai_turn`                  | The real tree MCTS (`mc-ai/src/mcts_tree.rs` → `Tree<GameRolloutState>`) operates on `AbstractRolloutState` projected from `GameState`, not `TacticalState`. End-to-end integration test at `src/simulator/crates/mc-turn/tests/abstract_choose_action.rs` confirms the pipeline. |
+| Real MCTS returns actions consumable by `apply_ai_action`                 | `Tree<GameRolloutState>::most_visited_action_at_root() -> Option<mc_ai::policy::ActionKind>`. `ActionKind` is one of 11 **strategic class labels** (Build / Attack / Settle / Research / Defend / Trade / ContinueWar / MakePeace / Idle / CommandFormation / SetRallyPoint). It carries no unit id, no target tile, no production choice — nothing `apply_ai_action` can dispatch. |
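
For readers unfamiliar with the term, a "flat UCB1 bandit over already-scored candidates with noisy evaluation" can be sketched as follows. This is an independent illustration of the technique, not the code in `mc-ai/src/mcts.rs`: the function name, the `f64` scores, the noise width, and the xorshift generator are all assumptions made so the example is self-contained. The point is that nothing here looks ahead; it only re-samples scores it was handed.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    /// Returns a value in [0, 1).
    fn next_unit(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Flat UCB1 over pre-scored candidates. Returns the most-pulled index,
/// or `None` for an empty slice. (Hypothetical signature.)
fn flat_ucb1_select(scores: &[f64], iterations: u32, seed: u64) -> Option<usize> {
    if scores.is_empty() {
        return None;
    }
    let mut rng = XorShift64(seed | 1); // xorshift state must be non-zero
    let mut pulls = vec![0u32; scores.len()];
    let mut totals = vec![0.0f64; scores.len()];

    for t in 0..iterations {
        // Pull each arm once, then always take the best UCB1 index.
        let arm = match pulls.iter().position(|&n| n == 0) {
            Some(unvisited) => unvisited,
            None => {
                let ln_t = f64::from(t).ln();
                // UCB1 index: empirical mean plus exploration bonus.
                let index = |i: usize| {
                    let n = f64::from(pulls[i]);
                    totals[i] / n + (2.0 * ln_t / n).sqrt()
                };
                (0..scores.len())
                    .max_by(|&a, &b| index(a).partial_cmp(&index(b)).unwrap())
                    .unwrap()
            }
        };
        // Noisy evaluation: the given score plus noise in [-0.05, 0.05).
        let reward = scores[arm] + (rng.next_unit() - 0.5) * 0.1;
        pulls[arm] += 1;
        totals[arm] += reward;
    }
    // The answer is the arm that was sampled most often.
    (0..scores.len()).max_by_key(|&i| pulls[i])
}
```

Called as `flat_ucb1_select(&[0.2, 0.8, 0.5], 500, 42)`, the sketch concentrates its pulls on index 1, the highest-scored candidate. A tree search would instead expand successor states, which is why the flat bandit cannot stand in for the lookahead engine the brief expected.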
+
+#### Three similarly named action types: easy to conflate, all different
+
+1. `mc_ai::policy::ActionKind` — 11 strategic labels emitted by tree MCTS.
+2. `mc_core::action::ActionKind` — unit verbs (`Fortify`, `Skip`, `FoundCity`, `BuildImprovement`, ...) consumed by `dispatch::invoke_unit_action`.
+3. `mc_ai::tactical::Action` — the concrete tactical-action enum carrying unit ids and target coords, consumed by `apply_ai_action`.
+
+No mapping (1) → (3) exists. Writing one is not a "translate" — it would be a tactical planner conditioned on a strategic gate, which is design work, not wiring.
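The gap between (1) and (3) can be made concrete with stand-in types. The enums below are hypothetical mirrors, not the real `mc_ai` definitions: the label names follow the list given in the table above, while the `TacticalAction` variants and their fields are invented for illustration.

```rust
/// Stand-in for (1): a strategic label with no payload.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum StrategicKind {
    Build, Attack, Settle, Research, Defend, Trade,
    ContinueWar, MakePeace, Idle, CommandFormation, SetRallyPoint,
}

/// Stand-in for (3): a concrete action carrying ids and coordinates.
/// Variant and field names are hypothetical.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum TacticalAction {
    Move { unit_id: u32, to: (i32, i32) },
    FoundCity { unit_id: u32 },
}

/// A label-only "translation" has nothing to fill the payload with,
/// so the only honest result is `None`.
fn translate_label_only(_kind: StrategicKind) -> Option<TacticalAction> {
    None
}

/// Once game context is supplied the mapping becomes a small planner:
/// here, `Settle` turns every available settler into a `FoundCity`.
fn plan_with_context(kind: StrategicKind, settlers: &[u32]) -> Vec<TacticalAction> {
    match kind {
        StrategicKind::Settle => settlers
            .iter()
            .map(|&unit_id| TacticalAction::FoundCity { unit_id })
            .collect(),
        // Other labels would each need their own planning logic.
        _ => Vec::new(),
    }
}
```

The second function only works because it is handed the settler ids; choosing which units act and where is the planning step that a pure translator would have to invent.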
+
+#### Why this triggers the brief's hard-stop
+
+The brief's hard-stops include:
+
+> - `mc_ai::mcts::mcts_decide` (or equivalent) doesn't have a public entry → ... STOP, document, exit.
+> - Translate function doesn't exist (mc_ai::Action → PlayerAction direction) → check `dispatch::apply_ai_action`; reuse its mapping table or replicate.
+
+Both fire. The first because the named entry simply isn't there. The second because what `most_visited_action_at_root` returns (`policy::ActionKind`) has no mapping into `tactical::Action`, and `apply_ai_action`'s table doesn't help — it's a tactical-action dispatcher, not a strategy decoder.
+
+#### What does exist (for future work)
+
+- Real tree MCTS pipeline, end-to-end exercised:
+  - `mc_turn::abstract_projection::to_abstract_rollout_state(&GameState) -> AbstractRolloutState`
+  - `mc_ai::rollout::GameRolloutState::new(pod, priors)`
+  - `mc_ai::mcts_tree::Tree::new(root_state)` → `iterate_gpu_batched(BATCH, seed, None, &AiBackend::Cpu)` loop
+  - `tree.most_visited_action_at_root() -> Option<ActionKind>`
+- Working integration test: `src/simulator/crates/mc-turn/tests/abstract_choose_action.rs::choose_action_via_abstract_default_priors_returns_action` — runs the full pipeline in pure Rust (no Godot runtime), `rollout_budget = 16`.
+- Wired into the game via `api-gdext::GdMcTreeController::choose_action_via_abstract` — which returns the action **name as a GString to GDScript**, not as a tactical action. GDScript or another layer is responsible for translating that strategic label into concrete unit moves.
+
+#### Recommended next-step shape (not in this session's scope)
+
+A real "MCTS-driven Claude" requires a strategic-gate pattern:
+
+1. Per turn, run `Tree<GameRolloutState>` to pick an `ActionKind`.
+2. Pass that strategic gate to a **filtered** `decide_tactical_actions` that only emits tactical actions consistent with the chosen strategy (e.g. `Settle` → settle-and-protect-the-founder; `Attack` → movement biased toward enemy adjacency; `Build` → suppress military queue, prioritise infrastructure).
+3. Dispatch the filtered tactical chain through `apply_ai_action` as today.
+
+This is a new module (`mc_ai::strategic_gate` or similar), not a one-function wire. Roughly the same effort as the original `decide_tactical_actions`. Tracked as: **out of scope for p2-67 — open as new objective if pursued.**
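A minimal sketch of the strategic-gate idea in step 2, assuming the simplest possible form: a filter applied to an already-proposed tactical chain. Every type and rule here is hypothetical. The `Build` rule follows the example in the list above (suppress the military queue); the `Settle` and `Attack` rules are simplified placeholders, and a real gate would bias the planner rather than only drop actions.

```rust
/// Hypothetical subset of the strategic labels.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Strategy { Settle, Attack, Build }

/// Hypothetical tactical actions; field names are illustrative only.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Tactical {
    Move { unit_id: u32, toward_enemy: bool },
    FoundCity { unit_id: u32 },
    QueueUnit { city_id: u32 },
    QueueBuilding { city_id: u32 },
}

/// Returns true when `action` is consistent with the chosen strategy.
fn allowed(strategy: Strategy, action: &Tactical) -> bool {
    match (strategy, action) {
        // Build: suppress the military queue, keep infrastructure.
        (Strategy::Build, Tactical::QueueUnit { .. }) => false,
        // Settle (placeholder rule): do not march toward the enemy.
        (Strategy::Settle, Tactical::Move { toward_enemy: true, .. }) => false,
        // Attack (placeholder rule): no city founding this turn.
        (Strategy::Attack, Tactical::FoundCity { .. }) => false,
        _ => true,
    }
}

/// Filters a proposed tactical chain, preserving its order.
fn gate(strategy: Strategy, proposed: Vec<Tactical>) -> Vec<Tactical> {
    proposed.into_iter().filter(|a| allowed(strategy, a)).collect()
}
```

The filtered chain would then be dispatched action by action, as in step 3. A pure filter cannot add actions the heuristic never proposed, which is one reason the note sizes this as a new module rather than a wire-up.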
+
+#### Verification performed (no compile, all by inspection)
+
+- Read `mc-ai/src/mcts.rs` end-to-end — flat UCB1, no `mcts_decide`.
+- Read `mc-ai/src/lib.rs` — public re-exports do not include any `mcts_decide`.
+- Read `mc-ai/src/mcts_tree.rs` — confirmed `Tree<GameRolloutState>` returns `Option<ActionKind>`.
+- Read `mc-turn/tests/abstract_choose_action.rs` — confirmed full pipeline runs in test scope.
+- Grepped `--include="*.rs" src/simulator/crates/` for `choose_action_via_abstract`, `fn translate`, `ActionKind ->` — only matches are the integration test, `translate_processor_events` (unrelated), and Godot SDK noise.
+- Confirmed `mc-player-api/Cargo.toml` already depends on `mc-ai` and `mc-turn`, so wiring is not blocked by missing deps; it is blocked by the missing `policy::ActionKind` → `tactical::Action` mapping.
+