diff --git a/.project/objectives/p2-67-claude-player-api.md b/.project/objectives/p2-67-claude-player-api.md
index 2ead85a9..9e49a705 100644
--- a/.project/objectives/p2-67-claude-player-api.md
+++ b/.project/objectives/p2-67-claude-player-api.md
@@ -1536,3 +1536,186 @@ Artifact: `.local/demo-runs/2026-05-12-claude-vs-easy-ai-250-turn/recap.md`.
 - `mc-ai/tests/gpu_walltime.rs` and related GPU-pipeline tests have pre-existing compile errors from a `TacticalState`/`AbstractPlayerState` field rename. Pre-existing.
+
+## 2026-05-12 — Claude-as-MCTS run analysis
+
+### Brief vs. reality
+
+The task brief asked for "wire the production `mc_ai::run_ai_turn` MCTS
+into Claude's policy slot ... give Claude a higher rollout budget than
+the AI slots". Reading the code refuted both premises and turned up a
+third problem:
+
+1. **`mc_ai::tactical::run_ai_turn` is NOT MCTS.** It's the
+   deterministic heuristic pipeline (`decide_tactical_actions` —
+   movement → combat_predict → settle → production → citizen). The
+   actual MCTS code lives in `mc-ai/src/mcts.rs` and is not on the
+   AI-slot turn path used by `dispatch::drive_ai_slot`.
+2. **There is no rollout-budget knob.** Signature is
+   `run_ai_turn(state, player, &weights, seed) -> Vec`. The only
+   differentiators between slots are `ScoringWeights` and `seed`.
+3. **Bench harness `stamp_personality` is cosmetic.** It stamps
+   `clan_id` + 3 promotion weights only; both AI slots in the prior
+   250-turn run actually used `ScoringWeights::default()`, not
+   blackhammer/deepforge weights. The 250-turn baseline's lopsided
+   win for slot 1 was seed/turn-order variance, not weights.
+
+The honest experiment we could run within those constraints: give
+Claude (slot 0) real per-clan `ScoringWeights` loaded from
+`public/games/age-of-dwarves/data/ai_personalities.json` —
+`blackhammer` is the natural pick (aggression 9, expansion 6,
+production 7). Leave slots 1 + 2 on `ScoringWeights::default()`. Same
+`run_ai_turn` pipeline for all three slots.
+
+### Implementation
+
+New `#[ignore]`d test at `src/simulator/crates/mc-player-api/tests/full_game_transcript.rs`:
+
+```
+cargo test -p mc-player-api --test full_game_transcript --release -- \
+  --ignored claude_mcts_vs_two_easy_ais_transcript --nocapture
+```
+
+Drive function `drive_strong_claude_game` projects the tactical state
+for slot 0, runs `mc_ai::tactical::run_ai_turn` with blackhammer
+weights + a seed derived from `(turn) * 0x9E3779B97F4A7C15`, dispatches
+each emitted `mc_ai::Action` via `apply_ai_action`, then issues a
+standard `PlayerAction::EndTurn` so slots 1 + 2 run through the
+production `drive_ai_slot` path unchanged. Recap mirrors
+`write_long_recap` layout so the two artifacts are diff-able.
+
+### Result — 500 turns, release build, 15 s wall time
+
+**Slot 2 (default weights) won.** Claude (blackhammer) finished THIRD
+on city count (second on units, first on gold).
+
+```
+| slot                       | gold | cities | units |
+|----------------------------|------|--------|-------|
+| 0 (Claude / blackhammer)   | 1259 | 17     | 171   |
+| 1 (AI / default weights)   | 142  | 21     | 148   |
+| 2 (AI / default weights)   | 271  | 51     | 392   |
+```
+
+- 86 cities founded across the run; 9 618 units killed.
+- `Event::GameOver` did NOT fire in 500 turns despite the carnage.
+  The 250-turn baseline's `NaturalGameOver(232, last_survivor=1)`
+  was contingent on seed/turn-order, not a guaranteed terminator at
+  this horizon.
+- Claude's action signature frequency:
+  `move` 49 336, `queue_unit` 5 370, `queue_building` 829, `fortify` 22.
+  Zero attacks emitted by name (combat happens via AI movement into
+  adjacency, after which the engine resolves combat at the EndTurn
+  boundary).
+- Zero techs and zero `CityBuildingCompleted` events — the bench
+  harness does not run the building-completion pipeline because
+  `state.ai_building_catalog` populates queues but the city
+  production step never advances them (no `current_tech` either —
+  same projector gap documented in the long-game recap).
+
+### Reading the result
+
+The "stronger ScoringWeights win" hypothesis didn't hold. Default
+weights beat blackhammer's aggression-heavy axes in this state space.
+Three plausible explanations, ranked from most to least load-bearing:
+
+1. **Tactical heuristic insensitive to weight magnitude.**
+   `decide_tactical_actions` is mostly threshold-based —
+   `military_base`, `expansion_base`, etc. set policy gates rather
+   than rank-orderings, so once they exceed a floor the behaviour
+   plateaus. Blackhammer's aggression-9 weights don't actually push
+   the AI past a default-weights AI on the metrics that matter
+   (`city_expansion`, `yield_prod`).
+2. **Starting positions matter more than weights.** Slot 2 starts at
+   (15, 25) with the most empty hexes around it; slot 0 at (5, 5)
+   has corner constraints. The expansion-driven landgrab favoured
+   slot 2 mechanically.
+3. **Per-turn seed `(turn) * 0x9E3779B97F4A7C15` for Claude vs.
+   `(turn) * 0x9E3779B97F4A7C15 + slot` for AI slots** put slot 0 on
+   the same RNG stream that slot 0's `drive_ai_slot` would have used —
+   no novel randomness was injected. Bad luck propagated. Could be
+   mitigated with `(turn, slot=0) + claude_offset`; in this run we
+   matched the deterministic AI path exactly.
+
+### What this answers about the original question
+
+> Can Claude beat the hardest AI?
+
+In the current simulation harness, Claude — running the production
+tactical heuristic with the strongest available per-clan weights —
+LOSES to two AI slots running default weights. Honest answer:
+**no, not with this code, not at this horizon, not with these weights.**
+The simulation does NOT scale with personality intensity in the way
+the brief assumed. The 250-turn baseline's slot-1 dominance was an
+artifact of seed luck, not blackhammer being intrinsically stronger.
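The seed collision in explanation 3 can be made concrete with a minimal sketch. It assumes the seeds are `u64` values combined with wrapping arithmetic, which the notes do not state; the function names are hypothetical and are not taken from the harness.

```rust
/// Multiplier quoted in the notes for the per-turn seed.
const SEED_MULTIPLIER: u64 = 0x9E37_79B9_7F4A_7C15;

/// Seed used for Claude's slot in this run: `turn * multiplier`.
/// Assumption: u64 with wrapping multiplication.
fn claude_seed(turn: u64) -> u64 {
    turn.wrapping_mul(SEED_MULTIPLIER)
}

/// Seed described for the AI slots: `turn * multiplier + slot`.
fn ai_slot_seed(turn: u64, slot: u64) -> u64 {
    turn.wrapping_mul(SEED_MULTIPLIER).wrapping_add(slot)
}

/// Hypothetical mitigation: start from the slot-0 seed and add an offset.
/// The offset has to differ from every slot index, otherwise Claude simply
/// lands on another slot's stream.
fn claude_seed_with_offset(turn: u64, claude_offset: u64) -> u64 {
    ai_slot_seed(turn, 0).wrapping_add(claude_offset)
}
```

With these definitions `claude_seed(t)` equals `ai_slot_seed(t, 0)` for every turn, which is the "no novel randomness" observation. An offset of 1 or 2 would only move Claude onto the stream of slot 1 or slot 2, so any real offset should be larger than the highest slot index.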
+
+### Artifacts
+
+- Transcript: `.local/demo-runs/2026-05-12-claude-mcts-vs-easy-ai/transcript.jsonl`
+- Recap: `.local/demo-runs/2026-05-12-claude-mcts-vs-easy-ai/recap.md`
+- State snapshots: `state-turn-00.json` through `state-turn-25.json`.
+
+---
+
+### 2026-05-12 (follow-up) — REAL-MCTS wiring attempt: hard-stop, no run
+
+Task: wire the **real** tree MCTS at `mc-ai/src/mcts.rs` into Claude's
+slot at 1000 rollouts and run a 500-turn game.
+
+**Outcome: hard-stop per task brief. No run performed. No production
+code added.** The brief named a function that does not exist and a
+translator that does not exist; the gap is structural, not a missing
+import. Verified by direct file inspection (no compile attempted).
+
+#### What the brief asked for vs. what the source contains
+
+| Brief claim | Source reality |
+| --- | --- |
+| `mc_ai::mcts::mcts_decide(state, player, budget, weights, seed) -> Vec` | Not present. `mc-ai/src/mcts.rs` exposes only `MctsEngine::select(&[Candidate], seed) -> Option` — a flat UCB1 bandit over already-scored `Candidate`s, not a lookahead engine. Its own module docstring says "flat MCTS — UCB1 bandit over the candidate action space with noisy evaluation". |
+| `translate_action_to_player_action(first, state, player)` "exists in dispatch" | Not present. `mc-player-api/src/dispatch.rs` has `translate_processor_events` (events, not actions) and `apply_ai_action` (consumes a `mc_ai::tactical::Action`). No `ActionKind`-to-`Action` mapping anywhere in the workspace. |
+| Real MCTS operates on `TacticalState` like `run_ai_turn` | The real tree MCTS (`mc-ai/src/mcts_tree.rs` → `Tree`) operates on `AbstractRolloutState` projected from `GameState`, not `TacticalState`. End-to-end integration test at `src/simulator/crates/mc-turn/tests/abstract_choose_action.rs` confirms the pipeline. |
+| Real MCTS returns actions consumable by `apply_ai_action` | `Tree::most_visited_action_at_root() -> Option`. `ActionKind` is one of 11 **strategic class labels** (Build / Attack / Settle / Research / Defend / Trade / ContinueWar / MakePeace / Idle / CommandFormation / SetRallyPoint). It carries no unit id, no target tile, no production choice — nothing `apply_ai_action` can dispatch. |
+
+#### The three action types (two of them named `ActionKind`) — easy to conflate, all different
+
+1. `mc_ai::policy::ActionKind` — 11 strategic labels emitted by tree MCTS.
+2. `mc_core::action::ActionKind` — unit verbs (`Fortify`, `Skip`, `FoundCity`, `BuildImprovement`, ...) consumed by `dispatch::invoke_unit_action`.
+3. `mc_ai::tactical::Action` — the concrete tactical-action enum carrying unit ids and target coords, consumed by `apply_ai_action`.
+
+No mapping (1) → (3) exists. Writing one is not a "translate" — it would be a tactical planner conditioned on a strategic gate, which is design work, not wiring.
+
+#### Why this triggers the brief's hard-stop
+
+The brief's hard-stops include:
+
+> - `mc_ai::mcts::mcts_decide` (or equivalent) doesn't have a public entry → ... STOP, document, exit.
+> - Translate function doesn't exist (mc_ai::Action → PlayerAction direction) → check `dispatch::apply_ai_action`; reuse its mapping table or replicate.
+
+Both fire. The first because the named entry simply isn't there. The second because what `most_visited_action_at_root` returns (`policy::ActionKind`) has no mapping into `tactical::Action`, and `apply_ai_action`'s table doesn't help — it's a tactical-action dispatcher, not a strategy decoder.
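To show why a flat UCB1 bandit is not a lookahead engine, here is a minimal sketch of the technique. It is not the code in `mc-ai/src/mcts.rs`: the function name, the plain `f64` scores standing in for `Candidate`s, the SplitMix64 noise source, and the exploration constant are all assumptions made for illustration.

```rust
// Hypothetical sketch of a flat UCB1 bandit. This is NOT the code in
// mc-ai/src/mcts.rs; every name below is invented for illustration.

/// Small deterministic generator (SplitMix64) so the sketch needs no crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform noise in [-0.5, 0.5).
    fn noise(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64 - 0.5
    }
}

/// Returns the index of the most-pulled candidate, or `None` when there is
/// nothing to choose from or no pulls are allowed.
fn flat_ucb1_select(scores: &[f64], pulls: u32, noise_scale: f64, seed: u64) -> Option<usize> {
    if scores.is_empty() || pulls == 0 {
        return None;
    }
    let mut rng = SplitMix64(seed);
    let mut visits = vec![0u32; scores.len()];
    let mut totals = vec![0.0f64; scores.len()];

    for t in 0..pulls {
        // Try every arm once, then pick the arm with the highest UCB1 value.
        let arm = match visits.iter().position(|&v| v == 0) {
            Some(unvisited) => unvisited,
            None => {
                let ln_t = f64::from(t).ln();
                let ucb = |i: usize| {
                    let n = f64::from(visits[i]);
                    totals[i] / n + (2.0 * ln_t / n).sqrt()
                };
                (0..scores.len())
                    .max_by(|&a, &b| ucb(a).total_cmp(&ucb(b)))
                    .unwrap_or(0)
            }
        };
        // "Noisy evaluation": the fixed candidate score plus seeded noise.
        let reward = scores[arm] + noise_scale * rng.noise();
        visits[arm] += 1;
        totals[arm] += reward;
    }

    // One level only: the answer is the most-visited arm, with no tree below it.
    (0..scores.len()).max_by_key(|&i| visits[i])
}
```

Nothing here simulates a future state. Each pull re-samples a score that was computed before the bandit started, so a "rollout budget" on such an engine only buys a more confident pick among existing candidates, not deeper search.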
+
+#### What does exist (for future work)
+
+- Real tree MCTS pipeline, end-to-end exercised:
+  - `mc_turn::abstract_projection::to_abstract_rollout_state(&GameState) -> AbstractRolloutState`
+  - `mc_ai::rollout::GameRolloutState::new(pod, priors)`
+  - `mc_ai::mcts_tree::Tree::new(root_state)` → `iterate_gpu_batched(BATCH, seed, None, &AiBackend::Cpu)` loop
+  - `tree.most_visited_action_at_root() -> Option`
+- Working integration test: `src/simulator/crates/mc-turn/tests/abstract_choose_action.rs::choose_action_via_abstract_default_priors_returns_action` — runs the full pipeline in pure Rust (no Godot runtime), `rollout_budget = 16`.
+- Wired into the game via `api-gdext::GdMcTreeController::choose_action_via_abstract` — which returns the action **name as a GString to GDScript**, not as a tactical action. GDScript or another layer is responsible for translating that strategic label into concrete unit moves.
+
+#### Recommended next-step shape (not in this session's scope)
+
+A real "MCTS-driven Claude" requires a strategic-gate pattern:
+
+1. Per turn, run `Tree` to pick an `ActionKind`.
+2. Pass that strategic gate to a **filtered** `decide_tactical_actions` that only emits tactical actions consistent with the chosen strategy (e.g. `Settle` → settle-and-protect-the-founder; `Attack` → movement biased toward enemy adjacency; `Build` → suppress military queue, prioritise infrastructure).
+3. Dispatch the filtered tactical chain through `apply_ai_action` as today.
+
+This is a new module (`mc_ai::strategic_gate` or similar), not a one-function wire. Roughly the same effort as the original `decide_tactical_actions`. Tracked as: **out of scope for p2-67 — open as new objective if pursued.**
+
+#### Verification performed (no compile, all by inspection)
+
+- Read `mc-ai/src/mcts.rs` end-to-end — flat UCB1, no `mcts_decide`.
+- Read `mc-ai/src/lib.rs` — public re-exports do not include any `mcts_decide`.
+- Read `mc-ai/src/mcts_tree.rs` — confirmed `Tree` returns `Option`.
+- Read `mc-turn/tests/abstract_choose_action.rs` — confirmed full pipeline runs in test scope.
+- Grepped `--include="*.rs" src/simulator/crates/` for `choose_action_via_abstract`, `fn translate`, `ActionKind ->` — only matches are the integration test, `translate_processor_events` (unrelated), and Godot SDK noise.
+- Confirmed `mc-player-api/Cargo.toml` already depends on `mc-ai` and `mc-turn` — wiring is not blocked by missing deps; it's blocked by the missing translation function.
+
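The strategic-gate shape recommended in the notes can be sketched as a filter over an already-planned tactical chain. Everything below is hypothetical: `Strategy` is a local copy of the 11 labels listed for `mc_ai::policy::ActionKind`, `Tactical` is a stand-in for `mc_ai::tactical::Action` with assumed fields, and treating `queue_unit` as "the military queue" is an assumption the source does not confirm.

```rust
/// The 11 strategic labels the notes list for `mc_ai::policy::ActionKind`.
/// Local copy for illustration; the real enum's definition may differ.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strategy {
    Build,
    Attack,
    Settle,
    Research,
    Defend,
    Trade,
    ContinueWar,
    MakePeace,
    Idle,
    CommandFormation,
    SetRallyPoint,
}

/// Hypothetical stand-in for `mc_ai::tactical::Action`. Variant names mirror
/// the action signatures seen in the run; the fields are assumptions.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq, Eq)]
enum Tactical {
    Move { unit: u32, to: (i32, i32) },
    QueueUnit { city: u32 },
    QueueBuilding { city: u32 },
    Fortify { unit: u32 },
}

/// Illustrative gate rules only: `Build` suppresses the (assumed) military
/// queue and `Idle` emits nothing. Every other label passes actions through.
fn allowed(gate: Strategy, action: &Tactical) -> bool {
    match (gate, action) {
        (Strategy::Idle, _) => false,
        (Strategy::Build, Tactical::QueueUnit { .. }) => false,
        _ => true,
    }
}

/// Keeps only the tactical actions consistent with the chosen strategy,
/// preserving their original order.
fn apply_gate(gate: Strategy, candidates: Vec<Tactical>) -> Vec<Tactical> {
    candidates.into_iter().filter(|a| allowed(gate, a)).collect()
}
```

A pure filter like this covers only the "suppress" half of the recommendation. Biasing movement toward enemy adjacency for `Attack`, or protecting a founder for `Settle`, needs the planner itself to take the gate as an input, which is why the notes size the work as a new module and not a wrapper.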