diff --git a/.project/objectives/p2-67-followup-mcts-tactical-state-impl.md b/.project/objectives/p2-67-followup-mcts-tactical-state-impl.md new file mode 100644 index 00000000..1018e0a7 --- /dev/null +++ b/.project/objectives/p2-67-followup-mcts-tactical-state-impl.md @@ -0,0 +1,187 @@ +--- +id: p2-67-followup-mcts-tactical-state-impl +title: "TreeState impl for TacticalState — wire real MCTS into the AI decision path" +priority: p2 +status: open +scope: game1 +category: simulation +owner: simulator-infra +created: 2026-05-13 +updated_at: 2026-05-13 +blocked_by: [] +follow_ups: [p2-67] +--- + +## Context + +After the night's bug-fix pass (Bugs 1-5 closed, simulation fully playable, last-survivor victory firing), the question "can Claude beat the hardest AI?" hit a deeper architectural finding: + +**There is no MCTS-driven AI anywhere in live code today.** Both the production Godot game (`GdMcTreeController`) and the bench harness (`full_game_transcript.rs`) run `mc_ai::run_ai_turn` — the deterministic tactical heuristic pipeline (movement → combat_predict → settle → production → citizen). The p2-69 port (commit `be088c3ad`) deleted the old `mc_turn::snapshot` MCTS path. The real tree MCTS at `mc-ai/src/mcts_tree.rs::Tree` exists but is vestigial — used only by `abstract_choose_action.rs` test, not wired into any live game decision. + +The 500-turn Claude-with-stronger-weights vs default-weights AIs run ended with Claude in 3rd place because the heuristic is largely weight-insensitive. Personality + seed is the only differentiator across all live AI today. + +## Goal + +Make a meaningful "Claude with MCTS lookahead vs AI with heuristic" experiment possible by implementing `TreeState for TacticalState` (or a thin wrapper) so the existing tree infrastructure can drive over the bench projector's output. + +## Source-of-truth rails + +- **Rust crate**: edit `mc-ai`. Possibly a new submodule `mc_ai::tactical::tree_state` for the impl. +- **JSON path**: none. 
+- **GDScript**: none in this objective. The production Godot bridge stays on `run_ai_turn` until p2-69 is revisited. + +## Locked decisions + +- **Don't change `TacticalState`'s shape**. The bench projector (`mc-player-api::project_tactical`) emits a stable shape; widening it would ripple into the projector + dispatch. +- **Implement `TreeState` for a wrapper type**, not `TacticalState` directly. Avoid trait impl in a foreign crate; wrap in `mc_ai::tactical::TacticalTreeState { inner: TacticalState, depth: u32, max_depth: u32, scoring: ScoringWeights }`. +- **Action type**: use `mc_ai::tactical::Action` (the existing 14-variant enum). Don't invent a parallel set. +- **Rollout strategy**: random-action-and-score. Use `decide_tactical_actions` to enumerate, sample one, apply, repeat to depth-cap, score via `ScoringWeights`-based heuristic at the end. + +## Surface + +### 1. New module `mc-ai/src/tactical/tree_state.rs` + +```rust +use crate::mcts_tree::TreeState; +use crate::tactical::{Action, TacticalState, decide_tactical_actions}; +use mc_core::scoring_weights::ScoringWeights; + +pub struct TacticalTreeState { + pub inner: TacticalState, + pub depth: u32, + pub max_depth: u32, + pub scoring: ScoringWeights, +} + +impl TreeState for TacticalTreeState { + type Action = Action; + + fn legal_actions(&self) -> Vec<Self::Action> { + if self.depth >= self.max_depth { return vec![]; } + // Walk decide_tactical_actions and enumerate the full Vec<Action>. + // (decide_tactical_actions already returns the heuristic-prioritised + // chain; for MCTS, we want the full legal set, but for v1 the + // tactical pipeline's output IS a reasonable approximation.) + // Future: enumerate exhaustively via per-unit / per-city iteration. 
+ let mut rng = XorShift64::new(self.depth as u64); + decide_tactical_actions(&self.inner, &self.scoring, &mut rng) + } + + fn apply(&self, action: &Self::Action) -> Self { + let mut next_inner = self.inner.clone(); + apply_tactical_action(&mut next_inner, action); // new helper + Self { + inner: next_inner, + depth: self.depth + 1, + max_depth: self.max_depth, + scoring: self.scoring.clone(), + } + } + + fn rollout(&self, rng: &mut XorShift64, horizon: u32, _temperature: f32, root_player: u8) -> f32 { + // Walk forward `horizon` steps applying random legal actions; at terminal, score. + let mut current = self.inner.clone(); + for _ in 0..horizon { + let actions = decide_tactical_actions(&current, &self.scoring, rng); + if actions.is_empty() { break; } + let idx = (rng.next_u64() as usize) % actions.len(); + apply_tactical_action(&mut current, &actions[idx]); + } + score_for_player(&current, root_player, &self.scoring) // new helper, [0, 1] + } +} +``` + +### 2. New helper `mc-ai::tactical::apply_tactical_action` + +Mirror what `apply_ai_action` does in `mc-player-api/src/dispatch.rs`, but operate on `TacticalState` directly (no GameState round-trip). For each `Action` variant, apply the corresponding `TacticalState` mutation: +- `MoveUnit { unit_id, to }` — find unit, update position. +- `AttackTarget { attacker_id, target_id }` — apply damage (simplified vs full mc-combat). +- `FoundCity { settler_id, at_hex }` — consume settler, push city. +- `EnqueueBuild { city_id, item_id }` — find city, set queue slot. +- ...all 14 variants. + +This is the bulk of the work — ~14 small state mutations. + +### 3. New helper `mc-ai::tactical::score_for_player` + +Aggregate a [0, 1] reward from `TacticalState` for a given player. Use ScoringWeights to weight: +- Cities owned (Settle) +- Units alive (Build/Defend) +- Enemy units killed since rollout start (Attack) +- Score axes already in `TacticalState.strategic_axes` + +### 4. 
Bench wiring + +In `mc-player-api/tests/full_game_transcript.rs`, replace the existing `pick_claude_action_mcts` policy with: + +```rust +fn pick_claude_action_real_mcts(state: &GameState, player: PlayerId, seed: u64) -> PlayerAction { + let tactical = project_tactical(state, player); + let wrapper = TacticalTreeState { + inner: tactical, + depth: 0, + max_depth: 5, + scoring: ScoringWeights::default(), + }; + let mut tree = Tree::new(wrapper); + let mut rng = XorShift64::new(seed); + for _ in 0..1000 { // budget + tree.iterate(&mut rng); + } + // Generic Tree::most_visited_action_at_root doesn't exist; inline it. + let best_action = walk_root_children_pick_most_visits(&tree); + translate_to_player_action(state, player, best_action) +} +``` + +### 5. Generic `most_visited_action_at_root` for `Tree` + +The existing one is specialized on `Tree`. Add a generic version. Should be a 10-line extension at `mcts_tree.rs`. + +### 6. New `claude_real_mcts_vs_ai_transcript` test + +`#[ignore]`'d, 500-turn run. Output to `.local/demo-runs/2026-05-XX-claude-real-mcts/`. Recap reports winner + Claude's MCTS action-frequency distribution. + +## Acceptance + +- ☐ `TacticalTreeState` exists with `TreeState` impl. +- ☐ `apply_tactical_action` covers all 14 `Action` variants. +- ☐ `score_for_player` returns [0, 1] reward. +- ☐ Generic `most_visited_action_at_root` added to `Tree`. +- ☐ `claude_real_mcts_vs_ai_transcript` test runs in < 5 minutes wall clock. +- ☐ Claude with 1000-rollout budget MCTS shows meaningful action diversity vs the heuristic baseline (action-frequency table differs). +- ☐ Result reported: did Claude win? Top 3? Bottom? +- ☐ Determinism: same seed → byte-identical transcript across two runs. + +## Why this size + +- TacticalTreeState impl: ~1 day +- apply_tactical_action across 14 variants: ~3 days (each variant is small but there's interaction with TacticalState's internal coherence — e.g. 
unit movement updates positions; FoundCity updates city array; combat updates HP) +- score_for_player: ~1 day +- Generic most_visited_action_at_root: ~2 hours +- Bench wiring + test: ~1 day +- Determinism + tuning: ~1 day + +**Total: 1-2 weeks** for a working skeleton. Production-quality (exhaustive legal-action enumeration, rich rollout heuristics, balanced scoring) extends to 4-6 weeks. + +## Unblocks + +Real answer to "can Claude beat the hardest AI?" — by giving Claude actual search-depth advantage that AI slots don't have. + +Also: if MCTS-driven Claude consistently wins, that validates the simulation's score function as a reliable reward signal — which is itself a step toward a better production AI later. + +## Risks + +- TacticalState clone is expensive (it owns three `Vec` collections). 1000 rollouts × 5-depth = 5000 clones per turn. May be too slow for real-time bench. Mitigation: shallow rollout depth (3) or limited-state-snapshot rollouts. +- Heuristic enumeration via `decide_tactical_actions` returns a CURATED list, not exhaustive legal actions. MCTS exploring only that list explores less than the true game tree — may not surface MCTS-vs-heuristic differential. Mitigation: write an exhaustive enumerator alongside. +- ScoringWeights tuning matters more than search depth at low budgets. The 500-turn run already proved weight changes barely affect heuristic output. May be true for MCTS too. + +## References + +- `src/simulator/crates/mc-ai/src/mcts_tree.rs::TreeState` trait (lines 13-58). +- `src/simulator/crates/mc-ai/src/tactical/mod.rs::Action` enum (14 variants, lines 55-196). +- `src/simulator/crates/mc-ai/src/rollout.rs::GameRolloutState` — existing wrapper for the GPU rollout path; reference for the wrapper pattern. +- `src/simulator/crates/mc-player-api/src/dispatch.rs::apply_ai_action` — reference for `Action` → state mutation mapping. +- `.project/objectives/p2-67-claude-player-api.md` — context (Real-game analysis section). 
+- `.project/objectives/p2-69-api-gdext-mctscontroller-port.md` — where MCTS was removed from production.