feat(@projects/@magic-civilization): ✨ implement mcts tree state wrapper for tactical state

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-13 11:29:38 -07:00 · 2026-05-13 11:29:38 -07:00 · f1ef762aa8
commit f1ef762aa8
parent 8e0ad24aca
1 changed files with 187 additions and 0 deletions
--- a/.project/objectives/p2-67-followup-mcts-tactical-state-impl.md
+++ b/.project/objectives/p2-67-followup-mcts-tactical-state-impl.md
@@ -0,0 +1,187 @@
+---
+id: p2-67-followup-mcts-tactical-state-impl
+title: "TreeState impl for TacticalState — wire real MCTS into the AI decision path"
+priority: p2
+status: open
+scope: game1
+category: simulation
+owner: simulator-infra
+created: 2026-05-13
+updated_at: 2026-05-13
+blocked_by: []
+follow_ups: [p2-67]
+---
+
+## Context
+
+After the night's bug-fix pass (Bugs 1-5 closed, simulation fully playable, last-survivor victory firing), the question "can Claude beat the hardest AI?" surfaced a deeper architectural finding:
+
+**There is no MCTS-driven AI anywhere in live code today.** Both the production Godot game (`GdMcTreeController`) and the bench harness (`full_game_transcript.rs`) run `mc_ai::run_ai_turn`, the deterministic tactical heuristic pipeline (movement → combat_predict → settle → production → citizen). The p2-69 port (commit `be088c3ad`) deleted the old `mc_turn::snapshot` MCTS path. The real tree MCTS at `mc-ai/src/mcts_tree.rs::Tree<S: TreeState>` exists but is vestigial: it is used only by the `abstract_choose_action.rs` test and is not wired into any live game decision.
+
+In the 500-turn run pitting Claude (stronger weights) against default-weight AIs, Claude finished 3rd, because the heuristic is largely weight-insensitive. Personality and seed are the only differentiators across all live AI today.
+
+## Goal
+
+Make a meaningful "Claude with MCTS lookahead vs AI with heuristic" experiment possible by implementing `TreeState for TacticalState` (or a thin wrapper) so the existing tree infrastructure can search over the bench projector's output.
+
+## Source-of-truth rails
+
+- **Rust crate**: edit `mc-ai`. Possibly a new submodule `mc_ai::tactical::tree_state` for the impl.
+- **JSON path**: none.
+- **GDScript**: none in this objective. The production Godot bridge stays on `run_ai_turn` until p2-69 is revisited.
+
+## Locked decisions
+
+- **Don't change `TacticalState`'s shape**. The bench projector (`mc-player-api::project_tactical`) emits a stable shape; widening it would ripple into the projector + dispatch.
+- **Implement `TreeState` for a wrapper type**, not `TacticalState` directly. This avoids a trait impl in a foreign crate and keeps search bookkeeping out of `TacticalState`; wrap in `mc_ai::tactical::TacticalTreeState { inner: TacticalState, depth: u32, max_depth: u32, scoring: ScoringWeights }`.
+- **Action type**: use `mc_ai::tactical::Action` (the existing 14-variant enum). Don't invent a parallel set.
+- **Rollout strategy**: random-action-and-score. Use `decide_tactical_actions` to enumerate, sample one, apply, repeat to depth-cap, score via `ScoringWeights`-based heuristic at the end.
+
+## Surface
+
+### 1. New module `mc-ai/src/tactical/tree_state.rs`
+
+```rust
+use crate::mcts_tree::TreeState;
+use crate::tactical::{
+    apply_tactical_action, // new helper (section 2)
+    decide_tactical_actions,
+    score_for_player, // new helper (section 3)
+    Action, TacticalState,
+};
+use crate::rng::XorShift64; // assumed path: match the import used by mcts_tree.rs
+use mc_core::scoring_weights::ScoringWeights;
+
+/// Search-time wrapper around the projector's `TacticalState`.
+/// Depth bookkeeping and scoring weights live here so `TacticalState`'s shape stays untouched.
+#[derive(Clone)]
+pub struct TacticalTreeState {
+    pub inner: TacticalState,
+    pub depth: u32,
+    pub max_depth: u32,
+    pub scoring: ScoringWeights,
+}
+
+impl TreeState for TacticalTreeState {
+    type Action = Action;
+
+    fn legal_actions(&self) -> Vec<Self::Action> {
+        // Depth cap reached: treat the node as terminal.
+        if self.depth >= self.max_depth {
+            return vec![];
+        }
+        // v1: reuse the heuristic pipeline's prioritised chain as the "legal" set.
+        // For MCTS we want the full legal set, but the tactical pipeline's output
+        // IS a reasonable approximation for now.
+        // Future: enumerate exhaustively via per-unit / per-city iteration.
+        // Seed from depth so expansion is deterministic; +1 keeps the xorshift
+        // state non-zero (zero is a fixed point of xorshift).
+        let mut rng = XorShift64::new(self.depth as u64 + 1);
+        decide_tactical_actions(&self.inner, &self.scoring, &mut rng)
+    }
+
+    fn apply(&self, action: &Self::Action) -> Self {
+        // Clone-then-mutate so parent nodes stay immutable.
+        let mut next_inner = self.inner.clone();
+        apply_tactical_action(&mut next_inner, action);
+        Self {
+            inner: next_inner,
+            depth: self.depth + 1,
+            max_depth: self.max_depth,
+            scoring: self.scoring.clone(),
+        }
+    }
+
+    fn rollout(&self, rng: &mut XorShift64, horizon: u32, _temperature: f32, root_player: u8) -> f32 {
+        // Walk forward up to `horizon` steps, applying a uniformly random action
+        // from the enumerated set; stop early when no actions remain, then score.
+        let mut current = self.inner.clone();
+        for _ in 0..horizon {
+            let actions = decide_tactical_actions(&current, &self.scoring, rng);
+            if actions.is_empty() {
+                break;
+            }
+            let idx = (rng.next_u64() % actions.len() as u64) as usize;
+            apply_tactical_action(&mut current, &actions[idx]);
+        }
+        // Reward in [0, 1] from the root player's perspective.
+        score_for_player(&current, root_player, &self.scoring)
+    }
+}
+```
+
+### 2. New helper `mc-ai::tactical::apply_tactical_action`
+
+Mirror what `apply_ai_action` does in `mc-player-api/src/dispatch.rs`, but operate on `TacticalState` directly (no `GameState` round-trip). For each `Action` variant, apply the corresponding `TacticalState` mutation:
+- `MoveUnit { unit_id, to }` — find unit, update position.
+- `AttackTarget { attacker_id, target_id }` — apply damage (simplified vs full mc-combat).
+- `FoundCity { settler_id, at_hex }` — consume settler, push city.
+- `EnqueueBuild { city_id, item_id }` — find city, set queue slot.
+- ...all 14 variants.
+
+This is the bulk of the work — ~14 small state mutations.
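A minimal sketch of the dispatch shape, assuming hypothetical trimmed-down types: the field names (`units`, `cities`, `hp`, `attack`, `is_settler`, `queue`, `next_city_id`) and the flat-damage combat rule are placeholders rather than the real `TacticalState` API, and only four of the 14 variants are shown.

```rust
/// Hypothetical, trimmed-down stand-ins for the real tactical types.
#[derive(Clone, Debug, PartialEq)]
pub struct Unit { pub id: u32, pub owner: u8, pub pos: (i32, i32), pub hp: i32, pub attack: i32, pub is_settler: bool }

#[derive(Clone, Debug, PartialEq)]
pub struct City { pub id: u32, pub owner: u8, pub pos: (i32, i32), pub queue: Option<u32> }

#[derive(Clone, Debug, Default)]
pub struct TacticalState { pub units: Vec<Unit>, pub cities: Vec<City>, pub next_city_id: u32 }

#[derive(Clone, Debug)]
pub enum Action {
    MoveUnit { unit_id: u32, to: (i32, i32) },
    AttackTarget { attacker_id: u32, target_id: u32 },
    FoundCity { settler_id: u32, at_hex: (i32, i32) },
    EnqueueBuild { city_id: u32, item_id: u32 },
}

/// Applies one action in place. Unknown ids are ignored so a stale action
/// replayed during a rollout cannot panic.
pub fn apply_tactical_action(state: &mut TacticalState, action: &Action) {
    match action {
        Action::MoveUnit { unit_id, to } => {
            if let Some(u) = state.units.iter_mut().find(|u| u.id == *unit_id) {
                u.pos = *to;
            }
        }
        Action::AttackTarget { attacker_id, target_id } => {
            // Simplified combat (placeholder): flat damage equal to the attacker's attack stat.
            let dmg = match state.units.iter().find(|u| u.id == *attacker_id) {
                Some(a) => a.attack,
                None => return,
            };
            if let Some(t) = state.units.iter_mut().find(|u| u.id == *target_id) {
                t.hp -= dmg;
            }
            // Dead units leave the board so later lookups stay coherent.
            state.units.retain(|u| u.hp > 0);
        }
        Action::FoundCity { settler_id, at_hex } => {
            // Consume the settler, then push a new city owned by the same player.
            if let Some(i) = state.units.iter().position(|u| u.id == *settler_id && u.is_settler) {
                let settler = state.units.remove(i);
                state.cities.push(City { id: state.next_city_id, owner: settler.owner, pos: *at_hex, queue: None });
                state.next_city_id += 1;
            }
        }
        Action::EnqueueBuild { city_id, item_id } => {
            if let Some(c) = state.cities.iter_mut().find(|c| c.id == *city_id) {
                c.queue = Some(*item_id);
            }
        }
    }
}
```

Leaving out a wildcard `_ =>` arm means the compiler rejects the match until every `Action` variant is handled, which turns the "all 14 variants" acceptance item into a compile-time check.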
+
+### 3. New helper `mc-ai::tactical::score_for_player`
+
+Aggregate a [0, 1] reward from `TacticalState` for a given player. Use ScoringWeights to weight:
+- Cities owned (Settle)
+- Units alive (Build/Defend)
+- Enemy units killed since rollout start (Attack)
+- Score axes already in `TacticalState.strategic_axes`
+
+### 4. Bench wiring
+
+In `mc-player-api/tests/full_game_transcript.rs`, replace the existing `pick_claude_action_mcts` policy with:
+
+```rust
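+use mc_ai::mcts_tree::Tree;
+use mc_ai::tactical::tree_state::TacticalTreeState;
+use mc_core::scoring_weights::ScoringWeights;
+use mc_player_api::project_tactical;
+// GameState, PlayerId, PlayerAction and XorShift64 come from their existing crates.
+
+/// Search depth cap and per-decision iteration budget for the v1 experiment.
+const MCTS_MAX_DEPTH: u32 = 5;
+const MCTS_ITERATIONS: u32 = 1000;
+
+fn pick_claude_action_real_mcts(state: &GameState, player: PlayerId, seed: u64) -> PlayerAction {
+    // Project the full game state down to the tactical view the tree searches over.
+    let tactical = project_tactical(state, player);
+    let wrapper = TacticalTreeState {
+        inner: tactical,
+        depth: 0,
+        max_depth: MCTS_MAX_DEPTH,
+        scoring: ScoringWeights::default(),
+    };
+    let mut tree = Tree::new(wrapper);
+    // Seeded RNG keeps the transcript deterministic for a given seed.
+    let mut rng = XorShift64::new(seed);
+    for _ in 0..MCTS_ITERATIONS {
+        tree.iterate(&mut rng);
+    }
+    // Generic Tree<S>::most_visited_action_at_root doesn't exist yet (section 5); inline it.
+    let best_action = walk_root_children_pick_most_visits(&tree);
+    // New helper: map the tactical Action back onto a PlayerAction.
+    translate_to_player_action(state, player, best_action)
+}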
+```
+
+### 5. Generic `most_visited_action_at_root` for `Tree<S>`
+
+The existing one is specialized on `Tree<GameRolloutState>`. Add a generic version; it should be a ~10-line extension in `mcts_tree.rs`.
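A sketch of the generic helper, assuming a hypothetical arena layout: `nodes[0]` is the root, and each node stores the action that led to it, a visit count, and child indices. The real `Tree<S>` and `TreeState` carry more than shown here, so treat the trait body and field names as placeholders.

```rust
/// Hypothetical minimal trait; the real TreeState has more methods.
pub trait TreeState {
    type Action: Clone;
}

/// Hypothetical arena node layout; the real Tree<S> may store nodes differently.
pub struct Node<S: TreeState> {
    pub action: Option<S::Action>, // None at the root
    pub visits: u32,
    pub children: Vec<usize>, // indices into Tree::nodes
}

pub struct Tree<S: TreeState> {
    pub nodes: Vec<Node<S>>, // nodes[0] is the root
}

impl<S: TreeState> Tree<S> {
    /// Action of the root child with the most visits, or None if the root
    /// has not been expanded. Ties go to the earliest-expanded child.
    pub fn most_visited_action_at_root(&self) -> Option<S::Action> {
        let root = self.nodes.first()?;
        let mut best: Option<&Node<S>> = None;
        for &idx in &root.children {
            let child = &self.nodes[idx];
            // Strict `>` keeps the first maximum on ties.
            if best.map_or(true, |b| child.visits > b.visits) {
                best = Some(child);
            }
        }
        best.and_then(|n| n.action.clone())
    }
}
```

The explicit loop is deliberate: `Iterator::max_by_key` returns the last maximum on ties, which would silently flip the chosen action whenever child order changes.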
+
+### 6. New `claude_real_mcts_vs_ai_transcript` test
+
+`#[ignore]`'d, 500-turn run. Output to `.local/demo-runs/2026-05-XX-claude-real-mcts/`. Recap reports winner + Claude's MCTS action-frequency distribution.
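For the recap's action-frequency distribution, a sketch that tallies per-variant counts. It assumes each chosen `Action` has already been mapped to a variant label such as `"MoveUnit"`; that mapping is hypothetical and not shown.

```rust
use std::collections::BTreeMap;

/// Counts how often each action kind was chosen. BTreeMap iterates in key
/// order, so the rendered table is stable across runs; std's HashMap uses a
/// randomly seeded hasher and could reorder rows between processes.
pub fn action_frequencies<'a, I>(kinds: I) -> BTreeMap<&'a str, u32>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut counts = BTreeMap::new();
    for kind in kinds {
        *counts.entry(kind).or_insert(0) += 1;
    }
    counts
}

/// Renders one "kind count pct%" row per action kind for the recap file.
pub fn render_frequency_table(counts: &BTreeMap<&str, u32>) -> String {
    let total: u32 = counts.values().sum();
    let mut out = String::new();
    for (kind, n) in counts {
        let pct = if total == 0 { 0.0 } else { 100.0 * *n as f64 / total as f64 };
        out.push_str(&format!("{kind:<16} {n:>6} {pct:>6.1}%\n"));
    }
    out
}
```

The same table rendered for the heuristic baseline gives the side-by-side comparison the acceptance criteria ask for.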
+
+## Acceptance
+
+- ☐ `TacticalTreeState` exists with `TreeState` impl.
+- ☐ `apply_tactical_action` covers all 14 `Action` variants.
+- ☐ `score_for_player` returns [0, 1] reward.
+- ☐ Generic `most_visited_action_at_root<S>` added to `Tree<S>`.
+- ☐ `claude_real_mcts_vs_ai_transcript` test runs in < 5 minutes wall clock.
+- ☐ MCTS-driven Claude with a 1000-rollout budget shows meaningful action diversity relative to the heuristic baseline (its action-frequency table differs).
+- ☐ Result reported: did Claude win? Top 3? Bottom?
+- ☐ Determinism: same seed → byte-identical transcript across two runs.
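A test-side sketch for the determinism criterion, assuming the transcript is rendered to a `String` by a seeded runner closure; `first_divergence` and `check_deterministic` are hypothetical names. Reporting the first diverging byte narrows down where nondeterminism crept in, instead of only failing the comparison.

```rust
/// Index of the first byte where `a` and `b` differ, or None if identical.
/// When one is a strict prefix of the other, the shorter length is returned.
pub fn first_divergence(a: &[u8], b: &[u8]) -> Option<usize> {
    a.iter()
        .zip(b)
        .position(|(x, y)| x != y)
        .or_else(|| if a.len() == b.len() { None } else { Some(a.len().min(b.len())) })
}

/// Runs the same seeded transcript twice and requires byte-identical output.
pub fn check_deterministic<F: Fn(u64) -> String>(run: F, seed: u64) -> Result<(), String> {
    let first = run(seed);
    let second = run(seed);
    match first_divergence(first.as_bytes(), second.as_bytes()) {
        None => Ok(()),
        Some(at) => Err(format!("transcripts diverge at byte {at} for seed {seed}")),
    }
}
```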
+
+## Why this size
+
+- TacticalTreeState impl: ~1 day
+- apply_tactical_action across 14 variants: ~3 days (each variant is small but there's interaction with TacticalState's internal coherence — e.g. unit movement updates positions; FoundCity updates city array; combat updates HP)
+- score_for_player: ~1 day
+- Generic most_visited_action_at_root: ~2 hours
+- Bench wiring + test: ~1 day
+- Determinism + tuning: ~1 day
+
+**Total: 1-2 weeks** for a working skeleton. Production-quality (exhaustive legal-action enumeration, rich rollout heuristics, balanced scoring) extends to 4-6 weeks.
+
+## Unblocks
+
+A real answer to "can Claude beat the hardest AI?": give Claude an actual search-depth advantage that the AI slots don't have.
+
+Also: if MCTS-driven Claude consistently wins, that validates the simulation's score function as a reliable reward signal — which is itself a step toward a better production AI later.
+
+## Risks
+
+- `TacticalState` clones are expensive (`Vec<Unit>`, `Vec<City>`, `Vec<Tile>`). 1000 rollouts × depth 5 = 5000 clones per turn, which may be too slow for a real-time bench. Mitigation: shallower rollout depth (3) or limited-state-snapshot rollouts.
+- Heuristic enumeration via `decide_tactical_actions` returns a CURATED list, not the exhaustive set of legal actions. An MCTS that explores only that list covers less than the true game tree and may not surface an MCTS-vs-heuristic differential. Mitigation: write an exhaustive enumerator alongside it.
+- `ScoringWeights` tuning matters more than search depth at low budgets. The 500-turn run already showed that weight changes barely affect heuristic output; the same may hold for MCTS.
+
+## References
+
+- `src/simulator/crates/mc-ai/src/mcts_tree.rs::TreeState` trait (lines 13-58).
+- `src/simulator/crates/mc-ai/src/tactical/mod.rs::Action` enum (14 variants, lines 55-196).
+- `src/simulator/crates/mc-ai/src/rollout.rs::GameRolloutState` — existing wrapper for the GPU rollout path; reference for the wrapper pattern.
+- `src/simulator/crates/mc-player-api/src/dispatch.rs::apply_ai_action` — reference for `Action` → state mutation mapping.
+- `.project/objectives/p2-67-claude-player-api.md` — context (Real-game analysis section).
+- `.project/objectives/p2-69-api-gdext-mctscontroller-port.md` — where MCTS was removed from production.