feat(@projects/@magic-civilization): implement mcts tree state wrapper for tactical state

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
Natalie 2026-05-13 11:29:38 -07:00
parent 8e0ad24aca
commit f1ef762aa8


@ -0,0 +1,187 @@
---
id: p2-67-followup-mcts-tactical-state-impl
title: "TreeState impl for TacticalState — wire real MCTS into the AI decision path"
priority: p2
status: open
scope: game1
category: simulation
owner: simulator-infra
created: 2026-05-13
updated_at: 2026-05-13
blocked_by: []
follow_ups: [p2-67]
---
## Context
After the night's bug-fix pass (Bugs 1-5 closed, simulation fully playable, last-survivor victory firing), the question "can Claude beat the hardest AI?" surfaced a deeper architectural finding:
**There is no MCTS-driven AI anywhere in live code today.** Both the production Godot game (`GdMcTreeController`) and the bench harness (`full_game_transcript.rs`) run `mc_ai::run_ai_turn`, the deterministic tactical heuristic pipeline (movement → combat_predict → settle → production → citizen). The p2-69 port (commit `be088c3ad`) deleted the old `mc_turn::snapshot` MCTS path. The real tree MCTS at `mc-ai/src/mcts_tree.rs::Tree<S: TreeState>` still exists but is vestigial: it is used only by the `abstract_choose_action.rs` test and is not wired into any live game decision.
The 500-turn run pitting Claude with stronger weights against default-weight AIs ended with Claude in 3rd place because the heuristic is largely weight-insensitive. Personality and seed are the only differentiators across all live AI today.
## Goal
Make a meaningful "Claude with MCTS lookahead vs AI with heuristic" experiment possible by implementing `TreeState for TacticalState` (or a thin wrapper) so the existing tree infrastructure can search over the bench projector's output.
## Source-of-truth rails
- **Rust crate**: edit `mc-ai`. Possibly a new submodule `mc_ai::tactical::tree_state` for the impl.
- **JSON path**: none.
- **GDScript**: none in this objective. The production Godot bridge stays on `run_ai_turn` until p2-69 is revisited.
## Locked decisions
- **Don't change `TacticalState`'s shape**. The bench projector (`mc-player-api::project_tactical`) emits a stable shape; widening it would ripple into the projector + dispatch.
- **Implement `TreeState` for a wrapper type**, not `TacticalState` directly. Avoid trait impl in a foreign crate; wrap in `mc_ai::tactical::TacticalTreeState { inner: TacticalState, depth: u32, max_depth: u32, scoring: ScoringWeights }`.
- **Action type**: use `mc_ai::tactical::Action` (the existing 14-variant enum). Don't invent a parallel set.
- **Rollout strategy**: random-action-and-score. Use `decide_tactical_actions` to enumerate, sample one, apply, repeat to depth-cap, score via `ScoringWeights`-based heuristic at the end.
## Surface
### 1. New module `mc-ai/src/tactical/tree_state.rs`
```rust
use crate::mcts_tree::TreeState;
use crate::tactical::{
    apply_tactical_action, decide_tactical_actions, score_for_player, Action, TacticalState,
};
use mc_core::scoring_weights::ScoringWeights;
// `XorShift64` must also be in scope; its module path is not pinned down in this objective.

pub struct TacticalTreeState {
    pub inner: TacticalState,
    pub depth: u32,
    pub max_depth: u32,
    pub scoring: ScoringWeights,
}

impl TreeState for TacticalTreeState {
    type Action = Action;

    fn legal_actions(&self) -> Vec<Self::Action> {
        // Past the depth cap the node is terminal for search purposes.
        if self.depth >= self.max_depth {
            return vec![];
        }
        // decide_tactical_actions returns the heuristic-prioritised chain.
        // MCTS wants the full legal set, but for v1 the tactical pipeline's
        // output is a reasonable approximation.
        // Future: enumerate exhaustively via per-unit / per-city iteration.
        // Seeding from depth keeps the list deterministic per ply; depth 0 yields
        // seed 0, which assumes XorShift64::new tolerates a zero seed.
        let mut rng = XorShift64::new(self.depth as u64);
        decide_tactical_actions(&self.inner, &self.scoring, &mut rng)
    }

    fn apply(&self, action: &Self::Action) -> Self {
        let mut next_inner = self.inner.clone();
        apply_tactical_action(&mut next_inner, action); // new helper
        Self {
            inner: next_inner,
            depth: self.depth + 1,
            max_depth: self.max_depth,
            scoring: self.scoring.clone(),
        }
    }

    fn rollout(
        &self,
        rng: &mut XorShift64,
        horizon: u32,
        _temperature: f32,
        root_player: u8,
    ) -> f32 {
        // Walk forward up to `horizon` steps applying random legal actions, then score.
        // Clone the inner state once; the rollout mutates that copy in place.
        let mut current = self.inner.clone();
        for _ in 0..horizon {
            let actions = decide_tactical_actions(&current, &self.scoring, rng);
            if actions.is_empty() {
                break;
            }
            let idx = (rng.next_u64() as usize) % actions.len();
            apply_tactical_action(&mut current, &actions[idx]);
        }
        score_for_player(&current, root_player, &self.scoring) // new helper, [0, 1]
    }
}
```
### 2. New helper `mc-ai::tactical::apply_tactical_action`
Mirror what `apply_ai_action` does in `mc-player-api/src/dispatch.rs`, but operate on `TacticalState` directly (no `GameState` round-trip). For each `Action` variant, apply the corresponding `TacticalState` mutation:
- `MoveUnit { unit_id, to }` — find unit, update position.
- `AttackTarget { attacker_id, target_id }` — apply damage (simplified vs full mc-combat).
- `FoundCity { settler_id, at_hex }` — consume settler, push city.
- `EnqueueBuild { city_id, item_id }` — find city, set queue slot.
- ...all 14 variants.
This is the bulk of the work — ~14 small state mutations.
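The four variants listed above could look roughly like the sketch below. Every type here is a simplified, hypothetical stand-in: the real `TacticalState`, `Unit`, `City`, and `Action` fields are not shown in this objective, the flat `SKETCH_DAMAGE` replaces real combat resolution, and the city-id scheme is invented for illustration.

```rust
// Hypothetical, simplified stand-ins for the real mc-ai types (field names are assumptions).
#[derive(Clone, Debug, PartialEq)]
pub struct Unit { pub id: u32, pub pos: (i32, i32), pub hp: i32 }

#[derive(Clone, Debug, PartialEq)]
pub struct City { pub id: u32, pub pos: (i32, i32), pub queue: Option<u32> }

#[derive(Clone, Debug, Default, PartialEq)]
pub struct TacticalState { pub units: Vec<Unit>, pub cities: Vec<City> }

// Only 4 of the 14 variants are sketched here.
#[derive(Clone, Debug)]
pub enum Action {
    MoveUnit { unit_id: u32, to: (i32, i32) },
    AttackTarget { attacker_id: u32, target_id: u32 },
    FoundCity { settler_id: u32, at_hex: (i32, i32) },
    EnqueueBuild { city_id: u32, item_id: u32 },
}

/// Flat damage used instead of full mc-combat resolution (an assumption).
const SKETCH_DAMAGE: i32 = 10;

/// Mutates `state` in place; actions that reference missing ids are ignored.
pub fn apply_tactical_action(state: &mut TacticalState, action: &Action) {
    match action {
        Action::MoveUnit { unit_id, to } => {
            if let Some(u) = state.units.iter_mut().find(|u| u.id == *unit_id) {
                u.pos = *to;
            }
        }
        Action::AttackTarget { attacker_id, target_id } => {
            // Skip the attack if the attacker no longer exists.
            if !state.units.iter().any(|u| u.id == *attacker_id) {
                return;
            }
            if let Some(t) = state.units.iter_mut().find(|u| u.id == *target_id) {
                t.hp -= SKETCH_DAMAGE;
            }
            // Drop dead units so later lookups stay coherent.
            state.units.retain(|u| u.hp > 0);
        }
        Action::FoundCity { settler_id, at_hex } => {
            if let Some(idx) = state.units.iter().position(|u| u.id == *settler_id) {
                state.units.remove(idx); // the settler is consumed
                let id = state.cities.iter().map(|c| c.id).max().map_or(0, |m| m + 1);
                state.cities.push(City { id, pos: *at_hex, queue: None });
            }
        }
        Action::EnqueueBuild { city_id, item_id } => {
            if let Some(c) = state.cities.iter_mut().find(|c| c.id == *city_id) {
                c.queue = Some(*item_id);
            }
        }
    }
}
```

Ignoring actions with stale ids, rather than panicking, is a design choice for rollouts: a randomly sampled action chain can reference a unit that an earlier step removed.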
### 3. New helper `mc-ai::tactical::score_for_player`
Aggregate a [0, 1] reward from `TacticalState` for a given player. Use ScoringWeights to weight:
- Cities owned (Settle)
- Units alive (Build/Defend)
- Enemy units killed since rollout start (Attack)
- Score axes already in `TacticalState.strategic_axes`
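One way to keep the reward inside [0, 1] is to report the player's share of the weighted total across all players. The sketch below assumes hypothetical `SketchWeights` and `PlayerTally` types, since the real `ScoringWeights` fields and the `strategic_axes` layout are not shown here; the share-of-total normalisation is likewise an illustrative choice, not a locked decision.

```rust
/// Hypothetical per-axis weights; the real `ScoringWeights` fields may differ.
#[derive(Clone, Debug)]
pub struct SketchWeights { pub settle: f32, pub build: f32, pub attack: f32 }

/// Hypothetical per-player tallies that would be read off a `TacticalState`.
#[derive(Clone, Debug, Default)]
pub struct PlayerTally { pub cities: u32, pub units: u32, pub kills: u32 }

/// Returns `player`'s share of the weighted total, always within [0, 1].
/// Panics if `player` is out of range for `tallies`.
pub fn score_for_player(tallies: &[PlayerTally], player: usize, w: &SketchWeights) -> f32 {
    // Weighted raw score for one player, floored at zero so shares stay non-negative.
    let raw = |t: &PlayerTally| {
        (w.settle * t.cities as f32 + w.build * t.units as f32 + w.attack * t.kills as f32)
            .max(0.0)
    };
    let mine = raw(&tallies[player]);
    let total: f32 = tallies.iter().map(|t| raw(t)).sum();
    if total <= 0.0 {
        // Nobody has anything yet: fall back to an equal share.
        1.0 / tallies.len() as f32
    } else {
        (mine / total).clamp(0.0, 1.0)
    }
}
```

A share-based reward makes rollouts zero-sum across players, which suits a last-survivor victory condition, but it also means a player's score can fall without that player losing anything.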
### 4. Bench wiring
In `mc-player-api/tests/full_game_transcript.rs`, replace the existing `pick_claude_action_mcts` policy with:
```rust
fn pick_claude_action_real_mcts(state: &GameState, player: PlayerId, seed: u64) -> PlayerAction {
    let tactical = project_tactical(state, player);
    let wrapper = TacticalTreeState {
        inner: tactical,
        depth: 0,
        max_depth: 5,
        scoring: ScoringWeights::default(),
    };
    let mut tree = Tree::new(wrapper);
    let mut rng = XorShift64::new(seed);
    for _ in 0..1000 { // budget
        tree.iterate(&mut rng);
    }
    // Generic Tree<S>::most_visited_action_at_root doesn't exist; inline it.
    let best_action = walk_root_children_pick_most_visits(&tree);
    translate_to_player_action(state, player, best_action)
}
```
### 5. Generic `most_visited_action_at_root` for `Tree<S>`
The existing one is specialized on `Tree<GameRolloutState>`. Add a generic version; it should be a roughly 10-line addition to `mcts_tree.rs`.
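A minimal sketch of the generic selection follows. The arena layout (`Node`, `Tree::nodes`, root at index 0) is hypothetical, and the sketch is generic over the action type `A` rather than over `S: TreeState` to stay self-contained; the real `Tree<S>` internals in `mcts_tree.rs` may be organised differently.

```rust
/// Hypothetical minimal arena node; the real node type may carry more fields.
pub struct Node<A> {
    pub action: Option<A>,    // action that led to this node; None at the root
    pub visits: u32,          // visit count accumulated by the search
    pub children: Vec<usize>, // indices into `Tree::nodes`
}

/// Hypothetical arena tree with the root stored at index 0.
pub struct Tree<A> {
    pub nodes: Vec<Node<A>>,
}

impl<A: Clone> Tree<A> {
    /// Returns the action of the root child with the most visits,
    /// or `None` if the tree is empty or the root has no children.
    pub fn most_visited_action_at_root(&self) -> Option<A> {
        let root = self.nodes.first()?;
        let mut best: Option<(u32, &A)> = None;
        for &child in &root.children {
            let node = &self.nodes[child];
            if let Some(action) = node.action.as_ref() {
                // Strict `>` keeps the earliest child on ties, so the pick is deterministic.
                if best.map_or(true, |(v, _)| node.visits > v) {
                    best = Some((node.visits, action));
                }
            }
        }
        best.map(|(_, a)| a.clone())
    }
}
```

Breaking ties by child order rather than by RNG keeps the selection step out of the determinism budget: the same tree always yields the same action.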
### 6. New `claude_real_mcts_vs_ai_transcript` test
`#[ignore]`'d, 500-turn run. Output to `.local/demo-runs/2026-05-XX-claude-real-mcts/`. Recap reports winner + Claude's MCTS action-frequency distribution.
## Acceptance
- ☐ `TacticalTreeState` exists with `TreeState` impl.
- ☐ `apply_tactical_action` covers all 14 `Action` variants.
- ☐ `score_for_player` returns [0, 1] reward.
- ☐ Generic `most_visited_action_at_root<S>` added to `Tree<S>`.
- ☐ `claude_real_mcts_vs_ai_transcript` test runs in < 5 minutes wall clock.
- ☐ Claude with 1000-rollout budget MCTS shows meaningful action diversity vs the heuristic baseline (action-frequency table differs).
- ☐ Result reported: did Claude win? Top 3? Bottom?
- ☐ Determinism: same seed → byte-identical transcript across two runs.
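The determinism criterion can be checked by running the same seeded policy twice and comparing the outputs. The sketch below uses a stand-in `XorShift64` (the standard 13/7/17 xorshift, with a zero-seed remap added as an assumption) and a hypothetical `transcript` function that records sampled action indices; the crate's real RNG and transcript format may differ.

```rust
/// Stand-in for the crate's `XorShift64`; the real implementation may differ.
pub struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // An all-zero state is a fixed point of xorshift, so remap it (an assumption).
        Self { state: if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed } }
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }
}

/// Hypothetical transcript: the action index a seeded random policy picks each turn.
pub fn transcript(seed: u64, turns: usize, n_actions: usize) -> Vec<usize> {
    let mut rng = XorShift64::new(seed);
    (0..turns)
        .map(|_| (rng.next_u64() % n_actions as u64) as usize)
        .collect()
}
```

Comparing `transcript(42, 500, 14)` against a second call with the same arguments is the shape of the acceptance check; in the real test the comparison would be over the serialized transcript bytes. Determinism also requires that nothing else in the loop depends on wall-clock time, thread scheduling, or hash-map iteration order.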
## Why this size
- TacticalTreeState impl: ~1 day
- apply_tactical_action across 14 variants: ~3 days (each variant is small but there's interaction with TacticalState's internal coherence — e.g. unit movement updates positions; FoundCity updates city array; combat updates HP)
- score_for_player: ~1 day
- Generic most_visited_action_at_root: ~2 hours
- Bench wiring + test: ~1 day
- Determinism + tuning: ~1 day
**Total: 1-2 weeks** for a working skeleton. Production-quality (exhaustive legal-action enumeration, rich rollout heuristics, balanced scoring) extends to 4-6 weeks.
## Unblocks
Real answer to "can Claude beat the hardest AI?" — by giving Claude actual search-depth advantage that AI slots don't have.
Also: if MCTS-driven Claude consistently wins, that validates the simulation's score function as a reliable reward signal — which is itself a step toward a better production AI later.
## Risks
- `TacticalState` clone is expensive (`Vec<Unit>`, `Vec<City>`, `Vec<Tile>`). 1000 rollouts × 5-depth = 5000 clones per turn. May be too slow for real-time bench. Mitigation: shallow rollout depth (3) or limited-state-snapshot rollouts.
- Heuristic enumeration via `decide_tactical_actions` returns a CURATED list, not exhaustive legal actions. MCTS exploring only that list explores less than the true game tree — may not surface MCTS-vs-heuristic differential. Mitigation: write an exhaustive enumerator alongside.
- ScoringWeights tuning matters more than search depth at low budgets. The 500-turn run already proved weight changes barely affect heuristic output. May be true for MCTS too.
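For the exhaustive-enumerator mitigation, per-unit iteration over neighbouring hexes is the simplest starting point. The sketch below covers only movement and assumes axial hex coordinates, one unit per hex, and a caller-supplied bounds check; `Hex`, `SketchAction`, and `enumerate_moves` are hypothetical names, and the real map representation is not shown in this objective.

```rust
/// Hypothetical axial hex coordinate (q, r).
pub type Hex = (i32, i32);

/// Hypothetical one-variant action enum standing in for `Action::MoveUnit`.
#[derive(Clone, Debug, PartialEq)]
pub enum SketchAction {
    MoveUnit { unit_id: u32, to: Hex },
}

/// The six neighbour offsets of a hex in axial coordinates.
const NEIGHBOURS: [Hex; 6] = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)];

/// Lists every single-step move for every unit onto an in-bounds, unoccupied hex.
/// `units` holds (unit id, position) pairs.
pub fn enumerate_moves(units: &[(u32, Hex)], in_bounds: impl Fn(Hex) -> bool) -> Vec<SketchAction> {
    let mut out = Vec::new();
    for &(unit_id, (q, r)) in units {
        for (dq, dr) in NEIGHBOURS {
            let to = (q + dq, r + dr);
            // Assumes one unit per hex; stacking rules would change this check.
            let occupied = units.iter().any(|&(_, pos)| pos == to);
            if in_bounds(to) && !occupied {
                out.push(SketchAction::MoveUnit { unit_id, to });
            }
        }
    }
    out
}
```

Even this movement-only enumerator yields up to six actions per unit, so the branching factor grows much faster than with the curated list; that trade-off against the 1000-iteration budget is worth measuring before switching over.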
## References
- `src/simulator/crates/mc-ai/src/mcts_tree.rs::TreeState` trait (lines 13-58).
- `src/simulator/crates/mc-ai/src/tactical/mod.rs::Action` enum (14 variants, lines 55-196).
- `src/simulator/crates/mc-ai/src/rollout.rs::GameRolloutState` — existing wrapper for the GPU rollout path; reference for the wrapper pattern.
- `src/simulator/crates/mc-player-api/src/dispatch.rs::apply_ai_action` — reference for `Action` → state mutation mapping.
- `.project/objectives/p2-67-claude-player-api.md` — context (Real-game analysis section).
- `.project/objectives/p2-69-api-gdext-mctscontroller-port.md` — where MCTS was removed from production.