feat(@projects/@magic-civilization): ✨ implement mcts tree state wrapper for tactical state
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
parent 8e0ad24aca
commit f1ef762aa8
1 changed file with 187 additions and 0 deletions
.project/objectives/p2-67-followup-mcts-tactical-state-impl.md
@@ -0,0 +1,187 @@
---
id: p2-67-followup-mcts-tactical-state-impl
title: "TreeState impl for TacticalState: wire real MCTS into the AI decision path"
priority: p2
status: open
scope: game1
category: simulation
owner: simulator-infra
created: 2026-05-13
updated_at: 2026-05-13
blocked_by: []
follow_ups: [p2-67]
---

## Context

After the night's bug-fix pass (Bugs 1-5 closed, simulation fully playable, last-survivor victory firing), the question "can Claude beat the hardest AI?" surfaced a deeper architectural finding:

**There is no MCTS-driven AI anywhere in live code today.** Both the production Godot game (`GdMcTreeController`) and the bench harness (`full_game_transcript.rs`) run `mc_ai::run_ai_turn`, the deterministic tactical heuristic pipeline (movement → combat_predict → settle → production → citizen). The p2-69 port (commit `be088c3ad`) deleted the old `mc_turn::snapshot` MCTS path. The real tree MCTS at `mc-ai/src/mcts_tree.rs::Tree<S: TreeState>` still exists but is vestigial: it is used only by the `abstract_choose_action.rs` test and is not wired into any live game decision.

The 500-turn run of Claude with stronger weights against default-weight AIs ended with Claude in 3rd place because the heuristic is largely weight-insensitive. Personality and seed are the only differentiators across all live AI today.

## Goal

Make a meaningful "Claude with MCTS lookahead vs AI with heuristic" experiment possible by implementing `TreeState` for `TacticalState` (through a thin wrapper) so the existing tree infrastructure can search over the bench projector's output.

## Source-of-truth rails

- **Rust crate**: edit `mc-ai`. Possibly add a new submodule `mc_ai::tactical::tree_state` for the impl.
- **JSON path**: none.
- **GDScript**: none in this objective. The production Godot bridge stays on `run_ai_turn` until p2-69 is revisited.

## Locked decisions

- **Don't change `TacticalState`'s shape.** The bench projector (`mc-player-api::project_tactical`) emits a stable shape; widening it would ripple into the projector and dispatch.
- **Implement `TreeState` for a wrapper type**, not `TacticalState` directly. Avoid a trait impl in a foreign crate; wrap in `mc_ai::tactical::TacticalTreeState { inner: TacticalState, depth: u32, max_depth: u32, scoring: ScoringWeights }`.
- **Action type**: use `mc_ai::tactical::Action` (the existing 14-variant enum). Don't invent a parallel set.
- **Rollout strategy**: random-action-and-score. Use `decide_tactical_actions` to enumerate, sample one action, apply it, repeat up to the depth cap, then score the final state with a `ScoringWeights`-based heuristic.

## Surface

### 1. New module `mc-ai/src/tactical/tree_state.rs`

```rust
use crate::mcts_tree::TreeState;
use crate::tactical::{Action, TacticalState, decide_tactical_actions};
use mc_core::scoring_weights::ScoringWeights;
// XorShift64 must also be in scope; its module path is not given in this objective.

pub struct TacticalTreeState {
    pub inner: TacticalState,
    pub depth: u32,
    pub max_depth: u32,
    pub scoring: ScoringWeights,
}

impl TreeState for TacticalTreeState {
    type Action = Action;

    fn legal_actions(&self) -> Vec<Self::Action> {
        if self.depth >= self.max_depth {
            return vec![];
        }
        // v1: decide_tactical_actions returns the heuristic-prioritised chain,
        // not the full legal set, but it is a reasonable approximation.
        // Future: enumerate exhaustively via per-unit / per-city iteration.
        // Seeding from depth keeps this call deterministic for a given node.
        let mut rng = XorShift64::new(self.depth as u64);
        decide_tactical_actions(&self.inner, &self.scoring, &mut rng)
    }

    fn apply(&self, action: &Self::Action) -> Self {
        let mut next_inner = self.inner.clone();
        apply_tactical_action(&mut next_inner, action); // new helper
        Self {
            inner: next_inner,
            depth: self.depth + 1,
            max_depth: self.max_depth,
            scoring: self.scoring.clone(),
        }
    }

    fn rollout(&self, rng: &mut XorShift64, horizon: u32, _temperature: f32, root_player: u8) -> f32 {
        // Walk forward up to `horizon` steps applying random actions, then score.
        let mut current = self.inner.clone();
        for _ in 0..horizon {
            let actions = decide_tactical_actions(&current, &self.scoring, rng);
            if actions.is_empty() {
                break;
            }
            let idx = (rng.next_u64() as usize) % actions.len();
            apply_tactical_action(&mut current, &actions[idx]);
        }
        score_for_player(&current, root_player, &self.scoring) // new helper, [0, 1]
    }
}
```

### 2. New helper `mc_ai::tactical::apply_tactical_action`

Mirror what `apply_ai_action` does in `mc-player-api/src/dispatch.rs`, but operate on `TacticalState` directly (no `GameState` round-trip). For each `Action` variant, apply the corresponding `TacticalState` mutation:

- `MoveUnit { unit_id, to }`: find the unit, update its position.
- `AttackTarget { attacker_id, target_id }`: apply damage (simplified vs full mc-combat).
- `FoundCity { settler_id, at_hex }`: consume the settler, push a city.
- `EnqueueBuild { city_id, item_id }`: find the city, set the queue slot.
- ...and so on for all 14 variants.

This is the bulk of the work: ~14 small state mutations.

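The mutation pattern for the four variants listed above might look roughly like the sketch below. It uses hypothetical stand-in types (`MiniState`, `MiniUnit`, `MiniCity`, `MiniAction`) because the real `TacticalState` and `Action` field layouts are not reproduced in this objective, and the flat damage constant is a placeholder rather than anything taken from mc-combat.

```rust
// Hypothetical stand-ins: the real TacticalState / Action layouts are not
// shown in this objective, so every type and field below is an assumption.
#[derive(Clone, Debug, PartialEq)]
pub struct MiniUnit { pub id: u32, pub owner: u8, pub pos: (i32, i32), pub hp: i32 }

#[derive(Clone, Debug, PartialEq)]
pub struct MiniCity { pub id: u32, pub owner: u8, pub pos: (i32, i32), pub queue: Option<u32> }

#[derive(Clone, Debug, Default)]
pub struct MiniState { pub units: Vec<MiniUnit>, pub cities: Vec<MiniCity>, pub next_city_id: u32 }

pub enum MiniAction {
    MoveUnit { unit_id: u32, to: (i32, i32) },
    AttackTarget { attacker_id: u32, target_id: u32 },
    FoundCity { settler_id: u32, at_hex: (i32, i32) },
    EnqueueBuild { city_id: u32, item_id: u32 },
}

/// Placeholder flat damage (assumption; the objective only says "simplified").
const FLAT_DAMAGE: i32 = 10;

/// Applies one action in place. Unknown ids are ignored so that a stale
/// action sampled during a rollout cannot panic.
pub fn apply_mini_action(state: &mut MiniState, action: &MiniAction) {
    match *action {
        MiniAction::MoveUnit { unit_id, to } => {
            if let Some(u) = state.units.iter_mut().find(|u| u.id == unit_id) {
                u.pos = to;
            }
        }
        MiniAction::AttackTarget { attacker_id, target_id } => {
            if state.units.iter().any(|u| u.id == attacker_id) {
                if let Some(t) = state.units.iter_mut().find(|u| u.id == target_id) {
                    t.hp -= FLAT_DAMAGE;
                }
                // Drop dead units so later lookups stay coherent.
                state.units.retain(|u| u.hp > 0);
            }
        }
        MiniAction::FoundCity { settler_id, at_hex } => {
            if let Some(i) = state.units.iter().position(|u| u.id == settler_id) {
                let settler = state.units.remove(i); // settler is consumed
                let id = state.next_city_id;
                state.next_city_id += 1;
                state.cities.push(MiniCity { id, owner: settler.owner, pos: at_hex, queue: None });
            }
        }
        MiniAction::EnqueueBuild { city_id, item_id } => {
            if let Some(c) = state.cities.iter_mut().find(|c| c.id == city_id) {
                c.queue = Some(item_id);
            }
        }
    }
}
```

Ignoring unknown ids instead of panicking is a design choice for rollouts, where a sampled action may refer to a unit that an earlier step removed; the real helper may prefer to return a `Result`.
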
### 3. New helper `mc_ai::tactical::score_for_player`

Aggregate a [0, 1] reward from `TacticalState` for a given player. Use `ScoringWeights` to weight:

- Cities owned (Settle)
- Units alive (Build/Defend)
- Enemy units killed since rollout start (Attack)
- Score axes already in `TacticalState.strategic_axes`

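One way to keep such a reward inside [0, 1] is to squash each count into [0, 1) and take a weight-normalised average. The sketch below assumes non-negative weights and uses hypothetical `MiniWeights` and `PlayerTally` structs; the real `ScoringWeights` axes and the `strategic_axes` fields are not shown in this objective, and the saturation constants are placeholders.

```rust
// Hypothetical inputs: the real ScoringWeights axes and TacticalState fields
// are not shown in this objective, so these structs are assumptions.
pub struct MiniWeights { pub settle: f32, pub defend: f32, pub attack: f32 }

pub struct PlayerTally { pub cities: u32, pub units: u32, pub kills: u32 }

/// Saturating squash of a non-negative count into [0, 1): n / (n + k).
/// `k` is the count at which the feature reaches 0.5.
fn squash(n: u32, k: f32) -> f32 {
    let n = n as f32;
    n / (n + k)
}

/// Weighted average of squashed features. With non-negative weights the
/// result lies in [0, 1]; the final clamp guards against odd weight inputs.
pub fn score_tally(t: &PlayerTally, w: &MiniWeights) -> f32 {
    let total = w.settle + w.defend + w.attack;
    if total <= 0.0 {
        return 0.5; // no preference expressed: return a neutral reward
    }
    let raw = w.settle * squash(t.cities, 4.0)
        + w.defend * squash(t.units, 8.0)
        + w.attack * squash(t.kills, 4.0);
    (raw / total).clamp(0.0, 1.0)
}
```

Normalising by the weight sum means that scaling all weights together leaves the reward unchanged, so only the ratios between weights matter in this sketch.
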
### 4. Bench wiring

In `mc-player-api/tests/full_game_transcript.rs`, replace the existing `pick_claude_action_mcts` policy with:

```rust
fn pick_claude_action_real_mcts(state: &GameState, player: PlayerId, seed: u64) -> PlayerAction {
    let tactical = project_tactical(state, player);
    let wrapper = TacticalTreeState {
        inner: tactical,
        depth: 0,
        max_depth: 5,
        scoring: ScoringWeights::default(),
    };
    let mut tree = Tree::new(wrapper);
    let mut rng = XorShift64::new(seed);
    for _ in 0..1000 {
        // iteration budget
        tree.iterate(&mut rng);
    }
    // Generic Tree<S>::most_visited_action_at_root doesn't exist; inline it.
    let best_action = walk_root_children_pick_most_visits(&tree);
    translate_to_player_action(state, player, best_action)
}
```

### 5. Generic `most_visited_action_at_root` for `Tree<S>`

The existing one is specialized on `Tree<GameRolloutState>`. Add a generic version; it should be roughly a 10-line addition to `mcts_tree.rs`.

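The selection logic itself is independent of the state type, which is why a generic version should be small. The sketch below works over a hypothetical `ChildStat<A>` slice because the real `Tree<S>` node layout is not shown in this objective; only the "pick the child with the most visits" step is illustrated.

```rust
// Hypothetical per-child record; the real Tree<S> internals in mcts_tree.rs
// are not shown in this objective.
pub struct ChildStat<A> { pub action: A, pub visits: u32 }

/// Returns the action of the most-visited root child, or None when the root
/// has no children. Ties resolve to the earliest child, so the result is
/// deterministic for a fixed expansion order.
pub fn most_visited_action<A: Clone>(root_children: &[ChildStat<A>]) -> Option<A> {
    let mut best: Option<&ChildStat<A>> = None;
    for child in root_children {
        let better = match best {
            Some(b) => child.visits > b.visits,
            None => true,
        };
        if better {
            best = Some(child);
        }
    }
    best.map(|c| c.action.clone())
}
```

The explicit loop is used instead of `Iterator::max_by_key` because `max_by_key` returns the last of several equal maxima, and first-wins tie-breaking is easier to reason about when checking transcripts for determinism.
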
### 6. New `claude_real_mcts_vs_ai_transcript` test

An `#[ignore]`'d 500-turn run. Output goes to `.local/demo-runs/2026-05-XX-claude-real-mcts/`. The recap reports the winner and Claude's MCTS action-frequency distribution.

## Acceptance

- ☐ `TacticalTreeState` exists with a `TreeState` impl.
- ☐ `apply_tactical_action` covers all 14 `Action` variants.
- ☐ `score_for_player` returns a [0, 1] reward.
- ☐ Generic `most_visited_action_at_root<S>` added to `Tree<S>`.
- ☐ `claude_real_mcts_vs_ai_transcript` test runs in < 5 minutes wall clock.
- ☐ Claude with a 1000-rollout MCTS budget shows meaningful action diversity vs the heuristic baseline (the action-frequency table differs).
- ☐ Result reported: did Claude win? Top 3? Bottom?
- ☐ Determinism: same seed → byte-identical transcript across two runs.

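The determinism criterion can be checked mechanically by producing the transcript twice from the same seed and comparing bytes. The sketch below uses a stand-in PRNG (`DemoXorShift64`) so that it is self-contained; the crate's own `XorShift64` may use different constants and different zero-seed handling, and `demo_transcript` is a hypothetical placeholder for the real transcript writer.

```rust
// Stand-in PRNG so the sketch is self-contained; this is an assumption, not
// the crate's XorShift64.
pub struct DemoXorShift64 { state: u64 }

impl DemoXorShift64 {
    pub fn new(seed: u64) -> Self {
        // Xorshift has an all-zero fixed point, so remap a zero seed.
        Self { state: if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed } }
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }
}

/// Simulates picking one of `n_actions` (1..=256) per step and records the
/// picks as a byte transcript. All randomness flows from `seed`.
pub fn demo_transcript(seed: u64, steps: usize, n_actions: usize) -> Vec<u8> {
    let mut rng = DemoXorShift64::new(seed);
    (0..steps)
        .map(|_| (rng.next_u64() as usize % n_actions) as u8)
        .collect()
}
```

A check along the lines of `assert_eq!(demo_transcript(42, 64, 14), demo_transcript(42, 64, 14))` only holds if nothing else in the pipeline introduces nondeterminism, such as `HashMap` iteration order or wall-clock time.
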
## Why this size

- `TacticalTreeState` impl: ~1 day
- `apply_tactical_action` across 14 variants: ~3 days (each variant is small, but the mutations must keep `TacticalState` internally coherent: unit movement updates positions, `FoundCity` updates the city array, combat updates HP)
- `score_for_player`: ~1 day
- Generic `most_visited_action_at_root`: ~2 hours
- Bench wiring + test: ~1 day
- Determinism + tuning: ~1 day

**Total: 1-2 weeks** for a working skeleton. Production quality (exhaustive legal-action enumeration, rich rollout heuristics, balanced scoring) extends to 4-6 weeks.

## Unblocks

A real answer to "can Claude beat the hardest AI?", by giving Claude an actual search-depth advantage that the AI slots don't have.

Also: if MCTS-driven Claude consistently wins, that validates the simulation's score function as a reliable reward signal, which is itself a step toward a better production AI later.

## Risks

- `TacticalState` clone is expensive (`Vec<Unit>`, `Vec<City>`, `Vec<Tile>`). 1000 rollouts × 5-depth = 5000 clones per turn, which may be too slow for a real-time bench. Mitigation: shallower rollout depth (3) or limited-state-snapshot rollouts.
- Heuristic enumeration via `decide_tactical_actions` returns a curated list, not the exhaustive set of legal actions. MCTS restricted to that list explores less than the true game tree and may not surface an MCTS-vs-heuristic differential. Mitigation: write an exhaustive enumerator alongside.
- `ScoringWeights` tuning may matter more than search depth at low budgets. The 500-turn run already showed that weight changes barely affect heuristic output; the same may hold for MCTS.

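The exhaustive enumerator named as a mitigation could start from per-unit iteration, as in the sketch below. It covers only single-step moves, assumes axial hex coordinates (the real grid layout is not stated in this objective), and uses hypothetical `Hex`, `EnumUnit`, and `EnumAction` types.

```rust
// Hypothetical types; the real Unit / Action definitions are not shown here.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Hex { pub q: i32, pub r: i32 }

pub struct EnumUnit { pub id: u32, pub pos: Hex }

#[derive(Debug, PartialEq)]
pub enum EnumAction { MoveUnit { unit_id: u32, to: Hex } }

/// Six neighbour offsets in axial hex coordinates (an assumed layout).
const AXIAL_DIRS: [(i32, i32); 6] = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)];

/// Enumerates every single-step move for every unit, skipping hexes that
/// another unit already occupies. Output order is unit order, then direction
/// order, so it is stable across runs.
pub fn enumerate_moves(units: &[EnumUnit]) -> Vec<EnumAction> {
    let mut out = Vec::with_capacity(units.len() * AXIAL_DIRS.len());
    for u in units {
        for &(dq, dr) in AXIAL_DIRS.iter() {
            let to = Hex { q: u.pos.q + dq, r: u.pos.r + dr };
            if units.iter().all(|other| other.pos != to) {
                out.push(EnumAction::MoveUnit { unit_id: u.id, to });
            }
        }
    }
    out
}
```

Even this move-only enumeration yields up to six actions per unit, so the branching factor grows quickly; a real enumerator would likely need pruning or progressive widening to stay within a 1000-iteration budget.
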
## References

- `src/simulator/crates/mc-ai/src/mcts_tree.rs::TreeState` trait (lines 13-58).
- `src/simulator/crates/mc-ai/src/tactical/mod.rs::Action` enum (14 variants, lines 55-196).
- `src/simulator/crates/mc-ai/src/rollout.rs::GameRolloutState`: existing wrapper for the GPU rollout path; reference for the wrapper pattern.
- `src/simulator/crates/mc-player-api/src/dispatch.rs::apply_ai_action`: reference for the `Action` → state mutation mapping.
- `.project/objectives/p2-67-claude-player-api.md`: context (Real-game analysis section).
- `.project/objectives/p2-69-api-gdext-mctscontroller-port.md`: where MCTS was removed from production.