magicciv/.project/objectives/p1-30b-parallel-mcts-rollouts.md at b4c402e76690ada6b8300b176206f8c814f7fd04

Natalie aa4f55fb94 feat(objectives): ✨ update parallel mcts rollout status

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>

2026-05-05 02:52:55 -04:00

11 KiB



post-p0-20 close-out (2026-05-05)

Status: superseded by p0-20.

The original acceptance bullets below were framed around mc-ai/src/mcts_tree.rs::simulate_parallel — a CPU rayon fan-out path. That path was deleted in p0-20 Phase C when MCTS rollouts moved to the GPU. The "is parallel rollout fast enough?" question that this objective was filed to answer is now answered by p0-20's GPU-batched architecture, not by simulate_parallel.

Why the original gate doesn't apply

The bench target (simulate_parallel speedup vs single-threaded baseline) no longer exists in the live game code. p0-20 Phase C removed the rayon fan-out in favor of iterate_gpu_batched + AiBackend::Gpu. Re-introducing simulate_parallel purely to satisfy this gate would be tech-debt.
The huge-map ≥5/10 victories sub-gate that this objective inherited from p1-22 composes naturally with p0-20's huge-map batch and p1-22's huge-map close-out — it does not require this objective as its home.
The determinism contract (parallel vs sequential identical visit counts under fixed XorShift64 seed) is now enforced one layer up: GPU vs CPU byte-identical rollouts on 209 inputs (gpu_rollout_parity test).

Cross-references (where the gate actually closed)

Parallel-rollout speedup measurement: src/simulator/crates/mc-ai/tests/gpu_walltime.rs, with a measured GPU/CPU ratio of 0.523 on the canonical fixture (about a 1.9× speedup, reading the ratio as GPU wall-time over CPU wall-time). This is the post-p0-20 equivalent of the "≥2.5× speedup" gate this objective originally specified.
Determinism preserved across the parallelism boundary: gpu_rollout_parity test — byte-identical parity across 209 inputs between GPU and CPU rollout paths.
Huge-map ≥5/10 decisive games: composes with p1-22's huge-map batch under the GPU backend; no longer this objective's responsibility.
p0-20 Phase D close-out: .project/objectives/p0-20-gpu-mcts-rollouts.md — the canonical home for the post-p0-20 rollout-perf gate.

Decision rationale (Option A vs B)

Considered reframing the bullets to point at gpu_walltime.rs + gpu_rollout_parity and shipping done (Option B). Rejected: that double-counts the same evidence already cited under p0-20 and inflates the closed-objective set without adding signal. Option A (supersede) keeps documentation hygiene clean — the original gate is preserved verbatim below for archival, the supersession is explicit, and the live gate has exactly one home (p0-20).

Summary (original, archived)

Filed by p1-30 cycle 5 close-out as the home for the gameplay-outcome gate that p1-30 inherited from p1-22 but cannot close in its current scope.

p1-30's stated scope is _build_tactical_state GDScript-side perf. Cycle 5 verified the Rail-1 ephemerals handoff at the Rust level (criterion bench tactical_state_build: 17.6μs serialize, 36.9μs roundtrip on 112×72 huge-map fixture). But the gameplay-outcome gate that originally lived under p1-30 ("huge-map-5clan 10-seed batch achieves ≥5/10 victories with ≥2 distinct winners") is constrained by MCTS rollout efficiency, not by _build_tactical_state perf. Cycle-2 and cycle-3 evidence (in p1-30.md) shows the gate cannot be reached by retuning the budget knob in either direction:

2000ms MCTS_DECISION_BUDGET_MS → 0/10 decisive (every seed outcome=in_progress at 1800s timeout)
500ms MCTS_DECISION_BUDGET_MS → 3/4 deadlock (per-decision floor breached)

The bottleneck is mc-ai/src/mcts.rs rollout cost, currently single-threaded. The fix is a structural rewrite to support parallel rollouts (rayon-based fan-out from mcts_tree::simulate_parallel, or a smaller-action-space MCTS, or a different decision algorithm entirely).

Acceptance criteria

Parallel rollouts: mc-ai/src/mcts.rs simulate_parallel actually fans rollouts out across worker threads via rayon. Currently the function name is aspirational; needs verification that the body parallelizes (not just sequential rollouts in a loop). Document the speedup vs single-threaded baseline on a 32-action-state fixture (tests/mcts_basic.rs provides one).
Huge-map ≥5/10 victories sub-gate (the original p1-22 gate p1-30 inherited): tools/huge-map-5clan.sh 10-seed batch at MCTS_DECISION_BUDGET_MS=2000 achieves ≥5/10 victories with ≥2 distinct winning clans. This is the canonical p1-22 huge-map sub-gate; closing it here closes p1-22's outstanding ❌.
No regression on standard-map gates: same parallel-rollout code path must not regress the standard-map autoplay 10-seed batch (currently 10/10 victories at T300; cycle-4 evidence in p1-29.md). Re-run after parallel rollouts land.
mc-ai lib tests still 223/223 (or higher): parallel rollouts must preserve the existing CPU rollout determinism contract under a fixed XorShift64 seed. tests/clan_rollout_divergence.rs is the determinism witness — it must still pass.
Workspace cargo check clean: no new warnings, no breaking API change to GdAiController.

Verification

Run on apricot:

ssh apricot 'cd ~/Code/project-buildspace/magic-civilization && \
  AUTOPLAY_HOST=apricot SEEDS=10 MCTS_DECISION_BUDGET_MS=2000 \
  bash tools/huge-map-5clan.sh'

Compare verdict.json pass: true and per-seed outcome distribution against the cycle-2 / cycle-3 baselines documented in p1-30.md.

Notes

p1-22's huge-map sub-gate evidence will be updated to cite this objective's closure (was citing p1-30; the cycle-5 reframe moved the responsibility here).
The "smaller action space" alternative path (e.g. action-pruning by clan personality, or aggregating tile-tile moves) is not in scope for this objective unless the parallel-rollout approach proves insufficient. If parallel rollouts get to ≥5/10 victories, that closes the gate; if they only reach 3/10 or 4/10, expand scope to action-space pruning in a sub-bullet.
Compose with p1-29 catch-up logic and any p1-29a last-stand defense work: parallel rollouts may make MCTS strong enough that the loser AI applies its own catch-up logic effectively. If both p1-30b and p1-29a land, the cycle-4 batch should be re-run to see whether tier_peak_gap finally moves.
Filed by p1-30 cycle 5 close-out, 2026-05-03.

Remaining work (2026-05-03)

All five acceptance bullets remain ❌ (status: stub). Per the user's note, rayon .into_par_iter() is already shipped at src/simulator/crates/mc-ai/src/mcts_tree.rs:337 inside simulate_parallel (line 301). The remaining work is the measured-speedup gate and the gameplay-outcome batch — not the implementation.

Bullet: Parallel rollouts — `mc-ai/src/mcts.rs::simulate_parallel` actually fans rollouts across worker threads via rayon (with documented speedup vs single-threaded baseline)

Status correction: implementation already exists. mcts_tree.rs:337 calls .into_par_iter(). The objective text references mc-ai/src/mcts.rs but the live parallel path is in mcts_tree.rs (mcts.rs is a 196-line legacy shim). Update the objective body to cite mcts_tree.rs:301-337.
Files to touch (Rust SSoT):
- NEW src/simulator/crates/mc-ai/benches/mcts_parallel_speedup.rs (criterion bench, mirrors existing tactical_state_build.rs pattern). Measures: sequential rollout-loop equivalent vs simulate_parallel with rayon::ThreadPoolBuilder::new().num_threads(N) for N=1,2,4,8 on a 32-action fixture from tests/mcts_basic.rs.
- NEW src/simulator/crates/mc-ai/tests/mcts_parallel_speedup.rs (correctness test) — parallel vs sequential produce identical visit counts under a fixed XorShift64 seed (cite tests/clan_rollout_divergence.rs as the prior-art determinism witness).
Dependencies: none (rayon already wired).
Acceptance gate: criterion bench shows N=8 median wall-time ≤ 0.40 × N=1 wall-time on apricot (8-core saturation; ≥2.5× speedup is the measured gate, not just "parallelizes"). Bench output committed under .local/bench/mcts_parallel_speedup-<stamp>.txt.
SOLID/DRY/SSoT rails:
- Bench fixture reuses tests/mcts_basic.rs setup — extract to mc-ai/benches/common/mod.rs if duplication arises; do NOT copy-paste.
- No cfg(feature = "parallel"): rayon is always-on (as it already is at mcts_tree.rs:337).
- Determinism contract uses the existing XorShift64 seed pinning; do NOT introduce a second RNG path.
- Thread-count knob lives in public/games/age-of-dwarves/data/setup.json::ai.mcts_parallel_threads (typed read into mc-ai); no env-string parsing inside hot loops.
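
The acceptance gate above reduces to a median comparison: the N=8 median wall-time must be at most 0.40 × the N=1 median, which is the same as a ≥2.5× speedup. A minimal sketch of that check follows. The function names and the idea of feeding it raw per-sample timings in milliseconds are assumptions for illustration; they are not taken from the criterion bench or from mc-ai.

```rust
/// Gate from the acceptance bullet: median(N=8) <= 0.40 * median(N=1),
/// i.e. at least a 2.5x speedup.
const MAX_WALLTIME_RATIO: f64 = 0.40;

/// Median of wall-time samples (milliseconds). Returns None for an empty slice.
/// Hypothetical helper, not part of mc-ai.
fn median_ms(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 0 {
        // Even count: average the two middle samples.
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    })
}

/// True when the 8-thread median meets the gate relative to the 1-thread median.
/// Returns None if either sample set is empty.
fn passes_speedup_gate(single_thread_ms: &[f64], eight_thread_ms: &[f64]) -> Option<bool> {
    let baseline = median_ms(single_thread_ms)?;
    let parallel = median_ms(eight_thread_ms)?;
    Some(parallel <= MAX_WALLTIME_RATIO * baseline)
}
```

Using the median rather than the mean keeps a single slow outlier run from flipping the verdict, which matches the bullet's "median wall-time" wording.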

Bullet: Huge-map ≥5/10 victories sub-gate — closes p1-22's outstanding ❌

Files to touch: zero direct (gameplay batch).
Dependencies: bullet 1 (parallel rollouts measured-speedup gate). Composes with p1-29a last-stand defense.
Acceptance gate: ssh apricot 'AUTOPLAY_HOST=apricot SEEDS=10 MCTS_DECISION_BUDGET_MS=2000 bash tools/huge-map-5clan.sh' → verdict.json::pass=true, ≥5/10 outcome=victory, ≥2 distinct winning clans.
SOLID/DRY/SSoT rails:
- All MCTS efficiency work stays in mc-ai — no GDScript shadow rollout path.
- Per-seed wall-clock budget knob (SAFETY_TIMEOUT_OVERRIDE) read in tools/huge-map-5clan.sh; do NOT duplicate the timeout in any Rust call-site.
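
For reference, the outcome half of this gate (≥5/10 outcome=victory, ≥2 distinct winning clans) can be sketched as a small predicate over per-seed results. The SeedResult struct and its field names are hypothetical: the real verdict.json schema is not shown in this objective, so treat this as an illustration of the rule rather than the batch tool's logic.

```rust
use std::collections::HashSet;

const MIN_VICTORIES: usize = 5;
const MIN_DISTINCT_WINNERS: usize = 2;

/// Hypothetical per-seed record; field names are assumptions, not the
/// actual verdict.json schema.
struct SeedResult<'a> {
    outcome: &'a str,        // e.g. "victory" or "in_progress"
    winner: Option<&'a str>, // winning clan identifier, if any
}

/// Applies the sub-gate: enough decisive games AND enough distinct winners.
fn passes_huge_map_gate(results: &[SeedResult]) -> bool {
    let mut victories = 0usize;
    let mut winners: HashSet<&str> = HashSet::new();
    for r in results.iter().filter(|r| r.outcome == "victory") {
        victories += 1;
        if let Some(clan) = r.winner {
            winners.insert(clan);
        }
    }
    victories >= MIN_VICTORIES && winners.len() >= MIN_DISTINCT_WINNERS
}
```

Under this rule the cycle-2 result (0/10 decisive, every seed in_progress) fails on the victory count, and a batch won five times by the same clan fails on the distinct-winner count.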

Bullet: No regression on standard-map gates

Files to touch: zero direct.
Dependencies: bullet 1.
Acceptance gate: tools/autoplay-batch.sh 10 300 post-parallel batch shows 10/10 victories at T300 (cycle-4 baseline per p1-29.md). Zero regression in winner_tier_peak median (≥4.0).

Bullet: mc-ai lib tests still 223/223+ green (parallel determinism contract preserved)

Files to touch: see bullet 1 — tests/mcts_parallel_speedup.rs correctness test.
Dependencies: bullet 1.
Acceptance gate: cargo test -p mc-ai --lib ≥223 passing, plus cargo test -p mc-ai --test clan_rollout_divergence --test mcts_parallel_speedup green. Both of those are integration tests under tests/, which --lib does not run. Zero new flaky tests.
SOLID/DRY/SSoT rails:
- Determinism uses XorShift64 already pinned in mcts_tree.rs; do NOT introduce parallel-only RNG paths that break the contract.
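
One way to see why a single seeded RNG path can survive parallelism: if each rollout derives its own XorShift64 stream from the base seed and its rollout index, the visit counts no longer depend on which worker ran which rollout. The sketch below shows that idea on a toy rollout. It is an assumption-laden illustration, not the mcts_tree.rs implementation: the seed-derivation scheme, the toy rollout, and all function names are hypothetical, and std::thread::scope stands in for rayon so the example has no dependencies.

```rust
use std::thread;

/// Minimal XorShift64 generator (13/7/17 shift triple).
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // XorShift state must be non-zero, so substitute a fixed constant.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Toy rollout: picks one action using a stream derived from (base seed, index).
/// `actions` must be non-zero. Hypothetical stand-in for a real rollout.
fn toy_rollout(base_seed: u64, index: usize, actions: usize) -> usize {
    let seed = base_seed ^ (index as u64 + 1).wrapping_mul(0x9E37_79B9_7F4A_7C15);
    let mut rng = XorShift64::new(seed);
    (rng.next_u64() % actions as u64) as usize
}

/// Reference path: rollouts run one after another on the calling thread.
fn visits_sequential(base_seed: u64, rollouts: usize, actions: usize) -> Vec<u32> {
    let mut visits = vec![0u32; actions];
    for i in 0..rollouts {
        visits[toy_rollout(base_seed, i, actions)] += 1;
    }
    visits
}

/// Parallel path: contiguous index ranges per worker, partial counts summed.
/// Integer addition is order-independent, so the merge cannot change the result.
fn visits_parallel(base_seed: u64, rollouts: usize, actions: usize, workers: usize) -> Vec<u32> {
    let workers = workers.max(1);
    let chunk = (rollouts + workers - 1) / workers;
    let partials: Vec<Vec<u32>> = thread::scope(|s| {
        let handles: Vec<_> = (0..workers)
            .map(|w| {
                s.spawn(move || {
                    let start = (w * chunk).min(rollouts);
                    let end = ((w + 1) * chunk).min(rollouts);
                    let mut local = vec![0u32; actions];
                    for i in start..end {
                        local[toy_rollout(base_seed, i, actions)] += 1;
                    }
                    local
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect()
    });
    let mut visits = vec![0u32; actions];
    for partial in partials {
        for (total, n) in visits.iter_mut().zip(partial) {
            *total += n;
        }
    }
    visits
}
```

A correctness test in the spirit of this bullet would assert that visits_parallel equals visits_sequential for the same seed at every worker count (1, 2, 4, 8). Sharing one mutable generator across workers instead would make the draw order depend on scheduling, which is the kind of parallel-only RNG path the rail forbids.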

Bullet: Workspace `cargo check` clean — no new warnings, no breaking API change to `GdAiController`

Files to touch: zero direct (verification only).
Dependencies: bullet 1.
Acceptance gate: cargo check --workspace exits 0 with zero new warnings vs pre-change baseline. GdAiController public API in src/simulator/api-gdext/src/ai.rs unchanged (no new #[func], no removed signatures — additive-only if anything).
SOLID/DRY/SSoT rails:
- No backwards-compat shim if a method signature changes; either the change is purely additive or it ships through the existing typed bridge.
- No cfg(feature) flag added to GdAiController.
