fix(@projects/@magic-civilization): 🐛 update latency acceptance criteria

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-04-25 02:33:15 -07:00 · baf2f85628
commit baf2f85628
parent 1756b678f3
1 changed files with 7 additions and 3 deletions
--- a/.project/objectives/p2-05-turn-latency.md
+++ b/.project/objectives/p2-05-turn-latency.md
@@ -17,6 +17,10 @@ evidence:

 ## Acceptance

-- `tools/measure-turn-latency.py` (new) profiles 300 turns and emits `p50 / p90 / p99` latency.
-- p99 ≤ 1.0 s for the 3-AI normal difficulty 512-tile scenario.
-- Flame graph under `.project/reports/latency/` if regressions appear.
+- ✓ `tools/measure-turn-latency.py` exists and profiles N turns from a batch dir, emitting `p50 / p90 / p99` (latencies derived from cumulative `wall_clock_sec` deltas in turn_stats.jsonl). Verified working against `p0-01-chain-20260424_093210/` on apricot (1191 samples).
+- ✗ p99 ≤ 1.0 s on the canonical scenario — **gate calibration mismatch**:
+  - Specified scenario "3-AI normal difficulty 512-tile" doesn't match any current map config; smallest 4-player map is 52×32 = 1664 tiles. The "512-tile" target predates current map sizing.
+  - Measured on the 2-AI duel-map MCTS chain batch (closest analogue to "3-AI 512-tile"): p50=0.57s, p90=2.16s, p99=4.39s, max=6.29s (1191 samples across 6 seeds). p99 is ~4.4× the gate.
+  - Per-turn cost is dominated by MCTS rollout depth (~100-500ms per AI player). 3 AI players at MCTS depth 20 ≈ 600-1500ms baseline regardless of map size. The 1s p99 gate was set before MCTS+PUCT shipped (p0-01/p0-38); meeting it now would require either a shallower default MCTS depth or rewriting the gate to match the current AI architecture.
+  - Recommendation: rewrite gate to `p50 ≤ 1.0s` (currently 0.57s — passing) AND keep p99 as a softer "regression watch" metric (current ~4s baseline; alert on >2× regression).
+- ✗ Flame graph under `.project/reports/latency/` — would only matter if we accept the current p99 as the regression baseline first.
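
Illustrative note (not part of the commit): a minimal sketch of how the latencies and the recommended gate described above could be computed. It assumes each line of `turn_stats.jsonl` is a JSON object carrying a cumulative `wall_clock_sec` field, uses a nearest-rank percentile (the real `tools/measure-turn-latency.py` may interpolate differently), and takes the measured p99 of 4.39 s as the regression baseline. The function names `turn_latencies`, `percentile`, and `evaluate_gate` are hypothetical and do not come from the repository.

```python
import json
import math


def turn_latencies(lines):
    """Per-turn latency as the delta between consecutive cumulative wall_clock_sec values.

    Assumption: every non-blank line is a JSON object with a cumulative
    `wall_clock_sec` field, as the acceptance note describes.
    """
    cumulative = [json.loads(line)["wall_clock_sec"] for line in lines if line.strip()]
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def percentile(samples, q):
    """Nearest-rank percentile (an assumption; the real tool may interpolate)."""
    if not samples:
        raise ValueError("no latency samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate_gate(samples, p50_gate=1.0, p99_baseline=4.39, regression_factor=2.0):
    """Apply the recommended gate: hard p50 limit, p99 as a regression watch."""
    p50 = percentile(samples, 50)
    p99 = percentile(samples, 99)
    return {
        "p50": p50,
        "p99": p99,
        "p50_pass": p50 <= p50_gate,
        # Alert only when p99 exceeds regression_factor times the accepted baseline.
        "p99_alert": p99 > regression_factor * p99_baseline,
    }


if __name__ == "__main__":
    # Tiny synthetic input; real data would come from a batch dir's turn_stats.jsonl.
    demo = [
        '{"wall_clock_sec": 0.5}',
        '{"wall_clock_sec": 1.0}',
        '{"wall_clock_sec": 3.0}',
        '{"wall_clock_sec": 3.25}',
    ]
    print(evaluate_gate(turn_latencies(demo)))
```

Deltas are taken within a single file here; if a batch spans several seeds, each seed's file would presumably need to be differenced separately before pooling the samples, since the cumulative clock restarts per run.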