From baf2f856288db1c63cfb363f1c3fbb0784f1cd7f Mon Sep 17 00:00:00 2001
From: Natalie <natalie@lilithuwu.com>
Date: Sat, 25 Apr 2026 02:33:15 -0700
Subject: [PATCH] =?UTF-8?q?fix(@projects/@magic-civilization):=20?=
 =?UTF-8?q?=F0=9F=90=9B=20update=20latency=20acceptance=20criteria?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
---
 .project/objectives/p2-05-turn-latency.md | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/.project/objectives/p2-05-turn-latency.md b/.project/objectives/p2-05-turn-latency.md
index c3aec5b2..7aa1a8cf 100644
--- a/.project/objectives/p2-05-turn-latency.md
+++ b/.project/objectives/p2-05-turn-latency.md
@@ -17,6 +17,10 @@ evidence:
 
 ## Acceptance
 
-- `tools/measure-turn-latency.py` (new) profiles 300 turns and emits `p50 / p90 / p99` latency.
-- p99 ≤ 1.0 s for the 3-AI normal difficulty 512-tile scenario.
-- Flame graph under `.project/reports/latency/` if regressions appear.
+- ✓ `tools/measure-turn-latency.py` exists and profiles N turns from a batch dir, emitting `p50 / p90 / p99` (per-turn latencies are the deltas between consecutive cumulative `wall_clock_sec` values in `turn_stats.jsonl`). Verified working against `p0-01-chain-20260424_093210/` on apricot (1191 samples).
+- ✗ p99 ≤ 1.0 s on the canonical scenario — **gate calibration mismatch**:
+  - Specified scenario "3-AI normal difficulty 512-tile" doesn't match any current map config; smallest 4-player map is 52×32 = 1664 tiles. The "512-tile" target predates current map sizing.
+  - Measured on the 2-AI duel-map MCTS chain batch (closest analogue to "3-AI 512-tile"): p50=0.57s, p90=2.16s, p99=4.39s, max=6.29s (1191 samples across 6 seeds). p99 is ~4.4× the gate.
+  - Per-turn cost is dominated by MCTS rollout depth (~100-500ms per AI player). 3 AI players at MCTS depth 20 ≈ 600-1500ms baseline regardless of map size. The 1s p99 gate was set before MCTS+PUCT shipped (p0-01/p0-38); meeting it now would require a shallower default MCTS depth, otherwise the gate has to be rewritten to match the current AI architecture.
+  - Recommendation: rewrite the gate to `p50 ≤ 1.0s` (currently 0.57s, passing) and keep p99 as a softer "regression watch" metric (current baseline p99=4.39s; alert on >2× regression).
+- ✗ Flame graph under `.project/reports/latency/`: only relevant once the current p99 is accepted as the regression baseline.
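
Review note: the measurement and the recommended gate described in the Acceptance list can be sketched as follows. This is an illustrative sketch, not the contents of `tools/measure-turn-latency.py`. It assumes each line of `turn_stats.jsonl` is a JSON object with a cumulative `wall_clock_sec` field, that the lines belong to a single run in turn order, and that percentiles use the nearest-rank method. The function names and the `evaluate_gate` parameters are hypothetical. The default baseline of 4.39 s is the p99 reported in the patch.

```python
import json
import math
from typing import Dict, Iterable, List


def turn_latencies(jsonl_lines: Iterable[str]) -> List[float]:
    """Per-turn latencies as deltas between consecutive cumulative clock values.

    Assumes one run per input; mixing several seeds would need a per-seed split,
    because the cumulative clock restarts with each run.
    """
    clocks = [json.loads(line)["wall_clock_sec"] for line in jsonl_lines if line.strip()]
    return [later - earlier for earlier, later in zip(clocks, clocks[1:])]


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile (an assumption; the real tool may interpolate)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def evaluate_gate(
    samples: List[float],
    p50_limit: float = 1.0,          # recommended hard gate: p50 <= 1.0 s
    p99_baseline: float = 4.39,      # p99 measured in the patch
    regression_factor: float = 2.0,  # alert when p99 exceeds 2x the baseline
) -> Dict[str, object]:
    """Apply the recommended gate: hard limit on p50, soft regression watch on p99."""
    p50 = percentile(samples, 50)
    p99 = percentile(samples, 99)
    return {
        "p50": p50,
        "p99": p99,
        "p50_pass": p50 <= p50_limit,
        "p99_alert": p99 > regression_factor * p99_baseline,
    }


if __name__ == "__main__":
    # Synthetic cumulative clock values, for illustration only.
    demo = ['{"wall_clock_sec": 0.0}', '{"wall_clock_sec": 0.5}',
            '{"wall_clock_sec": 1.0}', '{"wall_clock_sec": 3.0}']
    print(evaluate_gate(turn_latencies(demo)))
```

Because the deltas start from the second record, the first turn's latency is not counted in this sketch. Whether the real tool treats the first cumulative value as a sample is not stated in the patch.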