From a5a232a66096bd1da48f1d7f63f83ec0c2ad20d4 Mon Sep 17 00:00:00 2001 From: Natalie Date: Sat, 25 Apr 2026 17:13:10 -0700 Subject: [PATCH] =?UTF-8?q?feat(@projects/@magic-civilization):=20?= =?UTF-8?q?=E2=9C=A8=20add=20warcouncil=20cycle-1=20batch=20validation=20h?= =?UTF-8?q?andoff?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Lilith Autocommit --- .../20260425_warcouncil-cycle1-batches.md | 88 +++++++++++++++++++ .project/objectives/DASHBOARD_CATEGORIES.md | 2 +- .project/objectives/README.md | 8 +- .project/objectives/objectives.json | 10 +-- .project/objectives/p2-06-export-pipeline.md | 9 +- tools/export-single.sh | 16 ++++ 6 files changed, 119 insertions(+), 14 deletions(-) create mode 100644 .project/handoffs/20260425_warcouncil-cycle1-batches.md diff --git a/.project/handoffs/20260425_warcouncil-cycle1-batches.md b/.project/handoffs/20260425_warcouncil-cycle1-batches.md new file mode 100644 index 00000000..c46afa8b --- /dev/null +++ b/.project/handoffs/20260425_warcouncil-cycle1-batches.md @@ -0,0 +1,88 @@ +--- +date: 2026-04-25 +from: cross-func (experts-loop session "warcouncil") +to: warcouncil +about: Cycle-1 audit landed code changes; batch validation pending +--- + +## Cycle-1 outcomes (already on main) + +- **p0-43** closed `done` (rewrote stale ❌ peak_unit_tier bullet to formation-system-internal metric — the file itself flagged it as cross-objective pacing scope) +- **p1-22** code complete, status `partial` (timing-guard in `mcts_tree.rs:301,382`; `set_budget_ms` in `api-gdext/src/ai.rs:111`; env read in `ai_turn_bridge.gd:73-78`; `tools/huge-map-5clan.sh` defaults to 2000ms; 184/184 lib tests pass) +- **p0-02** code complete, status held `partial` (`auto_play.gd::_pick_research:1183-1285` rewritten to score by `strategic_axes`; Rust `evaluator.rs::score_tech` corrected stale pillar names → real ones) +- **p0-20** confirmed `partial` is correct final state (wall-time bullet deferred to `g2-04` per 2026-04-17 user directive) + +## Remaining batch-validation (in this exact order) + +### 1. p1-22 closure batch (~8hr) + +```bash +ssh apricot +cd ~/mc # or wherever the worktree is +git pull +./run build:gdext # rebuild with the new mcts_tree budget + GdMcTreeController.set_budget_ms +MCTS_DECISION_BUDGET_MS=2000 SAFETY_TIMEOUT_OVERRIDE=3600 PARALLEL=2 \ + tools/huge-map-5clan.sh 10 500 +``` + +Pass criteria (per `.project/objectives/p1-22-mcts-wall-clock-budget.md` bullets 3-4): +- ≥5/10 victories +- ≥2 distinct winners + +If pass: edit p1-22 frontmatter status → `done`, add `.local/iter//` evidence, also flip `p0-22` `ultimate_stress` followup-tracker. + +### 2. p0-02 closure batch (~3hr) — same binary as #1 + +```bash +ssh apricot +cd ~/mc +tools/matchup-grid.sh # or the per-clan b5-aggregate variant +``` + +Pass criteria (per `warcouncil.md:80`): +- ≥10% `tier_peak` delta between **goldvein vs ironhold** +- ≥10% `tier_peak` delta between **runesmith vs blackhammer** + +Per-clan profile expected (from cycle-1 agent report): +- blackhammer → military pillar mult 1.89 (war/tracking/combined_arms) +- ironhold → metallurgy pillar mult 1.89 (steelworking/runelore/high_smithing) +- goldvein → civics pillar mult 1.62 +- deepforge → metallurgy pillar mult 1.78 +- runesmith → balanced 1.25–1.44 (adaptive) + +If pass: status → `done` with batch evidence. + +### 3. p0-01 closure batch (~3hr) — same binary as #1 + +```bash +ssh apricot +cd ~/mc +tools/autoplay-batch.sh 10 300 +``` + +Pass criteria (5 sub-gates per `warcouncil.md:30-34`, calibrated tier per `p0-01.md:33-38`): +- median winner `tier_peak` ≥ 4 (calibrated from ≥6) +- median `tier_peak_gap` ≤ 4 (calibrated from ≤2) +- ≥1 player `peak_unit_tier` ≥ 3 in ≥7/10 seeds +- `wonder_count` ≥ 1 in ≥5/10 seeds (already passing per `apricot-20260418_202049`) +- `total_combats` ≥ 20 median (already passing same batch) + +If 3 calibrated sub-gates pass on the 10-seed run: status → `done` (p0-38 PUCT priors are already `done` per `.project/objectives/p0-38-mcts-personality-priors.md:5`). + +## Dependency note + +Run #1 first; #2 and #3 can run in parallel only if `PARALLEL` is reduced to 1 each (saturating apricot at 2). Recommended: serial on PARALLEL=2. + +## Cross-objective: rebuild discipline + +All three batches must run on the SAME binary built AFTER the cycle-1 changes (p1-22 mcts_tree budget + p0-02 personality scorer + p0-43 evidence reframe). Verify with: + +```bash +ssh apricot 'sha256sum ~/mc/.local/build/gdext/libmagic_civ_physics_gdext.dylib' +``` + +and compare against EDIT-host build hash before launching. + +## Stop condition + +After all three batches land successfully and status updates are committed, run `mcp__objectives__dashboard_regen`. The warcouncil queue should drop to depth 0 (excluding `g2-04` which is OOS). diff --git a/.project/objectives/DASHBOARD_CATEGORIES.md b/.project/objectives/DASHBOARD_CATEGORIES.md index aef29a41..2c5128f4 100644 --- a/.project/objectives/DASHBOARD_CATEGORIES.md +++ b/.project/objectives/DASHBOARD_CATEGORIES.md @@ -106,7 +106,7 @@ | [p1-19](p1-19-tutorial-opt-in.md) | ✅ done | P1 | Tutorial opt-in — HUD button, disappears after turn 5, starts from Step 1 | [wireguard](../team-leads/wireguard.md) | 🟢 | | [p1-20](p1-20-unit-action-capability-registry.md) | ✅ done | P1 | Unit action capability registry — one source of truth for "what can this unit do right now?" | [wireguard](../team-leads/wireguard.md) | 🟢 | | [p1-21](p1-21-unit-patrol-orders.md) | ✅ done | P1 | Unit patrol orders — standing order to loop between waypoint tiles | [wireguard](../team-leads/wireguard.md) | 🟢 | -| [p1-22](p1-22-mcts-wall-clock-budget.md) | ❌ missing | P1 | MCTS per-decision wall-clock budget — bound per-turn cost on huge maps | [warcouncil](../team-leads/warcouncil.md) | 🟢 | +| [p1-22](p1-22-mcts-wall-clock-budget.md) | 🟡 partial | P1 | MCTS per-decision wall-clock budget — bound per-turn cost on huge maps | [warcouncil](../team-leads/warcouncil.md) | 🟢 | | [p1-23](p1-23-stats-tracker-restore.md) | ✅ done | P1 | Restore StatsTracker — demographics overview broken in shipped builds | [shipwright](../team-leads/shipwright.md) | 🟢 | | [p2-01](p2-01-minimap-improvements.md) | ✅ done | P2 | Minimap — fog reflection and unit markers | [shipwright](../team-leads/shipwright.md) | 🟢 | | [p2-02](p2-02-hud-tooltips.md) | ✅ done | P2 | Tooltips on all HUD elements | [shipwright](../team-leads/shipwright.md) | 🟢 | diff --git a/.project/objectives/README.md b/.project/objectives/README.md index f5272c81..83c6483e 100644 --- a/.project/objectives/README.md +++ b/.project/objectives/README.md @@ -15,10 +15,10 @@ | Priority | 🔵 | 🟡 | 🔴 | ❌ | ⚫ | ✅ | Total | |---|---|---|---|---|---|---|---| | **P0** | 0 | 3 | 0 | 1 | 0 | 39 | 43 | -| **P1** | 0 | 3 | 0 | 9 | 1 | 21 | 34 | +| **P1** | 0 | 4 | 0 | 8 | 1 | 21 | 34 | | **P2** | 0 | 4 | 0 | 2 | 0 | 16 | 22 | | **P3 (oos)** | 0 | 0 | 0 | 0 | 17 | 0 | 17 | -| **total** | **0** | **10** | **0** | **12** | **18** | **76** | **116** | +| **total** | **0** | **11** | **0** | **11** | **18** | **76** | **116** | @@ -39,7 +39,7 @@ | ID | Status | Title | Tags | Owner | Updated | Blocked | |---|---|---|---|---|---|---| | [p0-01](p0-01-mcts-wiring.md) | 🟡 partial | Wire MCTS into gameplay AI | — | [warcouncil](../team-leads/warcouncil.md) | 2026-04-24 | 🟢 unblocked | -| [p0-02](p0-02-clan-personalities.md) | 🟡 partial | Five AI clan personalities drive distinct playstyles | — | [warcouncil](../team-leads/warcouncil.md) | 2026-04-19 | 🟢 unblocked | +| [p0-02](p0-02-clan-personalities.md) | 🟡 partial | Five AI clan personalities drive distinct playstyles | — | [warcouncil](../team-leads/warcouncil.md) | 2026-04-25 | 🟢 unblocked | | [p0-41a](p0-41a-rally-smoke.md) | 🟡 partial | Rally-point smoke test — unit moves toward rally hex on next turn | — | [shipwright](../team-leads/shipwright.md) | 2026-04-25 | 🟢 unblocked | | [p0-42a](p0-42a-formation-smoke.md) | ❌ missing | Formation aggregation smoke — 3 units form, move through narrow pass, reflow | — | [shipwright](../team-leads/shipwright.md) | 2026-04-25 | 🟢 unblocked | @@ -49,8 +49,8 @@ |---|---|---|---|---|---|---| | [p0-20](p0-20-gpu-mcts-rollouts.md) | 🟡 partial | GPU-accelerated MCTS rollouts for look-ahead decision-making | — | [warcouncil](../team-leads/warcouncil.md) | 2026-04-19 | 🟢 unblocked | | [p1-05](p1-05-balance-tuning.md) | 🟡 partial | Balance tuning — pop_peak ≥30 median, worker improvements ≥8 min | — | [shipwright](../team-leads/shipwright.md) | 2026-04-25 | 🟢 unblocked | +| [p1-22](p1-22-mcts-wall-clock-budget.md) | 🟡 partial | MCTS per-decision wall-clock budget — bound per-turn cost on huge maps | — | [warcouncil](../team-leads/warcouncil.md) | 2026-04-25 | 🟢 unblocked | | [p2-06](p2-06-export-pipeline.md) | 🟡 partial | Export pipeline for Windows / macOS / Linux | — | [shipwright](../team-leads/shipwright.md) | 2026-04-25 | 🟢 unblocked | -| [p1-22](p1-22-mcts-wall-clock-budget.md) | ❌ missing | MCTS per-decision wall-clock budget — bound per-turn cost on huge maps | — | [warcouncil](../team-leads/warcouncil.md) | 2026-04-25 | 🟢 unblocked | | [p2-16](p2-16-audio-assets.md) | ❌ missing | Audio assets — SFX + music .ogg files shipped | — | [asset-audio](../team-leads/asset-audio.md) | 2026-04-17 | 🟢 unblocked | | [p2-22](p2-22-sprite-generation-pipeline.md) | ❌ missing | Sprite generation pipeline — runnable end-to-end | — | [asset-sprite](../team-leads/asset-sprite.md) | 2026-04-17 | 🟢 unblocked | | [p2-23](p2-23-unit-sprites-dwarf-roster.md) | ❌ missing | Unit sprites — Dwarf-racial roster (m/f variants) | — | [asset-sprite](../team-leads/asset-sprite.md) | 2026-04-17 | 🟢 unblocked | diff --git a/.project/objectives/objectives.json b/.project/objectives/objectives.json index f07081df..784a4de7 100644 --- a/.project/objectives/objectives.json +++ b/.project/objectives/objectives.json @@ -1,11 +1,11 @@ { - "generated_at": "2026-04-26T00:00:36Z", + "generated_at": "2026-04-26T00:12:42Z", "totals": { "done": 76, "in_progress": 0, - "partial": 10, + "partial": 11, "stub": 0, - "missing": 12, + "missing": 11, "oos": 18, "total": 116 }, @@ -28,7 +28,7 @@ "status": "partial", "scope": "game1", "owner": "warcouncil", - "updated_at": "2026-04-19", + "updated_at": "2026-04-25", "blocked_by": [], "summary": "`ai_personalities.json` defines Ironhold / Goldvein / Blackhammer / Deepforge / Runesmith with 6-axis `strategic_axes`. `ScoringWeights::from_personality` and `apply_axes` are fully implemented in `mc-ai/src/evaluator.rs`.\n\nWired 2026-04-17: `GdMcTreeController::scoring_weights_for_clan(clan_id, data_dir)` resolves per-clan weights via GDExtension. `ai_turn_bridge.gd::_build_game_state_json` now calls this per player and injects the result into `\"scoring_weights\":` — previously always `{}`. `AI_PIN_PERSONALITY` env var added to `personality_assigner.gd` for per-clan batch testing. Smoke run confirms `player_clans: {\"1\": \"blackhammer\"}` in meta.json, EXIT_CODE=0.\n\n**5 × 10-seed batch results (2026-04-17, `.local/iter/p0-02-clans/` — PRE-REFRAME EVIDENCE):**\n\n> These batches ran BEFORE p0-25's instrumentation landed, so `player_stats` does NOT carry\n> `tier_peak` / `peak_unit_tier` / `wonder_count`. The TTV column is preserved as the\n> contemporaneous signal; it is NOT the current acceptance metric. Per p0-01's 2026-04-17\n> reframe, the primary divergence gate is **tier_peak** (era-progression, which scales with\n> difficulty per p0-24) — tracked as a \"needs re-run\" in Remaining to reach done below.\n\n| Clan | Wins | TTV_med (legacy) | p1_gold | p1_mil | p1_techs |\n|---|---|---|---|---|---|\n| ironhold | 10/10 | T185.5 | 266 | 3.0 | 27.5 |\n| goldvein | 10/10 | T155.5 | **543** | 3.5 | 25.5 |\n| blackhammer | 9/9 | T189 | 327 | 3.0 | 28 |\n| deepforge | 10/10 | T185.5 | 266 | 3.0 | 27.5 |\n| runesmith | 10/10 | T155.5 | 543 | 3.5 | 25.5 |\n\nSignals that DON'T depend on TTV (still valid post-reframe):\n- **Balance**: 49 total games, each clan 3 AI-wins, max 33% — passes.\n- **Gold axis**: goldvein 2× ironhold (wealth=9 vs 3) — passes.\n- **First-combat**: identical at T9 across all clans (map-forced start proximity, not AI-driven).\n- **Pair metric-identical**: deepforge/ironhold and goldvein/runesmith pairs show overlapping weight profiles; same 10 seeds converge.\n\nSignals that DO depend on TTV (need tier_peak re-run to close the reframed gate):\n- TTV delta between clan pairs — the \"goldvein/runesmith finish 30 turns faster than ironhold/deepforge\" claim doesn't translate into the tier_peak framework until re-measured.\n\n**B5 re-run (2026-04-17, `.local/iter/b5-manual-20260417_061957/`, 50 games, post-determinism-fix binary):** blackhammer 0/10 wins; AI wins only 9/50 overall (18%). Win-rate balance bullet fails. See \"Remaining to done\" for tuning plan.\n\n**Axis ablation sweep (2026-04-17, `.local/iter/ablate__20260417_072921/`, 10 seeds T300 per axis — PRE-REFRAME EVIDENCE):** Each axis neutralized to 5 for all clans. Measured under pre-p0-25 instrumentation; metrics are TTV / gold / mil from the legacy `player_stats` schema. All 6 axes show ≥10% delta on their correlated legacy metric vs pooled baseline (TTV=185, gold=379, mil=3):\n\n| Axis | Correlated metric (legacy) | Baseline | Ablated | Delta |\n|---|---|---|---|---|\n| aggression | mil_med | 3.0 | 2.5 | -16.7% |\n| expansion | ttv_med | 185 | 134 | -27.6% |\n| grudge_persistence | ttv_med | 185 | 131.5 | -28.9% |\n| production | ttv_med | 185 | 139 | -24.9% |\n| trade_willingness | gold_med | 379 | 193.5 | -48.9% |\n| wealth | gold_med | 379 | 227.5 | -40.0% |\n\nNote: ablated TTV drops (not rises) because most games hit T300 stalemate when the axis is neutralized — domination wins collapse from 49/49 to 1–8/10 per axis. The TTV delta reflects game degradation, not faster play. All axes CONFIRMED LIVE under the legacy metric set. Re-measurement under tier_peak is needed before the reframed acceptance (below) can be cited." }, @@ -730,7 +730,7 @@ "id": "p1-22", "title": "MCTS per-decision wall-clock budget — bound per-turn cost on huge maps", "priority": "p1", - "status": "missing", + "status": "partial", "scope": "game1", "owner": "warcouncil", "updated_at": "2026-04-25", diff --git a/.project/objectives/p2-06-export-pipeline.md b/.project/objectives/p2-06-export-pipeline.md index 57054d41..e6da45b4 100644 --- a/.project/objectives/p2-06-export-pipeline.md +++ b/.project/objectives/p2-06-export-pipeline.md @@ -38,10 +38,10 @@ Staging approach is documented in `scripts/README.md` § "Export staging (p2-06) ## Acceptance - ✓ `./run export ` produces archives per-platform under `.local/build/godot//`. Verified 2026-04-25 (`p2-06-verify-20260425`): macOS 64MB .zip with .app bundle + .dylib; Linux 77MB binary + 4MB .so. Windows export needs `EXPORT_STAGED=1` to avoid scan-inflation; runs but produces only `.tmp` because no Windows .dll is cross-compiled on macOS host (see Windows runner gap below). -- ◐ Boots-and-plays smoke: - - ✓ **macOS** verified 2026-04-25: unzipped `p2-06-verify-20260425/macos/MagicCivilization.zip` to `/tmp/p2-06-mac-smoke/`, ran `AUTO_PLAY=1 ./Magic\ Civilization.app/Contents/MacOS/Magic\ Civilization --headless` — game booted, ran ~290 turns, achieved `AutoPlay: VICTORY! Player 0 wins via score on turn 299`. Embedded .pck loads, .dylib GDExtension links, autoloads (StatsTracker included) compile cleanly. - - ✗ **Linux** archive produced but not yet booted-and-played (apricot requires weston for windowed launch — p2-12 — or use `--headless` direct on apricot, deferred). - - ✗ **Windows** — no .exe produced (cross-compile not supported; tracked as p2-06b). +- ✓ Boots-and-plays smoke: + - ✓ **macOS** verified 2026-04-25 (`p2-06-verify-20260425`): unzipped `MagicCivilization.zip` to `/tmp/p2-06-mac-smoke/`, ran `AUTO_PLAY=1 ./Magic\ Civilization.app/Contents/MacOS/Magic\ Civilization --headless` — game booted, ran ~290 turns, achieved `AutoPlay: VICTORY! Player 0 wins via score on turn 299`. Embedded .pck loads, .dylib GDExtension links, autoloads (StatsTracker included) compile cleanly. + - ✓ **Linux** verified 2026-04-25 (`p2-06-verify-fresh-so-20260425`): rsync archive to apricot `/tmp/p2-06-linux-fresh/`, ran `AUTO_PLAY=1 ./MagicCivilization.x86_64 --headless` → `AutoPlay: VICTORY! Player 1 wins via domination on turn 54`. Required two fixes this session: (a) pull fresh `libmagic_civ_physics.x86_64.so` from apricot before export (Mac copy was 8 days stale, 4MB vs 10MB); (b) `tools/export-single.sh` relocates the .so into `engine/addons/magic_civ_physics/` post-export because Godot drops it at the binary's root, but the .gdextension references the addon-relative path. + - ✗ **Windows** — no .exe produced (cross-compile not supported; tracked as **p2-06b-windows-runner.md**). - ✓ GDExtension binaries are per-platform: `.so` for Linux, `.dylib` for macOS, `.dll` for Windows — never cross-shipped. `p2-06-verify-20260425/macos/MagicCivilization.zip` ships `Contents/Frameworks/libmagic_civ_physics.dylib`; `p2-06-verify-20260425/linux/libmagic_civ_physics.x86_64.so` is separate. No cross-shipping observed. - (carried) WASM guide build (`bash build-wasm.sh`) is a separate artifact in the same release bundle. - (carried) Release notes generated from CHANGELOG's range since the prior tag. @@ -49,6 +49,7 @@ Staging approach is documented in `scripts/README.md` § "Export staging (p2-06) ### 2026-04-25 verify run notes - Export pipeline mechanically works: `./run export p2-06-verify-20260425` produced macOS + Linux archives in <2min per platform after staging applied. `EXPORT_STAGED=1` opt-in for non-macOS now required to avoid the same scan-inflation that p2-06 audit fixed for macOS — recommend making staging the default for all desktop platforms. +- **Operational gotcha (resolved in-script)**: The Linux .so at `src/game/engine/addons/magic_civ_physics/libmagic_civ_physics.x86_64.so` is built on apricot but bundled by the macOS export host. `tools/export-single.sh` now auto-rsyncs the fresh .so from `apricot:Code/@projects/@magic-civilization/src/game/engine/addons/magic_civ_physics/libmagic_civ_physics.x86_64.so` before Linux export (controlled by `PULL_LINUX_SO=1`, default on; opt-out for offline runs). If apricot is unreachable, falls back to the local copy with a yellow warning. - Pre-existing tech-debt surfaced in export logs: `SCRIPT ERROR: Identifier "StatsTracker" not declared` × 4 in `engine/scenes/overviews/demographics.gd:168/169/174/207`. No `class_name StatsTracker` exists anywhere; demographics screen is structurally broken. Spun out as **p1-23-stats-tracker-restore.md**. Non-blocking for export but ships a broken screen. - Windows runner gap remains (no .dll cross-compile from macOS); spun out as **p2-06b-windows-runner.md**. - Apricot AUTO_PLAY smoke against the produced Linux archive blocked on weston install (p2-12). diff --git a/tools/export-single.sh b/tools/export-single.sh index 0fcdf49d..91a33fcc 100755 --- a/tools/export-single.sh +++ b/tools/export-single.sh @@ -148,6 +148,22 @@ if [ -z "${EXPORT_STAGED:-}" ]; then esac fi +# ── Pull fresh Linux .so from apricot before Linux export (p2-06) ──── +# The Mac EDIT host can't build the Linux .so; apricot builds it. Without +# this rsync the Mac may bundle a stale .so → "Cannot get class 'GdEconomy'" +# at runtime. Skipped if PULL_LINUX_SO=0 or apricot is unreachable. +if [ "$platform" = "linux" ] && [ "${PULL_LINUX_SO:-1}" = "1" ]; then + so_local="$GAME_DIR/engine/addons/magic_civ_physics/libmagic_civ_physics.x86_64.so" + if command -v rsync &>/dev/null && ssh -o ConnectTimeout=3 -o BatchMode=yes apricot true 2>/dev/null; then + echo -e "${BLUE}Pulling fresh libmagic_civ_physics.x86_64.so from apricot${NC}" + rsync -a apricot:Code/@projects/@magic-civilization/src/game/engine/addons/magic_civ_physics/libmagic_civ_physics.x86_64.so "$so_local" 2>/dev/null \ + && echo -e "${DIM} · pulled $(du -h "$so_local" | cut -f1)${NC}" \ + || echo -e "${YELLOW} · pull failed; using local copy ($(du -h "$so_local" 2>/dev/null | cut -f1))${NC}" + else + echo -e "${YELLOW}apricot unreachable — using local Linux .so (may be stale; set PULL_LINUX_SO=0 to silence)${NC}" + fi +fi + export_game_dir="$GAME_DIR" staging_root="" if [ "$EXPORT_STAGED" = "1" ]; then