fix(@projects/@magic-civilization): 🐛 resolve apricot SIGTERM blocker

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>

parent c6bcc5ba91
commit 170cee49a1
6 changed files with 111 additions and 31 deletions
@@ -139,20 +139,35 @@ successful A5/B5 evidence in the repo.
 (100% agreement, max_drift=0.000000) across 209 inputs (16 + 65 + 128) on
 lavapipe software Vulkan. Exceeded the ≥98% tolerance bullet.
 - ✗ `AI_GPU_ROLLOUT=true ./tools/autoplay-batch.sh 10 300` wall-time drops
-≥20% vs `AI_GPU_ROLLOUT=false` — **NOT YET VERIFIED**. apricot (the only
-available RUN host) SIGTERMs any Godot flatpak cluster at 3–10s wall-clock
-(apparently host-infrastructure issue: `apricot-rail-watchdog` + user-scope
-cgroup pressure; systemd-oomd failed; reproduces under `nohup`, `setsid`,
-`systemd-run --user --scope`, and `systemd-run --user --property=KillMode=none`).
-Four failed relaunch attempts 2026-04-17 12:17 → 12:24 PDT; none of the
-games ran past T52 before external SIGTERM. Journal shows
-`warcouncil-a5.service: Unit process N (timeout) remains running after unit
-stopped` — SIGTERM came from outside the service. Needs host-side
-investigation of apricot's scope-kill daemon OR a different RUN host.
-- ✗ Victory rate on a 10-seed batch ≥60% — blocked on the same SIGTERM issue
-for fresh validation against the current binary. p0-01's evidence shows
-prior batches (pre-action-order-fix) at 80–90% victory rate; post-fix may
-differ but can't measure until SIGTERM issue resolved.
+≥20% vs `AI_GPU_ROLLOUT=false` — **NOT YET VERIFIED**. Two sequential
+blockers, first now resolved:
+- (resolved) apricot SIGTERM root-caused to cleanup cycles triggered by
+chronically-failing user services (`tor-manager`, `nightcrawler-crawl`,
+`nightcrawler-controlpanel`, `lilith-host-agent`, each with NRestarts in
+the hundreds). Off-scope handoff
+`~/.claude/handoffs/apricot-flaky-user-services-cleanup.md` executed by
+yellow session 2026-04-17 ~15:25 PDT: four services `systemctl --user
+disable --now`, plus `vpn-socks5-tunnel` re-pointed to a live endpoint.
+Sign-off batch `.local/iter/sigterm-fix-verify2-1518/` on apricot: 10/10
+`turn_stats.jsonl` + `meta.json`, zero exit-143. Response at
+`~/.claude/handoffs/apricot-flaky-user-services-cleanup-RESPONSE.md`.
+- (open) `AI_GPU_ROLLOUT` env var is not wired into runtime. Grep of
+`src/simulator/crates/mc-ai/src/`, `src/simulator/api-gdext/src/`, and
+`src/game/engine/src/modules/ai/` returns no hits; the var is referenced
+only in `tools/determinism-audit.sh`. `mc-ai/src/mcts_tree.rs::TreeState::rollout`
+is still the sole per-leaf rollout hook (serial CPU), and
+`mc-ai/src/gpu/inner.rs::batch_simulate_gpu` is a standalone function
+not called from `Tree::run_iteration`. Running the env-var comparison
+now would produce identical wall-times. **Integration work remaining:**
+thread `Option<GpuContext>` into `Tree`, dispatch leaf batches through
+`batch_simulate_gpu` when context present, plumb the flag through
+`api-gdext::ai::GdMcTreeController`, read env in `ai_turn_bridge.gd`.
+- ✗ Victory rate on a 10-seed batch ≥60% — apricot sign-off batch
+`.local/iter/sigterm-fix-verify2-1518/` on the current binary produced
+turn counts across {76, 102, 126, 143, 152, 193, 201, 204, 213, 242} but
+outcomes not yet tallied (needs `autoplay-report.py` run on the dir).
+CPU-path victory-rate gate can close as soon as that report is generated;
+GPU-path gate must wait on the integration work above.
 - ✓ wgpu version reconciled at v24 workspace-wide (`mc-turn`, `mc-compute`,
 `mc-ai --features gpu` all compile + test clean).
 - ✓ Graceful CPU fallback when no GPU adapter is detected — `GpuContext::shared()`
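The two batch gates in the acceptance list reduce to simple ratios. The TypeScript sketch below covers only that arithmetic. The `Outcome` labels and the helper names are hypothetical, and the authoritative tally still comes from `tools/autoplay-report.py`. Identical CPU and GPU wall-times, which is what the open wiring blocker currently produces, give a 0% drop and fail the 20% gate.

```typescript
// Hypothetical helpers for the two p0-20 batch gates. The Outcome labels and
// function names are illustrative assumptions; the real tally comes from
// tools/autoplay-report.py.

type Outcome = 'victory' | 'defeat' | 'timeout'

// Fractional wall-time reduction of the GPU batch relative to the CPU batch.
function wallTimeDrop(cpuSeconds: number, gpuSeconds: number): number {
  if (cpuSeconds <= 0) throw new RangeError('cpuSeconds must be positive')
  return (cpuSeconds - gpuSeconds) / cpuSeconds
}

// Wall-time gate: the AI_GPU_ROLLOUT=true batch must finish at least 20% faster.
function passesWallTimeGate(cpuSeconds: number, gpuSeconds: number): boolean {
  return wallTimeDrop(cpuSeconds, gpuSeconds) >= 0.2
}

// Victory gate: at least 60% of the seeds in the batch end in victory.
function passesVictoryGate(outcomes: readonly Outcome[], threshold = 0.6): boolean {
  if (outcomes.length === 0) return false // an empty batch proves nothing
  const wins = outcomes.filter((o) => o === 'victory').length
  return wins / outcomes.length >= threshold
}
```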
@@ -161,10 +176,23 @@ successful A5/B5 evidence in the repo.
 
 ## Remaining to reach done
 
-- Resolve apricot SIGTERM issue (host infra, NOT warcouncil scope) OR stand
-up a second RUN host without the same kill daemon, then re-run the wall-time
-comparison batch + 10-seed victory-rate batch. Everything else in the
-acceptance list has been met or verified.
+1. **Integrate GPU rollouts into the MCTS tree.** `batch_simulate_gpu` exists
+and is byte-parity-validated, but `Tree::run_iteration` still calls
+`TreeState::rollout` serially per leaf. Needed:
+- Add `Option<GpuContext>` to `Tree` (or pass via `run_iteration` config).
+- Collect a batch of leaf `AbstractRolloutState`s per iteration and
+dispatch `batch_simulate_gpu` when context is `Some`.
+- Surface creation of `GpuContext::shared()` through `api-gdext::ai`,
+gated on env var `AI_GPU_ROLLOUT=true` read in `ai_turn_bridge.gd` and
+passed down to `GdMcTreeController`.
+- CPU fallback path (when `GpuContext::shared()` returns `None`) already
+covered by the parity-test skip path — just exercise it in the runtime.
+2. **Tally CPU-path victory rate** from the sign-off batch
+`.local/iter/sigterm-fix-verify2-1518/` via `tools/autoplay-report.py`.
+Cite result in the acceptance bullet.
+3. **Run the wall-time comparison** (AI_GPU_ROLLOUT=true vs false, 10 seeds
+T=300, PARALLEL=4) after step 1 lands. Record wall-clock delta.
+4. **Run the GPU-path 10-seed victory batch** and cite ≥60% gate.
 
 ## Depends on
 
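The planned GPU integration has three parts: an optional context on `Tree`, batched leaf dispatch, and a CPU fallback. The TypeScript sketch below models the planned flow and is illustrative only; the actual change is Rust in `mc-ai`.

Every name in the sketch is a hypothetical stand-in:

- `LeafState` for `AbstractRolloutState`
- `GpuContext.batchSimulate` for `batch_simulate_gpu`
- `cpuRollout` for `TreeState::rollout`
- `null` for `Option::None`

Treating only the exact value `AI_GPU_ROLLOUT=true` as the opt-in is an assumption drawn from the plan text.

```typescript
// Illustrative model only: the real code is Rust in mc-ai. LeafState, GpuContext,
// cpuRollout and Tree are hypothetical stand-ins for AbstractRolloutState,
// GpuContext/batch_simulate_gpu, TreeState::rollout and Tree.

interface LeafState {
  seed: number
}

// Stand-in for batch_simulate_gpu: one dispatch, one value per leaf.
interface GpuContext {
  batchSimulate(leaves: readonly LeafState[]): number[]
}

// Stand-in for the serial CPU rollout (one call per leaf); the value is a placeholder.
function cpuRollout(leaf: LeafState): number {
  return (leaf.seed % 7) / 7
}

class Tree {
  // null plays the role of Option::None: no adapter detected, or the flag is off.
  private readonly gpu: GpuContext | null

  constructor(gpu: GpuContext | null) {
    this.gpu = gpu
  }

  // Evaluates the batch of leaves collected during one iteration.
  evaluateLeaves(leaves: readonly LeafState[]): number[] {
    if (this.gpu === null) {
      return leaves.map(cpuRollout) // CPU fallback path
    }
    const values = this.gpu.batchSimulate(leaves)
    if (values.length !== leaves.length) {
      throw new Error(`GPU batch returned ${values.length} values for ${leaves.length} leaves`)
    }
    return values
  }
}

// Opt-in flag as planned for ai_turn_bridge.gd: only the exact string "true" enables it.
function gpuRolloutEnabled(env: Readonly<Record<string, string | undefined>>): boolean {
  return env['AI_GPU_ROLLOUT'] === 'true'
}

// Usage: a fake context that reuses the CPU rollout must agree with the fallback.
const parityGpu: GpuContext = { batchSimulate: (ls) => ls.map(cpuRollout) }
const sampleLeaves: LeafState[] = [{ seed: 3 }, { seed: 10 }, { seed: 14 }]
const cpuValues = new Tree(null).evaluateLeaves(sampleLeaves)
const gpuValues = new Tree(parityGpu).evaluateLeaves(sampleLeaves)
console.log(cpuValues.every((v, i) => v === gpuValues[i])) // true
```

With the fallback inside `evaluateLeaves`, callers never branch on GPU availability. A fake context that delegates to the CPU rollout also gives a cheap runtime parity check alongside the existing byte-parity tests.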
@@ -67,22 +67,27 @@ a foregone conclusion; the grid is the precondition.
 - ✓ `python3 tools/test_matchup_and_ultimate.py` passes 26/26
 unit tests for matchup_balance and ultimate_stress verdict fns.
 - ✗ **`tools/matchup-grid.sh` → `matchup_balance: PASS`** — NOT yet run.
-Gated on a stable RUN host (see p0-20 for the apricot SIGTERM situation;
-batch work is blocked until host infra resolves).
+RUN host stabilized 2026-04-17 ~15:25 PDT (apricot flaky-services cleanup;
+10/10 sign-off batch clean — see p0-20 acceptance bullet for evidence
+path). Sole remaining blocker: `auto_play.gd` hardcodes 1v1 and doesn't
+honor `MAP_SIZE` / `NUM_PLAYERS` env vars, so the script can't target
+an asymmetric clan pair.
 - ✗ **`tools/huge-map-5clan.sh` → `ultimate_stress: PASS`** — NOT yet run.
-Depends on matchup_balance passing first AND the game binary honoring
-the new `MAP_SIZE=standard` / `NUM_PLAYERS=5` env vars.
+Same blocker as above — needs `MAP_SIZE=standard` and `NUM_PLAYERS=5`
+honored by the game binary. matchup_balance does not strictly precede
+this bullet for mechanical reasons, but the user has stated matchup_balance
+is the precondition per the "deeper validation" rationale in p0-02.
 
 ## Remaining to reach done
 
-1. **RUN host stable enough for sustained flatpak-Godot batches**
-— tracked in p0-20's "remaining" section; SIGTERM-at-3-to-10s on
-apricot blocks every game-binary test, including matchup-grid +
-ultimate.
-2. **Game binary reads `MAP_SIZE` and `NUM_PLAYERS` env** — currently
-`auto_play.gd` hardcodes a 1v1 setup. Needs minimal wiring to read
-the env vars and size the player array / pick the map.
-3. **MAX_PLAYERS POD expansion** — NOT a blocker for p0-22 (the Civ5
+1. **Game binary reads `MAP_SIZE` and `NUM_PLAYERS` env.** `auto_play.gd`
+currently hardcodes a 1v1 setup. Needs minimal wiring to read the env
+vars and size the player array / pick the map. This is the sole
+remaining blocker for both acceptance bullets.
+2. **Run matchup-grid** (C(5,2)=10 pairs × seeds). Cite verdict.
+3. **Run huge-map-5clan** (5 clans on Civ5 `standard` 80×52 map).
+Cite verdict.
+4. **MAX_PLAYERS POD expansion** — NOT a blocker for p0-22 (the Civ5
 `standard` 80×52 runs 8 players but our 5-clan ultimate only needs
 5). If we later want to run the actual canonical `huge` (128×80,
 12-player) with 8+ AI, the POD's 4-slot-per-entry layout needs
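Two pieces of the p0-22 plan are mechanical:

- reading `MAP_SIZE` / `NUM_PLAYERS`, falling back to today's 1v1 behaviour when they are unset;
- enumerating the C(5,2) = 5 × 4 / 2 = 10 clan pairs the grid runs, each pair then repeated across seeds.

The TypeScript sketch below rests on stated assumptions. `GameSetup`, `readSetup`, the 2-player default and the placeholder clan ids are all hypothetical, and the real wiring belongs in `auto_play.gd`.

```typescript
// Hypothetical sketch of the env wiring and the matchup-grid enumeration.
// GameSetup, readSetup, the 2-player default and the clan ids are assumptions
// for illustration; the real change belongs in auto_play.gd.

interface GameSetup {
  mapSize: string | null // null = keep whatever map auto_play.gd picks today
  numPlayers: number
}

function readSetup(env: Readonly<Record<string, string | undefined>>): GameSetup {
  const rawPlayers = env['NUM_PLAYERS']
  const numPlayers = rawPlayers === undefined ? 2 : Number(rawPlayers) // 2 = current 1v1
  // Reject NaN, Infinity, fractions and anything below a 1v1.
  if (!isFinite(numPlayers) || Math.floor(numPlayers) !== numPlayers || numPlayers < 2) {
    throw new RangeError(`NUM_PLAYERS must be an integer >= 2, got ${rawPlayers}`)
  }
  return { mapSize: env['MAP_SIZE'] ?? null, numPlayers }
}

// All unordered pairs without self-matchups: C(n, 2) = n(n - 1) / 2 entries.
function unorderedPairs<T>(items: readonly T[]): Array<[T, T]> {
  const pairs: Array<[T, T]> = []
  items.forEach((a, i) => {
    for (const b of items.slice(i + 1)) pairs.push([a, b])
  })
  return pairs
}

const clans = ['clan-1', 'clan-2', 'clan-3', 'clan-4', 'clan-5'] // placeholder ids
const grid = unorderedPairs(clans)
console.log(grid.length) // 10
```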
@@ -14,6 +14,22 @@ evidence:
 
 Split out from p2-09 per user directive. Separate agent owns guide-web going forward; `owner:` is unclaimed for that agent to pick up.
 
+## Deferral note (2026-04-17, user directive)
+
+**Not a current priority.** User scoped the three deploy tiers as:
+
+- **Dev** — `pnpm dev` on the contributor's current machine (plum, apricot,
+or wherever). Local only, no infra work.
+- **Staging** — `https://mc.next.black.local` via the Tourguide-owned
+pipeline in p1-15. LAN/VPN-only; this is the "prod-like" deploy for the
+moment. All production-shaped deploy testing happens here.
+- **Prod (this objective)** — public-internet hosting (GitHub Pages /
+Cloudflare Pages / S3 / ...). **Deferred until Early Access ship
+decision.** Don't invest agent time here until the user re-prioritises.
+
+Keep `tools/deploy-guide.sh` intact as authored — it already has `zip`
+mode that produces a handoff artifact for whichever public host wins.
+
 ## Summary
 
 Separate from p2-09 (which covers the build being clean): this objective covers choosing a public host and running the deploy. Currently the deploy script is ready (`tools/deploy-guide.sh` — modes `build` / `serve` / `apricot` / `zip`), but no public host has been committed for Early Access. The `apricot` mode ships dist/ to the LAN for preview; `zip` produces a handoff artifact that any external host can consume.
public/games/age-of-dwarves/guide/.env.development (Normal file, +9)
@@ -0,0 +1,9 @@
+# Vite dev-mode env (tracked). Loaded automatically by `pnpm dev`.
+#
+# VITE_DEV_GUIDE=1 makes the guide render every <EpisodeGate min={N}> subtree
+# and append Episodes 2-5 to the sidebar. Keeps contributor-facing dev runs
+# in "all episodes" mode so scope drift is visible early. Production builds
+# ignore this file — see .env.production (+ the explicit override
+# `VITE_DEV_GUIDE=1` in `./run deploy:guide:next` for the mc.next.black.local
+# dev-preview deploy).
+VITE_DEV_GUIDE=1
public/games/age-of-dwarves/guide/.env.production (Normal file, +6)
@@ -0,0 +1,6 @@
+# Vite prod-mode env (tracked). Loaded automatically by `pnpm build`.
+#
+# Production build = Episode 1 only (Age of Dwarves Early Access). The
+# dev-preview deploy at mc.next.black.local overrides VITE_DEV_GUIDE in the
+# shell env (see `./run deploy:guide:next`).
+VITE_DEV_GUIDE=0
@@ -1,9 +1,24 @@
 import type { NavGroup } from '@magic-civ/guide-engine'
 import episodes from '@resources/episodes.json'
-import { EPISODE_COLORS } from '@magic-civ/guide-engine'
+import {
+  EPISODE_COLORS,
+  EP2_NAV,
+  EP3_NAV,
+  EP4_NAV,
+  EP5_NAV,
+} from '@magic-civ/guide-engine'
 
 const [ep1] = episodes
 
+// When VITE_DEV_GUIDE=1 (dev server + dev-preview deploy at mc.next.black.local),
+// append Episode 2-5 nav groups from the shared guide-engine so contributors
+// see the full multi-episode structure. Production Game 1 build leaves them
+// out. Routes for Ep2+ pages are provided by their own guide shells
+// (public/games/age-of-kzzkyt, public/games/age-of-elves); clicking them
+// here falls through to the wildcard redirect-to-home — known behavior
+// until a federated-route solution lands (future tourguide work).
+const SHOW_ALL_EPISODES = import.meta.env.VITE_DEV_GUIDE === '1'
+
 export const NAV: NavGroup[] = [
   // ─── Common (cross-episode) ───────────────────────────────────────────────
   {
@@ -111,4 +126,5 @@ export const NAV: NavGroup[] = [
       { to: '/playing/lenses', icon: '🔍', label: 'Lenses' },
     ],
   },
+  ...(SHOW_ALL_EPISODES ? [...EP2_NAV, ...EP3_NAV, ...EP4_NAV, ...EP5_NAV] : []),
 ]
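The two `.env` files and the `SHOW_ALL_EPISODES` check rely on three pieces of documented Vite behaviour:

- The dev server loads `.env.development` and `vite build` loads `.env.production`.
- Variables already set in the shell take priority over `.env` files, which is how `./run deploy:guide:next` can force `VITE_DEV_GUIDE=1` on a production-mode build.
- Loaded `VITE_*` values reach `import.meta.env` as strings.

The last point is why the check compares against `'1'`: the string `'0'` is truthy. Below is a framework-free sketch of the gate. The `NavGroup` shape is a simplified hypothetical stand-in for guide-engine's real type.

```typescript
// Simplified, hypothetical stand-in for guide-engine's NavGroup type.
interface NavGroup {
  title: string
}

// Vite exposes loaded VITE_* values as strings, so Boolean('0') would be true.
// Only the exact string '1' turns the all-episodes view on.
function isDevGuide(raw: string | undefined): boolean {
  return raw === '1'
}

// Mirrors the conditional spread at the end of NAV: the later-episode groups are
// appended only when the flag is on; otherwise a copy of the base groups is returned.
function buildNav(base: readonly NavGroup[], laterEpisodes: NavGroup[][], showAll: boolean): NavGroup[] {
  return showAll ? base.concat(...laterEpisodes) : base.slice()
}

const ep1Groups: NavGroup[] = [{ title: 'Common' }]
const later: NavGroup[][] = [[{ title: 'Ep2' }], [{ title: 'Ep3' }], [{ title: 'Ep4' }], [{ title: 'Ep5' }]]
console.log(buildNav(ep1Groups, later, isDevGuide('0')).length) // 1
console.log(buildNav(ep1Groups, later, isDevGuide('1')).length) // 5
```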