From 2e331d2b071f823eac1ec6d8bc340676f79dcae2 Mon Sep 17 00:00:00 2001
From: Natalie
Date: Tue, 5 May 2026 14:06:40 -0400
Subject: [PATCH] =?UTF-8?q?feat(@projects):=20=E2=9C=A8=20add=20async=20ba?=
 =?UTF-8?q?tch=20protocol=20docs?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Lilith Autocommit
---
 .../p2-64-apricot-async-batch-protocol.md     | 49 +++++++++++++++++++
 1 file changed, 49 insertions(+)
 create mode 100644 .project/objectives/p2-64-apricot-async-batch-protocol.md

diff --git a/.project/objectives/p2-64-apricot-async-batch-protocol.md b/.project/objectives/p2-64-apricot-async-batch-protocol.md
new file mode 100644
index 00000000..86a19ef6
--- /dev/null
+++ b/.project/objectives/p2-64-apricot-async-batch-protocol.md
@@ -0,0 +1,49 @@
+---
+id: p2-64
+title: Apricot async batch protocol — launch / status / fetch decoupling
+priority: p2
+status: stub
+scope: game1
+category: infra
+owner: simulator-infra
+created: 2026-05-05
+updated_at: 2026-05-05
+blocked_by: []
+follow_ups: []
+---
+
+## Context
+
+Today `scripts/apricot-run.sh` runs a single synchronous flow: fetch → worktree → build → batch → fetch verdict → cleanup. The orchestration runs on plum and SSHes to apricot multiple times. When apricot connectivity drops mid-run (intermittent network, sshd channel saturation, sleep/wake), the local script aborts before fetching results, even though the apricot-side godot processes continue and write results to `~/.cache/mc-batches//` regardless.
+
+This couples the job lifecycle to a live SSH session and forces every wake to run expensive SSH probes. With intermittent connectivity, the orchestration is lost even when no batch work was.
+
+## Acceptance
+
+- ❌ `scripts/apricot-run.sh launch ` — fires the orchestration entirely on apricot via `systemd-run --user --unit=mc-batch- --collect`. Returns immediately with `STAMP=` on stdout (one line, scriptable). The systemd unit owns the build + batch lifecycle and survives SSH disconnects.
+- ❌ `scripts/apricot-run.sh status ` — single short SSH probe (`ConnectTimeout=5`), structured stdout: `{"state":"running|complete|failed|unreachable","seeds_done":N,"seeds_total":M,"completion_marker":bool}`. Tolerates SSH timeouts (returns `unreachable` on probe failure).
+- ❌ `scripts/apricot-run.sh fetch ` — `rsync -a --partial` pulls `~/.cache/mc-batches//` to `.local/iter//`. Resumable. Exits 1 if the batch isn't complete yet (so callers can retry).
+- ❌ Existing synchronous modes (`smoke`, `huge-map-5clan`, `ai-quality-baseline-pre-c`, etc.) keep working — `launch` is a new sub-mode that wraps them, not a replacement. Backwards-compat for callers that DO want to block.
+- ❌ Documentation in the `scripts/apricot-run.sh` header + a short example snippet in `.claude/instructions/canonical-commands.md` showing the launch/status/fetch loop.
+- ❌ `mc-batch-.service` systemd-unit template lives at `scripts/dev-setup/mc-batch.service.in` (or inline in apricot-run.sh) — instantiated per-stamp via `systemd-run --user --unit=...`, with `KillMode=mixed` + `TimeoutStopSec=10s` for clean shutdown.
+
+## Source-of-truth rails
+
+- **Bash crate**: `scripts/apricot-run.sh` owns the protocol. No GDScript dependency.
+- **systemd unit**: `--user` scope (per-user lifecycle), `--collect` (the transient unit is unloaded once it exits, even if it failed), `--unit=mc-batch-` (named for status query).
+- **Result location**: existing `~/.cache/mc-batches//` — no new path.
+- **Status output**: JSON, machine-readable. No prose.
+- **No backwards-compat shim** for the old single-call pattern — both work, the user picks.
+
+## Out of scope
+
+- Cross-host queueing (only one batch can run at a time on apricot today; that's fine).
+- Retry/resume of the build phase (if apricot reboots mid-build, the unit fails and a new `launch` is required).
+- Result-collection across multiple stamps in one fetch call (fetch batch by batch).
+
+## References
+
+- `scripts/apricot-run.sh` — current synchronous flow.
+- `.claude/instructions/canonical-commands.md` — apricot batch invocation patterns.
+- `.claude/instructions/two-host-workflow.md` — EDIT vs RUN host discipline.
+- p1-22, p2-44, p1-38 — recent objectives that hit batch-orchestration friction.
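
The launch/status/fetch loop that the Acceptance section asks to document might look roughly like the sketch below. It is not the planned implementation. The argument order (a mode for `launch`, the returned stamp for `status` and `fetch`) is an assumption, since the objective does not spell it out, and `APRICOT_RUN`, `POLL_SECS`, `MAX_POLLS`, `status_state`, and `run_batch` are hypothetical names introduced only for illustration. The `sed` parsing assumes the status JSON is emitted compactly on one line, as written in the acceptance item.

```shell
#!/usr/bin/env bash
# Caller-side sketch of the launch/status/fetch loop.
# Hypothetical names: APRICOT_RUN, POLL_SECS, MAX_POLLS, status_state, run_batch.
# Assumed (not confirmed by the objective): launch takes a mode; status and
# fetch take the stamp printed by launch.

APRICOT_RUN="${APRICOT_RUN:-scripts/apricot-run.sh}"
POLL_SECS="${POLL_SECS:-60}"     # delay between status probes, in seconds
MAX_POLLS="${MAX_POLLS:-120}"    # give up after this many probes

# Read the one-line status JSON on stdin and print its "state" value.
status_state() {
  sed -n 's/.*"state":"\([a-z]*\)".*/\1/p'
}

# Launch a mode, poll until the batch completes or fails, then fetch results.
run_batch() {
  mode="$1"
  # launch is expected to print a single STAMP=... line on stdout
  stamp="$("$APRICOT_RUN" launch "$mode" | sed -n 's/^STAMP=//p')"
  if [ -z "$stamp" ]; then
    echo "launch returned no STAMP line" >&2
    return 1
  fi

  polls=0
  while [ "$polls" -lt "$MAX_POLLS" ]; do
    state="$("$APRICOT_RUN" status "$stamp" | status_state)"
    case "$state" in
      complete)
        # propagate fetch's exit status to the caller
        "$APRICOT_RUN" fetch "$stamp"
        return
        ;;
      failed)
        echo "batch $stamp failed" >&2
        return 1
        ;;
      *)
        # running, unreachable, or unparseable output: wait and probe again
        sleep "$POLL_SECS"
        ;;
    esac
    polls=$((polls + 1))
  done
  echo "gave up waiting for batch $stamp" >&2
  return 1
}
```

A caller that sources this file could then run `run_batch smoke`. Treating `unreachable` the same as `running` is the point of the decoupling: a dropped probe costs one poll interval, not the batch.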