feat(@projects): ✨ add async batch protocol docs
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
parent fbaca9de95
commit 2e331d2b07
1 changed files with 49 additions and 0 deletions
49
.project/objectives/p2-64-apricot-async-batch-protocol.md
Normal file
@@ -0,0 +1,49 @@
---
id: p2-64
title: Apricot async batch protocol — launch / status / fetch decoupling
priority: p2
status: stub
scope: game1
category: infra
owner: simulator-infra
created: 2026-05-05
updated_at: 2026-05-05
blocked_by: []
follow_ups: []
---

## Context

Today `scripts/apricot-run.sh` runs a single synchronous flow: fetch → worktree → build → batch → fetch verdict → cleanup. The orchestration runs on plum and SSHes to apricot multiple times. When apricot connectivity drops mid-run (intermittent network, sshd channel saturation, sleep/wake), the local script aborts before fetching results, even though the apricot-side godot processes keep running and write their results to `~/.cache/mc-batches/<stamp>/` regardless.

This couples the job lifecycle to a live SSH session and forces every wake to do expensive ssh probes. With intermittent connectivity, the orchestration is lost even when no work was lost.

## Acceptance

- ❌ `scripts/apricot-run.sh launch <mode> <args>`: fires the orchestration entirely on apricot via `systemd-run --user --unit=mc-batch-<stamp> --collect`. Returns immediately with `STAMP=<value>` on stdout (one line, scriptable). The systemd unit owns the build + batch lifecycle and survives SSH disconnects.
- ❌ `scripts/apricot-run.sh status <stamp>`: single short SSH probe (`ConnectTimeout=5`) with structured stdout: `{"state":"running|complete|failed|unreachable","seeds_done":N,"seeds_total":M,"completion_marker":bool}`. Tolerates SSH timeouts (returns `unreachable` on probe failure).
- ❌ `scripts/apricot-run.sh fetch <stamp>`: `rsync -a --partial` pulls `~/.cache/mc-batches/<stamp>/` to `.local/iter/<stamp>/`. Resumable. Exits 1 if the batch isn't complete yet (so callers can retry).
- ❌ Existing synchronous modes (`smoke`, `huge-map-5clan`, `ai-quality-baseline-pre-c`, etc.) keep working: `launch` is a new sub-mode that wraps them, not a replacement. This preserves backwards compatibility for callers that do want to block.
- ❌ Documentation in the `scripts/apricot-run.sh` header, plus a short example snippet in `.claude/instructions/canonical-commands.md` showing the launch/status/fetch loop.
- ❌ The `mc-batch-<stamp>.service` systemd-unit template lives at `scripts/dev-setup/mc-batch.service.in` (or inline in `apricot-run.sh`). It is instantiated per stamp via `systemd-run --user --unit=...`, with `KillMode=mixed` + `TimeoutStopSec=10s` for clean shutdown.
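
The caller side of the launch/status/fetch loop described above could look roughly like the following. This is a minimal sketch, not the script's actual interface: it assumes the sub-commands behave exactly as the acceptance items state, that the status JSON is emitted compactly on one line (so `sed` can pick out `state` without `jq`), and the helper names `parse_state`, `launch_batch`, `wait_for_batch` plus the `POLL_SECONDS` / `MAX_POLLS` knobs are hypothetical.

```shell
#!/usr/bin/env bash
# Caller-side sketch of the launch / status / fetch loop.
# Assumption: APRICOT_RUN implements the sub-commands as described in the acceptance list.
APRICOT_RUN="${APRICOT_RUN:-scripts/apricot-run.sh}"
POLL_SECONDS="${POLL_SECONDS:-60}"   # hypothetical knob: delay between status probes
MAX_POLLS="${MAX_POLLS:-120}"        # hypothetical knob: give up after this many probes

# Extract the "state" field from the compact one-line status JSON on stdin.
parse_state() {
  sed -n 's/.*"state":"\([a-z]*\)".*/\1/p'
}

# Launch a batch and print its stamp (launch emits STAMP=<value> on stdout).
launch_batch() {
  "$APRICOT_RUN" launch "$@" | sed -n 's/^STAMP=//p'
}

# Poll status until the batch completes (0) or fails (1).
# "running", "unreachable", and unparsable output are all retried;
# returns 2 if MAX_POLLS probes pass without a terminal state.
wait_for_batch() {
  local stamp="$1" state tries=0
  while [ "$tries" -lt "$MAX_POLLS" ]; do
    state="$("$APRICOT_RUN" status "$stamp" | parse_state)"
    case "$state" in
      complete) return 0 ;;
      failed)   return 1 ;;
    esac
    tries=$((tries + 1))
    sleep "$POLL_SECONDS"
  done
  return 2
}
```

With these helpers sourced, one full cycle would be `stamp="$(launch_batch smoke)" && wait_for_batch "$stamp" && "$APRICOT_RUN" fetch "$stamp"`. Treating `unreachable` as "try again later" is the point of the decoupling: a dropped connection delays the fetch but does not abort the batch.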

## Source-of-truth rails

- **Bash crate**: `scripts/apricot-run.sh` owns the protocol. No GDScript dependency.
- **systemd unit**: `--user` scope (per-user lifecycle), `--collect` (unloads the transient unit once it finishes, even if it failed, so no stale units accumulate), `--unit=mc-batch-<stamp>` (named for status query).
- **Result location**: existing `~/.cache/mc-batches/<stamp>/`. No new path.
- **Status output**: JSON, machine-readable. No prose.
- **No backwards-compat shim needed**: the old single-call pattern stays as-is alongside the new sub-commands. Both work, and the caller picks.
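
To make the systemd and status rails concrete, here is a hedged sketch of how the transient-unit command line and the JSON status line might be assembled. The `systemd-run` flags (`--user`, `--collect`, `--unit=`, `-p`) are real options; everything else is an assumption for illustration: the stamp format in `make_stamp`, the apricot-side entry point `scripts/mc-batch-runner.sh`, and the helper names `launch_argv` and `status_json` do not come from the objective.

```shell
#!/usr/bin/env bash
# Sketch: build the systemd-run invocation for `launch` and the JSON line for `status`.
# Hypothetical: RUNNER path, stamp format, and all function names below.
RUNNER="${RUNNER:-scripts/mc-batch-runner.sh}"   # hypothetical apricot-side entry point

# Assumed stamp format (UTC timestamp); the real script may use something else.
make_stamp() {
  date -u +%Y%m%dT%H%M%SZ
}

# Print the systemd-run command line, each word shell-quoted so it can be
# passed to ssh as a single remote command string.
launch_argv() {
  local stamp="$1" mode="$2"
  shift 2
  printf '%q ' systemd-run --user --collect \
    --unit="mc-batch-${stamp}" \
    -p KillMode=mixed -p TimeoutStopSec=10s \
    -- "$RUNNER" "$stamp" "$mode" "$@"
  printf '\n'
}

# Emit the one-line status JSON in the shape the acceptance list specifies.
# Args: state, seeds done, seeds total, completion marker (true|false).
status_json() {
  local state="$1" seeds_done="${2:-0}" seeds_total="${3:-0}" marker="${4:-false}"
  printf '{"state":"%s","seeds_done":%d,"seeds_total":%d,"completion_marker":%s}\n' \
    "$state" "$seeds_done" "$seeds_total" "$marker"
}
```

A `launch` implementation could then run something like `stamp="$(make_stamp)"; ssh apricot "$(launch_argv "$stamp" smoke)" && echo "STAMP=$stamp"`, and `status` could fall back to `status_json unreachable` whenever the `ssh -o ConnectTimeout=5` probe fails. Because `--collect` unloads the unit after it finishes, a status probe probably cannot rely on the unit still being queryable once the batch is done; the completion marker under `~/.cache/mc-batches/<stamp>/` would be the durable signal.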

## Out of scope

- Cross-host queueing (only one batch can run at a time on apricot today; that's fine).
- Retry/resume of the build phase (if apricot reboots mid-build, the unit fails and a new `launch` is required).
- Result collection across multiple stamps in one fetch call (fetch stays one batch at a time).

## References

- `scripts/apricot-run.sh`: current synchronous flow.
- `.claude/instructions/canonical-commands.md`: apricot batch invocation patterns.
- `.claude/instructions/two-host-workflow.md`: EDIT vs RUN host discipline.
- p1-22, p2-44, p1-38: recent objectives that hit batch-orchestration friction.