From 1a4588279e3ce322b30acc3a026c9faffec44222 Mon Sep 17 00:00:00 2001
From: Natalie
Date: Sat, 27 Jun 2026 15:14:24 -0400
Subject: [PATCH] docs(agents): document dist:image (incremental rebuild) + dist:prune in cloud-dx-do

Adds the two new verbs to the table and rewrites the iteration section
as a cost-tiered ladder (dist:sync seconds / dist:image ~8min / --cold
~20min) so agents reach for the incremental rebuild, not a cold packer
build per change.

Co-Authored-By: Claude Opus 4.8
---
 .../dot-claude/instructions/cloud-dx-do.md | 24 +++++++++++++------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/tooling/claude/dot-claude/instructions/cloud-dx-do.md b/tooling/claude/dot-claude/instructions/cloud-dx-do.md
index 6f841df9..a0b9340b 100644
--- a/tooling/claude/dot-claude/instructions/cloud-dx-do.md
+++ b/tooling/claude/dot-claude/instructions/cloud-dx-do.md
@@ -13,6 +13,8 @@
 | `./run dist:sim [turns] [--destroy-after]` | fan seeded sims across workers via `autoplay-batch.sh` `AUTOPLAY_HOST`+`SEED_OFFSET`; results merge in `.local/iter//` |
 | `./run dist:render ` | render a proof scene (software weston + Mesa, **no GPU**) and pull the PNG back — replaces the dead apricot `$SCREENSHOT_HOST` |
 | `./run dist:sync [ref]` | `git pull` + rebuild gdext on **live** workers (mid-session code change, no image rebuild) |
+| `./run dist:image [--cold]` | **(re)build the golden image — incremental by default** (layers on the last snapshot, ~8 min; provision.sh is idempotent so only the delta rebuilds). `--cold` = from stock Ubuntu (~20 min), reset cruft |
+| `./run dist:prune [keep=2]` | delete superseded golden snapshots (~$0.40/mo each); keeps the newest N |
 | `./run dist:down` | tear the fleet down → **$0** |
 | `./run forge:up` / `forge:down` | Forgejo origin: restore-from-snapshot / snapshot+destroy (~$6/mo or ~$0.30 idle) |
 | `./run forge:dns` | `/etc/hosts` shortcut → `http://mcforge:3000` |
@@ -32,15 +34,23 @@
 - **Golden-image rebuild is rare** (only on toolchain/base change, ~20 min). Day-to-day = `dist:up` → `dist:sync` → `dist:test`/`dist:sim` → `dist:down`. Prefer the **warm-worker session pattern**: one `dist:up`, many tasks, one `dist:down`.
 - Workers are Linux x86_64; their `.so` is **not** usable on plum's macOS Godot (plum builds its own `.dylib`). Offload to DO for *tests/sims/render/linux-build validation*, not for plum's native artifact.

-## Iterating on the golden image — DON'T rebuild per fix
+## Iterating on the golden image — rebuild only what changed

-A full `packer build` is ~20 min. Avoid the rebuild-per-fix trap (it cost ~2h once):
+**Tiered by cost — pick the cheapest that covers your change:**

-- **Validate `provision.sh` changes on a LIVE box first.** Spin one (`./run dist:up 1`, or a cheap throwaway droplet), ssh in, run the new steps by hand until they work, *then* bake into `provision.sh` and rebuild **once**.
-- **`packer build -on-error=ask`** keeps the build droplet alive on failure so you debug in place (fix the step, re-run) instead of rebuilding cold.
-- **Batch all known fixes into one rebuild** — never one build per bug.
-- **Check prerequisites before building**: test-create the target size first (the `/v2/sizes` `available` flag lies), confirm the forge is up + reachable, confirm creds/env.
-- **Code changes need NO rebuild** — `./run dist:sync` git-pulls + rebuilds gdext on live workers in seconds. The image rebuilds **only** when the toolchain / base image / accelerators change (rare). Day-to-day is `dist:sync`, not `packer build`.
+```
+code change      → ./run dist:sync          seconds  (git pull + gdext rebuild on LIVE workers)
+tool/dep change  → ./run dist:image         ~8 min   (incremental: layers on the last snapshot)
+cruft reset      → ./run dist:image --cold  ~20 min  (from stock Ubuntu; rare)
+```
+
+`./run dist:image` is incremental by default — it builds **from the newest `mc-golden` snapshot**, and `provision.sh` is idempotent (skips already-installed toolchain/apt/mold), so only the delta rebuilds (measured 7m55s vs ~20 cold). **Never hand-run `packer build` from stock Ubuntu per fix** (that cost ~2h once).
+
+- **Validate `provision.sh` changes on a LIVE box first** when unsure — `./run dist:up 1`, ssh in, run the new steps by hand until green, *then* bake + `dist:image` once.
+- **`packer build -on-error=ask`** keeps the build droplet alive on failure so you debug in place instead of cold-rebuilding.
+- **Batch known fixes into one `dist:image`** — never one build per bug.
+- **Check prerequisites first**: test-create the target size (the `/v2/sizes` `available` flag lies), confirm the forge is reachable, confirm creds/env.
+- **`./run dist:prune`** after a few rebuilds — incremental builds accumulate snapshots (~$0.40/mo each); keeps the newest 2.

 ## Relation to `canonical-commands.md`
 Those raw `ssh "$AUTOPLAY_HOST" cargo …` forms still work — set `AUTOPLAY_HOST=mc@` from `.local/fleet/inventory` after `dist:up`. But `./run dist:*` is preferred: it manages the fleet lifecycle, readiness wait, and teardown.