From 68099051b8f68a605aa7cfc2cbe015e983ba8df3 Mon Sep 17 00:00:00 2001
From: Natalie
Date: Sat, 27 Jun 2026 14:37:07 -0400
Subject: [PATCH] docs(agents): add 'avoid per-fix image rebuilds' iteration discipline to cloud-dx-do

Validate provision.sh on a live box first; packer -on-error=ask; batch
fixes; check size/forge prereqs before building; code via dist:sync
(image rebuild only for toolchain/accelerator changes).

Co-Authored-By: Claude Opus 4.8
---
 tooling/claude/dot-claude/instructions/cloud-dx-do.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tooling/claude/dot-claude/instructions/cloud-dx-do.md b/tooling/claude/dot-claude/instructions/cloud-dx-do.md
index 02e541c2..6f841df9 100644
--- a/tooling/claude/dot-claude/instructions/cloud-dx-do.md
+++ b/tooling/claude/dot-claude/instructions/cloud-dx-do.md
@@ -32,6 +32,16 @@
 - **Golden-image rebuild is rare** (only on toolchain/base change, ~20 min). Day-to-day = `dist:up` → `dist:sync` → `dist:test`/`dist:sim` → `dist:down`. Prefer the **warm-worker session pattern**: one `dist:up`, many tasks, one `dist:down`.
 - Workers are Linux x86_64; their `.so` is **not** usable on plum's macOS Godot (plum builds its own `.dylib`). Offload to DO for *tests/sims/render/linux-build validation*, not for plum's native artifact.
 
+## Iterating on the golden image — DON'T rebuild per fix
+
+A full `packer build` is ~20 min. Avoid the rebuild-per-fix trap (it cost ~2h once):
+
+- **Validate `provision.sh` changes on a LIVE box first.** Spin one (`./run dist:up 1`, or a cheap throwaway droplet), ssh in, run the new steps by hand until they work, *then* bake into `provision.sh` and rebuild **once**.
+- **`packer build -on-error=ask`** keeps the build droplet alive on failure so you debug in place (fix the step, re-run) instead of rebuilding cold.
+- **Batch all known fixes into one rebuild** — never one build per bug.
+- **Check prerequisites before building**: test-create the target size first (the `/v2/sizes` `available` flag lies), confirm the forge is up + reachable, confirm creds/env.
+- **Code changes need NO rebuild** — `./run dist:sync` git-pulls + rebuilds gdext on live workers in seconds. The image rebuilds **only** when the toolchain / base image / accelerators change (rare). Day-to-day is `dist:sync`, not `packer build`.
+
 ## Relation to `canonical-commands.md`
 
 Those raw `ssh "$AUTOPLAY_HOST" cargo …` forms still work — set `AUTOPLAY_HOST=mc@` from `.local/fleet/inventory` after `dist:up`. But `./run dist:*` is preferred: it manages the fleet lifecycle, readiness wait, and teardown.
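
Reviewer note: the rebuild-or-sync rule in the added section can be sketched as a small shell helper. This is only an illustration under stated assumptions: `run_step`, `needs_rebuild`, `apply_change`, the change-kind labels, and the `golden-image.pkr.hcl` template name are hypothetical and not part of the repo; only `./run dist:sync` and `packer build -on-error=ask` come from the text. It defaults to a dry run that prints the command instead of executing it.

```shell
#!/usr/bin/env bash
# Sketch of the "sync for code, rebuild only for toolchain/base/accelerators" rule.
# DRY_RUN=1 (the default) only prints the command that would run.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical template name; the real Packer template is not named in the doc.
PACKER_TEMPLATE="${PACKER_TEMPLATE:-golden-image.pkr.hcl}"

# Print the command in dry-run mode, otherwise execute it.
run_step() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# Succeeds (exit 0) only for change kinds that require a new golden image.
# The labels are illustrative: code | toolchain | base-image | accelerator.
needs_rebuild() {
  case "$1" in
    toolchain|base-image|accelerator) return 0 ;;
    *) return 1 ;;
  esac
}

# Route a change to either one image rebuild or a fast sync on live workers.
apply_change() {
  if needs_rebuild "$1"; then
    # -on-error=ask keeps the build droplet alive on failure for in-place debugging.
    run_step packer build -on-error=ask "$PACKER_TEMPLATE"
  else
    run_step ./run dist:sync
  fi
}
```

With the defaults, `apply_change code` prints `would run: ./run dist:sync`, while `apply_change toolchain` prints the `packer build -on-error=ask` line; setting `DRY_RUN=0` would execute the command instead.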