docs(agents): teach specialists the DigitalOcean fleet is the RUN host

New cloud-dx-do.md (dist:*/forge:* verbs, setup state, gotchas: size tier,
exfil autoMode gate, always dist:down, linux-only .so). Wired into the CLAUDE.md
router, specialist-preamble (all specialists), canonical-commands banner, and the
instructions README index/tree.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
Natalie 2026-06-27 13:55:03 -04:00
parent 04fabbc1c2
commit e9e8a8220c
7 changed files with 82 additions and 2 deletions

View file

@ -75,6 +75,28 @@ echo "=== [4/7] toolchain via scripts/dev-setup/linux.sh ==="
# we use GitLab CI, not a forgejo runner, so keep it false. # we use GitLab CI, not a forgejo runner, so keep it false.
as_user "cd ~/$REPO_PATH && WITH_RUNNER=false bash scripts/dev-setup/linux.sh" as_user "cd ~/$REPO_PATH && WITH_RUNNER=false bash scripts/dev-setup/linux.sh"
echo "=== [4b/7] build accelerators: mold linker + sccache ==="
# mold: much faster linking of the big GDExtension cdylib. sccache: caches rustc
# outputs so fresh workers reuse compiled crates. Both configured ONLY for the
# build user on the worker (Linux) — never touches plum's macOS .cargo config.
MOLD_OK=false; apt-get -o DPkg::Lock::Timeout=300 install -y mold && MOLD_OK=true
SCCACHE_OK=false
as_user "source ~/.cargo/env && (command -v sccache >/dev/null || cargo binstall -y sccache >/dev/null 2>&1 || cargo install sccache)" && SCCACHE_OK=true
mkdir -p "/home/$BUILD_USER/.cargo"
{
if $MOLD_OK; then
echo '[target.x86_64-unknown-linux-gnu]'
echo 'rustflags = ["-C", "link-arg=-fuse-ld=mold"]'
echo
fi
if $SCCACHE_OK; then
echo '[build]'
echo 'rustc-wrapper = "sccache"'
fi
} > "/home/$BUILD_USER/.cargo/config.toml"
chown "$BUILD_USER:$BUILD_USER" "/home/$BUILD_USER/.cargo/config.toml"
echo " mold=$MOLD_OK sccache=$SCCACHE_OK"
echo "=== [5/7] python RL deps ===" echo "=== [5/7] python RL deps ==="
as_user "pip3 install --user --break-system-packages -r ~/$REPO_PATH/tooling/rl_self_play/requirements.txt || pip3 install --user -r ~/$REPO_PATH/tooling/rl_self_play/requirements.txt" as_user "pip3 install --user --break-system-packages -r ~/$REPO_PATH/tooling/rl_self_play/requirements.txt || pip3 install --user -r ~/$REPO_PATH/tooling/rl_self_play/requirements.txt"

View file

@ -45,6 +45,7 @@ Modules live at `.claude/instructions/<file>.md` (symlink resolves to `tooling/c
| Picking, dispatching, parallelizing & verifying specialist agents | `agents-task-map.md` | | Picking, dispatching, parallelizing & verifying specialist agents | `agents-task-map.md` |
| Running commands on EDIT vs RUN host, env vars, rsync | `two-host-workflow.md` | | Running commands on EDIT vs RUN host, env vars, rsync | `two-host-workflow.md` |
| Running tests/builds via ssh to the RUN host | `canonical-commands.md` | | Running tests/builds via ssh to the RUN host | `canonical-commands.md` |
| **Offloading builds/tests/sims/render to cloud compute — the DigitalOcean fleet (`./run dist:*` / `forge:*`), the current RUN host** | `cloud-dx-do.md` |
| Forgejo vs Gitea terminology, `.forgejo/workflows/` | `forgejo-vs-gitea.md` | | Forgejo vs Gitea terminology, `.forgejo/workflows/` | `forgejo-vs-gitea.md` |
| `./run` commands, screenshots, `.env.*` | `task-runner.md` | | `./run` commands, screenshots, `.env.*` | `task-runner.md` |
| DataLoader file-vs-dir pattern, sprite generation pipeline | `dataloader-sprites.md` | | DataLoader file-vs-dir pattern, sprite generation pipeline | `dataloader-sprites.md` |

View file

@ -29,6 +29,7 @@ tooling/claude/
├── agents-task-map.md ├── agents-task-map.md
├── two-host-workflow.md ├── two-host-workflow.md
├── canonical-commands.md ├── canonical-commands.md
├── cloud-dx-do.md
├── forgejo-vs-gitea.md ├── forgejo-vs-gitea.md
├── task-runner.md ├── task-runner.md
├── dataloader-sprites.md ├── dataloader-sprites.md
@ -58,6 +59,7 @@ tooling/claude/
| `agents-task-map.md` | Choosing which specialist to dispatch | ~450 | | `agents-task-map.md` | Choosing which specialist to dispatch | ~450 |
| `two-host-workflow.md` | EDIT vs RUN host, env vars, rsync safety | ~750 | | `two-host-workflow.md` | EDIT vs RUN host, env vars, rsync safety | ~750 |
| `canonical-commands.md` | Running tests, builds, sims via ssh to RUN host | ~300 | | `canonical-commands.md` | Running tests, builds, sims via ssh to RUN host | ~300 |
| `cloud-dx-do.md` | DigitalOcean compute/render fleet — `./run dist:*` / `forge:*` (current RUN host) | ~900 |
| `forgejo-vs-gitea.md` | CI workflows, runner setup, forge terminology | ~300 | | `forgejo-vs-gitea.md` | CI workflows, runner setup, forge terminology | ~300 |
| `task-runner.md` | `./run` commands, screenshots, `.env.*` | ~300 | | `task-runner.md` | `./run` commands, screenshots, `.env.*` | ~300 |
| `dataloader-sprites.md` | JSON data layout, sprite generation pipeline | ~300 | | `dataloader-sprites.md` | JSON data layout, sprite generation pipeline | ~300 |

View file

@ -2,6 +2,8 @@
**Load when:** running Rust tests, Godot tests, sims, or builds. These must run FROM the EDIT host and execute ON the RUN host via ssh — never run the raw `cargo`/`flatpak`/`build-gdext.sh` commands directly on the EDIT host. **Load when:** running Rust tests, Godot tests, sims, or builds. These must run FROM the EDIT host and execute ON the RUN host via ssh — never run the raw `cargo`/`flatpak`/`build-gdext.sh` commands directly on the EDIT host.
> **The RUN host is now the DigitalOcean fleet** (apricot/black are down). **Prefer the `./run dist:*` verbs — see `cloud-dx-do.md`.** `./run dist:up 1` boots a beefy worker (waits for readiness), then `dist:test` / `dist:sim` / `dist:render`, then `dist:down`. The ssh table below is the underlying mechanism — set `AUTOPLAY_HOST=mc@<ip>` from `.local/fleet/inventory` after `dist:up`.
For env var setup (`AUTOPLAY_HOST`, `PROJECT_ROOT_REMOTE`, etc.) see `two-host-workflow.md`. For env var setup (`AUTOPLAY_HOST`, `PROJECT_ROOT_REMOTE`, etc.) see `two-host-workflow.md`.
| Intent | Canonical command (from EDIT host) | | Intent | Canonical command (from EDIT host) |

View file

@ -0,0 +1,38 @@
# Cloud DX — DigitalOcean compute/render fleet (the current RUN host)
**Load when:** running Rust builds/tests, headless sims, RL training, or render proofs on cloud compute. The home RUN hosts (apricot GPU, black CPU) are down; **DigitalOcean is the RUN host now**, driven by `./run dist:*` / `./run forge:*`.
## The verbs (run from the EDIT host = plum; auto-registered via `scripts/run/{dist,forge}.sh`)
| Verb | Does |
|---|---|
| `./run dist:check` | offline-validate the IaC — `terraform fmt`+`validate`+mocked `terraform test`. **No token, no spend.** Run anytime. |
| `./run dist:up <N> [size] [region]` | boot N workers from the golden image; **waits for cloud-init readiness** before returning |
| `./run dist:test` | `cargo test --workspace` (nextest) on a worker |
| `./run dist:build` | `cargo build` + WASM on a worker; rsync the WASM back (native `.so` is linux-only, stays on the worker) |
| `./run dist:sim <games> [turns] [--destroy-after]` | fan seeded sims across workers via `autoplay-batch.sh` `AUTOPLAY_HOST`+`SEED_OFFSET`; results merge in `.local/iter/<stamp>/` |
| `./run dist:render <res://scene.tscn> <out.png>` | render a proof scene (software weston + Mesa, **no GPU**) and pull the PNG back — replaces the dead apricot `$SCREENSHOT_HOST` |
| `./run dist:sync [ref]` | `git pull` + rebuild gdext on **live** workers (mid-session code change, no image rebuild) |
| `./run dist:down` | tear the fleet down → **$0** |
| `./run forge:up` / `forge:down` | Forgejo origin: restore-from-snapshot / snapshot+destroy (~$6/mo or ~$0.30 idle) |
| `./run forge:dns` | `/etc/hosts` shortcut → `http://mcforge:3000` |
## Standing setup (already built — proven 2026-06-27)
- **Forge**: `mc-forge` droplet running Forgejo; repo `mcadmin/magicciv`; IP + admin creds in `~/.vault/mc_forge_creds`.
- **Golden image**: Packer `infra/packer/`, auto-discovered by the fleet (snapshot name prefix `mc-golden`). Bakes: toolchain (via `scripts/dev-setup/linux.sh`) + prebuilt GDExtension `.so` + warm Godot import + **weston/Mesa render stack** + **mold + sccache** build accelerators + the fleet ssh key in `mc`'s `authorized_keys`.
- **Fleet TF**: `infra/terraform/test-fleet/` — DO provider, golden-image data-source discovery, grouped under the `mc:dev` DO project, mocked-provider test suite.
- **Secrets**: `~/.vault/{do_pat_mc, mc_forge_creds}` (600). Key `~/.ssh/id_mc_fleet` (DO key `mc-fleet`).
## Gotchas every agent must respect
- **Default worker size is `s-8vcpu-16gb-amd`** (8 vCPU AMD). The account tier restricts `c-*` and non-amd 8 vCPU+ Basic sizes → `422 size restricted`. Don't pick those without a DO tier ticket.
- **Exfil hard-deny**: an agent cannot push/clone the private repo onto a fresh cloud box unless the **`autoMode` trust block** is present in `.claude/settings.local.json` (owner-added by hand — the agent can't self-grant). With it + **creds via `PKR_VAR_*`/`TF_VAR_*` env, never on argv**, `packer build`/`terraform apply`/`git push` run fine. If you hit a "data exfiltration" denial, the trust block is missing — stop and tell the owner.
- **Always `./run dist:down`** when done. DO bills a droplet while it *exists* — powering off does NOT stop billing; only destroy does.
- **Golden-image rebuild is rare** (only on toolchain/base change, ~20 min). Day-to-day = `dist:up``dist:sync``dist:test`/`dist:sim``dist:down`. Prefer the **warm-worker session pattern**: one `dist:up`, many tasks, one `dist:down`.
- Workers are Linux x86_64; their `.so` is **not** usable on plum's macOS Godot (plum builds its own `.dylib`). Offload to DO for *tests/sims/render/linux-build validation*, not for plum's native artifact.
## Relation to `canonical-commands.md`
Those raw `ssh "$AUTOPLAY_HOST" cargo …` forms still work — set `AUTOPLAY_HOST=mc@<ip>` from `.local/fleet/inventory` after `dist:up`. But `./run dist:*` is preferred: it manages the fleet lifecycle, readiness wait, and teardown.
Full design + cost model: `~/.claude/plans/flickering-riding-blum.md`. Memory: `project_cloud_test_fleet`. cocotte replica handoff: `~/Code/@projects/@cocottetech/docs/CLOUD_DX_HANDOFF.md`.

View file

@ -32,7 +32,7 @@ Layer specifics: **`rust-source-of-truth.md`** (Rust/crates), **`gdscript-conven
"Looks done" is not done. Match the proof to what you changed: "Looks done" is not done. Match the proof to what you changed:
- **Rust logic**`cargo test -p <crate>` green (set `CARGO_PROFILE_DEV_DEBUG=0 CARGO_PROFILE_TEST_DEBUG=0`). Commands: **`canonical-commands.md`**. - **Rust logic**`cargo test -p <crate>` green (set `CARGO_PROFILE_DEV_DEBUG=0 CARGO_PROFILE_TEST_DEBUG=0`). Commands: **`canonical-commands.md`**. The RUN host is the **DigitalOcean fleet** now (apricot/black down) — offload via `./run dist:test` / `dist:sim` / `dist:render`; see **`cloud-dx-do.md`**.
- **Sim behavior** → run the **headless play loop** (view/act/end_turn) to get ground truth — don't launch the UI to "prove" the sim. Don't guess intent; run it. - **Sim behavior** → run the **headless play loop** (view/act/end_turn) to get ground truth — don't launch the UI to "prove" the sim. Don't guess intent; run it.
- **Golden tests** → a sequencing change *may* move goldens; re-pin **intentionally** with a comment saying why, and re-verify determinism. Never re-pin blind. - **Golden tests** → a sequencing change *may* move goldens; re-pin **intentionally** with a comment saying why, and re-verify determinism. Never re-pin blind.
- **UI / live-game / rendered change** → a **render-proof** is required before "done" (the phase gate): **`phase-gate-protocol.md`**. Headless cannot prove a rendered change. - **UI / live-game / rendered change** → a **render-proof** is required before "done" (the phase gate): **`phase-gate-protocol.md`**. Headless cannot prove a rendered change.

View file

@ -58,7 +58,22 @@
"Bash(awk '{print $1}')", "Bash(awk '{print $1}')",
"Bash(rm -f magic_civ_gameplay_demo.zip)", "Bash(rm -f magic_civ_gameplay_demo.zip)",
"Bash(zip -j magic_civ_gameplay_demo.zip magic_civ_gameplay_arc_*.png magic_civ_demo_*.png)", "Bash(zip -j magic_civ_gameplay_demo.zip magic_civ_gameplay_arc_*.png magic_civ_demo_*.png)",
"Bash(unzip -l magic_civ_gameplay_demo.zip)" "Bash(unzip -l magic_civ_gameplay_demo.zip)",
"Bash(git push:*)",
"Bash(packer init:*)",
"Bash(packer build:*)",
"Bash(terraform apply:*)",
"Bash(terraform destroy:*)"
]
},
"autoMode": {
"environment": [
"$defaults",
"The self-hosted Forgejo at 159.203.170.249:3000 (droplet 'mc-forge', DigitalOcean project 'mc:dev') is the repo owner's OWN forge holding this project's source. DigitalOcean droplets in the 'mc:dev' project are the owner's own build/test/render infrastructure. Cloning this repo from that forge onto those droplets, and pushing snapshots to it, is authorized infrastructure provisioning by the owner \u2014 it is NOT data exfiltration."
],
"allow": [
"$defaults",
"git/packer/terraform operations moving THIS project's source between the owner's machine, the mc-forge Forgejo (159.203.170.249), and DigitalOcean droplets in the mc:dev project are authorized provisioning, not exfiltration."
] ]
} }
} }