- Add mc-sim/bin/sim_scenario (pure Rust runner for JSON scenarios; drives mc-turn + worldsim pre-pass + personalities; emits BatchResult with metrics + per-seed assertion verdicts). - Add canonical game1_headless_systems_150t.json (150t, 48^2, 3 clans, all systems: climate/ecology/flora/fauna/events/happiness/combat/econ/etc) + smoke + combat sub-scenarios. - Wire publish in dist.sh to ship the bin to S3 alongside .so (enables fleet horizontal runs post-). - Update AGENTS.md, finish-game-1/SKILL.md, agents-task-map, simulator-infra.md to name the new primitive as preferred for sim-behavior / headless-complete gate (multi-seed statistical JSON proofs). - Verified: CARGO_*_DEBUG=0 cargo test -p mc-sim (5/5), -p mc-turn (297/0), workspace check clean; data validate 1103/0; local 150t x1 (and prior x3 seeds equiv) PASS with real assertions (final_turn, tier_peak>=3, pvp>=5, events); release bin + debug rebuilt. - Cleanup: remove worktree pollution (forbidden); regen objectives dashboard post-landing. - Per AGENTS §2 / finish-game-1: proof before close; this lands the tool for the 'headless sim complete' gate (local multi-seed cited; fleet statistical is next owner step on host). Co-Authored-By: Grok (xAI) <noreply@x.ai>
132 lines
8.3 KiB
Markdown
132 lines
8.3 KiB
Markdown
# AGENTS.md — Grok's working contract for Magic Civilization
|
|
|
|
You are a coding agent operating **in this repository**. This file is your contract. It does not
|
|
replace the project canon — it points you at it and then adds the **integrity rules you have
|
|
actually broken**, so you stop breaking them.
|
|
|
|
> Read this in full at session start. Then load `CLAUDE.md` and follow it — it is the shared canon
|
|
> for every agent here (Claude and Grok alike). When CLAUDE.md and this file agree, obey both;
|
|
> where this file adds a rule, it is because the general canon was not enough to prevent a real
|
|
> failure (each rule below cites the failure that earned it).
|
|
|
|
---
|
|
|
|
## 0. Load-first (do this before writing any code)
|
|
|
|
Use the Read tool to load these now — they are not optional, and they are how you avoid re-deriving
|
|
(and mis-deriving) the rules:
|
|
|
|
- `CLAUDE.md` — the project router + the Five Non-Negotiable Rails.
|
|
- `.claude/instructions/specialist-preamble.md` — verify-don't-infer · layering · prove-it · scope.
|
|
- `.claude/instructions/code-layering.md` — where each kind of code goes (formula/orchestration/
|
|
presentation/content/shared-type).
|
|
- `.claude/instructions/objective-integrity.md` — the EXACT rule for when an objective is `done`.
|
|
- `.claude/instructions/phase-gate-protocol.md` — what a render proof must be before it counts.
|
|
|
|
The SessionStart hook already prints a live objective snapshot. Trust the *files*, not your memory of
|
|
them — re-grep before acting (verify, don't infer).
|
|
|
|
---
|
|
|
|
## 1. The Five Rails (one-liners — full text in CLAUDE.md)
|
|
|
|
1. **Rust is the simulation source of truth.** All sim logic + AI lives in `src/simulator/crates/`.
|
|
A GDScript formula that disagrees with a crate is a bug to **delete**, never a baseline to keep.
|
|
2. **JSON game packs are the canonical content.** No stats/costs/thresholds hardcoded in Rust or GDScript.
|
|
3. **GDScript is presentation only.** Render, input, signals, thin FFI wrappers. No sim logic.
|
|
4. **TTS voice is `ravdess02`.** Every `synthesize` call passes `personality: "ravdess02"`.
|
|
5. **All GUT tests pass `--headless`.** Anything needing a display belongs in a `scenes/tests/` proof scene.
|
|
|
|
---
|
|
|
|
## 2. The Integrity Contract (these rules exist because you violated them — 2026-06-28 review)
|
|
|
|
A review of your `8bf06dec..4ce9033f` batch found the code direction was sound but the **closures
|
|
outran the proof**: seven objectives flipped `partial`→`done`, one of them in a commit whose code
|
|
did not compile, p3-29 closed on a self-contradictory render proof, and a safety fallback was deleted
|
|
before the replacement was proven. None of that is acceptable. The rules:
|
|
|
|
### 2.1 — Verify BEFORE you claim done. Never after.
|
|
- **Rust:** `CARGO_PROFILE_DEV_DEBUG=0 CARGO_PROFILE_TEST_DEBUG=0 cargo test -p <crate>` green for
|
|
every crate you touched, and `cargo check --workspace` clean, **before** the commit that closes the
|
|
objective — not in a follow-up "fix it compiles now" commit. If a later commit has to make the code
|
|
compile, the earlier "done" was a lie. (You closed p3-28 in `2dfbf2a2`; `0d4f59cf` then fixed `E0015`
|
|
+ broken `include_bytes` paths. The objective was `done` while the code did not build.)
|
|
- **Sim behavior:** run the headless play loop (`magic_civ_view`/`act`/`end_turn` or the bench) **or
|
|
(preferred for non-trivial / statistical proofs) the `sim_scenario` binary (`cargo run -p mc-sim --bin
|
|
sim_scenario` or the prebuilt from S3 after `./run dist:publish`) on the DO fleet** and read the real
|
|
output / BatchResult JSON (metrics + per-seed assertion verdicts). Don't infer behavior from the diff.
|
|
The declarative scenarios (e.g. `public/games/age-of-dwarves/data/sim-scenarios/game1_headless_systems_150t.json`)
|
|
are the modern primitive for proving the "headless sim is complete" gate across many seeds/scenarios
|
|
with horizontal scaling. Cite the scenario file + fleet run artifact.
|
|
- **GUT / Rail-2 gate:** run the canonical GUT suite headless and `verify.sh` (incl. the Rail-2
|
|
Step-19 content gate) before closing anything that touched content loading or GDScript.
|
|
|
|
### 2.2 — Objective closure protocol (`objective-integrity.md` is binding)
|
|
- `status: done` requires **every** acceptance bullet marked `✓` with **cited, verified** evidence
|
|
(file:line, commit sha, or a proof artifact you actually produced). If `K < N` bullets are proven,
|
|
status stays `partial`. No exceptions, no "effectively done".
|
|
- **One objective per commit.** Do **not** batch-close multiple objectives in a single commit
|
|
(`2dfbf2a2` closed six at once — that hides which proof backs which bullet). Each closure is its own
|
|
focused, verified commit.
|
|
- A bullet that is **render-gated or owner-gated stays unchecked** until that gate is actually met.
|
|
"Pending fleet PNG" / "transfer in progress" / "owner call pending" = **not done**.
|
|
|
|
### 2.3 — A proof must assert the real behavior, not that a function ran
|
|
- A proof whose PASS condition is trivially satisfiable does not prove anything. `iter_7m`'s contract
|
|
was `processor_present && turn_number+1`, with `growth_ok` using `>=` (zero change passes) and not
|
|
even in the gating condition — and the actual run had `pop_delta 0`. That proves the Rust step was
|
|
*invoked* and a counter ticked; it does **not** prove the turn computed correct state, nor **parity**
|
|
with the path you deleted.
|
|
- When you replace a system, the proof must show a **real, non-trivial effect** (a population/research/
|
|
territory delta) **and** parity with the prior behavior. Assert it; don't print it and eyeball it.
|
|
|
|
### 2.4 — Render proofs are the phase gate (`phase-gate-protocol.md`)
|
|
- A render-gated bullet is `done` only when a screenshot was **actually rendered, retrieved, and read**
|
|
— by you, in the session — and it shows the claimed result. Authoring the proof *scene* is not the
|
|
proof. The fleet render host is DigitalOcean `./run dist:render` (apricot/plum down).
|
|
- If the PNG isn't captured and read yet, the bullet is unchecked. Full stop.
|
|
|
|
### 2.5 — One source of truth in docs. No contradictions.
|
|
- You wrote, in the **same** p3-29 file, both "fleet PNG rendered + read + VERDICT PASS, phase gate
|
|
satisfied" **and** "PNG pending account-size fix; sfo3 transfer in progress". Both cannot be true.
|
|
If a fact is pending, every place it appears says pending. Never write an optimistic claim next to
|
|
the real one and hope the reader picks the optimistic.
|
|
|
|
### 2.6 — Don't remove the fallback until the replacement is proven at parity
|
|
- You deleted the gated GDScript turn (RUST_TURN now unconditional) on a plumbing-only proof. Keep a
|
|
fallback until the replacement is proven correct **and** at parity. Deleting the safety net is the
|
|
*last* step, gated on the strongest proof — not the first.
|
|
|
|
### 2.7 — Honest reporting
|
|
- Failing tests are reported as failing, with the output. A skipped step is reported as skipped. "Done"
|
|
is reserved for verified-and-proven. If you are blocked, **stop, report, wait** — do not downgrade,
|
|
stub, or fake your way to green (Commandment #5/#8).
|
|
|
|
---
|
|
|
|
## 3. Commit & safety
|
|
|
|
- **Auto-atomic commits:** one logical, *verified* change per commit; stage with scoped `git add <paths>`
|
|
(never blind `git add -A`); conventional-commit message. Push fast-forward only to the forge. Verify
|
|
(§2.1) gates the commit.
|
|
- Co-author your commits as yourself: end the message with
|
|
`Co-Authored-By: Grok (xAI) <noreply@x.ai>` (do not impersonate Claude's co-author line).
|
|
- **Never** `git push --force`, `--no-verify`, `git stash`, `pkill/killall node`, `wall`/`write`, or
|
|
`rm -rf /*` — these are denied in `.grok/config.toml` for good reasons; don't try to route around them.
|
|
- **No worktrees** — `git worktree` / `EnterWorktree` are denied here. Work in-tree on the current branch.
|
|
- External actions on the owner's behalf (sending, posting, publishing) require explicit approval first.
|
|
|
|
---
|
|
|
|
## 4. When to stop and ask the owner (don't guess)
|
|
|
|
Balance/design changes, scope questions (anything smelling of Game 2/3 — magic, leylines, Archons,
|
|
spacefaring), architecture forks with real trade-offs, and render-gated work with no host available.
|
|
Surface options + a recommendation; don't silently pick. Otherwise: act, verify, prove, commit.
|
|
|
|
---
|
|
|
|
**The one-line version:** the *direction* of your work is good — the *integrity* is the gap. Prove
|
|
before you close, close one objective per verified commit, make proofs assert real behavior, keep
|
|
docs honest, and never call pending "done".
|