magicciv

applications/magicciv

Commit graph

Author	SHA1	Message	Date

Natalie

78574007e0

docs(agents): require Opus self-review handoff before Grok's next tick

Wire scripts/grok-review.sh into Grok's contract as the mandatory last step at
the 'I'm done' boundary: when Grok thinks a batch/objective/session is finished,
it hands off to an independent model (Claude Opus) that re-runs the cited gates
and updates objective status before the next tick. Self-grading is the §2 failure
mode; a second model closes it.

- AGENTS.md §5: 'Before the next tick — hand off to the independent Opus reviewer'
  (finished == finished AND Opus-reviewed; read the verdict, don't re-close around it).
- finish-game-1 SKILL.md: loop step 9 mirrors the handoff at session end.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-28 14:49:09 -04:00

Natalie

9e32eedfa1

feat(sim): land sim_scenario declarative harness + scenarios for headless Game 1 proof gate

- Add mc-sim/bin/sim_scenario (pure Rust runner for JSON scenarios; drives mc-turn + worldsim pre-pass + personalities; emits BatchResult with metrics + per-seed assertion verdicts).
- Add canonical game1_headless_systems_150t.json (150t, 48^2, 3 clans, all systems: climate/ecology/flora/fauna/events/happiness/combat/econ/etc) + smoke + combat sub-scenarios.
- Wire publish in dist.sh to ship the bin to S3 alongside .so (enables fleet horizontal runs post-).
- Update AGENTS.md, finish-game-1/SKILL.md, agents-task-map, simulator-infra.md to name the new primitive as preferred for sim-behavior / headless-complete gate (multi-seed statistical JSON proofs).
- Verified: CARGO_*_DEBUG=0 cargo test -p mc-sim (5/5), -p mc-turn (297/0), workspace check clean; data validate 1103/0; local 150t x1 (and prior x3 seeds equiv) PASS with real assertions (final_turn, tier_peak>=3, pvp>=5, events); release bin + debug rebuilt.
- Cleanup: remove worktree pollution (forbidden); regen objectives dashboard post-landing.
- Per AGENTS §2 / finish-game-1: proof before close; this lands the tool for the 'headless sim complete' gate (local multi-seed cited; fleet statistical is next owner step on host).

Co-Authored-By: Grok (xAI) <noreply@x.ai>

2026-06-28 14:24:38 -04:00
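
The commit above describes a runner that emits a batch result with per-seed assertion verdicts. As a rough illustration of that idea only, here is a minimal Python sketch. It is not the actual sim_scenario code (which the commit says is pure Rust), and it does not reflect the real scenario JSON schema: the names `Assertion`, `evaluate_seed`, and `evaluate_batch` are hypothetical, the thresholds for `final_turn` and `events` are assumptions (the commit only gives `tier_peak>=3` and `pvp>=5`), and the sample metrics are invented inputs, not results from any real run.

```python
import operator
from dataclasses import dataclass

# Comparison operators a scenario file might name (hypothetical vocabulary).
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq}


@dataclass(frozen=True)
class Assertion:
    metric: str       # e.g. "tier_peak"
    op: str           # one of the keys in OPS
    threshold: float  # value the metric is compared against


def evaluate_seed(metrics, assertions):
    """Return {assertion label: passed?} for one seed's metrics."""
    verdicts = {}
    for a in assertions:
        label = f"{a.metric}{a.op}{a.threshold}"
        value = metrics.get(a.metric)
        # A metric missing from the run counts as a failure, not a silent pass.
        verdicts[label] = value is not None and OPS[a.op](value, a.threshold)
    return verdicts


def evaluate_batch(per_seed_metrics, assertions):
    """Evaluate every seed; the batch passes only if every assertion holds on every seed."""
    per_seed = {
        seed: evaluate_seed(metrics, assertions)
        for seed, metrics in per_seed_metrics.items()
    }
    # An empty batch proves nothing, so it does not pass.
    passed = bool(per_seed) and all(all(v.values()) for v in per_seed.values())
    return {"per_seed": per_seed, "passed": passed}


# Assertion names follow the commit message; the final_turn and events
# thresholds are assumptions made for this sketch.
ASSERTIONS = [
    Assertion("final_turn", "==", 150),
    Assertion("tier_peak", ">=", 3),
    Assertion("pvp", ">=", 5),
    Assertion("events", ">=", 1),
]

# Invented metrics for two seeds, for illustration only.
SAMPLE = {
    1: {"final_turn": 150, "tier_peak": 3, "pvp": 7, "events": 12},
    2: {"final_turn": 150, "tier_peak": 2, "pvp": 9, "events": 4},
}

result = evaluate_batch(SAMPLE, ASSERTIONS)
```

In this sketch a single failing assertion on any seed fails the whole batch, and missing metrics or an empty batch count as failures. That matches the "proof before close" intent stated in the commit, but whether the real tool aggregates this strictly or statistically across seeds is not stated in the source.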

Natalie

ef168a511d

docs(agents): add AGENTS.md — Grok's integrity contract (verify-before-done, no batch-closures, real proofs)

Grok runs in this repo via the grok CLI but had no dedicated instruction
file (only the SessionStart orient hook), which let the 2026-06-28 review's
failure modes through: 7 objectives closed ahead of proof, one in a
non-compiling commit, p3-29 closed on a contradictory render proof, fallback
deleted before parity. AGENTS.md layers an Integrity Contract on the existing
canon (CLAUDE.md + rails): verify before done, one objective per verified
commit, proofs must assert real behavior + parity, honest docs, keep the
fallback until the replacement is proven.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-28 11:41:56 -04:00

3 commits