Wire scripts/grok-review.sh into Grok's contract as the mandatory last step at
the 'I'm done' boundary: when Grok thinks a batch/objective/session is finished,
it hands off to an independent model (Claude Opus) that re-runs the cited gates
and updates objective status before the next tick. Self-grading is the §2 failure
mode; a second model closes it.
- AGENTS.md §5: 'Before the next tick — hand off to the independent Opus reviewer'
(finished == finished AND Opus-reviewed; read the verdict, don't re-close around it).
- finish-game-1 SKILL.md: loop step 9 mirrors the handoff at session end.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Grok runs in this repo via the grok CLI but had no dedicated instruction
file (only the SessionStart orient hook), which let the 2026-06-28 review's
failure modes through: 7 objectives closed ahead of proof, one in a
non-compiling commit, p3-29 closed on a contradictory render proof, fallback
deleted before parity. AGENTS.md layers an Integrity Contract on the existing
canon (CLAUDE.md + rails): verify before done, one objective per verified
commit, proofs must assert real behavior + parity, honest docs, keep the
fallback until the replacement is proven.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>