feat(sim-scenarios): full scenario catalog + schema + docs (pre-calibration spec)

Declarative simulation-test scenarios for horizontal proving on the DO fleet.
Two kinds: combat_setpiece (hand-authored tactical board, known outcome) and
fullgame (seeded full-game, invariant/liveness/determinism/balance assertions).

- 10 combat set-pieces (data/sim-scenarios/combat/): rush/walls/pyrrhic, ranged
  kite, fortified hill, castle vs double-rush, siege catapult, last-stand,
  flanking, formation-vs-loose.
- 10 fullgame (data/sim-scenarios/fullgame/): smoke, determinism, expansion,
  time-to-tier, economy invariant, no-soft-lock, trade, culture borders, clan
  fairness band, broad 150t systems run.
- sim-scenarios.schema.json validates both kinds; assertion vocab enumerated,
  each mapped to a real engine signal (cities_captured, pvp_kills, surviving
  units, gold/pop, traded_luxuries, tech tier).
- All clan personalities are the REAL 8 (balanced/boom/expansionist/merchant/
  militarist/rusher/tech_rusher/turtle); the prior draft's ironhold/goldvein
  were fabricated.
- SIM_SCENARIOS.md: S3->fleet pipeline, full catalog, schema, calibration rule
  (assertion values calibrated against real runs, never invented). Router wired.

Removed the two old fake-schema drafts (smoke_duel_30t, game1_headless_systems_150t)
whose assertions rode on fabricated metrics. Runner + calibration follow.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Natalie 2026-06-28 14:48:24 -04:00
parent a976394e6e
commit b6e365c95d
22 changed files with 544 additions and 63 deletions

@ -0,0 +1,23 @@
{
"id": "castle_holds_double_rush",
"kind": "combat_setpiece",
"version": 1,
"description": "Tier-3 fortification (castle: +50 city_hp, +5 defense, ranged_defense) lets 2 warriors hold a capital against a DOUBLED rush (6 archers + 4 warriors) that would crush tier-1 walls. Proves wall-tier HP/defense scaling.",
"map": { "size": 16 },
"defender": {
"player": "B",
"capital": { "col": 8, "row": 8, "population": 5 },
"buildings": [ "walls", "castle" ],
"garrison": [ { "unit": "warrior", "count": 2 } ]
},
"attacker": {
"player": "A",
"approach_from": [4, 8],
"stack": [ { "unit": "archer", "count": 6 }, { "unit": "warrior", "count": 4 } ]
},
"max_turns": 16,
"expect": [
{ "type": "capital_held", "by": "B" },
{ "type": "defender_survivors", "op": ">=", "value": 1 }
]
}

@ -0,0 +1,24 @@
{
"id": "flanking_two_axis",
"kind": "combat_setpiece",
"version": 1,
"description": "4 attacking warriors split onto two opposing approach hexes (flanking/support bonus) vs 2 defenders, compared to a single-axis assault. Two-axis attackers take fewer losses. Proves flanking/support combat bonus.",
"map": { "size": 16 },
"defender": {
"player": "B",
"garrison": [ { "unit": "warrior", "count": 2, "at": [8, 8] } ]
},
"attacker": {
"player": "A",
"stack": [
{ "unit": "warrior", "count": 2, "at": [6, 8] },
{ "unit": "warrior", "count": 2, "at": [10, 8] }
],
"flank": true
},
"max_turns": 14,
"expect": [
{ "type": "defender_survivors", "op": "<=", "value": 0 },
{ "type": "attacker_survivors", "op": ">=", "value": 3 }
]
}

@ -0,0 +1,21 @@
{
"id": "formation_vs_loose",
"kind": "combat_setpiece",
"version": 1,
"description": "A 5-unit warrior FORMATION (combat scaling HP x n, ATK x n^0.75) vs 5 LOOSE warriors of the same type. The formation should win and keep more units. Proves formation aggregation + combat scaling.",
"map": { "size": 16 },
"defender": {
"player": "B",
"garrison": [ { "unit": "warrior", "count": 5, "at": [9, 8] } ]
},
"attacker": {
"player": "A",
"approach_from": [5, 8],
"stack": [ { "unit": "warrior", "count": 5, "formation": true } ]
},
"max_turns": 16,
"expect": [
{ "type": "attacker_survivors", "op": ">=", "value": 2 },
{ "type": "defender_survivors", "op": "<=", "value": 1 }
]
}

@ -0,0 +1,25 @@
{
"id": "fortified_hill_hold",
"kind": "combat_setpiece",
"version": 1,
"description": "2 fortified warriors standing on hills (terrain defense + is_fortified stack) hold against 4 attacking warriors on open ground. Proves fortify + terrain-defense bonuses change the attrition outcome.",
"map": { "size": 16 },
"terrain_overrides": [ { "at": [8, 8], "biome": "hills" }, { "at": [8, 9], "biome": "hills" } ],
"defender": {
"player": "B",
"garrison": [
{ "unit": "warrior", "count": 1, "at": [8, 8], "fortified": true },
{ "unit": "warrior", "count": 1, "at": [8, 9], "fortified": true }
]
},
"attacker": {
"player": "A",
"approach_from": [5, 8],
"stack": [ { "unit": "warrior", "count": 4 } ]
},
"max_turns": 14,
"expect": [
{ "type": "defender_survivors", "op": ">=", "value": 1 },
{ "type": "attacker_survivors", "op": "<=", "value": 2 }
]
}

@ -0,0 +1,23 @@
{
"id": "last_stand_capital_bonus",
"kind": "combat_setpiece",
"version": 1,
"description": "B defends its SOLE remaining city (conceptually its other cities are already lost, so cities_lost_total > 0) and gains the p1-29a last-stand combat bonus. The identical garrison holds where it would fall on a non-capital tile. Proves the last-stand defender bonus.",
"map": { "size": 16 },
"defender": {
"player": "B",
"capital": { "col": 8, "row": 8, "population": 4, "is_last_city": true },
"buildings": [],
"garrison": [ { "unit": "warrior", "count": 2 } ]
},
"attacker": {
"player": "A",
"approach_from": [5, 8],
"stack": [ { "unit": "warrior", "count": 3 } ]
},
"max_turns": 16,
"expect": [
{ "type": "capital_held", "by": "B" },
{ "type": "defender_survivors", "op": ">=", "value": 1 }
]
}

@ -0,0 +1,21 @@
{
"id": "ranged_kite_open_field",
"kind": "combat_setpiece",
"version": 1,
"description": "Open field, no city. A's 3 archers (range 2, ranged_attack 12) vs B's 2 warriors. Ranged attacks suppress retaliation, so archers should win with most of their force intact.",
"map": { "size": 16 },
"defender": {
"player": "B",
"garrison": [ { "unit": "warrior", "count": 2, "at": [9, 8] } ]
},
"attacker": {
"player": "A",
"approach_from": [5, 8],
"stack": [ { "unit": "archer", "count": 3 } ]
},
"max_turns": 14,
"expect": [
{ "type": "attacker_survivors", "op": ">=", "value": 2 },
{ "type": "defender_survivors", "op": "<=", "value": 0 }
]
}

@ -0,0 +1,22 @@
{
"id": "siege_catapult_breaks_walls",
"kind": "combat_setpiece",
"version": 1,
"description": "Where melee stalls on walls, a dwarf_catapult (ranged_attack 20, range 3) plus 2 warriors cracks a walled capital. Proves siege bombard bypasses the wall melee penalty and is the answer to fortification.",
"map": { "size": 16 },
"defender": {
"player": "B",
"capital": { "col": 8, "row": 8, "population": 4 },
"buildings": [ "walls" ],
"garrison": [ { "unit": "warrior", "count": 2 } ]
},
"attacker": {
"player": "A",
"approach_from": [4, 8],
"stack": [ { "unit": "dwarf_catapult", "count": 1 }, { "unit": "warrior", "count": 2 } ]
},
"max_turns": 18,
"expect": [
{ "type": "capital_captured", "by": "A" }
]
}

@ -0,0 +1,17 @@
{
"id": "clan_fairness_band",
"kind": "fullgame",
"version": 1,
"description": "Balance gate: round-robin six clan personalities across many seeds; no single personality may exceed the win-rate ceiling. A statistical scenario meant for the DO fleet (needs many games). Threshold is calibrated, not aspirational.",
"map": { "size": 32, "evolution_ticks": 12000, "seed_base": 10000 },
"players": [
{ "personality": "militarist" }, { "personality": "boom" },
{ "personality": "expansionist" }, { "personality": "merchant" },
{ "personality": "tech_rusher" }, { "personality": "turtle" }
],
"rules": { "max_turns": 150, "victory_city_count": 255 },
"seeds": "sweep:10000..10050",
"expect": [
{ "type": "clan_winrate_max", "op": "<=", "value": 0.4 }
]
}

@ -0,0 +1,14 @@
{
"id": "culture_borders_expand",
"kind": "fullgame",
"version": 1,
"description": "A culture-leaning clan accumulates culture and expands its city borders over 60 turns. Asserts the owned-tile / border-tile count grows from the founding footprint. Proves mc-culture generation + border expansion.",
"map": { "size": 32, "evolution_ticks": 12000, "seed_base": 850 },
"players": [ { "personality": "balanced" }, { "personality": "turtle" } ],
"rules": { "max_turns": 60, "victory_city_count": 255 },
"seeds": [850, 851, 852, 853],
"expect": [
{ "type": "terminates" },
{ "type": "border_growth", "player": 0, "op": ">", "value": 0 }
]
}

@ -0,0 +1,13 @@
{
"id": "determinism_same_seed",
"kind": "fullgame",
"version": 1,
"description": "Run the same config + seed twice and hash the end GameState. The two hashes must be identical. Guards the PCG64 determinism contract the save format and replay depend on (WORLDGEN_RNG.md).",
"map": { "size": 24, "evolution_ticks": 8000, "seed_base": 1337 },
"players": [ { "personality": "expansionist" }, { "personality": "turtle" } ],
"rules": { "max_turns": 60, "victory_city_count": 255 },
"seeds": [1337],
"expect": [
{ "type": "deterministic_end_hash" }
]
}

@ -0,0 +1,14 @@
{
"id": "economy_no_collapse",
"kind": "fullgame",
"version": 1,
"description": "Invariant sweep: across many seeds and a long horizon, no player's gold is ever NaN/inf and no city population ever goes negative. A per-turn invariant checked every step, not just at game end.",
"map": { "size": 32, "evolution_ticks": 12000, "seed_base": 5000 },
"players": [ { "personality": "merchant" }, { "personality": "militarist" }, { "personality": "boom" } ],
"rules": { "max_turns": 120, "victory_city_count": 255 },
"seeds": [5000, 5001, 5002, 5003, 5004, 5005, 5006, 5007],
"expect": [
{ "type": "no_nan_economy" },
{ "type": "population_non_negative" }
]
}

@ -0,0 +1,14 @@
{
"id": "expansion_dominates",
"kind": "fullgame",
"version": 1,
"description": "An expansionist clan should out-settle a turtle over 100 turns. Asserts the expansionist ends with strictly more cities, averaged across seeds. Proves the settle/economy loop rewards expansion.",
"map": { "size": 32, "evolution_ticks": 12000, "seed_base": 700 },
"players": [ { "personality": "expansionist" }, { "personality": "turtle" } ],
"rules": { "max_turns": 100, "victory_city_count": 255 },
"seeds": [700, 701, 702, 703, 704],
"expect": [
{ "type": "terminates" },
{ "type": "more_cities", "player": 0, "than": 1, "min_margin": 1 }
]
}

@ -0,0 +1,20 @@
{
"id": "game1_headless_systems_150t",
"kind": "fullgame",
"version": 1,
"description": "Broad Game-1 systems run: 4 clans, 150 turns on a full evolved map (climate + flora + fauna + lairs), exercising economy, growth, tech, combat, fauna pressure and victory. Regression umbrella that should always stay green on the published build.",
"map": { "size": 40, "evolution_ticks": 14000, "seed_base": 150150 },
"players": [
{ "personality": "militarist" }, { "personality": "boom" },
{ "personality": "merchant" }, { "personality": "expansionist" }
],
"rules": { "max_turns": 150, "victory_city_count": 255 },
"seeds": [150150, 150151, 150152],
"expect": [
{ "type": "terminates" },
{ "type": "final_turn", "op": ">=", "value": 150 },
{ "type": "no_nan_economy" },
{ "type": "population_non_negative" },
{ "type": "total_pvp_combats", "op": ">=", "value": 0 }
]
}

@ -0,0 +1,14 @@
{
"id": "no_soft_lock",
"kind": "fullgame",
"version": 1,
"description": "Liveness sweep: every game either reaches a victory or the turn limit, and the turn counter advances on every step (no stalled state machine). Catches deadlocks where the loop spins without progressing.",
"map": { "size": 28, "evolution_ticks": 10000, "seed_base": 6000 },
"players": [ { "personality": "rusher" }, { "personality": "turtle" } ],
"rules": { "max_turns": 100, "victory_city_count": 255 },
"seeds": [6000, 6001, 6002, 6003, 6004, 6005, 6006, 6007, 6008, 6009],
"expect": [
{ "type": "terminates" },
{ "type": "turn_monotonic" }
]
}

@ -0,0 +1,15 @@
{
"id": "smoke_duel_30t",
"kind": "fullgame",
"version": 1,
"description": "Minimal smoke: 2 clans, small map, 30 turns. Regression floor: the headless game loop advances to the turn limit without crashing and produces real (not fabricated) telemetry. Fast CI + fleet smoke.",
"map": { "size": 24, "evolution_ticks": 10000, "seed_base": 42 },
"players": [ { "personality": "militarist" }, { "personality": "boom" } ],
"rules": { "max_turns": 30, "victory_city_count": 255 },
"seeds": [42, 43, 44],
"expect": [
{ "type": "terminates" },
{ "type": "final_turn", "op": ">=", "value": 30 },
{ "type": "no_nan_economy" }
]
}

@ -0,0 +1,17 @@
{
"id": "time_to_tier",
"kind": "fullgame",
"version": 1,
"description": "Four clans on a generous map over 150 turns. The median player's peak tech tier must reach at least the target by the turn limit. Proves the tech web + research pacing (mirrors tools/time-to-tier-peak.py).",
"map": { "size": 40, "evolution_ticks": 14000, "seed_base": 900 },
"players": [
{ "personality": "tech_rusher" }, { "personality": "boom" },
{ "personality": "balanced" }, { "personality": "militarist" }
],
"rules": { "max_turns": 150, "victory_city_count": 255 },
"seeds": [900, 901, 902, 903, 904, 905],
"expect": [
{ "type": "terminates" },
{ "type": "median_tier_peak", "op": ">=", "value": 4 }
]
}

@ -0,0 +1,17 @@
{
"id": "trade_forms",
"kind": "fullgame",
"version": 1,
"description": "Two merchant clans seeded with complementary luxuries (ivory vs jade) should form at least one luxury trade over the game. Proves the mc-trade sourcing + agreement loop (real traded_luxuries, not a heuristic).",
"map": { "size": 32, "evolution_ticks": 12000, "seed_base": 800 },
"players": [
{ "personality": "merchant", "seed_luxuries": ["ivory"] },
{ "personality": "merchant", "seed_luxuries": ["jade"] }
],
"rules": { "max_turns": 120, "victory_city_count": 255 },
"seeds": [800, 801, 802, 803],
"expect": [
{ "type": "terminates" },
{ "type": "trades_formed", "op": ">=", "value": 1 }
]
}

@ -1,40 +0,0 @@
{
"id": "game1_headless_systems_150t",
"description": "Proves full headless mc-turn exercises all Game 1 systems (climate, ecology/flora/fauna/events, happiness, healing, improvements, recipes/equipment, combat, economy, culture, tech, diplomacy stubs) over a realistic game length. 3 clans on medium map, evolution pre-pass, 150 turns, no early victory. Used for horizontal fleet runs and regression gates.",
"version": 1,
"map": {
"size": 48,
"evolution_ticks": 30000,
"seed_base": 424242
},
"players": [
{ "personality": "ironhold" },
{ "personality": "goldvein" },
{ "personality": "runesmith" }
],
"rules": {
"max_turns": 150,
"victory_city_count": 255,
"max_turns_hard": true
},
"metrics_to_collect": [
"final_turn",
"median_tier_peak",
"total_pvp_combats",
"total_wonders_built",
"border_expansion_events",
"fauna_encounters",
"flora_transitions",
"climate_events_fired",
"improvements_built",
"equipment_crafted",
"promotions_applied",
"happiness_golden_ages"
],
"assertions": [
{ "type": "final_turn", "op": ">=", "value": 150 },
{ "type": "median_tier_peak", "op": ">=", "value": 3 },
{ "type": "total_pvp_combats", "op": ">=", "value": 5 },
{ "type": "any_event", "kinds": ["CityGrew", "CityBordersExpanded", "FloraSuccession", "AmbientEncounterFired"] }
]
}

@ -0,0 +1,118 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "sim-scenarios",
"title": "Simulation Test Scenario",
"description": "A declarative scenario run headless by the `sim_scenario` bin against the real mc-turn/mc-combat resolver, on the DO fleet. Two kinds: combat_setpiece (hand-authored tactical board with a known outcome) and fullgame (seeded full-game run with statistical / invariant assertions). Assertion `value`s are CALIBRATED against real runs, never invented.",
"type": "object",
"required": ["id", "kind", "description", "expect"],
"properties": {
"id": { "type": "string", "pattern": "^[a-z0-9_]+$" },
"kind": { "enum": ["combat_setpiece", "fullgame"] },
"version": { "type": "integer", "minimum": 1 },
"description": { "type": "string" },
"map": {
"type": "object",
"properties": {
"size": { "type": "integer", "minimum": 8 },
"evolution_ticks": { "type": "integer", "minimum": 0 },
"seed_base": { "type": "integer" }
}
},
"terrain_overrides": {
"type": "array",
"items": {
"type": "object",
"required": ["at", "biome"],
"properties": {
"at": { "type": "array", "items": { "type": "integer" }, "minItems": 2, "maxItems": 2 },
"biome": { "type": "string" }
}
}
},
"defender": { "$ref": "#/$defs/side_combat" },
"attacker": { "$ref": "#/$defs/side_combat" },
"players": {
"type": "array",
"items": {
"type": "object",
"required": ["personality"],
"properties": {
"personality": { "type": "string" },
"seed_luxuries": { "type": "array", "items": { "type": "string" } }
}
}
},
"rules": {
"type": "object",
"properties": {
"max_turns": { "type": "integer", "minimum": 1 },
"victory_city_count": { "type": "integer" },
"victory_disabled": { "type": "boolean" }
}
},
"max_turns": { "type": "integer", "minimum": 1 },
"seeds": {
"oneOf": [
{ "type": "array", "items": { "type": "integer" } },
{ "type": "string", "pattern": "^sweep:[0-9]+\\.\\.[0-9]+$" }
]
},
"expect": { "type": "array", "items": { "$ref": "#/$defs/assertion" }, "minItems": 1 }
},
"$defs": {
"stack_entry": {
"type": "object",
"required": ["unit", "count"],
"properties": {
"unit": { "type": "string" },
"count": { "type": "integer", "minimum": 1 },
"at": { "type": "array", "items": { "type": "integer" }, "minItems": 2, "maxItems": 2 },
"fortified": { "type": "boolean" },
"formation": { "type": "boolean" }
}
},
"side_combat": {
"type": "object",
"properties": {
"player": { "type": "string" },
"approach_from": { "type": "array", "items": { "type": "integer" }, "minItems": 2, "maxItems": 2 },
"flank": { "type": "boolean" },
"capital": {
"type": "object",
"required": ["col", "row"],
"properties": {
"col": { "type": "integer" },
"row": { "type": "integer" },
"population": { "type": "integer", "minimum": 1 },
"is_last_city": { "type": "boolean" }
}
},
"buildings": { "type": "array", "items": { "type": "string" } },
"garrison": { "type": "array", "items": { "$ref": "#/$defs/stack_entry" } },
"stack": { "type": "array", "items": { "$ref": "#/$defs/stack_entry" } }
}
},
"assertion": {
"type": "object",
"required": ["type"],
"properties": {
"type": {
"enum": [
"capital_captured", "capital_held", "attacker_survivors", "defender_survivors",
"attacker_losses", "pvp_kills", "capture_by_turn",
"final_turn", "terminates", "turn_monotonic", "no_nan_economy",
"population_non_negative", "deterministic_end_hash", "more_cities",
"city_count", "total_pvp_combats", "median_tier_peak", "trades_formed",
"border_growth", "clan_winrate_max"
]
},
"op": { "enum": [">=", ">", "==", "<=", "<"] },
"value": { "type": "number" },
"by": { "type": "string" },
"player": { "type": "integer" },
"than": { "type": "integer" },
"min_margin": { "type": "integer" }
}
}
}
}

@ -1,23 +0,0 @@
{
"id": "smoke_duel_30t",
"description": "Minimal smoke: 2 players, small map, short run. Basic regression: game advances, no crash, some growth or combat occurs. Fast for CI and quick fleet smoke.",
"version": 1,
"map": {
"size": 24,
"evolution_ticks": 10000,
"seed_base": 42
},
"players": [
{ "personality": "ironhold" },
{ "personality": "deepforge" }
],
"rules": {
"max_turns": 30,
"victory_city_count": 255
},
"metrics_to_collect": ["final_turn", "total_pvp_combats", "cities_built"],
"assertions": [
{ "type": "final_turn", "op": ">=", "value": 30 },
{ "type": "total_pvp_combats", "op": ">=", "value": 0 }
]
}

@ -0,0 +1,111 @@
# Simulation Test Scenarios
**Prove that named situations produce the correct outcome in the *real* simulator, at scale, on the DigitalOcean fleet.**
A scenario is a JSON file declaring a starting situation and the outcome that must hold. The `sim_scenario` binary loads it, runs the **real** `mc-turn` / `mc-combat` resolver headless (no Godot, no fabricated numbers), evaluates the assertions, and exits non-zero on any breach. Many scenarios × many seeds fan out across the horizontal DO fleet against an `.so`-free pure-Rust build published to the artifact Space.
> **Source of truth:** Rail-1 — all outcomes come from `mc-turn`/`mc-combat`/`mc-economy`. The runner places real `MapUnit`s, enqueues real `AttackRequest`/`MoveRequest`, and reads the real `TurnResult`. It never invents a metric. Rail-2 — scenarios are JSON content, not hardcoded.
---
## Pipeline (the "rust builds to S3, horizontal proves scenarios" loop)
```sh
./run dist:publish # build pure-Rust sim_scenario bin → DO Space builds/<sha>/bin/sim_scenario
./run dist:up <N> # N ephemeral Droplets from the golden image
./run dist:scenarios # fan scenarios × seeds across the fleet → collect pass/fail → exit nonzero on any fail
./run dist:down # back to ~€0
```
- **Build → S3:** `scripts/run/dist.sh` (`dist:publish`) runs `cargo build --release -p mc-sim --bin sim_scenario` and uploads the binary, keyed by git sha, to the `magicciv-artifacts` Space. Workers `dist:fetch`/`dist:sync` the prebuilt bin, so there is no per-worker recompile.
- **Horizontal:** `infra/terraform/test-fleet/` scales the bin across Droplets; each runs a shard of `scenario × seed` jobs in parallel and writes a JSON `BatchResult`.
- **Gate:** results merge locally; any `overall_pass: false` fails the run. Wire into `.forgejo/workflows/` as a nightly job: statistical scenarios run too long for the 15-min push gate, while combat set-pieces take seconds and can gate on push.
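
The gate step above can be sketched roughly as follows. This is a minimal illustration, not the actual merge logic in `dist.sh`: the function name `merge_batch_results` is hypothetical, and only the `scenario_id` and `overall_pass` fields of a `BatchResult` are assumed.

```python
import json

def merge_batch_results(results):
    """Return (exit_code, failed_ids) for a list of BatchResult dicts.

    ASSUMPTION: a result without an `overall_pass` field counts as a failure.
    """
    failed = [r["scenario_id"] for r in results if not r.get("overall_pass", False)]
    return (1 if failed else 0), failed

# Hypothetical shard outputs; real ones would come from the fleet workers.
shards = [
    {"scenario_id": "smoke_duel_30t", "overall_pass": True},
    {"scenario_id": "no_soft_lock", "overall_pass": False},
]
exit_code, failed = merge_batch_results(shards)
print(json.dumps({"exit_code": exit_code, "failed": failed}))
```

A single failing scenario is enough to make the merged exit code non-zero, which is what lets a workflow treat the whole fleet run as one pass/fail gate.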
---
## Two kinds
### `combat_setpiece` — hand-authored tactical board, known outcome
Place explicit units (type, count, position), give a city real defenses (`walls`, `castle`, …), script the attacker's advance, run the real combat + siege resolver, assert the real result (captured / held / survivor counts). Cheap (seconds, ≤ ~18 turns) → push-gate material.
### `fullgame` — seeded full game, statistical / invariant assertion
Run N seeded full games (evolved map: climate + flora + fauna + lairs) driven by clan personalities, assert invariants (no NaN economy, population ≥ 0), liveness (terminates, turn monotonic), determinism (same seed → same end-hash), or balance bands (no clan > win-rate ceiling). The many-seed ones are what the fleet is for.
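
To make the three assertion families concrete, here is a toy sketch of per-turn invariants, liveness, and the same-seed hash check. Everything in it is a stand-in: `run_toy_game` is a made-up generator, Python's `random.Random` replaces the engine's PCG64, and the state has only two fields. It only shows the shape of the checks, not how `sim_scenario` implements them.

```python
import hashlib
import json
import math
import random

def run_toy_game(seed, max_turns):
    """Stand-in for a headless full game: yields one state dict per turn."""
    rng = random.Random(seed)  # stand-in only; the real engine uses PCG64
    gold, population = 10.0, 1
    for turn in range(1, max_turns + 1):
        gold += rng.uniform(0.0, 5.0)
        population += rng.choice([0, 1])
        yield {"turn": turn, "gold": gold, "population": population}

def check_run(seed, max_turns):
    """Check invariants on every step, then return a hash of the end state."""
    last_turn = 0
    state = None
    for state in run_toy_game(seed, max_turns):
        assert math.isfinite(state["gold"]), "no_nan_economy breached"
        assert state["population"] >= 0, "population_non_negative breached"
        assert state["turn"] > last_turn, "turn_monotonic breached"
        last_turn = state["turn"]
    assert last_turn == max_turns, "terminates breached"
    blob = json.dumps(state, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

# deterministic_end_hash: the same config + seed, run twice, must hash identically.
first = check_run(seed=1337, max_turns=60)
second = check_run(seed=1337, max_turns=60)
assert first == second
print("end hash prefix:", first[:16])
```

Serializing with `sort_keys=True` before hashing keeps the hash independent of dict ordering, so only real state differences change it.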
---
## Full catalog
### Combat set-pieces — `data/sim-scenarios/combat/`
| Scenario | Setup | System proven | Assertion |
|---|---|---|---|
| `rush_no_walls_capital_falls` | A: 3 archer + 2 warr · B: no walls, 2 warr | siege capture | capital captured by A |
| `walls_2_warriors_hold` | same rush · B: **walls** + 2 warr | wall HP + defense bonus | capital held, B keeps ≥2 |
| `four_warriors_repel_pyrrhic` | same rush · B: 4 warr no walls | attrition balance | A wiped, B ≤2 left |
| `ranged_kite_open_field` | 3 archers vs 2 warriors, open field | `ranged_attack` + no-retaliation | archers win, ≥2 survive |
| `fortified_hill_hold` | 2 fortified warr on hills vs 4 warr | `is_fortified` + terrain defense | defenders hold, A ≤2 |
| `castle_holds_double_rush` | doubled rush vs `walls`+`castle` (t3) | wall-tier HP/defense scaling | capital held |
| `siege_catapult_breaks_walls` | `dwarf_catapult` + 2 warr vs walls | bombard bypasses wall melee penalty | capital captured |
| `last_stand_capital_bonus` | last-city garrison vs 3 warr | p1-29a last-stand bonus | capital held |
| `flanking_two_axis` | 4 warr two-axis vs 2 warr | flanking / support bonus | B wiped, A ≥3 survive |
| `formation_vs_loose` | 5-stack formation vs 5 loose | formation scaling (HP×n, ATK×n^0.75) | formation wins |
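
The formation row quotes a scaling of HP×n and ATK×n^0.75. As a worked example of that arithmetic only, the sketch below applies it to made-up base stats; `formation_stats`, `base_hp`, and `base_atk` are illustrative names and numbers, not real warrior values or engine functions.

```python
def formation_stats(base_hp, base_atk, n):
    """Aggregate stats under the quoted scaling: HP x n, ATK x n**0.75."""
    return base_hp * n, base_atk * n ** 0.75

# base_hp=10 and base_atk=4 are invented for illustration.
hp, atk = formation_stats(base_hp=10, base_atk=4, n=5)
print(f"formation of 5: hp={hp}, atk={atk:.2f}")
print(f"attack multiplier: {5 ** 0.75:.2f} (five separate attacks would total 5.00)")
```

For n = 5 the attack multiplier is 5^0.75 ≈ 3.34, so read literally the formation pools all five units' HP while attacking with less than five units' combined strength. Whether that trade still lets it beat five loose warriors is what the scenario is meant to measure.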
### Full-game — `data/sim-scenarios/fullgame/`
| Scenario | Setup | System proven | Assertion |
|---|---|---|---|
| `smoke_duel_30t` | 2 clans, 30t, 3 seeds | no-crash, advances | terminates, final_turn ≥ 30, no_nan_economy |
| `determinism_same_seed` | 1 config, run twice | PCG64 / save contract | identical end-state hash |
| `expansion_dominates` | expansionist vs turtle, 100t | settle/economy loop | expansionist ends with more cities (margin ≥ 1) |
| `time_to_tier` | 4 clans, 150t, 6 seeds | tech web + research pacing | median tier peak ≥ 4 |
| `economy_no_collapse` | 3 clans, 120t, 8 seeds | economy invariant | no NaN gold, pop ≥ 0 |
| `no_soft_lock` | 2 clans, 100t, 10 seeds | liveness | terminates, turn monotonic |
| `trade_forms` | 2 merchants, complementary luxuries | mc-trade loop | ≥1 trade formed |
| `culture_borders_expand` | culture clan, 60t | mc-culture + borders | border tiles grow |
| `clan_fairness_band` | 6 clans, seeds 10000..10050 | balance | no clan win-rate > 0.40 |
| `game1_headless_systems_150t` | 4 clans, 150t, broad | regression umbrella | terminates + invariants |
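
The balance band in `clan_fairness_band` reduces to one number per sweep. A minimal sketch of how that number could be computed, assuming the runner records one winning personality per game (or `None` when the turn limit is hit without a victor); the function name and the choice to keep undecided games in the denominator are assumptions, not the runner's documented behavior.

```python
from collections import Counter

def clan_winrate_max(winners):
    """Highest share of games won by any single personality.

    ASSUMPTION: `None` marks a game with no victor; it stays in the denominator.
    """
    counts = Counter(w for w in winners if w is not None)
    return max(counts.values(), default=0) / len(winners)

# Hypothetical outcomes of a 10-game sweep.
winners = ["boom", "boom", "boom", "militarist", "militarist",
           "merchant", "merchant", "turtle", "expansionist", "tech_rusher"]
rate = clan_winrate_max(winners)
print(f"max win rate {rate:.2f}, passes 0.40 ceiling: {rate <= 0.4}")
```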
---
## Schema
`data/sim-scenarios/sim-scenarios.schema.json` validates both kinds. Key fields:
- `id` (snake_case), `kind`, `version`, `description`, `expect[]` — required.
- **combat_setpiece:** `map.size`, optional `terrain_overrides[]`, `defender` / `attacker` each a *side* (`capital` with `population`/`is_last_city`, `buildings[]`, `garrison[]`, `stack[]`, `approach_from`, `flank`), `max_turns`. A stack entry is `{unit, count, at?, fortified?, formation?}`.
- **fullgame:** `map.{size,evolution_ticks,seed_base}`, `players[].personality` (real ids: `balanced boom expansionist merchant militarist rusher tech_rusher turtle`), `rules.{max_turns,victory_city_count,victory_disabled}`, `seeds` (array or `"sweep:A..B"`).
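
A few of these rules are simple enough to check by hand. The sketch below covers the required fields, the `id` pattern, the `kind` enum, and the two `seeds` forms; it is not a substitute for validating against the schema file. The two regexes are taken from the schema (with capture groups added to the sweep pattern). Treating `sweep:A..B` as inclusive of `B` is an assumption, since the schema only fixes the syntax.

```python
import re

ID_PATTERN = re.compile(r"^[a-z0-9_]+$")
SWEEP_PATTERN = re.compile(r"^sweep:([0-9]+)\.\.([0-9]+)$")
REQUIRED = ("id", "kind", "description", "expect")
KINDS = ("combat_setpiece", "fullgame")

def check_scenario(scenario):
    """Return a list of problems; an empty list means the basic shape is fine."""
    problems = [f"missing required field: {key}" for key in REQUIRED if key not in scenario]
    if "id" in scenario and not ID_PATTERN.match(str(scenario["id"])):
        problems.append("id must match ^[a-z0-9_]+$")
    if "kind" in scenario and scenario["kind"] not in KINDS:
        problems.append("kind must be combat_setpiece or fullgame")
    if "expect" in scenario and not scenario["expect"]:
        problems.append("expect needs at least one assertion")
    return problems

def expand_seeds(seeds):
    """Turn either seeds form into an explicit list of integers."""
    if isinstance(seeds, list):
        return list(seeds)
    match = SWEEP_PATTERN.match(seeds)
    if match is None:
        raise ValueError(f"not a seed list or sweep string: {seeds!r}")
    low, high = int(match.group(1)), int(match.group(2))
    return list(range(low, high + 1))  # ASSUMPTION: the upper bound is inclusive

print(check_scenario({"id": "Bad-Id", "kind": "fullgame", "description": "demo"}))
print(expand_seeds("sweep:900..903"))
```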
### Assertion vocabulary
Combat: `capital_captured{by}`, `capital_held{by}`, `attacker_survivors{op,value}`, `defender_survivors{op,value}`, `attacker_losses`, `pvp_kills`, `capture_by_turn`.
Full-game: `final_turn`, `terminates`, `turn_monotonic`, `no_nan_economy`, `population_non_negative`, `deterministic_end_hash`, `more_cities{player,than,min_margin}`, `city_count`, `total_pvp_combats`, `median_tier_peak`, `trades_formed`, `border_growth{player}`, `clan_winrate_max`.
Every signal maps to a real engine field — `cities_captured`/`cities_lost_total`, `TurnResult.pvp_kills`, surviving `PlayerState.units`, `gold`/`CityState.population`, `traded_luxuries`, tech tier.
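
For the assertions that carry `{op, value}`, evaluation is a single comparison between the observed engine value and the threshold. A minimal sketch, assuming the runner has already extracted the observed number; the `evaluate` helper and the `observed` dict are illustrative, while the five operators are the ones the schema enumerates.

```python
import operator

OPS = {
    ">=": operator.ge,
    ">": operator.gt,
    "==": operator.eq,
    "<=": operator.le,
    "<": operator.lt,
}

def evaluate(assertion, observed):
    """Compare an observed value against an {op, value} assertion."""
    compare = OPS[assertion["op"]]
    return compare(observed, assertion["value"])

# The expect list mirrors ranged_kite_open_field; the observed values are hypothetical.
expect = [
    {"type": "attacker_survivors", "op": ">=", "value": 2},
    {"type": "defender_survivors", "op": "<=", "value": 0},
]
observed = {"attacker_survivors": 3, "defender_survivors": 0}
results = {a["type"]: evaluate(a, observed[a["type"]]) for a in expect}
print(results)
```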
---
## The calibration rule (integrity)
**Assertion `value`s are calibrated against real runs, never invented.** The workflow for a new scenario:
1. Author the JSON with the *intended* narrative and a placeholder threshold.
2. Run it (`sim_scenario <file>`), observe the **actual** outcome from the real resolver.
3. If the outcome matches the narrative → lock the threshold to reality. If it does not (e.g. walls don't actually make defense "easy") → that's a **finding**: adjust force sizes or report the balance gap. Never tune the assertion to pass against a number the sim didn't produce.
A scenario that goes green against a fabricated metric is the bug, not the goal. This is exactly how the earlier fake-schema draft failed: it spawned no units yet asserted on a `t % 7` "combat" counter.
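
One possible reading of step 3, sketched as code. The `calibrate` helper and its return shape are hypothetical, and "lock to reality" is interpreted here as tightening the placeholder to the weakest value the real runs actually produced; the project may define it differently.

```python
def calibrate(op, placeholder, observed):
    """Decide whether to lock a threshold or report a finding.

    `observed` holds values produced by real runs, one per seed.
    """
    if not observed:
        raise ValueError("calibration needs at least one real run")
    if op == ">=":
        holds = all(v >= placeholder for v in observed)
        locked = min(observed)
    elif op == "<=":
        holds = all(v <= placeholder for v in observed)
        locked = max(observed)
    else:
        raise ValueError(f"this sketch only handles >= and <=, got {op!r}")
    if holds:
        return {"action": "lock", "value": locked}
    # The narrative did not hold: surface the real numbers, do not bend the assertion.
    return {"action": "finding", "observed": sorted(observed)}

# Hypothetical defender_survivors counts from four real runs.
print(calibrate(">=", 1, [2, 2, 3, 2]))
print(calibrate(">=", 1, [0, 2, 1, 0]))
```

The second call returns a finding instead of a lowered threshold, which is the behavior the rule asks for.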
---
## Running
```sh
# local (pure Rust, no Godot — runs on plum/mac natively)
cargo run -p mc-sim --release --bin sim_scenario -- \
public/games/age-of-dwarves/data/sim-scenarios/combat/rush_no_walls_capital_falls.json
# override seeds for a fullgame sweep
SEEDS=900,901,902 cargo run -p mc-sim --release --bin sim_scenario -- \
public/games/age-of-dwarves/data/sim-scenarios/fullgame/time_to_tier.json
# on the DO fleet
./run dist:publish && ./run dist:up 10 && ./run dist:scenarios && ./run dist:down
```
Output: a `BatchResult` JSON per scenario (`scenario_id`, per-seed results, `passed_seeds`, `overall_pass`) on stdout; non-zero exit on failure.

@ -67,6 +67,7 @@ Modules live at `.claude/instructions/<file>.md` (symlink resolves to `tooling/c
| Worldgen pipeline overview — full stage sequence, crate ownership, TileMeta field inventory | `public/games/age-of-dwarves/docs/terrain/WORLDGEN_PIPELINE.md` |
| AI architecture, training pipeline, encoder, AlphaZero search, self-play league — `learned:*` controllers, coverage matrix | `docs/ai-production.md` (engineering) + `docs/ai-roadmap.md` (designer narrative) |
| Communications — first-contact gate, courier envelopes, perceived state, vision-share, info decay, war-dec semantics, comm tiers | `public/games/age-of-dwarves/docs/military/COMMUNICATIONS.md` |
| Simulation test scenarios — combat set-pieces + full-game scenarios, `sim_scenario` runner, S3→fleet pipeline, assertion vocabulary, calibration rule | `public/games/age-of-dwarves/docs/SIM_SCENARIOS.md` |
Index: `.claude/instructions/README.md`.