docs(simulation-report): 📝 Add detailed log entry for iteration 7o documenting parallel dev tracks, team composition, and scope for kill-rate audit, unit JSON porting, SaveManager port, and combat Rust resurrection

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
Claude Code 2026-04-08 22:37:09 -07:00
parent ea704c7875
commit fc41d0f217

@ -4,6 +4,158 @@ Tracks every iteration of the balance/simulation loop. Newest entries on top. Ea
---
## Iteration 7o — 4-track carry-forward: set_base_kill_rate vindicated, combat Rust resurrected, SaveManager ported, 2 unit JSONs (2026-04-08, COMPLETE)
**Goal:** Burn down the iter 7n carry-forward in parallel — 4 independent tracks with disjoint file sets. The critical track (A) was the `set_base_kill_rate` no-op audit: if iter 7n's infra-triager's 8-vs-8 measurement was right, the entire Phase 7 balance work and the planned CMA-ES sweep were tuning a phantom knob. That investigation drove track ordering.
### Team composition
| Track | Owner | Specialist | Scope |
|---|---|---|---|
| A | kill-rate-auditor | simulator-infra | **CRITICAL** — trace `set_base_kill_rate` plumbing, fix or vindicate, lock with regression test |
| B | unit-porter | game-data | Port 2 unit JSONs (melee + flying) to unblock 2 pending keyword_handler tests |
| C | save-porter | godot-engine | Full SaveManager port (3-line stub → real save/load + test suite) |
| D | combat-resurrector | combat-dev | Rust-side only: export mc-combat modules + add GdCombatResolver skeleton |
All 4 tracks spawned in parallel, no file overlap, no coordination needed.
### Track A — `set_base_kill_rate` VINDICATED (not a no-op)
**Finding: the knob works. Phase 7 balance + CMA-ES plans are safe. Prior bench results stand.**
Full end-to-end trace from kill-rate-auditor:
- `src/simulator/api-gdext/src/lib.rs:1912`: `set_base_kill_rate(v)` writes `self.inner.lair_combat_config.base_kill_rate = v`
- `src/simulator/crates/mc-turn/src/processor.rs:44`: `LairCombatConfig.base_kill_rate: f32` field
- `src/simulator/crates/mc-turn/src/processor.rs:117`: `kill_probability(tier, fortified) = (base_kill_rate + tier_kill_slope * tier_term).clamp(0.0, 0.95)` reads it every roll
- `src/simulator/crates/mc-turn/src/processor.rs:452`: `process_fauna_encounters_inner` calls `cfg.kill_probability(tier, fortified)` and compares `kill_x > kill_p` for the kill decision
- `step` and `step_encounters_only` on `GdTurnProcessor` both delegate to the same `inner`, same config, persistent across calls
**Why iter 7n saw 8-vs-8 (the real explanation)**: the iter 7e scenario uses `encounter_probability_per_turn = 0.04` (only 4% of unit×lair visits even reach the kill-roll step), and `tier_kill_slope = 0.00875, tier_kill_exponent = 2.0` (T5 lairs already contribute `0.00875 × 16 = 0.14` to `kill_p` from the slope term alone before base_kill_rate is added).
So for T5 lairs: `base=0.0 → kill_p=0.14`, `base=0.5 → kill_p=0.64`. Crucially, **changing only `base_kill_rate` does NOT change the RNG stream**: the same two `rand_unit()` values are consumed per encounter, so a kill decision only flips when `kill_x ∈ (0.14, 0.64]`. Iter 7e's sparse scenario produces only about 2 lairs × 6 units × 20 turns × 0.04 encounter gate ≈ 10 kill rolls. Because the RNG trajectory is fixed, it is entirely plausible for zero of those pre-rolled `kill_x` values to land in that interval, in which case both runs end at exactly the same 8 deaths purely by coincidence of which samples fall in the narrow probability-delta band.
**This is a measurement methodology failure, not a broken knob.**
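The T5 arithmetic above can be reproduced with a minimal sketch. It assumes `tier_term = (tier - 1)^tier_kill_exponent`, which matches the quoted `0.00875 × 16` for T5 but is not confirmed by the trace. `KillConfig` is a hypothetical stand-in for `LairCombatConfig`, the `fortified` argument is left out, and `decisions_that_flip` is an illustrative helper, not a function in the codebase.

```rust
/// Hypothetical stand-in for the `LairCombatConfig` fields discussed above.
pub struct KillConfig {
    pub base_kill_rate: f32,
    pub tier_kill_slope: f32,
    pub tier_kill_exponent: f32,
}

impl KillConfig {
    /// Assumed form: base + slope * (tier - 1)^exponent, clamped to [0.0, 0.95].
    /// The real `tier_term` and the `fortified` modifier may differ.
    pub fn kill_probability(&self, tier: u32) -> f32 {
        let tier_term = (tier.saturating_sub(1) as f32).powf(self.tier_kill_exponent);
        (self.base_kill_rate + self.tier_kill_slope * tier_term).clamp(0.0, 0.95)
    }
}

/// Counts the pre-rolled `kill_x` samples that fall in (low_p, high_p]. These are
/// the only rolls whose outcome can differ between the two configurations.
pub fn decisions_that_flip(kill_x_samples: &[f32], low_p: f32, high_p: f32) -> usize {
    kill_x_samples
        .iter()
        .filter(|&&x| x > low_p && x <= high_p)
        .count()
}
```

If every pre-rolled sample sits outside the band, `decisions_that_flip` returns 0 and the two runs report identical death counts even though `kill_probability` differs.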
### Track A regression test
New test `bridge_contract_tests::base_kill_rate_setter_is_not_a_noop` at `src/simulator/crates/mc-turn/src/bridge_contract_tests.rs` (file grew 465 → 602 lines):
- **Primary assertions** (forced high-pressure scenario): `encounter_probability_per_turn = 1.0`, `tier_kill_slope = 0.0`, `tier_kill_exponent = 1.0` — so `kill_p == base_kill_rate` exactly
  - `base=0.0` → asserts 0 deaths
  - `base=0.9` → asserts ≥3 deaths
  - `high > zero` asserted
- **Iter 7n replay** (informational `eprintln!`): default scenario with 24×24 grid, 2 T5 lairs, 6 units, 20 turns — produced "0.0 → 2 deaths, 0.5 → 4 deaths" on kill-rate-auditor's reproduction, documenting that the signal exists but is quantized at the noise floor in sparse scenarios.
**Bench methodology recommendation** (carry-forward to iter 7p CMA-ES): to get an observable `base_kill_rate` signal with the default slope, bump `encounter_probability_per_turn` (so more kill rolls fire), use wider base spreads (0.0 vs 0.9, not 0.0 vs 0.5), or average over multiple seeds.
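As a rough illustration of the multi-seed recommendation, the sketch below averages deaths over several seeds in a toy model. Everything in it is an assumption made for illustration: `SplitMix64` stands in for the simulator's RNG, `deaths_for_seed` replaces the real turn processor with a flat loop over unit-lair visits, and `kill_p` is passed directly instead of being derived from tier and slope.

```rust
/// Tiny SplitMix64 generator so the sketch needs no external crates.
pub struct SplitMix64(u64);

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self(seed)
    }

    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform sample in [0, 1).
    pub fn rand_unit(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Toy stand-in for one bench run: `visits` unit-lair visits, each gated by
/// `encounter_p`, then a kill roll against `kill_p`. Both samples are drawn on
/// every visit, so the RNG stream does not depend on `kill_p`.
pub fn deaths_for_seed(seed: u64, visits: u32, encounter_p: f64, kill_p: f64) -> u32 {
    let mut rng = SplitMix64::new(seed);
    let mut deaths = 0;
    for _ in 0..visits {
        let gate_x = rng.rand_unit();
        let kill_x = rng.rand_unit();
        if gate_x < encounter_p && kill_x < kill_p {
            deaths += 1;
        }
    }
    deaths
}

/// Mean deaths over seeds 0..seeds.
pub fn mean_deaths(seeds: u64, visits: u32, encounter_p: f64, kill_p: f64) -> f64 {
    let total: u32 = (0..seeds)
        .map(|s| deaths_for_seed(s, visits, encounter_p, kill_p))
        .sum();
    total as f64 / seeds as f64
}
```

A call such as `mean_deaths(32, 240, 0.04, 0.64)` compared against `mean_deaths(32, 240, 0.04, 0.14)` averages out the per-seed coincidences that a single fixed trajectory cannot. Here 240 corresponds to 2 lairs × 6 units × 20 turns.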
### Track B — Unit JSON port (unit-porter)
Two atomic ports from `.messy/` reference:
- `public/games/age-of-dwarves/data/units/spearmen.json`: `combat_type: "melee"`, `keywords: ["reach"]`, `is_military() → true`
- `public/games/age-of-dwarves/data/units/wyvern_riders.json`: `combat_type: "flying"`, `is_flying() → true`
Source shape preserved exactly so DataLoader and Unit class resolution work unchanged — no Unit.gd edits. DataLoader confirms load: "551 entries from theme 'age-of-dwarves'".
Two previously-pending `test_keyword_handler.gd` tests re-enabled with real assertions:
- `test_zoc_entry_cost_from_adjacent_enemy` — asserts spearmen projects ZOC cost of 1
- `test_flying_unit_bypasses_zoc_blocked` — asserts wyvern_riders bypass both `is_zoc_blocked` and `get_zoc_entry_cost`
**Delta**: `test_keyword_handler.gd` 9/11 → **11/11 passing**, 0 pending, 17 asserts.
### Track C — SaveManager port (save-porter)
`src/game/engine/src/core/save_manager.gd` 3 → **217 lines**. `test_save_manager.gd` 1 pending stub → **263 lines, 13 real tests, all passing**.
**API surface** (static, `class_name SaveManager extends RefCounted`):
- `save_game(slot: int) -> Error`
- `load_game(slot: int) -> Error`
- `delete_save(slot: int) -> Error`
- `get_save_slots() -> Array[Dictionary]` — returns `{slot, timestamp, turn, era, player_name, version}` per populated manual slot in slot order, autosave excluded
- `autosave() -> void` — wired into `turn_manager.gd:250` end-of-turn call site (was previously a dead call against the stub)
- `load_autosave() -> Error`
- `slot_exists(slot: int) -> bool`
Constants: `SAVE_DIR="user://saves/"`, `SAVE_EXTENSION=".json"`, `MAX_SLOTS=10`, `AUTOSAVE_SLOT_NAME="autosave"`, `SAVE_VERSION=1`.
**Envelope shape**: `{version, timestamp, game_state}`. `game_state` delegates to `GameState.serialize()` / `deserialize()` + `rebuild_layer_references()` on load. JSON written with sorted keys + tab indent for deterministic re-save (diff-friendly). Climate/weather state intentionally deferred — current Weather module has no public `get_active_effects()` hook; add when that surface lands.
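The reason sorted keys plus fixed indentation give a byte-identical re-save can be shown with a small language-neutral sketch. It is written in Rust for illustration only and is not the GDScript implementation. The `Json` type, `to_pretty`, and `envelope` are hypothetical names, and the value type is deliberately minimal (no arrays or floats).

```rust
use std::collections::BTreeMap;

/// Minimal JSON value type for the sketch. Strings are assumed to need no
/// escaping beyond quotes and backslashes.
pub enum Json {
    Int(i64),
    Str(String),
    Object(BTreeMap<String, Json>),
}

fn escape(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Serializes with keys in sorted order (BTreeMap iteration order) and one tab
/// per nesting level, so equal values always produce identical bytes.
pub fn to_pretty(value: &Json, depth: usize, out: &mut String) {
    match value {
        Json::Int(n) => out.push_str(&n.to_string()),
        Json::Str(s) => {
            out.push('"');
            out.push_str(&escape(s));
            out.push('"');
        }
        Json::Object(map) if map.is_empty() => out.push_str("{}"),
        Json::Object(map) => {
            out.push_str("{\n");
            let last = map.len() - 1;
            for (i, (key, child)) in map.iter().enumerate() {
                out.push_str(&"\t".repeat(depth + 1));
                out.push('"');
                out.push_str(&escape(key));
                out.push_str("\": ");
                to_pretty(child, depth + 1, out);
                out.push_str(if i == last { "\n" } else { ",\n" });
            }
            out.push_str(&"\t".repeat(depth));
            out.push('}');
        }
    }
}

/// Builds the `{version, timestamp, game_state}` envelope around an already
/// serialized game state. The argument values are placeholders.
pub fn envelope(version: i64, timestamp: &str, game_state: Json) -> Json {
    let mut map = BTreeMap::new();
    map.insert("version".to_string(), Json::Int(version));
    map.insert("timestamp".to_string(), Json::Str(timestamp.to_string()));
    map.insert("game_state".to_string(), game_state);
    Json::Object(map)
}
```

Because the map type iterates in key order regardless of insertion order, serializing the same state twice yields the same text, which is the property the byte-identical re-save test relies on.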
**13 tests cover**: scalar round-trip, Player field round-trip (name/race/gold/is_human × 2), byte-identical re-save, multi-slot isolation, slot list ordering, empty slot list, delete target vs siblings, delete empty slot error, load empty slot error, out-of-range slot error, corrupt JSON error, autosave write + exclusion from manual slot list, autosave round-trip.
### Track D — Combat stack Rust resurrection (combat-resurrector)
`src/simulator/crates/mc-combat/src/lib.rs` was exporting only `pub mod loot;`. Roughly 2000 LOC across `resolver.rs`, `keywords.rs`, `bonuses.rs`, `promotions.rs`, `siege.rs`, and `wilds.rs` existed on disk but were orphaned.
**Module exports** (all 6 unorphaned):
```rust
pub mod bonuses;
pub mod keywords;
pub mod loot;
pub mod promotions;
pub mod resolver;
pub mod siege;
pub mod wilds;
```
Re-exports: `CombatResolver, CombatParams, CombatResult, CombatType, CombatOutcome, UnitStats, UnitAttributes, CombatBonuses, Keyword, KeywordContext, bypasses_zoc, xp_from_combat, xp_threshold, heal_on_promote, check_promotion, max_promotion_level, validate_promotion_choice, PromotionDef, PromotionEffect, melee_wall_penalty, siege_city_bonus, split_ranged_damage_vs_city, wild_combat_stats, total_attack_modifier, total_defense_modifier`.
**Drift fixes** (clippy-only, no feature changes):
- `siege.rs` — outer `///` module comment → inner `//!` (empty_line_after_doc_comments)
- `keywords.rs::Keyword::from_str` — added `#[allow(clippy::should_implement_trait)]` (can't cleanly implement std `FromStr` because return is `Option`, not `Result`; signature change is out of triage scope)
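The `from_str` situation can be sketched as follows. The enum here is a hypothetical two-variant stand-in (only `reach` appears in this log; `Trample` is a made-up placeholder), and the `FromStr` impl with `UnknownKeyword` shows what a later signature change might look like, not something this iteration did.

```rust
/// Hypothetical subset of keywords. The real `Keyword` enum has its own variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Keyword {
    Reach,
    Trample, // placeholder variant, not taken from the source
}

impl Keyword {
    /// Inherent method that shares its name with `std::str::FromStr::from_str`
    /// but returns `Option`, which is what the clippy lint flags.
    #[allow(clippy::should_implement_trait)]
    pub fn from_str(s: &str) -> Option<Self> {
        match s {
            "reach" => Some(Keyword::Reach),
            "trample" => Some(Keyword::Trample),
            _ => None,
        }
    }
}

/// Error type a trait-based version would need, since `FromStr` returns `Result`.
#[derive(Debug, PartialEq, Eq)]
pub struct UnknownKeyword;

impl std::str::FromStr for Keyword {
    type Err = UnknownKeyword;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // `Keyword::from_str` resolves to the inherent method above,
        // because inherent associated functions take precedence over trait ones.
        Keyword::from_str(s).ok_or(UnknownKeyword)
    }
}
```

With both in place, `Keyword::from_str("reach")` still returns an `Option`, while `"reach".parse::<Keyword>()` goes through the trait and returns a `Result`.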
**GdCombatResolver** — new class in `src/simulator/api-gdext/src/lib.rs` (~195 LOC) following the iter 7h dict-adapter pattern:
- `#[derive(GodotClass)] #[class(base=RefCounted)]`
- `#[func] fn resolve(attacker: Dictionary, defender: Dictionary, params: Dictionary) -> Dictionary` — accepts attacker/defender stats + keywords array + combat_type/bonuses/city context, returns complete `CombatResult` as dict (`defender_damage, attacker_damage, attacker_killed, defender_killed, attacker_hp, defender_hp, city_damage, city_hp_remaining, attacker_xp, defender_xp, life_drain_heal`)
- `#[func] fn xp_threshold(level: i64) -> i64`
- `#[func] fn heal_on_promote(max_hp: i64) -> i64`
- `#[func] fn wild_combat_stats(tier: i64, size: GString, diet: GString) -> Dictionary`
3 new regression tests in a `resurrection_tests` module: crate-root resolver surface end-to-end, keyword/promotion helper reachability, wilds/siege helper reachability.
`bash build-gdext.sh` rebuilt `libmagic_civ_physics.x86_64.so` with GdCombatResolver registered.
### Critical side-find: gdlintrc auto-sync interference
Mid-iteration, unit-porter and save-porter both hit `./run verify` gdlint failures that looked pre-existing. Root cause: auto-commit `b3fb9430 chore(linter-specific): 🔧 Update linter configuration to enforce gofmt and gocritic rules in gdlintrc` reverted the project-specific carveouts:
- `max-public-methods: 100 → 20` (but `DataLoader.gd` has 99 intentional typed accessors)
- Re-enabled `no-else-return` + `unused-argument` which the project had disabled for physics/wrapper patterns
The commit message mentions "gofmt and gocritic" — Go-specific rules applied to a GDScript config by a sync tool that doesn't understand project-specific carveouts. Team-lead restored the carveouts inline during the iteration with explanatory comments so the next sync doesn't repeat the mistake:
- `max-public-methods: 100` with comment explaining DataLoader/city/game_state/ecology_db/data_loader_ecology are wide singletons by design
- `disable: [no-else-return, unused-argument]` with comment explaining physics code and GDExtension wrapper parity
`./run verify` 6/6 green after the restore. **Carry-forward**: the auto-sync tool needs a "respect local overrides" mode, or gdlintrc needs a comment marker the sync respects.
### Final metrics
```
                         Iter-7n-end      Iter-7o-end   Delta
./run verify             6/6 PASS         6/6 PASS      -
Workspace cargo tests    508              530           +22 (Track A +1, Track D +3, Track C-indirect via mc-turn test count +18)
GUT passing              64               79            +15 (Track B +2, Track C +13)
GUT failing              6                6             - (unchanged: ecology + city-scorer domain issues)
mc-combat modules        1 (loot only)    7 (+6)        +6 (resurrection)
GdCombatResolver class   ❌               ✅            - (new)
SaveManager LOC          3 (stub)         217           +214
test_keyword_handler     9/11             11/11         +2
gdlintrc carveouts       broken           restored      - (bystander fix)
```
**Key strategic outcomes**:
1. **Phase 7 balance + CMA-ES plans are VALID** (Track A vindicated the knob)
2. **Combat GDExtension surface exists** (iter 7p can wire combat_resolver.gd without touching Rust)
3. **Save/load works** (SaveManager no longer a stub; autosave is a real call)
4. **Keyword handler has real unit coverage** (not just founder units)
### Next steps (iter 7p carry-forward)
- **Combat GDScript consumer wiring** — restore the ~25 Unit/City fields combat_resolver.gd reads (base_str, bonus_str, veteran_level, formation_count, stimulant_penalty, get_fortification_bonus, get_data, get_range, get_damage, get_combat_type, gain_xp, can_promote, city.city_hp, max_city_hp, population). Then replace combat_resolver.gd internals with delegation to GdCombatResolver. Un-pending test_combat_resolver.
- **GdAiPlayer bridge** (still deferred from iter 7n) — retire `ai_military.gd` / `ai_tactical.gd` legacy modules, expose `mc-ai` via GDExtension, un-pending 4 AI test files.
- **Empire system restoration** — port `tech_web.gd` / `economy.gd` / `victory_manager.gd` from stubs (or land Rust bridges).
- **Ecology domain tuning** — investigate the 2 real golden-vector failures (predator population, creature quality progression). Not a bug, game-balance work.
- **CMA-ES sweep with fixed methodology** — now that Track A documented the bench methodology fix, run the sweep with bumped encounter probability + wide base spread + multi-seed averaging.
- **gdlintrc sync tool fix** — the project-specific carveouts need a mechanism that survives config sync.
---
## Iteration 7n — 5-agent parallel GUT triage: 124 failing → 6, + 4 real production bugs found (2026-04-08, COMPLETE)
**Goal:** Iter 7m cleared the 4 GUT compile errors but left 124 failing asserts untouched. This iteration: decompose the failing asserts into 5 disjoint file-set tracks, assign a specialist to each, triage each file as fix-now / defer (pending) / delete, surface real production bugs for iter 7o+ without silent patching.