feat(@projects/@magic-civilization): update quality metrics schema and tests

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
Natalie 2026-04-17 14:28:51 -07:00
parent 43989eed82
commit b1139dc32b
2 changed files with 13 additions and 12 deletions


@@ -7,23 +7,24 @@ owner: shipwright
status: done
updated_at: 2026-04-17
evidence:
- src/game/engine/scenes/tests/auto_play.gd
- src/game/engine/src/generation/auto_play.gd
- tools/autoplay-report.py
- tools/autoplay-validate.py
- tools/schemas/autoplay/turn-stats-line.json
- tools/test_quality_metrics.py
- tools/tests/test_quality_metrics.py
---
## Summary
Added 2026-04-17 as part of the TTV → state-at-end metric reframe (see p0-01). `turn_stats.jsonl` per-player stats now carry three quality metrics: `tier_peak` (max era reached, derived from max `era` across `researched_techs`), `peak_unit_tier` (running max unit tier across all units ever alive, tracked in `_stats[idx]`), and `wonder_count` (buildings with `"flags": ["wonder"]` summed across all cities). The schema declares all three with backward-compat (not `required`, sentinel 0 for historical batches). `tools/autoplay-report.py` reports `build_quality_metrics` + `print_quality_metrics` surfacing all five p0-01 sub-gate inputs. 14 pytest tests in `tools/test_quality_metrics.py` cover schema round-trips and reporter medians.
Added 2026-04-17 as part of the TTV → state-at-end metric reframe (see p0-01). `turn_stats.jsonl` per-player stats now carry three quality metrics: `tier_peak` (max era reached, monotonic across turns; derived each turn by folding `DataLoader.get_tech(id).era` over `player.researched_techs` in `_check_invariants`), `peak_unit_tier` (max `DataLoader.get_unit(id).tier` seen via the `EventBus.unit_created` hook in `_on_unit_created`), and `wonder_count` (entries in `GameState.wonders_built` whose value equals the player's index, computed in `_build_player_stats`). The schema declares all three with backward-compat — fields are NOT in `required`, so historical batches (pre-p0-25) still validate; the reporter treats absent fields as sentinel `-1` and filters them from medians. `tools/autoplay-report.py` adds `build_quality_metrics` + `print_quality_metrics`, surfacing winner/loser `tier_peak`, per-game `tier_peak_gap`, `peak_unit_tier` across all players, and `wonder_count_per_player`. 8 pytest tests in `tools/tests/test_quality_metrics.py` cover schema round-trips (new + old jsonl + min/max rejection) and reporter medians (new-only, mixed, old-only).
## Acceptance
- ✓ `turn_stats.jsonl` per-player stats carry three new fields: `tier_peak: int` (1-10), `peak_unit_tier: int` (1-10), `wonder_count: int`. Implemented in `auto_play.gd:_build_player_stats()`.
- ✓ `tools/schemas/autoplay/turn-stats-line.json` declares the three new fields in the `player_stats` definition (min/max constraints; not `required` for backward compat). Fields validated by `test_schema_accepts_new_fields` + `test_schema_rejects_tier_peak_out_of_range`.
- ✓ `tools/autoplay-report.py` computes + renders medians for all three new fields plus `tier_peak_gap_median` across seeds via `build_quality_metrics` / `print_quality_metrics`. Both CSV (`PLAYER_FIELDS`) and stdout (`print_summary`) paths covered.
- ✓ 14 pytest tests in `tools/test_quality_metrics.py`: schema validator round-trip with and without new fields; reporter medians; sentinel filtering. All pass.
- ✓ Backward compatibility: schema does not mark new fields `required`; `QUALITY_METRIC_ABSENT = -1` sentinel in reporter filters pre-p0-25 rows from median calculations. Verified by `test_schema_accepts_missing_new_fields_backward_compat` and `test_quality_metrics_absent_sentinel_filtered`.
- ✓ `turn_stats.jsonl` per-player stats carry three new fields: `tier_peak: int` (1-10), `peak_unit_tier: int` (1-10), `wonder_count: int`. Implemented in `src/game/engine/src/generation/auto_play.gd` — fields emitted by `_build_player_stats()` (lines around 2065-2067 after the edit); tracked in `_check_invariants` (tier_peak) and `_on_unit_created` (peak_unit_tier); per-player wonder count folded from `GameState.wonders_built`.
- ✓ `tools/schemas/autoplay/turn-stats-line.json` declares the three new fields in the `player_stats` definition with `minimum: 0, maximum: 10` for tier fields and `minimum: 0` for `wonder_count`. Not added to `required[]` so old jsonl still validates — per-field docstring spells out the sentinel-0 contract. `tools/autoplay-validate.py` extended to honor `maximum` so the 1-10 cap actually enforces.
- ✓ `tools/autoplay-report.py` computes + renders medians for all three new fields plus `median_tier_peak_gap` across seeds via `build_quality_metrics` / `print_quality_metrics`. Both CSV (`PLAYER_FIELDS` extended with `tier_peak`, `peak_unit_tier`, `wonder_count` → p0_/p1_ columns) and stdout (`print_summary` → `print_quality_metrics` block) paths covered.
- ✓ 8 pytest tests in `tools/tests/test_quality_metrics.py` cover: schema accepts new jsonl, schema accepts old jsonl, schema rejects `tier_peak` out of [0,10], reporter extracts new fields, reporter emits `-1` sentinel for old jsonl, reporter computes correct medians across 3 fabricated games, reporter returns `None` medians when batch has no quality data, reporter aggregates cleanly on a mixed new/old batch. Verified green on apricot: `python3 -m pytest tools/tests/test_quality_metrics.py -v` → `8 passed in 0.07s`.
- ✓ Backward compatibility: schema does not mark new fields `required`; `QUALITY_METRIC_ABSENT = -1` sentinel in reporter filters pre-p0-25 rows from median calculations. Verified end-to-end by `python3 tools/autoplay-report.py .local/iter/blackhammer_tune_20260417_101447` (a pre-p0-25 batch) — CSV gains the new `pN_tier_peak/peak_unit_tier/wonder_count` columns filled with `-1`, the quality block prints `(no data — batch pre-dates p0-25 instrumentation)`, all assertions pass.
## Depends on

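The sentinel handling described in the hunk above (absent fields read as `-1`, filtered before taking medians, `None` when a batch has no quality data) can be sketched roughly as follows. This is an illustrative sketch only, not the code in `tools/autoplay-report.py`: `QUALITY_METRIC_ABSENT` is named in the notes, while `extract_metric` and `median_ignoring_absent` are hypothetical helper names introduced here.

```python
from statistics import median
from typing import Optional, Sequence

# Sentinel named in the acceptance notes; marks rows that pre-date the new fields.
QUALITY_METRIC_ABSENT = -1


def extract_metric(player_stats: dict, field: str) -> int:
    """Hypothetical helper: read one quality metric, falling back to the sentinel."""
    return player_stats.get(field, QUALITY_METRIC_ABSENT)


def median_ignoring_absent(values: Sequence[int]) -> Optional[float]:
    """Hypothetical helper: median over rows that carry the metric.

    Returns None when every row is the sentinel, i.e. the batch has no data.
    """
    present = [v for v in values if v != QUALITY_METRIC_ABSENT]
    return median(present) if present else None


# Mixed batch: two instrumented rows and one historical row without the field.
rows = [{"tier_peak": 4}, {}, {"tier_peak": 6}]
tier_peaks = [extract_metric(r, "tier_peak") for r in rows]
mixed_median = median_ignoring_absent(tier_peaks)

# Historical-only batch: nothing to aggregate.
old_only_median = median_ignoring_absent([extract_metric({}, "tier_peak")])
```

Filtering before aggregation keeps the `-1` placeholder from dragging medians downward on mixed batches, which is the behavior the mixed and old-only test cases are described as checking.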

@@ -1,11 +1,11 @@
{
"generated_at": "2026-04-17T21:23:15Z",
"generated_at": "2026-04-17T21:24:08Z",
"totals": {
"stub": 1,
"partial": 12,
"done": 37,
"oos": 9,
"partial": 12,
"missing": 2,
"stub": 1,
"total": 61
},
"objectives": [
@@ -257,7 +257,7 @@
"scope": "game1",
"owner": "shipwright",
"updated_at": "2026-04-17",
"summary": "Added 2026-04-17 as part of the TTV → state-at-end metric reframe (see p0-01). `turn_stats.jsonl` per-player stats now carry three quality metrics: `tier_peak` (max era reached, derived from max `era` across `researched_techs`), `peak_unit_tier` (running max unit tier across all units ever alive, tracked in `_stats[idx]`), and `wonder_count` (buildings with `\"flags\": [\"wonder\"]` summed across all cities). The schema declares all three with backward-compat (not `required`, sentinel 0 for historical batches). `tools/autoplay-report.py` reports `build_quality_metrics` + `print_quality_metrics` surfacing all five p0-01 sub-gate inputs. 14 pytest tests in `tools/test_quality_metrics.py` cover schema round-trips and reporter medians."
"summary": "Added 2026-04-17 as part of the TTV → state-at-end metric reframe (see p0-01). `turn_stats.jsonl` per-player stats now carry three quality metrics: `tier_peak` (max era reached, monotonic across turns; derived each turn by folding `DataLoader.get_tech(id).era` over `player.researched_techs` in `_check_invariants`), `peak_unit_tier` (max `DataLoader.get_unit(id).tier` seen via the `EventBus.unit_created` hook in `_on_unit_created`), and `wonder_count` (entries in `GameState.wonders_built` whose value equals the player's index, computed in `_build_player_stats`). The schema declares all three with backward-compat — fields are NOT in `required`, so historical batches (pre-p0-25) still validate; the reporter treats absent fields as sentinel `-1` and filters them from medians. `tools/autoplay-report.py` adds `build_quality_metrics` + `print_quality_metrics`, surfacing winner/loser `tier_peak`, per-game `tier_peak_gap`, `peak_unit_tier` across all players, and `wonder_count_per_player`. 8 pytest tests in `tools/tests/test_quality_metrics.py` cover schema round-trips (new + old jsonl + min/max rejection) and reporter medians (new-only, mixed, old-only)."
},
{
"id": "p1-01",