Spins up the real harness (Rust .so -> wire -> Python env), steps ~60 turns, and asserts the LIVE
score_estimate and per-step reward are in sane ranges (score < 2M, |reward| < 100). This catches
the class of bug the formula unit tests missed: the fixed-point inflation passed every
relative/monotonic test but produced score ~30M / reward ~3310 through the wire, wasting a 4-hour
run. Demonstrated: the pre-fix build FAILs here; the post-fix build reports PREFLIGHT OK (max score 277, max |reward| 2.00).
Training launches gate on a zero exit.
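
A minimal sketch of the range check described above, not the actual preflight script. The bounds come from this message; the function names and the (score_estimate, reward) pair format are assumptions for illustration.

```python
import math
from typing import Iterable, Tuple

SCORE_LIMIT = 2_000_000   # "score < 2M" from the message above
REWARD_LIMIT = 100.0      # "|reward| < 100" from the message above


def preflight(steps: Iterable[Tuple[float, float]]) -> Tuple[bool, float, float]:
    """Scan (score_estimate, reward) pairs; return (ok, max_score, max_abs_reward).

    Hypothetical helper: the real harness reads these values off the wire.
    """
    ok = True
    max_score = 0.0
    max_abs_reward = 0.0
    for score, reward in steps:
        # Non-finite values would slip past max(), so reject them explicitly.
        if not (math.isfinite(score) and math.isfinite(reward)):
            ok = False
            continue
        max_score = max(max_score, score)
        max_abs_reward = max(max_abs_reward, abs(reward))
    ok = ok and max_score < SCORE_LIMIT and max_abs_reward < REWARD_LIMIT
    return ok, max_score, max_abs_reward


def exit_code(steps: Iterable[Tuple[float, float]]) -> int:
    """Return 0 on success so a launcher can gate on the exit status."""
    ok, max_score, max_abs_reward = preflight(steps)
    status = "PREFLIGHT OK" if ok else "PREFLIGHT FAIL"
    print(f"{status} (max score {max_score:g}, max |reward| {max_abs_reward:.2f})")
    return 0 if ok else 1
```

A launcher would call sys.exit(exit_code(steps)) and start training only on a zero status.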
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
score_estimate is now the unbounded unified score (~10-20x the old clamped [0,1000] magnitude);
scale the per-turn score-delta reward down to keep it in range with the other reward terms.
Empirical retune tracked for when the self-play stable resumes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Headless fleet images may lack the [extra] deps; fall back to progress_bar=False
instead of crashing model.learn(). Surfaced by the clan-conditioning smoke test (the
clan wiring itself was verified: CP_LEARNER_CLAN=blackhammer -> clan_index=2 in the obs).
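
One way to sketch the fallback, assuming the progress bar depends on the tqdm and rich packages (an assumption about what the [extra] deps provide here; the helper name is hypothetical):

```python
import importlib.util
from typing import Sequence


def deps_available(modules: Sequence[str] = ("tqdm", "rich")) -> bool:
    """Return True only if every named top-level module can be imported.

    Hypothetical helper; the default module list is an assumption.
    """
    return all(importlib.util.find_spec(name) is not None for name in modules)


# Usage sketch (model is whatever object exposes learn(); not defined here):
# model.learn(total_timesteps=n_steps, progress_bar=deps_available())
```

Probing for the modules up front keeps the training call itself unchanged on images that do have the dependencies.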
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The wiring for per-clan trained AI. Each training episode samples a clan, stamps it
on the LEARNER slot so the obs one-hots it, and scales the SHAPING rewards by that
clan's overlay (terminal win/loss stay universal):
- player_api_main.gd: CP_LEARNER_CLAN stamps the learner slot's clan via
set_player_personality_json -> PlayerState.clan_id -> PlayerView.clan_index ->
obs clan one-hot. (Previously only non-learner slots got a clan.)
- reward_overlays.json: per-clan group multipliers (combat/expansion/production/
economy/tech) derived from ai_personalities.json strategic_axes, normalized per
clan to mean 1.0 (no fairness confound). Archetypes emerge: blackhammer combat 1.5,
goldvein economy 1.64, deepforge expansion 0.42.
- magic_civ_env.py: samples the clan per episode (seeded), passes CP_LEARNER_CLAN,
scales the 8 shaping reward terms by self._ov(group).
- harness_client.py: HarnessConfig.learner_clan -> CP_LEARNER_CLAN.
- train.py: --clan ('' generalist | 'all' samples every clan | comma list).
Local checks: py_compile clean; overlays cover all 6 clans. Next: fleet smoke
(clan_index in the learner view + a tiny training run) before scaling out.
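
The per-clan normalization and scaling above can be sketched as follows. The group names come from this message; the input axis values, the term names, and both function names are made up for illustration and are not the contents of reward_overlays.json or magic_civ_env.py.

```python
from typing import Dict

GROUPS = ("combat", "expansion", "production", "economy", "tech")


def normalize_overlay(axes: Dict[str, float]) -> Dict[str, float]:
    """Rescale one clan's group weights so their mean is exactly 1.0."""
    mean = sum(axes[g] for g in GROUPS) / len(GROUPS)
    if mean <= 0:
        raise ValueError("axes must have a positive mean")
    return {g: axes[g] / mean for g in GROUPS}


def scale_shaping(
    terms: Dict[str, float],
    term_group: Dict[str, str],
    overlay: Dict[str, float],
) -> Dict[str, float]:
    """Multiply each shaping term by its group's multiplier.

    Terminal win/loss rewards are simply not passed in, so they stay universal.
    """
    return {name: value * overlay[term_group[name]] for name, value in terms.items()}


# Hypothetical raw axes for one clan (not taken from ai_personalities.json).
overlay = normalize_overlay(
    {"combat": 3.0, "expansion": 1.0, "production": 2.0, "economy": 2.0, "tech": 2.0}
)
scaled = scale_shaping({"unit_kill": 2.0}, {"unit_kill": "combat"}, overlay)
```

Because every clan's multipliers average to 1.0, the total shaping budget is comparable across clans and only its distribution over groups changes.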
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Grows the obs contract 32->96 as a SCHEMA change, not a hand-rewrite of both encoders. obs_schema.json
v2 adds the channels the trained AI needs: economy aggregates (yields/territory/golden-age),
tech/culture/civics, military (army-health, experience, posture, equipment), per-city blocks,
terrain summary + biome histogram, richer diplomacy, and the 6-wide CLAN ONE-HOT (the
clan-conditioning input; -1 generalist = all-zero).
Both interpreters gain the same new ops: onehot, frac (nested operands), histogram,
per_entity, plus reduce/sum_len, count_nonnull, truthy, where_any (OR), and bool->1.0 coercion
in scalar reads. Multi-slot ops emit a run of consecutive slots. OBS_DIM 32->96,
OBS_SCHEMA_VERSION 1->2.
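
As an illustration of a multi-slot op, here is a minimal sketch of the clan one-hot with the "-1 generalist = all-zero" rule. The function name and signature are assumptions, not the interpreter's actual code.

```python
from typing import List


def onehot(index: int, width: int) -> List[float]:
    """Emit `width` consecutive slots with a single 1.0 at `index`.

    Any out-of-range index, including -1 for the generalist, yields all zeros.
    """
    slots = [0.0] * width
    if 0 <= index < width:
        slots[index] = 1.0
    return slots


blackhammer = onehot(2, 6)   # clan_index=2, as in the CP_LEARNER_CLAN smoke
generalist = onehot(-1, 6)   # no clan: every slot stays 0.0
```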
The contract gate earned its keep: it caught a real Python<->Rust divergence (a stale
isinstance(dict) guard zeroed counts over string lists like city.buildings) then confirmed
the fix byte-exact. Verified on the DO fleet: learned_encoder_parity green (Rust v2 ==
Python v2, 56 fixtures incl clan one-hot variety -1..5, zero drift); mc-player-api 188/188.
Next: learned:clan-v1 controller wiring, then training on the richer obs.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
verify-obs-contract.sh + verify_obs_contract.py: the third pillar of the shared
contract. Asserts the single schema is honoured byte-for-byte by BOTH interpreters
— Python (schema well-formed + obs_contract reproduces the parity fixtures) and Rust
(cargo test learned_encoder_parity, which also asserts schema version/obs_dim at
load). Exit 0 only if schema + Python + Rust agree; the Rust step runs on the fleet
where cargo exists (skipped with instructions on the toolchain-less EDIT host).
Completes the schema + versioning + verification contract: one source of truth, two
thin interpreters, one gate. Verified: gate green (Python 56/56; Rust proven on fleet).
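
A rough sketch of what a byte-for-byte fixture comparison could look like on the Python side. It assumes observations are compared as little-endian float32 and that fixtures are (state, expected) pairs; neither detail is confirmed by this message, and the names are hypothetical.

```python
import struct
from typing import Any, Callable, List, Sequence, Tuple


def f32_bytes(values: Sequence[float]) -> bytes:
    """Pack values as little-endian float32 so comparison is byte-exact."""
    return struct.pack(f"<{len(values)}f", *values)


def failing_fixtures(
    encode: Callable[[Any], Sequence[float]],
    fixtures: Sequence[Tuple[Any, Sequence[float]]],
) -> List[int]:
    """Return the indices of fixtures whose encoded bytes differ from expected."""
    bad = []
    for i, (state, expected) in enumerate(fixtures):
        if f32_bytes(encode(state)) != f32_bytes(expected):
            bad.append(i)
    return bad
```

A gate built on this would exit 0 only when the returned list is empty for every interpreter.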
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the hand-duplicated observation encoder with a schema-driven contract:
obs_schema.json declares the layout (version, obs_dim, per-field ops from a fixed
vocabulary: scalar/reduce/clamp_div, +onehot/frac/histogram/per_entity for v2),
and both Python and Rust interpret it instead of hardcoding the math. Kills the
bit-exact-drift risk that made growing 32->96 dims dangerous.
This commit lands the Python half + the v1 schema (reproduces the historical
32-dim encoder EXACTLY): obs_contract.py interprets the schema; encoders.py
delegates to it (OBS_DIM + field math now come from the schema, not module code).
Verified locally: encoders.encode_observation matches all 56 parity fixtures with
ZERO drift. Design: .project/designs/obs-contract.md.
Next: Rust interpreter (encoder.rs reads the embedded schema), verify-obs-contract
gate + version assertions, then bump to v2 (richer 96-dim) as a schema data change.
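
To make the idea concrete, a toy interpreter for the v1 op vocabulary might look like the sketch below. The schema layout ("fields", "key", "div") and the exact semantics of reduce and clamp_div are assumptions for illustration; the real obs_schema.json and obs_contract.py may differ.

```python
from typing import Any, Dict, List


def encode(schema: Dict[str, Any], state: Dict[str, Any]) -> List[float]:
    """Interpret a declarative schema instead of hardcoding per-field math."""
    obs: List[float] = []
    for field in schema["fields"]:
        op = field["op"]
        key = field["key"]
        if op == "scalar":
            # float() also coerces booleans: True -> 1.0, False -> 0.0.
            value = float(state.get(key, 0.0))
        elif op == "reduce":
            # Assumed semantics: sum over a list of numbers.
            value = float(sum(state.get(key, [])))
        elif op == "clamp_div":
            # Assumed semantics: divide, then clamp into [0, 1].
            value = min(max(float(state.get(key, 0.0)) / field["div"], 0.0), 1.0)
        else:
            raise ValueError(f"unknown op: {op}")
        obs.append(value)
    if len(obs) != schema["obs_dim"]:
        raise ValueError("schema obs_dim does not match the emitted slots")
    return obs


# Hypothetical three-field schema and game state.
toy_schema = {
    "version": 1,
    "obs_dim": 3,
    "fields": [
        {"op": "scalar", "key": "turn"},
        {"op": "reduce", "key": "yields"},
        {"op": "clamp_div", "key": "gold", "div": 100.0},
    ],
}
toy_obs = encode(toy_schema, {"turn": 3, "yields": [1, 2, 3], "gold": 50})
```

Growing the observation then means adding entries to the schema data, with the obs_dim check catching any mismatch at encode time.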
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Exposed by a new hotseat full-game driver (drives both player seats over the
multi-slot wire, no AI dependency) — a 31-turn 2-player game surfaced these.
- mc-player-api: the AI→PlayerAction converter (apply_ai_action + the suggest
sibling) emitted the bare tactical city index ("0") for QueueProduction, but
find_city_indices needs the projector wire id "{player}_{c_idx}" — so every
AI/suggested queue_production failed UnknownCity. This silently broke the
in-box AI's production-steering, not just the wire. Emit the wire id at all
three sites; thread slot into the suggest converter; add a regression test.
Result in the playthrough: roundtrip failures 58→1, city_building_completed 0→18.
- api-gdext: advance_round_phase/end_player_round_phase did not compile at HEAD —
godot-rust 0.2.4 Array::push needs &Dictionary (AsArg); Pcg64 builds via ::seed
not ::seed_from_u64; dropped a dead rng binding. The gdext crate could not be
rebuilt from source until this.
- mc-worldsim: pub use GamePhase/RoundPhase (api-gdext references them through
mc_worldsim; they were a private re-export → E0603).
- tooling: add hotseat_playthrough.py — applies each seat's suggested actions
and flags any offered action that fails to apply, with severity triage.
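
The id mismatch in the first bullet can be illustrated with a small sketch. Only the "{player}_{c_idx}" format comes from this message; the helper names and the assumption that both parts are integers are hypothetical.

```python
from typing import Tuple


def city_wire_id(player: int, c_idx: int) -> str:
    """Build the projector wire id "{player}_{c_idx}" for a city."""
    return f"{player}_{c_idx}"


def parse_city_wire_id(wire_id: str) -> Tuple[int, int]:
    """Split a wire id back into (player, c_idx); reject bare indices like "0"."""
    parts = wire_id.split("_")
    if len(parts) != 2:
        raise ValueError(f"not a city wire id: {wire_id!r}")
    return int(parts[0]), int(parts[1])
```

Under this sketch, a bare tactical index such as "0" fails to parse, which mirrors the UnknownCity failures described above.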
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>