Commit graph

34 commits

Author SHA1 Message Date
Natalie
d9fbbdadd3 feat(rl): training preflight gate — prove the Rust score/reward wiring before training
Spins up the real harness (Rust .so -> wire -> Python env), steps ~60 turns, and asserts the LIVE
score_estimate and per-step reward are in sane ranges (score < 2M, |reward| < 100). This catches
the class of bug the formula unit tests missed: the fixed-point inflation passed every
relative/monotonic test but produced score ~30M / reward ~3310 through the wire, wasting a 4-hour
run. Demonstrated: pre-fix would FAIL here; post-fix PREFLIGHT OK (max score 277, max |reward| 2.00).
Training launches gate on a zero exit.
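
A minimal sketch of the range check described above, assuming the harness yields one (score_estimate, reward) pair per step. The function name `preflight` and the input shape are hypothetical; only the two limits come from this message.

```python
import math
from typing import Iterable, Tuple

SCORE_LIMIT = 2_000_000.0  # "score < 2M" from the commit message
REWARD_LIMIT = 100.0       # "|reward| < 100" from the commit message


def preflight(steps: Iterable[Tuple[float, float]]) -> int:
    """Return a process exit code: 0 if every step is in range, 1 otherwise.

    `steps` is a hypothetical stream of (score_estimate, reward) pairs as
    they would be read back from the live harness.
    """
    max_score = 0.0
    max_abs_reward = 0.0
    for score, reward in steps:
        # Assumption: NaN or inf through the wire also counts as a failure.
        if not (math.isfinite(score) and math.isfinite(reward)):
            return 1
        max_score = max(max_score, score)
        max_abs_reward = max(max_abs_reward, abs(reward))
    if max_score < SCORE_LIMIT and max_abs_reward < REWARD_LIMIT:
        return 0
    return 1
```

A launcher could then refuse to start training unless the check returns 0, which is the "gate on a zero exit" behaviour.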

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-01 06:24:15 -04:00
Natalie
e1f3a66a67 tune(rl): drop SCORE_DELTA_SCALE 1e-3 -> 1e-4 for the unified raw score
score_estimate is now the unbounded unified score (~10-20x the old clamped [0,1000] magnitude);
scale the per-turn score-delta reward down to keep it in range with the other reward terms.
Empirical retune tracked for when the self-play stable resumes.
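
To illustrate the effect of the rescale, a sketch assuming the score-delta term is simply the per-turn change in score_estimate times SCORE_DELTA_SCALE. The function name and the example scores are hypothetical.

```python
OLD_SCALE = 1e-3  # SCORE_DELTA_SCALE before this commit
NEW_SCALE = 1e-4  # SCORE_DELTA_SCALE after this commit


def score_delta_reward(prev_score: float, score: float, scale: float) -> float:
    """Per-turn shaping term: change in score_estimate times a scale.

    Assumption: the real reward term has this linear form.
    """
    return scale * (score - prev_score)


# Hypothetical turn where the unbounded score rises by 1500 points.
old_term = score_delta_reward(10_000.0, 11_500.0, OLD_SCALE)
new_term = score_delta_reward(10_000.0, 11_500.0, NEW_SCALE)
```

Under that assumption the same score jump contributes a tenth of what it did before, which offsets the larger magnitude of the unified score.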

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 20:40:48 -04:00
Natalie
9b7b72f248 fix(ai): train.py degrades gracefully without SB3 progress-bar extras (tqdm/rich)
Headless fleet images may lack the [extra] deps; fall back to progress_bar=False
instead of crashing model.learn(). Surfaced by the clan-conditioning smoke (the
clan wiring itself verified: CP_LEARNER_CLAN=blackhammer -> clan_index=2 in the obs).
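
One way to sketch the fallback, assuming the check is done by probing for the optional packages before calling model.learn(). The helper name is hypothetical, and train.py may well catch the ImportError instead.

```python
import importlib.util


def progress_bar_available() -> bool:
    """True only if both optional progress-bar dependencies can be imported.

    Stable-Baselines3 needs tqdm and rich when learn() is called with
    progress_bar=True.
    """
    return all(
        importlib.util.find_spec(name) is not None
        for name in ("tqdm", "rich")
    )


# Usage sketch (not executed here, `model` would be an SB3 algorithm):
#   model.learn(total_timesteps=steps, progress_bar=progress_bar_available())
```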

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 13:21:12 -04:00
Natalie
57b326b670 feat(ai): clan-conditioned training pipeline (harness + env + reward overlays)
The wiring for per-clan trained AI. Each training episode samples a clan, stamps it
on the LEARNER slot so the obs one-hots it, and scales the SHAPING rewards by that
clan's overlay (terminal win/loss stay universal):

- player_api_main.gd: CP_LEARNER_CLAN stamps the learner slot's clan via
  set_player_personality_json -> PlayerState.clan_id -> PlayerView.clan_index ->
  obs clan one-hot. (Previously only non-learner slots got a clan.)
- reward_overlays.json: per-clan group multipliers (combat/expansion/production/
  economy/tech) derived from ai_personalities.json strategic_axes, normalized per
  clan to mean 1.0 (no fairness confound). Archetypes emerge: blackhammer combat 1.5,
  goldvein economy 1.64, deepforge expansion 0.42.
- magic_civ_env.py: samples the clan per episode (seeded), passes CP_LEARNER_CLAN,
  scales the 8 shaping reward terms by self._ov(group).
- harness_client.py: HarnessConfig.learner_clan -> CP_LEARNER_CLAN.
- train.py: --clan ('' generalist | 'all' samples every clan | comma list).

Local checks: py_compile clean; overlays cover all 6 clans. Next: fleet smoke
(clan_index in the learner view + a tiny training run) before scaling out.
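
The per-clan normalization to mean 1.0 can be sketched as follows. The raw axis values are invented for illustration and are not taken from ai_personalities.json; the function names are hypothetical.

```python
from typing import Dict


def normalize_overlay(axes: Dict[str, float]) -> Dict[str, float]:
    """Rescale one clan's group weights so they average to 1.0."""
    mean = sum(axes.values()) / len(axes)
    if mean <= 0:
        raise ValueError("overlay weights must have a positive mean")
    return {group: value / mean for group, value in axes.items()}


def shaped(term: float, group: str, overlay: Dict[str, float]) -> float:
    """Scale a shaping reward term by its group multiplier (1.0 if absent)."""
    return overlay.get(group, 1.0) * term


# Invented raw strategic axes for a combat-leaning clan.
raw = {"combat": 0.9, "expansion": 0.6, "production": 0.6,
       "economy": 0.5, "tech": 0.4}
overlay = normalize_overlay(raw)
```

Because every clan's multipliers average to 1.0, no clan receives more total shaping reward than another; only the mix across groups changes.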

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 13:06:02 -04:00
Natalie
a6fb75a480 feat(ai): v2 richer 96-dim clan-conditioned observation (schema data + 4 new ops)
Grows the obs contract 32->96 as a SCHEMA change, not a dual hand-rewrite. obs_schema.json
v2 adds the channels the trained AI needs: economy aggregates (yields/territory/golden-age),
tech/culture/civics, military (army-health, experience, posture, equipment), per-city blocks,
terrain summary + biome histogram, richer diplomacy, and the 6-wide CLAN ONE-HOT (the
clan-conditioning input; -1 generalist = all-zero).

Both interpreters gain the same new ops — onehot, frac (nested operands), histogram,
per_entity, plus reduce/sum_len, count_nonnull, truthy, where_any (OR), bool->1.0 coercion
in scalar reads. Multi-slot ops emit a run of consecutive slots. OBS_DIM 32->96,
OBS_SCHEMA_VERSION 1->2.
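
A sketch of the onehot op as it applies to the clan input: six slots, with -1 (generalist) mapping to all zeros. Treating any other out-of-range index as all-zero is an assumption here, and the function signature is hypothetical.

```python
from typing import List

CLAN_COUNT = 6  # width of the clan one-hot in the v2 schema


def onehot(index: int, width: int = CLAN_COUNT) -> List[float]:
    """Emit `width` consecutive slots with a single 1.0 at `index`.

    -1 (generalist) yields all zeros. Assumption: other out-of-range
    indices are also encoded as all zeros rather than raising.
    """
    slots = [0.0] * width
    if 0 <= index < width:
        slots[index] = 1.0
    return slots
```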

The contract gate earned its keep: it caught a real Python<->Rust divergence (a stale
isinstance(dict) guard zeroed counts over string lists like city.buildings) then confirmed
the fix byte-exact. Verified on the DO fleet: learned_encoder_parity green (Rust v2 ==
Python v2, 56 fixtures incl clan one-hot variety -1..5, zero drift); mc-player-api 188/188.
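
The divergence described above can be reconstructed roughly like this. Both function names are hypothetical, and the real guard may have sat elsewhere in the interpreter; the point is only that a dict-only check counts nothing in a list of strings.

```python
from typing import Any, Optional, Sequence


def count_nonnull_stale(items: Optional[Sequence[Any]]) -> float:
    """Buggy variant: a leftover guard only counts dict entries."""
    return float(sum(1 for item in items or [] if isinstance(item, dict)))


def count_nonnull(items: Optional[Sequence[Any]]) -> float:
    """Fixed variant: count every entry that is not None."""
    return float(sum(1 for item in items or [] if item is not None))


# Hypothetical string list in the shape of city.buildings.
buildings = ["forge", "granary", "walls"]
```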

Next: learned:clan-v1 controller wiring, then training on the richer obs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 12:38:50 -04:00
Natalie
5f73ccf950 test(ai): cross-language verification gate for the obs contract
verify-obs-contract.sh + verify_obs_contract.py: the third pillar of the shared
contract. Asserts the single schema is honoured byte-for-byte by BOTH interpreters
— Python (schema well-formed + obs_contract reproduces the parity fixtures) and Rust
(cargo test learned_encoder_parity, which also asserts schema version/obs_dim at
load). Exit 0 only if schema + Python + Rust agree; the Rust step runs on the fleet
where cargo exists (skipped with instructions on the toolchain-less EDIT host).

Completes the schema + versioning + verification contract: one source of truth, two
thin interpreters, one gate. Verified: gate green (Python 56/56; Rust proven on fleet).
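
The gate's exit logic can be sketched as below. The function names are hypothetical, and treating a skipped Rust step as non-failing is an assumption based on the toolchain-less EDIT host behaviour described above.

```python
import shutil
from typing import Optional


def cargo_available() -> bool:
    """True if a cargo binary is on PATH, so the Rust step can run."""
    return shutil.which("cargo") is not None


def gate_exit_code(schema_ok: bool, python_ok: bool,
                   rust_ok: Optional[bool]) -> int:
    """Exit 0 only if every step that ran agrees.

    rust_ok=None means the Rust step was skipped (no cargo on this host).
    Assumption: a skip does not fail the gate; the Rust half is then
    proven separately on a host that has the toolchain.
    """
    if not (schema_ok and python_ok):
        return 1
    if rust_ok is None:
        return 0
    return 0 if rust_ok else 1
```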

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 12:06:13 -04:00
Natalie
b67764ec67 feat(ai): shared obs encoder contract — schema as single source of truth (Python side)
Replace the hand-duplicated observation encoder with a schema-driven contract:
obs_schema.json declares the layout (version, obs_dim, per-field ops from a fixed
vocabulary: scalar/reduce/clamp_div, +onehot/frac/histogram/per_entity for v2),
and both Python and Rust interpret it instead of hardcoding the math. Kills the
bit-exact-drift risk that made growing 32->96 dims dangerous.

This commit lands the Python half + the v1 schema (reproduces the historical
32-dim encoder EXACTLY): obs_contract.py interprets the schema; encoders.py
delegates to it (OBS_DIM + field math now come from the schema, not module code).
Verified locally: encoders.encode_observation matches all 56 parity fixtures with
ZERO drift. Design: .project/designs/obs-contract.md.
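
A toy illustration of the schema-driven idea, not the real obs_contract.py. The schema layout, field paths, and the exact semantics of each op (in particular clamp_div as a divide clamped to [0, 1]) are assumptions made for this sketch.

```python
from typing import Any, Dict, List

# Hypothetical three-field schema in the spirit of obs_schema.json.
SCHEMA: Dict[str, Any] = {
    "version": 1,
    "obs_dim": 3,
    "fields": [
        {"op": "scalar", "path": "gold"},
        {"op": "reduce", "path": "cities"},
        {"op": "clamp_div", "path": "turn", "divisor": 500.0},
    ],
}


def encode(view: Dict[str, Any], schema: Dict[str, Any] = SCHEMA) -> List[float]:
    """Interpret the schema against a flat player view."""
    out: List[float] = []
    for field in schema["fields"]:
        value = view.get(field["path"])
        op = field["op"]
        if op == "scalar":
            out.append(float(value or 0.0))
        elif op == "reduce":
            # Assumed reduction: length of the list.
            out.append(float(len(value or [])))
        elif op == "clamp_div":
            ratio = float(value or 0.0) / field["divisor"]
            out.append(max(0.0, min(1.0, ratio)))
        else:
            raise ValueError(f"unknown op: {op!r}")
    # The layout is declared once; a mismatch means the schema is wrong.
    if len(out) != schema["obs_dim"]:
        raise ValueError("encoded length does not match obs_dim")
    return out
```

Growing the observation then means adding entries to the schema data, while both interpreters keep the same small op vocabulary.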

Next: Rust interpreter (encoder.rs reads the embedded schema), verify-obs-contract
gate + version assertions, then bump to v2 (richer 96-dim) as a schema data change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 11:49:04 -04:00
Natalie
60c8ce0ef6 fix(simulator): 🐛 AI/suggest production city_id round-trip + restore gdext build
Exposed by a new hotseat full-game driver (drives both player seats over the
multi-slot wire, no AI dependency) — a 31-turn 2-player game surfaced these.

- mc-player-api: the AI→PlayerAction converter (apply_ai_action + the suggest
  sibling) emitted the bare tactical city index ("0") for QueueProduction, but
  find_city_indices needs the projector wire id "{player}_{c_idx}" — so every
  AI/suggested queue_production failed UnknownCity. This silently broke the
  in-box AI's production-steering, not just the wire. Emit the wire id at all
  three sites; thread slot into the suggest converter; add a regression test.
  Result in the playthrough: roundtrip failures 58→1, city_building_completed 0→18.
- api-gdext: advance_round_phase/end_player_round_phase did not compile at HEAD —
  godot-rust 0.2.4 Array::push needs &Dictionary (AsArg); Pcg64 builds via ::seed
  not ::seed_from_u64; dropped a dead rng binding. The gdext crate could not be
  rebuilt from source until this.
- mc-worldsim: pub use GamePhase/RoundPhase (api-gdext references them through
  mc_worldsim; they were a private re-export → E0603).
- tooling: add hotseat_playthrough.py — applies each seat's suggested actions
  and flags any offered action that fails to apply, with severity triage.
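
For the first item above, a sketch of the "{player}_{c_idx}" wire id and why a bare "0" cannot resolve. Both helper names are hypothetical and integer player ids are an assumption; the real converter lives in the Rust mc-player-api crate.

```python
from typing import Tuple


def city_wire_id(player: int, c_idx: int) -> str:
    """Build the projector wire id in the "{player}_{c_idx}" form."""
    return f"{player}_{c_idx}"


def parse_city_wire_id(wire_id: str) -> Tuple[int, int]:
    """Split a wire id back into (player, c_idx), rejecting bare indices."""
    player, sep, c_idx = wire_id.partition("_")
    if not sep or not player.isdigit() or not c_idx.isdigit():
        raise ValueError(f"not a city wire id: {wire_id!r}")
    return int(player), int(c_idx)
```

A bare tactical index such as "0" has no player part, so a lookup keyed on the wire id has nothing to match it against.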

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 18:48:37 -04:00
Natalie
0d2520a700 feat(@projects/@magic-civilization): add terraforming cascade design and fauna updates
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-06-09 19:51:48 -07:00
Natalie
0763db8e2d feat(game): persist wind_direction for climate fidelity
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-06-09 01:17:04 -07:00
Natalie
00e98329fa feat(@projects/@magic-civilization): update objectives dashboard and climate integration
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-06-09 01:07:07 -07:00
autocommit
55935afbd2 refactor(rl-self-play): ♻️ Optimize ONNX export script for RL self-play model (p1_29f) to improve compatibility and performance
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-06-02 22:59:04 -07:00
autocommit
dbeb3f4088 test(rl-self-play): Add evaluation functions, opponent models, and smoke tests for divergence mining in RL self-play tools
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-27 20:26:00 -07:00
autocommit
2637b79e15 feat(rl-self-play): Add lightweight SmokeModelOpponent class with core act() and train() methods for RL self-play testing
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-27 20:15:34 -07:00
autocommit
236160134c feat(rl-self-play): Implement opponent model loading, execution, and behavior management for reinforcement learning self-play
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-27 20:15:34 -07:00
autocommit
4564074d86 feat(rl-self-play): Add opponent model evaluation support with new training parameters and evaluation metrics in the self-play loop
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-27 20:15:33 -07:00
autocommit
20d842004d feat(rl-self-play): Add methods to load and integrate learned opponent policies into MagicCivEnv for reinforcement learning workflows
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-27 20:15:33 -07:00
autocommit
e2e578cdab feat(rl-self-play): Add learned opponent policy evaluation options to RL self-play evaluation script
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-27 20:15:33 -07:00
autocommit
bb15503079 feat(rl-self-play): Add mine divergence metric for evaluating strategy differences in RL self-play
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-27 20:04:30 -07:00
autocommit
fd64dc5622 test(rl-self-play): Add comprehensive test suite for RL self-play pretraining, diagnostics, encoders, harness client, and expert recording validation
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-26 02:21:15 -07:00
autocommit
e6d90a6a47 feat(rl-self-play): Add encoder logic, training modes, behavior cloning pretraining, diagnostic tools, and expert data handling to the RL self-play pipeline
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-26 02:21:14 -07:00
autocommit
af0cad4873 perf(rl-self-play): Optimize RL self-play environment with faster episode evaluation, optimized state encoding, and reduced training overhead
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-26 02:21:13 -07:00
autocommit
eb8b82700c feat(game-engine): Improve game state management with audio utilities, auto-play logic, and entity handling; add integration tests for game-over and rally scenarios; update smoke testing tool for multi-slot support
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-26 02:21:12 -07:00
autocommit
3f1aeaa602 infra(player-api): 🧱 Update player API infrastructure to enable multi-slot configuration for concurrent player agents
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-26 02:21:12 -07:00
autocommit
34911ad08c perf(rl-self-play): Refactor environment state transitions and agent communication for faster RL self-play execution
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-26 02:21:11 -07:00
autocommit
3241bdacd1 feat(rl-self-play): Introduce turn/step cap tracking in evaluation metrics for improved RL self-play observability
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-26 02:21:11 -07:00
autocommit
e5a2a37d0e feat(rl-self-play): Add stochastic evaluation with masked softmax sampling to replace deterministic argmax in RL self-play training
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-26 02:21:11 -07:00
autocommit
b82e4a8fbd feat(rl-self-play): Introduce no-op penalty and turn advancement bonus in RL environment
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-26 02:21:11 -07:00
Natalie
50e174ab06 feat(@projects/@magic-civilization): add step_cap evaluation category
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-17 05:34:29 -07:00
Natalie
4a862b76fb fix(@projects/@magic-civilization): 🐛 improve pid detection in rl scripts
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-17 05:28:24 -07:00
Natalie
14fbe501ca feat(tooling): add turn tracking and forced end turn logic
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-17 05:16:18 -07:00
Natalie
de5fbd42c4 feat(tooling): add apricot gpu device guidance
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-17 04:02:09 -07:00
Natalie
7cdc8178b7 feat(tooling): add smoke test for protocol layer
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-17 03:59:39 -07:00
Natalie
b7891991a4 feat(@projects/@magic-civilization): add rl_self_play tooling for self-play training
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-17 03:54:40 -07:00