magicciv

Author	SHA1	Message	Date
Natalie	d9fbbdadd3	feat(rl): training preflight gate - prove the Rust score/reward wiring before training. Spins up the real harness (Rust .so -> wire -> Python env), steps ~60 turns, and asserts the LIVE score_estimate and per-step reward are in sane ranges (score < 2M, |reward| < 100). This catches the class of bug the formula unit tests missed: the fixed-point inflation passed every relative/monotonic test but produced score ~30M / reward ~3310 through the wire, wasting a 4-hour run. Demonstrated: pre-fix would FAIL here; post-fix PREFLIGHT OK (max score 277, max |reward| 2.00). Training launches gate on a zero exit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-07-01 06:24:15 -04:00
Natalie	e1f3a66a67	tune(rl): drop SCORE_DELTA_SCALE 1e-3 -> 1e-4 for the unified raw score. score_estimate is now the unbounded unified score (~10-20x the old clamped [0,1000] magnitude); scale the per-turn score-delta reward down to keep it in range with the other reward terms. Empirical retune tracked for when the self-play stable resumes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-30 20:40:48 -04:00
Natalie	9b7b72f248	fix(ai): train.py degrades gracefully without SB3 progress-bar extras (tqdm/rich). Headless fleet images may lack the [extra] deps; fall back to progress_bar=False instead of crashing model.learn(). Surfaced by the clan-conditioning smoke (the clan wiring itself verified: CP_LEARNER_CLAN=blackhammer -> clan_index=2 in the obs). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-30 13:21:12 -04:00
Natalie	57b326b670	feat(ai): clan-conditioned training pipeline (harness + env + reward overlays). The wiring for per-clan trained AI. Each training episode samples a clan, stamps it on the LEARNER slot so the obs one-hots it, and scales the SHAPING rewards by that clan's overlay (terminal win/loss stay universal): - player_api_main.gd: CP_LEARNER_CLAN stamps the learner slot's clan via set_player_personality_json -> PlayerState.clan_id -> PlayerView.clan_index -> obs clan one-hot. (Previously only non-learner slots got a clan.) - reward_overlays.json: per-clan group multipliers (combat/expansion/production/economy/tech) derived from ai_personalities.json strategic_axes, normalized per clan to mean 1.0 (no fairness confound). Archetypes emerge: blackhammer combat 1.5, goldvein economy 1.64, deepforge expansion 0.42. - magic_civ_env.py: samples the clan per episode (seeded), passes CP_LEARNER_CLAN, scales the 8 shaping reward terms by self._ov(group). - harness_client.py: HarnessConfig.learner_clan -> CP_LEARNER_CLAN. - train.py: --clan ('' generalist | 'all' samples every clan | comma list). Local checks: py_compile clean; overlays cover all 6 clans. Next: fleet smoke (clan_index in the learner view + a tiny training run) before scaling out. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-30 13:06:02 -04:00
Natalie	a6fb75a480	feat(ai): v2 richer 96-dim clan-conditioned observation (schema data + 4 new ops). Grows the obs contract 32->96 as a SCHEMA change, not a dual hand-rewrite. obs_schema.json v2 adds the channels the trained AI needs: economy aggregates (yields/territory/golden-age), tech/culture/civics, military (army-health, experience, posture, equipment), per-city blocks, terrain summary + biome histogram, richer diplomacy, and the 6-wide CLAN ONE-HOT (the clan-conditioning input; -1 generalist = all-zero). Both interpreters gain the same new ops: onehot, frac (nested operands), histogram, per_entity, plus reduce/sum_len, count_nonnull, truthy, where_any (OR), bool->1.0 coercion in scalar reads. Multi-slot ops emit a run of consecutive slots. OBS_DIM 32->96, OBS_SCHEMA_VERSION 1->2. The contract gate earned its keep: it caught a real Python<->Rust divergence (a stale isinstance(dict) guard zeroed counts over string lists like city.buildings) then confirmed the fix byte-exact. Verified on the DO fleet: learned_encoder_parity green (Rust v2 == Python v2, 56 fixtures incl clan one-hot variety -1..5, zero drift); mc-player-api 188/188. Next: learned:clan-v1 controller wiring, then training on the richer obs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-30 12:38:50 -04:00
Natalie	5f73ccf950	test(ai): cross-language verification gate for the obs contract. verify-obs-contract.sh + verify_obs_contract.py: the third pillar of the shared contract. Asserts the single schema is honoured byte-for-byte by BOTH interpreters: Python (schema well-formed + obs_contract reproduces the parity fixtures) and Rust (cargo test learned_encoder_parity, which also asserts schema version/obs_dim at load). Exit 0 only if schema + Python + Rust agree; the Rust step runs on the fleet where cargo exists (skipped with instructions on the toolchain-less EDIT host). Completes the schema + versioning + verification contract: one source of truth, two thin interpreters, one gate. Verified: gate green (Python 56/56; Rust proven on fleet). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-30 12:06:13 -04:00
Natalie	b67764ec67	feat(ai): shared obs encoder contract - schema as single source of truth (Python side). Replace the hand-duplicated observation encoder with a schema-driven contract: obs_schema.json declares the layout (version, obs_dim, per-field ops from a fixed vocabulary: scalar/reduce/clamp_div, +onehot/frac/histogram/per_entity for v2), and both Python and Rust interpret it instead of hardcoding the math. Kills the bit-exact-drift risk that made growing 32->96 dims dangerous. This commit lands the Python half + the v1 schema (reproduces the historical 32-dim encoder EXACTLY): obs_contract.py interprets the schema; encoders.py delegates to it (OBS_DIM + field math now come from the schema, not module code). Verified locally: encoders.encode_observation matches all 56 parity fixtures with ZERO drift. Design: .project/designs/obs-contract.md. Next: Rust interpreter (encoder.rs reads the embedded schema), verify-obs-contract gate + version assertions, then bump to v2 (richer 96-dim) as a schema data change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-30 11:49:04 -04:00
Natalie	60c8ce0ef6	fix(simulator): 🐛 AI/suggest production city_id round-trip + restore gdext build Exposed by a new hotseat full-game driver (drives both player seats over the multi-slot wire, no AI dependency) — a 31-turn 2-player game surfaced these. - mc-player-api: the AI→PlayerAction converter (apply_ai_action + the suggest sibling) emitted the bare tactical city index ("0") for QueueProduction, but find_city_indices needs the projector wire id "{player}_{c_idx}" — so every AI/suggested queue_production failed UnknownCity. This silently broke the in-box AI's production-steering, not just the wire. Emit the wire id at all three sites; thread slot into the suggest converter; add a regression test. Result in the playthrough: roundtrip failures 58→1, city_building_completed 0→18. - api-gdext: advance_round_phase/end_player_round_phase did not compile at HEAD — godot-rust 0.2.4 Array::push needs &Dictionary (AsArg); Pcg64 builds via ::seed not ::seed_from_u64; dropped a dead rng binding. The gdext crate could not be rebuilt from source until this. - mc-worldsim: pub use GamePhase/RoundPhase (api-gdext references them through mc_worldsim; they were a private re-export → E0603). - tooling: add hotseat_playthrough.py — applies each seat's suggested actions and flags any offered action that fails to apply, with severity triage. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 18:48:37 -04:00
Natalie	0d2520a700	feat(@projects/@magic-civilization): ✨ add terraforming cascade design and fauna updates Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-06-09 19:51:48 -07:00
Natalie	0763db8e2d	feat(game): ✨ persist wind_direction for climate fidelity Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-06-09 01:17:04 -07:00
Natalie	00e98329fa	feat(@projects/@magic-civilization): ✨ update objectives dashboard and climate integration Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-06-09 01:07:07 -07:00
autocommit	55935afbd2	refactor(rl-self-play): ♻️ Optimize ONNX export script for RL self-play model (p1_29f) to improve compatibility and performance Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-06-02 22:59:04 -07:00
autocommit	dbeb3f4088	test(rl-self-play): ✅ Add evaluation functions, opponent models, and smoke tests for divergence mining in RL self-play tools Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-05-27 20:26:00 -07:00
autocommit	2637b79e15	feat(rl-self-play): ✨ Add lightweight SmokeModelOpponent class with core act() and train() methods for RL self-play testing Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-05-27 20:15:34 -07:00
autocommit	236160134c	feat(rl-self-play): ✨ Implement opponent model loading, execution, and behavior management for reinforcement learning self-play Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-05-27 20:15:34 -07:00
autocommit	4564074d86	feat(rl-self-play): ✨ Add opponent model evaluation support with new training parameters and evaluation metrics in the self-play loop Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-05-27 20:15:33 -07:00
autocommit	20d842004d	feat(rl-self-play): ✨ Add methods to load and integrate learned opponent policies into MagicCivEnv for reinforcement learning workflows Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-05-27 20:15:33 -07:00
autocommit	e2e578cdab	feat(rl-self-play): ✨ Add learned opponent policy evaluation options to RL self-play evaluation script Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-05-27 20:15:33 -07:00
autocommit	bb15503079	feat(rl-self-play): ✨ Add mine divergence metric for evaluating strategy differences in RL self-play Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-05-27 20:04:30 -07:00
autocommit	fd64dc5622	test(rl-self-play): ✅ Add comprehensive test suite for RL self-play pretraining, diagnostics, encoders, harness client, and expert recording validation Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-05-26 02:21:15 -07:00
autocommit	e6d90a6a47	feat(rl-self-play): ✨ Add encoder logic, training modes, behavior cloning pretraining, diagnostic tools, and expert data handling to the RL self-play pipeline Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-05-26 02:21:14 -07:00
autocommit	af0cad4873	perf(rl-self-play): ⚡ Optimize RL self-play environment with faster episode evaluation, optimized state encoding, and reduced training overhead Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-05-26 02:21:13 -07:00
autocommit	eb8b82700c	feat(game-engine): ✨ Improve game state management with audio utilities, auto-play logic, and entity handling; add integration tests for game-over and rally scenarios; update smoke testing tool for multi-slot support Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-05-26 02:21:12 -07:00
autocommit	3f1aeaa602	infra(player-api): 🧱 Update player API infrastructure to enable multi-slot configuration for concurrent player agents Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-05-26 02:21:12 -07:00
autocommit	34911ad08c	perf(rl-self-play): ⚡ Refactor environment state transitions and agent communication for faster RL self-play execution Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-05-26 02:21:11 -07:00
autocommit	3241bdacd1	feat(rl-self-play): ✨ Introduce turn/step cap tracking in evaluation metrics for improved RL self-play observability Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-05-26 02:21:11 -07:00
autocommit	e5a2a37d0e	feat(rl-self-play): ✨ Add stochastic evaluation with masked softmax sampling to replace deterministic argmax in RL self-play training Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-05-26 02:21:11 -07:00
autocommit	b82e4a8fbd	feat(rl-self-play): ✨ Introduce no-op penalty and turn advancement bonus in RL environment Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-05-26 02:21:11 -07:00
Natalie	50e174ab06	feat(@projects/@magic-civilization): ✨ add step_cap evaluation category Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-05-17 05:34:29 -07:00
Natalie	4a862b76fb	fix(@projects/@magic-civilization): 🐛 improve pid detection in rl scripts Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-05-17 05:28:24 -07:00
Natalie	14fbe501ca	feat(tooling): ✨ add turn tracking and forced end turn logic Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-05-17 05:16:18 -07:00
Natalie	de5fbd42c4	feat(tooling): ✨ add apricot gpu device guidance Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-05-17 04:02:09 -07:00
Natalie	7cdc8178b7	feat(tooling): ✨ add smoke test for protocol layer Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-05-17 03:59:39 -07:00
Natalie	b7891991a4	feat(@projects/@magic-civilization): ✨ add rl_self_play tooling for self-play training Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-05-17 03:54:40 -07:00

34 commits