magicciv

Author	SHA1	Message	Date
Natalie	9b7b72f248	fix(ai): train.py degrades gracefully without SB3 progress-bar extras (tqdm/rich) Headless fleet images may lack the [extra] deps; fall back to progress_bar=False instead of crashing model.learn(). Surfaced by the clan-conditioning smoke (the clan wiring itself verified: CP_LEARNER_CLAN=blackhammer -> clan_index=2 in the obs). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-30 13:21:12 -04:00
Natalie	57b326b670	feat(ai): clan-conditioned training pipeline (harness + env + reward overlays) The wiring for per-clan trained AI. Each training episode samples a clan, stamps it on the LEARNER slot so the obs one-hots it, and scales the SHAPING rewards by that clan's overlay (terminal win/loss stay universal): - player_api_main.gd: CP_LEARNER_CLAN stamps the learner slot's clan via set_player_personality_json -> PlayerState.clan_id -> PlayerView.clan_index -> obs clan one-hot. (Previously only non-learner slots got a clan.) - reward_overlays.json: per-clan group multipliers (combat/expansion/production/economy/tech) derived from ai_personalities.json strategic_axes, normalized per clan to mean 1.0 (no fairness confound). Archetypes emerge: blackhammer combat 1.5, goldvein economy 1.64, deepforge expansion 0.42. - magic_civ_env.py: samples the clan per episode (seeded), passes CP_LEARNER_CLAN, scales the 8 shaping reward terms by self._ov(group). - harness_client.py: HarnessConfig.learner_clan -> CP_LEARNER_CLAN. - train.py: --clan ('' generalist | 'all' samples every clan | comma list). Local checks: py_compile clean; overlays cover all 6 clans. Next: fleet smoke (clan_index in the learner view + a tiny training run) before scaling out. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-30 13:06:02 -04:00
autocommit	4564074d86	feat(rl-self-play): ✨ Add opponent model evaluation support with new training parameters and evaluation metrics in the self-play loop Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-05-27 20:15:33 -07:00
autocommit	e6d90a6a47	feat(rl-self-play): ✨ Add encoder logic, training modes, behavior cloning pretraining, diagnostic tools, and expert data handling to the RL self-play pipeline Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-05-26 02:21:14 -07:00
autocommit	af0cad4873	perf(rl-self-play): ⚡ Optimize RL self-play environment with faster episode evaluation, optimized state encoding, and reduced training overhead Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-05-26 02:21:13 -07:00
autocommit	e5a2a37d0e	feat(rl-self-play): ✨ Add stochastic evaluation with masked softmax sampling to replace deterministic argmax in RL self-play training Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-05-26 02:21:11 -07:00
Natalie	de5fbd42c4	feat(tooling): ✨ add apricot gpu device guidance Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-05-17 04:02:09 -07:00
Natalie	b7891991a4	feat(@projects/@magic-civilization): ✨ add rl_self_play tooling for self-play training Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-05-17 03:54:40 -07:00
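
Commit 57b326b670 describes per-clan reward overlays normalized to mean 1.0, seeded per-episode clan sampling, and scaling of shaping rewards only. The following is a minimal sketch of that idea, not the repository's code: the function names, the raw axis values, and the `(group, value)` term layout are all assumptions made for illustration.

```python
import random

# Reward groups named in the commit message.
GROUPS = ("combat", "expansion", "production", "economy", "tech")


def normalize_overlay(raw_axes):
    """Rescale one clan's group weights so they average exactly 1.0."""
    mean = sum(raw_axes[g] for g in GROUPS) / len(GROUPS)
    return {g: raw_axes[g] / mean for g in GROUPS}


def shaped_reward(terms, overlay, terminal):
    """Scale each (group, value) shaping term by the clan's multiplier.

    The terminal win/loss reward is added unscaled, so it stays
    identical across clans.
    """
    return sum(overlay.get(group, 1.0) * value for group, value in terms) + terminal


def sample_clan(clans, seed):
    """Pick a clan for an episode from a dedicated seeded RNG."""
    return random.Random(seed).choice(clans)


# Hypothetical raw axis values, chosen only to make the arithmetic easy.
raw = {"combat": 3.0, "expansion": 1.0, "production": 2.0,
       "economy": 2.0, "tech": 2.0}
overlay = normalize_overlay(raw)  # combat -> 1.5, expansion -> 0.5, rest -> 1.0
```

Because every clan's multipliers average 1.0, the overlay shifts emphasis between groups without changing the total shaping weight a clan receives, which appears to be what the commit means by "no fairness confound".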

8 commits
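
Commit e5a2a37d0e replaces deterministic argmax with masked softmax sampling during evaluation. As a rough illustration of that technique (the actual implementation is not shown in this log, and the function names below are hypothetical), invalid actions receive probability zero and an action is then drawn from the remaining distribution:

```python
import math
import random


def masked_softmax(logits, mask):
    """Softmax restricted to actions whose mask entry is True."""
    valid = [logit for logit, ok in zip(logits, mask) if ok]
    if not valid:
        raise ValueError("at least one action must be valid")
    peak = max(valid)  # subtract the max valid logit for numerical stability
    exps = [math.exp(logit - peak) if ok else 0.0
            for logit, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, mask, rng):
    """Draw one action index in proportion to its masked probability."""
    probs = masked_softmax(logits, mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def argmax_action(logits, mask):
    """Deterministic baseline: the best-scoring valid action."""
    return max((i for i, ok in enumerate(mask) if ok), key=lambda i: logits[i])
```

With logits `[1.0, 5.0, 1.0]` and mask `[True, False, True]`, the masked action gets probability 0 and the two valid actions get 0.5 each, so sampling alternates between them where argmax would always return index 0.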