8 commits

Author SHA1 Message Date
Natalie
9b7b72f248 fix(ai): train.py degrades gracefully without SB3 progress-bar extras (tqdm/rich)
Some checks failed
ci / regression gate (push) Failing after 2m6s
Headless fleet images may lack the [extra] dependencies, so train.py now falls back
to progress_bar=False instead of crashing in model.learn(). Surfaced by the
clan-conditioning smoke test (the clan wiring itself was verified:
CP_LEARNER_CLAN=blackhammer -> clan_index=2 in the observation).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 13:21:12 -04:00
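The fallback described in commit 9b7b72f248 could look roughly like the sketch below. It is not the actual train.py code: `progress_bar_supported` and `learn_with_fallback` are hypothetical helper names, and the only assumption about the model is that it exposes a Stable-Baselines3 style `learn(total_timesteps=..., progress_bar=...)` method that raises ImportError when tqdm or rich is missing.

```python
import importlib.util


def progress_bar_supported() -> bool:
    """Return True only if both optional progress-bar packages are installed."""
    return all(
        importlib.util.find_spec(name) is not None for name in ("tqdm", "rich")
    )


def learn_with_fallback(model, total_timesteps: int):
    """Call model.learn(), dropping the progress bar if its extras are missing.

    `model` is assumed to follow the SB3 learn() signature (hypothetical helper).
    """
    use_bar = progress_bar_supported()
    try:
        return model.learn(total_timesteps=total_timesteps, progress_bar=use_bar)
    except ImportError:
        if not use_bar:
            # The bar was already off, so this is an unrelated import failure.
            raise
        # The extras looked present but failed to import: retry without the bar.
        return model.learn(total_timesteps=total_timesteps, progress_bar=False)
```

Checking with `find_spec` first avoids starting a run that is known to fail, while the `except ImportError` branch covers images where the packages are present but broken.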
Natalie
57b326b670 feat(ai): clan-conditioned training pipeline (harness + env + reward overlays)
Some checks failed
ci / regression gate (push) Waiting to run
deploy-next / deploy dev guide to mc.next.black.lan (push) Failing after 44s
Adds the wiring for per-clan trained AI. Each training episode samples a clan, stamps
it on the LEARNER slot so the observation one-hot encodes it, and scales the SHAPING
rewards by that clan's overlay (terminal win/loss rewards stay universal):

- player_api_main.gd: CP_LEARNER_CLAN stamps the learner slot's clan via
  set_player_personality_json -> PlayerState.clan_id -> PlayerView.clan_index ->
  obs clan one-hot. (Previously only non-learner slots got a clan.)
- reward_overlays.json: per-clan group multipliers (combat/expansion/production/
  economy/tech) derived from ai_personalities.json strategic_axes, normalized per
  clan to mean 1.0 (no fairness confound). Archetypes emerge: blackhammer combat 1.5,
  goldvein economy 1.64, deepforge expansion 0.42.
- magic_civ_env.py: samples the clan per episode (seeded), passes CP_LEARNER_CLAN,
  scales the 8 shaping reward terms by self._ov(group).
- harness_client.py: HarnessConfig.learner_clan -> CP_LEARNER_CLAN.
- train.py: --clan ('' generalist | 'all' samples every clan | comma list).
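As a rough illustration of the --clan handling and the seeded per-episode sampling listed above, a minimal sketch follows. The helper names `parse_clan_arg` and `sample_clan` are hypothetical, and `KNOWN_CLANS` holds only the three clans this commit names; the full roster of six is not listed here.

```python
import random

# Illustrative subset: only the clans named in the commit message.
KNOWN_CLANS = ("blackhammer", "goldvein", "deepforge")


def parse_clan_arg(value: str, all_clans=KNOWN_CLANS) -> list[str]:
    """Turn a --clan value into a sampling pool.

    '' -> [] (generalist), 'all' -> every clan, otherwise a comma list.
    """
    value = value.strip()
    if value == "":
        return []
    if value == "all":
        return list(all_clans)
    clans = [c.strip() for c in value.split(",") if c.strip()]
    unknown = [c for c in clans if c not in all_clans]
    if unknown:
        raise ValueError(f"unknown clan(s): {', '.join(unknown)}")
    return clans


def sample_clan(pool: list[str], rng: random.Random) -> str:
    """Pick one clan for the episode; '' means no clan is stamped (assumption)."""
    return rng.choice(pool) if pool else ""


# Usage: a dedicated, seeded RNG keeps the clan sequence reproducible.
rng = random.Random(1234)
pool = parse_clan_arg("blackhammer, goldvein")
episode_clans = [sample_clan(pool, rng) for _ in range(5)]
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps clan sampling reproducible even if other code draws from the global generator.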

Local checks: py_compile clean; overlays cover all 6 clans. Next: fleet smoke
(clan_index in the learner view + a tiny training run) before scaling out.
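The per-clan normalization to mean 1.0 and the scaling of shaping terms can be sketched as follows. This is an assumption-laden illustration, not the contents of reward_overlays.json or magic_civ_env.py: the raw axis values, the term names, and the helpers `normalize_overlay` and `shaped_reward` are all hypothetical. Only the five group names come from the commit message.

```python
GROUPS = ("combat", "expansion", "production", "economy", "tech")


def normalize_overlay(raw_axes: dict[str, float]) -> dict[str, float]:
    """Rescale one clan's raw group weights so their mean is exactly 1.0."""
    mean = sum(raw_axes[g] for g in GROUPS) / len(GROUPS)
    if mean <= 0:
        raise ValueError("raw axis weights must have a positive mean")
    return {g: raw_axes[g] / mean for g in GROUPS}


def shaped_reward(
    terms: dict[str, float],
    term_group: dict[str, str],
    overlay: dict[str, float],
    terminal: float,
) -> float:
    """Scale each shaping term by its group's multiplier; terminal is unscaled."""
    shaping = sum(
        value * overlay.get(term_group[name], 1.0)  # 1.0 = generalist default
        for name, value in terms.items()
    )
    return shaping + terminal


# Hypothetical raw weights for one clan (not taken from ai_personalities.json).
raw = {"combat": 3.0, "expansion": 1.0, "production": 2.0, "economy": 2.0, "tech": 2.0}
overlay = normalize_overlay(raw)  # combat -> 1.5, expansion -> 0.5, others -> 1.0
reward = shaped_reward(
    {"kill": 2.0, "city": 1.0},
    {"kill": "combat", "city": "expansion"},
    overlay,
    terminal=10.0,
)
```

Because every clan's multipliers average to 1.0, no clan receives more total shaping reward than another for the same behaviour mix, which is presumably what the message means by avoiding a fairness confound.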

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 13:06:02 -04:00
autocommit
4564074d86 feat(rl-self-play): Add opponent model evaluation support with new training parameters and evaluation metrics in the self-play loop
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-27 20:15:33 -07:00
autocommit
e6d90a6a47 feat(rl-self-play): Add encoder logic, training modes, behavior cloning pretraining, diagnostic tools, and expert data handling to the RL self-play pipeline
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-26 02:21:14 -07:00
autocommit
af0cad4873 perf(rl-self-play): Optimize RL self-play environment with faster episode evaluation, optimized state encoding, and reduced training overhead
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-26 02:21:13 -07:00
autocommit
e5a2a37d0e feat(rl-self-play): Add stochastic evaluation with masked softmax sampling to replace deterministic argmax in RL self-play training
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-26 02:21:11 -07:00
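For readers unfamiliar with the technique named in commit e5a2a37d0e, masked softmax sampling can be sketched in pure Python as below. The function names and the list-based logits are assumptions made for illustration; the repository's implementation most likely operates on tensors.

```python
import math
import random


def masked_softmax(logits: list[float], mask: list[bool]) -> list[float]:
    """Softmax over legal actions only; illegal actions get probability 0."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    legal = [l for l, ok in zip(logits, mask) if ok]
    if not legal:
        raise ValueError("at least one action must be legal")
    peak = max(legal)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(logits: list[float], mask: list[bool], rng: random.Random) -> int:
    """Stochastic evaluation: draw an action index from the masked distribution."""
    probs = masked_softmax(logits, mask)
    legal = [i for i, ok in enumerate(mask) if ok]
    return rng.choices(legal, weights=[probs[i] for i in legal], k=1)[0]


def argmax_action(logits: list[float], mask: list[bool]) -> int:
    """Deterministic baseline: the highest-scoring legal action."""
    return max((i for i, ok in enumerate(mask) if ok), key=lambda i: logits[i])


# Usage: action 1 is illegal, so it is never selected by either method.
rng = random.Random(0)
logits, mask = [1.0, 2.0, 3.0], [True, False, True]
samples = [sample_action(logits, mask, rng) for _ in range(200)]
```

Sampling instead of taking the argmax lets evaluation games vary between runs, so win rates reflect the policy's full action distribution rather than a single greedy line of play.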
Natalie
de5fbd42c4 feat(tooling): add apricot gpu device guidance
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-17 04:02:09 -07:00
Natalie
b7891991a4 feat(@projects/@magic-civilization): add rl_self_play tooling for self-play training
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-17 03:54:40 -07:00