docs(ai): richer clan-conditioned observation encoder spec

Pins the new OBS_DIM=96 observation contract (vs current macro-only 32) so the
Python (encoders.py) + Rust (learned/encoder.rs) encoders land bit-exact in one
verified batch. Adds the channels the current encoder discards and the owner wants
(tech/culture/civics, per-city territory/buildings/siege, army health/experience,
terrain summary), plus the 6-wide clan one-hot for the clan-conditioned model
(generalist = all-zero).

Surfaces the key prerequisite: PlayerView exposes no clan, so PlayerState.clan_id
must be projected as clan_index first. Action space unchanged. Per-slot temperature
(difficulty lever) + controller wiring specified. Verification runs on the fleet:
regenerate learned_parity fixtures, run cargo test, and check determinism. A
spatial/CNN obs stays a later v1.1 step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Natalie 2026-06-30 11:03:09 -04:00
parent 618599d22e
commit b6d539eaab


@ -0,0 +1,75 @@
# Richer Observation Encoder — Spec (clan-conditioned)
> Owner decisions (2026-06-30): clan-CONDITIONED single model; **block on a richer
> encoder before training**; land all Rust wiring first. This spec pins the new
> observation contract so `tooling/rl_self_play/encoders.py` and
> `src/simulator/crates/mc-player-api/src/learned/encoder.rs` land bit-exact in one
> verified batch. See [[mc-ai-trained-not-scripted]] and
> `.project/designs/trained-ai-per-clan-difficulty-plan.md`.
## Why (the cap being removed)
The current 32-float obs (`encode_observation`) is macro-only: it discards tech-tree
progress, per-city territory/buildings/siege state, army health/experience, terrain,
and civics — exactly the signals a competent 4X policy needs. This is the documented
competence cap. The action space (`ACTION_DIM`, mask) is **unchanged** — only the
observation grows.
## Prerequisite (Rust, cargo-verifiable, do first)
`PlayerView` exposes **no clan/personality**. Clan-conditioning needs it.
1. Add `pub clan_index: i32` to `PlayerView` (view.rs) — the bound player's clan as a
stable 0..5 index (`-1` = unset/generalist), mapped from `PlayerState.clan_id`
string via a canonical order `[militarist, boom, expansionist, merchant,
tech_rusher, turtle]` (the order in `ai_personalities.json`).
2. Project it in `projection.rs` where `PlayerView` is built. Generalist training/play
passes `-1` → all-zero one-hot.
3. Mirror the field in the Python view dict (it already serializes from the same wire).
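
A minimal Python sketch of the mapping in steps 1 and 3, plus the block-I one-hot it feeds. The names `clan_index` and `clan_one_hot` are hypothetical helpers, not existing code. Folding an unrecognized `clan_id` into the generalist `-1` is an assumption: the spec only defines `-1` for "unset". Raising on unknown ids instead would surface data errors earlier.

```python
from typing import List, Optional

# Canonical clan order from the spec (the order in ai_personalities.json).
CLAN_ORDER = ("militarist", "boom", "expansionist", "merchant", "tech_rusher", "turtle")
GENERALIST = -1  # sentinel for unset / generalist play


def clan_index(clan_id: Optional[str]) -> int:
    """Map PlayerState.clan_id to a stable 0..5 index, or -1 for unset.

    ASSUMPTION: unknown clan strings also map to -1 (generalist).
    """
    if clan_id is None:
        return GENERALIST
    try:
        return CLAN_ORDER.index(clan_id)
    except ValueError:
        return GENERALIST


def clan_one_hot(index: int) -> List[float]:
    """6-wide one-hot for block I; all zeros for the generalist (-1)."""
    vec = [0.0] * len(CLAN_ORDER)
    if 0 <= index < len(CLAN_ORDER):
        vec[index] = 1.0
    return vec
```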
## New observation layout (richer DENSE; a spatial/CNN obs stays a later v1.1 step)
`OBS_DIM = 96`. Every feature is asinh-normalized uniformly (unchanged contract), computed in f64 and cast to f32; Python/Rust parity tolerance is ≤1e-4.
All counts/fractions sourced ONLY from the fog-aware `PlayerView` (no privileged state).
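
A minimal Python sketch of the normalization step. It assumes the raw features are first gathered as f64 values in block order. `normalize_obs` and `to_f32` are hypothetical names, not the existing `encoders.py` API. The f32 rounding goes through `struct`, so it matches a round-to-nearest `as f32` cast without pulling in NumPy.

```python
import math
import struct
from typing import List, Sequence

OBS_DIM = 96


def to_f32(x: float) -> float:
    """Round an f64 to the nearest representable f32 (like Rust's `x as f32`)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def normalize_obs(raw: Sequence[float]) -> List[float]:
    """Uniform asinh squash in f64, then a single cast to f32 per feature."""
    if len(raw) != OBS_DIM:
        raise ValueError(f"expected {OBS_DIM} raw features, got {len(raw)}")
    return [to_f32(math.asinh(float(v))) for v in raw]
```

asinh is close to the identity near zero and grows like ln(2x) for large x. That is why one uniform transform can cover both small fractions and large counts.
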
| Block | Idx | Fields (source) |
|---|---|---|
| A self macro (keep) | 0:8 | gold, gold_per_turn, science_per_turn, score_estimate, city_count, unit_count, happiness_pool, culture_per_turn (resources/score) |
| B economy aggregate | 8:16 | Σcity food, Σprod, Σscience, Σgold, Σculture yields; avg pop; Σowned_tiles (territory); golden_age_active (cities[].yields, cities[].owned_tiles, resources.golden_age_active) |
| C tech/culture/civics | 16:24 | researched_count, tech_progress/tech_cost frac, available_count, tradition researched_count, tradition_progress frac, civics set-count (authority+labor+economy non-null), anarchy_turns, current_tech present (research/culture/civics) |
| D military | 24:34 | total units, warriors, founders, Σhp/Σmax_hp (army health frac), avg experience, #promotion_available, #fortified, #shield_wall‖braced posture, Σequipped, avg movement_left/movement_max (units[]) |
| E per-city (4×6) | 34:58 | per city c in cities[:4]: population, production yield, food_stored/threshold frac, hp/max_hp frac (siege), buildings count, owned_tiles count |
| F terrain summary | 58:72 | #visible tiles, explored fraction (explored/total seen), #river tiles, #improvement tiles, #owned tiles, 9-bin biome histogram over a fixed biome vocab (8 vocab biome counts + 1 "other" bucket) (tiles[]) |
| G diplomacy | 72:80 | #known opponents, #war, #peace, #open_borders, #shared_map, #trade_deals, #pending_envelopes, #ransom_offers (diplomacy/pending_events) |
| H turn | 80:82 | turn number, turn/1000 progress |
| I clan one-hot | 82:88 | 6-wide one-hot from `clan_index` (all-zero if -1 = generalist) |
| — reserve | 88:96 | zero-pad headroom for next additions without a third contract break |

Fixed biome vocab for block F (canonical order, append-only): the biome label ids in
`public/games/age-of-dwarves/data/.../biomes`. Pin the exact list in code; unknown
biomes fold into an "other" bucket so the dimension stays stable.
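
A minimal Python sketch that transcribes the block table and checks it tiles `0:96` with no gaps or overlaps. It also shows the block-F histogram. Block F's 14 slots (58:72) leave 9 bins after its 5 scalar fields, i.e. 8 vocab biomes plus the "other" bucket. The block names and the `biome_0..biome_7` labels are HYPOTHETICAL placeholders, since the real vocab lives under the elided `.../biomes` path.

```python
from typing import Iterable, List, Tuple

OBS_DIM = 96

# Block layout transcribed from the spec table; half-open [start, stop) ranges.
LAYOUT: List[Tuple[str, int, int]] = [
    ("A_self_macro", 0, 8),
    ("B_economy", 8, 16),
    ("C_tech_culture_civics", 16, 24),
    ("D_military", 24, 34),
    ("E_per_city", 34, 58),  # 4 cities x 6 fields
    ("F_terrain", 58, 72),
    ("G_diplomacy", 72, 80),
    ("H_turn", 80, 82),
    ("I_clan_one_hot", 82, 88),
    ("reserve", 88, 96),
]


def check_layout(layout: List[Tuple[str, int, int]], obs_dim: int = OBS_DIM) -> None:
    """Raise if blocks leave gaps, overlap, or do not end exactly at obs_dim."""
    cursor = 0
    for name, start, stop in layout:
        if start != cursor or stop <= start:
            raise ValueError(f"block {name} [{start}:{stop}) breaks contiguity at {cursor}")
        cursor = stop
    if cursor != obs_dim:
        raise ValueError(f"layout ends at {cursor}, expected {obs_dim}")


# HYPOTHETICAL placeholder vocab; pin the real ids from the biomes data in code.
BIOME_VOCAB: Tuple[str, ...] = tuple(f"biome_{i}" for i in range(8))
OTHER_BIN = len(BIOME_VOCAB)  # index 8: catch-all for biomes outside the vocab
_BIOME_INDEX = {label: i for i, label in enumerate(BIOME_VOCAB)}


def biome_histogram(tile_biomes: Iterable[str]) -> List[float]:
    """Count tiles per biome into len(BIOME_VOCAB) + 1 fixed bins."""
    bins = [0.0] * (len(BIOME_VOCAB) + 1)
    for label in tile_biomes:
        bins[_BIOME_INDEX.get(label, OTHER_BIN)] += 1.0
    return bins
```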
## Difficulty lever (Rust, independent, cargo-verifiable)
Per-slot temperature: replace the single global `MC_LEARNED_TEMPERATURE` read
(`dispatch.rs:1137`) with a per-slot value sourced from the difficulty tier on
`players[slot]`. Test: two slots → different sampling streams from the same logits.
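
A minimal sketch of per-slot temperature sampling over masked logits, written in Python for readability. Several pieces are assumptions for illustration: the `TIER_TEMPERATURE` table (its tier names and values), the helper names, and seeding one `random.Random` per slot. The real temperature comes from the difficulty tier on `players[slot]`, and the Rust sampler's RNG is not specified here.

```python
import math
import random
from typing import Dict, List, Sequence

# HYPOTHETICAL tier -> temperature table; real values come from players[slot].
TIER_TEMPERATURE: Dict[str, float] = {"easy": 2.0, "normal": 1.0, "hard": 0.5}


def masked_softmax(logits: Sequence[float], mask: Sequence[bool], temperature: float) -> List[float]:
    """softmax(logits / T) over legal actions; illegal actions get probability 0."""
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    scaled = [x / temperature if legal else -math.inf for x, legal in zip(logits, mask)]
    peak = max(scaled)
    if peak == -math.inf:
        raise ValueError("mask leaves no legal action")
    exps = [math.exp(s - peak) for s in scaled]  # exp(-inf) == 0.0 for masked actions
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits: Sequence[float], mask: Sequence[bool],
                  temperature: float, rng: random.Random) -> int:
    """Draw one action index using this slot's temperature and RNG stream."""
    probs = masked_softmax(logits, mask, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Take legal logits `[1.0, 2.0, 3.0]`. The `hard` slot (T=0.5) puts about 0.87 of the mass on the top action, while the `easy` slot (T=2.0) puts about 0.51 there. That is the kind of divergence the two-slot test looks for.
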
## Controller wiring
Single `learned:clan-v1` id; clan_index + temperature are *inputs*, not separate ids.
`ident()` version string gains the trained-run tag for save/replay fingerprinting.
## Parity + verification (on the fleet — no local cargo)
1. Regenerate `learned_parity.rs` fixtures via the Python capture tool on real views
(now OBS_DIM=96 + clan).
2. `cargo test -p mc-player-api learned_parity` → float-exact obs, bit-exact mask.
3. Determinism: same seed twice → byte-identical action stream.
4. `encoders.py` self-test: encode a captured view, assert shape 96 + asinh applied.
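
Steps 2 and 3 reduce to a few comparisons. This minimal Python sketch assumes the fixture exposes the Rust observation and mask as plain sequences and the action stream as non-negative integer ids. `obs_max_diff`, `obs_parity_ok`, `masks_match`, and `action_stream_digest` are hypothetical helper names. The tolerance defaults to the ≤1e-4 parity bound from the layout section.

```python
import hashlib
import struct
from typing import Sequence


def obs_max_diff(py_obs: Sequence[float], rs_obs: Sequence[float]) -> float:
    """Largest elementwise |py - rs|; lengths must already agree."""
    if len(py_obs) != len(rs_obs):
        raise ValueError(f"obs length mismatch: {len(py_obs)} vs {len(rs_obs)}")
    return max((abs(a - b) for a, b in zip(py_obs, rs_obs)), default=0.0)


def obs_parity_ok(py_obs: Sequence[float], rs_obs: Sequence[float], tol: float = 1e-4) -> bool:
    return obs_max_diff(py_obs, rs_obs) <= tol


def masks_match(py_mask: Sequence[bool], rs_mask: Sequence[bool]) -> bool:
    """Mask parity is exact: same length, same value at every index."""
    return list(py_mask) == list(rs_mask)


def action_stream_digest(actions: Sequence[int]) -> str:
    """SHA-256 over the stream packed as little-endian u32s, for byte-level comparison."""
    packed = struct.pack(f"<{len(actions)}I", *actions)
    return hashlib.sha256(packed).hexdigest()
```

For the determinism check, run the same seed twice and require equal digests.
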
## Landing order (one verified batch)
1. `clan_index` on PlayerView + projection (Rust).
2. Rewrite `encoders.py` to 96-dim.
3. Rewrite `encoder.rs::encode_observation` to match field-for-field.
4. Bump `inference.rs` input shape 32→96.
5. Per-slot temperature.
6. Regen fixtures + `cargo test learned_parity` + determinism, all green on a worker.
7. Commit.