docs(ai): richer clan-conditioned observation encoder spec

Pins the new OBS_DIM=96 observation contract (vs current macro-only 32) so the Python (encoders.py) + Rust (learned/encoder.rs) encoders land bit-exact in one verified batch. Adds the discarded channels the owner wants — tech/culture/civics, per-city territory/buildings/siege, army health/experience, terrain summary — plus the 6-wide clan one-hot for the clan-conditioned model (generalist = all-zero). Surfaces the key prerequisite: PlayerView exposes no clan, so PlayerState.clan_id must be projected as clan_index first. Action space unchanged. Per-slot temperature (difficulty lever) + controller wiring specified. Verification on the fleet: regen learned_parity fixtures + cargo test, determinism. A spatial/CNN obs stays a later v1.1 step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 11:03:09 -04:00 · 2026-06-30 11:03:09 -04:00 · b6d539eaab
commit b6d539eaab
parent 618599d22e
1 changed file with 75 additions and 0 deletions
--- a/.project/designs/richer-encoder-spec.md
+++ b/.project/designs/richer-encoder-spec.md
@@ -0,0 +1,75 @@
+# Richer Observation Encoder — Spec (clan-conditioned)
+
+> Owner decisions (2026-06-30): clan-CONDITIONED single model; **block on a richer
+> encoder before training**; land all Rust wiring first. This spec pins the new
+> observation contract so `tooling/rl_self_play/encoders.py` and
+> `src/simulator/crates/mc-player-api/src/learned/encoder.rs` land bit-exact in one
+> verified batch. See [[mc-ai-trained-not-scripted]] and
+> `.project/designs/trained-ai-per-clan-difficulty-plan.md`.
+
+## Why (the cap being removed)
+
+The current 32-float obs (`encode_observation`) is macro-only: it discards tech-tree
+progress, per-city territory/buildings/siege state, army health/experience, terrain,
+and civics — exactly the signals a competent 4X policy needs. This is the documented
+competence cap. The action space (`ACTION_DIM`, mask) is **unchanged** — only the
+observation grows.
+
+## Prerequisite (Rust, cargo-verifiable, do first)
+
+`PlayerView` exposes **no clan/personality**. Clan-conditioning needs it.
+1. Add `pub clan_index: i32` to `PlayerView` (view.rs) — the bound player's clan as a
+   stable 0..5 index (`-1` = unset/generalist), mapped from `PlayerState.clan_id`
+   string via a canonical order `[militarist, boom, expansionist, merchant,
+   tech_rusher, turtle]` (the order in `ai_personalities.json`).
+2. Project it in `projection.rs` where `PlayerView` is built. Generalist training/play
+   passes `-1` → all-zero one-hot.
+3. Mirror the field in the Python view dict (it already serializes from the same wire).
+
+## New observation layout (richer DENSE; a spatial/CNN obs stays a later v1.1 step)
+
+`OBS_DIM = 96`. asinh-normalized uniformly (unchanged contract), f64→f32, parity ≤1e-4.
+All counts/fractions sourced ONLY from the fog-aware `PlayerView` (no privileged state).
+
+| Block | Idx | Fields (source) |
+|---|---|---|
+| A self macro (keep) | 0:8 | gold, gold_per_turn, science_per_turn, score_estimate, city_count, unit_count, happiness_pool, culture_per_turn (resources/score) |
+| B economy aggregate | 8:16 | Σcity food, Σprod, Σscience, Σgold, Σculture yields; avg pop; Σowned_tiles (territory); golden_age_active (cities[].yields, cities[].owned_tiles, resources.golden_age_active) |
+| C tech/culture/civics | 16:24 | researched_count, tech_progress/tech_cost frac, available_count, tradition researched_count, tradition_progress frac, civics set-count (authority+labor+economy non-null), anarchy_turns, current_tech present (research/culture/civics) |
+| D military | 24:34 | total units, warriors, founders, Σhp/Σmax_hp (army health frac), avg experience, #promotion_available, #fortified, #shield_wall‖braced posture, Σequipped, avg movement_left/movement_max (units[]) |
+| E per-city (4×6) | 34:58 | per city c in cities[:4]: population, production yield, food_stored/threshold frac, hp/max_hp frac (siege), buildings count, owned_tiles count |
+| F terrain summary | 58:72 | #visible tiles, explored fraction (explored/total seen), #river tiles, #improvement tiles, #owned tiles, biome histogram over a fixed biome vocab (top ~8 biomes counts) (tiles[]) |
+| G diplomacy | 72:80 | #known opponents, #war, #peace, #open_borders, #shared_map, #trade_deals, #pending_envelopes, #ransom_offers (diplomacy/pending_events) |
+| H turn | 80:82 | turn number, turn/1000 progress |
+| I clan one-hot | 82:88 | 6-wide one-hot from `clan_index` (all-zero if -1 = generalist) |
+| — reserve | 88:96 | zero-pad headroom for next additions without a third contract break |
+
+Fixed biome vocab for block F (canonical order, append-only): the biome label ids in
+`public/games/age-of-dwarves/data/.../biomes`. Pin the exact list in code; unknown
+biomes fold into an "other" bucket so the dim is stable.
+
+## Difficulty lever (Rust, independent, cargo-verifiable)
+
+Per-slot temperature: replace the single global `MC_LEARNED_TEMPERATURE` read
+(`dispatch.rs:1137`) with a per-slot value sourced from the difficulty tier on
+`players[slot]`. Test: two slots on different tiers → different sampling streams from the same logits.
+
+## Controller wiring
+
+Single `learned:clan-v1` id; clan_index + temperature are *inputs*, not separate ids.
+`ident()` version string gains the trained-run tag for save/replay fingerprinting.
+
+## Parity + verification (on the fleet — no local cargo)
+
+1. Regenerate `learned_parity.rs` fixtures via the Python capture tool on real views
+   (now OBS_DIM=96 + clan).
+2. `cargo test -p mc-player-api learned_parity` → float-exact obs, bit-exact mask.
+3. Determinism: same seed twice → byte-identical action stream.
+4. `encoders.py` self-test: encode a captured view, assert shape 96 + asinh applied.
+
+## Landing order (one verified batch)
+
+1. `clan_index` on PlayerView + projection (Rust).  2. Rewrite `encoders.py` to 96-dim.
+3. Rewrite `encoder.rs::encode_observation` to match field-for-field.  4. Bump
+`inference.rs` input shape 32→96.  5. Per-slot temperature.  6. Regen fixtures +
+`cargo test learned_parity` + determinism, all green on a worker.  7. Commit.
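
The layout table and the `clan_index` prerequisite in the spec above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the contents of `encoders.py`: the names `BLOCKS`, `check_layout`, `clan_index_from_id`, `clan_one_hot`, and `assemble_obs` are hypothetical, the caller is assumed to have already extracted raw per-block values from the `PlayerView`, and applying asinh to the one-hot slots as well is one literal reading of "asinh-normalized uniformly" rather than something the spec states outright.

```python
import math

OBS_DIM = 96

# Canonical clan order quoted in the spec (the order in ai_personalities.json).
CLAN_ORDER = ["militarist", "boom", "expansionist", "merchant", "tech_rusher", "turtle"]

# Half-open [start, end) index ranges copied from the layout table.
# The block names are hypothetical labels chosen for this sketch.
BLOCKS = {
    "A_self_macro": (0, 8),
    "B_economy": (8, 16),
    "C_tech_culture_civics": (16, 24),
    "D_military": (24, 34),
    "E_per_city": (34, 58),
    "F_terrain": (58, 72),
    "G_diplomacy": (72, 80),
    "H_turn": (80, 82),
    "I_clan_one_hot": (82, 88),
    "reserve": (88, 96),
}


def check_layout(blocks=BLOCKS, obs_dim=OBS_DIM):
    """Raise ValueError unless the blocks tile [0, obs_dim) with no gaps or overlaps."""
    cursor = 0
    for name, (start, end) in blocks.items():
        if start != cursor or end <= start:
            raise ValueError(f"block {name!r} breaks contiguity at index {cursor}")
        cursor = end
    if cursor != obs_dim:
        raise ValueError(f"blocks cover {cursor} slots, expected {obs_dim}")


def clan_index_from_id(clan_id):
    """Map a clan_id string to its 0..5 index, or -1 for unset/generalist.

    Assumption: unknown strings also map to -1; the spec only defines the unset case.
    """
    try:
        return CLAN_ORDER.index(clan_id)
    except ValueError:
        return -1


def clan_one_hot(index):
    """6-wide one-hot; all zeros when index is -1 (generalist)."""
    vec = [0.0] * len(CLAN_ORDER)
    if 0 <= index < len(CLAN_ORDER):
        vec[index] = 1.0
    return vec


def assemble_obs(raw_blocks, clan_idx):
    """Place raw block values at their offsets, then asinh every slot.

    raw_blocks maps a block name to its raw values; omitted blocks stay zero.
    """
    check_layout()
    obs = [0.0] * OBS_DIM
    for name, values in raw_blocks.items():
        if name in ("I_clan_one_hot", "reserve"):
            raise ValueError(f"{name} is filled by the encoder, not the caller")
        start, end = BLOCKS[name]
        if len(values) != end - start:
            raise ValueError(f"{name} expects {end - start} values, got {len(values)}")
        obs[start:end] = [float(v) for v in values]
    start, end = BLOCKS["I_clan_one_hot"]
    obs[start:end] = clan_one_hot(clan_idx)
    # asinh(0) == 0, so zero padding and the generalist one-hot stay all-zero.
    return [math.asinh(x) for x in obs]


# Example: only the turn block populated, bound player is an expansionist.
obs = assemble_obs({"H_turn": [40.0, 0.04]}, clan_index_from_id("expansionist"))
```

The sketch stops at a list of Python floats; the f64→f32 cast and the Python/Rust parity check described in the spec are not covered here. Running `check_layout()` at import time would be one cheap way to catch an off-by-one in the table before fixtures are regenerated.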