docs(ai): richer clan-conditioned observation encoder spec

Pins the new OBS_DIM=96 observation contract (vs. the current macro-only 32) so the Python (encoders.py) and Rust (learned/encoder.rs) encoders land bit-exact in one verified batch.

Adds the discarded channels the owner wants (tech/culture/civics, per-city territory/buildings/siege, army health/experience, terrain summary) plus the 6-wide clan one-hot for the clan-conditioned model (generalist = all-zero). Surfaces the key prerequisite: PlayerView exposes no clan, so PlayerState.clan_id must be projected as clan_index first. Action space unchanged. Per-slot temperature (difficulty lever) and controller wiring specified. Verification on the fleet: regen learned_parity fixtures + cargo test, determinism. A spatial/CNN obs stays a later v1.1 step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

parent 618599d22e
commit b6d539eaab

1 changed file with 75 additions and 0 deletions

.project/designs/richer-encoder-spec.md
@@ -0,0 +1,75 @@

# Richer Observation Encoder — Spec (clan-conditioned)

> Owner decisions (2026-06-30): clan-CONDITIONED single model; **block on a richer
> encoder before training**; land all Rust wiring first. This spec pins the new
> observation contract so `tooling/rl_self_play/encoders.py` and
> `src/simulator/crates/mc-player-api/src/learned/encoder.rs` land bit-exact in one
> verified batch. See [[mc-ai-trained-not-scripted]] and
> `.project/designs/trained-ai-per-clan-difficulty-plan.md`.

## Why (the cap being removed)

The current 32-float obs (`encode_observation`) is macro-only: it discards tech-tree
progress, per-city territory/buildings/siege state, army health/experience, terrain,
and civics, which are exactly the signals a competent 4X policy needs. This is the
documented competence cap. The action space (`ACTION_DIM`, mask) is **unchanged**;
only the observation grows.

## Prerequisite (Rust, cargo-verifiable, do first)

`PlayerView` exposes **no clan/personality**. Clan-conditioning needs it.

1. Add `pub clan_index: i32` to `PlayerView` (view.rs): the bound player's clan as a
   stable 0..5 index (`-1` = unset/generalist), mapped from the `PlayerState.clan_id`
   string via the canonical order
   `[militarist, boom, expansionist, merchant, tech_rusher, turtle]`
   (the order in `ai_personalities.json`).
2. Project it in `projection.rs` where `PlayerView` is built. Generalist training/play
   passes `-1` → all-zero one-hot.
3. Mirror the field in the Python view dict (the dict is already built from the same
   wire serialization).

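As a rough illustration of steps 1 and 2, the mapping from `clan_id` to `clan_index` and on to the 6-wide one-hot might look like the Python sketch below. The clan order is the one quoted above. The function names `clan_index` and `clan_one_hot` are hypothetical, and treating an unrecognised `clan_id` as generalist (`-1`) is an assumption this spec does not make.

```python
from typing import List, Optional

# Canonical clan order quoted from the spec (the order in ai_personalities.json).
CLAN_ORDER = ["militarist", "boom", "expansionist", "merchant", "tech_rusher", "turtle"]


def clan_index(clan_id: Optional[str]) -> int:
    """Map a PlayerState.clan_id string to a stable 0..5 index, or -1 if unset."""
    if clan_id is None:
        return -1
    try:
        return CLAN_ORDER.index(clan_id)
    except ValueError:
        # Assumption: an unrecognised clan id falls back to the generalist value.
        return -1


def clan_one_hot(index: int) -> List[float]:
    """Return the 6-wide one-hot; all zeros when index is -1 (generalist)."""
    vec = [0.0] * len(CLAN_ORDER)
    if 0 <= index < len(CLAN_ORDER):
        vec[index] = 1.0
    return vec
```

Because the order is a fixed list rather than something derived from map iteration, the Rust side can mirror it as a constant array and both encoders stay index-compatible.
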
## New observation layout (richer DENSE; a spatial/CNN obs stays a later v1.1 step)

`OBS_DIM = 96`. Every slot is asinh-normalized uniformly (unchanged contract) and cast
f64→f32; Python/Rust parity tolerance is ≤1e-4. All counts and fractions are sourced
ONLY from the fog-aware `PlayerView` (no privileged state). `Idx` ranges in the table
are half-open (`start:end`).

| Block | Idx | Fields (source) |
|---|---|---|
| A self macro (keep) | 0:8 | gold, gold_per_turn, science_per_turn, score_estimate, city_count, unit_count, happiness_pool, culture_per_turn (resources/score) |
| B economy aggregate | 8:16 | Σcity food, Σprod, Σscience, Σgold, Σculture yields; avg pop; Σowned_tiles (territory); golden_age_active (cities[].yields, cities[].owned_tiles, resources.golden_age_active) |
| C tech/culture/civics | 16:24 | researched_count, tech_progress/tech_cost frac, available_count, tradition researched_count, tradition_progress frac, civics set-count (authority+labor+economy non-null), anarchy_turns, current_tech present (research/culture/civics) |
| D military | 24:34 | total units, warriors, founders, Σhp/Σmax_hp (army health frac), avg experience, #promotion_available, #fortified, #units in shield_wall or braced posture, Σequipped, avg movement_left/movement_max (units[]) |
| E per-city (4×6) | 34:58 | per city c in cities[:4]: population, production yield, food_stored/threshold frac, hp/max_hp frac (siege), buildings count, owned_tiles count |
| F terrain summary | 58:72 | #visible tiles, explored fraction (explored/total seen), #river tiles, #improvement tiles, #owned tiles, biome histogram over a fixed biome vocab (the remaining 9 slots: counts for the top ~8 biomes plus "other") (tiles[]) |
| G diplomacy | 72:80 | #known opponents, #war, #peace, #open_borders, #shared_map, #trade_deals, #pending_envelopes, #ransom_offers (diplomacy/pending_events) |
| H turn | 80:82 | turn number, turn/1000 progress |
| I clan one-hot | 82:88 | 6-wide one-hot from `clan_index` (all-zero if -1 = generalist) |
| reserve | 88:96 | zero-pad headroom for next additions without a third contract break |

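
One way to keep the two encoders honest about these offsets is to pin the layout as data and check it at import time. The sketch below copies the index ranges from the table. The block names, `check_layout`, and `pack_cities` are hypothetical, and zero-padding the slots of missing cities in block E is an assumption (the spec only says `cities[:4]`).

```python
OBS_DIM = 96

# (block name, start, end) as half-open [start, end) ranges, copied from the table.
BLOCKS = [
    ("A_self_macro", 0, 8),
    ("B_economy", 8, 16),
    ("C_tech_culture_civics", 16, 24),
    ("D_military", 24, 34),
    ("E_per_city", 34, 58),
    ("F_terrain", 58, 72),
    ("G_diplomacy", 72, 80),
    ("H_turn", 80, 82),
    ("I_clan_one_hot", 82, 88),
    ("reserve", 88, 96),
]


def check_layout(blocks, obs_dim):
    """Raise unless the blocks are non-empty, contiguous, and cover [0, obs_dim)."""
    cursor = 0
    for name, start, end in blocks:
        if start != cursor or end <= start:
            raise ValueError(f"block {name} breaks the layout at index {cursor}")
        cursor = end
    if cursor != obs_dim:
        raise ValueError(f"layout covers {cursor} slots, expected {obs_dim}")
    return {name: slice(start, end) for name, start, end in blocks}


SLICES = check_layout(BLOCKS, OBS_DIM)

MAX_CITIES, CITY_FIELDS = 4, 6


def pack_cities(city_rows):
    """Flatten up to 4 six-field city rows into the 24 slots of block E."""
    flat = []
    for row in city_rows[:MAX_CITIES]:
        if len(row) != CITY_FIELDS:
            raise ValueError("each city row needs exactly 6 fields")
        flat.extend(float(x) for x in row)
    # Assumption: players with fewer than 4 cities get zero-filled slots.
    flat.extend([0.0] * (MAX_CITIES * CITY_FIELDS - len(flat)))
    return flat
```

A matching constant table on the Rust side would let a single parity test compare offsets before comparing values.
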

Fixed biome vocab for block F (canonical order, append-only): the biome label ids in
`public/games/age-of-dwarves/data/.../biomes`. Pin the exact list in code; unknown
biomes fold into an "other" bucket so the dim is stable.

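A minimal sketch of that histogram, assuming an 8-entry vocabulary plus one "other" bucket (9 bins, which is what the 14-wide block F leaves after its five scalar fields). The biome names below are placeholders only: the real ids live in the game data and are not listed in this spec.

```python
# Placeholder vocabulary; the real biome ids must be pinned from the game data.
BIOME_VOCAB = ["plains", "forest", "hills", "mountain",
               "desert", "tundra", "swamp", "cavern"]
OTHER_BUCKET = len(BIOME_VOCAB)  # index of the catch-all bin

_BIOME_TO_BIN = {name: i for i, name in enumerate(BIOME_VOCAB)}


def biome_histogram(tile_biomes):
    """Count visible tiles per vocab biome; unknown biomes go to the last bin."""
    hist = [0.0] * (len(BIOME_VOCAB) + 1)
    for biome in tile_biomes:
        hist[_BIOME_TO_BIN.get(biome, OTHER_BUCKET)] += 1.0
    return hist
```

Appending new biomes to the end of the list would change the histogram width, so in practice a new entry has to take a slot from the reserve block rather than grow block F in place.
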
## Difficulty lever (Rust, independent, cargo-verifiable)

Per-slot temperature: replace the single global `MC_LEARNED_TEMPERATURE` read
(`dispatch.rs:1137`) with a per-slot value sourced from the difficulty tier on
`players[slot]`. Test: two slots with different temperatures must produce different
sampling streams from the same logits.

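The lever itself is ordinary temperature-scaled softmax sampling. The Python sketch below shows the idea only. The tier names, the temperature values, and the `difficulty` key on a player record are all hypothetical, since the spec names the source (`players[slot]`) but not the field or the numbers.

```python
import math
import random

# Hypothetical tier -> temperature table; the spec does not define these values.
TIER_TEMPERATURE = {"easy": 1.5, "normal": 1.0, "hard": 0.5}


def temperature_for_slot(players, slot):
    """Look up the slot's temperature from its difficulty tier (assumed field name)."""
    return TIER_TEMPERATURE[players[slot]["difficulty"]]


def softmax_with_temperature(logits, temperature):
    """Numerically stable softmax of logits / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_actions(logits, temperature, seed, n):
    """Draw n action indices from a seeded RNG so the stream is reproducible."""
    rng = random.Random(seed)
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=n)
```

A lower temperature concentrates probability on the highest logit, so a "hard" slot plays closer to the greedy policy while an "easy" slot explores weaker actions more often. Seeding per call also gives the reproducibility that the determinism check relies on.
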
## Controller wiring

A single `learned:clan-v1` controller id is used; `clan_index` and temperature are
*inputs*, not separate ids. The `ident()` version string gains the trained-run tag for
save/replay fingerprinting.

## Parity + verification (on the fleet; no local cargo)

1. Regenerate `learned_parity.rs` fixtures via the Python capture tool on real views
   (now OBS_DIM=96 + clan).
2. `cargo test -p mc-player-api learned_parity` → float-exact obs (within the ≤1e-4
   parity tolerance), bit-exact mask.
3. Determinism: same seed twice → byte-identical action stream.
4. `encoders.py` self-test: encode a captured view, assert shape 96 + asinh applied.

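The self-test in step 4 could be as small as the sketch below. It uses only the standard library and a hand-made raw vector in place of a captured view. `to_f32`, `normalize`, and `within_parity` are hypothetical helper names, and the real `encoders.py` may well use NumPy instead.

```python
import math
import struct

OBS_DIM = 96
PARITY_TOL = 1e-4  # tolerance quoted in the layout section


def to_f32(x):
    """Round a Python float (f64) to the nearest f32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def normalize(raw):
    """Apply asinh to every slot, then the f64 -> f32 cast."""
    if len(raw) != OBS_DIM:
        raise ValueError(f"expected {OBS_DIM} raw values, got {len(raw)}")
    return [to_f32(math.asinh(v)) for v in raw]


def within_parity(a, b, tol=PARITY_TOL):
    """True if two observations have equal length and differ by at most tol per slot."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))


def self_test():
    raw = [0.0] * OBS_DIM
    raw[0] = 1000.0  # e.g. a large gold value in block A
    obs = normalize(raw)
    assert len(obs) == OBS_DIM
    # asinh compresses large magnitudes and leaves zero at zero.
    assert abs(obs[0] - math.asinh(1000.0)) <= 1e-6
    assert obs[1] == 0.0
    return obs
```

`within_parity` is the comparison a fixture check would run between the Python output and the values the Rust encoder produces for the same view.
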
## Landing order (one verified batch)

1. `clan_index` on PlayerView + projection (Rust).
2. Rewrite `encoders.py` to 96-dim.
3. Rewrite `encoder.rs::encode_observation` to match field-for-field.
4. Bump `inference.rs` input shape 32→96.
5. Per-slot temperature.
6. Regen fixtures + `cargo test learned_parity` + determinism, all green on a worker.
7. Commit.