autocommit f88e9b072e feat(sprites): ✨ OSS standin coverage p2-23..27 (536 PNGs) + xi-v11 charter

- 536 game-icons.net CC-BY-3.0 standins fill every renderer slot (units/buildings/wonders/city-tiers), id-keyed flat layout
- LICENSES.md (536 ledgered rows, SHA256), STANDINS.md, sprite-license-audit passes
- build_standins.py rewritten data-driven off manifest + icon_rules.json (replaces mapping.json)
- juggernaut-xi-v11 added to approved model list (charter + 2 instruction modules), operator decision
- objectives p2-23..27 + p2-22: partial (standin coverage; final art deferred)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-04 04:40:14 -07:00


Sprite Generation Pipeline

Post-reset refresh — 2026-06-03. The slate was cleared on 2026-04-17 (7 pre-existing sprites deleted for quality-bar failure; prompt library + ranker rebuilt). The numbers below are read directly from the live code, not from the pre-reset doc.

Overview

A single long-running process (cli.py run) submits sprites to the model-boss queue, collects results via Redis pubsub, scores each variant through a multi-stage tiered pipeline, and re-queues sprites that need more good variants. The human's job is reviewing passing variants in the Theater GUI and approving winners.

  game JSON ──scan──▶ spritegen.db (sprites table — what to generate)
                            │
                     ┌──────┴──────┐
                     │ ORCHESTRATOR │  cli.py run
                     │   (daemon)   │
                     └──────┬──────┘
                            │
               ┌────────────┼────────────┐
               ▼            ▼            ▼
          ┌─────────┐ ┌──────────┐ ┌──────────┐
          │GENERATE │ │   RANK   │ │  STATUS  │
          │model-boss│ │ 4-stage  │ │  CHECK   │
          │ queue +  │ │ tiered   │ │          │
          │  Redis   │ │ scoring  │ │ pass?    │
          └────┬────┘ └────┬─────┘ └────┬─────┘
               │           │            │
               ▼           ▼            ▼
          raw/*.png   spritegen.db   status=review (≥3 good)
                      variant.notes  status=needed (retry)
                      variant.rating max 15 attempts → review anyway
                            │
                     ┌──────┴──────┐
                     │ THEATER GUI │  localhost:5850/?spriteTheater=true
                     │ human picks │
                     │ approve/skip│
                     └──────┬──────┘
                            │
                     ┌──────┴──────┐
                     │   INSTALL   │  cli.py approve <variant_id>
                     │ rembg bg cut│
                     │ resize/cat  │
                     │ → game dir  │
                     │ + LICENSES  │
                     └─────────────┘

Infrastructure dependencies

The pipeline is infra-gated: all of the code is present, but a full run loop needs the following services to be reachable.

Dependency	Used by	How it is reached
model-boss coordinator	generation + local VLM scoring (qwen3 tier)	`model_boss.InferenceClient` (submit / wait_for_result)
Redis	result delivery (pubsub)	via model-boss `InferenceClient`
GPU (diffusion)	image generation (`juggernaut-xi-v11`)	model-boss pool slot
Claude API	haiku / sonnet / opus scoring tiers	`claude-code-batch-sdk` `ClaudeClient`

InferenceClient is imported at module top in engine/generator.py and inside Scorer in engine/ranker.py. If model-boss or Redis is down, submit_batch and the qwen3 scoring tier fail; the GUI (server.py) and the scan / status commands work without any of the dependencies above.

Data Model (spritegen.db)

sprites                         One row per sprite to generate
  id          TEXT PK           "units/spearmen_dwarves_m"
  category    TEXT              "units"
  entity_id   TEXT              "spearmen_dwarves_m"
  status      TEXT              needed → review → approved → installed (also: skip, rejected)
  prompt      TEXT              Scan-time prompt (recomposed fresh from YAML at submit)
  negative_prompt TEXT          Scan-time negative (recomposed fresh from YAML at submit)
  install_path TEXT             Game asset destination path
  gen_width/height              Generation resolution (1024×1024)
  target_width/height           Final sprite size (category-dependent — see below)
        │
        │ 1:N
        ▼
variants                        One row per generated image
  id            INTEGER PK
  sprite_id     TEXT FK → sprites
  seed          INTEGER         Reproducible seed (70/30 proven/random split)
  job_status    TEXT            submitted → completed | failed
  job_id        TEXT            model-boss request id (survives restarts)
  raw_path      TEXT            raw/{sprite_id}_{variant_id}.png
  processed_path TEXT           variants/{...}.png (after bg removal)
  is_approved   INTEGER         0 or 1
  rating        INTEGER         1-5 (derived from confidence), -1 = rejected
  review_tier   INTEGER         How many scoring stages this variant has passed
  notes         TEXT            JSON result {"gates":{...},"quality":{...},"confidence":...}
  ── immutable generation record ──
  model         TEXT            "juggernaut-xi-v11"
  prompt_used   TEXT            Exact prompt sent to model
  negative_used TEXT            Exact negative sent
  guidance_scale REAL           7.0 (config default; adaptive 6.5–9.0 once hints exist)
  steps         INTEGER         28

sprite_dimensions               Quality/race/gender permutations
generation_runs                 Batch tracking (total_jobs / completed / failed)
seed_pool                       Accumulated high-scoring seeds (drives 70/30 reuse)
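
As a rough illustration of the two core tables, the sketch below rebuilds a subset of the columns listed above in an in-memory SQLite database. The DDL is an assumption reconstructed from this column list (the constraints, defaults, and the `in_flight_sprites` helper are hypothetical), not the schema the pipeline actually creates.

```python
import sqlite3

# Hypothetical DDL reconstructed from the column list above; the real
# spritegen.db schema may use different constraints and defaults.
SCHEMA = """
CREATE TABLE sprites (
    id              TEXT PRIMARY KEY,          -- e.g. units/spearmen_dwarves_m
    category        TEXT NOT NULL,
    entity_id       TEXT NOT NULL,
    status          TEXT NOT NULL DEFAULT 'needed',
    prompt          TEXT,
    negative_prompt TEXT,
    install_path    TEXT
);
CREATE TABLE variants (
    id          INTEGER PRIMARY KEY,
    sprite_id   TEXT NOT NULL REFERENCES sprites(id),
    seed        INTEGER,
    job_status  TEXT,                          -- submitted | completed | failed
    job_id      TEXT,
    raw_path    TEXT,
    is_approved INTEGER NOT NULL DEFAULT 0,
    rating      INTEGER,
    review_tier INTEGER NOT NULL DEFAULT 0,
    notes       TEXT                           -- JSON scoring result
);
"""


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketched schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def in_flight_sprites(conn: sqlite3.Connection) -> list[str]:
    """Return ids of sprites that still have a submitted (uncollected) variant."""
    rows = conn.execute(
        "SELECT DISTINCT sprite_id FROM variants "
        "WHERE job_status = 'submitted' ORDER BY sprite_id"
    ).fetchall()
    return [row[0] for row in rows]
```

The 1:N relationship is carried by `variants.sprite_id`; a query like `in_flight_sprites` is the kind of lookup the orchestrator would need in order to skip sprites that already have work queued.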

Orchestrator Loop (`cli.py run`)

python3 cli.py run --category units --variants 1

MAX_REGEN_ATTEMPTS = 15 (default; --max-attempts overrides). Each loop iteration (cmd_run in cli.py):

1. Submit: query up to 100 needed sprites in the category, skip any with in-flight (job_status='submitted') variants, and submit the rest at priority="high", variants_per each. Each sprite's regen counter increments; at the cap the sprite is forced to review.
2. Collect: gen.collect_pending() awaits all submitted variants via Redis pubsub, saves each image to raw/, and calls ranker.advance_sprite() per completed variant.
3. Evaluate: for every sprite now in review, run ranker.rank_and_filter(). If the result reports needs_regen and attempts remain, the sprite is set back to needed; otherwise the best variant is reported.
4. Status + sleep: print the funnel (needed | queued | review | done / total), sleep 2 s, repeat. When needed == 0 and queued == 0, idle (30 s sleep) until Ctrl+C.

The GUI server is started in a daemon thread on --port (default 5850) at loop start.
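
The status decision made at the end of each iteration can be sketched as a small pure function. This is a simplified illustration, assuming the "≥3 good" target from the overview diagram; `next_status` and its parameters are hypothetical names, not functions in cli.py.

```python
MAX_REGEN_ATTEMPTS = 15  # default from the text; --max-attempts overrides it
TARGET_GOOD = 3          # the ">=3 good" target shown in the overview diagram


def next_status(good_variants: int, attempts: int,
                max_attempts: int = MAX_REGEN_ATTEMPTS,
                target: int = TARGET_GOOD) -> str:
    """Hypothetical helper: where a sprite goes after the Evaluate step."""
    if good_variants >= target:
        return "review"   # enough passing variants for a human to pick from
    if attempts >= max_attempts:
        return "review"   # regen cap reached: forced to review anyway
    return "needed"       # re-queue for another generation round
```

For example, `next_status(1, 4)` gives `"needed"`, while `next_status(0, 15)` gives `"review"` because the attempt cap has been hit.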

Scoring — tiered gate+quality pipeline

Scoring is two-tier per stage and multi-stage overall.

Per-variant: boolean gates → quality ranges (`engine/ranker.py`)

Each Scorer.score() evaluates a variant in two passes (or one combined pass when single_pass: true):

- Gates (binary pass/fail). ANY false gate ⇒ instant reject, confidence = 0.0.
- Quality (0–100 per dimension, scored only if all gates pass). confidence = mean(quality) / 100.
- Quality floor: QUALITY_DIM_FLOOR = 45. Any single quality dim below 45 ⇒ reject.
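
The gate, quality, and floor rules above can be sketched as follows. This is a minimal illustration, not the code in engine/ranker.py: `score_variant` and its return shape are hypothetical, and keeping the computed confidence on a floor rejection is an assumption.

```python
from statistics import mean

QUALITY_DIM_FLOOR = 45  # any single quality dimension below this rejects


def score_variant(gates: dict[str, bool], quality: dict[str, float]) -> dict:
    """Hypothetical sketch of the two-tier gate -> quality decision."""
    failed = [name for name, ok in gates.items() if not ok]
    if failed:
        # Any false gate is an instant reject with zero confidence.
        return {"passed": False, "confidence": 0.0, "reason": f"gate:{failed[0]}"}

    confidence = mean(quality.values()) / 100
    low = [name for name, value in quality.items() if value < QUALITY_DIM_FLOOR]
    if low:
        # Assumption: the confidence is still reported on a floor rejection.
        return {"passed": False, "confidence": confidence, "reason": f"floor:{low[0]}"}
    return {"passed": True, "confidence": confidence, "reason": None}
```

Note that a variant with quality scores of 90 and 40 averages 0.65 but is still rejected, because one dimension falls below the floor.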

Unit category (UNIT_GATES, 15 gates): facing_southwest, single_character, no_text_watermark, no_base_or_ground, full_body_visible, correct_subject_type, is_fantasy_dressed, dwarf_proportions, not_photorealistic, no_anime_style, no_pixel_art, no_multiple_poses, no_chroma_bleed, correct_camera_elevation, clean_background.

Unit quality dims (UNIT_QUALITY_DIMS, 5, count toward confidence): direction_quality, art_style, equipment_detail, background_cleanliness, shadow_acceptability.

Unit display-only dims (UNIT_DISPLAY_DIMS, NOT in confidence — rear-view hides them): race_accuracy, gender_accuracy.

Other categories define their own gate/quality sets (TERRAIN_GATES, BUILDING_GATES, RESOURCE_GATES, SPELL_GATES and matching quality tuples).

Gate and quality descriptions are contextualized per entity at score time (_contextualize_descriptions) — e.g. gender_accuracy for a female dwarf becomes a "no beard" check, dwarf_proportions is auto-passed for non-dwarf races.

Per-sprite: multi-stage escalation (`engine/prompts/scoring_pipeline.yaml`)

Variants escalate through stages; a variant only advances if it passes the current stage. target_approved: 3 — the pipeline stops escalating once 3 variants clear ALL stages, and only the deficit is sent onward.

Stage	Backend	Model	Threshold	Mode	Tiebreaker
qwen3	model-boss	`qwen3-vl-8b-instruct`	0.40	two-pass	±0.12
haiku	claude	`haiku`	0.50	two-pass	±0.08
sonnet	claude	`sonnet`	0.58	two-pass	±0.08
opus	claude	`opus`	0.65	single-pass	—

Tiebreaker: when a variant's confidence lands within ±range of the stage threshold, quality is re-scored once and the two passes are averaged (_merge_quality).
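
A compact way to picture the escalation and tiebreaker is the sketch below. It simplifies in two labelled ways: it averages the two confidences rather than merging per-dimension quality as _merge_quality is described to do, and it assumes a variant passes when its confidence is at or above the threshold. `stages_passed` and `score_fn` are hypothetical names.

```python
# (stage name, threshold, tiebreaker half-width or None), from the table above
STAGES = [
    ("qwen3", 0.40, 0.12),
    ("haiku", 0.50, 0.08),
    ("sonnet", 0.58, 0.08),
    ("opus", 0.65, None),
]


def needs_tiebreak(confidence: float, threshold: float, band) -> bool:
    """True when the confidence lands within +/- band of the threshold."""
    return band is not None and abs(confidence - threshold) <= band


def stages_passed(score_fn) -> int:
    """Count how many stages a variant clears.

    score_fn(stage_name) returns a confidence in [0, 1]; it is called a
    second time for the same stage when a tiebreak is triggered.
    """
    passed = 0
    for name, threshold, band in STAGES:
        confidence = score_fn(name)
        if needs_tiebreak(confidence, threshold, band):
            # Simplification: average confidences instead of merging quality dims.
            confidence = (confidence + score_fn(name)) / 2
        if confidence < threshold:
            break  # a variant only advances if it passes the current stage
        passed += 1
    return passed
```

With a scorer that always returns 0.6, the variant clears qwen3, haiku, and sonnet (the last after a tiebreak) and stops at opus, which would correspond to a review_tier of 3 under these assumptions.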

Confidence thresholds (`engine/ranker.py`)

CONFIDENCE_THRESHOLD = 0.70 — default base threshold for rank_and_filter display and for any stage that omits its own threshold in YAML.
MIN_GOOD_VARIANTS = 3 — fallback target_approved when YAML omits it.
CATEGORY_THRESHOLDS — per-category relaxations applied in advance_sprite / rank_and_filter: resources: 0.55, improvements: 0.55, ui: 0.55.
Concurrency: model-boss: 4, claude: 8 requests in flight per backend.
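
The category relaxation reads naturally as a lookup with a default. The sketch below assumes that categories without an entry fall back to CONFIDENCE_THRESHOLD; `threshold_for` is a hypothetical helper, and how this value combines with the per-stage YAML thresholds is not shown here.

```python
CONFIDENCE_THRESHOLD = 0.70
CATEGORY_THRESHOLDS = {"resources": 0.55, "improvements": 0.55, "ui": 0.55}


def threshold_for(category: str) -> float:
    """Hypothetical helper: relaxed threshold if listed, base threshold otherwise."""
    return CATEGORY_THRESHOLDS.get(category, CONFIDENCE_THRESHOLD)
```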

Prompt Library (`engine/prompts/`)

All prompt content lives in YAML data files — prompts/__init__.py is pure composition logic (no hardcoded prompt strings except the unit Layer-1 type-lock anchor and the BIOME_* biome-grid lookup tables, both documented inline).

YAML files: combat_types.yaml, composition.yaml, genders.yaml, keywords.yaml, negatives.yaml, quality_tiers.yaml, races.yaml, styles.yaml, unit_classes.yaml, scoring_pipeline.yaml.

Unit prompt — 9-layer SDXL weight-ordered composition (`compose_prompt`)

Token order matters: SDXL weights the first ~40 tokens ~4× more heavily, so the type lock and direction anchor come FIRST and the weapon sits at ~token 15.

Layer	Content	Source
1	Type lock + direction anchor + overhead view (hardcoded anchor)	`__init__.py` Layer 1
2	Unit-class weapon / mount / equipment / armor / stance	`unit_classes.yaml`
3	Gender cues (race-specific override if present)	`genders.yaml`
4	Race body + features + armor aesthetic	`races.yaml`
5	Combat-type composition (direction, camera, style)	`composition.yaml`
6	Keyword ability flavors	`keywords.yaml`
7	Style tail (`painted fantasy game art, clean readable silhouette`)	hardcoded tail
8	Quality-tier equipment detail	`quality_tiers.yaml`

The Layer-1 anchor is: game sprite, single character, simple background, character walking AWAY toward bottom-left, BACK turned to camera, STEEP top-down overhead view, top of head visible from above.
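
The ordering idea can be illustrated with a short sketch that joins the layer fragments in table order, anchor first. This is not the real compose_prompt: the function name, its parameters, and the assumption that layers are joined with commas are hypothetical, and any fragment values passed in are placeholders rather than real YAML content.

```python
LAYER_1_ANCHOR = (
    "game sprite, single character, simple background, "
    "character walking AWAY toward bottom-left, BACK turned to camera, "
    "STEEP top-down overhead view, top of head visible from above"
)
STYLE_TAIL = "painted fantasy game art, clean readable silhouette"


def compose_prompt_sketch(unit_class: str, gender: str, race: str,
                          composition: str, keywords: list[str],
                          quality_tier: str) -> str:
    """Join layer fragments in weight order; empty layers are dropped."""
    layers = [
        LAYER_1_ANCHOR,        # layer 1: type lock + direction anchor
        unit_class,            # layer 2: weapon / mount / equipment
        gender,                # layer 3
        race,                  # layer 4
        composition,           # layer 5
        ", ".join(keywords),   # layer 6: keyword ability flavors
        STYLE_TAIL,            # layer 7: hardcoded style tail
        quality_tier,          # layer 8
    ]
    return ", ".join(part.strip() for part in layers if part and part.strip())
```

Because the anchor is always emitted first and the unit-class fragment second, the weapon description lands early in the token stream, which is the property the ordering is meant to guarantee.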

Negatives — rule-based (`get_negative`)

negatives.yaml holds rules with when: property matchers. _sprite_properties derives properties (background, is_tileable, is_layered, has_character, has_mount, facing, race) from category + combat type; every matching rule's negate tokens are concatenated. Adding a category/combat-type only requires editing _sprite_properties.
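
A minimal sketch of the rule matching is shown below. The rule contents are invented placeholders in the shape the text describes (a `when` matcher plus `negate` tokens), and the de-duplication step is an addition for readability, not something the source confirms.

```python
# Hypothetical rules in the shape negatives.yaml is described as having.
RULES = [
    {"when": {"has_character": True}, "negate": ["multiple characters", "extra limbs"]},
    {"when": {"background": "simple"}, "negate": ["busy background"]},
    {"when": {"is_tileable": True}, "negate": ["visible seams"]},
]


def matches(when: dict, props: dict) -> bool:
    """A rule matches when every property in its `when` block equals the sprite's."""
    return all(props.get(key) == value for key, value in when.items())


def get_negative_sketch(props: dict, rules: list = RULES) -> str:
    """Concatenate the negate tokens of every matching rule, in rule order."""
    tokens: list[str] = []
    for rule in rules:
        if matches(rule["when"], props):
            for token in rule["negate"]:
                if token not in tokens:  # assumption: drop duplicate tokens
                    tokens.append(token)
    return ", ".join(tokens)
```

Under this shape, supporting a new category only means deriving the right property dict for it; the rules themselves stay untouched.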

Background strategy

No chroma-key color in prompts. simple background is emitted; the actual background is removed in post by rembg (U2Net neural segmentation) during install. (Older docs describing a green chroma key are pre-reset — the no_chroma_bleed gate still guards against green/yellow contamination bleeding onto the subject.)

Generation parameters (`sprite-config.json` + `engine/generator.py`)

{
  "model": "juggernaut-xi-v11",
  "api_base": "http://localhost:8210",
  "defaults": { "steps": 28, "guidance_scale": 7.0, "width": 1024, "height": 1024 }
}

steps: 28, guidance_scale: 7.0 (config defaults). Guidance is adaptive: once an (entity, category) pair has ≥10 passing samples, best_guidance is used, clamped to 6.5–9.0.
Seeds: 70/30 proven/random split (_select_seeds). Proven seeds (avg_quality ≥ 65) come from seed_pool; ±1..3 neighbors of the best proven seed are also explored.
Variant modifiers: cycled from styles.yaml::variant_modifiers; biased 60% toward historically-passing modifier indices once ≥20 samples exist.
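
The adaptive guidance clamp and the 70/30 seed split can be sketched as below. Both functions are hypothetical stand-ins (the real logic lives in engine/generator.py and _select_seeds), and the neighbor exploration around the best proven seed is deliberately left out.

```python
import random


def pick_guidance(best_guidance, passing_samples: int, default: float = 7.0,
                  min_samples: int = 10, lo: float = 6.5, hi: float = 9.0) -> float:
    """Use the config default until enough passing samples exist, then clamp."""
    if best_guidance is None or passing_samples < min_samples:
        return default
    return max(lo, min(hi, best_guidance))


def select_seeds(n: int, proven: list[int], rng: random.Random,
                 proven_share: float = 0.7) -> list[int]:
    """Pick about 70% of seeds from the proven pool and fill the rest randomly."""
    n_proven = min(len(proven), round(n * proven_share))
    seeds = rng.sample(proven, n_proven)
    while len(seeds) < n:
        candidate = rng.randrange(2**32)
        if candidate not in seeds:  # keep seeds unique within one batch
            seeds.append(candidate)
    return seeds
```

When the proven pool is empty (as it is right after a reset), `select_seeds` degrades to purely random seeds, and `pick_guidance` keeps returning the 7.0 default until ten passing samples have accumulated.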

Model approval caveat (unresolved). juggernaut-xi-v11 is the configured model and is present + cached in the model-boss registry, but it is NOT on the asset-sprite approved list (juggernaut-xl-v9, epicrealism-xl, illustrious-xl-v2 — per dot-claude/ instructions/dataloader-sprites.md, safety-rules-local.md, team-leads/asset-sprite.md, and objectives/p2-28). The installer writes model-commercial:juggernaut-xi-v11 into LICENSES.md on every approved install, so this attribution is load-bearing for commercial-rights compliance. Resolving this requires the asset-sprite charter owner / user to either (a) add XI-v11 to the approved list in CLAUDE.md, or (b) switch sprite-config.json back to an approved model. Until then, do not ship sprites generated by XI-v11.

Resolution by category (`engine/prompts/__init__.py`)

Category	Generation (`get_generation_size`)	Final (`get_target_size`)
units	1024×1024	256×256
terrain / biome_grid / edges	1024×1024	384×332
buildings	1024×1024	128×128
spells	1024×1024	128×128
resources / improvements	1024×1024	64×64
ui	512×512	64×64
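
The table above maps directly onto two lookups. The sketch below uses hypothetical function names (suffixed `_sketch` to distinguish them from the real get_generation_size / get_target_size) and assumes that any category without an explicit generation size uses 1024×1024.

```python
DEFAULT_GENERATION_SIZE = (1024, 1024)   # assumption: default for unlisted categories
GENERATION_SIZE = {"ui": (512, 512)}

TARGET_SIZE = {
    "units": (256, 256),
    "terrain": (384, 332),
    "biome_grid": (384, 332),
    "edges": (384, 332),
    "buildings": (128, 128),
    "spells": (128, 128),
    "resources": (64, 64),
    "improvements": (64, 64),
    "ui": (64, 64),
}


def generation_size_sketch(category: str) -> tuple[int, int]:
    """Resolution the diffusion model renders at."""
    return GENERATION_SIZE.get(category, DEFAULT_GENERATION_SIZE)


def target_size_sketch(category: str) -> tuple[int, int]:
    """Final installed sprite size; raises KeyError for unknown categories."""
    return TARGET_SIZE[category]


def downscale_factor(category: str) -> tuple[float, float]:
    """Per-axis ratio between generation and target size."""
    gen_w, gen_h = generation_size_sketch(category)
    out_w, out_h = target_size_sketch(category)
    return gen_w / out_w, gen_h / out_h
```

The per-axis factors differ for terrain (1024×1024 down to 384×332), so that category cannot be produced by a uniform resize alone.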

Theater GUI (`server.py` + `gui/`)

FastAPI app (create_app) exposes the review REST API (/api/sprites, /api/theater, /api/stats, /api/progress, /api/pipeline, image serving under /images/..., SSE at /api/stream/variants). The React SPA in gui/ builds to gui/dist/; when that build is present the server mounts it and serves the Theater at /?spriteTheater=true. Without a build the server runs API-only (root returns 404 — expected).

GUI build currently broken (2026-06-03). pnpm build in gui/ fails with TS2307: Cannot find module 'react' because pnpm install does not materialize gui/node_modules — the @lilith/ui-animated workspace dependency hits the known pnpm workspace:* resolution bug (needs a .pnpmfile.cjs + Verdaccio registry; see the project memory note). The Python server boot-smokes green, but the Theater front-end cannot render until the GUI install/build is fixed.

Boot:

python3 cli.py start --port 5850          # GUI server only (cmd_review)
python3 -m uvicorn server:app --port 5850 # equivalent direct invocation

CLI Commands

# Full pipeline (submit → collect → rank → regen loop + GUI server)
python3 cli.py run --category units --variants 1

# GUI server only
python3 cli.py start --port 5850

# Scan game data → populate sprite registry (use --demo for minimal data)
python3 cli.py scan --sprite-type units
python3 cli.py --demo scan

# Status funnel
python3 cli.py status

# Manual stage operations
python3 cli.py generate --sprite units/spearmen_dwarves_m --variants 8   # submit only
python3 cli.py listen                                                    # collect results
python3 cli.py rank --sprite units/spearmen_dwarves_m                    # score one sprite
python3 cli.py approve 129                                               # approve → install + ledger
python3 cli.py reset --sprite units/spearmen_dwarves_m                   # back to needed

# Prompt iteration
python3 cli.py test-prompt --entity spearmen --race dwarves --gender male --seeds 42 123 777
python3 cli.py refresh-prompts --category units --clear-scores

# Monitoring / export
python3 cli.py monitor
python3 cli.py export --format json

File Flow

game JSON data (public/games/age-of-dwarves/data/ — or demo-data/ with --demo)
    │
    ▼ scan
spritegen.db ──── sprites table (what to generate)
    │
    ▼ generate (model-boss queue → juggernaut-xi-v11, results via Redis pubsub)
raw/{sprite_id}_{variant_id}.png ──── 1024×1024
    │
    ▼ rank (tiered: qwen3-VL → haiku → sonnet → opus, gates + quality)
spritegen.db ──── variants.notes (gates+quality JSON), variants.rating, review_tier
    │
    ▼ human approve (Theater GUI or cli.py approve <variant_id>)
spritegen.db ──── variant.is_approved = 1
    │
    ▼ process (rembg background removal + resize/composite)
variants/{sprite_id}_{variant_id}.png ──── category target size, transparent
    │
    ▼ install (+ LICENSES.md row via installer._append_ledger_row)
public/games/age-of-dwarves/assets/sprites/<category>/{name}.png
