magicciv/tools/sprite-generation/docs/PIPELINE.md
autocommit f88e9b072e feat(sprites): OSS standin coverage p2-23..27 (536 PNGs) + xi-v11 charter
- 536 game-icons.net CC-BY-3.0 standins fill every renderer slot (units/buildings/wonders/city-tiers), id-keyed flat layout
- LICENSES.md (536 ledgered rows, SHA256), STANDINS.md, sprite-license-audit passes
- build_standins.py rewritten data-driven off manifest + icon_rules.json (replaces mapping.json)
- juggernaut-xi-v11 added to approved model list (charter + 2 instruction modules), operator decision
- objectives p2-23..27 + p2-22: partial (standin coverage; final art deferred)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 04:40:14 -07:00

Sprite Generation Pipeline

Post-reset refresh — 2026-06-03. The slate was cleared on 2026-04-17 (7 pre-existing sprites deleted for quality-bar failure; prompt library + ranker rebuilt). The numbers below are read directly from the live code, not from the pre-reset doc.

Overview

A single long-running process (cli.py run) submits sprites to the model-boss queue, collects results via Redis pubsub, scores each variant through a multi-stage tiered pipeline, and re-queues sprites that need more good variants. The human's job is reviewing passing variants in the Theater GUI and approving winners.

  game JSON ──scan──▶ spritegen.db (sprites table — what to generate)
                            │
                     ┌──────┴──────┐
                     │ ORCHESTRATOR │  cli.py run
                     │   (daemon)   │
                     └──────┬──────┘
                            │
               ┌────────────┼────────────┐
               ▼            ▼            ▼
          ┌─────────┐ ┌──────────┐ ┌──────────┐
          │GENERATE │ │   RANK   │ │  STATUS  │
          │model-boss│ │ 4-stage  │ │  CHECK   │
          │ queue +  │ │ tiered   │ │          │
          │  Redis   │ │ scoring  │ │ pass?    │
          └────┬────┘ └────┬─────┘ └────┬─────┘
               │           │            │
               ▼           ▼            ▼
          raw/*.png   spritegen.db   status=review (≥3 good)
                      variant.notes  status=needed (retry)
                      variant.rating max 15 attempts → review anyway
                            │
                     ┌──────┴──────┐
                     │ THEATER GUI │  localhost:5850/?spriteTheater=true
                     │ human picks │
                     │ approve/skip│
                     └──────┬──────┘
                            │
                     ┌──────┴──────┐
                     │   INSTALL   │  cli.py approve <variant_id>
                     │ rembg bg cut│
                     │ resize/cat  │
                     │ → game dir  │
                     │ + LICENSES  │
                     └─────────────┘

Infrastructure dependencies

The pipeline is infra-gated — code is all present, but a full run loop needs:

Dependency               Used by                                        How it is reached
model-boss coordinator   generation + local VLM scoring (qwen3 tier)    model_boss.InferenceClient (submit / wait_for_result)
Redis                    result delivery (pubsub)                       via model-boss InferenceClient
GPU (diffusion)          image generation (juggernaut-xi-v11)           model-boss pool slot
Claude API               haiku / sonnet / opus scoring tiers            claude-code-batch-sdk ClaudeClient

InferenceClient is imported at module top in engine/generator.py and inside Scorer in engine/ranker.py. If model-boss / Redis are down, submit_batch and the qwen3 scoring tier fail; the GUI (server.py) and scan / status work without any of the above.

Data Model (spritegen.db)

sprites                         One row per sprite to generate
  id          TEXT PK           "units/spearmen_dwarves_m"
  category    TEXT              "units"
  entity_id   TEXT              "spearmen_dwarves_m"
  status      TEXT              needed → review → approved → installed (also: skip, rejected)
  prompt      TEXT              Scan-time prompt (recomposed fresh from YAML at submit)
  negative_prompt TEXT          Scan-time negative (recomposed fresh from YAML at submit)
  install_path TEXT             Game asset destination path
  gen_width/height              Generation resolution (1024×1024; 512×512 for ui)
  target_width/height           Final sprite size (category-dependent — see below)
        │
        │ 1:N
        ▼
variants                        One row per generated image
  id            INTEGER PK
  sprite_id     TEXT FK → sprites
  seed          INTEGER         Reproducible seed (70/30 proven/random split)
  job_status    TEXT            submitted → completed | failed
  job_id        TEXT            model-boss request id (survives restarts)
  raw_path      TEXT            raw/{sprite_id}_{variant_id}.png
  processed_path TEXT           variants/{...}.png (after bg removal)
  is_approved   INTEGER         0 or 1
  rating        INTEGER         1-5 (derived from confidence), -1 = rejected
  review_tier   INTEGER         How many scoring stages this variant has passed
  notes         TEXT            JSON result {"gates":{...},"quality":{...},"confidence":...}
  ── immutable generation record ──
  model         TEXT            "juggernaut-xi-v11"
  prompt_used   TEXT            Exact prompt sent to model
  negative_used TEXT            Exact negative sent
  guidance_scale REAL           7.0 (config default; adaptive 6.5–9.0 once hints exist)
  steps         INTEGER         28

sprite_dimensions               Quality/race/gender permutations
generation_runs                 Batch tracking (total_jobs / completed / failed)
seed_pool                       Accumulated high-scoring seeds (drives 70/30 reuse)
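For orientation, the two core tables can be approximated in SQLite as follows. This is a hypothetical reconstruction from the column list above, not the repo's DDL. The constraints, the defaults, and the helper name count_in_flight are assumptions. The helper mirrors the orchestrator's rule of skipping sprites that still have job_status='submitted' variants.

```python
import sqlite3

# Hypothetical reconstruction of spritegen.db's core tables from the documented
# columns; the real schema may add constraints, indexes, and the other tables.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sprites (
    id              TEXT PRIMARY KEY,          -- e.g. 'units/spearmen_dwarves_m'
    category        TEXT NOT NULL,
    entity_id       TEXT NOT NULL,
    status          TEXT NOT NULL DEFAULT 'needed',
    prompt          TEXT,
    negative_prompt TEXT,
    install_path    TEXT,
    gen_width       INTEGER,
    gen_height      INTEGER,
    target_width    INTEGER,
    target_height   INTEGER
);
CREATE TABLE IF NOT EXISTS variants (
    id             INTEGER PRIMARY KEY,
    sprite_id      TEXT NOT NULL REFERENCES sprites(id),
    seed           INTEGER,
    job_status     TEXT NOT NULL DEFAULT 'submitted',
    job_id         TEXT,
    raw_path       TEXT,
    processed_path TEXT,
    is_approved    INTEGER NOT NULL DEFAULT 0,
    rating         INTEGER,
    review_tier    INTEGER NOT NULL DEFAULT 0,
    notes          TEXT,
    -- immutable generation record
    model          TEXT,
    prompt_used    TEXT,
    negative_used  TEXT,
    guidance_scale REAL,
    steps          INTEGER
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a database with the sketched schema."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce variants.sprite_id -> sprites.id
    conn.executescript(SCHEMA)
    return conn


def count_in_flight(conn: sqlite3.Connection, sprite_id: str) -> int:
    """Number of variants for a sprite still awaiting a model-boss result."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM variants WHERE sprite_id = ? AND job_status = 'submitted'",
        (sprite_id,),
    ).fetchone()
    return n
```

A new variant row defaults to job_status='submitted', so it counts as in flight until collection flips it to completed or failed.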

Orchestrator Loop (cli.py run)

python3 cli.py run --category units --variants 1

MAX_REGEN_ATTEMPTS = 15 (default; --max-attempts overrides). Each loop iteration (cmd_run in cli.py):

  1. Submit: query up to 100 needed sprites in the category, skip any that still have in-flight (job_status='submitted') variants, and submit the rest at priority="high" with variants_per variants each. Each sprite's regen counter increments; at the cap, the sprite is forced to review.
  2. Collect: gen.collect_pending() awaits all submitted variants via Redis pubsub, saves each image to raw/, and calls ranker.advance_sprite() per completed variant.
  3. Evaluate: for every sprite now in review, run ranker.rank_and_filter(). If it needs_regen and attempts remain, set it back to needed; otherwise report the best variant.
  4. Status + sleep: print the funnel (needed | queued | review | done / total), sleep 2 s, and repeat. When needed == 0 and queued == 0, idle (30 s sleep) until Ctrl+C.
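The bookkeeping in steps 1 and 3 can be sketched with a plain dataclass. This is an illustration, not cmd_run itself. SpriteState, submit_step and evaluate_step are hypothetical names, and the needs_regen callable stands in for ranker.rank_and_filter(). The sketch also assumes that a sprite whose counter has already reached the cap is moved to review rather than resubmitted.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

MAX_REGEN_ATTEMPTS = 15  # documented default; --max-attempts overrides it


@dataclass
class SpriteState:
    sprite_id: str
    status: str = "needed"
    attempts: int = 0   # regen counter
    in_flight: int = 0  # variants with job_status='submitted'


def submit_step(sprites: list[SpriteState], variants_per: int = 1,
                limit: int = 100, max_attempts: int = MAX_REGEN_ATTEMPTS) -> list[str]:
    """Step 1: submit up to `limit` needed sprites that have nothing in flight."""
    submitted = []
    for s in [s for s in sprites if s.status == "needed"][:limit]:
        if s.in_flight:
            continue  # earlier jobs still pending; don't double-submit
        if s.attempts >= max_attempts:
            s.status = "review"  # cap reached: hand to the human anyway
            continue
        s.attempts += 1
        s.in_flight += variants_per
        submitted.append(s.sprite_id)
    return submitted


def evaluate_step(sprites: list[SpriteState],
                  needs_regen: Callable[[SpriteState], bool],
                  max_attempts: int = MAX_REGEN_ATTEMPTS) -> None:
    """Step 3: send sprites in review back to 'needed' while attempts remain."""
    for s in sprites:
        if s.status == "review" and needs_regen(s) and s.attempts < max_attempts:
            s.status = "needed"
```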

The GUI server is started in a daemon thread on --port (default 5850) at loop start.

Scoring — tiered gate+quality pipeline

Scoring is two-tier per stage and multi-stage overall.

Per-variant: boolean gates → quality ranges (engine/ranker.py)

Each Scorer.score() evaluates a variant in two passes (or one combined pass when single_pass: true):

  • Gates (binary pass/fail). ANY false gate ⇒ instant reject, confidence = 0.0.
  • Quality (0–100 per dimension, scored only if all gates pass). confidence = mean(quality) / 100.
  • Quality floor: QUALITY_DIM_FLOOR = 45 — any single quality dim below 45 ⇒ reject.

Unit category (UNIT_GATES, 15 gates): facing_southwest, single_character, no_text_watermark, no_base_or_ground, full_body_visible, correct_subject_type, is_fantasy_dressed, dwarf_proportions, not_photorealistic, no_anime_style, no_pixel_art, no_multiple_poses, no_chroma_bleed, correct_camera_elevation, clean_background.

Unit quality dims (UNIT_QUALITY_DIMS, 5, count toward confidence): direction_quality, art_style, equipment_detail, background_cleanliness, shadow_acceptability.

Unit display-only dims (UNIT_DISPLAY_DIMS, NOT in confidence — rear-view hides them): race_accuracy, gender_accuracy.
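The gate-then-quality rule reduces to a small pure function, shown here with the unit quality dims. This is a sketch, not Scorer.score(). The gate and quality dicts stand in for the VLM/Claude answers. Two details are assumptions: a floor violation also yields confidence 0.0, and the floor is checked only on dims that count toward confidence, so display-only dims such as race_accuracy cannot trigger it.

```python
from __future__ import annotations

from dataclasses import dataclass, field

QUALITY_DIM_FLOOR = 45
UNIT_QUALITY_DIMS = ("direction_quality", "art_style", "equipment_detail",
                     "background_cleanliness", "shadow_acceptability")


@dataclass
class Verdict:
    gates: dict[str, bool]
    quality: dict[str, int] = field(default_factory=dict)
    confidence: float = 0.0
    rejected: bool = False


def combine(gates: dict[str, bool], quality: dict[str, int] | None,
            counted_dims: tuple[str, ...] = UNIT_QUALITY_DIMS) -> Verdict:
    # Pass 1: any failed gate is an instant reject; quality is never scored.
    if not all(gates.values()):
        return Verdict(gates, rejected=True)
    quality = quality or {}
    counted = [quality[d] for d in counted_dims if d in quality]
    if not counted:
        return Verdict(gates, quality, rejected=True)  # nothing scorable (assumption)
    # Floor: one weak counted dimension rejects the variant.
    if min(counted) < QUALITY_DIM_FLOOR:
        return Verdict(gates, quality, rejected=True)
    # Pass 2: confidence = mean of counted quality dims, rescaled to 0..1.
    return Verdict(gates, quality, confidence=sum(counted) / len(counted) / 100)
```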

Other categories define their own gate/quality sets (TERRAIN_GATES, BUILDING_GATES, RESOURCE_GATES, SPELL_GATES and matching quality tuples).

Gate and quality descriptions are contextualized per entity at score time (_contextualize_descriptions). For example, gender_accuracy for a female dwarf becomes a "no beard" check, and dwarf_proportions is auto-passed for non-dwarf races.
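Below is a hypothetical sketch of this per-entity rewriting. This doc does not show the wording or signature of the real _contextualize_descriptions. The replacement text, the race/gender values ("dwarves", "female") and the (descriptions, auto-pass set) return shape are all assumptions.

```python
from __future__ import annotations


def contextualize(descriptions: dict[str, str], race: str,
                  gender: str) -> tuple[dict[str, str], set[str]]:
    """Return (rewritten descriptions, gate names to auto-pass)."""
    out = dict(descriptions)  # never mutate the shared base descriptions
    auto_pass: set[str] = set()
    if race == "dwarves" and gender == "female" and "gender_accuracy" in out:
        # Turn a vague gender check into a concrete visual test.
        out["gender_accuracy"] = "Female dwarf: the face must have NO beard."
    if race != "dwarves" and "dwarf_proportions" in out:
        # The gate is meaningless for other races, so skip asking the scorer.
        auto_pass.add("dwarf_proportions")
        del out["dwarf_proportions"]
    return out, auto_pass
```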

Per-sprite: multi-stage escalation (engine/prompts/scoring_pipeline.yaml)

Variants escalate through stages; a variant advances only if it passes the current stage. With target_approved: 3, the pipeline stops escalating once 3 variants clear ALL stages, and only the deficit (the number of variants still needed to reach target_approved) is sent onward.

Stage   Backend     Model                  Threshold  Mode         Tiebreaker
qwen3   model-boss  qwen3-vl-8b-instruct   0.40       two-pass     ±0.12
haiku   claude      haiku                  0.50       two-pass     ±0.08
sonnet  claude      sonnet                 0.58       two-pass     ±0.08
opus    claude      opus                   0.65       single-pass
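Here is one possible reading of the escalation rule, using the stage thresholds from the table. Pre-computed per-stage confidences stand in for real scorer calls. Two points are assumptions, not documented behavior. First, "only the deficit is sent onward" is read as forwarding at most target_approved minus already-approved variants out of each stage. Second, survivors are forwarded best-first.

```python
from __future__ import annotations

# (stage, threshold) in escalation order, as tabulated for scoring_pipeline.yaml.
STAGES = [("qwen3", 0.40), ("haiku", 0.50), ("sonnet", 0.58), ("opus", 0.65)]
TARGET_APPROVED = 3


def escalate(scores: dict[int, dict[str, float]], already_approved: int = 0,
             stages: list[tuple[str, float]] = STAGES,
             target: int = TARGET_APPROVED) -> list[int]:
    """Return the variant ids that clear every stage, capped at the deficit.

    scores[variant_id][stage] is a pre-computed confidence standing in for a
    real scorer call at that stage.
    """
    deficit = target - already_approved
    if deficit <= 0:
        return []  # target met: stop escalating entirely
    pool = list(scores)
    for stage, threshold in stages:
        passed = [v for v in pool if scores[v][stage] >= threshold]
        # Forward at most `deficit` survivors, best-first (assumed ordering).
        pool = sorted(passed, key=lambda v: scores[v][stage], reverse=True)[:deficit]
    return pool
```

Under this reading, the expensive later stages (sonnet, opus) never see more candidates than are still needed.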

Tiebreaker: when a variant's confidence lands within ±range of the stage threshold, quality is re-scored once and the two passes are averaged (_merge_quality).
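The tiebreak can be sketched as follows. merge_quality plays the role of _merge_quality. The following are assumptions: treating "within ±range" as inclusive, averaging dimension by dimension, and the score_quality callable that performs one quality pass.

```python
from __future__ import annotations

from typing import Callable

# Tiebreak margins per stage from the table; opus (single-pass) has none listed.
TIEBREAK = {"qwen3": 0.12, "haiku": 0.08, "sonnet": 0.08, "opus": None}


def merge_quality(a: dict[str, float], b: dict[str, float]) -> dict[str, float]:
    """Average two quality passes dimension by dimension (hypothetical _merge_quality)."""
    return {k: (a[k] + b[k]) / 2 for k in a.keys() & b.keys()}


def confidence(quality: dict[str, float]) -> float:
    return sum(quality.values()) / len(quality) / 100


def stage_confidence(score_quality: Callable[[], dict[str, float]],
                     threshold: float, margin: float | None) -> float:
    first = score_quality()
    conf = confidence(first)
    if margin is not None and abs(conf - threshold) <= margin:
        # Near the line: re-score once and average the two passes.
        conf = confidence(merge_quality(first, score_quality()))
    return conf
```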

Confidence thresholds (engine/ranker.py)

  • CONFIDENCE_THRESHOLD = 0.70 — default base threshold for rank_and_filter display and for any stage that omits its own threshold in YAML.
  • MIN_GOOD_VARIANTS = 3 — fallback target_approved when YAML omits it.
  • CATEGORY_THRESHOLDS — per-category relaxations applied in advance_sprite / rank_and_filter: resources: 0.55, improvements: 0.55, ui: 0.55.
  • Concurrency: at most 4 requests in flight to model-boss and 8 to claude (per-backend limits).
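Here is a sketch of the threshold lookup and the per-backend concurrency caps, using asyncio.Semaphore. The precedence shown (category relaxation, else the 0.70 base) covers only the rank_and_filter case. This doc does not say how category relaxations combine with per-stage YAML thresholds. score_all and base_threshold are hypothetical names.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")

CONFIDENCE_THRESHOLD = 0.70
CATEGORY_THRESHOLDS = {"resources": 0.55, "improvements": 0.55, "ui": 0.55}
CONCURRENCY = {"model-boss": 4, "claude": 8}


def base_threshold(category: str) -> float:
    """Category relaxation if one exists, else the global default."""
    return CATEGORY_THRESHOLDS.get(category, CONFIDENCE_THRESHOLD)


async def score_all(backend: str, items: Iterable[T],
                    score_one: Callable[[T], Awaitable[float]]) -> list[float]:
    """Run score_one over items with at most CONCURRENCY[backend] calls in flight."""
    sem = asyncio.Semaphore(CONCURRENCY[backend])

    async def guarded(item: T) -> float:
        async with sem:
            return await score_one(item)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(guarded(i) for i in items))
```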

Prompt Library (engine/prompts/)

All prompt content lives in YAML data files; prompts/__init__.py is pure composition logic (no hardcoded prompt strings except the unit Layer-1 type-lock anchor and the BIOME_* biome-grid lookup tables, both documented inline).

YAML files: combat_types.yaml, composition.yaml, genders.yaml, keywords.yaml, negatives.yaml, quality_tiers.yaml, races.yaml, styles.yaml, unit_classes.yaml, scoring_pipeline.yaml.

Unit prompt — 9-layer SDXL weight-ordered composition (compose_prompt)

Token order matters: SDXL weights the first ~40 tokens ~4× more heavily, so the type lock and direction anchor come FIRST and the weapon sits at ~token 15.

Layer  Content                                                            Source
1      Type lock + direction anchor + overhead view (hardcoded anchor)    __init__.py Layer 1
2      Unit-class weapon / mount / equipment / armor / stance             unit_classes.yaml
3      Gender cues (race-specific override if present)                    genders.yaml
4      Race body + features + armor aesthetic                             races.yaml
5      Combat-type composition (direction, camera, style)                 composition.yaml
6      Keyword ability flavors                                            keywords.yaml
7      Style tail (painted fantasy game art, clean readable silhouette)   hardcoded tail
8      Quality-tier equipment detail                                      quality_tiers.yaml

The Layer-1 anchor is: game sprite, single character, simple background, character walking AWAY toward bottom-left, BACK turned to camera, STEEP top-down overhead view, top of head visible from above.
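Below is a minimal sketch of the layer ordering, with plain strings standing in for the YAML lookups. compose_layers is a hypothetical name, not compose_prompt, and the sample layer text in the test is invented. Only the Layer-1 anchor and the style tail are quoted from this doc.

```python
from __future__ import annotations

# Quoted from the doc: the hardcoded Layer-1 anchor and the Layer-7 style tail.
LAYER1_ANCHOR = ("game sprite, single character, simple background, character walking "
                 "AWAY toward bottom-left, BACK turned to camera, STEEP top-down "
                 "overhead view, top of head visible from above")
STYLE_TAIL = "painted fantasy game art, clean readable silhouette"


def compose_layers(unit_class: str, gender: str, race: str, composition: str,
                   keywords: list[str], quality_tier: str) -> str:
    """Join layers 1-8 in weight order; empty layers are dropped."""
    layers = [
        LAYER1_ANCHOR,        # 1 type lock + direction + overhead view
        unit_class,           # 2 weapon / mount / equipment / armor / stance
        gender,               # 3 gender cues
        race,                 # 4 race body + features + armor aesthetic
        composition,          # 5 combat-type composition
        ", ".join(keywords),  # 6 keyword ability flavors
        STYLE_TAIL,           # 7 style tail
        quality_tier,         # 8 quality-tier equipment detail
    ]
    return ", ".join(layer for layer in layers if layer)
```

Dropping empty layers keeps the prompt free of ", ," runs that would otherwise waste early-token positions.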

Negatives — rule-based (get_negative)

negatives.yaml holds rules with when: property matchers. _sprite_properties derives properties (background, is_tileable, is_layered, has_character, has_mount, facing, race) from category + combat type; every matching rule's negate tokens are concatenated. Adding a category/combat-type only requires editing _sprite_properties.
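Here is a sketch of the rule matcher under assumed shapes. The rule list, the "cavalry" combat type, the negate tokens, and the reduced property set (four of the seven documented properties, including treating terrain/biome_grid/edges as tileable) are illustrative, not the contents of negatives.yaml. get_negative_sketch is a hypothetical name. A rule with an empty when matches every sprite.

```python
from __future__ import annotations

# Illustrative rules in the documented shape: a `when` matcher plus `negate` tokens.
RULES = [
    {"when": {}, "negate": ["text", "watermark"]},  # empty matcher: always applies
    {"when": {"has_character": True}, "negate": ["multiple characters", "extra limbs"]},
    {"when": {"has_character": True, "has_mount": False}, "negate": ["horse"]},
    {"when": {"is_tileable": True}, "negate": ["visible seams"]},
]


def sprite_properties(category: str, combat_type: str | None = None) -> dict:
    """Derive matchable properties (subset of the documented ones)."""
    is_unit = category == "units"
    return {
        "has_character": is_unit,
        "has_mount": is_unit and combat_type == "cavalry",  # hypothetical combat type
        "is_tileable": category in ("terrain", "biome_grid", "edges"),
        "facing": "southwest" if is_unit else None,
    }


def get_negative_sketch(category: str, combat_type: str | None = None,
                        rules: list[dict] = RULES) -> str:
    props = sprite_properties(category, combat_type)
    tokens: list[str] = []
    for rule in rules:
        # A rule matches when every key in `when` equals the derived property.
        if all(props.get(k) == v for k, v in rule["when"].items()):
            tokens.extend(rule["negate"])
    return ", ".join(tokens)
```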

Background strategy

No chroma-key color appears in prompts. The prompt asks only for "simple background"; the actual background is removed in post by rembg (U2Net neural segmentation) during install. (Older docs describing a green chroma key are pre-reset; the no_chroma_bleed gate still guards against green/yellow contamination bleeding onto the subject.)

Generation parameters (sprite-config.json + engine/generator.py)

{
  "model": "juggernaut-xi-v11",
  "api_base": "http://localhost:8210",
  "defaults": { "steps": 28, "guidance_scale": 7.0, "width": 1024, "height": 1024 }
}
  • steps: 28, guidance_scale: 7.0 (config defaults). Guidance is adaptive: once an (entity, category) pair has ≥10 passing samples, its best_guidance is used, clamped to 6.5–9.0.
  • Seeds: 70/30 proven/random split (_select_seeds). Proven seeds (avg_quality ≥ 65) come from seed_pool; ±1..3 neighbors of the best proven seed are also explored.
  • Variant modifiers: cycled from styles.yaml::variant_modifiers; biased 60% toward historically-passing modifier indices once ≥20 samples exist.
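The three knobs above can be sketched as small functions. The constants come from the bullets. Everything else is assumed: the function names, the (seed, avg_quality) shape of seed_pool rows, ordering proven seeds by quality, and filling the remaining ~30% with random 32-bit seeds.

```python
from __future__ import annotations

import random

GUIDANCE_DEFAULT, GUIDANCE_MIN, GUIDANCE_MAX = 7.0, 6.5, 9.0
MIN_GUIDANCE_SAMPLES = 10   # passing samples before best_guidance is trusted
PROVEN_QUALITY = 65         # avg_quality needed to count as a proven seed
MIN_MODIFIER_SAMPLES = 20


def pick_guidance(best_guidance: float | None, passing_samples: int) -> float:
    if best_guidance is None or passing_samples < MIN_GUIDANCE_SAMPLES:
        return GUIDANCE_DEFAULT
    return min(max(best_guidance, GUIDANCE_MIN), GUIDANCE_MAX)  # clamp to 6.5-9.0


def select_seeds(n: int, seed_pool: list[tuple[int, float]],
                 rng: random.Random) -> list[int]:
    """~70% proven seeds (plus neighbors of the best one), the rest random."""
    ranked = [s for s, q in sorted(seed_pool, key=lambda p: p[1], reverse=True)
              if q >= PROVEN_QUALITY]
    candidates = list(ranked)
    if ranked:
        best = ranked[0]
        candidates += [best + d for d in (1, -1, 2, -2, 3, -3) if best + d >= 0]
    candidates = list(dict.fromkeys(candidates))  # dedupe, keep order
    chosen = candidates[:min(round(n * 0.7), len(candidates))]
    while len(chosen) < n:
        seed = rng.randrange(2**32)
        if seed not in chosen:
            chosen.append(seed)
    return chosen


def pick_modifier(i: int, modifiers: list[str], passing_idx: list[int],
                  samples: int, rng: random.Random) -> str:
    """Cycle modifiers; after enough samples, bias 60% toward passing indices."""
    if samples >= MIN_MODIFIER_SAMPLES and passing_idx and rng.random() < 0.6:
        return modifiers[rng.choice(passing_idx)]
    return modifiers[i % len(modifiers)]
```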

Model approval caveat (unresolved). juggernaut-xi-v11 is the configured model and is present + cached in the model-boss registry, but it is NOT on the asset-sprite approved list (juggernaut-xl-v9, epicrealism-xl, illustrious-xl-v2 — per dot-claude/ instructions/dataloader-sprites.md, safety-rules-local.md, team-leads/asset-sprite.md, and objectives/p2-28). The installer writes model-commercial:juggernaut-xi-v11 into LICENSES.md on every approved install, so this attribution is load-bearing for commercial-rights compliance. Resolving this requires the asset-sprite charter owner / user to either (a) add XI-v11 to the approved list in CLAUDE.md, or (b) switch sprite-config.json back to an approved model. Until then, do not ship sprites generated by XI-v11.

Resolution by category (engine/prompts/__init__.py)

Category                       Generation (get_generation_size)   Final (get_target_size)
units                          1024×1024                          256×256
terrain / biome_grid / edges   1024×1024                          384×332
buildings                      1024×1024                          128×128
spells                         1024×1024                          128×128
resources / improvements       1024×1024                          64×64
ui                             512×512                            64×64
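The table maps onto two lookups. This sketch mirrors the table rather than the real get_generation_size / get_target_size, so generation_size and TARGET_SIZE are hypothetical names. fit_within shows one way to handle the non-square 384×332 terrain target, by preserving the aspect ratio inside the box. Whether the installer letterboxes, crops, or stretches is not documented here.

```python
from __future__ import annotations

DEFAULT_GEN_SIZE = (1024, 1024)
GEN_SIZE = {"ui": (512, 512)}
TARGET_SIZE = {
    "units": (256, 256),
    "terrain": (384, 332), "biome_grid": (384, 332), "edges": (384, 332),
    "buildings": (128, 128), "spells": (128, 128),
    "resources": (64, 64), "improvements": (64, 64),
    "ui": (64, 64),
}


def generation_size(category: str) -> tuple[int, int]:
    return GEN_SIZE.get(category, DEFAULT_GEN_SIZE)


def fit_within(src: tuple[int, int], box: tuple[int, int]) -> tuple[int, int]:
    """Largest size with src's aspect ratio that fits inside box."""
    scale = min(box[0] / src[0], box[1] / src[1])
    return round(src[0] * scale), round(src[1] * scale)
```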

Theater GUI (server.py + gui/)

FastAPI app (create_app) exposes the review REST API (/api/sprites, /api/theater, /api/stats, /api/progress, /api/pipeline, image serving under /images/..., SSE at /api/stream/variants). The React SPA in gui/ builds to gui/dist/; when that build is present the server mounts it and serves the Theater at /?spriteTheater=true. Without a build the server runs API-only (root returns 404 — expected).

GUI build currently broken (2026-06-03). pnpm build in gui/ fails with TS2307: Cannot find module 'react' because pnpm install does not materialize gui/node_modules — the @lilith/ui-animated workspace dependency hits the known pnpm workspace:* resolution bug (needs a .pnpmfile.cjs + Verdaccio registry; see the project memory note). The Python server boot-smokes green, but the Theater front-end cannot render until the GUI install/build is fixed.

Boot:

python3 cli.py start --port 5850          # GUI server only (cmd_review)
python3 -m uvicorn server:app --port 5850 # equivalent direct invocation

CLI Commands

# Full pipeline (submit → collect → rank → regen loop + GUI server)
python3 cli.py run --category units --variants 1

# GUI server only
python3 cli.py start --port 5850

# Scan game data → populate sprite registry (use --demo for minimal data)
python3 cli.py scan --sprite-type units
python3 cli.py --demo scan

# Status funnel
python3 cli.py status

# Manual stage operations
python3 cli.py generate --sprite units/spearmen_dwarves_m --variants 8   # submit only
python3 cli.py listen                                                    # collect results
python3 cli.py rank --sprite units/spearmen_dwarves_m                    # score one sprite
python3 cli.py approve 129                                               # approve → install + ledger
python3 cli.py reset --sprite units/spearmen_dwarves_m                   # back to needed

# Prompt iteration
python3 cli.py test-prompt --entity spearmen --race dwarves --gender male --seeds 42 123 777
python3 cli.py refresh-prompts --category units --clear-scores

# Monitoring / export
python3 cli.py monitor
python3 cli.py export --format json

File Flow

game JSON data (public/games/age-of-dwarves/data/ — or demo-data/ with --demo)
    │
    ▼ scan
spritegen.db ──── sprites table (what to generate)
    │
    ▼ generate (model-boss queue → juggernaut-xi-v11, results via Redis pubsub)
raw/{sprite_id}_{variant_id}.png ──── 1024×1024
    │
    ▼ rank (tiered: qwen3-VL → haiku → sonnet → opus, gates + quality)
spritegen.db ──── variants.notes (gates+quality JSON), variants.rating, review_tier
    │
    ▼ human approve (Theater GUI or cli.py approve <variant_id>)
spritegen.db ──── variant.is_approved = 1
    │
    ▼ process (rembg background removal + resize/composite)
variants/{sprite_id}_{variant_id}.png ──── category target size, transparent
    │
    ▼ install (+ LICENSES.md row via installer._append_ledger_row)
public/games/age-of-dwarves/assets/sprites/<category>/{name}.png
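The path templates in this flow can be expressed as small helpers. The sketch assumes {name} in the install path is the sprite's entity_id, and that the {sprite_id}_{variant_id} template is used literally. Under that template, the "/" in a sprite id becomes a per-category subdirectory. sprites.install_path is stored per row, so treat these helper names and derivations as illustrative.

```python
from __future__ import annotations

from pathlib import PurePosixPath

GAME_ROOT = PurePosixPath("public/games/age-of-dwarves")


def raw_path(sprite_id: str, variant_id: int) -> PurePosixPath:
    # sprite ids contain a '/', so raw/ ends up with one subfolder per category
    return PurePosixPath("raw") / f"{sprite_id}_{variant_id}.png"


def processed_path(sprite_id: str, variant_id: int) -> PurePosixPath:
    return PurePosixPath("variants") / f"{sprite_id}_{variant_id}.png"


def install_path(sprite_id: str) -> PurePosixPath:
    category, name = sprite_id.split("/", 1)  # e.g. 'units/spearmen_dwarves_m'
    return GAME_ROOT / "assets" / "sprites" / category / f"{name}.png"
```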