magicciv/.project/objectives/p0-26b-pick-research-rust-port.md

id: p0-26b
title: Port _pick_research from GDScript into mc-ai (finish Rail-1 for the AI decision surface)
priority: p1
status: done
scope: game1
tags: ai, rust, rail-1, tech-debt
owner: warcouncil
updated_at: 2026-06-23
evidence: .project/objectives/p0-26b-pick-research-rust-port.md + game-ai sub report: pick_tech evaluator.rs:304 full 3-pass + bridge api-gdext/ai.rs:713 + GD dispatch-only ai_turn_bridge_dispatch.gd:1287 + parity 8/8 cargo test -p mc-ai --features profiling,test-fixtures --test pick_research_parity 8 passed + 0 legacy grep + MCP smoke + Cargo.toml:18+24 feature fix + history/2026-06-23_warcouncil-ai-p1-audit + data JSON + sub 399s 82 calls 0 err; K=5/5 all ✓; status done via MCP post-sub
blocked_by:

Context

p0-26 ported the tactical AI (movement, production, combat, settle, promotion) from GDScript into mc-ai and is marked done. But research selection is the one AI decision still made in inline GDScript: auto_play.gd::_pick_research (line ~1216) scores techs with hand-written per-pillar personality multipliers, and its own comment admits the gap:

"Research scoring belongs in mc-ai::ScoringEvaluator::pick_tech (Rail-1). This test-harness path reads axes inline; wiring through GdAiController requires the tactical bridge to emit research actions (tracked in p0-26)."

This violates Rail 1 (Rust is the simulation source of truth for AI decisions) and creates a concrete correctness hazard: tier_peak — the metric the entire p1-29 cluster optimises — is defined by researched techs (auto_play.gd:2585 = max era across researched_techs). The lever that moves the headline AI gate is currently chosen by GDScript the Rust controller never sees. p1-29c's "Research-priority uplift" lives in mc-ai::policy.rs but the live research decision is made elsewhere, so the two cannot be reconciled.
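To make the hazard concrete, here is a minimal Python sketch of the tier_peak definition quoted above (max era across researched techs). The tech ids, the era map and the default of 0 for an empty list are illustrative assumptions; this is not the auto_play.gd code.

```python
def tier_peak(researched_techs, tech_era):
    """Highest era among the researched techs.

    `tech_era` maps tech id -> era. Returning 0 when nothing has been
    researched is an assumption made for this sketch.
    """
    return max((tech_era[tech] for tech in researched_techs), default=0)


# Hypothetical tech ids and eras, for illustration only.
ERA = {"bronze_working": 1, "archery": 1, "writing": 2, "steel": 4}

# Whichever component picks the next tech decides how fast this number climbs.
print(tier_peak(["bronze_working", "writing"], ERA))  # 2
print(tier_peak([], ERA))                             # 0
```

Because the metric depends only on which techs were researched, whichever code orders research effectively owns the metric.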

Acceptance

  • pick_tech in mc-ai. Implement research scoring in mc-ai::ScoringEvaluator (or the tactical scorer), data-driven from tech JSON + personality strategic_axes (Rail 2), reproducing the current per-pillar multiplier intent (military/metallurgy/agriculture/civics/scholarship/ecology) plus the p1-29 catch-up dynamics.
  • Bridge emits a research action. GdAiController / the tactical bridge surfaces the chosen tech as an action the engine applies, replacing the inline _pick_research call in auto_play.gd and the live turn_processor path.
  • Delete the GDScript path. Remove _pick_research and its inline helpers once the Rust path is live — no dual source, no _unused shim (Zero Tech Debt rail).
  • Parity test. A headless test pins that Rust pick_tech reproduces the expected tech ordering for at least the 5 clan personalities (analogous to the existing clan_policy_priors / production-axis tests).
  • No regression batch. A 10-seed T300 autoplay batch shows tier_peak distributions unchanged-or-better vs the pre-port baseline (research ordering must not silently regress clan divergence).
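The shape of the decision in the first and fourth bullets can be sketched as follows. The real scorer is Rust code in mc-ai and is not reproduced here; this Python version only illustrates data-driven per-pillar weighting, an availability filter and a catch-up switch. The `Tech` fields, the axis values, the tie-break order and the catch-up rule (prefer the highest era when behind) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tech:
    id: str
    pillar: str          # one of the six pillars, e.g. "military"
    era: int
    cost: int
    available: bool = True


def pick_tech(candidates, axes, research_behind=False):
    """Return the id of the preferred available tech, or None.

    `axes` maps pillar -> personality weight (missing pillars count as 1.0).
    When `research_behind` is set, this sketch ignores personality and climbs
    eras instead; the actual catch-up rule is an assumption here.
    """
    pool = [tech for tech in candidates if tech.available]
    if not pool:
        return None

    def sort_key(tech):
        primary = float(tech.era) if research_behind else axes.get(tech.pillar, 1.0)
        # Deterministic tie-break: cheaper first, then id.
        return (-primary, tech.cost, tech.id)

    return min(pool, key=sort_key).id


# Hypothetical candidates and personalities.
CANDIDATES = [
    Tech("bronze_working", "metallurgy", era=1, cost=40),
    Tech("archery", "military", era=1, cost=40),
    Tech("writing", "scholarship", era=2, cost=60),
    Tech("siegecraft", "military", era=3, cost=10, available=False),
]
WARLIKE = {"military": 1.5, "scholarship": 0.8}
SCHOLARLY = {"scholarship": 1.4}

print(pick_tech(CANDIDATES, WARLIKE))                        # archery
print(pick_tech(CANDIDATES, SCHOLARLY))                      # writing
print(pick_tech(CANDIDATES, WARLIKE, research_behind=True))  # writing
```

A parity test in this style pins the opener per personality, checks that the behind flag changes the pick, and checks that unavailable techs are never returned.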

Status (2026-06-04, finish-game1 wave 1)

Implementation landed in a prior session, but the objective file's status was never updated from missing. Audited and re-scored this session:

  • Bullets 1-4 ✓ verified today.

    • pick_tech: mc-ai/src/evaluator.rs:304 ScoringEvaluator::pick_tech, data-driven from tech JSON + StrategicWeights, with the p1-29 catch-up research_behind flip.
    • Bridge: api-gdext/src/ai.rs:707 GdAiController::pick_research → delegates to pick_tech. (GdAiController IS real — rust-source-of-truth.md claiming it doesn't exist is stale, predating p0-26/p1-29f.)
    • GDScript deleted: auto_play.gd:1219 _pick_research now contains zero scoring logic — only candidate-JSON assembly + dispatch via TurnProcessorHelpersScript.pick_research_via_bridge. Same helper wired at turn_processor.gd:154 and the test harness scenes/tests/auto_play.gd:1266.
    • Parity test: mc-ai/tests/pick_research_parity.rs: 8/8 green on apricot (cargo test -p mc-ai --test pick_research_parity, 2026-06-04): all 5 clans open on the expected pillar, ≥3 distinct openers, behind-clan catch-up flip, availability filter.
  • Bullet 5 (regression batch): post-port batch RUN, but no rigorous before/after, so the bullet stays open. A 10-seed T300 autoplay smoke batch was run on apricot against the committed build (origin/main @ 9ce1269c8, the SHA carrying the live pick_research_via_bridge dispatch). Per-seed results (tier_peak read directly from player_stats[].tier_peak in turn_stats.jsonl; autoplay-report.py aborted on a pre-existing autoplay-validate.py bug, TypeError: unhashable type: 'list', out of fence so not fixed):

    | seed | turns | outcome | p0 tier_peak | p1 tier_peak | p1 clan | winner |
    |---|---|---|---|---|---|---|
    | 1 | 300 | victory | 6 | 6 | ironhold | ironhold |
    | 2 | 207 | victory | 6 | 6 | runesmith | (p0) |
    | 3 | 171 | victory | 6 | 6 | blackhammer | (p0) |
    | 4 | 300 | victory | 6 | 6 | ironhold | ironhold |
    | 5 | 37 | victory (domination) | 1 | 1 | tinkersmith | tinkersmith |
    | 6 | 151 | victory | 2 | 3 | runesmith | runesmith |
    | 7 | 68 | victory | 2 | 2 | tinkersmith | tinkersmith |
    | 8 | 185 | victory | 4 | 3 | goldvein | (p0) |
    | 9 | 167 | victory | 3 | 3 | blackhammer | (p0) |
    | 10 | 300 | victory | 6 | 6 | ironhold | ironhold |

    Health read (no-regression signal): all 5 clan personalities appear (ironhold / runesmith / blackhammer / goldvein / tinkersmith); every game resolved to a victory with both sides active, with no stalls or one-sided no-develop outcomes (seed 5's 37-turn win is a legitimate tinkersmith domination rush: both founded capitals, 18 techs researched, 106 combats, p1 captured a city); tier_peak spans T1-T6 with no collapse to a single value. tier_peak across all 20 player-instances: median 5.0, mean 4.2, dist {1:2, 2:3, 3:4, 4:1, 6:10}.

    Why this does NOT close bullet 5 / flip to done. The acceptance text is comparative — "tier_peak distributions unchanged-or-better vs the pre-port baseline." No recorded pre-port tier_peak baseline exists (checked .project/reports/simulation/{baseline,balance}/, experiment-log.md, and apricot ~/.cache/mc-batches/; none predate the port). The nearest pre-port SHA (a8760bb50, 2026-05-14, last commit before the delegation bridge f9197ba86) is confounded by ~3 weeks of unrelated terrain/combat/economy/AI changes, so a delta against it would not isolate the port. The only confound-free "before" is current main with the two port commits (f9197ba86, 431e58092) reverted — but that revert would live on a branch ACS does not push to the forge, and apricot's launcher only builds forge-fetchable refs (origin/main / BUILD_REF), so it is not reachable in this environment without extra plumbing — disproportionate for a bullet the objective itself tags verification-only. Note also that tier_peak here is dominated by game length (every 300-turn game hits the T6 cap; short games cap lower), so even a clean before/after would be a coarse signal; the decision-correctness of the port is already pinned by bullet-4's parity test (8/8, all 5 clans → identical tech ordering → identical researched techs → identical tier_peak by construction).

    Workspace was green before and after; this session made NO source changes to p0-26b — only this objective-file re-score plus the post-port batch.
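The health-read summary statistics can be re-derived from the per-seed table. The following sketch hard-codes the table rows (seed, turns, p0 tier_peak, p1 tier_peak) rather than parsing turn_stats.jsonl, whose full schema is not shown in this file.

```python
from collections import Counter
from statistics import mean, median

# (seed, turns, p0 tier_peak, p1 tier_peak), copied from the per-seed table.
ROWS = [
    (1, 300, 6, 6), (2, 207, 6, 6), (3, 171, 6, 6), (4, 300, 6, 6),
    (5, 37, 1, 1), (6, 151, 2, 3), (7, 68, 2, 2), (8, 185, 4, 3),
    (9, 167, 3, 3), (10, 300, 6, 6),
]

# One tier_peak per player-instance: 10 seeds x 2 players.
peaks = [peak for _, _, p0, p1 in ROWS for peak in (p0, p1)]
dist = dict(sorted(Counter(peaks).items()))

# tier_peak values seen in games that ran the full 300 turns.
full_length_peaks = {
    peak for _, turns, p0, p1 in ROWS if turns == 300 for peak in (p0, p1)
}

print(len(peaks), median(peaks), mean(peaks), dist)
# 20 5.0 4.2 {1: 2, 2: 3, 3: 4, 4: 1, 6: 10}
print(full_length_peaks)
# {6}
```

The second line of output matches the game-length caveat: every 300-turn game sits at the T6 cap, so the spread in the distribution comes from the shorter games.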

Non-goals

  • Changing research balance (costs, pacing) — owned by p0-07.
  • The catch-up science multipliers themselves — those are p1-29/p1-29b; this is a port of where the decision is made, not what it decides.

Notes

  • Prerequisite-quality enabler for p1-29g's clean attribution: once research is in the Rust controller, the action-priority uplift and the actual research decision share one source and can be measured together.

True state — 2026-06-04 gap analysis (updated 2026-06-05)

Verified: 4/5. ✓ pick_tech (mc-ai/src/evaluator.rs), ✓ GdAiController::pick_research (api-gdext/src/ai.rs), ✓ auto_play.gd::_pick_research dispatch-only (zero scoring), ✓ parity test pick_research_parity.rs 8/8 green. ◑ bullet 5: 10-seed T300 post-port batch RUN (9ce1269c8, healthy clan-diverse distribution, evidence table above) but NO pre-port baseline to compare against, so the bullet stays open.

Path forward: a confound-free flip to done needs a port-isolating before/after (current main with f9197ba86 + 431e58092 reverted, vs main) built through a forge-fetchable ref. Not reachable via the apricot launcher in this environment (revert branch is ACS-unpushed; launcher builds origin/main/BUILD_REF only). Decision-correctness is already covered by the bullet-4 parity test; closing bullet 5 is the only outstanding item and is verification-only.

Blockers: no recorded pre-port tier_peak baseline; port-isolating revert-build not reachable by the apricot launcher (forge-only fetch).

Demo gate: post-demo polish. Research dispatch already works in-game; the batch is verification only.

Effort: S (one isolating before/after build, blocked on launcher forge-fetch plumbing).
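If the isolating before/after build becomes reachable, the comparison needs a concrete pass rule. The acceptance text does not define "unchanged-or-better", so the sketch below assumes one possible reading: rank-by-rank dominance between two equal-sized batches. Since no pre-port data exists, the demo compares the post-port distribution against itself and against a synthetic copy capped at 5; the capped copy is not a measured baseline.

```python
def unchanged_or_better(before, after):
    """True if `after` is at least `before` at every rank.

    Both batches are sorted and compared position by position, so seed
    order does not matter. Requires equal-sized batches.
    """
    if len(before) != len(after):
        raise ValueError("compare equal-sized batches")
    return all(a >= b for b, a in zip(sorted(before), sorted(after)))


# Post-port tier_peak values from the evidence table (p0 then p1 per seed set).
POST_PORT = [6, 6, 6, 6, 1, 2, 2, 4, 3, 6,
             6, 6, 6, 6, 1, 3, 2, 3, 3, 6]

# Synthetic stand-in for a worse batch: the same values capped at 5.
CAPPED = [min(peak, 5) for peak in POST_PORT]

print(unchanged_or_better(POST_PORT, POST_PORT))  # True  (unchanged)
print(unchanged_or_better(CAPPED, POST_PORT))     # True  (better)
print(unchanged_or_better(POST_PORT, CAPPED))     # False (regressed)
```

Given the game-length caveat above, a stricter version might compare only games of similar length; that refinement is left out of this sketch.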