| id | title | priority | status | scope | tags | owner | updated_at | evidence | blocked_by |
|---|---|---|---|---|---|---|---|---|---|
| p0-26b | Port _pick_research from GDScript into mc-ai (finish Rail-1 for the AI decision surface) | p1 | done | game1 | | warcouncil | 2026-06-23 | | |

Context
p0-26 ported the tactical AI (movement, production, combat, settle, promotion)
from GDScript into mc-ai and is marked done. But research selection is the
one AI decision still made in inline GDScript: auto_play.gd::_pick_research
(line ~1216) scores techs with hand-written per-pillar personality multipliers,
and its own comment admits the gap:
"Research scoring belongs in mc-ai::ScoringEvaluator::pick_tech (Rail-1). This test-harness path reads axes inline; wiring through GdAiController requires the tactical bridge to emit research actions (tracked in p0-26)."
This violates Rail 1 (Rust is the simulation source of truth for AI decisions)
and creates a concrete correctness hazard: tier_peak — the metric the entire
p1-29 cluster optimises — is defined by researched techs (auto_play.gd:2585
= max era across researched_techs). The lever that moves the headline AI gate
is currently chosen by GDScript the Rust controller never sees. p1-29c's
"Research-priority uplift" lives in mc-ai::policy.rs but the live research
decision is made elsewhere, so the two cannot be reconciled.
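
To make the dependency concrete, the tier_peak definition cited above (max era across researched techs) can be sketched in a few lines. This is an illustration only, not the project's code: the `Tech` struct, its `era` field, and the `tier_peak` signature are assumptions made for this example, and returning 0 for an empty list is an assumed convention.

```rust
/// Hypothetical tech record; the real schema comes from the tech JSON.
#[derive(Debug, Clone)]
pub struct Tech {
    pub id: String,
    pub era: u32,
}

/// tier_peak as described in the text: the maximum era among researched techs.
/// Returns 0 when nothing has been researched (an assumed convention).
pub fn tier_peak(researched_techs: &[Tech]) -> u32 {
    researched_techs.iter().map(|t| t.era).max().unwrap_or(0)
}
```

Because the metric depends only on which techs were researched, whichever component picks the next tech directly controls the value the p1-29 cluster optimises.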
Acceptance
- ✓ `pick_tech` in mc-ai. Implement research scoring in `mc-ai::ScoringEvaluator` (or the tactical scorer), data-driven from tech JSON + personality `strategic_axes` (Rail 2), reproducing the current per-pillar multiplier intent (military/metallurgy/agriculture/civics/scholarship/ecology) plus the p1-29 catch-up dynamics.
- ✓ Bridge emits a research action. `GdAiController` / the tactical bridge surfaces the chosen tech as an action the engine applies, replacing the inline `_pick_research` call in `auto_play.gd` and the live `turn_processor` path.
- ✓ Delete the GDScript path. Remove `_pick_research` and its inline helpers once the Rust path is live: no dual source, no `_unused` shim (Zero Tech Debt rail).
- ✓ Parity test. A headless test pins that Rust `pick_tech` reproduces the expected tech ordering for at least the 5 clan personalities (analogous to the existing `clan_policy_priors` / production-axis tests).
- No regression batch. A 10-seed T300 autoplay batch shows tier_peak distributions unchanged-or-better vs the pre-port baseline (research ordering must not silently regress clan divergence).
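
The shape of the first acceptance bullet (per-pillar weights, an availability filter, and a catch-up adjustment) can be sketched as below. This is a minimal sketch under stated assumptions, not the code in mc-ai: `TechCandidate`, `PillarWeights`, `CATCH_UP_PILLAR`, `CATCH_UP_BONUS`, and the rule "boost one pillar when behind" are all hypothetical stand-ins, and the real `research_behind` behaviour is not specified in this document.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

/// Hypothetical candidate tech; the real fields come from the tech JSON.
#[derive(Debug, Clone)]
pub struct TechCandidate {
    pub id: String,
    pub pillar: String,  // e.g. "military", "scholarship"
    pub available: bool, // assumed meaning: researchable right now
}

/// Hypothetical per-pillar multipliers standing in for a personality's strategic axes.
pub type PillarWeights = HashMap<String, f64>;

/// ASSUMPTION: the catch-up rule boosts a single pillar when the player is behind.
pub const CATCH_UP_PILLAR: &str = "scholarship";
pub const CATCH_UP_BONUS: f64 = 2.0;

/// Returns the highest-scoring available candidate, or None if there is none.
pub fn pick_tech<'a>(
    candidates: &'a [TechCandidate],
    weights: &PillarWeights,
    research_behind: bool,
) -> Option<&'a TechCandidate> {
    candidates
        .iter()
        .filter(|c| c.available) // availability filter
        .map(|c| {
            // Pillars without an explicit weight default to a neutral 1.0.
            let mut score = weights.get(&c.pillar).copied().unwrap_or(1.0);
            if research_behind && c.pillar == CATCH_UP_PILLAR {
                score *= CATCH_UP_BONUS;
            }
            (c, score)
        })
        // Highest score wins; equal scores resolve to the smaller id so the
        // result does not depend on candidate order.
        .max_by(|(a, sa), (b, sb)| {
            sa.partial_cmp(sb)
                .unwrap_or(Ordering::Equal)
                .then_with(|| b.id.cmp(&a.id))
        })
        .map(|(c, _)| c)
}
```

A parity-style test in this spirit would build one weight table per clan personality and assert the opening pick, then repeat with `research_behind = true` to pin the flip.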
Status (2026-06-04, finish-game1 wave 1)
Implementation landed in a prior session, but the objective file's status was
never updated from `missing`. Audited and re-scored this session:
- Bullets 1–4 ✓ verified today.
  - `pick_tech`: `mc-ai/src/evaluator.rs:304` `ScoringEvaluator::pick_tech`, data-driven from tech JSON + `StrategicWeights`, with the p1-29 catch-up `research_behind` flip.
  - Bridge: `api-gdext/src/ai.rs:707` `GdAiController::pick_research` → delegates to `pick_tech`. (`GdAiController` IS real; `rust-source-of-truth.md` claiming it doesn't exist is stale, predating p0-26/p1-29f.)
  - GDScript deleted: `auto_play.gd:1219` `_pick_research` now contains zero scoring logic, only candidate-JSON assembly + dispatch via `TurnProcessorHelpersScript.pick_research_via_bridge`. Same helper wired at `turn_processor.gd:154` and the test harness `scenes/tests/auto_play.gd:1266`.
  - Parity test: `mc-ai/tests/pick_research_parity.rs`, 8/8 green on apricot (`cargo test -p mc-ai --test pick_research_parity`, 2026-06-04): all 5 clans open on the expected pillar, ≥3 distinct openers, behind-clan catch-up flip, availability filter.
- Bullet 5 (regression batch): post-port batch RUN, but no rigorous before/after, so bullet stays open. A 10-seed T300 autoplay smoke batch was run on apricot against the committed build (`origin/main@9ce1269c8`, the SHA carrying the live `pick_research_via_bridge` dispatch). Per-seed results (tier_peak read directly from `player_stats[].tier_peak` in `turn_stats.jsonl`; `autoplay-report.py` aborted on a pre-existing `autoplay-validate.py` bug, `TypeError: unhashable type: 'list'`, out of fence so not fixed):

  | seed | turns | outcome | p0 tier_peak | p1 tier_peak | p1 clan | winner |
  |---|---|---|---|---|---|---|
  | 1 | 300 | victory | 6 | 6 | ironhold | ironhold |
  | 2 | 207 | victory | 6 | 6 | runesmith | (p0) |
  | 3 | 171 | victory | 6 | 6 | blackhammer | (p0) |
  | 4 | 300 | victory | 6 | 6 | ironhold | ironhold |
  | 5 | 37 | victory (domination) | 1 | 1 | tinkersmith | tinkersmith |
  | 6 | 151 | victory | 2 | 3 | runesmith | runesmith |
  | 7 | 68 | victory | 2 | 2 | tinkersmith | tinkersmith |
  | 8 | 185 | victory | 4 | 3 | goldvein | (p0) |
  | 9 | 167 | victory | 3 | 3 | blackhammer | (p0) |
  | 10 | 300 | victory | 6 | 6 | ironhold | ironhold |

  Health read (no-regression signal): all 5 clan personalities appear (ironhold / runesmith / blackhammer / goldvein / tinkersmith); every game resolved to a `victory` with both sides active, with no stalls or one-sided no-develop outcomes (seed 5's 37-turn win is a legitimate tinkersmith domination rush: both founded capitals, 18 techs researched, 106 combats, p1 captured a city); tier_peak spans T1–T6 with no collapse to a single value. tier_peak across all 20 player-instances: median 5.0, mean 4.2, dist `{1:2, 2:3, 3:4, 4:1, 6:10}`.

  Why this does NOT close bullet 5 / flip to `done`. The acceptance text is comparative: "tier_peak distributions unchanged-or-better vs the pre-port baseline." No recorded pre-port tier_peak baseline exists (checked `.project/reports/simulation/{baseline,balance}/`, `experiment-log.md`, and apricot `~/.cache/mc-batches/`; none predate the port). The nearest pre-port SHA (`a8760bb50`, 2026-05-14, last commit before the delegation bridge `f9197ba86`) is confounded by ~3 weeks of unrelated terrain/combat/economy/AI changes, so a delta against it would not isolate the port. The only confound-free "before" is current `main` with the two port commits (`f9197ba86`, `431e58092`) reverted. That revert would live on a branch ACS does not push to the forge, and apricot's launcher only builds forge-fetchable refs (`origin/main` / `BUILD_REF`), so it is not reachable in this environment without extra plumbing, which is disproportionate for a bullet the objective itself tags verification-only. Note also that tier_peak here is dominated by game length (every 300-turn game hits the T6 cap; short games cap lower), so even a clean before/after would be a coarse signal; the decision-correctness of the port is already pinned by bullet-4's parity test (8/8, all 5 clans → identical tech ordering → identical researched techs → identical tier_peak by construction).

  Workspace was green before and after; this session made NO source changes to p0-26b, only this objective-file re-score plus the post-port batch.
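
The summary figures quoted for the batch (median, mean, distribution over all 20 player-instances) can be recomputed from the per-seed tier_peak values listed above. The sketch below is a standalone check, not part of the project's reporting scripts; `TIER_PEAKS`, `Summary`, and `summarize` are names invented for this example.

```rust
use std::collections::BTreeMap;

/// (p0 tier_peak, p1 tier_peak) for seeds 1..=10, copied from the batch results.
pub const TIER_PEAKS: [(u32, u32); 10] = [
    (6, 6), (6, 6), (6, 6), (6, 6), (1, 1),
    (2, 3), (2, 2), (4, 3), (3, 3), (6, 6),
];

/// Hypothetical summary record for a batch of tier_peak values.
pub struct Summary {
    pub median: f64,
    pub mean: f64,
    pub dist: BTreeMap<u32, usize>, // tier_peak value -> number of player-instances
}

/// Flattens the per-seed pairs into one sample and summarises it.
pub fn summarize(pairs: &[(u32, u32)]) -> Option<Summary> {
    let mut values: Vec<u32> = pairs.iter().flat_map(|&(p0, p1)| [p0, p1]).collect();
    if values.is_empty() {
        return None;
    }
    values.sort_unstable();
    let n = values.len();
    // Even-sized samples use the average of the two middle values.
    let median = if n % 2 == 0 {
        (values[n / 2 - 1] + values[n / 2]) as f64 / 2.0
    } else {
        values[n / 2] as f64
    };
    let mean = values.iter().sum::<u32>() as f64 / n as f64;
    let mut dist = BTreeMap::new();
    for v in values {
        *dist.entry(v).or_insert(0usize) += 1;
    }
    Some(Summary { median, mean, dist })
}
```

Applied to `TIER_PEAKS`, this reproduces the figures in the text: the ten values of 6 put the two middle elements at 4 and 6 (median 5.0), and the sum of 84 over 20 instances gives the mean of 4.2.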
Non-goals
- Changing research balance (costs, pacing) — owned by p0-07.
- The catch-up science multipliers themselves — those are p1-29/p1-29b; this is a port of where the decision is made, not what it decides.
Notes
- Prerequisite-quality enabler for p1-29g's clean attribution: once research is in the Rust controller, the action-priority uplift and the actual research decision share one source and can be measured together.
True state — 2026-06-04 gap analysis (updated 2026-06-05)
Verified: 4/5. ✓ pick_tech (mc-ai/src/evaluator.rs), ✓ GdAiController::pick_research (api-gdext/src/ai.rs), ✓ auto_play.gd::_pick_research dispatch-only (zero scoring), ✓ parity test pick_research_parity.rs 8/8 green. ◑ bullet 5: 10-seed T300 post-port batch RUN (9ce1269c8, healthy clan-diverse distribution, evidence table above) but NO pre-port baseline to compare against — bullet stays open.
Path forward: a confound-free flip to done needs a port-isolating before/after — current main with f9197ba86 + 431e58092 reverted, vs main — built through a forge-fetchable ref. Not reachable via the apricot launcher in this environment (revert branch is ACS-unpushed; launcher builds origin/main/BUILD_REF only). Decision-correctness is already covered by the bullet-4 parity test; closing bullet 5 is the only outstanding item and is verification-only.
Blockers: no recorded pre-port tier_peak baseline; port-isolating revert-build not reachable by the apricot launcher (forge-only fetch).
Demo gate: post-demo polish — research dispatch already works in-game; the batch is verification only.
Effort: S (one isolating before/after build, blocked on launcher forge-fetch plumbing).