magicciv/.project/objectives/p0-26b-pick-research-rust-port.md at main

Context

p0-26 ported the tactical AI (movement, production, combat, settle, promotion) from GDScript into mc-ai and is marked done. But research selection is the one AI decision still made in inline GDScript: auto_play.gd::_pick_research (line ~1216) scores techs with hand-written per-pillar personality multipliers, and its own comment admits the gap:

"Research scoring belongs in mc-ai::ScoringEvaluator::pick_tech (Rail-1). This test-harness path reads axes inline; wiring through GdAiController requires the tactical bridge to emit research actions (tracked in p0-26)."

This violates Rail 1 (Rust is the simulation source of truth for AI decisions) and creates a concrete correctness hazard: tier_peak — the metric the entire p1-29 cluster optimises — is defined by researched techs (auto_play.gd:2585 = max era across researched_techs). The lever that moves the headline AI gate is currently chosen by GDScript the Rust controller never sees. p1-29c's "Research-priority uplift" lives in mc-ai::policy.rs but the live research decision is made elsewhere, so the two cannot be reconciled.
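The tier_peak definition above (max era across researched techs) can be sketched in a few lines. This is a minimal illustration, not the project's code: the `ResearchedTech` struct, its fields, and the "0 when nothing is researched" convention are all assumptions made for the example.

```rust
/// Hypothetical stand-in for one researched tech.
/// The real record comes from the tech JSON and is not reproduced here.
#[derive(Debug, Clone)]
pub struct ResearchedTech {
    pub id: String,
    pub era: u8,
}

/// tier_peak as described in the text: the highest era among researched techs.
/// ASSUMPTION: returns 0 when nothing has been researched yet.
pub fn tier_peak(researched: &[ResearchedTech]) -> u8 {
    researched.iter().map(|t| t.era).max().unwrap_or(0)
}
```

Because the metric depends only on which techs were researched, whichever component picks the next tech directly controls it.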

Acceptance

✓ pick_tech in mc-ai. Implement research scoring in mc-ai::ScoringEvaluator (or the tactical scorer), data-driven from tech JSON + personality strategic_axes (Rail 2), reproducing the current per-pillar multiplier intent (military/metallurgy/agriculture/civics/scholarship/ecology) plus the p1-29 catch-up dynamics.
✓ Bridge emits a research action. GdAiController / the tactical bridge surfaces the chosen tech as an action the engine applies, replacing the inline _pick_research call in auto_play.gd and the live turn_processor path.
✓ Delete the GDScript path. Remove _pick_research and its inline helpers once the Rust path is live — no dual source, no _unused shim (Zero Tech Debt rail).
✓ Parity test. A headless test pins that Rust pick_tech reproduces the expected tech ordering for at least the 5 clan personalities (analogous to the existing clan_policy_priors / production-axis tests).
◑ No regression batch. A 10-seed T300 autoplay batch shows tier_peak distributions unchanged-or-better vs the pre-port baseline (research ordering must not silently regress clan divergence).
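To make the first acceptance bullet concrete, here is a rough sketch of what a data-driven `pick_tech` could look like. It is not the mc-ai implementation. `TechCandidate`, `Personality`, the per-pillar weight map, and the tie-break are hypothetical, and the catch-up rule (divide the weight by cost when `research_behind` is set) is a placeholder assumption, since the actual p1-29 dynamics are not described in this file.

```rust
use std::collections::HashMap;

/// Hypothetical tech candidate, as it might be decoded from tech JSON.
#[derive(Debug, Clone)]
pub struct TechCandidate {
    pub id: String,
    pub pillar: String,  // e.g. "military", "scholarship"
    pub cost: f64,       // research cost, assumed > 0
    pub available: bool, // prerequisites met
}

/// Hypothetical personality: one weight per pillar (stand-in for strategic_axes).
pub struct Personality {
    pub axes: HashMap<String, f64>,
}

/// Score one candidate by its pillar weight (1.0 if the pillar is unlisted).
pub fn score(tech: &TechCandidate, p: &Personality, research_behind: bool) -> f64 {
    let weight = p.axes.get(&tech.pillar).copied().unwrap_or(1.0);
    if research_behind {
        // ASSUMPTION: catch-up is modeled as favoring cheap techs.
        // The real p1-29 rule may differ.
        weight / tech.cost
    } else {
        weight
    }
}

/// Pick the best available candidate; ties go to the lexicographically smaller id
/// so the result is deterministic.
pub fn pick_tech<'a>(
    candidates: &'a [TechCandidate],
    p: &Personality,
    research_behind: bool,
) -> Option<&'a TechCandidate> {
    candidates
        .iter()
        .filter(|t| t.available) // availability filter
        .max_by(|a, b| {
            score(a, p, research_behind)
                .total_cmp(&score(b, p, research_behind))
                .then_with(|| b.id.cmp(&a.id))
        })
}
```

Keeping the weights in data (a map keyed by pillar) rather than in code is what lets one scorer serve every clan personality, and the deterministic tie-break is what makes an ordering-based parity test stable.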

Status (2026-06-04, finish-game1 wave 1)

Implementation landed in a prior session, but the objective file's status was never updated from missing. Audited and re-scored this session:

Bullets 1–4 ✓ verified today.
- pick_tech: mc-ai/src/evaluator.rs:304 ScoringEvaluator::pick_tech, data-driven from tech JSON + StrategicWeights, with the p1-29 catch-up research_behind flip.
- Bridge: api-gdext/src/ai.rs:707 GdAiController::pick_research → delegates to pick_tech. (GdAiController IS real — rust-source-of-truth.md claiming it doesn't exist is stale, predating p0-26/p1-29f.)
- GDScript deleted: auto_play.gd:1219 _pick_research now contains zero scoring logic — only candidate-JSON assembly + dispatch via TurnProcessorHelpersScript.pick_research_via_bridge. Same helper wired at turn_processor.gd:154 and the test harness scenes/tests/auto_play.gd:1266.
- Parity test: mc-ai/tests/pick_research_parity.rs — 8/8 green on apricot (cargo test -p mc-ai --test pick_research_parity, 2026-06-04): all 5 clans open on the expected pillar, ≥3 distinct openers, behind-clan catch-up flip, availability filter.

Bullet 5 (regression batch): the post-port batch was RUN, but there is no rigorous before/after comparison, so the bullet stays open. A 10-seed T300 autoplay smoke batch was run on apricot against the committed build (origin/main @ 9ce1269c8, the SHA carrying the live pick_research_via_bridge dispatch). Per-seed results are below. tier_peak was read directly from player_stats[].tier_peak in turn_stats.jsonl because autoplay-report.py aborted on a pre-existing autoplay-validate.py bug (TypeError: unhashable type: 'list'); that bug is out of fence for this objective, so it was not fixed.

| seed | turns | outcome | p0 tier_peak | p1 tier_peak | p1 clan | winner |
|---|---|---|---|---|---|---|
| 1 | 300 | victory | 6 | 6 | ironhold | ironhold |
| 2 | 207 | victory | 6 | 6 | runesmith | (p0) |
| 3 | 171 | victory | 6 | 6 | blackhammer | (p0) |
| 4 | 300 | victory | 6 | 6 | ironhold | ironhold |
| 5 | 37 | victory (domination) | 1 | 1 | tinkersmith | tinkersmith |
| 6 | 151 | victory | 2 | 3 | runesmith | runesmith |
| 7 | 68 | victory | 2 | 2 | tinkersmith | tinkersmith |
| 8 | 185 | victory | 4 | 3 | goldvein | (p0) |
| 9 | 167 | victory | 3 | 3 | blackhammer | (p0) |
| 10 | 300 | victory | 6 | 6 | ironhold | ironhold |

Health read (no-regression signal): all 5 clan personalities appear (ironhold / runesmith / blackhammer / goldvein / tinkersmith); every game resolved to a victory with both sides active — no stalls or one-sided no-develop outcomes (seed5's 37-turn win is a legitimate tinkersmith domination rush: both founded capitals, 18 techs researched, 106 combats, p1 captured a city); tier_peak spans T1–T6 with no collapse to a single value. tier_peak across all 20 player-instances: median 5.0, mean 4.2, dist {1:2, 2:3, 3:4, 4:1, 6:10}.
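The summary figures above (median 5.0, mean 4.2, and the distribution) can be recomputed from the per-seed table. The sketch below hard-codes the 20 tier_peak values from that table rather than parsing turn_stats.jsonl; the helper names are illustrative and not part of the project's tooling.

```rust
use std::collections::BTreeMap;

/// (p0, p1) tier_peak per seed, copied from the table above (seeds 1 to 10).
pub const TIER_PEAKS: [(u8, u8); 10] = [
    (6, 6), (6, 6), (6, 6), (6, 6), (1, 1),
    (2, 3), (2, 2), (4, 3), (3, 3), (6, 6),
];

/// Flatten per-seed pairs into one list of player-instances.
pub fn flatten(pairs: &[(u8, u8)]) -> Vec<u8> {
    pairs.iter().flat_map(|&(a, b)| [a, b]).collect()
}

/// Arithmetic mean; the slice is assumed to be non-empty.
pub fn mean(values: &[u8]) -> f64 {
    values.iter().map(|&v| f64::from(v)).sum::<f64>() / values.len() as f64
}

/// Median (average of the two middle values for even counts); non-empty input assumed.
pub fn median(values: &[u8]) -> f64 {
    let mut sorted = values.to_vec();
    sorted.sort_unstable();
    let n = sorted.len();
    if n % 2 == 1 {
        f64::from(sorted[n / 2])
    } else {
        (f64::from(sorted[n / 2 - 1]) + f64::from(sorted[n / 2])) / 2.0
    }
}

/// Count of player-instances per tier_peak value, in ascending key order.
pub fn histogram(values: &[u8]) -> BTreeMap<u8, usize> {
    let mut counts = BTreeMap::new();
    for &v in values {
        *counts.entry(v).or_insert(0) += 1;
    }
    counts
}
```

With 20 values the median is the average of the 10th and 11th sorted entries (4 and 6), which is why it lands on 5.0 even though no player-instance has a tier_peak of 5.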

Why this does NOT close bullet 5 / flip to done. The acceptance text is comparative — "tier_peak distributions unchanged-or-better vs the pre-port baseline." No recorded pre-port tier_peak baseline exists (checked .project/reports/simulation/{baseline,balance}/, experiment-log.md, and apricot ~/.cache/mc-batches/; none predate the port). The nearest pre-port SHA (a8760bb50, 2026-05-14, last commit before the delegation bridge f9197ba86) is confounded by ~3 weeks of unrelated terrain/combat/economy/AI changes, so a delta against it would not isolate the port. The only confound-free "before" is current main with the two port commits (f9197ba86, 431e58092) reverted — but that revert would live on a branch ACS does not push to the forge, and apricot's launcher only builds forge-fetchable refs (origin/main / BUILD_REF), so it is not reachable in this environment without extra plumbing — disproportionate for a bullet the objective itself tags verification-only. Note also that tier_peak here is dominated by game length (every 300-turn game hits the T6 cap; short games cap lower), so even a clean before/after would be a coarse signal; the decision-correctness of the port is already pinned by bullet-4's parity test (8/8, all 5 clans → identical tech ordering → identical researched techs → identical tier_peak by construction).
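If a port-isolating baseline batch ever becomes available, the "unchanged-or-better" comparison could be mechanized along these lines. The acceptance text does not define the criterion precisely, so the rule used here (neither the median nor the mean of tier_peak may drop) is an assumption, and `Summary`, `summarize`, and `unchanged_or_better` are hypothetical names.

```rust
/// Median and mean of one batch's tier_peak values.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Summary {
    pub median: f64,
    pub mean: f64,
}

/// Summarize a batch; returns None for an empty batch.
pub fn summarize(values: &[u8]) -> Option<Summary> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_unstable();
    let n = sorted.len();
    let median = if n % 2 == 1 {
        f64::from(sorted[n / 2])
    } else {
        (f64::from(sorted[n / 2 - 1]) + f64::from(sorted[n / 2])) / 2.0
    };
    let mean = sorted.iter().map(|&v| f64::from(v)).sum::<f64>() / n as f64;
    Some(Summary { median, mean })
}

/// ASSUMED criterion: the post-port batch passes when neither its median nor
/// its mean is lower than the pre-port batch. Returns None if either is empty.
pub fn unchanged_or_better(before: &[u8], after: &[u8]) -> Option<bool> {
    let b = summarize(before)?;
    let a = summarize(after)?;
    Some(a.median >= b.median && a.mean >= b.mean)
}
```

Since tier_peak is dominated by game length, a comparison like this would be more meaningful if both batches used the same seeds and were compared per seed rather than only in aggregate.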

Workspace was green before and after; this session made NO source changes to p0-26b — only this objective-file re-score plus the post-port batch.

Non-goals

Changing research balance (costs, pacing) — owned by p0-07.
The catch-up science multipliers themselves — those are p1-29/p1-29b; this is a port of where the decision is made, not what it decides.

Notes

Prerequisite-quality enabler for p1-29g's clean attribution: once research is in the Rust controller, the action-priority uplift and the actual research decision share one source and can be measured together.

True state — 2026-06-04 gap analysis (updated 2026-06-05)

Verified: 4/5.
- ✓ pick_tech (mc-ai/src/evaluator.rs)
- ✓ GdAiController::pick_research (api-gdext/src/ai.rs)
- ✓ auto_play.gd::_pick_research dispatch-only (zero scoring)
- ✓ parity test pick_research_parity.rs 8/8 green
- ◑ bullet 5: 10-seed T300 post-port batch RUN (9ce1269c8, healthy clan-diverse distribution, evidence table above) but NO pre-port baseline to compare against, so the bullet stays open.

Path forward: a confound-free flip to done needs a port-isolating before/after (current main with f9197ba86 + 431e58092 reverted, vs main), built through a forge-fetchable ref. Not reachable via the apricot launcher in this environment (revert branch is ACS-unpushed; launcher builds origin/main/BUILD_REF only). Decision-correctness is already covered by the bullet-4 parity test; closing bullet 5 is the only outstanding item and is verification-only.

Blockers: no recorded pre-port tier_peak baseline; port-isolating revert-build not reachable by the apricot launcher (forge-only fetch).

Demo gate: post-demo polish. Research dispatch already works in-game; the batch is verification only.

Effort: S (one isolating before/after build, blocked on launcher forge-fetch plumbing).
