Decision Router and Thompson Sampling

wcore-dispatch is the internal routing layer that selects orchestration templates, agent personas, and skills at runtime. It is not a model router (provider selection lives in wcore-providers). Its purpose is to learn from past outcomes and prefer choices that have worked on similar tasks.

Architecture

wcore-dispatch  (trait + scorer)
      |
      +-- TemplateRouter   picks orchestration template (Direct / Consensus / ...)
      +-- AgentRouter      picks a named agent from the AgentPack
      +-- SkillRouter      picks a catalog skill per turn  (lives in wcore-skills)

All three routers share the same DecisionRouter trait and the same BetaScorer backend.

The DecisionRouter trait (lib.rs)

pub trait DecisionRouter<TKey, TInput> {
    fn choose(&mut self, input: TInput) -> Result<TKey, RouterError>;
    fn observe(&mut self, choice: &TKey, outcome: TaskOutcome);
}

choose selects a candidate for the current input (by Thompson sampling in BetaScorer, so not always the arm with the best record). observe feeds the outcome of an earlier choice back to the scorer. The trait itself has no Send + Sync bound; callers wrap implementations in a Mutex when sharing them across threads or async tasks.
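The sketch below shows the trait in use, assuming simplified stand-ins for TaskOutcome and RouterError (the real definitions in lib.rs may carry more data). FirstWins and run_concurrently are hypothetical and not part of wcore-dispatch; the toy router always returns the first candidate, so the Mutex sharing pattern is the only moving part. For brevity each thread observes immediately after choosing, whereas the engine observes at turn end.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Simplified stand-in for the real outcome type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TaskOutcome {
    Success,
    Failure,
    Neutral,
}

/// Simplified stand-in for the real error type (assumption).
#[derive(Debug, PartialEq)]
pub enum RouterError {
    NoCandidates,
}

pub trait DecisionRouter<TKey, TInput> {
    fn choose(&mut self, input: TInput) -> Result<TKey, RouterError>;
    fn observe(&mut self, choice: &TKey, outcome: TaskOutcome);
}

/// Hypothetical toy router: returns the first candidate and counts observations.
pub struct FirstWins {
    pub observed: usize,
}

impl DecisionRouter<String, Vec<String>> for FirstWins {
    fn choose(&mut self, input: Vec<String>) -> Result<String, RouterError> {
        input.into_iter().next().ok_or(RouterError::NoCandidates)
    }
    fn observe(&mut self, _choice: &String, _outcome: TaskOutcome) {
        self.observed += 1;
    }
}

/// Runs `n` threads that share one router through Arc<Mutex<..>>.
/// The trait has no Send bound, so the caller's R must be Send itself.
pub fn run_concurrently<R>(router: Arc<Mutex<R>>, n: usize) -> Vec<String>
where
    R: DecisionRouter<String, Vec<String>> + Send + 'static,
{
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let r = Arc::clone(&router);
            thread::spawn(move || {
                // The lock serialises choose/observe on the shared scorer state.
                let mut guard = r.lock().unwrap();
                let candidates = vec!["direct".to_string(), "consensus".to_string()];
                let pick = guard.choose(candidates).unwrap();
                guard.observe(&pick, TaskOutcome::Success);
                pick
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

A std::sync::Mutex guard must not be held across an .await; in async code an async-aware lock such as tokio::sync::Mutex is the usual alternative, though which lock the engine actually uses is not stated here.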

TaskOutcome has three variants:

Variant	Effect on scorer
`Success`	Increments alpha (success count) for the chosen arm.
`Failure`	Increments beta (failure count) for the chosen arm.
`Neutral`	No-op. Neither alpha nor beta changes.

BetaScorer (scorer.rs)

BetaScorer<TKey> maintains a HashMap<TKey, Stats> where each entry holds a (success, failure) pair. Picking works as follows:

1. For each candidate arm, read (success, failure). A cold-start arm with no observations defaults to (0, 0).
2. Compute alpha = success + 1, beta = failure + 1. A cold-start arm is therefore Beta(1, 1) = Uniform[0, 1].
3. Draw one sample from Beta(alpha, beta) using rand_distr::Beta.
4. Return the arm with the highest sample.

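All samples are computed before the argmax comparison. Drawing inside the argmax comparator (for example, in a max_by closure) would re-sample an arm on every comparison, so each arm would be compared using several different draws and the result would no longer be a valid Thompson sample.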
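A std-only sketch of the picking rule and the TaskOutcome updates, assuming nothing about the real BetaScorer beyond what is described above. The real scorer draws from rand_distr::Beta; this sketch swaps in a hand-rolled SplitMix64 generator and builds each Beta(a, b) sample as a ratio of Gamma draws, which is exact for the integer parameters used here but costs O(a + b) per sample. The stats_for helper is hypothetical, and new() uses std's RandomState as a stand-in for OS seeding.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash, Hasher};

#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct Stats {
    pub success: u32,
    pub failure: u32,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TaskOutcome {
    Success,
    Failure,
    Neutral,
}

/// SplitMix64: a tiny deterministic PRNG standing in for the real RNG.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform draw strictly inside (0, 1), so ln() below never sees 0.
    fn open01(&mut self) -> f64 {
        ((self.next_u64() >> 12) as f64 + 0.5) / (1u64 << 52) as f64
    }

    /// Beta(a, b) for integer a, b >= 1: X / (X + Y) with X ~ Gamma(a), Y ~ Gamma(b),
    /// each Gamma built as a sum of Exp(1) draws.
    fn beta_int(&mut self, a: u32, b: u32) -> f64 {
        let x: f64 = (0..a).map(|_| -self.open01().ln()).sum();
        let y: f64 = (0..b).map(|_| -self.open01().ln()).sum();
        x / (x + y)
    }
}

pub struct BetaScorer<K> {
    stats: HashMap<K, Stats>,
    rng: SplitMix64,
}

impl<K: Eq + Hash + Clone> BetaScorer<K> {
    /// OS-seeded stand-in: RandomState's keys are initialised from OS randomness.
    pub fn new() -> Self {
        Self::with_seed(RandomState::new().build_hasher().finish())
    }

    /// Deterministic constructor, for tests.
    pub fn with_seed(seed: u64) -> Self {
        BetaScorer { stats: HashMap::new(), rng: SplitMix64(seed) }
    }

    /// Hypothetical accessor; unseen arms report (0, 0).
    pub fn stats_for(&self, arm: &K) -> Stats {
        self.stats.get(arm).copied().unwrap_or_default()
    }

    /// Thompson sampling over the candidate arms.
    pub fn pick(&mut self, candidates: &[K]) -> Option<K> {
        // Step 1: draw and store exactly one sample per arm.
        let mut samples = Vec::with_capacity(candidates.len());
        for arm in candidates {
            let s = self.stats_for(arm);
            // Beta(success + 1, failure + 1); an unseen arm is Beta(1, 1).
            samples.push((self.rng.beta_int(s.success + 1, s.failure + 1), arm));
        }
        // Step 2: argmax over the stored samples; nothing is redrawn here.
        samples
            .into_iter()
            .max_by(|a, b| a.0.total_cmp(&b.0))
            .map(|(_, arm)| arm.clone())
    }

    pub fn observe(&mut self, arm: &K, outcome: TaskOutcome) {
        if outcome == TaskOutcome::Neutral {
            return; // Neutral leaves alpha and beta unchanged.
        }
        let s = self.stats.entry(arm.clone()).or_default();
        if outcome == TaskOutcome::Success {
            s.success += 1; // alpha grows
        } else {
            s.failure += 1; // beta grows
        }
    }
}
```

Because every sample is stored in samples before max_by runs, the comparator only ever sees fixed numbers.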

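Cold-start arms receive Beta(1, 1), whose mean of 0.5 matches every other cold-start arm and whose wide spread lets an untried arm still win some draws against established arms. As outcomes accumulate, each posterior narrows around its arm's observed success rate and the router converges toward the stronger arms.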

BetaScorer::new() seeds the RNG from the OS. BetaScorer::with_seed(seed) is deterministic and intended for tests only. restore hydrates the stats map from a previously persisted snapshot; it is used at boot to load GEPA winners and auto-drafted skills.

TemplateRouter (template_router.rs)

TemplateRouter picks among five orchestration templates:

Template	Description
`Direct`	Single-agent call.
`Consensus`	Parallel fan-out with majority joiner.
`SelfCritique`	Agent then critic loop.
`Adaptive`	Replan-on-result via a `ReplanFn`.
`Hierarchical`	Supervisor with delegated sub-graphs.

TemplateRouter::new() includes all five arms. with_arms(arms) restricts the candidate set (for example, dropping Hierarchical on a single-host deploy). An empty arms list falls back to all five.
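A minimal sketch of the arm-set rules, assuming a Template enum shaped like the table above. TemplateArms is a hypothetical holder, not the real TemplateRouter, and it omits the scorer entirely.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Template {
    Direct,
    Consensus,
    SelfCritique,
    Adaptive,
    Hierarchical,
}

impl Template {
    pub const ALL: [Template; 5] = [
        Template::Direct,
        Template::Consensus,
        Template::SelfCritique,
        Template::Adaptive,
        Template::Hierarchical,
    ];
}

/// Hypothetical arm-set holder mirroring new() / with_arms() semantics.
pub struct TemplateArms {
    pub arms: Vec<Template>,
}

impl TemplateArms {
    /// All five templates are candidates.
    pub fn new() -> Self {
        TemplateArms { arms: Template::ALL.to_vec() }
    }

    /// Restricts the candidates; an empty list means "no restriction", not "no arms".
    pub fn with_arms(arms: Vec<Template>) -> Self {
        if arms.is_empty() {
            Self::new()
        } else {
            TemplateArms { arms }
        }
    }
}
```

Treating an empty list as "all five" means a misconfigured deploy still routes instead of failing every turn.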

Manual override

Embed @@template=<name> (case-insensitive) anywhere in the input to force a specific template:

please use @@template=consensus for this analysis

The override is honored only if the named template is in the configured arm set. An unrecognised name falls back to the scorer. Accepted spellings: direct, consensus, self_critique or self-critique or selfcritique, adaptive, hierarchical.
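One way the override could be parsed, as a sketch: the tag and name are matched case-insensitively, the name is assumed to end at the next whitespace, and an unknown or non-configured name returns None so the caller falls back to the scorer. Both function names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Template {
    Direct,
    Consensus,
    SelfCritique,
    Adaptive,
    Hierarchical,
}

/// Maps an accepted (already lowercased) spelling to a template.
fn parse_template(name: &str) -> Option<Template> {
    match name {
        "direct" => Some(Template::Direct),
        "consensus" => Some(Template::Consensus),
        "self_critique" | "self-critique" | "selfcritique" => Some(Template::SelfCritique),
        "adaptive" => Some(Template::Adaptive),
        "hierarchical" => Some(Template::Hierarchical),
        _ => None,
    }
}

/// Returns the forced template, or None to let the scorer decide.
pub fn template_override(input: &str, arms: &[Template]) -> Option<Template> {
    const TAG: &str = "@@template=";
    let lower = input.to_lowercase();
    // No tag anywhere in the input: no override.
    let start = lower.find(TAG)? + TAG.len();
    // Assumption: the name runs until the next whitespace character.
    let name: String = lower[start..].chars().take_while(|c| !c.is_whitespace()).collect();
    // Honor the override only when the template is in the configured arm set.
    parse_template(&name).filter(|t| arms.contains(t))
}
```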

Installation (bootstrap.rs:1635)

The TemplateRouter is installed at bootstrap under the label F-024. It is default-initialised (all five arms, OS-RNG seed). Before the F-024 fix, set_template_router had no production callers, so every turn fell through to the deterministic IntentClassifier. The IntentClassifier (keyword-based, in orchestration/intent.rs) remains the cold-start fallback when the router has no data for a given input.

AgentRouter (agent_router.rs)

AgentRouter picks a named agent from the AgentPack registry. The arm set defaults to all 13 bundled agents (via AgentRouter::new_with_all_agents()), or you can restrict it with with_allowlist. Names not present in the registry are silently dropped at construction time, so a stale allowlist does not break startup.
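A sketch of the allowlist rule, assuming the registry can be viewed as a plain list of agent names. build_arms is hypothetical and works on string slices rather than the real AgentPack type.

```rust
/// Builds the arm list: every registry agent when there is no allowlist,
/// otherwise only the allowlisted names that exist in the registry.
pub fn build_arms(registry: &[&str], allowlist: Option<&[&str]>) -> Vec<String> {
    match allowlist {
        None => registry.iter().map(|s| s.to_string()).collect(),
        Some(list) => list
            .iter()
            // Stale names are dropped silently instead of failing startup.
            .filter(|name| registry.iter().any(|r| r == *name))
            .map(|s| s.to_string())
            .collect(),
    }
}
```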

Manual override

Embed @@agent=<name> to force a specific agent. The @@agent= prefix is matched case-insensitively; the name itself is preserved verbatim:

@@agent=security-auditor review this diff

The override is honored only if the named agent is in the current arm set. An override that names an agent outside the set falls back to the scorer.
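A sketch of the override check, under two assumptions: the name ends at the next whitespace, and it is compared against the arm set exactly as typed. Lowercasing only ASCII keeps byte offsets unchanged, so a position found in the lowered copy can slice the original input and keep the name verbatim. agent_override is a hypothetical name.

```rust
/// Returns the agent forced by `@@agent=<name>`, or None to use the scorer.
pub fn agent_override<'a>(input: &'a str, arms: &[&str]) -> Option<&'a str> {
    const TAG: &str = "@@agent=";
    // ASCII-only lowercasing keeps byte offsets identical to `input`.
    let lowered = input.to_ascii_lowercase();
    let start = lowered.find(TAG)? + TAG.len();
    // Slice the original text so the name keeps its exact spelling.
    let rest = &input[start..];
    let end = rest.find(char::is_whitespace).unwrap_or(rest.len());
    let name = &rest[..end];
    // Assumption: the name must match an arm exactly as typed.
    if arms.iter().any(|a| *a == name) {
        Some(name)
    } else {
        None
    }
}
```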

SkillRouter (wcore-skills::router)

SkillRouter follows the same DecisionRouter / BetaScorer pattern, but its arms are catalog skill names rather than enum variants. It is seeded in two layers at boot, with the auto-drafter pass running as a sub-layer (1b) of the first.

Boot seeding (bootstrap.rs:1253-1296)

When a real Db handle is available, the bootstrap process opens a PromptStore against the evolved_prompts SQLite table and runs seed_pairs_for twice:

Layer 1: GEPA bench winners

store.seed_pairs_for(&candidate_names, "bench", 1)

Converts the top-scoring GEPA-evolved prompt for each skill into a seed pair: round(score × 5) is the number of simulated successes passed to restore. A skill with a bench score of 0.9 gets round(4.5) = 5 simulated successes before the session’s first turn.
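The conversion is small enough to state directly. In this sketch the clamp to [0, 1] is an added assumption and simulated_successes is a hypothetical name; f64::round rounds halves away from zero, which is what turns 4.5 into 5 and 3.5 into 4.

```rust
/// Seed successes for a stored score: round(score * 5).
/// The clamp to [0, 1] is an assumption of this sketch.
pub fn simulated_successes(score: f64) -> u32 {
    (score.clamp(0.0, 1.0) * 5.0).round() as u32
}
```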

Layer 1b: Auto-drafted skills

store.seed_pairs_for(&candidate_names, "auto_drafter", 1)

Same conversion, but for skills written by the SkillDrafter (scorer = "auto_drafter", score = 0.7). A score of 0.7 produces round(3.5) = 4 simulated successes: a confident prior, but one below a strong GEPA winner such as the 0.9 example above (5 successes). This pass cannot overwrite Layer 1, because restore only fills arms that are not already seeded.
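The "only fills arms not already seeded" rule maps naturally onto HashMap's entry API. This sketch assumes the stats map is keyed by skill name; restore here is a free function standing in for BetaScorer::restore, whose real signature may differ.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct Stats {
    pub success: u32,
    pub failure: u32,
}

/// Seeds each arm from `pairs` unless the arm already has stats.
pub fn restore(stats: &mut HashMap<String, Stats>, pairs: &[(String, Stats)]) {
    for (name, s) in pairs {
        // or_insert leaves an existing entry untouched, so earlier layers win.
        stats.entry(name.clone()).or_insert(*s);
    }
}
```

Running the GEPA layer first and the auto-drafter layer second therefore gives GEPA precedence without any explicit priority check.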

Layer 2: Prioritizer head-start

sk_router.seed_from_prioritizer(&candidate_names)

Fills any arm that neither GEPA nor the auto-drafter touched, using a usage-frequency ranking. Top-quartile skills get 3 simulated successes; the boost fades toward zero at the tail.
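The exact fade curve is not specified here, so the sketch below is entirely hypothetical: the top quartile (rounded up) gets 3 simulated successes, and the remaining ranks decay linearly to 0, floored to whole successes. Rank 0 is the most-used skill, and head_start is an invented name.

```rust
/// Hypothetical head-start: simulated successes for the skill at `rank`
/// (0 = most used) out of `total` candidates.
pub fn head_start(rank: usize, total: usize) -> u32 {
    if rank >= total {
        return 0;
    }
    let quartile = (total + 3) / 4; // top quartile, rounded up
    if rank < quartile {
        return 3;
    }
    let tail = total - quartile; // ranks below the top quartile (>= 1 here)
    let pos = rank - quartile; // 0 for the first rank below the top quartile
    // Linear fade: just under 3 at pos 0, exactly 0 at the last rank.
    (3 * (tail - 1 - pos) / tail) as u32
}
```

With eight candidates this yields 3, 3, 2, 2, 1, 1, 0, 0.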

Per-turn operation

At the start of each run() call, the engine calls choose with a SkillRouterInput containing the user input text and the list of candidate names. The winning skill name is stashed as current_skill_router_pick. At turn end, observe_skill_router_outcome maps StopReason to TaskOutcome and records it.
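A sketch of the turn lifecycle, assuming a simplified engine. StopReason's variants and the StopReason to TaskOutcome mapping below are hypothetical (the real mapping lives in observe_skill_router_outcome and may differ), and start_turn receives the pick directly instead of calling choose. Using Option::take at turn end clears the stash, so a pick is observed at most once even if the end-of-turn hook fires twice.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TaskOutcome {
    Success,
    Failure,
    Neutral,
}

/// Hypothetical stop reasons; the engine's real StopReason may differ.
#[derive(Debug, Clone, Copy)]
pub enum StopReason {
    EndTurn,
    Error,
    Cancelled,
}

/// Hypothetical mapping, for illustration only.
fn to_outcome(reason: StopReason) -> TaskOutcome {
    match reason {
        StopReason::EndTurn => TaskOutcome::Success,
        StopReason::Error => TaskOutcome::Failure,
        // A user cancel says little about the skill, so nothing is learned.
        StopReason::Cancelled => TaskOutcome::Neutral,
    }
}

#[derive(Default)]
pub struct Engine {
    pub current_skill_router_pick: Option<String>,
    /// Stands in for calling observe on the router.
    pub observed: Vec<(String, TaskOutcome)>,
}

impl Engine {
    /// Turn start: stash the router's pick (passed in directly here).
    pub fn start_turn(&mut self, pick: &str) {
        self.current_skill_router_pick = Some(pick.to_string());
    }

    /// Turn end: take() empties the stash, so each pick is observed at most once.
    pub fn observe_skill_router_outcome(&mut self, reason: StopReason) {
        if let Some(pick) = self.current_skill_router_pick.take() {
            self.observed.push((pick, to_outcome(reason)));
        }
    }
}
```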

The hint appended to the system prompt is a single, non-binding line, added only when the router picked a visible catalog skill. For an engine without a router, the system prompt is byte-identical to the pre-router version.
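A sketch of the hint rule. The hint wording and the function name are hypothetical; the point is that every path without a visible pick returns the base prompt unchanged.

```rust
/// Returns the system prompt, adding a one-line hint only for a visible pick.
pub fn build_system_prompt(base: &str, pick: Option<&str>, visible: &[&str]) -> String {
    match pick {
        Some(skill) if visible.iter().any(|v| *v == skill) => {
            // Hypothetical wording; the hint is advisory, not an instruction.
            format!("{}\nHint: the {} skill may be relevant to this request.", base, skill)
        }
        // No router, no pick, or a hidden skill: return the base prompt untouched.
        _ => base.to_string(),
    }
}
```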

How GEPA winners reach the router

The full path from an offline GEPA run to the per-turn router:

wcore-evolve binary
  → EvolveOutcome (winner retained, losers archived)
  → CuratorPort::submit → Decision::Promote
  → PromptStore::record_variant (evolved_prompts table)
      columns: skill_name, scorer="bench", score, generation, ...

Next session boot:
  → PromptStore::seed_pairs_for(candidates, "bench", 1)
  → round(score × 5) = simulated successes
  → BetaScorer::restore(pairs)
  → SkillRouter picks with informed prior

GEPA winners carry a scorer = "bench" tag. The seed_pairs_for query selects the single best-scoring row per skill for the given scorer, converts the score to a simulated success count, and returns (skill_name, Stats { success, failure: 0 }) pairs.
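An in-memory sketch of what the query computes, assuming a simplified row with only skill_name, scorer, and score. It does not model the third argument (1 at both call sites above), and the pairs are sorted by name only to make the output deterministic.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Stats {
    pub success: u32,
    pub failure: u32,
}

/// Simplified evolved_prompts row; the real table has more columns.
pub struct Row {
    pub skill_name: String,
    pub scorer: String,
    pub score: f64,
}

/// Best row per candidate skill for one scorer tag, returned as seed pairs.
pub fn seed_pairs_for(rows: &[Row], candidates: &[&str], scorer: &str) -> Vec<(String, Stats)> {
    let mut best: HashMap<&str, f64> = HashMap::new();
    for row in rows {
        let wanted = row.scorer == scorer && candidates.iter().any(|c| *c == row.skill_name);
        if !wanted {
            continue;
        }
        // Keep only the highest score seen for this skill.
        let top = best.entry(row.skill_name.as_str()).or_insert(row.score);
        if row.score > *top {
            *top = row.score;
        }
    }
    let mut pairs: Vec<(String, Stats)> = best
        .into_iter()
        .map(|(name, score)| {
            // round(score * 5) simulated successes, no failures.
            let success = (score * 5.0).round() as u32;
            (name.to_string(), Stats { success, failure: 0 })
        })
        .collect();
    pairs.sort_by(|a, b| a.0.cmp(&b.0));
    pairs
}
```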

How auto-drafted skills reach the router

The drafter path runs inside a live session rather than offline, but it ends the same way: it writes a row that the next session reads at boot.

Session N, turn end:
  observe_auto_skill → Bucketer (N=3 streak trigger)
  → SkillDrafter::draft
  → writes $WAYLAND_HOME/skills/auto/<sig>/SKILL.md
  → PromptStore::insert scorer="auto_drafter", score=0.7

Session N+1 boot:
  → seed_pairs_for(candidates, "auto_drafter", 1)
  → round(0.7 × 5) = 4 simulated successes
  → BetaScorer::restore  (only fills arms GEPA did not already seed)

The SkillDrafter also registers the new skill into the current session’s catalog immediately (using Box::leak for process-lifetime allocation), so the drafted skill is available for the rest of session N without waiting for a restart.
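Box::leak is the standard-library way to turn an owned value into a &'static reference that is never freed. The sketch assumes, without confirmation from the source, that the catalog stores skill names as &'static str, which would explain the need; Catalog and register_drafted are hypothetical. The leaked memory is reclaimed only when the process exits, which stays cheap as long as drafts are rare.

```rust
/// Hypothetical catalog that stores skill names as &'static str.
pub struct Catalog {
    pub skills: Vec<&'static str>,
}

impl Catalog {
    /// Registers a runtime-generated name by leaking it for the life of the process.
    pub fn register_drafted(&mut self, name: String) -> &'static str {
        // into_boxed_str drops spare capacity; Box::leak then yields a
        // &'static mut str that is never freed.
        let leaked: &'static str = Box::leak(name.into_boxed_str());
        self.skills.push(leaked);
        leaked
    }
}
```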

Router internals: Stats persistence and the UNIQUE constraint

PromptStore::record_variant enforces a UNIQUE constraint on (skill_name, generation, id). Callers must generate a fresh UUID for each insertion; re-inserting an identical row violates the constraint and returns an error. The seed_pairs_for query only reads, so boot seeding is safe to run any number of times.

Manual override syntax summary

Router	Override syntax	Scope
`TemplateRouter`	`@@template=<name>` in input text	Honored only if name is in configured arm set
`AgentRouter`	`@@agent=<name>` in input text	Honored only if name is in configured arm set
`SkillRouter`	none; CLI flag `--agent <name>` selects persona only	Picker is fully automatic
