Self-Evolution (GEPA)

Wayland Core includes two complementary self-improvement tracks. This page covers the offline evolutionary optimizer: wcore-evolve, which mutates and scores skill prompt bodies across generations to find variants that score higher than the parent. The in-session counterpart, which drafts new skills from recurring turn patterns, is covered in Autonomous Skill Drafting.

What GEPA does

GEPA (the name used in the codebase for this evolutionary loop) takes a seed skill, generates mutated children, scores each child with a deterministic scorer, retains the best child if it beats the parent, and repeats across generations. Losers are archived to a graveyard directory. Winners pass through a CuratorPort and, if promoted, are written to the evolved_prompts SQLite table so the next agent session’s SkillRouter is seeded with their scores.

Scoring never calls an LLM. The wcore_eval::DefaultScorer uses locked constants, making every run reproducible given the same seed skill and generation parameters.

Running the optimizer

The wcore-evolve binary is the entry point. A typical invocation:

wcore-evolve \
  --seed-file path/to/skill.md \
  --seed-name my-skill \
  --generations 5 \
  --fan-out 4 \
  --plateau-window 3 \
  --plateau-min-delta 0.01 \
  --child-timeout-secs 30 \
  --graveyard-root ~/.local/share/wayland/evolve/graveyard

--seed-file and --seed-name are the only required inputs; every other flag has a default.

Output is one key=value per line:

Key	Meaning
`run_id`	Unique identifier for this run
`parent_id`	The seed skill name
`generations_run`	How many generations completed before termination
`termination`	Why the loop stopped (see below)
`parent_score`	Scorer result for the unmodified seed
`best_score`	Best score achieved by any child
`losers_archived`	Count of entries written to the graveyard
`graveyard_root`	Path where loser entries were written
`curator_decision`	`Promote` or `Archive` from the `CuratorPort`
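
Because the output is plain key=value lines, a wrapper script can read it without a JSON parser. The following is a minimal sketch, not part of wcore-evolve: the helper names parse_summary and improved are hypothetical, and it assumes the score values parse as decimal floats.

```rust
use std::collections::HashMap;

/// Parse `key=value` lines into a map. Lines without `=` are ignored.
/// Only the first `=` splits, so a value may itself contain `=`.
fn parse_summary(output: &str) -> HashMap<String, String> {
    output
        .lines()
        .filter_map(|line| line.split_once('='))
        .map(|(k, v)| (k.trim().to_string(), v.trim().to_string()))
        .collect()
}

/// Did any child beat the seed? Returns None if either score is
/// missing or is not a number.
fn improved(summary: &HashMap<String, String>) -> Option<bool> {
    let parent: f64 = summary.get("parent_score")?.parse().ok()?;
    let best: f64 = summary.get("best_score")?.parse().ok()?;
    Some(best > parent)
}
```

A caller would capture the binary's stdout, pass it to parse_summary, and then branch on the termination or curator_decision keys.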

The generation loop

The main loop is the evolve(params: EvolveParams) function in crates/wcore-evolve/src/evolve/mod.rs.

Each generation:

Checks budget exhaustion. Exits with BudgetExhausted if the budget is spent.
Picks a mutator by round-robin: mutators[generation_index % mutators.len()]. The CLI default order is [Paraphrase, Reorder, SwapSynonym, Precondition].
Runs fan_out children concurrently, each with a timeout of child_timeout_secs.
Scores every child via wcore_eval::DefaultScorer.
Retains a child only if its score beats both the current best and the original parent score.
Archives all non-retained children (and any displaced prior best) to the graveyard.
Pushes the generation’s top score to the plateau detector.
Checks for termination.

If every child in a generation times out, that generation is skipped for plateau purposes. The detector receives no sample at all rather than a synthetic floor score, so a generation lost to timeouts cannot trigger a false plateau.
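
The selection and retention rules above can be sketched in a few lines. This is an illustration only: concurrency, timeouts, and the graveyard are left out, a timed-out child is modelled as None, and the function names are hypothetical rather than taken from the crate.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum MutatorKind {
    Paraphrase,
    Reorder,
    SwapSynonym,
    Precondition,
}

/// CLI default rotation order, as listed above.
const ROTATION: [MutatorKind; 4] = [
    MutatorKind::Paraphrase,
    MutatorKind::Reorder,
    MutatorKind::SwapSynonym,
    MutatorKind::Precondition,
];

/// Round-robin pick: mutators[generation_index % mutators.len()].
fn pick_mutator(generation_index: usize) -> MutatorKind {
    ROTATION[generation_index % ROTATION.len()]
}

/// Retention rule for one generation. A child is retained only if it
/// beats both the parent score and the current best. Returns the index
/// of the retained child, if any, and updates `best`.
fn retain_best(
    child_scores: &[Option<f64>],
    parent_score: f64,
    best: &mut Option<f64>,
) -> Option<usize> {
    let mut winner = None;
    for (i, score) in child_scores.iter().enumerate() {
        if let Some(s) = *score {
            let beats_best = best.map_or(true, |b| s > b);
            if s > parent_score && beats_best {
                *best = Some(s);
                winner = Some(i);
            }
        }
    }
    winner
}

/// Top score to hand to the plateau detector, or None when every child
/// timed out (the detector then sees no sample).
fn generation_top(child_scores: &[Option<f64>]) -> Option<f64> {
    child_scores
        .iter()
        .flatten()
        .copied()
        .fold(None, |acc, s| match acc {
            Some(a) if a >= s => Some(a),
            _ => Some(s),
        })
}
```

With the default rotation, generation 5 wraps around to Reorder, and a generation whose children all time out yields None from generation_top.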

Mutators

All four mutators are in crates/wcore-evolve/src/mutator/. Each is deterministically seeded from the triple (parent_hash, generation, child_index) via blake3 into a ChaCha20Rng. The same triple always produces byte-identical output, so any graveyard entry can be regenerated from its lineage.
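
To illustrate why a fixed triple gives a repeatable mutation, here is a dependency-free sketch of a seeded shuffle. It is not the crate's code: FNV-1a stands in for blake3 and SplitMix64 stands in for ChaCha20Rng purely to avoid external crates, and the function names are hypothetical.

```rust
/// FNV-1a over a byte slice. Stand-in for blake3, NOT what the crate uses.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in bytes {
        h ^= *b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Derive a seed from the lineage triple (parent_hash, generation, child_index).
fn lineage_seed(parent_hash: &str, generation: u32, child_index: u32) -> u64 {
    let mut buf = parent_hash.as_bytes().to_vec();
    buf.extend_from_slice(&generation.to_le_bytes());
    buf.extend_from_slice(&child_index.to_le_bytes());
    fnv1a(&buf)
}

/// One SplitMix64 step. Stand-in for ChaCha20Rng.
fn next_u64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E3779B97F4A7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
    z ^ (z >> 31)
}

/// Fisher-Yates shuffle driven by the seeded generator.
/// Modulo bias is ignored for brevity.
fn shuffle_steps(steps: &mut [String], seed: u64) {
    let mut state = seed;
    for i in (1..steps.len()).rev() {
        let j = (next_u64(&mut state) % (i as u64 + 1)) as usize;
        steps.swap(i, j);
    }
}
```

Calling shuffle_steps twice with the seed from the same triple yields the same order, which is the property that lets a mutation be replayed from its lineage.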

Mutator	What it changes	LLM used
`Reorder`	Shuffles the `## Steps` list	No
`SwapSynonym`	Replaces one word from a static synonym table	No
`Precondition`	Adds or drops one `## Preconditions` row	No
`Paraphrase`	Rewrites the body via a `ParaphraseProvider`	Yes (LLM call)

Only Paraphrase touches an LLM. In tests it uses a FixtureProvider that replays a static response, making the test suite fully deterministic. In production it calls the configured provider.

Reorder requires a ## Steps section in the skill body. Precondition requires a ## Preconditions section. If the required section is absent, the mutator returns MutationError and that child is skipped rather than crashing the run.
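
The skip-on-missing-section behaviour might look roughly like the sketch below. Only the name MutationError comes from this page; the MissingSection variant, the helper functions, and the toy reversal used in place of a real shuffle are assumptions for illustration.

```rust
#[derive(Debug, PartialEq)]
enum MutationError {
    /// Hypothetical variant: the required heading was not found.
    MissingSection(&'static str),
}

/// Return the non-empty lines under `heading`, up to the next `## `
/// heading, or an error if `heading` is absent.
fn section_lines<'a>(
    body: &'a str,
    heading: &'static str,
) -> Result<Vec<&'a str>, MutationError> {
    let mut lines = body.lines();
    // `any` advances the iterator just past the matching heading.
    if !lines.any(|l| l.trim() == heading) {
        return Err(MutationError::MissingSection(heading));
    }
    Ok(lines
        .take_while(|l| !l.starts_with("## "))
        .filter(|l| !l.trim().is_empty())
        .collect())
}

/// Toy Reorder: reverses the steps (the real mutator shuffles them).
fn reorder(body: &str) -> Result<Vec<String>, MutationError> {
    let mut steps: Vec<String> = section_lines(body, "## Steps")?
        .into_iter()
        .map(str::to_string)
        .collect();
    steps.reverse();
    Ok(steps)
}

/// Apply the mutator to each candidate body, skipping failures
/// instead of aborting the whole batch.
fn mutate_all(bodies: &[&str]) -> Vec<Vec<String>> {
    bodies.iter().filter_map(|b| reorder(b).ok()).collect()
}
```

A body without a ## Steps heading produces an error value, and mutate_all simply drops that child while the others proceed.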

Plateau heuristic

The plateau detector (crates/wcore-evolve/src/evolve/plateau.rs) terminates the loop when there has been no meaningful improvement across a rolling window of generations.

Default parameters (also used as the CLI defaults):

Parameter	Default	Meaning
`plateau_window`	`3`	Number of generations to look back
`plateau_min_delta`	`0.01`	Minimum score improvement required to avoid plateau

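For every mutator to get at least one attempt before the detector can declare a plateau, the window must be at least as large as the number of mutator strategies in rotation. Note that the default window of 3 is smaller than the default rotation of four mutators, so under the defaults a plateau may be declared before the fourth mutator has run; pass --plateau-window 4 to cover the full rotation.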

Non-finite scores (NaN, +inf, -inf) are rejected at push time. If a generation produces a non-finite top score, the loop terminates with TerminationReason::ScoreInvalid rather than hanging, and the offending score’s IEEE 754 bit pattern is included in the output for diagnostics.
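
A rolling-window detector with non-finite rejection can be sketched as follows. The exact comparison used in plateau.rs is not described on this page, so the rule here (the window is full and its best score improves on its oldest score by less than min_delta) is an assumption, as are the type and method names.

```rust
use std::collections::VecDeque;

#[derive(Debug, PartialEq)]
enum PushError {
    /// Carries the IEEE 754 bit pattern of the rejected score.
    NonFinite(u64),
}

struct PlateauDetector {
    window: usize,
    min_delta: f64,
    recent: VecDeque<f64>,
}

impl PlateauDetector {
    fn new(window: usize, min_delta: f64) -> Self {
        Self { window, min_delta, recent: VecDeque::with_capacity(window) }
    }

    /// Record one generation's top score. NaN and infinities are rejected.
    fn push(&mut self, score: f64) -> Result<(), PushError> {
        if !score.is_finite() {
            return Err(PushError::NonFinite(score.to_bits()));
        }
        if self.recent.len() == self.window {
            self.recent.pop_front();
        }
        self.recent.push_back(score);
        Ok(())
    }

    /// Assumed rule: plateau once the window is full and its best score
    /// improves on its oldest score by less than `min_delta`.
    fn is_plateau(&self) -> bool {
        if self.window == 0 || self.recent.len() < self.window {
            return false;
        }
        let oldest = self.recent[0];
        let best = self.recent.iter().copied().fold(f64::NEG_INFINITY, f64::max);
        best - oldest < self.min_delta
    }
}
```

A generation in which every child timed out would simply not call push, which matches the "no sample" behaviour described earlier.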

Termination reasons

Reason	When it fires
`GenerationCeiling`	`--generations` limit reached with at least one improvement
`NoImprovementFound`	`--generations` limit reached, no child ever beat the parent
`Plateau`	Rolling window showed no improvement above `min_delta`
`BudgetExhausted`	Budget spent before the generation ceiling
`ScoreInvalid`	A non-finite top score was produced in a generation
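
The table can be read as a small decision function. The sketch below uses the reason names from the table, but the LoopState struct and the order in which the conditions are checked are assumptions; the page only states that budget exhaustion is checked at the start of a generation.

```rust
#[derive(Debug, PartialEq)]
enum TerminationReason {
    GenerationCeiling,
    NoImprovementFound,
    Plateau,
    BudgetExhausted,
    ScoreInvalid,
}

/// Hypothetical snapshot of the loop after a generation.
#[derive(Default)]
struct LoopState {
    generations_done: u32,
    max_generations: u32,
    budget_spent: bool,
    score_invalid: bool,
    plateau: bool,
    improved: bool, // true once any child has beaten the parent
}

/// Returns a reason to stop, or None to continue. Check order is assumed.
fn check_termination(s: &LoopState) -> Option<TerminationReason> {
    if s.budget_spent {
        return Some(TerminationReason::BudgetExhausted);
    }
    if s.score_invalid {
        return Some(TerminationReason::ScoreInvalid);
    }
    if s.plateau {
        return Some(TerminationReason::Plateau);
    }
    if s.generations_done >= s.max_generations {
        return Some(if s.improved {
            TerminationReason::GenerationCeiling
        } else {
            TerminationReason::NoImprovementFound
        });
    }
    None
}
```

The last branch shows the only difference between GenerationCeiling and NoImprovementFound: whether any child ever beat the parent before the limit was reached.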

Graveyard

Every non-retained child (and every displaced prior best) is archived as a JSON file:

<graveyard_root>/<run_id>/<generation>/<child_index>.json

Default graveyard roots by platform:

Platform	Default path
macOS	`~/Library/Application Support/wayland/evolve/graveyard`
Linux	`~/.local/share/wayland/evolve/graveyard`
Windows	`%APPDATA%\wayland\evolve\graveyard`

Each entry records the run_id, generation, child_index, parent_id, mutation_kind, score, and a 512-character excerpt of the mutated body. The full body is not stored, but it can be regenerated from the lineage triple and the original seed.
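
A tool that inspects the graveyard needs only the path layout and the excerpt rule. The following sketch uses hypothetical helper names and assumes "512-character" means 512 Unicode scalar values rather than bytes, which this page does not specify.

```rust
use std::path::{Path, PathBuf};

/// Build <graveyard_root>/<run_id>/<generation>/<child_index>.json.
fn graveyard_path(root: &Path, run_id: &str, generation: u32, child_index: u32) -> PathBuf {
    root.join(run_id)
        .join(generation.to_string())
        .join(format!("{}.json", child_index))
}

/// First 512 characters of the mutated body. Counting by `char` avoids
/// slicing through the middle of a multi-byte UTF-8 sequence.
fn excerpt(body: &str) -> String {
    body.chars().take(512).collect()
}
```

For example, run "run-1", generation 2, child 0 under a root of /tmp/graveyard maps to /tmp/graveyard/run-1/2/0.json.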

CuratorPort handoff

Winners do not go directly to the live skill catalog. They pass through the CuratorPort trait (crates/wcore-evolve/src/curator_handoff.rs), which exposes one method: submit(body, lineage) -> Decision { Promote | Archive }. The wcore-evolve crate does not depend on wcore-skills; the adapter that wires the two together lives in the binary, not the library.
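
The shape of such a port can be sketched as below. Only the method name submit, its two arguments, and the Promote/Archive decision come from this page; the Lineage fields, the &self receiver, the synchronous signature, and the threshold policy are all assumptions made for illustration.

```rust
#[derive(Debug, PartialEq)]
enum Decision {
    Promote,
    Archive,
}

/// Hypothetical lineage record; the real type's fields are not shown here.
#[allow(dead_code)]
struct Lineage {
    generation: u32,
    child_index: u32,
    score: f64,
}

/// The library depends only on this trait, never on the skill catalog.
trait CuratorPort {
    fn submit(&self, body: &str, lineage: &Lineage) -> Decision;
}

/// Example adapter with a made-up policy: promote non-empty bodies
/// whose score reaches a floor, archive everything else.
struct ThresholdCurator {
    min_score: f64,
}

impl CuratorPort for ThresholdCurator {
    fn submit(&self, body: &str, lineage: &Lineage) -> Decision {
        if !body.trim().is_empty() && lineage.score >= self.min_score {
            Decision::Promote
        } else {
            Decision::Archive
        }
    }
}

/// Library-side call site: it sees only the trait object.
fn hand_off(curator: &dyn CuratorPort, body: &str, lineage: &Lineage) -> Decision {
    curator.submit(body, lineage)
}
```

Keeping the trait in the library and the concrete adapter in the binary is what lets the two crates stay decoupled.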

PromptStore: the bridge to the online router

The PromptStore (crates/wcore-evolve/src/prompt_store.rs) persists evolved winners to the evolved_prompts SQLite table, managed by wcore-memory. This table is the connection between the offline GEPA run and the online per-turn SkillRouter.

Key columns in evolved_prompts:

Column	Purpose
`skill_name`	Identifies which skill this variant evolved from
`parent_id`	Optional pointer to the prior winner that seeded this variant
`prompt_body`	The full mutated skill body
`score`	`dimensions.combined` from `DefaultScorer`, or `pass_ratio` from `BenchScorer`
`scorer`	Stable string identifier: `"bench"`, `"default"`, or `"auto_drafter"`
`generation`	Zero-based generation index when this variant was produced
`created_at`	Unix seconds (UTC)
`metadata`	Optional JSON blob for extras such as termination reason or mutator kind

Each row uses a UUID v4 primary key. Inserting a duplicate (skill_name, generation, id) is a hard error; callers must generate fresh UUIDs.
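
As a mental model, a row and the duplicate rule can be mirrored in memory. This sketch is not the wcore-memory schema: the Rust field types are guesses from the column descriptions, the id is kept as plain text, and InMemoryStore is a hypothetical stand-in for the SQLite table.

```rust
use std::collections::HashSet;

/// One row of `evolved_prompts`; field types are assumed.
#[allow(dead_code)]
struct EvolvedPrompt {
    id: String, // UUID v4, generated by the caller
    skill_name: String,
    parent_id: Option<String>,
    prompt_body: String,
    score: f64,
    scorer: String, // "bench", "default", or "auto_drafter"
    generation: u32, // zero-based
    created_at: i64, // Unix seconds (UTC)
    metadata: Option<String>, // optional JSON blob
}

#[derive(Default)]
struct InMemoryStore {
    keys: HashSet<(String, u32, String)>,
    rows: Vec<EvolvedPrompt>,
}

impl InMemoryStore {
    /// Reject a duplicate (skill_name, generation, id) instead of overwriting.
    fn insert(&mut self, row: EvolvedPrompt) -> Result<(), String> {
        let key = (row.skill_name.clone(), row.generation, row.id.clone());
        if !self.keys.insert(key) {
            return Err(format!("duplicate row id {}", row.id));
        }
        self.rows.push(row);
        Ok(())
    }
}
```

Reusing an id for the same skill and generation fails, which is why callers are expected to mint a fresh UUID for every insert.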

seed_pairs_for: converting scores to router seeds

PromptStore::seed_pairs_for(candidates, scorer, limit) (prompt_store.rs:163) is the seam between wcore-evolve and wcore-skills. It takes a list of skill names, looks up the top-scored winner for each from the store, and converts the score to a simulated-success count for the Thompson Beta sampler:

simulated_successes = round(clamp(score, 0.0, 1.0) * 5)

A score of 1.0 becomes 5 simulated successes. The auto_drafter score of 0.7 scales to 3.5, which rounds up to 4. Skills with no winner row, or whose scaled value rounds to zero (scores below 0.1), are skipped.
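
The conversion is small enough to check by hand. Here is a sketch of the formula with hypothetical function names; it relies on Rust's f64::round, which rounds halfway cases away from zero, so 3.5 becomes 4 as stated above.

```rust
/// simulated_successes = round(clamp(score, 0.0, 1.0) * 5)
fn simulated_successes(score: f64) -> u32 {
    // A NaN score stays NaN through clamp and round, then casts to 0,
    // so it ends up skipped like any other zero.
    (score.clamp(0.0, 1.0) * 5.0).round() as u32
}

/// Convert (skill name, top score) pairs into router seeds,
/// dropping any skill whose scaled value is zero.
fn seed_pairs(winners: &[(&str, f64)]) -> Vec<(String, u32)> {
    winners
        .iter()
        .map(|(name, score)| ((*name).to_string(), simulated_successes(*score)))
        .filter(|(_, successes)| *successes > 0)
        .collect()
}
```

Scores outside the unit interval are clamped first, so 3.2 still yields 5 and a negative score yields 0 and is skipped.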

At bootstrap, seeding happens in layered priority order:

GEPA bench winners (scorer = "bench") are seeded first.
Auto-drafted skills (scorer = "auto_drafter") are seeded second.
A prioritizer provides head-start arms for skills touched by neither.

This means a GEPA-evolved variant always takes precedence over a draft of the same skill name.
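
One simple way to get this precedence is first-writer-wins insertion, layer by layer. The sketch below is an assumption about the mechanism, not the bootstrap code itself; SeedSource, layer_seeds, and the example head-start values are hypothetical.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum SeedSource {
    Bench,
    AutoDrafter,
    Prioritizer,
}

/// Merge seeds in priority order. A skill seeded by an earlier layer
/// is never overwritten by a later one.
fn layer_seeds(
    bench: &[(&str, u32)],
    drafted: &[(&str, u32)],
    prioritized: &[(&str, u32)],
) -> HashMap<String, (SeedSource, u32)> {
    let mut seeds = HashMap::new();
    let layers = [
        (SeedSource::Bench, bench),
        (SeedSource::AutoDrafter, drafted),
        (SeedSource::Prioritizer, prioritized),
    ];
    for (source, pairs) in layers {
        for (name, successes) in pairs {
            // or_insert keeps the existing, higher-priority entry.
            seeds
                .entry((*name).to_string())
                .or_insert((source, *successes));
        }
    }
    seeds
}
```

If "deploy" has both a bench winner and a draft, only the bench seed survives; the draft seed is used only for skills with no bench row.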

Evolution event

When capabilities.gepa_enabled is advertised by the host, the agent emits one EvolutionEvent per scored child on the JSON-stream protocol:

{
  "run_id": "...",
  "generation": 2,
  "parent_id": "my-skill",
  "child_id": "my-run/2/0",
  "mutation_kind": "Reorder",
  "score": 0.83,
  "retained": true
}

This is independent of structured_traces: a host can receive evolution events without enabling full trace output.
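
On the host side, the event maps naturally onto a plain struct. The sketch below mirrors the fields in the example payload with assumed Rust types; JSON decoding is left out (a real host would likely use a JSON library), and best_retained is a hypothetical helper.

```rust
/// Mirror of the event payload shown above; field types are assumed.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct EvolutionEvent {
    run_id: String,
    generation: u32,
    parent_id: String,
    child_id: String,
    mutation_kind: String,
    score: f64,
    retained: bool,
}

/// Highest-scoring retained event, or None if no child was retained.
fn best_retained(events: &[EvolutionEvent]) -> Option<&EvolutionEvent> {
    events
        .iter()
        .filter(|e| e.retained)
        .max_by(|a, b| {
            a.score
                .partial_cmp(&b.score)
                .unwrap_or(std::cmp::Ordering::Equal)
        })
}
```

Since one event is emitted per scored child, a host can track the leading variant of a run by folding events through a helper like this as they arrive.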