Self-Evolution (GEPA)

Wayland Core includes two complementary self-improvement tracks. This page covers the offline evolutionary optimizer: wcore-evolve, which mutates and scores skill prompt bodies across generations to find variants that score higher than the parent. The in-session counterpart, which drafts new skills from recurring turn patterns, is covered in Autonomous Skill Drafting.

GEPA (the name used in the codebase for this evolutionary loop) takes a seed skill, generates mutated children, scores each child with a deterministic scorer, retains the best child if it beats the parent, and repeats across generations. Losers are archived to a graveyard directory. Winners pass through a CuratorPort and, if promoted, are written to the evolved_prompts SQLite table so the next agent session’s SkillRouter is seeded with their scores.

Scoring never calls an LLM. The wcore_eval::DefaultScorer uses locked constants, so the same skill body always receives the same score and scoring is reproducible across runs.

The wcore-evolve binary is the entry point. A typical invocation:

wcore-evolve \
  --seed-file path/to/skill.md \
  --seed-name my-skill \
  --generations 5 \
  --fan-out 4 \
  --plateau-window 3 \
  --plateau-min-delta 0.01 \
  --child-timeout-secs 30 \
  --graveyard-root ~/.local/share/wayland/evolve/graveyard

--seed-file and --seed-name are the only required inputs. All other flags have defaults.

Output is one key=value per line:

| Key | Meaning |
| --- | --- |
| run_id | Unique identifier for this run |
| parent_id | The seed skill name |
| generations_run | How many generations completed before termination |
| termination | Why the loop stopped (see below) |
| parent_score | Scorer result for the unmodified seed |
| best_score | Best score achieved by any child |
| losers_archived | Count of entries written to the graveyard |
| graveyard_root | Path where loser entries were written |
| curator_decision | Promote or Archive from the CuratorPort |
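
Because the report is plain key=value text, a wrapper script can consume it without a JSON parser. The following is a minimal sketch of one way to do that; parse_report is a hypothetical helper, not part of wcore-evolve, and the sample values in the usage note are invented for illustration.

```rust
use std::collections::HashMap;

/// Parse `key=value` lines into a map.
/// Hypothetical helper: lines without an `=` are skipped, and only the
/// first `=` splits, so values that contain `=` survive intact.
fn parse_report(stdout: &str) -> HashMap<String, String> {
    stdout
        .lines()
        .filter_map(|line| line.split_once('='))
        .map(|(key, value)| (key.trim().to_string(), value.trim().to_string()))
        .collect()
}
```

Given captured output such as "termination=Plateau" and "best_score=0.83", a caller could read report["termination"] directly and parse report["best_score"] as an f64 before comparing it with parent_score.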

The main loop is in crates/wcore-evolve/src/evolve/mod.rs (evolve(params: EvolveParams)).

Each generation:

  1. Checks budget exhaustion. Exits with BudgetExhausted if the budget is spent.
  2. Picks a mutator by round-robin: mutators[generation_index % mutators.len()]. The CLI default order is [Paraphrase, Reorder, SwapSynonym, Precondition].
  3. Runs fan_out children concurrently, each with a timeout of child_timeout_secs.
  4. Scores every child via wcore_eval::DefaultScorer.
  5. Retains a child only if its score beats both the current best and the original parent score.
  6. Archives all non-retained children (and any displaced prior best) to the graveyard.
  7. Pushes the generation’s top score to the plateau detector.
  8. Checks for termination.

If every child in a generation times out, that generation is skipped for plateau purposes: the detector receives no sample rather than a synthetic floor, which prevents a single noisy generation from triggering a false plateau.
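
The selection rules in the steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the code in evolve/mod.rs: MutationKind, pick_mutator, select_retained, and generation_top_score are hypothetical names, and a timed-out child is modelled as None.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum MutationKind {
    Paraphrase,
    Reorder,
    SwapSynonym,
    Precondition,
}

/// CLI default rotation order, as listed in step 2.
const DEFAULT_ORDER: [MutationKind; 4] = [
    MutationKind::Paraphrase,
    MutationKind::Reorder,
    MutationKind::SwapSynonym,
    MutationKind::Precondition,
];

/// Round-robin choice: mutators[generation_index % mutators.len()].
fn pick_mutator(generation_index: usize) -> MutationKind {
    DEFAULT_ORDER[generation_index % DEFAULT_ORDER.len()]
}

/// Return (child_index, score) of the child to retain, if any.
/// A child qualifies only when it beats both the current best and the
/// original parent score. `None` entries are children that timed out.
fn select_retained(
    child_scores: &[Option<f64>],
    current_best: f64,
    parent_score: f64,
) -> Option<(usize, f64)> {
    let bar = current_best.max(parent_score);
    child_scores
        .iter()
        .enumerate()
        .filter_map(|(index, score)| score.map(|s| (index, s)))
        .filter(|&(_, s)| s > bar)
        .max_by(|a, b| a.1.total_cmp(&b.1))
}

/// Top score of the generation for the plateau detector.
/// `None` when every child timed out, so no sample is pushed.
fn generation_top_score(child_scores: &[Option<f64>]) -> Option<f64> {
    child_scores
        .iter()
        .flatten()
        .copied()
        .max_by(|a, b| a.total_cmp(b))
}
```

With the default order, generation index 5 selects Reorder, and a generation whose children all time out yields None from generation_top_score.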

All four mutators are in crates/wcore-evolve/src/mutator/. Each is deterministically seeded from the triple (parent_hash, generation, child_index) via blake3 into a ChaCha20Rng. The same triple always produces byte-identical output, so any graveyard entry can be regenerated from its lineage.
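
The idea of seeding each child from its lineage triple can be illustrated with the standard library alone. Note the substitutions: this sketch uses FNV-1a in place of blake3 and a SplitMix64 generator in place of ChaCha20Rng, purely so that it runs without external crates. All function and type names here are hypothetical, and the byte layout of the triple is an assumption.

```rust
/// FNV-1a 64-bit hash. Stand-in for blake3 in this sketch only.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &byte in bytes {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Derive one seed from (parent_hash, generation, child_index).
/// The little-endian concatenation is an assumed encoding.
fn child_seed(parent_hash: &[u8], generation: u32, child_index: u32) -> u64 {
    let mut buf = Vec::with_capacity(parent_hash.len() + 8);
    buf.extend_from_slice(parent_hash);
    buf.extend_from_slice(&generation.to_le_bytes());
    buf.extend_from_slice(&child_index.to_le_bytes());
    fnv1a64(&buf)
}

/// SplitMix64 generator. Stand-in for ChaCha20Rng in this sketch only.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Fisher-Yates shuffle driven by the seeded generator, in the spirit
/// of a Reorder-style mutation over a list of steps.
fn shuffle_steps(steps: &mut [&str], seed: u64) {
    let mut rng = SplitMix64(seed);
    for i in (1..steps.len()).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        steps.swap(i, j);
    }
}
```

The property that matters is that shuffling the same steps twice with the same triple gives the same order, which is what allows a graveyard entry to be regenerated from its lineage.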

| Mutator | What it changes | LLM used |
| --- | --- | --- |
| Reorder | Shuffles the ## Steps list | No |
| SwapSynonym | Replaces one word from a static synonym table | No |
| Precondition | Adds or drops one ## Preconditions row | No |
| Paraphrase | Rewrites the body via a ParaphraseProvider | Yes (LLM call) |

Only Paraphrase touches an LLM. In tests it uses a FixtureProvider that replays a static response, making the test suite fully deterministic. In production it calls the configured provider.

Reorder requires a ## Steps section in the skill body. Precondition requires a ## Preconditions section. If the required section is absent, the mutator returns MutationError and that child is skipped rather than crashing the run.

The plateau detector (crates/wcore-evolve/src/evolve/plateau.rs) terminates the loop when there has been no meaningful improvement across a rolling window of generations.

Default parameters (also used as the CLI defaults):

| Parameter | Default | Meaning |
| --- | --- | --- |
| plateau_window | 3 | Number of generations to look back |
| plateau_min_delta | 0.01 | Minimum score improvement required to avoid plateau |

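The window should be at least as large as the number of mutator strategies in rotation, so that each mutator gets at least one attempt before the detector can declare a plateau. The default window = 3 is smaller than the default four-mutator rotation, so pass --plateau-window 4 or higher if every mutator must get a turn within a single window.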

Non-finite scores (NaN, +inf, -inf) are rejected at push time. If a generation produces a non-finite top score, the loop terminates with TerminationReason::ScoreInvalid rather than hanging, and the offending score’s IEEE 754 bit pattern is included in the output for diagnostics.
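
A rolling-window detector with push-time rejection might look like the sketch below. The plateau criterion used here (the window is full and the spread between its highest and lowest score is below min_delta) is an assumption made for illustration; the actual rule lives in plateau.rs. PlateauDetector and PushError are hypothetical names.

```rust
use std::collections::VecDeque;

#[derive(Debug, PartialEq)]
enum PushError {
    /// Carries the IEEE 754 bit pattern of the rejected score.
    NonFinite(u64),
}

struct PlateauDetector {
    window: usize,
    min_delta: f64,
    scores: VecDeque<f64>,
}

impl PlateauDetector {
    fn new(window: usize, min_delta: f64) -> Self {
        // A zero-sized window is treated as 1 so the buffer stays bounded.
        let window = window.max(1);
        Self { window, min_delta, scores: VecDeque::with_capacity(window) }
    }

    /// Record a generation's top score, rejecting NaN and infinities.
    fn push(&mut self, score: f64) -> Result<(), PushError> {
        if !score.is_finite() {
            return Err(PushError::NonFinite(score.to_bits()));
        }
        if self.scores.len() == self.window {
            self.scores.pop_front();
        }
        self.scores.push_back(score);
        Ok(())
    }

    /// Assumed criterion: full window whose score spread is below min_delta.
    fn is_plateau(&self) -> bool {
        if self.scores.len() < self.window {
            return false;
        }
        let min = self.scores.iter().copied().fold(f64::INFINITY, f64::min);
        let max = self.scores.iter().copied().fold(f64::NEG_INFINITY, f64::max);
        max - min < self.min_delta
    }
}
```

A caller would map a PushError to the ScoreInvalid termination and print the carried bit pattern, and would skip push entirely for a generation in which every child timed out.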

| Reason | When it fires |
| --- | --- |
| GenerationCeiling | --generations limit reached with at least one improvement |
| NoImprovementFound | --generations limit reached, no child ever beat the parent |
| Plateau | Rolling window showed no improvement above min_delta |
| BudgetExhausted | Budget spent before the generation ceiling |
| ScoreInvalid | A non-finite top score was produced in a generation |

Every non-retained child (and every displaced prior best) is archived as a JSON file:

<graveyard_root>/<run_id>/<generation>/<child_index>.json
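
As a small illustration of that layout, the path for one entry can be assembled with std::path. The helper name graveyard_entry_path is hypothetical, and the integer types chosen for generation and child_index are assumptions.

```rust
use std::path::{Path, PathBuf};

/// Build <graveyard_root>/<run_id>/<generation>/<child_index>.json.
/// Hypothetical helper; uses the platform's path separator.
fn graveyard_entry_path(
    root: &Path,
    run_id: &str,
    generation: u32,
    child_index: u32,
) -> PathBuf {
    root.join(run_id)
        .join(generation.to_string())
        .join(format!("{child_index}.json"))
}
```

Joining components rather than formatting a single string keeps the separator correct on Windows as well as on macOS and Linux.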

Default graveyard roots by platform:

| Platform | Default path |
| --- | --- |
| macOS | ~/Library/Application Support/wayland/evolve/graveyard |
| Linux | ~/.local/share/wayland/evolve/graveyard |
| Windows | %APPDATA%\wayland\evolve\graveyard |

Each entry records the run_id, generation, child_index, parent_id, mutation_kind, score, and a 512-character excerpt of the mutated body. The full body is not stored, but it can be regenerated from the lineage triple and the original seed.

Winners do not go directly to the live skill catalog. They pass through the CuratorPort trait (crates/wcore-evolve/src/curator_handoff.rs), which exposes one method: submit(body, lineage) -> Decision { Promote | Archive }. The wcore-evolve crate does not depend on wcore-skills; the adapter that wires the two together lives in the binary, not the library.
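
The shape of that seam can be sketched as a one-method trait plus an adapter. Only the method name submit and the Promote and Archive variants come from the description above; the receiver type, the fields of Lineage, and the ThresholdCurator adapter are assumptions for illustration, and the real trait may differ (for example, it could be async).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Decision {
    Promote,
    Archive,
}

/// Hypothetical lineage record; the real fields are not documented here.
struct Lineage {
    run_id: String,
    generation: u32,
    child_index: u32,
    score: f64,
}

/// One-method port: the library hands over a winner and gets a verdict.
trait CuratorPort {
    fn submit(&mut self, body: &str, lineage: &Lineage) -> Decision;
}

/// Hypothetical adapter that promotes any winner at or above a threshold.
/// In the real system the adapter lives in the binary and talks to the
/// skill catalog; here it only records promoted bodies in memory.
struct ThresholdCurator {
    min_score: f64,
    promoted: Vec<String>,
}

impl CuratorPort for ThresholdCurator {
    fn submit(&mut self, body: &str, lineage: &Lineage) -> Decision {
        if lineage.score >= self.min_score {
            self.promoted.push(body.to_string());
            Decision::Promote
        } else {
            Decision::Archive
        }
    }
}
```

Keeping the trait in the library and the adapter in the binary is what lets wcore-evolve avoid a dependency on wcore-skills: the library only ever sees the trait.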

PromptStore: the bridge to the online router

The PromptStore (crates/wcore-evolve/src/prompt_store.rs) persists evolved winners to the evolved_prompts SQLite table, managed by wcore-memory. This table is the connection between the offline GEPA run and the online per-turn SkillRouter.

Key columns in evolved_prompts:

| Column | Purpose |
| --- | --- |
| skill_name | Identifies which skill this variant evolved from |
| parent_id | Optional pointer to the prior winner that seeded this variant |
| prompt_body | The full mutated skill body |
| score | dimensions.combined from DefaultScorer, or pass_ratio from BenchScorer |
| scorer | Stable string identifier: "bench", "default", or "auto_drafter" |
| generation | Zero-based generation index when this variant was produced |
| created_at | Unix seconds (UTC) |
| metadata | Optional JSON blob for extras such as termination reason or mutator kind |

Each row uses a UUID v4 primary key. Inserting a duplicate (skill_name, generation, id) is a hard error; callers must generate fresh UUIDs.

seed_pairs_for: converting scores to router seeds

PromptStore::seed_pairs_for(candidates, scorer, limit) (prompt_store.rs:163) is the seam between wcore-evolve and wcore-skills. It takes a list of skill names, looks up the top-scored winner for each from the store, and converts the score to a simulated-success count for the Thompson Beta sampler:

simulated_successes = round(clamp(score, 0.0, 1.0) * 5)

A score of 1.0 becomes 5 simulated successes. The auto_drafter score of 0.7 becomes 4. Skills with no winner row, or a zero scaled value, are skipped.
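
The conversion is small enough to check by hand. The sketch below reproduces the formula as written; the function names simulated_successes and seed_pair are hypothetical, and the return types are assumptions rather than the signature of seed_pairs_for.

```rust
/// simulated_successes = round(clamp(score, 0.0, 1.0) * 5)
/// f64::round rounds halves away from zero, so 3.5 becomes 4.
fn simulated_successes(score: f64) -> u32 {
    (score.clamp(0.0, 1.0) * 5.0).round() as u32
}

/// Produce a router seed for one skill, or None when the scaled value
/// is zero (such skills are skipped).
fn seed_pair(skill_name: &str, score: f64) -> Option<(String, u32)> {
    match simulated_successes(score) {
        0 => None,
        successes => Some((skill_name.to_string(), successes)),
    }
}
```

With these definitions, 1.0 maps to 5 and 0.7 maps to 4, matching the values above, while a score of 0.05 scales to 0.25, rounds to 0, and is skipped.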

At bootstrap, seeding happens in layered priority order:

  1. GEPA bench winners (scorer = "bench") are seeded first.
  2. Auto-drafted skills (scorer = "auto_drafter") are seeded second.
  3. A prioritizer provides head-start arms for skills touched by neither.

This means a GEPA-evolved variant always takes precedence over a draft of the same skill name.
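
One simple way to get this first-writer-wins behaviour is to fill a map layer by layer and never overwrite an existing entry. The sketch below assumes each layer arrives as (skill name, simulated successes) pairs; layered_seeds is a hypothetical function, not the bootstrap code itself.

```rust
use std::collections::HashMap;

/// Merge seed layers in priority order: bench winners, then auto-drafted
/// skills, then prioritizer head starts. An earlier layer's entry for a
/// skill is never replaced by a later layer.
fn layered_seeds(
    bench: &[(&str, u32)],
    auto_drafter: &[(&str, u32)],
    head_start: &[(&str, u32)],
) -> HashMap<String, u32> {
    let mut seeds = HashMap::new();
    for layer in [bench, auto_drafter, head_start] {
        for &(skill, successes) in layer {
            // or_insert keeps the value from the higher-priority layer.
            seeds.entry(skill.to_string()).or_insert(successes);
        }
    }
    seeds
}
```

If "my-skill" appears in both the bench layer with 5 and the auto_drafter layer with 4, the resulting map holds 5 for it.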

When capabilities.gepa_enabled is advertised by the host, the agent emits one EvolutionEvent per scored child on the JSON-stream protocol:

{
  "run_id": "...",
  "generation": 2,
  "parent_id": "my-skill",
  "child_id": "my-run/2/0",
  "mutation_kind": "Reorder",
  "score": 0.83,
  "retained": true
}

This is independent of structured_traces: a host can receive evolution events without enabling full trace output.