Self-Evolution (GEPA)
Wayland Core includes two complementary self-improvement tracks. This page covers the offline evolutionary optimizer: wcore-evolve, which mutates and scores skill prompt bodies across generations to find variants that score higher than the parent. The in-session counterpart, which drafts new skills from recurring turn patterns, is covered in Autonomous Skill Drafting.
What GEPA does
GEPA (the name used in the codebase for this evolutionary loop) takes a seed skill, generates mutated children, scores each child with a deterministic scorer, retains the best child if it beats the parent, and repeats across generations. Losers are archived to a graveyard directory. Winners pass through a CuratorPort and, if promoted, are written to the evolved_prompts SQLite table so the next agent session’s SkillRouter is seeded with their scores.
Scoring never calls an LLM. The wcore_eval::DefaultScorer uses locked constants, making every run reproducible given the same seed skill and generation parameters.
Running the optimizer
The wcore-evolve binary is the entry point. A typical invocation:

    wcore-evolve \
      --seed-file path/to/skill.md \
      --seed-name my-skill \
      --generations 5 \
      --fan-out 4 \
      --plateau-window 3 \
      --plateau-min-delta 0.01 \
      --child-timeout-secs 30 \
      --graveyard-root ~/.local/share/wayland/evolve/graveyard

All flags have defaults. `--seed-file` and `--seed-name` are the minimum required inputs.
Output is one key=value per line:
| Key | Meaning |
|---|---|
| `run_id` | Unique identifier for this run |
| `parent_id` | The seed skill name |
| `generations_run` | How many generations completed before termination |
| `termination` | Why the loop stopped (see below) |
| `parent_score` | Scorer result for the unmodified seed |
| `best_score` | Best score achieved by any child |
| `losers_archived` | Count of entries written to the graveyard |
| `graveyard_root` | Path where loser entries were written |
| `curator_decision` | `Promote` or `Archive` from the CuratorPort |
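Because the report is line-oriented, a wrapper script or test harness can read it with a few lines of code. The following is a minimal sketch, assuming the output is exactly one `key=value` pair per line; `parse_report` is a hypothetical helper name, not part of wcore-evolve.

```rust
use std::collections::HashMap;

/// Parses `key=value` lines into a map, skipping lines that contain no `=`.
/// Splits on the first `=` only, so a value may itself contain `=`.
fn parse_report(output: &str) -> HashMap<String, String> {
    output
        .lines()
        .filter_map(|line| line.split_once('='))
        .map(|(key, value)| (key.trim().to_string(), value.trim().to_string()))
        .collect()
}
```

A caller could then branch on `report.get("termination")` or parse `best_score` as a float to compare it with `parent_score`.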
The generation loop
The main loop is in crates/wcore-evolve/src/evolve/mod.rs (evolve(params: EvolveParams)).
Each generation:
- Checks budget exhaustion. Exits with `BudgetExhausted` if the budget is spent.
- Picks a mutator by round-robin: `mutators[generation_index % mutators.len()]`. The CLI default order is `[Paraphrase, Reorder, SwapSynonym, Precondition]`.
- Runs `fan_out` children concurrently, each with a timeout of `child_timeout_secs`.
- Scores every child via `wcore_eval::DefaultScorer`.
- Retains a child only if its score beats both the current best and the original parent score.
- Archives all non-retained children (and any displaced prior best) to the graveyard.
- Pushes the generation’s top score to the plateau detector.
- Checks for termination.
If every child in a generation times out, that generation is skipped for plateau purposes: the detector sees no sample rather than a synthetic floor, preventing a false-plateau from a noisy generation.
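The two decision points in the loop, mutator selection and child retention, can be sketched as below. This is an illustration under stated assumptions rather than the code in mod.rs: `MutatorKind`, `pick_mutator`, and `retained_child` are hypothetical names, scores are taken to be `f64`, and a timed-out child is modeled as `None`.

```rust
/// Mutator kinds, listed in the CLI default order described above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum MutatorKind {
    Paraphrase,
    Reorder,
    SwapSynonym,
    Precondition,
}

const DEFAULT_ORDER: [MutatorKind; 4] = [
    MutatorKind::Paraphrase,
    MutatorKind::Reorder,
    MutatorKind::SwapSynonym,
    MutatorKind::Precondition,
];

/// Round-robin selection: one mutator per generation.
/// Assumes `mutators` is non-empty.
fn pick_mutator(generation_index: usize, mutators: &[MutatorKind]) -> MutatorKind {
    mutators[generation_index % mutators.len()]
}

/// Returns the index of the child to retain, if any. `None` entries stand for
/// children that timed out. A child qualifies only if it beats both the
/// current best and the original parent score; the highest qualifier wins.
fn retained_child(scores: &[Option<f64>], current_best: f64, parent_score: f64) -> Option<usize> {
    let bar = current_best.max(parent_score);
    let mut winner: Option<(usize, f64)> = None;
    for (index, score) in scores.iter().enumerate() {
        if let Some(score) = *score {
            if score > bar && winner.map_or(true, |(_, best)| score > best) {
                winner = Some((index, score));
            }
        }
    }
    winner.map(|(index, _)| index)
}
```

When `retained_child` returns `None` for a generation where every entry is `None`, the caller would skip the plateau push for that generation, matching the all-timeout rule above.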
Mutators
All four mutators are in crates/wcore-evolve/src/mutator/. Each is deterministically seeded from the triple (parent_hash, generation, child_index) via blake3 into a ChaCha20Rng. The same triple always produces byte-identical output, so any graveyard entry can be regenerated from its lineage.
| Mutator | What it changes | LLM used |
|---|---|---|
| `Reorder` | Shuffles the `## Steps` list | No |
| `SwapSynonym` | Replaces one word from a static synonym table | No |
| `Precondition` | Adds or drops one `## Preconditions` row | No |
| `Paraphrase` | Rewrites the body via a `ParaphraseProvider` | Yes (LLM call) |
Only Paraphrase touches an LLM. In tests it uses a FixtureProvider that replays a static response, making the test suite fully deterministic. In production it calls the configured provider.
Reorder requires a ## Steps section in the skill body. Precondition requires a ## Preconditions section. If the required section is absent, the mutator returns MutationError and that child is skipped rather than crashing the run.
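To illustrate why a lineage triple is enough to regenerate a mutation, here is a sketch of a seeded shuffle in the style of Reorder. To stay within the standard library it swaps in FNV-1a for blake3 and SplitMix64 for ChaCha20Rng, so its output will not match wcore-evolve byte for byte; only the determinism property carries over. All function names and the byte layout of the triple are hypothetical.

```rust
/// FNV-1a 64-bit hash: a stand-in for blake3 in this sketch.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for byte in bytes {
        hash ^= *byte as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Derives a seed from the lineage triple (the byte layout is an assumption).
fn lineage_seed(parent_hash: &str, generation: u32, child_index: u32) -> u64 {
    let mut buf = parent_hash.as_bytes().to_vec();
    buf.extend_from_slice(&generation.to_le_bytes());
    buf.extend_from_slice(&child_index.to_le_bytes());
    fnv1a(&buf)
}

/// One SplitMix64 step: a stand-in for ChaCha20Rng in this sketch.
fn next_u64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9e3779b97f4a7c15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xbf58476d1ce4e5b9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94d049bb133111eb);
    z ^ (z >> 31)
}

/// Fisher-Yates shuffle of the steps, driven entirely by the lineage seed.
/// The modulo introduces a slight bias, which is acceptable for illustration.
fn reorder_steps(steps: &[&str], parent_hash: &str, generation: u32, child_index: u32) -> Vec<String> {
    let mut state = lineage_seed(parent_hash, generation, child_index);
    let mut out: Vec<String> = steps.iter().map(|s| s.to_string()).collect();
    for i in (1..out.len()).rev() {
        let j = (next_u64(&mut state) % (i as u64 + 1)) as usize;
        out.swap(i, j);
    }
    out
}
```

Calling `reorder_steps` twice with the same steps and the same triple yields the same ordering, which is the property that lets a graveyard entry be rebuilt from its lineage.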
Plateau heuristic
The plateau detector (crates/wcore-evolve/src/evolve/plateau.rs) terminates the loop when there has been no meaningful improvement across a rolling window of generations.
Default parameters (also used as the CLI defaults):
| Parameter | Default | Meaning |
|---|---|---|
| `plateau_window` | 3 | Number of generations to look back |
| `plateau_min_delta` | 0.01 | Minimum score improvement required to avoid plateau |
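The window should be at least as large as the number of mutator strategies in rotation, so that each mutator gets at least one attempt before the detector can declare a plateau. With the four default mutators that means a window of 4; the default window of 3 spans only three consecutive mutators in the round-robin.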
Non-finite scores (NaN, +inf, -inf) are rejected at push time. If a generation produces a non-finite top score, the loop terminates with TerminationReason::ScoreInvalid rather than hanging, and the offending score’s IEEE 754 bit pattern is included in the output for diagnostics.
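A rolling-window detector with non-finite rejection can be sketched as follows. The exact comparison rule in plateau.rs is not documented here, so this sketch assumes one: a plateau is declared once the window is full and the best score in the window improves on the oldest score in the window by less than `min_delta`. The type and method names are illustrative.

```rust
use std::collections::VecDeque;

/// Rolling-window plateau detector (the comparison rule is an assumption).
struct PlateauDetector {
    window: usize,
    min_delta: f64,
    samples: VecDeque<f64>,
}

impl PlateauDetector {
    fn new(window: usize, min_delta: f64) -> Self {
        Self { window, min_delta, samples: VecDeque::new() }
    }

    /// Records a generation's top score. Non-finite scores are rejected and
    /// their IEEE 754 bit pattern is returned for diagnostics.
    fn push(&mut self, score: f64) -> Result<(), u64> {
        if !score.is_finite() {
            return Err(score.to_bits());
        }
        if self.samples.len() >= self.window {
            self.samples.pop_front();
        }
        self.samples.push_back(score);
        Ok(())
    }

    /// True once the window is full and the best score in it improves on the
    /// oldest score in it by less than `min_delta`.
    fn is_plateau(&self) -> bool {
        if self.window == 0 || self.samples.len() < self.window {
            return false;
        }
        let oldest = self.samples[0];
        let best = self.samples.iter().copied().fold(f64::NEG_INFINITY, f64::max);
        best - oldest < self.min_delta
    }
}
```

A generation in which every child timed out would simply not call `push`, so the window only ever contains real samples. An `Err` from `push` corresponds to the `ScoreInvalid` exit.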
Termination reasons
| Reason | When it fires |
|---|---|
| `GenerationCeiling` | `--generations` limit reached with at least one improvement |
| `NoImprovementFound` | `--generations` limit reached, no child ever beat the parent |
| `Plateau` | Rolling window showed no improvement above `min_delta` |
| `BudgetExhausted` | Budget spent before the generation ceiling |
| `ScoreInvalid` | A non-finite top score was produced in a generation |
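The table can be read as a small decision function. The sketch below is one possible encoding: the variant names come from the table, while `LoopState`, its fields, and the precedence between reasons are assumptions made for illustration.

```rust
#[derive(Debug, PartialEq)]
enum TerminationReason {
    GenerationCeiling,
    NoImprovementFound,
    Plateau,
    BudgetExhausted,
    ScoreInvalid,
}

/// Hypothetical snapshot of loop state; field names are illustrative.
#[derive(Clone, Copy)]
struct LoopState {
    generations_run: u32,
    max_generations: u32,
    budget_spent: bool,
    top_score_finite: bool,
    plateau: bool,
    any_child_beat_parent: bool,
}

/// Returns `Some(reason)` when the loop should stop, `None` to keep going.
/// The order in which reasons are checked is an assumption of this sketch.
fn check_termination(s: &LoopState) -> Option<TerminationReason> {
    if !s.top_score_finite {
        return Some(TerminationReason::ScoreInvalid);
    }
    if s.budget_spent {
        return Some(TerminationReason::BudgetExhausted);
    }
    if s.plateau {
        return Some(TerminationReason::Plateau);
    }
    if s.generations_run >= s.max_generations {
        // The ceiling splits into two reasons depending on whether any child
        // ever beat the parent.
        return Some(if s.any_child_beat_parent {
            TerminationReason::GenerationCeiling
        } else {
            TerminationReason::NoImprovementFound
        });
    }
    None
}
```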
Graveyard
Every non-retained child (and every displaced prior best) is archived as a JSON file:

    <graveyard_root>/<run_id>/<generation>/<child_index>.json

Default graveyard roots by platform:
| Platform | Default path |
|---|---|
| macOS | ~/Library/Application Support/wayland/evolve/graveyard |
| Linux | ~/.local/share/wayland/evolve/graveyard |
| Windows | %APPDATA%\wayland\evolve\graveyard |
Each entry records the run_id, generation, child_index, parent_id, mutation_kind, score, and a 512-character excerpt of the mutated body. The full body is not stored, but it can be regenerated from the lineage triple and the original seed.
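The path layout and the excerpt rule are simple enough to sketch directly. The helper names below are hypothetical, and the sketch assumes "512-character" means 512 Unicode scalar values rather than 512 bytes.

```rust
use std::path::{Path, PathBuf};

/// Builds `<graveyard_root>/<run_id>/<generation>/<child_index>.json`.
fn graveyard_entry_path(root: &Path, run_id: &str, generation: u32, child_index: u32) -> PathBuf {
    root.join(run_id)
        .join(generation.to_string())
        .join(format!("{child_index}.json"))
}

/// Keeps at most `max_chars` characters, never splitting a UTF-8 code point.
fn excerpt(body: &str, max_chars: usize) -> String {
    body.chars().take(max_chars).collect()
}
```

Truncating by `chars()` rather than by byte offset avoids panics on multi-byte text, at the cost of an excerpt whose byte length varies.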
CuratorPort handoff
Winners do not go directly to the live skill catalog. They pass through the CuratorPort trait (crates/wcore-evolve/src/curator_handoff.rs), which exposes one method: submit(body, lineage) -> Decision { Promote | Archive }. The wcore-evolve crate does not depend on wcore-skills; the adapter that wires the two together lives in the binary, not the library.
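The shape of that port can be sketched as a one-method trait plus an adapter. Only the method name and the two `Decision` variants come from the description above; the parameter types, the `Lineage` fields, and `ThresholdCurator` are assumptions made up for this example.

```rust
#[derive(Debug, PartialEq)]
enum Decision {
    Promote,
    Archive,
}

/// Hypothetical lineage record; the real fields may differ.
struct Lineage {
    run_id: String,
    generation: u32,
    child_index: u32,
    score: f64,
}

/// One-method port: the library depends only on this trait.
trait CuratorPort {
    fn submit(&self, body: &str, lineage: &Lineage) -> Decision;
}

/// Hypothetical adapter that promotes non-empty bodies at or above a threshold.
struct ThresholdCurator {
    min_score: f64,
}

impl CuratorPort for ThresholdCurator {
    fn submit(&self, body: &str, lineage: &Lineage) -> Decision {
        if !body.trim().is_empty() && lineage.score >= self.min_score {
            Decision::Promote
        } else {
            Decision::Archive
        }
    }
}
```

Keeping the trait in the library and the adapter in the binary is what lets wcore-evolve stay free of a wcore-skills dependency: the loop only ever sees `&dyn CuratorPort`.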
PromptStore: the bridge to the online router
The PromptStore (crates/wcore-evolve/src/prompt_store.rs) persists evolved winners to the evolved_prompts SQLite table, managed by wcore-memory. This table is the connection between the offline GEPA run and the online per-turn SkillRouter.
Key columns in evolved_prompts:
| Column | Purpose |
|---|---|
| `skill_name` | Identifies which skill this variant evolved from |
| `parent_id` | Optional pointer to the prior winner that seeded this variant |
| `prompt_body` | The full mutated skill body |
| `score` | `dimensions.combined` from DefaultScorer, or `pass_ratio` from BenchScorer |
| `scorer` | Stable string identifier: `"bench"`, `"default"`, or `"auto_drafter"` |
| `generation` | Zero-based generation index when this variant was produced |
| `created_at` | Unix seconds (UTC) |
| `metadata` | Optional JSON blob for extras such as termination reason or mutator kind |
Each row uses a UUID v4 primary key. Inserting a duplicate (skill_name, generation, id) is a hard error; callers must generate fresh UUIDs.
seed_pairs_for: converting scores to router seeds
PromptStore::seed_pairs_for(candidates, scorer, limit) (prompt_store.rs:163) is the seam between wcore-evolve and wcore-skills. It takes a list of skill names, looks up the top-scored winner for each from the store, and converts the score to a simulated-success count for the Thompson Beta sampler:

    simulated_successes = round(clamp(score, 0.0, 1.0) * 5)

A score of 1.0 becomes 5 simulated successes. The auto_drafter score of 0.7 becomes 4, because 0.7 × 5 = 3.5 rounds up. Skills with no winner row, or a scaled value of zero, are skipped.
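The conversion can be written as a one-line function. This is a sketch assuming scores are `f64` and that rounding is half-away-from-zero, which is what Rust's `f64::round` does; the function name mirrors the formula and is not necessarily the name used in prompt_store.rs.

```rust
/// Converts a score into a simulated-success count for the Beta sampler.
/// `f64::round` rounds half away from zero, so 3.5 becomes 4.
fn simulated_successes(score: f64) -> u32 {
    (score.clamp(0.0, 1.0) * 5.0).round() as u32
}
```

Out-of-range scores are clamped before scaling, so 1.7 still maps to 5 and a negative score maps to 0, which the caller would then skip.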
At bootstrap, seeding happens in layered priority order:
- GEPA bench winners (`scorer = "bench"`) are seeded first.
- Auto-drafted skills (`scorer = "auto_drafter"`) are seeded second.
- A prioritizer provides head-start arms for skills touched by neither.
This means a GEPA-evolved variant always takes precedence over a draft of the same skill name.
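One way to get this precedence is a first-writer-wins merge over the layers. The sketch below assumes each layer is a list of (skill name, simulated successes) pairs; `layer_seeds` and the skill names in the usage note are hypothetical.

```rust
use std::collections::HashMap;

/// Merges seed layers in priority order: the first layer to mention a skill
/// wins, and later layers only fill in skills that are still unseeded.
fn layer_seeds(layers: &[Vec<(String, u32)>]) -> HashMap<String, u32> {
    let mut seeds = HashMap::new();
    for layer in layers {
        for (skill, successes) in layer {
            seeds.entry(skill.clone()).or_insert(*successes);
        }
    }
    seeds
}
```

Passing the layers as `[bench, drafts, head_start]` means a skill named `refactor` with a bench seed of 5 and a draft seed of 4 ends up with 5.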
Evolution event
When capabilities.gepa_enabled is advertised by the host, the agent emits one EvolutionEvent per scored child on the JSON-stream protocol:

    {
      "run_id": "...",
      "generation": 2,
      "parent_id": "my-skill",
      "child_id": "my-run/2/0",
      "mutation_kind": "Reorder",
      "score": 0.83,
      "retained": true
    }

This is independent of structured_traces: a host can receive evolution events without enabling full trace output.
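For hosts or tests that want to model the event, a struct with the same seven fields is enough. The sketch below assumes the Rust types shown and hand-rolls the JSON to stay within the standard library; a JSON library would be the usual choice in real code. The field names come from the example event, while `to_json_line` and `escape` are hypothetical helpers.

```rust
/// Mirrors the fields of the example event; the Rust types are assumptions.
struct EvolutionEvent {
    run_id: String,
    generation: u32,
    parent_id: String,
    child_id: String,
    mutation_kind: String,
    score: f64,
    retained: bool,
}

/// Escapes backslashes and double quotes. Assumes identifiers contain no
/// control characters, which a full JSON encoder would also have to handle.
fn escape(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

impl EvolutionEvent {
    /// Renders the event as a single JSON line. Assumes `score` is finite.
    fn to_json_line(&self) -> String {
        format!(
            "{{\"run_id\":\"{}\",\"generation\":{},\"parent_id\":\"{}\",\"child_id\":\"{}\",\"mutation_kind\":\"{}\",\"score\":{},\"retained\":{}}}",
            escape(&self.run_id),
            self.generation,
            escape(&self.parent_id),
            escape(&self.child_id),
            escape(&self.mutation_kind),
            self.score,
            self.retained
        )
    }
}
```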