Context Compression

A model has a fixed context window. A long session steadily fills it with messages, tool results, and file contents, and once it overflows the model cannot continue. Compression is how Wayland keeps a session within budget so it can run for a long time without hitting that wall.

Three-tier compaction

The engine applies compaction in three tiers, each triggered by a different pressure:

Microcompact runs when tool results pile up or a time gap appears. It clears the content of old tool results while keeping the most recent ones. It uses no model call, so it is cheap and frequent.
Autocompact runs when input tokens approach the context limit. It uses a model call to summarize the conversation, trading some detail for room to keep going.
Emergency runs when tokens reach the absolute limit. It stops further calls and asks you to start fresh rather than failing mid-turn.
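
To make the microcompact tier concrete, here is a minimal sketch of clearing old tool results without a model call. It is an illustration only, not Wayland's implementation: the `Message` type, the `CLEARED` placeholder text, and the `keep_recent` parameter are all assumptions made for this example.

```python
from dataclasses import dataclass, replace

# Hypothetical placeholder left where a stale tool result used to be.
CLEARED = "[tool result cleared]"


@dataclass(frozen=True)
class Message:
    role: str      # "user", "assistant", or "tool"
    content: str


def microcompact(messages, keep_recent=2):
    """Return a copy of `messages` with all but the newest tool results blanked.

    No model call is involved: this is a pure list transformation.
    """
    tool_indices = [i for i, m in enumerate(messages) if m.role == "tool"]
    if keep_recent > 0:
        stale = set(tool_indices[:-keep_recent])
    else:
        stale = set(tool_indices)
    return [
        replace(m, content=CLEARED) if i in stale else m
        for i, m in enumerate(messages)
    ]
```

The message count and ordering are preserved, so the conversation structure stays intact while the bulky payloads of older results are dropped.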

The tiers escalate: the light, no-cost pass handles the common case, the summarizing pass handles real pressure, and the emergency pass is the backstop.
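
The escalation can be sketched as a single decision function that checks the heaviest condition first. This is a sketch under stated assumptions: the function name, the `auto_ratio`, `max_stale`, and `max_idle` thresholds, and their default values are hypothetical and are not taken from Wayland.

```python
from enum import Enum


class Tier(Enum):
    NONE = "none"
    MICROCOMPACT = "microcompact"
    AUTOCOMPACT = "autocompact"
    EMERGENCY = "emergency"


def choose_tier(input_tokens, context_limit, stale_tool_results, idle_seconds,
                auto_ratio=0.85, max_stale=10, max_idle=300):
    """Pick the compaction tier for the current session state.

    All thresholds are illustrative defaults, not Wayland's real values.
    """
    if input_tokens >= context_limit:
        # Backstop: stop calling the model and ask the user to start fresh.
        return Tier.EMERGENCY
    if input_tokens >= auto_ratio * context_limit:
        # Real pressure: summarize the conversation with a model call.
        return Tier.AUTOCOMPACT
    if stale_tool_results > max_stale or idle_seconds > max_idle:
        # Common case: clear old tool results, no model call.
        return Tier.MICROCOMPACT
    return Tier.NONE
```

Checking the conditions from most to least severe means a session that is already at the limit is never handed to a cheaper pass that cannot free enough room.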

Output compaction and TOON

Alongside conversation compaction, output compaction trims the size of tool output before it enters the context. It has three settings: off, safe, and full. TOON encoding is a compact representation that further reduces the token cost of structured data. Together these reduce the number of tokens each tool result consumes, which slows the rate at which the window fills.
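
To show why a tabular encoding saves space on structured data, here is a simplified, TOON-style encoder for a uniform list of flat records. It is a sketch only: it covers one layout (a header naming the fields once, then one comma-separated line per row), does no quoting or escaping, and is not the full TOON format or Wayland's encoder. The `encode_table` name is hypothetical.

```python
import json


def encode_table(name, rows):
    """Encode a list of flat dicts with identical keys as a compact table.

    Field names appear once in the header instead of once per row.
    Values containing commas or newlines are NOT handled in this sketch.
    """
    if not rows:
        return name + "[0]:"
    fields = list(rows[0])
    if any(list(r) != fields for r in rows):
        raise ValueError("rows must share the same keys in the same order")
    header = name + "[" + str(len(rows)) + "]{" + ",".join(fields) + "}:"
    lines = ["  " + ",".join(str(r[f]) for f in fields) for r in rows]
    return "\n".join([header, *lines])


rows = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
compact = encode_table("users", rows)
print(compact)
print(len(compact), "characters vs", len(json.dumps(rows)), "as JSON")
```

Character count is only a rough proxy for token count, but the saving comes from the same place in both: keys, quotes, and braces are not repeated for every row.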

Prompt caching and the file cache

Compression is also about not paying twice for the same context. With Anthropic prompt caching, the stable prefix of a request (the Constitution, tool definitions, system prompt) is cached on the provider side, so repeated requests reprocess only the changed parts. Because Wayland composes that prefix to be stable across turns, each request matches the prefix cached by the previous one. A file state cache deduplicates reads and tracks writes, so re-reading a file the agent already loaded does not re-spend its tokens.
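
The file state cache idea can be sketched as a map from path to the hash of the content last placed in context. This is a minimal illustration, not Wayland's implementation: the `FileStateCache` class, its method names, and the stub message are assumptions, and the caller passes file text in directly so the example needs no filesystem access.

```python
import hashlib


class FileStateCache:
    """Remember which file contents the model has already seen."""

    def __init__(self):
        self._seen = {}  # path -> SHA-256 of the content last put in context

    @staticmethod
    def _digest(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def read(self, path, text):
        """Return the text to add to context, or a short stub if unchanged."""
        digest = self._digest(text)
        if self._seen.get(path) == digest:
            # Same content as last time: spend a few tokens, not the whole file.
            return f"[{path} unchanged since last read]"
        self._seen[path] = digest
        return text

    def record_write(self, path, text):
        """Track a write so a later read of the same content is deduplicated."""
        self._seen[path] = self._digest(text)


cache = FileStateCache()
first = cache.read("app.py", "print('hello')")    # full content
second = cache.read("app.py", "print('hello')")   # short stub
```

Hashing content rather than trusting the path alone means a file changed outside the session is sent in full again instead of being wrongly treated as already loaded.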

These pieces work toward one goal: keep a long session inside the budget while spending as few tokens as possible to do it. On the command line you can tune the behavior with --compaction and --toon.
