
Budget & Cost Caps

Wayland Core tracks spending at two levels and enforces limits before a charge lands, not after. All caps are opt-in: nothing is blocked by default, but every configured cap is enforced atomically.

Source crates: wcore-budget, wcore-pricing.


1. ExecutionBudget - global session caps ([budget])


ExecutionBudget is an Arc<RwLock<...>> tree threaded through every tool call via ToolContext.budget. It tracks seven axes, and is_exceeded() is called before any long-running work is launched.

# ~/.wayland-core/config.toml (or project .wayland-core.toml)
[budget]
max_wall_time_secs = 600 # wall clock since session start
max_tool_runtime_secs = 120 # cumulative tool execution time
max_processes = 8 # concurrent subprocesses at any instant
max_agent_depth = 4 # sub-agent nesting depth
max_tokens_in = 200000 # total input tokens
max_tokens_out = 16384 # total output tokens
max_cost_usd = 1.50 # total USD cost

All seven fields are optional. Omitting a field means no cap on that axis.

Check order when is_exceeded() is called:
max_wall_time → max_tool_runtime → max_processes → max_agent_depth → max_tokens_in → max_tokens_out → max_cost_usd.
The first exceeded axis is returned as the reason string; the others are not evaluated.
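The first-match behaviour can be sketched as a chain of early returns. This is a minimal illustration, not the wcore-budget implementation: the `Caps` and `Usage` structs and the `first_exceeded` function are hypothetical, and treating "exceeded" as strictly greater than the cap is an assumption.

```rust
/// Hypothetical mirror of the seven optional caps; `None` means "no cap".
#[derive(Default)]
struct Caps {
    max_wall_time_secs: Option<u64>,
    max_tool_runtime_secs: Option<u64>,
    max_processes: Option<usize>,
    max_agent_depth: Option<usize>,
    max_tokens_in: Option<u64>,
    max_tokens_out: Option<u64>,
    max_cost_usd: Option<f64>,
}

/// Hypothetical snapshot of the counters the caps are compared against.
#[derive(Default)]
struct Usage {
    wall_time_secs: u64,
    tool_runtime_secs: u64,
    processes_active: usize,
    agent_depth: usize,
    tokens_in: u64,
    tokens_out: u64,
    cost_usd: f64,
}

/// Returns the first exceeded axis in the documented order, or `None`.
fn first_exceeded(caps: &Caps, used: &Usage) -> Option<&'static str> {
    // Assumption: an axis is exceeded only when a cap is set and usage is above it.
    fn over<T: PartialOrd>(cap: Option<T>, used: T) -> bool {
        cap.map_or(false, |c| used > c)
    }
    if over(caps.max_wall_time_secs, used.wall_time_secs) {
        return Some("max_wall_time");
    }
    if over(caps.max_tool_runtime_secs, used.tool_runtime_secs) {
        return Some("max_tool_runtime");
    }
    if over(caps.max_processes, used.processes_active) {
        return Some("max_processes");
    }
    if over(caps.max_agent_depth, used.agent_depth) {
        return Some("max_agent_depth");
    }
    if over(caps.max_tokens_in, used.tokens_in) {
        return Some("max_tokens_in");
    }
    if over(caps.max_tokens_out, used.tokens_out) {
        return Some("max_tokens_out");
    }
    if over(caps.max_cost_usd, used.cost_usd) {
        return Some("max_cost_usd");
    }
    None
}
```

With both max_processes and max_cost_usd exceeded, this sketch reports max_processes, because it comes earlier in the order.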

Tree rollup. When a sub-agent spawns, it gets a sub_budget() child view. Counters recorded on the child roll up to all ancestors, so the root view always reflects the full session total. A stricter cap on a child does not relax the parent.
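A rough sketch of the rollup idea, assuming each node holds a handle to its parent and records on every ancestor. The `Node` type and `record_tokens_in` method are hypothetical, and a single atomic counter stands in for the real Arc<RwLock<...>> state to keep the example short.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical budget node: one counter plus a link to its parent.
struct Node {
    tokens_in: AtomicU64,
    parent: Option<Arc<Node>>,
}

impl Node {
    /// Root view for the whole session.
    fn root() -> Arc<Node> {
        Arc::new(Node { tokens_in: AtomicU64::new(0), parent: None })
    }

    /// Child view for a sub-agent; it keeps a handle to its parent.
    fn sub_budget(self: &Arc<Self>) -> Arc<Node> {
        Arc::new(Node {
            tokens_in: AtomicU64::new(0),
            parent: Some(Arc::clone(self)),
        })
    }

    /// Records on this node and on every ancestor up to the root.
    fn record_tokens_in(&self, n: u64) {
        let mut node = Some(self);
        while let Some(cur) = node {
            cur.tokens_in.fetch_add(n, Ordering::Relaxed);
            node = cur.parent.as_deref();
        }
    }

    /// Current total as seen from this view.
    fn tokens_in(&self) -> u64 {
        self.tokens_in.load(Ordering::Relaxed)
    }
}
```

Recording 100 tokens on a grandchild and 50 on its parent leaves the grandchild at 100 and both the parent and the root at 150.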

RAII guards. Tools that fork subprocesses call enter_tool_run(), which increments processes_active and decrements it on drop via ToolRunGuard. Sub-agent spawns use enter_agent() / AgentDepthGuard the same way.
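The guard pattern described above can be illustrated with a small stand-alone sketch. `ProcessCounter` and `RunGuard` are hypothetical names used only for this example; they are not the real ToolRunGuard or its counter.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the `processes_active` counter.
#[derive(Default)]
struct ProcessCounter {
    active: AtomicUsize,
}

/// Guard that holds one slot; the slot is released when the guard drops.
struct RunGuard {
    counter: Arc<ProcessCounter>,
}

impl ProcessCounter {
    /// Increments the counter and returns a guard that undoes the increment.
    fn enter(self: &Arc<Self>) -> RunGuard {
        self.active.fetch_add(1, Ordering::SeqCst);
        RunGuard { counter: Arc::clone(self) }
    }

    /// Number of guards currently alive.
    fn current(&self) -> usize {
        self.active.load(Ordering::SeqCst)
    }
}

impl Drop for RunGuard {
    fn drop(&mut self) {
        // Runs on every exit path, including early returns and panics that unwind.
        self.counter.active.fetch_sub(1, Ordering::SeqCst);
    }
}
```

Because the decrement lives in Drop, a tool that returns early with `?` still releases its slot without any explicit cleanup call.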


2. BudgetTracker - per-session and per-user caps ([session_cap])


BudgetTracker is the M5.3 model. It is keyed by session id and (optionally) user id, and enforces three axes:

| Axis | Builder call | Behaviour |
| --- | --- | --- |
| per_session_tokens | .per_session_tokens(n) | Blocks a charge that would push this session’s token total past n. |
| per_session_usd | .per_session_usd(d) | Blocks a charge that would push this session’s USD total past d. |
| per_user_daily_usd | .per_user_daily_usd(d) | Blocks a charge that would push this user’s UTC-day USD total past d. Resets at midnight UTC. |

Build a cap:

let cap = BudgetCap::builder()
    .per_session_usd(0.10)
    .per_user_daily_usd(1.00)
    .build();
let mut tracker = BudgetTracker::new(cap);

For a single-tenant run, call tracker.charge(session_id, tokens, usd).
For a multi-tenant run where one user maps to multiple sessions in a day:

tracker.charge_for_user(session_id, user_id, tokens, usd)?;

charge_for_user checks the per-user daily cap first, then calls charge for the per-session cap. A rejected charge leaves both buckets at their prior totals - the rejected amount does not “stick”.
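One way to get the "rejected charges do not stick" property is to compute both prospective totals first and commit only after every check passes. The sketch below is a simplified assumption, not the wcore-budget code: the `Tracker` and `Rejected` types are hypothetical, only USD is tracked, and the daily rollover is omitted.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Rejected {
    PerSessionUsd,
    PerUserDailyUsd,
}

/// Hypothetical, simplified tracker: USD only, no day rollover.
struct Tracker {
    session_cap_usd: f64,
    user_daily_cap_usd: f64,
    session_usd: HashMap<String, f64>,
    user_usd: HashMap<String, f64>,
}

impl Tracker {
    fn charge_for_user(&mut self, session: &str, user: &str, usd: f64) -> Result<(), Rejected> {
        // Compute both prospective totals before mutating anything.
        let new_user = self.user_usd.get(user).copied().unwrap_or(0.0) + usd;
        let new_session = self.session_usd.get(session).copied().unwrap_or(0.0) + usd;

        // The per-user daily cap is checked first, then the per-session cap.
        if new_user > self.user_daily_cap_usd {
            return Err(Rejected::PerUserDailyUsd);
        }
        if new_session > self.session_cap_usd {
            return Err(Rejected::PerSessionUsd);
        }

        // Both checks passed: commit both buckets together.
        self.user_usd.insert(user.to_string(), new_user);
        self.session_usd.insert(session.to_string(), new_session);
        Ok(())
    }
}
```

Since no bucket is written until both checks succeed, a rejection at either level leaves the stored totals exactly as they were.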

Reading totals:

let (tokens, usd) = tracker.session_totals(session_id);
let daily_usd = tracker.user_daily_usd(user_id);

Translating [budget] TOML into a BudgetCap


BudgetCap implements From<&BudgetConfig> so you can translate the TOML block directly:

let cap = BudgetCap::from(&budget_config);

The mapping:

  • max_tokens_in + max_tokens_out (either or both) → per_session_tokens (sum).
  • max_cost_usd → per_session_usd.
  • max_wall_time_secs, max_tool_runtime_secs, max_processes, max_agent_depth belong to ExecutionBudget only; no counterpart in BudgetCap.
  • per_user_daily_usd has no TOML counterpart. Set it directly via the builder for multi-tenant deployments.
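The mapping above could look roughly like the following. The struct layouts here are hypothetical stand-ins that reuse the documented type names for readability. The real BudgetConfig and BudgetCap fields may differ, and the saturating sum is an assumption.

```rust
/// Hypothetical mirror of the `[budget]` fields that take part in the mapping.
#[derive(Default)]
struct BudgetConfig {
    max_tokens_in: Option<u64>,
    max_tokens_out: Option<u64>,
    max_cost_usd: Option<f64>,
}

/// Hypothetical cap layout with the three documented axes.
#[derive(Debug, Default, PartialEq)]
struct BudgetCap {
    per_session_tokens: Option<u64>,
    per_session_usd: Option<f64>,
    per_user_daily_usd: Option<f64>,
}

impl From<&BudgetConfig> for BudgetCap {
    fn from(cfg: &BudgetConfig) -> Self {
        // Either or both token caps contribute; neither set means no token cap.
        let per_session_tokens = match (cfg.max_tokens_in, cfg.max_tokens_out) {
            (None, None) => None,
            (a, b) => Some(a.unwrap_or(0).saturating_add(b.unwrap_or(0))),
        };
        BudgetCap {
            per_session_tokens,
            per_session_usd: cfg.max_cost_usd,
            // No TOML counterpart: stays unset unless the builder sets it.
            per_user_daily_usd: None,
        }
    }
}
```

With the example values from the [budget] block (200000 in, 16384 out, 1.50 USD), this sketch yields a per-session token cap of 216384 and a per-session USD cap of 1.50.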

Every charge attempt emits a BudgetEvent to an attached BudgetEventSink. Three variants:

pub enum BudgetEvent {
    Charge { session_id, tokens, usd },           // accepted
    CapWarn { session_id, pct_used: f32 },        // accepted, but >= 80% of strictest cap
    CapBlock { session_id, reason: BudgetError }, // rejected
}

CapWarn fires once the running total for a session reaches 80% of the strictest configured per-session cap (tokens or USD, whichever is closer to its limit). It fires on every accepted charge at or above that threshold, not just the first.
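The "strictest cap" rule amounts to taking the larger of the two usage fractions. A minimal sketch, assuming pct_used is expressed on a 0 to 100 scale (the source does not say) and using the hypothetical helper names `fraction` and `cap_warn_pct`:

```rust
/// Fraction of a cap consumed, or `None` when the cap is unset or zero.
fn fraction(used: f64, cap: Option<f64>) -> Option<f64> {
    cap.filter(|c| *c > 0.0).map(|c| used / c)
}

/// Returns `Some(pct_used)` when the strictest per-session cap is at or above 80%.
/// Assumption: `pct_used` is a percentage in the range 0 to 100.
fn cap_warn_pct(
    tokens: u64,
    token_cap: Option<u64>,
    usd: f64,
    usd_cap: Option<f64>,
) -> Option<f32> {
    let by_tokens = fraction(tokens as f64, token_cap.map(|c| c as f64));
    let by_usd = fraction(usd, usd_cap);
    // The strictest cap is the one with the highest consumed fraction.
    let strictest = match (by_tokens, by_usd) {
        (Some(a), Some(b)) => Some(a.max(b)),
        (a, b) => a.or(b),
    }?;
    if strictest >= 0.8 {
        Some((strictest * 100.0) as f32)
    } else {
        None
    }
}
```

For a session at 875 of 1000 tokens and 0.25 of 1.00 USD, the token axis is the strictest and the sketch reports 87.5.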

CapBlock carries a BudgetError::CapExceeded with the cap kind (per_session_tokens, per_session_usd, or per_user_daily_usd), the configured limit, and the observed total that would have resulted.

In production the engine connects ObservabilityBudgetEventBridge, which serializes each BudgetEvent to JSON and forwards it through the SpanSink telemetry channel:

let bridge = Arc::new(ObservabilityBudgetEventBridge::new(span_sink.clone()));
tracker.set_event_sink(bridge);

Sink calls happen synchronously on the charge hot path; implementations must not block.
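One way for a custom sink to stay off the hot path is to hand events to a bounded queue and let another thread do the slow work. The sketch below uses only the standard library. The `Event` enum, the `EventSink` trait, and its `emit` method are hypothetical stand-ins, since the real BudgetEventSink signature is not shown here.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

/// Hypothetical event shape, reduced to one variant for the example.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Charge { session_id: String, tokens: u64 },
}

/// Hypothetical sink trait; the real trait may have a different signature.
trait EventSink: Send + Sync {
    fn emit(&self, event: Event);
}

/// Sink that hands events to a bounded queue and never waits.
struct QueueSink {
    tx: SyncSender<Event>,
}

impl EventSink for QueueSink {
    fn emit(&self, event: Event) {
        // try_send returns immediately; a full or closed queue drops the event.
        let _ = self.tx.try_send(event);
    }
}

/// Builds the sink plus the receiving end for a background consumer.
fn queue_sink(capacity: usize) -> (QueueSink, Receiver<Event>) {
    let (tx, rx) = sync_channel(capacity);
    (QueueSink { tx }, rx)
}
```

Dropping events when the queue is full trades completeness for latency; a deployment that cannot lose events would need a different overflow policy.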


Cost is computed from wcore-pricing/pricing.toml, bundled at compile time. Override with WAYLAND_PRICING_PATH=/path/to/my-pricing.toml.

The catalog uses TOML sections keyed by [provider.model]:

[anthropic.claude-opus-4-7]
input_per_mtok_usd = 5.0
output_per_mtok_usd = 25.0
cache_read_per_mtok_usd = 0.5
cache_write_per_mtok_usd = 6.25
[openai.gpt-5]
input_per_mtok_usd = 5.0
output_per_mtok_usd = 15.0

cache_read_per_mtok_usd and cache_write_per_mtok_usd are optional; models that do not support prompt caching omit them.
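As a worked example of how per-million-token rates turn into a charge, here is a small sketch. The `Rates` and `TokenUsage` structs and the `cost_usd` function are hypothetical. Billing cache reads and writes as separate token counts, and treating missing cache rates as zero, are assumptions rather than documented wcore-pricing behaviour.

```rust
/// Hypothetical mirror of one catalog entry (USD per million tokens).
struct Rates {
    input_per_mtok_usd: f64,
    output_per_mtok_usd: f64,
    cache_read_per_mtok_usd: Option<f64>,
    cache_write_per_mtok_usd: Option<f64>,
}

/// Hypothetical token counts for a single request.
struct TokenUsage {
    input: u64,
    output: u64,
    cache_read: u64,
    cache_write: u64,
}

/// Sums rate * tokens / 1_000_000 over the four token classes.
fn cost_usd(r: &Rates, u: &TokenUsage) -> f64 {
    let per_class = |rate: f64, n: u64| rate * n as f64 / 1_000_000.0;
    per_class(r.input_per_mtok_usd, u.input)
        + per_class(r.output_per_mtok_usd, u.output)
        // Assumption: an omitted cache rate is charged as zero.
        + per_class(r.cache_read_per_mtok_usd.unwrap_or(0.0), u.cache_read)
        + per_class(r.cache_write_per_mtok_usd.unwrap_or(0.0), u.cache_write)
}
```

With the claude-opus-4-7 rates shown above, 1,000,000 input tokens, 200,000 output tokens, and 2,000,000 cache-read tokens come to 5.00 + 5.00 + 1.00 = 11.00 USD under these assumptions.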

Selected rates from the bundled catalog (Q2-2026 list prices, USD per million tokens):

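| Provider | Model | Input | Output |
| --- | --- | --- | --- |
| anthropic | claude-opus-4-7 | 5.00 | 25.00 |
| anthropic | claude-sonnet-4-6 | 3.00 | 15.00 |
| anthropic | claude-haiku-4-5 | 1.00 | 5.00 |
| openai | gpt-5 | 5.00 | 15.00 |
| openai | gpt-4o | 2.50 | 10.00 |
| openai | o1 | 15.00 | 60.00 |
| gemini | gemini-2-5-pro | 1.25 | 10.00 |
| deepseek | deepseek-v4-flash | 0.14 | 0.28 |
| groq | llama-3-3-70b | 0.59 | 0.79 |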

Self-hosted / local providers (ollama, vllm, lmstudio, litellm, openai-compatible) are cataloged at 0.00/0.00.

PricingRefresher can fetch the OpenRouter /api/v1/models catalog (24-hour TTL) and diff it against the bundled data, emitting CatalogChange events (Added, Removed, PriceChanged). Auto-application is off by default; diffs are surfaced for review only. Set WAYLAND_PRICING_AUTO_REFRESH=off to skip the live fetch entirely and keep the bundled catalog static.
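The diff step can be pictured as a comparison of two keyed maps. This is a sketch under assumptions: the `Price` struct, the payloads of the `CatalogChange` variants, and the `diff` function are hypothetical, and only the three variant names come from the description above.

```rust
use std::collections::BTreeMap;

/// Hypothetical price pair, USD per million tokens.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Price {
    input: f64,
    output: f64,
}

/// Variant names follow the text; the payloads are assumptions.
#[derive(Debug, PartialEq)]
enum CatalogChange {
    Added(String),
    Removed(String),
    PriceChanged { model: String, old: Price, new: Price },
}

/// Compares the bundled catalog with a live one and lists the differences.
fn diff(
    bundled: &BTreeMap<String, Price>,
    live: &BTreeMap<String, Price>,
) -> Vec<CatalogChange> {
    let mut changes = Vec::new();
    // Entries in the bundled catalog are either gone, repriced, or unchanged.
    for (model, old) in bundled {
        match live.get(model) {
            None => changes.push(CatalogChange::Removed(model.clone())),
            Some(new) if new != old => changes.push(CatalogChange::PriceChanged {
                model: model.clone(),
                old: *old,
                new: *new,
            }),
            Some(_) => {}
        }
    }
    // Entries that exist only in the live catalog are additions.
    for model in live.keys() {
        if !bundled.contains_key(model) {
            changes.push(CatalogChange::Added(model.clone()));
        }
    }
    changes
}
```

The function only reports differences and never mutates the bundled map, which matches the review-only default described above.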

The on-disk cache lives at ~/.wayland/pricing-cache.json by default (WAYLAND_HOME respected).




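| Key | Section | Type | Default |
| --- | --- | --- | --- |
| max_wall_time_secs | [budget] | u64 | none |
| max_tool_runtime_secs | [budget] | u64 | none |
| max_processes | [budget] | usize | none |
| max_agent_depth | [budget] | usize | none |
| max_tokens_in | [budget] | u64 | none |
| max_tokens_out | [budget] | u64 | none |
| max_cost_usd | [budget] | f64 | none |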

The [session_cap] axis per_user_daily_usd has no TOML field; set it programmatically via BudgetCapBuilder.