Browser Tool

The Browser tool gives the model a controlled window into the web. It is intentionally narrower than a full browser-automation library: the surface is ARIA-tree-first (no arbitrary JavaScript), every outbound URL is checked by a fail-closed policy, and all HTTP traffic routes through the engine’s egress gate.

The op enum is locked at 18 variants (BROWSER_OP_LOCKED_VARIANT_COUNT = 18 in wcore-browser/src/op.rs). Adding a variant requires bumping that constant and running a new audit pass. Ops are serialized as a tagged JSON union keyed on kind, for example { "kind": "navigate", "url": "https://example.com", "wait_until_loaded": true }.
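The tagged-union shape can be illustrated with a small, dependency-free sketch. The three-variant `BrowserOp` subset and the hand-written `to_json` method below are assumptions made for illustration; the real enum in wcore-browser/src/op.rs has 18 variants and may well rely on a serialization library rather than manual formatting.

```rust
/// Hypothetical three-variant subset of the locked op enum (the real one has 18).
#[derive(Debug, Clone, PartialEq)]
pub enum BrowserOp {
    Navigate { url: String, wait_until_loaded: bool },
    GetState,
    Back,
}

/// Escape the characters that may not appear raw inside a JSON string.
fn json_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl BrowserOp {
    /// The value of the `kind` discriminator for this variant.
    pub fn kind(&self) -> &'static str {
        match self {
            BrowserOp::Navigate { .. } => "navigate",
            BrowserOp::GetState => "get_state",
            BrowserOp::Back => "back",
        }
    }

    /// Render the op as a JSON object whose `kind` field selects the variant.
    pub fn to_json(&self) -> String {
        match self {
            BrowserOp::Navigate { url, wait_until_loaded } => format!(
                "{{\"kind\":\"{}\",\"url\":\"{}\",\"wait_until_loaded\":{}}}",
                self.kind(),
                json_escape(url),
                wait_until_loaded
            ),
            // Field-less variants carry only the discriminator.
            _ => format!("{{\"kind\":\"{}\"}}", self.kind()),
        }
    }
}
```

Because the discriminator lives inside the object, a consumer can dispatch on `kind` before looking at any other field, which is what makes a locked variant count straightforward to audit.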

| Op | Kind tag | Description |
| --- | --- | --- |
| Navigate | navigate | Navigate to a URL. wait_until_loaded: bool (default true). |
| Snapshot | snapshot | ARIA-tree snapshot of the current page. Mints fresh @eN element refs. |
| Read | read | Readability-style markdown extraction. mode: main_content / article / raw. |
| Click | click | Click an element by its @eN ref from the most recent snapshot. |
| Fill | fill | Type text into an input field identified by ref. |
| Press | press | Press a single key by name, e.g. "Enter", "Tab", "Escape". |
| Select | select | Choose a <select> option by value. |
| Upload | upload | File upload via a file input. Path is confined to the operator’s downloads root; .. traversal and symlink escapes are rejected before the op reaches any backend. |
| Download | download | Download a URL to dest_path. Same path confinement as Upload. |
| Screenshot | screenshot | Capture the current viewport or full page. |
| GetState | get_state | Return the current URL and page title without touching the DOM. |
| WaitFor | wait_for | Wait until a CSS selector or ARIA role appears. Takes timeout_ms. |
| NetworkLog | network_log | Dump the per-session network request log. |
| Console | console | Dump the per-session browser console log. |
| NewTab | new_tab | Open a new tab, optionally at a URL. |
| CloseTab | close_tab | Close the current tab. |
| Back | back | Navigate back one entry in the tab’s history. |
| Forward | forward | Navigate forward one entry in the tab’s history. |
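The path confinement described for Upload and Download could look roughly like the following. This is a minimal sketch, not the crate's actual check: the function names `confine_to_root` and `confine_resolved` are hypothetical, and rejecting absolute paths outright is an assumption. The first function is purely lexical; the second adds a symlink check by canonicalizing, which only works for paths that already exist.

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically join `requested` onto `root`, refusing anything that could climb out.
/// Hypothetical helper; absolute paths are rejected here as an assumption.
pub fn confine_to_root(root: &Path, requested: &Path) -> Result<PathBuf, String> {
    let mut resolved = root.to_path_buf();
    for component in requested.components() {
        match component {
            Component::Normal(part) => resolved.push(part),
            Component::CurDir => {}
            Component::ParentDir => {
                return Err(format!("path traversal rejected: {}", requested.display()));
            }
            Component::RootDir | Component::Prefix(_) => {
                return Err(format!("absolute path rejected: {}", requested.display()));
            }
        }
    }
    Ok(resolved)
}

/// Symlink-aware variant: canonicalize both sides and require the result to stay
/// under the root. `canonicalize` fails for paths that do not exist yet, so a
/// download destination would need its parent directory checked instead.
pub fn confine_resolved(root: &Path, requested: &Path) -> Result<PathBuf, String> {
    let candidate = confine_to_root(root, requested)?;
    let real_root = root.canonicalize().map_err(|e| e.to_string())?;
    let real_path = candidate.canonicalize().map_err(|e| e.to_string())?;
    if real_path.starts_with(&real_root) {
        Ok(real_path)
    } else {
        Err(format!("symlink escape rejected: {}", requested.display()))
    }
}
```

Running the lexical check first means an obviously hostile path such as `../etc/passwd` is refused without touching the filesystem at all.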

There is no arbitrary-JavaScript execution op. The omission is a deliberate locked-surface decision from design §5.16 (REV-2 audit F6 lock). The ARIA-tree-first approach means the model interacts with semantic element refs rather than DOM positions or XPath, which is both more stable and avoids the attack surface that eval-style ops introduce.

Three backends are available. The engine picks one based on hints and environment, in this order:

  1. Browserbase (cloud): selected when ProviderHint::Browserbase is set, BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID are both present, and allow_cloud: true is set in the tool spec. Requires the browserbase Cargo feature.
  2. Chromium: selected when ProviderHint::Chromium is set. Requires the chromium Cargo feature; uses chromiumoxide CDP.
  3. Camoufox (default): used when no other hint matches, that is, with WAYLAND_BROWSER_HINT=camoufox or the default ProviderHint::Auto. Talks to a sidecar process over HTTP at localhost:9377. The sidecar wraps a privacy-hardened Firefox fork.

# config.toml - switch to Browserbase when the env keys are set
[browser]
hint = "browserbase"
allow_cloud = true

# Or via env (takes precedence over config)
WAYLAND_BROWSER_HINT=chromium
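The selection order above can be sketched as a pure function. Everything here is illustrative: the `Backend` and `SelectionInput` types, the `ProviderHint::Camoufox` variant, and the treatment of unrecognized hint strings as `Auto` are assumptions, and Cargo feature gating is left out. Falling back to Camoufox when a Browserbase hint is set but its preconditions are missing is inferred from the ordering, not confirmed by the source.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProviderHint { Auto, Browserbase, Chromium, Camoufox }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend { Browserbase, Chromium, Camoufox }

/// Hypothetical bundle of the facts the selection depends on.
pub struct SelectionInput {
    pub hint: ProviderHint,
    pub has_api_key: bool,    // BROWSERBASE_API_KEY present
    pub has_project_id: bool, // BROWSERBASE_PROJECT_ID present
    pub allow_cloud: bool,    // allow_cloud: true in the tool spec
}

/// Map a raw hint string to a hint; unknown values become Auto (an assumption).
fn parse_hint(raw: &str) -> ProviderHint {
    match raw.trim().to_ascii_lowercase().as_str() {
        "browserbase" => ProviderHint::Browserbase,
        "chromium" => ProviderHint::Chromium,
        "camoufox" => ProviderHint::Camoufox,
        _ => ProviderHint::Auto,
    }
}

/// The environment value, when present, takes precedence over the config value.
pub fn resolve_hint(env_value: Option<&str>, config_value: Option<&str>) -> ProviderHint {
    env_value.or(config_value).map(parse_hint).unwrap_or(ProviderHint::Auto)
}

/// Apply the documented order: Browserbase, then Chromium, then Camoufox.
pub fn select_backend(input: &SelectionInput) -> Backend {
    let cloud_ready = input.has_api_key && input.has_project_id && input.allow_cloud;
    if input.hint == ProviderHint::Browserbase && cloud_ready {
        return Backend::Browserbase;
    }
    if input.hint == ProviderHint::Chromium {
        return Backend::Chromium;
    }
    Backend::Camoufox
}
```

Requiring all three Browserbase conditions together keeps the cloud backend opt-in: a stray API key in the environment is not enough to send traffic off the machine.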

The Camoufox backend communicates via HTTP using EgressClient, so all sidecar traffic passes through the egress gate. Any 3xx redirect from the sidecar is re-checked against the URL policy before the browser follows it (the reqwest_redirect_policy() hook, which caps redirect chains at 10 hops).

Every URL the model supplies - including redirect targets - is evaluated by BrowserPolicy (wcore-browser/src/policy.rs) before any network I/O happens.

Hard-blocked (always-on, no operator override)

The following are blocked unconditionally regardless of allow- or deny-list configuration:

  • RFC 1918 private ranges: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
  • Loopback: 127.0.0.0/8, localhost, *.localhost, ::1
  • Cloud metadata endpoint: 169.254.169.254 (shared by AWS, GCP, Azure, OpenStack)
  • Link-local IPv4: 169.254.0.0/16
  • IPv6 unique-local: fc00::/7
  • IPv6 link-local: fe80::/10
  • IPv6 multicast: ff00::/8
  • IPv4-mapped IPv6 literals, e.g. ::ffff:169.254.169.254, where the embedded v4 address hits any of the above
  • Legacy IPv4 encodings that bypass strict parsers: octal (0177.0.0.1), hex (0x7f.0.0.1), two-octet (127.1), and decimal-overflow (2130706433) forms - all normalized via parse_ipv4_loose() before the block check
  • RFC 6598 CGN range: 100.64.0.0/10
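To make the list concrete, here is a rough sketch of a loose IPv4 parser followed by a range check, using only the standard library. It is not the crate's `parse_ipv4_loose()`; the names `parse_ipv4_loose_sketch`, `is_hard_blocked_v4`, and `is_hard_blocked` are hypothetical, and the parsing rules follow the classic `inet_aton` convention (octal with a leading 0, hex with a leading 0x, last part filling the remaining bytes) as an assumption about what the real function accepts.

```rust
use std::net::{IpAddr, Ipv4Addr};

/// Parse one dotted part as hex (0x..), octal (leading 0), or decimal.
fn parse_part(part: &str) -> Option<u32> {
    let (digits, radix) = if let Some(hex) = part.strip_prefix("0x").or_else(|| part.strip_prefix("0X")) {
        (hex, 16)
    } else if part.len() > 1 && part.starts_with('0') {
        (&part[1..], 8)
    } else {
        (part, 10)
    };
    // Reject empty parts and stray signs, which from_str_radix would otherwise accept.
    if digits.is_empty() || !digits.chars().all(|c| c.is_digit(radix)) {
        return None;
    }
    u32::from_str_radix(digits, radix).ok()
}

/// inet_aton-style parse: 1 to 4 parts, the last part fills the remaining bytes.
pub fn parse_ipv4_loose_sketch(host: &str) -> Option<Ipv4Addr> {
    let parts: Vec<&str> = host.split('.').collect();
    if parts.len() > 4 {
        return None;
    }
    let mut nums = Vec::with_capacity(parts.len());
    for part in &parts {
        nums.push(parse_part(part)?);
    }
    let (last, leading) = nums.split_last()?;
    let mut value: u32 = 0;
    for &n in leading {
        if n > 0xff {
            return None; // every leading part is exactly one octet
        }
        value = (value << 8) | n;
    }
    let remaining_bits = 8 * (4 - leading.len()) as u32;
    if remaining_bits == 32 {
        return Some(Ipv4Addr::from(*last)); // single-integer form
    }
    if *last >= (1u32 << remaining_bits) {
        return None;
    }
    Some(Ipv4Addr::from((value << remaining_bits) | *last))
}

/// IPv4 ranges from the list above.
pub fn is_hard_blocked_v4(ip: Ipv4Addr) -> bool {
    let o = ip.octets();
    ip.is_private()                             // 10/8, 172.16/12, 192.168/16
        || ip.is_loopback()                     // 127/8
        || ip.is_link_local()                   // 169.254/16, includes the metadata endpoint
        || (o[0] == 100 && (o[1] & 0xc0) == 64) // 100.64.0.0/10 (CGN)
}

/// Full check, unwrapping IPv4-mapped IPv6 literals before testing.
pub fn is_hard_blocked(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_hard_blocked_v4(v4),
        IpAddr::V6(v6) => {
            if let Some(mapped) = v6.to_ipv4_mapped() {
                return is_hard_blocked_v4(mapped);
            }
            let first = v6.segments()[0];
            v6.is_loopback()                  // ::1
                || (first & 0xfe00) == 0xfc00 // fc00::/7 unique-local
                || (first & 0xffc0) == 0xfe80 // fe80::/10 link-local
                || (first & 0xff00) == 0xff00 // ff00::/8 multicast
        }
    }
}
```

Normalizing before checking matters because some HTTP stacks still hand strings like `0177.0.0.1` to a permissive resolver; a check that only understands strict dotted-quad notation would treat that string as a hostname and let it through.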

Only http and https pass. Everything else is refused at the gate:

javascript: data: blob: file: ftp: gopher: view-source: ...

Three fields in BrowserPolicy are operator-configurable:

| Field | Type | Behaviour |
| --- | --- | --- |
| allowed_origins | Vec<String> | Suffix-glob list (*.example.com). When non-empty, only matching origins pass; everything else falls through to default_action. |
| denied_origins | Vec<String> | Suffix-glob list. Always wins over the allow list. |
| default_action | PolicyAction | deny (default, fail-closed since v0.2.1) / allow / ask |

The ask action routes the URL to Suspend, which triggers an S4 HITL (human-in-the-loop) approval event. The operator-facing host must handle the suspend and resume the op.

# Minimal allow-only-example.com policy
[browser.policy]
default_action = "deny"
allowed_origins = ["*.example.com", "example.com"]
denied_origins = []

The fail-closed default (default_action = "deny") means a fresh install with no allowed_origins configured will refuse every outbound Browse request. Set default_action = "allow" when you want pass-through with only the hard-blocked ranges enforced.
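The precedence among the scheme gate, the deny list, the allow list, and the default action can be sketched as follows. This is an illustration under stated assumptions, not the code in wcore-browser/src/policy.rs: `BrowserPolicySketch`, `glob_matches`, and `evaluate` are hypothetical names, the glob semantics (a `*.` prefix matches subdomains only, so the bare domain needs its own entry) are inferred from the example config, and the hard-blocked IP check is omitted. The caller is assumed to have already split the URL into scheme and host.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PolicyAction { Deny, Allow, Ask }

/// Hypothetical stand-in for the three operator-configurable fields.
pub struct BrowserPolicySketch {
    pub allowed_origins: Vec<String>,
    pub denied_origins: Vec<String>,
    pub default_action: PolicyAction,
}

/// Suffix glob: "*.example.com" matches any subdomain, anything else must match exactly.
fn glob_matches(pattern: &str, host: &str) -> bool {
    let pattern = pattern.to_ascii_lowercase();
    let host = host.to_ascii_lowercase();
    match pattern.strip_prefix("*.") {
        Some(suffix) => {
            // Require a non-empty label in front of ".suffix".
            host.len() > suffix.len() + 1 && host.ends_with(&format!(".{}", suffix))
        }
        None => host == pattern,
    }
}

impl BrowserPolicySketch {
    /// Decide what to do with a URL that has already been split into scheme and host.
    pub fn evaluate(&self, scheme: &str, host: &str) -> PolicyAction {
        // Only http and https pass the scheme gate.
        if !scheme.eq_ignore_ascii_case("http") && !scheme.eq_ignore_ascii_case("https") {
            return PolicyAction::Deny;
        }
        // The deny list always wins over the allow list.
        if self.denied_origins.iter().any(|p| glob_matches(p, host)) {
            return PolicyAction::Deny;
        }
        if self.allowed_origins.iter().any(|p| glob_matches(p, host)) {
            return PolicyAction::Allow;
        }
        // Everything else falls through to the configured default.
        self.default_action
    }
}
```

Matching on `.suffix` rather than on the bare suffix is what stops `evil-example.com` from satisfying `*.example.com`.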

When a backend resolves a hostname to an IP address, check_resolved_host(host, ip) pins the first-seen IP for that hostname in an in-memory cache (TOFU - trust-on-first-use). Any subsequent resolution of the same hostname that returns a different IP is refused:

DNS rebinding refused: foo.example.com resolved to 127.0.0.1,
first-seen resolve was 203.0.113.5

This defends against attacks that swap a benign initial resolution for a private or metadata target after the URL policy check has already passed. The TOFU cache is per-policy instance and is cleared when the policy is dropped.

The same IP block rules that apply to URL literals also apply to resolved IPs: resolving foo.example.com to 10.0.0.1 fails even on first resolution, before any TOFU entry is written.
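The pinning behaviour described above can be modelled with a small in-memory map. This is a sketch under assumptions, not the real `check_resolved_host`: the `TofuPins` type is hypothetical, hostnames are lowercased before lookup as an assumption, and `is_blocked_sketch` covers only a few of the hard-blocked ranges to keep the example short.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Abbreviated stand-in for the full hard-block check (assumption: the real one
/// covers every range in the hard-blocked list).
fn is_blocked_sketch(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => v4.is_private() || v4.is_loopback() || v4.is_link_local(),
        IpAddr::V6(v6) => v6.is_loopback(),
    }
}

/// Hypothetical trust-on-first-use cache: hostname to first-seen IP.
#[derive(Default)]
pub struct TofuPins {
    first_seen: HashMap<String, IpAddr>,
}

impl TofuPins {
    /// Accept the first resolution of a host, refuse any later change.
    pub fn check_resolved_host(&mut self, host: &str, ip: IpAddr) -> Result<(), String> {
        // Blocked IPs fail before any pin is written.
        if is_blocked_sketch(ip) {
            return Err(format!("resolved IP blocked: {} resolved to {}", host, ip));
        }
        let key = host.to_ascii_lowercase();
        match self.first_seen.get(&key).copied() {
            Some(pinned) if pinned != ip => Err(format!(
                "DNS rebinding refused: {} resolved to {}, first-seen resolve was {}",
                host, ip, pinned
            )),
            Some(_) => Ok(()),
            None => {
                self.first_seen.insert(key, ip);
                Ok(())
            }
        }
    }
}
```

Since the map lives inside the struct, dropping the struct discards every pin, which mirrors the per-policy-instance lifetime described above. One trade-off of strict pinning is that hosts behind rotating DNS records will be refused after their first rotation.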

When the capabilities.browser_suite flag is set on a JSON-stream session, the engine emits two typed events:

  • BrowserEvent - per-completed-op trail carrying op kind, URL (when applicable), and a human-readable summary.
  • BrowserPolicyDenied - emitted when a URL is blocked, carrying the blocked URL and the denial reason.