Self-Hosted Endpoints

Wayland Core provides two paths for local and self-hosted inference: the openai-compatible catch-all adapter for any server that speaks the OpenAI /chat/completions wire format, and the ollama plugin for Ollama’s native NDJSON API. A third path, the data-driven provider catalog, extends the built-in provider list without requiring a hand-written match arm.

wcore-providers/src/openai_compatible.rs implements OpenAICompatibleProvider, a thin wrapper over OpenAIProvider. The only meaningful difference from the named providers is that you must supply an explicit base_url; there is no default. Registering with an empty or whitespace-only base_url is rejected at startup with a RegistryError::EmptyId so the misconfiguration surfaces before any request is sent.

The provider is registered under the lowercase id "openai-compatible".
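
The startup check described above can be sketched as follows. This is a minimal illustration, not the crate's code: the `RegistryError` enum here is a local stand-in that models only the `EmptyId` variant named in the text, and `validate_base_url` is a hypothetical helper name.

```rust
/// Hypothetical stand-in for the crate's registry error type; only the
/// variant named in the text above is modeled here.
#[derive(Debug, PartialEq)]
pub enum RegistryError {
    EmptyId,
}

/// Reject an empty or whitespace-only base URL before any provider is built.
/// (Hypothetical helper; the real check lives somewhere in registration.)
pub fn validate_base_url(base_url: &str) -> Result<&str, RegistryError> {
    let trimmed = base_url.trim();
    if trimmed.is_empty() {
        Err(RegistryError::EmptyId)
    } else {
        Ok(trimmed)
    }
}
```

Failing at registration rather than at request time means a typo in the config shows up immediately, not after the first prompt is submitted.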

Some self-hosted servers do not require authentication. The underlying OpenAIProvider always sends an Authorization: Bearer <key> header, but servers that do not authenticate simply ignore it. Pass the string "no-key" as the API key in those cases:

[providers.my-local]
provider = "openai-compatible"
base_url = "http://localhost:8080/v1"
api_key = "no-key"

Or from the CLI:

wayland-core \
--provider openai-compatible \
--base-url http://localhost:8080/v1 \
--api-key no-key \
"Summarize this file"

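The reason a placeholder key works can be shown with a small sketch. It assumes only what the text states, namely that the header is always built from whatever key is supplied; the function name `authorization_header` is hypothetical.

```rust
/// Build the header pair sent with every request. The key is not inspected:
/// a placeholder such as "no-key" is forwarded like any other value, and a
/// server without authentication simply ignores it.
/// (Hypothetical helper name, for illustration only.)
pub fn authorization_header(api_key: &str) -> (&'static str, String) {
    ("Authorization", format!("Bearer {}", api_key))
}
```
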
vLLM's OpenAI-compatible server listens on port 8000 by default and serves the API under /v1. Point base_url at it:

[providers.vllm]
provider = "openai-compatible"
base_url = "http://localhost:8000/v1"
api_key = "no-key"
model = "meta-llama/Meta-Llama-3-8B-Instruct"

From the shell:

# Start vLLM (example):
vllm serve meta-llama/Meta-Llama-3-8B-Instruct --port 8000
# Use it:
wayland-core \
--provider openai-compatible \
--base-url http://localhost:8000/v1 \
--api-key no-key \
--model meta-llama/Meta-Llama-3-8B-Instruct \
"Refactor this function"

llama.cpp’s HTTP server (the llama-server binary in current builds, ./server in older ones) serves OpenAI-compatible chat completions under /v1:

[providers.llamacpp]
provider = "openai-compatible"
base_url = "http://localhost:8081/v1"
api_key = "no-key"
model = "local"

From the shell:

# Start llama.cpp (example; older builds ship the binary as ./server):
./llama-server -m model.gguf --port 8081
# Use it:
wayland-core \
--provider openai-compatible \
--base-url http://localhost:8081/v1 \
--api-key no-key \
--model local \
"Write a test for this function"

LM Studio’s local server listens on port 1234 by default and serves the OpenAI chat completions format:

[providers.lmstudio]
provider = "openai-compatible"
base_url = "http://localhost:1234/v1"
api_key = "no-key"

Or from the CLI:

wayland-core \
--provider openai-compatible \
--base-url http://localhost:1234/v1 \
--api-key no-key \
--model "<model-name-from-lm-studio>" \
"Review this PR"

The ollama plugin crate (wayland-ollama) implements LlmProvider over Ollama’s native POST /api/chat NDJSON endpoint. It is not a wrapper over OpenAIProvider.
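
NDJSON means one JSON object per line, so a streaming client has to reassemble lines from arbitrary network chunks before parsing each record. The sketch below shows only that framing step, assuming chunks arrive as valid UTF-8 text; `NdjsonBuffer` is a hypothetical type, not the plugin's implementation, and JSON parsing of each record is left out.

```rust
/// Accumulates raw response chunks and yields complete NDJSON records.
/// (Hypothetical type for illustration.)
#[derive(Default)]
pub struct NdjsonBuffer {
    pending: String,
}

impl NdjsonBuffer {
    /// Append a chunk and return every complete, non-empty line it finishes.
    /// A trailing partial line stays buffered until its newline arrives.
    pub fn push(&mut self, chunk: &str) -> Vec<String> {
        self.pending.push_str(chunk);
        let mut records = Vec::new();
        while let Some(pos) = self.pending.find('\n') {
            // Remove the line, including its newline, from the buffer.
            let line: String = self.pending.drain(..=pos).collect();
            let line = line.trim();
            if !line.is_empty() {
                records.push(line.to_string());
            }
        }
        records
    }
}
```

Each returned string is one JSON document that can be handed to a JSON parser; in Ollama's streaming responses the final record carries a done field set to true.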

Invoke it with the ollama: model prefix:

wayland-core --model ollama:llama3
wayland-core --model ollama:mistral
wayland-core --model ollama:codestral
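
One plausible way to interpret the prefix is to split on the first colon only, so that Ollama tags containing their own colon survive intact. How Wayland Core actually parses the value is not stated here; `split_model_prefix` is a hypothetical helper written under that assumption.

```rust
/// Split "plugin:model" into (Some(plugin), model). A value without a colon,
/// or with an empty prefix, is returned unchanged with no plugin.
/// Assumption: only the first colon separates the prefix from the model.
pub fn split_model_prefix(spec: &str) -> (Option<&str>, &str) {
    match spec.split_once(':') {
        Some((prefix, model)) if !prefix.is_empty() => (Some(prefix), model),
        _ => (None, spec),
    }
}
```

Note that this naive split would treat the tag in an unprefixed value such as llama3:8b as a model under a plugin named llama3, so a real implementation would presumably check the prefix against the set of known plugin ids.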

The plugin connects to http://localhost:11434 by default. To point it at a remote Ollama instance, set base_url in a provider block:

[providers.my-ollama]
provider = "ollama"
base_url = "http://192.168.1.10:11434"
model = "llama3"

Beyond the ~20 hardcoded ProviderType arms, wcore-config/src/catalog.rs loads a bundled TOML table (data/providers.toml, compiled into the binary with include_str!) of additional OpenAI-compatible providers. The catalog holds at least 75 entries (the test suite asserts catalog.len() >= 75).

Each catalog entry has four fields, three required and one optional:

| Field | Description |
| --- | --- |
| id | CLI id for --provider <id>. Must be unique. |
| base_url | OpenAI-compatible REST root (no trailing slash). |
| env_var | Environment variable holding the API key. |
| api_path | Optional. Path suffix appended to base_url for the chat completions endpoint. Defaults to /v1/chat/completions when omitted. |
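
As a rough illustration of how these fields combine, the sketch below models one entry and derives the chat completions URL from base_url and api_path. The struct and the `chat_url` method are hypothetical; only the field names and the default suffix come from the description above.

```rust
/// Hypothetical mirror of one catalog entry; field names follow the
/// description above.
pub struct CatalogEntry {
    pub id: String,
    pub base_url: String,
    pub env_var: String,
    pub api_path: Option<String>,
}

impl CatalogEntry {
    /// Chat completions URL: base_url plus api_path, or the default suffix
    /// when api_path is not set.
    pub fn chat_url(&self) -> String {
        let suffix = self.api_path.as_deref().unwrap_or("/v1/chat/completions");
        format!("{}{}", self.base_url, suffix)
    }
}
```

Because the suffix is appended verbatim, a base_url with a trailing slash would produce a double slash, which is presumably why the catalog requires the root without one.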

wcore-providers/src/catalog.rs:register_catalog wires each entry as an OpenAIProvider factory, skipping any id already claimed by a native ProviderType arm. The ProviderCompat for each catalog entry is derived from ProviderCompat::from_catalog_entry, which stamps the entry id as the provider_type for cost attribution and applies the api_path override.
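
The skip rule can be sketched with plain collections. This is a simplified model, not the real function: the registry here maps ids to base URLs instead of provider factories, and `register_catalog_sketch` and its parameters are hypothetical names.

```rust
use std::collections::{BTreeMap, HashSet};

/// Register catalog ids that no native provider has claimed.
/// Returns the ids that were added, in catalog order.
/// (Simplified: the value stored is the base URL, not a provider factory.)
pub fn register_catalog_sketch(
    registry: &mut BTreeMap<String, String>,
    native_ids: &HashSet<&str>,
    catalog: &[(&str, &str)], // (id, base_url) pairs
) -> Vec<String> {
    let mut added = Vec::new();
    for &(id, base_url) in catalog {
        if native_ids.contains(id) || registry.contains_key(id) {
            continue; // a native arm or an earlier entry already owns this id
        }
        registry.insert(id.to_string(), base_url.to_string());
        added.push(id.to_string());
    }
    added
}
```

Letting native arms win keeps hand-written providers authoritative even if the bundled catalog later gains an entry with the same id.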

One confirmed catalog entry is novita-ai (https://api.novita.ai/openai, NOVITA_API_KEY). Use --provider novita-ai to reach it. Other catalog entries follow the same pattern; consult the bundled data/providers.toml for the full list.

The API key for a catalog-backed provider is resolved in this order, mirroring native providers:

  1. --api-key CLI flag.
  2. [providers.<name>].api_key in the config file.
  3. The entry’s env_var read from the process environment.
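
The order above can be sketched as a chain of fallbacks. `resolve_api_key` is a hypothetical function; skipping empty values at each step is an assumption, and the environment lookup is injected so the order can be exercised without touching the real process environment.

```rust
/// Pick the first non-empty key: CLI flag, then config file, then the
/// entry's environment variable. Production code could pass
/// `|name| std::env::var(name).ok()` as `env_lookup`.
/// Assumption: empty strings are treated as "not set" at every step.
pub fn resolve_api_key<F>(
    cli_key: Option<&str>,
    config_key: Option<&str>,
    env_var: &str,
    env_lookup: F,
) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    cli_key
        .filter(|k| !k.is_empty())
        .or(config_key.filter(|k| !k.is_empty()))
        .map(str::to_string)
        .or_else(|| env_lookup(env_var).filter(|k| !k.is_empty()))
}
```
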

A non-empty --base-url CLI flag or base_url in the config overrides the catalog entry’s URL.
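
A minimal sketch of that override, under two assumptions not stated above: the CLI flag is consulted before the config file (mirroring the key order), and whitespace-only values count as empty. `resolve_base_url` is a hypothetical name.

```rust
/// Prefer a non-empty override over the catalog URL.
/// Assumptions: the CLI flag wins over the config value, and values that are
/// empty after trimming are ignored.
pub fn resolve_base_url<'a>(
    cli_url: Option<&'a str>,
    config_url: Option<&'a str>,
    catalog_url: &'a str,
) -> &'a str {
    cli_url
        .map(str::trim)
        .filter(|u| !u.is_empty())
        .or_else(|| config_url.map(str::trim).filter(|u| !u.is_empty()))
        .unwrap_or(catalog_url)
}
```
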

The compat sub-table in a [providers.<name>] block lets you adjust per-provider wire behavior without touching the engine. Useful keys for self-hosted servers that diverge from strict OpenAI semantics:

[providers.my-local]
provider = "openai-compatible"
base_url = "http://localhost:8080/v1"
api_key = "no-key"
compat.max_tokens_field = "max_tokens" # some servers use max_completion_tokens
compat.merge_assistant_messages = true # merge consecutive assistant turns
compat.sanitize_schema = true # strip additionalProperties from tool schemas
compat.ensure_alternation = true # enforce user/assistant alternation

All compat.* keys are optional. Unset fields inherit the openai-compatible preset defaults.
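
The inheritance rule can be modeled as optional overrides layered on a preset. Both structs below are hypothetical and cover only the four keys shown above; the real ProviderCompat may have more fields, and the preset values used in any example are placeholders, not the actual openai-compatible defaults.

```rust
/// Hypothetical resolved settings; field names follow the keys shown above.
#[derive(Debug, Clone, PartialEq)]
pub struct Compat {
    pub max_tokens_field: String,
    pub merge_assistant_messages: bool,
    pub sanitize_schema: bool,
    pub ensure_alternation: bool,
}

/// What a config block supplies: every key is optional.
#[derive(Default)]
pub struct CompatOverrides {
    pub max_tokens_field: Option<String>,
    pub merge_assistant_messages: Option<bool>,
    pub sanitize_schema: Option<bool>,
    pub ensure_alternation: Option<bool>,
}

impl CompatOverrides {
    /// Overlay the keys that were set on a preset; unset keys keep the
    /// preset's value.
    pub fn apply(self, preset: &Compat) -> Compat {
        Compat {
            max_tokens_field: self
                .max_tokens_field
                .unwrap_or_else(|| preset.max_tokens_field.clone()),
            merge_assistant_messages: self
                .merge_assistant_messages
                .unwrap_or(preset.merge_assistant_messages),
            sanitize_schema: self.sanitize_schema.unwrap_or(preset.sanitize_schema),
            ensure_alternation: self
                .ensure_alternation
                .unwrap_or(preset.ensure_alternation),
        }
    }
}
```

Modeling each key as an Option keeps "explicitly set to false" distinct from "not set", which a plain bool with a default could not express.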