# Architecture: LLM Backends

> How Cortex selects and talks to AI models.
> Last updated: 2026-06-18

---

## Providers

Cortex supports two model types, each dispatched differently:

| Type | Auth | Use |
|---|---|---|
| `local_openai` | API key per host in model registry | Open WebUI, Ollama, OpenRouter, LiteLLM, any OpenAI-compatible endpoint |
| `anthropic_api` | API key in model registry (Anthropic cloud provider) | Claude models via Anthropic SDK |

The Gemini API (`gemini_api`) is a third type used exclusively by the orchestrator engine. It is not dispatched through `llm_client.py` and is not available for chat/distill roles.

---

## Backend Selection

### Default: Role-Based Routing (Auto)

All routing goes through the user's model registry. When a request arrives, `complete()` in `llm_client.py` resolves the model for the given role:

```
slot specified    → resolve that exact slot (primary / backup_1 / backup_2)
no slot           → get_model_for_role(username, role)
no registry entry → RuntimeError: "No model configured for role '...'"
```

Roles: `chat`, `orchestrator`, `distill`, `janitor`, `coder`, `research` (extensible via `DEFINED_ROLES` in `.env`).

There is no implicit fallback to a built-in model. If no model is configured for a role, the request fails with a clear error directing the user to `/settings/models`.

### Explicit Slot Selection

The **Role** toggle in the Context & Memory panel cycles through configured role slots: **Primary → Backup 1 → auto**. Each slot resolves the configured model for that position.

When a model is explicitly configured (via slot or registry entry), errors surface immediately, with no silent fallback to another backend.

---

## Model Registry Schema

Per-user configuration stored in `home/{user}/model_registry.json`. Managed at **Settings → Models** (`/settings/models`).

```json
{
  "version": 2,
  "providers": {
    "anthropic": {
      "credentials": [
        {"id": "key1", "label": "My Anthropic Key", "type": "api_key", "api_key": "sk-ant-..."}
      ]
    },
    "google": {
      "accounts": [
        {"id": "a1b2", "label": "One Sky IT", "api_key": "AIza..."}
      ]
    }
  },
  "hosts": [
    {
      "id": "abc123",
      "label": "OpenRouter",
      "api_url": "https://openrouter.ai/api/v1",
      "api_key": "sk-or-...",
      "host_type": "openai"
    },
    {
      "id": "def456",
      "label": "Gaming Laptop",
      "api_url": "http://192.168.x.x:3000",
      "api_key": "",
      "host_type": "openwebui"
    }
  ],
  "models": [
    {
      "id": "m1",
      "type": "local_openai",
      "label": "Claude Sonnet 4.6 (OpenRouter)",
      "model_name": "anthropic/claude-sonnet-4-6",
      "host_id": "abc123",
      "context_k": 200,
      "tags": ["chat", "persona"]
    },
    {
      "id": "m2",
      "type": "anthropic_api",
      "label": "Claude Sonnet 4.6 (Direct)",
      "model_name": "claude-sonnet-4-6",
      "provider": "anthropic",
      "credential_id": "key1",
      "context_k": 200,
      "tags": ["chat"]
    },
    {
      "id": "m3",
      "type": "local_openai",
      "label": "Gemma 4 E4B",
      "model_name": "gemma4:e4b",
      "provider": "local",
      "host_id": "def456",
      "context_k": 72,
      "max_rounds": 5,
      "tools": true,
      "tags": ["fast", "local"]
    }
  ],
  "roles": {
    "chat": {"primary": "m1", "backup_1": "m2"},
    "orchestrator": {"primary": "m2"},
    "distill": {"primary": "m1"}
  }
}
```

### Optional model fields

| Field | Type | Default | Meaning |
|---|---|---|---|
| `context_k` | int | 32 | Context window in thousands of tokens. Used for compaction budget (75% of window). |
| `max_rounds` | int \| null | null | Per-model tool loop cap. `null` = use global `orchestrator_max_rounds`. Effective limit = `min(per_model, global)`. |
| `tools` | bool | true | Whether this model supports tool calling. `false` = skip tool loop entirely; model gets a plain chat request. |

### host_type (local hosts)

| `host_type` | Chat endpoint | Models endpoint | Use for |
|---|---|---|---|
| `openwebui` (default) | `POST {url}/api/chat/completions` | `GET {url}/api/models` | Open WebUI, Ollama |
| `openai` | `POST {url}/chat/completions` | `GET {url}/models` | OpenRouter, LiteLLM, Anthropic-compat |

Set `api_url` to the base path before `/chat/completions`:

- OpenRouter: `https://openrouter.ai/api/v1`

---

## Local/OpenAI-Compatible Backend (`_local()`)

HTTP POST to an OpenAI-compatible endpoint. Model config is resolved via the model registry.

```python
# host_type "openwebui": POST {api_url}/api/chat/completions
# host_type "openai":    POST {api_url}/chat/completions
```

System prompt is sent as the first `{"role": "system", "content": "..."}` message. Image attachments are injected into the last user message as `image_url` content blocks. Token usage is recorded when returned by the endpoint.

Streaming variant: `_local_streaming()` (SSE line-by-line, yields tokens via `token_sink`).

Timeout: `TIMEOUT_LOCAL=300` seconds (`.env`), since local models may need to load from disk.

---

## Anthropic API Backend (`_anthropic_api()`)

Direct call to the Anthropic Messages API via the `anthropic` Python SDK.

System prompt passed as top-level `system` field. Messages stripped to `role`/`content` only. Token usage is always recorded from `resp.usage`.

Streaming variant: `_anthropic_api_streaming()` (uses `client.messages.stream()`, yields tokens via `token_sink`).

API key comes from the model registry: `providers.anthropic.credentials[n].api_key`.

Timeout: governed by httpx defaults and the Anthropic SDK's own connection handling.

---

## Gemini API (Orchestrator only)

Used by `orchestrator_engine.py` for the ReAct tool loop. Not dispatched through `llm_client.py` and not available for chat, distill, or other roles.

API key resolution order:

1. `api_key` embedded in the resolved orchestrator model config (V2 registry with `account_id`)
2. `get_user_gemini_key(user)`: reads from `auth.json` (legacy, kept for compat)
3. `GEMINI_API_KEY` in `.env` (server default)

---

## Distillation

Memory distillation uses `role="distill"`. Configure via Model Registry → Role Assignments. Any `local_openai` or `anthropic_api` model can be assigned to the distill role.

---

## Code locations

| File | Responsibility |
|---|---|
| `cortex/llm_client.py` | `complete()`: routing, dispatch, fallback |
| `cortex/model_registry.py` | Per-user registry CRUD and resolution (V2) |
| `cortex/routers/local_llm.py` | Settings UI routes + `/api/models/role` AJAX |
| `cortex/routers/chat.py` | `_backend_label()`, `fallback_used` flag |
| `cortex/routers/orchestrator.py` | Engine selection, Gemini API key resolution |
| `cortex/config.py` | `ROLE_*` env defaults, `DEFINED_ROLES`, `TIMEOUT_LOCAL` |
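
---

## Example: Registry Resolution Sketch

The rules described above (slot resolution, the `host_type` endpoint table, and the `max_rounds` and `context_k` fields) can be illustrated with a small standalone sketch. This is not the code in `llm_client.py` or `model_registry.py`: the function names `resolve_model`, `chat_url`, `effective_max_rounds`, and `compaction_budget_tokens` are hypothetical, and the sketch assumes that "auto" resolves to the `primary` slot, that an unassigned explicit slot raises the same error as an unconfigured role, and that `context_k` counts units of 1000 tokens.

```python
from typing import Optional

SLOTS = ("primary", "backup_1", "backup_2")


def resolve_model(registry: dict, role: str, slot: Optional[str] = None) -> dict:
    """Return the model entry assigned to a role (hypothetical helper).

    Assumption: with no slot given, the primary slot is used.
    """
    wanted = slot or "primary"
    if wanted not in SLOTS:
        raise ValueError(f"Unknown slot '{wanted}'")
    model_id = registry.get("roles", {}).get(role, {}).get(wanted)
    models = {m["id"]: m for m in registry.get("models", [])}
    if model_id is None or model_id not in models:
        # No built-in fallback: an unconfigured role is an error.
        raise RuntimeError(f"No model configured for role '{role}'")
    return models[model_id]


def chat_url(host: dict) -> str:
    """Build the chat endpoint for a host; host_type defaults to openwebui."""
    base = host["api_url"].rstrip("/")
    if host.get("host_type", "openwebui") == "openai":
        return f"{base}/chat/completions"
    return f"{base}/api/chat/completions"


def effective_max_rounds(model: dict, global_max: int) -> int:
    """Effective tool loop cap: min(per_model, global); null means global."""
    per_model = model.get("max_rounds")
    return global_max if per_model is None else min(per_model, global_max)


def compaction_budget_tokens(model: dict) -> int:
    """75% of the context window; context_k defaults to 32.

    Assumption: one unit of context_k is 1000 tokens.
    """
    return model.get("context_k", 32) * 1000 * 3 // 4
```

With the registry shown in the schema section, `resolve_model(registry, "chat", "backup_1")` would return the `m2` entry, while a role with no assignment, such as `janitor`, would raise the `RuntimeError`.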