# Architecture: LLM Backends > How Cortex selects and talks to AI models. > Last updated: 2026-04-06 --- ## Backends | Backend | Type | Auth | Notes | |---|---|---|---| | **Claude CLI** | `claude_cli` | OAuth token from `~/.claude/.credentials.json` | Primary chat; model set via `DEFAULT_MODEL` in `.env` | | **Gemini CLI** | `gemini_cli` | Gemini CLI credentials | Fallback / explicit selection | | **Gemini API** | `gemini_api` | `GEMINI_API_KEY` in `.env` | Orchestrator tool loop only — not general chat | | **Local (OpenAI-compat)** | `local_openai` | API key per host in model registry | Open WebUI, Ollama, OpenRouter, LiteLLM, etc. | --- ## Backend Selection ### Default: Role-Based Routing (Auto) When no explicit backend is selected, Cortex routes to the model configured for the request's **role** in the user's model registry. Roles: `chat`, `orchestrator`, `distill`, `coder`, `research` (extensible via `DEFINED_ROLES` in `.env`). Resolution order for a role: 1. User registry: `roles[role].primary → backup_1 → backup_2 → backup_3 → backup_4` 2. `.env` role default: `ROLE_CHAT=claude_cli`, `ROLE_DISTILL=gemini_api`, etc. 3. Hardcoded last-resort: `chat/distill/coder → claude_cli`, `orchestrator/research → gemini_api` ### Explicit Override The UI backend toggle cycles: **auto → claude → gemini → local → auto** - **auto** (default): role-based routing as above; sends `model: null` to `/chat` - **claude / gemini / local**: bypasses role routing; forces that specific backend - When "local" is active, the configured model name appears below the toggle button **Fallback chain** (automatic, on any error): ``` claude → gemini gemini → claude local → claude ``` Each response includes a model label (bottom-right of the message bubble) showing what actually responded. Amber label with `⚡` = fallback was used. Auth expiry on Claude triggers a UI banner + `claude_auth_expired` SSE event. --- ## Model Registry Per-user configuration stored in `home/{user}/model_registry.json`. Hosts and models are managed at **Settings → Model Registry** (`/settings/local`). ### Schema ```json { "version": 1, "hosts": [ { "id": "abc123", "label": "Home ML Laptop", "api_url": "http://192.168.x.x:3000", "api_key": "sk-...", "host_type": "openwebui" } ], "models": [ { "id": "def456", "type": "local_openai", "label": "Gemma Medium", "model_name": "agent-support-gemma-medium", "host_id": "abc123", "context_k": 50, "tags": ["chat", "fast"] } ], "roles": { "chat": { "primary": "def456", "backup_1": "claude_cli" } } } ``` ### host_type Controls which API path layout is used: | `host_type` | Chat endpoint | Models endpoint | Use for | |---|---|---|---| | `openwebui` (default) | `POST {url}/api/chat/completions` | `GET {url}/api/models` | Open WebUI, Ollama | | `openai` | `POST {url}/chat/completions` | `GET {url}/models` | OpenRouter, LiteLLM, Anthropic-compat | Set `api_url` to the base path ending just before `/chat/completions`: - OpenRouter: `https://openrouter.ai/api/v1` - LiteLLM proxy: `http://host:port` ### Built-in model IDs Always resolvable without a registry entry: | ID | Backend | |---|---| | `claude_cli` | Claude CLI subprocess | | `gemini_cli` | Gemini CLI subprocess | | `gemini_api` | Gemini API (SDK) — orchestrator only | --- ## Claude Backend (`_claude()`) Runs `claude --print --no-session-persistence --output-format text` as a subprocess. - System prompt passed via `--system-prompt` - Conversation history formatted as `` block - Token read live from `~/.claude/.credentials.json` on every call — never relies on the env var, which goes stale after `claude auth login` - Model override via `--model` flag when a specific `model_name` is configured in the registry Timeout: `TIMEOUT_CLAUDE=60` seconds (`.env`) --- ## Gemini CLI Backend (`_gemini()`) Runs `gemini --output-format text --extensions "" -p ` as a subprocess. - `--extensions ""` disables all MCP extensions — prevents child processes keeping pipes open - `start_new_session=True` puts the process in its own group for clean `os.killpg` on timeout - Output is cleaned to strip CLI noise lines (loading messages, retry notices, quota warnings) Timeout: `TIMEOUT_GEMINI=120` seconds (`.env`) --- ## Local Backend (`_local()`) HTTP POST to an OpenAI-compatible endpoint. Model config is resolved via the model registry. ```python # host_type "openwebui": POST {api_url}/api/chat/completions # host_type "openai": POST {api_url}/chat/completions ``` Timeout: `TIMEOUT_LOCAL=300` seconds (`.env`) — local models may need to load from disk. --- ## Distillation Memory distillation uses `role="distill"` for mid and long passes. Configure the distill model via the Model Registry → Role Assignments → Distill role. `.env` override: `ROLE_DISTILL=claude_cli` (default). Set to any built-in ID or leave blank to fall through to the hardcoded default (`claude_cli`). --- ## Code locations | File | Responsibility | |---|---| | `cortex/llm_client.py` | `complete()` — routing, dispatch, fallback | | `cortex/model_registry.py` | Per-user registry CRUD and resolution | | `cortex/routers/local_llm.py` | Settings UI routes + `/api/models/role` AJAX | | `cortex/routers/chat.py` | `_backend_label()`, `fallback_used` flag | | `cortex/config.py` | `ROLE_*` env defaults, `DEFINED_ROLES`, `PRIMARY_BACKEND` |