# Architecture: LLM Backends

> How Cortex selects and talks to AI models.
> Last updated: 2026-04-27 (V2 schema)

---

## Providers

Cortex supports four model types, each dispatched differently:

| Type | Auth | Use |
|---|---|---|
| `claude_cli` | OAuth token from `~/.claude/.credentials.json` | Chat, persona responses |
| `gemini_cli` | Gemini CLI credentials | Chat fallback / explicit selection |
| `gemini_api` | API key from registry account or `.env` | Orchestrator tool loop |
| `local_openai` | API key per host in model registry | Open WebUI, Ollama, OpenRouter, LiteLLM, etc. |

---

## Backend Selection

### Default: Role-Based Routing (Auto)

When no explicit backend is selected, Cortex routes to the model configured for the request's **role** in the user's model registry.

Roles: `chat`, `orchestrator`, `distill`, `coder`, `research` (extensible via `DEFINED_ROLES` in `.env`).

Resolution order for a role:

1. User registry: `roles[role].primary → backup_1 → backup_2 → backup_3 → backup_4`
2. `.env` role default: `ROLE_CHAT=claude_cli`, `ROLE_DISTILL=claude_cli`, etc.
3. Hardcoded last-resort: `chat/distill/coder → claude_cli`, `orchestrator/research → gemini_api`

### Explicit Override

The UI backend toggle cycles: **auto → claude → gemini → local → auto**

- **auto** (default): role-based routing as above
- **claude / gemini / local**: bypasses role routing; forces that backend type
- The toggle will be redesigned in Phase 3 to cycle through chat role slots (Primary / Backup 1 / Backup 2)

**Fallback chain** (automatic, only when no explicit registry entry exists):

```
claude → gemini
gemini → claude
local  → claude
```

When a model is explicitly configured in the registry, errors surface immediately; there is no silent fallback.

Each response shows a model tag (bottom-right of the message bubble) with the model label and host.

---

## Model Registry — V2 Schema

Per-user configuration stored in `home/{user}/model_registry.json`.
Managed at **Settings → Models** (`/settings/models`). Full provider UI coming in Phase 2.

```json
{
  "version": 2,
  "providers": {
    "anthropic": {
      "credentials": [
        {"id": "cli", "label": "Claude CLI (OAuth)", "type": "cli"}
      ]
    },
    "google": {
      "accounts": [
        {"id": "a1b2", "label": "One Sky IT", "api_key": "AIza..."}
      ]
    }
  },
  "hosts": [
    {
      "id": "abc123",
      "label": "Gaming Laptop",
      "api_url": "http://192.168.x.x:3000",
      "api_key": "",
      "host_type": "openwebui"
    }
  ],
  "models": [
    {
      "id": "m1",
      "type": "claude_cli",
      "label": "Sonnet 4.6 (CLI)",
      "model_name": "claude-sonnet-4-6",
      "provider": "anthropic",
      "credential_id": "cli",
      "context_k": 200,
      "tags": ["chat", "persona"]
    },
    {
      "id": "m2",
      "type": "gemini_api",
      "label": "Gemini 2.5 Flash (OSIT)",
      "model_name": "gemini-2.5-flash",
      "provider": "google",
      "account_id": "a1b2",
      "context_k": 1000,
      "tags": ["orchestrator", "research"]
    },
    {
      "id": "m3",
      "type": "local_openai",
      "label": "Gemma 4 E4B",
      "model_name": "gemma4:e4b",
      "provider": "local",
      "host_id": "abc123",
      "context_k": 72,
      "tags": ["fast", "local"]
    }
  ],
  "roles": {
    "chat": {"primary": "m1", "backup_1": "m2", "backup_2": "m3"},
    "orchestrator": {"primary": "m2", "backup_1": "m3"},
    "distill": {"primary": "m1"}
  }
}
```

### host_type (local hosts)

| `host_type` | Chat endpoint | Models endpoint | Use for |
|---|---|---|---|
| `openwebui` (default) | `POST {url}/api/chat/completions` | `GET {url}/api/models` | Open WebUI, Ollama |
| `openai` | `POST {url}/chat/completions` | `GET {url}/models` | OpenRouter, LiteLLM, Anthropic-compat |

Set `api_url` to the base path before `/chat/completions`:

- OpenRouter: `https://openrouter.ai/api/v1`

### Built-in model IDs

Always resolvable without a user-created registry entry. Used as role defaults.

| ID | Type | Notes |
|---|---|---|
| `claude_cli` | `claude_cli` | Model from `DEFAULT_MODEL` in `.env` |
| `gemini_cli` | `gemini_cli` | Gemini CLI subprocess |
| `gemini_api` | `gemini_api` | Model from `ORCHESTRATOR_MODEL` in `.env`; key from `GEMINI_API_KEY` |

### V1 → V2 migration

Automatic on first load. Changes:

- Adds `providers` section (Anthropic CLI credential + empty Google accounts)
- Migrates `gemini_api_key` from `auth.json` → `providers.google.accounts[0]`
- All existing hosts, models, and role assignments are preserved

---

## Claude Backend (`_claude()`)

Runs `claude --print --no-session-persistence --output-format text` as a subprocess.

- System prompt passed via `--system-prompt`
- Conversation history formatted as `` block
- Token read live from `~/.claude/.credentials.json` on every call; the env var is never used because it goes stale after `claude auth login`
- Model override via `--model` flag when `model_name` is set in the registry entry

Timeout: `TIMEOUT_CLAUDE=60` seconds (`.env`)

---

## Gemini CLI Backend (`_gemini()`)

Runs `gemini --output-format text --extensions "" -p ` as a subprocess.

- `--extensions ""` disables all MCP extensions, which prevents child processes from keeping pipes open
- `start_new_session=True` puts the process in its own group for clean `os.killpg` on timeout
- Output is cleaned to strip CLI noise (loading messages, retry notices, quota warnings)

Timeout: `TIMEOUT_GEMINI=120` seconds (`.env`)

---

## Local Backend (`_local()`)

HTTP POST to an OpenAI-compatible endpoint. Model config is resolved via the model registry.

```python
# host_type "openwebui":  POST {api_url}/api/chat/completions
# host_type "openai":     POST {api_url}/chat/completions
```

Timeout: `TIMEOUT_LOCAL=300` seconds (`.env`); local models may need to load from disk.

---

## Gemini API (Orchestrator)

Used by `orchestrator_engine.py` for the ReAct tool loop. Not used for general chat.

API key resolution order:

1. `api_key` embedded in the resolved orchestrator model config (V2 registry with `account_id`)
2. `get_user_gemini_key(user)`: reads from `auth.json` (legacy, kept for compat)
3. `GEMINI_API_KEY` in `.env` (server default)

---

## Distillation

Memory distillation uses `role="distill"`. Configure via Model Registry → Role Assignments.

`.env` override: `ROLE_DISTILL=claude_cli` (default).

---

## Future: Phase 3 — Backend Toggle Redesign

The `claude → gemini → local` toggle will be replaced with a slot toggle that cycles through the chat role's configured models (Primary → Backup 1 → Backup 2), showing the actual model label. See `DESIGN__Model_Registry_V2.md`.

---

## Code locations

| File | Responsibility |
|---|---|
| `cortex/llm_client.py` | `complete()`: routing, dispatch, fallback |
| `cortex/model_registry.py` | Per-user registry CRUD and resolution (V2) |
| `cortex/routers/local_llm.py` | Settings UI routes + `/api/models/role` AJAX |
| `cortex/routers/chat.py` | `_backend_label()`, `fallback_used` flag |
| `cortex/routers/orchestrator.py` | Engine selection, Gemini API key resolution |
| `cortex/config.py` | `ROLE_*` env defaults, `DEFINED_ROLES`, `PRIMARY_BACKEND` |
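
As a rough illustration of the role resolution order described under Backend Selection, the sketch below builds the ordered list of model IDs for a role: registry slots first, then the `.env` role default, then the hardcoded last resort. The function name `candidate_models`, the constants `SLOTS` and `HARDCODED_DEFAULTS`, and the choice to return every candidate rather than a single winner are assumptions made for illustration. The real logic lives in `cortex/model_registry.py` and `cortex/llm_client.py` and may differ.

```python
import os

# Slot order within a role, taken from the documented resolution rule.
SLOTS = ("primary", "backup_1", "backup_2", "backup_3", "backup_4")

# Hardcoded last-resort defaults as listed in this document.
HARDCODED_DEFAULTS = {
    "chat": "claude_cli",
    "distill": "claude_cli",
    "coder": "claude_cli",
    "orchestrator": "gemini_api",
    "research": "gemini_api",
}


def candidate_models(role, registry, env=None):
    """Return model IDs for `role` in documented priority order.

    Hypothetical helper: it does not say whether later entries are tried
    on failure or only when earlier ones are unset.
    """
    env = os.environ if env is None else env

    # 1. User registry: primary, then backups, skipping empty slots.
    slots = registry.get("roles", {}).get(role, {})
    candidates = [slots[s] for s in SLOTS if slots.get(s)]

    # 2. .env role default, e.g. ROLE_CHAT=claude_cli.
    env_default = env.get(f"ROLE_{role.upper()}")
    if env_default:
        candidates.append(env_default)

    # 3. Hardcoded last resort (only defined for the five built-in roles).
    if role in HARDCODED_DEFAULTS:
        candidates.append(HARDCODED_DEFAULTS[role])

    # Drop duplicates while keeping the first occurrence of each ID.
    return list(dict.fromkeys(candidates))


registry = {
    "roles": {"chat": {"primary": "m1", "backup_1": "m2", "backup_2": "m3"}}
}
print(candidate_models("chat", registry, env={}))
# → ['m1', 'm2', 'm3', 'claude_cli']
```

Returning the whole ordered list keeps the sketch neutral about fallback policy, so a caller can either take the first entry or walk the list.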