Adds a providers section to the per-user model registry for Anthropic and Google as first-class providers alongside local hosts. Google accounts (API keys) are now stored as a list so multiple Google accounts can coexist.
Changes:
- model_registry.py: V2 schema, auto migration V1→V2 (pulls gemini_api_key from auth.json into providers.google.accounts), _resolve_model() merges account API key for gemini_api type models
- routers/orchestrator.py: uses model-resolved api_key when orchestrator role resolves to a gemini_api model with account_id
- ANTHROPIC_CATALOG and GOOGLE_CATALOG constants for model picker (Phase 2)
- New functions: get_google_api_key(), save/remove_google_account(), get_catalog()
- Documentation: ARCH__BACKENDS.md updated to V2 schema, DESIGN doc added
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Architecture: LLM Backends
How Cortex selects and talks to AI models. Last updated: 2026-04-27 (V2 schema)
Providers
Cortex supports four model types, each dispatched differently:
| Type | Auth | Use |
|---|---|---|
| claude_cli | OAuth token from ~/.claude/.credentials.json | Chat, persona responses |
| gemini_cli | Gemini CLI credentials | Chat fallback / explicit selection |
| gemini_api | API key from registry account or .env | Orchestrator tool loop |
| local_openai | API key per host in model registry | Open WebUI, Ollama, OpenRouter, LiteLLM, etc. |
Backend Selection
Default: Role-Based Routing (Auto)
When no explicit backend is selected, Cortex routes to the model configured for the
request's role in the user's model registry. Roles: chat, orchestrator, distill,
coder, research (extensible via DEFINED_ROLES in .env).
Resolution order for a role:
- User registry: roles[role].primary → backup_1 → backup_2 → backup_3 → backup_4
- .env role default: ROLE_CHAT=claude_cli, ROLE_DISTILL=claude_cli, etc.
- Hardcoded last-resort: chat/distill/coder → claude_cli, orchestrator/research → gemini_api
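The resolution order above can be sketched roughly as follows. This is an illustration only, not the code in model_registry.py: the function name resolve_role_candidates, the env_defaults dict, and the HARDCODED_DEFAULTS table are hypothetical, and the sketch assumes a later tier is consulted only when the earlier one yields nothing.

```python
# Hypothetical sketch of role resolution; names are illustrative only.
SLOTS = ["primary", "backup_1", "backup_2", "backup_3", "backup_4"]

# Last-resort mapping as described in the text above.
HARDCODED_DEFAULTS = {
    "chat": "claude_cli",
    "distill": "claude_cli",
    "coder": "claude_cli",
    "orchestrator": "gemini_api",
    "research": "gemini_api",
}


def resolve_role_candidates(registry, role, env_defaults):
    """Return the model IDs to try for a role, in priority order."""
    # 1. User registry: primary first, then backups, skipping empty slots.
    slots = registry.get("roles", {}).get(role, {})
    configured = [slots[s] for s in SLOTS if slots.get(s)]
    if configured:
        return configured
    # 2. .env role default, e.g. ROLE_CHAT=claude_cli.
    env_value = env_defaults.get(f"ROLE_{role.upper()}")
    if env_value:
        return [env_value]
    # 3. Hardcoded last resort (empty list for roles without one).
    return [HARDCODED_DEFAULTS[role]] if role in HARDCODED_DEFAULTS else []
```

Returning a list rather than a single ID keeps the backup slots visible to the caller, which can then decide how to walk them.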
Explicit Override
The UI backend toggle cycles: auto → claude → gemini → local → auto
- auto (default): role-based routing as above
- claude / gemini / local: bypasses role routing; forces that backend type
- The toggle will be redesigned in Phase 3 to cycle through chat role slots (Primary / Backup 1 / Backup 2)
Fallback chain (automatic, only when no explicit registry entry exists):
claude → gemini
gemini → claude
local → claude
When a model is explicitly configured in the registry, errors surface immediately — no silent fallback.
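A minimal sketch of the fallback rule, assuming a hypothetical call(backend) callable that returns the response text or raises on failure. The function name and the explicit_entry parameter are invented for illustration; the real logic lives in complete().

```python
# Fallback targets as listed above.
FALLBACK = {"claude": "gemini", "gemini": "claude", "local": "claude"}


def complete_with_fallback(backend, call, explicit_entry):
    """Try a backend; fall back once unless a registry entry was explicit.

    `call` is a hypothetical callable taking a backend name.
    Returns (text, backend_used, fallback_used).
    """
    try:
        return call(backend), backend, False
    except Exception:
        if explicit_entry:
            # Explicitly configured models surface errors immediately.
            raise
        alt = FALLBACK[backend]
        return call(alt), alt, True
```

The third element of the returned tuple plays the role of a fallback_used flag, so the caller can show which backend actually answered.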
Each response shows a model tag (bottom-right of the message bubble) with the model label and host.
Model Registry — V2 Schema
Per-user configuration stored in home/{user}/model_registry.json.
Managed at Settings → Model Registry (/settings/local). Full provider UI coming in Phase 2.
{
  "version": 2,
  "providers": {
    "anthropic": {
      "credentials": [
        {"id": "cli", "label": "Claude CLI (OAuth)", "type": "cli"}
      ]
    },
    "google": {
      "accounts": [
        {"id": "a1b2", "label": "One Sky IT", "api_key": "AIza..."}
      ]
    }
  },
  "hosts": [
    {
      "id": "abc123",
      "label": "Gaming Laptop",
      "api_url": "http://192.168.x.x:3000",
      "api_key": "",
      "host_type": "openwebui"
    }
  ],
  "models": [
    {
      "id": "m1",
      "type": "claude_cli",
      "label": "Sonnet 4.6 (CLI)",
      "model_name": "claude-sonnet-4-6",
      "provider": "anthropic",
      "credential_id": "cli",
      "context_k": 200,
      "tags": ["chat", "persona"]
    },
    {
      "id": "m2",
      "type": "gemini_api",
      "label": "Gemini 2.5 Flash (OSIT)",
      "model_name": "gemini-2.5-flash",
      "provider": "google",
      "account_id": "a1b2",
      "context_k": 1000,
      "tags": ["orchestrator", "research"]
    },
    {
      "id": "m3",
      "type": "local_openai",
      "label": "Gemma 4 E4B",
      "model_name": "gemma4:e4b",
      "provider": "local",
      "host_id": "abc123",
      "context_k": 72,
      "tags": ["fast", "local"]
    }
  ],
  "roles": {
    "chat": {"primary": "m1", "backup_1": "m2", "backup_2": "m3"},
    "orchestrator": {"primary": "m2", "backup_1": "m3"},
    "distill": {"primary": "m1"}
  }
}
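To show how the account_id and host_id references in this schema might be followed, here is a hedged sketch of model resolution. It is not the actual _resolve_model(): the function name resolve_model is hypothetical, and merging host fields into local_openai entries is an assumption based on the schema rather than something stated outright.

```python
def resolve_model(registry, model_id):
    """Return a copy of a model entry with connection details merged in.

    Hypothetical sketch; returns None when the model ID is unknown.
    """
    models = registry.get("models", [])
    model = next((m for m in models if m["id"] == model_id), None)
    if model is None:
        return None
    resolved = dict(model)  # never mutate the stored registry entry

    if model["type"] == "gemini_api" and model.get("account_id"):
        # Follow account_id into providers.google.accounts.
        accounts = registry.get("providers", {}).get("google", {}).get("accounts", [])
        account = next((a for a in accounts if a["id"] == model["account_id"]), None)
        if account:
            resolved["api_key"] = account["api_key"]
    elif model["type"] == "local_openai" and model.get("host_id"):
        # Assumption: local models inherit URL, key, and host_type from the host.
        host = next((h for h in registry.get("hosts", []) if h["id"] == model["host_id"]), None)
        if host:
            resolved["api_url"] = host["api_url"]
            resolved["api_key"] = host.get("api_key", "")
            resolved["host_type"] = host.get("host_type", "openwebui")
    return resolved
```

Keeping the key on the account rather than on each model means several models can share one Google account, and rotating a key touches a single entry.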
host_type (local hosts)
| host_type | Chat endpoint | Models endpoint | Use for |
|---|---|---|---|
| openwebui (default) | POST {url}/api/chat/completions | GET {url}/api/models | Open WebUI, Ollama |
| openai | POST {url}/chat/completions | GET {url}/models | OpenRouter, LiteLLM, Anthropic-compat |
Set api_url to the base path before /chat/completions:
- OpenRouter: https://openrouter.ai/api/v1
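The host_type mapping amounts to a small lookup table. The sketch below is illustrative: the ENDPOINTS dict and the endpoint_url helper are hypothetical names, and stripping a trailing slash from api_url is an assumption about how a stored URL might be normalized.

```python
# Path suffixes per host_type, taken from the table above.
ENDPOINTS = {
    "openwebui": {"chat": "/api/chat/completions", "models": "/api/models"},
    "openai": {"chat": "/chat/completions", "models": "/models"},
}


def endpoint_url(api_url, kind, host_type="openwebui"):
    """Build the chat or models URL for a host (kind is 'chat' or 'models')."""
    return api_url.rstrip("/") + ENDPOINTS[host_type][kind]
```

For example, endpoint_url("https://openrouter.ai/api/v1", "chat", "openai") yields https://openrouter.ai/api/v1/chat/completions.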
Built-in model IDs
Always resolvable without a user-created registry entry. Used as role defaults.
| ID | Type | Notes |
|---|---|---|
| claude_cli | claude_cli | Model from DEFAULT_MODEL in .env |
| gemini_cli | gemini_cli | Gemini CLI subprocess |
| gemini_api | gemini_api | Model from ORCHESTRATOR_MODEL in .env; key from GEMINI_API_KEY |
V1 → V2 migration
Automatic on first load. Changes:
- Adds providers section (Anthropic CLI credential + empty Google accounts)
- Migrates gemini_api_key from auth.json → providers.google.accounts[0]
- All existing hosts, models, and role assignments are preserved
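A rough sketch of what such a migration could look like, under stated assumptions: the function name migrate_v1_to_v2 is hypothetical, auth stands in for the parsed auth.json, and the account id and label given to the migrated key are placeholders, since the source does not say how they are generated.

```python
import copy


def migrate_v1_to_v2(registry, auth):
    """Return a V2 copy of a V1 registry; V2 input is returned unchanged."""
    if registry.get("version") == 2:
        return registry
    migrated = copy.deepcopy(registry)  # hosts, models, roles carried over as-is

    accounts = []
    legacy_key = auth.get("gemini_api_key")
    if legacy_key:
        # Placeholder id/label: the real values are not specified in this doc.
        accounts.append({"id": "migrated", "label": "Migrated key", "api_key": legacy_key})

    migrated["providers"] = {
        "anthropic": {
            "credentials": [{"id": "cli", "label": "Claude CLI (OAuth)", "type": "cli"}]
        },
        "google": {"accounts": accounts},
    }
    migrated["version"] = 2
    return migrated
```

Working on a deep copy keeps the migration side-effect free, so a failed write cannot leave a half-migrated registry in memory.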
Claude Backend (_claude())
Runs claude --print --no-session-persistence --output-format text as a subprocess.
- System prompt passed via --system-prompt
- Conversation history formatted as <conversation> block
- Token read live from ~/.claude/.credentials.json on every call; never uses the env var, which goes stale after claude auth login
- Model override via --model flag when model_name is set in the registry entry
Timeout: TIMEOUT_CLAUDE=60 seconds (.env)
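As an illustration of how the command line might be assembled, the sketch below builds the argument list from the flags named above. The helper name build_claude_argv is hypothetical, and how the prompt itself reaches the process (stdin or an argument) is not stated here, so the sketch leaves it out.

```python
def build_claude_argv(system_prompt=None, model_name=None):
    """Assemble the claude CLI argument list using only the flags listed above."""
    argv = ["claude", "--print", "--no-session-persistence", "--output-format", "text"]
    if system_prompt:
        argv += ["--system-prompt", system_prompt]
    if model_name:
        # Only override the model when the registry entry sets model_name.
        argv += ["--model", model_name]
    return argv
```

Passing a list to subprocess (rather than a shell string) avoids quoting problems when the system prompt contains spaces or quotes.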
Gemini CLI Backend (_gemini())
Runs gemini --output-format text --extensions "" -p <prompt> as a subprocess.
- --extensions "" disables all MCP extensions; prevents child processes keeping pipes open
- start_new_session=True puts the process in its own group for clean os.killpg on timeout
- Output is cleaned to strip CLI noise (loading messages, retry notices, quota warnings)
Timeout: TIMEOUT_GEMINI=120 seconds (.env)
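The process-group technique mentioned above can be sketched in isolation. This is a generic POSIX example, not the body of _gemini(); the function name run_in_own_group and its return convention are assumptions for illustration.

```python
import os
import signal
import subprocess


def run_in_own_group(argv, timeout):
    """Run argv in a new session; kill the whole process group on timeout.

    Returns (returncode, stdout), or (None, "") if the timeout fired.
    """
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        start_new_session=True,  # child becomes leader of its own group
    )
    try:
        out, _ = proc.communicate(timeout=timeout)
        return proc.returncode, out
    except subprocess.TimeoutExpired:
        # The child is a session leader, so its group ID equals its PID.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.communicate()  # reap the process and close the pipes
        return None, ""
```

Killing the group rather than the single PID also takes down any grandchildren the CLI spawned, which would otherwise hold the pipes open and block the reader.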
Local Backend (_local())
HTTP POST to an OpenAI-compatible endpoint. Model config is resolved via the model registry.
# host_type "openwebui": POST {api_url}/api/chat/completions
# host_type "openai": POST {api_url}/chat/completions
Timeout: TIMEOUT_LOCAL=300 seconds (.env) — local models may need to load from disk.
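A minimal sketch of building such a request with the standard library. The helper name build_local_request is hypothetical; the payload shape (model plus messages) and the Bearer authorization header follow the common OpenAI-compatible convention and are assumptions, not taken from _local().

```python
import json
import urllib.request


def build_local_request(api_url, host_type, model_name, messages, api_key=""):
    """Build (but do not send) a chat completion request for a local host."""
    path = "/api/chat/completions" if host_type == "openwebui" else "/chat/completions"
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: hosts that need a key accept it as a Bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"model": model_name, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        api_url.rstrip("/") + path, data=body, headers=headers, method="POST"
    )
```

The request could then be sent with urllib.request.urlopen(req, timeout=300), matching the TIMEOUT_LOCAL default.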
Gemini API (Orchestrator)
Used by orchestrator_engine.py for the ReAct tool loop. Not used for general chat.
API key resolution order:
- api_key embedded in the resolved orchestrator model config (V2 registry with account_id)
- get_user_gemini_key(user): reads from auth.json (legacy, kept for compat)
- GEMINI_API_KEY in .env (server default)
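The key resolution order reduces to "first non-empty value wins". In the sketch below the function name resolve_gemini_api_key is hypothetical, and the legacy and environment keys are passed in as plain arguments rather than read from auth.json or .env.

```python
def resolve_gemini_api_key(model_cfg, user_auth_key=None, env_key=None):
    """Pick the Gemini API key: model config, then legacy auth.json, then .env."""
    for candidate in (model_cfg.get("api_key"), user_auth_key, env_key):
        if candidate:  # skips None and empty strings
            return candidate
    return None
```

Treating an empty string the same as a missing key lets a blank api_key field fall through to the next source instead of producing an unauthenticated call.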
Distillation
Memory distillation uses role="distill". Configure via Model Registry → Role Assignments.
.env override: ROLE_DISTILL=claude_cli (default).
Future: Phase 3 — Backend Toggle Redesign
The claude → gemini → local toggle will be replaced with a slot toggle that cycles
through the chat role's configured models (Primary → Backup 1 → Backup 2), showing
the actual model label. See DESIGN__Model_Registry_V2.md.
Code locations
| File | Responsibility |
|---|---|
| cortex/llm_client.py | complete(): routing, dispatch, fallback |
| cortex/model_registry.py | Per-user registry CRUD and resolution (V2) |
| cortex/routers/local_llm.py | Settings UI routes + /api/models/role AJAX |
| cortex/routers/chat.py | _backend_label(), fallback_used flag |
| cortex/routers/orchestrator.py | Engine selection, Gemini API key resolution |
| cortex/config.py | ROLE_* env defaults, DEFINED_ROLES, PRIMARY_BACKEND |