Tool audit log:
- Every orchestrator tool call logged to home/{user}/tool_audit/YYYY-MM-DD.jsonl
- Files panel sidebar: audit log group (collapsed), date-linked read-only table
- Admin endpoints: /api/audit/files, /api/audit/day, /api/audit/recent, /api/audit/stats
- Engine and model name recorded per entry
OpenAI orchestrator improvements:
- Context budget enforcement: 75% of model context_k (min 16k)
- Message compaction: truncates old tool results when approaching budget
- max_rounds respected per model config (intersected with server cap)
OpenRouter onboarding (setup.html, onboarding.py, app.js, settings.html):
- Step 3 of 3: /setup/model with curated model picker
- Chat banner for users on server-default model (informational, not alarmist)
- Settings quick-link card; /setup/model works standalone for existing users
Model registry + session store:
- set_role_config / get_role_config for per-role tool lists and system_append
- session_store: session rename, session name backfill endpoint
UI updates (app.js, index.html, style.css, local_llm.html):
- Role toggle in context panel
- Off-the-record mode
- Agent notes read-only viewer
- OPERATIONS.md loaded at T2+ in context
Documentation:
- HELP.md: full tool table, per-role tool sets, Agent Notes, usage tracking
- TOOLS.md: Agent Notes section, count corrected to 44
- ARCH__SYSTEM.md, ARCH__BACKENDS.md, MASTER.md updated to match reality
- CLAUDE.md: onboarding flow, documentation philosophy sections
- README.md: stack in practice, DeepSeek TUI mention, architecture diagram updated
- TODO__Agents.md: onboarding task completed with deviation notes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Architecture: LLM Backends
How Cortex selects and talks to AI models. Last updated: 2026-05-06
Providers
Cortex supports four model types, each dispatched differently:
| Type | Auth | Use |
|---|---|---|
| `claude_cli` | OAuth token from `~/.claude/.credentials.json` | Chat, persona responses |
| `gemini_cli` | Gemini CLI credentials | Chat fallback / explicit selection |
| `gemini_api` | API key from registry account or `.env` | Orchestrator tool loop |
| `local_openai` | API key per host in model registry | Open WebUI, Ollama, OpenRouter, LiteLLM, etc. |
Backend Selection
Default: Role-Based Routing (Auto)
When no explicit backend is selected, Cortex routes to the model configured for the
request's role in the user's model registry. Roles: chat, orchestrator, distill,
coder, research (extensible via DEFINED_ROLES in .env).
Resolution order for a role:
1. User registry: `roles[role].primary → backup_1 → backup_2 → backup_3 → backup_4`
2. `.env` role default: `ROLE_CHAT=claude_cli`, `ROLE_DISTILL=claude_cli`, etc.
3. Hardcoded last-resort: `chat/distill/coder → claude_cli`, `orchestrator/research → gemini_api`
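The resolution order above can be sketched as a small lookup. This is an illustration only: `resolve_role`, `SLOT_ORDER`, and `LAST_RESORT` are hypothetical names, and the assumption that the first tier with any entry wins is ours, not something confirmed by `cortex/model_registry.py`.

```python
# Slot names in priority order, as listed in the resolution order above.
SLOT_ORDER = ("primary", "backup_1", "backup_2", "backup_3", "backup_4")

# Hardcoded last-resort defaults from the document.
LAST_RESORT = {
    "chat": "claude_cli",
    "distill": "claude_cli",
    "coder": "claude_cli",
    "orchestrator": "gemini_api",
    "research": "gemini_api",
}


def resolve_role(role: str, registry: dict, env: dict) -> tuple[str, list[str]]:
    """Return (source, model IDs) for a role, using the first tier that has an entry."""
    # Tier 1: the user's registry slots, in slot order.
    slots = registry.get("roles", {}).get(role, {})
    chain = [slots[name] for name in SLOT_ORDER if slots.get(name)]
    if chain:
        return "registry", chain
    # Tier 2: ROLE_<NAME> default from the environment.
    env_default = env.get(f"ROLE_{role.upper()}")
    if env_default:
        return "env", [env_default]
    # Tier 3: hardcoded last resort.
    if role in LAST_RESORT:
        return "hardcoded", [LAST_RESORT[role]]
    raise LookupError(f"no model configured for role {role!r}")
```

For example, a registry that only assigns `chat` leaves `distill` to the `.env` default and `research` to the hardcoded fallback.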
Explicit Override
The Role toggle in the Context & Memory panel cycles through configured role slots for the chat role: Primary → Backup 1 → Backup 2 → auto.
- Each slot shows the configured model label
- `auto` uses the Primary without forcing a specific backend type
- The ⚡ Tools toggle is independent: it routes to the `orchestrator` role regardless of the chat role selection
Fallback chain (automatic, only when no explicit registry entry exists):
claude → gemini
gemini → claude
local → claude
When a model is explicitly configured in the registry, errors surface immediately — no silent fallback.
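A minimal sketch of that rule, assuming a hypothetical `complete_with_fallback` helper that receives the backend family, a callable that performs the request, and a flag saying whether the model came from an explicit registry entry. The returned boolean mirrors the idea of a `fallback_used` flag; the real signature in `cortex/llm_client.py` may differ.

```python
from typing import Callable

# Automatic fallback pairs from the chain above.
FALLBACK = {"claude": "gemini", "gemini": "claude", "local": "claude"}


def complete_with_fallback(
    backend: str, call: Callable[[str], str], explicit: bool
) -> tuple[str, bool]:
    """Return (reply, fallback_used). Explicit registry entries never fall back."""
    try:
        return call(backend), False
    except Exception:
        if explicit:
            # Explicitly configured model: surface the error immediately.
            raise
        # Implicit routing: try the paired backend exactly once.
        return call(FALLBACK[backend]), True


def demo_call(backend: str) -> str:
    """Stand-in for a real backend call: the 'local' backend is down."""
    if backend == "local":
        raise RuntimeError("local host unreachable")
    return f"reply from {backend}"
```

With `demo_call`, an implicit request to `local` comes back from `claude` with the flag set, while the same request with `explicit=True` raises.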
Each response shows a model tag (bottom-right of the message bubble) with the model label and host.
Model Registry — V2 Schema
Per-user configuration stored in home/{user}/model_registry.json.
Managed at Settings → Models (/settings/models). Full provider UI coming in Phase 2.
{
  "version": 2,
  "providers": {
    "anthropic": {
      "credentials": [
        {"id": "cli", "label": "Claude CLI (OAuth)", "type": "cli"}
      ]
    },
    "google": {
      "accounts": [
        {"id": "a1b2", "label": "One Sky IT", "api_key": "AIza..."}
      ]
    }
  },
  "hosts": [
    {
      "id": "abc123",
      "label": "Gaming Laptop",
      "api_url": "http://192.168.x.x:3000",
      "api_key": "",
      "host_type": "openwebui"
    }
  ],
  "models": [
    {
      "id": "m1",
      "type": "claude_cli",
      "label": "Sonnet 4.6 (CLI)",
      "model_name": "claude-sonnet-4-6",
      "provider": "anthropic",
      "credential_id": "cli",
      "context_k": 200,
      "tags": ["chat", "persona"]
    },
    {
      "id": "m2",
      "type": "gemini_api",
      "label": "Gemini 2.5 Flash (OSIT)",
      "model_name": "gemini-2.5-flash",
      "provider": "google",
      "account_id": "a1b2",
      "context_k": 1000,
      "tags": ["orchestrator", "research"]
    },
    {
      "id": "m3",
      "type": "local_openai",
      "label": "Gemma 4 E4B",
      "model_name": "gemma4:e4b",
      "provider": "local",
      "host_id": "abc123",
      "context_k": 72,
      "max_rounds": 5,
      "tools": true,
      "tags": ["fast", "local"]
    }
  ],
  "roles": {
    "chat": {"primary": "m1", "backup_1": "m2", "backup_2": "m3"},
    "orchestrator": {"primary": "m2", "backup_1": "m3"},
    "distill": {"primary": "m1"}
  }
}
Optional model fields
| Field | Type | Default | Meaning |
|---|---|---|---|
| `context_k` | int | 32 | Context window in thousands of tokens. Used for compaction budget (75% of window). |
| `max_rounds` | int or null | null | Per-model tool loop cap. null = use global `orchestrator_max_rounds`. Effective limit = min(per_model, global). |
| `tools` | bool | true | Whether this model supports tool calling. false = skip tool loop entirely; model gets a plain chat request. |
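The arithmetic behind `context_k` and `max_rounds` can be sketched as follows. The function names are hypothetical, and treating one "k" as 1000 tokens is an assumption; the global cap of 8 in the usage note is an illustrative value, not the server's actual setting.

```python
from typing import Optional

DEFAULT_CONTEXT_K = 32  # documented default when context_k is absent


def compaction_budget_tokens(context_k: Optional[int]) -> int:
    """75% of the context window, assuming 1k = 1000 tokens."""
    window_tokens = (context_k or DEFAULT_CONTEXT_K) * 1000
    return window_tokens * 75 // 100


def effective_max_rounds(per_model: Optional[int], global_cap: int) -> int:
    """null per-model cap means 'use the global cap'; otherwise take the smaller."""
    if per_model is None:
        return global_cap
    return min(per_model, global_cap)
```

For the `m3` entry shown earlier (`context_k: 72`, `max_rounds: 5`) and a hypothetical global cap of 8, this gives a budget of 54000 tokens and a limit of 5 rounds.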
host_type (local hosts)
| `host_type` | Chat endpoint | Models endpoint | Use for |
|---|---|---|---|
| `openwebui` (default) | `POST {url}/api/chat/completions` | `GET {url}/api/models` | Open WebUI, Ollama |
| `openai` | `POST {url}/chat/completions` | `GET {url}/models` | OpenRouter, LiteLLM, Anthropic-compat |
Set api_url to the base path before /chat/completions:
- OpenRouter: `https://openrouter.ai/api/v1`
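To make the `host_type` mapping concrete, here is a small sketch that joins `api_url` with the documented paths. `endpoint_urls` and `ENDPOINT_PATHS` are hypothetical names, and stripping a trailing slash is an assumption about how the base URL is normalized.

```python
# (chat path, models path) per host_type, taken from the table above.
ENDPOINT_PATHS = {
    "openwebui": ("/api/chat/completions", "/api/models"),
    "openai": ("/chat/completions", "/models"),
}


def endpoint_urls(api_url: str, host_type: str = "openwebui") -> dict[str, str]:
    """Build the chat and models URLs for a host entry."""
    chat_path, models_path = ENDPOINT_PATHS[host_type]
    base = api_url.rstrip("/")  # tolerate a trailing slash in the configured URL
    return {"chat": base + chat_path, "models": base + models_path}
```

With `api_url` set to `https://openrouter.ai/api/v1` and `host_type` set to `openai`, the chat URL resolves to `https://openrouter.ai/api/v1/chat/completions`.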
Built-in model IDs
Always resolvable without a user-created registry entry. Used as role defaults.
| ID | Type | Notes |
|---|---|---|
| `claude_cli` | `claude_cli` | Model from `DEFAULT_MODEL` in `.env` |
| `gemini_cli` | `gemini_cli` | Gemini CLI subprocess |
| `gemini_api` | `gemini_api` | Model from `ORCHESTRATOR_MODEL` in `.env`; key from `GEMINI_API_KEY` |
V1 → V2 migration
Automatic on first load. Changes:
- Adds `providers` section (Anthropic CLI credential + empty Google accounts)
- Migrates `gemini_api_key` from `auth.json` → `providers.google.accounts[0]`
- All existing hosts, models, and role assignments are preserved
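A rough sketch of what such a migration could look like. The V1 layout is not documented here, so this assumes V1 is a dict that already carries `hosts`, `models`, and `roles`, and that `auth.json` has been loaded into a dict; `migrate_v1_to_v2` and the `id`/`label` given to the migrated account are hypothetical.

```python
import copy


def migrate_v1_to_v2(v1: dict, auth: dict) -> dict:
    """Return a V2 registry built from a V1 registry plus auth.json contents."""
    if v1.get("version") == 2:
        return v1  # already migrated, nothing to do
    v2 = copy.deepcopy(v1)  # hosts, models, and roles carry over untouched
    v2["version"] = 2

    # Move the legacy Gemini key, if any, into the first Google account.
    accounts = []
    legacy_key = auth.get("gemini_api_key")
    if legacy_key:
        accounts.append(
            {"id": "migrated", "label": "Migrated key", "api_key": legacy_key}
        )

    v2["providers"] = {
        "anthropic": {
            "credentials": [
                {"id": "cli", "label": "Claude CLI (OAuth)", "type": "cli"}
            ]
        },
        "google": {"accounts": accounts},
    }
    return v2
```

Working on a deep copy keeps the V1 data intact, so a failed write cannot leave the registry half-migrated.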
Claude Backend (_claude())
Runs claude --print --no-session-persistence --output-format text as a subprocess.
- System prompt passed via `--system-prompt`
- Conversation history formatted as `<conversation>` block
- Token read live from `~/.claude/.credentials.json` on every call; never uses the env var, which goes stale after `claude auth login`
- Model override via `--model` flag when `model_name` is set in the registry entry
Timeout: TIMEOUT_CLAUDE=60 seconds (.env)
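The command line described above could be assembled roughly like this. Only the flags named in this section are used; `build_claude_argv` and `format_history` are hypothetical helpers, and the `speaker: text` layout inside the `<conversation>` block is an assumption, since the real format is not shown here.

```python
from typing import Optional


def build_claude_argv(system_prompt: str, model_name: Optional[str] = None) -> list[str]:
    """Build the argv list for the Claude CLI subprocess."""
    argv = [
        "claude",
        "--print",
        "--no-session-persistence",
        "--output-format", "text",
        "--system-prompt", system_prompt,
    ]
    if model_name:
        # Only override the model when the registry entry sets model_name.
        argv += ["--model", model_name]
    return argv


def format_history(turns: list[tuple[str, str]]) -> str:
    """Wrap prior turns in a <conversation> block (inner layout is assumed)."""
    lines = [f"{speaker}: {text}" for speaker, text in turns]
    return "<conversation>\n" + "\n".join(lines) + "\n</conversation>"
```

Passing the command as a list rather than a shell string avoids quoting problems when the system prompt contains spaces or quotes.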
Gemini CLI Backend (_gemini())
Runs gemini --output-format text --extensions "" -p <prompt> as a subprocess.
- `--extensions ""` disables all MCP extensions, which prevents child processes from keeping pipes open
- `start_new_session=True` puts the process in its own group for clean `os.killpg` on timeout
- Output is cleaned to strip CLI noise (loading messages, retry notices, quota warnings)
Timeout: TIMEOUT_GEMINI=120 seconds (.env)
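The process-group pattern mentioned above can be sketched with the standard library alone. `run_with_group_timeout` is a hypothetical helper, not the actual `_gemini()` code, and it is POSIX-only because it relies on `os.killpg`.

```python
import os
import signal
import subprocess
import sys


def run_with_group_timeout(argv: list[str], timeout: float) -> str:
    """Run argv in its own session and kill the whole group if it times out."""
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        start_new_session=True,  # child leads a new session and process group
    )
    try:
        stdout, _ = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        try:
            # The group ID equals the child's PID because it is the group leader.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group already exited on its own
        proc.communicate()  # reap the child and close the pipes
        raise
    return stdout
```

Killing the group rather than the single PID also stops any helpers the CLI spawned, which is what keeps inherited pipes from staying open after a timeout.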
Local Backend (_local())
HTTP POST to an OpenAI-compatible endpoint. Model config is resolved via the model registry.
# host_type "openwebui": POST {api_url}/api/chat/completions
# host_type "openai": POST {api_url}/chat/completions
Timeout: TIMEOUT_LOCAL=300 seconds (.env) — local models may need to load from disk.
Gemini API (Orchestrator)
Used by orchestrator_engine.py for the ReAct tool loop. Not used for general chat.
API key resolution order:
1. `api_key` embedded in the resolved orchestrator model config (V2 registry with `account_id`)
2. `get_user_gemini_key(user)`: reads from `auth.json` (legacy, kept for compat)
3. `GEMINI_API_KEY` in `.env` (server default)
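The key lookup order above amounts to "first non-empty source wins". In this sketch `resolve_gemini_key` is a hypothetical name, and the legacy `auth.json` lookup is passed in as a callable so the example stays self-contained rather than guessing at the signature of `get_user_gemini_key`.

```python
from typing import Callable, Optional


def resolve_gemini_key(
    model_cfg: dict,
    legacy_lookup: Callable[[], Optional[str]],
    env: dict,
) -> Optional[str]:
    """Return the first non-empty key: model config, then auth.json, then .env."""
    sources = (
        lambda: model_cfg.get("api_key"),     # 1. resolved orchestrator model config
        legacy_lookup,                        # 2. legacy per-user key from auth.json
        lambda: env.get("GEMINI_API_KEY"),    # 3. server default
    )
    for source in sources:
        key = source()
        if key:
            return key
    return None
```

Using callables keeps the lookups lazy, so the legacy file is only read when the registry entry carries no key.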
Distillation
Memory distillation uses role="distill". Configure via Model Registry → Role Assignments.
.env override: ROLE_DISTILL=claude_cli (default).
Code locations
| File | Responsibility |
|---|---|
| `cortex/llm_client.py` | `complete()`: routing, dispatch, fallback |
| `cortex/model_registry.py` | Per-user registry CRUD and resolution (V2) |
| `cortex/routers/local_llm.py` | Settings UI routes + `/api/models/role` AJAX |
| `cortex/routers/chat.py` | `_backend_label()`, `fallback_used` flag |
| `cortex/routers/orchestrator.py` | Engine selection, Gemini API key resolution |
| `cortex/config.py` | `ROLE_*` env defaults, `DEFINED_ROLES`, `PRIMARY_BACKEND` |