Adds `anthropic_api` model type so users can authenticate with a direct
Anthropic API key instead of (or alongside) the CLI OAuth session.
- model_registry.py: `anthropic_api` type; `save/get/remove_anthropic_api_key()`
mirroring the Google account pattern; `save_cloud_model()` now picks type
based on credential type (cli → claude_cli, api_key → anthropic_api);
`_resolve_model()` merges api_key from the credential entry
- llm_client.py: `_anthropic_api()` backend (AsyncAnthropic SDK); dispatch
and fallback wiring; usage tracking
- routers/local_llm.py: Anthropic API key management routes
(POST /settings/local/anthropic-key, /anthropic-key/{id}/remove);
`anthropic_api` badge and edit-form credential selector
- static/local_llm.html: Anthropic Cloud Provider block now shows API key
management (add/remove); Add Model → Anthropic tab has credential selector
(CLI vs API key)
- requirements.txt: enable anthropic>=0.40.0
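The credential-to-type rule described for `save_cloud_model()` can be sketched roughly as below. This is an illustration only, not the code in model_registry.py: the helper name `model_type_for_credential` and the error handling are assumptions; only the two mappings (cli → claude_cli, api_key → anthropic_api) come from the notes above.

```python
# Hypothetical helper; only the two mappings are taken from the commit notes.
_CREDENTIAL_TO_MODEL_TYPE = {
    "cli": "claude_cli",         # CLI OAuth session
    "api_key": "anthropic_api",  # direct Anthropic API key
}


def model_type_for_credential(credential_type: str) -> str:
    """Return the registry model type for an Anthropic credential type."""
    try:
        return _CREDENTIAL_TO_MODEL_TYPE[credential_type]
    except KeyError:
        # Assumption: unknown credential types are rejected rather than guessed.
        raise ValueError(f"unknown credential type: {credential_type!r}") from None
```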
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Text files (.md, .py, .js, .json, etc.): read client-side and injected
into the message body as a fenced code block — works with all backends
with zero model capability requirements.
Images (PNG/JPG/WebP/GIF, max 5 MB): encoded as a base64 data URL on the
client and sent as a separate attachment field. The backend formats them as
OpenAI multimodal content (text + image_url) for local_openai backends.
Claude CLI and Gemini CLI receive only the text message with a
"📎 filename.png" note; image data is never written to session history.
- index.html: 📎 button + hidden file input in mode-select row;
attachment-row preview area with thumbnail (images) or filename chip
- app.js: _resolveAttachment(), file reader, clearAttachment();
sendMessage/sendOrchestrate updated to allow no-text sends when a
file is pending; attachment spread into chat payload for images
- chat.py: Attachment model; attachment field on ChatRequest;
llm_attachment extracted in _stream_chat and passed to complete()
- llm_client.py: attachment param through complete()/_dispatch()/_local();
_local() builds multimodal content array for vision calls
- style.css: #attach-btn, #attachment-row, #attachment-preview, thumb
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- usage_tracker.py: daily token/call buckets per user (home/{user}/usage.json)
- Hook into local backend (OpenAI usage field) and Gemini API (usage_metadata)
- Claude/Gemini CLI backends produce no structured token data and are not tracked
- Fix CLAUDE.md stale tool count (27 → 39) and refresh tool list
- scripts/import_knowledge.py: walk markdown dirs, chunk by H2, call local LLM
for summaries, create AE journal entries with path-derived tags; resumable via
state file; --dry-run and --limit flags for safe testing
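The "chunk by H2" step of the import script could look something like the sketch below. It is not the code from scripts/import_knowledge.py: the function name, the `(heading, body)` return shape, and the handling of text before the first H2 are assumptions, and fenced code containing `## ` lines is not special-cased.

```python
import re

# Matches "## Heading" but not "### Heading" (third character must be a space).
_H2 = re.compile(r"^## +(.*\S)\s*$")


def chunk_by_h2(markdown: str):
    """Split markdown into (heading, body) chunks at each H2 line.

    Text before the first H2 is returned with heading None.
    """
    chunks = []
    heading, lines = None, []

    def flush():
        # Skip an empty preamble, but keep headed sections even if empty.
        if heading is not None or any(line.strip() for line in lines):
            chunks.append((heading, "\n".join(lines).strip()))

    for line in markdown.splitlines():
        match = _H2.match(line)
        if match:
            flush()
            heading, lines = match.group(1), []
        else:
            lines.append(line)
    flush()
    return chunks
```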
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Backend toggle now cycles through chat role models by label instead of
cycling service type strings (auto/claude/gemini/local).
- model_registry: get_model_for_slot() — resolves a specific priority
slot without walking the fallback chain
- llm_client: complete() gains slot param; explicit slot selection
dispatches directly to that model with no silent fallback
- routers/chat.py: ChatRequest.slot; GET /backend returns chat_models
[{slot, label, type}] for the UI; _stream_chat uses resolved model
label for the response tag when a slot is pinned
- app.js: toggle loads chat_models from /backend, cycles by label,
sends slot in chat payload; legacy model field removed from payload
- app.js: fix Gap B — agent mode placeholder no longer says "Gemini
tool loop"; now says "orchestrator"
- DESIGN doc: updated to reflect phases 1+2 complete, catalog-as-code
decision, Gap A/B documented, Phase 3 implementation details
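As a rough illustration of pinned-slot resolution, the sketch below resolves one slot directly and raises instead of falling through to another model. The data shapes (`role_chain` as slot → model ID, `models` as ID → config) and the use of `LookupError` are assumptions; only the function name `get_model_for_slot` and the `{slot, label, type}` shape come from the notes above.

```python
def get_model_for_slot(role_chain: dict, models: dict, slot: str) -> dict:
    """Resolve exactly one priority slot; never walk the fallback chain."""
    model_id = role_chain.get(slot)
    if model_id is None:
        raise LookupError(f"slot {slot!r} is not configured")
    if model_id not in models:
        raise LookupError(f"slot {slot!r} points at unknown model {model_id!r}")
    return {"slot": slot, **models[model_id]}


def chat_models(role_chain: dict, models: dict) -> list[dict]:
    """Build the [{slot, label, type}] list the UI toggle cycles through."""
    out = []
    for slot, model_id in role_chain.items():
        if model_id in models:
            cfg = models[model_id]
            out.append({"slot": slot, "label": cfg["label"], "type": cfg["type"]})
    return out
```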
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds cloud provider management to /settings/models:
- Google Accounts section: add/remove Gemini API keys with labels
- Add Model form: provider tabs (Local / Google / Anthropic) with
catalog dropdowns that auto-fill label and context_k
- Provider badges on model rows (Anthropic / Google / Local)
- /settings/local now redirects to /settings/models (canonical URL)
- save_cloud_model() in model_registry for Anthropic/Google entries
- Distill role migration restored in _migrate_from_local_llm
- Test fixes: version assertions updated to V2
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously, any backend error would silently fall back to Claude.
Now, if the user has a model configured via the model registry, errors
propagate to the UI so the actual problem is visible rather than hidden
behind a silent backend switch.
Fallback still applies when using the default/auto backend with no
registry config.
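The policy can be summarised in a small sketch. All names here (`BackendError`, `complete_with_policy`, the callables) are hypothetical, and the real client is presumably async; this version is synchronous for brevity.

```python
class BackendError(Exception):
    """Hypothetical error type raised by a failing backend call."""


def complete_with_policy(call_backend, call_fallback, has_registry_config: bool):
    """Fall back only when the user has no registry configuration."""
    try:
        return call_backend()
    except BackendError:
        if has_registry_config:
            # Explicit configuration: surface the real error to the UI.
            raise
        # Default/auto backend: keep the old fallback behaviour.
        return call_fallback()
```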
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each assistant message in the session JSON now carries:
backend, backend_label, host (platform.node())
These fields are shown as model tags in the UI — on live responses and
when loading session history. Session log entries (sessions/YYYY-MM-DD.md)
include the backend label and host in the turn header.
The local (OpenAI-compat) backend strips non-standard fields before
sending messages to the API so extra fields don't leak upstream.
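A minimal sketch of the stripping step, assuming that only `role` and `content` are forwarded; the actual allow-list in the local backend may differ.

```python
# Assumption: these are the only keys sent upstream.
_STANDARD_KEYS = ("role", "content")


def strip_extra_fields(messages: list[dict]) -> list[dict]:
    """Return copies of the messages with session-only fields removed."""
    return [
        {key: msg[key] for key in _STANDARD_KEYS if key in msg}
        for msg in messages
    ]
```

Because new dicts are built, the stored session messages keep their backend, backend_label, and host fields.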
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds host_type ("openwebui" | "openai") to the host schema so Cortex can
talk to both Open WebUI/Ollama and OpenRouter/standard-OpenAI endpoints.
Path differences per type:
openwebui (default): /api/chat/completions, /api/models
openai: /chat/completions, /models
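The path selection above amounts to a small lookup. In this sketch the paths are taken from the table above, while the helper name `endpoint_url` and the `kind` argument are assumptions.

```python
# Paths per host_type, as listed in the commit notes.
_PATHS = {
    "openwebui": {"chat": "/api/chat/completions", "models": "/api/models"},
    "openai": {"chat": "/chat/completions", "models": "/models"},
}


def endpoint_url(base_url: str, host_type: str = "openwebui", kind: str = "chat") -> str:
    """Join a host base URL with the path for its host_type."""
    return base_url.rstrip("/") + _PATHS[host_type][kind]
```

For example, `endpoint_url("https://openrouter.ai/api/v1", "openai")` yields `https://openrouter.ai/api/v1/chat/completions`.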
model_registry.py:
- host_type added to host schema (default "openwebui", backward compat)
- save_host() accepts host_type parameter
- _resolve_model() passes host_type through with the merged host fields
llm_client._local():
- Reads host_type from resolved model_cfg
- Selects correct chat completions path accordingly
routers/local_llm.py:
- save_host route accepts host_type form field
- fetch-models uses /models for openai type, /api/models for openwebui
- Existing host rows show type selector pre-filled from stored value
local_llm.html:
- "Add host" form includes type selector
To use OpenRouter:
- Add host: URL = https://openrouter.ai/api/v1, Type = OpenAI-compatible
- API key from openrouter.ai (store in .env or model_registry.json only)
- Fetch models or add manually (e.g. anthropic/claude-sonnet-4-5-20251022)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces model_registry.py as the single source of truth for all LLM
backend configuration. Replaces scattered backend settings across user_settings,
config distill_backend_*, and the UI toggle.
model_registry.py:
- Per-user home/{user}/model_registry.json with version, hosts, models, roles
- Models have: type (local_openai|claude_cli|gemini_cli|gemini_api), label,
model_name, host_id, context_k (tokens), tags (capability labels)
- Roles map to priority chains: primary, backup_1..backup_4
- Built-in IDs (claude_cli, gemini_cli, gemini_api) always resolvable
- Auto-migrates existing local_llm.json on first access
- CRUD: save_host, remove_host, save_model, remove_model, set_role
- get_model_for_role(): registry → .env default → hardcoded fallback
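The resolution order (registry → .env default → hardcoded fallback) might be sketched as follows. The slot names and built-in IDs come from the notes above; the function name `resolve_model_id`, the dict shapes, and the choice of `claude_cli` as the hardcoded fallback are assumptions.

```python
SLOT_ORDER = ("primary", "backup_1", "backup_2", "backup_3", "backup_4")
BUILTIN_IDS = {"claude_cli", "gemini_cli", "gemini_api"}
HARDCODED_FALLBACK = "claude_cli"  # assumption, not confirmed by the notes


def resolve_model_id(role: str, registry: dict, env_defaults: dict) -> str:
    """Pick a model ID: first usable registry slot, then .env default, then fallback."""
    chain = registry.get("roles", {}).get(role, {})
    models = registry.get("models", {})
    for slot in SLOT_ORDER:
        model_id = chain.get(slot)
        # Built-in IDs are always resolvable, even with no models entry.
        if model_id and (model_id in models or model_id in BUILTIN_IDS):
            return model_id
    return env_defaults.get(role) or HARDCODED_FALLBACK
```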
config.py:
- role_chat/orchestrator/distill/coder/research .env defaults
- defined_roles: comma-separated standard role list (extensible)
- get_defined_roles() and get_role_default() helper methods
llm_client.complete():
- New role= parameter (default "chat") for registry-based routing
- model= still accepted for explicit UI toggle override
- _claude() and _local() accept model_cfg dict instead of raw string
- _local() uses pre-resolved config from registry
memory_distiller.py:
- distill_mid/long now use role="distill" (no more distill_backend_* .env vars needed)
cron_runner.py:
- brief jobs use role="chat"
routers/chat.py + auth.py:
- Use model_registry instead of user_settings for local model info
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix 'undefined' in auth banner: read access_token_hours_remaining (not hours_remaining)
- Fix false-positive warning on fresh tokens: when a refresh token is present, warn only
within 1 hour of expiry (not 24 h), since the CLI should auto-rotate the token but
sometimes fails to
- Emit claude_auth_expired SSE event on 401 so UI shows inline red banner immediately
- app.js: handle claude_auth_expired SSE event with persistent top banner + dismiss button
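The warning threshold described in the second fix can be expressed as a one-line rule. This sketch assumes the 24-hour window still applies when no refresh token is present; the function name and signature are hypothetical.

```python
def should_warn(hours_remaining: float, has_refresh_token: bool) -> bool:
    """Decide whether the auth banner should warn about token expiry."""
    # With a refresh token the CLI normally rotates on its own, so only
    # warn when expiry is imminent. Assumption: 24 h otherwise.
    threshold_hours = 1.0 if has_refresh_token else 24.0
    return hours_remaining <= threshold_hours
```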
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All hardcoded "Inara"/"Scott" strings replaced with settings.agent_name
and settings.user_name, read from .env at startup:
- config.py: AGENT_NAME and USER_NAME settings (defaults: Inara / Scott)
- llm_client.py: conversation labels in prompt builder
- session_logger.py: **Name:** labels in session log markdown
- memory_distiller.py: distillation system prompts (mid + long)
- routers/nextcloud_talk.py: @mention prefix strip
- routers/google_chat.py: greeting message
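A minimal sketch of the name settings, assuming a plain environment lookup. The real config.py may use a settings library instead; `NameSettings` and `load_names` are hypothetical names, while the variables and defaults come from the notes above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class NameSettings:
    agent_name: str
    user_name: str


def load_names(env=None) -> NameSettings:
    """Read AGENT_NAME / USER_NAME, defaulting to Inara / Scott."""
    env = os.environ if env is None else env
    return NameSettings(
        agent_name=env.get("AGENT_NAME", "Inara"),
        user_name=env.get("USER_NAME", "Scott"),
    )
```

Call sites then format labels from the settings object, for example `f"**{names.agent_name}:**"` in the session log, instead of embedding the literal names.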
Second instance scaffolding:
- holly/: identity directory with placeholder files (USER_NAME=Holly,
AGENT_NAME to be chosen by Holly)
- cortex/.env.holly: config for Holly's instance on port 8001
- cortex-holly.service: systemd unit for the second instance
No behavioural change to the Inara/Scott instance — defaults unchanged.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cortex: FastAPI backend serving Inara via Claude/Gemini CLI backends.
Includes SSE streaming chat, session persistence, Google Chat webhook
handler, and Docker support.
Inara: Identity files (persona, soul, protocols, memory, context tiers)
mounted read-only into the container at runtime.
Features in initial cut:
- /chat endpoint with SSE keepalive + LLM fallback
- Session store with rolling history window
- Markdown rendering, copy-to-clipboard, links open in new tab
- Stacked right-column input controls (height selector, enter toggle,
note mode with public/private) — semi-hidden until textarea grows
- /note endpoint for injecting public context into session history
- Docker Compose config (local dev runs natively; Docker for server)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>