Local LLM:
- user_settings.py: per-user hosts/models config (local_llm.json)
- routers/local_llm.py + static/local_llm.html: dedicated settings page
- llm_client.py: local OpenAI-compatible backend via httpx
- config.py: LOCAL_API_URL/KEY/MODEL + per-backend timeouts
- Active model shown near backend toggle (amber hint text)
Memory distillation:
- memory_distiller.py: DISTILL_BACKEND_MID/LONG .env overrides
- scheduler.py + notification.py: notify NC Talk after mid/long distill
- notification.py: outbound channel abstraction (NC Talk, extensible)
Session search:
- routers/files.py: GET /sessions/search?q= with excerpts grouped by date
- static/index.html + app.js: search UI in file sidebar with highlight
- _esc() helper to prevent XSS in search results
Proactive cron:
- cron_runner.py: new job types — message (send directly) and brief (LLM + send)
- Both support optional per-job channel override
Channels:
- routers/nextcloud_talk.py: consolidated using notification._send_nct_message()
- routers/auth.py: local backend status in /auth/status
- routers/chat.py: /backend returns {primary, fallback, local_model} object
UI / UX:
- Copy button for user messages (matching assistant)
- Autocomplete disabled on sensitive form fields
- settings.html: local model section replaced with link to /settings/local
Docs overhaul:
- MASTER.md hub + ARCH__SYSTEM/BACKENDS/PERSONA/CHANNELS/FUTURE.md
- ARCH__Intelligence_Layer.md replaced with redirect table
- CORTEX.md trimmed to vision only; README updated
- OPEN_WEBUI_API.md added to docs/
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Architecture: System Overview
How the pieces fit together. Last updated: 2026-04-03
Architecture Diagram
┌─────────────────────────────────────────────────────────┐
│  INPUT CHANNELS                                         │
│                                                         │
│  Web UI ────────────────────────────────────────────┐   │
│  Nextcloud Talk ──── POST /webhook/nextcloud/{u} ───┤   │
│  Google Chat ─────── POST /channels/google-chat/{u} ┤   │
│  Cron / Scheduler ──────────────────────────────────┤   │
│  Webhooks (future) ─────────────────────────────────┘   │
└─────────────────────────────┬───────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────┐
│  CORTEX DISPATCHER (FastAPI, cortex/)                   │
│                                                         │
│  auth_middleware.py → validates JWT session cookie      │
│  persona.py         → resolves user + persona context   │
│  context_loader.py  → assembles system prompt (tier 1-4)│
│                                                         │
│  POST /chat            → direct LLM, streaming SSE      │
│  POST /orchestrate     → Gemini tool loop → Claude      │
│  GET /orchestrate/{id} → poll job result                │
└────────────┬───────────────────┬────────────────────────┘
             ↓                   ↓
┌─────────────────┐ ┌──────────────────────────────────┐
│  LLM BACKENDS   │ │  PERSONA DATA                    │
│                 │ │  home/{user}/persona/{name}/     │
│  Claude CLI     │ │                                  │
│  Gemini CLI     │ │  IDENTITY.md   SOUL.md           │
│  Gemini API     │ │  PROTOCOLS.md  MEMORY_*.md       │
│  Local (httpx)  │ │  USER.md       REMINDERS.md      │
│                 │ │  TASKS.json    CRONS.json        │
└─────────────────┘ │  sessions/     SCRATCH.md        │
                    └──────────────────────────────────┘
Details: ARCH__BACKENDS.md | ARCH__PERSONA.md | ARCH__CHANNELS.md
Service Layout (cortex/)
| File | Purpose |
|---|---|
| main.py | App entry point, router registration |
| config.py | All settings (pydantic-settings, reads .env) |
| persona.py | User + persona path resolution, ContextVars |
| context_loader.py | Builds system prompt from persona files (tiers 1–4) |
| llm_client.py | All LLM backends: Claude, Gemini CLI, Local |
| orchestrator_engine.py | Gemini API ReAct tool loop → Claude handoff |
| session_store.py | In-memory + file session persistence |
| session_logger.py | Writes session turns to sessions/YYYY-MM-DD.md |
| memory_distiller.py | Short/mid/long distill jobs |
| scheduler.py | APScheduler: distill jobs + user crons |
| cron_runner.py | Cron job storage, schedule parsing, execution |
| notification.py | Outbound channel messages (distill alerts, cron proactive) |
| auth_utils.py | bcrypt passwords, JWT, invite tokens, channel config |
| auth_middleware.py | JWT cookie validation on all routes |
| user_settings.py | Per-user local LLM config (hosts, models, active model) |
| event_bus.py | Internal SSE pub/sub (NC Talk → browser mirror) |
| email_utils.py | SMTP invite emails |
| persona_template.py | Bootstrap a new persona directory from templates |
| routers/ | One file per endpoint group (chat, orchestrator, auth, files, channels, ui, settings…) |
| tools/ | Orchestrator tool implementations (web, ae_knowledge, tasks, scratch, reminders, cron, system) |
| static/ | Web UI: index.html, app.js, style.css, login.html, setup.html, HELP.md |
| tests/ | pytest suite (80 tests) |
Key Design Decisions
Two-brain pattern — Gemini API handles tool use (function calling, planning, web search). Claude CLI handles all user-facing responses. Direct chat bypasses the orchestrator entirely.
Subprocess backends — Claude and Gemini run as CLI subprocesses (claude --print, gemini -p). This keeps auth transparent (Claude Code manages tokens) and avoids API costs on the Pro subscription path.
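As a rough illustration of the subprocess approach, the sketch below runs a CLI with the prompt appended as the final argument and returns its stdout. The `run_cli` helper, the `CLI_BACKENDS` mapping, the timeout value, and the assumption that the prompt is passed as a trailing argument are all hypothetical; the real code in llm_client.py may differ.

```python
import asyncio
import sys


async def run_cli(argv: list[str], prompt: str, timeout: float = 120.0) -> str:
    """Run a CLI backend as a subprocess and return its stdout as text.

    Assumption: the prompt is passed as the last command-line argument.
    """
    proc = await asyncio.create_subprocess_exec(
        *argv, prompt,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        # Do not leave a hung CLI process behind.
        proc.kill()
        await proc.wait()
        raise
    if proc.returncode != 0:
        raise RuntimeError(
            f"{argv[0]} exited with {proc.returncode}: {stderr.decode().strip()}"
        )
    return stdout.decode().strip()


# Hypothetical mapping from backend name to the invocations named above.
CLI_BACKENDS = {
    "claude": ["claude", "--print"],
    "gemini": ["gemini", "-p"],
}

# Stand-in command for trying the helper without either CLI installed:
# it upper-cases its first argument.
STAND_IN = [sys.executable, "-c", "import sys; print(sys.argv[1].upper())"]
```

With the real CLIs installed and authenticated, a call would look like `await run_cli(CLI_BACKENDS["claude"], "Summarize today's notes")`.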
Local backend via httpx — Open WebUI's OpenAI-compatible API (/api/chat/completions). No CLI wrapper. Per-user host + model config in local_llm.json.
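A minimal sketch of what a call to the local backend could look like, assuming a standard OpenAI-style chat completions payload and response. `LocalLLMConfig`, `build_chat_request`, and `chat` are illustrative names, and the field names are assumptions rather than the actual local_llm.json schema.

```python
from dataclasses import dataclass


@dataclass
class LocalLLMConfig:
    # Hypothetical shape; the real local_llm.json schema may differ.
    host: str          # base URL of the Open WebUI instance
    model: str         # model name to request
    api_key: str = ""  # optional bearer token


def build_chat_request(cfg: LocalLLMConfig, messages: list[dict], stream: bool = False):
    """Return (url, headers, payload) for an OpenAI-compatible chat call."""
    url = cfg.host.rstrip("/") + "/api/chat/completions"
    headers = {"Content-Type": "application/json"}
    if cfg.api_key:
        headers["Authorization"] = f"Bearer {cfg.api_key}"
    payload = {"model": cfg.model, "messages": messages, "stream": stream}
    return url, headers, payload


async def chat(cfg: LocalLLMConfig, messages: list[dict], timeout: float = 60.0) -> str:
    """Send a non-streaming request and return the assistant's reply text."""
    import httpx  # third-party dependency, imported lazily

    url, headers, payload = build_chat_request(cfg, messages)
    async with httpx.AsyncClient(timeout=timeout) as client:
        resp = await client.post(url, headers=headers, json=payload)
        resp.raise_for_status()
        # Assumes the usual OpenAI-style response body.
        return resp.json()["choices"][0]["message"]["content"]
```

Keeping request construction separate from the network call makes the URL and payload easy to check without a running Open WebUI instance.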
ContextVars for async isolation — persona.py uses Python contextvars.ContextVar so concurrent requests each see their own user/persona without thread-local hacks.
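The isolation property can be shown with a small, self-contained sketch. The variable names `current_user` and `current_persona` and the `handle_request` function are hypothetical and not taken from persona.py; the point is only that each asyncio task works on its own copy of the context.

```python
import asyncio
from contextvars import ContextVar

# Hypothetical names; persona.py may define its ContextVars differently.
current_user: ContextVar[str] = ContextVar("current_user")
current_persona: ContextVar[str] = ContextVar("current_persona", default="default")


async def handle_request(user: str, persona: str) -> str:
    # Each task runs in a copy of the context, so these writes
    # are invisible to other concurrent requests.
    current_user.set(user)
    current_persona.set(persona)
    await asyncio.sleep(0)  # yield control so the two requests interleave
    return f"{current_user.get()}/{current_persona.get()}"


async def main() -> list[str]:
    # gather() wraps each coroutine in its own task (and context copy).
    return await asyncio.gather(
        handle_request("alice", "work"),
        handle_request("bob", "home"),
    )


print(asyncio.run(main()))  # ['alice/work', 'bob/home']
```

Even though both requests set the same two variables and interleave at the `sleep`, each reads back its own values.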
Per-user filesystem layout — home/{user}/persona/{name}/ mirrors Linux home directories. Each persona is a directory of markdown files and JSON. No database. Easy to inspect, edit, and back up.
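A sketch of how paths in this layout might be resolved, assuming a configurable data root and a conservative character whitelist for user and persona names. The `persona_dir` and `session_log_path` helpers and the validation rule are illustrative, not the project's actual persona.py code.

```python
import re
from datetime import date
from pathlib import Path

# Assumption: names are restricted to a safe character set so that
# user input can never escape the data root (e.g. via "..").
_SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def persona_dir(root: Path, user: str, name: str) -> Path:
    """Resolve <root>/home/{user}/persona/{name}/."""
    for part in (user, name):
        if not _SAFE_NAME.fullmatch(part):
            raise ValueError(f"unsafe path component: {part!r}")
    return root / "home" / user / "persona" / name


def session_log_path(persona: Path, day: date) -> Path:
    """Resolve the per-day session log, sessions/YYYY-MM-DD.md."""
    return persona / "sessions" / f"{day.isoformat()}.md"
```

For example, `session_log_path(persona_dir(Path("/srv/cortex"), "alice", "work"), date(2026, 4, 3))` resolves to `/srv/cortex/home/alice/persona/work/sessions/2026-04-03.md`, where `/srv/cortex` is a made-up root.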
No single point of coupling — tools live in cortex/tools/, separate from ae_* MCP tools. Channels live in cortex/routers/, each self-contained. Adding a channel or tool doesn't touch other subsystems.