Local LLM:
- user_settings.py: per-user hosts/models config (local_llm.json)
- routers/local_llm.py + static/local_llm.html: dedicated settings page
- llm_client.py: local OpenAI-compatible backend via httpx
- config.py: LOCAL_API_URL/KEY/MODEL + per-backend timeouts
- Active model shown near backend toggle (amber hint text)
Memory distillation:
- memory_distiller.py: DISTILL_BACKEND_MID/LONG .env overrides
- scheduler.py + notification.py: notify NC Talk after mid/long distill
- notification.py: outbound channel abstraction (NC Talk, extensible)
Session search:
- routers/files.py: GET /sessions/search?q= with excerpts grouped by date
- static/index.html + app.js: search UI in file sidebar with highlight
- _esc() helper to prevent XSS in search results
Proactive cron:
- cron_runner.py: new job types — message (send directly) and brief (LLM + send)
- Both support optional per-job channel override
Channels:
- routers/nextcloud_talk.py: consolidated using notification._send_nct_message()
- routers/auth.py: local backend status in /auth/status
- routers/chat.py: /backend returns {primary, fallback, local_model} object
UI / UX:
- Copy button for user messages (matching assistant)
- Autocomplete disabled on sensitive form fields
- settings.html: local model section replaced with link to /settings/local
Docs overhaul:
- MASTER.md hub + ARCH__SYSTEM/BACKENDS/PERSONA/CHANNELS/FUTURE.md
- ARCH__Intelligence_Layer.md replaced with redirect table
- CORTEX.md trimmed to vision only; README updated
- OPEN_WEBUI_API.md added to docs/
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
91 lines
5.9 KiB
Markdown
91 lines
5.9 KiB
Markdown
# Architecture: System Overview
|
||
|
||
> How the pieces fit together.
|
||
> Last updated: 2026-04-03
|
||
|
||
---
|
||
|
||
## Architecture Diagram
|
||
|
||
```
|
||
┌─────────────────────────────────────────────────────────┐
|
||
│ INPUT CHANNELS │
|
||
│ │
|
||
│ Web UI ──────────────────────────────────────────┐ │
|
||
│ Nextcloud Talk ──── POST /webhook/nextcloud/{u} ─┤ │
|
||
│ Google Chat ─────── POST /channels/google-chat/{u}┤ │
|
||
│ Cron / Scheduler ─────────────────────────────────┤ │
|
||
│ Webhooks (future) ─────────────────────────────────┘ │
|
||
└─────────────────────────────┬───────────────────────────┘
|
||
↓
|
||
┌─────────────────────────────────────────────────────────┐
|
||
│ CORTEX DISPATCHER (FastAPI — cortex/) │
|
||
│ │
|
||
│ auth_middleware.py → validates JWT session cookie │
|
||
│ persona.py → resolves user + persona context │
|
||
│ context_loader.py → assembles system prompt (tier 1-4)│
|
||
│ │
|
||
│ POST /chat → direct LLM, streaming SSE │
|
||
│ POST /orchestrate → Gemini tool loop → Claude │
|
||
│ GET /orchestrate/{id} → poll job result │
|
||
└────────────┬───────────────────┬────────────────────────┘
|
||
↓ ↓
|
||
┌─────────────────┐ ┌──────────────────────────────────┐
|
||
│ LLM BACKENDS │ │ PERSONA DATA │
|
||
│ │ │ home/{user}/persona/{name}/ │
|
||
│ Claude CLI │ │ │
|
||
│ Gemini CLI │ │ IDENTITY.md SOUL.md │
|
||
│ Gemini API │ │ PROTOCOLS.md MEMORY_*.md │
|
||
│ Local (httpx) │ │ USER.md REMINDERS.md │
|
||
│ │ │ TASKS.json CRONS.json │
|
||
└─────────────────┘ │ sessions/ SCRATCH.md │
|
||
└──────────────────────────────────┘
|
||
```
|
||
|
||
Details: [`ARCH__BACKENDS.md`](ARCH__BACKENDS.md) | [`ARCH__PERSONA.md`](ARCH__PERSONA.md) | [`ARCH__CHANNELS.md`](ARCH__CHANNELS.md)
|
||
|
||
---
|
||
|
||
## Service Layout (`cortex/`)
|
||
|
||
| File | Purpose |
|
||
|---|---|
|
||
| `main.py` | App entry point, router registration |
|
||
| `config.py` | All settings (pydantic-settings, reads `.env`) |
|
||
| `persona.py` | User + persona path resolution, ContextVars |
|
||
| `context_loader.py` | Builds system prompt from persona files (tiers 1–4) |
|
||
| `llm_client.py` | All LLM backends — Claude, Gemini CLI, Local |
|
||
| `orchestrator_engine.py` | Gemini API ReAct tool loop → Claude handoff |
|
||
| `session_store.py` | In-memory + file session persistence |
|
||
| `session_logger.py` | Writes session turns to `sessions/YYYY-MM-DD.md` |
|
||
| `memory_distiller.py` | Short/mid/long distill jobs |
|
||
| `scheduler.py` | APScheduler — distill jobs + user crons |
|
||
| `cron_runner.py` | Cron job storage, schedule parsing, execution |
|
||
| `notification.py` | Outbound channel messages (distill alerts, cron proactive) |
|
||
| `auth_utils.py` | bcrypt passwords, JWT, invite tokens, channel config |
|
||
| `auth_middleware.py` | JWT cookie validation on all routes |
|
||
| `user_settings.py` | Per-user local LLM config (hosts, models, active model) |
|
||
| `event_bus.py` | Internal SSE pub/sub (NC Talk → browser mirror) |
|
||
| `email_utils.py` | SMTP invite emails |
|
||
| `persona_template.py` | Bootstrap a new persona directory from templates |
|
||
| `routers/` | One file per endpoint group (chat, orchestrator, auth, files, channels, ui, settings…) |
|
||
| `tools/` | Orchestrator tool implementations (web, ae_knowledge, tasks, scratch, reminders, cron, system) |
|
||
| `static/` | Web UI — `index.html`, `app.js`, `style.css`, `login.html`, `setup.html`, `HELP.md` |
|
||
| `tests/` | pytest suite (80 tests) |
|
||
|
||
---
|
||
|
||
## Key Design Decisions
|
||
|
||
**Two-brain pattern** — Gemini API handles tool use (function calling, planning, web search). Claude CLI handles all user-facing responses. Direct chat bypasses the orchestrator entirely.
|
||
|
||
**Subprocess backends** — Claude and Gemini run as CLI subprocesses (`claude --print`, `gemini -p`). This keeps auth transparent (Claude Code manages tokens) and avoids API costs on the Pro subscription path.
|
||
|
||
**Local backend via httpx** — Open WebUI's OpenAI-compatible API (`/api/chat/completions`). No CLI wrapper. Per-user host + model config in `local_llm.json`.
|
||
|
||
**ContextVars for async isolation** — `persona.py` uses Python `contextvars.ContextVar` so concurrent requests each see their own user/persona without thread-local hacks.
|
||
|
||
**Per-user filesystem layout** — `home/{user}/persona/{name}/` mirrors Linux home directories. Each persona is a directory of markdown files and JSON. No database. Easy to inspect, edit, and back up.
|
||
|
||
**No single point of coupling** — tools live in `cortex/tools/`, separate from `ae_*` MCP tools. Channels live in `cortex/routers/`, each self-contained. Adding a channel or tool doesn't touch other subsystems.
|