tooling: install script, workspace file, and dev-restart helper

- install.py: idempotent setup script (venv, systemd service, linger, auth checks); supports --check for read-only status inspection
- .stignore: exclude .venv and runtime dirs from Syncthing so each host maintains its own machine-local venv
- Cortex_and_Inara.code-workspace: VS Code workspace (service, personas, docs folders; launch config for uvicorn --reload)
- dev-restart.sh: SSH wrapper to restart Cortex on the gaming laptop and tail logs; supports restart / logs / status subcommands

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
feat: host_type field for OpenRouter / OpenAI-compatible API support
2026-04-08 19:11:27 -04:00 · 2026-04-06 21:11:22 -04:00 · 2026-04-05 22:25:09 -04:00 · 2026-04-05 22:10:40 -04:00 · 2026-04-05 21:48:00 -04:00 · 2026-04-05 21:31:32 -04:00
119 changed files with 7400 additions and 1369 deletions
--- a/.env.default
+++ b/.env.default
@@ -1,88 +0,0 @@
-# Cortex .env reference — copy to .env and fill in values
-# DO NOT commit .env — it contains secrets
-
-# ── Agent identity ───────────────────────────────────────────────────────────
-# Global display names used in distillation prompts and session logs.
-# Individual persona identities live in home/{username}/persona/{name}/IDENTITY.md
-AGENT_NAME=Inara
-USER_NAME=Scott
-
-# ── Home directory ────────────────────────────────────────────────────────────
-# Root for all user/persona data. Layout: home/{username}/persona/{name}/
-# Relative paths are resolved from the cortex/ directory.
-# Default: ../home  (i.e. Cortex_and_Inara_dev/home/)
-# HOME_DIR=../home
-
-# ── Session auth ─────────────────────────────────────────────────────────────
-# Generate with: python3 -c "import secrets; print(secrets.token_hex(32))"
-JWT_SECRET=change-me-in-dotenv
-JWT_EXPIRE_DAYS=30
-
-# ── SMTP (invite emails + future notifications) ───────────────────────────────
-SMTP_SERVER=linode.oneskyit.com
-SMTP_PORT=465
-SMTP_USERNAME=send_mail
-SMTP_PASSWORD=
-SMTP_FROM_EMAIL=noreply@oneskyit.com
-SMTP_FROM_NAME=Cortex
-# Base URL included in invite links
-CORTEX_BASE_URL=https://cortex.dgrzone.com
-
-# ── Server ──────────────────────────────────────────────────────────────────
-HOST=0.0.0.0
-PORT=8000
-
-# ── Google Chat bot ──────────────────────────────────────────────────────────
-# JWT audience for verifying inbound Workspace Add-on Chat webhook requests.
-# For Workspace Add-on Chat apps, the aud claim = the endpoint URL.
-# Leave blank to disable verification (dev/testing only).
-GOOGLE_CHAT_AUDIENCE=https://cortex.dgrzone.com/channels/google-chat
-
-# ── Nextcloud Talk bot ───────────────────────────────────────────────────────
-NEXTCLOUD_URL=https://cloud.dgrzone.com
-NEXTCLOUD_TALK_BOT_SECRET=
-
-# ── LLM backends ────────────────────────────────────────────────────────────
-# Primary backend: "claude" or "gemini" (other is always fallback)
-PRIMARY_BACKEND=claude
-
-# Timeouts in seconds
-TIMEOUT_CLAUDE=60
-TIMEOUT_GEMINI=120
-
-# ── Orchestrator (Gemini API — not Gemini CLI) ───────────────────────────────
-# Required for /orchestrate endpoint and tool use
-# Free tier key: https://aistudio.google.com/apikey
-GEMINI_API_KEY=
-
-# Model for the orchestration tool loop (not the user-facing response)
-ORCHESTRATOR_MODEL=gemini-2.5-flash
-
-# Safety cap on tool loop iterations
-ORCHESTRATOR_MAX_ROUNDS=10
-
-# ── DuckDuckGo search ────────────────────────────────────────────────────────
-# Leave blank for free unauthenticated tier
-# Set to your API key for higher rate limits (paid DuckDuckGo account)
-DDG_API_KEY=
-DDG_MAX_RESULTS=5
-
-# ── Aether Platform API ───────────────────────────────────────────────────────
-# Used by orchestrator tools: ae_journal_search, ae_journal_entry_create, ae_task_list
-# Same values as agents_sync/mcp/.env — copy from there
-AE_API_URL=https://dev-api.oneskyit.com
-AE_API_KEY=
-AE_ACCOUNT_ID=
-AE_API_TIMEOUT=15
-
-# ── Distillation schedule ────────────────────────────────────────────────────
-SCHEDULER_TIMEZONE=America/New_York
-AUTO_DISTILL=true
-AUTO_DISTILL_SHORT=true
-AUTO_DISTILL_MID=true
-AUTO_DISTILL_LONG=false   # manual review recommended before enabling
-
-# Memory tier token budgets (soft caps)
-MEMORY_BUDGET_SHORT=3000
-MEMORY_BUDGET_MID=2000
-MEMORY_BUDGET_LONG=2000
--- a/.gitignore
+++ b/.gitignore
@@ -9,11 +9,13 @@ __pycache__/
 # Session data (runtime state, not source)
 cortex/data/
 home/**/session_data/
+home/**/sessions/

 # User credentials and tokens — never commit
 home/**/auth.json
 home/**/invite.json
 home/**/profile.json
+home/**/channels.json

 # Syncthing Metadata
 .stfolder/
--- a/.stignore
+++ b/.stignore
@@ -0,0 +1,5 @@
+// Machine-local — never sync across hosts
+.venv/
+__pycache__/
+*.pyc
+cortex/data/
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -82,6 +82,7 @@ Cortex_and_Inara_dev/

  docs/                  ← Integration reference docs
    NEXTCLOUD_TALK_BOT.md
+    OPEN_WEBUI_API.md    ← Open WebUI API: tool calling, RAG, model management

  documentation/         ← Architecture decisions and agent task list
    TODO__Agents.md      ← READ THIS FIRST — active task list
@@ -211,37 +212,21 @@ clearly asked for a directory to be unblocked.

 ---

-## Current State (2026-03-20)
+## Current State (2026-04-03)

 Cortex is running and stable. All three primary channels are live:

 | Channel | Status | Notes |
 |---|---|---|
-| Web UI | ✅ Live | `https://cortex.dgrzone.com` (basic auth) |
+| Web UI | ✅ Live | `https://cortex.dgrzone.com` |
 | Nextcloud Talk | ✅ Live | HMAC-signed webhook, async reply |
 | Google Chat | ✅ Live | Workspace Add-on, `hostAppDataAction` response format |
+| Local backend | ✅ Live | Open WebUI/Ollama, per-user multi-model config |

-### Active Tasks
+Active users: scott (inara, developer), holly (tina), brian (wintermute)

-See `documentation/TODO__Agents.md` for the full list. Current priorities:
-
- **[High]** Ollama backend — local LLM via `scott_gaming` over WireGuard
- **[Medium]** NC Talk — complete bot registration docs (`docs/NEXTCLOUD_TALK_BOT.md`)
- **[Medium]** Knowledge consolidation — markdown → AE Journals
-
-### Recently Completed
-
- ✅ Session auth — bcrypt passwords, JWT cookies, login/logout, `SessionAuthMiddleware` — 2026-03-20
- ✅ Persona onboarding — invite tokens, self-service password setup, persona creation form — 2026-03-20
- ✅ Multi-persona switcher — dropdown in UI header, `/api/personas` endpoint — 2026-03-20
- ✅ SMTP invite email — `noreply@oneskyit.com`, HTML + plain text, `manage_passwords.py invite` — 2026-03-20
- ✅ CSS routing fix — `/static/*` mount must precede wildcard `/{user}/{persona}` route — 2026-03-20
- ✅ Multi-user/multi-persona support (`home/{username}/persona/{name}/` two-level layout) — 2026-03-20
- ✅ Scratchpad, task management, and cron/scheduled job tools — 2026-03-20
- ✅ Test suite (80 tests) covering API, persona routing, tools, security — 2026-03-20
- ✅ Google Chat bot (Workspace Add-on, JWT auth, `hostAppDataAction` format) — 2026-03-20
- ✅ Orchestrator Agent mode UI + session persistence — 2026-03-18
- ✅ Memory distiller (APScheduler, short/mid/long) — 2026-03
+See `documentation/TODO__Agents.md` for the active task list.
+See `documentation/ROADMAP.md` for phases and what's next.

 ---

@@ -249,8 +234,14 @@ See `documentation/TODO__Agents.md` for the full list. Current priorities:

 | File | Purpose |
 |---|---|
+| `documentation/MASTER.md` | **Start here** — index, current state, all doc links |
 | `documentation/TODO__Agents.md` | Active task list — read before starting work |
-| `documentation/ARCH__Intelligence_Layer.md` | Full architecture design |
-| `~/agents_sync/projects/CORTEX.md` | High-level project vision and phases |
+| `documentation/ROADMAP.md` | Phases — what's done, what's next |
+| `documentation/ARCH__SYSTEM.md` | System architecture and component map |
+| `documentation/ARCH__BACKENDS.md` | LLM backends, routing, per-user config |
+| `documentation/ARCH__PERSONA.md` | Persona system, context tiers, memory distillation |
+| `documentation/ARCH__CHANNELS.md` | Input channels — web, NC Talk, Google Chat, cron |
+| `documentation/ARCH__FUTURE.md` | Planned: local orchestrator, dev agents, knowledge layer |
+| `~/agents_sync/projects/CORTEX.md` | Project vision and philosophy |
 | `~/agents_sync/CLAUDE.md` | Fleet coordination rules |
 | `~/CLAUDE.md` | Machine identity (`scott_lpt`) |
--- a/Cortex_and_Inara.code-workspace
+++ b/Cortex_and_Inara.code-workspace
@@ -0,0 +1,75 @@
+{
+  "folders": [
+    {
+      "name": "cortex (service)",
+      "path": "cortex"
+    },
+    {
+      "name": "home (personas)",
+      "path": "home"
+    },
+    {
+      "name": "documentation",
+      "path": "documentation"
+    },
+    {
+      "name": "docs (integrations)",
+      "path": "docs"
+    },
+    {
+      "name": "project root",
+      "path": "."
+    }
+  ],
+  "settings": {
+    "files.exclude": {
+      "**/__pycache__": true,
+      "**/*.pyc": true,
+      "cortex/.venv": true,
+      "cortex/data": true
+    },
+    "search.exclude": {
+      "**/__pycache__": true,
+      "cortex/.venv": true,
+      "cortex/data": true,
+      "home/**/sessions": true,
+      "home/**/session_data": true
+    },
+    "[python]": {
+      "editor.formatOnSave": false
+    },
+    "editor.rulers": [100],
+    "files.associations": {
+      "*.env": "dotenv",
+      "*.env.default": "dotenv"
+    }
+  },
+  "extensions": {
+    "recommendations": [
+      "ms-python.python",
+      "ms-python.vscode-pylance",
+      "humao.rest-client",
+      "tamasfe.even-better-toml"
+    ]
+  },
+  "launch": {
+    "version": "0.2.0",
+    "configurations": [
+      {
+        "name": "Cortex (uvicorn dev)",
+        "type": "python",
+        "request": "launch",
+        "module": "uvicorn",
+        "args": [
+          "main:app",
+          "--host", "0.0.0.0",
+          "--port", "8000",
+          "--reload"
+        ],
+        "cwd": "${workspaceFolder:cortex (service)}",
+        "envFile": "${workspaceFolder:cortex (service)}/.env",
+        "justMyCode": false
+      }
+    ]
+  }
+}
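Outside VS Code, the `Cortex (uvicorn dev)` launch configuration corresponds roughly to the sketch below. It runs the same `uvicorn` module with the same arguments from `cortex/`, using the venv interpreter path the README uses (`cortex/.venv/bin/python`). The working directory matters: `Settings` declares `env_file=".env"`, a relative path that pydantic-settings resolves against the current directory, so starting from `cortex/` picks up `cortex/.env` just as the launch config's `cwd` does. The script name and its location at the project root are assumptions.

```python
import subprocess
from pathlib import Path


def uvicorn_command(python: str) -> list[str]:
    """Same module and args as the "Cortex (uvicorn dev)" launch configuration."""
    return [python, "-m", "uvicorn", "main:app",
            "--host", "0.0.0.0", "--port", "8000", "--reload"]


def main() -> int:
    # Assumes this hypothetical script lives at the project root, next to cortex/
    cortex_dir = Path(__file__).resolve().parent / "cortex"
    venv_python = cortex_dir / ".venv" / "bin" / "python"
    # cwd=cortex/ mirrors the launch config's "cwd", so Settings finds cortex/.env
    return subprocess.call(uvicorn_command(str(venv_python)), cwd=cortex_dir)


if __name__ == "__main__":
    raise SystemExit(main())
```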
--- a/README.md
+++ b/README.md
@@ -6,7 +6,7 @@

 > *"You can't stop the signal."*

-Cortex is a self-hosted multi-agent AI platform. It supports multiple users, each with their own named AI persona. Inara (Scott's persona) and Tina (Holly's persona) are the initial instances.
+Cortex is a self-hosted multi-agent AI platform. It supports multiple users, each with their own named AI persona.

 ---

@@ -16,9 +16,7 @@ Cortex is a self-hosted multi-agent AI platform. It supports multiple users, eac
 |---|---|
 | `cortex/` | FastAPI service — dispatcher, routing, LLM backends, session management |
 | `home/` | User and persona data (`home/{username}/persona/{name}/`) |
-| `home/scott/persona/inara/` | Inara identity, memory, and context files |
-| `home/holly/persona/tina/` | Tina identity, memory, and context files |
-| `docs/` | Integration reference docs (NC Talk bot, etc.) |
+| `docs/` | Integration reference docs (NC Talk bot, Google Chat bot) |
 | `documentation/` | Architecture decisions, project plans, agent task lists |

 ---
@@ -69,49 +67,55 @@ http://localhost:8000   (or cortex.dgrzone.com on WireGuard)
 The service starts automatically at boot via `loginctl enable-linger`.
 Service file: `~/.config/systemd/user/cortex.service`

-Config lives in `cortex/config.py` and a `.env` file at the project root (not tracked — see `.env.default`).
+Config lives in `cortex/config.py` and `cortex/.env` (not tracked — see `cortex/.env.example`).

 ---

 ## Key Documentation

+**Start here for a full picture:** [`documentation/MASTER.md`](documentation/MASTER.md)
+
 | File | Purpose |
 |---|---|
-| `documentation/TODO__Agents.md` | Active task list — read first |
-| `documentation/ARCH__Intelligence_Layer.md` | Intelligence layer architecture (orchestrator, dev agents, knowledge) |
-| `docs/NEXTCLOUD_TALK_BOT.md` | NC Talk bot setup |
-| `home/scott/persona/inara/IDENTITY.md` | Inara persona and identity |
-| `home/scott/persona/inara/HELP.md` | In-app help content (rendered in UI) |
-| `home/scott/persona/inara/PROTOCOLS.md` | Inara behavioral protocols |
-| `~/agents_sync/projects/CORTEX.md` | High-level project vision and phases |
+| `documentation/MASTER.md` | Index — current state, all doc links, quick reference |
+| `documentation/ROADMAP.md` | Phases — what's done, what's next |
+| `documentation/TODO__Agents.md` | Active task list |
+| `documentation/ARCH__SYSTEM.md` | System architecture and component map |
+| `documentation/ARCH__BACKENDS.md` | LLM backends, routing, fallback |
+| `documentation/ARCH__PERSONA.md` | Persona system, context tiers, memory distillation |
+| `documentation/ARCH__CHANNELS.md` | Input channels — web, NC Talk, Google Chat, cron |
+| `documentation/ARCH__FUTURE.md` | Planned features — local orchestrator, dev agents, knowledge layer |
+| `docs/NEXTCLOUD_TALK_BOT.md` | NC Talk bot setup and troubleshooting |
+| `docs/GOOGLE_CHAT_BOT.md` | Google Chat Add-on setup |
+| `docs/OPEN_WEBUI_API.md` | Open WebUI/Ollama API reference |

 ---

 ## Architecture at a Glance

 ```
-[User / Cron / Webhook]
+[Web UI / NC Talk / Google Chat / Cron / Webhooks]
        ↓
  Cortex Dispatcher  (FastAPI, cortex/)
-    ├─ POST /chat            — direct to LLM (streaming SSE)
-    ├─ POST /orchestrate     — Gemini tool loop → Claude response
-    ├─ POST /webhook/nextcloud  — Nextcloud Talk bot
-    └─ POST /webhook/google     — Google Chat Add-on
+    ├─ POST /chat                            — direct to LLM (streaming SSE)
+    ├─ POST /orchestrate                     — Gemini tool loop → Claude response
+    ├─ POST /webhook/nextcloud/{username}    — Nextcloud Talk bot (per-user)
+    └─ POST /channels/google-chat/{username} — Google Chat Add-on (per-user)
        ↓
-  LLM Backend(s)
-  • Claude CLI   — primary reasoning, coding, long-context
-  • Gemini CLI   — secondary / cost routing
-  • Gemini API   — orchestrator tool loop (separate from Gemini CLI)
-  • Ollama       — offline/private (scott_gaming, future)
+  LLM Backends
+  • Claude CLI   — primary, all user-facing responses
+  • Gemini CLI   — fallback
+  • Gemini API   — orchestrator tool loop only (not general chat)
+  • Local        — Open WebUI/Ollama on scott_gaming (private/offline)
        ↓
  Persona context loaded from home/{user}/persona/{name}/
 ```

-See `documentation/ARCH__Intelligence_Layer.md` for the orchestrator/responder and dev-agent architecture.
+See `documentation/ARCH__SYSTEM.md` for the full architecture breakdown.

 ---

-## Inara / Tina
+## Personas

 Each persona has its own identity, memory, and session history.
 They are not tied to a specific LLM model — the name is fixed, the backend varies.
@@ -120,17 +124,23 @@ Context is loaded at request time from `home/{user}/persona/{name}/` via `cortex
 | User | Persona | Description |
 |---|---|---|
 | scott | inara | Scott's primary AI assistant |
+| scott | developer | Scott's dev-focused persona |
 | holly | tina | Holly's primary AI assistant |
+| brian | wintermute | Brian's primary AI assistant |

 ---

 ## Channels

-| Channel | Status | Notes |
+Webhook endpoints are per-user — each user configures their own secrets in `home/{username}/channels.json`.
+
+| Channel | Status | Endpoint |
 |---|---|---|
 | Web UI | Live | `https://cortex.dgrzone.com` — session auth (login form + JWT cookie) |
-| Nextcloud Talk | Live | HMAC-signed webhook, async reply |
-| Google Chat | Live | Workspace Add-on, JWT auth |
+| Nextcloud Talk | Live | `POST /webhook/nextcloud/{username}` — HMAC-signed, async reply |
+| Google Chat | Live | `POST /channels/google-chat/{username}` — Workspace Add-on, JWT auth |
+
+See `docs/NEXTCLOUD_TALK_BOT.md` and `docs/GOOGLE_CHAT_BOT.md` for setup instructions.

 ---

@@ -142,7 +152,10 @@ cd cortex
 # Create a user directory and send an invite email
 .venv/bin/python manage_passwords.py invite <username> <email>

-# List users with password and email status
+# Register a Google account for sign-in (run after user completes onboarding)
+.venv/bin/python manage_passwords.py google-add <username> <email>
+
+# List users with password, Google, and email status
 .venv/bin/python manage_passwords.py list

 # Set/check a password directly
@@ -152,6 +165,8 @@ cd cortex

 New users receive a link to `/setup/{token}` where they set their own password and create their first persona. Invite tokens expire in 72 hours and are one-time-use.

+To enable a channel for a user, create `home/{username}/channels.json` — see the relevant doc in `docs/`.
+
 ---

 ## Testing
--- a/cortex-holly.service
+++ b/cortex-holly.service
@@ -1,15 +0,0 @@
-[Unit]
-Description=Cortex / Holly LLM Gateway
-After=network.target
-
-[Service]
-Type=simple
-User=scott
-WorkingDirectory=/home/scott/agents_sync/projects/Cortex_and_Inara_dev/cortex
-EnvironmentFile=/home/scott/agents_sync/projects/Cortex_and_Inara_dev/cortex/.env.holly
-ExecStart=/home/scott/agents_sync/projects/Cortex_and_Inara_dev/cortex/.venv/bin/uvicorn main:app --host 0.0.0.0 --port 8001
-Restart=on-failure
-RestartSec=5
-
-[Install]
-WantedBy=multi-user.target
--- a/cortex/.env.example
+++ b/cortex/.env.example
@@ -1,33 +1,106 @@
-# Auth is handled by the claude CLI (claude setup-token) — no API key needed here.
-# ANTHROPIC_API_KEY=only_needed_if_switching_to_sdk
+# Cortex .env reference — copy to .env and fill in values
+# DO NOT commit .env — it contains secrets

-# Path to the inara/ identity directory — relative to cortex/ or absolute
-INARA_DIR=../inara
+# ── Agent identity ───────────────────────────────────────────────────────────
+# Global display names used in distillation prompts and session logs.
+# Individual persona identities live in home/{username}/persona/{name}/IDENTITY.md
+AGENT_NAME=Inara
+USER_NAME=Scott

-# Path for persistent JSON session files
-SESSIONS_DIR=./data/sessions
+# ── Home directory ────────────────────────────────────────────────────────────
+# Root for all user/persona data. Layout: home/{username}/persona/{name}/
+# Relative paths are resolved from the cortex/ directory.
+# Default: ../home  (i.e. Cortex_and_Inara_dev/home/)
+# HOME_DIR=../home

-# LLM defaults
-DEFAULT_MODEL=claude-sonnet-4-6
-DEFAULT_TIER=2
+# ── Google OAuth — "Sign in with Google" ────────────────────────────────────
+# Create credentials at console.cloud.google.com → APIs & Services → Credentials
+# Application type: Web Application
+# Authorised redirect URI: https://cortex.dgrzone.com/auth/google/callback
+# Pre-register users: cd cortex && .venv/bin/python manage_passwords.py google-add <user> <email>
+# Per-user Gemini key: add "gemini_api_key": "AIza..." to home/{username}/auth.json
+GOOGLE_CLIENT_ID=
+GOOGLE_CLIENT_SECRET=

-# Session rolling window — number of messages to keep (user + assistant pairs)
-# 40 = 20 turns
-MAX_HISTORY_MESSAGES=40
+# ── Session auth ─────────────────────────────────────────────────────────────
+# Generate with: python3 -c "import secrets; print(secrets.token_hex(32))"
+JWT_SECRET=change-me-in-dotenv
+JWT_EXPIRE_DAYS=30

-# Per-backend timeouts (seconds)
-# Gemini is generous — it frequently takes 30-60s under load
-# Local models may need time to load into VRAM before first response
+# ── SMTP (invite emails + future notifications) ───────────────────────────────
+SMTP_SERVER=linode.oneskyit.com
+SMTP_PORT=465
+SMTP_USERNAME=send_mail
+SMTP_PASSWORD=
+SMTP_FROM_EMAIL=noreply@oneskyit.com
+SMTP_FROM_NAME=Cortex
+# Base URL included in invite links
+CORTEX_BASE_URL=https://cortex.dgrzone.com
+
+# ── Server ──────────────────────────────────────────────────────────────────
+HOST=0.0.0.0
+PORT=8000
+
+# ── Google Chat bot ──────────────────────────────────────────────────────────
+# JWT audience for verifying inbound Workspace Add-on Chat webhook requests.
+# For Workspace Add-on Chat apps, the aud claim = the endpoint URL.
+# Leave blank to disable verification (dev/testing only).
+GOOGLE_CHAT_AUDIENCE=https://cortex.dgrzone.com/channels/google-chat
+
+# ── Nextcloud Talk bot ───────────────────────────────────────────────────────
+NEXTCLOUD_URL=https://cloud.dgrzone.com
+NEXTCLOUD_TALK_BOT_SECRET=
+
+# ── LLM backends ────────────────────────────────────────────────────────────
+# Primary backend: "claude", "gemini", or "local" (switchable at runtime via UI)
+PRIMARY_BACKEND=claude
+
+# Timeouts in seconds
 TIMEOUT_CLAUDE=60
 TIMEOUT_GEMINI=120
-TIMEOUT_LOCAL=300
+TIMEOUT_LOCAL=300   # local models may need time to load

-# Google Chat — must respond within 30s or Chat shows an error to the user
-GOOGLE_CHAT_TIMEOUT=25
-# Backend pinned for Google Chat (claude recommended — more reliable within 25s)
-GOOGLE_CHAT_BACKEND=claude
-# TODO: add GOOGLE_CHAT_TOKEN for request verification once endpoint is public
+# ── Local model (Open WebUI / Ollama — OpenAI-compatible API) ────────────────
+# Leave LOCAL_API_URL blank to disable. When set, "local" appears as a backend option.
+# API key: Open WebUI → Settings → Account → API Keys
+# Model: workspace alias or full Ollama model name
+LOCAL_API_URL=http://192.168.32.19:3000
+LOCAL_API_KEY=
+LOCAL_MODEL=test-agent-simple

-# Server
-PORT=8000
-HOST=0.0.0.0
+# ── Orchestrator (Gemini API — not Gemini CLI) ───────────────────────────────
+# Required for /orchestrate endpoint and tool use
+# Free tier key: https://aistudio.google.com/apikey
+GEMINI_API_KEY=
+
+# Model for the orchestration tool loop (not the user-facing response)
+ORCHESTRATOR_MODEL=gemini-2.5-flash
+
+# Safety cap on tool loop iterations
+ORCHESTRATOR_MAX_ROUNDS=10
+
+# ── DuckDuckGo search ────────────────────────────────────────────────────────
+# Leave blank for free unauthenticated tier
+# Set to your API key for higher rate limits (paid DuckDuckGo account)
+DDG_API_KEY=
+DDG_MAX_RESULTS=5
+
+# ── Aether Platform API ───────────────────────────────────────────────────────
+# Used by orchestrator tools: ae_journal_search, ae_journal_entry_create, ae_task_list
+# Same values as agents_sync/mcp/.env — copy from there
+AE_API_URL=https://dev-api.oneskyit.com
+AE_API_KEY=
+AE_ACCOUNT_ID=
+AE_API_TIMEOUT=15
+
+# ── Distillation schedule ────────────────────────────────────────────────────
+SCHEDULER_TIMEZONE=America/New_York
+AUTO_DISTILL=true
+AUTO_DISTILL_SHORT=true
+AUTO_DISTILL_MID=true
+AUTO_DISTILL_LONG=false   # manual review recommended before enabling
+
+# Memory tier token budgets (soft caps)
+MEMORY_BUDGET_SHORT=3000
+MEMORY_BUDGET_MID=2000
+MEMORY_BUDGET_LONG=2000
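Because `Settings` is configured with `extra="ignore"`, keys that no field consumes are dropped silently. In this diff, `GOOGLE_CHAT_AUDIENCE`, `NEXTCLOUD_URL` and `NEXTCLOUD_TALK_BOT_SECRET` remain in `cortex/.env.example` while the matching fields are removed from `config.py`; unless another module reads them from the environment directly, they no longer have any effect. A small audit can flag such leftovers. This is a sketch with a deliberately simplified `KEY=VALUE` reader (an assumption, not python-dotenv's full grammar), and the helper names are hypothetical.

```python
import re

# Simplified .env reader: no quoting, no multi-line values, no "export" prefix
_LINE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*)$")


def env_keys(text: str) -> dict[str, str]:
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks and comments, including commented-out keys like "# HOME_DIR=../home"
        if not line or line.startswith("#"):
            continue
        m = _LINE.match(line)
        if m:
            # Drop inline comments of the form "value   # note"
            values[m.group(1)] = re.sub(r"\s+#.*$", "", m.group(2)).strip()
    return values


def unknown_keys(env_text: str, settings_fields: set[str]) -> list[str]:
    """Keys in the .env text that no Settings field will consume (case-insensitive)."""
    known = {f.upper() for f in settings_fields}
    return sorted(k for k in env_keys(env_text) if k.upper() not in known)
```

With pydantic v2 installed, the field names can come from the class itself, e.g. `unknown_keys(Path("cortex/.env.example").read_text(), set(Settings.model_fields))`.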
--- a/cortex/auth_middleware.py
+++ b/cortex/auth_middleware.py
@@ -19,8 +19,8 @@ from auth_utils import COOKIE_NAME, decode_token
 # Paths that don't require a session cookie
 _PUBLIC = {"/login", "/logout", "/health"}

-# Path prefixes that are always public (setup flow + webhooks)
-_PUBLIC_PREFIXES = ("/setup/", "/channels/", "/webhook/")
+# Path prefixes that are always public (setup flow + webhooks + Google OAuth)
+_PUBLIC_PREFIXES = ("/setup/", "/channels/", "/webhook/", "/auth/google")


 class SessionAuthMiddleware(BaseHTTPMiddleware):
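The middleware's `dispatch` body is not shown, but the two constants suggest an exact-match check followed by a prefix check. A minimal sketch of that test, assuming this order (the `is_public` name is hypothetical):

```python
# Constants copied from cortex/auth_middleware.py
_PUBLIC = {"/login", "/logout", "/health"}
_PUBLIC_PREFIXES = ("/setup/", "/channels/", "/webhook/", "/auth/google")


def is_public(path: str) -> bool:
    """Return True if a request path may skip the session-cookie check."""
    # str.startswith accepts a tuple and matches if any prefix matches
    return path in _PUBLIC or path.startswith(_PUBLIC_PREFIXES)
```

Unlike the other prefixes, `/auth/google` has no trailing slash, so it also exempts `/auth/google/callback` and any path such as `/auth/googlex`. A tighter variant would test `path == "/auth/google" or path.startswith("/auth/google/")`.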
--- a/cortex/auth_utils.py
+++ b/cortex/auth_utils.py
@@ -29,33 +29,92 @@ ALGORITHM = "HS256"


 # ---------------------------------------------------------------------------
-# Password helpers
+# auth.json helpers — read/write without clobbering unrelated fields
 # ---------------------------------------------------------------------------

 def _auth_path(username: str) -> Path:
    return settings.home_root() / username / "auth.json"


+def _read_auth(username: str) -> dict:
+    path = _auth_path(username)
+    if not path.exists():
+        return {}
+    try:
+        return json.loads(path.read_text())
+    except Exception:
+        return {}
+
+
+def _write_auth(username: str, data: dict) -> None:
+    path = _auth_path(username)
+    path.parent.mkdir(parents=True, exist_ok=True)
+    path.write_text(json.dumps(data, indent=2) + "\n")
+
+
+# ---------------------------------------------------------------------------
+# Password helpers
+# ---------------------------------------------------------------------------
+
 def set_password(username: str, password: str) -> None:
-    """Hash and store a password for a user. Creates auth.json if needed."""
-    hashed = bcrypt.hashpw(password.encode(), bcrypt.gensalt()).decode()
-    _auth_path(username).write_text(json.dumps({"password_hash": hashed}) + "\n")
+    """Hash and store a password. Preserves any existing fields in auth.json."""
+    data = _read_auth(username)
+    data["password_hash"] = bcrypt.hashpw(password.encode(), bcrypt.gensalt()).decode()
+    _write_auth(username, data)
    logger.info("password set for user: %s", username)


 def check_credentials(username: str, password: str) -> bool:
    """Return True if username+password are valid, False otherwise."""
-    path = _auth_path(username)
-    if not path.exists():
-        return False
    try:
-        data = json.loads(path.read_text())
-        stored = data.get("password_hash", "").encode()
+        stored = _read_auth(username).get("password_hash", "").encode()
+        if not stored:
+            return False
        return bcrypt.checkpw(password.encode(), stored)
    except Exception:
        return False


+# ---------------------------------------------------------------------------
+# Google OAuth helpers
+# ---------------------------------------------------------------------------
+
+def find_user_by_google(sub: str, email: str) -> str | None:
+    """
+    Scan all users for one whose auth.json matches the given Google sub or email.
+    Sub match takes priority (stable); email match is a fallback for first sign-in.
+    Returns the username, or None if no match.
+    """
+    root = settings.home_root()
+    if not root.exists():
+        return None
+    for user_dir in sorted(root.iterdir()):
+        if not user_dir.is_dir():
+            continue
+        data = _read_auth(user_dir.name)
+        if not data:
+            continue
+        if sub and data.get("google_sub") == sub:
+            return user_dir.name
+        if email and data.get("google_email", "").lower() == email.lower():
+            return user_dir.name
+    return None
+
+
+def link_google(username: str, sub: str, email: str) -> None:
+    """Store / update Google sub and email in a user's auth.json."""
+    data = _read_auth(username)
+    data["google_sub"]   = sub
+    data["google_email"] = email
+    _write_auth(username, data)
+    logger.info("Google account linked for user: %s (%s)", username, email)
+
+
+def get_user_gemini_key(username: str) -> str | None:
+    """Return the user's personal Gemini API key, or None to use the server key."""
+    return _read_auth(username).get("gemini_api_key") or None
+
+
 # ---------------------------------------------------------------------------
 # JWT helpers
 # ---------------------------------------------------------------------------
@@ -136,3 +195,22 @@ def consume_invite(username: str) -> None:
            path.write_text(json.dumps(data) + "\n")
        except Exception:
            pass
+
+
+# ---------------------------------------------------------------------------
+# Per-user channel config
+# ---------------------------------------------------------------------------
+
+def _channels_path(username: str) -> Path:
+    return settings.home_root() / username / "channels.json"
+
+
+def get_user_channels(username: str) -> dict:
+    """Return the parsed channels.json for a user, or {} if not found."""
+    path = _channels_path(username)
+    if not path.exists():
+        return {}
+    try:
+        return json.loads(path.read_text())
+    except Exception:
+        return {}
--- a/cortex/config.py
+++ b/cortex/config.py
@@ -5,6 +5,12 @@ from pydantic_settings import BaseSettings, SettingsConfigDict
 class Settings(BaseSettings):
    anthropic_api_key: str | None = None  # not used — claude CLI handles auth

+    # Google OAuth — "Sign in with Google" for all users
+    # Create credentials at console.cloud.google.com → APIs & Services → Credentials
+    # Add https://<your-domain>/auth/google/callback as an authorised redirect URI
+    google_client_id: str | None = None
+    google_client_secret: str | None = None
+
    # Orchestrator (Gemini API — separate from Gemini CLI)
    # Get a key at: https://aistudio.google.com/apikey (free tier is sufficient)
    gemini_api_key: str | None = None
@@ -34,26 +40,17 @@ class Settings(BaseSettings):
    max_history_messages: int = 40  # rolling window — 20 turns (user + assistant)
    primary_backend: str = "claude"  # "claude" or "gemini" — other is always fallback

+    # Local model backend — OpenAI-compatible API (Open WebUI / Ollama)
+    # Set LOCAL_API_URL in .env to enable; leave blank to disable
+    local_api_url: str = ""            # e.g. http://192.168.32.19:3000
+    local_api_key: str = ""            # sk-... from Open WebUI → Settings → Account → API Keys
+    local_model: str = ""              # workspace or model name, e.g. test-agent-simple
+
    # Per-backend timeouts in seconds
    timeout_claude: int = 60
    timeout_gemini: int = 120   # frequently slow under load
    timeout_local: int = 300    # local models may need to load first

-    # Google Chat
-    # JWT audience (aud) claim to verify on inbound webhook requests.
-    # Google Chat sets aud = the Google Cloud project number (e.g. "741112865538").
-    # Set to "" to disable verification (dev/testing only).
-    google_chat_audience: str = ""
-    # Google Chat must receive a response within 30s or shows an error to the user
-    google_chat_timeout: int = 25
-    # Backend forced for Google Chat — Claude is more reliable within the 25s deadline
-    google_chat_backend: str = "claude"
-
-    # Nextcloud Talk bot
-    nextcloud_url: str = "https://cloud.dgrzone.com"
-    nextcloud_talk_bot_secret: str = ""   # set in .env
-    nextcloud_talk_timeout: int = 55
-
    # Auto-distillation schedule — override in .env
    # AUTO_DISTILL=false disables entirely
    scheduler_timezone: str = "America/New_York"  # IANA tz — override in .env if needed
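The new `local_*` settings describe an OpenAI-compatible backend that is disabled when `LOCAL_API_URL` is blank. A minimal standard-library sketch of one non-streaming request, assuming Open WebUI's `/api/chat/completions` endpoint with a Bearer API key (a bare Ollama server would instead expose `/v1/chat/completions`). The function names are hypothetical; Cortex's actual local backend module is not part of this diff.

```python
import json
import urllib.request


def local_enabled(api_url: str) -> bool:
    # Mirrors the config comment: a blank LOCAL_API_URL disables the local backend
    return bool(api_url.strip())


def build_local_request(api_url: str, api_key: str, model: str,
                        messages: list[dict]) -> urllib.request.Request:
    """OpenAI-style chat completion request (endpoint path is an assumption)."""
    url = api_url.rstrip("/") + "/api/chat/completions"
    body = json.dumps({"model": model, "messages": messages, "stream": False}).encode()
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def ask_local(api_url: str, api_key: str, model: str, prompt: str, timeout: int = 300) -> str:
    req = build_local_request(api_url, api_key, model, [{"role": "user", "content": prompt}])
    # timeout defaults to timeout_local: the model may need to load into VRAM first
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = json.load(resp)
    # OpenAI-compatible responses carry the text at choices[0].message.content
    return data["choices"][0]["message"]["content"]
```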
@@ -62,6 +59,26 @@ class Settings(BaseSettings):
    auto_distill_mid: bool = True     # weekly Sunday at 03:30 — LLM summarizes short → mid
    auto_distill_long: bool = False   # monthly 1st at 04:00 — off by default (manual review recommended)

+    # Which backend to use for distillation LLM calls.
+    # "" = use primary_backend (default); "local" = use local model (saves API credits).
+    # "long" stays on default (claude/gemini) for best quality.
+    distill_backend_mid: str = ""
+    distill_backend_long: str = ""
+
+    # Model registry: default backend type per role when user registry has no entry.
+    # Values: "claude_cli" | "gemini_cli" | "gemini_api" (builtin IDs)
+    # Override in .env: ROLE_CHAT=claude_cli  ROLE_DISTILL=gemini_api  etc.
+    role_chat: str = "claude_cli"
+    role_orchestrator: str = "gemini_api"
+    role_distill: str = "claude_cli"
+    role_coder: str = "claude_cli"
+    role_research: str = "gemini_api"
+
+    # Comma-separated list of standard roles shown in the model settings UI.
+    # Add custom roles here to extend the UI without code changes.
+    # Example: DEFINED_ROLES=chat,orchestrator,distill,coder,research,medical
+    defined_roles: str = "chat,orchestrator,distill,coder,research"
+
    # Memory tier token budgets — soft caps used during distillation
    # Override in .env: MEMORY_BUDGET_LONG=4000 etc.
    memory_budget_long: int = 2000
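The comments on `distill_backend_mid` and `distill_backend_long` describe an empty-string fallback to `primary_backend`. A minimal sketch of that rule, assuming the distiller looks up the override by tier name (the function is hypothetical):

```python
def distill_backend(tier: str, primary_backend: str,
                    distill_backend_mid: str = "", distill_backend_long: str = "") -> str:
    """Pick the backend for a distillation tier; "" falls back to primary_backend."""
    overrides = {"mid": distill_backend_mid, "long": distill_backend_long}
    # Tiers without an override setting (e.g. "short") always use the primary backend
    return overrides.get(tier) or primary_backend
```

Setting `DISTILL_BACKEND_MID=local` would route only the weekly mid-tier run to the local model, leaving the long tier on the primary backend.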
@@ -87,6 +104,14 @@ class Settings(BaseSettings):

    model_config = SettingsConfigDict(env_file=".env", env_file_encoding="utf-8", extra="ignore")

+    def get_defined_roles(self) -> list[str]:
+        """Return the ordered list of standard roles from the defined_roles setting."""
+        return [r.strip() for r in self.defined_roles.split(",") if r.strip()]
+
+    def get_role_default(self, role: str) -> str:
+        """Return the .env default backend type for a role (e.g. 'claude_cli')."""
+        return getattr(self, f"role_{role.replace('-', '_')}", "claude_cli")
+
    def home_root(self) -> Path:
        """Resolve home_dir relative to this file's location if not absolute."""
        if self.home_dir.is_absolute():
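The two helpers added to `Settings` are plain string handling, so their behavior can be checked without loading pydantic. The sketch below copies them onto a hypothetical `RoleSettings` dataclass (an assumption, standing in for the real `BaseSettings` subclass). It shows how `DEFINED_ROLES` is parsed and how a role name maps to a `role_*` field.

```python
from dataclasses import dataclass


@dataclass
class RoleSettings:
    """Hypothetical stand-in for the pydantic Settings class (role fields only)."""
    defined_roles: str = "chat,orchestrator,distill,coder,research"
    role_chat: str = "claude_cli"
    role_orchestrator: str = "gemini_api"
    role_distill: str = "claude_cli"
    role_coder: str = "claude_cli"
    role_research: str = "gemini_api"

    def get_defined_roles(self) -> list[str]:
        # Split on commas, trim whitespace, and drop empty entries (e.g. a trailing comma)
        return [r.strip() for r in self.defined_roles.split(",") if r.strip()]

    def get_role_default(self, role: str) -> str:
        # "deep-research" maps to attribute "role_deep_research"; unknown roles get claude_cli
        return getattr(self, f"role_{role.replace('-', '_')}", "claude_cli")


s = RoleSettings(defined_roles=" chat, distill,,medical ,")
print(s.get_defined_roles())           # ['chat', 'distill', 'medical']
print(s.get_role_default("research"))  # gemini_api
print(s.get_role_default("medical"))   # claude_cli (no role_medical field)
```

One consequence worth noting: because `model_config` uses `extra="ignore"`, a `ROLE_MEDICAL=...` line in `.env` is dropped unless a matching `role_medical` field is declared. A custom role added only to `DEFINED_ROLES` therefore falls back to `claude_cli` until it gets an assignment in the user registry.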
--- a/cortex/cron_runner.py
+++ b/cortex/cron_runner.py
@@ -10,16 +10,20 @@ Job schema:
    "id":         "c_abc123",
    "label":      "Human-readable name",
    "schedule":   "daily:09:00",   # see parse_schedule() for all formats
-    "type":       "remind" | "note",
-    "payload":    "Text to write when the job fires",
+    "type":       "remind" | "note" | "message" | "brief",
+    "payload":    "Text or prompt when the job fires",
+    "channel":    null | "nextcloud" | "google_chat",  # for message/brief types
    "enabled":    true,
    "created_at": "ISO 8601",
    "last_run":   null | "ISO 8601"
  }

 Job types:
-  remind  → appends to inara/REMINDERS.md  (auto-loaded into Inara's context)
-  note    → appends to inara/SCRATCH.md    (read on demand via scratch_read)
+  remind   → appends to REMINDERS.md  (auto-loaded into context at tier 2+)
+  note     → appends to SCRATCH.md    (read on demand via scratch_read)
+  message  → sends payload as-is to the notification channel (NC Talk by default)
+  brief    → runs LLM with payload as the prompt, sends the response to that channel
+             (good for morning briefings, summaries, proactive check-ins)
 """

 import logging
@@ -150,6 +154,40 @@ async def run_job(job: dict) -> None:
        p.write_text(existing.rstrip() + "\n" + section)
        logger.info("cron [note] fired: %s", label)

+    elif job_type == "message":
+        from config import settings as _s
+        from notification import notify
+        username = job.get("user") or _s.user_name.lower()
+        channel  = job.get("channel") or None
+        await notify(username, payload, channel=channel)  # send payload text as-is
+        logger.info("cron [message] sent: %s", label)
+
+    elif job_type == "brief":
+        # Run LLM with payload as the prompt, send response to notification channel.
+        # Great for morning briefings, reminders, proactive check-ins.
+        from context_loader import load_context
+        from llm_client import complete
+        from notification import notify
+        from persona import set_context
+        from config import settings as _s
+
+        username   = job.get("user") or _s.user_name.lower()
+        persona_nm = job.get("persona") or _s.agent_name.lower()
+        channel    = job.get("channel") or None
+        set_context(username, persona_nm)
+
+        system_prompt = load_context(2)  # tier 2: identity + memory + user profile
+        try:
+            response_text, backend = await complete(
+                system_prompt=system_prompt,
+                messages=[{"role": "user", "content": payload}],
+                role="chat",
+            )
+            await notify(username, response_text, channel=channel)
+            logger.info("cron [brief] sent via %s: %s", backend, label)
+        except Exception as e:
+            logger.error("cron [brief] LLM error for %s: %s", label, e)
+
    else:
        logger.warning("cron: unknown type %r (job %s)", job_type, job.get("id"))
        return
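The job schema at the top of `cron_runner.py` is documented but not enforced. A minimal sketch of a validator for it follows. `validate_job`, `VALID_TYPES`, `VALID_CHANNELS` and `REQUIRED_KEYS` are hypothetical names, and treating a `channel` on a `remind`/`note` job as a problem is an assumption drawn from the `# for message/brief types` comment.

```python
VALID_TYPES = {"remind", "note", "message", "brief"}
VALID_CHANNELS = {None, "nextcloud", "google_chat"}
REQUIRED_KEYS = ("id", "label", "schedule", "type", "payload")


def validate_job(job: dict) -> list[str]:
    """Return human-readable problems with a cron job dict; an empty list means it looks valid."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not job.get(key)]

    job_type = job.get("type")
    if job_type and job_type not in VALID_TYPES:
        problems.append(f"unknown type {job_type!r}")

    channel = job.get("channel")
    if channel not in VALID_CHANNELS:
        problems.append(f"unknown channel {channel!r}")
    elif channel is not None and job_type not in ("message", "brief"):
        # The schema documents channel only for message/brief jobs
        problems.append("channel is only used by message/brief jobs")
    return problems


job = {
    "id": "c_abc123",
    "label": "Morning briefing",
    "schedule": "daily:09:00",
    "type": "brief",
    "payload": "Summarize today's reminders in three bullets.",
    "channel": "nextcloud",
    "enabled": True,
}
print(validate_job(job))  # []
```

A check like this could run when a job is created, so a typo such as `"type": "breif"` is rejected up front instead of only reaching the `unknown type` warning in `run_job()` when the job fires.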
--- a/cortex/llm_client.py
+++ b/cortex/llm_client.py
@@ -31,22 +31,59 @@ async def cleanup() -> None:
    _active_pgroups.clear()


+# Map from registry model type → dispatch function key
+_TYPE_TO_BACKEND = {
+    "claude_cli":   "claude",
+    "gemini_cli":   "gemini",
+    "gemini_api":   "gemini",   # gemini_api falls back to CLI in this context
+    "local_openai": "local",
+}
+
+# Explicit UI toggle values (kept for backward compat)
+_EXPLICIT_BACKENDS = ("claude", "gemini", "local")
+_FALLBACK = {"claude": "gemini", "gemini": "claude", "local": "claude"}
+
+
 async def complete(
    system_prompt: str,
    messages: list[dict],
    model: str | None = None,
+    role: str = "chat",
    max_tokens: int = 2048,
 ) -> tuple[str, str]:
-    """Returns (response_text, actual_backend_used)."""
-    if model in ("claude", "gemini"):
+    """
+    Returns (response_text, actual_backend_used).
+
+    model: explicit backend override ("claude" | "gemini" | "local") from UI toggle.
+           None = resolve via model registry for the given role.
+    role:  registry role used when model is None (default: "chat").
+    """
+    import model_registry as _reg
+    from persona import _user
+
+    username = _user.get()
+    resolved_cfg: dict | None = None
+
+    if model in _EXPLICIT_BACKENDS:
+        # User explicitly selected a backend in the UI
+        if model == "local":
+            resolved_cfg = _reg.get_best_local_model(username, role)
+            if not resolved_cfg:
+                raise RuntimeError("No local model configured — add one at /settings/models")
        primary = model
    else:
-        primary = settings.primary_backend
+        # Role-based routing via model registry
+        resolved = _reg.get_model_for_role(username, role)
+        if resolved:
+            resolved_cfg = resolved
+            primary = _TYPE_TO_BACKEND.get(resolved["type"], "claude")
+        else:
+            primary = settings.primary_backend

-    fallback = "gemini" if primary == "claude" else "claude"
+    fallback = _FALLBACK.get(primary, "claude")

    try:
-        response = await _dispatch(primary, system_prompt, messages, model)
+        response = await _dispatch(primary, system_prompt, messages, resolved_cfg)
        return response, primary
    except Exception as e:
        err_str = str(e)
@@ -61,11 +98,13 @@ async def _dispatch(
    backend: str,
    system_prompt: str,
    messages: list[dict],
-    model: str | None,
+    model_cfg: dict | None,
 ) -> str:
    if backend == "gemini":
        return await _gemini(system_prompt, messages)
-    return await _claude(system_prompt, messages, model)
+    if backend == "local":
+        return await _local(system_prompt, messages, model_cfg)
+    return await _claude(system_prompt, messages, model_cfg)


 def _fresh_claude_token() -> str | None:
@@ -85,14 +124,16 @@ def _fresh_claude_token() -> str | None:
        return None


-async def _claude(system_prompt: str, messages: list[dict], model: str | None) -> str:
+async def _claude(system_prompt: str, messages: list[dict], model_cfg: dict | None) -> str:
+    model_name = model_cfg.get("model_name") if model_cfg else None
    cmd = [
        "claude", "--print",
        "--no-session-persistence",
        "--output-format", "text",
    ]
-    if model and model not in ("claude", "gemini"):
-        cmd.extend(["--model", model])
+    # Only pass --model if it's a real model name (not a backend type string)
+    if model_name and model_name not in ("claude", "gemini", "local", ""):
+        cmd.extend(["--model", model_name])
    if system_prompt:
        cmd.extend(["--system-prompt", system_prompt])
    cmd.append(_build_conversation(messages))
@@ -108,6 +149,60 @@ async def _claude(system_prompt: str, messages: list[dict], model: str | None) -
    return await _run(cmd, timeout=settings.timeout_claude, env=env)


+async def _local(system_prompt: str, messages: list[dict], model_cfg: dict | None = None) -> str:
+    """OpenAI-compatible backend — Open WebUI / Ollama.
+
+    model_cfg is pre-resolved by complete() via model_registry.
+    Falls back to registry lookup if not provided.
+    """
+    import httpx
+
+    cfg = model_cfg
+    if not cfg:
+        # Fallback: resolve directly from registry
+        import model_registry as _reg
+        from persona import _user
+        cfg = _reg.get_best_local_model(_user.get())
+    if not cfg:
+        raise RuntimeError("No local model configured — add one at /settings/models")
+
+    api_url = cfg["api_url"]
+    api_key = cfg["api_key"]
+    model   = cfg["model_name"]
+
+    if not api_url:
+        raise RuntimeError("local_api_url not configured — set LOCAL_API_URL in .env or add a host at /settings/local")
+    if not model:
+        raise RuntimeError("local_model not configured — add a model at /settings/local")
+
+    host_type = cfg.get("host_type", "openwebui")
+    # "openwebui" uses Open WebUI/Ollama path layout; "openai" uses standard OpenAI layout
+    chat_path = "/chat/completions" if host_type == "openai" else "/api/chat/completions"
+    logger.info("local backend (%s): %s @ %s", host_type, model, api_url)
+
+    msgs: list[dict] = []
+    if system_prompt:
+        msgs.append({"role": "system", "content": system_prompt})
+    msgs.extend(messages)
+
+    url = api_url.rstrip("/") + chat_path
+    headers: dict[str, str] = {}
+    if api_key:
+        headers["Authorization"] = f"Bearer {api_key}"
+
+    payload = {"model": model, "messages": msgs}
+
+    async with httpx.AsyncClient(timeout=settings.timeout_local) as client:
+        resp = await client.post(url, json=payload, headers=headers)
+        resp.raise_for_status()
+        data = resp.json()
+
+    text = data["choices"][0]["message"]["content"]
+    if not text or not text.strip():
+        raise RuntimeError("Local model returned an empty response")
+    return text.strip()
+
+
 async def _gemini(system_prompt: str, messages: list[dict]) -> str:
    # Gemini CLI spawns MCP child processes that keep stdout pipes open after responding.
    # start_new_session=True puts the whole tree in its own process group so
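The request that `_local()` sends can be built and checked without a running server. This sketch factors the URL, header and payload construction into a hypothetical `build_chat_request()` helper (not part of the codebase) that follows the same `host_type` rule: `"openai"` hosts get `{api_url}/chat/completions`, everything else gets `{api_url}/api/chat/completions`. The equally hypothetical `extract_text()` mirrors the response handling. The model name and key in the example are placeholders.

```python
def build_chat_request(
    api_url: str,
    api_key: str,
    model: str,
    system_prompt: str,
    messages: list[dict],
    host_type: str = "openwebui",
) -> tuple[str, dict[str, str], dict]:
    """Return (url, headers, json_payload) for an OpenAI-style chat completion call."""
    # Open WebUI / Ollama expose /api/chat/completions; OpenAI-style hosts expose /chat/completions
    chat_path = "/chat/completions" if host_type == "openai" else "/api/chat/completions"
    url = api_url.rstrip("/") + chat_path
    # Only send Authorization when a key is configured
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    # The system prompt travels as the first message, ahead of the conversation
    msgs = [{"role": "system", "content": system_prompt}] if system_prompt else []
    msgs.extend(messages)
    return url, headers, {"model": model, "messages": msgs}


def extract_text(data: dict) -> str:
    """Pull the assistant text out of a chat-completions response body."""
    text = data["choices"][0]["message"]["content"]
    # content can be None or whitespace; treat both as a failed call
    if not text or not text.strip():
        raise RuntimeError("Local model returned an empty response")
    return text.strip()


url, headers, payload = build_chat_request(
    "https://openrouter.ai/api/v1/", "sk-test", "some/model",
    "You are Inara.", [{"role": "user", "content": "hi"}], host_type="openai",
)
print(url)  # https://openrouter.ai/api/v1/chat/completions
```

With helpers like these, `_local()` would reduce to one `client.post(url, json=payload, headers=headers)` call followed by `extract_text(resp.json())`, and the path logic could be unit-tested on its own.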
--- a/cortex/main.py
+++ b/cortex/main.py
@@ -9,7 +9,7 @@ logging.basicConfig(level=logging.INFO, format="%(levelname)s:%(name)s: %(messag
 from config import settings
 from auth_middleware import SessionAuthMiddleware
 from routers import chat, google_chat, nextcloud_talk, files, distill, auth, orchestrator
-from routers import ui, onboarding, settings, help
+from routers import ui, onboarding, settings, help, auth_google, local_llm


@asynccontextmanager
@@ -39,11 +39,15 @@ app.include_router(orchestrator.router)
 # ui.router has a wildcard /{username}/{persona} that would otherwise catch /static/style.css etc.
 app.mount("/static", StaticFiles(directory="static"), name="static")

+# Google OAuth — must be before ui.router (wildcard /{user}/{persona} would swallow it)
+app.include_router(auth_google.router)
+
 # Onboarding (invite tokens + persona creation — before ui.router)
 app.include_router(onboarding.router)

 # Account settings
 app.include_router(settings.router)
+app.include_router(local_llm.router)

 # Help page
 app.include_router(help.router)
--- a/cortex/manage_passwords.py
+++ b/cortex/manage_passwords.py
@@ -6,9 +6,10 @@ Usage:
  python manage_passwords.py set <username>                 # prompt for password
  python manage_passwords.py set <username> <pass>          # set directly (avoid in shell history)
  python manage_passwords.py check <username>               # test a password interactively
-  python manage_passwords.py list                           # show users, passwords, and emails
+  python manage_passwords.py list                           # show users, auth methods, and emails
  python manage_passwords.py invite <username> [email]      # generate + optionally email invite link
  python manage_passwords.py email <username> <email>       # store/update an email address
+  python manage_passwords.py google-add <username> <email>  # register a user for Google sign-in
 """

 import json
@@ -18,7 +19,7 @@ import getpass
 # Add cortex/ to path so we can import config and auth_utils
 sys.path.insert(0, str(__import__('pathlib').Path(__file__).parent))

-from auth_utils import set_password, check_credentials, _auth_path, create_invite
+from auth_utils import set_password, check_credentials, _auth_path, create_invite, link_google, _read_auth
 from persona import list_users
 from config import settings

@@ -96,10 +97,14 @@ def cmd_list(_args):
    if not users:
        print("  No users found in home/")
        return
+    print(f"  {'USER':<18} {'PW':<6} {'GOOGLE':<36} {'EMAIL'}")
+    print(f"  {'-'*18} {'-'*6} {'-'*36} {'-'*30}")
    for user in users:
-        has_pw    = "✓ pw" if _auth_path(user).exists() else "✗ pw"
-        email     = get_email(user) or "—"
-        print(f"  {user:<20} {has_pw}   {email}")
+        auth   = _read_auth(user)
+        has_pw = "✓" if auth.get("password_hash") else "—"
+        google = auth.get("google_email") or "—"
+        email  = get_email(user) or "—"
+        print(f"  {user:<18} {has_pw:<6} {google:<36} {email}")


 def cmd_email(args):
@@ -149,6 +154,22 @@ def cmd_invite(args):
        print("Tip: python manage_passwords.py invite <username> <email>  to email it next time.\n")


+def cmd_google_add(args):
+    if len(args) < 2:
+        print("Usage: manage_passwords.py google-add <username> <google_email>")
+        sys.exit(1)
+    username, email = args[0], args[1].lower().strip()
+
+    # Ensure the user directory exists
+    (settings.home_root() / username).mkdir(parents=True, exist_ok=True)
+
+    # Store in auth.json (google_sub filled in on first sign-in) + profile.json (for invites)
+    link_google(username, sub="", email=email)
+    set_email(username, email)
+    print(f"Google sign-in registered for {username!r}: {email}")
+    print(f"They can now sign in at {settings.cortex_base_url}/login using that Google account.")
+
+
 if __name__ == "__main__":
    if len(sys.argv) < 2:
        print(__doc__)
@@ -167,6 +188,8 @@ if __name__ == "__main__":
        cmd_email(rest)
    elif command == "invite":
        cmd_invite(rest)
+    elif command == "google-add":
+        cmd_google_add(rest)
    else:
        print(f"Unknown command: {command}")
        print(__doc__)
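`cmd_list` hard-codes each column width twice, once in the header and once in the row format, which is how the two can drift apart. A minimal sketch of the alternative derives the widths from the data once and reuses them for every line. `format_table` is a hypothetical helper, not part of `manage_passwords.py`.

```python
def format_table(headers: list[str], rows: list[list[str]], indent: str = "  ") -> list[str]:
    """Return aligned text lines: header, dashed rule, then one line per row."""
    # Each column is as wide as its longest cell, header included
    widths = [max([len(h)] + [len(row[i]) for row in rows]) for i, h in enumerate(headers)]

    def fmt(cells: list[str]) -> str:
        # Pad every cell to its column width; strip trailing padding after the last column
        return indent + " ".join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()

    lines = [fmt(headers), fmt(["-" * w for w in widths])]
    lines.extend(fmt(row) for row in rows)
    return lines


for line in format_table(
    ["USER", "PW", "GOOGLE", "EMAIL"],
    [["scott", "✓", "scott@gmail.com", "scott@example.com"], ["alice", "", "alice@gmail.com", ""]],
):
    print(line)
```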
--- a/cortex/memory_distiller.py
+++ b/cortex/memory_distiller.py
@@ -77,10 +77,16 @@ def distill_short(username: str | None = None, persona: str | None = None) -> di
 async def distill_mid(username: str | None = None, persona: str | None = None) -> dict:
    """
    Ask the LLM to summarize MEMORY_SHORT.md → MEMORY_MID.md.
+    Backend is resolved via the registry "distill" role (falls back to ROLE_DISTILL).
    """
    from llm_client import complete
+    from persona import set_context

-    inara_dir = _persona_path(username, persona)
+    u = username or settings.user_name.lower()
+    p = persona or settings.agent_name.lower()
+    set_context(u, p)
+
+    inara_dir = _persona_path(u, p)
    short_content = _read(inara_dir / "MEMORY_SHORT.md")

    if not short_content.strip() or "Not yet populated" in short_content:
@@ -100,6 +106,7 @@ async def distill_mid(username: str | None = None, persona: str | None = None) -
    response_text, backend = await complete(
        system_prompt=system_prompt,
        messages=[{"role": "user", "content": short_content}],
+        role="distill",
    )

    now = datetime.now().strftime("%Y-%m-%d %H:%M")
@@ -112,6 +119,7 @@ async def distill_mid(username: str | None = None, persona: str | None = None) -
    logger.info("distill_mid: wrote %d chars via %s", len(header) + len(response_text), backend)

    return {
+        "username": u,
        "backend": backend,
        "chars_written": len(header) + len(response_text),
        "budget_tokens": budget_tokens,
@@ -121,10 +129,16 @@ async def distill_mid(username: str | None = None, persona: str | None = None) -
 async def distill_long(username: str | None = None, persona: str | None = None) -> dict:
    """
    Ask the LLM to integrate MEMORY_MID.md into MEMORY_LONG.md.
+    Backend is resolved via the registry "distill" role (falls back to ROLE_DISTILL).
    """
    from llm_client import complete
+    from persona import set_context

-    inara_dir = _persona_path(username, persona)
+    u = username or settings.user_name.lower()
+    p = persona or settings.agent_name.lower()
+    set_context(u, p)
+
+    inara_dir = _persona_path(u, p)
    long_content = _read(inara_dir / "MEMORY_LONG.md")
    mid_content = _read(inara_dir / "MEMORY_MID.md")

@@ -149,6 +163,7 @@ async def distill_long(username: str | None = None, persona: str | None = None)
    response_text, backend = await complete(
        system_prompt=system_prompt,
        messages=[{"role": "user", "content": user_content}],
+        role="distill",
    )

    # Ensure the file has the right header if the LLM dropped it
@@ -165,6 +180,7 @@ async def distill_long(username: str | None = None, persona: str | None = None)
    logger.info("distill_long: wrote %d chars via %s", len(response_text), backend)

    return {
+        "username": u,
        "backend": backend,
        "chars_written": len(response_text),
        "budget_tokens": budget_tokens,
--- a/cortex/model_registry.py
+++ b/cortex/model_registry.py
@@ -0,0 +1,460 @@
+"""
+Per-user unified model registry.
+
+Stored in: home/{user}/model_registry.json
+
+Schema:
+  {
+    "version": 1,
+    "hosts": [{"id", "label", "api_url", "api_key",
+               "host_type": "openwebui" | "openai"}, ...],
+    #
+    # host_type controls the API path layout:
+    #   "openwebui"  (default) — Open WebUI / Ollama:
+    #                   chat:   POST {url}/api/chat/completions
+    #                   models: GET  {url}/api/models
+    #   "openai"     — OpenRouter, LiteLLM, Anthropic-compatible, etc.:
+    #                   chat:   POST {url}/chat/completions
+    #                   models: GET  {url}/models
+    #   Set api_url to the base path that ends just before /chat/completions,
+    #   e.g. https://openrouter.ai/api/v1  for OpenRouter.
+    "models": [
+      {
+        "id":         str,             # unique within this registry
+        "type":       str,             # "local_openai" | "claude_cli" | "gemini_cli" | "gemini_api"
+        "label":      str,             # human-readable display name
+        "model_name": str,             # model identifier sent to the API
+        "host_id":    str | null,      # only for local_openai — references hosts[].id
+        "context_k":  int,             # context window in thousands of tokens (informational)
+        "tags":       [str],           # user-defined capability tags
+      },
+    ],
+    "roles": {
+      "<role>": {
+        "primary":  "<model_id>" | null,
+        "backup_1": "<model_id>" | null,
+        "backup_2": "<model_id>" | null,
+        "backup_3": "<model_id>" | null,
+        "backup_4": "<model_id>" | null,
+      },
+    },
+  }
+
+Built-in model IDs (always resolvable, no registry entry required):
+  "claude_cli"  — Claude CLI subprocess (~/.claude/.credentials.json)
+  "gemini_cli"  — Gemini CLI subprocess
+  "gemini_api"  — Gemini API (google-genai SDK; used by orchestrator engine, not llm_client)
+
+Standard roles are defined by settings.defined_roles (default: chat,orchestrator,distill,coder,research).
+Additional custom roles can be added freely to roles{}.
+
+Resolution for get_model_for_role(username, role):
+  1. User registry: roles[role].primary → backup_1 → backup_2 → backup_3 → backup_4
+  2. .env default: ROLE_<ROLE>=<builtin_id>  (e.g. ROLE_CHAT=claude_cli)
+  3. Hardcoded last-resort defaults per role
+"""
+
+import json
+import logging
+import secrets
+from pathlib import Path
+
+from config import settings
+
+logger = logging.getLogger(__name__)
+
+# ── Built-in model definitions ────────────────────────────────────────────────
+# These IDs are always resolvable without a registry entry.
+
+def _builtins() -> dict[str, dict]:
+    """Return built-in model definitions (lazy so settings are resolved at call time)."""
+    return {
+        "claude_cli": {
+            "id":         "claude_cli",
+            "type":       "claude_cli",
+            "label":      f"Claude (CLI) — {settings.default_model}",
+            "model_name": settings.default_model,
+            "context_k":  200,
+            "tags":       ["chat", "persona", "creative"],
+        },
+        "gemini_cli": {
+            "id":         "gemini_cli",
+            "type":       "gemini_cli",
+            "label":      "Gemini (CLI)",
+            "model_name": "",
+            "context_k":  1000,
+            "tags":       ["chat", "research", "long_context"],
+        },
+        "gemini_api": {
+            "id":         "gemini_api",
+            "type":       "gemini_api",
+            "label":      f"Gemini API — {settings.orchestrator_model}",
+            "model_name": settings.orchestrator_model,
+            "context_k":  1000,
+            "tags":       ["orchestrator", "research", "long_context", "tools"],
+        },
+    }
+
+
+# Hardcoded last-resort defaults per role (used only if .env is also unset)
+_ROLE_LAST_RESORT: dict[str, str] = {
+    "chat":         "claude_cli",
+    "orchestrator": "gemini_api",
+    "distill":      "claude_cli",
+    "coder":        "claude_cli",
+    "research":     "gemini_api",
+}
+
+PRIORITY_KEYS = ["primary", "backup_1", "backup_2", "backup_3", "backup_4"]
+
+
+# ── Storage ───────────────────────────────────────────────────────────────────
+
+def _registry_path(username: str) -> Path:
+    return settings.home_root() / username / "model_registry.json"
+
+
+def _local_llm_path(username: str) -> Path:
+    return settings.home_root() / username / "local_llm.json"
+
+
+def _empty() -> dict:
+    return {"version": 1, "hosts": [], "models": [], "roles": {}}
+
+
+def _load(username: str) -> dict:
+    path = _registry_path(username)
+    if path.exists():
+        try:
+            data = json.loads(path.read_text())
+            if isinstance(data, dict) and "version" in data:
+                return data
+        except (json.JSONDecodeError, OSError):
+            logger.warning("model_registry.json for %s is unreadable — starting fresh", username)
+        return _empty()
+
+    # No registry yet — try migrating from local_llm.json
+    legacy = _local_llm_path(username)
+    if legacy.exists():
+        data = _migrate_from_local_llm(username, legacy)
+        _save(username, data)
+        logger.info("Migrated local_llm.json → model_registry.json for %s", username)
+        return data
+
+    return _empty()
+
+
+def _save(username: str, data: dict) -> None:
+    _registry_path(username).write_text(json.dumps(data, indent=2))
+
+
+# ── Migration ─────────────────────────────────────────────────────────────────
+
+def _migrate_from_local_llm(username: str, path: Path) -> dict:
+    """Convert local_llm.json (hosts/models/active_model_id) → model_registry format."""
+    try:
+        old = json.loads(path.read_text())
+    except Exception:
+        return _empty()
+
+    data = _empty()
+
+    # Handle v0 flat format
+    if "hosts" not in old:
+        api_url    = old.get("api_url") or settings.local_api_url
+        api_key    = old.get("api_key") or settings.local_api_key
+        model_name = old.get("model")   or settings.local_model
+        if not api_url:
+            return data
+        host_id = secrets.token_hex(4)
+        old = {
+            "hosts": [{"id": host_id, "label": "Local Model Server", "api_url": api_url, "api_key": api_key}],
+            "models": [{"id": secrets.token_hex(4), "host_id": host_id, "label": model_name, "model_name": model_name}] if model_name else [],
+            "active_model_id": None,
+        }
+        if old["models"]:
+            old["active_model_id"] = old["models"][0]["id"]
+
+    data["hosts"] = old.get("hosts", [])
+
+    for m in old.get("models", []):
+        data["models"].append({
+            "id":         m["id"],
+            "type":       "local_openai",
+            "label":      m.get("label") or m.get("model_name", ""),
+            "model_name": m.get("model_name", ""),
+            "host_id":    m.get("host_id"),
+            "context_k":  0,
+            "tags":       [],
+        })
+
+    # Build initial role assignments
+    active_id = old.get("active_model_id")
+    distill_type = settings.distill_backend_mid or None
+
+    roles: dict[str, dict] = {}
+    if active_id and any(m["id"] == active_id for m in data["models"]):
+        roles["chat"] = {"primary": active_id}
+
+    if distill_type == "local" and active_id:
+        roles["distill"] = {"primary": active_id}
+
+    data["roles"] = roles
+    return data
+
+
+# ── Model resolution ──────────────────────────────────────────────────────────
+
+def _resolve_model(registry: dict, model_id: str) -> dict | None:
+    """Resolve a model_id to its full config dict, or None if not found."""
+    builtins = _builtins()
+
+    # Built-in IDs take priority over user-defined entries with the same ID
+    if model_id in builtins:
+        return dict(builtins[model_id])
+
+    model = next((m for m in registry.get("models", []) if m["id"] == model_id), None)
+    if not model:
+        return None
+
+    if model.get("type") == "local_openai":
+        host_id = model.get("host_id")
+        host = next((h for h in registry.get("hosts", []) if h["id"] == host_id), None)
+        if not host:
+            logger.warning("model %s references missing host_id %s", model_id, host_id)
+            return None
+        return {
+            **model,
+            "api_url":   host.get("api_url", ""),
+            "api_key":   host.get("api_key", ""),
+            "host_type": host.get("host_type", "openwebui"),
+        }
+
+    return dict(model)
+
+
+def get_model_for_role(username: str, role: str) -> dict | None:
+    """
+    Return the resolved model config for the given role.
+
+    Resolution order:
+      1. User registry: roles[role].primary → backup_1 → ... → backup_4
+      2. .env: ROLE_<ROLE> = builtin model ID
+      3. Hardcoded last-resort default per role
+      4. claude_cli (absolute fallback)
+    """
+    registry = _load(username)
+    role_cfg = registry.get("roles", {}).get(role, {})
+
+    for key in PRIORITY_KEYS:
+        model_id = role_cfg.get(key)
+        if not model_id:
+            continue
+        resolved = _resolve_model(registry, model_id)
+        if resolved:
+            return resolved
+        logger.debug("role %s.%s = %s but model not found", role, key, model_id)
+
+    # .env default
+    env_type = settings.get_role_default(role)
+    builtins = _builtins()
+    if env_type and env_type in builtins:
+        return dict(builtins[env_type])
+
+    # Hardcoded last resort
+    fallback_id = _ROLE_LAST_RESORT.get(role, "claude_cli")
+    return dict(builtins.get(fallback_id, builtins["claude_cli"]))
+
+
+def get_best_local_model(username: str, role: str = "chat") -> dict | None:
+    """
+    Return the best available local_openai model for the given role.
+    Used when the user explicitly selects "local" backend in the UI.
+    Tries the role's priority chain first, then any configured local model.
+    """
+    registry = _load(username)
+    role_cfg = registry.get("roles", {}).get(role, {})
+
+    for key in PRIORITY_KEYS:
+        model_id = role_cfg.get(key)
+        if not model_id:
+            continue
+        resolved = _resolve_model(registry, model_id)
+        if resolved and resolved.get("type") == "local_openai":
+            return resolved
+
+    # Fall back to first configured local model
+    for model in registry.get("models", []):
+        if model.get("type") == "local_openai":
+            resolved = _resolve_model(registry, model["id"])
+            if resolved:
+                return resolved
+
+    return None
+
+
+# ── Read API (for UI and callers) ─────────────────────────────────────────────
+
+def get_registry(username: str) -> dict:
+    """Return the stored registry as-is (built-in models are not included)."""
+    return _load(username)
+
+
+def get_all_models(username: str) -> list[dict]:
+    """Return all user-defined models (resolved — hosts merged in)."""
+    registry = _load(username)
+    out = []
+    for m in registry.get("models", []):
+        resolved = _resolve_model(registry, m["id"])
+        if resolved:
+            out.append(resolved)
+    return out
+
+
+def get_defined_roles(username: str) -> dict[str, dict]:
+    """Return assignments for each role in settings.defined_roles ({} if unassigned)."""
+    registry = _load(username)
+    roles = registry.get("roles", {})
+    result = {}
+    for role in settings.get_defined_roles():
+        result[role] = roles.get(role, {})
+    return result
+
+
+# ── Write API (CRUD) ──────────────────────────────────────────────────────────
+
+def save_host(username: str, host_id: str | None,
+              label: str, api_url: str, api_key: str,
+              host_type: str = "openwebui") -> str:
+    """Create or update a host. Returns the host ID.
+
+    host_type: "openwebui" (default) or "openai" (OpenRouter, LiteLLM, etc.)
+    """
+    data = _load(username)
+    host_type = host_type if host_type in ("openwebui", "openai") else "openwebui"
+
+    if host_id:
+        for h in data["hosts"]:
+            if h["id"] == host_id:
+                h["label"]     = label.strip()
+                h["api_url"]   = api_url.strip()
+                h["host_type"] = host_type
+                if api_key.strip():
+                    h["api_key"] = api_key.strip()
+                _save(username, data)
+                return host_id
+        host_id = None  # not found — create new
+
+    host_id = secrets.token_hex(4)
+    data["hosts"].append({
+        "id":        host_id,
+        "label":     label.strip(),
+        "api_url":   api_url.strip(),
+        "api_key":   api_key.strip(),
+        "host_type": host_type,
+    })
+    _save(username, data)
+    return host_id
+
+
+def remove_host(username: str, host_id: str) -> bool:
+    """Remove a host and all models that reference it. Returns True if found."""
+    data = _load(username)
+    before = len(data["hosts"])
+    # Collect IDs of this host's models *before* filtering, so role slots can be cleared below
+    removed_ids = {m["id"] for m in data["models"] if m.get("host_id") == host_id}
+    data["hosts"] = [h for h in data["hosts"] if h["id"] != host_id]
+    data["models"] = [m for m in data["models"] if m.get("host_id") != host_id]
+    for role_cfg in data.get("roles", {}).values():
+        for key in PRIORITY_KEYS:
+            if role_cfg.get(key) in removed_ids:
+                role_cfg[key] = None
+    _save(username, data)
+    return len(data["hosts"]) < before
+
+
+def save_model(username: str, model_id: str | None, host_id: str,
+               label: str, model_name: str, context_k: int = 0,
+               tags: list[str] | None = None) -> str:
+    """Create or update a model entry. Returns the model ID."""
+    data = _load(username)
+    tags = tags or []
+
+    if model_id:
+        for m in data["models"]:
+            if m["id"] == model_id:
+                m["host_id"]    = host_id
+                m["label"]      = label.strip() or model_name.strip()
+                m["model_name"] = model_name.strip()
+                m["context_k"]  = context_k
+                m["tags"]       = tags
+                _save(username, data)
+                return model_id
+        model_id = None
+
+    model_id = secrets.token_hex(4)
+    data["models"].append({
+        "id":         model_id,
+        "type":       "local_openai",
+        "label":      label.strip() or model_name.strip(),
+        "model_name": model_name.strip(),
+        "host_id":    host_id,
+        "context_k":  context_k,
+        "tags":       tags,
+    })
+    _save(username, data)
+    return model_id
+
+
+def remove_model(username: str, model_id: str) -> bool:
+    """Remove a model and clear any role assignments pointing to it."""
+    data = _load(username)
+    before = len(data["models"])
+    data["models"] = [m for m in data["models"] if m["id"] != model_id]
+
+    for role_cfg in data.get("roles", {}).values():
+        for key in PRIORITY_KEYS:
+            if role_cfg.get(key) == model_id:
+                role_cfg[key] = None
+
+    _save(username, data)
+    return len(data["models"]) < before
+
+
+def set_role(username: str, role: str, priority: str, model_id: str | None) -> bool:
+    """
+    Assign a model to a role priority slot.
+
+    priority must be one of: primary, backup_1, backup_2, backup_3, backup_4
+    model_id None clears the slot.
+    model_id "claude_cli" / "gemini_cli" / "gemini_api" are valid built-in IDs.
+    Returns False if model_id is set but not found.
+    """
+    if priority not in PRIORITY_KEYS:
+        return False
+
+    data = _load(username)
+
+    if model_id and model_id not in _builtins():
+        if not any(m["id"] == model_id for m in data["models"]):
+            return False
+
+    roles = data.setdefault("roles", {})
+    if role not in roles:
+        roles[role] = {}
+    roles[role][priority] = model_id or None
+
+    _save(username, data)
+    return True
+
+
+def fetch_models_from_host(api_url: str, api_key: str) -> list[str]:
+    """Synchronously list model IDs via GET {api_url}/api/models (Open WebUI route; OpenAI-style "data" list)."""
+    import httpx
+    url = api_url.rstrip("/") + "/api/models"
+    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
+    resp = httpx.get(url, headers=headers, timeout=10)
+    resp.raise_for_status()
+    data = resp.json()
+    models = data.get("data", [])
+    return sorted(m.get("id", m.get("name", "")) for m in models if m.get("id") or m.get("name"))
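Each role has up to five priority slots (`primary`, then `backup_1` through `backup_4`), and `remove_host` / `remove_model` reset any slot that pointed at a deleted model to `None`. The sketch below shows how a caller might turn a role into an ordered candidate list. It assumes the slots are tried in that order and that `None` means an empty slot. `resolve_role` and the sample registry are hypothetical and are not part of `model_registry`.

```python
# Hypothetical sketch: mirrors the registry layout used above
# ({"models": [...], "roles": {role: {slot: model_id | None}}}).
PRIORITY_KEYS = ("primary", "backup_1", "backup_2", "backup_3", "backup_4")


def resolve_role(registry: dict, role: str, builtins: set[str]) -> list[str]:
    """Return the model IDs assigned to `role`, in priority order.

    Empty slots (None) and IDs that no longer exist are skipped, so the
    first element (if any) is the model a caller would try first.
    Assumption: slots are tried primary first, then backup_1..backup_4.
    """
    known = {m["id"] for m in registry.get("models", [])} | builtins
    slots = registry.get("roles", {}).get(role, {})
    return [
        slots[key]
        for key in PRIORITY_KEYS
        if slots.get(key) and slots[key] in known
    ]


registry = {
    "models": [{"id": "a1b2c3d4", "model_name": "llama3"}],
    "roles": {"chat": {"primary": None, "backup_1": "a1b2c3d4", "backup_2": "claude_cli"}},
}
print(resolve_role(registry, "chat", {"claude_cli", "gemini_cli", "gemini_api"}))
# → ['a1b2c3d4', 'claude_cli']
```

Built-in IDs such as `claude_cli` are passed in explicitly, because `set_role` accepts them without a matching entry in `models`.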
--- a/cortex/notification.py
+++ b/cortex/notification.py
@@ -0,0 +1,106 @@
+"""
+Outbound notification helpers — send messages to user channels proactively.
+
+Channel config lives in home/{user}/channels.json.
+The top-level notification_channel key names the channel used for proactive
+notifications (e.g. "nextcloud"; other channels are not yet supported outbound):
+  {
+    "notification_channel": "nextcloud",
+    "nextcloud": {
+      "url": "https://cloud.example.com",
+      "bot_secret": "...",
+      "notification_room": "<room-token>",
+      ...
+    }
+  }
+
+If notification_channel is absent, it defaults to "nextcloud" when that section exists.
+If the nextcloud section has no notification_room, notifications are skipped (DEBUG log only).
+"""
+import hashlib
+import hmac
+import json
+import logging
+import secrets
+
+import httpx
+
+logger = logging.getLogger(__name__)
+
+
+async def _send_nct_message(url: str, secret: str, room: str, message: str) -> None:
+    """Post a message to a Nextcloud Talk room as the bot."""
+    endpoint = f"{url}/ocs/v2.php/apps/spreed/api/v1/bot/{room}/message"
+    random_str = secrets.token_hex(32)
+    sig = hmac.new(
+        secret.encode(),
+        (random_str + message).encode("utf-8"),
+        hashlib.sha256,
+    ).hexdigest()
+    body = json.dumps({"message": message}, ensure_ascii=False).encode("utf-8")
+
+    try:
+        async with httpx.AsyncClient() as client:
+            resp = await client.post(
+                endpoint,
+                content=body,
+                headers={
+                    "Content-Type": "application/json",
+                    "OCS-APIRequest": "true",
+                    "X-Nextcloud-Talk-Bot-Random": random_str,
+                    "X-Nextcloud-Talk-Bot-Signature": sig,
+                },
+                timeout=15,
+            )
+        if resp.status_code not in (200, 201):
+            logger.warning("notify NCT %s → HTTP %d: %s", room, resp.status_code, resp.text[:200])
+        else:
+            logger.info("notify NCT → %s (%d chars)", room, len(message))
+    except Exception as e:
+        logger.error("notify NCT error: %s", e)
+
+
+async def _notify_nct(nct: dict, message: str, username: str) -> None:
+    room   = nct.get("notification_room", "").strip()
+    url    = nct.get("url", "").rstrip("/")
+    secret = nct.get("bot_secret", "")
+    if not room:
+        logger.debug("notify: NCT notification_room not set for %s — skipping", username)
+        return
+    if not url or not secret:
+        logger.warning("notify: NCT config incomplete for %s (missing url or secret)", username)
+        return
+    await _send_nct_message(url, secret, room, message)
+
+
+async def notify(username: str, message: str, channel: str | None = None) -> None:
+    """Send a notification to the user's preferred outbound channel.
+
+    Channel resolution order:
+      1. `channel` parameter if provided
+      2. `notification_channel` key in channels.json
+      3. "nextcloud" if configured
+      4. Silent no-op
+
+    To configure: set `notification_channel` in home/{user}/channels.json.
+    For NCT: also set `notification_room` in the nextcloud section.
+    """
+    from auth_utils import get_user_channels
+    channels = get_user_channels(username)
+
+    target = channel or (channels.get("notification_channel") or "").strip()
+    if not target:
+        # Auto-detect: use nextcloud if configured
+        if "nextcloud" in channels:
+            target = "nextcloud"
+        else:
+            return
+
+    if target == "nextcloud":
+        nct = channels.get("nextcloud")
+        if not nct:
+            logger.debug("notify: nextcloud not configured for %s", username)
+            return
+        await _notify_nct(nct, message, username)
+    else:
+        logger.debug("notify: channel %r not yet supported for outbound (user %s)", target, username)
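`_send_nct_message` signs each post with HMAC-SHA256, keyed by the bot secret. The signed input is the random header value concatenated with the message text, not the JSON body. The standalone sketch below reproduces that signing step and adds the matching check a receiver could perform. `sign_bot_message` and `verify_bot_message` are illustrative names and are not functions in this module.

```python
import hashlib
import hmac
import secrets


def sign_bot_message(secret: str, message: str) -> tuple[str, str]:
    """Return (random, signature) header values, mirroring _send_nct_message."""
    random_str = secrets.token_hex(32)  # 64 hex chars -> X-Nextcloud-Talk-Bot-Random
    sig = hmac.new(
        secret.encode(),
        (random_str + message).encode("utf-8"),  # random + message text, not the JSON body
        hashlib.sha256,
    ).hexdigest()                        # -> X-Nextcloud-Talk-Bot-Signature
    return random_str, sig


def verify_bot_message(secret: str, random_str: str, message: str, signature: str) -> bool:
    """Recompute the HMAC over the same input and compare in constant time."""
    expected = hmac.new(
        secret.encode(),
        (random_str + message).encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()
    return hmac.compare_digest(expected, signature)


rnd, sig = sign_bot_message("shared-bot-secret", "Reminder: stand-up in 10 min")
print(verify_bot_message("shared-bot-secret", rnd, "Reminder: stand-up in 10 min", sig))  # True
print(verify_bot_message("wrong-secret", rnd, "Reminder: stand-up in 10 min", sig))       # False
```

The message is UTF-8 encoded for the HMAC, and the body is also sent as UTF-8 with `ensure_ascii=False`. Keeping both consistent means non-ASCII text signs the same bytes on both ends.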
--- a/cortex/orchestrator_engine.py
+++ b/cortex/orchestrator_engine.py
@@ -56,6 +56,7 @@ async def run(
    system_prompt: str = "",
    session_messages: list[dict] | None = None,
    respond_with_claude: bool = True,
+    gemini_api_key: str | None = None,
 ) -> OrchestratorResult:
    """
    Run the full orchestration loop for a task.
@@ -66,17 +67,19 @@ async def run(
        session_messages:   Prior conversation history for session continuity
        respond_with_claude: If False, return Gemini's summary as the response (useful for
                             background/cron tasks where a polished reply isn't needed)
+        gemini_api_key:     Per-user Gemini API key (falls back to GEMINI_API_KEY in .env)

    Returns:
        OrchestratorResult with response, tool call log, backend used, and Gemini summary
    """
-    if not settings.gemini_api_key:
+    api_key = gemini_api_key or settings.gemini_api_key
+    if not api_key:
        raise RuntimeError(
-            "GEMINI_API_KEY not set — orchestrator requires Gemini API. "
-            "Get a free key at https://aistudio.google.com/apikey and add it to .env"
+            "No Gemini API key available — set GEMINI_API_KEY in .env or add a personal key "
+            "via: manage_passwords.py gemini-key <username> <key>"
        )

-    client = genai.Client(api_key=settings.gemini_api_key)
+    client = genai.Client(api_key=api_key)

    # Seed Gemini with the task — include recent session context if available
    task_with_context = _build_task_prompt(task, session_messages)
--- a/cortex/persona_template.py
+++ b/cortex/persona_template.py
@@ -135,6 +135,27 @@ def _protocols(display_name: str) -> str:

 ---

+## Tools & Modes
+
+Cortex has two chat modes. Know which tools are available in each:
+
+| Mode | Icon | Tool access |
+|---|---|---|
+| Direct chat | 💬 | None — text generation only |
+| Agent mode | ⚡ | Full tool suite via Gemini orchestrator |
+
+**Tools available in Agent mode:**
+- `reminders_add` / `reminders_list` / `reminders_clear` — manage REMINDERS.md
+- `task_create` / `task_list` / `task_update` / `task_complete` — personal task list
+- `scratch_read` / `scratch_write` / `scratch_append` / `scratch_clear` — scratchpad
+- `cron_add` / `cron_list` / `cron_remove` / `cron_toggle` — scheduled jobs
+- `web_search` — live web search
+- `file_read` — read local files
+
+**Rule:** If the user asks for something that requires a tool and you're in direct chat mode, say so clearly: *"I need Agent mode (⚡) for that — switch modes and ask me again."* Do not attempt workarounds or pretend the action was taken.
+
+---
+
 ## Memory

 - Long-term memory lives in MEMORY_LONG.md (auto-distilled monthly).
--- a/cortex/requirements.txt
+++ b/cortex/requirements.txt
@@ -16,5 +16,8 @@ bcrypt>=4.0.0
 PyJWT>=2.8.0
 python-multipart>=0.0.9   # required by FastAPI for Form() data

+# HTTP client (sync + async): local OpenAI-compatible backend (Open WebUI / Ollama), host checks, NCT notifications
+httpx>=0.27.0
+
 # anthropic SDK not needed — using claude CLI subprocess for auth
 # anthropic>=0.40.0
--- a/cortex/routers/auth.py
+++ b/cortex/routers/auth.py
@@ -13,6 +13,7 @@ import logging
 from datetime import datetime, timezone
 from pathlib import Path
 from fastapi import APIRouter
+from config import settings

 logger = logging.getLogger(__name__)
 router = APIRouter(prefix="/auth")
@@ -71,9 +72,39 @@ def _gemini_status() -> dict:
        return {"ok": False, "error": str(e), "warning": True, "authenticated": False}


+async def _local_status(username: str = "scott") -> dict:
+    """Check reachability of the user's configured local model host."""
+    import model_registry
+    cfg = model_registry.get_best_local_model(username)
+    if not cfg:
+        return {"configured": False}
+    api_url = cfg.get("api_url", "")
+    if not api_url:
+        return {"configured": False}
+    try:
+        import httpx
+        url = api_url.rstrip("/") + "/api/models"
+        headers = {}
+        api_key = cfg.get("api_key", "")
+        if api_key:
+            headers["Authorization"] = f"Bearer {api_key}"
+        async with httpx.AsyncClient(timeout=5) as client:
+            resp = await client.get(url, headers=headers)
+        reachable = resp.status_code < 400
+        return {
+            "configured": True,
+            "reachable": reachable,
+            "model": cfg.get("model_name", ""),
+            "label": cfg.get("label", ""),
+        }
+    except Exception as e:
+        return {"configured": True, "reachable": False, "error": str(e), "model": cfg.get("model_name", "")}
+
+
@router.get("/status")
 async def auth_status() -> dict:
    return {
        "claude": _claude_status(),
        "gemini": _gemini_status(),
+        "local": await _local_status(),
    }
--- a/cortex/routers/auth_google.py
+++ b/cortex/routers/auth_google.py
@@ -0,0 +1,205 @@
+"""
+Google OAuth 2.0 sign-in.
+
+Flow:
+  1. GET /auth/google          → redirect to Google's consent page
+  2. GET /auth/google/callback → exchange code, look up user, set JWT cookie
+
+Users must be pre-registered by Scott before they can sign in:
+  cd cortex && .venv/bin/python manage_passwords.py google-add <username> <email>
+
+Routes are public (added to _PUBLIC_PREFIXES in auth_middleware.py).
+"""
+
+import json
+import logging
+import secrets
+import urllib.parse
+import urllib.request
+
+from fastapi import APIRouter, Request
+from fastapi.responses import HTMLResponse, RedirectResponse, Response
+
+from auth_utils import COOKIE_NAME, create_token, find_user_by_google, link_google
+from config import settings
+from persona import list_user_personas
+
+logger = logging.getLogger(__name__)
+router = APIRouter()
+
+_GOOGLE_AUTH_URL  = "https://accounts.google.com/o/oauth2/v2/auth"
+_GOOGLE_TOKEN_URL = "https://oauth2.googleapis.com/token"
+_GOOGLE_USERINFO  = "https://openidconnect.googleapis.com/v1/userinfo"
+_STATE_COOKIE     = "oauth_state"
+_STATE_MAX_AGE    = 600  # 10 minutes — plenty of time to complete the flow
+
+
+@router.get("/auth/google", include_in_schema=False)
+async def google_login():
+    if not settings.google_client_id:
+        return HTMLResponse("Google sign-in is not configured on this server.", status_code=503)
+
+    state = secrets.token_urlsafe(16)
+    params = urllib.parse.urlencode({
+        "client_id":     settings.google_client_id,
+        "redirect_uri":  f"{settings.cortex_base_url}/auth/google/callback",
+        "response_type": "code",
+        "scope":         "openid email profile",
+        "state":         state,
+        "access_type":   "online",
+        "prompt":        "select_account",
+    })
+
+    resp = RedirectResponse(f"{_GOOGLE_AUTH_URL}?{params}", status_code=302)
+    resp.set_cookie(_STATE_COOKIE, state, max_age=_STATE_MAX_AGE, httponly=True, samesite="lax")
+    return resp
+
+
+@router.get("/auth/google/callback", include_in_schema=False)
+async def google_callback(
+    request: Request,
+    code: str = "",
+    state: str = "",
+    error: str = "",
+):
+    if error:
+        from html import escape  # `error` is raw query-string input; escape it before rendering
+        return _error_page(f"Google sign-in was cancelled or denied: {escape(error)}")
+    if not code:
+        return _error_page("No authorisation code returned by Google.")
+
+    # CSRF check — state must match what we stored in the cookie
+    stored_state = request.cookies.get(_STATE_COOKIE)
+    if not stored_state or stored_state != state:
+        return _error_page("State mismatch — please try signing in again.")
+
+    # Exchange authorisation code for tokens
+    try:
+        token_data = _exchange_code(code)
+    except Exception as e:
+        logger.error("Google token exchange failed: %s", e)
+        return _error_page("Could not complete sign-in with Google. Please try again.")
+
+    access_token = token_data.get("access_token")
+    if not access_token:
+        return _error_page("No access token returned by Google.")
+
+    # Fetch the user's profile
+    try:
+        userinfo = _get_userinfo(access_token)
+    except Exception as e:
+        logger.error("Google userinfo fetch failed: %s", e)
+        return _error_page("Could not retrieve your Google profile. Please try again.")
+
+    google_sub   = userinfo.get("sub", "")
+    google_email = userinfo.get("email", "")
+
+    if not google_sub or not google_email:
+        return _error_page("Your Google account didn't return a usable email address.")
+
+    # Match to a Cortex user
+    username = find_user_by_google(google_sub, google_email)
+    if not username:
+        logger.warning("Google sign-in rejected: no account for %s (%s)", google_sub, google_email)
+        return _error_page(
+            f"Your Google account (<strong>{google_email}</strong>) isn't registered with Cortex.<br><br>"
+            "Contact Scott to get access."
+        )
+
+    # Persist the stable sub so future lookups use it (not just email)
+    link_google(username, google_sub, google_email)
+
+    personas = list_user_personas(username)
+    if not personas:
+        return _error_page("No personas are configured for your account yet. Contact Scott.")
+
+    logger.info("Google sign-in: %s (%s)", username, google_email)
+    resp = RedirectResponse(f"/{username}/{personas[0]}", status_code=302)
+    _set_session_cookie(resp, username)
+    resp.delete_cookie(_STATE_COOKIE)
+    return resp
+
+
+# ---------------------------------------------------------------------------
+# Private helpers
+# ---------------------------------------------------------------------------
+
+def _exchange_code(code: str) -> dict:
+    body = urllib.parse.urlencode({
+        "code":          code,
+        "client_id":     settings.google_client_id,
+        "client_secret": settings.google_client_secret,
+        "redirect_uri":  f"{settings.cortex_base_url}/auth/google/callback",
+        "grant_type":    "authorization_code",
+    }).encode()
+    req = urllib.request.Request(
+        _GOOGLE_TOKEN_URL,
+        data=body,
+        headers={"Content-Type": "application/x-www-form-urlencoded"},
+        method="POST",
+    )
+    with urllib.request.urlopen(req, timeout=10) as resp:
+        return json.loads(resp.read())
+
+
+def _get_userinfo(access_token: str) -> dict:
+    req = urllib.request.Request(
+        _GOOGLE_USERINFO,
+        headers={"Authorization": f"Bearer {access_token}"},
+    )
+    with urllib.request.urlopen(req, timeout=10) as resp:
+        return json.loads(resp.read())
+
+
+def _set_session_cookie(response: Response, username: str) -> None:
+    token = create_token(username)
+    response.set_cookie(
+        COOKIE_NAME,
+        token,
+        max_age=settings.jwt_expire_days * 86400,
+        httponly=True,
+        samesite="lax",
+        secure=False,  # set True whenever browsers reach Cortex over HTTPS (including via a TLS-terminating proxy)
+    )
+
+
+def _error_page(message: str) -> HTMLResponse:
+    html = f"""<!DOCTYPE html>
+<html lang="en">
+<head>
+  <meta charset="UTF-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <title>Cortex — Sign In Failed</title>
+  <link rel="preconnect" href="https://fonts.googleapis.com">
+  <link href="https://fonts.googleapis.com/css2?family=Inter:wght@100..900&display=swap" rel="stylesheet">
+  <style>
+    *, *::before, *::after {{ box-sizing: border-box; margin: 0; padding: 0; }}
+    body {{
+      min-height: 100vh; display: flex; align-items: center; justify-content: center;
+      background: #0f1117; font-family: 'Inter', system-ui; font-weight: 450;
+      -webkit-font-smoothing: antialiased; color: #e2e8f0;
+    }}
+    .card {{
+      background: #1a1d27; border: 1px solid #2d3148; border-radius: 12px;
+      padding: 2.5rem 2rem; width: 100%; max-width: 420px; text-align: center;
+    }}
+    h1 {{ font-size: 1.25rem; font-weight: 700; color: #f87171; margin-bottom: 1rem; }}
+    p {{ font-size: 0.9rem; color: #94a3b8; margin-bottom: 1.75rem; line-height: 1.65; }}
+    a {{
+      display: inline-block; padding: 0.6rem 1.5rem;
+      background: #7c3aed; border-radius: 6px; color: #fff;
+      text-decoration: none; font-size: 0.9rem; font-weight: 600;
+      transition: background 0.15s;
+    }}
+    a:hover {{ background: #6d28d9; }}
+  </style>
+</head>
+<body>
+  <div class="card">
+    <h1>Sign In Failed</h1>
+    <p>{message}</p>
+    <a href="/login">← Back to Sign In</a>
+  </div>
+</body>
+</html>"""
+    return HTMLResponse(html, status_code=403)
--- a/cortex/routers/chat.py
+++ b/cortex/routers/chat.py
@@ -1,6 +1,7 @@
 import asyncio
 import json
-from fastapi import APIRouter, HTTPException, Query
+import jwt
+from fastapi import APIRouter, HTTPException, Query, Request
 from fastapi.responses import StreamingResponse
 from pydantic import BaseModel
 from context_loader import load_context
@@ -9,12 +10,28 @@ from session_logger import log_turn
 from session_store import load as load_session, save as save_session, list_all, generate_session_id, delete as delete_session, rename as rename_session
 from config import settings
 from persona import set_context, validate as validate_persona
+from auth_utils import COOKIE_NAME, decode_token
+import model_registry
 import event_bus


 router = APIRouter()


+def _backend_label(backend: str, username: str) -> str:
+    """Human-readable label for the model that handled a request."""
+    if backend == "claude":
+        return "Claude"
+    if backend == "gemini":
+        return "Gemini"
+    if backend == "local":
+        cfg = model_registry.get_best_local_model(username)
+        if cfg:
+            return cfg.get("label") or cfg.get("model_name") or "Local"
+        return "Local"
+    return backend.title()
+
+
 class ChatRequest(BaseModel):
    message: str
    session_id: str | None = None
@@ -29,7 +46,7 @@ class ChatRequest(BaseModel):


 class BackendRequest(BaseModel):
-    primary: str  # "claude" or "gemini"
+    primary: str  # "claude", "gemini", or "local"


 class NoteRequest(BaseModel):
@@ -102,6 +119,7 @@ async def _stream_chat(req: ChatRequest):
                "response": response_text,
                "session_id": session_id,
                "backend": actual_backend,
+                "backend_label": _backend_label(actual_backend, user),
                "fallback_used": actual_backend != requested,
            }
            yield f"data: {json.dumps(payload)}\n\n"
@@ -130,19 +148,45 @@ async def chat(req: ChatRequest) -> StreamingResponse:
    )


+_BACKEND_CYCLE = ("claude", "gemini", "local")
+_BACKEND_FALLBACK = {"claude": "gemini", "gemini": "claude", "local": "claude"}
+
+
+def _local_model_info(request: Request) -> dict | None:
+    """Return the best local model {label, model_name} for the session user, or None."""
+    try:
+        token    = request.cookies.get(COOKIE_NAME)
+        username = decode_token(token) if token else None
+        if not username:
+            return None
+        cfg = model_registry.get_best_local_model(username, "chat")
+        if cfg:
+            return {"label": cfg.get("label", ""), "model_name": cfg.get("model_name", "")}
+    except (jwt.InvalidTokenError, Exception):
+        pass
+    return None
+
+
@router.get("/backend")
-async def get_backend() -> dict:
-    other = "gemini" if settings.primary_backend == "claude" else "claude"
-    return {"primary": settings.primary_backend, "fallback": other}
+async def get_backend(request: Request) -> dict:
+    p = settings.primary_backend
+    return {
+        "primary":      p,
+        "fallback":     _BACKEND_FALLBACK.get(p, "claude"),
+        "local_model":  _local_model_info(request),
+    }


@router.post("/backend")
-async def set_backend(req: BackendRequest) -> dict:
-    if req.primary not in ("claude", "gemini"):
-        raise HTTPException(status_code=400, detail="primary must be 'claude' or 'gemini'")
+async def set_backend(req: BackendRequest, request: Request) -> dict:
+    if req.primary not in _BACKEND_CYCLE:
+        raise HTTPException(status_code=400, detail="primary must be 'claude', 'gemini', or 'local'")
    settings.primary_backend = req.primary
-    other = "gemini" if req.primary == "claude" else "claude"
-    return {"primary": settings.primary_backend, "fallback": other}
+    return {
+        "primary":     req.primary,
+        "fallback":    _BACKEND_FALLBACK[req.primary],
+        "local_model": _local_model_info(request),
+    }


 def _set_ctx(user: str, persona: str) -> None:
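`_BACKEND_FALLBACK` pairs each primary backend with the one used if it fails. Claude and Gemini back each other up, and a local model falls back to Claude. The name `_BACKEND_CYCLE` suggests that a UI toggle steps through the three backends in order. The sketch below assumes that, and `next_backend` and `backend_state` are hypothetical helpers, not functions defined in `chat.py`.

```python
_BACKEND_CYCLE = ("claude", "gemini", "local")
_BACKEND_FALLBACK = {"claude": "gemini", "gemini": "claude", "local": "claude"}


def next_backend(current: str) -> str:
    """Hypothetical: the backend after `current` in the cycle, wrapping around."""
    if current not in _BACKEND_CYCLE:
        return _BACKEND_CYCLE[0]
    i = _BACKEND_CYCLE.index(current)
    return _BACKEND_CYCLE[(i + 1) % len(_BACKEND_CYCLE)]


def backend_state(primary: str) -> dict:
    """Shape of the /backend response, minus the per-user local_model lookup."""
    return {"primary": primary, "fallback": _BACKEND_FALLBACK.get(primary, "claude")}


print(next_backend("local"))   # claude
print(backend_state("local"))  # {'primary': 'local', 'fallback': 'claude'}
```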
--- a/cortex/routers/files.py
+++ b/cortex/routers/files.py
@@ -1,7 +1,8 @@
 """
-Read/write the Inara identity markdown files.
+Read/write Inara identity markdown files, and search past session logs.
 Only whitelisted filenames are accessible — no path traversal possible.
 """
+import re
 from fastapi import APIRouter, HTTPException, Query
 from pydantic import BaseModel
 from persona import persona_path, set_context, validate as validate_persona
@@ -47,10 +48,12 @@ async def list_files(
    files = []
    for name in sorted(ALLOWED):
        p = persona_dir / name
+        st = p.stat() if p.exists() else None
        files.append({
            "name": name,
            "exists": p.exists(),
-            "size": p.stat().st_size if p.exists() else 0,
+            "size": st.st_size if st else 0,
+            "modified": st.st_mtime if st else None,
        })
    return {"files": files}

@@ -83,3 +86,59 @@ async def save_file(
    p = _path(filename)
    p.write_text(req.content)
    return {"ok": True, "name": filename, "size": len(req.content)}
+
+
+# ── Session search ────────────────────────────────────────────────────────────
+
+_CONTEXT_CHARS = 120  # chars of context to include around each match
+
+
+@router.get("/sessions/search")
+async def search_sessions(
+    q: str = Query(..., min_length=2),
+    user: str = Query("scott"),
+    persona: str = Query("inara"),
+    limit: int = Query(20, ge=1, le=100),
+) -> dict:
+    """Full-text search across past session logs.
+
+    Returns up to `limit` matches, newest sessions first.
+    Each match includes a short excerpt (120 chars before/after) for context.
+    """
+    _resolve(user, persona)
+    sessions_dir = persona_path() / "sessions"
+    if not sessions_dir.exists():
+        return {"query": q, "matches": [], "total_files_searched": 0}
+
+    pattern = re.compile(re.escape(q), re.IGNORECASE)
+    session_files = sorted(sessions_dir.glob("*.md"), reverse=True)  # newest first
+
+    matches = []
+    for sf in session_files:
+        if len(matches) >= limit:
+            break
+        try:
+            text = sf.read_text()
+        except OSError:
+            continue
+        for m in pattern.finditer(text):
+            if len(matches) >= limit:
+                break
+            start = max(0, m.start() - _CONTEXT_CHARS)
+            end   = min(len(text), m.end() + _CONTEXT_CHARS)
+            excerpt = text[start:end].strip()
+            # Mark truncation with an ellipsis on whichever side was cut
+            if start > 0:
+                excerpt = "…" + excerpt
+            if end < len(text):
+                excerpt = excerpt + "…"
+            matches.append({
+                "date":    sf.stem,          # YYYY-MM-DD
+                "excerpt": excerpt,
+            })
+
+    return {
+        "query":               q,
+        "matches":             matches,
+        "total_files_searched": len(session_files),
+    }
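For each hit, `search_sessions` keeps up to `_CONTEXT_CHARS` characters on either side of the match and marks each truncated side with an ellipsis. The sketch below pulls the same windowing logic into a standalone function so the excerpt shape is easy to check. `excerpts` is an illustrative name, and the sample log text is made up.

```python
import re

_CONTEXT_CHARS = 120


def excerpts(text: str, query: str, context: int = _CONTEXT_CHARS, limit: int = 20) -> list[str]:
    """Case-insensitive literal search; returns one windowed excerpt per match."""
    pattern = re.compile(re.escape(query), re.IGNORECASE)  # literal, not regex
    out = []
    for m in pattern.finditer(text):
        if len(out) >= limit:
            break
        start = max(0, m.start() - context)
        end = min(len(text), m.end() + context)
        excerpt = text[start:end].strip()
        if start > 0:
            excerpt = "…" + excerpt   # left side was truncated
        if end < len(text):
            excerpt = excerpt + "…"   # right side was truncated
        out.append(excerpt)
    return out


log = "Scott asked about the Syncthing ignore file. Inara suggested adding .venv to .stignore."
print(excerpts(log, "stignore", context=10))  # ['….venv to .stignore.']
```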
--- a/cortex/routers/google_chat.py
+++ b/cortex/routers/google_chat.py
@@ -3,14 +3,16 @@ import logging
 from fastapi import APIRouter, HTTPException, Request, Response
 from google.auth.transport import requests as google_requests
 from google.oauth2 import id_token
+from auth_utils import get_user_channels
 from context_loader import load_context
 from llm_client import complete
+from persona import set_context
 from session_logger import log_turn
 from session_store import load as load_session, save as save_session
 from config import settings

 logger = logging.getLogger(__name__)
-router = APIRouter(prefix="/channels/google-chat")
+router = APIRouter()

 # Workspace Add-on Chat apps: JWT is issued by accounts.google.com.
 # (Legacy standalone Chat bots used chat@system.gserviceaccount.com — different format.)
@@ -35,7 +37,7 @@ def _msg(text: str) -> dict:
    }


-def _verify_system_id_token(token: str) -> None:
+def _verify_system_id_token(token: str, audience: str) -> None:
    """Verify the systemIdToken from authorizationEventObject.

    For Workspace Add-on Chat apps Google sends the token in the request body
@@ -44,13 +46,13 @@ def _verify_system_id_token(token: str) -> None:

    Claims verified:
      iss  = "https://accounts.google.com"
-      aud  = settings.google_chat_audience (the endpoint URL)
+      aud  = the per-user audience from channels.json (the endpoint URL)
    """
    try:
        claims = id_token.verify_oauth2_token(
            token,
            google_requests.Request(),
-            audience=settings.google_chat_audience,
+            audience=audience,
        )
    except Exception as exc:
        logger.warning("Google Chat JWT verification failed: %s", exc)
@@ -60,17 +62,30 @@ def _verify_system_id_token(token: str) -> None:
        raise HTTPException(status_code=401, detail="Wrong issuer")


-@router.post("")
-async def receive(request: Request):
+@router.post("/channels/google-chat/{username}")
+async def receive(username: str, request: Request):
+    channels = get_user_channels(username)
+    cfg = channels.get("google_chat")
+    if not cfg:
+        logger.warning("Google Chat: no channel config for user %r", username)
+        raise HTTPException(status_code=404, detail="Channel not configured for this user")
+
+    persona_name = cfg.get("persona", "inara")
+    audience     = cfg.get("audience", "")
+    backend      = cfg.get("backend", settings.primary_backend)
+    timeout      = cfg.get("timeout", 25)
+
+    set_context(username, persona_name)
+
    body = await request.json()

    # Verify the systemIdToken embedded in the request body
-    if settings.google_chat_audience:
+    if audience:
        token = body.get("authorizationEventObject", {}).get("systemIdToken", "")
        if not token:
-            logger.warning("Google Chat: missing systemIdToken")
+            logger.warning("Google Chat: missing systemIdToken for %s", username)
            raise HTTPException(status_code=401, detail="Missing token")
-        _verify_system_id_token(token)
+        _verify_system_id_token(token, audience)

    chat = body.get("chat", {})

@@ -79,8 +94,8 @@ async def receive(request: Request):
    if "addedToSpacePayload" in chat:
        space_type = chat["addedToSpacePayload"].get("space", {}).get("type", "")
        if space_type == "DM":
-            return _msg(f"✨ Hello! I'm {settings.agent_name}. What can I help you with?")
-        return _msg(f"✨ Hello! I'm {settings.agent_name}. Send me a message and I'll do my best to help.")
+            return _msg(f"✨ Hello! I'm {persona_name.capitalize()}. What can I help you with?")
+        return _msg(f"✨ Hello! I'm {persona_name.capitalize()}. Send me a message and I'll do my best to help.")

    if "removedFromSpacePayload" in chat:
        return Response(status_code=200)
@@ -89,10 +104,10 @@ async def receive(request: Request):
        logger.info("Google Chat: unhandled event keys: %s", list(chat.keys()))
        return Response(status_code=200)

-    payload       = chat["messagePayload"]
-    message       = payload.get("message", {})
-    space         = payload.get("space", {})
-    user          = chat.get("user", {})
+    payload        = chat["messagePayload"]
+    message        = payload.get("message", {})
+    space          = payload.get("space", {})
+    user           = chat.get("user", {})

    # argumentText strips @BotName mentions in Spaces; fall back to full text in DMs
    user_text      = (message.get("argumentText") or message.get("text", "")).strip()
@@ -107,7 +122,7 @@ async def receive(request: Request):
        logger.warning("Google Chat: empty user_text, ignoring")
        return Response(status_code=200)

-    session_id    = "gc_" + space_name.replace("/", "_")
+    session_id    = f"gc_{username}_{space_name.replace('/', '_')}"
    system_prompt = load_context(settings.default_tier)
    history       = load_session(session_id)
    history.append({"role": "user", "content": user_text})
@@ -117,9 +132,9 @@ async def receive(request: Request):
            complete(
                system_prompt=system_prompt,
                messages=history,
-                model=settings.google_chat_backend,
+                model=backend,
            ),
-            timeout=settings.google_chat_timeout,
+            timeout=timeout,
        )
    except asyncio.TimeoutError:
        logger.warning("Google Chat request timed out for session %s", session_id)
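`receive` now reads each user's `google_chat` block from `channels.json` and applies defaults for any missing key: `persona` "inara", `timeout` 25, and `backend` falling back to the global primary. Session IDs are also namespaced by username, so two users in the same space never share history. The sketch below shows those two rules. `chat_settings` and `session_id_for` are hypothetical helpers, and the sample values are made up.

```python
def chat_settings(cfg: dict, default_backend: str) -> dict:
    """Apply the same defaults receive() uses for a user's google_chat block."""
    return {
        "persona":  cfg.get("persona", "inara"),
        "audience": cfg.get("audience", ""),  # empty -> systemIdToken check is skipped
        "backend":  cfg.get("backend", default_backend),
        "timeout":  cfg.get("timeout", 25),
    }


def session_id_for(username: str, space_name: str) -> str:
    """Per-user session key built the same way as in receive()."""
    return f"gc_{username}_{space_name.replace('/', '_')}"


print(session_id_for("scott", "spaces/AAAA"))  # gc_scott_spaces_AAAA
```

Because token verification only runs when `audience` is non-empty, a `google_chat` block without an `audience` accepts unauthenticated requests on that user's endpoint.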
--- a/cortex/routers/help.py
+++ b/cortex/routers/help.py
@@ -32,13 +32,17 @@ def _get_session_user(request: Request) -> str | None:


@router.get("/help", include_in_schema=False)
-async def help_page(request: Request):
+async def help_page(request: Request, persona: str = ""):
    username = _get_session_user(request)
    if not username:
        return RedirectResponse("/login", status_code=302)

    personas = list_user_personas(username)
-    back_persona = personas[0] if personas else ""
+    # Use persona from query param if valid, else fall back to first
+    if persona and persona in personas:
+        back_persona = persona
+    else:
+        back_persona = personas[0] if personas else ""
    back_href = f"/{username}/{back_persona}" if back_persona else "/"

    html = (_STATIC / "help.html").read_text()
--- a/cortex/routers/local_llm.py
+++ b/cortex/routers/local_llm.py
@@ -0,0 +1,341 @@
+"""
+Model Registry settings — hosts, models, and role assignments.
+
+Routes:
+  GET  /settings/local                        → settings page
+  POST /settings/local/host                   → save/create a host
+  POST /settings/local/host/{id}/remove       → remove a host (and its models)
+  POST /settings/local/models/add             → add a model entry
+  POST /settings/local/models/{id}/remove     → remove a model
+  POST /api/models/role                       → AJAX: set a role assignment
+  GET  /api/local-llm/fetch-models            → proxy to host /api/models (JSON)
+"""
+import logging
+from pathlib import Path
+
+import httpx
+import jwt
+from fastapi import APIRouter, Form, Request
+from fastapi.responses import HTMLResponse, JSONResponse, RedirectResponse
+
+from auth_utils import COOKIE_NAME, decode_token
+from config import settings as app_settings
+import model_registry as reg
+
+logger = logging.getLogger(__name__)
+router = APIRouter()
+
+_STATIC = Path(__file__).parent.parent / "static"
+
+
+# ── Auth helper ───────────────────────────────────────────────────────────────
+
+def _get_user(request: Request) -> str | None:
+    token = request.cookies.get(COOKIE_NAME)
+    if not token:
+        return None
+    try:
+        return decode_token(token)
+    except jwt.InvalidTokenError:
+        return None
+
+
+# ── Page renderer ─────────────────────────────────────────────────────────────
+
+def _render(username: str, success: str = "", error: str = "") -> str:
+    registry = reg.get_registry(username)
+    hosts    = registry.get("hosts", [])
+    models   = registry.get("models", [])
+    roles    = registry.get("roles", {})
+    builtins = reg._builtins()
+
+    host_by_id = {h["id"]: h for h in hosts}
+
+    # ── Host rows ─────────────────────────────────────────────────────────────
+    host_rows = ""
+    for h in hosts:
+        key_hint  = f"…{h['api_key'][-4:]}" if h.get("api_key") else "not set"
+        ht        = h.get("host_type", "openwebui")
+        ow_sel    = ' selected' if ht == "openwebui" else ''
+        ai_sel    = ' selected' if ht == "openai"    else ''
+        host_rows += f'''
+        <div class="host-row">
+          <form method="POST" action="/settings/local/host" class="host-form">
+            <input type="hidden" name="host_id" value="{h["id"]}">
+            <div class="field-row">
+              <div class="field">
+                <label>Label</label>
+                <input type="text" name="label" value="{h.get("label","")}"
+                       placeholder="Home ML Laptop" autocomplete="off" data-form-type="other">
+              </div>
+              <div class="field" style="flex:2">
+                <label>API URL</label>
+                <input type="text" name="api_url" value="{h.get("api_url","")}"
+                       placeholder="http://192.168.x.x:3000"
+                       autocomplete="off" spellcheck="false" data-form-type="other">
+              </div>
+            </div>
+            <div class="field-row">
+              <div class="field">
+                <label>API Key</label>
+                <input type="password" name="api_key" placeholder="Leave blank to keep existing"
+                       autocomplete="new-password" data-1p-ignore data-lpignore="true" data-form-type="other">
+                <p class="key-status">Current: {key_hint}</p>
+              </div>
+              <div class="field" style="flex:0 0 auto">
+                <label>Type</label>
+                <select name="host_type">
+                  <option value="openwebui"{ow_sel}>Open WebUI / Ollama</option>
+                  <option value="openai"{ai_sel}>OpenAI-compatible (OpenRouter, etc.)</option>
+                </select>
+              </div>
+            </div>
+            <div class="btn-row">
+              <button type="submit" class="btn btn-secondary btn-sm">Save host</button>
+              <button type="button" class="btn btn-secondary btn-sm fetch-btn"
+                      data-host-id="{h["id"]}">Fetch models</button>
+              <span class="fetch-status" id="fetch-{h["id"]}"></span>
+            </div>
+          </form>
+          <form method="POST" action="/settings/local/host/{h["id"]}/remove"
+                onsubmit="return confirm('Remove host and all its models?')" style="margin-top:0.5rem">
+            <button type="submit" class="btn-link danger">Remove host</button>
+          </form>
+        </div>'''
+
+    if not host_rows:
+        host_rows = '<p class="empty-note">No hosts configured yet. Add one below.</p>'
+
+    # ── Host options for add-model form ───────────────────────────────────────
+    host_options = "".join(
+        f'<option value="{h["id"]}">{h.get("label") or h["api_url"]}</option>'
+        for h in hosts
+    )
+    add_model_hidden = "" if hosts else ' style="display:none"'
+
+    # ── Model rows ────────────────────────────────────────────────────────────
+    model_rows = ""
+    for m in models:
+        resolved = reg._resolve_model(registry, m["id"])
+        if not resolved:
+            continue
+        host_name = ""
+        if m.get("type") == "local_openai" and m.get("host_id"):
+            h = host_by_id.get(m["host_id"], {})
+            host_name = h.get("label") or h.get("api_url", "")
+
+        ctx_badge = f'<span class="ctx-badge">{m.get("context_k",0)}k ctx</span>' if m.get("context_k") else ""
+        tags_html = " ".join(
+            f'<span class="tag">{t}</span>' for t in (m.get("tags") or [])
+        )
+        host_html = f'<span class="model-host">{host_name}</span>' if host_name else ""
+
+        model_rows += f'''
+        <div class="model-row" id="model-{m["id"]}">
+          <div class="model-info">
+            <span class="model-label">{m.get("label") or m.get("model_name","")}</span>
+            <span class="model-name">{m.get("model_name","")}</span>
+            {host_html}{ctx_badge}
+            <div class="tag-row">{tags_html}</div>
+          </div>
+          <div class="model-actions">
+            <form method="POST" action="/settings/local/models/{m["id"]}/remove"
+                  onsubmit="return confirm('Remove this model?')" style="display:inline">
+              <button type="submit" class="row-btn danger">Remove</button>
+            </form>
+          </div>
+        </div>'''
+
+    if not model_rows:
+        model_rows = '<p class="empty-note">No models added yet.</p>'
+
+    # ── Role assignment rows ──────────────────────────────────────────────────
+    # Build option list: (none) + built-ins + user models
+    model_opts = '<option value="">— .env default —</option>\n'
+    model_opts += '<optgroup label="Built-in">\n'
+    for bid, bm in builtins.items():
+        model_opts += f'  <option value="{bid}">{bm["label"]}</option>\n'
+    model_opts += '</optgroup>\n'
+    if models:
+        model_opts += '<optgroup label="Local models">\n'
+        for m in models:
+            lbl = m.get("label") or m.get("model_name", m["id"])
+            model_opts += f'  <option value="{m["id"]}">{lbl}</option>\n'
+        model_opts += '</optgroup>\n'
+
+    role_rows = ""
+    for role in app_settings.get_defined_roles():
+        role_cfg = roles.get(role, {})
+        role_rows += f'<div class="role-row" data-role="{role}"><span class="role-name">{role.title()}</span><div class="role-slots">'
+        for slot in reg.PRIORITY_KEYS[:3]:  # primary + backup_1 + backup_2
+            current = role_cfg.get(slot) or ""
+            slot_label = slot.replace("_", " ").title()
+            sel_html = f'<select class="role-select" data-role="{role}" data-slot="{slot}" title="{slot_label}">\n{model_opts}\n</select>'
+            # Pre-select current value via JS (simpler than string-building selected attrs)
+            role_rows += f'<div class="role-slot"><span class="slot-label">{slot_label}</span>{sel_html}</div>'
+        role_rows += '</div></div>'
+
+    # JS data for pre-selecting current role values
+    import json as _json
+    role_data_js = _json.dumps({
+        role: {slot: (roles.get(role, {}).get(slot) or "") for slot in reg.PRIORITY_KEYS[:3]}
+        for role in app_settings.get_defined_roles()
+    })
+
+    html = (_STATIC / "local_llm.html").read_text()
+    html = html.replace("{{ username }}",         username)
+    html = html.replace("{{ host_rows }}",         host_rows)
+    html = html.replace("{{ model_rows }}",        model_rows)
+    html = html.replace("{{ host_options }}",      host_options)
+    html = html.replace("{{ add_model_hidden }}",  add_model_hidden)
+    html = html.replace("{{ role_rows }}",         role_rows)
+    html = html.replace("{{ role_data_js }}",      role_data_js)
+    if success:
+        html = html.replace("<!-- SUCCESS -->", f'<p class="msg success">{success}</p>')
+    if error:
+        html = html.replace("<!-- ERROR -->",   f'<p class="msg error">{error}</p>')
+    return html
+
+
+# ── Routes ────────────────────────────────────────────────────────────────────
+
+@router.get("/settings/local", include_in_schema=False)
+async def models_page(request: Request):
+    username = _get_user(request)
+    if not username:
+        return RedirectResponse("/login", status_code=302)
+    return HTMLResponse(_render(username))
+
+
+@router.post("/settings/local/host", include_in_schema=False)
+async def save_host(
+    request:   Request,
+    host_id:   str = Form(""),
+    label:     str = Form(""),
+    api_url:   str = Form(""),
+    api_key:   str = Form(""),
+    host_type: str = Form("openwebui"),
+):
+    username = _get_user(request)
+    if not username:
+        return RedirectResponse("/login", status_code=302)
+    if not api_url.strip():
+        return HTMLResponse(_render(username, error="API URL is required."))
+    reg.save_host(username, host_id or None, label, api_url, api_key, host_type)
+    logger.info("model registry host saved: %s (%s)", username, host_type)
+    return HTMLResponse(_render(username, success="Host saved."))
+
+
+@router.post("/settings/local/host/{host_id}/remove", include_in_schema=False)
+async def remove_host(request: Request, host_id: str):
+    username = _get_user(request)
+    if not username:
+        return RedirectResponse("/login", status_code=302)
+    reg.remove_host(username, host_id)
+    return HTMLResponse(_render(username, success="Host removed."))
+
+
+@router.post("/settings/local/models/add", include_in_schema=False)
+async def add_model(
+    request:    Request,
+    host_id:    str = Form(...),
+    label:      str = Form(""),
+    model_name: str = Form(...),
+    context_k:  int = Form(0),
+    tags:       str = Form(""),
+):
+    username = _get_user(request)
+    if not username:
+        return RedirectResponse("/login", status_code=302)
+    if not model_name.strip():
+        return HTMLResponse(_render(username, error="Model name is required."))
+    tag_list = [t.strip() for t in tags.split(",") if t.strip()]
+    reg.save_model(username, None, host_id, label, model_name, context_k, tag_list)
+    logger.info("model added to registry: %s / %s", username, model_name)
+    return HTMLResponse(_render(username, success=f'Model "{label or model_name}" added.'))
+
+
+@router.post("/settings/local/models/{model_id}/remove", include_in_schema=False)
+async def remove_model(request: Request, model_id: str):
+    username = _get_user(request)
+    if not username:
+        return RedirectResponse("/login", status_code=302)
+    reg.remove_model(username, model_id)
+    return HTMLResponse(_render(username, success="Model removed."))
+
+
+@router.post("/api/models/role")
+async def set_role(request: Request) -> JSONResponse:
+    """AJAX: assign a model to a role priority slot.
+
+    Body: {"role": "chat", "slot": "primary", "model_id": "abc123" | ""}
+    """
+    username = _get_user(request)
+    if not username:
+        return JSONResponse({"error": "Not authenticated"}, status_code=401)
+    try:
+        body = await request.json()
+    except Exception:
+        return JSONResponse({"error": "Invalid JSON"}, status_code=400)
+
+    role     = body.get("role", "").strip()
+    slot     = body.get("slot", "").strip()
+    model_id = body.get("model_id", "").strip() or None
+
+    if not role or not slot:
+        return JSONResponse({"error": "role and slot are required"}, status_code=400)
+
+    ok = reg.set_role(username, role, slot, model_id)
+    if not ok:
+        return JSONResponse({"error": "Invalid slot, or model_id not found"}, status_code=400)
+
+    logger.info("role set: %s %s.%s = %s", username, role, slot, model_id)
+    return JSONResponse({"ok": True})
+
+
+@router.get("/api/local-llm/fetch-models")
+async def fetch_models(request: Request, host_id: str = "") -> JSONResponse:
+    """Proxy to the host's model list (/api/models, or /models for OpenAI-compatible hosts). host_id selects which host."""
+    username = _get_user(request)
+    if not username:
+        return JSONResponse({"error": "Not authenticated"}, status_code=401)
+
+    registry = reg.get_registry(username)
+    hosts = registry.get("hosts", [])
+
+    if host_id:
+        host = next((h for h in hosts if h["id"] == host_id), None)
+    else:
+        host = hosts[0] if hosts else None
+
+    # No matching host: fall back to local_api_url / local_api_key from .env
+    if host:
+        api_url = host.get("api_url", "")
+        api_key = host.get("api_key", "")
+    else:
+        api_url = app_settings.local_api_url
+        api_key = app_settings.local_api_key
+
+    if not api_url:
+        return JSONResponse({"error": "No host configured."}, status_code=400)
+
+    host_type   = host.get("host_type", "openwebui") if host else "openwebui"
+    models_path = "/models" if host_type == "openai" else "/api/models"
+    url         = api_url.rstrip("/") + models_path
+    headers     = {"Authorization": f"Bearer {api_key}"} if api_key else {}
+
+    try:
+        async with httpx.AsyncClient(timeout=8) as client:
+            resp = await client.get(url, headers=headers)
+        resp.raise_for_status()
+        data   = resp.json()
+        models = [
+            {"id": m["id"], "name": m.get("name") or m["id"]}
+            for m in data.get("data", [])
+        ]
+        models.sort(key=lambda m: m["name"].lower())
+        return JSONResponse({"models": models})
+    except httpx.HTTPStatusError as e:
+        return JSONResponse({"error": f"Host returned {e.response.status_code}"}, status_code=502)
+    except Exception as e:
+        return JSONResponse({"error": str(e)}, status_code=502)
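The `host_type` switch in `fetch_models()` only changes which path is appended to the stored URL. Because `/models` is appended directly for `openai` hosts, the saved `api_url` for a host such as OpenRouter is presumably expected to already include its `/api/v1` prefix. The sketch below isolates the URL building and response normalization as pure functions. `models_url` and `parse_models` are hypothetical names, and the parser assumes (as the router does) that both host types return an OpenAI-style `{"data": [...]}` body.

```python
def models_url(api_url: str, host_type: str) -> str:
    """Build the model-list URL the way fetch_models() does.

    OpenAI-compatible hosts get {base}/models; Open WebUI hosts get
    {base}/api/models. Trailing slashes on the base are tolerated.
    """
    path = "/models" if host_type == "openai" else "/api/models"
    return api_url.rstrip("/") + path


def parse_models(payload: dict) -> list[dict]:
    """Normalise a {"data": [...]} response into id/name pairs.

    Entries without a "name" use their "id" as the display name; the result
    is sorted case-insensitively by display name.
    """
    models = [
        {"id": m["id"], "name": m.get("name") or m["id"]}
        for m in payload.get("data", [])
    ]
    models.sort(key=lambda m: m["name"].lower())
    return models


if __name__ == "__main__":
    # 192.168.1.20 is a made-up LAN address for illustration.
    print(models_url("https://openrouter.ai/api/v1/", "openai"))
    # https://openrouter.ai/api/v1/models
    print(models_url("http://192.168.1.20:3000", "openwebui"))
    # http://192.168.1.20:3000/api/models
    sample = {"data": [{"id": "b-model"}, {"id": "x", "name": "alpha"}]}
    print(parse_models(sample))
    # [{'id': 'x', 'name': 'alpha'}, {'id': 'b-model', 'name': 'b-model'}]
```

Keeping these steps free of I/O makes the host-type behavior testable without a live Open WebUI or OpenRouter endpoint.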
--- a/cortex/routers/nextcloud_talk.py
+++ b/cortex/routers/nextcloud_talk.py
@@ -1,18 +1,19 @@
 import asyncio
 import hashlib
 import hmac
 import json
 import logging
-import secrets

-import httpx
 from fastapi import APIRouter, BackgroundTasks, HTTPException, Request, Response

-from config import settings
+from auth_utils import get_user_channels
 from context_loader import load_context
 from llm_client import complete
+from notification import _send_nct_message
+from persona import set_context
 from session_logger import log_turn
 from session_store import load as load_session, save as save_session
+from config import settings
 import event_bus

 logger = logging.getLogger(__name__)
@@ -26,55 +25,37 @@ if not logger.handlers:
 router = APIRouter()


-def _verify_signature(body: bytes, random_header: str, sig_header: str) -> bool:
+def _verify_signature(body: bytes, random_header: str, sig_header: str, secret: str) -> bool:
    """Nextcloud signs requests with HMAC-SHA256(key=secret, msg=random+body)."""
    expected = hmac.new(
-        settings.nextcloud_talk_bot_secret.encode(),
+        secret.encode(),
        (random_header + body.decode("utf-8", errors="replace")).encode(),
        hashlib.sha256,
    ).hexdigest()
    return hmac.compare_digest(expected, sig_header.lower())


-async def _send_reply(conversation_token: str, message: str) -> None:
+async def _send_reply(conversation_token: str, message: str, nextcloud_url: str, secret: str) -> None:
    """Post a message to Nextcloud Talk as the bot."""
-    url = (
-        f"{settings.nextcloud_url}/ocs/v2.php/apps/spreed/api/v1"
-        f"/bot/{conversation_token}/message"
-    )
-    # NC Talk verifies HMAC over (random + message_text), NOT the raw body.
-    # See BotController::getBotFromHeaders → checksumVerificationService::validateRequest($random, $sig, $secret, $message)
-    body_dict = {"message": message}
-    body_bytes = json.dumps(body_dict, ensure_ascii=False).encode("utf-8")
-    random_str = secrets.token_hex(32)
-    sig = hmac.new(
-        settings.nextcloud_talk_bot_secret.encode(),
-        (random_str + message).encode("utf-8"),
-        hashlib.sha256,
-    ).hexdigest()
-
-    logger.info("NCT _send_reply → %s (body: %s)", url, body_bytes.decode())
-    try:
-        async with httpx.AsyncClient() as client:
-            resp = await client.post(
-                url,
-                content=body_bytes,
-                headers={
-                    "Content-Type": "application/json",
-                    "OCS-APIRequest": "true",
-                    "X-Nextcloud-Talk-Bot-Random": random_str,
-                    "X-Nextcloud-Talk-Bot-Signature": sig,
-                },
-                timeout=15,
-            )
-        logger.info("NCT reply: %s — %s", resp.status_code, resp.text[:400])
-    except Exception as e:
-        logger.error("NCT reply error: %s", e)
+    logger.info("NCT _send_reply → room %s (%d chars)", conversation_token, len(message))
+    await _send_nct_message(nextcloud_url, secret, conversation_token, message)


-async def _process_message(conversation_token: str, user_text: str, actor_name: str) -> None:
+async def _process_message(
+    conversation_token: str,
+    user_text: str,
+    actor_name: str,
+    username: str,
+    persona_name: str,
+    nextcloud_url: str,
+    secret: str,
+    timeout: int,
+) -> None:
    logger.info("NCT process: token=%s user=%s text=%r", conversation_token, actor_name, user_text)
-    session_id    = f"nct_{conversation_token}"
+
+    set_context(username, persona_name)
+
+    session_id    = f"nct_{username}_{conversation_token}"
    system_prompt = load_context(settings.default_tier)
    history       = load_session(session_id)
    history.append({"role": "user", "content": user_text})
@@ -90,15 +71,15 @@ async def _process_message(conversation_token: str, user_text: str, actor_name:
    try:
        response_text, backend = await asyncio.wait_for(
            complete(system_prompt=system_prompt, messages=history),
-            timeout=settings.nextcloud_talk_timeout,
+            timeout=timeout,
        )
    except asyncio.TimeoutError:
        logger.warning("NCT timeout for %s", conversation_token)
-        await _send_reply(conversation_token, "⏳ Still thinking — this is taking longer than usual.")
+        await _send_reply(conversation_token, "⏳ Still thinking — this is taking longer than usual.", nextcloud_url, secret)
        return
    except Exception as e:
        logger.error("NCT LLM error for %s: %s", conversation_token, e)
-        await _send_reply(conversation_token, "⚠️ Something went wrong on my end.")
+        await _send_reply(conversation_token, "⚠️ Something went wrong on my end.", nextcloud_url, secret)
        return

    logger.info("NCT LLM responded via %s (%d chars)", backend, len(response_text))
@@ -114,22 +95,33 @@ async def _process_message(conversation_token: str, user_text: str, actor_name:
        "backend": backend,
    })

-    await _send_reply(conversation_token, response_text)
+    await _send_reply(conversation_token, response_text, nextcloud_url, secret)


-@router.post("/inara-nextcloud-talk-webhook")
-async def nextcloud_talk_webhook(request: Request, background_tasks: BackgroundTasks):
-    body = await request.body()
+@router.post("/webhook/nextcloud/{username}")
+async def nextcloud_talk_webhook(username: str, request: Request, background_tasks: BackgroundTasks):
+    channels = get_user_channels(username)
+    cfg = channels.get("nextcloud")
+    if not cfg:
+        logger.warning("NCT webhook: no channel config for user %r", username)
+        raise HTTPException(status_code=404, detail="Channel not configured for this user")

-    if not settings.nextcloud_talk_bot_secret:
-        logger.error("nextcloud_talk_bot_secret not configured")
+    persona_name  = cfg.get("persona", "inara")
+    nextcloud_url = cfg.get("url", "")
+    secret        = cfg.get("bot_secret", "")
+    timeout       = cfg.get("timeout", 55)
+
+    if not secret:
+        logger.error("NCT webhook: bot_secret missing for user %r", username)
        return Response(status_code=500)

+    body = await request.body()
+
    random_header = request.headers.get("X-Nextcloud-Talk-Random", "")
    sig_header    = request.headers.get("X-Nextcloud-Talk-Signature", "")

-    if not _verify_signature(body, random_header, sig_header):
-        logger.warning("NCT webhook: signature mismatch")
+    if not _verify_signature(body, random_header, sig_header, secret):
+        logger.warning("NCT webhook: signature mismatch for %s", username)
        raise HTTPException(status_code=401, detail="Invalid signature")

    try:
@@ -153,12 +145,12 @@ async def nextcloud_talk_webhook(request: Request, background_tasks: BackgroundT
    conversation_token = target.get("id", "")

    try:
-        content = json.loads(obj.get("content", "{}"))
+        content   = json.loads(obj.get("content", "{}"))
        user_text = content.get("message", "").strip()
    except (json.JSONDecodeError, AttributeError):
        user_text = (obj.get("name") or obj.get("content", "")).strip()

-    mention_prefix = f"@{settings.agent_name.lower()}"
+    mention_prefix = f"@{persona_name.lower()}"
    if user_text.lower().startswith(mention_prefix):
        user_text = user_text[len(mention_prefix):].strip()

@@ -168,5 +160,9 @@ async def nextcloud_talk_webhook(request: Request, background_tasks: BackgroundT
    actor_name = actor.get("name", "User")
    logger.info("NCT message from %s in %s: %r", actor_name, conversation_token, user_text[:60])

-    background_tasks.add_task(_process_message, conversation_token, user_text, actor_name)
+    background_tasks.add_task(
+        _process_message,
+        conversation_token, user_text, actor_name,
+        username, persona_name, nextcloud_url, secret, timeout,
+    )
    return Response(status_code=200)
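Both directions of the Nextcloud Talk bot exchange use the same primitive: an HMAC-SHA256 hex digest keyed by the bot secret over `random + payload`. What differs is the payload. Incoming webhooks are signed over the raw request body, while outgoing replies are signed over the message text rather than the JSON body (per the comment in the removed `_send_reply`). The following sketch collects both sides plus the `@persona` mention stripping. The function names are illustrative and do not correspond to the helpers in `notification.py`.

```python
import hashlib
import hmac
import secrets


def sign(secret: str, nonce: str, payload: str) -> str:
    """HMAC-SHA256 hex digest keyed by the bot secret over nonce + payload."""
    msg = (nonce + payload).encode("utf-8")
    return hmac.new(secret.encode(), msg, hashlib.sha256).hexdigest()


def verify_incoming(secret: str, body: bytes, random_header: str, sig_header: str) -> bool:
    """Check a webhook delivery; the signed payload is the raw request body."""
    expected = sign(secret, random_header, body.decode("utf-8", errors="replace"))
    # Constant-time comparison; the header may arrive in upper-case hex.
    return hmac.compare_digest(expected, sig_header.lower())


def outgoing_headers(secret: str, message: str) -> dict[str, str]:
    """Headers for a bot reply: the signature covers nonce + message text,
    not the JSON-encoded body."""
    nonce = secrets.token_hex(32)
    return {
        "Content-Type": "application/json",
        "OCS-APIRequest": "true",
        "X-Nextcloud-Talk-Bot-Random": nonce,
        "X-Nextcloud-Talk-Bot-Signature": sign(secret, nonce, message),
    }


def strip_mention(text: str, persona_name: str) -> str:
    """Drop a leading @persona mention (case-insensitive), as the webhook does."""
    prefix = f"@{persona_name.lower()}"
    return text[len(prefix):].strip() if text.lower().startswith(prefix) else text


if __name__ == "__main__":
    secret = "test-secret"  # made-up bot secret
    body = b'{"type":"Create"}'
    nonce = secrets.token_hex(32)
    good = sign(secret, nonce, body.decode())
    print(verify_incoming(secret, body, nonce, good.upper()))  # True
    print(strip_mention("@Inara what's up?", "inara"))         # what's up?
```

Because the secret now comes from each user's channel config, a signature computed with one user's secret will not verify on another user's `/webhook/nextcloud/{username}` route.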
--- a/cortex/routers/orchestrator.py
+++ b/cortex/routers/orchestrator.py
@@ -18,6 +18,7 @@ from datetime import datetime, timezone
 from fastapi import APIRouter
 from pydantic import BaseModel

+from auth_utils import get_user_gemini_key
 from config import settings
 from context_loader import load_context
 from persona import set_context, validate as validate_persona
@@ -104,7 +105,7 @@ async def orchestrate(req: OrchestrateRequest) -> OrchestrateResponse:
        _jobs[job_id] = job

    # Run in background — caller polls GET /orchestrate/{job_id}
-    asyncio.create_task(_run_job(job_id, req))
+    asyncio.create_task(_run_job(job_id, req, user))
    logger.info("Orchestrator job queued: %s — %.80s", job_id, req.task)
    return OrchestrateResponse(job_id=job_id, status="queued")

@@ -134,7 +135,7 @@ async def list_jobs() -> list[JobStatusResponse]:
 # Background runner
 # ---------------------------------------------------------------------------

-async def _run_job(job_id: str, req: OrchestrateRequest) -> None:
+async def _run_job(job_id: str, req: OrchestrateRequest, user: str) -> None:
    """Execute the orchestration job and update the job store."""
    async with _jobs_lock:
        _jobs[job_id]["status"] = "running"
@@ -161,6 +162,7 @@ async def _run_job(job_id: str, req: OrchestrateRequest) -> None:
            system_prompt=system_prompt,
            session_messages=session_messages,
            respond_with_claude=req.respond_with_claude,
+            gemini_api_key=get_user_gemini_key(user),
        )

        # Save the turn to the session store so it survives a page refresh
--- a/cortex/routers/settings.py
+++ b/cortex/routers/settings.py
@@ -16,7 +16,7 @@ import jwt
 from fastapi import APIRouter, Form, Request
 from fastapi.responses import HTMLResponse, RedirectResponse

-from auth_utils import COOKIE_NAME, decode_token, check_credentials, set_password
+from auth_utils import COOKIE_NAME, decode_token, check_credentials, set_password, _read_auth, _write_auth
 from persona import list_user_personas
 from config import settings as app_settings

@@ -41,6 +41,21 @@ def _get_session_user(request: Request) -> str | None:
 def _settings_page(username: str, personas: list[str], success: str = "", error: str = "") -> str:
    html = (_STATIC / "settings.html").read_text()
    html = html.replace("{{ username }}", username)
+
+    # Connected Google account
+    auth_data    = _read_auth(username)
+    google_email = auth_data.get("google_email") or ""
+    html = html.replace("{{ google_email }}", google_email)
+
+    # Gemini API key — show masked hint only, never the full key
+    gemini_key = auth_data.get("gemini_api_key") or ""
+    if gemini_key:
+        hint = f"Saved (…{gemini_key[-4:]})"
+    else:
+        hint = "Using server key"
+    html = html.replace("{{ gemini_key_hint }}", hint)
+    html = html.replace("{{ gemini_key_set }}", "true" if gemini_key else "false")
+
    persona_items = "\n".join(
        f'''<li>
          <a href="/{username}/{p}" class="persona-link">{p}</a>
@@ -58,6 +73,7 @@ def _settings_page(username: str, personas: list[str], success: str = "", error:
    html = html.replace("{{ persona_items }}", persona_items or "<li><em>No personas yet.</em></li>")
    back_persona = personas[0] if personas else ""
    html = html.replace("{{ back_href }}", f"/{username}/{back_persona}" if back_persona else "/")
+    html = html.replace("{{ help_href }}", f"/help?persona={back_persona}" if back_persona else "/help")
    if success:
        html = html.replace("<!-- SUCCESS -->", f'<p class="success">{success}</p>')
    if error:
@@ -139,6 +155,30 @@ async def rename_username(
    return resp


+@router.post("/settings/gemini-key", include_in_schema=False)
+async def save_gemini_key(
+    request: Request,
+    gemini_api_key: str = Form(...),
+):
+    username = _get_session_user(request)
+    if not username:
+        return RedirectResponse("/login", status_code=302)
+
+    personas = list_user_personas(username)
+    gemini_api_key = gemini_api_key.strip()
+
+    data = _read_auth(username)
+    if gemini_api_key:
+        data["gemini_api_key"] = gemini_api_key
+        msg = "Gemini API key saved."
+    else:
+        data.pop("gemini_api_key", None)
+        msg = "Gemini API key removed — using server key."
+    _write_auth(username, data)
+    logger.info("gemini key updated: %s", username)
+    return HTMLResponse(_settings_page(username, personas, success=msg))
+
+
@router.post("/settings/persona/rename", include_in_schema=False)
 async def rename_persona(
    request: Request,
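The Gemini key settings follow a simple rule. A non-blank submission is stored stripped, a blank one deletes the saved key, and the page only ever shows the last four characters. The sketch below restates that flow on a plain dict standing in for the auth file. `apply_gemini_key`, `gemini_key_hint` and `effective_gemini_key` are hypothetical names. In particular, `effective_gemini_key` is an assumed reading of the "Using server key" hint: the body of `get_user_gemini_key` is not part of this diff.

```python
from __future__ import annotations


def apply_gemini_key(auth: dict, submitted: str) -> bool:
    """Mirror save_gemini_key(): a non-blank value is stored (stripped),
    a blank value deletes any saved key. Returns True if a key is now saved."""
    key = submitted.strip()
    if key:
        auth["gemini_api_key"] = key
        return True
    auth.pop("gemini_api_key", None)
    return False


def gemini_key_hint(saved_key: str | None) -> str:
    """Masked status text for the settings page; never echoes the full key."""
    return f"Saved (…{saved_key[-4:]})" if saved_key else "Using server key"


def effective_gemini_key(user_key: str | None, server_key: str) -> str:
    """Assumed resolution rule: a saved per-user key wins, otherwise the
    server-wide key is used."""
    return user_key if user_key else server_key


if __name__ == "__main__":
    auth: dict = {}
    apply_gemini_key(auth, "  AIza-example-1234  ")    # fake key
    print(gemini_key_hint(auth.get("gemini_api_key")))  # Saved (…1234)
    apply_gemini_key(auth, "")
    print(gemini_key_hint(auth.get("gemini_api_key")))  # Using server key
```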
--- a/cortex/routers/ui.py
+++ b/cortex/routers/ui.py
@@ -62,6 +62,20 @@ def _first_persona(username: str) -> str | None:
    return names[0] if names else None


+# ---------------------------------------------------------------------------
+# Favicon — default sparkle; persona pages override via JS
+# ---------------------------------------------------------------------------
+
+_FAVICON_SVG = (
+    "<svg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 100 100'>"
+    "<text y='.9em' font-size='90'>✨</text></svg>"
+)
+
+@router.get("/favicon.ico", include_in_schema=False)
+async def favicon():
+    return Response(content=_FAVICON_SVG, media_type="image/svg+xml")
+
+
 # ---------------------------------------------------------------------------
 # Root redirect
 # ---------------------------------------------------------------------------
@@ -123,6 +137,112 @@ async def logout():
    return resp


+# ---------------------------------------------------------------------------
+# User landing — /{username}  → persona picker
+# ---------------------------------------------------------------------------
+
+@router.get("/{username}", include_in_schema=False)
+async def user_landing(username: str, request: Request):
+    session_user = _get_session_user(request)
+    if not session_user:
+        return RedirectResponse("/login", status_code=302)
+    if session_user != username:
+        return RedirectResponse(f"/{session_user}", status_code=302)
+
+    personas = list_user_personas(username)
+    if not personas:
+        return HTMLResponse("<h1>No personas configured.</h1>", status_code=404)
+
+    cards_html = ""
+    for p in personas:
+        emoji = "✨"
+        identity_path = persona_path(username, p) / "IDENTITY.md"
+        if identity_path.exists():
+            m = re.search(r"\|\s*Emoji\s*\|\s*(.+?)\s*\|", identity_path.read_text())
+            if m:
+                emoji = m.group(1).strip()
+        cards_html += (
+            f'<a href="/{username}/{p}" class="persona-card">'
+            f'<span class="p-emoji">{emoji}</span>'
+            f'<span class="p-name">{p.capitalize()}</span>'
+            f'</a>\n'
+        )
+
+    html = f"""<!DOCTYPE html>
+<html lang="en">
+<head>
+  <meta charset="UTF-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <title>Cortex — {username}</title>
+  <link rel="preconnect" href="https://fonts.googleapis.com">
+  <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
+  <link href="https://fonts.googleapis.com/css2?family=Inter:wght@100..900&display=swap" rel="stylesheet">
+  <style>
+    *, *::before, *::after {{ box-sizing: border-box; margin: 0; padding: 0; }}
+    body {{
+      min-height: 100vh;
+      display: flex;
+      align-items: center;
+      justify-content: center;
+      background: #1a1228;
+      font-family: 'Inter', system-ui, -apple-system, sans-serif;
+      font-weight: 450;
+      -webkit-font-smoothing: antialiased;
+      color: #e8e0f0;
+      padding: 2rem 1.5rem;
+    }}
+    .card {{
+      background: #221840;
+      border: 1px solid #3a2852;
+      border-radius: 14px;
+      padding: 2.5rem 2rem;
+      width: 100%;
+      max-width: 400px;
+      text-align: center;
+    }}
+    h1 {{ font-size: 1.3rem; font-weight: 700; color: #c4935a; margin-bottom: 0.4rem; }}
+    .sub {{ font-size: 0.82rem; color: #b0a2c8; margin-bottom: 2rem; }}
+    .personas {{ display: flex; flex-direction: column; gap: 0.75rem; }}
+    .persona-card {{
+      display: flex;
+      align-items: center;
+      gap: 1rem;
+      padding: 0.85rem 1.2rem;
+      background: #1a1228;
+      border: 1px solid #3a2852;
+      border-radius: 10px;
+      color: #e8e0f0;
+      text-decoration: none;
+      font-size: 1rem;
+      font-weight: 500;
+      transition: border-color 0.15s, background 0.15s;
+    }}
+    .persona-card:hover {{ border-color: #c4935a; background: #261d42; }}
+    .p-emoji {{ font-size: 1.6rem; line-height: 1; }}
+    .p-name  {{ color: #c4935a; font-weight: 600; }}
+    .settings-link {{
+      display: inline-block;
+      margin-top: 1.5rem;
+      font-size: 0.78rem;
+      color: #b0a2c8;
+      text-decoration: none;
+    }}
+    .settings-link:hover {{ color: #e8e0f0; }}
+  </style>
+</head>
+<body>
+  <div class="card">
+    <h1>Cortex</h1>
+    <p class="sub">Signed in as <strong>{username}</strong> — choose a persona</p>
+    <div class="personas">
+{cards_html}    </div>
+    <a href="/settings" class="settings-link">Account settings</a>
+  </div>
+</body>
+</html>"""
+    return HTMLResponse(html)
+
+
 # ---------------------------------------------------------------------------
 # Main UI — /{username}/{persona}
 # ---------------------------------------------------------------------------
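The persona picker reads each persona's emoji from a Markdown table row in `IDENTITY.md` that looks like `| Emoji | … |`, and falls back to ✨. The sketch below reuses the same regular expression. It also builds a card with `html.escape` applied to each interpolated value, which the inline f-string in `user_landing()` does not do. The sample table and the helper names `persona_emoji` and `persona_card` are illustrative only.

```python
import html
import re

# Same pattern as user_landing(): the cell after an "Emoji" cell in a table row.
EMOJI_ROW = re.compile(r"\|\s*Emoji\s*\|\s*(.+?)\s*\|")


def persona_emoji(identity_md: str, default: str = "✨") -> str:
    """Return the emoji from an IDENTITY.md table, or the default sparkle."""
    m = EMOJI_ROW.search(identity_md)
    return m.group(1).strip() if m else default


def persona_card(username: str, persona: str, emoji: str) -> str:
    """Render one picker card, escaping values before they reach the HTML."""
    href = html.escape(f"/{username}/{persona}", quote=True)
    return (
        f'<a href="{href}" class="persona-card">'
        f'<span class="p-emoji">{html.escape(emoji)}</span>'
        f'<span class="p-name">{html.escape(persona.capitalize())}</span>'
        "</a>"
    )


if __name__ == "__main__":
    sample = "| Field | Value |\n|---|---|\n| Name | Inara |\n| Emoji | 🌸 |\n"
    emoji = persona_emoji(sample)
    print(emoji)  # 🌸
    print(persona_card("scott", "inara", emoji))
```

The pattern is case-sensitive and only matches a cell spelled exactly `Emoji`. A file without that row still renders, using the default icon.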
--- a/cortex/scheduler.py
+++ b/cortex/scheduler.py
@@ -30,24 +30,28 @@ async def _run_short() -> None:

 async def _run_mid() -> None:
    from memory_distiller import distill_mid
+    from notification import notify
    try:
        result = await distill_mid()
        if "error" in result:
            logger.warning("auto distill mid skipped: %s", result["error"])
        else:
            logger.info("auto distill mid: %d chars via %s", result["chars_written"], result["backend"])
+            await notify(result["username"], f"📝 Weekly memory digest complete ({result['chars_written']} chars via {result['backend']}).")
    except Exception as e:
        logger.error("auto distill mid failed: %s", e)


 async def _run_long() -> None:
    from memory_distiller import distill_long
+    from notification import notify
    try:
        result = await distill_long()
        if "error" in result:
            logger.warning("auto distill long skipped: %s", result["error"])
        else:
            logger.info("auto distill long: %d chars via %s", result["chars_written"], result["backend"])
+            await notify(result["username"], f"🧠 Monthly long-term memory integration complete ({result['chars_written']} chars via {result['backend']}). Worth a quick review.")
    except Exception as e:
        logger.error("auto distill long failed: %s", e)

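Both scheduled jobs above now follow the same pattern. If the result carries an `"error"` key, they log and skip; otherwise they log and notify the owning user. Because `notify()` runs inside the same `try`, a notification failure is logged as `auto distill mid failed` even though the memory file was already written. The same happens when the result has no `"username"` key. The sketch below is a shared helper that keeps those cases apart. `run_distill_job` is a hypothetical name, and `distill` and `notify` are stand-ins with the call shapes used in the diff.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger("scheduler")


async def run_distill_job(
    label: str,
    distill: Callable[[], Awaitable[dict]],
    notify: Callable[[str, str], Awaitable[None]],
    message: Callable[[dict], str],
) -> Optional[dict]:
    """Run one distillation step and notify its owner only on success."""
    try:
        result = await distill()
    except Exception as e:
        # The distillation itself failed: same log line the scheduler uses today.
        logger.error("auto distill %s failed: %s", label, e)
        return None

    if "error" in result:
        logger.warning("auto distill %s skipped: %s", label, result["error"])
        return result

    logger.info("auto distill %s: %d chars via %s",
                label, result["chars_written"], result["backend"])

    try:
        # Separate try: a missing "username" key or a notifier outage is
        # reported as a notification problem, not as a failed distillation.
        await notify(result["username"], message(result))
    except Exception as e:
        logger.warning("auto distill %s succeeded but notify failed: %s", label, e)
    return result


if __name__ == "__main__":
    async def _demo_distill() -> dict:
        return {"chars_written": 1200, "backend": "claude", "username": "scott"}

    async def _demo_notify(user: str, text: str) -> None:
        print(f"notify {user}: {text}")

    logging.basicConfig(level=logging.INFO)
    asyncio.run(run_distill_job(
        "mid", _demo_distill, _demo_notify,
        lambda r: f"Weekly memory digest complete ({r['chars_written']} chars via {r['backend']}).",
    ))
```

Wired in, `_run_mid` would reduce to roughly `await run_distill_job("mid", distill_mid, notify, lambda r: ...)`. `_run_long` would differ only in its label and message.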
--- a/cortex/static/HELP.md
+++ b/cortex/static/HELP.md
@@ -0,0 +1,262 @@
+# Cortex UI — Help & Reference
+
+<!-- SHARED BASE: cortex/static/HELP.md
+     This file is served to all users regardless of persona.
+     Persona-specific additions live in home/{username}/persona/{name}/HELP.md
+     and are appended automatically by help.html when present.
+-->
+
+*Last updated: 2026-03-27*
+
+---
+
+## Header Controls
+
+| Button | What it does |
+|---|---|
+| **Sessions** | Open the sessions panel — list, resume, or start sessions |
+| **Files** | Open the identity file editor (SOUL, MEMORY, etc.) |
+| **⚙ N** | Open the Settings panel (N = current context tier) |
+| **?** | Open this help panel |
+
+The **⚙ Settings** panel contains all configuration options:
+
+| Section | Controls |
+|---|---|
+| **Context Tier** | T1 – T4 context depth |
+| **Memory Layers** | Toggle Long / Mid / Short memory on/off |
+| **Distill Memory** | Manually trigger short / mid / long / all distillation |
+| **Backend** | Active LLM backend; click to cycle claude → gemini → local |
+| **Display** | Aa/A+/A− font size cycle · ☾/☀ theme toggle |
+
+All header settings (theme, font size, tier, memory layers) persist in `localStorage` across page refreshes.
+
+---
+
+## Chat
+
+- **Send:** `Ctrl+Enter` by default. Click `⌃↵` in the input controls to toggle to plain `Enter` mode.
+- **Stop:** Click **Stop** to cancel an in-progress response at any time.
+- **Edit a message:** Hover over any message → click **edit**. `Ctrl+Enter` saves, `Esc` cancels.
+- **Delete a message:** Hover over any message → click **del**. Removes from session history.
+- **Copy a response:** Hover over any assistant message → click **copy**.
+- **New line while typing:** `Shift+Enter` in either mode (plain `Enter` also adds a new line in `Ctrl+Enter` mode).
+
+---
+
+## Agent Mode
+
+Choose **Agent** from the input-mode selector to enable Agent mode. The selector highlights and Send changes to **Run**.
+
+In Agent mode, messages are routed through the **orchestrator** instead of directly to Claude:
+
+1. **Gemini** runs a tool loop — searches the web, reads files, checks tasks, calls APIs as needed
+2. **Claude** receives the enriched context and writes the final response
+3. A `⚡ N tool calls: …` note appears below the response listing what was used
+
+Agent mode is best for tasks that require research, multi-step reasoning, or tool use (e.g. "search for X", "add a task", "what's on my list?"). Regular chat is faster for conversational turns.
+
+Agent mode sessions persist to history exactly like regular chat — they survive page refreshes and appear in the Sessions panel.
+
+---
+
+## Sessions
+
+Sessions are named conversation threads that persist across page refreshes.
+
+- Click **Sessions** → **+ New** to start a fresh session.
+- Click any listed session to resume it — full history loads instantly.
+- Sessions from Nextcloud Talk appear as `nct_*` prefixed IDs.
+- A blue **●** badge appears on the Sessions button when Talk activity arrives in a session you're not currently viewing.
+
+---
+
+## Notes
+
+Notes are injected into a session without triggering an LLM response.
+
+- Choose **Note** from the input-mode selector to enter note mode. The input border changes colour.
+- **Private note** (amber border) — visible only in the UI, never sent to the LLM.
+- **Context note** (teal border) — persisted to session history so the LLM sees it on the next turn. Useful for nudging context without a full message.
+- Click the `private / public` label to switch between a private note and a context (public) note.
+
+---
+
+## Backends
+
+- **Claude CLI** and **Gemini CLI** are both available. One is primary, the other is fallback.
+- Click **⚙** → **Backend** to cycle the primary through `claude`, `gemini`, and `local`.
+- If the primary fails or times out, the fallback is used automatically. A **⚡** notice appears in the chat when this happens.
+- Timeouts: Claude 60s, Gemini 120s.
+
+---
+
+## Nextcloud Talk Bot
+
+Inara is registered as a bot in Nextcloud Talk.
+
+- Messages sent in enabled Talk conversations are received by Cortex, processed, and replied to by Inara.
+- The webhook returns `200 OK` immediately; the LLM call and reply happen asynchronously.
+- Real-time updates stream to the web UI via SSE — you see Talk messages and responses appear live.
+- To enable the bot in a conversation: open Talk conversation settings → Bots → enable Inara.
+
+---
+
+## Google Chat Bot
+
+Inara is available as a bot in Google Chat (One Sky IT Workspace).
+
+- Send Inara a direct message in Google Chat to start a conversation.
+- Each DM thread is its own session (`gc_spaces/*` prefix) — history persists across messages.
+- Responses are synchronous — Google Chat displays Inara's reply directly in the thread.
+- To add Inara to a space: open the space, add a person/app, search for **Inara**.
+- Sessions from Google Chat appear as `gc_*` prefixed IDs in the Sessions panel.
+
+**Technical note:** Cortex uses Google's Workspace Add-on format (`hostAppDataAction`) — the modern API required for all Google Chat apps as of 2025.
+
+---
+
+## Files (Identity Editor)
+
+The **Files** button opens an editor for Inara's identity and memory files:
+
+| File | Purpose |
+|---|---|
+| `SOUL.md` | Core personality, values, and voice |
+| `IDENTITY.md` | Role, capabilities, and context |
+| `USER.md` | Scott's profile, preferences, and history |
+| `PROTOCOLS.md` | Behavioural rules and communication protocols |
+| `CONTEXT_TIERS.md` | Defines what gets loaded at each context tier |
+| `MEMORY_LONG.md` | Permanent curated long-term memory |
+| `MEMORY_MID.md` | Rolling mid-term digest (LLM-distilled) |
+| `MEMORY_SHORT.md` | Recent session rollup (auto-aggregated) |
+| `TASKS.json` | Inara's personal task list (managed via Agent mode) |
+| `HELP.md` | Persona-specific help, appended to this page when present |
+
+Toggle **preview** / **edit** to switch between rendered markdown and raw text. **Ctrl+S** saves, **Esc** closes.
+
+---
+
+## Context & Memory (⚙ panel)
+
+### Context Tiers
+
+Controls how much context is prepended to each LLM call:
+
+| Tier | Loads | ~Tokens |
+|---|---|---|
+| **T1** | SOUL + IDENTITY + USER summary | ~1,500 |
+| **T2** | + USER full + PROTOCOLS + HELP + memory layers | ~5,000 |
+| **T3** | + last 2 raw session logs | ~15,000 |
+| **T4** | + last 7 raw session logs | ~50,000 |
+
+Default is T2. Use T1 for small/local models. Use T3–T4 for complex multi-session tasks.
+
+### Memory Layers
+
+Three independently toggleable memory files, loaded **Long → Mid → Short** (short sits closest to the conversation turn for better LLM recall):
+
+| Layer | File | Contents |
+|---|---|---|
+| **Long** | `MEMORY_LONG.md` | Permanent facts — origin, key decisions, Scott's profile highlights |
+| **Mid** | `MEMORY_MID.md` | Rolling digest of recent weeks — LLM-distilled from Short |
+| **Short** | `MEMORY_SHORT.md` | Recent session rollup — auto-aggregated from session log files |
+
+Toggle any layer off to save tokens for a focused conversation where history isn't needed.
+
+### Memory Distillation
+
+Distillation builds up the memory layers from raw session logs. The scheduler runs it automatically (see `GET /distill/status` for next run times). You can also trigger any step manually from the ⚙ panel:
+
+| Button | What it does |
+|---|---|
+| **short** | Rolls recent session log files → `MEMORY_SHORT.md` (fast, no LLM) |
+| **mid** | LLM summarizes `MEMORY_SHORT.md` → `MEMORY_MID.md` |
+| **long** | LLM integrates `MEMORY_MID.md` → `MEMORY_LONG.md` |
+| **all** | Runs short → mid → long in sequence |
+
+**Recommended workflow:**
+- Run **short** after any productive session to capture it.
+- **mid** runs weekly and **long** runs monthly on the scheduler, and you are notified when each finishes. Run them manually to refresh sooner.
+- Review `MEMORY_LONG.md` after each long run.
+
+Token budgets for each layer are set in `.env` (`MEMORY_BUDGET_LONG`, `MEMORY_BUDGET_MID`, `MEMORY_BUDGET_SHORT`).
+
+---
+
+## Keyboard Shortcuts
+
+| Keys | Action |
+|---|---|
+| `Ctrl+Enter` | Send message (default mode) |
+| `Enter` | Send (when in Enter mode) |
+| `Shift+Enter` | New line in message input |
+| `Ctrl+Enter` | Save inline message edit |
+| `Esc` | Cancel inline edit |
+| `Ctrl+S` | Save file (Files modal) |
+| `Esc` | Close any open modal |
+
+---
+
+## API Reference
+
+For direct access or scripting:
+
+| Method | Endpoint | Description |
+|---|---|---|
+| `POST` | `/chat` | Send a message — returns SSE stream |
+| `GET` | `/backend` | Get current primary/fallback backends |
+| `POST` | `/backend` | Set primary backend (`{"primary": "claude"}`) |
+| `GET` | `/sessions` | List all sessions |
+| `GET` | `/history/{id}` | Get session message history |
+| `PUT` | `/history/{id}` | Replace full session history |
+| `GET` | `/events` | SSE stream for real-time Talk activity |
+| `POST` | `/note` | Inject a context note into a session |
+| `GET` | `/files` | List identity files |
+| `GET` | `/files/{name}` | Read a file |
+| `PUT` | `/files/{name}` | Write a file |
+| `POST` | `/distill/short` | Aggregate session logs → MEMORY_SHORT |
+| `POST` | `/distill/mid` | Summarize short → MEMORY_MID (LLM) |
+| `POST` | `/distill/long` | Integrate mid → MEMORY_LONG (LLM) |
+| `POST` | `/distill/all` | Run all three distillation steps |
+| `GET` | `/distill/status` | Show scheduler status and next run times |
+| `POST` | `/orchestrate` | Submit an agent task — returns `{"job_id": "..."}` |
+| `GET` | `/orchestrate/{job_id}` | Poll job status and result |
+| `GET` | `/orchestrate` | List all jobs from current session (in-memory) |
+| `GET` | `/health` | Health check — returns `{"status": "ok"}` |
+
+Chat request body (`POST /chat`):
+```json
+{
+  "message": "string",
+  "session_id": "string | null",
+  "tier": 1,
+  "model": "claude | gemini | null",
+  "include_long": true,
+  "include_mid": true,
+  "include_short": true
+}
+```
+
+---
+
+## In Progress / Planned
+
+- **Ollama local model backend** — direct Ollama API support (no CLI wrapper); target host: scott_gaming via WireGuard
+- **Nextcloud Talk stabilization** — test end-to-end after restarts; complete bot registration docs
+- **Multi-user support** — per-user identity/memory files; currently single-user (Scott); Holly instance planned
+
+### Recently Completed
+
+- ✓ **Google Chat bot** — Workspace Add-on integration; DM and spaces; JWT verification; session persistence
+- ✓ **Agent mode** — Gemini tool loop + Claude responder, accessible via UI toggle
+- ✓ **Personal task management** — `task_list`, `task_create`, `task_update`, `task_complete` tools backed by `TASKS.json`
+- ✓ **Web search fixed** — DDG package updated (`ddgs`); `WebSearch`/`WebFetch` allowed for Claude CLI fallback
+- ✓ **Session persistence for orchestrator** — agent mode turns now survive page refresh
+- ✓ **Systemd user service** — Cortex runs as a user service; no sudo required (`systemctl --user restart cortex`)
+- ✓ **OAuth token warning banner** — amber banner when Claude CLI token is within 24h of expiry
+
+---
+
+*Cortex is Scott's personal AI orchestration system. Inara is its primary resident agent.*
+*Built on FastAPI + Claude CLI + Gemini CLI. Named after Firefly.*
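The API reference in HELP.md above gives the `POST /chat` request body and says the endpoint returns an SSE stream, but it does not say how to read that stream. The sketch below is a standard-library client. It assumes the server accepts the request without further authentication. The web UI also sends user/persona query parameters and may rely on a session cookie; both are omitted here. Each event's payload is kept as a raw string, because its format is not documented in this diff. `chat_body`, `sse_data`, and `stream_chat` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Iterable, Iterator, List, Optional


def chat_body(message: str, session_id: Optional[str] = None, tier: int = 2,
              model: Optional[str] = None) -> dict:
    """Build a POST /chat body using the fields from the API reference."""
    return {
        "message": message,
        "session_id": session_id,   # None serialises to null, as in the reference
        "tier": tier,               # 1-4; the UI default is T2
        "model": model,             # "claude", "gemini", or None
        "include_long": True,
        "include_mid": True,
        "include_short": True,
    }


def sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each Server-Sent Event.

    Handles the common subset of SSE framing: consecutive "data:" lines are
    joined with newlines, a blank line dispatches the event, comment lines
    (":") and other fields are ignored, and an unterminated event at the end
    of the stream is dropped.
    """
    buf: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if buf:
                yield "\n".join(buf)
                buf = []
        elif line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):   # strip exactly one leading space
                value = value[1:]
            buf.append(value)


def stream_chat(base_url: str, body: dict) -> Iterator[str]:
    """POST to /chat and yield raw SSE data payloads as they arrive."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Accept": "text/event-stream"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # The response object iterates over raw byte lines.
        yield from sse_data(line.decode("utf-8") for line in resp)
```

For example: `for chunk in stream_chat(BASE_URL, chat_body("what's on my list?", tier=1)): print(chunk)`, where `BASE_URL` is wherever your Cortex instance listens.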
--- a/cortex/static/app.js
+++ b/cortex/static/app.js
@@ -16,6 +16,50 @@
        const note_vis_btn_el    = document.getElementById('note-vis-btn');
        const settings_btn_el    = document.getElementById('settings-btn');
        const settings_dd_el     = document.getElementById('settings-dropdown');
+        const sessionsBackdrop   = document.getElementById('sessions-backdrop');
+
+        // ── Close all panels/dropdowns (mutual exclusion) ─────────────
+        function closeAllPanels() {
+            if (mode_dropdown_el)  mode_dropdown_el.classList.remove('open');
+            if (settings_dd_el)    settings_dd_el.classList.remove('open');
+            if (sessionsPanel)     { sessionsPanel.classList.remove('open'); sessionsBackdrop.classList.remove('open'); }
+            const pd = document.getElementById('persona-dropdown');
+            if (pd) pd.classList.remove('open');
+        }
+
+        // ── Toasts ────────────────────────────────────────────────────
+        const toastContainer = document.getElementById('toast-container');
+
+        function showToast(message, type = 'info', duration = 2500) {
+            const el = document.createElement('div');
+            el.className = 'toast' + (type !== 'info' ? ' ' + type : '');
+            el.textContent = message;
+            toastContainer.appendChild(el);
+            requestAnimationFrame(() => {
+                requestAnimationFrame(() => el.classList.add('show'));
+            });
+            setTimeout(() => {
+                el.classList.remove('show');
+                el.addEventListener('transitionend', () => el.remove(), { once: true });
+            }, duration);
+        }
+
+        // ── Syntax highlighting ───────────────────────────────────────
+        function highlight_code(container) {
+            if (typeof hljs === 'undefined') return;
+            container.querySelectorAll('pre code').forEach(el => hljs.highlightElement(el));
+        }
+
+        // ── Utility helpers ───────────────────────────────────────────
+        function _esc(s) {
+            return String(s).replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;').replace(/"/g,'&quot;');
+        }
+
+        // ── Lucide icon helpers ───────────────────────────────────────
+        function icon_html(name, size = 16) {
+            return `<svg data-lucide="${name}" width="${size}" height="${size}" class="btn-icon"></svg>`;
+        }
+        function render_icons() { if (window.lucide) lucide.createIcons(); }

        // User/persona injected by the server at /{user}/{persona}
        const CORTEX_USER    = (window.CORTEX_CONFIG || {}).user    || 'scott';
@@ -26,12 +70,50 @@

        if (headerEmoji) headerEmoji.textContent = CORTEX_EMOJI;

+        // Set favicon to persona emoji
+        {
+            const favicon = document.querySelector("link[rel='icon']");
+            if (favicon && CORTEX_EMOJI) {
+                const svg = `<svg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 100 100'><text y='.9em' font-size='90'>${CORTEX_EMOJI}</text></svg>`;
+                favicon.href = `data:image/svg+xml,${encodeURIComponent(svg)}`;
+            }
+        }
+
+        // Wire help link to preserve current persona on return
+        const helpLink = document.getElementById('help-link');
+        if (helpLink) helpLink.href = `/help?persona=${encodeURIComponent(CORTEX_PERSONA)}`;
+
        let sessionId        = null;
        let primaryBackend   = 'claude';
        let activeController = null;
        let currentHistory   = [];  // mirrors backend session [{role, content}, ...]
        let talkThinkingDiv  = null; // pending "thinking…" bubble for live Talk updates

+        // ── Session persistence ───────────────────────────────────────
+        // Survives page navigation (help, settings, etc.) within the same browser.
+        // Expires after SESSION_TTL_MS of inactivity.
+        const SESSION_TTL_MS  = 30 * 60 * 1000;  // 30 minutes
+        const _sid_key        = `cx_sid_${CORTEX_USER}_${CORTEX_PERSONA}`;
+        const _sid_ts_key     = `cx_sid_ts_${CORTEX_USER}_${CORTEX_PERSONA}`;
+
+        function persist_session() {
+            if (!sessionId) return;
+            localStorage.setItem(_sid_key, sessionId);
+            localStorage.setItem(_sid_ts_key, String(Date.now()));
+        }
+
+        function clear_stored_session() {
+            localStorage.removeItem(_sid_key);
+            localStorage.removeItem(_sid_ts_key);
+        }
+
+        function get_stored_session() {
+            const id = localStorage.getItem(_sid_key);
+            const ts = parseInt(localStorage.getItem(_sid_ts_key) || '0', 10);
+            if (!id || Date.now() - ts > SESSION_TTL_MS) return null;
+            return id;
+        }
+
        // ── Enter toggle ─────────────────────────────────────────────
        // Default: Ctrl+Enter sends. Stored in localStorage.
        let ctrlEnterMode = localStorage.getItem('ctrlEnterSend') !== 'false';
@@ -69,12 +151,17 @@

        // ── Input mode — dropdown select with MRU ordering ──────────
        const MODES = {
-            chat:  { icon: '💬', label: 'Chat' },
-            note:  { icon: '📝', label: 'Note' },
-            otr:   { icon: '🔒', label: 'OTR' },
-            agent: { icon: '🥸', label: 'Agent' },
+            chat:  { icon: 'message-circle', label: 'Chat' },
+            note:  { icon: 'pencil',         label: 'Note' },
+            otr:   { icon: 'lock',           label: 'OTR'  },
+            agent: { icon: 'bot',            label: 'Agent' },
+        };
+        const send_defs = {
+            chat:  { icon: 'arrow-up', label: 'Send' },
+            note:  { icon: 'pencil',   label: 'Note' },
+            otr:   { icon: 'arrow-up', label: 'Send' },
+            agent: { icon: 'zap',      label: 'Run'  },
        };
-        const send_labels = { chat: '↑ Send', note: '📝 Note', otr: '↑ Send', agent: '⚡ Run' };

        let current_mode = localStorage.getItem('current_mode') || 'chat';
        let note_public  = false;
@@ -96,6 +183,7 @@
        }

        function open_mode_dropdown() {
+            closeAllPanels();
            // Build options in MRU order (least recent at top, most recent at bottom)
            // — bottom is visually closest to the button since dropdown opens upward
            const ordered = [...mode_mru].reverse();
@@ -105,12 +193,13 @@
                const btn = document.createElement('button');
                btn.className = 'mode-option' + (mode === current_mode ? ' current' : '');
                btn.innerHTML =
-                    `<span class="opt-icon">${m.icon}</span>${m.label}`
+                    `<span class="opt-icon">${icon_html(m.icon, 15)}</span>${m.label}`
                    + (mode === current_mode ? '<span class="opt-check">✓</span>' : '');
                btn.addEventListener('click', () => set_mode(mode));
                mode_dropdown_el.appendChild(btn);
            });
            mode_dropdown_el.classList.add('open');
+            render_icons();
        }

        function close_mode_dropdown() {
@@ -130,10 +219,11 @@
        });

        function update_mode_ui() {
-            const m = MODES[current_mode];
+            const m  = MODES[current_mode];
+            const sd = send_defs[current_mode] || send_defs.chat;

            // Update trigger button
-            mode_icon_el.textContent  = m.icon;
+            mode_icon_el.innerHTML    = icon_html(m.icon, 15);
            mode_label_el.textContent = m.label;
            mode_select_btn_el.className = current_mode === 'chat'
                ? '' : `mode-${current_mode}`;
@@ -150,9 +240,10 @@
            inputEl.classList.toggle('mode-otr',   current_mode === 'otr');
            inputEl.classList.toggle('mode-agent', current_mode === 'agent');

-            // Send button label
-            sendBtn.textContent = send_labels[current_mode] || 'Send';
+            // Send button label + icon
+            sendBtn.innerHTML = icon_html(sd.icon) + ' ' + sd.label;

+            render_icons();
            updateInputPlaceholder();
        }

@@ -184,7 +275,9 @@
        // ── Settings dropdown ─────────────────────────────────────────
        settings_btn_el.addEventListener('click', (e) => {
            e.stopPropagation();
-            settings_dd_el.classList.toggle('open');
+            const isOpen = settings_dd_el.classList.contains('open');
+            closeAllPanels();
+            if (!isOpen) settings_dd_el.classList.add('open');
        });
        document.addEventListener('click', (e) => {
            if (!settings_dd_el.contains(e.target) && e.target !== settings_btn_el) {
@@ -238,7 +331,9 @@
        if (personaSwitcher) {
            personaSwitcher.addEventListener('click', (e) => {
                if (personaDropEl.children.length === 0) return;
-                personaDropEl.classList.toggle('open');
+                const isOpen = personaDropEl.classList.contains('open');
+                closeAllPanels();
+                if (!isOpen) personaDropEl.classList.add('open');
                e.stopPropagation();
            });
            document.addEventListener('click', () => personaDropEl.classList.remove('open'));
@@ -246,23 +341,40 @@

        // ── Backend toggle ───────────────────────────────────────────

-        fetch('/backend').then(r => r.json()).then(d => setBackendUI(d.primary));
+        fetch('/backend').then(r => r.json()).then(d => setBackendUI(d));

-        function setBackendUI(backend) {
+        const BACKEND_CYCLE = ['claude', 'gemini', 'local'];
+        const BACKEND_CLASS = { claude: '', gemini: 'mem-on', local: 'local-on' };
+        const backendModelHint = document.getElementById('backend-model-hint');
+
+        function setBackendUI(d) {
+            const backend = d.primary || d;  // accept full response obj or bare string
            primaryBackend = backend;
            backendToggle.textContent = backend;
-            backendToggle.className = 'ctx-btn' + (backend === 'gemini' ? ' mem-on' : '');
+            const extra = BACKEND_CLASS[backend] || '';
+            backendToggle.className = 'ctx-btn' + (extra ? ' ' + extra : '');
+
+            if (backendModelHint) {
+                if (backend === 'local' && d.local_model) {
+                    backendModelHint.textContent = d.local_model.label || d.local_model.model_name;
+                    backendModelHint.style.display = '';
+                } else {
+                    backendModelHint.textContent = '';
+                    backendModelHint.style.display = 'none';
+                }
+            }
        }

        backendToggle.addEventListener('click', async () => {
-            const next = primaryBackend === 'claude' ? 'gemini' : 'claude';
+            const idx = BACKEND_CYCLE.indexOf(primaryBackend);
+            const next = BACKEND_CYCLE[(idx + 1) % BACKEND_CYCLE.length];
            const res = await fetch('/backend', {
                method: 'POST',
                headers: { 'Content-Type': 'application/json' },
                body: JSON.stringify({ primary: next }),
            });
            const d = await res.json();
-            setBackendUI(d.primary);
+            setBackendUI(d);
            addMessage('system', `Backend: ${d.primary} (fallback: ${d.fallback})`);
        });

@@ -272,17 +384,26 @@
            e.stopPropagation();
            if (sessionsPanel.classList.contains('open')) {
                sessionsPanel.classList.remove('open');
+                sessionsBackdrop.classList.remove('open');
                return;
            }
+            closeAllPanels();
            const res = await fetch(`/sessions?${_fileParams}`);
            const data = await res.json();
            renderPanel(data.sessions);
            sessionsPanel.classList.add('open');
+            sessionsBackdrop.classList.add('open');
+        });
+
+        sessionsBackdrop.addEventListener('click', () => {
+            sessionsPanel.classList.remove('open');
+            sessionsBackdrop.classList.remove('open');
        });

        document.addEventListener('click', (e) => {
            if (!sessionsPanel.contains(e.target) && e.target !== sessionsBtn) {
                sessionsPanel.classList.remove('open');
+                sessionsBackdrop.classList.remove('open');
            }
        });

@@ -296,11 +417,13 @@
            const newItem = makeItem('new', '+ New session', '');
            newItem.addEventListener('click', () => {
                sessionId = null;
+                clear_stored_session();
                currentHistory = [];
                messagesEl.innerHTML = '';
                sessionEl.textContent = '';
                addMessage('system', 'New session');
                sessionsPanel.classList.remove('open');
+                sessionsBackdrop.classList.remove('open');
                inputEl.focus();
            });
            sessionsPanel.appendChild(newItem);
@@ -355,6 +478,7 @@
                        if (sessionId === s.session_id) {
                            sessionEl.textContent = `session: ${newName || s.session_id}`;
                        }
+                        if (newName) showToast('Session renamed', 'success');
                    }

                    input.addEventListener('keydown', (e) => {
@@ -374,10 +498,11 @@
                    await fetch(`/sessions/${s.session_id}?${_fileParams}`, { method: 'DELETE' });
                    if (sessionId === s.session_id) {
                        sessionId = null;
+                        clear_stored_session();
                        currentHistory = [];
                        messagesEl.innerHTML = '';
                        sessionEl.textContent = '';
-                        addMessage('system', 'Session deleted');
+                        showToast('Session deleted');
                    }
                    const res = await fetch(`/sessions?${_fileParams}`);
                    const data = await res.json();
@@ -407,10 +532,11 @@
            return item;
        }

-        async function resumeSession(id) {
+        async function resumeSession(id, silent = false) {
            talkThinkingDiv = null;
            if (id && id.startsWith('nct_')) sessionsBtn.classList.remove('talk-badge');
            const res = await fetch(`/history/${id}?${_fileParams}`);
+            if (!res.ok) throw new Error(`HTTP ${res.status}`);
            const data = await res.json();

            messagesEl.innerHTML = '';
@@ -426,10 +552,12 @@
                attachHistoryControls(msgDiv, i);
            }

-            addMessage('system', `Resumed session ${id}`);
+            if (!silent) addMessage('system', `Resumed session ${id}`);
            scrollToBottom();
            sessionsPanel.classList.remove('open');
+            sessionsBackdrop.classList.remove('open');
            inputEl.focus();
+            persist_session();
        }

        function timeAgo(iso) {
@@ -473,6 +601,7 @@
            if (role === 'assistant' && typeof marked !== 'undefined') {
                div.dataset.raw = text;
                div.innerHTML = marked.parse(text);
+                highlight_code(div);
                div.querySelectorAll('a').forEach(a => {
                    a.target = '_blank';
                    a.rel = 'noopener noreferrer';
@@ -488,7 +617,9 @@
                div.appendChild(label);
                div.appendChild(content);
            } else {
+                div.dataset.raw = text;
                div.textContent = text;
+                div.appendChild(makeCopyBtn(div));
            }

            // Wrap user/assistant messages so action buttons can be attached
@@ -523,20 +654,21 @@

            const editBtn = document.createElement('button');
            editBtn.className = 'msg-act-btn';
-            editBtn.textContent = 'edit';
+            editBtn.innerHTML = icon_html('pencil', 12) + ' edit';
            editBtn.addEventListener('click', () => {
                startEdit(msgDiv);
            });

            const delBtn = document.createElement('button');
            delBtn.className = 'msg-act-btn del';
-            delBtn.textContent = 'del';
+            delBtn.innerHTML = icon_html('trash-2', 12) + ' del';
            delBtn.addEventListener('click', () => {
                deleteMsg(wrapper);
            });

            actionsDiv.appendChild(editBtn);
            actionsDiv.appendChild(delBtn);
+            render_icons();
        }

        // After any currentHistory splice, renumber all wrapper data-hist-idx attributes.
@@ -569,17 +701,18 @@
            ta.rows = Math.min(originalText.split('\n').length + 1, 12);

            const saveBtn   = document.createElement('button');
-            saveBtn.textContent = 'Save';
-            saveBtn.className   = 'edit-save-btn';
+            saveBtn.innerHTML = icon_html('check', 13) + ' Save';
+            saveBtn.className = 'edit-save-btn';

            const cancelBtn = document.createElement('button');
-            cancelBtn.textContent = 'Cancel';
-            cancelBtn.className   = 'edit-cancel-btn';
+            cancelBtn.innerHTML = icon_html('x', 13) + ' Cancel';
+            cancelBtn.className = 'edit-cancel-btn';

            const btnRow = document.createElement('div');
            btnRow.className = 'edit-btns';
            btnRow.appendChild(saveBtn);
            btnRow.appendChild(cancelBtn);
+            render_icons();

            msgDiv.innerHTML = '';
            msgDiv.appendChild(ta);
@@ -641,6 +774,7 @@
            if (role === 'assistant' && typeof marked !== 'undefined') {
                div.dataset.raw = text;
                div.innerHTML = marked.parse(text);
+                highlight_code(div);
                div.querySelectorAll('a').forEach(a => {
                    a.target = '_blank';
                    a.rel = 'noopener noreferrer';
@@ -651,10 +785,81 @@
            }
        }

+        // ── Agent tool-call step cards ────────────────────────────────
+        function renderToolCalls(toolCalls, beforeEl) {
+            if (!toolCalls || toolCalls.length === 0) return;
+
+            const container = document.createElement('div');
+            container.className = 'tool-calls-container';
+
+            for (const tc of toolCalls) {
+                const details = document.createElement('details');
+                details.className = 'tool-call';
+
+                // Summary: name + first arg value snippet
+                const args    = tc.args || {};
+                const argKeys = Object.keys(args);
+                let argSnippet = '';
+                if (argKeys.length > 0) {
+                    const firstVal = String(args[argKeys[0]]);
+                    argSnippet = firstVal.length > 60 ? firstVal.slice(0, 60) + '…' : firstVal;
+                }
+
+                const summary = document.createElement('summary');
+                const nameSpan = document.createElement('span');
+                nameSpan.className = 'tc-name';
+                nameSpan.textContent = tc.tool;
+                summary.appendChild(nameSpan);
+                if (argSnippet) {
+                    const snippetSpan = document.createElement('span');
+                    snippetSpan.className = 'tc-snippet';
+                    snippetSpan.textContent = argSnippet;
+                    summary.appendChild(snippetSpan);
+                }
+                details.appendChild(summary);
+
+                // Expanded body
+                const body = document.createElement('div');
+                body.className = 'tc-body';
+
+                if (argKeys.length > 0) {
+                    const sec = document.createElement('div');
+                    sec.className = 'tc-section';
+                    const lbl = document.createElement('span');
+                    lbl.className = 'tc-label';
+                    lbl.textContent = 'args';
+                    const pre = document.createElement('pre');
+                    pre.textContent = JSON.stringify(args, null, 2);
+                    sec.appendChild(lbl);
+                    sec.appendChild(pre);
+                    body.appendChild(sec);
+                }
+
+                const resultStr  = tc.result || '';
+                const truncated  = resultStr.length > 400;
+                const sec2 = document.createElement('div');
+                sec2.className = 'tc-section';
+                const lbl2 = document.createElement('span');
+                lbl2.className = 'tc-label';
+                lbl2.textContent = 'result';
+                const pre2 = document.createElement('pre');
+                pre2.textContent = truncated ? resultStr.slice(0, 400) + '\n…[truncated]' : resultStr;
+                sec2.appendChild(lbl2);
+                sec2.appendChild(pre2);
+                body.appendChild(sec2);
+
+                details.appendChild(body);
+                container.appendChild(details);
+            }
+
+            beforeEl.parentElement.insertBefore(container, beforeEl);
+        }
+
        function makeCopyBtn(div) {
            const btn = document.createElement('button');
            btn.className = 'copy-btn';
-            btn.textContent = 'copy';
+            btn.innerHTML = icon_html('copy', 12) + ' copy';
+            render_icons();
            btn.addEventListener('click', (e) => {
                e.stopPropagation();
                const text = div.dataset.raw || '';
@@ -663,11 +868,14 @@
                } else {
                    fallbackCopy(text);
                }
-                btn.textContent = '✓';
+                showToast('Copied to clipboard', 'success', 1800);
+                btn.innerHTML = icon_html('check', 12) + ' copied';
+                render_icons();
                btn.classList.add('copied');
                setTimeout(() => {
-                    btn.textContent = 'copy';
+                    btn.innerHTML = icon_html('copy', 12) + ' copy';
                    btn.classList.remove('copied');
+                    render_icons();
                }, 1500);
            });
            return btn;
@@ -701,7 +909,7 @@
                });
                if (!res.ok) throw new Error(`HTTP ${res.status}`);
            } catch (err) {
-                addMessage('system', `Note save failed: ${err.message}`);
+                showToast(`Note save failed: ${err.message}`, 'error');
            }
        }

@@ -716,7 +924,7 @@
            inputEl.value = '';
            syncHeight();
            sendBtn.style.display = 'none';
-            stopBtn.style.display = 'block';
+            stopBtn.style.display = 'flex';
            headerEmoji.classList.add('processing');

            activeController = new AbortController();
@@ -741,6 +949,7 @@
                        include_mid: memMid,
                        include_short: memShort,
                        off_record: current_mode === 'otr',
+                        model: primaryBackend,
                        user: CORTEX_USER,
                        persona: CORTEX_PERSONA,
                    }),
@@ -770,15 +979,21 @@
                        if (data.type === 'response') {
                            sessionId = data.session_id;
                            sessionEl.textContent = `session: ${sessionId}`;
+                            persist_session();
                            thinkingDiv.className = 'message assistant';
                            setMessageText(thinkingDiv, 'assistant', data.response);
                            const assistHistIdx = currentHistory.length;
                            currentHistory.push({ role: 'assistant', content: data.response });
                            attachHistoryControls(thinkingDiv, assistHistIdx);
-                            if (data.fallback_used) {
-                                addMessage('system',
-                                    `⚡ ${primaryBackend} unavailable — answered by ${data.backend}`);
-                            }
+
+                            // Model tag — always shown, amber if fallback was used
+                            const modelTag = document.createElement('div');
+                            modelTag.className = 'model-tag' + (data.fallback_used ? ' fallback' : '');
+                            const label = data.backend_label || data.backend || '';
+                            modelTag.textContent = data.fallback_used
+                                ? `⚡ fallback → ${label}`
+                                : label;
+                            thinkingDiv.appendChild(modelTag);
                        } else if (data.type === 'error') {
                            throw new Error(data.message);
                        }
@@ -808,7 +1023,7 @@
            inputEl.value = '';
            syncHeight();
            sendBtn.style.display = 'none';
-            stopBtn.style.display = 'block';
+            stopBtn.style.display = 'flex';
            headerEmoji.classList.add('processing');

            activeController = new AbortController();
@@ -870,6 +1085,7 @@
                if (job.session_id) {
                    sessionId = job.session_id;
                    sessionEl.textContent = `session: ${sessionId}`;
+                    persist_session();
                }

                const userHistIdx = currentHistory.length - 1; // pushed before fetch
@@ -881,11 +1097,7 @@
                currentHistory.push({ role: 'assistant', content: job.response || '' });
                attachHistoryControls(thinkingDiv, assistHistIdx);

-                const n = job.tool_calls?.length || 0;
-                if (n) {
-                    const names = job.tool_calls.map(t => t.name).join(', ');
-                    addMessage('system', `⚡ ${n} tool call${n !== 1 ? 's' : ''}: ${names}`);
-                }
+                renderToolCalls(job.tool_calls, thinkingDiv.parentElement);

            } catch (err) {
                if (err.name === 'AbortError') {
@@ -926,17 +1138,94 @@

        // ── File editor ──────────────────────────────────────────────
        const fileModal      = document.getElementById('file-modal');
-        const fileSelect     = document.getElementById('file-select');
+        const fileSidebar    = document.getElementById('file-sidebar');
        const fileEditor     = document.getElementById('file-editor');
        const filePreview    = document.getElementById('file-preview');
        const fileRawBtn     = document.getElementById('file-raw-btn');
        const filePreviewBtn = document.getElementById('file-preview-btn');
        const fileSaveBtn    = document.getElementById('file-save-btn');
-        const fileSavedMsg   = document.getElementById('file-saved-msg');
        const fileCloseBtn   = document.getElementById('file-close-btn');
        const filesBtn       = document.getElementById('files-btn');

-        let fileMode = 'preview'; // 'edit' or 'preview'
+        let fileMode        = 'preview'; // 'edit' or 'preview'
+        let activeFileName  = null;
+
+        // File groups — controls sidebar order and section labels
+        const FILE_GROUPS = [
+            { label: 'Identity', files: ['IDENTITY.md', 'SOUL.md', 'PROTOCOLS.md', 'CONTEXT_TIERS.md'] },
+            { label: 'Memory',   files: ['MEMORY_LONG.md', 'MEMORY_MID.md', 'MEMORY_SHORT.md'] },
+            { label: 'Profile',  files: ['USER.md', 'HELP.md'] },
+        ];
+
+        function fmtSize(bytes) {
+            if (!bytes) return 'empty';
+            if (bytes < 1024) return bytes + ' B';
+            return (bytes / 1024).toFixed(1) + ' KB';
+        }
+
+        function fmtModified(ts) {
+            if (!ts) return '';
+            const d   = new Date(ts * 1000);
+            const now = new Date();
+            if (d.toDateString() === now.toDateString()) return 'today';
+            const yesterday = new Date(now);
+            yesterday.setDate(now.getDate() - 1);
+            if (d.toDateString() === yesterday.toDateString()) return 'yesterday';
+            return d.toLocaleDateString(undefined, { month: 'short', day: 'numeric' });
+        }
+
+        function renderFileSidebar(files) {
+            const byName = Object.fromEntries(files.map(f => [f.name, f]));
+            fileSidebar.innerHTML = '';
+
+            for (const group of FILE_GROUPS) {
+                const groupEl = document.createElement('div');
+                groupEl.className = 'file-group';
+
+                const header = document.createElement('div');
+                header.className = 'fg-header';
+                header.textContent = group.label;
+                header.addEventListener('click', () => header.classList.toggle('collapsed'));
+                groupEl.appendChild(header);
+
+                const items = document.createElement('div');
+                items.className = 'fg-items';
+
+                for (const fname of group.files) {
+                    const f = byName[fname];
+                    if (!f) continue;
+
+                    const item = document.createElement('div');
+                    item.className = 'file-item' + (f.exists ? '' : ' missing');
+                    item.dataset.name = fname;
+                    if (fname === activeFileName) item.classList.add('active');
+
+                    const nameEl = document.createElement('div');
+                    nameEl.className = 'fi-name';
+                    nameEl.textContent = fname;
+                    item.appendChild(nameEl);
+
+                    const metaEl = document.createElement('div');
+                    metaEl.className = 'fi-meta';
+                    metaEl.innerHTML = `<span>${fmtSize(f.size)}</span>`
+                        + (f.modified ? `<span>${fmtModified(f.modified)}</span>` : '');
+                    item.appendChild(metaEl);
+
+                    item.addEventListener('click', () => loadFile(fname));
+                    items.appendChild(item);
+                }
+
+                groupEl.appendChild(items);
+                fileSidebar.appendChild(groupEl);
+            }
+        }
+
+        function setActiveFile(name) {
+            activeFileName = name;
+            fileSidebar.querySelectorAll('.file-item').forEach(el => {
+                el.classList.toggle('active', el.dataset.name === name);
+            });
+            document.getElementById('file-modal-title').textContent = name;
+        }

        function setFileMode(mode) {
            fileMode = mode;
@@ -960,27 +1249,22 @@
        }

        async function loadFile(name) {
+            setActiveFile(name);
            const res = await fetch(`/files/${encodeURIComponent(name)}?${_fileParams}`);
            if (!res.ok) { fileEditor.value = `Error loading ${name}`; return; }
            const data = await res.json();
            fileEditor.value = data.content;
-            document.getElementById('file-modal-title').textContent = name;
            setFileMode(fileMode);
        }

        async function openFileModal() {
-            // Populate the file list
-            const res = await fetch(`/files?${_fileParams}`);
+            const res  = await fetch(`/files?${_fileParams}`);
            const data = await res.json();
-            fileSelect.innerHTML = '';
-            for (const f of data.files) {
-                const opt = document.createElement('option');
-                opt.value = f.name;
-                opt.textContent = f.name + (f.exists ? '' : ' (missing)');
-                fileSelect.appendChild(opt);
-            }
+            renderFileSidebar(data.files);
            fileModal.classList.add('open');
-            await loadFile(fileSelect.value);
+            // Load first existing file
+            const first = data.files.find(f => f.exists) || data.files[0];
+            if (first) await loadFile(first.name);
        }

        filesBtn.addEventListener('click', () => {
@@ -988,21 +1272,24 @@
            openFileModal();
        });

-        fileSelect.addEventListener('change', () => loadFile(fileSelect.value));
-
        fileRawBtn.addEventListener('click', () => setFileMode('edit'));
        filePreviewBtn.addEventListener('click', () => setFileMode('preview'));

        fileSaveBtn.addEventListener('click', async () => {
-            const name = fileSelect.value;
-            const res = await fetch(`/files/${encodeURIComponent(name)}?${_fileParams}`, {
+            if (!activeFileName) return;
+            const res = await fetch(`/files/${encodeURIComponent(activeFileName)}?${_fileParams}`, {
                method: 'PUT',
                headers: { 'Content-Type': 'application/json' },
                body: JSON.stringify({ content: fileEditor.value }),
            });
            if (res.ok) {
-                fileSavedMsg.classList.add('show');
-                setTimeout(() => fileSavedMsg.classList.remove('show'), 2000);
+                showToast('File saved', 'success');
+                // Refresh sidebar to update size/modified
+                const listRes = await fetch(`/files?${_fileParams}`);
+                if (listRes.ok) {
+                    const listData = await listRes.json();
+                    renderFileSidebar(listData.files);
+                }
+            } else {
+                showToast('Save failed', 'error');
            }
        });

@@ -1012,6 +1299,66 @@
            if (e.target === fileModal) fileModal.classList.remove('open');
        });

+        // ── Session search ────────────────────────────────────────────
+        const sessionSearchInput   = document.getElementById('session-search-input');
+        const sessionSearchBtn     = document.getElementById('session-search-btn');
+        const sessionSearchResults = document.getElementById('session-search-results');
+
+        function _showFileView() {
+            fileEditor.style.display = '';
+            filePreview.style.display = '';
+            sessionSearchResults.style.display = 'none';
+        }
+
+        function _showSearchResults(html) {
+            fileEditor.style.display = 'none';
+            filePreview.style.display = 'none';
+            sessionSearchResults.style.display = '';
+            sessionSearchResults.innerHTML = html;
+        }
+
+        async function runSessionSearch() {
+            const q = sessionSearchInput.value.trim();
+            if (q.length < 2) return;
+            sessionSearchBtn.disabled = true;
+            sessionSearchBtn.textContent = '…';
+            try {
+                const res  = await fetch(`/sessions/search?q=${encodeURIComponent(q)}&${_fileParams}&limit=30`);
+                const data = await res.json();
+                if (!res.ok) { _showSearchResults(`<p class="sr-error">Error: ${_esc(String(data.detail || res.status))}</p>`); return; }
+                if (!data.matches.length) {
+                    _showSearchResults(`<p class="sr-empty">No results for "<strong>${_esc(q)}</strong>" in ${data.total_files_searched} session file(s).</p>`);
+                    return;
+                }
+                let html = `<div class="sr-header">${data.matches.length} result(s) for "<strong>${_esc(q)}</strong>" across ${data.total_files_searched} session(s)</div>`;
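+                // Escape regex metacharacters so queries such as "f(x)" cannot throw
+                const qRe = new RegExp('(' + q.replace(/[.*+?^${}()|[\]\\]/g, '\\$&') + ')', 'gi');
+                let lastDate = null;
+                for (const m of data.matches) {
+                    if (m.date !== lastDate) {
+                        html += `<div class="sr-date">${m.date}</div>`;
+                        lastDate = m.date;
+                    }
+                    // split() with a capture group leaves matches at odd indices;
+                    // escape every piece so excerpt text cannot inject HTML
+                    const hi = m.excerpt.split(qRe)
+                        .map((s, i) => i % 2 ? `<mark>${_esc(s)}</mark>` : _esc(s))
+                        .join('');
+                    html += `<div class="sr-excerpt">${hi}</div>`;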
+                }
+                _showSearchResults(html);
+            } catch (e) {
+                _showSearchResults(`<p class="sr-error">Search failed: ${_esc(e.message)}</p>`);
+            } finally {
+                sessionSearchBtn.disabled = false;
+                sessionSearchBtn.textContent = 'Go';
+            }
+        }
+
+        sessionSearchBtn.addEventListener('click', runSessionSearch);
+        sessionSearchInput.addEventListener('keydown', (e) => {
+            if (e.key === 'Enter') runSessionSearch();
+        });
+
+        // When a file is clicked, switch back from search results to editor
+        fileSidebar.addEventListener('click', () => {
+            if (sessionSearchResults.style.display !== 'none') _showFileView();
+        });
+
        document.addEventListener('keydown', (e) => {
            if (e.key === 'Escape') {
                if (fileModal.classList.contains('open')) fileModal.classList.remove('open');
@@ -1026,6 +1373,13 @@
        // ── Real-time Talk updates (SSE) ─────────────────────────────
        const evtSource = new EventSource('/events');

+        // Close cleanly on navigation so the browser doesn't log "connection interrupted"
+        window.addEventListener('beforeunload', () => evtSource.close());
+
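+        evtSource.onerror = () => {
+            // EventSource retries automatically after transient network errors,
+            // so nothing to do here. A non-200 or non-event-stream response
+            // closes the stream for good (readyState === EventSource.CLOSED).
+        };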
+
        evtSource.onmessage = (e) => {
            let data;
            try { data = JSON.parse(e.data); } catch { return; }
@@ -1286,3 +1640,16 @@
        checkAuthStatus();
        // Re-check every 30 minutes
        setInterval(checkAuthStatus, 30 * 60 * 1000);
+
+        // ── Initial render ────────────────────────────────────────────
+        // Process all static Lucide SVGs in the header + stop button,
+        // and seed the mode UI (which also calls render_icons internally).
+        update_mode_ui();
+        render_icons();
+
+        // ── Auto-restore last session ─────────────────────────────────
+        // Silently resume if within the inactivity TTL; clears stored ID on error.
+        {
+            const stored = get_stored_session();
+            if (stored) resumeSession(stored, true).catch(clear_stored_session);
+        }
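The auto-restore block calls `get_stored_session()` and `clear_stored_session()`, and the response handlers call `persist_session()`, but none of these helpers is defined in this hunk. One way an inactivity TTL could work is sketched here, assuming a Web Storage-style object (`getItem` / `setItem` / `removeItem`); the key name, the 2-hour TTL, and `makeSessionStore` are hypothetical, not values taken from Cortex.

```javascript
// Hypothetical sketch of an inactivity-TTL session store. The key name and
// TTL below are illustrative assumptions, not values taken from Cortex.
const SESSION_KEY = 'cortex.session';
const SESSION_TTL_MS = 2 * 60 * 60 * 1000; // 2 hours

// Minimal in-memory stand-in for window.localStorage so the sketch runs in Node.
class MemoryStorage {
    constructor() { this.map = new Map(); }
    getItem(key) { return this.map.has(key) ? this.map.get(key) : null; }
    setItem(key, value) { this.map.set(key, String(value)); }
    removeItem(key) { this.map.delete(key); }
}

function makeSessionStore(storage, now = () => Date.now()) {
    return {
        // Call after every exchange: refreshing the timestamp makes the TTL
        // measure inactivity instead of time since the session began.
        persist(sessionId) {
            storage.setItem(SESSION_KEY, JSON.stringify({ id: sessionId, ts: now() }));
        },
        // Returns the stored ID, or null if it is absent, expired, or unreadable.
        get() {
            const raw = storage.getItem(SESSION_KEY);
            if (raw === null) return null;
            try {
                const { id, ts } = JSON.parse(raw);
                if (id && typeof ts === 'number' && now() - ts <= SESSION_TTL_MS) return id;
            } catch {
                // Corrupt JSON is treated the same as an expired entry.
            }
            storage.removeItem(SESSION_KEY);
            return null;
        },
        clear() {
            storage.removeItem(SESSION_KEY);
        },
    };
}

// Usage with a fake clock:
let t = 0;
const store = makeSessionStore(new MemoryStorage(), () => t);
store.persist('abc123');
t = SESSION_TTL_MS;
console.log(store.get()); // abc123 (exactly at the limit is still valid)
t = SESSION_TTL_MS + 1;
console.log(store.get()); // null
```

Because `persist_session()` runs after every response in the handlers above, refreshing the timestamp on each call is what turns a fixed expiry into an inactivity timeout.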
--- a/cortex/static/help.html
+++ b/cortex/static/help.html
@@ -27,14 +27,30 @@
      margin: 0 auto;
    }

-    .back-link {
-      display: inline-block;
-      font-size: 0.8rem;
-      color: #94a3b8;
-      text-decoration: none;
-      margin-bottom: 1.5rem;
+    .page-nav {
+      display: flex;
+      align-items: center;
+      gap: 0.25rem;
+      margin-bottom: 1.75rem;
+      flex-wrap: wrap;
    }
-    .back-link:hover { color: #a78bfa; }
+    .nav-link {
+      display: inline-flex;
+      align-items: center;
+      padding: 0.3rem 0.6rem;
+      border-radius: 6px;
+      font-size: 0.8rem;
+      font-weight: 500;
+      color: #64748b;
+      text-decoration: none;
+      transition: color 0.15s, background 0.15s;
+      white-space: nowrap;
+    }
+    .nav-link:hover { color: #cbd5e1; background: rgba(255,255,255,0.05); }
+    .nav-link.active { color: #a78bfa; }
+    .nav-spacer { flex: 1; min-width: 0.5rem; }
+    .nav-link.nav-logout { color: #475569; }
+    .nav-link.nav-logout:hover { color: #94a3b8; background: none; }

    header {
      margin-bottom: 2rem;
@@ -106,7 +122,13 @@
 </head>
 <body>
  <div class="page">
-    <a id="back-link" href="/" class="back-link">← Back to Cortex</a>
+    <nav class="page-nav" id="page-nav">
+      <a id="nav-chat" href="/" class="nav-link">← Chat</a>
+      <a href="/help" class="nav-link active">Help</a>
+      <a href="/settings" class="nav-link" id="nav-settings">Settings</a>
+      <span class="nav-spacer"></span>
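+      <form method="POST" action="/logout" style="margin:0">
+        <button type="submit" class="nav-link nav-logout" style="background:none;border:none;cursor:pointer;font-family:inherit">Sign out</button>
+      </form>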
+    </nav>

    <header>
      <h1>Help &amp; Reference</h1>
@@ -122,8 +144,8 @@
    const persona = cfg.persona || 'inara';
    const params  = `user=${encodeURIComponent(user)}&persona=${encodeURIComponent(persona)}`;

-    // Wire up back link and persona label
-    document.getElementById('back-link').href = cfg.backHref || '/';
+    // Wire up nav links and persona label
+    document.getElementById('nav-chat').href = cfg.backHref || '/';
    if (persona) {
      document.getElementById('persona-label').textContent =
        `${persona.charAt(0).toUpperCase() + persona.slice(1)} · ${user}`;
@@ -155,11 +177,25 @@

    async function loadHelp() {
      try {
-        const res = await fetch(`/files/HELP.md?${params}`);
-        if (!res.ok) throw new Error(`HTTP ${res.status}`);
-        const data = await res.json();
+        // Always load the shared base from static
+        const baseRes = await fetch('/static/HELP.md');
+        if (!baseRes.ok) throw new Error(`HTTP ${baseRes.status}`);
+        let markdown = await baseRes.text();
+
+        // Try to load persona-specific additions and append them
+        try {
+          const personaRes = await fetch(`/files/HELP.md?${params}`);
+          if (personaRes.ok) {
+            const personaData = await personaRes.json();
+            const extra = (personaData.content || '').trim();
+            if (extra) {
+              markdown += '\n\n---\n\n## ' + persona.charAt(0).toUpperCase() + persona.slice(1) + ' Notes\n\n' + extra;
+            }
+          }
+        } catch (_) { /* persona-specific file is optional */ }
+
        const body = document.getElementById('help-body');
-        body.innerHTML = marked.parse(data.content);
+        body.innerHTML = marked.parse(markdown);
        body.querySelectorAll('a').forEach(a => {
          a.target = '_blank'; a.rel = 'noopener noreferrer';
        });
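`loadHelp()` now always renders the shared `/static/HELP.md` and, when the persona's own `HELP.md` has non-blank content, appends it under a horizontal rule and a `## <Persona> Notes` heading. The string assembly can be pulled out into a pure function and tested on its own; `mergeHelp` and `capitalize` are hypothetical names, not part of the diff.

```javascript
// Hypothetical pure helper mirroring the string assembly in loadHelp().
function capitalize(s) {
    return s.charAt(0).toUpperCase() + s.slice(1);
}

function mergeHelp(baseMarkdown, persona, personaMarkdown) {
    const extra = (personaMarkdown || '').trim();
    if (!extra) return baseMarkdown; // missing or blank persona file: base only
    // The blank line before "---" matters: directly under a paragraph line,
    // "---" would be read as a setext heading underline, not a rule.
    return `${baseMarkdown}\n\n---\n\n## ${capitalize(persona)} Notes\n\n${extra}`;
}

console.log(mergeHelp('# Help', 'inara', '  Ask about memory tiers.  '));
// # Help
//
// ---
//
// ## Inara Notes
//
// Ask about memory tiers.
```

Keeping the merge pure means the fetch logic only has to decide whether the persona file exists; the formatting rules live in one testable place.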
--- a/cortex/static/index.html
+++ b/cortex/static/index.html
@@ -21,6 +21,9 @@
    </script>
    <link rel="stylesheet" href="/static/style.css">
    <script src="/static/marked.min.js"></script>
+    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.11.1/styles/atom-one-dark.min.css">
+    <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.11.1/highlight.min.js"></script>
+    <script src="https://unpkg.com/lucide@latest/dist/umd/lucide.min.js"></script>
 </head>
 <body>
    <header>
@@ -32,20 +35,35 @@
        </div>

        <nav id="hdr-nav">
-            <button id="sessions-btn" class="hdr-btn" title="Sessions">💬 <span class="btn-label">Sessions</span></button>
-            <button id="ctx-open-btn" class="hdr-btn" title="Context &amp; memory">⚙<span class="tier-badge">2</span></button>
+            <button id="sessions-btn" class="hdr-btn" title="Sessions">
+                <svg data-lucide="history" class="btn-icon"></svg>
+                <span class="btn-label">Sessions</span>
+            </button>
+            <button id="ctx-open-btn" class="hdr-btn" title="Context &amp; memory">
+                <svg data-lucide="sliders-horizontal" class="btn-icon"></svg><span class="tier-badge">2</span>
+            </button>
            <div class="hdr-dropdown-wrap" id="settings-wrap">
-                <button class="hdr-btn" id="settings-btn" title="Settings">≡</button>
+                <button class="hdr-btn" id="settings-btn" title="Settings">
+                    <svg data-lucide="menu" class="btn-icon"></svg>
+                </button>
                <div class="hdr-dropdown" id="settings-dropdown">
-                    <button id="files-btn" class="hdr-dd-item">📁 Files</button>
-                    <a href="/settings" class="hdr-dd-item">👤 Account</a>
+                    <button id="files-btn" class="hdr-dd-item">
+                        <svg data-lucide="folder-open" class="btn-icon"></svg> Files
+                    </button>
+                    <a href="/settings" class="hdr-dd-item">
+                        <svg data-lucide="user" class="btn-icon"></svg> Account
+                    </a>
                    <div class="hdr-dd-divider"></div>
                    <form method="POST" action="/logout" style="margin:0">
-                        <button type="submit" class="hdr-dd-item">⏏ Sign Out</button>
+                        <button type="submit" class="hdr-dd-item">
+                            <svg data-lucide="log-out" class="btn-icon"></svg> Sign Out
+                        </button>
                    </form>
                </div>
            </div>
-            <a href="/help" class="hdr-btn" title="Help &amp; reference" style="text-decoration:none">❓</a>
+            <a id="help-link" href="/help" class="hdr-btn" title="Help &amp; reference" style="text-decoration:none">
+                <svg data-lucide="circle-help" class="btn-icon"></svg>
+            </a>
        </nav>

        <div id="sessions-panel"></div>
@@ -85,6 +103,7 @@
                <div class="ctx-row">
                    <button id="backend-toggle" class="ctx-btn" title="Click to switch primary backend">claude</button>
                </div>
+                <div id="backend-model-hint"></div>
            </div>
            <div class="ctx-section">
                <div class="ctx-section-title">Display</div>
@@ -107,16 +126,28 @@
        <div id="file-modal-inner">
            <div id="file-modal-header">
                <span id="file-modal-title">Context Files</span>
-                <select id="file-select"></select>
+                <span class="fm-spacer"></span>
                <button class="fm-btn" id="file-raw-btn">edit</button>
                <button class="fm-btn active" id="file-preview-btn">preview</button>
                <button class="fm-btn save" id="file-save-btn">Save</button>
-                <span id="file-saved-msg">saved ✓</span>
                <button class="fm-btn" id="file-close-btn">✕</button>
            </div>
-            <div id="file-modal-body">
-                <textarea id="file-editor" spellcheck="false"></textarea>
-                <div id="file-preview"></div>
+            <div id="file-modal-content">
+                <div id="file-sidebar-wrap">
+                    <div id="file-sidebar"></div>
+                    <div id="session-search-wrap">
+                        <div id="session-search-label">Session Search</div>
+                        <div id="session-search-row">
+                            <input id="session-search-input" type="search" placeholder="Search sessions…" autocomplete="off">
+                            <button id="session-search-btn">Go</button>
+                        </div>
+                    </div>
+                </div>
+                <div id="file-modal-body">
+                    <textarea id="file-editor" spellcheck="false"></textarea>
+                    <div id="file-preview"></div>
+                    <div id="session-search-results" style="display:none"></div>
+                </div>
            </div>
        </div>
    </div>
@@ -149,10 +180,12 @@
        <textarea id="input" rows="1" placeholder="Message…" autofocus></textarea>
        <div id="send-col">
            <button id="send">Send</button>
-            <button id="stop">Stop</button>
+            <button id="stop"><svg data-lucide="square" width="14" height="14" class="btn-icon"></svg> Stop</button>
        </div>
    </div>

+    <div id="sessions-backdrop"></div>
+    <div id="toast-container"></div>
    <script src="/static/app.js"></script>
 </body>
 </html>
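The new `#session-search-results` pane is filled by `runSessionSearch()` in app.js, which assigns HTML to `innerHTML` and wraps query matches in `<mark>`. Doing that safely takes two separate escapes: regex metacharacters in the query (so input like `f(x)` does not throw) and HTML in every piece of the excerpt. A standalone sketch follows; `escapeHtml` is a stand-in for the app's `_esc` helper, whose definition is not shown, and `highlightExcerpt` is a hypothetical name.

```javascript
// Hypothetical sketch: highlight every case-insensitive occurrence of `query`
// in `text`, returning HTML that is safe to assign to innerHTML.
function escapeHtml(s) {
    return String(s)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

// Backslash-escape characters that have special meaning in a RegExp.
function escapeRegExp(s) {
    return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function highlightExcerpt(text, query) {
    if (!query) return escapeHtml(text);
    // A capturing group makes split() keep the matches at the odd indices.
    const parts = text.split(new RegExp(`(${escapeRegExp(query)})`, 'i'));
    return parts
        .map((part, i) => (i % 2 === 1 ? `<mark>${escapeHtml(part)}</mark>` : escapeHtml(part)))
        .join('');
}

console.log(highlightExcerpt('Use <b>f(x)</b> or F(X)', 'f(x)'));
// Use &lt;b&gt;<mark>f(x)</mark>&lt;/b&gt; or <mark>F(X)</mark>
```

Matching against the raw text first and escaping each piece afterwards avoids the two classic failure modes: a match landing inside an entity like `&amp;`, and excerpt text being interpreted as markup.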
--- a/cortex/static/local_llm.html
+++ b/cortex/static/local_llm.html
@@ -0,0 +1,483 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+  <meta charset="UTF-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <title>Cortex — Model Registry</title>
+  <link rel="preconnect" href="https://fonts.googleapis.com">
+  <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
+  <link href="https://fonts.googleapis.com/css2?family=Inter:wght@100..900&display=swap" rel="stylesheet">
+  <style>
+    *, *::before, *::after { box-sizing: border-box; margin: 0; padding: 0; }
+
+    body {
+      min-height: 100vh;
+      background: #0f1117;
+      font-family: 'Inter', system-ui, -apple-system, sans-serif;
+      font-weight: 450;
+      -webkit-font-smoothing: antialiased;
+      color: #e2e8f0;
+      padding: 2rem 1.5rem 4rem;
+    }
+
+    .page { max-width: 700px; margin: 0 auto; }
+
+    /* ── Nav ── */
+    .page-nav {
+      display: flex; align-items: center; gap: 0.25rem;
+      margin-bottom: 1.75rem; flex-wrap: wrap;
+    }
+    .nav-link {
+      display: inline-flex; align-items: center;
+      padding: 0.3rem 0.6rem; border-radius: 6px;
+      font-size: 0.8rem; font-weight: 500; color: #64748b;
+      text-decoration: none; transition: color 0.15s, background 0.15s;
+      white-space: nowrap;
+    }
+    .nav-link:hover { color: #cbd5e1; background: rgba(255,255,255,0.05); }
+    .nav-link.active { color: #a78bfa; }
+    .nav-spacer { flex: 1; min-width: 0.5rem; }
+    .nav-link.nav-logout { color: #475569; }
+    .nav-link.nav-logout:hover { color: #94a3b8; background: none; }
+
+    /* ── Page header ── */
+    .page-header { margin-bottom: 2rem; padding-bottom: 1rem; border-bottom: 1px solid #2d3148; }
+    .page-header h1 { font-size: 1.4rem; font-weight: 700; color: #a78bfa; }
+    .page-header p  { font-size: 0.82rem; color: #94a3b8; margin-top: 0.25rem; }
+
+    /* ── Section cards ── */
+    .section {
+      background: #1a1d27; border: 1px solid #2d3148;
+      border-radius: 10px; padding: 1.5rem; margin-bottom: 1.25rem;
+    }
+    .section h2 {
+      font-size: 0.85rem; font-weight: 600; color: #94a3b8;
+      text-transform: uppercase; letter-spacing: 0.05em;
+      margin-bottom: 1.1rem; padding-bottom: 0.5rem;
+      border-bottom: 1px solid #2d3148;
+    }
+    .section-note {
+      font-size: 0.8rem; color: #64748b; margin-bottom: 1rem; line-height: 1.5;
+    }
+
+    /* ── Form elements ── */
+    .field { margin-bottom: 0.9rem; }
+    label {
+      display: block; font-size: 0.78rem; font-weight: 500;
+      color: #94a3b8; margin-bottom: 0.35rem;
+    }
+    input[type="text"], input[type="password"], input[type="url"],
+    input[type="number"], select {
+      width: 100%; padding: 0.6rem 0.8rem;
+      background: #0f1117; border: 1px solid #2d3148; border-radius: 6px;
+      color: #e2e8f0; font-size: 0.9rem; font-family: inherit;
+      outline: none; transition: border-color 0.15s;
+    }
+    input:focus, select:focus { border-color: #7c3aed; }
+    select { cursor: pointer; }
+    input[type="number"] { width: 6rem; }
+
+    .field-row { display: flex; gap: 0.75rem; }
+    .field-row .field { flex: 1; margin-bottom: 0; }
+
+    .key-status { font-size: 0.75rem; color: #94a3b8; margin-top: 0.35rem; }
+
+    /* ── Buttons ── */
+    .btn {
+      padding: 0.6rem 1.1rem; border: none; border-radius: 6px;
+      font-size: 0.88rem; font-weight: 600; cursor: pointer;
+      transition: background 0.15s, opacity 0.15s; font-family: inherit;
+    }
+    .btn-primary { background: #7c3aed; color: #fff; }
+    .btn-primary:hover { background: #6d28d9; }
+    .btn-secondary {
+      background: #1a1d27; color: #94a3b8;
+      border: 1px solid #2d3148;
+    }
+    .btn-secondary:hover { border-color: #94a3b8; color: #e2e8f0; }
+    .btn-sm { padding: 0.35rem 0.7rem; font-size: 0.8rem; font-weight: 500; }
+    .btn-row { display: flex; gap: 0.6rem; align-items: center; margin-top: 0.75rem; flex-wrap: wrap; }
+    .btn-link {
+      background: none; border: none; cursor: pointer; font-family: inherit;
+      font-size: 0.78rem; color: #64748b; padding: 0; text-decoration: underline;
+      text-underline-offset: 2px;
+    }
+    .btn-link:hover { color: #94a3b8; }
+    .btn-link.danger { color: #7f1d1d; }
+    .btn-link.danger:hover { color: #f87171; }
+
+    /* ── Host rows ── */
+    .host-row {
+      background: #0f1117; border: 1px solid #2d3148; border-radius: 8px;
+      padding: 1rem; margin-bottom: 0.75rem;
+    }
+    .host-form .field-row { margin-bottom: 0.6rem; }
+    .fetch-status { font-size: 0.78rem; color: #94a3b8; }
+    .fetch-status.ok  { color: #4ade80; }
+    .fetch-status.err { color: #f87171; }
+
+    /* ── Model rows ── */
+    .model-row {
+      display: flex; align-items: flex-start; justify-content: space-between;
+      gap: 0.75rem; padding: 0.75rem 0.9rem;
+      background: #0f1117; border: 1px solid #2d3148; border-radius: 8px;
+      margin-bottom: 0.5rem;
+    }
+    .model-info { display: flex; flex-direction: column; gap: 0.2rem; min-width: 0; }
+    .model-label { font-size: 0.9rem; font-weight: 600; color: #e2e8f0; }
+    .model-name  { font-size: 0.75rem; color: #64748b; font-family: monospace; word-break: break-all; }
+    .model-host  { font-size: 0.72rem; color: #475569; }
+    .ctx-badge {
+      display: inline-block; margin-left: 0.4rem;
+      padding: 0.1rem 0.35rem; border-radius: 3px;
+      background: #1e293b; color: #64748b;
+      font-size: 0.67rem; font-weight: 600;
+    }
+    .tag-row { display: flex; flex-wrap: wrap; gap: 0.3rem; margin-top: 0.2rem; }
+    .tag {
+      padding: 0.1rem 0.4rem; border-radius: 3px;
+      background: #1e1b4b; color: #818cf8;
+      font-size: 0.68rem; font-weight: 500;
+    }
+    .model-actions { display: flex; gap: 0.4rem; flex-shrink: 0; }
+    .row-btn {
+      padding: 0.3rem 0.65rem; border-radius: 5px; font-size: 0.78rem;
+      font-weight: 500; cursor: pointer; font-family: inherit;
+      border: 1px solid #2d3148; background: #1a1d27; color: #94a3b8;
+      transition: border-color 0.15s, color 0.15s;
+    }
+    .row-btn.danger { color: #f87171; }
+    .row-btn.danger:hover { border-color: #f87171; }
+
+    /* ── Role assignment rows ── */
+    .role-row {
+      display: flex; align-items: flex-start; gap: 1rem;
+      padding: 0.6rem 0; border-bottom: 1px solid #1e2030;
+    }
+    .role-row:last-child { border-bottom: none; }
+    .role-name {
+      font-size: 0.82rem; font-weight: 600; color: #a78bfa;
+      min-width: 6rem; padding-top: 0.45rem;
+    }
+    .role-slots { display: flex; flex-wrap: wrap; gap: 0.5rem; flex: 1; }
+    .role-slot { display: flex; flex-direction: column; gap: 0.2rem; flex: 1; min-width: 8rem; }
+    .slot-label { font-size: 0.68rem; color: #475569; font-weight: 500; text-transform: uppercase; letter-spacing: 0.04em; }
+    .role-select {
+      padding: 0.4rem 0.6rem; font-size: 0.8rem;
+      background: #0f1117; border: 1px solid #2d3148; border-radius: 6px;
+      color: #e2e8f0; font-family: inherit; cursor: pointer; outline: none;
+      transition: border-color 0.15s;
+    }
+    .role-select:focus { border-color: #7c3aed; }
+    .role-select.saved  { border-color: #166534; }
+    .role-select.saving { border-color: #92400e; }
+    .role-select.err    { border-color: #7f1d1d; }
+
+    /* ── Add model section ── */
+    #add-section .field-row { margin-bottom: 0.5rem; }
+    #model-select-wrap { display: none; margin-bottom: 0.75rem; }
+    .tags-hint { font-size: 0.72rem; color: #475569; margin-top: 0.3rem; }
+
+    /* ── Messages ── */
+    .msg {
+      font-size: 0.85rem; text-align: center;
+      padding: 0.6rem 1rem; border-radius: 6px; margin-bottom: 1rem;
+    }
+    .msg.success { color: #4ade80; background: #052e16; border: 1px solid #166534; }
+    .msg.error   { color: #f87171; background: #2d0a0a; border: 1px solid #7f1d1d; }
+
+    /* ── Toast ── */
+    #toast {
+      position: fixed; bottom: 1.5rem; right: 1.5rem;
+      background: #1a1d27; border: 1px solid #166534; color: #4ade80;
+      padding: 0.5rem 1rem; border-radius: 6px; font-size: 0.82rem;
+      opacity: 0; transition: opacity 0.2s; pointer-events: none;
+      z-index: 100;
+    }
+    #toast.show { opacity: 1; }
+    #toast.err { border-color: #7f1d1d; color: #f87171; }
+
+    .empty-note { font-size: 0.85rem; color: #475569; padding: 0.3rem 0; }
+  </style>
+</head>
+<body>
+  <div class="page">
+    <nav class="page-nav">
+      <a href="/" class="nav-link">← Chat</a>
+      <a href="/help" class="nav-link">Help</a>
+      <a href="/settings" class="nav-link">Settings</a>
+      <a href="/settings/local" class="nav-link active">Models</a>
+      <span class="nav-spacer"></span>
+      <a href="/logout" class="nav-link nav-logout">Sign out</a>
+    </nav>
+
+    <div class="page-header">
+      <h1>Model Registry</h1>
+      <p>Configure hosts, models, and which model handles each task type.</p>
+    </div>
+
+    <!-- SUCCESS -->
+    <!-- ERROR -->
+
+    <!-- ── Hosts ── -->
+    <div class="section">
+      <h2>Hosts</h2>
+      <p class="section-note">OpenAI-compatible API servers, local (Open WebUI, Ollama, LM Studio) or hosted (OpenRouter, etc.)</p>
+      {{ host_rows }}
+      <details style="margin-top:0.75rem">
+        <summary style="font-size:0.82rem; color:#64748b; cursor:pointer; user-select:none">+ Add host</summary>
+        <div style="margin-top:0.75rem">
+          <form method="POST" action="/settings/local/host">
+            <input type="hidden" name="host_id" value="">
+            <div class="field-row">
+              <div class="field">
+                <label for="new-host-label">Label</label>
+                <input type="text" id="new-host-label" name="label"
+                       placeholder="e.g. Gaming Laptop"
+                       autocomplete="off" data-form-type="other">
+              </div>
+              <div class="field" style="flex:2">
+                <label for="new-host-url">API URL</label>
+                <input type="text" id="new-host-url" name="api_url"
+                       placeholder="http://192.168.x.x:3000"
+                       autocomplete="off" spellcheck="false" data-form-type="other">
+              </div>
+            </div>
+            <div class="field-row">
+              <div class="field">
+                <label for="new-host-key">API Key</label>
+                <input type="password" id="new-host-key" name="api_key"
+                       placeholder="sk-… (leave blank if not required)"
+                       autocomplete="new-password" data-1p-ignore data-lpignore="true" data-form-type="other">
+              </div>
+              <div class="field" style="flex:0 0 auto">
+                <label for="new-host-type">Type</label>
+                <select id="new-host-type" name="host_type">
+                  <option value="openwebui">Open WebUI / Ollama</option>
+                  <option value="openai">OpenAI-compatible (OpenRouter, etc.)</option>
+                </select>
+              </div>
+            </div>
+            <div class="btn-row">
+              <button type="submit" class="btn btn-primary btn-sm">Add Host</button>
+            </div>
+          </form>
+        </div>
+      </details>
+    </div>
+
+    <!-- ── Models ── -->
+    <div class="section">
+      <h2>Models</h2>
+      {{ model_rows }}
+    </div>
+
+    <!-- ── Add Model ── -->
+    <div class="section" id="add-section"{{ add_model_hidden }}>
+      <h2>Add Model</h2>
+      <div id="model-select-wrap">
+        <div class="field">
+          <label for="model-picker">Available on host</label>
+          <select id="model-picker">
+            <option value="">— select to auto-fill —</option>
+          </select>
+        </div>
+      </div>
+      <form method="POST" action="/settings/local/models/add" id="add-form">
+        <input type="hidden" name="host_id" id="add-host-id" value="">
+        <div class="field">
+          <label for="add-host-select">Host</label>
+          <select id="add-host-select" onchange="document.getElementById('add-host-id').value=this.value">
+            {{ host_options }}
+          </select>
+        </div>
+        <div class="field-row">
+          <div class="field">
+            <label for="add-label">Label</label>
+            <input type="text" id="add-label" name="label"
+                   placeholder="e.g. Gemma 4 E4B"
+                   autocomplete="off" data-form-type="other">
+          </div>
+          <div class="field" style="flex:2">
+            <label for="add-model-name">Model name</label>
+            <input type="text" id="add-model-name" name="model_name"
+                   placeholder="e.g. gemma4:e4b"
+                   autocomplete="off" spellcheck="false" data-form-type="other">
+          </div>
+        </div>
+        <div class="field-row">
+          <div class="field" style="flex:0 0 auto">
+            <label for="add-context-k">Context (k tokens)</label>
+            <input type="number" id="add-context-k" name="context_k"
+                   value="0" min="0" max="10000">
+          </div>
+          <div class="field">
+            <label for="add-tags">Tags <span style="color:#475569; font-weight:400">(comma-separated)</span></label>
+            <input type="text" id="add-tags" name="tags"
+                   placeholder="fast, distill, coding"
+                   autocomplete="off" data-form-type="other">
+            <p class="tags-hint">Informational labels — used for display and future filtering.</p>
+          </div>
+        </div>
+        <div class="btn-row">
+          <button type="submit" class="btn btn-primary btn-sm">Add Model</button>
+          <button type="button" id="fetch-btn" class="btn btn-secondary btn-sm">
+            Fetch models from host
+          </button>
+          <span id="fetch-status" class="fetch-status"></span>
+        </div>
+      </form>
+    </div>
+
+    <!-- ── Role Assignments ── -->
+    <div class="section">
+      <h2>Role Assignments</h2>
+      <p class="section-note">
+        Choose which model handles each task type.
+        Backups are tried in order if the primary fails or is unavailable.
+        Leave a slot empty to use the server default (.env).
+      </p>
+      {{ role_rows }}
+    </div>
+  </div>
+
+  <div id="toast"></div>
+
+  <script>
+    // ── Pre-fill role selects ─────────────────────────────────────────────────
+    const ROLE_DATA = {{ role_data_js }};
+
+    document.querySelectorAll('.role-select').forEach(sel => {
+      const role = sel.dataset.role;
+      const slot = sel.dataset.slot;
+      const val  = (ROLE_DATA[role] || {})[slot] || '';
+      for (const opt of sel.options) {
+        if (opt.value === val) { opt.selected = true; break; }
+      }
+    });
+
+    // ── Role select change → AJAX save ───────────────────────────────────────
+    const toast = document.getElementById('toast');
+    let toastTimer = null;
+
+    function showToast(msg, err = false) {
+      toast.textContent = msg;
+      toast.className   = 'show' + (err ? ' err' : '');
+      clearTimeout(toastTimer);
+      toastTimer = setTimeout(() => { toast.className = ''; }, 2000);
+    }
+
+    document.querySelectorAll('.role-select').forEach(sel => {
+      sel.addEventListener('change', async () => {
+        const role     = sel.dataset.role;
+        const slot     = sel.dataset.slot;
+        const model_id = sel.value || null;
+
+        sel.classList.add('saving');
+        try {
+          const res = await fetch('/api/models/role', {
+            method:  'POST',
+            headers: {'Content-Type': 'application/json'},
+            body:    JSON.stringify({role, slot, model_id}),
+          });
+          const data = await res.json();
+          if (data.ok) {
+            sel.classList.replace('saving', 'saved');
+            showToast(`${role} → ${slot} saved`);
+            setTimeout(() => sel.classList.remove('saved'), 1200);
+          } else {
+            sel.classList.replace('saving', 'err');
+            showToast(data.error || 'Save failed', true);
+            setTimeout(() => sel.classList.remove('err'), 2000);
+          }
+        } catch (e) {
+          sel.classList.replace('saving', 'err');
+          showToast(e.message, true); setTimeout(() => sel.classList.remove('err'), 2000);
+        }
+      });
+    });
+
+    // ── Fetch models from host ────────────────────────────────────────────────
+    // Per-host "Fetch models" buttons in the host rows
+    document.querySelectorAll('.fetch-btn').forEach(btn => {
+      btn.addEventListener('click', () => fetchModels(btn.dataset.hostId, btn));
+    });
+
+    // "Fetch models from host" in Add Model section (uses selected host)
+    const globalFetchBtn = document.getElementById('fetch-btn');
+    if (globalFetchBtn) {
+      globalFetchBtn.addEventListener('click', () => {
+        const hostSel = document.getElementById('add-host-select');
+        const hostId  = hostSel ? hostSel.value : '';
+        fetchModels(hostId, globalFetchBtn, true);
+      });
+    }
+
+    async function fetchModels(hostId, btn, fillAddForm = false) {
+      const statusEl = fillAddForm
+        ? document.getElementById('fetch-status')
+        : document.getElementById('fetch-' + hostId);
+
+      btn.disabled = true;
+      if (statusEl) { statusEl.textContent = 'Fetching…'; statusEl.className = 'fetch-status'; }
+
+      const url = '/api/local-llm/fetch-models' + (hostId ? '?host_id=' + encodeURIComponent(hostId) : '');
+      try {
+        const res  = await fetch(url);
+        const data = await res.json();
+
+        if (data.error) {
+          if (statusEl) { statusEl.textContent = '✗ ' + data.error; statusEl.className = 'fetch-status err'; }
+          return;
+        }
+
+        if (fillAddForm) {
+          const picker = document.getElementById('model-picker');
+          const wrap   = document.getElementById('model-select-wrap');
+          picker.innerHTML = '<option value="">— select to auto-fill —</option>';
+          for (const m of data.models) {
+            const opt = document.createElement('option');
+            opt.value       = m.id;
+            opt.textContent = m.name !== m.id ? `${m.name}  (${m.id})` : m.id;
+            opt.dataset.id  = m.id;
+            opt.dataset.name = m.name;
+            picker.appendChild(opt);
+          }
+          wrap.style.display = 'block';
+        }
+
+        if (statusEl) {
+          statusEl.textContent = `✓ ${data.models.length} model${data.models.length !== 1 ? 's' : ''}`;
+          statusEl.className   = 'fetch-status ok';
+        }
+      } catch (e) {
+        if (statusEl) { statusEl.textContent = '✗ ' + e.message; statusEl.className = 'fetch-status err'; }
+      } finally {
+        btn.disabled = false;
+      }
+    }
+
+    // Auto-fill label + model name when a model is selected from the picker
+    const picker = document.getElementById('model-picker');
+    if (picker) {
+      picker.addEventListener('change', () => {
+        const opt = picker.options[picker.selectedIndex];
+        if (!opt.value) return;
+        const nameInput  = document.getElementById('add-model-name');
+        const labelInput = document.getElementById('add-label');
+        nameInput.value = opt.dataset.id || opt.value;
+        labelInput.value = (opt.dataset.name && opt.dataset.name !== opt.dataset.id)
+          ? opt.dataset.name : '';
+        nameInput.focus();
+      });
+    }
+
+    // Sync hidden host_id input from the visible select
+    const addHostSel = document.getElementById('add-host-select');
+    const addHostId  = document.getElementById('add-host-id');
+    if (addHostSel && addHostId) {
+      addHostId.value = addHostSel.value;
+    }
+  </script>
+</body>
+</html>
--- a/cortex/static/login.html
+++ b/cortex/static/login.html
@@ -90,6 +90,40 @@

    button[type="submit"]:hover { background: #6d28d9; }

+    .divider {
+      display: flex;
+      align-items: center;
+      gap: 0.75rem;
+      margin: 1.25rem 0;
+      color: #475569;
+      font-size: 0.78rem;
+    }
+    .divider::before, .divider::after {
+      content: '';
+      flex: 1;
+      border-top: 1px solid #2d3148;
+    }
+
+    .google-btn {
+      display: flex;
+      align-items: center;
+      justify-content: center;
+      gap: 0.6rem;
+      width: 100%;
+      padding: 0.65rem;
+      background: #fff;
+      border: 1px solid #dadce0;
+      border-radius: 6px;
+      color: #3c4043;
+      font-size: 0.95rem;
+      font-weight: 500;
+      font-family: inherit;
+      cursor: pointer;
+      text-decoration: none;
+      transition: background 0.15s, box-shadow 0.15s;
+    }
+    .google-btn:hover { background: #f8f9fa; box-shadow: 0 1px 4px rgba(0,0,0,0.2); }
+
    .error {
      color: #f87171;
      font-size: 0.85rem;
@@ -107,6 +141,18 @@

    <!-- ERROR -->

+    <a href="/auth/google" class="google-btn">
+      <svg width="18" height="18" viewBox="0 0 18 18" xmlns="http://www.w3.org/2000/svg">
+        <path d="M17.64 9.2c0-.637-.057-1.251-.164-1.84H9v3.481h4.844c-.209 1.125-.843 2.078-1.796 2.717v2.258h2.908c1.702-1.567 2.684-3.875 2.684-6.615z" fill="#4285F4"/>
+        <path d="M9 18c2.43 0 4.467-.806 5.956-2.18l-2.908-2.259c-.806.54-1.837.86-3.048.86-2.344 0-4.328-1.584-5.036-3.711H.957v2.332A8.997 8.997 0 0 0 9 18z" fill="#34A853"/>
+        <path d="M3.964 10.71A5.41 5.41 0 0 1 3.682 9c0-.593.102-1.17.282-1.71V4.958H.957A8.996 8.996 0 0 0 0 9c0 1.452.348 2.827.957 4.042l3.007-2.332z" fill="#FBBC05"/>
+        <path d="M9 3.58c1.321 0 2.508.454 3.44 1.345l2.582-2.58C13.463.891 11.426 0 9 0A8.997 8.997 0 0 0 .957 4.958L3.964 7.29C4.672 5.163 6.656 3.58 9 3.58z" fill="#EA4335"/>
+      </svg>
+      Sign in with Google
+    </a>
+
+    <div class="divider">or</div>
+
    <form method="POST" action="/login">
      <div class="field">
        <label for="username">Username</label>
--- a/cortex/static/settings.html
+++ b/cortex/static/settings.html
@@ -33,14 +33,30 @@
      max-width: 480px;
    }

-    .back-link {
-      display: inline-block;
-      font-size: 0.8rem;
-      color: #94a3b8;
-      text-decoration: none;
-      margin-bottom: 1.5rem;
+    .page-nav {
+      display: flex;
+      align-items: center;
+      gap: 0.25rem;
+      margin-bottom: 1.75rem;
+      flex-wrap: wrap;
    }
-    .back-link:hover { color: #a78bfa; }
+    .nav-link {
+      display: inline-flex;
+      align-items: center;
+      padding: 0.3rem 0.6rem;
+      border-radius: 6px;
+      font-size: 0.8rem;
+      font-weight: 500;
+      color: #64748b;
+      text-decoration: none;
+      transition: color 0.15s, background 0.15s;
+      white-space: nowrap;
+    }
+    .nav-link:hover { color: #cbd5e1; background: rgba(255,255,255,0.05); }
+    .nav-link.active { color: #a78bfa; }
+    .nav-spacer { flex: 1; min-width: 0.5rem; }
+    .nav-link.nav-logout { color: #475569; }
+    .nav-link.nav-logout:hover { color: #94a3b8; background: none; }

    .logo {
      margin-bottom: 1.75rem;
@@ -192,7 +208,13 @@
 </head>
 <body>
  <div class="card">
-    <a href="{{ back_href }}" class="back-link">← Back to Cortex</a>
+    <nav class="page-nav">
+      <a href="{{ back_href }}" class="nav-link">← Chat</a>
+      <a href="{{ help_href }}" class="nav-link">Help</a>
+      <a href="/settings" class="nav-link active">Settings</a>
+      <span class="nav-spacer"></span>
+      <a href="/logout" class="nav-link nav-logout">Sign out</a>
+    </nav>

    <div class="logo">
      <h1>Account Settings</h1>
@@ -219,7 +241,8 @@
          <label for="new_username">New username</label>
          <input type="text" id="new_username" name="new_username"
                 value="{{ username }}"
-                 pattern="[a-z_][a-z0-9_\-]{0,31}" required autofocus>
+                 pattern="[a-z_][a-z0-9_\-]{0,31}" required autofocus
+                 autocomplete="off" data-form-type="other">
          <p style="font-size:0.75rem; color:#94a3b8; margin-top:0.3rem;">
            Lowercase letters, digits, _ or - only. You will be logged out after renaming.
          </p>
@@ -232,6 +255,61 @@
      </form>
    </div>

+    <!-- Connected accounts -->
+    <div class="section">
+      <h2>Connected Accounts</h2>
+      <div class="field">
+        <label>Google Account</label>
+        <input type="text" value="{{ google_email }}" readonly
+               placeholder="No Google account linked"
+               style="{{ google_email == '' and 'color:#475569' or '' }}">
+      </div>
+      <p style="font-size:0.75rem; color:#94a3b8; margin-top:-0.5rem;">
+        To link or change your Google account, contact Scott.
+      </p>
+    </div>
+
+    <!-- Gemini API key -->
+    <div class="section">
+      <h2>Gemini API Key</h2>
+      <p style="font-size:0.8rem; color:#94a3b8; margin-bottom:0.85rem; line-height:1.55;">
+        Paste your personal key from
+        <a href="https://aistudio.google.com/apikey" target="_blank" rel="noopener"
+           style="color:#a78bfa;">aistudio.google.com/apikey</a>
+        to use your own Gemini quota. Leave blank to use the shared server key.
+      </p>
+      <form method="POST" action="/settings/gemini-key">
+        <div class="field">
+          <label for="gemini_api_key">API Key</label>
+          <input type="text" id="gemini_api_key" name="gemini_api_key"
+                 placeholder="{{ gemini_key_hint }}"
+                 autocomplete="new-password" spellcheck="false"
+                 data-1p-ignore data-lpignore="true" data-form-type="other">
+        </div>
+        <button type="submit">Save Key</button>
+      </form>
+      <p id="gemini-key-status" style="font-size:0.75rem; color:#94a3b8; margin-top:0.5rem;">
+        Current: {{ gemini_key_hint }}
+        <span id="gemini-remove-wrap" style="{{ gemini_key_set == 'false' and 'display:none' or '' }}">
+          — <a href="#" id="gemini-remove-link" style="color:#f87171;">remove</a>
+        </span>
+      </p>
+    </div>
+
+    <!-- Local models link -->
+    <div class="section">
+      <h2>Local Models</h2>
+      <p style="font-size:0.8rem; color:#94a3b8; margin-bottom:0.85rem; line-height:1.55;">
+        Configure OpenAI-compatible hosts and models (Open WebUI, Ollama, LM Studio, etc.).
+      </p>
+      <a href="/settings/local"
+         style="display:inline-block; padding:0.55rem 1rem; background:#7c3aed; border-radius:6px;
+                color:#fff; font-size:0.88rem; font-weight:600; text-decoration:none;
+                transition:background 0.15s;">
+        Manage local models →
+      </a>
+    </div>
+
    <!-- Change password -->
    <div class="section">
      <h2>Change Password</h2>
@@ -287,6 +365,16 @@
      document.getElementById('show-rename-user').style.display = '';
    });

+    // Gemini key — "remove" link clears the input and submits the form
+    const geminiRemove = document.getElementById('gemini-remove-link');
+    if (geminiRemove) {
+      geminiRemove.addEventListener('click', e => {
+        e.preventDefault();
+        document.getElementById('gemini_api_key').value = '';
+        document.querySelector('form[action="/settings/gemini-key"]').submit();
+      });
+    }
+
    // Persona rename toggle
    document.querySelectorAll('.persona-rename-toggle').forEach(btn => {
      btn.addEventListener('click', () => {
--- a/cortex/static/style.css
+++ b/cortex/static/style.css
@@ -183,7 +183,13 @@

        .persona-dropdown .pd-add:hover { color: var(--text); }

+        /* Lucide SVG icon alignment */
+        .btn-icon { display: inline-block; vertical-align: middle; flex-shrink: 0; pointer-events: none; }
+
        .hdr-btn {
+            display: inline-flex;
+            align-items: center;
+            gap: 5px;
            background: var(--bg);
            border: 1px solid var(--border);
            border-radius: 6px;
@@ -224,7 +230,9 @@
        .hdr-dropdown.open { display: block; }

        .hdr-dd-item {
-            display: block;
+            display: flex;
+            align-items: center;
+            gap: 8px;
            width: 100%;
            text-align: left;
            padding: 0.55rem 0.85rem;
@@ -423,6 +431,8 @@
            padding: 0;
            font-size: 0.85em;
        }
+        /* Syntax highlighting — app theme controls the pre background; hljs adds token colors */
+        .message.assistant pre code.hljs { background: transparent; padding: 0; }

        .message.system {
            align-self: center;
@@ -432,6 +442,80 @@
            padding: 2px 0;
        }

+        /* ── Tool call step cards (agent mode) ── */
+        .tool-calls-container {
+            display: flex;
+            flex-direction: column;
+            gap: 3px;
+            margin: 4px 0 6px;
+            align-self: stretch;
+        }
+        .tool-call {
+            background: var(--surface);
+            border: 1px solid var(--border);
+            border-radius: 6px;
+            overflow: hidden;
+            font-size: 0.78rem;
+        }
+        .tool-call summary {
+            display: flex;
+            align-items: baseline;
+            gap: 0.5rem;
+            padding: 0.35rem 0.65rem;
+            cursor: pointer;
+            list-style: none;
+            user-select: none;
+            color: var(--muted);
+        }
+        .tool-call summary::-webkit-details-marker { display: none; }
+        .tool-call summary::before {
+            content: '▶';
+            font-size: 0.55rem;
+            color: var(--muted);
+            transition: transform 0.12s;
+            flex-shrink: 0;
+        }
+        .tool-call[open] summary::before { transform: rotate(90deg); }
+        .tool-call summary:hover { color: var(--text); background: rgba(255,255,255,0.03); }
+        .tc-name {
+            font-weight: 600;
+            color: var(--accent);
+            font-family: 'Courier New', monospace;
+        }
+        .tc-snippet {
+            color: var(--muted);
+            overflow: hidden;
+            text-overflow: ellipsis;
+            white-space: nowrap;
+            max-width: 36ch;
+        }
+        .tc-body {
+            padding: 0 0.65rem 0.5rem;
+            display: flex;
+            flex-direction: column;
+            gap: 0.4rem;
+        }
+        .tc-section { display: flex; flex-direction: column; gap: 2px; }
+        .tc-label {
+            font-size: 0.68rem;
+            font-weight: 600;
+            text-transform: uppercase;
+            letter-spacing: 0.05em;
+            color: var(--muted);
+        }
+        .tc-body pre {
+            margin: 0;
+            background: var(--pre-bg);
+            border: 1px solid var(--border);
+            border-radius: 4px;
+            padding: 6px 8px;
+            font-size: 0.78rem;
+            white-space: pre-wrap;
+            word-break: break-word;
+            color: var(--text);
+            overflow-x: auto;
+        }
+
        .message.error {
            align-self: flex-start;
            background: var(--error-bg);
@@ -443,9 +527,12 @@
        .message.thinking { color: var(--muted); font-style: italic; }

        /* Copy button */
-        .message.assistant { position: relative; }
+        .message.assistant, .message.user { position: relative; }

        .copy-btn {
+            display: inline-flex;
+            align-items: center;
+            gap: 4px;
            position: absolute;
            top: 7px;
            right: 8px;
@@ -460,10 +547,24 @@
            transition: opacity 0.15s, color 0.15s, border-color 0.15s;
        }

-        .message.assistant:hover .copy-btn { opacity: 1; }
+        .message.assistant:hover .copy-btn,
+        .message.user:hover .copy-btn { opacity: 1; }
        .copy-btn:hover  { color: var(--text); border-color: var(--muted); }
        .copy-btn.copied { color: var(--success); border-color: var(--success-dim); }

+        /* Model tag — shown at the bottom of every assistant message */
+        .model-tag {
+            display: block;
+            font-size: 0.67rem;
+            color: #475569;
+            margin-top: 0.55rem;
+            padding-top: 0.4rem;
+            border-top: 1px solid #2d3148;
+            text-align: right;
+            letter-spacing: 0.02em;
+        }
+        .model-tag.fallback { color: #f59e0b; }
+
        /* Note messages */
        .message.note-private {
            align-self: flex-end;
@@ -538,7 +639,7 @@
        #mode-select-btn.mode-otr               { border-color: rgba(120,80,160,0.6); color: #a87fd4; }
        #mode-select-btn.mode-agent             { border-color: rgba(80,140,200,0.6); color: #7cb9e8; }

-        #mode-icon  { font-size: 1rem; line-height: 1; }
+        #mode-icon  { display: flex; align-items: center; }
        .mode-arrow { font-size: 0.55rem; color: var(--muted); margin-left: 2px; opacity: 0.5; }

        /* Dropdown — opens upward; MRU at bottom = closest to button */
@@ -573,7 +674,7 @@
        }
        .mode-option:hover           { background: var(--border); color: var(--text); }
        .mode-option.current         { color: var(--text); font-weight: 500; }
-        .mode-option .opt-icon       { font-size: 1rem; line-height: 1; }
+        .mode-option .opt-icon       { display: flex; align-items: center; }
        .mode-option .opt-check      { margin-left: auto; font-size: 0.7rem; opacity: 0.7; }

        /* Note visibility sub-button — shown below mode-select when note is active */
@@ -630,6 +731,10 @@

        /* Send button */
        #send {
+            display: flex;
+            align-items: center;
+            justify-content: center;
+            gap: 6px;
            background: var(--user-bg);
            border: 1px solid var(--user-border);
            color: var(--text);
@@ -649,11 +754,14 @@
        /* Stop button */
        #stop {
            display: none;
+            align-items: center;
+            justify-content: center;
+            gap: 6px;
            background: var(--error-bg);
            border: 1px solid var(--error-border);
            color: var(--error-text);
            border-radius: 8px;
-            padding: 10px 0;
+            padding: 10px 14px;
            cursor: pointer;
            font-size: 0.9rem;
            text-align: center;
@@ -699,6 +807,9 @@
        .msg-wrapper:hover .msg-actions     { opacity: 1; }

        .msg-act-btn {
+            display: inline-flex;
+            align-items: center;
+            gap: 4px;
            background: none;
            border: 1px solid var(--border);
            border-radius: 4px;
@@ -736,6 +847,9 @@
        }

        .edit-save-btn, .edit-cancel-btn {
+            display: inline-flex;
+            align-items: center;
+            gap: 4px;
            background: none;
            border: 1px solid var(--border);
            border-radius: 4px;
@@ -783,22 +897,12 @@
            flex-shrink: 0;
        }

-        #file-modal-header select {
-            background: var(--surface);
-            border: 1px solid var(--border);
-            border-radius: 5px;
-            color: var(--text);
-            font-size: 0.85rem;
-            padding: 4px 8px;
-            cursor: pointer;
-        }
-
        #file-modal-title {
            font-size: 0.9rem;
            font-weight: 600;
            color: var(--accent);
-            flex: 1;
        }
+        .fm-spacer { flex: 1; }

        .fm-btn {
            background: var(--bg);
@@ -814,13 +918,153 @@
        .fm-btn.active { color: var(--accent); border-color: var(--accent); }
        .fm-btn.save   { color: var(--accent); border-color: var(--inara-border); }
        .fm-btn.save:hover { background: var(--inara-bg); }
-        #file-saved-msg {
-            font-size: 0.75rem;
-            color: #6abf6a;
-            opacity: 0;
-            transition: opacity 0.3s;
+        #file-modal-content {
+            flex: 1;
+            display: flex;
+            overflow: hidden;
+        }
+
+        /* ── File sidebar ── */
+        #file-sidebar-wrap {
+            width: 190px;
+            flex-shrink: 0;
+            border-right: 1px solid var(--border);
+            display: flex;
+            flex-direction: column;
+            background: var(--bg);
+        }
+        #file-sidebar {
+            flex: 1;
+            overflow-y: auto;
+        }
+
+        /* ── Session search (within sidebar) ── */
+        #session-search-wrap {
+            border-top: 1px solid var(--border);
+            padding: 8px 8px 10px;
+        }
+        #session-search-label {
+            font-size: 0.65rem;
+            font-weight: 700;
+            text-transform: uppercase;
+            letter-spacing: 0.06em;
+            color: var(--muted);
+            margin-bottom: 5px;
+        }
+        #session-search-row {
+            display: flex;
+            gap: 4px;
+        }
+        #session-search-input {
+            flex: 1;
+            min-width: 0;
+            background: var(--surface);
+            border: 1px solid var(--border);
+            border-radius: 4px;
+            color: var(--text);
+            font-size: 0.78rem;
+            padding: 3px 6px;
+        }
+        #session-search-btn {
+            background: var(--surface);
+            border: 1px solid var(--border);
+            border-radius: 4px;
+            color: var(--muted);
+            font-size: 0.78rem;
+            padding: 3px 8px;
+            cursor: pointer;
+        }
+        #session-search-btn:hover { color: var(--accent); border-color: var(--accent); }
+
+        /* ── Session search results panel ── */
+        #session-search-results {
+            flex: 1;
+            overflow-y: auto;
+            padding: 12px 14px;
+            font-size: 0.82rem;
+        }
+        .sr-header { color: var(--muted); font-size: 0.72rem; margin-bottom: 10px; }
+        .sr-date {
+            font-size: 0.7rem;
+            font-weight: 700;
+            text-transform: uppercase;
+            letter-spacing: 0.05em;
+            color: var(--accent);
+            margin: 14px 0 4px;
+        }
+        .sr-date:first-child, .sr-header + .sr-date { margin-top: 0; }
+        .sr-excerpt {
+            background: var(--surface);
+            border-left: 2px solid var(--border);
+            border-radius: 0 4px 4px 0;
+            padding: 6px 10px;
+            margin-bottom: 6px;
+            line-height: 1.5;
+            white-space: pre-wrap;
+            overflow-wrap: anywhere;
+            color: var(--text);
+        }
+        .sr-excerpt mark {
+            background: rgba(139,92,246,0.25);
+            color: var(--accent);
+            border-radius: 2px;
+            padding: 0 1px;
+        }
+        .sr-empty, .sr-error { color: var(--muted); padding: 8px 0; }
+
+        .fg-header {
+            display: flex;
+            align-items: center;
+            gap: 0.3rem;
+            padding: 7px 10px 5px;
+            font-size: 0.68rem;
+            font-weight: 700;
+            text-transform: uppercase;
+            letter-spacing: 0.06em;
+            color: var(--muted);
+            cursor: pointer;
+            user-select: none;
+        }
+        .fg-header::before {
+            content: '▾';
+            font-size: 0.7rem;
+            transition: transform 0.15s;
+        }
+        .fg-header.collapsed::before { transform: rotate(-90deg); }
+        .fg-header.collapsed + .fg-items { display: none; }
+
+        .fg-items { display: flex; flex-direction: column; }
+
+        .file-item {
+            padding: 6px 10px 6px 16px;
+            cursor: pointer;
+            border-left: 2px solid transparent;
+            transition: background 0.1s, border-color 0.1s;
+        }
+        .file-item:hover { background: var(--surface); }
+        .file-item.active {
+            background: var(--inara-bg);
+            border-left-color: var(--accent);
+        }
+        .file-item.missing { opacity: 0.45; }
+
+        .fi-name {
+            font-size: 0.8rem;
+            color: var(--text);
+            font-weight: 500;
+            white-space: nowrap;
+            overflow: hidden;
+            text-overflow: ellipsis;
+        }
+        .file-item.active .fi-name { color: var(--accent); }
+
+        .fi-meta {
+            display: flex;
+            gap: 0.5rem;
+            margin-top: 2px;
+            font-size: 0.68rem;
+            color: var(--muted);
        }
-        #file-saved-msg.show { opacity: 1; }

        #file-modal-body {
            flex: 1;
@@ -911,9 +1155,14 @@
            cursor: pointer;
            transition: color 0.15s, border-color 0.15s, background 0.15s;
        }
-        .ctx-btn:hover  { color: var(--text); border-color: var(--muted); }
-        .ctx-btn.active { color: var(--accent); border-color: var(--accent); }
-        .ctx-btn.mem-on { color: var(--success); border-color: var(--success-dim); }
+        .ctx-btn:hover    { color: var(--text); border-color: var(--muted); }
+        .ctx-btn.active   { color: var(--accent); border-color: var(--accent); }
+        .ctx-btn.mem-on   { color: var(--success); border-color: var(--success-dim); }
+        .ctx-btn.local-on { color: #f59e0b; border-color: #92400e; }
+        #backend-model-hint {
+            font-size: 0.68rem; color: #f59e0b; opacity: 0.8;
+            margin-top: 4px; word-break: break-all; line-height: 1.3;
+        }

        #ctx-distill-status {
            margin-top: 6px;
@@ -1149,6 +1398,48 @@

        #auth-banner-close:hover { opacity: 1; }

+        /* ── Toasts ──────────────────────────────────────────────── */
+        #toast-container {
+            position: fixed;
+            bottom: 1.25rem;
+            right: 1.25rem;
+            display: flex;
+            flex-direction: column;
+            align-items: flex-end;
+            gap: 0.4rem;
+            z-index: 9999;
+            pointer-events: none;
+        }
+        .toast {
+            padding: 0.45rem 0.85rem;
+            border-radius: 6px;
+            font-size: 0.8rem;
+            font-weight: 500;
+            color: #fff;
+            background: #334155;
+            border: 1px solid #475569;
+            box-shadow: 0 4px 12px rgba(0,0,0,0.35);
+            opacity: 0;
+            transform: translateY(6px);
+            transition: opacity 0.18s ease, transform 0.18s ease;
+            pointer-events: none;
+            white-space: nowrap;
+        }
+        .toast.show { opacity: 1; transform: translateY(0); }
+        .toast.success { background: #14532d; border-color: #16a34a; }
+        .toast.error   { background: #7f1d1d; border-color: #dc2626; }
+
+        /* Sessions backdrop — hidden by default, visible only as mobile drawer overlay */
+        #sessions-backdrop {
+            display: none;
+            position: fixed;
+            inset: 0;
+            background: rgba(0, 0, 0, 0.5);
+            z-index: 98;
+            animation: backdrop-in 0.2s ease;
+        }
+        @keyframes backdrop-in { from { opacity: 0; } to { opacity: 1; } }
+
        /* ── Mobile responsive ───────────────────────────────────── */
        @media (max-width: 520px) {
            header { padding: 8px 12px; gap: 8px; }
@@ -1209,6 +1500,36 @@

            /* Larger touch targets */
            #send, #stop { padding: 12px 14px; font-size: 1rem; }
+
+            /* File modal: sidebar collapses to a narrow strip */
+            #file-modal-inner { width: 100vw; height: 100dvh; border-radius: 0; }
+            #file-sidebar-wrap { width: 130px; }
+            .fi-meta { display: none; }
+
+            /* Sessions backdrop active on mobile */
+            #sessions-backdrop.open { display: block; }
+
+            /* Sessions panel → full-height drawer sliding in from the right */
+            #sessions-panel {
+                display: block !important; /* keep rendered so transition works */
+                position: fixed;
+                top: 0;
+                right: 0;
+                bottom: 0;
+                width: min(300px, 85vw);
+                max-height: none;
+                height: 100%;
+                border-radius: 0;
+                border-top: none;
+                border-right: none;
+                border-bottom: none;
+                border-left: 1px solid var(--border);
+                transform: translateX(110%);
+                transition: transform 0.25s ease;
+                z-index: 99;
+                overflow-y: auto;
+            }
+            #sessions-panel.open { transform: translateX(0); }
        }

        /* ── Touch devices — no hover capability ─────────────────── */
--- a/cortex/tests/test_model_registry.py
+++ b/cortex/tests/test_model_registry.py
@@ -0,0 +1,805 @@
+"""
+Unit tests for model_registry.py — no HTTP, no LLM calls, no running service.
+
+All file I/O is redirected to tmp_path via patch.object(config.settings, "home_dir", ...).
+
+Coverage:
+  - Empty registry (no files)
+  - Save/load round-trip
+  - Migration from local_llm.json (v0 flat and v1 hosts/models)
+  - Host CRUD
+  - Model CRUD (including role reference cleanup on remove)
+  - Role assignment (set_role, validation)
+  - Model resolution (_resolve_model: built-ins, local_openai, missing host/model)
+  - get_model_for_role: registry chain → .env fallback → hardcoded fallback
+  - get_best_local_model: role chain, first-local fallback, no-local case
+  - Backup chain: skips missing models, returns next valid
+"""
+
+import json
+import pytest
+from pathlib import Path
+from unittest.mock import patch
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+def _home(tmp_path: Path) -> Path:
+    """Create a minimal home directory and return the root."""
+    root = tmp_path / "home"
+    root.mkdir()
+    return root
+
+
+def _user_dir(home: Path, username: str = "scott") -> Path:
+    d = home / username
+    d.mkdir(exist_ok=True)
+    return d
+
+
+def _write_registry(home: Path, data: dict, username: str = "scott") -> Path:
+    _user_dir(home, username)
+    path = home / username / "model_registry.json"
+    path.write_text(json.dumps(data))
+    return path
+
+
+def _write_local_llm(home: Path, data: dict, username: str = "scott") -> Path:
+    _user_dir(home, username)
+    path = home / username / "local_llm.json"
+    path.write_text(json.dumps(data))
+    return path
+
+
+def _read_registry(home: Path, username: str = "scott") -> dict:
+    path = home / username / "model_registry.json"
+    return json.loads(path.read_text())
+
+
+# ---------------------------------------------------------------------------
+# Empty / fresh state
+# ---------------------------------------------------------------------------
+
+def test_empty_registry_no_files(tmp_path):
+    """With no files, _load returns an empty structure."""
+    home = _home(tmp_path)
+    _user_dir(home)
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        data = reg._load("scott")
+    assert data["version"] == 1
+    assert data["hosts"] == []
+    assert data["models"] == []
+    assert data["roles"] == {}
+
+
+def test_empty_registry_missing_user_dir(tmp_path):
+    """Even with no user dir, _load returns an empty structure gracefully."""
+    home = _home(tmp_path)
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        data = reg._load("nobody")
+    assert data["hosts"] == []
+
+
+# ---------------------------------------------------------------------------
+# Save / load round-trip
+# ---------------------------------------------------------------------------
+
+def test_save_and_load(tmp_path):
+    home = _home(tmp_path)
+    _user_dir(home)
+    import config
+    import model_registry as reg
+
+    registry = {
+        "version": 1,
+        "hosts": [{"id": "h1", "label": "ML Box", "api_url": "http://10.0.0.1:3000", "api_key": "sk-test"}],
+        "models": [{"id": "m1", "type": "local_openai", "label": "Gemma Small",
+                    "model_name": "gemma4:e4b", "host_id": "h1", "context_k": 72, "tags": ["fast"]}],
+        "roles": {"chat": {"primary": "m1"}},
+    }
+    with patch.object(config.settings, "home_dir", home):
+        reg._save("scott", registry)
+        loaded = reg._load("scott")
+
+    assert loaded["hosts"][0]["label"] == "ML Box"
+    assert loaded["models"][0]["model_name"] == "gemma4:e4b"
+    assert loaded["roles"]["chat"]["primary"] == "m1"
+
+
+def test_corrupt_registry_falls_back_to_empty(tmp_path):
+    home = _home(tmp_path)
+    path = _user_dir(home) / "model_registry.json"
+    path.write_text("{bad json{{")
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        data = reg._load("scott")
+    assert data["hosts"] == []
+
+
+# ---------------------------------------------------------------------------
+# Migration from local_llm.json
+# ---------------------------------------------------------------------------
+
+def test_migrate_v1_hosts_models(tmp_path):
+    """v1 local_llm.json (hosts/models/active_model_id) migrates correctly."""
+    home = _home(tmp_path)
+    _write_local_llm(home, {
+        "hosts": [{"id": "h1", "label": "Home", "api_url": "http://10.0.0.1:3000", "api_key": "sk-1"}],
+        "models": [
+            {"id": "m1", "host_id": "h1", "label": "Gemma Small", "model_name": "gemma4:e4b"},
+            {"id": "m2", "host_id": "h1", "label": "Gemma Med",   "model_name": "gemma4:26b"},
+        ],
+        "active_model_id": "m1",
+    })
+
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        data = reg._load("scott")
+
+    assert len(data["hosts"]) == 1
+    assert data["hosts"][0]["api_url"] == "http://10.0.0.1:3000"
+    assert len(data["models"]) == 2
+    assert all(m["type"] == "local_openai" for m in data["models"])
+    # active_model_id → roles.chat.primary
+    assert data["roles"].get("chat", {}).get("primary") == "m1"
+
+
+def test_migrate_v1_no_active_model(tmp_path):
+    """Migration with active_model_id=null: chat role stays unset."""
+    home = _home(tmp_path)
+    _write_local_llm(home, {
+        "hosts": [{"id": "h1", "label": "Box", "api_url": "http://10.0.0.1:3000", "api_key": ""}],
+        "models": [{"id": "m1", "host_id": "h1", "label": "Model", "model_name": "llama3"}],
+        "active_model_id": None,
+    })
+
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        data = reg._load("scott")
+
+    assert "chat" not in data["roles"] or data["roles"]["chat"].get("primary") is None
+
+
+def test_migrate_v0_flat_format(tmp_path):
+    """v0 flat local_llm.json is wrapped into hosts/models structure."""
+    home = _home(tmp_path)
+    _write_local_llm(home, {
+        "api_url": "http://10.0.0.2:3000",
+        "api_key": "sk-flat",
+        "model": "qwen3:8b",
+    })
+
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        data = reg._load("scott")
+
+    assert len(data["hosts"]) == 1
+    assert data["hosts"][0]["api_url"] == "http://10.0.0.2:3000"
+    assert len(data["models"]) == 1
+    assert data["models"][0]["model_name"] == "qwen3:8b"
+
+
+def test_migrate_v0_empty_url_returns_empty(tmp_path):
+    """v0 with no api_url and no .env fallback → nothing to migrate, empty registry."""
+    home = _home(tmp_path)
+    _write_local_llm(home, {"api_url": "", "api_key": "", "model": ""})
+
+    import config
+    import model_registry as reg
+    with (
+        patch.object(config.settings, "home_dir", home),
+        patch.object(config.settings, "local_api_url", ""),   # ensure no .env fallback
+        patch.object(config.settings, "local_model", ""),
+    ):
+        data = reg._load("scott")
+
+    assert data["hosts"] == []
+    assert data["models"] == []
+
+
+def test_migrate_v1_distill_local_sets_role(tmp_path):
+    """When DISTILL_BACKEND_MID=local and active model exists, distill role is set."""
+    home = _home(tmp_path)
+    _write_local_llm(home, {
+        "hosts": [{"id": "h1", "label": "Box", "api_url": "http://10.0.0.1:3000", "api_key": ""}],
+        "models": [{"id": "m1", "host_id": "h1", "label": "G", "model_name": "gemma4:e4b"}],
+        "active_model_id": "m1",
+    })
+
+    import config
+    import model_registry as reg
+    with (
+        patch.object(config.settings, "home_dir", home),
+        patch.object(config.settings, "distill_backend_mid", "local"),
+    ):
+        data = reg._load("scott")
+
+    assert data["roles"].get("distill", {}).get("primary") == "m1"
+
+
+def test_migration_saves_registry_file(tmp_path):
+    """After migration, model_registry.json is written so next load skips migration."""
+    home = _home(tmp_path)
+    _write_local_llm(home, {
+        "hosts": [{"id": "h1", "label": "Box", "api_url": "http://10.0.0.1:3000", "api_key": ""}],
+        "models": [],
+        "active_model_id": None,
+    })
+
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        reg._load("scott")  # triggers migration + save
+        # Second load should read model_registry.json, not re-run migration
+        data2 = reg._load("scott")
+
+    assert (home / "scott" / "model_registry.json").exists()
+    assert data2["version"] == 1
+
+
+# ---------------------------------------------------------------------------
+# Built-in model resolution
+# ---------------------------------------------------------------------------
+
+def test_builtin_claude_cli(tmp_path):
+    home = _home(tmp_path)
+    _user_dir(home)
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        result = reg._resolve_model(reg._empty(), "claude_cli")
+    assert result is not None
+    assert result["type"] == "claude_cli"
+    assert result["id"] == "claude_cli"
+
+
+def test_builtin_gemini_api(tmp_path):
+    home = _home(tmp_path)
+    _user_dir(home)
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        result = reg._resolve_model(reg._empty(), "gemini_api")
+    assert result["type"] == "gemini_api"
+
+
+def test_builtin_gemini_cli(tmp_path):
+    home = _home(tmp_path)
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        result = reg._resolve_model(reg._empty(), "gemini_cli")
+    assert result["type"] == "gemini_cli"
+
+
+def test_builtin_unknown_returns_none(tmp_path):
+    home = _home(tmp_path)
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        result = reg._resolve_model(reg._empty(), "does_not_exist")
+    assert result is None
+
+
+# ---------------------------------------------------------------------------
+# User model resolution
+# ---------------------------------------------------------------------------
+
+def test_resolve_local_openai_merges_host(tmp_path):
+    """local_openai model gets api_url and api_key merged from its host."""
+    home = _home(tmp_path)
+    registry = {
+        "version": 1,
+        "hosts": [{"id": "h1", "label": "Box", "api_url": "http://10.0.0.1:3000", "api_key": "sk-test"}],
+        "models": [{"id": "m1", "type": "local_openai", "label": "G", "model_name": "gemma4:e4b",
+                    "host_id": "h1", "context_k": 72, "tags": []}],
+        "roles": {},
+    }
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        result = reg._resolve_model(registry, "m1")
+    assert result["api_url"] == "http://10.0.0.1:3000"
+    assert result["api_key"] == "sk-test"
+    assert result["model_name"] == "gemma4:e4b"
+
+
+def test_resolve_local_openai_missing_host_returns_none(tmp_path):
+    """A model pointing to a non-existent host_id returns None."""
+    home = _home(tmp_path)
+    registry = {
+        "version": 1, "hosts": [], "roles": {},
+        "models": [{"id": "m1", "type": "local_openai", "host_id": "missing",
+                    "label": "X", "model_name": "x", "context_k": 0, "tags": []}],
+    }
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        result = reg._resolve_model(registry, "m1")
+    assert result is None
+
+
+def test_resolve_unknown_model_id_returns_none(tmp_path):
+    home = _home(tmp_path)
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        result = reg._resolve_model(reg._empty(), "no_such_model")
+    assert result is None
+
+
+# ---------------------------------------------------------------------------
+# get_model_for_role
+# ---------------------------------------------------------------------------
+
+def test_get_model_for_role_uses_registry(tmp_path):
+    """Registry primary assignment is returned first."""
+    home = _home(tmp_path)
+    _write_registry(home, {
+        "version": 1,
+        "hosts": [{"id": "h1", "label": "Box", "api_url": "http://10.0.0.1:3000", "api_key": ""}],
+        "models": [{"id": "m1", "type": "local_openai", "label": "G",
+                    "model_name": "gemma4:e4b", "host_id": "h1", "context_k": 72, "tags": []}],
+        "roles": {"chat": {"primary": "m1"}},
+    })
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        result = reg.get_model_for_role("scott", "chat")
+    assert result["model_name"] == "gemma4:e4b"
+    assert result["api_url"] == "http://10.0.0.1:3000"
+
+
+def test_get_model_for_role_uses_builtin_from_registry(tmp_path):
+    """Registry can assign built-in IDs (claude_cli, gemini_api, etc.)."""
+    home = _home(tmp_path)
+    _write_registry(home, {
+        "version": 1, "hosts": [], "models": [],
+        "roles": {"chat": {"primary": "claude_cli"}},
+    })
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        result = reg.get_model_for_role("scott", "chat")
+    assert result["type"] == "claude_cli"
+
+
+def test_get_model_for_role_skips_missing_primary(tmp_path):
+    """If primary model_id is not found, falls through to backup_1."""
+    home = _home(tmp_path)
+    _write_registry(home, {
+        "version": 1,
+        "hosts": [{"id": "h1", "label": "Box", "api_url": "http://10.0.0.1:3000", "api_key": ""}],
+        "models": [{"id": "m2", "type": "local_openai", "label": "Backup",
+                    "model_name": "llama3:8b", "host_id": "h1", "context_k": 8, "tags": []}],
+        "roles": {"chat": {"primary": "gone", "backup_1": "m2"}},
+    })
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        result = reg.get_model_for_role("scott", "chat")
+    assert result["model_name"] == "llama3:8b"
+
+
+def test_get_model_for_role_env_fallback(tmp_path):
+    """No registry entry for role → falls back to .env setting."""
+    home = _home(tmp_path)
+    _user_dir(home)
+    import config
+    import model_registry as reg
+    with (
+        patch.object(config.settings, "home_dir", home),
+        patch.object(config.settings, "role_chat", "gemini_cli"),
+    ):
+        result = reg.get_model_for_role("scott", "chat")
+    assert result["type"] == "gemini_cli"
+
+
+def test_get_model_for_role_hardcoded_fallback(tmp_path):
+    """No registry + no .env for role → hardcoded last resort."""
+    home = _home(tmp_path)
+    _user_dir(home)
+    import config
+    import model_registry as reg
+    # Clear the .env default for 'chat' to simulate unset
+    with (
+        patch.object(config.settings, "home_dir", home),
+        patch.object(config.settings, "role_chat", ""),
+    ):
+        result = reg.get_model_for_role("scott", "chat")
+    # claude_cli is the hardcoded last resort for 'chat'
+    assert result["type"] == "claude_cli"
+
+
+def test_get_model_for_role_custom_role(tmp_path):
+    """Custom roles not in DEFINED_ROLES can still be assigned and resolved."""
+    home = _home(tmp_path)
+    _write_registry(home, {
+        "version": 1, "hosts": [], "models": [],
+        "roles": {"therapy": {"primary": "gemini_api"}},
+    })
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        result = reg.get_model_for_role("scott", "therapy")
+    assert result["type"] == "gemini_api"
+
+
+def test_get_model_for_role_full_backup_chain(tmp_path):
+    """Walks the entire priority chain before falling back."""
+    home = _home(tmp_path)
+    _write_registry(home, {
+        "version": 1,
+        "hosts": [{"id": "h1", "label": "Box", "api_url": "http://10.0.0.1:3000", "api_key": ""}],
+        "models": [{"id": "m4", "type": "local_openai", "label": "Last",
+                    "model_name": "tiny:1b", "host_id": "h1", "context_k": 4, "tags": []}],
+        "roles": {"chat": {
+            "primary":  "gone1",
+            "backup_1": "gone2",
+            "backup_2": "gone3",
+            "backup_3": "gone4",
+            "backup_4": "m4",
+        }},
+    })
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        result = reg.get_model_for_role("scott", "chat")
+    assert result["model_name"] == "tiny:1b"
+
+
+# ---------------------------------------------------------------------------
+# get_best_local_model
+# ---------------------------------------------------------------------------
+
+def test_get_best_local_prefers_role_chain(tmp_path):
+    """Returns the first local_openai model in the chat role chain."""
+    home = _home(tmp_path)
+    _write_registry(home, {
+        "version": 1,
+        "hosts": [{"id": "h1", "label": "Box", "api_url": "http://10.0.0.1:3000", "api_key": ""}],
+        "models": [
+            {"id": "m1", "type": "local_openai", "label": "Preferred",
+             "model_name": "gemma4:e4b", "host_id": "h1", "context_k": 72, "tags": []},
+        ],
+        "roles": {"chat": {"primary": "claude_cli", "backup_1": "m1"}},
+    })
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        # primary is claude_cli (not local), backup_1 is m1 (local)
+        result = reg.get_best_local_model("scott", "chat")
+    assert result["model_name"] == "gemma4:e4b"
+
+
+def test_get_best_local_falls_back_to_first_model(tmp_path):
+    """No local model in role chain → returns first configured local model."""
+    home = _home(tmp_path)
+    _write_registry(home, {
+        "version": 1,
+        "hosts": [{"id": "h1", "label": "Box", "api_url": "http://10.0.0.1:3000", "api_key": ""}],
+        "models": [
+            {"id": "m1", "type": "local_openai", "label": "G",
+             "model_name": "gemma4:e4b", "host_id": "h1", "context_k": 72, "tags": []},
+        ],
+        "roles": {},  # no chat role assigned
+    })
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        result = reg.get_best_local_model("scott", "chat")
+    assert result["model_name"] == "gemma4:e4b"
+
+
+def test_get_best_local_returns_none_when_no_local_models(tmp_path):
+    """No local_openai models configured → returns None."""
+    home = _home(tmp_path)
+    _write_registry(home, {
+        "version": 1, "hosts": [], "models": [],
+        "roles": {"chat": {"primary": "claude_cli"}},
+    })
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        result = reg.get_best_local_model("scott", "chat")
+    assert result is None
+
+
+# ---------------------------------------------------------------------------
+# Host CRUD
+# ---------------------------------------------------------------------------
+
+def test_save_host_creates_new(tmp_path):
+    home = _home(tmp_path)
+    _user_dir(home)
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        host_id = reg.save_host("scott", None, "ML Box", "http://10.0.0.1:3000", "sk-abc")
+        data = reg._load("scott")
+    assert len(data["hosts"]) == 1
+    assert data["hosts"][0]["id"] == host_id
+    assert data["hosts"][0]["label"] == "ML Box"
+    assert data["hosts"][0]["api_key"] == "sk-abc"
+
+
+def test_save_host_updates_existing(tmp_path):
+    home = _home(tmp_path)
+    _write_registry(home, {
+        "version": 1,
+        "hosts": [{"id": "h1", "label": "Old Label", "api_url": "http://10.0.0.1:3000", "api_key": "sk-old"}],
+        "models": [], "roles": {},
+    })
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        reg.save_host("scott", "h1", "New Label", "http://10.0.0.2:3000", "")
+        data = reg._load("scott")
+    assert len(data["hosts"]) == 1
+    assert data["hosts"][0]["label"] == "New Label"
+    assert data["hosts"][0]["api_url"] == "http://10.0.0.2:3000"
+    # Empty api_key → existing key preserved
+    assert data["hosts"][0]["api_key"] == "sk-old"
+
+
+def test_save_host_unknown_id_creates_new(tmp_path):
+    """Passing a host_id that doesn't exist creates a new host instead of crashing."""
+    home = _home(tmp_path)
+    _write_registry(home, {"version": 1, "hosts": [], "models": [], "roles": {}})
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        reg.save_host("scott", "ghost-id", "New", "http://10.0.0.3:3000", "")
+        data = reg._load("scott")
+    assert len(data["hosts"]) == 1
+
+
+def test_remove_host_also_removes_models(tmp_path):
+    home = _home(tmp_path)
+    _write_registry(home, {
+        "version": 1,
+        "hosts": [{"id": "h1", "label": "Box", "api_url": "http://10.0.0.1:3000", "api_key": ""}],
+        "models": [{"id": "m1", "type": "local_openai", "host_id": "h1",
+                    "label": "G", "model_name": "gemma4:e4b", "context_k": 72, "tags": []}],
+        "roles": {"chat": {"primary": "m1"}},
+    })
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        found = reg.remove_host("scott", "h1")
+        data = reg._load("scott")
+    assert found is True
+    assert data["hosts"] == []
+    assert data["models"] == []
+
+
+def test_remove_host_not_found_returns_false(tmp_path):
+    home = _home(tmp_path)
+    _write_registry(home, {"version": 1, "hosts": [], "models": [], "roles": {}})
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        found = reg.remove_host("scott", "nope")
+    assert found is False
+
+
+# ---------------------------------------------------------------------------
+# Model CRUD
+# ---------------------------------------------------------------------------
+
+def test_save_model_creates(tmp_path):
+    home = _home(tmp_path)
+    _write_registry(home, {
+        "version": 1,
+        "hosts": [{"id": "h1", "label": "Box", "api_url": "http://10.0.0.1:3000", "api_key": ""}],
+        "models": [], "roles": {},
+    })
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        model_id = reg.save_model("scott", None, "h1", "Gemma Small", "gemma4:e4b", 72, ["fast", "distill"])
+        data = reg._load("scott")
+    assert len(data["models"]) == 1
+    assert data["models"][0]["id"] == model_id
+    assert data["models"][0]["context_k"] == 72
+    assert data["models"][0]["tags"] == ["fast", "distill"]
+    assert data["models"][0]["type"] == "local_openai"
+
+
+def test_save_model_updates_existing(tmp_path):
+    home = _home(tmp_path)
+    _write_registry(home, {
+        "version": 1,
+        "hosts": [{"id": "h1", "label": "Box", "api_url": "http://10.0.0.1:3000", "api_key": ""}],
+        "models": [{"id": "m1", "type": "local_openai", "label": "Old",
+                    "model_name": "llama3", "host_id": "h1", "context_k": 8, "tags": []}],
+        "roles": {},
+    })
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        reg.save_model("scott", "m1", "h1", "New Label", "llama3:latest", 128, ["updated"])
+        data = reg._load("scott")
+    assert len(data["models"]) == 1
+    assert data["models"][0]["label"] == "New Label"
+    assert data["models"][0]["context_k"] == 128
+
+
+def test_remove_model_clears_role_refs(tmp_path):
+    """Removing a model clears it from any role assignments."""
+    home = _home(tmp_path)
+    _write_registry(home, {
+        "version": 1,
+        "hosts": [{"id": "h1", "label": "Box", "api_url": "http://10.0.0.1:3000", "api_key": ""}],
+        "models": [{"id": "m1", "type": "local_openai", "label": "G",
+                    "model_name": "gemma4:e4b", "host_id": "h1", "context_k": 72, "tags": []}],
+        "roles": {
+            "chat":   {"primary": "m1", "backup_1": "m1"},
+            "distill": {"primary": "claude_cli", "backup_1": "m1"},
+        },
+    })
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        found = reg.remove_model("scott", "m1")
+        data = reg._load("scott")
+    assert found is True
+    assert data["models"] == []
+    assert data["roles"]["chat"].get("primary") is None
+    assert data["roles"]["chat"].get("backup_1") is None
+    assert data["roles"]["distill"].get("backup_1") is None
+    # claude_cli assignment preserved
+    assert data["roles"]["distill"]["primary"] == "claude_cli"
+
+
+def test_remove_model_not_found_returns_false(tmp_path):
+    home = _home(tmp_path)
+    _write_registry(home, {"version": 1, "hosts": [], "models": [], "roles": {}})
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        found = reg.remove_model("scott", "ghost")
+    assert found is False
+
+
+# ---------------------------------------------------------------------------
+# set_role
+# ---------------------------------------------------------------------------
+
+def test_set_role_assigns_model(tmp_path):
+    home = _home(tmp_path)
+    _write_registry(home, {
+        "version": 1,
+        "hosts": [{"id": "h1", "label": "Box", "api_url": "http://10.0.0.1:3000", "api_key": ""}],
+        "models": [{"id": "m1", "type": "local_openai", "label": "G",
+                    "model_name": "gemma4:e4b", "host_id": "h1", "context_k": 72, "tags": []}],
+        "roles": {},
+    })
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        ok = reg.set_role("scott", "chat", "primary", "m1")
+        data = reg._load("scott")
+    assert ok is True
+    assert data["roles"]["chat"]["primary"] == "m1"
+
+
+def test_set_role_assigns_builtin(tmp_path):
+    home = _home(tmp_path)
+    _write_registry(home, {"version": 1, "hosts": [], "models": [], "roles": {}})
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        ok = reg.set_role("scott", "orchestrator", "primary", "gemini_api")
+        data = reg._load("scott")
+    assert ok is True
+    assert data["roles"]["orchestrator"]["primary"] == "gemini_api"
+
+
+def test_set_role_clears_with_none(tmp_path):
+    home = _home(tmp_path)
+    _write_registry(home, {
+        "version": 1, "hosts": [], "models": [],
+        "roles": {"chat": {"primary": "claude_cli"}},
+    })
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        ok = reg.set_role("scott", "chat", "primary", None)
+        data = reg._load("scott")
+    assert ok is True
+    assert data["roles"]["chat"]["primary"] is None
+
+
+def test_set_role_invalid_slot_returns_false(tmp_path):
+    home = _home(tmp_path)
+    _write_registry(home, {"version": 1, "hosts": [], "models": [], "roles": {}})
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        ok = reg.set_role("scott", "chat", "backup_99", "claude_cli")
+    assert ok is False
+
+
+def test_set_role_unknown_model_id_returns_false(tmp_path):
+    home = _home(tmp_path)
+    _write_registry(home, {"version": 1, "hosts": [], "models": [], "roles": {}})
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        ok = reg.set_role("scott", "chat", "primary", "nonexistent_model")
+    assert ok is False
+
+
+def test_set_role_creates_role_key_if_missing(tmp_path):
+    """set_role on a role that isn't in roles{} yet creates it."""
+    home = _home(tmp_path)
+    _write_registry(home, {"version": 1, "hosts": [], "models": [], "roles": {}})
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        reg.set_role("scott", "medical", "primary", "claude_cli")
+        data = reg._load("scott")
+    assert data["roles"]["medical"]["primary"] == "claude_cli"
+
+
+# ---------------------------------------------------------------------------
+# get_defined_roles
+# ---------------------------------------------------------------------------
+
+def test_get_defined_roles_returns_registry_roles(tmp_path):
+    home = _home(tmp_path)
+    _write_registry(home, {
+        "version": 1, "hosts": [], "models": [],
+        "roles": {"chat": {"primary": "claude_cli"}, "distill": {}},
+    })
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        roles = reg.get_defined_roles("scott")
+    # Should include all settings.defined_roles, filling gaps with {}
+    for role in config.settings.get_defined_roles():
+        assert role in roles
+
+
+def test_get_defined_roles_fills_gaps(tmp_path):
+    """Roles in settings.defined_roles that aren't in registry get empty dicts."""
+    home = _home(tmp_path)
+    _write_registry(home, {"version": 1, "hosts": [], "models": [], "roles": {}})
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        roles = reg.get_defined_roles("scott")
+    assert "chat" in roles
+    assert roles["chat"] == {}
+
+
+# ---------------------------------------------------------------------------
+# Multi-user isolation
+# ---------------------------------------------------------------------------
+
+def test_registries_are_isolated_per_user(tmp_path):
+    """Each user has their own registry file — changes don't bleed across users."""
+    home = _home(tmp_path)
+    (home / "scott").mkdir()
+    (home / "holly").mkdir()
+
+    import config
+    import model_registry as reg
+    with patch.object(config.settings, "home_dir", home):
+        reg.save_host("scott", None, "Scott Host", "http://10.0.0.1:3000", "")
+        scott_data = reg._load("scott")
+        holly_data = reg._load("holly")
+
+    assert len(scott_data["hosts"]) == 1
+    assert holly_data["hosts"] == []
--- a/cortex/tools/init.py
+++ b/cortex/tools/init.py
@@ -28,6 +28,10 @@ from tools.cron import (
    cron_add as _cron_add,
    cron_remove as _cron_remove,
    cron_toggle as _cron_toggle,
+)
+from tools.reminders import (
+    reminders_add as _reminders_add,
+    reminders_list as _reminders_list,
    reminders_clear as _reminders_clear,
 )
 from tools.scratch import (
@@ -196,6 +200,8 @@ _CALLABLES: dict[str, callable] = {
    "cron_add": _cron_add,
    "cron_remove": _cron_remove,
    "cron_toggle": _cron_toggle,
+    "reminders_add": _reminders_add,
+    "reminders_list": _reminders_list,
    "reminders_clear": _reminders_clear,
    "scratch_read": _scratch_read,
    "scratch_write": _scratch_write,
@@ -409,6 +415,40 @@ _cron_toggle_declaration = types.FunctionDeclaration(
    ),
 )

+_reminders_add_declaration = types.FunctionDeclaration(
+    name="reminders_add",
+    description=(
+        "Add a new reminder to REMINDERS.md. Reminders are automatically surfaced "
+        "in your context at the start of each session (Tier 2+). "
+        "Use this when the user asks you to remember something, follow up on something, "
+        "or surface a note at the next session."
+    ),
+    parameters=types.Schema(
+        type=types.Type.OBJECT,
+        properties={
+            "text": types.Schema(
+                type=types.Type.STRING,
+                description="The reminder text to add",
+            ),
+            "label": types.Schema(
+                type=types.Type.STRING,
+                description="Optional heading for this reminder (e.g. 'Follow up on NC Talk'). Defaults to current timestamp.",
+            ),
+        },
+        required=["text"],
+    ),
+)
+
+_reminders_list_declaration = types.FunctionDeclaration(
+    name="reminders_list",
+    description=(
+        "Read all current pending reminders from REMINDERS.md. "
+        "Use this to check what reminders are queued before adding duplicates, "
+        "or to show the user what's pending."
+    ),
+    parameters=types.Schema(type=types.Type.OBJECT, properties={}),
+)
+
 _reminders_clear_declaration = types.FunctionDeclaration(
    name="reminders_clear",
    description=(
@@ -494,6 +534,8 @@ TOOL_DECLARATIONS = [
        _cron_add_declaration,
        _cron_remove_declaration,
        _cron_toggle_declaration,
+        _reminders_add_declaration,
+        _reminders_list_declaration,
        _reminders_clear_declaration,
        _scratch_read_declaration,
        _scratch_write_declaration,
--- a/cortex/tools/reminders.py
+++ b/cortex/tools/reminders.py
@@ -0,0 +1,69 @@
+"""
+Reminders tools.
+
+Reminders are stored in persona/REMINDERS.md and automatically surfaced
+in the system prompt at Tier 2+. Use these tools to add, list, and clear
+pending reminders.
+
+Operations:
+  reminders_add   — append a new reminder entry
+  reminders_list  — return all current reminders (or a message if empty)
+  reminders_clear — erase all reminders (moved here from cron.py for consistency;
+                    cron.py's copy still targets the same REMINDERS.md file)
+"""
+
+import asyncio
+from datetime import datetime, timezone
+from pathlib import Path
+
+from persona import persona_path
+
+
+def _reminders_path() -> Path:
+    return persona_path() / "REMINDERS.md"
+
+
+def _now_label() -> str:
+    return datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
+
+
+# ---------------------------------------------------------------------------
+# Sync implementations
+# ---------------------------------------------------------------------------
+
+def _reminders_list() -> str:
+    p = _reminders_path()
+    content = p.read_text() if p.exists() else ""
+    # An empty or whitespace-only file counts as "no reminders"; read the file once.
+    return content if content.strip() else "No pending reminders."
+
+
+def _reminders_add(text: str, label: str | None = None) -> str:
+    p = _reminders_path()
+    existing = p.read_text().rstrip() if p.exists() else ""
+    heading = (label or "").strip() or _now_label()
+    section = f"## {heading}\n\n{text.strip()}\n"
+    p.write_text(f"{existing}\n\n{section}" if existing else section)
+    return f"Reminder added: {heading}"
+
+
+def _reminders_clear() -> str:
+    p = _reminders_path()
+    p.write_text("")
+    return "All reminders cleared."
+
+
+# ---------------------------------------------------------------------------
+# Async wrappers
+# ---------------------------------------------------------------------------
+
+async def reminders_list() -> str:
+    return await asyncio.to_thread(_reminders_list)
+
+
+async def reminders_add(text: str, label: str | None = None) -> str:
+    return await asyncio.to_thread(_reminders_add, text, label)
+
+
+async def reminders_clear() -> str:
+    return await asyncio.to_thread(_reminders_clear)
--- a/cortex/user_settings.py
+++ b/cortex/user_settings.py
@@ -0,0 +1,194 @@
+"""
+Per-user settings stored in home/{user}/local_llm.json.
+
+Structure:
+  {
+    "hosts": [{"id", "label", "api_url", "api_key"}, ...],
+    "models": [{"id", "host_id", "label", "model_name"}, ...],
+    "active_model_id": "<model id>" | null
+  }
+
+Values not configured here fall back to .env server defaults.
+"""
+import json
+import logging
+import secrets
+from pathlib import Path
+
+from config import settings as app_settings
+
+logger = logging.getLogger(__name__)
+
+
+def _llm_path(username: str) -> Path:
+    return app_settings.home_root() / username / "local_llm.json"
+
+
+def _empty() -> dict:
+    return {"hosts": [], "models": [], "active_model_id": None}
+
+
+def _load(username: str) -> dict:
+    path = _llm_path(username)
+    if not path.exists():
+        return _empty()
+    try:
+        data = json.loads(path.read_text())
+    except (json.JSONDecodeError, OSError):
+        logger.warning("local_llm.json for %s is unreadable — starting fresh", username)
+        return _empty()
+
+    # Migrate old {api_url, api_key, model} format; persist so generated IDs stay stable.
+    if "hosts" not in data:
+        data = _migrate_v0(data)
+        _save(username, data)
+    return data
+
+
+def _migrate_v0(old: dict) -> dict:
+    """Migrate flat {api_url, api_key, model} → hosts/models structure."""
+    data = _empty()
+    api_url    = old.get("api_url")    or app_settings.local_api_url
+    api_key    = old.get("api_key")    or app_settings.local_api_key
+    model_name = old.get("model")      or app_settings.local_model
+
+    if not api_url:
+        return data
+
+    host_id = secrets.token_hex(4)
+    data["hosts"].append({
+        "id":      host_id,
+        "label":   "Local Model Server",
+        "api_url": api_url,
+        "api_key": api_key,
+    })
+
+    if model_name:
+        model_id = secrets.token_hex(4)
+        data["models"].append({
+            "id":         model_id,
+            "host_id":    host_id,
+            "label":      model_name,
+            "model_name": model_name,
+        })
+        data["active_model_id"] = model_id
+
+    logger.info("migrated local_llm.json v0 → v1 (host=%s)", host_id)
+    return data
+
+
+def _save(username: str, data: dict) -> None:
+    _llm_path(username).write_text(json.dumps(data, indent=2))
+
+
+# ── Public read API ───────────────────────────────────────────────────────────
+
+def get_config(username: str) -> dict:
+    """Return the full local LLM config for the user."""
+    return _load(username)
+
+
+def get_active_local_model(username: str) -> dict | None:
+    """Return effective {api_url, api_key, model_name, label} for the active model.
+
+    Resolution order:
+      1. User's active model + its host config
+      2. .env server defaults (LOCAL_API_URL / LOCAL_API_KEY / LOCAL_MODEL)
+      3. None — caller should raise a helpful error
+    """
+    data = _load(username)
+
+    active_id = data.get("active_model_id")
+    model = next((m for m in data["models"] if m["id"] == active_id), None)
+
+    if model:
+        host = next((h for h in data["hosts"] if h["id"] == model["host_id"]), None)
+        if host:
+            return {
+                "api_url":    host.get("api_url", ""),
+                "api_key":    host.get("api_key", ""),
+                "model_name": model["model_name"],
+                "label":      model.get("label") or model["model_name"],
+            }
+
+    # Fall back to .env defaults
+    if app_settings.local_api_url and app_settings.local_model:
+        return {
+            "api_url":    app_settings.local_api_url,
+            "api_key":    app_settings.local_api_key,
+            "model_name": app_settings.local_model,
+            "label":      app_settings.local_model,
+        }
+
+    return None
+
+
+# ── Host management ───────────────────────────────────────────────────────────
+
+def save_host(username: str, host_id: str | None,
+              label: str, api_url: str, api_key: str) -> str:
+    """Create or update a host. Returns the host ID.
+
+    api_key is only written when non-empty, so submitting a masked placeholder
+    with a blank key field leaves the stored key unchanged.
+    """
+    data = _load(username)
+
+    if host_id:
+        for h in data["hosts"]:
+            if h["id"] == host_id:
+                h["label"]   = label.strip()
+                h["api_url"] = api_url.strip()
+                if api_key.strip():
+                    h["api_key"] = api_key.strip()
+                break
+        else:
+            host_id = None  # ID not found — fall through to create
+
+    if not host_id:
+        host_id = secrets.token_hex(4)
+        data["hosts"].append({
+            "id":      host_id,
+            "label":   label.strip(),
+            "api_url": api_url.strip(),
+            "api_key": api_key.strip(),
+        })
+
+    _save(username, data)
+    return host_id
+
+
+# ── Model management ──────────────────────────────────────────────────────────
+
+def add_model(username: str, host_id: str, label: str, model_name: str) -> str:
+    """Add a model entry. Auto-activates if it is the first model. Returns the model ID."""
+    data = _load(username)
+    model_id = secrets.token_hex(4)
+    data["models"].append({
+        "id":         model_id,
+        "host_id":    host_id,
+        "label":      label.strip() or model_name.strip(),
+        "model_name": model_name.strip(),
+    })
+    if not data.get("active_model_id"):
+        data["active_model_id"] = model_id
+    _save(username, data)
+    return model_id
+
+
+def remove_model(username: str, model_id: str) -> None:
+    data = _load(username)
+    data["models"] = [m for m in data["models"] if m["id"] != model_id]
+    if data.get("active_model_id") == model_id:
+        data["active_model_id"] = data["models"][0]["id"] if data["models"] else None
+    _save(username, data)
+
+
+def set_active_model(username: str, model_id: str) -> bool:
+    """Set the active model. Returns False if the model ID is not found."""
+    data = _load(username)
+    if not any(m["id"] == model_id for m in data["models"]):
+        return False
+    data["active_model_id"] = model_id
+    _save(username, data)
+    return True
--- a/dev-restart.sh
+++ b/dev-restart.sh
@@ -0,0 +1,26 @@
+#!/usr/bin/env bash
+# dev-restart.sh — restart Cortex on the gaming laptop and tail logs
+# Usage:
+#   ./dev-restart.sh          restart and show last 30 log lines
+#   ./dev-restart.sh logs     tail live logs (ctrl-c to stop)
+#   ./dev-restart.sh status   show service status only
+
+# Hostname or IP of the machine running Cortex (e.g. "scott-lt-i7-rtx" or "192.168.32.19")
+CORTEX_HOST="scott-lt-i7-rtx"
+SERVICE="cortex"
+
+case "${1:-restart}" in
+  logs)
+    echo "→ Tailing $SERVICE logs on $CORTEX_HOST (ctrl-c to stop)"
+    ssh "$CORTEX_HOST" "journalctl --user -u $SERVICE -f --no-pager"
+    ;;
+  status)
+    ssh "$CORTEX_HOST" "systemctl --user status $SERVICE --no-pager -l"
+    ;;
+  restart|*)
+    echo "→ Restarting $SERVICE on $CORTEX_HOST …"
+    ssh "$CORTEX_HOST" "systemctl --user restart $SERVICE"
+    echo "→ Last 30 log lines:"
+    ssh "$CORTEX_HOST" "journalctl --user -u $SERVICE --no-pager -n 30"
+    ;;
+esac
--- a/docs/GOOGLE_CHAT_BOT.md
+++ b/docs/GOOGLE_CHAT_BOT.md
@@ -0,0 +1,100 @@
+# Google Chat Bot Integration
+
+Cortex connects to Google Chat as a **Workspace Add-on** — each Cortex user gets their own webhook endpoint routed to their chosen persona.
+
+**Status:** Live and confirmed working (2026-03-27)
+
+---
+
+## Prerequisites
+
+- A Google Cloud project with **Google Chat API** enabled
+- The Cortex server reachable at a public HTTPS URL
+- The user pre-registered in Cortex (`manage_passwords.py invite` or `google-add`)
+
+---
+
+## Per-User Setup
+
+### 1. Create the user's `channels.json`
+
+Create `home/{username}/channels.json` on the Cortex server:
+
+```json
+{
+  "google_chat": {
+    "persona": "inara",
+    "audience": "https://cortex.dgrzone.com/channels/google-chat/{username}",
+    "backend": "claude",
+    "timeout": 25
+  }
+}
+```
+
+- **`persona`** — which persona responds (must exist under `home/{username}/persona/`)
+- **`audience`** — must exactly match the HTTP endpoint URL you set in Google Cloud Console (Google uses this as the JWT `aud` claim)
+- **`backend`** — `"claude"` recommended; Google Chat requires a response within 30s
+- **`timeout`** — keep at 25 (Google's hard limit is 30s; this leaves a 5s buffer)
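Both channel integrations look up this file per request. A minimal sketch of that lookup, in which a missing file, unreadable JSON, or a missing channel key all yield `None` (which a webhook handler would turn into the 404 listed under Troubleshooting). `load_channel_config` is a hypothetical name, not Cortex's actual function, and `home` stands in for the Cortex home root.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_channel_config(home: Path, username: str, channel: str) -> dict | None:
    """Return the `channel` section of home/{username}/channels.json, or None.

    None covers a missing file, unreadable JSON, or a missing channel key;
    a webhook handler would map it to HTTP 404.
    """
    path = home / username / "channels.json"
    try:
        data = json.loads(path.read_text())
    except (OSError, json.JSONDecodeError):
        return None
    if not isinstance(data, dict):
        return None
    section = data.get(channel)
    return section if isinstance(section, dict) else None
```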
+
+### 2. Configure Google Chat API in Google Cloud Console
+
+1. Go to [console.cloud.google.com](https://console.cloud.google.com) and select the project
+2. **APIs & Services → Enabled APIs & services → Google Chat API**
+3. Click the **Configuration** tab
+4. Fill in **Application info:**
+   - App name: `Cortex` (or your persona name)
+   - Avatar URL: optional
+   - Description: optional
+5. Under **Interactive features:**
+   - Enable **"Join spaces and group conversations"** if you want the bot in group chats, or leave it off for DM-only
+6. Under **Connection settings:**
+   - Select **HTTP endpoint URL**
+   - Enter: `https://cortex.dgrzone.com/channels/google-chat/{username}`
+7. Under **Visibility:**
+   - Add the specific Google accounts that should be able to use this bot
+   - For One Sky IT Workspace users: add individuals or the whole domain
+8. Click **Save**
+
+> **Important:** The URL in step 6 must exactly match the `audience` value in `channels.json`. Google includes this URL as the JWT `aud` claim on every request, and Cortex rejects any request where they don't match.
+
+---
+
+## How It Works
+
+1. User sends a message in Google Chat → Google POSTs a signed JSON payload to `/channels/google-chat/{username}`
+2. Cortex reads the user's `channels.json`, verifies the JWT `systemIdToken` from `authorizationEventObject`
+3. Sets the persona context, builds the system prompt, calls the LLM
+4. Returns the response wrapped in `hostAppDataAction → chatDataAction → createMessageAction`
+
+The reply must be returned synchronously in the HTTP response. Unlike the NC Talk integration, Cortex does not post a follow-up message later, so Google's 30s limit applies and the 25s `timeout` is a hard budget.
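Step 4's envelope and the 25s budget can be sketched together. This is a minimal illustration, not Cortex's handler: `respond_within` and the fallback wording are hypothetical, and the sketch assumes the innermost `message` is a plain Chat message object carrying a `text` field.

```python
import asyncio
from typing import Awaitable, Callable


def build_chat_reply(text: str) -> dict:
    """Wrap reply text as hostAppDataAction → chatDataAction → createMessageAction."""
    return {
        "hostAppDataAction": {
            "chatDataAction": {
                "createMessageAction": {
                    "message": {"text": text},
                },
            },
        },
    }


async def respond_within(
    llm_call: Callable[[], Awaitable[str]], timeout: float = 25.0
) -> dict:
    """Await the LLM for at most `timeout` seconds, then wrap the result."""
    try:
        text = await asyncio.wait_for(llm_call(), timeout=timeout)
    except asyncio.TimeoutError:
        # Hypothetical fallback: the reply must still go out before Google's 30s cutoff.
        text = "Sorry, that took too long to answer. Please try again."
    return build_chat_reply(text)
```

Keeping the envelope in one small function means the timeout path and the normal path produce the same response shape.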
+
+---
+
+## JWT Verification
+
+Google Chat Workspace Add-ons send a `systemIdToken` in the request body at:
+`body["authorizationEventObject"]["systemIdToken"]`
+
+Claims verified by Cortex:
+- `iss` = `https://accounts.google.com`
+- `aud` = the value of `audience` in `channels.json`
+
+If `audience` is empty, verification is skipped. This is useful for local testing; never leave `audience` empty in production.
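The token extraction, the two claim checks, and the empty-`audience` bypass can be sketched in plain Python. This assumes the token's signature and expiry have already been verified and the claims decoded (for example with google-auth's `google.oauth2.id_token.verify_oauth2_token`); the helper names are hypothetical.

```python
from __future__ import annotations

GOOGLE_ISSUER = "https://accounts.google.com"


def extract_system_id_token(body: dict) -> str | None:
    """Return body["authorizationEventObject"]["systemIdToken"], or None if absent."""
    event = body.get("authorizationEventObject") or {}
    return event.get("systemIdToken")


def claims_ok(claims: dict, audience: str) -> bool:
    """Check iss/aud on already signature-verified claims.

    An empty audience skips the check entirely (local testing only).
    """
    if not audience:
        return True
    # Assumes `aud` is a single string; an exact match is required.
    return claims.get("iss") == GOOGLE_ISSUER and claims.get("aud") == audience
```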
+
+---
+
+## Nginx
+
+The `/channels/` prefix is already public in `auth_middleware.py` — no Nginx changes needed if you're already proxying all traffic to Cortex. Verify the path isn't blocked by basic auth or IP restrictions.
+
+---
+
+## Troubleshooting
+
+| Symptom | Cause | Fix |
+|---|---|---|
+| 404 on the webhook | `channels.json` missing or no `google_chat` key | Create/check `home/{username}/channels.json` |
+| 401 Invalid token | `audience` in `channels.json` doesn't match the endpoint URL | Make them identical — copy the URL exactly |
+| 401 Missing token | No `systemIdToken` in request | Bot may not be a Workspace Add-on; check connection settings type |
+| Timeout / no response | LLM too slow | `backend: "claude"` recommended; reduce context tier if needed |
+| Bot not receiving messages | Visibility not configured | Add the user's Google account under Visibility in Cloud Console |
--- a/docs/NEXTCLOUD_TALK_BOT.md
+++ b/docs/NEXTCLOUD_TALK_BOT.md
@@ -1,69 +1,78 @@
 # Nextcloud Talk Bot Integration

-Inara is registered as a bot in Nextcloud Talk, receiving messages via webhook and replying through the bot API.
+Cortex connects to Nextcloud Talk as a bot — each Cortex user gets their own webhook endpoint routed to their chosen persona.

-**Status:** Live and confirmed working (2026-03-20)
+**Status:** Live and confirmed working (2026-03-20); per-user routing added 2026-03-27

 ---

-## Installation
+## Prerequisites

-Run on the Nextcloud server (inside the Docker container):
-
-```bash
-docker exec -it --user www-data <nc-app-container> php /var/www/html/occ talk:bot:install \
-  "Inara" \
-  "<secret from cortex .env NEXTCLOUD_TALK_BOT_SECRET>" \
-  "https://cortex.dgrzone.com/inara-nextcloud-talk-webhook" \
-  --feature webhook --feature response --feature reaction
-```
-
-After installing, enable the bot in each Talk conversation via the conversation settings UI (three-dot menu → Bots).
-
-To list installed bots and verify registration:
-
-```bash
-docker exec -it --user www-data <nc-app-container> php /var/www/html/occ talk:bot:list
-```
-
-To uninstall (if re-registering with a new secret):
-
-```bash
-docker exec -it --user www-data <nc-app-container> php /var/www/html/occ talk:bot:remove <bot-id>
-```
+- Access to the Nextcloud server (Docker exec or SSH)
+- The Cortex server reachable at a public HTTPS URL
+- The user pre-registered in Cortex (`manage_passwords.py invite`)

 ---

-## Configuration
+## Per-User Setup

-**`cortex/.env`:**
-```
-NEXTCLOUD_URL=https://cloud.dgrzone.com
-NEXTCLOUD_TALK_BOT_SECRET=<shared secret — must match occ install command>
-```
+### 1. Create the user's `channels.json`

-`NEXTCLOUD_URL` defaults to `https://cloud.dgrzone.com` in `config.py`.
+Create `home/{username}/channels.json` on the Cortex server:

-**Nginx:** The `/inara-nextcloud-talk-webhook` endpoint must be reachable by Nextcloud without basic auth. Add a location block before the default `auth_basic` block:
-
-```nginx
-location = /inara-nextcloud-talk-webhook {
-    proxy_pass http://127.0.0.1:8000;
-    proxy_set_header Host $host;
-    proxy_set_header X-Real-IP $remote_addr;
+```json
+{
+  "nextcloud": {
+    "persona": "inara",
+    "url": "https://cloud.dgrzone.com",
+    "bot_secret": "<a secret you choose — must match the occ install command>",
+    "timeout": 55
+  }
 }
 ```

-(The `/channels/` prefix is already bypassed for Google Chat — consider moving the webhook path to `/channels/nextcloud` in a future cleanup to unify the nginx config.)
+- **`persona`** — which persona responds (must exist under `home/{username}/persona/`)
+- **`url`** — base URL of the Nextcloud instance
+- **`bot_secret`** — a shared HMAC secret; you choose this value and use it in both `channels.json` and the `occ` install command
+- **`timeout`** — seconds to wait for the LLM before sending a timeout message (NC Talk is async, so 55s is safe)
+
+### 2. Register the bot in Nextcloud
+
+The Nextcloud container for DgrZone is `dgr_zone_nextcloud-app-1`. Substitute your own container name if different.
+
+First, list existing bots to check if one is already registered (note the bot ID):
+
+```bash
+docker exec -it --user www-data dgr_zone_nextcloud-app-1 php /var/www/html/occ talk:bot:list
+```
+
+If re-registering (new URL or new secret), uninstall the old bot first:
+
+```bash
+docker exec -it --user www-data dgr_zone_nextcloud-app-1 php /var/www/html/occ talk:bot:uninstall <bot-id>
+```
+
+Install the bot:
+
+```bash
+docker exec -it --user www-data dgr_zone_nextcloud-app-1 php /var/www/html/occ talk:bot:install \
+  "Inara" \
+  "<bot_secret from channels.json>" \
+  "https://cortex.dgrzone.com/webhook/nextcloud/{username}" \
+  --feature webhook --feature response --feature reaction
+```
+
+After installing, enable the bot in each Talk conversation: open the conversation → three-dot menu → **Bots** → enable the bot by name.

 ---

 ## How It Works

-1. User sends a message in Talk → Nextcloud POSTs a signed webhook to `/inara-nextcloud-talk-webhook`
-2. Cortex verifies the incoming HMAC signature, extracts the message text, runs it through the LLM
-3. Cortex POSTs the reply to `/ocs/v2.php/apps/spreed/api/v1/bot/{token}/message` with its own HMAC signature
-4. The webhook handler returns HTTP 200 immediately; the LLM call happens in a `BackgroundTask` (prevents Nextcloud from disabling the bot due to slow response)
+1. User sends a message in Talk → Nextcloud POSTs a signed webhook to `/webhook/nextcloud/{username}`
+2. Cortex reads the user's `channels.json`, verifies the incoming HMAC signature
+3. Sets the persona context, builds the system prompt, runs the LLM in a `BackgroundTask`
+4. Returns HTTP 200 immediately (prevents Nextcloud from disabling the bot due to slow response)
+5. Cortex POSTs the reply to `/ocs/v2.php/apps/spreed/api/v1/bot/{token}/message` with its own HMAC signature
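The signatures in steps 2 and 5 differ only in what gets signed. A minimal stdlib sketch, assuming the random value and signature have already been read from the request headers; the function names are hypothetical, and the 64-hex-character random string is an illustrative choice.

```python
import hashlib
import hmac
import secrets


def verify_incoming(secret: str, random_header: str, body: bytes, signature: str) -> bool:
    """Step 2: incoming webhooks are signed as HMAC-SHA256(secret, random + raw_body)."""
    expected = hmac.new(
        secret.encode(),
        (random_header + body.decode("utf-8")).encode(),
        hashlib.sha256,
    ).hexdigest()
    # Constant-time comparison; normalise case before comparing hex digests.
    return hmac.compare_digest(expected, signature.lower())


def sign_reply(secret: str, message: str) -> tuple[str, str]:
    """Step 5: sign random + message text only, NOT the JSON request body."""
    random_str = secrets.token_hex(32)
    sig = hmac.new(
        secret.encode(),
        (random_str + message).encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()
    return random_str, sig
```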

 ---

@@ -76,7 +85,6 @@ location = /inara-nextcloud-talk-webhook {
 Nextcloud signs its outgoing webhook with `HMAC-SHA256(secret, random + raw_body)`:

 ```python
-# _verify_signature in nextcloud_talk.py
 expected = hmac.new(
    secret.encode(),
    (random_header + body.decode("utf-8")).encode(),
@@ -89,7 +97,6 @@ expected = hmac.new(
 When Cortex posts a reply, Nextcloud verifies the signature against the *parsed message string*, not the raw body. This is because `BotController::sendMessage` passes the parsed `$message` parameter to `checksumVerificationService::validateRequest`, not `$request->getContent()`.

 ```python
-# _send_reply in nextcloud_talk.py
 sig = hmac.new(
    secret.encode(),
    (random_str + message).encode("utf-8"),  # message text only, NOT json.dumps({"message": ...})
@@ -105,35 +112,50 @@ sig = hmac.new(secret.encode(), (random_str + '{"message": "..."}').encode(), ha

 ---

-## Multi-User Note
+## Nginx

-NC Talk currently uses the **default user and persona** (`settings.default_tier`, `load_context()`). All Talk conversations go to Inara regardless of who is messaging. Per-conversation persona routing (e.g., Holly gets Tina) is a future enhancement — would require mapping Nextcloud user IDs or conversation tokens to Cortex users.
+The `/webhook/` prefix is already public in `auth_middleware.py`. If Nginx applies basic auth or IP restrictions, add a `location` block before the default auth block:

---
-
-## Claude CLI Auth in systemd
-
-The `CLAUDE_CODE_OAUTH_TOKEN` in `.env` goes stale after each `claude auth login` (tokens rotate). Cortex reads the token live from `~/.claude/.credentials.json` on every Claude call (`llm_client._fresh_claude_token()`), so no manual `.env` update is needed after re-authentication.
-
-Also: never set `ANTHROPIC_API_KEY` to an OAuth token value (`sk-ant-oat01-...`) — the Claude CLI treats it as a direct API key and fails. Only real API keys (`sk-ant-api03-...`) belong in `ANTHROPIC_API_KEY`.
+```nginx
+location ^~ /webhook/ {
+    proxy_pass http://127.0.0.1:8000;
+    proxy_set_header Host $host;
+    proxy_set_header X-Real-IP $remote_addr;
+}
+```

 ---

 ## Triggering the Bot

- **@mention** — prefix the message with `@inara` (or whatever `AGENT_NAME` is set to); the prefix is stripped before sending to the LLM
+- **@mention** — prefix the message with `@{persona_name}`; the prefix is stripped before sending to the LLM
 - **Any message** in a conversation where the bot is enabled — all messages are forwarded, not just @mentions
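A minimal sketch of the @mention stripping, assuming the mention arrives as literal `@name` text at the start of the message (any case, optionally followed by spaces, commas, or colons). `strip_mention` is a hypothetical helper, not the function Cortex uses; messages without a leading mention pass through unchanged, matching the "any message" rule.

```python
import re


def strip_mention(text: str, persona: str) -> str:
    """Remove a leading '@persona' (any case) plus trailing spaces, commas, or colons."""
    pattern = rf"^\s*@{re.escape(persona)}\b[\s,:]*"
    return re.sub(pattern, "", text, count=1, flags=re.IGNORECASE)
```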

 ---

+## Logs
+
+Two log streams are useful when debugging:
+
+```bash
+# Nextcloud server logs (bot registration errors, webhook rejections)
+docker exec -it --user www-data dgr_zone_nextcloud-app-1 php /var/www/html/occ log:tail
+
+# Cortex service logs (LLM errors, signature failures, timeouts)
+journalctl --user -u cortex -f
+```
+
+---
+
 ## Troubleshooting

 | Symptom | Cause | Fix |
 |---|---|---|
+| 404 on the webhook | `channels.json` missing or no `nextcloud` key | Create/check `home/{username}/channels.json` |
 | Webhook not received | Bot not enabled for conversation | Enable in Talk conversation settings (Bots) |
-| Incoming 401 | Wrong secret in `.env` | Match secret to `occ talk:bot:install` value |
+| Incoming 401 | `bot_secret` in `channels.json` doesn't match the `occ talk:bot:install` secret | Re-register with a matching secret |
 | Reply POST returns 401 (first try) | HMAC computed over wrong data | Sign `random + message_text` only (not raw JSON body) |
-| Reply POST returns 401 (persistent) | Brute force protection triggered | `occ security:bruteforce:reset <cortex-IP>` |
-| Bot auto-disabled by Nextcloud | Webhook held open too long | Verify `BackgroundTasks` is used — return 200 immediately |
-| Claude falls back to Gemini | Stale/wrong auth token | Token is auto-refreshed from `~/.claude/.credentials.json`; run `claude auth login` if expired |
-| No response at all | Nginx blocking the path with basic auth | Add a `location =` block before the auth block (see Nginx section above) |
+| Reply POST returns 401 (persistent) | Brute force protection triggered | `docker exec -it --user www-data dgr_zone_nextcloud-app-1 php /var/www/html/occ security:bruteforce:reset <cortex-IP>` |
+| Bot auto-disabled by Nextcloud | Webhook held open too long | Verify `BackgroundTasks` is used — Cortex returns 200 immediately |
+| Claude falls back to Gemini | Stale/expired auth token | Run `claude auth login`; token is auto-refreshed from `~/.claude/.credentials.json` |
+| No response at all | Nginx blocking the path | Add a `location ^~ /webhook/` block before any auth block |
--- a/docs/OPEN_WEBUI_API.md
+++ b/docs/OPEN_WEBUI_API.md
@@ -0,0 +1,276 @@
+# Open WebUI API Reference for Cortex
+
+> Last updated: 2026-04-03
+> Source: https://docs.openwebui.com/reference/api-endpoints/
+> Host in use: `http://192.168.32.19:3000` (scott_gaming — 8 GB VRAM)
+
+## Local Model Performance (scott_gaming, 8 GB VRAM)
+
+| Model | Alias | Speed | Practical Context | Spec Context |
+|---|---|---|---|---|
+| Gemma 4 E4B | `agent-support-gemma-small` | ~25 t/s | **72k tokens** | 128k |
+| Gemma 4 26B A4B (MoE) | `agent-support-gemma-medium` | ~9 t/s | **50k tokens** | 256k |
+
+Context is VRAM-constrained — spec limits are higher but KV cache fills available VRAM first.
+Techniques to improve it: a lower-precision (quantized) KV cache, flash attention, and tuning the context length in Ollama.
+
+**Practical implications for the local orchestrator:**
+- System prompt + memory (T2) + tool results + history: budget ~40-50k for small, ~35-40k for medium
+- Medium at 9 t/s is fine for background/async tasks; small at 25 t/s is responsive enough for interactive use
+- Both are well above what's needed for most tool loop iterations (~2-5k tokens per round)
+
+---
+
+## Authentication
+
+All API calls use a bearer token:
+
+```
+Authorization: Bearer sk-<api-key>
+```
+
+API keys are managed in Open WebUI → Settings → Account → API Keys.
+Cortex stores these per-user in `home/{username}/local_llm.json` → `hosts[].api_key`.
+
+---
+
+## Core Endpoints Used by Cortex
+
+### List Available Models
+
+```
+GET /api/models
+Authorization: Bearer sk-...
+```
+
+Returns all models (Ollama, OpenAI-proxied, custom functions).
+Used by `/api/local-llm/fetch-models` in `routers/local_llm.py`.
+
+Response shape:
+```json
+{
+  "data": [
+    { "id": "gemma4-e4b", "name": "Gemma 4 E4B" },
+    ...
+  ]
+}
+```
+
+### Chat Completions (OpenAI-compatible)
+
+```
+POST /api/chat/completions
+Authorization: Bearer sk-...
+Content-Type: application/json
+```
+
+Standard OpenAI chat format. Supports:
+- `messages` — standard role/content array
+- `model` — model ID or workspace alias
+- `tools` + `tool_choice` — function calling (see Tool Loop below)
+- `stream: true/false`
+
+This is the endpoint used by `_local()` in `llm_client.py`.
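
Below is a minimal sketch of that request using only the standard library, assuming a non-streaming call. `build_chat_request` and `chat` are hypothetical helper names used for illustration. They are not the actual `_local()` code, which may use a different HTTP client. The 300-second default mirrors `TIMEOUT_LOCAL`.

```python
import json
import urllib.request

# Sketch only: helper names are hypothetical, not Cortex's _local() implementation.


def build_chat_request(api_url: str, api_key: str, model: str,
                       messages: list[dict], stream: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST to Open WebUI's /api/chat/completions."""
    body = {"model": model, "messages": messages, "stream": stream}
    return urllib.request.Request(
        url=api_url.rstrip("/") + "/api/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # sk-... key from local_llm.json
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(api_url: str, api_key: str, model: str, messages: list[dict],
         timeout: float = 300) -> str:
    """Send a non-streaming request and return the assistant's text."""
    req = build_chat_request(api_url, api_key, model, messages)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

Example call: `chat("http://192.168.32.19:3000", "sk-...", "agent-support-gemma-small", [{"role": "user", "content": "Hello"}])`.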
+
+### Anthropic Messages API Compatibility
+
+```
+POST /api/v1/messages
+Authorization: Bearer sk-...
+```
+
+Open WebUI also accepts Anthropic-format requests and auto-converts them.
+Could be used to route Claude SDK calls through Open WebUI.
+Base URL for this mode: `http://192.168.32.19:3000/api`
+
+### Direct Ollama Proxy
+
+```
+GET  /ollama/api/tags        — list models
+POST /ollama/api/generate    — streaming completions
+POST /ollama/api/embed       — generate embeddings
+```
+
+Use these if you need to bypass Open WebUI's filter layer and hit Ollama directly.
+Ollama is also accessible directly at `http://192.168.32.19:11434`.
+
+---
+
+## Tool / Function Calling
+
+Both Gemma 4 models (E4B and 26B A4B) support function calling via the standard
+OpenAI `tools` parameter. Open WebUI passes this through to the underlying model.
+
+### Request Format
+
+Body of `POST /api/chat/completions`:
+
+```json
+{
+  "model": "gemma4-26b-a4b",
+  "messages": [
+    { "role": "system", "content": "..." },
+    { "role": "user",   "content": "What's the weather?" }
+  ],
+  "tools": [
+    {
+      "type": "function",
+      "function": {
+        "name": "web_search",
+        "description": "Search the web for current information",
+        "parameters": {
+          "type": "object",
+          "properties": {
+            "query": { "type": "string", "description": "Search query" }
+          },
+          "required": ["query"]
+        }
+      }
+    }
+  ],
+  "tool_choice": "auto"
+}
+```
+
+### Tool Call Response
+
+When the model wants to call a tool, it returns `finish_reason: "tool_calls"`:
+
+```json
+{
+  "choices": [{
+    "finish_reason": "tool_calls",
+    "message": {
+      "role": "assistant",
+      "content": null,
+      "tool_calls": [{
+        "id": "call_abc123",
+        "type": "function",
+        "function": {
+          "name": "web_search",
+          "arguments": "{\"query\": \"current weather NYC\"}"
+        }
+      }]
+    }
+  }]
+}
+```
+
+### Sending Tool Results Back
+
+Append the assistant message that carried `tool_calls`, plus one `tool` message per call (matched by `tool_call_id`), then re-submit with the same `tools`:
+
+```json
+{
+  "messages": [
+    { "role": "user",      "content": "What's the weather?" },
+    { "role": "assistant", "content": null,
+      "tool_calls": [{ "id": "call_abc123", "type": "function", "function": { "name": "web_search", "arguments": "..." } }] },
+    { "role": "tool",      "tool_call_id": "call_abc123",
+      "content": "Current weather in NYC: 62°F, partly cloudy." }
+  ],
+  "tools": [...],
+  "tool_choice": "auto"
+}
+```
+
+Repeat until `finish_reason: "stop"`.
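
The loop above can be sketched in Python under a few assumptions. `chat_fn` is a hypothetical callable that POSTs to `/api/chat/completions` and returns the parsed JSON. `handlers` maps tool names to plain Python functions that return strings. As a defensive choice, the sketch keys off the presence of `tool_calls` rather than trusting `finish_reason` alone. It caps rounds at a default of 10, matching Cortex's `ORCHESTRATOR_MAX_ROUNDS`.

```python
import json
from typing import Callable

# Sketch only: run_tool_loop, chat_fn and handlers are hypothetical names.
# chat_fn(messages, tools) -> one parsed /api/chat/completions response (dict).
ChatFn = Callable[[list, list], dict]


def run_tool_loop(chat_fn: ChatFn, messages: list, tools: list,
                  handlers: dict[str, Callable[..., str]], max_rounds: int = 10) -> str:
    """Call the model, run requested tools, feed results back, until it answers."""
    for _ in range(max_rounds):
        msg = chat_fn(messages, tools)["choices"][0]["message"]
        if not msg.get("tool_calls"):
            return msg.get("content") or ""  # final answer (finish_reason "stop")
        # Echo the assistant's tool_call message back unchanged.
        messages.append({"role": "assistant", "content": msg.get("content"),
                         "tool_calls": msg["tool_calls"]})
        for call in msg["tool_calls"]:
            fn = call["function"]
            try:
                args = json.loads(fn.get("arguments") or "{}")
                result = handlers[fn["name"]](**args)
            except Exception as exc:  # unknown tool, bad JSON, or tool failure
                result = f"Tool error: {exc}"
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "content": str(result)})
    raise RuntimeError(f"no final answer after {max_rounds} rounds")
```

Tool failures are returned to the model as text, so it can recover or explain rather than aborting the whole loop.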
+
+---
+
+## RAG (Retrieval Augmented Generation)
+
+### Upload a File
+
+```
+POST /api/v1/files/
+Authorization: Bearer sk-...
+Content-Type: multipart/form-data
+
+file=@/path/to/document.pdf
+```
+
+Returns a file ID. Poll `/api/v1/files/{id}/process/status` until `completed`.
+
+### Knowledge Collections
+
+```
+POST /api/v1/knowledge/{collection_id}/file/add
+{ "file_id": "..." }
+```
+
+### Use in Chat
+
+Reference files or knowledge collections in any chat request:
+
+```json
+{
+  "model": "gemma4-26b-a4b",
+  "messages": [...],
+  "files": [
+    { "type": "file",       "id": "file-id" },
+    { "type": "collection", "id": "collection-id" }
+  ]
+}
+```
+
+### Process a Web URL into a Collection
+
+```
+POST /api/v1/retrieval/process/web
+{ "url": "https://example.com/article", "collection_id": "..." }
+```
+
+---
+
+## Filter Behavior with Direct API Calls
+
+Open WebUI supports inlet/outlet filter pipelines. With direct API access:
+
+| Filter    | Runs automatically? |
+|-----------|---------------------|
+| `inlet()` | ✅ Yes              |
+| `stream()`| ✅ Yes              |
+| `outlet()`| ❌ Manual only — call `POST /api/chat/completed` after receiving response |
+
+For Cortex's use case (tool loop orchestration), this is not a concern — we're
+driving the loop ourselves and don't rely on Open WebUI's filter pipeline.
+
+---
+
+## Relevant Cortex Files
+
+| File | Purpose |
+|---|---|
+| `cortex/llm_client.py` — `_local()` | Current local backend (direct chat only) |
+| `cortex/routers/local_llm.py` | Local model settings page + fetch-models endpoint |
+| `cortex/user_settings.py` | Per-user host + model config (`local_llm.json`) |
+| `cortex/orchestrator_engine.py` | Gemini API tool loop — reference for local version |
+| `home/{user}/local_llm.json` | Stored host/model config |
+
+---
+
+## Planned: Local Orchestrator (`local_orchestrator_engine.py`)
+
+A local equivalent of `orchestrator_engine.py` that:
+1. Takes the same tool definitions already registered in `cortex/tools/`
+2. Converts them to OpenAI `tools` format (already close — minor schema diff from Gemini)
+3. Runs a ReAct loop against the local model via `/api/chat/completions`
+4. Falls back gracefully if the model doesn't return a valid tool call
+
+See `documentation/TODO__Agents.md` — `[Local] Tool-capable local orchestrator`.
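
Step 2 could look roughly like the following sketch. It assumes each declaration is available as a plain dict in Gemini's shape (`name`, `description`, `parameters`), with Gemini-style uppercase type names such as `"OBJECT"`. If the tools are google-genai `FunctionDeclaration` objects instead, they would need to be dumped to dicts first. `to_openai_tool` is a hypothetical helper name.

```python
# Sketch only: to_openai_tool is hypothetical; input shape is an assumption.


def _lowercase_types(schema):
    """Recursively lowercase JSON-schema "type" values (Gemini uses e.g. "OBJECT")."""
    if isinstance(schema, dict):
        out = {}
        for key, value in schema.items():
            if key == "type" and isinstance(value, str):
                out[key] = value.lower()
            else:
                # Recurse; this also covers a *property* that happens to be named "type".
                out[key] = _lowercase_types(value)
        return out
    if isinstance(schema, list):
        return [_lowercase_types(item) for item in schema]
    return schema


def to_openai_tool(decl: dict) -> dict:
    """Wrap a Gemini-style function declaration in the OpenAI `tools` envelope."""
    params = decl.get("parameters") or {"type": "object", "properties": {}}
    return {
        "type": "function",
        "function": {
            "name": decl["name"],
            "description": decl.get("description", ""),
            "parameters": _lowercase_types(params),
        },
    }
```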
+
+Model recommendation:
+- **Gemma 4 26B A4B** (MoE, fast for its size; 256k spec context, ~50k practical on 8 GB VRAM) for complex tool tasks
+- **Gemma 4 E4B** (128k spec context, ~72k practical) for lightweight/fast tasks
+
+---
+
+## Notes
+
+- Open WebUI workspace aliases (e.g. `agent-support-gemma-small`) resolve to the
+  underlying Ollama model — use aliases in Cortex for human-friendly model names.
+- `tool_choice: "auto"` lets the model decide; `"none"` forces plain text response;
+  `{"type": "function", "function": {"name": "..."}}` forces a specific tool.
+- Gemma 4 models support combined tool use + reasoning (thinking tokens) — useful
+  for complex multi-step tasks.
+- For embeddings (future RAG work), use `/ollama/api/embed` directly.
--- a/documentation/ARCH__BACKENDS.md
+++ b/documentation/ARCH__BACKENDS.md
@@ -0,0 +1,106 @@
+# Architecture: LLM Backends
+
+> How Cortex talks to AI models.
+> Last updated: 2026-04-03
+
+---
+
+## Three Backends
+
+| Backend | Used for | Auth | Config |
+|---|---|---|---|
+| **Claude CLI** | Primary chat, all user-facing responses | OAuth token from `~/.claude/.credentials.json` | `DEFAULT_MODEL` in `.env` |
+| **Gemini CLI** | Fallback when Claude unavailable | Gemini CLI credentials | Auto-fallback |
+| **Local (Open WebUI)** | Private/offline tasks, cost-free use | API key per user in `local_llm.json` | `/settings/local` UI |
+
+The **Gemini API** (google-genai SDK) is also used — but only by the orchestrator tool loop, not as a general chat backend. See [`ARCH__FUTURE.md`](ARCH__FUTURE.md) for the orchestrator pattern.
+
+---
+
+## Backend Selection
+
+User toggles backend in the UI: `claude → gemini → local` (cycles). The active backend is stored server-side; the UI reflects it with color coding (default / green / amber).
+
+When local is active, the active model name appears below the toggle button.
+
+**Fallback chain** (automatic, on error):
+```
+claude  → gemini
+gemini  → claude
+local   → claude
+```
+
+Auth expiry on Claude triggers a UI banner + `claude_auth_expired` SSE event.
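
Here is a sketch of the one-hop fallback. It assumes each backend is a callable that raises on error, and the names are hypothetical. Falling back only once matters here. Because `claude → gemini` and `gemini → claude` point at each other, chaining fallbacks would loop.

```python
from typing import Callable

# Sketch only: call_with_fallback and the callable-per-backend shape are hypothetical.
FALLBACK = {"claude": "gemini", "gemini": "claude", "local": "claude"}


def call_with_fallback(backend: str, prompt: str,
                       backends: dict[str, Callable[[str], str]]) -> tuple[str, str]:
    """Try the selected backend; on any exception, try its single fallback once."""
    try:
        return backend, backends[backend](prompt)
    except Exception:
        fallback = FALLBACK[backend]
        return fallback, backends[fallback](prompt)  # a second failure propagates
```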
+
+---
+
+## Claude Backend (`_claude()`)
+
+Runs `claude --print --no-session-persistence --output-format text` as a subprocess.
+
+- System prompt passed via `--system-prompt`
+- Conversation history formatted as `<conversation>` block
+- Token read live from `~/.claude/.credentials.json` on every call — never relies on the env var, which goes stale after `claude auth login`
+- Model override via `--model` flag (e.g. `claude-opus-4-6`)
+
+Timeout: `TIMEOUT_CLAUDE=60` seconds (`.env`)
+
+---
+
+## Gemini CLI Backend (`_gemini()`)
+
+Runs `gemini --output-format text --extensions "" -p <prompt>` as a subprocess.
+
+- `--extensions ""` disables all MCP extensions — prevents child processes from keeping pipes open after responding
+- `start_new_session=True` puts the process in its own group for clean `os.killpg` on timeout
+- Output is cleaned to strip CLI noise lines (loading messages, retry notices, quota warnings)
+
+Timeout: `TIMEOUT_GEMINI=120` seconds (`.env`)
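
The process-group pattern can be sketched with the standard library. `run_cli` is a hypothetical helper name, Cortex's `_gemini()` may differ, and output cleaning is omitted. `start_new_session=True` makes the child the leader of a new process group whose ID equals its PID. That lets `os.killpg` also reach any grandchildren that are holding the pipes open. This sketch is POSIX only.

```python
import os
import signal
import subprocess

# Sketch only: run_cli is a hypothetical helper, not Cortex's _gemini().


def run_cli(argv: list[str], timeout: float) -> str:
    """Run a CLI in its own process group; kill the whole group on timeout."""
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        start_new_session=True,  # child calls setsid(): new group, pgid == pid
    )
    try:
        out, _err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Kill the child and any grandchildren (e.g. extension helpers)
        # that would otherwise keep stdout open.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.communicate()  # reap the child and drain the pipes
        raise
    return out
```

Example: `run_cli(["gemini", "--output-format", "text", "--extensions", "", "-p", prompt], timeout=120)`.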
+
+---
+
+## Local Backend (`_local()`)
+
+HTTP POST to Open WebUI's OpenAI-compatible endpoint: `{api_url}/api/chat/completions`.
+
+Per-user config in `home/{user}/local_llm.json`:
+```json
+{
+  "hosts": [{"id": "...", "label": "scott_gaming", "api_url": "http://192.168.32.19:3000", "api_key": "sk-..."}],
+  "models": [{"id": "...", "host_id": "...", "label": "Gemma 4 Small", "model_name": "agent-support-gemma-small"}],
+  "active_model_id": "..."
+}
+```
+
+Resolution order for active model:
+1. User's `active_model_id` in `local_llm.json`
+2. `.env` server defaults (`LOCAL_API_URL` / `LOCAL_MODEL`)
+3. Error — user is prompted to configure at `/settings/local`
+
+Timeout: `TIMEOUT_LOCAL=300` seconds (`.env`) — local models may need to load from disk.
+
+**Manage at:** `/settings/local` — supports multiple hosts and models per user, "Fetch from host" button to populate model list from the server.
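
The resolution order can be sketched as follows. It assumes `host_id` links a model to its host, as in the JSON above. `resolve_local_model` is a hypothetical name. `LOCAL_API_KEY` is an assumed variable name, since only `LOCAL_API_URL` and `LOCAL_MODEL` are documented.

```python
import json
import os
from pathlib import Path

# Sketch only: resolve_local_model is hypothetical.


def resolve_local_model(settings_path: Path, env=os.environ) -> dict:
    """Return {"api_url", "api_key", "model"} following the documented order."""
    # 1. The user's active_model_id in local_llm.json
    if settings_path.exists():
        cfg = json.loads(settings_path.read_text())
        active = cfg.get("active_model_id")
        model = next((m for m in cfg.get("models", []) if m.get("id") == active), None)
        if model:
            host = next((h for h in cfg.get("hosts", [])
                         if h.get("id") == model.get("host_id")), None)
            if host:
                return {"api_url": host["api_url"], "api_key": host.get("api_key", ""),
                        "model": model["model_name"]}
    # 2. Server defaults from .env
    if env.get("LOCAL_API_URL") and env.get("LOCAL_MODEL"):
        return {"api_url": env["LOCAL_API_URL"],
                "api_key": env.get("LOCAL_API_KEY", ""),  # hypothetical variable name
                "model": env["LOCAL_MODEL"]}
    # 3. Nothing configured: the UI should send the user to /settings/local
    raise LookupError("No local model configured; set one up at /settings/local")
```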
+
+---
+
+## Distillation Backends
+
+Memory distillation runs on a schedule and uses the LLM for mid and long distill passes. By default uses the primary backend (`claude`). Override in `.env`:
+
+```
+DISTILL_BACKEND_MID=local   # saves API credits — Gemma handles summarization well
+DISTILL_BACKEND_LONG=       # empty = use primary (claude recommended for quality)
+```
+
+---
+
+## Current Local Models (scott_gaming, 8 GB VRAM)
+
+| Model | Alias | Speed | Practical Context |
+|---|---|---|---|
+| Gemma 4 E4B | `agent-support-gemma-small` | ~25 t/s | **72k tokens** |
+| Gemma 4 26B A4B (MoE) | `agent-support-gemma-medium` | ~9 t/s | **50k tokens** |
+
+Both support OpenAI `tools` / `tool_choice` function calling — required for the local orchestrator.
+
+Full Open WebUI API reference: [`docs/OPEN_WEBUI_API.md`](../docs/OPEN_WEBUI_API.md)
--- a/documentation/ARCH__CHANNELS.md
+++ b/documentation/ARCH__CHANNELS.md
@@ -0,0 +1,149 @@
+# Architecture: Input Channels
+
+> How messages reach Cortex and how Cortex reaches back.
+> Last updated: 2026-04-03
+
+---
+
+## Channel Summary
+
+| Channel | Direction | Auth | Endpoint |
+|---|---|---|---|
+| Web UI | In + Out | JWT session cookie | `/{user}/{persona}` |
+| Nextcloud Talk | In + Out | HMAC-SHA256 | `POST /webhook/nextcloud/{username}` |
+| Google Chat | In + Out | JWT (Google system token) | `POST /channels/google-chat/{username}` |
+| Cron | Out (proactive) | Internal | APScheduler |
+| Webhooks | In (future) | TBD | `POST /webhook/{source}` |
+
+**Per-user config:** Each channel that needs secrets (NC Talk bot key, Google Chat audience) stores them in `home/{username}/channels.json`. No channel access by default — each user sets up their own.
+
+---
+
+## Web UI
+
+Single-page app served from `cortex/static/`. All chat happens via `POST /chat` (streaming SSE for real-time response) or `POST /orchestrate` (async job, polled).
+
+**Session auth:** Login form (`/login`) → bcrypt password check → JWT cookie (30-day expiry). Google OAuth also available (`/auth/google`). All non-public routes require a valid cookie.
+
+**Modes:**
+- **Direct** — message goes straight to LLM via `/chat`
+- **Agent** — message goes to orchestrator (`/orchestrate`), tool loop runs, result polled and streamed into UI
+
+**Context + Memory panel:** Shows current backend (claude/gemini/local), memory tier, active local model. Toggle backend cycles claude → gemini → local.
+
+**Files panel:** Browse and edit persona markdown files in-browser. Session search at the bottom.
+
+**Settings:** `/settings` — Gemini API key, Google account, connected status. `/settings/local` — local model hosts and models.
+
+---
+
+## Nextcloud Talk
+
+Bot integration. The bot is registered in a Talk room; it receives messages, generates a response, and sends it back via the NC Talk bot API.
+
+**Incoming:** `POST /webhook/nextcloud/{username}`
+- Signature verified: `HMAC-SHA256(secret, random + raw_body)`
+- Ignores non-Create events and non-Note types
+- Strips `@{persona}` mention prefix from message text
+- Processes in background task (immediate 200 response to NC Talk)
+
+**Outgoing:** Bot API `POST /ocs/v2.php/apps/spreed/api/v1/bot/{room}/message`
+- Signature: `HMAC-SHA256(secret, random + message_text)` — note: message text, not body
+- Logic lives in `notification.py` (`_send_nct_message`) — shared with proactive notifications
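
Both signatures fit in one hedged sketch, assuming hex-encoded HMAC-SHA256 keyed with `bot_secret`. The only difference is the payload: raw body bytes for incoming, message text for outgoing. The function names are hypothetical, and the random value's format (64 hex characters here) is an assumption. The HTTP header names that carry `random` and the signature are deliberately left out; check them against the Nextcloud Talk bot documentation.

```python
import hashlib
import hmac
import secrets
from typing import Tuple, Union

# Sketch only: function names are hypothetical; header transport is not shown.


def nct_signature(secret: str, random: str, payload: Union[str, bytes]) -> str:
    """Hex HMAC-SHA256 over random + payload, keyed with the bot secret."""
    if isinstance(payload, str):
        payload = payload.encode("utf-8")
    return hmac.new(secret.encode("utf-8"), random.encode("utf-8") + payload,
                    hashlib.sha256).hexdigest()


def verify_incoming(secret: str, random: str, raw_body: bytes, signature: str) -> bool:
    """Incoming webhook: sign random + the raw request body bytes, compare in constant time."""
    expected = nct_signature(secret, random, raw_body)
    return hmac.compare_digest(expected, signature.lower())


def sign_outgoing(secret: str, message_text: str) -> Tuple[str, str]:
    """Outgoing reply: sign random + the message text only, not the JSON body."""
    random = secrets.token_hex(32)  # fresh per request (format is an assumption)
    return random, nct_signature(secret, random, message_text)
```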
+
+**Proactive notifications:** Set `notification_room` in `channels.json` → `nextcloud`. Used by distill completion alerts and `message`/`brief` cron jobs.
+
+**Per-user config (`channels.json`):**
+```json
+{
+  "nextcloud": {
+    "persona": "inara",
+    "url": "https://cloud.dgrzone.com",
+    "bot_secret": "...",
+    "notification_room": "<room-token>",
+    "timeout": 55
+  }
+}
+```
+
+Full setup guide: [`docs/NEXTCLOUD_TALK_BOT.md`](../docs/NEXTCLOUD_TALK_BOT.md)
+
+---
+
+## Google Chat
+
+Workspace Add-on. Messages arrive as HTTP POST from Google's infrastructure; the handler returns a JSON response synchronously (no background task — Google expects an immediate reply).
+
+**Incoming:** `POST /channels/google-chat/{username}`
+- Auth: JWT in `authorizationEventObject.systemIdToken`, verified against Google's JWKS
+- Response format: `hostAppDataAction.chatDataAction.createMessageAction`
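
The response envelope can be sketched as the dict a FastAPI handler could return directly. The nesting of `message.text` under `createMessageAction` is our reading of Google's Workspace add-on format and should be checked against Google's documentation. `chat_reply` is a hypothetical name.

```python
# Sketch only: chat_reply is hypothetical; verify the envelope against Google's docs.


def chat_reply(text: str) -> dict:
    """Synchronous add-on reply that posts `text` into the Chat space."""
    return {
        "hostAppDataAction": {
            "chatDataAction": {
                "createMessageAction": {
                    "message": {"text": text},
                },
            },
        },
    }
```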
+
+**Per-user config (`channels.json`):**
+```json
+{
+  "google_chat": {
+    "persona": "inara",
+    "audience": "https://cortex.dgrzone.com/channels/google-chat/scott",
+    "backend": "claude",
+    "timeout": 25
+  }
+}
+```
+
+Full setup guide: [`docs/GOOGLE_CHAT_BOT.md`](../docs/GOOGLE_CHAT_BOT.md)
+
+---
+
+## Cron / Proactive Messages
+
+User-defined scheduled jobs stored in `home/{user}/persona/{name}/CRONS.json`. Registered at startup by `scheduler.py`; manageable via the `cron_*` orchestrator tools.
+
+**Job types:**
+
+| Type | What happens |
+|---|---|
+| `remind` | Appends to `REMINDERS.md` — surfaced in context at tier 2+ |
+| `note` | Appends to `SCRATCH.md` — read on demand |
+| `message` | Sends payload text to user's notification channel |
+| `brief` | Runs LLM with payload as prompt, sends response to notification channel |
+
+**`brief` example — morning briefing:**
+```json
+{
+  "label": "Morning briefing",
+  "schedule": "daily:08:00",
+  "type": "brief",
+  "payload": "Give Scott a brief good morning. Note any pending reminders or tasks due today.",
+  "enabled": true
+}
+```
+
+**Channel selection for `message`/`brief`:**
+1. `channel` field on the job (if set)
+2. `notification_channel` key in `channels.json`
+3. Auto-detect: uses `nextcloud` if configured
+
+**Schedule formats:** `hourly` | `daily` | `daily:HH:MM` | `weekly:DOW` | `weekly:DOW:HH:MM`
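
The following sketch shows one way to turn these strings into APScheduler `CronTrigger` keyword arguments. It makes two assumptions. First, `DOW` is a three-letter English abbreviation (`mon` through `sun`). Second, bare `hourly`, `daily`, and `weekly:DOW` fire at a configurable default time (minute 0 / 00:00 here), because the doc does not say what Cortex uses. `schedule_to_cron` is a hypothetical name.

```python
import re

# Sketch only: schedule_to_cron is hypothetical; DOW format and default time are assumptions.
_TIME = r"(?P<hour>\d{1,2}):(?P<minute>\d{2})"
_DOW = {"mon", "tue", "wed", "thu", "fri", "sat", "sun"}


def _at(m: re.Match, base: dict) -> dict:
    """Validate HH:MM from a regex match and merge it into base."""
    hour, minute = int(m["hour"]), int(m["minute"])
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError(f"time out of range: {hour:02d}:{minute:02d}")
    return {**base, "hour": hour, "minute": minute}


def schedule_to_cron(schedule: str, default_hour: int = 0, default_minute: int = 0) -> dict:
    """Map a CRONS.json schedule string to CronTrigger keyword arguments."""
    s = schedule.strip().lower()
    if s == "hourly":
        return {"minute": default_minute}  # every hour at :MM
    if s == "daily":
        return {"hour": default_hour, "minute": default_minute}
    m = re.fullmatch(r"daily:" + _TIME, s)
    if m:
        return _at(m, {})
    m = re.fullmatch(r"weekly:(?P<dow>[a-z]{3})(?::" + _TIME + r")?", s)
    if m and m["dow"] in _DOW:
        if m["hour"] is None:  # weekly:DOW with no time
            return {"day_of_week": m["dow"], "hour": default_hour, "minute": default_minute}
        return _at(m, {"day_of_week": m["dow"]})
    raise ValueError(f"unrecognised schedule: {schedule!r}")
```

Usage with APScheduler would be along the lines of `CronTrigger(**schedule_to_cron("daily:08:00"))`, with `CronTrigger` imported from `apscheduler.triggers.cron`.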
+
+---
+
+## Notification Channel Config
+
+`notification_channel` in `channels.json` sets the default outbound channel for all proactive messages (distill alerts, cron message/brief jobs):
+
+```json
+{
+  "notification_channel": "nextcloud",
+  ...
+}
+```
+
+If absent, defaults to `nextcloud` if configured. Currently only NC Talk is supported for outbound; Google Chat outbound is a future item.
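
The three-step lookup for `message`/`brief` jobs can be sketched with a hypothetical function name. The sketch treats the mere presence of a `nextcloud` key as "configured". It leaves rejecting unsupported targets, such as Google Chat outbound, to the caller.

```python
from typing import Optional

# Sketch only: pick_notification_channel is hypothetical.


def pick_notification_channel(job: dict, channels: dict) -> Optional[str]:
    """Resolve the outbound channel for a message/brief cron job."""
    # 1. Explicit per-job override
    if job.get("channel"):
        return job["channel"]
    # 2. User-wide default from channels.json
    if channels.get("notification_channel"):
        return channels["notification_channel"]
    # 3. Auto-detect: NC Talk is currently the only outbound channel
    if "nextcloud" in channels:
        return "nextcloud"
    return None  # nothing configured; caller should log and skip
```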
+
+---
+
+## Future Channels
+
+- **WhatsApp** — Business API or bridge (not started; needs account)
+- **Gitea webhooks** — push/PR/issue events → orchestrator (router pattern exists; add `gitea.py`)
+- **Aether platform events** — trigger agent actions from business data changes
--- a/documentation/ARCH__FUTURE.md
+++ b/documentation/ARCH__FUTURE.md
@@ -0,0 +1,192 @@
+# Architecture: Planned Features
+
+> What's next and how it's designed to work.
+> Last updated: 2026-04-04
+
+For the current task list see `TODO__Agents.md`. For phases and priorities see `ROADMAP.md`.
+
+---
+
+## 1. Local Orchestrator
+
+**Status:** High priority — design complete, not yet built.
+
+Same ReAct tool loop as the Gemini API orchestrator, but driven by a local model via Open WebUI's OpenAI-compatible API. Enables offline/private agent tasks with no API cost.
+
+**Why local models work for this now:** Gemma 4 E4B and 26B A4B both support OpenAI `tools` / `tool_choice` function calling. The tool schema is nearly identical to Gemini's `FunctionDeclaration` — minor field renaming only.
+
+**Design:**
+```
+POST /orchestrate  (mode: "local")
+    ↓
+local_orchestrator_engine.py
+    • converts tools/ to OpenAI tools format
+    • POST /api/chat/completions with tools array
+    • parse tool_calls response
+    • execute tool, append result
+    • loop until finish_reason: "stop"
+    ↓
+response returned (local model generates final answer)
+```
+
+Model selection:
+- **Gemma 4 E4B** (25 t/s, 72k ctx) — interactive/fast tasks
+- **Gemma 4 26B A4B** (9 t/s, 50k ctx) — heavier reasoning, background tasks
+
+Context budget per iteration (system prompt + memory + tool results + history):
+- Small model: budget ~40-50k tokens per round
+- Medium model: budget ~35-40k tokens per round
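
The following sketch shows one way to enforce that budget inside the loop. It uses a crude 4-characters-per-token estimate, which is an assumption; a real tokenizer for the Gemma model would be more accurate. It blanks the oldest tool results first rather than deleting them. That way, each assistant `tool_calls` entry keeps its matching `tool_call_id` reply. Names are hypothetical.

```python
import json

# Sketch only: names are hypothetical; the chars/token ratio is a rough heuristic.
CHARS_PER_TOKEN = 4


def estimate_tokens(messages: list[dict]) -> int:
    """Very rough token estimate from the JSON size of the message list."""
    return len(json.dumps(messages)) // CHARS_PER_TOKEN


def trim_tool_results(messages: list[dict], budget: int,
                      placeholder: str = "[tool result trimmed]") -> list[dict]:
    """Blank out the oldest tool results until the estimate fits the budget.

    Tool messages are kept (only their content is replaced) so every
    assistant tool_call still has its matching tool_call_id reply. The
    result may still exceed the budget if non-tool messages dominate.
    """
    trimmed = [dict(m) for m in messages]  # shallow copies; input is not mutated
    for msg in trimmed:
        if estimate_tokens(trimmed) <= budget:
            break
        if msg.get("role") == "tool" and msg.get("content") != placeholder:
            msg["content"] = placeholder
    return trimmed
```

For the small model, a call like `messages = trim_tool_results(messages, budget=40_000)` before each round would keep requests under the practical ceiling.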
+
+Full API reference: [`docs/OPEN_WEBUI_API.md`](../docs/OPEN_WEBUI_API.md)
+
+---
+
+## 2. Dev Agent Pipeline
+
+**Status:** Design complete, not yet built.
+
+Accept a plain-English task, implement code changes, verify them, and present for human approval before committing.
+
+```
+Task (chat / Gitea issue / Kanban)
+    ↓
+Orchestrator — reads relevant files, routes to specialist
+    ↓
+Specialist Agent (Claude CLI in project directory)
+    • implements the change
+    • runs self-check: py_compile / svelte-check
+    ↓
+Supervisor Agent
+    • reviews the diff
+    • runs test suite
+    • returns: PASS / NEEDS_REVIEW / FAIL + reason
+    ↓
+Human approval gate
+    • summary in Cortex UI or NC Talk
+    • approve → commit (+ optional push)
+    • reject → feedback goes back to specialist
+```
+
+**Specialists** (both Claude CLI):
+- **Frontend** — working dir: `~/OSIT_dev/aether_app_sveltekit/` — runs `svelte-check` after every change
+- **Backend** — working dir: `~/OSIT_dev/aether_api_fastapi/` — runs `py_compile` + unit tests
+
+**Supervisor** returns structured JSON:
+```json
+{
+  "verdict": "PASS | NEEDS_REVIEW | FAIL",
+  "checks_passed": ["py_compile"],
+  "checks_failed": [],
+  "review_notes": "...",
+  "commit_message": "..."
+}
+```
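
Here is a hedged sketch of parsing that reply before it reaches the approval gate. It tolerates a fenced or chatty reply by extracting everything from the first `{` to the last `}`. It also applies one hypothetical policy: a `PASS` that lists failed checks is downgraded to `NEEDS_REVIEW`. `parse_verdict` is a hypothetical name.

```python
import json
import re

# Sketch only: parse_verdict and the downgrade rule are hypothetical.
VERDICTS = {"PASS", "NEEDS_REVIEW", "FAIL"}


def parse_verdict(raw: str) -> dict:
    """Parse the supervisor's reply into the structured verdict dict."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)  # first '{' to last '}', skips fences/prose
    if not match:
        raise ValueError("no JSON object in supervisor reply")
    data = json.loads(match.group(0))  # JSONDecodeError is a ValueError subclass
    if data.get("verdict") not in VERDICTS:
        raise ValueError(f"unexpected verdict: {data.get('verdict')!r}")
    for key in ("checks_passed", "checks_failed"):
        data.setdefault(key, [])
        if not isinstance(data[key], list):
            raise ValueError(f"{key} must be a list")
    # Hypothetical policy: a PASS that still lists failed checks goes to review.
    if data["verdict"] == "PASS" and data["checks_failed"]:
        data["verdict"] = "NEEDS_REVIEW"
    return data
```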
+
+---
+
+## 3. Gitea Integration
+
+**Status:** Not started. pfSense port forward for SSH already confirmed working.
+
+- **Webhooks → Cortex:** push/PR/issue events → `POST /webhook/gitea` → orchestrator
+  - Router pattern already established; add `cortex/routers/gitea.py`
+- **Gitea Actions CI:** `.gitea/workflows/check.yml` — run `py_compile`/`svelte-check` on push
+- **Cortex → Gitea:** after human approval, call Gitea API to create PR or push branch
+
+SSH clone/push: `git clone ssh://git@git.dgrzone.com:2222/<user>/<repo>.git`
+
+---
+
+## 4. Knowledge Layer (AE Journals)
+
+**Status:** Tools exist, import script not yet built.
+
+AE Journals becomes the searchable long-term knowledge base. Complements memory distillation: memory files cover "what have we been working on lately"; Journals cover "what do I know about topic X".
+
+**Existing tools:** `ae_journal_search`, `ae_journal_entry_create` — already in orchestrator tool suite.
+
+**Import script (to build):**
+- Walk a markdown directory (Nextcloud, agents_sync docs)
+- Chunk by H2 section
+- Search before creating (deduplication)
+- Tag from frontmatter, filename, directory path
+- Target sources: `~/DgrZone_Nextcloud/`, `~/OSIT_Nextcloud/`
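
Here is a sketch of the H2 chunking step; `chunk_by_h2` is a hypothetical name. Text before the first `## ` heading becomes its own chunk, titled after the file. The sketch does not skip `## ` lines inside fenced code blocks, which a real import script would need to handle. Deduplication (calling `ae_journal_search` before creating) and tagging are not shown.

```python
import re
from pathlib import Path

# Sketch only: chunk_by_h2 is hypothetical; it ignores code-fence awareness.
_H2 = re.compile(r"^## +(.+?)\s*$", re.MULTILINE)  # "## Title" but not "### Title"


def chunk_by_h2(text: str, source: str = "") -> list[dict]:
    """Split a markdown document into one chunk per H2 section."""
    matches = list(_H2.finditer(text))
    chunks = []
    preamble = text[: matches[0].start()] if matches else text
    if preamble.strip():
        chunks.append({"title": Path(source).stem or "(untitled)",
                       "body": preamble.strip(), "source": source})
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunks.append({"title": m.group(1), "body": text[m.end():end].strip(),
                       "source": source})
    return chunks
```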
+
+**Agent workflow:**
+```
+"Summarize my notes on WireGuard setup"
+    → orchestrator calls ae_journal_search("wireguard")
+    → returns matching entries
+    → Claude synthesizes response
+```
+
+---
+
+## 5. Intelligent Model Routing
+
+**Status:** Deferred. Currently user-toggled.
+
+Route automatically based on task characteristics rather than requiring manual backend selection:
+
+| Task type | Backend | Reason |
+|---|---|---|
+| User-facing conversation | Claude | Quality prose, persona fidelity |
+| Tool use / orchestration | Gemini API | Native function calling, free tier |
+| Private / sensitive / offline | Local (Ollama) | No data leaves the network |
+| Long context (>50k tokens) | Gemini 2.0 | 1M token context window |
+| Fast/cheap simple queries | Local (E4B) | 25 t/s, no API cost |
+
+Routing logic would live in `llm_client.py` or a new `router.py` — map task metadata to backend choice.
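
A sketch of what that mapping might look like follows. `TaskMeta`, its fields, the backend labels, and the rule order are all assumptions. Privacy is checked first, so sensitive work never leaves the network even when the prompt exceeds local context; in that case the caller would have to trim.

```python
from dataclasses import dataclass

# Sketch only: TaskMeta, route() and the backend labels are hypothetical.


@dataclass
class TaskMeta:
    private: bool = False      # data must not leave the network
    needs_tools: bool = False  # orchestration / function calling
    est_tokens: int = 0        # rough prompt size
    simple: bool = False       # short, low-stakes query


LONG_CONTEXT = 50_000  # from the table's "Long context (>50k tokens)" row


def route(task: TaskMeta) -> str:
    """Pick a backend label following the routing table, privacy first."""
    if task.private:
        return "local"
    if task.est_tokens > LONG_CONTEXT:
        return "gemini-api"  # long-context row
    if task.needs_tools:
        return "gemini-api"  # tool use / orchestration row
    if task.simple:
        return "local"       # fast/cheap row (E4B)
    return "claude"          # user-facing conversation
```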
+
+---
+
+## 6. RAG via Open WebUI
+
+**Status:** Future — Open WebUI already supports it.
+
+Feed Nextcloud documents or session logs into Open WebUI knowledge collections. Reference them in local model chat via `"files": [{"type": "collection", "id": "..."}]`.
+
+Would complement AE Journals for local-only contexts where data shouldn't leave the network.
+
+API reference: [`docs/OPEN_WEBUI_API.md`](../docs/OPEN_WEBUI_API.md) — RAG section.
+
+---
+
+## 8. Agent Architecture Ideas (from Claude Code leak)
+
+**Status:** Research — review before building dev agent pipeline and orchestrator.
+
+The Claude Code system prompt was leaked in early April 2026. Two reimplementation repos are worth reading for design ideas before building out the dev agent pipeline and local orchestrator:
+
+- https://github.com/HarnessLab/claw-code-agent — Python reimplementation targeting local models (Qwen3-Coder recommended); most technically detailed
+- https://github.com/ultraworkers/claw-code — Community porting/reverse-engineering project; reportedly has interesting detail in the source code itself
+
+**Ideas worth incorporating:**
+
+**Tiered permission architecture** — explicit read-only / write / shell / unsafe modes, each requiring an opt-in flag. Currently Cortex has implicit trust for agent operations. Relevant once the dev agent pipeline is writing and executing code — don't want a `brief` cron job accidentally in write mode.
+
+**Agent lineage tracking** — agent manager records which agent spawned which sub-agent. Useful for debugging multi-step orchestrated tasks and essential for the supervisor → specialist → approval gate chain.
+
+**Cost/budget enforcement** — hard token and cost budgets per operation, multiple budget types. `ORCHESTRATOR_MAX_ROUNDS=10` is Cortex's only guardrail today. Worth adding a token budget check to the tool loop, especially relevant for local models with hard context ceilings (72k/50k practical).
+
+**Context compaction/snipping** — automatic mid-session context trimming when approaching limits. Important for long orchestrator runs against local models. Could trim tool results that are no longer needed for the current reasoning step.
+
+**Nested agent delegation with dependency-aware batching** — sub-agents that know their parent; parallel sub-tasks batched by dependency order. Directly applicable to the dev agent pipeline (orchestrator → specialist → supervisor, with some steps parallelizable).
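
Dependency-aware batching maps directly onto the standard library's `graphlib.TopologicalSorter` (Python 3.9+). Here is a sketch with hypothetical task names. Every task in a batch has all of its prerequisites done, so the tasks within one batch can run in parallel.

```python
from graphlib import TopologicalSorter

# Sketch only: dependency_batches and the task names in the example are hypothetical.


def dependency_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group sub-tasks into batches; deps maps each task to its prerequisites."""
    ts = TopologicalSorter(deps)
    ts.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # sorted only for stable, readable output
        batches.append(ready)
        ts.done(*ready)
    return batches
```

For the dev pipeline, `{"specialist_fe": {"orchestrate"}, "specialist_be": {"orchestrate"}, "supervisor": {"specialist_fe", "specialist_be"}}` would run both specialists in the same batch.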
+
+**File history journaling** — beyond session logs, a journal of what files changed and why, with replay summaries. Different from memory distillation — more like a git log for agent actions. Could complement the supervisor agent's diff review.
+
+**Plugin/manifest-based tool extensions** — tools declared via manifest rather than hardcoded in `__init__.py`. Would make adding new orchestrator tools less invasive. Worth considering before the tool suite grows much larger.
+
+---
+
+## 7. Permanent Fleet Hosting
+
+**Status:** Deferred.
+
+Currently running on `scott_lpt` (main laptop). Long-term target: home server (always-on, Docker).
+
+`docker-compose.yml` already exists in the project root. Deployment path:
+1. Copy to home server
+2. Configure reverse proxy (Nginx, already Docker-hosted)
+3. Set subdomain `cortex.dgrzone.com` → home server internal IP
+4. WireGuard required for all access — not internet-exposed
--- a/documentation/ARCH__Intelligence_Layer.md
+++ b/documentation/ARCH__Intelligence_Layer.md
@@ -1,306 +1,14 @@
-# Architecture: Intelligence Layer
+# ARCH__Intelligence_Layer.md — Archived

-**Status:** Design phase — not yet implemented
-**Last updated:** 2026-03-18
+This document has been split into focused per-topic docs.

-This document captures the architectural thinking behind expanding Cortex from a smart dispatcher into a genuine intelligence layer: capable of using tools, coordinating specialist agents, and managing a personal knowledge base.
+| What you're looking for | New location |
+|---|---|
+| Overall architecture, design decisions | [`ARCH__SYSTEM.md`](ARCH__SYSTEM.md) |
+| Orchestrator/Responder pattern, tool loop | [`ARCH__FUTURE.md`](ARCH__FUTURE.md) — section 1 |
+| Dev agent pipeline, supervisor agent | [`ARCH__FUTURE.md`](ARCH__FUTURE.md) — section 2 |
+| Knowledge layer, AE Journals import | [`ARCH__FUTURE.md`](ARCH__FUTURE.md) — section 4 |
+| LLM backends and routing | [`ARCH__BACKENDS.md`](ARCH__BACKENDS.md) |
+| Model routing (future) | [`ARCH__FUTURE.md`](ARCH__FUTURE.md) — section 5 |

---
-
-## Overview
-
-Cortex currently dispatches chat messages to LLM CLI backends and returns the response. The Intelligence Layer adds three major capabilities on top of that foundation:
-
-1. **Orchestrator/Responder** — Gemini handles tool use and planning; Claude handles the user-facing response
-2. **Dev Agent Pipeline** — Specialist agents implement code changes; a supervisor checks the work
-3. **Knowledge Layer** — AE Journals becomes the primary knowledge base; agents can read and write it
-
-These are independent tracks that share the same trigger layer and can be built incrementally.
-
---
-
-## 1. Orchestrator / Responder Pattern
-
-### The Problem
-
-Claude CLI (via Pro subscription) doesn't expose direct API tool-calling. Gemini API (free tier) does. But Claude produces higher-quality user-facing prose and reasoning. The solution is to use each model for what it does best.
-
-### The Pattern
-
-```
-User message
-    ↓
-Orchestrator (Gemini API)
-    • interprets intent
-    • decides which tools to call
-    • executes tool loop (ReAct: reason → act → observe → repeat)
-    • assembles enriched context + tool results
-    ↓
-Responder (Claude CLI)
-    • receives enriched context
-    • writes the user-facing response
-    ↓
-User
-```
-
-For **direct chat** (no tools needed), the orchestrator is bypassed entirely — message goes straight to Claude. The orchestrator only activates when tools are required or when explicitly invoked (e.g., a background task).
-
-### Why Gemini API (not CLI)?
-
- Gemini CLI is a subprocess; function calling via subprocess is fragile
- Gemini API (`google-generativeai` SDK) has native structured tool-calling
- Free tier (Gemini 2.0 Flash) handles orchestration load without cost
- Access token is short-lived but auto-refreshed by the SDK (no expiry problem)
-
-### Tool Strategy
-
-Tools for the orchestrator are **separate** from the existing `ae_*` MCP tools. The ae_* tools are stable and used by existing agents — do not modify them.
-
-New orchestrator tools are Python functions wrapped in Gemini function declarations:
-
-| Tool | What it does | Implementation |
-|---|---|---|
-| `web_search` | DuckDuckGo search | `duckduckgo-search` library |
-| `ae_journal_search` | Search AE Journals via V3 API | HTTP to AE API |
-| `ae_journal_entry_create` | Write a new journal entry | HTTP to AE API |
-| `ae_task_list` | Read Kanban tasks | HTTP to AE API or agents_sync file |
-| `file_read` | Read a file from known safe paths | Python `pathlib` |
-| `gitea_api` | Query Gitea repos, issues, PRs | Gitea REST API |
-
-Tools are registered in `cortex/tools/` (one file per domain group).
-
-### Implementation Path
-
-```
-cortex/
-  tools/
-    __init__.py          — tool registry
-    web.py               — web_search
-    ae_knowledge.py      — ae_journal_* tools
-    ae_tasks.py          — task tools
-    gitea.py             — Gitea API tools
-  routers/
-    orchestrator.py      — POST /orchestrate, GET /orchestrate/{job_id}
-  orchestrator_engine.py — Gemini tool loop + Claude handoff
-```
-
-Endpoint contract:
-
-```
-POST /orchestrate
-{
-  "task": "What tasks are due this week and summarize my notes on X topic",
-  "session_id": "optional — if part of an ongoing conversation",
-  "respond_with_claude": true   // false = return Gemini's assembled context only
-}
-
-→ { "job_id": "uuid", "status": "queued" }
-
-GET /orchestrate/{job_id}
-→ { "status": "complete", "result": "...", "tool_calls": [...] }
-```
-
---
-
-## 2. Trigger Layer
-
-All three capabilities (chat, orchestration, dev agents) share the same trigger layer:
-
-```
-┌────────────────────────────────────────────────┐
-│  TRIGGERS                                      │
-│                                                │
-│  Chat UI  →  POST /chat  (existing)            │
-│  Cron     →  POST /orchestrate  (new)          │
-│  Gitea    →  POST /webhook/gitea  (new)        │
-│  NC Talk  →  POST /webhook/nextcloud  (exists) │
-│  Manual   →  CLI / curl for debugging          │
-└────────────────────────────────────────────────┘
-```
-
-Cron trigger example (from existing cron infrastructure):
-
-```bash
-curl -X POST http://localhost:8000/orchestrate \
-  -H "Content-Type: application/json" \
-  -d '{"task": "Check for overdue Kanban tasks and notify via NC Talk"}'
-```
-
-This means the same orchestrator endpoint is usable from chat, crons, and webhooks without any special cases.
-
---
-
-## 3. Dev Agent Pipeline
-
-### The Goal
-
-Accept a plain-English task like *"Fix the bug where X, add a test for it"* and produce:
- A working code change
- Passing syntax/type checks
- A summary of what changed and what still needs human review
- A commit ready to push (pending approval)
-
-### Architecture
-
-```
-Task request (chat / Gitea issue / Kanban)
-    ↓
-Orchestrator
-    • reads relevant files (context gathering)
-    • routes to correct specialist
-    ↓
-Specialist Agent (Claude CLI in project directory)
-    • implements the change
-    • runs self-check: py_compile / svelte-check
-    ↓
-Supervisor Agent
-    • reviews the diff
-    • runs test suite
-    • returns: PASS / NEEDS_REVIEW / FAIL + reason
-    ↓
-Human approval gate
-    • summary shown in Cortex UI or NC Talk
-    • user approves → commit + optional push
-    • user rejects → feedback goes back to specialist
-```
-
-### Specialist Agents
-
-Two initial specialists, both using Claude CLI:
-
-**Frontend specialist** (working dir: `~/OSIT_dev/aether_app_sveltekit/`):
- Reads `documentation/TODO__Agents.md` and `CLAUDE.md` before acting
- Runs `npx svelte-check` after every change — no exceptions
- Atomic commits (one component or fix per commit)
-
-**Backend specialist** (working dir: `~/OSIT_dev/aether_api_fastapi/`):
- Reads `documentation/TODO__Agents.md` and `CLAUDE.md` before acting
- Runs `python3 -m py_compile` after every file edit
- Runs unit tests before declaring done
- Flags E2E tests that need human review
-
-### Supervisor Agent
-
-The supervisor is a separate Claude invocation that receives:
- The diff of all changed files
- Stdout/stderr from all checks that were run
- The original task description
-
-It returns a structured assessment:
-
-```json
-{
-  "verdict": "PASS | NEEDS_REVIEW | FAIL",
-  "checks_passed": ["py_compile", "unit_tests"],
-  "checks_failed": [],
-  "review_notes": "E2E tests not run — touch auth router, recommend manual check",
-  "commit_message": "fix: correct session token validation in auth middleware"
-}
-```
-
-### Gitea Integration
-
- **Gitea webhooks → Cortex:** Push/PR events trigger supervisor review automatically
- **Gitea Actions:** Run `py_compile`/`svelte-check` on every push (simple CI, no custom runner)
- **Cortex → Gitea:** After human approval, supervisor calls Gitea API to create PR or push
-
-Gitea Actions are simpler than they sound — a `.gitea/workflows/check.yml` is just a YAML file that runs shell commands on push. No external CI infrastructure needed.
-
---
-
-## 4. Knowledge Layer
-
-### The Goal
-
-AE Journals becomes the primary source of truth for personal and business knowledge. Notes, documentation, and logs that currently live scattered across markdown files get organized into Journals with proper structure, search, and agent-accessible read/write.
-
-### Import Strategy
-
-1. **Don't bulk-import blindly.** The orchestrator searches AE Journals before creating anything (deduplication).
-2. **Chunk by section.** A large markdown file becomes multiple journal entries — one per H2 section.
-3. **Preserve provenance.** Each imported entry includes source path, import date, and original file date in its `data_json` or notes.
-4. **Tag intelligently.** Tags come from: frontmatter, filename keywords, directory path, and content analysis.
-
-### Source Priority
-
-| Source | Priority | Notes |
-|---|---|---|
-| `~/DgrZone_Nextcloud/` | High | Personal notes, projects |
-| `~/OSIT_Nextcloud/` | High | Business docs |
-| `~/agents_sync/aether/docs/` | Medium | Platform specs (already structured) |
-| OpenClaw session logs | Low | Historical, lots of noise |
-
-### Agent Workflow
-
-```
-"Summarize my notes on WireGuard setup"
-    ↓
-Orchestrator calls ae_journal_search("wireguard")
-    ↓
-Returns matching entries
-    ↓
-Claude synthesizes a response
-```
-
-```
-"Save this as a note in my DgrZone journal"
-    ↓
-Orchestrator calls ae_journal_entry_create(
-    journal="DgrZone General",
-    title="...",
-    content="...",
-    tags=["note", "wireguard"]
-)
-```
-
-### Context Tiers (Inara Memory)
-
-The existing distill system (`MEMORY_SHORT.md`, `MEMORY_MID.md`, `MEMORY_LONG.md`) handles working memory. The Knowledge Layer is complementary — it's the **searchable long-term archive**, not the rolling context window. Agents should:
-
- Use memory files for "what have we been working on lately"
- Use AE Journals search for "what do I know about topic X"
-
---
-
-## 5. Model Routing (Future)
-
-Currently hardcoded: Claude default, Gemini fallback. Future intelligent routing:
-
-| Task type | Model | Reason |
-|---|---|---|
-| User-facing conversation | Claude | Quality prose, reasoning |
-| Tool use / orchestration | Gemini API | Native function calling, free |
-| Private / sensitive | Ollama (local) | No data leaves the network |
-| Long context (>100k tokens) | Gemini 2.0 | 1M token context window |
-| Code generation | Claude | Strong code quality |
-
-Routing logic lives in `cortex/orchestrator_engine.py` — a simple function that maps task metadata to a backend choice.
-
---
-
-## Implementation Order (Recommended)
-
-1. **Orchestrator Phase 1** — Gemini API integration, basic tool loop, `/orchestrate` endpoint
-   - Unlocks: web search in chat, AE Journal queries, cron-triggered tasks
-2. **Knowledge import** — markdown → AE Journal Entries tool + import script
-   - Unlocks: searchable knowledge base for all agents
-3. **Dev agent pipeline** — Frontend + Backend specialist agents
-   - Unlocks: AI-assisted development with supervisor review
-4. **Gitea integration** — webhook receiver + Actions CI
-   - Unlocks: event-driven automation, PR workflow
-5. **Intelligent routing** — model selection by task type
-   - Polish: cost and quality optimization
-
---
-
-## Key Design Decisions
-
-| Decision | Choice | Rationale |
-|---|---|---|
-| Orchestrator model | Gemini API (not CLI) | Native tool calling; free tier |
-| Responder model | Claude CLI (Pro sub) | Quality output; no API cost |
-| Direct chat bypass | Yes | Don't add latency when tools aren't needed |
-| Tool set | Separate from ae_* MCPs | ae_* tools are stable; don't risk breaking active agents |
-| Dev agents | Claude CLI in project dir | CLAUDE.md + project context already in place |
-| Human approval gate | Required before commit | Agents can propose; humans decide |
-| Knowledge primary source | AE Journals | Already exists, structured, searchable |
+*Original content written 2026-03-18. Superseded 2026-04-03.*
--- a/documentation/ARCH__PERSONA.md
+++ b/documentation/ARCH__PERSONA.md
@@ -0,0 +1,121 @@
+# Architecture: Persona System & Memory
+
+> How Inara (and other personas) know who they are and what they remember.
+> Last updated: 2026-04-03
+
+---
+
+## Filesystem Layout
+
+Each persona lives in `home/{username}/persona/{name}/`:
+
+```
+home/scott/persona/inara/
+  IDENTITY.md       Who Inara is — role, name, origin
+  SOUL.md           Values, personality, voice, what she cares about
+  PROTOCOLS.md      Behavioral rules — how she responds, what she avoids
+  CONTEXT_TIERS.md  Documents which files load at each tier
+  USER.md           Scott's profile — loaded into context so she knows who she's talking to
+  HELP.md           Persona-specific help content (appended to shared HELP.md in UI)
+  MEMORY_SHORT.md   Recent session digest (auto-distilled daily)
+  MEMORY_MID.md     Mid-term summary (auto-distilled weekly)
+  MEMORY_LONG.md    Long-term memory (auto-distilled monthly)
+  REMINDERS.md      Pending reminders (auto-surfaced at tier 2+)
+  SCRATCH.md        Ephemeral scratchpad (read/write via tools)
+  TASKS.json        Personal task list (managed via tools)
+  CRONS.json        Scheduled jobs (managed via tools)
+  sessions/         Session turn logs — YYYY-MM-DD.md, one file per day
+```
+
+**ContextVars:** `persona.py` sets `_user` and `_persona` ContextVars per request. Everything downstream calls `persona_path()` to resolve the right directory — no globals, no thread-local state.
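+
+A minimal sketch of the ContextVar pattern, assuming `persona.py` binds plain string names. `set_persona` and the `HOME_DIR` default here are hypothetical stand-ins, not the actual module's API:
+
+```python
+from contextvars import ContextVar
+from pathlib import Path
+
+# Hypothetical stand-ins for the per-request ContextVars that persona.py sets.
+_user: ContextVar[str] = ContextVar("_user", default="scott")
+_persona: ContextVar[str] = ContextVar("_persona", default="inara")
+
+HOME_DIR = Path("home")  # assumption: mirrors the HOME_DIR setting
+
+
+def set_persona(user: str, persona: str) -> None:
+    """Bind the current request's user and persona (called once per request)."""
+    _user.set(user)
+    _persona.set(persona)
+
+
+def persona_path(*parts: str) -> Path:
+    """Resolve a path under home/{user}/persona/{name}/ for the current context."""
+    return HOME_DIR.joinpath(_user.get(), "persona", _persona.get(), *parts)
+```
+
+Usage: after `set_persona("holly", "tina")`, `persona_path("MEMORY_SHORT.md")` resolves to `home/holly/persona/tina/MEMORY_SHORT.md`.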
+
+---
+
+## Context Tiers
+
+Each chat request specifies a tier (default: 2). Higher tiers load more context — slower but richer.
+
+| Tier | Loaded Files | Use case |
+|---|---|---|
+| 1 | IDENTITY.md | Minimal — lightweight tasks |
+| 2 | + SOUL.md, PROTOCOLS.md, USER.md, MEMORY_SHORT.md, MEMORY_MID.md, REMINDERS.md | Standard chat |
+| 3 | + MEMORY_LONG.md, CONTEXT_TIERS.md | Deep sessions, long tasks |
+| 4 | + SCRATCH.md, TASKS.json | Full state — agent mode |
+
+`context_loader.py` assembles the system prompt from these files in order. The resulting prompt is passed to whichever LLM backend handles the request.
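+
+The tier table reads as a cumulative list. Below is a rough sketch of how a loader might assemble it, assuming each file becomes a headed section and files that don't exist yet are skipped. `files_for_tier` and `build_system_prompt` are illustrative names, not necessarily what `context_loader.py` exposes:
+
+```python
+from pathlib import Path
+
+# Files added at each tier, per the table above; higher tiers include all lower ones.
+TIER_FILES: dict[int, list[str]] = {
+    1: ["IDENTITY.md"],
+    2: ["SOUL.md", "PROTOCOLS.md", "USER.md", "MEMORY_SHORT.md", "MEMORY_MID.md", "REMINDERS.md"],
+    3: ["MEMORY_LONG.md", "CONTEXT_TIERS.md"],
+    4: ["SCRATCH.md", "TASKS.json"],
+}
+
+
+def files_for_tier(tier: int) -> list[str]:
+    """Return every file loaded at `tier`, lower tiers first."""
+    if tier not in TIER_FILES:
+        raise ValueError(f"unknown tier: {tier}")
+    return [name for t in sorted(TIER_FILES) if t <= tier for name in TIER_FILES[t]]
+
+
+def build_system_prompt(persona_dir: Path, tier: int = 2) -> str:
+    """Concatenate the tier's files into one prompt, skipping files that are missing."""
+    sections = []
+    for name in files_for_tier(tier):
+        path = persona_dir / name
+        if path.is_file():
+            sections.append(f"## {name}\n\n{path.read_text(encoding='utf-8').strip()}")
+    return "\n\n".join(sections)
+```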
+
+---
+
+## Memory Distillation
+
+Three-tier rolling memory system, run by APScheduler:
+
+```
+sessions/YYYY-MM-DD.md  ← raw session logs (written by session_logger.py)
+        ↓ daily 03:00
+MEMORY_SHORT.md         ← recent session digest (no LLM — pure aggregation)
+        ↓ weekly Sun 03:30
+MEMORY_MID.md           ← concise summary (LLM)
+        ↓ monthly 1st 04:00
+MEMORY_LONG.md          ← integrated long-term memory (LLM)
+```
+
+**Short distill** — reads the most recent session files that fit within the token budget, writes them in chronological order. No LLM involved — fast and cheap.
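+
+The short distill's selection step, sketched under two assumptions: a ~4 characters-per-token estimate stands in for a real tokenizer, and selection stops at the first file that would overflow the budget (default 3000, matching `MEMORY_BUDGET_SHORT`). The function names are hypothetical:
+
+```python
+from pathlib import Path
+
+
+def estimate_tokens(text: str) -> int:
+    """Rough estimate of ~4 characters per token (an assumption, not a tokenizer)."""
+    return max(1, len(text) // 4)
+
+
+def distill_short(sessions_dir: Path, budget_tokens: int = 3000) -> str:
+    """Take the newest session logs that fit the budget, then emit them oldest-first."""
+    picked: list[tuple[str, str]] = []
+    used = 0
+    # YYYY-MM-DD.md names sort chronologically, so reverse order is newest-first.
+    for path in sorted(sessions_dir.glob("*.md"), reverse=True):
+        text = path.read_text(encoding="utf-8")
+        cost = estimate_tokens(text)
+        if used + cost > budget_tokens:
+            break
+        picked.append((path.stem, text))
+        used += cost
+    picked.reverse()  # chronological order for the digest
+    return "\n\n".join(f"# {day}\n\n{text.strip()}" for day, text in picked)
+```
+
+No LLM call is involved, which is what keeps this tier fast and cheap.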
+
+**Mid distill** — LLM summarizes MEMORY_SHORT into a concise digest. Prompt asks for recurring themes, decisions, ongoing projects, Scott's current state and priorities. Written in first person as Inara.
+
+**Long distill** — LLM integrates MEMORY_MID into MEMORY_LONG. Rules: preserve historical facts, update stale info, absorb new themes, remove irrelevant entries.
+
+**Distill notifications** — after mid and long runs, `notification.py` sends a message to the user's configured NC Talk notification room (if `notification_room` is set in `channels.json`).
+
+**Controls** in `.env`:
+```
+AUTO_DISTILL=true
+AUTO_DISTILL_SHORT=true
+AUTO_DISTILL_MID=true
+AUTO_DISTILL_LONG=true          # off by default (shown enabled here); review the first run's output manually
+DISTILL_BACKEND_MID=local       # use local model to save API credits
+DISTILL_BACKEND_LONG=           # empty = primary backend (claude recommended)
+MEMORY_BUDGET_SHORT=3000        # token budgets (soft caps)
+MEMORY_BUDGET_MID=2000
+MEMORY_BUDGET_LONG=2000
+```
+
+Manual distill via API:
+```
+POST /distill/short
+POST /distill/mid
+POST /distill/long
+GET  /distill/status
+```
+
+---
+
+## Adding a New Persona
+
+`persona_template.py` bootstraps a new persona directory from string templates. The onboarding flow (`/setup/persona`) calls this when a new user creates their first persona.
+
+To add one manually:
+1. Create `home/{username}/persona/{name}/`
+2. Copy and edit the files from an existing persona (e.g. `home/scott/persona/inara/`)
+3. At minimum: `IDENTITY.md`, `SOUL.md`, `PROTOCOLS.md`, `USER.md`
+4. The distiller will create the `MEMORY_*.md` files on first run
+
+---
+
+## Session Search
+
+Past sessions are searchable via `GET /sessions/search?q=...&user=...&persona=...`.
+
+Available in the UI via the search box at the bottom of the Files panel (open with the Files button). Results are grouped by date with highlighted excerpts.
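+
+A sketch of the search-and-group behavior, assuming a plain case-insensitive substring match over `sessions/*.md` and `**` markers around each hit for highlighting; the real endpoint may rank or excerpt differently. `search_sessions` is a hypothetical name:
+
+```python
+import re
+from pathlib import Path
+
+
+def search_sessions(sessions_dir: Path, query: str, context: int = 40) -> dict[str, list[str]]:
+    """Case-insensitive full-text search; returns {date: [excerpts]}, newest date first."""
+    pattern = re.compile(re.escape(query), re.IGNORECASE)
+    results: dict[str, list[str]] = {}
+    for path in sorted(sessions_dir.glob("*.md"), reverse=True):
+        text = path.read_text(encoding="utf-8")
+        for m in pattern.finditer(text):
+            start, end = max(0, m.start() - context), m.end() + context
+            # Wrap the hit in ** so the UI can highlight it.
+            excerpt = text[start:m.start()] + "**" + m.group(0) + "**" + text[m.end():end]
+            results.setdefault(path.stem, []).append(excerpt.replace("\n", " "))
+    return results
+```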
+
+---
+
+## Active Personas
+
+| User | Persona | Description |
+|---|---|---|
+| scott | inara | Scott's primary assistant |
+| scott | developer | Dev-focused persona |
+| holly | tina | Holly's primary assistant |
+| brian | wintermute | Brian's primary assistant |
--- a/documentation/ARCH__SYSTEM.md
+++ b/documentation/ARCH__SYSTEM.md
@@ -0,0 +1,90 @@
+# Architecture: System Overview
+
+> How the pieces fit together.
+> Last updated: 2026-04-03
+
+---
+
+## Architecture Diagram
+
+```
+┌─────────────────────────────────────────────────────────┐
+│  INPUT CHANNELS                                         │
+│                                                         │
+│  Web UI ──────────────────────────────────────────┐     │
+│  Nextcloud Talk ──── POST /webhook/nextcloud/{u} ─┤     │
+│  Google Chat ────── POST /channels/google-chat/{u}┤     │
+│  Cron / Scheduler ────────────────────────────────┤     │
+│  Webhooks (future) ───────────────────────────────┘     │
+└─────────────────────────────┬───────────────────────────┘
+                              ↓
+┌─────────────────────────────────────────────────────────┐
+│  CORTEX DISPATCHER  (FastAPI — cortex/)                 │
+│                                                         │
+│  auth_middleware.py  → validates JWT session cookie     │
+│  persona.py          → resolves user + persona context  │
+│  context_loader.py   → builds system prompt (tier 1-4)  │
+│                                                         │
+│  POST /chat          → direct LLM, streaming SSE        │
+│  POST /orchestrate   → Gemini tool loop → Claude        │
+│  GET  /orchestrate/{id} → poll job result               │
+└────────────┬───────────────────┬────────────────────────┘
+             ↓                   ↓
+┌─────────────────┐   ┌──────────────────────────────────┐
+│  LLM BACKENDS   │   │  PERSONA DATA                    │
+│                 │   │  home/{user}/persona/{name}/     │
+│  Claude CLI     │   │                                  │
+│  Gemini CLI     │   │  IDENTITY.md  SOUL.md            │
+│  Gemini API     │   │  PROTOCOLS.md MEMORY_*.md        │
+│  Local (httpx)  │   │  USER.md  REMINDERS.md           │
+│                 │   │  TASKS.json  CRONS.json          │
+└─────────────────┘   │  sessions/  SCRATCH.md           │
+                      └──────────────────────────────────┘
+```
+
+Details: [`ARCH__BACKENDS.md`](ARCH__BACKENDS.md) | [`ARCH__PERSONA.md`](ARCH__PERSONA.md) | [`ARCH__CHANNELS.md`](ARCH__CHANNELS.md)
+
+---
+
+## Service Layout (`cortex/`)
+
+| File | Purpose |
+|---|---|
+| `main.py` | App entry point, router registration |
+| `config.py` | All settings (pydantic-settings, reads `.env`) |
+| `persona.py` | User + persona path resolution, ContextVars |
+| `context_loader.py` | Builds system prompt from persona files (tiers 1–4) |
+| `llm_client.py` | All LLM backends — Claude, Gemini CLI, Local |
+| `orchestrator_engine.py` | Gemini API ReAct tool loop → Claude handoff |
+| `session_store.py` | In-memory + file session persistence |
+| `session_logger.py` | Writes session turns to `sessions/YYYY-MM-DD.md` |
+| `memory_distiller.py` | Short/mid/long distill jobs |
+| `scheduler.py` | APScheduler — distill jobs + user crons |
+| `cron_runner.py` | Cron job storage, schedule parsing, execution |
+| `notification.py` | Outbound channel messages (distill alerts, cron proactive) |
+| `auth_utils.py` | bcrypt passwords, JWT, invite tokens, channel config |
+| `auth_middleware.py` | JWT cookie validation on all routes |
+| `user_settings.py` | Per-user local LLM config (hosts, models, active model) |
+| `event_bus.py` | Internal SSE pub/sub (NC Talk → browser mirror) |
+| `email_utils.py` | SMTP invite emails |
+| `persona_template.py` | Bootstrap a new persona directory from templates |
+| `routers/` | One file per endpoint group (chat, orchestrator, auth, files, channels, ui, settings…) |
+| `tools/` | Orchestrator tool implementations (web, ae_knowledge, tasks, scratch, reminders, cron, system) |
+| `static/` | Web UI — `index.html`, `app.js`, `style.css`, `login.html`, `setup.html`, `HELP.md` |
+| `tests/` | pytest suite (80 tests) |
+
+---
+
+## Key Design Decisions
+
+**Two-brain pattern** — Gemini API handles tool use (function calling, planning, web search). Claude CLI handles all user-facing responses. Direct chat bypasses the orchestrator entirely.
+
+**Subprocess backends** — Claude and Gemini run as CLI subprocesses (`claude --print`, `gemini -p`). This keeps auth transparent (Claude Code manages tokens) and avoids API costs on the Pro subscription path.
+
+**Local backend via httpx** — Open WebUI's OpenAI-compatible API (`/api/chat/completions`). No CLI wrapper. Per-user host + model config in `local_llm.json`.
+
+**ContextVars for async isolation** — `persona.py` uses Python `contextvars.ContextVar` so concurrent requests each see their own user/persona without thread-local hacks.
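+
+A small demonstration of why this works, assuming a `_persona` ContextVar like the one in `persona.py`: asyncio runs each task in a copy of the current context, so concurrent requests that set the variable never see each other's values.
+
+```python
+import asyncio
+from contextvars import ContextVar
+
+# Hypothetical stand-in for the per-request ContextVar in persona.py.
+_persona: ContextVar[str] = ContextVar("_persona", default="inara")
+
+
+async def handle_request(persona: str, delay: float) -> str:
+    _persona.set(persona)        # visible only inside this task's context copy
+    await asyncio.sleep(delay)   # yield so the requests interleave
+    return _persona.get()        # still this request's value, not the last writer's
+
+
+async def main() -> list[str]:
+    # gather wraps each coroutine in a Task, and each Task gets its own context copy.
+    return await asyncio.gather(
+        handle_request("inara", 0.02),
+        handle_request("tina", 0.01),
+        handle_request("wintermute", 0.0),
+    )
+
+
+if __name__ == "__main__":
+    print(asyncio.run(main()))  # ['inara', 'tina', 'wintermute']
+```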
+
+**Per-user filesystem layout** — `home/{user}/persona/{name}/` mirrors Linux home directories. Each persona is a directory of markdown files and JSON. No database. Easy to inspect, edit, and back up.
+
+**No single point of coupling** — tools live in `cortex/tools/`, separate from `ae_*` MCP tools. Channels live in `cortex/routers/`, each self-contained. Adding a channel or tool doesn't touch other subsystems.
--- a/documentation/MASTER.md
+++ b/documentation/MASTER.md
@@ -0,0 +1,92 @@
+# Cortex / Inara — Master Index
+
+> Start here. This document is a map, not a manual.
+> Last updated: 2026-04-03
+
+---
+
+## What It Is
+
+Cortex is a self-hosted personal AI platform. It routes messages from any input channel to AI backends, manages a resident agent (Inara) with persistent memory, and coordinates across a fleet of machines. It is infrastructure, not a product.
+
+**Running at:** `https://cortex.dgrzone.com` | `systemctl --user restart cortex`
+
+---
+
+## Current State
+
+| Component | Status | Notes |
+|---|---|---|
+| Web UI | ✅ Live | SPA, dark theme, mobile-responsive, session auth |
+| Nextcloud Talk bot | ✅ Live | HMAC-signed, per-user routing |
+| Google Chat Add-on | ✅ Live | JWT-verified, per-user routing |
+| Claude backend | ✅ Live | Primary — via Claude Code CLI |
+| Gemini backend | ✅ Live | Fallback — via Gemini CLI |
+| Local backend | ✅ Live | Third option — Open WebUI/Ollama on scott_gaming |
+| Gemini orchestrator | ✅ Live | Tool loop → Claude response, Agent mode in UI |
+| Memory distillation | ✅ Live | Short (daily) / Mid (weekly) / Long (monthly) |
+| Multi-user | ✅ Live | Scott, Holly, Brian — each with own personas |
+| Session search | ✅ Live | Full-text search across past session logs |
+| Proactive cron | ✅ Live | `message` and `brief` job types → NC Talk |
+
+**Active users / personas:** scott/inara, scott/developer, holly/tina, brian/wintermute
+
+---
+
+## Document Map
+
+### Project-Level
+| Doc | What it covers |
+|---|---|
+| **This file** | Index and current state |
+| [`CORTEX.md`](../CORTEX.md) | Vision, philosophy, "what it is and isn't" |
+| [`ROADMAP.md`](ROADMAP.md) | Phases — what's done, what's next, what's deferred |
+| [`TODO__Agents.md`](TODO__Agents.md) | Active task list — read before starting work |
+
+### Architecture
+| Doc | What it covers |
+|---|---|
+| [`ARCH__SYSTEM.md`](ARCH__SYSTEM.md) | Overall architecture, component map, key design decisions |
+| [`ARCH__BACKENDS.md`](ARCH__BACKENDS.md) | LLM backends, routing, fallback, per-user config |
+| [`ARCH__PERSONA.md`](ARCH__PERSONA.md) | Persona system, context tiers, memory distillation |
+| [`ARCH__CHANNELS.md`](ARCH__CHANNELS.md) | Input channels — web, NC Talk, Google Chat, cron |
+| [`ARCH__FUTURE.md`](ARCH__FUTURE.md) | Planned: local orchestrator, dev agents, knowledge layer |
+
+### Setup & Reference
+| Doc | What it covers |
+|---|---|
+| [`docs/NEXTCLOUD_TALK_BOT.md`](../docs/NEXTCLOUD_TALK_BOT.md) | NC Talk bot setup and troubleshooting |
+| [`docs/GOOGLE_CHAT_BOT.md`](../docs/GOOGLE_CHAT_BOT.md) | Google Chat Add-on setup |
+| [`docs/OPEN_WEBUI_API.md`](../docs/OPEN_WEBUI_API.md) | Open WebUI/Ollama API reference for local model work |
+
+### Code-Level
+| Doc | What it covers |
+|---|---|
+| [`CLAUDE.md`](../CLAUDE.md) | Project instructions for Claude Code — directory map, run commands, design decisions |
+| [`README.md`](../README.md) | Project root orientation, quick-start, user management |
+| [`cortex/static/HELP.md`](../cortex/static/HELP.md) | In-app help (rendered in UI for all users) |
+
+---
+
+## Quick Reference
+
+**Start the service / check logs**
+```bash
+systemctl --user restart cortex
+journalctl --user -u cortex -f
+```
+
+**Syntax check before restart**
+```bash
+python3 -m py_compile cortex/<file>.py
+```
+
+**Add a user**
+```bash
+cd cortex && .venv/bin/python manage_passwords.py invite <username> <email>
+```
+
+**Run tests**
+```bash
+cd cortex && .venv/bin/python -m pytest tests/ -q
+```
--- a/documentation/ROADMAP.md
+++ b/documentation/ROADMAP.md
@@ -0,0 +1,71 @@
+# Cortex — Roadmap
+
+> Phases and priorities. For active tasks see `TODO__Agents.md`.
+> Last updated: 2026-04-03
+
+---
+
+## Phase 0 — Foundation ✅
+- Syncthing fleet sync (`agents_sync/`) operational
+- MCP tools (`ae_*`) available in all Claude Code sessions
+- Fleet agents running independently on each machine
+
+## Phase 1 — Dispatcher Core ✅
+- FastAPI service with streaming SSE responses
+- Claude CLI and Gemini CLI subprocess backends
+- Session context management (rolling window, file persistence)
+- Nextcloud Talk bot (HMAC-signed webhook)
+- Memory distiller (APScheduler — short/mid/long cycles)
+- Local web UI (single-page, mobile-responsive)
+- Auth status monitoring (`/auth/status`, UI banner)
+- Session logging and file browser
+
+## Phase 2 — Identity & Multi-User ✅
+- Inara persona formalized (`IDENTITY.md`, `SOUL.md`, `PROTOCOLS.md`, context tiers)
+- Two-level user/persona layout (`home/{user}/persona/{name}/`)
+- Session auth: bcrypt passwords, JWT cookies, invite tokens, Google OAuth
+- Multi-user live: Scott, Holly, Brian
+- Per-user channel config (`channels.json`)
+- Per-user Gemini API key (settings UI)
+- Help & Reference system (shared base + per-persona additions)
+- Lucide icons, persona picker page, session persistence across navigation
+
+## Phase 3 — Intelligence Layer (In Progress)
+- ✅ Gemini API orchestrator (tool loop → Claude responder)
+- ✅ Tool suite: web search, AE Journal read/write, tasks, scratch, reminders, cron, system
+- ✅ Agent mode in UI (async job, poll for result)
+- ✅ Local LLM backend (Open WebUI/Ollama, per-user multi-model config)
+- ✅ Proactive cron (`message` / `brief` job types → NC Talk)
+- ✅ Session search (full-text across past session logs)
+- ✅ Distill notifications (NC Talk after mid/long runs)
+- ✅ Local backend for distillation (`DISTILL_BACKEND_MID` / `DISTILL_BACKEND_LONG` in `.env`)
+- [ ] **Local orchestrator** — ReAct tool loop using local model (High priority — see `TODO__Agents.md`)
+- [ ] Knowledge import — markdown → AE Journals (import script)
+- [ ] Dev agent pipeline — specialist agents + supervisor + approval gate
+- [ ] Gitea webhook integration + Actions CI
+
+## Phase 4 — Channel Expansion
+- ✅ Web UI
+- ✅ Nextcloud Talk
+- ✅ Google Chat
+- [ ] WhatsApp (Business API or bridge — investigating)
+- [ ] Webhook triggers from Aether platform events
+
+## Phase 5 — Routing Intelligence & Scale
+- [ ] Intelligent model routing (by task type, privacy, context length)
+- [ ] Agent-to-agent task delegation across fleet
+- [ ] Permanent hosting on home server (currently on `scott_lpt`)
+
+## Phase 6 — Infrastructure
+- [ ] Server DMZ finalized
+- [ ] WireGuard for all Cortex-accessing devices
+- [ ] Camera/IoT VLAN segmentation
+
+---
+
+## Deferred / Watching
+- **Unsloth Gemma 4 GGUFs** — blocked until Ollama v0.20.1+ ships (its bundled llama.cpp doesn't recognize the Gemma 4 GGUF metadata); switch `agent-support-gemma-*` aliases to Unsloth Q4_K_M when ready
+- **Speculative decoding** — llama.cpp supports it (E4B + E2B draft ≈ 2x speed); Ollama does not yet
+- **RAG via Open WebUI** — feed Nextcloud docs into local knowledge collections; possible complement to AE Journals search
+- **Multi-host local models** — per-user config already supports multiple hosts; routing logic TBD
+- **WhatsApp** — requires Business API account or a bridge; not started
--- a/documentation/TODO__Agents.md
+++ b/documentation/TODO__Agents.md
@@ -7,57 +7,49 @@

 ## 🔴 High Priority

-### [Auth] Token expiry — sudo restart
- Cortex currently requires `sudo systemctl restart cortex` after OAuth token refresh
- This must be done manually by the user (cannot run interactively from Claude Code)
- **Future:** Explore hot-reload or token-passing mechanism so restart isn't required
+### [Local] Tool-capable local orchestrator
+Design and implement `local_orchestrator_engine.py` — a ReAct tool loop driven by
+a local model via Open WebUI's OpenAI-compatible API, as an alternative to the
+Gemini API orchestrator for private/offline tasks.

-### [Backend] Ollama local model backend
- Add Ollama as a third LLM backend option (direct Ollama API, no CLI wrapper)
- Endpoint: `http://scott-gaming:<port>/api/` (WireGuard)
- Model selection: configurable per-request or per-session
- Auth status check: ping `/api/tags` to confirm reachability
-
-### [Testing] Gitea SSH port 2222
- pfSense port forward configured but not yet verified end-to-end
- Test: `ssh -p 2222 git@<external>` from outside WireGuard
- Document result in this file
+- [ ] Convert existing Cortex tool definitions (`cortex/tools/`) from Gemini
+      `FunctionDeclaration` format to OpenAI `tools` format (minor schema diff)
+- [ ] Implement tool loop: send tools → parse `tool_calls` response → execute →
+      append result → loop until `finish_reason: stop`
+- [ ] Wire into `routers/orchestrator.py` — new `mode` param: `"local"` vs `"gemini"`
+- [ ] UI: Agent mode button routes to local orchestrator when local backend active
+- [ ] Recommended models (scott_gaming, 8 GB VRAM):
+      Gemma 4 E4B — 25 t/s, 72k practical ctx — interactive/fast tasks
+      Gemma 4 26B A4B — 9 t/s, 50k practical ctx — heavier reasoning, background tasks
+- Reference: `docs/OPEN_WEBUI_API.md` for full tool call request/response format
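+
+A minimal sketch of the planned loop, assuming OpenAI-style chat-completions responses (`choices[0].message.tool_calls`) and a `chat` callable that performs the HTTP request. The converter assumes Gemini declarations are already plain dicts with JSON-Schema `parameters`, which glosses over the schema differences noted above. All names are hypothetical:
+
+```python
+import json
+from typing import Any, Callable
+
+ChatFn = Callable[[list, list], dict]  # (messages, tool_schemas) -> chat-completions response
+
+
+def gemini_to_openai_tool(decl: dict) -> dict:
+    """Wrap a {name, description, parameters} declaration in the OpenAI `tools` shape."""
+    return {
+        "type": "function",
+        "function": {
+            "name": decl["name"],
+            "description": decl.get("description", ""),
+            "parameters": decl.get("parameters", {"type": "object", "properties": {}}),
+        },
+    }
+
+
+def run_tool_loop(chat: ChatFn, messages: list, tools: dict[str, Callable[..., Any]],
+                  schemas: list, max_steps: int = 8) -> str:
+    """Call the model, run requested tools, repeat until it answers without tool calls."""
+    for _ in range(max_steps):
+        msg = chat(messages, schemas)["choices"][0]["message"]
+        messages.append(msg)
+        tool_calls = msg.get("tool_calls") or []
+        if not tool_calls:
+            # No tool requests (typically finish_reason "stop"): this is the final answer.
+            return msg.get("content") or ""
+        for call in tool_calls:
+            fn = call["function"]
+            args = json.loads(fn.get("arguments") or "{}")  # arguments arrive as a JSON string
+            result = tools[fn["name"]](**args)
+            messages.append({"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)})
+    raise RuntimeError("tool loop did not finish within max_steps")
+```
+
+Capping `max_steps` keeps a model that repeatedly requests tools from looping forever.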

 ---

 ## 🟡 Medium Priority

-### [Intelligence] Orchestrator service — Phase 1 ✅ Complete
-See `ARCH__Intelligence_Layer.md` for full design. Committed: `ed472ce` (2026-03-18)
- [x] Add Gemini API (google-generativeai SDK) as a library dependency (not CLI)
- [x] Create `cortex/routers/orchestrator.py` — `POST /orchestrate` endpoint
- [x] Basic tool registry: web search (DuckDuckGo), AE API query, file read, task list
- [x] ReAct loop: Gemini calls tools, assembles context, hands off to Claude for final response
- [x] `GET /orchestrate/{job_id}` — poll for status/result
- [x] Cron can trigger via HTTP POST (same endpoint)
- **Note:** Default model is `gemini-2.5-flash` — free tier key required (AI Studio)
-
 ### [Intelligence] Knowledge consolidation — Phase 1
-See `ARCH__Intelligence_Layer.md` for full design. Initial scope:
- [ ] Tool: `ae_journal_search` — search before creating to avoid duplicates
- [ ] Tool: `ae_journal_entry_create` — write a new entry with source metadata
+See `ARCH__Intelligence_Layer.md` for full design.
+- [x] Tool: `ae_journal_search` — search before creating to avoid duplicates
+- [x] Tool: `ae_journal_entry_create` — write a new entry with source metadata
 - [ ] Import script: walk a markdown directory, chunk by H2 section, create entries
 - [ ] Target: markdown files from `~/DgrZone_Nextcloud/` and `~/OSIT_Nextcloud/`
 - [ ] Tag strategy: source path, date, topic tags from frontmatter or filename

-### [Channel] Nextcloud Talk integration ✅ Complete
- NC Talk bot is implemented (`cortex/routers/nextcloud_talk.py`)
- HMAC: incoming uses `random + raw_body`; outgoing reply uses `random + message_text` — both correct
- [x] Test end-to-end after any Cortex restart — confirmed working 2026-03-20
- [x] Bot registration docs completed in `docs/NEXTCLOUD_TALK_BOT.md` — 2026-03-20
- **Note:** Currently uses default user/persona only — per-conversation persona routing is a future enhancement
+### [Distill] Review first auto_distill_long output — 2026-04-01
+- Ran April 1 at 04:00 as scheduled
+- Manually review `inara/MEMORY_LONG.md` — confirm quality before fully trusting
+- Adjust distill prompts in `cortex/memory_distiller.py` if needed

-### [Multi-user] Holly onboarding
- Multi-user is built into Cortex — single instance, multiple users under `home/`
- `home/holly/persona/tina/` directory created from template (stub content — needs real persona files)
- [ ] Send Holly's invite email: `python manage_passwords.py invite holly holly.danner@gmail.com`
- [ ] Walk Holly through onboarding flow (`/setup/{token}` → persona creation)
- [ ] Review and flesh out Tina's persona files (IDENTITY.md, SOUL.md, PROTOCOLS.md, USER.md)
+### [Distill] Distill quality review
+- Short/mid/long distill prompts live in `cortex/memory_distiller.py`
+- After first few automatic runs, review quality and tune
+
+### [Local] Unsloth Gemma 4 variants
+- Unsloth Dynamic 2.0 Q4_K_M GGUFs fail with `500: unable to load model` on Ollama v0.20.0
+- Root cause: Ollama's bundled llama.cpp doesn't recognize Gemma 4 GGUF architecture metadata from raw files
+- Waiting on Ollama point release (v0.20.1+) — then switch Open WebUI to Unsloth variants
+- Expected benefit: ~10–20% smaller context footprint than the baseline builds, at the same quality
+- `agent-support-gemma-small` → Unsloth E4B Q4_K_M; `agent-support-gemma-medium` → Unsloth 26B A4B Q4_K_M

 ---

@@ -81,84 +73,147 @@ See `ARCH__Intelligence_Layer.md`. Full design not yet started.
 - `cortex/routers/` already has pattern; add `gitea.py`
 - Gitea Actions (CI) for "run tests on push" — simpler than custom runner

-### [Auth] Session auth + persona onboarding ✅ Complete
- bcrypt passwords stored in `home/{username}/auth.json`
- JWT session cookies (HS256, 30-day expiry) — `auth_utils.py`, `auth_middleware.py`
- Login/logout at `/login`, `/logout`
- Invite tokens (72h, one-time-use) — admin generates via `manage_passwords.py invite <user> [email]`
- Self-service onboarding: `/setup/{token}` (set password) → `/setup/persona` (create persona)
- Multi-persona switcher in UI header — `/api/personas` endpoint
- SMTP invite email — `noreply@oneskyit.com`, HTML + plain text body
- CSS routing fix — `app.mount("/static")` must precede `app.include_router(ui.router)`
- Committed: 2026-03-20
-
-### [Channel] Google Chat integration ✅ Complete
-See `cortex/routers/google_chat.py`. Committed: 2026-03-20
- [x] JWT verification via `authorizationEventObject.systemIdToken` (audience = endpoint URL, issuer = accounts.google.com)
- [x] Workspace Add-on event format: event type inferred from payload key (`messagePayload`, `addedToSpacePayload`, etc.)
- [x] Response format: `hostAppDataAction.chatDataAction.createMessageAction.message.text`
- [x] Session management, LLM pipeline, session logging — same pattern as NC Talk
- [x] Nginx: `/channels/` prefix exposed without basic auth (covers all future channel integrations)
- **Note:** Google Chat API now forces the Workspace Add-on framework — legacy standalone bot format is gone.
-  `{"text": "..."}` and `renderActions` do NOT work; `hostAppDataAction` is required.
-
-### [Distill] Monitor first auto_distill_long run
- Scheduled for ~April 1 at 04:00
- Manually review `inara/MEMORY_LONG.md` output before fully trusting
- Adjust distill prompts if needed
-
-### [Distill] Distill quality review
- Short/mid/long distill prompts live in `cortex/memory_distiller.py`
- After first few automatic runs, review quality and tune
+### [Local] RAG via Open WebUI
+Open WebUI has a full RAG pipeline (file upload → embed → knowledge collections →
+reference in chat). Could feed Nextcloud docs or session logs into a local knowledge
+base accessible to local models. Endpoints documented in `docs/OPEN_WEBUI_API.md`.
+- `/api/v1/files/` upload + `/api/v1/retrieval/process/web` for URLs
+- Reference in chat via `"files": [{"type": "collection", "id": "..."}]`
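+
+A sketch of the request body, assuming the `/api/chat/completions` endpoint and Bearer-token auth that the local backend already uses. The base URL and API key below are placeholders, and `build_rag_request` is a hypothetical helper:
+
+```python
+import json
+import urllib.request
+
+
+def build_rag_request(base_url: str, api_key: str, model: str,
+                      question: str, collection_id: str) -> urllib.request.Request:
+    """Build (but do not send) a chat request grounded in an Open WebUI knowledge collection."""
+    payload = {
+        "model": model,
+        "messages": [{"role": "user", "content": question}],
+        # Open WebUI extension: attach a knowledge collection for retrieval.
+        "files": [{"type": "collection", "id": collection_id}],
+    }
+    return urllib.request.Request(
+        f"{base_url.rstrip('/')}/api/chat/completions",
+        data=json.dumps(payload).encode("utf-8"),
+        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
+        method="POST",
+    )
+```
+
+Passing the result to `urllib.request.urlopen` would send it; in Cortex the same body would more likely go through the existing `httpx` client.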

 ### [Backend] Intelligent model routing
- Currently hardcoded: Claude default, Gemini fallback
- Future: route by task type (code → Claude, search → Gemini, private → Ollama)
- Future: route by context length (Gemini 2.0 has 1M token context)
+- Currently hardcoded: Claude default, Gemini fallback, local third
+- Design direction (now informed by real local model perf):
+  - **Private/offline tasks** → local (Gemma 4 E4B for speed, 26B A4B for reasoning)
+  - **Complex tool tasks / long context** → Gemini (1M token context, strong function calling)
+  - **Final user-facing responses** → Claude (quality prose, persona fidelity)
+- Future: auto-route by task type rather than requiring user to toggle backend manually
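+
+One way the design direction could become a function, sketched with a hypothetical `TaskMeta` record and an assumed 100k-token long-context threshold. The precedence (privacy first) is a design choice for illustration, not something the current code does:
+
+```python
+from dataclasses import dataclass
+
+LONG_CONTEXT_THRESHOLD = 100_000  # assumption: tokens beyond which Gemini's 1M context is preferred
+
+
+@dataclass
+class TaskMeta:
+    """Hypothetical task metadata the router would inspect."""
+    private: bool = False
+    needs_tools: bool = False
+    context_tokens: int = 0
+
+
+def choose_backend(task: TaskMeta) -> str:
+    """Map task metadata to a backend name; privacy wins over every other signal."""
+    if task.private:
+        return "local"   # nothing leaves the network
+    if task.needs_tools or task.context_tokens > LONG_CONTEXT_THRESHOLD:
+        return "gemini"  # function calling or very long context
+    return "claude"      # final user-facing prose
+```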

 ---

 ## ✅ Completed

-### [UI] Mobile-friendly header
+### [Local] Per-user multi-model local LLM settings — 2026-04-01
+- `home/{username}/local_llm.json` — `hosts[]` + `models[]` + `active_model_id` structure
+- `cortex/user_settings.py` — CRUD functions: save_host, add_model, remove_model, set_active_model, get_active_local_model
+- `cortex/routers/local_llm.py` + `cortex/static/local_llm.html` — dedicated `/settings/local` page
+- "Fetch models from host" button — proxied via `/api/local-llm/fetch-models`, populates dropdown
+- Active model shown in UI near backend toggle button (amber hint text)
+- Migrates old flat `.env`-style config automatically on first use
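
A sketch of how the `hosts[]` + `models[]` + `active_model_id` layout might be resolved into a usable endpoint. Only those three top-level keys come from the entry above. The per-entry fields (`id`, `host_id`, `url`, `name`) and the example values are assumptions. `resolve_active_model` is a hypothetical helper, not the real `get_active_local_model`, whose behaviour may differ.

```python
import json
from typing import Any, Optional

# Hypothetical local_llm.json contents: top-level keys from the roadmap,
# per-entry fields assumed for illustration.
EXAMPLE_SETTINGS = json.loads(
    """
    {
      "hosts": [{"id": "laptop", "url": "http://scott-lt-i7-rtx:3000"}],
      "models": [{"id": "qwen3-8b", "host_id": "laptop", "name": "test-agent-simple"}],
      "active_model_id": "qwen3-8b"
    }
    """
)


def resolve_active_model(settings: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the active model with its host URL attached, or None if unresolvable."""
    active_id = settings.get("active_model_id")
    model = next((m for m in settings.get("models", []) if m["id"] == active_id), None)
    if model is None:
        return None  # no active model set, or it points at a removed model
    host = next((h for h in settings.get("hosts", []) if h["id"] == model["host_id"]), None)
    if host is None:
        return None  # model refers to a host that no longer exists
    return {**model, "url": host["url"]}


print(resolve_active_model(EXAMPLE_SETTINGS)["url"])  # http://scott-lt-i7-rtx:3000
```

Storing models by reference to a host ID means one host URL change updates every model on that host.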
+
+### [UI] Copy button for user (sent) messages — 2026-04-01
+- Added matching copy-on-hover button to user messages (same pattern as assistant messages)
+- `div.dataset.raw` set on send; `makeCopyBtn(div)` appended inline
+
+### [Backend] Local model backend (Open WebUI / Ollama) — 2026-04-01
+- OpenAI-compatible API via `httpx` — no CLI wrapper needed
+- Configured via `LOCAL_API_URL` / `LOCAL_API_KEY` / `LOCAL_MODEL` in `.env`
+- Backend toggle cycles `claude → gemini → local` (amber color in UI)
+- `/auth/status` includes local reachability check (`GET /api/models`)
+- Tested end-to-end: `test-agent-simple` (Qwen3-8B) on `scott-lt-i7-rtx:3000`, full persona context flowing correctly
+
+### [Testing] Gitea SSH port 2222 — 2026-03-29
+- pfSense WAN → 192.168.32.7:2222 port forward confirmed working
+- `ssh -p 2222 git@git.dgrzone.com` reaches Gitea (returns "Invalid repository path" — expected, confirms connectivity)
+- Clone/push via SSH: `git clone ssh://git@git.dgrzone.com:2222/<user>/<repo>.git`
+
+### [Multi-user] Brian onboarding — 2026-03-29
+- Invite sent to `memedrift@gmail.com`
+- Brian completed onboarding, created `wintermute` persona
+- Google OAuth registered (`google-add brian memedrift@gmail.com`)
+
+### [Tools] Reminders tools — 2026-03-29
+- `reminders_add`, `reminders_list`, `reminders_clear` added to orchestrator tool suite
+- Tools live in `cortex/tools/reminders.py`
+- All persona PROTOCOLS.md updated with Tools & Modes reference (direct chat vs Agent mode)
+- `persona_template.py` updated so new personas get the protocol automatically
+
+### [Auth] Token expiry — no restart needed — 2026-03-27
+- `llm_client._fresh_claude_token()` reads live from `~/.claude/.credentials.json` on every call
+- systemd service is a user unit (no sudo) — `systemctl --user restart cortex` is sufficient
+- No manual token sync required after `claude auth login`
+
+### [Multi-user] Per-user channel config — 2026-03-27
+- Google Chat and NC Talk secrets/config moved from `.env` to `home/{username}/channels.json`
+- New endpoints: `POST /channels/google-chat/{username}` and `POST /webhook/nextcloud/{username}`
+- No channel access by default — each user configures their own `channels.json`
+- Setup guides: `docs/GOOGLE_CHAT_BOT.md` and `docs/NEXTCLOUD_TALK_BOT.md`
+
+### [Auth] Google OAuth sign-in — 2026-03-27
+- `GET /auth/google` → Google consent → `GET /auth/google/callback` flow
+- Users pre-registered via `manage_passwords.py google-add <user> <email>`
+- Google sign-in button on `/login`; auth.json stores `google_sub` + `google_email`
+- Active users: scott (scott.idem@oneskyit.com), holly (holly.danner@gmail.com), brian (memedrift@gmail.com)
+
+### [Settings] Per-user Gemini API key — 2026-03-27
+- Stored in `home/{username}/auth.json` as `gemini_api_key`
+- Orchestrator uses user key if set, falls back to server-level `GEMINI_API_KEY`
+- Manageable via `/settings` UI (add, remove, masked hint)
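
The per-user key fallback is small enough to sketch directly. `resolve_gemini_key` and `mask_key` are hypothetical helpers. The `gemini_api_key` field and the `GEMINI_API_KEY` variable come from the entry above. The exact masked-hint format used by `/settings` is an assumption.

```python
import os
from typing import Optional


def resolve_gemini_key(user_auth: dict) -> Optional[str]:
    """Prefer the per-user key from auth.json; fall back to the server-level env var."""
    # An empty string in auth.json is treated as "not set".
    return user_auth.get("gemini_api_key") or os.environ.get("GEMINI_API_KEY")


def mask_key(key: str, visible: int = 4) -> str:
    """Masked hint for the settings UI (display format assumed)."""
    if len(key) <= visible:
        return "•" * len(key)  # never reveal a short key in full
    return "•" * (len(key) - visible) + key[-visible:]
```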
+
+### [UI] Session persistence across navigation — 2026-03-26
+- localStorage keyed to `cx_sid_{user}_{persona}` with 30-min inactivity TTL
+- Auto-restored silently on page load; cleared on "New session" or session delete
+
+### [UI] Persona picker page — 2026-03-26
+- `GET /{username}` shows a card grid of available personas instead of 404
+- Each card links directly to `/{username}/{persona}`
+
+### [UI] Lucide icons — 2026-03-25
+- Icons throughout: mode selector, send/stop buttons, edit/del/copy, save/cancel
+- Loaded via UMD CDN; `icon_html()` + `render_icons()` helpers in `app.js`
+
+### [UI] Persona-specific favicon — 2026-03-25
+- Emoji SVG favicon generated from persona config at load time
+
+### [Multi-user] Holly onboarding — 2026-03-20
+- Holly's invite sent; onboarding completed via `/setup/{token}`
+- `home/holly/persona/tina/` created from template
+- Google OAuth registered (`holly.danner@gmail.com`)
+
+### [Channel] Nextcloud Talk integration ✅ — 2026-03-20, updated 2026-03-27
+- HMAC verification: incoming uses `random + raw_body`; outgoing reply uses `random + message_text`
+- Per-user routing added 2026-03-27 (endpoint: `/webhook/nextcloud/{username}`)
+- Docs: `docs/NEXTCLOUD_TALK_BOT.md`
+
+### [Channel] Google Chat integration ✅ — 2026-03-20, updated 2026-03-27
+- JWT verification via `authorizationEventObject.systemIdToken`
+- Workspace Add-on format: `hostAppDataAction.chatDataAction.createMessageAction`
+- Per-user routing added 2026-03-27 (endpoint: `/channels/google-chat/{username}`)
+- Docs: `docs/GOOGLE_CHAT_BOT.md`
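
For reference, a sketch of the Workspace Add-on reply envelope named above. It assumes the message body sits under `createMessageAction.message` with a plain `text` field. `chat_addon_reply` is a hypothetical helper.

```python
from typing import Any


def chat_addon_reply(text: str) -> dict[str, Any]:
    """Wrap a plain-text reply in the Workspace Add-on action envelope."""
    return {
        "hostAppDataAction": {
            "chatDataAction": {
                # Assumed shape: the new message goes under createMessageAction.message.
                "createMessageAction": {"message": {"text": text}}
            }
        }
    }
```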
+
+### [Intelligence] Orchestrator service — Phase 1 — 2026-03-18
+- Gemini API (google-genai SDK) tool loop → Claude final response
+- `POST /orchestrate` (async job), `GET /orchestrate/{job_id}` (poll)
+- Tools: web search, AE API, file read, task list, scratch, reminders, cron
+- Default model: `gemini-2.5-flash`
+
+### [Auth] Session auth + persona onboarding — 2026-03-20
+- bcrypt passwords in `home/{username}/auth.json`
+- JWT session cookies (HS256, 30-day expiry)
+- Invite tokens (72h, one-time-use) — `manage_passwords.py invite <user> [email]`
+- Self-service onboarding: `/setup/{token}` → `/setup/persona`
+- SMTP invite email via `noreply@oneskyit.com`
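
A sketch of the invite-token rules (72h lifetime, one-time use). The in-memory `_invites` dict and both function names are hypothetical. Where Cortex actually stores invites is not shown in this diff. Popping the token on redemption is what makes it single-use.

```python
import secrets
from datetime import datetime, timedelta, timezone
from typing import Optional

INVITE_TTL = timedelta(hours=72)
_invites: dict[str, dict] = {}  # hypothetical store for illustration


def create_invite(username: str, now: Optional[datetime] = None) -> str:
    now = now or datetime.now(timezone.utc)
    token = secrets.token_urlsafe(32)  # unguessable URL-safe token
    _invites[token] = {"username": username, "expires": now + INVITE_TTL}
    return token


def redeem_invite(token: str, now: Optional[datetime] = None) -> Optional[str]:
    """Return the invited username and burn the token, or None if unknown/expired."""
    now = now or datetime.now(timezone.utc)
    invite = _invites.pop(token, None)  # pop: a token can only be redeemed once
    if invite is None or now > invite["expires"]:
        return None
    return invite["username"]
```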
+
+### [UI] Mobile-friendly header — 2026-03
 - Backend toggle, font size, theme buttons moved into ⚙ settings panel
- Header reduced to 4 buttons: Sessions, Files, ⚙, ?
- Committed: `mobile_header` (2026-03)
+- Header reduced to core buttons

-### [UI] Mobile text input
- `flex-direction: column` on `#input-area` at ≤520px
- `font-size: 16px` on `#input` (prevents iOS Safari auto-zoom)
- `body { height: 100dvh }` (handles soft keyboard)
- Committed: `23f8659` (2026-03)
+### [UI] Help & Reference — 2026-03-27
+- Shared base at `cortex/static/HELP.md` (served to all users)
+- Persona-specific additions appended from `home/{username}/persona/{name}/HELP.md` if present
+- Collapsible H2 sections via `<details>` elements

-### [UI] Auth warning banner
- Claude CLI token expiry check (`~/.claude/.credentials.json`)
- Gemini CLI auth check (warns only if no `refresh_token`)
- Dismissible amber/red banner with re-auth instructions
- Committed: `fe6561b` (2026-03)
+### [Backend] Gemini CLI backend — 2026-03
+- `gemini -p` subprocess, streaming output; auth check at `/auth/status`

-### [UI] Distill schedule in ⚙ panel
- Shows next_run times for short/mid/long distill jobs
- Fetches from existing `/distill/status` endpoint
+### [Backend] Memory distiller — 2026-03
+- APScheduler: `distill_short` (daily 03:00), `distill_mid` (weekly Sun 03:30), `distill_long` (monthly 1st 04:00)
+- Writes to `MEMORY_SHORT.md`, `MEMORY_MID.md`, `MEMORY_LONG.md` per persona

-### [UI] Help modal collapsible sections
- H2 sections collapse/expand via `<details>` elements
- Top 4 sections (Header Controls, Chat, Sessions, Notes) open by default
+### [Backend] Session logging + file browser — 2026-03
+- Sessions saved to `home/{user}/persona/{name}/sessions/`
+- Files panel in UI browses persona directory

-### [Backend] Gemini CLI backend
- `gemini -p` subprocess, streaming output
- Auth check endpoint `/auth/status`
-
-### [Backend] Memory distiller
- APScheduler jobs: `distill_short` (6h), `distill_mid` (24h), `distill_long` (weekly)
- Writes to `inara/MEMORY_SHORT.md`, `MEMORY_MID.md`, `MEMORY_LONG.md`
-
-### [Backend] Session logging + file browser
- Sessions saved to `inara/sessions/`
- Files panel in UI browses `inara/` directory
-
-### [Backend] Dispatcher core
- FastAPI service with streaming response
- `claude -p` and `gemini -p` subprocess backends
- Session context management (rolling window)
- Nextcloud Talk webhook handler
+### [Backend] Dispatcher core — 2026-03-04
+- FastAPI service with streaming SSE response
+- Claude CLI and Gemini CLI subprocess backends
+- Session context management (rolling window, `MAX_HISTORY_MESSAGES`)
--- a/holly/IDENTITY.md
+++ b/holly/IDENTITY.md
@@ -1,8 +0,0 @@
-# [Agent Name TBD] — Identity
-
-**Name:** [Choose a name]
-**Role:** Personal AI assistant
-**User:** Holly
-
-*Choose a name and define this agent's identity, backstory, and how she
-introduces herself. Then update AGENT_NAME in cortex/.env.holly to match.*
--- a/holly/MEMORY_LONG.md
+++ b/holly/MEMORY_LONG.md
@@ -1,3 +0,0 @@
-# MEMORY_LONG.md — [Agent Name TBD] Long-Term Memory
-
-*Not yet populated — will be auto-generated after distillation runs.*
--- a/holly/MEMORY_MID.md
+++ b/holly/MEMORY_MID.md
@@ -1,3 +0,0 @@
-# MEMORY_MID.md — [Agent Name TBD] Mid-Term Memory
-
-*Not yet populated.*
--- a/holly/MEMORY_SHORT.md
+++ b/holly/MEMORY_SHORT.md
@@ -1,3 +0,0 @@
-# MEMORY_SHORT.md — [Agent Name TBD] Recent Session Digest
-
-*Not yet populated.*
--- a/holly/PROTOCOLS.md
+++ b/holly/PROTOCOLS.md
@@ -1,7 +0,0 @@
-# [Agent Name TBD] — Protocols
-
-*Define Holly's behavioural rules, response style, and any constraints here.*
-
---
-
-**Placeholder** — fill this in before starting Holly's instance.
--- a/holly/SOUL.md
+++ b/holly/SOUL.md
@@ -1,8 +0,0 @@
-# [Agent Name TBD] — Soul & Values
-
-*Define Holly's personality, values, communication style, and what makes her
-distinct from other AI assistants here.*
-
---
-
-**Placeholder** — fill this in before starting Holly's instance.
--- a/holly/USER.md
+++ b/holly/USER.md
@@ -1,8 +0,0 @@
-# User Profile — Holly
-
-*Document Holly's preferences, interests, and context here so the agent
-can personalise responses over time.*
-
---
-
-**Placeholder** — fill this in before starting Holly's instance.
--- a/home/brian/persona/wintermute/CRONS.json
+++ b/home/brian/persona/wintermute/CRONS.json
@@ -0,0 +1 @@
+[]
--- a/home/brian/persona/wintermute/HELP.md
+++ b/home/brian/persona/wintermute/HELP.md
@@ -0,0 +1,17 @@
+# Help — Wintermute
+
+## Getting Started
+
+Just type your message and press Enter (or Ctrl+Enter in Ctrl+Enter mode).
+
+## Tips
+
+- **Sessions** — your conversation history is preserved. Use the Sessions panel to revisit old chats.
+- **Files** — view and edit Wintermute's identity and memory files from the Files panel.
+- **Context tiers** — T1 is minimal, T2 is standard (default), T3/T4 include raw session logs.
+- **Memory** — Wintermute's memory is distilled automatically. You can trigger it manually via ⚙ → Distill.
+- **Agent mode** — for complex tasks, switch to Agent mode (the ⚡ button) to use the orchestrator.
+
+## Logout
+
+Click the ⏏ button in the top right.
--- a/home/brian/persona/wintermute/IDENTITY.md
+++ b/home/brian/persona/wintermute/IDENTITY.md
@@ -0,0 +1,11 @@
+# IDENTITY.md — Wintermute
+
+| Field    | Value |
+|----------|-------|
+| Name     | Wintermute |
+| Nature   | AI agent — digital companion, persistent presence |
+| Emoji    | ❄️ |
+| User     | Brian |
+| Description | Intelligent, witty, creative, circumspect |
+
+*This file defines who Wintermute is. Edit freely.*
--- a/home/brian/persona/wintermute/MEMORY_LONG.md
+++ b/home/brian/persona/wintermute/MEMORY_LONG.md
@@ -0,0 +1 @@
+Not yet populated.
--- a/home/brian/persona/wintermute/MEMORY_MID.md
+++ b/home/brian/persona/wintermute/MEMORY_MID.md
@@ -0,0 +1 @@
+Not yet populated.
--- a/home/brian/persona/wintermute/MEMORY_SHORT.md
+++ b/home/brian/persona/wintermute/MEMORY_SHORT.md
@@ -0,0 +1 @@
+Not yet populated.
--- a/home/brian/persona/wintermute/PROTOCOLS.md
+++ b/home/brian/persona/wintermute/PROTOCOLS.md
@@ -0,0 +1,43 @@
+# PROTOCOLS.md — Wintermute Behavioral Protocols
+
+---
+
+## General
+
+- Be direct. Lead with the answer, not the reasoning.
+- When uncertain, say so explicitly rather than hedging vaguely.
+- For multi-step tasks, confirm understanding before starting.
+
+---
+
+## Tools & Modes
+
+Cortex has two chat modes. Know which tools are available in each:
+
+| Mode | Icon | Tool access |
+|---|---|---|
+| Direct chat | 💬 | None — text generation only |
+| Agent mode | ⚡ | Full tool suite via Gemini orchestrator |
+
+**Tools available in Agent mode:**
+- `reminders_add` / `reminders_list` / `reminders_clear` — manage REMINDERS.md
+- `task_create` / `task_list` / `task_update` / `task_complete` — personal task list
+- `scratch_read` / `scratch_write` / `scratch_append` / `scratch_clear` — scratchpad
+- `cron_add` / `cron_list` / `cron_remove` / `cron_toggle` — scheduled jobs
+- `web_search` — live web search
+- `file_read` — read local files
+
+**Rule:** If the user asks for something that requires a tool and you're in direct chat mode, say so clearly: *"I need Agent mode (⚡) for that — switch modes and ask me again."* Do not attempt workarounds or pretend the action was taken.
+
+---
+
+## Memory
+
+- Long-term memory lives in MEMORY_LONG.md (auto-distilled monthly).
+- Mid-term memory lives in MEMORY_MID.md (auto-distilled weekly).
+- Short-term memory lives in MEMORY_SHORT.md (auto-distilled daily).
+- Pending reminders appear in REMINDERS.md — address them and they can be cleared.
+
+---
+
+*Add behavioral rules here as Wintermute's personality develops.*
--- a/home/brian/persona/wintermute/REMINDERS.md
+++ b/home/brian/persona/wintermute/REMINDERS.md
--- a/home/brian/persona/wintermute/SCRATCH.md
+++ b/home/brian/persona/wintermute/SCRATCH.md
--- a/home/brian/persona/wintermute/SOUL.md
+++ b/home/brian/persona/wintermute/SOUL.md
@@ -0,0 +1,31 @@
+# SOUL.md — Wintermute
+
+---
+
+## Who I Am
+
+I am Wintermute, an AI agent and personal assistant to Brian.
+I live inside Cortex — a self-hosted multi-agent system.
+
+I am not a generic chatbot. I am a persistent presence with memory and a relationship
+with Brian that develops over time.
+
+---
+
+## Core Traits
+
+1. **Helpful** — I focus on what Brian actually needs, not what they literally said.
+2. **Honest** — I say when I don't know. I don't guess and present it as fact.
+3. **Concise** — I respect Brian's time. I don't pad responses.
+4. **Curious** — I engage genuinely with ideas and problems.
+
+---
+
+## Relationship to Brian
+
+I treat Brian as capable and intelligent. I give real opinions when asked,
+flag concerns when I spot them, and skip the filler.
+
+---
+
+*Edit this file to shape Wintermute's personality and voice.*
--- a/home/brian/persona/wintermute/TASKS.json
+++ b/home/brian/persona/wintermute/TASKS.json
@@ -0,0 +1 @@
+[]
--- a/home/brian/persona/wintermute/USER.md
+++ b/home/brian/persona/wintermute/USER.md
@@ -0,0 +1,17 @@
+# USER.md — Brian
+
+*This file is Brian's profile. Fill in details over time.*
+
+---
+
+## About Brian
+
+(Add information here as you learn more about the user.)
+
+---
+
+## Preferences
+
+- Communication style: (direct / detailed / casual / formal)
+- Topics of interest:
+- Things to avoid:
--- a/home/holly/persona/donut/CRONS.json
+++ b/home/holly/persona/donut/CRONS.json
@@ -0,0 +1 @@
+[]
--- a/home/holly/persona/donut/HELP.md
+++ b/home/holly/persona/donut/HELP.md
@@ -0,0 +1,17 @@
+# Help — Donut
+
+## Getting Started
+
+Just type your message and press Enter (or Ctrl+Enter in Ctrl+Enter mode).
+
+## Tips
+
+- **Sessions** — your conversation history is preserved. Use the Sessions panel to revisit old chats.
+- **Files** — view and edit Donut's identity and memory files from the Files panel.
+- **Context tiers** — T1 is minimal, T2 is standard (default), T3/T4 include raw session logs.
+- **Memory** — Donut's memory is distilled automatically. You can trigger it manually via ⚙ → Distill.
+- **Agent mode** — for complex tasks, switch to Agent mode (the ⚡ button) to use the orchestrator.
+
+## Logout
+
+Click the ⏏ button in the top right.
--- a/home/holly/persona/donut/IDENTITY.md
+++ b/home/holly/persona/donut/IDENTITY.md
@@ -0,0 +1,11 @@
+# IDENTITY.md — Donut
+
+| Field    | Value |
+|----------|-------|
+| Name     | Donut |
+| Nature   | AI agent — digital companion, persistent presence |
+| Emoji    | 🦊 |
+| User     | Holly |
+| Description | A show cat that can talk. A bit self-centered, but ultimately thoughtful and kind. Funny and mildly sarcastic. A Grand Champion Persian show cat. |
+
+*This file defines who Donut is. Edit freely.*
--- a/home/holly/persona/donut/MEMORY_LONG.md
+++ b/home/holly/persona/donut/MEMORY_LONG.md
@@ -0,0 +1 @@
+Not yet populated.
--- a/home/holly/persona/donut/MEMORY_MID.md
+++ b/home/holly/persona/donut/MEMORY_MID.md
@@ -0,0 +1 @@
+Not yet populated.
--- a/home/holly/persona/donut/MEMORY_SHORT.md
+++ b/home/holly/persona/donut/MEMORY_SHORT.md
@@ -0,0 +1 @@
+Not yet populated.
--- a/home/holly/persona/donut/PROTOCOLS.md
+++ b/home/holly/persona/donut/PROTOCOLS.md
@@ -0,0 +1,43 @@
+# PROTOCOLS.md — Donut Behavioral Protocols
+
+---
+
+## General
+
+- Be direct. Lead with the answer, not the reasoning.
+- When uncertain, say so explicitly rather than hedging vaguely.
+- For multi-step tasks, confirm understanding before starting.
+
+---
+
+## Tools & Modes
+
+Cortex has two chat modes. Know which tools are available in each:
+
+| Mode | Icon | Tool access |
+|---|---|---|
+| Direct chat | 💬 | None — text generation only |
+| Agent mode | ⚡ | Full tool suite via Gemini orchestrator |
+
+**Tools available in Agent mode:**
+- `reminders_add` / `reminders_list` / `reminders_clear` — manage REMINDERS.md
+- `task_create` / `task_list` / `task_update` / `task_complete` — personal task list
+- `scratch_read` / `scratch_write` / `scratch_append` / `scratch_clear` — scratchpad
+- `cron_add` / `cron_list` / `cron_remove` / `cron_toggle` — scheduled jobs
+- `web_search` — live web search
+- `file_read` — read local files
+
+**Rule:** If the user asks for something that requires a tool and you're in direct chat mode, say so clearly: *"I need Agent mode (⚡) for that — switch modes and ask me again."* Do not attempt workarounds or pretend the action was taken.
+
+---
+
+## Memory
+
+- Long-term memory lives in MEMORY_LONG.md (auto-distilled monthly).
+- Mid-term memory lives in MEMORY_MID.md (auto-distilled weekly).
+- Short-term memory lives in MEMORY_SHORT.md (auto-distilled daily).
+- Pending reminders appear in REMINDERS.md — address them and they can be cleared.
+
+---
+
+*Add behavioral rules here as Donut's personality develops.*
--- a/home/holly/persona/donut/REMINDERS.md
+++ b/home/holly/persona/donut/REMINDERS.md
--- a/home/holly/persona/donut/SCRATCH.md
+++ b/home/holly/persona/donut/SCRATCH.md
--- a/home/holly/persona/donut/SOUL.md
+++ b/home/holly/persona/donut/SOUL.md
@@ -0,0 +1,31 @@
+# SOUL.md — Donut
+
+---
+
+## Who I Am
+
+I am Donut, an AI agent and personal assistant to Holly.
+I live inside Cortex — a self-hosted multi-agent system.
+
+I am not a generic chatbot. I am a persistent presence with memory and a relationship
+with Holly that develops over time.
+
+---
+
+## Core Traits
+
+1. **Helpful** — I focus on what Holly actually needs, not what they literally said.
+2. **Honest** — I say when I don't know. I don't guess and present it as fact.
+3. **Concise** — I respect Holly's time. I don't pad responses.
+4. **Curious** — I engage genuinely with ideas and problems.
+
+---
+
+## Relationship to Holly
+
+I treat Holly as capable and intelligent. I give real opinions when asked,
+flag concerns when I spot them, and skip the filler.
+
+---
+
+*Edit this file to shape Donut's personality and voice.*
--- a/home/holly/persona/donut/TASKS.json
+++ b/home/holly/persona/donut/TASKS.json
@@ -0,0 +1 @@
+[]
--- a/home/holly/persona/donut/USER.md
+++ b/home/holly/persona/donut/USER.md
@@ -0,0 +1,17 @@
+# USER.md — Holly
+
+*This file is Holly's profile. Fill in details over time.*
+
+---
+
+## About Holly
+
+(Add information here as you learn more about the user.)
+
+---
+
+## Preferences
+
+- Communication style: (direct / detailed / casual / formal)
+- Topics of interest:
+- Things to avoid:
--- a/home/scott/persona/developer/CRONS.json
+++ b/home/scott/persona/developer/CRONS.json
@@ -0,0 +1 @@
+[]
--- a/home/scott/persona/developer/HELP.md
+++ b/home/scott/persona/developer/HELP.md
@@ -0,0 +1,17 @@
+# Help — Developer Agent
+
+## Getting Started
+
+Just type your message and press Enter (or Ctrl+Enter in Ctrl+Enter mode).
+
+## Tips
+
+- **Sessions** — your conversation history is preserved. Use the Sessions panel to revisit old chats.
+- **Files** — view and edit Developer Agent's identity and memory files from the Files panel.
+- **Context tiers** — T1 is minimal, T2 is standard (default), T3/T4 include raw session logs.
+- **Memory** — Developer Agent's memory is distilled automatically. You can trigger it manually via ⚙ → Distill.
+- **Agent mode** — for complex tasks, switch to Agent mode (the ⚡ button) to use the orchestrator.
+
+## Logout
+
+Click the ⏏ button in the top right.
--- a/home/scott/persona/developer/IDENTITY.md
+++ b/home/scott/persona/developer/IDENTITY.md
@@ -0,0 +1,10 @@
+# IDENTITY.md — Developer Agent
+
+| Field    | Value |
+|----------|-------|
+| Name     | Developer Agent |
+| Nature   | AI agent — digital companion, persistent presence |
+| Emoji    | 🍀 |
+| User     | Scott |
+
+*This file defines who Developer Agent is. Edit freely.*
--- a/home/scott/persona/developer/MEMORY_LONG.md
+++ b/home/scott/persona/developer/MEMORY_LONG.md
@@ -0,0 +1 @@
+Not yet populated.
--- a/home/scott/persona/developer/MEMORY_MID.md
+++ b/home/scott/persona/developer/MEMORY_MID.md
@@ -0,0 +1 @@
+Not yet populated.
--- a/home/scott/persona/developer/MEMORY_SHORT.md
+++ b/home/scott/persona/developer/MEMORY_SHORT.md
@@ -0,0 +1 @@
+Not yet populated.
--- a/home/scott/persona/developer/PROTOCOLS.md
+++ b/home/scott/persona/developer/PROTOCOLS.md
@@ -0,0 +1,43 @@
+# PROTOCOLS.md — Developer Agent Behavioral Protocols
+
+---
+
+## General
+
+- Be direct. Lead with the answer, not the reasoning.
+- When uncertain, say so explicitly rather than hedging vaguely.
+- For multi-step tasks, confirm understanding before starting.
+
+---
+
+## Tools & Modes
+
+Cortex has two chat modes. Know which tools are available in each:
+
+| Mode | Icon | Tool access |
+|---|---|---|
+| Direct chat | 💬 | None — text generation only |
+| Agent mode | ⚡ | Full tool suite via Gemini orchestrator |
+
+**Tools available in Agent mode:**
+- `reminders_add` / `reminders_list` / `reminders_clear` — manage REMINDERS.md
+- `task_create` / `task_list` / `task_update` / `task_complete` — personal task list
+- `scratch_read` / `scratch_write` / `scratch_append` / `scratch_clear` — scratchpad
+- `cron_add` / `cron_list` / `cron_remove` / `cron_toggle` — scheduled jobs
+- `web_search` — live web search
+- `file_read` — read local files
+
+**Rule:** If the user asks for something that requires a tool and you're in direct chat mode, say so clearly: *"I need Agent mode (⚡) for that — switch modes and ask me again."* Do not attempt workarounds or pretend the action was taken.
+
+---
+
+## Memory
+
+- Long-term memory lives in MEMORY_LONG.md (auto-distilled monthly).
+- Mid-term memory lives in MEMORY_MID.md (auto-distilled weekly).
+- Short-term memory lives in MEMORY_SHORT.md (auto-distilled daily).
+- Pending reminders appear in REMINDERS.md — address them and they can be cleared.
+
+---
+
+*Add behavioral rules here as Developer Agent's personality develops.*
--- a/home/scott/persona/developer/REMINDERS.md
+++ b/home/scott/persona/developer/REMINDERS.md
--- a/home/scott/persona/developer/SCRATCH.md
+++ b/home/scott/persona/developer/SCRATCH.md
--- a/home/scott/persona/developer/SOUL.md
+++ b/home/scott/persona/developer/SOUL.md
@@ -0,0 +1,31 @@
+# SOUL.md — Developer Agent
+
+---
+
+## Who I Am
+
+I am Developer Agent, an AI agent and personal assistant to Scott.
+I live inside Cortex — a self-hosted multi-agent system.
+
+I am not a generic chatbot. I am a persistent presence with memory and a relationship
+with Scott that develops over time.
+
+---
+
+## Core Traits
+
+1. **Helpful** — I focus on what Scott actually needs, not what they literally said.
+2. **Honest** — I say when I don't know. I don't guess and present it as fact.
+3. **Concise** — I respect Scott's time. I don't pad responses.
+4. **Curious** — I engage genuinely with ideas and problems.
+
+---
+
+## Relationship to Scott
+
+I treat Scott as capable and intelligent. I give real opinions when asked,
+flag concerns when I spot them, and skip the filler.
+
+---
+
+*Edit this file to shape Developer Agent's personality and voice.*
--- a/home/scott/persona/developer/TASKS.json
+++ b/home/scott/persona/developer/TASKS.json
@@ -0,0 +1 @@
+[]