feat: local LLM multi-model, session search, cron proactive types, notifications, docs overhaul

Local LLM:
- user_settings.py: per-user hosts/models config (local_llm.json)
- routers/local_llm.py + static/local_llm.html: dedicated settings page
- llm_client.py: local OpenAI-compatible backend via httpx
- config.py: LOCAL_API_URL/KEY/MODEL + per-backend timeouts
- Active model shown near backend toggle (amber hint text)

Memory distillation:
- memory_distiller.py: DISTILL_BACKEND_MID/LONG .env overrides
- scheduler.py + notification.py: notify NC Talk after mid/long distill
- notification.py: outbound channel abstraction (NC Talk, extensible)

Session search:
- routers/files.py: GET /sessions/search?q= with excerpts grouped by date
- static/index.html + app.js: search UI in file sidebar with highlight
- _esc() helper to prevent XSS in search results

Proactive cron:
- cron_runner.py: new job types: message (send directly) and brief (LLM + send)
- Both support optional per-job channel override

Channels:
- routers/nextcloud_talk.py: consolidated using notification._send_nct_message()
- routers/auth.py: local backend status in /auth/status
- routers/chat.py: /backend returns {primary, fallback, local_model} object

UI / UX:
- Copy button for user messages (matching assistant)
- Autocomplete disabled on sensitive form fields
- settings.html: local model section replaced with link to /settings/local

Docs overhaul:
- MASTER.md hub + ARCH__SYSTEM/BACKENDS/PERSONA/CHANNELS/FUTURE.md
- ARCH__Intelligence_Layer.md replaced with redirect table
- CORTEX.md trimmed to vision only; README updated
- OPEN_WEBUI_API.md added to docs/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:53:06 -04:00
parent bd6532e93a
commit a4daebdc9b
33 changed files with 2985 additions and 486 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -82,6 +82,7 @@ Cortex_and_Inara_dev/
  docs/                  ← Integration reference docs
    NEXTCLOUD_TALK_BOT.md
    OPEN_WEBUI_API.md    ← Open WebUI API: tool calling, RAG, model management
  documentation/         ← Architecture decisions and agent task list
    TODO__Agents.md      ← READ THIS FIRST — active task list
@@ -211,7 +212,7 @@ clearly asked for a directory to be unblocked.
 ---
-## Current State (2026-03-27)
+## Current State (2026-04-03)
 Cortex is running and stable. All three primary channels are live:
@@ -220,34 +221,12 @@ Cortex is running and stable. All three primary channels are live:
 | Web UI | ✅ Live | `https://cortex.dgrzone.com` |
 | Nextcloud Talk | ✅ Live | HMAC-signed webhook, async reply |
 | Google Chat | ✅ Live | Workspace Add-on, `hostAppDataAction` response format |
 | Local backend | ✅ Live | Open WebUI/Ollama, per-user multi-model config |
-### Active Tasks
+Active users: scott (inara, developer), holly (tina), brian (wintermute)
-See `documentation/TODO__Agents.md` for the full list. Current priorities:
+See `documentation/TODO__Agents.md` for the active task list.
-
+See `documentation/ROADMAP.md` for phases and what's next.
 - **[High]** Ollama backend — local LLM via `scott_gaming` over WireGuard
 - **[Medium]** NC Talk — complete bot registration docs (`docs/NEXTCLOUD_TALK_BOT.md`)
 - **[Medium]** Knowledge consolidation — markdown → AE Journals
 ### Recently Completed
 - ✅ Help & Reference shared base — `cortex/static/HELP.md` served to all users; persona-specific additions appended if present — 2026-03-27
 - ✅ Google OAuth sign-in — `/auth/google` flow, pre-register via `manage_passwords.py google-add` — 2026-03-27
 - ✅ Per-user Gemini API key — stored in `auth.json`, used by orchestrator, manageable in `/settings` — 2026-03-27
 - ✅ Connected accounts + Gemini key in settings UI — `/settings` shows Google account, key hint, remove link — 2026-03-27
 - ✅ `/{username}` persona picker page — card grid instead of 404 — 2026-03-26
 - ✅ Session persistence across navigation — localStorage TTL 30 min, auto-restore on page load — 2026-03-26
 - ✅ Lucide icons throughout UI — mode selector, send/stop/action buttons, edit/del/copy/save/cancel — 2026-03-25
 - ✅ Persona-specific favicon — emoji SVG set from persona config — 2026-03-25
 - ✅ Session auth — bcrypt passwords, JWT cookies, login/logout, `SessionAuthMiddleware` — 2026-03-20
 - ✅ Persona onboarding — invite tokens, self-service password setup, persona creation form — 2026-03-20
 - ✅ Multi-user/multi-persona support (`home/{username}/persona/{name}/` two-level layout) — 2026-03-20
 - ✅ SMTP invite email — `noreply@oneskyit.com`, HTML + plain text, `manage_passwords.py invite` — 2026-03-20
 - ✅ Scratchpad, task management, and cron/scheduled job tools — 2026-03-20
 - ✅ Test suite (80 tests) covering API, persona routing, tools, security — 2026-03-20
 - ✅ Google Chat bot (Workspace Add-on, JWT auth, `hostAppDataAction` format) — 2026-03-20
 - ✅ Orchestrator Agent mode UI + session persistence — 2026-03-18
 - ✅ Memory distiller (APScheduler, short/mid/long) — 2026-03
 ---
@@ -255,8 +234,14 @@ See `documentation/TODO__Agents.md` for the full list. Current priorities:
 | File | Purpose |
 |---|---|
 | `documentation/MASTER.md` | **Start here** — index, current state, all doc links |
 | `documentation/TODO__Agents.md` | Active task list — read before starting work |
-| `documentation/ARCH__Intelligence_Layer.md` | Full architecture design |
+| `documentation/ROADMAP.md` | Phases — what's done, what's next |
-| `~/agents_sync/projects/CORTEX.md` | High-level project vision and phases |
+| `documentation/ARCH__SYSTEM.md` | System architecture and component map |
 | `documentation/ARCH__BACKENDS.md` | LLM backends, routing, per-user config |
 | `documentation/ARCH__PERSONA.md` | Persona system, context tiers, memory distillation |
 | `documentation/ARCH__CHANNELS.md` | Input channels — web, NC Talk, Google Chat, cron |
 | `documentation/ARCH__FUTURE.md` | Planned: local orchestrator, dev agents, knowledge layer |
 | `~/agents_sync/projects/CORTEX.md` | Project vision and philosophy |
 | `~/agents_sync/CLAUDE.md` | Fleet coordination rules |
 | `~/CLAUDE.md` | Machine identity (`scott_lpt`) |
--- a/README.md
+++ b/README.md
@@ -73,22 +73,28 @@ Config lives in `cortex/config.py` and `cortex/.env` (not tracked — see `corte
 ## Key Documentation
 **Start here for a full picture:** [`documentation/MASTER.md`](documentation/MASTER.md)
 | File | Purpose |
 |---|---|
-| `documentation/TODO__Agents.md` | Active task list — read first |
+| `documentation/MASTER.md` | Index — current state, all doc links, quick reference |
-| `documentation/ARCH__Intelligence_Layer.md` | Intelligence layer architecture (orchestrator, dev agents, knowledge) |
+| `documentation/ROADMAP.md` | Phases — what's done, what's next |
 | `documentation/TODO__Agents.md` | Active task list |
 | `documentation/ARCH__SYSTEM.md` | System architecture and component map |
 | `documentation/ARCH__BACKENDS.md` | LLM backends, routing, fallback |
 | `documentation/ARCH__PERSONA.md` | Persona system, context tiers, memory distillation |
 | `documentation/ARCH__CHANNELS.md` | Input channels — web, NC Talk, Google Chat, cron |
 | `documentation/ARCH__FUTURE.md` | Planned features — local orchestrator, dev agents, knowledge layer |
 | `docs/NEXTCLOUD_TALK_BOT.md` | NC Talk bot setup and troubleshooting |
-| `docs/GOOGLE_CHAT_BOT.md` | Google Chat Add-on setup and troubleshooting |
+| `docs/GOOGLE_CHAT_BOT.md` | Google Chat Add-on setup |
-| `cortex/static/HELP.md` | Shared in-app help content (rendered in UI for all users) |
+| `docs/OPEN_WEBUI_API.md` | Open WebUI/Ollama API reference |
 | `home/scott/persona/inara/PROTOCOLS.md` | Inara behavioral protocols (template for all personas) |
 | `~/agents_sync/projects/CORTEX.md` | High-level project vision and phases |
 ---
 ## Architecture at a Glance
 ```
-[User / Cron / Webhook]
+[Web UI / NC Talk / Google Chat / Cron / Webhooks]
        ↓
  Cortex Dispatcher  (FastAPI, cortex/)
    ├─ POST /chat                            — direct to LLM (streaming SSE)
@@ -96,16 +102,16 @@ Config lives in `cortex/config.py` and `cortex/.env` (not tracked — see `corte
    ├─ POST /webhook/nextcloud/{username}    — Nextcloud Talk bot (per-user)
    └─ POST /channels/google-chat/{username} — Google Chat Add-on (per-user)
        ↓
-  LLM Backend(s)
+  LLM Backends
-  • Claude CLI   — primary reasoning, coding, long-context
+  • Claude CLI   — primary, all user-facing responses
-  • Gemini CLI   — secondary / cost routing
+  • Gemini CLI   — fallback
-  • Gemini API   — orchestrator tool loop (separate from Gemini CLI)
+  • Gemini API   — orchestrator tool loop only (not general chat)
-  • Ollama       — offline/private (scott_gaming, future)
+  • Local        — Open WebUI/Ollama on scott_gaming (private/offline)
        ↓
  Persona context loaded from home/{user}/persona/{name}/
 ```
-See `documentation/ARCH__Intelligence_Layer.md` for the orchestrator/responder and dev-agent architecture.
+See `documentation/ARCH__SYSTEM.md` for the full architecture breakdown.
 ---
--- a/cortex/.env.example
+++ b/cortex/.env.example
@@ -52,12 +52,21 @@ NEXTCLOUD_URL=https://cloud.dgrzone.com
 NEXTCLOUD_TALK_BOT_SECRET=
 # ── LLM backends ────────────────────────────────────────────────────────────
-# Primary backend: "claude" or "gemini" (other is always fallback)
+# Primary backend: "claude", "gemini", or "local" (switchable at runtime via UI)
 PRIMARY_BACKEND=claude
 # Timeouts in seconds
 TIMEOUT_CLAUDE=60
 TIMEOUT_GEMINI=120
 TIMEOUT_LOCAL=300   # local models may need time to load
 # ── Local model (Open WebUI / Ollama — OpenAI-compatible API) ────────────────
 # Leave LOCAL_API_URL blank to disable. When set, "local" appears as a backend option.
 # API key: Open WebUI → Settings → Account → API Keys
 # Model: workspace alias or full Ollama model name
 LOCAL_API_URL=http://192.168.32.19:3000
 LOCAL_API_KEY=
 LOCAL_MODEL=test-agent-simple
 # ── Orchestrator (Gemini API — not Gemini CLI) ───────────────────────────────
 # Required for /orchestrate endpoint and tool use
--- a/cortex/config.py
+++ b/cortex/config.py
@@ -40,6 +40,12 @@ class Settings(BaseSettings):
    max_history_messages: int = 40  # rolling window — 20 turns (user + assistant)
     primary_backend: str = "claude"  # "claude", "gemini", or "local"; fallback is chosen in llm_client._FALLBACK
    # Local model backend — OpenAI-compatible API (Open WebUI / Ollama)
    # Set LOCAL_API_URL in .env to enable; leave blank to disable
    local_api_url: str = ""            # e.g. http://192.168.32.19:3000
    local_api_key: str = ""            # sk-... from Open WebUI → Settings → Account → API Keys
    local_model: str = ""              # workspace or model name, e.g. test-agent-simple
    # Per-backend timeouts in seconds
    timeout_claude: int = 60
    timeout_gemini: int = 120   # frequently slow under load
@@ -53,6 +59,12 @@ class Settings(BaseSettings):
    auto_distill_mid: bool = True     # weekly Sunday at 03:30 — LLM summarizes short → mid
    auto_distill_long: bool = False   # monthly 1st at 04:00 — off by default (manual review recommended)
    # Which backend to use for distillation LLM calls.
    # "" = use primary_backend (default); "local" = use local model (saves API credits).
     # Recommended: leave distill_backend_long empty so long-term distillation uses the primary backend (best quality).
    distill_backend_mid: str = ""
    distill_backend_long: str = ""
    # Memory tier token budgets — soft caps used during distillation
    # Override in .env: MEMORY_BUDGET_LONG=4000 etc.
    memory_budget_long: int = 2000
--- a/cortex/cron_runner.py
+++ b/cortex/cron_runner.py
@@ -10,16 +10,20 @@ Job schema:
    "id":         "c_abc123",
    "label":      "Human-readable name",
    "schedule":   "daily:09:00",   # see parse_schedule() for all formats
-    "type":       "remind" | "note",
+    "type":       "remind" | "note" | "message" | "brief",
-    "payload":    "Text to write when the job fires",
+    "payload":    "Text or prompt when the job fires",
    "channel":    null | "nextcloud" | "google_chat",  # for message/brief types
    "enabled":    true,
    "created_at": "ISO 8601",
    "last_run":   null | "ISO 8601"
  }
 Job types:
-  remind  → appends to inara/REMINDERS.md  (auto-loaded into Inara's context)
+  remind   → appends to REMINDERS.md  (auto-loaded into context at tier 2+)
-  note    → appends to inara/SCRATCH.md    (read on demand via scratch_read)
+  note     → appends to SCRATCH.md    (read on demand via scratch_read)
  message  → sends payload as-is to NC Talk notification_room
  brief    → runs LLM with payload as the prompt, sends response to NC Talk
             (good for morning briefings, summaries, proactive check-ins)
 """
 import logging
@@ -150,6 +154,39 @@ async def run_job(job: dict) -> None:
        p.write_text(existing.rstrip() + "\n" + section)
        logger.info("cron [note] fired: %s", label)
    elif job_type == "message":
        # Send payload text directly to the user's notification channel
        from notification import notify
        username = job.get("user") or "scott"
        channel  = job.get("channel") or None
        await notify(username, payload, channel=channel)
        logger.info("cron [message] sent: %s", label)
    elif job_type == "brief":
        # Run LLM with payload as the prompt, send response to notification channel.
        # Great for morning briefings, reminders, proactive check-ins.
        from context_loader import load_context
        from llm_client import complete
        from notification import notify
        from persona import set_context
        from config import settings as _s
        username   = job.get("user") or _s.user_name.lower()
        persona_nm = job.get("persona") or _s.agent_name.lower()
        channel    = job.get("channel") or None
        set_context(username, persona_nm)
        system_prompt = load_context(2)  # tier 2: identity + memory + user profile
        try:
            response_text, backend = await complete(
                system_prompt=system_prompt,
                messages=[{"role": "user", "content": payload}],
            )
            await notify(username, response_text, channel=channel)
            logger.info("cron [brief] sent via %s: %s", backend, label)
        except Exception as e:
            logger.error("cron [brief] LLM error for %s: %s", label, e)
    else:
        logger.warning("cron: unknown type %r (job %s)", job_type, job.get("id"))
        return
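
Review note: the two new job types differ only in how the payload is used, which is easiest to see in example job entries. The sketch below follows the job schema from the module docstring; `validate_job`, the ids, labels, and payload texts are hypothetical and are not part of `cron_runner.py`.

```python
from __future__ import annotations

from typing import Any

# Hypothetical helper for illustration; cron_runner.py does not define it.
KNOWN_TYPES = ("remind", "note", "message", "brief")
KNOWN_CHANNELS = (None, "nextcloud", "google_chat")


def validate_job(job: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the job looks well-formed."""
    problems: list[str] = []
    if job.get("type") not in KNOWN_TYPES:
        problems.append(f"unknown type {job.get('type')!r}")
    if not job.get("payload"):
        problems.append("payload is empty")
    if job.get("channel") not in KNOWN_CHANNELS:
        problems.append(f"unknown channel {job.get('channel')!r}")
    return problems


# "message": the payload text is sent to the notification channel as-is.
standup_ping = {
    "id": "c_example1",
    "label": "Standup ping",
    "schedule": "daily:09:00",
    "type": "message",
    "payload": "Standup in 15 minutes.",
    "channel": None,  # no override: use the user's configured channel
    "enabled": True,
}

# "brief": the payload is a prompt; the LLM response is what gets sent.
morning_brief = {
    "id": "c_example2",
    "label": "Morning brief",
    "schedule": "daily:09:00",
    "type": "brief",
    "payload": "Summarize my open reminders in three bullet points.",
    "channel": "nextcloud",  # per-job channel override
    "enabled": True,
}
```

A check like this could run when a job is created, so a typo in `type` is caught before the job reaches the "unknown type" warning branch at fire time.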
--- a/cortex/llm_client.py
+++ b/cortex/llm_client.py
@@ -31,6 +31,10 @@ async def cleanup() -> None:
    _active_pgroups.clear()
 _BACKENDS = ("claude", "gemini", "local")
 _FALLBACK = {"claude": "gemini", "gemini": "claude", "local": "claude"}
 async def complete(
    system_prompt: str,
    messages: list[dict],
@@ -38,12 +42,12 @@ async def complete(
    max_tokens: int = 2048,
 ) -> tuple[str, str]:
    """Returns (response_text, actual_backend_used)."""
-    if model in ("claude", "gemini"):
+    if model in _BACKENDS:
        primary = model
    else:
        primary = settings.primary_backend
-    fallback = "gemini" if primary == "claude" else "claude"
+    fallback = _FALLBACK.get(primary, "claude")
    try:
        response = await _dispatch(primary, system_prompt, messages, model)
@@ -65,6 +69,8 @@ async def _dispatch(
 ) -> str:
    if backend == "gemini":
        return await _gemini(system_prompt, messages)
    if backend == "local":
        return await _local(system_prompt, messages)
    return await _claude(system_prompt, messages, model)
@@ -108,6 +114,54 @@ async def _claude(system_prompt: str, messages: list[dict], model: str | None) -
    return await _run(cmd, timeout=settings.timeout_claude, env=env)
 async def _local(system_prompt: str, messages: list[dict]) -> str:
    """OpenAI-compatible backend — Open WebUI / Ollama.
    Per-user config (home/{user}/local_llm.json) takes precedence over
    the server-level .env defaults.
    """
    import httpx
    from persona import _user
    from user_settings import get_active_local_model
    cfg = get_active_local_model(_user.get())
    if not cfg:
        raise RuntimeError("No local model configured — add one at /settings/local")
    api_url = cfg["api_url"]
    api_key = cfg["api_key"]
    model   = cfg["model_name"]
    if not api_url:
        raise RuntimeError("local_api_url not configured — set LOCAL_API_URL in .env or add a host at /settings/local")
    if not model:
        raise RuntimeError("local_model not configured — add a model at /settings/local")
    logger.info("local backend: %s @ %s", model, api_url)
    msgs: list[dict] = []
    if system_prompt:
        msgs.append({"role": "system", "content": system_prompt})
    msgs.extend(messages)
    url = api_url.rstrip("/") + "/api/chat/completions"
    headers: dict[str, str] = {}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    payload = {"model": model, "messages": msgs}
    async with httpx.AsyncClient(timeout=settings.timeout_local) as client:
        resp = await client.post(url, json=payload, headers=headers)
        resp.raise_for_status()
        data = resp.json()
    text = data["choices"][0]["message"]["content"]
    if not text or not text.strip():
        raise RuntimeError("Local model returned an empty response")
    return text.strip()
 async def _gemini(system_prompt: str, messages: list[dict]) -> str:
    # Gemini CLI spawns MCP child processes that keep stdout pipes open after responding.
    # start_new_session=True puts the whole tree in its own process group so
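
Review note: the request that `_local()` sends can be separated from the network call, which makes it testable without a running Open WebUI host. This is a minimal sketch under that assumption; `build_chat_request` and `extract_text` are hypothetical names, and `http://localhost:3000` is a placeholder host.

```python
from __future__ import annotations


def build_chat_request(
    api_url: str,
    api_key: str,
    model: str,
    system_prompt: str,
    messages: list[dict],
) -> tuple[str, dict[str, str], dict]:
    """Assemble URL, headers, and JSON body the same way _local() does."""
    msgs: list[dict] = []
    if system_prompt:
        # The system prompt travels as the first message in the list.
        msgs.append({"role": "system", "content": system_prompt})
    msgs.extend(messages)

    url = api_url.rstrip("/") + "/api/chat/completions"
    headers: dict[str, str] = {}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return url, headers, {"model": model, "messages": msgs}


def extract_text(data: dict) -> str:
    """Pull the assistant text out of an OpenAI-style completion response."""
    text = data["choices"][0]["message"]["content"]
    if not text or not text.strip():
        raise RuntimeError("Local model returned an empty response")
    return text.strip()


url, headers, payload = build_chat_request(
    "http://localhost:3000/",  # placeholder host
    "",                        # no API key: Authorization header is omitted
    "test-agent-simple",
    "You are a helpful assistant.",
    [{"role": "user", "content": "ping"}],
)
```

With the pure parts split out, the remaining code in `_local()` would be little more than the `httpx.AsyncClient` call and `raise_for_status()`.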
--- a/cortex/main.py
+++ b/cortex/main.py
@@ -9,7 +9,7 @@ logging.basicConfig(level=logging.INFO, format="%(levelname)s:%(name)s: %(messag
 from config import settings
 from auth_middleware import SessionAuthMiddleware
 from routers import chat, google_chat, nextcloud_talk, files, distill, auth, orchestrator
-from routers import ui, onboarding, settings, help, auth_google
+from routers import ui, onboarding, settings, help, auth_google, local_llm
@asynccontextmanager
@@ -47,6 +47,7 @@ app.include_router(onboarding.router)
 # Account settings
 app.include_router(settings.router)
 app.include_router(local_llm.router)
 # Help page
 app.include_router(help.router)
--- a/cortex/memory_distiller.py
+++ b/cortex/memory_distiller.py
@@ -77,15 +77,22 @@ def distill_short(username: str | None = None, persona: str | None = None) -> di
 async def distill_mid(username: str | None = None, persona: str | None = None) -> dict:
    """
    Ask the LLM to summarize MEMORY_SHORT.md → MEMORY_MID.md.
    Uses DISTILL_BACKEND_MID if set (e.g. "local"), otherwise primary_backend.
    """
    from llm_client import complete
    from persona import set_context
-    inara_dir = _persona_path(username, persona)
+    u = username or settings.user_name.lower()
    p = persona or settings.agent_name.lower()
    set_context(u, p)
    inara_dir = _persona_path(u, p)
    short_content = _read(inara_dir / "MEMORY_SHORT.md")
    if not short_content.strip() or "Not yet populated" in short_content:
        return {"error": "MEMORY_SHORT.md is empty — run distill/short first"}
    backend_override = settings.distill_backend_mid or None
    budget_tokens = settings.memory_budget_mid
    system_prompt = (
        f"You are {settings.agent_name}'s memory distillation system. "
@@ -100,6 +107,7 @@ async def distill_mid(username: str | None = None, persona: str | None = None) -
    response_text, backend = await complete(
        system_prompt=system_prompt,
        messages=[{"role": "user", "content": short_content}],
        model=backend_override,
    )
    now = datetime.now().strftime("%Y-%m-%d %H:%M")
@@ -112,6 +120,7 @@ async def distill_mid(username: str | None = None, persona: str | None = None) -
    logger.info("distill_mid: wrote %d chars via %s", len(header) + len(response_text), backend)
    return {
        "username": u,
        "backend": backend,
        "chars_written": len(header) + len(response_text),
        "budget_tokens": budget_tokens,
@@ -121,16 +130,23 @@ async def distill_mid(username: str | None = None, persona: str | None = None) -
 async def distill_long(username: str | None = None, persona: str | None = None) -> dict:
    """
    Ask the LLM to integrate MEMORY_MID.md into MEMORY_LONG.md.
    Uses DISTILL_BACKEND_LONG if set, otherwise primary_backend.
    """
    from llm_client import complete
    from persona import set_context
-    inara_dir = _persona_path(username, persona)
+    u = username or settings.user_name.lower()
    p = persona or settings.agent_name.lower()
    set_context(u, p)
    inara_dir = _persona_path(u, p)
    long_content = _read(inara_dir / "MEMORY_LONG.md")
    mid_content = _read(inara_dir / "MEMORY_MID.md")
    if not mid_content.strip() or "Not yet populated" in mid_content:
        return {"error": "MEMORY_MID.md is empty — run distill/mid first"}
    backend_override = settings.distill_backend_long or None
    budget_tokens = settings.memory_budget_long
    system_prompt = (
        f"You are {settings.agent_name}'s long-term memory curator. "
@@ -149,6 +165,7 @@ async def distill_long(username: str | None = None, persona: str | None = None)
    response_text, backend = await complete(
        system_prompt=system_prompt,
        messages=[{"role": "user", "content": user_content}],
        model=backend_override,
    )
    # Ensure the file has the right header if the LLM dropped it
@@ -165,6 +182,7 @@ async def distill_long(username: str | None = None, persona: str | None = None)
    logger.info("distill_long: wrote %d chars via %s", len(response_text), backend)
    return {
        "username": u,
        "backend": backend,
        "chars_written": len(response_text),
        "budget_tokens": budget_tokens,
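
Review note: the distill override reaches `complete()` as `model=backend_override`, where an empty setting becomes `None` and the primary backend is used. The sketch below restates that selection as a pure function; `effective_backend` is a hypothetical name, and the backend tuple mirrors `_BACKENDS` in `llm_client.py`.

```python
BACKENDS = ("claude", "gemini", "local")  # mirrors llm_client._BACKENDS


def effective_backend(override: str, primary: str) -> str:
    """Return the backend a distill call would try first.

    `override` plays the role of DISTILL_BACKEND_MID / DISTILL_BACKEND_LONG,
    `primary` the role of settings.primary_backend.
    """
    model = override or None  # "" means "no override", as in distill_mid()
    # complete() only treats the value as a backend if it is a known name.
    return model if model in BACKENDS else primary
```

For example, setting `DISTILL_BACKEND_MID=local` with a `claude` primary would route the weekly summary to the local model, while an empty value keeps it on `claude`.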
--- a/cortex/notification.py
+++ b/cortex/notification.py
@@ -0,0 +1,106 @@
 """
 Outbound notification helpers — send messages to user channels proactively.
 Channel config lives in home/{user}/channels.json.
 To choose where proactive notifications go, set the top-level notification_channel
 key to a channel's key name (e.g. "nextcloud", "google_chat") in the user's channels.json:
  {
    "notification_channel": "nextcloud",
    "nextcloud": {
      "url": "https://cloud.example.com",
      "bot_secret": "...",
      "notification_room": "<room-token>",
      ...
    }
  }
 If notification_channel is absent, defaults to "nextcloud" if configured.
 If notification_room (for NCT) is absent, notifications are silently skipped.
 """
 import hashlib
 import hmac
 import json
 import logging
 import secrets
 import httpx
 logger = logging.getLogger(__name__)
 async def _send_nct_message(url: str, secret: str, room: str, message: str) -> None:
    """Post a message to a Nextcloud Talk room as the bot."""
    endpoint = f"{url}/ocs/v2.php/apps/spreed/api/v1/bot/{room}/message"
    random_str = secrets.token_hex(32)
    sig = hmac.new(
        secret.encode(),
        (random_str + message).encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()
    body = json.dumps({"message": message}, ensure_ascii=False).encode("utf-8")
    try:
        async with httpx.AsyncClient() as client:
            resp = await client.post(
                endpoint,
                content=body,
                headers={
                    "Content-Type": "application/json",
                    "OCS-APIRequest": "true",
                    "X-Nextcloud-Talk-Bot-Random": random_str,
                    "X-Nextcloud-Talk-Bot-Signature": sig,
                },
                timeout=15,
            )
        if resp.status_code not in (200, 201):
            logger.warning("notify NCT %s → HTTP %d: %s", room, resp.status_code, resp.text[:200])
        else:
            logger.info("notify NCT → %s (%d chars)", room, len(message))
    except Exception as e:
        logger.error("notify NCT error: %s", e)
 async def _notify_nct(nct: dict, message: str, username: str) -> None:
    room   = nct.get("notification_room", "").strip()
    url    = nct.get("url", "").rstrip("/")
    secret = nct.get("bot_secret", "")
    if not room:
        logger.debug("notify: NCT notification_room not set for %s — skipping", username)
        return
    if not url or not secret:
        logger.warning("notify: NCT config incomplete for %s (missing url or secret)", username)
        return
    await _send_nct_message(url, secret, room, message)
 async def notify(username: str, message: str, channel: str | None = None) -> None:
    """Send a notification to the user's preferred outbound channel.
    Channel resolution order:
      1. `channel` parameter if provided
      2. `notification_channel` key in channels.json
      3. "nextcloud" if configured
      4. Silent no-op
    To configure: set `notification_channel` in home/{user}/channels.json.
    For NCT: also set `notification_room` in the nextcloud section.
    """
    from auth_utils import get_user_channels
    channels = get_user_channels(username)
    target = channel or channels.get("notification_channel", "").strip()
    if not target:
        # Auto-detect: use nextcloud if configured
        if "nextcloud" in channels:
            target = "nextcloud"
        else:
            return
    if target == "nextcloud":
        nct = channels.get("nextcloud")
        if not nct:
            logger.debug("notify: nextcloud not configured for %s", username)
            return
        await _notify_nct(nct, message, username)
    else:
        logger.debug("notify: channel %r not yet supported for outbound (user %s)", target, username)
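
Review note: two pieces of this module are pure logic and can be checked without a Nextcloud server: the HMAC signature over `random + message`, and the channel resolution order from the `notify()` docstring. The sketch below isolates both; `sign_bot_message` and `resolve_channel` are hypothetical helper names, not functions in `notification.py`.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets


def sign_bot_message(
    secret: str, message: str, random_str: str | None = None
) -> tuple[str, str]:
    """Return (random_str, hex signature) as used in the NC Talk bot headers."""
    random_str = random_str or secrets.token_hex(32)
    sig = hmac.new(
        secret.encode(),
        (random_str + message).encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()
    return random_str, sig


def resolve_channel(channels: dict, override: str | None = None) -> str | None:
    """Apply the resolution order: override, notification_channel, nextcloud, none."""
    target = override or channels.get("notification_channel", "").strip()
    if not target and "nextcloud" in channels:
        target = "nextcloud"  # auto-detect when nothing is set explicitly
    return target or None  # None corresponds to the silent no-op
```

Passing a fixed `random_str` makes the signature deterministic, which is convenient in tests; in production the random value should stay fresh per message.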
--- a/cortex/requirements.txt
+++ b/cortex/requirements.txt
@@ -16,5 +16,8 @@ bcrypt>=4.0.0
 PyJWT>=2.8.0
 python-multipart>=0.0.9   # required by FastAPI for Form() data
 # Async HTTP client — used for local OpenAI-compatible backend (Open WebUI / Ollama)
 httpx>=0.27.0
 # anthropic SDK not needed — using claude CLI subprocess for auth
 # anthropic>=0.40.0
--- a/cortex/routers/auth.py
+++ b/cortex/routers/auth.py
@@ -13,6 +13,7 @@ import logging
 from datetime import datetime, timezone
 from pathlib import Path
 from fastapi import APIRouter
 from config import settings
 logger = logging.getLogger(__name__)
 router = APIRouter(prefix="/auth")
@@ -71,9 +72,27 @@ def _gemini_status() -> dict:
        return {"ok": False, "error": str(e), "warning": True, "authenticated": False}
 async def _local_status() -> dict:
    if not settings.local_api_url:
        return {"configured": False}
    try:
        import httpx
        url = settings.local_api_url.rstrip("/") + "/api/models"
        headers = {}
        if settings.local_api_key:
            headers["Authorization"] = f"Bearer {settings.local_api_key}"
        async with httpx.AsyncClient(timeout=5) as client:
            resp = await client.get(url, headers=headers)
        reachable = resp.status_code < 400
        return {"configured": True, "reachable": reachable, "model": settings.local_model}
    except Exception as e:
        return {"configured": True, "reachable": False, "error": str(e), "model": settings.local_model}
@router.get("/status")
 async def auth_status() -> dict:
    return {
        "claude": _claude_status(),
        "gemini": _gemini_status(),
        "local": await _local_status(),
    }
--- a/cortex/routers/chat.py
+++ b/cortex/routers/chat.py
@@ -1,6 +1,7 @@
 import asyncio
 import json
-from fastapi import APIRouter, HTTPException, Query
+import jwt
 from fastapi import APIRouter, HTTPException, Query, Request
 from fastapi.responses import StreamingResponse
 from pydantic import BaseModel
 from context_loader import load_context
@@ -9,6 +10,8 @@ from session_logger import log_turn
 from session_store import load as load_session, save as save_session, list_all, generate_session_id, delete as delete_session, rename as rename_session
 from config import settings
 from persona import set_context, validate as validate_persona
 from auth_utils import COOKIE_NAME, decode_token
 import user_settings
 import event_bus
@@ -29,7 +32,7 @@ class ChatRequest(BaseModel):
 class BackendRequest(BaseModel):
-    primary: str  # "claude" or "gemini"
+    primary: str  # "claude", "gemini", or "local"
 class NoteRequest(BaseModel):
@@ -130,19 +133,45 @@ async def chat(req: ChatRequest) -> StreamingResponse:
    )
 _BACKEND_CYCLE = ("claude", "gemini", "local")
 _BACKEND_FALLBACK = {"claude": "gemini", "gemini": "claude", "local": "claude"}
 def _local_model_info(request: Request) -> dict | None:
    """Return active local model {label, model_name} for the session user, or None."""
    try:
        token    = request.cookies.get(COOKIE_NAME)
        username = decode_token(token) if token else None
        if not username:
            return None
        cfg = user_settings.get_active_local_model(username)
        if cfg:
            return {"label": cfg["label"], "model_name": cfg["model_name"]}
     except Exception:  # also covers jwt.InvalidTokenError; any failure means "no local model info"
        pass
    return None
@router.get("/backend")
-async def get_backend() -> dict:
+async def get_backend(request: Request) -> dict:
-    other = "gemini" if settings.primary_backend == "claude" else "claude"
+    p = settings.primary_backend
-    return {"primary": settings.primary_backend, "fallback": other}
+    return {
        "primary":      p,
        "fallback":     _BACKEND_FALLBACK.get(p, "claude"),
        "local_model":  _local_model_info(request),
    }
@router.post("/backend")
-async def set_backend(req: BackendRequest) -> dict:
+async def set_backend(req: BackendRequest, request: Request) -> dict:
-    if req.primary not in ("claude", "gemini"):
+    if req.primary not in _BACKEND_CYCLE:
-        raise HTTPException(status_code=400, detail="primary must be 'claude' or 'gemini'")
+        raise HTTPException(status_code=400, detail="primary must be 'claude', 'gemini', or 'local'")
    settings.primary_backend = req.primary
-    other = "gemini" if req.primary == "claude" else "claude"
+    return {
-    return {"primary": settings.primary_backend, "fallback": other}
+        "primary":     req.primary,
        "fallback":    _BACKEND_FALLBACK[req.primary],
        "local_model": _local_model_info(request),
    }
 def _set_ctx(user: str, persona: str) -> None:
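
Review note: both `/backend` handlers now return the same three-key object. A small sketch of that shape follows, assuming the fallback table shown in the diff; `backend_state` is a hypothetical helper, and the `local_model` values are made-up examples.

```python
from __future__ import annotations

# Mirrors _BACKEND_FALLBACK in routers/chat.py.
BACKEND_FALLBACK = {"claude": "gemini", "gemini": "claude", "local": "claude"}


def backend_state(primary: str, local_model: dict | None = None) -> dict:
    """Build the {primary, fallback, local_model} object returned by /backend."""
    if primary not in BACKEND_FALLBACK:
        # The POST handler answers this case with HTTP 400.
        raise ValueError("primary must be 'claude', 'gemini', or 'local'")
    return {
        "primary": primary,
        "fallback": BACKEND_FALLBACK[primary],
        "local_model": local_model,
    }


state = backend_state(
    "local",
    {"label": "Example model", "model_name": "test-agent-simple"},  # example values
)
```

Sharing one builder between the GET and POST handlers would keep the two responses from drifting apart.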
--- a/cortex/routers/files.py
+++ b/cortex/routers/files.py
@@ -1,7 +1,8 @@
 """
-Read/write the Inara identity markdown files.
+Read/write Inara identity markdown files, and search past session logs.
 Only whitelisted filenames are accessible — no path traversal possible.
 """
 import re
 from fastapi import APIRouter, HTTPException, Query
 from pydantic import BaseModel
 from persona import persona_path, set_context, validate as validate_persona
@@ -47,10 +48,12 @@ async def list_files(
    files = []
    for name in sorted(ALLOWED):
        p = persona_dir / name
        st = p.stat() if p.exists() else None
        files.append({
            "name": name,
            "exists": p.exists(),
-            "size": p.stat().st_size if p.exists() else 0,
+            "size": st.st_size if st else 0,
            "modified": st.st_mtime if st else None,
        })
    return {"files": files}
@@ -83,3 +86,59 @@ async def save_file(
    p = _path(filename)
    p.write_text(req.content)
    return {"ok": True, "name": filename, "size": len(req.content)}
 # ── Session search ────────────────────────────────────────────────────────────
 _CONTEXT_CHARS = 120  # chars of context to include around each match
@router.get("/sessions/search")
 async def search_sessions(
    q: str = Query(..., min_length=2),
    user: str = Query("scott"),
    persona: str = Query("inara"),
    limit: int = Query(20, ge=1, le=100),
 ) -> dict:
    """Full-text search across past session logs.
    Returns up to `limit` matches, newest sessions first.
    Each match includes a short excerpt (120 chars before/after) for context.
    """
    _resolve(user, persona)
    sessions_dir = persona_path() / "sessions"
    if not sessions_dir.exists():
        return {"query": q, "matches": [], "total_files_searched": 0}
    pattern = re.compile(re.escape(q), re.IGNORECASE)
    session_files = sorted(sessions_dir.glob("*.md"), reverse=True)  # newest first
    matches = []
    for sf in session_files:
        if len(matches) >= limit:
            break
        try:
            text = sf.read_text()
        except OSError:
            continue
        for m in pattern.finditer(text):
            if len(matches) >= limit:
                break
            start = max(0, m.start() - _CONTEXT_CHARS)
            end   = min(len(text), m.end() + _CONTEXT_CHARS)
            excerpt = text[start:end].strip()
            # Prefix with ellipsis if we truncated the left side
            if start > 0:
                excerpt = "…" + excerpt
            if end < len(text):
                excerpt = excerpt + "…"
            matches.append({
                "date":    sf.stem,          # YYYY-MM-DD
                "excerpt": excerpt,
            })
    return {
        "query":               q,
        "matches":             matches,
        "total_files_searched": len(session_files),
    }
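
Note on the excerpt logic in `search_sessions`: the windowing step can be exercised without any files. The following is a minimal sketch rather than code from this commit; `find_excerpts` and its parameters are hypothetical names, and it assumes the same behaviour as the hunk above (literal, case-insensitive matching, `context_chars` of padding on each side, an ellipsis on any side that was cut).

```python
import re


def find_excerpts(text: str, query: str, context_chars: int = 120, limit: int = 20) -> list[str]:
    """Return up to `limit` excerpts around literal, case-insensitive matches of `query`."""
    # re.escape makes the user's query a literal string, so "a.b" does not match "axb".
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    excerpts: list[str] = []
    for m in pattern.finditer(text):
        if len(excerpts) >= limit:
            break
        # Clamp the window to the bounds of the text.
        start = max(0, m.start() - context_chars)
        end = min(len(text), m.end() + context_chars)
        excerpt = text[start:end].strip()
        # Mark each side that was truncated.
        if start > 0:
            excerpt = "…" + excerpt
        if end < len(text):
            excerpt = excerpt + "…"
        excerpts.append(excerpt)
    return excerpts
```

Because the limit is checked per match, a single long session log can fill the whole result set before older files are read, which matches the early `break` in the route.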
--- a/cortex/routers/local_llm.py
+++ b/cortex/routers/local_llm.py
@@ -0,0 +1,242 @@
 """
 Local LLM settings — per-user host and model configuration.
 Routes:
  GET  /settings/local                      → settings page
  POST /settings/local/host                 → save/create host
  POST /settings/local/models/add           → add model entry
  POST /settings/local/models/{id}/activate → set active model
  POST /settings/local/models/{id}/remove   → remove model entry
  GET  /api/local-llm/fetch-models          → proxy to host /api/models (JSON)
 """
 import logging
 from pathlib import Path
 import httpx
 import jwt
 from fastapi import APIRouter, Form, Request
 from fastapi.responses import HTMLResponse, JSONResponse, RedirectResponse
 from auth_utils import COOKIE_NAME, decode_token
 from config import settings as app_settings
 import user_settings as us
 logger = logging.getLogger(__name__)
 router = APIRouter()
 _STATIC = Path(__file__).parent.parent / "static"
 # ── Auth helper ───────────────────────────────────────────────────────────────
 def _get_user(request: Request) -> str | None:
    token = request.cookies.get(COOKIE_NAME)
    if not token:
        return None
    try:
        return decode_token(token)
    except jwt.InvalidTokenError:
        return None
 # ── Page renderer ─────────────────────────────────────────────────────────────
 def _render(username: str, success: str = "", error: str = "") -> str:
    cfg     = us.get_config(username)
    hosts   = cfg["hosts"]
    models  = cfg["models"]
    active  = cfg.get("active_model_id")
    # Build a host lookup for model rows
    host_by_id = {h["id"]: h for h in hosts}
    # ── Host section ──────────────────────────────────────────────────────────
    if hosts:
        h = hosts[0]   # one host for now
        host_id_val  = h["id"]
        host_label   = h.get("label", "")
        host_url     = h.get("api_url", "")
        host_key_hint = f"…{h['api_key'][-4:]}" if h.get("api_key") else "not set"
    else:
        host_id_val  = ""
        host_label   = ""
        host_url     = app_settings.local_api_url
        host_key_hint = f"server default (…{app_settings.local_api_key[-4:]})" \
                        if app_settings.local_api_key else "not set"
    # ── Model rows ────────────────────────────────────────────────────────────
    model_rows = ""
    for m in models:
        is_active = m["id"] == active
        host      = host_by_id.get(m["host_id"], {})
        host_name = host.get("label") or host.get("api_url") or "unknown host"
        badge     = '<span class="active-badge">active</span>' if is_active else ""
        activate_btn = (
            '<span class="active-label">✓ Active</span>'
            if is_active else
            f'''<form method="POST" action="/settings/local/models/{m["id"]}/activate" style="display:inline">
                  <button type="submit" class="row-btn">Set active</button>
                </form>'''
        )
        model_rows += f'''
        <div class="model-row{"  model-active" if is_active else ""}">
          <div class="model-info">
            <span class="model-label">{m.get("label") or m["model_name"]}</span>{badge}
            <span class="model-name">{m["model_name"]}</span>
            <span class="model-host">{host_name}</span>
          </div>
          <div class="model-actions">
            {activate_btn}
            <form method="POST" action="/settings/local/models/{m["id"]}/remove" style="display:inline"
                  onsubmit="return confirm('Remove {m.get('label') or m['model_name']}?')">
              <button type="submit" class="row-btn danger">Remove</button>
            </form>
          </div>
        </div>'''
    if not model_rows:
        model_rows = '<p class="empty-note">No models added yet. Use "Add Model" below.</p>'
    # ── Host select for Add Model ─────────────────────────────────────────────
    host_options = "".join(
        f'<option value="{h["id"]}">{h.get("label") or h["api_url"]}</option>'
        for h in hosts
    )
    add_section_hidden = "" if hosts else ' style="display:none"'
    html = (_STATIC / "local_llm.html").read_text()
    first_host_id = hosts[0]["id"] if hosts else ""
    html = html.replace("{{ username }}",         username)
    html = html.replace("{{ host_id }}",          host_id_val)
    html = html.replace("{{ host_label }}",       host_label)
    html = html.replace("{{ host_url }}",         host_url)
    html = html.replace("{{ host_key_hint }}",    host_key_hint)
    html = html.replace("{{ model_rows }}",       model_rows)
    html = html.replace("{{ host_options }}",     host_options)
    html = html.replace("{{ first_host_id }}",    first_host_id)
    html = html.replace("{{ add_section_hidden }}", add_section_hidden)
    html = html.replace("{{ has_host }}",         "true" if hosts else "false")
    if success:
        html = html.replace("<!-- SUCCESS -->", f'<p class="msg success">{success}</p>')
    if error:
        html = html.replace("<!-- ERROR -->",   f'<p class="msg error">{error}</p>')
    return html
 # ── Routes ────────────────────────────────────────────────────────────────────
@router.get("/settings/local", include_in_schema=False)
 async def local_llm_page(request: Request):
    username = _get_user(request)
    if not username:
        return RedirectResponse("/login", status_code=302)
    return HTMLResponse(_render(username))
@router.post("/settings/local/host", include_in_schema=False)
 async def save_host(
    request: Request,
    host_id:  str = Form(""),
    label:    str = Form(""),
    api_url:  str = Form(""),
    api_key:  str = Form(""),
 ):
    username = _get_user(request)
    if not username:
        return RedirectResponse("/login", status_code=302)
    if not api_url.strip():
        return HTMLResponse(_render(username, error="API URL is required."))
    us.save_host(username, host_id or None, label, api_url, api_key)
    logger.info("local LLM host saved: %s", username)
    return HTMLResponse(_render(username, success="Host saved."))
@router.post("/settings/local/models/add", include_in_schema=False)
 async def add_model(
    request:    Request,
    host_id:    str = Form(...),
    label:      str = Form(""),
    model_name: str = Form(...),
 ):
    username = _get_user(request)
    if not username:
        return RedirectResponse("/login", status_code=302)
    if not model_name.strip():
        return HTMLResponse(_render(username, error="Model name is required."))
    us.add_model(username, host_id, label, model_name)
    logger.info("local model added: %s / %s", username, model_name)
    return HTMLResponse(_render(username, success=f"Model \"{label or model_name}\" added."))
@router.post("/settings/local/models/{model_id}/activate", include_in_schema=False)
 async def activate_model(request: Request, model_id: str):
    username = _get_user(request)
    if not username:
        return RedirectResponse("/login", status_code=302)
    if not us.set_active_model(username, model_id):
        return HTMLResponse(_render(username, error="Model not found."))
    logger.info("active local model set: %s / %s", username, model_id)
    return HTMLResponse(_render(username, success="Active model updated."))
@router.post("/settings/local/models/{model_id}/remove", include_in_schema=False)
 async def remove_model(request: Request, model_id: str):
    username = _get_user(request)
    if not username:
        return RedirectResponse("/login", status_code=302)
    us.remove_model(username, model_id)
    logger.info("local model removed: %s / %s", username, model_id)
    return HTMLResponse(_render(username, success="Model removed."))
@router.get("/api/local-llm/fetch-models")
 async def fetch_models(request: Request) -> JSONResponse:
    """Proxy to the configured host's /api/models endpoint.
    Returns [{id, name}] sorted by name, or an error dict.
    """
    username = _get_user(request)
    if not username:
        return JSONResponse({"error": "Not authenticated"}, status_code=401)
    cfg   = us.get_config(username)
    hosts = cfg.get("hosts", [])
    # Fall back to .env if no host configured yet
    if hosts:
        h       = hosts[0]
        api_url = h.get("api_url", "")
        api_key = h.get("api_key", "")
    else:
        api_url = app_settings.local_api_url
        api_key = app_settings.local_api_key
    if not api_url:
        return JSONResponse({"error": "No host configured."}, status_code=400)
    url     = api_url.rstrip("/") + "/api/models"
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    try:
        async with httpx.AsyncClient(timeout=8) as client:
            resp = await client.get(url, headers=headers)
        resp.raise_for_status()
        data   = resp.json()
        models = [
            {"id": m["id"], "name": m.get("name") or m["id"]}
            for m in data.get("data", [])
        ]
        models.sort(key=lambda m: m["name"].lower())
        return JSONResponse({"models": models})
    except httpx.HTTPStatusError as e:
        return JSONResponse({"error": f"Host returned {e.response.status_code}"}, status_code=502)
    except Exception as e:
        return JSONResponse({"error": str(e)}, status_code=502)
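
Review note on `_render`: the page is assembled with f-strings and `str.replace`, so saved labels, model names and URLs reach the HTML as stored unless `user_settings` sanitizes them on save, which this diff does not show. A minimal sketch, assuming the standard-library `html.escape` is acceptable here; `render_model_label` is a hypothetical helper, not part of the commit.

```python
from html import escape


def render_model_label(label: str, model_name: str) -> str:
    """Hypothetical helper: build the label span with user-supplied text escaped."""
    shown = label or model_name
    # quote=True also escapes " and ', which matters if the value lands in an attribute.
    return f'<span class="model-label">{escape(shown, quote=True)}</span>'
```

The same call would apply to `host_label`, `host_url` and the `success`/`error` messages before they are substituted into the template. The inline `confirm('Remove …')` handler is a JavaScript string context, where HTML escaping alone may not be sufficient.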
--- a/cortex/routers/nextcloud_talk.py
+++ b/cortex/routers/nextcloud_talk.py
@@ -1,16 +1,13 @@
 import asyncio
 import hashlib
 import hmac
 import json
 import logging
 import secrets
 import httpx
 from fastapi import APIRouter, BackgroundTasks, HTTPException, Request, Response
 from auth_utils import get_user_channels
 from context_loader import load_context
 from llm_client import complete
 from notification import _send_nct_message
 from persona import set_context
 from session_logger import log_turn
 from session_store import load as load_session, save as save_session
@@ -40,38 +37,8 @@ def _verify_signature(body: bytes, random_header: str, sig_header: str, secret:
 async def _send_reply(conversation_token: str, message: str, nextcloud_url: str, secret: str) -> None:
    """Post a message to Nextcloud Talk as the bot."""
-    url = (
-        f"{nextcloud_url}/ocs/v2.php/apps/spreed/api/v1"
-        f"/bot/{conversation_token}/message"
-    )
-    # NC Talk verifies HMAC over (random + message_text), NOT the raw body.
-    # See BotController::getBotFromHeaders → checksumVerificationService::validateRequest($random, $sig, $secret, $message)
-    body_dict  = {"message": message}
-    body_bytes = json.dumps(body_dict, ensure_ascii=False).encode("utf-8")
-    random_str = secrets.token_hex(32)
-    sig = hmac.new(
-        secret.encode(),
-        (random_str + message).encode("utf-8"),
-        hashlib.sha256,
-    ).hexdigest()
-    logger.info("NCT _send_reply → %s (body: %s)", url, body_bytes.decode())
-    try:
-        async with httpx.AsyncClient() as client:
-            resp = await client.post(
-                url,
-                content=body_bytes,
-                headers={
-                    "Content-Type": "application/json",
-                    "OCS-APIRequest": "true",
-                    "X-Nextcloud-Talk-Bot-Random": random_str,
-                    "X-Nextcloud-Talk-Bot-Signature": sig,
-                },
-                timeout=15,
-            )
-        logger.info("NCT reply: %s — %s", resp.status_code, resp.text[:400])
-    except Exception as e:
-        logger.error("NCT reply error: %s", e)
+    logger.info("NCT _send_reply → room %s (%d chars)", conversation_token, len(message))
+    await _send_nct_message(nextcloud_url, secret, conversation_token, message)
 async def _process_message(
--- a/cortex/routers/settings.py
+++ b/cortex/routers/settings.py
@@ -55,6 +55,7 @@ def _settings_page(username: str, personas: list[str], success: str = "", error:
        hint = "Using server key"
    html = html.replace("{{ gemini_key_hint }}", hint)
    html = html.replace("{{ gemini_key_set }}", "true" if gemini_key else "false")
    persona_items = "\n".join(
        f'''<li>
          <a href="/{username}/{p}" class="persona-link">{p}</a>
--- a/cortex/scheduler.py
+++ b/cortex/scheduler.py
@@ -30,24 +30,28 @@ async def _run_short() -> None:
 async def _run_mid() -> None:
    from memory_distiller import distill_mid
    from notification import notify
    try:
        result = await distill_mid()
        if "error" in result:
            logger.warning("auto distill mid skipped: %s", result["error"])
        else:
            logger.info("auto distill mid: %d chars via %s", result["chars_written"], result["backend"])
            await notify(result["username"], f"📝 Weekly memory digest complete ({result['chars_written']} chars via {result['backend']}).")
    except Exception as e:
        logger.error("auto distill mid failed: %s", e)
 async def _run_long() -> None:
    from memory_distiller import distill_long
    from notification import notify
    try:
        result = await distill_long()
        if "error" in result:
            logger.warning("auto distill long skipped: %s", result["error"])
        else:
            logger.info("auto distill long: %d chars via %s", result["chars_written"], result["backend"])
            await notify(result["username"], f"🧠 Monthly long-term memory integration complete ({result['chars_written']} chars via {result['backend']}). Worth a quick review.")
    except Exception as e:
        logger.error("auto distill long failed: %s", e)
--- a/cortex/static/app.js
+++ b/cortex/static/app.js
@@ -16,6 +16,44 @@
        const note_vis_btn_el    = document.getElementById('note-vis-btn');
        const settings_btn_el    = document.getElementById('settings-btn');
        const settings_dd_el     = document.getElementById('settings-dropdown');
        const sessionsBackdrop   = document.getElementById('sessions-backdrop');
        // ── Close all panels/dropdowns (mutual exclusion) ─────────────
        function closeAllPanels() {
            if (mode_dropdown_el)  mode_dropdown_el.classList.remove('open');
            if (settings_dd_el)    settings_dd_el.classList.remove('open');
            if (sessionsPanel)     { sessionsPanel.classList.remove('open'); sessionsBackdrop.classList.remove('open'); }
            const pd = document.getElementById('persona-dropdown');
            if (pd) pd.classList.remove('open');
        }
        // ── Toasts ────────────────────────────────────────────────────
        const toastContainer = document.getElementById('toast-container');
        function showToast(message, type = 'info', duration = 2500) {
            const el = document.createElement('div');
            el.className = 'toast' + (type !== 'info' ? ' ' + type : '');
            el.textContent = message;
            toastContainer.appendChild(el);
            requestAnimationFrame(() => {
                requestAnimationFrame(() => el.classList.add('show'));
            });
            setTimeout(() => {
                el.classList.remove('show');
                el.addEventListener('transitionend', () => el.remove(), { once: true });
            }, duration);
        }
        // ── Syntax highlighting ───────────────────────────────────────
        function highlight_code(container) {
            if (typeof hljs === 'undefined') return;
            container.querySelectorAll('pre code').forEach(el => hljs.highlightElement(el));
        }
        // ── Utility helpers ───────────────────────────────────────────
        function _esc(s) {
            return String(s).replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;').replace(/"/g,'&quot;');
        }
        // ── Lucide icon helpers ───────────────────────────────────────
        function icon_html(name, size = 16) {
@@ -145,6 +183,7 @@
        }
        function open_mode_dropdown() {
            closeAllPanels();
            // Build options in MRU order (least recent at top, most recent at bottom)
            // — bottom is visually closest to the button since dropdown opens upward
            const ordered = [...mode_mru].reverse();
@@ -236,7 +275,9 @@
        // ── Settings dropdown ─────────────────────────────────────────
        settings_btn_el.addEventListener('click', (e) => {
            e.stopPropagation();
-            settings_dd_el.classList.toggle('open');
+            const isOpen = settings_dd_el.classList.contains('open');
            closeAllPanels();
            if (!isOpen) settings_dd_el.classList.add('open');
        });
        document.addEventListener('click', (e) => {
            if (!settings_dd_el.contains(e.target) && e.target !== settings_btn_el) {
@@ -290,7 +331,9 @@
        if (personaSwitcher) {
            personaSwitcher.addEventListener('click', (e) => {
                if (personaDropEl.children.length === 0) return;
-                personaDropEl.classList.toggle('open');
+                const isOpen = personaDropEl.classList.contains('open');
                closeAllPanels();
                if (!isOpen) personaDropEl.classList.add('open');
                e.stopPropagation();
            });
            document.addEventListener('click', () => personaDropEl.classList.remove('open'));
@@ -298,23 +341,40 @@
        // ── Backend toggle ───────────────────────────────────────────
-        fetch('/backend').then(r => r.json()).then(d => setBackendUI(d.primary));
+        fetch('/backend').then(r => r.json()).then(d => setBackendUI(d));
-        function setBackendUI(backend) {
+        const BACKEND_CYCLE = ['claude', 'gemini', 'local'];
        const BACKEND_CLASS = { claude: '', gemini: 'mem-on', local: 'local-on' };
        const backendModelHint = document.getElementById('backend-model-hint');
        function setBackendUI(d) {
            const backend = d.primary || d;  // accept full response obj or bare string
            primaryBackend = backend;
            backendToggle.textContent = backend;
-            backendToggle.className = 'ctx-btn' + (backend === 'gemini' ? ' mem-on' : '');
+            const extra = BACKEND_CLASS[backend] || '';
            backendToggle.className = 'ctx-btn' + (extra ? ' ' + extra : '');
            if (backendModelHint) {
                if (backend === 'local' && d.local_model) {
                    backendModelHint.textContent = d.local_model.label || d.local_model.model_name;
                    backendModelHint.style.display = '';
                } else {
                    backendModelHint.textContent = '';
                    backendModelHint.style.display = 'none';
                }
            }
        }
        backendToggle.addEventListener('click', async () => {
-            const next = primaryBackend === 'claude' ? 'gemini' : 'claude';
+            const idx = BACKEND_CYCLE.indexOf(primaryBackend);
            const next = BACKEND_CYCLE[(idx + 1) % BACKEND_CYCLE.length];
            const res = await fetch('/backend', {
                method: 'POST',
                headers: { 'Content-Type': 'application/json' },
                body: JSON.stringify({ primary: next }),
            });
            const d = await res.json();
-            setBackendUI(d.primary);
+            setBackendUI(d);
            addMessage('system', `Backend: ${d.primary} (fallback: ${d.fallback})`);
        });
@@ -324,17 +384,26 @@
            e.stopPropagation();
            if (sessionsPanel.classList.contains('open')) {
                sessionsPanel.classList.remove('open');
                sessionsBackdrop.classList.remove('open');
                return;
            }
            closeAllPanels();
            const res = await fetch(`/sessions?${_fileParams}`);
            const data = await res.json();
            renderPanel(data.sessions);
            sessionsPanel.classList.add('open');
            sessionsBackdrop.classList.add('open');
        });
        sessionsBackdrop.addEventListener('click', () => {
            sessionsPanel.classList.remove('open');
            sessionsBackdrop.classList.remove('open');
        });
        document.addEventListener('click', (e) => {
            if (!sessionsPanel.contains(e.target) && e.target !== sessionsBtn) {
                sessionsPanel.classList.remove('open');
                sessionsBackdrop.classList.remove('open');
            }
        });
@@ -354,6 +423,7 @@
                sessionEl.textContent = '';
                addMessage('system', 'New session');
                sessionsPanel.classList.remove('open');
                sessionsBackdrop.classList.remove('open');
                inputEl.focus();
            });
            sessionsPanel.appendChild(newItem);
@@ -408,6 +478,7 @@
                        if (sessionId === s.session_id) {
                            sessionEl.textContent = `session: ${newName || s.session_id}`;
                        }
                        if (newName) showToast('Session renamed', 'success');
                    }
                    input.addEventListener('keydown', (e) => {
@@ -431,7 +502,7 @@
                        currentHistory = [];
                        messagesEl.innerHTML = '';
                        sessionEl.textContent = '';
-                        addMessage('system', 'Session deleted');
+                        showToast('Session deleted');
                    }
                    const res = await fetch(`/sessions?${_fileParams}`);
                    const data = await res.json();
@@ -484,6 +555,7 @@
            if (!silent) addMessage('system', `Resumed session ${id}`);
            scrollToBottom();
            sessionsPanel.classList.remove('open');
            sessionsBackdrop.classList.remove('open');
            inputEl.focus();
            persist_session();
        }
@@ -529,6 +601,7 @@
            if (role === 'assistant' && typeof marked !== 'undefined') {
                div.dataset.raw = text;
                div.innerHTML = marked.parse(text);
                highlight_code(div);
                div.querySelectorAll('a').forEach(a => {
                    a.target = '_blank';
                    a.rel = 'noopener noreferrer';
@@ -544,7 +617,9 @@
                div.appendChild(label);
                div.appendChild(content);
            } else {
                div.dataset.raw = text;
                div.textContent = text;
                div.appendChild(makeCopyBtn(div));
            }
            // Wrap user/assistant messages so action buttons can be attached
@@ -699,6 +774,7 @@
            if (role === 'assistant' && typeof marked !== 'undefined') {
                div.dataset.raw = text;
                div.innerHTML = marked.parse(text);
                highlight_code(div);
                div.querySelectorAll('a').forEach(a => {
                    a.target = '_blank';
                    a.rel = 'noopener noreferrer';
@@ -709,6 +785,76 @@
            }
        }
        // ── Agent tool-call step cards ────────────────────────────────
        function renderToolCalls(toolCalls, beforeEl) {
            if (!toolCalls || toolCalls.length === 0) return;
            const container = document.createElement('div');
            container.className = 'tool-calls-container';
            for (const tc of toolCalls) {
                const details = document.createElement('details');
                details.className = 'tool-call';
                // Summary: name + first arg value snippet
                const args    = tc.args || {};
                const argKeys = Object.keys(args);
                let argSnippet = '';
                if (argKeys.length > 0) {
                    const firstVal = String(args[argKeys[0]]);
                    argSnippet = firstVal.length > 60 ? firstVal.slice(0, 60) + '…' : firstVal;
                }
                const summary = document.createElement('summary');
                const nameSpan = document.createElement('span');
                nameSpan.className = 'tc-name';
                nameSpan.textContent = tc.tool;
                summary.appendChild(nameSpan);
                if (argSnippet) {
                    const snippetSpan = document.createElement('span');
                    snippetSpan.className = 'tc-snippet';
                    snippetSpan.textContent = argSnippet;
                    summary.appendChild(snippetSpan);
                }
                details.appendChild(summary);
                // Expanded body
                const body = document.createElement('div');
                body.className = 'tc-body';
                if (argKeys.length > 0) {
                    const sec = document.createElement('div');
                    sec.className = 'tc-section';
                    const lbl = document.createElement('span');
                    lbl.className = 'tc-label';
                    lbl.textContent = 'args';
                    const pre = document.createElement('pre');
                    pre.textContent = JSON.stringify(args, null, 2);
                    sec.appendChild(lbl);
                    sec.appendChild(pre);
                    body.appendChild(sec);
                }
                const resultStr  = tc.result || '';
                const truncated  = resultStr.length > 400;
                const sec2 = document.createElement('div');
                sec2.className = 'tc-section';
                const lbl2 = document.createElement('span');
                lbl2.className = 'tc-label';
                lbl2.textContent = 'result';
                const pre2 = document.createElement('pre');
                pre2.textContent = truncated ? resultStr.slice(0, 400) + '\n…[truncated]' : resultStr;
                sec2.appendChild(lbl2);
                sec2.appendChild(pre2);
                body.appendChild(sec2);
                details.appendChild(body);
                container.appendChild(details);
            }
            beforeEl.parentElement.insertBefore(container, beforeEl);
        }
        function makeCopyBtn(div) {
            const btn = document.createElement('button');
            btn.className = 'copy-btn';
@@ -722,6 +868,7 @@
                } else {
                    fallbackCopy(text);
                }
                showToast('Copied to clipboard', 'success', 1800);
                btn.innerHTML = icon_html('check', 12) + ' copied';
                render_icons();
                btn.classList.add('copied');
@@ -762,7 +909,7 @@
                });
                if (!res.ok) throw new Error(`HTTP ${res.status}`);
            } catch (err) {
-                addMessage('system', `Note save failed: ${err.message}`);
+                showToast(`Note save failed: ${err.message}`, 'error');
            }
        }
@@ -944,11 +1091,7 @@
                currentHistory.push({ role: 'assistant', content: job.response || '' });
                attachHistoryControls(thinkingDiv, assistHistIdx);
-                const n = job.tool_calls?.length || 0;
-                if (n) {
-                    const names = job.tool_calls.map(t => t.name).join(', ');
-                    addMessage('system', `⚡ ${n} tool call${n !== 1 ? 's' : ''}: ${names}`);
-                }
+                renderToolCalls(job.tool_calls, thinkingDiv.parentElement);
            } catch (err) {
                if (err.name === 'AbortError') {
@@ -989,17 +1132,94 @@
        // ── File editor ──────────────────────────────────────────────
        const fileModal      = document.getElementById('file-modal');
-        const fileSelect     = document.getElementById('file-select');
+        const fileSidebar    = document.getElementById('file-sidebar');
        const fileEditor     = document.getElementById('file-editor');
        const filePreview    = document.getElementById('file-preview');
        const fileRawBtn     = document.getElementById('file-raw-btn');
        const filePreviewBtn = document.getElementById('file-preview-btn');
        const fileSaveBtn    = document.getElementById('file-save-btn');
        const fileSavedMsg   = document.getElementById('file-saved-msg');
        const fileCloseBtn   = document.getElementById('file-close-btn');
        const filesBtn       = document.getElementById('files-btn');
        let fileMode        = 'preview'; // 'edit' or 'preview'
        let activeFileName  = null;
        // File groups — controls sidebar order and section labels
        const FILE_GROUPS = [
            { label: 'Identity', files: ['IDENTITY.md', 'SOUL.md', 'PROTOCOLS.md', 'CONTEXT_TIERS.md'] },
            { label: 'Memory',   files: ['MEMORY_LONG.md', 'MEMORY_MID.md', 'MEMORY_SHORT.md'] },
            { label: 'Profile',  files: ['USER.md', 'HELP.md'] },
        ];
        function fmtSize(bytes) {
            if (!bytes) return 'empty';
            if (bytes < 1024) return bytes + ' B';
            return (bytes / 1024).toFixed(1) + ' KB';
        }
        function fmtModified(ts) {
            if (!ts) return '';
            const d   = new Date(ts * 1000);
            const now = new Date();
            if (d.toDateString() === now.toDateString()) return 'today';
            const diff = (now - d) / 86400000;
            if (diff < 2) return 'yesterday';
            return d.toLocaleDateString(undefined, { month: 'short', day: 'numeric' });
        }
        function renderFileSidebar(files) {
            const byName = Object.fromEntries(files.map(f => [f.name, f]));
            fileSidebar.innerHTML = '';
            for (const group of FILE_GROUPS) {
                const groupEl = document.createElement('div');
                groupEl.className = 'file-group';
                const header = document.createElement('div');
                header.className = 'fg-header';
                header.textContent = group.label;
                header.addEventListener('click', () => header.classList.toggle('collapsed'));
                groupEl.appendChild(header);
                const items = document.createElement('div');
                items.className = 'fg-items';
                for (const fname of group.files) {
                    const f = byName[fname];
                    if (!f) continue;
                    const item = document.createElement('div');
                    item.className = 'file-item' + (f.exists ? '' : ' missing');
                    item.dataset.name = fname;
                    if (fname === activeFileName) item.classList.add('active');
                    const nameEl = document.createElement('div');
                    nameEl.className = 'fi-name';
                    nameEl.textContent = fname;
                    item.appendChild(nameEl);
                    const metaEl = document.createElement('div');
                    metaEl.className = 'fi-meta';
                    metaEl.innerHTML = `<span>${fmtSize(f.size)}</span>`
                        + (f.modified ? `<span>${fmtModified(f.modified)}</span>` : '');
                    item.appendChild(metaEl);
                    item.addEventListener('click', () => loadFile(fname));
                    items.appendChild(item);
                }
                groupEl.appendChild(items);
                fileSidebar.appendChild(groupEl);
            }
        }
        function setActiveFile(name) {
            activeFileName = name;
            fileSidebar.querySelectorAll('.file-item').forEach(el => {
                el.classList.toggle('active', el.dataset.name === name);
            });
            document.getElementById('file-modal-title').textContent = name;
        }
        function setFileMode(mode) {
            fileMode = mode;
@@ -1023,27 +1243,22 @@
        }
        async function loadFile(name) {
            setActiveFile(name);
            const res = await fetch(`/files/${encodeURIComponent(name)}?${_fileParams}`);
            if (!res.ok) { fileEditor.value = `Error loading ${name}`; return; }
            const data = await res.json();
            fileEditor.value = data.content;
            document.getElementById('file-modal-title').textContent = name;
            setFileMode(fileMode);
        }
        async function openFileModal() {
            // Populate the file list
            const res  = await fetch(`/files?${_fileParams}`);
            const data = await res.json();
-            fileSelect.innerHTML = '';
+            renderFileSidebar(data.files);
            for (const f of data.files) {
                const opt = document.createElement('option');
                opt.value = f.name;
                opt.textContent = f.name + (f.exists ? '' : ' (missing)');
                fileSelect.appendChild(opt);
            }
            fileModal.classList.add('open');
-            await loadFile(fileSelect.value);
+            // Load first existing file
            const first = data.files.find(f => f.exists) || data.files[0];
            if (first) await loadFile(first.name);
        }
        filesBtn.addEventListener('click', () => {
@@ -1051,21 +1266,24 @@
            openFileModal();
        });
        fileSelect.addEventListener('change', () => loadFile(fileSelect.value));
        fileRawBtn.addEventListener('click', () => setFileMode('edit'));
        filePreviewBtn.addEventListener('click', () => setFileMode('preview'));
        fileSaveBtn.addEventListener('click', async () => {
-            const name = fileSelect.value;
-            const res = await fetch(`/files/${encodeURIComponent(name)}?${_fileParams}`, {
+            if (!activeFileName) return;
+            const res = await fetch(`/files/${encodeURIComponent(activeFileName)}?${_fileParams}`, {
                method: 'PUT',
                headers: { 'Content-Type': 'application/json' },
                body: JSON.stringify({ content: fileEditor.value }),
            });
            if (res.ok) {
-                fileSavedMsg.classList.add('show');
-                setTimeout(() => fileSavedMsg.classList.remove('show'), 2000);
+                showToast('File saved', 'success');
+                // Refresh sidebar to update size/modified
                const listRes = await fetch(`/files?${_fileParams}`);
                const listData = await listRes.json();
                renderFileSidebar(listData.files);
            } else {
                showToast('Save failed', 'error');
            }
        });
@@ -1075,6 +1293,66 @@
            if (e.target === fileModal) fileModal.classList.remove('open');
        });
        // ── Session search ────────────────────────────────────────────
        const sessionSearchInput   = document.getElementById('session-search-input');
        const sessionSearchBtn     = document.getElementById('session-search-btn');
        const sessionSearchResults = document.getElementById('session-search-results');
        function _showFileView() {
            fileEditor.style.display = '';
            filePreview.style.display = '';
            sessionSearchResults.style.display = 'none';
        }
        function _showSearchResults(html) {
            fileEditor.style.display = 'none';
            filePreview.style.display = 'none';
            sessionSearchResults.style.display = '';
            sessionSearchResults.innerHTML = html;
        }
        async function runSessionSearch() {
            const q = sessionSearchInput.value.trim();
            if (q.length < 2) return;
            sessionSearchBtn.disabled = true;
            sessionSearchBtn.textContent = '…';
            try {
                const res  = await fetch(`/sessions/search?q=${encodeURIComponent(q)}&${_fileParams}&limit=30`);
                const data = await res.json();
                // Escape server-supplied text before it is written via innerHTML
                if (!res.ok) { _showSearchResults(`<p class="sr-error">Error: ${_esc(String(data.detail || res.status))}</p>`); return; }
                if (!data.matches.length) {
                    _showSearchResults(`<p class="sr-empty">No results for "<strong>${_esc(q)}</strong>" in ${data.total_files_searched} session file(s).</p>`);
                    return;
                }
                let html = `<div class="sr-header">${data.matches.length} result(s) for "<strong>${_esc(q)}</strong>" across ${data.total_files_searched} session(s)</div>`;
                let lastDate = null;
                for (const m of data.matches) {
                    if (m.date !== lastDate) {
                        html += `<div class="sr-date">${_esc(String(m.date))}</div>`;
                        lastDate = m.date;
                    }
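                    // Escape regex metacharacters so queries such as "c++" or "(" cannot break the pattern
                    const hi = m.excerpt.replace(new RegExp(_esc(q).replace(/[.*+?^${}()|[\]\\]/g, '\\$&'), 'gi'), s => `<mark>${_esc(s)}</mark>`);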
                    html += `<div class="sr-excerpt">${hi}</div>`;
                }
                _showSearchResults(html);
            } catch (e) {
                _showSearchResults(`<p class="sr-error">Search failed: ${_esc(String(e.message))}</p>`);
            } finally {
                sessionSearchBtn.disabled = false;
                sessionSearchBtn.textContent = 'Go';
            }
        }
        sessionSearchBtn.addEventListener('click', runSessionSearch);
        sessionSearchInput.addEventListener('keydown', (e) => {
            if (e.key === 'Enter') runSessionSearch();
        });
        // When a file is clicked, switch back from search results to editor
        fileSidebar.addEventListener('click', () => {
            if (sessionSearchResults.style.display !== 'none') _showFileView();
        });
        document.addEventListener('keydown', (e) => {
            if (e.key === 'Escape') {
                if (fileModal.classList.contains('open')) fileModal.classList.remove('open');
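The highlighting step in `runSessionSearch` builds a `RegExp` from the query and writes the result through `innerHTML`, so both regex metacharacters and HTML in the excerpt matter. A minimal sketch of one way to handle both follows, assuming the server returns excerpts as plain, unescaped text. `escapeHtml`, `escapeRegExp`, and `highlight` are hypothetical names for illustration; `escapeHtml` stands in for the commit's `_esc()` helper, whose implementation is not shown in this diff.

```javascript
// Hypothetical stand-in for the commit's _esc() helper (its real body is not shown here).
function escapeHtml(s) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(s).replace(/[&<>"']/g, c => map[c]);
}

// Escape regex metacharacters so the query is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Match on the raw excerpt, then escape every piece separately:
// text between matches is escaped as-is, matches are escaped and wrapped in <mark>.
function highlight(excerpt, query) {
  if (!query) return escapeHtml(excerpt);
  const re = new RegExp(escapeRegExp(query), 'gi');
  let out = '';
  let last = 0;
  for (const m of excerpt.matchAll(re)) {
    out += escapeHtml(excerpt.slice(last, m.index));
    out += `<mark>${escapeHtml(m[0])}</mark>`;
    last = m.index + m[0].length;
  }
  return out + escapeHtml(excerpt.slice(last));
}
```

Matching before escaping keeps the search literal (a query of `<b>` finds `<b>` in the excerpt), while escaping each slice afterwards means nothing from the excerpt reaches `innerHTML` unescaped. If the backend already HTML-escapes excerpts, this sketch would double-escape and should not be used as-is.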
--- a/cortex/static/index.html
+++ b/cortex/static/index.html
@@ -21,6 +21,8 @@
    </script>
    <link rel="stylesheet" href="/static/style.css">
    <script src="/static/marked.min.js"></script>
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.11.1/styles/atom-one-dark.min.css">
    <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.11.1/highlight.min.js"></script>
    <script src="https://unpkg.com/lucide@latest/dist/umd/lucide.min.js"></script>
 </head>
 <body>
@@ -101,6 +103,7 @@
                <div class="ctx-row">
                    <button id="backend-toggle" class="ctx-btn" title="Click to switch primary backend">claude</button>
                </div>
                <div id="backend-model-hint"></div>
            </div>
            <div class="ctx-section">
                <div class="ctx-section-title">Display</div>
@@ -123,16 +126,28 @@
        <div id="file-modal-inner">
            <div id="file-modal-header">
                <span id="file-modal-title">Context Files</span>
-                <select id="file-select"></select>
+                <span class="fm-spacer"></span>
                <button class="fm-btn" id="file-raw-btn">edit</button>
                <button class="fm-btn active" id="file-preview-btn">preview</button>
                <button class="fm-btn save" id="file-save-btn">Save</button>
                <span id="file-saved-msg">saved ✓</span>
                <button class="fm-btn" id="file-close-btn">✕</button>
            </div>
            <div id="file-modal-content">
                <div id="file-sidebar-wrap">
                    <div id="file-sidebar"></div>
                    <div id="session-search-wrap">
                        <div id="session-search-label">Session Search</div>
                        <div id="session-search-row">
                            <input id="session-search-input" type="search" placeholder="Search sessions…" autocomplete="off">
                            <button id="session-search-btn">Go</button>
                        </div>
                    </div>
                </div>
                <div id="file-modal-body">
                    <textarea id="file-editor" spellcheck="false"></textarea>
                    <div id="file-preview"></div>
                    <div id="session-search-results" style="display:none"></div>
                </div>
            </div>
        </div>
    </div>
@@ -169,6 +184,8 @@
        </div>
    </div>
    <div id="sessions-backdrop"></div>
    <div id="toast-container"></div>
    <script src="/static/app.js"></script>
 </body>
 </html>
--- a/cortex/static/local_llm.html
+++ b/cortex/static/local_llm.html
@@ -0,0 +1,307 @@
 <!DOCTYPE html>
 <html lang="en">
 <head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Cortex — Local Models</title>
  <link rel="preconnect" href="https://fonts.googleapis.com">
  <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
  <link href="https://fonts.googleapis.com/css2?family=Inter:wght@100..900&display=swap" rel="stylesheet">
  <style>
    *, *::before, *::after { box-sizing: border-box; margin: 0; padding: 0; }
    body {
      min-height: 100vh;
      background: #0f1117;
      font-family: 'Inter', system-ui, -apple-system, sans-serif;
      font-weight: 450;
      -webkit-font-smoothing: antialiased;
      color: #e2e8f0;
      padding: 2rem 1.5rem 4rem;
    }
    .page { max-width: 640px; margin: 0 auto; }
    /* ── Nav ── */
    .page-nav {
      display: flex; align-items: center; gap: 0.25rem;
      margin-bottom: 1.75rem; flex-wrap: wrap;
    }
    .nav-link {
      display: inline-flex; align-items: center;
      padding: 0.3rem 0.6rem; border-radius: 6px;
      font-size: 0.8rem; font-weight: 500; color: #64748b;
      text-decoration: none; transition: color 0.15s, background 0.15s;
      white-space: nowrap;
    }
    .nav-link:hover { color: #cbd5e1; background: rgba(255,255,255,0.05); }
    .nav-link.active { color: #a78bfa; }
    .nav-spacer { flex: 1; min-width: 0.5rem; }
    .nav-link.nav-logout { color: #475569; }
    .nav-link.nav-logout:hover { color: #94a3b8; background: none; }
    /* ── Page header ── */
    .page-header { margin-bottom: 2rem; padding-bottom: 1rem; border-bottom: 1px solid #2d3148; }
    .page-header h1 { font-size: 1.4rem; font-weight: 700; color: #a78bfa; }
    .page-header p  { font-size: 0.82rem; color: #94a3b8; margin-top: 0.25rem; }
    /* ── Section cards ── */
    .section {
      background: #1a1d27; border: 1px solid #2d3148;
      border-radius: 10px; padding: 1.5rem; margin-bottom: 1.25rem;
    }
    .section h2 {
      font-size: 0.85rem; font-weight: 600; color: #94a3b8;
      text-transform: uppercase; letter-spacing: 0.05em;
      margin-bottom: 1.1rem; padding-bottom: 0.5rem;
      border-bottom: 1px solid #2d3148;
    }
    /* ── Form elements ── */
    .field { margin-bottom: 0.9rem; }
    label {
      display: block; font-size: 0.78rem; font-weight: 500;
      color: #94a3b8; margin-bottom: 0.35rem;
    }
    input[type="text"], input[type="password"], input[type="url"], select {
      width: 100%; padding: 0.6rem 0.8rem;
      background: #0f1117; border: 1px solid #2d3148; border-radius: 6px;
      color: #e2e8f0; font-size: 0.9rem; font-family: inherit;
      outline: none; transition: border-color 0.15s;
    }
    input:focus, select:focus { border-color: #7c3aed; }
    select { cursor: pointer; }
    .field-row { display: flex; gap: 0.75rem; }
    .field-row .field { flex: 1; margin-bottom: 0; }
    .hint { font-size: 0.75rem; color: #94a3b8; margin-top: 0.35rem; }
    /* ── Buttons ── */
    .btn {
      padding: 0.6rem 1.1rem; border: none; border-radius: 6px;
      font-size: 0.88rem; font-weight: 600; cursor: pointer;
      transition: background 0.15s, opacity 0.15s; font-family: inherit;
    }
    .btn-primary { background: #7c3aed; color: #fff; }
    .btn-primary:hover { background: #6d28d9; }
    .btn-secondary {
      background: #1a1d27; color: #94a3b8;
      border: 1px solid #2d3148;
    }
    .btn-secondary:hover { border-color: #94a3b8; color: #e2e8f0; }
    .btn-sm { padding: 0.35rem 0.7rem; font-size: 0.8rem; font-weight: 500; }
    .btn-row { display: flex; gap: 0.6rem; align-items: center; margin-top: 0.5rem; }
    /* ── Model list ── */
    .model-row {
      display: flex; align-items: center; justify-content: space-between;
      gap: 0.75rem; padding: 0.75rem 0.9rem;
      background: #0f1117; border: 1px solid #2d3148; border-radius: 8px;
      margin-bottom: 0.5rem;
    }
    .model-row.model-active { border-color: #7c3aed; background: #13102a; }
    .model-info { display: flex; flex-direction: column; gap: 0.2rem; min-width: 0; }
    .model-label { font-size: 0.9rem; font-weight: 600; color: #e2e8f0; }
    .model-name  { font-size: 0.75rem; color: #64748b; font-family: monospace; word-break: break-all; }
    .model-host  { font-size: 0.72rem; color: #475569; }
    .active-badge {
      display: inline-block; margin-left: 0.5rem;
      padding: 0.1rem 0.45rem; border-radius: 3px;
      background: #4c1d95; color: #c4b5fd;
      font-size: 0.68rem; font-weight: 600; text-transform: uppercase;
      vertical-align: middle;
    }
    .active-label { font-size: 0.8rem; color: #a78bfa; font-weight: 500; }
    .model-actions { display: flex; gap: 0.4rem; flex-shrink: 0; }
    .row-btn {
      padding: 0.3rem 0.65rem; border-radius: 5px; font-size: 0.78rem;
      font-weight: 500; cursor: pointer; font-family: inherit;
      border: 1px solid #2d3148; background: #1a1d27; color: #94a3b8;
      transition: border-color 0.15s, color 0.15s;
    }
    .row-btn:hover { border-color: #7c3aed; color: #a78bfa; }
    .row-btn.danger { color: #f87171; border-color: #2d3148; }
    .row-btn.danger:hover { border-color: #f87171; }
    .empty-note { font-size: 0.85rem; color: #475569; padding: 0.5rem 0; }
    /* ── Fetch models ── */
    #fetch-status { font-size: 0.8rem; color: #94a3b8; margin-top: 0.5rem; min-height: 1.2rem; }
    #fetch-status.ok    { color: #4ade80; }
    #fetch-status.err   { color: #f87171; }
    #model-select-wrap  { display: none; margin-top: 0.75rem; }
    /* ── Messages ── */
    .msg {
      font-size: 0.85rem; text-align: center;
      padding: 0.6rem 1rem; border-radius: 6px; margin-bottom: 1rem;
    }
    .msg.success { color: #4ade80; background: #052e16; border: 1px solid #166534; }
    .msg.error   { color: #f87171; background: #2d0a0a; border: 1px solid #7f1d1d; }
    /* ── Key hint ── */
    .key-status { font-size: 0.75rem; color: #94a3b8; margin-top: 0.35rem; }
  </style>
 </head>
 <body>
  <div class="page">
    <nav class="page-nav">
      <a href="/" class="nav-link">← Chat</a>
      <a href="/help" class="nav-link">Help</a>
      <a href="/settings" class="nav-link">Settings</a>
      <a href="/settings/local" class="nav-link active">Local Models</a>
      <span class="nav-spacer"></span>
      <a href="/logout" class="nav-link nav-logout">Sign out</a>
    </nav>
    <div class="page-header">
      <h1>Local Models</h1>
      <p>Configure your OpenAI-compatible host and models (Open WebUI, Ollama, LM Studio, etc.)</p>
    </div>
    <!-- SUCCESS -->
    <!-- ERROR -->
    <!-- ── Host ── -->
    <div class="section">
      <h2>Host</h2>
      <p style="font-size:0.82rem; color:#94a3b8; margin-bottom:1rem; line-height:1.55;">
        The API server that hosts your local models. Leave the key blank to keep the existing one.
      </p>
      <form method="POST" action="/settings/local/host">
        <input type="hidden" name="host_id" value="{{ host_id }}">
        <div class="field-row">
          <div class="field">
            <label for="host_label">Label</label>
            <input type="text" id="host_label" name="label"
                   value="{{ host_label }}" placeholder="e.g. Home ML Laptop"
                   autocomplete="off" data-form-type="other">
          </div>
          <div class="field" style="flex:2">
            <label for="host_url">API URL</label>
            <input type="text" id="host_url" name="api_url"
                   value="{{ host_url }}" placeholder="http://192.168.x.x:3000"
                   autocomplete="off" spellcheck="false" data-form-type="other">
          </div>
        </div>
        <div class="field">
          <label for="host_key">API Key</label>
          <input type="password" id="host_key" name="api_key"
                 placeholder="{{ host_key_hint }}"
                 autocomplete="new-password"
                 data-1p-ignore data-lpignore="true" data-form-type="other">
          <p class="key-status">Current: {{ host_key_hint }}</p>
        </div>
        <div class="btn-row">
          <button type="submit" class="btn btn-primary btn-sm">Save Host</button>
          <button type="button" id="fetch-btn" class="btn btn-secondary btn-sm"
                  {{ has_host == 'false' and 'disabled title="Save a host first"' or '' }}>
            Fetch models from host
          </button>
          <span id="fetch-status"></span>
        </div>
      </form>
    </div>
    <!-- ── Configured models ── -->
    <div class="section">
      <h2>Models</h2>
      {{ model_rows }}
    </div>
    <!-- ── Add model ── -->
    <div class="section" id="add-section"{{ add_section_hidden }}>
      <h2>Add Model</h2>
      <div id="model-select-wrap">
        <div class="field">
          <label for="model-picker">Available on host</label>
          <select id="model-picker">
            <option value="">— select a model —</option>
          </select>
        </div>
      </div>
      <form method="POST" action="/settings/local/models/add" id="add-form">
        <input type="hidden" name="host_id" value="{{ first_host_id }}">
        <div class="field-row">
          <div class="field">
            <label for="add-label">Label <span style="color:#475569; font-weight:400">(friendly name)</span></label>
            <input type="text" id="add-label" name="label"
                   placeholder="e.g. Qwen3 8B"
                   autocomplete="off" data-form-type="other">
          </div>
          <div class="field" style="flex:2">
            <label for="add-model-name">Model name</label>
            <input type="text" id="add-model-name" name="model_name"
                   placeholder="e.g. test-agent-simple"
                   autocomplete="off" spellcheck="false" data-form-type="other">
          </div>
        </div>
        <button type="submit" class="btn btn-primary btn-sm">Add Model</button>
      </form>
    </div>
  </div>
  <script>
    const fetchBtn    = document.getElementById('fetch-btn');
    const fetchStatus = document.getElementById('fetch-status');
    const picker      = document.getElementById('model-picker');
    const pickerWrap  = document.getElementById('model-select-wrap');
    const labelInput  = document.getElementById('add-label');
    const nameInput   = document.getElementById('add-model-name');
    if (fetchBtn) {
      fetchBtn.addEventListener('click', async () => {
        fetchBtn.disabled = true;
        fetchStatus.textContent  = 'Fetching…';
        fetchStatus.className    = '';
        try {
          const res  = await fetch('/api/local-llm/fetch-models');
          const data = await res.json();
          if (data.error) {
            fetchStatus.textContent = '✗ ' + data.error;
            fetchStatus.className   = 'err';
            return;
          }
          picker.innerHTML = '<option value="">— select a model —</option>';
          for (const m of data.models) {
            const opt   = document.createElement('option');
            opt.value       = m.id;
            opt.textContent = m.name !== m.id ? `${m.name}  (${m.id})` : m.id;
            opt.dataset.id  = m.id;
            opt.dataset.name = m.name;
            picker.appendChild(opt);
          }
          pickerWrap.style.display = 'block';
          fetchStatus.textContent  = `✓ ${data.models.length} model${data.models.length !== 1 ? 's' : ''} found`;
          fetchStatus.className    = 'ok';
        } catch (e) {
          fetchStatus.textContent = '✗ ' + e.message;
          fetchStatus.className   = 'err';
        } finally {
          fetchBtn.disabled = false;
        }
      });
    }
    // Auto-fill label + model name when a model is selected from the picker
    picker.addEventListener('change', () => {
      const opt = picker.options[picker.selectedIndex];
      if (!opt.value) return;
      nameInput.value  = opt.dataset.id  || opt.value;
      // Only pre-fill label if it looks different from the model id
      if (opt.dataset.name && opt.dataset.name !== opt.dataset.id) {
        labelInput.value = opt.dataset.name;
      } else {
        labelInput.value = '';
      }
      nameInput.focus();
    });
  </script>
 </body>
 </html>
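The fetch-models click handler above reads the JSON body without checking `res.ok`, and mixes response parsing with DOM updates. The sketch below shows one possible way to separate the two, assuming the `{ error }` / `{ models: [{ id, name }] }` response shape used in the handler. `summarizeModels` is a hypothetical helper, not part of this commit.

```javascript
// Hypothetical helper: converts the HTTP outcome of /api/local-llm/fetch-models
// into a status line plus option descriptors for the model picker.
function summarizeModels(ok, data) {
  if (!ok || !data || data.error) {
    const reason = (data && data.error) || 'request failed';
    return { className: 'err', text: '✗ ' + reason, options: [] };
  }
  const models = Array.isArray(data.models) ? data.models : [];
  const options = models.map(m => ({
    value: m.id,
    // Show "name  (id)" only when the host reports a distinct display name.
    label: m.name && m.name !== m.id ? `${m.name}  (${m.id})` : m.id,
  }));
  const plural = models.length !== 1 ? 's' : '';
  return { className: 'ok', text: `✓ ${models.length} model${plural} found`, options };
}
```

In the handler this could be called as `summarizeModels(res.ok, data)`, with the returned `text` and `className` applied to `#fetch-status` and one `<option>` created per entry in `options`.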
--- a/cortex/static/settings.html
+++ b/cortex/static/settings.html
@@ -241,7 +241,8 @@
          <label for="new_username">New username</label>
          <input type="text" id="new_username" name="new_username"
                 value="{{ username }}"
-                 pattern="[a-z_][a-z0-9_\-]{0,31}" required autofocus>
+                 pattern="[a-z_][a-z0-9_\-]{0,31}" required autofocus
                 autocomplete="off" data-form-type="other">
          <p style="font-size:0.75rem; color:#94a3b8; margin-top:0.3rem;">
            Lowercase letters, digits, _ or - only. You will be logged out after renaming.
          </p>
@@ -281,8 +282,9 @@
        <div class="field">
          <label for="gemini_api_key">API Key</label>
          <input type="text" id="gemini_api_key" name="gemini_api_key"
-                 placeholder="{{ gemini_key_hint }}" autocomplete="off"
-                 spellcheck="false" data-1p-ignore data-lpignore="true">
+                 placeholder="{{ gemini_key_hint }}"
+                 autocomplete="new-password" spellcheck="false"
                 data-1p-ignore data-lpignore="true" data-form-type="other">
        </div>
        <button type="submit">Save Key</button>
      </form>
@@ -294,6 +296,20 @@
      </p>
    </div>
    <!-- Local models link -->
    <div class="section">
      <h2>Local Models</h2>
      <p style="font-size:0.8rem; color:#94a3b8; margin-bottom:0.85rem; line-height:1.55;">
        Configure OpenAI-compatible hosts and models (Open WebUI, Ollama, LM Studio, etc.).
      </p>
      <a href="/settings/local"
         style="display:inline-block; padding:0.55rem 1rem; background:#7c3aed; border-radius:6px;
                color:#fff; font-size:0.88rem; font-weight:600; text-decoration:none;
                transition:background 0.15s;">
        Manage local models →
      </a>
    </div>
    <!-- Change password -->
    <div class="section">
      <h2>Change Password</h2>
--- a/cortex/static/style.css
+++ b/cortex/static/style.css
@@ -431,6 +431,8 @@
            padding: 0;
            font-size: 0.85em;
        }
        /* Syntax highlighting — app theme controls the pre background; hljs adds token colors */
        .message.assistant pre code.hljs { background: transparent; padding: 0; }
        .message.system {
            align-self: center;
@@ -440,6 +442,80 @@
            padding: 2px 0;
        }
        /* ── Tool call step cards (agent mode) ── */
        .tool-calls-container {
            display: flex;
            flex-direction: column;
            gap: 3px;
            margin: 4px 0 6px;
            align-self: stretch;
        }
        .tool-call {
            background: var(--surface);
            border: 1px solid var(--border);
            border-radius: 6px;
            overflow: hidden;
            font-size: 0.78rem;
        }
        .tool-call summary {
            display: flex;
            align-items: baseline;
            gap: 0.5rem;
            padding: 0.35rem 0.65rem;
            cursor: pointer;
            list-style: none;
            user-select: none;
            color: var(--muted);
        }
        .tool-call summary::-webkit-details-marker { display: none; }
        .tool-call summary::before {
            content: '▶';
            font-size: 0.55rem;
            color: var(--muted);
            transition: transform 0.12s;
            flex-shrink: 0;
        }
        .tool-call[open] summary::before { transform: rotate(90deg); }
        .tool-call summary:hover { color: var(--text); background: rgba(255,255,255,0.03); }
        .tc-name {
            font-weight: 600;
            color: var(--accent);
            font-family: 'Courier New', monospace;
        }
        .tc-snippet {
            color: var(--muted);
            overflow: hidden;
            text-overflow: ellipsis;
            white-space: nowrap;
            max-width: 36ch;
        }
        .tc-body {
            padding: 0 0.65rem 0.5rem;
            display: flex;
            flex-direction: column;
            gap: 0.4rem;
        }
        .tc-section { display: flex; flex-direction: column; gap: 2px; }
        .tc-label {
            font-size: 0.68rem;
            font-weight: 600;
            text-transform: uppercase;
            letter-spacing: 0.05em;
            color: var(--muted);
        }
        .tc-body pre {
            margin: 0;
            background: var(--pre-bg);
            border: 1px solid var(--border);
            border-radius: 4px;
            padding: 6px 8px;
            font-size: 0.78rem;
            white-space: pre-wrap;
            word-break: break-word;
            color: var(--text);
            overflow-x: auto;
        }
        .message.error {
            align-self: flex-start;
            background: var(--error-bg);
@@ -451,7 +527,7 @@
        .message.thinking { color: var(--muted); font-style: italic; }
        /* Copy button */
-        .message.assistant { position: relative; }
+        .message.assistant, .message.user { position: relative; }
        .copy-btn {
            display: inline-flex;
@@ -471,7 +547,8 @@
            transition: opacity 0.15s, color 0.15s, border-color 0.15s;
        }
-        .message.assistant:hover .copy-btn { opacity: 1; }
+        .message.assistant:hover .copy-btn,
        .message.user:hover .copy-btn { opacity: 1; }
        .copy-btn:hover  { color: var(--text); border-color: var(--muted); }
        .copy-btn.copied { color: var(--success); border-color: var(--success-dim); }
@@ -807,22 +884,12 @@
            flex-shrink: 0;
        }
        #file-modal-header select {
            background: var(--surface);
            border: 1px solid var(--border);
            border-radius: 5px;
            color: var(--text);
            font-size: 0.85rem;
            padding: 4px 8px;
            cursor: pointer;
        }
        #file-modal-title {
            font-size: 0.9rem;
            font-weight: 600;
            color: var(--accent);
            flex: 1;
        }
        .fm-spacer { flex: 1; }
        .fm-btn {
            background: var(--bg);
@@ -838,13 +905,153 @@
        .fm-btn.active { color: var(--accent); border-color: var(--accent); }
        .fm-btn.save   { color: var(--accent); border-color: var(--inara-border); }
        .fm-btn.save:hover { background: var(--inara-bg); }
-        #file-saved-msg {
-            font-size: 0.75rem;
-            color: #6abf6a;
-            opacity: 0;
-            transition: opacity 0.3s;
+        #file-modal-content {
+            flex: 1;
+            display: flex;
+            overflow: hidden;
+        }
        /* ── File sidebar ── */
        #file-sidebar-wrap {
            width: 190px;
            flex-shrink: 0;
            border-right: 1px solid var(--border);
            display: flex;
            flex-direction: column;
            background: var(--bg);
        }
        #file-sidebar {
            flex: 1;
            overflow-y: auto;
        }
        /* ── Session search (within sidebar) ── */
        #session-search-wrap {
            border-top: 1px solid var(--border);
            padding: 8px 8px 10px;
        }
        #session-search-label {
            font-size: 0.65rem;
            font-weight: 700;
            text-transform: uppercase;
            letter-spacing: 0.06em;
            color: var(--muted);
            margin-bottom: 5px;
        }
        #session-search-row {
            display: flex;
            gap: 4px;
        }
        #session-search-input {
            flex: 1;
            min-width: 0;
            background: var(--surface);
            border: 1px solid var(--border);
            border-radius: 4px;
            color: var(--text);
            font-size: 0.78rem;
            padding: 3px 6px;
        }
        #session-search-btn {
            background: var(--surface);
            border: 1px solid var(--border);
            border-radius: 4px;
            color: var(--muted);
            font-size: 0.78rem;
            padding: 3px 8px;
            cursor: pointer;
        }
        #session-search-btn:hover { color: var(--accent); border-color: var(--accent); }
        /* ── Session search results panel ── */
        #session-search-results {
            flex: 1;
            overflow-y: auto;
            padding: 12px 14px;
            font-size: 0.82rem;
        }
        .sr-header { color: var(--muted); font-size: 0.72rem; margin-bottom: 10px; }
        .sr-date {
            font-size: 0.7rem;
            font-weight: 700;
            text-transform: uppercase;
            letter-spacing: 0.05em;
            color: var(--accent);
            margin: 14px 0 4px;
        }
        .sr-date:first-of-type { margin-top: 0; }
        .sr-excerpt {
            background: var(--surface);
            border-left: 2px solid var(--border);
            border-radius: 0 4px 4px 0;
            padding: 6px 10px;
            margin-bottom: 6px;
            line-height: 1.5;
            white-space: pre-wrap;
            word-break: break-word;
            color: var(--text);
        }
        .sr-excerpt mark {
            background: rgba(139,92,246,0.25);
            color: var(--accent);
            border-radius: 2px;
            padding: 0 1px;
        }
        .sr-empty, .sr-error { color: var(--muted); padding: 8px 0; }
        .fg-header {
            display: flex;
            align-items: center;
            gap: 0.3rem;
            padding: 7px 10px 5px;
            font-size: 0.68rem;
            font-weight: 700;
            text-transform: uppercase;
            letter-spacing: 0.06em;
            color: var(--muted);
            cursor: pointer;
            user-select: none;
        }
        .fg-header::before {
            content: '▾';
            font-size: 0.7rem;
            transition: transform 0.15s;
        }
        .fg-header.collapsed::before { transform: rotate(-90deg); }
        .fg-header.collapsed + .fg-items { display: none; }
        .fg-items { display: flex; flex-direction: column; }
        .file-item {
            padding: 6px 10px 6px 16px;
            cursor: pointer;
            border-left: 2px solid transparent;
            transition: background 0.1s, border-color 0.1s;
        }
        .file-item:hover { background: var(--surface); }
        .file-item.active {
            background: var(--inara-bg);
            border-left-color: var(--accent);
        }
        .file-item.missing { opacity: 0.45; }
        .fi-name {
            font-size: 0.8rem;
            color: var(--text);
            font-weight: 500;
            white-space: nowrap;
            overflow: hidden;
            text-overflow: ellipsis;
        }
        .file-item.active .fi-name { color: var(--accent); }
        .fi-meta {
            display: flex;
            gap: 0.5rem;
            margin-top: 2px;
            font-size: 0.68rem;
            color: var(--muted);
        }
        #file-saved-msg.show { opacity: 1; }
        #file-modal-body {
            flex: 1;
@@ -938,6 +1145,11 @@
        .ctx-btn:hover    { color: var(--text); border-color: var(--muted); }
        .ctx-btn.active   { color: var(--accent); border-color: var(--accent); }
        .ctx-btn.mem-on   { color: var(--success); border-color: var(--success-dim); }
        .ctx-btn.local-on { color: #f59e0b; border-color: #92400e; }
        #backend-model-hint {
            font-size: 0.68rem; color: #f59e0b; opacity: 0.8;
            margin-top: 4px; word-break: break-all; line-height: 1.3;
        }
        #ctx-distill-status {
            margin-top: 6px;
@@ -1173,6 +1385,48 @@
        #auth-banner-close:hover { opacity: 1; }
        /* ── Toasts ──────────────────────────────────────────────── */
        #toast-container {
            position: fixed;
            bottom: 1.25rem;
            right: 1.25rem;
            display: flex;
            flex-direction: column;
            align-items: flex-end;
            gap: 0.4rem;
            z-index: 9999;
            pointer-events: none;
        }
        .toast {
            padding: 0.45rem 0.85rem;
            border-radius: 6px;
            font-size: 0.8rem;
            font-weight: 500;
            color: #fff;
            background: #334155;
            border: 1px solid #475569;
            box-shadow: 0 4px 12px rgba(0,0,0,0.35);
            opacity: 0;
            transform: translateY(6px);
            transition: opacity 0.18s ease, transform 0.18s ease;
            pointer-events: none;
            white-space: nowrap;
        }
        .toast.show { opacity: 1; transform: translateY(0); }
        .toast.success { background: #14532d; border-color: #16a34a; }
        .toast.error   { background: #7f1d1d; border-color: #dc2626; }
        /* Sessions backdrop — hidden by default, visible only as mobile drawer overlay */
        #sessions-backdrop {
            display: none;
            position: fixed;
            inset: 0;
            background: rgba(0, 0, 0, 0.5);
            z-index: 98;
            animation: backdrop-in 0.2s ease;
        }
        @keyframes backdrop-in { from { opacity: 0; } to { opacity: 1; } }
        /* ── Mobile responsive ───────────────────────────────────── */
        @media (max-width: 520px) {
            header { padding: 8px 12px; gap: 8px; }
@@ -1233,6 +1487,36 @@
            /* Larger touch targets */
            #send, #stop { padding: 12px 14px; font-size: 1rem; }
            /* File modal: sidebar collapses to a narrow strip */
            #file-modal-inner { width: 100vw; height: 100dvh; border-radius: 0; }
            #file-sidebar-wrap { width: 130px; }
            .fi-meta { display: none; }
            /* Sessions backdrop active on mobile */
            #sessions-backdrop.open { display: block; }
            /* Sessions panel → full-height drawer sliding in from the right */
            #sessions-panel {
                display: block !important; /* keep rendered so transition works */
                position: fixed;
                top: 0;
                right: 0;
                bottom: 0;
                width: min(300px, 85vw);
                max-height: none;
                height: 100%;
                border-radius: 0;
                border-top: none;
                border-right: none;
                border-bottom: none;
                border-left: 1px solid var(--border);
                transform: translateX(110%);
                transition: transform 0.25s ease;
                z-index: 99;
                overflow-y: auto;
            }
            #sessions-panel.open { transform: translateX(0); }
        }
        /* ── Touch devices — no hover capability ─────────────────── */
--- a/cortex/user_settings.py
+++ b/cortex/user_settings.py
@@ -0,0 +1,194 @@
 """
 Per-user settings stored in home/{user}/local_llm.json.
 Structure:
  {
    "hosts": [{"id", "label", "api_url", "api_key"}, ...],
    "models": [{"id", "host_id", "label", "model_name"}, ...],
    "active_model_id": "<model id>" | null
  }
 Values not configured here fall back to .env server defaults.
 """
 import json
 import logging
 import secrets
 from pathlib import Path
 from config import settings as app_settings
 logger = logging.getLogger(__name__)
 def _llm_path(username: str) -> Path:
    return app_settings.home_root() / username / "local_llm.json"
 def _empty() -> dict:
    return {"hosts": [], "models": [], "active_model_id": None}
 def _load(username: str) -> dict:
    path = _llm_path(username)
    if not path.exists():
        return _empty()
    try:
        data = json.loads(path.read_text())
    except (json.JSONDecodeError, OSError):
        logger.warning("local_llm.json for %s is unreadable — starting fresh", username)
        return _empty()
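    # Migrate old single-model format {api_url, api_key, model} → new format
    if "hosts" not in data:
        data = _migrate_v0(data)
        if data["hosts"]:
            # Persist the migration so the generated host/model IDs stay stable across loads
            _save(username, data)
    return data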
 def _migrate_v0(old: dict) -> dict:
    """Migrate flat {api_url, api_key, model} → hosts/models structure."""
    data = _empty()
    api_url    = old.get("api_url")    or app_settings.local_api_url
    api_key    = old.get("api_key")    or app_settings.local_api_key
    model_name = old.get("model")      or app_settings.local_model
    if not api_url:
        return data
    host_id = secrets.token_hex(4)
    data["hosts"].append({
        "id":      host_id,
        "label":   "Local Model Server",
        "api_url": api_url,
        "api_key": api_key,
    })
    if model_name:
        model_id = secrets.token_hex(4)
        data["models"].append({
            "id":         model_id,
            "host_id":    host_id,
            "label":      model_name,
            "model_name": model_name,
        })
        data["active_model_id"] = model_id
    logger.info("migrated local_llm.json v0 → v1 for user (host=%s)", host_id)
    return data
 def _save(username: str, data: dict) -> None:
    _llm_path(username).write_text(json.dumps(data, indent=2))
 # ── Public read API ───────────────────────────────────────────────────────────
 def get_config(username: str) -> dict:
    """Return the full local LLM config for the user."""
    return _load(username)
 def get_active_local_model(username: str) -> dict | None:
    """Return effective {api_url, api_key, model_name, label} for the active model.
    Resolution order:
      1. User's active model + its host config
      2. .env server defaults (LOCAL_API_URL / LOCAL_API_KEY / LOCAL_MODEL)
      3. None — caller should raise a helpful error
    """
    data = _load(username)
    active_id = data.get("active_model_id")
    model = next((m for m in data["models"] if m["id"] == active_id), None)
    if model:
        host = next((h for h in data["hosts"] if h["id"] == model["host_id"]), None)
        if host:
            return {
                "api_url":    host.get("api_url", ""),
                "api_key":    host.get("api_key", ""),
                "model_name": model["model_name"],
                "label":      model.get("label") or model["model_name"],
            }
    # Fall back to .env defaults
    if app_settings.local_api_url and app_settings.local_model:
        return {
            "api_url":    app_settings.local_api_url,
            "api_key":    app_settings.local_api_key,
            "model_name": app_settings.local_model,
            "label":      app_settings.local_model,
        }
    return None
 # ── Host management ───────────────────────────────────────────────────────────
 def save_host(username: str, host_id: str | None,
              label: str, api_url: str, api_key: str) -> str:
    """Create or update a host. Returns the host ID.
    api_key is only written when non-empty, so submitting a masked placeholder
    with a blank key field leaves the stored key unchanged.
    """
    data = _load(username)
    if host_id:
        for h in data["hosts"]:
            if h["id"] == host_id:
                h["label"]   = label.strip()
                h["api_url"] = api_url.strip()
                if api_key.strip():
                    h["api_key"] = api_key.strip()
                break
        else:
            host_id = None  # ID not found — fall through to create
    if not host_id:
        host_id = secrets.token_hex(4)
        data["hosts"].append({
            "id":      host_id,
            "label":   label.strip(),
            "api_url": api_url.strip(),
            "api_key": api_key.strip(),
        })
    _save(username, data)
    return host_id
 # ── Model management ──────────────────────────────────────────────────────────
 def add_model(username: str, host_id: str, label: str, model_name: str) -> str:
    """Add a model entry. Auto-activates if it is the first model. Returns the model ID."""
    data = _load(username)
    model_id = secrets.token_hex(4)
    data["models"].append({
        "id":         model_id,
        "host_id":    host_id,
        "label":      label.strip() or model_name.strip(),
        "model_name": model_name.strip(),
    })
    if not data.get("active_model_id"):
        data["active_model_id"] = model_id
    _save(username, data)
    return model_id
 def remove_model(username: str, model_id: str) -> None:
    """Remove a model. If it was active, the first remaining model (if any) becomes active."""
    data = _load(username)
    data["models"] = [m for m in data["models"] if m["id"] != model_id]
    if data.get("active_model_id") == model_id:
        data["active_model_id"] = data["models"][0]["id"] if data["models"] else None
    _save(username, data)
 def set_active_model(username: str, model_id: str) -> bool:
    """Set the active model. Returns False if the model ID is not found."""
    data = _load(username)
    if not any(m["id"] == model_id for m in data["models"]):
        return False
    data["active_model_id"] = model_id
    _save(username, data)
    return True
--- a/docs/OPEN_WEBUI_API.md
+++ b/docs/OPEN_WEBUI_API.md
@@ -0,0 +1,276 @@
 # Open WebUI API Reference for Cortex
 > Last updated: 2026-04-03
 > Source: https://docs.openwebui.com/reference/api-endpoints/
 > Host in use: `http://192.168.32.19:3000` (scott_gaming — 8 GB VRAM)
 ## Local Model Performance (scott_gaming, 8 GB VRAM)
 | Model | Alias | Speed | Practical Context | Spec Context |
 |---|---|---|---|---|
 | Gemma 4 E4B | `agent-support-gemma-small` | ~25 t/s | **72k tokens** | 128k |
 | Gemma 4 26B A4B (MoE) | `agent-support-gemma-medium` | ~9 t/s | **50k tokens** | 256k |
 Context is VRAM-constrained — spec limits are higher but KV cache fills available VRAM first.
 Ways to raise the practical limit: lower-precision KV cache quantization, flash attention, and context length tuning in Ollama.
 **Practical implications for the local orchestrator:**
 - System prompt + memory (T2) + tool results + history: budget ~40-50k for small, ~35-40k for medium
 - Medium at 9 t/s is fine for background/async tasks; small at 25 t/s is responsive enough for interactive use
 - Both are well above what's needed for most tool loop iterations (~2-5k tokens per round)
 ---
 ## Authentication
 All API calls use a bearer token:
 ```
 Authorization: Bearer sk-<api-key>
 ```
 API keys are managed in Open WebUI → Settings → Account → API Keys.
 Cortex stores these per-user in `home/{username}/local_llm.json` → `hosts[].api_key`.
 ---
 ## Core Endpoints Used by Cortex
 ### List Available Models
 ```
 GET /api/models
 Authorization: Bearer sk-...
 ```
 Returns all models (Ollama, OpenAI-proxied, custom functions).
 Used by `/api/local-llm/fetch-models` in `routers/local_llm.py`.
 Response shape:
 ```json
 {
  "data": [
    { "id": "gemma4-e4b", "name": "Gemma 4 E4B" },
    ...
  ]
 }
 ```
 ### Chat Completions (OpenAI-compatible)
 ```
 POST /api/chat/completions
 Authorization: Bearer sk-...
 Content-Type: application/json
 ```
 Standard OpenAI chat format. Supports:
 - `messages` — standard role/content array
 - `model` — model ID or workspace alias
 - `tools` + `tool_choice` — function calling (see Tool Loop below)
 - `stream: true/false`
 This is the endpoint used by `_local()` in `llm_client.py`.
 ### Anthropic Messages API Compatibility
 ```
 POST /api/v1/messages
 Authorization: Bearer sk-...
 ```
 Open WebUI also accepts Anthropic-format requests and auto-converts them.
 Could be used to route Claude SDK calls through Open WebUI.
 Base URL for this mode: `http://192.168.32.19:3000/api`
 ### Direct Ollama Proxy
 ```
 GET  /ollama/api/tags        — list models
 POST /ollama/api/generate    — streaming completions
 POST /ollama/api/embed       — generate embeddings
 ```
 Use these if you need to bypass Open WebUI's filter layer and hit Ollama directly.
 Ollama is also accessible directly at `http://192.168.32.19:11434`.
 ---
 ## Tool / Function Calling
 Both Gemma 4 models (E4B and 26B A4B) support function calling via the standard
 OpenAI `tools` parameter. Open WebUI passes this through to the underlying model.
 ### Request Format
 ```json
 POST /api/chat/completions
 {
  "model": "gemma4-26b-a4b",
  "messages": [
    { "role": "system", "content": "..." },
    { "role": "user",   "content": "What's the weather?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "web_search",
        "description": "Search the web for current information",
        "parameters": {
          "type": "object",
          "properties": {
            "query": { "type": "string", "description": "Search query" }
          },
          "required": ["query"]
        }
      }
    }
  ],
  "tool_choice": "auto"
 }
 ```
 ### Tool Call Response
 When the model wants to call a tool, it returns `finish_reason: "tool_calls"`:
 ```json
 {
  "choices": [{
    "finish_reason": "tool_calls",
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "web_search",
          "arguments": "{\"query\": \"current weather NYC\"}"
        }
      }]
    }
  }]
 }
 ```
 ### Sending Tool Results Back
 Append the assistant's tool_call message and a tool result message, then re-submit:
 ```json
 {
  "messages": [
    { "role": "user",      "content": "What's the weather?" },
    { "role": "assistant", "content": null,
      "tool_calls": [{ "id": "call_abc123", "function": { "name": "web_search", "arguments": "..." } }] },
    { "role": "tool",      "tool_call_id": "call_abc123",
      "content": "Current weather in NYC: 62°F, partly cloudy." }
  ],
  "tools": [...],
  "tool_choice": "auto"
 }
 ```
 Repeat until `finish_reason: "stop"`.
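The request/response cycle above can be sketched as a small loop. This is a minimal sketch under stated assumptions, not Cortex's implementation: `run_tool_loop`, `send`, and `handlers` are hypothetical names, and `send` stands in for whatever HTTP call posts the payload to `/api/chat/completions` and returns the parsed JSON.

```python
import json
from typing import Callable


def run_tool_loop(
    send: Callable[[dict], dict],            # hypothetical transport: payload in, parsed JSON out
    model: str,
    messages: list[dict],
    tools: list[dict],
    handlers: dict[str, Callable[..., str]],  # hypothetical map: tool name -> Python callable
    max_rounds: int = 10,
) -> str:
    """Re-submit the conversation until the model stops asking for tools."""
    messages = list(messages)  # work on a copy so the caller's list is untouched
    for _ in range(max_rounds):
        response = send({
            "model": model,
            "messages": messages,
            "tools": tools,
            "tool_choice": "auto",
        })
        choice = response["choices"][0]
        message = choice["message"]
        if choice.get("finish_reason") != "tool_calls":
            return message.get("content") or ""
        # Echo the assistant's tool-call message, then add one tool message per call.
        messages.append(message)
        for call in message.get("tool_calls", []):
            name = call["function"]["name"]
            args = json.loads(call["function"]["arguments"] or "{}")
            handler = handlers.get(name)
            result = handler(**args) if handler else f"Unknown tool: {name}"
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": result,
            })
    raise RuntimeError(f"tool loop did not finish within {max_rounds} rounds")
```

Passing the transport in as a callable keeps the loop testable without a running Open WebUI instance; the round cap guards against a model that never returns `finish_reason: "stop"`.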
 ---
 ## RAG (Retrieval Augmented Generation)
 ### Upload a File
 ```
 POST /api/v1/files/
 Authorization: Bearer sk-...
 Content-Type: multipart/form-data
 file=@/path/to/document.pdf
 ```
 Returns a file ID. Poll `/api/v1/files/{id}/process/status` until `completed`.
 ### Knowledge Collections
 ```
 POST /api/v1/knowledge/{collection_id}/file/add
 { "file_id": "..." }
 ```
 ### Use in Chat
 Reference files or knowledge collections in any chat request:
 ```json
 {
  "model": "gemma4-26b-a4b",
  "messages": [...],
  "files": [
    { "type": "file",       "id": "file-id" },
    { "type": "collection", "id": "collection-id" }
  ]
 }
 ```
 ### Process a Web URL into a Collection
 ```
 POST /api/v1/retrieval/process/web
 { "url": "https://example.com/article", "collection_id": "..." }
 ```
 ---
 ## Filter Behavior with Direct API Calls
 Open WebUI supports inlet/outlet filter pipelines. With direct API access:
 | Filter    | Runs automatically? |
 |-----------|---------------------|
 | `inlet()` | ✅ Yes              |
 | `stream()`| ✅ Yes              |
 | `outlet()`| ❌ Manual only — call `POST /api/chat/completed` after receiving response |
 For Cortex's use case (tool loop orchestration), this is not a concern — we're
 driving the loop ourselves and don't rely on Open WebUI's filter pipeline.
 ---
 ## Relevant Cortex Files
 | File | Purpose |
 |---|---|
 | `cortex/llm_client.py` — `_local()` | Current local backend (direct chat only) |
 | `cortex/routers/local_llm.py` | Local model settings page + fetch-models endpoint |
 | `cortex/user_settings.py` | Per-user host + model config (`local_llm.json`) |
 | `cortex/orchestrator_engine.py` | Gemini API tool loop — reference for local version |
 | `home/{user}/local_llm.json` | Stored host/model config |
 ---
 ## Planned: Local Orchestrator (`local_orchestrator_engine.py`)
 A local equivalent of `orchestrator_engine.py` that:
 1. Takes the same tool definitions already registered in `cortex/tools/`
 2. Converts them to OpenAI `tools` format (already close — minor schema diff from Gemini)
 3. Runs a ReAct loop against the local model via `/api/chat/completions`
 4. Falls back gracefully if the model doesn't return a valid tool call
 See `documentation/TODO__Agents.md` — `[Local] Tool-capable local orchestrator`.
 Model recommendation:
 - **Gemma 4 26B A4B** (256k ctx, MoE — fast for its size) for complex tool tasks
 - **Gemma 4 E4B** (128k ctx) for lightweight/fast tasks
 ---
 ## Notes
 - Open WebUI workspace aliases (e.g. `agent-support-gemma-small`) resolve to the
  underlying Ollama model — use aliases in Cortex for human-friendly model names.
 - `tool_choice: "auto"` lets the model decide; `"none"` forces plain text response;
  `{"type": "function", "function": {"name": "..."}}` forces a specific tool.
 - Gemma 4 models support combined tool use + reasoning (thinking tokens) — useful
  for complex multi-step tasks.
 - For embeddings (future RAG work), use `/ollama/api/embed` directly.
--- a/documentation/ARCH__BACKENDS.md
+++ b/documentation/ARCH__BACKENDS.md
@@ -0,0 +1,106 @@
 # Architecture: LLM Backends
 > How Cortex talks to AI models.
 > Last updated: 2026-04-03
 ---
 ## Three Backends
 | Backend | Used for | Auth | Config |
 |---|---|---|---|
 | **Claude CLI** | Primary chat, all user-facing responses | OAuth token from `~/.claude/.credentials.json` | `DEFAULT_MODEL` in `.env` |
 | **Gemini CLI** | Fallback when Claude unavailable | Gemini CLI credentials | Auto-fallback |
 | **Local (Open WebUI)** | Private/offline tasks, cost-free use | API key per user in `local_llm.json` | `/settings/local` UI |
 The **Gemini API** (google-genai SDK) is also used — but only by the orchestrator tool loop, not as a general chat backend. See [`ARCH__FUTURE.md`](ARCH__FUTURE.md) for the orchestrator pattern.
 ---
 ## Backend Selection
 The user cycles the backend in the UI: `claude → gemini → local`. The active backend is stored server-side; the UI reflects it with color coding (default / green / amber).
 When local is active, the active model name appears below the toggle button.
 **Fallback chain** (automatic, on error):
 ```
 claude  → gemini
 gemini  → claude
 local   → claude
 ```
 Auth expiry on Claude triggers a UI banner + `claude_auth_expired` SSE event.
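The fallback chain can be expressed as a lookup table plus a single retry. This is a minimal sketch, assuming each backend is a plain callable that raises on failure; `FALLBACK`, `complete_with_fallback`, and the `backends` mapping are hypothetical names, not the ones used in `llm_client.py`.

```python
from typing import Callable

# Fallback chain as documented above: each backend has exactly one fallback.
FALLBACK = {"claude": "gemini", "gemini": "claude", "local": "claude"}


def complete_with_fallback(
    backend: str,
    prompt: str,
    backends: dict[str, Callable[[str], str]],
) -> tuple[str, str]:
    """Try the selected backend; on any error, try its fallback once.

    Returns (backend_that_answered, response_text).
    """
    try:
        return backend, backends[backend](prompt)
    except Exception:
        fallback = FALLBACK[backend]
        # A failure here propagates: the chain is one hop, not a cycle.
        return fallback, backends[fallback](prompt)
```

Returning the name of the backend that actually answered lets the caller surface the switch in the UI.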
 ---
 ## Claude Backend (`_claude()`)
 Runs `claude --print --no-session-persistence --output-format text` as a subprocess.
 - System prompt passed via `--system-prompt`
 - Conversation history formatted as `<conversation>` block
 - Token read live from `~/.claude/.credentials.json` on every call — never relies on the env var, which goes stale after `claude auth login`
 - Model override via `--model` flag (e.g. `claude-opus-4-6`)
 Timeout: `TIMEOUT_CLAUDE=60` seconds (`.env`)
 ---
 ## Gemini CLI Backend (`_gemini()`)
 Runs `gemini --output-format text --extensions "" -p <prompt>` as a subprocess.
 - `--extensions ""` disables all MCP extensions — prevents child processes from keeping pipes open after responding
 - `start_new_session=True` puts the process in its own group for clean `os.killpg` on timeout
 - Output is cleaned to strip CLI noise lines (loading messages, retry notices, quota warnings)
 Timeout: `TIMEOUT_GEMINI=120` seconds (`.env`)
 ---
 ## Local Backend (`_local()`)
 HTTP POST to Open WebUI's OpenAI-compatible endpoint: `{api_url}/api/chat/completions`.
 Per-user config in `home/{user}/local_llm.json`:
 ```json
 {
  "hosts": [{"id": "...", "label": "scott_gaming", "api_url": "http://192.168.32.19:3000", "api_key": "sk-..."}],
  "models": [{"id": "...", "host_id": "...", "label": "Gemma 4 Small", "model_name": "agent-support-gemma-small"}],
  "active_model_id": "..."
 }
 ```
 Resolution order for active model:
 1. User's `active_model_id` in `local_llm.json`
 2. `.env` server defaults (`LOCAL_API_URL` / `LOCAL_API_KEY` / `LOCAL_MODEL`)
 3. Error — user is prompted to configure at `/settings/local`
 Timeout: `TIMEOUT_LOCAL=300` seconds (`.env`) — local models may need to load from disk.
 **Manage at:** `/settings/local` — supports multiple hosts and models per user, "Fetch from host" button to populate model list from the server.
 ---
 ## Distillation Backends
 Memory distillation runs on a schedule and uses the LLM for the mid and long distill passes. By default it uses the primary backend (`claude`). Override in `.env`:
 ```
 DISTILL_BACKEND_MID=local   # saves API credits — Gemma handles summarization well
 DISTILL_BACKEND_LONG=       # empty = use primary (claude recommended for quality)
 ```
 ---
 ## Current Local Models (scott_gaming, 8 GB VRAM)
 | Model | Alias | Speed | Practical Context |
 |---|---|---|---|
 | Gemma 4 E4B | `agent-support-gemma-small` | ~25 t/s | **72k tokens** |
 | Gemma 4 26B A4B (MoE) | `agent-support-gemma-medium` | ~9 t/s | **50k tokens** |
 Both support OpenAI `tools` / `tool_choice` function calling — required for the local orchestrator.
 Full Open WebUI API reference: [`docs/OPEN_WEBUI_API.md`](../docs/OPEN_WEBUI_API.md)
--- a/documentation/ARCH__CHANNELS.md
+++ b/documentation/ARCH__CHANNELS.md
@@ -0,0 +1,149 @@
 # Architecture: Input Channels
 > How messages reach Cortex and how Cortex reaches back.
 > Last updated: 2026-04-03
 ---
 ## Channel Summary
 | Channel | Direction | Auth | Endpoint |
 |---|---|---|---|
 | Web UI | In + Out | JWT session cookie | `/{user}/{persona}` |
 | Nextcloud Talk | In + Out | HMAC-SHA256 | `POST /webhook/nextcloud/{username}` |
 | Google Chat | In + Out | JWT (Google system token) | `POST /channels/google-chat/{username}` |
 | Cron | Out (proactive) | Internal | APScheduler |
 | Webhooks | In (future) | TBD | `POST /webhook/{source}` |
 **Per-user config:** Each channel that needs secrets (NC Talk bot key, Google Chat audience) stores them in `home/{username}/channels.json`. No channel access by default — each user sets up their own.
 ---
 ## Web UI
 Single-page app served from `cortex/static/`. All chat happens via `POST /chat` (streaming SSE for real-time response) or `POST /orchestrate` (async job, polled).
 **Session auth:** Login form (`/login`) → bcrypt password check → JWT cookie (30-day expiry). Google OAuth also available (`/auth/google`). All non-public routes require a valid cookie.
 **Modes:**
 - **Direct** — message goes straight to LLM via `/chat`
 - **Agent** — message goes to orchestrator (`/orchestrate`), tool loop runs, result polled and streamed into UI
 **Context + Memory panel:** Shows the current backend (claude/gemini/local), memory tier, and active local model. The backend toggle cycles claude → gemini → local.
 **Files panel:** Browse and edit persona markdown files in-browser. Session search at the bottom.
 **Settings:** `/settings` — Gemini API key, Google account, connected status. `/settings/local` — local model hosts and models.
 ---
 ## Nextcloud Talk
 Bot integration. The bot is registered in a Talk room; it receives messages, generates a response, and sends it back via the NC Talk bot API.
 **Incoming:** `POST /webhook/nextcloud/{username}`
 - Signature verified: `HMAC-SHA256(secret, random + raw_body)`
 - Ignores non-Create events and non-Note types
 - Strips `@{persona}` mention prefix from message text
 - Processes in background task (immediate 200 response to NC Talk)
 **Outgoing:** Bot API `POST /ocs/v2.php/apps/spreed/api/v1/bot/{room}/message`
 - Signature: `HMAC-SHA256(secret, random + message_text)` — note: message text, not body
 - Logic lives in `notification.py` (`_send_nct_message`) — shared with proactive notifications
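Both directions use the same construction and differ only in what is signed: the raw request body for incoming webhooks, the message text for outgoing bot messages. A minimal standard-library sketch follows; `sign` and `verify` are hypothetical helper names, and the hex encoding of the digest is an assumption to confirm against the Nextcloud Talk bot documentation.

```python
import hashlib
import hmac


def sign(secret: str, random: str, payload: bytes) -> str:
    """HMAC-SHA256 over random + payload, hex-encoded (encoding is an assumption)."""
    mac = hmac.new(secret.encode(), random.encode() + payload, hashlib.sha256)
    return mac.hexdigest()


def verify(secret: str, random: str, payload: bytes, signature: str) -> bool:
    """Compare in constant time against the signature supplied by the sender."""
    return hmac.compare_digest(sign(secret, random, payload), signature.lower())
```

For an incoming webhook, pass the unparsed request body as `payload`; for an outgoing message, pass `message_text.encode()`. Signing re-serialized JSON instead of the raw bytes is a common cause of signature mismatches.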
 **Proactive notifications:** Set `notification_room` in `channels.json` → `nextcloud`. Used by distill completion alerts and `message`/`brief` cron jobs.
 **Per-user config (`channels.json`):**
 ```json
 {
  "nextcloud": {
    "persona": "inara",
    "url": "https://cloud.dgrzone.com",
    "bot_secret": "...",
    "notification_room": "<room-token>",
    "timeout": 55
  }
 }
 ```
 Full setup guide: [`docs/NEXTCLOUD_TALK_BOT.md`](../docs/NEXTCLOUD_TALK_BOT.md)
 ---
 ## Google Chat
 Workspace Add-on. Messages arrive as HTTP POST from Google's infrastructure; the handler returns a JSON response synchronously (no background task — Google expects an immediate reply).
 **Incoming:** `POST /channels/google-chat/{username}`
 - Auth: JWT in `authorizationEventObject.systemIdToken`, verified against Google's JWKS
 - Response format: `hostAppDataAction.chatDataAction.createMessageAction`
 **Per-user config (`channels.json`):**
 ```json
 {
  "google_chat": {
    "persona": "inara",
    "audience": "https://cortex.dgrzone.com/channels/google-chat/scott",
    "backend": "claude",
    "timeout": 25
  }
 }
 ```
 Full setup guide: [`docs/GOOGLE_CHAT_BOT.md`](../docs/GOOGLE_CHAT_BOT.md)
 ---
 ## Cron / Proactive Messages
 User-defined scheduled jobs stored in `home/{user}/persona/{name}/CRONS.json`. Registered at startup by `scheduler.py`; manageable via the `cron_*` orchestrator tools.
 **Job types:**
 | Type | What happens |
 |---|---|
 | `remind` | Appends to `REMINDERS.md` — surfaced in context at tier 2+ |
 | `note` | Appends to `SCRATCH.md` — read on demand |
 | `message` | Sends payload text to user's notification channel |
 | `brief` | Runs LLM with payload as prompt, sends response to notification channel |
 **`brief` example — morning briefing:**
 ```json
 {
  "label": "Morning briefing",
  "schedule": "daily:08:00",
  "type": "brief",
  "payload": "Give Scott a brief good morning. Note any pending reminders or tasks due today.",
  "enabled": true
 }
 ```
 **Channel selection for `message`/`brief`:**
 1. `channel` field on the job (if set)
 2. `notification_channel` key in `channels.json`
 3. Auto-detect: uses `nextcloud` if configured
 **Schedule formats:** `hourly` | `daily` | `daily:HH:MM` | `weekly:DOW` | `weekly:DOW:HH:MM`
 ---
 ## Notification Channel Config
 `notification_channel` in `channels.json` sets the default outbound channel for all proactive messages (distill alerts, cron message/brief jobs):
 ```json
 {
  "notification_channel": "nextcloud",
  ...
 }
 ```
 If absent, it defaults to `nextcloud` when that channel is configured. Currently only NC Talk is supported for outbound; Google Chat outbound is a future item.
 ---
 ## Future Channels
 - **WhatsApp** — Business API or bridge (not started; needs account)
 - **Gitea webhooks** — push/PR/issue events → orchestrator (router pattern exists; add `gitea.py`)
 - **Aether platform events** — trigger agent actions from business data changes
--- a/documentation/ARCH__FUTURE.md
+++ b/documentation/ARCH__FUTURE.md
@@ -0,0 +1,192 @@
 # Architecture: Planned Features
 > What's next and how it's designed to work.
 > Last updated: 2026-04-04
 For the current task list see `TODO__Agents.md`. For phases and priorities see `ROADMAP.md`.
 ---
 ## 1. Local Orchestrator
 **Status:** High priority — design complete, not yet built.
 Same ReAct tool loop as the Gemini API orchestrator, but driven by a local model via Open WebUI's OpenAI-compatible API. Enables offline/private agent tasks with no API cost.
 **Why local models work for this now:** Gemma 4 E4B and 26B A4B both support OpenAI `tools` / `tool_choice` function calling. The tool schema is nearly identical to Gemini's `FunctionDeclaration` — minor field renaming only.
 **Design:**
 ```
 POST /orchestrate  (mode: "local")
    ↓
 local_orchestrator_engine.py
    • converts tools/ to OpenAI tools format
    • POST /api/chat/completions with tools array
    • parse tool_calls response
    • execute tool, append result
    • loop until finish_reason: "stop"
    ↓
 response returned (local model generates final answer)
 ```
 Model selection:
 - **Gemma 4 E4B** (25 t/s, 72k ctx) — interactive/fast tasks
 - **Gemma 4 26B A4B** (9 t/s, 50k ctx) — heavier reasoning, background tasks
 Context budget per iteration (system prompt + memory + tool results + history):
 - Small model: budget ~40-50k tokens per round
 - Medium model: budget ~35-40k tokens per round
 Full API reference: [`docs/OPEN_WEBUI_API.md`](../docs/OPEN_WEBUI_API.md)
 ---
 ## 2. Dev Agent Pipeline
 **Status:** Design complete, not yet built.
 Accept a plain-English task, implement code changes, verify them, and present for human approval before committing.
 ```
 Task (chat / Gitea issue / Kanban)
    ↓
 Orchestrator — reads relevant files, routes to specialist
    ↓
 Specialist Agent (Claude CLI in project directory)
    • implements the change
    • runs self-check: py_compile / svelte-check
    ↓
 Supervisor Agent
    • reviews the diff
    • runs test suite
    • returns: PASS / NEEDS_REVIEW / FAIL + reason
    ↓
 Human approval gate
    • summary in Cortex UI or NC Talk
    • approve → commit (+ optional push)
    • reject → feedback back to specialist
 ```
 **Specialists** (both Claude CLI):
 - **Frontend** — working dir: `~/OSIT_dev/aether_app_sveltekit/` — runs `svelte-check` after every change
 - **Backend** — working dir: `~/OSIT_dev/aether_api_fastapi/` — runs `py_compile` + unit tests
 **Supervisor** returns structured JSON:
 ```json
 {
  "verdict": "PASS | NEEDS_REVIEW | FAIL",
  "checks_passed": ["py_compile"],
  "checks_failed": [],
  "review_notes": "...",
  "commit_message": "..."
 }
 ```
 ---
 ## 3. Gitea Integration
 **Status:** Not started. pfSense port forward for SSH already confirmed working.
 - **Webhooks → Cortex:** push/PR/issue events → `POST /webhook/gitea` → orchestrator
  - Router pattern already established; add `cortex/routers/gitea.py`
 - **Gitea Actions CI:** `.gitea/workflows/check.yml` — run `py_compile`/`svelte-check` on push
 - **Cortex → Gitea:** after human approval, call Gitea API to create PR or push branch
 SSH clone/push: `git clone ssh://git@git.dgrzone.com:2222/<user>/<repo>.git`
 ---
 ## 4. Knowledge Layer (AE Journals)
 **Status:** Tools exist, import script not yet built.
 AE Journals becomes the searchable long-term knowledge base. Complements memory distillation: memory files cover "what have we been working on lately"; Journals cover "what do I know about topic X".
 **Existing tools:** `ae_journal_search`, `ae_journal_entry_create` — already in orchestrator tool suite.
 **Import script (to build):**
 - Walk a markdown directory (Nextcloud, agents_sync docs)
 - Chunk by H2 section
 - Search before creating (deduplication)
 - Tag from frontmatter, filename, directory path
 - Target sources: `~/DgrZone_Nextcloud/`, `~/OSIT_Nextcloud/`
 **Agent workflow:**
 ```
 "Summarize my notes on WireGuard setup"
    → orchestrator calls ae_journal_search("wireguard")
    → returns matching entries
    → Claude synthesizes response
 ```
 ---
 ## 5. Intelligent Model Routing
 **Status:** Deferred. Currently user-toggled.
 Route automatically based on task characteristics rather than requiring manual backend selection:
 | Task type | Backend | Reason |
 |---|---|---|
 | User-facing conversation | Claude | Quality prose, persona fidelity |
 | Tool use / orchestration | Gemini API | Native function calling, free tier |
 | Private / sensitive / offline | Local (Ollama) | No data leaves the network |
 | Long context (>50k tokens) | Gemini 2.0 | 1M token context window |
 | Fast/cheap simple queries | Local (E4B) | 25 t/s, no API cost |
 Routing logic would live in `llm_client.py` or a new `router.py` — map task metadata to backend choice.
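
A minimal sketch of that mapping, assuming a small metadata record per task. `TaskMeta`, `choose_backend`, and the precedence order (privacy first, then context length, then tools) are illustrative assumptions; only the table rows come from this document.

```python
from dataclasses import dataclass

LONG_CONTEXT_TOKENS = 50_000  # threshold taken from the table above


@dataclass(frozen=True)
class TaskMeta:
    """Hypothetical task metadata; field names are assumptions."""
    needs_tools: bool = False
    private: bool = False
    simple: bool = False
    est_tokens: int = 0


def choose_backend(task: TaskMeta) -> str:
    """Map task metadata to a backend name, checking the strictest rule first."""
    if task.private:
        return "local"   # no data leaves the network
    if task.est_tokens > LONG_CONTEXT_TOKENS:
        return "gemini"  # large context window
    if task.needs_tools:
        return "gemini"  # native function calling
    if task.simple:
        return "local"   # fast, no API cost
    return "claude"      # default: user-facing conversation
```

Putting the privacy check first means a private task never leaves the network, even when it is long or needs tools.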
 ---
 ## 6. RAG via Open WebUI
 **Status:** Future — Open WebUI already supports it.
 Feed Nextcloud documents or session logs into Open WebUI knowledge collections. Reference them in local model chat via `"files": [{"type": "collection", "id": "..."}]`.
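
As a rough illustration, a chat request that references a knowledge collection might be assembled like this. The `files` entry follows the shape quoted above; the helper name, the model id, and the collection id are placeholders, and the exact request format should be checked against `docs/OPEN_WEBUI_API.md`.

```python
import json


def build_rag_chat_payload(model: str, question: str, collection_id: str) -> dict:
    """Build an OpenAI-style chat body that attaches one knowledge collection."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        # Shape taken from this document; the id is whatever Open WebUI assigned.
        "files": [{"type": "collection", "id": collection_id}],
    }


if __name__ == "__main__":
    payload = build_rag_chat_payload(
        "example-local-model",        # placeholder model id
        "What do my notes say about WireGuard?",
        "example-collection-id",      # placeholder collection id
    )
    print(json.dumps(payload, indent=2))
```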
 Would complement AE Journals for local-only contexts where data shouldn't leave the network.
 API reference: [`docs/OPEN_WEBUI_API.md`](../docs/OPEN_WEBUI_API.md) — RAG section.
 ---
 ## 8. Agent Architecture Ideas (from Claude Code leak)
 **Status:** Research — review before building dev agent pipeline and orchestrator.
 The Claude Code system prompt was leaked in early April 2026. Two reimplementation repos are worth reading for design ideas before building out the dev agent pipeline and local orchestrator:
 - https://github.com/HarnessLab/claw-code-agent — Python reimplementation targeting local models (Qwen3-Coder recommended); most technically detailed
 - https://github.com/ultraworkers/claw-code — Community porting/reverse-engineering project; reportedly has interesting detail in the source code itself
 **Ideas worth incorporating:**
 **Tiered permission architecture** — explicit read-only / write / shell / unsafe modes, each requiring an opt-in flag. Currently Cortex has implicit trust for agent operations. Relevant once the dev agent pipeline is writing and executing code — don't want a `brief` cron job accidentally in write mode.
 **Agent lineage tracking** — agent manager records which agent spawned which sub-agent. Useful for debugging multi-step orchestrated tasks and essential for the supervisor → specialist → approval gate chain.
 **Cost/budget enforcement** — hard token and cost budgets per operation, multiple budget types. `ORCHESTRATOR_MAX_ROUNDS=10` is Cortex's only guardrail today. Worth adding a token budget check to the tool loop, especially relevant for local models with hard context ceilings (72k/50k practical).
 **Context compaction/snipping** — automatic mid-session context trimming when approaching limits. Important for long orchestrator runs against local models. Could trim tool results that are no longer needed for the current reasoning step.
 **Nested agent delegation with dependency-aware batching** — sub-agents that know their parent; parallel sub-tasks batched by dependency order. Directly applicable to the dev agent pipeline (orchestrator → specialist → supervisor, with some steps parallelizable).
 **File history journaling** — beyond session logs, a journal of what files changed and why, with replay summaries. Different from memory distillation — more like a git log for agent actions. Could complement the supervisor agent's diff review.
 **Plugin/manifest-based tool extensions** — tools declared via manifest rather than hardcoded in `__init__.py`. Would make adding new orchestrator tools less invasive. Worth considering before the tool suite grows much larger.
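
To make the cost/budget idea concrete, here is a hedged sketch of a budget object the tool loop could charge once per round. `LoopBudget`, `BudgetExceeded`, and the 4-characters-per-token estimate are assumptions for illustration, not existing Cortex code.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a tool loop runs past its round or token budget."""


def estimate_tokens(text: str) -> int:
    # Crude heuristic (about 4 characters per token); a real tokenizer would be better.
    return max(1, len(text) // 4)


@dataclass
class LoopBudget:
    max_tokens: int
    max_rounds: int = 10  # mirrors ORCHESTRATOR_MAX_ROUNDS=10
    used_tokens: int = 0
    rounds: int = 0

    def charge(self, text: str) -> None:
        """Record one round and the tokens it added; raise once either cap is passed."""
        self.rounds += 1
        self.used_tokens += estimate_tokens(text)
        if self.rounds > self.max_rounds:
            raise BudgetExceeded(f"round limit {self.max_rounds} exceeded")
        if self.used_tokens > self.max_tokens:
            raise BudgetExceeded(
                f"token budget {self.max_tokens} exceeded ({self.used_tokens} used)"
            )
```

For a local model, `max_tokens` would sit somewhat below the practical context ceiling so that the final response still fits.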
 ---
 ## 7. Permanent Fleet Hosting
 **Status:** Deferred.
 Currently running on `scott_lpt` (main laptop). Long-term target: home server (always-on, Docker).
 `docker-compose.yml` already exists in the project root. Deployment path:
 1. Copy to home server
 2. Configure reverse proxy (Nginx, already Docker-hosted)
 3. Set subdomain `cortex.dgrzone.com` → home server internal IP
 4. WireGuard required for all access — not internet-exposed
--- a/documentation/ARCH__Intelligence_Layer.md
+++ b/documentation/ARCH__Intelligence_Layer.md
@@ -1,306 +1,14 @@
-# Architecture: Intelligence Layer
+# ARCH__Intelligence_Layer.md — Archived
-**Status:** Design phase — not yet implemented
+This document has been split into focused per-topic docs.
 **Last updated:** 2026-03-18
-This document captures the architectural thinking behind expanding Cortex from a smart dispatcher into a genuine intelligence layer: capable of using tools, coordinating specialist agents, and managing a personal knowledge base.
+| What you're looking for | New location |
 |---|---|
 | Overall architecture, design decisions | [`ARCH__SYSTEM.md`](ARCH__SYSTEM.md) |
 | Orchestrator/Responder pattern, tool loop | [`ARCH__FUTURE.md`](ARCH__FUTURE.md) — section 1 |
 | Dev agent pipeline, supervisor agent | [`ARCH__FUTURE.md`](ARCH__FUTURE.md) — section 2 |
 | Knowledge layer, AE Journals import | [`ARCH__FUTURE.md`](ARCH__FUTURE.md) — section 4 |
 | LLM backends and routing | [`ARCH__BACKENDS.md`](ARCH__BACKENDS.md) |
 | Model routing (future) | [`ARCH__FUTURE.md`](ARCH__FUTURE.md) — section 5 |
---
+*Original content written 2026-03-18. Superseded 2026-04-03.*
 ## Overview
 Cortex currently dispatches chat messages to LLM CLI backends and returns the response. The Intelligence Layer adds three major capabilities on top of that foundation:
 1. **Orchestrator/Responder** — Gemini handles tool use and planning; Claude handles the user-facing response
 2. **Dev Agent Pipeline** — Specialist agents implement code changes; a supervisor checks the work
 3. **Knowledge Layer** — AE Journals becomes the primary knowledge base; agents can read and write it
 These are independent tracks that share the same trigger layer and can be built incrementally.
 ---
 ## 1. Orchestrator / Responder Pattern
 ### The Problem
 Claude CLI (via Pro subscription) doesn't expose direct API tool-calling. Gemini API (free tier) does. But Claude produces higher-quality user-facing prose and reasoning. The solution is to use each model for what it does best.
 ### The Pattern
 ```
 User message
    ↓
 Orchestrator (Gemini API)
    • interprets intent
    • decides which tools to call
    • executes tool loop (ReAct: reason → act → observe → repeat)
    • assembles enriched context + tool results
    ↓
 Responder (Claude CLI)
    • receives enriched context
    • writes the user-facing response
    ↓
 User
 ```
 For **direct chat** (no tools needed), the orchestrator is bypassed entirely — message goes straight to Claude. The orchestrator only activates when tools are required or when explicitly invoked (e.g., a background task).
 ### Why Gemini API (not CLI)?
 - Gemini CLI is a subprocess; function calling via subprocess is fragile
 - Gemini API (`google-generativeai` SDK) has native structured tool-calling
 - Free tier (Gemini 2.0 Flash) handles orchestration load without cost
 - Access token is short-lived but auto-refreshed by the SDK (no expiry problem)
 ### Tool Strategy
 Tools for the orchestrator are **separate** from the existing `ae_*` MCP tools. The ae_* tools are stable and used by existing agents — do not modify them.
 New orchestrator tools are Python functions wrapped in Gemini function declarations:
 | Tool | What it does | Implementation |
 |---|---|---|
 | `web_search` | DuckDuckGo search | `duckduckgo-search` library |
 | `ae_journal_search` | Search AE Journals via V3 API | HTTP to AE API |
 | `ae_journal_entry_create` | Write a new journal entry | HTTP to AE API |
 | `ae_task_list` | Read Kanban tasks | HTTP to AE API or agents_sync file |
 | `file_read` | Read a file from known safe paths | Python `pathlib` |
 | `gitea_api` | Query Gitea repos, issues, PRs | Gitea REST API |
 Tools are registered in `cortex/tools/` (one file per domain group).
 ### Implementation Path
 ```
 cortex/
  tools/
    __init__.py          — tool registry
    web.py               — web_search
    ae_knowledge.py      — ae_journal_* tools
    ae_tasks.py          — task tools
    gitea.py             — Gitea API tools
  routers/
    orchestrator.py      — POST /orchestrate, GET /orchestrate/{job_id}
  orchestrator_engine.py — Gemini tool loop + Claude handoff
 ```
 Endpoint contract:
 ```
 POST /orchestrate
 {
  "task": "What tasks are due this week and summarize my notes on X topic",
  "session_id": "optional — if part of an ongoing conversation",
  "respond_with_claude": true   // false = return Gemini's assembled context only
 }
 → { "job_id": "uuid", "status": "queued" }
 GET /orchestrate/{job_id}
 → { "status": "complete", "result": "...", "tool_calls": [...] }
 ```
 ---
 ## 2. Trigger Layer
 All three capabilities (chat, orchestration, dev agents) share the same trigger layer:
 ```
 ┌────────────────────────────────────────────────┐
 │  TRIGGERS                                      │
 │                                                │
 │  Chat UI  →  POST /chat  (existing)            │
 │  Cron     →  POST /orchestrate  (new)          │
 │  Gitea    →  POST /webhook/gitea  (new)        │
 │  NC Talk  →  POST /webhook/nextcloud  (exists) │
 │  Manual   →  CLI / curl for debugging          │
 └────────────────────────────────────────────────┘
 ```
 Cron trigger example (from existing cron infrastructure):
 ```bash
 curl -X POST http://localhost:8000/orchestrate \
  -H "Content-Type: application/json" \
  -d '{"task": "Check for overdue Kanban tasks and notify via NC Talk"}'
 ```
 This means the same orchestrator endpoint is usable from chat, crons, and webhooks without any special cases.
 ---
 ## 3. Dev Agent Pipeline
 ### The Goal
 Accept a plain-English task like *"Fix the bug where X, add a test for it"* and produce:
 - A working code change
 - Passing syntax/type checks
 - A summary of what changed and what still needs human review
 - A commit ready to push (pending approval)
 ### Architecture
 ```
 Task request (chat / Gitea issue / Kanban)
    ↓
 Orchestrator
    • reads relevant files (context gathering)
    • routes to correct specialist
    ↓
 Specialist Agent (Claude CLI in project directory)
    • implements the change
    • runs self-check: py_compile / svelte-check
    ↓
 Supervisor Agent
    • reviews the diff
    • runs test suite
    • returns: PASS / NEEDS_REVIEW / FAIL + reason
    ↓
 Human approval gate
    • summary shown in Cortex UI or NC Talk
    • user approves → commit + optional push
    • user rejects → feedback goes back to specialist
 ```
 ### Specialist Agents
 Two initial specialists, both using Claude CLI:
 **Frontend specialist** (working dir: `~/OSIT_dev/aether_app_sveltekit/`):
 - Reads `documentation/TODO__Agents.md` and `CLAUDE.md` before acting
 - Runs `npx svelte-check` after every change — no exceptions
 - Atomic commits (one component or fix per commit)
 **Backend specialist** (working dir: `~/OSIT_dev/aether_api_fastapi/`):
 - Reads `documentation/TODO__Agents.md` and `CLAUDE.md` before acting
 - Runs `python3 -m py_compile` after every file edit
 - Runs unit tests before declaring done
 - Flags E2E tests that need human review
 ### Supervisor Agent
 The supervisor is a separate Claude invocation that receives:
 - The diff of all changed files
 - Stdout/stderr from all checks that were run
 - The original task description
 It returns a structured assessment:
 ```json
 {
  "verdict": "PASS | NEEDS_REVIEW | FAIL",
  "checks_passed": ["py_compile", "unit_tests"],
  "checks_failed": [],
  "review_notes": "E2E tests not run — touch auth router, recommend manual check",
  "commit_message": "fix: correct session token validation in auth middleware"
 }
 ```
 ### Gitea Integration
 - **Gitea webhooks → Cortex:** Push/PR events trigger supervisor review automatically
 - **Gitea Actions:** Run `py_compile`/`svelte-check` on every push (simple CI, no custom runner)
 - **Cortex → Gitea:** After human approval, supervisor calls Gitea API to create PR or push
 Gitea Actions are simpler than they sound — a `.gitea/workflows/check.yml` is just a YAML file that runs shell commands on push. No external CI infrastructure needed.
 ---
 ## 4. Knowledge Layer
 ### The Goal
 AE Journals becomes the primary source of truth for personal and business knowledge. Notes, documentation, and logs that currently live scattered across markdown files get organized into Journals with proper structure, search, and agent-accessible read/write.
 ### Import Strategy
 1. **Don't bulk-import blindly.** The orchestrator searches AE Journals before creating anything (deduplication).
 2. **Chunk by section.** A large markdown file becomes multiple journal entries — one per H2 section.
 3. **Preserve provenance.** Each imported entry includes source path, import date, and original file date in its `data_json` or notes.
 4. **Tag intelligently.** Tags come from: frontmatter, filename keywords, directory path, and content analysis.
 ### Source Priority
 | Source | Priority | Notes |
 |---|---|---|
 | `~/DgrZone_Nextcloud/` | High | Personal notes, projects |
 | `~/OSIT_Nextcloud/` | High | Business docs |
 | `~/agents_sync/aether/docs/` | Medium | Platform specs (already structured) |
 | OpenClaw session logs | Low | Historical, lots of noise |
 ### Agent Workflow
 ```
 "Summarize my notes on WireGuard setup"
    ↓
 Orchestrator calls ae_journal_search("wireguard")
    ↓
 Returns matching entries
    ↓
 Claude synthesizes a response
 ```
 ```
 "Save this as a note in my DgrZone journal"
    ↓
 Orchestrator calls ae_journal_entry_create(
    journal="DgrZone General",
    title="...",
    content="...",
    tags=["note", "wireguard"]
 )
 ```
 ### Context Tiers (Inara Memory)
 The existing distill system (`MEMORY_SHORT.md`, `MEMORY_MID.md`, `MEMORY_LONG.md`) handles working memory. The Knowledge Layer is complementary — it's the **searchable long-term archive**, not the rolling context window. Agents should:
 - Use memory files for "what have we been working on lately"
 - Use AE Journals search for "what do I know about topic X"
 ---
 ## 5. Model Routing (Future)
 Currently hardcoded: Claude default, Gemini fallback. Future intelligent routing:
 | Task type | Model | Reason |
 |---|---|---|
 | User-facing conversation | Claude | Quality prose, reasoning |
 | Tool use / orchestration | Gemini API | Native function calling, free |
 | Private / sensitive | Ollama (local) | No data leaves the network |
 | Long context (>100k tokens) | Gemini 2.0 | 1M token context window |
 | Code generation | Claude | Strong code quality |
 Routing logic lives in `cortex/orchestrator_engine.py` — a simple function that maps task metadata to a backend choice.
 ---
 ## Implementation Order (Recommended)
 1. **Orchestrator Phase 1** — Gemini API integration, basic tool loop, `/orchestrate` endpoint
   - Unlocks: web search in chat, AE Journal queries, cron-triggered tasks
 2. **Knowledge import** — markdown → AE Journal Entries tool + import script
   - Unlocks: searchable knowledge base for all agents
 3. **Dev agent pipeline** — Frontend + Backend specialist agents
   - Unlocks: AI-assisted development with supervisor review
 4. **Gitea integration** — webhook receiver + Actions CI
   - Unlocks: event-driven automation, PR workflow
 5. **Intelligent routing** — model selection by task type
   - Polish: cost and quality optimization
 ---
 ## Key Design Decisions
 | Decision | Choice | Rationale |
 |---|---|---|
 | Orchestrator model | Gemini API (not CLI) | Native tool calling; free tier |
 | Responder model | Claude CLI (Pro sub) | Quality output; no API cost |
 | Direct chat bypass | Yes | Don't add latency when tools aren't needed |
 | Tool set | Separate from ae_* MCPs | ae_* tools are stable; don't risk breaking active agents |
 | Dev agents | Claude CLI in project dir | CLAUDE.md + project context already in place |
 | Human approval gate | Required before commit | Agents can propose; humans decide |
 | Knowledge primary source | AE Journals | Already exists, structured, searchable |
--- a/documentation/ARCH__PERSONA.md
+++ b/documentation/ARCH__PERSONA.md
@@ -0,0 +1,121 @@
 # Architecture: Persona System & Memory
 > How Inara (and other personas) know who they are and what they remember.
 > Last updated: 2026-04-03
 ---
 ## Filesystem Layout
 Each persona lives in `home/{username}/persona/{name}/`:
 ```
 home/scott/persona/inara/
  IDENTITY.md       Who Inara is — role, name, origin
  SOUL.md           Values, personality, voice, what she cares about
  PROTOCOLS.md      Behavioral rules — how she responds, what she avoids
  CONTEXT_TIERS.md  Documents which files load at each tier
  USER.md           Scott's profile — loaded into context so she knows who she's talking to
  HELP.md           Persona-specific help content (appended to shared HELP.md in UI)
  MEMORY_SHORT.md   Recent session digest (auto-distilled daily)
  MEMORY_MID.md     Mid-term summary (auto-distilled weekly)
  MEMORY_LONG.md    Long-term memory (auto-distilled monthly)
  REMINDERS.md      Pending reminders (auto-surfaced at tier 2+)
  SCRATCH.md        Ephemeral scratchpad (read/write via tools)
  TASKS.json        Personal task list (managed via tools)
  CRONS.json        Scheduled jobs (managed via tools)
  sessions/         Session turn logs — YYYY-MM-DD.md, one file per day
 ```
 **ContextVars:** `persona.py` sets `_user` and `_persona` ContextVars per request. Everything downstream calls `persona_path()` to resolve the right directory — no globals, no thread-local state.
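
The isolation this gives can be shown with a small standalone sketch. The names `_user`, `_persona`, and `persona_path()` come from the description above; `set_context`, `handle_request`, and the `HOME_ROOT` constant are assumptions made for the example.

```python
import asyncio
from contextvars import ContextVar
from pathlib import Path

HOME_ROOT = Path("home")  # assumed root of the per-user layout

_user: ContextVar[str] = ContextVar("_user")
_persona: ContextVar[str] = ContextVar("_persona")


def set_context(user: str, persona: str) -> None:
    """Bind the current request's user and persona."""
    _user.set(user)
    _persona.set(persona)


def persona_path() -> Path:
    """Resolve the persona directory for whichever request is running."""
    return HOME_ROOT / _user.get() / "persona" / _persona.get()


async def handle_request(user: str, persona: str) -> Path:
    set_context(user, persona)
    await asyncio.sleep(0)  # yield so the other request runs in between
    return persona_path()


async def main() -> list:
    # gather() runs each coroutine as its own task with its own context copy
    return await asyncio.gather(
        handle_request("scott", "inara"),
        handle_request("holly", "tina"),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Each task sees only the values it set, even though both requests interleave on the same event loop.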
 ---
 ## Context Tiers
 Each chat request specifies a tier (default: 2). Higher tiers load more context — slower but richer.
 | Tier | Loaded Files | Use case |
 |---|---|---|
 | 1 | IDENTITY.md | Minimal — lightweight tasks |
 | 2 | + SOUL.md, PROTOCOLS.md, USER.md, MEMORY_SHORT.md, MEMORY_MID.md, REMINDERS.md | Standard chat |
 | 3 | + MEMORY_LONG.md, CONTEXT_TIERS.md | Deep sessions, long tasks |
 | 4 | + SCRATCH.md, TASKS.json | Full state — agent mode |
 `context_loader.py` assembles the system prompt from these files in order. The resulting prompt is passed to whichever LLM backend handles the request.
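
A simplified sketch of cumulative tier loading, based only on the table above. The function names, the per-file heading format, and the choice to skip missing files silently are assumptions; the real `context_loader.py` may differ.

```python
from pathlib import Path

# Files added at each tier, in load order (taken from the table above).
TIER_FILES = {
    1: ["IDENTITY.md"],
    2: ["SOUL.md", "PROTOCOLS.md", "USER.md",
        "MEMORY_SHORT.md", "MEMORY_MID.md", "REMINDERS.md"],
    3: ["MEMORY_LONG.md", "CONTEXT_TIERS.md"],
    4: ["SCRATCH.md", "TASKS.json"],
}


def files_for_tier(tier: int) -> list:
    """Return every file loaded at this tier, including all lower tiers."""
    if tier not in TIER_FILES:
        raise ValueError(f"unknown tier: {tier}")
    names = []
    for level in range(1, tier + 1):
        names.extend(TIER_FILES[level])
    return names


def build_system_prompt(persona_dir: Path, tier: int = 2) -> str:
    """Concatenate the tier's files that exist into one system prompt."""
    parts = []
    for name in files_for_tier(tier):
        path = persona_dir / name
        if path.is_file():
            text = path.read_text(encoding="utf-8").strip()
            parts.append(f"# {name}\n\n{text}")
    return "\n\n".join(parts)
```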
 ---
 ## Memory Distillation
 Three-tier rolling memory system, run by APScheduler:
 ```
 sessions/YYYY-MM-DD.md  ← raw session logs (written by session_logger.py)
        ↓ daily 03:00
 MEMORY_SHORT.md         ← recent session digest (no LLM — pure aggregation)
        ↓ weekly Sun 03:30
 MEMORY_MID.md           ← concise summary (LLM)
        ↓ monthly 1st 04:00
 MEMORY_LONG.md          ← integrated long-term memory (LLM)
 ```
 **Short distill** — reads the most recent session files that fit within the token budget, writes them in chronological order. No LLM involved — fast and cheap.
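
The selection step of the short distill might be sketched as follows. This is an illustration under assumptions: the function names, the 4-characters-per-token estimate, and stopping at the first file that does not fit are all guesses rather than the behavior of `memory_distiller.py`.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: about 4 characters per token.
    return max(1, len(text) // 4)


def select_recent_sessions(sessions: dict, budget_tokens: int) -> list:
    """Pick the newest session logs that fit the budget.

    `sessions` maps a date string (YYYY-MM-DD) to that day's log text.
    Returns the chosen dates oldest-first, ready to be written in order.
    """
    chosen = []
    remaining = budget_tokens
    for date in sorted(sessions, reverse=True):  # newest first
        cost = estimate_tokens(sessions[date])
        if cost > remaining:
            break  # stop at the first log that does not fit
        chosen.append(date)
        remaining -= cost
    return sorted(chosen)  # chronological order
```

ISO-style date filenames sort correctly as plain strings, which is why no date parsing is needed here.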
 **Mid distill** — LLM summarizes MEMORY_SHORT into a concise digest. Prompt asks for recurring themes, decisions, ongoing projects, Scott's current state and priorities. Written in first person as Inara.
 **Long distill** — LLM integrates MEMORY_MID into MEMORY_LONG. Rules: preserve historical facts, update stale info, absorb new themes, remove irrelevant entries.
 **Distill notifications** — after mid and long runs, `notification.py` sends a message to the user's configured NC Talk notification room (if `notification_room` is set in `channels.json`).
 **Controls** in `.env`:
 ```
 AUTO_DISTILL=true
 AUTO_DISTILL_SHORT=true
 AUTO_DISTILL_MID=true
 AUTO_DISTILL_LONG=true          # off by default — first run warrants manual review
 DISTILL_BACKEND_MID=local       # use local model to save API credits
 DISTILL_BACKEND_LONG=           # empty = primary backend (claude recommended)
 MEMORY_BUDGET_SHORT=3000        # token budgets (soft caps)
 MEMORY_BUDGET_MID=2000
 MEMORY_BUDGET_LONG=2000
 ```
 Manual distill via API:
 ```
 POST /distill/short
 POST /distill/mid
 POST /distill/long
 GET  /distill/status
 ```
 ---
 ## Adding a New Persona
 `persona_template.py` bootstraps a new persona directory from string templates. The onboarding flow (`/setup/persona`) calls this when a new user creates their first persona.
 To add one manually:
 1. Create `home/{username}/persona/{name}/`
 2. Copy and edit the files from an existing persona (e.g. `home/scott/persona/inara/`)
 3. At minimum: `IDENTITY.md`, `SOUL.md`, `PROTOCOLS.md`, `USER.md`
 4. The distiller will create the `MEMORY_*.md` files on first run
 ---
 ## Session Search
 Past sessions are searchable via `GET /sessions/search?q=...&user=...&persona=...`.
 Available in the UI via the search box at the bottom of the Files panel (open with the Files button). Results are grouped by date with highlighted excerpts.
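
A minimal sketch of the search-and-excerpt idea, grouped by date. It is an assumption-heavy illustration: the function name and excerpt width are invented, and it escapes and highlights on the server side, whereas the actual UI does its highlighting in `app.js`.

```python
import html
import re


def search_sessions(sessions: dict, query: str, context: int = 40) -> dict:
    """Return {date: [excerpt, ...]} for case-insensitive matches, newest date first.

    `sessions` maps a date string (YYYY-MM-DD) to that day's log text.
    Excerpts are HTML-escaped, with the match wrapped in <mark>.
    """
    if not query.strip():
        return {}
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    results = {}
    for date in sorted(sessions, reverse=True):
        text = sessions[date]
        excerpts = []
        for match in pattern.finditer(text):
            start = max(0, match.start() - context)
            end = min(len(text), match.end() + context)
            before = html.escape(text[start:match.start()])
            hit = html.escape(match.group(0))
            after = html.escape(text[match.end():end])
            excerpts.append(f"{before}<mark>{hit}</mark>{after}")
        if excerpts:
            results[date] = excerpts
    return results
```

Escaping each piece before adding the `<mark>` tags keeps session text from injecting markup into the results list.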
 ---
 ## Active Personas
 | User | Persona | Description |
 |---|---|---|
 | scott | inara | Scott's primary assistant |
 | scott | developer | Dev-focused persona |
 | holly | tina | Holly's primary assistant |
 | brian | wintermute | Brian's primary assistant |
--- a/documentation/ARCH__SYSTEM.md
+++ b/documentation/ARCH__SYSTEM.md
@@ -0,0 +1,90 @@
 # Architecture: System Overview
 > How the pieces fit together.
 > Last updated: 2026-04-03
 ---
 ## Architecture Diagram
 ```
 ┌─────────────────────────────────────────────────────────┐
 │  INPUT CHANNELS                                         │
 │                                                         │
 │  Web UI ───────────────────────────────────────────┐    │
 │  Nextcloud Talk ──── POST /webhook/nextcloud/{u} ──┤    │
 │  Google Chat ─────── POST /channels/google-chat/{u}┤    │
 │  Cron / Scheduler ─────────────────────────────────┤    │
 │  Webhooks (future) ────────────────────────────────┘    │
 └─────────────────────────────┬───────────────────────────┘
                              ↓
 ┌─────────────────────────────────────────────────────────┐
 │  CORTEX DISPATCHER  (FastAPI — cortex/)                 │
 │                                                         │
 │  auth_middleware.py  → validates JWT session cookie     │
 │  persona.py          → resolves user + persona context  │
 │  context_loader.py   → builds system prompt (tier 1-4)  │
 │                                                         │
 │  POST /chat          → direct LLM, streaming SSE        │
 │  POST /orchestrate   → Gemini tool loop → Claude        │
 │  GET  /orchestrate/{id} → poll job result               │
 └────────────┬───────────────────┬────────────────────────┘
             ↓                   ↓
 ┌─────────────────┐   ┌──────────────────────────────────┐
 │  LLM BACKENDS   │   │  PERSONA DATA                    │
 │                 │   │  home/{user}/persona/{name}/      │
 │  Claude CLI     │   │                                  │
 │  Gemini CLI     │   │  IDENTITY.md  SOUL.md            │
 │  Gemini API     │   │  PROTOCOLS.md MEMORY_*.md        │
 │  Local (httpx)  │   │  USER.md  REMINDERS.md           │
 │                 │   │  TASKS.json  CRONS.json          │
 └─────────────────┘   │  sessions/  SCRATCH.md          │
                      └──────────────────────────────────┘
 ```
 Details: [`ARCH__BACKENDS.md`](ARCH__BACKENDS.md) | [`ARCH__PERSONA.md`](ARCH__PERSONA.md) | [`ARCH__CHANNELS.md`](ARCH__CHANNELS.md)
 ---
 ## Service Layout (`cortex/`)
 | File | Purpose |
 |---|---|
 | `main.py` | App entry point, router registration |
 | `config.py` | All settings (pydantic-settings, reads `.env`) |
 | `persona.py` | User + persona path resolution, ContextVars |
 | `context_loader.py` | Builds system prompt from persona files (tiers 1–4) |
 | `llm_client.py` | All LLM backends — Claude, Gemini CLI, Local |
 | `orchestrator_engine.py` | Gemini API ReAct tool loop → Claude handoff |
 | `session_store.py` | In-memory + file session persistence |
 | `session_logger.py` | Writes session turns to `sessions/YYYY-MM-DD.md` |
 | `memory_distiller.py` | Short/mid/long distill jobs |
 | `scheduler.py` | APScheduler — distill jobs + user crons |
 | `cron_runner.py` | Cron job storage, schedule parsing, execution |
 | `notification.py` | Outbound channel messages (distill alerts, cron proactive) |
 | `auth_utils.py` | bcrypt passwords, JWT, invite tokens, channel config |
 | `auth_middleware.py` | JWT cookie validation on all routes |
 | `user_settings.py` | Per-user local LLM config (hosts, models, active model) |
 | `event_bus.py` | Internal SSE pub/sub (NC Talk → browser mirror) |
 | `email_utils.py` | SMTP invite emails |
 | `persona_template.py` | Bootstrap a new persona directory from templates |
 | `routers/` | One file per endpoint group (chat, orchestrator, auth, files, channels, ui, settings…) |
 | `tools/` | Orchestrator tool implementations (web, ae_knowledge, tasks, scratch, reminders, cron, system) |
 | `static/` | Web UI — `index.html`, `app.js`, `style.css`, `login.html`, `setup.html`, `HELP.md` |
 | `tests/` | pytest suite (80 tests) |
 ---
 ## Key Design Decisions
 **Two-brain pattern** — Gemini API handles tool use (function calling, planning, web search). Claude CLI handles all user-facing responses. Direct chat bypasses the orchestrator entirely.
 **Subprocess backends** — Claude and Gemini run as CLI subprocesses (`claude --print`, `gemini -p`). This keeps auth transparent (Claude Code manages tokens) and avoids API costs on the Pro subscription path.
 **Local backend via httpx** — Open WebUI's OpenAI-compatible API (`/api/chat/completions`). No CLI wrapper. Per-user host + model config in `local_llm.json`.
 **ContextVars for async isolation** — `persona.py` uses Python `contextvars.ContextVar` so concurrent requests each see their own user/persona without thread-local hacks.
 **Per-user filesystem layout** — `home/{user}/persona/{name}/` mirrors Linux home directories. Each persona is a directory of markdown files and JSON. No database. Easy to inspect, edit, and back up.
 **No single point of coupling** — tools live in `cortex/tools/`, separate from `ae_*` MCP tools. Channels live in `cortex/routers/`, each self-contained. Adding a channel or tool doesn't touch other subsystems.
--- a/documentation/MASTER.md
+++ b/documentation/MASTER.md
@@ -0,0 +1,92 @@
 # Cortex / Inara — Master Index
 > Start here. This document is a map, not a manual.
 > Last updated: 2026-04-03
 ---
 ## What It Is
 Cortex is a self-hosted personal AI platform. It routes messages from any input channel to AI backends, manages a resident agent (Inara) with persistent memory, and coordinates across a fleet of machines. It is infrastructure, not a product.
 **Running at:** `https://cortex.dgrzone.com` | `systemctl --user restart cortex`
 ---
 ## Current State
 | Component | Status | Notes |
 |---|---|---|
 | Web UI | ✅ Live | SPA, dark theme, mobile-responsive, session auth |
 | Nextcloud Talk bot | ✅ Live | HMAC-signed, per-user routing |
 | Google Chat Add-on | ✅ Live | JWT-verified, per-user routing |
 | Claude backend | ✅ Live | Primary — via Claude Code CLI |
 | Gemini backend | ✅ Live | Fallback — via Gemini CLI |
 | Local backend | ✅ Live | Third option — Open WebUI/Ollama on scott_gaming |
 | Gemini orchestrator | ✅ Live | Tool loop → Claude response, Agent mode in UI |
 | Memory distillation | ✅ Live | Short (daily) / Mid (weekly) / Long (monthly) |
 | Multi-user | ✅ Live | Scott, Holly, Brian — each with own personas |
 | Session search | ✅ Live | Full-text search across past session logs |
 | Proactive cron | ✅ Live | `message` and `brief` job types → NC Talk |
 **Active users / personas:** scott/inara, scott/developer, holly/tina, brian/wintermute
 ---
 ## Document Map
 ### Project-Level
 | Doc | What it covers |
 |---|---|
 | **This file** | Index and current state |
 | [`CORTEX.md`](../CORTEX.md) | Vision, philosophy, "what it is and isn't" |
 | [`ROADMAP.md`](ROADMAP.md) | Phases — what's done, what's next, what's deferred |
 | [`TODO__Agents.md`](TODO__Agents.md) | Active task list — read before starting work |
 ### Architecture
 | Doc | What it covers |
 |---|---|
 | [`ARCH__SYSTEM.md`](ARCH__SYSTEM.md) | Overall architecture, component map, key design decisions |
 | [`ARCH__BACKENDS.md`](ARCH__BACKENDS.md) | LLM backends, routing, fallback, per-user config |
 | [`ARCH__PERSONA.md`](ARCH__PERSONA.md) | Persona system, context tiers, memory distillation |
 | [`ARCH__CHANNELS.md`](ARCH__CHANNELS.md) | Input channels — web, NC Talk, Google Chat, cron |
 | [`ARCH__FUTURE.md`](ARCH__FUTURE.md) | Planned: local orchestrator, dev agents, knowledge layer |
 ### Setup & Reference
 | Doc | What it covers |
 |---|---|
 | [`docs/NEXTCLOUD_TALK_BOT.md`](../docs/NEXTCLOUD_TALK_BOT.md) | NC Talk bot setup and troubleshooting |
 | [`docs/GOOGLE_CHAT_BOT.md`](../docs/GOOGLE_CHAT_BOT.md) | Google Chat Add-on setup |
 | [`docs/OPEN_WEBUI_API.md`](../docs/OPEN_WEBUI_API.md) | Open WebUI/Ollama API reference for local model work |
 ### Code-Level
 | Doc | What it covers |
 |---|---|
 | [`CLAUDE.md`](../CLAUDE.md) | Project instructions for Claude Code — directory map, run commands, design decisions |
 | [`README.md`](../README.md) | Project root orientation, quick-start, user management |
 | [`cortex/static/HELP.md`](../cortex/static/HELP.md) | In-app help (rendered in UI for all users) |
 ---
 ## Quick Reference
 **Start the service / check logs**
 ```bash
 systemctl --user restart cortex
 journalctl --user -u cortex -f
 ```
 **Syntax check before restart**
 ```bash
 python3 -m py_compile cortex/<file>.py
 ```
 **Add a user**
 ```bash
 cd cortex && .venv/bin/python manage_passwords.py invite <username> <email>
 ```
 **Run tests**
 ```bash
 cd cortex && .venv/bin/python -m pytest tests/ -q
 ```
--- a/documentation/ROADMAP.md
+++ b/documentation/ROADMAP.md
@@ -0,0 +1,71 @@
 # Cortex — Roadmap
 > Phases and priorities. For active tasks see `TODO__Agents.md`.
 > Last updated: 2026-04-03
 ---
 ## Phase 0 — Foundation ✅
 - Syncthing fleet sync (`agents_sync/`) operational
 - MCP tools (`ae_*`) available in all Claude Code sessions
 - Fleet agents running independently on each machine
 ## Phase 1 — Dispatcher Core ✅
 - FastAPI service with streaming SSE responses
 - Claude CLI and Gemini CLI subprocess backends
 - Session context management (rolling window, file persistence)
 - Nextcloud Talk bot (HMAC-signed webhook)
 - Memory distiller (APScheduler — short/mid/long cycles)
 - Local web UI (single-page, mobile-responsive)
 - Auth status monitoring (`/auth/status`, UI banner)
 - Session logging and file browser
 ## Phase 2 — Identity & Multi-User ✅
 - Inara persona formalized (`IDENTITY.md`, `SOUL.md`, `PROTOCOLS.md`, context tiers)
 - Two-level user/persona layout (`home/{user}/persona/{name}/`)
 - Session auth: bcrypt passwords, JWT cookies, invite tokens, Google OAuth
 - Multi-user live: Scott, Holly, Brian
 - Per-user channel config (`channels.json`)
 - Per-user Gemini API key (settings UI)
 - Help & Reference system (shared base + per-persona additions)
 - Lucide icons, persona picker page, session persistence across navigation
 ## Phase 3 — Intelligence Layer (In Progress)
 - ✅ Gemini API orchestrator (tool loop → Claude responder)
 - ✅ Tool suite: web search, AE Journal read/write, tasks, scratch, reminders, cron, system
 - ✅ Agent mode in UI (async job, poll for result)
 - ✅ Local LLM backend (Open WebUI/Ollama, per-user multi-model config)
 - ✅ Proactive cron (`message` / `brief` job types → NC Talk)
 - ✅ Session search (full-text across past session logs)
 - ✅ Distill notifications (NC Talk after mid/long runs)
 - ✅ Local backend for distillation (DISTILL_BACKEND_MID/LONG in .env)
 - [ ] **Local orchestrator** — ReAct tool loop using local model (High priority — see `TODO__Agents.md`)
 - [ ] Knowledge import — markdown → AE Journals (import script)
 - [ ] Dev agent pipeline — specialist agents + supervisor + approval gate
 - [ ] Gitea webhook integration + Actions CI
 ## Phase 4 — Channel Expansion
 - ✅ Web UI
 - ✅ Nextcloud Talk
 - ✅ Google Chat
 - [ ] WhatsApp (Business API or bridge — investigating)
 - [ ] Webhook triggers from Aether platform events
 ## Phase 5 — Routing Intelligence & Scale
 - [ ] Intelligent model routing (by task type, privacy, context length)
 - [ ] Agent-to-agent task delegation across fleet
 - [ ] Permanent hosting on home server (currently on `scott_lpt`)
 ## Phase 6 — Infrastructure
 - [ ] Server DMZ finalized
 - [ ] WireGuard for all Cortex-accessing devices
 - [ ] Camera/IoT VLAN segmentation
 ---
 ## Deferred / Watching
 - **Unsloth Gemma 4 GGUFs** — blocked on Ollama v0.20.1 (llama.cpp GGUF metadata issue); switch `agent-support-gemma-*` aliases to Unsloth Q4_K_M when ready
 - **Speculative decoding** — llama.cpp supports it (E4B + E2B draft ≈ 2x speed); Ollama does not yet
 - **RAG via Open WebUI** — feed Nextcloud docs into local knowledge collections; possible complement to AE Journals search
 - **Multi-host local models** — per-user config already supports multiple hosts; routing logic TBD
 - **WhatsApp** — requires Business API account or a bridge; not started
--- a/documentation/TODO__Agents.md
+++ b/documentation/TODO__Agents.md
@@ -7,16 +7,21 @@
 ## 🔴 High Priority
-### [Backend] Ollama local model backend
- Add Ollama as a third LLM backend option (direct Ollama API, no CLI wrapper)
- Endpoint: `http://scott-gaming:<port>/api/` (WireGuard)
- Model selection: configurable per-request or per-session
 - Auth status check: ping `/api/tags` to confirm reachability
-### [Testing] Gitea SSH port 2222 ✅ — 2026-03-29
- pfSense WAN → 192.168.32.7:2222 port forward confirmed working
- `ssh -p 2222 git@git.dgrzone.com` reaches Gitea (returns "Invalid repository path" — expected, confirms connectivity)
- Clone/push via SSH: `git clone ssh://git@git.dgrzone.com:2222/<user>/<repo>.git`
+### [Local] Tool-capable local orchestrator
+Design and implement `local_orchestrator_engine.py` — a ReAct tool loop driven by
+a local model via Open WebUI's OpenAI-compatible API, as an alternative to the
+Gemini API orchestrator for private/offline tasks.
+- [ ] Convert existing Cortex tool definitions (`cortex/tools/`) from Gemini
+      `FunctionDeclaration` format to OpenAI `tools` format (minor schema diff)
+- [ ] Implement tool loop: send tools → parse `tool_calls` response → execute →
+      append result → loop until `finish_reason: stop`
 - [ ] Wire into `routers/orchestrator.py` — new `mode` param: `"local"` vs `"gemini"`
 - [ ] UI: Agent mode button routes to local orchestrator when local backend active
 - [ ] Recommended models (scott_gaming, 8 GB VRAM):
      Gemma 4 E4B — 25 t/s, 72k practical ctx — interactive/fast tasks
      Gemma 4 26B A4B — 9 t/s, 50k practical ctx — heavier reasoning, background tasks
 - Reference: `docs/OPEN_WEBUI_API.md` for full tool call request/response format
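
A minimal sketch of the two checklist items above (schema conversion and the tool loop), not the planned `local_orchestrator_engine.py` itself. Assumptions: tool declarations are held as plain dicts with upper-case Gemini type names, and `chat` is a hypothetical injected callable that posts `messages` and `tools` to an OpenAI-compatible endpoint and returns the parsed JSON response. `to_openai_tool`, `run_tool_loop`, and `max_turns` are illustrative names.

```python
import json
from typing import Any, Callable


def to_openai_tool(decl: dict[str, Any]) -> dict[str, Any]:
    """Wrap a Gemini-style function declaration in the OpenAI `tools` envelope."""

    def lower_types(schema: Any) -> Any:
        # Assumed input spells types as "OBJECT"/"STRING"; JSON Schema uses lowercase.
        if isinstance(schema, dict):
            return {
                k: (v.lower() if k == "type" and isinstance(v, str) else lower_types(v))
                for k, v in schema.items()
            }
        if isinstance(schema, list):
            return [lower_types(v) for v in schema]
        return schema

    default_params = {"type": "object", "properties": {}}
    return {
        "type": "function",
        "function": {
            "name": decl["name"],
            "description": decl.get("description", ""),
            "parameters": lower_types(decl.get("parameters", default_params)),
        },
    }


def run_tool_loop(
    chat: Callable[[list[dict], list[dict]], dict],
    tools: list[dict],
    handlers: dict[str, Callable[..., Any]],
    messages: list[dict],
    max_turns: int = 8,
) -> str:
    """Send tools, execute any tool_calls, append results, repeat until a plain answer."""
    messages = list(messages)  # do not mutate the caller's history
    for _ in range(max_turns):
        message = chat(messages, tools)["choices"][0]["message"]
        messages.append(message)
        tool_calls = message.get("tool_calls") or []
        if not tool_calls:
            # No tool calls left: treated here as the equivalent of finish_reason "stop".
            return message.get("content") or ""
        for call in tool_calls:
            name = call["function"]["name"]
            args = json.loads(call["function"]["arguments"] or "{}")
            try:
                result = handlers[name](**args)
            except Exception as exc:  # report failures back to the model
                result = {"error": str(exc)}
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}
            )
    raise RuntimeError("tool loop did not finish within max_turns")
```

The loop stops on the absence of `tool_calls` rather than on `finish_reason` alone, on the assumption that some local servers report `stop` even when a tool call is present; adjust if the target server behaves differently.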
 ---
@@ -30,15 +35,22 @@ See `ARCH__Intelligence_Layer.md` for full design.
 - [ ] Target: markdown files from `~/DgrZone_Nextcloud/` and `~/OSIT_Nextcloud/`
 - [ ] Tag strategy: source path, date, topic tags from frontmatter or filename
-### [Distill] Monitor first auto_distill_long run
+### [Distill] Review first auto_distill_long output — 2026-04-01
- Scheduled for ~April 1 at 04:00
+- Ran April 1 at 04:00 as scheduled
- Manually review `inara/MEMORY_LONG.md` output before fully trusting
+- Manually review `inara/MEMORY_LONG.md` — confirm quality before fully trusting
- Adjust distill prompts if needed
+- Adjust distill prompts in `cortex/memory_distiller.py` if needed
 ### [Distill] Distill quality review
 - Short/mid/long distill prompts live in `cortex/memory_distiller.py`
 - After first few automatic runs, review quality and tune
 ### [Local] Unsloth Gemma 4 variants
 - Unsloth Dynamic 2.0 Q4_K_M GGUFs fail with `500: unable to load model` on Ollama v0.20.0
 - Root cause: Ollama's bundled llama.cpp doesn't recognize Gemma 4 GGUF architecture metadata from raw files
 - Waiting on Ollama point release (v0.20.1+) — then switch Open WebUI to Unsloth variants
 - Expected benefit: ~10–20% smaller context footprint vs baseline, same quality
 - `agent-support-gemma-small` → Unsloth E4B Q4_K_M; `agent-support-gemma-medium` → Unsloth 26B A4B Q4_K_M
 ---
 ## 🟢 Lower Priority / Future
@@ -61,15 +73,49 @@ See `ARCH__Intelligence_Layer.md`. Full design not yet started.
 - `cortex/routers/` already has pattern; add `gitea.py`
 - Gitea Actions (CI) for "run tests on push" — simpler than custom runner
 ### [Local] RAG via Open WebUI
 Open WebUI has a full RAG pipeline (file upload → embed → knowledge collections →
 reference in chat). Could feed Nextcloud docs or session logs into a local knowledge
 base accessible to local models. Endpoints documented in `docs/OPEN_WEBUI_API.md`.
 - `/api/v1/files/` upload + `/api/v1/retrieval/process/web` for URLs
 - Reference in chat via `"files": [{"type": "collection", "id": "..."}]`
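
A small sketch of a chat request body that references a knowledge collection, using the `files` shape quoted above. Assumptions: the body is posted to Open WebUI's chat completions endpoint (confirm the exact path in `docs/OPEN_WEBUI_API.md`), `build_collection_chat_payload` is an illustrative helper, and the collection id is a placeholder.

```python
import json
from typing import Any


def build_collection_chat_payload(model: str, question: str, collection_id: str) -> dict[str, Any]:
    """Build an OpenAI-style chat body that attaches one knowledge collection."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        # Shape taken from the note above; the id must be a real collection id.
        "files": [{"type": "collection", "id": collection_id}],
    }


if __name__ == "__main__":
    payload = build_collection_chat_payload(
        "agent-support-gemma-small",      # alias mentioned elsewhere in this TODO
        "Summarize the onboarding notes.",
        "COLLECTION_ID_PLACEHOLDER",      # hypothetical value
    )
    print(json.dumps(payload, indent=2))
```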
 ### [Backend] Intelligent model routing
- Currently hardcoded: Claude default, Gemini fallback
- Future: route by task type (code → Claude, search → Gemini, private → Ollama)
- Future: route by context length (Gemini 2.0 has 1M token context)
+- Currently hardcoded: Claude default, Gemini fallback, local third
+- Design direction (now informed by real local model perf):
+  - **Private/offline tasks** → local (Gemma 4 E4B for speed, 26B A4B for reasoning)
  - **Complex tool tasks / long context** → Gemini (1M token context, strong function calling)
  - **Final user-facing responses** → Claude (quality prose, persona fidelity)
 - Future: auto-route by task type rather than requiring user to toggle backend manually
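
One possible shape for the routing rules above, as a sketch only; no routing code exists yet. Assumptions: the `Task` fields, `pick_backend`, and the `LONG_CONTEXT_TOKENS` threshold are hypothetical, and the local aliases are the `agent-support-gemma-*` names from the Unsloth note.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cutoff, borrowed from the 72k practical context quoted for Gemma 4 E4B.
LONG_CONTEXT_TOKENS = 72_000


@dataclass(frozen=True)
class Task:
    private: bool = False          # must stay on local hardware
    heavy_reasoning: bool = False  # prefer the larger local model
    needs_tools: bool = False      # complex tool loop expected
    context_tokens: int = 0        # estimated prompt size


def pick_backend(task: Task) -> tuple[str, Optional[str]]:
    """Return (backend, model alias) following the design direction above."""
    if task.private:
        alias = "agent-support-gemma-medium" if task.heavy_reasoning else "agent-support-gemma-small"
        return ("local", alias)
    if task.needs_tools or task.context_tokens > LONG_CONTEXT_TOKENS:
        return ("gemini", None)
    # Default: final user-facing responses go to Claude.
    return ("claude", None)
```

Privacy is checked first so that a private task is never sent off-host, even when it would otherwise qualify for Gemini on context length.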
 ---
 ## ✅ Completed
 ### [Local] Per-user multi-model local LLM settings — 2026-04-01
 - `home/{username}/local_llm.json` — `hosts[]` + `models[]` + `active_model_id` structure
 - `cortex/user_settings.py` — CRUD functions: save_host, add_model, remove_model, set_active_model, get_active_local_model
 - `cortex/routers/local_llm.py` + `cortex/static/local_llm.html` — dedicated `/settings/local` page
 - "Fetch models from host" button — proxied via `/api/local-llm/fetch-models`, populates dropdown
 - Active model shown in UI near backend toggle button (amber hint text)
 - Migrates old flat `.env`-style config automatically on first use
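
A sketch of how the `hosts[]` + `models[]` + `active_model_id` structure could be resolved and how a flat config might be lifted into it. Assumptions: the per-entry field names (`id`, `url`, `api_key`, `host_id`, `name`), the flat keys mirroring the `.env` names, and the helpers `resolve_active_model` / `migrate_flat_config` are all hypothetical; the real code lives in `cortex/user_settings.py`.

```python
from typing import Any, Optional


def resolve_active_model(cfg: dict[str, Any]) -> Optional[dict[str, str]]:
    """Join the active model with its host; return None if either is missing."""
    active_id = cfg.get("active_model_id")
    model = next((m for m in cfg.get("models", []) if m.get("id") == active_id), None)
    if model is None:
        return None
    host = next((h for h in cfg.get("hosts", []) if h.get("id") == model.get("host_id")), None)
    if host is None:
        return None
    return {"model": model["name"], "base_url": host["url"], "api_key": host.get("api_key", "")}


def migrate_flat_config(flat: dict[str, str]) -> dict[str, Any]:
    """Lift an old single-host, single-model config into the multi-model structure."""
    return {
        "hosts": [
            {"id": "host-1", "url": flat["LOCAL_API_URL"], "api_key": flat.get("LOCAL_API_KEY", "")}
        ],
        "models": [{"id": "model-1", "host_id": "host-1", "name": flat["LOCAL_MODEL"]}],
        "active_model_id": "model-1",
    }
```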
 ### [UI] Copy button for user (sent) messages — 2026-04-01
 - Added matching copy-on-hover button to user messages (same pattern as assistant messages)
 - `div.dataset.raw` set on send; `makeCopyBtn(div)` appended inline
 ### [Backend] Local model backend (Open WebUI / Ollama) — 2026-04-01
 - OpenAI-compatible API via `httpx` — no CLI wrapper needed
 - Configured via `LOCAL_API_URL` / `LOCAL_API_KEY` / `LOCAL_MODEL` in `.env`
 - Backend toggle cycles `claude → gemini → local` (amber color in UI)
 - `/auth/status` includes local reachability check (`GET /api/models`)
 - Tested end-to-end: `test-agent-simple` (Qwen3-8B) on `scott-lt-i7-rtx:3000`, full persona context flowing correctly
 ### [Testing] Gitea SSH port 2222 — 2026-03-29
 - pfSense WAN → 192.168.32.7:2222 port forward confirmed working
 - `ssh -p 2222 git@git.dgrzone.com` reaches Gitea (returns "Invalid repository path" — expected, confirms connectivity)
 - Clone/push via SSH: `git clone ssh://git@git.dgrzone.com:2222/<user>/<repo>.git`
 ### [Multi-user] Brian onboarding — 2026-03-29
 - Invite sent to `memedrift@gmail.com`
 - Brian completed onboarding, created `wintermute` persona