feat: audit log, usage tracking UI, OpenAI orchestrator compaction, onboarding + docs

Tool audit log:
- Every orchestrator tool call logged to home/{user}/tool_audit/YYYY-MM-DD.jsonl
- Files panel sidebar: audit log group (collapsed), date-linked read-only table
- Admin endpoints: /api/audit/files, /api/audit/day, /api/audit/recent, /api/audit/stats
- Engine and model name recorded per entry

OpenAI orchestrator improvements:
- Context budget enforcement: 75% of model context_k (min 16k)
- Message compaction: truncates old tool results when approaching budget
- max_rounds respected per model config (capped at the server-wide limit)

OpenRouter onboarding (setup.html, onboarding.py, app.js, settings.html):
- Step 3 of 3: /setup/model with curated model picker
- Chat banner for users on server-default model (informational, not alarmist)
- Settings quick-link card; /setup/model works standalone for existing users

Model registry + session store:
- set_role_config / get_role_config for per-role tool lists and system_append
- session_store: session rename, session name backfill endpoint

UI updates (app.js, index.html, style.css, local_llm.html):
- Role toggle in context panel
- Off-the-record mode
- Agent notes read-only viewer
- OPERATIONS.md loaded at T2+ in context

Documentation:
- HELP.md: full tool table, per-role tool sets, Agent Notes, usage tracking
- TOOLS.md: Agent Notes section, count corrected to 44
- ARCH__SYSTEM.md, ARCH__BACKENDS.md, MASTER.md updated to match reality
- CLAUDE.md: onboarding flow, documentation philosophy sections
- README.md: stack in practice, DeepSeek TUI mention, architecture diagram updated
- TODO__Agents.md: onboarding task completed with deviation notes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
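The audit-log record format is not part of this diff. A minimal sketch of a daily JSONL writer follows, assuming UTC dates for the file name and hypothetical field names (`ts`, `tool`, `args`, `engine`, `model`); `append_audit_entry` is an illustrative helper, not the Cortex function.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_audit_entry(home: Path, user: str, tool: str, args: dict,
                       engine: str, model: str) -> Path:
    """Append one tool-call record to home/{user}/tool_audit/YYYY-MM-DD.jsonl.

    Hypothetical helper: field names and UTC dating are assumptions.
    """
    now = datetime.now(timezone.utc)
    audit_dir = home / user / "tool_audit"
    audit_dir.mkdir(parents=True, exist_ok=True)
    path = audit_dir / f"{now:%Y-%m-%d}.jsonl"
    entry = {"ts": now.isoformat(), "tool": tool, "args": args,
             "engine": engine, "model": model}
    # One JSON object per line; append mode never rewrites earlier entries.
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return path
```

Because each line is an independent JSON document, a per-day reader such as an `/api/audit/day` handler can stream the file line by line without loading a whole array.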
2026-05-08 21:26:43 -04:00
parent c02d2462b0
commit f8f7cd75da
25 changed files with 1088 additions and 151 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -146,8 +146,8 @@ http://localhost:8000/docs
 - Tools are registered in `cortex/tools/__init__.py` as both Gemini FunctionDeclarations and Python callables
 ### Context / Memory
-- `context_loader.py` assembles Inara's system prompt from `inara/` files based on tier (1–3)
+- `context_loader.py` assembles Inara's system prompt from `inara/` files based on tier (1–4)
-- Tier 1 = minimal (identity only); Tier 2 = standard (+ memory + user profile); Tier 3 = full
+- Tier 1 = minimal (identity only); Tier 2 = standard (+ memory + user profile); Tier 3 = + last 2 sessions; Tier 4 = + last 7 sessions
 - Memory files are written by the distiller or manually — do not delete them
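Read together, the tier rules are cumulative: each tier includes everything below it. A small sketch of that selection, using hypothetical component labels rather than the real `inara/` file names; the `operations` entry follows the commit message note that OPERATIONS.md loads at T2+.

```python
def components_for_tier(tier: int) -> list[str]:
    """Prompt components loaded at a given tier; each tier adds to the one below.

    Labels are illustrative, not the actual context_loader file names.
    """
    if not 1 <= tier <= 4:
        raise ValueError(f"tier must be 1-4, got {tier}")
    parts = ["identity"]                                   # Tier 1: identity only
    if tier >= 2:
        parts += ["memory", "user_profile", "operations"]  # Tier 2+: memory, profile, OPERATIONS.md
    if tier == 3:
        parts.append("sessions:last_2")                    # Tier 3: + last 2 sessions
    elif tier == 4:
        parts.append("sessions:last_7")                    # Tier 4: + last 7 sessions
    return parts
```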
 ### Security / Safety
@@ -160,6 +160,31 @@ http://localhost:8000/docs
 - Passwords are bcrypt-hashed and stored in `home/{username}/auth.json` — never in `.env` or the DB
 - Invite tokens are one-time-use, 72-hour expiry, stored in `home/{username}/invite.json`
+### Onboarding Flow
+New users follow a three-step setup before reaching the chat:
+1. `GET /setup/{token}` → password form → `POST /setup/{token}` sets password + session cookie
+2. `GET /setup/persona` → persona creation form → `POST /setup/persona` bootstraps persona directory
+3. `GET /setup/model` → OpenRouter quick-connect → `POST /setup/model` saves host + model + role assignment
+Step 3 is optional (skip link goes straight to `/{user}/{persona}`). `/setup/model` also works
+standalone (accessible from Settings) for existing users who haven't configured a model.
+All in `cortex/routers/onboarding.py`. Model writes use `model_registry.py`: `save_host()`,
+`save_model()`, `set_role(username, "chat", "primary", model_id)`.
+### Documentation Philosophy
+Cortex is a no-black-box system. Docs must match reality — at all times.
+- **Docs first:** When planning significant changes, update `TODO__Agents.md` and the relevant
+ `ARCH__*.md` to describe the intended design *before* implementing. This creates a spec to
+ implement against.
+- **Verify after:** Once implementation is complete, re-read the pre-written docs and confirm
+ they match what was actually built. Update anything that drifted.
+- **HELP.md is a user contract:** It describes what users can do. Never let it describe
+ features that don't exist or omit features that do.
+- **CLAUDE.md + ARCH__*.md are the developer contract:** Update them as the architecture evolves.
+- **Stale docs are bugs.** If you notice drift, fix it before moving on.
 ---
 ## Adding a New Tool
@@ -212,19 +237,23 @@ clearly asked for a directory to be unblocked.
 ---
-## Current State (2026-04-28)
+## Current State (2026-05-06)
 Cortex is running and stable. All channels are live:
 | Channel | Status | Notes |
 |---|---|---|
-| Web UI | ✅ Live | `https://cortex.dgrzone.com` |
+| Web UI | ✅ Live | `https://cortex.dgrzone.com` — PWA-installable |
 | Nextcloud Talk | ✅ Live | HMAC-signed webhook, async reply |
 | Google Chat | ✅ Live | Workspace Add-on, `hostAppDataAction` response format |
-| Local backend | ✅ Live | Open WebUI/Ollama, per-user multi-model config |
+| Local backend | ✅ Live | Open WebUI/Ollama on scott_gaming, per-user multi-model config |
-| Orchestrator | ✅ Live | Gemini API tool loop → Claude response; ⚡ toggle in UI |
+| Gemini orchestrator | ✅ Live | Gemini API tool loop → Claude response; ⚡ toggle in UI |
 | Local orchestrator | ✅ Live | OpenAI-compatible ReAct loop; fires when orchestrator role → local model |
 | Tool audit log | ✅ Live | Every tool call logged to `home/{user}/tool_audit/YYYY-MM-DD.jsonl` |
 | Token usage tracking | ✅ Live | Per-user `home/{user}/usage.json`; summary in Settings |
 | Web push | ✅ Live | VAPID push notifications; `web_push` tool; subscribe via ☰ menu |
-Active users: scott (inara, developer), holly (tina), brian (wintermute)
+Active users: scott (inara), holly (tina), brian (wintermute)
 **40 orchestrator tools:** web_search, http_fetch,
 file_read/list/write, shell_exec, claude_allow_dir,
--- a/README.md
+++ b/README.md
@@ -10,6 +10,43 @@ Cortex is a self-hosted multi-agent AI platform. It supports multiple users, eac
 ---
+## Where Cortex Fits
+AI tools aren't one-size-fits-all. Cortex exists in a specific niche — it's not trying to be everything.
+**Cortex is a self-hosted persona platform.** It gives you a persistent AI companion with its own
+identity, memory, and voice — reachable through your chat apps, not just a browser tab. It remembers
+who you are across days and weeks. It can proactively message you on a schedule. It runs on your
+own hardware, behind your own auth.
+### What Cortex is good at
+- **Being a consistent AI presence** — same persona, same memory, day after day
+- **Multi-channel access** — web, Nextcloud Talk, Google Chat, all routed to the same brain
+- **Proactive work** — scheduled messages, reminders, cron jobs that reach out to you
+- **Multi-user households** — each person gets their own persona (Scott → Inara, Holly → Tina)
+- **Private, offline-capable** — local models via Ollama when you don't want anything leaving the LAN
+### What Cortex is not
+- **Not a coding assistant.** Cortex lives in chat apps, not in your terminal or IDE.
+ Use Claude Code, DeepSeek TUI, Gemini CLI, or Copilot for code-level work — they specialize in reading and
+ editing project files. Cortex can't open a codebase.
+- **Not a generic LLM chat UI.** Open WebUI and LibreChat are excellent model-switching frontends.
+ Cortex isn't a frontend — it's a platform with its own identity system, orchestrator, and memory
+ pipeline. Two different jobs.
+- **Not a SaaS product.** Nobody else hosts your Cortex instance. Nobody else sees your conversations.
+ The trade-off is you manage the service yourself — `systemctl --user restart cortex`.
+- **Not an agent framework.** LangChain, CrewAI, and similar are libraries for building AI pipelines.
+ Cortex is a running service with concrete personas, not an abstraction layer to build on top of.
+### The stack in practice
+- Use **Cortex** to talk to Inara — daily assistant, memory keeper, scheduled check-ins
+- Use **Claude Code / DeepSeek TUI** to work *on* Cortex — code edits, architecture, debugging
+- Use **Open WebUI** when you want to test a new model or run a quick prompt without persona context
+Same AI, different interfaces for different jobs.
 ---
 ## Quick Orientation
 | Directory | What it is |
--- a/cortex/main.py
+++ b/cortex/main.py
@@ -9,7 +9,7 @@ logging.basicConfig(level=logging.INFO, format="%(levelname)s:%(name)s: %(messag
 from config import settings
 from auth_middleware import SessionAuthMiddleware
 from routers import chat, google_chat, nextcloud_talk, files, distill, auth, orchestrator
-from routers import ui, onboarding, settings, help, auth_google, local_llm, push, audit
+from routers import ui, onboarding, settings, help, auth_google, local_llm, push, audit, usage
@asynccontextmanager
@@ -36,6 +36,7 @@ app.include_router(auth.router)
 app.include_router(orchestrator.router)
 app.include_router(push.router)
 app.include_router(audit.router)
+app.include_router(usage.router)
 # Static files — must be mounted BEFORE ui.router so /static/* is matched first.
 # ui.router has a wildcard /{username}/{persona} that would otherwise catch /static/style.css etc.
--- a/cortex/model_registry.py
+++ b/cortex/model_registry.py
@@ -36,6 +36,7 @@ V2 Schema:
        "credential_id":str | null,     # claude_cli only — references providers.anthropic.credentials
        "account_id":   str | null,     # gemini_api only — references providers.google.accounts
        "context_k":    int,            # context window in k tokens (informational)
        "max_rounds":   int | null,     # per-model tool-loop cap; null = use orchestrator_max_rounds global
        "tags":         [str],          # user-defined capability tags
      },
    ],
@@ -642,7 +643,9 @@ def remove_host(username: str, host_id: str) -> bool:
 def save_model(username: str, model_id: str | None, host_id: str,
               label: str, model_name: str, context_k: int = 0,
-               tags: list[str] | None = None) -> str:
+               tags: list[str] | None = None,
+               max_rounds: int | None = None,
+               tools: bool = True) -> str:
    """Create or update a local_openai model entry. Returns the model ID."""
    data = _load(username)
    tags = tags or []
@@ -654,6 +657,8 @@ def save_model(username: str, model_id: str | None, host_id: str,
                m["label"]      = label.strip() or model_name.strip()
                m["model_name"] = model_name.strip()
                m["context_k"]  = context_k
+                m["max_rounds"] = max_rounds
+                m["tools"]      = tools
                m["tags"]       = tags
                _save(username, data)
                return model_id
@@ -668,6 +673,8 @@ def save_model(username: str, model_id: str | None, host_id: str,
        "provider":   "local",
        "host_id":    host_id,
        "context_k":  context_k,
        "max_rounds": max_rounds,
        "tools":      tools,
        "tags":       tags,
    })
    _save(username, data)
@@ -679,7 +686,9 @@ def save_cloud_model(username: str, model_id: str | None,
                     account_id: str | None = None,
                     credential_id: str | None = None,
                     context_k: int = 0,
-                     tags: list[str] | None = None) -> str:
+                     tags: list[str] | None = None,
+                     max_rounds: int | None = None,
+                     tools: bool = True) -> str:
    """
    Create or update an Anthropic or Google model entry. Returns the model ID.
@@ -698,6 +707,8 @@ def save_cloud_model(username: str, model_id: str | None,
        "model_name": model_name.strip(),
        "provider":   provider,
        "context_k":  context_k,
        "max_rounds": max_rounds,
        "tools":      tools,
        "tags":       tags,
    }
    if account_id:
--- a/cortex/openai_orchestrator.py
+++ b/cortex/openai_orchestrator.py
@@ -273,18 +273,20 @@ async def _run_from_messages(
    final_response = ""
    budget = _context_budget(model_cfg)
-    for round_num in range(starting_round, settings.orchestrator_max_rounds):
+    per_model_limit = (model_cfg or {}).get("max_rounds") or settings.orchestrator_max_rounds
+    effective_limit = min(per_model_limit, settings.orchestrator_max_rounds)
+    for round_num in range(starting_round, effective_limit):
        messages = _compact_messages(messages, budget)
        est = _estimate_tokens(messages)
        logger.info("OpenAI orchestrator round %d / %d  model=%s  ~%d tokens",
-                    round_num + 1, settings.orchestrator_max_rounds, model_name, est)
+                    round_num + 1, effective_limit, model_name, est)
-        response = await client.chat.completions.create(
-            model=model_name,
-            messages=messages,
-            tools=active_tools,
-            tool_choice="auto",
-        )
+        call_kwargs: dict = {"model": model_name, "messages": messages}
+        if active_tools:
+            call_kwargs["tools"] = active_tools
+            call_kwargs["tool_choice"] = "auto"
+        response = await client.chat.completions.create(**call_kwargs)
        choice = response.choices[0]
        msg = choice.message
@@ -339,12 +341,11 @@ async def _run_from_messages(
                    tool_call_log.append({"tool": pt["name"], "args": pt["args"], "result": "[awaiting confirmation]"})
                    messages.append({"role": "tool", "tool_call_id": pt["tool_call_id"], "content": placeholder})
-                conf_resp = await client.chat.completions.create(
-                    model=model_name,
-                    messages=messages,
-                    tools=active_tools,
-                    tool_choice="none",
-                )
+                messages = _compact_messages(messages, budget)
+                conf_call: dict = {"model": model_name, "messages": messages, "tool_choice": "none"}
+                if active_tools:
+                    conf_call["tools"] = active_tools
+                conf_resp = await client.chat.completions.create(**conf_call)
                final_response = conf_resp.choices[0].message.content or (
                    "This action requires your explicit confirmation before it can proceed."
                )
@@ -375,9 +376,9 @@ async def _run_from_messages(
            break
    else:
-        logger.warning("OpenAI orchestrator hit max rounds (%d)", settings.orchestrator_max_rounds)
+        logger.warning("OpenAI orchestrator hit max rounds (%d)", effective_limit)
        final_response = (
-            f"Reached the tool iteration limit ({settings.orchestrator_max_rounds} rounds). "
+            f"Reached the tool iteration limit ({effective_limit} rounds). "
            "Here is what was gathered:\n\n"
            + "\n\n".join(f"**{t['tool']}**: {t['result'][:500]}" for t in tool_call_log)
        )
@@ -405,7 +406,10 @@ def _build_client(
    if host_type == "openwebui":
        base_url = base_url + "/api"
    client = AsyncOpenAI(base_url=base_url, api_key=api_key)
-    active_tools = get_openai_tools_for_role(user_role, tool_list)
+    if model_cfg.get("tools") is False:
+        active_tools = []
+    else:
+        active_tools = get_openai_tools_for_role(user_role, tool_list)
    return client, model_name, active_tools
--- a/cortex/routers/chat.py
+++ b/cortex/routers/chat.py
@@ -295,6 +295,53 @@ async def rename_session_endpoint(
    return {"ok": True, "session_id": session_id, "name": req.name.strip()}
+@router.post("/api/sessions/backfill-names")
+async def backfill_session_names(
+    request: Request,
+    user: str = Query(""),
+    persona: str = Query(""),
+) -> dict:
+    """Name every unnamed session using its first user message (truncated to 60 chars).
+    Idempotent — only touches sessions that have no name set.
+    user/persona default to the JWT session user + last-used persona cookie."""
+    # Resolve user from JWT if not provided
+    if not user:
+        token = request.cookies.get(COOKIE_NAME)
+        if not token:
+            raise HTTPException(status_code=401, detail="Not authenticated")
+        try:
+            user = decode_token(token)
+        except jwt.InvalidTokenError:
+            raise HTTPException(status_code=401, detail="Invalid session")
+    # Resolve persona from cookie if not provided
+    if not persona:
+        from persona import list_user_personas
+        persona_cookie = request.cookies.get("cx_last_persona", "")
+        available = list_user_personas(user)
+        persona = persona_cookie if persona_cookie in available else (available[0] if available else "")
+    if not persona:
+        raise HTTPException(status_code=400, detail="No persona found for user")
+    _set_ctx(user, persona)
+    sessions = list_all()
+    named = 0
+    for s in sessions:
+        if s.get("name"):
+            continue
+        messages = load_session(s["session_id"])
+        first_user = next((m for m in messages if m.get("role") == "user"), None)
+        if not first_user:
+            continue
+        text = (first_user.get("content") or "").strip()
+        if not text:
+            continue
+        auto_name = text[:60].rstrip() + ("…" if len(text) > 60 else "")
+        rename_session(s["session_id"], auto_name)
+        named += 1
+    return {"ok": True, "named": named, "total": len(sessions)}
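The naming rule in `backfill_session_names` is easy to check in isolation. `auto_session_name` below is a hypothetical standalone restatement of the same expression, not a function in the codebase.

```python
def auto_session_name(text: str, limit: int = 60) -> str:
    """First `limit` characters of the message, trailing whitespace removed,
    with an ellipsis appended only when the message was actually cut."""
    text = text.strip()
    return text[:limit].rstrip() + ("…" if len(text) > limit else "")
```

Running `rstrip()` on the slice before appending the ellipsis matters when the 60-character cut lands on a space; otherwise the name would end in a stray space followed by the ellipsis.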
@router.delete("/sessions/{session_id}")
 async def delete_session_endpoint(
    session_id: str,
--- a/cortex/routers/distill.py
+++ b/cortex/routers/distill.py
@@ -1,25 +1,50 @@
 """
 Manual memory distillation endpoints.
-  POST /distill/short  — roll session logs → MEMORY_SHORT.md (no LLM)
-  POST /distill/mid    — summarize short   → MEMORY_MID.md   (LLM)
-  POST /distill/long   — integrate mid     → MEMORY_LONG.md  (LLM)
-  POST /distill/all    — run all three in sequence
-All endpoints require ?user=<username>&persona=<name> query params so distillation
-targets the correct persona. Without them, the request is rejected (no silent fallback
-to server defaults — that caused wrong-user distillation in a multi-user setup).
+  POST /distill/short    — roll session logs → MEMORY_SHORT.md (no LLM)
+  POST /distill/mid      — summarize short   → MEMORY_MID.md   (LLM)
+  POST /distill/long     — integrate mid     → MEMORY_LONG.md  (LLM)
+  POST /distill/all      — run all three in sequence
+  POST /distill/rebuild  — wipe mid + long, then run all three from scratch
+All endpoints require ?user=<username>&persona=<name> query params.
+
+Concurrency: one distillation at a time per persona. A second request while one
+is running returns 409 immediately — no silent queuing.
 """
 import asyncio
 from datetime import datetime, timedelta
 from fastapi import APIRouter, HTTPException, Query
 from memory_distiller import distill_short, distill_mid, distill_long
-from persona import validate as validate_persona, set_context
+from persona import validate as validate_persona, set_context, persona_path as _persona_path
 import scheduler
 router = APIRouter(prefix="/distill")
 # Per-persona asyncio lock. Key: (user, persona)
 _LOCKS: dict[tuple, asyncio.Lock] = {}
 _LOCKS_META: dict[tuple, str] = {}  # key → which step is currently running
 # Minimum time between successive runs of each endpoint, per persona.
 # Prevents accidental rapid-fire runs and token waste.
 _COOLDOWNS: dict[str, timedelta] = {
    "short":   timedelta(minutes=1),
    "mid":     timedelta(minutes=30),
    "long":    timedelta(hours=6),
    "all":     timedelta(hours=1),
    "rebuild": timedelta(hours=6),
 }
 _LAST_RUN: dict[tuple, datetime] = {}  # key: (user, persona, endpoint)
 def _get_lock(user: str, persona: str) -> asyncio.Lock:
    key = (user, persona)
    if key not in _LOCKS:
        _LOCKS[key] = asyncio.Lock()
    return _LOCKS[key]
 def _resolve(user: str, persona: str) -> tuple[str, str]:
    """Validate and set persona context. Raises 404 if the persona doesn't exist."""
    try:
        u, p = validate_persona(user, persona)
    except Exception:
@@ -28,13 +53,51 @@ def _resolve(user: str, persona: str) -> tuple[str, str]:
    return u, p
 def _check_lock(user: str, persona: str) -> asyncio.Lock:
    """Return the lock if free, raise 409 if already held."""
    lock = _get_lock(user, persona)
    if lock.locked():
        step = _LOCKS_META.get((user, persona), "distillation")
        raise HTTPException(
            status_code=409,
            detail=f"A {step} is already running for {persona} — please wait for it to finish.",
        )
    return lock
 def _check_cooldown(user: str, persona: str, endpoint: str) -> None:
    """Raise 429 if the endpoint was run too recently for this persona."""
    cooldown = _COOLDOWNS.get(endpoint)
    if not cooldown:
        return
    key = (user, persona, endpoint)
    last = _LAST_RUN.get(key)
    if last:
        elapsed = datetime.now() - last
        if elapsed < cooldown:
            remaining = cooldown - elapsed
            mins = int(remaining.total_seconds() // 60)
            secs = int(remaining.total_seconds() % 60)
            wait = f"{mins}m {secs}s" if mins else f"{secs}s"
            raise HTTPException(
                status_code=429,
                detail=f"{endpoint} was just run — please wait {wait} before running again.",
            )
 def _record_run(user: str, persona: str, endpoint: str) -> None:
    _LAST_RUN[(user, persona, endpoint)] = datetime.now()
@router.get("/status")
 async def distill_status() -> dict:
    """Show auto-distillation schedule and next run times."""
    from config import settings
    # Include which personas are currently distilling
    active = [f"{u}/{p}" for (u, p), lock in _LOCKS.items() if lock.locked()]
    return {
        "enabled": settings.auto_distill,
        "jobs": scheduler.status(),
        "active": active,
        "config": {
            "short": settings.auto_distill_short,
            "mid": settings.auto_distill_mid,
@@ -49,7 +112,16 @@ async def do_distill_short(
    persona: str = Query(...),
 ) -> dict:
    u, p = _resolve(user, persona)
-    return {"ok": True, **distill_short(u, p)}
+    _check_cooldown(u, p, "short")
+    lock = _check_lock(u, p)
+    async with lock:
+        _LOCKS_META[(u, p)] = "short distill"
+        try:
+            result = distill_short(u, p)
+            _record_run(u, p, "short")
+            return {"ok": True, **result}
+        finally:
+            _LOCKS_META.pop((u, p), None)
@router.post("/mid")
@@ -58,8 +130,17 @@ async def do_distill_mid(
    persona: str = Query(...),
 ) -> dict:
    u, p = _resolve(user, persona)
-    result = await distill_mid(u, p)
-    return {"ok": "error" not in result, **result}
+    _check_cooldown(u, p, "mid")
+    lock = _check_lock(u, p)
+    async with lock:
+        _LOCKS_META[(u, p)] = "mid distill"
+        try:
+            result = await distill_mid(u, p)
+            if "error" not in result:
+                _record_run(u, p, "mid")
+            return {"ok": "error" not in result, **result}
+        finally:
+            _LOCKS_META.pop((u, p), None)
@router.post("/long")
@@ -68,8 +149,17 @@ async def do_distill_long(
    persona: str = Query(...),
 ) -> dict:
    u, p = _resolve(user, persona)
-    result = await distill_long(u, p)
-    return {"ok": "error" not in result, **result}
+    _check_cooldown(u, p, "long")
+    lock = _check_lock(u, p)
+    async with lock:
+        _LOCKS_META[(u, p)] = "long distill"
+        try:
+            result = await distill_long(u, p)
+            if "error" not in result:
+                _record_run(u, p, "long")
+            return {"ok": "error" not in result, **result}
+        finally:
+            _LOCKS_META.pop((u, p), None)
@router.post("/all")
@@ -78,14 +168,71 @@ async def do_distill_all(
    persona: str = Query(...),
 ) -> dict:
    u, p = _resolve(user, persona)
-    short_result = distill_short(u, p)
-    mid_result = await distill_mid(u, p)
-    if "error" in mid_result:
-        return {"ok": False, "short": short_result, "mid": mid_result}
-    long_result = await distill_long(u, p)
-    return {
-        "ok": "error" not in long_result,
-        "short": short_result,
-        "mid": mid_result,
-        "long": long_result,
-    }
+    _check_cooldown(u, p, "all")
+    lock = _check_lock(u, p)
+    async with lock:
+        _LOCKS_META[(u, p)] = "full distill"
+        try:
+            short_result = distill_short(u, p)
+            mid_result = await distill_mid(u, p)
+            if "error" in mid_result:
+                return {"ok": False, "short": short_result, "mid": mid_result}
+            long_result = await distill_long(u, p)
+            ok = "error" not in long_result
+            if ok:
+                _record_run(u, p, "all")
+            return {
+                "ok": ok,
+                "short": short_result,
+                "mid": mid_result,
+                "long": long_result,
+            }
+        finally:
+            _LOCKS_META.pop((u, p), None)
+@router.post("/rebuild")
+async def do_distill_rebuild(
+    user: str = Query(...),
+    persona: str = Query(...),
+) -> dict:
+    """Wipe MEMORY_MID and MEMORY_LONG (with backups), then run short → mid → long.
+    Use when memories have drifted, been corrupted, or you want a clean slate
+    rebuilt purely from session logs. Hand-edited content will be replaced.
+    """
+    u, p = _resolve(user, persona)
+    _check_cooldown(u, p, "rebuild")
+    lock = _check_lock(u, p)
+    async with lock:
+        _LOCKS_META[(u, p)] = "memory rebuild"
+        try:
+            from memory_distiller import _rotate_backup
+            persona_dir = _persona_path(u, p)
+            # Back up then wipe mid and long before rebuilding
+            for name in ("MEMORY_MID.md", "MEMORY_LONG.md"):
+                path = persona_dir / name
+                if path.exists():
+                    _rotate_backup(path)
+                    stamp = datetime.now().strftime("%Y-%m-%d %H:%M")
+                    path.write_text(f"# {name}\n\n*Cleared for rebuild — {stamp}.*\n")
+            short_result = distill_short(u, p)
+            mid_result = await distill_mid(u, p)
+            if "error" in mid_result:
+                return {"ok": False, "short": short_result, "mid": mid_result, "rebuilt": True}
+            long_result = await distill_long(u, p)
+            ok = "error" not in long_result
+            if ok:
+                _record_run(u, p, "rebuild")
+            return {
+                "ok": ok,
+                "short": short_result,
+                "mid": mid_result,
+                "long": long_result,
+                "rebuilt": True,
+            }
+        finally:
+            _LOCKS_META.pop((u, p), None)
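The distill endpoints combine two guards: a per-persona `asyncio.Lock` that is checked rather than awaited (busy means 409, never a queue) and a per-endpoint cooldown (too soon means 429). Below is a framework-free sketch of the same pattern; `Busy`, `TooSoon`, and `DistillGuard` are hypothetical names standing in for the FastAPI `HTTPException` responses and the module-level dicts.

```python
import asyncio
from datetime import datetime, timedelta
from typing import Any, Awaitable, Callable


class Busy(Exception):
    """Stand-in for HTTPException(status_code=409)."""


class TooSoon(Exception):
    """Stand-in for HTTPException(status_code=429)."""


class DistillGuard:
    """One lock per (user, persona) plus a cooldown per (user, persona, endpoint)."""

    def __init__(self, cooldowns: dict[str, timedelta]) -> None:
        self.cooldowns = cooldowns
        self.locks: dict[tuple[str, str], asyncio.Lock] = {}
        self.last_run: dict[tuple[str, str, str], datetime] = {}

    def check_cooldown(self, user: str, persona: str, endpoint: str) -> None:
        cooldown = self.cooldowns.get(endpoint)
        last = self.last_run.get((user, persona, endpoint))
        if cooldown and last:
            elapsed = datetime.now() - last
            if elapsed < cooldown:
                raise TooSoon(f"{endpoint}: wait {cooldown - elapsed}")

    def check_lock(self, user: str, persona: str) -> asyncio.Lock:
        key = (user, persona)
        if key not in self.locks:
            self.locks[key] = asyncio.Lock()
        lock = self.locks[key]
        if lock.locked():  # fail fast instead of queuing behind the running job
            raise Busy(f"a distillation is already running for {persona}")
        return lock

    async def run(self, user: str, persona: str, endpoint: str,
                  job: Callable[[], Awaitable[Any]]) -> Any:
        self.check_cooldown(user, persona, endpoint)
        lock = self.check_lock(user, persona)
        async with lock:
            result = await job()
            # Only a run that returned normally starts the cooldown clock.
            self.last_run[(user, persona, endpoint)] = datetime.now()
            return result
```

Checking `lock.locked()` and then entering `async with lock` is safe only because no `await` separates the two steps, so no other coroutine on the same event loop can take the lock in between.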
--- a/cortex/routers/files.py
+++ b/cortex/routers/files.py
@@ -27,10 +27,21 @@ ALLOWED = {
    "MEMORY_SHORT.bak1.md",
    "MEMORY_SHORT.bak2.md",
    "HELP.md",
+    # Agent private notes — backups only; AGENT_NOTES.md itself is agent-only
+    "AGENT_NOTES.bak1.md",
+    "AGENT_NOTES.bak2.md",
+    "AGENT_NOTES.bak3.md",
 }
+# Files that can be read via the panel but not written by users
+READ_ONLY = {
+    "AGENT_NOTES.bak1.md",
+    "AGENT_NOTES.bak2.md",
+    "AGENT_NOTES.bak3.md",
+}
 # Files served from home/{user}/ instead of persona path
-USER_FILES = {"email_allowlist.json"}
+USER_FILES = {"email_allowlist.json", "usage.json"}
 def _resolve(user: str, persona: str) -> None:
@@ -92,7 +103,11 @@ async def get_file(
    p = _path(filename, user=user)
    if not p.exists():
        raise HTTPException(status_code=404, detail=f"{filename} does not exist")
-    return {"name": filename, "content": p.read_text()}
+    return {
+        "name": filename,
+        "content": p.read_text(),
+        "readonly": filename in READ_ONLY,
+    }
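The files router now distinguishes three sets: `ALLOWED`, `READ_ONLY` (readable, but saves are rejected with 403), and `USER_FILES` (served from `home/{user}/` instead of the persona directory). Below is a hypothetical, framework-free sketch of that routing. `PermissionError` stands in for the 403, the sets are trimmed examples, and rejecting names outside `ALLOWED` is an assumption about code not shown in this diff.

```python
from pathlib import Path

# Trimmed, illustrative sets; the real ALLOWED set is longer.
ALLOWED = {"HELP.md", "AGENT_NOTES.bak1.md", "email_allowlist.json", "usage.json"}
READ_ONLY = {"AGENT_NOTES.bak1.md"}
USER_FILES = {"email_allowlist.json", "usage.json"}


def resolve_panel_file(filename: str, home: Path, user: str, persona_dir: Path,
                       for_write: bool = False) -> Path:
    """Map a panel filename to a path, enforcing the allowlist and read-only set."""
    if filename not in ALLOWED:
        raise FileNotFoundError(f"{filename} is not exposed to the files panel")
    if for_write and filename in READ_ONLY:
        raise PermissionError(f"{filename} is read-only.")
    # Per-user files live directly under home/{user}/; everything else is per persona.
    base = home / user if filename in USER_FILES else persona_dir
    return base / filename
```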
 class FileWrite(BaseModel):
@@ -106,6 +121,8 @@ async def save_file(
    user: str = Query("scott"),
    persona: str = Query("inara"),
 ) -> dict:
+    if filename in READ_ONLY:
+        raise HTTPException(status_code=403, detail=f"{filename} is read-only.")
    _resolve(user, persona)
    p = _path(filename, user=user)
    p.write_text(req.content)
--- a/cortex/routers/local_llm.py
+++ b/cortex/routers/local_llm.py
@@ -159,7 +159,8 @@ def _render(username: str, success: str = "", error: str = "") -> str:
        else:
            secondary = default_secondary
-        ctx      = f'<span class="ctx-badge">{m.get("context_k",0)}k</span>' if m.get("context_k") else ""
+        ctx       = f'<span class="ctx-badge">{m.get("context_k",0)}k</span>' if m.get("context_k") else ""
+        no_tools  = '' if m.get("tools", True) else '<span class="pbadge pb-notools">no tools</span>'
        tags_html = " ".join(f'<span class="tag">{t}</span>' for t in (m.get("tags") or []))
        sec      = f'<span class="model-host">{secondary}</span>' if secondary else ""
@@ -201,13 +202,15 @@ def _render(username: str, success: str = "", error: str = "") -> str:
        cur_label      = m.get("label", "")
        cur_model_name = m.get("model_name", "")
        cur_ctx        = m.get("context_k", 0) or 0
+        cur_max_rounds = m.get("max_rounds") or 0
+        cur_tools      = m.get("tools", True)
        cur_tags       = ", ".join(m.get("tags") or [])
        model_rows += f'''
        <div class="model-row" id="model-{m["id"]}">
          <div class="model-row-header">
            <div class="model-info">
-              <div>{badge}<span class="model-label">{m.get("label") or m.get("model_name","")}</span>{ctx}</div>
+              <div>{badge}<span class="model-label">{m.get("label") or m.get("model_name","")}</span>{ctx}{no_tools}</div>
              <span class="model-name">{m.get("model_name","")}</span>
              {sec}
              <div class="tag-row">{tags_html}</div>
@@ -239,8 +242,22 @@ def _render(username: str, success: str = "", error: str = "") -> str:
            {extra_fields}
            <div class="field-row">
              <div class="field" style="flex:0 0 auto">
-                <label>Context (k)</label>
-                <input type="number" name="context_k" value="{cur_ctx}" min="0">
+                <label title="Context window size in thousands of tokens. 0 = assume 32k.">Context (k)</label>
+                <input type="number" name="context_k" value="{cur_ctx}" min="0"
+                       title="Context window size in thousands of tokens. 0 = assume 32k (compaction budget ~24k tokens).">
              </div>
+              <div class="field" style="flex:0 0 auto">
+                <label title="Per-model tool loop cap. 0 = use the global default (orchestrator_max_rounds).">Max rounds</label>
+                <input type="number" name="max_rounds" value="{cur_max_rounds}" min="0"
+                       title="Per-model tool loop cap. 0 = use the global default (orchestrator_max_rounds).">
+              </div>
+              <div class="field" style="flex:0 0 auto">
+                <label title="Whether this model supports tool calling. If not supported, requests skip the tool loop entirely.">Tool calling</label>
+                <select name="tools"
+                        title="Whether this model supports tool calling. If not supported, requests skip the tool loop entirely.">
+                  <option value="1" {'selected' if cur_tools else ''}>Supported</option>
+                  <option value="0" {'' if cur_tools else 'selected'}>Not supported</option>
+                </select>
+              </div>
              <div class="field">
                <label>Tags</label>
@@ -426,6 +443,8 @@ async def add_model(
    provider:         str = Form("local"),
    label:            str = Form(""),
    context_k:        int = Form(0),
+    max_rounds:       int = Form(0),
+    tools:            int = Form(1),
    tags:             str = Form(""),
    # local-only fields
    host_id:          str = Form(""),
@@ -439,14 +458,17 @@ async def add_model(
    if not username:
        return RedirectResponse("/login", status_code=302)
-    tag_list = [t.strip() for t in tags.split(",") if t.strip()]
+    tag_list    = [t.strip() for t in tags.split(",") if t.strip()]
    max_rounds_ = max_rounds or None
    tools_bool  = tools != 0
    if provider == "local":
        if not model_name.strip():
            return HTMLResponse(_render(username, error="Model name is required."))
        if not host_id.strip():
            return HTMLResponse(_render(username, error="Select a host."))
-        reg.save_model(username, None, host_id, label, model_name, context_k, tag_list)
+        reg.save_model(username, None, host_id, label, model_name, context_k, tag_list,
                       max_rounds=max_rounds_, tools=tools_bool)
        display = label or model_name
    elif provider in ("google", "anthropic"):
@@ -459,6 +481,7 @@ async def add_model(
            account_id=account_id or None,
            credential_id=credential_id or None,
            context_k=context_k, tags=tag_list,
            max_rounds=max_rounds_, tools=tools_bool,
        )
        display = label or cloud_model_name
    else:
@@ -476,6 +499,8 @@ async def edit_model(
    label:         str = Form(""),
    model_name:    str = Form(""),
    context_k:     int = Form(0),
    max_rounds:    int = Form(0),
    tools:         int = Form(1),
    tags:          str = Form(""),
    host_id:       str = Form(""),
    account_id:    str = Form(""),
@@ -486,17 +511,22 @@ async def edit_model(
        return RedirectResponse("/login", status_code=302)
    if not model_name.strip():
        return HTMLResponse(_render(username, error="Model name is required."))
-    tag_list = [t.strip() for t in tags.split(",") if t.strip()]
+    tag_list    = [t.strip() for t in tags.split(",") if t.strip()]
    max_rounds_ = max_rounds or None
    tools_bool  = tools != 0
    if mtype == "local_openai":
        if not host_id.strip():
            return HTMLResponse(_render(username, error="Select a host for this model."))
-        reg.save_model(username, model_id, host_id, label, model_name, context_k, tag_list)
+        reg.save_model(username, model_id, host_id, label, model_name, context_k, tag_list,
                       max_rounds=max_rounds_, tools=tools_bool)
    elif mtype == "gemini_api":
        reg.save_cloud_model(username, model_id, "google", model_name, label,
-                             account_id=account_id or None, context_k=context_k, tags=tag_list)
+                             account_id=account_id or None, context_k=context_k, tags=tag_list,
                             max_rounds=max_rounds_, tools=tools_bool)
    elif mtype == "claude_cli":
        reg.save_cloud_model(username, model_id, "anthropic", model_name, label,
-                             credential_id=credential_id or "cli", context_k=context_k, tags=tag_list)
+                             credential_id=credential_id or "cli", context_k=context_k, tags=tag_list,
                             max_rounds=max_rounds_, tools=tools_bool)
    else:
        return HTMLResponse(_render(username, error=f"Unknown model type: {mtype}"))
    display = label.strip() or model_name.strip()
--- a/cortex/routers/onboarding.py
+++ b/cortex/routers/onboarding.py
@@ -1,11 +1,13 @@
 """
-Onboarding router — invite-based setup + persona creation.
+Onboarding router — invite-based setup + persona creation + model connect.
 Routes:
  GET  /setup/{token}      → show password setup form (step 1)
  POST /setup/{token}      → set password, redirect to persona step
  GET  /setup/persona      → show persona creation form (step 2, requires auth)
-  POST /setup/persona      → create persona, redirect to /{user}/{persona}
+  POST /setup/persona      → create persona, redirect to /setup/model
  GET  /setup/model        → OpenRouter quick-connect (step 3, also standalone)
  POST /setup/model        → save host + model + assign to chat role, redirect to chat
  POST /setup/model/skip   → skip step 3, redirect to the remembered persona (or user root)
 """
 import logging
@@ -21,6 +23,7 @@ from auth_utils import (
 )
 from persona_template import create_persona
 from persona import list_user_personas, validate as validate_persona
 import model_registry
 logger = logging.getLogger(__name__)
 router = APIRouter(prefix="/setup")
@@ -114,7 +117,11 @@ async def persona_submit(
        description=description.strip(),
    )
    logger.info("persona created: %s/%s", username, persona_name)
-    return RedirectResponse(f"/{username}/{persona_name}", status_code=302)
+    # Step 3: guided model setup before entering the chat
    resp = RedirectResponse("/setup/model", status_code=302)
    # Remember which persona to land on after model setup
    resp.set_cookie("cx_setup_persona", f"{username}/{persona_name}", max_age=3600, httponly=True, samesite="lax")
    return resp
 # ---------------------------------------------------------------------------
@@ -178,3 +185,126 @@ async def setup_submit(
        return resp
    return HTMLResponse(_setup_page("Unknown step."), status_code=400)
 # ---------------------------------------------------------------------------
 # Step 3 — model connect (OpenRouter quick-connect, also standalone)
 # ---------------------------------------------------------------------------
 # Curated model list shown in the Step 3 dropdown.
 _OPENROUTER_MODELS = [
    ("anthropic/claude-3-5-haiku-20241022",  "Claude 3.5 Haiku — Fast & affordable"),
    ("anthropic/claude-3-7-sonnet-20250219", "Claude 3.7 Sonnet — Smarter Claude"),
    ("google/gemini-2.0-flash-001",          "Gemini 2.0 Flash — Fast Google model"),
    ("meta-llama/llama-3.3-70b-instruct",    "Llama 3.3 70B — Open source"),
 ]
 def _model_page(error: str = "", from_setup: bool = False) -> str:
    html = (_STATIC / "setup.html").read_text()
    # Hide steps 1 and 2 inline; show step 3
    html = html.replace('<div id="step-password">', '<div id="step-password" style="display:none">')
    # step-persona already ships with style="display:none", so it needs no replacement
    html = html.replace('<div id="step-model" style="display:none">', '<div id="step-model">')
    if from_setup:
        html = html.replace("<!-- SETUP_STEP3_LABEL -->", "Step 3 of 3")
    if error:
        html = html.replace("<!-- ERROR_MODEL -->", f'<p class="error">{error}</p>')
    return html
@router.post("/model/skip", include_in_schema=False)
 async def model_skip(request: Request):
    """Skip model setup — redirect to the remembered persona or user root."""
    from auth_utils import decode_token
    import jwt
    token = request.cookies.get(COOKIE_NAME)
    username = None
    if token:
        try:
            username = decode_token(token)
        except jwt.InvalidTokenError:
            pass
    dest_cookie = request.cookies.get("cx_setup_persona", "")
    dest = f"/{dest_cookie}" if dest_cookie else (f"/{username}" if username else "/")
    resp = RedirectResponse(dest, status_code=302)
    resp.delete_cookie("cx_setup_persona")
    return resp
@router.get("/model", include_in_schema=False)
 async def model_page(request: Request):
    from auth_utils import decode_token
    import jwt
    token = request.cookies.get(COOKIE_NAME)
    if not token:
        return RedirectResponse("/login", status_code=302)
    try:
        decode_token(token)
    except jwt.InvalidTokenError:
        return RedirectResponse("/login", status_code=302)
    from_setup = bool(request.cookies.get("cx_setup_persona"))
    return HTMLResponse(_model_page(from_setup=from_setup))
@router.post("/model", include_in_schema=False)
 async def model_submit(
    request: Request,
    api_key: str = Form(...),
    model_name: str = Form(...),
 ):
    from auth_utils import decode_token
    import jwt
    token = request.cookies.get(COOKIE_NAME)
    if not token:
        return RedirectResponse("/login", status_code=302)
    try:
        username = decode_token(token)
    except jwt.InvalidTokenError:
        return RedirectResponse("/login", status_code=302)
    api_key = api_key.strip()
    model_name = model_name.strip()
    if not api_key:
        from_setup = bool(request.cookies.get("cx_setup_persona"))
        return HTMLResponse(_model_page("API key is required.", from_setup=from_setup), status_code=422)
    # Save OpenRouter as a host
    host_id = model_registry.save_host(
        username=username,
        host_id=None,
        label="OpenRouter",
        api_url="https://openrouter.ai/api/v1",
        api_key=api_key,
        host_type="openai",
    )
    # Find label for selected model
    label = next((lbl for mn, lbl in _OPENROUTER_MODELS if mn == model_name), model_name)
    label = label.split(" — ")[0]  # keep just the model name part
    # Save model entry
    mid = model_registry.save_model(
        username=username,
        model_id=None,
        host_id=host_id,
        label=label,
        model_name=model_name,
        context_k=128,
        tools=True,
    )
    # Assign as chat role primary
    model_registry.set_role(username, "chat", "primary", mid)
    logger.info("openrouter setup complete: %s → %s", username, model_name)
    # Redirect to chat (use remembered persona, or user root)
    dest_cookie = request.cookies.get("cx_setup_persona", "")
    dest = f"/{dest_cookie}" if dest_cookie else f"/{username}"
    resp = RedirectResponse(dest, status_code=302)
    resp.delete_cookie("cx_setup_persona")
    return resp
--- a/cortex/session_store.py
+++ b/cortex/session_store.py
@@ -112,16 +112,17 @@ def list_all() -> list[dict]:
    if not d.exists():
        return []
    results = []
-    for f in sorted(d.glob("*.json"), key=lambda p: p.stat().st_mtime, reverse=True):
+    for f in d.glob("*.json"):
        try:
            data = json.loads(f.read_text())
-            entry = {
+            results.append({
                "session_id": data["session_id"],
                "name": data.get("name", ""),
                "updated": data.get("updated"),
                "message_count": len(data.get("messages", [])),
-            }
+                # str() keeps keys comparable when files without "updated" fall back to the float mtime
+                "_sort_key": str(data.get("updated") or f.stat().st_mtime),
-            results.append(entry)
+            })
        except Exception:
            pass
    results.sort(key=lambda s: s.pop("_sort_key"), reverse=True)
    return results
--- a/cortex/static/HELP.md
+++ b/cortex/static/HELP.md
@@ -6,7 +6,24 @@
     and are appended automatically by help.html when present.
 -->
-*Last updated: 2026-05-05*
+*Last updated: 2026-05-08*
 ---
 ## Getting Started
 If this is your first time using Cortex, you need one thing before the chat will work: an AI model connected to your account.
 **Fastest path — OpenRouter:**
 OpenRouter gives you access to Claude, Gemini, and dozens of other models with a single API key.
 1. Get a free API key at [openrouter.ai/keys](https://openrouter.ai/keys)
 2. Go to **☰ → Account → [Set up OpenRouter →]** (shown automatically if no model is configured)
 3. Paste your key, pick a starting model, click **Connect**
 That's it — you're ready to chat.
 **Already past setup but seeing errors?** Go to **☰ → Account → Model Registry → Manage models** and confirm a model is assigned to the **Chat** role (Primary slot). If all slots are empty, add a model first.
 ---
@@ -52,19 +69,45 @@ Click the **⚡** button in the input row to enable the Tools toggle. When lit (
 The orchestrator runs a multi-step tool loop:
-1. The **orchestrator model** reasons about the request and calls tools as needed — web search, file reads, task management, shell commands, Aether Journals, and more
+1. The **orchestrator model** reasons about the request and calls tools as needed
 2. It produces an enriched summary of what it found
 3. The **responder model** (set by the active Role) receives that context and writes the final user-facing reply
 4. A `⚡ N tool calls: …` note appears below the response listing what was used
-The ⚡ toggle is **independent of the Role selector** — you can use any role (chat, coder, research, etc.) with or without tools. The orchestrator model is configured in **Account → Model Registry → Role Assignments → Orchestrator**. By default this is Gemini API.
+The ⚡ toggle is **independent of the Role selector** — you can use any role (chat, coder, research, etc.) with or without tools. The orchestrator model is configured in **Account → Model Registry → Role Assignments → Orchestrator**.
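Each round of the tool loop adds results to the orchestrator's context, so long loops can crowd out smaller models. The orchestrator budgets roughly 75% of the model's `context_k` (minimum 16k tokens; `context_k = 0` is treated as 32k, giving about 24k) and truncates older tool results as that budget is approached. The sketch below illustrates the policy under stated assumptions: treating "k" as 1,000 tokens, the 4-characters-per-token estimate, the number of recent results kept intact, and the truncation stub are all hypothetical choices, not Cortex's actual values.

```python
def context_budget_tokens(context_k: int) -> int:
    """75% of the context window, never below 16k; context_k=0 means 'assume 32k'."""
    window = (context_k or 32) * 1000
    return max(int(window * 0.75), 16_000)


def estimate_tokens(text: str) -> int:
    # Crude ~4 characters per token estimate (an assumption, not a real tokenizer)
    return len(text) // 4 + 1


def compact(messages: list[dict], budget: int, keep_recent: int = 2, stub_chars: int = 200) -> list[dict]:
    """Truncate tool results, oldest first, until the estimate fits the budget.

    The newest `keep_recent` tool results are never touched, so the result can still
    exceed the budget if those are large. The input list is not modified.
    """
    msgs = [dict(m) for m in messages]
    tool_idx = [i for i, m in enumerate(msgs) if m["role"] == "tool"]
    candidates = tool_idx[:-keep_recent] if keep_recent else tool_idx

    def total() -> int:
        return sum(estimate_tokens(m["content"]) for m in msgs)

    for i in candidates:
        if total() <= budget:
            break
        content = msgs[i]["content"]
        if len(content) > stub_chars:
            msgs[i]["content"] = content[:stub_chars] + " …[truncated]"
    return msgs
```

Truncating old tool results rather than dropping whole turns keeps the conversation's structure intact while reclaiming most of the space, since raw tool output is usually the bulkiest part of an orchestrated context.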
 The full tool reference is in the **Tools** tab: 44 tools across web, files, shell, system, tasks, cron, reminders, scratchpad, notifications, Aether Journals, and Agent Notes.
 Tools mode is best for tasks requiring research, multi-step reasoning, or side effects (e.g. "search for X", "add a task", "what's on my list?", "append this to my journal"). Regular chat is faster for conversational turns.
 Orchestrated sessions persist to history exactly like regular chat.
 ### Available Tools
 44 tools in total. The table below covers 11 categories; a few admin-only extras such as `ae_task_list` are documented only in the **Tools** tab. Each tool schema is sent to the model on every orchestrated call, so fewer active tools means fewer tokens per call.
 | Category | Tools |
 |---|---|
 | **Web** | `web_search`, `http_fetch` |
 | **Files** | `file_read`, `file_list`, `file_write` |
 | **Shell** | `shell_exec`, `claude_allow_dir` |
 | **System** | `cortex_restart`, `cortex_logs`, `cortex_status`, `cortex_update` |
 | **Tasks** | `task_list`, `task_create`, `task_update`, `task_complete` |
 | **Cron** | `cron_list`, `cron_add`, `cron_remove`, `cron_toggle` |
 | **Reminders** | `reminders_add`, `reminders_list`, `reminders_remove`, `reminders_clear` |
 | **Scratchpad** | `scratch_read`, `scratch_write`, `scratch_append`, `scratch_clear` |
 | **Notifications** | `web_push`, `email_send`, `nc_talk_send` |
 | **Aether Journals** | `ae_journal_list/search`, `ae_journal_entries_list`, `ae_journal_entry_read/create/update/disable/append/prepend` |
 | **Agent Notes** | `agent_notes_read`, `agent_notes_write`, `agent_notes_append`, `agent_notes_clear` |
 File, Shell, System, and some Notification tools are **admin-only** and not visible to regular users.
 ### Per-Role Tool Sets
 Each role can be configured with a specific subset of tool categories. When a role has a tool subset configured, only those tools are sent to the orchestrator — the rest are invisible to the model for that session.
 **Example:** a Coder role might only need Web, Files, Shell, and Agent Notes. A Research role might only need Web. Configuring this avoids sending schemas for 30+ irrelevant tools on every call.
 Configure per-role tool sets in **Account → Model Registry → Role Assignments** — expand a role card to see the category checkboxes. The default (no checkboxes selected) sends all tools the user has access to.
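A minimal sketch of how a per-role subset might be applied, assuming a hypothetical `TOOL_CATEGORIES` map (only three categories are shown) and a precomputed set of tools the user is permitted to use. An empty category list means "no subset configured", so every permitted tool is sent.

```python
# Hypothetical category map; only three of the categories are shown.
TOOL_CATEGORIES: dict[str, list[str]] = {
    "web": ["web_search", "http_fetch"],
    "files": ["file_read", "file_list", "file_write"],
    "agent_notes": ["agent_notes_read", "agent_notes_write", "agent_notes_append", "agent_notes_clear"],
}


def tools_for_role(role_categories: list[str], user_allowed: set[str]) -> list[str]:
    """Tool names whose schemas are sent to the orchestrator for one role."""
    if role_categories:
        names = [t for cat in role_categories for t in TOOL_CATEGORIES.get(cat, [])]
    else:
        # No subset configured: every tool in every category
        names = [t for tools in TOOL_CATEGORIES.values() for t in tools]
    # A role subset can only narrow the user's permissions, never widen them
    return [t for t in names if t in user_allowed]
```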
 ---
 ## Sessions
@@ -123,11 +166,59 @@ Each response shows a **model tag** (bottom-right of message) with the model lab
 ---
 ## Account Settings
 **Navigate to:** ☰ (top-right menu) → **Account**
 | Section | What you can do |
 |---|---|
 | **Account** | View your username, role badge (Admin / User), rename your username |
 | **Connected Accounts** | See which Google account is linked for OAuth sign-in |
 | **Email Allowlist** | Regex patterns controlling which addresses the `email_send` tool can reach |
 | **Notifications** | Set which channel (NC Talk, Google Chat, email) Inara uses for proactive messages |
 | **Tool Permissions** | Allow or block specific orchestrator tools for your account |
 | **Usage** | Token consumption by model — see below |
 | **Browser Cache** | Clear UI preferences stored locally (theme, font size, session ID, etc.) |
 | **Model Registry** | Configure AI providers, local hosts, and role assignments |
 | **Change Password** | Update your login password |
 | **Personas** | List and rename your personas |
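The **Email Allowlist** entries are regular expressions. A sketch of one plausible check, assuming each pattern must match the whole address and that matching is case-insensitive (neither detail is confirmed here; `email_allowed` is a hypothetical helper):

```python
import re


def email_allowed(address: str, patterns: list[str]) -> bool:
    """True if any allowlist regex matches the entire address (case-insensitive, assumed)."""
    return any(re.fullmatch(p, address.strip(), flags=re.IGNORECASE) for p in patterns)
```

Full matching matters: with `re.search`, the pattern `.*@example\.com` would also accept `bob@example.com.evil.net`.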
 ---
 ## Usage
 Token consumption is tracked automatically for API-backed models. **Navigate to:** ☰ → **Account** → **Usage** section.
 The table shows all-time totals per model key, with columns for:
 | Column | Meaning |
 |---|---|
 | **Model** | `backend/model-name` key (e.g. `gemini_api/gemini-2.5-flash`, `local/deepseek-v4`) |
 | **Calls** | Number of API calls made |
 | **Prompt** | Input tokens sent |
 | **Output** | Completion tokens received |
 | **Total** | Prompt + Output |
 Values of 1,000 or more are displayed with a `k` suffix (e.g. `24.3k`).
 **What is and isn't tracked:**
 - ✅ Gemini API calls (orchestrator, distillation)
 - ✅ Local OpenAI-compatible calls (Open WebUI, Ollama, OpenRouter)
 - ✗ Claude CLI — no structured token data is returned by the subprocess
 - ✗ Gemini CLI — same reason
 The raw data lives in `home/{username}/usage.json` and is also accessible via the Files panel or the API.
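The per-model totals and `k` formatting in the Usage table can be sketched like this. The record shape (`model`, `prompt_tokens`, `output_tokens`, one record per call) is an assumption for illustration; the actual layout of `usage.json` is not documented here, and the UI may round differently.

```python
from collections import defaultdict


def fmt_tokens(n: int) -> str:
    """1,000 and above get a 'k' suffix with one decimal place, e.g. 24300 -> '24.3k'."""
    return f"{n / 1000:.1f}k" if n >= 1000 else str(n)


def summarize(records: list[dict]) -> dict[str, dict[str, int]]:
    """All-time totals per model key: calls, prompt, output, total."""
    totals: dict[str, dict[str, int]] = defaultdict(lambda: {"calls": 0, "prompt": 0, "output": 0})
    for r in records:
        t = totals[r["model"]]
        t["calls"] += 1
        t["prompt"] += r["prompt_tokens"]
        t["output"] += r["output_tokens"]
    for t in totals.values():
        t["total"] = t["prompt"] + t["output"]
    return dict(totals)
```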
 ---
 ## Model Registry
 Configure which AI models are available and which handles each task type.
-**Navigate to:** ☰ (top-right menu) → **Account** → scroll to **Model Registry** → **Manage models →**
+**New user quick path:** ☰ → **Account** → **Set up OpenRouter →** (the guided wizard adds a host, model, and role assignment in one step).
 **Full manual path:** ☰ → **Account** → scroll to **Model Registry** → **Manage models →**
 ---
@@ -142,10 +233,16 @@ Do this before adding models — models need a provider account or local host to
 2. Enter a label (e.g. "Work", "Personal") and your API key
 3. Get a free key at [aistudio.google.com/apikey](https://aistudio.google.com/apikey)
-**Local hosts** (Open WebUI, Ollama, OpenRouter, etc.):
+**OpenRouter** (recommended for new users — one key for many models):
 1. Get a key at [openrouter.ai/keys](https://openrouter.ai/keys)
 2. Scroll to **Local Hosts** → **+ Add host**
 3. Label: "OpenRouter", URL: `https://openrouter.ai/api/v1`, paste your key, Type: OpenAI-compatible
 4. Click **Fetch models** to verify, then add models from the fetched list
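**Fetch models** verifies that the host answers. For an OpenAI-compatible host such as OpenRouter, a comparable check is a `GET` on `{api_url}/models` with a bearer token, reading the `id` of each entry in `data`. The sketch uses only the standard library; the helper names are hypothetical, and Open WebUI or Ollama hosts may expose their model lists on different paths.

```python
import json
import urllib.request


def models_request(api_url: str, api_key: str) -> urllib.request.Request:
    """Build GET {api_url}/models, adding a bearer token when a key is set."""
    req = urllib.request.Request(api_url.rstrip("/") + "/models")
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")
    return req


def parse_model_ids(body: bytes) -> list[str]:
    """OpenAI-style list response: {"data": [{"id": ...}, ...]}."""
    return sorted(m["id"] for m in json.loads(body).get("data", []))


def fetch_models(api_url: str, api_key: str, timeout: float = 10.0) -> list[str]:
    with urllib.request.urlopen(models_request(api_url, api_key), timeout=timeout) as resp:
        return parse_model_ids(resp.read())
```

Usage: `fetch_models("https://openrouter.ai/api/v1", key)` returns the sorted model IDs, or raises on a network or HTTP error.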
 **Other local hosts** (Open WebUI, Ollama, LM Studio, etc.):
 1. Scroll to **Local Hosts** → click **+ Add host** to expand the form
 2. Enter a label, the API URL (e.g. `http://192.168.1.100:3000`), and optional API key
-3. Set **Type**: Open WebUI / Ollama, or OpenAI-compatible (for OpenRouter, LM Studio, etc.)
+3. Set **Type**: Open WebUI / Ollama, or OpenAI-compatible
 4. Click **Fetch models** on the saved host card to verify connectivity
 ---
@@ -178,6 +275,8 @@ Scroll to **Role Assignments** at the bottom of the page. Each role has **Primar
 Leave all slots empty to use the server default.
 **Per-role tool sets:** Expand any role card to configure which tool categories the orchestrator can use when that role is active. Unchecked categories are hidden from the model entirely — reducing token overhead on every orchestrated call. Leaving all categories unchecked means all tools the user has access to are available (the default).
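Slot resolution can be pictured as "first filled slot wins, otherwise the server default". A minimal sketch, assuming slots are checked in order starting with Primary; `resolve_role_model` and its `is_default` flag are hypothetical, the flag standing for roughly the state the chat banner describes as "Using server default model".

```python
from typing import Optional


def resolve_role_model(slots: list[Optional[str]], server_default: str) -> tuple[str, bool]:
    """Return (model_id, is_default). Slots are checked in order, Primary first."""
    for model_id in slots:
        if model_id:  # skip empty slots (None or "")
            return model_id, False
    return server_default, True
```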
 ---
 ## Nextcloud Talk Bot
@@ -245,12 +344,12 @@ Controls how much context is prepended to each LLM call:
 | Tier | Loads | ~Tokens |
 |---|---|---|
-| **T1** | SOUL + IDENTITY + USER summary | ~1,500 |
+| **Min** | SOUL + IDENTITY + USER summary | ~1,500 |
-| **T2** | + USER full + PROTOCOLS + HELP + memory layers | ~5,000 |
+| **Std** | + USER full + PROTOCOLS + HELP + memory layers | ~5,000 |
-| **T3** | + last 2 raw session logs | ~15,000 |
+| **Ext** | + last 2 raw session logs | ~15,000 |
-| **T4** | + last 7 raw session logs | ~50,000 |
+| **Full** | + last 7 raw session logs | ~50,000 |
-Default is T2. Use T1 for small/local models. Use T3–T4 for complex multi-session tasks.
+Default is **Std**. Use **Min** for small/local models. Use **Ext** or **Full** for complex multi-session tasks.
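The tiers are cumulative except for raw session logs, where **Full** loads 7 logs instead of **Ext**'s 2. A sketch of that mapping, with labels matching the context panel buttons; the function name and return shape are hypothetical.

```python
LABELS = {1: "Min", 2: "Std", 3: "Ext", 4: "Full"}
SESSION_LOGS = {3: 2, 4: 7}  # raw session logs loaded at Ext / Full


def context_for_tier(tier: int) -> tuple[str, list[str], int]:
    """Return (label, sections loaded, number of raw session logs)."""
    if tier not in LABELS:
        raise ValueError(f"unknown tier: {tier}")
    parts = ["SOUL", "IDENTITY", "USER summary"]
    if tier >= 2:
        parts += ["USER full", "PROTOCOLS", "HELP", "memory layers"]
    return LABELS[tier], parts, SESSION_LOGS.get(tier, 0)
```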
 ### Memory Layers
@@ -318,6 +417,7 @@ For direct access or scripting:
 | `GET` | `/orchestrate/{job_id}` | Poll job status and result |
 | `GET` | `/settings/models` | Model registry UI |
 | `POST` | `/api/models/role` | Set a role assignment (JSON body) |
 | `POST` | `/api/models/role-config` | Set per-role tool list and system prompt append |
 | `GET` | `/api/push/vapid-key` | VAPID public key (for push subscription) |
 | `POST` | `/api/push/subscribe` | Register a push subscription |
 | `DELETE` | `/api/push/subscribe` | Remove a push subscription |
@@ -325,6 +425,11 @@ For direct access or scripting:
 | `GET` | `/api/audit/day?date=` | Tool call entries for a specific date (own data) |
 | `GET` | `/api/audit/recent` | Recent tool calls across days (admin) |
 | `GET` | `/api/audit/stats` | Tool call counts by tool/status/user (admin) |
 | `GET` | `/api/usage` | Full daily token usage log (own data) |
 | `GET` | `/api/usage/summary` | Per-model token totals, all time (own data) |
 | `GET` | `/api/usage/all` | Per-model totals for all users (admin) |
 | `GET` | `/setup/model` | Guided OpenRouter setup form (Step 3 / standalone) |
 | `POST` | `/setup/model` | Save OpenRouter host + model + assign to chat role |
 | `GET` | `/health` | Health check — returns `{"status": "ok"}` |
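Tool-call audit entries are stored per user and per day as JSON Lines (`home/{user}/tool_audit/YYYY-MM-DD.jsonl`), and the `/api/audit/*` endpoints expose that data. A rough sketch of reading one day and counting by tool and status; the helper names and the `tool`/`status` field names are assumptions, and the date check shows one way to keep a `date=` parameter from escaping the audit directory.

```python
import json
import re
from collections import Counter
from pathlib import Path

_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def read_audit_day(home: Path, user: str, date: str) -> list[dict]:
    """Load one day's tool-call entries (hypothetical helper)."""
    if not _DATE_RE.fullmatch(date):
        raise ValueError(f"bad date: {date!r}")  # rejects values like "../../x"
    path = home / user / "tool_audit" / f"{date}.jsonl"
    if not path.exists():
        return []
    entries = []
    for line in path.read_text().splitlines():
        if not line.strip():
            continue
        try:
            entries.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # e.g. a line cut off mid-write
    return entries


def audit_stats(entries: list[dict]) -> dict[str, Counter]:
    """Counts by tool and by status (field names assumed)."""
    return {
        "by_tool": Counter(e.get("tool", "?") for e in entries),
        "by_status": Counter(e.get("status", "?") for e in entries),
    }
```

One file per day keeps appends cheap and lets a date query open a single file instead of scanning the whole history.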
 Chat request body (`POST /chat`):
--- a/cortex/static/TOOLS.md
+++ b/cortex/static/TOOLS.md
@@ -1,6 +1,6 @@
 # Tool Reference
-> This reference covers all 40 orchestrator tools available when the ⚡ toggle is on.
+> This reference covers all 44 orchestrator tools available when the ⚡ toggle is on.
 > Tools are invoked automatically by the orchestrator — you don't call them directly.
 ¹ **Admin only** — requires the `admin` role. Invisible to regular users.  
@@ -102,3 +102,14 @@
 | Tool | What it does |
 |---|---|
 | `ae_task_list` ¹ | List tasks from the agents_sync Kanban board |
 ## Agent Notes
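 Private, durable notes the orchestrator keeps for its own use; they are not included in replies to users. Notes persist across sessions and are only available in orchestrated (tool-enabled) sessions. The three rolling backups can be viewed read-only in the Files panel under **Agent Notes (read-only)**.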
 | Tool | What it does |
 |---|---|
 | `agent_notes_read` | Read the current private notes file |
 | `agent_notes_write` | Overwrite the notes file completely |
 | `agent_notes_append` | Append a timestamped entry (keeps last 3 backups automatically) |
 | `agent_notes_clear` | Erase all notes (backs up first) |
--- a/cortex/static/app.js
+++ b/cortex/static/app.js
@@ -18,6 +18,11 @@
        const settings_dd_el     = document.getElementById('settings-dropdown');
        const sessionsBackdrop   = document.getElementById('sessions-backdrop');
        // ── Utilities ─────────────────────────────────────────────────
        function escapeHtml(str) {
            return String(str).replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;').replace(/"/g,'&quot;');
        }
        // ── Close all panels/dropdowns (mutual exclusion) ─────────────
        function closeAllPanels() {
            if (mode_dropdown_el)  mode_dropdown_el.classList.remove('open');
@@ -435,8 +440,32 @@
            availableRoles = d.available_roles || [];
            roleIdx        = 0;
            setRoleToggleUI(availableRoles[0] || null);
            _maybeShowNoBanner(availableRoles);
        });
        function _maybeShowNoBanner(roles) {
            const key = 'cx_no_model_banner_dismissed';
            if (roles.length > 0) { localStorage.removeItem(key); return; }
            if (localStorage.getItem(key)) return;
            if (document.getElementById('no-model-banner')) return;  // already shown
            const banner = document.createElement('div');
            banner.id = 'no-model-banner';
            banner.style.cssText = [
                'background:#1c1a0a','border-bottom:1px solid #78350f',
                'color:#fbbf24','font-size:0.82rem','padding:0.55rem 1rem',
                'display:flex','align-items:center','gap:0.75rem','flex-shrink:0',
            ].join(';');
            banner.innerHTML = `
                <span style="flex:1">⚡ Using server default model — add your own for more choices and to track your usage.</span>
                <a href="/setup/model" style="color:#fbbf24;font-weight:600;white-space:nowrap;">Set up OpenRouter →</a>
                <button onclick="localStorage.setItem('${key}','1');document.getElementById('no-model-banner').remove();"
                        style="background:none;border:none;color:#78350f;cursor:pointer;font-size:1rem;line-height:1;padding:0 0.2rem;"
                        title="Dismiss">✕</button>
            `;
            // Insert at the top of #chat-col (or of <body> if #chat-col is missing)
            const col = document.getElementById('chat-col') || document.body;
            col.insertBefore(banner, col.firstChild);
        }
        backendToggle.addEventListener('click', () => {
            if (availableRoles.length <= 1) return;
            roleIdx = (roleIdx + 1) % availableRoles.length;
@@ -1067,6 +1096,19 @@
                            sessionId = data.session_id;
                            sessionEl.textContent = `session: ${sessionId}`;
                            persist_session();
                            // Auto-name the session from the first user message
                            if (wasNewSession) {
                                const autoName = text.slice(0, 60).trimEnd() + (text.length > 60 ? '…' : '');
                                fetch(`/sessions/${sessionId}?${_fileParams}`, {
                                    method: 'PATCH',
                                    headers: { 'Content-Type': 'application/json' },
                                    body: JSON.stringify({ name: autoName }),
                                }).then(r => {
                                    if (!r.ok) return;  // rename failed: keep showing the session ID
                                    sessionEl.textContent = `session: ${autoName}`;
                                    sessionNames.set(sessionId, autoName);
                                }).catch(() => {});
                            }
                            thinkingDiv.className = 'message assistant';
                            setMessageText(thinkingDiv, 'assistant', data.response);
                            const assistHistIdx = currentHistory.length;
@@ -1133,6 +1175,8 @@
            const text = inputEl.value.trim();
            if (!text || activeController) return;
            const wasNewSession = !sessionId;
            inputEl.value = '';
            syncHeight();
            sendBtn.style.display = 'none';
@@ -1357,6 +1401,7 @@
            { label: 'Memory',   files: ['MEMORY_LONG.md', 'MEMORY_MID.md', 'MEMORY_SHORT.md'] },
            { label: 'Profile',  files: ['USER.md', 'HELP.md'] },
            { label: 'Settings', files: ['email_allowlist.json'] },
            { label: 'Agent Notes (read-only)', files: ['AGENT_NOTES.bak1.md', 'AGENT_NOTES.bak2.md', 'AGENT_NOTES.bak3.md'], collapsed: true },
        ];
        function fmtSize(bytes) {
@@ -1394,7 +1439,7 @@
            fileSidebar.innerHTML = '';
            for (const group of FILE_GROUPS) {
-                const { groupEl, items } = _makeFileGroup(group.label);
+                const { groupEl, items } = _makeFileGroup(group.label, group.collapsed || false);
                for (const fname of group.files) {
                    const f = byName[fname];
@@ -1490,12 +1535,20 @@
            // Restore editor/preview buttons hidden by audit view
            fileRawBtn.style.display = '';
            filePreviewBtn.style.display = '';
            fileSaveBtn.style.display = '';
            const res = await fetch(`/files/${encodeURIComponent(name)}?${_fileParams}`);
            if (!res.ok) { mdEditor.setValue(`Error loading ${name}`); return; }
            const data = await res.json();
            mdEditor.setValue(data.content);
            mdEditor.clearHistory();
            if (data.readonly) {
                mdEditor.setOption('readOnly', 'nocursor');
                fileSaveBtn.style.display = 'none';
                document.getElementById('file-modal-title').textContent = name + ' (read-only)';
            } else {
                mdEditor.setOption('readOnly', false);
                fileSaveBtn.style.display = '';
                document.getElementById('file-modal-title').textContent = name;
            }
            setFileMode(fileMode);
        }
@@ -1794,11 +1847,13 @@
        let memMid      = localStorage.getItem('mem-mid')   !== 'false';
        let memShort    = localStorage.getItem('mem-short') !== 'false';
        const TIER_LABELS = { 1: 'Min', 2: 'Std', 3: 'Ext', 4: 'Full' };
        function updateTierUI() {
            document.querySelectorAll('.ctx-btn[data-tier]').forEach(btn => {
                btn.classList.toggle('active', parseInt(btn.dataset.tier) === currentTier);
            });
-            ctxOpenBtn.querySelector('.tier-badge').textContent = currentTier;
+            ctxOpenBtn.querySelector('.tier-badge').textContent = TIER_LABELS[currentTier] || currentTier;
        }
        function updateMemUI() {
@@ -1870,33 +1925,46 @@
            memShort = !memShort; localStorage.setItem('mem-short', memShort); updateMemUI();
        });
        const _distillBtns = () => document.querySelectorAll(
            '#distill-short-btn, #distill-mid-btn, #distill-long-btn, #distill-all-btn, #distill-rebuild-btn'
        );
        function showDistillStatus(msg, isErr) {
            distillStatus.textContent = msg;
            distillStatus.classList.toggle('err', !!isErr);
            distillStatus.classList.add('show');
-            setTimeout(() => distillStatus.classList.remove('show'), 5000);
+            setTimeout(() => distillStatus.classList.remove('show'), isErr ? 8000 : 5000);
        }
-        async function runDistill(endpoint) {
+        async function runDistill(endpoint, label) {
-            showDistillStatus('distilling…', false);
+            _distillBtns().forEach(b => { b.disabled = true; });
            showDistillStatus(`${label || endpoint} running…`, false);
            try {
                const res = await fetch(`/distill/${endpoint}?${_fileParams}`, { method: 'POST' });
                const d = await res.json();
-                if (!res.ok || d.ok === false) {
+                if (res.status === 409 || res.status === 429) {
-                    const err = d.error || d.mid?.error || d.long?.error || `HTTP ${res.status}`;
+                    showDistillStatus(`⏳ ${d.detail}`, true);
                } else if (!res.ok || d.ok === false) {
                    const err = d.detail || d.error || d.mid?.error || d.long?.error || `HTTP ${res.status}`;
                    showDistillStatus(`✗ ${err}`, true);
                } else {
-                    showDistillStatus(`✓ ${endpoint} done`, false);
+                    showDistillStatus(`✓ ${label || endpoint} complete`, false);
                }
            } catch (err) {
                showDistillStatus(`✗ ${err.message}`, true);
            } finally {
                _distillBtns().forEach(b => { b.disabled = false; });
            }
        }
-        document.getElementById('distill-short-btn').addEventListener('click', () => runDistill('short'));
+        document.getElementById('distill-short-btn').addEventListener('click', () => runDistill('short', 'Short distill'));
-        document.getElementById('distill-mid-btn').addEventListener('click',   () => runDistill('mid'));
+        document.getElementById('distill-mid-btn').addEventListener('click',   () => runDistill('mid',   'Mid distill'));
-        document.getElementById('distill-long-btn').addEventListener('click',  () => runDistill('long'));
+        document.getElementById('distill-long-btn').addEventListener('click',  () => runDistill('long',  'Long distill'));
-        document.getElementById('distill-all-btn').addEventListener('click',   () => runDistill('all'));
+        document.getElementById('distill-all-btn').addEventListener('click',   () => runDistill('all',   'Full distill'));
        document.getElementById('distill-rebuild-btn').addEventListener('click', () => {
            if (!confirm('Rebuild memory from scratch?\n\nThis will wipe MEMORY_MID and MEMORY_LONG (backups kept) then regenerate them from session logs. Any hand-edited content will be replaced.\n\nContinue?')) return;
            runDistill('rebuild', 'Memory rebuild');
        });
        updateTierUI();
        updateMemUI();
--- a/cortex/static/index.html
+++ b/cortex/static/index.html
@@ -87,10 +87,10 @@
            <div class="ctx-section">
                <div class="ctx-section-title">Context Tier</div>
                <div class="ctx-row">
-                    <button class="ctx-btn" data-tier="1" id="tier-1" title="Minimal (~1.5k tokens)">T1</button>
+                    <button class="ctx-btn" data-tier="1" id="tier-1" title="Minimal — identity only (~1.5k tokens)">Min</button>
-                    <button class="ctx-btn active" data-tier="2" id="tier-2" title="Standard (~5k tokens)">T2</button>
+                    <button class="ctx-btn active" data-tier="2" id="tier-2" title="Standard — memory + user profile (~5k tokens)">Std</button>
-                    <button class="ctx-btn" data-tier="3" id="tier-3" title="Extended (~15k tokens)">T3</button>
+                    <button class="ctx-btn" data-tier="3" id="tier-3" title="Extended — + last 2 sessions (~15k tokens)">Ext</button>
-                    <button class="ctx-btn" data-tier="4" id="tier-4" title="Full (~50k tokens)">T4</button>
+                    <button class="ctx-btn" data-tier="4" id="tier-4" title="Full — + last 7 sessions (~50k tokens)">Full</button>
                </div>
            </div>
            <div class="ctx-section">
@@ -108,6 +108,7 @@
                    <button class="ctx-btn" id="distill-mid-btn"   title="Summarize SHORT → MID memory (uses LLM)">Mid</button>
                    <button class="ctx-btn" id="distill-long-btn"  title="Integrate MID → LONG memory (uses LLM)">Long</button>
                    <button class="ctx-btn" id="distill-all-btn"   title="Run Short → Mid → Long in sequence">All</button>
                    <button class="ctx-btn ctx-btn-danger" id="distill-rebuild-btn" title="⚠ Wipe Mid + Long memories and rebuild from session logs. Hand-edited content will be replaced.">Rebuild</button>
                </div>
                <div id="ctx-distill-status"></div>
                <div id="ctx-schedule"></div>
--- a/cortex/static/local_llm.html
+++ b/cortex/static/local_llm.html
@@ -167,9 +167,11 @@
    .pb-anthropic { background: #1e1b4b; color: #818cf8; }
    .pb-google    { background: #042f2e; color: #34d399; }
    .pb-local     { background: #1e293b; color: #64748b; }
    .pb-notools   { background: #3b1a1a; color: #f87171; }
    [data-theme="light"] .pb-anthropic { background: #ede9fe; color: #5b21b6; }
    [data-theme="light"] .pb-google    { background: #d1fae5; color: #065f46; }
    [data-theme="light"] .pb-local     { background: #e2e8f0; color: #475569; }
    [data-theme="light"] .pb-notools   { background: #fee2e2; color: #b91c1c; }
    /* Host & model rows */
    .host-row {
@@ -488,8 +490,22 @@
                   autocomplete="off" data-form-type="other">
          </div>
          <div class="field" style="flex:0 0 auto">
-            <label>Context (k tokens)</label>
+            <label title="Context window size in thousands of tokens. 0 = assume 32k.">Context (k tokens)</label>
-            <input type="number" id="add-context-k" name="context_k" value="0" min="0" max="10000">
+            <input type="number" id="add-context-k" name="context_k" value="0" min="0" max="10000"
                   title="Context window size in thousands of tokens. 0 = assume 32k (compaction budget ~24k tokens).">
          </div>
          <div class="field" style="flex:0 0 auto">
            <label title="Per-model tool loop cap, never above the global orchestrator_max_rounds. 0 = use the global default.">Max rounds</label>
            <input type="number" name="max_rounds" value="0" min="0"
                   title="Per-model tool loop cap, never above the global orchestrator_max_rounds. 0 = use the global default.">
          </div>
          <div class="field" style="flex:0 0 auto">
            <label title="Whether this model supports tool calling. If not supported, requests skip the tool loop entirely.">Tool calling</label>
            <select name="tools"
                    title="Whether this model supports tool calling. If not supported, requests skip the tool loop entirely.">
              <option value="1" selected>Supported</option>
              <option value="0">Not supported</option>
            </select>
          </div>
        </div>
        <div class="field">
--- a/cortex/static/settings.html
+++ b/cortex/static/settings.html
@@ -423,6 +423,18 @@
    </div>
    <!-- Browser cache -->
    <!-- Usage summary -->
    <div class="section" id="usage-section">
      <h2>Usage</h2>
      <p style="font-size:0.8rem; color:var(--pg-muted); margin-bottom:0.85rem; line-height:1.55;">
        Token consumption is tracked for API-backed models (Gemini API, local OpenAI-compatible).
        Claude CLI calls are not metered.
      </p>
      <div id="usage-table-wrap" style="overflow-x:auto;">
        <p style="font-size:0.8rem; color:var(--pg-muted);">Loading…</p>
      </div>
    </div>
    <div class="section">
      <h2>Browser Cache</h2>
      <p style="font-size:0.8rem; color:var(--pg-muted); margin-bottom:0.85rem; line-height:1.55;">
@@ -443,6 +455,25 @@
    <!-- Model Registry link -->
    <div class="section">
      <h2>Model Registry</h2>
      <!-- Quick-start card: shown only when no role has a model assigned (available_roles is empty) -->
      <div id="openrouter-quickstart" style="display:none; background:#1c1a0a; border:1px solid #78350f;
           border-radius:8px; padding:1rem; margin-bottom:1rem;">
        <p style="font-size:0.82rem; color:#fbbf24; font-weight:600; margin-bottom:0.4rem;">
          ⚡ You're on the server default model
        </p>
        <p style="font-size:0.8rem; color:#d97706; margin-bottom:0.75rem; line-height:1.5;">
          You can chat now, but adding your own model gives you more choices, lets you pick
          role-specific models, and tracks your usage separately.
          OpenRouter is the easiest way to get started — one key, many models.
        </p>
        <a href="/setup/model"
           style="display:inline-block; padding:0.5rem 0.9rem; background:#92400e; border-radius:6px;
                  color:#fef3c7; font-size:0.85rem; font-weight:600; text-decoration:none;">
          Set up OpenRouter →
        </a>
      </div>
      <p style="font-size:0.8rem; color:var(--pg-muted); margin-bottom:0.85rem; line-height:1.55;">
        Configure AI providers (Anthropic, Google), local hosts (Open WebUI, Ollama, OpenRouter, etc.),
        and assign models to roles — chat, orchestrator, distill, and more.
@@ -479,6 +510,22 @@
    </div>
    <!-- Personas -->
    <!-- Sessions -->
    <div class="section">
      <h2>Sessions</h2>
      <p style="font-size:0.8rem; color:var(--pg-muted); margin-bottom:0.85rem; line-height:1.55;">
        Auto-name any sessions that still show a random ID, using their first message as the name.
        Only unnamed sessions are affected — existing names are left alone.
      </p>
      <button type="button" id="backfill-names-btn"
              style="padding:0.5rem 1rem; background:none; border:1px solid var(--pg-border); border-radius:6px;
                     color:var(--pg-muted); font-size:0.88rem; font-weight:500; cursor:pointer;
                     transition:border-color 0.15s, color 0.15s;">
        Auto-name old sessions
      </button>
      <span id="backfill-names-ok" style="display:none; margin-left:0.75rem; font-size:0.8rem; color:#4ade80;"></span>
    </div>
    <div class="section">
      <h2>Personas</h2>
      <ul class="persona-list">
@@ -532,6 +579,84 @@
      document.getElementById('clear-ls-ok').style.display = 'inline';
    });
    // Show OpenRouter quick-start card if no model is configured
    (async () => {
      try {
        const d = await fetch('/backend').then(r => r.json());
        const roles = d.available_roles || [];
        if (roles.length === 0) {
          document.getElementById('openrouter-quickstart').style.display = 'block';
        }
      } catch (_) {}
    })();
    // Usage summary table
    (async () => {
      const wrap = document.getElementById('usage-table-wrap');
      try {
        const resp = await fetch('/api/usage/summary');
        if (!resp.ok) throw new Error(resp.statusText);
        const rows_data = await resp.json();
        if (!rows_data.length) {
          wrap.innerHTML = '<p style="font-size:0.8rem;color:var(--pg-muted);">No usage recorded yet.</p>';
          return;
        }
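        const fmt = n => n >= 1000 ? (n / 1000).toFixed(1) + 'k' : String(n);
        // Escape model keys/labels before interpolating them into innerHTML
        const esc = s => String(s).replace(/[&<>"']/g, c => ({'&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;'}[c]));
        const rows = rows_data.map(d => {
          const labelCell = d.label !== d.key
            ? `<span title="${esc(d.key)}">${esc(d.label)}</span>`
            : `<span>${esc(d.key)}</span>`;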
          return `<tr>
            <td style="padding:0.4rem 0.75rem 0.4rem 0; font-size:0.82rem; color:var(--pg-text); white-space:nowrap;">${labelCell}</td>
            <td style="padding:0.4rem 0.5rem; font-size:0.82rem; color:var(--pg-muted); text-align:right;">${d.calls}</td>
            <td style="padding:0.4rem 0.5rem; font-size:0.82rem; color:var(--pg-muted); text-align:right;">${fmt(d.prompt_tokens)}</td>
            <td style="padding:0.4rem 0.5rem; font-size:0.82rem; color:var(--pg-muted); text-align:right;">${fmt(d.completion_tokens)}</td>
            <td style="padding:0.4rem 0 0.4rem 0.5rem; font-size:0.82rem; color:var(--pg-text); text-align:right; font-weight:600;">${fmt(d.total_tokens)}</td>
          </tr>`;
        }).join('');
        wrap.innerHTML = `<table style="border-collapse:collapse; width:100%; min-width:360px;">
          <thead>
            <tr style="border-bottom:1px solid var(--pg-border);">
              <th style="padding:0.35rem 0.75rem 0.35rem 0; font-size:0.75rem; color:var(--pg-muted); font-weight:600; text-align:left;">Model</th>
              <th style="padding:0.35rem 0.5rem; font-size:0.75rem; color:var(--pg-muted); font-weight:600; text-align:right;">Calls</th>
              <th style="padding:0.35rem 0.5rem; font-size:0.75rem; color:var(--pg-muted); font-weight:600; text-align:right;">Prompt</th>
              <th style="padding:0.35rem 0.5rem; font-size:0.75rem; color:var(--pg-muted); font-weight:600; text-align:right;">Output</th>
              <th style="padding:0.35rem 0 0.35rem 0.5rem; font-size:0.75rem; color:var(--pg-muted); font-weight:600; text-align:right;">Total</th>
            </tr>
          </thead>
          <tbody>${rows}</tbody>
        </table>`;
      } catch (e) {
        wrap.innerHTML = `<p style="font-size:0.8rem;color:var(--pg-muted);">Could not load usage data.</p>`;
      }
    })();
    // Auto-name old sessions backfill
    document.getElementById('backfill-names-btn').addEventListener('click', async () => {
      const btn = document.getElementById('backfill-names-btn');
      const ok  = document.getElementById('backfill-names-ok');
      btn.disabled = true;
      btn.textContent = 'Working…';
      try {
        const params = new URLSearchParams(window.location.search);
        const user    = params.get('user')    || document.querySelector('input[value]')?.value || '';
        const persona = params.get('persona') || '';
        const qs = user ? `?user=${encodeURIComponent(user)}&persona=${encodeURIComponent(persona)}` : '';
        const res = await fetch(`/api/sessions/backfill-names${qs}`, { method: 'POST' });
        const data = await res.json();
        if (!res.ok) throw new Error(data.detail || res.statusText);
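        const n = data.named ?? 0;
        ok.textContent = `Named ${n} session${n !== 1 ? 's' : ''}.`;
        ok.style.color = '#4ade80';  // reset in case a previous attempt failed
        ok.style.display = 'inline';
      } catch (e) {
        console.error('Session name backfill failed:', e);
        ok.textContent = 'Error — check console.';
        ok.style.color = '#f87171';
        ok.style.display = 'inline';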
      }
      btn.textContent = 'Auto-name old sessions';
      btn.disabled = false;
    });
    // Persona rename toggle
    document.querySelectorAll('.persona-rename-toggle').forEach(btn => {
      btn.addEventListener('click', () => {
--- a/cortex/static/setup.html
+++ b/cortex/static/setup.html
@@ -127,6 +127,36 @@
    .emoji-opt.selected { border-color: #7c3aed; background: #2d1f52; }
    #emoji-hidden { display: none; }
    .provider-badge {
      display: inline-flex;
      align-items: center;
      gap: 0.4rem;
      background: #2d1f52;
      border: 1px solid #7c3aed;
      border-radius: 6px;
      padding: 0.3rem 0.6rem;
      font-size: 0.78rem;
      color: #a78bfa;
      margin-bottom: 1rem;
    }
    .skip-link {
      display: block;
      text-align: center;
      margin-top: 1rem;
      font-size: 0.8rem;
      color: #64748b;
      text-decoration: none;
    }
    .skip-link:hover { color: #94a3b8; }
    .model-hint {
      font-size: 0.72rem;
      color: #64748b;
      margin-top: 0.75rem;
      text-align: center;
    }
  </style>
 </head>
 <body>
@@ -137,10 +167,11 @@
    </div>
    <!-- ERROR -->
    <!-- ERROR_MODEL -->
    <!-- ── Step 1: password ───────────────────────────────────────── -->
    <div id="step-password">
-      <div class="step-label">Step 1 of 2</div>
+      <div class="step-label">Step 1 of 3</div>
      <h2>Set your password</h2>
      <form method="POST" action="" id="password-form">
        <input type="hidden" name="step" value="password">
@@ -161,7 +192,7 @@
    <!-- ── Step 2: persona ────────────────────────────────────────── -->
    <div id="step-persona" style="display:none">
-      <div class="step-label">Step 2 of 2</div>
+      <div class="step-label">Step 2 of 3</div>
      <h2>Create your persona</h2>
      <form method="POST" action="" id="persona-form">
        <input type="hidden" name="step" value="persona">
@@ -203,6 +234,39 @@
        <button type="submit">Create my persona →</button>
      </form>
    </div>
    <!-- ── Step 3: model connect ─────────────────────────────────── -->
    <div id="step-model" style="display:none">
      <div class="step-label"><!-- SETUP_STEP3_LABEL --></div>
      <h2>Connect an AI model</h2>
      <div class="provider-badge">⚡ Recommended: OpenRouter</div>
      <p style="font-size:0.82rem;color:#94a3b8;margin-bottom:1rem;">
        One API key gives you access to Claude, Gemini, Llama, and dozens of other models.
        Get a free key at <a href="https://openrouter.ai/keys" target="_blank" style="color:#a78bfa;">openrouter.ai/keys</a>.
      </p>
      <form method="POST" action="/setup/model" id="model-form">
        <div class="field">
          <label for="api_key">OpenRouter API key</label>
          <input type="password" id="api_key" name="api_key"
                 autocomplete="off" placeholder="sk-or-v1-..." required>
        </div>
        <div class="field">
          <label for="model_name">Starting model</label>
          <select id="model_name" name="model_name">
            <option value="anthropic/claude-3-5-haiku-20241022">Claude 3.5 Haiku — Fast &amp; affordable</option>
            <option value="anthropic/claude-3-7-sonnet-20250219">Claude 3.7 Sonnet — Smarter Claude</option>
            <option value="google/gemini-2.0-flash-001">Gemini 2.0 Flash — Fast Google model</option>
            <option value="meta-llama/llama-3.3-70b-instruct">Llama 3.3 70B — Open source</option>
          </select>
          <p class="hint">You can add more models or switch anytime in Account → Model Registry.</p>
        </div>
        <button type="submit">Connect &amp; start chatting →</button>
      </form>
      <p class="model-hint">
        Using Ollama, a local model, or something else?
        <a href="#" id="skip-model-link" style="color:#64748b;">Skip this step →</a>
      </p>
    </div>
  </div>
  <script>
@@ -232,6 +296,11 @@
      document.getElementById('step-password').style.display = 'none';
      document.getElementById('step-persona').style.display  = 'block';
    }
    if (params.get('step') === '3') {
      document.getElementById('step-password').style.display = 'none';
      document.getElementById('step-persona').style.display  = 'none';
      document.getElementById('step-model').style.display    = 'block';
    }
    // ── Client-side confirm password check ───────────────────────────
    document.getElementById('password-form').addEventListener('submit', e => {
@@ -243,6 +312,15 @@
      }
    });
    // ── Skip model setup — navigate to user home ─────────────────────
    document.getElementById('skip-model-link')?.addEventListener('click', e => {
      e.preventDefault();
      // Ask server for skip target (the cx_setup_persona cookie has the path)
      fetch('/setup/model/skip', { method: 'POST', credentials: 'same-origin' })
        .then(r => { if (r.redirected) location.href = r.url; else location.href = '/'; })
        .catch(() => { location.href = '/'; });
    });
    // ── Auto-generate persona slug from display name ─────────────────
    document.getElementById('display_name').addEventListener('input', function() {
      const slugField = document.getElementById('persona_name');
--- a/cortex/static/style.css
+++ b/cortex/static/style.css
@@ -1328,7 +1328,10 @@
        .ctx-btn:hover    { color: var(--text); border-color: var(--muted); }
        .ctx-btn.active   { color: var(--accent); border-color: var(--accent); }
        .ctx-btn.mem-on   { color: var(--success); border-color: var(--success-dim); }
-        .ctx-btn.local-on { color: var(--amber); border-color: var(--amber-border); }
+        .ctx-btn.local-on   { color: var(--amber); border-color: var(--amber-border); }
        .ctx-btn-danger     { color: #f87171 !important; border-color: #7f1d1d !important; }
        .ctx-btn-danger:hover { border-color: #f87171 !important; }
        .ctx-btn:disabled   { opacity: 0.4; cursor: not-allowed; pointer-events: none; }
        #backend-model-hint {
            font-size: 0.68rem; color: var(--amber); opacity: 0.9;
            margin-top: 4px; word-break: break-all; line-height: 1.3;
--- a/cortex/tools/__init__.py
+++ b/cortex/tools/__init__.py
@@ -64,6 +64,12 @@ from tools.scratch import (
    scratch_clear  as _scratch_clear,
 )
 from tools.notify import nc_talk_send as _nc_talk_send, email_send as _email_send, web_push as _web_push
 from tools.agent_notes import (
    agent_notes_read   as _agent_notes_read,
    agent_notes_write  as _agent_notes_write,
    agent_notes_append as _agent_notes_append,
    agent_notes_clear  as _agent_notes_clear,
 )
 # ── Declaration imports ───────────────────────────────────────────────────────
@@ -77,6 +83,7 @@ import tools.cron         as _mod_cron
 import tools.reminders    as _mod_reminders
 import tools.scratch      as _mod_scratch
 import tools.notify       as _mod_notify
 import tools.agent_notes  as _mod_agent_notes
 # ── Tool categories — used by the Model Registry UI for grouped checkboxes ───
@@ -98,6 +105,7 @@ TOOL_CATEGORIES: dict[str, list[str]] = {
        "ae_journal_entry_prepend",
    ],
    "Aether Tasks":     ["ae_task_list"],
    "Agent Notes":      ["agent_notes_read", "agent_notes_write", "agent_notes_append", "agent_notes_clear"],
 }
 # ── Callable registry ─────────────────────────────────────────────────────────
@@ -143,6 +151,10 @@ _CALLABLES: dict[str, callable] = {
    "email_send":                _email_send,
    "nc_talk_send":              _nc_talk_send,
    "web_push":                  _web_push,
    "agent_notes_read":          _agent_notes_read,
    "agent_notes_write":         _agent_notes_write,
    "agent_notes_append":        _agent_notes_append,
    "agent_notes_clear":         _agent_notes_clear,
 }
 # ── Role-based access control ─────────────────────────────────────────────────
@@ -194,6 +206,7 @@ _ALL_DECLARATIONS: list[types.FunctionDeclaration] = (
    + _mod_notify.DECLARATIONS
    + _mod_ae_knowledge.DECLARATIONS
    + _mod_ae_tasks.DECLARATIONS
    + _mod_agent_notes.DECLARATIONS
 )
 # Full Gemini Tool object (all tools — use get_tools_for_role() in production)
--- a/documentation/ARCH__BACKENDS.md
+++ b/documentation/ARCH__BACKENDS.md
@@ -1,7 +1,7 @@
 # Architecture: LLM Backends
 > How Cortex selects and talks to AI models.
-> Last updated: 2026-04-27 (V2 schema)
+> Last updated: 2026-05-06
 ---
@@ -33,11 +33,11 @@ Resolution order for a role:
 ### Explicit Override
-The UI backend toggle cycles: **auto → claude → gemini → local → auto**
+The **Role** toggle in the Context & Memory panel cycles through configured role slots for the `chat` role: **Primary → Backup 1 → Backup 2 → auto**.
- **auto** (default): role-based routing as above
+- Each slot shows the configured model label
- **claude / gemini / local**: bypasses role routing; forces that backend type
+- `auto` uses the Primary without forcing a specific backend type
- The toggle will be redesigned in Phase 3 to cycle through chat role slots (Primary / Backup 1 / Backup 2)
+- The ⚡ Tools toggle is independent — it routes to the `orchestrator` role regardless of the chat role selection
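A minimal sketch of the Role toggle cycle described above, assuming the button skips slots that have no model assigned and that `auto` is always reachable. The slot keys (`primary`, `backup1`, `backup2`) and both function names are hypothetical, not taken from `app.js`.

```python
from typing import Dict, Optional

# Hypothetical slot order for the chat role; "auto" closes the cycle.
SLOT_ORDER = ["primary", "backup1", "backup2", "auto"]


def next_slot(current: str, configured: Dict[str, Optional[str]]) -> str:
    """Return the slot after `current`, skipping slots with no model assigned.

    `configured` maps slot name -> model label (None when unassigned).
    "auto" is always accepted, so the search ends within one lap.
    """
    i = SLOT_ORDER.index(current)
    for step in range(1, len(SLOT_ORDER) + 1):
        slot = SLOT_ORDER[(i + step) % len(SLOT_ORDER)]
        if slot == "auto" or configured.get(slot):
            return slot
    return "auto"  # not reached in practice; kept as a safe default


def button_label(slot: str, configured: Dict[str, Optional[str]]) -> str:
    """Text shown on the toggle: the configured model label, or "auto"."""
    return "auto" if slot == "auto" else (configured.get(slot) or slot)


slots = {"primary": "Claude 3.5 Haiku", "backup1": None, "backup2": "Llama 3.3 70B"}
print(next_slot("primary", slots))  # → backup2 (backup1 is empty)
print(next_slot("backup2", slots))  # → auto
```

Skipping empty slots keeps the button from landing on a slot that would have nothing to route to; whether the real toggle does this is an assumption.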
 **Fallback chain** (automatic, only when no explicit registry entry exists):
 ```
@@ -113,6 +113,8 @@ Managed at **Settings → Models** (`/settings/models`). Full provider UI coming
      "provider": "local",
      "host_id": "abc123",
      "context_k": 72,
      "max_rounds": 5,
      "tools": true,
      "tags": ["fast", "local"]
    }
  ],
@@ -125,6 +127,14 @@ Managed at **Settings → Models** (`/settings/models`). Full provider UI coming
 }
 ```
 ### Optional model fields
 | Field | Type | Default | Meaning |
 |---|---|---|---|
 | `context_k` | int | 32 | Context window in thousands of tokens. Sets the compaction budget: 75% of the window, minimum 16k tokens. |
 | `max_rounds` | int \| null | null | Per-model tool loop cap. `null` = use global `orchestrator_max_rounds`. Effective limit = `min(per_model, global)`. |
 | `tools` | bool | true | Whether this model supports tool calling. `false` = skip tool loop entirely; model gets a plain chat request. |
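The two numeric rules in this table can be sketched as follows. The 75% factor and the 16k floor come from the commit notes for `_context_budget()`, and the `0`/`null` fallback mirrors the Model Registry form; the function names here are illustrative, not the actual `openai_orchestrator.py` code.

```python
from typing import Optional

DEFAULT_CONTEXT_K = 32      # context_k of 0/null means "assume 32k"
BUDGET_FRACTION = 0.75      # compaction budget is 75% of the window
MIN_BUDGET_TOKENS = 16_000  # floor from the commit notes


def context_budget(context_k: Optional[int]) -> int:
    """Token budget the tool loop tries to stay under."""
    k = context_k or DEFAULT_CONTEXT_K
    return max(int(k * 1000 * BUDGET_FRACTION), MIN_BUDGET_TOKENS)


def effective_max_rounds(per_model: Optional[int], global_cap: int) -> int:
    """min(per_model, global); a null/0 per-model value falls back to global."""
    if not per_model:
        return global_cap
    return min(per_model, global_cap)


print(context_budget(72))  # → 54000
print(context_budget(0))   # → 24000
```

With the example entry above (`context_k: 72`) the budget is 54,000 tokens, and an unset window gives 24,000, matching the "~24k" hint in the registry form. Very small windows hit the 16k floor.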
 ### host_type (local hosts)
 | `host_type` | Chat endpoint | Models endpoint | Use for |
@@ -210,13 +220,6 @@ Memory distillation uses `role="distill"`. Configure via Model Registry → Role
 `.env` override: `ROLE_DISTILL=claude_cli` (default).
 ---
 ## Future: Phase 3 — Backend Toggle Redesign
 The `claude → gemini → local` toggle will be replaced with a slot toggle that cycles
 through the chat role's configured models (Primary → Backup 1 → Backup 2), showing
 the actual model label. See `DESIGN__Model_Registry_V2.md`.
 ---
--- a/documentation/ARCH__SYSTEM.md
+++ b/documentation/ARCH__SYSTEM.md
@@ -1,7 +1,7 @@
 # Architecture: System Overview
 > How the pieces fit together.
-> Last updated: 2026-04-03
+> Last updated: 2026-05-06
 ---
@@ -56,7 +56,9 @@ Details: [`ARCH__BACKENDS.md`](ARCH__BACKENDS.md) | [`ARCH__PERSONA.md`](ARCH__P
 | `context_loader.py` | Builds system prompt from persona files (tiers 1–4) |
 | `llm_client.py` | All LLM backends — Claude, Gemini CLI, Local |
 | `orchestrator_engine.py` | Gemini API ReAct tool loop → Claude handoff |
-| `session_store.py` | In-memory + file session persistence |
+| `openai_orchestrator.py` | OpenAI-compatible ReAct tool loop (local models via Open WebUI/OpenRouter) |
 | `model_registry.py` | Per-user model registry V2 — providers, hosts, models, role assignments |
 | `session_store.py` | In-memory + file session persistence (`session_data/{id}.json`) |
 | `session_logger.py` | Writes session turns to `sessions/YYYY-MM-DD.md` |
 | `memory_distiller.py` | Short/mid/long distill jobs |
 | `scheduler.py` | APScheduler — distill jobs + user crons |
@@ -64,20 +66,23 @@ Details: [`ARCH__BACKENDS.md`](ARCH__BACKENDS.md) | [`ARCH__PERSONA.md`](ARCH__P
 | `notification.py` | Outbound channel messages (distill alerts, cron proactive) |
 | `auth_utils.py` | bcrypt passwords, JWT, invite tokens, channel config |
 | `auth_middleware.py` | JWT cookie validation on all routes |
-| `user_settings.py` | Per-user local LLM config (hosts, models, active model) |
+| `tool_audit.py` | JSONL audit log for every orchestrator tool invocation |
 | `usage_tracker.py` | Per-user token usage tracking (daily buckets → `usage.json`) |
 | `event_bus.py` | Internal SSE pub/sub (NC Talk → browser mirror) |
 | `email_utils.py` | SMTP invite emails |
 | `persona_template.py` | Bootstrap a new persona directory from templates |
-| `routers/` | One file per endpoint group (chat, orchestrator, auth, files, channels, ui, settings…) |
+| `routers/` | One file per endpoint group — `chat`, `orchestrator`, `auth`, `files`, `ui`, `settings`, `local_llm`, `distill`, `audit`, `usage`, `push`, `help`, `onboarding`, `auth_google`, `nextcloud_talk`, `google_chat` |
-| `tools/` | Orchestrator tool implementations (web, ae_knowledge, tasks, scratch, reminders, cron, system) |
+| `tools/` | Orchestrator tool implementations — `web`, `tasks`, `scratch`, `reminders`, `cron`, `system`, `notify`, `ae_knowledge`, `ae_journals`, `ae_tasks`, `agent_notes` |
-| `static/` | Web UI — `index.html`, `app.js`, `style.css`, `login.html`, `setup.html`, `HELP.md` |
+| `static/` | Web UI — `index.html`, `app.js`, `style.css`, `login.html`, `setup.html`, `HELP.md`, `local_llm.html`, `settings.html` |
-| `tests/` | pytest suite (80 tests) |
+| `tests/` | pytest suite |
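A minimal sketch of what one `tool_audit.py` entry could look like, assuming one JSON object per line appended to `home/{user}/tool_audit/YYYY-MM-DD.jsonl`, with the engine and model recorded per entry. The field names (`ts`, `tool`, `args`, `ok`) and the use of the UTC date for the file name are assumptions.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def audit_tool_call(home: Path, user: str, tool: str, args: dict,
                    engine: str, model: str, ok: bool) -> Path:
    """Append one JSON line per tool call to home/{user}/tool_audit/YYYY-MM-DD.jsonl."""
    now = datetime.now(timezone.utc)  # assumption: UTC day boundaries
    day_file = home / user / "tool_audit" / f"{now:%Y-%m-%d}.jsonl"
    day_file.parent.mkdir(parents=True, exist_ok=True)
    entry = {
        "ts": now.isoformat(),
        "tool": tool,
        "args": args,
        "engine": engine,  # e.g. "gemini" or "openai" (hypothetical values)
        "model": model,
        "ok": ok,
    }
    with day_file.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return day_file
```

Appending one line per call keeps writes cheap and lets a day view (such as the one behind `/api/audit/day`) parse a file line by line; a truncated last line only loses that one entry.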
 ---
 ## Key Design Decisions
-**Two-brain pattern** — Gemini API handles tool use (function calling, planning, web search). Claude CLI handles all user-facing responses. Direct chat bypasses the orchestrator entirely.
+**Two-brain pattern (Gemini orchestrator)** — Gemini API handles tool use (function calling, planning, web search). Claude CLI handles all user-facing responses. Direct chat bypasses the orchestrator entirely.
 **Single-model pattern (local orchestrator)** — When the `orchestrator` role resolves to a `local_openai` model, `openai_orchestrator.py` runs the full ReAct loop and produces the final response itself. No Claude handoff — the local model does both reasoning and response.
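The choice between the two patterns reduces to one check on the resolved `orchestrator` model. This sketch assumes the resolved entry exposes a `provider` string equal to `local_openai` for local OpenAI-compatible models; `ResolvedModel` and `pick_orchestrator` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ResolvedModel:
    provider: str  # e.g. "google" or "local_openai" (assumed values)
    label: str


def pick_orchestrator(resolved: ResolvedModel) -> Tuple[str, bool]:
    """Return (engine module, whether Claude CLI writes the final reply)."""
    if resolved.provider == "local_openai":
        # Single-model pattern: the local model reasons and answers.
        return ("openai_orchestrator", False)
    # Two-brain pattern: Gemini runs the tool loop, Claude writes the response.
    return ("orchestrator_engine", True)
```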
 **Subprocess backends** — Claude and Gemini run as CLI subprocesses (`claude --print`, `gemini -p`). This keeps auth transparent (Claude Code manages tokens) and avoids API costs on the Pro subscription path.
@@ -88,3 +93,33 @@ Details: [`ARCH__BACKENDS.md`](ARCH__BACKENDS.md) | [`ARCH__PERSONA.md`](ARCH__P
 **Per-user filesystem layout** — `home/{user}/persona/{name}/` mirrors Linux home directories. Each persona is a directory of markdown files and JSON. No database. Easy to inspect, edit, and back up.
 **No single point of coupling** — tools live in `cortex/tools/`, separate from `ae_*` MCP tools. Channels live in `cortex/routers/`, each self-contained. Adding a channel or tool doesn't touch other subsystems.
 **Agent private notes** — `AGENT_NOTES.md` per persona, writable only by the orchestrator via `agent_notes_*` tools. Never loaded into user-facing context. Three rolling backups (`bak1`–`bak3`) are visible read-only in the Files panel. Declared in `tools/agent_notes.py`; usage guidance in `PROTOCOLS.md`.
 **No black boxes** — Every component, flow, and design decision is documented. Documentation is updated before implementation of significant changes and verified after. HELP.md is the user-facing contract; ARCH__*.md files are the developer contract; PROTOCOLS.md is the agent contract. If any of these drift from reality, that is a bug.
 ---
 ## Onboarding Flow
 New users are invited via a one-time token and complete a three-step setup before reaching the chat:
 ```
 1. /setup/{token}         → Set password (POST creates session cookie, consumes token)
 2. /setup/persona         → Create persona (slug, display name, emoji, description)
 3. /setup/model           → Connect a model — OpenRouter recommended
                            (skip link goes straight to /{user}/{persona})
 ```
 Step 3 is implemented (see `TODO__Agents.md` § [UX] User onboarding). Before it existed,
 users landed in the chat with no model configured and had to navigate Settings → Model Registry
 manually, which confused non-technical users.
 **On Step 3 submit:**
 - `save_host()` adds OpenRouter (`https://openrouter.ai/api/v1`, type `openai`)
 - `save_model()` creates a model entry for the chosen model
 - `set_role(chat, primary, model_id)` assigns it as the chat role primary
 - Redirect to `/{user}/{persona}`
 **Existing users with no model configured** — a dismissible banner is shown in the chat on
 load, linking to `/setup/model` (the Step 3 form works standalone, without step labels).
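The three registry calls made on Step 3 submit are sketched below against a plain dict standing in for the per-user registry. The key names, the short random IDs and storing the API key on the host entry are all assumptions; the real `save_host()`, `save_model()` and `set_role()` signatures in `model_registry.py` may differ.

```python
import uuid

OPENROUTER_URL = "https://openrouter.ai/api/v1"


def connect_openrouter(registry: dict, api_key: str, model_name: str) -> str:
    """Mimic Step 3: add host, add model, assign chat/primary. Returns the model id."""
    # 1. save_host(): OpenRouter as an OpenAI-compatible host
    host_id = uuid.uuid4().hex[:6]
    registry.setdefault("hosts", []).append(
        {"id": host_id, "url": OPENROUTER_URL, "host_type": "openai", "api_key": api_key}
    )
    # 2. save_model(): the model picked in the dropdown, bound to that host
    model_id = uuid.uuid4().hex[:6]
    registry.setdefault("models", []).append(
        {"id": model_id, "name": model_name, "provider": "local", "host_id": host_id}
    )
    # 3. set_role(chat, primary, model_id)
    registry.setdefault("roles", {}).setdefault("chat", {})["primary"] = model_id
    return model_id
```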
--- a/documentation/MASTER.md
+++ b/documentation/MASTER.md
@@ -1,7 +1,10 @@
 # Cortex / Inara — Master Index
 > Start here. This document is a map, not a manual.
-> Last updated: 2026-04-28
+> Last updated: 2026-05-06
 >
 > **Documentation philosophy:** Cortex is a no-black-box system. Docs must match reality.
 > Update docs before implementing significant changes. Verify they still match after.
 ---
@@ -17,20 +20,27 @@ Cortex is a self-hosted personal AI platform. It routes messages from any input
 | Component | Status | Notes |
 |---|---|---|
-| Web UI | ✅ Live | SPA, dark theme, mobile-responsive, session auth |
+| Web UI | ✅ Live | SPA, dark theme, mobile-responsive, PWA-installable |
 | Nextcloud Talk bot | ✅ Live | HMAC-signed, per-user routing |
 | Google Chat Add-on | ✅ Live | JWT-verified, per-user routing |
 | Claude backend | ✅ Live | Primary — via Claude Code CLI |
 | Gemini backend | ✅ Live | Fallback — via Gemini CLI |
-| Local backend | ✅ Live | Third option — Open WebUI/Ollama on scott_gaming |
+| Local backend | ✅ Live | Open WebUI/Ollama on scott_gaming; per-user multi-model config |
-| Gemini orchestrator | ✅ Live | Tool loop → Claude response, ⚡ Tools toggle in UI (27 tools) |
+| Gemini orchestrator | ✅ Live | Tool loop → Claude response, ⚡ toggle in UI (40 tools) |
-| Model registry V2 | ✅ Live | Providers (Anthropic/Google/Local), multi-account Gemini |
+| Local orchestrator | ✅ Live | OpenAI-compatible ReAct loop; used when orchestrator role → local model |
 | Model registry V2 | ✅ Live | Providers (Anthropic/Google/Local), multi-account Gemini, role assignments |
 | Memory distillation | ✅ Live | Short (daily) / Mid (weekly) / Long (monthly) |
 | Multi-user | ✅ Live | Scott, Holly, Brian — each with own personas |
 | Session search | ✅ Live | Full-text search across past session logs |
-| Proactive cron | ✅ Live | `message` and `brief` job types → NC Talk |
+| Proactive cron | ✅ Live | `message` and `brief` job types → NC Talk / web push |
 | Tool audit log | ✅ Live | Every orchestrator tool call logged to `home/{user}/tool_audit/` |
 | Token usage tracking | ✅ Live | Per-user daily buckets in `home/{user}/usage.json`; visible in Settings |
 | Web push notifications | ✅ Live | VAPID push; `web_push` orchestrator tool; subscribe via ☰ menu |
 | Agent private notes | ✅ Live | `AGENT_NOTES.md` — orchestrator-only notepad; 3 rolling backups; user-visible as read-only |
 | Distill safety | ✅ Live | Per-persona asyncio lock, per-endpoint cooldowns, Rebuild option |
 | Guided onboarding | ✅ Live | Setup Step 3 for OpenRouter; existing-user banner; settings quick-link |
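The Settings usage table expects one row per model with `calls`, `prompt_tokens`, `completion_tokens` and `total_tokens`. A sketch of collapsing the daily buckets into those rows, assuming `usage.json` maps each date to per-model counters; that layout and the function name are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Optional

FIELDS = ("calls", "prompt_tokens", "completion_tokens")


def summarize_usage(buckets: Dict[str, Dict[str, Dict[str, int]]],
                    labels: Optional[Dict[str, str]] = None) -> List[dict]:
    """Collapse {date: {model_key: counters}} into one row per model, largest first."""
    labels = labels or {}
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: dict.fromkeys(FIELDS, 0))
    for day in buckets.values():
        for key, counts in day.items():
            for field in FIELDS:
                totals[key][field] += counts.get(field, 0)
    rows = [
        {"key": key, "label": labels.get(key, key), **t,
         "total_tokens": t["prompt_tokens"] + t["completion_tokens"]}
        for key, t in totals.items()
    ]
    return sorted(rows, key=lambda r: r["total_tokens"], reverse=True)
```

When no label is known the row's `label` equals its `key`, which is the case the Settings table renders without a tooltip.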
-**Active users / personas:** scott/inara, scott/developer, holly/tina, brian/wintermute
+**Active users / personas:** scott/inara, holly/tina, brian/wintermute
 ---
--- a/documentation/ROADMAP.md
+++ b/documentation/ROADMAP.md
@@ -54,7 +54,6 @@
 ## Phase 5 — Routing Intelligence & Scale
 - [ ] Intelligent model routing (by task type, privacy, context length)
 - [ ] Agent-to-agent task delegation across fleet
 - [ ] Permanent hosting on home server (currently on `scott_lpt`)
 ## Phase 6 — Infrastructure
 - [ ] Server DMZ finalized
--- a/documentation/TODO__Agents.md
+++ b/documentation/TODO__Agents.md
@@ -7,16 +7,41 @@
 ## 🔴 High Priority
 ### [UX] User onboarding — guided model setup
 New users complete password + persona setup and land directly in the chat with no working
 AI model configured. This closes that gap with a guided Step 3 and a fallback for existing
 users who skipped it or were onboarded before this existed.
 Design spec: `documentation/ARCH__SYSTEM.md` § Onboarding Flow
 - [x] **Setup Step 3 page** — new `/setup/model` GET/POST in `onboarding.py` — 2026-05-06
  - Recommends OpenRouter: "one API key, access to Claude, Gemini, and dozens of other models"
  - API key field + curated model dropdown (claude-3-5-haiku, claude-3-7-sonnet, gemini-2.0-flash, llama-3.3-70b)
  - On submit: `save_host()` (OpenRouter) + `save_model()` + `set_role(chat, primary, model_id)` in `model_registry.py`
  - Skip: clicking the skip link sends a JS `fetch` to `POST /setup/model/skip`, which reads the `cx_setup_persona` cookie and redirects to chat
  - Step labels updated: setup.html "1 of 3" / "2 of 3" / "3 of 3" (was "1 of 2" / "2 of 2")
  - Standalone: `/setup/model` works without step labels (no `cx_setup_persona` cookie → no label)
  - Persona creation now redirects to `/setup/model` instead of directly to chat
 - [x] **Existing user banner** — displayed in chat if no role has a model assigned — 2026-05-06
  - Checks `GET /backend` on load (uses `available_roles` — already does role-resolution)
  - Dismissable amber callout strip above chat: "No AI model configured — Set up OpenRouter →"
  - Dismissed via `localStorage` key `cx_no_model_banner_dismissed`; auto-removed when a model is added
 - [x] **Settings quick-link** — amber card in settings Model Registry section — 2026-05-06
  - Checks `GET /backend` on page load; shown if `available_roles` is empty
  - Links to `/setup/model`
 - [x] Update `cortex/static/HELP.md` — Getting Started section + model registry quick-connect note — 2026-05-06
 - [x] Update `CLAUDE.md` — documented `/setup/model` endpoint, setup flow description, docs philosophy — 2026-05-06
 ### [Local] Local orchestrator — reach full parity with Gemini orchestrator
 `openai_orchestrator.py` is partially built and wired into `POST /orchestrate`.
 When the `orchestrator` role resolves to a `local_openai` model it routes there
 automatically. Remaining work is quality/reliability parity, not ground-up design.
- [ ] Audit tool schema conversion — Gemini `FunctionDeclaration` → OpenAI `tools` format
+- [x] Tool schema conversion — Gemini FunctionDeclaration → OpenAI tools format
-      (minor field rename, already partially done)
+- [x] Context budget: `_context_budget()` uses `context_k * 1000 * 0.75`, min 16k — 2026-05-06
- [ ] Context budget enforcement per iteration (40–50k for E4B, 35–40k for 26B A4B)
+- [x] Context compaction: `_compact_messages()` trims old tool results before each round and before the confirmation-gate call — 2026-05-06
- [ ] Context compaction — trim stale tool results mid-run when approaching limit
+- [x] Error handling: malformed tool args caught + logged; tool execution errors returned as strings
- [ ] Error handling parity with Gemini orchestrator (retry logic, malformed tool calls)
+- [ ] Retry logic on transient API errors (connection timeout, 429, 503)
 - [ ] Test end-to-end with Gemma 4 E4B and 26B A4B on scott_gaming
 - [ ] Review `ARCH__FUTURE.md` agent architecture ideas before finalising design
 - Reference: `docs/OPEN_WEBUI_API.md`, `documentation/ARCH__FUTURE.md` §1
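The budget and compaction items above come down to two small functions. The sketch below is hypothetical. The names `context_budget` / `compact_messages`, the 4-characters-per-token estimate, the `keep_recent` window and the truncation note are all assumptions and are not the actual code in `openai_orchestrator.py`. Only the "75% of `context_k * 1000`, min 16k" rule comes from the checklist.

```python
# Hypothetical sketch; not the real openai_orchestrator.py implementation.
# Messages are assumed to be OpenAI-style dicts with "role" and "content".

CHARS_PER_TOKEN = 4          # assumed rough heuristic, not a real tokenizer
MIN_BUDGET_TOKENS = 16_000   # "min 16k", read as 16,000 to match the * 1000 convention
TRUNCATED_NOTE = "[tool result truncated to save context]"


def context_budget(context_k: int) -> int:
    """75% of the model's context window (context_k thousand tokens), floored at 16k."""
    return max(int(context_k * 1000 * 0.75), MIN_BUDGET_TOKENS)


def estimate_tokens(messages: list) -> int:
    """Crude estimate: total characters across all message contents divided by 4."""
    return sum(len(m.get("content") or "") for m in messages) // CHARS_PER_TOKEN


def compact_messages(messages: list, budget: int, keep_recent: int = 4) -> list:
    """Replace the oldest tool results with a short note until under budget.

    The newest `keep_recent` messages are never touched, so the model still
    sees the results it is currently reasoning about.
    """
    compacted = [dict(m) for m in messages]  # copies, so the caller's list is untouched
    cutoff = max(len(compacted) - keep_recent, 0)
    for i in range(cutoff):
        if estimate_tokens(compacted) <= budget:
            break
        if compacted[i].get("role") == "tool":
            compacted[i]["content"] = TRUNCATED_NOTE
    return compacted
```

For example, a model with `context_k = 128` gets a 96,000-token budget, while `context_k = 16` computes 12,000 and is raised to the 16,000 floor. Trimming oldest-first keeps the system prompt and the latest tool output intact, which matters more to the next round than stale results do.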
@@ -117,7 +142,7 @@ Multi-user setup with real Gemini/Claude API costs. Track per-user token consump
 so Scott can see who's spending what.
 - [x] Count input + output tokens — local backend (OpenAI `usage` field) + Gemini API (`usage_metadata`) — 2026-05-05
 - [x] Append to `home/{user}/usage.json` — daily buckets, per-model breakdown — 2026-05-05
- [ ] Expose via `/api/usage` endpoint; add a summary row to the Settings page
+- [x] Expose via `/api/usage` + `/api/usage/summary` + `/api/usage/all` (admin); usage table in Settings — 2026-05-06
 - [ ] Optional: soft spending limit with a warning toast when exceeded
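The daily-bucket, per-model layout of `usage.json` could look like the sketch below. The JSON shape (`{day: {model: {"input": n, "output": n}}}`) and the helper names `record_usage` / `summarize` are assumptions for illustration. The file actually written by the usage tracker may be structured differently.

```python
import json
from datetime import date
from pathlib import Path
from typing import Optional


def record_usage(usage_path: Path, model: str, input_tokens: int,
                 output_tokens: int, day: Optional[date] = None) -> dict:
    """Add one call's token counts to the day's bucket for `model` (assumed schema)."""
    day_key = (day or date.today()).isoformat()  # e.g. "2026-05-06"
    data = json.loads(usage_path.read_text()) if usage_path.exists() else {}
    bucket = data.setdefault(day_key, {}).setdefault(model, {"input": 0, "output": 0})
    bucket["input"] += input_tokens
    bucket["output"] += output_tokens
    usage_path.parent.mkdir(parents=True, exist_ok=True)
    usage_path.write_text(json.dumps(data, indent=2))
    return data


def summarize(data: dict) -> dict:
    """Total input/output tokens per model across all days, e.g. for a Settings table."""
    totals: dict = {}
    for models in data.values():
        for model, counts in models.items():
            t = totals.setdefault(model, {"input": 0, "output": 0})
            t["input"] += counts["input"]
            t["output"] += counts["output"]
    return totals
```

Keeping raw per-day, per-model counts on disk and aggregating at read time lets the same file back both a daily view and an all-time summary.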
 ### [Security] Tool call audit log — 2026-05-05
@@ -166,15 +191,6 @@ the foundation. What remains is removing the need to toggle manually.
  - Fast/cheap queries → local E4B (25 t/s, no API cost)
 - [ ] Routing logic in `llm_client.py` or new `router.py`; expose override in UI
 ### [Ops] Permanent fleet hosting — home server deployment
 Currently running on `scott-lt-i7-rtx` (gaming laptop). Long-term target is the
 home server for always-on reliability. `docker-compose.yml` already exists.
 - [ ] Copy project to home server
 - [ ] Configure Nginx reverse proxy (already Docker-hosted on that machine)
 - [ ] Point `cortex.dgrzone.com` → home server internal IP (pfSense alias update)
 - [ ] WireGuard required for all access — not internet-exposed
 - [ ] Update `FLEET_MANIFEST.md` to reflect new hosting location
 ### [Future] Cortex Mesh — multi-instance fleet coordination
 Each fleet device runs its own Cortex instance. Instances delegate tasks to each
 other based on resources and specialisation. No central coordinator required.