feat: audit log, usage tracking UI, OpenAI orchestrator compaction, onboarding + docs

Tool audit log:
- Every orchestrator tool call logged to home/{user}/tool_audit/YYYY-MM-DD.jsonl
- Files panel sidebar: audit log group (collapsed), date-linked read-only table
- Admin endpoints: /api/audit/files, /api/audit/day, /api/audit/recent, /api/audit/stats
- Engine and model name recorded per entry

OpenAI orchestrator improvements:
- Context budget enforcement: 75% of model context_k (min 16k)
- Message compaction: truncates old tool results when approaching budget
- max_rounds respected per model config (intersected with server cap)

OpenRouter onboarding (setup.html, onboarding.py, app.js, settings.html):
- Step 3 of 3: /setup/model with curated model picker
- Chat banner for users on server-default model (informational, not alarmist)
- Settings quick-link card; /setup/model works standalone for existing users

Model registry + session store:
- set_role_config / get_role_config for per-role tool lists and system_append
- session_store: session rename, session name backfill endpoint

UI updates (app.js, index.html, style.css, local_llm.html):
- Role toggle in context panel
- Off-the-record mode
- Agent notes read-only viewer
- OPERATIONS.md loaded at T2+ in context

Documentation:
- HELP.md: full tool table, per-role tool sets, Agent Notes, usage tracking
- TOOLS.md: Agent Notes section, count corrected to 44
- ARCH__SYSTEM.md, ARCH__BACKENDS.md, MASTER.md updated to match reality
- CLAUDE.md: onboarding flow, documentation philosophy sections
- README.md: stack in practice, DeepSeek TUI mention, architecture diagram updated
- TODO__Agents.md: onboarding task completed with deviation notes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Scott Idem
2026-05-08 21:26:43 -04:00
parent c02d2462b0
commit f8f7cd75da
25 changed files with 1088 additions and 151 deletions

View File

@@ -146,8 +146,8 @@ http://localhost:8000/docs
- Tools are registered in `cortex/tools/__init__.py` as both Gemini FunctionDeclarations and Python callables - Tools are registered in `cortex/tools/__init__.py` as both Gemini FunctionDeclarations and Python callables
### Context / Memory ### Context / Memory
- `context_loader.py` assembles Inara's system prompt from `inara/` files based on tier (1–3) - `context_loader.py` assembles Inara's system prompt from `inara/` files based on tier (1–4)
- Tier 1 = minimal (identity only); Tier 2 = standard (+ memory + user profile); Tier 3 = full - Tier 1 = minimal (identity only); Tier 2 = standard (+ memory + user profile); Tier 3 = + last 2 sessions; Tier 4 = + last 7 sessions
- Memory files are written by the distiller or manually — do not delete them - Memory files are written by the distiller or manually — do not delete them
### Security / Safety ### Security / Safety
@@ -160,6 +160,31 @@ http://localhost:8000/docs
- Passwords are bcrypt-hashed and stored in `home/{username}/auth.json` — never in `.env` or the DB - Passwords are bcrypt-hashed and stored in `home/{username}/auth.json` — never in `.env` or the DB
- Invite tokens are one-time-use, 72-hour expiry, stored in `home/{username}/invite.json` - Invite tokens are one-time-use, 72-hour expiry, stored in `home/{username}/invite.json`
### Onboarding Flow
New users follow a three-step setup before reaching the chat:
1. `GET /setup/{token}` → password form → `POST /setup/{token}` sets password + session cookie
2. `GET /setup/persona` → persona creation form → `POST /setup/persona` bootstraps persona directory
3. `GET /setup/model` → OpenRouter quick-connect → `POST /setup/model` saves host + model + role assignment
Step 3 is optional (skip link goes straight to `/{user}/{persona}`). `/setup/model` also works
standalone (accessible from Settings) for existing users who haven't configured a model.
All in `cortex/routers/onboarding.py`. Model writes use `model_registry.py`: `save_host()`,
`save_model()`, `set_role(username, "chat", "primary", model_id)`.
### Documentation Philosophy
Cortex is a no-black-box system. Docs must match reality — at all times.
- **Docs first:** When planning significant changes, update `TODO__Agents.md` and the relevant
`ARCH__*.md` to describe the intended design *before* implementing. This creates a spec to
implement against.
- **Verify after:** Once implementation is complete, re-read the pre-written docs and confirm
they match what was actually built. Update anything that drifted.
- **HELP.md is a user contract:** It describes what users can do. Never let it describe
features that don't exist or omit features that do.
- **CLAUDE.md + ARCH__*.md are the developer contract:** Update them as the architecture evolves.
- **Stale docs are bugs.** If you notice drift, fix it before moving on.
--- ---
## Adding a New Tool ## Adding a New Tool
@@ -212,19 +237,23 @@ clearly asked for a directory to be unblocked.
--- ---
## Current State (2026-04-28) ## Current State (2026-05-06)
Cortex is running and stable. All channels are live: Cortex is running and stable. All channels are live:
| Channel | Status | Notes | | Channel | Status | Notes |
|---|---|---| |---|---|---|
| Web UI | ✅ Live | `https://cortex.dgrzone.com` | | Web UI | ✅ Live | `https://cortex.dgrzone.com` — PWA-installable |
| Nextcloud Talk | ✅ Live | HMAC-signed webhook, async reply | | Nextcloud Talk | ✅ Live | HMAC-signed webhook, async reply |
| Google Chat | ✅ Live | Workspace Add-on, `hostAppDataAction` response format | | Google Chat | ✅ Live | Workspace Add-on, `hostAppDataAction` response format |
| Local backend | ✅ Live | Open WebUI/Ollama, per-user multi-model config | | Local backend | ✅ Live | Open WebUI/Ollama on scott_gaming, per-user multi-model config |
| Orchestrator | ✅ Live | Gemini API tool loop → Claude response; ⚡ toggle in UI | | Gemini orchestrator | ✅ Live | Gemini API tool loop → Claude response; ⚡ toggle in UI |
| Local orchestrator | ✅ Live | OpenAI-compatible ReAct loop; fires when orchestrator role → local model |
| Tool audit log | ✅ Live | Every tool call logged to `home/{user}/tool_audit/YYYY-MM-DD.jsonl` |
| Token usage tracking | ✅ Live | Per-user `home/{user}/usage.json`; summary in Settings |
| Web push | ✅ Live | VAPID push notifications; `web_push` tool; subscribe via ☰ menu |
Active users: scott (inara, developer), holly (tina), brian (wintermute) Active users: scott (inara), holly (tina), brian (wintermute)
**40 orchestrator tools:** web_search, http_fetch, **40 orchestrator tools:** web_search, http_fetch,
file_read/list/write, shell_exec, claude_allow_dir, file_read/list/write, shell_exec, claude_allow_dir,

View File

@@ -10,6 +10,43 @@ Cortex is a self-hosted multi-agent AI platform. It supports multiple users, eac
--- ---
## Where Cortex Fits
AI tools aren't one-size-fits-all. Cortex exists in a specific niche — it's not trying to be everything.
**Cortex is a self-hosted persona platform.** It gives you a persistent AI companion with its own
identity, memory, and voice — reachable through your chat apps, not just a browser tab. It remembers
who you are across days and weeks. It can proactively message you on a schedule. It runs on your
own hardware, behind your own auth.
### What Cortex is good at
- **Being a consistent AI presence** — same persona, same memory, day after day
- **Multi-channel access** — web, Nextcloud Talk, Google Chat, all routed to the same brain
- **Proactive work** — scheduled messages, reminders, cron jobs that reach out to you
- **Multi-user households** — each person gets their own persona (Scott → Inara, Holly → Tina)
- **Private, offline-capable** — local models via Ollama when you don't want anything leaving the LAN
### What Cortex is not
- **Not a coding assistant.** Cortex lives in chat apps, not in your terminal or IDE.
Use Claude Code, DeepSeek TUI, Gemini CLI, or Copilot for code-level work — they specialize in reading and
editing project files. Cortex can't open a codebase.
- **Not a generic LLM chat UI.** Open WebUI and LibreChat are excellent model-switching frontends.
Cortex isn't a frontend — it's a platform with its own identity system, orchestrator, and memory
pipeline. Two different jobs.
- **Not a SaaS product.** Nobody else hosts your Cortex instance. Nobody else sees your conversations.
The trade-off is you manage the service yourself — `systemctl --user restart cortex`.
- **Not an agent framework.** LangChain, CrewAI, and similar are libraries for building AI pipelines.
Cortex is a running service with concrete personas, not an abstraction layer to build on top of.
### The stack in practice
- Use **Cortex** to talk to Inara — daily assistant, memory keeper, scheduled check-ins
- Use **Claude Code / DeepSeek TUI** to work *on* Cortex — code edits, architecture, debugging
- Use **Open WebUI** when you want to test a new model or run a quick prompt without persona context
Same AI, different interfaces for different jobs.
---
## Quick Orientation ## Quick Orientation
| Directory | What it is | | Directory | What it is |

View File

@@ -9,7 +9,7 @@ logging.basicConfig(level=logging.INFO, format="%(levelname)s:%(name)s: %(messag
from config import settings from config import settings
from auth_middleware import SessionAuthMiddleware from auth_middleware import SessionAuthMiddleware
from routers import chat, google_chat, nextcloud_talk, files, distill, auth, orchestrator from routers import chat, google_chat, nextcloud_talk, files, distill, auth, orchestrator
from routers import ui, onboarding, settings, help, auth_google, local_llm, push, audit from routers import ui, onboarding, settings, help, auth_google, local_llm, push, audit, usage
@asynccontextmanager @asynccontextmanager
@@ -36,6 +36,7 @@ app.include_router(auth.router)
app.include_router(orchestrator.router) app.include_router(orchestrator.router)
app.include_router(push.router) app.include_router(push.router)
app.include_router(audit.router) app.include_router(audit.router)
app.include_router(usage.router)
# Static files — must be mounted BEFORE ui.router so /static/* is matched first. # Static files — must be mounted BEFORE ui.router so /static/* is matched first.
# ui.router has a wildcard /{username}/{persona} that would otherwise catch /static/style.css etc. # ui.router has a wildcard /{username}/{persona} that would otherwise catch /static/style.css etc.

View File

@@ -36,6 +36,7 @@ V2 Schema:
"credential_id":str | null, # claude_cli only — references providers.anthropic.credentials "credential_id":str | null, # claude_cli only — references providers.anthropic.credentials
"account_id": str | null, # gemini_api only — references providers.google.accounts "account_id": str | null, # gemini_api only — references providers.google.accounts
"context_k": int, # context window in k tokens (informational) "context_k": int, # context window in k tokens (informational)
"max_rounds": int | null, # per-model tool-loop cap; null = use orchestrator_max_rounds global
"tags": [str], # user-defined capability tags "tags": [str], # user-defined capability tags
}, },
], ],
@@ -642,7 +643,9 @@ def remove_host(username: str, host_id: str) -> bool:
def save_model(username: str, model_id: str | None, host_id: str, def save_model(username: str, model_id: str | None, host_id: str,
label: str, model_name: str, context_k: int = 0, label: str, model_name: str, context_k: int = 0,
tags: list[str] | None = None) -> str: tags: list[str] | None = None,
max_rounds: int | None = None,
tools: bool = True) -> str:
"""Create or update a local_openai model entry. Returns the model ID.""" """Create or update a local_openai model entry. Returns the model ID."""
data = _load(username) data = _load(username)
tags = tags or [] tags = tags or []
@@ -654,6 +657,8 @@ def save_model(username: str, model_id: str | None, host_id: str,
m["label"] = label.strip() or model_name.strip() m["label"] = label.strip() or model_name.strip()
m["model_name"] = model_name.strip() m["model_name"] = model_name.strip()
m["context_k"] = context_k m["context_k"] = context_k
m["max_rounds"] = max_rounds
m["tools"] = tools
m["tags"] = tags m["tags"] = tags
_save(username, data) _save(username, data)
return model_id return model_id
@@ -668,6 +673,8 @@ def save_model(username: str, model_id: str | None, host_id: str,
"provider": "local", "provider": "local",
"host_id": host_id, "host_id": host_id,
"context_k": context_k, "context_k": context_k,
"max_rounds": max_rounds,
"tools": tools,
"tags": tags, "tags": tags,
}) })
_save(username, data) _save(username, data)
@@ -679,7 +686,9 @@ def save_cloud_model(username: str, model_id: str | None,
account_id: str | None = None, account_id: str | None = None,
credential_id: str | None = None, credential_id: str | None = None,
context_k: int = 0, context_k: int = 0,
tags: list[str] | None = None) -> str: tags: list[str] | None = None,
max_rounds: int | None = None,
tools: bool = True) -> str:
""" """
Create or update an Anthropic or Google model entry. Returns the model ID. Create or update an Anthropic or Google model entry. Returns the model ID.
@@ -698,6 +707,8 @@ def save_cloud_model(username: str, model_id: str | None,
"model_name": model_name.strip(), "model_name": model_name.strip(),
"provider": provider, "provider": provider,
"context_k": context_k, "context_k": context_k,
"max_rounds": max_rounds,
"tools": tools,
"tags": tags, "tags": tags,
} }
if account_id: if account_id:

View File

@@ -273,18 +273,20 @@ async def _run_from_messages(
final_response = "" final_response = ""
budget = _context_budget(model_cfg) budget = _context_budget(model_cfg)
for round_num in range(starting_round, settings.orchestrator_max_rounds): per_model_limit = (model_cfg or {}).get("max_rounds") or settings.orchestrator_max_rounds
effective_limit = min(per_model_limit, settings.orchestrator_max_rounds)
for round_num in range(starting_round, effective_limit):
messages = _compact_messages(messages, budget) messages = _compact_messages(messages, budget)
est = _estimate_tokens(messages) est = _estimate_tokens(messages)
logger.info("OpenAI orchestrator round %d / %d model=%s ~%d tokens", logger.info("OpenAI orchestrator round %d / %d model=%s ~%d tokens",
round_num + 1, settings.orchestrator_max_rounds, model_name, est) round_num + 1, effective_limit, model_name, est)
response = await client.chat.completions.create( call_kwargs: dict = {"model": model_name, "messages": messages}
model=model_name, if active_tools:
messages=messages, call_kwargs["tools"] = active_tools
tools=active_tools, call_kwargs["tool_choice"] = "auto"
tool_choice="auto", response = await client.chat.completions.create(**call_kwargs)
)
choice = response.choices[0] choice = response.choices[0]
msg = choice.message msg = choice.message
@@ -339,12 +341,11 @@ async def _run_from_messages(
tool_call_log.append({"tool": pt["name"], "args": pt["args"], "result": "[awaiting confirmation]"}) tool_call_log.append({"tool": pt["name"], "args": pt["args"], "result": "[awaiting confirmation]"})
messages.append({"role": "tool", "tool_call_id": pt["tool_call_id"], "content": placeholder}) messages.append({"role": "tool", "tool_call_id": pt["tool_call_id"], "content": placeholder})
conf_resp = await client.chat.completions.create( messages = _compact_messages(messages, budget)
model=model_name, conf_call: dict = {"model": model_name, "messages": messages, "tool_choice": "none"}
messages=messages, if active_tools:
tools=active_tools, conf_call["tools"] = active_tools
tool_choice="none", conf_resp = await client.chat.completions.create(**conf_call)
)
final_response = conf_resp.choices[0].message.content or ( final_response = conf_resp.choices[0].message.content or (
"This action requires your explicit confirmation before it can proceed." "This action requires your explicit confirmation before it can proceed."
) )
@@ -375,9 +376,9 @@ async def _run_from_messages(
break break
else: else:
logger.warning("OpenAI orchestrator hit max rounds (%d)", settings.orchestrator_max_rounds) logger.warning("OpenAI orchestrator hit max rounds (%d)", effective_limit)
final_response = ( final_response = (
f"Reached the tool iteration limit ({settings.orchestrator_max_rounds} rounds). " f"Reached the tool iteration limit ({effective_limit} rounds). "
"Here is what was gathered:\n\n" "Here is what was gathered:\n\n"
+ "\n\n".join(f"**{t['tool']}**: {t['result'][:500]}" for t in tool_call_log) + "\n\n".join(f"**{t['tool']}**: {t['result'][:500]}" for t in tool_call_log)
) )
@@ -405,7 +406,10 @@ def _build_client(
if host_type == "openwebui": if host_type == "openwebui":
base_url = base_url + "/api" base_url = base_url + "/api"
client = AsyncOpenAI(base_url=base_url, api_key=api_key) client = AsyncOpenAI(base_url=base_url, api_key=api_key)
active_tools = get_openai_tools_for_role(user_role, tool_list) if model_cfg.get("tools") is False:
active_tools = []
else:
active_tools = get_openai_tools_for_role(user_role, tool_list)
return client, model_name, active_tools return client, model_name, active_tools

View File

@@ -295,6 +295,53 @@ async def rename_session_endpoint(
return {"ok": True, "session_id": session_id, "name": req.name.strip()} return {"ok": True, "session_id": session_id, "name": req.name.strip()}
@router.post("/api/sessions/backfill-names")
async def backfill_session_names(
request: Request,
user: str = Query(""),
persona: str = Query(""),
) -> dict:
"""Name every unnamed session using its first user message (truncated to 60 chars).
Idempotent — only touches sessions that have no name set.
user/persona default to the JWT session user + last-used persona cookie."""
# Resolve user from JWT if not provided
if not user:
token = request.cookies.get(COOKIE_NAME)
if not token:
raise HTTPException(status_code=401, detail="Not authenticated")
try:
user = decode_token(token)
except jwt.InvalidTokenError:
raise HTTPException(status_code=401, detail="Invalid session")
# Resolve persona from cookie if not provided
if not persona:
from persona import list_user_personas
persona_cookie = request.cookies.get("cx_last_persona", "")
available = list_user_personas(user)
persona = persona_cookie if persona_cookie in available else (available[0] if available else "")
if not persona:
raise HTTPException(status_code=400, detail="No persona found for user")
_set_ctx(user, persona)
sessions = list_all()
named = 0
for s in sessions:
if s.get("name"):
continue
messages = load_session(s["session_id"])
first_user = next((m for m in messages if m.get("role") == "user"), None)
if not first_user:
continue
text = (first_user.get("content") or "").strip()
if not text:
continue
auto_name = text[:60].rstrip() + ("…" if len(text) > 60 else "")
rename_session(s["session_id"], auto_name)
named += 1
return {"ok": True, "named": named, "total": len(sessions)}
@router.delete("/sessions/{session_id}") @router.delete("/sessions/{session_id}")
async def delete_session_endpoint( async def delete_session_endpoint(
session_id: str, session_id: str,

View File

@@ -1,25 +1,50 @@
""" """
Manual memory distillation endpoints. Manual memory distillation endpoints.
POST /distill/short — roll session logs → MEMORY_SHORT.md (no LLM) POST /distill/short — roll session logs → MEMORY_SHORT.md (no LLM)
POST /distill/mid — summarize short → MEMORY_MID.md (LLM) POST /distill/mid — summarize short → MEMORY_MID.md (LLM)
POST /distill/long — integrate mid → MEMORY_LONG.md (LLM) POST /distill/long — integrate mid → MEMORY_LONG.md (LLM)
POST /distill/all — run all three in sequence POST /distill/all — run all three in sequence
POST /distill/rebuild — wipe mid + long, then run all three from scratch
All endpoints require ?user=<username>&persona=<name> query params so distillation All endpoints require ?user=<username>&persona=<name> query params.
targets the correct persona. Without them, the request is rejected (no silent fallback
to server defaults — that caused wrong-user distillation in a multi-user setup). Concurrency: one distillation at a time per persona. A second request while one
is running returns 409 immediately — no silent queuing.
""" """
import asyncio
from datetime import datetime, timedelta
from fastapi import APIRouter, HTTPException, Query from fastapi import APIRouter, HTTPException, Query
from memory_distiller import distill_short, distill_mid, distill_long from memory_distiller import distill_short, distill_mid, distill_long
from persona import validate as validate_persona, set_context from persona import validate as validate_persona, set_context, persona_path as _persona_path
import scheduler import scheduler
router = APIRouter(prefix="/distill") router = APIRouter(prefix="/distill")
# Per-persona asyncio lock. Key: (user, persona)
_LOCKS: dict[tuple, asyncio.Lock] = {}
_LOCKS_META: dict[tuple, str] = {} # key → which step is currently running
# Minimum time between successive runs of each endpoint, per persona.
# Prevents accidental rapid-fire runs and token waste.
_COOLDOWNS: dict[str, timedelta] = {
"short": timedelta(minutes=1),
"mid": timedelta(minutes=30),
"long": timedelta(hours=6),
"all": timedelta(hours=1),
"rebuild": timedelta(hours=6),
}
_LAST_RUN: dict[tuple, datetime] = {} # key: (user, persona, endpoint)
def _get_lock(user: str, persona: str) -> asyncio.Lock:
key = (user, persona)
if key not in _LOCKS:
_LOCKS[key] = asyncio.Lock()
return _LOCKS[key]
def _resolve(user: str, persona: str) -> tuple[str, str]: def _resolve(user: str, persona: str) -> tuple[str, str]:
"""Validate and set persona context. Raises 404 if the persona doesn't exist."""
try: try:
u, p = validate_persona(user, persona) u, p = validate_persona(user, persona)
except Exception: except Exception:
@@ -28,13 +53,51 @@ def _resolve(user: str, persona: str) -> tuple[str, str]:
return u, p return u, p
def _check_lock(user: str, persona: str) -> asyncio.Lock:
"""Return the lock if free, raise 409 if already held."""
lock = _get_lock(user, persona)
if lock.locked():
step = _LOCKS_META.get((user, persona), "distillation")
raise HTTPException(
status_code=409,
detail=f"A {step} is already running for {persona} — please wait for it to finish.",
)
return lock
def _check_cooldown(user: str, persona: str, endpoint: str) -> None:
"""Raise 429 if the endpoint was run too recently for this persona."""
cooldown = _COOLDOWNS.get(endpoint)
if not cooldown:
return
key = (user, persona, endpoint)
last = _LAST_RUN.get(key)
if last:
elapsed = datetime.now() - last
if elapsed < cooldown:
remaining = cooldown - elapsed
mins = int(remaining.total_seconds() // 60)
secs = int(remaining.total_seconds() % 60)
wait = f"{mins}m {secs}s" if mins else f"{secs}s"
raise HTTPException(
status_code=429,
detail=f"{endpoint} was just run — please wait {wait} before running again.",
)
def _record_run(user: str, persona: str, endpoint: str) -> None:
_LAST_RUN[(user, persona, endpoint)] = datetime.now()
@router.get("/status") @router.get("/status")
async def distill_status() -> dict: async def distill_status() -> dict:
"""Show auto-distillation schedule and next run times."""
from config import settings from config import settings
# Include which personas are currently distilling
active = [f"{u}/{p}" for (u, p), lock in _LOCKS.items() if lock.locked()]
return { return {
"enabled": settings.auto_distill, "enabled": settings.auto_distill,
"jobs": scheduler.status(), "jobs": scheduler.status(),
"active": active,
"config": { "config": {
"short": settings.auto_distill_short, "short": settings.auto_distill_short,
"mid": settings.auto_distill_mid, "mid": settings.auto_distill_mid,
@@ -49,7 +112,16 @@ async def do_distill_short(
persona: str = Query(...), persona: str = Query(...),
) -> dict: ) -> dict:
u, p = _resolve(user, persona) u, p = _resolve(user, persona)
return {"ok": True, **distill_short(u, p)} _check_cooldown(u, p, "short")
lock = _check_lock(u, p)
async with lock:
_LOCKS_META[(u, p)] = "short distill"
try:
result = distill_short(u, p)
_record_run(u, p, "short")
return {"ok": True, **result}
finally:
_LOCKS_META.pop((u, p), None)
@router.post("/mid") @router.post("/mid")
@@ -58,8 +130,17 @@ async def do_distill_mid(
persona: str = Query(...), persona: str = Query(...),
) -> dict: ) -> dict:
u, p = _resolve(user, persona) u, p = _resolve(user, persona)
result = await distill_mid(u, p) _check_cooldown(u, p, "mid")
return {"ok": "error" not in result, **result} lock = _check_lock(u, p)
async with lock:
_LOCKS_META[(u, p)] = "mid distill"
try:
result = await distill_mid(u, p)
if "error" not in result:
_record_run(u, p, "mid")
return {"ok": "error" not in result, **result}
finally:
_LOCKS_META.pop((u, p), None)
@router.post("/long") @router.post("/long")
@@ -68,8 +149,17 @@ async def do_distill_long(
persona: str = Query(...), persona: str = Query(...),
) -> dict: ) -> dict:
u, p = _resolve(user, persona) u, p = _resolve(user, persona)
result = await distill_long(u, p) _check_cooldown(u, p, "long")
return {"ok": "error" not in result, **result} lock = _check_lock(u, p)
async with lock:
_LOCKS_META[(u, p)] = "long distill"
try:
result = await distill_long(u, p)
if "error" not in result:
_record_run(u, p, "long")
return {"ok": "error" not in result, **result}
finally:
_LOCKS_META.pop((u, p), None)
@router.post("/all") @router.post("/all")
@@ -78,14 +168,71 @@ async def do_distill_all(
persona: str = Query(...), persona: str = Query(...),
) -> dict: ) -> dict:
u, p = _resolve(user, persona) u, p = _resolve(user, persona)
short_result = distill_short(u, p) _check_cooldown(u, p, "all")
mid_result = await distill_mid(u, p) lock = _check_lock(u, p)
if "error" in mid_result: async with lock:
return {"ok": False, "short": short_result, "mid": mid_result} _LOCKS_META[(u, p)] = "full distill"
long_result = await distill_long(u, p) try:
return { short_result = distill_short(u, p)
"ok": "error" not in long_result, mid_result = await distill_mid(u, p)
"short": short_result, if "error" in mid_result:
"mid": mid_result, return {"ok": False, "short": short_result, "mid": mid_result}
"long": long_result, long_result = await distill_long(u, p)
} ok = "error" not in long_result
if ok:
_record_run(u, p, "all")
return {
"ok": ok,
"short": short_result,
"mid": mid_result,
"long": long_result,
}
finally:
_LOCKS_META.pop((u, p), None)
@router.post("/rebuild")
async def do_distill_rebuild(
user: str = Query(...),
persona: str = Query(...),
) -> dict: # noqa: E501
"""Wipe MEMORY_MID and MEMORY_LONG (with backups), then run short → mid → long.
Use when memories have drifted, been corrupted, or you want a clean slate
rebuilt purely from session logs. Hand-edited content will be replaced.
"""
u, p = _resolve(user, persona)
_check_cooldown(u, p, "rebuild")
lock = _check_lock(u, p)
async with lock:
_LOCKS_META[(u, p)] = "memory rebuild"
try:
from memory_distiller import _rotate_backup
inara_dir = _persona_path(u, p)
# Back up then wipe mid and long before rebuilding
for name in ("MEMORY_MID.md", "MEMORY_LONG.md"):
path = inara_dir / name
if path.exists():
_rotate_backup(path)
path.write_text(
f"# {name}\n\n*Cleared for rebuild — {datetime.now().strftime('%Y-%m-%d %H:%M')}.*\n"
)
short_result = distill_short(u, p)
mid_result = await distill_mid(u, p)
if "error" in mid_result:
return {"ok": False, "short": short_result, "mid": mid_result, "rebuilt": True}
long_result = await distill_long(u, p)
ok = "error" not in long_result
if ok:
_record_run(u, p, "rebuild")
return {
"ok": ok,
"short": short_result,
"mid": mid_result,
"long": long_result,
"rebuilt": True,
}
finally:
_LOCKS_META.pop((u, p), None)

View File

@@ -27,10 +27,21 @@ ALLOWED = {
"MEMORY_SHORT.bak1.md", "MEMORY_SHORT.bak1.md",
"MEMORY_SHORT.bak2.md", "MEMORY_SHORT.bak2.md",
"HELP.md", "HELP.md",
# Agent private notes — backups only; AGENT_NOTES.md itself is agent-only
"AGENT_NOTES.bak1.md",
"AGENT_NOTES.bak2.md",
"AGENT_NOTES.bak3.md",
}
# Files that can be read via the panel but not written by users
READ_ONLY = {
"AGENT_NOTES.bak1.md",
"AGENT_NOTES.bak2.md",
"AGENT_NOTES.bak3.md",
} }
# Files served from home/{user}/ instead of persona path # Files served from home/{user}/ instead of persona path
USER_FILES = {"email_allowlist.json"} USER_FILES = {"email_allowlist.json", "usage.json"}
def _resolve(user: str, persona: str) -> None: def _resolve(user: str, persona: str) -> None:
@@ -92,7 +103,11 @@ async def get_file(
p = _path(filename, user=user) p = _path(filename, user=user)
if not p.exists(): if not p.exists():
raise HTTPException(status_code=404, detail=f"{filename} does not exist") raise HTTPException(status_code=404, detail=f"{filename} does not exist")
return {"name": filename, "content": p.read_text()} return {
"name": filename,
"content": p.read_text(),
"readonly": filename in READ_ONLY,
}
class FileWrite(BaseModel): class FileWrite(BaseModel):
@@ -106,6 +121,8 @@ async def save_file(
user: str = Query("scott"), user: str = Query("scott"),
persona: str = Query("inara"), persona: str = Query("inara"),
) -> dict: ) -> dict:
if filename in READ_ONLY:
raise HTTPException(status_code=403, detail=f"{filename} is read-only.")
_resolve(user, persona) _resolve(user, persona)
p = _path(filename, user=user) p = _path(filename, user=user)
p.write_text(req.content) p.write_text(req.content)
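
The read-only handling in this hunk has two halves: reads report a `readonly` flag, and writes are refused before any file is touched. A small sketch of that split, with the sets abbreviated and the helper names `read_payload` and `ensure_writable` invented for illustration (the real endpoint raises an HTTP 403 rather than `PermissionError`):

```python
# Abbreviated from the diff; the real sets list more files.
ALLOWED = {"HELP.md", "AGENT_NOTES.bak1.md", "AGENT_NOTES.bak2.md"}
READ_ONLY = {"AGENT_NOTES.bak1.md", "AGENT_NOTES.bak2.md"}

# Every read-only file must also be readable through the panel.
assert READ_ONLY <= ALLOWED


def read_payload(filename: str, content: str) -> dict:
    """Shape of a read response, including the flag the UI can use to lock editing."""
    return {"name": filename, "content": content, "readonly": filename in READ_ONLY}


def ensure_writable(filename: str) -> None:
    """Reject writes to read-only files before resolving any path."""
    if filename in READ_ONLY:
        raise PermissionError(f"{filename} is read-only.")
```

Checking on the server as well as exposing the flag means a client that ignores `readonly` still cannot overwrite the backups.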

View File

@@ -159,7 +159,8 @@ def _render(username: str, success: str = "", error: str = "") -> str:
else: else:
secondary = default_secondary secondary = default_secondary
ctx = f'<span class="ctx-badge">{m.get("context_k",0)}k</span>' if m.get("context_k") else "" ctx = f'<span class="ctx-badge">{m.get("context_k",0)}k</span>' if m.get("context_k") else ""
no_tools = '' if m.get("tools", True) else '<span class="pbadge pb-notools">no tools</span>'
tags_html = " ".join(f'<span class="tag">{t}</span>' for t in (m.get("tags") or [])) tags_html = " ".join(f'<span class="tag">{t}</span>' for t in (m.get("tags") or []))
sec = f'<span class="model-host">{secondary}</span>' if secondary else "" sec = f'<span class="model-host">{secondary}</span>' if secondary else ""
@@ -201,13 +202,15 @@ def _render(username: str, success: str = "", error: str = "") -> str:
cur_label = m.get("label", "") cur_label = m.get("label", "")
cur_model_name = m.get("model_name", "") cur_model_name = m.get("model_name", "")
cur_ctx = m.get("context_k", 0) or 0 cur_ctx = m.get("context_k", 0) or 0
cur_max_rounds = m.get("max_rounds") or 0
cur_tools = m.get("tools", True)
cur_tags = ", ".join(m.get("tags") or []) cur_tags = ", ".join(m.get("tags") or [])
model_rows += f''' model_rows += f'''
<div class="model-row" id="model-{m["id"]}"> <div class="model-row" id="model-{m["id"]}">
<div class="model-row-header"> <div class="model-row-header">
<div class="model-info"> <div class="model-info">
<div>{badge}<span class="model-label">{m.get("label") or m.get("model_name","")}</span>{ctx}</div> <div>{badge}<span class="model-label">{m.get("label") or m.get("model_name","")}</span>{ctx}{no_tools}</div>
<span class="model-name">{m.get("model_name","")}</span> <span class="model-name">{m.get("model_name","")}</span>
{sec} {sec}
<div class="tag-row">{tags_html}</div> <div class="tag-row">{tags_html}</div>
@@ -239,8 +242,22 @@ def _render(username: str, success: str = "", error: str = "") -> str:
{extra_fields} {extra_fields}
<div class="field-row"> <div class="field-row">
<div class="field" style="flex:0 0 auto"> <div class="field" style="flex:0 0 auto">
<label>Context (k)</label> <label title="Context window size in thousands of tokens. 0 = assume 32k.">Context (k)</label>
<input type="number" name="context_k" value="{cur_ctx}" min="0"> <input type="number" name="context_k" value="{cur_ctx}" min="0"
title="Context window size in thousands of tokens. 0 = assume 32k (compaction budget ~24k tokens).">
</div>
<div class="field" style="flex:0 0 auto">
<label title="Per-model tool loop cap. 0 = use the global default (orchestrator_max_rounds).">Max rounds</label>
<input type="number" name="max_rounds" value="{cur_max_rounds}" min="0"
title="Per-model tool loop cap. 0 = use the global default (orchestrator_max_rounds).">
</div>
<div class="field" style="flex:0 0 auto">
<label title="Whether this model supports tool calling. If not supported, requests skip the tool loop entirely.">Tool calling</label>
<select name="tools"
title="Whether this model supports tool calling. If not supported, requests skip the tool loop entirely.">
<option value="1" {'selected' if cur_tools else ''}>Supported</option>
<option value="0" {'' if cur_tools else 'selected'}>Not supported</option>
</select>
</div> </div>
<div class="field"> <div class="field">
<label>Tags</label> <label>Tags</label>
@@ -426,6 +443,8 @@ async def add_model(
provider: str = Form("local"), provider: str = Form("local"),
label: str = Form(""), label: str = Form(""),
context_k: int = Form(0), context_k: int = Form(0),
max_rounds: int = Form(0),
tools: int = Form(1),
tags: str = Form(""), tags: str = Form(""),
# local-only fields # local-only fields
host_id: str = Form(""), host_id: str = Form(""),
@@ -439,14 +458,17 @@ async def add_model(
if not username: if not username:
return RedirectResponse("/login", status_code=302) return RedirectResponse("/login", status_code=302)
tag_list = [t.strip() for t in tags.split(",") if t.strip()] tag_list = [t.strip() for t in tags.split(",") if t.strip()]
max_rounds_ = max_rounds or None
tools_bool = tools != 0
if provider == "local": if provider == "local":
if not model_name.strip(): if not model_name.strip():
return HTMLResponse(_render(username, error="Model name is required.")) return HTMLResponse(_render(username, error="Model name is required."))
if not host_id.strip(): if not host_id.strip():
return HTMLResponse(_render(username, error="Select a host.")) return HTMLResponse(_render(username, error="Select a host."))
reg.save_model(username, None, host_id, label, model_name, context_k, tag_list) reg.save_model(username, None, host_id, label, model_name, context_k, tag_list,
max_rounds=max_rounds_, tools=tools_bool)
display = label or model_name display = label or model_name
elif provider in ("google", "anthropic"): elif provider in ("google", "anthropic"):
@@ -459,6 +481,7 @@ async def add_model(
account_id=account_id or None, account_id=account_id or None,
credential_id=credential_id or None, credential_id=credential_id or None,
context_k=context_k, tags=tag_list, context_k=context_k, tags=tag_list,
max_rounds=max_rounds_, tools=tools_bool,
) )
display = label or cloud_model_name display = label or cloud_model_name
else: else:
@@ -476,6 +499,8 @@ async def edit_model(
label: str = Form(""), label: str = Form(""),
model_name: str = Form(""), model_name: str = Form(""),
context_k: int = Form(0), context_k: int = Form(0),
max_rounds: int = Form(0),
tools: int = Form(1),
tags: str = Form(""), tags: str = Form(""),
host_id: str = Form(""), host_id: str = Form(""),
account_id: str = Form(""), account_id: str = Form(""),
@@ -486,17 +511,22 @@ async def edit_model(
return RedirectResponse("/login", status_code=302) return RedirectResponse("/login", status_code=302)
if not model_name.strip(): if not model_name.strip():
return HTMLResponse(_render(username, error="Model name is required.")) return HTMLResponse(_render(username, error="Model name is required."))
tag_list = [t.strip() for t in tags.split(",") if t.strip()] tag_list = [t.strip() for t in tags.split(",") if t.strip()]
max_rounds_ = max_rounds or None
tools_bool = tools != 0
if mtype == "local_openai": if mtype == "local_openai":
if not host_id.strip(): if not host_id.strip():
return HTMLResponse(_render(username, error="Select a host for this model.")) return HTMLResponse(_render(username, error="Select a host for this model."))
reg.save_model(username, model_id, host_id, label, model_name, context_k, tag_list) reg.save_model(username, model_id, host_id, label, model_name, context_k, tag_list,
max_rounds=max_rounds_, tools=tools_bool)
elif mtype == "gemini_api": elif mtype == "gemini_api":
reg.save_cloud_model(username, model_id, "google", model_name, label, reg.save_cloud_model(username, model_id, "google", model_name, label,
account_id=account_id or None, context_k=context_k, tags=tag_list) account_id=account_id or None, context_k=context_k, tags=tag_list,
max_rounds=max_rounds_, tools=tools_bool)
elif mtype == "claude_cli": elif mtype == "claude_cli":
reg.save_cloud_model(username, model_id, "anthropic", model_name, label, reg.save_cloud_model(username, model_id, "anthropic", model_name, label,
credential_id=credential_id or "cli", context_k=context_k, tags=tag_list) credential_id=credential_id or "cli", context_k=context_k, tags=tag_list,
max_rounds=max_rounds_, tools=tools_bool)
else: else:
return HTMLResponse(_render(username, error=f"Unknown model type: {mtype}")) return HTMLResponse(_render(username, error=f"Unknown model type: {mtype}"))
display = label.strip() or model_name.strip() display = label.strip() or model_name.strip()
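
The `context_k` and `max_rounds` fields saved above feed two orchestrator limits described in the commit message: a context budget of 75% of `context_k` (minimum 16k, with 0 treated as 32k) and a per-model round cap intersected with the server cap. The orchestrator code itself is not part of this hunk, so the following is only a minimal sketch of how those rules might combine. The helper names `context_budget_tokens` and `effective_max_rounds` are hypothetical, and 1k is assumed to mean 1000 tokens.

```python
def context_budget_tokens(context_k: int, fraction: float = 0.75,
                          default_k: int = 32, floor_tokens: int = 16_000) -> int:
    """Hypothetical helper: token budget at which compaction would start."""
    # A stored value of 0 means "unknown", which the form tooltip says is treated as 32k.
    window_k = context_k if context_k > 0 else default_k
    return max(floor_tokens, int(window_k * 1000 * fraction))


def effective_max_rounds(model_max_rounds, global_default: int, server_cap: int) -> int:
    """Hypothetical helper: per-model round cap, never above the server cap."""
    # None or 0 means "use the global default" (orchestrator_max_rounds).
    rounds = model_max_rounds or global_default
    return min(rounds, server_cap)
```

With these assumptions, a model saved with `context_k=0` gets a 24,000-token budget, which agrees with the "~24k tokens" figure in the form tooltip.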


@@ -1,11 +1,13 @@
""" """
Onboarding router — invite-based setup + persona creation. Onboarding router — invite-based setup + persona creation + model connect.
Routes: Routes:
GET /setup/{token} → show password setup form (step 1) GET /setup/{token} → show password setup form (step 1)
POST /setup/{token} → set password, redirect to persona step POST /setup/{token} → set password, redirect to persona step
GET /setup/persona → show persona creation form (step 2, requires auth) GET /setup/persona → show persona creation form (step 2, requires auth)
POST /setup/persona → create persona, redirect to /{user}/{persona} POST /setup/persona → create persona, redirect to /setup/model
GET /setup/model → OpenRouter quick-connect (step 3, also standalone)
POST /setup/model → save host + model + assign to chat role, redirect to chat
""" """
import logging import logging
@@ -21,6 +23,7 @@ from auth_utils import (
) )
from persona_template import create_persona from persona_template import create_persona
from persona import list_user_personas, validate as validate_persona from persona import list_user_personas, validate as validate_persona
import model_registry
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
router = APIRouter(prefix="/setup") router = APIRouter(prefix="/setup")
@@ -114,7 +117,11 @@ async def persona_submit(
description=description.strip(), description=description.strip(),
) )
logger.info("persona created: %s/%s", username, persona_name) logger.info("persona created: %s/%s", username, persona_name)
return RedirectResponse(f"/{username}/{persona_name}", status_code=302) # Step 3: guided model setup before entering the chat
resp = RedirectResponse("/setup/model", status_code=302)
# Remember which persona to land on after model setup
resp.set_cookie("cx_setup_persona", f"{username}/{persona_name}", max_age=3600, httponly=True, samesite="lax")
return resp
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@@ -178,3 +185,126 @@ async def setup_submit(
return resp return resp
return HTMLResponse(_setup_page("Unknown step."), status_code=400) return HTMLResponse(_setup_page("Unknown step."), status_code=400)
# ---------------------------------------------------------------------------
# Step 3 — model connect (OpenRouter quick-connect, also standalone)
# ---------------------------------------------------------------------------
# Curated model list shown in the Step 3 dropdown.
_OPENROUTER_MODELS = [
("anthropic/claude-3-5-haiku-20241022", "Claude 3.5 Haiku — Fast & affordable"),
("anthropic/claude-3-7-sonnet-20250219", "Claude 3.7 Sonnet — Smarter Claude"),
("google/gemini-2.0-flash-001", "Gemini 2.0 Flash — Fast Google model"),
("meta-llama/llama-3.3-70b-instruct", "Llama 3.3 70B — Open source"),
]
def _model_page(error: str = "", from_setup: bool = False) -> str:
html = (_STATIC / "setup.html").read_text()
# Hide steps 1 and 2 inline; show step 3
html = html.replace('<div id="step-password">', '<div id="step-password" style="display:none">')
html = html.replace('<div id="step-persona" style="display:none">', '<div id="step-persona" style="display:none">')
html = html.replace('<div id="step-model" style="display:none">', '<div id="step-model">')
if from_setup:
html = html.replace("<!-- SETUP_STEP3_LABEL -->", "Step 3 of 3")
if error:
html = html.replace("<!-- ERROR_MODEL -->", f'<p class="error">{error}</p>')
return html
@router.post("/model/skip", include_in_schema=False)
async def model_skip(request: Request):
"""Skip model setup — redirect to the remembered persona or user root."""
from auth_utils import decode_token
import jwt
token = request.cookies.get(COOKIE_NAME)
username = None
if token:
try:
username = decode_token(token)
except jwt.InvalidTokenError:
pass
dest_cookie = request.cookies.get("cx_setup_persona", "")
dest = f"/{dest_cookie}" if dest_cookie else (f"/{username}" if username else "/")
resp = RedirectResponse(dest, status_code=302)
resp.delete_cookie("cx_setup_persona")
return resp
@router.get("/model", include_in_schema=False)
async def model_page(request: Request):
from auth_utils import decode_token
import jwt
token = request.cookies.get(COOKIE_NAME)
if not token:
return RedirectResponse("/login", status_code=302)
try:
decode_token(token)
except jwt.InvalidTokenError:
return RedirectResponse("/login", status_code=302)
from_setup = bool(request.cookies.get("cx_setup_persona"))
return HTMLResponse(_model_page(from_setup=from_setup))
@router.post("/model", include_in_schema=False)
async def model_submit(
request: Request,
api_key: str = Form(...),
model_name: str = Form(...),
):
from auth_utils import decode_token
import jwt
token = request.cookies.get(COOKIE_NAME)
if not token:
return RedirectResponse("/login", status_code=302)
try:
username = decode_token(token)
except jwt.InvalidTokenError:
return RedirectResponse("/login", status_code=302)
api_key = api_key.strip()
model_name = model_name.strip()
if not api_key:
from_setup = bool(request.cookies.get("cx_setup_persona"))
return HTMLResponse(_model_page("API key is required.", from_setup=from_setup), status_code=422)
# Save OpenRouter as a host
host_id = model_registry.save_host(
username=username,
host_id=None,
label="OpenRouter",
api_url="https://openrouter.ai/api/v1",
api_key=api_key,
host_type="openai",
)
# Find label for selected model
label = next((lbl for mn, lbl in _OPENROUTER_MODELS if mn == model_name), model_name)
label = label.split(" — ")[0] # keep just the model name part
# Save model entry
mid = model_registry.save_model(
username=username,
model_id=None,
host_id=host_id,
label=label,
model_name=model_name,
context_k=128,
tools=True,
)
# Assign as chat role primary
model_registry.set_role(username, "chat", "primary", mid)
logger.info("openrouter setup complete: %s → %s", username, model_name)
# Redirect to chat (use remembered persona, or user root)
dest_cookie = request.cookies.get("cx_setup_persona", "")
dest = f"/{dest_cookie}" if dest_cookie else f"/{username}"
resp = RedirectResponse(dest, status_code=302)
resp.delete_cookie("cx_setup_persona")
return resp


@@ -112,16 +112,17 @@ def list_all() -> list[dict]:
if not d.exists(): if not d.exists():
return [] return []
results = [] results = []
for f in sorted(d.glob("*.json"), key=lambda p: p.stat().st_mtime, reverse=True): for f in d.glob("*.json"):
try: try:
data = json.loads(f.read_text()) data = json.loads(f.read_text())
entry = { results.append({
"session_id": data["session_id"], "session_id": data["session_id"],
"name": data.get("name", ""), "name": data.get("name", ""),
"updated": data.get("updated"), "updated": data.get("updated"),
"message_count": len(data.get("messages", [])), "message_count": len(data.get("messages", [])),
} "_sort_key": data.get("updated") or f.stat().st_mtime,
results.append(entry) })
except Exception: except Exception:
pass pass
results.sort(key=lambda s: s.pop("_sort_key"), reverse=True)
return results return results
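
The new `_sort_key` mixes `data.get("updated")` with `f.stat().st_mtime`. This diff does not show the type of `updated`; if it is stored as a string (for example an ISO-8601 timestamp, which is an assumption), Python 3 cannot compare it with the float mtime and the sort would raise `TypeError` as soon as one session lacks the field. A hedged sketch of a normalizing helper, with the hypothetical name `sort_key_for`:

```python
from datetime import datetime


def sort_key_for(updated, mtime: float) -> float:
    """Hypothetical helper: return a float timestamp for any `updated` value."""
    if isinstance(updated, (int, float)):
        return float(updated)
    if isinstance(updated, str):
        try:
            return datetime.fromisoformat(updated).timestamp()
        except ValueError:
            pass  # unparseable string: fall back to the file mtime
    return float(mtime)
```

Using `sort_key_for(data.get("updated"), f.stat().st_mtime)` as the `_sort_key` value would keep every key the same type regardless of how each session file was written.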


@@ -6,7 +6,24 @@
and are appended automatically by help.html when present. and are appended automatically by help.html when present.
--> -->
*Last updated: 2026-05-05* *Last updated: 2026-05-08*
---
## Getting Started
If this is your first time using Cortex, you need one thing before the chat will work: an AI model connected to your account.
**Fastest path — OpenRouter:**
OpenRouter gives you access to Claude, Gemini, and dozens of other models with a single API key.
1. Get a free API key at [openrouter.ai/keys](https://openrouter.ai/keys)
2. Go to **☰ → Account → [Set up OpenRouter →]** (shown automatically if no model is configured)
3. Paste your key, pick a starting model, click **Connect**
That's it — you're ready to chat.
**Already past setup but seeing errors?** Go to **☰ → Account → Model Registry → Manage models** and confirm a model is assigned to the **Chat** role (Primary slot). If all slots are empty, add a model first.
--- ---
@@ -52,19 +69,45 @@ Click the **⚡** button in the input row to enable the Tools toggle. When lit (
The orchestrator runs a multi-step tool loop: The orchestrator runs a multi-step tool loop:
1. The **orchestrator model** reasons about the request and calls tools as needed — web search, file reads, task management, shell commands, Aether Journals, and more 1. The **orchestrator model** reasons about the request and calls tools as needed
2. It produces an enriched summary of what it found 2. It produces an enriched summary of what it found
3. The **responder model** (set by the active Role) receives that context and writes the final user-facing reply 3. The **responder model** (set by the active Role) receives that context and writes the final user-facing reply
4. A `⚡ N tool calls: …` note appears below the response listing what was used 4. A `⚡ N tool calls: …` note appears below the response listing what was used
The ⚡ toggle is **independent of the Role selector** — you can use any role (chat, coder, research, etc.) with or without tools. The orchestrator model is configured in **Account → Model Registry → Role Assignments → Orchestrator**. By default this is Gemini API. The ⚡ toggle is **independent of the Role selector** — you can use any role (chat, coder, research, etc.) with or without tools. The orchestrator model is configured in **Account → Model Registry → Role Assignments → Orchestrator**.
The full tool reference is in the **Tools** tab. 40 tools across web, files, shell, system, tasks, cron, reminders, scratchpad, notifications, and Aether Journals.
Tools mode is best for tasks requiring research, multi-step reasoning, or side effects (e.g. "search for X", "add a task", "what's on my list?", "append this to my journal"). Regular chat is faster for conversational turns. Tools mode is best for tasks requiring research, multi-step reasoning, or side effects (e.g. "search for X", "add a task", "what's on my list?", "append this to my journal"). Regular chat is faster for conversational turns.
Orchestrated sessions persist to history exactly like regular chat. Orchestrated sessions persist to history exactly like regular chat.
### Available Tools
40 tools across 11 categories. Each tool schema is sent to the model on every orchestrated call — fewer active tools means fewer tokens per call.
| Category | Tools |
|---|---|
| **Web** | `web_search`, `http_fetch` |
| **Files** | `file_read`, `file_list`, `file_write` |
| **Shell** | `shell_exec`, `claude_allow_dir` |
| **System** | `cortex_restart`, `cortex_logs`, `cortex_status`, `cortex_update` |
| **Tasks** | `task_list`, `task_create`, `task_update`, `task_complete` |
| **Cron** | `cron_list`, `cron_add`, `cron_remove`, `cron_toggle` |
| **Reminders** | `reminders_add`, `reminders_list`, `reminders_remove`, `reminders_clear` |
| **Scratchpad** | `scratch_read`, `scratch_write`, `scratch_append`, `scratch_clear` |
| **Notifications** | `web_push`, `email_send`, `nc_talk_send` |
| **Aether Journals** | `ae_journal_list/search`, `ae_journal_entries_list`, `ae_journal_entry_read/create/update/disable/append/prepend` |
| **Agent Notes** | `agent_notes_read`, `agent_notes_write`, `agent_notes_append`, `agent_notes_clear` |
File, Shell, System, and some Notification tools are **admin-only** and not visible to regular users.
### Per-Role Tool Sets
Each role can be configured with a specific subset of tool categories. When a role has a tool subset configured, only those tools are sent to the orchestrator — the rest are invisible to the model for that session.
**Example:** a Coder role might only need Web, Files, Shell, and Agent Notes. A Research role might only need Web. Configuring this avoids sending schemas for 30+ irrelevant tools on every call.
Configure per-role tool sets in **Account → Model Registry → Role Assignments** — expand a role card to see the category checkboxes. The default (no checkboxes selected) sends all tools the user has access to.
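
The selection rule described above (checked categories restrict the tool list; nothing checked means every tool) could be sketched as follows. The function name `tools_for_role` and the shape of `CATALOG` are illustrative assumptions, not the registry's actual API, and the catalog is abbreviated.

```python
def tools_for_role(all_tools: dict[str, list[str]],
                   enabled_categories: list[str]) -> list[str]:
    """Hypothetical helper: pick the tool names sent to the orchestrator."""
    if not enabled_categories:
        # No checkboxes selected: the default, all tools are sent.
        selected = list(all_tools)
    else:
        selected = [c for c in all_tools if c in enabled_categories]
    return [name for cat in selected for name in all_tools[cat]]


# Abbreviated, illustrative catalog using category names from the table above.
CATALOG = {
    "Web": ["web_search", "http_fetch"],
    "Files": ["file_read", "file_list", "file_write"],
    "Agent Notes": ["agent_notes_read", "agent_notes_write"],
}
```

A Research role configured with only `["Web"]` would then send two schemas instead of the full set.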
--- ---
## Sessions ## Sessions
@@ -123,11 +166,59 @@ Each response shows a **model tag** (bottom-right of message) with the model lab
--- ---
## Account Settings
**Navigate to:** ☰ (top-right menu) → **Account**
| Section | What you can do |
|---|---|
| **Account** | View your username, role badge (Admin / User), rename your username |
| **Connected Accounts** | See which Google account is linked for OAuth sign-in |
| **Email Allowlist** | Regex patterns controlling which addresses the `email_send` tool can reach |
| **Notifications** | Set which channel (NC Talk, Google Chat, email) Inara uses for proactive messages |
| **Tool Permissions** | Allow or block specific orchestrator tools for your account |
| **Usage** | Token consumption by model — see below |
| **Browser Cache** | Clear UI preferences stored locally (theme, font size, session ID, etc.) |
| **Model Registry** | Configure AI providers, local hosts, and role assignments |
| **Change Password** | Update your login password |
| **Personas** | List and rename your personas |
---
## Usage
Token consumption is tracked automatically for API-backed models. **Navigate to:** ☰ → **Account** → **Usage** section.
The table shows all-time totals per model key, with columns for:
| Column | Meaning |
|---|---|
| **Model** | `backend/model-name` key (e.g. `gemini_api/gemini-2.5-flash`, `local/deepseek-v4`) |
| **Calls** | Number of API calls made |
| **Prompt** | Input tokens sent |
| **Output** | Completion tokens received |
| **Total** | Prompt + Output |
Values ≥ 1,000 are displayed as `k` (e.g. `24.3k`).
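
The display rule can be illustrated with a small formatter. This is a sketch in Python for illustration only; the name `fmt_tokens` is hypothetical, one decimal place is assumed from the `24.3k` example, and the real UI formatting code is not shown here.

```python
def fmt_tokens(n: int) -> str:
    """Hypothetical formatter: abbreviate counts of 1,000 or more with a 'k' suffix."""
    if n >= 1000:
        return f"{n / 1000:.1f}k"
    return str(n)
```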
**What is and isn't tracked:**
- ✅ Gemini API calls (orchestrator, distillation)
- ✅ Local OpenAI-compatible calls (Open WebUI, Ollama, OpenRouter)
- ✗ Claude CLI — no structured token data is returned by the subprocess
- ✗ Gemini CLI — same reason
The raw data lives in `home/{username}/usage.json` and is also accessible via the Files panel or the API.
---
## Model Registry ## Model Registry
Configure which AI models are available and which handles each task type. Configure which AI models are available and which handles each task type.
**Navigate to:** ☰ (top-right menu) → **Account** → scroll to **Model Registry** → **Manage models →** **New user quick path:** ☰ → **Account** → **Set up OpenRouter →** (the guided wizard adds a host, model, and role assignment in one step).
**Full manual path:** ☰ → **Account** → scroll to **Model Registry** → **Manage models →**
--- ---
@@ -142,10 +233,16 @@ Do this before adding models — models need a provider account or local host to
2. Enter a label (e.g. "Work", "Personal") and your API key 2. Enter a label (e.g. "Work", "Personal") and your API key
3. Get a free key at [aistudio.google.com/apikey](https://aistudio.google.com/apikey) 3. Get a free key at [aistudio.google.com/apikey](https://aistudio.google.com/apikey)
**Local hosts** (Open WebUI, Ollama, OpenRouter, etc.): **OpenRouter** (recommended for new users — one key for many models):
1. Get a key at [openrouter.ai/keys](https://openrouter.ai/keys)
2. Scroll to **Local Hosts** → **+ Add host**
3. Label: "OpenRouter", URL: `https://openrouter.ai/api/v1`, paste your key, Type: OpenAI-compatible
4. Click **Fetch models** to verify, then add models from the fetched list
**Other local hosts** (Open WebUI, Ollama, LM Studio, etc.):
1. Scroll to **Local Hosts** → click **+ Add host** to expand the form 1. Scroll to **Local Hosts** → click **+ Add host** to expand the form
2. Enter a label, the API URL (e.g. `http://192.168.1.100:3000`), and optional API key 2. Enter a label, the API URL (e.g. `http://192.168.1.100:3000`), and optional API key
3. Set **Type**: Open WebUI / Ollama, or OpenAI-compatible (for OpenRouter, LM Studio, etc.) 3. Set **Type**: Open WebUI / Ollama, or OpenAI-compatible
4. Click **Fetch models** on the saved host card to verify connectivity 4. Click **Fetch models** on the saved host card to verify connectivity
--- ---
@@ -178,6 +275,8 @@ Scroll to **Role Assignments** at the bottom of the page. Each role has **Primar
Leave all slots empty to use the server default. Leave all slots empty to use the server default.
**Per-role tool sets:** Expand any role card to configure which tool categories the orchestrator can use when that role is active. Unchecked categories are hidden from the model entirely — reducing token overhead on every orchestrated call. Leaving all categories unchecked means all tools the user has access to are available (the default).
--- ---
## Nextcloud Talk Bot ## Nextcloud Talk Bot
@@ -245,12 +344,12 @@ Controls how much context is prepended to each LLM call:
| Tier | Loads | ~Tokens | | Tier | Loads | ~Tokens |
|---|---|---| |---|---|---|
| **T1** | SOUL + IDENTITY + USER summary | ~1,500 | | **Min** | SOUL + IDENTITY + USER summary | ~1,500 |
| **T2** | + USER full + PROTOCOLS + HELP + memory layers | ~5,000 | | **Std** | + USER full + PROTOCOLS + HELP + memory layers | ~5,000 |
| **T3** | + last 2 raw session logs | ~15,000 | | **Ext** | + last 2 raw session logs | ~15,000 |
| **T4** | + last 7 raw session logs | ~50,000 | | **Full** | + last 7 raw session logs | ~50,000 |
Default is T2. Use T1 for small/local models. Use T3–T4 for complex multi-session tasks. Default is **Std**. Use **Min** for small/local models. Use **Ext** or **Full** for complex multi-session tasks.
### Memory Layers ### Memory Layers
@@ -318,6 +417,7 @@ For direct access or scripting:
| `GET` | `/orchestrate/{job_id}` | Poll job status and result | | `GET` | `/orchestrate/{job_id}` | Poll job status and result |
| `GET` | `/settings/models` | Model registry UI | | `GET` | `/settings/models` | Model registry UI |
| `POST` | `/api/models/role` | Set a role assignment (JSON body) | | `POST` | `/api/models/role` | Set a role assignment (JSON body) |
| `POST` | `/api/models/role-config` | Set per-role tool list and system prompt append |
| `GET` | `/api/push/vapid-key` | VAPID public key (for push subscription) | | `GET` | `/api/push/vapid-key` | VAPID public key (for push subscription) |
| `POST` | `/api/push/subscribe` | Register a push subscription | | `POST` | `/api/push/subscribe` | Register a push subscription |
| `DELETE` | `/api/push/subscribe` | Remove a push subscription | | `DELETE` | `/api/push/subscribe` | Remove a push subscription |
@@ -325,6 +425,11 @@ For direct access or scripting:
| `GET` | `/api/audit/day?date=` | Tool call entries for a specific date (own data) | | `GET` | `/api/audit/day?date=` | Tool call entries for a specific date (own data) |
| `GET` | `/api/audit/recent` | Recent tool calls across days (admin) | | `GET` | `/api/audit/recent` | Recent tool calls across days (admin) |
| `GET` | `/api/audit/stats` | Tool call counts by tool/status/user (admin) | | `GET` | `/api/audit/stats` | Tool call counts by tool/status/user (admin) |
| `GET` | `/api/usage` | Full daily token usage log (own data) |
| `GET` | `/api/usage/summary` | Per-model token totals, all time (own data) |
| `GET` | `/api/usage/all` | Per-model totals for all users (admin) |
| `GET` | `/setup/model` | Guided OpenRouter setup form (Step 3 / standalone) |
| `POST` | `/setup/model` | Save OpenRouter host + model + assign to chat role |
| `GET` | `/health` | Health check — returns `{"status": "ok"}` | | `GET` | `/health` | Health check — returns `{"status": "ok"}` |
Chat request body (`POST /chat`): Chat request body (`POST /chat`):


@@ -1,6 +1,6 @@
# Tool Reference # Tool Reference
> This reference covers all 40 orchestrator tools available when the ⚡ toggle is on. > This reference covers all 44 orchestrator tools available when the ⚡ toggle is on.
> Tools are invoked automatically by the orchestrator — you don't call them directly. > Tools are invoked automatically by the orchestrator — you don't call them directly.
¹ **Admin only** — requires the `admin` role. Invisible to regular users. ¹ **Admin only** — requires the `admin` role. Invisible to regular users.
@@ -102,3 +102,14 @@
| Tool | What it does | | Tool | What it does |
|---|---| |---|---|
| `ae_task_list` ¹ | List tasks from the agents_sync Kanban board | | `ae_task_list` ¹ | List tasks from the agents_sync Kanban board |
## Agent Notes
Private, durable notes visible only to the orchestrator — not surfaced to users. Persist across sessions. Only available in orchestrated (tool-enabled) sessions.
| Tool | What it does |
|---|---|
| `agent_notes_read` | Read the current private notes file |
| `agent_notes_write` | Overwrite the notes file completely |
| `agent_notes_append` | Append a timestamped entry (keeps last 3 backups automatically) |
| `agent_notes_clear` | Erase all notes (backs up first) |
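
The backup behaviour of `agent_notes_append` is only described in the table, not shown in this diff. One way such a rotation might work is sketched below; the function name `append_note` is hypothetical, and the `AGENT_NOTES.bakN.md` naming is inferred from the file names listed in the Files panel change in `app.js`.

```python
from datetime import datetime, timezone
from pathlib import Path


def append_note(notes: Path, entry: str, keep: int = 3) -> None:
    """Hypothetical sketch: rotate backups, then append a timestamped entry."""
    if notes.exists():
        # Shift bak2 -> bak3, then bak1 -> bak2; the oldest backup is overwritten.
        for i in range(keep - 1, 0, -1):
            older = notes.with_name(f"{notes.stem}.bak{i}.md")
            if older.exists():
                older.replace(notes.with_name(f"{notes.stem}.bak{i + 1}.md"))
        # Snapshot the current file as bak1 before modifying it.
        notes.with_name(f"{notes.stem}.bak1.md").write_text(notes.read_text())
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    with notes.open("a") as fh:
        fh.write(f"\n## {stamp}\n{entry}\n")
```

Rotating before writing means the three backups always hold the three most recent prior states, so a bad append can be undone by restoring `bak1`.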


@@ -18,6 +18,11 @@
const settings_dd_el = document.getElementById('settings-dropdown'); const settings_dd_el = document.getElementById('settings-dropdown');
const sessionsBackdrop = document.getElementById('sessions-backdrop'); const sessionsBackdrop = document.getElementById('sessions-backdrop');
// ── Utilities ─────────────────────────────────────────────────
function escapeHtml(str) {
return String(str).replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;').replace(/"/g,'&quot;');
}
// ── Close all panels/dropdowns (mutual exclusion) ───────────── // ── Close all panels/dropdowns (mutual exclusion) ─────────────
function closeAllPanels() { function closeAllPanels() {
if (mode_dropdown_el) mode_dropdown_el.classList.remove('open'); if (mode_dropdown_el) mode_dropdown_el.classList.remove('open');
@@ -435,8 +440,32 @@
availableRoles = d.available_roles || []; availableRoles = d.available_roles || [];
roleIdx = 0; roleIdx = 0;
setRoleToggleUI(availableRoles[0] || null); setRoleToggleUI(availableRoles[0] || null);
_maybeShowNoBanner(availableRoles);
}); });
function _maybeShowNoBanner(roles) {
const key = 'cx_no_model_banner_dismissed';
if (roles.length > 0) { localStorage.removeItem(key); return; }
if (localStorage.getItem(key)) return;
const banner = document.createElement('div');
banner.id = 'no-model-banner';
banner.style.cssText = [
'background:#1c1a0a','border-bottom:1px solid #78350f',
'color:#fbbf24','font-size:0.82rem','padding:0.55rem 1rem',
'display:flex','align-items:center','gap:0.75rem','flex-shrink:0',
].join(';');
banner.innerHTML = `
<span style="flex:1">⚡ Using server default model — add your own for more choices and to track your usage.</span>
<a href="/setup/model" style="color:#fbbf24;font-weight:600;white-space:nowrap;">Set up OpenRouter →</a>
<button onclick="localStorage.setItem('${key}','1');document.getElementById('no-model-banner').remove();"
style="background:none;border:none;color:#78350f;cursor:pointer;font-size:1rem;line-height:1;padding:0 0.2rem;"
title="Dismiss">✕</button>
`;
// Insert at the top of #chat-col (or body if not found)
const col = document.getElementById('chat-col') || document.body.firstElementChild;
col.insertBefore(banner, col.firstChild);
}
backendToggle.addEventListener('click', () => { backendToggle.addEventListener('click', () => {
if (availableRoles.length <= 1) return; if (availableRoles.length <= 1) return;
roleIdx = (roleIdx + 1) % availableRoles.length; roleIdx = (roleIdx + 1) % availableRoles.length;
@@ -1067,6 +1096,19 @@
sessionId = data.session_id; sessionId = data.session_id;
sessionEl.textContent = `session: ${sessionId}`; sessionEl.textContent = `session: ${sessionId}`;
persist_session(); persist_session();
// Auto-name the session from the first user message
if (wasNewSession) {
const autoName = text.slice(0, 60).trimEnd() + (text.length > 60 ? '…' : '');
fetch(`/sessions/${sessionId}?${_fileParams}`, {
method: 'PATCH',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ name: autoName }),
}).then(() => {
sessionEl.textContent = `session: ${autoName}`;
sessionNames.set(sessionId, autoName);
}).catch(() => {});
}
thinkingDiv.className = 'message assistant'; thinkingDiv.className = 'message assistant';
setMessageText(thinkingDiv, 'assistant', data.response); setMessageText(thinkingDiv, 'assistant', data.response);
const assistHistIdx = currentHistory.length; const assistHistIdx = currentHistory.length;
@@ -1133,6 +1175,8 @@
const text = inputEl.value.trim(); const text = inputEl.value.trim();
if (!text || activeController) return; if (!text || activeController) return;
const wasNewSession = !sessionId;
inputEl.value = ''; inputEl.value = '';
syncHeight(); syncHeight();
sendBtn.style.display = 'none'; sendBtn.style.display = 'none';
@@ -1357,6 +1401,7 @@
{ label: 'Memory', files: ['MEMORY_LONG.md', 'MEMORY_MID.md', 'MEMORY_SHORT.md'] }, { label: 'Memory', files: ['MEMORY_LONG.md', 'MEMORY_MID.md', 'MEMORY_SHORT.md'] },
{ label: 'Profile', files: ['USER.md', 'HELP.md'] }, { label: 'Profile', files: ['USER.md', 'HELP.md'] },
{ label: 'Settings', files: ['email_allowlist.json'] }, { label: 'Settings', files: ['email_allowlist.json'] },
{ label: 'Agent Notes (read-only)', files: ['AGENT_NOTES.bak1.md', 'AGENT_NOTES.bak2.md', 'AGENT_NOTES.bak3.md'], collapsed: true },
]; ];
function fmtSize(bytes) { function fmtSize(bytes) {
@@ -1394,7 +1439,7 @@
fileSidebar.innerHTML = ''; fileSidebar.innerHTML = '';
for (const group of FILE_GROUPS) { for (const group of FILE_GROUPS) {
const { groupEl, items } = _makeFileGroup(group.label); const { groupEl, items } = _makeFileGroup(group.label, group.collapsed || false);
for (const fname of group.files) { for (const fname of group.files) {
const f = byName[fname]; const f = byName[fname];
@@ -1490,12 +1535,20 @@
// Restore editor/preview buttons hidden by audit view // Restore editor/preview buttons hidden by audit view
fileRawBtn.style.display = ''; fileRawBtn.style.display = '';
filePreviewBtn.style.display = ''; filePreviewBtn.style.display = '';
fileSaveBtn.style.display = '';
const res = await fetch(`/files/${encodeURIComponent(name)}?${_fileParams}`); const res = await fetch(`/files/${encodeURIComponent(name)}?${_fileParams}`);
if (!res.ok) { mdEditor.setValue(`Error loading ${name}`); return; } if (!res.ok) { mdEditor.setValue(`Error loading ${name}`); return; }
const data = await res.json(); const data = await res.json();
mdEditor.setValue(data.content); mdEditor.setValue(data.content);
mdEditor.clearHistory(); mdEditor.clearHistory();
if (data.readonly) {
mdEditor.setOption('readOnly', 'nocursor');
fileSaveBtn.style.display = 'none';
document.getElementById('file-modal-title').textContent = name + ' (read-only)';
} else {
mdEditor.setOption('readOnly', false);
fileSaveBtn.style.display = '';
document.getElementById('file-modal-title').textContent = name;
}
setFileMode(fileMode); setFileMode(fileMode);
} }
@@ -1794,11 +1847,13 @@
let memMid = localStorage.getItem('mem-mid') !== 'false'; let memMid = localStorage.getItem('mem-mid') !== 'false';
let memShort = localStorage.getItem('mem-short') !== 'false'; let memShort = localStorage.getItem('mem-short') !== 'false';
const TIER_LABELS = { 1: 'Min', 2: 'Std', 3: 'Ext', 4: 'Full' };
function updateTierUI() { function updateTierUI() {
document.querySelectorAll('.ctx-btn[data-tier]').forEach(btn => { document.querySelectorAll('.ctx-btn[data-tier]').forEach(btn => {
btn.classList.toggle('active', parseInt(btn.dataset.tier) === currentTier); btn.classList.toggle('active', parseInt(btn.dataset.tier) === currentTier);
}); });
ctxOpenBtn.querySelector('.tier-badge').textContent = currentTier; ctxOpenBtn.querySelector('.tier-badge').textContent = TIER_LABELS[currentTier] || currentTier;
} }
function updateMemUI() { function updateMemUI() {
@@ -1870,33 +1925,46 @@
memShort = !memShort; localStorage.setItem('mem-short', memShort); updateMemUI(); memShort = !memShort; localStorage.setItem('mem-short', memShort); updateMemUI();
}); });
const _distillBtns = () => document.querySelectorAll(
'#distill-short-btn, #distill-mid-btn, #distill-long-btn, #distill-all-btn, #distill-rebuild-btn'
);
function showDistillStatus(msg, isErr) { function showDistillStatus(msg, isErr) {
distillStatus.textContent = msg; distillStatus.textContent = msg;
distillStatus.classList.toggle('err', !!isErr); distillStatus.classList.toggle('err', !!isErr);
distillStatus.classList.add('show'); distillStatus.classList.add('show');
setTimeout(() => distillStatus.classList.remove('show'), 5000); setTimeout(() => distillStatus.classList.remove('show'), isErr ? 8000 : 5000);
} }
async function runDistill(endpoint) { async function runDistill(endpoint, label) {
showDistillStatus('distilling…', false); _distillBtns().forEach(b => { b.disabled = true; });
showDistillStatus(`${label || endpoint} running…`, false);
try { try {
const res = await fetch(`/distill/${endpoint}?${_fileParams}`, { method: 'POST' }); const res = await fetch(`/distill/${endpoint}?${_fileParams}`, { method: 'POST' });
const d = await res.json(); const d = await res.json();
if (!res.ok || d.ok === false) { if (res.status === 409 || res.status === 429) {
const err = d.error || d.mid?.error || d.long?.error || `HTTP ${res.status}`; showDistillStatus(` ${d.detail}`, true);
} else if (!res.ok || d.ok === false) {
const err = d.detail || d.error || d.mid?.error || d.long?.error || `HTTP ${res.status}`;
showDistillStatus(`${err}`, true); showDistillStatus(`${err}`, true);
} else { } else {
showDistillStatus(`${endpoint} done`, false); showDistillStatus(`${label || endpoint} complete`, false);
} }
} catch (err) { } catch (err) {
showDistillStatus(`${err.message}`, true); showDistillStatus(`${err.message}`, true);
} finally {
_distillBtns().forEach(b => { b.disabled = false; });
} }
} }
document.getElementById('distill-short-btn').addEventListener('click', () => runDistill('short')); document.getElementById('distill-short-btn').addEventListener('click', () => runDistill('short', 'Short distill'));
document.getElementById('distill-mid-btn').addEventListener('click', () => runDistill('mid')); document.getElementById('distill-mid-btn').addEventListener('click', () => runDistill('mid', 'Mid distill'));
document.getElementById('distill-long-btn').addEventListener('click', () => runDistill('long')); document.getElementById('distill-long-btn').addEventListener('click', () => runDistill('long', 'Long distill'));
document.getElementById('distill-all-btn').addEventListener('click', () => runDistill('all')); document.getElementById('distill-all-btn').addEventListener('click', () => runDistill('all', 'Full distill'));
document.getElementById('distill-rebuild-btn').addEventListener('click', () => {
if (!confirm('Rebuild memory from scratch?\n\nThis will wipe MEMORY_MID and MEMORY_LONG (backups kept) then regenerate them from session logs. Any hand-edited content will be replaced.\n\nContinue?')) return;
runDistill('rebuild', 'Memory rebuild');
});
updateTierUI(); updateTierUI();
updateMemUI(); updateMemUI();


@@ -87,10 +87,10 @@
<div class="ctx-section"> <div class="ctx-section">
<div class="ctx-section-title">Context Tier</div> <div class="ctx-section-title">Context Tier</div>
<div class="ctx-row"> <div class="ctx-row">
<button class="ctx-btn" data-tier="1" id="tier-1" title="Minimal (~1.5k tokens)">T1</button> <button class="ctx-btn" data-tier="1" id="tier-1" title="Minimal — identity only (~1.5k tokens)">Min</button>
<button class="ctx-btn active" data-tier="2" id="tier-2" title="Standard (~5k tokens)">T2</button> <button class="ctx-btn active" data-tier="2" id="tier-2" title="Standard — memory + user profile (~5k tokens)">Std</button>
<button class="ctx-btn" data-tier="3" id="tier-3" title="Extended (~15k tokens)">T3</button> <button class="ctx-btn" data-tier="3" id="tier-3" title="Extended — + last 2 sessions (~15k tokens)">Ext</button>
<button class="ctx-btn" data-tier="4" id="tier-4" title="Full (~50k tokens)">T4</button> <button class="ctx-btn" data-tier="4" id="tier-4" title="Full — + last 7 sessions (~50k tokens)">Full</button>
</div> </div>
</div> </div>
<div class="ctx-section"> <div class="ctx-section">
@@ -108,6 +108,7 @@
<button class="ctx-btn" id="distill-mid-btn" title="Summarize SHORT → MID memory (uses LLM)">Mid</button> <button class="ctx-btn" id="distill-mid-btn" title="Summarize SHORT → MID memory (uses LLM)">Mid</button>
<button class="ctx-btn" id="distill-long-btn" title="Integrate MID → LONG memory (uses LLM)">Long</button> <button class="ctx-btn" id="distill-long-btn" title="Integrate MID → LONG memory (uses LLM)">Long</button>
<button class="ctx-btn" id="distill-all-btn" title="Run Short → Mid → Long in sequence">All</button> <button class="ctx-btn" id="distill-all-btn" title="Run Short → Mid → Long in sequence">All</button>
<button class="ctx-btn ctx-btn-danger" id="distill-rebuild-btn" title="⚠ Wipe Mid + Long memories and rebuild from session logs. Hand-edited content will be replaced.">Rebuild</button>
</div> </div>
<div id="ctx-distill-status"></div> <div id="ctx-distill-status"></div>
<div id="ctx-schedule"></div> <div id="ctx-schedule"></div>


@@ -167,9 +167,11 @@
.pb-anthropic { background: #1e1b4b; color: #818cf8; } .pb-anthropic { background: #1e1b4b; color: #818cf8; }
.pb-google { background: #042f2e; color: #34d399; } .pb-google { background: #042f2e; color: #34d399; }
.pb-local { background: #1e293b; color: #64748b; } .pb-local { background: #1e293b; color: #64748b; }
.pb-notools { background: #3b1a1a; color: #f87171; }
[data-theme="light"] .pb-anthropic { background: #ede9fe; color: #5b21b6; } [data-theme="light"] .pb-anthropic { background: #ede9fe; color: #5b21b6; }
[data-theme="light"] .pb-google { background: #d1fae5; color: #065f46; } [data-theme="light"] .pb-google { background: #d1fae5; color: #065f46; }
[data-theme="light"] .pb-local { background: #e2e8f0; color: #475569; } [data-theme="light"] .pb-local { background: #e2e8f0; color: #475569; }
[data-theme="light"] .pb-notools { background: #fee2e2; color: #b91c1c; }
/* Host & model rows */ /* Host & model rows */
.host-row { .host-row {
@@ -488,8 +490,22 @@
autocomplete="off" data-form-type="other"> autocomplete="off" data-form-type="other">
</div> </div>
<div class="field" style="flex:0 0 auto"> <div class="field" style="flex:0 0 auto">
<label>Context (k tokens)</label> <label title="Context window size in thousands of tokens. 0 = assume 32k.">Context (k tokens)</label>
<input type="number" id="add-context-k" name="context_k" value="0" min="0" max="10000"> <input type="number" id="add-context-k" name="context_k" value="0" min="0" max="10000"
title="Context window size in thousands of tokens. 0 = assume 32k (compaction budget ~24k tokens).">
</div>
<div class="field" style="flex:0 0 auto">
<label title="Per-model tool loop cap. 0 = use the global default (orchestrator_max_rounds).">Max rounds</label>
<input type="number" name="max_rounds" value="0" min="0"
title="Per-model tool loop cap. 0 = use the global default (orchestrator_max_rounds).">
</div>
<div class="field" style="flex:0 0 auto">
<label title="Whether this model supports tool calling. If not supported, requests skip the tool loop entirely.">Tool calling</label>
<select name="tools"
title="Whether this model supports tool calling. If not supported, requests skip the tool loop entirely.">
<option value="1" selected>Supported</option>
<option value="0">Not supported</option>
</select>
</div> </div>
</div> </div>
<div class="field"> <div class="field">


@@ -423,6 +423,18 @@
</div> </div>
<!-- Browser cache --> <!-- Browser cache -->
<!-- Usage summary -->
<div class="section" id="usage-section">
<h2>Usage</h2>
<p style="font-size:0.8rem; color:var(--pg-muted); margin-bottom:0.85rem; line-height:1.55;">
Token consumption tracked for API-backed models (Gemini API, local OpenAI-compatible).
Claude CLI calls are not metered.
</p>
<div id="usage-table-wrap" style="overflow-x:auto;">
<p style="font-size:0.8rem; color:var(--pg-muted);">Loading…</p>
</div>
</div>
<div class="section"> <div class="section">
<h2>Browser Cache</h2> <h2>Browser Cache</h2>
<p style="font-size:0.8rem; color:var(--pg-muted); margin-bottom:0.85rem; line-height:1.55;"> <p style="font-size:0.8rem; color:var(--pg-muted); margin-bottom:0.85rem; line-height:1.55;">
@@ -443,6 +455,25 @@
<!-- Model Registry link --> <!-- Model Registry link -->
<div class="section"> <div class="section">
<h2>Model Registry</h2> <h2>Model Registry</h2>
<!-- Quick-start card: shown only when no model is configured for chat role -->
<div id="openrouter-quickstart" style="display:none; background:#1c1a0a; border:1px solid #78350f;
border-radius:8px; padding:1rem; margin-bottom:1rem;">
<p style="font-size:0.82rem; color:#fbbf24; font-weight:600; margin-bottom:0.4rem;">
⚡ You're on the server default model
</p>
<p style="font-size:0.8rem; color:#d97706; margin-bottom:0.75rem; line-height:1.5;">
You can chat now, but adding your own model gives you more choices, lets you pick
role-specific models, and tracks your usage separately.
OpenRouter is the easiest way to get started — one key, many models.
</p>
<a href="/setup/model"
style="display:inline-block; padding:0.5rem 0.9rem; background:#92400e; border-radius:6px;
color:#fef3c7; font-size:0.85rem; font-weight:600; text-decoration:none;">
Set up OpenRouter →
</a>
</div>
<p style="font-size:0.8rem; color:var(--pg-muted); margin-bottom:0.85rem; line-height:1.55;"> <p style="font-size:0.8rem; color:var(--pg-muted); margin-bottom:0.85rem; line-height:1.55;">
Configure AI providers (Anthropic, Google), local hosts (Open WebUI, Ollama, OpenRouter, etc.), Configure AI providers (Anthropic, Google), local hosts (Open WebUI, Ollama, OpenRouter, etc.),
and assign models to roles — chat, orchestrator, distill, and more. and assign models to roles — chat, orchestrator, distill, and more.
@@ -479,6 +510,22 @@
</div> </div>
<!-- Personas --> <!-- Personas -->
<!-- Sessions -->
<div class="section">
<h2>Sessions</h2>
<p style="font-size:0.8rem; color:var(--pg-muted); margin-bottom:0.85rem; line-height:1.55;">
Auto-name any sessions that still show a random ID, using their first message as the name.
Only unnamed sessions are affected — existing names are left alone.
</p>
<button type="button" id="backfill-names-btn"
style="padding:0.5rem 1rem; background:none; border:1px solid var(--pg-border); border-radius:6px;
color:var(--pg-muted); font-size:0.88rem; font-weight:500; cursor:pointer;
transition:border-color 0.15s, color 0.15s;">
Auto-name old sessions
</button>
<span id="backfill-names-ok" style="display:none; margin-left:0.75rem; font-size:0.8rem; color:#4ade80;"></span>
</div>
<div class="section"> <div class="section">
<h2>Personas</h2> <h2>Personas</h2>
<ul class="persona-list"> <ul class="persona-list">
@@ -532,6 +579,84 @@
document.getElementById('clear-ls-ok').style.display = 'inline'; document.getElementById('clear-ls-ok').style.display = 'inline';
}); });
// Show OpenRouter quick-start card if no model is configured
(async () => {
try {
const d = await fetch('/backend').then(r => r.json());
const roles = d.available_roles || [];
if (roles.length === 0) {
document.getElementById('openrouter-quickstart').style.display = 'block';
}
} catch (_) {}
})();
// Usage summary table
(async () => {
const wrap = document.getElementById('usage-table-wrap');
try {
const resp = await fetch('/api/usage/summary');
if (!resp.ok) throw new Error(resp.statusText);
const rows_data = await resp.json();
if (!rows_data.length) {
wrap.innerHTML = '<p style="font-size:0.8rem;color:var(--pg-muted);">No usage recorded yet.</p>';
return;
}
const fmt = n => n >= 1000 ? (n / 1000).toFixed(1) + 'k' : String(n);
const rows = rows_data.map(d => {
const labelCell = d.label !== d.key
? `<span title="${d.key}">${d.label}</span>`
: `<span>${d.key}</span>`;
return `<tr>
<td style="padding:0.4rem 0.75rem 0.4rem 0; font-size:0.82rem; color:var(--pg-text); white-space:nowrap;">${labelCell}</td>
<td style="padding:0.4rem 0.5rem; font-size:0.82rem; color:var(--pg-muted); text-align:right;">${d.calls}</td>
<td style="padding:0.4rem 0.5rem; font-size:0.82rem; color:var(--pg-muted); text-align:right;">${fmt(d.prompt_tokens)}</td>
<td style="padding:0.4rem 0.5rem; font-size:0.82rem; color:var(--pg-muted); text-align:right;">${fmt(d.completion_tokens)}</td>
<td style="padding:0.4rem 0 0.4rem 0.5rem; font-size:0.82rem; color:var(--pg-text); text-align:right; font-weight:600;">${fmt(d.total_tokens)}</td>
</tr>`;
}).join('');
wrap.innerHTML = `<table style="border-collapse:collapse; width:100%; min-width:360px;">
<thead>
<tr style="border-bottom:1px solid var(--pg-border);">
<th style="padding:0.35rem 0.75rem 0.35rem 0; font-size:0.75rem; color:var(--pg-muted); font-weight:600; text-align:left;">Model</th>
<th style="padding:0.35rem 0.5rem; font-size:0.75rem; color:var(--pg-muted); font-weight:600; text-align:right;">Calls</th>
<th style="padding:0.35rem 0.5rem; font-size:0.75rem; color:var(--pg-muted); font-weight:600; text-align:right;">Prompt</th>
<th style="padding:0.35rem 0.5rem; font-size:0.75rem; color:var(--pg-muted); font-weight:600; text-align:right;">Output</th>
<th style="padding:0.35rem 0 0.35rem 0.5rem; font-size:0.75rem; color:var(--pg-muted); font-weight:600; text-align:right;">Total</th>
</tr>
</thead>
<tbody>${rows}</tbody>
</table>`;
} catch (e) {
wrap.innerHTML = `<p style="font-size:0.8rem;color:var(--pg-muted);">Could not load usage data.</p>`;
}
})();
// Auto-name old sessions backfill
document.getElementById('backfill-names-btn').addEventListener('click', async () => {
const btn = document.getElementById('backfill-names-btn');
const ok = document.getElementById('backfill-names-ok');
btn.disabled = true;
btn.textContent = 'Working…';
try {
const params = new URLSearchParams(window.location.search);
const user = params.get('user') || document.querySelector('input[value]')?.value || '';
const persona = params.get('persona') || '';
const qs = user ? `?user=${encodeURIComponent(user)}&persona=${encodeURIComponent(persona)}` : '';
const res = await fetch(`/api/sessions/backfill-names${qs}`, { method: 'POST' });
const data = await res.json();
if (!res.ok) throw new Error(data.detail || res.statusText);
const n = data.named ?? 0;
ok.style.color = '#4ade80'; // reset in case a previous attempt left the error colour
ok.textContent = `Named ${n} session${n !== 1 ? 's' : ''}.`;
ok.style.display = 'inline';
} catch (e) {
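console.error('Session name backfill failed:', e); // the message below points users at the console
ok.textContent = 'Error — check console.';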
ok.style.color = '#f87171';
ok.style.display = 'inline';
}
btn.textContent = 'Auto-name old sessions';
btn.disabled = false;
});
// Persona rename toggle // Persona rename toggle
document.querySelectorAll('.persona-rename-toggle').forEach(btn => { document.querySelectorAll('.persona-rename-toggle').forEach(btn => {
btn.addEventListener('click', () => { btn.addEventListener('click', () => {


@@ -127,6 +127,36 @@
.emoji-opt.selected { border-color: #7c3aed; background: #2d1f52; } .emoji-opt.selected { border-color: #7c3aed; background: #2d1f52; }
#emoji-hidden { display: none; } #emoji-hidden { display: none; }
.provider-badge {
display: inline-flex;
align-items: center;
gap: 0.4rem;
background: #2d1f52;
border: 1px solid #7c3aed;
border-radius: 6px;
padding: 0.3rem 0.6rem;
font-size: 0.78rem;
color: #a78bfa;
margin-bottom: 1rem;
}
.skip-link {
display: block;
text-align: center;
margin-top: 1rem;
font-size: 0.8rem;
color: #64748b;
text-decoration: none;
}
.skip-link:hover { color: #94a3b8; }
.model-hint {
font-size: 0.72rem;
color: #64748b;
margin-top: 0.75rem;
text-align: center;
}
</style> </style>
</head> </head>
<body> <body>
@@ -137,10 +167,11 @@
</div> </div>
<!-- ERROR --> <!-- ERROR -->
<!-- ERROR_MODEL -->
<!-- ── Step 1: password ───────────────────────────────────────── --> <!-- ── Step 1: password ───────────────────────────────────────── -->
<div id="step-password"> <div id="step-password">
<div class="step-label">Step 1 of 2</div> <div class="step-label">Step 1 of 3</div>
<h2>Set your password</h2> <h2>Set your password</h2>
<form method="POST" action="" id="password-form"> <form method="POST" action="" id="password-form">
<input type="hidden" name="step" value="password"> <input type="hidden" name="step" value="password">
@@ -161,7 +192,7 @@
<!-- ── Step 2: persona ────────────────────────────────────────── --> <!-- ── Step 2: persona ────────────────────────────────────────── -->
<div id="step-persona" style="display:none"> <div id="step-persona" style="display:none">
<div class="step-label">Step 2 of 2</div> <div class="step-label">Step 2 of 3</div>
<h2>Create your persona</h2> <h2>Create your persona</h2>
<form method="POST" action="" id="persona-form"> <form method="POST" action="" id="persona-form">
<input type="hidden" name="step" value="persona"> <input type="hidden" name="step" value="persona">
@@ -203,6 +234,39 @@
<button type="submit">Create my persona →</button> <button type="submit">Create my persona →</button>
</form> </form>
</div> </div>
<!-- ── Step 3: model connect ─────────────────────────────────── -->
<div id="step-model" style="display:none">
<div class="step-label"><!-- SETUP_STEP3_LABEL --></div>
<h2>Connect an AI model</h2>
<div class="provider-badge">⚡ Recommended: OpenRouter</div>
<p style="font-size:0.82rem;color:#94a3b8;margin-bottom:1rem;">
One API key gives you access to Claude, Gemini, Llama, and dozens of other models.
Get a free key at <a href="https://openrouter.ai/keys" target="_blank" style="color:#a78bfa;">openrouter.ai/keys</a>.
</p>
<form method="POST" action="/setup/model" id="model-form">
<div class="field">
<label for="api_key">OpenRouter API key</label>
<input type="password" id="api_key" name="api_key"
autocomplete="off" placeholder="sk-or-v1-..." required>
</div>
<div class="field">
<label for="model_name">Starting model</label>
<select id="model_name" name="model_name">
<option value="anthropic/claude-3-5-haiku-20241022">Claude 3.5 Haiku — Fast &amp; affordable</option>
<option value="anthropic/claude-3-7-sonnet-20250219">Claude 3.7 Sonnet — Smarter Claude</option>
<option value="google/gemini-2.0-flash-001">Gemini 2.0 Flash — Fast Google model</option>
<option value="meta-llama/llama-3.3-70b-instruct">Llama 3.3 70B — Open source</option>
</select>
<p class="hint">You can add more models or switch anytime in Account → Model Registry.</p>
</div>
<button type="submit">Connect &amp; start chatting →</button>
</form>
<p class="model-hint">
Using Ollama, a local model, or something else?
<a href="#" id="skip-model-link" style="color:#64748b;">Skip this step →</a>
</p>
</div>
</div> </div>
<script> <script>
@@ -232,6 +296,11 @@
document.getElementById('step-password').style.display = 'none'; document.getElementById('step-password').style.display = 'none';
document.getElementById('step-persona').style.display = 'block'; document.getElementById('step-persona').style.display = 'block';
} }
if (params.get('step') === '3') {
document.getElementById('step-password').style.display = 'none';
document.getElementById('step-persona').style.display = 'none';
document.getElementById('step-model').style.display = 'block';
}
// ── Client-side confirm password check ─────────────────────────── // ── Client-side confirm password check ───────────────────────────
document.getElementById('password-form').addEventListener('submit', e => { document.getElementById('password-form').addEventListener('submit', e => {
@@ -243,6 +312,15 @@
} }
}); });
// ── Skip model setup — navigate to user home ─────────────────────
document.getElementById('skip-model-link')?.addEventListener('click', e => {
e.preventDefault();
// Ask server for skip target (the cx_setup_persona cookie has the path)
fetch('/setup/model/skip', { method: 'POST', credentials: 'same-origin' })
.then(r => { if (r.redirected) location.href = r.url; else location.href = '/'; })
.catch(() => { location.href = '/'; });
});
// ── Auto-generate persona slug from display name ───────────────── // ── Auto-generate persona slug from display name ─────────────────
document.getElementById('display_name').addEventListener('input', function() { document.getElementById('display_name').addEventListener('input', function() {
const slugField = document.getElementById('persona_name'); const slugField = document.getElementById('persona_name');


@@ -1328,7 +1328,10 @@
.ctx-btn:hover { color: var(--text); border-color: var(--muted); } .ctx-btn:hover { color: var(--text); border-color: var(--muted); }
.ctx-btn.active { color: var(--accent); border-color: var(--accent); } .ctx-btn.active { color: var(--accent); border-color: var(--accent); }
.ctx-btn.mem-on { color: var(--success); border-color: var(--success-dim); } .ctx-btn.mem-on { color: var(--success); border-color: var(--success-dim); }
.ctx-btn.local-on { color: var(--amber); border-color: var(--amber-border); } .ctx-btn.local-on { color: var(--amber); border-color: var(--amber-border); }
.ctx-btn-danger { color: #f87171 !important; border-color: #7f1d1d !important; }
.ctx-btn-danger:hover { border-color: #f87171 !important; }
.ctx-btn:disabled { opacity: 0.4; cursor: not-allowed; pointer-events: none; }
#backend-model-hint { #backend-model-hint {
font-size: 0.68rem; color: var(--amber); opacity: 0.9; font-size: 0.68rem; color: var(--amber); opacity: 0.9;
margin-top: 4px; word-break: break-all; line-height: 1.3; margin-top: 4px; word-break: break-all; line-height: 1.3;


@@ -64,6 +64,12 @@ from tools.scratch import (
scratch_clear as _scratch_clear, scratch_clear as _scratch_clear,
) )
from tools.notify import nc_talk_send as _nc_talk_send, email_send as _email_send, web_push as _web_push from tools.notify import nc_talk_send as _nc_talk_send, email_send as _email_send, web_push as _web_push
from tools.agent_notes import (
agent_notes_read as _agent_notes_read,
agent_notes_write as _agent_notes_write,
agent_notes_append as _agent_notes_append,
agent_notes_clear as _agent_notes_clear,
)
# ── Declaration imports ─────────────────────────────────────────────────────── # ── Declaration imports ───────────────────────────────────────────────────────
@@ -77,6 +83,7 @@ import tools.cron as _mod_cron
import tools.reminders as _mod_reminders import tools.reminders as _mod_reminders
import tools.scratch as _mod_scratch import tools.scratch as _mod_scratch
import tools.notify as _mod_notify import tools.notify as _mod_notify
import tools.agent_notes as _mod_agent_notes
# ── Tool categories — used by the Model Registry UI for grouped checkboxes ─── # ── Tool categories — used by the Model Registry UI for grouped checkboxes ───
@@ -98,6 +105,7 @@ TOOL_CATEGORIES: dict[str, list[str]] = {
"ae_journal_entry_prepend", "ae_journal_entry_prepend",
], ],
"Aether Tasks": ["ae_task_list"], "Aether Tasks": ["ae_task_list"],
"Agent Notes": ["agent_notes_read", "agent_notes_write", "agent_notes_append", "agent_notes_clear"],
} }
# ── Callable registry ───────────────────────────────────────────────────────── # ── Callable registry ─────────────────────────────────────────────────────────
@@ -143,6 +151,10 @@ _CALLABLES: dict[str, callable] = {
"email_send": _email_send, "email_send": _email_send,
"nc_talk_send": _nc_talk_send, "nc_talk_send": _nc_talk_send,
"web_push": _web_push, "web_push": _web_push,
"agent_notes_read": _agent_notes_read,
"agent_notes_write": _agent_notes_write,
"agent_notes_append": _agent_notes_append,
"agent_notes_clear": _agent_notes_clear,
} }
# ── Role-based access control ───────────────────────────────────────────────── # ── Role-based access control ─────────────────────────────────────────────────
@@ -194,6 +206,7 @@ _ALL_DECLARATIONS: list[types.FunctionDeclaration] = (
+ _mod_notify.DECLARATIONS + _mod_notify.DECLARATIONS
+ _mod_ae_knowledge.DECLARATIONS + _mod_ae_knowledge.DECLARATIONS
+ _mod_ae_tasks.DECLARATIONS + _mod_ae_tasks.DECLARATIONS
+ _mod_agent_notes.DECLARATIONS
) )
# Full Gemini Tool object (all tools — use get_tools_for_role() in production) # Full Gemini Tool object (all tools — use get_tools_for_role() in production)


@@ -1,7 +1,7 @@
# Architecture: LLM Backends # Architecture: LLM Backends
> How Cortex selects and talks to AI models. > How Cortex selects and talks to AI models.
> Last updated: 2026-04-27 (V2 schema) > Last updated: 2026-05-06
--- ---
@@ -33,11 +33,11 @@ Resolution order for a role:
### Explicit Override ### Explicit Override
The UI backend toggle cycles: **auto → claude → gemini → local → auto** The **Role** toggle in the Context & Memory panel cycles through configured role slots for the `chat` role: **Primary → Backup 1 → Backup 2 → auto**.
- **auto** (default): role-based routing as above - Each slot shows the configured model label
- **claude / gemini / local**: bypasses role routing; forces that backend type - `auto` uses the Primary without forcing a specific backend type
- The toggle will be redesigned in Phase 3 to cycle through chat role slots (Primary / Backup 1 / Backup 2) - The ⚡ Tools toggle is independent — it routes to the `orchestrator` role regardless of the chat role selection
**Fallback chain** (automatic, only when no explicit registry entry exists): **Fallback chain** (automatic, only when no explicit registry entry exists):
``` ```
@@ -113,6 +113,8 @@ Managed at **Settings → Models** (`/settings/models`). Full provider UI coming
"provider": "local", "provider": "local",
"host_id": "abc123", "host_id": "abc123",
"context_k": 72, "context_k": 72,
"max_rounds": 5,
"tools": true,
"tags": ["fast", "local"] "tags": ["fast", "local"]
} }
], ],
@@ -125,6 +127,14 @@ Managed at **Settings → Models** (`/settings/models`). Full provider UI coming
} }
``` ```
### Optional model fields
| Field | Type | Default | Meaning |
|---|---|---|---|
| `context_k` | int | 32 | Context window in thousands of tokens. Used for compaction budget (75% of window). |
| `max_rounds` | int \| null | null | Per-model tool loop cap. `null` = use global `orchestrator_max_rounds`. Effective limit = `min(per_model, global)`. |
| `tools` | bool | true | Whether this model supports tool calling. `false` = skip tool loop entirely; model gets a plain chat request. |
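As a rough illustration of how the `context_k` and `max_rounds` rules in the table combine, here is a minimal sketch. The function names are hypothetical, "k" is assumed to mean 1000 tokens, and the 16k floor comes from the commit message rather than from this table, so treat the numbers as assumptions about `openai_orchestrator.py`, not its actual code.

```python
from typing import Optional

DEFAULT_CONTEXT_K = 32       # used when context_k is 0 or unset
BUDGET_FRACTION = 0.75       # compaction budget as a share of the window
MIN_BUDGET_TOKENS = 16_000   # assumed floor, taken from the commit notes


def compaction_budget(context_k: int) -> int:
    """Token budget at which old tool results would start being truncated."""
    window_k = context_k if context_k > 0 else DEFAULT_CONTEXT_K
    return max(int(window_k * 1000 * BUDGET_FRACTION), MIN_BUDGET_TOKENS)


def effective_max_rounds(per_model: Optional[int], global_cap: int) -> int:
    """Per-model cap intersected with the global orchestrator_max_rounds."""
    if not per_model:  # None (registry) and 0 (form field) both mean "use global"
        return global_cap
    return min(per_model, global_cap)
```

Under these assumptions a model with `context_k: 72` gets a 54,000-token budget, an unset window falls back to 24,000, and a per-model `max_rounds` can lower the global cap but never raise it.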
### host_type (local hosts) ### host_type (local hosts)
| `host_type` | Chat endpoint | Models endpoint | Use for | | `host_type` | Chat endpoint | Models endpoint | Use for |
@@ -210,13 +220,6 @@ Memory distillation uses `role="distill"`. Configure via Model Registry → Role
`.env` override: `ROLE_DISTILL=claude_cli` (default). `.env` override: `ROLE_DISTILL=claude_cli` (default).
---
## Future: Phase 3 — Backend Toggle Redesign
The `claude → gemini → local` toggle will be replaced with a slot toggle that cycles
through the chat role's configured models (Primary → Backup 1 → Backup 2), showing
the actual model label. See `DESIGN__Model_Registry_V2.md`.
--- ---


@@ -1,7 +1,7 @@
# Architecture: System Overview # Architecture: System Overview
> How the pieces fit together. > How the pieces fit together.
> Last updated: 2026-04-03 > Last updated: 2026-05-06
--- ---
@@ -56,7 +56,9 @@ Details: [`ARCH__BACKENDS.md`](ARCH__BACKENDS.md) | [`ARCH__PERSONA.md`](ARCH__P
| `context_loader.py` | Builds system prompt from persona files (tiers 1-4) | | `context_loader.py` | Builds system prompt from persona files (tiers 1-4) |
| `llm_client.py` | All LLM backends — Claude, Gemini CLI, Local | | `llm_client.py` | All LLM backends — Claude, Gemini CLI, Local |
| `orchestrator_engine.py` | Gemini API ReAct tool loop → Claude handoff | | `orchestrator_engine.py` | Gemini API ReAct tool loop → Claude handoff |
| `session_store.py` | In-memory + file session persistence | | `openai_orchestrator.py` | OpenAI-compatible ReAct tool loop (local models via Open WebUI/OpenRouter) |
| `model_registry.py` | Per-user model registry V2 — providers, hosts, models, role assignments |
| `session_store.py` | In-memory + file session persistence (`session_data/{id}.json`) |
| `session_logger.py` | Writes session turns to `sessions/YYYY-MM-DD.md` | | `session_logger.py` | Writes session turns to `sessions/YYYY-MM-DD.md` |
| `memory_distiller.py` | Short/mid/long distill jobs | | `memory_distiller.py` | Short/mid/long distill jobs |
| `scheduler.py` | APScheduler — distill jobs + user crons | | `scheduler.py` | APScheduler — distill jobs + user crons |
@@ -64,20 +66,23 @@ Details: [`ARCH__BACKENDS.md`](ARCH__BACKENDS.md) | [`ARCH__PERSONA.md`](ARCH__P
| `notification.py` | Outbound channel messages (distill alerts, cron proactive) | | `notification.py` | Outbound channel messages (distill alerts, cron proactive) |
| `auth_utils.py` | bcrypt passwords, JWT, invite tokens, channel config | | `auth_utils.py` | bcrypt passwords, JWT, invite tokens, channel config |
| `auth_middleware.py` | JWT cookie validation on all routes | | `auth_middleware.py` | JWT cookie validation on all routes |
| `user_settings.py` | Per-user local LLM config (hosts, models, active model) | | `tool_audit.py` | JSONL audit log for every orchestrator tool invocation |
| `usage_tracker.py` | Per-user token usage tracking (daily buckets → `usage.json`) |
| `event_bus.py` | Internal SSE pub/sub (NC Talk → browser mirror) | | `event_bus.py` | Internal SSE pub/sub (NC Talk → browser mirror) |
| `email_utils.py` | SMTP invite emails | | `email_utils.py` | SMTP invite emails |
| `persona_template.py` | Bootstrap a new persona directory from templates | | `persona_template.py` | Bootstrap a new persona directory from templates |
| `routers/` | One file per endpoint group (chat, orchestrator, auth, files, channels, ui, settings…) | | `routers/` | One file per endpoint group: `chat`, `orchestrator`, `auth`, `files`, `ui`, `settings`, `local_llm`, `distill`, `audit`, `usage`, `push`, `help`, `onboarding`, `auth_google`, `nextcloud_talk`, `google_chat` |
| `tools/` | Orchestrator tool implementations (web, ae_knowledge, tasks, scratch, reminders, cron, system) | | `tools/` | Orchestrator tool implementations: `web`, `tasks`, `scratch`, `reminders`, `cron`, `system`, `notify`, `ae_journals`, `ae_tasks`, `agent_notes` |
| `static/` | Web UI — `index.html`, `app.js`, `style.css`, `login.html`, `setup.html`, `HELP.md` | | `static/` | Web UI — `index.html`, `app.js`, `style.css`, `login.html`, `setup.html`, `HELP.md`, `local_llm.html`, `settings.html` |
| `tests/` | pytest suite (80 tests) | | `tests/` | pytest suite |
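To make the `tool_audit.py` row more concrete, the sketch below appends one JSON object per tool call to a per-day file under `home/{user}/tool_audit/YYYY-MM-DD.jsonl`, the path given in the commit message. The function name, its parameters, and every field other than `engine` and `model` are assumptions made for illustration, not the module's real schema.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def log_tool_call(home: Path, user: str, tool: str, args: dict,
                  engine: str, model: str,
                  now: Optional[datetime] = None) -> Path:
    """Append one audit entry to the user's JSONL file for the current day."""
    now = now or datetime.now(timezone.utc)
    day_file = home / user / "tool_audit" / f"{now:%Y-%m-%d}.jsonl"
    day_file.parent.mkdir(parents=True, exist_ok=True)
    entry = {
        "ts": now.isoformat(),   # hypothetical field name
        "tool": tool,            # hypothetical field name
        "args": args,            # hypothetical field name
        "engine": engine,        # recorded per entry (per the commit message)
        "model": model,          # recorded per entry (per the commit message)
    }
    # Append mode keeps earlier entries; one line per call keeps the file greppable.
    with day_file.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return day_file
```

One file per day means a read-only viewer can list dates cheaply and load a single day on demand, which matches the date-linked table described for the Files panel.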
--- ---
## Key Design Decisions ## Key Design Decisions
**Two-brain pattern** — Gemini API handles tool use (function calling, planning, web search). Claude CLI handles all user-facing responses. Direct chat bypasses the orchestrator entirely. **Two-brain pattern (Gemini orchestrator)** — Gemini API handles tool use (function calling, planning, web search). Claude CLI handles all user-facing responses. Direct chat bypasses the orchestrator entirely.
**Single-model pattern (local orchestrator)** — When the `orchestrator` role resolves to a `local_openai` model, `openai_orchestrator.py` runs the full ReAct loop and produces the final response itself. No Claude handoff — the local model does both reasoning and response.
**Subprocess backends** — Claude and Gemini run as CLI subprocesses (`claude --print`, `gemini -p`). This keeps auth transparent (Claude Code manages tokens) and avoids API costs on the Pro subscription path. **Subprocess backends** — Claude and Gemini run as CLI subprocesses (`claude --print`, `gemini -p`). This keeps auth transparent (Claude Code manages tokens) and avoids API costs on the Pro subscription path.
@@ -88,3 +93,33 @@ Details: [`ARCH__BACKENDS.md`](ARCH__BACKENDS.md) | [`ARCH__PERSONA.md`](ARCH__P
**Per-user filesystem layout**: `home/{user}/persona/{name}/` mirrors Linux home directories. Each persona is a directory of markdown files and JSON. No database. Easy to inspect, edit, and back up. **Per-user filesystem layout**: `home/{user}/persona/{name}/` mirrors Linux home directories. Each persona is a directory of markdown files and JSON. No database. Easy to inspect, edit, and back up.
**No single point of coupling** — tools live in `cortex/tools/`, separate from `ae_*` MCP tools. Channels live in `cortex/routers/`, each self-contained. Adding a channel or tool doesn't touch other subsystems. **No single point of coupling** — tools live in `cortex/tools/`, separate from `ae_*` MCP tools. Channels live in `cortex/routers/`, each self-contained. Adding a channel or tool doesn't touch other subsystems.
**Agent private notes**: `AGENT_NOTES.md` per persona, writable only by the orchestrator via `agent_notes_*` tools. Never loaded into user-facing context. Three rolling backups (`bak1` to `bak3`) are visible read-only in the Files panel. Declared in `tools/agent_notes.py`; usage guidance in `PROTOCOLS.md`.
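The rolling-backup behaviour described for agent notes can be sketched as below. This is a minimal illustration, not the code in `tools/agent_notes.py`: the function names and the `AGENT_NOTES.md.bakN` file naming are assumptions made for the example.

```python
import shutil
from pathlib import Path


def rotate_backups(notes: Path, keep: int = 3) -> None:
    """Shift bak1..bak(keep-1) up by one slot, then copy the current notes to bak1.

    The '<name>.bakN' naming is hypothetical; the real layout may differ.
    """
    if not notes.exists():
        return  # nothing to back up on the very first write
    # Walk from the oldest slot down so each move overwrites an already-shifted file.
    for i in range(keep - 1, 0, -1):
        src = notes.with_name(f"{notes.name}.bak{i}")
        if src.exists():
            src.replace(notes.with_name(f"{notes.name}.bak{i + 1}"))
    shutil.copy2(notes, notes.with_name(f"{notes.name}.bak1"))


def write_notes(notes: Path, text: str) -> None:
    """Back up the previous contents, then overwrite the notes file."""
    rotate_backups(notes)
    notes.write_text(text, encoding="utf-8")
```

With `keep=3`, the oldest backup is overwritten on every write, so at most three previous versions exist at any time.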
**No black boxes** — Every component, flow, and design decision is documented. Documentation is updated before implementation of significant changes and verified after. HELP.md is the user-facing contract; ARCH__*.md files are the developer contract; PROTOCOLS.md is the agent contract. If any of these drift from reality, that is a bug.
---
## Onboarding Flow
New users are invited via a one-time token and complete a three-step setup before reaching the chat:
```
1. /setup/{token} → Set password (POST creates session cookie, consumes token)
2. /setup/persona → Create persona (slug, display name, emoji, description)
3. /setup/model → Connect a model — OpenRouter recommended
(skip link goes straight to /{user}/{persona})
```
Step 3 was added on 2026-05-06 (see `TODO__Agents.md § Guided onboarding`). Before it existed,
users landed in the chat with no model configured and had to navigate Settings → Model Registry
manually, which was confusing for non-technical users.
**After Step 3:**
- `save_host()` adds OpenRouter (`https://openrouter.ai/api/v1`, type `openai`)
- `save_model()` creates a model entry for the chosen model
- `set_role(chat, primary, model_id)` assigns it as the chat role primary
- Redirect to `/{user}/{persona}`
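The four steps above can be sketched as follows. `save_host`, `save_model`, and `set_role` are named in the source, but their real signatures in `model_registry.py` are not shown, so the `Registry` class, its parameters, and the `openrouter/<name>` model id format are hypothetical stand-ins.

```python
from dataclasses import dataclass, field

OPENROUTER_URL = "https://openrouter.ai/api/v1"  # URL taken from the flow description


@dataclass
class Registry:
    """In-memory stand-in for model_registry.py (hypothetical shape)."""
    hosts: dict = field(default_factory=dict)
    models: dict = field(default_factory=dict)
    roles: dict = field(default_factory=dict)

    def save_host(self, host_id: str, url: str, host_type: str, api_key: str) -> None:
        self.hosts[host_id] = {"url": url, "type": host_type, "api_key": api_key}

    def save_model(self, model_id: str, host_id: str, name: str) -> None:
        self.models[model_id] = {"host": host_id, "name": name}

    def set_role(self, role: str, slot: str, model_id: str) -> None:
        self.roles[(role, slot)] = model_id


def complete_model_setup(reg: Registry, user: str, persona: str,
                         api_key: str, model_name: str) -> str:
    """Run the post-submit steps and return the redirect target."""
    reg.save_host("openrouter", OPENROUTER_URL, "openai", api_key)
    model_id = f"openrouter/{model_name}"  # assumed id format, for illustration only
    reg.save_model(model_id, "openrouter", model_name)
    reg.set_role("chat", "primary", model_id)
    return f"/{user}/{persona}"
```

A handler for `POST /setup/model` would call something like `complete_model_setup(...)` and redirect to the returned path.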
**Existing users with no model configured** — a dismissable banner is shown in the chat on
load, linking to `/setup/model` (the Step 3 form works standalone, without step labels).


@@ -1,7 +1,10 @@
# Cortex / Inara — Master Index # Cortex / Inara — Master Index
> Start here. This document is a map, not a manual. > Start here. This document is a map, not a manual.
> Last updated: 2026-04-28 > Last updated: 2026-05-06
>
> **Documentation philosophy:** Cortex is a no-black-box system. Docs must match reality.
> Update docs before implementing significant changes. Verify they still match after.
--- ---
@@ -17,20 +20,27 @@ Cortex is a self-hosted personal AI platform. It routes messages from any input
| Component | Status | Notes | | Component | Status | Notes |
|---|---|---| |---|---|---|
| Web UI | ✅ Live | SPA, dark theme, mobile-responsive, session auth | | Web UI | ✅ Live | SPA, dark theme, mobile-responsive, PWA-installable |
| Nextcloud Talk bot | ✅ Live | HMAC-signed, per-user routing | | Nextcloud Talk bot | ✅ Live | HMAC-signed, per-user routing |
| Google Chat Add-on | ✅ Live | JWT-verified, per-user routing | | Google Chat Add-on | ✅ Live | JWT-verified, per-user routing |
| Claude backend | ✅ Live | Primary — via Claude Code CLI | | Claude backend | ✅ Live | Primary — via Claude Code CLI |
| Gemini backend | ✅ Live | Fallback — via Gemini CLI | | Gemini backend | ✅ Live | Fallback — via Gemini CLI |
| Local backend | ✅ Live | Third option — Open WebUI/Ollama on scott_gaming | | Local backend | ✅ Live | Open WebUI/Ollama on scott_gaming; per-user multi-model config |
| Gemini orchestrator | ✅ Live | Tool loop → Claude response, ⚡ Tools toggle in UI (27 tools) | | Gemini orchestrator | ✅ Live | Tool loop → Claude response, ⚡ toggle in UI (40 tools) |
| Model registry V2 | ✅ Live | Providers (Anthropic/Google/Local), multi-account Gemini | | Local orchestrator | ✅ Live | OpenAI-compatible ReAct loop; used when orchestrator role → local model |
| Model registry V2 | ✅ Live | Providers (Anthropic/Google/Local), multi-account Gemini, role assignments |
| Memory distillation | ✅ Live | Short (daily) / Mid (weekly) / Long (monthly) | | Memory distillation | ✅ Live | Short (daily) / Mid (weekly) / Long (monthly) |
| Multi-user | ✅ Live | Scott, Holly, Brian — each with own personas | | Multi-user | ✅ Live | Scott, Holly, Brian — each with own personas |
| Session search | ✅ Live | Full-text search across past session logs | | Session search | ✅ Live | Full-text search across past session logs |
| Proactive cron | ✅ Live | `message` and `brief` job types → NC Talk | | Proactive cron | ✅ Live | `message` and `brief` job types → NC Talk / web push |
| Tool audit log | ✅ Live | Every orchestrator tool call logged to `home/{user}/tool_audit/` |
| Token usage tracking | ✅ Live | Per-user daily buckets in `home/{user}/usage.json`; visible in Settings |
| Web push notifications | ✅ Live | VAPID push; `web_push` orchestrator tool; subscribe via ☰ menu |
| Agent private notes | ✅ Live | `AGENT_NOTES.md` — orchestrator-only notepad; 3 rolling backups; user-visible as read-only |
| Distill safety | ✅ Live | Per-persona asyncio lock, per-endpoint cooldowns, Rebuild option |
| Guided onboarding | ✅ Live | Setup Step 3 for OpenRouter; existing-user banner; settings quick-link |
**Active users / personas:** scott/inara, scott/developer, holly/tina, brian/wintermute **Active users / personas:** scott/inara, holly/tina, brian/wintermute
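As an illustration of the tool audit log row in the table above, a per-day JSONL append could look roughly like this. The path pattern `home/{user}/tool_audit/YYYY-MM-DD.jsonl` and the engine and model fields come from the source; the function name and the other field names (`ts`, `tool`, `args`) are assumptions.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def log_tool_call(home: Path, user: str, tool: str, args: dict,
                  engine: str, model: str,
                  now: Optional[datetime] = None) -> Path:
    """Append one audit entry to home/{user}/tool_audit/YYYY-MM-DD.jsonl."""
    now = now or datetime.now(timezone.utc)
    path = home / user / "tool_audit" / f"{now:%Y-%m-%d}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    entry = {
        "ts": now.isoformat(),  # hypothetical field name
        "tool": tool,
        "args": args,
        "engine": engine,       # recorded per entry, per the commit notes
        "model": model,
    }
    # One JSON object per line keeps the file appendable and easy to read back by day.
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return path
```

One file per day keeps each log small and maps directly onto a date-linked view such as the one in the Files panel.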
--- ---


@@ -54,7 +54,6 @@
## Phase 5 — Routing Intelligence & Scale ## Phase 5 — Routing Intelligence & Scale
- [ ] Intelligent model routing (by task type, privacy, context length) - [ ] Intelligent model routing (by task type, privacy, context length)
- [ ] Agent-to-agent task delegation across fleet - [ ] Agent-to-agent task delegation across fleet
- [ ] Permanent hosting on home server (currently on `scott_lpt`)
## Phase 6 — Infrastructure ## Phase 6 — Infrastructure
- [ ] Server DMZ finalized - [ ] Server DMZ finalized


@@ -7,16 +7,41 @@
## 🔴 High Priority ## 🔴 High Priority
### [UX] User onboarding — guided model setup
New users complete password + persona setup and land directly in the chat with no working
AI model configured. This closes that gap with a guided Step 3 and a fallback for existing
users who skipped it or were onboarded before this existed.
Design spec: `documentation/ARCH__SYSTEM.md` § Onboarding Flow
- [x] **Setup Step 3 page** — new `/setup/model` GET/POST in `onboarding.py` — 2026-05-06
- Recommends OpenRouter: "one API key, access to Claude, Gemini, and dozens of other models"
- API key field + curated model dropdown (claude-3-5-haiku, claude-3-7-sonnet, gemini-2.0-flash, llama-3.3-70b)
- On submit: `save_host()` (OpenRouter) + `save_model()` + `set_role(chat, primary, model_id)` in `model_registry.py`
- Skip: `POST /setup/model/skip` reads `cx_setup_persona` cookie, redirects to chat; JS fetch on skip-link click
- Step labels updated: setup.html "1 of 3" / "2 of 3" / "3 of 3" (was "1 of 2" / "2 of 2")
- Standalone: `/setup/model` works without step labels (no `cx_setup_persona` cookie → no label)
- Persona creation now redirects to `/setup/model` instead of directly to chat
- [x] **Existing user banner** — displayed in chat if no role has a model assigned — 2026-05-06
- Checks `GET /backend` on load (uses `available_roles` — already does role-resolution)
- Dismissable amber callout strip above chat: "No AI model configured — Set up OpenRouter →"
- Dismissed via `localStorage` key `cx_no_model_banner_dismissed`; auto-removed when a model is added
- [x] **Settings quick-link** — amber card in settings Model Registry section — 2026-05-06
- Checks `GET /backend` on page load; shown if `available_roles` is empty
- Links to `/setup/model`
- [x] Update `cortex/static/HELP.md` — Getting Started section + model registry quick-connect note — 2026-05-06
- [x] Update `CLAUDE.md` — documented `/setup/model` endpoint, setup flow description, docs philosophy — 2026-05-06
### [Local] Local orchestrator — reach full parity with Gemini orchestrator ### [Local] Local orchestrator — reach full parity with Gemini orchestrator
`openai_orchestrator.py` is partially built and wired into `POST /orchestrate`. `openai_orchestrator.py` is partially built and wired into `POST /orchestrate`.
When the `orchestrator` role resolves to a `local_openai` model it routes there When the `orchestrator` role resolves to a `local_openai` model it routes there
automatically. Remaining work is quality/reliability parity, not ground-up design. automatically. Remaining work is quality/reliability parity, not ground-up design.
- [ ] Audit tool schema conversion — Gemini `FunctionDeclaration` → OpenAI `tools` format - [x] Tool schema conversion — Gemini FunctionDeclaration → OpenAI tools format
(minor field rename, already partially done) - [x] Context budget: `_context_budget()` uses `context_k * 1000 * 0.75`, min 16k — 2026-05-06
- [ ] Context budget enforcement per iteration (40-50k for E4B, 35-40k for 26B A4B) - [x] Context compaction: `_compact_messages()` trims old tool results before each round and before the confirmation-gate call — 2026-05-06
- [ ] Context compaction — trim stale tool results mid-run when approaching limit - [x] Error handling: malformed tool args caught + logged; tool execution errors returned as strings
- [ ] Error handling parity with Gemini orchestrator (retry logic, malformed tool calls) - [ ] Retry logic on transient API errors (connection timeout, 429, 503)
- [ ] Test end-to-end with Gemma 4 E4B and 26B A4B on scott_gaming - [ ] Test end-to-end with Gemma 4 E4B and 26B A4B on scott_gaming
- [ ] Review `ARCH__FUTURE.md` agent architecture ideas before finalising design - [ ] Review `ARCH__FUTURE.md` agent architecture ideas before finalising design
- Reference: `docs/OPEN_WEBUI_API.md`, `documentation/ARCH__FUTURE.md` §1 - Reference: `docs/OPEN_WEBUI_API.md`, `documentation/ARCH__FUTURE.md` §1
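The budget and compaction items in the checklist above can be sketched as follows. The formula (`context_k * 1000 * 0.75`, minimum 16k) is from the source. The token estimate of four characters per token, the `keep_recent` parameter, and the placeholder text are assumptions, and the real `_compact_messages()` may use a different strategy.

```python
def context_budget(context_k: int) -> int:
    """75% of the model's context window in tokens, never below 16k."""
    return max(int(context_k * 1000 * 0.75), 16_000)


def estimate_tokens(messages: list) -> int:
    """Rough size estimate: about four characters per token (assumption)."""
    return sum(len(m.get("content") or "") for m in messages) // 4


def compact_messages(messages: list, budget: int, keep_recent: int = 2,
                     stub: str = "[tool result truncated]") -> list:
    """Replace the oldest tool results with a stub until the estimate fits the budget.

    The most recent `keep_recent` tool results are never touched, so the model
    still sees the output it is currently reasoning about.
    """
    out = [dict(m) for m in messages]  # shallow copies; the caller's list is untouched
    tool_idx = [i for i, m in enumerate(out) if m.get("role") == "tool"]
    candidates = tool_idx[:-keep_recent] if keep_recent else tool_idx
    for i in candidates:
        if estimate_tokens(out) <= budget:
            break
        out[i]["content"] = stub
    return out
```

Calling `compact_messages(history, context_budget(model_context_k))` before each round mirrors the "before each round" wording in the checklist. If the protected recent results alone exceed the budget, this sketch returns an over-budget list rather than dropping them.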
@@ -117,7 +142,7 @@ Multi-user setup with real Gemini/Claude API costs. Track per-user token consump
so Scott can see who's spending what. so Scott can see who's spending what.
- [x] Count input + output tokens — local backend (OpenAI `usage` field) + Gemini API (`usage_metadata`) — 2026-05-05 - [x] Count input + output tokens — local backend (OpenAI `usage` field) + Gemini API (`usage_metadata`) — 2026-05-05
- [x] Append to `home/{user}/usage.json` — daily buckets, per-model breakdown — 2026-05-05 - [x] Append to `home/{user}/usage.json` — daily buckets, per-model breakdown — 2026-05-05
- [ ] Expose via `/api/usage` endpoint; add a summary row to the Settings page - [x] Expose via `/api/usage` + `/api/usage/summary` + `/api/usage/all` (admin); usage table in Settings — 2026-05-06
- [ ] Optional: soft spending limit with a warning toast when exceeded - [ ] Optional: soft spending limit with a warning toast when exceeded
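A minimal sketch of the daily-bucket accounting described in the checklist above, assuming a `{day: {model: {input, output}}}` layout for `usage.json`. The source only says "daily buckets, per-model breakdown", so the exact JSON shape and the function name are hypothetical.

```python
import json
from pathlib import Path


def record_usage(path: Path, day: str, model: str,
                 input_tokens: int, output_tokens: int) -> dict:
    """Add token counts to the (day, model) bucket and persist the file."""
    data = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    bucket = data.setdefault(day, {}).setdefault(model, {"input": 0, "output": 0})
    bucket["input"] += input_tokens
    bucket["output"] += output_tokens
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return data
```

A summary endpoint such as `/api/usage/summary` could then total the buckets per day or per model without any extra bookkeeping. This sketch does no file locking, so concurrent writers would need additional care.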
### [Security] Tool call audit log — 2026-05-05 ### [Security] Tool call audit log — 2026-05-05
@@ -166,15 +191,6 @@ the foundation. What remains is removing the need to toggle manually.
- Fast/cheap queries → local E4B (25 t/s, no API cost) - Fast/cheap queries → local E4B (25 t/s, no API cost)
- [ ] Routing logic in `llm_client.py` or new `router.py`; expose override in UI - [ ] Routing logic in `llm_client.py` or new `router.py`; expose override in UI
### [Ops] Permanent fleet hosting — home server deployment
Currently running on `scott-lt-i7-rtx` (gaming laptop). Long-term target is the
home server for always-on reliability. `docker-compose.yml` already exists.
- [ ] Copy project to home server
- [ ] Configure Nginx reverse proxy (already Docker-hosted on that machine)
- [ ] Point `cortex.dgrzone.com` → home server internal IP (pfSense alias update)
- [ ] WireGuard required for all access — not internet-exposed
- [ ] Update `FLEET_MANIFEST.md` to reflect new hosting location
### [Future] Cortex Mesh — multi-instance fleet coordination ### [Future] Cortex Mesh — multi-instance fleet coordination
Each fleet device runs its own Cortex instance. Instances delegate tasks to each Each fleet device runs its own Cortex instance. Instances delegate tasks to each
other based on resources and specialisation. No central coordinator required. other based on resources and specialisation. No central coordinator required.