Compare commits

2 Commits

Author SHA1 Message Date
Scott Idem
f8f7cd75da feat: audit log, usage tracking UI, OpenAI orchestrator compaction, onboarding + docs
Tool audit log:
- Every orchestrator tool call logged to home/{user}/tool_audit/YYYY-MM-DD.jsonl
- Files panel sidebar: audit log group (collapsed), date-linked read-only table
- Admin endpoints: /api/audit/files, /api/audit/day, /api/audit/recent, /api/audit/stats
- Engine and model name recorded per entry

OpenAI orchestrator improvements:
- Context budget enforcement: 75% of model context_k (min 16k)
- Message compaction: truncates old tool results when approaching budget
- max_rounds respected per model config (intersected with server cap)

OpenRouter onboarding (setup.html, onboarding.py, app.js, settings.html):
- Step 3 of 3: /setup/model with curated model picker
- Chat banner for users on server-default model (informational, not alarmist)
- Settings quick-link card; /setup/model works standalone for existing users

Model registry + session store:
- set_role_config / get_role_config for per-role tool lists and system_append
- session_store: session rename, session name backfill endpoint

UI updates (app.js, index.html, style.css, local_llm.html):
- Role toggle in context panel
- Off-the-record mode
- Agent notes read-only viewer
- OPERATIONS.md loaded at T2+ in context

Documentation:
- HELP.md: full tool table, per-role tool sets, Agent Notes, usage tracking
- TOOLS.md: Agent Notes section, count corrected to 44
- ARCH__SYSTEM.md, ARCH__BACKENDS.md, MASTER.md updated to match reality
- CLAUDE.md: onboarding flow, documentation philosophy sections
- README.md: stack in practice, DeepSeek TUI mention, architecture diagram updated
- TODO__Agents.md: onboarding task completed with deviation notes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 21:26:43 -04:00
Scott Idem
c02d2462b0 feat: agent notes, OpenRouter onboarding, usage tracking, per-role tools docs
Agent notes tool (cortex/tools/agent_notes.py):
- Private durable notepad for the orchestrator — not user-visible
- agent_notes_read/write/append/clear with 3 rolling backups
- Per-persona isolation via ContextVars; no TOOL_ROLES gating needed
- PROTOCOLS.md updated to make this a core proactive tool

OpenRouter guided onboarding:
- Setup Step 3 (/setup/model) — OpenRouter quick-connect with curated model list
- Amber banner in chat for users on server-default model
- Settings quick-link card (/settings/models OpenRouter section)
- POST /setup/model/skip for users who want to bypass Step 3
- Holly pre-configured: DeepSeek V4 Flash (OpenRouter) → Gemma Medium (local) → claude_cli

Usage tracking:
- cortex/routers/usage.py — GET /api/usage, /api/usage/summary, /api/usage/all (admin)

Documentation:
- HELP.md: Tools section rewritten — full tool table by category, per-role tool sets explained
- TOOLS.md: Agent Notes section added; count corrected to 44
- ARCH__SYSTEM.md, ARCH__BACKENDS.md, MASTER.md, CLAUDE.md, README.md updated
- TODO__Agents.md: onboarding task checked off with deviation notes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 21:25:31 -04:00
27 changed files with 1347 additions and 151 deletions

@@ -146,8 +146,8 @@ http://localhost:8000/docs
- Tools are registered in `cortex/tools/__init__.py` as both Gemini FunctionDeclarations and Python callables
### Context / Memory
- `context_loader.py` assembles Inara's system prompt from `inara/` files based on tier (1–3)
- Tier 1 = minimal (identity only); Tier 2 = standard (+ memory + user profile); Tier 3 = full
- `context_loader.py` assembles Inara's system prompt from `inara/` files based on tier (1–4)
- Tier 1 = minimal (identity only); Tier 2 = standard (+ memory + user profile); Tier 3 = + last 2 sessions; Tier 4 = + last 7 sessions
- Memory files are written by the distiller or manually — do not delete them
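
A minimal sketch of how the tier levels in the bullets above might map to prompt components. The function name `components_for_tier` and the dictionary keys are hypothetical; only the tier meanings come from the bullets. It is also an assumption that Tier 4 widens the session window from 2 to 7 rather than adding 7 more.

```python
def components_for_tier(tier: int) -> dict:
    """Return which prompt components a tier would include (illustrative only)."""
    if tier not in (1, 2, 3, 4):
        raise ValueError(f"tier must be 1-4, got {tier}")
    return {
        "identity": True,                              # every tier
        "memory": tier >= 2,                           # Tier 2 and up
        "user_profile": tier >= 2,                     # Tier 2 and up
        "recent_sessions": {3: 2, 4: 7}.get(tier, 0),  # assumed session window
    }
```

For example, `components_for_tier(1)` enables only the identity component, while `components_for_tier(4)` asks for the last 7 sessions.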
### Security / Safety
@@ -160,6 +160,31 @@ http://localhost:8000/docs
- Passwords are bcrypt-hashed and stored in `home/{username}/auth.json` — never in `.env` or the DB
- Invite tokens are one-time-use, 72-hour expiry, stored in `home/{username}/invite.json`
### Onboarding Flow
New users follow a three-step setup before reaching the chat:
1. `GET /setup/{token}` → password form → `POST /setup/{token}` sets password + session cookie
2. `GET /setup/persona` → persona creation form → `POST /setup/persona` bootstraps persona directory
3. `GET /setup/model` → OpenRouter quick-connect → `POST /setup/model` saves host + model + role assignment
Step 3 is optional (skip link goes straight to `/{user}/{persona}`). `/setup/model` also works
standalone (accessible from Settings) for existing users who haven't configured a model.
All in `cortex/routers/onboarding.py`. Model writes use `model_registry.py`: `save_host()`,
`save_model()`, `set_role(username, "chat", "primary", model_id)`.
### Documentation Philosophy
Cortex is a no-black-box system. Docs must match reality — at all times.
- **Docs first:** When planning significant changes, update `TODO__Agents.md` and the relevant
`ARCH__*.md` to describe the intended design *before* implementing. This creates a spec to
implement against.
- **Verify after:** Once implementation is complete, re-read the pre-written docs and confirm
they match what was actually built. Update anything that drifted.
- **HELP.md is a user contract:** It describes what users can do. Never let it describe
features that don't exist or omit features that do.
- **CLAUDE.md + ARCH__*.md are the developer contract:** Update them as the architecture evolves.
- **Stale docs are bugs.** If you notice drift, fix it before moving on.
---
## Adding a New Tool
@@ -212,19 +237,23 @@ clearly asked for a directory to be unblocked.
---
## Current State (2026-04-28)
## Current State (2026-05-06)
Cortex is running and stable. All channels are live:
| Channel | Status | Notes |
|---|---|---|
| Web UI | ✅ Live | `https://cortex.dgrzone.com` |
| Web UI | ✅ Live | `https://cortex.dgrzone.com` — PWA-installable |
| Nextcloud Talk | ✅ Live | HMAC-signed webhook, async reply |
| Google Chat | ✅ Live | Workspace Add-on, `hostAppDataAction` response format |
| Local backend | ✅ Live | Open WebUI/Ollama, per-user multi-model config |
| Orchestrator | ✅ Live | Gemini API tool loop → Claude response; ⚡ toggle in UI |
| Local backend | ✅ Live | Open WebUI/Ollama on scott_gaming, per-user multi-model config |
| Gemini orchestrator | ✅ Live | Gemini API tool loop → Claude response; ⚡ toggle in UI |
| Local orchestrator | ✅ Live | OpenAI-compatible ReAct loop; fires when orchestrator role → local model |
| Tool audit log | ✅ Live | Every tool call logged to `home/{user}/tool_audit/YYYY-MM-DD.jsonl` |
| Token usage tracking | ✅ Live | Per-user `home/{user}/usage.json`; summary in Settings |
| Web push | ✅ Live | VAPID push notifications; `web_push` tool; subscribe via ☰ menu |
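
The per-day JSONL layout of the tool audit log could be sketched roughly as follows. The helper name `append_audit_entry` and the entry field names (`ts`, `tool`, `args`, `engine`, `model`) are assumptions for illustration; the commit only confirms the file path pattern and that engine and model name are recorded per entry.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def append_audit_entry(home: Path, user: str, tool: str, args: dict,
                       engine: str, model: str,
                       now: Optional[datetime] = None) -> Path:
    """Append one tool call to home/{user}/tool_audit/YYYY-MM-DD.jsonl."""
    now = now or datetime.now(timezone.utc)
    day_file = home / user / "tool_audit" / f"{now:%Y-%m-%d}.jsonl"
    day_file.parent.mkdir(parents=True, exist_ok=True)
    # Field names are hypothetical; one JSON object per line.
    entry = {"ts": now.isoformat(), "tool": tool, "args": args,
             "engine": engine, "model": model}
    with day_file.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return day_file
```

Append-only JSONL keeps each day's file readable with a plain line-by-line parse, which suits a date-linked read-only viewer.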
Active users: scott (inara, developer), holly (tina), brian (wintermute)
Active users: scott (inara), holly (tina), brian (wintermute)
**40 orchestrator tools:** web_search, http_fetch,
file_read/list/write, shell_exec, claude_allow_dir,

@@ -10,6 +10,43 @@ Cortex is a self-hosted multi-agent AI platform. It supports multiple users, eac
---
## Where Cortex Fits
AI tools aren't one-size-fits-all. Cortex exists in a specific niche — it's not trying to be everything.
**Cortex is a self-hosted persona platform.** It gives you a persistent AI companion with its own
identity, memory, and voice — reachable through your chat apps, not just a browser tab. It remembers
who you are across days and weeks. It can proactively message you on a schedule. It runs on your
own hardware, behind your own auth.
### What Cortex is good at
- **Being a consistent AI presence** — same persona, same memory, day after day
- **Multi-channel access** — web, Nextcloud Talk, Google Chat, all routed to the same brain
- **Proactive work** — scheduled messages, reminders, cron jobs that reach out to you
- **Multi-user households** — each person gets their own persona (Scott → Inara, Holly → Tina)
- **Private, offline-capable** — local models via Ollama when you don't want anything leaving the LAN
### What Cortex is not
- **Not a coding assistant.** Cortex lives in chat apps, not in your terminal or IDE.
Use Claude Code, DeepSeek TUI, Gemini CLI, or Copilot for code-level work — they specialize in reading and
editing project files. Cortex can't open a codebase.
- **Not a generic LLM chat UI.** Open WebUI and LibreChat are excellent model-switching frontends.
Cortex isn't a frontend — it's a platform with its own identity system, orchestrator, and memory
pipeline. Two different jobs.
- **Not a SaaS product.** Nobody else hosts your Cortex instance. Nobody else sees your conversations.
The trade-off is you manage the service yourself — `systemctl --user restart cortex`.
- **Not an agent framework.** LangChain, CrewAI, and similar are libraries for building AI pipelines.
Cortex is a running service with concrete personas, not an abstraction layer to build on top of.
### The stack in practice
- Use **Cortex** to talk to Inara — daily assistant, memory keeper, scheduled check-ins
- Use **Claude Code / DeepSeek TUI** to work *on* Cortex — code edits, architecture, debugging
- Use **Open WebUI** when you want to test a new model or run a quick prompt without persona context
Same AI, different interfaces for different jobs.
---
## Quick Orientation
| Directory | What it is |

@@ -9,7 +9,7 @@ logging.basicConfig(level=logging.INFO, format="%(levelname)s:%(name)s: %(messag
from config import settings
from auth_middleware import SessionAuthMiddleware
from routers import chat, google_chat, nextcloud_talk, files, distill, auth, orchestrator
from routers import ui, onboarding, settings, help, auth_google, local_llm, push, audit
from routers import ui, onboarding, settings, help, auth_google, local_llm, push, audit, usage
@asynccontextmanager
@@ -36,6 +36,7 @@ app.include_router(auth.router)
app.include_router(orchestrator.router)
app.include_router(push.router)
app.include_router(audit.router)
app.include_router(usage.router)
# Static files — must be mounted BEFORE ui.router so /static/* is matched first.
# ui.router has a wildcard /{username}/{persona} that would otherwise catch /static/style.css etc.

@@ -36,6 +36,7 @@ V2 Schema:
"credential_id":str | null, # claude_cli only — references providers.anthropic.credentials
"account_id": str | null, # gemini_api only — references providers.google.accounts
"context_k": int, # context window in k tokens (informational)
"max_rounds": int | null, # per-model tool-loop cap; null = use orchestrator_max_rounds global
"tags": [str], # user-defined capability tags
},
],
@@ -642,7 +643,9 @@ def remove_host(username: str, host_id: str) -> bool:
def save_model(username: str, model_id: str | None, host_id: str,
label: str, model_name: str, context_k: int = 0,
tags: list[str] | None = None) -> str:
tags: list[str] | None = None,
max_rounds: int | None = None,
tools: bool = True) -> str:
"""Create or update a local_openai model entry. Returns the model ID."""
data = _load(username)
tags = tags or []
@@ -654,6 +657,8 @@ def save_model(username: str, model_id: str | None, host_id: str,
m["label"] = label.strip() or model_name.strip()
m["model_name"] = model_name.strip()
m["context_k"] = context_k
m["max_rounds"] = max_rounds
m["tools"] = tools
m["tags"] = tags
_save(username, data)
return model_id
@@ -668,6 +673,8 @@ def save_model(username: str, model_id: str | None, host_id: str,
"provider": "local",
"host_id": host_id,
"context_k": context_k,
"max_rounds": max_rounds,
"tools": tools,
"tags": tags,
})
_save(username, data)
@@ -679,7 +686,9 @@ def save_cloud_model(username: str, model_id: str | None,
account_id: str | None = None,
credential_id: str | None = None,
context_k: int = 0,
tags: list[str] | None = None) -> str:
tags: list[str] | None = None,
max_rounds: int | None = None,
tools: bool = True) -> str:
"""
Create or update an Anthropic or Google model entry. Returns the model ID.
@@ -698,6 +707,8 @@ def save_cloud_model(username: str, model_id: str | None,
"model_name": model_name.strip(),
"provider": provider,
"context_k": context_k,
"max_rounds": max_rounds,
"tools": tools,
"tags": tags,
}
if account_id:

@@ -273,18 +273,20 @@ async def _run_from_messages(
final_response = ""
budget = _context_budget(model_cfg)
for round_num in range(starting_round, settings.orchestrator_max_rounds):
per_model_limit = (model_cfg or {}).get("max_rounds") or settings.orchestrator_max_rounds
effective_limit = min(per_model_limit, settings.orchestrator_max_rounds)
for round_num in range(starting_round, effective_limit):
messages = _compact_messages(messages, budget)
est = _estimate_tokens(messages)
logger.info("OpenAI orchestrator round %d / %d model=%s ~%d tokens",
round_num + 1, settings.orchestrator_max_rounds, model_name, est)
round_num + 1, effective_limit, model_name, est)
response = await client.chat.completions.create(
model=model_name,
messages=messages,
tools=active_tools,
tool_choice="auto",
)
call_kwargs: dict = {"model": model_name, "messages": messages}
if active_tools:
call_kwargs["tools"] = active_tools
call_kwargs["tool_choice"] = "auto"
response = await client.chat.completions.create(**call_kwargs)
choice = response.choices[0]
msg = choice.message
@@ -339,12 +341,11 @@ async def _run_from_messages(
tool_call_log.append({"tool": pt["name"], "args": pt["args"], "result": "[awaiting confirmation]"})
messages.append({"role": "tool", "tool_call_id": pt["tool_call_id"], "content": placeholder})
conf_resp = await client.chat.completions.create(
model=model_name,
messages=messages,
tools=active_tools,
tool_choice="none",
)
messages = _compact_messages(messages, budget)
conf_call: dict = {"model": model_name, "messages": messages, "tool_choice": "none"}
if active_tools:
conf_call["tools"] = active_tools
conf_resp = await client.chat.completions.create(**conf_call)
final_response = conf_resp.choices[0].message.content or (
"This action requires your explicit confirmation before it can proceed."
)
@@ -375,9 +376,9 @@ async def _run_from_messages(
break
else:
logger.warning("OpenAI orchestrator hit max rounds (%d)", settings.orchestrator_max_rounds)
logger.warning("OpenAI orchestrator hit max rounds (%d)", effective_limit)
final_response = (
f"Reached the tool iteration limit ({settings.orchestrator_max_rounds} rounds). "
f"Reached the tool iteration limit ({effective_limit} rounds). "
"Here is what was gathered:\n\n"
+ "\n\n".join(f"**{t['tool']}**: {t['result'][:500]}" for t in tool_call_log)
)
@@ -405,7 +406,10 @@ def _build_client(
if host_type == "openwebui":
base_url = base_url + "/api"
client = AsyncOpenAI(base_url=base_url, api_key=api_key)
active_tools = get_openai_tools_for_role(user_role, tool_list)
if model_cfg.get("tools") is False:
active_tools = []
else:
active_tools = get_openai_tools_for_role(user_role, tool_list)
return client, model_name, active_tools
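
The two behaviours changed in this file, the per-model round cap and the omission of `tools` when none are active, can be isolated as small pure functions. This is a sketch for illustration: the helper names `effective_round_limit` and `build_call_kwargs` are hypothetical and do not appear in the diff, though the logic mirrors the lines above.

```python
from typing import Any, Optional


def effective_round_limit(model_cfg: Optional[dict], global_cap: int) -> int:
    """Per-model max_rounds, never exceeding the server-wide cap."""
    # None, a missing key, or 0 all fall back to the global cap.
    per_model = (model_cfg or {}).get("max_rounds") or global_cap
    return min(per_model, global_cap)


def build_call_kwargs(model_name: str, messages: list,
                      active_tools: list) -> dict[str, Any]:
    """Only send tools/tool_choice when there is at least one tool."""
    kwargs: dict[str, Any] = {"model": model_name, "messages": messages}
    if active_tools:
        kwargs["tools"] = active_tools
        kwargs["tool_choice"] = "auto"
    return kwargs
```

With a global cap of 10, a model configured for 5 rounds gets 5, while one configured for 50 is clamped to 10.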

@@ -295,6 +295,53 @@ async def rename_session_endpoint(
return {"ok": True, "session_id": session_id, "name": req.name.strip()}
@router.post("/api/sessions/backfill-names")
async def backfill_session_names(
    request: Request,
    user: str = Query(""),
    persona: str = Query(""),
) -> dict:
    """Name every unnamed session using its first user message (truncated to 60 chars).
    Idempotent: only touches sessions that have no name set.
    user/persona default to the JWT session user + last-used persona cookie."""
    # Resolve user from JWT if not provided
    if not user:
        token = request.cookies.get(COOKIE_NAME)
        if not token:
            raise HTTPException(status_code=401, detail="Not authenticated")
        try:
            user = decode_token(token)
        except jwt.InvalidTokenError:
            raise HTTPException(status_code=401, detail="Invalid session")
    # Resolve persona from cookie if not provided
    if not persona:
        from persona import list_user_personas
        persona_cookie = request.cookies.get("cx_last_persona", "")
        available = list_user_personas(user)
        persona = persona_cookie if persona_cookie in available else (available[0] if available else "")
    if not persona:
        raise HTTPException(status_code=400, detail="No persona found for user")
    _set_ctx(user, persona)
    sessions = list_all()
    named = 0
    for s in sessions:
        if s.get("name"):
            continue
        messages = load_session(s["session_id"])
        first_user = next((m for m in messages if m.get("role") == "user"), None)
        if not first_user:
            continue
        text = (first_user.get("content") or "").strip()
        if not text:
            continue
        # Mark truncated names with an ellipsis so they are distinguishable from short ones
        auto_name = text[:60].rstrip() + ("…" if len(text) > 60 else "")
        rename_session(s["session_id"], auto_name)
        named += 1
    return {"ok": True, "named": named, "total": len(sessions)}
@router.delete("/sessions/{session_id}")
async def delete_session_endpoint(
session_id: str,
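
The auto-naming rule used by the backfill endpoint above can be sketched on its own. The helper name `auto_session_name` is hypothetical, and appending an ellipsis to truncated names is an assumption about the intent; the diff only confirms the 60-character cut and the skip on empty text.

```python
def auto_session_name(text: str, limit: int = 60) -> str:
    """Derive a session name from the first user message."""
    text = (text or "").strip()
    if len(text) <= limit:
        return text  # short messages are used verbatim (may be empty)
    # Cut at the limit, drop trailing whitespace, and mark the truncation.
    return text[:limit].rstrip() + "…"
```

An empty result signals that the session should be skipped rather than renamed.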

@@ -1,25 +1,50 @@
"""
Manual memory distillation endpoints.
POST /distill/short — roll session logs → MEMORY_SHORT.md (no LLM)
POST /distill/mid — summarize short → MEMORY_MID.md (LLM)
POST /distill/long — integrate mid → MEMORY_LONG.md (LLM)
POST /distill/all — run all three in sequence
POST /distill/short — roll session logs → MEMORY_SHORT.md (no LLM)
POST /distill/mid — summarize short → MEMORY_MID.md (LLM)
POST /distill/long — integrate mid → MEMORY_LONG.md (LLM)
POST /distill/all — run all three in sequence
POST /distill/rebuild — wipe mid + long, then run all three from scratch
All endpoints require ?user=<username>&persona=<name> query params so distillation
targets the correct persona. Without them, the request is rejected (no silent fallback
to server defaults — that caused wrong-user distillation in a multi-user setup).
All endpoints require ?user=<username>&persona=<name> query params.
Concurrency: one distillation at a time per persona. A second request while one
is running returns 409 immediately — no silent queuing.
"""
import asyncio
from datetime import datetime, timedelta
from fastapi import APIRouter, HTTPException, Query
from memory_distiller import distill_short, distill_mid, distill_long
from persona import validate as validate_persona, set_context
from persona import validate as validate_persona, set_context, persona_path as _persona_path
import scheduler
router = APIRouter(prefix="/distill")
# Per-persona asyncio lock. Key: (user, persona)
_LOCKS: dict[tuple, asyncio.Lock] = {}
_LOCKS_META: dict[tuple, str] = {} # key → which step is currently running
# Minimum time between successive runs of each endpoint, per persona.
# Prevents accidental rapid-fire runs and token waste.
_COOLDOWNS: dict[str, timedelta] = {
    "short": timedelta(minutes=1),
    "mid": timedelta(minutes=30),
    "long": timedelta(hours=6),
    "all": timedelta(hours=1),
    "rebuild": timedelta(hours=6),
}
_LAST_RUN: dict[tuple, datetime] = {} # key: (user, persona, endpoint)
def _get_lock(user: str, persona: str) -> asyncio.Lock:
key = (user, persona)
if key not in _LOCKS:
_LOCKS[key] = asyncio.Lock()
return _LOCKS[key]
def _resolve(user: str, persona: str) -> tuple[str, str]:
"""Validate and set persona context. Raises 404 if the persona doesn't exist."""
try:
u, p = validate_persona(user, persona)
except Exception:
@@ -28,13 +53,51 @@ def _resolve(user: str, persona: str) -> tuple[str, str]:
return u, p
def _check_lock(user: str, persona: str) -> asyncio.Lock:
"""Return the lock if free, raise 409 if already held."""
lock = _get_lock(user, persona)
if lock.locked():
step = _LOCKS_META.get((user, persona), "distillation")
raise HTTPException(
status_code=409,
detail=f"A {step} is already running for {persona} — please wait for it to finish.",
)
return lock
def _check_cooldown(user: str, persona: str, endpoint: str) -> None:
"""Raise 429 if the endpoint was run too recently for this persona."""
cooldown = _COOLDOWNS.get(endpoint)
if not cooldown:
return
key = (user, persona, endpoint)
last = _LAST_RUN.get(key)
if last:
elapsed = datetime.now() - last
if elapsed < cooldown:
remaining = cooldown - elapsed
mins = int(remaining.total_seconds() // 60)
secs = int(remaining.total_seconds() % 60)
wait = f"{mins}m {secs}s" if mins else f"{secs}s"
raise HTTPException(
status_code=429,
detail=f"{endpoint} was just run — please wait {wait} before running again.",
)
def _record_run(user: str, persona: str, endpoint: str) -> None:
_LAST_RUN[(user, persona, endpoint)] = datetime.now()
@router.get("/status")
async def distill_status() -> dict:
"""Show auto-distillation schedule and next run times."""
from config import settings
# Include which personas are currently distilling
active = [f"{u}/{p}" for (u, p), lock in _LOCKS.items() if lock.locked()]
return {
"enabled": settings.auto_distill,
"jobs": scheduler.status(),
"active": active,
"config": {
"short": settings.auto_distill_short,
"mid": settings.auto_distill_mid,
@@ -49,7 +112,16 @@ async def do_distill_short(
persona: str = Query(...),
) -> dict:
u, p = _resolve(user, persona)
return {"ok": True, **distill_short(u, p)}
_check_cooldown(u, p, "short")
lock = _check_lock(u, p)
async with lock:
_LOCKS_META[(u, p)] = "short distill"
try:
result = distill_short(u, p)
_record_run(u, p, "short")
return {"ok": True, **result}
finally:
_LOCKS_META.pop((u, p), None)
@router.post("/mid")
@@ -58,8 +130,17 @@ async def do_distill_mid(
persona: str = Query(...),
) -> dict:
u, p = _resolve(user, persona)
result = await distill_mid(u, p)
return {"ok": "error" not in result, **result}
_check_cooldown(u, p, "mid")
lock = _check_lock(u, p)
async with lock:
_LOCKS_META[(u, p)] = "mid distill"
try:
result = await distill_mid(u, p)
if "error" not in result:
_record_run(u, p, "mid")
return {"ok": "error" not in result, **result}
finally:
_LOCKS_META.pop((u, p), None)
@router.post("/long")
@@ -68,8 +149,17 @@ async def do_distill_long(
persona: str = Query(...),
) -> dict:
u, p = _resolve(user, persona)
result = await distill_long(u, p)
return {"ok": "error" not in result, **result}
_check_cooldown(u, p, "long")
lock = _check_lock(u, p)
async with lock:
_LOCKS_META[(u, p)] = "long distill"
try:
result = await distill_long(u, p)
if "error" not in result:
_record_run(u, p, "long")
return {"ok": "error" not in result, **result}
finally:
_LOCKS_META.pop((u, p), None)
@router.post("/all")
@@ -78,14 +168,71 @@ async def do_distill_all(
persona: str = Query(...),
) -> dict:
u, p = _resolve(user, persona)
short_result = distill_short(u, p)
mid_result = await distill_mid(u, p)
if "error" in mid_result:
return {"ok": False, "short": short_result, "mid": mid_result}
long_result = await distill_long(u, p)
return {
"ok": "error" not in long_result,
"short": short_result,
"mid": mid_result,
"long": long_result,
}
_check_cooldown(u, p, "all")
lock = _check_lock(u, p)
async with lock:
_LOCKS_META[(u, p)] = "full distill"
try:
short_result = distill_short(u, p)
mid_result = await distill_mid(u, p)
if "error" in mid_result:
return {"ok": False, "short": short_result, "mid": mid_result}
long_result = await distill_long(u, p)
ok = "error" not in long_result
if ok:
_record_run(u, p, "all")
return {
"ok": ok,
"short": short_result,
"mid": mid_result,
"long": long_result,
}
finally:
_LOCKS_META.pop((u, p), None)
@router.post("/rebuild")
async def do_distill_rebuild(
user: str = Query(...),
persona: str = Query(...),
) -> dict: # noqa: E501
"""Wipe MEMORY_MID and MEMORY_LONG (with backups), then run short → mid → long.
Use when memories have drifted, been corrupted, or you want a clean slate
rebuilt purely from session logs. Hand-edited content will be replaced.
"""
u, p = _resolve(user, persona)
_check_cooldown(u, p, "rebuild")
lock = _check_lock(u, p)
async with lock:
_LOCKS_META[(u, p)] = "memory rebuild"
try:
from memory_distiller import _rotate_backup, _read
inara_dir = _persona_path(u, p)
# Back up then wipe mid and long before rebuilding
for name in ("MEMORY_MID.md", "MEMORY_LONG.md"):
path = inara_dir / name
if path.exists():
_rotate_backup(path)
path.write_text(
f"# {name}\n\n*Cleared for rebuild — {__import__('datetime').datetime.now().strftime('%Y-%m-%d %H:%M')}.*\n"
)
short_result = distill_short(u, p)
mid_result = await distill_mid(u, p)
if "error" in mid_result:
return {"ok": False, "short": short_result, "mid": mid_result, "rebuilt": True}
long_result = await distill_long(u, p)
ok = "error" not in long_result
if ok:
_record_run(u, p, "rebuild")
return {
"ok": ok,
"short": short_result,
"mid": mid_result,
"long": long_result,
"rebuilt": True,
}
finally:
_LOCKS_META.pop((u, p), None)
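
The guard pattern used by every endpoint in this file (cooldown check first, then a fail-fast lock check, then the job) can be sketched without FastAPI. This is an illustration under assumptions: `Busy` and `TooSoon` stand in for the HTTP 409 and 429 responses, and `run_exclusive` is a hypothetical helper, not a function from the diff.

```python
import asyncio
from datetime import datetime, timedelta


class Busy(Exception):
    """Stand-in for HTTP 409: a run is already in progress."""


class TooSoon(Exception):
    """Stand-in for HTTP 429: the cooldown has not elapsed."""


COOLDOWNS = {"short": timedelta(minutes=1), "mid": timedelta(minutes=30)}
_locks: dict = {}     # (user, persona) -> asyncio.Lock
_last_run: dict = {}  # (user, persona, endpoint) -> datetime


async def run_exclusive(user, persona, endpoint, job, now=None):
    """Run job() unless this persona is busy or the endpoint is cooling down."""
    now = now or datetime.now()
    cooldown = COOLDOWNS.get(endpoint)
    last = _last_run.get((user, persona, endpoint))
    if cooldown and last and now - last < cooldown:
        raise TooSoon(endpoint)
    lock = _locks.setdefault((user, persona), asyncio.Lock())
    if lock.locked():
        raise Busy(f"{user}/{persona}")  # fail fast instead of queuing
    async with lock:
        result = await job()
        _last_run[(user, persona, endpoint)] = now  # record only on completion
        return result
```

Checking `lock.locked()` before `async with` is what turns the lock into a rejection rather than a queue: the check and the acquire happen without an intervening `await`, so no other task can slip in between them on a single event loop.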

@@ -27,10 +27,21 @@ ALLOWED = {
"MEMORY_SHORT.bak1.md",
"MEMORY_SHORT.bak2.md",
"HELP.md",
# Agent private notes — backups only; AGENT_NOTES.md itself is agent-only
"AGENT_NOTES.bak1.md",
"AGENT_NOTES.bak2.md",
"AGENT_NOTES.bak3.md",
}
# Files that can be read via the panel but not written by users
READ_ONLY = {
"AGENT_NOTES.bak1.md",
"AGENT_NOTES.bak2.md",
"AGENT_NOTES.bak3.md",
}
# Files served from home/{user}/ instead of persona path
USER_FILES = {"email_allowlist.json"}
USER_FILES = {"email_allowlist.json", "usage.json"}
def _resolve(user: str, persona: str) -> None:
@@ -92,7 +103,11 @@ async def get_file(
p = _path(filename, user=user)
if not p.exists():
raise HTTPException(status_code=404, detail=f"{filename} does not exist")
return {"name": filename, "content": p.read_text()}
return {
"name": filename,
"content": p.read_text(),
"readonly": filename in READ_ONLY,
}
class FileWrite(BaseModel):
@@ -106,6 +121,8 @@ async def save_file(
user: str = Query("scott"),
persona: str = Query("inara"),
) -> dict:
if filename in READ_ONLY:
raise HTTPException(status_code=403, detail=f"{filename} is read-only.")
_resolve(user, persona)
p = _path(filename, user=user)
p.write_text(req.content)
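
The read-only handling added here has two halves: reads report a `readonly` flag, and writes are refused before anything touches disk. A minimal in-memory sketch follows; the `store` dictionary and `ReadOnlyFileError` are stand-ins assumed for illustration (the real endpoints use the filesystem and an HTTP 403).

```python
READ_ONLY = {
    "AGENT_NOTES.bak1.md",
    "AGENT_NOTES.bak2.md",
    "AGENT_NOTES.bak3.md",
}


class ReadOnlyFileError(PermissionError):
    """Stand-in for the HTTP 403 returned by the real endpoint."""


def get_file(store: dict, filename: str) -> dict:
    """Return file content plus a flag the UI can use to disable editing."""
    return {
        "name": filename,
        "content": store[filename],
        "readonly": filename in READ_ONLY,
    }


def save_file(store: dict, filename: str, content: str) -> None:
    """Reject writes to read-only files before modifying anything."""
    if filename in READ_ONLY:
        raise ReadOnlyFileError(f"{filename} is read-only.")
    store[filename] = content
```

Enforcing the rule on the write path matters even though the UI hides the editor, because a client can call the save endpoint directly.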

@@ -159,7 +159,8 @@ def _render(username: str, success: str = "", error: str = "") -> str:
else:
secondary = default_secondary
ctx = f'<span class="ctx-badge">{m.get("context_k",0)}k</span>' if m.get("context_k") else ""
ctx = f'<span class="ctx-badge">{m.get("context_k",0)}k</span>' if m.get("context_k") else ""
no_tools = '' if m.get("tools", True) else '<span class="pbadge pb-notools">no tools</span>'
tags_html = " ".join(f'<span class="tag">{t}</span>' for t in (m.get("tags") or []))
sec = f'<span class="model-host">{secondary}</span>' if secondary else ""
@@ -201,13 +202,15 @@ def _render(username: str, success: str = "", error: str = "") -> str:
cur_label = m.get("label", "")
cur_model_name = m.get("model_name", "")
cur_ctx = m.get("context_k", 0) or 0
cur_max_rounds = m.get("max_rounds") or 0
cur_tools = m.get("tools", True)
cur_tags = ", ".join(m.get("tags") or [])
model_rows += f'''
<div class="model-row" id="model-{m["id"]}">
<div class="model-row-header">
<div class="model-info">
<div>{badge}<span class="model-label">{m.get("label") or m.get("model_name","")}</span>{ctx}</div>
<div>{badge}<span class="model-label">{m.get("label") or m.get("model_name","")}</span>{ctx}{no_tools}</div>
<span class="model-name">{m.get("model_name","")}</span>
{sec}
<div class="tag-row">{tags_html}</div>
@@ -239,8 +242,22 @@ def _render(username: str, success: str = "", error: str = "") -> str:
{extra_fields}
<div class="field-row">
<div class="field" style="flex:0 0 auto">
<label>Context (k)</label>
<input type="number" name="context_k" value="{cur_ctx}" min="0">
<label title="Context window size in thousands of tokens. 0 = assume 32k.">Context (k)</label>
<input type="number" name="context_k" value="{cur_ctx}" min="0"
title="Context window size in thousands of tokens. 0 = assume 32k (compaction budget ~24k tokens).">
</div>
<div class="field" style="flex:0 0 auto">
<label title="Per-model tool loop cap. 0 = use the global default (orchestrator_max_rounds).">Max rounds</label>
<input type="number" name="max_rounds" value="{cur_max_rounds}" min="0"
title="Per-model tool loop cap. 0 = use the global default (orchestrator_max_rounds).">
</div>
<div class="field" style="flex:0 0 auto">
<label title="Whether this model supports tool calling. If not supported, requests skip the tool loop entirely.">Tool calling</label>
<select name="tools"
title="Whether this model supports tool calling. If not supported, requests skip the tool loop entirely.">
<option value="1" {'selected' if cur_tools else ''}>Supported</option>
<option value="0" {'' if cur_tools else 'selected'}>Not supported</option>
</select>
</div>
<div class="field">
<label>Tags</label>
@@ -426,6 +443,8 @@ async def add_model(
provider: str = Form("local"),
label: str = Form(""),
context_k: int = Form(0),
max_rounds: int = Form(0),
tools: int = Form(1),
tags: str = Form(""),
# local-only fields
host_id: str = Form(""),
@@ -439,14 +458,17 @@ async def add_model(
if not username:
return RedirectResponse("/login", status_code=302)
tag_list = [t.strip() for t in tags.split(",") if t.strip()]
tag_list = [t.strip() for t in tags.split(",") if t.strip()]
max_rounds_ = max_rounds or None
tools_bool = tools != 0
if provider == "local":
if not model_name.strip():
return HTMLResponse(_render(username, error="Model name is required."))
if not host_id.strip():
return HTMLResponse(_render(username, error="Select a host."))
reg.save_model(username, None, host_id, label, model_name, context_k, tag_list)
reg.save_model(username, None, host_id, label, model_name, context_k, tag_list,
max_rounds=max_rounds_, tools=tools_bool)
display = label or model_name
elif provider in ("google", "anthropic"):
@@ -459,6 +481,7 @@ async def add_model(
account_id=account_id or None,
credential_id=credential_id or None,
context_k=context_k, tags=tag_list,
max_rounds=max_rounds_, tools=tools_bool,
)
display = label or cloud_model_name
else:
@@ -476,6 +499,8 @@ async def edit_model(
label: str = Form(""),
model_name: str = Form(""),
context_k: int = Form(0),
max_rounds: int = Form(0),
tools: int = Form(1),
tags: str = Form(""),
host_id: str = Form(""),
account_id: str = Form(""),
@@ -486,17 +511,22 @@ async def edit_model(
return RedirectResponse("/login", status_code=302)
if not model_name.strip():
return HTMLResponse(_render(username, error="Model name is required."))
tag_list = [t.strip() for t in tags.split(",") if t.strip()]
tag_list = [t.strip() for t in tags.split(",") if t.strip()]
max_rounds_ = max_rounds or None
tools_bool = tools != 0
if mtype == "local_openai":
if not host_id.strip():
return HTMLResponse(_render(username, error="Select a host for this model."))
reg.save_model(username, model_id, host_id, label, model_name, context_k, tag_list)
reg.save_model(username, model_id, host_id, label, model_name, context_k, tag_list,
max_rounds=max_rounds_, tools=tools_bool)
elif mtype == "gemini_api":
reg.save_cloud_model(username, model_id, "google", model_name, label,
account_id=account_id or None, context_k=context_k, tags=tag_list)
account_id=account_id or None, context_k=context_k, tags=tag_list,
max_rounds=max_rounds_, tools=tools_bool)
elif mtype == "claude_cli":
reg.save_cloud_model(username, model_id, "anthropic", model_name, label,
credential_id=credential_id or "cli", context_k=context_k, tags=tag_list)
credential_id=credential_id or "cli", context_k=context_k, tags=tag_list,
max_rounds=max_rounds_, tools=tools_bool)
else:
return HTMLResponse(_render(username, error=f"Unknown model type: {mtype}"))
display = label.strip() or model_name.strip()
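
The form handling above treats `0` as "unset" for both numeric fields. A rough sketch of that normalization, together with the compaction budget the tooltips describe, is given below. The helper names are hypothetical, and treating 1k as 1000 tokens is an assumption; the 75% ratio, the 16k floor, and the 32k default come from the commit message and the tooltip text.

```python
from typing import Optional


def normalize_model_form(max_rounds: int, tools: int) -> tuple[Optional[int], bool]:
    """Map raw form integers to the stored values."""
    # 0 rounds means "use the global default", stored as None.
    return (max_rounds or None, tools != 0)


def context_budget_tokens(context_k: int) -> int:
    """Approximate compaction budget: 75% of the context window, at least 16k."""
    effective_k = context_k or 32  # 0 = assume a 32k window
    return max(16_000, int(effective_k * 1000 * 0.75))
```

Under these assumptions, a model saved with `context_k=0` gets a budget of about 24k tokens, matching the tooltip, while an 8k model is raised to the 16k floor.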

@@ -1,11 +1,13 @@
"""
Onboarding router — invite-based setup + persona creation.
Onboarding router — invite-based setup + persona creation + model connect.
Routes:
GET /setup/{token} → show password setup form (step 1)
POST /setup/{token} → set password, redirect to persona step
GET /setup/persona → show persona creation form (step 2, requires auth)
POST /setup/persona → create persona, redirect to /{user}/{persona}
POST /setup/persona → create persona, redirect to /setup/model
GET /setup/model → OpenRouter quick-connect (step 3, also standalone)
POST /setup/model → save host + model + assign to chat role, redirect to chat
"""
import logging
@@ -21,6 +23,7 @@ from auth_utils import (
)
from persona_template import create_persona
from persona import list_user_personas, validate as validate_persona
import model_registry
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/setup")
@@ -114,7 +117,11 @@ async def persona_submit(
description=description.strip(),
)
logger.info("persona created: %s/%s", username, persona_name)
return RedirectResponse(f"/{username}/{persona_name}", status_code=302)
# Step 3: guided model setup before entering the chat
resp = RedirectResponse("/setup/model", status_code=302)
# Remember which persona to land on after model setup
resp.set_cookie("cx_setup_persona", f"{username}/{persona_name}", max_age=3600, httponly=True, samesite="lax")
return resp
# ---------------------------------------------------------------------------
@@ -178,3 +185,126 @@ async def setup_submit(
return resp
return HTMLResponse(_setup_page("Unknown step."), status_code=400)
# ---------------------------------------------------------------------------
# Step 3 — model connect (OpenRouter quick-connect, also standalone)
# ---------------------------------------------------------------------------
# Curated model list shown in the Step 3 dropdown.
_OPENROUTER_MODELS = [
("anthropic/claude-3-5-haiku-20241022", "Claude 3.5 Haiku — Fast & affordable"),
("anthropic/claude-3-7-sonnet-20250219", "Claude 3.7 Sonnet — Smarter Claude"),
("google/gemini-2.0-flash-001", "Gemini 2.0 Flash — Fast Google model"),
("meta-llama/llama-3.3-70b-instruct", "Llama 3.3 70B — Open source"),
]
def _model_page(error: str = "", from_setup: bool = False) -> str:
html = (_STATIC / "setup.html").read_text()
# Hide steps 1 and 2 inline; show step 3
html = html.replace('<div id="step-password">', '<div id="step-password" style="display:none">')
html = html.replace('<div id="step-persona" style="display:none">', '<div id="step-persona" style="display:none">')
html = html.replace('<div id="step-model" style="display:none">', '<div id="step-model">')
if from_setup:
html = html.replace("<!-- SETUP_STEP3_LABEL -->", "Step 3 of 3")
if error:
html = html.replace("<!-- ERROR_MODEL -->", f'<p class="error">{error}</p>')
return html
@router.post("/model/skip", include_in_schema=False)
async def model_skip(request: Request):
"""Skip model setup — redirect to the remembered persona or user root."""
from auth_utils import decode_token
import jwt
token = request.cookies.get(COOKIE_NAME)
username = None
if token:
try:
username = decode_token(token)
except jwt.InvalidTokenError:
pass
dest_cookie = request.cookies.get("cx_setup_persona", "")
dest = f"/{dest_cookie}" if dest_cookie else (f"/{username}" if username else "/")
resp = RedirectResponse(dest, status_code=302)
resp.delete_cookie("cx_setup_persona")
return resp
@router.get("/model", include_in_schema=False)
async def model_page(request: Request):
from auth_utils import decode_token
import jwt
token = request.cookies.get(COOKIE_NAME)
if not token:
return RedirectResponse("/login", status_code=302)
try:
decode_token(token)
except jwt.InvalidTokenError:
return RedirectResponse("/login", status_code=302)
from_setup = bool(request.cookies.get("cx_setup_persona"))
return HTMLResponse(_model_page(from_setup=from_setup))
@router.post("/model", include_in_schema=False)
async def model_submit(
request: Request,
api_key: str = Form(...),
model_name: str = Form(...),
):
from auth_utils import decode_token
import jwt
token = request.cookies.get(COOKIE_NAME)
if not token:
return RedirectResponse("/login", status_code=302)
try:
username = decode_token(token)
except jwt.InvalidTokenError:
return RedirectResponse("/login", status_code=302)
api_key = api_key.strip()
model_name = model_name.strip()
if not api_key:
from_setup = bool(request.cookies.get("cx_setup_persona"))
return HTMLResponse(_model_page("API key is required.", from_setup=from_setup), status_code=422)
# Save OpenRouter as a host
host_id = model_registry.save_host(
username=username,
host_id=None,
label="OpenRouter",
api_url="https://openrouter.ai/api/v1",
api_key=api_key,
host_type="openai",
)
# Find label for selected model
label = next((lbl for mn, lbl in _OPENROUTER_MODELS if mn == model_name), model_name)
label = label.split(" — ")[0] # keep just the model name part (text before the " — " in the curated label)
# Save model entry
mid = model_registry.save_model(
username=username,
model_id=None,
host_id=host_id,
label=label,
model_name=model_name,
context_k=128,
tools=True,
)
# Assign as chat role primary
model_registry.set_role(username, "chat", "primary", mid)
logger.info("openrouter setup complete: %s → %s", username, model_name)
# Redirect to chat (use remembered persona, or user root)
dest_cookie = request.cookies.get("cx_setup_persona", "")
dest = f"/{dest_cookie}" if dest_cookie else f"/{username}"
resp = RedirectResponse(dest, status_code=302)
resp.delete_cookie("cx_setup_persona")
return resp

cortex/routers/usage.py

@@ -0,0 +1,104 @@
"""
Usage / token-tracking endpoints.
Self-service (any authenticated user, own data):
GET /api/usage → full usage dict {date: {model_key: {calls, prompt_tokens, completion_tokens}}}
GET /api/usage/summary → aggregate totals per model key, with friendly labels resolved from registry
Admin-only (cross-user aggregation):
GET /api/usage/all → summary for every user {username: summary_dict}
"""
import jwt
from fastapi import APIRouter, HTTPException, Request
from auth_utils import COOKIE_NAME, decode_token, get_user_role
from persona import list_users
import model_registry
import usage_tracker
router = APIRouter(prefix="/api/usage")
def _session_user(request: Request) -> str:
token = request.cookies.get(COOKIE_NAME)
if not token:
raise HTTPException(status_code=401, detail="Not authenticated")
try:
return decode_token(token)
except jwt.InvalidTokenError:
raise HTTPException(status_code=401, detail="Invalid session")
def _build_label_map(username: str) -> dict[str, str]:
"""Build a map from usage key (backend/model_name) → registered label."""
label_map: dict[str, str] = {}
try:
for m in model_registry.get_all_models(username):
model_name = m.get("model_name", "")
label = m.get("label", "")
host_type = m.get("host_type", "")
if not model_name or not label:
continue
# local models: key is "local/{model_name}"
if host_type in ("openwebui", "ollama", "openai_compatible"):
label_map[f"local/{model_name}"] = label
# cloud Gemini: key is "gemini_api/{model_name}"
elif host_type == "google":
label_map[f"gemini_api/{model_name}"] = label
except Exception:
pass
return label_map
def _summarize(data: dict, label_map: dict[str, str] | None = None) -> list[dict]:
"""Collapse date-keyed usage dict into per-model totals, sorted by total tokens desc."""
totals: dict[str, dict] = {}
for _date, models in data.items():
for key, counts in models.items():
t = totals.setdefault(key, {"calls": 0, "prompt_tokens": 0, "completion_tokens": 0})
t["calls"] += counts.get("calls", 0)
t["prompt_tokens"] += counts.get("prompt_tokens", 0)
t["completion_tokens"] += counts.get("completion_tokens", 0)
result = []
for key, counts in totals.items():
entry = {
"key": key,
"label": (label_map or {}).get(key) or key,
"calls": counts["calls"],
"prompt_tokens": counts["prompt_tokens"],
"completion_tokens": counts["completion_tokens"],
"total_tokens": counts["prompt_tokens"] + counts["completion_tokens"],
}
result.append(entry)
result.sort(key=lambda x: x["total_tokens"], reverse=True)
return result
@router.get("")
async def get_usage(request: Request) -> dict:
"""Return the raw daily usage log for the authenticated user."""
username = _session_user(request)
return usage_tracker.read_usage(username)
@router.get("/summary")
async def get_usage_summary(request: Request) -> list:
"""Return per-model totals (all time) for the authenticated user, with friendly labels."""
username = _session_user(request)
label_map = _build_label_map(username)
return _summarize(usage_tracker.read_usage(username), label_map)
@router.get("/all")
async def get_all_usage(request: Request) -> dict:
"""Admin: return per-model summary for every user."""
username = _session_user(request)
if get_user_role(username) != "admin":
raise HTTPException(status_code=403, detail="Admin access required")
result = {}
for user in list_users():
label_map = _build_label_map(user)
result[user] = _summarize(usage_tracker.read_usage(user), label_map)
return result

@@ -112,16 +112,17 @@ def list_all() -> list[dict]:
if not d.exists():
return []
results = []
for f in sorted(d.glob("*.json"), key=lambda p: p.stat().st_mtime, reverse=True):
for f in d.glob("*.json"):
try:
data = json.loads(f.read_text())
entry = {
results.append({
"session_id": data["session_id"],
"name": data.get("name", ""),
"updated": data.get("updated"),
"message_count": len(data.get("messages", [])),
}
results.append(entry)
"_sort_key": data.get("updated") or f.stat().st_mtime,
})
except Exception:
pass
results.sort(key=lambda s: s.pop("_sort_key"), reverse=True)
return results

@@ -6,7 +6,24 @@
and are appended automatically by help.html when present.
-->
*Last updated: 2026-05-05*
*Last updated: 2026-05-08*
---
## Getting Started
If this is your first time using Cortex, you need one thing before the chat will work: an AI model connected to your account.
**Fastest path — OpenRouter:**
OpenRouter gives you access to Claude, Gemini, and dozens of other models with a single API key.
1. Get a free API key at [openrouter.ai/keys](https://openrouter.ai/keys)
2. Go to **☰ → Account → [Set up OpenRouter →]** (shown automatically if no model is configured)
3. Paste your key, pick a starting model, click **Connect**
That's it — you're ready to chat.
**Already past setup but seeing errors?** Go to **☰ → Account → Model Registry → Manage models** and confirm a model is assigned to the **Chat** role (Primary slot). If all slots are empty, add a model first.
---
@@ -52,19 +69,45 @@ Click the **⚡** button in the input row to enable the Tools toggle. When lit (
The orchestrator runs a multi-step tool loop:
1. The **orchestrator model** reasons about the request and calls tools as needed — web search, file reads, task management, shell commands, Aether Journals, and more
1. The **orchestrator model** reasons about the request and calls tools as needed
2. It produces an enriched summary of what it found
3. The **responder model** (set by the active Role) receives that context and writes the final user-facing reply
4. A `⚡ N tool calls: …` note appears below the response listing what was used
The ⚡ toggle is **independent of the Role selector** — you can use any role (chat, coder, research, etc.) with or without tools. The orchestrator model is configured in **Account → Model Registry → Role Assignments → Orchestrator**. By default this is Gemini API.
The full tool reference is in the **Tools** tab. 40 tools across web, files, shell, system, tasks, cron, reminders, scratchpad, notifications, and Aether Journals.
The ⚡ toggle is **independent of the Role selector** — you can use any role (chat, coder, research, etc.) with or without tools. The orchestrator model is configured in **Account → Model Registry → Role Assignments → Orchestrator**.
Tools mode is best for tasks requiring research, multi-step reasoning, or side effects (e.g. "search for X", "add a task", "what's on my list?", "append this to my journal"). Regular chat is faster for conversational turns.
Orchestrated sessions persist to history exactly like regular chat.
### Available Tools
40 tools across 11 categories. Each tool schema is sent to the model on every orchestrated call — fewer active tools means fewer tokens per call.
| Category | Tools |
|---|---|
| **Web** | `web_search`, `http_fetch` |
| **Files** | `file_read`, `file_list`, `file_write` |
| **Shell** | `shell_exec`, `claude_allow_dir` |
| **System** | `cortex_restart`, `cortex_logs`, `cortex_status`, `cortex_update` |
| **Tasks** | `task_list`, `task_create`, `task_update`, `task_complete` |
| **Cron** | `cron_list`, `cron_add`, `cron_remove`, `cron_toggle` |
| **Reminders** | `reminders_add`, `reminders_list`, `reminders_remove`, `reminders_clear` |
| **Scratchpad** | `scratch_read`, `scratch_write`, `scratch_append`, `scratch_clear` |
| **Notifications** | `web_push`, `email_send`, `nc_talk_send` |
| **Aether Journals** | `ae_journal_list/search`, `ae_journal_entries_list`, `ae_journal_entry_read/create/update/disable/append/prepend` |
| **Agent Notes** | `agent_notes_read`, `agent_notes_write`, `agent_notes_append`, `agent_notes_clear` |
File, Shell, System, and some Notification tools are **admin-only** and not visible to regular users.
### Per-Role Tool Sets
Each role can be configured with a specific subset of tool categories. When a role has a tool subset configured, only those tools are sent to the orchestrator — the rest are invisible to the model for that session.
**Example:** a Coder role might only need Web, Files, Shell, and Agent Notes. A Research role might only need Web. Configuring this avoids sending schemas for 30+ irrelevant tools on every call.
Configure per-role tool sets in **Account → Model Registry → Role Assignments** — expand a role card to see the category checkboxes. The default (no checkboxes selected) sends all tools the user has access to.
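
The selection rule described above can be sketched in a few lines. This is an illustration only, not the Cortex implementation: `TOOL_CATEGORIES`, `tools_for_role`, and the category keys are hypothetical names, and the map is a short excerpt of the table in this section.

```python
# Hypothetical category -> tool-name map (excerpt of the table above).
TOOL_CATEGORIES: dict[str, list[str]] = {
    "web": ["web_search", "http_fetch"],
    "files": ["file_read", "file_list", "file_write"],
    "shell": ["shell_exec", "claude_allow_dir"],
    "agent_notes": ["agent_notes_read", "agent_notes_write",
                    "agent_notes_append", "agent_notes_clear"],
}


def tools_for_role(selected: list[str], allowed: set[str]) -> list[str]:
    """Return the tool names to send to the orchestrator for one role.

    selected: categories checked on the role card (empty means "all").
    allowed:  tool names this user may see (admin-only tools already removed).
    """
    # No checkboxes selected falls back to every category (the default).
    categories = selected or list(TOOL_CATEGORIES)
    names = [t for c in categories for t in TOOL_CATEGORIES.get(c, [])]
    # Never expose a tool the user has no access to.
    return [t for t in names if t in allowed]
```

Filtering by the user's allowed set last means a role configuration can only narrow what the user already has access to.
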
---
## Sessions
@@ -123,11 +166,59 @@ Each response shows a **model tag** (bottom-right of message) with the model lab
---
## Account Settings
**Navigate to:** ☰ (top-right menu) → **Account**
| Section | What you can do |
|---|---|
| **Account** | View your username, role badge (Admin / User), rename your username |
| **Connected Accounts** | See which Google account is linked for OAuth sign-in |
| **Email Allowlist** | Regex patterns controlling which addresses the `email_send` tool can reach |
| **Notifications** | Set which channel (NC Talk, Google Chat, email) Inara uses for proactive messages |
| **Tool Permissions** | Allow or block specific orchestrator tools for your account |
| **Usage** | Token consumption by model — see below |
| **Browser Cache** | Clear UI preferences stored locally (theme, font size, session ID, etc.) |
| **Model Registry** | Configure AI providers, local hosts, and role assignments |
| **Change Password** | Update your login password |
| **Personas** | List and rename your personas |
---
## Usage
Token consumption is tracked automatically for API-backed models. **Navigate to:** ☰ → **Account** → **Usage** section.
The table shows all-time totals per model key, with columns for:
| Column | Meaning |
|---|---|
| **Model** | `backend/model-name` key (e.g. `gemini_api/gemini-2.5-flash`, `local/deepseek-v4`) |
| **Calls** | Number of API calls made |
| **Prompt** | Input tokens sent |
| **Output** | Completion tokens received |
| **Total** | Prompt + Output |
Values ≥ 1,000 are displayed as `k` (e.g. `24.3k`).
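
As a rough sketch of that display rule (the real formatting lives in the settings page script; `fmt_tokens` is a hypothetical name, and one decimal place is assumed from the `24.3k` example):

```python
def fmt_tokens(n: int) -> str:
    """Format a token count for display: values of 1,000 or more use a 'k' suffix."""
    if n < 1000:
        return str(n)
    # One decimal place is an assumption based on the "24.3k" example.
    return f"{n / 1000:.1f}k"
```
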
**What is and isn't tracked:**
- ✅ Gemini API calls (orchestrator, distillation)
- ✅ Local OpenAI-compatible calls (Open WebUI, Ollama, OpenRouter)
- ✗ Claude CLI — no structured token data is returned by the subprocess
- ✗ Gemini CLI — same reason
The raw data lives in `home/{username}/usage.json` and is also accessible via the Files panel or the API.
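
For scripting against the raw data, a minimal sketch of collapsing it into per-model totals follows. It assumes the `{date: {model_key: {calls, prompt_tokens, completion_tokens}}}` shape documented for `GET /api/usage`; `total_tokens_by_model` is a hypothetical helper and the sample values are made up for illustration.

```python
import json
from collections import defaultdict


def total_tokens_by_model(usage: dict) -> dict[str, int]:
    """Sum prompt + completion tokens per model key across all dates."""
    totals: dict[str, int] = defaultdict(int)
    for models in usage.values():
        for key, counts in models.items():
            totals[key] += counts.get("prompt_tokens", 0) + counts.get("completion_tokens", 0)
    return dict(totals)


# Illustrative sample in the documented shape (not real data).
sample = json.loads("""
{
  "2026-05-07": {"gemini_api/gemini-2.5-flash": {"calls": 2, "prompt_tokens": 900, "completion_tokens": 100}},
  "2026-05-08": {"gemini_api/gemini-2.5-flash": {"calls": 1, "prompt_tokens": 400, "completion_tokens": 50},
                 "local/deepseek-v4": {"calls": 3, "prompt_tokens": 3000, "completion_tokens": 600}}
}
""")
```
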
---
## Model Registry
Configure which AI models are available and which handles each task type.
**Navigate to:** ☰ (top-right menu) → **Account** → scroll to **Model Registry** → **Manage models →**
**New user quick path:** ☰ → **Account** → **Set up OpenRouter →** (the guided wizard adds a host, model, and role assignment in one step).
**Full manual path:** ☰ → **Account** → scroll to **Model Registry** → **Manage models →**
---
@@ -142,10 +233,16 @@ Do this before adding models — models need a provider account or local host to
2. Enter a label (e.g. "Work", "Personal") and your API key
3. Get a free key at [aistudio.google.com/apikey](https://aistudio.google.com/apikey)
**Local hosts** (Open WebUI, Ollama, OpenRouter, etc.):
**OpenRouter** (recommended for new users — one key for many models):
1. Get a key at [openrouter.ai/keys](https://openrouter.ai/keys)
2. Scroll to **Local Hosts** → **+ Add host**
3. Label: "OpenRouter", URL: `https://openrouter.ai/api/v1`, paste your key, Type: OpenAI-compatible
4. Click **Fetch models** to verify, then add models from the fetched list
**Other local hosts** (Open WebUI, Ollama, LM Studio, etc.):
1. Scroll to **Local Hosts** → click **+ Add host** to expand the form
2. Enter a label, the API URL (e.g. `http://192.168.1.100:3000`), and optional API key
3. Set **Type**: Open WebUI / Ollama, or OpenAI-compatible (for OpenRouter, LM Studio, etc.)
3. Set **Type**: Open WebUI / Ollama, or OpenAI-compatible
4. Click **Fetch models** on the saved host card to verify connectivity
---
@@ -178,6 +275,8 @@ Scroll to **Role Assignments** at the bottom of the page. Each role has **Primar
Leave all slots empty to use the server default.
**Per-role tool sets:** Expand any role card to configure which tool categories the orchestrator can use when that role is active. Unchecked categories are hidden from the model entirely — reducing token overhead on every orchestrated call. Leaving all categories unchecked means all tools the user has access to are available (the default).
---
## Nextcloud Talk Bot
@@ -245,12 +344,12 @@ Controls how much context is prepended to each LLM call:
| Tier | Loads | ~Tokens |
|---|---|---|
| **T1** | SOUL + IDENTITY + USER summary | ~1,500 |
| **T2** | + USER full + PROTOCOLS + HELP + memory layers | ~5,000 |
| **T3** | + last 2 raw session logs | ~15,000 |
| **T4** | + last 7 raw session logs | ~50,000 |
| **Min** | SOUL + IDENTITY + USER summary | ~1,500 |
| **Std** | + USER full + PROTOCOLS + HELP + memory layers | ~5,000 |
| **Ext** | + last 2 raw session logs | ~15,000 |
| **Full** | + last 7 raw session logs | ~50,000 |
Default is T2. Use T1 for small/local models. Use T3–T4 for complex multi-session tasks.
Default is **Std**. Use **Min** for small/local models. Use **Ext** or **Full** for complex multi-session tasks.
### Memory Layers
@@ -318,6 +417,7 @@ For direct access or scripting:
| `GET` | `/orchestrate/{job_id}` | Poll job status and result |
| `GET` | `/settings/models` | Model registry UI |
| `POST` | `/api/models/role` | Set a role assignment (JSON body) |
| `POST` | `/api/models/role-config` | Set per-role tool list and system prompt append |
| `GET` | `/api/push/vapid-key` | VAPID public key (for push subscription) |
| `POST` | `/api/push/subscribe` | Register a push subscription |
| `DELETE` | `/api/push/subscribe` | Remove a push subscription |
@@ -325,6 +425,11 @@ For direct access or scripting:
| `GET` | `/api/audit/day?date=` | Tool call entries for a specific date (own data) |
| `GET` | `/api/audit/recent` | Recent tool calls across days (admin) |
| `GET` | `/api/audit/stats` | Tool call counts by tool/status/user (admin) |
| `GET` | `/api/usage` | Full daily token usage log (own data) |
| `GET` | `/api/usage/summary` | Per-model token totals, all time (own data) |
| `GET` | `/api/usage/all` | Per-model totals for all users (admin) |
| `GET` | `/setup/model` | Guided OpenRouter setup form (Step 3 / standalone) |
| `POST` | `/setup/model` | Save OpenRouter host + model + assign to chat role |
| `GET` | `/health` | Health check — returns `{"status": "ok"}` |
Chat request body (`POST /chat`):

@@ -1,6 +1,6 @@
# Tool Reference
> This reference covers all 40 orchestrator tools available when the ⚡ toggle is on.
> This reference covers all 44 orchestrator tools available when the ⚡ toggle is on.
> Tools are invoked automatically by the orchestrator — you don't call them directly.
¹ **Admin only** — requires the `admin` role. Invisible to regular users.
@@ -102,3 +102,14 @@
| Tool | What it does |
|---|---|
| `ae_task_list` ¹ | List tasks from the agents_sync Kanban board |
## Agent Notes
Private, durable notes visible only to the orchestrator — not surfaced to users. Persist across sessions. Only available in orchestrated (tool-enabled) sessions.
| Tool | What it does |
|---|---|
| `agent_notes_read` | Read the current private notes file |
| `agent_notes_write` | Overwrite the notes file completely |
| `agent_notes_append` | Append a timestamped entry (keeps last 3 backups automatically) |
| `agent_notes_clear` | Erase all notes (backs up first) |
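
The "keeps last 3 backups" behaviour can be pictured as a simple rotation. The sketch below is an assumption about how such a rotation might work, not the code in `cortex/tools/agent_notes.py`: `rotate_backups` is a hypothetical helper, and the `.bak1.md` to `.bak3.md` naming is taken from the Files panel entries.

```python
from pathlib import Path


def rotate_backups(notes: Path, keep: int = 3) -> None:
    """Shift bak1 -> bak2 -> bak3, then copy the current notes file to bak1."""
    if not notes.exists():
        return
    stem = notes.stem  # e.g. "AGENT_NOTES"
    # Move the oldest backups first so nothing is overwritten before it has moved.
    for i in range(keep - 1, 0, -1):
        src = notes.with_name(f"{stem}.bak{i}.md")
        if src.exists():
            src.replace(notes.with_name(f"{stem}.bak{i + 1}.md"))
    # The newest backup is always bak1; the previous highest-numbered one is dropped.
    notes.with_name(f"{stem}.bak1.md").write_text(notes.read_text())
```

Calling this before every write or clear would leave the three most recent versions recoverable.
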

@@ -18,6 +18,11 @@
const settings_dd_el = document.getElementById('settings-dropdown');
const sessionsBackdrop = document.getElementById('sessions-backdrop');
// ── Utilities ─────────────────────────────────────────────────
function escapeHtml(str) {
return String(str).replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;').replace(/"/g,'&quot;');
}
// ── Close all panels/dropdowns (mutual exclusion) ─────────────
function closeAllPanels() {
if (mode_dropdown_el) mode_dropdown_el.classList.remove('open');
@@ -435,8 +440,32 @@
availableRoles = d.available_roles || [];
roleIdx = 0;
setRoleToggleUI(availableRoles[0] || null);
_maybeShowNoBanner(availableRoles);
});
function _maybeShowNoBanner(roles) {
const key = 'cx_no_model_banner_dismissed';
if (roles.length > 0) { localStorage.removeItem(key); return; }
if (localStorage.getItem(key)) return;
const banner = document.createElement('div');
banner.id = 'no-model-banner';
banner.style.cssText = [
'background:#1c1a0a','border-bottom:1px solid #78350f',
'color:#fbbf24','font-size:0.82rem','padding:0.55rem 1rem',
'display:flex','align-items:center','gap:0.75rem','flex-shrink:0',
].join(';');
banner.innerHTML = `
<span style="flex:1">⚡ Using server default model — add your own for more choices and to track your usage.</span>
<a href="/setup/model" style="color:#fbbf24;font-weight:600;white-space:nowrap;">Set up OpenRouter →</a>
<button onclick="localStorage.setItem('${key}','1');document.getElementById('no-model-banner').remove();"
style="background:none;border:none;color:#78350f;cursor:pointer;font-size:1rem;line-height:1;padding:0 0.2rem;"
title="Dismiss">✕</button>
`;
// Insert at the top of #chat-col (or body if not found)
const col = document.getElementById('chat-col') || document.body.firstElementChild;
col.insertBefore(banner, col.firstChild);
}
backendToggle.addEventListener('click', () => {
if (availableRoles.length <= 1) return;
roleIdx = (roleIdx + 1) % availableRoles.length;
@@ -1067,6 +1096,19 @@
sessionId = data.session_id;
sessionEl.textContent = `session: ${sessionId}`;
persist_session();
// Auto-name the session from the first user message
if (wasNewSession) {
const autoName = text.slice(0, 60).trimEnd() + (text.length > 60 ? '…' : '');
fetch(`/sessions/${sessionId}?${_fileParams}`, {
method: 'PATCH',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ name: autoName }),
}).then(() => {
sessionEl.textContent = `session: ${autoName}`;
sessionNames.set(sessionId, autoName);
}).catch(() => {});
}
thinkingDiv.className = 'message assistant';
setMessageText(thinkingDiv, 'assistant', data.response);
const assistHistIdx = currentHistory.length;
@@ -1133,6 +1175,8 @@
const text = inputEl.value.trim();
if (!text || activeController) return;
const wasNewSession = !sessionId;
inputEl.value = '';
syncHeight();
sendBtn.style.display = 'none';
@@ -1357,6 +1401,7 @@
{ label: 'Memory', files: ['MEMORY_LONG.md', 'MEMORY_MID.md', 'MEMORY_SHORT.md'] },
{ label: 'Profile', files: ['USER.md', 'HELP.md'] },
{ label: 'Settings', files: ['email_allowlist.json'] },
{ label: 'Agent Notes (read-only)', files: ['AGENT_NOTES.bak1.md', 'AGENT_NOTES.bak2.md', 'AGENT_NOTES.bak3.md'], collapsed: true },
];
function fmtSize(bytes) {
@@ -1394,7 +1439,7 @@
fileSidebar.innerHTML = '';
for (const group of FILE_GROUPS) {
const { groupEl, items } = _makeFileGroup(group.label);
const { groupEl, items } = _makeFileGroup(group.label, group.collapsed || false);
for (const fname of group.files) {
const f = byName[fname];
@@ -1490,12 +1535,20 @@
// Restore editor/preview buttons hidden by audit view
fileRawBtn.style.display = '';
filePreviewBtn.style.display = '';
fileSaveBtn.style.display = '';
const res = await fetch(`/files/${encodeURIComponent(name)}?${_fileParams}`);
if (!res.ok) { mdEditor.setValue(`Error loading ${name}`); return; }
const data = await res.json();
mdEditor.setValue(data.content);
mdEditor.clearHistory();
if (data.readonly) {
mdEditor.setOption('readOnly', 'nocursor');
fileSaveBtn.style.display = 'none';
document.getElementById('file-modal-title').textContent = name + ' (read-only)';
} else {
mdEditor.setOption('readOnly', false);
fileSaveBtn.style.display = '';
document.getElementById('file-modal-title').textContent = name;
}
setFileMode(fileMode);
}
@@ -1794,11 +1847,13 @@
let memMid = localStorage.getItem('mem-mid') !== 'false';
let memShort = localStorage.getItem('mem-short') !== 'false';
const TIER_LABELS = { 1: 'Min', 2: 'Std', 3: 'Ext', 4: 'Full' };
function updateTierUI() {
document.querySelectorAll('.ctx-btn[data-tier]').forEach(btn => {
btn.classList.toggle('active', parseInt(btn.dataset.tier) === currentTier);
});
ctxOpenBtn.querySelector('.tier-badge').textContent = currentTier;
ctxOpenBtn.querySelector('.tier-badge').textContent = TIER_LABELS[currentTier] || currentTier;
}
function updateMemUI() {
@@ -1870,33 +1925,46 @@
memShort = !memShort; localStorage.setItem('mem-short', memShort); updateMemUI();
});
const _distillBtns = () => document.querySelectorAll(
'#distill-short-btn, #distill-mid-btn, #distill-long-btn, #distill-all-btn, #distill-rebuild-btn'
);
function showDistillStatus(msg, isErr) {
distillStatus.textContent = msg;
distillStatus.classList.toggle('err', !!isErr);
distillStatus.classList.add('show');
setTimeout(() => distillStatus.classList.remove('show'), 5000);
setTimeout(() => distillStatus.classList.remove('show'), isErr ? 8000 : 5000);
}
async function runDistill(endpoint) {
showDistillStatus('distilling…', false);
async function runDistill(endpoint, label) {
_distillBtns().forEach(b => { b.disabled = true; });
showDistillStatus(`${label || endpoint} running…`, false);
try {
const res = await fetch(`/distill/${endpoint}?${_fileParams}`, { method: 'POST' });
const d = await res.json();
if (!res.ok || d.ok === false) {
const err = d.error || d.mid?.error || d.long?.error || `HTTP ${res.status}`;
if (res.status === 409 || res.status === 429) {
showDistillStatus(` ${d.detail}`, true);
} else if (!res.ok || d.ok === false) {
const err = d.detail || d.error || d.mid?.error || d.long?.error || `HTTP ${res.status}`;
showDistillStatus(`${err}`, true);
} else {
showDistillStatus(`${endpoint} done`, false);
showDistillStatus(`${label || endpoint} complete`, false);
}
} catch (err) {
showDistillStatus(`${err.message}`, true);
} finally {
_distillBtns().forEach(b => { b.disabled = false; });
}
}
document.getElementById('distill-short-btn').addEventListener('click', () => runDistill('short'));
document.getElementById('distill-mid-btn').addEventListener('click', () => runDistill('mid'));
document.getElementById('distill-long-btn').addEventListener('click', () => runDistill('long'));
document.getElementById('distill-all-btn').addEventListener('click', () => runDistill('all'));
document.getElementById('distill-short-btn').addEventListener('click', () => runDistill('short', 'Short distill'));
document.getElementById('distill-mid-btn').addEventListener('click', () => runDistill('mid', 'Mid distill'));
document.getElementById('distill-long-btn').addEventListener('click', () => runDistill('long', 'Long distill'));
document.getElementById('distill-all-btn').addEventListener('click', () => runDistill('all', 'Full distill'));
document.getElementById('distill-rebuild-btn').addEventListener('click', () => {
if (!confirm('Rebuild memory from scratch?\n\nThis will wipe MEMORY_MID and MEMORY_LONG (backups kept) then regenerate them from session logs. Any hand-edited content will be replaced.\n\nContinue?')) return;
runDistill('rebuild', 'Memory rebuild');
});
updateTierUI();
updateMemUI();

@@ -87,10 +87,10 @@
<div class="ctx-section">
<div class="ctx-section-title">Context Tier</div>
<div class="ctx-row">
<button class="ctx-btn" data-tier="1" id="tier-1" title="Minimal (~1.5k tokens)">T1</button>
<button class="ctx-btn active" data-tier="2" id="tier-2" title="Standard (~5k tokens)">T2</button>
<button class="ctx-btn" data-tier="3" id="tier-3" title="Extended (~15k tokens)">T3</button>
<button class="ctx-btn" data-tier="4" id="tier-4" title="Full (~50k tokens)">T4</button>
<button class="ctx-btn" data-tier="1" id="tier-1" title="Minimal — identity only (~1.5k tokens)">Min</button>
<button class="ctx-btn active" data-tier="2" id="tier-2" title="Standard — memory + user profile (~5k tokens)">Std</button>
<button class="ctx-btn" data-tier="3" id="tier-3" title="Extended — + last 2 sessions (~15k tokens)">Ext</button>
<button class="ctx-btn" data-tier="4" id="tier-4" title="Full — + last 7 sessions (~50k tokens)">Full</button>
</div>
</div>
<div class="ctx-section">
@@ -108,6 +108,7 @@
<button class="ctx-btn" id="distill-mid-btn" title="Summarize SHORT → MID memory (uses LLM)">Mid</button>
<button class="ctx-btn" id="distill-long-btn" title="Integrate MID → LONG memory (uses LLM)">Long</button>
<button class="ctx-btn" id="distill-all-btn" title="Run Short → Mid → Long in sequence">All</button>
<button class="ctx-btn ctx-btn-danger" id="distill-rebuild-btn" title="⚠ Wipe Mid + Long memories and rebuild from session logs. Hand-edited content will be replaced.">Rebuild</button>
</div>
<div id="ctx-distill-status"></div>
<div id="ctx-schedule"></div>

@@ -167,9 +167,11 @@
.pb-anthropic { background: #1e1b4b; color: #818cf8; }
.pb-google { background: #042f2e; color: #34d399; }
.pb-local { background: #1e293b; color: #64748b; }
.pb-notools { background: #3b1a1a; color: #f87171; }
[data-theme="light"] .pb-anthropic { background: #ede9fe; color: #5b21b6; }
[data-theme="light"] .pb-google { background: #d1fae5; color: #065f46; }
[data-theme="light"] .pb-local { background: #e2e8f0; color: #475569; }
[data-theme="light"] .pb-notools { background: #fee2e2; color: #b91c1c; }
/* Host & model rows */
.host-row {
@@ -488,8 +490,22 @@
autocomplete="off" data-form-type="other">
</div>
<div class="field" style="flex:0 0 auto">
<label>Context (k tokens)</label>
<input type="number" id="add-context-k" name="context_k" value="0" min="0" max="10000">
<label title="Context window size in thousands of tokens. 0 = assume 32k.">Context (k tokens)</label>
<input type="number" id="add-context-k" name="context_k" value="0" min="0" max="10000"
title="Context window size in thousands of tokens. 0 = assume 32k (compaction budget ~24k tokens).">
</div>
<div class="field" style="flex:0 0 auto">
<label title="Per-model tool loop cap. 0 = use the global default (orchestrator_max_rounds).">Max rounds</label>
<input type="number" name="max_rounds" value="0" min="0"
title="Per-model tool loop cap. 0 = use the global default (orchestrator_max_rounds).">
</div>
<div class="field" style="flex:0 0 auto">
<label title="Whether this model supports tool calling. If not supported, requests skip the tool loop entirely.">Tool calling</label>
<select name="tools"
title="Whether this model supports tool calling. If not supported, requests skip the tool loop entirely.">
<option value="1" selected>Supported</option>
<option value="0">Not supported</option>
</select>
</div>
</div>
<div class="field">

@@ -423,6 +423,18 @@
</div>
<!-- Browser cache -->
<!-- Usage summary -->
<div class="section" id="usage-section">
<h2>Usage</h2>
<p style="font-size:0.8rem; color:var(--pg-muted); margin-bottom:0.85rem; line-height:1.55;">
Token consumption tracked for API-backed models (Gemini API, local OpenAI-compatible).
Claude CLI calls are not metered.
</p>
<div id="usage-table-wrap" style="overflow-x:auto;">
<p style="font-size:0.8rem; color:var(--pg-muted);">Loading…</p>
</div>
</div>
<div class="section">
<h2>Browser Cache</h2>
<p style="font-size:0.8rem; color:var(--pg-muted); margin-bottom:0.85rem; line-height:1.55;">
@@ -443,6 +455,25 @@
<!-- Model Registry link -->
<div class="section">
<h2>Model Registry</h2>
<!-- Quick-start card: shown only when no model is configured for chat role -->
<div id="openrouter-quickstart" style="display:none; background:#1c1a0a; border:1px solid #78350f;
border-radius:8px; padding:1rem; margin-bottom:1rem;">
<p style="font-size:0.82rem; color:#fbbf24; font-weight:600; margin-bottom:0.4rem;">
⚡ You're on the server default model
</p>
<p style="font-size:0.8rem; color:#d97706; margin-bottom:0.75rem; line-height:1.5;">
You can chat now, but adding your own model gives you more choices, lets you pick
role-specific models, and tracks your usage separately.
OpenRouter is the easiest way to get started — one key, many models.
</p>
<a href="/setup/model"
style="display:inline-block; padding:0.5rem 0.9rem; background:#92400e; border-radius:6px;
color:#fef3c7; font-size:0.85rem; font-weight:600; text-decoration:none;">
Set up OpenRouter →
</a>
</div>
<p style="font-size:0.8rem; color:var(--pg-muted); margin-bottom:0.85rem; line-height:1.55;">
Configure AI providers (Anthropic, Google), local hosts (Open WebUI, Ollama, OpenRouter, etc.),
and assign models to roles — chat, orchestrator, distill, and more.
@@ -479,6 +510,22 @@
</div>
<!-- Personas -->
<!-- Sessions -->
<div class="section">
<h2>Sessions</h2>
<p style="font-size:0.8rem; color:var(--pg-muted); margin-bottom:0.85rem; line-height:1.55;">
Auto-name any sessions that still show a random ID, using their first message as the name.
Only unnamed sessions are affected — existing names are left alone.
</p>
<button type="button" id="backfill-names-btn"
style="padding:0.5rem 1rem; background:none; border:1px solid var(--pg-border); border-radius:6px;
color:var(--pg-muted); font-size:0.88rem; font-weight:500; cursor:pointer;
transition:border-color 0.15s, color 0.15s;">
Auto-name old sessions
</button>
<span id="backfill-names-ok" style="display:none; margin-left:0.75rem; font-size:0.8rem; color:#4ade80;"></span>
</div>
<div class="section">
<h2>Personas</h2>
<ul class="persona-list">
@@ -532,6 +579,84 @@
document.getElementById('clear-ls-ok').style.display = 'inline';
});
// Show OpenRouter quick-start card if no model is configured
(async () => {
try {
const d = await fetch('/backend').then(r => r.json());
const roles = d.available_roles || [];
if (roles.length === 0) {
document.getElementById('openrouter-quickstart').style.display = 'block';
}
} catch (_) {}
})();
// Usage summary table
(async () => {
const wrap = document.getElementById('usage-table-wrap');
try {
const resp = await fetch('/api/usage/summary');
if (!resp.ok) throw new Error(resp.statusText);
const rows_data = await resp.json();
if (!rows_data.length) {
wrap.innerHTML = '<p style="font-size:0.8rem;color:var(--pg-muted);">No usage recorded yet.</p>';
return;
}
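const fmt = n => n >= 1000 ? (n / 1000).toFixed(1) + 'k' : String(n);
// Escape values before interpolating them into innerHTML markup.
const esc = s => String(s).replace(/[&<>"']/g,
c => ({ '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' }[c]));
const rows = rows_data.map(d => {
const labelCell = d.label !== d.key
? `<span title="${esc(d.key)}">${esc(d.label)}</span>`
: `<span>${esc(d.key)}</span>`;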
return `<tr>
<td style="padding:0.4rem 0.75rem 0.4rem 0; font-size:0.82rem; color:var(--pg-text); white-space:nowrap;">${labelCell}</td>
<td style="padding:0.4rem 0.5rem; font-size:0.82rem; color:var(--pg-muted); text-align:right;">${d.calls}</td>
<td style="padding:0.4rem 0.5rem; font-size:0.82rem; color:var(--pg-muted); text-align:right;">${fmt(d.prompt_tokens)}</td>
<td style="padding:0.4rem 0.5rem; font-size:0.82rem; color:var(--pg-muted); text-align:right;">${fmt(d.completion_tokens)}</td>
<td style="padding:0.4rem 0 0.4rem 0.5rem; font-size:0.82rem; color:var(--pg-text); text-align:right; font-weight:600;">${fmt(d.total_tokens)}</td>
</tr>`;
}).join('');
wrap.innerHTML = `<table style="border-collapse:collapse; width:100%; min-width:360px;">
<thead>
<tr style="border-bottom:1px solid var(--pg-border);">
<th style="padding:0.35rem 0.75rem 0.35rem 0; font-size:0.75rem; color:var(--pg-muted); font-weight:600; text-align:left;">Model</th>
<th style="padding:0.35rem 0.5rem; font-size:0.75rem; color:var(--pg-muted); font-weight:600; text-align:right;">Calls</th>
<th style="padding:0.35rem 0.5rem; font-size:0.75rem; color:var(--pg-muted); font-weight:600; text-align:right;">Prompt</th>
<th style="padding:0.35rem 0.5rem; font-size:0.75rem; color:var(--pg-muted); font-weight:600; text-align:right;">Output</th>
<th style="padding:0.35rem 0 0.35rem 0.5rem; font-size:0.75rem; color:var(--pg-muted); font-weight:600; text-align:right;">Total</th>
</tr>
</thead>
<tbody>${rows}</tbody>
</table>`;
} catch (e) {
wrap.innerHTML = `<p style="font-size:0.8rem;color:var(--pg-muted);">Could not load usage data.</p>`;
}
})();
// Auto-name old sessions backfill
document.getElementById('backfill-names-btn').addEventListener('click', async () => {
const btn = document.getElementById('backfill-names-btn');
const ok = document.getElementById('backfill-names-ok');
btn.disabled = true;
btn.textContent = 'Working…';
try {
const params = new URLSearchParams(window.location.search);
const user = params.get('user') || document.querySelector('input[value]')?.value || '';
const persona = params.get('persona') || '';
const qs = user ? `?user=${encodeURIComponent(user)}&persona=${encodeURIComponent(persona)}` : '';
const res = await fetch(`/api/sessions/backfill-names${qs}`, { method: 'POST' });
const data = await res.json();
if (!res.ok) throw new Error(data.detail || res.statusText);
const n = data.named ?? 0;
ok.textContent = `Named ${n} session${n !== 1 ? 's' : ''}.`;
ok.style.display = 'inline';
} catch (e) {
ok.textContent = 'Error — check console.';
ok.style.color = '#f87171';
ok.style.display = 'inline';
}
btn.textContent = 'Auto-name old sessions';
btn.disabled = false;
});
// Persona rename toggle
document.querySelectorAll('.persona-rename-toggle').forEach(btn => {
btn.addEventListener('click', () => {


@@ -127,6 +127,36 @@
.emoji-opt.selected { border-color: #7c3aed; background: #2d1f52; }
#emoji-hidden { display: none; }
.provider-badge {
display: inline-flex;
align-items: center;
gap: 0.4rem;
background: #2d1f52;
border: 1px solid #7c3aed;
border-radius: 6px;
padding: 0.3rem 0.6rem;
font-size: 0.78rem;
color: #a78bfa;
margin-bottom: 1rem;
}
.skip-link {
display: block;
text-align: center;
margin-top: 1rem;
font-size: 0.8rem;
color: #64748b;
text-decoration: none;
}
.skip-link:hover { color: #94a3b8; }
.model-hint {
font-size: 0.72rem;
color: #64748b;
margin-top: 0.75rem;
text-align: center;
}
</style>
</head>
<body>
@@ -137,10 +167,11 @@
</div>
<!-- ERROR -->
<!-- ERROR_MODEL -->
<!-- ── Step 1: password ───────────────────────────────────────── -->
<div id="step-password">
<div class="step-label">Step 1 of 2</div>
<div class="step-label">Step 1 of 3</div>
<h2>Set your password</h2>
<form method="POST" action="" id="password-form">
<input type="hidden" name="step" value="password">
@@ -161,7 +192,7 @@
<!-- ── Step 2: persona ────────────────────────────────────────── -->
<div id="step-persona" style="display:none">
<div class="step-label">Step 2 of 2</div>
<div class="step-label">Step 2 of 3</div>
<h2>Create your persona</h2>
<form method="POST" action="" id="persona-form">
<input type="hidden" name="step" value="persona">
@@ -203,6 +234,39 @@
<button type="submit">Create my persona →</button>
</form>
</div>
<!-- ── Step 3: model connect ─────────────────────────────────── -->
<div id="step-model" style="display:none">
<div class="step-label"><!-- SETUP_STEP3_LABEL --></div>
<h2>Connect an AI model</h2>
<div class="provider-badge">⚡ Recommended: OpenRouter</div>
<p style="font-size:0.82rem;color:#94a3b8;margin-bottom:1rem;">
One API key gives you access to Claude, Gemini, Llama, and dozens of other models.
Get a free key at <a href="https://openrouter.ai/keys" target="_blank" style="color:#a78bfa;">openrouter.ai/keys</a>.
</p>
<form method="POST" action="/setup/model" id="model-form">
<div class="field">
<label for="api_key">OpenRouter API key</label>
<input type="password" id="api_key" name="api_key"
autocomplete="off" placeholder="sk-or-v1-..." required>
</div>
<div class="field">
<label for="model_name">Starting model</label>
<select id="model_name" name="model_name">
<option value="anthropic/claude-3-5-haiku-20241022">Claude 3.5 Haiku — Fast &amp; affordable</option>
<option value="anthropic/claude-3-7-sonnet-20250219">Claude 3.7 Sonnet — Smarter Claude</option>
<option value="google/gemini-2.0-flash-001">Gemini 2.0 Flash — Fast Google model</option>
<option value="meta-llama/llama-3.3-70b-instruct">Llama 3.3 70B — Open source</option>
</select>
<p class="hint">You can add more models or switch anytime in Account → Model Registry.</p>
</div>
<button type="submit">Connect &amp; start chatting →</button>
</form>
<p class="model-hint">
Using Ollama, a local model, or something else?
<a href="#" id="skip-model-link" style="color:#64748b;">Skip this step →</a>
</p>
</div>
</div>
<script>
@@ -232,6 +296,11 @@
document.getElementById('step-password').style.display = 'none';
document.getElementById('step-persona').style.display = 'block';
}
if (params.get('step') === '3') {
document.getElementById('step-password').style.display = 'none';
document.getElementById('step-persona').style.display = 'none';
document.getElementById('step-model').style.display = 'block';
}
// ── Client-side confirm password check ───────────────────────────
document.getElementById('password-form').addEventListener('submit', e => {
@@ -243,6 +312,15 @@
}
});
// ── Skip model setup — navigate to user home ─────────────────────
document.getElementById('skip-model-link')?.addEventListener('click', e => {
e.preventDefault();
// Ask server for skip target (the cx_setup_persona cookie has the path)
fetch('/setup/model/skip', { method: 'POST', credentials: 'same-origin' })
.then(r => { if (r.redirected) location.href = r.url; else location.href = '/'; })
.catch(() => { location.href = '/'; });
});
// ── Auto-generate persona slug from display name ─────────────────
document.getElementById('display_name').addEventListener('input', function() {
const slugField = document.getElementById('persona_name');


@@ -1328,7 +1328,10 @@
.ctx-btn:hover { color: var(--text); border-color: var(--muted); }
.ctx-btn.active { color: var(--accent); border-color: var(--accent); }
.ctx-btn.mem-on { color: var(--success); border-color: var(--success-dim); }
.ctx-btn.local-on { color: var(--amber); border-color: var(--amber-border); }
.ctx-btn.local-on { color: var(--amber); border-color: var(--amber-border); }
.ctx-btn-danger { color: #f87171 !important; border-color: #7f1d1d !important; }
.ctx-btn-danger:hover { border-color: #f87171 !important; }
.ctx-btn:disabled { opacity: 0.4; cursor: not-allowed; pointer-events: none; }
#backend-model-hint {
font-size: 0.68rem; color: var(--amber); opacity: 0.9;
margin-top: 4px; word-break: break-all; line-height: 1.3;


@@ -64,6 +64,12 @@ from tools.scratch import (
scratch_clear as _scratch_clear,
)
from tools.notify import nc_talk_send as _nc_talk_send, email_send as _email_send, web_push as _web_push
from tools.agent_notes import (
agent_notes_read as _agent_notes_read,
agent_notes_write as _agent_notes_write,
agent_notes_append as _agent_notes_append,
agent_notes_clear as _agent_notes_clear,
)
# ── Declaration imports ───────────────────────────────────────────────────────
@@ -77,6 +83,7 @@ import tools.cron as _mod_cron
import tools.reminders as _mod_reminders
import tools.scratch as _mod_scratch
import tools.notify as _mod_notify
import tools.agent_notes as _mod_agent_notes
# ── Tool categories — used by the Model Registry UI for grouped checkboxes ───
@@ -98,6 +105,7 @@ TOOL_CATEGORIES: dict[str, list[str]] = {
"ae_journal_entry_prepend",
],
"Aether Tasks": ["ae_task_list"],
"Agent Notes": ["agent_notes_read", "agent_notes_write", "agent_notes_append", "agent_notes_clear"],
}
# ── Callable registry ─────────────────────────────────────────────────────────
@@ -143,6 +151,10 @@ _CALLABLES: dict[str, callable] = {
"email_send": _email_send,
"nc_talk_send": _nc_talk_send,
"web_push": _web_push,
"agent_notes_read": _agent_notes_read,
"agent_notes_write": _agent_notes_write,
"agent_notes_append": _agent_notes_append,
"agent_notes_clear": _agent_notes_clear,
}
# ── Role-based access control ─────────────────────────────────────────────────
@@ -194,6 +206,7 @@ _ALL_DECLARATIONS: list[types.FunctionDeclaration] = (
+ _mod_notify.DECLARATIONS
+ _mod_ae_knowledge.DECLARATIONS
+ _mod_ae_tasks.DECLARATIONS
+ _mod_agent_notes.DECLARATIONS
)
# Full Gemini Tool object (all tools — use get_tools_for_role() in production)

cortex/tools/agent_notes.py Normal file

@@ -0,0 +1,155 @@
"""
Agent private notes — AGENT_NOTES.md.
A persistent notepad only the orchestrator can write to. The file itself is
never exposed in the Files panel or loaded into user-facing context tiers.
Up to 3 rolling backups are kept automatically before each write so past
versions can be reviewed.
Use for: observations about the user's patterns, working hypotheses,
long-running goals, things to remember across sessions that shouldn't
be part of the distilled memory visible to the user.
"""
import asyncio
from datetime import datetime, timezone
from pathlib import Path

from google.genai import types

from persona import persona_path

_FILENAME = "AGENT_NOTES.md"
_N_BACKUPS = 3


def _notes_path() -> Path:
    return persona_path() / _FILENAME


def _now_label() -> str:
    return datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")


def _rotate(path: Path) -> None:
    """Rotate up to _N_BACKUPS rolling backups before a write."""
    if not path.exists():
        return
    for i in range(_N_BACKUPS, 1, -1):
        older = path.parent / f"{path.stem}.bak{i}.md"
        newer = path.parent / f"{path.stem}.bak{i - 1}.md"
        if newer.exists():
            older.write_text(newer.read_text())
    bak1 = path.parent / f"{path.stem}.bak1.md"
    bak1.write_text(path.read_text())


# ── Sync implementations ────────────────────────────────────────────────────
def _agent_notes_read() -> str:
    p = _notes_path()
    if not p.exists() or not p.read_text().strip():
        return "Agent notes are empty."
    return p.read_text()


def _agent_notes_write(content: str) -> str:
    p = _notes_path()
    _rotate(p)
    p.write_text(content.rstrip() + "\n")
    return "Agent notes updated."


def _agent_notes_append(content: str, heading: str | None = None) -> str:
    p = _notes_path()
    _rotate(p)
    existing = p.read_text() if p.exists() else ""
    label = heading or _now_label()
    section = f"\n## {label}\n\n{content.strip()}\n"
    p.write_text(existing.rstrip() + "\n" + section)
    return f"Appended to agent notes: {label}"


def _agent_notes_clear() -> str:
    p = _notes_path()
    _rotate(p)
    p.write_text("")
    return "Agent notes cleared."


# ── Async wrappers ───────────────────────────────────────────────────────────
async def agent_notes_read() -> str:
    return await asyncio.to_thread(_agent_notes_read)


async def agent_notes_write(content: str) -> str:
    return await asyncio.to_thread(_agent_notes_write, content)


async def agent_notes_append(content: str, heading: str | None = None) -> str:
    return await asyncio.to_thread(_agent_notes_append, content, heading)


async def agent_notes_clear() -> str:
    return await asyncio.to_thread(_agent_notes_clear)
# ── Gemini FunctionDeclarations ──────────────────────────────────────────────
DECLARATIONS = [
    types.FunctionDeclaration(
        name="agent_notes_read",
        description=(
            "Read your private agent notes — a persistent notepad only you can write to. "
            "Use this to recall observations, working hypotheses, long-running goals, or "
            "anything you want to remember across sessions without surfacing it to the user. "
            "This file is never shown in the user's Files panel."
        ),
        parameters=types.Schema(type=types.Type.OBJECT, properties={}),
    ),
    types.FunctionDeclaration(
        name="agent_notes_write",
        description=(
            "Replace your private agent notes with new content. "
            "A backup is saved automatically before writing. "
            "Use agent_notes_append to add without replacing."
        ),
        parameters=types.Schema(
            type=types.Type.OBJECT,
            properties={
                "content": types.Schema(
                    type=types.Type.STRING,
                    description="The new notes content (markdown supported).",
                ),
            },
            required=["content"],
        ),
    ),
    types.FunctionDeclaration(
        name="agent_notes_append",
        description=(
            "Add a new section to your private agent notes without replacing existing content. "
            "A backup is saved automatically before writing. "
            "Each section gets a UTC timestamp heading unless you supply one."
        ),
        parameters=types.Schema(
            type=types.Type.OBJECT,
            properties={
                "content": types.Schema(
                    type=types.Type.STRING,
                    description="The content to append (markdown supported).",
                ),
                "heading": types.Schema(
                    type=types.Type.STRING,
                    description="Optional section heading. Defaults to current UTC timestamp.",
                ),
            },
            required=["content"],
        ),
    ),
    types.FunctionDeclaration(
        name="agent_notes_clear",
        description=(
            "Erase all private agent notes. A backup is saved automatically before clearing."
        ),
        parameters=types.Schema(type=types.Type.OBJECT, properties={}),
    ),
]
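
Review note: the declarations above are Gemini-shaped, while the local orchestrator needs the OpenAI `tools` format. The sketch below shows one way that conversion could look. It assumes each declaration has already been dumped to a plain dict with uppercase Gemini type names; `to_openai_tool` and `_lower_types` are hypothetical helper names, not functions from this commit.

```python
from typing import Any


def _lower_types(schema: dict[str, Any]) -> dict[str, Any]:
    """Recursively lowercase Gemini-style type names (OBJECT -> object)."""
    out: dict[str, Any] = {}
    for key, value in schema.items():
        if key == "type" and isinstance(value, str):
            out[key] = value.lower()
        elif key == "properties" and isinstance(value, dict):
            out[key] = {name: _lower_types(sub) for name, sub in value.items()}
        elif key == "items" and isinstance(value, dict):
            out[key] = _lower_types(value)
        else:
            out[key] = value
    return out


def to_openai_tool(decl: dict[str, Any]) -> dict[str, Any]:
    """Wrap one dict-form declaration in the OpenAI `tools` envelope."""
    # Assumption: a missing schema means "no arguments".
    params = _lower_types(decl.get("parameters") or {"type": "OBJECT", "properties": {}})
    params.setdefault("properties", {})
    return {
        "type": "function",
        "function": {
            "name": decl["name"],
            "description": decl.get("description", ""),
            "parameters": params,
        },
    }
```

The envelope itself (`type: "function"` plus a JSON Schema under `parameters`) follows the OpenAI chat-completions tool format; how the real orchestrator obtains dict-form declarations is not shown in this diff.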


@@ -1,7 +1,7 @@
# Architecture: LLM Backends
> How Cortex selects and talks to AI models.
> Last updated: 2026-04-27 (V2 schema)
> Last updated: 2026-05-06
---
@@ -33,11 +33,11 @@ Resolution order for a role:
### Explicit Override
The UI backend toggle cycles: **auto → claude → gemini → local → auto**
The **Role** toggle in the Context & Memory panel cycles through configured role slots for the `chat` role: **Primary → Backup 1 → Backup 2 → auto**.
- **auto** (default): role-based routing as above
- **claude / gemini / local**: bypasses role routing; forces that backend type
- The toggle will be redesigned in Phase 3 to cycle through chat role slots (Primary / Backup 1 / Backup 2)
- Each slot shows the configured model label
- `auto` uses the Primary without forcing a specific backend type
- The ⚡ Tools toggle is independent — it routes to the `orchestrator` role regardless of the chat role selection
**Fallback chain** (automatic, only when no explicit registry entry exists):
```
@@ -113,6 +113,8 @@ Managed at **Settings → Models** (`/settings/models`). Full provider UI coming
"provider": "local",
"host_id": "abc123",
"context_k": 72,
"max_rounds": 5,
"tools": true,
"tags": ["fast", "local"]
}
],
@@ -125,6 +127,14 @@ Managed at **Settings → Models** (`/settings/models`). Full provider UI coming
}
```
### Optional model fields
| Field | Type | Default | Meaning |
|---|---|---|---|
| `context_k` | int | 32 | Context window in thousands of tokens. Used for compaction budget (75% of window). |
| `max_rounds` | int \| null | null | Per-model tool loop cap. `null` = use global `orchestrator_max_rounds`. Effective limit = `min(per_model, global)`. |
| `tools` | bool | true | Whether this model supports tool calling. `false` = skip tool loop entirely; model gets a plain chat request. |
### host_type (local hosts)
| `host_type` | Chat endpoint | Models endpoint | Use for |
@@ -210,13 +220,6 @@ Memory distillation uses `role="distill"`. Configure via Model Registry → Role
`.env` override: `ROLE_DISTILL=claude_cli` (default).
---
## Future: Phase 3 — Backend Toggle Redesign
The `claude → gemini → local` toggle will be replaced with a slot toggle that cycles
through the chat role's configured models (Primary → Backup 1 → Backup 2), showing
the actual model label. See `DESIGN__Model_Registry_V2.md`.
---


@@ -1,7 +1,7 @@
# Architecture: System Overview
> How the pieces fit together.
> Last updated: 2026-04-03
> Last updated: 2026-05-06
---
@@ -56,7 +56,9 @@ Details: [`ARCH__BACKENDS.md`](ARCH__BACKENDS.md) | [`ARCH__PERSONA.md`](ARCH__P
| `context_loader.py` | Builds system prompt from persona files (tiers 1-4) |
| `llm_client.py` | All LLM backends — Claude, Gemini CLI, Local |
| `orchestrator_engine.py` | Gemini API ReAct tool loop → Claude handoff |
| `session_store.py` | In-memory + file session persistence |
| `openai_orchestrator.py` | OpenAI-compatible ReAct tool loop (local models via Open WebUI/OpenRouter) |
| `model_registry.py` | Per-user model registry V2 — providers, hosts, models, role assignments |
| `session_store.py` | In-memory + file session persistence (`session_data/{id}.json`) |
| `session_logger.py` | Writes session turns to `sessions/YYYY-MM-DD.md` |
| `memory_distiller.py` | Short/mid/long distill jobs |
| `scheduler.py` | APScheduler — distill jobs + user crons |
@@ -64,20 +66,23 @@ Details: [`ARCH__BACKENDS.md`](ARCH__BACKENDS.md) | [`ARCH__PERSONA.md`](ARCH__P
| `notification.py` | Outbound channel messages (distill alerts, cron proactive) |
| `auth_utils.py` | bcrypt passwords, JWT, invite tokens, channel config |
| `auth_middleware.py` | JWT cookie validation on all routes |
| `user_settings.py` | Per-user local LLM config (hosts, models, active model) |
| `tool_audit.py` | JSONL audit log for every orchestrator tool invocation |
| `usage_tracker.py` | Per-user token usage tracking (daily buckets → `usage.json`) |
| `event_bus.py` | Internal SSE pub/sub (NC Talk → browser mirror) |
| `email_utils.py` | SMTP invite emails |
| `persona_template.py` | Bootstrap a new persona directory from templates |
| `routers/` | One file per endpoint group (chat, orchestrator, auth, files, channels, ui, settings…) |
| `tools/` | Orchestrator tool implementations (web, ae_knowledge, tasks, scratch, reminders, cron, system) |
| `static/` | Web UI — `index.html`, `app.js`, `style.css`, `login.html`, `setup.html`, `HELP.md` |
| `tests/` | pytest suite (80 tests) |
| `routers/` | One file per endpoint group: `chat`, `orchestrator`, `auth`, `files`, `ui`, `settings`, `local_llm`, `distill`, `audit`, `usage`, `push`, `help`, `onboarding`, `auth_google`, `nextcloud_talk`, `google_chat` |
| `tools/` | Orchestrator tool implementations: `web`, `tasks`, `scratch`, `reminders`, `cron`, `system`, `notify`, `ae_journals`, `ae_tasks`, `agent_notes` |
| `static/` | Web UI — `index.html`, `app.js`, `style.css`, `login.html`, `setup.html`, `HELP.md`, `local_llm.html`, `settings.html` |
| `tests/` | pytest suite |
---
## Key Design Decisions
**Two-brain pattern** — Gemini API handles tool use (function calling, planning, web search). Claude CLI handles all user-facing responses. Direct chat bypasses the orchestrator entirely.
**Two-brain pattern (Gemini orchestrator)** — Gemini API handles tool use (function calling, planning, web search). Claude CLI handles all user-facing responses. Direct chat bypasses the orchestrator entirely.
**Single-model pattern (local orchestrator)** — When the `orchestrator` role resolves to a `local_openai` model, `openai_orchestrator.py` runs the full ReAct loop and produces the final response itself. No Claude handoff — the local model does both reasoning and response.
**Subprocess backends** — Claude and Gemini run as CLI subprocesses (`claude --print`, `gemini -p`). This keeps auth transparent (Claude Code manages tokens) and avoids API costs on the Pro subscription path.
@@ -88,3 +93,33 @@ Details: [`ARCH__BACKENDS.md`](ARCH__BACKENDS.md) | [`ARCH__PERSONA.md`](ARCH__P
**Per-user filesystem layout**: `home/{user}/persona/{name}/` mirrors Linux home directories. Each persona is a directory of markdown files and JSON. No database. Easy to inspect, edit, and back up.
**No single point of coupling** — tools live in `cortex/tools/`, separate from `ae_*` MCP tools. Channels live in `cortex/routers/`, each self-contained. Adding a channel or tool doesn't touch other subsystems.
**Agent private notes**: `AGENT_NOTES.md` per persona, writable only by the orchestrator via `agent_notes_*` tools. Never loaded into user-facing context. Three rolling backups (`bak1` through `bak3`) are visible read-only in the Files panel. Declared in `tools/agent_notes.py`; usage guidance in `PROTOCOLS.md`.
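
A minimal sketch of the rolling-backup idea, for readers who want to see the rotation order. It mirrors the shape of `_rotate()` in `tools/agent_notes.py` but is a standalone illustration: `rotate` and `write_notes` are names chosen here, and `shutil.copyfile` is swapped in for the read/write pair used in the real module.

```python
import shutil
from pathlib import Path

N_BACKUPS = 3


def rotate(path: Path, n_backups: int = N_BACKUPS) -> None:
    """Shift bak2 -> bak3, bak1 -> bak2, then copy the live file to bak1."""
    if not path.exists():
        return
    # Walk from the oldest slot down so nothing is overwritten too early.
    for i in range(n_backups, 1, -1):
        newer = path.with_name(f"{path.stem}.bak{i - 1}{path.suffix}")
        older = path.with_name(f"{path.stem}.bak{i}{path.suffix}")
        if newer.exists():
            shutil.copyfile(newer, older)
    shutil.copyfile(path, path.with_name(f"{path.stem}.bak1{path.suffix}"))


def write_notes(path: Path, content: str) -> None:
    """Back up the current notes, then replace them."""
    rotate(path)
    path.write_text(content.rstrip() + "\n")
```

After five successive writes, the live file holds the fifth version and `bak1` to `bak3` hold the fourth, third, and second; the first version has aged out.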
**No black boxes** — Every component, flow, and design decision is documented. Documentation is updated before implementation of significant changes and verified after. HELP.md is the user-facing contract; ARCH__*.md files are the developer contract; PROTOCOLS.md is the agent contract. If any of these drift from reality, that is a bug.
---
## Onboarding Flow
New users are invited via a one-time token and complete a three-step setup before reaching the chat:
```
1. /setup/{token} → Set password (POST creates session cookie, consumes token)
2. /setup/persona → Create persona (slug, display name, emoji, description)
3. /setup/model → Connect a model — OpenRouter recommended
(skip link goes straight to /{user}/{persona})
```
Step 3 closes a gap in the original two-step flow (see `TODO__Agents.md § Guided onboarding`).
Without it, users land in the chat with no model configured and must navigate
Settings → Model Registry manually, which is confusing for non-technical users.
**After Step 3:**
- `save_host()` adds OpenRouter (`https://openrouter.ai/api/v1`, type `openai`)
- `save_model()` creates a model entry for the chosen model
- `set_role(chat, primary, model_id)` assigns it as the chat role primary
- Redirect to `/{user}/{persona}`
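
The sequence above can be sketched against a toy in-memory registry. Everything here is an assumption made for illustration: the real `save_host()`, `save_model()`, and `set_role()` in `model_registry.py` are not shown in this diff, so their signatures, the dict layout, and the `connect_openrouter` wrapper are hypothetical.

```python
import uuid
from typing import Any


def new_registry() -> dict[str, Any]:
    """Hypothetical in-memory stand-in for a per-user model registry."""
    return {"hosts": [], "models": [], "roles": {}}


def save_host(reg: dict, name: str, base_url: str, host_type: str, api_key: str) -> str:
    host_id = uuid.uuid4().hex[:8]
    reg["hosts"].append({"id": host_id, "name": name, "base_url": base_url,
                         "host_type": host_type, "api_key": api_key})
    return host_id


def save_model(reg: dict, host_id: str, model_name: str) -> str:
    model_id = uuid.uuid4().hex[:8]
    reg["models"].append({"id": model_id, "provider": "local",
                          "host_id": host_id, "model_name": model_name})
    return model_id


def set_role(reg: dict, role: str, slot: str, model_id: str) -> None:
    reg["roles"].setdefault(role, {})[slot] = model_id


def connect_openrouter(reg: dict, api_key: str, model_name: str) -> str:
    """Host, then model, then chat-role assignment, in the documented order."""
    host_id = save_host(reg, "OpenRouter", "https://openrouter.ai/api/v1", "openai", api_key)
    model_id = save_model(reg, host_id, model_name)
    set_role(reg, "chat", "primary", model_id)
    return model_id
```

The ordering matters because each step needs the identifier produced by the previous one: the model entry references the host, and the role slot references the model.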
**Existing users with no model configured** — a dismissable banner is shown in the chat on
load, linking to `/setup/model` (the Step 3 form works standalone, without step labels).


@@ -1,7 +1,10 @@
# Cortex / Inara — Master Index
> Start here. This document is a map, not a manual.
> Last updated: 2026-04-28
> Last updated: 2026-05-06
>
> **Documentation philosophy:** Cortex is a no-black-box system. Docs must match reality.
> Update docs before implementing significant changes. Verify they still match after.
---
@@ -17,20 +20,27 @@ Cortex is a self-hosted personal AI platform. It routes messages from any input
| Component | Status | Notes |
|---|---|---|
| Web UI | ✅ Live | SPA, dark theme, mobile-responsive, session auth |
| Web UI | ✅ Live | SPA, dark theme, mobile-responsive, PWA-installable |
| Nextcloud Talk bot | ✅ Live | HMAC-signed, per-user routing |
| Google Chat Add-on | ✅ Live | JWT-verified, per-user routing |
| Claude backend | ✅ Live | Primary — via Claude Code CLI |
| Gemini backend | ✅ Live | Fallback — via Gemini CLI |
| Local backend | ✅ Live | Third option — Open WebUI/Ollama on scott_gaming |
| Gemini orchestrator | ✅ Live | Tool loop → Claude response, ⚡ Tools toggle in UI (27 tools) |
| Model registry V2 | ✅ Live | Providers (Anthropic/Google/Local), multi-account Gemini |
| Local backend | ✅ Live | Open WebUI/Ollama on scott_gaming; per-user multi-model config |
| Gemini orchestrator | ✅ Live | Tool loop → Claude response, ⚡ toggle in UI (40 tools) |
| Local orchestrator | ✅ Live | OpenAI-compatible ReAct loop; used when orchestrator role → local model |
| Model registry V2 | ✅ Live | Providers (Anthropic/Google/Local), multi-account Gemini, role assignments |
| Memory distillation | ✅ Live | Short (daily) / Mid (weekly) / Long (monthly) |
| Multi-user | ✅ Live | Scott, Holly, Brian — each with own personas |
| Session search | ✅ Live | Full-text search across past session logs |
| Proactive cron | ✅ Live | `message` and `brief` job types → NC Talk |
| Proactive cron | ✅ Live | `message` and `brief` job types → NC Talk / web push |
| Tool audit log | ✅ Live | Every orchestrator tool call logged to `home/{user}/tool_audit/` |
| Token usage tracking | ✅ Live | Per-user daily buckets in `home/{user}/usage.json`; visible in Settings |
| Web push notifications | ✅ Live | VAPID push; `web_push` orchestrator tool; subscribe via ☰ menu |
| Agent private notes | ✅ Live | `AGENT_NOTES.md` — orchestrator-only notepad; 3 rolling backups; user-visible as read-only |
| Distill safety | ✅ Live | Per-persona asyncio lock, per-endpoint cooldowns, Rebuild option |
| Guided onboarding | ✅ Live | Setup Step 3 for OpenRouter; existing-user banner; settings quick-link |
**Active users / personas:** scott/inara, scott/developer, holly/tina, brian/wintermute
**Active users / personas:** scott/inara, holly/tina, brian/wintermute
---


@@ -54,7 +54,6 @@
## Phase 5 — Routing Intelligence & Scale
- [ ] Intelligent model routing (by task type, privacy, context length)
- [ ] Agent-to-agent task delegation across fleet
- [ ] Permanent hosting on home server (currently on `scott_lpt`)
## Phase 6 — Infrastructure
- [ ] Server DMZ finalized


@@ -7,16 +7,41 @@
## 🔴 High Priority
### [UX] User onboarding — guided model setup
New users complete password + persona setup and land directly in the chat with no working
AI model configured. This closes that gap with a guided Step 3 and a fallback for existing
users who skipped it or were onboarded before this existed.
Design spec: `documentation/ARCH__SYSTEM.md` § Onboarding Flow
- [x] **Setup Step 3 page** — new `/setup/model` GET/POST in `onboarding.py` — 2026-05-06
- Recommends OpenRouter: "one API key, access to Claude, Gemini, and dozens of other models"
- API key field + curated model dropdown (claude-3-5-haiku, claude-3-7-sonnet, gemini-2.0-flash, llama-3.3-70b)
- On submit: `save_host()` (OpenRouter) + `save_model()` + `set_role(chat, primary, model_id)` in `model_registry.py`
- Skip: `POST /setup/model/skip` reads `cx_setup_persona` cookie, redirects to chat; JS fetch on skip-link click
- Step labels updated: setup.html "1 of 3" / "2 of 3" / "3 of 3" (was "1 of 2" / "2 of 2")
- Standalone: `/setup/model` works without step labels (no `cx_setup_persona` cookie → no label)
- Persona creation now redirects to `/setup/model` instead of directly to chat
- [x] **Existing user banner** — displayed in chat if no role has a model assigned — 2026-05-06
- Checks `GET /backend` on load (uses `available_roles` — already does role-resolution)
- Dismissable amber callout strip above chat: "No AI model configured — Set up OpenRouter →"
- Dismissed via `localStorage` key `cx_no_model_banner_dismissed`; auto-removed when a model is added
- [x] **Settings quick-link** — amber card in settings Model Registry section — 2026-05-06
- Checks `GET /backend` on page load; shown if `available_roles` is empty
- Links to `/setup/model`
- [x] Update `cortex/static/HELP.md` — Getting Started section + model registry quick-connect note — 2026-05-06
- [x] Update `CLAUDE.md` — documented `/setup/model` endpoint, setup flow description, docs philosophy — 2026-05-06
### [Local] Local orchestrator — reach full parity with Gemini orchestrator
`openai_orchestrator.py` is partially built and wired into `POST /orchestrate`.
When the `orchestrator` role resolves to a `local_openai` model it routes there
automatically. Remaining work is quality/reliability parity, not ground-up design.
- [ ] Audit tool schema conversion — Gemini `FunctionDeclaration` → OpenAI `tools` format
(minor field rename, already partially done)
- [ ] Context budget enforcement per iteration (40-50k for E4B, 35-40k for 26B A4B)
- [ ] Context compaction — trim stale tool results mid-run when approaching limit
- [ ] Error handling parity with Gemini orchestrator (retry logic, malformed tool calls)
- [x] Tool schema conversion — Gemini FunctionDeclaration → OpenAI tools format
- [x] Context budget: `_context_budget()` uses `context_k * 1000 * 0.75`, min 16k — 2026-05-06
- [x] Context compaction: `_compact_messages()` trims old tool results before each round and before the confirmation-gate call — 2026-05-06
- [x] Error handling: malformed tool args caught + logged; tool execution errors returned as strings
- [ ] Retry logic on transient API errors (connection timeout, 429, 503)
- [ ] Test end-to-end with Gemma 4 E4B and 26B A4B on scott_gaming
- [ ] Review `ARCH__FUTURE.md` agent architecture ideas before finalising design
- Reference: `docs/OPEN_WEBUI_API.md`, `documentation/ARCH__FUTURE.md` §1
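
For the compaction item in the checklist above, a rough sketch of the idea: truncate older tool results until the conversation fits the budget, leaving the most recent messages intact. This is not the body of `_compact_messages()`; the four-characters-per-token estimate, the `keep_recent` window, and the stub length are all assumptions chosen for the example.

```python
from typing import Any

Message = dict[str, Any]


def estimate_tokens(messages: list[Message]) -> int:
    """Crude size estimate (assumption: roughly 4 characters per token)."""
    return sum(len(str(m.get("content") or "")) for m in messages) // 4


def compact_messages(messages: list[Message], budget_tokens: int,
                     keep_recent: int = 4, stub_chars: int = 200) -> list[Message]:
    """Return a copy with old tool results truncated until under budget."""
    compacted = [dict(m) for m in messages]       # leave the caller's list alone
    cutoff = max(0, len(compacted) - keep_recent)  # never touch the newest messages
    for msg in compacted[:cutoff]:
        if estimate_tokens(compacted) <= budget_tokens:
            break
        content = msg.get("content") or ""
        if msg.get("role") == "tool" and len(content) > stub_chars:
            msg["content"] = content[:stub_chars] + " [truncated]"
    return compacted
```

Working oldest-first means the results most likely to be stale are trimmed before anything the model has just produced, and the loop stops as soon as the estimate is back under budget.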
@@ -117,7 +142,7 @@ Multi-user setup with real Gemini/Claude API costs. Track per-user token consump
so Scott can see who's spending what.
- [x] Count input + output tokens — local backend (OpenAI `usage` field) + Gemini API (`usage_metadata`) — 2026-05-05
- [x] Append to `home/{user}/usage.json` — daily buckets, per-model breakdown — 2026-05-05
- [ ] Expose via `/api/usage` endpoint; add a summary row to the Settings page
- [x] Expose via `/api/usage` + `/api/usage/summary` + `/api/usage/all` (admin); usage table in Settings — 2026-05-06
- [ ] Optional: soft spending limit with a warning toast when exceeded
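
A small sketch of the "daily buckets, per-model breakdown" structure and the summary rows the Settings table consumes. The row keys (`key`, `calls`, `prompt_tokens`, `completion_tokens`, `total_tokens`) match what the Settings script reads; the function names and the exact on-disk layout of `usage.json` are assumptions.

```python
def record_usage(usage: dict, day: str, model: str,
                 prompt_tokens: int, completion_tokens: int) -> None:
    """Add one call to the bucket for (day, model), creating it if needed."""
    bucket = usage.setdefault(day, {}).setdefault(
        model, {"calls": 0, "prompt_tokens": 0, "completion_tokens": 0})
    bucket["calls"] += 1
    bucket["prompt_tokens"] += prompt_tokens
    bucket["completion_tokens"] += completion_tokens


def summarize(usage: dict) -> list[dict]:
    """Collapse daily buckets into one row per model, largest total first."""
    totals: dict[str, dict] = {}
    for models in usage.values():
        for model, b in models.items():
            row = totals.setdefault(model, {
                "key": model, "calls": 0, "prompt_tokens": 0,
                "completion_tokens": 0, "total_tokens": 0})
            row["calls"] += b["calls"]
            row["prompt_tokens"] += b["prompt_tokens"]
            row["completion_tokens"] += b["completion_tokens"]
            row["total_tokens"] += b["prompt_tokens"] + b["completion_tokens"]
    return sorted(totals.values(), key=lambda r: r["total_tokens"], reverse=True)


def fmt(n: int) -> str:
    """Compact display, in the spirit of the table's `fmt` helper."""
    return f"{n / 1000:.1f}k" if n >= 1000 else str(n)
```

Keeping the raw buckets per day and aggregating only on read means a later per-period view or a soft spending limit can reuse the same file without a format change.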
### [Security] Tool call audit log — 2026-05-05
@@ -166,15 +191,6 @@ the foundation. What remains is removing the need to toggle manually.
- Fast/cheap queries → local E4B (25 t/s, no API cost)
- [ ] Routing logic in `llm_client.py` or new `router.py`; expose override in UI
### [Ops] Permanent fleet hosting — home server deployment
Currently running on `scott-lt-i7-rtx` (gaming laptop). Long-term target is the
home server for always-on reliability. `docker-compose.yml` already exists.
- [ ] Copy project to home server
- [ ] Configure Nginx reverse proxy (already Docker-hosted on that machine)
- [ ] Point `cortex.dgrzone.com` → home server internal IP (pfSense alias update)
- [ ] WireGuard required for all access — not internet-exposed
- [ ] Update `FLEET_MANIFEST.md` to reflect new hosting location
### [Future] Cortex Mesh — multi-instance fleet coordination
Each fleet device runs its own Cortex instance. Instances delegate tasks to each
other based on resources and specialisation. No central coordinator required.