diff --git a/CLAUDE.md b/CLAUDE.md index 7f656c4..e0878d6 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -146,8 +146,8 @@ http://localhost:8000/docs - Tools are registered in `cortex/tools/__init__.py` as both Gemini FunctionDeclarations and Python callables ### Context / Memory -- `context_loader.py` assembles Inara's system prompt from `inara/` files based on tier (1–3) -- Tier 1 = minimal (identity only); Tier 2 = standard (+ memory + user profile); Tier 3 = full +- `context_loader.py` assembles Inara's system prompt from `inara/` files based on tier (1–4) +- Tier 1 = minimal (identity only); Tier 2 = standard (+ memory + user profile); Tier 3 = + last 2 sessions; Tier 4 = + last 7 sessions - Memory files are written by the distiller or manually — do not delete them ### Security / Safety @@ -160,6 +160,31 @@ http://localhost:8000/docs - Passwords are bcrypt-hashed and stored in `home/{username}/auth.json` — never in `.env` or the DB - Invite tokens are one-time-use, 72-hour expiry, stored in `home/{username}/invite.json` +### Onboarding Flow +New users follow a three-step setup before reaching the chat: +1. `GET /setup/{token}` → password form → `POST /setup/{token}` sets password + session cookie +2. `GET /setup/persona` → persona creation form → `POST /setup/persona` bootstraps persona directory +3. `GET /setup/model` → OpenRouter quick-connect → `POST /setup/model` saves host + model + role assignment + +Step 3 is optional (skip link goes straight to `/{user}/{persona}`). `/setup/model` also works +standalone (accessible from Settings) for existing users who haven't configured a model. + +All in `cortex/routers/onboarding.py`. Model writes use `model_registry.py`: `save_host()`, +`save_model()`, `set_role(username, "chat", "primary", model_id)`. + +### Documentation Philosophy +Cortex is a no-black-box system. Docs must match reality — at all times. 
+ +- **Docs first:** When planning significant changes, update `TODO__Agents.md` and the relevant + `ARCH__*.md` to describe the intended design *before* implementing. This creates a spec to + implement against. +- **Verify after:** Once implementation is complete, re-read the pre-written docs and confirm + they match what was actually built. Update anything that drifted. +- **HELP.md is a user contract:** It describes what users can do. Never let it describe + features that don't exist or omit features that do. +- **CLAUDE.md + ARCH__*.md are the developer contract:** Update them as the architecture evolves. +- **Stale docs are bugs.** If you notice drift, fix it before moving on. + --- ## Adding a New Tool @@ -212,19 +237,23 @@ clearly asked for a directory to be unblocked. --- -## Current State (2026-04-28) +## Current State (2026-05-06) Cortex is running and stable. All channels are live: | Channel | Status | Notes | |---|---|---| -| Web UI | ✅ Live | `https://cortex.dgrzone.com` | +| Web UI | ✅ Live | `https://cortex.dgrzone.com` — PWA-installable | | Nextcloud Talk | ✅ Live | HMAC-signed webhook, async reply | | Google Chat | ✅ Live | Workspace Add-on, `hostAppDataAction` response format | -| Local backend | ✅ Live | Open WebUI/Ollama, per-user multi-model config | -| Orchestrator | ✅ Live | Gemini API tool loop → Claude response; ⚡ toggle in UI | +| Local backend | ✅ Live | Open WebUI/Ollama on scott_gaming, per-user multi-model config | +| Gemini orchestrator | ✅ Live | Gemini API tool loop → Claude response; ⚡ toggle in UI | +| Local orchestrator | ✅ Live | OpenAI-compatible ReAct loop; fires when orchestrator role → local model | +| Tool audit log | ✅ Live | Every tool call logged to `home/{user}/tool_audit/YYYY-MM-DD.jsonl` | +| Token usage tracking | ✅ Live | Per-user `home/{user}/usage.json`; summary in Settings | +| Web push | ✅ Live | VAPID push notifications; `web_push` tool; subscribe via ☰ menu | -Active users: scott (inara, developer), holly 
(tina), brian (wintermute) +Active users: scott (inara), holly (tina), brian (wintermute) **40 orchestrator tools:** web_search, http_fetch, file_read/list/write, shell_exec, claude_allow_dir, diff --git a/README.md b/README.md index 1749b32..e928a7e 100644 --- a/README.md +++ b/README.md @@ -10,6 +10,43 @@ Cortex is a self-hosted multi-agent AI platform. It supports multiple users, eac --- +## Where Cortex Fits + +AI tools aren't one-size-fits-all. Cortex exists in a specific niche — it's not trying to be everything. + +**Cortex is a self-hosted persona platform.** It gives you a persistent AI companion with its own +identity, memory, and voice — reachable through your chat apps, not just a browser tab. It remembers +who you are across days and weeks. It can proactively message you on a schedule. It runs on your +own hardware, behind your own auth. + +### What Cortex is good at +- **Being a consistent AI presence** — same persona, same memory, day after day +- **Multi-channel access** — web, Nextcloud Talk, Google Chat, all routed to the same brain +- **Proactive work** — scheduled messages, reminders, cron jobs that reach out to you +- **Multi-user households** — each person gets their own persona (Scott → Inara, Holly → Tina) +- **Private, offline-capable** — local models via Ollama when you don't want anything leaving the LAN + +### What Cortex is not +- **Not a coding assistant.** Cortex lives in chat apps, not in your terminal or IDE. + Use Claude Code, DeepSeek TUI, Gemini CLI, or Copilot for code-level work — they specialize in reading and + editing project files. Cortex can't open a codebase. +- **Not a generic LLM chat UI.** Open WebUI and LibreChat are excellent model-switching frontends. + Cortex isn't a frontend — it's a platform with its own identity system, orchestrator, and memory + pipeline. Two different jobs. +- **Not a SaaS product.** Nobody else hosts your Cortex instance. Nobody else sees your conversations. 
+ The trade-off is you manage the service yourself — `systemctl --user restart cortex`. +- **Not an agent framework.** LangChain, CrewAI, and similar are libraries for building AI pipelines. + Cortex is a running service with concrete personas, not an abstraction layer to build on top of. + +### The stack in practice +- Use **Cortex** to talk to Inara — daily assistant, memory keeper, scheduled check-ins +- Use **Claude Code / DeepSeek TUI** to work *on* Cortex — code edits, architecture, debugging +- Use **Open WebUI** when you want to test a new model or run a quick prompt without persona context + +Same AI, different interfaces for different jobs. + +--- + ## Quick Orientation | Directory | What it is | diff --git a/cortex/main.py b/cortex/main.py index 69ae419..fa133b9 100644 --- a/cortex/main.py +++ b/cortex/main.py @@ -9,7 +9,7 @@ logging.basicConfig(level=logging.INFO, format="%(levelname)s:%(name)s: %(messag from config import settings from auth_middleware import SessionAuthMiddleware from routers import chat, google_chat, nextcloud_talk, files, distill, auth, orchestrator -from routers import ui, onboarding, settings, help, auth_google, local_llm, push, audit +from routers import ui, onboarding, settings, help, auth_google, local_llm, push, audit, usage @asynccontextmanager @@ -36,6 +36,7 @@ app.include_router(auth.router) app.include_router(orchestrator.router) app.include_router(push.router) app.include_router(audit.router) +app.include_router(usage.router) # Static files — must be mounted BEFORE ui.router so /static/* is matched first. # ui.router has a wildcard /{username}/{persona} that would otherwise catch /static/style.css etc. 
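The ordering note in the `cortex/main.py` hunk above (static files mounted before `ui.router`) can be illustrated with a small first-match resolver. This is a minimal sketch in plain Python, not the framework's actual routing code: `compile_route`, `first_match`, and the `STATIC` / `WILDCARD` tuples are hypothetical names introduced only for illustration, and the sketch assumes each `{param}` matches exactly one path segment.

```python
import re


def compile_route(pattern: str) -> re.Pattern:
    # Hypothetical helper: turn "/{username}/{persona}" into a regex in which
    # every {param} matches exactly one path segment (no slashes).
    regex = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", pattern)
    return re.compile(f"^{regex}$")


def first_match(routes, path):
    # Walk the routes in registration order and return the first one that matches.
    for name, pattern, is_prefix in routes:
        if is_prefix:
            # Mount-style route: matches the prefix itself or anything below it.
            if path == pattern or path.startswith(pattern + "/"):
                return name
        elif compile_route(pattern).match(path):
            return name
    return None


# Illustrative stand-ins for the static mount and the UI wildcard route.
STATIC = ("static", "/static", True)
WILDCARD = ("ui", "/{username}/{persona}", False)

# Static registered first: the stylesheet is served by the static mount.
print(first_match([STATIC, WILDCARD], "/static/style.css"))
# Wildcard registered first: the same path is captured as username="static".
print(first_match([WILDCARD, STATIC], "/static/style.css"))
```

Under these assumptions, swapping the registration order is enough to send `/static/style.css` to the wildcard handler, which is the failure the comment in `main.py` warns about.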
diff --git a/cortex/model_registry.py b/cortex/model_registry.py index b47d021..0d52b05 100644 --- a/cortex/model_registry.py +++ b/cortex/model_registry.py @@ -36,6 +36,7 @@ V2 Schema: "credential_id":str | null, # claude_cli only — references providers.anthropic.credentials "account_id": str | null, # gemini_api only — references providers.google.accounts "context_k": int, # context window in k tokens (informational) + "max_rounds": int | null, # per-model tool-loop cap; null = use orchestrator_max_rounds global "tags": [str], # user-defined capability tags }, ], @@ -642,7 +643,9 @@ def remove_host(username: str, host_id: str) -> bool: def save_model(username: str, model_id: str | None, host_id: str, label: str, model_name: str, context_k: int = 0, - tags: list[str] | None = None) -> str: + tags: list[str] | None = None, + max_rounds: int | None = None, + tools: bool = True) -> str: """Create or update a local_openai model entry. Returns the model ID.""" data = _load(username) tags = tags or [] @@ -654,6 +657,8 @@ def save_model(username: str, model_id: str | None, host_id: str, m["label"] = label.strip() or model_name.strip() m["model_name"] = model_name.strip() m["context_k"] = context_k + m["max_rounds"] = max_rounds + m["tools"] = tools m["tags"] = tags _save(username, data) return model_id @@ -668,6 +673,8 @@ def save_model(username: str, model_id: str | None, host_id: str, "provider": "local", "host_id": host_id, "context_k": context_k, + "max_rounds": max_rounds, + "tools": tools, "tags": tags, }) _save(username, data) @@ -679,7 +686,9 @@ def save_cloud_model(username: str, model_id: str | None, account_id: str | None = None, credential_id: str | None = None, context_k: int = 0, - tags: list[str] | None = None) -> str: + tags: list[str] | None = None, + max_rounds: int | None = None, + tools: bool = True) -> str: """ Create or update an Anthropic or Google model entry. Returns the model ID. 
@@ -698,6 +707,8 @@ def save_cloud_model(username: str, model_id: str | None, "model_name": model_name.strip(), "provider": provider, "context_k": context_k, + "max_rounds": max_rounds, + "tools": tools, "tags": tags, } if account_id: diff --git a/cortex/openai_orchestrator.py b/cortex/openai_orchestrator.py index 881fc57..cce3d42 100644 --- a/cortex/openai_orchestrator.py +++ b/cortex/openai_orchestrator.py @@ -273,18 +273,20 @@ async def _run_from_messages( final_response = "" budget = _context_budget(model_cfg) - for round_num in range(starting_round, settings.orchestrator_max_rounds): + per_model_limit = (model_cfg or {}).get("max_rounds") or settings.orchestrator_max_rounds + effective_limit = min(per_model_limit, settings.orchestrator_max_rounds) + + for round_num in range(starting_round, effective_limit): messages = _compact_messages(messages, budget) est = _estimate_tokens(messages) logger.info("OpenAI orchestrator round %d / %d model=%s ~%d tokens", - round_num + 1, settings.orchestrator_max_rounds, model_name, est) + round_num + 1, effective_limit, model_name, est) - response = await client.chat.completions.create( - model=model_name, - messages=messages, - tools=active_tools, - tool_choice="auto", - ) + call_kwargs: dict = {"model": model_name, "messages": messages} + if active_tools: + call_kwargs["tools"] = active_tools + call_kwargs["tool_choice"] = "auto" + response = await client.chat.completions.create(**call_kwargs) choice = response.choices[0] msg = choice.message @@ -339,12 +341,11 @@ async def _run_from_messages( tool_call_log.append({"tool": pt["name"], "args": pt["args"], "result": "[awaiting confirmation]"}) messages.append({"role": "tool", "tool_call_id": pt["tool_call_id"], "content": placeholder}) - conf_resp = await client.chat.completions.create( - model=model_name, - messages=messages, - tools=active_tools, - tool_choice="none", - ) + messages = _compact_messages(messages, budget) + conf_call: dict = {"model": model_name, "messages": 
messages, "tool_choice": "none"} + if active_tools: + conf_call["tools"] = active_tools + conf_resp = await client.chat.completions.create(**conf_call) final_response = conf_resp.choices[0].message.content or ( "This action requires your explicit confirmation before it can proceed." ) @@ -375,9 +376,9 @@ async def _run_from_messages( break else: - logger.warning("OpenAI orchestrator hit max rounds (%d)", settings.orchestrator_max_rounds) + logger.warning("OpenAI orchestrator hit max rounds (%d)", effective_limit) final_response = ( - f"Reached the tool iteration limit ({settings.orchestrator_max_rounds} rounds). " + f"Reached the tool iteration limit ({effective_limit} rounds). " "Here is what was gathered:\n\n" + "\n\n".join(f"**{t['tool']}**: {t['result'][:500]}" for t in tool_call_log) ) @@ -405,7 +406,10 @@ def _build_client( if host_type == "openwebui": base_url = base_url + "/api" client = AsyncOpenAI(base_url=base_url, api_key=api_key) - active_tools = get_openai_tools_for_role(user_role, tool_list) + if model_cfg.get("tools") is False: + active_tools = [] + else: + active_tools = get_openai_tools_for_role(user_role, tool_list) return client, model_name, active_tools diff --git a/cortex/routers/chat.py b/cortex/routers/chat.py index f888253..bec1b48 100644 --- a/cortex/routers/chat.py +++ b/cortex/routers/chat.py @@ -295,6 +295,53 @@ async def rename_session_endpoint( return {"ok": True, "session_id": session_id, "name": req.name.strip()} +@router.post("/api/sessions/backfill-names") +async def backfill_session_names( + request: Request, + user: str = Query(""), + persona: str = Query(""), +) -> dict: + """Name every unnamed session using its first user message (truncated to 60 chars). + Idempotent — only touches sessions that have no name set. 
+ user/persona default to the JWT session user + last-used persona cookie.""" + # Resolve user from JWT if not provided + if not user: + token = request.cookies.get(COOKIE_NAME) + if not token: + raise HTTPException(status_code=401, detail="Not authenticated") + try: + user = decode_token(token) + except jwt.InvalidTokenError: + raise HTTPException(status_code=401, detail="Invalid session") + + # Resolve persona from cookie if not provided + if not persona: + from persona import list_user_personas + persona_cookie = request.cookies.get("cx_last_persona", "") + available = list_user_personas(user) + persona = persona_cookie if persona_cookie in available else (available[0] if available else "") + if not persona: + raise HTTPException(status_code=400, detail="No persona found for user") + + _set_ctx(user, persona) + sessions = list_all() + named = 0 + for s in sessions: + if s.get("name"): + continue + messages = load_session(s["session_id"]) + first_user = next((m for m in messages if m.get("role") == "user"), None) + if not first_user: + continue + text = (first_user.get("content") or "").strip() + if not text: + continue + auto_name = text[:60].rstrip() + ("…" if len(text) > 60 else "") + rename_session(s["session_id"], auto_name) + named += 1 + return {"ok": True, "named": named, "total": len(sessions)} + + @router.delete("/sessions/{session_id}") async def delete_session_endpoint( session_id: str, diff --git a/cortex/routers/distill.py b/cortex/routers/distill.py index d7fb4eb..2253e50 100644 --- a/cortex/routers/distill.py +++ b/cortex/routers/distill.py @@ -1,25 +1,50 @@ """ Manual memory distillation endpoints. 
- POST /distill/short — roll session logs → MEMORY_SHORT.md (no LLM) - POST /distill/mid — summarize short → MEMORY_MID.md (LLM) - POST /distill/long — integrate mid → MEMORY_LONG.md (LLM) - POST /distill/all — run all three in sequence + POST /distill/short — roll session logs → MEMORY_SHORT.md (no LLM) + POST /distill/mid — summarize short → MEMORY_MID.md (LLM) + POST /distill/long — integrate mid → MEMORY_LONG.md (LLM) + POST /distill/all — run all three in sequence + POST /distill/rebuild — wipe mid + long, then run all three from scratch -All endpoints require ?user=&persona= query params so distillation -targets the correct persona. Without them, the request is rejected (no silent fallback -to server defaults — that caused wrong-user distillation in a multi-user setup). +All endpoints require ?user=&persona= query params. + +Concurrency: one distillation at a time per persona. A second request while one +is running returns 409 immediately — no silent queuing. """ +import asyncio +from datetime import datetime, timedelta from fastapi import APIRouter, HTTPException, Query from memory_distiller import distill_short, distill_mid, distill_long -from persona import validate as validate_persona, set_context +from persona import validate as validate_persona, set_context, persona_path as _persona_path import scheduler router = APIRouter(prefix="/distill") +# Per-persona asyncio lock. Key: (user, persona) +_LOCKS: dict[tuple, asyncio.Lock] = {} +_LOCKS_META: dict[tuple, str] = {} # key → which step is currently running + +# Minimum time between successive runs of each endpoint, per persona. +# Prevents accidental rapid-fire runs and token waste. 
+_COOLDOWNS: dict[str, timedelta] = { + "short": timedelta(minutes=1), + "mid": timedelta(minutes=30), + "long": timedelta(hours=6), + "all": timedelta(hours=1), + "rebuild": timedelta(hours=6), +} +_LAST_RUN: dict[tuple, datetime] = {} # key: (user, persona, endpoint) + + +def _get_lock(user: str, persona: str) -> asyncio.Lock: + key = (user, persona) + if key not in _LOCKS: + _LOCKS[key] = asyncio.Lock() + return _LOCKS[key] + def _resolve(user: str, persona: str) -> tuple[str, str]: - """Validate and set persona context. Raises 404 if the persona doesn't exist.""" try: u, p = validate_persona(user, persona) except Exception: @@ -28,13 +53,51 @@ def _resolve(user: str, persona: str) -> tuple[str, str]: return u, p +def _check_lock(user: str, persona: str) -> asyncio.Lock: + """Return the lock if free, raise 409 if already held.""" + lock = _get_lock(user, persona) + if lock.locked(): + step = _LOCKS_META.get((user, persona), "distillation") + raise HTTPException( + status_code=409, + detail=f"A {step} is already running for {persona} — please wait for it to finish.", + ) + return lock + + +def _check_cooldown(user: str, persona: str, endpoint: str) -> None: + """Raise 429 if the endpoint was run too recently for this persona.""" + cooldown = _COOLDOWNS.get(endpoint) + if not cooldown: + return + key = (user, persona, endpoint) + last = _LAST_RUN.get(key) + if last: + elapsed = datetime.now() - last + if elapsed < cooldown: + remaining = cooldown - elapsed + mins = int(remaining.total_seconds() // 60) + secs = int(remaining.total_seconds() % 60) + wait = f"{mins}m {secs}s" if mins else f"{secs}s" + raise HTTPException( + status_code=429, + detail=f"{endpoint} was just run — please wait {wait} before running again.", + ) + + +def _record_run(user: str, persona: str, endpoint: str) -> None: + _LAST_RUN[(user, persona, endpoint)] = datetime.now() + + @router.get("/status") async def distill_status() -> dict: - """Show auto-distillation schedule and next run 
times.""" from config import settings + # Include which personas are currently distilling + active = [f"{u}/{p}" for (u, p), lock in _LOCKS.items() if lock.locked()] return { "enabled": settings.auto_distill, "jobs": scheduler.status(), + "active": active, "config": { "short": settings.auto_distill_short, "mid": settings.auto_distill_mid, @@ -49,7 +112,16 @@ async def do_distill_short( persona: str = Query(...), ) -> dict: u, p = _resolve(user, persona) - return {"ok": True, **distill_short(u, p)} + _check_cooldown(u, p, "short") + lock = _check_lock(u, p) + async with lock: + _LOCKS_META[(u, p)] = "short distill" + try: + result = distill_short(u, p) + _record_run(u, p, "short") + return {"ok": True, **result} + finally: + _LOCKS_META.pop((u, p), None) @router.post("/mid") @@ -58,8 +130,17 @@ async def do_distill_mid( persona: str = Query(...), ) -> dict: u, p = _resolve(user, persona) - result = await distill_mid(u, p) - return {"ok": "error" not in result, **result} + _check_cooldown(u, p, "mid") + lock = _check_lock(u, p) + async with lock: + _LOCKS_META[(u, p)] = "mid distill" + try: + result = await distill_mid(u, p) + if "error" not in result: + _record_run(u, p, "mid") + return {"ok": "error" not in result, **result} + finally: + _LOCKS_META.pop((u, p), None) @router.post("/long") @@ -68,8 +149,17 @@ async def do_distill_long( persona: str = Query(...), ) -> dict: u, p = _resolve(user, persona) - result = await distill_long(u, p) - return {"ok": "error" not in result, **result} + _check_cooldown(u, p, "long") + lock = _check_lock(u, p) + async with lock: + _LOCKS_META[(u, p)] = "long distill" + try: + result = await distill_long(u, p) + if "error" not in result: + _record_run(u, p, "long") + return {"ok": "error" not in result, **result} + finally: + _LOCKS_META.pop((u, p), None) @router.post("/all") @@ -78,14 +168,71 @@ async def do_distill_all( persona: str = Query(...), ) -> dict: u, p = _resolve(user, persona) - short_result = distill_short(u, p) - 
mid_result = await distill_mid(u, p) - if "error" in mid_result: - return {"ok": False, "short": short_result, "mid": mid_result} - long_result = await distill_long(u, p) - return { - "ok": "error" not in long_result, - "short": short_result, - "mid": mid_result, - "long": long_result, - } + _check_cooldown(u, p, "all") + lock = _check_lock(u, p) + async with lock: + _LOCKS_META[(u, p)] = "full distill" + try: + short_result = distill_short(u, p) + mid_result = await distill_mid(u, p) + if "error" in mid_result: + return {"ok": False, "short": short_result, "mid": mid_result} + long_result = await distill_long(u, p) + ok = "error" not in long_result + if ok: + _record_run(u, p, "all") + return { + "ok": ok, + "short": short_result, + "mid": mid_result, + "long": long_result, + } + finally: + _LOCKS_META.pop((u, p), None) + + +@router.post("/rebuild") +async def do_distill_rebuild( + user: str = Query(...), + persona: str = Query(...), +) -> dict: # noqa: E501 + """Wipe MEMORY_MID and MEMORY_LONG (with backups), then run short → mid → long. + + Use when memories have drifted, been corrupted, or you want a clean slate + rebuilt purely from session logs. Hand-edited content will be replaced. 
+ """ + u, p = _resolve(user, persona) + _check_cooldown(u, p, "rebuild") + lock = _check_lock(u, p) + async with lock: + _LOCKS_META[(u, p)] = "memory rebuild" + try: + from memory_distiller import _rotate_backup + inara_dir = _persona_path(u, p) + + # Back up then wipe mid and long before rebuilding + for name in ("MEMORY_MID.md", "MEMORY_LONG.md"): + path = inara_dir / name + if path.exists(): + _rotate_backup(path) + path.write_text( + f"# {name}\n\n*Cleared for rebuild — {datetime.now().strftime('%Y-%m-%d %H:%M')}.*\n" + ) + + short_result = distill_short(u, p) + mid_result = await distill_mid(u, p) + if "error" in mid_result: + return {"ok": False, "short": short_result, "mid": mid_result, "rebuilt": True} + long_result = await distill_long(u, p) + ok = "error" not in long_result + if ok: + _record_run(u, p, "rebuild") + return { + "ok": ok, + "short": short_result, + "mid": mid_result, + "long": long_result, + "rebuilt": True, + } + finally: + _LOCKS_META.pop((u, p), None) diff --git a/cortex/routers/files.py b/cortex/routers/files.py index d37d091..17d0a77 100644 --- a/cortex/routers/files.py +++ b/cortex/routers/files.py @@ -27,10 +27,21 @@ ALLOWED = { "MEMORY_SHORT.bak1.md", "MEMORY_SHORT.bak2.md", "HELP.md", + # Agent private notes — backups only; AGENT_NOTES.md itself is agent-only + "AGENT_NOTES.bak1.md", + "AGENT_NOTES.bak2.md", + "AGENT_NOTES.bak3.md", +} + +# Files that can be read via the panel but not written by users +READ_ONLY = { + "AGENT_NOTES.bak1.md", + "AGENT_NOTES.bak2.md", + "AGENT_NOTES.bak3.md", } # Files served from home/{user}/ instead of persona path -USER_FILES = {"email_allowlist.json"} +USER_FILES = {"email_allowlist.json", "usage.json"} def _resolve(user: str, persona: str) -> None: @@ -92,7 +103,11 @@ async def get_file( p = _path(filename, user=user) if not p.exists(): raise HTTPException(status_code=404, detail=f"{filename} does not exist") - return {"name": filename, "content": p.read_text()} + 
return { + "name": filename, + "content": p.read_text(), + "readonly": filename in READ_ONLY, + } class FileWrite(BaseModel): @@ -106,6 +121,8 @@ async def save_file( user: str = Query("scott"), persona: str = Query("inara"), ) -> dict: + if filename in READ_ONLY: + raise HTTPException(status_code=403, detail=f"{filename} is read-only.") _resolve(user, persona) p = _path(filename, user=user) p.write_text(req.content) diff --git a/cortex/routers/local_llm.py b/cortex/routers/local_llm.py index 4475b39..7b168dd 100644 --- a/cortex/routers/local_llm.py +++ b/cortex/routers/local_llm.py @@ -159,7 +159,8 @@ def _render(username: str, success: str = "", error: str = "") -> str: else: secondary = default_secondary - ctx = f'{m.get("context_k",0)}k' if m.get("context_k") else "" + ctx = f'{m.get("context_k",0)}k' if m.get("context_k") else "" + no_tools = '' if m.get("tools", True) else 'no tools' tags_html = " ".join(f'{t}' for t in (m.get("tags") or [])) sec = f'{secondary}' if secondary else "" @@ -201,13 +202,15 @@ def _render(username: str, success: str = "", error: str = "") -> str: cur_label = m.get("label", "") cur_model_name = m.get("model_name", "") cur_ctx = m.get("context_k", 0) or 0 + cur_max_rounds = m.get("max_rounds") or 0 + cur_tools = m.get("tools", True) cur_tags = ", ".join(m.get("tags") or []) model_rows += f'''
-
{badge}{m.get("label") or m.get("model_name","")}{ctx}
+
{badge}{m.get("label") or m.get("model_name","")}{ctx}{no_tools}
{m.get("model_name","")} {sec}
{tags_html}
@@ -239,8 +242,22 @@ def _render(username: str, success: str = "", error: str = "") -> str: {extra_fields}
- - + + +
+
+ + +
+
+ +
@@ -426,6 +443,8 @@ async def add_model( provider: str = Form("local"), label: str = Form(""), context_k: int = Form(0), + max_rounds: int = Form(0), + tools: int = Form(1), tags: str = Form(""), # local-only fields host_id: str = Form(""), @@ -439,14 +458,17 @@ async def add_model( if not username: return RedirectResponse("/login", status_code=302) - tag_list = [t.strip() for t in tags.split(",") if t.strip()] + tag_list = [t.strip() for t in tags.split(",") if t.strip()] + max_rounds_ = max_rounds or None + tools_bool = tools != 0 if provider == "local": if not model_name.strip(): return HTMLResponse(_render(username, error="Model name is required.")) if not host_id.strip(): return HTMLResponse(_render(username, error="Select a host.")) - reg.save_model(username, None, host_id, label, model_name, context_k, tag_list) + reg.save_model(username, None, host_id, label, model_name, context_k, tag_list, + max_rounds=max_rounds_, tools=tools_bool) display = label or model_name elif provider in ("google", "anthropic"): @@ -459,6 +481,7 @@ async def add_model( account_id=account_id or None, credential_id=credential_id or None, context_k=context_k, tags=tag_list, + max_rounds=max_rounds_, tools=tools_bool, ) display = label or cloud_model_name else: @@ -476,6 +499,8 @@ async def edit_model( label: str = Form(""), model_name: str = Form(""), context_k: int = Form(0), + max_rounds: int = Form(0), + tools: int = Form(1), tags: str = Form(""), host_id: str = Form(""), account_id: str = Form(""), @@ -486,17 +511,22 @@ async def edit_model( return RedirectResponse("/login", status_code=302) if not model_name.strip(): return HTMLResponse(_render(username, error="Model name is required.")) - tag_list = [t.strip() for t in tags.split(",") if t.strip()] + tag_list = [t.strip() for t in tags.split(",") if t.strip()] + max_rounds_ = max_rounds or None + tools_bool = tools != 0 if mtype == "local_openai": if not host_id.strip(): return HTMLResponse(_render(username, error="Select a 
host for this model.")) - reg.save_model(username, model_id, host_id, label, model_name, context_k, tag_list) + reg.save_model(username, model_id, host_id, label, model_name, context_k, tag_list, + max_rounds=max_rounds_, tools=tools_bool) elif mtype == "gemini_api": reg.save_cloud_model(username, model_id, "google", model_name, label, - account_id=account_id or None, context_k=context_k, tags=tag_list) + account_id=account_id or None, context_k=context_k, tags=tag_list, + max_rounds=max_rounds_, tools=tools_bool) elif mtype == "claude_cli": reg.save_cloud_model(username, model_id, "anthropic", model_name, label, - credential_id=credential_id or "cli", context_k=context_k, tags=tag_list) + credential_id=credential_id or "cli", context_k=context_k, tags=tag_list, + max_rounds=max_rounds_, tools=tools_bool) else: return HTMLResponse(_render(username, error=f"Unknown model type: {mtype}")) display = label.strip() or model_name.strip() diff --git a/cortex/routers/onboarding.py b/cortex/routers/onboarding.py index c8c8554..773d4e7 100644 --- a/cortex/routers/onboarding.py +++ b/cortex/routers/onboarding.py @@ -1,11 +1,13 @@ """ -Onboarding router — invite-based setup + persona creation. +Onboarding router — invite-based setup + persona creation + model connect. 
Routes: GET /setup/{token} → show password setup form (step 1) POST /setup/{token} → set password, redirect to persona step GET /setup/persona → show persona creation form (step 2, requires auth) - POST /setup/persona → create persona, redirect to /{user}/{persona} + POST /setup/persona → create persona, redirect to /setup/model + GET /setup/model → OpenRouter quick-connect (step 3, also standalone) + POST /setup/model → save host + model + assign to chat role, redirect to chat """ import logging @@ -21,6 +23,7 @@ from auth_utils import ( ) from persona_template import create_persona from persona import list_user_personas, validate as validate_persona +import model_registry logger = logging.getLogger(__name__) router = APIRouter(prefix="/setup") @@ -114,7 +117,11 @@ async def persona_submit( description=description.strip(), ) logger.info("persona created: %s/%s", username, persona_name) - return RedirectResponse(f"/{username}/{persona_name}", status_code=302) + # Step 3: guided model setup before entering the chat + resp = RedirectResponse("/setup/model", status_code=302) + # Remember which persona to land on after model setup + resp.set_cookie("cx_setup_persona", f"{username}/{persona_name}", max_age=3600, httponly=True, samesite="lax") + return resp # --------------------------------------------------------------------------- @@ -178,3 +185,126 @@ async def setup_submit( return resp return HTMLResponse(_setup_page("Unknown step."), status_code=400) + + +# --------------------------------------------------------------------------- +# Step 3 — model connect (OpenRouter quick-connect, also standalone) +# --------------------------------------------------------------------------- + +# Curated model list shown in the Step 3 dropdown. 
+_OPENROUTER_MODELS = [ + ("anthropic/claude-3-5-haiku-20241022", "Claude 3.5 Haiku — Fast & affordable"), + ("anthropic/claude-3-7-sonnet-20250219", "Claude 3.7 Sonnet — Smarter Claude"), + ("google/gemini-2.0-flash-001", "Gemini 2.0 Flash — Fast Google model"), + ("meta-llama/llama-3.3-70b-instruct", "Llama 3.3 70B — Open source"), +] + + +def _model_page(error: str = "", from_setup: bool = False) -> str: + html = (_STATIC / "setup.html").read_text() + # Hide steps 1 and 2 inline; show step 3 + html = html.replace('
', '