docs: update end-user HELP.md for model registry V2
- Backends section: add local as third backend option, explain model tag on responses, clarify auto vs explicit toggle behavior
- Agent Mode: remove hard-coded "Gemini" reference — orchestrator model is now configurable via role assignments
- New Model Registry section: step-by-step for adding Google accounts, local hosts, cloud/local model entries, and role assignments
- API reference: add local to model field, add /settings/models endpoint
- Remove outdated In Progress section (local backend + multi-user shipped)
- Header controls table: update Backend description

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -6,7 +6,7 @@
and are appended automatically by help.html when present.
-->

*Last updated: 2026-03-27*
*Last updated: 2026-04-27*

---

@@ -26,7 +26,7 @@ The **⚙ Settings** panel contains all configuration options:
| **Context Tier** | T1 – T4 context depth |
| **Memory Layers** | Toggle Long / Mid / Short memory on/off |
| **Distill Memory** | Manually trigger short / mid / long / all distillation |
| **Backend** | Active LLM backend — click to toggle claude ↔ gemini |
| **Backend** | Active LLM backend — click to cycle: claude → gemini → local → auto |
| **Display** | Aa/A+/A− font size cycle · ☾/☀ theme toggle |

All header settings (theme, font size, tier, memory layers) persist in `localStorage` across page refreshes.
@@ -42,21 +42,26 @@ All header settings (theme, font size, tier, memory layers) persist in `localSto
- **Copy a response:** Hover over any assistant message → click **copy**.
- **New line while typing:** `Shift+Enter` (in `Ctrl+Enter` mode) or `Shift+Enter` / Enter (in Enter mode).

Each assistant response shows a small **model tag** in the bottom-right corner identifying which model and host responded.

---

## Agent Mode

Click the **Agent** button in the input row to enable Agent mode. The button highlights and Send changes to **Run**.

In Agent mode, messages are routed through the **orchestrator** instead of directly to Claude:
In Agent mode, messages are routed through the **orchestrator** instead of directly to the chat model:

1. **Gemini** runs a tool loop — searches the web, reads files, checks tasks, calls APIs as needed
2. **Claude** receives the enriched context and writes the final response
3. A `⚡ N tool calls: …` note appears below the response listing what was used
1. The **orchestrator model** runs a tool loop — searches the web, reads files, checks tasks, calls APIs as needed
2. It produces an enriched summary of what it found
3. The **responder model** receives that context and writes the final user-facing reply
4. A `⚡ N tool calls: …` note appears below the response listing what was used

Which model acts as orchestrator is set in **Settings → Models → Role Assignments → Orchestrator**. By default this is Gemini API; a capable local model can be assigned instead.

Agent mode is best for tasks that require research, multi-step reasoning, or tool use (e.g. "search for X", "add a task", "what's on my list?"). Regular chat is faster for conversational turns.

Agent mode sessions persist to history exactly like regular chat — they survive page refreshes and appear in the Sessions panel.
Agent mode sessions persist to history exactly like regular chat.
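The two-stage flow described in the Agent Mode steps can be pictured with a minimal Python sketch. It is illustrative only and not taken from the Cortex source: `run_agent`, `AgentResult`, and the two callables are hypothetical names, and the real orchestrator's tool loop is reduced here to a function that returns a summary plus the names of the tools it used.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-ins for the two role-assigned models (not Cortex APIs).
Orchestrator = Callable[[str], tuple[str, list[str]]]  # message -> (summary, tool names)
Responder = Callable[[str, str], str]                  # (message, summary) -> reply


@dataclass
class AgentResult:
    reply: str
    tool_calls: list[str] = field(default_factory=list)

    def tool_note(self) -> str:
        """Format the note shown below the response, or '' when no tools ran."""
        if not self.tool_calls:
            return ""
        return f"⚡ {len(self.tool_calls)} tool calls: {', '.join(self.tool_calls)}"


def run_agent(message: str, orchestrate: Orchestrator, respond: Responder) -> AgentResult:
    # Stage 1: the orchestrator gathers context (tool loop elided in this sketch).
    summary, tools = orchestrate(message)
    # Stage 2: the responder writes the user-facing reply from that context.
    reply = respond(message, summary)
    return AgentResult(reply=reply, tool_calls=tools)
```

Passing the models in as plain callables keeps the sketch independent of which backend is assigned to each role, which mirrors the point of the change: the orchestrator is no longer tied to one provider.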

---

@@ -84,10 +89,77 @@ Notes are injected into a session without triggering an LLM response.

## Backends

- **Claude CLI** and **Gemini CLI** are both available. One is primary, the other is fallback.
- Click **⚙** → **Backend** to toggle between `claude` and `gemini` as the primary.
- If the primary fails or times out, the fallback is used automatically. A **⚡** notice appears in the chat when this happens.
- Timeouts: Claude 60s, Gemini 120s.
Three backends are available:

| Backend | What it is |
|---|---|
| **Claude** | Anthropic Claude via the Claude CLI (OAuth — no API key needed) |
| **Gemini** | Google Gemini via the Gemini CLI |
| **Local** | Any OpenAI-compatible endpoint (Open WebUI, Ollama, OpenRouter, etc.) |

The **⚙ Backend** toggle cycles: **auto → claude → gemini → local → auto**

- **auto** uses the model assigned to the `chat` role in your Model Registry (recommended)
- Selecting a specific backend forces that backend for all messages, regardless of role assignments
- The active model label appears below the toggle button when a specific backend is active

If the active backend fails, a fallback is tried automatically. A **⚡** badge appears on the response when this happens.

Each response shows a **model tag** (bottom-right of message) with the model label and host, so you always know what responded.
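The toggle behaviour described for the new Backends section (a four-state cycle, with `auto` deferring to the `chat` role) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the UI's actual code: `BACKEND_CYCLE`, `next_backend`, and `resolve_chat_target` are hypothetical names, and the registry lookup is reduced to a plain string argument.

```python
# Order taken from the documented cycle: auto → claude → gemini → local → auto
BACKEND_CYCLE = ["auto", "claude", "gemini", "local"]


def next_backend(current: str) -> str:
    """Return the toggle state that follows `current`, wrapping at the end."""
    i = BACKEND_CYCLE.index(current)  # raises ValueError for an unknown state
    return BACKEND_CYCLE[(i + 1) % len(BACKEND_CYCLE)]


def resolve_chat_target(toggle: str, chat_role_model: str) -> str:
    """`auto` defers to the model assigned to the chat role;
    any explicit choice overrides role assignments."""
    return chat_role_model if toggle == "auto" else toggle
```

For example, `next_backend("local")` wraps back to `"auto"`, and `resolve_chat_target("gemini", "some-local-model")` returns `"gemini"` because an explicit selection wins over the role assignment.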

---

## Model Registry (Settings → Models)

The Model Registry is where you configure which AI models are available and which handles each task type.

**Navigate to:** Settings (top-right of any page) → **Models**

### Cloud Providers

**Anthropic** — Claude is accessed via the Claude Code CLI. No API key setup needed in the registry — just make sure you're authenticated (`claude auth login` in a terminal).

**Google** — Gemini models use the Gemini API with an explicit API key:
1. Settings → Models → Cloud Providers → Google → **Add account**
2. Enter a label (e.g. "Work", "Personal") and your Gemini API key
3. Get a free key at [aistudio.google.com](https://aistudio.google.com/apikey) — free tier is sufficient for most use
4. Add one account per Google account you want to use

### Local Hosts

For Open WebUI, Ollama, LM Studio, OpenRouter, or any OpenAI-compatible server:
1. Settings → Models → Local Hosts → expand **Add host**
2. Enter a label, the API URL (e.g. `http://192.168.1.100:3000`), and an optional API key
3. Choose **Type**: Open WebUI / Ollama, or OpenAI-compatible (OpenRouter, etc.)
4. Save, then use **Fetch models** on the host card to verify connectivity

### Adding a Model

Settings → Models → Add Model. Use the tabs to select provider:

| Tab | How it works |
|---|---|
| **Local** | Select a host → enter model name (or use "Fetch from host" to pick from a live list) |
| **Google** | Pick a Gemini model from the catalog → select which Google account to use |
| **Anthropic** | Pick a Claude model from the catalog → uses your CLI OAuth session |

Enter a label (auto-filled from the catalog), context window size, and optional tags. Click **Add Model**.

### Role Assignments

Once models are added, assign them to task types at the bottom of Settings → Models:

| Role | Used for |
|---|---|
| **Chat** | Regular conversation — the main chat model |
| **Orchestrator** | Agent mode tool loop |
| **Distill** | Memory distillation (short/mid/long) |
| **Coder** | Code-focused tasks |
| **Research** | Research and long-context tasks |

Each role has **Primary**, **Backup 1**, and **Backup 2** slots. If Primary fails or is unreachable, Backup 1 is tried, then Backup 2. Changes save immediately on select.

Leave all slots empty to use the server-default model (configured in `.env`).
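The Primary → Backup 1 → Backup 2 order can be expressed as a small fallback loop. The Python sketch below is a guess at the shape of that logic, not the server's implementation: `call_with_fallback` and its parameters are hypothetical, and it assumes the server default is consulted only when every slot is empty (the help text does not say whether the default is also tried after all three slots fail).

```python
from typing import Callable, Optional, Sequence


def call_with_fallback(
    slots: Sequence[Optional[str]],   # [primary, backup_1, backup_2]; None = empty slot
    default_model: str,               # assumed to come from .env when all slots are empty
    call: Callable[[str], str],       # hypothetical: sends the request to one model
) -> tuple[str, str]:
    """Try each filled slot in order and return (model_used, reply)."""
    candidates = [m for m in slots if m] or [default_model]
    errors = []
    for model in candidates:
        try:
            return model, call(model)
        except Exception as exc:      # a failed or unreachable model moves to the next slot
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

Returning the model that actually answered alongside the reply is what would let a caller show the model tag, or a fallback badge when the answer did not come from the Primary slot.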

---

@@ -95,10 +167,10 @@ Notes are injected into a session without triggering an LLM response.

Inara is registered as a bot in Nextcloud Talk.

- Messages sent in enabled Talk conversations are received by Cortex, processed, and replied to by Inara.
- The webhook returns `200 OK` immediately; the LLM call and reply happen asynchronously.
- Messages sent in enabled Talk conversations are received by Cortex, processed, and replied to.
- The webhook returns `200 OK` immediately; the reply happens asynchronously.
- Real-time updates stream to the web UI via SSE — you see Talk messages and responses appear live.
- To enable the bot in a conversation: open Talk conversation settings → Bots → enable Inara.
- To enable the bot in a conversation: open Talk conversation settings → Bots → enable the bot.

---

@@ -108,29 +180,27 @@ Inara is available as a bot in Google Chat (One Sky IT Workspace).

- Send Inara a direct message in Google Chat to start a conversation.
- Each DM thread is its own session (`gc_spaces/*` prefix) — history persists across messages.
- Responses are synchronous — Google Chat displays Inara's reply directly in the thread.
- Responses are synchronous — Google Chat displays the reply directly in the thread.
- To add Inara to a space: open the space, add a person/app, search for **Inara**.
- Sessions from Google Chat appear as `gc_*` prefixed IDs in the Sessions panel.

**Technical note:** Cortex uses Google's Workspace Add-on format (`hostAppDataAction`) — the modern API required for all Google Chat apps as of 2025.

---

## Files (Identity Editor)

The **Files** button opens an editor for Inara's identity and memory files:
The **Files** button opens an editor for your persona's identity and memory files:

| File | Purpose |
|---|---|
| `SOUL.md` | Core personality, values, and voice |
| `IDENTITY.md` | Role, capabilities, and context |
| `USER.md` | Scott's profile, preferences, and history |
| `USER.md` | Your profile, preferences, and history |
| `PROTOCOLS.md` | Behavioural rules and communication protocols |
| `CONTEXT_TIERS.md` | Defines what gets loaded at each context tier |
| `MEMORY_LONG.md` | Permanent curated long-term memory |
| `MEMORY_MID.md` | Rolling mid-term digest (LLM-distilled) |
| `MEMORY_SHORT.md` | Recent session rollup (auto-aggregated) |
| `TASKS.json` | Inara's personal task list (managed via Agent mode) |
| `TASKS.json` | Personal task list (managed via Agent mode) |
| `HELP.md` | This file |

Toggle **preview** / **edit** to switch between rendered markdown and raw text. **Ctrl+S** saves, **Esc** closes.
@@ -154,19 +224,19 @@ Default is T2. Use T1 for small/local models. Use T3–T4 for complex multi-sess

### Memory Layers

Three independently toggleable memory files, loaded **Long → Mid → Short** (short sits closest to the conversation turn for better LLM recall):
Three independently toggleable memory files, loaded **Long → Mid → Short**:

| Layer | File | Contents |
|---|---|---|
| **Long** | `MEMORY_LONG.md` | Permanent facts — origin, key decisions, Scott's profile highlights |
| **Long** | `MEMORY_LONG.md` | Permanent facts — origin, key decisions, profile highlights |
| **Mid** | `MEMORY_MID.md` | Rolling digest of recent weeks — LLM-distilled from Short |
| **Short** | `MEMORY_SHORT.md` | Recent session rollup — auto-aggregated from session log files |
| **Short** | `MEMORY_SHORT.md` | Recent session rollup — auto-aggregated from session logs |

Toggle any layer off to save tokens for a focused conversation where history isn't needed.
Toggle any layer off to save tokens for a focused conversation.

### Memory Distillation (manual)
### Memory Distillation

Distillation builds up the memory layers from raw session logs. Currently **manual** — trigger via the ⚙ panel:
Distillation builds up the memory layers from raw session logs. Runs automatically on a schedule; trigger manually via the ⚙ panel:

| Button | What it does |
|---|---|
@@ -175,12 +245,7 @@ Distillation builds up the memory layers from raw session logs. Currently **manu
| **long** | LLM integrates `MEMORY_MID.md` → `MEMORY_LONG.md` |
| **all** | Runs short → mid → long in sequence |

**Recommended workflow:**
- Run **short** after any productive session to capture it.
- Run **mid** weekly to distil short → mid.
- Run **long** monthly to absorb mid into permanent memory.

Token budgets for each layer are set in `.env` (`MEMORY_BUDGET_LONG`, `MEMORY_BUDGET_MID`, `MEMORY_BUDGET_SHORT`).
**Recommended workflow:** run **short** after any productive session; **mid** weekly; **long** monthly.

---

@@ -192,9 +257,8 @@ Token budgets for each layer are set in `.env` (`MEMORY_BUDGET_LONG`, `MEMORY_BU
| `Enter` | Send (when in Enter mode) |
| `Shift+Enter` | New line in message input |
| `Ctrl+Enter` | Save inline message edit |
| `Esc` | Cancel inline edit |
| `Esc` | Cancel inline edit / close any open modal |
| `Ctrl+S` | Save file (Files modal) |
| `Esc` | Close any open modal |

---

@@ -219,10 +283,11 @@ For direct access or scripting:
| `POST` | `/distill/mid` | Summarize short → MEMORY_MID (LLM) |
| `POST` | `/distill/long` | Integrate mid → MEMORY_LONG (LLM) |
| `POST` | `/distill/all` | Run all three distillation steps |
| `GET` | `/distill/status` | Show scheduler status and next run times |
| `GET` | `/distill/status` | Scheduler status and next run times |
| `POST` | `/orchestrate` | Submit an agent task — returns `{"job_id": "..."}` |
| `GET` | `/orchestrate/{job_id}` | Poll job status and result |
| `GET` | `/orchestrate` | List all jobs from current session (in-memory) |
| `GET` | `/settings/models` | Model registry UI |
| `POST` | `/api/models/role` | Set a role assignment (JSON body) |
| `GET` | `/health` | Health check — returns `{"status": "ok"}` |
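The `/orchestrate` rows describe a submit-then-poll pattern: `POST /orchestrate` returns a `job_id`, and `GET /orchestrate/{job_id}` is polled until the job finishes. A minimal Python polling loop might look like the sketch below. Only the two paths and the `job_id` key come from the table; the `status` field, its `"done"` / `"error"` values, and the helper names are assumptions made for illustration.

```python
import json
import time
import urllib.request
from typing import Any, Callable


def http_get_json(base_url: str) -> Callable[[str], dict[str, Any]]:
    """Build a GET helper for a Cortex instance (base_url is whatever you run it on)."""
    def get(path: str) -> dict[str, Any]:
        with urllib.request.urlopen(base_url + path) as resp:
            return json.load(resp)
    return get


def wait_for_job(
    get_json: Callable[[str], dict[str, Any]],
    job_id: str,
    interval: float = 1.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll GET /orchestrate/{job_id} until the job reports a terminal status."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_json(f"/orchestrate/{job_id}")
        # ASSUMPTION: the payload carries a "status" field with these terminal values.
        if job.get("status") in ("done", "error"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still running after {timeout}s")
        sleep(interval)
```

Injecting `get_json` and `sleep` keeps the loop testable without a running server; in real use, `wait_for_job(http_get_json(base_url), job_id)` would poll the instance at `base_url`.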

Chat request body (`POST /chat`):
@@ -230,8 +295,8 @@ Chat request body (`POST /chat`):
{
  "message": "string",
  "session_id": "string | null",
  "tier": 1,
  "model": "claude | gemini | null",
  "tier": 2,
  "model": "claude | gemini | local | null",
  "include_long": true,
  "include_mid": true,
  "include_short": true
@@ -240,23 +305,4 @@ Chat request body (`POST /chat`):

---
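For scripting against `POST /chat`, the request body documented in this change can be assembled and validated before sending. The Python helper below is a sketch, not part of Cortex: `build_chat_body` is a hypothetical name, the allowed tiers follow the T1–T4 range in the help text, the allowed `model` values follow the updated field, and treating `null` as "let the server decide" is an assumption.

```python
import json
from typing import Any, Optional

ALLOWED_MODELS = (None, "claude", "gemini", "local")  # None serializes to JSON null


def build_chat_body(
    message: str,
    session_id: Optional[str] = None,
    tier: int = 2,
    model: Optional[str] = None,
    include_long: bool = True,
    include_mid: bool = True,
    include_short: bool = True,
) -> dict[str, Any]:
    """Return a dict matching the documented POST /chat body."""
    if tier not in (1, 2, 3, 4):
        raise ValueError(f"tier must be 1-4, got {tier!r}")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {ALLOWED_MODELS}, got {model!r}")
    return {
        "message": message,
        "session_id": session_id,
        "tier": tier,
        "model": model,
        "include_long": include_long,
        "include_mid": include_mid,
        "include_short": include_short,
    }


payload = json.dumps(build_chat_body("hello", model="local"))
```

The resulting `payload` string can be sent with any HTTP client as the JSON body of the request.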

## In Progress / Planned

- **Ollama local model backend** — direct Ollama API support (no CLI wrapper); target host: scott_gaming via WireGuard
- **Nextcloud Talk stabilization** — test end-to-end after restarts; complete bot registration docs
- **Multi-user support** — per-user identity/memory files; currently single-user (Scott); Holly instance planned

### Recently Completed

- ✓ **Google Chat bot** — Workspace Add-on integration; DM and spaces; JWT verification; session persistence
- ✓ **Agent mode** — Gemini tool loop + Claude responder, accessible via UI toggle
- ✓ **Personal task management** — `task_list`, `task_create`, `task_update`, `task_complete` tools backed by `TASKS.json`
- ✓ **Web search fixed** — DDG package updated (`ddgs`); `WebSearch`/`WebFetch` allowed for Claude CLI fallback
- ✓ **Session persistence for orchestrator** — agent mode turns now survive page refresh
- ✓ **Systemd user service** — Cortex runs as a user service; no sudo required (`systemctl --user restart cortex`)
- ✓ **OAuth token warning banner** — amber banner when Claude CLI token is within 24h of expiry

---

*Cortex is Scott's personal AI orchestration system. Inara is its primary resident agent.*
*Built on FastAPI + Claude CLI + Gemini CLI. Named after Firefly.*
*Cortex is a self-hosted personal AI platform. Named after the 'verse-wide communications network in Firefly.*