Cortex-Inara/cortex/static/HELP.md
Scott Idem 9f6b162fbd docs: update end-user HELP.md for model registry V2
- Backends section: add local as third backend option, explain model
  tag on responses, clarify auto vs explicit toggle behavior
- Agent Mode: remove hard-coded "Gemini" reference — orchestrator model
  is now configurable via role assignments
- New Model Registry section: step-by-step for adding Google accounts,
  local hosts, cloud/local model entries, and role assignments
- API reference: add local to model field, add /settings/models endpoint
- Remove outdated In Progress section (local backend + multi-user shipped)
- Header controls table: update Backend description

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 20:57:05 -04:00

# Cortex UI — Help & Reference
<!-- SHARED BASE: cortex/static/HELP.md
This file is served to all users regardless of persona.
Persona-specific additions live in home/{username}/persona/{name}/HELP.md
and are appended automatically by help.html when present.
-->
*Last updated: 2026-04-27*
---
## Header Controls
| Button | What it does |
|---|---|
| **Sessions** | Open the sessions panel — list, resume, or start sessions |
| **Files** | Open the identity file editor (SOUL, MEMORY, etc.) |
| **⚙ N** | Open the Settings panel (N = current context tier) |
| **?** | Open this help panel |
The **⚙ Settings** panel contains all configuration options:
| Section | Controls |
|---|---|
| **Context Tier** | T1–T4 context depth |
| **Memory Layers** | Toggle Long / Mid / Short memory on/off |
| **Distill Memory** | Manually trigger short / mid / long / all distillation |
| **Backend** | Active LLM backend — click to cycle: claude → gemini → local → auto |
| **Display** | Aa/A+/A font size cycle · ☾/☀ theme toggle |
All header settings (theme, font size, tier, memory layers) persist in `localStorage` across page refreshes.
---
## Chat
- **Send:** `Ctrl+Enter` by default. Click `⌃↵` in the input controls to toggle to plain `Enter` mode.
- **Stop:** Click **Stop** to cancel an in-progress response at any time.
- **Edit a message:** Hover over any message → click **edit**. `Ctrl+Enter` saves, `Esc` cancels.
- **Delete a message:** Hover over any message → click **del**. Removes from session history.
- **Copy a response:** Hover over any assistant message → click **copy**.
- **New line while typing:** `Shift+Enter` or plain `Enter` (in `Ctrl+Enter` mode); `Shift+Enter` only (in Enter mode, where plain `Enter` sends).
Each assistant response shows a small **model tag** in the bottom-right corner identifying which model and host responded.
---
## Agent Mode
Click the **Agent** button in the input row to enable Agent mode. The button highlights and Send changes to **Run**.
In Agent mode, messages are routed through the **orchestrator** instead of directly to the chat model:
1. The **orchestrator model** runs a tool loop — searches the web, reads files, checks tasks, calls APIs as needed
2. It produces an enriched summary of what it found
3. The **responder model** receives that context and writes the final user-facing reply
4. A `⚡ N tool calls: …` note appears below the response listing what was used
The orchestrator model is set in **Settings → Models → Role Assignments → Orchestrator**. By default this is the Gemini API; a capable local model can be assigned instead.
Agent mode is best for tasks that require research, multi-step reasoning, or tool use (e.g. "search for X", "add a task", "what's on my list?"). Regular chat is faster for conversational turns.
Agent mode sessions persist to history exactly like regular chat.
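
For scripting, the API Reference lists `POST /orchestrate` (returns a `job_id`) and `GET /orchestrate/{job_id}` (poll status and result). A minimal sketch of that submit-then-poll pattern follows. The base URL, the request body, and the `status` / `done` names used in the usage note are assumptions, since this file does not document them; authentication is not shown.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: replace with your Cortex address


def submit_job_request(task_body):
    """Build POST /orchestrate. The body schema is not documented here,
    so the caller supplies it as a dict (hypothetical)."""
    return urllib.request.Request(
        f"{BASE_URL}/orchestrate",
        data=json.dumps(task_body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_job(job_id):
    """GET /orchestrate/{job_id} and decode the JSON reply."""
    with urllib.request.urlopen(f"{BASE_URL}/orchestrate/{job_id}") as resp:
        return json.load(resp)


def poll_until(fetch, is_done, interval=2.0, max_attempts=30, sleep=time.sleep):
    """Call fetch() repeatedly until is_done(result) is true.

    Raises TimeoutError if the job is still unfinished after max_attempts.
    """
    for attempt in range(max_attempts):
        result = fetch()
        if is_done(result):
            return result
        if attempt < max_attempts - 1:
            sleep(interval)  # wait before the next poll
    raise TimeoutError(f"job not finished after {max_attempts} polls")
```

A caller would send `submit_job_request(...)` with `urllib.request.urlopen`, read `job_id` from the reply, then run something like `poll_until(lambda: fetch_job(job_id), lambda r: r.get("status") == "done")`, adjusting the predicate to whatever the job payload actually contains.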
---
## Sessions
Sessions are named conversation threads that persist across page refreshes.
- Click **Sessions** → **+ New** to start a fresh session.
- Click any listed session to resume it — full history loads instantly.
- Sessions from Nextcloud Talk appear as `nct_*` prefixed IDs.
- A blue **●** badge appears on the Sessions button when Talk activity arrives in a session you're not currently viewing.
---
## Notes
Notes are injected into a session without triggering an LLM response.
- Click **Note** to toggle note mode. The input border changes colour.
- **Private note** (amber border) — visible only in the UI, never sent to the LLM.
- **Context note** (teal border) — persisted to session history so the LLM sees it on the next turn. Useful for nudging context without a full message.
- Click the `private / public` label to switch between note types.
---
## Backends
Three backends are available:
| Backend | What it is |
|---|---|
| **Claude** | Anthropic Claude via the Claude CLI (OAuth — no API key needed) |
| **Gemini** | Google Gemini via the Gemini CLI |
| **Local** | Any OpenAI-compatible endpoint (Open WebUI, Ollama, OpenRouter, etc.) |
The **⚙ Backend** toggle cycles: **auto → claude → gemini → local → auto**
- **auto** uses the model assigned to the `chat` role in your Model Registry (recommended)
- Selecting a specific backend forces that backend for all messages, regardless of role assignments
- The active model label appears below the toggle button when a specific backend is active
If the active backend fails, a fallback is tried automatically. A **⚡** badge appears on the response when this happens.
Each response shows a **model tag** (bottom-right of message) with the model label and host, so you always know what responded.
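
The toggle order above, and the `POST /backend` call from the API Reference, can be sketched as below. This is an illustration only: the base URL is an assumption, and the API Reference shows only `{"primary": "claude"}` as an example body, so whether the endpoint accepts `gemini`, `local`, or `auto` is not confirmed here.

```python
import json
import urllib.request

# Order taken from the toggle description: auto -> claude -> gemini -> local -> auto
BACKEND_CYCLE = ["auto", "claude", "gemini", "local"]


def next_backend(current):
    """Return the backend the toggle moves to after `current`."""
    position = BACKEND_CYCLE.index(current)  # raises ValueError for unknown names
    return BACKEND_CYCLE[(position + 1) % len(BACKEND_CYCLE)]


def set_backend_request(primary, base_url="http://localhost:8000"):
    """Build POST /backend with the documented {"primary": ...} body.

    base_url is an assumption; point it at your own Cortex instance.
    """
    return urllib.request.Request(
        f"{base_url}/backend",
        data=json.dumps({"primary": primary}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be read without a running server.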
---
## Model Registry ( Settings → Models )
The Model Registry is where you configure which AI models are available and which handles each task type.
**Navigate to:** Settings (top-right of any page) → **Models**
### Cloud Providers
**Anthropic** — Claude is accessed via the Claude Code CLI. No API key setup needed in the registry — just make sure you're authenticated (`claude auth login` in a terminal).
**Google** — Gemini models use the Gemini API with an explicit API key:
1. Settings → Models → Cloud Providers → Google → **Add account**
2. Enter a label (e.g. "Work", "Personal") and your Gemini API key
3. Get a free key at [aistudio.google.com](https://aistudio.google.com/apikey) — free tier is sufficient for most use
4. Add one account per Google account you want to use
### Local Hosts
For Open WebUI, Ollama, LM Studio, OpenRouter, or any OpenAI-compatible server:
1. Settings → Models → Local Hosts → expand **Add host**
2. Enter a label, the API URL (e.g. `http://192.168.1.100:3000`), and an optional API key
3. Choose **Type**: Open WebUI / Ollama, or OpenAI-compatible (OpenRouter, etc.)
4. Save, then use **Fetch models** on the host card to verify connectivity
### Adding a Model
Settings → Models → Add Model. Use the tabs to select provider:
| Tab | How it works |
|---|---|
| **Local** | Select a host → enter model name (or use "Fetch from host" to pick from a live list) |
| **Google** | Pick a Gemini model from the catalog → select which Google account to use |
| **Anthropic** | Pick a Claude model from the catalog → uses your CLI OAuth session |
Enter a label (auto-filled from the catalog), context window size, and optional tags. Click **Add Model**.
### Role Assignments
Once models are added, assign them to task types at the bottom of Settings → Models:
| Role | Used for |
|---|---|
| **Chat** | Regular conversation — the main chat model |
| **Orchestrator** | Agent mode tool loop |
| **Distill** | Memory distillation (short/mid/long) |
| **Coder** | Code-focused tasks |
| **Research** | Research and long-context tasks |
Each role has **Primary**, **Backup 1**, and **Backup 2** slots. If Primary fails or is unreachable, Backup 1 is tried, then Backup 2. Changes save immediately on select.
Leave all slots empty to use the server-default model (configured in `.env`).
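
The slot order described above (Primary, then Backup 1, then Backup 2, with the server default used when every slot is empty) behaves roughly like the following sketch. It is not Cortex's actual code; `run_with_fallback`, `call`, and `default_model` are hypothetical names chosen for illustration.

```python
def run_with_fallback(slots, call, default_model=None):
    """Try each configured slot in order and return (model, reply).

    slots:         [primary, backup_1, backup_2], with None for an empty slot
    call:          function that sends the request to one model and may raise
    default_model: stand-in for the server default from .env (hypothetical)
    """
    candidates = [model for model in slots if model is not None]
    if not candidates and default_model is not None:
        candidates = [default_model]  # all slots empty: use the server default

    errors = []
    for model in candidates:
        try:
            return model, call(model)
        except Exception as exc:  # failed or unreachable: move to the next slot
            errors.append(f"{model}: {exc}")
    raise RuntimeError("no model responded: " + "; ".join(errors))
```

For example, if the Primary model is unreachable, the function returns the reply from Backup 1 together with its name, which is the kind of information the model tag on each response surfaces.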
---
## Nextcloud Talk Bot
Inara is registered as a bot in Nextcloud Talk.
- Messages sent in enabled Talk conversations are received by Cortex, processed, and replied to.
- The webhook returns `200 OK` immediately; the reply happens asynchronously.
- Real-time updates stream to the web UI via SSE — you see Talk messages and responses appear live.
- To enable the bot in a conversation: open Talk conversation settings → Bots → enable the bot.
---
## Google Chat Bot
Inara is available as a bot in Google Chat (One Sky IT Workspace).
- Send Inara a direct message in Google Chat to start a conversation.
- Each DM thread is its own session (`gc_spaces/*` prefix) — history persists across messages.
- Responses are synchronous — Google Chat displays the reply directly in the thread.
- To add Inara to a space: open the space, add a person/app, search for **Inara**.
- Sessions from Google Chat appear as `gc_*` prefixed IDs in the Sessions panel.
---
## Files (Identity Editor)
The **Files** button opens an editor for your persona's identity and memory files:
| File | Purpose |
|---|---|
| `SOUL.md` | Core personality, values, and voice |
| `IDENTITY.md` | Role, capabilities, and context |
| `USER.md` | Your profile, preferences, and history |
| `PROTOCOLS.md` | Behavioural rules and communication protocols |
| `CONTEXT_TIERS.md` | Defines what gets loaded at each context tier |
| `MEMORY_LONG.md` | Permanent curated long-term memory |
| `MEMORY_MID.md` | Rolling mid-term digest (LLM-distilled) |
| `MEMORY_SHORT.md` | Recent session rollup (auto-aggregated) |
| `TASKS.json` | Personal task list (managed via Agent mode) |
| `HELP.md` | This file |
Toggle **preview** / **edit** to switch between rendered markdown and raw text. **Ctrl+S** saves, **Esc** closes.
---
## Context & Memory ( ⚙ panel )
### Context Tiers
Controls how much context is prepended to each LLM call:
| Tier | Loads | ~Tokens |
|---|---|---|
| **T1** | SOUL + IDENTITY + USER summary | ~1,500 |
| **T2** | + USER full + PROTOCOLS + HELP + memory layers | ~5,000 |
| **T3** | + last 2 raw session logs | ~15,000 |
| **T4** | + last 7 raw session logs | ~50,000 |
Default is T2. Use T1 for small/local models. Use T3–T4 for complex multi-session tasks.
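
As a rough aid for choosing a tier, the approximate token counts from the table can be compared against a model's context window. The sketch below is only a heuristic built on those table values; the function name and the idea of reserving tokens for the conversation itself are assumptions, not Cortex behaviour.

```python
# Approximate prepended-context size per tier, copied from the table above
APPROX_TOKENS = {1: 1500, 2: 5000, 3: 15000, 4: 50000}


def highest_tier_within(context_window, reserve=0):
    """Return the highest tier whose approximate context fits, or None.

    context_window: the model's context size in tokens
    reserve:        tokens to keep free for the conversation and the reply
    """
    budget = context_window - reserve
    fitting = [tier for tier, cost in APPROX_TOKENS.items() if cost <= budget]
    return max(fitting) if fitting else None
```

With an 8,192-token local model and 2,048 tokens reserved, this suggests T2; reserving 4,096 tokens drops the suggestion to T1, in line with the advice to use T1 for small models.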
### Memory Layers
Three independently toggleable memory files, loaded **Long → Mid → Short**:
| Layer | File | Contents |
|---|---|---|
| **Long** | `MEMORY_LONG.md` | Permanent facts — origin, key decisions, profile highlights |
| **Mid** | `MEMORY_MID.md` | Rolling digest of recent weeks — LLM-distilled from Short |
| **Short** | `MEMORY_SHORT.md` | Recent session rollup — auto-aggregated from session logs |
Toggle any layer off to save tokens for a focused conversation.
### Memory Distillation
Distillation builds up the memory layers from raw session logs. Runs automatically on a schedule; trigger manually via the ⚙ panel:
| Button | What it does |
|---|---|
| **short** | Rolls recent session log files → `MEMORY_SHORT.md` (fast, no LLM) |
| **mid** | LLM summarizes `MEMORY_SHORT.md` → `MEMORY_MID.md` |
| **long** | LLM integrates `MEMORY_MID.md` → `MEMORY_LONG.md` |
| **all** | Runs short → mid → long in sequence |
**Recommended workflow:** run **short** after any productive session; **mid** weekly; **long** monthly.
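
The same four actions are exposed as `POST /distill/{step}` in the API Reference, so the workflow can be scripted (for example from cron). A minimal sketch, assuming a base URL of your own and that the endpoints need no request body, which this file does not specify:

```python
import urllib.request

DISTILL_STEPS = ("short", "mid", "long", "all")


def uses_llm(step):
    """Per the table above, only `short` runs without an LLM."""
    if step not in DISTILL_STEPS:
        raise ValueError(f"unknown distill step: {step!r}")
    return step != "short"


def distill_request(step, base_url="http://localhost:8000"):
    """Build POST /distill/{step}. base_url and the empty body are assumptions."""
    if step not in DISTILL_STEPS:
        raise ValueError(f"unknown distill step: {step!r}")
    return urllib.request.Request(f"{base_url}/distill/{step}", data=b"", method="POST")
```

Sending `distill_request("short")` after a session, `"mid"` weekly, and `"long"` monthly would mirror the recommended workflow; `GET /distill/status` reports what the built-in scheduler already plans to run.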
---
## Keyboard Shortcuts
| Keys | Action |
|---|---|
| `Ctrl+Enter` | Send message (default mode) |
| `Enter` | Send (when in Enter mode) |
| `Shift+Enter` | New line in message input |
| `Ctrl+Enter` | Save inline message edit |
| `Esc` | Cancel inline edit / close any open modal |
| `Ctrl+S` | Save file (Files modal) |
---
## API Reference
For direct access or scripting:
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/chat` | Send a message — returns SSE stream |
| `GET` | `/backend` | Get current primary/fallback backends |
| `POST` | `/backend` | Set primary backend (`{"primary": "claude"}`) |
| `GET` | `/sessions` | List all sessions |
| `GET` | `/history/{id}` | Get session message history |
| `PUT` | `/history/{id}` | Replace full session history |
| `GET` | `/events` | SSE stream for real-time Talk activity |
| `POST` | `/note` | Inject a context note into a session |
| `GET` | `/files` | List identity files |
| `GET` | `/files/{name}` | Read a file |
| `PUT` | `/files/{name}` | Write a file |
| `POST` | `/distill/short` | Aggregate session logs → MEMORY_SHORT |
| `POST` | `/distill/mid` | Summarize short → MEMORY_MID (LLM) |
| `POST` | `/distill/long` | Integrate mid → MEMORY_LONG (LLM) |
| `POST` | `/distill/all` | Run all three distillation steps |
| `GET` | `/distill/status` | Scheduler status and next run times |
| `POST` | `/orchestrate` | Submit an agent task — returns `{"job_id": "..."}` |
| `GET` | `/orchestrate/{job_id}` | Poll job status and result |
| `GET` | `/settings/models` | Model registry UI |
| `POST` | `/api/models/role` | Set a role assignment (JSON body) |
| `GET` | `/health` | Health check — returns `{"status": "ok"}` |
Chat request body (`POST /chat`):
```json
{
"message": "string",
"session_id": "string | null",
"tier": 2,
"model": "claude | gemini | local | null",
"include_long": true,
"include_mid": true,
"include_short": true
}
```
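
A minimal Python sketch of building this request and reading the SSE reply, using only the standard library. The base URL is an assumption, authentication is not shown, and the content of each `data:` line is not documented here, so the parser simply yields the raw strings (it also ignores multi-line events for brevity).

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: replace with your Cortex address


def build_chat_request(message, session_id=None, tier=2, model=None,
                       include_long=True, include_mid=True, include_short=True):
    """Build POST /chat from the documented body fields."""
    if tier not in (1, 2, 3, 4):
        raise ValueError(f"tier must be 1-4, got {tier!r}")
    if model not in (None, "claude", "gemini", "local"):
        raise ValueError(f"unknown model: {model!r}")
    body = {
        "message": message,
        "session_id": session_id,
        "tier": tier,
        "model": model,
        "include_long": include_long,
        "include_mid": include_mid,
        "include_short": include_short,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )


def iter_sse_data(lines):
    """Yield the value of each `data:` field from decoded SSE lines.

    Comment lines (starting with ':') and blank separators are skipped.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            # The SSE format allows one optional space after the colon
            yield value[1:] if value.startswith(" ") else value
```

To stream a reply, open the request with `urllib.request.urlopen(build_chat_request("hello"))` and pass `(raw.decode("utf-8") for raw in response)` to `iter_sse_data`.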
---
*Cortex is a self-hosted personal AI platform. Named after the 'verse-wide communications network in Firefly.*