# Cortex UI — Help & Reference

<!-- SHARED BASE: cortex/static/HELP.md
     This file is served to all users regardless of persona.
     Persona-specific additions live in home/{username}/persona/{name}/HELP.md
     and are appended automatically by help.html when present.
-->

*Last updated: 2026-04-27*

---

## Header Controls

| Button | What it does |
|---|---|
| **Sessions** | Open the sessions panel — list, resume, or start sessions |
| **Files** | Open the identity file editor (SOUL, MEMORY, etc.) |
| **⚙ N** | Open the Settings panel (N = current context tier) |
| **?** | Open this help panel |

The **⚙ Settings** panel contains all configuration options:

| Section | Controls |
|---|---|
| **Context Tier** | T1 – T4 context depth |
| **Memory Layers** | Toggle Long / Mid / Short memory on/off |
| **Distill Memory** | Manually trigger short / mid / long / all distillation |
| **Backend** | Active LLM backend — click to cycle: claude → gemini → local → auto |
| **Display** | Aa/A+/A− font size cycle · ☾/☀ theme toggle |

All header settings (theme, font size, tier, memory layers) persist in `localStorage` across page refreshes.

---

## Chat

- **Send:** `Ctrl+Enter` by default. Click `⌃↵` in the input controls to toggle to plain `Enter` mode.
- **Stop:** Click **Stop** to cancel an in-progress response at any time.
- **Edit a message:** Hover over any message → click **edit**. `Ctrl+Enter` saves, `Esc` cancels.
- **Delete a message:** Hover over any message → click **del**. This removes it from the session history.
- **Copy a response:** Hover over any assistant message → click **copy**.
- **New line while typing:** `Shift+Enter` in either mode. In `Ctrl+Enter` mode, plain `Enter` also inserts a new line; in Enter mode, plain `Enter` sends.

Each assistant response shows a small **model tag** in the bottom-right corner identifying which model and host responded.

---

## Agent Mode

Click the **Agent** button in the input row to enable Agent mode. The button highlights and Send changes to **Run**.

In Agent mode, messages are routed through the **orchestrator** instead of directly to the chat model:

1. The **orchestrator model** runs a tool loop — searches the web, reads files, checks tasks, calls APIs as needed
2. It produces an enriched summary of what it found
3. The **responder model** receives that context and writes the final user-facing reply
4. A `⚡ N tool calls: …` note appears below the response listing what was used
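
The four steps above can be sketched as a small pipeline. This is a minimal illustration, not the Cortex implementation: the function names (`run_agent`, `next_action`, `summarize`, `respond`), the tool registry, and the step limit are all hypothetical, and the models are stood in for by plain callables.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical signatures, chosen only for this sketch.
Tool = Callable[[str], str]
# Returns (tool_name, argument), or None when the orchestrator is finished.
NextAction = Callable[[str, list], Optional[tuple]]


@dataclass
class AgentResult:
    reply: str
    tool_calls: list = field(default_factory=list)

    def note(self) -> str:
        # Mirrors the "N tool calls" note; the exact UI wording may differ.
        return f"⚡ {len(self.tool_calls)} tool calls: {', '.join(self.tool_calls)}"


def run_agent(message: str, next_action: NextAction,
              summarize: Callable[[str, list], str],
              respond: Callable[[str, str], str],
              tools: dict, max_steps: int = 8) -> AgentResult:
    findings, calls = [], []
    # Step 1: the orchestrator picks tools until it has nothing left to look up.
    for _ in range(max_steps):
        action = next_action(message, findings)
        if action is None:
            break
        name, argument = action
        findings.append(f"{name}({argument!r}) -> {tools[name](argument)}")
        calls.append(name)
    # Step 2: the orchestrator condenses what it found.
    summary = summarize(message, findings)
    # Step 3: the responder writes the user-facing reply from that summary.
    reply = respond(message, summary)
    # Step 4: the list of tool calls travels with the reply for display.
    return AgentResult(reply=reply, tool_calls=calls)
```

Keeping the orchestrator and responder as separate callables reflects the split described above: a tool-capable model gathers context, and the chat model only has to write the reply.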

Which model acts as orchestrator is set in **Settings → Models → Role Assignments → Orchestrator**. By default this is Gemini API; a capable local model can be assigned instead.

Agent mode is best for tasks that require research, multi-step reasoning, or tool use (e.g. "search for X", "add a task", "what's on my list?"). Regular chat is faster for conversational turns.

Agent mode sessions persist to history exactly like regular chat.

---

## Sessions

Sessions are named conversation threads that persist across page refreshes.

- Click **Sessions** → **+ New** to start a fresh session.
- Click any listed session to resume it — full history loads instantly.
- Sessions from Nextcloud Talk appear with `nct_*` IDs.
- A blue **●** badge appears on the Sessions button when Talk activity arrives in a session you're not currently viewing.

---

## Notes

Notes are injected into a session without triggering an LLM response.

- Click **Note** to toggle note mode. The input border changes colour.
- **Private note** (amber border) — visible only in the UI, never sent to the LLM.
- **Context note** (teal border) — persisted to session history so the LLM sees it on the next turn. Useful for nudging context without a full message.
- Click the `private / public` label to switch between note types; `public` selects a context note.

---

## Backends

Three backends are available:

| Backend | What it is |
|---|---|
| **Claude** | Anthropic Claude via the Claude CLI (OAuth — no API key needed) |
| **Gemini** | Google Gemini via the Gemini CLI |
| **Local** | Any OpenAI-compatible endpoint (Open WebUI, Ollama, OpenRouter, etc.) |

The **⚙ Backend** toggle cycles: **auto → claude → gemini → local → auto**

- **auto** uses the model assigned to the `chat` role in your Model Registry (recommended)
- Selecting a specific backend forces that backend for all messages, regardless of role assignments
- The active model label appears below the toggle button when a specific backend is active

If the active backend fails, a fallback is tried automatically. A **⚡** badge appears on the response when this happens.

Each response shows a **model tag** (bottom-right of message) with the model label and host, so you always know what responded.

---

## Model Registry ( Settings → Models )

The Model Registry is where you configure which AI models are available and which handles each task type.

**Navigate to:** Settings (top-right of any page) → **Models**

### Cloud Providers

**Anthropic** — Claude is accessed via the Claude Code CLI. No API key setup needed in the registry — just make sure you're authenticated (`claude auth login` in a terminal).

**Google** — Gemini models use the Gemini API with an explicit API key:
1. Settings → Models → Cloud Providers → Google → **Add account**
2. Enter a label (e.g. "Work", "Personal") and your Gemini API key
3. Get a free key at [aistudio.google.com](https://aistudio.google.com/apikey); the free tier is sufficient for most uses
4. Add one account per Google account you want to use

### Local Hosts

For Open WebUI, Ollama, LM Studio, OpenRouter, or any OpenAI-compatible server:
1. Settings → Models → Local Hosts → expand **Add host**
2. Enter a label, the API URL (e.g. `http://192.168.1.100:3000`), and an optional API key
3. Choose **Type**: Open WebUI / Ollama, or OpenAI-compatible (OpenRouter, etc.)
4. Save, then use **Fetch models** on the host card to verify connectivity

### Adding a Model

Settings → Models → Add Model. Use the tabs to select a provider:

| Tab | How it works |
|---|---|
| **Local** | Select a host → enter model name (or use "Fetch from host" to pick from a live list) |
| **Google** | Pick a Gemini model from the catalog → select which Google account to use |
| **Anthropic** | Pick a Claude model from the catalog → uses your CLI OAuth session |

Enter a label (auto-filled from the catalog), context window size, and optional tags. Click **Add Model**.

### Role Assignments

Once models are added, assign them to task types at the bottom of Settings → Models:

| Role | Used for |
|---|---|
| **Chat** | Regular conversation — the main chat model |
| **Orchestrator** | Agent mode tool loop |
| **Distill** | Memory distillation (short/mid/long) |
| **Coder** | Code-focused tasks |
| **Research** | Research and long-context tasks |

Each role has **Primary**, **Backup 1**, and **Backup 2** slots. If Primary fails or is unreachable, Backup 1 is tried, then Backup 2. Changes are saved as soon as you select a model.

Leave all slots empty to use the server-default model (configured in `.env`).
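
The slot order can be pictured as a simple fallback chain. The sketch below is an assumption about how such a chain might look, not the actual Cortex code: `call_with_fallback`, `AllSlotsFailed`, and the `call` callable are hypothetical names, and the third return value is only an illustrative flag.

```python
from typing import Callable, Optional, Sequence


class AllSlotsFailed(RuntimeError):
    """Raised when every configured slot (or the default) has failed."""


def call_with_fallback(slots: Sequence[Optional[str]],
                       call: Callable[[str], str],
                       default_model: str) -> tuple:
    """Try Primary, Backup 1, Backup 2 in order.

    Returns (model_used, reply, used_fallback). Empty slots are skipped;
    if every slot is empty, the server-default model is used instead.
    """
    candidates = [model for model in slots if model] or [default_model]
    errors = []
    for index, model in enumerate(candidates):
        try:
            return model, call(model), index > 0
        except Exception as exc:  # any failure moves on to the next slot
            errors.append(f"{model}: {exc}")
    raise AllSlotsFailed("; ".join(errors))
```

For example, with slots `["primary-model", "backup-one", None]` and a `call` that raises for `primary-model`, the function returns the reply from `backup-one` with the flag set to `True`.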

---

## Nextcloud Talk Bot

Inara is registered as a bot in Nextcloud Talk.

- Messages sent in enabled Talk conversations are received by Cortex, processed, and replied to.
- The webhook returns `200 OK` immediately; the reply happens asynchronously.
- Real-time updates stream to the web UI via SSE — you see Talk messages and responses appear live.
- To enable the bot in a conversation: open Talk conversation settings → Bots → enable the bot.
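
The "acknowledge now, reply later" behaviour described above is a common webhook pattern. A minimal sketch with `asyncio` follows; the handler name, the payload shape (`{"message": ...}`), and the `process` coroutine are assumptions for illustration and do not reflect the real Talk payload or the Cortex handler.

```python
import asyncio
from typing import Awaitable, Callable


async def handle_webhook(payload: dict,
                         process: Callable[[dict], Awaitable[None]],
                         background: set) -> int:
    """Schedule the slow work and return the HTTP status right away."""
    task = asyncio.create_task(process(payload))
    # Keep a reference so the task is not garbage-collected before it finishes.
    background.add(task)
    task.add_done_callback(background.discard)
    return 200  # the caller gets its acknowledgement before any LLM work starts
```

Returning before the reply is generated keeps the webhook call short, which matters because generating a response can take far longer than a sender is typically willing to wait.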

---

## Google Chat Bot

Inara is available as a bot in Google Chat (One Sky IT Workspace).

- Send Inara a direct message in Google Chat to start a conversation.
- Each DM thread is its own session (`gc_spaces/*` prefix) — history persists across messages.
- Responses are synchronous — Google Chat displays the reply directly in the thread.
- To add Inara to a space: open the space, add a person/app, search for **Inara**.
- Sessions from Google Chat appear with `gc_*` IDs in the Sessions panel.

---

## Files (Identity Editor)

The **Files** button opens an editor for your persona's identity and memory files:

| File | Purpose |
|---|---|
| `SOUL.md` | Core personality, values, and voice |
| `IDENTITY.md` | Role, capabilities, and context |
| `USER.md` | Your profile, preferences, and history |
| `PROTOCOLS.md` | Behavioural rules and communication protocols |
| `CONTEXT_TIERS.md` | Defines what gets loaded at each context tier |
| `MEMORY_LONG.md` | Permanent curated long-term memory |
| `MEMORY_MID.md` | Rolling mid-term digest (LLM-distilled) |
| `MEMORY_SHORT.md` | Recent session rollup (auto-aggregated) |
| `TASKS.json` | Personal task list (managed via Agent mode) |
| `HELP.md` | This file |

Toggle **preview** / **edit** to switch between rendered markdown and raw text. **Ctrl+S** saves, **Esc** closes.

---

## Context & Memory ( ⚙ panel )

### Context Tiers

Controls how much context is prepended to each LLM call:

| Tier | Loads | ~Tokens |
|---|---|---|
| **T1** | SOUL + IDENTITY + USER summary | ~1,500 |
| **T2** | + USER full + PROTOCOLS + HELP + memory layers | ~5,000 |
| **T3** | + last 2 raw session logs | ~15,000 |
| **T4** | + last 7 raw session logs | ~50,000 |

Default is T2. Use T1 for small/local models. Use T3–T4 for complex multi-session tasks.
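
The tier table can be read as a cumulative lookup. The sketch below encodes it under two assumptions that the table does not state outright: that "USER full" replaces the "USER summary" from T2 upward, and that T4's seven session logs replace T3's two rather than adding to them. The function names are hypothetical.

```python
# Approximate token counts, copied from the table above.
APPROX_TOKENS = {1: 1_500, 2: 5_000, 3: 15_000, 4: 50_000}


def context_for_tier(tier: int) -> list:
    """Return the pieces of context loaded at a given tier."""
    if tier not in APPROX_TOKENS:
        raise ValueError(f"unknown tier: {tier}")
    parts = ["SOUL", "IDENTITY"]
    # Assumption: the full USER file supersedes the summary from T2 upward.
    parts.append("USER summary" if tier == 1 else "USER full")
    if tier >= 2:
        parts += ["PROTOCOLS", "HELP", "memory layers"]
    if tier >= 3:
        # Assumption: T4 loads 7 logs in place of T3's 2, not in addition.
        parts.append(f"last {2 if tier == 3 else 7} raw session logs")
    return parts


def highest_tier_within(budget_tokens: int) -> int:
    """Pick the largest tier whose approximate size fits the budget (minimum T1)."""
    fitting = [tier for tier, tokens in APPROX_TOKENS.items() if tokens <= budget_tokens]
    return max(fitting, default=1)
```

A helper like `highest_tier_within` illustrates the advice above: a model with a small context window lands on T1 or T2, while a long-context model can afford T3 or T4.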

### Memory Layers

Three independently toggleable memory files, loaded **Long → Mid → Short**:

| Layer | File | Contents |
|---|---|---|
| **Long** | `MEMORY_LONG.md` | Permanent facts — origin, key decisions, profile highlights |
| **Mid** | `MEMORY_MID.md` | Rolling digest of recent weeks — LLM-distilled from Short |
| **Short** | `MEMORY_SHORT.md` | Recent session rollup — auto-aggregated from session logs |

Toggle any layer off to save tokens for a focused conversation.

### Memory Distillation

Distillation builds up the memory layers from raw session logs. It runs automatically on a schedule and can also be triggered manually from the ⚙ panel:

| Button | What it does |
|---|---|
| **short** | Rolls recent session log files → `MEMORY_SHORT.md` (fast, no LLM) |
| **mid** | LLM summarizes `MEMORY_SHORT.md` → `MEMORY_MID.md` |
| **long** | LLM integrates `MEMORY_MID.md` → `MEMORY_LONG.md` |
| **all** | Runs short → mid → long in sequence |

**Recommended workflow:** run **short** after any productive session; **mid** weekly; **long** monthly.
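
For scripting the same steps, the distillation endpoints listed in the API Reference can be called directly. The sketch below only builds the requests; the base URL is an assumption, the helper names are hypothetical, and any authentication your deployment requires is not shown.

```python
import urllib.request

# Source, target, and whether an LLM is involved, as described in the table above.
DISTILL_FLOW = {
    "short": ("session logs", "MEMORY_SHORT.md", False),
    "mid": ("MEMORY_SHORT.md", "MEMORY_MID.md", True),
    "long": ("MEMORY_MID.md", "MEMORY_LONG.md", True),
}


def expand_button(button: str) -> list:
    """Return the individual steps a button runs, in order."""
    if button == "all":
        return list(DISTILL_FLOW)  # short, then mid, then long
    if button not in DISTILL_FLOW:
        raise ValueError(f"unknown distillation step: {button}")
    return [button]


def distill_request(button: str, base_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for a distillation button."""
    if button != "all" and button not in DISTILL_FLOW:
        raise ValueError(f"unknown distillation step: {button}")
    return urllib.request.Request(f"{base_url}/distill/{button}", method="POST")
```

To send a request, pass the result to `urllib.request.urlopen`, for example `urllib.request.urlopen(distill_request("short", "http://localhost:8000"))`, where the host and port are placeholders for your own server.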

---

## Keyboard Shortcuts

| Keys | Action |
|---|---|
| `Ctrl+Enter` | Send message (default mode) |
| `Enter` | Send (when in Enter mode) |
| `Shift+Enter` | New line in message input |
| `Ctrl+Enter` | Save inline message edit |
| `Esc` | Cancel inline edit / close any open modal |
| `Ctrl+S` | Save file (Files modal) |

---

## API Reference

For direct access or scripting:

| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/chat` | Send a message — returns SSE stream |
| `GET` | `/backend` | Get current primary/fallback backends |
| `POST` | `/backend` | Set primary backend (`{"primary": "claude"}`) |
| `GET` | `/sessions` | List all sessions |
| `GET` | `/history/{id}` | Get session message history |
| `PUT` | `/history/{id}` | Replace full session history |
| `GET` | `/events` | SSE stream for real-time Talk activity |
| `POST` | `/note` | Inject a context note into a session |
| `GET` | `/files` | List identity files |
| `GET` | `/files/{name}` | Read a file |
| `PUT` | `/files/{name}` | Write a file |
| `POST` | `/distill/short` | Aggregate session logs → MEMORY_SHORT |
| `POST` | `/distill/mid` | Summarize short → MEMORY_MID (LLM) |
| `POST` | `/distill/long` | Integrate mid → MEMORY_LONG (LLM) |
| `POST` | `/distill/all` | Run all three distillation steps |
| `GET` | `/distill/status` | Scheduler status and next run times |
| `POST` | `/orchestrate` | Submit an agent task — returns `{"job_id": "..."}` |
| `GET` | `/orchestrate/{job_id}` | Poll job status and result |
| `GET` | `/settings/models` | Model registry UI |
| `POST` | `/api/models/role` | Set a role assignment (JSON body) |
| `GET` | `/health` | Health check — returns `{"status": "ok"}` |
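
The two `/orchestrate` endpoints form a submit-then-poll pair. The sketch below shows one way to drive them from a script. Only the endpoints and the `job_id` field come from the table; the base URL, the request body shape (`{"message": ...}`), and the `status` values checked by `is_finished` are assumptions, so adjust them to what your server actually returns.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: replace with your Cortex address


def submit_job(message: str, base_url: str = BASE_URL) -> str:
    """POST /orchestrate and return the job ID."""
    body = json.dumps({"message": message}).encode("utf-8")  # assumed body shape
    request = urllib.request.Request(
        f"{base_url}/orchestrate", data=body,
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.load(response)["job_id"]


def fetch_job(job_id: str, base_url: str = BASE_URL) -> dict:
    """GET /orchestrate/{job_id} and return the decoded JSON."""
    with urllib.request.urlopen(f"{base_url}/orchestrate/{job_id}") as response:
        return json.load(response)


def wait_for_job(job_id: str, fetch=fetch_job,
                 is_finished=lambda job: job.get("status") in ("done", "error"),
                 interval: float = 1.0, timeout: float = 120.0) -> dict:
    """Poll until is_finished(job) is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id)
        if is_finished(job):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
        time.sleep(interval)
```

Typical use would be `wait_for_job(submit_job("what's on my list?"))`. Passing `fetch` and `is_finished` as parameters keeps the polling loop independent of the exact response format.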

Chat request body (`POST /chat`). String values such as `"string | null"` describe the accepted types and are not literal values:
```json
{
  "message": "string",
  "session_id": "string | null",
  "tier": 2,
  "model": "claude | gemini | local | null",
  "include_long": true,
  "include_mid": true,
  "include_short": true
}
```
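
A script can build this body and read the streamed response with the standard library alone. The following is a minimal sketch: the field names come from the request body above, while the base URL and the helper names are assumptions. The content of each SSE `data:` payload is not documented here, so the sketch yields it as raw text.

```python
import json
import urllib.request
from typing import Iterable, Iterator, Optional


def build_chat_payload(message: str, session_id: Optional[str] = None,
                       tier: int = 2, model: Optional[str] = None,
                       include_long: bool = True, include_mid: bool = True,
                       include_short: bool = True) -> dict:
    """Assemble the JSON body for POST /chat."""
    return {
        "message": message,
        "session_id": session_id,
        "tier": tier,
        "model": model,  # "claude", "gemini", "local", or None
        "include_long": include_long,
        "include_mid": include_mid,
        "include_short": include_short,
    }


def iter_sse_data(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the data payload of each server-sent event in a line stream."""
    buffer = []
    for raw in lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            buffer.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and buffer:  # a blank line ends the current event
            yield "\n".join(buffer)
            buffer = []
    if buffer:  # lenient: also emit a final event that lacks a trailing blank line
        yield "\n".join(buffer)


def stream_chat(message: str, base_url: str, **options) -> Iterator[str]:
    """POST /chat and yield each SSE data payload as it arrives."""
    body = json.dumps(build_chat_payload(message, **options)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat", data=body, method="POST",
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"})
    with urllib.request.urlopen(request) as response:
        yield from iter_sse_data(response)
```

For example, `for chunk in stream_chat("Hello", "http://localhost:8000", tier=1): print(chunk)` would print each event payload, with the host and port standing in for your own server.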

---

*Cortex is a self-hosted personal AI platform. Named after the 'verse-wide communications network in Firefly.*