
Scott Idem 6e56024815 fix: settings page and help docs updated for model registry V2

settings.html:
- Remove Gemini API Key section (keys now managed in Model Registry)
- Rename "Local Models" → "Model Registry" with updated description
  covering all providers (Anthropic, Google, local hosts)
- Update button text: "Manage local models" → "Manage models"

settings.py: remove dead gemini_key template variable lookups

HELP.md:
- Fix navigation path: ☰ → Account → Model Registry → Manage models
- Restructure Model Registry section as ordered steps (1: providers/hosts,
  2: add models, 3: assign roles) so dependency order is clear
- Add explicit note that accounts/hosts must exist before adding models

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-27 21:07:05 -04:00


Cortex UI — Help & Reference

Last updated: 2026-04-27

Header Controls

Button	What it does
Sessions	Open the sessions panel — list, resume, or start sessions
Files	Open the identity file editor (SOUL, MEMORY, etc.)
⚙ N	Open the Settings panel (N = current context tier)
?	Open this help panel

The ⚙ Settings panel contains all configuration options:

Section	Controls
Context Tier	T1 – T4 context depth
Memory Layers	Toggle Long / Mid / Short memory on/off
Distill Memory	Manually trigger short / mid / long / all distillation
Backend	Active LLM backend; click to cycle: auto → claude → gemini → local → auto
Display	Aa/A+/A− font size cycle · ☾/☀ theme toggle

All header settings (theme, font size, tier, memory layers) persist in localStorage across page refreshes.

Chat

Send: Ctrl+Enter by default. Click ⌃↵ in the input controls to toggle to plain Enter mode.
Stop: Click Stop to cancel an in-progress response at any time.
Edit a message: Hover over any message → click edit. Ctrl+Enter saves, Esc cancels.
Delete a message: Hover over any message → click del. Removes from session history.
Copy a response: Hover over any assistant message → click copy.
New line while typing: Enter or Shift+Enter (in Ctrl+Enter mode), or Shift+Enter (in Enter mode, where plain Enter sends).

Each assistant response shows a small model tag in the bottom-right corner identifying which model and host responded.

Agent Mode

Click the Agent button in the input row to enable Agent mode. The button highlights and Send changes to Run.

In Agent mode, messages are routed through the orchestrator instead of directly to the chat model:

The orchestrator model runs a tool loop — searches the web, reads files, checks tasks, calls APIs as needed
It produces an enriched summary of what it found
The responder model receives that context and writes the final user-facing reply
A ⚡ N tool calls: … note appears below the response listing what was used
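The two-stage flow above can be sketched in a few lines. This is a minimal illustration, not Cortex's actual implementation: `run_agent_turn`, `AgentResult`, `tool_note`, and the two callables are hypothetical names, and the real orchestrator and responder are assumed to be whatever models hold those roles.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class AgentResult:
    """Outcome of one Agent mode turn (hypothetical shape)."""
    reply: str
    tool_calls: List[str] = field(default_factory=list)


def run_agent_turn(
    message: str,
    orchestrate: Callable[[str], Tuple[str, List[str]]],
    respond: Callable[[str, str], str],
) -> AgentResult:
    # Stage 1: the orchestrator runs its tool loop and returns an enriched
    # summary plus the names of the tools it used.
    summary, tool_calls = orchestrate(message)
    # Stage 2: the responder writes the user-facing reply from that context.
    reply = respond(message, summary)
    return AgentResult(reply=reply, tool_calls=tool_calls)


def tool_note(result: AgentResult) -> str:
    # Mirrors the "⚡ N tool calls: …" note shown below a response.
    if not result.tool_calls:
        return ""
    return f"⚡ {len(result.tool_calls)} tool calls: {', '.join(result.tool_calls)}"
```

Passing the two stages in as callables keeps the sketch independent of any particular model or tool set.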

Which model acts as orchestrator is set in the Model Registry under Role Assignments → Orchestrator (see the Model Registry section for the navigation path). By default this is Gemini API; a capable local model can be assigned instead.

Agent mode is best for tasks that require research, multi-step reasoning, or tool use (e.g. "search for X", "add a task", "what's on my list?"). Regular chat is faster for conversational turns.

Agent mode sessions persist to history exactly like regular chat.

Sessions

Sessions are named conversation threads that persist across page refreshes.

Click Sessions → + New to start a fresh session.
Click any listed session to resume it — full history loads instantly.
Sessions from Nextcloud Talk appear as nct_* prefixed IDs.
A blue ● badge appears on the Sessions button when Talk activity arrives in a session you're not currently viewing.

Notes

Notes are injected into a session without triggering an LLM response.

Click Note to toggle note mode. The input border changes colour.
Private note (amber border) — visible only in the UI, never sent to the LLM.
Context note (teal border) — persisted to session history so the LLM sees it on the next turn. Useful for nudging context without a full message.
Click the private / public label to switch between note types.

Backends

Three backends are available:

Backend	What it is
Claude	Anthropic Claude via the Claude CLI (OAuth — no API key needed)
Gemini	Google Gemini via the Gemini CLI
Local	Any OpenAI-compatible endpoint (Open WebUI, Ollama, OpenRouter, etc.)

The ⚙ Backend toggle cycles: auto → claude → gemini → local → auto

auto uses the model assigned to the chat role in your Model Registry (recommended)
Selecting a specific backend forces that backend for all messages, regardless of role assignments
The active model label appears below the toggle button when a specific backend is active
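The toggle behaviour described above amounts to a fixed cycle plus one rule for `auto`. The sketch below is illustrative only; the function names are hypothetical, and `chat_role_backend` stands in for whatever the chat role resolves to in the Model Registry.

```python
BACKEND_CYCLE = ("auto", "claude", "gemini", "local")


def next_backend(current: str) -> str:
    """Return the backend the toggle moves to after `current`."""
    index = BACKEND_CYCLE.index(current)  # raises ValueError for unknown names
    return BACKEND_CYCLE[(index + 1) % len(BACKEND_CYCLE)]


def effective_backend(selected: str, chat_role_backend: str) -> str:
    """`auto` defers to the chat role; any other choice is forced."""
    return chat_role_backend if selected == "auto" else selected
```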

If the active backend fails, a fallback is tried automatically. A ⚡ badge appears on the response when this happens.

Each response shows a model tag (bottom-right of message) with the model label and host, so you always know what responded.

Model Registry

Configure which AI models are available and which handles each task type.

Navigate to: ☰ (top-right menu) → Account → scroll to Model Registry → click Manage models

Step 1 — Set up providers and hosts

Do this before adding models — models need a provider account or local host to attach to.

Anthropic (Claude): Nothing to configure. Claude uses your existing CLI OAuth session. If Claude isn't working, run claude auth login in a terminal.

Google (Gemini): Add one entry per API key you want to use:

Scroll to Cloud Providers → Google → click + Add Google account
Enter a label (e.g. "Work", "Personal") and your API key
Get a free key at aistudio.google.com/apikey

Local hosts (Open WebUI, Ollama, OpenRouter, etc.):

Scroll to Local Hosts → click + Add host to expand the form
Enter a label, the API URL (e.g. http://192.168.1.100:3000), and optional API key
Set Type: Open WebUI / Ollama, or OpenAI-compatible (for OpenRouter, LM Studio, etc.)
Click Fetch models on the saved host card to verify connectivity

Step 2 — Add models

Scroll to Add Model. Select the provider tab, fill in the details, click Add Model:

Tab	What you need
Local	Select a host (from Step 1) → enter model name, or use Fetch from host to pick from a live list
Google	Select a Gemini model from the catalog → select a Google account (from Step 1)
Anthropic	Select a Claude model from the catalog → uses your CLI session automatically

The label and context window size auto-fill from the catalog — edit them if you want. Tags are optional.

Step 3 — Assign models to roles

Scroll to Role Assignments at the bottom of the page. Each role has Primary, Backup 1, and Backup 2 slots — Primary is tried first, then backups in order. Changes save automatically.

Role	Used for
Chat	Regular conversation
Orchestrator	Agent mode tool loop
Distill	Memory distillation (short / mid / long)
Coder	Code-focused tasks
Research	Long-context research tasks

Leave all slots empty to use the server default.
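The slot order can be pictured as a simple fallback chain. The following is a sketch under assumptions, not the server's code: `call_role`, the slot keys, and the `call_model` callable are hypothetical, and any exception is treated as a failure that moves on to the next slot.

```python
from typing import Callable, Dict, Optional, Tuple

SLOTS = ("primary", "backup1", "backup2")  # hypothetical key names


def call_role(
    role_slots: Dict[str, Optional[str]],
    call_model: Callable[[str], str],
    server_default: str,
) -> Tuple[str, str, bool]:
    """Try Primary, then Backup 1, then Backup 2.

    Returns (reply, model_used, used_fallback). With every slot empty,
    the server default is used instead.
    """
    candidates = [role_slots.get(slot) for slot in SLOTS]
    candidates = [model for model in candidates if model]
    if not candidates:
        candidates = [server_default]

    last_error: Optional[Exception] = None
    for position, model in enumerate(candidates):
        try:
            # used_fallback is True for anything after the first candidate,
            # which is when the UI would show the ⚡ badge.
            return call_model(model), model, position > 0
        except Exception as exc:  # assumption: any error triggers fallback
            last_error = exc
    raise RuntimeError("all assigned models failed") from last_error
```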

Nextcloud Talk Bot

Inara is registered as a bot in Nextcloud Talk.

Messages sent in enabled Talk conversations are received by Cortex, processed, and replied to.
The webhook returns 200 OK immediately; the reply happens asynchronously.
Real-time updates stream to the web UI via SSE — you see Talk messages and responses appear live.
To enable the bot in a conversation: open Talk conversation settings → Bots → enable the bot.

Google Chat Bot

Inara is available as a bot in Google Chat (One Sky IT Workspace).

Send Inara a direct message in Google Chat to start a conversation.
Each DM thread is its own session (gc_spaces/* prefix) — history persists across messages.
Responses are synchronous — Google Chat displays the reply directly in the thread.
To add Inara to a space: open the space, add a person/app, search for Inara.
Sessions from Google Chat appear as gc_* prefixed IDs in the Sessions panel.
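When scripting against the session list, the ID prefixes mentioned above are enough to tell where a session came from. This is a small sketch; the function name and the returned labels are hypothetical, and treating every other ID as a web UI session is an assumption.

```python
def session_origin(session_id: str) -> str:
    """Classify a session ID by the prefixes described in this help file."""
    if session_id.startswith("nct_"):
        return "nextcloud-talk"
    if session_id.startswith("gc_"):
        return "google-chat"
    # Assumption: anything without a known prefix was started in the web UI.
    return "web"
```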

Files (Identity Editor)

The Files button opens an editor for your persona's identity and memory files:

File	Purpose
`SOUL.md`	Core personality, values, and voice
`IDENTITY.md`	Role, capabilities, and context
`USER.md`	Your profile, preferences, and history
`PROTOCOLS.md`	Behavioural rules and communication protocols
`CONTEXT_TIERS.md`	Defines what gets loaded at each context tier
`MEMORY_LONG.md`	Permanent curated long-term memory
`MEMORY_MID.md`	Rolling mid-term digest (LLM-distilled)
`MEMORY_SHORT.md`	Recent session rollup (auto-aggregated)
`TASKS.json`	Personal task list (managed via Agent mode)
`HELP.md`	This file

Toggle preview / edit to switch between rendered markdown and raw text. Ctrl+S saves, Esc closes.

Context & Memory (⚙ panel)

Context Tiers

Controls how much context is prepended to each LLM call:

Tier	Loads	~Tokens
T1	SOUL + IDENTITY + USER summary	~1,500
T2	+ USER full + PROTOCOLS + HELP + memory layers	~5,000
T3	+ last 2 raw session logs	~15,000
T4	+ last 7 raw session logs	~50,000

Default is T2. Use T1 for small/local models. Use T3–T4 for complex multi-session tasks.
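As a rough guide to the advice above, the approximate token counts in the table can be compared against a model's context window. This is a sketch only: `pick_tier` is a hypothetical helper, and the 4,000-token reserve for the conversation and the reply is an assumed figure, not a Cortex setting.

```python
# Approximate prepended tokens per tier, taken from the table above.
TIER_TOKENS = {1: 1_500, 2: 5_000, 3: 15_000, 4: 50_000}


def pick_tier(context_window: int, reserve: int = 4_000) -> int:
    """Return the deepest tier whose estimate fits the remaining budget.

    `reserve` is the room kept free for the chat itself (an assumption).
    Falls back to T1 when nothing fits.
    """
    budget = context_window - reserve
    fitting = [tier for tier, tokens in TIER_TOKENS.items() if tokens <= budget]
    return max(fitting) if fitting else 1
```

With these assumptions, an 8,192-token local model lands on T1, while a 200,000-token model has room for T4.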

Memory Layers

Three independently toggleable memory files, loaded Long → Mid → Short:

Layer	File	Contents
Long	`MEMORY_LONG.md`	Permanent facts — origin, key decisions, profile highlights
Mid	`MEMORY_MID.md`	Rolling digest of recent weeks — LLM-distilled from Short
Short	`MEMORY_SHORT.md`	Recent session rollup — auto-aggregated from session logs

Toggle any layer off to save tokens for a focused conversation.
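The load order and the toggles combine in a straightforward way, sketched below. `layers_to_load` is a hypothetical helper, and treating a missing toggle as "on" is an assumption.

```python
from typing import Dict, List

LAYER_ORDER = ("long", "mid", "short")  # loaded Long → Mid → Short


def layers_to_load(enabled: Dict[str, bool]) -> List[str]:
    """Return the memory files to load, in order, for the given toggles."""
    return [
        f"MEMORY_{name.upper()}.md"
        for name in LAYER_ORDER
        if enabled.get(name, True)  # assumption: layers default to on
    ]
```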

Memory Distillation

Distillation builds up the memory layers from raw session logs. It runs automatically on a schedule and can also be triggered manually from the ⚙ panel:

Button	What it does
short	Rolls recent session log files → `MEMORY_SHORT.md` (fast, no LLM)
mid	LLM summarizes `MEMORY_SHORT.md` → `MEMORY_MID.md`
long	LLM integrates `MEMORY_MID.md` → `MEMORY_LONG.md`
all	Runs short → mid → long in sequence

Recommended workflow: run short after any productive session; mid weekly; long monthly.
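The recommended cadence can be written as a small decision helper. This is an illustration, not the built-in scheduler: `due_steps` is hypothetical, and "monthly" is approximated as 30 days.

```python
from datetime import datetime, timedelta
from typing import List

MID_INTERVAL = timedelta(days=7)    # "weekly"
LONG_INTERVAL = timedelta(days=30)  # "monthly", approximated as 30 days


def due_steps(
    now: datetime,
    last_mid: datetime,
    last_long: datetime,
    session_ended: bool = True,
) -> List[str]:
    """Return the distillation steps to run, in short → mid → long order."""
    steps: List[str] = []
    if session_ended:
        steps.append("short")
    if now - last_mid >= MID_INTERVAL:
        steps.append("mid")
    if now - last_long >= LONG_INTERVAL:
        steps.append("long")
    return steps
```

The order matters because each layer is distilled from the one before it, so `short` should finish before `mid`, and `mid` before `long`.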

Keyboard Shortcuts

Keys	Action
`Ctrl+Enter`	Send message (default mode)
`Enter`	Send (when in Enter mode)
`Shift+Enter`	New line in message input
`Ctrl+Enter`	Save inline message edit
`Esc`	Cancel inline edit / close any open modal
`Ctrl+S`	Save file (Files modal)

API Reference

For direct access or scripting:

Method	Endpoint	Description
`POST`	`/chat`	Send a message — returns SSE stream
`GET`	`/backend`	Get current primary/fallback backends
`POST`	`/backend`	Set primary backend (`{"primary": "claude"}`)
`GET`	`/sessions`	List all sessions
`GET`	`/history/{id}`	Get session message history
`PUT`	`/history/{id}`	Replace full session history
`GET`	`/events`	SSE stream for real-time Talk activity
`POST`	`/note`	Inject a context note into a session
`GET`	`/files`	List identity files
`GET`	`/files/{name}`	Read a file
`PUT`	`/files/{name}`	Write a file
`POST`	`/distill/short`	Aggregate session logs → MEMORY_SHORT
`POST`	`/distill/mid`	Summarize short → MEMORY_MID (LLM)
`POST`	`/distill/long`	Integrate mid → MEMORY_LONG (LLM)
`POST`	`/distill/all`	Run all three distillation steps
`GET`	`/distill/status`	Scheduler status and next run times
`POST`	`/orchestrate`	Submit an agent task — returns `{"job_id": "..."}`
`GET`	`/orchestrate/{job_id}`	Poll job status and result
`GET`	`/settings/models`	Model registry UI
`POST`	`/api/models/role`	Set a role assignment (JSON body)
`GET`	`/health`	Health check — returns `{"status": "ok"}`
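Because `/orchestrate` returns a job ID that is then polled, a script needs some kind of polling loop. The helper below is a generic sketch: `poll_job` and its parameters are hypothetical, and the shape of the job status response is not documented here, so the caller supplies both the fetch function and the "is it done" check.

```python
import time
from typing import Callable


def poll_job(
    fetch: Callable[[], dict],
    is_done: Callable[[dict], bool],
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call `fetch` until `is_done(job)` is true or `timeout` seconds pass.

    `fetch` would typically GET /orchestrate/{job_id} and decode the JSON.
    """
    deadline = clock() + timeout
    while True:
        job = fetch()
        if is_done(job):
            return job
        if clock() >= deadline:
            raise TimeoutError("job did not finish in time")
        sleep(interval)
```

A caller might pass `is_done=lambda job: job.get("status") == "done"`, where the `status` field is an assumption about the response, not something this reference confirms.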

Chat request body (POST /chat):

{
  "message": "string",
  "session_id": "string | null",
  "tier": 2,
  "model": "claude | gemini | local | null",
  "include_long": true,
  "include_mid": true,
  "include_short": true
}
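A scripted call to `/chat` could look roughly like the sketch below, which uses only the Python standard library. The base URL is a placeholder, the helper names are hypothetical, and the stream is assumed to follow the standard SSE `data:` framing; what each payload contains is not specified here, so it is returned as raw text. Passing `null` for `session_id` or `model` presumably starts a new session or uses the default backend, but that is an inference from the schema above.

```python
import json
import urllib.request
from typing import Iterable, Iterator, Optional

BASE_URL = "http://localhost:8000"  # placeholder; use your Cortex address


def build_chat_request(
    message: str,
    session_id: Optional[str] = None,
    tier: int = 2,
    model: Optional[str] = None,
    include_long: bool = True,
    include_mid: bool = True,
    include_short: bool = True,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) a POST /chat request with the documented body."""
    body = {
        "message": message,
        "session_id": session_id,
        "tier": tier,
        "model": model,
        "include_long": include_long,
        "include_mid": include_mid,
        "include_short": include_short,
    }
    return urllib.request.Request(
        f"{base_url}/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event from an iterable of text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line ends the current event
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):  # comment / keep-alive line
            continue
        name, _, value = line.partition(":")
        if name == "data":
            buffer.append(value[1:] if value.startswith(" ") else value)
    if buffer:  # lenient: also emit a trailing event with no closing blank line
        yield "\n".join(buffer)
```

To send the request, pass the result of `build_chat_request(...)` to `urllib.request.urlopen`, decode each response line as UTF-8, and feed the decoded lines to `iter_sse_data`.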

Cortex is a self-hosted personal AI platform, named after the 'verse-wide communications network in Firefly.
