
Scott Idem a4daebdc9b feat: local LLM multi-model, session search, cron proactive types, notifications, docs overhaul

Local LLM:
- user_settings.py: per-user hosts/models config (local_llm.json)
- routers/local_llm.py + static/local_llm.html: dedicated settings page
- llm_client.py: local OpenAI-compatible backend via httpx
- config.py: LOCAL_API_URL/KEY/MODEL + per-backend timeouts
- Active model shown near backend toggle (amber hint text)

Memory distillation:
- memory_distiller.py: DISTILL_BACKEND_MID/LONG .env overrides
- scheduler.py + notification.py: notify NC Talk after mid/long distill
- notification.py: outbound channel abstraction (NC Talk, extensible)

Session search:
- routers/files.py: GET /sessions/search?q= with excerpts grouped by date
- static/index.html + app.js: search UI in file sidebar with highlight
- _esc() helper to prevent XSS in search results

Proactive cron:
- cron_runner.py: new job types — message (send directly) and brief (LLM + send)
- Both support optional per-job channel override

Channels:
- routers/nextcloud_talk.py: consolidated using notification._send_nct_message()
- routers/auth.py: local backend status in /auth/status
- routers/chat.py: /backend returns {primary, fallback, local_model} object

UI / UX:
- Copy button for user messages (matching assistant)
- Autocomplete disabled on sensitive form fields
- settings.html: local model section replaced with link to /settings/local

Docs overhaul:
- MASTER.md hub + ARCH__SYSTEM/BACKENDS/PERSONA/CHANNELS/FUTURE.md
- ARCH__Intelligence_Layer.md replaced with redirect table
- CORTEX.md trimmed to vision only; README updated
- OPEN_WEBUI_API.md added to docs/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-05 20:53:06 -04:00

4.7 KiB


Architecture: Persona System & Memory

How Inara (and other personas) know who they are and what they remember.

Last updated: 2026-04-03

Filesystem Layout

Each persona lives in home/{username}/persona/{name}/:

home/scott/persona/inara/
  IDENTITY.md       Who Inara is — role, name, origin
  SOUL.md           Values, personality, voice, what she cares about
  PROTOCOLS.md      Behavioral rules — how she responds, what she avoids
  CONTEXT_TIERS.md  Documents which files load at each tier
  USER.md           Scott's profile — loaded into context so she knows who she's talking to
  HELP.md           Persona-specific help content (appended to shared HELP.md in UI)
  MEMORY_SHORT.md   Recent session digest (auto-distilled daily)
  MEMORY_MID.md     Mid-term summary (auto-distilled weekly)
  MEMORY_LONG.md    Long-term memory (auto-distilled monthly)
  REMINDERS.md      Pending reminders (auto-surfaced at tier 2+)
  SCRATCH.md        Ephemeral scratchpad (read/write via tools)
  TASKS.json        Personal task list (managed via tools)
  CRONS.json        Scheduled jobs (managed via tools)
  sessions/         Session turn logs — YYYY-MM-DD.md, one file per day

ContextVars: persona.py sets _user and _persona ContextVars per request. Everything downstream calls persona_path() to resolve the right directory — no globals, no thread-local state.
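The per-request resolution described above can be sketched roughly as follows. This is a minimal illustration, not the project's persona.py: `set_request_context`, `HOME_ROOT`, and the exact signature of `persona_path()` are assumptions made for the example.

```python
from contextvars import ContextVar
from pathlib import Path

# Names mirror the description above; the real persona.py may differ.
_user = ContextVar("_user")
_persona = ContextVar("_persona")

HOME_ROOT = Path("home")  # assumed root of the per-user tree


def set_request_context(user, persona):
    """Bind the current request (hypothetical helper) to a user and persona."""
    _user.set(user)
    _persona.set(persona)


def persona_path(filename=""):
    """Resolve a path inside the current request's persona directory."""
    base = HOME_ROOT / _user.get() / "persona" / _persona.get()
    return base / filename if filename else base
```

Because ContextVar values are scoped to the running task or thread context, two concurrent requests for different personas each see their own directory without passing it through every call.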

Context Tiers

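Each chat request specifies a tier (default: 2). Tiers are cumulative: each tier loads everything from the tiers below it plus the files marked with + in its row. Higher tiers load more context, so they are slower but richer.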

Tier	Loaded Files	Use case
1	IDENTITY.md	Minimal — lightweight tasks
2	+ SOUL.md, PROTOCOLS.md, USER.md, MEMORY_SHORT.md, MEMORY_MID.md, REMINDERS.md	Standard chat
3	+ MEMORY_LONG.md, CONTEXT_TIERS.md	Deep sessions, long tasks
4	+ SCRATCH.md, TASKS.json	Full state — agent mode

context_loader.py assembles the system prompt from these files in order. The resulting prompt is passed to whichever LLM backend handles the request.
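As a rough sketch of the cumulative tier loading, the table can be expressed as data and walked in order. The function names, the section-header format, and the choice to skip missing files are assumptions for illustration; context_loader.py may assemble the prompt differently.

```python
from pathlib import Path

# Files added at each tier, copied from the table above.
TIER_FILES = {
    1: ["IDENTITY.md"],
    2: ["SOUL.md", "PROTOCOLS.md", "USER.md",
        "MEMORY_SHORT.md", "MEMORY_MID.md", "REMINDERS.md"],
    3: ["MEMORY_LONG.md", "CONTEXT_TIERS.md"],
    4: ["SCRATCH.md", "TASKS.json"],
}


def files_for_tier(tier: int = 2) -> list[str]:
    """Return every file loaded at `tier`, lower tiers first."""
    names = []
    for level in sorted(TIER_FILES):
        if level <= tier:
            names.extend(TIER_FILES[level])
    return names


def build_system_prompt(persona_dir: Path, tier: int = 2) -> str:
    """Concatenate the tier's files into one prompt string."""
    sections = []
    for name in files_for_tier(tier):
        path = persona_dir / name
        if path.is_file():  # assumption: missing files are skipped silently
            body = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {name}\n\n{body}")
    return "\n\n".join(sections)
```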

Memory Distillation

Three-tier rolling memory system, run by APScheduler:

sessions/YYYY-MM-DD.md  ← raw session logs (written by session_logger.py)
        ↓ daily 03:00
MEMORY_SHORT.md         ← recent session digest (no LLM — pure aggregation)
        ↓ weekly Sun 03:30
MEMORY_MID.md           ← concise summary (LLM)
        ↓ monthly 1st 04:00
MEMORY_LONG.md          ← integrated long-term memory (LLM)

Short distill — reads the most recent session files that fit within the token budget, writes them in chronological order. No LLM involved — fast and cheap.
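One way to picture the short distill is a newest-first selection under a budget, followed by a chronological write. The sketch below assumes a crude four-characters-per-token estimate and hypothetical function names; the real memory_distiller.py may count tokens and format output differently.

```python
from pathlib import Path


def estimate_tokens(text: str) -> int:
    # Crude stand-in (about 4 characters per token); not the project's counter.
    return max(1, len(text) // 4)


def select_sessions(logs: dict[str, str], budget: int) -> list[str]:
    """Pick the newest session logs that fit the budget, returned oldest first.

    `logs` maps a YYYY-MM-DD date string to that day's log text.
    """
    chosen = []
    used = 0
    for date in sorted(logs, reverse=True):  # ISO dates sort lexically
        cost = estimate_tokens(logs[date])
        if used + cost > budget:
            break
        chosen.append(date)
        used += cost
    return sorted(chosen)  # chronological order for writing


def distill_short(sessions_dir: Path, out_file: Path, budget: int = 3000) -> None:
    """Aggregate the selected session files into the short-memory file (no LLM)."""
    logs = {p.stem: p.read_text(encoding="utf-8") for p in sessions_dir.glob("*.md")}
    parts = [f"## {d}\n\n{logs[d].strip()}" for d in select_sessions(logs, budget)]
    out_file.write_text("\n\n".join(parts) + "\n", encoding="utf-8")
```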

Mid distill — LLM summarizes MEMORY_SHORT into a concise digest. Prompt asks for recurring themes, decisions, ongoing projects, Scott's current state and priorities. Written in first person as Inara.

Long distill — LLM integrates MEMORY_MID into MEMORY_LONG. Rules: preserve historical facts, update stale info, absorb new themes, remove irrelevant entries.

Distill notifications — after mid and long runs, notification.py sends a message to the user's configured NC Talk notification room (if notification_room is set in channels.json).

Controls in .env:

AUTO_DISTILL=true
AUTO_DISTILL_SHORT=true
AUTO_DISTILL_MID=true
AUTO_DISTILL_LONG=true          # enabled explicitly here; the default is off because the first run warrants manual review
DISTILL_BACKEND_MID=local       # use local model to save API credits
DISTILL_BACKEND_LONG=           # empty = primary backend (claude recommended)
MEMORY_BUDGET_SHORT=3000        # token budgets (soft caps)
MEMORY_BUDGET_MID=2000
MEMORY_BUDGET_LONG=2000
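A minimal sketch of how these settings might be interpreted is shown below. The helper names are hypothetical, and two behaviours are assumptions rather than documented facts: that AUTO_DISTILL acts as a master switch over the per-tier flags, and that the short and mid tiers default to on. Only the off-by-default long tier and the "empty means primary backend" rule come from the comments above.

```python
import os


def env_flag(name, default, env=None):
    """Read a boolean flag; unset or empty values fall back to `default`."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def distill_enabled(tier, env=None):
    # Assumption: AUTO_DISTILL gates every tier; SHORT/MID defaults are guesses.
    defaults = {"SHORT": True, "MID": True, "LONG": False}
    key = tier.upper()
    return env_flag("AUTO_DISTILL", True, env) and env_flag(
        f"AUTO_DISTILL_{key}", defaults[key], env
    )


def distill_backend(tier, primary, env=None):
    """Return the override backend for a tier, or the primary when empty."""
    env = os.environ if env is None else env
    override = env.get(f"DISTILL_BACKEND_{tier.upper()}", "").strip()
    return override or primary
```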

Manual distill via API:

POST /distill/short
POST /distill/mid
POST /distill/long
GET  /distill/status
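For scripting, the endpoints above could be wrapped with the standard library as in this sketch. The base URL is an assumption, the helper names are hypothetical, and any authentication the server requires is not shown.

```python
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: adjust to your deployment


def distill_request(tier, base_url=BASE_URL):
    """Build a POST request for /distill/{tier} without sending it."""
    if tier not in {"short", "mid", "long"}:
        raise ValueError(f"unknown distill tier: {tier}")
    return urllib.request.Request(f"{base_url}/distill/{tier}", data=b"", method="POST")


def status_request(base_url=BASE_URL):
    """Build a GET request for /distill/status without sending it."""
    return urllib.request.Request(f"{base_url}/distill/status", method="GET")
```

Passing the result to `urllib.request.urlopen()` sends the request; the response format is not documented here, so inspect it before relying on specific fields.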

Adding a New Persona

persona_template.py bootstraps a new persona directory from string templates. The onboarding flow (/setup/persona) calls this when a new user creates their first persona.

To add one manually:

1. Create home/{username}/persona/{name}/.
2. Copy and edit the files from an existing persona (e.g. home/scott/persona/inara/).
3. At minimum, provide IDENTITY.md, SOUL.md, PROTOCOLS.md, and USER.md.
4. Leave the MEMORY_*.md files out; the distiller creates them on its first run.
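The manual steps can be scripted along these lines. This is a sketch with hypothetical helper names, not a replacement for persona_template.py; it copies only the four required files and leaves the editing to you.

```python
import shutil
from pathlib import Path

# The minimum set named in the steps above.
REQUIRED_FILES = ("IDENTITY.md", "SOUL.md", "PROTOCOLS.md", "USER.md")


def persona_dir(root: Path, username: str, name: str) -> Path:
    """Build home/{username}/persona/{name}/ under the given root."""
    return root / username / "persona" / name


def missing_required(directory: Path) -> list[str]:
    """List required files that do not exist yet in `directory`."""
    return [f for f in REQUIRED_FILES if not (directory / f).is_file()]


def scaffold_from(source: Path, target: Path) -> list[str]:
    """Copy required files from an existing persona; return what is still missing."""
    target.mkdir(parents=True, exist_ok=True)
    for name in REQUIRED_FILES:
        src, dst = source / name, target / name
        if src.is_file() and not dst.exists():  # never overwrite edited files
            shutil.copyfile(src, dst)
    return missing_required(target)
```

For example, `scaffold_from(persona_dir(Path("home"), "scott", "inara"), persona_dir(Path("home"), "holly", "tina"))` would seed a new directory whose files still need to be edited for the new persona.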

Session Search

Past sessions are searchable via GET /sessions/search?q=...&user=...&persona=....

The same search is available in the UI via the search box at the bottom of the Files panel (open it with the Files button). Results are grouped by date with highlighted excerpts.
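A simplified sketch of such a search is given below, assuming a case-insensitive substring match over the daily session files. The function names, the excerpt width, and the use of `<mark>` for highlighting are illustrative assumptions; the actual routers/files.py and app.js may behave differently. Text is HTML-escaped before the highlight tags are added so that log content cannot inject markup.

```python
import html
import re
from pathlib import Path


def find_excerpts(text: str, query: str, width: int = 40) -> list[str]:
    """Return HTML-safe excerpts around each match, with the hit wrapped in <mark>."""
    if not query:
        return []
    excerpts = []
    for m in re.finditer(re.escape(query), text, flags=re.IGNORECASE):
        start = max(0, m.start() - width)
        end = min(len(text), m.end() + width)
        before = html.escape(text[start:m.start()])
        hit = html.escape(m.group(0))
        after = html.escape(text[m.end():end])
        excerpts.append(f"{before}<mark>{hit}</mark>{after}")
    return excerpts


def search_sessions(sessions_dir: Path, query: str) -> dict[str, list[str]]:
    """Group excerpts by session date (the file stem), newest first."""
    results = {}
    for path in sorted(sessions_dir.glob("*.md"), reverse=True):
        hits = find_excerpts(path.read_text(encoding="utf-8"), query)
        if hits:
            results[path.stem] = hits
    return results
```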

Active Personas

User	Persona	Description
scott	inara	Scott's primary assistant
scott	developer	Dev-focused persona
holly	tina	Holly's primary assistant
brian	wintermute	Brian's primary assistant
