Files

Scott Idem a4daebdc9b feat: local LLM multi-model, session search, cron proactive types, notifications, docs overhaul

Local LLM:
- user_settings.py: per-user hosts/models config (local_llm.json)
- routers/local_llm.py + static/local_llm.html: dedicated settings page
- llm_client.py: local OpenAI-compatible backend via httpx
- config.py: LOCAL_API_URL/KEY/MODEL + per-backend timeouts
- Active model shown near backend toggle (amber hint text)

Memory distillation:
- memory_distiller.py: DISTILL_BACKEND_MID/LONG .env overrides
- scheduler.py + notification.py: notify NC Talk after mid/long distill
- notification.py: outbound channel abstraction (NC Talk, extensible)

Session search:
- routers/files.py: GET /sessions/search?q= with excerpts grouped by date
- static/index.html + app.js: search UI in file sidebar with highlight
- _esc() helper to prevent XSS in search results

Proactive cron:
- cron_runner.py: new job types — message (send directly) and brief (LLM + send)
- Both support optional per-job channel override

Channels:
- routers/nextcloud_talk.py: consolidated using notification._send_nct_message()
- routers/auth.py: local backend status in /auth/status
- routers/chat.py: /backend returns {primary, fallback, local_model} object

UI / UX:
- Copy button for user messages (matching assistant)
- Autocomplete disabled on sensitive form fields
- settings.html: local model section replaced with link to /settings/local

Docs overhaul:
- MASTER.md hub + ARCH__SYSTEM/BACKENDS/PERSONA/CHANNELS/FUTURE.md
- ARCH__Intelligence_Layer.md replaced with redirect table
- CORTEX.md trimmed to vision only; README updated
- OPEN_WEBUI_API.md added to docs/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-05 20:53:06 -04:00

11 KiB

Raw Blame History

Cortex / Inara — Agent Task List

Read this file before starting any work on this project. Status: Active development — ongoing.

🔴 High Priority

[Local] Tool-capable local orchestrator

Design and implement local_orchestrator_engine.py — a ReAct tool loop driven by a local model via Open WebUI's OpenAI-compatible API, as an alternative to the Gemini API orchestrator for private/offline tasks.

Convert existing Cortex tool definitions (cortex/tools/) from Gemini FunctionDeclaration format to OpenAI tools format (minor schema diff)
Implement tool loop: send tools → parse tool_calls response → execute → append result → loop until finish_reason: stop
Wire into routers/orchestrator.py — new mode param: "local" vs "gemini"
UI: Agent mode button routes to local orchestrator when local backend active
Recommended models (scott_gaming, 8 GB VRAM): Gemma 4 E4B — 25 t/s, 72k practical ctx — interactive/fast tasks Gemma 4 26B A4B — 9 t/s, 50k practical ctx — heavier reasoning, background tasks
Reference: docs/OPEN_WEBUI_API.md for full tool call request/response format

🟡 Medium Priority

[Intelligence] Knowledge consolidation — Phase 1

See ARCH__Intelligence_Layer.md for full design.

Tool: ae_journal_search — search before creating to avoid duplicates
Tool: ae_journal_entry_create — write a new entry with source metadata
Import script: walk a markdown directory, chunk by H2 section, create entries
Target: markdown files from ~/DgrZone_Nextcloud/ and ~/OSIT_Nextcloud/
Tag strategy: source path, date, topic tags from frontmatter or filename

[Distill] Review first auto_distill_long output — 2026-04-01

Ran April 1 at 04:00 as scheduled
Manually review inara/MEMORY_LONG.md — confirm quality before fully trusting
Adjust distill prompts in cortex/memory_distiller.py if needed

[Distill] Distill quality review

Short/mid/long distill prompts live in cortex/memory_distiller.py
After first few automatic runs, review quality and tune

[Local] Unsloth Gemma 4 variants

Unsloth Dynamic 2.0 Q4_K_M GGUFs fail with 500: unable to load model on Ollama v0.20.0
Root cause: Ollama's bundled llama.cpp doesn't recognize Gemma 4 GGUF architecture metadata from raw files
Waiting on Ollama point release (v0.20.1+) — then switch Open WebUI to Unsloth variants
Expected speedup: ~10–20% smaller context footprint vs baseline, same quality
agent-support-gemma-small → Unsloth E4B Q4_K_M; agent-support-gemma-medium → Unsloth 26B A4B Q4_K_M

🟢 Lower Priority / Future

[Intelligence] Dev agent pipeline

See ARCH__Intelligence_Layer.md. Full design not yet started.

Specialist agent: frontend (SvelteKit) code changes
Specialist agent: backend (FastAPI) code changes
Supervisor agent: diff review, syntax check, test runner
Gitea webhook integration: trigger on push/PR, report back
Human approval gate before commit

[Intelligence] Supervisor agent

Runs py_compile, svelte-check, unit tests after specialist agent work
Reports pass/fail back to orchestrator
Only commits on explicit approval

[Channel] Gitea webhooks

Receive push/PR/issue events → route to appropriate agent
cortex/routers/ already has pattern; add gitea.py
Gitea Actions (CI) for "run tests on push" — simpler than custom runner

[Local] RAG via Open WebUI

Open WebUI has a full RAG pipeline (file upload → embed → knowledge collections → reference in chat). Could feed Nextcloud docs or session logs into a local knowledge base accessible to local models. Endpoints documented in docs/OPEN_WEBUI_API.md.

/api/v1/files/ upload + /api/v1/retrieval/process/web for URLs
Reference in chat via "files": [{"type": "collection", "id": "..."}]

[Backend] Intelligent model routing

Currently hardcoded: Claude default, Gemini fallback, local third
Design direction (now informed by real local model perf):
- Private/offline tasks → local (Gemma 4 E4B for speed, 26B A4B for reasoning)
- Complex tool tasks / long context → Gemini (1M token context, strong function calling)
- Final user-facing responses → Claude (quality prose, persona fidelity)
Future: auto-route by task type rather than requiring user to toggle backend manually

✅ Completed

[Local] Per-user multi-model local LLM settings — 2026-04-01

home/{username}/local_llm.json — hosts[] + models[] + active_model_id structure
cortex/user_settings.py — CRUD functions: save_host, add_model, remove_model, set_active_model, get_active_local_model
cortex/routers/local_llm.py + cortex/static/local_llm.html — dedicated /settings/local page
"Fetch models from host" button — proxied via /api/local-llm/fetch-models, populates dropdown
Active model shown in UI near backend toggle button (amber hint text)
Migrates old flat .env-style config automatically on first use

[UI] Copy button for user (sent) messages — 2026-04-01

Added matching copy-on-hover button to user messages (same pattern as assistant messages)
div.dataset.raw set on send; makeCopyBtn(div) appended inline

[Backend] Local model backend (Open WebUI / Ollama) — 2026-04-01

OpenAI-compatible API via httpx — no CLI wrapper needed
Configured via LOCAL_API_URL / LOCAL_API_KEY / LOCAL_MODEL in .env
Backend toggle cycles claude → gemini → local (amber color in UI)
/auth/status includes local reachability check (GET /api/models)
Tested end-to-end: test-agent-simple (Qwen3-8B) on scott-lt-i7-rtx:3000, full persona context flowing correctly

[Testing] Gitea SSH port 2222 — 2026-03-29

pfSense WAN → 192.168.32.7:2222 port forward confirmed working
ssh -p 2222 git@git.dgrzone.com reaches Gitea (returns "Invalid repository path" — expected, confirms connectivity)
Clone/push via SSH: git clone ssh://git@git.dgrzone.com:2222/<user>/<repo>.git

[Multi-user] Brian onboarding — 2026-03-29

Invite sent to memedrift@gmail.com
Brian completed onboarding, created wintermute persona
Google OAuth registered (google-add brian memedrift@gmail.com)

[Tools] Reminders tools — 2026-03-29

reminders_add, reminders_list, reminders_clear added to orchestrator tool suite
Tools live in cortex/tools/reminders.py
All persona PROTOCOLS.md updated with Tools & Modes reference (direct chat vs Agent mode)
persona_template.py updated so new personas get the protocol automatically

[Auth] Token expiry — no restart needed — 2026-03-27

llm_client._fresh_claude_token() reads live from ~/.claude/.credentials.json on every call
systemd service is a user unit (no sudo) — systemctl --user restart cortex is sufficient
No manual token sync required after claude auth login

[Multi-user] Per-user channel config — 2026-03-27

Google Chat and NC Talk secrets/config moved from .env to home/{username}/channels.json
New endpoints: POST /channels/google-chat/{username} and POST /webhook/nextcloud/{username}
No channel access by default — each user configures their own channels.json
Setup guides: docs/GOOGLE_CHAT_BOT.md and docs/NEXTCLOUD_TALK_BOT.md

GET /auth/google → Google consent → GET /auth/google/callback flow
Users pre-registered via manage_passwords.py google-add <user> <email>
Google sign-in button on /login; auth.json stores google_sub + google_email
Active users: scott (scott.idem@oneskyit.com), holly (holly.danner@gmail.com), brian (memedrift@gmail.com)

[Settings] Per-user Gemini API key — 2026-03-27

Stored in home/{username}/auth.json as gemini_api_key
Orchestrator uses user key if set, falls back to server-level GEMINI_API_KEY
Manageable via /settings UI (add, remove, masked hint)

localStorage keyed to cx_sid_{user}_{persona} with 30-min inactivity TTL
Auto-restored silently on page load; cleared on "New session" or session delete

[UI] Persona picker page — 2026-03-26

GET /{username} shows a card grid of available personas instead of 404
Each card links directly to /{username}/{persona}

[UI] Lucide icons — 2026-03-25

Icons throughout: mode selector, send/stop buttons, edit/del/copy, save/cancel
Loaded via UMD CDN; icon_html() + render_icons() helpers in app.js

[UI] Persona-specific favicon — 2026-03-25

Emoji SVG favicon generated from persona config at load time

[Multi-user] Holly onboarding — 2026-03-20

Holly's invite sent; onboarding completed via /setup/{token}
home/holly/persona/tina/ created from template
Google OAuth registered (holly.danner@gmail.com)

[Channel] Nextcloud Talk integration ✅ — 2026-03-20, updated 2026-03-27

HMAC verification: incoming uses random + raw_body; outgoing reply uses random + message_text
Per-user routing added 2026-03-27 (endpoint: /webhook/nextcloud/{username})
Docs: docs/NEXTCLOUD_TALK_BOT.md

[Channel] Google Chat integration ✅ — 2026-03-20, updated 2026-03-27

JWT verification via authorizationEventObject.systemIdToken
Workspace Add-on format: hostAppDataAction.chatDataAction.createMessageAction
Per-user routing added 2026-03-27 (endpoint: /channels/google-chat/{username})
Docs: docs/GOOGLE_CHAT_BOT.md

[Intelligence] Orchestrator service — Phase 1 — 2026-03-18

Gemini API (google-genai SDK) tool loop → Claude final response
POST /orchestrate (async job), GET /orchestrate/{job_id} (poll)
Tools: web search, AE API, file read, task list, scratch, reminders, cron
Default model: gemini-2.5-flash

[Auth] Session auth + persona onboarding — 2026-03-20

bcrypt passwords in home/{username}/auth.json
JWT session cookies (HS256, 30-day expiry)
Invite tokens (72h, one-time-use) — manage_passwords.py invite <user> [email]
Self-service onboarding: /setup/{token} → /setup/persona
SMTP invite email via noreply@oneskyit.com

[UI] Mobile-friendly header — 2026-03

Backend toggle, font size, theme buttons moved into ⚙ settings panel
Header reduced to core buttons

[UI] Help & Reference — 2026-03-27

Shared base at cortex/static/HELP.md (served to all users)
Persona-specific additions appended from home/{username}/persona/{name}/HELP.md if present
Collapsible H2 sections via <details> elements

[Backend] Gemini CLI backend — 2026-03

gemini -p subprocess, streaming output; auth check at /auth/status

[Backend] Memory distiller — 2026-03

APScheduler: distill_short (daily 03:00), distill_mid (weekly Sun 03:30), distill_long (monthly 1st 04:00)
Writes to MEMORY_SHORT.md, MEMORY_MID.md, MEMORY_LONG.md per persona

[Backend] Session logging + file browser — 2026-03

Sessions saved to home/{user}/persona/{name}/sessions/
Files panel in UI browses persona directory

[Backend] Dispatcher core — 2026-03-04

FastAPI service with streaming SSE response
Claude CLI and Gemini CLI subprocess backends
Session context management (rolling window, MAX_HISTORY_MESSAGES)

11 KiB Raw Blame History Unescape Escape