Files
Cortex-Inara/documentation/TODO__Agents.md
Scott Idem a4daebdc9b feat: local LLM multi-model, session search, cron proactive types, notifications, docs overhaul
Local LLM:
- user_settings.py: per-user hosts/models config (local_llm.json)
- routers/local_llm.py + static/local_llm.html: dedicated settings page
- llm_client.py: local OpenAI-compatible backend via httpx
- config.py: LOCAL_API_URL/KEY/MODEL + per-backend timeouts
- Active model shown near backend toggle (amber hint text)

Memory distillation:
- memory_distiller.py: DISTILL_BACKEND_MID/LONG .env overrides
- scheduler.py + notification.py: notify NC Talk after mid/long distill
- notification.py: outbound channel abstraction (NC Talk, extensible)

Session search:
- routers/files.py: GET /sessions/search?q= with excerpts grouped by date
- static/index.html + app.js: search UI in file sidebar with highlight
- _esc() helper to prevent XSS in search results

Proactive cron:
- cron_runner.py: new job types — message (send directly) and brief (LLM + send)
- Both support optional per-job channel override

Channels:
- routers/nextcloud_talk.py: consolidated using notification._send_nct_message()
- routers/auth.py: local backend status in /auth/status
- routers/chat.py: /backend returns {primary, fallback, local_model} object

UI / UX:
- Copy button for user messages (matching assistant)
- Autocomplete disabled on sensitive form fields
- settings.html: local model section replaced with link to /settings/local

Docs overhaul:
- MASTER.md hub + ARCH__SYSTEM/BACKENDS/PERSONA/CHANNELS/FUTURE.md
- ARCH__Intelligence_Layer.md replaced with redirect table
- CORTEX.md trimmed to vision only; README updated
- OPEN_WEBUI_API.md added to docs/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:53:06 -04:00

220 lines
11 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Cortex / Inara — Agent Task List
> Read this file before starting any work on this project.
> **Status:** Active development — ongoing.
---
## 🔴 High Priority
### [Local] Tool-capable local orchestrator
Design and implement `local_orchestrator_engine.py` — a ReAct tool loop driven by
a local model via Open WebUI's OpenAI-compatible API, as an alternative to the
Gemini API orchestrator for private/offline tasks.
- [ ] Convert existing Cortex tool definitions (`cortex/tools/`) from Gemini
`FunctionDeclaration` format to OpenAI `tools` format (minor schema diff)
- [ ] Implement tool loop: send tools → parse `tool_calls` response → execute →
append result → loop until `finish_reason: stop`
- [ ] Wire into `routers/orchestrator.py` — new `mode` param: `"local"` vs `"gemini"`
- [ ] UI: Agent mode button routes to local orchestrator when local backend active
- [ ] Recommended models (scott_gaming, 8 GB VRAM):
Gemma 4 E4B — 25 t/s, 72k practical ctx — interactive/fast tasks
Gemma 4 26B A4B — 9 t/s, 50k practical ctx — heavier reasoning, background tasks
- Reference: `docs/OPEN_WEBUI_API.md` for full tool call request/response format
---
## 🟡 Medium Priority
### [Intelligence] Knowledge consolidation — Phase 1
See `ARCH__Intelligence_Layer.md` for full design.
- [x] Tool: `ae_journal_search` — search before creating to avoid duplicates
- [x] Tool: `ae_journal_entry_create` — write a new entry with source metadata
- [ ] Import script: walk a markdown directory, chunk by H2 section, create entries
- [ ] Target: markdown files from `~/DgrZone_Nextcloud/` and `~/OSIT_Nextcloud/`
- [ ] Tag strategy: source path, date, topic tags from frontmatter or filename
### [Distill] Review first auto_distill_long output — 2026-04-01
- Ran April 1 at 04:00 as scheduled
- Manually review `inara/MEMORY_LONG.md` — confirm quality before fully trusting
- Adjust distill prompts in `cortex/memory_distiller.py` if needed
### [Distill] Distill quality review
- Short/mid/long distill prompts live in `cortex/memory_distiller.py`
- After first few automatic runs, review quality and tune
### [Local] Unsloth Gemma 4 variants
- Unsloth Dynamic 2.0 Q4_K_M GGUFs fail with `500: unable to load model` on Ollama v0.20.0
- Root cause: Ollama's bundled llama.cpp doesn't recognize Gemma 4 GGUF architecture metadata from raw files
- Waiting on Ollama point release (v0.20.1+) — then switch Open WebUI to Unsloth variants
- Expected speedup: ~1020% smaller context footprint vs baseline, same quality
- `agent-support-gemma-small` → Unsloth E4B Q4_K_M; `agent-support-gemma-medium` → Unsloth 26B A4B Q4_K_M
---
## 🟢 Lower Priority / Future
### [Intelligence] Dev agent pipeline
See `ARCH__Intelligence_Layer.md`. Full design not yet started.
- [ ] Specialist agent: frontend (SvelteKit) code changes
- [ ] Specialist agent: backend (FastAPI) code changes
- [ ] Supervisor agent: diff review, syntax check, test runner
- [ ] Gitea webhook integration: trigger on push/PR, report back
- [ ] Human approval gate before commit
### [Intelligence] Supervisor agent
- Runs `py_compile`, `svelte-check`, unit tests after specialist agent work
- Reports pass/fail back to orchestrator
- Only commits on explicit approval
### [Channel] Gitea webhooks
- Receive push/PR/issue events → route to appropriate agent
- `cortex/routers/` already has pattern; add `gitea.py`
- Gitea Actions (CI) for "run tests on push" — simpler than custom runner
### [Local] RAG via Open WebUI
Open WebUI has a full RAG pipeline (file upload → embed → knowledge collections →
reference in chat). Could feed Nextcloud docs or session logs into a local knowledge
base accessible to local models. Endpoints documented in `docs/OPEN_WEBUI_API.md`.
- `/api/v1/files/` upload + `/api/v1/retrieval/process/web` for URLs
- Reference in chat via `"files": [{"type": "collection", "id": "..."}]`
### [Backend] Intelligent model routing
- Currently hardcoded: Claude default, Gemini fallback, local third
- Design direction (now informed by real local model perf):
- **Private/offline tasks** → local (Gemma 4 E4B for speed, 26B A4B for reasoning)
- **Complex tool tasks / long context** → Gemini (1M token context, strong function calling)
- **Final user-facing responses** → Claude (quality prose, persona fidelity)
- Future: auto-route by task type rather than requiring user to toggle backend manually
---
## ✅ Completed
### [Local] Per-user multi-model local LLM settings — 2026-04-01
- `home/{username}/local_llm.json``hosts[]` + `models[]` + `active_model_id` structure
- `cortex/user_settings.py` — CRUD functions: save_host, add_model, remove_model, set_active_model, get_active_local_model
- `cortex/routers/local_llm.py` + `cortex/static/local_llm.html` — dedicated `/settings/local` page
- "Fetch models from host" button — proxied via `/api/local-llm/fetch-models`, populates dropdown
- Active model shown in UI near backend toggle button (amber hint text)
- Migrates old flat `.env`-style config automatically on first use
### [UI] Copy button for user (sent) messages — 2026-04-01
- Added matching copy-on-hover button to user messages (same pattern as assistant messages)
- `div.dataset.raw` set on send; `makeCopyBtn(div)` appended inline
### [Backend] Local model backend (Open WebUI / Ollama) — 2026-04-01
- OpenAI-compatible API via `httpx` — no CLI wrapper needed
- Configured via `LOCAL_API_URL` / `LOCAL_API_KEY` / `LOCAL_MODEL` in `.env`
- Backend toggle cycles `claude → gemini → local` (amber color in UI)
- `/auth/status` includes local reachability check (`GET /api/models`)
- Tested end-to-end: `test-agent-simple` (Qwen3-8B) on `scott-lt-i7-rtx:3000`, full persona context flowing correctly
### [Testing] Gitea SSH port 2222 — 2026-03-29
- pfSense WAN → 192.168.32.7:2222 port forward confirmed working
- `ssh -p 2222 git@git.dgrzone.com` reaches Gitea (returns "Invalid repository path" — expected, confirms connectivity)
- Clone/push via SSH: `git clone ssh://git@git.dgrzone.com:2222/<user>/<repo>.git`
### [Multi-user] Brian onboarding — 2026-03-29
- Invite sent to `memedrift@gmail.com`
- Brian completed onboarding, created `wintermute` persona
- Google OAuth registered (`google-add brian memedrift@gmail.com`)
### [Tools] Reminders tools — 2026-03-29
- `reminders_add`, `reminders_list`, `reminders_clear` added to orchestrator tool suite
- Tools live in `cortex/tools/reminders.py`
- All persona PROTOCOLS.md updated with Tools & Modes reference (direct chat vs Agent mode)
- `persona_template.py` updated so new personas get the protocol automatically
### [Auth] Token expiry — no restart needed — 2026-03-27
- `llm_client._fresh_claude_token()` reads live from `~/.claude/.credentials.json` on every call
- systemd service is a user unit (no sudo) — `systemctl --user restart cortex` is sufficient
- No manual token sync required after `claude auth login`
### [Multi-user] Per-user channel config — 2026-03-27
- Google Chat and NC Talk secrets/config moved from `.env` to `home/{username}/channels.json`
- New endpoints: `POST /channels/google-chat/{username}` and `POST /webhook/nextcloud/{username}`
- No channel access by default — each user configures their own `channels.json`
- Setup guides: `docs/GOOGLE_CHAT_BOT.md` and `docs/NEXTCLOUD_TALK_BOT.md`
### [Auth] Google OAuth sign-in — 2026-03-27
- `GET /auth/google` → Google consent → `GET /auth/google/callback` flow
- Users pre-registered via `manage_passwords.py google-add <user> <email>`
- Google sign-in button on `/login`; auth.json stores `google_sub` + `google_email`
- Active users: scott (scott.idem@oneskyit.com), holly (holly.danner@gmail.com), brian (memedrift@gmail.com)
### [Settings] Per-user Gemini API key — 2026-03-27
- Stored in `home/{username}/auth.json` as `gemini_api_key`
- Orchestrator uses user key if set, falls back to server-level `GEMINI_API_KEY`
- Manageable via `/settings` UI (add, remove, masked hint)
### [UI] Session persistence across navigation — 2026-03-26
- localStorage keyed to `cx_sid_{user}_{persona}` with 30-min inactivity TTL
- Auto-restored silently on page load; cleared on "New session" or session delete
### [UI] Persona picker page — 2026-03-26
- `GET /{username}` shows a card grid of available personas instead of 404
- Each card links directly to `/{username}/{persona}`
### [UI] Lucide icons — 2026-03-25
- Icons throughout: mode selector, send/stop buttons, edit/del/copy, save/cancel
- Loaded via UMD CDN; `icon_html()` + `render_icons()` helpers in `app.js`
### [UI] Persona-specific favicon — 2026-03-25
- Emoji SVG favicon generated from persona config at load time
### [Multi-user] Holly onboarding — 2026-03-20
- Holly's invite sent; onboarding completed via `/setup/{token}`
- `home/holly/persona/tina/` created from template
- Google OAuth registered (`holly.danner@gmail.com`)
### [Channel] Nextcloud Talk integration ✅ — 2026-03-20, updated 2026-03-27
- HMAC verification: incoming uses `random + raw_body`; outgoing reply uses `random + message_text`
- Per-user routing added 2026-03-27 (endpoint: `/webhook/nextcloud/{username}`)
- Docs: `docs/NEXTCLOUD_TALK_BOT.md`
### [Channel] Google Chat integration ✅ — 2026-03-20, updated 2026-03-27
- JWT verification via `authorizationEventObject.systemIdToken`
- Workspace Add-on format: `hostAppDataAction.chatDataAction.createMessageAction`
- Per-user routing added 2026-03-27 (endpoint: `/channels/google-chat/{username}`)
- Docs: `docs/GOOGLE_CHAT_BOT.md`
### [Intelligence] Orchestrator service — Phase 1 — 2026-03-18
- Gemini API (google-genai SDK) tool loop → Claude final response
- `POST /orchestrate` (async job), `GET /orchestrate/{job_id}` (poll)
- Tools: web search, AE API, file read, task list, scratch, reminders, cron
- Default model: `gemini-2.5-flash`
### [Auth] Session auth + persona onboarding — 2026-03-20
- bcrypt passwords in `home/{username}/auth.json`
- JWT session cookies (HS256, 30-day expiry)
- Invite tokens (72h, one-time-use) — `manage_passwords.py invite <user> [email]`
- Self-service onboarding: `/setup/{token}``/setup/persona`
- SMTP invite email via `noreply@oneskyit.com`
### [UI] Mobile-friendly header — 2026-03
- Backend toggle, font size, theme buttons moved into ⚙ settings panel
- Header reduced to core buttons
### [UI] Help & Reference — 2026-03-27
- Shared base at `cortex/static/HELP.md` (served to all users)
- Persona-specific additions appended from `home/{username}/persona/{name}/HELP.md` if present
- Collapsible H2 sections via `<details>` elements
### [Backend] Gemini CLI backend — 2026-03
- `gemini -p` subprocess, streaming output; auth check at `/auth/status`
### [Backend] Memory distiller — 2026-03
- APScheduler: `distill_short` (daily 03:00), `distill_mid` (weekly Sun 03:30), `distill_long` (monthly 1st 04:00)
- Writes to `MEMORY_SHORT.md`, `MEMORY_MID.md`, `MEMORY_LONG.md` per persona
### [Backend] Session logging + file browser — 2026-03
- Sessions saved to `home/{user}/persona/{name}/sessions/`
- Files panel in UI browses persona directory
### [Backend] Dispatcher core — 2026-03-04
- FastAPI service with streaming SSE response
- Claude CLI and Gemini CLI subprocess backends
- Session context management (rolling window, `MAX_HISTORY_MESSAGES`)