Local LLM:
- user_settings.py: per-user hosts/models config (local_llm.json)
- routers/local_llm.py + static/local_llm.html: dedicated settings page
- llm_client.py: local OpenAI-compatible backend via httpx
- config.py: LOCAL_API_URL/KEY/MODEL + per-backend timeouts
- Active model shown near backend toggle (amber hint text)
Memory distillation:
- memory_distiller.py: DISTILL_BACKEND_MID/LONG .env overrides
- scheduler.py + notification.py: notify NC Talk after mid/long distill
- notification.py: outbound channel abstraction (NC Talk, extensible)
Session search:
- routers/files.py: GET /sessions/search?q= with excerpts grouped by date
- static/index.html + app.js: search UI in file sidebar with highlight
- _esc() helper to prevent XSS in search results
Proactive cron:
- cron_runner.py: new job types — message (send directly) and brief (LLM + send)
- Both support optional per-job channel override
Channels:
- routers/nextcloud_talk.py: consolidated using notification._send_nct_message()
- routers/auth.py: local backend status in /auth/status
- routers/chat.py: /backend returns {primary, fallback, local_model} object
UI / UX:
- Copy button for user messages (matching assistant)
- Autocomplete disabled on sensitive form fields
- settings.html: local model section replaced with link to /settings/local
Docs overhaul:
- MASTER.md hub + ARCH__SYSTEM/BACKENDS/PERSONA/CHANNELS/FUTURE.md
- ARCH__Intelligence_Layer.md replaced with redirect table
- CORTEX.md trimmed to vision only; README updated
- OPEN_WEBUI_API.md added to docs/
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
220 lines
11 KiB
Markdown
220 lines
11 KiB
Markdown
# Cortex / Inara — Agent Task List
|
||
|
||
> Read this file before starting any work on this project.
|
||
> **Status:** Active development — ongoing.
|
||
|
||
---
|
||
|
||
## 🔴 High Priority
|
||
|
||
### [Local] Tool-capable local orchestrator
|
||
Design and implement `local_orchestrator_engine.py` — a ReAct tool loop driven by
|
||
a local model via Open WebUI's OpenAI-compatible API, as an alternative to the
|
||
Gemini API orchestrator for private/offline tasks.
|
||
|
||
- [ ] Convert existing Cortex tool definitions (`cortex/tools/`) from Gemini
|
||
`FunctionDeclaration` format to OpenAI `tools` format (minor schema diff)
|
||
- [ ] Implement tool loop: send tools → parse `tool_calls` response → execute →
|
||
append result → loop until `finish_reason: stop`
|
||
- [ ] Wire into `routers/orchestrator.py` — new `mode` param: `"local"` vs `"gemini"`
|
||
- [ ] UI: Agent mode button routes to local orchestrator when local backend active
|
||
- [ ] Recommended models (scott_gaming, 8 GB VRAM):
|
||
Gemma 4 E4B — 25 t/s, 72k practical ctx — interactive/fast tasks
|
||
Gemma 4 26B A4B — 9 t/s, 50k practical ctx — heavier reasoning, background tasks
|
||
- Reference: `docs/OPEN_WEBUI_API.md` for full tool call request/response format
|
||
|
||
---
|
||
|
||
## 🟡 Medium Priority
|
||
|
||
### [Intelligence] Knowledge consolidation — Phase 1
|
||
See `ARCH__Intelligence_Layer.md` for full design.
|
||
- [x] Tool: `ae_journal_search` — search before creating to avoid duplicates
|
||
- [x] Tool: `ae_journal_entry_create` — write a new entry with source metadata
|
||
- [ ] Import script: walk a markdown directory, chunk by H2 section, create entries
|
||
- [ ] Target: markdown files from `~/DgrZone_Nextcloud/` and `~/OSIT_Nextcloud/`
|
||
- [ ] Tag strategy: source path, date, topic tags from frontmatter or filename
|
||
|
||
### [Distill] Review first auto_distill_long output — 2026-04-01
|
||
- Ran April 1 at 04:00 as scheduled
|
||
- Manually review `inara/MEMORY_LONG.md` — confirm quality before fully trusting
|
||
- Adjust distill prompts in `cortex/memory_distiller.py` if needed
|
||
|
||
### [Distill] Distill quality review
|
||
- Short/mid/long distill prompts live in `cortex/memory_distiller.py`
|
||
- After first few automatic runs, review quality and tune
|
||
|
||
### [Local] Unsloth Gemma 4 variants
|
||
- Unsloth Dynamic 2.0 Q4_K_M GGUFs fail with `500: unable to load model` on Ollama v0.20.0
|
||
- Root cause: Ollama's bundled llama.cpp doesn't recognize Gemma 4 GGUF architecture metadata from raw files
|
||
- Waiting on Ollama point release (v0.20.1+) — then switch Open WebUI to Unsloth variants
|
||
- Expected speedup: ~10–20% smaller context footprint vs baseline, same quality
|
||
- `agent-support-gemma-small` → Unsloth E4B Q4_K_M; `agent-support-gemma-medium` → Unsloth 26B A4B Q4_K_M
|
||
|
||
---
|
||
|
||
## 🟢 Lower Priority / Future
|
||
|
||
### [Intelligence] Dev agent pipeline
|
||
See `ARCH__Intelligence_Layer.md`. Full design not yet started.
|
||
- [ ] Specialist agent: frontend (SvelteKit) code changes
|
||
- [ ] Specialist agent: backend (FastAPI) code changes
|
||
- [ ] Supervisor agent: diff review, syntax check, test runner
|
||
- [ ] Gitea webhook integration: trigger on push/PR, report back
|
||
- [ ] Human approval gate before commit
|
||
|
||
### [Intelligence] Supervisor agent
|
||
- Runs `py_compile`, `svelte-check`, unit tests after specialist agent work
|
||
- Reports pass/fail back to orchestrator
|
||
- Only commits on explicit approval
|
||
|
||
### [Channel] Gitea webhooks
|
||
- Receive push/PR/issue events → route to appropriate agent
|
||
- `cortex/routers/` already has pattern; add `gitea.py`
|
||
- Gitea Actions (CI) for "run tests on push" — simpler than custom runner
|
||
|
||
### [Local] RAG via Open WebUI
|
||
Open WebUI has a full RAG pipeline (file upload → embed → knowledge collections →
|
||
reference in chat). Could feed Nextcloud docs or session logs into a local knowledge
|
||
base accessible to local models. Endpoints documented in `docs/OPEN_WEBUI_API.md`.
|
||
- `/api/v1/files/` upload + `/api/v1/retrieval/process/web` for URLs
|
||
- Reference in chat via `"files": [{"type": "collection", "id": "..."}]`
|
||
|
||
### [Backend] Intelligent model routing
|
||
- Currently hardcoded: Claude default, Gemini fallback, local third
|
||
- Design direction (now informed by real local model perf):
|
||
- **Private/offline tasks** → local (Gemma 4 E4B for speed, 26B A4B for reasoning)
|
||
- **Complex tool tasks / long context** → Gemini (1M token context, strong function calling)
|
||
- **Final user-facing responses** → Claude (quality prose, persona fidelity)
|
||
- Future: auto-route by task type rather than requiring user to toggle backend manually
|
||
|
||
---
|
||
|
||
## ✅ Completed
|
||
|
||
### [Local] Per-user multi-model local LLM settings — 2026-04-01
|
||
- `home/{username}/local_llm.json` — `hosts[]` + `models[]` + `active_model_id` structure
|
||
- `cortex/user_settings.py` — CRUD functions: save_host, add_model, remove_model, set_active_model, get_active_local_model
|
||
- `cortex/routers/local_llm.py` + `cortex/static/local_llm.html` — dedicated `/settings/local` page
|
||
- "Fetch models from host" button — proxied via `/api/local-llm/fetch-models`, populates dropdown
|
||
- Active model shown in UI near backend toggle button (amber hint text)
|
||
- Migrates old flat `.env`-style config automatically on first use
|
||
|
||
### [UI] Copy button for user (sent) messages — 2026-04-01
|
||
- Added matching copy-on-hover button to user messages (same pattern as assistant messages)
|
||
- `div.dataset.raw` set on send; `makeCopyBtn(div)` appended inline
|
||
|
||
### [Backend] Local model backend (Open WebUI / Ollama) — 2026-04-01
|
||
- OpenAI-compatible API via `httpx` — no CLI wrapper needed
|
||
- Configured via `LOCAL_API_URL` / `LOCAL_API_KEY` / `LOCAL_MODEL` in `.env`
|
||
- Backend toggle cycles `claude → gemini → local` (amber color in UI)
|
||
- `/auth/status` includes local reachability check (`GET /api/models`)
|
||
- Tested end-to-end: `test-agent-simple` (Qwen3-8B) on `scott-lt-i7-rtx:3000`, full persona context flowing correctly
|
||
|
||
### [Testing] Gitea SSH port 2222 — 2026-03-29
|
||
- pfSense WAN → 192.168.32.7:2222 port forward confirmed working
|
||
- `ssh -p 2222 git@git.dgrzone.com` reaches Gitea (returns "Invalid repository path" — expected, confirms connectivity)
|
||
- Clone/push via SSH: `git clone ssh://git@git.dgrzone.com:2222/<user>/<repo>.git`
|
||
|
||
### [Multi-user] Brian onboarding — 2026-03-29
|
||
- Invite sent to `memedrift@gmail.com`
|
||
- Brian completed onboarding, created `wintermute` persona
|
||
- Google OAuth registered (`google-add brian memedrift@gmail.com`)
|
||
|
||
### [Tools] Reminders tools — 2026-03-29
|
||
- `reminders_add`, `reminders_list`, `reminders_clear` added to orchestrator tool suite
|
||
- Tools live in `cortex/tools/reminders.py`
|
||
- All persona PROTOCOLS.md updated with Tools & Modes reference (direct chat vs Agent mode)
|
||
- `persona_template.py` updated so new personas get the protocol automatically
|
||
|
||
### [Auth] Token expiry — no restart needed — 2026-03-27
|
||
- `llm_client._fresh_claude_token()` reads live from `~/.claude/.credentials.json` on every call
|
||
- systemd service is a user unit (no sudo) — `systemctl --user restart cortex` is sufficient
|
||
- No manual token sync required after `claude auth login`
|
||
|
||
### [Multi-user] Per-user channel config — 2026-03-27
|
||
- Google Chat and NC Talk secrets/config moved from `.env` to `home/{username}/channels.json`
|
||
- New endpoints: `POST /channels/google-chat/{username}` and `POST /webhook/nextcloud/{username}`
|
||
- No channel access by default — each user configures their own `channels.json`
|
||
- Setup guides: `docs/GOOGLE_CHAT_BOT.md` and `docs/NEXTCLOUD_TALK_BOT.md`
|
||
|
||
### [Auth] Google OAuth sign-in — 2026-03-27
|
||
- `GET /auth/google` → Google consent → `GET /auth/google/callback` flow
|
||
- Users pre-registered via `manage_passwords.py google-add <user> <email>`
|
||
- Google sign-in button on `/login`; auth.json stores `google_sub` + `google_email`
|
||
- Active users: scott (scott.idem@oneskyit.com), holly (holly.danner@gmail.com), brian (memedrift@gmail.com)
|
||
|
||
### [Settings] Per-user Gemini API key — 2026-03-27
|
||
- Stored in `home/{username}/auth.json` as `gemini_api_key`
|
||
- Orchestrator uses user key if set, falls back to server-level `GEMINI_API_KEY`
|
||
- Manageable via `/settings` UI (add, remove, masked hint)
|
||
|
||
### [UI] Session persistence across navigation — 2026-03-26
|
||
- localStorage keyed to `cx_sid_{user}_{persona}` with 30-min inactivity TTL
|
||
- Auto-restored silently on page load; cleared on "New session" or session delete
|
||
|
||
### [UI] Persona picker page — 2026-03-26
|
||
- `GET /{username}` shows a card grid of available personas instead of 404
|
||
- Each card links directly to `/{username}/{persona}`
|
||
|
||
### [UI] Lucide icons — 2026-03-25
|
||
- Icons throughout: mode selector, send/stop buttons, edit/del/copy, save/cancel
|
||
- Loaded via UMD CDN; `icon_html()` + `render_icons()` helpers in `app.js`
|
||
|
||
### [UI] Persona-specific favicon — 2026-03-25
|
||
- Emoji SVG favicon generated from persona config at load time
|
||
|
||
### [Multi-user] Holly onboarding — 2026-03-20
|
||
- Holly's invite sent; onboarding completed via `/setup/{token}`
|
||
- `home/holly/persona/tina/` created from template
|
||
- Google OAuth registered (`holly.danner@gmail.com`)
|
||
|
||
### [Channel] Nextcloud Talk integration ✅ — 2026-03-20, updated 2026-03-27
|
||
- HMAC verification: incoming uses `random + raw_body`; outgoing reply uses `random + message_text`
|
||
- Per-user routing added 2026-03-27 (endpoint: `/webhook/nextcloud/{username}`)
|
||
- Docs: `docs/NEXTCLOUD_TALK_BOT.md`
|
||
|
||
### [Channel] Google Chat integration ✅ — 2026-03-20, updated 2026-03-27
|
||
- JWT verification via `authorizationEventObject.systemIdToken`
|
||
- Workspace Add-on format: `hostAppDataAction.chatDataAction.createMessageAction`
|
||
- Per-user routing added 2026-03-27 (endpoint: `/channels/google-chat/{username}`)
|
||
- Docs: `docs/GOOGLE_CHAT_BOT.md`
|
||
|
||
### [Intelligence] Orchestrator service — Phase 1 — 2026-03-18
|
||
- Gemini API (google-genai SDK) tool loop → Claude final response
|
||
- `POST /orchestrate` (async job), `GET /orchestrate/{job_id}` (poll)
|
||
- Tools: web search, AE API, file read, task list, scratch, reminders, cron
|
||
- Default model: `gemini-2.5-flash`
|
||
|
||
### [Auth] Session auth + persona onboarding — 2026-03-20
|
||
- bcrypt passwords in `home/{username}/auth.json`
|
||
- JWT session cookies (HS256, 30-day expiry)
|
||
- Invite tokens (72h, one-time-use) — `manage_passwords.py invite <user> [email]`
|
||
- Self-service onboarding: `/setup/{token}` → `/setup/persona`
|
||
- SMTP invite email via `noreply@oneskyit.com`
|
||
|
||
### [UI] Mobile-friendly header — 2026-03
|
||
- Backend toggle, font size, theme buttons moved into ⚙ settings panel
|
||
- Header reduced to core buttons
|
||
|
||
### [UI] Help & Reference — 2026-03-27
|
||
- Shared base at `cortex/static/HELP.md` (served to all users)
|
||
- Persona-specific additions appended from `home/{username}/persona/{name}/HELP.md` if present
|
||
- Collapsible H2 sections via `<details>` elements
|
||
|
||
### [Backend] Gemini CLI backend — 2026-03
|
||
- `gemini -p` subprocess, streaming output; auth check at `/auth/status`
|
||
|
||
### [Backend] Memory distiller — 2026-03
|
||
- APScheduler: `distill_short` (daily 03:00), `distill_mid` (weekly Sun 03:30), `distill_long` (monthly 1st 04:00)
|
||
- Writes to `MEMORY_SHORT.md`, `MEMORY_MID.md`, `MEMORY_LONG.md` per persona
|
||
|
||
### [Backend] Session logging + file browser — 2026-03
|
||
- Sessions saved to `home/{user}/persona/{name}/sessions/`
|
||
- Files panel in UI browses persona directory
|
||
|
||
### [Backend] Dispatcher core — 2026-03-04
|
||
- FastAPI service with streaming SSE response
|
||
- Claude CLI and Gemini CLI subprocess backends
|
||
- Session context management (rolling window, `MAX_HISTORY_MESSAGES`)
|