Files
Cortex-Inara/documentation/TODO__Agents.md
Scott Idem 45c95d20ba feat: model registry V2 — provider-aware schema with multi-account support
Adds a providers section to the per-user model registry for Anthropic and
Google as first-class providers alongside local hosts. Google accounts
(API keys) are now stored as a list so multiple Google accounts can coexist.

Changes:
- model_registry.py: V2 schema, auto migration V1→V2 (pulls gemini_api_key
  from auth.json into providers.google.accounts), _resolve_model() merges
  account API key for gemini_api type models
- routers/orchestrator.py: uses model-resolved api_key when orchestrator
  role resolves to a gemini_api model with account_id
- ANTHROPIC_CATALOG and GOOGLE_CATALOG constants for model picker (Phase 2)
- New functions: get_google_api_key(), save/remove_google_account(), get_catalog()
- Documentation: ARCH__BACKENDS.md updated to V2 schema, DESIGN doc added

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 20:21:04 -04:00

227 lines
12 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Cortex / Inara — Agent Task List
> Read this file before starting any work on this project.
> **Status:** Active development — ongoing.
---
## 🔴 High Priority
### [Local] Tool-capable local orchestrator
Design and implement `local_orchestrator_engine.py` — a ReAct tool loop driven by
a local model via Open WebUI's OpenAI-compatible API, as an alternative to the
Gemini API orchestrator for private/offline tasks.
- [ ] Convert existing Cortex tool definitions (`cortex/tools/`) from Gemini
`FunctionDeclaration` format to OpenAI `tools` format (minor schema diff)
- [ ] Implement tool loop: send tools → parse `tool_calls` response → execute →
append result → loop until `finish_reason: stop`
- [ ] Wire into `routers/orchestrator.py` — new `mode` param: `"local"` vs `"gemini"`
- [ ] UI: Agent mode button routes to local orchestrator when local backend active
- [ ] Recommended models (scott_gaming, 8 GB VRAM):
Gemma 4 E4B — 25 t/s, 72k practical ctx — interactive/fast tasks
Gemma 4 26B A4B — 9 t/s, 50k practical ctx — heavier reasoning, background tasks
- Reference: `docs/OPEN_WEBUI_API.md` for full tool call request/response format
---
## 🟡 Medium Priority
### [Models] Model Registry V2 — Unified Provider System
See `DESIGN__Model_Registry_V2.md` for full design.
- [x] **Phase 1** — V2 schema with providers (Anthropic/Google), multi-account Gemini, auto migration, orchestrator uses account API key — 2026-04-27
- [ ] **Phase 2** — Cloud provider UI: Anthropic + Google sections in `/settings/models`, account management, model entry creation for cloud models
- [ ] **Phase 3** — Unified roles + toggle redesign: standalone role assignments, chat toggle cycles role slots (Primary/Backup 1/Backup 2) showing model label
- [ ] **Phase 4** — Polish: Claude API key, OpenRouter as named provider, catalog sync from API
### [Intelligence] Knowledge consolidation — Phase 1
See `ARCH__Intelligence_Layer.md` for full design.
- [x] Tool: `ae_journal_search` — search before creating to avoid duplicates
- [x] Tool: `ae_journal_entry_create` — write a new entry with source metadata
- [ ] Import script: walk a markdown directory, chunk by H2 section, create entries
- [ ] Target: markdown files from `~/DgrZone_Nextcloud/` and `~/OSIT_Nextcloud/`
- [ ] Tag strategy: source path, date, topic tags from frontmatter or filename
### [Distill] Review first auto_distill_long output — 2026-04-01
- Ran April 1 at 04:00 as scheduled
- Manually review `inara/MEMORY_LONG.md` — confirm quality before fully trusting
- Adjust distill prompts in `cortex/memory_distiller.py` if needed
### [Distill] Distill quality review
- Short/mid/long distill prompts live in `cortex/memory_distiller.py`
- After first few automatic runs, review quality and tune
### [Local] Unsloth Gemma 4 variants
- Unsloth Dynamic 2.0 Q4_K_M GGUFs fail with `500: unable to load model` on Ollama v0.20.0
- Root cause: Ollama's bundled llama.cpp doesn't recognize Gemma 4 GGUF architecture metadata from raw files
- Waiting on Ollama point release (v0.20.1+) — then switch Open WebUI to Unsloth variants
- Expected speedup: ~1020% smaller context footprint vs baseline, same quality
- `agent-support-gemma-small` → Unsloth E4B Q4_K_M; `agent-support-gemma-medium` → Unsloth 26B A4B Q4_K_M
---
## 🟢 Lower Priority / Future
### [Intelligence] Dev agent pipeline
See `ARCH__Intelligence_Layer.md`. Full design not yet started.
- [ ] Specialist agent: frontend (SvelteKit) code changes
- [ ] Specialist agent: backend (FastAPI) code changes
- [ ] Supervisor agent: diff review, syntax check, test runner
- [ ] Gitea webhook integration: trigger on push/PR, report back
- [ ] Human approval gate before commit
### [Intelligence] Supervisor agent
- Runs `py_compile`, `svelte-check`, unit tests after specialist agent work
- Reports pass/fail back to orchestrator
- Only commits on explicit approval
### [Channel] Gitea webhooks
- Receive push/PR/issue events → route to appropriate agent
- `cortex/routers/` already has pattern; add `gitea.py`
- Gitea Actions (CI) for "run tests on push" — simpler than custom runner
### [Local] RAG via Open WebUI
Open WebUI has a full RAG pipeline (file upload → embed → knowledge collections →
reference in chat). Could feed Nextcloud docs or session logs into a local knowledge
base accessible to local models. Endpoints documented in `docs/OPEN_WEBUI_API.md`.
- `/api/v1/files/` upload + `/api/v1/retrieval/process/web` for URLs
- Reference in chat via `"files": [{"type": "collection", "id": "..."}]`
### [Backend] Intelligent model routing
- Currently hardcoded: Claude default, Gemini fallback, local third
- Design direction (now informed by real local model perf):
- **Private/offline tasks** → local (Gemma 4 E4B for speed, 26B A4B for reasoning)
- **Complex tool tasks / long context** → Gemini (1M token context, strong function calling)
- **Final user-facing responses** → Claude (quality prose, persona fidelity)
- Future: auto-route by task type rather than requiring user to toggle backend manually
---
## ✅ Completed
### [Local] Per-user multi-model local LLM settings — 2026-04-01
- `home/{username}/local_llm.json``hosts[]` + `models[]` + `active_model_id` structure
- `cortex/user_settings.py` — CRUD functions: save_host, add_model, remove_model, set_active_model, get_active_local_model
- `cortex/routers/local_llm.py` + `cortex/static/local_llm.html` — dedicated `/settings/local` page
- "Fetch models from host" button — proxied via `/api/local-llm/fetch-models`, populates dropdown
- Active model shown in UI near backend toggle button (amber hint text)
- Migrates old flat `.env`-style config automatically on first use
### [UI] Copy button for user (sent) messages — 2026-04-01
- Added matching copy-on-hover button to user messages (same pattern as assistant messages)
- `div.dataset.raw` set on send; `makeCopyBtn(div)` appended inline
### [Backend] Local model backend (Open WebUI / Ollama) — 2026-04-01
- OpenAI-compatible API via `httpx` — no CLI wrapper needed
- Configured via `LOCAL_API_URL` / `LOCAL_API_KEY` / `LOCAL_MODEL` in `.env`
- Backend toggle cycles `claude → gemini → local` (amber color in UI)
- `/auth/status` includes local reachability check (`GET /api/models`)
- Tested end-to-end: `test-agent-simple` (Qwen3-8B) on `scott-lt-i7-rtx:3000`, full persona context flowing correctly
### [Testing] Gitea SSH port 2222 — 2026-03-29
- pfSense WAN → 192.168.32.7:2222 port forward confirmed working
- `ssh -p 2222 git@git.dgrzone.com` reaches Gitea (returns "Invalid repository path" — expected, confirms connectivity)
- Clone/push via SSH: `git clone ssh://git@git.dgrzone.com:2222/<user>/<repo>.git`
### [Multi-user] Brian onboarding — 2026-03-29
- Invite sent to `memedrift@gmail.com`
- Brian completed onboarding, created `wintermute` persona
- Google OAuth registered (`google-add brian memedrift@gmail.com`)
### [Tools] Reminders tools — 2026-03-29
- `reminders_add`, `reminders_list`, `reminders_clear` added to orchestrator tool suite
- Tools live in `cortex/tools/reminders.py`
- All persona PROTOCOLS.md updated with Tools & Modes reference (direct chat vs Agent mode)
- `persona_template.py` updated so new personas get the protocol automatically
### [Auth] Token expiry — no restart needed — 2026-03-27
- `llm_client._fresh_claude_token()` reads live from `~/.claude/.credentials.json` on every call
- systemd service is a user unit (no sudo) — `systemctl --user restart cortex` is sufficient
- No manual token sync required after `claude auth login`
### [Multi-user] Per-user channel config — 2026-03-27
- Google Chat and NC Talk secrets/config moved from `.env` to `home/{username}/channels.json`
- New endpoints: `POST /channels/google-chat/{username}` and `POST /webhook/nextcloud/{username}`
- No channel access by default — each user configures their own `channels.json`
- Setup guides: `docs/GOOGLE_CHAT_BOT.md` and `docs/NEXTCLOUD_TALK_BOT.md`
### [Auth] Google OAuth sign-in — 2026-03-27
- `GET /auth/google` → Google consent → `GET /auth/google/callback` flow
- Users pre-registered via `manage_passwords.py google-add <user> <email>`
- Google sign-in button on `/login`; auth.json stores `google_sub` + `google_email`
- Active users: scott (scott.idem@oneskyit.com), holly (holly.danner@gmail.com), brian (memedrift@gmail.com)
### [Settings] Per-user Gemini API key — 2026-03-27
- Stored in `home/{username}/auth.json` as `gemini_api_key`
- Orchestrator uses user key if set, falls back to server-level `GEMINI_API_KEY`
- Manageable via `/settings` UI (add, remove, masked hint)
### [UI] Session persistence across navigation — 2026-03-26
- localStorage keyed to `cx_sid_{user}_{persona}` with 30-min inactivity TTL
- Auto-restored silently on page load; cleared on "New session" or session delete
### [UI] Persona picker page — 2026-03-26
- `GET /{username}` shows a card grid of available personas instead of 404
- Each card links directly to `/{username}/{persona}`
### [UI] Lucide icons — 2026-03-25
- Icons throughout: mode selector, send/stop buttons, edit/del/copy, save/cancel
- Loaded via UMD CDN; `icon_html()` + `render_icons()` helpers in `app.js`
### [UI] Persona-specific favicon — 2026-03-25
- Emoji SVG favicon generated from persona config at load time
### [Multi-user] Holly onboarding — 2026-03-20
- Holly's invite sent; onboarding completed via `/setup/{token}`
- `home/holly/persona/tina/` created from template
- Google OAuth registered (`holly.danner@gmail.com`)
### [Channel] Nextcloud Talk integration ✅ — 2026-03-20, updated 2026-03-27
- HMAC verification: incoming uses `random + raw_body`; outgoing reply uses `random + message_text`
- Per-user routing added 2026-03-27 (endpoint: `/webhook/nextcloud/{username}`)
- Docs: `docs/NEXTCLOUD_TALK_BOT.md`
### [Channel] Google Chat integration ✅ — 2026-03-20, updated 2026-03-27
- JWT verification via `authorizationEventObject.systemIdToken`
- Workspace Add-on format: `hostAppDataAction.chatDataAction.createMessageAction`
- Per-user routing added 2026-03-27 (endpoint: `/channels/google-chat/{username}`)
- Docs: `docs/GOOGLE_CHAT_BOT.md`
### [Intelligence] Orchestrator service — Phase 1 — 2026-03-18
- Gemini API (google-genai SDK) tool loop → Claude final response
- `POST /orchestrate` (async job), `GET /orchestrate/{job_id}` (poll)
- Tools: web search, AE API, file read, task list, scratch, reminders, cron
- Default model: `gemini-2.5-flash`
### [Auth] Session auth + persona onboarding — 2026-03-20
- bcrypt passwords in `home/{username}/auth.json`
- JWT session cookies (HS256, 30-day expiry)
- Invite tokens (72h, one-time-use) — `manage_passwords.py invite <user> [email]`
- Self-service onboarding: `/setup/{token}``/setup/persona`
- SMTP invite email via `noreply@oneskyit.com`
### [UI] Mobile-friendly header — 2026-03
- Backend toggle, font size, theme buttons moved into ⚙ settings panel
- Header reduced to core buttons
### [UI] Help & Reference — 2026-03-27
- Shared base at `cortex/static/HELP.md` (served to all users)
- Persona-specific additions appended from `home/{username}/persona/{name}/HELP.md` if present
- Collapsible H2 sections via `<details>` elements
### [Backend] Gemini CLI backend — 2026-03
- `gemini -p` subprocess, streaming output; auth check at `/auth/status`
### [Backend] Memory distiller — 2026-03
- APScheduler: `distill_short` (daily 03:00), `distill_mid` (weekly Sun 03:30), `distill_long` (monthly 1st 04:00)
- Writes to `MEMORY_SHORT.md`, `MEMORY_MID.md`, `MEMORY_LONG.md` per persona
### [Backend] Session logging + file browser — 2026-03
- Sessions saved to `home/{user}/persona/{name}/sessions/`
- Files panel in UI browses persona directory
### [Backend] Dispatcher core — 2026-03-04
- FastAPI service with streaming SSE response
- Claude CLI and Gemini CLI subprocess backends
- Session context management (rolling window, `MAX_HISTORY_MESSAGES`)