# Cortex / Inara β€” Agent Task List > Read this file before starting any work on this project. > **Status:** Active development β€” ongoing. --- ## πŸ”΄ High Priority ### [Local] Tool-capable local orchestrator Design and implement `local_orchestrator_engine.py` β€” a ReAct tool loop driven by a local model via Open WebUI's OpenAI-compatible API, as an alternative to the Gemini API orchestrator for private/offline tasks. - [ ] Convert existing Cortex tool definitions (`cortex/tools/`) from Gemini `FunctionDeclaration` format to OpenAI `tools` format (minor schema diff) - [ ] Implement tool loop: send tools β†’ parse `tool_calls` response β†’ execute β†’ append result β†’ loop until `finish_reason: stop` - [ ] Wire into `routers/orchestrator.py` β€” new `mode` param: `"local"` vs `"gemini"` - [ ] UI: Agent mode button routes to local orchestrator when local backend active - [ ] Recommended models (scott_gaming, 8 GB VRAM): Gemma 4 E4B β€” 25 t/s, 72k practical ctx β€” interactive/fast tasks Gemma 4 26B A4B β€” 9 t/s, 50k practical ctx β€” heavier reasoning, background tasks - Reference: `docs/OPEN_WEBUI_API.md` for full tool call request/response format --- ## 🟑 Medium Priority ### [Models] Model Registry V2 β€” Unified Provider System See `DESIGN__Model_Registry_V2.md` for full design. - [x] **Phase 1** β€” V2 schema with providers (Anthropic/Google), multi-account Gemini, auto migration, orchestrator uses account API key β€” 2026-04-27 - [ ] **Phase 2** β€” Cloud provider UI: Anthropic + Google sections in `/settings/models`, account management, model entry creation for cloud models - [ ] **Phase 3** β€” Unified roles + toggle redesign: standalone role assignments, chat toggle cycles role slots (Primary/Backup 1/Backup 2) showing model label - [ ] **Phase 4** β€” Polish: Claude API key, OpenRouter as named provider, catalog sync from API ### [Intelligence] Knowledge consolidation β€” Phase 1 See `ARCH__Intelligence_Layer.md` for full design. - [x] Tool: `ae_journal_list` β€” list all journals for the account β€” 2026-04-28 - [x] Tool: `ae_journal_search` β€” search before creating to avoid duplicates - [x] Tool: `ae_journal_entry_create` β€” write a new entry with source metadata - [x] Tool: `ae_journal_entry_update` β€” PATCH any fields on an existing entry β€” 2026-04-28 - [x] Tool: `ae_journal_entry_disable` β€” soft-delete via enable=false β€” 2026-04-28 - [x] Tool: `ae_journal_entry_append` β€” readβ†’append timestamped sectionβ†’write (running logs) β€” 2026-04-28 - [x] Tool: `ae_journal_entry_prepend` β€” readβ†’prepend timestamped sectionβ†’write (newest-first logs) β€” 2026-04-28 - [ ] Import script: walk a markdown directory, chunk by H2 section, create entries - [ ] Target: markdown files from `~/DgrZone_Nextcloud/` and `~/OSIT_Nextcloud/` - [ ] Tag strategy: source path, date, topic tags from frontmatter or filename ### [Distill] Review first auto_distill_long output β€” 2026-04-01 - Ran April 1 at 04:00 as scheduled - Manually review `inara/MEMORY_LONG.md` β€” confirm quality before fully trusting - Adjust distill prompts in `cortex/memory_distiller.py` if needed ### [Distill] Distill quality review - Short/mid/long distill prompts live in `cortex/memory_distiller.py` - After first few automatic runs, review quality and tune ### [Local] Unsloth Gemma 4 variants - Unsloth Dynamic 2.0 Q4_K_M GGUFs fail with `500: unable to load model` on Ollama v0.20.0 - Root cause: Ollama's bundled llama.cpp doesn't recognize Gemma 4 GGUF architecture metadata from raw files - Waiting on Ollama point release (v0.20.1+) β€” then switch Open WebUI to Unsloth variants - Expected speedup: ~10–20% smaller context footprint vs baseline, same quality - `agent-support-gemma-small` β†’ Unsloth E4B Q4_K_M; `agent-support-gemma-medium` β†’ Unsloth 26B A4B Q4_K_M --- ## 🟒 Lower Priority / Future ### [Intelligence] Dev agent pipeline See `ARCH__Intelligence_Layer.md`. Full design not yet started. - [ ] Specialist agent: frontend (SvelteKit) code changes - [ ] Specialist agent: backend (FastAPI) code changes - [ ] Supervisor agent: diff review, syntax check, test runner - [ ] Gitea webhook integration: trigger on push/PR, report back - [ ] Human approval gate before commit ### [Intelligence] Supervisor agent - Runs `py_compile`, `svelte-check`, unit tests after specialist agent work - Reports pass/fail back to orchestrator - Only commits on explicit approval ### [Channel] Gitea webhooks - Receive push/PR/issue events β†’ route to appropriate agent - `cortex/routers/` already has pattern; add `gitea.py` - Gitea Actions (CI) for "run tests on push" β€” simpler than custom runner ### [Local] RAG via Open WebUI Open WebUI has a full RAG pipeline (file upload β†’ embed β†’ knowledge collections β†’ reference in chat). Could feed Nextcloud docs or session logs into a local knowledge base accessible to local models. Endpoints documented in `docs/OPEN_WEBUI_API.md`. - `/api/v1/files/` upload + `/api/v1/retrieval/process/web` for URLs - Reference in chat via `"files": [{"type": "collection", "id": "..."}]` ### [Backend] Intelligent model routing - Currently hardcoded: Claude default, Gemini fallback, local third - Design direction (now informed by real local model perf): - **Private/offline tasks** β†’ local (Gemma 4 E4B for speed, 26B A4B for reasoning) - **Complex tool tasks / long context** β†’ Gemini (1M token context, strong function calling) - **Final user-facing responses** β†’ Claude (quality prose, persona fidelity) - Future: auto-route by task type rather than requiring user to toggle backend manually --- ## βœ… Completed ### [UI] Input area polish β€” 2026-04-28 - Single cycling S/M/L button replaces 3 separate height buttons (same UX as font size) - S size collapses mode-select to a row (compact); M/L keep vertical column layout - Input height minimum derived from setting so empty textarea reflects selected size - Context & Memory panel and Settings dropdown are mutually exclusive (closeAllPanels fix) - Both panels now use consistent shadow (var(--shadow)) and z-index (200) ### [Tools] Tools toggle β€” decoupled from Role/Backend β€” 2026-04-28 - Removed "Agent" mode from the mode selector; replaced with independent ⚑ toggle - `toolsEnabled` persists in localStorage; routes to orchestrator regardless of active mode - Layout: column (M/L) or row (S) driven by `data-size` attribute set by JS - chat_role flows from UI β†’ OrchestrateRequest β†’ orchestrator_engine.run(response_role=...) ### [Tools] shell_exec tool β€” 2026-04-28 - `shell_exec(command, working_dir, timeout)` in `cortex/tools/system.py` - Runs any shell command on the Cortex host; timeout clamped 1–120s - Use for system diagnostics: `df -h`, `ps aux`, `journalctl`, `free -h`, etc. ### [Tools] Aether Journals full toolkit β€” 2026-04-28 - `ae_journal_list` β€” list all journals + ids for the account - `ae_journal_entry_update` β€” PATCH any fields (title, content, summary, tags, enable) - `ae_journal_entry_disable` β€” soft-delete via enable=false - `ae_journal_entry_append` β€” readβ†’append timestamped sectionβ†’write (running/data logs) - `ae_journal_entry_prepend` β€” readβ†’prepend timestamped sectionβ†’write (newest-first) - Shared `_get_entry` / `_patch_entry` helpers; OpenAI JSON Schema auto-derived from Gemini declarations ### [Local] Per-user multi-model local LLM settings β€” 2026-04-01 - `home/{username}/local_llm.json` β€” `hosts[]` + `models[]` + `active_model_id` structure - `cortex/user_settings.py` β€” CRUD functions: save_host, add_model, remove_model, set_active_model, get_active_local_model - `cortex/routers/local_llm.py` + `cortex/static/local_llm.html` β€” dedicated `/settings/local` page - "Fetch models from host" button β€” proxied via `/api/local-llm/fetch-models`, populates dropdown - Active model shown in UI near backend toggle button (amber hint text) - Migrates old flat `.env`-style config automatically on first use ### [UI] Copy button for user (sent) messages β€” 2026-04-01 - Added matching copy-on-hover button to user messages (same pattern as assistant messages) - `div.dataset.raw` set on send; `makeCopyBtn(div)` appended inline ### [Backend] Local model backend (Open WebUI / Ollama) β€” 2026-04-01 - OpenAI-compatible API via `httpx` β€” no CLI wrapper needed - Configured via `LOCAL_API_URL` / `LOCAL_API_KEY` / `LOCAL_MODEL` in `.env` - Backend toggle cycles `claude β†’ gemini β†’ local` (amber color in UI) - `/auth/status` includes local reachability check (`GET /api/models`) - Tested end-to-end: `test-agent-simple` (Qwen3-8B) on `scott-lt-i7-rtx:3000`, full persona context flowing correctly ### [Testing] Gitea SSH port 2222 β€” 2026-03-29 - pfSense WAN β†’ 192.168.32.7:2222 port forward confirmed working - `ssh -p 2222 git@git.dgrzone.com` reaches Gitea (returns "Invalid repository path" β€” expected, confirms connectivity) - Clone/push via SSH: `git clone ssh://git@git.dgrzone.com:2222//.git` ### [Multi-user] Brian onboarding β€” 2026-03-29 - Invite sent to `memedrift@gmail.com` - Brian completed onboarding, created `wintermute` persona - Google OAuth registered (`google-add brian memedrift@gmail.com`) ### [Tools] Reminders tools β€” 2026-03-29 - `reminders_add`, `reminders_list`, `reminders_clear` added to orchestrator tool suite - Tools live in `cortex/tools/reminders.py` - All persona PROTOCOLS.md updated with Tools & Modes reference (direct chat vs Agent mode) - `persona_template.py` updated so new personas get the protocol automatically ### [Auth] Token expiry β€” no restart needed β€” 2026-03-27 - `llm_client._fresh_claude_token()` reads live from `~/.claude/.credentials.json` on every call - systemd service is a user unit (no sudo) β€” `systemctl --user restart cortex` is sufficient - No manual token sync required after `claude auth login` ### [Multi-user] Per-user channel config β€” 2026-03-27 - Google Chat and NC Talk secrets/config moved from `.env` to `home/{username}/channels.json` - New endpoints: `POST /channels/google-chat/{username}` and `POST /webhook/nextcloud/{username}` - No channel access by default β€” each user configures their own `channels.json` - Setup guides: `docs/GOOGLE_CHAT_BOT.md` and `docs/NEXTCLOUD_TALK_BOT.md` ### [Auth] Google OAuth sign-in β€” 2026-03-27 - `GET /auth/google` β†’ Google consent β†’ `GET /auth/google/callback` flow - Users pre-registered via `manage_passwords.py google-add ` - Google sign-in button on `/login`; auth.json stores `google_sub` + `google_email` - Active users: scott (scott.idem@oneskyit.com), holly (holly.danner@gmail.com), brian (memedrift@gmail.com) ### [Settings] Per-user Gemini API key β€” 2026-03-27 - Stored in `home/{username}/auth.json` as `gemini_api_key` - Orchestrator uses user key if set, falls back to server-level `GEMINI_API_KEY` - Manageable via `/settings` UI (add, remove, masked hint) ### [UI] Session persistence across navigation β€” 2026-03-26 - localStorage keyed to `cx_sid_{user}_{persona}` with 30-min inactivity TTL - Auto-restored silently on page load; cleared on "New session" or session delete ### [UI] Persona picker page β€” 2026-03-26 - `GET /{username}` shows a card grid of available personas instead of 404 - Each card links directly to `/{username}/{persona}` ### [UI] Lucide icons β€” 2026-03-25 - Icons throughout: mode selector, send/stop buttons, edit/del/copy, save/cancel - Loaded via UMD CDN; `icon_html()` + `render_icons()` helpers in `app.js` ### [UI] Persona-specific favicon β€” 2026-03-25 - Emoji SVG favicon generated from persona config at load time ### [Multi-user] Holly onboarding β€” 2026-03-20 - Holly's invite sent; onboarding completed via `/setup/{token}` - `home/holly/persona/tina/` created from template - Google OAuth registered (`holly.danner@gmail.com`) ### [Channel] Nextcloud Talk integration βœ… β€” 2026-03-20, updated 2026-03-27 - HMAC verification: incoming uses `random + raw_body`; outgoing reply uses `random + message_text` - Per-user routing added 2026-03-27 (endpoint: `/webhook/nextcloud/{username}`) - Docs: `docs/NEXTCLOUD_TALK_BOT.md` ### [Channel] Google Chat integration βœ… β€” 2026-03-20, updated 2026-03-27 - JWT verification via `authorizationEventObject.systemIdToken` - Workspace Add-on format: `hostAppDataAction.chatDataAction.createMessageAction` - Per-user routing added 2026-03-27 (endpoint: `/channels/google-chat/{username}`) - Docs: `docs/GOOGLE_CHAT_BOT.md` ### [Intelligence] Orchestrator service β€” Phase 1 β€” 2026-03-18 - Gemini API (google-genai SDK) tool loop β†’ Claude final response - `POST /orchestrate` (async job), `GET /orchestrate/{job_id}` (poll) - Tools: web search, AE API, file read, task list, scratch, reminders, cron - Default model: `gemini-2.5-flash` ### [Auth] Session auth + persona onboarding β€” 2026-03-20 - bcrypt passwords in `home/{username}/auth.json` - JWT session cookies (HS256, 30-day expiry) - Invite tokens (72h, one-time-use) β€” `manage_passwords.py invite [email]` - Self-service onboarding: `/setup/{token}` β†’ `/setup/persona` - SMTP invite email via `noreply@oneskyit.com` ### [UI] Mobile-friendly header β€” 2026-03 - Backend toggle, font size, theme buttons moved into βš™ settings panel - Header reduced to core buttons ### [UI] Help & Reference β€” 2026-03-27 - Shared base at `cortex/static/HELP.md` (served to all users) - Persona-specific additions appended from `home/{username}/persona/{name}/HELP.md` if present - Collapsible H2 sections via `
` elements ### [Backend] Gemini CLI backend β€” 2026-03 - `gemini -p` subprocess, streaming output; auth check at `/auth/status` ### [Backend] Memory distiller β€” 2026-03 - APScheduler: `distill_short` (daily 03:00), `distill_mid` (weekly Sun 03:30), `distill_long` (monthly 1st 04:00) - Writes to `MEMORY_SHORT.md`, `MEMORY_MID.md`, `MEMORY_LONG.md` per persona ### [Backend] Session logging + file browser β€” 2026-03 - Sessions saved to `home/{user}/persona/{name}/sessions/` - Files panel in UI browses persona directory ### [Backend] Dispatcher core β€” 2026-03-04 - FastAPI service with streaming SSE response - Claude CLI and Gemini CLI subprocess backends - Session context management (rolling window, `MAX_HISTORY_MESSAGES`)