Files
Cortex-Inara/documentation/TODO__Agents.md

18 KiB
Raw Blame History

Cortex / Inara — Agent Task List

Read this file before starting any work on this project. Status: Active development — ongoing.


🔴 High Priority

[Local] Tool-capable local orchestrator

Design and implement local_orchestrator_engine.py — a ReAct tool loop driven by a local model via Open WebUI's OpenAI-compatible API, as an alternative to the Gemini API orchestrator for private/offline tasks.

  • Convert existing Cortex tool definitions (cortex/tools/) from Gemini FunctionDeclaration format to OpenAI tools format (minor schema diff)
  • Implement tool loop: send tools → parse tool_calls response → execute → append result → loop until finish_reason: stop
  • Wire into routers/orchestrator.py — new mode param: "local" vs "gemini"
  • UI: Agent mode button routes to local orchestrator when local backend active
  • Recommended models (scott_gaming, 8 GB VRAM): Gemma 4 E4B — 25 t/s, 72k practical ctx — interactive/fast tasks Gemma 4 26B A4B — 9 t/s, 50k practical ctx — heavier reasoning, background tasks
  • Reference: docs/OPEN_WEBUI_API.md for full tool call request/response format

🟡 Medium Priority

[UI] Progressive Web App (PWA)

Low effort, meaningful mobile UX improvement — install Cortex as a home screen app.

  • Add manifest.json (name, icons, theme color, display: standalone, start_url)
  • Serve manifest.json from cortex/routers/ui.py or as a static file
  • Add <link rel="manifest"> to index.html
  • Basic service worker for offline shell (cache static assets; network-first for API)
  • Register service worker in app.js
  • Test on iOS (Safari) and Android (Chrome) — both support PWA install prompts

[Channel] Proactive notifications

Inara reaches out on her own initiative via NC Talk or Google Chat when a reminder fires, a cron job completes, or something else warrants attention. The cron/reminder infrastructure already exists — this closes the loop so she can interrupt the user.

  • Add outbound message helper for NC Talk (send_nextcloud_message(user, text))
  • Add outbound message helper for Google Chat (send_google_chat_message(user, text))
  • Wire cron job completion and reminder triggers to call outbound helper
  • Store user preference: which channel to use for proactive notifications
  • channels.json already per-user — add notify_channel: "nextcloud" | "google_chat" | null

[UI] File attachments in chat

Upload an image or document inline and have it flow into context. Natural workflow ("here's this PDF, summarize it"); local backend already supports multimodal via Open WebUI.

  • Add attachment button to input area (paperclip icon, hidden file input)
  • Client: encode file as base64 or multipart; send alongside message text
  • Server: accept file in POST /chat; route to appropriate backend
    • Claude: content array with image blocks (base64 or URL)
    • Gemini: parts array with inline_data
    • Local (Open WebUI): content array with image_url items
  • UI: show thumbnail/filename above the sent message

[Models] Edit existing model entries

Currently models can only be removed and re-added. Add an edit flow so fields (display name, model ID, context size, tags, notes) can be updated in-place.

  • Add "Edit" button next to each model row in local_llm.html (alongside Remove)
  • Populate the Add Model form with the model's current values when edit is clicked
  • On save, PATCH or delete+recreate via user_settings.py
  • Applies to both local and (future) cloud model entries

[Auth] Encrypted sessions

Allow users to opt-in to per-session encryption so session logs on disk cannot be read without the user's key.

  • Design key derivation: password-based (PBKDF2/Argon2) or separate passphrase
  • Encrypt session_logger.py output before writing to sessions/*.md
  • Decrypt on read in session_store.py (history reload, file browser)
  • UI toggle in Settings to enable/disable encrypted sessions per persona
  • Decide: encrypt at rest only, or also in-memory session store?
  • Consider: how distillation and session search interact with encrypted files

[Models] Model Registry V2 — Unified Provider System

See DESIGN__Model_Registry_V2.md for full design.

  • Phase 1 — V2 schema with providers (Anthropic/Google), multi-account Gemini, auto migration, orchestrator uses account API key — 2026-04-27
  • Phase 2 — Cloud provider UI: Anthropic + Google sections in /settings/models, account management, model entry creation for cloud models
  • Phase 3 — Unified roles + toggle redesign: standalone role assignments, chat toggle cycles role slots (Primary/Backup 1/Backup 2) showing model label
  • Phase 4 — Polish: Claude API key, OpenRouter as named provider, catalog sync from API

[Intelligence] Knowledge consolidation — Phase 1

See ARCH__Intelligence_Layer.md for full design.

  • Tool: ae_journal_list — list all journals for the account — 2026-04-28
  • Tool: ae_journal_search — search before creating to avoid duplicates
  • Tool: ae_journal_entry_create — write a new entry with source metadata
  • Tool: ae_journal_entry_update — PATCH any fields on an existing entry — 2026-04-28
  • Tool: ae_journal_entry_disable — soft-delete via enable=false — 2026-04-28
  • Tool: ae_journal_entry_append — read→append timestamped section→write (running logs) — 2026-04-28
  • Tool: ae_journal_entry_prepend — read→prepend timestamped section→write (newest-first logs) — 2026-04-28
  • Import script: walk a markdown directory, chunk by H2 section, create entries
  • Target: markdown files from ~/DgrZone_Nextcloud/ and ~/OSIT_Nextcloud/
  • Tag strategy: source path, date, topic tags from frontmatter or filename

[Distill] Review first auto_distill_long output — 2026-04-01

  • Ran April 1 at 04:00 as scheduled
  • Manually review inara/MEMORY_LONG.md — confirm quality before fully trusting
  • Adjust distill prompts in cortex/memory_distiller.py if needed

[Distill] Distill quality review

  • Short/mid/long distill prompts live in cortex/memory_distiller.py
  • After first few automatic runs, review quality and tune

[Local] Unsloth Gemma 4 variants

  • Unsloth Dynamic 2.0 Q4_K_M GGUFs fail with 500: unable to load model on Ollama v0.20.0
  • Root cause: Ollama's bundled llama.cpp doesn't recognize Gemma 4 GGUF architecture metadata from raw files
  • Waiting on Ollama point release (v0.20.1+) — then switch Open WebUI to Unsloth variants
  • Expected speedup: ~1020% smaller context footprint vs baseline, same quality
  • agent-support-gemma-small → Unsloth E4B Q4_K_M; agent-support-gemma-medium → Unsloth 26B A4B Q4_K_M

🟢 Lower Priority / Future

The file browser has per-file session search, but no way to query across all sessions for a persona. A unified search would make the session archive useful as a knowledge source.

  • POST /sessions/search?q=... — walks home/{user}/persona/{name}/sessions/*.md, returns matching excerpts with date + line context
  • UI: search input in file browser sidebar already present — wire to new endpoint
  • Consider: index on startup vs. live grep (live grep is fine at typical session volume)

[Backend] API usage / cost tracking

Multi-user setup with real Gemini/Claude API costs. Track per-user token consumption so Scott can see who's spending what.

  • Count input + output tokens per /chat and /orchestrate call (all backends return usage)
  • Append to home/{user}/usage.json — daily buckets, per-model breakdown
  • Expose via /api/usage endpoint; add a summary row to the Settings page
  • Optional: soft spending limit with a warning toast when exceeded

[Intelligence] Dev agent pipeline

See ARCH__Intelligence_Layer.md. Full design not yet started.

  • Specialist agent: frontend (SvelteKit) code changes
  • Specialist agent: backend (FastAPI) code changes
  • Supervisor agent: diff review, syntax check, test runner
  • Gitea webhook integration: trigger on push/PR, report back
  • Human approval gate before commit

[Intelligence] Supervisor agent

  • Runs py_compile, svelte-check, unit tests after specialist agent work
  • Reports pass/fail back to orchestrator
  • Only commits on explicit approval

[Channel] Gitea webhooks

  • Receive push/PR/issue events → route to appropriate agent
  • cortex/routers/ already has pattern; add gitea.py
  • Gitea Actions (CI) for "run tests on push" — simpler than custom runner

[Local] RAG via Open WebUI

Open WebUI has a full RAG pipeline (file upload → embed → knowledge collections → reference in chat). Could feed Nextcloud docs or session logs into a local knowledge base accessible to local models. Endpoints documented in docs/OPEN_WEBUI_API.md.

  • /api/v1/files/ upload + /api/v1/retrieval/process/web for URLs
  • Reference in chat via "files": [{"type": "collection", "id": "..."}]

[Backend] Intelligent model routing

  • Currently hardcoded: Claude default, Gemini fallback, local third
  • Design direction (now informed by real local model perf):
    • Private/offline tasks → local (Gemma 4 E4B for speed, 26B A4B for reasoning)
    • Complex tool tasks / long context → Gemini (1M token context, strong function calling)
    • Final user-facing responses → Claude (quality prose, persona fidelity)
  • Future: auto-route by task type rather than requiring user to toggle backend manually

Completed

[UI] Input area polish — 2026-04-28

  • Single cycling S/M/L button replaces 3 separate height buttons (same UX as font size)
  • S size collapses mode-select to a row (compact); M/L keep vertical column layout
  • Input height minimum derived from setting so empty textarea reflects selected size
  • Context & Memory panel and Settings dropdown are mutually exclusive (closeAllPanels fix)
  • Both panels now use consistent shadow (var(--shadow)) and z-index (200)

[Tools] Tools toggle — decoupled from Role/Backend — 2026-04-28

  • Removed "Agent" mode from the mode selector; replaced with independent toggle
  • toolsEnabled persists in localStorage; routes to orchestrator regardless of active mode
  • Layout: column (M/L) or row (S) driven by data-size attribute set by JS
  • chat_role flows from UI → OrchestrateRequest → orchestrator_engine.run(response_role=...)

[Tools] shell_exec tool — 2026-04-28

  • shell_exec(command, working_dir, timeout) in cortex/tools/system.py
  • Runs any shell command on the Cortex host; timeout clamped 1120s
  • Use for system diagnostics: df -h, ps aux, journalctl, free -h, etc.

[Tools] Aether Journals full toolkit — 2026-04-28

  • ae_journal_list — list all journals + ids for the account
  • ae_journal_entry_update — PATCH any fields (title, content, summary, tags, enable)
  • ae_journal_entry_disable — soft-delete via enable=false
  • ae_journal_entry_append — read→append timestamped section→write (running/data logs)
  • ae_journal_entry_prepend — read→prepend timestamped section→write (newest-first)
  • Shared _get_entry / _patch_entry helpers; OpenAI JSON Schema auto-derived from Gemini declarations

[Local] Per-user multi-model local LLM settings — 2026-04-01

  • home/{username}/local_llm.jsonhosts[] + models[] + active_model_id structure
  • cortex/user_settings.py — CRUD functions: save_host, add_model, remove_model, set_active_model, get_active_local_model
  • cortex/routers/local_llm.py + cortex/static/local_llm.html — dedicated /settings/local page
  • "Fetch models from host" button — proxied via /api/local-llm/fetch-models, populates dropdown
  • Active model shown in UI near backend toggle button (amber hint text)
  • Migrates old flat .env-style config automatically on first use

[UI] Copy button for user (sent) messages — 2026-04-01

  • Added matching copy-on-hover button to user messages (same pattern as assistant messages)
  • div.dataset.raw set on send; makeCopyBtn(div) appended inline

[Backend] Local model backend (Open WebUI / Ollama) — 2026-04-01

  • OpenAI-compatible API via httpx — no CLI wrapper needed
  • Configured via LOCAL_API_URL / LOCAL_API_KEY / LOCAL_MODEL in .env
  • Backend toggle cycles claude → gemini → local (amber color in UI)
  • /auth/status includes local reachability check (GET /api/models)
  • Tested end-to-end: test-agent-simple (Qwen3-8B) on scott-lt-i7-rtx:3000, full persona context flowing correctly

[Testing] Gitea SSH port 2222 — 2026-03-29

  • pfSense WAN → 192.168.32.7:2222 port forward confirmed working
  • ssh -p 2222 git@git.dgrzone.com reaches Gitea (returns "Invalid repository path" — expected, confirms connectivity)
  • Clone/push via SSH: git clone ssh://git@git.dgrzone.com:2222/<user>/<repo>.git

[Multi-user] Brian onboarding — 2026-03-29

  • Invite sent to memedrift@gmail.com
  • Brian completed onboarding, created wintermute persona
  • Google OAuth registered (google-add brian memedrift@gmail.com)

[Tools] Reminders tools — 2026-03-29

  • reminders_add, reminders_list, reminders_clear added to orchestrator tool suite
  • Tools live in cortex/tools/reminders.py
  • All persona PROTOCOLS.md updated with Tools & Modes reference (direct chat vs Agent mode)
  • persona_template.py updated so new personas get the protocol automatically

[Auth] Token expiry — no restart needed — 2026-03-27

  • llm_client._fresh_claude_token() reads live from ~/.claude/.credentials.json on every call
  • systemd service is a user unit (no sudo) — systemctl --user restart cortex is sufficient
  • No manual token sync required after claude auth login

[Multi-user] Per-user channel config — 2026-03-27

  • Google Chat and NC Talk secrets/config moved from .env to home/{username}/channels.json
  • New endpoints: POST /channels/google-chat/{username} and POST /webhook/nextcloud/{username}
  • No channel access by default — each user configures their own channels.json
  • Setup guides: docs/GOOGLE_CHAT_BOT.md and docs/NEXTCLOUD_TALK_BOT.md

[Auth] Google OAuth sign-in — 2026-03-27

  • GET /auth/google → Google consent → GET /auth/google/callback flow
  • Users pre-registered via manage_passwords.py google-add <user> <email>
  • Google sign-in button on /login; auth.json stores google_sub + google_email
  • Active users: scott (scott.idem@oneskyit.com), holly (holly.danner@gmail.com), brian (memedrift@gmail.com)

[Settings] Per-user Gemini API key — 2026-03-27

  • Stored in home/{username}/auth.json as gemini_api_key
  • Orchestrator uses user key if set, falls back to server-level GEMINI_API_KEY
  • Manageable via /settings UI (add, remove, masked hint)

[UI] Session persistence across navigation — 2026-03-26

  • localStorage keyed to cx_sid_{user}_{persona} with 30-min inactivity TTL
  • Auto-restored silently on page load; cleared on "New session" or session delete

[UI] Persona picker page — 2026-03-26

  • GET /{username} shows a card grid of available personas instead of 404
  • Each card links directly to /{username}/{persona}

[UI] Lucide icons — 2026-03-25

  • Icons throughout: mode selector, send/stop buttons, edit/del/copy, save/cancel
  • Loaded via UMD CDN; icon_html() + render_icons() helpers in app.js

[UI] Persona-specific favicon — 2026-03-25

  • Emoji SVG favicon generated from persona config at load time

[Multi-user] Holly onboarding — 2026-03-20

  • Holly's invite sent; onboarding completed via /setup/{token}
  • home/holly/persona/tina/ created from template
  • Google OAuth registered (holly.danner@gmail.com)

[Channel] Nextcloud Talk integration — 2026-03-20, updated 2026-03-27

  • HMAC verification: incoming uses random + raw_body; outgoing reply uses random + message_text
  • Per-user routing added 2026-03-27 (endpoint: /webhook/nextcloud/{username})
  • Docs: docs/NEXTCLOUD_TALK_BOT.md

[Channel] Google Chat integration — 2026-03-20, updated 2026-03-27

  • JWT verification via authorizationEventObject.systemIdToken
  • Workspace Add-on format: hostAppDataAction.chatDataAction.createMessageAction
  • Per-user routing added 2026-03-27 (endpoint: /channels/google-chat/{username})
  • Docs: docs/GOOGLE_CHAT_BOT.md

[Intelligence] Orchestrator service — Phase 1 — 2026-03-18

  • Gemini API (google-genai SDK) tool loop → Claude final response
  • POST /orchestrate (async job), GET /orchestrate/{job_id} (poll)
  • Tools: web search, AE API, file read, task list, scratch, reminders, cron
  • Default model: gemini-2.5-flash

[Auth] Session auth + persona onboarding — 2026-03-20

  • bcrypt passwords in home/{username}/auth.json
  • JWT session cookies (HS256, 30-day expiry)
  • Invite tokens (72h, one-time-use) — manage_passwords.py invite <user> [email]
  • Self-service onboarding: /setup/{token}/setup/persona
  • SMTP invite email via noreply@oneskyit.com

[UI] Mobile-friendly header — 2026-03

  • Backend toggle, font size, theme buttons moved into ⚙ settings panel
  • Header reduced to core buttons

[UI] Help & Reference — 2026-03-27

  • Shared base at cortex/static/HELP.md (served to all users)
  • Persona-specific additions appended from home/{username}/persona/{name}/HELP.md if present
  • Collapsible H2 sections via <details> elements

[Backend] Gemini CLI backend — 2026-03

  • gemini -p subprocess, streaming output; auth check at /auth/status

[Backend] Memory distiller — 2026-03

  • APScheduler: distill_short (daily 03:00), distill_mid (weekly Sun 03:30), distill_long (monthly 1st 04:00)
  • Writes to MEMORY_SHORT.md, MEMORY_MID.md, MEMORY_LONG.md per persona

[Backend] Session logging + file browser — 2026-03

  • Sessions saved to home/{user}/persona/{name}/sessions/
  • Files panel in UI browses persona directory

[Backend] Dispatcher core — 2026-03-04

  • FastAPI service with streaming SSE response
  • Claude CLI and Gemini CLI subprocess backends
  • Session context management (rolling window, MAX_HISTORY_MESSAGES)