Cortex-Inara/documentation/ROADMAP.md
Scott Idem a4daebdc9b feat: local LLM multi-model, session search, cron proactive types, notifications, docs overhaul
Local LLM:
- user_settings.py: per-user hosts/models config (local_llm.json)
- routers/local_llm.py + static/local_llm.html: dedicated settings page
- llm_client.py: local OpenAI-compatible backend via httpx
- config.py: LOCAL_API_URL/KEY/MODEL + per-backend timeouts
- Active model shown near backend toggle (amber hint text)

Memory distillation:
- memory_distiller.py: DISTILL_BACKEND_MID/LONG .env overrides
- scheduler.py + notification.py: notify NC Talk after mid/long distill
- notification.py: outbound channel abstraction (NC Talk, extensible)

Session search:
- routers/files.py: GET /sessions/search?q= with excerpts grouped by date
- static/index.html + app.js: search UI in file sidebar with highlight
- _esc() helper to prevent XSS in search results

Proactive cron:
- cron_runner.py: new job types — message (send directly) and brief (LLM + send)
- Both support optional per-job channel override

Channels:
- routers/nextcloud_talk.py: consolidated using notification._send_nct_message()
- routers/auth.py: local backend status in /auth/status
- routers/chat.py: /backend returns {primary, fallback, local_model} object

UI / UX:
- Copy button for user messages (matching assistant)
- Autocomplete disabled on sensitive form fields
- settings.html: local model section replaced with link to /settings/local

Docs overhaul:
- MASTER.md hub + ARCH__SYSTEM/BACKENDS/PERSONA/CHANNELS/FUTURE.md
- ARCH__Intelligence_Layer.md replaced with redirect table
- CORTEX.md trimmed to vision only; README updated
- OPEN_WEBUI_API.md added to docs/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:53:06 -04:00

# Cortex — Roadmap
> Phases and priorities. For active tasks see `TODO__Agents.md`.
> Last updated: 2026-04-03
---
## Phase 0 — Foundation ✅
- Syncthing fleet sync (`agents_sync/`) operational
- MCP tools (`ae_*`) available in all Claude Code sessions
- Fleet agents running independently on each machine
## Phase 1 — Dispatcher Core ✅
- FastAPI service with streaming SSE responses
- Claude CLI and Gemini CLI subprocess backends
- Session context management (rolling window, file persistence)
- Nextcloud Talk bot (HMAC-signed webhook)
- Memory distiller (APScheduler — short/mid/long cycles)
- Local web UI (single-page, mobile-responsive)
- Auth status monitoring (`/auth/status`, UI banner)
- Session logging and file browser
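
The HMAC-signed webhook above can be illustrated with a minimal sketch. It assumes the signature is a hex HMAC-SHA256 over a per-request random value concatenated with the raw request body, keyed by the bot's shared secret. The function names `sign_payload` and `verify_webhook` are hypothetical and are not taken from the Cortex code.

```python
import hashlib
import hmac


def sign_payload(secret: str, random_value: str, body: bytes) -> str:
    """Return the hex HMAC-SHA256 digest over random_value + body (assumed scheme)."""
    mac = hmac.new(secret.encode("utf-8"), digestmod=hashlib.sha256)
    mac.update(random_value.encode("utf-8"))
    mac.update(body)
    return mac.hexdigest()


def verify_webhook(secret: str, random_value: str, body: bytes, signature: str) -> bool:
    """Compare the received signature with the expected one in constant time."""
    expected = sign_payload(secret, random_value, body)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode("ascii"), signature.lower().encode("utf-8"))
```

Verification should run on the raw bytes of the request body before any JSON parsing, since re-serializing the payload can change the bytes and invalidate the signature.
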
## Phase 2 — Identity & Multi-User ✅
- Inara persona formalized (`IDENTITY.md`, `SOUL.md`, `PROTOCOLS.md`, context tiers)
- Two-level user/persona layout (`home/{user}/persona/{name}/`)
- Session auth: bcrypt passwords, JWT cookies, invite tokens, Google OAuth
- Multi-user live: Scott, Holly, Brian
- Per-user channel config (`channels.json`)
- Per-user Gemini API key (settings UI)
- Help & Reference system (shared base + per-persona additions)
- Lucide icons, persona picker page, session persistence across navigation
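
As a rough illustration of the two-level `home/{user}/persona/{name}/` layout, the sketch below resolves a persona directory from a base path. The helper name `persona_dir`, the base path, and the allowed-character rule are assumptions made for the example, not the project's actual code.

```python
import re
from pathlib import Path

# Assumed rule: user and persona names are simple slugs, so they cannot
# contain path separators or ".." segments.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def persona_dir(base: Path, user: str, name: str) -> Path:
    """Build base/home/{user}/persona/{name}, rejecting unsafe components."""
    for part in (user, name):
        if not _SAFE_NAME.fullmatch(part):
            raise ValueError(f"unsafe path component: {part!r}")
    return base / "home" / user / "persona" / name
```

For example, `persona_dir(Path("/srv/cortex"), "scott", "inara")` yields `/srv/cortex/home/scott/persona/inara`, while a name such as `../etc` raises `ValueError`.
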
## Phase 3 — Intelligence Layer (In Progress)
- ✅ Gemini API orchestrator (tool loop → Claude responder)
- ✅ Tool suite: web search, AE Journal read/write, tasks, scratch, reminders, cron, system
- ✅ Agent mode in UI (async job, poll for result)
- ✅ Local LLM backend (Open WebUI/Ollama, per-user multi-model config)
- ✅ Proactive cron (`message` / `brief` job types → NC Talk)
- ✅ Session search (full-text across past session logs)
- ✅ Distill notifications (NC Talk after mid/long runs)
- ✅ Local backend for distillation (DISTILL_BACKEND_MID/LONG in .env)
- [ ] **Local orchestrator** — ReAct tool loop using local model (High priority — see `TODO__Agents.md`)
- [ ] Knowledge import — markdown → AE Journals (import script)
- [ ] Dev agent pipeline — specialist agents + supervisor + approval gate
- [ ] Gitea webhook integration + Actions CI
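
The two proactive cron job types can be sketched as a small dispatcher: a `message` job sends its text directly, while a `brief` job first asks an LLM to generate the text and then sends it. The job field names (`type`, `text`, `prompt`, `channel`), the channel identifiers, and the injected `generate` and `send` callables are all hypothetical; the real `cron_runner.py` may be structured differently.

```python
from typing import Callable


def run_job(
    job: dict,
    generate: Callable[[str], str],
    send: Callable[[str, str], None],
    default_channel: str = "nextcloud_talk",
) -> str:
    """Run one proactive job and return the text that was sent."""
    # An optional per-job channel overrides the default destination.
    channel = job.get("channel") or default_channel
    kind = job.get("type")
    if kind == "message":
        text = job["text"]  # send the stored text as-is
    elif kind == "brief":
        text = generate(job["prompt"])  # let the LLM write the text first
    else:
        raise ValueError(f"unknown job type: {kind!r}")
    send(channel, text)
    return text
```

Passing `generate` and `send` as parameters keeps the dispatcher independent of any particular backend or channel, which also makes it easy to exercise with stand-in functions.
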
## Phase 4 — Channel Expansion
- ✅ Web UI
- ✅ Nextcloud Talk
- ✅ Google Chat
- [ ] WhatsApp (Business API or bridge — investigating)
- [ ] Webhook triggers from Aether platform events
## Phase 5 — Routing Intelligence & Scale
- [ ] Intelligent model routing (by task type, privacy, context length)
- [ ] Agent-to-agent task delegation across fleet
- [ ] Permanent hosting on home server (currently on `scott_lpt`)
## Phase 6 — Infrastructure
- [ ] Server DMZ finalized
- [ ] WireGuard for all Cortex-accessing devices
- [ ] Camera/IoT VLAN segmentation
---
## Deferred / Watching
- **Unsloth Gemma 4 GGUFs** — blocked on Ollama v0.20.1 (llama.cpp GGUF metadata issue); switch `agent-support-gemma-*` aliases to Unsloth Q4_K_M when ready
- **Speculative decoding** — llama.cpp supports it (E4B with an E2B draft model ≈ 2x speed); Ollama does not support it yet
- **RAG via Open WebUI** — feed Nextcloud docs into local knowledge collections; possible complement to AE Journals search
- **Multi-host local models** — per-user config already supports multiple hosts; routing logic TBD
- **WhatsApp** — requires Business API account or a bridge; not started