feat: local LLM multi-model, session search, cron proactive types, notifications, docs overhaul

Local LLM: - user_settings.py: per-user hosts/models config (local_llm.json) - routers/local_llm.py + static/local_llm.html: dedicated settings page - llm_client.py: local OpenAI-compatible backend via httpx - config.py: LOCAL_API_URL/KEY/MODEL + per-backend timeouts - Active model shown near backend toggle (amber hint text) Memory distillation: - memory_distiller.py: DISTILL_BACKEND_MID/LONG .env overrides - scheduler.py + notification.py: notify NC Talk after mid/long distill - notification.py: outbound channel abstraction (NC Talk, extensible) Session search: - routers/files.py: GET /sessions/search?q= with excerpts grouped by date - static/index.html + app.js: search UI in file sidebar with highlight - _esc() helper to prevent XSS in search results Proactive cron: - cron_runner.py: new job types — message (send directly) and brief (LLM + send) - Both support optional per-job channel override Channels: - routers/nextcloud_talk.py: consolidated using notification._send_nct_message() - routers/auth.py: local backend status in /auth/status - routers/chat.py: /backend returns {primary, fallback, local_model} object UI / UX: - Copy button for user messages (matching assistant) - Autocomplete disabled on sensitive form fields - settings.html: local model section replaced with link to /settings/local Docs overhaul: - MASTER.md hub + ARCH__SYSTEM/BACKENDS/PERSONA/CHANNELS/FUTURE.md - ARCH__Intelligence_Layer.md replaced with redirect table - CORTEX.md trimmed to vision only; README updated - OPEN_WEBUI_API.md added to docs/ Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:53:06 -04:00
parent bd6532e93a
commit a4daebdc9b
33 changed files with 2985 additions and 486 deletions
--- a/documentation/ARCH__FUTURE.md
+++ b/documentation/ARCH__FUTURE.md
@@ -0,0 +1,192 @@
+# Architecture: Planned Features
+
+> What's next and how it's designed to work.
+> Last updated: 2026-04-04
+
+For the current task list see `TODO__Agents.md`. For phases and priorities see `ROADMAP.md`.
+
+---
+
+## 1. Local Orchestrator
+
+**Status:** High priority — design complete, not yet built.
+
+Same ReAct tool loop as the Gemini API orchestrator, but driven by a local model via Open WebUI's OpenAI-compatible API. Enables offline/private agent tasks with no API cost.
+
+**Why local models work for this now:** Gemma 4 E4B and 26B A4B both support OpenAI `tools` / `tool_choice` function calling. The tool schema is nearly identical to Gemini's `FunctionDeclaration` — minor field renaming only.
+
+**Design:**
+```
+POST /orchestrate  (mode: "local")
+    ↓
+local_orchestrator_engine.py
+    • converts tools/ to OpenAI tools format
+    • POST /api/chat/completions with tools array
+    • parse tool_calls response
+    • execute tool, append result
+    • loop until finish_reason: "stop"
+    ↓
+response returned (local model generates final answer)
+```
+
+Model selection:
+- **Gemma 4 E4B** (25 t/s, 72k ctx) — interactive/fast tasks
+- **Gemma 4 26B A4B** (9 t/s, 50k ctx) — heavier reasoning, background tasks
+
+Context budget per iteration (system prompt + memory + tool results + history):
+- Small model: budget ~40-50k tokens per round
+- Medium model: budget ~35-40k tokens per round
+
+Full API reference: [`docs/OPEN_WEBUI_API.md`](../docs/OPEN_WEBUI_API.md)
+
+---
+
+## 2. Dev Agent Pipeline
+
+**Status:** Design complete, not yet built.
+
+Accept a plain-English task, implement code changes, verify them, and present for human approval before committing.
+
+```
+Task (chat / Gitea issue / Kanban)
+    ↓
+Orchestrator — reads relevant files, routes to specialist
+    ↓
+Specialist Agent (Claude CLI in project directory)
+    • implements the change
+    • runs self-check: py_compile / svelte-check
+    ↓
+Supervisor Agent
+    • reviews the diff
+    • runs test suite
+    • returns: PASS / NEEDS_REVIEW / FAIL + reason
+    ↓
+Human approval gate
+    • summary in Cortex UI or NC Talk
+    • approve → commit (+ optional push)
+    • reject <20><> feedback back to specialist
+```
+
+**Specialists** (both Claude CLI):
+- **Frontend** — working dir: `~/OSIT_dev/aether_app_sveltekit/` — runs `svelte-check` after every change
+- **Backend** — working dir: `~/OSIT_dev/aether_api_fastapi/` — runs `py_compile` + unit tests
+
+**Supervisor** returns structured JSON:
+```json
+{
+  "verdict": "PASS | NEEDS_REVIEW | FAIL",
+  "checks_passed": ["py_compile"],
+  "checks_failed": [],
+  "review_notes": "...",
+  "commit_message": "..."
+}
+```
+
+---
+
+## 3. Gitea Integration
+
+**Status:** Not started. pfSense port forward for SSH already confirmed working.
+
+- **Webhooks → Cortex:** push/PR/issue events → `POST /webhook/gitea` → orchestrator
+  - Router pattern already established; add `cortex/routers/gitea.py`
+- **Gitea Actions CI:** `.gitea/workflows/check.yml` — run `py_compile`/`svelte-check` on push
+- **Cortex → Gitea:** after human approval, call Gitea API to create PR or push branch
+
+SSH clone/push: `git clone ssh://git@git.dgrzone.com:2222/<user>/<repo>.git`
+
+---
+
+## 4. Knowledge Layer (AE Journals)
+
+**Status:** Tools exist, import script not yet built.
+
+AE Journals becomes the searchable long-term knowledge base. Complements memory distillation: memory files cover "what have we been working on lately"; Journals cover "what do I know about topic X".
+
+**Existing tools:** `ae_journal_search`, `ae_journal_entry_create` — already in orchestrator tool suite.
+
+**Import script (to build):**
+- Walk a markdown directory (Nextcloud, agents_sync docs)
+- Chunk by H2 section
+- Search before creating (deduplication)
+- Tag from frontmatter, filename, directory path
+- Target sources: `~/DgrZone_Nextcloud/`, `~/OSIT_Nextcloud/`
+
+**Agent workflow:**
+```
+"Summarize my notes on WireGuard setup"
+    → orchestrator calls ae_journal_search("wireguard")
+    → returns matching entries
+    → Claude synthesizes response
+```
+
+---
+
+## 5. Intelligent Model Routing
+
+**Status:** Deferred. Currently user-toggled.
+
+Route automatically based on task characteristics rather than requiring manual backend selection:
+
+| Task type | Backend | Reason |
+|---|---|---|
+| User-facing conversation | Claude | Quality prose, persona fidelity |
+| Tool use / orchestration | Gemini API | Native function calling, free tier |
+| Private / sensitive / offline | Local (Ollama) | No data leaves the network |
+| Long context (>50k tokens) | Gemini 2.0 | 1M token context window |
+| Fast/cheap simple queries | Local (E4B) | 25 t/s, no API cost |
+
+Routing logic would live in `llm_client.py` or a new `router.py` — map task metadata to backend choice.
+
+---
+
+## 6. RAG via Open WebUI
+
+**Status:** Future — Open WebUI already supports it.
+
+Feed Nextcloud documents or session logs into Open WebUI knowledge collections. Reference them in local model chat via `"files": [{"type": "collection", "id": "..."}]`.
+
+Would complement AE Journals for local-only contexts where data shouldn't leave the network.
+
+API reference: [`docs/OPEN_WEBUI_API.md`](../docs/OPEN_WEBUI_API.md) — RAG section.
+
+---
+
+## 8. Agent Architecture Ideas (from Claude Code leak)
+
+**Status:** Research — review before building dev agent pipeline and orchestrator.
+
+The Claude Code system prompt was leaked in early April 2026. Two reimplementation repos are worth reading for design ideas before building out the dev agent pipeline and local orchestrator:
+
+- https://github.com/HarnessLab/claw-code-agent — Python reimplementation targeting local models (Qwen3-Coder recommended); most technically detailed
+- https://github.com/ultraworkers/claw-code — Community porting/reverse-engineering project; reportedly has interesting detail in the source code itself
+
+**Ideas worth incorporating:**
+
+**Tiered permission architecture** — explicit read-only / write / shell / unsafe modes, each requiring an opt-in flag. Currently Cortex has implicit trust for agent operations. Relevant once the dev agent pipeline is writing and executing code — don't want a `brief` cron job accidentally in write mode.
+
+**Agent lineage tracking** — agent manager records which agent spawned which sub-agent. Useful for debugging multi-step orchestrated tasks and essential for the supervisor → specialist → approval gate chain.
+
+**Cost/budget enforcement** — hard token and cost budgets per operation, multiple budget types. `ORCHESTRATOR_MAX_ROUNDS=10` is Cortex's only guardrail today. Worth adding a token budget check to the tool loop, especially relevant for local models with hard context ceilings (72k/50k practical).
+
+**Context compaction/snipping** — automatic mid-session context trimming when approaching limits. Important for long orchestrator runs against local models. Could trim tool results that are no longer needed for the current reasoning step.
+
+**Nested agent delegation with dependency-aware batching** — sub-agents that know their parent; parallel sub-tasks batched by dependency order. Directly applicable to the dev agent pipeline (orchestrator → specialist → supervisor, with some steps parallelizable).
+
+**File history journaling** — beyond session logs, a journal of what files changed and why, with replay summaries. Different from memory distillation — more like a git log for agent actions. Could complement the supervisor agent's diff review.
+
+**Plugin/manifest-based tool extensions** — tools declared via manifest rather than hardcoded in `__init__.py`. Would make adding new orchestrator tools less invasive. Worth considering before the tool suite grows much larger.
+
+---
+
+## 7. Permanent Fleet Hosting
+
+**Status:** Deferred.
+
+Currently running on `scott_lpt` (main laptop). Long-term target: home server (always-on, Docker).
+
+`docker-compose.yml` already exists in the project root. Deployment path:
+1. Copy to home server
+2. Configure reverse proxy (Nginx, already Docker-hosted)
+3. Set subdomain `cortex.dgrzone.com` → home server internal IP
+4. WireGuard required for all access — not internet-exposed