feat: local LLM multi-model, session search, cron proactive types, notifications, docs overhaul
Local LLM:
- user_settings.py: per-user hosts/models config (local_llm.json)
- routers/local_llm.py + static/local_llm.html: dedicated settings page
- llm_client.py: local OpenAI-compatible backend via httpx
- config.py: LOCAL_API_URL/KEY/MODEL + per-backend timeouts
- Active model shown near backend toggle (amber hint text)
Memory distillation:
- memory_distiller.py: DISTILL_BACKEND_MID/LONG .env overrides
- scheduler.py + notification.py: notify NC Talk after mid/long distill
- notification.py: outbound channel abstraction (NC Talk, extensible)
Session search:
- routers/files.py: GET /sessions/search?q= with excerpts grouped by date
- static/index.html + app.js: search UI in file sidebar with highlight
- _esc() helper to prevent XSS in search results
Proactive cron:
- cron_runner.py: new job types — message (send directly) and brief (LLM + send)
- Both support optional per-job channel override
Channels:
- routers/nextcloud_talk.py: consolidated using notification._send_nct_message()
- routers/auth.py: local backend status in /auth/status
- routers/chat.py: /backend returns {primary, fallback, local_model} object
UI / UX:
- Copy button for user messages (matching assistant)
- Autocomplete disabled on sensitive form fields
- settings.html: local model section replaced with link to /settings/local
Docs overhaul:
- MASTER.md hub + ARCH__SYSTEM/BACKENDS/PERSONA/CHANNELS/FUTURE.md
- ARCH__Intelligence_Layer.md replaced with redirect table
- CORTEX.md trimmed to vision only; README updated
- OPEN_WEBUI_API.md added to docs/
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
192
documentation/ARCH__FUTURE.md
Normal file
192
documentation/ARCH__FUTURE.md
Normal file
@@ -0,0 +1,192 @@
|
||||
# Architecture: Planned Features
|
||||
|
||||
> What's next and how it's designed to work.
|
||||
> Last updated: 2026-04-04
|
||||
|
||||
For the current task list see `TODO__Agents.md`. For phases and priorities see `ROADMAP.md`.
|
||||
|
||||
---
|
||||
|
||||
## 1. Local Orchestrator
|
||||
|
||||
**Status:** High priority — design complete, not yet built.
|
||||
|
||||
Same ReAct tool loop as the Gemini API orchestrator, but driven by a local model via Open WebUI's OpenAI-compatible API. Enables offline/private agent tasks with no API cost.
|
||||
|
||||
**Why local models work for this now:** Gemma 4 E4B and 26B A4B both support OpenAI `tools` / `tool_choice` function calling. The tool schema is nearly identical to Gemini's `FunctionDeclaration` — minor field renaming only.
|
||||
|
||||
**Design:**
|
||||
```
|
||||
POST /orchestrate (mode: "local")
|
||||
↓
|
||||
local_orchestrator_engine.py
|
||||
• converts tools/ to OpenAI tools format
|
||||
• POST /api/chat/completions with tools array
|
||||
• parse tool_calls response
|
||||
• execute tool, append result
|
||||
• loop until finish_reason: "stop"
|
||||
↓
|
||||
response returned (local model generates final answer)
|
||||
```
|
||||
|
||||
Model selection:
|
||||
- **Gemma 4 E4B** (25 t/s, 72k ctx) — interactive/fast tasks
|
||||
- **Gemma 4 26B A4B** (9 t/s, 50k ctx) — heavier reasoning, background tasks
|
||||
|
||||
Context budget per iteration (system prompt + memory + tool results + history):
|
||||
- Small model: budget ~40-50k tokens per round
|
||||
- Medium model: budget ~35-40k tokens per round
|
||||
|
||||
Full API reference: [`docs/OPEN_WEBUI_API.md`](../docs/OPEN_WEBUI_API.md)
|
||||
|
||||
---
|
||||
|
||||
## 2. Dev Agent Pipeline
|
||||
|
||||
**Status:** Design complete, not yet built.
|
||||
|
||||
Accept a plain-English task, implement code changes, verify them, and present for human approval before committing.
|
||||
|
||||
```
|
||||
Task (chat / Gitea issue / Kanban)
|
||||
↓
|
||||
Orchestrator — reads relevant files, routes to specialist
|
||||
↓
|
||||
Specialist Agent (Claude CLI in project directory)
|
||||
• implements the change
|
||||
• runs self-check: py_compile / svelte-check
|
||||
↓
|
||||
Supervisor Agent
|
||||
• reviews the diff
|
||||
• runs test suite
|
||||
• returns: PASS / NEEDS_REVIEW / FAIL + reason
|
||||
↓
|
||||
Human approval gate
|
||||
• summary in Cortex UI or NC Talk
|
||||
• approve → commit (+ optional push)
|
||||
• reject <20><> feedback back to specialist
|
||||
```
|
||||
|
||||
**Specialists** (both Claude CLI):
|
||||
- **Frontend** — working dir: `~/OSIT_dev/aether_app_sveltekit/` — runs `svelte-check` after every change
|
||||
- **Backend** — working dir: `~/OSIT_dev/aether_api_fastapi/` — runs `py_compile` + unit tests
|
||||
|
||||
**Supervisor** returns structured JSON:
|
||||
```json
|
||||
{
|
||||
"verdict": "PASS | NEEDS_REVIEW | FAIL",
|
||||
"checks_passed": ["py_compile"],
|
||||
"checks_failed": [],
|
||||
"review_notes": "...",
|
||||
"commit_message": "..."
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. Gitea Integration
|
||||
|
||||
**Status:** Not started. pfSense port forward for SSH already confirmed working.
|
||||
|
||||
- **Webhooks → Cortex:** push/PR/issue events → `POST /webhook/gitea` → orchestrator
|
||||
- Router pattern already established; add `cortex/routers/gitea.py`
|
||||
- **Gitea Actions CI:** `.gitea/workflows/check.yml` — run `py_compile`/`svelte-check` on push
|
||||
- **Cortex → Gitea:** after human approval, call Gitea API to create PR or push branch
|
||||
|
||||
SSH clone/push: `git clone ssh://git@git.dgrzone.com:2222/<user>/<repo>.git`
|
||||
|
||||
---
|
||||
|
||||
## 4. Knowledge Layer (AE Journals)
|
||||
|
||||
**Status:** Tools exist, import script not yet built.
|
||||
|
||||
AE Journals becomes the searchable long-term knowledge base. Complements memory distillation: memory files cover "what have we been working on lately"; Journals cover "what do I know about topic X".
|
||||
|
||||
**Existing tools:** `ae_journal_search`, `ae_journal_entry_create` — already in orchestrator tool suite.
|
||||
|
||||
**Import script (to build):**
|
||||
- Walk a markdown directory (Nextcloud, agents_sync docs)
|
||||
- Chunk by H2 section
|
||||
- Search before creating (deduplication)
|
||||
- Tag from frontmatter, filename, directory path
|
||||
- Target sources: `~/DgrZone_Nextcloud/`, `~/OSIT_Nextcloud/`
|
||||
|
||||
**Agent workflow:**
|
||||
```
|
||||
"Summarize my notes on WireGuard setup"
|
||||
→ orchestrator calls ae_journal_search("wireguard")
|
||||
→ returns matching entries
|
||||
→ Claude synthesizes response
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 5. Intelligent Model Routing
|
||||
|
||||
**Status:** Deferred. Currently user-toggled.
|
||||
|
||||
Route automatically based on task characteristics rather than requiring manual backend selection:
|
||||
|
||||
| Task type | Backend | Reason |
|
||||
|---|---|---|
|
||||
| User-facing conversation | Claude | Quality prose, persona fidelity |
|
||||
| Tool use / orchestration | Gemini API | Native function calling, free tier |
|
||||
| Private / sensitive / offline | Local (Ollama) | No data leaves the network |
|
||||
| Long context (>50k tokens) | Gemini 2.0 | 1M token context window |
|
||||
| Fast/cheap simple queries | Local (E4B) | 25 t/s, no API cost |
|
||||
|
||||
Routing logic would live in `llm_client.py` or a new `router.py` — map task metadata to backend choice.
|
||||
|
||||
---
|
||||
|
||||
## 6. RAG via Open WebUI
|
||||
|
||||
**Status:** Future — Open WebUI already supports it.
|
||||
|
||||
Feed Nextcloud documents or session logs into Open WebUI knowledge collections. Reference them in local model chat via `"files": [{"type": "collection", "id": "..."}]`.
|
||||
|
||||
Would complement AE Journals for local-only contexts where data shouldn't leave the network.
|
||||
|
||||
API reference: [`docs/OPEN_WEBUI_API.md`](../docs/OPEN_WEBUI_API.md) — RAG section.
|
||||
|
||||
---
|
||||
|
||||
## 8. Agent Architecture Ideas (from Claude Code leak)
|
||||
|
||||
**Status:** Research — review before building dev agent pipeline and orchestrator.
|
||||
|
||||
The Claude Code system prompt was leaked in early April 2026. Two reimplementation repos are worth reading for design ideas before building out the dev agent pipeline and local orchestrator:
|
||||
|
||||
- https://github.com/HarnessLab/claw-code-agent — Python reimplementation targeting local models (Qwen3-Coder recommended); most technically detailed
|
||||
- https://github.com/ultraworkers/claw-code — Community porting/reverse-engineering project; reportedly has interesting detail in the source code itself
|
||||
|
||||
**Ideas worth incorporating:**
|
||||
|
||||
**Tiered permission architecture** — explicit read-only / write / shell / unsafe modes, each requiring an opt-in flag. Currently Cortex has implicit trust for agent operations. Relevant once the dev agent pipeline is writing and executing code — don't want a `brief` cron job accidentally in write mode.
|
||||
|
||||
**Agent lineage tracking** — agent manager records which agent spawned which sub-agent. Useful for debugging multi-step orchestrated tasks and essential for the supervisor → specialist → approval gate chain.
|
||||
|
||||
**Cost/budget enforcement** — hard token and cost budgets per operation, multiple budget types. `ORCHESTRATOR_MAX_ROUNDS=10` is Cortex's only guardrail today. Worth adding a token budget check to the tool loop, especially relevant for local models with hard context ceilings (72k/50k practical).
|
||||
|
||||
**Context compaction/snipping** — automatic mid-session context trimming when approaching limits. Important for long orchestrator runs against local models. Could trim tool results that are no longer needed for the current reasoning step.
|
||||
|
||||
**Nested agent delegation with dependency-aware batching** — sub-agents that know their parent; parallel sub-tasks batched by dependency order. Directly applicable to the dev agent pipeline (orchestrator → specialist → supervisor, with some steps parallelizable).
|
||||
|
||||
**File history journaling** — beyond session logs, a journal of what files changed and why, with replay summaries. Different from memory distillation — more like a git log for agent actions. Could complement the supervisor agent's diff review.
|
||||
|
||||
**Plugin/manifest-based tool extensions** — tools declared via manifest rather than hardcoded in `__init__.py`. Would make adding new orchestrator tools less invasive. Worth considering before the tool suite grows much larger.
|
||||
|
||||
---
|
||||
|
||||
## 7. Permanent Fleet Hosting
|
||||
|
||||
**Status:** Deferred.
|
||||
|
||||
Currently running on `scott_lpt` (main laptop). Long-term target: home server (always-on, Docker).
|
||||
|
||||
`docker-compose.yml` already exists in the project root. Deployment path:
|
||||
1. Copy to home server
|
||||
2. Configure reverse proxy (Nginx, already Docker-hosted)
|
||||
3. Set subdomain `cortex.dgrzone.com` → home server internal IP
|
||||
4. WireGuard required for all access — not internet-exposed
|
||||
Reference in New Issue
Block a user