Files

Scott Idem a4daebdc9b feat: local LLM multi-model, session search, cron proactive types, notifications, docs overhaul

Local LLM:
- user_settings.py: per-user hosts/models config (local_llm.json)
- routers/local_llm.py + static/local_llm.html: dedicated settings page
- llm_client.py: local OpenAI-compatible backend via httpx
- config.py: LOCAL_API_URL/KEY/MODEL + per-backend timeouts
- Active model shown near backend toggle (amber hint text)

Memory distillation:
- memory_distiller.py: DISTILL_BACKEND_MID/LONG .env overrides
- scheduler.py + notification.py: notify NC Talk after mid/long distill
- notification.py: outbound channel abstraction (NC Talk, extensible)

Session search:
- routers/files.py: GET /sessions/search?q= with excerpts grouped by date
- static/index.html + app.js: search UI in file sidebar with highlight
- _esc() helper to prevent XSS in search results

Proactive cron:
- cron_runner.py: new job types — message (send directly) and brief (LLM + send)
- Both support optional per-job channel override

Channels:
- routers/nextcloud_talk.py: consolidated using notification._send_nct_message()
- routers/auth.py: local backend status in /auth/status
- routers/chat.py: /backend returns {primary, fallback, local_model} object

UI / UX:
- Copy button for user messages (matching assistant)
- Autocomplete disabled on sensitive form fields
- settings.html: local model section replaced with link to /settings/local

Docs overhaul:
- MASTER.md hub + ARCH__SYSTEM/BACKENDS/PERSONA/CHANNELS/FUTURE.md
- ARCH__Intelligence_Layer.md replaced with redirect table
- CORTEX.md trimmed to vision only; README updated
- OPEN_WEBUI_API.md added to docs/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-05 20:53:06 -04:00

8.0 KiB

Raw Blame History

Architecture: Planned Features

What's next and how it's designed to work. Last updated: 2026-04-04

For the current task list see TODO__Agents.md. For phases and priorities see ROADMAP.md.

1. Local Orchestrator

Status: High priority — design complete, not yet built.

Same ReAct tool loop as the Gemini API orchestrator, but driven by a local model via Open WebUI's OpenAI-compatible API. Enables offline/private agent tasks with no API cost.

Why local models work for this now: Gemma 4 E4B and 26B A4B both support OpenAI tools / tool_choice function calling. The tool schema is nearly identical to Gemini's FunctionDeclaration — minor field renaming only.

Design:

POST /orchestrate  (mode: "local")
    ↓
local_orchestrator_engine.py
    • converts tools/ to OpenAI tools format
    • POST /api/chat/completions with tools array
    • parse tool_calls response
    • execute tool, append result
    • loop until finish_reason: "stop"
    ↓
response returned (local model generates final answer)

Model selection:

Gemma 4 E4B (25 t/s, 72k ctx) — interactive/fast tasks
Gemma 4 26B A4B (9 t/s, 50k ctx) — heavier reasoning, background tasks

Context budget per iteration (system prompt + memory + tool results + history):

Small model: budget ~40-50k tokens per round
Medium model: budget ~35-40k tokens per round

Full API reference: docs/OPEN_WEBUI_API.md

2. Dev Agent Pipeline

Status: Design complete, not yet built.

Accept a plain-English task, implement code changes, verify them, and present for human approval before committing.

Task (chat / Gitea issue / Kanban)
    ↓
Orchestrator — reads relevant files, routes to specialist
    ↓
Specialist Agent (Claude CLI in project directory)
    • implements the change
    • runs self-check: py_compile / svelte-check
    ↓
Supervisor Agent
    • reviews the diff
    • runs test suite
    • returns: PASS / NEEDS_REVIEW / FAIL + reason
    ↓
Human approval gate
    • summary in Cortex UI or NC Talk
    • approve → commit (+ optional push)
    • reject <20><> feedback back to specialist

Specialists (both Claude CLI):

Frontend — working dir: ~/OSIT_dev/aether_app_sveltekit/ — runs svelte-check after every change
Backend — working dir: ~/OSIT_dev/aether_api_fastapi/ — runs py_compile + unit tests

Supervisor returns structured JSON:

{
  "verdict": "PASS | NEEDS_REVIEW | FAIL",
  "checks_passed": ["py_compile"],
  "checks_failed": [],
  "review_notes": "...",
  "commit_message": "..."
}

3. Gitea Integration

Status: Not started. pfSense port forward for SSH already confirmed working.

Webhooks → Cortex: push/PR/issue events → POST /webhook/gitea → orchestrator
- Router pattern already established; add cortex/routers/gitea.py
Gitea Actions CI: .gitea/workflows/check.yml — run py_compile/svelte-check on push
Cortex → Gitea: after human approval, call Gitea API to create PR or push branch

SSH clone/push: git clone ssh://git@git.dgrzone.com:2222/<user>/<repo>.git

4. Knowledge Layer (AE Journals)

Status: Tools exist, import script not yet built.

AE Journals becomes the searchable long-term knowledge base. Complements memory distillation: memory files cover "what have we been working on lately"; Journals cover "what do I know about topic X".

Existing tools: ae_journal_search, ae_journal_entry_create — already in orchestrator tool suite.

Import script (to build):

Walk a markdown directory (Nextcloud, agents_sync docs)
Chunk by H2 section
Search before creating (deduplication)
Tag from frontmatter, filename, directory path
Target sources: ~/DgrZone_Nextcloud/, ~/OSIT_Nextcloud/

Agent workflow:

"Summarize my notes on WireGuard setup"
    → orchestrator calls ae_journal_search("wireguard")
    → returns matching entries
    → Claude synthesizes response

5. Intelligent Model Routing

Status: Deferred. Currently user-toggled.

Route automatically based on task characteristics rather than requiring manual backend selection:

Task type	Backend	Reason
User-facing conversation	Claude	Quality prose, persona fidelity
Tool use / orchestration	Gemini API	Native function calling, free tier
Private / sensitive / offline	Local (Ollama)	No data leaves the network
Long context (>50k tokens)	Gemini 2.0	1M token context window
Fast/cheap simple queries	Local (E4B)	25 t/s, no API cost

Routing logic would live in llm_client.py or a new router.py — map task metadata to backend choice.

6. RAG via Open WebUI

Status: Future — Open WebUI already supports it.

Feed Nextcloud documents or session logs into Open WebUI knowledge collections. Reference them in local model chat via "files": [{"type": "collection", "id": "..."}].

Would complement AE Journals for local-only contexts where data shouldn't leave the network.

API reference: docs/OPEN_WEBUI_API.md — RAG section.

8. Agent Architecture Ideas (from Claude Code leak)

Status: Research — review before building dev agent pipeline and orchestrator.

The Claude Code system prompt was leaked in early April 2026. Two reimplementation repos are worth reading for design ideas before building out the dev agent pipeline and local orchestrator:

https://github.com/HarnessLab/claw-code-agent — Python reimplementation targeting local models (Qwen3-Coder recommended); most technically detailed
https://github.com/ultraworkers/claw-code — Community porting/reverse-engineering project; reportedly has interesting detail in the source code itself

Ideas worth incorporating:

Tiered permission architecture — explicit read-only / write / shell / unsafe modes, each requiring an opt-in flag. Currently Cortex has implicit trust for agent operations. Relevant once the dev agent pipeline is writing and executing code — don't want a brief cron job accidentally in write mode.

Agent lineage tracking — agent manager records which agent spawned which sub-agent. Useful for debugging multi-step orchestrated tasks and essential for the supervisor → specialist → approval gate chain.

Cost/budget enforcement — hard token and cost budgets per operation, multiple budget types. ORCHESTRATOR_MAX_ROUNDS=10 is Cortex's only guardrail today. Worth adding a token budget check to the tool loop, especially relevant for local models with hard context ceilings (72k/50k practical).

Context compaction/snipping — automatic mid-session context trimming when approaching limits. Important for long orchestrator runs against local models. Could trim tool results that are no longer needed for the current reasoning step.

Nested agent delegation with dependency-aware batching — sub-agents that know their parent; parallel sub-tasks batched by dependency order. Directly applicable to the dev agent pipeline (orchestrator → specialist → supervisor, with some steps parallelizable).

File history journaling — beyond session logs, a journal of what files changed and why, with replay summaries. Different from memory distillation — more like a git log for agent actions. Could complement the supervisor agent's diff review.

Plugin/manifest-based tool extensions — tools declared via manifest rather than hardcoded in __init__.py. Would make adding new orchestrator tools less invasive. Worth considering before the tool suite grows much larger.

7. Permanent Fleet Hosting

Status: Deferred.

Currently running on scott_lpt (main laptop). Long-term target: home server (always-on, Docker).

docker-compose.yml already exists in the project root. Deployment path:

Copy to home server
Configure reverse proxy (Nginx, already Docker-hosted)
Set subdomain cortex.dgrzone.com → home server internal IP
WireGuard required for all access — not internet-exposed

8.0 KiB Raw Blame History Unescape Escape