docs: sync TODO and ARCH__FUTURE — local orchestrator status, new tools, fleet/mesh plans
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -1,7 +1,7 @@
|
||||
# Architecture: Planned Features
|
||||
|
||||
> What's next and how it's designed to work.
|
||||
> Last updated: 2026-04-28
|
||||
> Last updated: 2026-04-29
|
||||
|
||||
For the current task list see `TODO__Agents.md`. For phases and priorities see `ROADMAP.md`.
|
||||
|
||||
@@ -9,17 +9,17 @@ For the current task list see `TODO__Agents.md`. For phases and priorities see `
|
||||
|
||||
## 1. Local Orchestrator
|
||||
|
||||
**Status:** Partially built — `openai_orchestrator.py` exists and is wired into `POST /orchestrate`. If the `orchestrator` role in the model registry resolves to a `local_openai` model, it routes there automatically. Full parity with the Gemini orchestrator (tool loop quality, error handling, context budget enforcement) is still in progress.
|
||||
**Status:** Partially built — `openai_orchestrator.py` exists and is wired into `POST /orchestrate`. When the `orchestrator` role in the model registry resolves to a `local_openai` model, it routes there automatically. Remaining work is quality/reliability parity with the Gemini orchestrator, not ground-up design.
|
||||
|
||||
Same ReAct tool loop as the Gemini API orchestrator, but driven by a local model via Open WebUI's OpenAI-compatible API. Enables offline/private agent tasks with no API cost.
|
||||
Same ReAct tool loop as the Gemini API orchestrator, driven by a local model via Open WebUI's OpenAI-compatible API. Enables offline/private agent tasks with no API cost.
|
||||
|
||||
**Why local models work for this now:** Gemma 4 E4B and 26B A4B both support OpenAI `tools` / `tool_choice` function calling. The tool schema is nearly identical to Gemini's `FunctionDeclaration` — minor field renaming only.
|
||||
|
||||
**Design:**
|
||||
```
|
||||
POST /orchestrate (mode: "local")
|
||||
POST /orchestrate (role resolves to local_openai model)
|
||||
↓
|
||||
local_orchestrator_engine.py
|
||||
openai_orchestrator.py
|
||||
• converts tools/ to OpenAI tools format
|
||||
• POST /api/chat/completions with tools array
|
||||
• parse tool_calls response
|
||||
@@ -34,16 +34,45 @@ Model selection:
|
||||
- **Gemma 4 26B A4B** (9 t/s, 50k ctx) — heavier reasoning, background tasks
|
||||
|
||||
Context budget per iteration (system prompt + memory + tool results + history):
|
||||
- Small model: budget ~40-50k tokens per round
|
||||
- Medium model: budget ~35-40k tokens per round
|
||||
- Small model: budget ~40–50k tokens per round
|
||||
- Medium model: budget ~35–40k tokens per round
|
||||
|
||||
Context compaction (to implement): automatically trim stale tool results mid-run when
|
||||
approaching the budget ceiling, preserving only the most recent N tool exchanges.
|
||||
|
||||
Full API reference: [`docs/OPEN_WEBUI_API.md`](../docs/OPEN_WEBUI_API.md)
|
||||
|
||||
---
|
||||
|
||||
## 2. Dev Agent Pipeline
|
||||
## 2. Orchestrator Tool Expansions
|
||||
|
||||
**Status:** Design complete, not yet built.
|
||||
**Status:** Planned. Current tool count: 27. These fill obvious gaps.
|
||||
|
||||
New tools for `cortex/tools/` — each follows the existing async pattern (implement function,
|
||||
add `FunctionDeclaration`, register in `__init__.py`).
|
||||
|
||||
| Tool | Module | Description |
|
||||
|---|---|---|
|
||||
| `cortex_restart` | `system.py` | `systemctl --user restart cortex` — Inara can apply her own config changes; returns last 10 log lines after restart |
|
||||
| `cortex_logs` | `system.py` | `journalctl --user -u cortex -n N` — tail service logs for debugging |
|
||||
| `http_fetch` | `web.py` | Fetch a specific URL and return content; for health checks, API probing, webhook testing — not a search, a direct GET/POST |
|
||||
| `file_list` | `scratch.py` or new `files.py` | List files and directories at a path; currently only `file_read` exists |
|
||||
| `file_write` | `files.py` | Write content to a file with a path allow-list (persona dir + scratch by default) |
|
||||
| `nc_talk_send` | new `notify.py` | Proactively send a message to the user via Nextcloud Talk outbound API |
|
||||
| `email_send` | `notify.py` | Send email via existing `email_utils.py` SMTP helper |
|
||||
| `web_push` | `notify.py` | Browser push notification via Web Push API (requires push subscription stored per-user in `home/{user}/push_sub.json`; pairs with the PWA service worker) |
|
||||
|
||||
**Safety note for `cortex_restart`:** The service will drop in-flight SSE connections on restart.
|
||||
Only call if no streaming response is active. Add a check or a short delay before restarting.
|
||||
|
||||
**Safety note for `file_write`:** Enforce an allow-list at the tool level, not just in the prompt.
|
||||
Default allow: `home/{user}/persona/{name}/` and `/tmp/cortex-scratch/`. Reject any path outside.
|
||||
|
||||
---
|
||||
|
||||
## 3. Dev Agent Pipeline
|
||||
|
||||
**Status:** Design complete, not yet built. Review §8 (Agent Architecture Patterns) before starting.
|
||||
|
||||
Accept a plain-English task, implement code changes, verify them, and present for human approval before committing.
|
||||
|
||||
@@ -64,7 +93,7 @@ Supervisor Agent
|
||||
Human approval gate
|
||||
• summary in Cortex UI or NC Talk
|
||||
• approve → commit (+ optional push)
|
||||
• reject <EFBFBD><EFBFBD> feedback back to specialist
|
||||
• reject → feedback back to specialist
|
||||
```
|
||||
|
||||
**Specialists** (both Claude CLI):
|
||||
@@ -84,7 +113,7 @@ Human approval gate
|
||||
|
||||
---
|
||||
|
||||
## 3. Gitea Integration
|
||||
## 4. Gitea Integration
|
||||
|
||||
**Status:** Not started. pfSense port forward for SSH already confirmed working.
|
||||
|
||||
@@ -97,7 +126,7 @@ SSH clone/push: `git clone ssh://git@git.dgrzone.com:2222/<user>/<repo>.git`
|
||||
|
||||
---
|
||||
|
||||
## 4. Knowledge Layer (AE Journals)
|
||||
## 5. Knowledge Layer (AE Journals)
|
||||
|
||||
**Status:** Tools exist, import script not yet built.
|
||||
|
||||
@@ -122,16 +151,19 @@ AE Journals becomes the searchable long-term knowledge base. Complements memory
|
||||
|
||||
---
|
||||
|
||||
## 5. Intelligent Model Routing
|
||||
## 6. Intelligent Model Routing
|
||||
|
||||
**Status:** Partially addressed. Model Registry V2 (2026-04-27) introduced role-based routing — `chat`, `orchestrator`, `distill`, `coder`, `research` roles each have their own primary/backup model chain, and the UI role toggle lets users manually select which role handles a message. Automatic task-characteristic routing (below) is still deferred.
|
||||
**Status:** Partially addressed. Model Registry V2 (2026-04-27) introduced role-based routing —
|
||||
`chat`, `orchestrator`, `distill`, `coder`, `research` roles each have their own primary/backup
|
||||
model chain, and the UI role toggle lets users manually select which role handles a message.
|
||||
Automatic task-characteristic routing (below) is still deferred.
|
||||
|
||||
Route automatically based on task characteristics rather than requiring manual backend selection:
|
||||
Route automatically based on task characteristics rather than requiring manual selection:
|
||||
|
||||
| Task type | Backend | Reason |
|
||||
|---|---|---|
|
||||
| User-facing conversation | Claude | Quality prose, persona fidelity |
|
||||
| Tool use / orchestration | Gemini API | Native function calling, free tier |
|
||||
| Tool use / orchestration | Gemini API or local | Native function calling |
|
||||
| Private / sensitive / offline | Local (Ollama) | No data leaves the network |
|
||||
| Long context (>50k tokens) | Gemini 2.0 | 1M token context window |
|
||||
| Fast/cheap simple queries | Local (E4B) | 25 t/s, no API cost |
|
||||
@@ -140,7 +172,7 @@ Routing logic would live in `llm_client.py` or a new `router.py` — map task me
|
||||
|
||||
---
|
||||
|
||||
## 6. RAG via Open WebUI
|
||||
## 7. RAG via Open WebUI
|
||||
|
||||
**Status:** Future — Open WebUI already supports it.
|
||||
|
||||
@@ -152,9 +184,9 @@ API reference: [`docs/OPEN_WEBUI_API.md`](../docs/OPEN_WEBUI_API.md) — RAG sec
|
||||
|
||||
---
|
||||
|
||||
## 8. Agent Architecture Ideas (from Claude Code leak)
|
||||
## 8. Agent Architecture Patterns — Research
|
||||
|
||||
**Status:** Research — review before building dev agent pipeline and orchestrator.
|
||||
**Status:** Research — review before building dev agent pipeline and local orchestrator.
|
||||
|
||||
The Claude Code system prompt was leaked in early April 2026. Two reimplementation repos are worth reading for design ideas before building out the dev agent pipeline and local orchestrator:
|
||||
|
||||
@@ -175,25 +207,26 @@ The Claude Code system prompt was leaked in early April 2026. Two reimplementati
|
||||
|
||||
**File history journaling** — beyond session logs, a journal of what files changed and why, with replay summaries. Different from memory distillation — more like a git log for agent actions. Could complement the supervisor agent's diff review.
|
||||
|
||||
**Plugin/manifest-based tool extensions** — tools declared via manifest rather than hardcoded in `__init__.py`. Would make adding new orchestrator tools less invasive. Worth considering before the tool suite grows much larger.
|
||||
**Plugin/manifest-based tool extensions** — tools declared via manifest rather than hardcoded in `__init__.py`. Would make adding new orchestrator tools less invasive. Worth considering before the tool suite grows much larger (currently 27 tools).
|
||||
|
||||
---
|
||||
|
||||
## 7. Permanent Fleet Hosting
|
||||
## 9. Permanent Fleet Hosting
|
||||
|
||||
**Status:** Deferred.
|
||||
**Status:** Deferred. Currently running on `scott-lt-i7-rtx` (gaming/agents laptop).
|
||||
|
||||
Currently running on `scott-lt-i7-rtx` (gaming/agents laptop). Disabled on `scott_lpt` (2026-04-28) — that machine is a dev/editing node only. Long-term target: home server (always-on, Docker).
|
||||
Long-term target: home server (always-on, Docker). `docker-compose.yml` already exists in the project root.
|
||||
|
||||
`docker-compose.yml` already exists in the project root. Deployment path:
|
||||
Deployment path:
|
||||
1. Copy to home server
|
||||
2. Configure reverse proxy (Nginx, already Docker-hosted)
|
||||
3. Set subdomain `cortex.dgrzone.com` → home server internal IP
|
||||
3. Update `cortex.dgrzone.com` → home server internal IP in pfSense
|
||||
4. WireGuard required for all access — not internet-exposed
|
||||
5. Update `FLEET_MANIFEST.md` and CLAUDE.md fleet table
|
||||
|
||||
---
|
||||
|
||||
## 9. Cortex Mesh (Multi-Instance Fleet)
|
||||
## 10. Cortex Mesh — Multi-Instance Fleet
|
||||
|
||||
**Status:** Concept — no design yet.
|
||||
|
||||
|
||||
@@ -7,34 +7,43 @@
|
||||
|
||||
## 🔴 High Priority
|
||||
|
||||
### [Local] Tool-capable local orchestrator
|
||||
Design and implement `local_orchestrator_engine.py` — a ReAct tool loop driven by
|
||||
a local model via Open WebUI's OpenAI-compatible API, as an alternative to the
|
||||
Gemini API orchestrator for private/offline tasks.
|
||||
### [Local] Local orchestrator — reach full parity with Gemini orchestrator
|
||||
`openai_orchestrator.py` is partially built and wired into `POST /orchestrate`.
|
||||
When the `orchestrator` role resolves to a `local_openai` model it routes there
|
||||
automatically. Remaining work is quality/reliability parity, not ground-up design.
|
||||
|
||||
- [ ] Convert existing Cortex tool definitions (`cortex/tools/`) from Gemini
|
||||
`FunctionDeclaration` format to OpenAI `tools` format (minor schema diff)
|
||||
- [ ] Implement tool loop: send tools → parse `tool_calls` response → execute →
|
||||
append result → loop until `finish_reason: stop`
|
||||
- [ ] Wire into `routers/orchestrator.py` — new `mode` param: `"local"` vs `"gemini"`
|
||||
- [ ] UI: Agent mode button routes to local orchestrator when local backend active
|
||||
- [ ] Recommended models (scott_gaming, 8 GB VRAM):
|
||||
Gemma 4 E4B — 25 t/s, 72k practical ctx — interactive/fast tasks
|
||||
Gemma 4 26B A4B — 9 t/s, 50k practical ctx — heavier reasoning, background tasks
|
||||
- Reference: `docs/OPEN_WEBUI_API.md` for full tool call request/response format
|
||||
- [ ] Audit tool schema conversion — Gemini `FunctionDeclaration` → OpenAI `tools` format
|
||||
(minor field rename, already partially done)
|
||||
- [ ] Context budget enforcement per iteration (40–50k for E4B, 35–40k for 26B A4B)
|
||||
- [ ] Context compaction — trim stale tool results mid-run when approaching limit
|
||||
- [ ] Error handling parity with Gemini orchestrator (retry logic, malformed tool calls)
|
||||
- [ ] Test end-to-end with Gemma 4 E4B and 26B A4B on scott_gaming
|
||||
- [ ] Review `ARCH__FUTURE.md` agent architecture ideas before finalising design
|
||||
- Reference: `docs/OPEN_WEBUI_API.md`, `documentation/ARCH__FUTURE.md` §1
|
||||
|
||||
---
|
||||
|
||||
## 🟡 Medium Priority
|
||||
|
||||
### [UI] Progressive Web App (PWA)
|
||||
Low effort, meaningful mobile UX improvement — install Cortex as a home screen app.
|
||||
- [ ] Add `manifest.json` (name, icons, theme color, display: standalone, start_url)
|
||||
- [ ] Serve `manifest.json` from `cortex/routers/ui.py` or as a static file
|
||||
- [ ] Add `<link rel="manifest">` to `index.html`
|
||||
- [ ] Basic service worker for offline shell (cache static assets; network-first for API)
|
||||
- [ ] Register service worker in `app.js`
|
||||
- [ ] Test on iOS (Safari) and Android (Chrome) — both support PWA install prompts
|
||||
### [UI] Progressive Web App (PWA) ✅ — 2026-04-29
|
||||
- manifest.json, sw.js, icon-192/512.png, SW registration in app.js
|
||||
- `/manifest.json` and `/sw.js` served at root; added to `_PUBLIC` in auth_middleware
|
||||
- Tested: install prompt confirmed working in Chromium
|
||||
|
||||
### [Tools] Orchestrator tool expansions
|
||||
New tools for `cortex/tools/` — higher-value additions that fill obvious gaps.
|
||||
- [ ] **`cortex_restart`** — `systemctl --user restart cortex`; lets Inara apply her own
|
||||
config changes without human intervention. Return last 10 log lines after restart.
|
||||
- [ ] **`cortex_logs`** — `journalctl --user -u cortex -n N` — tail service logs for debugging
|
||||
- [ ] **`http_fetch`** — fetch a URL and return content; for health checks, API probing,
|
||||
webhook testing. Different from `web_search` — direct URL, returns raw response.
|
||||
- [ ] **`file_list`** — list files and directories at a path; currently only `file_read` exists
|
||||
- [ ] **`file_write`** — write content to a file (with path allow-list for safety)
|
||||
- [ ] **`nc_talk_send`** — proactively send a message to the user via Nextcloud Talk
|
||||
(outbound; complements the proactive notifications channel work)
|
||||
- [ ] **`email_send`** — send email via existing `email_utils.py` SMTP helper
|
||||
- [ ] **`web_push`** — send a browser push notification (requires push subscription stored
|
||||
per-user; pairs well with the PWA service worker already in place)
|
||||
|
||||
### [Channel] Proactive notifications
|
||||
Inara reaches out on her own initiative via NC Talk or Google Chat when a reminder
|
||||
@@ -115,6 +124,22 @@ See `ARCH__Intelligence_Layer.md` for full design.
|
||||
|
||||
## 🟢 Lower Priority / Future
|
||||
|
||||
### [Research] Agent architecture patterns — review before building dev agent pipeline
|
||||
The Claude Code system prompt was leaked April 2026. Two reimplementation repos have
|
||||
useful design ideas directly applicable to the local orchestrator and dev agent work.
|
||||
Read before finalising either design.
|
||||
- [ ] Review https://github.com/HarnessLab/claw-code-agent (Python, targets local models)
|
||||
- [ ] Review https://github.com/ultraworkers/claw-code (community port, interesting source)
|
||||
- Key ideas to evaluate for Cortex:
|
||||
- Tiered permission model (read-only / write / shell / unsafe) — relevant once dev
|
||||
agent is writing and executing code
|
||||
- Agent lineage tracking — which agent spawned which sub-agent; essential for the
|
||||
orchestrator → specialist → supervisor chain
|
||||
- Hard token/cost budgets per operation — local models have fixed context ceilings
|
||||
- Context compaction mid-session — trim stale tool results before hitting limit
|
||||
- Nested agent delegation with dependency-aware batching
|
||||
- Plugin/manifest-based tool registration — worth considering before tool suite grows
|
||||
|
||||
### [Sessions] Cross-session search
|
||||
The file browser has per-file session search, but no way to query across all sessions
|
||||
for a persona. A unified search would make the session archive useful as a knowledge source.
|
||||
@@ -155,18 +180,52 @@ base accessible to local models. Endpoints documented in `docs/OPEN_WEBUI_API.md
|
||||
- `/api/v1/files/` upload + `/api/v1/retrieval/process/web` for URLs
|
||||
- Reference in chat via `"files": [{"type": "collection", "id": "..."}]`
|
||||
|
||||
### [Backend] Intelligent model routing
|
||||
- Currently hardcoded: Claude default, Gemini fallback, local third
|
||||
- Design direction (now informed by real local model perf):
|
||||
- **Private/offline tasks** → local (Gemma 4 E4B for speed, 26B A4B for reasoning)
|
||||
- **Complex tool tasks / long context** → Gemini (1M token context, strong function calling)
|
||||
- **Final user-facing responses** → Claude (quality prose, persona fidelity)
|
||||
- Future: auto-route by task type rather than requiring user to toggle backend manually
|
||||
### [Backend] Intelligent model routing — automatic task-type dispatch
|
||||
Model Registry V2 (2026-04-27) added role-based routing and manual role toggle — that's
|
||||
the foundation. What remains is removing the need to toggle manually.
|
||||
- [ ] Classify incoming messages by task type (heuristic or lightweight classifier)
|
||||
- [ ] Map task type → role → model automatically:
|
||||
- User conversation → `chat` role → Claude (quality prose, persona fidelity)
|
||||
- Tool/research tasks → `orchestrator` role → Gemini API or local
|
||||
- Private/sensitive → `local` role → Ollama (no data leaves network)
|
||||
- Long context (>50k tokens) → Gemini 2.0 (1M ctx window)
|
||||
- Fast/cheap queries → local E4B (25 t/s, no API cost)
|
||||
- [ ] Routing logic in `llm_client.py` or new `router.py`; expose override in UI
|
||||
|
||||
### [Ops] Permanent fleet hosting — home server deployment
|
||||
Currently running on `scott-lt-i7-rtx` (gaming laptop). Long-term target is the
|
||||
home server for always-on reliability. `docker-compose.yml` already exists.
|
||||
- [ ] Copy project to home server
|
||||
- [ ] Configure Nginx reverse proxy (already Docker-hosted on that machine)
|
||||
- [ ] Point `cortex.dgrzone.com` → home server internal IP (pfSense alias update)
|
||||
- [ ] WireGuard required for all access — not internet-exposed
|
||||
- [ ] Update `FLEET_MANIFEST.md` to reflect new hosting location
|
||||
|
||||
### [Future] Cortex Mesh — multi-instance fleet coordination
|
||||
Each fleet device runs its own Cortex instance. Instances delegate tasks to each
|
||||
other based on resources and specialisation. No central coordinator required.
|
||||
- Concept only — no design yet. Resolve these questions before building:
|
||||
- Auth between instances (shared JWT secret vs. per-instance API keys)
|
||||
- Capability advertisement (model registry over HTTP? shared Syncthing file?)
|
||||
- Whether `ae_send_message` / the inbox system is the right coordination layer
|
||||
- Session continuity — does a conversation stay on one node or migrate?
|
||||
- Natural foundation already in place: Syncthing-synced `home/` and shared
|
||||
`model_registry.json` mean instances share persona memory without a central DB
|
||||
|
||||
---
|
||||
|
||||
## ✅ Completed
|
||||
|
||||
### [UI] Progressive Web App (PWA) — 2026-04-29
|
||||
- `manifest.json`, `sw.js`, PNG icons (192/512) generated via rsvg-convert
|
||||
- `/manifest.json` and `/sw.js` served at root via ui.py; exempted in auth_middleware
|
||||
- Theme-color meta tag updated dynamically on light/dark toggle
|
||||
- Install prompt confirmed working in Chromium desktop; apple-touch-icon for iOS
|
||||
|
||||
### [UI] CodeMirror markdown editor for identity/memory files — 2026-04-28
|
||||
- Replaced textarea in Files panel with CodeMirror 5 (markdown mode, CDN)
|
||||
- Syntax highlighting, line wrapping, Ctrl+S to save, per-file undo history
|
||||
|
||||
### [UI] Input area polish — 2026-04-28
|
||||
- Single cycling S/M/L button replaces 3 separate height buttons (same UX as font size)
|
||||
- S size collapses mode-select to a row (compact); M/L keep vertical column layout
|
||||
|
||||
Reference in New Issue
Block a user