docs: sync TODO and ARCH__FUTURE — local orchestrator status, new tools, fleet/mesh plans

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 19:05:11 -04:00
parent 25182a1765
commit 1603ad5124
2 changed files with 147 additions and 55 deletions
--- a/documentation/ARCH__FUTURE.md
+++ b/documentation/ARCH__FUTURE.md
@@ -1,7 +1,7 @@
 # Architecture: Planned Features

 > What's next and how it's designed to work.
-> Last updated: 2026-04-28
+> Last updated: 2026-04-29

 For the current task list see `TODO__Agents.md`. For phases and priorities see `ROADMAP.md`.

@@ -9,17 +9,17 @@ For the current task list see `TODO__Agents.md`. For phases and priorities see `

 ## 1. Local Orchestrator

-**Status:** Partially built — `openai_orchestrator.py` exists and is wired into `POST /orchestrate`. If the `orchestrator` role in the model registry resolves to a `local_openai` model, it routes there automatically. Full parity with the Gemini orchestrator (tool loop quality, error handling, context budget enforcement) is still in progress.
+**Status:** Partially built — `openai_orchestrator.py` exists and is wired into `POST /orchestrate`. When the `orchestrator` role in the model registry resolves to a `local_openai` model, it routes there automatically. Remaining work is quality/reliability parity with the Gemini orchestrator, not ground-up design.

-Same ReAct tool loop as the Gemini API orchestrator, but driven by a local model via Open WebUI's OpenAI-compatible API. Enables offline/private agent tasks with no API cost.
+Same ReAct tool loop as the Gemini API orchestrator, driven by a local model via Open WebUI's OpenAI-compatible API. Enables offline/private agent tasks with no API cost.

 **Why local models work for this now:** Gemma 4 E4B and 26B A4B both support OpenAI `tools` / `tool_choice` function calling. The tool schema is nearly identical to Gemini's `FunctionDeclaration` — minor field renaming only.
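A minimal sketch of that schema conversion, assuming each Cortex tool declaration can be exported as a plain dict with `name`, `description`, and a `parameters` JSON schema, and that Gemini-style schemas may spell types in uppercase (`"OBJECT"`, `"STRING"`). `to_openai_tool` and `_lower_types` are hypothetical helper names, not existing Cortex functions.

```python
from typing import Any


def _lower_types(schema: Any) -> Any:
    """Recursively lowercase string `type` values, e.g. "OBJECT" -> "object"."""
    if isinstance(schema, dict):
        return {
            key: value.lower() if key == "type" and isinstance(value, str) else _lower_types(value)
            for key, value in schema.items()
        }
    if isinstance(schema, list):
        return [_lower_types(item) for item in schema]
    return schema


def to_openai_tool(decl: dict) -> dict:
    """Wrap a FunctionDeclaration-style dict in the OpenAI `tools` envelope."""
    params = decl.get("parameters") or {"type": "object", "properties": {}}
    return {
        "type": "function",
        "function": {
            "name": decl["name"],
            "description": decl.get("description", ""),
            "parameters": _lower_types(params),
        },
    }
```

Only the envelope and the type spelling change here; property names, `required`, and descriptions pass through untouched.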

 **Design:**
 ```
-POST /orchestrate  (mode: "local")
+POST /orchestrate  (role resolves to local_openai model)
    ↓
-local_orchestrator_engine.py
+openai_orchestrator.py
    • converts tools/ to OpenAI tools format
    • POST /api/chat/completions with tools array
    • parse tool_calls response
@@ -34,16 +34,45 @@ Model selection:
 - **Gemma 4 26B A4B** (9 t/s, 50k ctx) — heavier reasoning, background tasks

 Context budget per iteration (system prompt + memory + tool results + history):
-- Small model: budget ~40-50k tokens per round
-- Medium model: budget ~35-40k tokens per round
+- Small model: budget ~40–50k tokens per round
+- Medium model: budget ~35–40k tokens per round
+
+Context compaction (to implement): automatically trim stale tool results mid-run when
+approaching the budget ceiling, preserving only the most recent N tool exchanges.
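One way the compaction step could work, sketched under two assumptions: messages follow the OpenAI chat format (tool results are `role: "tool"` entries), and tokens are estimated at roughly four characters each instead of being counted with the model's tokenizer. `compact_history` and `estimate_tokens` are hypothetical names.

```python
def estimate_tokens(message: dict) -> int:
    # Rough heuristic (assumption): ~4 characters per token, minimum 1.
    return len(str(message.get("content") or "")) // 4 + 1


def compact_history(messages: list[dict], budget: int, keep_last: int = 3) -> list[dict]:
    """Stub out the oldest tool results until the estimated size fits `budget`.

    The newest `keep_last` tool messages are never trimmed. Returns the input
    list unchanged if it already fits; otherwise returns a trimmed copy, which
    may still exceed `budget` if every stale result has already been stubbed.
    """
    total = sum(estimate_tokens(m) for m in messages)
    if total <= budget:
        return messages
    tool_idx = [i for i, m in enumerate(messages) if m.get("role") == "tool"]
    stale = tool_idx[:-keep_last] if keep_last > 0 else tool_idx
    out = [dict(m) for m in messages]  # shallow copies keep tool_call_id intact
    for i in stale:  # oldest first
        if total <= budget:
            break
        before = estimate_tokens(out[i])
        out[i]["content"] = "[tool result trimmed to save context]"
        total -= before - estimate_tokens(out[i])
    return out
```

Stale results are replaced with a short stub rather than deleted, so every `tool_call_id` still has a matching tool message, which OpenAI-compatible endpoints generally expect.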

 Full API reference: [`docs/OPEN_WEBUI_API.md`](../docs/OPEN_WEBUI_API.md)

 ---

-## 2. Dev Agent Pipeline
+## 2. Orchestrator Tool Expansions

-**Status:** Design complete, not yet built.
+**Status:** Planned. Current tool count: 27. These fill obvious gaps.
+
+New tools for `cortex/tools/` — each follows the existing async pattern (implement function,
+add `FunctionDeclaration`, register in `__init__.py`).
+
+| Tool | Module | Description |
+|---|---|---|
+| `cortex_restart` | `system.py` | `systemctl --user restart cortex` — Inara can apply her own config changes; returns last 10 log lines after restart |
+| `cortex_logs` | `system.py` | `journalctl --user -u cortex -n N` — tail service logs for debugging |
+| `http_fetch` | `web.py` | Fetch a specific URL and return content; for health checks, API probing, webhook testing — not a search, a direct GET/POST |
+| `file_list` | `scratch.py` or new `files.py` | List files and directories at a path; currently only `file_read` exists |
+| `file_write` | `files.py` | Write content to a file with a path allow-list (persona dir + scratch by default) |
+| `nc_talk_send` | new `notify.py` | Proactively send a message to the user via Nextcloud Talk outbound API |
+| `email_send` | `notify.py` | Send email via existing `email_utils.py` SMTP helper |
+| `web_push` | `notify.py` | Browser push notification via Web Push API (requires push subscription stored per-user in `home/{user}/push_sub.json`; pairs with the PWA service worker) |
+
+**Safety note for `cortex_restart`:** The service will drop in-flight SSE connections on restart.
+Only call if no streaming response is active. Add a check or a short delay before restarting.
+
+**Safety note for `file_write`:** Enforce an allow-list at the tool level, not just in the prompt.
+Default allow: `home/{user}/persona/{name}/` and `/tmp/cortex-scratch/`. Reject any path outside.
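A minimal sketch of tool-level enforcement, assuming `file_write` receives a path string and knows the current user and persona. The helper names and the `/srv/cortex/home` location used in the example are hypothetical. Resolving the path before the check means `..` segments and symlinks cannot be used to escape the allowed roots.

```python
from __future__ import annotations

from pathlib import Path


def allowed_roots(home: Path, user: str, persona: str) -> list[Path]:
    # Default allow-list from the safety note; `home` is the Cortex home/ dir (assumed).
    return [home / user / "persona" / persona, Path("/tmp/cortex-scratch")]


def is_write_allowed(target: str | Path, roots: list[Path]) -> bool:
    """True only if `target`, fully resolved, lies inside one of `roots`."""
    resolved = Path(target).resolve()  # collapses `..` and follows symlinks
    return any(resolved.is_relative_to(root.resolve()) for root in roots)
```

For example, `/tmp/cortex-scratch/../etc/passwd` resolves outside the scratch directory, so the check rejects it even though the string starts with an allowed prefix.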
+
+---
+
+## 3. Dev Agent Pipeline
+
+**Status:** Design complete, not yet built. Review §8 (Agent Architecture Patterns) before starting.

 Accept a plain-English task, implement code changes, verify them, and present for human approval before committing.

@@ -64,7 +93,7 @@ Supervisor Agent
 Human approval gate
    • summary in Cortex UI or NC Talk
    • approve → commit (+ optional push)
-    • reject <EFBFBD><EFBFBD> feedback back to specialist
+    • reject → feedback back to specialist
 ```

 **Specialists** (both Claude CLI):
@@ -84,7 +113,7 @@ Human approval gate

 ---

-## 3. Gitea Integration
+## 4. Gitea Integration

 **Status:** Not started. pfSense port forward for SSH already confirmed working.

@@ -97,7 +126,7 @@ SSH clone/push: `git clone ssh://git@git.dgrzone.com:2222/<user>/<repo>.git`

 ---

-## 4. Knowledge Layer (AE Journals)
+## 5. Knowledge Layer (AE Journals)

 **Status:** Tools exist, import script not yet built.

@@ -122,16 +151,19 @@ AE Journals becomes the searchable long-term knowledge base. Complements memory

 ---

-## 5. Intelligent Model Routing
+## 6. Intelligent Model Routing

-**Status:** Partially addressed. Model Registry V2 (2026-04-27) introduced role-based routing — `chat`, `orchestrator`, `distill`, `coder`, `research` roles each have their own primary/backup model chain, and the UI role toggle lets users manually select which role handles a message. Automatic task-characteristic routing (below) is still deferred.
+**Status:** Partially addressed. Model Registry V2 (2026-04-27) introduced role-based routing —
+`chat`, `orchestrator`, `distill`, `coder`, `research` roles each have their own primary/backup
+model chain, and the UI role toggle lets users manually select which role handles a message.
+Automatic task-characteristic routing (below) is still deferred.

-Route automatically based on task characteristics rather than requiring manual backend selection:
+Route automatically based on task characteristics rather than requiring manual selection:

 | Task type | Backend | Reason |
 |---|---|---|
 | User-facing conversation | Claude | Quality prose, persona fidelity |
-| Tool use / orchestration | Gemini API | Native function calling, free tier |
+| Tool use / orchestration | Gemini API or local | Native function calling |
 | Private / sensitive / offline | Local (Ollama) | No data leaves the network |
 | Long context (>50k tokens) | Gemini 2.0 | 1M token context window |
 | Fast/cheap simple queries | Local (E4B) | 25 t/s, no API cost |
@@ -140,7 +172,7 @@ Routing logic would live in `llm_client.py` or a new `router.py` — map task me

 ---

-## 6. RAG via Open WebUI
+## 7. RAG via Open WebUI

 **Status:** Future — Open WebUI already supports it.

@@ -152,9 +184,9 @@ API reference: [`docs/OPEN_WEBUI_API.md`](../docs/OPEN_WEBUI_API.md) — RAG sec

 ---

-## 8. Agent Architecture Ideas (from Claude Code leak)
+## 8. Agent Architecture Patterns — Research

-**Status:** Research — review before building dev agent pipeline and orchestrator.
+**Status:** Research — review before building dev agent pipeline and local orchestrator.

 The Claude Code system prompt was leaked in early April 2026. Two reimplementation repos are worth reading for design ideas before building out the dev agent pipeline and local orchestrator:

@@ -175,25 +207,26 @@ The Claude Code system prompt was leaked in early April 2026. Two reimplementati

 **File history journaling** — beyond session logs, a journal of what files changed and why, with replay summaries. Different from memory distillation — more like a git log for agent actions. Could complement the supervisor agent's diff review.

-**Plugin/manifest-based tool extensions** — tools declared via manifest rather than hardcoded in `__init__.py`. Would make adding new orchestrator tools less invasive. Worth considering before the tool suite grows much larger.
+**Plugin/manifest-based tool extensions** — tools declared via manifest rather than hardcoded in `__init__.py`. Would make adding new orchestrator tools less invasive. Worth considering before the tool suite grows much larger (currently 27 tools).
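A sketch of manifest-based registration, assuming a JSON manifest whose entries name a tool, point at its callable as `"module:function"`, and carry the declaration the orchestrator advertises. The manifest layout and `load_tools` are hypothetical; the example entries point at standard-library functions only so the sketch runs anywhere.

```python
from __future__ import annotations

import importlib
import json
from typing import Any, Callable


def load_tools(manifest_json: str) -> dict[str, tuple[Callable[..., Any], dict]]:
    """Build a name -> (callable, declaration) registry from a JSON manifest.

    Each entry looks like:
      {"name": "...", "callable": "package.module:function", "declaration": {...}}
    """
    registry: dict[str, tuple[Callable[..., Any], dict]] = {}
    for entry in json.loads(manifest_json)["tools"]:
        module_name, _, func_name = entry["callable"].partition(":")
        func = getattr(importlib.import_module(module_name), func_name)
        if entry["name"] in registry:
            raise ValueError(f"duplicate tool name: {entry['name']}")
        registry[entry["name"]] = (func, entry["declaration"])
    return registry
```

Adding a tool then means adding one manifest entry instead of editing `__init__.py`, and duplicate names fail loudly at load time rather than silently shadowing each other.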

 ---

-## 7. Permanent Fleet Hosting
+## 9. Permanent Fleet Hosting

-**Status:** Deferred.
+**Status:** Deferred. Currently running on `scott-lt-i7-rtx` (gaming/agents laptop).

-Currently running on `scott-lt-i7-rtx` (gaming/agents laptop). Disabled on `scott_lpt` (2026-04-28) — that machine is a dev/editing node only. Long-term target: home server (always-on, Docker).
+Long-term target: home server (always-on, Docker). `docker-compose.yml` already exists in the project root.

-`docker-compose.yml` already exists in the project root. Deployment path:
+Deployment path:
 1. Copy to home server
 2. Configure reverse proxy (Nginx, already Docker-hosted)
-3. Set subdomain `cortex.dgrzone.com` → home server internal IP
+3. Update `cortex.dgrzone.com` → home server internal IP in pfSense
 4. WireGuard required for all access — not internet-exposed
+5. Update `FLEET_MANIFEST.md` and the `CLAUDE.md` fleet table

 ---

-## 9. Cortex Mesh (Multi-Instance Fleet)
+## 10. Cortex Mesh — Multi-Instance Fleet

 **Status:** Concept — no design yet.

--- a/documentation/TODO__Agents.md
+++ b/documentation/TODO__Agents.md
@@ -7,34 +7,43 @@

 ## 🔴 High Priority

-### [Local] Tool-capable local orchestrator
-Design and implement `local_orchestrator_engine.py` — a ReAct tool loop driven by
-a local model via Open WebUI's OpenAI-compatible API, as an alternative to the
-Gemini API orchestrator for private/offline tasks.
+### [Local] Local orchestrator — reach full parity with Gemini orchestrator
+`openai_orchestrator.py` is partially built and wired into `POST /orchestrate`.
+When the `orchestrator` role resolves to a `local_openai` model it routes there
+automatically. Remaining work is quality/reliability parity, not ground-up design.

-- [ ] Convert existing Cortex tool definitions (`cortex/tools/`) from Gemini
-      `FunctionDeclaration` format to OpenAI `tools` format (minor schema diff)
-- [ ] Implement tool loop: send tools → parse `tool_calls` response → execute →
-      append result → loop until `finish_reason: stop`
-- [ ] Wire into `routers/orchestrator.py` — new `mode` param: `"local"` vs `"gemini"`
-- [ ] UI: Agent mode button routes to local orchestrator when local backend active
-- [ ] Recommended models (scott_gaming, 8 GB VRAM):
-      Gemma 4 E4B — 25 t/s, 72k practical ctx — interactive/fast tasks
-      Gemma 4 26B A4B — 9 t/s, 50k practical ctx — heavier reasoning, background tasks
-- Reference: `docs/OPEN_WEBUI_API.md` for full tool call request/response format
+- [ ] Audit tool schema conversion — Gemini `FunctionDeclaration` → OpenAI `tools` format
+      (minor field rename, already partially done)
+- [ ] Context budget enforcement per iteration (40–50k for E4B, 35–40k for 26B A4B)
+- [ ] Context compaction — trim stale tool results mid-run when approaching limit
+- [ ] Error handling parity with Gemini orchestrator (retry logic, malformed tool calls)
+- [ ] Test end-to-end with Gemma 4 E4B and 26B A4B on scott_gaming
+- [ ] Review `ARCH__FUTURE.md` agent architecture ideas before finalising design
+- Reference: `docs/OPEN_WEBUI_API.md`, `documentation/ARCH__FUTURE.md` §1
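The tool loop these items refer to can be sketched as follows. Assumptions: `chat` is an injected coroutine that performs one `POST /api/chat/completions` request and returns the first choice (`message` plus `finish_reason`), and tool handlers are async functions keyed by tool name, in line with the async pattern of `cortex/tools/`. `run_tool_loop` and `max_rounds` are hypothetical.

```python
from __future__ import annotations

import json
from typing import Any, Awaitable, Callable

ChatFn = Callable[[list, list], Awaitable[dict]]
Handler = Callable[..., Awaitable[Any]]


async def run_tool_loop(
    chat: ChatFn,
    tools: list[dict],
    handlers: dict[str, Handler],
    messages: list[dict],
    max_rounds: int = 8,
) -> str:
    """ReAct loop: call the model, run any requested tools, feed results back."""
    for _ in range(max_rounds):
        choice = await chat(messages, tools)  # one chat-completions request
        msg = choice["message"]
        messages.append(msg)
        if not msg.get("tool_calls"):  # plain answer: the loop is done
            return msg.get("content") or ""
        for call in msg["tool_calls"]:
            fn = call["function"]
            try:
                args = json.loads(fn.get("arguments") or "{}")
                result = await handlers[fn["name"]](**args)
            except (KeyError, TypeError, json.JSONDecodeError) as exc:
                # Report malformed calls back to the model instead of crashing.
                result = f"error: {exc!r}"
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": str(result)}
            )
    return "stopped: max_rounds reached"
```

This sketch ends the loop when the model returns no `tool_calls` instead of testing `finish_reason` alone, since some OpenAI-compatible backends may not report `tool_calls` as the finish reason consistently. The `max_rounds` cap is one simple guard against a model that never stops calling tools.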

 ---

 ## 🟡 Medium Priority

-### [UI] Progressive Web App (PWA)
-Low effort, meaningful mobile UX improvement — install Cortex as a home screen app.
-- [ ] Add `manifest.json` (name, icons, theme color, display: standalone, start_url)
-- [ ] Serve `manifest.json` from `cortex/routers/ui.py` or as a static file
-- [ ] Add `<link rel="manifest">` to `index.html`
-- [ ] Basic service worker for offline shell (cache static assets; network-first for API)
-- [ ] Register service worker in `app.js`
-- [ ] Test on iOS (Safari) and Android (Chrome) — both support PWA install prompts
+### [UI] Progressive Web App (PWA) ✅ — 2026-04-29
+- `manifest.json`, `sw.js`, 192/512 px PNG icons, SW registration in `app.js`
+- `/manifest.json` and `/sw.js` served at root; added to `_PUBLIC` in `auth_middleware`
+- Tested: install prompt confirmed working in Chromium
+
+### [Tools] Orchestrator tool expansions
+New tools for `cortex/tools/` — higher-value additions that fill obvious gaps.
+- [ ] **`cortex_restart`** — `systemctl --user restart cortex`; lets Inara apply her own
+      config changes without human intervention. Return last 10 log lines after restart.
+- [ ] **`cortex_logs`** — `journalctl --user -u cortex -n N` — tail service logs for debugging
+- [ ] **`http_fetch`** — fetch a URL and return content; for health checks, API probing,
+      webhook testing. Different from `web_search` — direct URL, returns raw response.
+- [ ] **`file_list`** — list files and directories at a path; currently only `file_read` exists
+- [ ] **`file_write`** — write content to a file (with path allow-list for safety)
+- [ ] **`nc_talk_send`** — proactively send a message to the user via Nextcloud Talk
+      (outbound; complements the proactive notifications channel work)
+- [ ] **`email_send`** — send email via existing `email_utils.py` SMTP helper
+- [ ] **`web_push`** — send a browser push notification (requires push subscription stored
+      per-user; pairs well with the PWA service worker already in place)

 ### [Channel] Proactive notifications
 Inara reaches out on her own initiative via NC Talk or Google Chat when a reminder
@@ -115,6 +124,22 @@ See `ARCH__Intelligence_Layer.md` for full design.

 ## 🟢 Lower Priority / Future

+### [Research] Agent architecture patterns — review before building dev agent pipeline
+The Claude Code system prompt was leaked in April 2026. Two reimplementation repos have
+useful design ideas directly applicable to the local orchestrator and dev agent work.
+Read before finalising either design.
+- [ ] Review https://github.com/HarnessLab/claw-code-agent (Python, targets local models)
+- [ ] Review https://github.com/ultraworkers/claw-code (community port, interesting source)
+- Key ideas to evaluate for Cortex:
+  - Tiered permission model (read-only / write / shell / unsafe) — relevant once dev
+    agent is writing and executing code
+  - Agent lineage tracking — which agent spawned which sub-agent; essential for the
+    orchestrator → specialist → supervisor chain
+  - Hard token/cost budgets per operation — local models have fixed context ceilings
+  - Context compaction mid-session — trim stale tool results before hitting limit
+  - Nested agent delegation with dependency-aware batching
+  - Plugin/manifest-based tool registration — worth considering before tool suite grows
+
 ### [Sessions] Cross-session search
 The file browser has per-file session search, but no way to query across all sessions
 for a persona. A unified search would make the session archive useful as a knowledge source.
@@ -155,18 +180,52 @@ base accessible to local models. Endpoints documented in `docs/OPEN_WEBUI_API.md
 - `/api/v1/files/` upload + `/api/v1/retrieval/process/web` for URLs
 - Reference in chat via `"files": [{"type": "collection", "id": "..."}]`

-### [Backend] Intelligent model routing
-- Currently hardcoded: Claude default, Gemini fallback, local third
-- Design direction (now informed by real local model perf):
-  - **Private/offline tasks** → local (Gemma 4 E4B for speed, 26B A4B for reasoning)
-  - **Complex tool tasks / long context** → Gemini (1M token context, strong function calling)
-  - **Final user-facing responses** → Claude (quality prose, persona fidelity)
-- Future: auto-route by task type rather than requiring user to toggle backend manually
+### [Backend] Intelligent model routing — automatic task-type dispatch
+Model Registry V2 (2026-04-27) added role-based routing and manual role toggle — that's
+the foundation. What remains is removing the need to toggle manually.
+- [ ] Classify incoming messages by task type (heuristic or lightweight classifier)
+- [ ] Map task type → role → model automatically:
+  - User conversation → `chat` role → Claude (quality prose, persona fidelity)
+  - Tool/research tasks → `orchestrator` role → Gemini API or local
+  - Private/sensitive → `local` role → Ollama (no data leaves network)
+  - Long context (>50k tokens) → Gemini 2.0 (1M ctx window)
+  - Fast/cheap queries → local E4B (25 t/s, no API cost)
+- [ ] Routing logic in `llm_client.py` or new `router.py`; expose override in UI
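A first-pass heuristic classifier could look like the sketch below. The keyword lists, the `long_context` role name, and `classify` itself are hypothetical; the `local`, `orchestrator`, and `chat` roles follow the mapping above, and the 50k threshold comes from the long-context row. Checks run in priority order, so a manual UI override wins first and privacy wins over everything else.

```python
from __future__ import annotations

import re

# Hypothetical keyword heuristics; a real deployment would tune these on actual traffic.
PRIVATE_HINTS = re.compile(r"\b(private|confidential|offline|password|medical)\b", re.IGNORECASE)
TOOL_HINTS = re.compile(r"\b(search|look up|fetch|research|check)\b", re.IGNORECASE)


def classify(message: str, context_tokens: int, override: str | None = None) -> str:
    """Return the role that should handle `message`, in priority order."""
    if override:                      # manual UI toggle always wins
        return override
    if PRIVATE_HINTS.search(message):
        return "local"                # no data leaves the network
    if context_tokens > 50_000:
        return "long_context"         # hypothetical role for the 1M-token Gemini model
    if TOOL_HINTS.search(message):
        return "orchestrator"
    return "chat"
```

Keyword rules are cheap and predictable, which makes them a reasonable baseline to compare a lightweight classifier against later.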
+
+### [Ops] Permanent fleet hosting — home server deployment
+Currently running on `scott-lt-i7-rtx` (gaming laptop). Long-term target is the
+home server for always-on reliability. `docker-compose.yml` already exists.
+- [ ] Copy project to home server
+- [ ] Configure Nginx reverse proxy (already Docker-hosted on that machine)
+- [ ] Point `cortex.dgrzone.com` → home server internal IP (pfSense alias update)
+- [ ] WireGuard required for all access — not internet-exposed
+- [ ] Update `FLEET_MANIFEST.md` to reflect new hosting location
+
+### [Future] Cortex Mesh — multi-instance fleet coordination
+Each fleet device runs its own Cortex instance. Instances delegate tasks to each
+other based on resources and specialisation. No central coordinator required.
+- Concept only — no design yet. Resolve these questions before building:
+  - Auth between instances (shared JWT secret vs. per-instance API keys)
+  - Capability advertisement (model registry over HTTP? shared Syncthing file?)
+  - Whether `ae_send_message` / the inbox system is the right coordination layer
+  - Session continuity — does a conversation stay on one node or migrate?
+- Natural foundation already in place: Syncthing-synced `home/` and shared
+  `model_registry.json` mean instances share persona memory without a central DB

 ---

 ## ✅ Completed

+### [UI] Progressive Web App (PWA) — 2026-04-29
+- `manifest.json`, `sw.js`, PNG icons (192/512) generated via rsvg-convert
+- `/manifest.json` and `/sw.js` served at root via ui.py; exempted in auth_middleware
+- Theme-color meta tag updated dynamically on light/dark toggle
+- Install prompt confirmed working in Chromium desktop; apple-touch-icon for iOS
+
+### [UI] CodeMirror markdown editor for identity/memory files — 2026-04-28
+- Replaced textarea in Files panel with CodeMirror 5 (markdown mode, CDN)
+- Syntax highlighting, line wrapping, Ctrl+S to save, per-file undo history
+
 ### [UI] Input area polish — 2026-04-28
 - Single cycling S/M/L button replaces 3 separate height buttons (same UX as font size)
 - S size collapses mode-select to a row (compact); M/L keep vertical column layout