docs: sync TODO and ARCH__FUTURE — local orchestrator status, new tools, fleet/mesh plans

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 19:05:11 -04:00
parent 25182a1765
commit 1603ad5124
2 changed files with 147 additions and 55 deletions
--- a/documentation/ARCH__FUTURE.md
+++ b/documentation/ARCH__FUTURE.md
@@ -1,7 +1,7 @@
 # Architecture: Planned Features

 > What's next and how it's designed to work.
-> Last updated: 2026-04-28
+> Last updated: 2026-04-29

 For the current task list see `TODO__Agents.md`. For phases and priorities see `ROADMAP.md`.

@@ -9,17 +9,17 @@ For the current task list see `TODO__Agents.md`. For phases and priorities see `

 ## 1. Local Orchestrator

-**Status:** Partially built — `openai_orchestrator.py` exists and is wired into `POST /orchestrate`. If the `orchestrator` role in the model registry resolves to a `local_openai` model, it routes there automatically. Full parity with the Gemini orchestrator (tool loop quality, error handling, context budget enforcement) is still in progress.
+**Status:** Partially built — `openai_orchestrator.py` exists and is wired into `POST /orchestrate`. When the `orchestrator` role in the model registry resolves to a `local_openai` model, it routes there automatically. Remaining work is quality/reliability parity with the Gemini orchestrator, not ground-up design.

-Same ReAct tool loop as the Gemini API orchestrator, but driven by a local model via Open WebUI's OpenAI-compatible API. Enables offline/private agent tasks with no API cost.
+Same ReAct tool loop as the Gemini API orchestrator, driven by a local model via Open WebUI's OpenAI-compatible API. Enables offline/private agent tasks with no API cost.

 **Why local models work for this now:** Gemma 4 E4B and 26B A4B both support OpenAI `tools` / `tool_choice` function calling. The tool schema is nearly identical to Gemini's `FunctionDeclaration` — minor field renaming only.

 **Design:**
 ```
-POST /orchestrate  (mode: "local")
+POST /orchestrate  (role resolves to local_openai model)
    ↓
-local_orchestrator_engine.py
+openai_orchestrator.py
    • converts tools/ to OpenAI tools format
    • POST /api/chat/completions with tools array
    • parse tool_calls response
@@ -34,16 +34,45 @@ Model selection:
 - **Gemma 4 26B A4B** (9 t/s, 50k ctx) — heavier reasoning, background tasks

 Context budget per iteration (system prompt + memory + tool results + history):
- Small model: budget ~40-50k tokens per round
- Medium model: budget ~35-40k tokens per round
+- Small model: budget ~40–50k tokens per round
+- Medium model: budget ~35–40k tokens per round
+
+Context compaction (to implement): automatically trim stale tool results mid-run when
+approaching the budget ceiling, preserving only the most recent N tool exchanges.

 Full API reference: [`docs/OPEN_WEBUI_API.md`](../docs/OPEN_WEBUI_API.md)

 ---

-## 2. Dev Agent Pipeline
+## 2. Orchestrator Tool Expansions

-**Status:** Design complete, not yet built.
+**Status:** Planned. Current tool count: 27. The tools below fill obvious gaps in the current set.
+
+New tools for `cortex/tools/` — each follows the existing async pattern (implement function,
+add `FunctionDeclaration`, register in `__init__.py`).
+
+| Tool | Module | Description |
+|---|---|---|
+| `cortex_restart` | `system.py` | `systemctl --user restart cortex` — lets Inara apply her own config changes; returns the last 10 log lines after restart |
+| `cortex_logs` | `system.py` | `journalctl --user -u cortex -n N` — tail service logs for debugging |
+| `http_fetch` | `web.py` | Direct GET/POST to a specific URL, returning the response content (not a search); for health checks, API probing, and webhook testing |
+| `file_list` | `scratch.py` or new `files.py` | List files and directories at a path; currently only `file_read` exists |
+| `file_write` | `files.py` | Write content to a file with a path allow-list (persona dir + scratch by default) |
+| `nc_talk_send` | new `notify.py` | Proactively send a message to the user via Nextcloud Talk outbound API |
+| `email_send` | `notify.py` | Send email via existing `email_utils.py` SMTP helper |
+| `web_push` | `notify.py` | Browser push notification via the Web Push API (requires a push subscription stored per-user in `home/{user}/push_sub.json`; pairs with the PWA service worker) |
+
+**Safety note for `cortex_restart`:** Restarting the service drops in-flight SSE connections.
+Only call it when no streaming response is active; add a check or a short delay before restarting.
+
+**Safety note for `file_write`:** Enforce an allow-list at the tool level, not just in the prompt.
+Default allow: `home/{user}/persona/{name}/` and `/tmp/cortex-scratch/`. Reject any path outside.
+
+---
+
+## 3. Dev Agent Pipeline
+
+**Status:** Design complete, not yet built. Review §8 (Agent Architecture Patterns) before starting.

 Accept a plain-English task, implement code changes, verify them, and present for human approval before committing.

@@ -64,7 +93,7 @@ Supervisor Agent
 Human approval gate
    • summary in Cortex UI or NC Talk
    • approve → commit (+ optional push)
-    • reject <EFBFBD><EFBFBD> feedback back to specialist
+    • reject → feedback back to specialist
 ```

 **Specialists** (both Claude CLI):
@@ -84,7 +113,7 @@ Human approval gate

 ---

-## 3. Gitea Integration
+## 4. Gitea Integration

 **Status:** Not started. pfSense port forward for SSH already confirmed working.

@@ -97,7 +126,7 @@ SSH clone/push: `git clone ssh://git@git.dgrzone.com:2222/<user>/<repo>.git`

 ---

-## 4. Knowledge Layer (AE Journals)
+## 5. Knowledge Layer (AE Journals)

 **Status:** Tools exist, import script not yet built.

@@ -122,16 +151,19 @@ AE Journals becomes the searchable long-term knowledge base. Complements memory

 ---

-## 5. Intelligent Model Routing
+## 6. Intelligent Model Routing

-**Status:** Partially addressed. Model Registry V2 (2026-04-27) introduced role-based routing — `chat`, `orchestrator`, `distill`, `coder`, `research` roles each have their own primary/backup model chain, and the UI role toggle lets users manually select which role handles a message. Automatic task-characteristic routing (below) is still deferred.
+**Status:** Partially addressed. Model Registry V2 (2026-04-27) introduced role-based routing —
+`chat`, `orchestrator`, `distill`, `coder`, `research` roles each have their own primary/backup
+model chain, and the UI role toggle lets users manually select which role handles a message.
+Automatic task-characteristic routing (below) is still deferred.

-Route automatically based on task characteristics rather than requiring manual backend selection:
+Route automatically based on task characteristics rather than requiring manual selection:

 | Task type | Backend | Reason |
 |---|---|---|
 | User-facing conversation | Claude | Quality prose, persona fidelity |
-| Tool use / orchestration | Gemini API | Native function calling, free tier |
+| Tool use / orchestration | Gemini API or local | Native function calling |
 | Private / sensitive / offline | Local (Ollama) | No data leaves the network |
 | Long context (>50k tokens) | Gemini 2.0 | 1M token context window |
 | Fast/cheap simple queries | Local (E4B) | 25 t/s, no API cost |
@@ -140,7 +172,7 @@ Routing logic would live in `llm_client.py` or a new `router.py` — map task me

 ---

-## 6. RAG via Open WebUI
+## 7. RAG via Open WebUI

 **Status:** Future — Open WebUI already supports it.

@@ -152,9 +184,9 @@ API reference: [`docs/OPEN_WEBUI_API.md`](../docs/OPEN_WEBUI_API.md) — RAG sec

 ---

-## 8. Agent Architecture Ideas (from Claude Code leak)
+## 8. Agent Architecture Patterns — Research

-**Status:** Research — review before building dev agent pipeline and orchestrator.
+**Status:** Research — review before building dev agent pipeline and local orchestrator.

 The Claude Code system prompt was leaked in early April 2026. Two reimplementation repos are worth reading for design ideas before building out the dev agent pipeline and local orchestrator:

@@ -175,25 +207,26 @@ The Claude Code system prompt was leaked in early April 2026. Two reimplementati

 **File history journaling** — beyond session logs, a journal of what files changed and why, with replay summaries. Different from memory distillation — more like a git log for agent actions. Could complement the supervisor agent's diff review.

-**Plugin/manifest-based tool extensions** — tools declared via manifest rather than hardcoded in `__init__.py`. Would make adding new orchestrator tools less invasive. Worth considering before the tool suite grows much larger.
+**Plugin/manifest-based tool extensions** — tools declared via manifest rather than hardcoded in `__init__.py`. Would make adding new orchestrator tools less invasive. Worth considering before the tool suite grows much larger (currently 27 tools).

 ---

-## 7. Permanent Fleet Hosting
+## 9. Permanent Fleet Hosting

-**Status:** Deferred.
+**Status:** Deferred. Currently running on `scott-lt-i7-rtx` (gaming/agents laptop).

-Currently running on `scott-lt-i7-rtx` (gaming/agents laptop). Disabled on `scott_lpt` (2026-04-28) — that machine is a dev/editing node only. Long-term target: home server (always-on, Docker).
+Long-term target: home server (always-on, Docker). `docker-compose.yml` already exists in the project root.

-`docker-compose.yml` already exists in the project root. Deployment path:
+Deployment path:
 1. Copy to home server
 2. Configure reverse proxy (Nginx, already Docker-hosted)
-3. Set subdomain `cortex.dgrzone.com` → home server internal IP
+3. Point `cortex.dgrzone.com` at the home server's internal IP in pfSense
 4. WireGuard required for all access — not internet-exposed
+5. Update `FLEET_MANIFEST.md` and the fleet table in `CLAUDE.md`

 ---

-## 9. Cortex Mesh (Multi-Instance Fleet)
+## 10. Cortex Mesh — Multi-Instance Fleet

 **Status:** Concept — no design yet.