docs: sync TODO and ARCH__FUTURE — local orchestrator status, new tools, fleet/mesh plans
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -1,7 +1,7 @@
# Architecture: Planned Features

> What's next and how it's designed to work.
> Last updated: 2026-04-28
> Last updated: 2026-04-29

For the current task list see `TODO__Agents.md`. For phases and priorities see `ROADMAP.md`.
@@ -9,17 +9,17 @@ For the current task list see `TODO__Agents.md`. For phases and priorities see `

## 1. Local Orchestrator

**Status:** Partially built — `openai_orchestrator.py` exists and is wired into `POST /orchestrate`. If the `orchestrator` role in the model registry resolves to a `local_openai` model, it routes there automatically. Full parity with the Gemini orchestrator (tool loop quality, error handling, context budget enforcement) is still in progress.
**Status:** Partially built — `openai_orchestrator.py` exists and is wired into `POST /orchestrate`. When the `orchestrator` role in the model registry resolves to a `local_openai` model, it routes there automatically. Remaining work is quality/reliability parity with the Gemini orchestrator, not ground-up design.

Same ReAct tool loop as the Gemini API orchestrator, but driven by a local model via Open WebUI's OpenAI-compatible API. Enables offline/private agent tasks with no API cost.
Same ReAct tool loop as the Gemini API orchestrator, driven by a local model via Open WebUI's OpenAI-compatible API. Enables offline/private agent tasks with no API cost.

**Why local models work for this now:** Gemma 4 E4B and 26B A4B both support OpenAI `tools` / `tool_choice` function calling. The tool schema is nearly identical to Gemini's `FunctionDeclaration` — minor field renaming only.
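
The field renaming mentioned above could look roughly like the sketch below. It is an illustration only, not the code in `openai_orchestrator.py`. The helper names `to_openai_tool` and `_lower_types` and the sample declaration are hypothetical. It assumes each Gemini-side declaration is available as a plain dict with `name`, `description`, and `parameters` keys, and that type names are uppercase (`OBJECT`, `STRING`) where the OpenAI `tools` format expects lowercase JSON Schema types.

```python
from typing import Any


def _lower_types(schema: dict[str, Any]) -> dict[str, Any]:
    """Recursively lowercase `type` values so the schema reads as JSON Schema."""
    out: dict[str, Any] = {}
    for key, value in schema.items():
        if key == "type" and isinstance(value, str):
            out[key] = value.lower()
        elif key == "properties" and isinstance(value, dict):
            out[key] = {name: _lower_types(sub) for name, sub in value.items()}
        elif key == "items" and isinstance(value, dict):
            out[key] = _lower_types(value)
        else:
            out[key] = value
    return out


def to_openai_tool(decl: dict[str, Any]) -> dict[str, Any]:
    """Wrap a Gemini-style declaration dict in the OpenAI `tools` envelope."""
    params = decl.get("parameters") or {"type": "OBJECT", "properties": {}}
    return {
        "type": "function",
        "function": {
            "name": decl["name"],
            "description": decl.get("description", ""),
            "parameters": _lower_types(params),
        },
    }


# Hypothetical declaration, shaped like a Gemini FunctionDeclaration.
example_decl = {
    "name": "cortex_logs",
    "description": "Tail the last N lines of the service log.",
    "parameters": {
        "type": "OBJECT",
        "properties": {"n": {"type": "INTEGER"}},
        "required": ["n"],
    },
}
example_tool = to_openai_tool(example_decl)
```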

**Design:**
```
POST /orchestrate (mode: "local")
POST /orchestrate (role resolves to local_openai model)
↓
local_orchestrator_engine.py
openai_orchestrator.py
• converts tools/ to OpenAI tools format
• POST /api/chat/completions with tools array
• parse tool_calls response
@@ -34,16 +34,45 @@ Model selection:
- **Gemma 4 26B A4B** (9 t/s, 50k ctx) — heavier reasoning, background tasks

Context budget per iteration (system prompt + memory + tool results + history):
- Small model: budget ~40-50k tokens per round
- Medium model: budget ~35-40k tokens per round
- Small model: budget ~40–50k tokens per round
- Medium model: budget ~35–40k tokens per round

Context compaction (to implement): automatically trim stale tool results mid-run when
approaching the budget ceiling, preserving only the most recent N tool exchanges.
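
One possible shape for the compaction step described above, as a sketch under stated assumptions rather than a finished design. The names `compact`, `estimate_tokens`, and `PLACEHOLDER` are hypothetical. The history is assumed to be an OpenAI-style message list in which tool results carry `role: "tool"`, and token counts are approximated at about four characters per token instead of using a real tokenizer. Stale results are replaced with a short placeholder rather than deleted, on the assumption that the API expects every `tool_calls` entry to keep a matching tool message. For simplicity the sketch counts individual tool result messages rather than whole exchanges.

```python
from typing import Any

# Hypothetical stand-in text for a trimmed tool result.
PLACEHOLDER = "[tool result trimmed to save context]"


def estimate_tokens(messages: list[dict[str, Any]]) -> int:
    """Very rough estimate (about 4 characters per token); not a real tokenizer."""
    return sum(len(str(m.get("content") or "")) for m in messages) // 4


def compact(
    messages: list[dict[str, Any]],
    budget_tokens: int,
    keep_recent: int = 3,
) -> list[dict[str, Any]]:
    """Blank out all but the newest `keep_recent` tool results when over budget."""
    if estimate_tokens(messages) <= budget_tokens:
        return messages  # under the ceiling, nothing to do

    tool_positions = [i for i, m in enumerate(messages) if m.get("role") == "tool"]
    # A slice with -0 would keep nothing stale, so handle keep_recent == 0 explicitly.
    stale = set(tool_positions[:-keep_recent]) if keep_recent > 0 else set(tool_positions)

    return [
        {**m, "content": PLACEHOLDER} if i in stale else m
        for i, m in enumerate(messages)
    ]
```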

Full API reference: [`docs/OPEN_WEBUI_API.md`](../docs/OPEN_WEBUI_API.md)

---

## 2. Dev Agent Pipeline
## 2. Orchestrator Tool Expansions

**Status:** Design complete, not yet built.
**Status:** Planned. Current tool count: 27. These fill obvious gaps.

New tools for `cortex/tools/` — each follows the existing async pattern (implement function,
add `FunctionDeclaration`, register in `__init__.py`).

| Tool | Module | Description |
|---|---|---|
| `cortex_restart` | `system.py` | `systemctl --user restart cortex` — Inara can apply her own config changes; returns last 10 log lines after restart |
| `cortex_logs` | `system.py` | `journalctl --user -u cortex -n N` — tail service logs for debugging |
| `http_fetch` | `web.py` | Fetch a specific URL and return content; for health checks, API probing, webhook testing — not a search, a direct GET/POST |
| `file_list` | `scratch.py` or new `files.py` | List files and directories at a path; currently only `file_read` exists |
| `file_write` | `files.py` | Write content to a file with a path allow-list (persona dir + scratch by default) |
| `nc_talk_send` | new `notify.py` | Proactively send a message to the user via Nextcloud Talk outbound API |
| `email_send` | `notify.py` | Send email via existing `email_utils.py` SMTP helper |
| `web_push` | `notify.py` | Browser push notification via Web Push API (requires push subscription stored per-user in `home/{user}/push_sub.json`; pairs with the PWA service worker) |

**Safety note for `cortex_restart`:** The service will drop in-flight SSE connections on restart.
Only call if no streaming response is active. Add a check or a short delay before restarting.
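
The check-then-delay idea in the note above might be sketched like this. Everything here is hypothetical: `StreamTracker` and `wait_until_idle` are illustrative names, and the sketch assumes the SSE handlers can call `open()` and `close()` around each stream. The restart tool would then proceed only if `wait_until_idle` returns `True`.

```python
import asyncio


class StreamTracker:
    """Counts active streaming responses (hypothetical helper)."""

    def __init__(self) -> None:
        self._active = 0

    def open(self) -> None:
        self._active += 1

    def close(self) -> None:
        self._active = max(0, self._active - 1)

    @property
    def active(self) -> int:
        return self._active


async def wait_until_idle(
    tracker: StreamTracker,
    timeout_s: float = 5.0,
    poll_s: float = 0.1,
) -> bool:
    """Return True once no streams are active, or False if the timeout expires first."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    while tracker.active > 0:
        if loop.time() >= deadline:
            return False  # still streaming: the caller should refuse to restart
        await asyncio.sleep(poll_s)
    return True
```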

**Safety note for `file_write`:** Enforce an allow-list at the tool level, not just in the prompt.
Default allow: `home/{user}/persona/{name}/` and `/tmp/cortex-scratch/`. Reject any path outside.
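
A tool-level check along the lines of the note above could be sketched as follows. The function names `allowed_roots` and `check_write_path` are hypothetical, and `base` (the directory that contains `home/`) is an assumption about the project layout. Paths are resolved before comparison so that `..` segments and symlinks pointing outside the allow-list are rejected.

```python
from pathlib import Path


def allowed_roots(user: str, persona: str, base: Path) -> list[Path]:
    """Default allow-list: the persona directory plus the scratch directory."""
    return [
        (base / "home" / user / "persona" / persona).resolve(),
        Path("/tmp/cortex-scratch").resolve(),
    ]


def check_write_path(raw_path: str, roots: list[Path]) -> Path:
    """Return the resolved target if it lies strictly inside an allowed root, else raise."""
    # resolve() collapses ".." and follows symlinks; relative paths resolve against the cwd.
    target = Path(raw_path).resolve()
    for root in roots:
        if root in target.parents:
            return target
    raise PermissionError(f"file_write: path outside allow-list: {target}")
```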

---

## 3. Dev Agent Pipeline

**Status:** Design complete, not yet built. Review §8 (Agent Architecture Patterns) before starting.

Accept a plain-English task, implement code changes, verify them, and present for human approval before committing.

@@ -64,7 +93,7 @@ Supervisor Agent
Human approval gate
• summary in Cortex UI or NC Talk
• approve → commit (+ optional push)
• reject <EFBFBD><EFBFBD> feedback back to specialist
• reject → feedback back to specialist
```

**Specialists** (both Claude CLI):
@@ -84,7 +113,7 @@ Human approval gate

---

## 3. Gitea Integration
## 4. Gitea Integration

**Status:** Not started. pfSense port forward for SSH already confirmed working.

@@ -97,7 +126,7 @@ SSH clone/push: `git clone ssh://git@git.dgrzone.com:2222/<user>/<repo>.git`

---

## 4. Knowledge Layer (AE Journals)
## 5. Knowledge Layer (AE Journals)

**Status:** Tools exist, import script not yet built.

@@ -122,16 +151,19 @@ AE Journals becomes the searchable long-term knowledge base. Complements memory

---

## 5. Intelligent Model Routing
## 6. Intelligent Model Routing

**Status:** Partially addressed. Model Registry V2 (2026-04-27) introduced role-based routing — `chat`, `orchestrator`, `distill`, `coder`, `research` roles each have their own primary/backup model chain, and the UI role toggle lets users manually select which role handles a message. Automatic task-characteristic routing (below) is still deferred.
**Status:** Partially addressed. Model Registry V2 (2026-04-27) introduced role-based routing —
`chat`, `orchestrator`, `distill`, `coder`, `research` roles each have their own primary/backup
model chain, and the UI role toggle lets users manually select which role handles a message.
Automatic task-characteristic routing (below) is still deferred.

Route automatically based on task characteristics rather than requiring manual backend selection:
Route automatically based on task characteristics rather than requiring manual selection:

| Task type | Backend | Reason |
|---|---|---|
| User-facing conversation | Claude | Quality prose, persona fidelity |
| Tool use / orchestration | Gemini API | Native function calling, free tier |
| Tool use / orchestration | Gemini API or local | Native function calling |
| Private / sensitive / offline | Local (Ollama) | No data leaves the network |
| Long context (>50k tokens) | Gemini 2.0 | 1M token context window |
| Fast/cheap simple queries | Local (E4B) | 25 t/s, no API cost |
@@ -140,7 +172,7 @@ Routing logic would live in `llm_client.py` or a new `router.py` — map task me

---

## 6. RAG via Open WebUI
## 7. RAG via Open WebUI

**Status:** Future — Open WebUI already supports it.

@@ -152,9 +184,9 @@ API reference: [`docs/OPEN_WEBUI_API.md`](../docs/OPEN_WEBUI_API.md) — RAG sec

---

## 8. Agent Architecture Ideas (from Claude Code leak)
## 8. Agent Architecture Patterns — Research

**Status:** Research — review before building dev agent pipeline and orchestrator.
**Status:** Research — review before building dev agent pipeline and local orchestrator.

The Claude Code system prompt was leaked in early April 2026. Two reimplementation repos are worth reading for design ideas before building out the dev agent pipeline and local orchestrator:

@@ -175,25 +207,26 @@ The Claude Code system prompt was leaked in early April 2026. Two reimplementati

**File history journaling** — beyond session logs, a journal of what files changed and why, with replay summaries. Different from memory distillation — more like a git log for agent actions. Could complement the supervisor agent's diff review.

**Plugin/manifest-based tool extensions** — tools declared via manifest rather than hardcoded in `__init__.py`. Would make adding new orchestrator tools less invasive. Worth considering before the tool suite grows much larger.
**Plugin/manifest-based tool extensions** — tools declared via manifest rather than hardcoded in `__init__.py`. Would make adding new orchestrator tools less invasive. Worth considering before the tool suite grows much larger (currently 27 tools).

---

## 7. Permanent Fleet Hosting
## 9. Permanent Fleet Hosting

**Status:** Deferred.
**Status:** Deferred. Currently running on `scott-lt-i7-rtx` (gaming/agents laptop).

Currently running on `scott-lt-i7-rtx` (gaming/agents laptop). Disabled on `scott_lpt` (2026-04-28) — that machine is a dev/editing node only. Long-term target: home server (always-on, Docker).
Long-term target: home server (always-on, Docker). `docker-compose.yml` already exists in the project root.

`docker-compose.yml` already exists in the project root. Deployment path:
Deployment path:
1. Copy to home server
2. Configure reverse proxy (Nginx, already Docker-hosted)
3. Set subdomain `cortex.dgrzone.com` → home server internal IP
3. Update `cortex.dgrzone.com` → home server internal IP in pfSense
4. WireGuard required for all access — not internet-exposed
5. Update `FLEET_MANIFEST.md` and CLAUDE.md fleet table

---

## 9. Cortex Mesh (Multi-Instance Fleet)
## 10. Cortex Mesh — Multi-Instance Fleet

**Status:** Concept — no design yet.