docs: sync TODO and ARCH__FUTURE — local orchestrator status, new tools, fleet/mesh plans

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 19:05:11 -04:00
parent 25182a1765
commit 1603ad5124
2 changed files with 147 additions and 55 deletions
--- a/documentation/TODO__Agents.md
+++ b/documentation/TODO__Agents.md
@@ -7,34 +7,43 @@

 ## 🔴 High Priority

-### [Local] Tool-capable local orchestrator
-Design and implement `local_orchestrator_engine.py` — a ReAct tool loop driven by
-a local model via Open WebUI's OpenAI-compatible API, as an alternative to the
-Gemini API orchestrator for private/offline tasks.
+### [Local] Local orchestrator — reach full parity with Gemini orchestrator
+`openai_orchestrator.py` is partially built and wired into `POST /orchestrate`.
+When the `orchestrator` role resolves to a `local_openai` model it routes there
+automatically. Remaining work is quality/reliability parity, not ground-up design.

- [ ] Convert existing Cortex tool definitions (`cortex/tools/`) from Gemini
-      `FunctionDeclaration` format to OpenAI `tools` format (minor schema diff)
- [ ] Implement tool loop: send tools → parse `tool_calls` response → execute →
-      append result → loop until `finish_reason: stop`
- [ ] Wire into `routers/orchestrator.py` — new `mode` param: `"local"` vs `"gemini"`
- [ ] UI: Agent mode button routes to local orchestrator when local backend active
- [ ] Recommended models (scott_gaming, 8 GB VRAM):
-      Gemma 4 E4B — 25 t/s, 72k practical ctx — interactive/fast tasks
-      Gemma 4 26B A4B — 9 t/s, 50k practical ctx — heavier reasoning, background tasks
- Reference: `docs/OPEN_WEBUI_API.md` for full tool call request/response format
+- [ ] Audit tool schema conversion — Gemini `FunctionDeclaration` → OpenAI `tools` format
+      (minor field rename, already partially done)
+- [ ] Context budget enforcement per iteration (40–50k tokens for E4B, 35–40k tokens for 26B A4B)
+- [ ] Context compaction — trim stale tool results mid-run when approaching limit
+- [ ] Error handling parity with Gemini orchestrator (retry logic, malformed tool calls)
+- [ ] Test end-to-end with Gemma 4 E4B and 26B A4B on scott_gaming
+- [ ] Review `ARCH__FUTURE.md` agent architecture ideas before finalising design
+- Reference: `docs/OPEN_WEBUI_API.md`, `documentation/ARCH__FUTURE.md` §1
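
The schema audit item above can be sketched roughly as follows. This assumes the Cortex tool definitions are available as plain dicts shaped like Gemini `FunctionDeclaration` objects (`name`, `description`, `parameters`, possibly with upper-case type names such as `OBJECT`). The helper names `gemini_to_openai_tool` and `_lower_types` are hypothetical and are not taken from `openai_orchestrator.py`.

```python
from typing import Any


def _lower_types(schema: Any) -> Any:
    """Recursively lower-case string values stored under a "type" key.

    Assumption: Gemini-style schemas may use upper-case type names
    ("OBJECT", "STRING"), while JSON Schema as used by OpenAI-compatible
    APIs expects lower-case ("object", "string").
    """
    if isinstance(schema, dict):
        return {
            key: value.lower() if key == "type" and isinstance(value, str)
            else _lower_types(value)
            for key, value in schema.items()
        }
    if isinstance(schema, list):
        return [_lower_types(item) for item in schema]
    return schema


def gemini_to_openai_tool(decl: dict) -> dict:
    """Wrap one FunctionDeclaration-like dict in the OpenAI `tools` envelope."""
    # Fall back to an empty object schema when the tool takes no arguments.
    params = decl.get("parameters") or {"type": "object", "properties": {}}
    return {
        "type": "function",
        "function": {
            "name": decl["name"],
            "description": decl.get("description", ""),
            "parameters": _lower_types(params),
        },
    }
```

An audit could run every definition through a converter like this and compare the result with what the orchestrator currently sends.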

 ---

 ## 🟡 Medium Priority

-### [UI] Progressive Web App (PWA)
-Low effort, meaningful mobile UX improvement — install Cortex as a home screen app.
- [ ] Add `manifest.json` (name, icons, theme color, display: standalone, start_url)
- [ ] Serve `manifest.json` from `cortex/routers/ui.py` or as a static file
- [ ] Add `<link rel="manifest">` to `index.html`
- [ ] Basic service worker for offline shell (cache static assets; network-first for API)
- [ ] Register service worker in `app.js`
- [ ] Test on iOS (Safari) and Android (Chrome) — both support PWA install prompts
+### [UI] Progressive Web App (PWA) ✅ — 2026-04-29
+- `manifest.json`, `sw.js`, icon-192/512.png, SW registration in `app.js`
+- `/manifest.json` and `/sw.js` served at root; added to `_PUBLIC` in `auth_middleware`
+- Tested: install prompt confirmed working in Chromium
+
+### [Tools] Orchestrator tool expansions
+New tools for `cortex/tools/` — higher-value additions that fill obvious gaps.
+- [ ] **`cortex_restart`** — `systemctl --user restart cortex`; lets Inara apply her own
+      config changes without human intervention. Return last 10 log lines after restart.
+- [ ] **`cortex_logs`** — `journalctl --user -u cortex -n N` — tail service logs for debugging
+- [ ] **`http_fetch`** — fetch a URL and return content; for health checks, API probing,
+      webhook testing. Different from `web_search` — direct URL, returns raw response.
+- [ ] **`file_list`** — list files and directories at a path; currently only `file_read` exists
+- [ ] **`file_write`** — write content to a file (with path allow-list for safety)
+- [ ] **`nc_talk_send`** — proactively send a message to the user via Nextcloud Talk
+      (outbound; complements the proactive notifications channel work)
+- [ ] **`email_send`** — send email via existing `email_utils.py` SMTP helper
+- [ ] **`web_push`** — send a browser push notification (requires push subscription stored
+      per-user; pairs well with the PWA service worker already in place)
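
The path allow-list mentioned for `file_write` could look something like the sketch below. It is only an illustration under stated assumptions: the function name `is_path_allowed` and the idea of passing a list of allowed root directories are hypothetical, not part of the existing `cortex/tools/` code.

```python
from pathlib import Path
from typing import Iterable


def is_path_allowed(path: str, allowed_roots: Iterable[str]) -> bool:
    """Return True if `path` resolves to an allowed root or somewhere inside one.

    Resolving first collapses ".." segments (and follows symlinks that
    exist), so a path like "root/../etc/passwd" cannot escape the root.
    """
    resolved = Path(path).resolve()
    for root in allowed_roots:
        root_resolved = Path(root).resolve()
        if resolved == root_resolved or root_resolved in resolved.parents:
            return True
    return False
```

A `file_write` tool would call a check like this before opening the file and refuse the write when it returns `False`.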

 ### [Channel] Proactive notifications
 Inara reaches out on her own initiative via NC Talk or Google Chat when a reminder
@@ -115,6 +124,22 @@ See `ARCH__Intelligence_Layer.md` for full design.

 ## 🟢 Lower Priority / Future

+### [Research] Agent architecture patterns — review before building dev agent pipeline
+The Claude Code system prompt leaked in April 2026. Two reimplementation repos have
+useful design ideas directly applicable to the local orchestrator and dev agent work.
+Read before finalising either design.
+- [ ] Review https://github.com/HarnessLab/claw-code-agent (Python, targets local models)
+- [ ] Review https://github.com/ultraworkers/claw-code (community port, interesting source)
+- Key ideas to evaluate for Cortex:
+  - Tiered permission model (read-only / write / shell / unsafe) — relevant once dev
+    agent is writing and executing code
+  - Agent lineage tracking — which agent spawned which sub-agent; essential for the
+    orchestrator → specialist → supervisor chain
+  - Hard token/cost budgets per operation — local models have fixed context ceilings
+  - Context compaction mid-session — trim stale tool results before hitting limit
+  - Nested agent delegation with dependency-aware batching
+  - Plugin/manifest-based tool registration — worth considering before tool suite grows
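
Of the ideas listed above, mid-session context compaction is the easiest to illustrate. The sketch below assumes the conversation is held as OpenAI-style message dicts with `role` and `content` keys, and it uses a crude four-characters-per-token estimate instead of a real tokenizer. The names `estimate_tokens` and `compact_tool_results` are hypothetical.

```python
from typing import Dict, List

Message = Dict[str, str]

PLACEHOLDER = "[tool result trimmed]"


def estimate_tokens(messages: List[Message]) -> int:
    """Very rough size estimate: about 4 characters per token (an assumption)."""
    return sum(len(m.get("content") or "") for m in messages) // 4


def compact_tool_results(
    messages: List[Message], budget_tokens: int, keep_recent: int = 2
) -> List[Message]:
    """Replace the oldest tool results with a placeholder until under budget.

    The most recent `keep_recent` tool results are never trimmed, since
    the model most likely still needs them for its next step.
    """
    out = [dict(m) for m in messages]  # copy so the caller's list is untouched
    tool_idx = [i for i, m in enumerate(out) if m.get("role") == "tool"]
    trimmable = tool_idx[:-keep_recent] if keep_recent > 0 else tool_idx
    for i in trimmable:  # oldest first
        if estimate_tokens(out) <= budget_tokens:
            break
        out[i]["content"] = PLACEHOLDER
    return out
```

Calling this before each model request with a per-model budget would keep a run under a fixed context ceiling, at the cost of losing the details of older tool output.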
+
 ### [Sessions] Cross-session search
 The file browser has per-file session search, but no way to query across all sessions
 for a persona. A unified search would make the session archive useful as a knowledge source.
@@ -155,18 +180,52 @@ base accessible to local models. Endpoints documented in `docs/OPEN_WEBUI_API.md
 - `/api/v1/files/` upload + `/api/v1/retrieval/process/web` for URLs
 - Reference in chat via `"files": [{"type": "collection", "id": "..."}]`

-### [Backend] Intelligent model routing
- Currently hardcoded: Claude default, Gemini fallback, local third
- Design direction (now informed by real local model perf):
-  - **Private/offline tasks** → local (Gemma 4 E4B for speed, 26B A4B for reasoning)
-  - **Complex tool tasks / long context** → Gemini (1M token context, strong function calling)
-  - **Final user-facing responses** → Claude (quality prose, persona fidelity)
- Future: auto-route by task type rather than requiring user to toggle backend manually
+### [Backend] Intelligent model routing — automatic task-type dispatch
+Model Registry V2 (2026-04-27) added role-based routing and manual role toggle — that's
+the foundation. What remains is removing the need to toggle manually.
+- [ ] Classify incoming messages by task type (heuristic or lightweight classifier)
+- [ ] Map task type → role → model automatically:
+  - User conversation → `chat` role → Claude (quality prose, persona fidelity)
+  - Tool/research tasks → `orchestrator` role → Gemini API or local
+  - Private/sensitive → `local` role → Ollama (no data leaves network)
+  - Long context (>50k tokens) → Gemini 2.0 (1M ctx window)
+  - Fast/cheap queries → local E4B (25 t/s, no API cost)
+- [ ] Routing logic in `llm_client.py` or new `router.py`; expose override in UI
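
As a starting point for the classification step, a keyword heuristic along the following lines might be enough. Everything here is an assumption made for illustration: the function name `pick_role`, the keyword list, and the `long_context` role name are hypothetical (only `chat`, `orchestrator` and `local` appear in the list above), and the 50k threshold simply mirrors the long-context bullet.

```python
LONG_CONTEXT_TOKENS = 50_000

# Hypothetical hints that a message needs tools or research.
TOOL_HINTS = ("search", "fetch", "look up", "run ", "restart", "read file")


def pick_role(text: str, token_count: int = 0, private: bool = False) -> str:
    """Map a message to a role name using simple, ordered rules.

    Order matters: privacy wins over everything else, then context size,
    then tool-like wording, with plain conversation as the default.
    """
    if private:
        return "local"  # no data leaves the network
    if token_count > LONG_CONTEXT_TOKENS:
        return "long_context"  # hypothetical role for a large-context model
    lowered = text.lower()
    if any(hint in lowered for hint in TOOL_HINTS):
        return "orchestrator"
    return "chat"
```

The returned role would then be resolved to a model through the existing role-based registry, with the UI override taking precedence when the user sets one.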
+
+### [Ops] Permanent fleet hosting — home server deployment
+Currently running on `scott-lt-i7-rtx` (gaming laptop). Long-term target is the
+home server for always-on reliability. `docker-compose.yml` already exists.
+- [ ] Copy project to home server
+- [ ] Configure Nginx reverse proxy (already Docker-hosted on that machine)
+- [ ] Point `cortex.dgrzone.com` → home server internal IP (pfSense alias update)
+- [ ] WireGuard required for all access — not internet-exposed
+- [ ] Update `FLEET_MANIFEST.md` to reflect new hosting location
+
+### [Future] Cortex Mesh — multi-instance fleet coordination
+Each fleet device runs its own Cortex instance. Instances delegate tasks to each
+other based on resources and specialisation. No central coordinator required.
+- Concept only — no design yet. Resolve these questions before building:
+  - Auth between instances (shared JWT secret vs. per-instance API keys)
+  - Capability advertisement (model registry over HTTP? shared Syncthing file?)
+  - Whether `ae_send_message` / the inbox system is the right coordination layer
+  - Session continuity — does a conversation stay on one node or migrate?
+- Natural foundation already in place: Syncthing-synced `home/` and shared
+  `model_registry.json` mean instances share persona memory without a central DB

 ---

 ## ✅ Completed

+### [UI] Progressive Web App (PWA) — 2026-04-29
+- `manifest.json`, `sw.js`, PNG icons (192/512) generated via rsvg-convert
+- `/manifest.json` and `/sw.js` served at root via ui.py; exempted in auth_middleware
+- Theme-color meta tag updated dynamically on light/dark toggle
+- Install prompt confirmed working in Chromium desktop; apple-touch-icon for iOS
+
+### [UI] CodeMirror markdown editor for identity/memory files — 2026-04-28
+- Replaced textarea in Files panel with CodeMirror 5 (markdown mode, CDN)
+- Syntax highlighting, line wrapping, Ctrl+S to save, per-file undo history
+
 ### [UI] Input area polish — 2026-04-28
 - Single cycling S/M/L button replaces 3 separate height buttons (same UX as font size)
 - S size collapses mode-select to a row (compact); M/L keep vertical column layout