docs: sync TODO and ARCH__FUTURE — local orchestrator status, new tools, fleet/mesh plans

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Scott Idem
2026-04-29 19:05:11 -04:00
parent 25182a1765
commit 1603ad5124
2 changed files with 147 additions and 55 deletions

@@ -7,34 +7,43 @@
## 🔴 High Priority
### [Local] Tool-capable local orchestrator
Design and implement `local_orchestrator_engine.py` — a ReAct tool loop driven by
a local model via Open WebUI's OpenAI-compatible API, as an alternative to the
Gemini API orchestrator for private/offline tasks.
### [Local] Local orchestrator — reach full parity with Gemini orchestrator
`openai_orchestrator.py` is partially built and wired into `POST /orchestrate`.
When the `orchestrator` role resolves to a `local_openai` model it routes there
automatically. Remaining work is quality/reliability parity, not ground-up design.
- [ ] Convert existing Cortex tool definitions (`cortex/tools/`) from Gemini
`FunctionDeclaration` format to OpenAI `tools` format (minor schema diff)
- [ ] Implement tool loop: send tools → parse `tool_calls` response → execute →
append result → loop until `finish_reason: stop`
- [ ] Wire into `routers/orchestrator.py` — new `mode` param: `"local"` vs `"gemini"`
- [ ] UI: Agent mode button routes to local orchestrator when local backend active
- [ ] Recommended models (scott_gaming, 8 GB VRAM):
Gemma 4 E4B — 25 t/s, 72k practical ctx — interactive/fast tasks
Gemma 4 26B A4B — 9 t/s, 50k practical ctx — heavier reasoning, background tasks
- Reference: `docs/OPEN_WEBUI_API.md` for full tool call request/response format
- [ ] Audit tool schema conversion — Gemini `FunctionDeclaration` → OpenAI `tools` format
(minor field rename, already partially done)
- [ ] Context budget enforcement per iteration (40-50k for E4B, 35-40k for 26B A4B)
- [ ] Context compaction — trim stale tool results mid-run when approaching limit
- [ ] Error handling parity with Gemini orchestrator (retry logic, malformed tool calls)
- [ ] Test end-to-end with Gemma 4 E4B and 26B A4B on scott_gaming
- [ ] Review `ARCH__FUTURE.md` agent architecture ideas before finalising design
- Reference: `docs/OPEN_WEBUI_API.md`, `documentation/ARCH__FUTURE.md` §1
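For the schema audit item, a minimal sketch of the Gemini-to-OpenAI conversion may help as a checklist. It assumes each Cortex tool declaration is available as a plain dict with `name`, `description`, and `parameters` keys, and that the Gemini side uses uppercase type names such as `"STRING"`. The function names here are hypothetical and are not taken from `openai_orchestrator.py`.

```python
def _lowercase_types(schema):
    """Recursively lowercase JSON-schema `type` values ("STRING" -> "string")."""
    if isinstance(schema, dict):
        out = {}
        for key, value in schema.items():
            if key == "type" and isinstance(value, str):
                out[key] = value.lower()
            else:
                # A property that happens to be named "type" holds a dict,
                # so it falls through here and is converted recursively.
                out[key] = _lowercase_types(value)
        return out
    if isinstance(schema, list):
        return [_lowercase_types(item) for item in schema]
    return schema


def gemini_to_openai_tool(decl: dict) -> dict:
    """Wrap one Gemini-style declaration (assumed dict shape) as an OpenAI tool."""
    params = decl.get("parameters") or {"type": "object", "properties": {}}
    return {
        "type": "function",
        "function": {
            "name": decl["name"],
            "description": decl.get("description", ""),
            "parameters": _lowercase_types(params),
        },
    }


# Illustrative declaration; the real definitions live in cortex/tools/.
example = {
    "name": "file_read",
    "description": "Read a file",
    "parameters": {
        "type": "OBJECT",
        "properties": {"path": {"type": "STRING"}},
        "required": ["path"],
    },
}
tool = gemini_to_openai_tool(example)
```

The resulting list of dicts would go in the `tools` field of a chat completion request. Fields that exist only on one side are not handled in this sketch.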
---
## 🟡 Medium Priority
### [UI] Progressive Web App (PWA)
Low effort, meaningful mobile UX improvement — install Cortex as a home screen app.
- [ ] Add `manifest.json` (name, icons, theme color, display: standalone, start_url)
- [ ] Serve `manifest.json` from `cortex/routers/ui.py` or as a static file
- [ ] Add `<link rel="manifest">` to `index.html`
- [ ] Basic service worker for offline shell (cache static assets; network-first for API)
- [ ] Register service worker in `app.js`
- [ ] Test on iOS (Safari) and Android (Chrome) — both support PWA install prompts
### [UI] Progressive Web App (PWA) ✅ — 2026-04-29
- manifest.json, sw.js, icon-192/512.png, SW registration in app.js
- `/manifest.json` and `/sw.js` served at root; added to `_PUBLIC` in auth_middleware
- Tested: install prompt confirmed working in Chromium
### [Tools] Orchestrator tool expansions
New tools for `cortex/tools/` — higher-value additions that fill obvious gaps.
- [ ] **`cortex_restart`** — `systemctl --user restart cortex`; lets Inara apply her own
config changes without human intervention. Return last 10 log lines after restart.
- [ ] **`cortex_logs`** — `journalctl --user -u cortex -n N` — tail service logs for debugging
- [ ] **`http_fetch`** — fetch a URL and return content; for health checks, API probing,
webhook testing. Different from `web_search` — direct URL, returns raw response.
- [ ] **`file_list`** — list files and directories at a path; currently only `file_read` exists
- [ ] **`file_write`** — write content to a file (with path allow-list for safety)
- [ ] **`nc_talk_send`** — proactively send a message to the user via Nextcloud Talk
(outbound; complements the proactive notifications channel work)
- [ ] **`email_send`** — send email via existing `email_utils.py` SMTP helper
- [ ] **`web_push`** — send a browser push notification (requires push subscription stored
per-user; pairs well with the PWA service worker already in place)
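The path allow-list for `file_write` could look roughly like the sketch below. It assumes the allow-list is a list of root directories passed in by the caller; how Cortex would actually configure or store that list is not decided here, and the helper name `resolve_allowed` is hypothetical.

```python
from pathlib import Path


def resolve_allowed(path: str, allowed_roots: list[Path]) -> Path:
    """Resolve `path` and confirm it sits under one of the allowed roots."""
    # resolve() collapses ".." segments and symlinks, so traversal tricks
    # are checked against the real destination.
    target = Path(path).expanduser().resolve()
    for root in allowed_roots:
        if target.is_relative_to(root.resolve()):  # Python 3.9+
            return target
    raise PermissionError(f"{target} is outside the allow-list")


def file_write(path: str, content: str, allowed_roots: list[Path]) -> str:
    """Write `content` to `path` if the allow-list permits it."""
    target = resolve_allowed(path, allowed_roots)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return f"wrote {len(content)} characters to {target}"
```

Resolving before comparing is the important design choice: a string-prefix check on the raw path would accept something like `allowed/../secrets`.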
### [Channel] Proactive notifications
Inara reaches out on her own initiative via NC Talk or Google Chat when a reminder
@@ -115,6 +124,22 @@ See `ARCH__Intelligence_Layer.md` for full design.
## 🟢 Lower Priority / Future
### [Research] Agent architecture patterns — review before building dev agent pipeline
The Claude Code system prompt was leaked April 2026. Two reimplementation repos have
useful design ideas directly applicable to the local orchestrator and dev agent work.
Read before finalising either design.
- [ ] Review https://github.com/HarnessLab/claw-code-agent (Python, targets local models)
- [ ] Review https://github.com/ultraworkers/claw-code (community port, interesting source)
- Key ideas to evaluate for Cortex:
  - Tiered permission model (read-only / write / shell / unsafe) — relevant once dev
    agent is writing and executing code
  - Agent lineage tracking — which agent spawned which sub-agent; essential for the
    orchestrator → specialist → supervisor chain
  - Hard token/cost budgets per operation — local models have fixed context ceilings
  - Context compaction mid-session — trim stale tool results before hitting limit
  - Nested agent delegation with dependency-aware batching
  - Plugin/manifest-based tool registration — worth considering before tool suite grows
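To make the compaction idea concrete, here is a rough sketch that trims the oldest tool results from an OpenAI-style message list once an estimated token total exceeds a budget. The 4-characters-per-token estimate, the placeholder text, and the `keep_recent` parameter are all assumptions for illustration; neither referenced repo's approach is reproduced here.

```python
def estimate_tokens(message: dict) -> int:
    # Crude heuristic (assumption): roughly 4 characters per token.
    return len(str(message.get("content") or "")) // 4 + 1


def compact_tool_results(messages, budget_tokens, keep_recent=2,
                         placeholder="[tool result trimmed]"):
    """Return a copy of `messages` with stale tool results replaced by a stub.

    The oldest tool results are trimmed first, and trimming stops as soon as
    the estimated total fits the budget. The newest `keep_recent` tool
    results are never touched.
    """
    compacted = [dict(m) for m in messages]  # shallow copies; input is untouched
    tool_indexes = [i for i, m in enumerate(compacted) if m.get("role") == "tool"]
    stale = tool_indexes[:-keep_recent] if keep_recent else tool_indexes
    total = sum(estimate_tokens(m) for m in compacted)
    for i in stale:
        if total <= budget_tokens:
            break
        before = estimate_tokens(compacted[i])
        compacted[i]["content"] = placeholder
        total -= before - estimate_tokens(compacted[i])
    return compacted
```

Keeping the message in place and replacing only its `content` preserves the `tool_call_id` pairing, which OpenAI-compatible APIs generally expect between an assistant tool call and its result.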
### [Sessions] Cross-session search
The file browser has per-file session search, but there is no way to query across all
sessions for a persona. A unified search would make the session archive useful as a
knowledge source.
@@ -155,18 +180,52 @@ base accessible to local models. Endpoints documented in `docs/OPEN_WEBUI_API.md
- `/api/v1/files/` upload + `/api/v1/retrieval/process/web` for URLs
- Reference in chat via `"files": [{"type": "collection", "id": "..."}]`
### [Backend] Intelligent model routing
- Currently hardcoded: Claude default, Gemini fallback, local third
- Design direction (now informed by real local model perf):
- **Private/offline tasks** → local (Gemma 4 E4B for speed, 26B A4B for reasoning)
- **Complex tool tasks / long context** → Gemini (1M token context, strong function calling)
- **Final user-facing responses** → Claude (quality prose, persona fidelity)
- Future: auto-route by task type rather than requiring user to toggle backend manually
### [Backend] Intelligent model routing — automatic task-type dispatch
Model Registry V2 (2026-04-27) added role-based routing and manual role toggle — that's
the foundation. What remains is removing the need to toggle manually.
- [ ] Classify incoming messages by task type (heuristic or lightweight classifier)
- [ ] Map task type → role → model automatically:
  - User conversation → `chat` role → Claude (quality prose, persona fidelity)
  - Tool/research tasks → `orchestrator` role → Gemini API or local
  - Private/sensitive → `local` role → Ollama (no data leaves network)
  - Long context (>50k tokens) → Gemini 2.0 (1M ctx window)
  - Fast/cheap queries → local E4B (25 t/s, no API cost)
- [ ] Routing logic in `llm_client.py` or new `router.py`; expose override in UI
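A heuristic version of the classify-then-map step might look like the sketch below. The keyword lists, the `route` function, and the `long_context` label are hypothetical placeholders (the list above names a model for long context but no role), and the fast/cheap case is left out for brevity. The `chat`, `orchestrator`, and `local` role names are the ones listed above.

```python
# Hypothetical keyword heuristics; a lightweight classifier could replace them.
PRIVATE_MARKERS = ("password", "medical", "bank")
TOOL_MARKERS = ("search", "fetch", "look up", "restart")

LONG_CONTEXT_THRESHOLD = 50_000  # tokens, per the ">50k tokens" item above


def route(message: str, context_tokens: int = 0, private: bool = False) -> str:
    """Return the role (or the placeholder label "long_context") for a message."""
    text = message.lower()
    # Privacy is checked first so sensitive content never leaves the network,
    # even when the context is long or the request looks like a tool task.
    if private or any(marker in text for marker in PRIVATE_MARKERS):
        return "local"
    if context_tokens > LONG_CONTEXT_THRESHOLD:
        return "long_context"
    if any(marker in text for marker in TOOL_MARKERS):
        return "orchestrator"
    return "chat"
```

The ordering of the checks is the main design decision: putting the privacy test first means a private, long-context request still stays local, at the cost of a smaller usable context window. A UI override would simply bypass `route` and pass the chosen role directly.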
### [Ops] Permanent fleet hosting — home server deployment
Currently running on `scott-lt-i7-rtx` (gaming laptop). Long-term target is the
home server for always-on reliability. `docker-compose.yml` already exists.
- [ ] Copy project to home server
- [ ] Configure Nginx reverse proxy (already Docker-hosted on that machine)
- [ ] Point `cortex.dgrzone.com` → home server internal IP (pfSense alias update)
- [ ] WireGuard required for all access — not internet-exposed
- [ ] Update `FLEET_MANIFEST.md` to reflect new hosting location
### [Future] Cortex Mesh — multi-instance fleet coordination
Each fleet device runs its own Cortex instance. Instances delegate tasks to each
other based on resources and specialisation. No central coordinator required.
- Concept only — no design yet. Resolve these questions before building:
  - Auth between instances (shared JWT secret vs. per-instance API keys)
  - Capability advertisement (model registry over HTTP? shared Syncthing file?)
  - Whether `ae_send_message` / the inbox system is the right coordination layer
  - Session continuity — does a conversation stay on one node or migrate?
- Natural foundation already in place: Syncthing-synced `home/` and shared
  `model_registry.json` mean instances share persona memory without a central DB
---
## ✅ Completed
### [UI] Progressive Web App (PWA) — 2026-04-29
- `manifest.json`, `sw.js`, PNG icons (192/512) generated via rsvg-convert
- `/manifest.json` and `/sw.js` served at root via ui.py; exempted in auth_middleware
- Theme-color meta tag updated dynamically on light/dark toggle
- Install prompt confirmed working in Chromium desktop; apple-touch-icon for iOS
### [UI] CodeMirror markdown editor for identity/memory files — 2026-04-28
- Replaced textarea in Files panel with CodeMirror 5 (markdown mode, CDN)
- Syntax highlighting, line wrapping, Ctrl+S to save, per-file undo history
### [UI] Input area polish — 2026-04-28
- Single cycling S/M/L button replaces 3 separate height buttons (same UX as font size)
- S size collapses mode-select to a row (compact); M/L keep vertical column layout