docs: sync TODO and ARCH__FUTURE — local orchestrator status, new tools, fleet/mesh plans
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -7,34 +7,43 @@

## 🔴 High Priority

### [Local] Tool-capable local orchestrator
Design and implement `local_orchestrator_engine.py` — a ReAct tool loop driven by
a local model via Open WebUI's OpenAI-compatible API, as an alternative to the
Gemini API orchestrator for private/offline tasks.
### [Local] Local orchestrator — reach full parity with Gemini orchestrator
`openai_orchestrator.py` is partially built and wired into `POST /orchestrate`.
When the `orchestrator` role resolves to a `local_openai` model it routes there
automatically. Remaining work is quality/reliability parity, not ground-up design.

- [ ] Convert existing Cortex tool definitions (`cortex/tools/`) from Gemini
      `FunctionDeclaration` format to OpenAI `tools` format (minor schema diff)
- [ ] Implement tool loop: send tools → parse `tool_calls` response → execute →
      append result → loop until `finish_reason: stop`
- [ ] Wire into `routers/orchestrator.py` — new `mode` param: `"local"` vs `"gemini"`
- [ ] UI: Agent mode button routes to local orchestrator when local backend active
- [ ] Recommended models (scott_gaming, 8 GB VRAM):
      Gemma 4 E4B — 25 t/s, 72k practical ctx — interactive/fast tasks
      Gemma 4 26B A4B — 9 t/s, 50k practical ctx — heavier reasoning, background tasks
- Reference: `docs/OPEN_WEBUI_API.md` for full tool call request/response format
- [ ] Audit tool schema conversion — Gemini `FunctionDeclaration` → OpenAI `tools` format
      (minor field rename, already partially done)
- [ ] Context budget enforcement per iteration (40–50k for E4B, 35–40k for 26B A4B)
- [ ] Context compaction — trim stale tool results mid-run when approaching limit
- [ ] Error handling parity with Gemini orchestrator (retry logic, malformed tool calls)
- [ ] Test end-to-end with Gemma 4 E4B and 26B A4B on scott_gaming
- [ ] Review `ARCH__FUTURE.md` agent architecture ideas before finalising design
- Reference: `docs/OPEN_WEBUI_API.md`, `documentation/ARCH__FUTURE.md` §1
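
For orientation while auditing, here is a minimal sketch of the schema wrap and the tool loop described in the items above. It is not the code in `openai_orchestrator.py`. The message shapes follow the OpenAI chat-completions tool-calling format; that Open WebUI returns exactly these fields is an assumption to check against `docs/OPEN_WEBUI_API.md`. The names `to_openai_tool`, `run_tool_loop`, `chat`, and `handlers` are hypothetical, and `chat` stands in for whatever function actually posts to the local endpoint.

```python
import json
from typing import Any, Callable

Message = dict[str, Any]


def to_openai_tool(decl: dict[str, Any]) -> dict[str, Any]:
    """Wrap a declaration dict (name/description/parameters) in the OpenAI `tools` envelope.

    Assumes `parameters` is already JSON-Schema shaped; any type-name
    normalisation a real Gemini declaration needs is not handled here.
    """
    return {
        "type": "function",
        "function": {
            "name": decl["name"],
            "description": decl.get("description", ""),
            "parameters": decl.get("parameters", {"type": "object", "properties": {}}),
        },
    }


def run_tool_loop(
    chat: Callable[[list[Message], list[dict[str, Any]]], dict[str, Any]],
    messages: list[Message],
    tools: list[dict[str, Any]],
    handlers: dict[str, Callable[..., Any]],
    max_iters: int = 8,
) -> str:
    """Send tools, execute any tool_calls, append results, repeat until a final answer."""
    for _ in range(max_iters):
        choice = chat(messages, tools)["choices"][0]
        reply = choice["message"]
        messages.append(reply)
        calls = reply.get("tool_calls") or []
        if choice.get("finish_reason") == "stop" or not calls:
            return reply.get("content") or ""
        for call in calls:
            name = call["function"]["name"]
            try:
                args = json.loads(call["function"]["arguments"] or "{}")
                result = handlers[name](**args)
            except (json.JSONDecodeError, KeyError, TypeError) as exc:
                # Malformed arguments, unknown tool, or bad kwargs: report the
                # error back to the model instead of aborting the run.
                result = {"error": f"{type(exc).__name__}: {exc}"}
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}
            )
    raise RuntimeError("tool loop exceeded max_iters without a final answer")
```

Passing `chat` in as a callable keeps the loop testable without a running model. Feeding tool errors back as a `tool` message is one possible reading of the "malformed tool calls" item, not a description of how the Gemini orchestrator handles them.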

---

## 🟡 Medium Priority

### [UI] Progressive Web App (PWA)
Low effort, meaningful mobile UX improvement — install Cortex as a home screen app.
- [ ] Add `manifest.json` (name, icons, theme color, display: standalone, start_url)
- [ ] Serve `manifest.json` from `cortex/routers/ui.py` or as a static file
- [ ] Add `<link rel="manifest">` to `index.html`
- [ ] Basic service worker for offline shell (cache static assets; network-first for API)
- [ ] Register service worker in `app.js`
- [ ] Test on iOS (Safari) and Android (Chrome) — both support PWA install prompts
### [UI] Progressive Web App (PWA) ✅ — 2026-04-29
- manifest.json, sw.js, icon-192/512.png, SW registration in app.js
- `/manifest.json` and `/sw.js` served at root; added to `_PUBLIC` in auth_middleware
- Tested: install prompt confirmed working in Chromium

### [Tools] Orchestrator tool expansions
New tools for `cortex/tools/` — higher-value additions that fill obvious gaps.
- [ ] **`cortex_restart`** — `systemctl --user restart cortex`; lets Inara apply her own
      config changes without human intervention. Return last 10 log lines after restart.
- [ ] **`cortex_logs`** — `journalctl --user -u cortex -n N` — tail service logs for debugging
- [ ] **`http_fetch`** — fetch a URL and return content; for health checks, API probing,
      webhook testing. Different from `web_search` — direct URL, returns raw response.
- [ ] **`file_list`** — list files and directories at a path; currently only `file_read` exists
- [ ] **`file_write`** — write content to a file (with path allow-list for safety)
- [ ] **`nc_talk_send`** — proactively send a message to the user via Nextcloud Talk
      (outbound; complements the proactive notifications channel work)
- [ ] **`email_send`** — send email via existing `email_utils.py` SMTP helper
- [ ] **`web_push`** — send a browser push notification (requires push subscription stored
      per-user; pairs well with the PWA service worker already in place)
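
The path allow-list for `file_write` is the one item above with a real safety edge, so here is a rough sketch of how the check could work. This is an illustration, not an existing Cortex tool: the function names, the `allowed_roots` parameter, and the return shape are all assumptions. The key point is resolving the path before comparing, so `..` segments and symlinks cannot step outside an allowed root.

```python
from pathlib import Path


def resolve_allowed(path: str, allowed_roots: list[Path]) -> Path:
    """Return the fully resolved target, or raise if it falls outside every allowed root."""
    target = Path(path).expanduser().resolve()
    for root in allowed_roots:
        # Compare resolved paths so "../" tricks and symlinked roots are handled.
        if target.is_relative_to(root.resolve()):
            return target
    raise PermissionError(f"{target} is outside the allow-list")


def file_write(path: str, content: str, allowed_roots: list[Path]) -> dict:
    """Write `content` to `path` only when the resolved path is inside an allowed root."""
    target = resolve_allowed(path, allowed_roots)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return {"path": str(target), "bytes": len(content.encode("utf-8"))}
```

`Path.is_relative_to` needs Python 3.9 or newer. The same `resolve_allowed` helper could back `file_list` if that tool should be restricted to the same roots.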

### [Channel] Proactive notifications
Inara reaches out on her own initiative via NC Talk or Google Chat when a reminder
@@ -115,6 +124,22 @@ See `ARCH__Intelligence_Layer.md` for full design.

## 🟢 Lower Priority / Future

### [Research] Agent architecture patterns — review before building dev agent pipeline
The Claude Code system prompt was leaked April 2026. Two reimplementation repos have
useful design ideas directly applicable to the local orchestrator and dev agent work.
Read before finalising either design.
- [ ] Review https://github.com/HarnessLab/claw-code-agent (Python, targets local models)
- [ ] Review https://github.com/ultraworkers/claw-code (community port, interesting source)
- Key ideas to evaluate for Cortex:
  - Tiered permission model (read-only / write / shell / unsafe) — relevant once dev
    agent is writing and executing code
  - Agent lineage tracking — which agent spawned which sub-agent; essential for the
    orchestrator → specialist → supervisor chain
  - Hard token/cost budgets per operation — local models have fixed context ceilings
  - Context compaction mid-session — trim stale tool results before hitting limit
  - Nested agent delegation with dependency-aware batching
  - Plugin/manifest-based tool registration — worth considering before tool suite grows
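
To make the budget and compaction ideas concrete, here is a minimal sketch of trimming stale tool results against a token budget. It is not taken from either repo above. The 4-characters-per-token estimate is a crude assumption (a real implementation would use the model's tokenizer), and `compact`, `estimate_tokens`, `STUB`, and `keep_recent` are hypothetical names.

```python
from typing import Any

Message = dict[str, Any]

STUB = "[tool result trimmed to save context]"


def estimate_tokens(messages: list[Message]) -> int:
    """Very rough size estimate: about 4 characters per token (assumption)."""
    return sum(len(str(m.get("content") or "")) for m in messages) // 4


def compact(messages: list[Message], budget: int, keep_recent: int = 2) -> list[Message]:
    """Replace the oldest tool results with a stub until the estimate fits the budget.

    The newest `keep_recent` tool results are never touched, and the input
    list is left unmodified (a shallow copy of each message is returned).
    """
    out = [dict(m) for m in messages]
    tool_idx = [i for i, m in enumerate(out) if m.get("role") == "tool"]
    stale = tool_idx[:-keep_recent] if keep_recent else tool_idx
    for i in stale:  # oldest first
        if estimate_tokens(out) <= budget:
            break
        out[i]["content"] = STUB
    return out
```

Stubbing the content rather than deleting the message keeps each `tool_call_id` paired with a reply, which OpenAI-style APIs generally expect. If the recent results alone exceed the budget, this sketch returns an over-budget list, so a caller would still need a hard stop.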

### [Sessions] Cross-session search
The file browser has per-file session search, but no way to query across all sessions
for a persona. A unified search would make the session archive useful as a knowledge source.
@@ -155,18 +180,52 @@ base accessible to local models. Endpoints documented in `docs/OPEN_WEBUI_API.md
- `/api/v1/files/` upload + `/api/v1/retrieval/process/web` for URLs
- Reference in chat via `"files": [{"type": "collection", "id": "..."}]`

### [Backend] Intelligent model routing
- Currently hardcoded: Claude default, Gemini fallback, local third
- Design direction (now informed by real local model perf):
  - **Private/offline tasks** → local (Gemma 4 E4B for speed, 26B A4B for reasoning)
  - **Complex tool tasks / long context** → Gemini (1M token context, strong function calling)
  - **Final user-facing responses** → Claude (quality prose, persona fidelity)
- Future: auto-route by task type rather than requiring user to toggle backend manually
### [Backend] Intelligent model routing — automatic task-type dispatch
Model Registry V2 (2026-04-27) added role-based routing and manual role toggle — that's
the foundation. What remains is removing the need to toggle manually.
- [ ] Classify incoming messages by task type (heuristic or lightweight classifier)
- [ ] Map task type → role → model automatically:
      - User conversation → `chat` role → Claude (quality prose, persona fidelity)
      - Tool/research tasks → `orchestrator` role → Gemini API or local
      - Private/sensitive → `local` role → Ollama (no data leaves network)
      - Long context (>50k tokens) → Gemini 2.0 (1M ctx window)
      - Fast/cheap queries → local E4B (25 t/s, no API cost)
- [ ] Routing logic in `llm_client.py` or new `router.py`; expose override in UI
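
As a starting point for the heuristic option, the mapping above could be sketched roughly as below. Everything here is illustrative: the keyword lists are made-up placeholders, the model labels are not registry ids, and sending long-context work through the `orchestrator` role is an assumption (the list only names Gemini). The "fast/cheap" case is left out because it needs a signal this sketch does not have.

```python
from dataclasses import dataclass
from typing import Optional

LONG_CONTEXT_TOKENS = 50_000  # threshold from the list above

# Hypothetical keyword hints; a real classifier would be tuned on actual traffic.
PRIVATE_HINTS = ("password", "confidential", "private")
TOOL_HINTS = ("search", "fetch", "look up", "restart")

# task type -> (role, model label). Labels are placeholders, not registry ids.
ROUTES = {
    "private": ("local", "local"),
    "long_context": ("orchestrator", "gemini"),  # role is an assumption
    "tool": ("orchestrator", "gemini-or-local"),
    "conversation": ("chat", "claude"),
}


@dataclass(frozen=True)
class Route:
    task_type: str
    role: str
    model: str


def route(message: str, context_tokens: int = 0, override_role: Optional[str] = None) -> Route:
    """Pick a role for a message; a manual override always wins."""
    if override_role is not None:
        return Route("override", override_role, "per-registry")
    text = message.lower()
    if any(hint in text for hint in PRIVATE_HINTS):
        task = "private"  # checked first so sensitive text never leaves the network
    elif context_tokens > LONG_CONTEXT_TOKENS:
        task = "long_context"
    elif any(hint in text for hint in TOOL_HINTS):
        task = "tool"
    else:
        task = "conversation"
    role, model = ROUTES[task]
    return Route(task, role, model)
```

The check order is the main design choice: privacy outranks context size, which outranks tool detection. The `override_role` argument corresponds to the UI override in the last item.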

### [Ops] Permanent fleet hosting — home server deployment
Currently running on `scott-lt-i7-rtx` (gaming laptop). Long-term target is the
home server for always-on reliability. `docker-compose.yml` already exists.
- [ ] Copy project to home server
- [ ] Configure Nginx reverse proxy (already Docker-hosted on that machine)
- [ ] Point `cortex.dgrzone.com` → home server internal IP (pfSense alias update)
- [ ] WireGuard required for all access — not internet-exposed
- [ ] Update `FLEET_MANIFEST.md` to reflect new hosting location

### [Future] Cortex Mesh — multi-instance fleet coordination
Each fleet device runs its own Cortex instance. Instances delegate tasks to each
other based on resources and specialisation. No central coordinator required.
- Concept only — no design yet. Resolve these questions before building:
  - Auth between instances (shared JWT secret vs. per-instance API keys)
  - Capability advertisement (model registry over HTTP? shared Syncthing file?)
  - Whether `ae_send_message` / the inbox system is the right coordination layer
  - Session continuity — does a conversation stay on one node or migrate?
- Natural foundation already in place: Syncthing-synced `home/` and shared
  `model_registry.json` mean instances share persona memory without a central DB

---

## ✅ Completed

### [UI] Progressive Web App (PWA) — 2026-04-29
- `manifest.json`, `sw.js`, PNG icons (192/512) generated via rsvg-convert
- `/manifest.json` and `/sw.js` served at root via ui.py; exempted in auth_middleware
- Theme-color meta tag updated dynamically on light/dark toggle
- Install prompt confirmed working in Chromium desktop; apple-touch-icon for iOS

### [UI] CodeMirror markdown editor for identity/memory files — 2026-04-28
- Replaced textarea in Files panel with CodeMirror 5 (markdown mode, CDN)
- Syntax highlighting, line wrapping, Ctrl+S to save, per-file undo history

### [UI] Input area polish — 2026-04-28
- Single cycling S/M/L button replaces 3 separate height buttons (same UX as font size)
- S size collapses mode-select to a row (compact); M/L keep vertical column layout