session_search (tools/files.py): - Full-text search across past session logs, exposed to the orchestrator - Params: query (required), limit (default 5, max 20) - Returns dated excerpts, newest first; own sessions only via ContextVars - User-level — no TOOL_ROLES gating needed - Registered in __init__.py callables + TOOL_CATEGORIES["Files"] ARCH__FUTURE.md §2: updated tool count to 44, marked prior tools complete, added Round 2 planned tools table (session_search now done, reminders due dates, http_post, nc_talk_history, task_list priority filter, http_fetch max_chars), noted datetime_now is not needed (already in system prompt via context_loader) TODO__Agents.md: session_search checked off, Round 2 task list added Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
12 KiB
Architecture: Planned Features
What's next and how it's designed to work. Last updated: 2026-04-29
For the current task list see TODO__Agents.md. For phases and priorities see ROADMAP.md.
1. Local Orchestrator
Status: Partially built — openai_orchestrator.py exists and is wired into POST /orchestrate. When the orchestrator role in the model registry resolves to a local_openai model, it routes there automatically. Remaining work is quality/reliability parity with the Gemini orchestrator, not ground-up design.
Same ReAct tool loop as the Gemini API orchestrator, driven by a local model via Open WebUI's OpenAI-compatible API. Enables offline/private agent tasks with no API cost.
Why local models work for this now: Gemma 4 E4B and 26B A4B both support OpenAI tools / tool_choice function calling. The tool schema is nearly identical to Gemini's FunctionDeclaration — minor field renaming only.
Design:
POST /orchestrate (role resolves to local_openai model)
↓
openai_orchestrator.py
• converts tools/ to OpenAI tools format
• POST /api/chat/completions with tools array
• parse tool_calls response
• execute tool, append result
• loop until finish_reason: "stop"
↓
response returned (local model generates final answer)
Model selection:
- Gemma 4 E4B (25 t/s, 72k ctx) — interactive/fast tasks
- Gemma 4 26B A4B (9 t/s, 50k ctx) — heavier reasoning, background tasks
Context budget per iteration (system prompt + memory + tool results + history):
- Small model: budget ~40–50k tokens per round
- Medium model: budget ~35–40k tokens per round
Context compaction (to implement): automatically trim stale tool results mid-run when approaching the budget ceiling, preserving only the most recent N tool exchanges.
Full API reference: docs/OPEN_WEBUI_API.md
2. Orchestrator Tool Expansions
Status: Ongoing. Current tool count: 44. Previously planned tools are all complete.
Completed
All originally planned tools are live: cortex_restart, cortex_logs, http_fetch,
file_list, file_write, nc_talk_send, email_send, web_push, agent_notes_*.
Next additions
Datetime note: The current date and time is already injected into every system prompt
via context_loader.py (--- System --- Current date and time: ...). A dedicated
datetime_now tool is not needed — the timestamp is always in context.
| Tool | Module | Priority | Description |
|---|---|---|---|
session_search |
new search.py or files.py |
High | Full-text search across past session logs. The UI search already exists (GET /sessions/search) — this exposes it to the orchestrator so the agent can answer "what did we discuss about X last month?" |
reminders due dates |
reminders.py |
Medium | Add optional due field to reminders_add. Surface only due/overdue reminders in context rather than the full flat list. Makes reminders time-aware rather than always-on noise. |
http_post |
web.py |
Medium | POST to an external URL — for webhooks, REST APIs, form submissions. Requires a per-user host allowlist (same pattern as email_send) to prevent misuse. |
nc_talk_history |
notify.py |
Medium | Read recent messages from a Nextcloud Talk conversation. The bot can send but cannot read — adding read capability gives it full context before replying. |
task_list priority filter |
tasks.py |
Low | task_list accepts status but not priority. Add priority param so the agent can ask "what are my high-priority tasks?" without returning everything. |
http_fetch max_chars |
web.py |
Low | Currently hardcapped at 8,192 chars. Accept optional max_chars param so callers can request more or less content. |
Not needed / deferred
datetime_now— already in system prompt (see note above)memory_read— memory files are already loaded into system prompt at Tier 2+; a tool adds no value except at Tier 1, which is a rare edge case- Calculator — modern models handle arithmetic well;
shell_execcovers edge cases for admins - Google Calendar — useful but requires Google API OAuth scope expansion; defer until auth layer supports it
3. Dev Agent Pipeline
Status: Design complete, not yet built. Review §8 (Agent Architecture Patterns) before starting.
Accept a plain-English task, implement code changes, verify them, and present for human approval before committing.
Task (chat / Gitea issue / Kanban)
↓
Orchestrator — reads relevant files, routes to specialist
↓
Specialist Agent (Claude CLI in project directory)
• implements the change
• runs self-check: py_compile / svelte-check
↓
Supervisor Agent
• reviews the diff
• runs test suite
• returns: PASS / NEEDS_REVIEW / FAIL + reason
↓
Human approval gate
• summary in Cortex UI or NC Talk
• approve → commit (+ optional push)
• reject → feedback back to specialist
Specialists (both Claude CLI):
- Frontend — working dir:
~/OSIT_dev/aether_app_sveltekit/— runssvelte-checkafter every change - Backend — working dir:
~/OSIT_dev/aether_api_fastapi/— runspy_compile+ unit tests
Supervisor returns structured JSON:
{
"verdict": "PASS | NEEDS_REVIEW | FAIL",
"checks_passed": ["py_compile"],
"checks_failed": [],
"review_notes": "...",
"commit_message": "..."
}
4. Gitea Integration
Status: Not started. pfSense port forward for SSH already confirmed working.
- Webhooks → Cortex: push/PR/issue events →
POST /webhook/gitea→ orchestrator- Router pattern already established; add
cortex/routers/gitea.py
- Router pattern already established; add
- Gitea Actions CI:
.gitea/workflows/check.yml— runpy_compile/svelte-checkon push - Cortex → Gitea: after human approval, call Gitea API to create PR or push branch
SSH clone/push: git clone ssh://git@git.dgrzone.com:2222/<user>/<repo>.git
5. Knowledge Layer (AE Journals)
Status: Tools exist, import script not yet built.
AE Journals becomes the searchable long-term knowledge base. Complements memory distillation: memory files cover "what have we been working on lately"; Journals cover "what do I know about topic X".
Existing tools: ae_journal_search, ae_journal_entry_create — already in orchestrator tool suite.
Import script (to build):
- Walk a markdown directory (Nextcloud, agents_sync docs)
- Chunk by H2 section
- Search before creating (deduplication)
- Tag from frontmatter, filename, directory path
- Target sources:
~/DgrZone_Nextcloud/,~/OSIT_Nextcloud/
Agent workflow:
"Summarize my notes on WireGuard setup"
→ orchestrator calls ae_journal_search("wireguard")
→ returns matching entries
→ Claude synthesizes response
6. Intelligent Model Routing
Status: Partially addressed. Model Registry V2 (2026-04-27) introduced role-based routing —
chat, orchestrator, distill, coder, research roles each have their own primary/backup
model chain, and the UI role toggle lets users manually select which role handles a message.
Automatic task-characteristic routing (below) is still deferred.
Route automatically based on task characteristics rather than requiring manual selection:
| Task type | Backend | Reason |
|---|---|---|
| User-facing conversation | Claude | Quality prose, persona fidelity |
| Tool use / orchestration | Gemini API or local | Native function calling |
| Private / sensitive / offline | Local (Ollama) | No data leaves the network |
| Long context (>50k tokens) | Gemini 2.0 | 1M token context window |
| Fast/cheap simple queries | Local (E4B) | 25 t/s, no API cost |
Routing logic would live in llm_client.py or a new router.py — map task metadata to backend choice.
7. RAG via Open WebUI
Status: Future — Open WebUI already supports it.
Feed Nextcloud documents or session logs into Open WebUI knowledge collections. Reference them in local model chat via "files": [{"type": "collection", "id": "..."}].
Would complement AE Journals for local-only contexts where data shouldn't leave the network.
API reference: docs/OPEN_WEBUI_API.md — RAG section.
8. Agent Architecture Patterns — Research
Status: Research — review before building dev agent pipeline and local orchestrator.
The Claude Code system prompt was leaked in early April 2026. Two reimplementation repos are worth reading for design ideas before building out the dev agent pipeline and local orchestrator:
- https://github.com/HarnessLab/claw-code-agent — Python reimplementation targeting local models (Qwen3-Coder recommended); most technically detailed
- https://github.com/ultraworkers/claw-code — Community porting/reverse-engineering project; reportedly has interesting detail in the source code itself
Ideas worth incorporating:
Tiered permission architecture — explicit read-only / write / shell / unsafe modes, each requiring an opt-in flag. Currently Cortex has implicit trust for agent operations. Relevant once the dev agent pipeline is writing and executing code — don't want a brief cron job accidentally in write mode.
Agent lineage tracking — agent manager records which agent spawned which sub-agent. Useful for debugging multi-step orchestrated tasks and essential for the supervisor → specialist → approval gate chain.
Cost/budget enforcement — hard token and cost budgets per operation, multiple budget types. ORCHESTRATOR_MAX_ROUNDS=10 is Cortex's only guardrail today. Worth adding a token budget check to the tool loop, especially relevant for local models with hard context ceilings (72k/50k practical).
Context compaction/snipping — automatic mid-session context trimming when approaching limits. Important for long orchestrator runs against local models. Could trim tool results that are no longer needed for the current reasoning step.
Nested agent delegation with dependency-aware batching — sub-agents that know their parent; parallel sub-tasks batched by dependency order. Directly applicable to the dev agent pipeline (orchestrator → specialist → supervisor, with some steps parallelizable).
File history journaling — beyond session logs, a journal of what files changed and why, with replay summaries. Different from memory distillation — more like a git log for agent actions. Could complement the supervisor agent's diff review.
Plugin/manifest-based tool extensions — tools declared via manifest rather than hardcoded in __init__.py. Would make adding new orchestrator tools less invasive. Worth considering before the tool suite grows much larger (currently 27 tools).
9. Permanent Fleet Hosting
Status: Deferred. Currently running on scott-lt-i7-rtx (gaming/agents laptop).
Long-term target: home server (always-on, Docker). docker-compose.yml already exists in the project root.
Deployment path:
- Copy to home server
- Configure reverse proxy (Nginx, already Docker-hosted)
- Update
cortex.dgrzone.com→ home server internal IP in pfSense - WireGuard required for all access — not internet-exposed
- Update
FLEET_MANIFEST.mdand CLAUDE.md fleet table
10. Cortex Mesh — Multi-Instance Fleet
Status: Concept — no design yet.
Rather than a single Cortex instance, each device in the fleet runs its own instance with its own persona(s), local models, and capabilities. Instances can delegate tasks to each other based on available resources and roles.
Use cases:
scott_lpt(edit/dev node) delegates code tasks toscott-lt-i7-rtx(GPU/Ollama host)- A background cron on one instance triggers an orchestrated task on another
- Each instance has its own "best available" model — mesh routing picks the right node automatically
Design questions to resolve:
- Auth between instances (shared JWT secret vs. per-instance API keys)
- How instances advertise capabilities (model registry over HTTP? shared Syncthing file?)
- Whether
ae_send_message/ the existing inbox system is the right coordination layer or if a dedicated Cortex-to-Cortex protocol is needed - Session continuity — does a conversation that starts on one node stay there, or can it migrate?
The Syncthing-synced home/ directory and shared model_registry.json already provide a natural foundation — instances share persona memory and context without a central DB.