Cortex-Inara/documentation/ARCH__FUTURE.md
Scott Idem 658c508925 feat: multi-level agent management — background agents, lifecycle tools, 3-level hierarchy
agent_manager.py (new):
- AgentRecord dataclass: agent_id, level (1/2/3), role, task, status, started,
  parent_id (lineage), finished, result, notify, _task_ref
- register() / finish() / cancel_agent() / list_agents() / get() / set_task_ref()
- Calls notification.notify() on completion when notify=True (same channel as
  reminders and cron completions)
- 24-hour pruning of completed records on each new registration

spawn_agent (tools/agents.py):
- background=True: fires asyncio.create_task(), registers in agent_manager, returns
  agent_id string immediately — sync path unchanged (no regression)
- notify=True: push/Talk notification when the background task completes
- Level enforcement: _agent_level param tracks hierarchy depth; when spawning from
  Level 2, child automatically gets spawn_agent + aider_run denied so Level 3 agents
  cannot delegate further

New lifecycle tools (tools/agents.py + __init__.py):
- agent_status(agent_id) — status, role, level, elapsed, task, result preview; user-level
- agent_list(status, limit) — all agents for current user, newest first; user-level
- agent_cancel(agent_id) — kills background task; admin-only, confirm-required

tests/test_agent_manager.py (new, 41 tests):
- agent_manager CRUD, pruning, notification hook
- spawn_agent background: returns immediately, completes async, timeout, failure
- Level enforcement: L1→L2 permits spawn, L2→L3 auto-denies; explicit tool_list path
- agent_status / agent_list / agent_cancel output formatting
- aider_run background: returns agent_id, completes async, sync path unchanged
- All tests run without browser or Cortex service (~2.5s total)
  Run: cd cortex && .venv/bin/python -m pytest tests/test_agent_manager.py -v

Docs: ARCH__FUTURE.md §13 (full design), ROADMAP.md, TODO__Agents.md, MASTER.md,
HELP.md (orchestrator description corrected, tool schema line updated to reflect
keyword routing), CLAUDE.md tool count 66→69.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-03 22:40:20 -04:00


Architecture: Planned Features

What's next and how it's designed to work. Last updated: 2026-05-11

For the current task list see TODO__Agents.md. For phases and priorities see ROADMAP.md.


1. Local Orchestrator

Status: Partially built — openai_orchestrator.py exists and is wired into POST /orchestrate. When the orchestrator role in the model registry resolves to a local_openai model, it routes there automatically. Remaining work is quality/reliability parity with the Gemini orchestrator, not ground-up design.

Same ReAct tool loop as the Gemini API orchestrator, driven by a local model via Open WebUI's OpenAI-compatible API. Enables offline/private agent tasks with no API cost.

Why local models work for this now: Gemma 4 E4B and 26B A4B both support OpenAI tools / tool_choice function calling. The tool schema is nearly identical to Gemini's FunctionDeclaration — minor field renaming only.
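The field renaming could look roughly like the sketch below. It assumes a Gemini-style declaration is available as a plain dict with name, description and parameters keys, and that its schema uses upper-case type names such as "OBJECT"; to_openai_tool and _lower_types are hypothetical helpers, not existing Cortex functions.

```python
from typing import Any


def _lower_types(schema: Any) -> Any:
    """Recursively lower-case JSON-schema "type" values ("OBJECT" -> "object")."""
    if isinstance(schema, dict):
        return {
            key: value.lower() if key == "type" and isinstance(value, str)
            else _lower_types(value)
            for key, value in schema.items()
        }
    if isinstance(schema, list):
        return [_lower_types(item) for item in schema]
    return schema


def to_openai_tool(decl: dict) -> dict:
    """Wrap a Gemini-style declaration (assumed dict shape) as an OpenAI tool."""
    empty = {"type": "object", "properties": {}}
    return {
        "type": "function",
        "function": {
            "name": decl["name"],
            "description": decl.get("description", ""),
            "parameters": _lower_types(decl.get("parameters", empty)),
        },
    }
```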

Design:

POST /orchestrate  (role resolves to local_openai model)
    ↓
openai_orchestrator.py
    • converts tools/ to OpenAI tools format
    • POST /api/chat/completions with tools array
    • parse tool_calls response
    • execute tool, append result
    • loop until finish_reason: "stop"
    ↓
response returned (local model generates final answer)
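
The loop in the diagram above can be sketched as follows. This is a minimal illustration, not the contents of openai_orchestrator.py: the chat callable is a hypothetical transport that takes (messages, tools) and returns one OpenAI-style choice dict, and registry is an assumed name-to-function map for the tools.

```python
import json
from typing import Callable

# Hypothetical transport signature: (messages, tools) -> one "choice" dict.
ChatFn = Callable[[list[dict], list[dict]], dict]


def run_tool_loop(chat: ChatFn, tools: list[dict], registry: dict,
                  messages: list[dict], max_rounds: int = 10) -> str:
    """Call the model, run any requested tools, repeat until it stops."""
    for _ in range(max_rounds):
        choice = chat(messages, tools)
        msg = choice["message"]
        messages.append(msg)
        calls = msg.get("tool_calls") or []
        if choice.get("finish_reason") == "stop" or not calls:
            return msg.get("content") or ""
        for call in calls:
            fn = call["function"]
            # OpenAI-style tool calls carry arguments as a JSON string.
            args = json.loads(fn.get("arguments") or "{}")
            handler = registry.get(fn["name"])
            result = handler(**args) if handler else f"unknown tool: {fn['name']}"
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": str(result),
            })
    return "max rounds reached"
```

Passing the transport in as a callable keeps the loop testable without a running Open WebUI instance; the default of 10 rounds mirrors ORCHESTRATOR_MAX_ROUNDS=10 mentioned in §8.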

Model selection:

  • Gemma 4 E4B (25 t/s, 72k ctx) — interactive/fast tasks
  • Gemma 4 26B A4B (9 t/s, 50k ctx) — heavier reasoning, background tasks

Context budget per iteration (system prompt + memory + tool results + history):

  • Small model: budget ~40-50k tokens per round
  • Medium model: budget ~35-40k tokens per round

Context compaction (to implement): automatically trim stale tool results mid-run when approaching the budget ceiling, preserving only the most recent N tool exchanges.
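
One possible shape for that compaction step is sketched below. The 4-characters-per-token estimate, the placeholder text and the function names are all assumptions for illustration; a real implementation would likely use the model's tokenizer.

```python
def estimate_tokens(messages: list[dict]) -> int:
    """Rough size estimate: about 4 characters per token (an assumption)."""
    return sum(len(str(m.get("content") or "")) for m in messages) // 4


def compact_tool_results(messages: list[dict], budget_tokens: int,
                         keep_last: int = 3,
                         placeholder: str = "[tool result trimmed]") -> list[dict]:
    """Blank out all but the newest keep_last tool results when over budget."""
    if estimate_tokens(messages) <= budget_tokens:
        return messages  # under budget: nothing to do
    tool_idx = [i for i, m in enumerate(messages) if m.get("role") == "tool"]
    stale = set(tool_idx[:-keep_last]) if keep_last > 0 else set(tool_idx)
    # Build a new list so the caller's history is left untouched.
    return [
        {**m, "content": placeholder} if i in stale else m
        for i, m in enumerate(messages)
    ]
```

Replacing the content rather than deleting the message keeps each tool_call_id paired with a tool message, which OpenAI-compatible APIs generally expect.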

Full API reference: docs/OPEN_WEBUI_API.md


2. Orchestrator Tool Expansions

Status: Ongoing. Current tool count: 45. Previously planned tools are all complete.

Completed

All originally planned tools are live: cortex_restart, cortex_logs, http_fetch, file_list, file_write, nc_talk_send, email_send, web_push, agent_notes_*.

Next additions

Datetime note: The current date and time is already injected into every system prompt via context_loader.py (--- System --- Current date and time: ...). A dedicated datetime_now tool is not needed — the timestamp is always in context.

Completed Round 2

| Tool | Notes |
| --- | --- |
| session_search | tools/files.py: full-text grep across session logs; params: query, limit (max 20); own sessions only via ContextVars. 2026-05-08 |
| reminders due dates | tools/reminders.py: optional due: YYYY-MM-DD on reminders_add; load_due_reminders() suppresses future-dated entries from context. 2026-05-08 |
| spawn_agent | tools/agents.py: sync sub-agent via role model; semaphore per host (max_concurrent in host schema); asyncio.wait_for timeout; admin-only. 2026-05-08 |

Remaining Round 2

| Tool | Module | Priority | Description |
| --- | --- | --- | --- |
| http_post | web.py | Medium | POST to an external URL, for webhooks, REST APIs, form submissions. Requires a per-user host allowlist (same pattern as email_send) to prevent misuse. |
| nc_talk_history | notify.py | Medium | Read recent messages from a Nextcloud Talk conversation. The bot can send but cannot read; adding read capability gives it full context before replying. |
| task_list priority filter | tasks.py | Low | task_list accepts status but not priority. Add priority param so the agent can ask "what are my high-priority tasks?" without returning everything. |
| http_fetch max_chars | web.py | Low | Currently hardcapped at 8,192 chars. Accept optional max_chars param so callers can request more or less content. |
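
The host allowlist check that http_post would need could be as small as the sketch below. The function name and the idea of passing the user's allowlist as a set are assumptions; how email_send stores its allowlist is not shown in this document.

```python
from urllib.parse import urlsplit


def is_post_allowed(url: str, allowed_hosts: set[str]) -> bool:
    """Allow only http(s) URLs whose exact hostname is on the user's allowlist."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()  # hostname drops port and userinfo
    return host in {h.lower() for h in allowed_hosts}
```

Matching on the exact hostname (rather than a substring or suffix) avoids accepting look-alike hosts such as hooks.example.com.evil.net.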

Not needed / deferred

  • datetime_now — already in system prompt (see note above)
  • memory_read — memory files are already loaded into system prompt at Tier 2+; a tool adds no value except at Tier 1, which is a rare edge case
  • Calculator — modern models handle arithmetic well; shell_exec covers edge cases for admins
  • Google Calendar — useful but requires Google API OAuth scope expansion; defer until auth layer supports it

3. Dev Agent Pipeline

Status: Design complete, not yet built. Review §8 (Agent Architecture Patterns) before starting.

Accept a plain-English task, implement code changes, verify them, and present for human approval before committing.

Task (chat / Gitea issue / Kanban)
    ↓
Orchestrator — reads relevant files, routes to specialist
    ↓
Specialist Agent (Claude CLI in project directory)
    • implements the change
    • runs self-check: py_compile / svelte-check
    ↓
Supervisor Agent
    • reviews the diff
    • runs test suite
    • returns: PASS / NEEDS_REVIEW / FAIL + reason
    ↓
Human approval gate
    • summary in Cortex UI or NC Talk
    • approve → commit (+ optional push)
    • reject → feedback back to specialist

Specialists (both Claude CLI):

  • Frontend — working dir: ~/OSIT_dev/aether_app_sveltekit/ — runs svelte-check after every change
  • Backend — working dir: ~/OSIT_dev/aether_api_fastapi/ — runs py_compile + unit tests

Supervisor returns structured JSON:

{
  "verdict": "PASS | NEEDS_REVIEW | FAIL",
  "checks_passed": ["py_compile"],
  "checks_failed": [],
  "review_notes": "...",
  "commit_message": "..."
}
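
A sketch of how the calling side might validate that JSON before acting on it. SupervisorResult and parse_supervisor_result are hypothetical names; only the field names and the three verdict values come from the structure above.

```python
import json
from dataclasses import dataclass, field

VERDICTS = {"PASS", "NEEDS_REVIEW", "FAIL"}


@dataclass
class SupervisorResult:
    verdict: str
    checks_passed: list[str] = field(default_factory=list)
    checks_failed: list[str] = field(default_factory=list)
    review_notes: str = ""
    commit_message: str = ""


def parse_supervisor_result(raw: str) -> SupervisorResult:
    """Parse the supervisor's JSON and reject unknown verdicts."""
    data = json.loads(raw)
    verdict = data.get("verdict")
    if verdict not in VERDICTS:
        raise ValueError(f"unexpected verdict: {verdict!r}")
    return SupervisorResult(
        verdict=verdict,
        checks_passed=list(data.get("checks_passed", [])),
        checks_failed=list(data.get("checks_failed", [])),
        review_notes=str(data.get("review_notes", "")),
        commit_message=str(data.get("commit_message", "")),
    )
```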

4. Gitea Integration

Status: Not started. pfSense port forward for SSH already confirmed working.

  • Webhooks → Cortex: push/PR/issue events → POST /webhook/gitea → orchestrator
    • Router pattern already established; add cortex/routers/gitea.py
  • Gitea Actions CI: .gitea/workflows/check.yml — run py_compile/svelte-check on push
  • Cortex → Gitea: after human approval, call Gitea API to create PR or push branch

SSH clone/push: git clone ssh://git@git.dgrzone.com:2222/<user>/<repo>.git


5. Knowledge Layer (AE Journals)

Status: Tools exist, import script not yet built.

AE Journals becomes the searchable long-term knowledge base. Complements memory distillation: memory files cover "what have we been working on lately"; Journals cover "what do I know about topic X".

Existing tools: ae_journal_search, ae_journal_entry_create — already in orchestrator tool suite.

Import script (to build):

  • Walk a markdown directory (Nextcloud, agents_sync docs)
  • Chunk by H2 section
  • Search before creating (deduplication)
  • Tag from frontmatter, filename, directory path
  • Target sources: ~/DgrZone_Nextcloud/, ~/OSIT_Nextcloud/
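
The "chunk by H2 section" step might look like the sketch below. chunk_by_h2 is a hypothetical helper; it splits on lines starting with "## " and does not special-case headings that appear inside fenced code blocks.

```python
def chunk_by_h2(markdown: str) -> list[tuple[str, str]]:
    """Split a markdown document into (heading, body) chunks at each '## ' line.

    Text before the first H2 is returned with an empty heading.
    """
    chunks: list[tuple[str, str]] = []
    heading, lines = "", []

    def flush() -> None:
        # Skip an empty preamble, but keep headed sections even if empty.
        if heading or any(line.strip() for line in lines):
            chunks.append((heading, "\n".join(lines).strip()))

    for line in markdown.splitlines():
        if line.startswith("## "):
            flush()
            heading, lines = line[3:].strip(), []
        else:
            lines.append(line)
    flush()
    return chunks
```

Each (heading, body) pair could then be passed to ae_journal_search for the dedup check before ae_journal_entry_create is called.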

Agent workflow:

"Summarize my notes on WireGuard setup"
    → orchestrator calls ae_journal_search("wireguard")
    → returns matching entries
    → Claude synthesizes response

6. Intelligent Model Routing

Status: Partially addressed. Model Registry V2 (2026-04-27) introduced role-based routing — chat, orchestrator, distill, coder, research roles each have their own primary/backup model chain, and the UI role toggle lets users manually select which role handles a message. Automatic task-characteristic routing (below) is still deferred.

Route automatically based on task characteristics rather than requiring manual selection:

| Task type | Backend | Reason |
| --- | --- | --- |
| User-facing conversation | Claude | Quality prose, persona fidelity |
| Tool use / orchestration | Gemini API or local | Native function calling |
| Private / sensitive / offline | Local (Ollama) | No data leaves the network |
| Long context (>50k tokens) | Gemini 2.0 | 1M token context window |
| Fast/cheap simple queries | Local (E4B) | 25 t/s, no API cost |

Routing logic would live in llm_client.py or a new router.py — map task metadata to backend choice.
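
A minimal sketch of such a mapping, following the table above. TaskMeta, its fields, the backend labels and the precedence order (privacy first, then context size, then tool use) are all assumptions for illustration; the document does not define them.

```python
from dataclasses import dataclass


@dataclass
class TaskMeta:
    # Hypothetical task metadata; field names are illustrative only.
    private: bool = False
    context_tokens: int = 0
    needs_tools: bool = False
    user_facing: bool = True


def pick_backend(task: TaskMeta) -> str:
    """Map task characteristics to a backend label, most restrictive rule first."""
    if task.private:
        return "local"             # no data leaves the network
    if task.context_tokens > 50_000:
        return "gemini"            # long-context window
    if task.needs_tools:
        return "gemini_or_local"   # native function calling
    if task.user_facing:
        return "claude"            # prose quality, persona fidelity
    return "local_e4b"             # fast/cheap simple queries
```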


7. RAG via Open WebUI

Status: Future — Open WebUI already supports it.

Feed Nextcloud documents or session logs into Open WebUI knowledge collections. Reference them in local model chat via "files": [{"type": "collection", "id": "..."}].

Would complement AE Journals for local-only contexts where data shouldn't leave the network.

API reference: docs/OPEN_WEBUI_API.md — RAG section.


8. Agent Architecture Patterns — Research

Status: Research — review before building dev agent pipeline and local orchestrator.

The Claude Code system prompt was leaked in early April 2026. Two reimplementation repos are worth reading for design ideas before building out the dev agent pipeline and local orchestrator:

Ideas worth incorporating:

Tiered permission architecture — explicit read-only / write / shell / unsafe modes, each requiring an opt-in flag. Currently Cortex has implicit trust for agent operations. Relevant once the dev agent pipeline is writing and executing code — don't want a brief cron job accidentally in write mode.

Agent lineage tracking — agent manager records which agent spawned which sub-agent. Useful for debugging multi-step orchestrated tasks and essential for the supervisor → specialist → approval gate chain.

Cost/budget enforcement — hard token and cost budgets per operation, multiple budget types. ORCHESTRATOR_MAX_ROUNDS=10 is Cortex's only guardrail today. Worth adding a token budget check to the tool loop, especially relevant for local models with hard context ceilings (72k/50k practical).

Context compaction/snipping — automatic mid-session context trimming when approaching limits. Important for long orchestrator runs against local models. Could trim tool results that are no longer needed for the current reasoning step.

Nested agent delegation with dependency-aware batching — sub-agents that know their parent; parallel sub-tasks batched by dependency order. Directly applicable to the dev agent pipeline (orchestrator → specialist → supervisor, with some steps parallelizable).

File history journaling — beyond session logs, a journal of what files changed and why, with replay summaries. Different from memory distillation — more like a git log for agent actions. Could complement the supervisor agent's diff review.

Plugin/manifest-based tool extensions — tools declared via manifest rather than hardcoded in __init__.py. Would make adding new orchestrator tools less invasive. Worth considering before the tool suite grows much larger (currently 27 tools).


9. Permanent Fleet Hosting

Status: Deferred. Currently running on scott-lt-i7-rtx (gaming/agents laptop).

Long-term target: home server (always-on, Docker). docker-compose.yml already exists in the project root.

Deployment path:

  1. Copy to home server
  2. Configure reverse proxy (Nginx, already Docker-hosted)
  3. Update cortex.dgrzone.com → home server internal IP in pfSense
  4. WireGuard required for all access — not internet-exposed
  5. Update FLEET_MANIFEST.md and CLAUDE.md fleet table

10. Cortex Mesh — Multi-Instance Fleet

Status: Concept — no design yet.

Rather than a single Cortex instance, each device in the fleet runs its own instance with its own persona(s), local models, and capabilities. Instances can delegate tasks to each other based on available resources and roles.

Use cases:

  • scott_lpt (edit/dev node) delegates code tasks to scott-lt-i7-rtx (GPU/Ollama host)
  • A background cron on one instance triggers an orchestrated task on another
  • Each instance has its own "best available" model — mesh routing picks the right node automatically

Design questions to resolve:

  • Auth between instances (shared JWT secret vs. per-instance API keys)
  • How instances advertise capabilities (model registry over HTTP? shared Syncthing file?)
  • Whether ae_send_message / the existing inbox system is the right coordination layer or if a dedicated Cortex-to-Cortex protocol is needed
  • Session continuity — does a conversation that starts on one node stay there, or can it migrate?

The Syncthing-synced home/ directory and shared model_registry.json already provide a natural foundation — instances share persona memory and context without a central DB.

11. LLM Wiki — Persistent Knowledge Compilation (Karpathy Pattern)

Status: Concept — no design yet. Inspired by Karpathy's llm-wiki gist.

Core idea: Instead of treating AE Journals as an archive you retrieve from, evolve them into a living wiki that the LLM incrementally builds and maintains. When a new source is added, the LLM doesn't just index it — it reads it, extracts key information, and integrates it into the existing wiki: updating entity pages, revising topic summaries, flagging contradictions, strengthening or challenging the evolving synthesis. Knowledge is compiled once and kept current, not re-derived on every query.

This is a philosophical shift from our current approach (RAG/retrieval) toward compounding knowledge — the wiki gets richer with every source added and every question asked.

Three-Layer Architecture

Raw Sources (immutable)
    ↓
    → LLM reads, extracts, cross-references
Wiki (LLM-maintained markdown)  ← the persistent artifact
    → Human reads, LLM writes
Schema (CLAUDE.md / AGENTS.md)  ← configuration + conventions
  1. Raw sources — curated, immutable originals (articles, papers, session logs, transcripts). LLM reads from them, never modifies them.
  2. The wiki — directory of LLM-generated markdown files: summaries, entity pages, concept pages, comparisons, synthesis. The LLM owns this layer entirely. Creates pages, updates them when new sources arrive, maintains cross-references.
  3. Schema — a configuration document (analogous to our PROTOCOLS.md) that tells the LLM how the wiki is structured, what conventions to follow, and what workflows to use when ingesting sources or answering questions. Co-evolved with the human over time.

Operations

Ingest. Drop a new source into the raw collection and tell the LLM to process it. Flow: LLM reads source → discusses key takeaways with human → writes summary page → updates index → updates relevant entity/concept pages (a single source might touch 10-15 pages) → appends to log. Human stays involved, guiding emphasis.

Query. Ask questions against the wiki. LLM reads the index to find relevant pages, drills in, synthesizes an answer with citations. Key insight: good answers get filed back into the wiki as new pages. A comparison table, an analysis, a connection discovered — these are valuable and shouldn't disappear into chat history.

Lint. Periodic health check: contradictions between pages, stale claims superseded by newer sources, orphan pages with no inbound links, missing cross-references, data gaps that could be filled with a web search.
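
As one example of a lint check, orphan detection could be sketched as below. It assumes wiki pages cross-reference each other with Obsidian-style [[Page Name]] links, which is not yet decided in this design, and it ignores links from the index page since the index lists every page by definition.

```python
import re

# Assumed link syntax: [[Page]], [[Page|alias]] or [[Page#Section]].
LINK_RE = re.compile(r"\[\[([^\]|#]+)")


def find_orphans(pages: dict[str, str], roots: tuple[str, ...] = ("index",)) -> list[str]:
    """Return pages with no inbound links from other non-root pages."""
    linked: set[str] = set()
    for name, text in pages.items():
        if name in roots:
            continue  # the index links to everything; it does not count
        for target in LINK_RE.findall(text):
            target = target.strip()
            if target != name:  # self-links do not count either
                linked.add(target)
    return sorted(p for p in pages if p not in linked and p not in roots)
```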

Index and Log (Two Navigation Files)

index.md — content-oriented catalog. Every wiki page listed with link, one-line summary, and optional metadata (date, source count). Organized by category. LLM updates on every ingest. At moderate scale (~100 sources, ~hundreds of pages), this replaces the need for embedding-based RAG.

log.md — chronological, append-only record of what happened and when (ingests, queries, lint passes). Each entry starts with a consistent prefix (e.g. ## [2026-04-02] ingest | Article Title) making it parseable with simple tools like grep "^## \[" log.md | tail -5.
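
The same prefix convention is easy to write and parse from Python. The sketch below mirrors the grep example; format_entry and last_entries are hypothetical helper names, and the operation is assumed to be a single word such as ingest, query or lint.

```python
import re
from datetime import date

# Matches lines like: ## [2026-04-02] ingest | Article Title
ENTRY_RE = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$")


def format_entry(day: date, op: str, title: str) -> str:
    """Build one log heading using the consistent prefix."""
    return f"## [{day.isoformat()}] {op} | {title}"


def last_entries(log_text: str, n: int = 5) -> list[tuple[str, str, str]]:
    """Return the last n (date, operation, title) entries, oldest first."""
    matches = (ENTRY_RE.match(line) for line in log_text.splitlines())
    return [m.groups() for m in matches if m][-n:]
```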

Applicability to Cortex / Inara

This pattern maps naturally to several existing concepts:

| Karpathy Concept | Cortex Equivalent | Gap |
| --- | --- | --- |
| Raw sources | Session logs, imported docs | No curated raw-source collection yet |
| Wiki pages | AE Journals | Journals are entry-based, not interlinked-wiki-based |
| Index + Log | No equivalent | Would need wiki_index.md and wiki_log.md |
| Schema/Protocols | PROTOCOLS.md, OPERATIONS.md | Not configured for wiki maintenance workflows |
| Lint operation | No equivalent | No periodic wiki health-check exists |
| Answers filed back | Session chat history | Answers are lost after session (unless distilled) |
| Obsidian as IDE | Cortex UI / Files panel | Files panel could serve as the browsing surface |

Next steps (if pursued):

  1. Design the wiki directory structure within agents_sync/ — separate from session logs and memory files
  2. Define the schema document — what goes in a wiki page, cross-reference format, category taxonomy
  3. Build an ingest tool/script that reads a source and updates wiki pages (LLM-driven)
  4. Build a lint cron job that health-checks the wiki periodically
  5. Consider Obsidian compatibility for human browsing of the wiki graph

13. Multi-Level Agent Management

Status: Design complete — implementation not yet started. See TODO__Agents.md for the task breakdown.

Cortex personas can spawn specialized sub-agents to handle parallel or long-running work. Sub-agents can in turn spawn lightweight support agents for simple subtasks. The hierarchy is capped at three levels to prevent runaway delegation.

Level Definitions

| Level | Name | Created by | Can spawn | Tool scope |
| --- | --- | --- | --- | --- |
| 1 | Cortex Persona (Inara) | HTTP request / cron | Level 2 | Full orchestrator tool set |
| 2 | Specialized Sub-Agent | Level 1 spawn_agent | Level 3 only | Role-scoped; spawn_agent auto-restricted so children are Level 3 |
| 3 | Basic Support Agent | Level 2 spawn_agent | Nothing | Narrow tool set; spawn_agent and aider_run denied |

Examples:

  • Level 1 spawns a Level 2 Coder agent (has file + git + shell tools; can spawn a Level 3 syntax-checker)
  • Level 1 spawns a Level 2 Research agent (web tools only; can spawn a Level 3 web reader for parallel page fetches)
  • Level 2 spawns a Level 3 Support agent for a focused subtask (web_search only, no writes, no further delegation)

Core Problem: Everything is Currently Synchronous

Both spawn_agent and aider_run block the calling coroutine for their full duration (default 120s / 300s respectively). Level 1 (Inara) cannot respond to the user, send notifications, or inspect other agents while waiting. For 5-minute Aider runs or multi-step research agents this is unusable — the user sees nothing until completion or timeout.

Design

1. Agent Manager (cortex/agent_manager.py)

A lightweight in-process registry of running and recently completed agents. Module-level dict protected by asyncio.Lock():

import asyncio
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AgentRecord:
    agent_id: str           # UUID
    level: int              # 1 / 2 / 3
    role: str               # e.g. "coder", "research"
    task: str               # first 200 chars of the task
    status: str             # running / done / failed / cancelled / timeout
    started: datetime
    finished: datetime | None
    parent_id: str | None   # lineage: which agent spawned this one
    result: str | None      # populated on completion (first 500 chars)
    notify: bool            # fire web_push/NC Talk notification on completion
    user: str

_agents: dict[str, AgentRecord] = {}   # agent_id -> record
_lock = asyncio.Lock()                 # guards every read/write of _agents

On completion, the manager calls notify() in notification.py if notify=True; this is the same function used by reminder checks and cron completions. Completed agents stay in the registry for 24 hours and are then pruned on the next access.

2. Background Mode for spawn_agent

Add background: bool = False and notify: bool = False to spawn_agent. When background=False (default): existing synchronous blocking behaviour — unchanged, no regression. When background=True: wraps the run in asyncio.create_task(), registers in the agent manager, returns an agent_id string immediately.

# Level 1 — non-blocking delegation:
agent_id = await spawn_agent(
    task="Research Zigbee mesh repeaters; summarize findings to my journal",
    role="research",
    background=True,
    notify=True,        # web_push + NC Talk when done
)
# Returns "550e8400-..." immediately. Inara continues responding to the user.

3. Agent Lifecycle Tools

Three new tools, wired into cortex/tools/__init__.py under the "Agents" category:

| Tool | Params | Description |
| --- | --- | --- |
| agent_status(agent_id) | agent_id: str | Status, role, task, elapsed, result preview |
| agent_list(status=None, limit=10) | status: str \| None | All agents for current user; filter by status |
| agent_cancel(agent_id) | agent_id: str | Cancel a running background agent (admin, confirm-required) |

Level 1 can call these between tool rounds to check on delegated work without blocking.

4. Level Enforcement

agent_level is passed through spawn_agent calls as a ContextVar so each agent knows where it sits in the hierarchy. Enforcement is automatic and simple:

  • L1 → spawns L2: spawn_agent called normally. Child agent inherits role tools.
  • L2 → spawns L3: spawn_agent automatically adds deny_tools=["spawn_agent", "aider_run"] to the child's effective tool set. Level 3 agents cannot further delegate.
  • Level 3: spawn_agent and aider_run are never in the tool list.

Level is stored in AgentRecord.level — the lineage (parent_id) provides a full call tree.
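
The L2→L3 rule can be sketched with a ContextVar as below. child_tools and the constants are hypothetical names; the sketch relies on the fact that asyncio.create_task() copies the current context, so a level set before spawning is what the child task sees.

```python
from contextvars import ContextVar

MAX_LEVEL = 3
DELEGATION_TOOLS = {"spawn_agent", "aider_run"}

# Level of the agent currently executing; Level 1 (the persona) is the default.
agent_level: ContextVar[int] = ContextVar("agent_level", default=1)


def child_tools(role_tools: list[str]) -> tuple[int, list[str]]:
    """Return (child level, effective tools) for a spawn from the current agent."""
    parent = agent_level.get()
    if parent >= MAX_LEVEL:
        raise PermissionError("Level 3 agents cannot spawn sub-agents")
    child = parent + 1
    if child == MAX_LEVEL:
        # Children of a Level 2 agent lose the delegation tools automatically.
        role_tools = [t for t in role_tools if t not in DELEGATION_TOOLS]
    return child, role_tools
```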

5. aider_run Background Mode

Add background: bool = False and notify: bool = False to aider_run. When background=True, the Aider subprocess runs inside asyncio.create_task(), is registered in the agent manager, and aider_run returns the agent_id immediately. In background mode, aider_run is removed from CONFIRM_REQUIRED: the call returns instantly, so the user is not left waiting at a confirmation gate.

# Level 1 or 2 — fire and forget a code change:
agent_id = await aider_run(
    project="cortex",
    task="Add max_chars param to http_fetch in tools/web.py, cap at 32768",
    background=True,
    notify=True,
)

Implementation Order

  1. agent_manager.py: AgentRecord + registry CRUD + completion notification hook. Foundation for everything else; ~100 lines.
  2. spawn_agent background mode: background + notify + agent_level params; asyncio.create_task(); registers in manager. Existing sync path unchanged.
  3. agent_status / agent_list / agent_cancel: wire into __init__.py; add to TOOL_CATEGORIES["Agents"], TOOL_ROLES (cancel = admin), CONFIRM_REQUIRED (cancel).
  4. Level enforcement: agent_level ContextVar; auto deny_tools at L2→L3 boundary.
  5. aider_run background mode: same pattern as step 2.

Files to Create/Modify

| File | Change |
| --- | --- |
| cortex/agent_manager.py | New: AgentRecord, registry dict, start/finish/cancel/list functions |
| cortex/tools/agents.py | Add background, notify, agent_level to spawn_agent; add agent_status, agent_list, agent_cancel functions + declarations |
| cortex/tools/aider.py | Add background, notify params; register with agent_manager when background |
| cortex/tools/__init__.py | Register new agent tools; update TOOL_CATEGORIES, TOOL_ROLES, CONFIRM_REQUIRED |

See §12 for the existing allow_tools / deny_tools per-call restrictions that level enforcement builds on.


12. Spawner-Level Tool Restrictions — spawn_agent Permission Control

Status: Design complete, not yet built.

Problem

spawn_agent currently grants sub-agents the full tool set of whatever role they're assigned. The spawning agent (Inara) cannot restrict a sub-agent to a subset of tools — the role config is the only gate. This means every spawned agent implicitly has access to everything the role allows, including potentially destructive operations (shell_exec, file_write, cortex_restart).

Design

Add two optional parameters to spawn_agent: allow_tools and deny_tools.

  • allow_tools — explicit allow list. If set, the sub-agent can only use tools in this list (intersected with what the role allows). If omitted, the role's full tool set is available.
  • deny_tools — explicit deny list. If set, these tools are removed from whatever the sub-agent would otherwise have access to. If omitted, nothing is denied beyond what the role already excludes.

Effective tool set formula:

effective = (role_base_tools ∩ allow_tools) ∩ (role_base_tools \ deny_tools)

Where role_base_tools is the full tool set the role config grants, allow_tools is the spawner's allow list (default: full set), and deny_tools is the spawner's deny list (default: empty set).
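
As a worked example, the formula translates directly to Python sets. effective_tools is a hypothetical helper used only to illustrate the formula; the defaults follow the definitions above (allow defaults to the full role set, deny to the empty set).

```python
def effective_tools(role_base, allow=None, deny=None) -> set[str]:
    """effective = (role_base ∩ allow) ∩ (role_base \\ deny)."""
    base = set(role_base)
    allowed = base if allow is None else set(allow)
    denied = set() if deny is None else set(deny)
    return (base & allowed) & (base - denied)
```

Because both terms are intersected with the role's base set, naming a tool in allow_tools that the role does not grant has no effect: the role config stays the ceiling. When a tool appears in both lists, deny wins.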

Usage Examples

# Research agent — web only, no file access, no shell
spawn_agent(
    "Research the latest on Zigbee mesh repeaters",
    role="chat",
    allow_tools=["web_search", "web_read", "http_fetch"]
)

# Code review — read-only, no shell
spawn_agent(
    "Review this file for security issues",
    role="coder",
    deny_tools=["shell_exec", "file_write", "cortex_restart", "cortex_update"]
)

# Full access (same as today — omit both params)
spawn_agent("Refactor the auth module", role="coder")

# Narrow data migration — just file ops, no web
spawn_agent(
    "Migrate the task files to the new format",
    role="coder",
    allow_tools=["file_read", "file_write", "file_list"]
)

Implementation Plan

1. Model registry / role config — no changes needed.

The role config (role_cfg.get("tools")) remains the authoritative ceiling. No schema changes at this level.

2. spawn_agent function — new parameters + filtering logic.

File: cortex/tools/agents.py. Add allow_tools and deny_tools as optional list[str] | None parameters. After resolving tool_list from role_cfg.get("tools"), apply the filter:

if allow_tools is not None:
    tool_list = [t for t in tool_list if t in allow_tools]
if deny_tools is not None:
    tool_list = [t for t in tool_list if t not in deny_tools]

3. Declaration — update the Gemini FunctionDeclaration.

Add allow_tools and deny_tools as optional parameters in the declaration so the orchestrator knows they exist.

4. Confirmation gate behavior — explicit.

If a sub-agent with restricted tools hits a confirmation gate (e.g., trying shell_exec with it denied), the gate blocks as normal — it does not silently fail. The sub-agent returns the "requires user confirmation" message as it already does.

What Doesn't Change

  • Existing spawn_agent calls with no allow_tools/deny_tools continue to work exactly as before
  • Role config remains the authoritative max — no security regression
  • No schema changes to model_registry.json
  • No UI changes needed