feat: tool schema optimization, keyword routing, aider_run coding agent

Tool schema optimization (PLAN__Tool_Schema_Optimization.md Phases 1-3):
- model_registry.py: ROLE_DEFAULT_TOOLS — distill gets [], research/coder get narrow tool lists by default; applied in get_role_config() when user hasn't configured a custom list
- openai_orchestrator.py: keyword routing via narrow_tools_by_keywords() — scans user message + last assistant turn; narrows active schemas to matched categories only (e.g. "weather" → 3 web tools instead of 69); zero tools sent for pure chat
- openai_orchestrator.py: _get_cached_tools() — module-level schema cache keyed by (role, sorted_tool_list, risk_params); eliminates redundant schema rebuilds
- openai_orchestrator.py: _TOOL_SCHEMA_OVERHEAD 3000 → 500 tokens (schemas now excluded from the per-call fixed estimate since they're cached separately)
- tools/__init__.py: CATEGORY_TOOL_MAP + _KEYWORD_CATEGORY_MAP + classify_tool_categories() + narrow_tools_by_keywords() — the classifier logic lives here so both orchestrators can share it

aider_run tool (cortex/tools/aider.py):
- Invokes Aider as a subprocess with --message --yes-always --no-pretty --no-stream
- Project aliases: cortex / aether_api / aether_frontend / aether_container
- Auto-injects OpenRouter API key from Cortex model registry (no ~/.env needed)
- background=True fires async + registers in agent_manager; notify=True sends push notification on completion
- admin-only, confirm-required, TOOL_RISK=high
- .gitignore: added .aider.chat.history.md / .aider.input.history / .aider.llm.history

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-03 22:39:44 -04:00
parent 29940c299b
commit 29d8aa4aae
6 changed files with 830 additions and 10 deletions
--- a/documentation/PLAN__Tool_Schema_Optimization.md
+++ b/documentation/PLAN__Tool_Schema_Optimization.md
@@ -0,0 +1,362 @@
+# PLAN — Reduce Tool Schema Overhead in Cortex
+
+**Goal:** Eliminate the per-round, per-message transmission of all 45 tool definitions.
+Drop overhead from ~8K-10K tokens per round to near zero for casual chat, and to a
+relevant subset for orchestrated work.
+
+**Status:** Draft — ready for Claude Code implementation.
+
+---
+
+## Background
+
+Every orchestrated (⚡ toggled on) message triggers a ReAct tool loop. The full 45-tool
+schema is rebuilt and transmitted **on every round of every call** — including rounds
+where no tool is invoked and messages where no tool is needed at all. This wastes
+thousands of tokens per interaction.
+
+The architecture already has the building blocks for a fix: role configs support a
+`tools` allow-list, and `get_openai_tools_for_role()` already accepts filtering
+parameters. They're just not being wired together effectively.
+
+---
+
+## Phase 1 — Role-Based Tool Filtering (Foundation)
+
+**Effort:** Small. **Impact:** High.
+
+### What
+
+Define which tools each role actually needs, then enforce the filtering so roles
+only receive their relevant tool subset.
+
+### Implementation
+
+**1. Audit every role and define tool lists.**
+
+| Role | Tools needed | Approx count |
+|------|-------------|-------------|
+| `chat` | None (zero tools — should never be in the orchestration loop) | 0 |
+| `orchestrator` | web, file (admin), shell (admin), tasks, cron, reminders, scratchpad, Aether journals, agent notes, system (admin), spawn_agent, HA, ae_db, git, file_diff, file_syntax_check, notifications (admin) | 25-30 |
+| `distill` | None (pure text processing) | 0 |
+| `coder` | file (admin), shell (admin), git, file_diff, file_syntax_check | 8-10 |
+| `research` | web_search, web_read, http_fetch | 3 |
+| `admin` (role) | All 45 (admin-level access) | 45 |
+
+**2. Store tool lists per role in `config.yaml` or the model registry defaults.**
+The role config already has a `tools` field — populate it with the lists above.
+
+**3. Enforce in `get_openai_tools_for_role()`.**
+The function is called from `openai_orchestrator.py` around line 451. Currently if
+`tools` is empty/missing it returns all tools. Change so that:
+
+- If role config has a `tools` list → return only those tools
+- If role config has `tools: false` → return empty list
+- If role config has no `tools` field → return all (backward compat)
+
+At the call site (`_run_from_messages`), pass the role's tool allow-list into
+`get_openai_tools_for_role()` via the `tool_list` parameter that already exists.
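The three enforcement rules above can be sketched as a small resolver. This is a minimal illustration, not the Cortex implementation: `resolve_tool_allowlist` and its parameters are hypothetical names, and it assumes an explicitly empty list means "no tools" rather than "all tools".

```python
from typing import Optional, Sequence, Union

# Possible values of a role's `tools` field: a list of names, False, or missing (None).
ToolsField = Union[Sequence[str], bool, None]


def resolve_tool_allowlist(tools_field: ToolsField, all_tools: Sequence[str]) -> list:
    """Map a role's `tools` config value to the tool names to expose (hypothetical helper)."""
    if tools_field is None:
        # No `tools` field: backward-compatible behaviour, expose everything.
        return list(all_tools)
    if tools_field is False:
        # `tools: false`: the role never receives tool schemas.
        return []
    allowed = set(tools_field)
    # Keep registry order and silently drop names the registry does not know.
    return [name for name in all_tools if name in allowed]
```

Keeping the "missing" and "empty" cases distinct is the design choice that matters here: it lets a role such as `distill` opt out of tools without losing the backward-compatible default for unconfigured roles.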
+
+### Files to change
+
+- `cortex/openai_orchestrator.py` — wire role config `tools` into the call to
+  `get_openai_tools_for_role()`
+- `cortex/model_registry.py` — ensure `get_role_config()` returns the `tools` field
+  (it does already, line 487)
+- `cortex/config.py` or `home/{user}/model_registry.json` — define the tool lists
+  per default role
+
+---
+
+## Phase 2 — Dynamic Keyword-Based Tool Routing (High Impact)
+
+**Effort:** Small. **Impact:** Very High.
+
+### What
+
+Before entering the ReAct tool loop, scan the user's message with a lightweight
+keyword classifier to determine which tool categories are relevant. Only include
+tools from matched categories — typically 3-8 tools instead of 45.
+
+This is the **core optimization.** For the 80%+ of messages that only need a narrow
+set of tools (or none at all), this eliminates the bulk of schema overhead on every
+round.
+
+### The Hybrid Stack
+
+```
+User message
+    ↓
+[1] Role filter (Phase 1) — narrows 45 tools → ~25 for orchestrator role
+    ↓
+[2] Keyword classifier (Phase 2) — narrows ~25 → 3-8 relevant tools
+    ↓
+[3] ReAct loop — only transmitting the relevant subset each round
+```
+
+If the keyword classifier matches nothing (e.g. "good morning", "test", "what do you
+think?"), it returns an empty tool set — effectively routing the message as a pure
+chat interaction with zero tool overhead.
+
+### Keyword Category Map
+
+Each category maps keywords → tool names. Simple regex/contains matching.
+
+| Category | Trigger keywords | Tools included |
+|----------|-----------------|---------------|
+| `web` | search, google, look up, what is, who is, weather, forecast, temperature, news, article, website, find, research | web_search, web_read, http_fetch |
+| `web_post` | post to, send to, webhook, trigger, notify | http_post |
+| `file` | read file, show file, open file, list files, directory, grep, find in, search in, diff, compare, syntax check | file_read, file_list, file_write, file_diff, file_grep, file_syntax_check, file_stat |
+| `git` | git, commit, branch, pushed, pulled, merge, repo, repository | git_status, git_log, git_diff |
+| `system` | restart, update, status, logs, deploy, shell, command, run, health, is it running | cortex_status, cortex_logs, cortex_restart, cortex_update, shell_exec |
+| `tasks` | task, todo, to-do, to do, add task, create task, what's on my list, pending | task_list, task_create, task_update, task_complete |
+| `cron` | schedule, cron, every day, every week, recurring, automate, job | cron_list, cron_add, cron_remove, cron_toggle |
+| `reminders` | remind, reminder, remember, don't forget | reminders_add, reminders_list, reminders_remove, reminders_clear |
+| `scratchpad` | scratch, scratchpad, working notes, jot down, notepad | scratch_read, scratch_write, scratch_append, scratch_clear |
+| `ha` | home assistant, light, thermostat, turn on, turn off, kitchen, bedroom, switch, sensor, temperature | ha_get_state, ha_get_states, ha_call_service |
+| `aether` | journal, aether, note entry, log entry, search journals, ae_ | ae_journal_list, ae_journal_search, ae_journal_entry_read, ae_journal_entries_list, ae_journal_entry_create, ae_journal_entry_update, ae_journal_entry_disable, ae_journal_entry_append, ae_journal_entry_prepend |
+| `aether_db` | database, query, sql, select, db, table, schema, maria | ae_db_query, ae_db_describe, ae_db_show_view |
+| `notifications` | notify, push, send email, email, message, talk, nextcloud | web_push, email_send, nc_talk_send, nc_talk_history |
+| `agents` | spawn, sub-agent, delegate, agent | spawn_agent |
+| `notes` | agent notes, private notes, my notes | agent_notes_read, agent_notes_write, agent_notes_append, agent_notes_clear |
+| `session` | remember, session, history, last time, what did we, earlier, yesterday, last week | session_read, session_search |
+| `ae_tasks` | ae task, kanban, board | ae_task_list |
+| `claude` | claude, allow directory, permissions | claude_allow_dir |
+
+### Implementation
+
+In `openai_orchestrator.py`, before the ReAct loop starts:
+
+```python
+def _classify_tool_categories(user_message: str) -> list[str]:
+    """Classify a user message into tool categories based on keywords.
+    
+    Returns a list of category names whose tools should be included.
+    Returns empty list if no categories match (pure chat).
+    """
+    message_lower = user_message.lower()
+    
+    category_keywords = {
+        "web":          ["search", "look up", "what is", "who is", "weather",
+                         "forecast", "news", "find on", "google", "website",
+                         "article", "research", "temperature"],
+        "web_post":     ["post to", "send to", "webhook", "trigger webhook"],
+        "file":         ["read file", "show file", "list file", "directory",
+                         "grep", "search in", "find in", "diff", "compare",
+                         "syntax check", "open file"],
+        "git":          ["git", "commit", "branch", "pulled", "merged",
+                         "repository", "repo"],
+        "system":       ["restart", "update", "status", "logs", "deploy",
+                         "run command", "shell", "is it running", "health"],
+        "tasks":        ["task", "todo", "to-do", "to do", "add task",
+                         "create task", "pending", "what's on my list"],
+        "cron":         ["schedule", "cron", "every day", "every week",
+                         "recurring", "automate", "job"],
+        "reminders":    ["remind", "reminder", "remember", "don't forget"],
+        "scratchpad":   ["scratch", "scratchpad", "working note", "jot down",
+                         "notepad"],
+        "ha":           ["home assistant", "light", "thermostat", "turn on",
+                         "turn off", "switch", "sensor", "temperature in",
+                         "kitchen", "bedroom", "garage"],
+        "aether":       ["journal", "aether journal", "note entry", "log entry",
+                         "search journal", "ae_journal"],
+        "aether_db":    ["database", "query", "sql", "select", "db", "table",
+                         "schema", "maria", "run query"],
+        "notifications":["notify", "push notification", "send email", "email",
+                         "talk message", "nextcloud"],
+        "agents":       ["spawn", "sub-agent", "delegate", "spawn agent"],
+        "notes":        ["agent notes", "private notes", "my notes",
+                         "agent_notes"],
+        "session":      ["remember", "session", "history", "last time",
+                         "what did we", "earlier", "yesterday", "last week",
+                         "previously"],
+        "ae_tasks":     ["ae task", "kanban", "board", "ae_task"],
+        "claude":       ["claude allow", "claude directory"],
+    }
+    
+    matched = []
+    for category, keywords in category_keywords.items():
+        if any(kw in message_lower for kw in keywords):
+            matched.append(category)
+    
+    return matched
+```
+
+Then at the orchestration entry point, after determining the role's base tool list
+(Phase 1), apply the keyword filter:
+
+```python
+# Phase 1: Get role's base tool list
+role_tools = get_role_config(username, role).get("tools")
+
+# Phase 2: Dynamically narrow based on message content
+matched_categories = _classify_tool_categories(user_message)
+if matched_categories:
+    category_tool_map = { ... }  # defined at module level
+    dynamic_tools = []
+    for cat in matched_categories:
+        dynamic_tools.extend(category_tool_map.get(cat, []))
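+    # Intersect with role_tools so we never grant more than the role allows.
+    # Check `is not None` so an empty or false allow-list still filters everything out.
+    if role_tools is not None:
+        dynamic_tools = [t for t in dynamic_tools if t in (role_tools or [])]
+    # Keep an empty result empty; tool_list=None would fall back to all tools.
+    active_tools = get_openai_tools_for_role(
+        role=user_role,
+        tool_list=dynamic_tools
+    ) if dynamic_tools else []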
+else:
+    # No keywords matched: likely casual chat, so route to /chat
+    # or use an empty tool list
+    active_tools = []
+```
+
+### Edge Cases to Handle
+
+1. **Multiple categories match:** Union all matched tool sets. The `for cat in matched_categories` loop handles this naturally.
+
+2. **No categories match:** Return empty tool set. The orchestrator loop won't start — this effectively becomes a chat message without incurring the schema tax. If the LLM needs tools anyway, it will respond with a natural language request, and the user can rephrase.
+
+3. **Ambiguous short messages:** "Hey can you check something" — matches nothing, falls through to empty tools. This is correct behavior; the LLM will ask "what do you want me to check?" and the next message will have a clear intent.
+
+4. **Over-broad keywords:** "search" in "search journals" could trigger both `web` and `aether`. The union handles this — both categories' tools are included, which is what you want.
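One further edge case follows from plain substring matching: short keywords such as `git` or `db` also occur inside unrelated words ("digits", "feedback"). A possible mitigation, sketched here under the assumption that whole-word matching is acceptable, is to compile each category's keywords into a single word-boundary regex. The helper names and the trimmed keyword lists are illustrative only.

```python
import re


def compile_keyword_pattern(keywords):
    """Build one regex matching any keyword as a whole word or phrase."""
    # Longest first so multi-word phrases win over their prefixes.
    escaped = sorted((re.escape(kw) for kw in keywords), key=len, reverse=True)
    return re.compile(r"(?<!\w)(?:" + "|".join(escaped) + r")(?!\w)", re.IGNORECASE)


# Trimmed, illustrative subset of the category map above.
CATEGORY_PATTERNS = {
    "git": compile_keyword_pattern(["git", "commit", "branch", "repo"]),
    "aether_db": compile_keyword_pattern(["database", "sql", "db", "table"]),
    "ha": compile_keyword_pattern(["light", "thermostat", "turn on", "turn off"]),
}


def classify(message):
    """Return the categories whose keywords appear as whole words in the message."""
    return [cat for cat, pattern in CATEGORY_PATTERNS.items() if pattern.search(message)]
```

The trade-off is that whole-word matching no longer catches plurals: "lights" does not match `light`, so "Turn off the lights" is classified as `ha` only because of the `turn off` phrase. Listing plural forms explicitly would be one way to close that gap.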
+
+### File to change
+
+- `cortex/openai_orchestrator.py` — add `_classify_tool_categories()` function and
+  wire it into the orchestration entry point before the ReAct loop
+
+---
+
+## Phase 3 — Cache Tool Schema per Session
+
+**Effort:** Medium. **Impact:** Medium.
+
+### What
+
+The tool schema doesn't change between rounds of the same session for a given role.
+After Phase 2 narrows it to, say, 5 tools, those 5 tool definitions are identical
+every round. Cache them.
+
+### Implementation
+
+Add a session-scoped cache in `openai_orchestrator.py`:
+
+```python
+# Module-level cache: key = f"{session_id}:{role}:{sorted_tool_list}"
+_tool_schema_cache: dict[str, list[dict]] = {}
+
+def _get_cached_tool_schema(session_id: str, role: str, tool_list: list[str] | None) -> list[dict]:
+    key = f"{session_id}:{role}:{sorted(tool_list) if tool_list else 'all'}"
+    if key in _tool_schema_cache:
+        return _tool_schema_cache[key]
+    schemas = get_openai_tools_for_role(role=role, tool_list=tool_list)
+    _tool_schema_cache[key] = schemas
+    return schemas
+```
+
+Invalidation: Cache key includes the tool list, so if the dynamic classifier returns
+different categories on the next message, it gets a fresh cache entry. No explicit
+invalidation needed.
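As a variation on the cache above, the key can be a tuple rather than a formatted string, which makes it order-insensitive by construction and keeps "no filter" (`None`) distinct from an empty tool list. This is a sketch only: `ToolSchemaCache` and `fake_builder` are hypothetical, and the builder stands in for `get_openai_tools_for_role()`.

```python
from typing import Callable, Optional


class ToolSchemaCache:
    """Memoize built schemas by (role, order-insensitive tool list)."""

    def __init__(self, builder: Callable[[str, Optional[list]], list]):
        self._builder = builder
        self._entries: dict = {}

    def get(self, role: str, tool_list: Optional[list]) -> list:
        # None means "no filter"; an empty list is a distinct, valid key.
        names = None if tool_list is None else tuple(sorted(tool_list))
        key = (role, names)
        if key not in self._entries:
            self._entries[key] = self._builder(role, tool_list)
        return self._entries[key]


# Hypothetical stand-in for get_openai_tools_for_role(); records each rebuild.
build_calls = []


def fake_builder(role, tool_list):
    build_calls.append((role, tool_list))
    return [{"type": "function", "function": {"name": n}} for n in (tool_list or [])]


cache = ToolSchemaCache(fake_builder)
first = cache.get("orchestrator", ["web_read", "web_search"])
second = cache.get("orchestrator", ["web_search", "web_read"])  # same set, new order
```

Both lookups return the same list object and the builder runs once. Leaving the session ID out of the key, as done here, would let sessions share entries; whether that is safe depends on whether schemas vary per session, which the plan does not state.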
+
+### File to change
+
+- `cortex/openai_orchestrator.py` — add cache dict and lookup before calling
+  `get_openai_tools_for_role()`
+
+---
+
+## Phase 4 — Reduce Default Max Rounds
+
+**Effort:** Trivial. **Impact:** Low-to-medium.
+
+### What
+
+Most requests resolve in 1-3 tool calls. A global cap of 10 means up to 7 wasted
+schema transmissions on edge cases.
+
+### Implementation
+
+1. Make `max_rounds` configurable per model in the model registry (it already exists
+   in some model configs — see `home/brian/model_registry.json` line 42).
+2. Read it from the model config during orchestration instead of using the global
+   `.env` value.
+3. Lower the default from 10 to 5.
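The lookup order in steps 1-3 could look like the following sketch. `resolve_max_rounds` is a hypothetical helper, `model_cfg` is assumed to be a plain dict loaded from the model registry, and the validation rules (positive integers only) are an assumption rather than documented behaviour.

```python
DEFAULT_MAX_ROUNDS = 5  # proposed new global default (previously 10)


def resolve_max_rounds(model_cfg: dict, global_default: int = DEFAULT_MAX_ROUNDS) -> int:
    """Prefer the per-model `max_rounds`; fall back to the global default."""
    value = model_cfg.get("max_rounds")
    # Reject missing, non-integer, boolean, or non-positive values.
    if isinstance(value, bool) or not isinstance(value, int) or value < 1:
        return global_default
    return value
```

Falling back on invalid values, rather than raising, keeps a malformed registry entry from breaking orchestration; a stricter variant could log a warning instead.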
+
+### Files to change
+
+- `cortex/.env` — change `ORCHESTRATOR_MAX_ROUNDS=10` to `=5`
+- `cortex/openai_orchestrator.py` — read per-model `max_rounds` from `model_cfg`
+  instead of only from settings
+
+---
+
+## Phase 5 — UI Improvements (Independent)
+
+**Effort:** Small. **Impact:** Medium (UX).
+
+### What
+
+Make the tool mode indicator more obvious so the user can quickly tell whether
+they're incurring the tool tax.
+
+### Ideas
+
+- Change ⚡ color: green when tools are on, gray when off
+- Swap icon: ⚡ (tools) vs. 💬 (chat only)
+- Add tooltip: "Tools enabled — all 45 tool schemas sent with each message"
+- Optional: add a "Quick Question" button that sends to `/chat` directly, bypassing
+  the orchestrator entirely
+
+### Files to change
+
+- Svelte UI component — likely `ChatInput.svelte` or the chat mode toggle component
+
+---
+
+## Recommended Execution Order
+
+1. **Phase 1** (role filtering) — foundation. Defines the base tool set per role.
+2. **Phase 2** (keyword routing) — **the big one.** Slashes 45 tools → 3-8 for the
+   vast majority of messages. Builds on Phase 1's role filtering.
+3. **Phase 4** (lower max_rounds) — trivial change, do alongside Phase 2.
+4. **Phase 3** (schema caching) — more involved, compounds savings from Phase 2.
+5. **Phase 5** (UI) — independent UX polish, can be done any time.
+
+### Quick Win Path (Recommended First Session)
+
+Phases 1 + 2 + 4 can be done in a single Claude Code session. They're all in
+`openai_orchestrator.py` and `model_registry.py` — the same few files. Estimated
+effort: 45-60 minutes of coding.
+
+Phase 3 (caching) is a separate, focused session afterward.
+
+---
+
+## Appendix A: Code Locations (from grep audit 2026-05-15)
+
+| What | File | Line |
+|------|------|------|
+| `get_openai_tools_for_role` definition | `cortex/tools.py` | ~540 |
+| Call site (decides active_tools) | `cortex/openai_orchestrator.py` | ~449 |
+| `_run_from_messages()` tool loop | `cortex/openai_orchestrator.py` | ~260 |
+| Role config tools field | `cortex/model_registry.py` | ~487 |
+| `get_role_config()` | `cortex/model_registry.py` | ~473 |
+| `save_role_config()` (tools allow-list) | `cortex/model_registry.py` | ~455 |
+| Global `ORCHESTRATOR_MAX_ROUNDS` | `cortex/.env` | 35 |
+| `REQUIRED_ROLES` | `cortex/model_registry.py` | 163 |
+| `DEFINED_ROLES` config | `cortex/config.py` | 80 |
+| Per-model `max_rounds` example | `home/brian/model_registry.json` | 42 |
+
+## Appendix B: Token Savings Estimate
+
+| Scenario | Before (per round) | After Phase 1 | After Phase 1+2 | After All Phases |
+|----------|-------------------|--------------|-----------------|-----------------|
+| "What's the weather?" | ~9K tokens | ~5K (25 tools) | ~600 (3 web tools) | ~600 (cached) |
+| "Good morning" | ~9K tokens | ~5K (25 tools) | 0 (routed to chat) | 0 |
+| "Turn off kitchen lights" | ~9K tokens | ~5K (25 tools) | ~600 (3 HA tools) | ~600 (cached) |
+| "Search journals for X" | ~9K tokens | ~5K (25 tools) | ~2K (10 aether tools) | ~2K (cached) |
+| "Create a task" | ~9K tokens | ~5K (25 tools) | ~800 (4 task tools) | ~800 (cached) |
+| "Run a SQL query" | ~9K tokens | ~5K (25 tools) | ~600 (3 db tools) | ~600 (cached) |
+
+At 3 rounds per request and 50 requests/day, that's roughly **1.3M tokens/day before**
+vs. **~13K/day after all optimizations**: a ~99% reduction overall at that estimate, and
+~90% for most tool-using queries.
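The daily figures can be reproduced from the table by multiplying the per-round schema cost by rounds and requests. The short calculation below uses only numbers stated above (9K and 600 tokens per round, 3 rounds, 50 requests/day); the all-web-queries workload is a hypothetical upper bound for illustration, not a measured mix.

```python
ROUNDS_PER_REQUEST = 3
REQUESTS_PER_DAY = 50


def daily_tokens(schema_tokens_per_round: int) -> int:
    """Schema tokens sent per day at the assumed request volume."""
    return schema_tokens_per_round * ROUNDS_PER_REQUEST * REQUESTS_PER_DAY


baseline = daily_tokens(9_000)  # full schema sent on every round
web_only = daily_tokens(600)    # every request narrowed to the 3 web tools
reduction = 1 - web_only / baseline

print(f"baseline: {baseline:,} tokens/day")
print(f"web-only: {web_only:,} tokens/day ({reduction:.0%} lower)")
```

A day of nothing but narrowed web queries would therefore still cost about 90K schema tokens, so a total near 13K/day implies that most requests are routed as pure chat with zero tools.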