Cortex-Inara/documentation/PLAN__Tool_Schema_Optimization.md
Scott Idem 29d8aa4aae feat: tool schema optimization, keyword routing, aider_run coding agent
Tool schema optimization (PLAN__Tool_Schema_Optimization.md Phases 1-3):
- model_registry.py: ROLE_DEFAULT_TOOLS — distill gets [], research/coder get
  narrow tool lists by default; applied in get_role_config() when user hasn't
  configured a custom list
- openai_orchestrator.py: keyword routing via narrow_tools_by_keywords() — scans
  user message + last assistant turn; narrows active schemas to matched categories
  only (e.g. "weather" → 3 web tools instead of 69); zero tools sent for pure chat
- openai_orchestrator.py: _get_cached_tools() — module-level schema cache keyed by
  (role, sorted_tool_list, risk_params); eliminates redundant schema rebuilds
- openai_orchestrator.py: _TOOL_SCHEMA_OVERHEAD 3000 → 500 tokens (schemas now
  excluded from the per-call fixed estimate since they're cached separately)
- tools/__init__.py: CATEGORY_TOOL_MAP + _KEYWORD_CATEGORY_MAP + classify_tool_categories()
  + narrow_tools_by_keywords() — the classifier logic lives here so both orchestrators
  can share it

aider_run tool (cortex/tools/aider.py):
- Invokes Aider as a subprocess with --message --yes-always --no-pretty --no-stream
- Project aliases: cortex / aether_api / aether_frontend / aether_container
- Auto-injects OpenRouter API key from Cortex model registry (no ~/.env needed)
- background=True fires async + registers in agent_manager; notify=True sends push
  notification on completion
- admin-only, confirm-required, TOOL_RISK=high
- .gitignore: added .aider.chat.history.md / .aider.input.history / .aider.llm.history

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-03 22:39:44 -04:00


PLAN — Reduce Tool Schema Overhead in Cortex

Goal: Eliminate the per-round, per-message transmission of all 45 tool definitions. Drop overhead from ~8K-10K tokens per round to near zero for casual chat, and to a relevant subset for orchestrated work.

Status: Draft — ready for Claude Code implementation.


Background

Every orchestrated message (tools toggled on) triggers a ReAct tool loop. The full 45-tool schema is rebuilt and transmitted on every round of every call. That includes rounds where no tool is invoked and messages where no tool is needed at all. At roughly 9K tokens per round (see Appendix B), this wastes thousands of tokens per interaction.

The architecture already has the building blocks for a fix: role configs support a tools allow-list, and get_openai_tools_for_role() already accepts filtering parameters. They're just not being wired together effectively.


Phase 1 — Role-Based Tool Filtering (Foundation)

Effort: Small. Impact: High.

What

Define which tools each role actually needs, then enforce the filtering so roles only receive their relevant tool subset.

Implementation

1. Audit every role and define tool lists.

| Role | Tools needed | Approx. count |
|---|---|---|
| chat | None (zero tools; should never be in the orchestration loop) | 0 |
| orchestrator | web, file (admin), shell (admin), tasks, cron, reminders, scratchpad, Aether journals, agent notes, system (admin), spawn_agent, HA, ae_db, git, file_diff, file_syntax_check, notifications (admin) | 25-30 |
| distill | None (pure text processing) | 0 |
| coder | file (admin), shell (admin), git, file_diff, file_syntax_check | 8-10 |
| research | web_search, web_read, http_fetch | 3 |
| admin (role) | All 45 (admin-level access) | 45 |

2. Store tool lists per role in config.yaml or the model registry defaults. The role config already has a tools field — populate it with the lists above.

3. Enforce in get_openai_tools_for_role(). The function is called from openai_orchestrator.py around line 451. Currently, if tools is empty or missing, it returns all tools. Change it so that:

  • If the role config has a non-empty tools list → return only those tools
  • If the role config has tools: false or an empty list → return an empty list (roles such as distill must not fall through to "all tools")
  • If the role config has no tools field → return all tools (backward compatibility)

At the call site (_run_from_messages), pass the role's tool allow-list into get_openai_tools_for_role() via the tool_list parameter that already exists.
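Here is a minimal sketch of these rules. It assumes the role config is a plain dict and the full tool catalog is a list of names. ROLE_DEFAULT_TOOLS borrows its name from the commit notes, but its contents and the helper resolve_role_tools are hypothetical illustrations, not the Cortex code.

The sketch makes two choices worth noting:

- An empty list is treated like tools: false, so a role such as distill can never fall through to the "return all tools" branch.
- A role default, when one exists, applies before the backward-compatible fallback.

```python
from typing import Any

# Hypothetical defaults; real lists would live in config.yaml or the model registry
ROLE_DEFAULT_TOOLS: dict[str, list[str]] = {
    "chat": [],
    "distill": [],
    "research": ["web_search", "web_read", "http_fetch"],
    "coder": ["file_read", "file_list", "file_write", "file_diff",
              "file_syntax_check", "shell_exec", "git_status", "git_log", "git_diff"],
}


def resolve_role_tools(role: str, role_cfg: dict[str, Any], all_tools: list[str]) -> list[str]:
    """Return the tool names a role may receive (hypothetical helper)."""
    configured = role_cfg.get("tools")
    if configured is None:
        # No user-configured list: use the role default if one exists
        configured = ROLE_DEFAULT_TOOLS.get(role)
    if configured is None:
        # Still nothing: every tool (backward compatibility)
        return list(all_tools)
    if not configured:
        # tools: false or [] means zero tools
        return []
    allowed = set(configured)
    # Keep catalog order and silently drop names the catalog doesn't know
    return [t for t in all_tools if t in allowed]
```

Returning names in catalog order keeps the resulting schema list stable from call to call, which also keeps any downstream cache key stable.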

Files to change

  • cortex/openai_orchestrator.py — wire role config tools into the call to get_openai_tools_for_role()
  • cortex/model_registry.py — ensure get_role_config() returns the tools field (it does already, line 487)
  • cortex/config.py or home/{user}/model_registry.json — define the tool lists per default role

Phase 2 — Dynamic Keyword-Based Tool Routing (High Impact)

Effort: Small. Impact: Very High.

What

Before entering the ReAct tool loop, scan the user's message with a lightweight keyword classifier to determine which tool categories are relevant. Only include tools from matched categories — typically 3-8 tools instead of 45.

This is the core optimization. For the 80%+ of messages that only need a narrow set of tools (or none at all), this eliminates the bulk of schema overhead on every round.

The Hybrid Stack

User message
    ↓
[1] Role filter (Phase 1) — narrows 45 tools → ~25 for orchestrator role
    ↓
[2] Keyword classifier (Phase 2) — narrows ~25 → 3-8 relevant tools
    ↓
[3] ReAct loop — only transmitting the relevant subset each round

If the keyword classifier matches nothing (e.g. "good morning", "test", "what do you think?"), it returns an empty tool set — effectively routing the message as a pure chat interaction with zero tool overhead.

Keyword Category Map

Each category maps keywords → tool names. Simple regex/contains matching.

| Category | Trigger keywords | Tools included |
|---|---|---|
| web | search, google, look up, what is, who is, weather, forecast, temperature, news, article, website, find, research | web_search, web_read, http_fetch |
| web_post | post to, send to, webhook, trigger, notify | http_post |
| file | read file, show file, open file, list files, directory, grep, find in, search in, diff, compare, syntax check | file_read, file_list, file_write, file_diff, file_grep, file_syntax_check, file_stat |
| git | git, commit, branch, pushed, pulled, merge, repo, repository | git_status, git_log, git_diff |
| system | restart, update, status, logs, deploy, shell, command, run, health, is it running | cortex_status, cortex_logs, cortex_restart, cortex_update, shell_exec |
| tasks | task, todo, to-do, to do, add task, create task, what's on my list, pending | task_list, task_create, task_update, task_complete |
| cron | schedule, cron, every day, every week, recurring, automate, job | cron_list, cron_add, cron_remove, cron_toggle |
| reminders | remind, reminder, remember, don't forget | reminders_add, reminders_list, reminders_remove, reminders_clear |
| scratchpad | scratch, scratchpad, working notes, jot down, notepad | scratch_read, scratch_write, scratch_append, scratch_clear |
| ha | home assistant, light, thermostat, turn on, turn off, kitchen, bedroom, switch, sensor, temperature | ha_get_state, ha_get_states, ha_call_service |
| aether | journal, aether, note entry, log entry, search journals, ae_ | ae_journal_list, ae_journal_search, ae_journal_entry_read, ae_journal_entries_list, ae_journal_entry_create, ae_journal_entry_update, ae_journal_entry_disable, ae_journal_entry_append, ae_journal_entry_prepend |
| aether_db | database, query, sql, select, db, table, schema, maria | ae_db_query, ae_db_describe, ae_db_show_view |
| notifications | notify, push, send email, email, message, talk, nextcloud | web_push, email_send, nc_talk_send, nc_talk_history |
| agents | spawn, sub-agent, delegate, agent | spawn_agent |
| notes | agent notes, private notes, my notes | agent_notes_read, agent_notes_write, agent_notes_append, agent_notes_clear |
| session | remember, session, history, last time, what did we, earlier, yesterday, last week | session_read, session_search |
| ae_tasks | ae task, kanban, board | ae_task_list |
| claude | claude, allow directory, permissions | claude_allow_dir |
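Plain substring checks over-match. For example, "git" fires on "digital", "light" on "highlight", "db" on "feedback", and "board" on "dashboard". Extra matches only cost tokens, not correctness, but a regex with word boundaries keeps the narrowing tight.

Below is a minimal sketch that assumes the keyword lists stay as plain phrases. The trimmed KEYWORDS dict and the helpers compile_category_patterns and classify are illustrative, not the Cortex implementation.

```python
import re

# Trimmed, illustrative subset of the keyword map (hypothetical, not the full table)
KEYWORDS: dict[str, list[str]] = {
    "git": ["git", "commit", "branch", "repo", "repository"],
    "ha": ["light", "thermostat", "turn on", "turn off", "kitchen"],
    "aether_db": ["database", "query", "sql", "db", "table"],
    "ae_tasks": ["ae task", "kanban", "board"],
}


def compile_category_patterns(keywords: dict[str, list[str]]) -> dict[str, re.Pattern[str]]:
    """Compile one case-insensitive pattern per category, anchored at word boundaries."""
    patterns: dict[str, re.Pattern[str]] = {}
    for category, words in keywords.items():
        # re.escape keeps each phrase literal; "s?" also accepts simple plurals (lights, repos)
        alternation = "|".join(re.escape(w) for w in words)
        patterns[category] = re.compile(rf"\b(?:{alternation})s?\b", re.IGNORECASE)
    return patterns


PATTERNS = compile_category_patterns(KEYWORDS)


def classify(message: str) -> list[str]:
    """Return matched categories in map order; an empty list means pure chat."""
    return [cat for cat, pattern in PATTERNS.items() if pattern.search(message)]
```

Keywords that are prefixes by design, such as ae_ in the table, need a pattern without the trailing \b. Otherwise they stop matching inside identifiers like ae_journal.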

Implementation

In openai_orchestrator.py, before the ReAct loop starts:

```python
def _classify_tool_categories(user_message: str) -> list[str]:
    """Classify a user message into tool categories based on keywords.

    Returns a list of category names whose tools should be included.
    Returns empty list if no categories match (pure chat).
    """
    message_lower = user_message.lower()

    category_keywords = {
        "web":          ["search", "look up", "what is", "who is", "weather",
                         "forecast", "news", "find on", "google", "website",
                         "article", "research", "temperature"],
        "web_post":     ["post to", "send to", "webhook", "trigger webhook"],
        "file":         ["read file", "show file", "list file", "directory",
                         "grep", "search in", "find in", "diff", "compare",
                         "syntax check", "open file"],
        "git":          ["git", "commit", "branch", "pulled", "merged",
                         "repository", "repo"],
        "system":       ["restart", "update", "status", "logs", "deploy",
                         "run command", "shell", "is it running", "health"],
        "tasks":        ["task", "todo", "to-do", "to do", "add task",
                         "create task", "pending", "what's on my list"],
        "cron":         ["schedule", "cron", "every day", "every week",
                         "recurring", "automate", "job"],
        "reminders":    ["remind", "reminder", "remember", "don't forget"],
        "scratchpad":   ["scratch", "scratchpad", "working note", "jot down",
                         "notepad"],
        "ha":           ["home assistant", "light", "thermostat", "turn on",
                         "turn off", "switch", "sensor", "temperature in",
                         "kitchen", "bedroom", "garage"],
        "aether":       ["journal", "aether journal", "note entry", "log entry",
                         "search journal", "ae_journal"],
        "aether_db":    ["database", "query", "sql", "select", "db", "table",
                         "schema", "maria", "run query"],
        "notifications":["notify", "push notification", "send email", "email",
                         "talk message", "nextcloud"],
        "agents":       ["spawn", "sub-agent", "delegate", "spawn agent"],
        "notes":        ["agent notes", "private notes", "my notes",
                         "agent_notes"],
        "session":      ["remember", "session", "history", "last time",
                         "what did we", "earlier", "yesterday", "last week",
                         "previously"],
        "ae_tasks":     ["ae task", "kanban", "board", "ae_task"],
        "claude":       ["claude allow", "claude directory"],
    }

    matched = []
    for category, keywords in category_keywords.items():
        if any(kw in message_lower for kw in keywords):
            matched.append(category)

    return matched
```

Then at the orchestration entry point, after determining the role's base tool list (Phase 1), apply the keyword filter:

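```python
# Phase 1: Get the role's base tool list.
# None = no restriction; False or [] = the role gets no tools.
role_tools = get_role_config(username, role).get("tools")

# Phase 2: Dynamically narrow based on message content
matched_categories = _classify_tool_categories(user_message)
if matched_categories and (role_tools is None or role_tools):
    dynamic_tools = []
    for cat in matched_categories:
        # CATEGORY_TOOL_MAP is defined at module level (category -> tool names)
        dynamic_tools.extend(CATEGORY_TOOL_MAP.get(cat, []))
    # Overlapping categories can repeat a tool; de-duplicate, keeping order
    dynamic_tools = list(dict.fromkeys(dynamic_tools))
    # Intersect with role_tools so we never grant more than the role allows
    if role_tools is not None:
        dynamic_tools = [t for t in dynamic_tools if t in role_tools]
    # An empty intersection must mean "no tools": passing tool_list=None
    # would make get_openai_tools_for_role() return every tool
    active_tools = (
        get_openai_tools_for_role(role=user_role, tool_list=dynamic_tools)
        if dynamic_tools
        else []
    )
else:
    # No keywords matched (likely casual chat), or the role has no tools:
    # route to /chat or send an empty tool list
    active_tools = []
```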

Edge Cases to Handle

  1. Multiple categories match: Union all matched tool sets. The for cat in matched_categories loop handles this naturally.

  2. No categories match: Return empty tool set. The orchestrator loop won't start — this effectively becomes a chat message without incurring the schema tax. If the LLM needs tools anyway, it will respond with a natural language request, and the user can rephrase.

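  3. Ambiguous short messages: "Hey can you check something" matches nothing and falls through to empty tools. This is correct behavior: the LLM will ask "what do you want me to check?" and the next message will have a clear intent. Bare follow-ups such as "yes, go ahead" have no keywords of their own. To handle them, the classifier should also scan the previous assistant turn, which carries the intent being confirmed.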

  4. Over-broad keywords: "search" in "search journals" could trigger both web and aether. The union handles this — both categories' tools are included, which is what you want.

File to change

  • cortex/openai_orchestrator.py — add _classify_tool_categories() function and wire it into the orchestration entry point before the ReAct loop

Phase 3 — Cache Tool Schema per Session

Effort: Medium. Impact: Medium.

What

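The tool schema doesn't change between rounds for a given role and tool list. After Phase 2 narrows it to, say, 5 tools, those 5 tool definitions are identical every round, so build them once and reuse them. Caching saves the work of rebuilding the schemas, not tokens. Each API request is stateless, so the selected schemas are still sent every round.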

Implementation

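Add a module-level cache in openai_orchestrator.py. The schemas depend only on the role and the tool list, not on the session, so the session ID is left out of the key. Including it would duplicate identical entries for every session and grow the cache without bound.

```python
# Module-level cache: key = (role, sorted tool tuple), or (role, None) for "all tools"
_tool_schema_cache: dict[tuple[str, tuple[str, ...] | None], list[dict]] = {}

def _get_cached_tool_schema(role: str, tool_list: list[str] | None) -> list[dict]:
    # Sorting makes the key independent of the order categories were matched in;
    # "is not None" keeps an empty list distinct from "all tools"
    key = (role, tuple(sorted(tool_list)) if tool_list is not None else None)
    if key not in _tool_schema_cache:
        _tool_schema_cache[key] = get_openai_tools_for_role(role=role, tool_list=tool_list)
    # Return a copy so callers cannot mutate the cached list
    return list(_tool_schema_cache[key])
```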

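Invalidation: the cache key includes the tool list. If the classifier returns different categories on the next message, the lookup simply lands on a different entry. No explicit invalidation is needed as long as tool definitions only change on restart. If tools can be registered at runtime, clear the cache when they change.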

File to change

  • cortex/openai_orchestrator.py — add cache dict and lookup before calling get_openai_tools_for_role()

Phase 4 — Reduce Default Max Rounds

Effort: Trivial. Impact: Low-to-medium.

What

Most requests resolve in 1-3 tool calls. When the loop does not converge, a global cap of 10 allows up to 7 more schema-bearing rounds than a typical 3-round request would use. A cap of 5 limits that to 2.

Implementation

  1. Make max_rounds configurable per model in the model registry (it already exists in some model configs — see home/brian/model_registry.json line 42).
  2. Read it from the model config during orchestration instead of using the global .env value.
  3. Lower the default from 10 to 5.

Files to change

  • cortex/.env — change ORCHESTRATOR_MAX_ROUNDS=10 to =5
  • cortex/openai_orchestrator.py — read per-model max_rounds from model_cfg instead of only from settings

Phase 5 — UI Improvements (Independent)

Effort: Small. Impact: Medium (UX).

What

Make the tool mode indicator more obvious so the user can quickly tell whether they're incurring the tool tax.

Ideas

  • Change color: green when tools are on, gray when off
  • Swap icon: a tools icon vs. 💬 (chat only)
  • Add tooltip: "Tools enabled — all 45 tool schemas sent with each message"
  • Optional: add a "Quick Question" button that sends to /chat directly, bypassing the orchestrator entirely

Files to change

  • Svelte UI component — likely ChatInput.svelte or the chat mode toggle component

Suggested implementation order:

  1. Phase 1 (role filtering): foundation. Defines the base tool set per role.
  2. Phase 2 (keyword routing): the big one. Slashes 45 tools → 3-8 for the vast majority of messages. Builds on Phase 1's role filtering.
  3. Phase 4 (lower max_rounds): trivial change, do alongside Phase 2.
  4. Phase 3 (schema caching): more involved. It avoids rebuilding identical schemas every round. It saves work rather than tokens, since the schemas are still sent each round.
  5. Phase 5 (UI): independent UX polish, can be done any time.

Phases 1 + 2 + 4 can be done in a single Claude Code session. They're all in openai_orchestrator.py and model_registry.py — the same few files. Estimated effort: 45-60 minutes of coding.

Phase 3 (caching) is a separate, focused session afterward.


Appendix A: Code Locations (from grep audit 2026-05-15)

| What | File | Line |
|---|---|---|
| get_openai_tools_for_role definition | cortex/tools.py | ~540 |
| Call site (decides active_tools) | cortex/openai_orchestrator.py | ~449 |
| _run_from_messages() tool loop | cortex/openai_orchestrator.py | ~260 |
| Role config tools field | cortex/model_registry.py | ~487 |
| get_role_config() | cortex/model_registry.py | ~473 |
| save_role_config() (tools allow-list) | cortex/model_registry.py | ~455 |
| Global ORCHESTRATOR_MAX_ROUNDS | cortex/.env | 35 |
| REQUIRED_ROLES | cortex/model_registry.py | 163 |
| DEFINED_ROLES config | cortex/config.py | 80 |
| Per-model max_rounds example | home/brian/model_registry.json | 42 |

Appendix B: Token Savings Estimate

| Scenario | Before (per round) | After Phase 1 | After Phases 1+2 | After all phases |
|---|---|---|---|---|
| "What's the weather?" | ~9K tokens | ~5K (25 tools) | ~600 (3 web tools) | ~600 (cached) |
| "Good morning" | ~9K tokens | ~5K (25 tools) | 0 (routed to chat) | 0 |
| "Turn off kitchen lights" | ~9K tokens | ~5K (25 tools) | ~600 (3 HA tools) | ~600 (cached) |
| "Search journals for X" | ~9K tokens | ~5K (25 tools) | ~2.4K (9 aether + 3 web tools; "search" also matches web) | ~2.4K (cached) |
| "Create a task" | ~9K tokens | ~5K (25 tools) | ~800 (4 task tools) | ~800 (cached) |
| "Run a SQL query" | ~9K tokens | ~5K (25 tools) | ~600 (3 db tools) | ~600 (cached) |

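At 3 rounds per request and 50 requests/day, the pre-optimization schema overhead is about 9K × 3 × 50 ≈ 1.35M tokens/day. After Phases 1+2, casual chat drops to zero, a 100% reduction. The tool-using scenarios above drop to ~600-2.4K tokens per round, a 73-93% reduction (about 90% for most single-category queries). The daily total after optimization depends on the mix of chat and tool-using messages.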
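The arithmetic behind these estimates fits in a small calculator. The per-round costs come from the table. The 40/10 split between casual-chat and tool-using requests is a hypothetical mix for illustration, not a measurement.

```python
ROUNDS_PER_REQUEST = 3
REQUESTS_PER_DAY = 50
BEFORE_PER_ROUND = 9_000  # ~45 tool schemas per round, from the table


def daily_tokens(per_round: int, requests: int = REQUESTS_PER_DAY,
                 rounds: int = ROUNDS_PER_REQUEST) -> int:
    """Schema tokens per day for a given per-round schema cost."""
    return per_round * rounds * requests


def reduction_pct(after_per_round: int, before_per_round: int = BEFORE_PER_ROUND) -> float:
    """Percentage reduction in schema tokens per round."""
    return 100 * (1 - after_per_round / before_per_round)


before = daily_tokens(BEFORE_PER_ROUND)  # 1_350_000 tokens/day
weather = reduction_pct(600)             # ~93.3%
# Hypothetical mix: 40 casual-chat requests (0 tokens) + 10 tool requests at ~600/round
mixed = daily_tokens(0, requests=40) + daily_tokens(600, requests=10)  # 18_000 tokens/day
```

Under that hypothetical mix, daily schema overhead falls from about 1.35M to about 18K tokens. A chat-heavier mix or Phase 4's lower round cap pushes it lower still.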