Files

Scott Idem 29d8aa4aae feat: tool schema optimization, keyword routing, aider_run coding agent

Tool schema optimization (PLAN__Tool_Schema_Optimization.md Phases 1-3):
- model_registry.py: ROLE_DEFAULT_TOOLS — distill gets [], research/coder get
  narrow tool lists by default; applied in get_role_config() when user hasn't
  configured a custom list
- openai_orchestrator.py: keyword routing via narrow_tools_by_keywords() — scans
  user message + last assistant turn; narrows active schemas to matched categories
  only (e.g. "weather" → 3 web tools instead of 69); zero tools sent for pure chat
- openai_orchestrator.py: _get_cached_tools() — module-level schema cache keyed by
  (role, sorted_tool_list, risk_params); eliminates redundant schema rebuilds
- openai_orchestrator.py: _TOOL_SCHEMA_OVERHEAD 3000 → 500 tokens (schemas now
  excluded from the per-call fixed estimate since they're cached separately)
- tools/__init__.py: CATEGORY_TOOL_MAP + _KEYWORD_CATEGORY_MAP + classify_tool_categories()
  + narrow_tools_by_keywords() — the classifier logic lives here so both orchestrators
  can share it

aider_run tool (cortex/tools/aider.py):
- Invokes Aider as a subprocess with --message --yes-always --no-pretty --no-stream
- Project aliases: cortex / aether_api / aether_frontend / aether_container
- Auto-injects OpenRouter API key from Cortex model registry (no ~/.env needed)
- background=True fires async + registers in agent_manager; notify=True sends push
  notification on completion
- admin-only, confirm-required, TOOL_RISK=high
- .gitignore: added .aider.chat.history.md / .aider.input.history / .aider.llm.history

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-06-03 22:39:44 -04:00

16 KiB

Raw Blame History

PLAN — Reduce Tool Schema Overhead in Cortex

Goal: Eliminate the per-round, per-message transmission of all 45 tool definitions. Drop overhead from ~8K-10K tokens per round to near zero for casual chat, and to a relevant subset for orchestrated work.

Status: Draft — ready for Claude Code implementation.

Background

Every orchestrated (⚡ toggled on) message triggers a ReAct tool loop. The full 45-tool schema is rebuilt and transmitted on every round of every call — including rounds where no tool is invoked and messages where no tool is needed at all. This wastes thousands of tokens per interaction.

The architecture already has the building blocks for a fix: role configs support a tools allow-list, and get_openai_tools_for_role() already accepts filtering parameters. They're just not being wired together effectively.

Phase 1 — Role-Based Tool Filtering (Foundation)

Effort: Small. Impact: High.

What

Define which tools each role actually needs, then enforce the filtering so roles only receive their relevant tool subset.

Implementation

1. Audit every role and define tool lists.

Role	Tools needed	Approx count
`chat`	None (zero tools — should never be in the orchestration loop)	0
`orchestrator`	web, file (admin), shell (admin), tasks, cron, reminders, scratchpad, Aether journals, agent notes, system (admin), spawn_agent, HA, ae_db, git, file_diff, file_syntax_check, notifications (admin)	25-30
`distill`	None (pure text processing)	0
`coder`	file (admin), shell (admin), git, file_diff, file_syntax_check	8-10
`research`	web_search, web_read, http_fetch	3
`admin` (role)	All 45 (admin-level access)	45

2. Store tool lists per role in config.yaml or the model registry defaults. The role config already has a tools field — populate it with the lists above.

3. Enforce in get_openai_tools_for_role(). The function is called from openai_orchestrator.py around line 451. Currently if tools is empty/missing it returns all tools. Change so that:

If role config has a tools list → return only those tools
If role config has tools: false → return empty list
If role config has no tools field → return all (backward compat)

At the call site (_run_from_messages), pass the role's tool allow-list into get_openai_tools_for_role() via the tool_list parameter that already exists.

Files to change

cortex/openai_orchestrator.py — wire role config tools into the call to get_openai_tools_for_role()
cortex/model_registry.py — ensure get_role_config() returns the tools field (it does already, line 487)
cortex/config.py or home/{user}/model_registry.json — define the tool lists per default role

Phase 2 — Dynamic Keyword-Based Tool Routing (High Impact)

Effort: Small. Impact: Very High.

What

Before entering the ReAct tool loop, scan the user's message with a lightweight keyword classifier to determine which tool categories are relevant. Only include tools from matched categories — typically 3-8 tools instead of 45.

This is the core optimization. For the 80%+ of messages that only need a narrow set of tools (or none at all), this eliminates the bulk of schema overhead on every round.

The Hybrid Stack

User message
    ↓
[1] Role filter (Phase 1) — narrows 45 tools → ~25 for orchestrator role
    ↓
[2] Keyword classifier (Phase 2) — narrows ~25 → 3-8 relevant tools
    ↓
[3] ReAct loop — only transmitting the relevant subset each round

If the keyword classifier matches nothing (e.g. "good morning", "test", "what do you think?"), it returns an empty tool set — effectively routing the message as a pure chat interaction with zero tool overhead.

Keyword Category Map

Each category maps keywords → tool names. Simple regex/contains matching.

Category	Trigger keywords	Tools included
`web`	search, google, look up, what is, who is, weather, forecast, temperature, news, article, website, find, research	web_search, web_read, http_fetch
`web_post`	post to, send to, webhook, trigger, notify	http_post
`file`	read file, show file, open file, list files, directory, grep, find in, search in, diff, compare, syntax check	file_read, file_list, file_write, file_diff, file_grep, file_syntax_check, file_stat
`git`	git, commit, branch, pushed, pulled, merge, repo, repository	git_status, git_log, git_diff
`system`	restart, update, status, logs, deploy, shell, command, run, health, is it running	cortex_status, cortex_logs, cortex_restart, cortex_update, shell_exec
`tasks`	task, todo, to-do, to do, add task, create task, what's on my list, pending	task_list, task_create, task_update, task_complete
`cron`	schedule, cron, every day, every week, recurring, automate, job	cron_list, cron_add, cron_remove, cron_toggle
`reminders`	remind, reminder, remember, don't forget	reminders_add, reminders_list, reminders_remove, reminders_clear
`scratchpad`	scratch, scratchpad, working notes, jot down, notepad	scratch_read, scratch_write, scratch_append, scratch_clear
`ha`	home assistant, light, thermostat, turn on, turn off, kitchen, bedroom, switch, sensor, temperature	ha_get_state, ha_get_states, ha_call_service
`aether`	journal, aether, note entry, log entry, search journals, ae_	ae_journal_list, ae_journal_search, ae_journal_entry_read, ae_journal_entries_list, ae_journal_entry_create, ae_journal_entry_update, ae_journal_entry_disable, ae_journal_entry_append, ae_journal_entry_prepend
`aether_db`	database, query, sql, select, db, table, schema, maria	ae_db_query, ae_db_describe, ae_db_show_view
`notifications`	notify, push, send email, email, message, talk, nextcloud	web_push, email_send, nc_talk_send, nc_talk_history
`agents`	spawn, sub-agent, delegate, agent	spawn_agent
`notes`	agent notes, private notes, my notes	agent_notes_read, agent_notes_write, agent_notes_append, agent_notes_clear
`session`	remember, session, history, last time, what did we, earlier, yesterday, last week	session_read, session_search
`ae_tasks`	ae task, kanban, board	ae_task_list
`claude`	claude, allow directory, permissions	claude_allow_dir

Implementation

In openai_orchestrator.py, before the ReAct loop starts:

def _classify_tool_categories(user_message: str) -> list[str]:
    """Classify a user message into tool categories based on keywords.
    
    Returns a list of category names whose tools should be included.
    Returns empty list if no categories match (pure chat).
    """
    message_lower = user_message.lower()
    
    category_keywords = {
        "web":          ["search", "look up", "what is", "who is", "weather",
                         "forecast", "news", "find on", "google", "website",
                         "article", "research", "temperature"],
        "web_post":     ["post to", "send to", "webhook", "trigger webhook"],
        "file":         ["read file", "show file", "list file", "directory",
                         "grep", "search in", "find in", "diff", "compare",
                         "syntax check", "open file"],
        "git":          ["git", "commit", "branch", "pulled", "merged",
                         "repository", "repo"],
        "system":       ["restart", "update", "status", "logs", "deploy",
                         "run command", "shell", "is it running", "health"],
        "tasks":        ["task", "todo", "to-do", "to do", "add task",
                         "create task", "pending", "what's on my list"],
        "cron":         ["schedule", "cron", "every day", "every week",
                         "recurring", "automate", "job"],
        "reminders":    ["remind", "reminder", "remember", "don't forget"],
        "scratchpad":   ["scratch", "scratchpad", "working note", "jot down",
                         "notepad"],
        "ha":           ["home assistant", "light", "thermostat", "turn on",
                         "turn off", "switch", "sensor", "temperature in",
                         "kitchen", "bedroom", "garage"],
        "aether":       ["journal", "aether journal", "note entry", "log entry",
                         "search journal", "ae_journal"],
        "aether_db":    ["database", "query", "sql", "select", "db", "table",
                         "schema", "maria", "run query"],
        "notifications":["notify", "push notification", "send email", "email",
                         "talk message", "nextcloud"],
        "agents":       ["spawn", "sub-agent", "delegate", "spawn agent"],
        "notes":        ["agent notes", "private notes", "my notes",
                         "agent_notes"],
        "session":      ["remember", "session", "history", "last time",
                         "what did we", "earlier", "yesterday", "last week",
                         "previously"],
        "ae_tasks":     ["ae task", "kanban", "board", "ae_task"],
        "claude":       ["claude allow", "claude directory"],
    }
    
    matched = []
    for category, keywords in category_keywords.items():
        if any(kw in message_lower for kw in keywords):
            matched.append(category)
    
    return matched

Then at the orchestration entry point, after determining the role's base tool list (Phase 1), apply the keyword filter:

# Phase 1: Get role's base tool list
role_tools = get_role_config(username, role).get("tools")

# Phase 2: Dynamically narrow based on message content
matched_categories = _classify_tool_categories(user_message)
if matched_categories:
    category_tool_map = { ... }  # defined at module level
    dynamic_tools = []
    for cat in matched_categories:
        dynamic_tools.extend(category_tool_map.get(cat, []))
    # Intersect with role_tools so we never grant more than the role allows
    if role_tools:
        dynamic_tools = [t for t in dynamic_tools if t in role_tools]
    active_tools = get_openai_tools_for_role(
        role=user_role,
        tool_list=dynamic_tools or None
    )
else:
    # No keywords matched — likely causal chat route to /chat
    # or use empty tool list
    active_tools = []

Edge Cases to Handle

Multiple categories match: Union all matched tool sets. The for cat in matched_categories loop handles this naturally.
No categories match: Return empty tool set. The orchestrator loop won't start — this effectively becomes a chat message without incurring the schema tax. If the LLM needs tools anyway, it will respond with a natural language request, and the user can rephrase.
Ambiguous short messages: "Hey can you check something" — matches nothing, falls through to empty tools. This is correct behavior; the LLM will ask "what do you want me to check?" and the next message will have a clear intent.
Over-broad keywords: "search" in "search journals" could trigger both web and aether. The union handles this — both categories' tools are included, which is what you want.

File to change

cortex/openai_orchestrator.py — add _classify_tool_categories() function and wire it into the orchestration entry point before the ReAct loop

Phase 3 — Cache Tool Schema per Session

Effort: Medium. Impact: Medium.

What

The tool schema doesn't change between rounds of the same session for a given role. After Phase 2 narrows it to, say, 5 tools, those 5 tool definitions are identical every round. Cache them.

Implementation

Add a session-scoped cache in openai_orchestrator.py:

# Module-level cache: key = f"{session_id}:{role}:{sorted_tool_list}"
_tool_schema_cache: dict[str, list[dict]] = {}

def _get_cached_tool_schema(session_id: str, role: str, tool_list: list[str] | None) -> list[dict]:
    key = f"{session_id}:{role}:{sorted(tool_list) if tool_list else 'all'}"
    if key in _tool_schema_cache:
        return _tool_schema_cache[key]
    schemas = get_openai_tools_for_role(role=role, tool_list=tool_list)
    _tool_schema_cache[key] = schemas
    return schemas

Invalidation: Cache key includes the tool list, so if the dynamic classifier returns different categories on the next message, it gets a fresh cache entry. No explicit invalidation needed.

File to change

cortex/openai_orchestrator.py — add cache dict and lookup before calling get_openai_tools_for_role()

Phase 4 — Reduce Default Max Rounds

Effort: Trivial. Impact: Low-to-medium.

What

Most requests resolve in 1-3 tool calls. A global cap of 10 means up to 7 wasted schema transmissions on edge cases.

Implementation

Make max_rounds configurable per model in the model registry (it already exists in some model configs — see home/brian/model_registry.json line 42).
Read it from the model config during orchestration instead of using the global .env value.
Lower the default from 10 to 5.

Files to change

cortex/.env — change ORCHESTRATOR_MAX_ROUNDS=10 to =5
cortex/openai_orchestrator.py — read per-model max_rounds from model_cfg instead of only from settings

Phase 5 — UI Improvements (Independent)

Effort: Small. Impact: Medium (UX).

What

Make the tool mode indicator more obvious so the user can quickly tell whether they're incurring the tool tax.

Ideas

Change ⚡ color: green when tools are on, gray when off
Swap icon: ⚡ (tools) vs. 💬 (chat only)
Add tooltip: "Tools enabled — all 45 tool schemas sent with each message"
Optional: add a "Quick Question" button that sends to /chat directly, bypassing the orchestrator entirely

Files to change

Svelte UI component — likely ChatInput.svelte or the chat mode toggle component

Recommended Execution Order

Phase 1 (role filtering) — foundation. Defines the base tool set per role.
Phase 2 (keyword routing) — the big one. Slashes 45 tools → 3-8 for the vast majority of messages. Builds on Phase 1's role filtering.
Phase 4 (lower max_rounds) — trivial change, do alongside Phase 2.
Phase 3 (schema caching) — more involved, compounds savings from Phase 2.
Phase 5 (UI) — independent UX polish, can be done any time.

Quick Win Path (Recommended First Session)

Phases 1 + 2 + 4 can be done in a single Claude Code session. They're all in openai_orchestrator.py and model_registry.py — the same few files. Estimated effort: 45-60 minutes of coding.

Phase 3 (caching) is a separate, focused session afterward.

Appendix A: Code Locations (from grep audit 2026-05-15)

What	File	Line
`get_openai_tools_for_role` definition	`cortex/tools.py`	~540
Call site (decides active_tools)	`cortex/openai_orchestrator.py`	~449
`_run_from_messages()` tool loop	`cortex/openai_orchestrator.py`	~260
Role config tools field	`cortex/model_registry.py`	~487
`get_role_config()`	`cortex/model_registry.py`	~473
`save_role_config()` (tools allow-list)	`cortex/model_registry.py`	~455
Global `ORCHESTRATOR_MAX_ROUNDS`	`cortex/.env`	35
`REQUIRED_ROLES`	`cortex/model_registry.py`	163
`DEFINED_ROLES` config	`cortex/config.py`	80
Per-model `max_rounds` example	`home/brian/model_registry.json`	42

Appendix B: Token Savings Estimate

Scenario	Before (per round)	After Phase 1	After Phase 1+2	After All Phases
"What's the weather?"	~9K tokens	~5K (25 tools)	~600 (3 web tools)	~600 (cached)
"Good morning"	~9K tokens	~5K (25 tools)	0 (routed to chat)	0
"Turn off kitchen lights"	~9K tokens	~5K (25 tools)	~600 (3 HA tools)	~600 (cached)
"Search journals for X"	~9K tokens	~5K (25 tools)	~2K (10 aether tools)	~2K (cached)
"Create a task"	~9K tokens	~5K (25 tools)	~800 (4 task tools)	~800 (cached)
"Run a SQL query"	~9K tokens	~5K (25 tools)	~600 (3 db tools)	~600 (cached)

At 3 rounds per request and 50 requests/day, that's roughly 1.3M tokens/day saved vs. ~13K/day after all optimizations — a 99% reduction for casual chat, ~90% for most tool-using queries.

16 KiB Raw Blame History

PLAN — Reduce Tool Schema Overhead in Cortex

Background

Phase 1 — Role-Based Tool Filtering (Foundation)

What

Implementation

Files to change

Phase 2 — Dynamic Keyword-Based Tool Routing (High Impact)

What

The Hybrid Stack

Keyword Category Map

Implementation

Edge Cases to Handle

File to change

Phase 3 — Cache Tool Schema per Session

What

Implementation

File to change

Phase 4 — Reduce Default Max Rounds

What

Implementation

Files to change

Phase 5 — UI Improvements (Independent)

What

Ideas

Files to change

Recommended Execution Order

Quick Win Path (Recommended First Session)

Appendix A: Code Locations (from grep audit 2026-05-15)

Appendix B: Token Savings Estimate

16 KiB

Raw Blame History