Tool schema optimization (PLAN__Tool_Schema_Optimization.md Phases 1-3):
- model_registry.py: ROLE_DEFAULT_TOOLS - distill gets [], research/coder get narrow tool lists by default; applied in get_role_config() when user hasn't configured a custom list
- openai_orchestrator.py: keyword routing via narrow_tools_by_keywords() - scans user message + last assistant turn; narrows active schemas to matched categories only (e.g. "weather" → 3 web tools instead of 69); zero tools sent for pure chat
- openai_orchestrator.py: _get_cached_tools() - module-level schema cache keyed by (role, sorted_tool_list, risk_params); eliminates redundant schema rebuilds
- openai_orchestrator.py: _TOOL_SCHEMA_OVERHEAD 3000 → 500 tokens (schemas now excluded from the per-call fixed estimate since they're cached separately)
- tools/__init__.py: CATEGORY_TOOL_MAP + _KEYWORD_CATEGORY_MAP + classify_tool_categories() + narrow_tools_by_keywords() - the classifier logic lives here so both orchestrators can share it

aider_run tool (cortex/tools/aider.py):
- Invokes Aider as a subprocess with --message --yes-always --no-pretty --no-stream
- Project aliases: cortex / aether_api / aether_frontend / aether_container
- Auto-injects OpenRouter API key from Cortex model registry (no ~/.env needed)
- background=True fires async + registers in agent_manager; notify=True sends push notification on completion
- admin-only, confirm-required, TOOL_RISK=high
- .gitignore: added .aider.chat.history.md / .aider.input.history / .aider.llm.history

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PLAN — Reduce Tool Schema Overhead in Cortex
Goal: Eliminate the per-round, per-message transmission of all 45 tool definitions. Drop overhead from ~8K-10K tokens per round to near zero for casual chat, and to a relevant subset for orchestrated work.
Status: Draft — ready for Claude Code implementation.
Background
Every orchestrated (⚡ toggled on) message triggers a ReAct tool loop. The full 45-tool schema is rebuilt and transmitted on every round of every call — including rounds where no tool is invoked and messages where no tool is needed at all. This wastes thousands of tokens per interaction.
The architecture already has the building blocks for a fix: role configs support a
tools allow-list, and get_openai_tools_for_role() already accepts filtering
parameters. They're just not being wired together effectively.
Phase 1 — Role-Based Tool Filtering (Foundation)
Effort: Small. Impact: High.
What
Define which tools each role actually needs, then enforce the filtering so roles only receive their relevant tool subset.
Implementation
1. Audit every role and define tool lists.
| Role | Tools needed | Approx count |
|---|---|---|
| chat | None (zero tools; should never be in the orchestration loop) | 0 |
| orchestrator | web, file (admin), shell (admin), tasks, cron, reminders, scratchpad, Aether journals, agent notes, system (admin), spawn_agent, HA, ae_db, git, file_diff, file_syntax_check, notifications (admin) | 25-30 |
| distill | None (pure text processing) | 0 |
| coder | file (admin), shell (admin), git, file_diff, file_syntax_check | 8-10 |
| research | web_search, web_read, http_fetch | 3 |
| admin (role) | All 45 (admin-level access) | 45 |
2. Store tool lists per role in config.yaml or the model registry defaults.
The role config already has a tools field — populate it with the lists above.
3. Enforce in get_openai_tools_for_role().
The function is called from openai_orchestrator.py around line 451. Currently if
tools is empty/missing it returns all tools. Change so that:
- If role config has a tools list → return only those tools
- If role config has tools: false → return empty list
- If role config has no tools field → return all (backward compat)
At the call site (_run_from_messages), pass the role's tool allow-list into
get_openai_tools_for_role() via the tool_list parameter that already exists.
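The three rules above can be sketched as a small resolver. This is a minimal sketch under stated assumptions, not the actual Cortex code: `resolve_allowed_tools` and `ALL_TOOLS` are hypothetical names, and the real `get_openai_tools_for_role()` signature may differ.

```python
# Hypothetical stand-in for the full registered tool list.
ALL_TOOLS = ["web_search", "web_read", "http_fetch", "file_read", "git_status"]


def resolve_allowed_tools(role_cfg: dict, all_tools: list[str] = ALL_TOOLS) -> list[str]:
    """Map a role config's `tools` field to the tool names the role may use."""
    tools = role_cfg.get("tools")
    if tools is None:
        # No `tools` field: keep the old behaviour and allow everything.
        return list(all_tools)
    if tools is False:
        # Explicit opt-out: the role gets no tools at all.
        return []
    # Allow-list: keep only names that are actually registered, in registry order.
    allowed = set(tools)
    return [name for name in all_tools if name in allowed]
```

Note that an empty list and `False` both resolve to "no tools" here, while a missing field resolves to "all tools"; keeping those cases distinct is what preserves backward compatibility.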
Files to change
- cortex/openai_orchestrator.py: wire role config tools into the call to get_openai_tools_for_role()
- cortex/model_registry.py: ensure get_role_config() returns the tools field (it does already, line 487)
- cortex/config.py or home/{user}/model_registry.json: define the tool lists per default role
Phase 2 — Dynamic Keyword-Based Tool Routing (High Impact)
Effort: Small. Impact: Very High.
What
Before entering the ReAct tool loop, scan the user's message with a lightweight keyword classifier to determine which tool categories are relevant. Only include tools from matched categories — typically 3-8 tools instead of 45.
This is the core optimization. For the 80%+ of messages that only need a narrow set of tools (or none at all), this eliminates the bulk of schema overhead on every round.
The Hybrid Stack
User message
↓
[1] Role filter (Phase 1) — narrows 45 tools → ~25 for orchestrator role
↓
[2] Keyword classifier (Phase 2) — narrows ~25 → 3-8 relevant tools
↓
[3] ReAct loop — only transmitting the relevant subset each round
If the keyword classifier matches nothing (e.g. "good morning", "test", "what do you think?"), it returns an empty tool set — effectively routing the message as a pure chat interaction with zero tool overhead.
Keyword Category Map
Each category maps keywords → tool names. Simple regex/contains matching.
| Category | Trigger keywords | Tools included |
|---|---|---|
| web | search, google, look up, what is, who is, weather, forecast, temperature, news, article, website, find, research | web_search, web_read, http_fetch |
| web_post | post to, send to, webhook, trigger, notify | http_post |
| file | read file, show file, open file, list files, directory, grep, find in, search in, diff, compare, syntax check | file_read, file_list, file_write, file_diff, file_grep, file_syntax_check, file_stat |
| git | git, commit, branch, pushed, pulled, merge, repo, repository | git_status, git_log, git_diff |
| system | restart, update, status, logs, deploy, shell, command, run, health, is it running | cortex_status, cortex_logs, cortex_restart, cortex_update, shell_exec |
| tasks | task, todo, to-do, to do, add task, create task, what's on my list, pending | task_list, task_create, task_update, task_complete |
| cron | schedule, cron, every day, every week, recurring, automate, job | cron_list, cron_add, cron_remove, cron_toggle |
| reminders | remind, reminder, remember, don't forget | reminders_add, reminders_list, reminders_remove, reminders_clear |
| scratchpad | scratch, scratchpad, working notes, jot down, notepad | scratch_read, scratch_write, scratch_append, scratch_clear |
| ha | home assistant, light, thermostat, turn on, turn off, kitchen, bedroom, switch, sensor, temperature | ha_get_state, ha_get_states, ha_call_service |
| aether | journal, aether, note entry, log entry, search journals, ae_ | ae_journal_list, ae_journal_search, ae_journal_entry_read, ae_journal_entries_list, ae_journal_entry_create, ae_journal_entry_update, ae_journal_entry_disable, ae_journal_entry_append, ae_journal_entry_prepend |
| aether_db | database, query, sql, select, db, table, schema, maria | ae_db_query, ae_db_describe, ae_db_show_view |
| notifications | notify, push, send email, email, message, talk, nextcloud | web_push, email_send, nc_talk_send, nc_talk_history |
| agents | spawn, sub-agent, delegate, agent | spawn_agent |
| notes | agent notes, private notes, my notes | agent_notes_read, agent_notes_write, agent_notes_append, agent_notes_clear |
| session | remember, session, history, last time, what did we, earlier, yesterday, last week | session_read, session_search |
| ae_tasks | ae task, kanban, board | ae_task_list |
| claude | claude, allow directory, permissions | claude_allow_dir |
Implementation
In openai_orchestrator.py, before the ReAct loop starts:
    def _classify_tool_categories(user_message: str) -> list[str]:
        """Classify a user message into tool categories based on keywords.

        Returns a list of category names whose tools should be included.
        Returns empty list if no categories match (pure chat).
        """
        message_lower = user_message.lower()
        category_keywords = {
            "web": ["search", "look up", "what is", "who is", "weather",
                    "forecast", "news", "find on", "google", "website",
                    "article", "research", "temperature"],
            "web_post": ["post to", "send to", "webhook", "trigger webhook"],
            "file": ["read file", "show file", "list file", "directory",
                     "grep", "search in", "find in", "diff", "compare",
                     "syntax check", "open file"],
            "git": ["git", "commit", "branch", "pulled", "merged",
                    "repository", "repo"],
            "system": ["restart", "update", "status", "logs", "deploy",
                       "run command", "shell", "is it running", "health"],
            "tasks": ["task", "todo", "to-do", "to do", "add task",
                      "create task", "pending", "what's on my list"],
            "cron": ["schedule", "cron", "every day", "every week",
                     "recurring", "automate", "job"],
            "reminders": ["remind", "reminder", "remember", "don't forget"],
            "scratchpad": ["scratch", "scratchpad", "working note", "jot down",
                           "notepad"],
            "ha": ["home assistant", "light", "thermostat", "turn on",
                   "turn off", "switch", "sensor", "temperature in",
                   "kitchen", "bedroom", "garage"],
            "aether": ["journal", "aether journal", "note entry", "log entry",
                       "search journal", "ae_journal"],
            "aether_db": ["database", "query", "sql", "select", "db", "table",
                          "schema", "maria", "run query"],
            "notifications": ["notify", "push notification", "send email", "email",
                              "talk message", "nextcloud"],
            "agents": ["spawn", "sub-agent", "delegate", "spawn agent"],
            "notes": ["agent notes", "private notes", "my notes",
                      "agent_notes"],
            "session": ["remember", "session", "history", "last time",
                        "what did we", "earlier", "yesterday", "last week",
                        "previously"],
            "ae_tasks": ["ae task", "kanban", "board", "ae_task"],
            "claude": ["claude allow", "claude directory"],
        }
        matched = []
        for category, keywords in category_keywords.items():
            if any(kw in message_lower for kw in keywords):
                matched.append(category)
        return matched
Then at the orchestration entry point, after determining the role's base tool list (Phase 1), apply the keyword filter:
    # Phase 1: Get role's base tool list (None means no allow-list is configured)
    role_tools = get_role_config(username, role).get("tools")

    # Phase 2: Dynamically narrow based on message content
    matched_categories = _classify_tool_categories(user_message)
    category_tool_map = { ... }  # defined at module level
    dynamic_tools = []
    for cat in matched_categories:
        dynamic_tools.extend(category_tool_map.get(cat, []))

    # Intersect with role_tools so we never grant more than the role allows.
    # Test "is not None" so an empty list (or tools: false) still blocks everything.
    if role_tools is not None:
        allowed = set(role_tools or [])
        dynamic_tools = [t for t in dynamic_tools if t in allowed]

    if dynamic_tools:
        active_tools = get_openai_tools_for_role(
            role=user_role,
            tool_list=dynamic_tools
        )
    else:
        # No keywords matched, or the role allows none of the matched tools:
        # likely casual chat, so route to /chat or use an empty tool list.
        # Do not pass tool_list=None here; that would fall back to all tools.
        active_tools = []
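To make the union-then-intersect step concrete, here is a self-contained sketch. `CATEGORY_TOOL_MAP` below holds only three rows copied from the category table, and `tools_for_categories` is a hypothetical helper name; the real map would be defined at module level and cover every category.

```python
# Three rows from the category table; the real map would cover every category.
CATEGORY_TOOL_MAP = {
    "web": ["web_search", "web_read", "http_fetch"],
    "git": ["git_status", "git_log", "git_diff"],
    "ha": ["ha_get_state", "ha_get_states", "ha_call_service"],
}


def tools_for_categories(categories, role_tools=None):
    """Union the tools of all matched categories, then cap by the role allow-list.

    role_tools=None means the role has no allow-list; an empty list means the
    role may use no tools.
    """
    selected = []
    for cat in categories:
        for tool in CATEGORY_TOOL_MAP.get(cat, []):
            if tool not in selected:  # de-duplicate while keeping order
                selected.append(tool)
    if role_tools is not None:
        allowed = set(role_tools)
        selected = [t for t in selected if t in allowed]
    return selected
```

For example, `tools_for_categories(["web"], role_tools=["web_search", "git_log"])` yields only `["web_search"]`: the keyword match proposes the web tools, and the role allow-list removes the ones the role may not use.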
Edge Cases to Handle
- Multiple categories match: Union all matched tool sets. The for cat in matched_categories loop handles this naturally.
- No categories match: Return empty tool set. The orchestrator loop won't start, so this effectively becomes a chat message without incurring the schema tax. If the LLM needs tools anyway, it will respond with a natural language request, and the user can rephrase.
- Ambiguous short messages: "Hey can you check something" matches nothing and falls through to empty tools. This is correct behavior; the LLM will ask "what do you want me to check?" and the next message will have a clear intent.
- Over-broad keywords: "search" in "search journals" could trigger both web and aether. The union handles this: both categories' tools are included, which is what you want.
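Plain substring checks also fire inside longer words: "git" matches "digit", "db" matches "feedback", and "repo" matches "report". One possible mitigation, which this plan does not itself prescribe, is to match keywords on word boundaries with a regular expression. The sketch below uses a two-category subset of the keyword map and a hypothetical function name, `classify_with_boundaries`.

```python
import re

# Subset of the keyword map, for illustration only.
CATEGORY_KEYWORDS = {
    "git": ["git", "commit", "branch", "repo"],
    "aether_db": ["database", "query", "sql", "db"],
}


def _compile(keywords):
    # \b anchors each keyword at word boundaries; re.escape guards characters
    # such as "'" or "-" that appear in some keywords.
    alternation = "|".join(re.escape(kw) for kw in keywords)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


_PATTERNS = {cat: _compile(kws) for cat, kws in CATEGORY_KEYWORDS.items()}


def classify_with_boundaries(message: str) -> list[str]:
    """Return the categories whose keywords appear as whole words."""
    return [cat for cat, pattern in _PATTERNS.items() if pattern.search(message)]
```

The trade-off is that inflected forms such as "commits" no longer match "commit", so plural or stem variants would need to be listed explicitly.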
File to change
- cortex/openai_orchestrator.py: add _classify_tool_categories() function and wire it into the orchestration entry point before the ReAct loop
Phase 3 — Cache Tool Schema per Session
Effort: Medium. Impact: Medium.
What
The tool schema doesn't change between rounds of the same session for a given role. After Phase 2 narrows it to, say, 5 tools, those 5 tool definitions are identical every round. Cache them.
Implementation
Add a session-scoped cache in openai_orchestrator.py:
    # Module-level cache: key = f"{session_id}:{role}:{sorted_tool_list}"
    _tool_schema_cache: dict[str, list[dict]] = {}

    def _get_cached_tool_schema(session_id: str, role: str, tool_list: list[str] | None) -> list[dict]:
        key = f"{session_id}:{role}:{sorted(tool_list) if tool_list else 'all'}"
        if key in _tool_schema_cache:
            return _tool_schema_cache[key]
        schemas = get_openai_tools_for_role(role=role, tool_list=tool_list)
        _tool_schema_cache[key] = schemas
        return schemas
Invalidation: Cache key includes the tool list, so if the dynamic classifier returns different categories on the next message, it gets a fresh cache entry. No explicit invalidation needed.
File to change
- cortex/openai_orchestrator.py: add cache dict and lookup before calling get_openai_tools_for_role()
Phase 4 — Reduce Default Max Rounds
Effort: Trivial. Impact: Low-to-medium.
What
Most requests resolve in 1-3 tool calls. A global cap of 10 means up to 7 wasted schema transmissions on edge cases.
Implementation
- Make max_rounds configurable per model in the model registry (it already exists in some model configs; see home/brian/model_registry.json line 42).
- Read it from the model config during orchestration instead of using the global .env value.
- Lower the default from 10 to 5.
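The lookup order described above might look like the following sketch. The `max_rounds` key and the `ORCHESTRATOR_MAX_ROUNDS` variable come from this plan; `resolve_max_rounds` and the shape of `model_cfg` are assumptions made for illustration.

```python
import os

DEFAULT_MAX_ROUNDS = 5  # proposed new default (down from 10)


def resolve_max_rounds(model_cfg: dict, env=None) -> int:
    """Prefer the per-model value, then the env setting, then the default."""
    env = os.environ if env is None else env
    for candidate in (model_cfg.get("max_rounds"), env.get("ORCHESTRATOR_MAX_ROUNDS")):
        try:
            value = int(candidate)
        except (TypeError, ValueError):
            continue  # missing or malformed: try the next source
        if value > 0:
            return value
    return DEFAULT_MAX_ROUNDS
```

Passing `env` explicitly keeps the function easy to test; in production code it would fall back to the process environment.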
Files to change
- cortex/.env: change ORCHESTRATOR_MAX_ROUNDS=10 to =5
- cortex/openai_orchestrator.py: read per-model max_rounds from model_cfg instead of only from settings
Phase 5 — UI Improvements (Independent)
Effort: Small. Impact: Medium (UX).
What
Make the tool mode indicator more obvious so the user can quickly tell whether they're incurring the tool tax.
Ideas
- Change ⚡ color: green when tools are on, gray when off
- Swap icon: ⚡ (tools) vs. 💬 (chat only)
- Add tooltip: "Tools enabled — all 45 tool schemas sent with each message"
- Optional: add a "Quick Question" button that sends to /chat directly, bypassing the orchestrator entirely
Files to change
- Svelte UI component: likely ChatInput.svelte or the chat mode toggle component
Recommended Execution Order
- Phase 1 (role filtering) — foundation. Defines the base tool set per role.
- Phase 2 (keyword routing) — the big one. Slashes 45 tools → 3-8 for the vast majority of messages. Builds on Phase 1's role filtering.
- Phase 4 (lower max_rounds) — trivial change, do alongside Phase 2.
- Phase 3 (schema caching) — more involved, compounds savings from Phase 2.
- Phase 5 (UI) — independent UX polish, can be done any time.
Quick Win Path (Recommended First Session)
Phases 1 + 2 + 4 can be done in a single Claude Code session. They're all in
openai_orchestrator.py and model_registry.py — the same few files. Estimated
effort: 45-60 minutes of coding.
Phase 3 (caching) is a separate, focused session afterward.
Appendix A: Code Locations (from grep audit 2026-05-15)
| What | File | Line |
|---|---|---|
| get_openai_tools_for_role definition | cortex/tools.py | ~540 |
| Call site (decides active_tools) | cortex/openai_orchestrator.py | ~449 |
| _run_from_messages() tool loop | cortex/openai_orchestrator.py | ~260 |
| Role config tools field | cortex/model_registry.py | ~487 |
| get_role_config() | cortex/model_registry.py | ~473 |
| save_role_config() (tools allow-list) | cortex/model_registry.py | ~455 |
| Global ORCHESTRATOR_MAX_ROUNDS | cortex/.env | 35 |
| REQUIRED_ROLES | cortex/model_registry.py | 163 |
| DEFINED_ROLES config | cortex/config.py | 80 |
| Per-model max_rounds example | home/brian/model_registry.json | 42 |
Appendix B: Token Savings Estimate
| Scenario | Before (per round) | After Phase 1 | After Phase 1+2 | After All Phases |
|---|---|---|---|---|
| "What's the weather?" | ~9K tokens | ~5K (25 tools) | ~600 (3 web tools) | ~600 (cached) |
| "Good morning" | ~9K tokens | ~5K (25 tools) | 0 (routed to chat) | 0 |
| "Turn off kitchen lights" | ~9K tokens | ~5K (25 tools) | ~600 (3 HA tools) | ~600 (cached) |
| "Search journals for X" | ~9K tokens | ~5K (25 tools) | ~2K (10 aether tools) | ~2K (cached) |
| "Create a task" | ~9K tokens | ~5K (25 tools) | ~800 (4 task tools) | ~800 (cached) |
| "Run a SQL query" | ~9K tokens | ~5K (25 tools) | ~600 (3 db tools) | ~600 (cached) |
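At 3 rounds per request and 50 requests/day, the baseline schema overhead is roughly 1.35M tokens/day (9K × 3 × 50), versus an estimated ~13K/day after all optimizations. That is about a 99% reduction when traffic is mostly casual chat, and ~90% for most tool-using queries (a 600-token subset costs 600 × 3 × 50 = 90K/day).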
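The daily figures follow from tokens per round × rounds per request × requests per day. A quick check of that arithmetic, using the per-round estimates from the table and the stated 3 rounds and 50 requests per day:

```python
ROUNDS_PER_REQUEST = 3
REQUESTS_PER_DAY = 50


def daily_tokens(tokens_per_round: int) -> int:
    """Schema tokens sent per day at a given per-round schema cost."""
    return tokens_per_round * ROUNDS_PER_REQUEST * REQUESTS_PER_DAY


before = daily_tokens(9_000)   # all tools sent every round
web_only = daily_tokens(600)   # 3 web tools after Phases 1+2
chat_only = daily_tokens(0)    # no tools for pure chat

print(before)                                # 1350000
print(web_only)                              # 90000
print(round(100 * (1 - web_only / before)))  # 93
print(chat_only)                             # 0
```

The actual daily total depends on the mix of chat and tool-using messages, which is not measured here.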