# PLAN: Reduce Tool Schema Overhead in Cortex

**Goal:** Eliminate the per-round, per-message transmission of all 45 tool definitions. Drop overhead from ~8K-10K tokens per round to near zero for casual chat, and to a relevant subset for orchestrated work.

**Status:** Draft, ready for Claude Code implementation.

---

## Background

Every orchestrated (⚡ toggled on) message triggers a ReAct tool loop. The full 45-tool schema is rebuilt and transmitted **on every round of every call**, including rounds where no tool is invoked and messages where no tool is needed at all. This wastes thousands of tokens per interaction.

The architecture already has the building blocks for a fix: role configs support a `tools` allow-list, and `get_openai_tools_for_role()` already accepts filtering parameters. They're just not being wired together effectively.

---

## Phase 1: Role-Based Tool Filtering (Foundation)

**Effort:** Small. **Impact:** High.

### What

Define which tools each role actually needs, then enforce the filtering so roles only receive their relevant tool subset.

### Implementation

**1. Audit every role and define tool lists.**

| Role | Tools needed | Approx count |
|------|-------------|-------------|
| `chat` | None (zero tools; should never be in the orchestration loop) | 0 |
| `orchestrator` | web, file (admin), shell (admin), tasks, cron, reminders, scratchpad, Aether journals, agent notes, system (admin), spawn_agent, HA, ae_db, git, file_diff, file_syntax_check, notifications (admin) | 25-30 |
| `distill` | None (pure text processing) | 0 |
| `coder` | file (admin), shell (admin), git, file_diff, file_syntax_check | 8-10 |
| `research` | web_search, web_read, http_fetch | 3 |
| `admin` (role) | All 45 (admin-level access) | 45 |

**2. Store tool lists per role in `config.yaml` or the model registry defaults.**

The role config already has a `tools` field; populate it with the lists above.

**3. Enforce in `get_openai_tools_for_role()`.**

The function is called from `openai_orchestrator.py` around line 451. Currently, if `tools` is empty or missing, it returns all tools. Change it so that:

- If role config has a `tools` list → return only those tools
- If role config has `tools: false` → return empty list
- If role config has no `tools` field → return all (backward compat)

At the call site (`_run_from_messages`), pass the role's tool allow-list into `get_openai_tools_for_role()` via the `tool_list` parameter that already exists.

### Files to change

- `cortex/openai_orchestrator.py`: wire role config `tools` into the call to `get_openai_tools_for_role()`
- `cortex/model_registry.py`: ensure `get_role_config()` returns the `tools` field (it does already, line 487)
- `cortex/config.py` or `home/{user}/model_registry.json`: define the tool lists per default role

---

## Phase 2: Dynamic Keyword-Based Tool Routing (High Impact)

**Effort:** Small. **Impact:** Very High.

### What

Before entering the ReAct tool loop, scan the user's message with a lightweight keyword classifier to determine which tool categories are relevant. Only include tools from matched categories, typically 3-8 tools instead of 45.

This is the **core optimization.** For the 80%+ of messages that only need a narrow set of tools (or none at all), this eliminates the bulk of schema overhead on every round.

### The Hybrid Stack

```
User message
    ↓
[1] Role filter (Phase 1): narrows 45 tools → ~25 for orchestrator role
    ↓
[2] Keyword classifier (Phase 2): narrows ~25 → 3-8 relevant tools
    ↓
[3] ReAct loop: only transmitting the relevant subset each round
```

If the keyword classifier matches nothing (e.g. "good morning", "test", "what do you think?"), it returns an empty tool set, effectively routing the message as a pure chat interaction with zero tool overhead.

### Keyword Category Map

Each category maps keywords → tool names. Simple regex/contains matching.
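
One thing to weigh when choosing between "regex" and "contains": plain substring checks over-match, since `"git" in "digit"` and `"db" in "feedback"` are both true in Python. The sketch below shows a word-bounded alternative under stated assumptions. The two-category `CATEGORY_KEYWORDS` excerpt and the helper names `compile_category_patterns` and `classify_with_boundaries` are hypothetical and are not part of Cortex.

```python
import re

# Hypothetical two-category excerpt, used only to demonstrate the matching rule.
CATEGORY_KEYWORDS = {
    "git": ["git", "commit", "repo"],
    "aether_db": ["database", "sql", "db"],
}


def compile_category_patterns(category_keywords):
    """Build one case-insensitive, word-bounded regex per category."""
    patterns = {}
    for category, keywords in category_keywords.items():
        # Longest keywords first so multi-word phrases are tried before their prefixes.
        ordered = sorted(keywords, key=len, reverse=True)
        alternation = "|".join(re.escape(kw) for kw in ordered)
        # Lookarounds require a non-word character (or string edge) on both sides.
        patterns[category] = re.compile(
            rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE
        )
    return patterns


def classify_with_boundaries(message, patterns):
    """Return the categories whose pattern matches a whole word or phrase."""
    return [cat for cat, pat in patterns.items() if pat.search(message)]


PATTERNS = compile_category_patterns(CATEGORY_KEYWORDS)
```

The trade-off is that strict boundaries also stop `commit` from matching `commits`, so inflected forms would have to be listed explicitly or the trailing boundary relaxed. The table that follows lists the categories and keywords either approach would match against.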

| Category | Trigger keywords | Tools included |
|----------|-----------------|---------------|
| `web` | search, google, look up, what is, who is, weather, forecast, temperature, news, article, website, find, research | web_search, web_read, http_fetch |
| `web_post` | post to, send to, webhook, trigger, notify | http_post |
| `file` | read file, show file, open file, list files, directory, grep, find in, search in, diff, compare, syntax check | file_read, file_list, file_write, file_diff, file_grep, file_syntax_check, file_stat |
| `git` | git, commit, branch, pushed, pulled, merge, repo, repository | git_status, git_log, git_diff |
| `system` | restart, update, status, logs, deploy, shell, command, run, health, is it running | cortex_status, cortex_logs, cortex_restart, cortex_update, shell_exec |
| `tasks` | task, todo, to-do, to do, add task, create task, what's on my list, pending | task_list, task_create, task_update, task_complete |
| `cron` | schedule, cron, every day, every week, recurring, automate, job | cron_list, cron_add, cron_remove, cron_toggle |
| `reminders` | remind, reminder, remember, don't forget | reminders_add, reminders_list, reminders_remove, reminders_clear |
| `scratchpad` | scratch, scratchpad, working notes, jot down, notepad | scratch_read, scratch_write, scratch_append, scratch_clear |
| `ha` | home assistant, light, thermostat, turn on, turn off, kitchen, bedroom, switch, sensor, temperature | ha_get_state, ha_get_states, ha_call_service |
| `aether` | journal, aether, note entry, log entry, search journals, ae_ | ae_journal_list, ae_journal_search, ae_journal_entry_read, ae_journal_entries_list, ae_journal_entry_create, ae_journal_entry_update, ae_journal_entry_disable, ae_journal_entry_append, ae_journal_entry_prepend |
| `aether_db` | database, query, sql, select, db, table, schema, maria | ae_db_query, ae_db_describe, ae_db_show_view |
| `notifications` | notify, push, send email, email, message, talk, nextcloud | web_push, email_send, nc_talk_send, nc_talk_history |
| `agents` | spawn, sub-agent, delegate, agent | spawn_agent |
| `notes` | agent notes, private notes, my notes | agent_notes_read, agent_notes_write, agent_notes_append, agent_notes_clear |
| `session` | remember, session, history, last time, what did we, earlier, yesterday, last week | session_read, session_search |
| `ae_tasks` | ae task, kanban, board | ae_task_list |
| `claude` | claude, allow directory, permissions | claude_allow_dir |

### Implementation

In `openai_orchestrator.py`, before the ReAct loop starts:

```python
def _classify_tool_categories(user_message: str) -> list[str]:
    """Classify a user message into tool categories based on keywords.

    Returns a list of category names whose tools should be included.
    Returns empty list if no categories match (pure chat).
    """
    message_lower = user_message.lower()

    category_keywords = {
        "web":          ["search", "look up", "what is", "who is", "weather", "forecast",
                         "news", "find on", "google", "website", "article", "research",
                         "temperature"],
        "web_post":     ["post to", "send to", "webhook", "trigger webhook"],
        "file":         ["read file", "show file", "list file", "directory", "grep",
                         "search in", "find in", "diff", "compare", "syntax check",
                         "open file"],
        "git":          ["git", "commit", "branch", "pulled", "merged", "repository", "repo"],
        "system":       ["restart", "update", "status", "logs", "deploy", "run command",
                         "shell", "is it running", "health"],
        "tasks":        ["task", "todo", "to-do", "to do", "add task", "create task",
                         "pending", "what's on my list"],
        "cron":         ["schedule", "cron", "every day", "every week", "recurring",
                         "automate", "job"],
        "reminders":    ["remind", "reminder", "remember", "don't forget"],
        "scratchpad":   ["scratch", "scratchpad", "working note", "jot down", "notepad"],
        "ha":           ["home assistant", "light", "thermostat", "turn on", "turn off",
                         "switch", "sensor", "temperature in", "kitchen", "bedroom", "garage"],
        "aether":       ["journal", "aether journal", "note entry", "log entry",
                         "search journal", "ae_journal"],
        "aether_db":    ["database", "query", "sql", "select", "db", "table", "schema",
                         "maria", "run query"],
        "notifications": ["notify", "push notification", "send email", "email",
                          "talk message", "nextcloud"],
        "agents":       ["spawn", "sub-agent", "delegate", "spawn agent"],
        "notes":        ["agent notes", "private notes", "my notes", "agent_notes"],
        "session":      ["remember", "session", "history", "last time", "what did we",
                         "earlier", "yesterday", "last week", "previously"],
        "ae_tasks":     ["ae task", "kanban", "board", "ae_task"],
        "claude":       ["claude allow", "claude directory"],
    }

    matched = []
    for category, keywords in category_keywords.items():
        # Plain substring match against the lowercased message
        if any(kw in message_lower for kw in keywords):
            matched.append(category)
    return matched
```

Then at the orchestration entry point, after determining the role's base tool list (Phase 1), apply the keyword filter:

```python
# Phase 1: Get role's base tool list
role_tools = get_role_config(username, role).get("tools")

# Phase 2: Dynamically narrow based on message content
matched_categories = _classify_tool_categories(user_message)

if matched_categories:
    category_tool_map = { ... }  # defined at module level
    dynamic_tools = []
    for cat in matched_categories:
        dynamic_tools.extend(category_tool_map.get(cat, []))
    # Intersect with role_tools so we never grant more than the role allows
    if role_tools is False:
        dynamic_tools = []  # `tools: false` means the role gets no tools
    elif role_tools:
        dynamic_tools = [t for t in dynamic_tools if t in role_tools]
    if dynamic_tools:
        active_tools = get_openai_tools_for_role(role=role, tool_list=dynamic_tools)
    else:
        # Nothing survived the role filter. Do not pass None here: an empty or
        # missing tool_list makes get_openai_tools_for_role() return all tools.
        active_tools = []
else:
    # No keywords matched: likely casual chat. Route to /chat
    # or use an empty tool list.
    active_tools = []
```

### Edge Cases to Handle

1. **Multiple categories match:** Union all matched tool sets. The `for cat in matched_categories` loop handles this naturally.
2. **No categories match:** Return empty tool set. The orchestrator loop won't start, so this effectively becomes a chat message without incurring the schema tax.
   If the LLM needs tools anyway, it will respond with a natural language request, and the user can rephrase.
3. **Ambiguous short messages:** "Hey can you check something" matches nothing and falls through to empty tools. This is correct behavior; the LLM will ask "what do you want me to check?" and the next message will have a clear intent.
4. **Over-broad keywords:** "search" in "search journals" could trigger both `web` and `aether`. The union handles this: both categories' tools are included, which is what you want.

### File to change

- `cortex/openai_orchestrator.py`: add `_classify_tool_categories()` function and wire it into the orchestration entry point before the ReAct loop

---

## Phase 3: Cache Tool Schema per Session

**Effort:** Medium. **Impact:** Medium.

### What

The tool schema doesn't change between rounds of the same session for a given role. After Phase 2 narrows it to, say, 5 tools, those 5 tool definitions are identical every round. Cache them.

### Implementation

Add a session-scoped cache in `openai_orchestrator.py`:

```python
# Module-level cache: key = f"{session_id}:{role}:{sorted_tool_list}"
_tool_schema_cache: dict[str, list[dict]] = {}


def _get_cached_tool_schema(session_id: str, role: str, tool_list: list[str] | None) -> list[dict]:
    key = f"{session_id}:{role}:{sorted(tool_list) if tool_list else 'all'}"
    if key in _tool_schema_cache:
        return _tool_schema_cache[key]
    schemas = get_openai_tools_for_role(role=role, tool_list=tool_list)
    _tool_schema_cache[key] = schemas
    return schemas
```

Invalidation: Cache key includes the tool list, so if the dynamic classifier returns different categories on the next message, it gets a fresh cache entry. No explicit invalidation needed.

### File to change

- `cortex/openai_orchestrator.py`: add cache dict and lookup before calling `get_openai_tools_for_role()`

---

## Phase 4: Reduce Default Max Rounds

**Effort:** Trivial. **Impact:** Low-to-medium.

### What

Most requests resolve in 1-3 tool calls.
A global cap of 10 means up to 7 wasted schema transmissions on edge cases.

### Implementation

1. Make `max_rounds` configurable per model in the model registry (it already exists in some model configs; see `home/brian/model_registry.json` line 42).
2. Read it from the model config during orchestration instead of using the global `.env` value.
3. Lower the default from 10 to 5.

### Files to change

- `cortex/.env`: change `ORCHESTRATOR_MAX_ROUNDS=10` to `=5`
- `cortex/openai_orchestrator.py`: read per-model `max_rounds` from `model_cfg` instead of only from settings

---

## Phase 5: UI Improvements (Independent)

**Effort:** Small. **Impact:** Medium (UX).

### What

Make the tool mode indicator more obvious so the user can quickly tell whether they're incurring the tool tax.

### Ideas

- Change ⚡ color: green when tools are on, gray when off
- Swap icon: ⚡ (tools) vs. 💬 (chat only)
- Add tooltip: "Tools enabled: all 45 tool schemas sent with each message"
- Optional: add a "Quick Question" button that sends to `/chat` directly, bypassing the orchestrator entirely

### Files to change

- Svelte UI component, likely `ChatInput.svelte` or the chat mode toggle component

---

## Recommended Execution Order

1. **Phase 1** (role filtering): foundation. Defines the base tool set per role.
2. **Phase 2** (keyword routing): **the big one.** Slashes 45 tools → 3-8 for the vast majority of messages. Builds on Phase 1's role filtering.
3. **Phase 4** (lower max_rounds): trivial change, do alongside Phase 2.
4. **Phase 3** (schema caching): more involved, compounds savings from Phase 2.
5. **Phase 5** (UI): independent UX polish, can be done any time.

### Quick Win Path (Recommended First Session)

Phases 1 + 2 + 4 can be done in a single Claude Code session. They're all in `openai_orchestrator.py` and `model_registry.py`, the same few files. Estimated effort: 45-60 minutes of coding.

Phase 3 (caching) is a separate, focused session afterward.
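
For the Phase 4 change in that first session, the lookup order (per-model value first, then the global setting, then the new default of 5) might be sketched as follows. The helper name `resolve_max_rounds`, treating `model_cfg` as a plain dict, and reading `ORCHESTRATOR_MAX_ROUNDS` from an environment mapping are assumptions for illustration; Cortex may read the global value through its own settings object instead.

```python
import os

DEFAULT_MAX_ROUNDS = 5  # proposed new default (down from 10)


def resolve_max_rounds(model_cfg, env=None):
    """Return the round cap: per-model value, else env value, else the default."""
    env = os.environ if env is None else env
    # Sources in priority order: model registry entry, then the global setting.
    candidates = [model_cfg.get("max_rounds"), env.get("ORCHESTRATOR_MAX_ROUNDS")]
    for value in candidates:
        try:
            rounds = int(value)
        except (TypeError, ValueError):
            continue  # missing or malformed: fall through to the next source
        if rounds > 0:
            return rounds
    return DEFAULT_MAX_ROUNDS
```

Passing the environment as a parameter keeps the function easy to test; in the orchestrator it would be called once per request, before the ReAct loop starts.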

---

## Appendix A: Code Locations (from grep audit 2026-05-15)

| What | File | Line |
|------|------|------|
| `get_openai_tools_for_role` definition | `cortex/tools.py` | ~540 |
| Call site (decides active_tools) | `cortex/openai_orchestrator.py` | ~449 |
| `_run_from_messages()` tool loop | `cortex/openai_orchestrator.py` | ~260 |
| Role config tools field | `cortex/model_registry.py` | ~487 |
| `get_role_config()` | `cortex/model_registry.py` | ~473 |
| `save_role_config()` (tools allow-list) | `cortex/model_registry.py` | ~455 |
| Global `ORCHESTRATOR_MAX_ROUNDS` | `cortex/.env` | 35 |
| `REQUIRED_ROLES` | `cortex/model_registry.py` | 163 |
| `DEFINED_ROLES` config | `cortex/config.py` | 80 |
| Per-model `max_rounds` example | `home/brian/model_registry.json` | 42 |

## Appendix B: Token Savings Estimate

| Scenario | Before (per round) | After Phase 1 | After Phase 1+2 | After All Phases |
|----------|-------------------|--------------|-----------------|-----------------|
| "What's the weather?" | ~9K tokens | ~5K (25 tools) | ~600 (3 web tools) | ~600 (cached) |
| "Good morning" | ~9K tokens | ~5K (25 tools) | 0 (routed to chat) | 0 |
| "Turn off kitchen lights" | ~9K tokens | ~5K (25 tools) | ~600 (3 HA tools) | ~600 (cached) |
| "Search journals for X" | ~9K tokens | ~5K (25 tools) | ~2K (9 aether + 3 web tools) | ~2K (cached) |
| "Create a task" | ~9K tokens | ~5K (25 tools) | ~800 (4 task tools) | ~800 (cached) |
| "Run a SQL query" | ~9K tokens | ~5K (25 tools) | ~600 (3 db tools) | ~600 (cached) |

At 3 rounds per request and 50 requests/day, the current schema overhead is roughly **1.35M tokens/day** (9K × 3 × 50). After all optimizations, casual chat costs nothing and a typical tool-using request costs ~600 tokens per round, or about **90K tokens/day** if every request used tools (600 × 3 × 50). That is effectively a 100% reduction for casual chat and roughly 90% for most tool-using queries.
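
As a back-of-envelope check, the daily totals can be recomputed from the per-round estimates in the table above. This is a minimal sketch: the inputs are the table's approximations rather than measurements, and the helper names `daily_schema_tokens` and `reduction_pct` are hypothetical.

```python
ROUNDS_PER_REQUEST = 3
REQUESTS_PER_DAY = 50


def daily_schema_tokens(tokens_per_round, rounds=ROUNDS_PER_REQUEST, requests=REQUESTS_PER_DAY):
    """Schema tokens sent per day if every request costs tokens_per_round each round."""
    return tokens_per_round * rounds * requests


baseline = daily_schema_tokens(9_000)    # all 45 tools on every round
small_subset = daily_schema_tokens(600)  # e.g. 3 web or 3 HA tools
casual_chat = daily_schema_tokens(0)     # routed to chat, no tools sent


def reduction_pct(after, before=baseline):
    """Percentage reduction relative to the baseline, rounded to one decimal."""
    return round(100 * (1 - after / before), 1)


print(baseline, small_subset, reduction_pct(small_subset), reduction_pct(casual_chat))
```

A real day is a mix of these cases, so the actual total would land somewhere between the casual-chat and tool-using figures.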