feat: tool schema optimization, keyword routing, aider_run coding agent

Tool schema optimization (PLAN__Tool_Schema_Optimization.md Phases 1-3):
- model_registry.py: ROLE_DEFAULT_TOOLS — distill gets [], research/coder get
  narrow tool lists by default; applied in get_role_config() when user hasn't
  configured a custom list
- openai_orchestrator.py: keyword routing via narrow_tools_by_keywords() — scans
  user message + last assistant turn; narrows active schemas to matched
  categories only (e.g. "weather" → 3 web tools instead of 69); zero tools sent
  for pure chat
- openai_orchestrator.py: _get_cached_tools() — module-level schema cache keyed
  by (role, sorted_tool_list, risk_params); eliminates redundant schema rebuilds
- openai_orchestrator.py: _TOOL_SCHEMA_OVERHEAD 3000 → 500 tokens (schemas now
  excluded from the per-call fixed estimate since they're cached separately)
- tools/__init__.py: CATEGORY_TOOL_MAP + _KEYWORD_CATEGORY_MAP +
  classify_tool_categories() + narrow_tools_by_keywords() — the classifier logic
  lives here so both orchestrators can share it

aider_run tool (cortex/tools/aider.py):
- Invokes Aider as a subprocess with --message --yes-always --no-pretty --no-stream
- Project aliases: cortex / aether_api / aether_frontend / aether_container
- Auto-injects OpenRouter API key from Cortex model registry (no ~/.env needed)
- background=True fires async + registers in agent_manager; notify=True sends
  push notification on completion
- admin-only, confirm-required, TOOL_RISK=high
- .gitignore: added .aider.chat.history.md / .aider.input.history / .aider.llm.history

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
362
documentation/PLAN__Tool_Schema_Optimization.md
Normal file
@@ -0,0 +1,362 @@
# PLAN — Reduce Tool Schema Overhead in Cortex

**Goal:** Eliminate the per-round, per-message transmission of all 45 tool definitions.
Drop overhead from ~8K-10K tokens per round to near zero for casual chat, and to a
relevant subset for orchestrated work.

**Status:** Draft — ready for Claude Code implementation.

---

## Background

Every orchestrated (⚡ toggled on) message triggers a ReAct tool loop. The full 45-tool
schema is rebuilt and transmitted **on every round of every call** — including rounds
where no tool is invoked and messages where no tool is needed at all. This wastes
thousands of tokens per interaction.

The architecture already has the building blocks for a fix: role configs support a
`tools` allow-list, and `get_openai_tools_for_role()` already accepts filtering
parameters. They're just not being wired together effectively.

---
## Phase 1 — Role-Based Tool Filtering (Foundation)

**Effort:** Small. **Impact:** High.

### What

Define which tools each role actually needs, then enforce the filtering so roles
only receive their relevant tool subset.

### Implementation

**1. Audit every role and define tool lists.**

| Role | Tools needed | Approx count |
|------|-------------|-------------|
| `chat` | None (zero tools — should never be in the orchestration loop) | 0 |
| `orchestrator` | web, file (admin), shell (admin), tasks, cron, reminders, scratchpad, Aether journals, agent notes, system (admin), spawn_agent, HA, ae_db, git, file_diff, file_syntax_check, notifications (admin) | 25-30 |
| `distill` | None (pure text processing) | 0 |
| `coder` | file (admin), shell (admin), git, file_diff, file_syntax_check | 8-10 |
| `research` | web_search, web_read, http_fetch | 3 |
| `admin` (role) | All 45 (admin-level access) | 45 |

**2. Store tool lists per role in `config.yaml` or the model registry defaults.**
The role config already has a `tools` field — populate it with the lists above.

**3. Enforce in `get_openai_tools_for_role()`.**
The function is called from `openai_orchestrator.py` around line 451. Currently if
`tools` is empty/missing it returns all tools. Change so that:

- If role config has a `tools` list → return only those tools
- If role config has `tools: false` → return empty list
- If role config has no `tools` field → return all (backward compat)

At the call site (`_run_from_messages`), pass the role's tool allow-list into
`get_openai_tools_for_role()` via the `tool_list` parameter that already exists.
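
The three cases above can be sketched as a small resolver. This is a minimal illustration, not the project's code: `resolve_tool_allow_list` and `ALL_TOOLS` are hypothetical names, and treating an explicit `None` like a missing field is an assumption made here for backward compatibility.

```python
# Hypothetical stand-in for the full tool registry (assumption for this sketch).
ALL_TOOLS = ["web_search", "web_read", "http_fetch", "file_read", "git_status"]

_MISSING = object()  # sentinel to tell "field absent" apart from falsy values


def resolve_tool_allow_list(role_config: dict) -> list[str]:
    """Map a role config's `tools` field to the tool names to expose."""
    tools = role_config.get("tools", _MISSING)
    if tools is _MISSING or tools is None:
        # No `tools` field: return everything (backward compatibility).
        return list(ALL_TOOLS)
    if tools is False:
        # `tools: false`: the role gets no tools at all.
        return []
    # A list (possibly empty): keep only known tools, in registry order.
    allowed = set(tools)
    return [name for name in ALL_TOOLS if name in allowed]


print(resolve_tool_allow_list({"tools": ["git_status", "web_search"]}))
```

Note that under these rules an empty list means "no tools", which differs from the current behavior where an empty value falls back to all tools.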
### Files to change

- `cortex/openai_orchestrator.py` — wire role config `tools` into the call to
  `get_openai_tools_for_role()`
- `cortex/model_registry.py` — ensure `get_role_config()` returns the `tools` field
  (it does already, line 487)
- `cortex/config.py` or `home/{user}/model_registry.json` — define the tool lists
  per default role

---
## Phase 2 — Dynamic Keyword-Based Tool Routing (High Impact)

**Effort:** Small. **Impact:** Very High.

### What

Before entering the ReAct tool loop, scan the user's message with a lightweight
keyword classifier to determine which tool categories are relevant. Only include
tools from matched categories — typically 3-8 tools instead of 45.

This is the **core optimization.** For the 80%+ of messages that only need a narrow
set of tools (or none at all), this eliminates the bulk of schema overhead on every
round.

### The Hybrid Stack

```
User message
    ↓
[1] Role filter (Phase 1) — narrows 45 tools → ~25 for orchestrator role
    ↓
[2] Keyword classifier (Phase 2) — narrows ~25 → 3-8 relevant tools
    ↓
[3] ReAct loop — only transmitting the relevant subset each round
```

If the keyword classifier matches nothing (e.g. "good morning", "test", "what do you
think?"), it returns an empty tool set — effectively routing the message as a pure
chat interaction with zero tool overhead.

### Keyword Category Map

Each category maps keywords → tool names. Simple regex/contains matching.

| Category | Trigger keywords | Tools included |
|----------|-----------------|---------------|
| `web` | search, google, look up, what is, who is, weather, forecast, temperature, news, article, website, find, research | web_search, web_read, http_fetch |
| `web_post` | post to, send to, webhook, trigger, notify | http_post |
| `file` | read file, show file, open file, list files, directory, grep, find in, search in, diff, compare, syntax check | file_read, file_list, file_write, file_diff, file_grep, file_syntax_check, file_stat |
| `git` | git, commit, branch, pushed, pulled, merge, repo, repository | git_status, git_log, git_diff |
| `system` | restart, update, status, logs, deploy, shell, command, run, health, is it running | cortex_status, cortex_logs, cortex_restart, cortex_update, shell_exec |
| `tasks` | task, todo, to-do, to do, add task, create task, what's on my list, pending | task_list, task_create, task_update, task_complete |
| `cron` | schedule, cron, every day, every week, recurring, automate, job | cron_list, cron_add, cron_remove, cron_toggle |
| `reminders` | remind, reminder, remember, don't forget | reminders_add, reminders_list, reminders_remove, reminders_clear |
| `scratchpad` | scratch, scratchpad, working notes, jot down, notepad | scratch_read, scratch_write, scratch_append, scratch_clear |
| `ha` | home assistant, light, thermostat, turn on, turn off, kitchen, bedroom, switch, sensor, temperature | ha_get_state, ha_get_states, ha_call_service |
| `aether` | journal, aether, note entry, log entry, search journals, ae_ | ae_journal_list, ae_journal_search, ae_journal_entry_read, ae_journal_entries_list, ae_journal_entry_create, ae_journal_entry_update, ae_journal_entry_disable, ae_journal_entry_append, ae_journal_entry_prepend |
| `aether_db` | database, query, sql, select, db, table, schema, maria | ae_db_query, ae_db_describe, ae_db_show_view |
| `notifications` | notify, push, send email, email, message, talk, nextcloud | web_push, email_send, nc_talk_send, nc_talk_history |
| `agents` | spawn, sub-agent, delegate, agent | spawn_agent |
| `notes` | agent notes, private notes, my notes | agent_notes_read, agent_notes_write, agent_notes_append, agent_notes_clear |
| `session` | remember, session, history, last time, what did we, earlier, yesterday, last week | session_read, session_search |
| `ae_tasks` | ae task, kanban, board | ae_task_list |
| `claude` | claude, allow directory, permissions | claude_allow_dir |

### Implementation

In `openai_orchestrator.py`, before the ReAct loop starts:

```python
def _classify_tool_categories(user_message: str) -> list[str]:
    """Classify a user message into tool categories based on keywords.

    Returns a list of category names whose tools should be included.
    Returns empty list if no categories match (pure chat).
    """
    message_lower = user_message.lower()

    category_keywords = {
        "web": ["search", "look up", "what is", "who is", "weather",
                "forecast", "news", "find on", "google", "website",
                "article", "research", "temperature"],
        "web_post": ["post to", "send to", "webhook", "trigger webhook"],
        "file": ["read file", "show file", "list file", "directory",
                 "grep", "search in", "find in", "diff", "compare",
                 "syntax check", "open file"],
        "git": ["git", "commit", "branch", "pulled", "merged",
                "repository", "repo"],
        "system": ["restart", "update", "status", "logs", "deploy",
                   "run command", "shell", "is it running", "health"],
        "tasks": ["task", "todo", "to-do", "to do", "add task",
                  "create task", "pending", "what's on my list"],
        "cron": ["schedule", "cron", "every day", "every week",
                 "recurring", "automate", "job"],
        "reminders": ["remind", "reminder", "remember", "don't forget"],
        "scratchpad": ["scratch", "scratchpad", "working note", "jot down",
                       "notepad"],
        "ha": ["home assistant", "light", "thermostat", "turn on",
               "turn off", "switch", "sensor", "temperature in",
               "kitchen", "bedroom", "garage"],
        "aether": ["journal", "aether journal", "note entry", "log entry",
                   "search journal", "ae_journal"],
        "aether_db": ["database", "query", "sql", "select", "db", "table",
                      "schema", "maria", "run query"],
        "notifications": ["notify", "push notification", "send email", "email",
                          "talk message", "nextcloud"],
        "agents": ["spawn", "sub-agent", "delegate", "spawn agent"],
        "notes": ["agent notes", "private notes", "my notes",
                  "agent_notes"],
        "session": ["remember", "session", "history", "last time",
                    "what did we", "earlier", "yesterday", "last week",
                    "previously"],
        "ae_tasks": ["ae task", "kanban", "board", "ae_task"],
        "claude": ["claude allow", "claude directory"],
    }

    matched = []
    for category, keywords in category_keywords.items():
        if any(kw in message_lower for kw in keywords):
            matched.append(category)

    return matched
```

Then at the orchestration entry point, after determining the role's base tool list
(Phase 1), apply the keyword filter:

```python
# Phase 1: Get role's base tool list
role_tools = get_role_config(username, role).get("tools")

# Phase 2: Dynamically narrow based on message content
matched_categories = _classify_tool_categories(user_message)
if matched_categories:
    category_tool_map = { ... }  # defined at module level
    dynamic_tools = []
    for cat in matched_categories:
        dynamic_tools.extend(category_tool_map.get(cat, []))
    # Intersect with role_tools so we never grant more than the role allows
    if role_tools:
        dynamic_tools = [t for t in dynamic_tools if t in role_tools]
    if dynamic_tools:
        active_tools = get_openai_tools_for_role(
            role=user_role,
            tool_list=dynamic_tools,
        )
    else:
        # An empty intersection must stay empty. Passing tool_list=None
        # here would fall back to "all tools" and bypass the role filter.
        active_tools = []
else:
    # No keywords matched: likely casual chat. Route to /chat
    # or use an empty tool list.
    active_tools = []
```

### Edge Cases to Handle

1. **Multiple categories match:** Union all matched tool sets. The `for cat in matched_categories` loop handles this naturally.

2. **No categories match:** Return empty tool set. The orchestrator loop won't start — this effectively becomes a chat message without incurring the schema tax. If the LLM needs tools anyway, it will respond with a natural language request, and the user can rephrase.

3. **Ambiguous short messages:** "Hey can you check something" — matches nothing, falls through to empty tools. This is correct behavior; the LLM will ask "what do you want me to check?" and the next message will have a clear intent.

4. **Over-broad keywords:** "search" in "search journals" could trigger both `web` and `aether`. The union handles this — both categories' tools are included, which is what you want.
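
Plain substring matching has one more failure mode worth noting: short keywords such as `git` or `db` also match inside unrelated words (`digital`, `feedback`). A minimal sketch of word-boundary matching follows, as one possible mitigation. `KEYWORDS` is a trimmed, hypothetical subset of the category map, and `classify` is an illustrative name rather than the function in the codebase.

```python
import re

# Hypothetical, trimmed subset of the keyword map (assumption for this sketch).
KEYWORDS = {
    "git": ["git", "commit", "repo"],
    "aether_db": ["db", "sql", "run query"],
    "ha": ["light", "turn on"],
}


def _compile(keywords: list[str]) -> re.Pattern:
    # \b anchors keep "git" from matching inside "digital".
    alternatives = "|".join(re.escape(kw) for kw in keywords)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


# Compile once at import time; matching then costs one regex search per category.
PATTERNS = {category: _compile(kws) for category, kws in KEYWORDS.items()}


def classify(message: str) -> list[str]:
    """Return the categories whose keywords appear as whole words."""
    return [cat for cat, pattern in PATTERNS.items() if pattern.search(message)]


print(classify("Show the last git commit"))
print(classify("Going digital gave us good feedback"))
```

The trade-off is that whole-word matching no longer catches inflected forms (`lights` does not match `light`) or prefix-style keywords such as `ae_`, so those would need to be listed explicitly or matched with a separate prefix pattern.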

### File to change

- `cortex/openai_orchestrator.py` — add `_classify_tool_categories()` function and
  wire it into the orchestration entry point before the ReAct loop

---

## Phase 3 — Cache Tool Schema per Session

**Effort:** Medium. **Impact:** Medium.

### What

The tool schema doesn't change between rounds of the same session for a given role.
After Phase 2 narrows it to, say, 5 tools, those 5 tool definitions are identical
every round. Cache them.

### Implementation

Add a session-scoped cache in `openai_orchestrator.py`:

```python
# Module-level cache: key = f"{session_id}:{role}:{sorted_tool_list}"
_tool_schema_cache: dict[str, list[dict]] = {}

def _get_cached_tool_schema(session_id: str, role: str, tool_list: list[str] | None) -> list[dict]:
    key = f"{session_id}:{role}:{sorted(tool_list) if tool_list else 'all'}"
    if key in _tool_schema_cache:
        return _tool_schema_cache[key]
    schemas = get_openai_tools_for_role(role=role, tool_list=tool_list)
    _tool_schema_cache[key] = schemas
    return schemas
```

Invalidation: Cache key includes the tool list, so if the dynamic classifier returns
different categories on the next message, it gets a fresh cache entry. No explicit
invalidation needed.

### File to change

- `cortex/openai_orchestrator.py` — add cache dict and lookup before calling
  `get_openai_tools_for_role()`

---

## Phase 4 — Reduce Default Max Rounds

**Effort:** Trivial. **Impact:** Low-to-medium.

### What

Most requests resolve in 1-3 tool calls. A global cap of 10 means up to 7 wasted
schema transmissions on edge cases.

### Implementation

1. Make `max_rounds` configurable per model in the model registry (it already exists
   in some model configs — see `home/brian/model_registry.json` line 42).
2. Read it from the model config during orchestration instead of using the global
   `.env` value.
3. Lower the default from 10 to 5.
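
The lookup order in steps 1-3 can be sketched as below. This is an illustration under assumptions: `resolve_max_rounds` is a hypothetical helper, `model_cfg` is assumed to be a plain dict, and the environment variable is read directly here although the real code may go through a settings object.

```python
import os
from typing import Mapping, Optional

DEFAULT_MAX_ROUNDS = 5  # proposed new default (previously 10)


def resolve_max_rounds(model_cfg: dict, env: Optional[Mapping[str, str]] = None) -> int:
    """Prefer the per-model value, then the global setting, then the default."""
    env = os.environ if env is None else env
    candidates = (model_cfg.get("max_rounds"), env.get("ORCHESTRATOR_MAX_ROUNDS"))
    for candidate in candidates:
        try:
            value = int(candidate)
        except (TypeError, ValueError):
            continue  # missing or malformed: try the next source
        if value > 0:
            return value
    return DEFAULT_MAX_ROUNDS


print(resolve_max_rounds({"max_rounds": 3}, env={"ORCHESTRATOR_MAX_ROUNDS": "10"}))
```

Ignoring malformed or non-positive values, rather than raising, is a design choice in this sketch so that a bad per-model entry cannot disable the round cap.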

### Files to change

- `cortex/.env` — change `ORCHESTRATOR_MAX_ROUNDS=10` to `=5`
- `cortex/openai_orchestrator.py` — read per-model `max_rounds` from `model_cfg`
  instead of only from settings

---

## Phase 5 — UI Improvements (Independent)

**Effort:** Small. **Impact:** Medium (UX).

### What

Make the tool mode indicator more obvious so the user can quickly tell whether
they're incurring the tool tax.

### Ideas

- Change ⚡ color: green when tools are on, gray when off
- Swap icon: ⚡ (tools) vs. 💬 (chat only)
- Add tooltip: "Tools enabled — all 45 tool schemas sent with each message"
- Optional: add a "Quick Question" button that sends to `/chat` directly, bypassing
  the orchestrator entirely

### Files to change

- Svelte UI component — likely `ChatInput.svelte` or the chat mode toggle component

---

## Recommended Execution Order

1. **Phase 1** (role filtering) — foundation. Defines the base tool set per role.
2. **Phase 2** (keyword routing) — **the big one.** Slashes 45 tools → 3-8 for the
   vast majority of messages. Builds on Phase 1's role filtering.
3. **Phase 4** (lower max_rounds) — trivial change, do alongside Phase 2.
4. **Phase 3** (schema caching) — more involved, compounds savings from Phase 2.
5. **Phase 5** (UI) — independent UX polish, can be done any time.

### Quick Win Path (Recommended First Session)

Phases 1 + 2 + 4 can be done in a single Claude Code session. Nearly all of the work
is in `openai_orchestrator.py` and `model_registry.py`, plus the one-line `.env`
change for Phase 4. Estimated effort: 45-60 minutes of coding.

Phase 3 (caching) is a separate, focused session afterward.

---

## Appendix A: Code Locations (from grep audit 2026-05-15)

| What | File | Line |
|------|------|------|
| `get_openai_tools_for_role` definition | `cortex/tools.py` | ~540 |
| Call site (decides active_tools) | `cortex/openai_orchestrator.py` | ~449 |
| `_run_from_messages()` tool loop | `cortex/openai_orchestrator.py` | ~260 |
| Role config tools field | `cortex/model_registry.py` | ~487 |
| `get_role_config()` | `cortex/model_registry.py` | ~473 |
| `save_role_config()` (tools allow-list) | `cortex/model_registry.py` | ~455 |
| Global `ORCHESTRATOR_MAX_ROUNDS` | `cortex/.env` | 35 |
| `REQUIRED_ROLES` | `cortex/model_registry.py` | 163 |
| `DEFINED_ROLES` config | `cortex/config.py` | 80 |
| Per-model `max_rounds` example | `home/brian/model_registry.json` | 42 |

## Appendix B: Token Savings Estimate

| Scenario | Before (per round) | After Phase 1 | After Phase 1+2 | After All Phases |
|----------|-------------------|--------------|-----------------|-----------------|
| "What's the weather?" | ~9K tokens | ~5K (25 tools) | ~600 (3 web tools) | ~600 (cached) |
| "Good morning" | ~9K tokens | ~5K (25 tools) | 0 (routed to chat) | 0 |
| "Turn off kitchen lights" | ~9K tokens | ~5K (25 tools) | ~600 (3 HA tools) | ~600 (cached) |
| "Search journals for X" | ~9K tokens | ~5K (25 tools) | ~2K (10 aether tools) | ~2K (cached) |
| "Create a task" | ~9K tokens | ~5K (25 tools) | ~800 (4 task tools) | ~800 (cached) |
| "Run a SQL query" | ~9K tokens | ~5K (25 tools) | ~600 (3 db tools) | ~600 (cached) |

At 3 rounds per request and 50 requests/day, schema overhead is roughly
**1.35M tokens/day** today (9K × 3 × 50) vs. an estimated **~13K/day** after all
optimizations: a ~99% reduction when traffic is dominated by casual chat, and ~90%
for most tool-using queries.
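
The daily figures follow from simple multiplication. A quick check is sketched below, using the per-round estimates from the table above together with the stated 3 rounds per request and 50 requests per day. The function name is illustrative, and the per-round token counts are this document's estimates, not measurements.

```python
ROUNDS_PER_REQUEST = 3
REQUESTS_PER_DAY = 50


def daily_schema_tokens(tokens_per_round: int) -> int:
    """Schema tokens per day if every request costs `tokens_per_round` each round."""
    return tokens_per_round * ROUNDS_PER_REQUEST * REQUESTS_PER_DAY


before = daily_schema_tokens(9_000)   # all tools sent every round
web_only = daily_schema_tokens(600)   # 3 web tools after Phase 1+2
reduction = 1 - web_only / before

print(f"before:   {before:,} tokens/day")
print(f"web only: {web_only:,} tokens/day")
print(f"reduction: {reduction:.1%}")
```

If every request were a 3-tool query, the daily total would be 90,000 tokens (about 93% lower than today). A daily total near 13K is therefore only reachable when most requests are casual chat routed with zero tools.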