Cortex-Inara/documentation/DESIGN__Model_Registry_V2.md
Scott Idem fc6600c33e 2026-05-11 21:39:35 -04:00


Model Registry V2 — Design Document

Status: Phase 3 in progress
Goal: Unified, provider-agnostic model management with clean role-based routing


Problem Statement

The original system had two classes of models with different treatment:

| Type | How configured | How selected |
|---|---|---|
| Claude, Gemini | Hardcoded built-ins (claude_cli, gemini_api) | Backend toggle string ("claude"/"gemini") |
| Local (Ollama, Open WebUI) | Configured via /settings/local | Backend toggle string "local" |

This breaks down when you want multiple Gemini API keys, OpenRouter alongside local models, role assignments spanning all provider types, or a toggle that shows which model is active instead of which service.


Architecture

Core concept: Providers + Credentials + Models + Roles

Providers (built-in, fixed set)
  └─ Anthropic       ← catalog of Claude model IDs (code constants)
  └─ Google          ← catalog of Gemini model IDs (code constants)
  └─ Local Host      ← OpenAI-compatible endpoint (user adds these)

Credentials (user-configured, stored in model_registry.json)
  └─ Anthropic       ← Claude CLI (OAuth, default) — API key support in Phase 4
  └─ Google          ← one or more API keys (one per Google account)
  └─ Local Host      ← api_key stored on the host record

Model Entries (user-registered)
  └─ Provider + model ID + credential = one usable model entry

Role Assignments (unified — any model entry can fill any role)
  └─ chat:         primary → backup_1 → backup_2
  └─ orchestrator: primary → backup_1
  └─ distill:      primary
  └─ (etc.)

Catalog design decision

Catalogs (ANTHROPIC_CATALOG, GOOGLE_CATALOG) are Python constants in model_registry.py and are not stored in the per-user JSON, so they change only with a code deploy. Per-user catalog customisation is deferred to Phase 4.
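As a rough illustration of what "catalog as code constant" could look like, the sketch below maps model IDs to display labels. The dict shape, the `CATALOGS` lookup, and `catalog_label()` are assumptions made for this example; only the two model IDs are taken from this document, and the real constants in model_registry.py may be structured differently.

```python
from __future__ import annotations

# Hypothetical catalog shape: model ID -> display label.
# Only these two IDs appear in this document; real catalogs will list more.
ANTHROPIC_CATALOG: dict[str, str] = {
    "claude-sonnet-4-6": "Sonnet 4.6",
}
GOOGLE_CATALOG: dict[str, str] = {
    "gemini-2.5-flash": "Gemini 2.5 Flash",
}

# Hypothetical helper table keyed by the provider names used in the V2 schema.
CATALOGS: dict[str, dict[str, str]] = {
    "anthropic": ANTHROPIC_CATALOG,
    "google": GOOGLE_CATALOG,
}


def catalog_label(provider: str, model_name: str) -> str | None:
    """Return the catalog label for a model ID, or None if it is not listed."""
    return CATALOGS.get(provider, {}).get(model_name)
```

Local hosts have no catalog in this sketch because their model IDs come from whatever the user's endpoint serves.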

Backend toggle redesign (Phase 3)

Before: cycles service type strings — auto → claude → gemini → local

After: cycles through the chat role's configured models by label:

Sonnet 4.6 (CLI) → Gemini 2.5 Flash → Gemma 4 E4B → (wraps)
  • Shows the resolved model label on the toggle button
  • If no chat role models are configured: shows "auto", uses existing role routing
  • Click skips empty slots automatically
  • Color: claude_cli = default, gemini_* = blue, local_openai = amber

The UI sends slot: "primary" | "backup_1" | "backup_2" instead of a backend type string. llm_client.complete() resolves that slot from the chat role and dispatches by the resolved model's type.
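The cycling rules above can be sketched as follows. The real toggle lives in app.js; the logic is shown in Python here only to keep one language across the examples. `next_slot()` and `toggle_color()` are hypothetical names, and the input is assumed to be a list of `{slot, label, type}` entries from which empty slots have already been omitted.

```python
from __future__ import annotations


def toggle_color(model_type: str) -> str:
    """Map a model type to the toggle color described in the list above."""
    if model_type.startswith("gemini_"):
        return "blue"
    if model_type == "local_openai":
        return "amber"
    return "default"  # claude_cli and anything unrecognised


def next_slot(chat_models: list[dict], current: str | None) -> str | None:
    """Return the slot to select after a click, wrapping at the end.

    Returns None when no chat models are configured, which stands for
    the "auto" state (existing role routing).
    """
    configured = [m["slot"] for m in chat_models]  # empty slots are absent
    if not configured:
        return None
    if current not in configured:
        return configured[0]
    return configured[(configured.index(current) + 1) % len(configured)]


# Usage: backup_1 is empty, so a click on "primary" skips straight to backup_2.
models = [
    {"slot": "primary", "label": "Sonnet 4.6 (CLI)", "type": "claude_cli"},
    {"slot": "backup_2", "label": "Gemma 4 E4B", "type": "local_openai"},
]
selected = next_slot(models, "primary")
```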


Data Model (V2 Schema)

Stored in home/{user}/model_registry.json.

{
  "version": 2,
  "providers": {
    "anthropic": {
      "credentials": [{"id": "cli", "label": "Claude CLI (OAuth)", "type": "cli"}]
    },
    "google": {
      "accounts": [{"id": "a1b2", "label": "One Sky IT", "api_key": "AIza..."}]
    }
  },
  "hosts": [
    {"id": "h1", "label": "Gaming Laptop", "api_url": "http://...", "api_key": "", "host_type": "openwebui"}
  ],
  "models": [
    {"id": "m1", "type": "claude_cli",   "label": "Sonnet 4.6 (CLI)",     "model_name": "claude-sonnet-4-6",  "provider": "anthropic", "credential_id": "cli",  "context_k": 1000, "tags": []},
    {"id": "m2", "type": "gemini_api",   "label": "Gemini 2.5 Flash",     "model_name": "gemini-2.5-flash",   "provider": "google",    "account_id": "a1b2",    "context_k": 1000, "tags": []},
    {"id": "m3", "type": "local_openai", "label": "Gemma 4 E4B",          "model_name": "gemma4:e4b",         "provider": "local",     "host_id": "h1",         "context_k": 72,   "tags": []},
    {"id": "m4", "type": "local_openai", "label": "DeepSeek: V4 Flash",   "model_name": "deepseek/deepseek-v4-flash", "provider": "local", "host_id": "h1", "context_k": 750, "reasoning_budget_tokens": 4096, "tags": ["frontier"]}
  ],
  "roles": {
    "chat":        {"primary": "m1", "backup_1": "m2", "backup_2": "m3"},
    "orchestrator":{"primary": "m2", "backup_1": "m3"},
    "distill":     {"primary": "m1"}
  }
}
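Because models point at credentials, accounts, and hosts by ID, and roles point at models by ID, a registry can contain dangling references after an edit. The following is a minimal sketch of a consistency check over the schema above. `find_dangling_refs()` is a hypothetical helper, not a function named in this document, and it assumes role slots may also hold one of the built-in model IDs.

```python
from __future__ import annotations

# Built-in IDs that resolve without a registry entry (see "Built-in model IDs").
BUILTIN_MODEL_IDS = {"claude_cli", "gemini_cli", "gemini_api"}


def find_dangling_refs(registry: dict) -> list[str]:
    """Return human-readable problems for IDs that point at nothing."""
    providers = registry.get("providers", {})
    known = {
        "credential_id": {c["id"] for c in providers.get("anthropic", {}).get("credentials", [])},
        "account_id": {a["id"] for a in providers.get("google", {}).get("accounts", [])},
        "host_id": {h["id"] for h in registry.get("hosts", [])},
    }
    model_ids = {m["id"] for m in registry.get("models", [])} | BUILTIN_MODEL_IDS

    problems: list[str] = []
    # Each model may carry one of the three reference fields.
    for model in registry.get("models", []):
        for field, ids in known.items():
            if field in model and model[field] not in ids:
                problems.append(f"model {model['id']}: unknown {field} {model[field]!r}")
    # Each non-empty role slot must name a registered or built-in model.
    for role, slots in registry.get("roles", {}).items():
        for slot, model_id in slots.items():
            if model_id and model_id not in model_ids:
                problems.append(f"role {role}.{slot}: unknown model {model_id!r}")
    return problems
```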

Model types and dispatch

| type | Dispatches via | Notes |
|---|---|---|
| claude_cli | Claude CLI subprocess | ~/.claude/.credentials.json OAuth |
| gemini_cli | Gemini CLI subprocess | |
| gemini_api | Currently: Gemini CLI (gap; see Phase 4) | Should use google-genai SDK |
| local_openai | HTTP to OpenAI-compatible endpoint | host_type controls path |

Optional model fields

| Field | Type | Default | Meaning |
|---|---|---|---|
| context_k | int | 32 | Context window in thousands of tokens. Used for compaction budget (75% of window). |
| max_rounds | int or null | null | Per-model tool loop cap. null = use global orchestrator_max_rounds. Effective limit = min(per_model, global). |
| tools | bool | true | Whether this model supports tool calling. false = skip tool loop entirely; model gets a plain chat request. |
| reasoning_budget_tokens | int or null | null | Per-model reasoning/thinking budget for models that support it (e.g., DeepSeek V4 via OpenRouter). null = no reasoning override. When set, injected as {"reasoning": {"budget_tokens": <value>}} in the API call to OpenRouter-compatible endpoints. |
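The arithmetic behind these fields can be written out as a short sketch. The three helper names below are hypothetical, and "thousands of tokens" is taken literally as a factor of 1000; the actual code may round or clamp differently.

```python
from __future__ import annotations

DEFAULT_CONTEXT_K = 32       # default from the field table
COMPACTION_FRACTION = 0.75   # "75% of window"


def compaction_budget_tokens(model: dict) -> int:
    """Tokens available before compaction: 75% of the context window."""
    context_tokens = model.get("context_k", DEFAULT_CONTEXT_K) * 1000
    return int(context_tokens * COMPACTION_FRACTION)


def effective_max_rounds(model: dict, global_max: int) -> int:
    """null/missing max_rounds means the global cap; otherwise take the smaller."""
    per_model = model.get("max_rounds")
    return global_max if per_model is None else min(per_model, global_max)


def reasoning_extras(model: dict) -> dict:
    """Extra request fields for the reasoning budget; empty when unset."""
    budget = model.get("reasoning_budget_tokens")
    return {} if budget is None else {"reasoning": {"budget_tokens": budget}}


# Worked example with the Gemma entry from the schema (context_k = 72):
# 72 * 1000 * 0.75 = 54,000 tokens of compaction budget.
gemma_budget = compaction_budget_tokens({"context_k": 72})
```

Note that a per-model max_rounds can only lower the limit in this reading: a value above the global cap has no effect.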

Built-in model IDs

Always resolvable without a registry entry (used as .env role defaults): claude_cli, gemini_cli, gemini_api


Resolution Logic

get_model_for_role(username, role) — walks primary → backup_1 → backup_2 → backup_3 → backup_4, returns first resolved model config with credentials merged in. Falls back to .env defaults, then hardcoded last-resort.

get_model_for_slot(username, role, slot) — resolves only the named slot, no fallback chain. Used by Phase 3 explicit slot selection.
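The difference between the two lookups can be sketched as below. This is an illustration under assumptions, not the code in model_registry.py: the functions here take an already-loaded registry dict rather than a username, `env_default` stands in for the .env lookup, and `claude_cli` as the hardcoded last resort is a guess.

```python
from __future__ import annotations

SLOT_CHAIN = ["primary", "backup_1", "backup_2", "backup_3", "backup_4"]
BUILTIN_MODELS = {t: {"id": t, "type": t} for t in ("claude_cli", "gemini_cli", "gemini_api")}


def _resolve_model(registry: dict, model_id: str | None) -> dict | None:
    """Find a model entry and merge its account or host credentials into a copy."""
    if not model_id:
        return None
    for model in registry.get("models", []):
        if model["id"] != model_id:
            continue
        resolved = dict(model)
        accounts = registry.get("providers", {}).get("google", {}).get("accounts", [])
        for account in accounts:
            if account["id"] == model.get("account_id"):
                resolved["api_key"] = account["api_key"]
        for host in registry.get("hosts", []):
            if host["id"] == model.get("host_id"):
                resolved.update(api_url=host["api_url"], api_key=host["api_key"],
                                host_type=host["host_type"])
        return resolved
    return BUILTIN_MODELS.get(model_id)  # built-in IDs need no registry entry


def get_model_for_slot(registry: dict, role: str, slot: str) -> dict | None:
    """Resolve exactly one slot; no fallback chain."""
    return _resolve_model(registry, registry.get("roles", {}).get(role, {}).get(slot))


def get_model_for_role(registry: dict, role: str, env_default: str | None = None) -> dict:
    """Walk the slot chain, then the .env default, then an assumed last resort."""
    for slot in SLOT_CHAIN:
        model = get_model_for_slot(registry, role, slot)
        if model is not None:
            return model
    return BUILTIN_MODELS.get(env_default or "") or BUILTIN_MODELS["claude_cli"]
```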


Routing Code

llm_client.complete() (Phase 3 update)

slot: str | None  → resolve specific slot, no fallback (explicit selection)
model: str | None → legacy backend strings, kept for backward compat
(neither)         → auto: role-based routing with full fallback chain

Dispatch table (type → backend function):

  • claude_cli → _claude()
  • gemini_cli → _gemini()
  • gemini_api → _gemini() (gap: should be _gemini_api(), Phase 4)
  • local_openai → _local()
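Put together, the three call modes and the dispatch table might look like the sketch below. The backend functions are stubs that only tag the prompt, the resolvers are passed in as callables so the example stays self-contained, and raising on an unconfigured slot is an assumption; none of this is the actual llm_client.py signature.

```python
from __future__ import annotations

from typing import Callable, Optional


# Stub backends: the real ones call a CLI subprocess or an HTTP endpoint.
def _claude(prompt: str, cfg: dict) -> str:
    return f"[claude] {prompt}"


def _gemini(prompt: str, cfg: dict) -> str:
    return f"[gemini] {prompt}"


def _local(prompt: str, cfg: dict) -> str:
    return f"[local] {prompt}"


_TYPE_TO_BACKEND: dict[str, Callable[[str, dict], str]] = {
    "claude_cli": _claude,
    "gemini_cli": _gemini,
    "gemini_api": _gemini,  # Gap A: currently routed to the CLI backend
    "local_openai": _local,
}
_LEGACY_BACKENDS = {"claude": _claude, "gemini": _gemini, "local": _local}


def complete(
    prompt: str,
    *,
    resolve_slot: Callable[[str], Optional[dict]],
    resolve_role: Callable[[], dict],
    slot: str | None = None,
    model: str | None = None,
) -> str:
    if slot is not None:  # explicit selection, no fallback
        cfg = resolve_slot(slot)
        if cfg is None:
            raise ValueError(f"slot {slot!r} is not configured")
        return _TYPE_TO_BACKEND[cfg["type"]](prompt, cfg)
    if model is not None:  # legacy backend string
        return _LEGACY_BACKENDS[model](prompt, {})
    cfg = resolve_role()  # auto: role-based routing with fallback chain
    return _TYPE_TO_BACKEND[cfg["type"]](prompt, cfg)
```

Checking slot before model means an explicit slot wins if a caller sends both, which seems the safer order while legacy clients still exist.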

routers/chat.py (Phase 3 update)

  • ChatRequest gets slot: str | None = None
  • GET /backend returns chat_models: [{slot, label, type}] for the UI toggle
  • _stream_chat resolves model label from slot when req.slot is set
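For the chat_models field, a minimal sketch of how the list could be built from the registry is shown here. `build_chat_models()` is a hypothetical helper, and it assumes the toggle only exposes the three slots named in the Phase 3 toggle description.

```python
from __future__ import annotations

TOGGLE_SLOTS = ("primary", "backup_1", "backup_2")


def build_chat_models(registry: dict) -> list[dict]:
    """Return [{slot, label, type}] for every configured chat slot, in order."""
    models_by_id = {m["id"]: m for m in registry.get("models", [])}
    chat_role = registry.get("roles", {}).get("chat", {})
    chat_models = []
    for slot in TOGGLE_SLOTS:
        model = models_by_id.get(chat_role.get(slot))
        if model is None:
            continue  # empty or dangling slot: leave it out so the UI skips it
        chat_models.append({"slot": slot, "label": model["label"], "type": model["type"]})
    return chat_models
```

An empty result corresponds to the "auto" state of the toggle.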

app.js (Phase 3 update)

  • Loads chat_models from GET /backend on page init
  • Toggle cycles through chat_models by label, sends slot in chat payload
  • Agent mode placeholder: remove "Gemini tool loop" hardcode → "orchestrator"

Known Gaps (not yet implemented)

Gap A — gemini_api dispatch in llm_client (Phase 4)

_TYPE_TO_BACKEND maps gemini_api → "gemini" (CLI subprocess). If a user assigns a gemini_api type model to the chat role, it silently routes to the Gemini CLI instead of the Google genai SDK. Fix: add _gemini_api() in llm_client.py that calls the SDK directly, matching how orchestrator_engine.py does it. Needs API key from resolved config.

Gap B — Agent mode placeholder (Phase 3, quick fix)

app.js lines 257-258 hard-code "Gemini tool loop". The placeholder should say "orchestrator", since the orchestrator role can now be filled by a local model.


Phases

Phase 1 — Data model + routing 2026-04-27

  • V2 schema with providers section
  • Auto migration V1→V2 (pulls gemini_api_key from auth.json → Google accounts)
  • _resolve_model() merges account API key for gemini_api type
  • get_google_api_key(), save_cloud_model(), save/remove_google_account()
  • Orchestrator router uses model-resolved API key
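The V1→V2 migration step could be sketched roughly as follows. The V1 layout is not described in this document, so the sketch assumes a V1 registry is simply one without a "version" or "providers" key, that auth.json has been loaded into a dict with an optional "gemini_api_key", and that account IDs are short random hex strings like the "a1b2" in the schema example. The "Migrated key" label is invented for illustration.

```python
from __future__ import annotations

import uuid


def migrate_v1_to_v2(registry: dict, auth: dict) -> dict:
    """Return a V2 registry; V2 input is returned unchanged."""
    if registry.get("version", 1) >= 2:
        return registry

    migrated = dict(registry)  # keep existing hosts, models, and roles as-is

    # Pull the single legacy Gemini key, if any, into a Google account.
    google_accounts = []
    legacy_key = auth.get("gemini_api_key")
    if legacy_key:
        google_accounts.append({
            "id": uuid.uuid4().hex[:4],  # assumed ID format
            "label": "Migrated key",     # hypothetical label
            "api_key": legacy_key,
        })

    migrated["version"] = 2
    migrated["providers"] = {
        "anthropic": {
            "credentials": [{"id": "cli", "label": "Claude CLI (OAuth)", "type": "cli"}],
        },
        "google": {"accounts": google_accounts},
    }
    return migrated
```

Returning V2 input untouched keeps the function safe to call on every load.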

Phase 2 — Cloud provider UI 2026-04-27

  • /settings/models (canonical, /settings/local redirects)
  • Cloud Providers section: Anthropic info + Google account add/remove
  • Add Model form with provider tabs (Local / Google / Anthropic)
  • Provider badges on model rows (Anthropic / Google / Local)
  • Settings page updated: Gemini Key section replaced by Model Registry card

Phase 3 — Toggle redesign + routing cleanup 🔄 in progress

  • model_registry.get_model_for_slot() — resolve a specific slot without fallback chain
  • llm_client.complete() — add slot parameter
  • routers/chat.py: ChatRequest.slot, extend GET /backend, slot label in response tag
  • app.js — data-driven toggle cycling model labels; send slot not backend string
  • Fix Gap B: agent mode placeholder

Phase 4 — Polish + future providers

  • Fix Gap A: gemini_api dispatch in llm_client → direct Google genai SDK for chat
  • Claude direct API key support (alternative to CLI OAuth)
  • OpenRouter as a named provider (already works as local host; could be promoted)
  • Per-role "test" button in role assignments UI
  • Per-user catalog additions (extend ANTHROPIC_CATALOG / GOOGLE_CATALOG from UI)