
Scott Idem fc6600c33e feat: Home Assistant API tools (ha_get_state, ha_get_states, ha_call_service)

Register three HA orchestrator tools so Inara can read device states and
control devices via the HA REST API. ha_call_service requires admin role
and user confirmation. Also includes accumulated UI fixes (setProcessing
helper, wasNewSession flag cleanup).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-11 21:39:35 -04:00

8.9 KiB


Model Registry V2 — Design Document

Status: Phase 3 in progress
Goal: Unified, provider-agnostic model management with clean role-based routing

Problem Statement

The original system had two classes of models with different treatment:

| Type | How configured | How selected |
|---|---|---|
| Claude, Gemini | Hardcoded built-ins (`claude_cli`, `gemini_api`) | Backend toggle string ("claude"/"gemini") |
| Local (Ollama, Open WebUI) | Configured via `/settings/local` | Backend toggle string "local" |

This breaks down when you want multiple Gemini API keys, OpenRouter alongside local models, role assignments spanning all provider types, or a toggle that shows which model is active instead of which service.

Architecture

Core concept: Providers + Credentials + Models + Roles

```
Providers (built-in, fixed set)
  └─ Anthropic       ← catalog of Claude model IDs (code constants)
  └─ Google          ← catalog of Gemini model IDs (code constants)
  └─ Local Host      ← OpenAI-compatible endpoint (user adds these)

Credentials (user-configured, stored in model_registry.json)
  └─ Anthropic       ← Claude CLI (OAuth, default) — API key support in Phase 4
  └─ Google          ← one or more API keys (one per Google account)
  └─ Local Host      ← api_key stored on the host record

Model Entries (user-registered)
  └─ Provider + model ID + credential = one usable model entry

Role Assignments (unified — any model entry can fill any role)
  └─ chat:         primary → backup_1 → backup_2
  └─ orchestrator: primary → backup_1
  └─ distill:      primary
  └─ (etc.)
```

Catalog design decision

Catalogs (ANTHROPIC_CATALOG, GOOGLE_CATALOG) are Python constants in model_registry.py, not stored in the per-user JSON, so they are updated with each code deploy. Per-user catalog customisation is deferred to Phase 4.

Backend toggle redesign (Phase 3)

Before: cycles through service type strings (auto → claude → gemini → local)

After: cycles through the chat role's configured models by label:

Sonnet 4.6 (CLI) → Gemini 2.5 Flash → Gemma 4 E4B → (wraps)

Shows the resolved model label on the toggle button
If no chat-role models are configured, shows "auto" and uses the existing role routing
Clicking skips empty slots automatically
Color: claude_cli = default, gemini_* = blue, local_openai = amber

The UI sends slot: "primary" | "backup_1" | "backup_2" (not a backend type string). llm_client.complete() resolves that slot from the chat role and dispatches by model type.
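The toggle behaviour above can be sketched as follows. The real logic lives in app.js; this is a Python illustration only, assuming the UI holds the chat_models list that GET /backend returns (entries of {slot, label, type}, already stripped of empty slots, as described under Routing Code). The helper names `next_entry`, `button_text`, and `toggle_color` are hypothetical.

```python
from typing import List, Optional


def toggle_color(model_type: str) -> str:
    """Button colour by model type, per the rule above."""
    if model_type == "local_openai":
        return "amber"
    if model_type.startswith("gemini_"):
        return "blue"
    return "default"  # claude_cli and any unrecognised type


def next_entry(chat_models: List[dict], current_slot: Optional[str]) -> Optional[dict]:
    """Advance to the next configured slot, wrapping after the last one.

    Returns None when no chat-role models are configured; the caller then
    shows "auto" and sends no slot, so role-based routing applies.
    """
    if not chat_models:
        return None
    slots = [entry["slot"] for entry in chat_models]
    if current_slot not in slots:
        return chat_models[0]  # start (or restart) at the first configured slot
    return chat_models[(slots.index(current_slot) + 1) % len(chat_models)]


def button_text(entry: Optional[dict]) -> str:
    """Label shown on the toggle: the model label, or "auto" if nothing is set."""
    return "auto" if entry is None else entry["label"]
```

Because empty slots never appear in chat_models, a click moves straight from primary to backup_2 when backup_1 is unset, which is the "skips empty slots" behaviour.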

Data Model (V2 Schema)

Stored in home/{user}/model_registry.json.

```json
{
  "version": 2,
  "providers": {
    "anthropic": {
      "credentials": [{"id": "cli", "label": "Claude CLI (OAuth)", "type": "cli"}]
    },
    "google": {
      "accounts": [{"id": "a1b2", "label": "One Sky IT", "api_key": "AIza..."}]
    }
  },
  "hosts": [
    {"id": "h1", "label": "Gaming Laptop", "api_url": "http://...", "api_key": "", "host_type": "openwebui"}
  ],
  "models": [
    {"id": "m1", "type": "claude_cli",   "label": "Sonnet 4.6 (CLI)",     "model_name": "claude-sonnet-4-6",  "provider": "anthropic", "credential_id": "cli",  "context_k": 1000, "tags": []},
    {"id": "m2", "type": "gemini_api",   "label": "Gemini 2.5 Flash",     "model_name": "gemini-2.5-flash",   "provider": "google",    "account_id": "a1b2",    "context_k": 1000, "tags": []},
    {"id": "m3", "type": "local_openai", "label": "Gemma 4 E4B",          "model_name": "gemma4:e4b",         "provider": "local",     "host_id": "h1",         "context_k": 72,   "tags": []},
    {"id": "m4", "type": "local_openai", "label": "DeepSeek: V4 Flash",   "model_name": "deepseek/deepseek-v4-flash", "provider": "local", "host_id": "h1", "context_k": 750, "reasoning_budget_tokens": 4096, "tags": ["frontier"]}
  ],
  "roles": {
    "chat":        {"primary": "m1", "backup_1": "m2", "backup_2": "m3"},
    "orchestrator":{"primary": "m2", "backup_1": "m3"},
    "distill":     {"primary": "m1"}
  }
}
```
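Because models point at credentials by ID (credential_id, account_id, host_id) and roles point at models by ID, the schema can develop dangling references when an account or host is removed. A minimal reference check might look like the sketch below. It is not part of model_registry.py; the function name `registry_problems` is hypothetical, and treating built-in IDs as valid role targets is an assumption based on the built-in model ID list further down.

```python
# Built-in IDs resolve without a registry entry, so a role naming one is
# treated as valid here (an assumption, not confirmed by the schema).
BUILTIN_IDS = {"claude_cli", "gemini_cli", "gemini_api"}


def registry_problems(reg: dict) -> list:
    """Return human-readable dangling-reference errors in a V2 registry dict."""
    problems = []
    if reg.get("version") != 2:
        problems.append(f"expected version 2, got {reg.get('version')!r}")

    providers = reg.get("providers", {})
    cred_ids = {c["id"] for c in providers.get("anthropic", {}).get("credentials", [])}
    account_ids = {a["id"] for a in providers.get("google", {}).get("accounts", [])}
    host_ids = {h["id"] for h in reg.get("hosts", [])}
    model_ids = {m["id"] for m in reg.get("models", [])}

    # Each credential pointer on a model must match an entry in its section.
    refs = (("credential_id", cred_ids), ("account_id", account_ids), ("host_id", host_ids))
    for model in reg.get("models", []):
        for field, known in refs:
            if field in model and model[field] not in known:
                problems.append(f"model {model['id']}: unknown {field} {model[field]!r}")

    # Each filled role slot must name a registered or built-in model.
    for role, slots in reg.get("roles", {}).items():
        for slot, model_id in slots.items():
            if model_id is not None and model_id not in model_ids | BUILTIN_IDS:
                problems.append(f"role {role}.{slot}: unknown model {model_id!r}")
    return problems
```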

Model types and dispatch

| `type` | Dispatches via | Notes |
|---|---|---|
| `claude_cli` | Claude CLI subprocess | `~/.claude/.credentials.json` OAuth |
| `gemini_cli` | Gemini CLI subprocess | |
| `gemini_api` | Currently: Gemini CLI (gap; see Phase 4) | Should use google-genai SDK |
| `local_openai` | HTTP to OpenAI-compatible endpoint | `host_type` controls the endpoint path |

Optional model fields

| Field | Type | Default | Meaning |
|---|---|---|---|
| `context_k` | int | 32 | Context window in thousands of tokens. Used for the compaction budget (75% of the window). |
| `max_rounds` | int \| null | null | Per-model tool loop cap. `null` = use the global `orchestrator_max_rounds`. Effective limit = `min(per_model, global)`. |
| `tools` | bool | true | Whether this model supports tool calling. `false` = skip the tool loop entirely; the model gets a plain chat request. |
| `reasoning_budget_tokens` | int \| null | null | Per-model reasoning/thinking budget for models that support it (e.g., DeepSeek V4 via OpenRouter). `null` = no reasoning override. When set, it is injected as `{"reasoning": {"budget_tokens": <value>}}` in the API call to OpenRouter-compatible endpoints. |
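The rules in this table reduce to a few small computations, sketched below. The helper names are hypothetical, "thousands of tokens" is read as ×1000 (not ×1024), and the reasoning payload simply mirrors the shape stated in the table.

```python
DEFAULT_CONTEXT_K = 32  # table default when context_k is absent


def compaction_budget_tokens(model: dict) -> int:
    """Compaction budget: 75% of the context window (integer arithmetic)."""
    context_k = model.get("context_k", DEFAULT_CONTEXT_K)
    return context_k * 1000 * 3 // 4


def effective_max_rounds(model: dict, orchestrator_max_rounds: int) -> int:
    """Per-model cap if set, never above the global cap; null means global."""
    per_model = model.get("max_rounds")
    if per_model is None:
        return orchestrator_max_rounds
    return min(per_model, orchestrator_max_rounds)


def uses_tool_loop(model: dict) -> bool:
    """tools=false means a plain chat request with no tool loop."""
    return bool(model.get("tools", True))


def reasoning_extras(model: dict) -> dict:
    """Extra request fields for reasoning_budget_tokens, in the table's shape."""
    budget = model.get("reasoning_budget_tokens")
    return {} if budget is None else {"reasoning": {"budget_tokens": budget}}
```

For the example Gemma entry (`context_k: 72`) this gives a compaction budget of 54,000 tokens; an entry without `context_k` gets 24,000.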

Built-in model IDs

Always resolvable without a registry entry (used as .env role defaults): claude_cli, gemini_cli, gemini_api

Resolution Logic

get_model_for_role(username, role): walks primary → backup_1 → backup_2 → backup_3 → backup_4 and returns the first model config that resolves, with credentials merged in. If no slot resolves, it falls back to the .env default for the role, then to a hardcoded last resort.

get_model_for_slot(username, role, slot) — resolves only the named slot, no fallback chain. Used by Phase 3 explicit slot selection.
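The resolution walk can be sketched as below. This is an illustration, not the model_registry.py code: it takes an already-loaded registry dict instead of a username, takes the .env role defaults as a plain dict, picks `claude_cli` as a hypothetical last resort, and assumes a slot whose account or host is missing counts as unresolved (so the walk moves on).

```python
from typing import Optional

ROLE_SLOTS = ("primary", "backup_1", "backup_2", "backup_3", "backup_4")
BUILTIN_IDS = {"claude_cli", "gemini_cli", "gemini_api"}
LAST_RESORT_ID = "claude_cli"  # hypothetical hardcoded last resort


def _find(items: list, item_id: Optional[str]) -> Optional[dict]:
    return next((item for item in items if item.get("id") == item_id), None)


def resolve_model(registry: dict, model_id: Optional[str]) -> Optional[dict]:
    """Return the model config with credentials merged in, or None."""
    if not model_id:
        return None
    if model_id in BUILTIN_IDS:
        return {"id": model_id, "type": model_id}  # no registry entry needed
    model = _find(registry.get("models", []), model_id)
    if model is None:
        return None
    config = dict(model)  # copy so merging credentials never mutates the registry
    if config.get("type") == "gemini_api":
        accounts = registry.get("providers", {}).get("google", {}).get("accounts", [])
        account = _find(accounts, config.get("account_id"))
        if account is None:
            return None
        config["api_key"] = account["api_key"]
    elif config.get("type") == "local_openai":
        host = _find(registry.get("hosts", []), config.get("host_id"))
        if host is None:
            return None
        config["api_url"] = host["api_url"]
        config["api_key"] = host.get("api_key", "")
    return config


def get_model_for_slot(registry: dict, role: str, slot: str) -> Optional[dict]:
    """Resolve exactly one slot; no fallback chain."""
    return resolve_model(registry, registry.get("roles", {}).get(role, {}).get(slot))


def get_model_for_role(registry: dict, role: str, env_defaults: dict) -> dict:
    """Walk the slots in order, then the .env default, then the last resort."""
    for slot in ROLE_SLOTS:
        config = get_model_for_slot(registry, role, slot)
        if config is not None:
            return config
    config = resolve_model(registry, env_defaults.get(role))
    return config if config is not None else resolve_model(registry, LAST_RESORT_ID)
```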

Routing Code

`llm_client.complete()` (Phase 3 update)

slot: str | None  → resolve specific slot, no fallback (explicit selection)
model: str | None → legacy backend strings, kept for backward compat
(neither)         → auto: role-based routing with full fallback chain

Dispatch table (type → backend function):

claude_cli → _claude()
gemini_cli → _gemini()
gemini_api → _gemini() ← gap: should be _gemini_api() (Phase 4)
local_openai → _local()
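The argument handling and dispatch table above can be sketched together. The resolvers and backends are passed in as callables so the sketch runs standalone; the names `select_config` and `dispatch`, the backend name strings ("claude", "gemini", "local", chosen to match the legacy toggle strings), and the rule that `slot` wins when both `slot` and `model` are given are all assumptions.

```python
from typing import Callable, Dict, Optional

# A backend takes the prompt and the resolved model config and returns text.
Backend = Callable[[str, dict], str]

# Mirrors the dispatch table above. gemini_api -> "gemini" is Gap A:
# it reaches the CLI backend rather than the google-genai SDK.
_TYPE_TO_BACKEND: Dict[str, str] = {
    "claude_cli": "claude",
    "gemini_cli": "gemini",
    "gemini_api": "gemini",
    "local_openai": "local",
}


def select_config(
    slot: Optional[str],
    model: Optional[str],
    resolve_slot: Callable[[str], Optional[dict]],
    resolve_legacy: Callable[[str], Optional[dict]],
    resolve_auto: Callable[[], Optional[dict]],
) -> Optional[dict]:
    """Explicit slot first, then a legacy backend string, then auto routing."""
    if slot is not None:
        return resolve_slot(slot)  # explicit selection: no fallback chain
    if model is not None:
        return resolve_legacy(model)  # backward-compatible backend string
    return resolve_auto()  # role-based routing with the full fallback chain


def dispatch(prompt: str, config: dict, backends: Dict[str, Backend]) -> str:
    """Route a resolved config to its backend function by model type."""
    backend_name = _TYPE_TO_BACKEND.get(config.get("type", ""))
    if backend_name is None:
        raise ValueError(f"no backend for model type {config.get('type')!r}")
    return backends[backend_name](prompt, config)
```

Fixing Gap A in this shape is a one-line table change plus a new backend entry, which is why the table is kept as data rather than an if/elif chain.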

`routers/chat.py` (Phase 3 update)

ChatRequest gets slot: str | None = None
GET /backend returns chat_models: [{slot, label, type}] for the UI toggle
_stream_chat resolves model label from slot when req.slot is set
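A minimal sketch of the two chat.py additions follows. A dataclass stands in for whatever request model the router actually uses, `message` is a hypothetical placeholder for the request's other fields, and `chat_models_payload` / `label_for_request` are hypothetical helper names.

```python
from dataclasses import dataclass
from typing import List, Optional

CHAT_SLOTS = ("primary", "backup_1", "backup_2")


@dataclass
class ChatRequest:
    message: str                # hypothetical; the real request has more fields
    slot: Optional[str] = None  # None = auto routing


def chat_models_payload(chat_role: dict, models_by_id: dict) -> List[dict]:
    """Build GET /backend's chat_models: one {slot, label, type} per filled slot."""
    payload = []
    for slot in CHAT_SLOTS:
        model = models_by_id.get(chat_role.get(slot))
        if model is not None:  # empty or dangling slots are left out
            payload.append({"slot": slot, "label": model["label"], "type": model["type"]})
    return payload


def label_for_request(req: ChatRequest, chat_role: dict, models_by_id: dict) -> Optional[str]:
    """Model label for the response tag when the client picked a slot."""
    if req.slot is None:
        return None
    model = models_by_id.get(chat_role.get(req.slot))
    return model["label"] if model is not None else None
```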

`app.js` (Phase 3 update)

Loads chat_models from GET /backend on page init
Toggle cycles through chat_models by label, sends slot in chat payload
Agent mode placeholder: replace the hard-coded "Gemini tool loop" with "orchestrator"

Known Gaps (not yet implemented)

Gap A — `gemini_api` dispatch in `llm_client` (Phase 4)

_TYPE_TO_BACKEND maps gemini_api → "gemini" (CLI subprocess). If a user assigns a gemini_api-type model to the chat role, it silently routes to the Gemini CLI instead of the Google genai SDK. Fix: add _gemini_api() in llm_client.py that calls the SDK directly, matching how orchestrator_engine.py does it. It needs the API key from the resolved model config.

Gap B — Agent mode placeholder (Phase 3, quick fix)

app.js lines 257–258 hard-code "Gemini tool loop". Should say "orchestrator" since the orchestrator role can now be a local model.

Phases

Phase 1 — Data model + routing ✅ 2026-04-27

V2 schema with providers section
Auto migration V1→V2 (pulls gemini_api_key from auth.json → Google accounts)
_resolve_model() merges account API key for gemini_api type
get_google_api_key(), save_cloud_model(), save/remove_google_account()
Orchestrator router uses model-resolved API key
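The V1→V2 migration can be illustrated with the sketch below. The V1 layout is not documented here, so this assumes V1 already had hosts/models/roles but no providers section, that auth.json exposes the key as "gemini_api_key", and that gemini_api models should point at the migrated account. The account ID "migrated" and its label are hypothetical.

```python
import copy

ANTHROPIC_DEFAULT = {"credentials": [{"id": "cli", "label": "Claude CLI (OAuth)", "type": "cli"}]}


def migrate_v1_to_v2(v1: dict, auth: dict) -> dict:
    """Return a V2 registry; V2 input is returned unchanged (idempotent)."""
    if v1.get("version") == 2:
        return v1
    v2 = copy.deepcopy(v1)  # never mutate the caller's V1 data
    v2["version"] = 2
    for field, empty in (("hosts", []), ("models", []), ("roles", {})):
        v2.setdefault(field, empty)

    providers = v2.setdefault("providers", {})
    providers.setdefault("anthropic", copy.deepcopy(ANTHROPIC_DEFAULT))
    google = providers.setdefault("google", {"accounts": []})

    # Move the single legacy Gemini key into a Google account record.
    api_key = auth.get("gemini_api_key")
    if api_key and not google["accounts"]:
        google["accounts"].append({"id": "migrated", "label": "Migrated key", "api_key": api_key})

    # Point existing gemini_api models at the first account if they lack one.
    if google["accounts"]:
        default_id = google["accounts"][0]["id"]
        for model in v2["models"]:
            if model.get("type") == "gemini_api":
                model.setdefault("account_id", default_id)
    return v2
```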

Phase 2 — Cloud provider UI ✅ 2026-04-27

/settings/models is canonical; /settings/local redirects to it
Cloud Providers section: Anthropic info + Google account add/remove
Add Model form with provider tabs (Local / Google / Anthropic)
Provider badges on model rows (Anthropic / Google / Local)
Settings page updated: Gemini Key section replaced by Model Registry card

Phase 3 — Toggle redesign + routing cleanup 🔄 in progress

model_registry.get_model_for_slot() — resolve a specific slot without fallback chain
llm_client.complete() — add slot parameter
routers/chat.py — ChatRequest.slot, extend GET /backend, slot label in response tag
app.js — data-driven toggle cycling model labels; send slot not backend string
Fix Gap B: agent mode placeholder

Phase 4 — Polish + future providers

Fix Gap A: gemini_api dispatch in llm_client → direct Google genai SDK for chat
Claude direct API key support (alternative to CLI OAuth)
OpenRouter as a named provider (already works as a local host; could be promoted)
Per-role "test" button in role assignments UI
Per-user catalog additions (extend ANTHROPIC_CATALOG / GOOGLE_CATALOG from UI)
