# Model Registry V2: Design Document

> Status: Phase 3 in progress
>
> Goal: Unified, provider-agnostic model management with clean role-based routing

---

## Problem Statement

The original system treated two classes of models differently:

| Type | How configured | How selected |
|---|---|---|
| Claude, Gemini | Hardcoded built-ins (`claude_cli`, `gemini_api`) | Backend toggle string ("claude"/"gemini") |
| Local (Ollama, Open WebUI) | Configured via `/settings/local` | Backend toggle string "local" |

This breaks down when you want multiple Gemini API keys, OpenRouter alongside local models, role assignments that span all provider types, or a toggle that shows which model is active rather than which service.

---

## Architecture

### Core concept: Providers + Credentials + Models + Roles

```
Providers (built-in, fixed set)
  └─ Anthropic    ← catalog of Claude model IDs (code constants)
  └─ Google       ← catalog of Gemini model IDs (code constants)
  └─ Local Host   ← OpenAI-compatible endpoint (user adds these)

Credentials (user-configured, stored in model_registry.json)
  └─ Anthropic    ← Claude CLI (OAuth, default); API key support in Phase 4
  └─ Google       ← one or more API keys (one per Google account)
  └─ Local Host   ← api_key stored on the host record

Model Entries (user-registered)
  └─ Provider + model ID + credential = one usable model entry

Role Assignments (unified: any model entry can fill any role)
  └─ chat:         primary → backup_1 → backup_2
  └─ orchestrator: primary → backup_1
  └─ distill:      primary
  └─ (etc.)
```

### Catalog design decision

Catalogs (`ANTHROPIC_CATALOG`, `GOOGLE_CATALOG`) are **Python constants** in `model_registry.py`, not stored in the per-user JSON. They are updated with each code deploy. Per-user catalog customisation is deferred to Phase 4.
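
The sketch below makes the "Provider + model ID + credential" rule concrete. It shows catalog constants and a check that a model entry is usable against the V2 schema. The catalog contents are an assumption: only the two IDs that appear in this document are listed. `entry_problems()` is a hypothetical helper, not an existing function in `model_registry.py`.

```python
# Hypothetical catalog contents: only IDs mentioned in this document are listed.
ANTHROPIC_CATALOG: tuple[str, ...] = ("claude-sonnet-4-6",)
GOOGLE_CATALOG: tuple[str, ...] = ("gemini-2.5-flash",)


def entry_problems(entry: dict, registry: dict) -> list[str]:
    """List reasons a model entry is not usable; an empty list means usable.

    Hypothetical helper: checks the provider's catalog (cloud providers only)
    and that the referenced credential, account or host exists in the registry.
    """
    problems: list[str] = []
    provider = entry.get("provider")
    model_name = entry.get("model_name", "")
    providers = registry.get("providers", {})

    if provider == "anthropic":
        if model_name not in ANTHROPIC_CATALOG:
            problems.append(f"{model_name!r} is not in ANTHROPIC_CATALOG")
        cred_ids = {c["id"] for c in providers.get("anthropic", {}).get("credentials", [])}
        if entry.get("credential_id") not in cred_ids:
            problems.append("unknown Anthropic credential_id")
    elif provider == "google":
        if model_name not in GOOGLE_CATALOG:
            problems.append(f"{model_name!r} is not in GOOGLE_CATALOG")
        account_ids = {a["id"] for a in providers.get("google", {}).get("accounts", [])}
        if entry.get("account_id") not in account_ids:
            problems.append("unknown Google account_id")
    elif provider == "local":
        # Local hosts serve arbitrary model names, so only the host is checked.
        host_ids = {h["id"] for h in registry.get("hosts", [])}
        if entry.get("host_id") not in host_ids:
            problems.append("unknown host_id")
    else:
        problems.append(f"unknown provider {provider!r}")
    return problems
```

Because the catalogs are code constants, a new Claude or Gemini ID only passes this check after a deploy. Local hosts skip the catalog check entirely.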
### Backend toggle redesign (Phase 3)

**Before:** cycles through service type strings: `auto → claude → gemini → local`

**After:** cycles through the chat role's configured models by label:

```
Sonnet 4.6 (CLI) → Gemini 2.5 Flash → Gemma 4 E4B → (wraps)
```

- Shows the resolved model label on the toggle button
- If no chat-role models are configured: shows "auto" and uses existing role routing
- A click skips empty slots automatically
- Color: `claude_cli` = default, `gemini_*` = blue, `local_openai` = amber

The UI sends `slot: "primary" | "backup_1" | "backup_2"` instead of a backend type string. `llm_client.complete()` resolves that slot from the chat role and dispatches by `type`.

---

## Data Model (V2 Schema)

Stored in `home/{user}/model_registry.json`.

```json
{
  "version": 2,
  "providers": {
    "anthropic": {
      "credentials": [{"id": "cli", "label": "Claude CLI (OAuth)", "type": "cli"}]
    },
    "google": {
      "accounts": [{"id": "a1b2", "label": "One Sky IT", "api_key": "AIza..."}]
    }
  },
  "hosts": [
    {"id": "h1", "label": "Gaming Laptop", "api_url": "http://...", "api_key": "", "host_type": "openwebui"}
  ],
  "models": [
    {"id": "m1", "type": "claude_cli", "label": "Sonnet 4.6 (CLI)", "model_name": "claude-sonnet-4-6",
     "provider": "anthropic", "credential_id": "cli", "context_k": 1000, "tags": []},
    {"id": "m2", "type": "gemini_api", "label": "Gemini 2.5 Flash", "model_name": "gemini-2.5-flash",
     "provider": "google", "account_id": "a1b2", "context_k": 1000, "tags": []},
    {"id": "m3", "type": "local_openai", "label": "Gemma 4 E4B", "model_name": "gemma4:e4b",
     "provider": "local", "host_id": "h1", "context_k": 72, "tags": []},
    {"id": "m4", "type": "local_openai", "label": "DeepSeek: V4 Flash", "model_name": "deepseek/deepseek-v4-flash",
     "provider": "local", "host_id": "h1", "context_k": 750, "reasoning_budget_tokens": 4096, "tags": ["frontier"]}
  ],
  "roles": {
    "chat":         {"primary": "m1", "backup_1": "m2", "backup_2": "m3"},
    "orchestrator": {"primary": "m2", "backup_1": "m3"},
    "distill":      {"primary": "m1"}
  }
}
```

### Model types and dispatch

| `type` | Dispatches via | Notes |
|---|---|---|
| `claude_cli` | Claude CLI subprocess | OAuth credentials in `~/.claude/.credentials.json` |
| `gemini_cli` | Gemini CLI subprocess | |
| `gemini_api` | Currently the Gemini CLI (Gap A, planned for Phase 4) | Should use the google-genai SDK |
| `local_openai` | HTTP to an OpenAI-compatible endpoint | `host_type` selects the request path |

### Optional model fields

| Field | Type | Default | Meaning |
|---|---|---|---|
| `context_k` | int | 32 | Context window in thousands of tokens. Sets the compaction budget (75% of the window). |
| `max_rounds` | int \| null | null | Per-model tool-loop cap. `null` = use the global `orchestrator_max_rounds`. Effective limit = `min(per_model, global)`. |
| `tools` | bool | true | Whether the model supports tool calling. `false` = skip the tool loop entirely; the model gets a plain chat request. |
| `reasoning_budget_tokens` | int \| null | null | Per-model reasoning/thinking budget for models that support it (e.g., DeepSeek V4 via OpenRouter). `null` = no reasoning override. When set, it is injected as `{"reasoning": {"budget_tokens": <reasoning_budget_tokens>}}` into requests to OpenRouter-compatible endpoints. |

### Built-in model IDs

Always resolvable without a registry entry (used as `.env` role defaults): `claude_cli`, `gemini_cli`, `gemini_api`.

---

## Resolution Logic

`get_model_for_role(username, role)`: walks `primary → backup_1 → backup_2 → backup_3 → backup_4` and returns the first resolved model config, with credentials merged in. Falls back to the `.env` defaults, then to a hard-coded last resort.

`get_model_for_slot(username, role, slot)`: resolves *only* the named slot, with no fallback chain. Used by Phase 3 explicit slot selection.
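
Below is a minimal sketch of the resolution walk. It assumes the parsed V2 registry dict is passed in directly, whereas the real functions take `username` and load the file. Several pieces are assumptions:

- the shape returned for built-in IDs
- the `env_defaults` mapping, which stands in for `.env`
- `LAST_RESORT_ID`

Two small helpers also show how `context_k` and `max_rounds` from the optional-fields table could become runtime limits.

```python
from typing import Optional

# Hypothetical sketch: the real functions take `username` and load
# home/{user}/model_registry.json; here the parsed registry dict is passed in.
FALLBACK_CHAIN = ("primary", "backup_1", "backup_2", "backup_3", "backup_4")
BUILTIN_IDS = ("claude_cli", "gemini_cli", "gemini_api")
LAST_RESORT_ID = "claude_cli"  # assumption: the real last resort is not specified


def _find(items: list, item_id: Optional[str]) -> Optional[dict]:
    """Return the first dict in items whose "id" equals item_id."""
    return next((x for x in items if x.get("id") == item_id), None)


def _resolve_model(registry: dict, model_id: Optional[str]) -> Optional[dict]:
    """Resolve a model ID to a config with credentials merged in, or None."""
    if model_id is None:
        return None
    entry = _find(registry.get("models", []), model_id)
    if entry is None:
        # Built-ins resolve without a registry entry (assumed minimal shape).
        return {"id": model_id, "type": model_id} if model_id in BUILTIN_IDS else None
    cfg = dict(entry)  # copy, so the stored entry is never mutated
    if cfg["type"] == "gemini_api":
        accounts = registry.get("providers", {}).get("google", {}).get("accounts", [])
        account = _find(accounts, cfg.get("account_id"))
        if account is None:
            return None  # dangling account reference: treat the slot as empty
        cfg["api_key"] = account["api_key"]
    elif cfg["type"] == "local_openai":
        host = _find(registry.get("hosts", []), cfg.get("host_id"))
        if host is None:
            return None  # dangling host reference: treat the slot as empty
        for key in ("api_url", "api_key", "host_type"):
            cfg[key] = host.get(key)
    return cfg


def get_model_for_slot(registry: dict, role: str, slot: str) -> Optional[dict]:
    """Resolve only the named slot; no fallback chain."""
    model_id = registry.get("roles", {}).get(role, {}).get(slot)
    return _resolve_model(registry, model_id)


def get_model_for_role(registry: dict, role: str, env_defaults: dict) -> dict:
    """Walk the fallback chain, then the .env default, then the last resort."""
    for slot in FALLBACK_CHAIN:
        cfg = get_model_for_slot(registry, role, slot)
        if cfg is not None:
            return cfg
    cfg = _resolve_model(registry, env_defaults.get(role))
    if cfg is not None:
        return cfg
    return {"id": LAST_RESORT_ID, "type": LAST_RESORT_ID}


def compaction_budget_tokens(cfg: dict) -> int:
    """75% of the context window; context_k is in thousands of tokens (default 32)."""
    return cfg.get("context_k", 32) * 1000 * 3 // 4


def effective_max_rounds(cfg: dict, global_max: int) -> int:
    """Per-model cap, never exceeding the global orchestrator_max_rounds."""
    per_model = cfg.get("max_rounds")
    return global_max if per_model is None else min(per_model, global_max)
```

The sketch treats a dangling account or host reference as an empty slot, so the walk moves on to the next backup instead of failing the request. Whether the real `_resolve_model()` behaves this way is not specified here.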

---

## Routing Code

### `llm_client.complete()` (Phase 3 update)

```
slot:  str | None   → resolve specific slot, no fallback (explicit selection)
model: str | None   → legacy backend strings, kept for backward compat
(neither)           → auto: role-based routing with full fallback chain
```

Dispatch table (`type` → backend function):

- `claude_cli` → `_claude()`
- `gemini_cli` → `_gemini()`
- `gemini_api` → `_gemini()` ← **gap: should be `_gemini_api()` (Phase 4)**
- `local_openai` → `_local()`

### `routers/chat.py` (Phase 3 update)

- `ChatRequest` gets `slot: str | None = None`
- `GET /backend` returns `chat_models: [{slot, label, type}]` for the UI toggle
- `_stream_chat` resolves the model label from the slot when `req.slot` is set

### `app.js` (Phase 3 update)

- Loads `chat_models` from `GET /backend` on page init
- Toggle cycles through `chat_models` by label and sends `slot` in the chat payload
- Agent mode placeholder: replace the hard-coded "Gemini tool loop" with "orchestrator"

---

## Known Gaps (not yet implemented)

### Gap A: `gemini_api` dispatch in `llm_client` (Phase 4)

`_TYPE_TO_BACKEND` maps `gemini_api → "gemini"` (the CLI subprocess). If a user assigns a `gemini_api` model to the `chat` role, it silently routes to the Gemini CLI instead of the Google genai SDK.

Fix: add `_gemini_api()` in `llm_client.py` that calls the SDK directly, matching how `orchestrator_engine.py` does it. It needs the API key from the resolved config.

### Gap B: Agent mode placeholder (Phase 3, quick fix)

`app.js` lines 257–258 hard-code `"Gemini tool loop"`. They should say `"orchestrator"`, since the orchestrator role can now be a local model.
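
The following sketches the type-based dispatch described under Routing Code, with the Gap A fix applied. The backends are stubs: they return a description of what they would do and make no CLI or network calls. The table maps types directly to functions, whereas the doc's `_TYPE_TO_BACKEND` maps to backend name strings. The `reasoning` injection follows the `reasoning_budget_tokens` field description; the stub bodies and return shapes are assumptions.

```python
from typing import Callable

# Each backend takes a resolved model config and chat messages.
Backend = Callable[[dict, list], dict]


def _claude(cfg: dict, messages: list) -> dict:
    # Placeholder for the Claude CLI subprocess call.
    return {"backend": "claude_cli", "model": cfg.get("model_name")}


def _gemini(cfg: dict, messages: list) -> dict:
    # Placeholder for the Gemini CLI subprocess call.
    return {"backend": "gemini_cli", "model": cfg.get("model_name")}


def _gemini_api(cfg: dict, messages: list) -> dict:
    # Gap A target: call the google-genai SDK (for example
    # genai.Client(api_key=...).models.generate_content(...)) with the key
    # merged in during resolution, instead of shelling out to the Gemini CLI.
    if not cfg.get("api_key"):
        raise ValueError("gemini_api model resolved without an api_key")
    return {"backend": "gemini_api", "model": cfg["model_name"]}


def _local(cfg: dict, messages: list) -> dict:
    # Build (but do not send) an OpenAI-compatible chat request body.
    body = {"model": cfg["model_name"], "messages": messages}
    budget = cfg.get("reasoning_budget_tokens")
    if budget is not None:
        body["reasoning"] = {"budget_tokens": budget}
    return {"backend": "local_openai", "body": body}


_TYPE_TO_BACKEND: dict[str, Backend] = {
    "claude_cli": _claude,
    "gemini_cli": _gemini,
    "gemini_api": _gemini_api,  # currently routed to _gemini (CLI): Gap A
    "local_openai": _local,
}


def dispatch(cfg: dict, messages: list) -> dict:
    """Route a resolved model config to its backend by `type`."""
    backend = _TYPE_TO_BACKEND.get(cfg.get("type"))
    if backend is None:
        raise ValueError(f"no backend for model type {cfg.get('type')!r}")
    return backend(cfg, messages)
```

Raising on a missing key surfaces misconfiguration instead of quietly routing somewhere else, which is the failure mode Gap A describes.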

---

## Phases

### Phase 1: Data model + routing ✅ 2026-04-27

- V2 schema with `providers` section
- Automatic V1→V2 migration (pulls `gemini_api_key` from `auth.json` into Google accounts)
- `_resolve_model()` merges the account API key for the `gemini_api` type
- `get_google_api_key()`, `save_cloud_model()`, `save_google_account()` / `remove_google_account()`
- Orchestrator router uses the model-resolved API key

### Phase 2: Cloud provider UI ✅ 2026-04-27

- `/settings/models` (canonical; `/settings/local` redirects here)
- Cloud Providers section: Anthropic info + Google account add/remove
- Add Model form with provider tabs (Local / Google / Anthropic)
- Provider badges on model rows (Anthropic / Google / Local)
- Settings page updated: Gemini Key section replaced by a Model Registry card

### Phase 3: Toggle redesign + routing cleanup 🔄 in progress

- `model_registry.get_model_for_slot()`: resolve a specific slot without the fallback chain
- `llm_client.complete()`: add a `slot` parameter
- `routers/chat.py`: `ChatRequest.slot`, extended `GET /backend`, slot label in the response tag
- `app.js`: data-driven toggle cycling model labels; send `slot`, not a backend string
- Fix Gap B: agent mode placeholder

### Phase 4: Polish + future providers

- Fix Gap A: `gemini_api` dispatch in `llm_client` → direct Google genai SDK for chat
- Claude direct API key support (alternative to CLI OAuth)
- OpenRouter as a named provider (already works as a local host; could be promoted)
- Per-role "test" button in the role assignments UI
- Per-user catalog additions (extend `ANTHROPIC_CATALOG` / `GOOGLE_CATALOG` from the UI)
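
This sketches how the Phase 3 `GET /backend` payload and the toggle's click behaviour could fit together. The toggle itself lives in `app.js` and is mirrored here in Python only to keep one language. `chat_models()`, `next_toggle()` and `toggle_color()` are hypothetical names. Starting from `primary` when no slot is selected is an assumption.

```python
from typing import Optional

CHAT_SLOTS = ("primary", "backup_1", "backup_2")


def chat_models(registry: dict) -> list:
    """Build the `chat_models` list for GET /backend, skipping empty slots."""
    chat_role = registry.get("roles", {}).get("chat", {})
    by_id = {m["id"]: m for m in registry.get("models", [])}
    models = []
    for slot in CHAT_SLOTS:
        # Unset, blank or dangling slots are all skipped.
        entry = by_id.get(chat_role.get(slot))
        if entry is not None:
            models.append({"slot": slot, "label": entry["label"], "type": entry["type"]})
    return models


def next_toggle(models: list, current_slot: Optional[str]) -> tuple:
    """Return the (slot, label) the toggle shows after one click, wrapping around."""
    if not models:
        return None, "auto"  # no chat models: fall back to role routing
    slots = [m["slot"] for m in models]
    i = (slots.index(current_slot) + 1) % len(slots) if current_slot in slots else 0
    return models[i]["slot"], models[i]["label"]


def toggle_color(model_type: str) -> str:
    """Color rule from the toggle redesign: gemini_* blue, local amber, else default."""
    if model_type.startswith("gemini_"):
        return "blue"
    if model_type == "local_openai":
        return "amber"
    return "default"
```

Because empty slots are dropped server-side, the client only has to cycle through a dense list. The slot it sends back is always one that resolved when the page loaded.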