Backend toggle now cycles through chat role models by label instead of
cycling service type strings (auto/claude/gemini/local).
- model_registry: get_model_for_slot() — resolves a specific priority
slot without walking the fallback chain
- llm_client: complete() gains slot param; explicit slot selection
dispatches directly to that model with no silent fallback
- routers/chat.py: ChatRequest.slot; GET /backend returns chat_models
[{slot, label, type}] for the UI; _stream_chat uses resolved model
label for the response tag when a slot is pinned
- app.js: toggle loads chat_models from /backend, cycles by label,
sends slot in chat payload; legacy model field removed from payload
- app.js: fix Gap B — agent mode placeholder no longer says "Gemini
tool loop"; now says "orchestrator"
- DESIGN doc: updated to reflect phases 1+2 complete, catalog-as-code
decision, Gap A/B documented, Phase 3 implementation details
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Model Registry V2 — Design Document
Status: Phase 3 in progress
Goal: Unified, provider-agnostic model management with clean role-based routing
Problem Statement
The original system had two classes of models with different treatment:
| Type | How configured | How selected |
|---|---|---|
| Claude, Gemini | Hardcoded built-ins (claude_cli, gemini_api) | Backend toggle string ("claude"/"gemini") |
| Local (Ollama, Open WebUI) | Configured via /settings/local | Backend toggle string "local" |
This breaks down when you want multiple Gemini API keys, OpenRouter alongside local models, role assignments spanning all provider types, or a toggle that shows which model is active instead of which service.
Architecture
Core concept: Providers + Credentials + Models + Roles
Providers (built-in, fixed set)
└─ Anthropic ← catalog of Claude model IDs (code constants)
└─ Google ← catalog of Gemini model IDs (code constants)
└─ Local Host ← OpenAI-compatible endpoint (user adds these)
Credentials (user-configured, stored in model_registry.json)
└─ Anthropic ← Claude CLI (OAuth, default) — API key support in Phase 4
└─ Google ← one or more API keys (one per Google account)
└─ Local Host ← api_key stored on the host record
Model Entries (user-registered)
└─ Provider + model ID + credential = one usable model entry
Role Assignments (unified — any model entry can fill any role)
└─ chat: primary → backup_1 → backup_2
└─ orchestrator: primary → backup_1
└─ distill: primary
└─ (etc.)
Catalog design decision
Catalogs (ANTHROPIC_CATALOG, GOOGLE_CATALOG) are Python constants in
model_registry.py, not stored in the per-user JSON. They are updated with each code deploy.
Per-user catalog customisation is deferred to Phase 4.
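A minimal sketch of what catalog-as-code could look like, assuming each catalog maps a model ID to a display label. Only the constant names and the two model IDs come from this document; the dict shape and the catalog_for() helper are hypothetical.

```python
# Hypothetical shape: model ID -> display label. The real entries in
# model_registry.py may carry more fields (context size, tags, ...).
ANTHROPIC_CATALOG: dict[str, str] = {
    "claude-sonnet-4-6": "Sonnet 4.6",
}
GOOGLE_CATALOG: dict[str, str] = {
    "gemini-2.5-flash": "Gemini 2.5 Flash",
}

_CATALOGS = {"anthropic": ANTHROPIC_CATALOG, "google": GOOGLE_CATALOG}


def catalog_for(provider: str) -> dict[str, str]:
    """Return a copy of the built-in catalog for a provider.

    Providers without a code-level catalog (local hosts in this sketch)
    get an empty dict.
    """
    return dict(_CATALOGS.get(provider, {}))
```

Because the catalogs are plain module constants, changing them is a code change and ships with a deploy; nothing in the per-user JSON has to be migrated.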
Backend toggle redesign (Phase 3)
Before: cycles service type strings — auto → claude → gemini → local
After: cycles through the chat role's configured models by label:
Sonnet 4.6 (CLI) → Gemini 2.5 Flash → Gemma 4 E4B → (wraps)
- Shows the resolved model label on the toggle button
- If no chat role models are configured: shows "auto", uses existing role routing
- Click skips empty slots automatically
- Color: claude_cli = default, gemini_* = blue, local_openai = amber
UI sends slot: "primary" | "backup_1" | "backup_2" (not backend type string).
llm_client.complete() resolves that slot from the chat role and dispatches by type.
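The cycling rule described above (advance to the next configured slot, skip empty ones, wrap at the end) can be sketched as follows. The real toggle lives in app.js; this Python version only illustrates the logic. The next_slot() name and the assumption that an empty slot appears with a missing or empty label are hypothetical.

```python
from __future__ import annotations


def next_slot(chat_models: list[dict], current: str | None) -> str | None:
    """Return the slot the toggle should move to after a click.

    chat_models is a list of {slot, label, type} dicts in slot order.
    Entries without a label are treated as empty and skipped.
    Returns None when no chat model is configured (the "auto" case).
    """
    filled = [m["slot"] for m in chat_models if m.get("label")]
    if not filled:
        return None
    if current not in filled:
        return filled[0]
    # Advance by one and wrap around at the end of the list.
    return filled[(filled.index(current) + 1) % len(filled)]


models = [
    {"slot": "primary", "label": "Sonnet 4.6 (CLI)", "type": "claude_cli"},
    {"slot": "backup_1", "label": None, "type": None},
    {"slot": "backup_2", "label": "Gemma 4 E4B", "type": "local_openai"},
]
```

With this list, a click on "primary" moves to "backup_2" because "backup_1" is empty, and a click on "backup_2" wraps back to "primary".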
Data Model (V2 Schema)
Stored in home/{user}/model_registry.json.
{
"version": 2,
"providers": {
"anthropic": {
"credentials": [{"id": "cli", "label": "Claude CLI (OAuth)", "type": "cli"}]
},
"google": {
"accounts": [{"id": "a1b2", "label": "One Sky IT", "api_key": "AIza..."}]
}
},
"hosts": [
{"id": "h1", "label": "Gaming Laptop", "api_url": "http://...", "api_key": "", "host_type": "openwebui"}
],
"models": [
{"id": "m1", "type": "claude_cli", "label": "Sonnet 4.6 (CLI)", "model_name": "claude-sonnet-4-6", "provider": "anthropic", "credential_id": "cli", "context_k": 1000, "tags": []},
{"id": "m2", "type": "gemini_api", "label": "Gemini 2.5 Flash", "model_name": "gemini-2.5-flash", "provider": "google", "account_id": "a1b2", "context_k": 1000, "tags": []},
{"id": "m3", "type": "local_openai", "label": "Gemma 4 E4B", "model_name": "gemma4:e4b", "provider": "local", "host_id": "h1", "context_k": 72, "tags": []}
],
"roles": {
"chat": {"primary": "m1", "backup_1": "m2", "backup_2": "m3"},
"orchestrator":{"primary": "m2", "backup_1": "m3"},
"distill": {"primary": "m1"}
}
}
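The schema above keeps credentials apart from model entries, so a model has to be joined with its account or host before it can be called. A minimal sketch of that join, assuming the field names shown in the schema; resolve_entry() is a hypothetical stand-in, not the actual _resolve_model() code.

```python
from __future__ import annotations


def resolve_entry(registry: dict, model_id: str) -> dict | None:
    """Return a copy of a model entry with its credential fields merged in.

    Returns None when the model or the record it references is missing.
    """
    entry = next((m for m in registry.get("models", []) if m["id"] == model_id), None)
    if entry is None:
        return None
    resolved = dict(entry)
    if "account_id" in entry:
        # Google models: pull the API key from the referenced account.
        accounts = registry.get("providers", {}).get("google", {}).get("accounts", [])
        account = next((a for a in accounts if a["id"] == entry["account_id"]), None)
        if account is None:
            return None
        resolved["api_key"] = account["api_key"]
    if "host_id" in entry:
        # Local models: pull endpoint details from the referenced host.
        host = next((h for h in registry.get("hosts", []) if h["id"] == entry["host_id"]), None)
        if host is None:
            return None
        resolved.update(
            api_url=host["api_url"], api_key=host["api_key"], host_type=host["host_type"]
        )
    return resolved


registry = {
    "providers": {"google": {"accounts": [{"id": "a1b2", "api_key": "AIza..."}]}},
    "hosts": [{"id": "h1", "api_url": "http://...", "api_key": "", "host_type": "openwebui"}],
    "models": [
        {"id": "m2", "type": "gemini_api", "model_name": "gemini-2.5-flash", "account_id": "a1b2"},
        {"id": "m3", "type": "local_openai", "model_name": "gemma4:e4b", "host_id": "h1"},
    ],
}
```

Treating a dangling account_id or host_id as unresolved, rather than raising, is a choice made for this sketch so that a fallback chain can simply move on to the next slot.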
Model types and dispatch
| type | Dispatches via | Notes |
|---|---|---|
| claude_cli | Claude CLI subprocess | ~/.claude/.credentials.json OAuth |
| gemini_cli | Gemini CLI subprocess | |
| gemini_api | Currently: Gemini CLI (gap, see Phase 4) | Should use google-genai SDK |
| local_openai | HTTP to OpenAI-compatible endpoint | host_type controls path |
Built-in model IDs
Always resolvable without a registry entry (used as .env role defaults):
claude_cli, gemini_cli, gemini_api
Resolution Logic
get_model_for_role(username, role): walks primary → backup_1 → backup_2 → backup_3 → backup_4 and returns the first resolved model config with credentials merged in. If no slot resolves, it falls back to the .env defaults, then to a hardcoded last-resort default.
get_model_for_slot(username, role, slot) — resolves only the named slot, no fallback chain. Used by Phase 3 explicit slot selection.
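The difference between the two lookups can be sketched as below. This is an illustration under simplifying assumptions: the functions take already-loaded dicts instead of a username, the names model_for_slot() and model_for_role() are hypothetical, and the .env and hardcoded fallbacks are collapsed into a single default argument.

```python
from __future__ import annotations

SLOT_CHAIN = ("primary", "backup_1", "backup_2", "backup_3", "backup_4")


def model_for_slot(roles: dict, models: dict, role: str, slot: str) -> dict | None:
    """Resolve exactly one slot. Never looks at any other slot."""
    model_id = roles.get(role, {}).get(slot)
    return models.get(model_id) if model_id else None


def model_for_role(roles: dict, models: dict, role: str, default: dict | None = None) -> dict | None:
    """Walk the slot chain and return the first slot that resolves."""
    for slot in SLOT_CHAIN:
        resolved = model_for_slot(roles, models, role, slot)
        if resolved is not None:
            return resolved
    return default  # stands in for the .env / hardcoded fallbacks


roles = {"chat": {"primary": "deleted", "backup_1": "m2"}}
models = {"m2": {"id": "m2", "label": "Gemini 2.5 Flash"}}
```

Here the primary slot points at a model that no longer exists. The slot lookup reports that as None so the caller can surface the problem, while the role lookup quietly moves on to backup_1.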
Routing Code
llm_client.complete() (Phase 3 update)
slot: str | None → resolve specific slot, no fallback (explicit selection)
model: str | None → legacy backend strings, kept for backward compat
(neither) → auto: role-based routing with full fallback chain
Dispatch table (type → backend function):
- claude_cli → _claude()
- gemini_cli → _gemini()
- gemini_api → _gemini() ← gap: should be _gemini_api() (Phase 4)
- local_openai → _local()
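A dict-based version of the dispatch table might look like the sketch below. The backend bodies are placeholders that only tag the prompt, and dispatch_by_type() is a hypothetical name; the real backends run subprocesses or HTTP calls.

```python
from __future__ import annotations

from typing import Callable


def _claude(prompt: str, cfg: dict) -> str:
    return f"[claude] {prompt}"  # placeholder for the Claude CLI subprocess


def _gemini(prompt: str, cfg: dict) -> str:
    return f"[gemini] {prompt}"  # placeholder for the Gemini CLI subprocess


def _local(prompt: str, cfg: dict) -> str:
    return f"[local] {prompt}"  # placeholder for the OpenAI-compatible HTTP call


DISPATCH: dict[str, Callable[[str, dict], str]] = {
    "claude_cli": _claude,
    "gemini_cli": _gemini,
    "gemini_api": _gemini,  # Gap A: still routed to the CLI backend
    "local_openai": _local,
}


def dispatch_by_type(prompt: str, cfg: dict) -> str:
    """Call the backend registered for cfg["type"], failing loudly if unknown."""
    backend = DISPATCH.get(cfg.get("type"))
    if backend is None:
        raise ValueError(f"unknown model type: {cfg.get('type')!r}")
    return backend(prompt, cfg)
```

Raising on an unknown type, instead of defaulting to some backend, matches the "no silent fallback" intent of explicit slot selection.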
routers/chat.py (Phase 3 update)
- ChatRequest gets slot: str | None = None
- GET /backend returns chat_models: [{slot, label, type}] for the UI toggle
- _stream_chat resolves model label from slot when req.slot is set
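One way the chat_models list for GET /backend could be derived from the registry is sketched here. build_chat_models() is a hypothetical helper, and omitting unconfigured slots from the response is an assumption, not something this document specifies.

```python
from __future__ import annotations

CHAT_SLOTS = ("primary", "backup_1", "backup_2")


def build_chat_models(registry: dict) -> list[dict]:
    """Build the [{slot, label, type}] list for the UI toggle.

    Slots that are unset, or that point at a model that no longer
    exists, are left out of the result.
    """
    by_id = {m["id"]: m for m in registry.get("models", [])}
    chat = registry.get("roles", {}).get("chat", {})
    result = []
    for slot in CHAT_SLOTS:
        model = by_id.get(chat.get(slot))
        if model is not None:
            result.append({"slot": slot, "label": model["label"], "type": model["type"]})
    return result


example_registry = {
    "models": [
        {"id": "m1", "type": "claude_cli", "label": "Sonnet 4.6 (CLI)"},
        {"id": "m3", "type": "local_openai", "label": "Gemma 4 E4B"},
    ],
    "roles": {"chat": {"primary": "m1", "backup_1": "m2", "backup_2": "m3"}},
}
```

In this example backup_1 references m2, which is not registered, so the toggle would only receive the primary and backup_2 entries.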
app.js (Phase 3 update)
- Loads chat_models from GET /backend on page init
- Toggle cycles through chat_models by label, sends slot in chat payload
- Agent mode placeholder: remove "Gemini tool loop" hardcode → "orchestrator"
Known Gaps (not yet implemented)
Gap A — gemini_api dispatch in llm_client (Phase 4)
_TYPE_TO_BACKEND maps gemini_api → "gemini" (CLI subprocess). If a user assigns a
gemini_api type model to the chat role, it silently routes to the Gemini CLI instead
of the Google genai SDK. Fix: add _gemini_api() in llm_client.py that calls the SDK
directly, matching how orchestrator_engine.py does it. Needs API key from resolved config.
Gap B — Agent mode placeholder (Phase 3, quick fix)
app.js lines 257–258 hard-code "Gemini tool loop". Should say "orchestrator" since
the orchestrator role can now be a local model.
Phases
Phase 1 — Data model + routing ✅ 2026-04-27
- V2 schema with providers section
- Auto migration V1→V2 (pulls gemini_api_key from auth.json → Google accounts)
- _resolve_model() merges account API key for gemini_api type
- get_google_api_key(), save_cloud_model(), save/remove_google_account()
- Orchestrator router uses model-resolved API key
Phase 2 — Cloud provider UI ✅ 2026-04-27
- /settings/models (canonical, /settings/local redirects)
- Cloud Providers section: Anthropic info + Google account add/remove
- Add Model form with provider tabs (Local / Google / Anthropic)
- Provider badges on model rows (Anthropic / Google / Local)
- Settings page updated: Gemini Key section replaced by Model Registry card
Phase 3 — Toggle redesign + routing cleanup 🔄 in progress
- model_registry.get_model_for_slot(): resolve a specific slot without fallback chain
- llm_client.complete(): add slot parameter
- routers/chat.py: ChatRequest.slot, extend GET /backend, slot label in response tag
- app.js: data-driven toggle cycling model labels; send slot not backend string
- Fix Gap B: agent mode placeholder
Phase 4 — Polish + future providers
- Fix Gap A: gemini_api dispatch in llm_client → direct Google genai SDK for chat
- Claude direct API key support (alternative to CLI OAuth)
- OpenRouter as a named provider (already works as local host; could be promoted)
- Per-role "test" button in role assignments UI
- Per-user catalog additions (extend ANTHROPIC_CATALOG / GOOGLE_CATALOG from UI)