feat: model registry Phase 3 — slot-based backend toggle

Backend toggle now cycles through chat role models by label instead of
cycling service type strings (auto/claude/gemini/local).

- model_registry: get_model_for_slot() — resolves a specific priority
  slot without walking the fallback chain
- llm_client: complete() gains a slot param; a pinned slot dispatches
  directly to that slot's model with no fallback to other slots or
  backends (an unconfigured slot falls through to auto routing)
- routers/chat.py: ChatRequest gains slot; GET /backend returns chat_models
  [{slot, label, type}] for the UI (legacy primary/fallback/local_model
  fields kept); _stream_chat uses the resolved model label for the
  response tag when a slot is pinned
- app.js: toggle loads chat_models from /backend, cycles by label,
  sends slot in chat payload; legacy model field removed from payload
- app.js: fix Gap B — agent mode placeholder no longer says "Gemini
  tool loop"; now says "orchestrator"
- DESIGN doc: updated to reflect phases 1+2 complete, catalog-as-code
  decision, Gap A/B documented, Phase 3 implementation details

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Author: Scott Idem
Date: 2026-04-27 21:43:08 -04:00
Parent: 3bc6b45f9f
Commit: 962d58d2e2
5 changed files with 248 additions and 296 deletions

File: llm_client.py

@@ -49,14 +49,17 @@ async def complete(
     messages: list[dict],
     model: str | None = None,
     role: str = "chat",
+    slot: str | None = None,
     max_tokens: int = 2048,
 ) -> tuple[str, str]:
     """
     Returns (response_text, actual_backend_used).
-    model: explicit backend override ("claude" | "gemini" | "local") from UI toggle.
+    slot: Phase 3 — specific role slot ("primary" | "backup_1" | "backup_2").
+          Resolves that exact slot, no fallback chain. Takes priority over model.
+    model: legacy backend override ("claude" | "gemini" | "local") from old toggle.
            None = resolve via model registry for the given role.
-    role: registry role used when model is None (default: "chat").
+    role: registry role used for slot/auto routing (default: "chat").
     """
     import model_registry as _reg
     from persona import _user
@@ -64,21 +67,31 @@ async def complete(
     username = _user.get()
     resolved_cfg: dict | None = None
-    if model in _EXPLICIT_BACKENDS:
-        # User explicitly selected a backend in the UI
-        if model == "local":
-            resolved_cfg = _reg.get_best_local_model(username, role)
-            if not resolved_cfg:
-                raise RuntimeError("No local model configured — add one at /settings/models")
-        primary = model
-    else:
-        # Role-based routing via model registry
-        resolved = _reg.get_model_for_role(username, role)
-        if resolved:
-            resolved_cfg = resolved
-            primary = _TYPE_TO_BACKEND.get(resolved["type"], "claude")
+    if slot is not None:
+        # Phase 3: explicit slot selection — no fallback within the role
+        resolved_cfg = _reg.get_model_for_slot(username, role, slot)
+        if resolved_cfg:
+            primary = _TYPE_TO_BACKEND.get(resolved_cfg["type"], "claude")
         else:
-            primary = settings.primary_backend
+            # Slot not configured — fall through to auto routing
+            slot = None
+    if slot is None:
+        if model in _EXPLICIT_BACKENDS:
+            # Legacy: explicit backend override from old UI toggle
+            if model == "local":
+                resolved_cfg = _reg.get_best_local_model(username, role)
+                if not resolved_cfg:
+                    raise RuntimeError("No local model configured — add one at /settings/models")
+            primary = model
+        else:
+            # Auto: role-based routing via model registry
+            resolved = _reg.get_model_for_role(username, role)
+            if resolved:
+                resolved_cfg = resolved
+                primary = _TYPE_TO_BACKEND.get(resolved["type"], "claude")
+            else:
+                primary = settings.primary_backend

     fallback = _FALLBACK.get(primary, "claude")
@@ -89,9 +102,7 @@ async def complete(
         err_str = str(e)
         if primary == "claude" and any(k in err_str for k in ("401", "authenticate", "expired", "OAuth")):
             await event_bus.publish({"type": "claude_auth_expired"})
-        # Only fall back when using a default/auto backend.
-        # If the user has explicitly configured a model via the registry,
-        # surface the error so they know something is wrong.
+        # Surface errors when a model is explicitly configured or a specific slot was pinned.
         if resolved_cfg is not None:
             logger.error("%s failed (no fallback — model explicitly configured): %s", primary, e)
             raise
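The precedence established by the two `complete()` hunks above (a configured slot first, then a legacy backend string, then auto role routing) can be sketched as a pure function. This is a simplified model, not the code in `llm_client.py`: `resolve_route()`, `LEGACY_BACKENDS` and the returned tuple are hypothetical, any non-empty model id is treated as resolvable, and the third element mirrors the `resolved_cfg is None` check that decides whether an error may fall back to another backend.

```python
from __future__ import annotations

# Hypothetical constants mirroring the names used in the diff.
PRIORITY_KEYS = ("primary", "backup_1", "backup_2", "backup_3", "backup_4")
LEGACY_BACKENDS = {"claude", "gemini", "local"}


def resolve_route(
    registry: dict,
    role: str = "chat",
    slot: str | None = None,
    model: str | None = None,
) -> tuple[str, str | None, bool]:
    """Return (route, target, may_fall_back) for a single request.

    route is "slot", "legacy" or "auto"; target is a model id or a legacy
    backend string; may_fall_back is True when no registry config was
    resolved (the `resolved_cfg is None` case in complete()).
    """
    role_cfg = registry.get("roles", {}).get(role, {})

    # 1. A pinned slot wins, but only if that slot is actually configured.
    if slot in PRIORITY_KEYS and role_cfg.get(slot):
        return ("slot", role_cfg[slot], False)
    # An unknown or empty slot falls through, like `slot = None` in the diff.

    # 2. Legacy backend strings from the old toggle. Only "local" resolves
    #    a registry config (the best local model), which disables fallback.
    if model in LEGACY_BACKENDS:
        return ("legacy", model, model != "local")

    # 3. Auto: first configured slot in priority order.
    for key in PRIORITY_KEYS:
        if role_cfg.get(key):
            return ("auto", role_cfg[key], False)

    # Nothing in the registry: settings default, fallback allowed.
    return ("auto", None, True)
```

Note the unconfigured-slot case: a pinned but empty slot does not fail, it quietly becomes auto routing, so the no-fallback guarantee applies only once the slot actually resolves.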

File: model_registry.py

@@ -415,6 +415,23 @@ def get_best_local_model(username: str, role: str = "chat") -> dict | None:
     return None


+def get_model_for_slot(username: str, role: str, slot: str) -> dict | None:
+    """
+    Resolve a single named priority slot from a role without walking the fallback chain.
+    Used by Phase 3 explicit slot selection — the user has pinned a specific model;
+    don't silently redirect to another slot if this one is empty or broken.
+    Returns None if the slot is unset or the model can't be resolved.
+    """
+    if slot not in PRIORITY_KEYS:
+        return None
+    registry = _load(username)
+    model_id = registry.get("roles", {}).get(role, {}).get(slot)
+    if not model_id:
+        return None
+    return _resolve_model(registry, model_id)
+
+
 def get_google_api_key(username: str, account_id: str | None = None) -> str | None:
     """
     Return the best available Gemini API key for the user.
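The difference between `get_model_for_slot()` and the chain walk in `get_model_for_role()` is easiest to see on an in-memory registry. A minimal sketch, assuming a plain dict shaped like `model_registry.json`; `_resolve()` is a hypothetical stand-in for `_resolve_model()` that only looks the id up (no credential merging), and the role walk returns `None` where the real function falls back to `.env` defaults.

```python
from __future__ import annotations

PRIORITY_KEYS = ("primary", "backup_1", "backup_2", "backup_3", "backup_4")


def _resolve(registry: dict, model_id: str) -> dict | None:
    # Hypothetical stand-in for _resolve_model(): plain id lookup in "models".
    for entry in registry.get("models", []):
        if entry.get("id") == model_id:
            return dict(entry)
    return None


def model_for_slot(registry: dict, role: str, slot: str) -> dict | None:
    """Exactly one slot: None if the slot is unknown, empty or unresolvable."""
    if slot not in PRIORITY_KEYS:
        return None
    model_id = registry.get("roles", {}).get(role, {}).get(slot)
    return _resolve(registry, model_id) if model_id else None


def model_for_role(registry: dict, role: str) -> dict | None:
    """Walk the chain in priority order; the first slot that resolves wins."""
    for slot in PRIORITY_KEYS:
        cfg = model_for_slot(registry, role, slot)
        if cfg:
            return cfg
    return None
```

With a dangling `primary` id, the slot lookup returns `None` while the role walk moves on to `backup_1`, which is exactly the redirect that pinned-slot selection is meant to avoid.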

File: routers/chat.py

@@ -20,7 +20,7 @@ router = APIRouter()
 def _backend_label(backend: str, username: str, role: str = "chat") -> str:
-    """Human-readable label for the model that handled a request."""
+    """Human-readable label for the model that handled a request (legacy path)."""
     if backend == "claude":
         return "Claude"
     if backend == "gemini":
@@ -33,15 +33,24 @@ def _backend_label(backend: str, username: str, role: str = "chat") -> str:
     return backend.title()


+def _resolve_slot_label(username: str, slot: str) -> str | None:
+    """Return the configured model label for a chat role slot, or None."""
+    cfg = model_registry.get_model_for_slot(username, "chat", slot)
+    if cfg:
+        return cfg.get("label") or cfg.get("model_name")
+    return None
+
+
 class ChatRequest(BaseModel):
     message: str
     session_id: str | None = None
     tier: int | None = None
-    model: str | None = None  # "claude" or "gemini" to override; None = use primary_backend
+    model: str | None = None  # legacy backend override ("claude"|"gemini"|"local")
+    slot: str | None = None  # Phase 3: role slot ("primary"|"backup_1"|"backup_2")
     include_long: bool = True
     include_mid: bool = True
     include_short: bool = True
     off_record: bool = False  # skip session log (in-memory context preserved)
     user: str = "scott"
     persona: str = "inara"
@@ -94,6 +103,7 @@ async def _stream_chat(req: ChatRequest):
         system_prompt=system_prompt,
         messages=history,
         model=req.model,
+        slot=req.slot,
     ))
     try:
@@ -109,7 +119,11 @@ async def _stream_chat(req: ChatRequest):
     try:
         response_text, actual_backend = task.result()
-        backend_label = _backend_label(actual_backend, user, role="chat")
+        # Use the slot's model label when a slot was pinned; fall back to generic label
+        if req.slot:
+            backend_label = _resolve_slot_label(user, req.slot) or _backend_label(actual_backend, user)
+        else:
+            backend_label = _backend_label(actual_backend, user, role="chat")
         host = platform.node()
         history.append({
             "role": "assistant",
@@ -164,28 +178,59 @@ _BACKEND_CYCLE = ("claude", "gemini", "local")
 _BACKEND_FALLBACK = {"claude": "gemini", "gemini": "claude", "local": "claude"}


+def _request_user(request: Request) -> str | None:
+    """Extract username from JWT cookie, or None."""
+    try:
+        token = request.cookies.get(COOKIE_NAME)
+        return decode_token(token) if token else None
+    except (jwt.InvalidTokenError, Exception):
+        return None
+
+
 def _local_model_info(request: Request) -> dict | None:
     """Return the best local model {label, model_name} for the session user, or None."""
+    username = _request_user(request)
+    if not username:
+        return None
     try:
-        token = request.cookies.get(COOKIE_NAME)
-        username = decode_token(token) if token else None
-        if not username:
-            return None
         cfg = model_registry.get_best_local_model(username, "chat")
         if cfg:
             return {"label": cfg.get("label", ""), "model_name": cfg.get("model_name", "")}
-    except (jwt.InvalidTokenError, Exception):
+    except Exception:
         pass
     return None


+def _chat_models_for_toggle(username: str) -> list[dict]:
+    """Return non-empty chat role slots as [{slot, label, type}] for the UI toggle."""
+    registry = model_registry.get_registry(username)
+    role_cfg = registry.get("roles", {}).get("chat", {})
+    result = []
+    for slot in model_registry.PRIORITY_KEYS[:3]:
+        model_id = role_cfg.get(slot)
+        if not model_id:
+            continue
+        resolved = model_registry._resolve_model(registry, model_id)
+        if resolved:
+            result.append({
+                "slot": slot,
+                "label": resolved.get("label") or resolved.get("model_name") or slot,
+                "type": resolved.get("type", ""),
+            })
+    return result
+
+
 @router.get("/backend")
 async def get_backend(request: Request) -> dict:
+    username = _request_user(request)
+    chat_models = _chat_models_for_toggle(username) if username else []
     p = settings.primary_backend
     return {
-        "primary": p,
-        "fallback": _BACKEND_FALLBACK.get(p, "claude"),
-        "local_model": _local_model_info(request),
+        "chat_models": chat_models,
+        # Legacy fields kept for backward compat
+        "primary": p,
+        "fallback": _BACKEND_FALLBACK.get(p, "claude"),
+        "local_model": _local_model_info(request),
     }
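`_chat_models_for_toggle()` only emits slots whose model id resolves, with a label fallback of `label`, then `model_name`, then the slot name. A sketch of that filtering over a plain registry dict, assuming the first three priority keys are `primary`, `backup_1` and `backup_2` (the slot names this commit uses elsewhere); `chat_models_for_toggle` and `TOGGLE_SLOTS` are hypothetical names, and model resolution is simplified to an id match.

```python
from __future__ import annotations

# Assumed to match model_registry.PRIORITY_KEYS[:3].
TOGGLE_SLOTS = ("primary", "backup_1", "backup_2")


def chat_models_for_toggle(registry: dict) -> list[dict]:
    """Return [{slot, label, type}] for chat slots whose model id resolves."""
    models = {m["id"]: m for m in registry.get("models", [])}
    role_cfg = registry.get("roles", {}).get("chat", {})
    result = []
    for slot in TOGGLE_SLOTS:
        entry = models.get(role_cfg.get(slot))  # None if empty or dangling
        if entry:
            result.append({
                "slot": slot,
                "label": entry.get("label") or entry.get("model_name") or slot,
                "type": entry.get("type", ""),
            })
    return result
```

Because empty and dangling slots are dropped here, the toggle never offers an entry that `get_model_for_slot()` would fail to resolve; an empty list leaves the UI showing "auto", and the chat payload then sends `slot: null`.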

File: app.js

@@ -254,8 +254,8 @@
         : 'Private note — only you see this…';
   } else if (current_mode === 'agent') {
     inputEl.placeholder = ctrlEnterMode
-      ? `Task for ${personaLabel}… (Gemini tool loop — Ctrl+Enter to run)`
-      : `Task for ${personaLabel}… (Gemini tool loop)`;
+      ? `Task for ${personaLabel}… (orchestrator — Ctrl+Enter to run)`
+      : `Task for ${personaLabel}… (orchestrator)`;
   } else if (current_mode === 'otr') {
     inputEl.placeholder = 'Off the record — not logged or distilled…';
   } else {
@@ -340,58 +340,48 @@
 }

 // ── Backend toggle ───────────────────────────────────────────
-// null = "auto" — uses role-based routing from model registry
-// 'claude' / 'gemini' / 'local' = explicit override
+// Phase 3: cycles through the chat role's configured models by label.
+// Sends slot ("primary"|"backup_1"|"backup_2") in chat requests.
+// Falls back to legacy "auto" behavior when no models are configured.

-// On load only fetch local_model hint; don't override primaryBackend default (null)
-fetch('/backend').then(r => r.json()).then(d => {
-  if (backendModelHint && d.local_model) {
-    // Pre-fill hint in case user is already in local mode
-    backendModelHint.textContent = d.local_model.label || d.local_model.model_name;
-  }
-});
-const BACKEND_CYCLE = [null, 'claude', 'gemini', 'local'];
-const BACKEND_CLASS = { claude: '', gemini: 'mem-on', local: 'local-on' };
+const TYPE_CLASS = { claude_cli: '', gemini_api: 'mem-on', gemini_cli: 'mem-on', local_openai: 'local-on' };
 const backendModelHint = document.getElementById('backend-model-hint');

-function setBackendUI(backend, localModel) {
-  primaryBackend = backend;
-  backendToggle.textContent = backend === null ? 'auto' : backend;
-  const extra = backend === null ? '' : (BACKEND_CLASS[backend] || '');
-  backendToggle.className = 'ctx-btn' + (extra ? ' ' + extra : '');
+let chatSlots = [];  // [{slot, label, type}] from /backend
+let slotIdx = 0;     // index into chatSlots; -1 = auto (no registry models)
+
+function activeSlot() {
+  return chatSlots.length > 0 ? chatSlots[slotIdx] : null;
+}
+
+function setToggleUI(entry) {
+  if (!entry) {
+    backendToggle.textContent = 'auto';
+    backendToggle.className = 'ctx-btn';
+    primaryBackend = null;
+  } else {
+    backendToggle.textContent = entry.label;
+    backendToggle.className = 'ctx-btn ' + (TYPE_CLASS[entry.type] || '');
+    primaryBackend = entry.slot;  // used as legacy compat in payload
+  }
   if (backendModelHint) {
-    if (backend === 'local' && localModel) {
-      backendModelHint.textContent = localModel.label || localModel.model_name;
-      backendModelHint.style.display = '';
-    } else {
-      backendModelHint.textContent = '';
-      backendModelHint.style.display = 'none';
-    }
+    backendModelHint.textContent = '';
+    backendModelHint.style.display = 'none';
   }
 }

-// Initialize to auto mode
-setBackendUI(null, null);
+fetch('/backend').then(r => r.json()).then(d => {
+  chatSlots = d.chat_models || [];
+  slotIdx = 0;
+  setToggleUI(chatSlots[0] || null);
+});

-backendToggle.addEventListener('click', async () => {
-  const idx = BACKEND_CYCLE.indexOf(primaryBackend);
-  const next = BACKEND_CYCLE[(idx + 1) % BACKEND_CYCLE.length];
-  if (next === null) {
-    // Auto: role-based routing — no server call needed
-    setBackendUI(null, null);
-    addMessage('system', 'Backend: auto (role-based routing)');
-  } else {
-    const res = await fetch('/backend', {
-      method: 'POST',
-      headers: { 'Content-Type': 'application/json' },
-      body: JSON.stringify({ primary: next }),
-    });
-    const d = await res.json();
-    setBackendUI(next, d.local_model);
-    addMessage('system', `Backend: ${next} (fallback: ${d.fallback})`);
-  }
+backendToggle.addEventListener('click', () => {
+  if (chatSlots.length === 0) return;
+  slotIdx = (slotIdx + 1) % chatSlots.length;
+  const entry = chatSlots[slotIdx];
+  setToggleUI(entry);
+  addMessage('system', `Backend: ${entry.label}`);
 });

 // ── Sessions panel ───────────────────────────────────────────
@@ -1066,7 +1056,7 @@
       include_mid: memMid,
       include_short: memShort,
       off_record: current_mode === 'otr',
-      model: primaryBackend,
+      slot: activeSlot()?.slot || null,
       user: CORTEX_USER,
       persona: CORTEX_PERSONA,
     };

File: DESIGN doc (Model Registry V2)

@@ -1,62 +1,71 @@
 # Model Registry V2 — Design Document
-> Status: Planning / Pre-implementation
+> Status: Phase 3 in progress
 > Goal: Unified, provider-agnostic model management with clean role-based routing
 ---
 ## Problem Statement
-The current system has two classes of models with different treatment:
+The original system had two classes of models with different treatment:
 | Type | How configured | How selected |
 |---|---|---|
 | Claude, Gemini | Hardcoded built-ins (`claude_cli`, `gemini_api`) | Backend toggle string ("claude"/"gemini") |
 | Local (Ollama, Open WebUI) | Configured via `/settings/local` | Backend toggle string "local" |
-This breaks down when you want:
-- Multiple Gemini API keys (e.g. one per Google account)
-- Claude via direct API key instead of OAuth CLI
-- OpenRouter or other hosted providers alongside local models
-- Role assignments to span all provider types uniformly
-- A chat toggle that shows "which model" not "which service"
+This breaks down when you want multiple Gemini API keys, OpenRouter alongside local models,
+role assignments spanning all provider types, or a toggle that shows which model is active
+instead of which service.
 ---
-## Proposed Architecture
+## Architecture
 ### Core concept: Providers + Credentials + Models + Roles
 ```
 Providers (built-in, fixed set)
-└─ Anthropic ← has a catalog of Claude model IDs
-└─ Google ← has a catalog of Gemini model IDs
+└─ Anthropic ← catalog of Claude model IDs (code constants)
+└─ Google ← catalog of Gemini model IDs (code constants)
 └─ Local Host ← OpenAI-compatible endpoint (user adds these)
-Credentials (user-configured, per provider)
-└─ Anthropic ← Claude CLI (OAuth, default) or API key
+Credentials (user-configured, stored in model_registry.json)
+└─ Anthropic ← Claude CLI (OAuth, default) API key support in Phase 4
 └─ Google ← one or more API keys (one per Google account)
-└─ Local Host ← api_key stored on the host record (existing)
-Model Entries (user-registered — "I want to use this model")
+└─ Local Host ← api_key stored on the host record
+Model Entries (user-registered)
 └─ Provider + model ID + credential = one usable model entry
-└─ Same model ID with two different accounts = two model entries
 Role Assignments (unified — any model entry can fill any role)
 └─ chat: primary → backup_1 → backup_2
 └─ orchestrator: primary → backup_1
 └─ distill: primary
 └─ (etc.)
 ```
-### Backend toggle redesign
-**Current:** cycles service type strings — `auto → claude → gemini → local`
-**New:** cycles through the chat role's assigned models — `Primary → Backup 1 → Backup 2`
-The toggle displays the active model's label (e.g. "Sonnet 4.6" / "Gemini 2.5 Flash" / "Gemma 4 E4B"). Auto defaults to Primary.
-This means the toggle is context-free — it just picks a slot — and all the "what model, what provider, what credentials" logic lives in the registry.
+### Catalog design decision
+Catalogs (`ANTHROPIC_CATALOG`, `GOOGLE_CATALOG`) are **Python constants** in
+`model_registry.py`, not stored in the per-user JSON. Updated with each code deploy.
+Per-user catalog customisation is deferred to Phase 4.
+### Backend toggle redesign (Phase 3)
+**Before:** cycles service type strings — `auto → claude → gemini → local`
+**After:** cycles through the chat role's configured models by label:
+```
+Sonnet 4.6 (CLI) → Gemini 2.5 Flash → Gemma 4 E4B → (wraps)
+```
+- Shows the resolved model label on the toggle button
+- If no chat role models are configured: shows "auto", uses existing role routing
+- Click skips empty slots automatically
+- Color: `claude_cli` = default, `gemini_*` = blue, `local_openai` = amber
+UI sends `slot: "primary" | "backup_1" | "backup_2"` (not backend type string).
+`llm_client.complete()` resolves that slot from the chat role and dispatches by `type`.
 ---
@@ -67,74 +76,22 @@ Stored in `home/{user}/model_registry.json`.
 ```json
 {
   "version": 2,
   "providers": {
     "anthropic": {
-      "catalog": [
-        {"id": "claude-opus-4-7", "label": "Claude Opus 4.7", "context_k": 200},
-        {"id": "claude-sonnet-4-6", "label": "Claude Sonnet 4.6", "context_k": 200},
-        {"id": "claude-haiku-4-5", "label": "Claude Haiku 4.5", "context_k": 200}
-      ],
-      "credentials": [
-        {"id": "cli", "label": "Claude CLI (OAuth)", "type": "cli"}
-      ]
+      "credentials": [{"id": "cli", "label": "Claude CLI (OAuth)", "type": "cli"}]
     },
     "google": {
-      "catalog": [
-        {"id": "gemini-2.5-pro", "label": "Gemini 2.5 Pro", "context_k": 1000},
-        {"id": "gemini-2.5-flash", "label": "Gemini 2.5 Flash", "context_k": 1000},
-        {"id": "gemini-2.0-flash", "label": "Gemini 2.0 Flash", "context_k": 1000},
-        {"id": "gemini-1.5-pro", "label": "Gemini 1.5 Pro", "context_k": 2000}
-      ],
-      "accounts": [
-        {"id": "osit", "label": "One Sky IT (scott.idem@oneskyit.com)", "api_key": "AIza..."}
-      ]
+      "accounts": [{"id": "a1b2", "label": "One Sky IT", "api_key": "AIza..."}]
     }
   },
   "hosts": [
-    {
-      "id": "h1",
-      "label": "Gaming Laptop",
-      "api_url": "http://192.168.x.x:3000",
-      "api_key": "",
-      "host_type": "openwebui"
-    }
+    {"id": "h1", "label": "Gaming Laptop", "api_url": "http://...", "api_key": "", "host_type": "openwebui"}
   ],
   "models": [
-    {
-      "id": "m1",
-      "label": "Sonnet 4.6 (CLI)",
-      "type": "claude_cli",
-      "provider": "anthropic",
-      "model_name": "claude-sonnet-4-6",
-      "credential_id": "cli",
-      "context_k": 200,
-      "tags": ["chat", "persona"]
-    },
-    {
-      "id": "m2",
-      "label": "Gemini 2.5 Flash (OSIT)",
-      "type": "gemini_api",
-      "provider": "google",
-      "model_name": "gemini-2.5-flash",
-      "account_id": "osit",
-      "context_k": 1000,
-      "tags": ["orchestrator", "research"]
-    },
-    {
-      "id": "m3",
-      "label": "Gemma 4 E4B",
-      "type": "local_openai",
-      "provider": "local",
-      "host_id": "h1",
-      "model_name": "gemma4:e4b",
-      "context_k": 72,
-      "tags": ["fast", "local"]
-    }
+    {"id": "m1", "type": "claude_cli", "label": "Sonnet 4.6 (CLI)", "model_name": "claude-sonnet-4-6", "provider": "anthropic", "credential_id": "cli", "context_k": 1000, "tags": []},
+    {"id": "m2", "type": "gemini_api", "label": "Gemini 2.5 Flash", "model_name": "gemini-2.5-flash", "provider": "google", "account_id": "a1b2", "context_k": 1000, "tags": []},
+    {"id": "m3", "type": "local_openai", "label": "Gemma 4 E4B", "model_name": "gemma4:e4b", "provider": "local", "host_id": "h1", "context_k": 72, "tags": []}
   ],
   "roles": {
     "chat": {"primary": "m1", "backup_1": "m2", "backup_2": "m3"},
     "orchestrator":{"primary": "m2", "backup_1": "m3"},
@@ -143,168 +100,100 @@ Stored in `home/{user}/model_registry.json`.
 }
 ```
-### Key differences from V1
-| V1 | V2 |
-|---|---|
-| Built-ins (`claude_cli`, `gemini_api`) are hardcoded constants | All models are registry entries — built-ins become auto-populated defaults |
-| Single Gemini API key in `auth.json` | `providers.google.accounts[]` — list of accounts |
-| Role assignments only work with local models in UI | All models in all roles |
-| Host list only for local | Host list stays for local; `providers` section for cloud |
-| `type` field existed but only `local_openai` was user-configurable | `type` fully determines dispatch for all models |
-### Built-in model IDs
-Always resolvable without a registry entry (used as `.env` role defaults):
-`claude_cli`, `gemini_cli`, `gemini_api`
+### Model types and dispatch
+| `type` | Dispatches via | Notes |
+|---|---|---|
+| `claude_cli` | Claude CLI subprocess | `~/.claude/.credentials.json` OAuth |
+| `gemini_cli` | Gemini CLI subprocess | |
+| `gemini_api` | Currently: Gemini CLI (gap — see Phase 4) | Should use google-genai SDK |
+| `local_openai` | HTTP to OpenAI-compatible endpoint | host_type controls path |
 ---
-## Resolution Logic (updated)
-`get_model_for_role(username, role)` stays the same interface. Internally:
-1. Walk `roles[role].primary → backup_1 → backup_2 → backup_3 → backup_4`
-2. For each slot: resolve the model entry → merge in credentials
-3. If no registry entry for a role: fall back to `.env` defaults, then hardcoded
-`_resolve_model(registry, model_id)` gains new merge cases:
-- `type == "claude_cli"` → merge in credential from `providers.anthropic.credentials`
-- `type == "gemini_api"` → merge in `api_key` from `providers.google.accounts[account_id]`
-- `type == "local_openai"` → merge host fields (existing logic, unchanged)
-### Backend toggle → dispatch
-```
-UI sends: slot = "primary" | "backup_1" | "backup_2" | null (auto)
-```
-`llm_client.complete()` resolves the slot against the `chat` role, gets a full model config, dispatches by `type`. No more `"claude"/"gemini"/"local"` string matching.
+## Resolution Logic
+`get_model_for_role(username, role)` — walks `primary → backup_1 → backup_2 → backup_3 → backup_4`, returns first resolved model config with credentials merged in. Falls back to `.env` defaults, then hardcoded last-resort.
+`get_model_for_slot(username, role, slot)` — resolves *only* the named slot, no fallback chain. Used by Phase 3 explicit slot selection.
 ---
-## Routing Code Changes
-### `llm_client.complete()`
-- **Remove:** `model: str | None` → service type string
-- **Add:** `slot: str | None = None` → role slot override ("primary"/"backup_1"/etc.)
-- Dispatch table: `type` → handler
-  - `claude_cli` → `_claude()` (unchanged)
-  - `claude_api` → `_claude_api()` (new, direct Anthropic API — future phase)
-  - `gemini_cli` → `_gemini()` (unchanged)
-  - `gemini_api` → `_gemini_api()` (new, replaces current hardcoded gemini_api built-in)
-  - `local_openai` → `_local()` (unchanged)
-### `orchestrator_engine.py` / `openai_orchestrator.py`
-- Get orchestrator model via `get_model_for_role(username, "orchestrator")`
-- Already works — `openai_orchestrator.py` runs when type is `local_openai`
-- `orchestrator_engine.py` (Gemini) runs when type is `gemini_api`
-### Chat router (`routers/chat.py`)
-- Accept `slot` instead of `model` from UI
-- Pass to `llm_client.complete(slot=slot)`
+## Routing Code
+### `llm_client.complete()` (Phase 3 update)
+```
+slot: str | None → resolve specific slot, no fallback (explicit selection)
+model: str | None → legacy backend strings, kept for backward compat
+(neither) → auto: role-based routing with full fallback chain
+```
+Dispatch table (`type` → backend function):
+- `claude_cli` → `_claude()`
+- `gemini_cli` → `_gemini()`
+- `gemini_api` → `_gemini()` — **gap: should be `_gemini_api()` (Phase 4)**
+- `local_openai` → `_local()`
+### `routers/chat.py` (Phase 3 update)
+- `ChatRequest` gets `slot: str | None = None`
+- `GET /backend` returns `chat_models: [{slot, label, type}]` for the UI toggle
+- `_stream_chat` resolves model label from slot when `req.slot` is set
+### `app.js` (Phase 3 update)
+- Loads `chat_models` from `GET /backend` on page init
+- Toggle cycles through `chat_models` by label, sends `slot` in chat payload
+- Agent mode placeholder: remove "Gemini tool loop" hardcode → "orchestrator"
 ---
-## Settings UI Redesign
-### New page structure
-```
-/settings/models ← unified model registry (replaces /settings/local)
-Section 1: Cloud Providers
-Anthropic
-- credential: Claude CLI (OAuth) [default, always there]
-- + Add API Key (future)
-- model catalog [editable list of available Claude models]
-Google
-- accounts: [osit key ●●●●, + Add account]
-- model catalog [editable list of available Gemini models]
-Section 2: Local Hosts
-[existing host cards, unchanged]
-Section 3: Models
-[unified list — all registered model entries across all providers]
-+ Add Model (provider picker first, then model + credential/account dropdowns)
-/settings/roles ← standalone page (or promoted to /settings/models bottom)
-Role Assignments
-chat: [primary ▾] [backup 1 ▾] [backup 2 ▾]
-orchestrator: [primary ▾] [backup 1 ▾]
-distill: [primary ▾]
-(all dropdowns show all models from all providers)
-```
-### Backend toggle in chat UI
-Replace the `claude → gemini → local → auto` cycle with:
-```
-[Model label] ▾ (clickable — cycles through chat role slots)
-```
-- Shows the label of the currently active chat model
-- Click cycles: Primary → Backup 1 → Backup 2 → Primary
-- Slots with no model assigned are skipped
-- Color: same purple/amber/slate theme, based on provider type (optional)
----
-## Migration
-V1 → V2 is handled in `_load()`:
-1. Detect `version == 1` (or missing)
-2. Synthesize `providers.anthropic` catalog from hardcoded defaults
-3. Synthesize `providers.google` — migrate API key from `auth.json` as first account
-4. Convert built-in role assignments (`claude_cli` / `gemini_api`) to new model entry IDs
-5. Existing `hosts[]` and `local_openai` models carry over unchanged
-6. Write `version: 2` and save
-No data loss. Old `local_llm.json` migration path still works (V0 → V1 → V2).
+## Known Gaps (not yet implemented)
+### Gap A — `gemini_api` dispatch in `llm_client` (Phase 4)
+`_TYPE_TO_BACKEND` maps `gemini_api → "gemini"` (CLI subprocess). If a user assigns a
+`gemini_api` type model to the `chat` role, it silently routes to the Gemini CLI instead
+of the Google genai SDK. Fix: add `_gemini_api()` in `llm_client.py` that calls the SDK
+directly, matching how `orchestrator_engine.py` does it. Needs API key from resolved config.
+### Gap B — Agent mode placeholder (Phase 3, quick fix)
+`app.js` lines 257–258 hard-code `"Gemini tool loop"`. Should say `"orchestrator"` since
+the orchestrator role can now be a local model.
 ---
 ## Phases
-### Phase 1 — Data model + backend routing (no UI changes yet)
-- Extend schema to V2 in `model_registry.py`
-- Migration from V1 on first load
-- Update `_resolve_model()` to handle `gemini_api` + account lookup
-- Update `llm_client.complete()` to accept `slot` parameter
-- Update `routers/chat.py` to pass `slot` instead of backend string
-- Keep backend toggle UI working (map old strings to slots temporarily)
-- **Deliverable:** routing works with multi-account Gemini, no UI changes needed yet
+### Phase 1 — Data model + routing ✅ 2026-04-27
+- V2 schema with `providers` section
+- Auto migration V1→V2 (pulls gemini_api_key from auth.json → Google accounts)
+- `_resolve_model()` merges account API key for `gemini_api` type
+- `get_google_api_key()`, `save_cloud_model()`, `save/remove_google_account()`
+- Orchestrator router uses model-resolved API key
-### Phase 2 — Cloud provider UI
-- Add Anthropic and Google sections to `/settings/local` (rename to `/settings/models`)
-- Google accounts: add/remove API keys with labels
-- Editable model catalog for Anthropic + Google (add/remove model IDs from the list)
-- Model entry creation: provider picker → model dropdown (from catalog) → account/credential picker
-- **Deliverable:** can register cloud models in the UI just like local models
+### Phase 2 — Cloud provider UI ✅ 2026-04-27
+- `/settings/models` (canonical, `/settings/local` redirects)
+- Cloud Providers section: Anthropic info + Google account add/remove
+- Add Model form with provider tabs (Local / Google / Anthropic)
+- Provider badges on model rows (Anthropic / Google / Local)
+- Settings page updated: Gemini Key section replaced by Model Registry card
-### Phase 3 — Unified role assignments + toggle redesign
-- Promote role assignments to standalone `/settings/roles` page (or `/settings/models` bottom)
-- All models from all providers appear in role selects
-- Chat UI toggle: replace service-type cycle with slot cycle, show model label
-- **Deliverable:** end-to-end unified experience
+### Phase 3 — Toggle redesign + routing cleanup 🔄 in progress
+- `model_registry.get_model_for_slot()` — resolve a specific slot without fallback chain
+- `llm_client.complete()` — add `slot` parameter
+- `routers/chat.py` — `ChatRequest.slot`, extend `GET /backend`, slot label in response tag
+- `app.js` — data-driven toggle cycling model labels; send `slot` not backend string
+- Fix Gap B: agent mode placeholder
 ### Phase 4 — Polish + future providers
-- Claude direct API key support (optional, CLI is fine for now)
-- OpenRouter as a named provider (already works as a "local" host with host_type=openai — could be promoted)
-- Model catalog sync: fetch available models from Anthropic/Google API if keys are present
+- Fix Gap A: `gemini_api` dispatch in `llm_client` → direct Google genai SDK for chat
+- Claude direct API key support (alternative to CLI OAuth)
+- OpenRouter as a named provider (already works as local host; could be promoted)
 - Per-role "test" button in role assignments UI
+- Per-user catalog additions (extend ANTHROPIC_CATALOG / GOOGLE_CATALOG from UI)
----
-## Open Questions
-1. **Claude direct API key:** Is this needed now, or is CLI OAuth sufficient for all users?
-   - Decision: CLI-only for Phase 1; add API key support in Phase 4 if needed
-2. **Catalog management:** Should the Anthropic/Google catalogs be server-wide defaults
-   that users can extend, or fully per-user?
-   - Recommendation: ship sensible defaults in code (updated with each deploy);
-     users can add custom entries if needed
-3. **Toggle UX:** Cycle through slot labels ("Primary / Backup 1 / Backup 2") or cycle
-   through model labels ("Sonnet 4.6 / Gemini 2.5 Flash / Gemma 4")?
-   - Model labels are more useful — clearer what you're switching to
-4. **Orchestrator mode toggle:** Does agent mode also respect the slot toggle, or is it
-   always "use orchestrator role"?
-   - Keep orchestrator role separate; the UI toggle only affects `chat` role
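Gap A comes down to one row of the type-to-handler mapping. A sketch of a dispatch table keyed by `type`, assuming synchronous stub handlers that only return a tag (the real handlers in `llm_client.py` are async and take messages and model configs); `_gemini_api` is the hypothetical Phase 4 handler, and `CURRENT`, `PLANNED` and `dispatch` are illustrative names. Unlike `_TYPE_TO_BACKEND.get(..., "claude")` in the diff, this sketch raises on an unknown `type` instead of defaulting to Claude.

```python
from __future__ import annotations

from typing import Callable


# Hypothetical synchronous stubs standing in for the async handlers in
# llm_client.py; each returns a tag naming the path it would take.
def _claude(cfg: dict) -> str:
    return "claude-cli"


def _gemini(cfg: dict) -> str:
    return "gemini-cli"


def _local(cfg: dict) -> str:
    return "local-http"


def _gemini_api(cfg: dict) -> str:
    # Phase 4 (hypothetical): would call the Google genai SDK with the
    # API key merged into the resolved config.
    return "genai-sdk"


Handler = Callable[[dict], str]

CURRENT: dict[str, Handler] = {
    "claude_cli": _claude,
    "gemini_cli": _gemini,
    "gemini_api": _gemini,  # Gap A: lands on the Gemini CLI
    "local_openai": _local,
}
PLANNED: dict[str, Handler] = {**CURRENT, "gemini_api": _gemini_api}


def dispatch(table: dict[str, Handler], cfg: dict) -> str:
    """Route a resolved model config to its handler by `type`."""
    handler = table.get(cfg.get("type", ""))
    if handler is None:
        raise ValueError(f"unknown model type: {cfg.get('type')!r}")
    return handler(cfg)
```

Raising on unknown types is a deliberate choice in the sketch: a silent default is the same failure mode Gap A describes, only for a different type.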