# Model Registry V2 — Design Document

> Status: Phase 3 in progress
>
> Goal: Unified, provider-agnostic model management with clean role-based routing

---

## Problem Statement

The original system treated two classes of models differently:

| Type | How configured | How selected |
|---|---|---|
| Claude, Gemini | Hardcoded built-ins (`claude_cli`, `gemini_api`) | Backend toggle string ("claude"/"gemini") |
| Local (Ollama, Open WebUI) | Configured via `/settings/local` | Backend toggle string "local" |

This breaks down when you want multiple Gemini API keys, OpenRouter alongside local models,
role assignments spanning all provider types, or a toggle that shows which model is active
instead of which service.

---

## Architecture

### Core concept: Providers + Credentials + Models + Roles

```
Providers (built-in, fixed set)
  └─ Anthropic       ← catalog of Claude model IDs (code constants)
  └─ Google          ← catalog of Gemini model IDs (code constants)
  └─ Local Host      ← OpenAI-compatible endpoint (user adds these)

Credentials (user-configured, stored in model_registry.json)
  └─ Anthropic       ← Claude CLI (OAuth, default) — API key support in Phase 4
  └─ Google          ← one or more API keys (one per Google account)
  └─ Local Host      ← api_key stored on the host record

Model Entries (user-registered)
  └─ Provider + model ID + credential = one usable model entry

Role Assignments (unified — any model entry can fill any role)
  └─ chat:         primary → backup_1 → backup_2
  └─ orchestrator: primary → backup_1
  └─ distill:      primary
  └─ (etc.)
```

### Catalog design decision

Catalogs (`ANTHROPIC_CATALOG`, `GOOGLE_CATALOG`) are **Python constants** in
`model_registry.py`, not stored in the per-user JSON. They are updated with each code deploy.
Per-user catalog customisation is deferred to Phase 4.
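
A minimal sketch of what such constants could look like. The entry shape (`id`, `label`) and the helper `is_catalog_model` are assumptions for illustration, not the actual contents of `model_registry.py`; the two model IDs are the ones used in the schema example in the Data Model section.

```python
# Hypothetical shape for the code-level catalogs. The real entries in
# model_registry.py may carry more fields or more models.
ANTHROPIC_CATALOG = [
    {"id": "claude-sonnet-4-6", "label": "Sonnet 4.6"},
]
GOOGLE_CATALOG = [
    {"id": "gemini-2.5-flash", "label": "Gemini 2.5 Flash"},
]

# Local hosts have no catalog: their model names are entered by the user.
CATALOGS = {"anthropic": ANTHROPIC_CATALOG, "google": GOOGLE_CATALOG}


def is_catalog_model(provider: str, model_id: str) -> bool:
    """Return True if model_id is listed in the given provider's catalog."""
    return any(entry["id"] == model_id for entry in CATALOGS.get(provider, []))
```

Because the catalogs live in code, every user sees the same list after a deploy, and nothing in `model_registry.json` needs migrating when a model ID is added.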

### Backend toggle redesign (Phase 3)

**Before:** cycles service type strings — `auto → claude → gemini → local`

**After:** cycles through the chat role's configured models by label:
```
Sonnet 4.6 (CLI) → Gemini 2.5 Flash → Gemma 4 E4B → (wraps)
```
- Shows the resolved model label on the toggle button
- If no chat role models are configured: shows "auto", uses existing role routing
- Clicking skips empty slots automatically
- Color: `claude_cli` = default, `gemini_*` = blue, `local_openai` = amber

The UI sends `slot: "primary" | "backup_1" | "backup_2"` rather than a backend type string.
`llm_client.complete()` resolves that slot from the chat role and dispatches by `type`.
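
The cycling rule described above (advance to the next configured slot, skip empty ones, wrap at the end, fall back to "auto" when nothing is configured) can be sketched as follows. The real toggle lives in `app.js`; this Python version only illustrates the logic, and `next_slot` and `SLOT_ORDER` are hypothetical names.

```python
from typing import Optional

# Slots the UI is allowed to send, in cycling order.
SLOT_ORDER = ["primary", "backup_1", "backup_2"]


def next_slot(chat_role: dict, current: Optional[str]) -> Optional[str]:
    """Return the slot the toggle should move to after `current`.

    `chat_role` is the "chat" entry of the registry's "roles" mapping.
    Returns None when no chat slot is configured, which the UI shows as "auto".
    """
    configured = [slot for slot in SLOT_ORDER if chat_role.get(slot)]
    if not configured:
        return None
    if current not in configured:
        # Nothing selected yet (or the selection was removed): start at the top.
        return configured[0]
    index = configured.index(current)
    return configured[(index + 1) % len(configured)]
```

With `{"primary": "m1", "backup_2": "m3"}`, one click from `primary` lands on `backup_2` (the empty `backup_1` is skipped) and the next click wraps back to `primary`.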

---

## Data Model (V2 Schema)

Stored in `home/{user}/model_registry.json`.

```json
{
  "version": 2,
  "providers": {
    "anthropic": {
      "credentials": [{"id": "cli", "label": "Claude CLI (OAuth)", "type": "cli"}]
    },
    "google": {
      "accounts": [{"id": "a1b2", "label": "One Sky IT", "api_key": "AIza..."}]
    }
  },
  "hosts": [
    {"id": "h1", "label": "Gaming Laptop", "api_url": "http://...", "api_key": "", "host_type": "openwebui"}
  ],
  "models": [
    {"id": "m1", "type": "claude_cli",   "label": "Sonnet 4.6 (CLI)",     "model_name": "claude-sonnet-4-6",  "provider": "anthropic", "credential_id": "cli",  "context_k": 1000, "tags": []},
    {"id": "m2", "type": "gemini_api",   "label": "Gemini 2.5 Flash",     "model_name": "gemini-2.5-flash",   "provider": "google",    "account_id": "a1b2",    "context_k": 1000, "tags": []},
    {"id": "m3", "type": "local_openai", "label": "Gemma 4 E4B",          "model_name": "gemma4:e4b",         "provider": "local",     "host_id": "h1",         "context_k": 72,   "tags": []},
    {"id": "m4", "type": "local_openai", "label": "DeepSeek: V4 Flash",   "model_name": "deepseek/deepseek-v4-flash", "provider": "local", "host_id": "h1", "context_k": 750, "reasoning_budget_tokens": 4096, "tags": ["frontier"]}
  ],
  "roles": {
    "chat":        {"primary": "m1", "backup_1": "m2", "backup_2": "m3"},
    "orchestrator":{"primary": "m2", "backup_1": "m3"},
    "distill":     {"primary": "m1"}
  }
}
```
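
To illustrate how a model entry combines with its credential source, here is a minimal sketch of resolving one entry against this schema. The function name `resolve_model` and the choice to return `None` when a referenced account or host is missing are assumptions; the actual `_resolve_model()` may behave differently.

```python
from typing import Optional


def resolve_model(registry: dict, model_id: str) -> Optional[dict]:
    """Return a copy of the model entry with its credentials merged in."""
    entry = next((m for m in registry.get("models", []) if m["id"] == model_id), None)
    if entry is None:
        return None
    resolved = dict(entry)  # copy so the stored entry never gains secrets

    if entry["type"] == "gemini_api":
        accounts = registry.get("providers", {}).get("google", {}).get("accounts", [])
        account = next((a for a in accounts if a["id"] == entry.get("account_id")), None)
        if account is None:
            return None  # assumption: a dangling account_id counts as unresolved
        resolved["api_key"] = account["api_key"]
    elif entry["type"] == "local_openai":
        hosts = registry.get("hosts", [])
        host = next((h for h in hosts if h["id"] == entry.get("host_id")), None)
        if host is None:
            return None  # assumption: a dangling host_id counts as unresolved
        resolved["api_url"] = host["api_url"]
        resolved["api_key"] = host["api_key"]
        resolved["host_type"] = host["host_type"]
    # claude_cli needs nothing merged: the CLI reads its own OAuth credentials.
    return resolved
```

Keeping keys on the account or host record, and merging them only at resolution time, means rotating a key is a single edit no matter how many model entries share it.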

### Model types and dispatch

| `type` | Dispatches via | Notes |
|---|---|---|
| `claude_cli` | Claude CLI subprocess | `~/.claude/.credentials.json` OAuth |
| `gemini_cli` | Gemini CLI subprocess | |
| `gemini_api` | Currently: Gemini CLI (gap — see Phase 4) | Should use google-genai SDK |
| `local_openai` | HTTP to OpenAI-compatible endpoint | host_type controls path |

### Optional model fields

| Field | Type | Default | Meaning |
|---|---|---|---|
| `context_k` | int | 32 | Context window in thousands of tokens. Used for compaction budget (75% of window). |
| `max_rounds` | int \| null | null | Per-model tool loop cap. `null` = use the global `orchestrator_max_rounds`. When set, the effective limit is `min(per_model, global)`. |
| `tools` | bool | true | Whether this model supports tool calling. `false` = skip tool loop entirely; model gets a plain chat request. |
| `reasoning_budget_tokens` | int \| null | null | Per-model reasoning/thinking budget for models that support it (e.g., DeepSeek V4 via OpenRouter). `null` = no reasoning override. When set, injected as `{"reasoning": {"budget_tokens": <value>}}` in the API call to OpenRouter-compatible endpoints. |
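
The defaults and formulas in this table can be sketched as three small helpers. The function names are hypothetical, and treating "thousands of tokens" as a multiplier of 1000 (rather than 1024) is an assumption.

```python
def compaction_budget_tokens(model: dict) -> int:
    """75% of the context window, with context_k defaulting to 32."""
    context_k = model.get("context_k", 32)
    return context_k * 1000 * 3 // 4  # assumption: 1k = 1000 tokens


def effective_max_rounds(model: dict, global_max_rounds: int) -> int:
    """Per-model cap if set, never exceeding the global cap."""
    per_model = model.get("max_rounds")
    if per_model is None:
        return global_max_rounds
    return min(per_model, global_max_rounds)


def reasoning_payload(model: dict) -> dict:
    """Extra request fields for the reasoning budget; empty when unset."""
    budget = model.get("reasoning_budget_tokens")
    if budget is None:
        return {}
    return {"reasoning": {"budget_tokens": budget}}
```

For the `m4` entry in the schema example (`context_k: 750`, `reasoning_budget_tokens: 4096`), these helpers would give a compaction budget of 562,500 tokens and a `reasoning` object carrying `budget_tokens: 4096`.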

### Built-in model IDs

Always resolvable without a registry entry (used as `.env` role defaults):
`claude_cli`, `gemini_cli`, `gemini_api`

---

## Resolution Logic

`get_model_for_role(username, role)` walks `primary → backup_1 → backup_2 → backup_3 → backup_4` and returns the first model config that resolves, with credentials merged in. If no slot resolves, it falls back to the `.env` defaults, then to a hardcoded last resort.

`get_model_for_slot(username, role, slot)` resolves *only* the named slot, with no fallback chain. It is used by the Phase 3 explicit slot selection.
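
The difference between the two lookups can be sketched as below. This version takes an already-loaded registry dict instead of a username, skips credential merging, and collapses the `.env` and hardcoded fallbacks into a single `env_default` argument; all of those simplifications, and the function names, are assumptions made to keep the example self-contained.

```python
from typing import Optional

SLOT_CHAIN = ["primary", "backup_1", "backup_2", "backup_3", "backup_4"]


def _lookup(registry: dict, model_id: Optional[str]) -> Optional[dict]:
    """Find a model entry by ID; None for empty or dangling IDs."""
    if not model_id:
        return None
    return next((m for m in registry.get("models", []) if m["id"] == model_id), None)


def model_for_role(registry: dict, role: str, env_default: Optional[dict] = None) -> Optional[dict]:
    """Walk the slot chain and return the first entry that resolves."""
    slots = registry.get("roles", {}).get(role, {})
    for slot in SLOT_CHAIN:
        model = _lookup(registry, slots.get(slot))
        if model is not None:
            return model
    return env_default  # stands in for the .env / hardcoded fallbacks


def model_for_slot(registry: dict, role: str, slot: str) -> Optional[dict]:
    """Resolve exactly one slot; never falls through to another slot."""
    slots = registry.get("roles", {}).get(role, {})
    return _lookup(registry, slots.get(slot))
```

If `primary` points at a deleted model, `model_for_role` quietly moves on to `backup_1`, while `model_for_slot(..., "primary")` returns `None` so the caller can surface the problem instead of silently answering with a different model.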

---

## Routing Code

### `llm_client.complete()` (Phase 3 update)

```
slot: str | None  → resolve specific slot, no fallback (explicit selection)
model: str | None → legacy backend strings, kept for backward compat
(neither)         → auto: role-based routing with full fallback chain
```
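
A minimal sketch of how the three cases might be told apart. The helper `routing_mode` is hypothetical, and giving `slot` precedence when both arguments are passed is an assumption; the cases above do not say what happens in that situation.

```python
from typing import Optional


def routing_mode(slot: Optional[str] = None, model: Optional[str] = None) -> str:
    """Classify a complete() call into one of the three routing modes."""
    if slot is not None:
        return "slot"    # explicit slot, no fallback (assumed to win over model)
    if model is not None:
        return "legacy"  # old backend strings such as "claude" or "gemini"
    return "auto"        # role-based routing with the full fallback chain
```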

Dispatch table (`type` → backend function):
- `claude_cli`   → `_claude()`
- `gemini_cli`   → `_gemini()`
- `gemini_api`   → `_gemini()` ← **gap: should be `_gemini_api()` (Phase 4)**
- `local_openai` → `_local()`
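
The dispatch table above can be sketched as a plain dictionary lookup. The backend functions here are stubs that return a string, and their `(prompt, config)` signature is an assumption; only the `type` to function mapping comes from this document, including the current `gemini_api` gap.

```python
from typing import Callable, Dict


# Stub backends: the real ones run a CLI subprocess or an HTTP request.
def _claude(prompt: str, config: dict) -> str:
    return f"claude backend: {prompt}"


def _gemini(prompt: str, config: dict) -> str:
    return f"gemini backend: {prompt}"


def _local(prompt: str, config: dict) -> str:
    return f"local backend: {prompt}"


DISPATCH: Dict[str, Callable[[str, dict], str]] = {
    "claude_cli": _claude,
    "gemini_cli": _gemini,
    "gemini_api": _gemini,  # current gap: routed to the CLI backend
    "local_openai": _local,
}


def dispatch(config: dict, prompt: str) -> str:
    """Call the backend registered for the resolved model's type."""
    backend = DISPATCH.get(config["type"])
    if backend is None:
        raise ValueError(f"unknown model type: {config['type']!r}")
    return backend(prompt, config)
```

With a table like this, closing the gap would be a one-line change: point `gemini_api` at a new `_gemini_api` function once it exists.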

### `routers/chat.py` (Phase 3 update)

- `ChatRequest` gets `slot: str | None = None`
- `GET /backend` returns `chat_models: [{slot, label, type}]` for the UI toggle
- `_stream_chat` resolves model label from slot when `req.slot` is set
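
As an illustration of the `chat_models` list returned by `GET /backend`, the sketch below builds it from a loaded registry. The function name `chat_models_for_toggle` is hypothetical, and omitting empty or dangling slots from the list is an assumption about how the endpoint behaves.

```python
from typing import Dict, List

TOGGLE_SLOTS = ("primary", "backup_1", "backup_2")


def chat_models_for_toggle(registry: dict) -> List[Dict[str, str]]:
    """Build the [{slot, label, type}] list the UI toggle cycles through."""
    models = {m["id"]: m for m in registry.get("models", [])}
    chat_role = registry.get("roles", {}).get("chat", {})
    result = []
    for slot in TOGGLE_SLOTS:
        model = models.get(chat_role.get(slot))
        if model is None:
            continue  # empty or dangling slot: nothing to show on the toggle
        result.append({"slot": slot, "label": model["label"], "type": model["type"]})
    return result
```

An empty list corresponds to the "auto" state of the toggle, where no `slot` is sent and role routing with the full fallback chain applies.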

### `app.js` (Phase 3 update)

- Loads `chat_models` from `GET /backend` on page init
- Toggle cycles through `chat_models` by label, sends `slot` in chat payload
- Agent mode placeholder: replace the hard-coded "Gemini tool loop" text with "orchestrator"

---

## Known Gaps (not yet implemented)

### Gap A — `gemini_api` dispatch in `llm_client` (Phase 4)
`_TYPE_TO_BACKEND` maps `gemini_api → "gemini"` (CLI subprocess). If a user assigns a
`gemini_api` type model to the `chat` role, it silently routes to the Gemini CLI instead
of the Google genai SDK. Fix: add `_gemini_api()` in `llm_client.py` that calls the SDK
directly, matching how `orchestrator_engine.py` does it. The new function needs the API
key from the resolved config.

### Gap B — Agent mode placeholder (Phase 3, quick fix)
`app.js` lines 257–258 hard-code `"Gemini tool loop"`. The placeholder should say
`"orchestrator"`, since the orchestrator role can now be filled by a local model.

---

## Phases

### Phase 1 — Data model + routing ✅ 2026-04-27
- V2 schema with `providers` section
- Automatic V1→V2 migration (pulls `gemini_api_key` from `auth.json` into Google accounts)
- `_resolve_model()` merges account API key for `gemini_api` type
- `get_google_api_key()`, `save_cloud_model()`, `save_google_account()`, `remove_google_account()`
- Orchestrator router uses model-resolved API key

### Phase 2 — Cloud provider UI ✅ 2026-04-27
- `/settings/models` (canonical, `/settings/local` redirects)
- Cloud Providers section: Anthropic info + Google account add/remove
- Add Model form with provider tabs (Local / Google / Anthropic)
- Provider badges on model rows (Anthropic / Google / Local)
- Settings page updated: Gemini Key section replaced by Model Registry card

### Phase 3 — Toggle redesign + routing cleanup 🔄 in progress
- `model_registry.get_model_for_slot()` — resolve a specific slot without fallback chain
- `llm_client.complete()` — add `slot` parameter
- `routers/chat.py` — `ChatRequest.slot`, extend `GET /backend`, slot label in response tag
- `app.js` — data-driven toggle cycling model labels; send `slot` not backend string
- Fix Gap B: agent mode placeholder

### Phase 4 — Polish + future providers
- Fix Gap A: `gemini_api` dispatch in `llm_client` → direct Google genai SDK for chat
- Claude direct API key support (alternative to CLI OAuth)
- OpenRouter as a named provider (already works as local host; could be promoted)
- Per-role "test" button in role assignments UI
- Per-user catalog additions (extend ANTHROPIC_CATALOG / GOOGLE_CATALOG from UI)