Cortex-Inara/documentation/DESIGN__Model_Registry_V2.md
Scott Idem 45c95d20ba feat: model registry V2 — provider-aware schema with multi-account support
Adds a providers section to the per-user model registry for Anthropic and
Google as first-class providers alongside local hosts. Google accounts
(API keys) are now stored as a list so multiple Google accounts can coexist.

Changes:
- model_registry.py: V2 schema, auto migration V1→V2 (pulls gemini_api_key
  from auth.json into providers.google.accounts), _resolve_model() merges
  account API key for gemini_api type models
- routers/orchestrator.py: uses model-resolved api_key when orchestrator
  role resolves to a gemini_api model with account_id
- ANTHROPIC_CATALOG and GOOGLE_CATALOG constants for model picker (Phase 2)
- New functions: get_google_api_key(), save/remove_google_account(), get_catalog()
- Documentation: ARCH__BACKENDS.md updated to V2 schema, DESIGN doc added

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 20:21:04 -04:00


Model Registry V2 — Design Document

Status: Planning / Pre-implementation
Goal: Unified, provider-agnostic model management with clean role-based routing


Problem Statement

The current system has two classes of models with different treatment:

| Type | How configured | How selected |
|---|---|---|
| Claude, Gemini | Hardcoded built-ins (claude_cli, gemini_api) | Backend toggle string ("claude"/"gemini") |
| Local (Ollama, Open WebUI) | Configured via /settings/local | Backend toggle string "local" |

This breaks down when you want:

  • Multiple Gemini API keys (e.g. one per Google account)
  • Claude via direct API key instead of OAuth CLI
  • OpenRouter or other hosted providers alongside local models
  • Role assignments to span all provider types uniformly
  • A chat toggle that shows "which model" not "which service"

Proposed Architecture

Core concept: Providers + Credentials + Models + Roles

Providers (built-in, fixed set)
  └─ Anthropic       ← has a catalog of Claude model IDs
  └─ Google          ← has a catalog of Gemini model IDs
  └─ Local Host      ← OpenAI-compatible endpoint (user adds these)

Credentials (user-configured, per provider)
  └─ Anthropic       ← Claude CLI (OAuth, default) or API key
  └─ Google          ← one or more API keys (one per Google account)
  └─ Local Host      ← api_key stored on the host record (existing)

Model Entries (user-registered — "I want to use this model")
  └─ Provider + model ID + credential = one usable model entry
  └─ Same model ID with two different accounts = two model entries

Role Assignments (unified — any model entry can fill any role)
  └─ chat:        primary → backup_1 → backup_2
  └─ orchestrator: primary → backup_1
  └─ distill:     primary
  └─ (etc.)

Backend toggle redesign

Current: cycles service type strings (auto → claude → gemini → local)
New: cycles through the chat role's assigned models (Primary → Backup 1 → Backup 2)

The toggle displays the active model's label (e.g. "Sonnet 4.6" / "Gemini 2.5 Flash" / "Gemma 4 E4B"). Auto defaults to Primary.

This means the toggle is context-free — it just picks a slot — and all the "what model, what provider, what credentials" logic lives in the registry.


Data Model (V2 Schema)

Stored in home/{user}/model_registry.json.

{
  "version": 2,

  "providers": {
    "anthropic": {
      "catalog": [
        {"id": "claude-opus-4-7",    "label": "Claude Opus 4.7",    "context_k": 200},
        {"id": "claude-sonnet-4-6",  "label": "Claude Sonnet 4.6",  "context_k": 200},
        {"id": "claude-haiku-4-5",   "label": "Claude Haiku 4.5",   "context_k": 200}
      ],
      "credentials": [
        {"id": "cli", "label": "Claude CLI (OAuth)", "type": "cli"}
      ]
    },
    "google": {
      "catalog": [
        {"id": "gemini-2.5-pro",   "label": "Gemini 2.5 Pro",   "context_k": 1000},
        {"id": "gemini-2.5-flash", "label": "Gemini 2.5 Flash", "context_k": 1000},
        {"id": "gemini-2.0-flash", "label": "Gemini 2.0 Flash", "context_k": 1000},
        {"id": "gemini-1.5-pro",   "label": "Gemini 1.5 Pro",   "context_k": 2000}
      ],
      "accounts": [
        {"id": "osit", "label": "One Sky IT (scott.idem@oneskyit.com)", "api_key": "AIza..."}
      ]
    }
  },

  "hosts": [
    {
      "id": "h1",
      "label": "Gaming Laptop",
      "api_url": "http://192.168.x.x:3000",
      "api_key": "",
      "host_type": "openwebui"
    }
  ],

  "models": [
    {
      "id": "m1",
      "label": "Sonnet 4.6 (CLI)",
      "type": "claude_cli",
      "provider": "anthropic",
      "model_name": "claude-sonnet-4-6",
      "credential_id": "cli",
      "context_k": 200,
      "tags": ["chat", "persona"]
    },
    {
      "id": "m2",
      "label": "Gemini 2.5 Flash (OSIT)",
      "type": "gemini_api",
      "provider": "google",
      "model_name": "gemini-2.5-flash",
      "account_id": "osit",
      "context_k": 1000,
      "tags": ["orchestrator", "research"]
    },
    {
      "id": "m3",
      "label": "Gemma 4 E4B",
      "type": "local_openai",
      "provider": "local",
      "host_id": "h1",
      "model_name": "gemma4:e4b",
      "context_k": 72,
      "tags": ["fast", "local"]
    }
  ],

  "roles": {
    "chat":        {"primary": "m1", "backup_1": "m2", "backup_2": "m3"},
    "orchestrator":{"primary": "m2", "backup_1": "m3"},
    "distill":     {"primary": "m1"}
  }
}
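
Because models point at accounts, credentials, and hosts by ID, and roles point at models by ID, a hand-edited registry file can drift out of sync. The following is a minimal sketch of a reference check. It assumes only the field names shown in the schema above; `find_dangling_refs` is a hypothetical helper, not a function in model_registry.py.

```python
def find_dangling_refs(registry: dict) -> list:
    """Return a list of problems; an empty list means every ID reference resolves."""
    providers = registry.get("providers", {})
    # Valid target IDs for each reference field a model entry may carry.
    known = {
        "credential_id": {c["id"] for c in providers.get("anthropic", {}).get("credentials", [])},
        "account_id": {a["id"] for a in providers.get("google", {}).get("accounts", [])},
        "host_id": {h["id"] for h in registry.get("hosts", [])},
    }
    problems = []
    model_ids = set()
    for model in registry.get("models", []):
        model_ids.add(model["id"])
        for field, valid_ids in known.items():
            if field in model and model[field] not in valid_ids:
                problems.append(f"model {model['id']}: unknown {field} {model[field]!r}")
    # Every role slot must name a registered model entry.
    for role, slots in registry.get("roles", {}).items():
        for slot, model_id in slots.items():
            if model_id not in model_ids:
                problems.append(f"role {role}.{slot}: unknown model {model_id!r}")
    return problems
```

Running a check like this after loading home/{user}/model_registry.json would surface a role that still points at a deleted model before it reaches dispatch.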

Key differences from V1

| V1 | V2 |
|---|---|
| Built-ins (claude_cli, gemini_api) are hardcoded constants | All models are registry entries; built-ins become auto-populated defaults |
| Single Gemini API key in auth.json | providers.google.accounts[] (list of accounts) |
| Role assignments only work with local models in UI | All models in all roles |
| Host list only for local | Host list stays for local; providers section for cloud |
| type field existed but only local_openai was user-configurable | type fully determines dispatch for all models |

Resolution Logic (updated)

get_model_for_role(username, role) keeps the same interface. Internally:

  1. Walk roles[role].primary → backup_1 → backup_2 → backup_3 → backup_4
  2. For each slot: resolve the model entry → merge in credentials
  3. If no registry entry for a role: fall back to .env defaults, then hardcoded
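
The slot walk in the steps above can be sketched as follows. This is an illustration under assumptions, not the code in model_registry.py: it assumes the first slot that resolves wins, it takes the resolver as a `resolve` argument, and it collapses the ".env defaults, then hardcoded" step into a single `fallback` value. The name `first_usable_model` is hypothetical.

```python
from typing import Callable, Optional

SLOT_ORDER = ("primary", "backup_1", "backup_2", "backup_3", "backup_4")

def first_usable_model(
    registry: dict,
    role: str,
    resolve: Callable[[dict, str], Optional[dict]],
    fallback: Optional[dict] = None,
) -> Optional[dict]:
    """Return the first resolvable model config for a role, else the fallback."""
    slots = registry.get("roles", {}).get(role, {})
    for slot in SLOT_ORDER:
        model_id = slots.get(slot)
        if not model_id:
            continue  # empty slot: move on to the next backup
        config = resolve(registry, model_id)
        if config is not None:
            return config
    # No registry entry resolved for this role (assumed: .env or hardcoded default).
    return fallback
```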

_resolve_model(registry, model_id) gains new merge cases:

  • type == "claude_cli" → merge in credential from providers.anthropic.credentials
  • type == "gemini_api" → merge in api_key from providers.google.accounts[account_id]
  • type == "local_openai" → merge host fields (existing logic, unchanged)
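
A rough sketch of the three merge cases, assuming the field names from the V2 schema. The function name `resolve_model`, the `credential` key added for claude_cli, and the behaviour when a referenced ID is missing (the entry is returned without merged fields) are assumptions made for illustration.

```python
from typing import Optional

def _by_id(items: list, wanted: Optional[str]) -> Optional[dict]:
    """Find the first dict in items whose "id" equals wanted."""
    return next((item for item in items if item.get("id") == wanted), None)

def resolve_model(registry: dict, model_id: str) -> Optional[dict]:
    model = _by_id(registry.get("models", []), model_id)
    if model is None:
        return None
    resolved = dict(model)  # copy so the stored registry entry is not mutated
    providers = registry.get("providers", {})
    mtype = model.get("type")
    if mtype == "claude_cli":
        creds = providers.get("anthropic", {}).get("credentials", [])
        cred = _by_id(creds, model.get("credential_id"))
        if cred:
            resolved["credential"] = cred
    elif mtype == "gemini_api":
        accounts = providers.get("google", {}).get("accounts", [])
        account = _by_id(accounts, model.get("account_id"))
        if account:
            resolved["api_key"] = account["api_key"]
    elif mtype == "local_openai":
        host = _by_id(registry.get("hosts", []), model.get("host_id"))
        if host:
            resolved["api_url"] = host["api_url"]
            resolved["api_key"] = host.get("api_key", "")
    return resolved
```

Because the API key is merged at resolution time, two model entries that share a model_name but differ in account_id resolve to two different keys, which is what makes multi-account Gemini work.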

Backend toggle → dispatch

UI sends: slot = "primary" | "backup_1" | "backup_2" | null (auto)

llm_client.complete() resolves the slot against the chat role, gets a full model config, dispatches by type. No more "claude"/"gemini"/"local" string matching.


Routing Code Changes

llm_client.complete()

  • Remove: model: str | None (the service type string)
  • Add: slot: str | None = None (role slot override: "primary", "backup_1", etc.)
  • Dispatch table: type → handler
    • claude_cli → _claude() (unchanged)
    • claude_api → _claude_api() (new, direct Anthropic API; future phase)
    • gemini_cli → _gemini() (unchanged)
    • gemini_api → _gemini_api() (new, replaces current hardcoded gemini_api built-in)
    • local_openai → _local() (unchanged)
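
One way to express the type → handler table is a plain dictionary lookup. In this sketch the handler names are read from the list above, but their signatures and bodies are placeholders that only echo the request; the real handlers in llm_client would call the CLI or HTTP APIs. The `dispatch` function itself is hypothetical.

```python
from typing import Callable

# Placeholder handlers: each returns a tagged string instead of calling a model.
def _claude(config: dict, prompt: str) -> str:
    return f"[claude_cli:{config.get('model_name')}] {prompt}"

def _gemini(config: dict, prompt: str) -> str:
    return f"[gemini_cli:{config.get('model_name')}] {prompt}"

def _gemini_api(config: dict, prompt: str) -> str:
    return f"[gemini_api:{config.get('model_name')}] {prompt}"

def _local(config: dict, prompt: str) -> str:
    return f"[local_openai:{config.get('model_name')}] {prompt}"

# claude_api is left out here because the document defers it to a future phase.
HANDLERS: dict = {
    "claude_cli": _claude,
    "gemini_cli": _gemini,
    "gemini_api": _gemini_api,
    "local_openai": _local,
}

def dispatch(config: dict, prompt: str) -> str:
    """Route a resolved model config to its handler by the type field alone."""
    handler: Callable[[dict, str], str] = HANDLERS.get(config.get("type"))
    if handler is None:
        raise ValueError(f"unsupported model type: {config.get('type')!r}")
    return handler(config, prompt)
```

Adding a provider then means registering one more key in the table rather than extending a chain of string comparisons.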

orchestrator_engine.py / openai_orchestrator.py

  • Get orchestrator model via get_model_for_role(username, "orchestrator")
  • Already works — openai_orchestrator.py runs when type is local_openai
  • orchestrator_engine.py (Gemini) runs when type is gemini_api

Chat router (routers/chat.py)

  • Accept slot instead of model from UI
  • Pass to llm_client.complete(slot=slot)

Settings UI Redesign

New page structure

/settings/models     ← unified model registry (replaces /settings/local)
  Section 1: Cloud Providers
    Anthropic
      - credential: Claude CLI (OAuth) [default, always there]
      - + Add API Key (future)
      - model catalog [editable list of available Claude models]
    Google
      - accounts: [osit key ●●●●, + Add account]
      - model catalog [editable list of available Gemini models]
  Section 2: Local Hosts
    [existing host cards, unchanged]
  Section 3: Models  
    [unified list — all registered model entries across all providers]
    + Add Model (provider picker first, then model + credential/account dropdowns)

/settings/roles      ← standalone page (or promoted to /settings/models bottom)
  Role Assignments
    chat:         [primary ▾] [backup 1 ▾] [backup 2 ▾]
    orchestrator: [primary ▾] [backup 1 ▾]
    distill:      [primary ▾]
    (all dropdowns show all models from all providers)

Backend toggle in chat UI

Replace the claude → gemini → local → auto cycle with:

[Model label] ▾ (clickable; cycles through chat role slots)
  • Shows the label of the currently active chat model
  • Click cycles: Primary → Backup 1 → Backup 2 → Primary
  • Slots with no model assigned are skipped
  • Color: same purple/amber/slate theme, based on provider type (optional)
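
The cycle-and-skip rule can be sketched as below. It is written in Python for consistency with the backend examples, although the toggle itself would presumably live in client-side code. The name `next_slot` is hypothetical, and treating a null (auto) selection as Primary for the purpose of the next click is an assumption drawn from "Auto defaults to Primary".

```python
from typing import Optional

TOGGLE_SLOTS = ("primary", "backup_1", "backup_2")

def next_slot(chat_role: dict, current: Optional[str]) -> Optional[str]:
    """Return the slot a click should move to, skipping unassigned slots."""
    assigned = [slot for slot in TOGGLE_SLOTS if chat_role.get(slot)]
    if not assigned:
        return None  # nothing to cycle through
    effective = current or "primary"  # assumption: auto behaves like primary
    if effective not in assigned:
        return assigned[0]
    return assigned[(assigned.index(effective) + 1) % len(assigned)]
```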

Migration

V1 → V2 is handled in _load():

  1. Detect version == 1 (or missing)
  2. Synthesize providers.anthropic catalog from hardcoded defaults
  3. Synthesize providers.google — migrate API key from auth.json as first account
  4. Convert built-in role assignments (claude_cli / gemini_api) to new model entry IDs
  5. Existing hosts[] and local_openai models carry over unchanged
  6. Write version: 2 and save

No data loss. Old local_llm.json migration path still works (V0 → V1 → V2).
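
The six steps might look roughly like this. The V1 file layout is not shown in this document, so the sketch assumes V1 has hosts, models, and roles keys and that V1 roles refer to the built-ins by the literal IDs "claude_cli" and "gemini_api". The new entry IDs ("builtin_claude_cli", "builtin_gemini_api"), the account ID "default", and the function name are all hypothetical; saving to disk is left to the caller.

```python
import copy

def migrate_v1_to_v2(v1: dict, auth: dict, anthropic_catalog: list, google_catalog: list) -> dict:
    """Return a V2 registry built from a V1 registry without mutating the input."""
    if v1.get("version", 1) >= 2:  # step 1: only version 1 (or missing) is migrated
        return v1
    v2 = copy.deepcopy(v1)
    accounts = []
    key = auth.get("gemini_api_key")  # step 3: key moves from auth.json
    if key:
        accounts.append({"id": "default", "label": "Migrated from auth.json", "api_key": key})
    v2["providers"] = {
        "anthropic": {  # step 2
            "catalog": list(anthropic_catalog),
            "credentials": [{"id": "cli", "label": "Claude CLI (OAuth)", "type": "cli"}],
        },
        "google": {"catalog": list(google_catalog), "accounts": accounts},
    }
    builtins = {
        "claude_cli": {"id": "builtin_claude_cli", "type": "claude_cli",
                       "provider": "anthropic", "credential_id": "cli"},
        "gemini_api": {"id": "builtin_gemini_api", "type": "gemini_api",
                       "provider": "google"},
    }
    if accounts:
        builtins["gemini_api"]["account_id"] = "default"
    models = v2.setdefault("models", [])  # step 5: existing entries carry over
    v2.setdefault("hosts", [])
    existing = {m["id"] for m in models}
    for slots in v2.setdefault("roles", {}).values():  # step 4
        for slot, ref in list(slots.items()):
            entry = builtins.get(ref)
            if entry is None:
                continue  # already points at a user-registered model
            if entry["id"] not in existing:
                models.append(dict(entry))
                existing.add(entry["id"])
            slots[slot] = entry["id"]
    v2["version"] = 2  # step 6 (the caller writes the file)
    return v2
```

Working on a deep copy keeps the V1 data intact until the new file is written, which supports the "no data loss" goal if the migration fails partway.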


Phases

Phase 1 — Data model + backend routing (no UI changes yet)

  • Extend schema to V2 in model_registry.py
  • Migration from V1 on first load
  • Update _resolve_model() to handle gemini_api + account lookup
  • Update llm_client.complete() to accept slot parameter
  • Update routers/chat.py to pass slot instead of backend string
  • Keep backend toggle UI working (map old strings to slots temporarily)
  • Deliverable: routing works with multi-account Gemini, no UI changes needed yet
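
The temporary mapping from old toggle strings to slots is not specified here. One plausible reading, sketched below as an assumption, is to pick the first chat slot whose model belongs to the matching provider and to treat "auto" as no override. `legacy_backend_to_slot` and `LEGACY_PROVIDER` are hypothetical names.

```python
from typing import Optional

# Old toggle string -> provider value used in V2 model entries.
LEGACY_PROVIDER = {"claude": "anthropic", "gemini": "google", "local": "local"}

def legacy_backend_to_slot(registry: dict, backend: Optional[str]) -> Optional[str]:
    """Translate an old backend string into a chat role slot, or None for auto."""
    provider = LEGACY_PROVIDER.get(backend or "auto")
    if provider is None:
        return None  # "auto", null, or an unknown string: no slot override
    models = {m["id"]: m for m in registry.get("models", [])}
    chat = registry.get("roles", {}).get("chat", {})
    for slot in ("primary", "backup_1", "backup_2"):
        model = models.get(chat.get(slot))
        if model and model.get("provider") == provider:
            return slot
    return None  # no chat slot uses that provider; fall back to auto
```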

Phase 2 — Cloud provider UI

  • Add Anthropic and Google sections to /settings/local (rename to /settings/models)
  • Google accounts: add/remove API keys with labels
  • Editable model catalog for Anthropic + Google (add/remove model IDs from the list)
  • Model entry creation: provider picker → model dropdown (from catalog) → account/credential picker
  • Deliverable: can register cloud models in the UI just like local models

Phase 3 — Unified role assignments + toggle redesign

  • Promote role assignments to standalone /settings/roles page (or /settings/models bottom)
  • All models from all providers appear in role selects
  • Chat UI toggle: replace service-type cycle with slot cycle, show model label
  • Deliverable: end-to-end unified experience

Phase 4 — Polish + future providers

  • Claude direct API key support (optional, CLI is fine for now)
  • OpenRouter as a named provider (already works as a "local" host with host_type=openai — could be promoted)
  • Model catalog sync: fetch available models from Anthropic/Google API if keys are present
  • Per-role "test" button in role assignments UI

Open Questions

  1. Claude direct API key: Is this needed now, or is CLI OAuth sufficient for all users?

    • Decision: CLI-only for Phase 1; add API key support in Phase 4 if needed
  2. Catalog management: Should the Anthropic/Google catalogs be server-wide defaults that users can extend, or fully per-user?

    • Recommendation: ship sensible defaults in code (updated with each deploy); users can add custom entries if needed
  3. Toggle UX: Cycle through slot labels ("Primary / Backup 1 / Backup 2") or cycle through model labels ("Sonnet 4.6 / Gemini 2.5 Flash / Gemma 4")?

    • Model labels are more useful — clearer what you're switching to
  4. Orchestrator mode toggle: Does agent mode also respect the slot toggle, or is it always "use orchestrator role"?

    • Keep orchestrator role separate; the UI toggle only affects chat role
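
For question 2, the recommended "defaults in code, user can extend" approach could be as small as the merge below. This is a sketch only: `merged_catalog` is a hypothetical helper, and letting a user entry override a default with the same id is an assumption rather than a decision recorded in this document.

```python
def merged_catalog(defaults: list, user_entries: list) -> list:
    """Combine code-shipped catalog entries with per-user additions, keyed by id."""
    merged = {entry["id"]: dict(entry) for entry in defaults}
    for entry in user_entries:
        # Assumption: a user entry with the same id replaces the default in place;
        # new ids are appended after the defaults.
        merged[entry["id"]] = dict(entry)
    return list(merged.values())
```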