feat: unified model registry with role-based routing

Introduces model_registry.py as the single source of truth for all LLM backend configuration.
Replaces scattered backend settings across user_settings, config distill_backend_*, and the UI toggle.

model_registry.py:
- Per-user home/{user}/model_registry.json with version, hosts, models, roles
- Models have: type (local_openai|claude_cli|gemini_cli|gemini_api), label, model_name, host_id, context_k (tokens), tags (capability labels)
- Roles map to priority chains: primary, backup_1..backup_4
- Built-in IDs (claude_cli, gemini_cli, gemini_api) always resolvable
- Auto-migrates existing local_llm.json on first access
- CRUD: save_host, remove_host, save_model, remove_model, set_role
- get_model_for_role(): registry → .env default → hardcoded fallback

config.py:
- role_chat/orchestrator/distill/coder/research .env defaults
- defined_roles: comma-separated standard role list (extensible)
- get_defined_roles() and get_role_default() helper methods

llm_client.complete():
- New role= parameter (default "chat") for registry-based routing
- model= still accepted for explicit UI toggle override
- _claude() and _local() accept model_cfg dict instead of raw string
- _local() uses pre-resolved config from registry

memory_distiller.py:
- distill_mid/long now use role="distill" (no more distill_backend_* .env vars needed)

cron_runner.py:
- brief jobs use role="chat"

routers/chat.py + auth.py:
- Use model_registry instead of user_settings for local model info

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 21:25:18 -04:00
parent a4daebdc9b
commit 6a1a1c2686
7 changed files with 541 additions and 33 deletions
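Reviewer note: the lookup order the message describes for get_model_for_role() (role priority chain in the registry, then the .env default, then a hardcoded fallback) can be sketched roughly as below. This is an illustration only, not the code in model_registry.py. The function names `resolve_model_id` and `resolve_role`, their signatures, the sample registry contents (host URL, model name), and the choice of `claude_cli` as the hardcoded fallback are all assumptions. Only the field names, the slot names, and the built-in IDs come from the commit message.

```python
from __future__ import annotations

# Hypothetical registry contents. Field names follow the commit message;
# the values (host URL, model name) are invented for illustration.
REGISTRY = {
    "version": 1,
    "hosts": {"gpu_box": {"base_url": "http://localhost:8000/v1"}},
    "models": {
        "local_example": {
            "type": "local_openai",
            "label": "Example local model",
            "model_name": "example-model",
            "host_id": "gpu_box",
            "context_k": 32,
            "tags": ["chat"],
        },
    },
    "roles": {
        # "primary" points at an ID that no longer exists, so backup_1 wins.
        "distill": {"primary": "removed_model", "backup_1": "local_example"},
    },
}

BUILTIN_IDS = ("claude_cli", "gemini_cli", "gemini_api")
CHAIN_SLOTS = ("primary", "backup_1", "backup_2", "backup_3", "backup_4")
HARDCODED_FALLBACK = "claude_cli"  # assumption: the real fallback may differ


def resolve_model_id(model_id: str, registry: dict) -> dict | None:
    """Return a config dict for a model ID, or None if it cannot be resolved."""
    models = registry.get("models", {})
    if model_id in models:
        return {"id": model_id, **models[model_id]}
    if model_id in BUILTIN_IDS:
        # Built-in IDs resolve even when the registry has no entry for them.
        return {"id": model_id, "type": model_id}
    return None


def resolve_role(role: str, registry: dict, env_defaults: dict) -> dict:
    """Walk registry chain, then the .env default, then the hardcoded fallback."""
    chain = registry.get("roles", {}).get(role, {})
    candidates = [chain.get(slot) for slot in CHAIN_SLOTS]
    candidates.append(env_defaults.get(role))
    for model_id in candidates:
        if not model_id:
            continue  # empty slot or no .env default for this role
        cfg = resolve_model_id(model_id, registry)
        if cfg is not None:
            return cfg
    # The fallback is a built-in ID, so this lookup always succeeds.
    return {"id": HARDCODED_FALLBACK, "type": HARDCODED_FALLBACK}


if __name__ == "__main__":
    print(resolve_role("distill", REGISTRY, {})["id"])
    print(resolve_role("chat", REGISTRY, {"chat": "gemini_api"})["id"])
    print(resolve_role("coder", REGISTRY, {})["id"])
```

Under these assumptions, a caller such as distill_mid() passes only role="distill" and does not need to know which backend ends up serving the request, which is what lets this commit drop the distill_backend_* settings.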
--- a/cortex/memory_distiller.py
+++ b/cortex/memory_distiller.py
@@ -92,7 +92,6 @@ async def distill_mid(username: str | None = None, persona: str | None = None) -
    if not short_content.strip() or "Not yet populated" in short_content:
        return {"error": "MEMORY_SHORT.md is empty — run distill/short first"}

-    backend_override = settings.distill_backend_mid or None
    budget_tokens = settings.memory_budget_mid
    system_prompt = (
        f"You are {settings.agent_name}'s memory distillation system. "
@@ -107,7 +106,7 @@ async def distill_mid(username: str | None = None, persona: str | None = None) -
    response_text, backend = await complete(
        system_prompt=system_prompt,
        messages=[{"role": "user", "content": short_content}],
-        model=backend_override,
+        role="distill",
    )

    now = datetime.now().strftime("%Y-%m-%d %H:%M")
@@ -146,7 +145,6 @@ async def distill_long(username: str | None = None, persona: str | None = None)
    if not mid_content.strip() or "Not yet populated" in mid_content:
        return {"error": "MEMORY_MID.md is empty — run distill/mid first"}

-    backend_override = settings.distill_backend_long or None
    budget_tokens = settings.memory_budget_long
    system_prompt = (
        f"You are {settings.agent_name}'s long-term memory curator. "
@@ -165,7 +163,7 @@ async def distill_long(username: str | None = None, persona: str | None = None)
    response_text, backend = await complete(
        system_prompt=system_prompt,
        messages=[{"role": "user", "content": user_content}],
-        model=backend_override,
+        role="distill",
    )

    # Ensure the file has the right header if the LLM dropped it