Cortex: FastAPI backend serving Inara via Claude/Gemini CLI backends. Includes SSE streaming chat, session persistence, Google Chat webhook handler, and Docker support. Inara: Identity files (persona, soul, protocols, memory, context tiers) mounted read-only into the container at runtime. Features in initial cut: - /chat endpoint with SSE keepalive + LLM fallback - Session store with rolling history window - Markdown rendering, copy-to-clipboard, links open in new tab - Stacked right-column input controls (height selector, enter toggle, note mode with public/private) — semi-hidden until textarea grows - /note endpoint for injecting public context into session history - Docker Compose config (local dev runs natively; Docker for server) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
34 lines
1.0 KiB
Plaintext
34 lines
1.0 KiB
Plaintext
# Auth is handled by the claude CLI (claude setup-token) — no API key needed here.
|
|
# ANTHROPIC_API_KEY=only_needed_if_switching_to_sdk
|
|
|
|
# Path to the inara/ identity directory — relative to cortex/ or absolute
|
|
INARA_DIR=../inara
|
|
|
|
# Path for persistent JSON session files
|
|
SESSIONS_DIR=./data/sessions
|
|
|
|
# LLM defaults
|
|
DEFAULT_MODEL=claude-sonnet-4-6
|
|
DEFAULT_TIER=2
|
|
|
|
# Session rolling window — number of messages to keep (user + assistant pairs)
|
|
# 40 = 20 turns
|
|
MAX_HISTORY_MESSAGES=40
|
|
|
|
# Per-backend timeouts (seconds)
|
|
# Gemini is generous — it frequently takes 30-60s under load
|
|
# Local models may need time to load into VRAM before first response
|
|
TIMEOUT_CLAUDE=60
|
|
TIMEOUT_GEMINI=120
|
|
TIMEOUT_LOCAL=300
|
|
|
|
# Google Chat — must respond within 30s or Chat shows an error to the user
|
|
GOOGLE_CHAT_TIMEOUT=25
|
|
# Backend pinned for Google Chat (claude recommended — more reliable within 25s)
|
|
GOOGLE_CHAT_BACKEND=claude
|
|
# TODO: add GOOGLE_CHAT_TOKEN for request verification once endpoint is public
|
|
|
|
# Server
|
|
PORT=8000
|
|
HOST=0.0.0.0
|