Initial commit — Cortex API + Inara identity

Cortex: FastAPI backend serving Inara via Claude/Gemini CLI backends. Includes
SSE streaming chat, session persistence, Google Chat webhook handler, and
Docker support.

Inara: Identity files (persona, soul, protocols, memory, context tiers)
mounted read-only into the container at runtime.

Features in initial cut:
- /chat endpoint with SSE keepalive + LLM fallback
- Session store with rolling history window
- Markdown rendering, copy-to-clipboard, links open in new tab
- Stacked right-column input controls (height selector, enter toggle,
  note mode with public/private), semi-hidden until textarea grows
- /note endpoint for injecting public context into session history
- Docker Compose config (local dev runs natively; Docker for server)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 03:41:00 -05:00
commit 2f675ee4bf
27 changed files with 2282 additions and 0 deletions
--- a/inara/CONTEXT_TIERS.md
+++ b/inara/CONTEXT_TIERS.md
@@ -0,0 +1,65 @@
+# CONTEXT_TIERS.md — Cortex Dispatcher Loading Spec
+
+This file defines which Inara context files to inject into a session based on the target model's
+context window. The dispatcher reads this to decide what to prepend.
+
+---
+
+## Tier 1 — Minimal (~1,500 tokens)
+
+**Target:** Local models with ~8k context or less (Qwen 8B small, etc.)
+
+**Load:**
+- `SOUL.md`
+- `IDENTITY.md`
+- `USER.md` — first 30 lines only (identity + what he cares about)
+
+**Notes:** Just enough for Inara to know who she is and who Scott is.
+
+---
+
+## Tier 2 — Standard (~5,000 tokens)
+
+**Target:** Models with 16k–32k context (Haiku, Gemini Flash, Qwen 8B full)
+
+**Load:**
+- `SOUL.md`
+- `IDENTITY.md`
+- `USER.md` — full
+- `MEMORY.md`
+- `PROTOCOLS.md`
+
+**Notes:** Full operational context. Sufficient for most routine tasks and conversations.
+
+---
+
+## Tier 3 — Extended (~15,000 tokens)
+
+**Target:** Models with 32k–128k context (Sonnet, Gemini Pro, Qwen 14B, Qwen 30B)
+
+**Load:**
+- Everything in Tier 2
+- `~/agents_sync/aether/docs/FLEET_MANIFEST.md`
+- Most recent 2 session files from `sessions/`
+- Relevant project doc (e.g., `CORTEX.md`) if task is project-related
+
+---
+
+## Tier 4 — Full (50,000+ tokens)
+
+**Target:** Frontier models with 200k+ context (Claude Opus/Sonnet, Gemini 2.5 Pro)
+
+**Load:**
+- Everything in Tier 3
+- Last 5–7 session files
+- Full project docs as relevant
+- `~/agents_sync/aether/docs/api_v3.md` if task involves Aether API
+
+---
+
+## Hard Rules
+
+- `SOUL.md` and `IDENTITY.md` are **always** loaded, regardless of tier.
+- **Never inject:** `.env` files, `TOOLS.md` (contains credentials), raw session logs older than 30 days.
+- **MEMORY.md must stay under 4,000 tokens** — enforce this during distillation.
+- When in doubt, use Tier 2. Over-loading small models degrades output quality.
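
Review note: the tier spec added in this diff can be read as a lookup keyed on the target model's context window. The sketch below is one possible reading, not the dispatcher's actual code. The names `select_tier`, `files_for_tier`, `is_forbidden`, `TIER_FILES`, and the numeric thresholds are all hypothetical. The spec leaves three cases open (the 8k to 16k gap, the 128k to 200k gap, and which tier owns exactly 32k); this sketch assumes the lower tier in each case, following the rule that over-loading small models degrades output. Session files and task-specific project docs are left out because they depend on the task.

```python
import os
from typing import Optional

# Static file lists copied from the tier spec. Tiers 3 and 4 also load
# session files and project docs, omitted here because they are task-dependent.
TIER_FILES = {
    1: ["SOUL.md", "IDENTITY.md", "USER.md"],  # spec: USER.md first 30 lines only
    2: ["SOUL.md", "IDENTITY.md", "USER.md", "MEMORY.md", "PROTOCOLS.md"],
}
TIER_FILES[3] = TIER_FILES[2] + ["~/agents_sync/aether/docs/FLEET_MANIFEST.md"]
TIER_FILES[4] = list(TIER_FILES[3])


def select_tier(context_window: Optional[int]) -> int:
    """Map a context window (in tokens) to a tier number.

    Assumption: "k" means 1,000 tokens, and any window that falls in a gap
    or on a shared boundary in the spec gets the lower tier.
    """
    if context_window is None:
        return 2  # spec: "When in doubt, use Tier 2."
    if context_window >= 200_000:
        return 4
    if context_window > 32_000:
        return 3
    if context_window >= 16_000:
        return 2
    return 1


def is_forbidden(path: str) -> bool:
    """Hard rule: never inject .env files or TOOLS.md."""
    name = os.path.basename(path)
    return name == "TOOLS.md" or name.endswith(".env")


def files_for_tier(tier: int) -> list[str]:
    """Return the static files for a tier, with forbidden files filtered out."""
    return [path for path in TIER_FILES[tier] if not is_forbidden(path)]
```

A call such as `files_for_tier(select_tier(24_000))` would return the five Tier 2 files. Filtering inside `files_for_tier` keeps the hard rule enforced even if a forbidden path is later added to a tier list by mistake.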