Cortex-Inara/inara/CONTEXT_TIERS.md
Scott Idem 2f675ee4bf Initial commit — Cortex API + Inara identity
Cortex: FastAPI backend serving Inara via Claude/Gemini CLI backends.
Includes SSE streaming chat, session persistence, Google Chat webhook
handler, and Docker support.

Inara: Identity files (persona, soul, protocols, memory, context tiers)
mounted read-only into the container at runtime.

Features in initial cut:
- /chat endpoint with SSE keepalive + LLM fallback
- Session store with rolling history window
- Markdown rendering, copy-to-clipboard, links open in new tab
- Stacked right-column input controls (height selector, enter toggle,
  note mode with public/private) — semi-hidden until textarea grows
- /note endpoint for injecting public context into session history
- Docker Compose config (local dev runs natively; Docker for server)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 03:41:00 -05:00

# CONTEXT_TIERS.md — Cortex Dispatcher Loading Spec
This file defines which Inara context files to inject into a session based on the target model's
context window. The dispatcher reads this to decide what to prepend.
---
## Tier 1 — Minimal (~1,500 tokens)
**Target:** Local models with ~8k context or less (Qwen 8B small, etc.)
**Load:**
- `SOUL.md`
- `IDENTITY.md`
- `USER.md` — first 30 lines only (identity + what he cares about)
**Notes:** Just enough for Inara to know who she is and who Scott is.
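The partial load of `USER.md` could look something like the following. This is a minimal sketch rather than the dispatcher's actual code: the helper name `read_head` and its `max_lines` parameter are assumptions made for illustration.

```python
from itertools import islice
from pathlib import Path


def read_head(path: Path, max_lines: int = 30) -> str:
    """Return at most the first `max_lines` lines of a text file.

    Hypothetical helper: the spec only says "first 30 lines".
    """
    with path.open(encoding="utf-8") as fh:
        # islice stops pulling lines once the limit is reached,
        # so the rest of the file is never read.
        return "".join(islice(fh, max_lines))
```

A Tier 1 loader might then call `read_head(Path("USER.md"))` and prepend the result after `SOUL.md` and `IDENTITY.md`.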
---
## Tier 2 — Standard (~5,000 tokens)
**Target:** Models with 16k–32k context (Haiku, Gemini Flash, Qwen 8B full)
**Load:**
- `SOUL.md`
- `IDENTITY.md`
- `USER.md` — full
- `MEMORY.md`
- `PROTOCOLS.md`
**Notes:** Full operational context. Sufficient for most routine tasks and conversations.
---
## Tier 3 — Extended (~15,000 tokens)
**Target:** Models with 32k–128k context (Sonnet, Gemini Pro, Qwen 14B, Qwen 30B)
**Load:**
- Everything in Tier 2
- `~/agents_sync/aether/docs/FLEET_MANIFEST.md`
- Most recent 2 session files from `sessions/`
- Relevant project doc (e.g., `CORTEX.md`) if task is project-related
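Picking the most recent session files might be sketched as below. The spec does not say how "most recent" is determined, so this sketch assumes file modification time. The function name `recent_sessions` and its parameters are hypothetical.

```python
from pathlib import Path


def recent_sessions(sessions_dir: Path, count: int = 2) -> list[Path]:
    """Return up to `count` session files, newest first.

    Assumption: recency is judged by modification time, not by filename.
    """
    files = [p for p in sessions_dir.iterdir() if p.is_file()]
    # Sort newest first using each file's mtime.
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:count]
```

A tier that loads more session history could reuse the same helper with a larger `count`.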
---
## Tier 4 — Full (50,000+ tokens)
**Target:** Frontier models with 200k+ context (Claude Opus/Sonnet, Gemini 2.5 Pro)
**Load:**
- Everything in Tier 3
- Last 5–7 session files
- Full project docs as relevant
- `~/agents_sync/aether/docs/api_v3.md` if task involves Aether API
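One way the dispatcher could map a context window to a tier is sketched here. The thresholds are read off the **Target** lines above, but the handling of the gaps between the stated ranges is an assumption: any window that falls between two ranges drops to the lower tier, on the reasoning that over-loading is the costlier mistake. The name `select_tier` is hypothetical, and "k" is taken to mean 1,000 tokens.

```python
def select_tier(context_window: int) -> int:
    """Map a model's context window (in tokens) to a tier number from 1 to 4.

    Assumed policy: windows in the gaps between the documented ranges
    fall back to the lower tier.
    """
    if context_window >= 200_000:
        return 4  # frontier models
    if context_window > 32_000:
        return 3  # above 32k, up to just under 200k
    if context_window >= 16_000:
        return 2  # 16k through 32k inclusive
    return 1      # anything smaller


print(select_tier(8_000))    # 1
print(select_tier(32_000))   # 2
print(select_tier(128_000))  # 3
print(select_tier(200_000))  # 4
```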
---
## Hard Rules
- `SOUL.md` and `IDENTITY.md` are **always** loaded, regardless of tier.
- **Never inject:** `.env` files, `TOOLS.md` (contains credentials), raw session logs older than 30 days.
- **MEMORY.md must stay under 4,000 tokens** — enforce this during distillation.
- When in doubt, use Tier 2. Over-loading small models degrades output quality.
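Some of these rules could be enforced with a small filter like the one below. It is a sketch under stated assumptions, not the dispatcher's real code. All function and constant names are hypothetical, the patterns used to recognise `.env` files are a guess, and the token estimate uses a rough four-characters-per-token heuristic because the spec does not name a tokenizer. The 30-day session log rule is not covered here.

```python
from pathlib import PurePosixPath

ALWAYS_LOAD = ("SOUL.md", "IDENTITY.md")
MEMORY_TOKEN_LIMIT = 4_000


def is_forbidden(path: str) -> bool:
    """True for files that must never be injected (TOOLS.md, .env files)."""
    name = PurePosixPath(path).name
    # Assumed .env patterns: ".env", "prod.env", ".env.local".
    looks_like_env = name.endswith(".env") or name.startswith(".env.")
    return name == "TOOLS.md" or looks_like_env


def apply_hard_rules(candidates: list[str]) -> list[str]:
    """Drop forbidden files, then prepend any always-loaded file that is missing."""
    allowed = [p for p in candidates if not is_forbidden(p)]
    present = {PurePosixPath(p).name for p in allowed}
    missing = [name for name in ALWAYS_LOAD if name not in present]
    return missing + allowed


def estimate_tokens(text: str) -> int:
    """Very rough estimate: about four characters per token (an assumption)."""
    return len(text) // 4


def memory_within_budget(memory_text: str) -> bool:
    """True if the estimated size of MEMORY.md is under the 4,000-token limit."""
    return estimate_tokens(memory_text) < MEMORY_TOKEN_LIMIT
```

For example, `apply_hard_rules(["USER.md", ".env", "TOOLS.md"])` returns `["SOUL.md", "IDENTITY.md", "USER.md"]`.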