Add tiered memory system with manual distillation

- config.py: memory_budget_long/mid/short settings (overridable in .env)
- memory_distiller.py: distill_short (no LLM), distill_mid, distill_long (LLM)
- routers/distill.py: POST /distill/{short,mid,long,all} endpoints
- context_loader.py: rewrote to load long→mid→short order with include_* toggles
- routers/chat.py: ChatRequest gains include_long/mid/short fields
- routers/files.py: MEMORY_LONG/MID/SHORT.md added to ALLOWED set
- main.py: register distill router
- static/index.html: context bar: tier selector, L/M/S memory toggles, distill buttons with status feedback; send includes tier + memory flags
- inara/MEMORY_LONG.md: migrated from MEMORY.md + Cortex/Talk bot notes
- inara/MEMORY_MID.md, MEMORY_SHORT.md: stubs ready for distillation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 21:22:32 -04:00
parent 3455c7a09c
commit ce3c1f5f7f
11 changed files with 779 additions and 29 deletions
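
The memory_budget_long/mid/short settings named in the commit message default to 2000, 2000 and 3000 tokens and can be overridden with MEMORY_BUDGET_LONG and similar variables. The following is a minimal stdlib-only sketch of that override behaviour, not the project's code: the real values come from the Settings(BaseSettings) class in cortex/config.py, and the names resolve_budgets and DEFAULT_BUDGETS are hypothetical, introduced here for illustration only.

```python
import os

# Defaults copied from the values this commit adds to Settings.
# DEFAULT_BUDGETS is a hypothetical name, not part of the project.
DEFAULT_BUDGETS = {"long": 2000, "mid": 2000, "short": 3000}


def resolve_budgets(env=None):
    """Return per-tier token budgets (hypothetical helper).

    A MEMORY_BUDGET_<TIER> variable, if set and non-empty, replaces
    the default for that tier; otherwise the default is kept.
    """
    env = os.environ if env is None else env
    budgets = {}
    for tier, default in DEFAULT_BUDGETS.items():
        raw = env.get(f"MEMORY_BUDGET_{tier.upper()}")
        budgets[tier] = int(raw) if raw not in (None, "") else default
    return budgets


if __name__ == "__main__":
    # Simulate a .env that raises only the long-term budget.
    print(resolve_budgets({"MEMORY_BUDGET_LONG": "4000"}))
```

Passing a plain dict instead of os.environ keeps the sketch easy to try without touching the real environment; tiers without an override keep their defaults.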
--- a/cortex/config.py
+++ b/cortex/config.py
@@ -26,6 +26,12 @@ class Settings(BaseSettings):
    nextcloud_talk_bot_secret: str = ""   # set in .env
    nextcloud_talk_timeout: int = 55

+    # Memory tier token budgets — soft caps used during distillation
+    # Override in .env: MEMORY_BUDGET_LONG=4000 etc.
+    memory_budget_long: int = 2000
+    memory_budget_mid: int = 2000
+    memory_budget_short: int = 3000
+
    host: str = "0.0.0.0"
    port: int = 8000