diff --git a/documentation/ROADMAP.md b/documentation/ROADMAP.md
index af3fc2c..8df7103 100644
--- a/documentation/ROADMAP.md
+++ b/documentation/ROADMAP.md
@@ -78,3 +78,4 @@
 - **Multi-host local models** — per-user config already supports multiple hosts; routing logic TBD
 - **WhatsApp** — requires Business API account or a bridge; not started
 - **Docling** (https://github.com/docling-project/docling) — IBM Research doc parser; converts PDF/DOCX/PPTX/images → clean Markdown/JSON for LLM ingestion; would enhance file attachments and the knowledge import pipeline
+- **Session checkpoint compaction ("janitor" role)** — cheap/fast model summarizes stale session prefix on a turn/token threshold; keeps expensive model context lean; design in `TODO__Agents.md`
diff --git a/documentation/TODO__Agents.md b/documentation/TODO__Agents.md
index 97b6a5f..fbad3d0 100644
--- a/documentation/TODO__Agents.md
+++ b/documentation/TODO__Agents.md
@@ -211,6 +211,44 @@ Upload an image or document inline and have it flow into context.
 - [x] Text/code files read as UTF-8, injected as fenced code block in message
 - [x] Thumbnail/filename shown above sent message in UI
 
+### [Intelligence] Session checkpoint compaction — "janitor" role
+Proactive in-session context pruning using a cheap/fast model to keep expensive
+model costs down as sessions grow. Not continuous per-token — checkpoint-triggered.
+
+**Design:**
+- New `janitor` role in the model registry (alongside `chat`, `orchestrator`, `distill`)
+  - Assign a cheap/fast model: Haiku 4.5, local Gemma E4B, or similar
+  - Falls back to the `distill` role model if `janitor` is not configured
+- Trigger condition (either/or): session exceeds N turns (e.g. 20) OR estimated token
+  count exceeds a threshold (e.g. 12K tokens of history)
+- On trigger: call janitor model with the oldest half of session history; ask it to
+  write a compact "what we've established so far" summary block (3–8 sentences)
+- Replace the compacted turns with a single synthetic `assistant` message:
+  `[Session checkpoint — {N} turns summarized]: {summary}`
+- The remaining recent turns stay untouched — only the stale prefix is replaced
+- Token estimate: count chars / 4 as a cheap heuristic; no exact tokenizer needed
+
+**Files to change:**
+- `model_registry.py` — add `janitor` to `ROLE_DEFAULT_TOOLS` (empty list — no tools)
+  and to the roles UI in `settings/models`
+- `session_store.py` — add `maybe_checkpoint(session_id)` that checks turn count /
+  estimated tokens and calls the janitor model if threshold is exceeded
+- `openai_orchestrator.py` — call `maybe_checkpoint()` at the start of each run,
+  before building the active tool list and context
+- `orchestrator_engine.py` — same, before building the Gemini context
+- Settings UI — expose janitor turn/token thresholds as configurable values
+  (default: 20 turns or 12K history tokens)
+
+**Economics:**
+- Haiku 4.5: ~$0.80/1M input — compacting 10K tokens costs ~$0.008
+- Saves 8–12K tokens on every subsequent Sonnet/Opus call in that session
+- Break-even after 1–2 expensive model calls post-checkpoint
+- Local janitor (Gemma E4B) = effectively free; ideal default when available
+
+**Not needed yet** — most sessions are short enough that existing `_compact_messages()`
+heuristic handles the worst cases. Priority rises with dev-agent pipeline work where
+aider tool results can be very large.
+
 ### [Auth] Encrypted sessions
 Allow users to opt-in to per-session encryption so session logs on disk cannot be read without the user's key.
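
A minimal sketch of the checkpoint step from the `TODO__Agents.md` design notes in this diff; it is not part of the patch. It assumes one message counts as one turn, that history is an in-memory list rather than something loaded by `session_id`, and that the janitor model is reached through a caller-supplied `summarize` callable. `Message`, `estimate_tokens`, and the parameter names are hypothetical. Only the thresholds (20 turns, 12K tokens), the chars / 4 estimate, the oldest-half split, and the checkpoint message format come from the design notes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    """Hypothetical stand-in for one stored session turn."""
    role: str
    content: str


def estimate_tokens(messages: List[Message]) -> int:
    # Cheap heuristic from the design notes: characters / 4, no tokenizer.
    return sum(len(m.content) for m in messages) // 4


def maybe_checkpoint(
    history: List[Message],
    summarize: Callable[[List[Message]], str],  # assumed wrapper around the janitor model
    max_turns: int = 20,
    max_tokens: int = 12_000,
) -> List[Message]:
    """Return history unchanged, or with its oldest half replaced by one summary."""
    over_turns = len(history) > max_turns
    over_tokens = estimate_tokens(history) > max_tokens
    if not (over_turns or over_tokens):
        return history

    # Split into the stale prefix (oldest half) and the recent turns to keep.
    split = len(history) // 2
    stale, recent = history[:split], history[split:]
    if not stale:
        # A single oversized message has no prefix to compact.
        return history

    summary = summarize(stale)
    # \u2014 is the dash used in the design's checkpoint message format.
    checkpoint = Message(
        role="assistant",
        content=f"[Session checkpoint \u2014 {len(stale)} turns summarized]: {summary}",
    )
    return [checkpoint] + recent


if __name__ == "__main__":
    demo = [Message("user" if i % 2 == 0 else "assistant", f"turn {i}") for i in range(22)]
    compacted = maybe_checkpoint(demo, lambda stale: "placeholder summary")
    print(len(demo), "->", len(compacted))
```

Taking `summarize` as a parameter keeps the sketch independent of any model client, so the `janitor`-to-`distill` fallback could be resolved by the caller before this function runs.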