docs: add janitor role + session checkpoint compaction design

New TODO entry covering the "janitor" model role concept — cheap/fast model
(Haiku, local Gemma) that summarizes stale session prefixes on a turn/token
threshold, keeping expensive model context lean. Includes economics, file map,
and trigger design. Added pointer in ROADMAP deferred section.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Scott Idem
2026-06-16 22:11:46 -04:00
parent 3eb3ce1146
commit 167ca235b5
2 changed files with 39 additions and 0 deletions

View File

@@ -78,3 +78,4 @@
- **Multi-host local models** — per-user config already supports multiple hosts; routing logic TBD - **Multi-host local models** — per-user config already supports multiple hosts; routing logic TBD
- **WhatsApp** — requires Business API account or a bridge; not started - **WhatsApp** — requires Business API account or a bridge; not started
- **Docling** (https://github.com/docling-project/docling) — IBM Research doc parser; converts PDF/DOCX/PPTX/images → clean Markdown/JSON for LLM ingestion; would enhance file attachments and the knowledge import pipeline - **Docling** (https://github.com/docling-project/docling) — IBM Research doc parser; converts PDF/DOCX/PPTX/images → clean Markdown/JSON for LLM ingestion; would enhance file attachments and the knowledge import pipeline
- **Session checkpoint compaction ("janitor" role)** — cheap/fast model summarizes stale session prefix on a turn/token threshold; keeps expensive model context lean; design in `TODO__Agents.md`

View File

@@ -211,6 +211,44 @@ Upload an image or document inline and have it flow into context.
- [x] Text/code files read as UTF-8, injected as fenced code block in message - [x] Text/code files read as UTF-8, injected as fenced code block in message
- [x] Thumbnail/filename shown above sent message in UI - [x] Thumbnail/filename shown above sent message in UI
### [Intelligence] Session checkpoint compaction — "janitor" role
Proactive in-session context pruning using a cheap/fast model to keep expensive
model costs down as sessions grow. Not continuous per-token — checkpoint-triggered.
**Design:**
- New `janitor` role in the model registry (alongside `chat`, `orchestrator`, `distill`)
- Assign a cheap/fast model: Haiku 4.5, local Gemma E4B, or similar
- Falls back to the `distill` role model if `janitor` is not configured
- Trigger condition (either/or): session exceeds N turns (e.g. 20) OR estimated token
count exceeds a threshold (e.g. 12K tokens of history)
- On trigger: call janitor model with the oldest half of session history; ask it to
write a compact "what we've established so far" summary block (38 sentences)
- Replace the compacted turns with a single synthetic `assistant` message:
`[Session checkpoint — {N} turns summarized]: {summary}`
- The remaining recent turns stay untouched — only the stale prefix is replaced
- Token estimate: count chars / 4 as a cheap heuristic; no exact tokenizer needed
**Files to change:**
- `model_registry.py` — add `janitor` to `ROLE_DEFAULT_TOOLS` (empty list — no tools)
and to the roles UI in `settings/models`
- `session_store.py` — add `maybe_checkpoint(session_id)` that checks turn count /
estimated tokens and calls the janitor model if threshold is exceeded
- `openai_orchestrator.py` — call `maybe_checkpoint()` at the start of each run,
before building the active tool list and context
- `orchestrator_engine.py` — same, before building the Gemini context
- Settings UI — expose janitor turn/token thresholds as configurable values
(default: 20 turns or 12K history tokens)
**Economics:**
- Haiku 4.5: ~$0.80/1M input — compacting 10K tokens costs ~$0.008
- Saves 812K tokens on every subsequent Sonnet/Opus call in that session
- Break-even after 12 expensive model calls post-checkpoint
- Local janitor (Gemma E4B) = effectively free; ideal default when available
**Not needed yet** — most sessions are short enough that existing `_compact_messages()`
heuristic handles the worst cases. Priority rises with dev-agent pipeline work where
aider tool results can be very large.
### [Auth] Encrypted sessions ### [Auth] Encrypted sessions
Allow users to opt-in to per-session encryption so session logs on disk cannot be Allow users to opt-in to per-session encryption so session logs on disk cannot be
read without the user's key. read without the user's key.