docs: add janitor role + session checkpoint compaction design
New TODO entry covering the "janitor" model role concept — cheap/fast model (Haiku, local Gemma) that summarizes stale session prefixes on a turn/token threshold, keeping expensive model context lean. Includes economics, file map, and trigger design. Added pointer in ROADMAP deferred section. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -78,3 +78,4 @@
|
|||||||
- **Multi-host local models** — per-user config already supports multiple hosts; routing logic TBD
|
- **Multi-host local models** — per-user config already supports multiple hosts; routing logic TBD
|
||||||
- **WhatsApp** — requires Business API account or a bridge; not started
|
- **WhatsApp** — requires Business API account or a bridge; not started
|
||||||
- **Docling** (https://github.com/docling-project/docling) — IBM Research doc parser; converts PDF/DOCX/PPTX/images → clean Markdown/JSON for LLM ingestion; would enhance file attachments and the knowledge import pipeline
|
- **Docling** (https://github.com/docling-project/docling) — IBM Research doc parser; converts PDF/DOCX/PPTX/images → clean Markdown/JSON for LLM ingestion; would enhance file attachments and the knowledge import pipeline
|
||||||
|
- **Session checkpoint compaction ("janitor" role)** — cheap/fast model summarizes stale session prefix on a turn/token threshold; keeps expensive model context lean; design in `TODO__Agents.md`
|
||||||
|
|||||||
@@ -211,6 +211,44 @@ Upload an image or document inline and have it flow into context.
|
|||||||
- [x] Text/code files read as UTF-8, injected as fenced code block in message
|
- [x] Text/code files read as UTF-8, injected as fenced code block in message
|
||||||
- [x] Thumbnail/filename shown above sent message in UI
|
- [x] Thumbnail/filename shown above sent message in UI
|
||||||
|
|
||||||
|
### [Intelligence] Session checkpoint compaction — "janitor" role
|
||||||
|
Proactive in-session context pruning using a cheap/fast model to keep expensive
|
||||||
|
model costs down as sessions grow. Not continuous per-token — checkpoint-triggered.
|
||||||
|
|
||||||
|
**Design:**
|
||||||
|
- New `janitor` role in the model registry (alongside `chat`, `orchestrator`, `distill`)
|
||||||
|
- Assign a cheap/fast model: Haiku 4.5, local Gemma E4B, or similar
|
||||||
|
- Falls back to the `distill` role model if `janitor` is not configured
|
||||||
|
- Trigger condition (either/or): session exceeds N turns (e.g. 20) OR estimated token
|
||||||
|
count exceeds a threshold (e.g. 12K tokens of history)
|
||||||
|
- On trigger: call janitor model with the oldest half of session history; ask it to
|
||||||
|
write a compact "what we've established so far" summary block (3–8 sentences)
|
||||||
|
- Replace the compacted turns with a single synthetic `assistant` message:
|
||||||
|
`[Session checkpoint — {N} turns summarized]: {summary}`
|
||||||
|
- The remaining recent turns stay untouched — only the stale prefix is replaced
|
||||||
|
- Token estimate: count chars / 4 as a cheap heuristic; no exact tokenizer needed
|
||||||
|
|
||||||
|
**Files to change:**
|
||||||
|
- `model_registry.py` — add `janitor` to `ROLE_DEFAULT_TOOLS` (empty list — no tools)
|
||||||
|
and to the roles UI in `settings/models`
|
||||||
|
- `session_store.py` — add `maybe_checkpoint(session_id)` that checks turn count /
|
||||||
|
estimated tokens and calls the janitor model if threshold is exceeded
|
||||||
|
- `openai_orchestrator.py` — call `maybe_checkpoint()` at the start of each run,
|
||||||
|
before building the active tool list and context
|
||||||
|
- `orchestrator_engine.py` — same, before building the Gemini context
|
||||||
|
- Settings UI — expose janitor turn/token thresholds as configurable values
|
||||||
|
(default: 20 turns or 12K history tokens)
|
||||||
|
|
||||||
|
**Economics:**
|
||||||
|
- Haiku 4.5: ~$0.80/1M input — compacting 10K tokens costs ~$0.008
|
||||||
|
- Saves 8–12K tokens on every subsequent Sonnet/Opus call in that session
|
||||||
|
- Break-even after 1–2 expensive model calls post-checkpoint
|
||||||
|
- Local janitor (Gemma E4B) = effectively free; ideal default when available
|
||||||
|
|
||||||
|
**Not needed yet** — most sessions are short enough that existing `_compact_messages()`
|
||||||
|
heuristic handles the worst cases. Priority rises with dev-agent pipeline work where
|
||||||
|
aider tool results can be very large.
|
||||||
|
|
||||||
### [Auth] Encrypted sessions
|
### [Auth] Encrypted sessions
|
||||||
Allow users to opt-in to per-session encryption so session logs on disk cannot be
|
Allow users to opt-in to per-session encryption so session logs on disk cannot be
|
||||||
read without the user's key.
|
read without the user's key.
|
||||||
|
|||||||
Reference in New Issue
Block a user