feat: token streaming for orchestrator final response

Switches the orchestrator's final response from a fire-and-wait model to a
live SSE stream so text appears token-by-token as the model generates it.

- llm_client: complete() gains token_sink param; anthropic_api backend uses
  client.messages.stream(); local backend uses httpx SSE streaming; non-streaming
  backends (claude_cli, gemini_cli) emit the full text as one chunk
- orchestrator_engine + openai_orchestrator: token_sink threaded through run(),
  _run_from_contents(), _claude_handoff(), and _run_from_messages()
- routers/orchestrator: each job gets an asyncio.Queue; _on_progress and
  _token_sink write progress/token events to it; _finalize_job emits done,
  error handler emits error, confirmation gate emits confirm; new GET
  /orchestrate/{job_id}/stream SSE endpoint with 20s keepalive
- app.js: _doOrchestrate switches from 2s poll loop to EventSource; thinking
  bubble converts to a streaming message on first token; auto-scroll while
  streaming; confirm/error/done events handled; finalization unchanged

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
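
The per-job queue and the SSE endpoint described in the commit message could look roughly like the sketch below. It is not the code from this commit. The names `format_sse` and `sse_events` are hypothetical, the `{type, text}` event shape is taken from the roadmap entry, and sending an SSE comment line as the keepalive is an assumption (the commit only says "20s keepalive"). Closing the stream on `done` or `error` is also an assumption.

```python
import asyncio
import json
from typing import AsyncIterator

KEEPALIVE_SECONDS = 20.0  # the commit mentions a 20s keepalive


def format_sse(event: dict) -> str:
    """Serialize one event dict as a single SSE 'data:' frame."""
    return f"data: {json.dumps(event)}\n\n"


async def sse_events(
    queue: asyncio.Queue, keepalive: float = KEEPALIVE_SECONDS
) -> AsyncIterator[str]:
    """Yield SSE frames from a per-job queue until a terminal event arrives.

    Assumption: events are dicts with a "type" key such as "progress",
    "token", "confirm", "done", or "error".
    """
    while True:
        try:
            event = await asyncio.wait_for(queue.get(), timeout=keepalive)
        except asyncio.TimeoutError:
            # Nothing arrived in time: send an SSE comment so the
            # connection is not dropped by proxies or the browser.
            yield ": keepalive\n\n"
            continue
        yield format_sse(event)
        if event.get("type") in ("done", "error"):
            return  # terminal event: close the stream
```

In a web framework, a generator like this would typically be wrapped in a streaming response with the `text/event-stream` media type; the producer side only needs to call `queue.put_nowait({...})` from the progress and token callbacks.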
@@ -249,6 +249,30 @@ model costs down as sessions grow. Not continuous per-token — checkpoint-trigg
heuristic handles the worst cases. Priority rises with dev-agent pipeline work where
aider tool results can be very large.

### [UX] Token streaming for orchestrator final response ✅ — 2026-06-16
Text appears token-by-token while the model is generating, instead of waiting for the
full response after "Generating response…" completes.

- [x] **`llm_client.py`** — `complete()` gains `token_sink` param; `_dispatch()` routes to
  streaming variants when set; `_anthropic_api_streaming()` uses `client.messages.stream()`;
  `_local_streaming()` uses `httpx client.stream()` + SSE parsing; non-streaming backends
  (claude_cli, gemini_cli) emit full text as one chunk via `token_sink`
- [x] **`orchestrator_engine.py`** — `run()`, `_run_from_contents()`, and `_claude_handoff()`
  all accept and thread `token_sink`; Gemini handoff to Claude/Anthropic API is the
  primary streaming path
- [x] **`openai_orchestrator.py`** — `run()` and `_run_from_messages()` accept `token_sink`;
  local model final response emitted via `token_sink` (one chunk for now; true streaming
  left for future polish)
- [x] **`routers/orchestrator.py`** — each job gets an `asyncio.Queue` (`_event_queue`);
  `_on_progress` and `_token_sink` write to the queue as events (`{type, text}`);
  `_finalize_job` emits `{type: done, ...}`, error handler emits `{type: error, ...}`,
  confirmation gate emits `{type: confirm, ...}`; new `GET /orchestrate/{job_id}/stream`
  SSE endpoint with 20s keepalive timeout; handles already-complete/error jobs immediately
- [x] **`static/app.js`** — `_doOrchestrate` switches from poll loop to `EventSource`; renders
  thinking-bubble progress labels on `progress` events; converts bubble to streaming message
  on first `token` event (with auto-scroll); handles `confirm`, `error`, `done` events;
  finalization (metadata, history controls, tool calls) runs after `done`
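
The `token_sink` contract in the checklist above (per-chunk calls from streaming backends, a single call with the full text from non-streaming ones) can be illustrated with a minimal sketch. It is not the project's `llm_client.py`. The helper names `complete_streaming` and `complete_blocking` are hypothetical, and treating `token_sink` as a plain synchronous callable that takes a string is an assumption.

```python
from typing import Callable, Iterable, List, Optional

# Assumption: a sink is a synchronous callable receiving one text chunk.
TokenSink = Callable[[str], None]


def complete_streaming(
    chunks: Iterable[str], token_sink: Optional[TokenSink] = None
) -> str:
    """Streaming-capable backend: forward each chunk as soon as it arrives."""
    parts: List[str] = []
    for chunk in chunks:
        if token_sink is not None:
            token_sink(chunk)
        parts.append(chunk)
    return "".join(parts)


def complete_blocking(
    generate: Callable[[], str], token_sink: Optional[TokenSink] = None
) -> str:
    """Non-streaming backend: emit the whole response as one chunk."""
    text = generate()
    if token_sink is not None and text:
        token_sink(text)
    return text
```

Both helpers still return the full text, so callers that pass no sink see the same behavior as before; the sink only adds a side channel for incremental display.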

### [Auth] Encrypted sessions
Allow users to opt in to per-session encryption so session logs on disk cannot be
read without the user's key.