docs: add LLM wiki concept (Karpathy pattern) to ARCH__FUTURE.md

Inara's exploration of a living-wiki knowledge compilation architecture as an alternative to RAG: three-layer model, ingest/query/lint ops, and a mapping to existing Cortex concepts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
docs: comprehensive doc audit — sync all docs to current state
2026-05-09 13:22:55 -04:00 · 2026-05-09 13:13:45 -04:00 · 2026-05-09 13:08:17 -04:00 · 2026-05-09 13:05:04 -04:00 · 2026-05-09 13:04:24 -04:00 · 2026-05-09 12:39:34 -04:00
15 changed files with 295 additions and 56 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -185,6 +185,19 @@ Cortex is a no-black-box system. Docs must match reality — at all times.
 - **CLAUDE.md + ARCH__*.md are the developer contract:** Update them as the architecture evolves.
 - **Stale docs are bugs.** If you notice drift, fix it before moving on.
 ### Doc update checklist (run after any significant change)
 | Doc | Update when |
 |---|---|
 | `CLAUDE.md` | New tool, channel, router, major design change, tool count |
 | `cortex/static/HELP.md` | Any user-visible feature — tools, settings, UI, API endpoints |
 | `documentation/TODO__Agents.md` | Mark completed items; add new planned work |
 | `documentation/MASTER.md` | New capability goes live; tool count changes |
 | `documentation/ROADMAP.md` | Phase items completed or added |
 | `documentation/ARCH__CHANNELS.md` | New channel, notification trigger, or scheduler job |
 | `documentation/ARCH__SYSTEM.md` | New module, router, or tools/ file |
 | `README.md` | Architecture diagram, channels table, or setup steps change |
 ---
 ## Adding a New Tool
@@ -237,7 +250,7 @@ clearly asked for a directory to be unblocked.
 ---
-## Current State (2026-05-06)
+## Current State (2026-05-08)
 Cortex is running and stable. All channels are live:
@@ -252,11 +265,12 @@ Cortex is running and stable. All channels are live:
 | Tool audit log | ✅ Live | Every tool call logged to `home/{user}/tool_audit/YYYY-MM-DD.jsonl` |
 | Token usage tracking | ✅ Live | Per-user `home/{user}/usage.json`; summary in Settings |
 | Web push | ✅ Live | VAPID push notifications; `web_push` tool; subscribe via ☰ menu |
 | Proactive notifications | ✅ Live | Daily reminder check (09:00); distill/cron completions; `GET /settings/notifications` dedicated page |
 Active users: scott (inara), holly (tina), brian (wintermute)
-**45 orchestrator tools:** web_search, http_fetch,
+**47 orchestrator tools:** web_search, http_fetch, web_read,
-file_read/list/write/session_search, shell_exec, claude_allow_dir,
+file_read/list/write/session_read/session_search, shell_exec, claude_allow_dir,
 cortex_restart/logs/status/update,
 task_list/create/update/complete, cron_list/add/remove/toggle,
 reminders_add/list/remove/clear, scratch_read/write/append/clear,
--- a/README.md
+++ b/README.md
@@ -182,10 +182,10 @@ Back it up separately — it is required to restore from any snapshot.
    └─ POST /channels/google-chat/{username} — Google Chat Add-on (per-user)
        ↓
  LLM Backends
-  • Claude CLI   — primary, all user-facing responses
+  • Claude CLI      — primary, all user-facing responses
-  • Gemini CLI   — fallback
+  • Gemini CLI      — fallback
-  • Gemini API   — orchestrator tool loop only (not general chat)
+  • Gemini API      — orchestrator tool loop (two-brain: Gemini plans, Claude responds)
-  • Local        — Open WebUI/Ollama on scott_gaming (private/offline)
+  • Local OpenAI    — Open WebUI/Ollama on scott_gaming; also runs local orchestrator loop
        ↓
  Persona context loaded from home/{user}/persona/{name}/
 ```
@@ -213,11 +213,12 @@ Context is loaded at request time from `home/{user}/persona/{name}/` via `cortex
 Webhook endpoints are per-user — each user configures their own secrets in `home/{username}/channels.json`.
-| Channel | Status | Endpoint |
+| Channel | Status | Endpoint / Notes |
 |---|---|---|
 | Web UI | Live | `https://cortex.dgrzone.com` — session auth (login form + JWT cookie) |
 | Nextcloud Talk | Live | `POST /webhook/nextcloud/{username}` — HMAC-signed, async reply |
 | Google Chat | Live | `POST /channels/google-chat/{username}` — Workspace Add-on, JWT auth |
 | Browser Push | Live | VAPID push notifications — subscribe via ☰ menu; proactive reminders + distill alerts |
 See `docs/NEXTCLOUD_TALK_BOT.md` and `docs/GOOGLE_CHAT_BOT.md` for setup instructions.
--- a/cortex/openai_orchestrator.py
+++ b/cortex/openai_orchestrator.py
@@ -405,7 +405,7 @@ def _build_client(
    base_url = api_url.rstrip("/")
    if host_type == "openwebui":
        base_url = base_url + "/api"
-    client = AsyncOpenAI(base_url=base_url, api_key=api_key)
+    client = AsyncOpenAI(base_url=base_url, api_key=api_key, timeout=settings.timeout_local)
    if model_cfg.get("tools") is False:
        active_tools = []
    else:
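The base-URL handling and the new `timeout` argument in this hunk can be sketched in isolation. This is a minimal sketch, not the real `_build_client`: the helper name `build_client_kwargs` and the 120-second default are hypothetical, and the actual value comes from `settings.timeout_local`, whose definition is not shown in this diff.

```python
def build_client_kwargs(api_url: str, api_key: str, host_type: str,
                        timeout_s: float = 120.0) -> dict:
    """Assemble keyword arguments for an OpenAI-compatible async client.

    Hypothetical helper: mirrors the URL normalisation shown in the diff
    and attaches an explicit request timeout.
    """
    base_url = api_url.rstrip("/")
    if host_type == "openwebui":
        # Open WebUI serves its OpenAI-compatible routes under /api.
        base_url += "/api"
    return {"base_url": base_url, "api_key": api_key, "timeout": timeout_s}


kwargs = build_client_kwargs("http://localhost:3000/", "sk-local", "openwebui")
# With the openai package installed, the client would then be created as:
#     client = AsyncOpenAI(**kwargs)
```

Passing the timeout at construction time applies it to every request made through that client, which is presumably why the change was made here rather than on individual calls.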
--- a/cortex/requirements.txt
+++ b/cortex/requirements.txt
@@ -19,6 +19,9 @@ python-multipart>=0.0.9   # required by FastAPI for Form() data
 # Async HTTP client — used for local OpenAI-compatible backend (Open WebUI / Ollama)
 httpx>=0.27.0
 # Web content extraction — strips ads/nav/boilerplate, returns clean article text
 trafilatura>=1.6.0
 # OpenAI-compatible client — tool calling for OpenRouter / LiteLLM / any OAI-compat host
 openai>=1.0.0
--- a/cortex/static/HELP.md
+++ b/cortex/static/HELP.md
@@ -6,7 +6,7 @@
     and are appended automatically by help.html when present.
 -->
-*Last updated: 2026-05-08*
+*Last updated: 2026-05-09*
 ---
@@ -82,12 +82,12 @@ Orchestrated sessions persist to history exactly like regular chat.
 ### Available Tools
-45 tools across 12 categories. Each tool schema is sent to the model on every orchestrated call — fewer active tools means fewer tokens per call.
+47 tools across 12 categories. Each tool schema is sent to the model on every orchestrated call — fewer active tools means fewer tokens per call.
 | Category | Tools |
 |---|---|
-| **Web** | `web_search`, `http_fetch` |
+| **Web** | `web_search`, `http_fetch`, `web_read` |
-| **Files** | `file_read`, `file_list`, `file_write`, `session_search` |
+| **Files** | `file_read`, `file_list`, `file_write`, `session_read`, `session_search` |
 | **Shell** | `shell_exec`, `claude_allow_dir` |
 | **System** | `cortex_restart`, `cortex_logs`, `cortex_status`, `cortex_update` |
 | **Tasks** | `task_list`, `task_create`, `task_update`, `task_complete` |
@@ -176,7 +176,7 @@ Each response shows a **model tag** (bottom-right of message) with the model lab
 | **Account** | View your username, role badge (Admin / User), rename your username |
 | **Connected Accounts** | See which Google account is linked for OAuth sign-in |
 | **Email Allowlist** | Regex patterns controlling which addresses the `email_send` tool can reach |
-| **Notifications** | Set which channel (NC Talk, Google Chat, email) Inara uses for proactive messages |
+| **Notifications** | Dedicated page — set channel (Browser Push, NC Talk, Google Chat, email) for proactive messages; test buttons for instant verification |
 | **Tool Permissions** | Allow or block specific orchestrator tools for your account |
 | **Usage** | Token consumption by model — see below |
 | **Browser Cache** | Clear UI preferences stored locally (theme, font size, session ID, etc.) |
@@ -337,6 +337,8 @@ Cortex can send browser push notifications — even when the tab is closed.
 - Click again to disable. Subscriptions are stored per-device.
 - The orchestrator's `web_push` tool lets Inara send you a push proactively (e.g. when a long task completes).
 **Notification channel settings:** ☰ → **Account** → **Notification settings →** — choose Browser Push, Email, Nextcloud Talk, or Google Chat as the channel Inara uses for scheduled reminders, cron job completions, and memory digests. Use the **Send Test Notification** button to verify your setup, or **Check Reminders Now** to trigger the reminder check immediately.
 ---
 ## Context & Memory ( ⚙ panel )
@@ -424,6 +426,8 @@ For direct access or scripting:
 | `GET` | `/api/push/vapid-key` | VAPID public key (for push subscription) |
 | `POST` | `/api/push/subscribe` | Register a push subscription |
 | `DELETE` | `/api/push/subscribe` | Remove a push subscription |
 | `POST` | `/api/push/test` | Send a test notification via configured channel |
 | `POST` | `/api/push/reminders/check` | Run reminder check immediately; returns `{"reminders_found": n}` |
 | `GET` | `/api/audit/files` | List available audit log dates (own data) |
 | `GET` | `/api/audit/day?date=` | Tool call entries for a specific date (own data) |
 | `GET` | `/api/audit/recent` | Recent tool calls across days (admin) |
--- a/cortex/static/app.js
+++ b/cortex/static/app.js
@@ -1215,24 +1215,9 @@
            inputEl.focus();
        }
-        async function sendOrchestrate() {
+        // Extracted so the retry button can call it without re-adding the
-            const text = inputEl.value.trim();
+        // user message to the DOM or currentHistory.
-            if (!text || activeController) return;
+        async function _doOrchestrate(text, thinkingDiv, userMsgDiv) {
-            inputEl.value = '';
-            syncHeight();
-            sendBtn.style.display = 'none';
-            stopBtn.style.display = 'flex';
-            headerEmoji.classList.add('processing');
-            activeController = new AbortController();
-            currentHistory.push({ role: 'user', content: text });
-            const userMsgDiv = addMessage('user', text);
-            scrollToBottom();
-            const thinkingDiv = addMessage('assistant thinking', '⚡ working…');
            try {
                const res = await fetch('/orchestrate', {
                    method: 'POST',
@@ -1336,9 +1321,59 @@
                    thinkingDiv.textContent = 'Stopped.';
                } else {
                    thinkingDiv.className = 'message error';
-                    thinkingDiv.textContent = `Error: ${err.message}`;
+                    thinkingDiv.innerHTML = '';
                    const errSpan = document.createElement('span');
                    errSpan.textContent = `Error: ${err.message}`;
                    thinkingDiv.appendChild(errSpan);
                    const retryBtn = document.createElement('button');
                    retryBtn.className = 'retry-btn';
                    retryBtn.textContent = '↺ Retry';
                    retryBtn.addEventListener('click', async () => {
                        if (currentHistory.at(-1)?.role === 'user') currentHistory.pop();
                        currentHistory.push({ role: 'user', content: text });
                        thinkingDiv.className = 'message assistant thinking';
                        thinkingDiv.textContent = '⚡ working…';
                        activeController = new AbortController();
                        sendBtn.style.display = 'none';
                        stopBtn.style.display = 'flex';
                        headerEmoji.classList.add('processing');
                        await _doOrchestrate(text, thinkingDiv, userMsgDiv);
                        activeController = null;
                        headerEmoji.classList.remove('processing');
                        sendBtn.style.display = 'block';
                        stopBtn.style.display = 'none';
                        inputEl.focus();
                    });
                    thinkingDiv.appendChild(retryBtn);
                }
            }
        }
        async function sendOrchestrate() {
            const text = inputEl.value.trim();
            if (!text || activeController) return;
            inputEl.value = '';
            syncHeight();
            sendBtn.style.display = 'none';
            stopBtn.style.display = 'flex';
            headerEmoji.classList.add('processing');
            activeController = new AbortController();
            currentHistory.push({ role: 'user', content: text });
            const userMsgDiv = addMessage('user', text);
            scrollToBottom();
            const thinkingDiv = addMessage('assistant thinking', '⚡ working…');
            await _doOrchestrate(text, thinkingDiv, userMsgDiv);
            activeController = null;
            headerEmoji.classList.remove('processing');
--- a/cortex/tools/__init__.py
+++ b/cortex/tools/__init__.py
@@ -17,7 +17,7 @@ from google.genai import types
 # ── Callable imports ──────────────────────────────────────────────────────────
-from tools.web import search as _web_search, http_fetch as _http_fetch
+from tools.web import search as _web_search, http_fetch as _http_fetch, web_read as _web_read
 from tools.ae_knowledge import (
    journal_list         as _ae_journal_list,
    journal_search       as _ae_journal_search,
@@ -30,7 +30,7 @@ from tools.ae_knowledge import (
    journal_entry_prepend as _ae_journal_entry_prepend,
 )
 from tools.ae_tasks import task_list as _ae_task_list
-from tools.files import file_read as _file_read, file_list as _file_list, file_write as _file_write, session_search as _session_search
+from tools.files import file_read as _file_read, file_list as _file_list, file_write as _file_write, session_search as _session_search, session_read as _session_read
 from tools.system import (
    shell_exec      as _shell_exec,
    claude_allow_dir as _claude_allow_dir,
@@ -90,8 +90,8 @@ import tools.agents       as _mod_agents
 # ── Tool categories — used by the Model Registry UI for grouped checkboxes ───
 TOOL_CATEGORIES: dict[str, list[str]] = {
-    "Web":              ["web_search", "http_fetch"],
+    "Web":              ["web_search", "http_fetch", "web_read"],
-    "Files":            ["file_read", "file_list", "file_write", "session_search"],
+    "Files":            ["file_read", "file_list", "file_write", "session_read", "session_search"],
    "Shell":            ["shell_exec", "claude_allow_dir"],
    "System":           ["cortex_restart", "cortex_logs", "cortex_status", "cortex_update"],
    "Tasks":            ["task_list", "task_create", "task_update", "task_complete"],
@@ -116,6 +116,7 @@ TOOL_CATEGORIES: dict[str, list[str]] = {
 _CALLABLES: dict[str, callable] = {
    "web_search":                _web_search,
    "http_fetch":                _http_fetch,
    "web_read":                  _web_read,
    "ae_journal_list":           _ae_journal_list,
    "ae_journal_search":         _ae_journal_search,
    "ae_journal_entry_read":     _ae_journal_entry_read,
@@ -129,6 +130,7 @@ _CALLABLES: dict[str, callable] = {
    "file_read":                 _file_read,
    "file_list":                 _file_list,
    "file_write":                _file_write,
    "session_read":              _session_read,
    "session_search":            _session_search,
    "shell_exec":                _shell_exec,
    "claude_allow_dir":          _claude_allow_dir,
--- a/cortex/tools/files.py
+++ b/cortex/tools/files.py
@@ -230,6 +230,34 @@ def _sync_file_write(path: str, content: str, mode: str) -> str:
 _SEARCH_EXCERPT_CHARS = 150
 async def session_read(date: str) -> str:
    """Read a full session log by date (YYYY-MM-DD).
    Returns the complete session log for that date. If the date is not found,
    lists the most recent available dates instead.
    Only reads the current user's own sessions (per-persona isolation via ContextVars).
    """
    return await asyncio.to_thread(_sync_session_read, date.strip())
 def _sync_session_read(date: str) -> str:
    from persona import persona_path
    sessions_dir = persona_path() / "sessions"
    if not sessions_dir.exists():
        return "No session logs found."
    target = sessions_dir / f"{date}.md"
    if target.exists():
        content = target.read_text()
        return f"Session log for {date} ({len(content)} chars):\n\n{content}"
    available = sorted([f.stem for f in sessions_dir.glob("*.md")], reverse=True)
    if not available:
        return "No session logs found."
    recent = "\n".join(f"  {d}" for d in available[:15])
    return f"No session log found for '{date}'. Available dates (most recent first):\n{recent}"
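`_sync_session_read` interpolates the `date` argument straight into a filename, and that argument is supplied by the model. One possible guard, sketched below, rejects anything that is not a real calendar date before the filesystem is touched. The helper `is_valid_session_date` is hypothetical and not part of this commit.

```python
import re
from datetime import date as _date

# Strict YYYY-MM-DD shape; rules out path separators and ".." entirely.
_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_valid_session_date(value: str) -> bool:
    """Return True only for a well-formed, real calendar date (hypothetical helper)."""
    if not _DATE_RE.fullmatch(value):
        return False
    try:
        _date.fromisoformat(value)  # rejects e.g. month 13 or February 30
    except ValueError:
        return False
    return True
```

A caller could fall through to the existing "available dates" listing whenever this returns `False`, so the user-facing behaviour for a bad date stays the same.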
 async def session_search(query: str, limit: int = 5) -> str:
    """Search past session logs for a keyword or phrase.
@@ -329,6 +357,22 @@ DECLARATIONS = [
            required=["path", "content"],
        ),
    ),
    types.FunctionDeclaration(
        name="session_read",
        description=(
            "Read a full session log by date (YYYY-MM-DD). Returns the complete conversation "
            "from that session — useful for continuity, recalling decisions, or reviewing "
            "what was discussed on a specific day. If the date is not found, lists available dates. "
            "Only reads this user's own sessions."
        ),
        parameters=types.Schema(
            type=types.Type.OBJECT,
            properties={
                "date": types.Schema(type=types.Type.STRING, description="Date in YYYY-MM-DD format (e.g. '2026-05-08')"),
            },
            required=["date"],
        ),
    ),
    types.FunctionDeclaration(
        name="session_search",
        description=(
--- a/cortex/tools/web.py
+++ b/cortex/tools/web.py
@@ -1,5 +1,5 @@
 """
-Web tools — search (DuckDuckGo) and direct HTTP fetch.
+Web tools — search (DuckDuckGo), direct HTTP fetch, and clean content extraction.
 """
 import asyncio
@@ -56,20 +56,25 @@ async def http_fetch(
    method: str = "GET",
    body: str | None = None,
    timeout: int = 15,
    max_chars: int = 8192,
 ) -> str:
-    """Fetch a URL directly and return the response body.
+    """Fetch a URL directly and return the raw response body.
    Unlike web_search, this hits a specific URL — useful for health checks,
-    API probing, JSON endpoints, webhook testing, etc.
+    API probing, JSON endpoints, webhook testing, or reading raw page source.
-    Response body is capped at 8 KB.
+    For readable article content, use web_read instead.
    Response body is capped at max_chars (default 8192, max 131072).
    """
    method = method.upper()
    timeout = min(max(int(timeout), 1), 60)
    max_chars = min(max(int(max_chars), 100), 131072)
    try:
        async with httpx.AsyncClient(timeout=timeout, follow_redirects=True) as client:
            resp = await client.request(method, url, content=body)
-            body_text = resp.text[:8192]
+            body_text = resp.text[:max_chars]
-            return f"HTTP {resp.status_code} {resp.url}\n\n{body_text}"
+            truncated = len(resp.text) > max_chars
            suffix = f"\n\n[… truncated at {max_chars} chars]" if truncated else ""
            return f"HTTP {resp.status_code} {resp.url}\n\n{body_text}{suffix}"
    except httpx.HTTPError as e:
        return f"HTTP error: {e}"
    except Exception as e:
@@ -77,6 +82,39 @@ async def http_fetch(
        return f"Error: {e}"
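The clamp-then-truncate behaviour that `http_fetch` now applies to the response body can be shown without any network access. A minimal sketch, assuming the same bounds as the code above (100 to 131072); the helper names `clamp` and `truncate_body` are hypothetical.

```python
def clamp(value: int, low: int, high: int) -> int:
    """Force value into the inclusive range [low, high]."""
    return min(max(int(value), low), high)


def truncate_body(text: str, max_chars: int = 8192) -> str:
    """Cap text at max_chars and append a marker when something was cut."""
    max_chars = clamp(max_chars, 100, 131072)
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + f"\n\n[… truncated at {max_chars} chars]"


short = truncate_body("a" * 50)        # returned unchanged
cut = truncate_body("a" * 200, 10)     # 10 is below the floor, so 100 is used
```

Appending the marker only when text was actually dropped lets the caller tell a complete short response apart from a clipped long one.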
 async def web_read(url: str, max_chars: int = 16000) -> str:
    """Fetch a URL and extract clean readable text, stripping ads, navigation, and boilerplate.
    Uses trafilatura to extract the main article content — ideal for blog posts,
    documentation, news articles, and any page where you want the text without
    surrounding noise. Returns markdown-formatted output.
    For raw responses (JSON APIs, health checks), use http_fetch instead.
    """
    max_chars = min(max(int(max_chars), 1000), 131072)
    return await asyncio.to_thread(_sync_web_read, url, max_chars)
 def _sync_web_read(url: str, max_chars: int) -> str:
    try:
        import trafilatura
    except ImportError:
        return "web_read requires trafilatura — run: pip install trafilatura"
    downloaded = trafilatura.fetch_url(url)
    if downloaded is None:
        return f"Failed to download content from: {url}"
    text = trafilatura.extract(downloaded, output_format="markdown", include_links=True, url=url)
    if not text:
        text = trafilatura.extract(downloaded, url=url)
    if not text:
        return f"Could not extract readable content from: {url}"
    if len(text) > max_chars:
        text = text[:max_chars] + f"\n\n[… truncated at {max_chars} chars — pass a larger max_chars (up to 131072) to see more]"
    return f"Content from {url}:\n\n{text}"
 DECLARATIONS = [
    types.FunctionDeclaration(
        name="web_search",
@@ -96,10 +134,10 @@ DECLARATIONS = [
    types.FunctionDeclaration(
        name="http_fetch",
        description=(
-            "Fetch a specific URL and return the response. Unlike web_search, this hits "
+            "Fetch a specific URL and return the raw response body. Unlike web_search, this hits "
            "a direct URL — useful for health checks, JSON API endpoints, webhook testing, "
-            "or reading a specific page when you already know the URL. "
+            "or inspecting raw page source. For readable article/doc content, use web_read instead. "
-            "Response body is capped at 8 KB."
+            "Response body is capped at max_chars (default 8192, max 131072)."
        ),
        parameters=types.Schema(
            type=types.Type.OBJECT,
@@ -108,6 +146,25 @@ DECLARATIONS = [
                "method": types.Schema(type=types.Type.STRING, description="HTTP method: GET (default), POST, HEAD"),
                "body": types.Schema(type=types.Type.STRING, description="Optional request body (for POST requests)"),
                "timeout": types.Schema(type=types.Type.INTEGER, description="Request timeout in seconds (default 15, max 60)"),
                "max_chars": types.Schema(type=types.Type.INTEGER, description="Max characters to return (default 8192, max 131072)"),
            },
            required=["url"],
        ),
    ),
    types.FunctionDeclaration(
        name="web_read",
        description=(
            "Fetch a URL and extract clean readable text, stripping ads, navigation, sidebars, "
            "and other boilerplate. Returns the main article/document content as markdown. "
            "Use this for blog posts, documentation, news articles, GitHub READMEs, or any page "
            "where you want the content without surrounding noise. "
            "For raw HTTP responses (JSON APIs, health checks, source inspection), use http_fetch."
        ),
        parameters=types.Schema(
            type=types.Type.OBJECT,
            properties={
                "url": types.Schema(type=types.Type.STRING, description="Full URL to fetch and extract"),
                "max_chars": types.Schema(type=types.Type.INTEGER, description="Max characters to return (default 16000, max 131072)"),
            },
            required=["url"],
        ),
--- a/documentation/ARCH__CHANNELS.md
+++ b/documentation/ARCH__CHANNELS.md
@@ -129,16 +129,24 @@ User-defined scheduled jobs stored in `home/{user}/persona/{name}/CRONS.json`. R
 ## Notification Channel Config
-`notification_channel` in `channels.json` sets the default outbound channel for all proactive messages (distill alerts, cron message/brief jobs):
+`notification_channel` in `channels.json` sets the default outbound channel for all proactive messages (distill alerts, cron jobs, reminder checks):
 ```json
 {
-  "notification_channel": "nextcloud",
+  "notification_channel": "web_push",
-  ...
+  "notification_email": "user@example.com",
  "nextcloud": { "notification_room": "<token>" },
  "google_chat": { "outbound_webhook": "https://..." }
 }
 ```
-If absent, defaults to `nextcloud` if configured. Currently only NC Talk is supported for outbound; Google Chat outbound is a future item.
+Supported channels: `web_push` (browser push via VAPID), `email`, `nextcloud` (NC Talk), `google_chat`. Configured via **Settings → Notifications** (`/settings/notifications`).
 **Proactive notification triggers:**
 - **Daily 09:00** — `_run_reminder_check()` in `scheduler.py`: reads due/overdue reminders per persona, fires `notify()` with a formatted summary
 - **Memory distillation** — `_run_mid()` / `_run_long()` call `notify()` on completion
 - **Cron jobs** — `message` / `brief` job types call `notify()` directly
 - **On-demand** — `POST /api/push/test` (test notification) and `POST /api/push/reminders/check` (immediate reminder check)
 ---
--- a/documentation/ARCH__FUTURE.md
+++ b/documentation/ARCH__FUTURE.md
@@ -256,3 +256,61 @@ Rather than a single Cortex instance, each device in the fleet runs its own inst
 - Session continuity — does a conversation that starts on one node stay there, or can it migrate?
 The Syncthing-synced `home/` directory and shared `model_registry.json` already provide a natural foundation — instances share persona memory and context without a central DB.
 ---
 ## 11. LLM Wiki — Persistent Knowledge Compilation (Karpathy Pattern)
 **Status:** Concept — no design yet. Inspired by [Karpathy's llm-wiki](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f) gist.
 **Core idea:** Instead of treating AE Journals as an archive you retrieve from, evolve them into a **living wiki** that the LLM incrementally builds and maintains. When a new source is added, the LLM doesn't just index it — it reads it, extracts key information, and integrates it into the existing wiki: updating entity pages, revising topic summaries, flagging contradictions, strengthening or challenging the evolving synthesis. Knowledge is compiled once and kept current, not re-derived on every query.
 This is a philosophical shift from our current approach (RAG/retrieval) toward **compounding knowledge** — the wiki gets richer with every source added and every question asked.
 ### Three-Layer Architecture
 ```
 Raw Sources (immutable)          ↓
    → LLM reads, extracts, cross-references
 Wiki (LLM-maintained markdown)  ← the persistent artifact
    → Human reads, LLM writes
 Schema (CLAUDE.md / AGENTS.md)  ← configuration + conventions
 ```
 1. **Raw sources** — curated, immutable originals (articles, papers, session logs, transcripts). LLM reads from them, never modifies them.
 2. **The wiki** — directory of LLM-generated markdown files: summaries, entity pages, concept pages, comparisons, synthesis. The LLM owns this layer entirely. Creates pages, updates them when new sources arrive, maintains cross-references.
 3. **Schema** — a configuration document (analogous to our `PROTOCOLS.md`) that tells the LLM how the wiki is structured, what conventions to follow, and what workflows to use when ingesting sources or answering questions. Co-evolved with the human over time.
 ### Operations
 **Ingest.** Drop a new source into the raw collection and tell the LLM to process it. Flow: LLM reads source → discusses key takeaways with human → writes summary page → updates index → updates relevant entity/concept pages (a single source might touch 10-15 pages) → appends to log. Human stays involved, guiding emphasis.
 **Query.** Ask questions against the wiki. LLM reads the index to find relevant pages, drills in, synthesizes an answer with citations. **Key insight: good answers get filed back into the wiki as new pages.** A comparison table, an analysis, a connection discovered — these are valuable and shouldn't disappear into chat history.
 **Lint.** Periodic health check: contradictions between pages, stale claims superseded by newer sources, orphan pages with no inbound links, missing cross-references, data gaps that could be filled with a web search.
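One of the lint checks named above, orphan pages with no inbound links, can be sketched over an in-memory map of page name to markdown body. The link syntax (`[text](page.md)`), the function name `find_orphans`, and the choice to ignore links coming from `index.md` (which lists every page by design) are all assumptions, since no wiki schema has been defined yet.

```python
import re

# Matches [text](target.md) and [text](target.md#anchor); captures target.md.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+\.md)(?:#[^)]*)?\)")


def find_orphans(pages: dict[str, str],
                 roots: tuple[str, ...] = ("index.md",)) -> list[str]:
    """Return non-root pages that no other non-root page links to."""
    linked: set[str] = set()
    for name, body in pages.items():
        if name in roots:
            continue  # catalog links do not count as real cross-references
        for target in LINK_RE.findall(body):
            if target != name:  # ignore self-links
                linked.add(target)
    return sorted(p for p in pages if p not in linked and p not in roots)


pages = {
    "index.md": "[A](a.md) [C](c.md)",
    "a.md": "see [B](b.md#intro)",
    "b.md": "back to [A](a.md)",
    "c.md": "standalone note",
}
orphans = find_orphans(pages)
```

In this example only `c.md` is reported, because its sole inbound link comes from the index.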
 ### Index and Log (Two Navigation Files)
 **`index.md`** — content-oriented catalog. Every wiki page listed with link, one-line summary, and optional metadata (date, source count). Organized by category. LLM updates on every ingest. At moderate scale (~100 sources, ~hundreds of pages), this replaces the need for embedding-based RAG.
 **`log.md`** — chronological, append-only record of what happened and when (ingests, queries, lint passes). Each entry starts with a consistent prefix (e.g. `## [2026-04-02] ingest | Article Title`) making it parseable with simple tools like `grep "^## \[" log.md | tail -5`.
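The consistent entry prefix also makes the log easy to parse programmatically. A minimal sketch of the equivalent of the `grep ... | tail -5` one-liner, assuming exactly the `## [YYYY-MM-DD] op | Title` shape given in the example; `LogEntry`, `parse_log`, and `last_entries` are hypothetical names.

```python
import re
from typing import NamedTuple


class LogEntry(NamedTuple):
    date: str
    op: str
    title: str


# One heading per entry: "## [2026-04-02] ingest | Article Title"
ENTRY_RE = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$")


def parse_log(text: str) -> list[LogEntry]:
    """Extract every entry heading from log.md, in file order."""
    entries = []
    for line in text.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            entries.append(LogEntry(*match.groups()))
    return entries


def last_entries(text: str, n: int = 5) -> list[LogEntry]:
    """Return the n most recent entries (the log is append-only)."""
    return parse_log(text)[-n:]


sample = "## [2026-04-02] ingest | Article Title\nnotes\n## [2026-04-03] lint | Weekly pass\n"
recent = last_entries(sample, 1)
```

Because the log is append-only, file order is chronological order and no sorting is needed.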
 ### Applicability to Cortex / Inara
 This pattern maps naturally to several existing concepts:
 | Karpathy Concept | Cortex Equivalent | Gap |
 |---|---|---|
 | Raw sources | Session logs, imported docs | No curated raw-source collection yet |
 | Wiki pages | AE Journals | Journals are entry-based, not interlinked-wiki-based |
 | Index + Log | No equivalent | Would need `wiki_index.md` and `wiki_log.md` |
 | Schema/Protocols | PROTOCOLS.md, OPERATIONS.md | Not configured for wiki maintenance workflows |
 | Lint operation | No equivalent | No periodic wiki health-check exists |
 | Answers filed back | Session chat history | Answers are lost after session (unless distilled) |
 | Obsidian as IDE | Cortex UI / Files panel | Files panel could serve as the browsing surface |
 **Next steps (if pursued):**
 1. Design the wiki directory structure within `agents_sync/` — separate from session logs and memory files
 2. Define the schema document — what goes in a wiki page, cross-reference format, category taxonomy
 3. Build an ingest tool/script that reads a source and updates wiki pages (LLM-driven)
 4. Build a lint cron job that health-checks the wiki periodically
 5. Consider Obsidian compatibility for human browsing of the wiki graph
--- a/documentation/ARCH__SYSTEM.md
+++ b/documentation/ARCH__SYSTEM.md
@@ -72,7 +72,7 @@ Details: [`ARCH__BACKENDS.md`](ARCH__BACKENDS.md) | [`ARCH__PERSONA.md`](ARCH__P
 | `email_utils.py` | SMTP invite emails |
 | `persona_template.py` | Bootstrap a new persona directory from templates |
 | `routers/` | One file per endpoint group — `chat`, `orchestrator`, `auth`, `files`, `ui`, `settings`, `local_llm`, `distill`, `audit`, `usage`, `push`, `help`, `onboarding`, `auth_google`, `nextcloud_talk`, `google_chat` |
-| `tools/` | Orchestrator tool implementations — `web`, `tasks`, `scratch`, `reminders`, `cron`, `system`, `notify`, `ae_journals`, `ae_tasks`, `agent_notes` |
+| `tools/` | Orchestrator tool implementations — `web` (search/fetch/web_read), `files` (file_read/write/session_read/search), `tasks`, `scratch`, `reminders`, `cron`, `system`, `notify`, `ae_journals`, `ae_tasks`, `agent_notes`, `agents` (spawn_agent) |
 | `static/` | Web UI — `index.html`, `app.js`, `style.css`, `login.html`, `setup.html`, `HELP.md`, `local_llm.html`, `settings.html` |
 | `tests/` | pytest suite |
--- a/documentation/MASTER.md
+++ b/documentation/MASTER.md
@@ -1,7 +1,7 @@
 # Cortex / Inara — Master Index
 > Start here. This document is a map, not a manual.
-> Last updated: 2026-05-06
+> Last updated: 2026-05-09
 >
 > **Documentation philosophy:** Cortex is a no-black-box system. Docs must match reality.
 > Update docs before implementing significant changes. Verify they still match after.
@@ -26,7 +26,7 @@ Cortex is a self-hosted personal AI platform. It routes messages from any input
 | Claude backend | ✅ Live | Primary — via Claude Code CLI |
 | Gemini backend | ✅ Live | Fallback — via Gemini CLI |
 | Local backend | ✅ Live | Open WebUI/Ollama on scott_gaming; per-user multi-model config |
-| Gemini orchestrator | ✅ Live | Tool loop → Claude response, ⚡ toggle in UI (40 tools) |
+| Gemini orchestrator | ✅ Live | Tool loop → Claude response, ⚡ toggle in UI (47 tools) |
 | Local orchestrator | ✅ Live | OpenAI-compatible ReAct loop; used when orchestrator role → local model |
 | Model registry V2 | ✅ Live | Providers (Anthropic/Google/Local), multi-account Gemini, role assignments |
 | Memory distillation | ✅ Live | Short (daily) / Mid (weekly) / Long (monthly) |
@@ -36,6 +36,8 @@ Cortex is a self-hosted personal AI platform. It routes messages from any input
 | Tool audit log | ✅ Live | Every orchestrator tool call logged to `home/{user}/tool_audit/` |
 | Token usage tracking | ✅ Live | Per-user daily buckets in `home/{user}/usage.json`; visible in Settings |
 | Web push notifications | ✅ Live | VAPID push; `web_push` orchestrator tool; subscribe via ☰ menu |
 | Proactive notifications | ✅ Live | Daily reminder check (09:00); distill/cron completion alerts; dedicated `/settings/notifications` page |
 | Sub-agent spawning | ✅ Live | `spawn_agent` tool — synchronous sub-agents via any configured model |
 | Agent private notes | ✅ Live | `AGENT_NOTES.md` — orchestrator-only notepad; 3 rolling backups; user-visible as read-only |
 | Distill safety | ✅ Live | Per-persona asyncio lock, per-endpoint cooldowns, Rebuild option |
 | Guided onboarding | ✅ Live | Setup Step 3 for OpenRouter; existing-user banner; settings quick-link |
--- a/documentation/ROADMAP.md
+++ b/documentation/ROADMAP.md
@@ -1,7 +1,7 @@
 # Cortex — Roadmap
 > Phases and priorities. For active tasks see `TODO__Agents.md`.
-> Last updated: 2026-04-29
+> Last updated: 2026-05-09
 ---
@@ -39,7 +39,12 @@
 - ✅ Session search (full-text across past session logs)
 - ✅ Distill notifications (NC Talk after mid/long runs)
 - ✅ Local backend for distillation (DISTILL_BACKEND_MID/LONG in .env)
-- [ ] **Local orchestrator** — ReAct tool loop using local model (High priority — see `TODO__Agents.md`)
+- ✅ Local orchestrator — OpenAI-compatible ReAct loop; fires when orchestrator role → local model
 - ✅ Web push notifications — VAPID; `web_push` tool; PWA-installable; subscribe via ☰ menu
 - ✅ Proactive notifications — daily reminder check (09:00); `notify()` routes to any configured channel; dedicated settings page
 - ✅ Sub-agent spawning — `spawn_agent` tool; per-host concurrency limit; Gemini API + local OpenAI backends
 - ✅ Web content extraction — `web_read` via trafilatura; strips ads/nav/boilerplate; 128K cap
 - ✅ Session log reader — `session_read(date)` tool; complements `session_search`
 - [ ] Knowledge import — markdown → AE Journals (import script)
 - [ ] Dev agent pipeline — specialist agents + supervisor + approval gate
 - [ ] Gitea webhook integration + Actions CI
--- a/documentation/TODO__Agents.md
+++ b/documentation/TODO__Agents.md
@@ -96,8 +96,11 @@ system prompt by `context_loader.py` at all tiers.
  - Params: `conversation_token: str`, `limit: int = 20`
  - Returns last N messages with sender + timestamp
  - Admin-only (requires NC Talk API credentials from channels.json)
 - [ ] **`http_post`** — POST to external URLs with allowlist
 - [ ] **`task_list` priority filter** — add `priority` param alongside existing `status`
-- [ ] **`http_fetch` max_chars** — optional param, default 8192, cap at 32768
+- [x] **`http_fetch` max_chars** — optional param, default 8192, cap at 32768 — 2026-05-09
 - [x] **`web_read(url, max_chars=16000)`** — clean article extraction via trafilatura; strips ads/nav/boilerplate, returns markdown — 2026-05-09
 - [x] **`session_read(date)`** — read a full session log by YYYY-MM-DD date; lists available dates if not found — 2026-05-09
 ### [Channel] Proactive notifications ✅ — 2026-05-08
 Inara reaches out on her own initiative via NC Talk, Google Chat, email, or browser push.
@@ -108,6 +111,9 @@ Inara reaches out on her own initiative via NC Talk, Google Chat, email, or brow
 - [x] `scheduler.py` — distill_mid and distill_long already call `notify()` on completion
 - [x] Settings UI — "Browser Push Notification" option added to Notification Channel selector
 - [x] `notification_channel` accepts `"web_push"` in `routers/settings.py`
 - [x] `GET /settings/notifications` — dedicated Notifications page (channel form + test buttons); Settings page now shows a link card
 - [x] `POST /api/push/test` + `POST /api/push/reminders/check` — on-demand test endpoints
 - [x] `push_utils.py` — fixed `pywebpush` 2.x key deserialisation (use `Vapid.from_pem()` instead of passing PEM string)
 ### [UI] File attachments in chat
 Upload an image or document inline and have it flow into context. Natural workflow
Author	SHA1	Message	Date
Scott Idem	b9a78819ac	docs: add LLM wiki concept (Karpathy pattern) to ARCH__FUTURE.md Inara's exploration of a living-wiki knowledge compilation architecture as an alternative to RAG — three-layer model, ingest/query/lint ops, and a mapping to existing Cortex concepts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 13:22:55 -04:00
Scott Idem	3672fa1506	docs: comprehensive doc audit — sync all docs to current state - MASTER.md: tool count 40→47, add proactive notifications + spawn_agent rows, date bump - ROADMAP.md: mark local orchestrator/web push/proactive notifs/spawn_agent/web_read/session_read as done, date bump - ARCH__CHANNELS.md: rewrite notification channel config section — all 4 channels, all triggers, on-demand endpoints - ARCH__SYSTEM.md: update tools/ module list to include files, agents - README.md: update LLM backends in architecture diagram, add browser push to channels table - CLAUDE.md: add doc update checklist to Documentation Philosophy section Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 13:13:45 -04:00
Scott Idem	52c19afbcc	fix: raise web_read and http_fetch max_chars cap to 128K Both tools now accept max_chars up to 131072 to accommodate long documentation pages and large API responses. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 13:08:17 -04:00
Scott Idem	17e8869d12	docs: update tool count (45→47), HELP.md, and TODO for new web/file tools Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 13:05:04 -04:00
Scott Idem	7c3291960a	feat: web_read (trafilatura), session_read, http_fetch max_chars web_read(url, max_chars=16000) — fetches a URL and extracts clean article text via trafilatura, stripping ads/nav/boilerplate. Returns markdown. session_read(date) — reads a full session log by YYYY-MM-DD date; lists available dates if the requested one is not found. http_fetch gains a max_chars param (default 8192, max 32768) so the cap is configurable instead of hardcoded. Tool count: 45 → 47. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 13:04:24 -04:00
Scott Idem	a99ebb8c30	feat: retry button for orchestrator errors + explicit client timeout Extract orchestrator inner loop into _doOrchestrate() so the retry button can re-run without re-adding the user message to DOM or history — same pattern as the existing chat retry. Also set AsyncOpenAI(timeout=settings.timeout_local) so slow remote models (OpenRouter/DeepSeek) get the same 300s budget as local chat calls instead of the SDK default which varies by connection. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 12:39:34 -04:00
Scott Idem	ff154b1ec0	docs: update CLAUDE.md, HELP.md, and TODO for notifications page + push fix - CLAUDE.md: date → 2026-05-08, add Proactive notifications row to channel table - HELP.md: update Notifications settings entry, expand Push Notifications section with channel config link, add test API endpoints to reference table - TODO__Agents.md: mark notifications dedicated page and pywebpush fix as done Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-08 23:58:47 -04:00
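
The `max_chars` behaviour described in commits 7c3291960a and 52c19afbcc (a configurable limit with a default of 8192, later capped at 131072) could be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the repository: the names `clamp_max_chars` and `truncate` are hypothetical, and the lower bound of 1 is an assumption that the commit messages do not specify.

```python
from typing import Optional

DEFAULT_MAX_CHARS = 8192   # default for http_fetch stated in commit 7c3291960a
MAX_CHARS_CAP = 131072     # 128K cap stated in commit 52c19afbcc


def clamp_max_chars(requested: Optional[int] = None) -> int:
    """Return the effective character limit (hypothetical helper).

    Falls back to the default when no value is given and clamps any
    requested value into the range [1, MAX_CHARS_CAP]. The lower bound
    of 1 is an assumption made for this sketch.
    """
    if requested is None:
        return DEFAULT_MAX_CHARS
    return max(1, min(int(requested), MAX_CHARS_CAP))


def truncate(text: str, max_chars: Optional[int] = None) -> str:
    """Cut a fetched body down to the effective limit (hypothetical helper)."""
    return text[: clamp_max_chars(max_chars)]
```

Clamping rather than rejecting an oversized value means a caller that asks for more than the cap still gets a truncated result. Whether the real tools clamp or return an error is not stated in the commit messages.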