feat: Intelligence Layer Phase 1 — orchestrator service

Adds the Gemini API orchestrator (ReAct tool loop → Claude responder):

Orchestrator engine + router:
- orchestrator_engine.py: Gemini API tool loop, Claude CLI handoff
- routers/orchestrator.py: POST /orchestrate (async job queue), GET /orchestrate/{job_id}

Tools (cortex/tools/):
- web.py: DuckDuckGo web search (no key required)
- ae_knowledge.py: ae_journal_search + ae_journal_entry_create (AE V3 API)
- ae_tasks.py: ae_task_list (reads agents_sync Kanban filesystem)
- files.py: file_read (path-allowlisted to safe dirs)

Config + deps:
- config.py: orchestrator, DuckDuckGo, and AE API settings
- requirements.txt: google-genai, duckduckgo-search
- .env.default: reference config with all new keys documented

Docs:
- CLAUDE.md, README.md, documentation/ added to repo
- Port references updated 7331 → 8000 throughout
- Default model updated to gemini-2.5-flash

Tested: ae_task_list, ae_journal_search, web_search all working end-to-end.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 19:37:49 -04:00
parent 23f8659aaa
commit ed472ce9a0
15 changed files with 1840 additions and 1 deletions
--- a/.env.default
+++ b/.env.default
@@ -0,0 +1,55 @@
+# Cortex .env reference — copy to .env and fill in values
+# DO NOT commit .env — it contains secrets
+
+# ── Server ──────────────────────────────────────────────────────────────────
+HOST=0.0.0.0
+PORT=8000
+
+# ── Nextcloud Talk bot ───────────────────────────────────────────────────────
+NEXTCLOUD_URL=https://cloud.dgrzone.com
+NEXTCLOUD_TALK_BOT_SECRET=
+
+# ── LLM backends ────────────────────────────────────────────────────────────
+# Primary backend: "claude" or "gemini" (other is always fallback)
+PRIMARY_BACKEND=claude
+
+# Timeouts in seconds
+TIMEOUT_CLAUDE=60
+TIMEOUT_GEMINI=120
+
+# ── Orchestrator (Gemini API — not Gemini CLI) ───────────────────────────────
+# Required for /orchestrate endpoint and tool use
+# Free tier key: https://aistudio.google.com/apikey
+GEMINI_API_KEY=
+
+# Model for the orchestration tool loop (not the user-facing response)
+ORCHESTRATOR_MODEL=gemini-2.5-flash
+
+# Safety cap on tool loop iterations
+ORCHESTRATOR_MAX_ROUNDS=10
+
+# ── DuckDuckGo search ────────────────────────────────────────────────────────
+# Leave blank for free unauthenticated tier
+# Set to your API key for higher rate limits (paid DuckDuckGo account)
+DDG_API_KEY=
+DDG_MAX_RESULTS=5
+
+# ── Aether Platform API ───────────────────────────────────────────────────────
+# Used by orchestrator tools: ae_journal_search, ae_journal_entry_create, ae_task_list
+# Same values as agents_sync/mcp/.env — copy from there
+AE_API_URL=https://dev-api.oneskyit.com
+AE_API_KEY=
+AE_ACCOUNT_ID=
+AE_API_TIMEOUT=15
+
+# ── Distillation schedule ────────────────────────────────────────────────────
+SCHEDULER_TIMEZONE=America/New_York
+AUTO_DISTILL=true
+AUTO_DISTILL_SHORT=true
+AUTO_DISTILL_MID=true
+AUTO_DISTILL_LONG=false   # manual review recommended before enabling
+
+# Memory tier token budgets (soft caps)
+MEMORY_BUDGET_SHORT=3000
+MEMORY_BUDGET_MID=2000
+MEMORY_BUDGET_LONG=2000
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,171 @@
+# CLAUDE.md — Cortex / Inara Project
+
+This file is loaded automatically by Claude Code when working in this directory.
+Read it before touching any files.
+
+---
+
+## Identity & Context
+
+- **Project:** Cortex (dispatcher) + Inara (resident agent)
+- **Owner:** Scott Idem (One Sky IT / Danger Zone)
+- **Machine context:** See `~/CLAUDE.md` for fleet identity (`scott_lpt` = General Manager)
+- **Named after:** The 'verse-wide communications network (Firefly)
+
+---
+
+## Directory Map
+
+```
+Cortex_and_Inara_dev/
+  cortex/                ← FastAPI service (the dispatcher)
+    main.py              ← App entry point, router registration
+    config.py            ← All settings (pydantic-settings, reads .env)
+    llm_client.py        ← Claude CLI + Gemini CLI subprocess backends
+    orchestrator_engine.py ← Gemini API ReAct tool loop → Claude handoff
+    context_loader.py    ← Loads Inara's system prompt from inara/ files
+    session_store.py     ← In-memory + file session persistence
+    session_logger.py    ← Writes session turns to inara/sessions/
+    memory_distiller.py  ← Short/mid/long distill jobs (APScheduler)
+    scheduler.py         ← APScheduler setup
+    event_bus.py         ← Internal SSE pub/sub (NC Talk → browser)
+    routers/
+      chat.py            ← POST /chat (streaming SSE)
+      orchestrator.py    ← POST /orchestrate, GET /orchestrate/{job_id}
+      auth.py            ← GET /auth/status (Claude + Gemini CLI token checks)
+      distill.py         ← POST /distill/*, GET /distill/status
+      files.py           ← GET /files (inara/ file browser)
+      nextcloud_talk.py  ← POST /webhook/nextcloud (NC Talk bot)
+      google_chat.py     ← POST /webhook/google (Google Chat — stub)
+    tools/
+      __init__.py        ← Tool registry (Gemini FunctionDeclarations + dispatcher)
+      web.py             ← DuckDuckGo web_search tool
+    static/              ← Single-page web UI (index.html, style.css, app.js)
+    data/sessions/       ← Persisted session JSON files
+
+  inara/                 ← Inara identity, memory, context files
+    IDENTITY.md          ← Who Inara is
+    SOUL.md              ← Values, personality, voice
+    PROTOCOLS.md         ← Behavioral rules
+    CONTEXT_TIERS.md     ← What each tier (1–3) includes in the system prompt
+    USER.md              ← Scott's profile (loaded into context)
+    HELP.md              ← In-app help content (rendered in UI)
+    MEMORY.md            ← Persistent facts (written by distiller or manually)
+    MEMORY_SHORT.md      ← Rolling short-term memory (auto-distilled daily)
+    MEMORY_MID.md        ← Mid-term memory (auto-distilled weekly)
+    MEMORY_LONG.md       ← Long-term memory (auto-distilled monthly)
+    sessions/            ← Session turn logs (YYYY-MM-DD_<id>.md)
+
+  docs/                  ← Integration reference docs
+    NEXTCLOUD_TALK_BOT.md
+
+  documentation/         ← Architecture decisions and agent task list
+    TODO__Agents.md      ← READ THIS FIRST — active task list
+    ARCH__Intelligence_Layer.md ← Orchestrator, dev agent, knowledge architecture
+
+  docker-compose.yml     ← Docker deployment
+  .env.default           ← Reference config (copy to .env, fill in secrets)
+  README.md              ← Project orientation
+```
+
+---
+
+## Run Commands
+
+```bash
+# Start (Docker)
+docker compose up -d
+
+# Restart service (after any Python change)
+sudo systemctl restart cortex
+
+# Syntax check a file before restarting
+python3 -m py_compile cortex/<file>.py
+
+# Syntax check all routers, tools, and the orchestrator engine
+for f in cortex/routers/*.py cortex/tools/*.py cortex/orchestrator_engine.py; do
+    python3 -m py_compile "$f" && echo "OK: $f"
+done
+
+# Install/update dependencies
+cd cortex && .venv/bin/pip install -r requirements.txt
+
+# Logs
+journalctl -u cortex -f
+
+# Web UI (local)
+http://localhost:8000
+
+# Swagger docs
+http://localhost:8000/docs
+```
+
+---
+
+## Key Design Decisions
+
+### Two-Brain Architecture (Orchestrator / Responder)
+- **Gemini API** (`orchestrator_engine.py`) — runs the ReAct tool loop; handles tool calling, planning, research
+- **Claude CLI** (`llm_client.py`) — produces all user-facing responses; receives enriched context from Gemini
+- **Direct chat** bypasses the orchestrator entirely — `POST /chat` goes straight to Claude (faster)
+- **Orchestrated tasks** go to `POST /orchestrate` — returns a job_id, result is polled
+
+### LLM Backends
+- `llm_client.py` manages Claude CLI (`claude --print`) and Gemini CLI (`gemini -p`) subprocesses
+- `orchestrator_engine.py` uses the Gemini **API** (google-genai SDK) — completely separate from the Gemini CLI
+- Claude OAuth token is read live from `~/.claude/.credentials.json` (never rely on stale env var)
+
+### Tool Strategy
+- Orchestrator tools live in `cortex/tools/` — separate from the `ae_*` MCP tools
+- **Do not modify** the `ae_*` MCP server to support orchestrator needs; add new tools to `cortex/tools/` instead
+- Tools are registered in `cortex/tools/__init__.py` as both Gemini FunctionDeclarations and Python callables
+
+### Context / Memory
+- `context_loader.py` assembles Inara's system prompt from `inara/` files based on tier (1–3)
+- Tier 1 = minimal (identity only); Tier 2 = standard (+ memory + user profile); Tier 3 = full
+- Memory files are written by the distiller or manually — do not delete them
+
+### Security / Safety
+- **Never `rm`** — move files to `~/tmp/gemini_trash`
+- **Never commit secrets** — `.env` is gitignored; use `.env.default` as the reference
+- `NEXTCLOUD_TALK_BOT_SECRET` and `GEMINI_API_KEY` live in `.env` only
+- Cortex should only be accessible via WireGuard — never internet-exposed without VPN
+
+---
+
+## Adding a New Tool
+
+1. Implement the tool function in `cortex/tools/<domain>.py`
+   - Must be `async def`; use `asyncio.to_thread` for blocking calls
+   - Return a plain string result
+2. Add a `FunctionDeclaration` and register it in `cortex/tools/__init__.py`
+3. Syntax check: `python3 -m py_compile cortex/tools/<domain>.py`
+4. Restart Cortex
+
+## Adding a New Router
+
+1. Create `cortex/routers/<name>.py` with `router = APIRouter()`
+2. Import and register in `cortex/main.py`
+3. Syntax check, restart
+
+---
+
+## Active Tasks
+
+See `documentation/TODO__Agents.md` for the current task list.
+High priority items as of 2026-03-18:
+- Ollama backend (third LLM option — local, no API cost)
+- NC Talk integration stabilization
+- Knowledge consolidation (markdown → AE Journals)
+
+---
+
+## Related Docs
+
+| File | Purpose |
+|---|---|
+| `documentation/TODO__Agents.md` | Active task list — read before starting work |
+| `documentation/ARCH__Intelligence_Layer.md` | Full architecture design |
+| `~/agents_sync/projects/CORTEX.md` | High-level project vision and phases |
+| `~/agents_sync/CLAUDE.md` | Fleet coordination rules |
+| `~/CLAUDE.md` | Machine identity (`scott_lpt`) |
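
A minimal sketch of step 1 from "Adding a New Tool" in CLAUDE.md above, assuming a hypothetical tool named `disk_usage` (not part of this commit). It follows the stated conventions: `async def`, blocking work pushed through `asyncio.to_thread`, and a plain string result. Registration in `cortex/tools/__init__.py` is left out because the registry internals are only partly visible in this diff.

```python
import asyncio
import shutil


def _disk_usage_blocking(path: str) -> str:
    """Blocking helper: shutil.disk_usage performs a synchronous syscall."""
    usage = shutil.disk_usage(path)
    free_gib = usage.free / 2**30
    total_gib = usage.total / 2**30
    return f"{path}: {free_gib:.1f} GiB free of {total_gib:.1f} GiB"


async def disk_usage(path: str = "/") -> str:
    """Hypothetical orchestrator tool: report free disk space as plain text."""
    try:
        # Run the blocking call in a worker thread so the event loop stays responsive.
        return await asyncio.to_thread(_disk_usage_blocking, path)
    except OSError as e:
        # Tools return strings, so surface the failure as text the model can read.
        return f"disk_usage error: {e}"
```

Returning the error as a string rather than raising mirrors how `_execute_tool` in the engine turns exceptions into text for the model.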
--- a/README.md
+++ b/README.md
@@ -0,0 +1,91 @@
+# Cortex / Inara — Project Root
+
+**Owner:** Scott Idem (One Sky IT / Danger Zone)
+**Started:** 2026-03-04
+**Status:** Active development
+
+> *"You can't stop the signal."*
+
+Cortex is a self-hosted multi-agent orchestration layer. Inara is the primary conversational agent that lives inside it.
+
+---
+
+## Quick Orientation
+
+| Directory | What it is |
+|---|---|
+| `cortex/` | FastAPI service — dispatcher, routing, LLM backends, session management |
+| `inara/` | Inara identity, memory, context, and help files |
+| `docs/` | Integration reference docs (NC Talk bot, etc.) |
+| `documentation/` | Architecture decisions, project plans, agent task lists |
+
+---
+
+## Running Cortex
+
+```bash
+# Start (Docker)
+cd ~/agents_sync/projects/Cortex_and_Inara_dev
+docker compose up -d
+
+# Restart service only (after backend changes)
+sudo systemctl restart cortex
+
+# Logs
+journalctl -u cortex -f
+
+# Web UI
+http://localhost:8000   (or cortex.dgrzone.com on WireGuard)
+```
+
+Config lives in `cortex/config.py` and a `.env` file at the project root (not tracked; see `.env.default`).
+
+---
+
+## Key Documentation
+
+| File | Purpose |
+|---|---|
+| `documentation/TODO__Agents.md` | Active task list — read first |
+| `documentation/ARCH__Intelligence_Layer.md` | Intelligence layer architecture (orchestrator, dev agents, knowledge) |
+| `docs/NEXTCLOUD_TALK_BOT.md` | NC Talk bot setup |
+| `inara/IDENTITY.md` | Inara persona and identity |
+| `inara/HELP.md` | In-app help content (rendered in UI) |
+| `inara/PROTOCOLS.md` | Inara behavioral protocols |
+| `~/agents_sync/projects/CORTEX.md` | High-level project vision and phases |
+
+---
+
+## Architecture at a Glance
+
+```
+[User / Cron / Webhook]
+        ↓
+  Cortex Dispatcher  (FastAPI, cortex/)
+        ↓
+  LLM Backend(s)
+  • Claude CLI   — primary reasoning, coding, long-context
+  • Gemini CLI   — secondary / cost routing
+  • Ollama       — offline/private (scott_gaming, future)
+        ↓
+  Inara  (identity + memory in inara/)
+```
+
+See `documentation/ARCH__Intelligence_Layer.md` for the evolving orchestrator/responder and dev-agent architecture.
+
+---
+
+## Inara
+
+Inara is not tied to a specific model. The name is fixed; the backend may vary.
+Her identity and behavioral files live in `inara/` and are loaded at startup via `cortex/context_loader.py`.
+
+---
+
+## Related Projects
+
+| Project | Path |
+|---|---|
+| Aether Platform API | `~/OSIT_dev/aether_api_fastapi/` |
+| Aether Frontend | `~/OSIT_dev/aether_app_sveltekit/` |
+| Fleet coordination | `~/agents_sync/` |
--- a/cortex/config.py
+++ b/cortex/config.py
@@ -4,6 +4,24 @@ from pydantic_settings import BaseSettings, SettingsConfigDict

 class Settings(BaseSettings):
    anthropic_api_key: str | None = None  # not used — claude CLI handles auth
+
+    # Orchestrator (Gemini API — separate from Gemini CLI)
+    # Get a key at: https://aistudio.google.com/apikey (free tier is sufficient)
+    gemini_api_key: str | None = None
+    orchestrator_model: str = "gemini-2.5-flash"    # model used for tool loop
+    orchestrator_max_rounds: int = 10               # safety cap on tool iterations
+
+    # DuckDuckGo search (used by orchestrator web_search tool)
+    # Leave blank to use the free unauthenticated tier; set to your API key for higher limits
+    ddg_api_key: str | None = None
+    ddg_max_results: int = 5
+
+    # Aether Platform API (used by orchestrator ae_journal_* and ae_task_list tools)
+    ae_api_url: str = "https://dev-api.oneskyit.com"
+    ae_api_key: str = ""          # x-aether-api-key header
+    ae_account_id: str = ""       # x-account-id header
+    ae_api_timeout: int = 15      # per-request timeout in seconds
+
    inara_dir: Path = Path("../inara")
    sessions_dir: Path = Path("./data/sessions")
    default_model: str = "claude-sonnet-4-6"
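
The settings above map environment variables onto typed fields through pydantic-settings. As a dependency-free illustration of that mapping (a stand-in using only the standard library, not the project's implementation), the sketch below reads the three orchestrator keys with the same defaults. The names `OrchestratorEnv` and `load_orchestrator_env` are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class OrchestratorEnv:
    gemini_api_key: Optional[str]
    orchestrator_model: str
    orchestrator_max_rounds: int


def load_orchestrator_env(env: Mapping[str, str] = os.environ) -> OrchestratorEnv:
    """Read orchestrator settings from env, using the defaults shown in config.py."""
    return OrchestratorEnv(
        # A blank value is treated as "not set", like the engine's falsy check.
        gemini_api_key=env.get("GEMINI_API_KEY") or None,
        orchestrator_model=env.get("ORCHESTRATOR_MODEL", "gemini-2.5-flash"),
        orchestrator_max_rounds=int(env.get("ORCHESTRATOR_MAX_ROUNDS", "10")),
    )
```

Passing a plain dict instead of `os.environ` makes the defaults easy to check in isolation.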
--- a/cortex/main.py
+++ b/cortex/main.py
@@ -8,7 +8,7 @@ import uvicorn
 logging.basicConfig(level=logging.INFO, format="%(levelname)s:%(name)s: %(message)s")

 from config import settings
-from routers import chat, google_chat, nextcloud_talk, files, distill, auth
+from routers import chat, google_chat, nextcloud_talk, files, distill, auth, orchestrator


@asynccontextmanager
@@ -29,6 +29,7 @@ app.include_router(nextcloud_talk.router)
 app.include_router(files.router)
 app.include_router(distill.router)
 app.include_router(auth.router)
+app.include_router(orchestrator.router)
 app.mount("/static", StaticFiles(directory="static"), name="static")


--- a/cortex/orchestrator_engine.py
+++ b/cortex/orchestrator_engine.py
@@ -0,0 +1,243 @@
+"""
+Orchestrator engine — two-brain architecture.
+
+Flow:
+  1. Gemini API runs a ReAct tool loop (reason → act → observe → repeat)
+  2. When Gemini has gathered enough context, it produces a final summary
+  3. That enriched context is handed off to Claude for the user-facing response
+
+Why this split:
+  - Gemini API has native structured tool calling (Gemini CLI subprocess does not)
+  - Claude produces higher-quality user-facing prose and reasoning
+  - Claude Pro subscription has no API cost; Gemini free tier handles orchestration load
+
+For direct chat (no tools needed), this engine is not invoked — the chat router
+calls llm_client.complete() directly, which is faster and has no orchestration overhead.
+"""
+
+import asyncio
+import logging
+from dataclasses import dataclass, field
+
+from google import genai
+from google.genai import types
+
+from config import settings
+from llm_client import complete
+from tools import TOOL_DECLARATIONS, call_tool
+
+logger = logging.getLogger(__name__)
+
+# System prompt given to Gemini during the tool loop.
+# Gemini's job is information gathering and planning — NOT writing the final response.
+_ORCHESTRATOR_SYSTEM = """You are an intelligent orchestrator. Your job is to:
+1. Understand the user's request
+2. Call tools to gather the information needed to answer it
+3. Once you have enough information, produce a concise summary of:
+   - What the user asked
+   - What you found (tool results, key facts)
+   - Any important context that would help generate a good answer
+
+Do NOT write a polished final answer — a human-facing AI will do that next.
+Keep your summary factual and complete. Include relevant URLs, data, and specifics.
+If no tools are needed, return an empty summary."""
+
+
+@dataclass
+class OrchestratorResult:
+    response: str                       # final user-facing response (from Claude)
+    tool_calls: list[dict] = field(default_factory=list)  # [{tool, args, result}]
+    backend: str = "claude"             # model that produced the final response
+    gemini_summary: str = ""            # what Gemini handed to Claude (debug/display)
+
+
+async def run(
+    task: str,
+    system_prompt: str = "",
+    session_messages: list[dict] | None = None,
+    respond_with_claude: bool = True,
+) -> OrchestratorResult:
+    """
+    Run the full orchestration loop for a task.
+
+    Args:
+        task:               The user's request (plain text)
+        system_prompt:      Inara's system prompt (from context_loader) — passed to Claude
+        session_messages:   Prior conversation history for session continuity
+        respond_with_claude: If False, return Gemini's summary as the response (useful for
+                             background/cron tasks where a polished reply isn't needed)
+
+    Returns:
+        OrchestratorResult with response, tool call log, backend used, and Gemini summary
+    """
+    if not settings.gemini_api_key:
+        raise RuntimeError(
+            "GEMINI_API_KEY not set — orchestrator requires Gemini API. "
+            "Get a free key at https://aistudio.google.com/apikey and add it to .env"
+        )
+
+    client = genai.Client(api_key=settings.gemini_api_key)
+
+    # Seed Gemini with the task — include recent session context if available
+    task_with_context = _build_task_prompt(task, session_messages)
+    contents: list[types.Content] = [
+        types.Content(role="user", parts=[types.Part(text=task_with_context)])
+    ]
+
+    tool_call_log: list[dict] = []
+    gemini_summary = ""
+
+    # --- ReAct tool loop ---
+    for round_num in range(settings.orchestrator_max_rounds):
+        logger.info("Orchestrator round %d for task: %.80s", round_num + 1, task)
+
+        response = await asyncio.to_thread(
+            client.models.generate_content,
+            model=settings.orchestrator_model,
+            contents=contents,
+            config=types.GenerateContentConfig(
+                tools=TOOL_DECLARATIONS,
+                system_instruction=_ORCHESTRATOR_SYSTEM,
+            ),
+        )
+
+        candidate = response.candidates[0]
+        parts = candidate.content.parts if candidate.content else []
+
+        # Check if Gemini wants to call any tools
+        tool_call_parts = [
+            p for p in parts
+            if hasattr(p, "function_call") and p.function_call and p.function_call.name
+        ]
+
+        if not tool_call_parts:
+            # No more tool calls — extract Gemini's text summary
+            gemini_summary = "".join(
+                p.text for p in parts if hasattr(p, "text") and p.text
+            ).strip()
+            logger.info("Orchestrator done after %d round(s). Tools used: %d",
+                        round_num + 1, len(tool_call_log))
+            break
+
+        # Add Gemini's response (with function calls) to the conversation
+        contents.append(candidate.content)
+
+        # Execute all tool calls in parallel
+        tool_tasks = [
+            _execute_tool(fc.function_call.name, dict(fc.function_call.args))
+            for fc in tool_call_parts
+        ]
+        tool_results = await asyncio.gather(*tool_tasks, return_exceptions=True)
+
+        # Build function response parts and update log
+        response_parts: list[types.Part] = []
+        for fc_part, result in zip(tool_call_parts, tool_results):
+            fc = fc_part.function_call
+            result_str = str(result) if not isinstance(result, Exception) else f"Error: {result}"
+            logger.info("Tool %s → %d chars", fc.name, len(result_str))
+
+            tool_call_log.append({
+                "tool": fc.name,
+                "args": dict(fc.args),
+                "result": result_str,
+            })
+            response_parts.append(
+                types.Part(
+                    function_response=types.FunctionResponse(
+                        name=fc.name,
+                        response={"result": result_str},
+                    )
+                )
+            )
+
+        contents.append(types.Content(role="user", parts=response_parts))
+
+    else:
+        # Hit the round limit: fall back to a digest of the tool results gathered so far
+        logger.warning("Orchestrator hit max rounds (%d)", settings.orchestrator_max_rounds)
+        gemini_summary = (
+            f"Reached the tool iteration limit ({settings.orchestrator_max_rounds} rounds). "
+            "Here is what was gathered so far:\n\n"
+            + "\n\n".join(f"**{t['tool']}**: {t['result'][:500]}" for t in tool_call_log)
+        )
+
+    # --- Claude handoff ---
+    if respond_with_claude:
+        claude_prompt = _build_claude_prompt(task, tool_call_log, gemini_summary)
+
+        # Merge with session history so Claude has conversation context
+        messages = list(session_messages or [])
+        messages.append({"role": "user", "content": claude_prompt})
+
+        response_text, backend = await complete(
+            system_prompt=system_prompt,
+            messages=messages,
+            model="claude",
+        )
+    else:
+        # Cron/background tasks: return Gemini's summary directly, no Claude call
+        response_text = gemini_summary or "No information gathered."
+        backend = "gemini"
+
+    return OrchestratorResult(
+        response=response_text,
+        tool_calls=tool_call_log,
+        backend=backend,
+        gemini_summary=gemini_summary,
+    )
+
+
+async def _execute_tool(name: str, args: dict) -> str:
+    """Execute a single tool call, catching all exceptions."""
+    try:
+        return await call_tool(name, args)
+    except Exception as e:
+        logger.warning("Tool %s failed: %s", name, e)
+        return f"Tool error: {e}"
+
+
+def _build_task_prompt(task: str, session_messages: list[dict] | None) -> str:
+    """Prepend recent session context so Gemini understands the conversation."""
+    if not session_messages:
+        return task
+
+    # Include last few turns for context (don't send the full history to keep tokens low)
+    recent = session_messages[-6:]  # last 3 turns
+    history_lines = []
+    for msg in recent:
+        label = "User" if msg["role"] == "user" else "Assistant"
+        history_lines.append(f"{label}: {msg['content'][:300]}")  # truncate long messages
+
+    context = "\n".join(history_lines)
+    return f"<recent_conversation>\n{context}\n</recent_conversation>\n\nCurrent request: {task}"
+
+
+def _build_claude_prompt(
+    task: str,
+    tool_calls: list[dict],
+    gemini_summary: str,
+) -> str:
+    """Build the enriched context handed from Gemini to Claude."""
+    parts = [f"User request: {task}\n"]
+
+    if tool_calls:
+        parts.append("## Research gathered\n")
+        for tc in tool_calls:
+            parts.append(f"### {tc['tool']}({_format_args(tc['args'])})")
+            # Truncate very long results — Claude gets the gist
+            result = tc["result"]
+            if len(result) > 2000:
+                result = result[:2000] + "\n… [truncated]"
+            parts.append(result)
+            parts.append("")
+
+    if gemini_summary:
+        parts.append("## Summary of findings\n")
+        parts.append(gemini_summary)
+
+    return "\n".join(parts)
+
+
+def _format_args(args: dict) -> str:
+    """Format tool args as a compact string for display."""
+    return ", ".join(f"{k}={repr(v)}" for k, v in args.items())
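
The control flow of `run()` above (call the model, execute any requested tools in parallel, feed the results back, stop when no tool calls remain, and fall back through `for ... else` when the round cap is hit) can be sketched without the Gemini SDK. `react_loop`, `fake_model`, and `echo_tool` below are hypothetical stand-ins, not part of the commit, and the dict-based transcript is an assumption made only for illustration.

```python
import asyncio
from typing import Awaitable, Callable

# A "model" takes the transcript and returns either tool requests or final text.
ModelFn = Callable[[list[dict]], dict]
ToolFn = Callable[[dict], Awaitable[str]]


async def react_loop(model: ModelFn, tools: dict[str, ToolFn], task: str, max_rounds: int = 10):
    transcript: list[dict] = [{"role": "user", "text": task}]
    log: list[dict] = []

    for _ in range(max_rounds):
        reply = model(transcript)
        calls = reply.get("tool_calls", [])
        if not calls:
            summary = reply.get("text", "").strip()
            break  # leaving via break skips the else branch below

        transcript.append({"role": "model", "tool_calls": calls})
        # Run every requested tool concurrently; exceptions come back as values.
        results = await asyncio.gather(
            *(tools[c["name"]](c["args"]) for c in calls), return_exceptions=True
        )
        for call, result in zip(calls, results):
            text = f"Error: {result}" if isinstance(result, Exception) else str(result)
            log.append({"tool": call["name"], "args": call["args"], "result": text})
        transcript.append(
            {"role": "user", "tool_results": [e["result"] for e in log[-len(calls):]]}
        )
    else:
        # Runs only when the loop never hit break: the round cap was reached.
        summary = "Round limit reached. " + "; ".join(e["result"] for e in log)

    return summary, log


async def echo_tool(args: dict) -> str:
    return f"echo:{args['q']}"


def fake_model(transcript: list[dict]) -> dict:
    # Ask for one tool call first, then summarize once a tool result is present.
    if any("tool_results" in m for m in transcript):
        return {"text": "done"}
    return {"tool_calls": [{"name": "echo_tool", "args": {"q": "hi"}}]}
```

Running `asyncio.run(react_loop(fake_model, {"echo_tool": echo_tool}, "say hi"))` performs one tool round followed by a summary round.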
--- a/cortex/requirements.txt
+++ b/cortex/requirements.txt
@@ -4,5 +4,9 @@ uvicorn[standard]>=0.30.0
 pydantic-settings>=2.0.0
 python-dotenv>=1.0.0

+# Orchestrator — Gemini API (native tool calling) + web search
+google-genai>=1.0.0
+duckduckgo-search>=6.3.0
+
 # anthropic SDK not needed — using claude CLI subprocess for auth
 # anthropic>=0.40.0
--- a/cortex/routers/orchestrator.py
+++ b/cortex/routers/orchestrator.py
@@ -0,0 +1,174 @@
+"""
+Orchestrator router — POST /orchestrate, GET /orchestrate/{job_id}
+
+Accepts a task description, runs it through the orchestrator engine
+(Gemini tool loop → Claude response), and returns the result.
+
+Designed to be triggered from:
+  - The Cortex web UI (future "Agent mode" toggle)
+  - Cron jobs:  curl -X POST http://localhost:8000/orchestrate -H 'Content-Type: application/json' -d '{"task":"..."}'
+  - Webhooks:   Gitea, Aether events, etc.
+"""
+
+import asyncio
+import logging
+import uuid
+from datetime import datetime, timezone
+
+from fastapi import APIRouter
+from pydantic import BaseModel
+
+from config import settings
+from context_loader import load_context
+import orchestrator_engine
+
+logger = logging.getLogger(__name__)
+router = APIRouter(prefix="/orchestrate", tags=["orchestrator"])
+
+# ---------------------------------------------------------------------------
+# In-memory job store
+# Jobs are keyed by UUID. For this phase, memory is fine — jobs are short-lived.
+# ---------------------------------------------------------------------------
+
+_jobs: dict[str, dict] = {}
+_jobs_lock = asyncio.Lock()
+
+
+# ---------------------------------------------------------------------------
+# Request / response models
+# ---------------------------------------------------------------------------
+
+class OrchestrateRequest(BaseModel):
+    task: str
+    session_id: str | None = None       # include session history in context
+    tier: int | None = None             # Inara context tier (default from settings)
+    respond_with_claude: bool = True    # False = return Gemini summary only (faster, for cron)
+    include_long: bool = True
+    include_mid: bool = True
+    include_short: bool = True
+
+
+class OrchestrateResponse(BaseModel):
+    job_id: str
+    status: str     # "queued" | "running" | "complete" | "error"
+
+
+class JobStatusResponse(BaseModel):
+    job_id: str
+    status: str
+    task: str
+    created_at: str
+    completed_at: str | None = None
+    response: str | None = None
+    tool_calls: list[dict] | None = None
+    backend: str | None = None
+    gemini_summary: str | None = None
+    error: str | None = None
+
+
+# ---------------------------------------------------------------------------
+# Endpoints
+# ---------------------------------------------------------------------------
+
+@router.post("", response_model=OrchestrateResponse)
+async def orchestrate(req: OrchestrateRequest) -> OrchestrateResponse:
+    """Submit a task to the orchestrator. Returns a job_id to poll."""
+    job_id = str(uuid.uuid4())
+    now = datetime.now(timezone.utc).isoformat()
+
+    job: dict = {
+        "job_id": job_id,
+        "status": "queued",
+        "task": req.task,
+        "created_at": now,
+        "completed_at": None,
+        "response": None,
+        "tool_calls": None,
+        "backend": None,
+        "gemini_summary": None,
+        "error": None,
+    }
+
+    async with _jobs_lock:
+        _jobs[job_id] = job
+
+    # Run in background — caller polls GET /orchestrate/{job_id}
+    asyncio.create_task(_run_job(job_id, req))
+    logger.info("Orchestrator job queued: %s — %.80s", job_id, req.task)
+    return OrchestrateResponse(job_id=job_id, status="queued")
+
+
+@router.get("/{job_id}", response_model=JobStatusResponse)
+async def job_status(job_id: str) -> JobStatusResponse:
+    """Poll the status of an orchestrator job."""
+    async with _jobs_lock:
+        job = _jobs.get(job_id)
+
+    if job is None:
+        from fastapi import HTTPException
+        raise HTTPException(status_code=404, detail=f"Job {job_id} not found")
+
+    return JobStatusResponse(**job)
+
+
+@router.get("", response_model=list[JobStatusResponse])
+async def list_jobs() -> list[JobStatusResponse]:
+    """List all jobs (most recent first). Useful for debugging."""
+    async with _jobs_lock:
+        jobs = sorted(_jobs.values(), key=lambda j: j["created_at"], reverse=True)
+    return [JobStatusResponse(**j) for j in jobs]
+
+
+# ---------------------------------------------------------------------------
+# Background runner
+# ---------------------------------------------------------------------------
+
+async def _run_job(job_id: str, req: OrchestrateRequest) -> None:
+    """Execute the orchestration job and update the job store."""
+    async with _jobs_lock:
+        _jobs[job_id]["status"] = "running"
+
+    try:
+        # Load Inara's system prompt (same as the chat router does)
+        tier = req.tier or settings.default_tier
+        system_prompt = load_context(
+            tier,
+            include_long=req.include_long,
+            include_mid=req.include_mid,
+            include_short=req.include_short,
+        )
+
+        # Load session history if a session_id was provided
+        session_messages: list[dict] | None = None
+        if req.session_id:
+            from session_store import load as load_session
+            session_messages = load_session(req.session_id) or None
+
+        result = await orchestrator_engine.run(
+            task=req.task,
+            system_prompt=system_prompt,
+            session_messages=session_messages,
+            respond_with_claude=req.respond_with_claude,
+        )
+
+        now = datetime.now(timezone.utc).isoformat()
+        async with _jobs_lock:
+            _jobs[job_id].update({
+                "status": "complete",
+                "completed_at": now,
+                "response": result.response,
+                "tool_calls": result.tool_calls,
+                "backend": result.backend,
+                "gemini_summary": result.gemini_summary,
+            })
+        logger.info("Orchestrator job complete: %s (%d tool calls)", job_id, len(result.tool_calls))
+
+    except Exception as e:
+        logger.exception("Orchestrator job failed: %s", job_id)
+        now = datetime.now(timezone.utc).isoformat()
+        async with _jobs_lock:
+            _jobs[job_id].update({
+                "status": "error",
+                "completed_at": now,
+                "error": str(e),
+            })
--- a/cortex/tools/__init__.py
+++ b/cortex/tools/__init__.py
@@ -0,0 +1,193 @@
+"""
+Orchestrator tool registry.
+
+Each tool has two parts:
+  1. A Gemini FunctionDeclaration — tells the model what the tool does and what args it takes
+  2. A Python async callable — the actual implementation
+
+To add a new tool:
+  1. Implement it in a tools/<domain>.py module
+  2. Import it here and add (declaration, callable) to _REGISTRY
+  3. Add a FunctionDeclaration below and include it in TOOL_DECLARATIONS
+
+IMPORTANT: These tools are separate from the ae_* MCP tools used by the fleet agents.
+           Do not modify the ae_* MCP server to support orchestrator needs.
+"""
+
+from google.genai import types
+from tools.web import search as _web_search
+from tools.ae_knowledge import journal_search as _ae_journal_search
+from tools.ae_knowledge import journal_entry_create as _ae_journal_entry_create
+from tools.ae_tasks import task_list as _ae_task_list
+from tools.files import file_read as _file_read
+
+
+# ---------------------------------------------------------------------------
+# Gemini function declarations
+# ---------------------------------------------------------------------------
+
+_web_search_declaration = types.FunctionDeclaration(
+    name="web_search",
+    description=(
+        "Search the web for current information. Use this when you need up-to-date "
+        "facts, news, documentation, or anything not in your training data."
+    ),
+    parameters=types.Schema(
+        type=types.Type.OBJECT,
+        properties={
+            "query": types.Schema(
+                type=types.Type.STRING,
+                description="The search query string",
+            ),
+            "max_results": types.Schema(
+                type=types.Type.INTEGER,
+                description="Number of results to return (default 5, max 10)",
+            ),
+        },
+        required=["query"],
+    ),
+)
+
+_ae_journal_search_declaration = types.FunctionDeclaration(
+    name="ae_journal_search",
+    description=(
+        "Search the Aether Journals knowledge base by keyword. "
+        "Use this to look up notes, documentation, meeting summaries, or any saved knowledge. "
+        "Always search before creating a new entry to avoid duplicates."
+    ),
+    parameters=types.Schema(
+        type=types.Type.OBJECT,
+        properties={
+            "query": types.Schema(
+                type=types.Type.STRING,
+                description="Keyword or phrase to search for",
+            ),
+            "journal_id": types.Schema(
+                type=types.Type.STRING,
+                description=(
+                    "Optional: scope search to a specific journal by its id_random. "
+                    "Omit to search all journals."
+                ),
+            ),
+            "max_results": types.Schema(
+                type=types.Type.INTEGER,
+                description="Maximum number of entries to return (default 10)",
+            ),
+        },
+        required=["query"],
+    ),
+)
+
+_ae_journal_entry_create_declaration = types.FunctionDeclaration(
+    name="ae_journal_entry_create",
+    description=(
+        "Create a new entry in an Aether Journal. "
+        "Use this to save notes, summaries, or any content the user wants to store. "
+        "Always call ae_journal_search first to check for existing entries on the same topic."
+    ),
+    parameters=types.Schema(
+        type=types.Type.OBJECT,
+        properties={
+            "journal_id": types.Schema(
+                type=types.Type.STRING,
+                description=(
+                    "The id_random of the target journal. "
+                    "Ask the user which journal to write to if not specified."
+                ),
+            ),
+            "title": types.Schema(
+                type=types.Type.STRING,
+                description="Entry title",
+            ),
+            "content": types.Schema(
+                type=types.Type.STRING,
+                description="Full entry content (markdown supported)",
+            ),
+            "summary": types.Schema(
+                type=types.Type.STRING,
+                description="Optional short summary (1-2 sentences)",
+            ),
+            "tags": types.Schema(
+                type=types.Type.STRING,
+                description="Optional comma-separated tags (e.g. 'wireguard, networking, homelab')",
+            ),
+        },
+        required=["journal_id", "title", "content"],
+    ),
+)
+
+_ae_task_list_declaration = types.FunctionDeclaration(
+    name="ae_task_list",
+    description=(
+        "List tasks from the agents_sync Kanban board (todo and in-progress). "
+        "Use this when asked about current work, pending tasks, or project status."
+    ),
+    parameters=types.Schema(
+        type=types.Type.OBJECT,
+        properties={
+            "include_done": types.Schema(
+                type=types.Type.BOOLEAN,
+                description="If true, also include completed tasks (default false)",
+            ),
+        },
+    ),
+)
+
+_file_read_declaration = types.FunctionDeclaration(
+    name="file_read",
+    description=(
+        "Read a local file and return its contents. "
+        "Allowed directories: ~/agents_sync/, ~/OSIT_dev/, ~/DgrZone_Nextcloud/, ~/OSIT_Nextcloud/. "
+        "Use this to read documentation, notes, CLAUDE.md files, or config references. "
+        "If given a directory path, returns a directory listing instead."
+    ),
+    parameters=types.Schema(
+        type=types.Type.OBJECT,
+        properties={
+            "path": types.Schema(
+                type=types.Type.STRING,
+                description=(
+                    "Absolute or home-relative path to the file "
+                    "(e.g. ~/agents_sync/CLAUDE.md or /home/scott/agents_sync/tasks/01_todo/)"
+                ),
+            ),
+            "max_lines": types.Schema(
+                type=types.Type.INTEGER,
+                description="Optional line limit (default 500)",
+            ),
+        },
+        required=["path"],
+    ),
+)
+
+
+# ---------------------------------------------------------------------------
+# Registry: maps tool name → async callable
+# ---------------------------------------------------------------------------
+
+_CALLABLES: dict[str, callable] = {
+    "web_search": _web_search,
+    "ae_journal_search": _ae_journal_search,
+    "ae_journal_entry_create": _ae_journal_entry_create,
+    "ae_task_list": _ae_task_list,
+    "file_read": _file_read,
+}
+
+# Gemini Tool object — pass this to GenerateContentConfig
+TOOL_DECLARATIONS = [
+    types.Tool(function_declarations=[
+        _web_search_declaration,
+        _ae_journal_search_declaration,
+        _ae_journal_entry_create_declaration,
+        _ae_task_list_declaration,
+        _file_read_declaration,
+    ])
+]
+
+
+async def call_tool(name: str, args: dict) -> str:
+    """Dispatch a tool call by name. Returns result as a string."""
+    fn = _CALLABLES.get(name)
+    if fn is None:
+        return f"Unknown tool: {name}"
+    return await fn(**args)
--- a/cortex/tools/ae_knowledge.py
+++ b/cortex/tools/ae_knowledge.py
@@ -0,0 +1,177 @@
+"""
+Aether Platform knowledge tools — journal search and entry creation.
+
+These tools give the orchestrator read/write access to the AE Journals module,
+which serves as the primary long-term knowledge base.
+
+Auth: x-aether-api-key + x-account-id headers (same pattern as agents_sync scripts).
+API:  V3 CRUD — POST /v3/crud/journal_entry/search, POST /v3/crud/journal/{id}/journal_entry/
+"""
+
+import asyncio
+import logging
+from config import settings
+
+logger = logging.getLogger(__name__)
+
+# ---------------------------------------------------------------------------
+# Shared helpers
+# ---------------------------------------------------------------------------
+
+def _headers() -> dict:
+    return {
+        "x-aether-api-key": settings.ae_api_key,
+        "x-account-id": settings.ae_account_id,
+        "Content-Type": "application/json",
+    }
+
+
+def _check_config() -> str | None:
+    """Return an error string if AE API is not configured, else None."""
+    if not settings.ae_api_key or not settings.ae_account_id:
+        return (
+            "AE API not configured. Set AE_API_KEY and AE_ACCOUNT_ID in .env. "
+            "Values are the same as agents_sync/mcp/.env."
+        )
+    return None
+
+
+# ---------------------------------------------------------------------------
+# Tool: ae_journal_search
+# ---------------------------------------------------------------------------
+
+async def journal_search(query: str, journal_id: str | None = None, max_results: int = 10) -> str:
+    """Search AE Journal entries by keyword.
+
+    Searches across the default_qry_str field (title + content excerpt).
+    Optionally scoped to a specific journal by journal_id (id_random).
+    Returns a markdown-formatted list of matching entries.
+    """
+    err = _check_config()
+    if err:
+        return err
+
+    return await asyncio.to_thread(_sync_journal_search, query, journal_id, max_results)
+
+
+def _sync_journal_search(query: str, journal_id: str | None, max_results: int) -> str:
+    import requests
+
+    url = f"{settings.ae_api_url}/v3/crud/journal_entry/search"
+    search_body = {
+        "and_filters": [
+            {"field": "default_qry_str", "op": "icontains", "value": query}
+        ],
+        "page_size": max_results,
+    }
+
+    params = {}
+    if journal_id:
+        params["for_obj_type"] = "journal"
+        params["for_obj_id"] = journal_id
+
+    try:
+        resp = requests.post(
+            url,
+            headers=_headers(),
+            params=params,
+            json=search_body,
+            timeout=settings.ae_api_timeout,
+        )
+        resp.raise_for_status()
+        data = resp.json()
+    except Exception as e:
+        logger.warning("ae_journal_search failed: %s", e)
+        return f"Journal search error: {e}"
+
+    entries = data.get("data", [])
+    if not entries:
+        return f"No journal entries found matching: {query}"
+
+    lines = [f"Journal entries matching **{query}** ({len(entries)} result(s)):\n"]
+    for entry in entries:
+        title = entry.get("name") or "(untitled)"
+        entry_id = entry.get("id_random", "")
+        journal_name = entry.get("journal_name") or entry.get("parent_name") or ""
+        summary = entry.get("summary") or ""
+        content_preview = (entry.get("content") or "")[:200].replace("\n", " ")
+
+        header = f"**{title}**"
+        if journal_name:
+            header += f" ({journal_name})"
+        if entry_id:
+            header += f" — id: `{entry_id}`"
+
+        lines.append(header)
+        if summary:
+            lines.append(f"  Summary: {summary}")
+        if content_preview:
+            lines.append(f"  {content_preview}…")
+        lines.append("")
+
+    return "\n".join(lines).strip()
+
+
+# ---------------------------------------------------------------------------
+# Tool: ae_journal_entry_create
+# ---------------------------------------------------------------------------
+
+async def journal_entry_create(
+    journal_id: str,
+    title: str,
+    content: str,
+    summary: str = "",
+    tags: str = "",
+) -> str:
+    """Create a new entry in an AE Journal.
+
+    Args:
+        journal_id: The id_random of the target journal (use ae_journal_search to find it,
+                    or ask the user which journal to write to).
+        title:      Entry title (name field).
+        content:    Full entry content (markdown supported).
+        summary:    Optional short summary (1-2 sentences).
+        tags:       Optional comma-separated tags.
+
+    Returns a confirmation with the new entry's id_random, or an error message.
+    """
+    err = _check_config()
+    if err:
+        return err
+
+    return await asyncio.to_thread(
+        _sync_journal_entry_create, journal_id, title, content, summary, tags
+    )
+
+
+def _sync_journal_entry_create(
+    journal_id: str, title: str, content: str, summary: str, tags: str
+) -> str:
+    import requests
+
+    url = f"{settings.ae_api_url}/v3/crud/journal/{journal_id}/journal_entry/"
+    data: dict = {"name": title, "content": content}
+    if summary:
+        data["summary"] = summary
+    if tags:
+        data["tags"] = [t.strip() for t in tags.split(",") if t.strip()]
+
+    try:
+        resp = requests.post(
+            url,
+            headers=_headers(),
+            json=data,
+            timeout=settings.ae_api_timeout,
+        )
+        resp.raise_for_status()
+        result = resp.json()
+    except Exception as e:
+        logger.warning("ae_journal_entry_create failed: %s", e)
+        return f"Journal entry creation error: {e}"
+
+    entry_id = (
+        result.get("data", {}).get("id_random")
+        or result.get("id_random")
+        or "unknown"
+    )
+    return f"Journal entry created. id: `{entry_id}`, title: \"{title}\", journal: `{journal_id}`"
--- a/cortex/tools/ae_tasks.py
+++ b/cortex/tools/ae_tasks.py
@@ -0,0 +1,100 @@
+"""
+Aether task list tool — reads the agents_sync Kanban board.
+
+Reads task JSON files directly from the agents_sync filesystem rather than
+making an HTTP call, since the tasks directory is always locally available
+(synced via Syncthing). This avoids needing a separate API endpoint for tasks.
+
+Structure:
+  agents_sync/tasks/01_todo/       — pending tasks
+  agents_sync/tasks/02_in_progress/ — active tasks
+  agents_sync/tasks/03_done/       — completed tasks (not included by default)
+"""
+
+import asyncio
+import json
+import logging
+from pathlib import Path
+
+logger = logging.getLogger(__name__)
+
+# Resolved at import time — agents_sync is always at ~/agents_sync on this machine.
+# If the path doesn't exist the tool returns a helpful error rather than crashing.
+_AGENTS_SYNC = Path.home() / "agents_sync"
+_TASKS_ROOT = _AGENTS_SYNC / "tasks"
+
+
+async def task_list(include_done: bool = False) -> str:
+    """List tasks from the agents_sync Kanban board.
+
+    Reads the todo and in_progress buckets (and optionally done).
+    Returns a markdown summary grouped by status.
+
+    Args:
+        include_done: If True, also include completed tasks (can be noisy).
+    """
+    return await asyncio.to_thread(_sync_task_list, include_done)
+
+
+def _sync_task_list(include_done: bool) -> str:
+    if not _TASKS_ROOT.exists():
+        return f"Task directory not found: {_TASKS_ROOT}"
+
+    buckets = [
+        ("01_todo", "Todo"),
+        ("02_in_progress", "In Progress"),
+    ]
+    if include_done:
+        buckets.append(("03_done", "Done"))
+
+    sections: list[str] = []
+    total = 0
+
+    for dir_name, label in buckets:
+        bucket_dir = _TASKS_ROOT / dir_name
+        if not bucket_dir.exists():
+            continue
+
+        tasks = _read_bucket(bucket_dir)
+        total += len(tasks)
+        if not tasks:
+            continue
+
+        lines = [f"## {label} ({len(tasks)})\n"]
+        for task in tasks:
+            title = task.get("title") or task.get("name") or "(untitled)"
+            assigned = task.get("assigned_to") or ""
+            task_id = task.get("id") or ""
+            desc = task.get("description") or ""
+
+            header = f"- **{title}**"
+            if assigned:
+                header += f" (assigned: {assigned})"
+            if task_id:
+                header += f" — `{task_id}`"
+            lines.append(header)
+
+            if desc:
+                # First sentence / 120 chars of description
+                short = desc.split(".")[0][:120]
+                lines.append(f"  {short}")
+
+        sections.append("\n".join(lines))
+
+    if not sections:
+        return "No tasks found on the Kanban board."
+
+    header_line = f"# Kanban Board — {total} task(s)\n"
+    return header_line + "\n\n".join(sections)
+
+
+def _read_bucket(bucket_dir: Path) -> list[dict]:
+    """Read and parse all JSON task files in a bucket directory."""
+    tasks = []
+    for path in sorted(bucket_dir.glob("*.json")):
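+        try:
+            data = json.loads(path.read_text(encoding="utf-8"))
+        except Exception as e:
+            logger.warning("Failed to read task file %s: %s", path, e)
+            continue
+        # Callers use task.get(...), so only JSON objects are usable as tasks.
+        if isinstance(data, dict):
+            tasks.append(data)
+        else:
+            logger.warning("Skipping task file %s: expected a JSON object", path)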
+    return tasks
--- a/cortex/tools/files.py
+++ b/cortex/tools/files.py
@@ -0,0 +1,112 @@
+"""
+File read tool — restricted to known-safe directory roots.
+
+Lets the orchestrator read local files (documentation, notes, config references)
+without exposing arbitrary filesystem access. All paths are resolved and checked
+against an allowlist of roots before any read is performed.
+"""
+
+import asyncio
+import logging
+from pathlib import Path
+
+logger = logging.getLogger(__name__)
+
+# Directories the orchestrator is allowed to read from.
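+# Roots are resolved at import time (symlinks followed) so that they compare
+# correctly against the resolved request path in _is_allowed().
+_ALLOWED_ROOTS: list[Path] = [
+    (Path.home() / "agents_sync").resolve(),
+    (Path.home() / "OSIT_dev").resolve(),
+    (Path.home() / "DgrZone_Nextcloud").resolve(),
+    (Path.home() / "OSIT_Nextcloud").resolve(),
+]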
+
+# Hard cap on file size to prevent accidental context blowout
+_MAX_BYTES = 50_000   # ~50 KB
+_MAX_LINES = 500
+
+
+async def file_read(path: str, max_lines: int | None = None) -> str:
+    """Read a local file and return its contents as a string.
+
+    Only files within allowed directories can be read:
+      ~/agents_sync/, ~/OSIT_dev/, ~/DgrZone_Nextcloud/, ~/OSIT_Nextcloud/
+
+    Args:
+        path:      Absolute or home-relative path to the file (e.g. ~/agents_sync/CLAUDE.md).
+        max_lines: Optional line limit (default 500, hard cap). Use for large files.
+
+    Returns the file contents (truncated if over the size limit), or an error message.
+    """
+    return await asyncio.to_thread(_sync_file_read, path, max_lines)
+
+
+def _sync_file_read(path: str, max_lines: int | None) -> str:
+    # Expand ~ and resolve to absolute path
+    try:
+        resolved = Path(path).expanduser().resolve()
+    except Exception as e:
+        return f"Invalid path: {e}"
+
+    # Security check — must be under an allowed root
+    if not _is_allowed(resolved):
+        allowed_str = ", ".join(str(r) for r in _ALLOWED_ROOTS)
+        return (
+            f"Access denied: {resolved}\n"
+            f"Allowed directories: {allowed_str}"
+        )
+
+    if not resolved.exists():
+        return f"File not found: {resolved}"
+
+    if not resolved.is_file():
+        # If it's a directory, list its contents instead
+        try:
+            entries = sorted(resolved.iterdir())
+            names = [e.name + ("/" if e.is_dir() else "") for e in entries[:100]]
+            return f"Directory listing for {resolved}:\n" + "\n".join(names)
+        except Exception as e:
+            return f"Cannot list directory: {e}"
+
+    # Read the file
+    try:
+        raw = resolved.read_bytes()
+    except Exception as e:
+        return f"Read error: {e}"
+
+    # Binary files
+    try:
+        text = raw.decode("utf-8")
+    except UnicodeDecodeError:
+        return f"Binary file (not readable as text): {resolved}  [{len(raw)} bytes]"
+
+    # Apply line limit
+    limit = min(max_lines or _MAX_LINES, _MAX_LINES)
+    lines = text.splitlines()
+    truncated = False
+
+    if len(lines) > limit:
+        lines = lines[:limit]
+        truncated = True
+
+    # Apply a size cap as a final safety net (len() counts characters, which
+    # approximates bytes for mostly-ASCII text)
+    result = "\n".join(lines)
+    if len(result) > _MAX_BYTES:
+        result = result[:_MAX_BYTES]
+        truncated = True
+
+    if truncated:
+        result += f"\n\n… [truncated — file has {len(text.splitlines())} lines total]"
+
+    return result
+
+
+def _is_allowed(resolved: Path) -> bool:
+    """Check that resolved path is under one of the allowed roots."""
+    for root in _ALLOWED_ROOTS:
+        try:
+            resolved.relative_to(root)
+            return True
+        except ValueError:
+            continue
+    return False
--- a/cortex/tools/web.py
+++ b/cortex/tools/web.py
@@ -0,0 +1,50 @@
+"""
+Web search tool — DuckDuckGo backend.
+
+Uses the duckduckgo-search library, which needs no API key for normal use.
+DDG_API_KEY is optional; when set in .env it is sent as a Bearer token, intended for higher rate limits.
+"""
+
+import asyncio
+import logging
+from config import settings
+
+logger = logging.getLogger(__name__)
+
+
+async def search(query: str, max_results: int | None = None) -> str:
+    """Search DuckDuckGo and return results as a formatted string.
+
+    Returns a markdown-formatted list of results: title, URL, and snippet.
+    The orchestrator includes this in the context it passes to Claude.
+    """
+    n = min(max_results or settings.ddg_max_results, 10)
+    results = await asyncio.to_thread(_sync_search, query, n)
+    if not results:
+        return f"No results found for: {query}"
+
+    lines = [f"Search results for: **{query}**\n"]
+    for i, r in enumerate(results, 1):
+        lines.append(f"{i}. [{r['title']}]({r['href']})")
+        if r.get("body"):
+            lines.append(f"   {r['body']}")
+        lines.append("")
+
+    return "\n".join(lines).strip()
+
+
+def _sync_search(query: str, max_results: int) -> list[dict]:
+    """Synchronous DuckDuckGo search — run via asyncio.to_thread."""
+    from duckduckgo_search import DDGS
+
+    kwargs = {}
+    if settings.ddg_api_key:
+        # Paid account — pass token for higher rate limits
+        kwargs["headers"] = {"Authorization": f"Bearer {settings.ddg_api_key}"}
+
+    try:
+        with DDGS(**kwargs) as ddgs:
+            return list(ddgs.text(query, max_results=max_results))
+    except Exception as e:
+        logger.warning("DuckDuckGo search error: %s", e)
+        return []
--- a/documentation/ARCH__Intelligence_Layer.md
+++ b/documentation/ARCH__Intelligence_Layer.md
@@ -0,0 +1,306 @@
+# Architecture: Intelligence Layer
+
+**Status:** Phase 1 (orchestrator service) implemented; the remaining tracks are still in design
+**Last updated:** 2026-03-18
+
+This document captures the architectural thinking behind expanding Cortex from a smart dispatcher into a genuine intelligence layer: capable of using tools, coordinating specialist agents, and managing a personal knowledge base.
+
+---
+
+## Overview
+
+Cortex currently dispatches chat messages to LLM CLI backends and returns the response. The Intelligence Layer adds three major capabilities on top of that foundation:
+
+1. **Orchestrator/Responder** — Gemini handles tool use and planning; Claude handles the user-facing response
+2. **Dev Agent Pipeline** — Specialist agents implement code changes; a supervisor checks the work
+3. **Knowledge Layer** — AE Journals becomes the primary knowledge base; agents can read and write it
+
+These are independent tracks that share the same trigger layer and can be built incrementally.
+
+---
+
+## 1. Orchestrator / Responder Pattern
+
+### The Problem
+
+Claude CLI (via Pro subscription) doesn't expose direct API tool-calling. Gemini API (free tier) does. But Claude produces higher-quality user-facing prose and reasoning. The solution is to use each model for what it does best.
+
+### The Pattern
+
+```
+User message
+    ↓
+Orchestrator (Gemini API)
+    • interprets intent
+    • decides which tools to call
+    • executes tool loop (ReAct: reason → act → observe → repeat)
+    • assembles enriched context + tool results
+    ↓
+Responder (Claude CLI)
+    • receives enriched context
+    • writes the user-facing response
+    ↓
+User
+```
+
+For **direct chat** (no tools needed), the orchestrator is bypassed entirely — message goes straight to Claude. The orchestrator only activates when tools are required or when explicitly invoked (e.g., a background task).
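
The reason → act → observe loop above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code in `orchestrator_engine.py`: `fake_planner` and `fake_task_list` are hypothetical stand-ins for the Gemini call and a real tool, and the step limit is an invented safeguard.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical stand-ins: neither mirrors the Gemini SDK or the real tools.
ToolFn = Callable[..., Awaitable[str]]


async def fake_task_list() -> str:
    return "2 tasks in todo"


TOOLS: dict[str, ToolFn] = {"ae_task_list": fake_task_list}


def fake_planner(task: str, observations: list[str]) -> dict:
    """Request one tool call, then finish once an observation exists."""
    if not observations:
        return {"tool": "ae_task_list", "args": {}}
    return {"final": f"Context for '{task}': {observations[-1]}"}


async def react_loop(task: str, max_steps: int = 5) -> tuple[str, list[str]]:
    observations: list[str] = []
    calls: list[str] = []
    for _ in range(max_steps):
        step = fake_planner(task, observations)   # reason
        if "final" in step:
            return step["final"], calls
        fn = TOOLS.get(step["tool"])
        if fn is None:
            result = f"Unknown tool: {step['tool']}"
        else:
            result = await fn(**step["args"])     # act
        calls.append(step["tool"])
        observations.append(result)               # observe
    return "Step limit reached", calls


summary, tool_calls = asyncio.run(react_loop("What is pending?"))
```

In the real service the returned summary and tool-call list would be handed to the responder; here they are simply returned so the loop can be exercised offline.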
+
+### Why Gemini API (not CLI)?
+
+- Gemini CLI is a subprocess; function calling via subprocess is fragile
+- Gemini API (`google-genai` SDK) has native structured tool-calling
+- Free tier (Gemini 2.5 Flash) handles orchestration load without cost
+- Authenticates with a long-lived API key, so there is no token expiry to manage
+
+### Tool Strategy
+
+Tools for the orchestrator are **separate** from the existing `ae_*` MCP tools. The ae_* tools are stable and used by existing agents — do not modify them.
+
+New orchestrator tools are Python functions wrapped in Gemini function declarations:
+
+| Tool | What it does | Implementation |
+|---|---|---|
+| `web_search` | DuckDuckGo search | `duckduckgo-search` library |
+| `ae_journal_search` | Search AE Journals via V3 API | HTTP to AE API |
+| `ae_journal_entry_create` | Write a new journal entry | HTTP to AE API |
+| `ae_task_list` | Read Kanban tasks | Reads agents_sync task files directly |
+| `file_read` | Read a file from known safe paths | Python `pathlib` |
+| `gitea_api` | Query Gitea repos, issues, PRs | Gitea REST API (planned, not in Phase 1) |
+
+Tools are registered in `cortex/tools/` (one file per domain group).
+
+### Implementation Path
+
+```
+cortex/
+  tools/
+    __init__.py          — tool registry
+    web.py               — web_search
+    ae_knowledge.py      — ae_journal_* tools
+    ae_tasks.py          — task tools
+    files.py             — file_read
+    gitea.py             — Gitea API tools (planned)
+  routers/
+    orchestrator.py      — POST /orchestrate, GET /orchestrate/{job_id}
+  orchestrator_engine.py — Gemini tool loop + Claude handoff
+```
+
+Endpoint contract:
+
+```
+POST /orchestrate
+{
+  "task": "What tasks are due this week and summarize my notes on X topic",
+  "session_id": "optional — if part of an ongoing conversation",
+  "respond_with_claude": true   // false = return Gemini's assembled context only
+}
+
+→ { "job_id": "uuid", "status": "queued" }
+
+GET /orchestrate/{job_id}
+→ { "status": "complete", "completed_at": "...", "response": "...", "tool_calls": [...], "backend": "...", "gemini_summary": "..." }
+```
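
The queue-then-poll contract can be illustrated with a small in-memory job table. This is a sketch only: `JobStore` and the `not_found` status are hypothetical, and the field names are taken from the `_jobs[job_id].update(...)` calls in `routers/orchestrator.py`.

```python
import asyncio
import uuid
from datetime import datetime, timezone


class JobStore:
    """Hypothetical in-memory job table; not the router's actual code."""

    def __init__(self) -> None:
        self._jobs: dict[str, dict] = {}
        self._lock = asyncio.Lock()

    async def submit(self, task: str) -> dict:
        job_id = str(uuid.uuid4())
        async with self._lock:
            self._jobs[job_id] = {"status": "queued", "task": task}
        return {"job_id": job_id, "status": "queued"}

    async def complete(self, job_id: str, response: str, tool_calls: list) -> None:
        now = datetime.now(timezone.utc).isoformat()
        async with self._lock:
            self._jobs[job_id].update({
                "status": "complete",
                "completed_at": now,
                "response": response,
                "tool_calls": tool_calls,
            })

    async def poll(self, job_id: str) -> dict:
        async with self._lock:
            job = self._jobs.get(job_id)
            # "not_found" is an invented status for this sketch.
            return dict(job) if job else {"status": "not_found"}


async def demo() -> tuple[dict, dict, dict]:
    store = JobStore()
    queued = await store.submit("What tasks are due this week?")
    before = await store.poll(queued["job_id"])
    await store.complete(queued["job_id"], "Two tasks are due.", ["ae_task_list"])
    after = await store.poll(queued["job_id"])
    return queued, before, after


queued, before, after = asyncio.run(demo())
```

Returning a copy from `poll` keeps callers from mutating the stored job outside the lock.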
+
+---
+
+## 2. Trigger Layer
+
+Chat, orchestration, and the dev agent pipeline all share the same trigger layer:
+
+```
+┌────────────────────────────────────────────────┐
+│  TRIGGERS                                      │
+│                                                │
+│  Chat UI  →  POST /chat  (existing)            │
+│  Cron     →  POST /orchestrate  (new)          │
+│  Gitea    →  POST /webhook/gitea  (new)        │
+│  NC Talk  →  POST /webhook/nextcloud  (exists) │
+│  Manual   →  CLI / curl for debugging          │
+└────────────────────────────────────────────────┘
+```
+
+Cron trigger example (from existing cron infrastructure):
+
+```bash
+curl -X POST http://localhost:8000/orchestrate \
+  -H "Content-Type: application/json" \
+  -d '{"task": "Check for overdue Kanban tasks and notify via NC Talk"}'
+```
+
+This means the same orchestrator endpoint is usable from chat, crons, and webhooks without any special cases.
+
+---
+
+## 3. Dev Agent Pipeline
+
+### The Goal
+
+Accept a plain-English task like *"Fix the bug where X, add a test for it"* and produce:
+- A working code change
+- Passing syntax/type checks
+- A summary of what changed and what still needs human review
+- A commit ready to push (pending approval)
+
+### Architecture
+
+```
+Task request (chat / Gitea issue / Kanban)
+    ↓
+Orchestrator
+    • reads relevant files (context gathering)
+    • routes to correct specialist
+    ↓
+Specialist Agent (Claude CLI in project directory)
+    • implements the change
+    • runs self-check: py_compile / svelte-check
+    ↓
+Supervisor Agent
+    • reviews the diff
+    • runs test suite
+    • returns: PASS / NEEDS_REVIEW / FAIL + reason
+    ↓
+Human approval gate
+    • summary shown in Cortex UI or NC Talk
+    • user approves → commit + optional push
+    • user rejects → feedback goes back to specialist
+```
+
+### Specialist Agents
+
+Two initial specialists, both using Claude CLI:
+
+**Frontend specialist** (working dir: `~/OSIT_dev/aether_app_sveltekit/`):
+- Reads `documentation/TODO__Agents.md` and `CLAUDE.md` before acting
+- Runs `npx svelte-check` after every change — no exceptions
+- Atomic commits (one component or fix per commit)
+
+**Backend specialist** (working dir: `~/OSIT_dev/aether_api_fastapi/`):
+- Reads `documentation/TODO__Agents.md` and `CLAUDE.md` before acting
+- Runs `python3 -m py_compile` after every file edit
+- Runs unit tests before declaring done
+- Flags E2E tests that need human review
+
+### Supervisor Agent
+
+The supervisor is a separate Claude invocation that receives:
+- The diff of all changed files
+- Stdout/stderr from all checks that were run
+- The original task description
+
+It returns a structured assessment:
+
+```json
+{
+  "verdict": "PASS | NEEDS_REVIEW | FAIL",
+  "checks_passed": ["py_compile", "unit_tests"],
+  "checks_failed": [],
+  "review_notes": "E2E tests not run — touch auth router, recommend manual check",
+  "commit_message": "fix: correct session token validation in auth middleware"
+}
+```
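
A consumer of this assessment would likely validate it before acting on it. The sketch below assumes the JSON shape shown above; `Assessment`, `parse_assessment`, and `ready_for_approval` are hypothetical names, and the rule that only a clean PASS proceeds is an assumption rather than documented behavior.

```python
import json
from dataclasses import dataclass, field

VERDICTS = {"PASS", "NEEDS_REVIEW", "FAIL"}


@dataclass
class Assessment:
    verdict: str
    checks_passed: list[str] = field(default_factory=list)
    checks_failed: list[str] = field(default_factory=list)
    review_notes: str = ""
    commit_message: str = ""


def parse_assessment(raw: str) -> Assessment:
    """Parse supervisor output, rejecting verdicts outside the known set."""
    data = json.loads(raw)
    verdict = data.get("verdict")
    if verdict not in VERDICTS:
        raise ValueError(f"Unexpected verdict: {verdict!r}")
    return Assessment(
        verdict=verdict,
        checks_passed=list(data.get("checks_passed", [])),
        checks_failed=list(data.get("checks_failed", [])),
        review_notes=data.get("review_notes", ""),
        commit_message=data.get("commit_message", ""),
    )


def ready_for_approval(a: Assessment) -> bool:
    """Assumed rule: only a PASS with no failed checks proceeds unflagged."""
    return a.verdict == "PASS" and not a.checks_failed


sample = json.dumps({
    "verdict": "NEEDS_REVIEW",
    "checks_passed": ["py_compile", "unit_tests"],
    "checks_failed": [],
    "review_notes": "E2E tests not run",
    "commit_message": "fix: correct session token validation in auth middleware",
})
assessment = parse_assessment(sample)
```

Rejecting unknown verdicts early keeps a malformed model reply from being mistaken for a pass.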
+
+### Gitea Integration
+
+- **Gitea webhooks → Cortex:** Push/PR events trigger supervisor review automatically
+- **Gitea Actions:** Run `py_compile`/`svelte-check` on every push (simple CI, no custom runner)
+- **Cortex → Gitea:** After human approval, supervisor calls Gitea API to create PR or push
+
+Gitea Actions are simpler than they sound — a `.gitea/workflows/check.yml` is just a YAML file that runs shell commands on push. No external CI infrastructure needed.
+
+---
+
+## 4. Knowledge Layer
+
+### The Goal
+
+AE Journals becomes the primary source of truth for personal and business knowledge. Notes, documentation, and logs that currently live scattered across markdown files get organized into Journals with proper structure, search, and agent-accessible read/write.
+
+### Import Strategy
+
+1. **Don't bulk-import blindly.** The orchestrator searches AE Journals before creating anything (deduplication).
+2. **Chunk by section.** A large markdown file becomes multiple journal entries — one per H2 section.
+3. **Preserve provenance.** Each imported entry includes source path, import date, and original file date in its `data_json` or notes.
+4. **Tag intelligently.** Tags come from: frontmatter, filename keywords, directory path, and content analysis.
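
The "chunk by section" step could look roughly like this. It is a sketch of one possible approach, not the import script itself: `chunk_by_h2` and the `(preamble)` title are hypothetical, and sections with an empty body are dropped by choice.

```python
import re

H2_PATTERN = re.compile(r"^##\s+(.*\S)\s*$")


def chunk_by_h2(markdown: str) -> list[tuple[str, str]]:
    """Split markdown into (title, body) pairs, one per H2 section.

    Text before the first H2 is returned under the title "(preamble)".
    Headings inside fenced code blocks are ignored; empty sections are skipped.
    """
    chunks: list[tuple[str, str]] = []
    title = "(preamble)"
    buf: list[str] = []
    in_fence = False

    def flush() -> None:
        body = "\n".join(buf).strip()
        if body:
            chunks.append((title, body))

    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else H2_PATTERN.match(line)
        if match:
            flush()
            title, buf = match.group(1), []
        else:
            buf.append(line)
    flush()
    return chunks


doc = "# Title\nintro\n## Setup\nstep one\n### Detail\nmore\n## Usage\nrun it\n"
chunks = chunk_by_h2(doc)
```

Each pair would then map to one journal entry, with H3 and deeper headings kept inside the body of their parent section.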
+
+### Source Priority
+
+| Source | Priority | Notes |
+|---|---|---|
+| `~/DgrZone_Nextcloud/` | High | Personal notes, projects |
+| `~/OSIT_Nextcloud/` | High | Business docs |
+| `~/agents_sync/aether/docs/` | Medium | Platform specs (already structured) |
+| OpenClaw session logs | Low | Historical, lots of noise |
+
+### Agent Workflow
+
+```
+"Summarize my notes on WireGuard setup"
+    ↓
+Orchestrator calls ae_journal_search("wireguard")
+    ↓
+Returns matching entries
+    ↓
+Claude synthesizes a response
+```
+
+```
+"Save this as a note in my DgrZone journal"
+    ↓
+Orchestrator calls ae_journal_entry_create(
+    journal_id="<id_random of the DgrZone journal>",
+    title="...",
+    content="...",
+    tags="note, wireguard"
+)
+```
+
+### Context Tiers (Inara Memory)
+
+The existing distill system (`MEMORY_SHORT.md`, `MEMORY_MID.md`, `MEMORY_LONG.md`) handles working memory. The Knowledge Layer is complementary — it's the **searchable long-term archive**, not the rolling context window. Agents should:
+
+- Use memory files for "what have we been working on lately"
+- Use AE Journals search for "what do I know about topic X"
+
+---
+
+## 5. Model Routing (Future)
+
+Currently hardcoded: Claude default, Gemini fallback. Future intelligent routing:
+
+| Task type | Model | Reason |
+|---|---|---|
+| User-facing conversation | Claude | Quality prose, reasoning |
+| Tool use / orchestration | Gemini API | Native function calling, free |
+| Private / sensitive | Ollama (local) | No data leaves the network |
+| Long context (>100k tokens) | Gemini 2.0 | 1M token context window |
+| Code generation | Claude | Strong code quality |
+
+Routing logic lives in `cortex/orchestrator_engine.py` — a simple function that maps task metadata to a backend choice.
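
A routing function of that kind might be as small as the sketch below. Everything here is an assumption for illustration: the `TaskMeta` fields, the backend labels, and the precedence (privacy first, then tools or long context, then Claude) are not taken from the codebase.

```python
from dataclasses import dataclass


@dataclass
class TaskMeta:
    """Hypothetical task metadata; field names are illustrative only."""
    needs_tools: bool = False
    sensitive: bool = False
    est_tokens: int = 0


def choose_backend(meta: TaskMeta) -> str:
    # Privacy wins over everything else: keep sensitive work local.
    if meta.sensitive:
        return "ollama"
    # Tool use and very long inputs go to the Gemini API.
    if meta.needs_tools or meta.est_tokens > 100_000:
        return "gemini"
    # Default: user-facing conversation and code generation.
    return "claude"


chat_backend = choose_backend(TaskMeta())
tool_backend = choose_backend(TaskMeta(needs_tools=True))
private_backend = choose_backend(TaskMeta(needs_tools=True, sensitive=True))
```

Ordering the checks this way means a sensitive task never leaves the network, even when it also needs tools.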
+
+---
+
+## Implementation Order (Recommended)
+
+1. **Orchestrator Phase 1** — Gemini API integration, basic tool loop, `/orchestrate` endpoint
+   - Unlocks: web search in chat, AE Journal queries, cron-triggered tasks
+2. **Knowledge import** — markdown → AE Journal Entries tool + import script
+   - Unlocks: searchable knowledge base for all agents
+3. **Dev agent pipeline** — Frontend + Backend specialist agents
+   - Unlocks: AI-assisted development with supervisor review
+4. **Gitea integration** — webhook receiver + Actions CI
+   - Unlocks: event-driven automation, PR workflow
+5. **Intelligent routing** — model selection by task type
+   - Polish: cost and quality optimization
+
+---
+
+## Key Design Decisions
+
+| Decision | Choice | Rationale |
+|---|---|---|
+| Orchestrator model | Gemini API (not CLI) | Native tool calling; free tier |
+| Responder model | Claude CLI (Pro sub) | Quality output; no API cost |
+| Direct chat bypass | Yes | Don't add latency when tools aren't needed |
+| Tool set | Separate from ae_* MCPs | ae_* tools are stable; don't risk breaking active agents |
+| Dev agents | Claude CLI in project dir | CLAUDE.md + project context already in place |
+| Human approval gate | Required before commit | Agents can propose; humans decide |
+| Knowledge primary source | AE Journals | Already exists, structured, searchable |
--- a/documentation/TODO__Agents.md
+++ b/documentation/TODO__Agents.md
@@ -0,0 +1,144 @@
+# Cortex / Inara — Agent Task List
+
+> Read this file before starting any work on this project.
+> **Status:** Active development — ongoing.
+
+---
+
+## 🔴 High Priority
+
+### [Auth] Token expiry — sudo restart
+- Cortex currently requires `sudo systemctl restart cortex` after OAuth token refresh
+- This must be done manually by the user (`sudo` cannot be run interactively from Claude Code)
+- **Future:** Explore hot-reload or token-passing mechanism so restart isn't required
+
+### [Backend] Ollama local model backend
+- Add Ollama as a third LLM backend option (direct Ollama API, no CLI wrapper)
+- Endpoint: `http://scott-gaming:<port>/api/` (WireGuard)
+- Model selection: configurable per-request or per-session
+- Auth status check: ping `/api/tags` to confirm reachability
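The reachability check in the last bullet might look like the following sketch. It uses only the standard library and treats any HTTP 200 from `/api/tags` as "reachable". The function name and the `base_url` parameter are hypothetical, and the port is deliberately left to the caller because it is not specified above.

```python
import urllib.error
import urllib.request


def ollama_reachable(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if GET <base_url>/api/tags answers with HTTP 200."""
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Covers refused connections, timeouts, HTTP errors, and malformed URLs.
        return False
```

A short timeout seems sensible here, since an unreachable WireGuard peer would otherwise stall the status endpoint.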
+
+### [Testing] Gitea SSH port 2222
+- pfSense port forward configured but not yet verified end-to-end
+- Test: `ssh -p 2222 git@<external>` from outside WireGuard
+- Document result in this file
+
+---
+
+## 🟡 Medium Priority
+
+### [Intelligence] Orchestrator service — Phase 1
+See `ARCH__Intelligence_Layer.md` for full design. Initial scope:
+- [ ] Add Gemini API (google-genai SDK) as a library dependency (not CLI)
+- [ ] Create `cortex/routers/orchestrator.py` — `POST /orchestrate` endpoint
+- [ ] Basic tool registry: web search (DuckDuckGo), AE API query, file read
+- [ ] ReAct loop: Gemini calls tools, assembles context, hands off to Claude for final response
+- [ ] `GET /orchestrate/{job_id}` — poll for status/result
+- [ ] Cron can trigger via HTTP POST (same endpoint)
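The submit-then-poll pattern behind `POST /orchestrate` and `GET /orchestrate/{job_id}` can be sketched without any web framework. This is not the router's actual code: `JOBS`, `submit`, and `poll` are hypothetical names, and the job table is held in memory only.

```python
import asyncio
import uuid
from typing import Awaitable, Callable, Dict

# In-memory job table (an assumption; a real service might persist this).
JOBS: Dict[str, dict] = {}


async def submit(work: Callable[[], Awaitable[str]]) -> str:
    """Register a job, start it in the background, and return its id at once."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"status": "running", "result": None, "error": None}

    async def runner() -> None:
        try:
            JOBS[job_id]["result"] = await work()
            JOBS[job_id]["status"] = "done"
        except Exception as exc:  # record the failure instead of losing it
            JOBS[job_id]["error"] = str(exc)
            JOBS[job_id]["status"] = "failed"

    # Keep a reference so the task is not garbage collected mid-run.
    JOBS[job_id]["task"] = asyncio.create_task(runner())
    return job_id


def poll(job_id: str) -> dict:
    """Return the public view of a job, or an 'unknown' status."""
    job = JOBS.get(job_id)
    if job is None:
        return {"status": "unknown"}
    return {key: job[key] for key in ("status", "result", "error")}
```

Because `submit` returns before the work finishes, a cron caller and a chat client can share the same endpoint and simply poll at different intervals.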
+
+### [Intelligence] Knowledge consolidation — Phase 1
+See `ARCH__Intelligence_Layer.md` for full design. Initial scope:
+- [ ] Tool: `ae_journal_search` — search before creating to avoid duplicates
+- [ ] Tool: `ae_journal_entry_create` — write a new entry with source metadata
+- [ ] Import script: walk a markdown directory, chunk by H2 section, create entries
+- [ ] Target: markdown files from `~/DgrZone_Nextcloud/` and `~/OSIT_Nextcloud/`
+- [ ] Tag strategy: source path, date, topic tags from frontmatter or filename
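The "chunk by H2 section" step of the import script could be sketched as below. `chunk_by_h2` is a hypothetical helper, not an existing function in the repo. It is intentionally naive: it does not skip `## ` lines that appear inside fenced code blocks, which a real importer would probably need to handle.

```python
import re
from typing import List, Tuple

# Matches "## Title" but not "### Title" or deeper headings.
H2 = re.compile(r"^## +(.+?)\s*$")


def chunk_by_h2(markdown: str) -> List[Tuple[str, str]]:
    """Split markdown into (title, body) pairs, one per H2 section.

    Text before the first H2 is returned with an empty title.
    """
    chunks: List[Tuple[str, str]] = []
    title, body = "", []
    for line in markdown.splitlines():
        match = H2.match(line)
        if match:
            if title or any(s.strip() for s in body):
                chunks.append((title, "\n".join(body).strip()))
            title, body = match.group(1), []
        else:
            body.append(line)
    if title or any(s.strip() for s in body):
        chunks.append((title, "\n".join(body).strip()))
    return chunks
```

Each returned pair would then map to one journal entry, with the title available as a candidate topic tag.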
+
+### [Channel] Nextcloud Talk integration — stabilize
+- NC Talk bot is implemented (`cortex/routers/nextcloud_talk.py`)
+- HMAC signing: sign `random + message_text` (NOT raw body) — already fixed
+- [ ] Test end-to-end after any Cortex restart
+- [ ] Complete the bot registration documentation in `docs/NEXTCLOUD_TALK_BOT.md`
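The signing rule noted above (sign `random + message_text`, not the raw body) can be sketched with the standard library. `sign_message` is a hypothetical helper, and HMAC-SHA256 with a hex digest is an assumption based on how Nextcloud Talk bots are generally documented. The router's real implementation may differ.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple


def sign_message(
    secret: str, message_text: str, random: Optional[str] = None
) -> Tuple[str, str]:
    """Return (random, signature) for an outgoing bot message.

    The signed payload is random + message_text, not the JSON request body.
    """
    if random is None:
        random = secrets.token_hex(32)  # fresh nonce per request
    signature = hmac.new(
        secret.encode("utf-8"),
        (random + message_text).encode("utf-8"),
        hashlib.sha256,  # assumption: SHA-256, hex-encoded
    ).hexdigest()
    return random, signature
```

When verifying a signature on the receiving side, `hmac.compare_digest` is preferable to `==` because it avoids timing leaks.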
+
+### [Multi-user] Holly agent instance
+- Plan: run two separate Cortex instances, not multi-user in one service
+- Reverse proxy: `inara.dgrzone.com` → port A, `holly.dgrzone.com` → port B
+- [ ] Create `holly/` identity directory (parallel to `inara/`)
+- [ ] Second `docker-compose` service or separate systemd unit
+
+---
+
+## 🟢 Lower Priority / Future
+
+### [Intelligence] Dev agent pipeline
+See `ARCH__Intelligence_Layer.md`. Full design not yet started.
+- [ ] Specialist agent: frontend (SvelteKit) code changes
+- [ ] Specialist agent: backend (FastAPI) code changes
+- [ ] Supervisor agent: diff review, syntax check, test runner
+- [ ] Gitea webhook integration: trigger on push/PR, report back
+- [ ] Human approval gate before commit
+
+### [Intelligence] Supervisor agent
+- Runs `py_compile`, `svelte-check`, unit tests after specialist agent work
+- Reports pass/fail back to orchestrator
+- Only commits on explicit approval
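The `py_compile` part of the supervisor's check could be as small as the sketch below. `syntax_check` is a hypothetical name; bytecode is written to a scratch directory so that the project tree is left untouched. Running `svelte-check` and the unit tests would be separate steps not shown here.

```python
import py_compile
import tempfile
from pathlib import Path
from typing import Dict, Iterable


def syntax_check(paths: Iterable[str]) -> Dict[str, str]:
    """Return {path: error message} for files that fail to compile.

    An empty dict means every file passed.
    """
    failures: Dict[str, str] = {}
    with tempfile.TemporaryDirectory() as scratch:
        for index, path in enumerate(paths):
            target = str(Path(scratch) / f"{index}.pyc")
            try:
                py_compile.compile(path, cfile=target, doraise=True)
            except py_compile.PyCompileError as exc:
                failures[path] = exc.msg
            except OSError as exc:  # missing or unreadable file
                failures[path] = str(exc)
    return failures
```

Returning the failures as data rather than raising lets the supervisor report a complete pass/fail summary to the orchestrator in one message.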
+
+### [Channel] Gitea webhooks
+- Receive push/PR/issue events → route to appropriate agent
+- `cortex/routers/` already has pattern; add `gitea.py`
+- Gitea Actions (CI) for "run tests on push" — simpler than custom runner
+
+### [Channel] Google Chat integration
+- `cortex/routers/google_chat.py` already exists (stub?)
+- [ ] Review current state, complete or document gaps
+
+### [Distill] Monitor first auto_distill_long run
+- Scheduled for ~April 1 at 04:00
+- Manually review `inara/MEMORY_LONG.md` output before fully trusting it
+- Adjust distill prompts if needed
+
+### [Distill] Distill quality review
+- Short/mid/long distill prompts live in `cortex/memory_distiller.py`
+- After first few automatic runs, review quality and tune
+
+### [Backend] Intelligent model routing
+- Currently hardcoded: Claude default, Gemini fallback
+- Future: route by task type (code → Claude, search → Gemini, private → Ollama)
+- Future: route by context length (Gemini 2.0 has 1M token context)
+
+---
+
+## ✅ Completed
+
+### [UI] Mobile-friendly header
+- Backend toggle, font size, theme buttons moved into ⚙ settings panel
+- Header reduced to 4 buttons: Sessions, Files, ⚙, ?
+- Committed: `mobile_header` (2026-03)
+
+### [UI] Mobile text input
+- `flex-direction: column` on `#input-area` at ≤520px
+- `font-size: 16px` on `#input` (prevents iOS Safari auto-zoom)
+- `body { height: 100dvh }` (handles soft keyboard)
+- Committed: `23f8659` (2026-03)
+
+### [UI] Auth warning banner
+- Claude CLI token expiry check (`~/.claude/.credentials.json`)
+- Gemini CLI auth check (warns only if no `refresh_token`)
+- Dismissible amber/red banner with re-auth instructions
+- Committed: `fe6561b` (2026-03)
+
+### [UI] Distill schedule in ⚙ panel
+- Shows next_run times for short/mid/long distill jobs
+- Fetches from existing `/distill/status` endpoint
+
+### [UI] Help modal collapsible sections
+- H2 sections collapse/expand via `<details>` elements
+- Top 4 sections (Header Controls, Chat, Sessions, Notes) open by default
+
+### [Backend] Gemini CLI backend
+- `gemini -p` subprocess, streaming output
+- Auth check endpoint `/auth/status`
+
+### [Backend] Memory distiller
+- APScheduler jobs: `distill_short` (6h), `distill_mid` (24h), `distill_long` (weekly)
+- Writes to `inara/MEMORY_SHORT.md`, `MEMORY_MID.md`, `MEMORY_LONG.md`
+
+### [Backend] Session logging + file browser
+- Sessions saved to `inara/sessions/`
+- Files panel in UI browses `inara/` directory
+
+### [Backend] Dispatcher core
+- FastAPI service with streaming response
+- `claude -p` and `gemini -p` subprocess backends
+- Session context management (rolling window)
+- Nextcloud Talk webhook handler
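For readers unfamiliar with the "rolling window" mentioned for session context, one common way to get that behaviour is a bounded deque. This sketch is not the dispatcher's actual code: `RollingContext`, `max_turns`, and the `role: text` rendering are all assumptions for illustration.

```python
from collections import deque
from typing import Deque, Dict


class RollingContext:
    """Keep only the most recent turns of a session."""

    def __init__(self, max_turns: int = 20) -> None:
        # deque(maxlen=...) silently drops the oldest item when full.
        self._turns: Deque[Dict[str, str]] = deque(maxlen=max_turns)

    def add(self, role: str, text: str) -> None:
        self._turns.append({"role": role, "text": text})

    def render(self) -> str:
        """Format the retained turns as a prompt prefix."""
        return "\n".join(f"{t['role']}: {t['text']}" for t in self._turns)
```

A turn-count limit is the simplest policy. A token-based limit would track prompt size more closely, at the cost of needing a tokenizer.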