feat: Intelligence Layer Phase 1 — orchestrator service

Adds the Gemini API orchestrator (ReAct tool loop → Claude responder):

Orchestrator engine + router:
- orchestrator_engine.py: Gemini API tool loop, Claude CLI handoff
- routers/orchestrator.py: POST /orchestrate (async job queue), GET /orchestrate/{job_id}

Tools (cortex/tools/):
- web.py: DuckDuckGo web search (no key required)
- ae_knowledge.py: ae_journal_search + ae_journal_entry_create (AE V3 API)
- ae_tasks.py: ae_task_list (reads agents_sync Kanban filesystem)
- files.py: file_read (path-allowlisted to safe dirs)

Config + deps:
- config.py: orchestrator, DuckDuckGo, and AE API settings
- requirements.txt: google-genai, duckduckgo-search
- .env.default: reference config with all new keys documented

Docs:
- CLAUDE.md, README.md, documentation/ added to repo
- Port references updated 7331 → 8000 throughout
- Default model updated to gemini-2.5-flash

Tested: ae_task_list, ae_journal_search, web_search all working end-to-end.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 19:37:49 -04:00
parent 23f8659aaa
commit ed472ce9a0
15 changed files with 1840 additions and 1 deletions
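
The commit message describes `POST /orchestrate` as an async job queue whose result is fetched from `GET /orchestrate/{job_id}`. A minimal sketch of that submit-then-poll pattern follows. It assumes an in-memory job table and plain `asyncio` instead of FastAPI. The names `JOBS`, `submit_job`, `get_job`, and `run_job` are hypothetical and not taken from `routers/orchestrator.py`, and the echoed result only stands in for the real tool loop.

```python
import asyncio
import uuid
from typing import Optional

# Hypothetical in-memory job table: job_id -> {"status": ..., "result": ...}
JOBS: dict = {}
# Keep references to running tasks so they are not garbage collected early.
_TASKS: set = set()


async def run_job(job_id: str, prompt: str) -> None:
    """Stand-in for one orchestrator run; the real tool loop is not shown."""
    JOBS[job_id]["status"] = "running"
    try:
        await asyncio.sleep(0)  # placeholder for tool loop + responder handoff
        JOBS[job_id].update(status="done", result=f"echo: {prompt}")
    except Exception as exc:
        JOBS[job_id].update(status="error", result=str(exc))


def submit_job(prompt: str) -> str:
    """What a POST handler might do: schedule the work, return a job_id at once.

    Must be called from inside a running event loop.
    """
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"status": "queued", "result": None}
    task = asyncio.get_running_loop().create_task(run_job(job_id, prompt))
    _TASKS.add(task)
    task.add_done_callback(_TASKS.discard)
    return job_id


def get_job(job_id: str) -> Optional[dict]:
    """What a GET handler might do: return the job record, or None if unknown."""
    return JOBS.get(job_id)
```

A caller would submit once, then poll `get_job(job_id)` until `status` is `done` or `error`. An in-memory table like this loses jobs on restart, which may or may not match what the commit actually does.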
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,171 @@
+# CLAUDE.md — Cortex / Inara Project
+
+This file is loaded automatically by Claude Code when working in this directory.
+Read it before touching any files.
+
+---
+
+## Identity & Context
+
+- **Project:** Cortex (dispatcher) + Inara (resident agent)
+- **Owner:** Scott Idem (One Sky IT / Danger Zone)
+- **Machine context:** See `~/CLAUDE.md` for fleet identity (`scott_lpt` = General Manager)
+- **Named after:** The 'verse-wide communications network (Firefly)
+
+---
+
+## Directory Map
+
+```
+Cortex_and_Inara_dev/
+  cortex/                ← FastAPI service (the dispatcher)
+    main.py              ← App entry point, router registration
+    config.py            ← All settings (pydantic-settings, reads .env)
+    llm_client.py        ← Claude CLI + Gemini CLI subprocess backends
+    orchestrator_engine.py ← Gemini API ReAct tool loop → Claude handoff
+    context_loader.py    ← Loads Inara's system prompt from inara/ files
+    session_store.py     ← In-memory + file session persistence
+    session_logger.py    ← Writes session turns to inara/sessions/
+    memory_distiller.py  ← Short/mid/long distill jobs (APScheduler)
+    scheduler.py         ← APScheduler setup
+    event_bus.py         ← Internal SSE pub/sub (NC Talk → browser)
+    routers/
+      chat.py            ← POST /chat (streaming SSE)
+      orchestrator.py    ← POST /orchestrate, GET /orchestrate/{job_id}
+      auth.py            ← GET /auth/status (Claude + Gemini CLI token checks)
+      distill.py         ← POST /distill/*, GET /distill/status
+      files.py           ← GET /files (inara/ file browser)
+      nextcloud_talk.py  ← POST /webhook/nextcloud (NC Talk bot)
+      google_chat.py     ← POST /webhook/google (Google Chat — stub)
+    tools/
+      __init__.py        ← Tool registry (Gemini FunctionDeclarations + dispatcher)
+      web.py             ← DuckDuckGo web_search tool
+    static/              ← Single-page web UI (index.html, style.css, app.js)
+    data/sessions/       ← Persisted session JSON files
+
+  inara/                 ← Inara identity, memory, context files
+    IDENTITY.md          ← Who Inara is
+    SOUL.md              ← Values, personality, voice
+    PROTOCOLS.md         ← Behavioral rules
+    CONTEXT_TIERS.md     ← What each tier (1–3) includes in the system prompt
+    USER.md              ← Scott's profile (loaded into context)
+    HELP.md              ← In-app help content (rendered in UI)
+    MEMORY.md            ← Persistent facts (written by distiller or manually)
+    MEMORY_SHORT.md      ← Rolling short-term memory (auto-distilled daily)
+    MEMORY_MID.md        ← Mid-term memory (auto-distilled weekly)
+    MEMORY_LONG.md       ← Long-term memory (auto-distilled monthly)
+    sessions/            ← Session turn logs (YYYY-MM-DD_<id>.md)
+
+  docs/                  ← Integration reference docs
+    NEXTCLOUD_TALK_BOT.md
+
+  documentation/         ← Architecture decisions and agent task list
+    TODO__Agents.md      ← READ THIS FIRST — active task list
+    ARCH__Intelligence_Layer.md ← Orchestrator, dev agent, knowledge architecture
+
+  docker-compose.yml     ← Docker deployment
+  .env.default           ← Reference config (copy to .env, fill in secrets)
+  README.md              ← Project orientation
+```
+
+---
+
+## Run Commands
+
+```bash
+# Start (Docker)
+docker compose up -d
+
+# Restart service (after any Python change)
+sudo systemctl restart cortex
+
+# Syntax check a file before restarting
+python3 -m py_compile cortex/<file>.py
+
+# Syntax check all routers, tools, and the orchestrator engine
+for f in cortex/routers/*.py cortex/tools/*.py cortex/orchestrator_engine.py; do
+    python3 -m py_compile "$f" && echo "OK: $f"
+done
+
+# Install/update dependencies
+cd cortex && .venv/bin/pip install -r requirements.txt
+
+# Logs
+journalctl -u cortex -f
+
+# Web UI (local):
+#   http://localhost:8000
+
+# Swagger docs:
+#   http://localhost:8000/docs
+```
+
+---
+
+## Key Design Decisions
+
+### Two-Brain Architecture (Orchestrator / Responder)
+- **Gemini API** (`orchestrator_engine.py`) — runs the ReAct tool loop; handles tool calling, planning, research
+- **Claude CLI** (`llm_client.py`) — produces all user-facing responses; receives enriched context from Gemini
+- **Direct chat** bypasses the orchestrator entirely — `POST /chat` goes straight to Claude (faster)
+- **Orchestrated tasks** go to `POST /orchestrate` — returns a job_id, result is polled
+
+### LLM Backends
+- `llm_client.py` manages Claude CLI (`claude --print`) and Gemini CLI (`gemini -p`) subprocesses
+- `orchestrator_engine.py` uses the Gemini **API** (google-genai SDK) — completely separate from the Gemini CLI
+- The Claude OAuth token is read live from `~/.claude/.credentials.json` (never rely on a stale env var)
+
+### Tool Strategy
+- Orchestrator tools live in `cortex/tools/` — separate from the `ae_*` MCP tools
+- **Do not modify** the `ae_*` MCP server to support orchestrator needs; add new tools to `cortex/tools/` instead
+- Tools are registered in `cortex/tools/__init__.py` as both Gemini FunctionDeclarations and Python callables
+
+### Context / Memory
+- `context_loader.py` assembles Inara's system prompt from `inara/` files based on tier (1–3)
+- Tier 1 = minimal (identity only); Tier 2 = standard (+ memory + user profile); Tier 3 = full
+- Memory files are written by the distiller or manually — do not delete them
+
+### Security / Safety
+- **Never `rm`** — move files to `~/tmp/gemini_trash`
+- **Never commit secrets** — `.env` is gitignored; use `.env.default` as the reference
+- `NEXTCLOUD_TALK_BOT_SECRET` and `GEMINI_API_KEY` live in `.env` only
+- Cortex should only be accessible via WireGuard — never internet-exposed without VPN
+
+---
+
+## Adding a New Tool
+
+1. Implement the tool function in `cortex/tools/<domain>.py`
+   - Must be `async def`; use `asyncio.to_thread` for blocking calls
+   - Return a plain string result
+2. Add a `FunctionDeclaration` and register it in `cortex/tools/__init__.py`
+3. Syntax check: `python3 -m py_compile cortex/tools/<domain>.py`
+4. Restart Cortex
+
+## Adding a New Router
+
+1. Create `cortex/routers/<name>.py` with `router = APIRouter()`
+2. Import and register in `cortex/main.py`
+3. Syntax check, restart
+
+---
+
+## Active Tasks
+
+See `documentation/TODO__Agents.md` for the current task list.
+High-priority items as of 2026-03-18:
+- Ollama backend (third LLM option — local, no API cost)
+- NC Talk integration stabilization
+- Knowledge consolidation (markdown → AE Journals)
+
+---
+
+## Related Docs
+
+| File | Purpose |
+|---|---|
+| `documentation/TODO__Agents.md` | Active task list — read before starting work |
+| `documentation/ARCH__Intelligence_Layer.md` | Full architecture design |
+| `~/agents_sync/projects/CORTEX.md` | High-level project vision and phases |
+| `~/agents_sync/CLAUDE.md` | Fleet coordination rules |
+| `~/CLAUDE.md` | Machine identity (`scott_lpt`) |
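
The "Adding a New Tool" steps in CLAUDE.md (an `async def` tool, `asyncio.to_thread` for blocking calls, a plain string result, registration in `cortex/tools/__init__.py`) can be sketched as below. This is a minimal illustration under assumptions, not the repository's code: `example_lookup`, `_blocking_lookup`, `TOOLS`, and `dispatch` are hypothetical names. The Gemini `FunctionDeclaration` half of the registry is left out so the sketch runs without the google-genai dependency.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict


def _blocking_lookup(query: str) -> str:
    """Hypothetical blocking call, standing in for an HTTP request or file read."""
    time.sleep(0.01)
    return f"results for {query!r}"


async def example_lookup(query: str) -> str:
    """Tool body: async, offloads the blocking work, returns a plain string."""
    return await asyncio.to_thread(_blocking_lookup, query)


# Hypothetical registry mapping tool name -> async callable.
TOOLS: Dict[str, Callable[..., Awaitable[str]]] = {
    "example_lookup": example_lookup,
}


async def dispatch(name: str, args: dict) -> str:
    """Run a tool by name; report failures as strings instead of raising."""
    tool = TOOLS.get(name)
    if tool is None:
        return f"error: unknown tool {name!r}"
    try:
        return await tool(**args)
    except Exception as exc:
        return f"error: {name} failed: {exc}"
```

Returning error text rather than raising is one possible design choice: it lets a tool loop pass the failure back to the model as an ordinary observation. Whether the actual dispatcher behaves this way is not stated in the commit.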