feat: Intelligence Layer Phase 1 — orchestrator service

Adds the Gemini API orchestrator (ReAct tool loop → Claude responder):

Orchestrator engine + router:
- orchestrator_engine.py: Gemini API tool loop, Claude CLI handoff
- routers/orchestrator.py: POST /orchestrate (async job queue), GET /orchestrate/{job_id}

Tools (cortex/tools/):
- web.py: DuckDuckGo web search (no key required)
- ae_knowledge.py: ae_journal_search + ae_journal_entry_create (AE V3 API)
- ae_tasks.py: ae_task_list (reads agents_sync Kanban filesystem)
- files.py: file_read (path-allowlisted to safe dirs)

Config + deps:
- config.py: orchestrator, DuckDuckGo, and AE API settings
- requirements.txt: google-genai, duckduckgo-search
- .env.default: reference config with all new keys documented

Docs:
- CLAUDE.md, README.md, documentation/ added to repo
- Port references updated 7331 → 8000 throughout
- Default model updated to gemini-2.5-flash

Tested: ae_task_list, ae_journal_search, web_search all working end-to-end.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Scott Idem
2026-03-18 19:37:49 -04:00
parent 23f8659aaa
commit ed472ce9a0
15 changed files with 1840 additions and 1 deletions


@@ -0,0 +1,306 @@
# Architecture: Intelligence Layer
**Status:** Design phase — not yet implemented
**Last updated:** 2026-03-18
This document captures the architectural thinking behind expanding Cortex from a smart dispatcher into a genuine intelligence layer: capable of using tools, coordinating specialist agents, and managing a personal knowledge base.
---
## Overview
Cortex currently dispatches chat messages to LLM CLI backends and returns the response. The Intelligence Layer adds three major capabilities on top of that foundation:
1. **Orchestrator/Responder** — Gemini handles tool use and planning; Claude handles the user-facing response
2. **Dev Agent Pipeline** — Specialist agents implement code changes; a supervisor checks the work
3. **Knowledge Layer** — AE Journals becomes the primary knowledge base; agents can read and write it
These are independent tracks that share the same trigger layer and can be built incrementally.
---
## 1. Orchestrator / Responder Pattern
### The Problem
Claude CLI (via Pro subscription) doesn't expose direct API tool-calling. Gemini API (free tier) does. But Claude produces higher-quality user-facing prose and reasoning. The solution is to use each model for what it does best.
### The Pattern
```
User message
    ↓
Orchestrator (Gemini API)
    • interprets intent
    • decides which tools to call
    • executes tool loop (ReAct: reason → act → observe → repeat)
    • assembles enriched context + tool results
    ↓
Responder (Claude CLI)
    • receives enriched context
    • writes the user-facing response
    ↓
User
```
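The reason → act → observe loop in the diagram can be sketched without any SDK. This is a minimal illustration, not the code in `orchestrator_engine.py`: `ToolCall`, `LoopResult`, `run_tool_loop`, and the stub `fake_model` are hypothetical names, and a plain callable stands in for the Gemini API so the example runs offline.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """Hypothetical shape for 'the model wants to call a tool'."""
    name: str
    args: dict


@dataclass
class LoopResult:
    context: str
    tool_calls: list = field(default_factory=list)


def run_tool_loop(task, model_step, tools, max_steps=5):
    """Reason -> act -> observe until the model stops asking for tools."""
    observations = []
    result = LoopResult(context="")
    for _ in range(max_steps):
        step = model_step(task, observations)          # reason
        if not isinstance(step, ToolCall):
            result.context = step                      # model returned final context
            return result
        try:
            output = tools[step.name](**step.args)     # act
        except Exception as exc:                       # report errors back to the model
            output = f"tool error: {exc}"
        observations.append((step.name, output))       # observe
        result.tool_calls.append({"name": step.name, "args": step.args})
    result.context = "step limit reached: " + repr(observations)
    return result


def fake_model(task, observations):
    """Stub standing in for the model: search once, then summarize."""
    if not observations:
        return ToolCall("web_search", {"query": task})
    return f"Task: {task}\nFindings: {observations[0][1]}"


demo = run_tool_loop(
    "wireguard mtu",
    fake_model,
    {"web_search": lambda query: f"results for {query}"},
)
```

The `max_steps` cap is an assumption added here so that a model which keeps requesting tools cannot loop forever; `demo.context` is what would be handed to the responder.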
For **direct chat** (no tools needed), the orchestrator is bypassed entirely — message goes straight to Claude. The orchestrator only activates when tools are required or when explicitly invoked (e.g., a background task).
### Why Gemini API (not CLI)?
- Gemini CLI is a subprocess; function calling via subprocess is fragile
- Gemini API (`google-genai` SDK, as listed in `requirements.txt`) has native structured tool-calling
- Free tier (Gemini 2.0 Flash) handles orchestration load without cost
- Access token is short-lived but auto-refreshed by the SDK (no expiry problem)
### Tool Strategy
Tools for the orchestrator are **separate** from the existing `ae_*` MCP tools. The ae_* tools are stable and used by existing agents — do not modify them.
New orchestrator tools are Python functions wrapped in Gemini function declarations:
| Tool | What it does | Implementation |
|---|---|---|
| `web_search` | DuckDuckGo search | `duckduckgo-search` library |
| `ae_journal_search` | Search AE Journals via V3 API | HTTP to AE API |
| `ae_journal_entry_create` | Write a new journal entry | HTTP to AE API |
| `ae_task_list` | Read Kanban tasks | HTTP to AE API or agents_sync file |
| `file_read` | Read a file from known safe paths | Python `pathlib` |
| `gitea_api` | Query Gitea repos, issues, PRs | Gitea REST API |
Tools are registered in `cortex/tools/` (one file per domain group).
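As an illustration of the `file_read` row above, the following sketch shows one way to restrict reads to an allowlist with `pathlib`. It is an assumption about the approach, not the contents of `files.py`: the `roots` parameter, the `max_bytes` cap, and the temporary demo directory are hypothetical.

```python
import tempfile
from pathlib import Path


def file_read(path, roots, max_bytes=64_000):
    """Return file text only if the resolved path sits inside an allowed root."""
    allowed = [Path(root).resolve() for root in roots]
    target = Path(path).expanduser().resolve()   # collapses ".." and follows symlinks
    if not any(target.is_relative_to(root) for root in allowed):
        raise PermissionError(f"{target} is outside the allowed directories")
    with target.open("rb") as handle:
        return handle.read(max_bytes).decode("utf-8", errors="replace")


# Demo against a throwaway directory (hypothetical stand-in for the safe dirs).
safe_dir = Path(tempfile.mkdtemp())
(safe_dir / "note.md").write_text("hello from the allowlist")
content = file_read(safe_dir / "note.md", roots=[safe_dir])

try:
    file_read(safe_dir / ".." / "outside.txt", roots=[safe_dir])
    escaped = True
except PermissionError:
    escaped = False
```

Resolving the path before the containment check is the important step: comparing raw strings would let `..` segments or symlinks escape the allowed directories. `Path.is_relative_to` requires Python 3.9 or newer.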
### Implementation Path
```
cortex/
  tools/
    __init__.py            # tool registry
    web.py                 # web_search
    ae_knowledge.py        # ae_journal_* tools
    ae_tasks.py            # task tools
    gitea.py               # Gitea API tools
  routers/
    orchestrator.py        # POST /orchestrate, GET /orchestrate/{job_id}
  orchestrator_engine.py   # Gemini tool loop + Claude handoff
```
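A tool registry of the kind `tools/__init__.py` is meant to hold could look roughly like this. It is a sketch under assumptions: the `tool` decorator, `TOOL_REGISTRY`, `declarations`, and `dispatch` are hypothetical names, the declaration is a generic JSON-schema-style dict rather than any SDK type, and `web_search` is stubbed so nothing touches the network.

```python
from typing import Callable

TOOL_REGISTRY: dict = {}


def tool(name: str, description: str, parameters: dict):
    """Register a function together with a JSON-schema-style declaration."""
    def decorator(func: Callable) -> Callable:
        TOOL_REGISTRY[name] = {
            "declaration": {
                "name": name,
                "description": description,
                "parameters": parameters,
            },
            "handler": func,
        }
        return func
    return decorator


@tool(
    name="web_search",
    description="Search the web and return result snippets.",
    parameters={
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
)
def web_search(query: str) -> list:
    return [f"stub result for {query}"]   # placeholder, no real search


def declarations() -> list:
    """Declarations to advertise to the orchestrator model."""
    return [entry["declaration"] for entry in TOOL_REGISTRY.values()]


def dispatch(name: str, args: dict):
    """Run the handler the model asked for."""
    return TOOL_REGISTRY[name]["handler"](**args)
```

Keeping the declaration next to the handler means a new tool is added in one place, and the loop only ever needs `declarations()` and `dispatch()`.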
Endpoint contract:
```
POST /orchestrate
{
  "task": "What tasks are due this week and summarize my notes on X topic",
  "session_id": "optional; set if part of an ongoing conversation",
  "respond_with_claude": true  // false = return Gemini's assembled context only
}
→ { "job_id": "uuid", "status": "queued" }
GET /orchestrate/{job_id}
→ { "status": "complete", "result": "...", "tool_calls": [...] }
```
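The submit-then-poll contract above implies some job bookkeeping behind the two routes. The sketch below is an in-memory stand-in with hypothetical names (`Job`, `JobStore`); it assumes a background worker calls `complete()` when the tool loop finishes and leaves out FastAPI wiring and persistence.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Job:
    task: str
    status: str = "queued"
    result: Optional[str] = None
    tool_calls: list = field(default_factory=list)


class JobStore:
    """In-memory job table backing POST /orchestrate and GET /orchestrate/{job_id}."""

    def __init__(self):
        self._jobs = {}

    def submit(self, task: str) -> dict:
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = Job(task=task)
        return {"job_id": job_id, "status": "queued"}

    def complete(self, job_id: str, result: str, tool_calls: list) -> None:
        job = self._jobs[job_id]
        job.status, job.result, job.tool_calls = "complete", result, tool_calls

    def get(self, job_id: str) -> Optional[dict]:
        job = self._jobs.get(job_id)
        if job is None:
            return None   # a router would translate this to HTTP 404
        return {"status": job.status, "result": job.result, "tool_calls": job.tool_calls}


store = JobStore()
ticket = store.submit("What tasks are due this week?")
store.complete(ticket["job_id"], "Two tasks are due.", [{"name": "ae_task_list"}])
status = store.get(ticket["job_id"])
```

An in-memory dict loses queued jobs on restart, which may be acceptable for a single-user service but is worth stating as a limitation.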
---
## 2. Trigger Layer
All three capabilities (chat, orchestration, dev agents) share the same trigger layer:
```
┌────────────────────────────────────────────────┐
│ TRIGGERS │
│ │
│ Chat UI → POST /chat (existing) │
│ Cron → POST /orchestrate (new) │
│ Gitea → POST /webhook/gitea (new) │
│ NC Talk → POST /webhook/nextcloud (exists) │
│ Manual → CLI / curl for debugging │
└────────────────────────────────────────────────┘
```
Cron trigger example (from existing cron infrastructure):
```bash
curl -X POST http://localhost:8000/orchestrate \
-H "Content-Type: application/json" \
-d '{"task": "Check for overdue Kanban tasks and notify via NC Talk"}'
```
This means the same orchestrator endpoint is usable from chat, crons, and webhooks without any special cases.
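For callers that are Python scripts rather than shell crons, the same request can be built with the standard library. This is a sketch only: `build_orchestrate_request` is a hypothetical helper, and the request is constructed but not sent so the example runs without a live Cortex instance.

```python
import json
import urllib.request


def build_orchestrate_request(task, base_url="http://localhost:8000"):
    """Build the same POST the curl example sends."""
    body = json.dumps({"task": task}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/orchestrate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_orchestrate_request("Check for overdue Kanban tasks and notify via NC Talk")
# urllib.request.urlopen(req) would submit the job; omitted to keep this offline.
```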
---
## 3. Dev Agent Pipeline
### The Goal
Accept a plain-English task like *"Fix the bug where X, add a test for it"* and produce:
- A working code change
- Passing syntax/type checks
- A summary of what changed and what still needs human review
- A commit ready to push (pending approval)
### Architecture
```
Task request (chat / Gitea issue / Kanban)
    ↓
Orchestrator
    • reads relevant files (context gathering)
    • routes to correct specialist
    ↓
Specialist Agent (Claude CLI in project directory)
    • implements the change
    • runs self-check: py_compile / svelte-check
    ↓
Supervisor Agent
    • reviews the diff
    • runs test suite
    • returns: PASS / NEEDS_REVIEW / FAIL + reason
    ↓
Human approval gate
    • summary shown in Cortex UI or NC Talk
    • user approves → commit + optional push
    • user rejects → feedback goes back to specialist
```
### Specialist Agents
Two initial specialists, both using Claude CLI:
**Frontend specialist** (working dir: `~/OSIT_dev/aether_app_sveltekit/`):
- Reads `documentation/TODO__Agents.md` and `CLAUDE.md` before acting
- Runs `npx svelte-check` after every change — no exceptions
- Atomic commits (one component or fix per commit)
**Backend specialist** (working dir: `~/OSIT_dev/aether_api_fastapi/`):
- Reads `documentation/TODO__Agents.md` and `CLAUDE.md` before acting
- Runs `python3 -m py_compile` after every file edit
- Runs unit tests before declaring done
- Flags E2E tests that need human review
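The backend specialist's `py_compile` step can also be run from Python, which makes the results easy to pass on to the supervisor. A minimal sketch, assuming a hypothetical `compile_check` helper; the two temporary files exist only to demonstrate a pass and a failure.

```python
import py_compile
import tempfile
from pathlib import Path


def compile_check(paths):
    """Byte-compile each file; return (all_passed, {path: error message})."""
    failures = {}
    for path in paths:
        try:
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            failures[str(path)] = exc.msg
    return (not failures, failures)


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp, "good.py")
    good.write_text("x = 1\n")
    bad = Path(tmp, "bad.py")
    bad.write_text("def broken(:\n")
    passed, failures = compile_check([good, bad])
```

`py_compile` only catches syntax errors; it does not import the module or run type checks, so it complements the unit tests rather than replacing them.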
### Supervisor Agent
The supervisor is a separate Claude invocation that receives:
- The diff of all changed files
- Stdout/stderr from all checks that were run
- The original task description
It returns a structured assessment:
```json
{
  "verdict": "PASS | NEEDS_REVIEW | FAIL",
  "checks_passed": ["py_compile", "unit_tests"],
  "checks_failed": [],
  "review_notes": "E2E tests not run; change touches the auth router, recommend manual check",
  "commit_message": "fix: correct session token validation in auth middleware"
}
```
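Because the supervisor's output feeds the approval gate, it is worth validating before anything acts on it. The sketch below assumes hypothetical names (`Assessment`, `parse_assessment`); the rule that a `PASS` verdict may not list failed checks is an added assumption, not something the design states.

```python
import json
from dataclasses import dataclass

VERDICTS = {"PASS", "NEEDS_REVIEW", "FAIL"}


@dataclass
class Assessment:
    verdict: str
    checks_passed: list
    checks_failed: list
    review_notes: str
    commit_message: str


def parse_assessment(raw: str) -> Assessment:
    """Parse the supervisor's JSON and reject malformed verdicts."""
    data = json.loads(raw)
    assessment = Assessment(**data)   # TypeError on missing or unexpected keys
    if assessment.verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {assessment.verdict!r}")
    if assessment.verdict == "PASS" and assessment.checks_failed:
        raise ValueError("PASS verdict cannot list failed checks")
    return assessment


sample = json.dumps({
    "verdict": "NEEDS_REVIEW",
    "checks_passed": ["py_compile", "unit_tests"],
    "checks_failed": [],
    "review_notes": "E2E tests not run",
    "commit_message": "fix: correct session token validation in auth middleware",
})
parsed = parse_assessment(sample)
```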
### Gitea Integration
- **Gitea webhooks → Cortex:** Push/PR events trigger supervisor review automatically
- **Gitea Actions:** Run `py_compile`/`svelte-check` on every push (simple CI, no custom runner)
- **Cortex → Gitea:** After human approval, supervisor calls Gitea API to create PR or push
Gitea Actions are simpler than they sound — a `.gitea/workflows/check.yml` is just a YAML file that runs shell commands on push. No external CI infrastructure needed.
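A webhook receiver should confirm that events really come from Gitea before triggering a review. The sketch below assumes Gitea's webhook secret is used to sign the raw request body with HMAC-SHA256 and that the hex digest arrives in a signature header (`X-Gitea-Signature` in Gitea's webhook documentation; verify against the deployed version). `verify_gitea_signature` is a hypothetical helper name.

```python
import hashlib
import hmac


def verify_gitea_signature(secret: str, body: bytes, signature_header: str) -> bool:
    """Compare the received hex digest with one computed over the raw body."""
    expected = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many characters matched
    return hmac.compare_digest(expected, signature_header or "")


payload = b'{"ref": "refs/heads/main"}'
good_signature = hmac.new(b"demo-secret", payload, hashlib.sha256).hexdigest()
accepted = verify_gitea_signature("demo-secret", payload, good_signature)
rejected = verify_gitea_signature("demo-secret", payload + b" ", good_signature)
```

The check must run on the raw bytes as received; re-serializing parsed JSON can change whitespace and break the comparison.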
---
## 4. Knowledge Layer
### The Goal
AE Journals becomes the primary source of truth for personal and business knowledge. Notes, documentation, and logs that currently live scattered across markdown files get organized into Journals with proper structure, search, and agent-accessible read/write.
### Import Strategy
1. **Don't bulk-import blindly.** The orchestrator searches AE Journals before creating anything (deduplication).
2. **Chunk by section.** A large markdown file becomes multiple journal entries — one per H2 section.
3. **Preserve provenance.** Each imported entry includes source path, import date, and original file date in its `data_json` or notes.
4. **Tag intelligently.** Tags come from: frontmatter, filename keywords, directory path, and content analysis.
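Steps 2 and 3 (chunk by H2, preserve provenance) can be sketched as a small pure function. This is an illustration under assumptions: `chunk_by_h2` and the entry field names are hypothetical, the original file date is omitted because it needs a real file on disk, and `##` lines inside fenced code blocks are not special-cased.

```python
import re
from datetime import date

H2 = re.compile(r"^##\s+(.*)$")   # matches "## Title" but not "### Title"


def chunk_by_h2(text, source_path, import_date=None):
    """Split markdown into one entry per H2 section, tagged with provenance."""
    import_date = import_date or date.today().isoformat()
    sections = [("Preamble", [])]          # text before the first H2
    for line in text.splitlines():
        match = H2.match(line)
        if match:
            sections.append((match.group(1).strip(), []))
        else:
            sections[-1][1].append(line)

    entries = []
    for title, lines in sections:
        body = "\n".join(lines).strip()
        if not body:
            continue                        # skip empty sections
        entries.append({
            "title": title,
            "content": body,
            "data_json": {"source_path": source_path, "import_date": import_date},
        })
    return entries


doc = "# Doc\nintro\n## Setup\nstep one\n### Detail\nmore\n## Empty\n## Notes\nfin"
entries = chunk_by_h2(doc, "notes/wireguard.md", import_date="2026-03-18")
```

Each resulting entry would still go through the deduplicating search from step 1 before being written.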
### Source Priority
| Source | Priority | Notes |
|---|---|---|
| `~/DgrZone_Nextcloud/` | High | Personal notes, projects |
| `~/OSIT_Nextcloud/` | High | Business docs |
| `~/agents_sync/aether/docs/` | Medium | Platform specs (already structured) |
| OpenClaw session logs | Low | Historical, lots of noise |
### Agent Workflow
```
"Summarize my notes on WireGuard setup"
    ↓
Orchestrator calls ae_journal_search("wireguard")
    ↓
Returns matching entries
    ↓
Claude synthesizes a response
```
```
"Save this as a note in my DgrZone journal"
    ↓
Orchestrator calls ae_journal_entry_create(
    journal="DgrZone General",
    title="...",
    content="...",
    tags=["note", "wireguard"]
)
```
### Context Tiers (Inara Memory)
The existing distill system (`MEMORY_SHORT.md`, `MEMORY_MID.md`, `MEMORY_LONG.md`) handles working memory. The Knowledge Layer is complementary — it's the **searchable long-term archive**, not the rolling context window. Agents should:
- Use memory files for "what have we been working on lately"
- Use AE Journals search for "what do I know about topic X"
---
## 5. Model Routing (Future)
Currently hardcoded: Claude default, Gemini fallback. Future intelligent routing:
| Task type | Model | Reason |
|---|---|---|
| User-facing conversation | Claude | Quality prose, reasoning |
| Tool use / orchestration | Gemini API | Native function calling, free |
| Private / sensitive | Ollama (local) | No data leaves the network |
| Long context (>100k tokens) | Gemini 2.0 | 1M token context window |
| Code generation | Claude | Strong code quality |
Routing logic lives in `cortex/orchestrator_engine.py` — a simple function that maps task metadata to a backend choice.
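A routing function of that shape might look like the sketch below. The backend labels, the `choose_backend` name, and the precedence (sensitivity first, then context length, then task type) are assumptions; the table above does not say which rule wins when several apply.

```python
def choose_backend(task_type, *, sensitive=False, context_tokens=0):
    """Map task metadata to a backend label, following the routing table."""
    if sensitive:
        return "ollama"            # private data stays on the local network
    if context_tokens > 100_000:
        return "gemini"            # long-context requests
    if task_type in {"tool_use", "orchestration"}:
        return "gemini_api"        # native function calling
    return "claude"                # conversation and code generation


routes = {
    "chat": choose_backend("conversation"),
    "private": choose_backend("conversation", sensitive=True),
    "long": choose_backend("conversation", context_tokens=250_000),
    "tools": choose_backend("tool_use"),
}
```

Checking sensitivity first keeps a private, long-context request on the local model instead of sending it to a hosted one.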
---
## Implementation Order (Recommended)
1. **Orchestrator Phase 1** — Gemini API integration, basic tool loop, `/orchestrate` endpoint
- Unlocks: web search in chat, AE Journal queries, cron-triggered tasks
2. **Knowledge import** — markdown → AE Journal Entries tool + import script
- Unlocks: searchable knowledge base for all agents
3. **Dev agent pipeline** — Frontend + Backend specialist agents
- Unlocks: AI-assisted development with supervisor review
4. **Gitea integration** — webhook receiver + Actions CI
- Unlocks: event-driven automation, PR workflow
5. **Intelligent routing** — model selection by task type
- Polish: cost and quality optimization
---
## Key Design Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Orchestrator model | Gemini API (not CLI) | Native tool calling; free tier |
| Responder model | Claude CLI (Pro sub) | Quality output; no API cost |
| Direct chat bypass | Yes | Don't add latency when tools aren't needed |
| Tool set | Separate from ae_* MCPs | ae_* tools are stable; don't risk breaking active agents |
| Dev agents | Claude CLI in project dir | CLAUDE.md + project context already in place |
| Human approval gate | Required before commit | Agents can propose; humans decide |
| Knowledge primary source | AE Journals | Already exists, structured, searchable |


@@ -0,0 +1,144 @@
# Cortex / Inara — Agent Task List
> Read this file before starting any work on this project.
> **Status:** Active development — ongoing.
---
## 🔴 High Priority
### [Auth] Token expiry — sudo restart
- Cortex currently requires `sudo systemctl restart cortex` after OAuth token refresh
- This must be done manually by the user (cannot run interactively from Claude Code)
- **Future:** Explore hot-reload or token-passing mechanism so restart isn't required
### [Backend] Ollama local model backend
- Add Ollama as a third LLM backend option (direct Ollama API, no CLI wrapper)
- Endpoint: `http://scott-gaming:<port>/api/` (WireGuard)
- Model selection: configurable per-request or per-session
- Auth status check: ping `/api/tags` to confirm reachability
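The reachability check in the last bullet could be sketched as follows. It assumes Ollama's `/api/tags` endpoint returns JSON with a `models` list; `ollama_status` and the injectable `fetch` parameter are hypothetical, and the base URL is a placeholder because the real port is not given here.

```python
import json
import urllib.request


def ollama_status(base_url, fetch=None, timeout=3.0):
    """Ping /api/tags and report reachability plus available model names."""
    def default_fetch(url):
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()

    fetch = fetch or default_fetch
    try:
        payload = json.loads(fetch(base_url.rstrip("/") + "/api/tags"))
    except (OSError, ValueError) as exc:   # network failure or invalid JSON
        return {"reachable": False, "models": [], "error": str(exc)}
    models = [model.get("name") for model in payload.get("models", [])]
    return {"reachable": True, "models": models, "error": None}


def fake_fetch(url):
    """Offline stand-in for the HTTP call."""
    return b'{"models": [{"name": "example-model"}]}'


def failing_fetch(url):
    raise ConnectionRefusedError("connection refused")


up = ollama_status("http://ollama.example", fetch=fake_fetch)
down = ollama_status("http://ollama.example", fetch=failing_fetch)
```

Passing `fetch` in makes the check testable without the WireGuard link being up.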
### [Testing] Gitea SSH port 2222
- pfSense port forward configured but not yet verified end-to-end
- Test: `ssh -p 2222 git@<external>` from outside WireGuard
- Document result in this file
---
## 🟡 Medium Priority
### [Intelligence] Orchestrator service — Phase 1
See `ARCH__Intelligence_Layer.md` for full design. Initial scope:
- [ ] Add Gemini API (`google-genai` SDK) as a library dependency (not CLI)
- [ ] Create `cortex/routers/orchestrator.py`: `POST /orchestrate` endpoint
- [ ] Basic tool registry: web search (DuckDuckGo), AE API query, file read
- [ ] ReAct loop: Gemini calls tools, assembles context, hands off to Claude for final response
- [ ] `GET /orchestrate/{job_id}` — poll for status/result
- [ ] Cron can trigger via HTTP POST (same endpoint)
### [Intelligence] Knowledge consolidation — Phase 1
See `ARCH__Intelligence_Layer.md` for full design. Initial scope:
- [ ] Tool: `ae_journal_search` — search before creating to avoid duplicates
- [ ] Tool: `ae_journal_entry_create` — write a new entry with source metadata
- [ ] Import script: walk a markdown directory, chunk by H2 section, create entries
- [ ] Target: markdown files from `~/DgrZone_Nextcloud/` and `~/OSIT_Nextcloud/`
- [ ] Tag strategy: source path, date, topic tags from frontmatter or filename
### [Channel] Nextcloud Talk integration — stabilize
- NC Talk bot is implemented (`cortex/routers/nextcloud_talk.py`)
- HMAC signing: sign `random + message_text` (NOT raw body) — already fixed
- [ ] Test end-to-end after any Cortex restart
- [ ] Document the bot registration process in `docs/NEXTCLOUD_TALK_BOT.md` (complete it)
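The signing rule noted above (sign `random + message_text`, not the raw body) can be illustrated as follows. This is a sketch, assuming HMAC-SHA256 with the bot's shared secret; `sign_talk_message` is a hypothetical helper, and the header names that carry the two values should be taken from the Nextcloud Talk bot documentation.

```python
import hashlib
import hmac
import secrets


def sign_talk_message(secret, message_text, random_value=None):
    """Return (random, signature) where the signature covers random + message text."""
    random_value = random_value or secrets.token_hex(32)
    digest = hmac.new(
        secret.encode("utf-8"),
        (random_value + message_text).encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()
    return random_value, digest


nonce, signature = sign_talk_message("bot-secret", "hi", random_value="abc123")
# Signing the serialized JSON body instead yields a different digest,
# which is the mistake the note above warns about.
_, wrong_signature = sign_talk_message("bot-secret", '{"message": "hi"}', random_value="abc123")
```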
### [Multi-user] Holly agent instance
- Plan: run two separate Cortex instances, not multi-user in one service
- Reverse proxy: `inara.dgrzone.com` → port A, `holly.dgrzone.com` → port B
- [ ] Create `holly/` identity directory (parallel to `inara/`)
- [ ] Second `docker-compose` service or separate systemd unit
---
## 🟢 Lower Priority / Future
### [Intelligence] Dev agent pipeline
See `ARCH__Intelligence_Layer.md`. Full design not yet started.
- [ ] Specialist agent: frontend (SvelteKit) code changes
- [ ] Specialist agent: backend (FastAPI) code changes
- [ ] Supervisor agent: diff review, syntax check, test runner
- [ ] Gitea webhook integration: trigger on push/PR, report back
- [ ] Human approval gate before commit
### [Intelligence] Supervisor agent
- Runs `py_compile`, `svelte-check`, unit tests after specialist agent work
- Reports pass/fail back to orchestrator
- Only commits on explicit approval
### [Channel] Gitea webhooks
- Receive push/PR/issue events → route to appropriate agent
- `cortex/routers/` already has pattern; add `gitea.py`
- Gitea Actions (CI) for "run tests on push" — simpler than custom runner
### [Channel] Google Chat integration
- `cortex/routers/google_chat.py` already exists (stub?)
- [ ] Review current state, complete or document gaps
### [Distill] Monitor first auto_distill_long run
- Scheduled for ~April 1 at 04:00
- Manually review `inara/MEMORY_LONG.md` output before fully trusting
- Adjust distill prompts if needed
### [Distill] Distill quality review
- Short/mid/long distill prompts live in `cortex/memory_distiller.py`
- After first few automatic runs, review quality and tune
### [Backend] Intelligent model routing
- Currently hardcoded: Claude default, Gemini fallback
- Future: route by task type (code → Claude, search → Gemini, private → Ollama)
- Future: route by context length (Gemini 2.0 has 1M token context)
---
## ✅ Completed
### [UI] Mobile-friendly header
- Backend toggle, font size, theme buttons moved into ⚙ settings panel
- Header reduced to 4 buttons: Sessions, Files, ⚙, ?
- Committed: `mobile_header` (2026-03)
### [UI] Mobile text input
- `flex-direction: column` on `#input-area` at ≤520px
- `font-size: 16px` on `#input` (prevents iOS Safari auto-zoom)
- `body { height: 100dvh }` (handles soft keyboard)
- Committed: `23f8659` (2026-03)
### [UI] Auth warning banner
- Claude CLI token expiry check (`~/.claude/.credentials.json`)
- Gemini CLI auth check (warns only if no `refresh_token`)
- Dismissible amber/red banner with re-auth instructions
- Committed: `fe6561b` (2026-03)
### [UI] Distill schedule in ⚙ panel
- Shows next_run times for short/mid/long distill jobs
- Fetches from existing `/distill/status` endpoint
### [UI] Help modal collapsible sections
- H2 sections collapse/expand via `<details>` elements
- Top 4 sections (Header Controls, Chat, Sessions, Notes) open by default
### [Backend] Gemini CLI backend
- `gemini -p` subprocess, streaming output
- Auth check endpoint `/auth/status`
### [Backend] Memory distiller
- APScheduler jobs: `distill_short` (6h), `distill_mid` (24h), `distill_long` (weekly)
- Writes to `inara/MEMORY_SHORT.md`, `MEMORY_MID.md`, `MEMORY_LONG.md`
### [Backend] Session logging + file browser
- Sessions saved to `inara/sessions/`
- Files panel in UI browses `inara/` directory
### [Backend] Dispatcher core
- FastAPI service with streaming response
- `claude -p` and `gemini -p` subprocess backends
- Session context management (rolling window)
- Nextcloud Talk webhook handler