Scott Idem ed472ce9a0 feat: Intelligence Layer Phase 1 — orchestrator service

Adds the Gemini API orchestrator (ReAct tool loop → Claude responder):

Orchestrator engine + router:
- orchestrator_engine.py: Gemini API tool loop, Claude CLI handoff
- routers/orchestrator.py: POST /orchestrate (async job queue), GET /orchestrate/{job_id}

Tools (cortex/tools/):
- web.py: DuckDuckGo web search (no key required)
- ae_knowledge.py: ae_journal_search + ae_journal_entry_create (AE V3 API)
- ae_tasks.py: ae_task_list (reads agents_sync Kanban filesystem)
- files.py: file_read (path-allowlisted to safe dirs)

Config + deps:
- config.py: orchestrator, DuckDuckGo, and AE API settings
- requirements.txt: google-genai, duckduckgo-search
- .env.default: reference config with all new keys documented

Docs:
- CLAUDE.md, README.md, documentation/ added to repo
- Port references updated 7331 → 8000 throughout
- Default model updated to gemini-2.5-flash

Tested: ae_task_list, ae_journal_search, web_search all working end-to-end.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-03-18 19:37:49 -04:00

Architecture: Intelligence Layer

Status: Design phase (not yet implemented)
Last updated: 2026-03-18

This document captures the architectural thinking behind expanding Cortex from a smart dispatcher into a genuine intelligence layer: capable of using tools, coordinating specialist agents, and managing a personal knowledge base.

Overview

Cortex currently dispatches chat messages to LLM CLI backends and returns the response. The Intelligence Layer adds three major capabilities on top of that foundation:

- Orchestrator/Responder: Gemini handles tool use and planning; Claude handles the user-facing response
- Dev Agent Pipeline: Specialist agents implement code changes; a supervisor checks the work
- Knowledge Layer: AE Journals becomes the primary knowledge base; agents can read and write it

These are independent tracks that share the same trigger layer and can be built incrementally.

1. Orchestrator / Responder Pattern

The Problem

The Claude CLI (used via a Pro subscription) does not expose API-level tool calling, whereas the Gemini API (free tier) does. Claude, however, produces higher-quality user-facing prose and reasoning. The solution is to use each model for what it does best.

The Pattern

User message
    ↓
Orchestrator (Gemini API)
    • interprets intent
    • decides which tools to call
    • executes tool loop (ReAct: reason → act → observe → repeat)
    • assembles enriched context + tool results
    ↓
Responder (Claude CLI)
    • receives enriched context
    • writes the user-facing response
    ↓
User

For direct chat (no tools needed), the orchestrator is bypassed entirely and the message goes straight to Claude. The orchestrator activates only when tools are required or when it is explicitly invoked (e.g., by a background task).
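The flow above can be sketched as a small loop. This is a minimal illustration, not the contents of orchestrator_engine.py: `ModelTurn`, `run_tool_loop`, `handle_message`, and the `plan_next` / `respond` callables are hypothetical names standing in for the Gemini API call and the Claude CLI handoff.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ModelTurn:
    """One orchestrator step: a tool request, or a final context summary."""
    tool_name: Optional[str] = None
    tool_args: dict = field(default_factory=dict)
    final_context: Optional[str] = None


def run_tool_loop(task: str, plan_next: Callable, tools: dict, max_steps: int = 5):
    """ReAct loop: reason (plan_next) -> act (call tool) -> observe -> repeat."""
    observations = []
    for _ in range(max_steps):
        turn = plan_next(task, observations)           # reason
        if turn.tool_name is None:                     # model is done planning
            return turn.final_context or "", observations
        tool = tools.get(turn.tool_name)
        if tool is None:
            result = f"error: unknown tool {turn.tool_name!r}"
        else:
            result = tool(**turn.tool_args)            # act
        observations.append(                           # observe
            {"tool": turn.tool_name, "args": turn.tool_args, "result": result}
        )
    return "step limit reached", observations


def handle_message(task: str, needs_tools: bool, plan_next: Callable,
                   tools: dict, respond: Callable[[str], str]) -> str:
    """Bypass the orchestrator for plain chat; otherwise enrich, then respond."""
    if not needs_tools:
        return respond(task)
    context, observations = run_tool_loop(task, plan_next, tools)
    enriched = f"{task}\n\nContext:\n{context}\n\nTool results:\n{observations}"
    return respond(enriched)
```

The `max_steps` cap is an assumption added here so that a confused planner cannot loop indefinitely.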

Why Gemini API (not CLI)?

- The Gemini CLI runs as a subprocess, and function calling through a subprocess is fragile
- The Gemini API (google-genai SDK, as listed in requirements.txt) has native structured tool calling
- The free tier (Gemini 2.0 Flash) handles orchestration load without cost
- The access token is short-lived but auto-refreshed by the SDK (no expiry problem)

Tool Strategy

Tools for the orchestrator are separate from the existing ae_* MCP tools. The ae_* tools are stable and used by existing agents — do not modify them.

New orchestrator tools are Python functions wrapped in Gemini function declarations:

| Tool | What it does | Implementation |
| --- | --- | --- |
| `web_search` | DuckDuckGo search | `duckduckgo-search` library |
| `ae_journal_search` | Search AE Journals via V3 API | HTTP to AE API |
| `ae_journal_entry_create` | Write a new journal entry | HTTP to AE API |
| `ae_task_list` | Read Kanban tasks | HTTP to AE API or agents_sync file |
| `file_read` | Read a file from known safe paths | Python `pathlib` |
| `gitea_api` | Query Gitea repos, issues, PRs | Gitea REST API |
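As an illustration of the "known safe paths" constraint on `file_read`, here is a minimal sketch using `pathlib`. The `allowed_roots` and `max_chars` parameters are assumptions for this example, not the actual signature in the repository.

```python
from pathlib import Path


def file_read(path: str, allowed_roots: list, max_chars: int = 100_000) -> str:
    """Read a text file only if it resolves inside one of the allowed roots."""
    # resolve() collapses ".." segments and symlinks before the check,
    # so "safe/../secret.txt" cannot slip past the allowlist.
    target = Path(path).expanduser().resolve()
    roots = [Path(root).expanduser().resolve() for root in allowed_roots]
    if not any(target.is_relative_to(root) for root in roots):
        raise PermissionError(f"{target} is outside the allowed directories")
    if not target.is_file():
        raise FileNotFoundError(str(target))
    # Truncate so a huge file cannot flood the orchestrator's context.
    return target.read_text(encoding="utf-8", errors="replace")[:max_chars]
```

`Path.is_relative_to` requires Python 3.9 or newer.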

Tools are registered in cortex/tools/ (one file per domain group).
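One way such a registry could look is sketched below. The decorator name, the registry layout, and the stubbed `web_search` body are all hypothetical; the parameter schema follows the JSON-Schema-like shape that function declarations generally use, and would need adapting to whatever the SDK expects.

```python
TOOL_REGISTRY: dict = {}


def register_tool(name: str, description: str, parameters: dict):
    """Decorator: store the callable next to its function declaration."""
    def decorator(func):
        TOOL_REGISTRY[name] = {
            "function": func,
            "declaration": {
                "name": name,
                "description": description,
                "parameters": parameters,
            },
        }
        return func
    return decorator


@register_tool(
    name="web_search",
    description="Search the web and return result snippets.",
    parameters={
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
)
def web_search(query: str) -> list:
    # Stub body for illustration; a real tool would call the search library.
    return [{"title": "stub", "snippet": f"results for {query}"}]


def declarations() -> list:
    """Declarations to hand to the model when starting the tool loop."""
    return [entry["declaration"] for entry in TOOL_REGISTRY.values()]


def call_tool(name: str, args: dict):
    """Dispatch a model-requested call to the registered Python function."""
    return TOOL_REGISTRY[name]["function"](**args)
```

Keeping the declaration beside the function means a tool cannot be advertised to the model without an implementation behind it.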

Implementation Path

cortex/
  tools/
    __init__.py          — tool registry
    web.py               — web_search
    ae_knowledge.py      — ae_journal_* tools
    ae_tasks.py          — task tools
    gitea.py             — Gitea API tools
  routers/
    orchestrator.py      — POST /orchestrate, GET /orchestrate/{job_id}
  orchestrator_engine.py — Gemini tool loop + Claude handoff

Endpoint contract:

POST /orchestrate
{
  "task": "What tasks are due this week and summarize my notes on X topic",
  "session_id": "optional — if part of an ongoing conversation",
  "respond_with_claude": true   // false = return Gemini's assembled context only
}

→ { "job_id": "uuid", "status": "queued" }

GET /orchestrate/{job_id}
→ { "status": "complete", "result": "...", "tool_calls": [...] }
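The queued-then-poll contract can be illustrated with an in-memory job store. This is a sketch under assumptions: the `JobStore` class and the "running" and "failed" statuses are hypothetical (the contract above only shows "queued" and "complete"), and a real service would run the worker in a background task rather than inline.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Job:
    task: str
    status: str = "queued"
    result: Optional[str] = None
    tool_calls: list = field(default_factory=list)


class JobStore:
    """In-memory stand-in for the async job queue behind /orchestrate."""

    def __init__(self) -> None:
        self._jobs: dict = {}

    def submit(self, task: str) -> dict:
        """Shape of the POST /orchestrate response."""
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = Job(task=task)
        return {"job_id": job_id, "status": "queued"}

    def run(self, job_id: str, worker: Callable) -> None:
        """Execute the job; worker returns (result, tool_calls)."""
        job = self._jobs[job_id]
        job.status = "running"
        try:
            job.result, job.tool_calls = worker(job.task)
            job.status = "complete"
        except Exception as exc:  # surface the failure to the poller
            job.result = str(exc)
            job.status = "failed"

    def get(self, job_id: str) -> Optional[dict]:
        """Shape of the GET /orchestrate/{job_id} response; None if unknown."""
        job = self._jobs.get(job_id)
        if job is None:
            return None
        return {"status": job.status, "result": job.result,
                "tool_calls": job.tool_calls}
```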

2. Trigger Layer

All three capabilities (chat, orchestration, dev agents) share the same trigger layer:

┌────────────────────────────────────────────────┐
│  TRIGGERS                                      │
│                                                │
│  Chat UI  →  POST /chat  (existing)            │
│  Cron     →  POST /orchestrate  (new)          │
│  Gitea    →  POST /webhook/gitea  (new)        │
│  NC Talk  →  POST /webhook/nextcloud  (exists) │
│  Manual   →  CLI / curl for debugging          │
└────────────────────────────────────────────────┘

Cron trigger example (from existing cron infrastructure):

curl -X POST http://localhost:8000/orchestrate \
  -H "Content-Type: application/json" \
  -d '{"task": "Check for overdue Kanban tasks and notify via NC Talk"}'

This means the same orchestrator endpoint is usable from chat, crons, and webhooks without any special cases.
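For triggers written in Python rather than shell, the same request can be built with the standard library. The helper name `build_orchestrate_request` is hypothetical; the URL and body fields mirror the curl example and endpoint contract above.

```python
import json
import urllib.request


def build_orchestrate_request(task: str,
                              base_url: str = "http://localhost:8000",
                              respond_with_claude: bool = True) -> urllib.request.Request:
    """Build (but do not send) the POST /orchestrate request."""
    body = json.dumps(
        {"task": task, "respond_with_claude": respond_with_claude}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/orchestrate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends it; building and sending are kept separate here so the payload can be inspected or logged first.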

3. Dev Agent Pipeline

The Goal

Accept a plain-English task like "Fix the bug where X, add a test for it" and produce:

- A working code change
- Passing syntax/type checks
- A summary of what changed and what still needs human review
- A commit ready to push (pending approval)

Architecture

Task request (chat / Gitea issue / Kanban)
    ↓
Orchestrator
    • reads relevant files (context gathering)
    • routes to correct specialist
    ↓
Specialist Agent (Claude CLI in project directory)
    • implements the change
    • runs self-check: py_compile / svelte-check
    ↓
Supervisor Agent
    • reviews the diff
    • runs test suite
    • returns: PASS / NEEDS_REVIEW / FAIL + reason
    ↓
Human approval gate
    • summary shown in Cortex UI or NC Talk
    • user approves → commit + optional push
    • user rejects → feedback goes back to specialist

Specialist Agents

Two initial specialists, both using Claude CLI:

Frontend specialist (working dir: ~/OSIT_dev/aether_app_sveltekit/):

- Reads documentation/TODO__Agents.md and CLAUDE.md before acting
- Runs npx svelte-check after every change, without exception
- Makes atomic commits (one component or fix per commit)

Backend specialist (working dir: ~/OSIT_dev/aether_api_fastapi/):

- Reads documentation/TODO__Agents.md and CLAUDE.md before acting
- Runs python3 -m py_compile after every file edit
- Runs unit tests before declaring done
- Flags E2E tests that need human review
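The backend self-check can also be driven from Python, which makes it easy to collect the output the supervisor needs. A minimal sketch, with a hypothetical helper name, using the standard `py_compile` module (the same check that `python3 -m py_compile` performs):

```python
import py_compile


def check_python_files(paths: list) -> dict:
    """Byte-compile each file; map path -> None if OK, else the error text."""
    results = {}
    for path in paths:
        try:
            # doraise=True turns syntax errors into PyCompileError
            # instead of printing them to stderr.
            py_compile.compile(str(path), doraise=True)
            results[str(path)] = None
        except py_compile.PyCompileError as exc:
            results[str(path)] = str(exc)
    return results
```

This only catches syntax errors; it does not replace the unit tests listed above.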

Supervisor Agent

The supervisor is a separate Claude invocation that receives:

- The diff of all changed files
- Stdout/stderr from all checks that were run
- The original task description

It returns a structured assessment:

{
  "verdict": "PASS | NEEDS_REVIEW | FAIL",
  "checks_passed": ["py_compile", "unit_tests"],
  "checks_failed": [],
  "review_notes": "E2E tests not run; change touches auth router, recommend manual check",
  "commit_message": "fix: correct session token validation in auth middleware"
}
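Because the assessment comes from a model, it is worth validating before it reaches the approval gate. The sketch below parses the structure shown above; the `Assessment` class, `parse_assessment`, and the rule that PASS cannot coexist with failed checks are assumptions made for illustration.

```python
import json
from dataclasses import dataclass

VALID_VERDICTS = {"PASS", "NEEDS_REVIEW", "FAIL"}


@dataclass
class Assessment:
    verdict: str
    checks_passed: list
    checks_failed: list
    review_notes: str
    commit_message: str


def parse_assessment(raw: str) -> Assessment:
    """Parse the supervisor's JSON reply and reject malformed verdicts."""
    data = json.loads(raw)
    verdict = data.get("verdict")
    if verdict not in VALID_VERDICTS:
        raise ValueError(f"unexpected verdict: {verdict!r}")
    checks_failed = list(data.get("checks_failed", []))
    # Assumed consistency rule: a PASS with failed checks is contradictory.
    if verdict == "PASS" and checks_failed:
        raise ValueError("PASS verdict is inconsistent with failed checks")
    return Assessment(
        verdict=verdict,
        checks_passed=list(data.get("checks_passed", [])),
        checks_failed=checks_failed,
        review_notes=data.get("review_notes", ""),
        commit_message=data.get("commit_message", ""),
    )
```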

Gitea Integration

- Gitea webhooks → Cortex: Push/PR events trigger supervisor review automatically
- Gitea Actions: Run py_compile/svelte-check on every push (simple CI, no custom runner)
- Cortex → Gitea: After human approval, supervisor calls Gitea API to create PR or push

Gitea Actions are simpler than they sound — a .gitea/workflows/check.yml is just a YAML file that runs shell commands on push. No external CI infrastructure needed.
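On the Cortex side, the webhook receiver mainly needs to authenticate the delivery and decide what to do with the event. The sketch below assumes the receiver is configured with a shared secret and that Gitea sends an HMAC-SHA256 hex digest of the request body in the `X-Gitea-Signature` header and the event name in `X-Gitea-Event`; both function names and the routing table are hypothetical.

```python
import hashlib
import hmac


def verify_gitea_signature(secret: str, body: bytes, signature_hex: str) -> bool:
    """Compare the X-Gitea-Signature header with our own HMAC of the raw body."""
    expected = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected, signature_hex)


def route_gitea_event(event_type: str) -> str:
    """Map the X-Gitea-Event header to an action (assumed routing table)."""
    routes = {
        "push": "supervisor_review",
        "pull_request": "supervisor_review",
    }
    return routes.get(event_type, "ignore")
```

The signature must be computed over the raw request bytes, before any JSON parsing, or the digest will not match.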

4. Knowledge Layer

The Goal

AE Journals becomes the primary source of truth for personal and business knowledge. Notes, documentation, and logs that currently live scattered across markdown files get organized into Journals with proper structure, search, and agent-accessible read/write.

Import Strategy

1. Don't bulk-import blindly. The orchestrator searches AE Journals before creating anything (deduplication).
2. Chunk by section. A large markdown file becomes multiple journal entries, one per H2 section.
3. Preserve provenance. Each imported entry includes source path, import date, and original file date in its data_json or notes.
4. Tag intelligently. Tags come from: frontmatter, filename keywords, directory path, and content analysis.
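The chunking and provenance rules can be sketched as follows. This is an illustration only: the function name, the "Preamble" title for text before the first H2, and the exact keys inside `data_json` are assumptions, and the naive line matcher would also split on a `## ` line inside a fenced code block.

```python
import re
from datetime import date
from typing import Optional

H2_PATTERN = re.compile(r"^##\s+(.*)$")  # matches "## Title", not "### Title"


def chunk_markdown_by_h2(text: str, source_path: str,
                         file_date: Optional[str] = None,
                         import_date: Optional[str] = None) -> list:
    """Split a markdown document into one entry per H2 section."""
    stamp = import_date or date.today().isoformat()
    sections = [("Preamble", [])]  # text before the first H2, if any
    for line in text.splitlines():
        match = H2_PATTERN.match(line)
        if match:
            sections.append((match.group(1).strip(), []))
        else:
            sections[-1][1].append(line)

    entries = []
    for title, lines in sections:
        content = "\n".join(lines).strip()
        if not content:
            continue  # skip empty sections rather than create blank entries
        entries.append({
            "title": title,
            "content": content,
            "data_json": {
                "source_path": source_path,
                "import_date": stamp,
                "original_file_date": file_date,
            },
        })
    return entries
```

Each returned dict would still go through the search-before-create step before anything is written.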

Source Priority

| Source | Priority | Notes |
| --- | --- | --- |
| `~/DgrZone_Nextcloud/` | High | Personal notes, projects |
| `~/OSIT_Nextcloud/` | High | Business docs |
| `~/agents_sync/aether/docs/` | Medium | Platform specs (already structured) |
| OpenClaw session logs | Low | Historical, lots of noise |

Agent Workflow

"Summarize my notes on WireGuard setup"
    ↓
Orchestrator calls ae_journal_search("wireguard")
    ↓
Returns matching entries
    ↓
Claude synthesizes a response

"Save this as a note in my DgrZone journal"
    ↓
Orchestrator calls ae_journal_entry_create(
    journal="DgrZone General",
    title="...",
    content="...",
    tags=["note", "wireguard"]
)

Context Tiers (Inara Memory)

The existing distill system (MEMORY_SHORT.md, MEMORY_MID.md, MEMORY_LONG.md) handles working memory. The Knowledge Layer is complementary — it's the searchable long-term archive, not the rolling context window. Agents should:

- Use memory files for "what have we been working on lately"
- Use AE Journals search for "what do I know about topic X"

5. Model Routing (Future)

Backend selection is currently hardcoded: Claude is the default and Gemini is the fallback. Future intelligent routing would choose a model by task type:

| Task type | Model | Reason |
| --- | --- | --- |
| User-facing conversation | Claude | Quality prose, reasoning |
| Tool use / orchestration | Gemini API | Native function calling, free |
| Private / sensitive | Ollama (local) | No data leaves the network |
| Long context (>100k tokens) | Gemini 2.0 | 1M token context window |
| Code generation | Claude | Strong code quality |

Routing logic lives in cortex/orchestrator_engine.py — a simple function that maps task metadata to a backend choice.
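A minimal sketch of such a function is shown below. The metadata fields, the backend labels, and especially the precedence order (sensitivity first, then context size, then tool use) are assumptions; the table above does not say which rule wins when several apply.

```python
def choose_backend(sensitive: bool = False,
                   needs_tools: bool = False,
                   context_tokens: int = 0) -> str:
    """Map task metadata to a backend label, following the routing table."""
    if sensitive:
        return "ollama"        # private data stays on the local network
    if context_tokens > 100_000:
        return "gemini"        # long-context model
    if needs_tools:
        return "gemini-api"    # native function calling
    return "claude"            # conversation and code generation default
```

Checking sensitivity first means a private task is never sent off the network, even if it would otherwise qualify for a hosted model.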

Implementation Order (Recommended)

1. Orchestrator Phase 1: Gemini API integration, basic tool loop, /orchestrate endpoint
   - Unlocks: web search in chat, AE Journal queries, cron-triggered tasks
2. Knowledge import: markdown → AE Journal Entries tool + import script
   - Unlocks: searchable knowledge base for all agents
3. Dev agent pipeline: Frontend + Backend specialist agents
   - Unlocks: AI-assisted development with supervisor review
4. Gitea integration: webhook receiver + Actions CI
   - Unlocks: event-driven automation, PR workflow
5. Intelligent routing: model selection by task type
   - Polish: cost and quality optimization

Key Design Decisions

| Decision | Choice | Rationale |
| --- | --- | --- |
| Orchestrator model | Gemini API (not CLI) | Native tool calling; free tier |
| Responder model | Claude CLI (Pro sub) | Quality output; no API cost |
| Direct chat bypass | Yes | Don't add latency when tools aren't needed |
| Tool set | Separate from ae_* MCPs | ae_* tools are stable; don't risk breaking active agents |
| Dev agents | Claude CLI in project dir | CLAUDE.md + project context already in place |
| Human approval gate | Required before commit | Agents can propose; humans decide |
| Knowledge primary source | AE Journals | Already exists, structured, searchable |