Files
Cortex-Inara/documentation/ARCH__Intelligence_Layer.md
Scott Idem ed472ce9a0 feat: Intelligence Layer Phase 1 — orchestrator service
Adds the Gemini API orchestrator (ReAct tool loop → Claude responder):

Orchestrator engine + router:
- orchestrator_engine.py: Gemini API tool loop, Claude CLI handoff
- routers/orchestrator.py: POST /orchestrate (async job queue), GET /orchestrate/{job_id}

Tools (cortex/tools/):
- web.py: DuckDuckGo web search (no key required)
- ae_knowledge.py: ae_journal_search + ae_journal_entry_create (AE V3 API)
- ae_tasks.py: ae_task_list (reads agents_sync Kanban filesystem)
- files.py: file_read (path-allowlisted to safe dirs)

Config + deps:
- config.py: orchestrator, DuckDuckGo, and AE API settings
- requirements.txt: google-genai, duckduckgo-search
- .env.default: reference config with all new keys documented

Docs:
- CLAUDE.md, README.md, documentation/ added to repo
- Port references updated 7331 → 8000 throughout
- Default model updated to gemini-2.5-flash

Tested: ae_task_list, ae_journal_search, web_search all working end-to-end.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 19:37:49 -04:00

11 KiB

Architecture: Intelligence Layer

Status: Design phase — not yet implemented Last updated: 2026-03-18

This document captures the architectural thinking behind expanding Cortex from a smart dispatcher into a genuine intelligence layer: capable of using tools, coordinating specialist agents, and managing a personal knowledge base.


Overview

Cortex currently dispatches chat messages to LLM CLI backends and returns the response. The Intelligence Layer adds three major capabilities on top of that foundation:

  1. Orchestrator/Responder — Gemini handles tool use and planning; Claude handles the user-facing response
  2. Dev Agent Pipeline — Specialist agents implement code changes; a supervisor checks the work
  3. Knowledge Layer — AE Journals becomes the primary knowledge base; agents can read and write it

These are independent tracks that share the same trigger layer and can be built incrementally.


1. Orchestrator / Responder Pattern

The Problem

Claude CLI (via Pro subscription) doesn't expose direct API tool-calling. Gemini API (free tier) does. But Claude produces higher-quality user-facing prose and reasoning. The solution is to use each model for what it does best.

The Pattern

User message
    ↓
Orchestrator (Gemini API)
    • interprets intent
    • decides which tools to call
    • executes tool loop (ReAct: reason → act → observe → repeat)
    • assembles enriched context + tool results
    ↓
Responder (Claude CLI)
    • receives enriched context
    • writes the user-facing response
    ↓
User

For direct chat (no tools needed), the orchestrator is bypassed entirely — message goes straight to Claude. The orchestrator only activates when tools are required or when explicitly invoked (e.g., a background task).

Why Gemini API (not CLI)?

  • Gemini CLI is a subprocess; function calling via subprocess is fragile
  • Gemini API (google-generativeai SDK) has native structured tool-calling
  • Free tier (Gemini 2.0 Flash) handles orchestration load without cost
  • Access token is short-lived but auto-refreshed by the SDK (no expiry problem)

Tool Strategy

Tools for the orchestrator are separate from the existing ae_* MCP tools. The ae_* tools are stable and used by existing agents — do not modify them.

New orchestrator tools are Python functions wrapped in Gemini function declarations:

Tool What it does Implementation
web_search DuckDuckGo search duckduckgo-search library
ae_journal_search Search AE Journals via V3 API HTTP to AE API
ae_journal_entry_create Write a new journal entry HTTP to AE API
ae_task_list Read Kanban tasks HTTP to AE API or agents_sync file
file_read Read a file from known safe paths Python pathlib
gitea_api Query Gitea repos, issues, PRs Gitea REST API

Tools are registered in cortex/tools/ (one file per domain group).

Implementation Path

cortex/
  tools/
    __init__.py          — tool registry
    web.py               — web_search
    ae_knowledge.py      — ae_journal_* tools
    ae_tasks.py          — task tools
    gitea.py             — Gitea API tools
  routers/
    orchestrator.py      — POST /orchestrate, GET /orchestrate/{job_id}
  orchestrator_engine.py — Gemini tool loop + Claude handoff

Endpoint contract:

POST /orchestrate
{
  "task": "What tasks are due this week and summarize my notes on X topic",
  "session_id": "optional — if part of an ongoing conversation",
  "respond_with_claude": true   // false = return Gemini's assembled context only
}

→ { "job_id": "uuid", "status": "queued" }

GET /orchestrate/{job_id}
→ { "status": "complete", "result": "...", "tool_calls": [...] }

2. Trigger Layer

All three capabilities (chat, orchestration, dev agents) share the same trigger layer:

┌────────────────────────────────────────────────┐
│  TRIGGERS                                      │
│                                                │
│  Chat UI  →  POST /chat  (existing)            │
│  Cron     →  POST /orchestrate  (new)          │
│  Gitea    →  POST /webhook/gitea  (new)        │
│  NC Talk  →  POST /webhook/nextcloud  (exists) │
│  Manual   →  CLI / curl for debugging          │
└────────────────────────────────────────────────┘

Cron trigger example (from existing cron infrastructure):

curl -X POST http://localhost:8000/orchestrate \
  -H "Content-Type: application/json" \
  -d '{"task": "Check for overdue Kanban tasks and notify via NC Talk"}'

This means the same orchestrator endpoint is usable from chat, crons, and webhooks without any special cases.


3. Dev Agent Pipeline

The Goal

Accept a plain-English task like "Fix the bug where X, add a test for it" and produce:

  • A working code change
  • Passing syntax/type checks
  • A summary of what changed and what still needs human review
  • A commit ready to push (pending approval)

Architecture

Task request (chat / Gitea issue / Kanban)
    ↓
Orchestrator
    • reads relevant files (context gathering)
    • routes to correct specialist
    ↓
Specialist Agent (Claude CLI in project directory)
    • implements the change
    • runs self-check: py_compile / svelte-check
    ↓
Supervisor Agent
    • reviews the diff
    • runs test suite
    • returns: PASS / NEEDS_REVIEW / FAIL + reason
    ↓
Human approval gate
    • summary shown in Cortex UI or NC Talk
    • user approves → commit + optional push
    • user rejects → feedback goes back to specialist

Specialist Agents

Two initial specialists, both using Claude CLI:

Frontend specialist (working dir: ~/OSIT_dev/aether_app_sveltekit/):

  • Reads documentation/TODO__Agents.md and CLAUDE.md before acting
  • Runs npx svelte-check after every change — no exceptions
  • Atomic commits (one component or fix per commit)

Backend specialist (working dir: ~/OSIT_dev/aether_api_fastapi/):

  • Reads documentation/TODO__Agents.md and CLAUDE.md before acting
  • Runs python3 -m py_compile after every file edit
  • Runs unit tests before declaring done
  • Flags E2E tests that need human review

Supervisor Agent

The supervisor is a separate Claude invocation that receives:

  • The diff of all changed files
  • Stdout/stderr from all checks that were run
  • The original task description

It returns a structured assessment:

{
  "verdict": "PASS | NEEDS_REVIEW | FAIL",
  "checks_passed": ["py_compile", "unit_tests"],
  "checks_failed": [],
  "review_notes": "E2E tests not run — touch auth router, recommend manual check",
  "commit_message": "fix: correct session token validation in auth middleware"
}

Gitea Integration

  • Gitea webhooks → Cortex: Push/PR events trigger supervisor review automatically
  • Gitea Actions: Run py_compile/svelte-check on every push (simple CI, no custom runner)
  • Cortex → Gitea: After human approval, supervisor calls Gitea API to create PR or push

Gitea Actions are simpler than they sound — a .gitea/workflows/check.yml is just a YAML file that runs shell commands on push. No external CI infrastructure needed.


4. Knowledge Layer

The Goal

AE Journals becomes the primary source of truth for personal and business knowledge. Notes, documentation, and logs that currently live scattered across markdown files get organized into Journals with proper structure, search, and agent-accessible read/write.

Import Strategy

  1. Don't bulk-import blindly. The orchestrator searches AE Journals before creating anything (deduplication).
  2. Chunk by section. A large markdown file becomes multiple journal entries — one per H2 section.
  3. Preserve provenance. Each imported entry includes source path, import date, and original file date in its data_json or notes.
  4. Tag intelligently. Tags come from: frontmatter, filename keywords, directory path, and content analysis.

Source Priority

Source Priority Notes
~/DgrZone_Nextcloud/ High Personal notes, projects
~/OSIT_Nextcloud/ High Business docs
~/agents_sync/aether/docs/ Medium Platform specs (already structured)
OpenClaw session logs Low Historical, lots of noise

Agent Workflow

"Summarize my notes on WireGuard setup"
    ↓
Orchestrator calls ae_journal_search("wireguard")
    ↓
Returns matching entries
    ↓
Claude synthesizes a response
"Save this as a note in my DgrZone journal"
    ↓
Orchestrator calls ae_journal_entry_create(
    journal="DgrZone General",
    title="...",
    content="...",
    tags=["note", "wireguard"]
)

Context Tiers (Inara Memory)

The existing distill system (MEMORY_SHORT.md, MEMORY_MID.md, MEMORY_LONG.md) handles working memory. The Knowledge Layer is complementary — it's the searchable long-term archive, not the rolling context window. Agents should:

  • Use memory files for "what have we been working on lately"
  • Use AE Journals search for "what do I know about topic X"

5. Model Routing (Future)

Currently hardcoded: Claude default, Gemini fallback. Future intelligent routing:

Task type Model Reason
User-facing conversation Claude Quality prose, reasoning
Tool use / orchestration Gemini API Native function calling, free
Private / sensitive Ollama (local) No data leaves the network
Long context (>100k tokens) Gemini 2.0 1M token context window
Code generation Claude Strong code quality

Routing logic lives in cortex/orchestrator_engine.py — a simple function that maps task metadata to a backend choice.


  1. Orchestrator Phase 1 — Gemini API integration, basic tool loop, /orchestrate endpoint
    • Unlocks: web search in chat, AE Journal queries, cron-triggered tasks
  2. Knowledge import — markdown → AE Journal Entries tool + import script
    • Unlocks: searchable knowledge base for all agents
  3. Dev agent pipeline — Frontend + Backend specialist agents
    • Unlocks: AI-assisted development with supervisor review
  4. Gitea integration — webhook receiver + Actions CI
    • Unlocks: event-driven automation, PR workflow
  5. Intelligent routing — model selection by task type
    • Polish: cost and quality optimization

Key Design Decisions

Decision Choice Rationale
Orchestrator model Gemini API (not CLI) Native tool calling; free tier
Responder model Claude CLI (Pro sub) Quality output; no API cost
Direct chat bypass Yes Don't add latency when tools aren't needed
Tool set Separate from ae_* MCPs ae_* tools are stable; don't risk breaking active agents
Dev agents Claude CLI in project dir CLAUDE.md + project context already in place
Human approval gate Required before commit Agents can propose; humans decide
Knowledge primary source AE Journals Already exists, structured, searchable