
# Architecture: Intelligence Layer

**Status:** Design phase — not yet implemented
**Last updated:** 2026-03-18

This document captures the architectural thinking behind expanding Cortex from a smart dispatcher into a genuine intelligence layer: capable of using tools, coordinating specialist agents, and managing a personal knowledge base.

---

## Overview

Cortex currently dispatches chat messages to LLM CLI backends and returns the response. The Intelligence Layer adds three major capabilities on top of that foundation:

1. **Orchestrator/Responder** — Gemini handles tool use and planning; Claude handles the user-facing response
2. **Dev Agent Pipeline** — Specialist agents implement code changes; a supervisor checks the work
3. **Knowledge Layer** — AE Journals becomes the primary knowledge base; agents can read and write it

These are independent tracks that share the same trigger layer and can be built incrementally.

---

## 1. Orchestrator / Responder Pattern

### The Problem

Claude CLI (via Pro subscription) does not expose API-level tool calling, whereas the Gemini API (free tier) does. Claude, however, produces higher-quality user-facing prose and reasoning, so the design uses each model for what it does best.

### The Pattern

```
User message
    ↓
Orchestrator (Gemini API)
    • interprets intent
    • decides which tools to call
    • executes tool loop (ReAct: reason → act → observe → repeat)
    • assembles enriched context + tool results
    ↓
Responder (Claude CLI)
    • receives enriched context
    • writes the user-facing response
    ↓
User
```

For **direct chat** (no tools needed), the orchestrator is bypassed entirely and the message goes straight to Claude. The orchestrator only activates when tools are required or when it is explicitly invoked (e.g., by a background task).
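The reason, act, observe loop in the diagram above can be sketched independently of any SDK. This is a minimal illustration, not the planned implementation: `ToolCall`, `LoopResult`, `run_tool_loop`, and `fake_model` are hypothetical names, and `fake_model` stands in for the Gemini API call that would decide the next tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ToolCall:
    """One tool invocation requested by the model (hypothetical shape)."""
    name: str
    args: dict


@dataclass
class LoopResult:
    """Enriched context that would be handed to the responder."""
    task: str
    observations: List[dict] = field(default_factory=list)


def run_tool_loop(
    task: str,
    model_step: Callable[[str, List[dict]], Optional[ToolCall]],
    tools: Dict[str, Callable],
    max_steps: int = 5,
) -> LoopResult:
    """Repeat reason -> act -> observe until the model requests no more tools.

    model_step returns a ToolCall, or None once it has enough context.
    max_steps caps the loop so a confused model cannot spin forever.
    """
    result = LoopResult(task=task)
    for _ in range(max_steps):
        call = model_step(task, result.observations)      # reason
        if call is None:
            break
        if call.name not in tools:
            output = f"error: unknown tool {call.name!r}"
        else:
            output = tools[call.name](**call.args)         # act
        result.observations.append(                        # observe
            {"tool": call.name, "args": call.args, "output": output}
        )
    return result


def fake_model(task: str, observations: List[dict]) -> Optional[ToolCall]:
    """Stand-in for the Gemini call: ask for tasks once, then stop."""
    if not observations:
        return ToolCall("ae_task_list", {"due": "this_week"})
    return None


# Stubbed tool returning made-up data for the demo.
tools = {"ae_task_list": lambda due: ["Renew TLS cert"]}
result = run_tool_loop("What tasks are due this week?", fake_model, tools)
```

The `observations` list plays the role of the "enriched context + tool results" that the responder receives. Reporting an unknown tool as an observation, rather than raising, is one possible choice that lets the model recover on the next step.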

### Why Gemini API (not CLI)?

- Gemini CLI is a subprocess; function calling via subprocess is fragile
- Gemini API (`google-generativeai` SDK) has native structured tool-calling
- Free tier (Gemini 2.0 Flash) handles orchestration load without cost
- Access token is short-lived but auto-refreshed by the SDK (no expiry problem)

### Tool Strategy

Tools for the orchestrator are **separate** from the existing `ae_*` MCP tools. The ae_* tools are stable and used by existing agents — do not modify them.

New orchestrator tools are Python functions wrapped in Gemini function declarations:

| Tool | What it does | Implementation |
|---|---|---|
| `web_search` | DuckDuckGo search | `duckduckgo-search` library |
| `ae_journal_search` | Search AE Journals via V3 API | HTTP to AE API |
| `ae_journal_entry_create` | Write a new journal entry | HTTP to AE API |
| `ae_task_list` | Read Kanban tasks | HTTP to AE API or agents_sync file |
| `file_read` | Read a file from known safe paths | Python `pathlib` |
| `gitea_api` | Query Gitea repos, issues, PRs | Gitea REST API |

Tools are registered in `cortex/tools/` (one file per domain group).
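One way the registry and the "known safe paths" restriction on `file_read` might look is sketched below. Everything here is an assumption for illustration: `TOOL_REGISTRY`, `register_tool`, `SAFE_ROOTS`, and `is_safe_path` are hypothetical names, and the allow-listed directory is a placeholder. Wrapping the registered functions in Gemini function declarations is left out.

```python
from pathlib import Path
from typing import Callable, Dict, List, Optional

# Hypothetical registry: tool name -> plain Python function.
TOOL_REGISTRY: Dict[str, Callable] = {}


def register_tool(func: Callable) -> Callable:
    """Decorator that registers a function under its own name."""
    TOOL_REGISTRY[func.__name__] = func
    return func


# Placeholder allow-list; the real safe paths are not specified in this document.
SAFE_ROOTS: List[Path] = [Path("/tmp/cortex_safe").resolve()]


def is_safe_path(path: str, roots: Optional[List[Path]] = None) -> bool:
    """True if the resolved path is one of the roots or sits beneath one."""
    roots = SAFE_ROOTS if roots is None else roots
    resolved = Path(path).resolve()  # collapses ".." and follows symlinks
    return any(resolved == root or root in resolved.parents for root in roots)


@register_tool
def file_read(path: str) -> str:
    """Read a UTF-8 text file, refusing anything outside the allow-list."""
    if not is_safe_path(path):
        raise PermissionError(f"{path} is outside the allowed roots")
    return Path(path).read_text(encoding="utf-8")
```

Resolving the path before the check matters: a request such as `/tmp/cortex_safe/../etc/passwd` resolves outside the allow-list and is rejected.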

### Implementation Path

```
cortex/
  tools/
    __init__.py          — tool registry
    web.py               — web_search
    ae_knowledge.py      — ae_journal_* tools
    ae_tasks.py          — task tools
    gitea.py             — Gitea API tools
  routers/
    orchestrator.py      — POST /orchestrate, GET /orchestrate/{job_id}
  orchestrator_engine.py — Gemini tool loop + Claude handoff
```

Endpoint contract:

```
POST /orchestrate
{
  "task": "What tasks are due this week and summarize my notes on X topic",
  "session_id": "optional — if part of an ongoing conversation",
  "respond_with_claude": true   // false = return Gemini's assembled context only
}

→ { "job_id": "uuid", "status": "queued" }

GET /orchestrate/{job_id}
→ { "status": "complete", "result": "...", "tool_calls": [...] }
```
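The contract above implies a small job lifecycle: submit, poll, complete. The following is a minimal in-memory sketch of that lifecycle using only the standard library; `Job` and `JobStore` are hypothetical names, and a real router would presumably sit in `cortex/routers/orchestrator.py` and add persistence and error states that the contract does not yet define.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Job:
    task: str
    session_id: Optional[str] = None
    respond_with_claude: bool = True
    status: str = "queued"
    result: Optional[str] = None
    tool_calls: List[dict] = field(default_factory=list)


class JobStore:
    """In-memory store; jobs are lost on restart (acceptable for a sketch)."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Job] = {}

    def submit(self, task: str, session_id: Optional[str] = None,
               respond_with_claude: bool = True) -> dict:
        """Mirror the POST /orchestrate response body."""
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = Job(task, session_id, respond_with_claude)
        return {"job_id": job_id, "status": "queued"}

    def complete(self, job_id: str, result: str, tool_calls: List[dict]) -> None:
        """Called by the engine once the tool loop and handoff have finished."""
        job = self._jobs[job_id]
        job.status = "complete"
        job.result = result
        job.tool_calls = list(tool_calls)

    def get(self, job_id: str) -> Optional[dict]:
        """Mirror the GET /orchestrate/{job_id} response body."""
        job = self._jobs.get(job_id)
        if job is None:
            return None  # a router would translate this to a 404
        body = {"status": job.status}
        if job.status == "complete":
            body["result"] = job.result
            body["tool_calls"] = job.tool_calls
        return body


store = JobStore()
ticket = store.submit("What tasks are due this week?")
pending = store.get(ticket["job_id"])
store.complete(ticket["job_id"], "Two tasks are due.", [{"tool": "ae_task_list"}])
done = store.get(ticket["job_id"])
```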

---

## 2. Trigger Layer

Chat, orchestration, and dev agents all share the same trigger layer:

```
┌────────────────────────────────────────────────┐
│  TRIGGERS                                      │
│                                                │
│  Chat UI  →  POST /chat  (existing)            │
│  Cron     →  POST /orchestrate  (new)          │
│  Gitea    →  POST /webhook/gitea  (new)        │
│  NC Talk  →  POST /webhook/nextcloud  (exists) │
│  Manual   →  CLI / curl for debugging          │
└────────────────────────────────────────────────┘
```

Cron trigger example (from existing cron infrastructure):

```bash
curl -X POST http://localhost:8000/orchestrate \
  -H "Content-Type: application/json" \
  -d '{"task": "Check for overdue Kanban tasks and notify via NC Talk"}'
```

This means the same orchestrator endpoint is usable from chat, crons, and webhooks without any special cases.

---

## 3. Dev Agent Pipeline

### The Goal

Accept a plain-English task like *"Fix the bug where X, add a test for it"* and produce:
- A working code change
- Passing syntax/type checks
- A summary of what changed and what still needs human review
- A commit ready to push (pending approval)

### Architecture

```
Task request (chat / Gitea issue / Kanban)
    ↓
Orchestrator
    • reads relevant files (context gathering)
    • routes to correct specialist
    ↓
Specialist Agent (Claude CLI in project directory)
    • implements the change
    • runs self-check: py_compile / svelte-check
    ↓
Supervisor Agent
    • reviews the diff
    • runs test suite
    • returns: PASS / NEEDS_REVIEW / FAIL + reason
    ↓
Human approval gate
    • summary shown in Cortex UI or NC Talk
    • user approves → commit + optional push
    • user rejects → feedback goes back to specialist
```

### Specialist Agents

Two initial specialists, both using Claude CLI:

**Frontend specialist** (working dir: `~/OSIT_dev/aether_app_sveltekit/`):
- Reads `documentation/TODO__Agents.md` and `CLAUDE.md` before acting
- Runs `npx svelte-check` after every change — no exceptions
- Atomic commits (one component or fix per commit)

**Backend specialist** (working dir: `~/OSIT_dev/aether_api_fastapi/`):
- Reads `documentation/TODO__Agents.md` and `CLAUDE.md` before acting
- Runs `python3 -m py_compile` after every file edit
- Runs unit tests before declaring done
- Flags E2E tests that need human review
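The backend self-check can also be run from Python through the standard-library `py_compile` module, which makes it easy to collect results for the supervisor. This is a sketch under assumptions: `check_python_files` is a hypothetical helper, and the two temporary files exist only to demonstrate a passing and a failing case.

```python
import py_compile
import tempfile
from pathlib import Path
from typing import Dict, Iterable, Optional


def check_python_files(paths: Iterable[Path]) -> Dict[str, Optional[str]]:
    """Byte-compile each file; map path -> error message, or None if it compiled."""
    results: Dict[str, Optional[str]] = {}
    for path in paths:
        try:
            # doraise=True turns syntax errors into PyCompileError
            # instead of printing them to stderr.
            py_compile.compile(str(path), doraise=True)
            results[str(path)] = None
        except py_compile.PyCompileError as exc:
            results[str(path)] = exc.msg
    return results


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.py"
    bad = Path(tmp) / "bad.py"
    good.write_text("x = 1\n", encoding="utf-8")
    bad.write_text("def broken(:\n", encoding="utf-8")  # deliberate syntax error
    report = check_python_files([good, bad])
    failed = [path for path, err in report.items() if err is not None]
```

Like `python3 -m py_compile`, this only catches syntax errors; it says nothing about types or runtime behavior, which is why the unit tests remain a separate step.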

### Supervisor Agent

The supervisor is a separate Claude invocation that receives:
- The diff of all changed files
- Stdout/stderr from all checks that were run
- The original task description

It returns a structured assessment:

```json
{
  "verdict": "PASS | NEEDS_REVIEW | FAIL",
  "checks_passed": ["py_compile", "unit_tests"],
  "checks_failed": [],
  "review_notes": "E2E tests not run; change touches auth router, recommend manual check",
  "commit_message": "fix: correct session token validation in auth middleware"
}
```
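Because the supervisor's output comes from a model, it is worth validating before anything acts on it. The sketch below parses the assessment and rejects malformed verdicts; `Assessment` and `parse_assessment` are hypothetical names, and the rule that a `PASS` verdict may not list failed checks is an assumption, not something this design specifies.

```python
import json
from dataclasses import dataclass
from typing import List

VALID_VERDICTS = {"PASS", "NEEDS_REVIEW", "FAIL"}


@dataclass
class Assessment:
    verdict: str
    checks_passed: List[str]
    checks_failed: List[str]
    review_notes: str
    commit_message: str


def parse_assessment(raw: str) -> Assessment:
    """Parse and validate the supervisor's JSON; raise ValueError if unusable."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    verdict = data.get("verdict")
    if verdict not in VALID_VERDICTS:
        raise ValueError(f"unexpected verdict: {verdict!r}")
    checks_failed = list(data.get("checks_failed", []))
    # Assumed consistency rule: a PASS with failed checks is contradictory.
    if verdict == "PASS" and checks_failed:
        raise ValueError("PASS verdict is inconsistent with failed checks")
    return Assessment(
        verdict=verdict,
        checks_passed=list(data.get("checks_passed", [])),
        checks_failed=checks_failed,
        review_notes=data.get("review_notes", ""),
        commit_message=data.get("commit_message", ""),
    )


raw = json.dumps({
    "verdict": "NEEDS_REVIEW",
    "checks_passed": ["py_compile", "unit_tests"],
    "checks_failed": [],
    "review_notes": "E2E tests not run",
    "commit_message": "fix: correct session token validation in auth middleware",
})
assessment = parse_assessment(raw)
```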

### Gitea Integration

- **Gitea webhooks → Cortex:** Push/PR events trigger supervisor review automatically
- **Gitea Actions:** Run `py_compile`/`svelte-check` on every push (simple CI, no custom runner)
- **Cortex → Gitea:** After human approval, supervisor calls Gitea API to create PR or push

Gitea Actions are simpler than they sound: a `.gitea/workflows/check.yml` is just a YAML file that runs shell commands on push. No external CI service is needed, though the Gitea instance does need a registered runner (`act_runner`) to execute the jobs.

---

## 4. Knowledge Layer

### The Goal

AE Journals becomes the primary source of truth for personal and business knowledge. Notes, documentation, and logs that currently live scattered across markdown files get organized into Journals with proper structure, search, and agent-accessible read/write.

### Import Strategy

1. **Don't bulk-import blindly.** The orchestrator searches AE Journals before creating anything (deduplication).
2. **Chunk by section.** A large markdown file becomes multiple journal entries — one per H2 section.
3. **Preserve provenance.** Each imported entry includes source path, import date, and original file date in its `data_json` or notes.
4. **Tag intelligently.** Tags come from: frontmatter, filename keywords, directory path, and content analysis.
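Steps 2 and 3 above (chunk by H2, preserve provenance) might look roughly like this. It is a sketch under assumptions: `chunk_by_h2` and the entry dictionary shape are hypothetical, the `"(preamble)"` title for content before the first H2 is an arbitrary choice, and the sample path and dates are made up. Deduplication and tagging are not shown.

```python
from typing import List

FENCE = "`" * 3  # code-fence marker, built this way to keep this example readable


def chunk_by_h2(markdown: str, source_path: str, import_date: str,
                original_file_date: str) -> List[dict]:
    """Split a markdown document into one entry per H2 section.

    Lines starting with '## ' inside fenced code blocks are not treated
    as headings. Content before the first H2 becomes a '(preamble)' entry.
    """
    entries: List[dict] = []
    title = "(preamble)"
    body: List[str] = []
    in_fence = False

    def flush() -> None:
        content = "\n".join(body).strip()
        if content:  # skip empty sections
            entries.append({
                "title": title,
                "content": content,
                "data_json": {
                    "source_path": source_path,
                    "import_date": import_date,
                    "original_file_date": original_file_date,
                },
            })

    for line in markdown.splitlines():
        if line.startswith(FENCE):
            in_fence = not in_fence
        if not in_fence and line.startswith("## "):
            flush()                      # close the previous section
            title = line[3:].strip()
            body = []
        else:
            body.append(line)
    flush()                              # close the final section
    return entries


doc = "# WireGuard\nintro\n## Setup\nstep one\n## Peers\npeer list\n"
entries = chunk_by_h2(doc, "notes/wireguard.md", "2026-03-18", "2025-11-02")
titles = [entry["title"] for entry in entries]
```

H3 and deeper headings stay inside their parent H2 entry, which keeps related subsections together at the cost of occasionally large entries.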

### Source Priority

| Source | Priority | Notes |
|---|---|---|
| `~/DgrZone_Nextcloud/` | High | Personal notes, projects |
| `~/OSIT_Nextcloud/` | High | Business docs |
| `~/agents_sync/aether/docs/` | Medium | Platform specs (already structured) |
| OpenClaw session logs | Low | Historical, lots of noise |

### Agent Workflow

```
"Summarize my notes on WireGuard setup"
    ↓
Orchestrator calls ae_journal_search("wireguard")
    ↓
Returns matching entries
    ↓
Claude synthesizes a response
```

```
"Save this as a note in my DgrZone journal"
    ↓
Orchestrator calls ae_journal_entry_create(
    journal="DgrZone General",
    title="...",
    content="...",
    tags=["note", "wireguard"]
)
```

### Context Tiers (Inara Memory)

The existing distill system (`MEMORY_SHORT.md`, `MEMORY_MID.md`, `MEMORY_LONG.md`) handles working memory. The Knowledge Layer is complementary — it's the **searchable long-term archive**, not the rolling context window. Agents should:

- Use memory files for "what have we been working on lately"
- Use AE Journals search for "what do I know about topic X"

---

## 5. Model Routing (Future)

Routing is currently hardcoded: Claude is the default and Gemini the fallback. A future version would route by task type:

| Task type | Model | Reason |
|---|---|---|
| User-facing conversation | Claude | Quality prose, reasoning |
| Tool use / orchestration | Gemini API | Native function calling, free |
| Private / sensitive | Ollama (local) | No data leaves the network |
| Long context (>100k tokens) | Gemini 2.0 | 1M token context window |
| Code generation | Claude | Strong code quality |

Routing logic would live in `cortex/orchestrator_engine.py` as a simple function that maps task metadata to a backend choice.
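A routing function of that kind could be as small as the sketch below. The `TaskMeta` fields, the backend identifier strings, and the precedence order (privacy first, then context size, then tool use) are all assumptions made for illustration; the table above does not say which rule wins when several apply.

```python
from dataclasses import dataclass


@dataclass
class TaskMeta:
    """Hypothetical task metadata; field names are illustrative."""
    needs_tools: bool = False
    sensitive: bool = False
    est_tokens: int = 0
    is_code: bool = False


LONG_CONTEXT_THRESHOLD = 100_000  # tokens, taken from the routing table


def choose_backend(meta: TaskMeta) -> str:
    """Map task metadata to a backend name, checking the strictest rule first."""
    if meta.sensitive:
        return "ollama"        # no data leaves the network
    if meta.est_tokens > LONG_CONTEXT_THRESHOLD:
        return "gemini"        # large context window
    if meta.needs_tools:
        return "gemini_api"    # native function calling
    # Conversation and code generation both go to Claude in the table.
    return "claude"


chat = choose_backend(TaskMeta())
private_with_tools = choose_backend(TaskMeta(sensitive=True, needs_tools=True))
huge_document = choose_backend(TaskMeta(est_tokens=250_000))
```

Putting the privacy check first means a sensitive task is never sent to a hosted model even if it also needs tools, which would have to be handled (or refused) locally.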

---

## Implementation Order (Recommended)

1. **Orchestrator Phase 1** — Gemini API integration, basic tool loop, `/orchestrate` endpoint
   - Unlocks: web search in chat, AE Journal queries, cron-triggered tasks
2. **Knowledge import** — markdown → AE Journal Entries tool + import script
   - Unlocks: searchable knowledge base for all agents
3. **Dev agent pipeline** — Frontend + Backend specialist agents
   - Unlocks: AI-assisted development with supervisor review
4. **Gitea integration** — webhook receiver + Actions CI
   - Unlocks: event-driven automation, PR workflow
5. **Intelligent routing** — model selection by task type
   - Polish: cost and quality optimization

---

## Key Design Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Orchestrator model | Gemini API (not CLI) | Native tool calling; free tier |
| Responder model | Claude CLI (Pro sub) | Quality output; no API cost |
| Direct chat bypass | Yes | Don't add latency when tools aren't needed |
| Tool set | Separate from ae_* MCPs | ae_* tools are stable; don't risk breaking active agents |
| Dev agents | Claude CLI in project dir | CLAUDE.md + project context already in place |
| Human approval gate | Required before commit | Agents can propose; humans decide |
| Knowledge primary source | AE Journals | Already exists, structured, searchable |