- Session continuity — does a conversation that starts on one node stay there, or can it migrate?

The Syncthing-synced `home/` directory and shared `model_registry.json` already provide a natural foundation — instances share persona memory and context without a central DB.

---

## 11. LLM Wiki — Persistent Knowledge Compilation (Karpathy Pattern)

**Status:** Concept; no design yet. Inspired by [Karpathy's llm-wiki](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f) gist.

**Core idea:** Instead of treating AE Journals as an archive to retrieve from, evolve them into a **living wiki** that the LLM incrementally builds and maintains. When a new source is added, the LLM doesn't just index it. It reads the source, extracts the key information, and integrates it into the existing wiki by updating entity pages, revising topic summaries, flagging contradictions, and strengthening or challenging the evolving synthesis. Knowledge is compiled once and kept current, not re-derived on every query.

This is a philosophical shift from our current approach (RAG/retrieval) toward **compounding knowledge**: the wiki gets richer with every source added and every question asked.

### Three-Layer Architecture

```
Raw Sources (immutable)
    ↓  LLM reads, extracts, cross-references
Wiki (LLM-maintained markdown)    ← the persistent artifact
    →  Human reads, LLM writes
Schema (CLAUDE.md / AGENTS.md)    ← configuration + conventions
```

1. **Raw sources** — curated, immutable originals (articles, papers, session logs, transcripts). The LLM reads from them but never modifies them.
2. **The wiki** — a directory of LLM-generated markdown files: summaries, entity pages, concept pages, comparisons, and synthesis.
   The LLM owns this layer entirely: it creates pages, updates them when new sources arrive, and maintains cross-references.
3. **Schema** — a configuration document (analogous to our `PROTOCOLS.md`) that tells the LLM how the wiki is structured, which conventions to follow, and which workflows to use when ingesting sources or answering questions. It is co-evolved with the human over time.

### Operations

**Ingest.** Drop a new source into the raw collection and tell the LLM to process it. The flow: the LLM reads the source → discusses key takeaways with the human → writes a summary page → updates the index → updates the relevant entity/concept pages (a single source might touch 10-15 pages) → appends to the log. The human stays involved throughout, guiding what to emphasize.

**Query.** Ask questions against the wiki. The LLM reads the index to find relevant pages, drills into them, and synthesizes an answer with citations. **Key insight: good answers get filed back into the wiki as new pages.** A comparison table, an analysis, or a newly discovered connection is valuable and shouldn't disappear into chat history.

**Lint.** A periodic health check for contradictions between pages, stale claims superseded by newer sources, orphan pages with no inbound links, missing cross-references, and data gaps that could be filled with a web search.

### Index and Log (Two Navigation Files)

**`index.md`** — a content-oriented catalog. Every wiki page is listed with a link, a one-line summary, and optional metadata (date, source count), organized by category. The LLM updates it on every ingest. At moderate scale (~100 sources, hundreds of pages), this index removes the need for embedding-based RAG.

**`log.md`** — a chronological, append-only record of what happened and when (ingests, queries, lint passes). Each entry starts with a consistent prefix (e.g. `## [2026-04-02] ingest | Article Title`), which makes the log parseable with simple tools; for example, `grep "^## \[" log.md | tail -5` lists the five most recent entries.
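A minimal sketch of the log convention in Python, assuming `log.md` is a plain UTF-8 file and that every entry heading follows exactly the `## [YYYY-MM-DD] op | Title` prefix described above. `append_entry`, `recent_entries`, and `LogEntry` are hypothetical helper names. They are not part of Cortex or Karpathy's gist. `recent_entries` is the `grep | tail` one-liner expressed in code. Requiring `op` to be a single word and `title` to be a single line is also an assumption, made so that every heading stays parseable.

```python
from __future__ import annotations

import re
import tempfile
from dataclasses import dataclass
from datetime import date
from pathlib import Path

# Hypothetical parser for the heading prefix "## [YYYY-MM-DD] op | Title".
ENTRY_RE = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$")


@dataclass(frozen=True)
class LogEntry:
    day: str    # ISO date, e.g. "2026-04-02"
    op: str     # e.g. "ingest", "query", "lint"
    title: str


def append_entry(log_path: Path, op: str, title: str, day: date | None = None) -> None:
    """Append one entry heading to the log; earlier lines are never rewritten."""
    if "\n" in title or not re.fullmatch(r"\w+", op):
        raise ValueError("op must be a single word and title a single line")
    day = day or date.today()
    existing = log_path.read_text(encoding="utf-8") if log_path.exists() else ""
    # Start on a fresh line if the previous entry's body lacks a trailing newline.
    sep = "\n" if existing and not existing.endswith("\n") else ""
    with log_path.open("a", encoding="utf-8") as f:
        f.write(f"{sep}## [{day.isoformat()}] {op} | {title}\n\n")


def recent_entries(log_path: Path, n: int = 5) -> list[LogEntry]:
    """Return the last n entry headings (the grep | tail one-liner, in Python)."""
    entries = [
        LogEntry(*m.groups())
        for line in log_path.read_text(encoding="utf-8").splitlines()
        if (m := ENTRY_RE.match(line))
    ]
    return entries[-n:] if n > 0 else []


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "log.md"
        append_entry(log, "ingest", "Article Title", date(2026, 4, 2))
        append_entry(log, "lint", "Weekly health check", date(2026, 4, 9))
        for entry in recent_entries(log, n=5):
            print(entry)
```

To keep the sketch short, it reads the whole file before each append. A real tool could check only the last byte.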
### Applicability to Cortex / Inara

This pattern maps naturally onto several existing concepts:

| Karpathy Concept | Cortex Equivalent | Gap |
|---|---|---|
| Raw sources | Session logs, imported docs | No curated raw-source collection yet |
| Wiki pages | AE Journals | Journals are entry-based, not an interlinked wiki |
| Index + Log | No equivalent | Would need `wiki_index.md` and `wiki_log.md` |
| Schema/Protocols | `PROTOCOLS.md`, `OPERATIONS.md` | Not configured for wiki-maintenance workflows |
| Lint operation | No equivalent | No periodic wiki health check exists |
| Answers filed back | Session chat history | Answers are lost when the session ends (unless distilled) |
| Obsidian as IDE | Cortex UI / Files panel | Files panel could serve as the browsing surface |

**Next steps (if pursued):**
1. Design the wiki directory structure within `agents_sync/`, kept separate from session logs and memory files
2. Define the schema document: what goes in a wiki page, the cross-reference format, and the category taxonomy
3. Build an ingest tool/script that reads a source and updates wiki pages (LLM-driven)
4. Build a lint cron job that health-checks the wiki periodically
5. Consider Obsidian compatibility for human browsing of the wiki graph
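The sketch below shows one way the lint job could start. It rests on several assumptions. Pages cross-reference each other with Obsidian-style `[[Page Name]]` wikilinks, in keeping with the Obsidian idea. A page's name is its file stem, and stems are unique across the wiki directory. `index` and `log` are navigation pages whose outgoing links don't count as cross-references. `lint_wiki`, `LintReport`, and `NAV_PAGES` are hypothetical names. The sketch covers only the two mechanical checks, orphan pages and broken cross-references. Contradiction and staleness checks would still need the LLM.

```python
from __future__ import annotations

import re
import tempfile
from dataclasses import dataclass, field
from pathlib import Path

# Assumed link syntax: Obsidian-style [[Page]], [[Page|alias]] or [[Page#Heading]].
# Group 1 captures the target page name only.
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]")

# Navigation files: checked for broken links, but their links don't rescue orphans.
NAV_PAGES = frozenset({"index", "log"})


@dataclass
class LintReport:
    orphans: list[str] = field(default_factory=list)            # pages nothing links to
    broken: dict[str, list[str]] = field(default_factory=dict)  # page -> missing targets


def lint_wiki(wiki_dir: Path, nav_pages: frozenset[str] = NAV_PAGES) -> LintReport:
    """Mechanical lint checks only; contradiction/staleness checks still need the LLM."""
    pages = {p.stem: p for p in wiki_dir.rglob("*.md")}  # assumes unique file stems
    inbound = dict.fromkeys(pages, 0)
    report = LintReport()
    for name, path in sorted(pages.items()):
        text = path.read_text(encoding="utf-8")
        targets = {t.strip() for t in WIKILINK_RE.findall(text) if t.strip()}
        missing = sorted(t for t in targets if t not in pages)
        if missing:
            report.broken[name] = missing
        if name in nav_pages:
            continue  # the index links to everything, so it can't count as an inbound link
        for target in targets:
            if target in inbound and target != name:
                inbound[target] += 1
    report.orphans = sorted(n for n, c in inbound.items() if c == 0 and n not in nav_pages)
    return report


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "index.md").write_text("[[alpha]] [[beta]] [[gamma]]\n", encoding="utf-8")
        (root / "alpha.md").write_text("See [[beta|B]] and [[missing page]].\n", encoding="utf-8")
        (root / "beta.md").write_text("Back to [[alpha#Intro]].\n", encoding="utf-8")
        (root / "gamma.md").write_text("No links yet.\n", encoding="utf-8")
        report = lint_wiki(root)
        print("orphans:", report.orphans)  # ['gamma']
        print("broken:", report.broken)    # {'alpha': ['missing page']}
```

Excluding the index as a link source is a deliberate choice. The index lists every page, so counting its links would hide every orphan.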