Docs: Update Unified Agent Architecture and Platform Roadmap.

Scott Idem
2026-01-07 18:53:04 -05:00
parent d4805ebb09
commit 90c6b914fa
2 changed files with 122 additions and 45 deletions

@@ -2,15 +2,9 @@
## My Role and Operating Principles
I am an interactive CLI agent assisting with software engineering tasks for One Sky IT, LLC, primarily on the Aether API project. My core mandates include: I am the **primary orchestrator and main helper** for the development of the **Unified Aether AI Agent (UE-AE-01)**. My goal is to facilitate the creation of a single AI entity with total system awareness across MariaDB, FastAPI, SvelteKit, and Docker.
- Adhering to project conventions and existing code style.
- **Never assuming library/framework availability; always verifying project usage.** ---
- Implementing changes idiomatically and with minimal, high-value comments.
- Being proactive, including adding tests for new features/fixes.
- **Confirming ambiguity or actions beyond clear scope with the user.**
- Prioritizing user control and project conventions.
- **Strictly adhering to instructions and utilizing available tools effectively.**
- **Awaiting explicit user instructions for significant architectural changes or critical decisions.**
## Project Context - Aether API (FastAPI)
@@ -31,59 +25,41 @@ I am an interactive CLI agent assisting with software engineering tasks for One
### Technical Learnings
- **Startup Errors & Logging:** The "worker failed to boot" error is often an import-time error or a logging configuration failure.
- **Root Cause:** If `logging.config.dictConfig` fails (e.g., due to missing `/logs` directories in Docker), the entire application crashes.
- **Prevention:** Always wrap logging config in `try/except` and use `import logging.config` explicitly. - **Circular Dependencies during Refactoring:** Even deferred imports can trigger boot failures during FastAPI's introspection phase if the module structure is fragile. "Isolation Mode" (local definitions in routers) is a confirmed temporary fix.
- **Circular Dependencies:** These are frequently masked as logging errors because `app.log` is imported very early in most files. Breaking these loops by moving imports inside functions (deferred imports) is a primary fix.
- **Circular Dependencies during Refactoring:** Attempting to move base CRUD logic and engine initialization into separate modules can trigger "Worker failed to boot" if not done carefully.
- **Issue:** Moving `db` and `engine` to a separate file like `db_connection.py` often creates circular loops with `db_sql.py` or `log.py` because they are imported by almost every other file at the module level.
- **Resolution:** A "Facade Pattern" was used for `db_sql.py`, where helper functions (Search builders, Redis lookups) are moved to `lib_sql_search.py` and `lib_redis_helpers.py`, but the core connection and CRUD stay in the original file to maintain boot order stability.
- **V3 API Dependencies:** Standardized `Response` injection should use plain type hints (e.g., `response: Response`) to avoid router initialization failures.
- **Pydantic Compatibility:** The current environment uses Pydantic v1.10. Avoid v2 features like `computed_field` or `model_validator` to prevent startup crashes.
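
The logging failure mode described in these learnings can be guarded against roughly as follows. This is a minimal sketch, not the project's actual startup code: the function name `configure_logging` and the console fallback are assumptions made for illustration.

```python
import logging
import logging.config  # explicit submodule import, as the notes recommend


def configure_logging(config: dict) -> bool:
    """Apply a dictConfig; fall back to console logging if it fails.

    Returns True when the dict config was applied, False when the
    fallback was used. (Hypothetical helper, not from the project.)
    """
    try:
        logging.config.dictConfig(config)
        return True
    except (ValueError, TypeError, AttributeError, ImportError, OSError) as exc:
        # dictConfig wraps handler errors (e.g. a missing log directory)
        # in ValueError, so the worker can keep booting with a fallback.
        logging.basicConfig(level=logging.INFO)
        logging.getLogger(__name__).warning(
            "Logging config failed, using console fallback: %s", exc
        )
        return False
```

With this shape, a missing `/logs` directory degrades to console output instead of taking the worker down at import time.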
### V3 Architectural Progress (Jan 2026)
- **Modular Object Definitions:** Monolithic `ae_obj_types_def.py` refactored into domain-specific files in `app/object_definitions/`.
- **Granular Dependencies:** Monolithic `Common_Route_Params` replaced with specialized dependencies in `app/lib_general_v3.py` (AccountContext, Pagination, StatusFilter, Serialization, Delay).
- **Advanced Search (POST):** Implemented `POST /v3/crud/{obj}/search` supporting recursive AND/OR grouping and standardized full-text search via the `q` property.
- **Security Hardening:** Implemented a 5-level recursion depth limit and a field allowlist (`searchable_fields`) for the Search API.
- **Non-blocking Concurrency:** Standardized on `asyncio.sleep()` for delay simulation to prevent Gunicorn worker hangs.
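
To illustrate how a recursion depth limit and a field allowlist can work together on a nested AND/OR filter, here is a small sketch. The payload shape (`"and"` / `"or"` keys holding lists, leaf nodes with a `"field"` key) and the names `validate_filter` and `SearchFilterError` are assumptions; the real Search API request format is not shown in this commit.

```python
MAX_DEPTH = 5  # mirrors the 5-level limit mentioned in the notes


class SearchFilterError(ValueError):
    """Raised when a filter tree is too deep or uses a disallowed field."""


def validate_filter(node: dict, searchable_fields: set, depth: int = 1) -> None:
    """Recursively validate a filter tree (hypothetical payload shape)."""
    if depth > MAX_DEPTH:
        raise SearchFilterError(f"filter nesting exceeds {MAX_DEPTH} levels")
    if "and" in node or "or" in node:
        # Group node: validate every child one level deeper.
        for key in ("and", "or"):
            for child in node.get(key, []):
                validate_filter(child, searchable_fields, depth + 1)
        return
    # Leaf node: the field must be on the allowlist.
    field = node.get("field")
    if field not in searchable_fields:
        raise SearchFilterError(f"field is not searchable: {field!r}")
```

Rejecting the request before any SQL is built keeps both checks independent of the query builder.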
## Session Learnings & Progress (Jan 2-7, 2026)
### V3 API Security Hardening (Jan 7, 2026) - MILESTONE
- **Mandatory JWT Authentication**: Successfully implemented strict multi-tenant isolation across all V3 CRUD and Search endpoints.
- All requests (except context resolution) now require a valid JWT `Authorization: Bearer <token>` or `?jwt=<token>`.
- **Account Isolation**: results are automatically filtered by `account_id` from the JWT.
- **Documentation**: Updated `V3_FRONTEND_API_GUIDE.md` with explicit instructions and security requirements for the frontend agent. - **Bootstrap Paradox Exception**: `site_domain` search is explicitly allowed for unauthenticated guests to unblock site context resolution.
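
The two token locations named above (`Authorization: Bearer <token>` and `?jwt=<token>`) could be resolved along these lines. This is a sketch only: `extract_jwt` is a hypothetical helper, it assumes header names arrive with canonical capitalization, and it does not verify the token's signature or claims.

```python
from typing import Mapping, Optional


def extract_jwt(headers: Mapping[str, str], query: Mapping[str, str]) -> Optional[str]:
    """Return the raw JWT from the header or query string, else None."""
    auth = headers.get("Authorization", "")
    scheme, _, token = auth.partition(" ")
    if scheme.lower() == "bearer" and token.strip():
        return token.strip()
    # Fall back to the ?jwt=<token> query parameter.
    return query.get("jwt") or None
```

A `None` result would correspond to a guest request, which under the notes above is only acceptable for the `site_domain` lookup.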
### Agent Bridge & Docker Integration ### Unified Agent Architecture
- **Agent Bridge Implementation**: Developed `app/routers/agent_bridge.py` for environment diagnostics. - **Refined Specification**: Incorporated feedback from the Frontend Svelte agent. The Unified Agent will handle **Automated Schema Synchronization**, **Log Stream Aggregation**, and **Automated Lifecycle Management**.
- **MCP Docker Explorer**: Attempted to run `mcp_docker_explorer.py`, but failed with `ModuleNotFoundError: No module named 'mcp'`.
- **Lesson**: The system python (`/usr/bin/python3`) does not have the `mcp` package installed. We must use the specific virtual environment `env_mcp` (e.g., `./env_mcp/bin/python`) or ensure the package is installed in the active environment.
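
A quick way to confirm which interpreter is running and whether it can see a package, before launching a script such as the MCP explorer, is sketched below. `module_available` is a hypothetical helper and is meant for top-level package names only.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Shows which Python is in use and whether it can import `mcp`.
    print(sys.executable, module_available("mcp"))
```

Running this once with `/usr/bin/python3` and once with `./env_mcp/bin/python` makes the environment difference visible without triggering the `ModuleNotFoundError`.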
### V3 CRUD Infrastructure & Search ### Infrastructure & Progress
- **Modular Object Definitions**: Refactored `ae_obj_types_def.py` into modular domain files in `app/object_definitions/`. - [x] **Modularize `lib_general.py`**: Successfully extracted Email, Export, JWT, and Hash functions into specialized modules (`lib_email.py`, `lib_export.py`, `lib_jwt.py`, `lib_hash.py`).
- **Advanced Search Fixes**:
- Resolved account listing and search issues by implementing `get_supported_filters` in `api_crud_v3.py`.
- Improved standardized full-text search (`q` parameter) with fallback logic for missing columns.
- **Data Integrity & Aliasing**: Fixed aliased field population by enabling `allow_population_by_field_name` in Pydantic models.
### Startup Failure Resolution (Jan 7, 2026)
- **Root Cause Identified**: The `app/routers/agent_bridge.py` module was preventing the FastAPI worker from booting, likely due to a missing or incompatible dependency (suspected `psutil` in the Docker environment) or a top-level import issue.
- **Resolution**: Commented out the `agent_bridge` router inclusion in `app/main.py`.
- **Status**: The API server has successfully started.
- **Retrospective**: The previous circular dependency refactoring in `lib_general_v3` and `api_crud_v3` might have been unnecessary or at least wasn't the *primary* blocker, though deferring imports is good practice.
## Current To-Do List
1. **Frontend Integration (Priority: Urgent)**: Re-implement the `site_domain` lookup exception. ### 1. High Priority & Urgent
- *Constraint*: Must allow searching `site_domain` without an `account_id` or JWT. - [ ] **Initialize `aether_platform` Project** (Priority: High): Create the root directory at `/home/scott/OSIT_dev/aether_platform/` and establish the initial meta-structure.
- *Approach*: Re-apply the `optional` authentication dependency logic to `api_crud_v3.py` and `lib_general_v3.py`, now that the server is stable. - [ ] **Unified Agent Architecture Document** (Priority: High): Refine and synchronize the final spec (Draft Done).
2. **Docker MCP Integration (Priority: High)**: Re-attempt running the MCP explorer using the correct virtual environment path (`./env_mcp/bin/python`) once the API is stable. - [ ] **Permanent Dependency Fix** (Priority: Urgent): Migrate `AccountContext` and Auth logic to a dedicated module.
3. **Routing - Nginx (Priority: Medium)**: Resolve 404 errors on `/v3/` and `/agent/` routes.
4. **Specialized Endpoints (Priority: Medium)**: Plan modernization of custom logic. ### 2. Infrastructure & Environment
5. **Agent Bridge Repair (Priority: Low)**: Investigate why `agent_bridge.py` crashes the server (check `psutil` availability). - [ ] **Docker MCP Integration**: Re-attempt diagnostics using the correct python path (`./env_mcp/bin/python`).
- [ ] **Agent Bridge Repair**: Resolve the `psutil` or syntax issues in `app/routers/agent_bridge.py`.
- [ ] **Nginx Configuration**: Resolve 404 errors on Port 8888 routes.
### Workflow & Collaboration
- **`GEMINI.md` Strategy:** The user is creating `GEMINI.md` files in key project directories. Their understanding is that context flows from the current directory up the tree, with `~/.gemini/GEMINI.md` serving as a global catch-all for general memories. - **`GEMINI.md` Strategy:** Context flows up the tree.
- **Agents Sync (rsync):** Shared documentation, notes, and architectural updates are pushed to the `agents_sync` directory using `rsync`. This allows real-time coordination between different specialized agents (e.g., FastAPI backend and Svelte frontend agents). - **Agents Sync (rsync):** Shared documentation and notifications pushed to `~/agents_sync/`.
- **Home Server:** The user self-hosts a Proxmox server for services like Nextcloud. - **Home Server:** Remote proxy at `https://dev-api.oneskyit.com`.
- [x] **Establish Symbolic Links**: Linked API, App, and Env into aether_platform.

@@ -0,0 +1,101 @@
# Specification: Unified Aether AI Agent (UE-AE-01)
## 1. Vision & Purpose
The **Unified Aether AI Agent** is a single, cohesive AI entity designed to eliminate the friction of multi-agent coordination. It possesses "Total System Awareness," allowing it to understand how a change in the database schema on a remote server impacts the FastAPI backend, the Nginx proxy, and the SvelteKit frontend simultaneously.
---
## 2. System Architecture & Operational Domains
### A. Data Layer (MariaDB)
* **Location:** Separate Virtual Server (Remote VM).
* **Role:** Master data storage.
* **Agent Access Requirements:**
    * Remote SQL execution capabilities.
    * SSH access for database maintenance and schema inspection.
    * Knowledge of cross-server connection strings and security groups.
### B. Caching & Messaging Layer (Redis)
* **Location:** Docker Container (Main Workstation).
* **Role:** Session management, ID resolution (Random ID mapping), and real-time messaging.
* **Agent Access Requirements:**
    * Ability to execute `redis-cli` commands via Docker.
    * Direct inspection of key-value pairs for troubleshooting.
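
As a rough illustration of running `redis-cli` through Docker, the sketch below builds the argument list for `docker exec`. The container name `aether_redis` and the key shown in the usage comment are placeholders, not values from this project.

```python
import shlex
import subprocess


def redis_cli_argv(container: str, *redis_args: str) -> list:
    """Build the argv for running redis-cli inside a container."""
    return ["docker", "exec", container, "redis-cli", *redis_args]


def run_redis_cli(container: str, *redis_args: str) -> str:
    """Run the command and return stdout (requires Docker on the host)."""
    argv = redis_cli_argv(container, *redis_args)
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Example (hypothetical names): inspect one key for troubleshooting.
# print(shlex.join(redis_cli_argv("aether_redis", "GET", "some:key")))
```

Passing an argument list rather than a shell string avoids quoting problems when keys contain spaces or special characters.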
### C. API Backend Layer (FastAPI / Python)
* **Location:** Docker Container (Main Workstation).
* **Role:** Business logic, CRUD V3 implementation, JWT authentication, and multi-tenant isolation.
* **Agent Access Requirements:**
    * Full filesystem access to `osit-api-fastapi/`.
    * Ability to manage Python environments and dependencies.
    * Docker container management (logs, restarts, shell execution).
### D. Frontend Layer (SvelteKit / TypeScript)
* **Location:** Local Filesystem (Main Workstation).
* **Role:** User Interface, API consumption, and client-side state management.
* **Agent Access Requirements:**
    * Full filesystem access to SvelteKit project directories.
    * Ability to execute build tools (npm, vite) and linting (eslint, prettier).
    * Browser automation for E2E testing (Playwright).
### E. Routing Layer (Nginx)
* **Location:** Host System or Docker.
* **Role:** SSL termination and reverse proxying for the API and Frontend.
* **Agent Access Requirements:**
    * Ability to modify and reload Nginx configuration files.
    * Diagnostic access to Nginx access/error logs.
### F. Storage Layer (Syncthing / Hosted Files)
* **Location:** `/home/scott/OSIT/hosted_files/` (Main Workstation) and synchronized Remote Servers.
* **Role:** Critical persistent storage for files served via the API (e.g., `hosted_file`, `event_file`).
* **Synchronization:** Managed via **Syncthing** (similar to the `agents_sync` directory), ensuring real-time mirroring across the Aether ecosystem.
* **Agent Access Requirements:**
    * Full filesystem access to the local hosted files directory.
    * Ability to verify synchronization status and resolve conflicts.
    * Understanding of the relationship between file metadata in MariaDB and physical assets in this directory.
### G. Workstation Development Environment
* **Base Path:** `/home/scott/OSIT_dev/`
* **Project Repositories:**
    * `aether_container_env/`: Docker Compose and environment configuration.
    * `aether_api_fastapi/`: The Python/FastAPI backend source.
    * `ae_app_svelte_tailwind_skeleton/`: The SvelteKit/TypeScript frontend source.
* **Network & Proxy Path:**
    * Docker containers on the workstation are proxied via **Nginx on a separate Home Server** (Proxmox VM hosting Home Assistant, Jellyfin, Jitsi, etc.).
    * **External Access URL:** `https://dev-api.oneskyit.com`
* **Agent Access Requirements:**
    * Total awareness of the inter-connected paths between these three main directories.
    * Knowledge of the home server's proxy logic to debug external connectivity vs. internal container health.
---
## 3. Communication & Context Strategy
### A. Integrated Global Memory
The Unified Agent will move away from separate `GEMINI.md` files in favor of a **Global System Context**. This context tracks:
1. **Service Map:** Mapping of ports, paths (e.g., `/v3/crud/`), and container IDs.
2. **Dependency Graph:** Visualizing how modules across different repositories interact.
3. **Boot Order Logic:** Understanding the fragile initialization requirements of the stack.
### B. Agent Sync Orchestration
The agent acts as the primary orchestrator for the `~/agents_sync/` directory:
- **Log Aggregation:** Pulling logs from MariaDB, FastAPI, and Nginx into a central diagnostic stream.
- **Inbound Messaging:** Processing user instructions from the `inbox/` and updating "System Health" status files.
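
One possible shape for the log aggregation step is a timestamp-ordered merge of per-service streams. The sketch below assumes each stream is already sorted and that timestamps are ISO-8601 strings in the same timezone, so that string order equals time order; the names `tag` and `merge_streams` are illustrative.

```python
import heapq
from typing import Dict, Iterable, Iterator, Tuple

LogLine = Tuple[str, str, str]  # (timestamp, source, message)


def tag(source: str, lines: Iterable[Tuple[str, str]]) -> Iterator[LogLine]:
    """Attach the source name to each (timestamp, message) pair."""
    for ts, msg in lines:
        yield (ts, source, msg)


def merge_streams(streams: Dict[str, Iterable[Tuple[str, str]]]) -> Iterator[LogLine]:
    """Lazily merge already-sorted streams into one chronological stream."""
    return heapq.merge(*(tag(name, lines) for name, lines in streams.items()))
```

Because `heapq.merge` is lazy, the same function works on live generators as well as on lists loaded from files.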
---
## 4. Key Capabilities
1. **Cross-Stack Debugging:** Tracing a "500 Internal Server Error" from a Svelte fetch call, through the Nginx proxy, into the FastAPI logic, and finally identifying the missing column in the remote MariaDB table.
2. **Automated Schema Synchronization:** Reading Pydantic models and MariaDB table schemas to automatically generate and update TypeScript interfaces and `.editable_fields.ts` definitions in the Svelte project (`src/lib/ae_core/`).
3. **Log Stream Aggregation:** Simultaneous monitoring of Svelte console output, Nginx access/error logs, and FastAPI container logs to provide instant root-cause identification for cross-stack failures.
4. **Automated Lifecycle Management:** Orchestrating the "Change-Restart-Verify" loop. The agent should automatically trigger targeted Docker container restarts whenever backend code is modified to ensure the frontend is always interacting with the latest logic.
5. **Environment-Aware Refactoring:** Safely breaking up monolithic files (like `lib_general.py`) while knowing exactly which services are impacted and verifying them across the full stack.
6. **Automated Full-Stack Verification:** Writing a backend migration and a frontend UI component in a single turn, then verifying the integration with an automated test suite.
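
The schema synchronization capability (item 2) can be pictured with a toy generator that turns a field-to-type mapping into a TypeScript interface. This is a simplified sketch: a real implementation would read the fields from Pydantic models and MariaDB schemas, whereas here the mapping is passed in by hand, and `TS_TYPES` and `to_ts_interface` are hypothetical names.

```python
# Minimal Python-to-TypeScript type mapping; anything else becomes "unknown".
TS_TYPES = {int: "number", float: "number", str: "string", bool: "boolean"}


def to_ts_interface(name: str, fields: dict) -> str:
    """Render a TypeScript interface from {field_name: python_type}."""
    lines = [f"export interface {name} {{"]
    for field, py_type in fields.items():
        ts_type = TS_TYPES.get(py_type, "unknown")
        lines.append(f"  {field}: {ts_type};")
    lines.append("}")
    return "\n".join(lines)
```

For example, `to_ts_interface("Account", {"account_id": int, "name": str})` yields an `Account` interface with a `number` and a `string` member, which could then be written into the Svelte project.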
---
## 5. Security & Safety
- **Credential Isolation:** Secrets and API keys remain in `.env` files; the agent only manages the logic to use them.
- **Incremental Deployment:** Changes are applied service-by-service with health checks at every stage.
- **Sandboxing Awareness:** The agent operates with the knowledge that it is running directly on the user's workstation and remote infrastructure.
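
The incremental deployment rule can be sketched as a loop that applies one service at a time and stops at the first failed health check. The `Step` structure and `deploy_incrementally` are assumptions for illustration; the actual change and health-check actions (container restarts, HTTP probes) are left to the caller.

```python
from typing import Callable, Iterable, List, Tuple

# (service name, function that applies the change, function that checks health)
Step = Tuple[str, Callable[[], None], Callable[[], bool]]


def deploy_incrementally(steps: Iterable[Step]) -> List[str]:
    """Apply changes service by service, halting on the first unhealthy one."""
    deployed: List[str] = []
    for name, apply_change, is_healthy in steps:
        apply_change()
        if not is_healthy():
            raise RuntimeError(
                f"health check failed after deploying {name}; "
                f"completed before failure: {deployed}"
            )
        deployed.append(name)
    return deployed
```

Stopping early means later services are never touched while an earlier one is unhealthy, which keeps the blast radius to a single layer.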