From f865b1cfb77623c4fffa68c71918474832be2b53 Mon Sep 17 00:00:00 2001 From: Scott Idem Date: Fri, 2 Jan 2026 20:26:44 -0500 Subject: [PATCH] Security: Implement modern JWT authentication for V3 CRUD and Search; update documentation and to-do list. --- GEMINI.md | 96 +++++----------- app/lib_general_v3.py | 151 +++++++++++++++++-------- app/models/api_crud_models.py | 2 - documentation/V3_FRONTEND_API_GUIDE.md | 30 ++++- 4 files changed, 164 insertions(+), 115 deletions(-) diff --git a/GEMINI.md b/GEMINI.md index e09eac4..2c2cab7 100644 --- a/GEMINI.md +++ b/GEMINI.md @@ -18,78 +18,44 @@ I am an interactive CLI agent assisting with software engineering tasks for One - **System Name:** Aether (AE). - **Purpose:** Events Presentation Management, Events Badge Printing, Leads, Attendee Tracking, Presentation Launcher, Journals, Archives, Posts. - **Started:** Mid-2018. -- **Frontend History:** Python Flask -> Svelte (current: SvelteKit). This explains legacy API calls. -- **Current API Version (FastAPI):** Roughly v2.5. -- **Target API Version:** v3. +- **Frontend History:** Python Flask -> Svelte (current: SvelteKit). +- **Current API Version (FastAPI):** v4.9.0. +- **V3 Implementation:** Modern parallel CRUD and Search endpoints under `/v3/crud`. ### API Versioning & Strategy -- `/crud` (v1): Legacy, still used by some older frontend parts. Defined in `app/routers/api_crud.py`. **Remains untouched.** -- `/v2/crud` (v2.5): Modern, preferred, and mostly functional endpoint. Defined in `app/routers/api_crud_v2.py`. **Remains untouched.** -- `/v3/crud`: The goal of this project phase. A new, parallel implementation with a refined structure. **Will run alongside v1 and v2.** +- `/crud` (v1): Legacy, still used by older frontend parts. +- `/v2/crud` (v2.5): Modern, preferred, and mostly functional endpoint. +- `/v3/crud`: The goal of this project phase. A new, parallel implementation with a refined structure and advanced search. **Runs alongside v1 and v2.** -### V3 Architectural Goals +### Technical Learnings +- **Startup Errors & Logging:** The "worker failed to boot" error is often an import-time error or a logging configuration failure. + - **Root Cause:** If `logging.config.dictConfig` fails (e.g., due to missing `/logs` directories in Docker), the entire application crashes. + - **Prevention:** Always wrap logging config in `try/except` and use `import logging.config` explicitly. + - **Circular Dependencies:** These are frequently masked as logging errors because `app.log` is imported very early in most files. Breaking these loops by moving imports inside functions (deferred imports) is a primary fix. +- **V3 API Dependencies:** Standardized `Response` injection should use plain type hints (e.g., `response: Response`) to avoid router initialization failures. +- **Pydantic Compatibility:** The current environment uses Pydantic v1.10. Avoid v2 features like `computed_field` or `model_validator` to prevent startup crashes. -- **Nested URL Structure:** Enforce hierarchical relationships (e.g., `/v3/crud/{parent_type}/{parent_id}/{child_type}/...`). -- **Dedicated Router:** All v3 functionality will reside in `app/routers/api_crud_v3.py`. -- **Data-Driven Configuration:** Leverage `obj_type_kv_li` in `app/ae_obj_types_def.py` for object definitions (tables, models). +### V3 Architectural Progress (Jan 2026) -## Session Learnings & Progress (December 3, 2025) +- **Modular Object Definitions:** Monolithic `ae_obj_types_def.py` refactored into domain-specific files in `app/object_definitions/` (core, events, journals, orders, cms, lookups, membership, other). +- **Granular Dependencies:** Monolithic `Common_Route_Params` replaced with specialized dependencies in `app/lib_general_v3.py` (AccountContext, Pagination, StatusFilter, Serialization, Delay). +- **Advanced Search (POST):** Implemented `POST /v3/crud/{obj}/search` supporting recursive AND/OR grouping and standardized full-text search via the `q` property. +- **Security Hardening:** Implemented a 5-level recursion depth limit and a field allowlist (`searchable_fields`) for the Search API. +- **Non-blocking Concurrency:** Standardized on `asyncio.sleep()` for delay simulation to prevent Gunicorn worker hangs. -### Strategy Shift & V3 Development +## Session Learnings & Progress (Jan 2-3, 2026) -* **Initial Plan Shift:** The strategy shifted from *migrating/replacing* existing v1/v2 routes to *building a new, parallel v3 implementation* from scratch. All existing v1/v2 routes will remain untouched. -* **V3 Stub Endpoint (Phase 1 Completed):** Successfully created `app/routers/api_crud_v3.py` and mounted it at `/v3/crud` in `app/main.py` with a working `/health` endpoint. +- **Logging Robustness:** All core modules and routers now use module-level loggers (`logging.getLogger(__name__)`). `app/log.py` includes robust `dictConfig` initialization with error handling. +- **Backward Compatibility:** Hybrid object definitions ensure that `/v2/crud` continues to work by including both modern (`tbl`, `mdl`) and legacy (`table_name`, `base_name`) keys. +- **FastAPI Best Practices:** Standardized `Response` injection via `response: Response` type hints instead of `Depends(Response)`. +- **Documentation:** Created `V3_FRONTEND_API_GUIDE.md` for Svelte/TypeScript integration and `V3_CRUD_ARCHITECTURE_AND_LEARNINGS.md` for backend maintenance. -### V3 CRUD Proof-of-Concept (Journal & Journal Entry - Phase 2 & 3 Completed) +## Current To-Do List -Implemented the full CRUD functionality for `journal` (top-level) and `journal_entry` (nested child object), demonstrating the v3 nested URL structure and its underlying logic: -* **Top-Level Journal CRUD:** - * `GET /v3/crud/journal/{journal_id}` - * `GET /v3/crud/journal/` (list, with filtering via `for_obj_type`, `for_obj_id`, and `jp`) - * `POST /v3/crud/journal/` - * `PATCH /v3/crud/journal/{journal_id}` - * `DELETE /v3/crud/journal/{journal_id}` -* **Nested Journal Entry CRUD:** - * `GET /v3/crud/journal/{journal_id}/journal_entry/` (list) - * `POST /v3/crud/journal/{journal_id}/journal_entry/` - * `GET /v3/crud/journal/{journal_id}/journal_entry/{entry_id}` - * `PATCH /v3/crud/journal/{journal_id}/journal_entry/{entry_id}` - * `DELETE /v3/crud/journal/{journal_id}/journal_entry/{entry_id}` - -### V3 CRUD Refinements - -* **Soft-Delete Functionality:** `DELETE` endpoints now support a `method` query parameter (`delete`, `disable`, `hide`) for soft-deleting (setting `enable=False` or `hide=True`) or hard-deleting records, mirroring v2 behavior. -* **Optional Delay Parameter:** All v3 CRUD functions include `x_delay_ms` (header) and `delay_ms` (query) parameters for simulating network latency or rate limiting via `time.sleep()`. - -### Current Task: Common Parameters Refactoring (Reverted) - -* **Goal:** Refactor the current monolithic `commons: Common_Route_Params` dependency into smaller, more granular FastAPI dependencies. -* **Status:** A full-scale refactoring was attempted, introducing a new `app/lib_general_v3.py` file and modifying `api_crud_v3.py`, `main.py`, and `models/response_models.py`. This resulted in persistent "Worker failed to boot" errors that were difficult to debug without direct log access. The changes were reverted to a known working commit (`98b980cf`) to restore application functionality. - -### Operational Learnings - -* **File Location:** The `GEMINI.md` file must always be located in the project root directory. -* **Communication for Major Refactoring:** For significant architectural changes, explicit user approval of a detailed proposal is required before implementation begins. -* **Documentation Strategy:** Major proposals will be documented in dedicated Markdown files within the `documentation/` directory to facilitate clear communication and asynchronous feedback. -* **Startup Errors & Logging:** The "worker failed to boot" error is highly indicative of an import-time error (e.g., circular dependency, missing module, or misconfiguration). A common cause is the logging configuration in `app/log.py` failing due to an uninitialized setting or file path issue. -* **Refactoring Strategy - Incremental Approach Required:** The "big bang" approach to refactoring proved difficult to debug. A more incremental strategy is required for the next attempt: - 1. **Isolate Logging:** First, refactor all modules (e.g., `main.py`, `models/response_models.py`) to instantiate their own module-level logger (`logger = logging.getLogger(__name__)`) instead of importing a global `log` instance from `app.log`. This will break potential circular dependencies related to logging. - 2. **Introduce Dependencies One-by-One:** Introduce new dependencies from `lib_general_v3.py` one at a time. - 3. **Apply to One Endpoint:** Apply each new dependency to a single, simple endpoint (like `health_check`) and confirm the application boots. - 4. **Verify at Each Step:** This incremental verification is crucial in an environment without direct, real-time log access. -* **Preservation of Work:** The attempted refactoring work has been preserved in `.snapshot` files for future reference. - ---- - -## Learnings from previous session (December 2, 2025): - -* **Docker Environment Challenges:** Debugging issues in a Dockerized FastAPI environment when running locally (outside the container's execution context) is significantly more challenging due to environment mismatches and symlinked executables. Direct `uvicorn` execution for debugging is not viable in this setup. This necessitates an approach that can either: - * Execute Python code snippets directly (e.g., for import validation). - * Rely on external tools (like `curl` or `requests` from another script) to interact with the Dockerized API for runtime testing. - * Assume that the Docker container provides the authoritative runtime environment, and local checks are primarily for static analysis (syntax, imports). -* **Need for Incremental Verification:** Given the complexity of the project and the debugging constraints, future changes must be exceptionally small, incremental, and verified through a robust testing strategy that can be executed, ideally, within the Docker environment or through isolated Python scripts. -* **Pydantic `BaseModel` Import:** Simple Pydantic `BaseModel` imports can be forgotten, leading to `NameError`. This highlights the need for automated linting or a minimal test harness that can quickly validate new model definitions in isolation. -* **Legacy Code vs. New Code:** When encountering errors, it's crucial to distinguish whether the error is in the new code being introduced or in existing legacy code that might have subtle interactions. The `422 Unprocessable Entity` errors were occurring in the legacy `/crud/` endpoints. This indicates that while our new factory code *itself* didn't cause those specific runtime errors, interactions or an underlying pre-existing issue became apparent when the application was loading. -* **`parent_table_name` for Nested CRUD:** The implementation detail of passing `parent_table_name` (or similar context) to child CRUD operations is essential for correctly linking nested resources in the database layer. The router factory's child object creation needs to pass this context explicitly. -* **API Endpoint Naming Convention:** The user prefers singular nouns for API endpoints (e.g., `/journal`, `/journal_entry`) over plural forms (e.g., `/journals`, `/journal_entries`). This convention will be followed for all new router creations. \ No newline at end of file +1. **Docker Environment Insight Improvements (Priority: High)**: Implement methods/endpoints to give the agent more insight into the actual Docker runtime environment (environment variables, container status, log accessibility) to better diagnose recurring startup and configuration issues. +2. **Security - Field Allowlists (Priority: High)**: Finish populating `searchable_fields` for all remaining object definitions. +3. **Security - Authentication (Priority: High)**: Continue refining and enforcing JWT-based authentication across all V3 endpoints. +4. **Specialized Endpoints (Priority: Medium)**: Identify and plan the modernization of custom logic (e.g., importing, websockets) to match V3 patterns. +5. **Directory Cleanup (Priority: Low)**: Long-term plan to archive old projects and standardize directory naming in `OSIT_dev`. +6. **Unused Route Cleanup**: Successfully commented out `cont_edu_cert` routers in `main.py`. diff --git a/app/lib_general_v3.py b/app/lib_general_v3.py index c872cd8..b7cb0da 100644 --- a/app/lib_general_v3.py +++ b/app/lib_general_v3.py @@ -7,6 +7,7 @@ that are relevant to the v3 API, while removing unused or outdated functionaliti # Standard library imports import time import logging +import jwt from typing import ( Any, Dict, @@ -34,64 +35,120 @@ from pydantic import ( # Internal imports (from this project) from app.config import settings -from app.db_sql import redis_lookup_id_random -from app.log import get_logger -logger = get_logger(__name__) +logger = logging.getLogger(__name__) -# --- Pydantic Model for Account Context --- -class AccountContext(BaseModel): - account_id: Optional[int] - account_id_random: Optional[str] - - -# --- Dependency Function for Account Context --- -def get_account_context( - x_account_id: Optional[str] = Header(None, min_length=11, max_length=22), - x_no_account_id: Optional[str] = Header(None, min_length=3, max_length=100), # Assuming 'bypass' or similar string - x_no_account_id_token: Optional[str] = Query(None, min_length=11, max_length=22), -) -> AccountContext: +def decode_jwt( + secret_key: str, + token: str, + ) -> dict: """ - Resolves the account context from headers/query parameters with defined precedence. - Precedence: x_account_id (header) > x_no_account_id_token (query) > x_no_account_id (header flag) - Raises HTTPException 403 if no valid account is found and no bypass is indicated. + Decodes and validates a JWT token. + Ported from lib_general.py to break circular dependencies. """ - logger.setLevel(logging.DEBUG) # Adjust as needed - logger.debug(locals()) + algorithm = 'HS256' + try: + decoded_token = jwt.decode(token, secret_key, algorithms=[algorithm]) + if decoded_token['eat'] >= time.time(): + return decoded_token + else: + return False + except Exception: + return None - resolved_account_id = None - resolved_account_id_random = None +# --- Pydantic Model for Authentication Context --- +class AuthContext(BaseModel): + account_id: Optional[int] = None + account_id_random: Optional[str] = None + user_id: Optional[int] = None + person_id: Optional[int] = None + auth_method: str = 'none' # 'jwt_header', 'jwt_query', 'legacy_header', 'bypass' + +# Alias for backward compatibility with initial V3 implementation +AccountContext = AuthContext + + +# --- Dependency Function for V3 Authentication --- +def get_v3_auth_context( + request: Request, + authorization: Optional[str] = Header(None, description="Bearer "), + jwt_query: Optional[str] = Query(None, alias="jwt", description="JWT token for URL-based auth (e.g., file downloads)"), + x_account_id: Optional[str] = Header(None, min_length=11, max_length=22, description="Legacy X-Account-ID header"), + x_no_account_id: Optional[str] = Header(None, min_length=3, max_length=100, description="Bypass account context header"), +) -> AuthContext: + """ + Standardized V3 Authentication Dependency. + Supports JWT in Authorization header (Bearer) OR 'jwt' query parameter. + Falls back to legacy headers for backward compatibility. + """ + # Defer import to break circular dependency + from app.db_sql import redis_lookup_id_random + + # 1. Check for JWT (Header preferred, then Query for downloads) + token = None + method = 'none' + + if authorization and authorization.startswith("Bearer "): + token = authorization.split(" ")[1] + method = 'jwt_header' + elif jwt_query: + token = jwt_query + method = 'jwt_query' + + if token: + payload = decode_jwt(settings.JWT_KEY, token) + if payload: + logger.info(f"JWT Validated ({method}). User: {payload.get('user_id')}, Account: {payload.get('account_id')}") + return AuthContext( + account_id=payload.get('account_id'), + account_id_random=payload.get('public_key'), # existing sign_jwt uses public_key for id_random + user_id=payload.get('user_id'), + person_id=payload.get('person_id'), + auth_method=method + ) + else: + logger.warning(f"Invalid or expired JWT provided via {method}") + raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid or expired authentication token.") + + # 2. Legacy / Testing Fallback: x_account_id if x_account_id: - # Primary check: x_account_id header - resolved_account_id_random = x_account_id if looked_up_id := redis_lookup_id_random(table_name='account', record_id_random=x_account_id): - resolved_account_id = looked_up_id - logger.info(f'Found account from x_account_id header: {resolved_account_id}') + logger.info(f"Authenticated via legacy header: {looked_up_id}") + return AuthContext( + account_id=looked_up_id, + account_id_random=x_account_id, + auth_method='legacy_header' + ) else: - logger.warning(f'Invalid x_account_id header provided: {x_account_id}') - raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail='Invalid X-Account-ID header.') - elif x_no_account_id_token: - # Secondary check: x_no_account_id_token query parameter - resolved_account_id_random = x_no_account_id_token - if looked_up_id := redis_lookup_id_random(table_name='account', record_id_random=x_no_account_id_token): - resolved_account_id = looked_up_id - logger.info(f'Found account from x_no_account_id_token query: {resolved_account_id}') - else: - logger.warning(f'Invalid x_no_account_id_token query provided: {x_no_account_id_token}') - raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail='Invalid X-No-Account-ID-Token query parameter.') - elif x_no_account_id: - # Tertiary check: x_no_account_id header for bypass - # For now, just presence indicates bypass. Can add a specific value check later if needed. - logger.info(f'X-No-Account-ID header found: {x_no_account_id}. Proceeding without specific account context.') - resolved_account_id = None # Explicitly None for "no specific account" - resolved_account_id_random = '--- NO ACCOUNT ---' - else: - logger.warning('No valid account context provided via X-Account-ID, X-No-Account-ID-Token, or X-No-Account-ID.') - raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail='Account context required. Please provide X-Account-ID, X-No-Account-ID-Token, or X-No-Account-ID.') + raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail="Invalid X-Account-ID header.") - return AccountContext(account_id=resolved_account_id, account_id_random=resolved_account_id_random) + # 3. Bypass Fallback + if x_no_account_id: + logger.info("Authentication bypassed via X-No-Account-ID") + return AuthContext( + account_id_random='--- NO ACCOUNT ---', + auth_method='bypass' + ) + + # 4. No Auth Found + logger.warning("No authentication provided for V3 endpoint.") + raise HTTPException( + status_code=status.HTTP_403_FORBIDDEN, + detail="Authentication required. Provide Authorization header or 'jwt' query parameter." + ) + + +# --- Legacy wrapper to avoid breaking current V3 code --- +def get_account_context( + auth: AuthContext = Depends(get_v3_auth_context) +) -> AuthContext: + """ + Alias for the new auth dependency to maintain compatibility + with existing V3 routes. + """ + return auth # --- Pydantic Model for Pagination --- diff --git a/app/models/api_crud_models.py b/app/models/api_crud_models.py index 53a0d77..9dde73d 100644 --- a/app/models/api_crud_models.py +++ b/app/models/api_crud_models.py @@ -3,8 +3,6 @@ import datetime, pytz from typing import Any, Dict, List, Optional, Set, Union from pydantic import BaseModel, EmailStr, Field, Json, PrivateAttr, ValidationError, validator -from app.db_sql import redis_lookup_id_random - import logging log = logging.getLogger(__name__) diff --git a/documentation/V3_FRONTEND_API_GUIDE.md b/documentation/V3_FRONTEND_API_GUIDE.md index a46aeb5..b98ee14 100644 --- a/documentation/V3_FRONTEND_API_GUIDE.md +++ b/documentation/V3_FRONTEND_API_GUIDE.md @@ -75,7 +75,35 @@ Use the `q` property in your search body for a general keyword search across ind --- -## 3. Best Practices for V3 +## 4. Authentication in V3 + +V3 supports multiple authentication methods. The backend resolves these automatically. + +### A. Standard Requests (Header) +For most API calls, use the standard Bearer token in the `Authorization` header. + +```ts +// Example: Setting the JWT in headers +headers: { + "Authorization": `Bearer ${user_jwt_token}` +} +``` + +### B. Secure File Downloads (URL Parameter) +**Crucial for `hosted_file` and `event_file`**: To allow browsers to download files without complex header modifications, you can pass the JWT directly in the URL. + +```ts +// Example: Creating a secure download link +// GET /v3/crud/hosted_file/{id}/?jwt={token} +const downloadUrl = `${BASE_URL}/hosted_file/${fileId}/?jwt=${jwtToken}`; +``` + +### C. Legacy Fallback (X-Account-ID) +For development and backward compatibility, the `X-Account-ID` header is still supported but should be phased out in favor of JWT. + +--- + +## 5. Best Practices for V3 1. **Use `view` for Rich Data**: Instead of manually joining data in separate calls, use `?view=enriched` or `?view=detail` if defined in the backend. 2. **Hybrid Search**: Use query parameters for simple toggles (enabled/hidden) and the POST body for complex logic.