Security: Implement modern JWT authentication for V3 CRUD and Search; update documentation and to-do list.

2026-01-02 20:26:44 -05:00
parent 53d252b23d
commit f865b1cfb7
4 changed files with 164 additions and 115 deletions
--- a/GEMINI.md
+++ b/GEMINI.md
@@ -18,78 +18,44 @@ I am an interactive CLI agent assisting with software engineering tasks for One
 -   **System Name:** Aether (AE).
 -   **Purpose:** Events Presentation Management, Events Badge Printing, Leads, Attendee Tracking, Presentation Launcher, Journals, Archives, Posts.
 -   **Started:** Mid-2018.
-   **Frontend History:** Python Flask -> Svelte (current: SvelteKit). This explains legacy API calls.
+-   **Frontend History:** Python Flask -> Svelte (current: SvelteKit).
-   **Current API Version (FastAPI):** Roughly v2.5.
+-   **Current API Version (FastAPI):** v4.9.0.
-   **Target API Version:** v3.
+-   **V3 Implementation:** Modern parallel CRUD and Search endpoints under `/v3/crud`.
 ### API Versioning & Strategy
-   `/crud` (v1): Legacy, still used by some older frontend parts. Defined in `app/routers/api_crud.py`. **Remains untouched.**
+-   `/crud` (v1): Legacy, still used by older frontend parts.
-   `/v2/crud` (v2.5): Modern, preferred, and mostly functional endpoint. Defined in `app/routers/api_crud_v2.py`. **Remains untouched.**
+-   `/v2/crud` (v2.5): Modern, preferred, and mostly functional endpoint.
-   `/v3/crud`: The goal of this project phase. A new, parallel implementation with a refined structure. **Will run alongside v1 and v2.**
+-   `/v3/crud`: The goal of this project phase. A new, parallel implementation with a refined structure and advanced search. **Runs alongside v1 and v2.**
-### V3 Architectural Goals
+### Technical Learnings
 - **Startup Errors & Logging:** The "worker failed to boot" error is often an import-time error or a logging configuration failure.
    - **Root Cause:** If `logging.config.dictConfig` fails (e.g., due to missing `/logs` directories in Docker), the entire application crashes.
    - **Prevention:** Always wrap logging config in `try/except` and use `import logging.config` explicitly.
    - **Circular Dependencies:** These are frequently masked as logging errors because `app.log` is imported very early in most files. Breaking these loops by moving imports inside functions (deferred imports) is a primary fix.
 - **V3 API Dependencies:** Standardized `Response` injection should use plain type hints (e.g., `response: Response`) to avoid router initialization failures.
 - **Pydantic Compatibility:** The current environment uses Pydantic v1.10. Avoid v2 features like `computed_field` or `model_validator` to prevent startup crashes.
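The "Prevention" note above (wrap the logging config in `try/except` and import `logging.config` explicitly) could look roughly like the sketch below. It is not the contents of `app/log.py`; the function name `configure_logging` and the console fallback are assumptions made for illustration.

```python
import logging
import logging.config  # explicit import, as the Prevention note recommends
from typing import Any, Dict


def configure_logging(config: Dict[str, Any]) -> bool:
    """Apply a dictConfig; fall back to console logging if it fails.

    Returns True when the dictConfig was applied, False when the fallback was used.
    (Hypothetical helper, not the project's actual implementation.)
    """
    try:
        logging.config.dictConfig(config)
        return True
    except (ValueError, TypeError, AttributeError, ImportError, OSError) as exc:
        # dictConfig wraps handler construction failures (for example a
        # FileHandler pointing at a missing /logs directory) in ValueError.
        logging.basicConfig(level=logging.INFO, force=True)
        logging.getLogger(__name__).warning(
            "Logging config failed, using console fallback: %s", exc
        )
        return False
```

With a fallback like this, a missing log directory degrades to console output instead of preventing the worker from booting.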
-   **Nested URL Structure:** Enforce hierarchical relationships (e.g., `/v3/crud/{parent_type}/{parent_id}/{child_type}/...`).
+### V3 Architectural Progress (Jan 2026)
 -   **Dedicated Router:** All v3 functionality will reside in `app/routers/api_crud_v3.py`.
 -   **Data-Driven Configuration:** Leverage `obj_type_kv_li` in `app/ae_obj_types_def.py` for object definitions (tables, models).
-## Session Learnings & Progress (December 3, 2025)
+-   **Modular Object Definitions:** Monolithic `ae_obj_types_def.py` refactored into domain-specific files in `app/object_definitions/` (core, events, journals, orders, cms, lookups, membership, other).
 -   **Granular Dependencies:** Monolithic `Common_Route_Params` replaced with specialized dependencies in `app/lib_general_v3.py` (AccountContext, Pagination, StatusFilter, Serialization, Delay).
 -   **Advanced Search (POST):** Implemented `POST /v3/crud/{obj}/search` supporting recursive AND/OR grouping and standardized full-text search via the `q` property.
 -   **Security Hardening:** Implemented a 5-level recursion depth limit and a field allowlist (`searchable_fields`) for the Search API.
 -   **Non-blocking Concurrency:** Standardized on `asyncio.sleep()` for delay simulation to prevent Gunicorn worker hangs.
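To make the "Security Hardening" bullet above concrete, here is a minimal sketch of how a recursion depth limit and a field allowlist might be enforced on a recursive AND/OR filter. The request shape (`and`/`or` groups, leaves with a `field` key) and the names `validate_filter` and `SearchValidationError` are assumptions for illustration; the commit does not show the actual search body schema.

```python
from typing import Any, Dict, Set

MAX_DEPTH = 5  # mirrors the 5-level limit mentioned above


class SearchValidationError(ValueError):
    """Raised when a filter is nested too deeply or uses a disallowed field."""


def validate_filter(node: Dict[str, Any], searchable_fields: Set[str], depth: int = 1) -> None:
    """Recursively check a filter tree (hypothetical shape, see lead-in)."""
    if depth > MAX_DEPTH:
        raise SearchValidationError(f"Filter nesting exceeds {MAX_DEPTH} levels")
    if not isinstance(node, dict):
        raise SearchValidationError("Each filter node must be an object")
    for key in ("and", "or"):
        if key in node:
            children = node[key]
            if not isinstance(children, list):
                raise SearchValidationError(f"'{key}' must contain a list of filters")
            for child in children:
                validate_filter(child, searchable_fields, depth + 1)
            return
    # Leaf condition: only allowlisted fields may be queried.
    field = node.get("field")
    if field not in searchable_fields:
        raise SearchValidationError(f"Field {field!r} is not searchable")
```

Validating the whole tree before building any SQL keeps both checks in one place, so a rejected request never reaches the query layer.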
-### Strategy Shift & V3 Development
+## Session Learnings & Progress (Jan 2-3, 2026)
-*   **Initial Plan Shift:** The strategy shifted from *migrating/replacing* existing v1/v2 routes to *building a new, parallel v3 implementation* from scratch. All existing v1/v2 routes will remain untouched.
+-   **Logging Robustness:** All core modules and routers now use module-level loggers (`logging.getLogger(__name__)`). `app/log.py` includes robust `dictConfig` initialization with error handling.
-*   **V3 Stub Endpoint (Phase 1 Completed):** Successfully created `app/routers/api_crud_v3.py` and mounted it at `/v3/crud` in `app/main.py` with a working `/health` endpoint.
+-   **Backward Compatibility:** Hybrid object definitions ensure that `/v2/crud` continues to work by including both modern (`tbl`, `mdl`) and legacy (`table_name`, `base_name`) keys.
 -   **FastAPI Best Practices:** Standardized `Response` injection via `response: Response` type hints instead of `Depends(Response)`.
 -   **Documentation:** Created `V3_FRONTEND_API_GUIDE.md` for Svelte/TypeScript integration and `V3_CRUD_ARCHITECTURE_AND_LEARNINGS.md` for backend maintenance.
-### V3 CRUD Proof-of-Concept (Journal & Journal Entry - Phase 2 & 3 Completed)
+## Current To-Do List
-Implemented the full CRUD functionality for `journal` (top-level) and `journal_entry` (nested child object), demonstrating the v3 nested URL structure and its underlying logic:
+1. **Docker Environment Insight Improvements (Priority: High)**: Implement methods/endpoints to give the agent more insight into the actual Docker runtime environment (environment variables, container status, log accessibility) to better diagnose recurring startup and configuration issues.
-*   **Top-Level Journal CRUD:**
+2. **Security - Field Allowlists (Priority: High)**: Finish populating `searchable_fields` for all remaining object definitions.
-    *   `GET /v3/crud/journal/{journal_id}`
+3. **Security - Authentication (Priority: High)**: Continue refining and enforcing JWT-based authentication across all V3 endpoints.
-    *   `GET /v3/crud/journal/` (list, with filtering via `for_obj_type`, `for_obj_id`, and `jp`)
+4. **Specialized Endpoints (Priority: Medium)**: Identify and plan the modernization of custom logic (e.g., importing, websockets) to match V3 patterns.
-    *   `POST /v3/crud/journal/`
+5. **Directory Cleanup (Priority: Low)**: Long-term plan to archive old projects and standardize directory naming in `OSIT_dev`.
-    *   `PATCH /v3/crud/journal/{journal_id}`
+6. **Unused Route Cleanup**: Successfully commented out `cont_edu_cert` routers in `main.py`.
    *   `DELETE /v3/crud/journal/{journal_id}`
 *   **Nested Journal Entry CRUD:**
    *   `GET /v3/crud/journal/{journal_id}/journal_entry/` (list)
    *   `POST /v3/crud/journal/{journal_id}/journal_entry/`
    *   `GET /v3/crud/journal/{journal_id}/journal_entry/{entry_id}`
    *   `PATCH /v3/crud/journal/{journal_id}/journal_entry/{entry_id}`
    *   `DELETE /v3/crud/journal/{journal_id}/journal_entry/{entry_id}`
 ### V3 CRUD Refinements
 *   **Soft-Delete Functionality:** `DELETE` endpoints now support a `method` query parameter (`delete`, `disable`, `hide`) for soft-deleting (setting `enable=False` or `hide=True`) or hard-deleting records, mirroring v2 behavior.
 *   **Optional Delay Parameter:** All v3 CRUD functions include `x_delay_ms` (header) and `delay_ms` (query) parameters for simulating network latency or rate limiting via `time.sleep()`.
 ### Current Task: Common Parameters Refactoring (Reverted)
 *   **Goal:** Refactor the current monolithic `commons: Common_Route_Params` dependency into smaller, more granular FastAPI dependencies.
 *   **Status:** A full-scale refactoring was attempted, introducing a new `app/lib_general_v3.py` file and modifying `api_crud_v3.py`, `main.py`, and `models/response_models.py`. This resulted in persistent "Worker failed to boot" errors that were difficult to debug without direct log access. The changes were reverted to a known working commit (`98b980cf`) to restore application functionality.
 ### Operational Learnings
 *   **File Location:** The `GEMINI.md` file must always be located in the project root directory.
 *   **Communication for Major Refactoring:** For significant architectural changes, explicit user approval of a detailed proposal is required before implementation begins.
 *   **Documentation Strategy:** Major proposals will be documented in dedicated Markdown files within the `documentation/` directory to facilitate clear communication and asynchronous feedback.
 *   **Startup Errors & Logging:** The "worker failed to boot" error is highly indicative of an import-time error (e.g., circular dependency, missing module, or misconfiguration). A common cause is the logging configuration in `app/log.py` failing due to an uninitialized setting or file path issue.
 *   **Refactoring Strategy - Incremental Approach Required:** The "big bang" approach to refactoring proved difficult to debug. A more incremental strategy is required for the next attempt:
    1.  **Isolate Logging:** First, refactor all modules (e.g., `main.py`, `models/response_models.py`) to instantiate their own module-level logger (`logger = logging.getLogger(__name__)`) instead of importing a global `log` instance from `app.log`. This will break potential circular dependencies related to logging.
    2.  **Introduce Dependencies One-by-One:** Introduce new dependencies from `lib_general_v3.py` one at a time.
    3.  **Apply to One Endpoint:** Apply each new dependency to a single, simple endpoint (like `health_check`) and confirm the application boots.
    4.  **Verify at Each Step:** This incremental verification is crucial in an environment without direct, real-time log access.
 *   **Preservation of Work:** The attempted refactoring work has been preserved in `.snapshot` files for future reference.
 ---
 ## Learnings from previous session (December 2, 2025):
 *   **Docker Environment Challenges:** Debugging issues in a Dockerized FastAPI environment when running locally (outside the container's execution context) is significantly more challenging due to environment mismatches and symlinked executables. Direct `uvicorn` execution for debugging is not viable in this setup. This necessitates an approach that can either:
    *   Execute Python code snippets directly (e.g., for import validation).
    *   Rely on external tools (like `curl` or `requests` from another script) to interact with the Dockerized API for runtime testing.
    *   Assume that the Docker container provides the authoritative runtime environment, and local checks are primarily for static analysis (syntax, imports).
 *   **Need for Incremental Verification:** Given the complexity of the project and the debugging constraints, future changes must be exceptionally small, incremental, and verified through a robust testing strategy that can be executed, ideally, within the Docker environment or through isolated Python scripts.
 *   **Pydantic `BaseModel` Import:** Simple Pydantic `BaseModel` imports can be forgotten, leading to `NameError`. This highlights the need for automated linting or a minimal test harness that can quickly validate new model definitions in isolation.
 *   **Legacy Code vs. New Code:** When encountering errors, it's crucial to distinguish whether the error is in the new code being introduced or in existing legacy code that might have subtle interactions. The `422 Unprocessable Entity` errors were occurring in the legacy `/crud/` endpoints. This indicates that while our new factory code *itself* didn't cause those specific runtime errors, interactions or an underlying pre-existing issue became apparent when the application was loading.
 *   **`parent_table_name` for Nested CRUD:** The implementation detail of passing `parent_table_name` (or similar context) to child CRUD operations is essential for correctly linking nested resources in the database layer. The router factory's child object creation needs to pass this context explicitly.
 *   **API Endpoint Naming Convention:** The user prefers singular nouns for API endpoints (e.g., `/journal`, `/journal_entry`) over plural forms (e.g., `/journals`, `/journal_entries`). This convention will be followed for all new router creations.
--- a/app/lib_general_v3.py
+++ b/app/lib_general_v3.py
@@ -7,6 +7,7 @@ that are relevant to the v3 API, while removing unused or outdated functionaliti
 # Standard library imports
 import time
 import logging
 import jwt
 from typing import (
    Any,
    Dict,
@@ -34,64 +35,120 @@ from pydantic import (
 # Internal imports (from this project)
 from app.config import settings
 from app.db_sql import redis_lookup_id_random
 from app.log import get_logger
-logger = get_logger(__name__)
+logger = logging.getLogger(__name__)
-# --- Pydantic Model for Account Context ---
+def decode_jwt(
-class AccountContext(BaseModel):
+        secret_key: str,
-    account_id: Optional[int]
+        token: str,
-    account_id_random: Optional[str]
+        ) -> dict:
 # --- Dependency Function for Account Context ---
 def get_account_context(
    x_account_id: Optional[str] = Header(None, min_length=11, max_length=22),
    x_no_account_id: Optional[str] = Header(None, min_length=3, max_length=100), # Assuming 'bypass' or similar string
    x_no_account_id_token: Optional[str] = Query(None, min_length=11, max_length=22),
 ) -> AccountContext:
    """
-    Resolves the account context from headers/query parameters with defined precedence.
+    Decodes and validates a JWT token.
-    Precedence: x_account_id (header) > x_no_account_id_token (query) > x_no_account_id (header flag)
+    Ported from lib_general.py to break circular dependencies.
    Raises HTTPException 403 if no valid account is found and no bypass is indicated.
    """
-    logger.setLevel(logging.DEBUG) # Adjust as needed
+    algorithm = 'HS256'
-    logger.debug(locals())
+    try:
        decoded_token = jwt.decode(token, secret_key, algorithms=[algorithm])
        if decoded_token['eat'] >= time.time():
            return decoded_token
        else:
            return False
    except Exception:
        return None
    resolved_account_id = None
    resolved_account_id_random = None
 # --- Pydantic Model for Authentication Context ---
 class AuthContext(BaseModel):
    account_id: Optional[int] = None
    account_id_random: Optional[str] = None
    user_id: Optional[int] = None
    person_id: Optional[int] = None
    auth_method: str = 'none' # 'jwt_header', 'jwt_query', 'legacy_header', 'bypass'
 # Alias for backward compatibility with initial V3 implementation
 AccountContext = AuthContext
 # --- Dependency Function for V3 Authentication ---
 def get_v3_auth_context(
    request: Request,
    authorization: Optional[str] = Header(None, description="Bearer <jwt_token>"),
    jwt_query: Optional[str] = Query(None, alias="jwt", description="JWT token for URL-based auth (e.g., file downloads)"),
    x_account_id: Optional[str] = Header(None, min_length=11, max_length=22, description="Legacy X-Account-ID header"),
    x_no_account_id: Optional[str] = Header(None, min_length=3, max_length=100, description="Bypass account context header"),
 ) -> AuthContext:
    """
    Standardized V3 Authentication Dependency.
    Supports JWT in Authorization header (Bearer) OR 'jwt' query parameter.
    Falls back to legacy headers for backward compatibility.
    """
    # Defer import to break circular dependency
    from app.db_sql import redis_lookup_id_random
    # 1. Check for JWT (Header preferred, then Query for downloads)
    token = None
    method = 'none'
    if authorization and authorization.startswith("Bearer "):
        token = authorization.split(" ")[1]
        method = 'jwt_header'
    elif jwt_query:
        token = jwt_query
        method = 'jwt_query'
    if token:
        payload = decode_jwt(settings.JWT_KEY, token)
        if payload:
            logger.info(f"JWT Validated ({method}). User: {payload.get('user_id')}, Account: {payload.get('account_id')}")
            return AuthContext(
                account_id=payload.get('account_id'),
                account_id_random=payload.get('public_key'), # existing sign_jwt uses public_key for id_random
                user_id=payload.get('user_id'),
                person_id=payload.get('person_id'),
                auth_method=method
            )
        else:
            logger.warning(f"Invalid or expired JWT provided via {method}")
            raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid or expired authentication token.")
    # 2. Legacy / Testing Fallback: x_account_id
    if x_account_id:
        # Primary check: x_account_id header
        resolved_account_id_random = x_account_id
        if looked_up_id := redis_lookup_id_random(table_name='account', record_id_random=x_account_id):
-            resolved_account_id = looked_up_id
+            logger.info(f"Authenticated via legacy header: {looked_up_id}")
-            logger.info(f'Found account from x_account_id header: {resolved_account_id}')
+            return AuthContext(
                account_id=looked_up_id,
                account_id_random=x_account_id,
                auth_method='legacy_header'
            )
        else:
-            logger.warning(f'Invalid x_account_id header provided: {x_account_id}')
+            raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail="Invalid X-Account-ID header.")
            raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail='Invalid X-Account-ID header.')
    elif x_no_account_id_token:
        # Secondary check: x_no_account_id_token query parameter
        resolved_account_id_random = x_no_account_id_token
        if looked_up_id := redis_lookup_id_random(table_name='account', record_id_random=x_no_account_id_token):
            resolved_account_id = looked_up_id
            logger.info(f'Found account from x_no_account_id_token query: {resolved_account_id}')
        else:
            logger.warning(f'Invalid x_no_account_id_token query provided: {x_no_account_id_token}')
            raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail='Invalid X-No-Account-ID-Token query parameter.')
    elif x_no_account_id:
        # Tertiary check: x_no_account_id header for bypass
        # For now, just presence indicates bypass. Can add a specific value check later if needed.
        logger.info(f'X-No-Account-ID header found: {x_no_account_id}. Proceeding without specific account context.')
        resolved_account_id = None # Explicitly None for "no specific account"
        resolved_account_id_random = '--- NO ACCOUNT ---'
    else:
        logger.warning('No valid account context provided via X-Account-ID, X-No-Account-ID-Token, or X-No-Account-ID.')
        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail='Account context required. Please provide X-Account-ID, X-No-Account-ID-Token, or X-No-Account-ID.')
-    return AccountContext(account_id=resolved_account_id, account_id_random=resolved_account_id_random)
+    # 3. Bypass Fallback
    if x_no_account_id:
        logger.info("Authentication bypassed via X-No-Account-ID")
        return AuthContext(
            account_id_random='--- NO ACCOUNT ---',
            auth_method='bypass'
        )
    # 4. No Auth Found
    logger.warning("No authentication provided for V3 endpoint.")
    raise HTTPException(
        status_code=status.HTTP_403_FORBIDDEN, 
        detail="Authentication required. Provide Authorization header or 'jwt' query parameter."
    )
 # --- Legacy wrapper to avoid breaking current V3 code ---
 def get_account_context(
    auth: AuthContext = Depends(get_v3_auth_context)
 ) -> AuthContext:
    """
    Alias for the new auth dependency to maintain compatibility 
    with existing V3 routes.
    """
    return auth
 # --- Pydantic Model for Pagination ---
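Review note: `decode_jwt` in this hunk is annotated `-> dict` but can return a dict, `False`, or `None`. The sketch below shows one way to give such a function a single `Optional[dict]` return type. It verifies HS256 with the standard library instead of PyJWT purely so the example is self-contained; the names `sign_hs256` and `decode_hs256` are hypothetical, and the `eat` expiry claim is taken from the diff above. It is not intended as a replacement for the library call.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # JWT segments omit base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(payload: Dict[str, Any], secret_key: str) -> str:
    """Build an HS256 token (only here so the example can be exercised)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    signature = hmac.new(secret_key.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url_encode(signature)}"


def decode_hs256(secret_key: str, token: str, expiry_claim: str = "eat") -> Optional[Dict[str, Any]]:
    """Return the payload if the signature is valid and unexpired, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        if header.get("alg") != "HS256":
            return None
        expected = hmac.new(
            secret_key.encode(), f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
        ).digest()
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        payload = json.loads(_b64url_decode(payload_b64))
        if not isinstance(payload, dict) or payload[expiry_claim] < time.time():
            return None
        return payload
    except (ValueError, KeyError, TypeError, AttributeError):
        # Malformed token, bad base64/JSON, or a missing/non-numeric expiry claim.
        return None
```

Because the caller in `get_v3_auth_context` only tests `if payload:`, a single "payload or `None`" contract would keep that call site unchanged while making the type hint accurate.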
--- a/app/models/api_crud_models.py
+++ b/app/models/api_crud_models.py
@@ -3,8 +3,6 @@ import datetime, pytz
 from typing import Any, Dict, List, Optional, Set, Union
 from pydantic import BaseModel, EmailStr, Field, Json, PrivateAttr, ValidationError, validator
 from app.db_sql import redis_lookup_id_random
 import logging
 log = logging.getLogger(__name__)
--- a/documentation/V3_FRONTEND_API_GUIDE.md
+++ b/documentation/V3_FRONTEND_API_GUIDE.md
@@ -75,7 +75,35 @@ Use the `q` property in your search body for a general keyword search across ind
 ---
-## 3. Best Practices for V3
+## 4. Authentication in V3
 V3 supports multiple authentication methods. The backend resolves these automatically.
 ### A. Standard Requests (Header)
 For most API calls, use the standard Bearer token in the `Authorization` header.
 ```ts
 // Example: Setting the JWT in headers
 headers: {
    "Authorization": `Bearer ${user_jwt_token}`
 }
 ```
 ### B. Secure File Downloads (URL Parameter)
 **Crucial for `hosted_file` and `event_file`**: To allow browsers to download files without complex header modifications, you can pass the JWT directly in the URL.
 ```ts
 // Example: Creating a secure download link
 // GET /v3/crud/hosted_file/{id}/?jwt={token}
 const downloadUrl = `${BASE_URL}/hosted_file/${fileId}/?jwt=${jwtToken}`;
 ```
 ### C. Legacy Fallback (X-Account-ID)
 For development and backward compatibility, the `X-Account-ID` header is still supported but should be phased out in favor of JWT.
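For readers curious how "the backend resolves these automatically" plays out, the following Python sketch mirrors the precedence used by `get_v3_auth_context` in this commit: Bearer header first, then the `jwt` query parameter, then the legacy header, then the bypass header. The function name `pick_credential` is hypothetical, and the sketch covers only the selection order, not token validation or account lookup.

```python
from typing import Optional, Tuple


def pick_credential(
    authorization: Optional[str] = None,
    jwt_query: Optional[str] = None,
    x_account_id: Optional[str] = None,
    x_no_account_id: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Return (auth_method, credential) following the V3 precedence order."""
    if authorization and authorization.startswith("Bearer "):
        # Header JWT wins over everything else.
        return "jwt_header", authorization[len("Bearer "):].strip()
    if jwt_query:
        # URL token, intended for browser-driven file downloads.
        return "jwt_query", jwt_query
    if x_account_id:
        return "legacy_header", x_account_id
    if x_no_account_id:
        return "bypass", None
    return "none", None
```

In practice this means a request that sends both a Bearer token and `X-Account-ID` is authenticated by the token alone, and an invalid token is rejected rather than falling through to the legacy header.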
 ---
 ## 5. Best Practices for V3
 1.  **Use `view` for Rich Data**: Instead of manually joining data in separate calls, use `?view=enriched` or `?view=detail` if defined in the backend.
 2.  **Hybrid Search**: Use query parameters for simple toggles (enabled/hidden) and the POST body for complex logic.