Security: Implement modern JWT authentication for V3 CRUD and Search; update documentation and to-do list.

2026-01-02 20:26:44 -05:00
parent 53d252b23d
commit f865b1cfb7
4 changed files with 164 additions and 115 deletions
--- a/GEMINI.md
+++ b/GEMINI.md
@@ -18,78 +18,44 @@ I am an interactive CLI agent assisting with software engineering tasks for One
 -   **System Name:** Aether (AE).
 -   **Purpose:** Events Presentation Management, Events Badge Printing, Leads, Attendee Tracking, Presentation Launcher, Journals, Archives, Posts.
 -   **Started:** Mid-2018.
-   **Frontend History:** Python Flask -> Svelte (current: SvelteKit). This explains legacy API calls.
-   **Current API Version (FastAPI):** Roughly v2.5.
-   **Target API Version:** v3.
+-   **Frontend History:** Python Flask -> Svelte (current: SvelteKit).
+-   **Current API Version (FastAPI):** v4.9.0.
+-   **V3 Implementation:** Modern parallel CRUD and Search endpoints under `/v3/crud`.

 ### API Versioning & Strategy

-   `/crud` (v1): Legacy, still used by some older frontend parts. Defined in `app/routers/api_crud.py`. **Remains untouched.**
-   `/v2/crud` (v2.5): Modern, preferred, and mostly functional endpoint. Defined in `app/routers/api_crud_v2.py`. **Remains untouched.**
-   `/v3/crud`: The goal of this project phase. A new, parallel implementation with a refined structure. **Will run alongside v1 and v2.**
+-   `/crud` (v1): Legacy, still used by older frontend parts.
+-   `/v2/crud` (v2.5): Modern, preferred, and mostly functional endpoint.
+-   `/v3/crud`: The goal of this project phase. A new, parallel implementation with a refined structure and advanced search. **Runs alongside v1 and v2.**

-### V3 Architectural Goals
+### Technical Learnings
+- **Startup Errors & Logging:** The "worker failed to boot" error is often an import-time error or a logging configuration failure.
+    - **Root Cause:** If `logging.config.dictConfig` fails (e.g., due to missing `/logs` directories in Docker), the entire application crashes.
+    - **Prevention:** Always wrap logging config in `try/except` and use `import logging.config` explicitly.
+    - **Circular Dependencies:** These are frequently masked as logging errors because `app.log` is imported very early in most files. Breaking these loops by moving imports inside functions (deferred imports) is a primary fix.
+- **V3 API Dependencies:** Standardized `Response` injection should use plain type hints (e.g., `response: Response`) to avoid router initialization failures.
+- **Pydantic Compatibility:** The current environment uses Pydantic v1.10. Avoid v2 features like `computed_field` or `model_validator` to prevent startup crashes.

-   **Nested URL Structure:** Enforce hierarchical relationships (e.g., `/v3/crud/{parent_type}/{parent_id}/{child_type}/...`).
-   **Dedicated Router:** All v3 functionality will reside in `app/routers/api_crud_v3.py`.
-   **Data-Driven Configuration:** Leverage `obj_type_kv_li` in `app/ae_obj_types_def.py` for object definitions (tables, models).
+### V3 Architectural Progress (Jan 2026)

-## Session Learnings & Progress (December 3, 2025)
+-   **Modular Object Definitions:** Monolithic `ae_obj_types_def.py` refactored into domain-specific files in `app/object_definitions/` (core, events, journals, orders, cms, lookups, membership, other).
+-   **Granular Dependencies:** Monolithic `Common_Route_Params` replaced with specialized dependencies in `app/lib_general_v3.py` (AccountContext, Pagination, StatusFilter, Serialization, Delay).
+-   **Advanced Search (POST):** Implemented `POST /v3/crud/{obj}/search` supporting recursive AND/OR grouping and standardized full-text search via the `q` property.
+-   **Security Hardening:** Implemented a 5-level recursion depth limit and a field allowlist (`searchable_fields`) for the Search API.
+-   **Non-blocking Concurrency:** Standardized on `asyncio.sleep()` for delay simulation to prevent Gunicorn worker hangs.

-### Strategy Shift & V3 Development
+## Session Learnings & Progress (Jan 2-3, 2026)

-*   **Initial Plan Shift:** The strategy shifted from *migrating/replacing* existing v1/v2 routes to *building a new, parallel v3 implementation* from scratch. All existing v1/v2 routes will remain untouched.
-*   **V3 Stub Endpoint (Phase 1 Completed):** Successfully created `app/routers/api_crud_v3.py` and mounted it at `/v3/crud` in `app/main.py` with a working `/health` endpoint.
+-   **Logging Robustness:** All core modules and routers now use module-level loggers (`logging.getLogger(__name__)`). `app/log.py` includes robust `dictConfig` initialization with error handling.
+-   **Backward Compatibility:** Hybrid object definitions ensure that `/v2/crud` continues to work by including both modern (`tbl`, `mdl`) and legacy (`table_name`, `base_name`) keys.
+-   **FastAPI Best Practices:** Standardized `Response` injection via `response: Response` type hints instead of `Depends(Response)`.
+-   **Documentation:** Created `V3_FRONTEND_API_GUIDE.md` for Svelte/TypeScript integration and `V3_CRUD_ARCHITECTURE_AND_LEARNINGS.md` for backend maintenance.

-### V3 CRUD Proof-of-Concept (Journal & Journal Entry - Phase 2 & 3 Completed)
+## Current To-Do List

-Implemented the full CRUD functionality for `journal` (top-level) and `journal_entry` (nested child object), demonstrating the v3 nested URL structure and its underlying logic:
-*   **Top-Level Journal CRUD:**
-    *   `GET /v3/crud/journal/{journal_id}`
-    *   `GET /v3/crud/journal/` (list, with filtering via `for_obj_type`, `for_obj_id`, and `jp`)
-    *   `POST /v3/crud/journal/`
-    *   `PATCH /v3/crud/journal/{journal_id}`
-    *   `DELETE /v3/crud/journal/{journal_id}`
-*   **Nested Journal Entry CRUD:**
-    *   `GET /v3/crud/journal/{journal_id}/journal_entry/` (list)
-    *   `POST /v3/crud/journal/{journal_id}/journal_entry/`
-    *   `GET /v3/crud/journal/{journal_id}/journal_entry/{entry_id}`
-    *   `PATCH /v3/crud/journal/{journal_id}/journal_entry/{entry_id}`
-    *   `DELETE /v3/crud/journal/{journal_id}/journal_entry/{entry_id}`
-
-### V3 CRUD Refinements
-
-*   **Soft-Delete Functionality:** `DELETE` endpoints now support a `method` query parameter (`delete`, `disable`, `hide`) for soft-deleting (setting `enable=False` or `hide=True`) or hard-deleting records, mirroring v2 behavior.
-*   **Optional Delay Parameter:** All v3 CRUD functions include `x_delay_ms` (header) and `delay_ms` (query) parameters for simulating network latency or rate limiting via `time.sleep()`.
-
-### Current Task: Common Parameters Refactoring (Reverted)
-
-*   **Goal:** Refactor the current monolithic `commons: Common_Route_Params` dependency into smaller, more granular FastAPI dependencies.
-*   **Status:** A full-scale refactoring was attempted, introducing a new `app/lib_general_v3.py` file and modifying `api_crud_v3.py`, `main.py`, and `models/response_models.py`. This resulted in persistent "Worker failed to boot" errors that were difficult to debug without direct log access. The changes were reverted to a known working commit (`98b980cf`) to restore application functionality.
-
-### Operational Learnings
-
-*   **File Location:** The `GEMINI.md` file must always be located in the project root directory.
-*   **Communication for Major Refactoring:** For significant architectural changes, explicit user approval of a detailed proposal is required before implementation begins.
-*   **Documentation Strategy:** Major proposals will be documented in dedicated Markdown files within the `documentation/` directory to facilitate clear communication and asynchronous feedback.
-*   **Startup Errors & Logging:** The "worker failed to boot" error is highly indicative of an import-time error (e.g., circular dependency, missing module, or misconfiguration). A common cause is the logging configuration in `app/log.py` failing due to an uninitialized setting or file path issue.
-*   **Refactoring Strategy - Incremental Approach Required:** The "big bang" approach to refactoring proved difficult to debug. A more incremental strategy is required for the next attempt:
-    1.  **Isolate Logging:** First, refactor all modules (e.g., `main.py`, `models/response_models.py`) to instantiate their own module-level logger (`logger = logging.getLogger(__name__)`) instead of importing a global `log` instance from `app.log`. This will break potential circular dependencies related to logging.
-    2.  **Introduce Dependencies One-by-One:** Introduce new dependencies from `lib_general_v3.py` one at a time.
-    3.  **Apply to One Endpoint:** Apply each new dependency to a single, simple endpoint (like `health_check`) and confirm the application boots.
-    4.  **Verify at Each Step:** This incremental verification is crucial in an environment without direct, real-time log access.
-*   **Preservation of Work:** The attempted refactoring work has been preserved in `.snapshot` files for future reference.
-
---
-
-## Learnings from previous session (December 2, 2025):
-
-*   **Docker Environment Challenges:** Debugging issues in a Dockerized FastAPI environment when running locally (outside the container's execution context) is significantly more challenging due to environment mismatches and symlinked executables. Direct `uvicorn` execution for debugging is not viable in this setup. This necessitates an approach that can either:
-    *   Execute Python code snippets directly (e.g., for import validation).
-    *   Rely on external tools (like `curl` or `requests` from another script) to interact with the Dockerized API for runtime testing.
-    *   Assume that the Docker container provides the authoritative runtime environment, and local checks are primarily for static analysis (syntax, imports).
-*   **Need for Incremental Verification:** Given the complexity of the project and the debugging constraints, future changes must be exceptionally small, incremental, and verified through a robust testing strategy that can be executed, ideally, within the Docker environment or through isolated Python scripts.
-*   **Pydantic `BaseModel` Import:** Simple Pydantic `BaseModel` imports can be forgotten, leading to `NameError`. This highlights the need for automated linting or a minimal test harness that can quickly validate new model definitions in isolation.
-*   **Legacy Code vs. New Code:** When encountering errors, it's crucial to distinguish whether the error is in the new code being introduced or in existing legacy code that might have subtle interactions. The `422 Unprocessable Entity` errors were occurring in the legacy `/crud/` endpoints. This indicates that while our new factory code *itself* didn't cause those specific runtime errors, interactions or an underlying pre-existing issue became apparent when the application was loading.
-*   **`parent_table_name` for Nested CRUD:** The implementation detail of passing `parent_table_name` (or similar context) to child CRUD operations is essential for correctly linking nested resources in the database layer. The router factory's child object creation needs to pass this context explicitly.
-*   **API Endpoint Naming Convention:** The user prefers singular nouns for API endpoints (e.g., `/journal`, `/journal_entry`) over plural forms (e.g., `/journals`, `/journal_entries`). This convention will be followed for all new router creations.
+1. **Docker Environment Insight Improvements (Priority: High)**: Implement methods/endpoints to give the agent more insight into the actual Docker runtime environment (environment variables, container status, log accessibility) to better diagnose recurring startup and configuration issues.
+2. **Security - Field Allowlists (Priority: High)**: Finish populating `searchable_fields` for all remaining object definitions.
+3. **Security - Authentication (Priority: High)**: Continue refining and enforcing JWT-based authentication across all V3 endpoints.
+4. **Specialized Endpoints (Priority: Medium)**: Identify and plan the modernization of custom logic (e.g., importing, websockets) to match V3 patterns.
+5. **Directory Cleanup (Priority: Low)**: Long-term plan to archive old projects and standardize directory naming in `OSIT_dev`.
+6. **Unused Route Cleanup**: Successfully commented out `cont_edu_cert` routers in `main.py`.
--- a/app/lib_general_v3.py
+++ b/app/lib_general_v3.py
@@ -7,6 +7,7 @@ that are relevant to the v3 API, while removing unused or outdated functionaliti
 # Standard library imports
 import time
 import logging
+import jwt
 from typing import (
    Any,
    Dict,
@@ -34,64 +35,120 @@ from pydantic import (

 # Internal imports (from this project)
 from app.config import settings
-from app.db_sql import redis_lookup_id_random
-from app.log import get_logger

-logger = get_logger(__name__)
+logger = logging.getLogger(__name__)


-# --- Pydantic Model for Account Context ---
-class AccountContext(BaseModel):
-    account_id: Optional[int]
-    account_id_random: Optional[str]
-
-
-# --- Dependency Function for Account Context ---
-def get_account_context(
-    x_account_id: Optional[str] = Header(None, min_length=11, max_length=22),
-    x_no_account_id: Optional[str] = Header(None, min_length=3, max_length=100), # Assuming 'bypass' or similar string
-    x_no_account_id_token: Optional[str] = Query(None, min_length=11, max_length=22),
-) -> AccountContext:
+def decode_jwt(
+        secret_key: str,
+        token: str,
+        ) -> dict:
    """
-    Resolves the account context from headers/query parameters with defined precedence.
-    Precedence: x_account_id (header) > x_no_account_id_token (query) > x_no_account_id (header flag)
-    Raises HTTPException 403 if no valid account is found and no bypass is indicated.
+    Decodes and validates a JWT token.
+    Ported from lib_general.py to break circular dependencies.
    """
-    logger.setLevel(logging.DEBUG) # Adjust as needed
-    logger.debug(locals())
+    algorithm = 'HS256'
+    try:
+        decoded_token = jwt.decode(token, secret_key, algorithms=[algorithm])
+        if decoded_token['eat'] >= time.time():
+            return decoded_token
+        else:
+            return False
+    except Exception:
+        return None

-    resolved_account_id = None
-    resolved_account_id_random = None

+# --- Pydantic Model for Authentication Context ---
+class AuthContext(BaseModel):
+    account_id: Optional[int] = None
+    account_id_random: Optional[str] = None
+    user_id: Optional[int] = None
+    person_id: Optional[int] = None
+    auth_method: str = 'none' # 'jwt_header', 'jwt_query', 'legacy_header', 'bypass'
+
+# Alias for backward compatibility with initial V3 implementation
+AccountContext = AuthContext
+
+
+# --- Dependency Function for V3 Authentication ---
+def get_v3_auth_context(
+    request: Request,
+    authorization: Optional[str] = Header(None, description="Bearer <jwt_token>"),
+    jwt_query: Optional[str] = Query(None, alias="jwt", description="JWT token for URL-based auth (e.g., file downloads)"),
+    x_account_id: Optional[str] = Header(None, min_length=11, max_length=22, description="Legacy X-Account-ID header"),
+    x_no_account_id: Optional[str] = Header(None, min_length=3, max_length=100, description="Bypass account context header"),
+) -> AuthContext:
+    """
+    Standardized V3 Authentication Dependency.
+    Supports JWT in Authorization header (Bearer) OR 'jwt' query parameter.
+    Falls back to legacy headers for backward compatibility.
+    """
+    # Defer import to break circular dependency
+    from app.db_sql import redis_lookup_id_random
+
+    # 1. Check for JWT (Header preferred, then Query for downloads)
+    token = None
+    method = 'none'
+
+    if authorization and authorization.startswith("Bearer "):
+        token = authorization.split(" ")[1]
+        method = 'jwt_header'
+    elif jwt_query:
+        token = jwt_query
+        method = 'jwt_query'
+
+    if token:
+        payload = decode_jwt(settings.JWT_KEY, token)
+        if payload:
+            logger.info(f"JWT Validated ({method}). User: {payload.get('user_id')}, Account: {payload.get('account_id')}")
+            return AuthContext(
+                account_id=payload.get('account_id'),
+                account_id_random=payload.get('public_key'), # existing sign_jwt uses public_key for id_random
+                user_id=payload.get('user_id'),
+                person_id=payload.get('person_id'),
+                auth_method=method
+            )
+        else:
+            logger.warning(f"Invalid or expired JWT provided via {method}")
+            raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid or expired authentication token.")
+
+    # 2. Legacy / Testing Fallback: x_account_id
    if x_account_id:
-        # Primary check: x_account_id header
-        resolved_account_id_random = x_account_id
        if looked_up_id := redis_lookup_id_random(table_name='account', record_id_random=x_account_id):
-            resolved_account_id = looked_up_id
-            logger.info(f'Found account from x_account_id header: {resolved_account_id}')
+            logger.info(f"Authenticated via legacy header: {looked_up_id}")
+            return AuthContext(
+                account_id=looked_up_id,
+                account_id_random=x_account_id,
+                auth_method='legacy_header'
+            )
        else:
-            logger.warning(f'Invalid x_account_id header provided: {x_account_id}')
-            raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail='Invalid X-Account-ID header.')
-    elif x_no_account_id_token:
-        # Secondary check: x_no_account_id_token query parameter
-        resolved_account_id_random = x_no_account_id_token
-        if looked_up_id := redis_lookup_id_random(table_name='account', record_id_random=x_no_account_id_token):
-            resolved_account_id = looked_up_id
-            logger.info(f'Found account from x_no_account_id_token query: {resolved_account_id}')
-        else:
-            logger.warning(f'Invalid x_no_account_id_token query provided: {x_no_account_id_token}')
-            raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail='Invalid X-No-Account-ID-Token query parameter.')
-    elif x_no_account_id:
-        # Tertiary check: x_no_account_id header for bypass
-        # For now, just presence indicates bypass. Can add a specific value check later if needed.
-        logger.info(f'X-No-Account-ID header found: {x_no_account_id}. Proceeding without specific account context.')
-        resolved_account_id = None # Explicitly None for "no specific account"
-        resolved_account_id_random = '--- NO ACCOUNT ---'
-    else:
-        logger.warning('No valid account context provided via X-Account-ID, X-No-Account-ID-Token, or X-No-Account-ID.')
-        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail='Account context required. Please provide X-Account-ID, X-No-Account-ID-Token, or X-No-Account-ID.')
+            raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail="Invalid X-Account-ID header.")

-    return AccountContext(account_id=resolved_account_id, account_id_random=resolved_account_id_random)
+    # 3. Bypass Fallback
+    if x_no_account_id:
+        logger.info("Authentication bypassed via X-No-Account-ID")
+        return AuthContext(
+            account_id_random='--- NO ACCOUNT ---',
+            auth_method='bypass'
+        )
+
+    # 4. No Auth Found
+    logger.warning("No authentication provided for V3 endpoint.")
+    raise HTTPException(
+        status_code=status.HTTP_403_FORBIDDEN, 
+        detail="Authentication required. Provide Authorization header or 'jwt' query parameter."
+    )
+
+
+# --- Legacy wrapper to avoid breaking current V3 code ---
+def get_account_context(
+    auth: AuthContext = Depends(get_v3_auth_context)
+) -> AuthContext:
+    """
+    Alias for the new auth dependency to maintain compatibility 
+    with existing V3 routes.
+    """
+    return auth


 # --- Pydantic Model for Pagination ---
--- a/app/models/api_crud_models.py
+++ b/app/models/api_crud_models.py
@@ -3,8 +3,6 @@ import datetime, pytz
 from typing import Any, Dict, List, Optional, Set, Union
 from pydantic import BaseModel, EmailStr, Field, Json, PrivateAttr, ValidationError, validator

-from app.db_sql import redis_lookup_id_random
-
 import logging
 log = logging.getLogger(__name__)

--- a/documentation/V3_FRONTEND_API_GUIDE.md
+++ b/documentation/V3_FRONTEND_API_GUIDE.md
@@ -75,7 +75,35 @@ Use the `q` property in your search body for a general keyword search across ind

 ---

-## 3. Best Practices for V3
+## 4. Authentication in V3
+
+V3 supports multiple authentication methods. The backend resolves these automatically.
+
+### A. Standard Requests (Header)
+For most API calls, use the standard Bearer token in the `Authorization` header.
+
+```ts
+// Example: Setting the JWT in headers
+headers: {
+    "Authorization": `Bearer ${user_jwt_token}`
+}
+```
+
+### B. Secure File Downloads (URL Parameter)
+**Crucial for `hosted_file` and `event_file`**: To allow browsers to download files without complex header modifications, you can pass the JWT directly in the URL.
+
+```ts
+// Example: Creating a secure download link
+// GET /v3/crud/hosted_file/{id}/?jwt={token}
+const downloadUrl = `${BASE_URL}/hosted_file/${fileId}/?jwt=${jwtToken}`;
+```
+
+### C. Legacy Fallback (X-Account-ID)
+For development and backward compatibility, the `X-Account-ID` header is still supported but should be phased out in favor of JWT.
+
+---
+
+## 5. Best Practices for V3

 1.  **Use `view` for Rich Data**: Instead of manually joining data in separate calls, use `?view=enriched` or `?view=detail` if defined in the backend.
 2.  **Hybrid Search**: Use query parameters for simple toggles (enabled/hidden) and the POST body for complex logic.