fix(download): strict ID typing — remove cross-resolution from hosted_file download

event_file download now resolves event_file_id → hosted_file_id explicitly before delegating, rather than relying on a cross-resolution fallback inside the hosted_file endpoint. The hosted_file download endpoint now only accepts hosted_file IDs. Cross-resolution was added reactively (ea117bf) to patch incorrect frontend ID usage and was never a deliberate design decision. With no per-record account ownership check on the download path, the implicit ID aliasing was an unauditable gap. - download_event_file_action: resolves event_file → hosted_file via Redis + SQL before delegating; 404s explicitly if chain is broken - download_file_action: strict hosted_file ID only; cross-resolution fallback removed - Also fixes ?key= not being forwarded (was missing from event_file endpoint signature) - TODO: per-record account ownership check (P1), archive_content download endpoint (P2) - Docs: breaking change note added to frontend guide (remove ~2026-06-24) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-10 14:46:47 -04:00
parent 2429a1f731
commit 3806613427
4 changed files with 25 additions and 25 deletions
--- a/app/routers/api_v3_actions_event_file.py
+++ b/app/routers/api_v3_actions_event_file.py
@@ -281,20 +281,34 @@ async def download_event_file_action(
        response: Response,
        event_file_id: str = Path(min_length=11, max_length=22),
        filename: Optional[str] = Query(None, min_length=4, max_length=255),
+        key: Optional[str] = Query(None),
        site_key: Optional[str] = Query(None),
        range: Optional[str] = Header(None),
        account: AccountContext = Depends(get_account_context_optional),
        delay: DelayParams = Depends(),
        ):
    """
-    Semantic alias for hosted_file download with Event-specific context.
+    Download the underlying file for an event_file record.
+    Resolves event_file_id → hosted_file_id explicitly before delegating.
    """
-    # Simply delegate to the universal hosted_file download logic
+    ef_int_id = redis_lookup_id_random(record_id_random=event_file_id, table_name='event_file')
+    if not ef_int_id:
+        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Event file record not found.")
+
+    ef_rec = sql_select(sql="SELECT hosted_file_id FROM event_file WHERE id = :id", data={'id': ef_int_id})
+    if not ef_rec or not ef_rec.get('hosted_file_id'):
+        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Event file has no associated hosted file.")
+
+    hosted_file_id_random = get_id_random(record_id=ef_rec['hosted_file_id'], table_name='hosted_file')
+    if not hosted_file_id_random:
+        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Hosted file record not found.")
+
    from app.routers.api_v3_actions_hosted_file import download_file_action
    return await download_file_action(
        response=response,
-        hosted_file_id=event_file_id, # The universal downloader now resolves this!
+        hosted_file_id=hosted_file_id_random,
        filename=filename,
+        key=key,
        site_key=site_key,
        range=range,
        account=account,
--- a/app/routers/api_v3_actions_hosted_file.py
+++ b/app/routers/api_v3_actions_hosted_file.py
@@ -227,28 +227,11 @@ async def download_file_action(
    if not is_authorized:
        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail="Authentication required or invalid access key.")

-    # 2. Resolve File Record
-    # ID Vision: Attempt to resolve the ID.
-    # 🛑 REMINDER: If adding a new specialized 'container' object (like event_person_profile),
-    # ensure the lookup logic is mirrored here to allow direct downloads via container ID.
-    # If not found in hosted_file, check if it's an event_file or archive_content ID that we can resolve.
+    # 2. Resolve File Record — strict: only accepts hosted_file IDs.
+    # Container objects (event_file, archive_content) must resolve to a hosted_file_id
+    # in their own dedicated download endpoints before calling this function.
    resolved_id = redis_lookup_id_random(record_id_random=hosted_file_id, table_name='hosted_file')

-    if not resolved_id:
-        log.info(f"ID {hosted_file_id} not found in hosted_file. Checking container tables...")
-        # A. Check event_file
-        if ef_id := redis_lookup_id_random(record_id_random=hosted_file_id, table_name='event_file'):
-            if ef_rec := sql_select(sql="SELECT hosted_file_id FROM event_file WHERE id = :id", data={'id': ef_id}):
-                resolved_id = ef_rec.get('hosted_file_id')
-                log.info(f"Resolved event_file {hosted_file_id} to hosted_file {resolved_id}")
-
-        # B. Check archive_content
-        if not resolved_id:
-            if ac_id := redis_lookup_id_random(record_id_random=hosted_file_id, table_name='archive_content'):
-                if ac_rec := sql_select(sql="SELECT hosted_file_id FROM archive_content WHERE id = :id", data={'id': ac_id}):
-                    resolved_id = ac_rec.get('hosted_file_id')
-                    log.info(f"Resolved archive_content {hosted_file_id} to hosted_file {resolved_id}")
-
    if not resolved_id:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Hosted file record not found.")

--- a/documentation/GUIDE__AE_API_V3_for_Frontend.md
+++ b/documentation/GUIDE__AE_API_V3_for_Frontend.md
@@ -286,7 +286,7 @@ When seeding new lookup data (e.g., adding timezones in bulk):

 ## 5. Event File Data Retrieval (Hosted Files)

-Every Event File (`event_file`) **must** have a linked Hosted File (`hosted_file`). The Hosted File itself is a metadata record for binary content (files), which is accessed via separate Action endpoints (e.g., `/v3/action/hosted_file/download`). This API endpoint provides metadata about the associated hosted file. To retrieve this additional metadata:
+Every Event File (`event_file`) **must** have a linked Hosted File (`hosted_file`). The Hosted File is a metadata record for binary content, accessed via dedicated Action endpoints. To download an event file use `/v3/action/event_file/{event_file_id}/download` — not the hosted_file endpoint directly (each endpoint only accepts its own ID type). To retrieve hosted file metadata alongside an event file record:

 *   **Endpoint:** `GET /v3/crud/event_file/{event_file_id}`
 *   **Query Parameter:** Add `inc_hosted_file=true`
@@ -308,7 +308,8 @@ Event files can be downloaded without standard auth headers using one of two byp
 - **Method:** `GET`
 - **Path:** `/v3/action/event_file/{event_file_id}/download`

-The endpoint also accepts `hosted_file_id` or `archive_content_id` directly — it resolves the chain automatically.
+> [!WARNING]
+> **Breaking change (2026-06-10):** This endpoint now requires an `event_file_id`. Previously it accepted `hosted_file_id` or `archive_content_id` and resolved the chain automatically — that cross-resolution has been removed. Pass the correct ID type for the endpoint you are calling. If you were routing downloads through `/v3/action/hosted_file/{hosted_file_id}/download` as a workaround, switch to this endpoint using `event_file_id`. *(Remove this note after ~2026-06-24.)*

 #### Auth bypass options

--- a/documentation/TODO__Agents.md
+++ b/documentation/TODO__Agents.md
@@ -23,6 +23,8 @@
 - [x] **[P4] Expose `pool_size` / `max_overflow` as env vars** — `create_ae_engine()` calls `settings.DB.get('pool_size', 10)` but `settings.DB` property doesn't include those keys, so they're always hardcoded 10/20. Add `AE_DB_POOL_SIZE` / `AE_DB_POOL_MAX_OVERFLOW` to `config.py`.

 ## 📋 Feature Tasks
+- [ ] **[P1] Download endpoint account ownership check (June 2026):** `download_file_action` (hosted_file) has no per-record account isolation — any authenticated caller can download any file if they know the ID. After passing the auth gate (`account.auth_method != 'guest'`), the endpoint loads and serves the file with no check that `hosted_file.account_id` matches `account.account_id`. Fix: after loading `hosted_file_obj`, compare `hosted_file_obj.account_id` to `account.account_id`; reject with 403 unless `account.super` or `auth_method == 'bypass'`. Apply the same check in `download_event_file_action` after the `event_file` record is resolved (the event_file's `account_id` is the authoritative ownership gate for that path). Note: `?key=` and `?site_key=` bypass paths are intentionally loose and should remain unchanged for external link support.
+- [ ] **[P2] `archive_content` download endpoint (June 2026):** No dedicated download endpoint exists for `archive_content`. Previously, callers could pass an `archive_content_id` to the `hosted_file` download endpoint and cross-resolution would handle it — this was removed as part of the Option B strict-ID-typing change. Needs a new `GET /v3/action/archive_content/{archive_content_id}/download` endpoint that resolves `archive_content_id → hosted_file_id` explicitly (same pattern as `download_event_file_action`).
 - [x] **`order_by_li` view-join ambiguity fix (June 2026):** Using view-only join columns (e.g. `event_presenter_family_name` from `v_event_file`) in `order_by_li` caused MariaDB error "Unknown column 'account_id' in WHERE" (HTTP 400). Root cause: `filter_order_by` validated columns against the view — which passes for join-derived fields — and `sql_and_qry_part` generates an unqualified `account_id =` clause that becomes ambiguous when MariaDB expands the view's JOIN inline. Fix: `filter_order_by` now accepts `raw_table_name` and validates ORDER BY columns against the physical table only. Join-only view columns are silently stripped. Updated all three call sites in `api_crud_v3.py` (×2) and `api_crud_v3_nested.py` (×2). **Follow-up (lower priority):** qualify `account_id` in `sql_and_qry_part` to fix the root ambiguity for any future JOIN-capable views.
 - [x] **Core Isolation:** Harden `apply_forced_account_filter` to Fail-Closed.
 - [x] **IDAA Baseline:** Remove `public_read` from Event, CMS, and Archive objects.