10 Commits

Author SHA1 Message Date
Scott Idem
b3029a4d27 docs: update TODO and add BOOTSTRAP mistake #13 for API retry regression
TODO__Agents.md:
- Added the two additional fixes from the review pass to the PATCH/DELETE
  retry hardening entry: default timeout 60s→20s, and DELETE missing
  ae_auth_error banner on 401/403.

BOOTSTRAP__AI_Agent_Quickstart.md:
- Added mistake #13: breaking the API retry loop by returning errors from
  the TypeError/AbortError block instead of throwing them. Documents the
  Jan 2026 regression (commit a10accfaa), the three retry classes that must
  be preserved, and a quick verification method.
- Filled the gap at item #7 (was missing, causing off-by-one numbering
  from item 8 onward). Items renumbered 8-14 → 7-13.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-21 18:21:01 -04:00
Scott Idem
ea765d8ad2 fix(api): lower patch/delete timeout to 20s and add delete auth error banner
Two gaps found during review of the recent retry-hardening commits:

1. api_patch_object.ts and api_delete_object.ts still defaulted to 60s
   timeout while GET/POST were lowered to 20s. No callers set an explicit
   timeout, so the default was the only value used. With retry_count=5 and
   the new backoff policy, 60s per attempt = 5+ minutes worst-case wait.
   Lowered to 20s to match GET/POST and keep worst-case under ~2 minutes.

2. api_delete_object.ts had no ae_auth_error import and no session-expired
   banner on 401/403. A stale-session DELETE would silently return false
   with no user feedback. Added browser + ae_auth_error imports and the
   ae_auth_error.set() call matching the pattern in GET/POST/PATCH.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-21 18:11:32 -04:00
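The worst-case figures quoted in item 1 of this commit message can be checked with a little arithmetic. The sketch below assumes every attempt runs to its full timeout and that the backoff formula from the diff (`Math.min(2000 * attempt, 8000)`) is applied after each failed attempt except the last. `worstCaseWaitMs` is a hypothetical helper written for this note, not a function in the repository.

```typescript
// Hypothetical helper (not part of the repository): estimates the longest time a
// caller could wait when every attempt runs to its full timeout.
// Assumes the backoff formula from the diff: Math.min(2000 * attempt, 8000),
// applied after every failed attempt except the last one.
function worstCaseWaitMs(timeoutMs: number, retryCount: number): number {
	let total = 0;
	for (let attempt = 1; attempt <= retryCount; attempt++) {
		total += timeoutMs; // the attempt itself times out
		if (attempt < retryCount) {
			total += Math.min(2000 * attempt, 8000); // backoff before the next attempt
		}
	}
	return total;
}

const oldWorstCase = worstCaseWaitMs(60_000, 5); // 300_000 + 20_000 = 320_000 ms
const newWorstCase = worstCaseWaitMs(20_000, 5); // 100_000 + 20_000 = 120_000 ms
console.log({ oldWorstCase, newWorstCase });
```

Under these assumptions the old default works out to 5 min 20 s and the new one to about 2 minutes, which is in line with the numbers in the message.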
Scott Idem
db5acdd30a docs: align API retry hardening status with implemented helpers 2026-05-21 18:04:06 -04:00
Scott Idem
a000e07647 api: harden delete retry classification and backoff 2026-05-21 17:58:59 -04:00
Scott Idem
7f9368589a api: harden patch retry classification and backoff 2026-05-21 17:53:30 -04:00
Scott Idem
55d3d49595 test: add v3 latency probe and modernize api coverage 2026-05-21 17:48:00 -04:00
Scott Idem
f5cf1ef398 api: separate timeout abort retries from intentional aborts 2026-05-21 15:46:30 -04:00
Scott Idem
d5d552a029 Badge layout fix for Axonius 2026-05-21 15:19:48 -04:00
Scott Idem
689bb326cb fix(api): restore network-error retry and add backoff in get/post_object
The Jan 2026 "offline-first fast-paths" commit (a10accfaa) inadvertently
broke retries for transient network failures (ERR_NETWORK_CHANGED, WiFi
roam events, etc.). The original code's .catch() returned undefined, which
fell through to the `if (!response) throw` path and correctly entered the
retry loop. After a10accfaa, .catch() returned the error as a value, and
the subsequent `instanceof Error` check returned false immediately —
bypassing all retries for the most common failure mode in
hotel/conference environments.

Changes:
- TypeError now throws into the retry loop instead of returning false
- AbortError still returns false immediately (intentional cancel, no retry)
- Per-attempt AbortController: moved inside the loop in both files so each
  retry gets its own independent timeout (previously GET retries had no
  timeout at all after the first attempt's clearTimeout ran)
- clearTimeout() added to catch block so timer is always cancelled on error
- Linear backoff added: 2s→4s→6s→8s (capped at 8s) between attempts;
  rapid retries on a flaky network accomplish nothing without a delay
- Default timeout lowered: 90s → 20s (generous for search/GET but avoids
  the 90s worst-case hang that amplified ERR_NETWORK_CHANGED exposure)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-21 13:44:12 -04:00
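The mechanism behind the regression in commit 689bb326cb can be modeled in a few lines. This is a simplified sketch, not the helpers' real code: `runAttempt` and `AttemptOutcome` are illustrative names, and a thrown error stands in for "enters the retry loop" while `return false` stands in for "gives up".

```typescript
// Simplified model of one attempt. The names are illustrative; the real helpers
// also handle headers, JSON parsing, and logging.
type AttemptOutcome = 'retry' | 'give_up' | 'success';

// Maps "throw" to 'retry' and "return false" to 'give_up', mirroring how the
// retry loop's catch block treats a thrown error.
function runAttempt(classify: () => boolean): AttemptOutcome {
	try {
		return classify() ? 'success' : 'give_up';
	} catch {
		return 'retry';
	}
}

const networkError = new TypeError('Failed to fetch');

// Original behavior: .catch() returned undefined, so `if (!response) throw` fired.
const original = runAttempt(() => {
	const response: unknown = undefined;
	if (!response) throw new Error('No response');
	return true;
});

// After a10accfaa: .catch() returned the error and the check returned false.
const regressed = runAttempt(() => {
	const response: unknown = networkError;
	if (response instanceof Error) return false;
	return true;
});

// After the fix: the returned TypeError is rethrown into the retry loop.
const fixed = runAttempt(() => {
	const response: unknown = networkError;
	if (response instanceof Error) throw new Error(`Network error: ${response.message}`);
	return true;
});

console.log({ original, regressed, fixed });
// { original: 'retry', regressed: 'give_up', fixed: 'retry' }
```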
Scott Idem
e6db2b4d6a fix(idaa): add Clear Cache & Reload escape hatch to recovery meetings server error state
"Try Again" resets auto_retry_count but reuses the same localStorage state — if
ae_loc or ae_idaa_loc holds a stale account_id or api_secret_key, every retry
fails identically and the user is stuck in an infinite error loop.

New button clears ae_loc + ae_idaa_loc from localStorage and db_events.event
from IDB, then reloads via the sessionStorage-preserved UUID URL (same logic as
the IDAA layout's Clear Cache & Reload). Forces a fresh FQDN handshake and
re-derives correct auth state. Guidance text shown so users know to try it when
Try Again keeps failing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-21 12:30:53 -04:00
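The escape hatch described in commit e6db2b4d6a can be sketched with injected storage objects so that it runs outside a browser. The interface and function names below are hypothetical; only the storage keys (`ae_loc`, `ae_idaa_loc`, `idaa_iframe_reload_url`) come from the commit message and the diff. The IndexedDB clear (`db_events.event`) is left out of this sketch.

```typescript
// Minimal storage shape so the sketch does not depend on browser globals.
interface KeyValueStore {
	getItem(key: string): string | null;
	removeItem(key: string): void;
}

// Clears the stale auth state and returns the URL to navigate to, or null when a
// plain reload of the current page is the right fallback.
function clearStaleStateAndPickTarget(
	local: KeyValueStore,
	session: KeyValueStore
): string | null {
	local.removeItem('ae_loc');
	local.removeItem('ae_idaa_loc');
	try {
		// An empty string is treated like a missing value.
		return session.getItem('idaa_iframe_reload_url') || null;
	} catch {
		return null; // storage access can throw in restricted iframes
	}
}
```

In a browser the caller would pass `localStorage` and `sessionStorage`, then set `location.href` to the returned URL or call `location.reload()` when the result is `null`.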
12 changed files with 730 additions and 102 deletions

View File

@@ -300,14 +300,14 @@ These are real incidents — know them before you start.
6. **Deleting files with `rm`** — always move to `~/tmp/agents_trash`. A deleted file may
contain context that's not recoverable from git if it was gitignored.
8. **Dexie `.get()` with a string object ID returns `undefined`** — Dexie `.get(value)`
7. **Dexie `.get()` with a string object ID returns `undefined`** — Dexie `.get(value)`
looks up by the table's **primary key**, which is `id` (the first schema field). The V3
API never returns `id`, so it is always `undefined` in stored records. Passing a string
object ID (e.g. `person_id`) to `.get()` will silently return nothing. Always use
`.where('person_id').equals(person_id).first()` instead. This has caused liveQuery
blocks to always produce `undefined` even when the record exists in Dexie.
9. **Treating `$effect` blocks as auth bypass risks** — a `$effect` inside a child
8. **Treating `$effect` blocks as auth bypass risks** — a `$effect` inside a child
component cannot bypass a parent `+layout.svelte` auth gate. Children only mount if
the parent calls `{@render children?.()}`. Adding redundant auth guards to `$effect`
blocks that can only run after the parent gate already passed is unnecessary — and
@@ -317,13 +317,13 @@ These are real incidents — know them before you start.
clean of data loads in private modules. See `GUIDE__SvelteKit2_Svelte5_DexieJS.md`
"SvelteKit Layout Hierarchy: Security and Execution Order" for the full explanation.
10. **Using query `key` as a proxy for bypass stripped `x-account-id`** — this caused
9. **Using query `key` as a proxy for bypass stripped `x-account-id`** — this caused
valid account-scoped requests to lose account context and 403. `key` can be a valid
endpoint/business param, but it is not equivalent to `x-no-account-id: bypass`. Keep
`x-no-account-id` usage narrow and temporary; do not expand it without a documented
allowlist case.
11. **Pre-stringifying `*_json` fields before passing to API wrappers** — the API wrappers
10. **Pre-stringifying `*_json` fields before passing to API wrappers** — the API wrappers
(`api_post__crud_obj.ts` for V3, `api.ts` for legacy CRUD) automatically serialize any
field ending in `_json` (e.g. `cfg_json`, `data_json`). Pass these as plain JS objects.
Pre-stringifying with `JSON.stringify()` before calling the wrapper will double-encode
@@ -331,12 +331,12 @@ These are real incidents — know them before you start.
redundant on the V3 path. Both paths now pretty-print with 2-space indent.
See `GUIDE__AE_API_V3_for_Frontend.md` → section 3C for the full explanation.
12. **Broad Dexie result windows get silently clipped** — if a broad "All" view shows fewer
11. **Broad Dexie result windows get silently clipped** — if a broad "All" view shows fewer
rows than a narrower filter, check for a page-level limit or an API revalidation step
replacing the local IDB result set. For empty text searches, the full local result set
should drive the display; server refreshes should update cache, not shrink visibility.
13. **Not bumping `IDB_CONTENT_VERSIONS` when changing `properties_to_save`** — this caused
12. **Not bumping `IDB_CONTENT_VERSIONS` when changing `properties_to_save`** — this caused
the IDAA Recovery Meetings "no meetings found" bug for approximately one year (2025–2026).
**What happened:** A deploy changed `properties_to_save` in `ae_events__event.ts`, but no
@@ -368,6 +368,35 @@ These are real incidents — know them before you start.
0 results in your templates. Silent failures look like data problems and are extremely
difficult to diagnose.
13. **Breaking the API retry loop by returning errors instead of throwing them** — all four
`api_*_object.ts` files (`api_get_object.ts`, `api_post_object.ts`, `api_patch_object.ts`,
`api_delete_object.ts`) use a `.catch()` that returns the error as a value, followed by a
classification block. That block **must throw** for transient network failures (`TypeError`)
so they enter the retry loop. If you change it to `return false`, retries are silently
bypassed for the most common failure mode in hotel/conference WiFi — and nothing warns you.
**What happened (commit a10accfaa, Jan 2026):** A "silence background fetch noise" commit
changed `.catch()` to explicitly `return error`, then the classification block was changed
from a `throw` to `return false`. `TypeError` from `ERR_NETWORK_CHANGED` — the most common
failure on crowded WiFi — stopped retrying. The `retry_count = 5` parameter became dead
code for network errors. Went undetected for ~4 months.
**The retry classification these files must honor:**
- `TypeError` (ERR_NETWORK_CHANGED, WiFi blip) → **`throw`** → enters retry loop with backoff
- `AbortError` where `did_timeout_abort = true` (helper's own timer) → **`throw`** → retries
- `AbortError` where `did_timeout_abort = false` (navigation/unmount abort) → `return false`
- HTTP 400/401/403/422 → `return false` immediately (client errors are deterministic)
- HTTP 5xx → **`throw`** → retries with backoff
**How to verify after any change to the error block:** confirm that a `TypeError` still
produces up to 5 attempts (4 retries) with 2s→4s→6s→8s delays before returning false. A single
`return false` after the first network failure means the retry loop is broken.
**Also:** when reviewing these files, check that all four have:
- `ae_auth_error.set()` triggered on 401/403 (shows session-expired banner to the user)
- `timeout = 20000` default (was 60s in PATCH/DELETE until 2026-05-21 — 5-min worst case)
- `did_timeout_abort` flag per attempt (separates helper timeouts from caller aborts)
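The retry classification listed in mistake #13 can be written down as a small pure function, which may be easier to review than the inline branches. This is an illustrative sketch: `FailureInfo`, `Decision`, and `classifyFailure` are hypothetical names, and the handling of statuses the list does not mention (for example 404) is an assumption.

```typescript
// Illustrative classifier for the rules in the list above. These names are
// hypothetical and are not exports of the api_*_object.ts files.
type FailureInfo =
	| { kind: 'error'; errorName: string; didTimeoutAbort: boolean }
	| { kind: 'http'; httpStatus: number };

type Decision = 'retry' | 'fail_fast';

const FAIL_FAST_STATUSES = new Set([400, 401, 403, 422]);

function classifyFailure(info: FailureInfo): Decision {
	if (info.kind === 'error') {
		if (info.errorName === 'AbortError') {
			// Only the helper's own timer makes an abort retryable.
			return info.didTimeoutAbort ? 'retry' : 'fail_fast';
		}
		// TypeError (network change, WiFi blip) is treated as transient.
		return 'retry';
	}
	if (FAIL_FAST_STATUSES.has(info.httpStatus)) return 'fail_fast';
	if (info.httpStatus >= 500) return 'retry';
	// Assumption: statuses not covered by the list fail fast in this sketch.
	return 'fail_fast';
}

console.log(classifyFailure({ kind: 'error', errorName: 'TypeError', didTimeoutAbort: false })); // retry
console.log(classifyFailure({ kind: 'http', httpStatus: 403 })); // fail_fast
```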
---
## 8. Source Layout (Quick Reference)

View File

@@ -156,6 +156,108 @@ below. The TTL + `verify_in_flight` guards are the current mitigation.
---
### [API] GET/POST retry hardening — differentiate timeout aborts vs intentional aborts
**Status:** ✅ Completed (2026-05-21)
Recent API helper fixes restored retry/backoff for transient network `TypeError` failures.
Timeout-triggered aborts are now handled separately from intentional/user aborts so the
retry loop behavior is correct.
**Decision (for now):** Keep the global default timeout at **20s**.
**Implemented:**
- GET/POST now explicitly distinguish abort class in helper code:
- **Intentional abort** (navigation/unmount/caller cancel): fail fast, no retry
- **Timeout abort** (helper timer): retryable via existing retry loop
- Timeout classification added with per-attempt timeout flag (not `AbortError` name-only logic).
- Backoff behavior retained for retryable failures (`2s -> 4s -> 6s -> 8s`, cap 8s).
- Existing fail-fast class retained for 400/401/403/422, with auth-expired store signaling on 401/403.
- Validation done:
- `npx svelte-check` clean
- API Playwright tests updated/fixed and passing (`v3_api_security.modern`, `v3_api_nested_crud`)
**Timeout policy improvement (class-based):**
- Keep **20s default** as baseline.
- Add request classes with explicit timeout selection at callsites/wrappers (not random per-page values):
- fast CRUD/read/search: ~20s baseline
- medium actions: higher bounded timeout
- heavy actions (uploads, exports, ffmpeg/video clip): explicit long timeout already required
- Centralize the class mapping so timeout intent is clear and audit-friendly.
**Primary files:**
- `src/lib/ae_api/api_get_object.ts`
- `src/lib/ae_api/api_post_object.ts`
- Wrapper callsites in `src/lib/ae_api/` and legacy bridge points in `src/lib/api/api.ts`
**Acceptance criteria:**
- Timeout-aborted requests retry according to retry_count/backoff policy.
- User/navigation aborts still fail fast with no retry.
- No regression on 400/401/403/422 fail-fast handling.
- Existing long-running flows that already set explicit timeouts (uploads/video tools/exports)
continue to function without behavior regressions.
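The class-based timeout policy proposed in this entry could be centralized along the following lines. This is a sketch under stated assumptions: only the 20s baseline comes from the entry, the `medium` and `heavy` values are placeholders, and `TimeoutClass`, `TIMEOUT_MS_BY_CLASS`, and `resolveTimeoutMs` are hypothetical names.

```typescript
// Hypothetical centralized mapping. Only the 20s baseline comes from the TODO
// entry; the medium and heavy values are placeholders chosen for illustration.
type TimeoutClass = 'fast' | 'medium' | 'heavy';

const TIMEOUT_MS_BY_CLASS: Record<TimeoutClass, number> = {
	fast: 20_000, // CRUD/read/search baseline
	medium: 45_000, // assumed value for a "higher bounded timeout"
	heavy: 300_000 // assumed value for uploads, exports, video clips
};

// An explicit per-call override wins; otherwise the class decides.
function resolveTimeoutMs(timeoutClass: TimeoutClass = 'fast', overrideMs?: number): number {
	if (overrideMs !== undefined && overrideMs > 0) return overrideMs;
	return TIMEOUT_MS_BY_CLASS[timeoutClass];
}

console.log(resolveTimeoutMs()); // 20000
console.log(resolveTimeoutMs('heavy', 600_000)); // 600000
```

Keeping the mapping in one place means a reviewer can audit timeout intent without searching individual callsites.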
---
### [API] PATCH/DELETE retry hardening — parity with GET/POST
**Status:** ✅ Completed (2026-05-21)
PATCH and DELETE now implement the same retry-classification model used in GET/POST,
including timeout abort separation and capped retry backoff.
**Implemented:**
- PATCH:
- Per-attempt timeout controller with explicit timeout-abort flag.
- Retries timeout/network transient failures only.
- Intentional caller aborts fail fast (no retry).
- Fail-fast retained for 400/401/403/422.
- Backoff capped at `2s -> 4s -> 6s -> 8s`.
- DELETE:
- Same timeout-vs-intentional abort separation.
- Same retry class for timeout/network transient failures.
- Same caller-abort fail-fast behavior.
- Explicit fail-fast for 400/401/403/422.
- Backoff capped at `2s -> 4s -> 6s -> 8s`.
**Mutation safety note:**
- PATCH/DELETE can have ambiguous commit state on timeout. Current policy is conservative:
retries target obvious transient failure class (timeout/network), while caller aborts remain
fail-fast to avoid duplicate side effects during navigation/unmount flows.
**Primary files:**
- `src/lib/ae_api/api_patch_object.ts`
- `src/lib/ae_api/api_delete_object.ts`
**Acceptance criteria:**
- ✅ PATCH and DELETE timeout-aborts retry under capped backoff.
- ✅ Caller/navigation aborts do not retry.
- ✅ No regression for 400/401/403/422 fail-fast behavior.
- ✅ `npx svelte-check` clean, API-focused Playwright tests remained green during rollout.
**Additional fixes found during review pass (2026-05-21, commit ea765d8ad):**
- PATCH + DELETE: default timeout lowered from 60s → 20s to match GET/POST. No callers set
explicit timeouts; 60s × 5 retries = 5-minute worst case before giving up.
- DELETE: added `ae_auth_error` import and session-expired banner on 401/403. All other
files (GET/POST/PATCH) trigger the banner; DELETE was missing it, causing stale-session
deletes to silently return false with no user-visible feedback.
---
### [Testing] V3 API performance probe (basic stress rounds)
**Status:** ✅ Completed baseline harness (2026-05-21)
Implemented a gated Playwright probe for quick repeated list-query timing against live V3 endpoints.
**Files:**
- `tests/v3_api_latency_probe.test.ts`
- `tests/README.md` (run/tuning docs)
**Current capabilities:**
- Measures rounds for event sessions, journal entries, and user lists.
- Writes per-run JSON + Markdown reports to `tests/results/`.
- Optional anomaly thresholds for error-rate / p95 / empty-row detection.
---
### [Launcher/VLC] Linux playback — fullscreen + pause-on-end not working
**Status:** Mac ✅ working perfectly; Linux 🚧 deferred for later investigation
**Date discovered:** 2026-05-20

View File

@@ -1,3 +1,5 @@
import { browser } from '$app/environment';
import { ae_auth_error } from '$lib/stores/ae_stores';
import type { key_val } from '$lib/stores/ae_stores';
/**
@@ -11,7 +13,7 @@ export const delete_object = async function delete_object({
headers = {},
params = {},
data = {},
-timeout = 60000,
+timeout = 20000,
return_meta = false,
log_lvl = 0,
retry_count = 5
@@ -97,9 +99,15 @@ export const delete_object = async function delete_object({
}
for (let attempt = 1; attempt <= retry_count; attempt++) {
+// Keep timeout handle at attempt scope so catch can always clear it.
+let timeoutId: ReturnType<typeof setTimeout> | null = null;
try {
const controller = new AbortController();
-const timeoutId = setTimeout(() => {
+// AbortError alone is ambiguous. Track helper-timeout aborts so
+// caller/navigation aborts can still fail fast with no retry.
+let did_timeout_abort = false;
+timeoutId = setTimeout(() => {
+did_timeout_abort = true;
console.error(
`API DELETE request timed out after ${timeout}ms.`
);
@@ -120,12 +128,48 @@ export const delete_object = async function delete_object({
url.toString(),
fetchOptions
).catch(function (error: any) {
if (
error?.name === 'AbortError' ||
error?.name === 'TypeError' ||
error?.message?.includes('aborted')
) {
if (log_lvl > 1) {
console.log(
'API DELETE: Request aborted or browser-terminated.',
error
);
}
return error;
}
console.log(
'API DELETE Object *fetch* request was aborted or failed in an unexpected way.',
error
);
return error;
});
-clearTimeout(timeoutId);
+if (timeoutId) clearTimeout(timeoutId);
// Error object was returned from fetch catch block; decide retry class.
if (
response instanceof Error ||
(response &&
(response.name === 'AbortError' ||
response.name === 'TypeError'))
) {
if (response.name === 'AbortError') {
if (did_timeout_abort) {
throw new Error(
`Timeout abort (attempt ${attempt}/${retry_count}) after ${timeout}ms`
);
}
return false;
}
throw new Error(
`Network error (attempt ${attempt}): ${response.message}`
);
}
if (!response) {
throw new Error(
@@ -151,7 +195,24 @@ export const delete_object = async function delete_object({
errorBody
);
-if (response.status >= 400 && response.status < 404) {
+// Fail fast on client/auth/validation failures.
+if (
+response.status === 400 ||
+response.status === 401 ||
+response.status === 403 ||
+response.status === 422
+) {
+if (response.status === 401 || response.status === 403) {
+console.warn(
+`AUTH DIAGNOSTICS (DELETE): Headers sent for ${endpoint}:`,
+{
+has_api_key: !!headers_cleaned['x-aether-api-key'],
+has_account_id: !!headers_cleaned['x-account-id']
+}
+);
+// Signal the root layout to show the session-expired banner.
+if (browser) ae_auth_error.set({ type: 'expired', ts: Date.now() });
+}
return false;
}
@@ -174,6 +235,8 @@ export const delete_object = async function delete_object({
? json.data
: json;
} catch (error) {
// Ensure per-attempt timeout is always cleared on failure.
if (timeoutId) clearTimeout(timeoutId);
console.error(`API DELETE error on attempt ${attempt}:`, error);
if (attempt === retry_count) {
@@ -181,9 +244,12 @@ export const delete_object = async function delete_object({
return false;
}
-if (log_lvl) {
-console.log(`Retrying... (${attempt}/${retry_count})`);
-}
+// Backoff before retrying. Caps at 8s to match GET/POST/PATCH policy.
+const delay_ms = Math.min(2000 * attempt, 8000);
+console.log(
+`API DELETE: Retrying in ${delay_ms}ms... (attempt ${attempt}/${retry_count})`
+);
+await new Promise<void>((resolve) => setTimeout(resolve, delay_ms));
}
}
};

View File

@@ -14,7 +14,7 @@ export const get_object = async function get_object({
headers = {},
params = {},
data = {},
-timeout = 90000,
+timeout = 20000,
return_meta = false,
return_blob = false,
filename = '',
@@ -73,9 +73,6 @@ export const get_object = async function get_object({
url.searchParams.append(key, params[key])
);
-const controller = new AbortController();
-const timeoutId = setTimeout(() => controller.abort(), timeout);
// Clean and merge headers without mutating the original api_cfg
const headers_cleaned: key_val = {};
const merged_headers = { ...api_cfg['headers'], ...headers };
@@ -169,10 +166,11 @@ export const get_object = async function get_object({
console.log('Final cleaned headers:', headers_cleaned);
}
+// signal is injected per-attempt inside the retry loop so each retry gets
+// a fresh AbortController with its own independent timeout.
const fetchOptions: RequestInit = {
method: 'GET',
headers: headers_cleaned,
-signal: controller.signal,
// Be explicit about CORS behavior and redirect handling to avoid
// environment-dependent defaults that can cause opaque failures.
mode: 'cors',
@@ -203,10 +201,24 @@ export const get_object = async function get_object({
return false;
}
// Fresh AbortController per attempt — ensures each retry has its own
// independent timeout. Sharing a single controller across retries leaves
// retries unprotected once the first attempt's clearTimeout() runs.
const controller = new AbortController();
// Track whether THIS helper's timeout fired. AbortError alone is ambiguous:
// it can mean timeout OR intentional caller abort (navigation/unmount).
// We only retry timeout-aborts; intentional aborts should fail fast.
let did_timeout_abort = false;
const timeoutId = setTimeout(() => {
did_timeout_abort = true;
console.warn(`API GET: Request timed out after ${timeout}ms (attempt ${attempt}/${retry_count}).`);
controller.abort();
}, timeout);
try {
const response = await fetch_method(
url.toString(),
-fetchOptions
+{ ...fetchOptions, signal: controller.signal }
).catch(function (error: any) {
// SILENCE NOISE: Aborted requests (common in SWR/Background loads) shouldn't spam logs
if (
@@ -231,21 +243,36 @@ export const get_object = async function get_object({
});
clearTimeout(timeoutId);
-// Check if we should stop due to abort or network failure
+// Check if we should stop due to abort or network failure.
if (
response instanceof Error ||
(response &&
(response.name === 'TypeError' ||
response.name === 'AbortError'))
) {
-// If it was an explicit abort, definitely stop
-if (response.name === 'AbortError') return false;
+// AbortError can be either timeout or intentional abort.
+// Retry only helper-owned timeout aborts; fail fast on caller abort.
+if (response.name === 'AbortError') {
+if (did_timeout_abort) {
+throw new Error(
+`Timeout abort (attempt ${attempt}/${retry_count}) after ${timeout}ms`
+);
+}
+return false;
+}
-if (log_lvl > 1)
-console.log(
-'API GET Object: Detected NetworkError or TypeError. Failing fast.'
-);
-return false;
+// TypeError = transient network failure (ERR_NETWORK_CHANGED,
+// ERR_NETWORK_IO_SUSPENDED, hotel/conference WiFi blip, etc.).
+// IMPORTANT: throw here so the retry loop's catch block handles it with
+// backoff. Returning false would bypass retries entirely.
+//
+// WHY THIS WAS BROKEN: The Jan 2026 "offline-first fast-paths" commit
+// (a10accfaa) changed .catch() to return the error as a value instead of
+// not returning (undefined). The undefined path fell through to the
+// `if (!response)` throw which DID retry. The explicit `return error` +
+// this `return false` block silently killed the retry for the most common
+// failure mode on conference/hotel WiFi.
+throw new Error(`Network error (attempt ${attempt}): ${response.message}`);
}
if (!response) {
@@ -438,6 +465,8 @@ export const get_object = async function get_object({
}
}
} catch (error) {
// Ensure the per-attempt timeout timer is always cancelled on failure.
clearTimeout(timeoutId);
console.log(
`API GET object request *fetch* error on attempt ${attempt}:`,
error
@@ -448,10 +477,13 @@ export const get_object = async function get_object({
return false;
}
-// Log retry information
-if (log_lvl) {
-console.log(`Retrying... (${attempt}/${retry_count})`);
-}
+// Backoff before retrying. Without a delay, rapid retries on a flaky
+// connection accomplish nothing and add noise. Caps at 8s so later
+// attempts don't wait excessively. Gives the network time to recover
+// (ERR_NETWORK_CHANGED is typically a sub-second WiFi roam event).
+const delay_ms = Math.min(2000 * attempt, 8000);
+console.log(`API GET: Retrying in ${delay_ms}ms... (attempt ${attempt}/${retry_count})`);
+await new Promise<void>((resolve) => setTimeout(resolve, delay_ms));
}
}
};
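The per-attempt controller, timer, and `did_timeout_abort` flag that this diff adds inline can be pictured as one small unit. The factory below is a sketch for illustration only: `createAttemptTimeout` and `AttemptTimeout` are hypothetical names, and the helpers in the diff do not use such a factory.

```typescript
// Illustrative factory (hypothetical name). It packages the controller, timer,
// and flag that the helpers create inline at the top of each attempt.
type AttemptTimeout = {
	controller: AbortController;
	didTimeoutAbort: () => boolean;
	clear: () => void;
};

function createAttemptTimeout(timeoutMs: number): AttemptTimeout {
	const controller = new AbortController();
	let timedOut = false;
	const timerId: ReturnType<typeof setTimeout> = setTimeout(() => {
		timedOut = true; // set before abort() so the flag is visible when AbortError surfaces
		controller.abort();
	}, timeoutMs);
	return {
		controller,
		didTimeoutAbort: () => timedOut,
		clear: () => clearTimeout(timerId)
	};
}

// A caller-initiated abort leaves the flag false, so it can fail fast without retry.
const attemptTimeout = createAttemptTimeout(20_000);
attemptTimeout.controller.abort();
console.log(attemptTimeout.controller.signal.aborted, attemptTimeout.didTimeoutAbort()); // true false
attemptTimeout.clear(); // always cancel the timer once the attempt settles
```

Creating a fresh unit per attempt keeps one attempt's `clearTimeout` from leaving later attempts without a timer.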

View File

@@ -13,7 +13,7 @@ export const patch_object = async function patch_object({
headers = {},
params = {},
data = {},
-timeout = 60000,
+timeout = 20000,
return_meta = false,
log_lvl = 0,
retry_count = 5
@@ -153,9 +153,15 @@ export const patch_object = async function patch_object({
}
for (let attempt = 1; attempt <= retry_count; attempt++) {
+// Keep timeout handle at attempt scope so catch can always clear it.
+let timeoutId: ReturnType<typeof setTimeout> | null = null;
try {
const controller = new AbortController();
-const timeoutId = setTimeout(() => {
+// AbortError alone is ambiguous. Track whether the helper timeout
+// fired so we can retry timeout-aborts but fail fast on caller abort.
+let did_timeout_abort = false;
+timeoutId = setTimeout(() => {
+did_timeout_abort = true;
console.error(
`API PATCH request timed out after ${timeout}ms.`
);
@@ -173,12 +179,52 @@ export const patch_object = async function patch_object({
url.toString(),
fetchOptions
).catch(function (error: any) {
// Keep noisy abort/network conditions out of high-level logs.
if (
error?.name === 'AbortError' ||
error?.name === 'TypeError' ||
error?.message?.includes('aborted')
) {
if (log_lvl > 1) {
console.log(
'API PATCH: Request aborted or browser-terminated.',
error
);
}
return error;
}
console.log(
'API PATCH Object *fetch* request was aborted or failed in an unexpected way.',
error
);
return error;
});
-clearTimeout(timeoutId);
+if (timeoutId) clearTimeout(timeoutId);
// Error object was returned from fetch catch block; decide retry class.
if (
response instanceof Error ||
(response &&
(response.name === 'AbortError' ||
response.name === 'TypeError'))
) {
if (response.name === 'AbortError') {
// Retry only helper-timeout aborts. Caller/navigation aborts
// should fail fast to avoid duplicate mutation side-effects.
if (did_timeout_abort) {
throw new Error(
`Timeout abort (attempt ${attempt}/${retry_count}) after ${timeout}ms`
);
}
return false;
}
// Transient browser/network failure class.
throw new Error(
`Network error (attempt ${attempt}): ${response.message}`
);
}
if (!response) {
throw new Error(
@@ -292,6 +338,8 @@ export const patch_object = async function patch_object({
? json.data
: json;
} catch (error) {
// Ensure per-attempt timeout is always cleared on failure.
if (timeoutId) clearTimeout(timeoutId);
console.error(`API PATCH error on attempt ${attempt}:`, error);
if (attempt === retry_count) {
@@ -299,9 +347,12 @@ export const patch_object = async function patch_object({
return false;
}
-if (log_lvl) {
-console.log(`Retrying... (${attempt}/${retry_count})`);
-}
+// Backoff before retrying. Caps at 8s to match GET/POST policy.
+const delay_ms = Math.min(2000 * attempt, 8000);
+console.log(
+`API PATCH: Retrying in ${delay_ms}ms... (attempt ${attempt}/${retry_count})`
+);
+await new Promise<void>((resolve) => setTimeout(resolve, delay_ms));
}
}
};

View File

@@ -15,7 +15,7 @@ export const post_object = async function post_object({
params = {},
data = {},
form_data = null,
-timeout = 90000,
+timeout = 20000,
return_meta = false,
return_blob = false,
filename = '',
@@ -200,13 +200,19 @@ export const post_object = async function post_object({
}
for (let attempt = 1; attempt <= retry_count; attempt++) {
-try {
-const controller = new AbortController();
-const timeoutId = setTimeout(() => {
-console.error(`API POST request timed out after ${timeout}ms.`);
-controller.abort();
-}, timeout);
// Declared at loop scope (not inside try) so the catch block can clearTimeout.
// Fresh controller per attempt — same rationale as api_get_object.ts.
const controller = new AbortController();
// AbortError is not specific enough by itself. Distinguish timeout-aborts
// (retryable transient class) from intentional caller aborts (fail-fast).
let did_timeout_abort = false;
const timeoutId = setTimeout(() => {
did_timeout_abort = true;
console.warn(`API POST: Request timed out after ${timeout}ms (attempt ${attempt}/${retry_count}).`);
controller.abort();
}, timeout);
try {
const fetchOptions: RequestInit = {
method: 'POST',
headers: headers_cleaned,
@@ -245,19 +251,28 @@ export const post_object = async function post_object({
});
clearTimeout(timeoutId);
-// Check if we should stop due to abort or network failure
+// Check if we should stop due to abort or network failure.
if (
response instanceof Error ||
(response &&
(response.name === 'TypeError' ||
response.name === 'AbortError'))
) {
-if (response.name === 'AbortError') return false;
-if (log_lvl > 1)
-console.log(
-'API POST Object: Detected NetworkError or TypeError. Failing fast.'
-);
-return false;
+// Retry timeout-aborts from this helper; do not retry caller aborts
+// (route change/unmount/manual cancellation).
+if (response.name === 'AbortError') {
+if (did_timeout_abort) {
+throw new Error(
+`Timeout abort (attempt ${attempt}/${retry_count}) after ${timeout}ms`
+);
+}
+return false;
+}
// TypeError = transient network failure. Throw into the retry loop
// so backoff-and-retry applies. Same fix as api_get_object.ts — see
// comment there for the full history of why this was broken.
throw new Error(`Network error (attempt ${attempt}): ${response.message}`);
}
if (!response) {
@@ -411,6 +426,8 @@ export const post_object = async function post_object({
}
}
} catch (error) {
// Ensure the per-attempt timeout timer is always cancelled on failure.
clearTimeout(timeoutId);
console.error(`API POST error on attempt ${attempt}:`, error);
if (attempt === retry_count) {
@@ -418,9 +435,10 @@ export const post_object = async function post_object({
return false;
}
if (log_lvl) {
console.log(`Retrying... (${attempt}/${retry_count})`);
}
// Backoff before retrying — same rationale as api_get_object.ts.
const delay_ms = Math.min(2000 * attempt, 8000);
console.log(`API POST: Retrying in ${delay_ms}ms... (attempt ${attempt}/${retry_count})`);
await new Promise<void>((resolve) => setTimeout(resolve, delay_ms));
}
}
};

View File

@@ -662,8 +662,8 @@ const code_to_icon: {
<div
class="badge_header
image
-m-0
-max-h-[1.00in]
+m-0 mt-8
+max-h-[1.10in]
min-h-[.50in]
max-w-full overflow-hidden
p-2

View File

@@ -484,16 +484,41 @@ if (browser) {
Unable to load meetings — server error. Please try again.
{/if}
</p>
<button
type="button"
class="btn btn-sm preset-tonal-primary m-auto"
onclick={() => {
auto_retry_count = 0;
$idaa_sess.recovery_meetings.search_version++;
}}>
<span class="fas fa-redo m-1"></span>
Try Again
</button>
<p class="text-xs opacity-60">
If "Try Again" keeps failing, use "Clear Cache &amp; Reload" to reset your local data.
</p>
<div class="flex flex-row flex-wrap items-center justify-center gap-2">
<button
type="button"
class="btn btn-sm preset-tonal-primary"
onclick={() => {
auto_retry_count = 0;
$idaa_sess.recovery_meetings.search_version++;
}}>
<span class="fas fa-redo m-1"></span>
Try Again
</button>
<!-- Escape hatch for persistent server errors caused by stale auth state in
localStorage (stale account_id, api_secret_key, or site config). "Try Again"
reuses the same bad state and loops indefinitely — this clears it.
Mirrors the "Clear Cache & Reload" button in the IDAA layout auth error state. -->
<button
type="button"
class="btn btn-sm preset-tonal-surface preset-outlined-warning-100-900 hover:preset-filled-warning-200-800 transition-all"
onclick={async () => {
localStorage.removeItem('ae_loc');
localStorage.removeItem('ae_idaa_loc');
try { await db_events.event.clear(); } catch { /* ignore */ }
try {
const saved_url = sessionStorage.getItem('idaa_iframe_reload_url');
if (saved_url) { location.href = saved_url; return; }
} catch { /* ignore */ }
location.reload();
}}>
<span class="fas fa-sync-alt m-1"></span>
Clear Cache &amp; Reload
</button>
</div>
</div>
{:else}
{#if has_active_filters}

View File

@@ -74,6 +74,21 @@ git add tests/
git commit -m "test: add <description>"
```
Latency probing
- Use the gated probe in `tests/v3_api_latency_probe.test.ts` for quick live rounds against V3 list endpoints.
- Run it only when you have the live API key available:
```bash
RUN_V3_LATENCY_PROBE=1 PUBLIC_AE_API_SECRET_KEY=... npx playwright test tests/v3_api_latency_probe.test.ts -c playwright.config.ts
```
- Tune the rounds with `V3_LATENCY_ROUNDS` and the pause between calls with `V3_LATENCY_PAUSE_MS`.
- Reports are written to `tests/results/` as JSON and Markdown per run.
- Optional bug-finding thresholds:
- `V3_LATENCY_MAX_ERROR_RATE` (default `0`) — fail if an endpoint exceeds this error rate
- `V3_LATENCY_MAX_P95_MS` (optional) — fail if endpoint p95 exceeds the threshold
- `V3_LATENCY_REQUIRE_ROWS=1` (optional) — fail if all rounds return zero rows
- `V3_LATENCY_OUTPUT_DIR` (optional) — override report directory (default `tests/results`)
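One way to read the optional thresholds above is as a small check over each endpoint's summary. The sketch below is an assumption about how such a check could look, not the probe's actual logic: the environment variable names and the summary fields (`error_rate`, `p95`, `rows_max`) appear in this change, while `readThresholds` and `findViolations` are hypothetical names.

```typescript
// Hypothetical evaluation of the thresholds described above. How the real test
// applies them is not shown in this diff, so this logic is an assumption.
type ProbeSummary = { error_rate: number; p95: number; rows_max: number };
type Thresholds = { maxErrorRate: number; maxP95Ms?: number; requireRows: boolean };

function readThresholds(env: Record<string, string | undefined>): Thresholds {
	const maxP95 = env.V3_LATENCY_MAX_P95_MS;
	return {
		maxErrorRate: Number(env.V3_LATENCY_MAX_ERROR_RATE ?? '0'),
		maxP95Ms: maxP95 ? Number(maxP95) : undefined,
		requireRows: env.V3_LATENCY_REQUIRE_ROWS === '1'
	};
}

// Returns the names of the thresholds that a summary violates (empty when clean).
function findViolations(summary: ProbeSummary, limits: Thresholds): string[] {
	const violations: string[] = [];
	if (summary.error_rate > limits.maxErrorRate) violations.push('error_rate');
	if (limits.maxP95Ms !== undefined && summary.p95 > limits.maxP95Ms) violations.push('p95');
	if (limits.requireRows && summary.rows_max === 0) violations.push('rows');
	return violations;
}

const limits = readThresholds({ V3_LATENCY_MAX_P95_MS: '1500', V3_LATENCY_REQUIRE_ROWS: '1' });
console.log(findViolations({ error_rate: 0, p95: 2100, rows_max: 0 }, limits)); // [ 'p95', 'rows' ]
```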
Help
- If a test fails due to external network calls or platform-specific behavior, try mocking the relevant endpoints and move the test to `tests/disabled` if it cannot be made deterministic.


@@ -0,0 +1,308 @@
import { expect, test } from '@playwright/test';
import { mkdir, writeFile } from 'node:fs/promises';
import path from 'node:path';
import { dev_api_base, testing_account_id, testing_event_id } from './_helpers/env';
const testing_journal_id = 'BVYE-94-46-29';
const apiSecretKey =
process.env.PUBLIC_AE_API_SECRET_KEY ?? process.env.AE_API_SECRET_KEY ?? '';
const probeEnabled = process.env.RUN_V3_LATENCY_PROBE === '1';
const outputDir = process.env.V3_LATENCY_OUTPUT_DIR ?? 'tests/results';
type ProbeSample = {
label: string;
ms: number;
rows: number;
status: number;
ok: boolean;
error?: string;
};
type EndpointProbe = {
name: 'event_sessions' | 'journal_entries' | 'users';
label: string;
url: string;
body?: unknown;
};
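// Nearest-rank percentile: returns the smallest sample such that at least
// `pct` percent of the samples are less than or equal to it.
// Returns 0 for an empty input. Example: percentile([10, 20, 30, 40], 50) === 20.
function percentile(values: number[], pct: number): number {
	if (values.length === 0) return 0;
	const sorted = [...values].sort((a, b) => a - b);
	const idx = Math.min(sorted.length - 1, Math.max(0, Math.ceil((pct / 100) * sorted.length) - 1));
	return sorted[idx];
}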
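// Aggregate one endpoint's samples into timing, error, and row-count statistics.
// Min/max fall back to 0 when there are no samples (e.g. V3_LATENCY_ROUNDS=0),
// because Math.min()/Math.max() of an empty list yield Infinity/-Infinity,
// which JSON.stringify would write to the report as null.
function summarize(samples: ProbeSample[]) {
	const timings = samples.map((sample) => sample.ms);
	const statuses = samples.map((sample) => sample.status);
	const ok_count = samples.filter((sample) => sample.ok).length;
	const error_count = samples.length - ok_count;
	const row_counts = samples.map((sample) => sample.rows);
	const total = timings.reduce((sum, value) => sum + value, 0);
	const has_samples = samples.length > 0;
	return {
		count: samples.length,
		ok_count,
		error_count,
		error_rate: Number((error_count / Math.max(1, samples.length)).toFixed(4)),
		min: has_samples ? Math.min(...timings) : 0,
		p50: percentile(timings, 50),
		p95: percentile(timings, 95),
		max: has_samples ? Math.max(...timings) : 0,
		avg: Math.round(total / Math.max(1, timings.length)),
		rows_last: samples.at(-1)?.rows ?? 0,
		rows_min: has_samples ? Math.min(...row_counts) : 0,
		rows_max: has_samples ? Math.max(...row_counts) : 0,
		statuses
	};
}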
async function timedJsonFetch({
label,
url,
headers,
body
}: {
label: string;
url: string;
headers: Record<string, string>;
body?: unknown;
}): Promise<ProbeSample> {
const started_ms = performance.now();
try {
const response = await fetch(url, {
method: body ? 'POST' : 'GET',
headers,
body: body ? JSON.stringify(body) : undefined
});
// Stop the clock once the response headers arrive; the JSON body parsing below is not timed.
const elapsed_ms = Math.round(performance.now() - started_ms);
const payload = await response.json().catch(() => null);
const rows = Array.isArray(payload?.data)
? payload.data.length
: Array.isArray(payload)
? payload.length
: 0;
return {
label,
ms: elapsed_ms,
rows,
status: response.status,
ok: response.ok
};
} catch (error) {
const elapsed_ms = Math.round(performance.now() - started_ms);
return {
label,
ms: elapsed_ms,
rows: 0,
status: 0,
ok: false,
error: error instanceof Error ? error.message : String(error)
};
}
}
function reportMarkdown({
run_id,
started_at,
base_url,
rounds,
pause_ms,
threshold_max_error_rate,
threshold_p95_ms,
require_non_empty_rows,
report,
anomalies
}: any): string {
const lines: string[] = [];
lines.push('# V3 API Performance Probe');
lines.push('');
lines.push(`- run_id: ${run_id}`);
lines.push(`- started_at: ${started_at}`);
lines.push(`- base_url: ${base_url}`);
lines.push(`- rounds_per_endpoint: ${rounds}`);
lines.push(`- pause_ms: ${pause_ms}`);
lines.push(`- threshold_max_error_rate: ${threshold_max_error_rate}`);
lines.push(`- threshold_p95_ms: ${threshold_p95_ms ?? 'disabled'}`);
lines.push(`- require_non_empty_rows: ${require_non_empty_rows}`);
lines.push('');
lines.push('| Endpoint | count | errors | error_rate | p50 | p95 | max | rows_min | rows_max |');
lines.push('| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |');
for (const [name, stats] of Object.entries(report) as any) {
lines.push(
`| ${name} | ${stats.count} | ${stats.error_count} | ${stats.error_rate} | ${stats.p50} | ${stats.p95} | ${stats.max} | ${stats.rows_min} | ${stats.rows_max} |`
);
}
lines.push('');
if (anomalies.length > 0) {
lines.push('## Anomalies');
for (const item of anomalies) lines.push(`- ${item}`);
} else {
lines.push('## Anomalies');
lines.push('- none');
}
return lines.join('\n');
}
test.describe('V3 API latency probe', () => {
test.skip(
!probeEnabled || !apiSecretKey,
'Set RUN_V3_LATENCY_PROBE=1 and PUBLIC_AE_API_SECRET_KEY to run the live probe.'
);
test.setTimeout(120000);
test('quick rounds on common list endpoints', async () => {
const rounds = Number(process.env.V3_LATENCY_ROUNDS ?? 6);
const delay_ms = Number(process.env.V3_LATENCY_PAUSE_MS ?? 150);
const threshold_max_error_rate = Number(process.env.V3_LATENCY_MAX_ERROR_RATE ?? 0);
const threshold_p95_ms = process.env.V3_LATENCY_MAX_P95_MS
? Number(process.env.V3_LATENCY_MAX_P95_MS)
: null;
const require_non_empty_rows = process.env.V3_LATENCY_REQUIRE_ROWS === '1';
const started_at = new Date().toISOString();
const run_id = started_at.replace(/[:.]/g, '-');
const headers = {
'x-aether-api-key': apiSecretKey,
'x-account-id': testing_account_id,
'x-ae-ignore-extra-fields': 'true',
'Content-Type': 'application/json'
};
const event_session_url = new URL('/v3/crud/event_session/search', dev_api_base).toString();
const journal_entry_url = new URL(`/v3/crud/journal/${testing_journal_id}/journal_entry/`, dev_api_base).toString();
const user_list_url = new URL('/v3/crud/user/', dev_api_base).toString();
const probes: EndpointProbe[] = [
{
name: 'event_sessions',
label: 'event_session',
url: event_session_url,
body: {
and: [{ field: 'event_id', op: 'eq', value: testing_event_id }]
}
},
{
name: 'journal_entries',
label: 'journal_entry',
url: journal_entry_url
},
{
name: 'users',
label: 'user',
url: `${user_list_url}?${new URLSearchParams({
for_obj_type: 'account',
for_obj_id: testing_account_id,
enabled: 'all',
hidden: 'not_hidden',
view: 'default',
limit: '99',
offset: '0',
order_by_li: JSON.stringify({ username: 'ASC' })
}).toString()}`
}
];
const samples_by_endpoint: Record<string, ProbeSample[]> = {
event_sessions: [],
journal_entries: [],
users: []
};
for (let round = 1; round <= rounds; round++) {
for (const probe of probes) {
samples_by_endpoint[probe.name].push(
await timedJsonFetch({
label: `${probe.label} round ${round}`,
url: probe.url,
headers,
body: probe.body
})
);
await new Promise((resolve) => setTimeout(resolve, delay_ms));
}
}
const report = {
event_sessions: summarize(samples_by_endpoint.event_sessions),
journal_entries: summarize(samples_by_endpoint.journal_entries),
users: summarize(samples_by_endpoint.users)
};
const anomalies: string[] = [];
for (const [name, stats] of Object.entries(report) as any) {
if (stats.error_rate > threshold_max_error_rate) {
anomalies.push(
`${name}: error_rate ${stats.error_rate} > threshold ${threshold_max_error_rate}`
);
}
if (threshold_p95_ms !== null && stats.p95 > threshold_p95_ms) {
anomalies.push(
`${name}: p95 ${stats.p95}ms > threshold ${threshold_p95_ms}ms`
);
}
if (require_non_empty_rows && stats.rows_max === 0) {
anomalies.push(`${name}: all rounds returned 0 rows`);
}
if (stats.rows_max > 0 && stats.rows_min === 0) {
anomalies.push(
`${name}: row count flapped between empty and non-empty (rows_min=0 rows_max=${stats.rows_max})`
);
}
if (stats.p95 > stats.p50 * 3 && stats.p95 > 1000) {
anomalies.push(
`${name}: jitter spike (p95=${stats.p95}ms vs p50=${stats.p50}ms)`
);
}
}
const report_payload = {
run_id,
started_at,
base_url: dev_api_base,
rounds,
pause_ms: delay_ms,
threshold_max_error_rate,
threshold_p95_ms,
require_non_empty_rows,
report,
samples: samples_by_endpoint,
anomalies
};
await mkdir(outputDir, { recursive: true });
const json_path = path.join(outputDir, `v3_latency_probe_${run_id}.json`);
const md_path = path.join(outputDir, `v3_latency_probe_${run_id}.md`);
await writeFile(json_path, `${JSON.stringify(report_payload, null, 2)}\n`, 'utf8');
await writeFile(
md_path,
reportMarkdown({
run_id,
started_at,
base_url: dev_api_base,
rounds,
pause_ms: delay_ms,
threshold_max_error_rate,
threshold_p95_ms,
require_non_empty_rows,
report,
anomalies
}),
'utf8'
);
console.log('V3 latency probe summary:');
console.table(report);
console.log('V3 latency probe report files:', {
json_path,
md_path
});
expect(anomalies, `Latency probe anomalies:\n- ${anomalies.join('\n- ')}`).toEqual([]);
});
});


@@ -101,46 +101,25 @@ test.describe('V3 API Nested CRUD Integrity', () => {
});
test('should send a nested request when creating an Event Location', async ({ page }) => {
// We'll perform the UI action and assert the resulting UI change (and the route handler
// separately logs the POST). Relying on DOM update is less flaky than waiting
// directly for the network request in this environment.
// The page is now loaded. The test will automatically fail because
// the UI is not yet interactive enough to trigger the POST request.
// The console output will show us which GET requests we need to mock.
// Validate the real app flow: click the UI button and assert the outgoing
// nested POST request shape and endpoint.
const requestPromise = page.waitForRequest(
(request) =>
request.method() === 'POST' &&
request.url().includes(`/v3/crud/event/${testing_event_id}/event_location`)
);
// Ensure the Add Location button is present
const addBtn = page.getByRole('button', { name: 'Add Location' });
await expect(addBtn).toBeVisible();
await addBtn.click();
// Instead of relying on the complex client-side helper to call the nested create,
// POST directly from the browser context to the nested endpoint so the page.route
// handler is exercised and we can assert nested endpoint behavior.
const resp = await page.evaluate(async (eventId) => {
const r = await fetch(`/v3/crud/event/${eventId}/event_location/`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ name: 'TEMP Location Name', event_id: eventId })
});
try { return { status: r.status, json: await r.json() }; } catch(e) { return { status: r.status, json: null }; }
}, testing_event_id as any);
const request = await requestPromise;
const postData = JSON.parse(request.postData() ?? '{}');
expect(resp.status === 200 || resp.status === 201).toBeTruthy();
expect(resp.json).toBeDefined();
if (resp.json && resp.json.data) expect(resp.json.data.name).toBe('TEMP Location Name');
expect(request.url()).toContain(`/v3/crud/event/${testing_event_id}/event_location`);
expect(postData.name).toBe('TEMP Location Name');
expect(postData.event_id).toBe(testing_event_id);
// Wait for the request to be captured
// const request = await requestPromise;
// const postData = request.postDataJSON();
// Assert that the request was sent to the correct nested URL
// expect(request.url()).toContain(`/v3/crud/event/${testing_event_id}/event_location`);
// Assert that the payload contains the correct fields and *does not* contain the parent ID
// expect(postData.fields).toBeDefined();
// expect(postData.fields.name).toBe('Test Location');
// expect(postData.fields.event_id).toBeUndefined();
});
});


@@ -37,7 +37,7 @@ test.describe('V3 API Header Integrity (modernized)', () => {
});
});
-test('Verify lookup requests include the unauthenticated bypass header', async ({ page }) => {
+test('Verify lookup requests use account-scoped headers (no bypass)', async ({ page }) => {
await page.addInitScript((defaults) => {
const testData = { ...defaults, account_id: 'test-account-id', manager_access: true };
window.localStorage.setItem('ae_loc', JSON.stringify(testData));
@@ -50,7 +50,10 @@ test.describe('V3 API Header Integrity (modernized)', () => {
const request = await requestPromise;
const headers = request.headers();
-expect(headers['x-no-account-id']).toBe('Nothing to See Here');
+// Current lookup policy is account-scoped for these routes.
+// The bypass header should not be sent here.
+expect(headers['x-no-account-id']).toBeUndefined();
expect(headers['x-account-id']).toBe('test-account-id');
expect(headers['x-aether-api-key']).toBeDefined();
});