docs: update TODO and add BOOTSTRAP mistake #13 for API retry regression
TODO__Agents.md:
- Added the two additional fixes from the review pass to the PATCH/DELETE
retry hardening entry: default timeout 60s→20s, and DELETE missing
ae_auth_error banner on 401/403.
BOOTSTRAP__AI_Agent_Quickstart.md:
- Added mistake #13: breaking the API retry loop by returning errors from
the TypeError/AbortError block instead of throwing them. Documents the
Jan 2026 regression (commit a10accfaa), the three retry classes that must
be preserved, and a quick verification method.
- Filled the gap at item #7 (was missing, causing off-by-one numbering
from item 8 onward). Items renumbered 8-14 → 7-13.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -300,14 +300,14 @@ These are real incidents — know them before you start.
|
||||
6. **Deleting files with `rm`** — always move to `~/tmp/agents_trash`. A deleted file may
|
||||
contain context that's not recoverable from git if it was gitignored.
|
||||
|
||||
8. **Dexie `.get()` with a string object ID returns `undefined`** — Dexie `.get(value)`
|
||||
7. **Dexie `.get()` with a string object ID returns `undefined`** — Dexie `.get(value)`
|
||||
looks up by the table's **primary key**, which is `id` (the first schema field). The V3
|
||||
API never returns `id`, so it is always `undefined` in stored records. Passing a string
|
||||
object ID (e.g. `person_id`) to `.get()` will silently return nothing. Always use
|
||||
`.where('person_id').equals(person_id).first()` instead. This has caused liveQuery
|
||||
blocks to always produce `undefined` even when the record exists in Dexie.
|
||||
|
||||
9. **Treating `$effect` blocks as auth bypass risks** — a `$effect` inside a child
|
||||
8. **Treating `$effect` blocks as auth bypass risks** — a `$effect` inside a child
|
||||
component cannot bypass a parent `+layout.svelte` auth gate. Children only mount if
|
||||
the parent calls `{@render children?.()}`. Adding redundant auth guards to `$effect`
|
||||
blocks that can only run after the parent gate already passed is unnecessary — and
|
||||
@@ -317,13 +317,13 @@ These are real incidents — know them before you start.
|
||||
clean of data loads in private modules. See `GUIDE__SvelteKit2_Svelte5_DexieJS.md` →
|
||||
"SvelteKit Layout Hierarchy: Security and Execution Order" for the full explanation.
|
||||
|
||||
10. **Using query `key` as a proxy for bypass stripped `x-account-id`** — this caused
|
||||
9. **Using query `key` as a proxy for bypass stripped `x-account-id`** — this caused
|
||||
valid account-scoped requests to lose account context and 403. `key` can be a valid
|
||||
endpoint/business param, but it is not equivalent to `x-no-account-id: bypass`. Keep
|
||||
`x-no-account-id` usage narrow and temporary; do not expand it without a documented
|
||||
allowlist case.
|
||||
|
||||
11. **Pre-stringifying `*_json` fields before passing to API wrappers** — the API wrappers
|
||||
10. **Pre-stringifying `*_json` fields before passing to API wrappers** — the API wrappers
|
||||
(`api_post__crud_obj.ts` for V3, `api.ts` for legacy CRUD) automatically serialize any
|
||||
field ending in `_json` (e.g. `cfg_json`, `data_json`). Pass these as plain JS objects.
|
||||
Pre-stringifying with `JSON.stringify()` before calling the wrapper will double-encode
|
||||
@@ -331,12 +331,12 @@ These are real incidents — know them before you start.
|
||||
redundant on the V3 path. Both paths now pretty-print with 2-space indent.
|
||||
See `GUIDE__AE_API_V3_for_Frontend.md` → section 3C for the full explanation.
|
||||
|
||||
12. **Broad Dexie result windows get silently clipped** — if a broad "All" view shows fewer
|
||||
11. **Broad Dexie result windows get silently clipped** — if a broad "All" view shows fewer
|
||||
rows than a narrower filter, check for a page-level limit or an API revalidation step
|
||||
replacing the local IDB result set. For empty text searches, the full local result set
|
||||
should drive the display; server refreshes should update cache, not shrink visibility.
|
||||
|
||||
13. **Not bumping `IDB_CONTENT_VERSIONS` when changing `properties_to_save`** — this caused
|
||||
12. **Not bumping `IDB_CONTENT_VERSIONS` when changing `properties_to_save`** — this caused
|
||||
the IDAA Recovery Meetings "no meetings found" bug for approximately one year (2025–2026).
|
||||
|
||||
**What happened:** A deploy changed `properties_to_save` in `ae_events__event.ts`, but no
|
||||
@@ -368,6 +368,35 @@ These are real incidents — know them before you start.
|
||||
0 results in your templates. Silent failures look like data problems and are extremely
|
||||
difficult to diagnose.
|
||||
|
||||
13. **Breaking the API retry loop by returning errors instead of throwing them** — all four
|
||||
`api_*_object.ts` files (`api_get_object.ts`, `api_post_object.ts`, `api_patch_object.ts`,
|
||||
`api_delete_object.ts`) use a `.catch()` that returns the error as a value, followed by a
|
||||
classification block. That block **must throw** for transient network failures (`TypeError`)
|
||||
so they enter the retry loop. If you change it to `return false`, retries are silently
|
||||
bypassed for the most common failure mode in hotel/conference WiFi — and nothing warns you.
|
||||
|
||||
**What happened (commit a10accfaa, Jan 2026):** A "silence background fetch noise" commit
|
||||
changed `.catch()` to explicitly `return error`, then the classification block was changed
|
||||
from a `throw` to `return false`. `TypeError` from `ERR_NETWORK_CHANGED` — the most common
|
||||
failure on crowded WiFi — stopped retrying. The `retry_count = 5` parameter became dead
|
||||
code for network errors. Went undetected for ~4 months.
|
||||
|
||||
**The retry classification these files must honor:**
|
||||
- `TypeError` (ERR_NETWORK_CHANGED, WiFi blip) → **`throw`** → enters retry loop with backoff
|
||||
- `AbortError` where `did_timeout_abort = true` (helper's own timer) → **`throw`** → retries
|
||||
- `AbortError` where `did_timeout_abort = false` (navigation/unmount abort) → `return false`
|
||||
- HTTP 400/401/403/422 → `return false` immediately (client errors are deterministic)
|
||||
- HTTP 5xx → **`throw`** → retries with backoff
|
||||
|
||||
**How to verify after any change to the error block:** confirm that a `TypeError` still
|
||||
produces up to 5 retry attempts with 2s→4s→6s→8s delays before returning false. A single
|
||||
`return false` after the first network failure means the retry loop is broken.
|
||||
|
||||
**Also:** when reviewing these files, check that all four have:
|
||||
- `ae_auth_error.set()` triggered on 401/403 (shows session-expired banner to the user)
|
||||
- `timeout = 20000` default (was 60s in PATCH/DELETE until 2026-05-21 — 5-min worst case)
|
||||
- `did_timeout_abort` flag per attempt (separates helper timeouts from caller aborts)
|
||||
|
||||
---
|
||||
|
||||
## 8. Source Layout (Quick Reference)
|
||||
|
||||
@@ -233,6 +233,13 @@ including timeout abort separation and capped retry backoff.
|
||||
- ✅ No regression for 400/401/403/422 fail-fast behavior.
|
||||
- ✅ `npx svelte-check` clean, API-focused Playwright tests remained green during rollout.
|
||||
|
||||
**Additional fixes found during review pass (2026-05-21, commit ea765d8ad):**
|
||||
- PATCH + DELETE: default timeout lowered from 60s → 20s to match GET/POST. No callers set
|
||||
explicit timeouts; 60s × 5 retries = 5-minute worst case before giving up.
|
||||
- DELETE: added `ae_auth_error` import and session-expired banner on 401/403. All other
|
||||
files (GET/POST/PATCH) trigger the banner; DELETE was missing it, causing stale-session
|
||||
deletes to silently return false with no user-visible feedback.
|
||||
|
||||
---
|
||||
|
||||
### [Testing] V3 API performance probe (basic stress rounds)
|
||||
|
||||
Reference in New Issue
Block a user