fix(P3): guard startup db connection with try/except in lib_sql_core

Wraps the deprecated global `db = engine.connect()` in a try/except so
a Docker startup race (MariaDB not yet ready) no longer crashes the
Gunicorn worker before it can serve any requests.

Sets db=None on failure; reconnect_db() on the lifespan bootstrap path
re-establishes it once credentials are confirmed.

TODO (P3 full): migrate lib_schema_v3.py:39 and lib_api_crud_v3.py:166
off the global db to engine.connect() context managers, then remove it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Scott Idem
2026-04-17 19:28:28 -04:00
parent 55debc8009
commit 3db5f7c749
2 changed files with 11 additions and 4 deletions

View File

@@ -43,9 +43,15 @@ def create_ae_engine(uri: str):
engine = create_ae_engine(db_uri) engine = create_ae_engine(db_uri)
# DEPRECATED: Global shared 'db' connection. Use engine.connect() in context managers instead. # DEPRECATED: Global shared 'db' connection. Still used by lib_schema_v3.py and lib_api_crud_v3.py.
# Keeping for legacy compatibility but will phase out usage in crud lib. # TODO (P3 full fix): migrate those two call sites to engine.connect() context managers, then remove this.
db = engine.connect() # Bare connect guarded so a Docker startup race (MariaDB not yet ready) doesn't crash the worker.
# If this fails, db=None — callers that hit it before reconnect_db() runs will raise AttributeError.
try:
db = engine.connect()
except Exception:
log.warning("DB SQL Core: Initial db connection failed at startup (MariaDB not ready?). Will retry via reconnect_db().")
db = None
log.info('DB SQL Core: Initializing engine...') log.info('DB SQL Core: Initializing engine...')

View File

@@ -18,7 +18,8 @@
- [x] **[P1] Remove zombie `db_connection.py` import** — `app/routers/api.py` imports `db` from `app/db_connection.py`, creating a parasitic second SQLAlchemy engine at startup that is never updated by `reconnect_db()` after bootstrap. The imported `db` is only used in a commented-out line (`api.py:268`). Fix: remove the import; delete or archive `db_connection.py`. - [x] **[P1] Remove zombie `db_connection.py` import** — `app/routers/api.py` imports `db` from `app/db_connection.py`, creating a parasitic second SQLAlchemy engine at startup that is never updated by `reconnect_db()` after bootstrap. The imported `db` is only used in a commented-out line (`api.py:268`). Fix: remove the import; delete or archive `db_connection.py`.
- [x] **[P1] Fix retry mechanism in `sql_update` / `run_sql_select`** — On `OperationalError`, both call `sql_connect()``reconnect_db()` which calls `engine.dispose()`, nuking the entire connection pool mid-flight. Under concurrent requests this kills other in-flight connections. Fix: remove the `sql_connect()` retry call; SQLAlchemy's `pool_pre_ping=True` already handles stale connections — just open a fresh `engine.connect()` for the retry without disposing the pool. - [x] **[P1] Fix retry mechanism in `sql_update` / `run_sql_select`** — On `OperationalError`, both call `sql_connect()``reconnect_db()` which calls `engine.dispose()`, nuking the entire connection pool mid-flight. Under concurrent requests this kills other in-flight connections. Fix: remove the `sql_connect()` retry call; SQLAlchemy's `pool_pre_ping=True` already handles stale connections — just open a fresh `engine.connect()` for the retry without disposing the pool.
- [ ] **[P2] Add retry logic to `sql_insert` and `sql_select`** — Both are missing the `OperationalError` retry that `sql_update` and `run_sql_select` have. A DB blip during an INSERT (scan record, badge log, etc.) fails silently and returns `False` with no recovery attempt. - [ ] **[P2] Add retry logic to `sql_insert` and `sql_select`** — Both are missing the `OperationalError` retry that `sql_update` and `run_sql_select` have. A DB blip during an INSERT (scan record, badge log, etc.) fails silently and returns `False` with no recovery attempt.
- [ ] **[P3] Guard `db = engine.connect()` in `lib_sql_core.py` with try/except** — Line 48 is a bare connect at module load time with no error handling. If MariaDB isn't ready (Docker race), this throws unhandled and crashes the worker. Wrap in try/except like `db_connection.py` already does. - [x] **[P3] Guard `db = engine.connect()` in `lib_sql_core.py` with try/except** — Wrapped in try/except; sets `db = None` on failure so Docker startup race no longer crashes the worker.
- [ ] **[P3 full]** Migrate `lib_schema_v3.py:39` and `lib_api_crud_v3.py:166` off the global `db` to `engine.connect()` context managers, then remove the global `db` entirely.
- [x] **[P4] Expose `pool_size` / `max_overflow` as env vars** — `create_ae_engine()` calls `settings.DB.get('pool_size', 10)` but `settings.DB` property doesn't include those keys, so they're always hardcoded 10/20. Add `AE_DB_POOL_SIZE` / `AE_DB_POOL_MAX_OVERFLOW` to `config.py`. - [x] **[P4] Expose `pool_size` / `max_overflow` as env vars** — `create_ae_engine()` calls `settings.DB.get('pool_size', 10)` but `settings.DB` property doesn't include those keys, so they're always hardcoded 10/20. Add `AE_DB_POOL_SIZE` / `AE_DB_POOL_MAX_OVERFLOW` to `config.py`.
## 📋 Feature Tasks ## 📋 Feature Tasks