feat(pipeline): tests, docs, and ruff fixes (subtask 6/6)

- Add 29-test suite for all pipeline routes, data helpers, and query
  execution (test_pipeline.py); all 1333 tests pass
- Fix ruff UP041: asyncio.TimeoutError → TimeoutError in analytics.py
- Fix ruff UP036/F401: replace sys.version_info tomllib block with
  plain `import tomllib` (project requires Python 3.11+)
- Fix ruff F841: remove unused `cutoff` variable in pipeline_overview
- Update CHANGELOG.md with Pipeline Console entry
- Update PROJECT.md: add Pipeline Console to Admin Panel done list

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Author: Deeman
Date: 2026-02-25 13:02:51 +01:00
Parent: 8f8f7f7acb
Commit: d637687795
5 changed files with 591 additions and 19 deletions

CHANGELOG.md

@@ -7,6 +7,14 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 ## [Unreleased]
 ### Added
+- **Pipeline Console admin section** — full operational visibility into the data engineering pipeline at `/admin/pipeline/`:
+  - **Overview tab** — extraction status grid (one card per workflow with status dot, schedule, last-run timestamp, error preview), serving table row counts from `_serving_meta.json`, landing zone file stats (per-source file count + total size)
+  - **Extractions tab** — filterable, paginated run history table from `.state.sqlite` (extractor + status dropdowns, HTMX live filter); stale "running" row detection (amber highlight) with "Mark Failed" button; "Run All Extractors" button enqueues `run_extraction` task
+  - **Catalog tab** — accordion list of serving tables with row count badges; click-to-expand lazy-loads column schema + 10-row sample data per table
+  - **Query editor tab** — dark-themed SQL textarea (`Commit Mono`, navy background, electric blue focus glow); schema sidebar (collapsible table/column list with types); Tab-key indent and Cmd/Ctrl+Enter submit; results table with sticky headers + row count + elapsed time; query security (read-only DuckDB, blocklist regex, 10k char limit, 1000 row cap, 10s timeout)
+- **`analytics.execute_user_query()`** — new function returning `(columns, rows, error, elapsed_ms)` for admin query editor
+- **`worker.run_extraction` task** — background handler shells out to `uv run extract` from repo root (2h timeout)
+- 29 new tests covering all routes, data access helpers, security checks, and `execute_user_query()`
 - **Email template system** — all 11 transactional emails migrated from inline f-string HTML in `worker.py` to Jinja2 templates:
   - **Standalone renderer** (`email_templates.py`) — `render_email_template()` uses a module-level `jinja2.Environment` with `autoescape=True`, works outside Quart request context (worker process); `tformat` filter mirrors the one in `app.py`
   - **`_base.html`** — branded shell (dark header, 3px blue accent, white card body, footer with tagline + copyright); replaces the old `_email_wrap()` helper
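The query-security layer listed above amounts to a few cheap pre-flight checks before any SQL reaches the read-only DuckDB connection. A minimal sketch of that kind of gate — `validate_user_sql`, `_BLOCKED`, and `_MAX_SQL_CHARS` are illustrative names, not the actual identifiers in `pipeline_routes.py`:

```python
import re

# Hypothetical blocklist — the real regex and limits live in the admin blueprint.
_BLOCKED = re.compile(
    r"\b(insert|update|delete|drop|alter|create|attach|copy|install|load|pragma)\b",
    re.IGNORECASE,
)
_MAX_SQL_CHARS = 10_000

def validate_user_sql(sql: str) -> str | None:
    """Return an error message, or None if the query may be executed."""
    sql = sql.strip()
    if not sql:
        return "Query is empty."
    if len(sql) > _MAX_SQL_CHARS:
        return f"Query exceeds {_MAX_SQL_CHARS} characters."
    if _BLOCKED.search(sql):
        return "Query contains a blocked keyword; the editor is read-only."
    return None
```

The row cap and timeout are enforced downstream: per the changelog entry, `execute_user_query()` always returns a `(columns, rows, error, elapsed_ms)` tuple, so the route can render either a results table or an error banner from one shape.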

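For the `worker.run_extraction` task, the description (shells out to `uv run extract` from the repo root, 2h timeout) implies roughly the following. This is a sketch, not the actual handler; `REPO_ROOT` and the function signature are assumptions:

```python
import asyncio
import subprocess
from pathlib import Path

# Assumption: worker.py sits two directory levels below the repo root.
REPO_ROOT = Path(__file__).resolve().parents[2]

async def run_extraction() -> None:
    """Shell out to the extraction CLI without blocking the event loop."""
    await asyncio.to_thread(
        subprocess.run,
        ["uv", "run", "extract"],
        cwd=REPO_ROOT,
        check=True,           # non-zero exit surfaces as CalledProcessError
        timeout=2 * 60 * 60,  # 2h ceiling, per the commit description
    )
```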
PROJECT.md

@@ -114,6 +114,7 @@
 - [x] **Admin email gallery** (`/admin/emails/gallery`) — card grid of all templates, EN/DE preview in sandboxed iframe, "View in sent log" cross-link; compose page now has HTMX live preview pane
 - [x] **pSEO Engine tab** (`/admin/pseo`) — content gap detection, data freshness signals, article health checks (hreflang orphans, missing build files, broken scenario refs), generation job monitoring with live progress bars
 - [x] **Marketplace admin dashboard** (`/admin/marketplace`) — lead funnel, credit economy, supplier engagement, live activity stream, inline feature flag toggles
+- [x] **Pipeline Console** (`/admin/pipeline`) — 4-tab operational dashboard: extraction status grid per source, filterable run history with stale-run management ("Mark Failed"), data catalog with column schema + 10-row sample, SQL query editor with dark-themed textarea + schema sidebar + read-only security sandboxing (keyword blocklist, 10s timeout, 1,000-row cap)
 - [x] **Lead matching notifications** — `notify_matching_suppliers` task on quote verification + `send_weekly_lead_digest` every Monday; one-click CTA token in forward emails
 - [x] **Migration 0022** — `status_updated_at`, `supplier_note`, `cta_token` on `lead_forwards`; supplier respond endpoint; inline HTMX lead detail actions; extended quote form fields
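The stale-run management mentioned above hinges on one cheap query against `.state.sqlite`: any row still marked `running` whose `started_at` predates a cutoff is presumed dead. A sketch of that check, assuming the `_STALE_THRESHOLD_HOURS` constant that appears later in this diff (the value here is illustrative):

```python
import sqlite3
from datetime import UTC, datetime, timedelta

_STALE_THRESHOLD_HOURS = 3  # illustrative; the real value is defined in pipeline_routes.py

def find_stale_runs(db_path: str) -> list[int]:
    """Return run_ids of 'running' rows that started before the staleness cutoff."""
    cutoff = (datetime.now(UTC) - timedelta(hours=_STALE_THRESHOLD_HOURS)).strftime(
        "%Y-%m-%dT%H:%M:%SZ"
    )
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT run_id FROM extraction_runs WHERE status = 'running' AND started_at < ?",
            (cutoff,),
        ).fetchall()
    finally:
        conn.close()
    return [r[0] for r in rows]
```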

admin/pipeline_routes.py

@@ -25,8 +25,7 @@ import logging
 import os
 import re
 import sqlite3
-import sys
-import time
+import tomllib
 from datetime import UTC, datetime, timedelta
 from pathlib import Path
@@ -307,18 +306,7 @@ def _load_workflows() -> list[dict]:
     if not _WORKFLOWS_TOML.exists():
         return []
-    if sys.version_info >= (3, 11):
-        import tomllib
-        data = tomllib.loads(_WORKFLOWS_TOML.read_text())
-    else:
-        # Fallback for older Python (shouldn't happen — project requires 3.11+)
-        try:
-            import tomli as tomllib  # type: ignore[no-redef]
-            data = tomllib.loads(_WORKFLOWS_TOML.read_text())
-        except ImportError:
-            return []
+    data = tomllib.loads(_WORKFLOWS_TOML.read_text())
     workflows = []
     for name, config in data.items():
@@ -422,9 +410,6 @@ async def pipeline_overview():
     latest_by_name = {r["extractor"]: r for r in latest_runs}
     # Enrich each workflow with its latest run data
-    cutoff = (datetime.now(UTC) - timedelta(hours=_STALE_THRESHOLD_HOURS)).strftime(
-        "%Y-%m-%dT%H:%M:%SZ"
-    )
     workflow_rows = []
     for wf in workflows:
         run = latest_by_name.get(wf["name"])

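Ruff's UP036 fires on `sys.version_info` branches that can never be taken given the project's `requires-python` floor; with 3.11+ guaranteed, `tomllib` is always importable and the `tomli` fallback is dead code. For reference, the two stdlib entry points (the `workflows.toml` path here is illustrative):

```python
import tomllib  # stdlib since Python 3.11; no tomli fallback needed
from pathlib import Path

# tomllib.loads() parses a str; tomllib.load() expects a binary file object.
data = tomllib.loads(Path("workflows.toml").read_text())

with Path("workflows.toml").open("rb") as f:
    same_data = tomllib.load(f)
```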
analytics.py

@@ -74,7 +74,7 @@ async def fetch_analytics(sql: str, params: list | None = None) -> list[dict[str
             asyncio.to_thread(_run),
             timeout=_QUERY_TIMEOUT_SECONDS,
         )
-    except asyncio.TimeoutError:
+    except TimeoutError:
         logger.error("DuckDB analytics query timed out after %ds: %.200s", _QUERY_TIMEOUT_SECONDS, sql)
         return []
     except Exception:
@@ -123,5 +123,5 @@ async def execute_user_query(
             asyncio.to_thread(_run),
             timeout=timeout_seconds,
         )
-    except asyncio.TimeoutError:
+    except TimeoutError:
         return [], [], f"Query timed out after {timeout_seconds}s.", 0.0

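The UP041 fix above is safe because, since Python 3.11, `asyncio.TimeoutError` is an alias of the builtin `TimeoutError`, so catching the builtin still catches `asyncio.wait_for` timeouts. A quick self-contained demonstration:

```python
import asyncio

assert asyncio.TimeoutError is TimeoutError  # True on Python 3.11+

async def main() -> None:
    try:
        await asyncio.wait_for(asyncio.sleep(5), timeout=0.01)
    except TimeoutError:  # same class that asyncio.wait_for raises
        print("timed out")

asyncio.run(main())
```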
web/tests/test_pipeline.py (new file, 578 lines)

@@ -0,0 +1,578 @@
"""
Tests for the Pipeline Console admin blueprint.
Covers:
- admin/pipeline_routes.py: all 9 routes
- analytics.py: execute_user_query() function
- Data access functions: state DB, serving meta, landing zone
"""
import json
import sqlite3
import tempfile
from pathlib import Path
from unittest.mock import AsyncMock, MagicMock, patch
import padelnomics.admin.pipeline_routes as pipeline_mod
import pytest
from padelnomics.core import utcnow_iso
# ── Fixtures ──────────────────────────────────────────────────────────────────
@pytest.fixture
async def admin_client(app, db):
"""Authenticated admin test client."""
now = utcnow_iso()
async with db.execute(
"INSERT INTO users (email, name, created_at) VALUES (?, ?, ?)",
("pipeline-admin@test.com", "Pipeline Admin", now),
) as cursor:
admin_id = cursor.lastrowid
await db.execute(
"INSERT INTO user_roles (user_id, role) VALUES (?, 'admin')", (admin_id,)
)
await db.commit()
async with app.test_client() as c:
async with c.session_transaction() as sess:
sess["user_id"] = admin_id
yield c
@pytest.fixture
def state_db_dir():
"""Temp directory with a seeded .state.sqlite for testing."""
with tempfile.TemporaryDirectory() as tmpdir:
db_path = Path(tmpdir) / ".state.sqlite"
conn = sqlite3.connect(str(db_path))
conn.execute(
"""
CREATE TABLE extraction_runs (
run_id INTEGER PRIMARY KEY AUTOINCREMENT,
extractor TEXT NOT NULL,
started_at TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%SZ', 'now')),
finished_at TEXT,
status TEXT NOT NULL DEFAULT 'running',
files_written INTEGER DEFAULT 0,
files_skipped INTEGER DEFAULT 0,
bytes_written INTEGER DEFAULT 0,
cursor_value TEXT,
error_message TEXT
)
"""
)
conn.executemany(
"""INSERT INTO extraction_runs
(extractor, started_at, finished_at, status, files_written, bytes_written, error_message)
VALUES (?, ?, ?, ?, ?, ?, ?)""",
[
("overpass", "2026-02-01T08:00:00Z", "2026-02-01T08:05:00Z", "success", 1, 400000, None),
("playtomic_tenants", "2026-02-24T06:00:00Z", "2026-02-24T06:10:00Z", "success", 1, 7700000, None),
("playtomic_availability", "2026-02-25T06:00:00Z", "2026-02-25T07:30:00Z", "failed", 0, 0, "ReadTimeout: connection timed out"),
# Stale running row (started 1970)
("eurostat", "1970-01-01T00:00:00Z", None, "running", 0, 0, None),
],
)
conn.commit()
conn.close()
yield tmpdir
@pytest.fixture
def serving_meta_dir():
"""Temp directory with a _serving_meta.json file."""
with tempfile.TemporaryDirectory() as tmpdir:
meta = {
"exported_at_utc": "2026-02-25T08:30:00+00:00",
"tables": {
"city_market_profile": {"row_count": 612},
"planner_defaults": {"row_count": 612},
"pseo_city_costs_de": {"row_count": 487},
},
}
(Path(tmpdir) / "_serving_meta.json").write_text(json.dumps(meta))
# Fake duckdb file so the path exists
(Path(tmpdir) / "analytics.duckdb").touch()
yield tmpdir
# ── Schema + query mocks ──────────────────────────────────────────────────────
_MOCK_SCHEMA_ROWS = [
{"table_name": "city_market_profile", "column_name": "city_slug", "data_type": "VARCHAR", "ordinal_position": 1},
{"table_name": "city_market_profile", "column_name": "country_code", "data_type": "VARCHAR", "ordinal_position": 2},
{"table_name": "city_market_profile", "column_name": "marktreife_score", "data_type": "DOUBLE", "ordinal_position": 3},
{"table_name": "planner_defaults", "column_name": "city_slug", "data_type": "VARCHAR", "ordinal_position": 1},
]
_MOCK_TABLE_EXISTS = [{"1": 1}]
_MOCK_SAMPLE_ROWS = [
{"city_slug": "berlin", "country_code": "DE", "marktreife_score": 82.5},
{"city_slug": "munich", "country_code": "DE", "marktreife_score": 77.0},
]
def _make_fetch_analytics_mock():
"""Return an async mock for fetch_analytics that returns schema or table data."""
async def _mock(sql, params=None):
if "information_schema.tables" in sql:
return _MOCK_TABLE_EXISTS
if "information_schema.columns" in sql and params:
return [r for r in _MOCK_SCHEMA_ROWS if r["table_name"] == params[0]]
if "information_schema.columns" in sql:
return _MOCK_SCHEMA_ROWS
if "city_market_profile" in sql:
return _MOCK_SAMPLE_ROWS
return []
return _mock
# ════════════════════════════════════════════════════════════════════════════
# Dashboard
# ════════════════════════════════════════════════════════════════════════════
@pytest.mark.asyncio
async def test_pipeline_dashboard_loads(admin_client, state_db_dir, serving_meta_dir):
"""Dashboard returns 200 with stat cards."""
with (
patch.object(pipeline_mod, "_LANDING_DIR", state_db_dir),
patch.object(pipeline_mod, "_SERVING_DUCKDB_PATH", str(Path(serving_meta_dir) / "analytics.duckdb")),
):
resp = await admin_client.get("/admin/pipeline/")
assert resp.status_code == 200
data = await resp.get_data(as_text=True)
assert "Data Pipeline" in data
assert "Total Runs" in data
assert "Success Rate" in data
assert "Serving Tables" in data
@pytest.mark.asyncio
async def test_pipeline_dashboard_requires_admin(client):
"""Unauthenticated access redirects to login."""
resp = await client.get("/admin/pipeline/")
assert resp.status_code in (302, 401)
@pytest.mark.asyncio
async def test_pipeline_dashboard_stale_warning(admin_client, state_db_dir, serving_meta_dir):
"""Stale run banner appears when a running row is old."""
with (
patch.object(pipeline_mod, "_LANDING_DIR", state_db_dir),
patch.object(pipeline_mod, "_SERVING_DUCKDB_PATH", str(Path(serving_meta_dir) / "analytics.duckdb")),
):
resp = await admin_client.get("/admin/pipeline/")
assert resp.status_code == 200
data = await resp.get_data(as_text=True)
assert "stale run" in data.lower()
# ════════════════════════════════════════════════════════════════════════════
# Overview tab
# ════════════════════════════════════════════════════════════════════════════
@pytest.mark.asyncio
async def test_pipeline_overview(admin_client, state_db_dir, serving_meta_dir):
"""Overview tab returns extraction status grid and serving table counts."""
with (
patch.object(pipeline_mod, "_LANDING_DIR", state_db_dir),
patch.object(pipeline_mod, "_SERVING_DUCKDB_PATH", str(Path(serving_meta_dir) / "analytics.duckdb")),
):
resp = await admin_client.get("/admin/pipeline/overview")
assert resp.status_code == 200
data = await resp.get_data(as_text=True)
assert "city_market_profile" in data
assert "612" in data # row count from serving meta
@pytest.mark.asyncio
async def test_pipeline_overview_no_state_db(admin_client, serving_meta_dir):
"""Overview handles gracefully when .state.sqlite doesn't exist."""
with tempfile.TemporaryDirectory() as empty_dir:
with (
patch.object(pipeline_mod, "_LANDING_DIR", empty_dir),
patch.object(pipeline_mod, "_SERVING_DUCKDB_PATH", str(Path(serving_meta_dir) / "analytics.duckdb")),
):
resp = await admin_client.get("/admin/pipeline/overview")
assert resp.status_code == 200
# ════════════════════════════════════════════════════════════════════════════
# Extractions tab
# ════════════════════════════════════════════════════════════════════════════
@pytest.mark.asyncio
async def test_pipeline_extractions_list(admin_client, state_db_dir):
"""Extractions tab returns run history table."""
with patch.object(pipeline_mod, "_LANDING_DIR", state_db_dir):
resp = await admin_client.get("/admin/pipeline/extractions")
assert resp.status_code == 200
data = await resp.get_data(as_text=True)
assert "overpass" in data
assert "playtomic_tenants" in data
assert "success" in data
@pytest.mark.asyncio
async def test_pipeline_extractions_filter_extractor(admin_client, state_db_dir):
"""Extractor filter returns only 1 matching run (not 4)."""
with patch.object(pipeline_mod, "_LANDING_DIR", state_db_dir):
resp = await admin_client.get("/admin/pipeline/extractions?extractor=overpass")
assert resp.status_code == 200
data = await resp.get_data(as_text=True)
assert "overpass" in data
# Filtered result should show "Showing 1 of 1"
assert "Showing 1 of 1" in data
@pytest.mark.asyncio
async def test_pipeline_extractions_filter_status(admin_client, state_db_dir):
"""Status filter returns only runs with matching status."""
with patch.object(pipeline_mod, "_LANDING_DIR", state_db_dir):
resp = await admin_client.get("/admin/pipeline/extractions?status=failed")
assert resp.status_code == 200
data = await resp.get_data(as_text=True)
assert "failed" in data
assert "ReadTimeout" in data # error message shown
@pytest.mark.asyncio
async def test_pipeline_mark_stale(admin_client, state_db_dir):
"""POST to mark-stale updates a running row to failed."""
# Find the run_id of the stale running row (eurostat, started 1970)
db_path = Path(state_db_dir) / ".state.sqlite"
conn = sqlite3.connect(str(db_path))
row = conn.execute(
"SELECT run_id FROM extraction_runs WHERE status = 'running' ORDER BY run_id LIMIT 1"
).fetchone()
conn.close()
assert row is not None
run_id = row[0]
async with admin_client.session_transaction() as sess:
sess["csrf_token"] = "test"
with patch.object(pipeline_mod, "_LANDING_DIR", state_db_dir):
resp = await admin_client.post(
f"/admin/pipeline/extractions/{run_id}/mark-stale",
form={"csrf_token": "test"},
)
# Should redirect (flash + redirect pattern)
assert resp.status_code in (302, 200)
# Verify DB was updated
conn = sqlite3.connect(str(db_path))
updated = conn.execute(
"SELECT status FROM extraction_runs WHERE run_id = ?", (run_id,)
).fetchone()
conn.close()
assert updated[0] == "failed"
@pytest.mark.asyncio
async def test_pipeline_mark_stale_already_finished(admin_client, state_db_dir):
"""Cannot mark an already-finished (success) row as stale."""
db_path = Path(state_db_dir) / ".state.sqlite"
conn = sqlite3.connect(str(db_path))
row = conn.execute(
"SELECT run_id FROM extraction_runs WHERE status = 'success' ORDER BY run_id LIMIT 1"
).fetchone()
conn.close()
run_id = row[0]
async with admin_client.session_transaction() as sess:
sess["csrf_token"] = "test"
with patch.object(pipeline_mod, "_LANDING_DIR", state_db_dir):
resp = await admin_client.post(
f"/admin/pipeline/extractions/{run_id}/mark-stale",
form={"csrf_token": "test"},
)
assert resp.status_code in (302, 200)
# Verify status unchanged
conn = sqlite3.connect(str(db_path))
status = conn.execute(
"SELECT status FROM extraction_runs WHERE run_id = ?", (run_id,)
).fetchone()[0]
conn.close()
assert status == "success"
@pytest.mark.asyncio
async def test_pipeline_trigger_extract(admin_client, state_db_dir):
"""POST to trigger enqueues a run_extraction task and redirects."""
async with admin_client.session_transaction() as sess:
sess["csrf_token"] = "test"
# enqueue is imported inside the route handler, so patch at the source module
with (
patch.object(pipeline_mod, "_LANDING_DIR", state_db_dir),
patch("padelnomics.worker.enqueue", new_callable=AsyncMock) as mock_enqueue,
):
resp = await admin_client.post(
"/admin/pipeline/extract/trigger",
form={"csrf_token": "test"},
)
assert resp.status_code in (302, 200)
mock_enqueue.assert_called_once_with("run_extraction")
# ════════════════════════════════════════════════════════════════════════════
# Catalog tab
# ════════════════════════════════════════════════════════════════════════════
@pytest.mark.asyncio
async def test_pipeline_catalog(admin_client, serving_meta_dir):
"""Catalog tab lists serving tables with row counts."""
with (
patch.object(pipeline_mod, "_SERVING_DUCKDB_PATH", str(Path(serving_meta_dir) / "analytics.duckdb")),
patch("padelnomics.analytics.fetch_analytics", side_effect=_make_fetch_analytics_mock()),
):
resp = await admin_client.get("/admin/pipeline/catalog")
assert resp.status_code == 200
data = await resp.get_data(as_text=True)
assert "city_market_profile" in data
assert "612" in data # row count from serving meta
@pytest.mark.asyncio
async def test_pipeline_table_detail(admin_client):
"""Table detail returns columns and sample rows."""
with patch("padelnomics.analytics.fetch_analytics", side_effect=_make_fetch_analytics_mock()):
resp = await admin_client.get("/admin/pipeline/catalog/city_market_profile")
assert resp.status_code == 200
data = await resp.get_data(as_text=True)
assert "city_slug" in data
assert "berlin" in data # from sample rows
@pytest.mark.asyncio
async def test_pipeline_table_detail_invalid_name(admin_client):
"""Table name with uppercase characters (invalid) returns 400."""
with patch("padelnomics.analytics.fetch_analytics", side_effect=_make_fetch_analytics_mock()):
resp = await admin_client.get("/admin/pipeline/catalog/InvalidTableName")
assert resp.status_code in (400, 404)
@pytest.mark.asyncio
async def test_pipeline_table_detail_unknown_table(admin_client):
"""Non-existent table returns 404."""
async def _empty_fetch(sql, params=None):
return []
with patch("padelnomics.analytics.fetch_analytics", side_effect=_empty_fetch):
resp = await admin_client.get("/admin/pipeline/catalog/nonexistent_table")
assert resp.status_code == 404
# ════════════════════════════════════════════════════════════════════════════
# Query editor
# ════════════════════════════════════════════════════════════════════════════
@pytest.mark.asyncio
async def test_pipeline_query_editor_loads(admin_client):
"""Query editor tab returns textarea and schema sidebar."""
with patch("padelnomics.analytics.fetch_analytics", side_effect=_make_fetch_analytics_mock()):
resp = await admin_client.get("/admin/pipeline/query")
assert resp.status_code == 200
data = await resp.get_data(as_text=True)
assert "query-editor" in data
assert "schema-panel" in data
assert "city_market_profile" in data
@pytest.mark.asyncio
async def test_pipeline_query_execute_valid(admin_client):
"""Valid SELECT query returns results table."""
async with admin_client.session_transaction() as sess:
sess["csrf_token"] = "test"
mock_result = (
["city_slug", "country_code"],
[("berlin", "DE"), ("munich", "DE")],
None,
12.5,
)
with patch("padelnomics.analytics.execute_user_query", new_callable=AsyncMock, return_value=mock_result):
resp = await admin_client.post(
"/admin/pipeline/query/execute",
form={"csrf_token": "test", "sql": "SELECT city_slug, country_code FROM serving.city_market_profile"},
)
assert resp.status_code == 200
data = await resp.get_data(as_text=True)
assert "berlin" in data
assert "city_slug" in data
@pytest.mark.asyncio
async def test_pipeline_query_execute_blocked_keyword(admin_client):
"""Queries with blocked keywords return an error (no DB call made)."""
async with admin_client.session_transaction() as sess:
sess["csrf_token"] = "test"
with patch("padelnomics.analytics.execute_user_query", new_callable=AsyncMock) as mock_q:
resp = await admin_client.post(
"/admin/pipeline/query/execute",
form={"csrf_token": "test", "sql": "DROP TABLE serving.city_market_profile"},
)
assert resp.status_code == 200
data = await resp.get_data(as_text=True)
assert "blocked" in data.lower() or "error" in data.lower()
mock_q.assert_not_called()
@pytest.mark.asyncio
async def test_pipeline_query_execute_empty(admin_client):
"""Empty SQL returns validation error."""
async with admin_client.session_transaction() as sess:
sess["csrf_token"] = "test"
with patch("padelnomics.analytics.execute_user_query", new_callable=AsyncMock) as mock_q:
resp = await admin_client.post(
"/admin/pipeline/query/execute",
form={"csrf_token": "test", "sql": ""},
)
assert resp.status_code == 200
data = await resp.get_data(as_text=True)
assert "empty" in data.lower() or "error" in data.lower()
mock_q.assert_not_called()
@pytest.mark.asyncio
async def test_pipeline_query_execute_too_long(admin_client):
"""SQL over 10,000 chars returns a length error."""
async with admin_client.session_transaction() as sess:
sess["csrf_token"] = "test"
with patch("padelnomics.analytics.execute_user_query", new_callable=AsyncMock) as mock_q:
resp = await admin_client.post(
"/admin/pipeline/query/execute",
form={"csrf_token": "test", "sql": "SELECT " + "x" * 10_001},
)
assert resp.status_code == 200
data = await resp.get_data(as_text=True)
assert "long" in data.lower() or "error" in data.lower()
mock_q.assert_not_called()
@pytest.mark.asyncio
async def test_pipeline_query_execute_db_error(admin_client):
"""DB error from execute_user_query is displayed as error message."""
async with admin_client.session_transaction() as sess:
sess["csrf_token"] = "test"
mock_result = ([], [], "Table 'foo' not found", 5.0)
with patch("padelnomics.analytics.execute_user_query", new_callable=AsyncMock, return_value=mock_result):
resp = await admin_client.post(
"/admin/pipeline/query/execute",
form={"csrf_token": "test", "sql": "SELECT * FROM serving.foo"},
)
assert resp.status_code == 200
data = await resp.get_data(as_text=True)
assert "not found" in data
# ════════════════════════════════════════════════════════════════════════════
# analytics.execute_user_query()
# ════════════════════════════════════════════════════════════════════════════
@pytest.mark.asyncio
async def test_execute_user_query_no_connection():
"""Returns error tuple when _conn is None."""
import padelnomics.analytics as analytics_mod
with patch.object(analytics_mod, "_conn", None):
cols, rows, error, elapsed = await analytics_mod.execute_user_query("SELECT 1")
assert cols == []
assert rows == []
assert error is not None
assert "not available" in error.lower()
@pytest.mark.asyncio
async def test_execute_user_query_timeout():
"""Returns timeout error when query takes too long."""
import asyncio
import padelnomics.analytics as analytics_mod
mock_conn = MagicMock()
async def _slow_thread(_fn):
await asyncio.sleep(10)
with (
patch.object(analytics_mod, "_conn", mock_conn),
patch("padelnomics.analytics.asyncio.to_thread", side_effect=_slow_thread),
):
cols, rows, error, elapsed = await analytics_mod.execute_user_query(
"SELECT 1", timeout_seconds=1
)
assert error is not None
assert "timed out" in error.lower()
# ════════════════════════════════════════════════════════════════════════════
# Unit tests: data access helpers
# ════════════════════════════════════════════════════════════════════════════
def test_fetch_extraction_summary_missing_db():
"""Returns zero-filled dict when state DB doesn't exist."""
with tempfile.TemporaryDirectory() as empty_dir:
with patch.object(pipeline_mod, "_LANDING_DIR", empty_dir):
result = pipeline_mod._fetch_extraction_summary_sync()
assert result["total"] == 0
assert result["stale"] == 0
def test_fetch_extraction_summary_counts(state_db_dir):
"""Returns correct total/success/failed/running/stale counts."""
with patch.object(pipeline_mod, "_LANDING_DIR", state_db_dir):
result = pipeline_mod._fetch_extraction_summary_sync()
assert result["total"] == 4
assert result["success"] == 2
assert result["failed"] == 1
assert result["running"] == 1
assert result["stale"] == 1 # eurostat started in 1970
def test_load_serving_meta(serving_meta_dir):
"""Parses _serving_meta.json correctly."""
with patch.object(pipeline_mod, "_SERVING_DUCKDB_PATH", str(Path(serving_meta_dir) / "analytics.duckdb")):
meta = pipeline_mod._load_serving_meta()
assert meta is not None
assert "city_market_profile" in meta["tables"]
assert meta["tables"]["city_market_profile"]["row_count"] == 612
def test_load_serving_meta_missing():
"""Returns None when _serving_meta.json doesn't exist."""
with tempfile.TemporaryDirectory() as empty_dir:
with patch.object(pipeline_mod, "_SERVING_DUCKDB_PATH", str(Path(empty_dir) / "analytics.duckdb")):
meta = pipeline_mod._load_serving_meta()
assert meta is None
def test_format_bytes():
assert pipeline_mod._format_bytes(0) == "0 B"
assert pipeline_mod._format_bytes(512) == "512 B"
assert pipeline_mod._format_bytes(1536) == "1.5 KB"
assert pipeline_mod._format_bytes(1_572_864) == "1.5 MB"
def test_duration_str():
assert pipeline_mod._duration_str("2026-02-01T08:00:00Z", "2026-02-01T08:00:45Z") == "45s"
assert pipeline_mod._duration_str("2026-02-01T08:00:00Z", "2026-02-01T08:02:30Z") == "2m 30s"
assert pipeline_mod._duration_str(None, "2026-02-01T08:00:00Z") == ""