- partials/pipeline_catalog.html: accordion list of serving tables with
row count badges, column count, click-to-expand lazy-loaded detail
- partials/pipeline_table_detail.html: column schema grid + sticky-header
sample data table (10 rows, truncated values with title attribute)
- JS: toggleCatalogTable() + htmx.trigger(content, 'revealed') for
lazy-loading detail only on first open
Subtask 4 of 6
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Filterable extraction run history table (extractor + status dropdowns,
HTMX live filter via 'change' trigger)
- Status badges with stale row highlighting (amber background)
- 'Mark Failed' button for stuck 'running' rows (with confirm dialog)
- 'Run All Extractors' trigger button
- Pagination via hx-vals
Subtask 3 of 6
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- pipeline.html: 4 stat cards (total runs, success rate, serving tables,
last export) + stale-run warning banner + tab bar (Overview/Extractions/
Catalog/Query) + tab container (lazy-loaded via HTMX on page load)
- partials/pipeline_overview.html: extraction status grid (one card per
workflow with status dot, schedule, last run timestamp, error preview),
serving freshness table (row counts per table), landing zone file stats
Subtask 2 of 6
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Creates minimal .jsonl.gz and .json.gz seed files so all SQLMesh staging
models can compile and run before real extraction data arrives.
Each seed has a single null record that the staging model's WHERE clause
filters out (tenant_id IS NOT NULL, geoname_id IS NOT NULL, type IS NOT NULL, etc.).
Covers both formats (JSONL + blob) for the UNION ALL transition CTEs:
playtomic/1970/01/: tenants.{jsonl,json}.gz, availability seeds (morning + recheck)
geonames/1970/01/: cities_global.{jsonl,json}.gz
overpass_tennis/1970/01/: courts.{jsonl,json}.gz
overpass/1970/01/: courts.json.gz (padel, unchanged format)
eurostat/1970/01/: urb_cpop1.json.gz, ilc_di03.json.gz
eurostat_city_labels/1970/01/: cities_codelist.json.gz
ons_uk/1970/01/: lad_population.json.gz
census_usa/1970/01/: acs5_places.json.gz
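The null-record trick above can be sketched roughly like this; the helper name and field list are illustrative, not the real seeding script:

```python
import gzip
import json
from pathlib import Path

def write_seed(path: Path, fields: list[str]) -> None:
    """Write one all-null record. Staging WHERE clauses (e.g. tenant_id IS
    NOT NULL) filter it out, so models compile and run but emit zero rows."""
    record = {f: None for f in fields}
    path.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(path, "wt") as fh:
        if path.name.endswith(".jsonl.gz"):
            fh.write(json.dumps(record) + "\n")  # JSONL: one record per line
        else:
            json.dump([record], fh)  # blob format: a single JSON array
```

Both formats are needed so the UNION ALL transition CTEs see at least one readable file on each side.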
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All 10 email handlers now use render_email_template(). The two legacy
inline-HTML helpers are no longer needed and have been removed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the single global Overpass query (150K+ elements, times out) with
10 regional bbox queries (~10-40K elements each; 150s server timeout, 180s client timeout).
- REGIONS: 10 bboxes covering all continents
- Crash recovery: working.jsonl accumulates per-region results;
already_seen_ids deduplication skips already-written elements on restart
- Overlapping bbox elements deduped by OSM id across regions
- Retry per region: up to 2 retries with 30s cooldown
- Polite 5s inter-region delay
- Skip if courts.jsonl.gz or courts.json.gz already exists for the month
stg_tennis_courts: UNION ALL transition (jsonl_elements + blob_elements)
- jsonl_elements: JSONL, explicit columns, COALESCE lat/lon with center coords
(supports both node direct lat/lon and way/relation Overpass out center)
- blob_elements: existing UNNEST(elements) pattern, unchanged
- Removed osm_type='node' filter — ways/relations now usable via center coords
- Dedup on (osm_id, extracted_date DESC) unchanged
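A minimal sketch of the crash-recovery bookkeeping described above, assuming working.jsonl holds one JSON element per line with an OSM `id` field (function names are illustrative, not the real module):

```python
import json
from pathlib import Path

def resume_seen_ids(working: Path) -> set[int]:
    """Collect OSM ids already written to working.jsonl by a prior run."""
    seen: set[int] = set()
    if working.exists():
        with working.open() as fh:
            for line in fh:
                seen.add(json.loads(line)["id"])
    return seen

def append_region(working: Path, elements: list[dict], seen: set[int]) -> int:
    """Append only unseen elements; returns how many were written."""
    written = 0
    with working.open("a") as fh:
        for el in elements:
            if el["id"] in seen:
                continue  # dedup across overlapping bboxes and restarts
            fh.write(json.dumps(el) + "\n")
            seen.add(el["id"])
            written += 1
    return written
```

On restart the extractor rebuilds `seen` from the working file, so finished regions and overlapping-bbox duplicates are skipped without re-fetching.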
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- availability_{date}.jsonl.gz replaces .json.gz for morning snapshots
- Each JSONL line = one venue object with date + captured_at_utc injected
- Eliminates in-memory consolidation: working.jsonl IS the final file
(compress_jsonl_atomic at end instead of write_gzip_atomic blob)
- Crash recovery unchanged: working.jsonl accumulates via flush_partial_batch
- _load_morning_availability tries .jsonl.gz first, falls back to .json.gz
- Skip check covers both formats during transition
- Recheck files stay blob format (small, infrequent)
stg_playtomic_availability: UNION ALL transition (morning_jsonl + morning_blob + recheck_blob)
- morning_jsonl: read_json JSONL, tenant_id direct column, no outer UNNEST
- morning_blob / recheck_blob: subquery + LATERAL UNNEST (unchanged semantics)
- All three produce (snapshot_date, captured_at_utc, snapshot_type, recheck_hour, tenant_id, slots_json)
- Downstream raw_resources / raw_slots CTEs unchanged
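The transition loader can be sketched as follows: prefer the new JSONL file, fall back to the legacy blob. Paths and the function name here are illustrative stand-ins for `_load_morning_availability`:

```python
import gzip
import json
from pathlib import Path

def load_morning_availability(day_dir: Path, date: str) -> list[dict]:
    jsonl = day_dir / f"availability_{date}.jsonl.gz"
    blob = day_dir / f"availability_{date}.json.gz"
    if jsonl.exists():
        # New format: one venue object per line.
        with gzip.open(jsonl, "rt") as fh:
            return [json.loads(line) for line in fh if line.strip()]
    if blob.exists():
        # Legacy format: one consolidated JSON blob.
        with gzip.open(blob, "rt") as fh:
            return json.load(fh)
    return []
```

Once all months are on JSONL, the blob branch (and the matching blob CTE in staging) can be dropped.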
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add emails/admin_compose.html: branded wrapper for ad-hoc compose body
- Update email_compose.html: two-column layout with HTMX live preview pane
(hx-post, hx-trigger=input delay:500ms, hx-target=#preview-pane)
- Add partials/email_preview_frame.html: sandboxed iframe partial
- Add POST /admin/emails/compose/preview route (no CSRF — read-only render)
- Update email_compose POST handler to use render_email_template() instead
of importing _email_wrap from worker
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- playtomic_tenants.py: write each tenant as a JSONL line after dedup,
compress via compress_jsonl_atomic → tenants.jsonl.gz
- playtomic_availability.py: update _load_tenant_ids() to prefer
tenants.jsonl.gz, fall back to tenants.json.gz (transition)
- stg_playtomic_venues.sql: UNION ALL jsonl+blob CTEs for transition;
JSONL reads top-level columns directly, no UNNEST(tenants) needed
- stg_playtomic_resources.sql: same UNION ALL pattern, single UNNEST
for resources in JSONL path vs double UNNEST in blob path
- stg_playtomic_opening_hours.sql: same UNION ALL pattern, opening_hours
as top-level JSON column in JSONL path
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Streams a JSONL working file to .jsonl.gz in 1MB chunks (constant memory),
atomic rename via .tmp sibling, deletes source on success. Companion to
write_gzip_atomic() for extractors that stream records incrementally.
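The description maps onto roughly this shape (a sketch, not the real module; only the function name is taken from the commit):

```python
import gzip
import shutil
from pathlib import Path

CHUNK = 1024 * 1024  # 1 MB chunks: constant memory regardless of file size

def compress_jsonl_atomic(src: Path, dest: Path) -> None:
    """Stream src into a gzip file via a .tmp sibling, then atomically
    rename and delete the source."""
    tmp = dest.with_name(dest.name + ".tmp")
    with src.open("rb") as fin, gzip.open(tmp, "wb") as fout:
        shutil.copyfileobj(fin, fout, CHUNK)
    tmp.replace(dest)  # atomic on POSIX: readers never see a partial file
    src.unlink()       # remove the working file only after success
```

A crash mid-compress leaves only the .tmp sibling and the intact working file, so the next run can simply retry.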
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Lead detail:
- contact_email → 📧 email log (pre-filtered), mailto, Send Email compose
- country → leads list filtered by that country
Supplier detail:
- contact_email → 📧 email log (pre-filtered), mailto, Send Email compose
- claimed_by → user detail page (was plain "User #N")
Marketplace dashboard:
- Funnel card numbers are now links: Total → /leads, Verified New →
/leads?status=new, Unlocked → /leads?status=forwarded, Won → /leads?status=closed_won
- Active suppliers number links to /suppliers
Marketplace activity stream:
- lead events → link to lead_detail
- unlock events → supplier name links to supplier_detail, "lead #N" links to lead_detail
- credit events → supplier name links to supplier_detail (query now joins
suppliers table for name; ref2_id exposes supplier_id and lead_id per event)
Email detail:
- Reverse-lookup to_addr against lead_requests + suppliers; renders
linked "Lead #N" / "Supplier Name" chips next to the To field
Email compose:
- Accepts ?to= query param to pre-fill recipient (enables Send Email links)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New visible_from column on lead_requests set to NOW + 2h on both the
direct insert (logged-in user) and the email verification update.
Supplier feed, notify_matching_suppliers, and send_weekly_lead_digest
all filter on visible_from <= datetime('now'), so no lead surfaces to
suppliers before the window expires.
Migration 0023 adds the column and backfills existing verified leads
with created_at so they remain immediately visible.
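A hedged sqlite sketch of the scheme; column and table names follow the commit, but the schema and values here are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE lead_requests (id INTEGER PRIMARY KEY, created_at TEXT, verified INTEGER)"
)
con.executemany(
    "INSERT INTO lead_requests VALUES (?, ?, ?)",
    [(1, "2025-01-01 08:00:00", 1), (2, "2025-01-02 09:00:00", 1)],
)

# Migration 0023: add the column, backfill already-verified leads with
# created_at so they remain immediately visible.
con.execute("ALTER TABLE lead_requests ADD COLUMN visible_from TEXT")
con.execute("UPDATE lead_requests SET visible_from = created_at WHERE verified = 1")

# Newly verified leads get the 2-hour delay instead.
con.execute(
    "UPDATE lead_requests SET visible_from = datetime('now', '+2 hours') WHERE id = 2"
)

# Every supplier-facing read filters on the window having expired.
visible = con.execute(
    "SELECT id FROM lead_requests WHERE visible_from <= datetime('now')"
).fetchall()
```

Lead 1 (backfilled) is visible immediately; lead 2 surfaces only after its 2-hour window passes.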
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each slot is now rechecked once, at most 30 min before it starts.
Worst-case miss: a booking made 29 min before start.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
60-min window + hourly rechecks = each slot caught exactly once, 0-60 min
before it starts. 90-min window causes double-querying (T-90 and T-30).
Slot duration is irrelevant — it doesn't affect when the slot appears in
the window.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Data analysis of 5,115 venues with slots shows 24.8% have a 90-min minimum
slot duration. A 60-min window would miss those venues entirely with hourly
rechecks. 90 min is correct — covers 30/60/90-min minimum venues.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
With hourly rechecks and 60-min minimum slots, a 90-min window causes each
slot to be queried twice. 60-min window = each slot caught exactly once in
the recheck immediately before it starts.
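The window arithmetic in the last few commits is easy to verify directly. This is pure arithmetic, not project code: rechecks fire every 60 minutes, and a recheck at time t queries slots starting in [t, t + window):

```python
def rechecks_seeing_slot(slot_start: int, window: int, period: int = 60) -> int:
    """Count how many rechecks (at t = 0, period, 2*period, ...) see a slot,
    i.e. satisfy t <= slot_start < t + window."""
    return sum(
        1 for t in range(0, slot_start + 1, period) if t <= slot_start < t + window
    )

# Check every possible slot start time in a day, in minutes.
hits_60 = {s: rechecks_seeing_slot(s, 60) for s in range(24 * 60)}
hits_90 = {s: rechecks_seeing_slot(s, 90) for s in range(24 * 60)}
```

A 60-minute window catches every slot exactly once; a 90-minute window catches some slots twice (e.g. a slot at minute 60 is seen at T-60 and again at T-0), which is the double-querying the commit describes.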
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- notify_matching_suppliers task: on lead verification, finds growth/pro
suppliers whose service_area matches the lead country and sends an
instant alert email (LIMIT 20 suppliers per lead)
- send_weekly_lead_digest task: every Monday 08:00 UTC, sends paid
suppliers a table of new matching leads from the past 7 days they
haven't seen yet (LIMIT 5 per supplier)
- One-click CTA token: forward emails now include a "Mark as contacted"
footer link; clicking sets forward status to 'contacted' immediately
- cta_token stored on lead_forwards after email send
- Supplier lead_respond endpoint: HTMX status update for forwarded leads
(sent / viewed / contacted / quoted / won / lost / no_response)
- Supplier lead_cta_contacted endpoint: handles one-click email CTA,
redirects to dashboard leads tab
- leads/routes.py: enqueue notify_matching_suppliers on quote verification
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds status_updated_at, supplier_note, and cta_token columns to the
lead_forwards table. cta_token gets a unique partial index for fast
one-click email CTA lookups.
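A sqlite sketch of the migration; the base table shape is illustrative, the added columns and the partial index follow the commit:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE lead_forwards (id INTEGER PRIMARY KEY, status TEXT)")

con.execute("ALTER TABLE lead_forwards ADD COLUMN status_updated_at TEXT")
con.execute("ALTER TABLE lead_forwards ADD COLUMN supplier_note TEXT")
con.execute("ALTER TABLE lead_forwards ADD COLUMN cta_token TEXT")

# Partial index: only rows that actually carry a token are indexed, so the
# many NULL rows neither bloat the index nor trip the uniqueness constraint.
con.execute(
    """CREATE UNIQUE INDEX idx_lead_forwards_cta_token
       ON lead_forwards (cta_token) WHERE cta_token IS NOT NULL"""
)
```

Multiple NULL cta_token rows coexist fine, while a duplicated token is rejected and lookups by token hit the index.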
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- playtomic_tenants.py: batch_size = len(proxy_urls); that many pages are fired
in parallel per batch, each with its own session + proxy; sorted(results) ensures
deterministic done-detection; falls back to serial + THROTTLE_SECONDS when no
proxies are configured. Expected speedup: ~2.5 min → ~15 s with 10 proxies.
- .env.dev.sops, .env.prod.sops: remove EXTRACT_WORKERS (now derived from
PROXY_URLS length)
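The batching scheme can be sketched as below; `fetch_page` and the proxy list are stand-ins for the real extractor, and the empty-page done-detection is a simplification:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_batch(fetch_page, proxy_urls: list[str], start_page: int) -> list[tuple[int, list]]:
    """Fire len(proxy_urls) pages in parallel, one proxy per page."""
    pages = range(start_page, start_page + len(proxy_urls))
    with ThreadPoolExecutor(max_workers=len(proxy_urls)) as pool:
        futures = [
            pool.submit(fetch_page, page, proxy)
            for page, proxy in zip(pages, proxy_urls)
        ]
        results = [(page, fut.result()) for page, fut in zip(pages, futures)]
    # Sorted by page number so done-detection doesn't depend on which
    # proxy happened to respond first.
    return sorted(results)

def fetch_all(fetch_page, proxy_urls: list[str]) -> list:
    out, page = [], 0
    while True:
        batch = fetch_batch(fetch_page, proxy_urls, page)
        for _, rows in batch:
            out.extend(rows)
        if any(not rows for _, rows in batch):
            return out  # an empty page means we ran past the end
        page += len(proxy_urls)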
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces `python -m padelnomics.app` (Quart's built-in Hypercorn-based
dev runner) with granian directly. Adds granian[reload] extra which
pulls in watchfiles for file-change detection.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Quart depends on Hypercorn and uses it in app.run() → run_task().
Removing the silencing caused hypercorn.error noise in dev logs.
Keep both granian and hypercorn logger config.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Granian is ~3-5x faster than Hypercorn in benchmarks. No code changes
needed — Quart is standard ASGI so any ASGI server works.
- web/pyproject.toml: hypercorn → granian>=1.6.0 (installed: 2.7.1)
- Dockerfile CMD: hypercorn → granian --interface asgi
- core.py setup_logging(): silence granian loggers instead of hypercorn's
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- all.py: replace sequential loop with graphlib.TopologicalSorter + ThreadPoolExecutor
- EXTRACTORS dict declares (func, [deps]) — self-documenting dependency graph
- 8 extractors run in parallel immediately; availability starts as soon as
tenants finishes (not after all others complete)
- max_workers=len(EXTRACTORS) — all I/O-bound, no CPU contention
- playtomic_tenants.py: add proxy rotation via make_round_robin_cycler
- no throttle when PROXY_URLS is set (IP rotation removes the per-IP rate-limit concern)
- keeps 2s throttle for direct runs
- _shared.py: add optional proxy_url param to run_extractor()
- any extractor can opt in to proxy support via the shared session
- overpass_tennis.py: fix query timeout (out body → out center, timeout 180 → 300)
- out center returns centroids only, not full geometry — fits within server limits
- playtomic_availability.py: fix CIRCUIT_BREAKER_THRESHOLD empty string crash
- int(os.environ.get(..., "10")) → int(os.environ.get(...) or "10")
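The scheduler described above can be sketched with the two stdlib pieces named in the commit; the EXTRACTORS entries here are stand-ins:

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

def run_all(extractors: dict) -> list[str]:
    """extractors maps name -> (func, [dependency names]).
    Runs each extractor as soon as its dependencies finish.
    Returns names in completion order."""
    ts = TopologicalSorter({name: deps for name, (_, deps) in extractors.items()})
    ts.prepare()
    completed: list[str] = []
    running: dict = {}  # future -> extractor name
    with ThreadPoolExecutor(max_workers=len(extractors)) as pool:
        while ts.is_active():
            # Submit everything whose dependencies are satisfied.
            for name in ts.get_ready():
                running[pool.submit(extractors[name][0])] = name
            # Block until at least one extractor finishes, then release
            # its dependents.
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in finished:
                name = running.pop(fut)
                fut.result()  # propagate extractor exceptions
                ts.done(name)
                completed.append(name)
    return completed
```

Independent extractors start immediately; a dependent one (e.g. availability after tenants) is submitted the moment its predecessor is marked done, not after the whole batch.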
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Operational dashboard at /admin/pseo for the programmatic SEO system:
content gap detection, data freshness signals, article health checks
(hreflang orphans, missing build files, broken scenario refs), and
live generation job monitoring with HTMX progress bars.
- _serving_meta.json written by export_serving.py after atomic DB swap
- content/health.py: pure async query functions for all health checks
- Migration 0021: progress_current/total/error_log on tasks table
- generate_articles() writes progress every 50 articles + on completion
- admin/pseo_routes.py: 6 routes, standalone blueprint
- 5 HTML templates + sidebar nav + fromjson Jinja filter
- 45 tests (all passing); 2 bugs caught and fixed during testing
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# Conflicts:
# src/padelnomics/export_serving.py
count_template_data() uses fetch_analytics with a COUNT(*) query.
The pseo_env test fixture's mock returned TEST_ROWS for any unrecognized
query, causing a KeyError on rows[0]["cnt"]. Add a COUNT(*) branch that
returns [{"cnt": len(TEST_ROWS)}].
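A sketch of the fixture fix; TEST_ROWS and the mock's shape are illustrative, not the real test code:

```python
TEST_ROWS = [{"slug": "a"}, {"slug": "b"}, {"slug": "c"}]

def mock_fetch_analytics(query: str, *params):
    """Stand-in for the pseo_env fixture's fetch_analytics mock."""
    if "COUNT(*)" in query.upper():
        # Synthetic count row: what count_template_data()'s rows[0]["cnt"] expects.
        return [{"cnt": len(TEST_ROWS)}]
    return TEST_ROWS  # default for any unrecognized query
```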
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Records all Phase 1 deliverables: content gaps, data freshness,
health checks, generation job monitoring, 45 tests, bug fixes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- export_serving.py: move `import re` to module level — was imported
inside a loop body on every iteration
- sitemap.py: add comment documenting that the in-memory TTL cache is
process-local (valid for single-worker deployment, Dockerfile --workers 1)
- playtomic_availability.py: use `or "10"` fallback for
CIRCUIT_BREAKER_THRESHOLD env var to handle empty-string case
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>