Files

Deeman a1faddbed6 feat: Python supervisor + feature flags

Supervisor (replaces supervisor.sh):
- supervisor.py — cron-based pipeline orchestration, reads workflows.toml
  on every tick, runs due extractors in topological waves with parallel
  execution, then SQLMesh transform + serving export
- workflows.toml — workflow registry: overpass (monthly), eurostat (monthly),
  playtomic_tenants (weekly), playtomic_availability (daily),
  playtomic_recheck (hourly 6–23)
- padelnomics-supervisor.service — updated ExecStart to Python supervisor

Extraction enhancements:
- proxy.py — optional round-robin/sticky proxy rotation via PROXY_URLS env
- playtomic_availability.py — parallel fetch (EXTRACT_WORKERS), recheck mode
  (main_recheck) re-queries imminent slots for accurate occupancy measurement
- _shared.py — realistic browser User-Agent on all extractor sessions
- stg_playtomic_availability.sql — reads morning + recheck snapshots, tags each
- fct_daily_availability.sql — prefers recheck over morning for same slot

Feature flags (replaces WAITLIST_MODE env var):
- migration 0019 — feature_flags table, 5 initial flags:
  markets (on), payments/planner_export/supplier_signup/lead_unlock (off)
- core.py — is_flag_enabled() + feature_gate() decorator
- routes — payments, markets, planner_export, supplier_signup, lead_unlock gated
- admin flags UI — /admin/flags toggle page + nav link
- app.py — flag() injected as Jinja2 global

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-23 13:53:45 +01:00

Name	Last commit	Date
macros	feat: copier update v0.9.0 → v0.10.0	2026-02-22 17:50:36 +01:00
models	feat: Python supervisor + feature flags	2026-02-23 13:53:45 +01:00
config.yaml	refactor: flatten padelnomics/padelnomics/ → repo root	2026-02-22 00:44:40 +01:00
pyproject.toml	refactor: flatten padelnomics/padelnomics/ → repo root	2026-02-22 00:44:40 +01:00
README.md	feat: migrate transform to 3-layer architecture with per-layer schemas	2026-02-22 19:04:40 +01:00

README.md

Padelnomics Transform (SQLMesh)

3-layer SQL transformation pipeline using SQLMesh + DuckDB. It reads from the landing zone and produces analytics-ready tables, which the web app consumes via an atomically swapped serving DB.

Running

# From repo root — plan all changes (shows what will run)
uv run sqlmesh -p transform/sqlmesh_padelnomics plan

# Apply to production
uv run sqlmesh -p transform/sqlmesh_padelnomics plan prod

# Run model tests
uv run sqlmesh -p transform/sqlmesh_padelnomics test

# Format SQL
uv run sqlmesh -p transform/sqlmesh_padelnomics format

# Export serving tables to analytics.duckdb (run after SQLMesh)
DUCKDB_PATH=data/lakehouse.duckdb SERVING_DUCKDB_PATH=data/analytics.duckdb \
    uv run python -m padelnomics.export_serving

3-layer architecture

landing/                    ← raw files (extraction output)
  ├── overpass/*/*/courts.json.gz
  ├── eurostat/*/*/urb_cpop1.json.gz
  └── playtomic/*/*/tenants.json.gz

staging/                    ← reads landing files directly, type casting, dedup
  ├── staging.stg_padel_courts
  ├── staging.stg_playtomic_venues
  └── staging.stg_population

foundation/                 ← business logic, dimensions, facts
  ├── foundation.dim_venues
  └── foundation.dim_cities

serving/                    ← pre-aggregated for web app
  ├── serving.city_market_profile
  └── serving.planner_defaults
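
The landing paths in the diagram are glob patterns with two wildcard directory levels. As a rough illustration of what "file path discovery" over such a pattern means, the sketch below walks a landing directory with `pathlib` and parses each gzipped JSON file. The helper name `iter_landing_records` and the partition names in the demo are hypothetical; the real staging models do this inside DuckDB, not in Python.

```python
import gzip
import json
import tempfile
from pathlib import Path


def iter_landing_records(landing_dir: Path, pattern: str):
    """Yield (relative_path, parsed_json) for each landing file matching pattern."""
    for path in sorted(landing_dir.glob(pattern)):
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            yield path.relative_to(landing_dir).as_posix(), json.load(fh)


# Demo against a throwaway landing zone; "a/b" and "a/c" are made-up partitions.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for part in ("a/b", "a/c"):
        target = root / "overpass" / part / "courts.json.gz"
        target.parent.mkdir(parents=True)
        with gzip.open(target, "wt", encoding="utf-8") as fh:
            json.dump({"partition": part}, fh)

    found = list(iter_landing_records(root, "overpass/*/*/courts.json.gz"))
```

Yielding the relative file name next to each record mirrors the role of `filename=true` in the staging models: every row can be traced back to the landing file it came from.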

staging/ — read landing files + type casting

- Reads landing zone JSON files directly with read_json(..., format='auto', filename=true)
- Uses the @LANDING_DIR variable for file path discovery
- Casts every column to its target type, e.g. TRY_CAST(... AS DOUBLE)
- Deduplicates where the source produces duplicates (ROW_NUMBER partitioned on ID)
- Validates coordinates, nulls, and data quality inline
- Naming: staging.stg_<source>
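
The staging rules above (tolerant casting, inline validation, one row per ID) can be illustrated outside SQL. The following is a minimal pure-Python sketch of those semantics, not the actual model code. The column names `id`, `lat`, and `lon` are assumptions, and so is the tie-break: here the last row seen per ID wins, whereas the real models choose the ordering inside ROW_NUMBER.

```python
from typing import Any, Optional


def try_cast_double(value: Any) -> Optional[float]:
    """Mimic TRY_CAST(... AS DOUBLE): return None instead of raising."""
    if value is None:
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def stage_rows(raw_rows):
    """Cast, validate, and deduplicate raw rows; the last row seen per id wins."""
    latest = {}
    for row in raw_rows:
        lat = try_cast_double(row.get("lat"))
        lon = try_cast_double(row.get("lon"))
        if row.get("id") is None or lat is None or lon is None:
            continue  # drop rows with a missing id or uncastable coordinates
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue  # drop coordinates outside the valid range
        latest[row["id"]] = {"id": row["id"], "lat": lat, "lon": lon}
    return list(latest.values())


raw = [
    {"id": 1, "lat": "40.4", "lon": "-3.7"},
    {"id": 1, "lat": "40.5", "lon": "-3.7"},  # duplicate id, later row wins
    {"id": 2, "lat": "n/a", "lon": "2.1"},    # uncastable latitude
    {"id": 3, "lat": "95", "lon": "0"},       # latitude out of range
]
staged = stage_rows(raw)
```

Only the second row for ID 1 survives: the first is superseded, and the other two fail casting or validation.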

foundation/ — business logic

- Dimensions (dim_*): slowly changing attributes, one row per entity
- Facts (fact_*): events and measurements, one row per event
- May join across multiple staging models from different sources
- Naming: foundation.dim_<entity>, foundation.fact_<event>

serving/ — analytics-ready aggregates

- Pre-aggregated for specific web app query patterns
- These are the only tables the web app reads (via analytics.duckdb)
- Queried from analytics.py via fetch_analytics()
- Naming: serving.<purpose>

Two-DuckDB architecture

data/lakehouse.duckdb       ← SQLMesh exclusive write (DUCKDB_PATH)
  ├── staging.*
  ├── foundation.*
  └── serving.*

data/analytics.duckdb       ← web app read-only (SERVING_DUCKDB_PATH)
  └── serving.*             ← atomically replaced by export_serving.py

SQLMesh holds an exclusive write lock on lakehouse.duckdb during plan/run. The web app needs read-only access at all times. export_serving.py copies serving.* tables to a temp file, then atomically renames it to analytics.duckdb. The web app detects the inode change on next query — no restart needed.
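
The write-to-temp-then-rename pattern and the inode check can be sketched with the standard library alone. This is an illustration under stated assumptions, not the contents of export_serving.py or the web app: `atomic_publish` and `ReloadingReader` are hypothetical names, and plain bytes stand in for the DuckDB file that the real export builds.

```python
import os
import tempfile
from pathlib import Path


def atomic_publish(payload: bytes, target: Path) -> None:
    """Write payload to a temp file beside target, then rename it over target."""
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, target)  # atomic on POSIX within one filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


class ReloadingReader:
    """Re-reads the file only when its inode differs from the last one seen."""

    def __init__(self, path: Path):
        self.path = path
        self._inode = None
        self._data = b""
        self.reloads = 0

    def read(self) -> bytes:
        inode = os.stat(self.path).st_ino
        if inode != self._inode:
            self._data = self.path.read_bytes()
            self._inode = inode
            self.reloads += 1
        return self._data


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "analytics.duckdb"
    atomic_publish(b"v1", target)
    reader = ReloadingReader(target)
    first = reader.read()
    again = reader.read()          # same inode, served from cache
    atomic_publish(b"v2", target)
    second = reader.read()         # new inode, triggers a reload
    reload_count = reader.reloads
```

Creating the temp file in the target's own directory keeps the rename on one filesystem, which is what makes the swap atomic. Readers see either the old file or the new one, never a partially written file.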

Never point DUCKDB_PATH and SERVING_DUCKDB_PATH to the same file.
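
One way to enforce this rule is a startup check that compares the resolved paths, so that symlinks or `..` segments cannot hide a collision. The sketch below is a hypothetical guard, not something the repository is known to contain; both function names are made up for illustration.

```python
from pathlib import Path


def points_to_same_file(path_a: str, path_b: str) -> bool:
    """Return True if both paths resolve to the same absolute location."""
    return Path(path_a).resolve() == Path(path_b).resolve()


def assert_distinct_db_paths(duckdb_path: str, serving_path: str) -> None:
    """Raise ValueError when the lakehouse and serving DB paths collide."""
    if points_to_same_file(duckdb_path, serving_path):
        raise ValueError(
            "DUCKDB_PATH and SERVING_DUCKDB_PATH must point to different files"
        )


# The documented defaults are distinct, so this passes silently.
assert_distinct_db_paths("data/lakehouse.duckdb", "data/analytics.duckdb")
```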

Adding a new data source

1. Add an extractor in extract/padelnomics_extract/ (see extraction README)
2. Add a staging model: models/staging/stg_<source>.sql that reads landing files directly
3. Join into foundation or serving models as needed

Model materialization

Layer	Default kind	Rationale
staging	FULL	Re-reads all landing files; cheap with DuckDB parallel scan
foundation	FULL	Business logic rarely changes; recompute is fast
serving	FULL	Small aggregates; web app needs latest at all times

For large historical tables, switch to kind INCREMENTAL_BY_TIME_RANGE, which requires a time_column so that SQLMesh can process the table one time interval at a time.

Environment variables

Variable	Default	Description
`LANDING_DIR`	`data/landing`	Root of the landing zone
`DUCKDB_PATH`	`data/lakehouse.duckdb`	DuckDB file (SQLMesh exclusive write access)
`SERVING_DUCKDB_PATH`	`data/analytics.duckdb`	Serving DB (web app reads from here)
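
For a script that needs all three settings, the table translates into a small loader that falls back to the documented defaults. This is a sketch only: `TransformSettings` and `load_settings` are hypothetical names, and the project's own code may read these variables differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TransformSettings:
    landing_dir: str
    duckdb_path: str
    serving_duckdb_path: str


def load_settings(env: Mapping[str, str] = os.environ) -> TransformSettings:
    """Read the three variables, falling back to the defaults in the table."""
    return TransformSettings(
        landing_dir=env.get("LANDING_DIR", "data/landing"),
        duckdb_path=env.get("DUCKDB_PATH", "data/lakehouse.duckdb"),
        serving_duckdb_path=env.get("SERVING_DUCKDB_PATH", "data/analytics.duckdb"),
    )


defaults = load_settings({})
overridden = load_settings({"LANDING_DIR": "/srv/landing"})
```

Passing the environment in as a mapping keeps the loader testable without mutating `os.environ`.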