Changelog: market score recalibration

Fixes a ranking inversion where Germany (1 court/100k) outscored Spain (36/100k). Root causes: population/income made up 55% of the maximum score before any padel signal, the density ceiling saturated 73% of cities, small towns were inflated (1 venue / 5k population = 20/100k = full marks), and the saturation discount actively penalised mature markets.

SQL changes (`city_market_profile.sql`):
- Supply development (40 pts): log-scaled density `LN(d+1)/LN(21)` with a count gate `min(1, count/5)`. Density ceiling 20/100k. The count gate removes small-town inflation without hard cutoffs (1 venue = 20%, 5+ venues = 100%).
- Demand evidence (25 pts): occupancy where available; otherwise a 40% density proxy. Separated from supply to avoid double-counting.
- Addressable market (15 pts): population as context, not maturity.
- Economic context (10 pts): income PPS (flat per country, low signal).
- Data quality (10 pts).
- Removed the saturation discount: high density now reads as maturity.

Verified spot-check scores:

| City | Venues | Courts/100k | New score | Old score |
|---|---|---|---|---|
| Málaga | 46 | 7.77 | 70.1 | 98.9 |
| Barcelona | 104 | 6.17 | 67.4 | 100.0 |
| Amsterdam | 24 | 3.24 | 58.4 | 93.7 |
| Bernau bei Berlin | 2 | 5.74 | 43.9 | 92.7 |
| Berlin | 20 | 0.55 | 42.2 | 74.1 |
| London | 66 | 0.74 | 44.1 | 75.5 |

Template changes (`city-cost-de`, `country-overview`, `city-pricing`):
- Color coding: green >= 55 (was 65), amber >= 35 (was 40)
- Intro/FAQ tiers: strong >= 55 (was 70), mid >= 35 (was 45)
- Opportunity interplay: `market_score < 40` (was < 50) for white-space

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
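The recalibrated supply-development component can be sanity-checked numerically. A minimal sketch, assuming the formula exactly as stated above (40-point weight, 20/100k density ceiling, `min(1, count/5)` count gate); `supply_development` is a hypothetical helper name, and the remaining 60 points of the score are not modelled here:

```python
import math

def supply_development(density_per_100k: float, venue_count: int) -> float:
    """Supply-development points (0-40): log-scaled density times a count gate."""
    d = min(density_per_100k, 20.0)              # density ceiling: 20 courts/100k
    log_scaled = math.log(d + 1) / math.log(21)  # maps 0..20 onto 0..1
    count_gate = min(1.0, venue_count / 5)       # 1 venue = 20%, 5+ venues = 100%
    return 40.0 * log_scaled * count_gate

# Bernau (2 venues, 5.74/100k) is gated down despite decent density,
# while Berlin (20 venues, 0.55/100k) scores low on density alone.
print(supply_development(7.77, 46))   # Málaga
print(supply_development(5.74, 2))    # Bernau bei Berlin
print(supply_development(0.55, 20))   # Berlin
```

The count gate is why Bernau's two venues no longer earn full marks: its log-scaled density is multiplied by 2/5 before the 40-point weight applies.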
Padelnomics Transform (SQLMesh)
3-layer SQL transformation pipeline using SQLMesh + DuckDB. Reads from the landing zone, produces analytics-ready tables consumed by the web app via an atomically-swapped serving DB.
Running
```bash
# From repo root — plan all changes (shows what will run)
uv run sqlmesh -p transform/sqlmesh_padelnomics plan

# Apply to production
uv run sqlmesh -p transform/sqlmesh_padelnomics plan prod

# Run model tests
uv run sqlmesh -p transform/sqlmesh_padelnomics test

# Format SQL
uv run sqlmesh -p transform/sqlmesh_padelnomics format

# Export serving tables to analytics.duckdb (run after SQLMesh)
DUCKDB_PATH=data/lakehouse.duckdb SERVING_DUCKDB_PATH=data/analytics.duckdb \
  uv run python src/padelnomics/export_serving.py
```
3-layer architecture
```
landing/     ← raw files (extraction output)
├── overpass/*/*/courts.json.gz
├── eurostat/*/*/urb_cpop1.json.gz
└── playtomic/*/*/tenants.json.gz

staging/     ← reads landing files directly, type casting, dedup
├── staging.stg_padel_courts
├── staging.stg_playtomic_venues
└── staging.stg_population

foundation/  ← business logic, dimensions, facts
├── foundation.dim_venues             ← conformed venue dimension (Playtomic + OSM)
├── foundation.dim_cities             ← conformed city dimension (venue-derived + Eurostat)
├── foundation.dim_venue_capacity     ← static capacity attributes per venue
├── foundation.fct_availability_slot  ← event grain: one row per deduplicated slot
└── foundation.fct_daily_availability ← venue-day aggregate: occupancy + revenue estimates

serving/     ← pre-aggregated for web app
├── serving.city_market_profile
└── serving.planner_defaults
```
staging/ — read landing files + type casting
- Reads landing zone JSON files directly with `read_json(..., format='auto', filename=true)`
- Uses the `@LANDING_DIR` variable for file path discovery
- Casts all columns to the correct types: `TRY_CAST(... AS DOUBLE)`
- Deduplicates where the source produces duplicates (`ROW_NUMBER` partitioned on ID)
- Validates coordinates, nulls, and data quality inline
- Naming: `staging.stg_<source>`
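Put together, a staging model following these conventions looks roughly like the sketch below. The model name, source path, and column names are illustrative rather than an actual model from this repo; `@LANDING_DIR` is the variable mentioned above, and `QUALIFY` is DuckDB's post-window filter used here for the `ROW_NUMBER` dedup:

```sql
MODEL (
  name staging.stg_example_source,
  kind FULL
);

SELECT
  TRY_CAST(id AS BIGINT) AS record_id,
  TRY_CAST(lat AS DOUBLE) AS latitude,
  TRY_CAST(lon AS DOUBLE) AS longitude,
  filename AS landing_file
FROM read_json(
  @LANDING_DIR || '/example/*/*/records.json.gz',
  format = 'auto',
  filename = true
)
WHERE TRY_CAST(lat AS DOUBLE) BETWEEN -90 AND 90  -- inline coordinate validation
QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY filename DESC) = 1  -- dedup: keep newest file per ID
```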
foundation/ — business logic
- Dimensions (`dim_*`): slowly changing attributes, one row per entity
- Facts (`fct_*`): events and measurements, one row per event
- May join across multiple staging models from different sources
- Naming: `foundation.dim_<entity>`, `foundation.fct_<event>`
serving/ — analytics-ready aggregates
- Pre-aggregated for specific web app query patterns
- These are the only tables the web app reads (via `analytics.duckdb`)
- Queried from `analytics.py` via `fetch_analytics()`
- Naming: `serving.<purpose>`
Two-DuckDB architecture
```
data/lakehouse.duckdb   ← SQLMesh exclusive write (DUCKDB_PATH)
├── staging.*
├── foundation.*
└── serving.*

data/analytics.duckdb   ← web app read-only (SERVING_DUCKDB_PATH)
└── serving.*           ← atomically replaced by export_serving.py
```
SQLMesh holds an exclusive write lock on lakehouse.duckdb during plan/run.
The web app needs read-only access at all times. export_serving.py copies
serving.* tables to a temp file, then atomically renames it to analytics.duckdb.
The web app detects the inode change on next query — no restart needed.
Never point DUCKDB_PATH and SERVING_DUCKDB_PATH to the same file.
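The swap itself is the standard write-to-temp-then-rename pattern. A minimal stdlib sketch of that pattern (illustrative: `atomic_publish` is a hypothetical helper working on raw bytes, whereas the real export_serving.py copies `serving.*` tables with DuckDB):

```python
import os
import tempfile

def atomic_publish(data: bytes, dest: str) -> None:
    """Write to a temp file in the destination directory, then atomically rename.

    Readers holding the old file keep a valid handle (the old inode);
    new opens see the new file. os.replace is atomic within one filesystem.
    """
    dir_name = os.path.dirname(os.path.abspath(dest))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # ensure bytes hit disk before the rename
        os.replace(tmp_path, dest)  # atomic swap; no reader ever sees a partial file
    except BaseException:
        os.unlink(tmp_path)         # clean up the temp file on failure
        raise
```

Writing the temp file into the same directory as the destination matters: `os.replace` is only atomic when source and destination are on the same filesystem.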
Adding a new data source
- Add an extractor in `extract/padelnomics_extract/` (see the extraction README)
- Add a staging model `models/staging/stg_<source>.sql` that reads landing files directly
- Join into foundation or serving models as needed
Model materialization
| Layer | Default kind | Rationale |
|---|---|---|
| staging | FULL | Re-reads all landing files; cheap with DuckDB parallel scan |
| foundation | FULL | Business logic rarely changes; recompute is fast |
| serving | FULL | Small aggregates; web app needs latest at all times |
For large historical tables, switch to kind INCREMENTAL_BY_TIME_RANGE with a time partition column.
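For reference, such an incremental model declaration would look roughly like this sketch (model and column names illustrative; `@start_ts` / `@end_ts` are SQLMesh's built-in time-range macros, assuming the standard macro names):

```sql
MODEL (
  name foundation.fct_example_events,
  kind INCREMENTAL_BY_TIME_RANGE (
    time_column event_at
  )
);

SELECT
  venue_id,
  event_at,
  is_booked
FROM staging.stg_example_source
WHERE event_at BETWEEN @start_ts AND @end_ts  -- only the requested time slice is recomputed
```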
Environment variables
| Variable | Default | Description |
|---|---|---|
| `LANDING_DIR` | `data/landing` | Root of the landing zone |
| `DUCKDB_PATH` | `data/lakehouse.duckdb` | DuckDB file (SQLMesh exclusive write access) |
| `SERVING_DUCKDB_PATH` | `data/analytics.duckdb` | Serving DB (web app reads from here) |