Deeman 88ed17484b feat(sql+templates): market_score v3 — log density + count gate
Fixes ranking inversion where Germany (1/100k courts) outscored Spain
(36/100k). Root causes: population/income were 55% of max before any
padel signal, density ceiling saturated 73% of cities, small-town
inflation (1 venue / 5k pop = 20/100k = full marks), and the saturation
discount actively penalised mature markets.

SQL (city_market_profile.sql):
- Supply development 40pts: log-scaled density LN(d+1)/LN(21) × count
  gate min(1, count/5). Ceiling 20/100k. Count gate kills small-town
  inflation without hard cutoffs (1 venue = 20%, 5+ = 100%).
- Demand evidence 25pts: occupancy if available; 40% density proxy
  otherwise. Separated from supply to avoid double-counting.
- Addressable market 15pts: population as context, not maturity.
- Economic context 10pts: income PPS (flat per country, low signal).
- Data quality 10pts.
- Removed saturation discount. High density = maturity.
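For illustration, the supply-development component above can be sketched in Python (the function and argument names are mine, not from the model; the real logic lives in city_market_profile.sql):

```python
import math

def supply_development_points(density_per_100k: float, venue_count: int) -> float:
    """40-point supply component: log-scaled density times a venue-count gate.

    Density is capped at the 20/100k ceiling, so LN(d+1)/LN(21) tops out at 1.
    The count gate min(1, count/5) damps single-venue small towns: 1 venue
    earns 20% of the density score, 5+ venues earn 100%.
    """
    capped = min(density_per_100k, 20.0)
    log_density = math.log(capped + 1) / math.log(21)  # in [0, 1]
    count_gate = min(1.0, venue_count / 5)
    return 40.0 * log_density * count_gate

# A single venue at the density ceiling gets only 8 of 40 points,
# while five venues at the same density get the full 40.
print(supply_development_points(20.0, 1))   # 8.0
print(supply_development_points(20.0, 5))   # 40.0
```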

Verified spot-check scores:
  Málaga (46v, 7.77/100k): 70.1  [was 98.9]
  Barcelona (104v, 6.17/100k): 67.4  [was 100.0]
  Amsterdam (24v, 3.24/100k): 58.4  [was 93.7]
  Bernau bei Berlin (2v, 5.74/100k): 43.9  [was 92.7]
  Berlin (20v, 0.55/100k): 42.2  [was 74.1]
  London (66v, 0.74/100k): 44.1  [was 75.5]

Templates (city-cost-de, country-overview, city-pricing):
- Color coding: green >= 55 (was 65), amber >= 35 (was 40)
- Intro/FAQ tiers: strong >= 55 (was 70), mid >= 35 (was 45)
- Opportunity interplay: market_score < 40 (was < 50) for white-space
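A minimal sketch of the new colour bands (the function name is mine, and "red" as the band below amber is my assumption — the templates may name the fallback differently):

```python
def score_color(market_score: float) -> str:
    """Map a market score to the template colour bands (v3 thresholds)."""
    if market_score >= 55:   # was 65
        return "green"
    if market_score >= 35:   # was 40
        return "amber"
    return "red"             # fallback band; name assumed

print(score_color(70.1))  # green (Malaga's new score)
print(score_color(43.9))  # amber (Bernau bei Berlin's new score)
```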

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 06:40:12 +01:00

Padelnomics Transform (SQLMesh)

3-layer SQL transformation pipeline using SQLMesh + DuckDB. Reads from the landing zone, produces analytics-ready tables consumed by the web app via an atomically-swapped serving DB.

Running

# From repo root — plan all changes (shows what will run)
uv run sqlmesh -p transform/sqlmesh_padelnomics plan

# Apply to production
uv run sqlmesh -p transform/sqlmesh_padelnomics plan prod

# Run model tests
uv run sqlmesh -p transform/sqlmesh_padelnomics test

# Format SQL
uv run sqlmesh -p transform/sqlmesh_padelnomics format

# Export serving tables to analytics.duckdb (run after SQLMesh)
DUCKDB_PATH=data/lakehouse.duckdb SERVING_DUCKDB_PATH=data/analytics.duckdb \
    uv run python src/padelnomics/export_serving.py

3-layer architecture

landing/                    ← raw files (extraction output)
  ├── overpass/*/*/courts.json.gz
  ├── eurostat/*/*/urb_cpop1.json.gz
  └── playtomic/*/*/tenants.json.gz

staging/                    ← reads landing files directly, type casting, dedup
  ├── staging.stg_padel_courts
  ├── staging.stg_playtomic_venues
  └── staging.stg_population

foundation/                 ← business logic, dimensions, facts
  ├── foundation.dim_venues            ← conformed venue dimension (Playtomic + OSM)
  ├── foundation.dim_cities            ← conformed city dimension (venue-derived + Eurostat)
  ├── foundation.dim_venue_capacity    ← static capacity attributes per venue
  ├── foundation.fct_availability_slot ← event-grain: one row per deduplicated slot
  └── foundation.fct_daily_availability ← venue-day aggregate: occupancy + revenue estimates

serving/                    ← pre-aggregated for web app
  ├── serving.city_market_profile
  └── serving.planner_defaults

staging/ — read landing files + type casting

  • Reads landing zone JSON files directly with read_json(..., format='auto', filename=true)
  • Uses @LANDING_DIR variable for file path discovery
  • Casts all columns to correct types: TRY_CAST(... AS DOUBLE)
  • Deduplicates where source produces duplicates (ROW_NUMBER partitioned on ID)
  • Validates coordinates, nulls, and data quality inline
  • Naming: staging.stg_<source>

foundation/ — business logic

  • Dimensions (dim_*): slowly changing attributes, one row per entity
  • Facts (fct_*): events and measurements, one row per event
  • May join across multiple staging models from different sources
  • Naming: foundation.dim_<entity>, foundation.fct_<event>

serving/ — analytics-ready aggregates

  • Pre-aggregated for specific web app query patterns
  • These are the only tables the web app reads (via analytics.duckdb)
  • Queried from analytics.py via fetch_analytics()
  • Naming: serving.<purpose>

Two-DuckDB architecture

data/lakehouse.duckdb       ← SQLMesh exclusive write (DUCKDB_PATH)
  ├── staging.*
  ├── foundation.*
  └── serving.*

data/analytics.duckdb       ← web app read-only (SERVING_DUCKDB_PATH)
  └── serving.*             ← atomically replaced by export_serving.py

SQLMesh holds an exclusive write lock on lakehouse.duckdb during plan/run. The web app needs read-only access at all times. export_serving.py copies serving.* tables to a temp file, then atomically renames it to analytics.duckdb. The web app detects the inode change on next query — no restart needed.
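The build-to-temp-then-rename pattern can be sketched with the standard library (the real export_serving.py builds the temp file as a DuckDB database; here a plain file stands in for it, and the helper name is mine):

```python
import os
import tempfile

def publish_atomically(serving_path: str, build) -> None:
    """Build the new serving DB in a temp file in the same directory,
    then atomically rename it over the old one.

    os.replace() is atomic when source and destination are on the same
    filesystem, so readers see either the old file or the new one, never
    a half-written copy -- and the inode changes, which is exactly what
    the web app watches for.
    """
    directory = os.path.dirname(serving_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".duckdb.tmp")
    os.close(fd)
    try:
        build(tmp_path)  # real script: copy serving.* tables into tmp_path
        os.replace(tmp_path, serving_path)  # atomic swap
    except BaseException:
        os.unlink(tmp_path)
        raise

# Usage sketch: the real builder would ATTACH lakehouse.duckdb read-only
# and CREATE TABLE ... AS SELECT the serving tables; here it writes bytes.
publish_atomically("analytics.duckdb", lambda p: open(p, "wb").write(b"tables"))
```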

Never point DUCKDB_PATH and SERVING_DUCKDB_PATH to the same file.

Adding a new data source

  1. Add an extractor in extract/padelnomics_extract/ (see extraction README)
  2. Add a staging model: models/staging/stg_<source>.sql that reads landing files directly
  3. Join into foundation or serving models as needed

Model materialization

Layer        Default kind   Rationale
staging      FULL           Re-reads all landing files; cheap with DuckDB parallel scan
foundation   FULL           Business logic rarely changes; recompute is fast
serving      FULL           Small aggregates; web app needs latest at all times

For large historical tables, switch to kind INCREMENTAL_BY_TIME_RANGE with a time partition column.

Environment variables

Variable              Default                 Description
LANDING_DIR           data/landing            Root of the landing zone
DUCKDB_PATH           data/lakehouse.duckdb   DuckDB file (SQLMesh exclusive write access)
SERVING_DUCKDB_PATH   data/analytics.duckdb   Serving DB (web app reads from here)
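These defaults can be read the usual way (variable names match the table above; structuring them as module-level constants is my sketch, not necessarily how the scripts do it):

```python
import os

# Defaults mirror the table above; environment variables override them.
LANDING_DIR = os.environ.get("LANDING_DIR", "data/landing")
DUCKDB_PATH = os.environ.get("DUCKDB_PATH", "data/lakehouse.duckdb")
SERVING_DUCKDB_PATH = os.environ.get("SERVING_DUCKDB_PATH", "data/analytics.duckdb")

# Guard against the footgun called out earlier: the lakehouse and the
# serving DB must never be the same file.
assert os.path.abspath(DUCKDB_PATH) != os.path.abspath(SERVING_DUCKDB_PATH), \
    "DUCKDB_PATH and SERVING_DUCKDB_PATH must not point to the same file"
```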