# Padelnomics Transform (SQLMesh)
A 3-layer SQL transformation pipeline using SQLMesh + DuckDB. It reads from the landing zone and produces analytics-ready tables consumed by the web app via an atomically swapped serving DB.
## Running

```bash
# From repo root — plan all changes (shows what will run)
uv run sqlmesh -p transform/sqlmesh_padelnomics plan

# Apply to production
uv run sqlmesh -p transform/sqlmesh_padelnomics plan prod

# Run model tests
uv run sqlmesh -p transform/sqlmesh_padelnomics test

# Format SQL
uv run sqlmesh -p transform/sqlmesh_padelnomics format

# Export serving tables to analytics.duckdb (run after SQLMesh)
DUCKDB_PATH=data/lakehouse.duckdb SERVING_DUCKDB_PATH=data/analytics.duckdb \
  uv run python src/padelnomics/export_serving.py
```
## 3-layer architecture

```
landing/      ← raw files (extraction output)
├── overpass/*/*/courts.json.gz
├── eurostat/*/*/urb_cpop1.json.gz
└── playtomic/*/*/tenants.json.gz

staging/      ← reads landing files directly, type casting, dedup
├── staging.stg_padel_courts
├── staging.stg_playtomic_venues
└── staging.stg_population

foundation/   ← business logic, dimensions, facts
├── foundation.dim_venues             ← conformed venue dimension (Playtomic + OSM)
├── foundation.dim_cities             ← conformed city dimension (venue-derived + Eurostat)
├── foundation.dim_venue_capacity     ← static capacity attributes per venue
├── foundation.fct_availability_slot  ← event-grain: one row per deduplicated slot
└── foundation.fct_daily_availability ← venue-day aggregate: occupancy + revenue estimates

serving/      ← pre-aggregated for web app
├── serving.city_market_profile
└── serving.planner_defaults
```
### staging/ — read landing files + type casting

- Reads landing zone JSON files directly with `read_json(..., format='auto', filename=true)`
- Uses the `@LANDING_DIR` variable for file path discovery
- Casts all columns to correct types: `TRY_CAST(... AS DOUBLE)`
- Deduplicates where a source produces duplicates (`ROW_NUMBER` partitioned on ID)
- Validates coordinates, nulls, and data quality inline
- Naming: `staging.stg_<source>`
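The dedup-plus-cast pattern above can be sketched in a few lines. This is a minimal illustration using stdlib `sqlite3` as a stand-in for DuckDB (the window-function SQL is the same; DuckDB's `TRY_CAST` returns `NULL` on failure, whereas sqlite's `CAST` of a non-numeric string yields `0.0`). The table and column names are invented for the example:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE raw_courts (court_id TEXT, lat TEXT, extracted_at TEXT)")
con.executemany(
    "INSERT INTO raw_courts VALUES (?, ?, ?)",
    [
        ("c1", "52.52", "2024-01-01"),        # duplicate court_id, older extract
        ("c1", "52.53", "2024-02-01"),        # newest row should win
        ("c2", "not-a-number", "2024-01-15"), # bad value — cast fails
    ],
)

# Keep one row per court_id (latest extract), casting lat to a number —
# the staging-layer pattern: ROW_NUMBER partitioned on the ID, filter rn = 1.
rows = con.execute("""
    SELECT court_id, CAST(lat AS REAL) AS lat
    FROM (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY court_id ORDER BY extracted_at DESC
        ) AS rn
        FROM raw_courts
    )
    WHERE rn = 1
    ORDER BY court_id
""").fetchall()
print(rows)  # [('c1', 52.53), ('c2', 0.0)]
```

In the real staging models the same `ROW_NUMBER` subquery runs in DuckDB over the `read_json` scan, with `TRY_CAST` nulling out unparseable values instead of zeroing them.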
### foundation/ — business logic

- Dimensions (`dim_*`): slowly changing attributes, one row per entity
- Facts (`fct_*`): events and measurements, one row per event
- May join across multiple staging models from different sources
- Naming: `foundation.dim_<entity>`, `foundation.fct_<event>`
### serving/ — analytics-ready aggregates

- Pre-aggregated for specific web app query patterns
- These are the only tables the web app reads (via `analytics.duckdb`)
- Queried from `analytics.py` via `fetch_analytics()`
- Naming: `serving.<purpose>`
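A rough sketch of the web app's read path, again with stdlib `sqlite3` standing in for DuckDB (the real equivalent is `duckdb.connect(path, read_only=True)`; sqlite3 also lacks schemas, so the `serving.` prefix is dropped). The table name matches the tree above; its columns are invented:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "analytics.duckdb")

# Simulate export_serving.py having produced the serving DB.
con = sqlite3.connect(path)
con.execute("CREATE TABLE city_market_profile (city TEXT, courts INTEGER)")
con.execute("INSERT INTO city_market_profile VALUES ('Berlin', 42)")
con.commit()
con.close()

# Web app side: a read-only connection, so SQLMesh's exclusive write lock
# on the lakehouse file is never contended.
ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
profile = ro.execute("SELECT city, courts FROM city_market_profile").fetchall()
print(profile)  # [('Berlin', 42)]
```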
## Two-DuckDB architecture

```
data/lakehouse.duckdb   ← SQLMesh exclusive write (DUCKDB_PATH)
├── staging.*
├── foundation.*
└── serving.*

data/analytics.duckdb   ← web app read-only (SERVING_DUCKDB_PATH)
└── serving.*           ← atomically replaced by export_serving.py
```
SQLMesh holds an exclusive write lock on `lakehouse.duckdb` during plan/run, but the web app needs read-only access at all times. `export_serving.py` therefore copies the `serving.*` tables to a temp file, then atomically renames it to `analytics.duckdb`. The web app detects the inode change on its next query — no restart needed.

Never point `DUCKDB_PATH` and `SERVING_DUCKDB_PATH` at the same file.
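The atomic-swap mechanism can be shown with the stdlib alone. This sketch uses plain text files as stand-ins for the DuckDB payload; the file names follow this doc, everything else is illustrative. The key property is that `os.replace` is atomic on POSIX when source and target are on the same filesystem, so a reader always sees either the old DB or the new one, never a half-written file:

```python
import os
import tempfile

serving_dir = tempfile.mkdtemp()
target = os.path.join(serving_dir, "analytics.duckdb")

# "Old" serving DB the web app is currently reading.
with open(target, "w") as f:
    f.write("old serving tables")

# Export step: write the fresh copy to a temp file in the SAME directory,
# so the final rename stays on one filesystem and is atomic.
fd, tmp_path = tempfile.mkstemp(dir=serving_dir, suffix=".duckdb.tmp")
with os.fdopen(fd, "w") as f:
    f.write("new serving tables")

# Swap it in. A reader that re-opens the path gets the new inode — which is
# how the web app notices the refresh without a restart.
os.replace(tmp_path, target)

with open(target) as f:
    print(f.read())  # new serving tables
```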
## Adding a new data source

- Add an extractor in `extract/padelnomics_extract/` (see the extraction README)
- Add a staging model `models/staging/stg_<source>.sql` that reads landing files directly
- Join it into foundation or serving models as needed
## Model materialization

| Layer | Default kind | Rationale |
|---|---|---|
| staging | `FULL` | Re-reads all landing files; cheap with DuckDB's parallel scan |
| foundation | `FULL` | Business logic rarely changes; recompute is fast |
| serving | `FULL` | Small aggregates; the web app needs the latest at all times |

For large historical tables, switch to kind `INCREMENTAL_BY_TIME_RANGE` with a time partition column.
## Environment variables

| Variable | Default | Description |
|---|---|---|
| `LANDING_DIR` | `data/landing` | Root of the landing zone |
| `DUCKDB_PATH` | `data/lakehouse.duckdb` | DuckDB file (SQLMesh exclusive write access) |
| `SERVING_DUCKDB_PATH` | `data/analytics.duckdb` | Serving DB (web app reads from here) |