- Expand dim_countries.sql CASE to cover 22 missing countries (PL, RO,
CO, HU, ZA, KE, BR, CZ, QA, NZ, HR, LV, MT, CR, CY, PA, SV, DO,
PE, VE, EE, ID) that fell through to bare ISO codes
- Add 19 missing entries to COUNTRY_LABELS (i18n.py) + both locale files
(EN + DE dir_country_* keys) including IE which was in SQL but not i18n
- Localise map tooltips: routes.py injects country_name via
get_country_name(), JS uses c.country_name instead of c.country_name_en
- Localise dropdown: apply country_name filter to option labels
- Show avg + top score in map tooltip with separate color dots and new
map_score_avg / map_score_top i18n keys (EN: "Avg. Score" / "Top City",
DE: "Ø Score" / "Top-Stadt")
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two fixes:
1. dim_locations now sources venues from dim_venues (deduplicated OSM + Playtomic)
instead of stg_padel_courts (OSM only). Playtomic-only venues are no longer
invisible to spatial lookups.
2. Country-level supply saturation dampener on supply deficit component.
Saturated countries (Spain 7.4/100k) get dampened supply deficit (x0.30 → 12 pts max).
Emerging markets (Germany 0.24/100k) nearly unaffected (x0.98 → ~39 pts).
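The two multipliers quoted above imply a 40-point supply deficit component (12 / 0.30). A minimal sketch of one linear dampener that reproduces both figures follows. The linear shape, the 0.30 floor and the 7.4/100k saturation point are assumptions chosen for illustration, not the formula used in the model:

```python
# Hypothetical linear dampener. All constants are assumptions picked to match
# the two figures quoted in this message; they are not read from the model.
SATURATION_PER_100K = 7.4      # density treated as fully saturated (assumed)
FLOOR = 0.30                   # multiplier at or above saturation (assumed)
MAX_SUPPLY_DEFICIT_PTS = 40.0  # inferred from "x0.30 -> 12 pts max"


def saturation_multiplier(venues_per_100k: float) -> float:
    """Scale linearly from 1.0 (no supply) down to FLOOR (saturated country)."""
    share = min(max(venues_per_100k / SATURATION_PER_100K, 0.0), 1.0)
    return 1.0 - (1.0 - FLOOR) * share


def max_supply_deficit_points(venues_per_100k: float) -> float:
    """Upper bound of the supply deficit component after dampening."""
    return MAX_SUPPLY_DEFICIT_PTS * saturation_multiplier(venues_per_100k)
```

With these assumed constants, a density of 7.4/100k yields the 0.30 multiplier and 0.24/100k rounds to 0.98.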
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- stg_population_geonames: reject CJK/Cyrillic/Arabic city names via regex
(fixes "Seelow" showing Japanese characters on map)
- dim_locations: filter empty location names after trim
- location_profiles: defensive LEAST/GREATEST clamp on both scores (0-100)
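The three guards above can be sketched in Python. The Unicode ranges below are an assumption about which blocks count as CJK/Cyrillic/Arabic, and the function names are hypothetical; the regex in stg_population_geonames may differ:

```python
import re

# Assumed Unicode blocks; the model's actual regex is not shown in this message.
NON_LATIN_SCRIPT = re.compile(
    "["
    "\u0400-\u04FF"  # Cyrillic
    "\u0600-\u06FF"  # Arabic
    "\u3040-\u30FF"  # Hiragana and Katakana
    "\u4E00-\u9FFF"  # CJK Unified Ideographs
    "\uAC00-\uD7AF"  # Hangul syllables
    "]"
)


def is_displayable_name(name: str) -> bool:
    """Reject names that are empty after trim or contain a filtered script."""
    stripped = name.strip()
    return bool(stripped) and NON_LATIN_SCRIPT.search(stripped) is None


def clamp_score(score: float) -> float:
    """Python equivalent of LEAST(GREATEST(score, 0), 100)."""
    return max(0.0, min(100.0, score))
```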
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Non-EU countries (AR, MX, AE, AU, etc.) previously got NULL for
median_income_pps and pli_construction, falling back to EU-calibrated
defaults (15K PPS, PLI=100) that produced wrong scores.
New World Bank WDI extractor fetches GNI per capita PPP and price level
ratio for 215 countries. dim_countries uses Germany as calibration anchor
to scale WB values into the Eurostat range (dynamic ratio, self-corrects
as both sources update). EU countries keep exact Eurostat values.
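A minimal sketch of the anchor scaling described above, assuming a plain ratio (Eurostat value for DE divided by the World Bank value for DE). The function names and dict shapes are hypothetical:

```python
from typing import Optional


def scale_to_eurostat(wb_value: float, wb_anchor: float, eurostat_anchor: float) -> float:
    """Rescale a World Bank value into the Eurostat range via the anchor ratio."""
    return wb_value * (eurostat_anchor / wb_anchor)


def median_income_pps(
    country_code: str,
    eurostat: dict,
    world_bank: dict,
    anchor: str = "DE",
) -> Optional[float]:
    """Prefer the exact Eurostat value; otherwise scale the World Bank value."""
    if country_code in eurostat:
        return eurostat[country_code]
    wb_value = world_bank.get(country_code)
    if wb_value is None:
        return None  # no source at all; caller decides on a default
    return scale_to_eurostat(wb_value, world_bank[anchor], eurostat[anchor])
```

Because the ratio is recomputed from whatever both sources currently report for the anchor country, it does not need to be hardcoded.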
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Merges worktree-h3-catchment-index. dim_locations now computes h3_cell_res5
(res 5, ~8.5km edge). location_profiles and dim_locations updated;
old location_opportunity_profile.sql already removed on master.
Conflict: location_opportunity_profile.sql was deleted on master; kept the
deletion and applied the h3_cell_res4→res5 rename to location_profiles instead.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Res 4 + k_ring(1) gave ~50-60km effective radius, causing Oldenburg to
absorb Bremen (40km away) and destroying score differentiation.
Res 5 + k_ring(1) gives ~24km — captures adjacent Gemeinden (Delmenhorst
at 15km) without bleeding into unrelated cities at 40km+.
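The radii above can be approximated with plane geometry: for regular hexagons, the farthest vertex of a cell plus its 6 neighbours lies √7 edge lengths from the centre. This is a rough sketch only, since H3 cells are not perfectly regular. The ~22.6km res 4 edge length is an assumption taken from H3's published average edge lengths; ~8.5km for res 5 is the figure used elsewhere in this log:

```python
import math


def k_ring1_reach_km(edge_km: float) -> float:
    """Centre-to-farthest-vertex distance of a cell and its 6 neighbours.

    Assumes regular planar hexagons: a neighbour's centre is sqrt(3) edges
    away, and its outermost vertex is sqrt(7) edges from the origin.
    """
    return math.sqrt(7) * edge_km


RES4_EDGE_KM = 22.6  # assumption: H3 average edge length at resolution 4
RES5_EDGE_KM = 8.5   # approximate res 5 edge length

res4_reach = k_ring1_reach_km(RES4_EDGE_KM)  # falls in the ~50-60km band
res5_reach = k_ring1_reach_km(RES5_EDGE_KM)  # low 20s of km
```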
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Update transform CLAUDE.md source integration map and conformed
dimensions table. Update CHANGELOG with unified model + tooltip
changes. Fix stale comments in dim_cities.sql and serving README.
Subtask 5/5: documentation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add H3 res-4 regional catchment metrics (~15-18km radius, cell + 6
neighbours) to both the addressable market (25pts) and supply gap
(30pts) components of location_opportunity_profile.
Changes:
- config.yaml: add h3 to DuckDB extensions (requires one-time
INSTALL h3 FROM community on each machine)
- dim_locations: add h3_cell_res4 column via h3_latlng_to_cell()
- location_opportunity_profile: add hex_stats + catchment CTEs;
update score formula to use catchment_population and
catchment_padel_courts; expose catchment_population,
catchment_padel_courts, catchment_venues_per_100k as output cols
Motivation: local population underestimates functional market for
mid-size cities (e.g. Oldenburg ~170K misses surrounding Gemeinden).
H3 k_ring(1) captures the realistic driving-distance catchment
(~462km²) consistently across both score components.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Convert the availability chain (stg_playtomic_availability →
fct_availability_slot → fct_daily_availability) from FULL to
INCREMENTAL_BY_TIME_RANGE so sqlmesh run processes only new daily
intervals instead of re-reading all files.
Supervisor changes:
- run_transform(): plan prod --auto-apply → run prod (evaluates
missing cron intervals, picks up new data)
- git_pull_and_sync(): add plan prod --auto-apply before re-exec
so model code changes are applied on deploy
- supervisor.sh: same plan → run change
Staging model uses a date-scoped glob (@start_ds) to read only
the current interval's files. snapshot_date cast to DATE (was
VARCHAR) as required by time_column.
Clean up redundant TRY_CAST(snapshot_date AS DATE) in
venue_pricing_benchmarks since it's already DATE from foundation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add real per-country cost data to ~30 calculator fields so pSEO articles
show country-specific CAPEX/OPEX instead of hardcoded DE defaults.
Extractor:
- eurostat.py: add 8 new datasets (nrg_pc_205, nrg_pc_203, lc_lci_lev,
5×prc_ppp_ind variants); add optional `dataset_code` field so multiple
dict entries can share one Eurostat API endpoint
Staging (4 new models):
- stg_electricity_prices — EUR/kWh by country, semi-annual
- stg_gas_prices — EUR/GJ by country, semi-annual
- stg_labour_costs — EUR/hour by country, annual (future staffed scenario)
- stg_price_levels — PLI indices (EU27=100) for 5 categories, annual
Foundation:
- dim_countries (new) — conformed country dimension; eliminates ~50-line CASE
blocks duplicated in dim_cities/dim_locations; computes ~29 calculator cost
override columns from PLI ratios and energy price ratios vs DE baseline;
NULL for DE so calculator falls through to DEFAULTS unchanged
- dim_cities — replace country_name/slug CASE blocks + country_income CTE
with JOIN dim_countries
- dim_locations — same refactor as dim_cities
Serving:
- pseo_city_costs_de — JOIN dim_countries; add 29 camelCase override columns
auto-applied by calculator (electricity, heating, rentSqm, hallCostSqm, …)
- planner_defaults — JOIN dim_countries; same 29 cost columns flow through
to /api/market-data endpoint
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add @slugify SQLMesh macro (STRIP_ACCENTS + ß→ss) replacing broken
inline REGEXP_REPLACE that dropped non-ASCII chars (Düsseldorf → d-sseldorf)
- Apply @slugify to dim_venues, dim_cities, dim_locations
- Fix Python slugify() to pre-replace ß→ss before NFKD normalization
- Add language prefix to B2B article market links (/markets/germany → /de/markets/germany)
- Change country overview top-5 ranking: venue count (not raw market_score)
for top cities, population for top opportunity cities
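A minimal sketch of the Python-side slug fix, assuming the slug is lowercase ASCII with hyphen separators. ß has no Unicode decomposition, so NFKD followed by an ASCII filter would drop it unless it is replaced first:

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Build a URL slug; illustrative only, the project's slugify() may differ."""
    # Replace sharp s before normalisation, otherwise the ASCII filter drops it.
    text = text.replace("ß", "ss").replace("ẞ", "ss")
    # NFKD splits accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Dropping non-ASCII bytes removes the combining marks, keeping base letters.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumerics into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")
```

With this order of operations, "Düsseldorf" becomes "dusseldorf" rather than "d-sseldorf", and "Gießen" becomes "giessen".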
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace ABS() bbox predicates with BETWEEN in all three spatial CTEs
(nearest_padel, padel_local, tennis_nearby). BETWEEN enables DuckDB's
IEJoin (inequality join), which is O((N+M) log M) vs the previous O(N×M)
nested-loop cross-join.
Add country pre-filters to restrict the left side from ~140K global
locations to ~20K rows for padel/tennis CTEs (~8 countries each).
Expected: ~50-200x speedup on the spatial CTE portion of the model.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Propagates the conformed city key (city_slug) from dim_venues through the
full pricing pipeline, replacing 3 fragile LOWER(TRIM(...)) fuzzy string
joins with deterministic key joins.
Changes (cascading, task-by-task):
- dim_venues: add city_slug computed column (REGEXP_REPLACE slug derivation)
- dim_venue_capacity: join foundation.dim_venues instead of stg_playtomic_venues;
carry city_slug alongside country_code/city
- fct_daily_availability: carry city_slug from dim_venue_capacity
- venue_pricing_benchmarks: carry city_slug from fct_daily_availability;
add to venue_stats GROUP BY and final SELECT/GROUP BY
- city_market_profile: join vpb on city_slug = city_slug (was LOWER(TRIM))
- planner_defaults: add city_slug to city_benchmarks CTE; join on city_slug
- pseo_city_pricing: join city_market_profile on city_slug (was LOWER(TRIM))
- pipeline_routes._DAG: dim_venue_capacity now depends on dim_venues, not stg_playtomic_venues
Result: dim_venues.city_slug → dim_cities.(country_code, city_slug) forms a
fully conformed geographic hierarchy with no fuzzy string comparisons.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- eurostat.py: add nama_10r_2hhinc dataset config; append filter params to
request URL so server pre-filters the large cube before download
- stg_regional_income.sql: new staging model — reads nama_10r_2hhinc.json.gz,
filters to NUTS-1 codes (3-char), normalises EL→GR / UK→GB
- dim_locations.sql: add admin1_to_nuts1 VALUES CTE (16 German Bundesländer)
+ regional_income CTE; final SELECT uses COALESCE(regional, country) income
- init_landing_seeds.py: add empty seed for nama_10r_2hhinc.json.gz
Munich/Bayern now scores ~29K PPS vs Chemnitz/Sachsen ~19K PPS instead of
both inheriting the same national average (~25.5K PPS).
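The NUTS-1 filter, prefix normalisation and income fallback can be sketched as follows. The function names are hypothetical, and the sketch assumes that the EL→GR / UK→GB step maps the two-letter geo prefix to an ISO country code:

```python
from typing import Optional

# Eurostat uses EL for Greece and UK for the United Kingdom.
GEO_PREFIX_TO_ISO = {"EL": "GR", "UK": "GB"}


def nuts1_country_code(geo: str) -> Optional[str]:
    """Return the ISO country code of a NUTS-1 code, or None for other levels."""
    if len(geo) != 3:  # NUTS-1 = 2-letter country prefix + 1 character
        return None
    prefix = geo[:2]
    return GEO_PREFIX_TO_ISO.get(prefix, prefix)


def effective_income(regional: Optional[float], country: Optional[float]) -> Optional[float]:
    """Python equivalent of COALESCE(regional, country)."""
    return regional if regional is not None else country
```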
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a coordinate-based population lookup as a fallback when string name
matching fails (~29% of cities). Uses bbox pre-filter (0.14° ≈ 15 km) then
ST_Distance_Sphere to find the nearest GeoNames location in the same country.
Fixes localization mismatches: Milano≠Milan, Wien≠Vienna, München≠Munich.
Population cascade: Eurostat EU > US Census > ONS UK > GeoNames string >
GeoNames spatial > 0.
Coverage: 70.5% → 98.5% (5,401 / 5,481 cities with population > 0).
Key cities before/after:
Wien: 0 → 1,691,468
Milano: 0 → 1,371,498
München: already matched by string; verified still correct at 1,488,719
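A rough Python sketch of the spatial fallback: a cheap bounding-box test first, then great-circle distance on the survivors. The haversine function stands in for ST_Distance_Sphere, and the dict keys are hypothetical:

```python
import math
from typing import Optional

BBOX_DEG = 0.14             # about 15 km of latitude
EARTH_RADIUS_M = 6_371_000  # mean Earth radius


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_location(city: dict, locations: list) -> Optional[dict]:
    """Nearest same-country location inside the bounding box, else None."""
    best, best_dist = None, math.inf
    for loc in locations:
        if loc["country_code"] != city["country_code"]:
            continue
        # Bounding-box pre-filter avoids the trig for far-away rows.
        if abs(loc["lat"] - city["lat"]) > BBOX_DEG or abs(loc["lon"] - city["lon"]) > BBOX_DEG:
            continue
        dist = haversine_m(city["lat"], city["lon"], loc["lat"], loc["lon"])
        if dist < best_dist:
            best, best_dist = loc, dist
    return best
```

A city named "Wien" would then pick up the population of a GeoNames row named "Vienna" a few hundred metres away, even though the names never match.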
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- dim_cities: add geoname_id to geonames_pop CTE and final SELECT
Creates FK between dim_cities (city-with-padel-venues) and dim_locations (all GeoNames),
enabling joins to location_opportunity_profile for the first time.
- city_market_profile: pass geoname_id through base CTE and final SELECT
- pseo_city_costs_de: LEFT JOIN location_opportunity_profile on (country_code, geoname_id),
add opportunity_score to output columns
- pseo_country_overview: add avg_opportunity_score, top_opportunity_score, top_opportunity_slugs,
top_opportunity_names aggregates
Cities with no GeoNames name match get opportunity_score = NULL; templates guard with
{% if opportunity_score %}.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Part A: Data Layer — Sprints 1-5
Sprint 1 — Eurostat SDMX city labels (unblocks EU population):
- New extractor: eurostat_city_labels.py — fetches ESTAT/CITIES codelist
(city_code → city_name mapping) with ETag dedup
- New staging model: stg_city_labels.sql — grain city_code
- Updated dim_cities.sql — joins Eurostat population via city code lookup;
replaces hardcoded 0::BIGINT population
Sprint 2 — Market score formula v2:
- city_market_profile.sql: 30pt population (LN/1M), 25pt income PPS (/200),
30pt demand (occupancy or density), 15pt data confidence
- Moved venue_pricing_benchmarks join into base CTE so median_occupancy_rate
is available to the scoring formula
Sprint 3 — US Census ACS extractor:
- New extractor: census_usa.py — ACS 5-year place population (vintage 2023)
- New staging model: stg_population_usa.sql — grain (place_fips, ref_year)
Sprint 4 — ONS UK extractor:
- New extractor: ons_uk.py — 2021 Census LAD population via ONS beta API
- New staging model: stg_population_uk.sql — grain (lad_code, ref_year)
Sprint 5 — GeoNames global extractor:
- New extractor: geonames.py — cities15000.zip bulk download, filtered to ≥50K pop
- New staging model: stg_population_geonames.sql — grain geoname_id
- dim_cities: 5-tier population cascade (Eurostat > Census > ONS > GeoNames > 0)
with case/whitespace-insensitive city name matching
Registered all 4 new CLI entrypoints in pyproject.toml and all.py.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three deviations from the quart_saas_boilerplate methodology corrected:
1. Fix dim_cities LIKE join (data quality bug)
- Old: FROM eurostat_cities LEFT JOIN venue_counts LIKE '%country_code%'
→ cartesian product (2.6M rows vs ~5500 expected)
- New: FROM venue_cities (dim_venues) as primary table, Eurostat for
enrichment only. grain (country_code, city_slug).
- Also applies LOWER() before REGEXP_REPLACE so uppercase city
names aren't stripped to '-'
2. Rename fct_venue_capacity → dim_venue_capacity
- Static venue attributes with no time key are a dimension, not a fact
- No SQL logic changes; update fct_daily_availability reference
3. Add fct_availability_slot at event grain
- New: grain (snapshot_date, tenant_id, resource_id, slot_start_time)
- Recheck dedup logic moves here from fct_daily_availability
- fct_daily_availability now reads fct_availability_slot (cleaner DAG)
Downstream fixes:
- city_market_profile, planner_defaults grain → (country_code, city_slug)
- pseo_city_costs_de, pseo_city_pricing add city_key composite natural key
(country_slug || '-' || city_slug) to avoid URL collisions across countries
- planner_defaults join in pseo_city_costs_de uses both country_code + city_slug
- Templates updated: natural_key city_slug → city_key
Added transform/sqlmesh_padelnomics/CLAUDE.md documenting data modeling rules,
conformed dimension map, and source integration architecture.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Prerequisite for all pSEO serving models. Adds CASE-based country_name_en
and URL-safe country_slug to foundation.dim_cities, then selects them through
serving.city_market_profile so downstream models inherit them automatically.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three workstreams:
1. Playtomic full data extraction & transform pipeline:
- Expand venue bounding boxes from 4 to 23 regions (global coverage)
- New staging models for court resources, opening hours, and slot-level
availability with real prices from the Playtomic API
- Foundation fact tables for venue capacity and daily occupancy/revenue
- City-level pricing benchmarks replacing hardcoded country estimates
- Planner defaults now use 3-tier cascade: city data → country → fallback
2. Transactional email i18n:
- _t() helper in worker.py with ~70 translation keys (EN + DE)
- All 8 email handlers translated, lang passed in task payloads
3. Resend audiences restructured to 3 named audiences (free plan limit)
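The 3-tier planner-defaults cascade from workstream 1 can be sketched as a first-non-null lookup. The function name, row shapes and field name are hypothetical; the serving model does this in SQL:

```python
from typing import Any, Optional


def resolve_default(
    field: str,
    city_row: Optional[dict],
    country_row: Optional[dict],
    fallback: dict,
) -> Any:
    """Return the first non-null value: city data, then country, then fallback."""
    for tier in (city_row, country_row):
        if tier is not None and tier.get(field) is not None:
            return tier[field]
    return fallback[field]
```

A city with real Playtomic prices overrides the country estimate, and a city with no data at all still gets a usable value from the hardcoded fallback.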
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Eurostat JSON-stat format (4-7 dimension sparse dict with 583K values)
causes DuckDB OOM — pre-process in extractor to flat records.
Also fix dim_cities unused CTE bug and playtomic venue lat/lon path.
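A minimal sketch of the flattening step, assuming the JSON-stat 2.0 layout with dict-style category indexes and a sparse "value" dict keyed by the row-major cell index. The function name is hypothetical and the extractor's actual code may differ:

```python
def flatten_jsonstat(dataset: dict) -> list:
    """Turn a sparse JSON-stat cube into one flat record per populated cell."""
    dim_ids = dataset["id"]
    sizes = dataset["size"]

    # For each dimension, build a position -> category code lookup list.
    codes_by_dim = []
    for dim in dim_ids:
        index = dataset["dimension"][dim]["category"]["index"]
        codes = [None] * len(index)
        for code, pos in index.items():
            codes[pos] = code
        codes_by_dim.append(codes)

    records = []
    for flat_key, value in dataset["value"].items():
        # Decode the row-major index: the last dimension varies fastest.
        remainder = int(flat_key)
        positions = []
        for size in reversed(sizes):
            remainder, pos = divmod(remainder, size)
            positions.append(pos)
        positions.reverse()

        record = {dim: codes[pos] for dim, codes, pos in zip(dim_ids, codes_by_dim, positions)}
        record["value"] = value
        records.append(record)
    return records
```

Only populated cells produce a record, so the output size follows the number of values rather than the product of the dimension sizes.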
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove raw/ layer — staging models now read landing JSON directly.
Rename all model schemas from padelnomics.* to staging.*/foundation.*/serving.*.
Web app queries updated to serving.planner_defaults via SERVING_DUCKDB_PATH.
Supervisor gets daily sleep interval between pipeline runs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sync template from 29ac25b → v0.9.0 (29 template commits). Because of the
template's _subdirectory migration, new files were rendered manually
rather than auto-merged by copier.
New files:
- .claude/CLAUDE.md + coding_philosophy.md (agent instructions)
- extract utils.py: SQLite state tracking for extraction runs
- extract/transform READMEs: architecture & pattern documentation
- infra/supervisor: systemd service + orchestration script
- Per-layer model READMEs (raw, staging, foundation, serving)
Also fixes copier-answers.yml (adds 4 feature toggles, removes stale
payment_provider key) and scopes CLAUDE.md gitignore to root only.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
git mv all tracked files from the nested padelnomics/ workspace
directory to the git repo root. Merged .gitignore files.
No code changes — pure path rename.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>