- Expand dim_countries.sql CASE to cover 22 missing countries (PL, RO,
CO, HU, ZA, KE, BR, CZ, QA, NZ, HR, LV, MT, CR, CY, PA, SV, DO,
PE, VE, EE, ID) that fell through to bare ISO codes
- Add 19 missing entries to COUNTRY_LABELS (i18n.py) + both locale files
(EN + DE dir_country_* keys) including IE which was in SQL but not i18n
- Localise map tooltips: routes.py injects country_name via
get_country_name(), JS uses c.country_name instead of c.country_name_en
- Localise dropdown: apply country_name filter to option labels
- Show avg + top score in map tooltip with separate color dots and new
map_score_avg / map_score_top i18n keys (EN: "Avg. Score" / "Top City",
DE: "Ø Score" / "Top-Stadt")
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The @start_ds in the glob pattern only matched files for the first day
of the batch, so incremental restates only loaded 1 day of data.
Changed to wildcard glob with explicit BETWEEN @start_ds AND @end_ds
filter on the date column.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two fixes:
1. dim_locations now sources venues from dim_venues (deduplicated OSM + Playtomic)
instead of stg_padel_courts (OSM only). Playtomic-only venues are no longer
invisible to spatial lookups.
2. Country-level supply saturation dampener on supply deficit component.
Saturated countries (Spain 7.4/100k) get dampened supply deficit (x0.30 → 12 pts max).
Emerging markets (Germany 0.24/100k) nearly unaffected (x0.98 → ~39 pts).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- stg_population_geonames: reject CJK/Cyrillic/Arabic city names via regex
(fixes "Seelow" showing Japanese characters on map)
- dim_locations: filter empty location names after trim
- location_profiles: defensive LEAST/GREATEST clamp on both scores (0-100)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Non-EU countries (AR, MX, AE, AU, etc.) previously got NULL for
median_income_pps and pli_construction, falling back to EU-calibrated
defaults (15K PPS, PLI=100) that produced wrong scores.
New World Bank WDI extractor fetches GNI per capita PPP and price level
ratio for 215 countries. dim_countries uses Germany as calibration anchor
to scale WB values into the Eurostat range (dynamic ratio, self-corrects
as both sources update). EU countries keep exact Eurostat values.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Merge supply gap (30pts) + catchment gap (15pts) → supply deficit (35pts, GREATEST)
Eliminates ~80% correlated double-count on a single signal.
- Add sports culture signal (10pts): tennis court density as racquet-sport adoption proxy.
Ceiling 50 courts/25km. Harmless when tennis data is zero (contributes 0).
- Add construction affordability (5pts): income relative to PLI construction costs.
Joins dim_countries.pli_construction. High income + low build cost = high score.
- Reduce economic power from 20 → 15pts to make room.
New weights: addressable market 25, economic power 15, supply deficit 35,
sports culture 10, construction affordability 5, market validation 10.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
A. location_profiles.sql: supply gap now uses GREATEST(catchment_padel_courts,
COALESCE(city_padel_venue_count, 0)) so Playtomic venues prevent cities like
Murcia/Cordoba/Gijon from receiving a full 30-pt supply gap bonus when their
OSM catchment count is zero. Expected ~10-15 pt drop for affected ES cities.
B. pseo_country_overview.sql: add population-weighted lat/lon centroid columns
so the markets map can use accurate country positions from this table.
C/D. content/routes.py + markets.html: query pseo_country_overview in the route
and pass as map_countries to the template, replacing the fetch('/api/...') call
with inline JSON. Map scores now match pseo_country_overview (pop-weighted),
and the page loads without an extra round-trip.
E. api.py: add @login_required to all 4 endpoints. Unauthenticated callers get
a 302 redirect to login instead of data.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two targeted fixes for inflated country scores (ES 83, SE 77):
1. pseo_country_overview: replace AVG() with population-weighted averages
for avg_opportunity_score and avg_market_score. Madrid/Barcelona now
dominate Spain's average instead of hundreds of 30K-town white-space
towns. Expected ES drop from ~83 to ~55-65.
2. location_profiles: replace dead sports culture component (10 pts,
tennis data all zeros) with market validation signal.
Split scored CTE into: market_scored → country_market → scored.
country_market aggregates AVG(market_score) per country from cities
with padel courts (market_score > 0), so zero-court locations don't
dilute the signal. ES (~60/100) → ~6 pts. SE (~35/100) → ~3.5 pts.
NULL → 0.5 neutral → 5 pts (untested market, not penalised).
Score budget unchanged: 25+20+30+15+10 = 100 pts.
No new models, no new data sources, no cycles.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Merges worktree-h3-catchment-index. dim_locations now computes h3_cell_res5
(res 5, ~8.5km edge). location_profiles and dim_locations updated;
old location_opportunity_profile.sql already removed on master.
Conflict: location_opportunity_profile.sql deleted on master, kept deletion
and applied h3_cell_res4→res5 rename to location_profiles instead.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Res 4 + k_ring(1) gave ~50-60km effective radius, causing Oldenburg to
absorb Bremen (40km away) and destroying score differentiation.
Res 5 + k_ring(1) gives ~24km — captures adjacent Gemeinden (Delmenhorst
at 15km) without bleeding into unrelated cities at 40km+.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Update transform CLAUDE.md source integration map and conformed
dimensions table. Update CHANGELOG with unified model + tooltip
changes. Fix stale comments in dim_cities.sql and serving README.
Subtask 5/5: documentation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Combines city_market_profile and location_opportunity_profile into a
single serving model at (country_code, geoname_id) grain. Both Market
Score and Opportunity Score computed per location. City data enriched
via LEFT JOIN dim_cities on geoname_id.
Subtask 1/5: create new model (old models not yet removed).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SQLMesh's extensions config supports dict form with 'repository' key,
which runs INSTALL h3 FROM community + LOAD h3 automatically at connect
time. No manual one-time install needed per machine.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add H3 res-4 regional catchment metrics (~15-18km radius, cell + 6
neighbours) to both the addressable market (25pts) and supply gap
(30pts) components of location_opportunity_profile.
Changes:
- config.yaml: add h3 to DuckDB extensions (requires one-time
INSTALL h3 FROM community on each machine)
- dim_locations: add h3_cell_res4 column via h3_latlng_to_cell()
- location_opportunity_profile: add hex_stats + catchment CTEs;
update score formula to use catchment_population and
catchment_padel_courts; expose catchment_population,
catchment_padel_courts, catchment_venues_per_100k as output cols
Motivation: local population underestimates functional market for
mid-size cities (e.g. Oldenburg ~170K misses surrounding Gemeinden).
H3 k_ring(1) captures the realistic driving-distance catchment
(~462km²) consistently across both score components.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Convert the availability chain (stg_playtomic_availability →
fct_availability_slot → fct_daily_availability) from FULL to
INCREMENTAL_BY_TIME_RANGE so sqlmesh run processes only new daily
intervals instead of re-reading all files.
Supervisor changes:
- run_transform(): plan prod --auto-apply → run prod (evaluates
missing cron intervals, picks up new data)
- git_pull_and_sync(): add plan prod --auto-apply before re-exec
so model code changes are applied on deploy
- supervisor.sh: same plan → run change
Staging model uses a date-scoped glob (@start_ds) to read only
the current interval's files. snapshot_date cast to DATE (was
VARCHAR) as required by time_column.
Clean up redundant TRY_CAST(snapshot_date AS DATE) in
venue_pricing_benchmarks since it's already DATE from foundation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Self-hosted Leaflet 1.9.4 maps across 4 placements: markets hub
country bubbles, country overview city bubbles, city venue dots, and
a standalone opportunity map. New /api blueprint with 4 JSON endpoints.
New city_venue_locations SQLMesh serving model. No CDN — GDPR-safe.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# Conflicts:
# CHANGELOG.md
New serving model: city_venue_locations joins dim_venues + dim_cities
to expose lat/lon/court_count per venue for the city dot map endpoint.
pseo_city_costs_de.sql: add c.lat, c.lon so city-cost articles have
city coordinates for the #city-map data attributes.
city-cost-de.md.jinja: add #city-map div (both DE and EN sections)
after the stats strip. Leaflet init handled by article_detail.html.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add real per-country cost data to ~30 calculator fields so pSEO articles
show country-specific CAPEX/OPEX instead of hardcoded DE defaults.
Extractor:
- eurostat.py: add 8 new datasets (nrg_pc_205, nrg_pc_203, lc_lci_lev,
5×prc_ppp_ind variants); add optional `dataset_code` field so multiple
dict entries can share one Eurostat API endpoint
Staging (4 new models):
- stg_electricity_prices — EUR/kWh by country, semi-annual
- stg_gas_prices — EUR/GJ by country, semi-annual
- stg_labour_costs — EUR/hour by country, annual (future staffed scenario)
- stg_price_levels — PLI indices (EU27=100) for 5 categories, annual
Foundation:
- dim_countries (new) — conformed country dimension; eliminates ~50-line CASE
blocks duplicated in dim_cities/dim_locations; computes ~29 calculator cost
override columns from PLI ratios and energy price ratios vs DE baseline;
NULL for DE so calculator falls through to DEFAULTS unchanged
- dim_cities — replace country_name/slug CASE blocks + country_income CTE
with JOIN dim_countries
- dim_locations — same refactor as dim_cities
Serving:
- pseo_city_costs_de — JOIN dim_countries; add 29 camelCase override columns
auto-applied by calculator (electricity, heating, rentSqm, hallCostSqm, …)
- planner_defaults — JOIN dim_countries; same 29 cost columns flow through
to /api/market-data endpoint
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add @slugify SQLMesh macro (STRIP_ACCENTS + ß→ss) replacing broken
inline REGEXP_REPLACE that dropped non-ASCII chars (Düsseldorf → d-sseldorf)
- Apply @slugify to dim_venues, dim_cities, dim_locations
- Fix Python slugify() to pre-replace ß→ss before NFKD normalization
- Add language prefix to B2B article market links (/markets/germany → /de/markets/germany)
- Change country overview top-5 ranking: venue count (not raw market_score)
for top cities, population for top opportunity cities
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace ABS() bbox predicates with BETWEEN in all three spatial CTEs
(nearest_padel, padel_local, tennis_nearby). BETWEEN enables DuckDB's
IEJoin (interval join) which is O((N+M) log M) vs the previous O(N×M)
nested-loop cross-join.
Add country pre-filters to restrict the left side from ~140K global
locations to ~20K rows for padel/tennis CTEs (~8 countries each).
Expected: ~50-200x speedup on the spatial CTE portion of the model.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Server has cities_global.jsonl.gz (JSONL), not cities_global.json.gz (blob).
TigerStyle clean break — removed blob_rows CTE and UNION ALL.
Simplified to a single SELECT directly from read_json.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
TigerStyle clean break — no backwards-compat shims for old file formats:
- stg_playtomic_{venues,opening_hours,resources}: glob updated from
*/*/tenants.jsonl.gz (2-level, old weekly) to */*/*/tenants.jsonl.gz
(3-level, new daily YYYY/MM/DD partition); blob tenants.json.gz CTE removed
- stg_playtomic_availability: morning_blob and recheck_blob CTEs removed;
only JSONL format (availability_*.jsonl.gz) is read going forward
Verified locally: stg_playtomic_venues evaluates to 14231 venues from
2026/02/28/tenants.jsonl.gz with 0 errors.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The overpass_tennis extractor has written JSONL-only since it was added.
The dual-format UNION ALL was backwards-compat debt that broke the
transform once no courts.json.gz files exist on the server:
IO Error: No files found that match the pattern
"data/landing/overpass_tennis/*/*/courts.json.gz"
Remove blob_elements CTE and the UNION ALL. Only read JSONL.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Propagates the conformed city key (city_slug) from dim_venues through the
full pricing pipeline, eliminating 3 fragile LOWER(TRIM(...)) fuzzy string
joins with deterministic key joins.
Changes (cascading, task-by-task):
- dim_venues: add city_slug computed column (REGEXP_REPLACE slug derivation)
- dim_venue_capacity: join foundation.dim_venues instead of stg_playtomic_venues;
carry city_slug alongside country_code/city
- fct_daily_availability: carry city_slug from dim_venue_capacity
- venue_pricing_benchmarks: carry city_slug from fct_daily_availability;
add to venue_stats GROUP BY and final SELECT/GROUP BY
- city_market_profile: join vpb on city_slug = city_slug (was LOWER(TRIM))
- planner_defaults: add city_slug to city_benchmarks CTE; join on city_slug
- pseo_city_pricing: join city_market_profile on city_slug (was LOWER(TRIM))
- pipeline_routes._DAG: dim_venue_capacity now depends on dim_venues, not stg_playtomic_venues
Result: dim_venues.city_slug → dim_cities.(country_code, city_slug) forms a
fully conformed geographic hierarchy with no fuzzy string comparisons.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- eurostat.py: add nama_10r_2hhinc dataset config; append filter params to
request URL so server pre-filters the large cube before download
- stg_regional_income.sql: new staging model — reads nama_10r_2hhinc.json.gz,
filters to NUTS-1 codes (3-char), normalises EL→GR / UK→GB
- dim_locations.sql: add admin1_to_nuts1 VALUES CTE (16 German Bundesländer)
+ regional_income CTE; final SELECT uses COALESCE(regional, country) income
- init_landing_seeds.py: add empty seed for nama_10r_2hhinc.json.gz
Munich/Bayern now scores ~29K PPS vs Chemnitz/Sachsen ~19K PPS instead of
both inheriting the same national average (~25.5K PPS).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a coordinate-based population lookup as a fallback when string name
matching fails (~29% of cities). Uses bbox pre-filter (0.14° ≈ 15 km) then
ST_Distance_Sphere to find the nearest GeoNames location in the same country.
Fixes localization mismatches: Milano≠Milan, Wien≠Vienna, München≠Munich.
Population cascade: Eurostat EU > US Census > ONS UK > GeoNames string >
GeoNames spatial > 0.
Coverage: 70.5% → 98.5% (5,401 / 5,481 cities with population > 0).
Key cities before/after:
Wien: 0 → 1,691,468
Milano: 0 → 1,371,498
München: already matched by string; verified still correct at 1,488,719
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PPS values are 18k–37k but /200 normalisation caused LEAST(1.0, 115)=1.0
for ALL countries — 20pts flat uplift, zero differentiation.
Fix: /35000 creates real country spread:
LU 20.0pts, DE 15.2pts, ES 12.8pts, GB 10.5pts (vs 20.0 everywhere before)
Default for missing data 100→15000 (developing-market assumption, ~0.43).
Header comment updated to document v2 formula behaviour.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Raises supply gap ceiling from 4/100k to 8/100k in
location_opportunity_profile.sql. The original 4/100k hard cliff
truncated opportunity scores to 0 for any city with ≥4 courts/100k,
but our data undercounts ~87% of real courts (FIP: 17,300 Spanish
courts vs 2,239 in our DB). Raising to 8/100k gives a gentler gradient
and fairer partial credit when density data is incomplete.
Documents existing formula behaviour discovered during analysis:
- Income PPS: country-level constants (18k-37k range) saturate the
/200 ceiling — all EU countries get flat 20/20 pts until city-level
income data lands.
- Catchment NULL: DuckDB LEAST(1.0, NULL) = 1.0 (ignores nulls), so
NULL nearest_padel_court_km already yields full 15 pts. COALESCE
fallback is dead code but harmless.
- Tennis courts within 25km: dim_locations data is empty (all 0 rows)
— 10-court threshold is correct for when data arrives, contributes
0 pts everywhere for now.
Effective score impact: minimal (99% of locations have 0 courts/100k,
so supply gap was already at max). Only ~1,050 dense-court cities
see a score increase (from 0 gap pts to partial gap pts).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- dim_cities: add geoname_id to geonames_pop CTE and final SELECT
Creates FK between dim_cities (city-with-padel-venues) and dim_locations (all GeoNames),
enabling joins to location_opportunity_profile for the first time.
- city_market_profile: pass geoname_id through base CTE and final SELECT
- pseo_city_costs_de: LEFT JOIN location_opportunity_profile on (country_code, geoname_id),
add opportunity_score to output columns
- pseo_country_overview: add avg_opportunity_score, top_opportunity_score, top_opportunity_slugs,
top_opportunity_names aggregates
Cities with no GeoNames name match get opportunity_score = NULL; templates guard with
{% if opportunity_score %}.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
`-m padelnomics.export_serving` doesn't resolve because src/ is not
installed as a package in the workspace. Use the direct script path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>