- stg_population_geonames: reject CJK/Cyrillic/Arabic city names via regex
(fixes "Seelow" showing Japanese characters on map)
- dim_locations: filter empty location names after trim
- location_profiles: defensive LEAST/GREATEST clamp on both scores (0-100)
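
A minimal Python sketch of the three guards above, for illustration only. The script ranges, `NON_LATIN`, `is_displayable_name` and `clamp_score` are assumptions; the actual regex in stg_population_geonames is not shown in this message.

```python
import re

# Assumed Unicode blocks; the staging model's real pattern may differ.
NON_LATIN = re.compile(
    "["
    "\u0400-\u04FF"  # Cyrillic
    "\u0600-\u06FF"  # Arabic
    "\u3040-\u30FF"  # Hiragana and Katakana
    "\u4E00-\u9FFF"  # CJK Unified Ideographs
    "\uAC00-\uD7AF"  # Hangul syllables
    "]"
)


def is_displayable_name(name: str) -> bool:
    """Reject names that are empty after trim or contain a non-Latin script."""
    return bool(name.strip()) and NON_LATIN.search(name) is None


def clamp_score(score: float) -> float:
    """Python equivalent of LEAST(100, GREATEST(0, score))."""
    return max(0.0, min(100.0, score))
```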
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Non-EU countries (AR, MX, AE, AU, etc.) previously got NULL for
median_income_pps and pli_construction, falling back to EU-calibrated
defaults (15K PPS, PLI=100) that produced wrong scores.
New World Bank WDI extractor fetches GNI per capita PPP and price level
ratio for 215 countries. dim_countries uses Germany as calibration anchor
to scale WB values into the Eurostat range (dynamic ratio, self-corrects
as both sources update). EU countries keep exact Eurostat values.
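
The anchor scaling can be sketched roughly as follows. Function and parameter names are hypothetical, and the sketch assumes the ratio is simply the Eurostat value for Germany divided by the World Bank value for Germany.

```python
def scale_to_eurostat(wb_value, wb_germany, eurostat_germany):
    """Scale a World Bank value into the Eurostat range via the Germany anchor."""
    ratio = eurostat_germany / wb_germany  # recomputed whenever either source updates
    return wb_value * ratio


def resolve_metric(eurostat_value, wb_value, wb_germany, eurostat_germany):
    """EU countries keep the exact Eurostat value; others get the scaled WB value."""
    if eurostat_value is not None:
        return eurostat_value
    if wb_value is None:
        return None  # no data from either source
    return scale_to_eurostat(wb_value, wb_germany, eurostat_germany)
```

Because Germany appears in both sources, the ratio needs no hand-maintained constant.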
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Merge supply gap (30pts) + catchment gap (15pts) → supply deficit (35pts, GREATEST)
Eliminates ~80% correlated double-count on a single signal.
- Add sports culture signal (10pts): tennis court density as racquet-sport adoption proxy.
Ceiling 50 courts/25km. Harmless when tennis data is zero (contributes 0).
- Add construction affordability (5pts): income relative to PLI construction costs.
Joins dim_countries.pli_construction. High income + low build cost = high score.
- Reduce economic power from 20 → 15pts to make room.
New weights: addressable market 25, economic power 15, supply deficit 35,
sports culture 10, construction affordability 5, market validation 10.
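
A quick sanity sketch of the new budget and the GREATEST merge. The dictionary keys and the assumption that both gap inputs are already normalised to 0..1 are illustrative, not taken from the model.

```python
# Weights as listed above; they must add up to 100 points.
WEIGHTS = {
    "addressable_market": 25,
    "economic_power": 15,
    "supply_deficit": 35,
    "sports_culture": 10,
    "construction_affordability": 5,
    "market_validation": 10,
}


def supply_deficit_points(supply_gap, catchment_gap):
    """Take the larger of the two correlated signals instead of adding both."""
    return WEIGHTS["supply_deficit"] * max(supply_gap, catchment_gap)
```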
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
A. location_profiles.sql: supply gap now uses GREATEST(catchment_padel_courts,
COALESCE(city_padel_venue_count, 0)) so Playtomic venues prevent cities like
Murcia/Cordoba/Gijon from receiving a full 30-pt supply gap bonus when their
OSM catchment count is zero. Expected ~10-15 pt drop for affected ES cities.
B. pseo_country_overview.sql: add population-weighted lat/lon centroid columns
so the markets map can use accurate country positions from this table.
C/D. content/routes.py + markets.html: query pseo_country_overview in the route
and pass as map_countries to the template, replacing the fetch('/api/...') call
with inline JSON. Map scores now match pseo_country_overview (pop-weighted),
and the page loads without an extra round-trip.
E. api.py: add @login_required to all 4 endpoints. Unauthenticated callers get
a 302 redirect to login instead of data.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two targeted fixes for inflated country scores (ES 83, SE 77):
1. pseo_country_overview: replace AVG() with population-weighted averages
for avg_opportunity_score and avg_market_score. Madrid/Barcelona now
dominate Spain's average instead of hundreds of ~30K-population
white-space towns. Expected ES drop from ~83 to ~55-65.
2. location_profiles: replace dead sports culture component (10 pts,
tennis data all zeros) with market validation signal.
Split scored CTE into: market_scored → country_market → scored.
country_market aggregates AVG(market_score) per country from cities
with padel courts (market_score > 0), so zero-court locations don't
dilute the signal. ES (~60/100) → ~6 pts. SE (~35/100) → ~3.5 pts.
NULL → 0.5 neutral → 5 pts (untested market, not penalised).
Score budget unchanged: 25+20+30+15+10 = 100 pts.
No new models, no new data sources, no cycles.
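
Both fixes can be sketched in a few lines of Python. Function names are hypothetical; the 10-point cap and the 0.5 neutral value follow the numbers quoted above.

```python
def weighted_average(scores_and_populations):
    """Population-weighted mean of (score, population) pairs."""
    total_pop = sum(pop for _, pop in scores_and_populations)
    if total_pop == 0:
        return None
    return sum(score * pop for score, pop in scores_and_populations) / total_pop


def market_validation_points(country_avg_market_score, max_points=10.0):
    """Country average (0-100) scaled to max_points; NULL maps to a neutral 0.5."""
    if country_avg_market_score is None:
        normalised = 0.5  # untested market, not penalised
    else:
        normalised = country_avg_market_score / 100.0
    return max_points * normalised
```

With one city of 3,000,000 at score 80 and one town of 30,000 at score 40, the plain mean is 60 while the weighted mean stays close to 80.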
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Merges worktree-h3-catchment-index. dim_locations now computes h3_cell_res5
(res 5, ~8.5km edge). location_profiles and dim_locations updated;
old location_opportunity_profile.sql already removed on master.
Conflict: location_opportunity_profile.sql deleted on master, kept deletion
and applied h3_cell_res4→res5 rename to location_profiles instead.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Res 4 + k_ring(1) gave ~50-60km effective radius, causing Oldenburg to
absorb Bremen (40km away) and destroying score differentiation.
Res 5 + k_ring(1) gives ~24km — captures adjacent Gemeinden (Delmenhorst
at 15km) without bleeding into unrelated cities at 40km+.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Update transform CLAUDE.md source integration map and conformed
dimensions table. Update CHANGELOG with unified model + tooltip
changes. Fix stale comments in dim_cities.sql and serving README.
Subtask 5/5: documentation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Combines city_market_profile and location_opportunity_profile into a
single serving model at (country_code, geoname_id) grain. Both Market
Score and Opportunity Score computed per location. City data enriched
via LEFT JOIN dim_cities on geoname_id.
Subtask 1/5: create new model (old models not yet removed).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SQLMesh's extensions config supports dict form with 'repository' key,
which runs INSTALL h3 FROM community + LOAD h3 automatically at connect
time. No manual one-time install needed per machine.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add H3 res-4 regional catchment metrics (~15-18km radius, cell + 6
neighbours) to both the addressable market (25pts) and supply gap
(30pts) components of location_opportunity_profile.
Changes:
- config.yaml: add h3 to DuckDB extensions (requires one-time
INSTALL h3 FROM community on each machine)
- dim_locations: add h3_cell_res4 column via h3_latlng_to_cell()
- location_opportunity_profile: add hex_stats + catchment CTEs;
update score formula to use catchment_population and
catchment_padel_courts; expose catchment_population,
catchment_padel_courts, catchment_venues_per_100k as output cols
Motivation: local population underestimates functional market for
mid-size cities (e.g. Oldenburg ~170K misses surrounding Gemeinden).
H3 k_ring(1) captures the realistic driving-distance catchment
(~462km²) consistently across both score components.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Convert the availability chain (stg_playtomic_availability →
fct_availability_slot → fct_daily_availability) from FULL to
INCREMENTAL_BY_TIME_RANGE so sqlmesh run processes only new daily
intervals instead of re-reading all files.
Supervisor changes:
- run_transform(): plan prod --auto-apply → run prod (evaluates
missing cron intervals, picks up new data)
- git_pull_and_sync(): add plan prod --auto-apply before re-exec
so model code changes are applied on deploy
- supervisor.sh: same plan → run change
Staging model uses a date-scoped glob (@start_ds) to read only
the current interval's files. snapshot_date cast to DATE (was
VARCHAR) as required by time_column.
Clean up redundant TRY_CAST(snapshot_date AS DATE) in
venue_pricing_benchmarks since it's already DATE from foundation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Self-hosted Leaflet 1.9.4 maps across 4 placements: markets hub
country bubbles, country overview city bubbles, city venue dots, and
a standalone opportunity map. New /api blueprint with 4 JSON endpoints.
New city_venue_locations SQLMesh serving model. No CDN — GDPR-safe.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# Conflicts:
# CHANGELOG.md
New serving model: city_venue_locations joins dim_venues + dim_cities
to expose lat/lon/court_count per venue for the city dot map endpoint.
pseo_city_costs_de.sql: add c.lat, c.lon so city-cost articles have
city coordinates for the #city-map data attributes.
city-cost-de.md.jinja: add #city-map div (both DE and EN sections)
after the stats strip. Leaflet init handled by article_detail.html.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add real per-country cost data to ~30 calculator fields so pSEO articles
show country-specific CAPEX/OPEX instead of hardcoded DE defaults.
Extractor:
- eurostat.py: add 8 new datasets (nrg_pc_205, nrg_pc_203, lc_lci_lev,
5×prc_ppp_ind variants); add optional `dataset_code` field so multiple
dict entries can share one Eurostat API endpoint
Staging (4 new models):
- stg_electricity_prices — EUR/kWh by country, semi-annual
- stg_gas_prices — EUR/GJ by country, semi-annual
- stg_labour_costs — EUR/hour by country, annual (future staffed scenario)
- stg_price_levels — PLI indices (EU27=100) for 5 categories, annual
Foundation:
- dim_countries (new) — conformed country dimension; eliminates ~50-line CASE
blocks duplicated in dim_cities/dim_locations; computes ~29 calculator cost
override columns from PLI ratios and energy price ratios vs DE baseline;
NULL for DE so calculator falls through to DEFAULTS unchanged
- dim_cities — replace country_name/slug CASE blocks + country_income CTE
with JOIN dim_countries
- dim_locations — same refactor as dim_cities
Serving:
- pseo_city_costs_de — JOIN dim_countries; add 29 camelCase override columns
auto-applied by calculator (electricity, heating, rentSqm, hallCostSqm, …)
- planner_defaults — JOIN dim_countries; same 29 cost columns flow through
to /api/market-data endpoint
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add @slugify SQLMesh macro (STRIP_ACCENTS + ß→ss) replacing broken
inline REGEXP_REPLACE that dropped non-ASCII chars (Düsseldorf → d-sseldorf)
- Apply @slugify to dim_venues, dim_cities, dim_locations
- Fix Python slugify() to pre-replace ß→ss before NFKD normalization
- Add language prefix to B2B article market links (/markets/germany → /de/markets/germany)
- Change country overview top-5 ranking: venue count (not raw market_score)
for top cities, population for top opportunity cities
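
A minimal sketch of the Python-side fix, assuming a slug is lowercase ASCII with hyphens. The ß replacement has to happen first because NFKD does not decompose ß, so an ASCII encode would drop it.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Sketch of an accent-stripping slugify; not the project's exact function."""
    text = text.replace("ß", "ss")  # NFKD leaves ß untouched, so handle it first
    decomposed = unicodedata.normalize("NFKD", text)  # ü becomes u + combining mark
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")
```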
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace ABS() bbox predicates with BETWEEN in all three spatial CTEs
(nearest_padel, padel_local, tennis_nearby). BETWEEN enables DuckDB's
IEJoin (inequality join), which is O((N+M) log M) vs the previous O(N×M)
nested-loop cross-join.
Add country pre-filters to restrict the left side from ~140K global
locations to ~20K rows for padel/tennis CTEs (~8 countries each).
Expected: ~50-200x speedup on the spatial CTE portion of the model.
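
To illustrate why a range predicate helps, here is a small Python sketch that sorts one side and uses binary search to find bbox candidates. This is not DuckDB's IEJoin, only an analogy for how range bounds avoid comparing every pair; all names and the tuple layout are assumptions.

```python
from bisect import bisect_left, bisect_right


def bbox_candidates(locations, courts, half_width_deg):
    """Map each location id to the court ids inside its lat/lon bounding box.

    locations and courts are lists of (id, lat, lon) tuples.
    """
    courts_by_lat = sorted(courts, key=lambda c: c[1])
    lats = [c[1] for c in courts_by_lat]
    result = {}
    for loc_id, lat, lon in locations:
        # lat BETWEEN lat - d AND lat + d, resolved by binary search
        lo = bisect_left(lats, lat - half_width_deg)
        hi = bisect_right(lats, lat + half_width_deg)
        result[loc_id] = [
            court_id
            for court_id, _, court_lon in courts_by_lat[lo:hi]
            if lon - half_width_deg <= court_lon <= lon + half_width_deg
        ]
    return result
```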
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Server has cities_global.jsonl.gz (JSONL), not cities_global.json.gz (blob).
TigerStyle clean break — removed blob_rows CTE and UNION ALL.
Simplified to a single SELECT directly from read_json.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
TigerStyle clean break — no backwards-compat shims for old file formats:
- stg_playtomic_{venues,opening_hours,resources}: glob updated from
*/*/tenants.jsonl.gz (2-level, old weekly) to */*/*/tenants.jsonl.gz
(3-level, new daily YYYY/MM/DD partition); blob tenants.json.gz CTE removed
- stg_playtomic_availability: morning_blob and recheck_blob CTEs removed;
only JSONL format (availability_*.jsonl.gz) is read going forward
Verified locally: stg_playtomic_venues evaluates to 14231 venues from
2026/02/28/tenants.jsonl.gz with 0 errors.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The overpass_tennis extractor has written JSONL-only since it was added.
The dual-format UNION ALL was backwards-compat debt that broke the
transform once no courts.json.gz files existed on the server:
IO Error: No files found that match the pattern
"data/landing/overpass_tennis/*/*/courts.json.gz"
Remove blob_elements CTE and the UNION ALL. Only read JSONL.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Propagates the conformed city key (city_slug) from dim_venues through the
full pricing pipeline, replacing 3 fragile LOWER(TRIM(...)) fuzzy string
joins with deterministic key joins.
Changes (cascading, task-by-task):
- dim_venues: add city_slug computed column (REGEXP_REPLACE slug derivation)
- dim_venue_capacity: join foundation.dim_venues instead of stg_playtomic_venues;
carry city_slug alongside country_code/city
- fct_daily_availability: carry city_slug from dim_venue_capacity
- venue_pricing_benchmarks: carry city_slug from fct_daily_availability;
add to venue_stats GROUP BY and final SELECT/GROUP BY
- city_market_profile: join vpb on city_slug = city_slug (was LOWER(TRIM))
- planner_defaults: add city_slug to city_benchmarks CTE; join on city_slug
- pseo_city_pricing: join city_market_profile on city_slug (was LOWER(TRIM))
- pipeline_routes._DAG: dim_venue_capacity now depends on dim_venues, not stg_playtomic_venues
Result: dim_venues.city_slug → dim_cities.(country_code, city_slug) forms a
fully conformed geographic hierarchy with no fuzzy string comparisons.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- eurostat.py: add nama_10r_2hhinc dataset config; append filter params to
request URL so server pre-filters the large cube before download
- stg_regional_income.sql: new staging model — reads nama_10r_2hhinc.json.gz,
filters to NUTS-1 codes (3-char), normalises EL→GR / UK→GB
- dim_locations.sql: add admin1_to_nuts1 VALUES CTE (16 German Bundesländer)
+ regional_income CTE; final SELECT uses COALESCE(regional, country) income
- init_landing_seeds.py: add empty seed for nama_10r_2hhinc.json.gz
Munich/Bayern now scores ~29K PPS vs Chemnitz/Sachsen ~19K PPS instead of
both inheriting the same national average (~25.5K PPS).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a coordinate-based population lookup as a fallback when string name
matching fails (~29% of cities). Uses bbox pre-filter (0.14° ≈ 15 km) then
ST_Distance_Sphere to find the nearest GeoNames location in the same country.
Fixes localization mismatches: Milano≠Milan, Wien≠Vienna, München≠Munich.
Population cascade: Eurostat EU > US Census > ONS UK > GeoNames string >
GeoNames spatial > 0.
Coverage: 70.5% → 98.5% (5,401 / 5,481 cities with population > 0).
Key cities before/after:
Wien: 0 → 1,691,468
Milano: 0 → 1,371,498
München: already matched by string; verified still correct at 1,488,719
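
The spatial fallback and the cascade can be sketched as below. The haversine formula stands in for ST_Distance_Sphere, and the dict layout and function names are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0
BBOX_DEG = 0.14  # roughly 15 km, as in the bbox pre-filter above


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_geoname(city, geonames):
    """Nearest GeoNames row in the same country inside the bbox, else None."""
    best, best_km = None, None
    for g in geonames:
        if g["country_code"] != city["country_code"]:
            continue
        if abs(g["lat"] - city["lat"]) > BBOX_DEG or abs(g["lon"] - city["lon"]) > BBOX_DEG:
            continue  # cheap pre-filter before the distance computation
        km = haversine_km(city["lat"], city["lon"], g["lat"], g["lon"])
        if best_km is None or km < best_km:
            best, best_km = g, km
    return best


def resolve_population(*sources_in_priority_order):
    """First positive value wins; 0 when every source is missing."""
    for value in sources_in_priority_order:
        if value is not None and value > 0:
            return value
    return 0
```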
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PPS values are 18k–37k but /200 normalisation caused LEAST(1.0, 115)=1.0
for ALL countries — 20pts flat uplift, zero differentiation.
Fix: /35000 creates real country spread:
LU 20.0pts, DE 15.2pts, ES 12.8pts, GB 10.5pts (vs 20.0 everywhere before)
Default for missing data 100→15000 (developing-market assumption, ~0.43).
Header comment updated to document v2 formula behaviour.
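
The saturation is easy to reproduce in a short Python sketch. The function name and signature are hypothetical; the 20-point cap, both divisors and the 15000 default come from the text above.

```python
def income_points(median_income_pps, divisor, max_points=20.0, default_pps=15000.0):
    """max_points * LEAST(1.0, pps / divisor), with a default for missing data."""
    pps = default_pps if median_income_pps is None else median_income_pps
    return max_points * min(1.0, pps / divisor)


# With /200, any value in the 18k-37k range saturates: 23000 / 200 = 115, capped to 1.0.
old_low = income_points(18000, 200)
old_high = income_points(37000, 200)

# With /35000 the same inputs spread out.
new_low = income_points(18000, 35000)
new_high = income_points(37000, 35000)
```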
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Raises supply gap ceiling from 4/100k to 8/100k in
location_opportunity_profile.sql. The original 4/100k hard cliff
truncated supply gap points to 0 for any city with ≥4 courts/100k,
but our data misses ~87% of real courts (FIP: 17,300 Spanish
courts vs 2,239 in our DB). Raising to 8/100k gives a gentler gradient
and fairer partial credit when density data is incomplete.
Documents existing formula behaviour discovered during analysis:
- Income PPS: country-level constants (18k-37k range) saturate the
/200 ceiling — all EU countries get flat 20/20 pts until city-level
income data lands.
- Catchment NULL: DuckDB LEAST(1.0, NULL) = 1.0 (ignores nulls), so
NULL nearest_padel_court_km already yields full 15 pts. COALESCE
fallback is dead code but harmless.
- Tennis courts within 25km: dim_locations data is empty (0 on every row)
— 10-court threshold is correct for when data arrives, contributes
0 pts everywhere for now.
Effective score impact: minimal (99% of locations have 0 courts/100k,
so supply gap was already at max). Only ~1,050 dense-court cities
see a score increase (from 0 gap pts to partial gap pts).
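
A sketch of the ceiling change, assuming the gap falls linearly from full points at 0 courts/100k to 0 at the ceiling. The linear shape and both function names are assumptions; the model's exact formula is not reproduced here. The second helper mirrors the NULL-ignoring LEAST behaviour described above.

```python
def supply_gap_points(courts_per_100k, ceiling, max_points=30.0):
    """Assumed linear gap: full points at zero density, none at or above the ceiling."""
    gap = max(0.0, 1.0 - courts_per_100k / ceiling)
    return max_points * gap


def least_ignoring_nulls(*values):
    """LEAST that skips None, so least_ignoring_nulls(1.0, None) is 1.0."""
    present = [v for v in values if v is not None]
    return min(present) if present else None
```

Under this assumption a city at 4 courts/100k moves from 0 to 15 gap points when the ceiling goes from 4 to 8.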
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- dim_cities: add geoname_id to geonames_pop CTE and final SELECT
Creates FK between dim_cities (city-with-padel-venues) and dim_locations (all GeoNames),
enabling joins to location_opportunity_profile for the first time.
- city_market_profile: pass geoname_id through base CTE and final SELECT
- pseo_city_costs_de: LEFT JOIN location_opportunity_profile on (country_code, geoname_id),
add opportunity_score to output columns
- pseo_country_overview: add avg_opportunity_score, top_opportunity_score, top_opportunity_slugs,
top_opportunity_names aggregates
Cities with no GeoNames name match get opportunity_score = NULL; templates guard with
{% if opportunity_score %}.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
`-m padelnomics.export_serving` doesn't resolve because src/ is not
installed as a package in the workspace. Use the direct script path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both stg_playtomic_resources and stg_playtomic_opening_hours lacked QUALIFY ROW_NUMBER()
dedup despite declaring a grain. When both tenants.json.gz (old) and tenants.jsonl.gz (new)
exist for the same month, the UNION ALL produced exactly 2× rows.
Fixes:
- stg_playtomic_resources: QUALIFY ROW_NUMBER() OVER (PARTITION BY tenant_id, resource_id)
- stg_playtomic_opening_hours: QUALIFY ROW_NUMBER() OVER (PARTITION BY tenant_id, day_of_week)
- playtomic_tenants.py: skip if old blob OR new JSONL already exists for the month,
preventing same-month dual-format writes that trigger the duplicate
Row counts after fix: ~43.8K resources, ~93.4K opening_hours (was 87.6K, 186.8K).
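
The effect of the QUALIFY dedup can be sketched in Python as keeping the first row per grain key. The helper name and row layout are illustrative; which duplicate the SQL keeps depends on the window's ordering, which is not stated here.

```python
def dedup_by_grain(rows, grain):
    """Keep the first row seen for each grain key, like ROW_NUMBER() = 1."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[col] for col in grain)
        if key in seen:
            continue  # same tenant/resource already emitted by the other format
        seen.add(key)
        kept.append(row)
    return kept
```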
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace single global Overpass query (150K+ elements, times out) with
10 regional bbox queries (~10-40K elements each, 150s server / 180s client).
- REGIONS: 10 bboxes covering all continents
- Crash recovery: working.jsonl accumulates per-region results;
already_seen_ids deduplication skips re-written elements on restart
- Overlapping bbox elements deduped by OSM id across regions
- Retry per region: up to 2 retries with 30s cooldown
- Polite 5s inter-region delay
- Skip if courts.jsonl.gz or courts.json.gz already exists for the month
stg_tennis_courts: UNION ALL transition (jsonl_elements + blob_elements)
- jsonl_elements: JSONL, explicit columns, COALESCE lat/lon with center coords
(supports both node direct lat/lon and way/relation Overpass out center)
- blob_elements: existing UNNEST(elements) pattern, unchanged
- Removed osm_type='node' filter — ways/relations now usable via center coords
- Dedup on (osm_id, extracted_date DESC) unchanged
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- availability_{date}.jsonl.gz replaces .json.gz for morning snapshots
- Each JSONL line = one venue object with date + captured_at_utc injected
- Eliminates in-memory consolidation: working.jsonl IS the final file
(compress_jsonl_atomic at end instead of write_gzip_atomic blob)
- Crash recovery unchanged: working.jsonl accumulates via flush_partial_batch
- _load_morning_availability tries .jsonl.gz first, falls back to .json.gz
- Skip check covers both formats during transition
- Recheck files stay blob format (small, infrequent)
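
The write path above can be sketched as follows. `append_jsonl` and `compress_jsonl_atomic_sketch` are hypothetical stand-ins for flush_partial_batch and compress_jsonl_atomic, whose real implementations are not shown; the sketch assumes "atomic" means compressing to a temp file and renaming it into place.

```python
import gzip
import json
import os
import shutil
from pathlib import Path


def append_jsonl(working_path: Path, records) -> None:
    """Append one JSON object per line; partial progress survives a crash."""
    with open(working_path, "a", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")


def compress_jsonl_atomic_sketch(working_path: Path, dest_path: Path) -> None:
    """Gzip the working file to a temp path, then rename it into place."""
    tmp_path = dest_path.with_name(dest_path.name + ".tmp")
    with open(working_path, "rb") as src, gzip.open(tmp_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams; no in-memory consolidation
    os.replace(tmp_path, dest_path)  # readers never see a half-written file
    working_path.unlink()
```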
stg_playtomic_availability: UNION ALL transition (morning_jsonl + morning_blob + recheck_blob)
- morning_jsonl: read_json JSONL, tenant_id direct column, no outer UNNEST
- morning_blob / recheck_blob: subquery + LATERAL UNNEST (unchanged semantics)
- All three produce (snapshot_date, captured_at_utc, snapshot_type, recheck_hour, tenant_id, slots_json)
- Downstream raw_resources / raw_slots CTEs unchanged
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- playtomic_tenants.py: write each tenant as a JSONL line after dedup,
compress via compress_jsonl_atomic → tenants.jsonl.gz
- playtomic_availability.py: update _load_tenant_ids() to prefer
tenants.jsonl.gz, fall back to tenants.json.gz (transition)
- stg_playtomic_venues.sql: UNION ALL jsonl+blob CTEs for transition;
JSONL reads top-level columns directly, no UNNEST(tenants) needed
- stg_playtomic_resources.sql: same UNION ALL pattern, single UNNEST
for resources in JSONL path vs double UNNEST in blob path
- stg_playtomic_opening_hours.sql: same UNION ALL pattern, opening_hours
as top-level JSON column in JSONL path
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>