padelnomics

Author	SHA1	Message	Date
Deeman	a00c8727d7	fix(content): slugify transliteration + article links + country overview ranking - Add @slugify SQLMesh macro (STRIP_ACCENTS + ß→ss) replacing broken inline REGEXP_REPLACE that dropped non-ASCII chars (Düsseldorf → d-sseldorf) - Apply @slugify to dim_venues, dim_cities, dim_locations - Fix Python slugify() to pre-replace ß→ss before NFKD normalization - Add language prefix to B2B article market links (/markets/germany → /de/markets/germany) - Change country overview top-5 ranking: venue count (not raw market_score) for top cities, population for top opportunity cities Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 10:46:30 +01:00
Deeman	6774254cb0	feat(sqlmesh): add country code macros, apply across models Task 4/6: Add 5 macros to compress repeated country code patterns: - @country_name / @country_slug: 20-country CASE in dim_cities, dim_locations - @normalize_eurostat_country / @normalize_eurostat_nuts: EL→GR, UK→GB - @infer_country_from_coords: bounding box for 8 markets Net: +91 lines in macros, -135 lines in models = -44 lines total. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 07:45:52 +01:00
Deeman	fea4f85da3	perf(transform): optimize dim_locations spatial joins via IEJoin + country filters All checks were successful CI / test (push) Successful in 51s Details CI / tag (push) Successful in 2s Details Replace ABS() bbox predicates with BETWEEN in all three spatial CTEs (nearest_padel, padel_local, tennis_nearby). BETWEEN enables DuckDB's IEJoin (interval join) which is O((N+M) log M) vs the previous O(N×M) nested-loop cross-join. Add country pre-filters to restrict the left side from ~140K global locations to ~20K rows for padel/tennis CTEs (~8 countries each). Expected: ~50-200x speedup on the spatial CTE portion of the model. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-01 02:57:05 +01:00
Deeman	c3531bd75d	feat(data): Phase 2b complete — EU NUTS-2 spatial join + US state income - stg_regional_income: expanded NUTS-1+2 (LENGTH IN 3,4), nuts_code rename, nuts_level - stg_nuts2_boundaries: new — ST_Read GISCO GeoJSON, bbox columns for spatial pre-filter - stg_income_usa: new — Census ACS state-level income staging model - dim_locations: spatial join replaces admin1_to_nuts1 VALUES CTE; us_income CTE with PPS normalisation (income/80610×30000); income cascade: NUTS-2→NUTS-1→US state→country - init_landing_seeds: compress=False for ST_Read files; gisco GeoJSON + census income seeds - CHANGELOG + PROJECT.md updated Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 11:03:16 +01:00
Deeman	5ade38eeaf	feat(data): Phase 2a — NUTS-1 regional income for opportunity score - eurostat.py: add nama_10r_2hhinc dataset config; append filter params to request URL so server pre-filters the large cube before download - stg_regional_income.sql: new staging model — reads nama_10r_2hhinc.json.gz, filters to NUTS-1 codes (3-char), normalises EL→GR / UK→GB - dim_locations.sql: add admin1_to_nuts1 VALUES CTE (16 German Bundesländer) + regional_income CTE; final SELECT uses COALESCE(regional, country) income - init_landing_seeds.py: add empty seed for nama_10r_2hhinc.json.gz Munich/Bayern now scores ~29K PPS vs Chemnitz/Sachsen ~19K PPS instead of both inheriting the same national average (~25.5K PPS). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 10:26:15 +01:00
Deeman	55f179ba54	fix(transform): increase geonames object size limit and remove stale column ref - stg_population_geonames: add maximum_object_size=40MB to read_json() call; geonames cities_global.json.gz is ~30MB, exceeding DuckDB's 16MB default - dim_locations: remove stale 'population_year AS population_year' column ref; stg_population_geonames has ref_year, not population_year — caused BinderException Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-25 09:56:05 +01:00
Deeman	ebfdc84a94	feat(transform): add dim_locations + dual market scoring models dim_locations (foundation): - Seeded from stg_population_geonames (all locations, not venue-dependent) - Grain: (country_code, geoname_id) - Enriched with: padel venues within 5km, nearest court distance (ST_Distance_Sphere), tennis courts within 25km, country income - Covers zero-court Gemeinden for opportunity scoring location_opportunity_profile (serving) — Padelnomics Marktpotenzial-Score: - Answers "Where should I build?" — no padel_venue_count filter - Formula: population (25) + income (20) + supply gap inverted (30) + catchment gap (15) + tennis culture (10) = 100pts - Sorted by opportunity_score DESC city_market_profile (serving) — Padelnomics Marktreife-Score: - Add saturation discount (×0.85 when venues_per_100k > 8) - Update header comment to reference Marktreife-Score branding - Kept WHERE padel_venue_count > 0 (established markets only) - column name market_score unchanged (avoids downstream breakage) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-24 16:28:16 +01:00

7 Commits