Commit Graph

7 Commits

Author SHA1 Message Date
Deeman
a00c8727d7 fix(content): slugify transliteration + article links + country overview ranking
- Add @slugify SQLMesh macro (STRIP_ACCENTS + ß→ss) replacing broken
  inline REGEXP_REPLACE that dropped non-ASCII chars (Düsseldorf → d-sseldorf)
- Apply @slugify to dim_venues, dim_cities, dim_locations
- Fix Python slugify() to pre-replace ß→ss before NFKD normalization
- Add language prefix to B2B article market links (/markets/germany → /de/markets/germany)
- Change country overview top-5 ranking: venue count (not raw market_score)
  for top cities, population for top opportunity cities

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 10:46:30 +01:00
Deeman
6774254cb0 feat(sqlmesh): add country code macros, apply across models
Task 4/6: Add 5 macros to compress repeated country code patterns:
- @country_name / @country_slug: 20-country CASE in dim_cities, dim_locations
- @normalize_eurostat_country / @normalize_eurostat_nuts: EL→GR, UK→GB
- @infer_country_from_coords: bounding box for 8 markets
Net: +91 lines in macros, -135 lines in models = -44 lines total.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 07:45:52 +01:00
Deeman
fea4f85da3 perf(transform): optimize dim_locations spatial joins via IEJoin + country filters
All checks were successful
CI / test (push) Successful in 51s
CI / tag (push) Successful in 2s
Replace ABS() bbox predicates with BETWEEN in all three spatial CTEs
(nearest_padel, padel_local, tennis_nearby). BETWEEN enables DuckDB's
IEJoin (interval join) which is O((N+M) log M) vs the previous O(N×M)
nested-loop cross-join.

Add country pre-filters to restrict the left side from ~140K global
locations to ~20K rows for padel/tennis CTEs (~8 countries each).

Expected: ~50-200x speedup on the spatial CTE portion of the model.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 02:57:05 +01:00
Deeman
c3531bd75d feat(data): Phase 2b complete — EU NUTS-2 spatial join + US state income
- stg_regional_income: expanded NUTS-1+2 (LENGTH IN 3,4), nuts_code rename, nuts_level
- stg_nuts2_boundaries: new — ST_Read GISCO GeoJSON, bbox columns for spatial pre-filter
- stg_income_usa: new — Census ACS state-level income staging model
- dim_locations: spatial join replaces admin1_to_nuts1 VALUES CTE; us_income CTE with
  PPS normalisation (income/80610×30000); income cascade: NUTS-2→NUTS-1→US state→country
- init_landing_seeds: compress=False for ST_Read files; gisco GeoJSON + census income seeds
- CHANGELOG + PROJECT.md updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 11:03:16 +01:00
Deeman
5ade38eeaf feat(data): Phase 2a — NUTS-1 regional income for opportunity score
- eurostat.py: add nama_10r_2hhinc dataset config; append filter params to
  request URL so server pre-filters the large cube before download
- stg_regional_income.sql: new staging model — reads nama_10r_2hhinc.json.gz,
  filters to NUTS-1 codes (3-char), normalises EL→GR / UK→GB
- dim_locations.sql: add admin1_to_nuts1 VALUES CTE (16 German Bundesländer)
  + regional_income CTE; final SELECT uses COALESCE(regional, country) income
- init_landing_seeds.py: add empty seed for nama_10r_2hhinc.json.gz

Munich/Bayern now scores ~29K PPS vs Chemnitz/Sachsen ~19K PPS instead of
both inheriting the same national average (~25.5K PPS).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 10:26:15 +01:00
Deeman
55f179ba54 fix(transform): increase geonames object size limit and remove stale column ref
- stg_population_geonames: add maximum_object_size=40MB to read_json() call;
  geonames cities_global.json.gz is ~30MB, exceeding DuckDB's 16MB default
- dim_locations: remove stale 'population_year AS population_year' column ref;
  stg_population_geonames has ref_year, not population_year — caused BinderException

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-25 09:56:05 +01:00
Deeman
ebfdc84a94 feat(transform): add dim_locations + dual market scoring models
dim_locations (foundation):
- Seeded from stg_population_geonames (all locations, not venue-dependent)
- Grain: (country_code, geoname_id)
- Enriched with: padel venues within 5km, nearest court distance (ST_Distance_Sphere),
  tennis courts within 25km, country income
- Covers zero-court Gemeinden for opportunity scoring

location_opportunity_profile (serving) — Padelnomics Marktpotenzial-Score:
- Answers "Where should I build?" — no padel_venue_count filter
- Formula: population (25) + income (20) + supply gap inverted (30) +
           catchment gap (15) + tennis culture (10) = 100pts
- Sorted by opportunity_score DESC

city_market_profile (serving) — Padelnomics Marktreife-Score:
- Add saturation discount (×0.85 when venues_per_100k > 8)
- Update header comment to reference Marktreife-Score branding
- Kept WHERE padel_venue_count > 0 (established markets only)
- column name market_score unchanged (avoids downstream breakage)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 16:28:16 +01:00