Commit Graph

10 Commits

Author SHA1 Message Date
Deeman
4e4ff61699 feat(transform): H3 catchment index for Marktpotenzial-Score v3
Add H3 res-4 regional catchment metrics (~15-18km radius, cell + 6
neighbours) to both the addressable market (25pts) and supply gap
(30pts) components of location_opportunity_profile.

Changes:
- config.yaml: add h3 to DuckDB extensions (requires one-time
  INSTALL h3 FROM community on each machine)
- dim_locations: add h3_cell_res4 column via h3_latlng_to_cell()
- location_opportunity_profile: add hex_stats + catchment CTEs;
  update score formula to use catchment_population and
  catchment_padel_courts; expose catchment_population,
  catchment_padel_courts, catchment_venues_per_100k as output cols

Motivation: local population underestimates functional market for
mid-size cities (e.g. Oldenburg ~170K misses surrounding Gemeinden).
H3 k_ring(1) captures the realistic driving-distance catchment
(~462km²) consistently across both score components.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 10:19:43 +01:00
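The catchment aggregation described in 4e4ff61699 (a cell plus its six ring-1 neighbours) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the SQLMesh model itself: axial hex-grid coordinates stand in for real H3 cell IDs, the names `ring1` and `catchment_metrics` are hypothetical, and the per-100k figure is computed from court counts because the commit does not spell out its exact definition.

```python
from collections import defaultdict

# Axial-coordinate offsets of the six neighbours of a hexagon.
# Assumption: a stand-in for H3 k_ring(1); real H3 cells are 64-bit indexes.
NEIGHBOUR_OFFSETS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def ring1(cell):
    """Return the cell itself plus its six adjacent cells."""
    q, r = cell
    return [cell] + [(q + dq, r + dr) for dq, dr in NEIGHBOUR_OFFSETS]


def catchment_metrics(locations):
    """locations: iterable of (cell, population, padel_courts) tuples.

    Returns {cell: (catchment_population, catchment_padel_courts, per_100k)}
    for every cell that contains at least one location.
    """
    # Step 1 (the hex_stats idea): aggregate per cell.
    pop = defaultdict(int)
    courts = defaultdict(int)
    for cell, population, padel_courts in locations:
        pop[cell] += population
        courts[cell] += padel_courts

    # Step 2 (the catchment idea): sum each cell with its ring-1 neighbours.
    out = {}
    for cell in list(pop):
        c_pop = sum(pop.get(n, 0) for n in ring1(cell))
        c_courts = sum(courts.get(n, 0) for n in ring1(cell))
        per_100k = c_courts / c_pop * 100_000 if c_pop else None
        out[cell] = (c_pop, c_courts, per_100k)
    return out
```

Because both score components read from the same catchment totals, a mid-size city and its surrounding Gemeinden are counted once and consistently.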
Deeman
61c197d233 merge(worktree): individualise article costs with per-country Eurostat data + tiered proxy tenant work
# Conflicts:
#	CHANGELOG.md
#	transform/sqlmesh_padelnomics/models/foundation/dim_cities.sql
#	transform/sqlmesh_padelnomics/models/foundation/dim_locations.sql
2026-03-04 12:44:56 +01:00
Deeman
2e68cfbe4f feat(transform): individualise article costs with per-country Eurostat data
Add real per-country cost data to ~30 calculator fields so pSEO articles
show country-specific CAPEX/OPEX instead of hardcoded DE defaults.

Extractor:
- eurostat.py: add 8 new datasets (nrg_pc_205, nrg_pc_203, lc_lci_lev,
  5×prc_ppp_ind variants); add optional `dataset_code` field so multiple
  dict entries can share one Eurostat API endpoint

Staging (4 new models):
- stg_electricity_prices — EUR/kWh by country, semi-annual
- stg_gas_prices         — EUR/GJ by country, semi-annual
- stg_labour_costs       — EUR/hour by country, annual (future staffed scenario)
- stg_price_levels       — PLI indices (EU27=100) for 5 categories, annual

Foundation:
- dim_countries (new) — conformed country dimension; eliminates ~50-line CASE
  blocks duplicated in dim_cities/dim_locations; computes ~29 calculator cost
  override columns from PLI ratios and energy price ratios vs DE baseline;
  NULL for DE so calculator falls through to DEFAULTS unchanged
- dim_cities — replace country_name/slug CASE blocks + country_income CTE
  with JOIN dim_countries
- dim_locations — same refactor as dim_cities

Serving:
- pseo_city_costs_de — JOIN dim_countries; add 29 camelCase override columns
  auto-applied by calculator (electricity, heating, rentSqm, hallCostSqm, …)
- planner_defaults — JOIN dim_countries; same 29 cost columns flow through
  to /api/market-data endpoint

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-04 10:09:48 +01:00
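The "NULL for DE" convention in 2e68cfbe4f can be sketched as follows. The commit does not give the exact formula for each override column, so the simple ratio scaling below is an assumption, as are the function names and the sample price-level values.

```python
from typing import Optional

BASELINE = "DE"


def cost_override(country: str, de_default: float,
                  pli: dict[str, float]) -> Optional[float]:
    """Scale a DE default by the country's price level relative to DE.

    Returns None for the baseline country so the caller keeps its default.
    Assumption: plain ratio scaling; the real model may differ per column.
    """
    if country == BASELINE:
        return None
    return de_default * pli[country] / pli[BASELINE]


def effective_cost(country: str, de_default: float,
                   pli: dict[str, float]) -> float:
    """What a calculator would use: the override, else the DE default."""
    override = cost_override(country, de_default, pli)
    return de_default if override is None else override
```

Returning no value for the baseline keeps the German output unchanged, since the calculator never sees an override for DE.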
Deeman
a00c8727d7 fix(content): slugify transliteration + article links + country overview ranking
- Add @slugify SQLMesh macro (STRIP_ACCENTS + ß→ss) replacing broken
  inline REGEXP_REPLACE that dropped non-ASCII chars (Düsseldorf → d-sseldorf)
- Apply @slugify to dim_venues, dim_cities, dim_locations
- Fix Python slugify() to pre-replace ß→ss before NFKD normalization
- Add language prefix to B2B article market links (/markets/germany → /de/markets/germany)
- Change country overview top-5 ranking: venue count (not raw market_score)
  for top cities, population for top opportunity cities

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 10:46:30 +01:00
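The Python side of the slugify fix in a00c8727d7 can be illustrated with a short sketch. It is a guess at the shape of the fix rather than the project's actual function: only the ß→ss pre-replacement and the NFKD step come from the commit message, the rest is a common standard-library idiom.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """ASCII slug that transliterates accents instead of dropping them."""
    # NFKD does not decompose ß, so replace it first (and the capital form).
    text = text.replace("ß", "ss").replace("ẞ", "ss")
    # Split accented letters into base letter + combining mark ...
    decomposed = unicodedata.normalize("NFKD", text)
    # ... then drop the combining marks, keeping the base letters.
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Collapse every run of non-alphanumerics into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", stripped.lower()).strip("-")
```

With this ordering, `slugify("Düsseldorf")` yields `dusseldorf` rather than the broken `d-sseldorf`. Letters that have no Unicode decomposition (ø, for example) would still become a hyphen in this sketch.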
Deeman
6774254cb0 feat(sqlmesh): add country code macros, apply across models
Task 4/6: Add 5 macros to compress repeated country code patterns:
- @country_name / @country_slug: 20-country CASE in dim_cities, dim_locations
- @normalize_eurostat_country / @normalize_eurostat_nuts: EL→GR, UK→GB
- @infer_country_from_coords: bounding box for 8 markets
Net: +91 lines in macros, -135 lines in models = -44 lines total.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 07:45:52 +01:00
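The two normalisation macros from 6774254cb0 map Eurostat's country codes onto ISO codes. A Python equivalent might look like the sketch below; the function bodies are assumptions about what the SQL macros do, based only on the EL→GR and UK→GB mappings named in the commit.

```python
# Eurostat uses EL for Greece and UK for the United Kingdom;
# ISO 3166-1 alpha-2 uses GR and GB.
EUROSTAT_TO_ISO = {"EL": "GR", "UK": "GB"}


def normalize_eurostat_country(code: str) -> str:
    """Map a two-letter Eurostat country code to its ISO equivalent."""
    code = code.upper()
    return EUROSTAT_TO_ISO.get(code, code)


def normalize_eurostat_nuts(nuts_code: str) -> str:
    """Rewrite only the two-letter country prefix of a NUTS code."""
    prefix, rest = nuts_code[:2].upper(), nuts_code[2:]
    return EUROSTAT_TO_ISO.get(prefix, prefix) + rest
```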
Deeman
fea4f85da3 perf(transform): optimize dim_locations spatial joins via IEJoin + country filters
Replace ABS() bbox predicates with BETWEEN in all three spatial CTEs
(nearest_padel, padel_local, tennis_nearby). BETWEEN enables DuckDB's
IEJoin (interval join) which is O((N+M) log M) vs the previous O(N×M)
nested-loop cross-join.

Add country pre-filters to restrict the left side from ~140K global
locations to ~20K rows for padel/tennis CTEs (~8 countries each).

Expected: ~50-200x speedup on the spatial CTE portion of the model.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 02:57:05 +01:00
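The rewrite in fea4f85da3 relies on `ABS(a - b) <= d` and `b BETWEEN a - d AND a + d` selecting the same rows, with only the second form exposing a range that can be searched. The sketch below shows that equivalence in one dimension. It is not DuckDB's IEJoin algorithm, only an illustration of why a range predicate avoids comparing every pair; the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def nested_loop_join(left, right, d):
    """O(N*M): test every pair, as an opaque ABS() predicate forces."""
    return [(a, b) for a in left for b in right if abs(a - b) <= d]


def range_join(left, right, d):
    """Sort once, then binary-search each [a - d, a + d] interval."""
    ordered = sorted(right)
    out = []
    for a in left:
        lo = bisect_left(ordered, a - d)   # first b >= a - d
        hi = bisect_right(ordered, a + d)  # one past the last b <= a + d
        out.extend((a, b) for b in ordered[lo:hi])
    return out
```

The country pre-filter from the same commit is complementary: it shrinks `left` before either join strategy runs.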
Deeman
c3531bd75d feat(data): Phase 2b complete — EU NUTS-2 spatial join + US state income
- stg_regional_income: expanded NUTS-1+2 (LENGTH IN 3,4), nuts_code rename, nuts_level
- stg_nuts2_boundaries: new — ST_Read GISCO GeoJSON, bbox columns for spatial pre-filter
- stg_income_usa: new — Census ACS state-level income staging model
- dim_locations: spatial join replaces admin1_to_nuts1 VALUES CTE; us_income CTE with
  PPS normalisation (income/80610×30000); income cascade: NUTS-2→NUTS-1→US state→country
- init_landing_seeds: compress=False for ST_Read files; gisco GeoJSON + census income seeds
- CHANGELOG + PROJECT.md updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 11:03:16 +01:00
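The PPS normalisation and income cascade from c3531bd75d reduce to a rescale followed by a first-non-missing lookup. A minimal sketch, with hypothetical function names and only the two constants and the fallback order taken from the commit message:

```python
US_REFERENCE_INCOME = 80610  # divisor from the commit message
PPS_SCALE = 30000            # multiplier from the commit message


def us_income_to_pps(income_usd):
    """Rescale a US state income: income / 80610 * 30000."""
    return income_usd / US_REFERENCE_INCOME * PPS_SCALE


def resolve_income(nuts2=None, nuts1=None, us_state=None, country=None):
    """First non-missing value in the order NUTS-2, NUTS-1, US state, country."""
    for value in (nuts2, nuts1, us_state, country):
        if value is not None:
            return value
    return None
```

Checking `is not None` rather than truthiness mirrors SQL `COALESCE`, which skips NULL but keeps a legitimate zero.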
Deeman
5ade38eeaf feat(data): Phase 2a — NUTS-1 regional income for opportunity score
- eurostat.py: add nama_10r_2hhinc dataset config; append filter params to
  request URL so server pre-filters the large cube before download
- stg_regional_income.sql: new staging model — reads nama_10r_2hhinc.json.gz,
  filters to NUTS-1 codes (3-char), normalises EL→GR / UK→GB
- dim_locations.sql: add admin1_to_nuts1 VALUES CTE (16 German Bundesländer)
  + regional_income CTE; final SELECT uses COALESCE(regional, country) income
- init_landing_seeds.py: add empty seed for nama_10r_2hhinc.json.gz

Munich/Bayern now scores ~29K PPS vs Chemnitz/Sachsen ~19K PPS instead of
both inheriting the same national average (~25.5K PPS).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 10:26:15 +01:00
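The NUTS-1 filter and the `COALESCE(regional, country)` fallback from 5ade38eeaf can be sketched as below. The helper names and the row shape are assumptions; the approximate income figures are the ones quoted in the commit message, and DE2 and DED are the NUTS-1 codes for Bayern and Sachsen.

```python
def is_nuts1(code: str) -> bool:
    """NUTS-1 codes have three characters: country prefix plus one more."""
    return len(code) == 3


def build_regional_lookup(rows):
    """rows: iterable of (nuts_code, income_pps). Keep NUTS-1 rows only."""
    return {code: income for code, income in rows if is_nuts1(code)}


def income_for(nuts1_code, regional, country_income):
    """Prefer the regional figure; fall back to the national average."""
    value = regional.get(nuts1_code)
    return country_income if value is None else value


# Approximate figures from the commit message (PPS).
rows = [("DE", 25500), ("DE2", 29000), ("DED", 19000)]
lookup = build_regional_lookup(rows)
munich = income_for("DE2", lookup, 25500)    # Bayern
chemnitz = income_for("DED", lookup, 25500)  # Sachsen
```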
Deeman
55f179ba54 fix(transform): increase geonames object size limit and remove stale column ref
- stg_population_geonames: add maximum_object_size=40MB to read_json() call;
  geonames cities_global.json.gz is ~30MB, exceeding DuckDB's 16MB default
- dim_locations: remove stale 'population_year AS population_year' column ref;
  stg_population_geonames has ref_year, not population_year — caused BinderException

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-25 09:56:05 +01:00
Deeman
ebfdc84a94 feat(transform): add dim_locations + dual market scoring models
dim_locations (foundation):
- Seeded from stg_population_geonames (all locations, not venue-dependent)
- Grain: (country_code, geoname_id)
- Enriched with: padel venues within 5km, nearest court distance (ST_Distance_Sphere),
  tennis courts within 25km, country income
- Covers zero-court Gemeinden for opportunity scoring

location_opportunity_profile (serving) — Padelnomics Marktpotenzial-Score:
- Answers "Where should I build?" — no padel_venue_count filter
- Formula: population (25) + income (20) + supply gap inverted (30) +
           catchment gap (15) + tennis culture (10) = 100pts
- Sorted by opportunity_score DESC

city_market_profile (serving) — Padelnomics Marktreife-Score:
- Add saturation discount (×0.85 when venues_per_100k > 8)
- Update header comment to reference Marktreife-Score branding
- Kept WHERE padel_venue_count > 0 (established markets only)
- column name market_score unchanged (avoids downstream breakage)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 16:28:16 +01:00
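The two scores introduced in ebfdc84a94 can be sketched as a weighted sum and a conditional discount. The weights and the 0.85 factor come from the commit message; how each component is normalised is not stated there, so this sketch assumes every component has already been scaled to the range 0 to 1. The function names are hypothetical.

```python
WEIGHTS = {
    "population": 25,
    "income": 20,
    "supply_gap": 30,      # inverted: fewer courts per capita scores higher
    "catchment_gap": 15,
    "tennis_culture": 10,
}


def opportunity_score(components):
    """components: dict with one value in [0, 1] per WEIGHTS key.

    Assumption: inputs are pre-normalised; values outside [0, 1] are clamped.
    """
    return sum(
        weight * min(max(components[name], 0.0), 1.0)
        for name, weight in WEIGHTS.items()
    )


def apply_saturation_discount(market_score, venues_per_100k):
    """Marktreife-Score: multiply by 0.85 when venues_per_100k > 8."""
    return market_score * 0.85 if venues_per_100k > 8 else market_score
```

The weights sum to 100, so a location that maxes out every component scores exactly 100 points.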