Commit Graph

74 Commits

Author SHA1 Message Date
Deeman
236f0d1061 fix(markets): map country names, localised dropdown + avg/top score tooltip
- Expand dim_countries.sql CASE to cover 22 missing countries (PL, RO,
  CO, HU, ZA, KE, BR, CZ, QA, NZ, HR, LV, MT, CR, CY, PA, SV, DO,
  PE, VE, EE, ID) that fell through to bare ISO codes
- Add 19 missing entries to COUNTRY_LABELS (i18n.py) + both locale files
  (EN + DE dir_country_* keys) including IE which was in SQL but not i18n
- Localise map tooltips: routes.py injects country_name via
  get_country_name(), JS uses c.country_name instead of c.country_name_en
- Localise dropdown: apply country_name filter to option labels
- Show avg + top score in map tooltip with separate color dots and new
  map_score_avg / map_score_top i18n keys (EN: "Avg. Score" / "Top City",
  DE: "Ø Score" / "Top-Stadt")

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 17:21:59 +01:00
Deeman
bd7fa1ae9a fix(pipeline): stg_playtomic_availability glob reads all files, filters by date range
The @start_ds in the glob pattern only matched files for the first day
of the batch, so incremental restates only loaded 1 day of data.
Changed to wildcard glob with explicit BETWEEN @start_ds AND @end_ds
filter on the date column.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 15:48:10 +01:00
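The fix in bd7fa1ae9a can be sketched in Python as follows. This is only an illustration of the idea (read everything, then filter on the date column), not the actual staging SQL; the row layout and the function name `filter_by_interval` are hypothetical.

```python
def filter_by_interval(rows, start_ds, end_ds):
    """Keep rows whose ISO date falls in [start_ds, end_ds], inclusive."""
    # ISO-8601 dates (YYYY-MM-DD) sort lexicographically, so plain string
    # comparison behaves like SQL BETWEEN on a DATE column.
    return [row for row in rows if start_ds <= row["date"] <= end_ds]


# Hypothetical rows, as if read through a wildcard glob over all files.
rows = [
    {"date": "2026-03-07", "venue_id": "a"},
    {"date": "2026-03-08", "venue_id": "a"},
    {"date": "2026-03-09", "venue_id": "b"},
    {"date": "2026-03-10", "venue_id": "b"},
]

# New behaviour: the whole batch interval is loaded.
batch = filter_by_interval(rows, "2026-03-08", "2026-03-10")

# Old behaviour, roughly: only the first day of the batch matched the glob.
first_day_only = [row for row in rows if row["date"] == "2026-03-08"]
```

With a three-day batch, `batch` holds three days of rows while `first_day_only` holds one, which is the symptom the commit describes.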
Deeman
927f77ae5e fix: country_supply column name in location_profiles
2026-03-10 10:12:09 +01:00
Deeman
adf6f0c1ef fix(score): country_supply uses dim_cities.padel_venue_count (not city_padel_venue_count)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 10:09:30 +01:00
Deeman
9dc705970e merge: Opportunity Score v8 — better spread/discrimination
# Conflicts:
#	CHANGELOG.md
2026-03-09 22:24:43 +01:00
Deeman
ff6401254a feat(score): Opportunity Score v8 — better spread/discrimination
Reweight: addressable market 20→15, economic power 15→10, supply deficit 40→50.
Supply deficit existence dampener (country_venues/50, floor 0.1): zero-venue
countries drop from ~80 to ~17. Steeper addressable market curve (LN/500K →
SQRT/1M). NULL distance gap → 0.0 (was 0.5). Added country_percentile output
column (PERCENT_RANK within country, 0–100).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 22:14:30 +01:00
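A minimal sketch of the existence dampener described in ff6401254a, assuming it is a simple clamped ratio. The commit states only `country_venues/50` with a floor of 0.1; the upper cap at 1.0 and the function name are assumptions.

```python
def existence_dampener(country_venues: int) -> float:
    """Scale factor for the supply deficit component.

    country_venues / 50 with a floor of 0.1 (from the commit message).
    The cap at 1.0 is an assumption: without it, countries with more
    than 50 venues would amplify the component instead of leaving it
    unchanged.
    """
    return max(0.1, min(1.0, country_venues / 50))
```

Under these assumptions a zero-venue country keeps only 10% of its supply deficit points, and a country with 50 or more venues is unaffected.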
Deeman
487722c2f3 chore: changelog + fix stg_population_geonames unicode escapes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 20:32:49 +01:00
Deeman
e39dd4ec0b fix(score): Opportunity Score v7 — calibration fix for saturated markets
Two fixes:
1. dim_locations now sources venues from dim_venues (deduplicated OSM + Playtomic)
   instead of stg_padel_courts (OSM only). Playtomic-only venues are no longer
   invisible to spatial lookups.
2. Country-level supply saturation dampener on supply deficit component.
   Saturated countries (Spain 7.4/100k) get dampened supply deficit (x0.30 → 12 pts max).
   Emerging markets (Germany 0.24/100k) nearly unaffected (x0.98 → ~39 pts).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 18:03:14 +01:00
Deeman
8e0dd6af63 fix(data): filter non-Latin city names + score range clamp (Phase F)
- stg_population_geonames: reject CJK/Cyrillic/Arabic city names via regex
  (fixes "Seelow" showing Japanese characters on map)
- dim_locations: filter empty location names after trim
- location_profiles: defensive LEAST/GREATEST clamp on both scores (0-100)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 12:23:50 +01:00
Deeman
bda2f85fd6 fix(pipeline): CAST snapshot_date to DATE in venue_pricing_benchmarks
Phase A: defensive CAST for incremental time_column comparison.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 11:55:44 +01:00
Deeman
67fbfde53d feat(scoring): Opportunity Score v5 → v6 — calibrate for saturated markets
- Lower density ceiling 8→5/100k (Spain at 6-16/100k now hits zero-gap)
- Increase supply deficit weight 35→40 pts (primary differentiator)
- Reduce addressable market 25→20 pts (less weight on population alone)
- Invert market validation → market headroom (high country maturity = less opportunity)

Target: Spain avg opportunity drops from ~78 to ~50-60 range.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 20:23:08 +01:00
Deeman
3c135051fd feat(scoring): Score v6 — World Bank global economic data for non-EU countries
Non-EU countries (AR, MX, AE, AU, etc.) previously got NULL for
median_income_pps and pli_construction, falling back to EU-calibrated
defaults (15K PPS, PLI=100) that produced wrong scores.

New World Bank WDI extractor fetches GNI per capita PPP and price level
ratio for 215 countries. dim_countries uses Germany as calibration anchor
to scale WB values into the Eurostat range (dynamic ratio, self-corrects
as both sources update). EU countries keep exact Eurostat values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 18:17:33 +01:00
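The calibration-anchor idea in 3c135051fd can be illustrated like this. It is a sketch under assumptions: the function names and the example figures are hypothetical, and the real logic lives in dim_countries as SQL.

```python
def scale_world_bank(wb_value, wb_germany, eurostat_germany):
    """Scale a World Bank value into the Eurostat range.

    Germany is the anchor: the ratio eurostat_germany / wb_germany is
    recomputed from current data, so it adjusts as both sources update.
    """
    return wb_value * (eurostat_germany / wb_germany)


def median_income_pps(eurostat_value, wb_value, wb_germany, eurostat_germany):
    """Prefer the exact Eurostat value; fall back to the scaled WB value."""
    if eurostat_value is not None:
        return eurostat_value
    if wb_value is None:
        return None
    return scale_world_bank(wb_value, wb_germany, eurostat_germany)
```

By construction, feeding Germany's own World Bank value through `scale_world_bank` returns Germany's Eurostat value, which is what makes it a usable anchor.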
Deeman
118c2c0fc7 feat(scoring): Opportunity Score v4 → v5 — fix correlated components
- Merge supply gap (30pts) + catchment gap (15pts) → supply deficit (35pts, GREATEST)
  Eliminates ~80% correlated double-count on a single signal.
- Add sports culture signal (10pts): tennis court density as racquet-sport adoption proxy.
  Ceiling 50 courts/25km. Harmless when tennis data is zero (contributes 0).
- Add construction affordability (5pts): income relative to PLI construction costs.
  Joins dim_countries.pli_construction. High income + low build cost = high score.
- Reduce economic power from 20 → 15pts to make room.

New weights: addressable market 25, economic power 15, supply deficit 35,
sports culture 10, construction affordability 5, market validation 10.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 15:30:04 +01:00
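The merge of the two gap components in 118c2c0fc7 can be sketched as below. The weights come from the commit message; treating both gaps as fractions in the range 0 to 1 is an assumption, as are the function names.

```python
# Weights as listed in the commit message; they should total 100.
WEIGHTS_V5 = {
    "addressable_market": 25,
    "economic_power": 15,
    "supply_deficit": 35,
    "sports_culture": 10,
    "construction_affordability": 5,
    "market_validation": 10,
}


def supply_points_v4(supply_gap: float, catchment_gap: float) -> float:
    """Old form: two correlated signals added together (30 + 15 pts)."""
    return 30 * supply_gap + 15 * catchment_gap


def supply_points_v5(supply_gap: float, catchment_gap: float) -> float:
    """New form: one 35-pt component driven by the larger gap (GREATEST)."""
    return 35 * max(supply_gap, catchment_gap)
```

When both gaps are at their maximum, the old form awards 45 points for what is largely one signal, while the merged form awards 35.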
Deeman
cd6d950233 feat(scoring): Market Score v3 → v4 — fix Spain underscoring
- Lower count gate threshold: 5 → 3 venues (3 establishes a market pattern)
- Lower density ceiling: LN(21) → LN(11) (10/100k is reachable for mature markets)
- Better demand fallback: 0.4 → 0.65 multiplier + 0.3 floor (venues = demand evidence)
- Fix economic context: income/200 → income/25000 (actual discrimination vs free 10 pts)

Expected: Spain avg market score rises from ~54 to ~65-75.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 15:22:48 +01:00
Deeman
f215ea8e3a fix: supply gap inflation + inline map data + guard API endpoints
A. location_profiles.sql: supply gap now uses GREATEST(catchment_padel_courts,
   COALESCE(city_padel_venue_count, 0)) so Playtomic venues prevent cities like
   Murcia/Cordoba/Gijon from receiving a full 30-pt supply gap bonus when their
   OSM catchment count is zero. Expected ~10-15 pt drop for affected ES cities.

B. pseo_country_overview.sql: add population-weighted lat/lon centroid columns
   so the markets map can use accurate country positions from this table.

C/D. content/routes.py + markets.html: query pseo_country_overview in the route
   and pass as map_countries to the template, replacing the fetch('/api/...') call
   with inline JSON. Map scores now match pseo_country_overview (pop-weighted),
   and the page loads without an extra round-trip.

E. api.py: add @login_required to all 4 endpoints. Unauthenticated callers get
   a 302 redirect to login instead of data.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 20:33:31 +01:00
Deeman
544891611f feat(transform): opportunity score v4 — market validation + population-weighted aggregation
Two targeted fixes for inflated country scores (ES 83, SE 77):

1. pseo_country_overview: replace AVG() with population-weighted averages
   for avg_opportunity_score and avg_market_score. Madrid/Barcelona now
   dominate Spain's average instead of hundreds of white-space towns of
   ~30K inhabitants. Expected ES drop from ~83 to ~55-65.

2. location_profiles: replace dead sports culture component (10 pts,
   tennis data all zeros) with market validation signal.
   Split scored CTE into: market_scored → country_market → scored.
   country_market aggregates AVG(market_score) per country from cities
   with padel courts (market_score > 0), so zero-court locations don't
   dilute the signal. ES (~60/100) → ~6 pts. SE (~35/100) → ~3.5 pts.
   NULL → 0.5 neutral → 5 pts (untested market, not penalised).

Score budget unchanged: 25+20+30+15+10 = 100 pts.
No new models, no new data sources, no cycles.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 17:23:11 +01:00
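The first fix in 544891611f replaces a plain average with a population-weighted one. A small Python sketch of the difference, using made-up scores and populations rather than project data:

```python
def plain_avg(scores):
    """Unweighted mean: every location counts equally."""
    return sum(scores) / len(scores)


def weighted_avg(scores, populations):
    """Population-weighted mean: large cities dominate the result."""
    total = sum(populations)
    if total == 0:
        return None  # no population data, so no meaningful average
    return sum(s * p for s, p in zip(scores, populations)) / total


# Illustrative numbers only: one large city and two small towns.
scores = [50.0, 90.0, 90.0]
populations = [3_000_000, 30_000, 30_000]

unweighted = plain_avg(scores)
weighted = weighted_avg(scores, populations)
```

Here the unweighted mean is pulled towards the two small towns, while the weighted mean stays close to the large city's score.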
Deeman
77ec3a289f feat(transform): H3 catchment index, res 5 k_ring(1) ~24km radius
Merges worktree-h3-catchment-index. dim_locations now computes h3_cell_res5
(res 5, ~8.5km edge). location_profiles and dim_locations updated;
old location_opportunity_profile.sql already removed on master.

Conflict: location_opportunity_profile.sql deleted on master, kept deletion
and applied h3_cell_res4→res5 rename to location_profiles instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 14:45:45 +01:00
Deeman
f81d5f19da fix(transform): tighten H3 catchment to res 5 (~24km radius)
Res 4 + k_ring(1) gave ~50-60km effective radius, causing Oldenburg to
absorb Bremen (40km away) and destroying score differentiation.

Res 5 + k_ring(1) gives ~24km — captures adjacent Gemeinden (Delmenhorst
at 15km) without bleeding into unrelated cities at 40km+.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 14:34:56 +01:00
Deeman
4d29ecf1d6 merge: unified location_profiles serving model + both scores on map tooltips
# Conflicts:
#	CHANGELOG.md
#	transform/sqlmesh_padelnomics/models/serving/location_opportunity_profile.sql
2026-03-06 14:03:55 +01:00
Deeman
a3b4e1fab6 docs: update CHANGELOG, CLAUDE.md, and comments for location_profiles
Update transform CLAUDE.md source integration map and conformed
dimensions table. Update CHANGELOG with unified model + tooltip
changes. Fix stale comments in dim_cities.sql and serving README.

Subtask 5/5: documentation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 11:45:08 +01:00
Deeman
81b556b205 refactor(serving): replace old models with location_profiles
Delete city_market_profile.sql and location_opportunity_profile.sql.
Update downstream models (planner_defaults, pseo_city_costs_de,
pseo_city_pricing) to read from location_profiles instead.

Subtask 2/5: delete old models + update downstream SQL.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 11:39:52 +01:00
Deeman
cda94c9ee4 feat(serving): add unified location_profiles model
Combines city_market_profile and location_opportunity_profile into a
single serving model at (country_code, geoname_id) grain. Both Market
Score and Opportunity Score computed per location. City data enriched
via LEFT JOIN dim_cities on geoname_id.

Subtask 1/5: create new model (old models not yet removed).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 11:36:36 +01:00
Deeman
4fbd91b59b merge: automate h3 community extension install via sqlmesh config
2026-03-06 10:27:03 +01:00
Deeman
159d1b5b9a fix(transform): use community repository for h3 extension install
SQLMesh's extensions config supports dict form with 'repository' key,
which runs INSTALL h3 FROM community + LOAD h3 automatically at connect
time. No manual one-time install needed per machine.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 10:26:56 +01:00
Deeman
dec4f07fbb merge: H3 catchment index for Marktpotenzial-Score v3
2026-03-06 10:19:51 +01:00
Deeman
4e4ff61699 feat(transform): H3 catchment index for Marktpotenzial-Score v3
Add H3 res-4 regional catchment metrics (~15-18km radius, cell + 6
neighbours) to both the addressable market (25pts) and supply gap
(30pts) components of location_opportunity_profile.

Changes:
- config.yaml: add h3 to DuckDB extensions (requires one-time
  INSTALL h3 FROM community on each machine)
- dim_locations: add h3_cell_res4 column via h3_latlng_to_cell()
- location_opportunity_profile: add hex_stats + catchment CTEs;
  update score formula to use catchment_population and
  catchment_padel_courts; expose catchment_population,
  catchment_padel_courts, catchment_venues_per_100k as output cols

Motivation: local population underestimates functional market for
mid-size cities (e.g. Oldenburg ~170K misses surrounding Gemeinden).
H3 k_ring(1) captures the realistic driving-distance catchment
(~462km²) consistently across both score components.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 10:19:43 +01:00
Deeman
2f47d1e589 fix(pipeline): make availability chain incremental + fix supervisor
Convert the availability chain (stg_playtomic_availability →
fct_availability_slot → fct_daily_availability) from FULL to
INCREMENTAL_BY_TIME_RANGE so sqlmesh run processes only new daily
intervals instead of re-reading all files.

Supervisor changes:
- run_transform(): plan prod --auto-apply → run prod (evaluates
  missing cron intervals, picks up new data)
- git_pull_and_sync(): add plan prod --auto-apply before re-exec
  so model code changes are applied on deploy
- supervisor.sh: same plan → run change

Staging model uses a date-scoped glob (@start_ds) to read only
the current interval's files. snapshot_date cast to DATE (was
VARCHAR) as required by time_column.

Clean up redundant TRY_CAST(snapshot_date AS DATE) in
venue_pricing_benchmarks since it's already DATE from foundation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 21:34:02 +01:00
Deeman
59f1f0d699 merge(worktree): interactive maps for market pages
Self-hosted Leaflet 1.9.4 maps across 4 placements: markets hub
country bubbles, country overview city bubbles, city venue dots, and
a standalone opportunity map. New /api blueprint with 4 JSON endpoints.
New city_venue_locations SQLMesh serving model. No CDN — GDPR-safe.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

# Conflicts:
#	CHANGELOG.md
2026-03-04 15:36:41 +01:00
Deeman
edf678ac4e feat(maps): Phase 4 — city venue dot map
New serving model: city_venue_locations joins dim_venues + dim_cities
to expose lat/lon/court_count per venue for the city dot map endpoint.

pseo_city_costs_de.sql: add c.lat, c.lon so city-cost articles have
city coordinates for the #city-map data attributes.

city-cost-de.md.jinja: add #city-map div (both DE and EN sections)
after the stats strip. Leaflet init handled by article_detail.html.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 15:07:06 +01:00
Deeman
61c197d233 merge(worktree): individualise article costs with per-country Eurostat data + tiered proxy tenant work
# Conflicts:
#	CHANGELOG.md
#	transform/sqlmesh_padelnomics/models/foundation/dim_cities.sql
#	transform/sqlmesh_padelnomics/models/foundation/dim_locations.sql
2026-03-04 12:44:56 +01:00
Deeman
2e68cfbe4f feat(transform): individualise article costs with per-country Eurostat data
Add real per-country cost data to ~30 calculator fields so pSEO articles
show country-specific CAPEX/OPEX instead of hardcoded DE defaults.

Extractor:
- eurostat.py: add 8 new datasets (nrg_pc_205, nrg_pc_203, lc_lci_lev,
  5×prc_ppp_ind variants); add optional `dataset_code` field so multiple
  dict entries can share one Eurostat API endpoint

Staging (4 new models):
- stg_electricity_prices — EUR/kWh by country, semi-annual
- stg_gas_prices         — EUR/GJ by country, semi-annual
- stg_labour_costs       — EUR/hour by country, annual (future staffed scenario)
- stg_price_levels       — PLI indices (EU27=100) for 5 categories, annual

Foundation:
- dim_countries (new) — conformed country dimension; eliminates ~50-line CASE
  blocks duplicated in dim_cities/dim_locations; computes ~29 calculator cost
  override columns from PLI ratios and energy price ratios vs DE baseline;
  NULL for DE so calculator falls through to DEFAULTS unchanged
- dim_cities — replace country_name/slug CASE blocks + country_income CTE
  with JOIN dim_countries
- dim_locations — same refactor as dim_cities

Serving:
- pseo_city_costs_de — JOIN dim_countries; add 29 camelCase override columns
  auto-applied by calculator (electricity, heating, rentSqm, hallCostSqm, …)
- planner_defaults — JOIN dim_countries; same 29 cost columns flow through
  to /api/market-data endpoint

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-04 10:09:48 +01:00
Deeman
a00c8727d7 fix(content): slugify transliteration + article links + country overview ranking
- Add @slugify SQLMesh macro (STRIP_ACCENTS + ß→ss) replacing broken
  inline REGEXP_REPLACE that dropped non-ASCII chars (Düsseldorf → d-sseldorf)
- Apply @slugify to dim_venues, dim_cities, dim_locations
- Fix Python slugify() to pre-replace ß→ss before NFKD normalization
- Add language prefix to B2B article market links (/markets/germany → /de/markets/germany)
- Change country overview top-5 ranking: venue count (not raw market_score)
  for top cities, population for top opportunity cities

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 10:46:30 +01:00
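The Python-side slugify fix in a00c8727d7 (replace ß with ss before NFKD normalization) can be sketched as follows. This is not the project's actual `slugify()`; the name `slugify_sketch` and the exact separator handling are assumptions.

```python
import re
import unicodedata


def slugify_sketch(text: str) -> str:
    """Transliterate to ASCII and join alphanumeric runs with hyphens."""
    # ß has no Unicode decomposition, so NFKD leaves it untouched and the
    # ASCII encode step would drop it. Replace it first.
    text = text.replace("ß", "ss")
    # NFKD splits accented letters into base letter + combining mark,
    # e.g. "ü" becomes "u" followed by U+0308.
    decomposed = unicodedata.normalize("NFKD", text)
    # Dropping non-ASCII bytes removes the combining marks, keeping "u".
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
```

With this order of operations, "Düsseldorf" keeps its vowel instead of becoming "d-sseldorf", and a name such as "Gießen" (an illustrative example, not taken from the commit) keeps its double s.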
Deeman
6774254cb0 feat(sqlmesh): add country code macros, apply across models
Task 4/6: Add 5 macros to compress repeated country code patterns:
- @country_name / @country_slug: 20-country CASE in dim_cities, dim_locations
- @normalize_eurostat_country / @normalize_eurostat_nuts: EL→GR, UK→GB
- @infer_country_from_coords: bounding box for 8 markets
Net: +91 lines in macros, -135 lines in models = -44 lines total.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 07:45:52 +01:00
Deeman
fea4f85da3 perf(transform): optimize dim_locations spatial joins via IEJoin + country filters
Replace ABS() bbox predicates with BETWEEN in all three spatial CTEs
(nearest_padel, padel_local, tennis_nearby). BETWEEN enables DuckDB's
IEJoin (interval join) which is O((N+M) log M) vs the previous O(N×M)
nested-loop cross-join.

Add country pre-filters to restrict the left side from ~140K global
locations to ~20K rows for padel/tennis CTEs (~8 countries each).

Expected: ~50-200x speedup on the spatial CTE portion of the model.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 02:57:05 +01:00
Deeman
e62aad148b fix(transform): remove blob CTE from stg_population_geonames
Server has cities_global.jsonl.gz (JSONL), not cities_global.json.gz (blob).
TigerStyle clean break — removed blob_rows CTE and UNION ALL.
Simplified to a single SELECT directly from read_json.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 18:40:15 +01:00
Deeman
6fb1e990e3 merge: three-tier proxy + daily tenants + staging model cleanup
2026-02-28 18:26:50 +01:00
Deeman
6edf8ba65e fix(transform): remove blob fallback CTEs, update tenants glob to daily partition depth
TigerStyle clean break — no backwards-compat shims for old file formats:

- stg_playtomic_{venues,opening_hours,resources}: glob updated from
  */*/tenants.jsonl.gz (2-level, old weekly) to */*/*/tenants.jsonl.gz
  (3-level, new daily YYYY/MM/DD partition); blob tenants.json.gz CTE removed
- stg_playtomic_availability: morning_blob and recheck_blob CTEs removed;
  only JSONL format (availability_*.jsonl.gz) is read going forward

Verified locally: stg_playtomic_venues evaluates to 14231 venues from
2026/02/28/tenants.jsonl.gz with 0 errors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 18:26:44 +01:00
Deeman
6cf98f44d4 fix(transform): remove blob compat CTE from stg_tennis_courts
The overpass_tennis extractor has written JSONL-only since it was added.
The dual-format UNION ALL was backwards-compat debt that broke the
transform because no courts.json.gz files exist on the server:

  IO Error: No files found that match the pattern
  "data/landing/overpass_tennis/*/*/courts.json.gz"

Remove blob_elements CTE and the UNION ALL. Only read JSONL.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 17:39:11 +01:00
Deeman
4e82907a70 refactor(transform): conform geographic dimension hierarchy via city_slug
Propagates the conformed city key (city_slug) from dim_venues through the
full pricing pipeline, eliminating 3 fragile LOWER(TRIM(...)) fuzzy string
joins with deterministic key joins.

Changes (cascading, task-by-task):
- dim_venues: add city_slug computed column (REGEXP_REPLACE slug derivation)
- dim_venue_capacity: join foundation.dim_venues instead of stg_playtomic_venues;
  carry city_slug alongside country_code/city
- fct_daily_availability: carry city_slug from dim_venue_capacity
- venue_pricing_benchmarks: carry city_slug from fct_daily_availability;
  add to venue_stats GROUP BY and final SELECT/GROUP BY
- city_market_profile: join vpb on city_slug = city_slug (was LOWER(TRIM))
- planner_defaults: add city_slug to city_benchmarks CTE; join on city_slug
- pseo_city_pricing: join city_market_profile on city_slug (was LOWER(TRIM))
- pipeline_routes._DAG: dim_venue_capacity now depends on dim_venues, not stg_playtomic_venues

Result: dim_venues.city_slug → dim_cities.(country_code, city_slug) forms a
fully conformed geographic hierarchy with no fuzzy string comparisons.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 13:23:03 +01:00
Deeman
c3531bd75d feat(data): Phase 2b complete — EU NUTS-2 spatial join + US state income
- stg_regional_income: expanded NUTS-1+2 (LENGTH IN 3,4), nuts_code rename, nuts_level
- stg_nuts2_boundaries: new — ST_Read GISCO GeoJSON, bbox columns for spatial pre-filter
- stg_income_usa: new — Census ACS state-level income staging model
- dim_locations: spatial join replaces admin1_to_nuts1 VALUES CTE; us_income CTE with
  PPS normalisation (income/80610×30000); income cascade: NUTS-2→NUTS-1→US state→country
- init_landing_seeds: compress=False for ST_Read files; gisco GeoJSON + census income seeds
- CHANGELOG + PROJECT.md updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 11:03:16 +01:00
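The income cascade and the US normalisation from c3531bd75d can be sketched in Python. The constants 80610 and 30000 are taken from the commit message; the function names and the use of `None` for missing values are assumptions standing in for SQL COALESCE.

```python
def us_income_to_pps(income_usd: float) -> float:
    """Normalise US state income to the PPS scale: income / 80610 * 30000."""
    return income_usd / 80610 * 30000


def resolve_income(nuts2, nuts1, us_state, country):
    """Return the first available value, most specific source first.

    Mirrors the cascade NUTS-2 -> NUTS-1 -> US state -> country.
    """
    for value in (nuts2, nuts1, us_state, country):
        if value is not None:
            return value
    return None
```

A location with no NUTS-2 match but a NUTS-1 match would take the NUTS-1 figure and ignore the country-level value.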
Deeman
409dc4bfac feat(data): Phase 2b step 1 — expand stg_regional_income + Census income extractor
- stg_regional_income.sql: accept NUTS-1 (3-char) + NUTS-2 (4-char) codes;
  rename nuts1_code → nuts_code; add nuts_level column; NUTS-2 rows were
  already in the landing zone but discarded by LENGTH(geo_code) = 3
- scripts/download_gisco_nuts.py: one-time download of GISCO NUTS-2 boundary
  GeoJSON (NUTS_RG_20M_2021_4326_LEVL_2.geojson, ~5MB) to landing zone;
  uncompressed because ST_Read cannot read .gz files
- census_usa_income.py: new extractor for ACS B19013_001E state-level median
  household income; follows census_usa.py pattern; 50 states + DC
- all.py + pyproject.toml: register census_usa_income extractor

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 10:58:12 +01:00
Deeman
5ade38eeaf feat(data): Phase 2a — NUTS-1 regional income for opportunity score
- eurostat.py: add nama_10r_2hhinc dataset config; append filter params to
  request URL so server pre-filters the large cube before download
- stg_regional_income.sql: new staging model — reads nama_10r_2hhinc.json.gz,
  filters to NUTS-1 codes (3-char), normalises EL→GR / UK→GB
- dim_locations.sql: add admin1_to_nuts1 VALUES CTE (16 German Bundesländer)
  + regional_income CTE; final SELECT uses COALESCE(regional, country) income
- init_landing_seeds.py: add empty seed for nama_10r_2hhinc.json.gz

Munich/Bayern now scores ~29K PPS vs Chemnitz/Sachsen ~19K PPS instead of
both inheriting the same national average (~25.5K PPS).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 10:26:15 +01:00
Deeman
3aa30ab419 feat(sql): dim_cities — GeoNames spatial population fallback
Adds a coordinate-based population lookup as a fallback when string name
matching fails (~29% of cities). Uses bbox pre-filter (0.14° ≈ 15 km) then
ST_Distance_Sphere to find the nearest GeoNames location in the same country.

Fixes localization mismatches: Milano≠Milan, Wien≠Vienna, München≠Munich.

Population cascade: Eurostat EU > US Census > ONS UK > GeoNames string >
GeoNames spatial > 0.

Coverage: 70.5% → 98.5% (5,401 / 5,481 cities with population > 0).
Key cities before/after:
  Wien:   0 → 1,691,468
  Milano: 0 → 1,371,498
  München: already matched by string; verified still correct at 1,488,719

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 08:47:26 +01:00
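The spatial fallback in 3aa30ab419 (bounding-box pre-filter, then nearest by great-circle distance) can be sketched in Python. The haversine helper stands in for ST_Distance_Sphere; the record layout, the coordinates, and the second candidate are illustrative assumptions.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_location(city, candidates, bbox_deg=0.14):
    """Cheap bbox filter first, exact distance only on the survivors."""
    nearby = [
        c for c in candidates
        if c["country_code"] == city["country_code"]
        and abs(c["lat"] - city["lat"]) <= bbox_deg
        and abs(c["lon"] - city["lon"]) <= bbox_deg
    ]
    if not nearby:
        return None
    return min(nearby, key=lambda c: haversine_km(city["lat"], city["lon"], c["lat"], c["lon"]))


# Approximate coordinates; the population figure is the one quoted above.
wien = {"name": "Wien", "country_code": "AT", "lat": 48.2082, "lon": 16.3738}
geonames = [
    {"name": "Vienna", "country_code": "AT", "lat": 48.2085, "lon": 16.3721, "population": 1_691_468},
    {"name": "Graz", "country_code": "AT", "lat": 47.0707, "lon": 15.4395, "population": 0},
]
match = nearest_location(wien, geonames)
```

The name mismatch (Wien vs Vienna) no longer matters, because the match is made on coordinates within the same country.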
Deeman
9835176e87 fix(sql): opportunity_score income ceiling /200→/35000 (economic power)
PPS values are 18k–37k but /200 normalisation caused LEAST(1.0, 115)=1.0
for ALL countries — 20pts flat uplift, zero differentiation.

Fix: /35000 creates real country spread:
  LU 20.0pts, DE 15.2pts, ES 12.8pts, GB 10.5pts (vs 20.0 everywhere before)

Default for missing data 100→15000 (developing-market assumption, ~0.43).
Header comment updated to document v2 formula behaviour.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 07:58:57 +01:00
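The saturation problem fixed in 9835176e87 is easy to reproduce. The sketch below assumes the component has the form LEAST(1.0, income / ceiling) × 20, which matches the figures in the commit message; the function name is hypothetical.

```python
def economic_power_points(income_pps, ceiling, max_points=20.0):
    """min(1.0, income / ceiling) scaled to the component's point budget."""
    return min(1.0, income_pps / ceiling) * max_points


# Old ceiling: every income in the 18k-37k range saturates at 20 pts.
old_low = economic_power_points(18_000, ceiling=200)
old_high = economic_power_points(37_000, ceiling=200)

# New ceiling: the same two incomes now land on different scores.
new_low = economic_power_points(18_000, ceiling=35_000)
new_high = economic_power_points(37_000, ceiling=35_000)
```

With the old ceiling both values are 20.0; with the new one only incomes at or above 35,000 PPS reach the full 20 points.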
Deeman
10266c3a24 fix(sql): opportunity_score — supply gap ceiling 4→8/100k + doc findings
Raises supply gap ceiling from 4/100k to 8/100k in
location_opportunity_profile.sql. The original 4/100k hard cliff
truncated opportunity scores to 0 for any city with ≥4 courts/100k,
but our data undercounts ~87% of real courts (FIP: 17,300 Spanish
courts vs 2,239 in our DB). Raising to 8/100k gives a gentler gradient
and fairer partial credit when density data is incomplete.

Documents existing formula behaviour discovered during analysis:
- Income PPS: country-level constants (18k-37k range) saturate the
  /200 ceiling — all EU countries get flat 20/20 pts until city-level
  income data lands.
- Catchment NULL: DuckDB LEAST(1.0, NULL) = 1.0 (ignores nulls), so
  NULL nearest_padel_court_km already yields full 15 pts. COALESCE
  fallback is dead code but harmless.
- Tennis courts within 25km: dim_locations data is empty (all 0 rows)
  — 10-court threshold is correct for when data arrives, contributes
  0 pts everywhere for now.

Effective score impact: minimal (99% of locations have 0 courts/100k,
so supply gap was already at max). Only ~1,050 dense-court cities
see a score increase (from 0 gap pts to partial gap pts).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 06:57:57 +01:00
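The effect of raising the supply gap ceiling in 10266c3a24 can be sketched as below. The commit does not spell out the formula; a linear falloff that reaches zero at the ceiling is an assumption consistent with the "hard cliff" it describes.

```python
def supply_gap_fraction(courts_per_100k: float, ceiling: float) -> float:
    """Share of the supply gap points awarded (assumed linear falloff).

    1.0 at zero density, 0.0 at or above the ceiling.
    """
    return max(0.0, 1.0 - courts_per_100k / ceiling)


# A city at 4 courts/100k: no credit under the old ceiling,
# partial credit under the new one.
old_credit = supply_gap_fraction(4.0, ceiling=4.0)
new_credit = supply_gap_fraction(4.0, ceiling=8.0)
```

Locations with zero recorded density receive the full gap under either ceiling, which fits the commit's note that most scores are unaffected.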
Deeman
88ed17484b feat(sql+templates): market_score v3 — log density + count gate
Fixes ranking inversion where Germany (1/100k courts) outscored Spain
(36/100k). Root causes: population/income were 55% of max before any
padel signal, density ceiling saturated 73% of cities, small-town
inflation (1 venue / 5k pop = 20/100k = full marks), and the saturation
discount actively penalised mature markets.

SQL (city_market_profile.sql):
- Supply development 40pts: log-scaled density LN(d+1)/LN(21) × count
  gate min(1, count/5). Ceiling 20/100k. Count gate kills small-town
  inflation without hard cutoffs (1 venue = 20%, 5+ = 100%).
- Demand evidence 25pts: occupancy if available; 40% density proxy
  otherwise. Separated from supply to avoid double-counting.
- Addressable market 15pts: population as context, not maturity.
- Economic context 10pts: income PPS (flat per country, low signal).
- Data quality 10pts.
- Removed saturation discount. High density = maturity.

Verified spot-check scores:
  Málaga (46v, 7.77/100k): 70.1  [was 98.9]
  Barcelona (104v, 6.17/100k): 67.4  [was 100.0]
  Amsterdam (24v, 3.24/100k): 58.4  [was 93.7]
  Bernau bei Berlin (2v, 5.74/100k): 43.9  [was 92.7]
  Berlin (20v, 0.55/100k): 42.2  [was 74.1]
  London (66v, 0.74/100k): 44.1  [was 75.5]

Templates (city-cost-de, country-overview, city-pricing):
- Color coding: green >= 55 (was 65), amber >= 35 (was 40)
- Intro/FAQ tiers: strong >= 55 (was 70), mid >= 35 (was 45)
- Opportunity interplay: market_score < 40 (was < 50) for white-space

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 06:40:12 +01:00
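The supply development component from 88ed17484b can be sketched in Python. The log scaling and count gate follow the commit message; the explicit cap at 1.0 above the ceiling and the function name are assumptions.

```python
import math


def supply_development_fraction(density_per_100k, venue_count, ceiling=20.0, gate=5):
    """LN(d + 1) / LN(ceiling + 1), multiplied by min(1, count / gate).

    The result is a 0-1 fraction of the component's point budget.
    Capping the log term at 1.0 above the ceiling is an assumption.
    """
    log_density = math.log(density_per_100k + 1) / math.log(ceiling + 1)
    count_gate = min(1.0, venue_count / gate)
    return min(1.0, log_density) * count_gate
```

A single venue at ceiling density earns only 20% of the component, while five or more venues at the same density earn all of it. Calling the function with `ceiling=10, gate=3` would correspond to the v4 values listed in cd6d950233.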
Deeman
7186d4582a feat(sql): thread opportunity_score from location_opportunity_profile into pSEO serving chain
- dim_cities: add geoname_id to geonames_pop CTE and final SELECT
  Creates FK between dim_cities (city-with-padel-venues) and dim_locations (all GeoNames),
  enabling joins to location_opportunity_profile for the first time.
- city_market_profile: pass geoname_id through base CTE and final SELECT
- pseo_city_costs_de: LEFT JOIN location_opportunity_profile on (country_code, geoname_id),
  add opportunity_score to output columns
- pseo_country_overview: add avg_opportunity_score, top_opportunity_score, top_opportunity_slugs,
  top_opportunity_names aggregates

Cities with no GeoNames name match get opportunity_score = NULL; templates guard with
{% if opportunity_score %}.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 20:29:57 +01:00
Deeman
b73386b9b6 fix: correct export_serving invocation in all docs
`-m padelnomics.export_serving` doesn't resolve because src/ is not
installed as a package in the workspace. Use the direct script path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-25 16:06:31 +01:00
Deeman
cee2e9babc merge: standardise recheck availability to JSONL + update docs
2026-02-25 15:45:23 +01:00
Deeman
b33dd51d76 feat: standardise recheck availability to JSONL output
- extract_recheck() now writes availability_{date}_recheck_{HH}.jsonl.gz
  (one venue per line with date/captured_at_utc/recheck_hour injected);
  uses compress_jsonl_atomic; removes write_gzip_atomic import
- stg_playtomic_availability: add recheck_jsonl CTE (newline_delimited
  read_json on *.jsonl.gz recheck files); include in all_venues UNION ALL;
  old recheck_blob CTE kept for transition
- init_landing_seeds.py: add JSONL recheck seed alongside blob seed
- Docs: README landing structure + data sources table updated; CHANGELOG
  availability bullets updated; data-sources-inventory paths corrected

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-25 14:52:47 +01:00
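The JSONL output described in b33dd51d76 (one venue per line, written atomically) can be sketched with the standard library. This is not the project's `compress_jsonl_atomic`; the function below is a hypothetical stand-in that shows the temp-file-then-rename pattern.

```python
import gzip
import json
import os
import tempfile


def write_jsonl_gz_atomic(records, dest_path):
    """Write records as gzip-compressed JSONL, then rename into place.

    Readers never see a half-written file: the data goes to a temporary
    file in the same directory and os.replace() swaps it in at the end.
    """
    directory = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        with gzip.open(tmp_path, "wt", encoding="utf-8") as fh:
            for record in records:
                fh.write(json.dumps(record) + "\n")  # one venue per line
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Clean up the partial temp file before re-raising.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Each record would carry the injected fields the commit mentions (date, captured_at_utc, recheck_hour), so every line can be read independently of the others.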