padelnomics

Author	SHA1	Message	Date
Deeman	8e0dd6af63	fix(data): filter non-Latin city names + score range clamp (Phase F) - stg_population_geonames: reject CJK/Cyrillic/Arabic city names via regex (fixes "Seelow" showing Japanese characters on map) - dim_locations: filter empty location names after trim - location_profiles: defensive LEAST/GREATEST clamp on both scores (0-100) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 12:23:50 +01:00
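
The two defensive measures in the commit above (script filter and score clamp) can be sketched in Python. This is an illustration only: the log does not show the actual regex, so the character ranges and the names `NON_LATIN`, `is_latin_name`, and `clamp_score` are assumptions.

```python
import re

# Hypothetical ranges: Cyrillic, Arabic, Hiragana/Katakana, CJK ideographs, Hangul.
# The real pattern used in stg_population_geonames is not shown in the log.
NON_LATIN = re.compile(
    r"[\u0400-\u04FF\u0600-\u06FF\u3040-\u30FF\u4E00-\u9FFF\uAC00-\uD7AF]"
)


def is_latin_name(name: str) -> bool:
    """Return True when the name contains none of the rejected scripts."""
    return NON_LATIN.search(name) is None


def clamp_score(score: float) -> float:
    """Python equivalent of LEAST(100, GREATEST(0, score))."""
    return max(0.0, min(100.0, score))
```

Accented Latin names such as München pass the filter, because only the listed script ranges are rejected.
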
Deeman	bda2f85fd6	fix(pipeline): CAST snapshot_date to DATE in venue_pricing_benchmarks Phase A: defensive CAST for incremental time_column comparison. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 11:55:44 +01:00
Deeman	67fbfde53d	feat(scoring): Opportunity Score v5 → v6 — calibrate for saturated markets - Lower density ceiling 8→5/100k (Spain at 6-16/100k now hits zero-gap) - Increase supply deficit weight 35→40 pts (primary differentiator) - Reduce addressable market 25→20 pts (less weight on population alone) - Invert market validation → market headroom (high country maturity = less opportunity) Target: Spain avg opportunity drops from ~78 to ~50-60 range. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 20:23:08 +01:00
Deeman	3c135051fd	feat(scoring): Score v6 — World Bank global economic data for non-EU countries Non-EU countries (AR, MX, AE, AU, etc.) previously got NULL for median_income_pps and pli_construction, falling back to EU-calibrated defaults (15K PPS, PLI=100) that produced wrong scores. New World Bank WDI extractor fetches GNI per capita PPP and price level ratio for 215 countries. dim_countries uses Germany as calibration anchor to scale WB values into the Eurostat range (dynamic ratio, self-corrects as both sources update). EU countries keep exact Eurostat values. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 18:17:33 +01:00
Deeman	118c2c0fc7	feat(scoring): Opportunity Score v4 → v5 — fix correlated components - Merge supply gap (30pts) + catchment gap (15pts) → supply deficit (35pts, GREATEST) Eliminates ~80% correlated double-count on a single signal. - Add sports culture signal (10pts): tennis court density as racquet-sport adoption proxy. Ceiling 50 courts/25km. Harmless when tennis data is zero (contributes 0). - Add construction affordability (5pts): income relative to PLI construction costs. Joins dim_countries.pli_construction. High income + low build cost = high score. - Reduce economic power from 20 → 15pts to make room. New weights: addressable market 25, economic power 15, supply deficit 35, sports culture 10, construction affordability 5, market validation 10. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 15:30:04 +01:00
Deeman	cd6d950233	feat(scoring): Market Score v3 → v4 — fix Spain underscoring - Lower count gate threshold: 5 → 3 venues (3 establishes a market pattern) - Lower density ceiling: LN(21) → LN(11) (10/100k is reachable for mature markets) - Better demand fallback: 0.4 → 0.65 multiplier + 0.3 floor (venues = demand evidence) - Fix economic context: income/200 → income/25000 (actual discrimination vs free 10 pts) Expected: Spain avg market score rises from ~54 to ~65-75. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 15:22:48 +01:00
Deeman	f215ea8e3a	fix: supply gap inflation + inline map data + guard API endpoints A. location_profiles.sql: supply gap now uses GREATEST(catchment_padel_courts, COALESCE(city_padel_venue_count, 0)) so Playtomic venues prevent cities like Murcia/Cordoba/Gijon from receiving a full 30-pt supply gap bonus when their OSM catchment count is zero. Expected ~10-15 pt drop for affected ES cities. B. pseo_country_overview.sql: add population-weighted lat/lon centroid columns so the markets map can use accurate country positions from this table. C/D. content/routes.py + markets.html: query pseo_country_overview in the route and pass as map_countries to the template, replacing the fetch('/api/...') call with inline JSON. Map scores now match pseo_country_overview (pop-weighted), and the page loads without an extra round-trip. E. api.py: add @login_required to all 4 endpoints. Unauthenticated callers get a 302 redirect to login instead of data. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-07 20:33:31 +01:00
Deeman	544891611f	feat(transform): opportunity score v4 - market validation + population-weighted aggregation Two targeted fixes for inflated country scores (ES 83, SE 77): 1. pseo_country_overview: replace AVG() with population-weighted averages for avg_opportunity_score and avg_market_score. Madrid/Barcelona now dominate Spain's average instead of hundreds of 30K-town white-space towns. Expected ES drop from ~83 to ~55-65. 2. location_profiles: replace dead sports culture component (10 pts, tennis data all zeros) with market validation signal. Split scored CTE into: market_scored → country_market → scored. country_market aggregates AVG(market_score) per country from cities with padel courts (market_score > 0), so zero-court locations don't dilute the signal. ES (~60/100) → ~6 pts. SE (~35/100) → ~3.5 pts. NULL → 0.5 neutral → 5 pts (untested market, not penalised). Score budget unchanged: 25+20+30+15+10 = 100 pts. No new models, no new data sources, no cycles. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-07 17:23:11 +01:00
Deeman	77ec3a289f	feat(transform): H3 catchment index, res 5 k_ring(1) ~24km radius Merges worktree-h3-catchment-index. dim_locations now computes h3_cell_res5 (res 5, ~8.5km edge). location_profiles and dim_locations updated; old location_opportunity_profile.sql already removed on master. Conflict: location_opportunity_profile.sql deleted on master, kept deletion and applied h3_cell_res4→res5 rename to location_profiles instead. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 14:45:45 +01:00
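
The switch from AVG() to a population-weighted average in pseo_country_overview can be illustrated with a small Python sketch. The function name `weighted_avg` and the sample figures are hypothetical; they only show why one large city outweighs many small towns.

```python
from typing import Optional, Sequence, Tuple


def weighted_avg(rows: Sequence[Tuple[int, float]]) -> Optional[float]:
    """Population-weighted mean of (population, score) pairs.

    Returns None when the total population is zero, mirroring a NULL result.
    """
    total_pop = sum(pop for pop, _ in rows)
    if total_pop == 0:
        return None
    return sum(pop * score for pop, score in rows) / total_pop


# Illustrative data (not from the repo): one large city, two small towns.
rows = [(3_000_000, 50.0), (30_000, 90.0), (30_000, 90.0)]
simple = sum(score for _, score in rows) / len(rows)
weighted = weighted_avg(rows)
```

With these numbers the simple mean is pulled toward the small towns' scores, while the weighted mean stays close to the large city's score.
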
Deeman	f81d5f19da	fix(transform): tighten H3 catchment to res 5 (~24km radius) Res 4 + k_ring(1) gave ~50-60km effective radius, causing Oldenburg to absorb Bremen (40km away) and destroying score differentiation. Res 5 + k_ring(1) gives ~24km — captures adjacent Gemeinden (Delmenhorst at 15km) without bleeding into unrelated cities at 40km+. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 14:34:56 +01:00
Deeman	4d29ecf1d6	merge: unified location_profiles serving model + both scores on map tooltips # Conflicts: # CHANGELOG.md # transform/sqlmesh_padelnomics/models/serving/location_opportunity_profile.sql	2026-03-06 14:03:55 +01:00
Deeman	a3b4e1fab6	docs: update CHANGELOG, CLAUDE.md, and comments for location_profiles Update transform CLAUDE.md source integration map and conformed dimensions table. Update CHANGELOG with unified model + tooltip changes. Fix stale comments in dim_cities.sql and serving README. Subtask 5/5: documentation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 11:45:08 +01:00
Deeman	81b556b205	refactor(serving): replace old models with location_profiles Delete city_market_profile.sql and location_opportunity_profile.sql. Update downstream models (planner_defaults, pseo_city_costs_de, pseo_city_pricing) to read from location_profiles instead. Subtask 2/5: delete old models + update downstream SQL. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 11:39:52 +01:00
Deeman	cda94c9ee4	feat(serving): add unified location_profiles model Combines city_market_profile and location_opportunity_profile into a single serving model at (country_code, geoname_id) grain. Both Market Score and Opportunity Score computed per location. City data enriched via LEFT JOIN dim_cities on geoname_id. Subtask 1/5: create new model (old models not yet removed). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 11:36:36 +01:00
Deeman	4fbd91b59b	merge: automate h3 community extension install via sqlmesh config	2026-03-06 10:27:03 +01:00
Deeman	159d1b5b9a	fix(transform): use community repository for h3 extension install SQLMesh's extensions config supports dict form with 'repository' key, which runs INSTALL h3 FROM community + LOAD h3 automatically at connect time. No manual one-time install needed per machine. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 10:26:56 +01:00
Deeman	dec4f07fbb	merge: H3 catchment index for Marktpotenzial-Score v3	2026-03-06 10:19:51 +01:00
Deeman	4e4ff61699	feat(transform): H3 catchment index for Marktpotenzial-Score v3 Add H3 res-4 regional catchment metrics (~15-18km radius, cell + 6 neighbours) to both the addressable market (25pts) and supply gap (30pts) components of location_opportunity_profile. Changes: - config.yaml: add h3 to DuckDB extensions (requires one-time INSTALL h3 FROM community on each machine) - dim_locations: add h3_cell_res4 column via h3_latlng_to_cell() - location_opportunity_profile: add hex_stats + catchment CTEs; update score formula to use catchment_population and catchment_padel_courts; expose catchment_population, catchment_padel_courts, catchment_venues_per_100k as output cols Motivation: local population underestimates functional market for mid-size cities (e.g. Oldenburg ~170K misses surrounding Gemeinden). H3 k_ring(1) captures the realistic driving-distance catchment (~462km²) consistently across both score components. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 10:19:43 +01:00
Deeman	2f47d1e589	fix(pipeline): make availability chain incremental + fix supervisor Convert the availability chain (stg_playtomic_availability → fct_availability_slot → fct_daily_availability) from FULL to INCREMENTAL_BY_TIME_RANGE so sqlmesh run processes only new daily intervals instead of re-reading all files. Supervisor changes: - run_transform(): plan prod --auto-apply → run prod (evaluates missing cron intervals, picks up new data) - git_pull_and_sync(): add plan prod --auto-apply before re-exec so model code changes are applied on deploy - supervisor.sh: same plan → run change Staging model uses a date-scoped glob (@start_ds) to read only the current interval's files. snapshot_date cast to DATE (was VARCHAR) as required by time_column. Clean up redundant TRY_CAST(snapshot_date AS DATE) in venue_pricing_benchmarks since it's already DATE from foundation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-05 21:34:02 +01:00
Deeman	59f1f0d699	merge(worktree): interactive maps for market pages Self-hosted Leaflet 1.9.4 maps across 4 placements: markets hub country bubbles, country overview city bubbles, city venue dots, and a standalone opportunity map. New /api blueprint with 4 JSON endpoints. New city_venue_locations SQLMesh serving model. No CDN — GDPR-safe. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> # Conflicts: # CHANGELOG.md	2026-03-04 15:36:41 +01:00
Deeman	edf678ac4e	feat(maps): Phase 4 — city venue dot map New serving model: city_venue_locations joins dim_venues + dim_cities to expose lat/lon/court_count per venue for the city dot map endpoint. pseo_city_costs_de.sql: add c.lat, c.lon so city-cost articles have city coordinates for the #city-map data attributes. city-cost-de.md.jinja: add #city-map div (both DE and EN sections) after the stats strip. Leaflet init handled by article_detail.html. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 15:07:06 +01:00
Deeman	61c197d233	merge(worktree): individualise article costs with per-country Eurostat data + tiered proxy tenant work # Conflicts: # CHANGELOG.md # transform/sqlmesh_padelnomics/models/foundation/dim_cities.sql # transform/sqlmesh_padelnomics/models/foundation/dim_locations.sql	2026-03-04 12:44:56 +01:00
Deeman	2e68cfbe4f	feat(transform): individualise article costs with per-country Eurostat data Add real per-country cost data to ~30 calculator fields so pSEO articles show country-specific CAPEX/OPEX instead of hardcoded DE defaults. Extractor: - eurostat.py: add 8 new datasets (nrg_pc_205, nrg_pc_203, lc_lci_lev, 5×prc_ppp_ind variants); add optional `dataset_code` field so multiple dict entries can share one Eurostat API endpoint Staging (4 new models): - stg_electricity_prices — EUR/kWh by country, semi-annual - stg_gas_prices — EUR/GJ by country, semi-annual - stg_labour_costs — EUR/hour by country, annual (future staffed scenario) - stg_price_levels — PLI indices (EU27=100) for 5 categories, annual Foundation: - dim_countries (new) — conformed country dimension; eliminates ~50-line CASE blocks duplicated in dim_cities/dim_locations; computes ~29 calculator cost override columns from PLI ratios and energy price ratios vs DE baseline; NULL for DE so calculator falls through to DEFAULTS unchanged - dim_cities — replace country_name/slug CASE blocks + country_income CTE with JOIN dim_countries - dim_locations — same refactor as dim_cities Serving: - pseo_city_costs_de — JOIN dim_countries; add 29 camelCase override columns auto-applied by calculator (electricity, heating, rentSqm, hallCostSqm, …) - planner_defaults — JOIN dim_countries; same 29 cost columns flow through to /api/market-data endpoint Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-04 10:09:48 +01:00
Deeman	a00c8727d7	fix(content): slugify transliteration + article links + country overview ranking - Add @slugify SQLMesh macro (STRIP_ACCENTS + ß→ss) replacing broken inline REGEXP_REPLACE that dropped non-ASCII chars (Düsseldorf → d-sseldorf) - Apply @slugify to dim_venues, dim_cities, dim_locations - Fix Python slugify() to pre-replace ß→ss before NFKD normalization - Add language prefix to B2B article market links (/markets/germany → /de/markets/germany) - Change country overview top-5 ranking: venue count (not raw market_score) for top cities, population for top opportunity cities Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 10:46:30 +01:00
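
A minimal Python sketch of the slugify fix described above, assuming a conventional lowercase-and-hyphen slug format. The repo's actual `slugify()` is not shown in the log, so treat the body as an assumption; only the ß→ss pre-replacement before NFKD normalization comes from the commit message.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Transliterate to ASCII and join alphanumeric runs with hyphens."""
    # ß has no NFKD decomposition, so it must be replaced before normalizing;
    # otherwise the ASCII encoding step below would silently drop it.
    text = text.lower().replace("ß", "ss")
    # NFKD splits accented letters into base letter + combining mark.
    text = unicodedata.normalize("NFKD", text)
    # Dropping non-ASCII bytes removes the combining marks and keeps the base.
    text = text.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", text).strip("-")
```

Under this sketch, Düsseldorf keeps its base letter `u` instead of losing the character entirely, which is the failure mode (`d-sseldorf`) the commit describes for the old REGEXP_REPLACE.
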
Deeman	6774254cb0	feat(sqlmesh): add country code macros, apply across models Task 4/6: Add 5 macros to compress repeated country code patterns: - @country_name / @country_slug: 20-country CASE in dim_cities, dim_locations - @normalize_eurostat_country / @normalize_eurostat_nuts: EL→GR, UK→GB - @infer_country_from_coords: bounding box for 8 markets Net: +91 lines in macros, -135 lines in models = -44 lines total. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 07:45:52 +01:00
Deeman	fea4f85da3	perf(transform): optimize dim_locations spatial joins via IEJoin + country filters Replace ABS() bbox predicates with BETWEEN in all three spatial CTEs (nearest_padel, padel_local, tennis_nearby). BETWEEN enables DuckDB's IEJoin (interval join) which is O((N+M) log M) vs the previous O(N×M) nested-loop cross-join. Add country pre-filters to restrict the left side from ~140K global locations to ~20K rows for padel/tennis CTEs (~8 countries each). Expected: ~50-200x speedup on the spatial CTE portion of the model. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-01 02:57:05 +01:00
Deeman	e62aad148b	fix(transform): remove blob CTE from stg_population_geonames Server has cities_global.jsonl.gz (JSONL), not cities_global.json.gz (blob). TigerStyle clean break: removed blob_rows CTE and UNION ALL. Simplified to a single SELECT directly from read_json. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-28 18:40:15 +01:00
Deeman	6fb1e990e3	merge: three-tier proxy + daily tenants + staging model cleanup	2026-02-28 18:26:50 +01:00
Deeman	6edf8ba65e	fix(transform): remove blob fallback CTEs, update tenants glob to daily partition depth TigerStyle clean break — no backwards-compat shims for old file formats: - stg_playtomic_{venues,opening_hours,resources}: glob updated from //tenants.jsonl.gz (2-level, old weekly) to ///tenants.jsonl.gz (3-level, new daily YYYY/MM/DD partition); blob tenants.json.gz CTE removed - stg_playtomic_availability: morning_blob and recheck_blob CTEs removed; only JSONL format (availability_.jsonl.gz) is read going forward Verified locally: stg_playtomic_venues evaluates to 14231 venues from 2026/02/28/tenants.jsonl.gz with 0 errors. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-28 18:26:44 +01:00
Deeman	6cf98f44d4	fix(transform): remove blob compat CTE from stg_tennis_courts The overpass_tennis extractor has written JSONL-only since it was added. The dual-format UNION ALL was backwards-compat debt that broke the transform once no courts.json.gz files exist on the server: IO Error: No files found that match the pattern "data/landing/overpass_tennis///courts.json.gz" Remove blob_elements CTE and the UNION ALL. Only read JSONL. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-28 17:39:11 +01:00
Deeman	4e82907a70	refactor(transform): conform geographic dimension hierarchy via city_slug Propagates the conformed city key (city_slug) from dim_venues through the full pricing pipeline, eliminating 3 fragile LOWER(TRIM(...)) fuzzy string joins with deterministic key joins. Changes (cascading, task-by-task): - dim_venues: add city_slug computed column (REGEXP_REPLACE slug derivation) - dim_venue_capacity: join foundation.dim_venues instead of stg_playtomic_venues; carry city_slug alongside country_code/city - fct_daily_availability: carry city_slug from dim_venue_capacity - venue_pricing_benchmarks: carry city_slug from fct_daily_availability; add to venue_stats GROUP BY and final SELECT/GROUP BY - city_market_profile: join vpb on city_slug = city_slug (was LOWER(TRIM)) - planner_defaults: add city_slug to city_benchmarks CTE; join on city_slug - pseo_city_pricing: join city_market_profile on city_slug (was LOWER(TRIM)) - pipeline_routes._DAG: dim_venue_capacity now depends on dim_venues, not stg_playtomic_venues Result: dim_venues.city_slug → dim_cities.(country_code, city_slug) forms a fully conformed geographic hierarchy with no fuzzy string comparisons. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 13:23:03 +01:00
Deeman	c3531bd75d	feat(data): Phase 2b complete — EU NUTS-2 spatial join + US state income - stg_regional_income: expanded NUTS-1+2 (LENGTH IN 3,4), nuts_code rename, nuts_level - stg_nuts2_boundaries: new — ST_Read GISCO GeoJSON, bbox columns for spatial pre-filter - stg_income_usa: new — Census ACS state-level income staging model - dim_locations: spatial join replaces admin1_to_nuts1 VALUES CTE; us_income CTE with PPS normalisation (income/80610×30000); income cascade: NUTS-2→NUTS-1→US state→country - init_landing_seeds: compress=False for ST_Read files; gisco GeoJSON + census income seeds - CHANGELOG + PROJECT.md updated Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 11:03:16 +01:00
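
The income cascade and the US PPS normalisation named in the commit above (income/80610×30000, then NUTS-2→NUTS-1→US state→country) can be sketched as follows. The function names are hypothetical, and the cascade is modelled as a plain first-non-null lookup, which is an assumption about how the SQL COALESCE is arranged.

```python
from typing import Optional

# Constants taken from the commit message: income / 80610 * 30000.
US_REFERENCE_INCOME_USD = 80610
PPS_ANCHOR = 30000


def us_income_to_pps(income_usd: float) -> float:
    """Scale a US state median household income into the PPS range."""
    return income_usd / US_REFERENCE_INCOME_USD * PPS_ANCHOR


def resolve_income(
    nuts2: Optional[float],
    nuts1: Optional[float],
    us_state: Optional[float],
    country: Optional[float],
) -> Optional[float]:
    """Return the most granular income available (first non-null wins)."""
    for value in (nuts2, nuts1, us_state, country):
        if value is not None:
            return value
    return None
```

For example, a location with no NUTS match but a US state income would resolve to `us_income_to_pps(state_income)` passed in as the third argument.
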
Deeman	409dc4bfac	feat(data): Phase 2b step 1 — expand stg_regional_income + Census income extractor - stg_regional_income.sql: accept NUTS-1 (3-char) + NUTS-2 (4-char) codes; rename nuts1_code → nuts_code; add nuts_level column; NUTS-2 rows were already in the landing zone but discarded by LENGTH(geo_code) = 3 - scripts/download_gisco_nuts.py: one-time download of GISCO NUTS-2 boundary GeoJSON (NUTS_RG_20M_2021_4326_LEVL_2.geojson, ~5MB) to landing zone; uncompressed because ST_Read cannot read .gz files - census_usa_income.py: new extractor for ACS B19013_001E state-level median household income; follows census_usa.py pattern; 51 states + DC - all.py + pyproject.toml: register census_usa_income extractor Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 10:58:12 +01:00
Deeman	5ade38eeaf	feat(data): Phase 2a — NUTS-1 regional income for opportunity score - eurostat.py: add nama_10r_2hhinc dataset config; append filter params to request URL so server pre-filters the large cube before download - stg_regional_income.sql: new staging model — reads nama_10r_2hhinc.json.gz, filters to NUTS-1 codes (3-char), normalises EL→GR / UK→GB - dim_locations.sql: add admin1_to_nuts1 VALUES CTE (16 German Bundesländer) + regional_income CTE; final SELECT uses COALESCE(regional, country) income - init_landing_seeds.py: add empty seed for nama_10r_2hhinc.json.gz Munich/Bayern now scores ~29K PPS vs Chemnitz/Sachsen ~19K PPS instead of both inheriting the same national average (~25.5K PPS). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 10:26:15 +01:00
Deeman	3aa30ab419	feat(sql): dim_cities — GeoNames spatial population fallback Adds a coordinate-based population lookup as a fallback when string name matching fails (~29% of cities). Uses bbox pre-filter (0.14° ≈ 15 km) then ST_Distance_Sphere to find the nearest GeoNames location in the same country. Fixes localization mismatches: Milano≠Milan, Wien≠Vienna, München≠Munich. Population cascade: Eurostat EU > US Census > ONS UK > GeoNames string > GeoNames spatial > 0. Coverage: 70.5% → 98.5% (5,401 / 5,481 cities with population > 0). Key cities before/after: Wien: 0 → 1,691,468 Milano: 0 → 1,371,498 München: already matched by string; verified still correct at 1,488,719 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 08:47:26 +01:00
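
The spatial fallback described above (bbox pre-filter of 0.14°, then nearest GeoNames location in the same country) can be sketched in Python. The haversine formula stands in for ST_Distance_Sphere, and the dictionary keys and function names are illustrative assumptions rather than the model's actual column names.

```python
import math

EARTH_RADIUS_KM = 6371.0
BBOX_DEG = 0.14  # pre-filter half-width from the commit message (about 15 km)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_population(city: dict, candidates: list) -> int:
    """Population of the nearest same-country candidate inside the bbox, else 0."""
    best = None
    for cand in candidates:
        if cand["country_code"] != city["country_code"]:
            continue
        # Cheap bounding-box test before the more expensive distance call.
        if abs(cand["lat"] - city["lat"]) > BBOX_DEG:
            continue
        if abs(cand["lon"] - city["lon"]) > BBOX_DEG:
            continue
        dist = haversine_km(city["lat"], city["lon"], cand["lat"], cand["lon"])
        if best is None or dist < best[0]:
            best = (dist, cand["population"])
    return best[1] if best else 0
```

Because matching is by coordinates, a venue city stored as "Wien" can pick up the population of a GeoNames row named "Vienna" without any string comparison.
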
Deeman	9835176e87	fix(sql): opportunity_score income ceiling /200→/35000 (economic power) PPS values are 18k–37k but /200 normalisation caused LEAST(1.0, 115)=1.0 for ALL countries — 20pts flat uplift, zero differentiation. Fix: /35000 creates real country spread: LU 20.0pts, DE 15.2pts, ES 12.8pts, GB 10.5pts (vs 20.0 everywhere before) Default for missing data 100→15000 (developing-market assumption, ~0.43). Header comment updated to document v2 formula behaviour. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 07:58:57 +01:00
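
The saturation bug fixed above is easy to reproduce in a few lines of Python. The function name and signature are hypothetical; the 20-point budget, the /200 and /35000 divisors, and the 15000 default come from the commit message.

```python
def economic_power_points(income_pps: float, ceiling: float, max_points: float = 20.0) -> float:
    """Python equivalent of LEAST(1.0, income / ceiling) * max_points."""
    return min(1.0, income_pps / ceiling) * max_points


# Old divisor: every income in the 18k-37k range saturates at the full 20 pts.
old_low = economic_power_points(18_000, 200)
old_high = economic_power_points(37_000, 200)

# New divisor: the same incomes now land at different scores.
new_low = economic_power_points(18_000, 35_000)
new_high = economic_power_points(37_000, 35_000)
```

With the old divisor both calls return the full 20 points, so the component cannot distinguish countries; with /35000 only incomes at or above the ceiling reach the cap.
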
Deeman	10266c3a24	fix(sql): opportunity_score — supply gap ceiling 4→8/100k + doc findings Raises supply gap ceiling from 4/100k to 8/100k in location_opportunity_profile.sql. The original 4/100k hard cliff truncated opportunity scores to 0 for any city with ≥4 courts/100k, but our data undercounts ~87% of real courts (FIP: 17,300 Spanish courts vs 2,239 in our DB). Raising to 8/100k gives a gentler gradient and fairer partial credit when density data is incomplete. Documents existing formula behaviour discovered during analysis: - Income PPS: country-level constants (18k-37k range) saturate the /200 ceiling — all EU countries get flat 20/20 pts until city-level income data lands. - Catchment NULL: DuckDB LEAST(1.0, NULL) = 1.0 (ignores nulls), so NULL nearest_padel_court_km already yields full 15 pts. COALESCE fallback is dead code but harmless. - Tennis courts within 25km: dim_locations data is empty (all 0 rows) — 10-court threshold is correct for when data arrives, contributes 0 pts everywhere for now. Effective score impact: minimal (99% of locations have 0 courts/100k, so supply gap was already at max). Only ~1,050 dense-court cities see a score increase (from 0 gap pts to partial gap pts). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 06:57:57 +01:00
Deeman	88ed17484b	feat(sql+templates): market_score v3 — log density + count gate Fixes ranking inversion where Germany (1/100k courts) outscored Spain (36/100k). Root causes: population/income were 55% of max before any padel signal, density ceiling saturated 73% of cities, small-town inflation (1 venue / 5k pop = 20/100k = full marks), and the saturation discount actively penalised mature markets. SQL (city_market_profile.sql): - Supply development 40pts: log-scaled density LN(d+1)/LN(21) × count gate min(1, count/5). Ceiling 20/100k. Count gate kills small-town inflation without hard cutoffs (1 venue = 20%, 5+ = 100%). - Demand evidence 25pts: occupancy if available; 40% density proxy otherwise. Separated from supply to avoid double-counting. - Addressable market 15pts: population as context, not maturity. - Economic context 10pts: income PPS (flat per country, low signal). - Data quality 10pts. - Removed saturation discount. High density = maturity. Verified spot-check scores: Málaga (46v, 7.77/100k): 70.1 [was 98.9] Barcelona (104v, 6.17/100k): 67.4 [was 100.0] Amsterdam (24v, 3.24/100k): 58.4 [was 93.7] Bernau bei Berlin (2v, 5.74/100k): 43.9 [was 92.7] Berlin (20v, 0.55/100k): 42.2 [was 74.1] London (66v, 0.74/100k): 44.1 [was 75.5] Templates (city-cost-de, country-overview, city-pricing): - Color coding: green >= 55 (was 65), amber >= 35 (was 40) - Intro/FAQ tiers: strong >= 55 (was 70), mid >= 35 (was 45) - Opportunity interplay: market_score < 40 (was < 50) for white-space Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 06:40:12 +01:00
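
The supply development component from the commit above (log-scaled density times a count gate, 40 pts) can be sketched in Python. The function name is hypothetical, and capping the density term at 1.0 above 20/100k is an assumption inferred from the stated ceiling; the SQL itself is not shown in the log.

```python
import math


def supply_development_points(
    venues_per_100k: float, venue_count: int, max_points: float = 40.0
) -> float:
    """LN(d + 1) / LN(21) scaled by a count gate of min(1, count / 5)."""
    # Log scaling: density saturates at 20 venues per 100k (LN(21) / LN(21) = 1).
    density = min(1.0, math.log(venues_per_100k + 1) / math.log(21))
    # Count gate: 1 venue gives 20% of the density credit, 5 or more give 100%.
    gate = min(1.0, venue_count / 5)
    return max_points * density * gate
```

The gate is what removes small-town inflation: a town with a single venue at 20/100k gets one fifth of the points that a city with five or more venues at the same density receives.
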
Deeman	7186d4582a	feat(sql): thread opportunity_score from location_opportunity_profile into pSEO serving chain - dim_cities: add geoname_id to geonames_pop CTE and final SELECT Creates FK between dim_cities (city-with-padel-venues) and dim_locations (all GeoNames), enabling joins to location_opportunity_profile for the first time. - city_market_profile: pass geoname_id through base CTE and final SELECT - pseo_city_costs_de: LEFT JOIN location_opportunity_profile on (country_code, geoname_id), add opportunity_score to output columns - pseo_country_overview: add avg_opportunity_score, top_opportunity_score, top_opportunity_slugs, top_opportunity_names aggregates Cities with no GeoNames name match get opportunity_score = NULL; templates guard with {% if opportunity_score %}. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 20:29:57 +01:00
Deeman	b73386b9b6	fix: correct export_serving invocation in all docs `-m padelnomics.export_serving` doesn't resolve because src/ is not installed as a package in the workspace. Use the direct script path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-25 16:06:31 +01:00
Deeman	cee2e9babc	merge: standardise recheck availability to JSONL + update docs	2026-02-25 15:45:23 +01:00
Deeman	b33dd51d76	feat: standardise recheck availability to JSONL output - extract_recheck() now writes availability_{date}_recheck_{HH}.jsonl.gz (one venue per line with date/captured_at_utc/recheck_hour injected); uses compress_jsonl_atomic; removes write_gzip_atomic import - stg_playtomic_availability: add recheck_jsonl CTE (newline_delimited read_json on *.jsonl.gz recheck files); include in all_venues UNION ALL; old recheck_blob CTE kept for transition - init_landing_seeds.py: add JSONL recheck seed alongside blob seed - Docs: README landing structure + data sources table updated; CHANGELOG availability bullets updated; data-sources-inventory paths corrected Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-25 14:52:47 +01:00
Deeman	a86f1ecd3a	fix(staging): enforce grain dedup in resources + opening_hours + skip old blob in tenants Both stg_playtomic_resources and stg_playtomic_opening_hours lacked QUALIFY ROW_NUMBER() dedup despite declaring a grain. When both tenants.json.gz (old) and tenants.jsonl.gz (new) exist for the same month, the UNION ALL produced exactly 2× rows. Fixes: - stg_playtomic_resources: QUALIFY ROW_NUMBER() OVER (PARTITION BY tenant_id, resource_id) - stg_playtomic_opening_hours: QUALIFY ROW_NUMBER() OVER (PARTITION BY tenant_id, day_of_week) - playtomic_tenants.py: skip if old blob OR new JSONL already exists for the month, preventing same-month dual-format writes that trigger the duplicate Row counts after fix: ~43.8K resources, ~93.4K opening_hours (was 87.6K, 186.8K). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-25 13:41:23 +01:00
Deeman	b5b8493543	feat(extract): regional overpass_tennis splitting + JSONL output Replace single global Overpass query (150K+ elements, times out) with 10 regional bbox queries (~10-40K elements each, 150s server / 180s client). - REGIONS: 10 bboxes covering all continents - Crash recovery: working.jsonl accumulates per-region results; already_seen_ids deduplication skips re-written elements on restart - Overlapping bbox elements deduped by OSM id across regions - Retry per region: up to 2 retries with 30s cooldown - Polite 5s inter-region delay - Skip if courts.jsonl.gz or courts.json.gz already exists for the month stg_tennis_courts: UNION ALL transition (jsonl_elements + blob_elements) - jsonl_elements: JSONL, explicit columns, COALESCE lat/lon with center coords (supports both node direct lat/lon and way/relation Overpass out center) - blob_elements: existing UNNEST(elements) pattern, unchanged - Removed osm_type='node' filter — ways/relations now usable via center coords - Dedup on (osm_id, extracted_date DESC) unchanged Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-25 12:19:37 +01:00
Deeman	a4f246d69a	feat(extract): convert geonames to JSONL output - cities_global.jsonl.gz replaces .json.gz (one city object per line) - Empty placeholder writes a minimal .jsonl.gz (null row, filtered in staging) - Eliminates the {"rows": [...]} blob wrapper and maximum_object_size workaround stg_population_geonames: UNION ALL transition (jsonl_rows + blob_rows) - jsonl_rows: read_json JSONL, explicit columns, no UNNEST - blob_rows: existing UNNEST(rows) pattern with 40MB size limit retained Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-25 12:16:59 +01:00
Deeman	7b03fd71f9	feat(extract): convert playtomic_availability to JSONL output - availability_{date}.jsonl.gz replaces .json.gz for morning snapshots - Each JSONL line = one venue object with date + captured_at_utc injected - Eliminates in-memory consolidation: working.jsonl IS the final file (compress_jsonl_atomic at end instead of write_gzip_atomic blob) - Crash recovery unchanged: working.jsonl accumulates via flush_partial_batch - _load_morning_availability tries .jsonl.gz first, falls back to .json.gz - Skip check covers both formats during transition - Recheck files stay blob format (small, infrequent) stg_playtomic_availability: UNION ALL transition (morning_jsonl + morning_blob + recheck_blob) - morning_jsonl: read_json JSONL, tenant_id direct column, no outer UNNEST - morning_blob / recheck_blob: subquery + LATERAL UNNEST (unchanged semantics) - All three produce (snapshot_date, captured_at_utc, snapshot_type, recheck_hour, tenant_id, slots_json) - Downstream raw_resources / raw_slots CTEs unchanged Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-25 12:14:38 +01:00
Deeman	9bef055e6d	feat(extract): convert playtomic_tenants to JSONL output - playtomic_tenants.py: write each tenant as a JSONL line after dedup, compress via compress_jsonl_atomic → tenants.jsonl.gz - playtomic_availability.py: update _load_tenant_ids() to prefer tenants.jsonl.gz, fall back to tenants.json.gz (transition) - stg_playtomic_venues.sql: UNION ALL jsonl+blob CTEs for transition; JSONL reads top-level columns directly, no UNNEST(tenants) needed - stg_playtomic_resources.sql: same UNION ALL pattern, single UNNEST for resources in JSONL path vs double UNNEST in blob path - stg_playtomic_opening_hours.sql: same UNION ALL pattern, opening_hours as top-level JSON column in JSONL path Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-25 12:07:53 +01:00
Deeman	55f179ba54	fix(transform): increase geonames object size limit and remove stale column ref - stg_population_geonames: add maximum_object_size=40MB to read_json() call; geonames cities_global.json.gz is ~30MB, exceeding DuckDB's 16MB default - dim_locations: remove stale 'population_year AS population_year' column ref; stg_population_geonames has ref_year, not population_year — caused BinderException Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-25 09:56:05 +01:00
Deeman	405efcfd19	docs: update docs and PROJECT.md for dual score pipeline Task 8: documentation updates for the dual market score feature. - CHANGELOG.md: comprehensive [Unreleased] entries for all additions (Marktpotenzial-Score, tennis courts, dim_locations, GeoNames expansion, DuckDB spatial, SOPS secrets, methodology page updates) - docs/data-sources-inventory.md: add tennis courts Overpass row, update GeoNames entry (cities1000, username=padelnomics, higher score) - transform/sqlmesh_padelnomics/CLAUDE.md: add dim_locations to conformed dimensions table, update source integration map with new pipeline branch, document ST_Distance_Sphere bounding-box pattern - PROJECT.md: add dual score to In Progress, add Gemeinde pSEO + top-50 ranking page to Next Up, add data backlog items (sports_centre, NUTS-3, opportunity map), add Decisions Log entry Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-24 17:12:22 +01:00
Deeman	ebfdc84a94	feat(transform): add dim_locations + dual market scoring models dim_locations (foundation): - Seeded from stg_population_geonames (all locations, not venue-dependent) - Grain: (country_code, geoname_id) - Enriched with: padel venues within 5km, nearest court distance (ST_Distance_Sphere), tennis courts within 25km, country income - Covers zero-court Gemeinden for opportunity scoring location_opportunity_profile (serving) — Padelnomics Marktpotenzial-Score: - Answers "Where should I build?" — no padel_venue_count filter - Formula: population (25) + income (20) + supply gap inverted (30) + catchment gap (15) + tennis culture (10) = 100pts - Sorted by opportunity_score DESC city_market_profile (serving) — Padelnomics Marktreife-Score: - Add saturation discount (×0.85 when venues_per_100k > 8) - Update header comment to reference Marktreife-Score branding - Kept WHERE padel_venue_count > 0 (established markets only) - column name market_score unchanged (avoids downstream breakage) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-24 16:28:16 +01:00


66 Commits