padelnomics

Author	SHA1	Message	Date
Deeman	236f0d1061	fix(markets): map country names, localised dropdown + avg/top score tooltip - Expand dim_countries.sql CASE to cover 22 missing countries (PL, RO, CO, HU, ZA, KE, BR, CZ, QA, NZ, HR, LV, MT, CR, CY, PA, SV, DO, PE, VE, EE, ID) that fell through to bare ISO codes - Add 19 missing entries to COUNTRY_LABELS (i18n.py) + both locale files (EN + DE dir_country_* keys) including IE which was in SQL but not i18n - Localise map tooltips: routes.py injects country_name via get_country_name(), JS uses c.country_name instead of c.country_name_en - Localise dropdown: apply country_name filter to option labels - Show avg + top score in map tooltip with separate color dots and new map_score_avg / map_score_top i18n keys (EN: "Avg. Score" / "Top City", DE: "Ø Score" / "Top-Stadt") Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 17:21:59 +01:00
Deeman	bd7fa1ae9a	fix(pipeline): stg_playtomic_availability glob reads all files, filters by date range The @start_ds in the glob pattern only matched files for the first day of the batch, so incremental restates only loaded 1 day of data. Changed to wildcard glob with explicit BETWEEN @start_ds AND @end_ds filter on the date column. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 15:48:10 +01:00
Deeman	927f77ae5e	fix: country_supply column name in location_profiles	2026-03-10 10:12:09 +01:00
Deeman	adf6f0c1ef	fix(score): country_supply uses dim_cities.padel_venue_count (not city_padel_venue_count) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 10:09:30 +01:00
Deeman	9dc705970e	merge: Opportunity Score v8 — better spread/discrimination # Conflicts: # CHANGELOG.md	2026-03-09 22:24:43 +01:00
Deeman	ff6401254a	feat(score): Opportunity Score v8 — better spread/discrimination Reweight: addressable market 20→15, economic power 15→10, supply deficit 40→50. Supply deficit existence dampener (country_venues/50, floor 0.1): zero-venue countries drop from ~80 to ~17. Steeper addressable market curve (LN/500K → SQRT/1M). NULL distance gap → 0.0 (was 0.5). Added country_percentile output column (PERCENT_RANK within country, 0–100). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 22:14:30 +01:00
Deeman	487722c2f3	chore: changelog + fix stg_population_geonames unicode escapes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 20:32:49 +01:00
Deeman	e39dd4ec0b	fix(score): Opportunity Score v7 — calibration fix for saturated markets Two fixes: 1. dim_locations now sources venues from dim_venues (deduplicated OSM + Playtomic) instead of stg_padel_courts (OSM only). Playtomic-only venues are no longer invisible to spatial lookups. 2. Country-level supply saturation dampener on supply deficit component. Saturated countries (Spain 7.4/100k) get dampened supply deficit (x0.30 → 12 pts max). Emerging markets (Germany 0.24/100k) nearly unaffected (x0.98 → ~39 pts). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 18:03:14 +01:00
Deeman	8e0dd6af63	fix(data): filter non-Latin city names + score range clamp (Phase F) - stg_population_geonames: reject CJK/Cyrillic/Arabic city names via regex (fixes "Seelow" showing Japanese characters on map) - dim_locations: filter empty location names after trim - location_profiles: defensive LEAST/GREATEST clamp on both scores (0-100) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 12:23:50 +01:00
Deeman	bda2f85fd6	fix(pipeline): CAST snapshot_date to DATE in venue_pricing_benchmarks Phase A: defensive CAST for incremental time_column comparison. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 11:55:44 +01:00
Deeman	67fbfde53d	feat(scoring): Opportunity Score v5 → v6 — calibrate for saturated markets - Lower density ceiling 8→5/100k (Spain at 6-16/100k now hits zero-gap) - Increase supply deficit weight 35→40 pts (primary differentiator) - Reduce addressable market 25→20 pts (less weight on population alone) - Invert market validation → market headroom (high country maturity = less opportunity) Target: Spain avg opportunity drops from ~78 to ~50-60 range. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 20:23:08 +01:00
Deeman	3c135051fd	feat(scoring): Score v6 — World Bank global economic data for non-EU countries Non-EU countries (AR, MX, AE, AU, etc.) previously got NULL for median_income_pps and pli_construction, falling back to EU-calibrated defaults (15K PPS, PLI=100) that produced wrong scores. New World Bank WDI extractor fetches GNI per capita PPP and price level ratio for 215 countries. dim_countries uses Germany as calibration anchor to scale WB values into the Eurostat range (dynamic ratio, self-corrects as both sources update). EU countries keep exact Eurostat values. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 18:17:33 +01:00
Deeman	118c2c0fc7	feat(scoring): Opportunity Score v4 → v5 — fix correlated components - Merge supply gap (30pts) + catchment gap (15pts) → supply deficit (35pts, GREATEST) Eliminates ~80% correlated double-count on a single signal. - Add sports culture signal (10pts): tennis court density as racquet-sport adoption proxy. Ceiling 50 courts/25km. Harmless when tennis data is zero (contributes 0). - Add construction affordability (5pts): income relative to PLI construction costs. Joins dim_countries.pli_construction. High income + low build cost = high score. - Reduce economic power from 20 → 15pts to make room. New weights: addressable market 25, economic power 15, supply deficit 35, sports culture 10, construction affordability 5, market validation 10. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 15:30:04 +01:00
Deeman	cd6d950233	feat(scoring): Market Score v3 → v4 — fix Spain underscoring - Lower count gate threshold: 5 → 3 venues (3 establishes a market pattern) - Lower density ceiling: LN(21) → LN(11) (10/100k is reachable for mature markets) - Better demand fallback: 0.4 → 0.65 multiplier + 0.3 floor (venues = demand evidence) - Fix economic context: income/200 → income/25000 (actual discrimination vs free 10 pts) Expected: Spain avg market score rises from ~54 to ~65-75. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 15:22:48 +01:00
Deeman	f215ea8e3a	fix: supply gap inflation + inline map data + guard API endpoints A. location_profiles.sql: supply gap now uses GREATEST(catchment_padel_courts, COALESCE(city_padel_venue_count, 0)) so Playtomic venues prevent cities like Murcia/Cordoba/Gijon from receiving a full 30-pt supply gap bonus when their OSM catchment count is zero. Expected ~10-15 pt drop for affected ES cities. B. pseo_country_overview.sql: add population-weighted lat/lon centroid columns so the markets map can use accurate country positions from this table. C/D. content/routes.py + markets.html: query pseo_country_overview in the route and pass as map_countries to the template, replacing the fetch('/api/...') call with inline JSON. Map scores now match pseo_country_overview (pop-weighted), and the page loads without an extra round-trip. E. api.py: add @login_required to all 4 endpoints. Unauthenticated callers get a 302 redirect to login instead of data. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-07 20:33:31 +01:00
Deeman	544891611f	feat(transform): opportunity score v4 — market validation + population-weighted aggregation Two targeted fixes for inflated country scores (ES 83, SE 77): 1. pseo_country_overview: replace AVG() with population-weighted averages for avg_opportunity_score and avg_market_score. Madrid/Barcelona now dominate Spain's average instead of hundreds of 30K-town white-space towns. Expected ES drop from ~83 to ~55-65. 2. location_profiles: replace dead sports culture component (10 pts, tennis data all zeros) with market validation signal. Split scored CTE into: market_scored → country_market → scored. country_market aggregates AVG(market_score) per country from cities with padel courts (market_score > 0), so zero-court locations don't dilute the signal. ES (~60/100) → ~6 pts. SE (~35/100) → ~3.5 pts. NULL → 0.5 neutral → 5 pts (untested market, not penalised). Score budget unchanged: 25+20+30+15+10 = 100 pts. No new models, no new data sources, no cycles. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-07 17:23:11 +01:00
Deeman	77ec3a289f	feat(transform): H3 catchment index, res 5 k_ring(1) ~24km radius Merges worktree-h3-catchment-index. dim_locations now computes h3_cell_res5 (res 5, ~8.5km edge). location_profiles and dim_locations updated; old location_opportunity_profile.sql already removed on master. Conflict: location_opportunity_profile.sql deleted on master, kept deletion and applied h3_cell_res4→res5 rename to location_profiles instead. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 14:45:45 +01:00
Deeman	f81d5f19da	fix(transform): tighten H3 catchment to res 5 (~24km radius) Res 4 + k_ring(1) gave ~50-60km effective radius, causing Oldenburg to absorb Bremen (40km away) and destroying score differentiation. Res 5 + k_ring(1) gives ~24km — captures adjacent Gemeinden (Delmenhorst at 15km) without bleeding into unrelated cities at 40km+. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 14:34:56 +01:00
Deeman	4d29ecf1d6	merge: unified location_profiles serving model + both scores on map tooltips # Conflicts: # CHANGELOG.md # transform/sqlmesh_padelnomics/models/serving/location_opportunity_profile.sql	2026-03-06 14:03:55 +01:00
Deeman	a3b4e1fab6	docs: update CHANGELOG, CLAUDE.md, and comments for location_profiles Update transform CLAUDE.md source integration map and conformed dimensions table. Update CHANGELOG with unified model + tooltip changes. Fix stale comments in dim_cities.sql and serving README. Subtask 5/5: documentation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 11:45:08 +01:00
Deeman	81b556b205	refactor(serving): replace old models with location_profiles Delete city_market_profile.sql and location_opportunity_profile.sql. Update downstream models (planner_defaults, pseo_city_costs_de, pseo_city_pricing) to read from location_profiles instead. Subtask 2/5: delete old models + update downstream SQL. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 11:39:52 +01:00
Deeman	cda94c9ee4	feat(serving): add unified location_profiles model Combines city_market_profile and location_opportunity_profile into a single serving model at (country_code, geoname_id) grain. Both Market Score and Opportunity Score computed per location. City data enriched via LEFT JOIN dim_cities on geoname_id. Subtask 1/5: create new model (old models not yet removed). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 11:36:36 +01:00
Deeman	4fbd91b59b	merge: automate h3 community extension install via sqlmesh config	2026-03-06 10:27:03 +01:00
Deeman	159d1b5b9a	fix(transform): use community repository for h3 extension install SQLMesh's extensions config supports dict form with 'repository' key, which runs INSTALL h3 FROM community + LOAD h3 automatically at connect time. No manual one-time install needed per machine. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 10:26:56 +01:00
Deeman	dec4f07fbb	merge: H3 catchment index for Marktpotenzial-Score v3	2026-03-06 10:19:51 +01:00
Deeman	4e4ff61699	feat(transform): H3 catchment index for Marktpotenzial-Score v3 Add H3 res-4 regional catchment metrics (~15-18km radius, cell + 6 neighbours) to both the addressable market (25pts) and supply gap (30pts) components of location_opportunity_profile. Changes: - config.yaml: add h3 to DuckDB extensions (requires one-time INSTALL h3 FROM community on each machine) - dim_locations: add h3_cell_res4 column via h3_latlng_to_cell() - location_opportunity_profile: add hex_stats + catchment CTEs; update score formula to use catchment_population and catchment_padel_courts; expose catchment_population, catchment_padel_courts, catchment_venues_per_100k as output cols Motivation: local population underestimates functional market for mid-size cities (e.g. Oldenburg ~170K misses surrounding Gemeinden). H3 k_ring(1) captures the realistic driving-distance catchment (~462km²) consistently across both score components. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 10:19:43 +01:00
Deeman	2f47d1e589	fix(pipeline): make availability chain incremental + fix supervisor Convert the availability chain (stg_playtomic_availability → fct_availability_slot → fct_daily_availability) from FULL to INCREMENTAL_BY_TIME_RANGE so sqlmesh run processes only new daily intervals instead of re-reading all files. Supervisor changes: - run_transform(): plan prod --auto-apply → run prod (evaluates missing cron intervals, picks up new data) - git_pull_and_sync(): add plan prod --auto-apply before re-exec so model code changes are applied on deploy - supervisor.sh: same plan → run change Staging model uses a date-scoped glob (@start_ds) to read only the current interval's files. snapshot_date cast to DATE (was VARCHAR) as required by time_column. Clean up redundant TRY_CAST(snapshot_date AS DATE) in venue_pricing_benchmarks since it's already DATE from foundation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-05 21:34:02 +01:00
Deeman	59f1f0d699	merge(worktree): interactive maps for market pages Self-hosted Leaflet 1.9.4 maps across 4 placements: markets hub country bubbles, country overview city bubbles, city venue dots, and a standalone opportunity map. New /api blueprint with 4 JSON endpoints. New city_venue_locations SQLMesh serving model. No CDN — GDPR-safe. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> # Conflicts: # CHANGELOG.md	2026-03-04 15:36:41 +01:00
Deeman	edf678ac4e	feat(maps): Phase 4 — city venue dot map New serving model: city_venue_locations joins dim_venues + dim_cities to expose lat/lon/court_count per venue for the city dot map endpoint. pseo_city_costs_de.sql: add c.lat, c.lon so city-cost articles have city coordinates for the #city-map data attributes. city-cost-de.md.jinja: add #city-map div (both DE and EN sections) after the stats strip. Leaflet init handled by article_detail.html. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 15:07:06 +01:00
Deeman	61c197d233	merge(worktree): individualise article costs with per-country Eurostat data + tiered proxy tenant work # Conflicts: # CHANGELOG.md # transform/sqlmesh_padelnomics/models/foundation/dim_cities.sql # transform/sqlmesh_padelnomics/models/foundation/dim_locations.sql	2026-03-04 12:44:56 +01:00
Deeman	2e68cfbe4f	feat(transform): individualise article costs with per-country Eurostat data Add real per-country cost data to ~30 calculator fields so pSEO articles show country-specific CAPEX/OPEX instead of hardcoded DE defaults. Extractor: - eurostat.py: add 8 new datasets (nrg_pc_205, nrg_pc_203, lc_lci_lev, 5×prc_ppp_ind variants); add optional `dataset_code` field so multiple dict entries can share one Eurostat API endpoint Staging (4 new models): - stg_electricity_prices — EUR/kWh by country, semi-annual - stg_gas_prices — EUR/GJ by country, semi-annual - stg_labour_costs — EUR/hour by country, annual (future staffed scenario) - stg_price_levels — PLI indices (EU27=100) for 5 categories, annual Foundation: - dim_countries (new) — conformed country dimension; eliminates ~50-line CASE blocks duplicated in dim_cities/dim_locations; computes ~29 calculator cost override columns from PLI ratios and energy price ratios vs DE baseline; NULL for DE so calculator falls through to DEFAULTS unchanged - dim_cities — replace country_name/slug CASE blocks + country_income CTE with JOIN dim_countries - dim_locations — same refactor as dim_cities Serving: - pseo_city_costs_de — JOIN dim_countries; add 29 camelCase override columns auto-applied by calculator (electricity, heating, rentSqm, hallCostSqm, …) - planner_defaults — JOIN dim_countries; same 29 cost columns flow through to /api/market-data endpoint Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-04 10:09:48 +01:00
Deeman	a00c8727d7	fix(content): slugify transliteration + article links + country overview ranking - Add @slugify SQLMesh macro (STRIP_ACCENTS + ß→ss) replacing broken inline REGEXP_REPLACE that dropped non-ASCII chars (Düsseldorf → d-sseldorf) - Apply @slugify to dim_venues, dim_cities, dim_locations - Fix Python slugify() to pre-replace ß→ss before NFKD normalization - Add language prefix to B2B article market links (/markets/germany → /de/markets/germany) - Change country overview top-5 ranking: venue count (not raw market_score) for top cities, population for top opportunity cities Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 10:46:30 +01:00
Deeman	6774254cb0	feat(sqlmesh): add country code macros, apply across models Task 4/6: Add 5 macros to compress repeated country code patterns: - @country_name / @country_slug: 20-country CASE in dim_cities, dim_locations - @normalize_eurostat_country / @normalize_eurostat_nuts: EL→GR, UK→GB - @infer_country_from_coords: bounding box for 8 markets Net: +91 lines in macros, -135 lines in models = -44 lines total. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 07:45:52 +01:00
Deeman	fea4f85da3	perf(transform): optimize dim_locations spatial joins via IEJoin + country filters Replace ABS() bbox predicates with BETWEEN in all three spatial CTEs (nearest_padel, padel_local, tennis_nearby). BETWEEN enables DuckDB's IEJoin (interval join) which is O((N+M) log M) vs the previous O(N×M) nested-loop cross-join. Add country pre-filters to restrict the left side from ~140K global locations to ~20K rows for padel/tennis CTEs (~8 countries each). Expected: ~50-200x speedup on the spatial CTE portion of the model. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-01 02:57:05 +01:00
Deeman	e62aad148b	fix(transform): remove blob CTE from stg_population_geonames Server has cities_global.jsonl.gz (JSONL), not cities_global.json.gz (blob). TigerStyle clean break — removed blob_rows CTE and UNION ALL. Simplified to a single SELECT directly from read_json. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-28 18:40:15 +01:00
Deeman	6fb1e990e3	merge: three-tier proxy + daily tenants + staging model cleanup	2026-02-28 18:26:50 +01:00
Deeman	6edf8ba65e	fix(transform): remove blob fallback CTEs, update tenants glob to daily partition depth TigerStyle clean break — no backwards-compat shims for old file formats: - stg_playtomic_{venues,opening_hours,resources}: glob updated from //tenants.jsonl.gz (2-level, old weekly) to ///tenants.jsonl.gz (3-level, new daily YYYY/MM/DD partition); blob tenants.json.gz CTE removed - stg_playtomic_availability: morning_blob and recheck_blob CTEs removed; only JSONL format (availability_.jsonl.gz) is read going forward Verified locally: stg_playtomic_venues evaluates to 14231 venues from 2026/02/28/tenants.jsonl.gz with 0 errors. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-28 18:26:44 +01:00
Deeman	6cf98f44d4	fix(transform): remove blob compat CTE from stg_tennis_courts The overpass_tennis extractor has written JSONL-only since it was added. The dual-format UNION ALL was backwards-compat debt that broke the transform once no courts.json.gz files exist on the server: IO Error: No files found that match the pattern "data/landing/overpass_tennis///courts.json.gz" Remove blob_elements CTE and the UNION ALL. Only read JSONL. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-28 17:39:11 +01:00
Deeman	4e82907a70	refactor(transform): conform geographic dimension hierarchy via city_slug Propagates the conformed city key (city_slug) from dim_venues through the full pricing pipeline, eliminating 3 fragile LOWER(TRIM(...)) fuzzy string joins with deterministic key joins. Changes (cascading, task-by-task): - dim_venues: add city_slug computed column (REGEXP_REPLACE slug derivation) - dim_venue_capacity: join foundation.dim_venues instead of stg_playtomic_venues; carry city_slug alongside country_code/city - fct_daily_availability: carry city_slug from dim_venue_capacity - venue_pricing_benchmarks: carry city_slug from fct_daily_availability; add to venue_stats GROUP BY and final SELECT/GROUP BY - city_market_profile: join vpb on city_slug = city_slug (was LOWER(TRIM)) - planner_defaults: add city_slug to city_benchmarks CTE; join on city_slug - pseo_city_pricing: join city_market_profile on city_slug (was LOWER(TRIM)) - pipeline_routes._DAG: dim_venue_capacity now depends on dim_venues, not stg_playtomic_venues Result: dim_venues.city_slug → dim_cities.(country_code, city_slug) forms a fully conformed geographic hierarchy with no fuzzy string comparisons. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 13:23:03 +01:00
Deeman	c3531bd75d	feat(data): Phase 2b complete — EU NUTS-2 spatial join + US state income - stg_regional_income: expanded NUTS-1+2 (LENGTH IN 3,4), nuts_code rename, nuts_level - stg_nuts2_boundaries: new — ST_Read GISCO GeoJSON, bbox columns for spatial pre-filter - stg_income_usa: new — Census ACS state-level income staging model - dim_locations: spatial join replaces admin1_to_nuts1 VALUES CTE; us_income CTE with PPS normalisation (income/80610×30000); income cascade: NUTS-2→NUTS-1→US state→country - init_landing_seeds: compress=False for ST_Read files; gisco GeoJSON + census income seeds - CHANGELOG + PROJECT.md updated Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 11:03:16 +01:00
Deeman	409dc4bfac	feat(data): Phase 2b step 1 — expand stg_regional_income + Census income extractor - stg_regional_income.sql: accept NUTS-1 (3-char) + NUTS-2 (4-char) codes; rename nuts1_code → nuts_code; add nuts_level column; NUTS-2 rows were already in the landing zone but discarded by LENGTH(geo_code) = 3 - scripts/download_gisco_nuts.py: one-time download of GISCO NUTS-2 boundary GeoJSON (NUTS_RG_20M_2021_4326_LEVL_2.geojson, ~5MB) to landing zone; uncompressed because ST_Read cannot read .gz files - census_usa_income.py: new extractor for ACS B19013_001E state-level median household income; follows census_usa.py pattern; 51 states + DC - all.py + pyproject.toml: register census_usa_income extractor Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 10:58:12 +01:00
Deeman	5ade38eeaf	feat(data): Phase 2a — NUTS-1 regional income for opportunity score - eurostat.py: add nama_10r_2hhinc dataset config; append filter params to request URL so server pre-filters the large cube before download - stg_regional_income.sql: new staging model — reads nama_10r_2hhinc.json.gz, filters to NUTS-1 codes (3-char), normalises EL→GR / UK→GB - dim_locations.sql: add admin1_to_nuts1 VALUES CTE (16 German Bundesländer) + regional_income CTE; final SELECT uses COALESCE(regional, country) income - init_landing_seeds.py: add empty seed for nama_10r_2hhinc.json.gz Munich/Bayern now scores ~29K PPS vs Chemnitz/Sachsen ~19K PPS instead of both inheriting the same national average (~25.5K PPS). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 10:26:15 +01:00
Deeman	3aa30ab419	feat(sql): dim_cities — GeoNames spatial population fallback Adds a coordinate-based population lookup as a fallback when string name matching fails (~29% of cities). Uses bbox pre-filter (0.14° ≈ 15 km) then ST_Distance_Sphere to find the nearest GeoNames location in the same country. Fixes localization mismatches: Milano≠Milan, Wien≠Vienna, München≠Munich. Population cascade: Eurostat EU > US Census > ONS UK > GeoNames string > GeoNames spatial > 0. Coverage: 70.5% → 98.5% (5,401 / 5,481 cities with population > 0). Key cities before/after: Wien: 0 → 1,691,468 Milano: 0 → 1,371,498 München: already matched by string; verified still correct at 1,488,719 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 08:47:26 +01:00
Deeman	9835176e87	fix(sql): opportunity_score income ceiling /200→/35000 (economic power) PPS values are 18k–37k but /200 normalisation caused LEAST(1.0, 115)=1.0 for ALL countries — 20pts flat uplift, zero differentiation. Fix: /35000 creates real country spread: LU 20.0pts, DE 15.2pts, ES 12.8pts, GB 10.5pts (vs 20.0 everywhere before) Default for missing data 100→15000 (developing-market assumption, ~0.43). Header comment updated to document v2 formula behaviour. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 07:58:57 +01:00
Deeman	10266c3a24	fix(sql): opportunity_score — supply gap ceiling 4→8/100k + doc findings Raises supply gap ceiling from 4/100k to 8/100k in location_opportunity_profile.sql. The original 4/100k hard cliff truncated opportunity scores to 0 for any city with ≥4 courts/100k, but our data undercounts ~87% of real courts (FIP: 17,300 Spanish courts vs 2,239 in our DB). Raising to 8/100k gives a gentler gradient and fairer partial credit when density data is incomplete. Documents existing formula behaviour discovered during analysis: - Income PPS: country-level constants (18k-37k range) saturate the /200 ceiling — all EU countries get flat 20/20 pts until city-level income data lands. - Catchment NULL: DuckDB LEAST(1.0, NULL) = 1.0 (ignores nulls), so NULL nearest_padel_court_km already yields full 15 pts. COALESCE fallback is dead code but harmless. - Tennis courts within 25km: dim_locations data is empty (all 0 rows) — 10-court threshold is correct for when data arrives, contributes 0 pts everywhere for now. Effective score impact: minimal (99% of locations have 0 courts/100k, so supply gap was already at max). Only ~1,050 dense-court cities see a score increase (from 0 gap pts to partial gap pts). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 06:57:57 +01:00
Deeman	88ed17484b	feat(sql+templates): market_score v3 — log density + count gate Fixes ranking inversion where Germany (1/100k courts) outscored Spain (36/100k). Root causes: population/income were 55% of max before any padel signal, density ceiling saturated 73% of cities, small-town inflation (1 venue / 5k pop = 20/100k = full marks), and the saturation discount actively penalised mature markets. SQL (city_market_profile.sql): - Supply development 40pts: log-scaled density LN(d+1)/LN(21) × count gate min(1, count/5). Ceiling 20/100k. Count gate kills small-town inflation without hard cutoffs (1 venue = 20%, 5+ = 100%). - Demand evidence 25pts: occupancy if available; 40% density proxy otherwise. Separated from supply to avoid double-counting. - Addressable market 15pts: population as context, not maturity. - Economic context 10pts: income PPS (flat per country, low signal). - Data quality 10pts. - Removed saturation discount. High density = maturity. Verified spot-check scores: Málaga (46v, 7.77/100k): 70.1 [was 98.9] Barcelona (104v, 6.17/100k): 67.4 [was 100.0] Amsterdam (24v, 3.24/100k): 58.4 [was 93.7] Bernau bei Berlin (2v, 5.74/100k): 43.9 [was 92.7] Berlin (20v, 0.55/100k): 42.2 [was 74.1] London (66v, 0.74/100k): 44.1 [was 75.5] Templates (city-cost-de, country-overview, city-pricing): - Color coding: green >= 55 (was 65), amber >= 35 (was 40) - Intro/FAQ tiers: strong >= 55 (was 70), mid >= 35 (was 45) - Opportunity interplay: market_score < 40 (was < 50) for white-space Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 06:40:12 +01:00
Deeman	7186d4582a	feat(sql): thread opportunity_score from location_opportunity_profile into pSEO serving chain - dim_cities: add geoname_id to geonames_pop CTE and final SELECT Creates FK between dim_cities (city-with-padel-venues) and dim_locations (all GeoNames), enabling joins to location_opportunity_profile for the first time. - city_market_profile: pass geoname_id through base CTE and final SELECT - pseo_city_costs_de: LEFT JOIN location_opportunity_profile on (country_code, geoname_id), add opportunity_score to output columns - pseo_country_overview: add avg_opportunity_score, top_opportunity_score, top_opportunity_slugs, top_opportunity_names aggregates Cities with no GeoNames name match get opportunity_score = NULL; templates guard with {% if opportunity_score %}. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 20:29:57 +01:00
Deeman	b73386b9b6	fix: correct export_serving invocation in all docs `-m padelnomics.export_serving` doesn't resolve because src/ is not installed as a package in the workspace. Use the direct script path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-25 16:06:31 +01:00
Deeman	cee2e9babc	merge: standardise recheck availability to JSONL + update docs	2026-02-25 15:45:23 +01:00
Deeman	b33dd51d76	feat: standardise recheck availability to JSONL output - extract_recheck() now writes availability_{date}_recheck_{HH}.jsonl.gz (one venue per line with date/captured_at_utc/recheck_hour injected); uses compress_jsonl_atomic; removes write_gzip_atomic import - stg_playtomic_availability: add recheck_jsonl CTE (newline_delimited read_json on *.jsonl.gz recheck files); include in all_venues UNION ALL; old recheck_blob CTE kept for transition - init_landing_seeds.py: add JSONL recheck seed alongside blob seed - Docs: README landing structure + data sources table updated; CHANGELOG availability bullets updated; data-sources-inventory paths corrected Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-25 14:52:47 +01:00
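
Commit a00c8727d7 in the log above describes fixing the Python slugify() by replacing ß with ss before NFKD normalisation, because dropping non-ASCII characters outright turned Düsseldorf into d-sseldorf. The sketch below illustrates that ordering only. The function body, the ASCII-folding step, and the separator regex are assumptions for illustration and are not taken from the repository.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Hypothetical slugify: transliterate first, then collapse separators."""
    # ß has no NFKD decomposition, so it must be replaced explicitly;
    # otherwise the ASCII fold below would silently drop it.
    text = text.replace("ß", "ss")
    # NFKD splits accented letters into base letter + combining mark (ü -> u + U+0308).
    text = unicodedata.normalize("NFKD", text)
    # Dropping non-ASCII bytes now removes only the combining marks.
    text = text.encode("ascii", "ignore").decode("ascii")
    # Any run of non-alphanumerics becomes a single hyphen.
    text = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return text.strip("-")
```

With this ordering the base letter survives the ASCII fold, so a name containing ü keeps its u instead of leaving a hyphenated gap.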

74 Commits
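
Commit 88ed17484b (market_score v3) describes the supply development component as log-scaled density LN(d+1)/LN(21) multiplied by a count gate min(1, count/5), worth 40 points, with a density ceiling of 20/100k. A minimal Python sketch of that arithmetic is given below. The function and parameter names are hypothetical, and the explicit cap of the density term at 1.0 is an assumption inferred from the stated ceiling; the actual SQL in city_market_profile.sql may differ.

```python
import math


def supply_development_points(
    venues_per_100k: float,
    venue_count: int,
    max_points: float = 40.0,       # weight stated in the commit message
    density_ceiling: float = 20.0,  # LN(21) = LN(ceiling + 1)
    gate_count: int = 5,            # 5+ venues pass the gate in full
) -> float:
    """Hypothetical re-implementation of the v3 supply development term."""
    # Log scaling compresses high densities; capped at 1.0 (assumed) at the ceiling.
    density_term = min(1.0, math.log(venues_per_100k + 1) / math.log(density_ceiling + 1))
    # Count gate: 1 venue = 20 %, 5 or more = 100 %, which damps small-town inflation.
    count_gate = min(1.0, venue_count / gate_count)
    return max_points * density_term * count_gate
```

Under these assumptions a town with a single venue at the density ceiling earns one fifth of the component, while a city with five or more venues at the same density earns all of it, which matches the gate behaviour the commit message describes.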