padelnomics

Author	SHA1	Message	Date
Deeman	4e82907a70	refactor(transform): conform geographic dimension hierarchy via city_slug Propagates the conformed city key (city_slug) from dim_venues through the full pricing pipeline, eliminating 3 fragile LOWER(TRIM(...)) fuzzy string joins with deterministic key joins. Changes (cascading, task-by-task): - dim_venues: add city_slug computed column (REGEXP_REPLACE slug derivation) - dim_venue_capacity: join foundation.dim_venues instead of stg_playtomic_venues; carry city_slug alongside country_code/city - fct_daily_availability: carry city_slug from dim_venue_capacity - venue_pricing_benchmarks: carry city_slug from fct_daily_availability; add to venue_stats GROUP BY and final SELECT/GROUP BY - city_market_profile: join vpb on city_slug = city_slug (was LOWER(TRIM)) - planner_defaults: add city_slug to city_benchmarks CTE; join on city_slug - pseo_city_pricing: join city_market_profile on city_slug (was LOWER(TRIM)) - pipeline_routes._DAG: dim_venue_capacity now depends on dim_venues, not stg_playtomic_venues Result: dim_venues.city_slug → dim_cities.(country_code, city_slug) forms a fully conformed geographic hierarchy with no fuzzy string comparisons. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 13:23:03 +01:00
Deeman	9835176e87	fix(sql): opportunity_score income ceiling /200→/35000 (economic power) PPS values are 18k–37k but /200 normalisation caused LEAST(1.0, 115)=1.0 for ALL countries — 20pts flat uplift, zero differentiation. Fix: /35000 creates real country spread: LU 20.0pts, DE 15.2pts, ES 12.8pts, GB 10.5pts (vs 20.0 everywhere before) Default for missing data 100→15000 (developing-market assumption, ~0.43). Header comment updated to document v2 formula behaviour. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 07:58:57 +01:00
Deeman	10266c3a24	fix(sql): opportunity_score — supply gap ceiling 4→8/100k + doc findings Raises supply gap ceiling from 4/100k to 8/100k in location_opportunity_profile.sql. The original 4/100k hard cliff truncated opportunity scores to 0 for any city with ≥4 courts/100k, but our data undercounts ~87% of real courts (FIP: 17,300 Spanish courts vs 2,239 in our DB). Raising to 8/100k gives a gentler gradient and fairer partial credit when density data is incomplete. Documents existing formula behaviour discovered during analysis: - Income PPS: country-level constants (18k-37k range) saturate the /200 ceiling — all EU countries get flat 20/20 pts until city-level income data lands. - Catchment NULL: DuckDB LEAST(1.0, NULL) = 1.0 (ignores nulls), so NULL nearest_padel_court_km already yields full 15 pts. COALESCE fallback is dead code but harmless. - Tennis courts within 25km: dim_locations data is empty (all 0 rows) — 10-court threshold is correct for when data arrives, contributes 0 pts everywhere for now. Effective score impact: minimal (99% of locations have 0 courts/100k, so supply gap was already at max). Only ~1,050 dense-court cities see a score increase (from 0 gap pts to partial gap pts). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 06:57:57 +01:00
Deeman	88ed17484b	feat(sql+templates): market_score v3 — log density + count gate Fixes ranking inversion where Germany (1/100k courts) outscored Spain (36/100k). Root causes: population/income were 55% of max before any padel signal, density ceiling saturated 73% of cities, small-town inflation (1 venue / 5k pop = 20/100k = full marks), and the saturation discount actively penalised mature markets. SQL (city_market_profile.sql): - Supply development 40pts: log-scaled density LN(d+1)/LN(21) × count gate min(1, count/5). Ceiling 20/100k. Count gate kills small-town inflation without hard cutoffs (1 venue = 20%, 5+ = 100%). - Demand evidence 25pts: occupancy if available; 40% density proxy otherwise. Separated from supply to avoid double-counting. - Addressable market 15pts: population as context, not maturity. - Economic context 10pts: income PPS (flat per country, low signal). - Data quality 10pts. - Removed saturation discount. High density = maturity. Verified spot-check scores: Málaga (46v, 7.77/100k): 70.1 [was 98.9] Barcelona (104v, 6.17/100k): 67.4 [was 100.0] Amsterdam (24v, 3.24/100k): 58.4 [was 93.7] Bernau bei Berlin (2v, 5.74/100k): 43.9 [was 92.7] Berlin (20v, 0.55/100k): 42.2 [was 74.1] London (66v, 0.74/100k): 44.1 [was 75.5] Templates (city-cost-de, country-overview, city-pricing): - Color coding: green >= 55 (was 65), amber >= 35 (was 40) - Intro/FAQ tiers: strong >= 55 (was 70), mid >= 35 (was 45) - Opportunity interplay: market_score < 40 (was < 50) for white-space Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 06:40:12 +01:00
Deeman	7186d4582a	feat(sql): thread opportunity_score from location_opportunity_profile into pSEO serving chain - dim_cities: add geoname_id to geonames_pop CTE and final SELECT Creates FK between dim_cities (city-with-padel-venues) and dim_locations (all GeoNames), enabling joins to location_opportunity_profile for the first time. - city_market_profile: pass geoname_id through base CTE and final SELECT - pseo_city_costs_de: LEFT JOIN location_opportunity_profile on (country_code, geoname_id), add opportunity_score to output columns - pseo_country_overview: add avg_opportunity_score, top_opportunity_score, top_opportunity_slugs, top_opportunity_names aggregates Cities with no GeoNames name match get opportunity_score = NULL; templates guard with {% if opportunity_score %}. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 20:29:57 +01:00
Deeman	ebfdc84a94	feat(transform): add dim_locations + dual market scoring models dim_locations (foundation): - Seeded from stg_population_geonames (all locations, not venue-dependent) - Grain: (country_code, geoname_id) - Enriched with: padel venues within 5km, nearest court distance (ST_Distance_Sphere), tennis courts within 25km, country income - Covers zero-court Gemeinden for opportunity scoring location_opportunity_profile (serving) — Padelnomics Marktpotenzial-Score: - Answers "Where should I build?" — no padel_venue_count filter - Formula: population (25) + income (20) + supply gap inverted (30) + catchment gap (15) + tennis culture (10) = 100pts - Sorted by opportunity_score DESC city_market_profile (serving) — Padelnomics Marktreife-Score: - Add saturation discount (×0.85 when venues_per_100k > 8) - Update header comment to reference Marktreife-Score branding - Kept WHERE padel_venue_count > 0 (established markets only) - column name market_score unchanged (avoids downstream breakage) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-24 16:28:16 +01:00
Deeman	0960990373	feat(data): Sprint 1-5 population pipeline — city labels, US/UK/Global extractors Part A: Data Layer — Sprints 1-5 Sprint 1 — Eurostat SDMX city labels (unblocks EU population): - New extractor: eurostat_city_labels.py — fetches ESTAT/CITIES codelist (city_code → city_name mapping) with ETag dedup - New staging model: stg_city_labels.sql — grain city_code - Updated dim_cities.sql — joins Eurostat population via city code lookup; replaces hardcoded 0::BIGINT population Sprint 2 — Market score formula v2: - city_market_profile.sql: 30pt population (LN/1M), 25pt income PPS (/200), 30pt demand (occupancy or density), 15pt data confidence - Moved venue_pricing_benchmarks join into base CTE so median_occupancy_rate is available to the scoring formula Sprint 3 — US Census ACS extractor: - New extractor: census_usa.py — ACS 5-year place population (vintage 2023) - New staging model: stg_population_usa.sql — grain (place_fips, ref_year) Sprint 4 — ONS UK extractor: - New extractor: ons_uk.py — 2021 Census LAD population via ONS beta API - New staging model: stg_population_uk.sql — grain (lad_code, ref_year) Sprint 5 — GeoNames global extractor: - New extractor: geonames.py — cities15000.zip bulk download, filtered to ≥50K pop - New staging model: stg_population_geonames.sql — grain geoname_id - dim_cities: 5-source population cascade (Eurostat > Census > ONS > GeoNames > 0) with case/whitespace-insensitive city name matching Registered all 4 new CLI entrypoints in pyproject.toml and all.py. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 00:07:08 +01:00
Deeman	ebba46f700	refactor: align transform layer with template methodology Three deviations from the quart_saas_boilerplate methodology corrected: 1. Fix dim_cities LIKE join (data quality bug) - Old: FROM eurostat_cities LEFT JOIN venue_counts LIKE '%country_code%' → cartesian product (2.6M rows vs ~5500 expected) - New: FROM venue_cities (dim_venues) as primary table, Eurostat for enrichment only. grain (country_code, city_slug). - Also fixes REGEXP_REPLACE to LOWER() before regex so uppercase city names aren't stripped to '-' 2. Rename fct_venue_capacity → dim_venue_capacity - Static venue attributes with no time key are a dimension, not a fact - No SQL logic changes; update fct_daily_availability reference 3. Add fct_availability_slot at event grain - New: grain (snapshot_date, tenant_id, resource_id, slot_start_time) - Recheck dedup logic moves here from fct_daily_availability - fct_daily_availability now reads fct_availability_slot (cleaner DAG) Downstream fixes: - city_market_profile, planner_defaults grain → (country_code, city_slug) - pseo_city_costs_de, pseo_city_pricing add city_key composite natural key (country_slug || '-' || city_slug) to avoid URL collisions across countries - planner_defaults join in pseo_city_costs_de uses both country_code + city_slug - Templates updated: natural_key city_slug → city_key Added transform/sqlmesh_padelnomics/CLAUDE.md documenting data modeling rules, conformed dimension map, and source integration architecture. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-23 21:17:04 +01:00
Deeman	e3a6b91bc0	fix(transform+content): unblock SQLMesh plan — three pipeline fixes stg_playtomic_availability: - Add maximum_object_size = 134217728 (128 MB) to both read_json calls; daily files exceed the 16 MB default as venue count grows - Add seed recheck file (1970-01-01_recheck_00.json.gz, gitignored with data/) to avoid READ_JSON IOException when no recheck files exist pseo_city_costs_de + pseo_city_pricing: - Add QUALIFY ROW_NUMBER() OVER (PARTITION BY city_slug ...) = 1 to deduplicate rows caused by dim_cities' loose LIKE join; reduces pseo_city_costs_de from 2.6M → 222 rows (one per unique city) content/__init__.py: - DuckDB lowercases all column names at rest ("ratePeak" → "ratepeak"), so calc_overrides dict comprehension never matched DEFAULTS keys. Fix: build case-insensitive reverse map {k.lower(): k} and normalise row keys before lookup. Applied in both generate_articles() and preview_article(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 18:51:53 +01:00
Deeman	b3afd414a4	feat(transform): add three pSEO serving models — city costs, country overview, city pricing - pseo_city_costs_de: unblocks city-cost-de template (~600 city pages), joins city_market_profile + planner_defaults, includes camelCase calc override columns (ratePeak, rateOffPeak, utilTarget, dblCourts, country) - pseo_country_overview: per-country hub aggregating from pseo_city_costs_de, includes top_city_slugs/names lists for internal linking - pseo_city_pricing: per-city pricing pages requiring >= 2 Playtomic venues, includes P25/P75 price range and occupancy Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 18:37:50 +01:00
Deeman	b517e3e58d	feat(transform): add country_name_en + country_slug to dim_cities, pass through city_market_profile Prerequisite for all pSEO serving models. Adds CASE-based country_name_en and URL-safe country_slug to foundation.dim_cities, then selects them through serving.city_market_profile so downstream models inherit them automatically. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 18:37:43 +01:00
Deeman	7737b79230	fix: DuckDB compat issues in Playtomic pipeline + export_serving - Add maximum_object_size=128MB to read_json for 14K-venue tenants file - Rewrite opening_hours to use UNION ALL unpivot (DuckDB struct dynamic access) - Add seed file guard for availability model (empty result on first run) - Fix snapshot_date VARCHAR→DATE comparison in venue_pricing_benchmarks - Fix export_serving to resolve SQLMesh physical tables from view definitions (SQLMesh views reference "local" catalog unavailable outside its context) - Add pyarrow dependency for Arrow-based cross-connection data transfer Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 01:27:51 +01:00
Deeman	79f7fc6fad	feat: Playtomic pricing/occupancy pipeline + email i18n + audience restructure Three workstreams: 1. Playtomic full data extraction & transform pipeline: - Expand venue bounding boxes from 4 to 23 regions (global coverage) - New staging models for court resources, opening hours, and slot-level availability with real prices from the Playtomic API - Foundation fact tables for venue capacity and daily occupancy/revenue - City-level pricing benchmarks replacing hardcoded country estimates - Planner defaults now use 3-tier cascade: city data → country → fallback 2. Transactional email i18n: - _t() helper in worker.py with ~70 translation keys (EN + DE) - All 8 email handlers translated, lang passed in task payloads 3. Resend audiences restructured to 3 named audiences (free plan limit) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 00:54:53 +01:00
Deeman	2db66efe77	feat: migrate transform to 3-layer architecture with per-layer schemas Remove raw/ layer — staging models now read landing JSON directly. Rename all model schemas from padelnomics.* to staging./foundation./serving.*. Web app queries updated to serving.planner_defaults via SERVING_DUCKDB_PATH. Supervisor gets daily sleep interval between pipeline runs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 19:04:40 +01:00
Deeman	18ee24818b	feat: copier update v0.9.0 — extraction docs, state tracking, architecture guides Sync template from 29ac25b → v0.9.0 (29 template commits). Due to template's _subdirectory migration, new files were manually rendered rather than auto-merged by copier. New files: - .claude/CLAUDE.md + coding_philosophy.md (agent instructions) - extract utils.py: SQLite state tracking for extraction runs - extract/transform READMEs: architecture & pattern documentation - infra/supervisor: systemd service + orchestration script - Per-layer model READMEs (raw, staging, foundation, serving) Also fixes copier-answers.yml (adds 4 feature toggles, removes stale payment_provider key) and scopes CLAUDE.md gitignore to root only. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 15:44:48 +01:00
Deeman	4ae00b35d1	refactor: flatten padelnomics/padelnomics/ → repo root git mv all tracked files from the nested padelnomics/ workspace directory to the git repo root. Merged .gitignore files. No code changes — pure path rename. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 00:44:40 +01:00
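
Two of the scoring normalisations described in the log above (the market_score v3 supply component in 88ed17484b and the opportunity_score income ceiling in 9835176e87) can be sketched in Python. This is a hedged illustration rather than the repository's SQL: the function names, parameter names, and the cap of the log-density term at 1.0 above the 20/100k ceiling are assumptions; only the formulas LN(d+1)/LN(21), min(1, count/5), and LEAST(1.0, pps/ceiling) with their point weights come from the commit messages.

```python
import math


def supply_points(venues_per_100k, venue_count, weight=40.0,
                  density_ceiling=20.0, full_credit_venues=5):
    """Supply development points: log-scaled density times a venue-count gate.

    Assumption: the log term is capped at 1.0 for densities above the ceiling.
    """
    density = max(0.0, venues_per_100k)
    # LN(d + 1) / LN(21) when the ceiling is 20 venues per 100k inhabitants.
    log_density = min(1.0, math.log(density + 1.0) / math.log(density_ceiling + 1.0))
    # Count gate: 1 venue earns 20% of the density credit, 5 or more earn 100%.
    count_gate = min(1.0, venue_count / full_credit_venues)
    return weight * log_density * count_gate


def income_points(income_pps, ceiling=35_000.0, weight=20.0):
    """Income points: weight * LEAST(1.0, pps / ceiling)."""
    return weight * min(1.0, income_pps / ceiling)


if __name__ == "__main__":
    # With the old /200 ceiling, any realistic PPS value saturates at 20 points.
    print(income_points(23_000.0, ceiling=200.0))
    # With the /35000 ceiling, the same value earns partial credit.
    print(income_points(23_000.0))
    # A single venue in a small town no longer earns full supply marks.
    print(supply_points(venues_per_100k=20.0, venue_count=1))
```

The count gate is what lets a one-venue town with a high per-capita density score well below a city with the same density and five or more venues, without introducing a hard cutoff.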

16 Commits
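
The calc_overrides fix in e3a6b91bc0 (a case-insensitive reverse map {k.lower(): k} applied to row keys before lookup) might look roughly like the sketch below. The key names are the camelCase override columns listed in b3afd414a4; the DEFAULTS values, the helper name, and the sample row are hypothetical and are not taken from content/__init__.py.

```python
# Hypothetical defaults: the keys come from the commit log, the values are placeholders.
DEFAULTS = {
    "ratePeak": 0.0,
    "rateOffPeak": 0.0,
    "utilTarget": 0.0,
    "dblCourts": 0,
    "country": "",
}


def calc_overrides_from_row(row, defaults=DEFAULTS):
    """Map lowercased column names in a row back to the camelCase default keys."""
    # Reverse map from lowercase name to canonical key, e.g. "ratepeak" -> "ratePeak".
    canonical = {key.lower(): key for key in defaults}
    return {
        canonical[column.lower()]: value
        for column, value in row.items()
        if column.lower() in canonical
    }


if __name__ == "__main__":
    # Column names as the commit message says they arrive from DuckDB (all lowercase).
    row = {"ratepeak": 42, "dblcourts": 4, "city_slug": "berlin"}
    print(calc_overrides_from_row(row))
```

Columns with no counterpart in the defaults (city_slug in the sample row) are dropped, so the result can be merged over the defaults without adding stray keys.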