padelnomics

Author	SHA1	Message	Date
Deeman	9835176e87	fix(sql): opportunity_score income ceiling /200→/35000 (economic power) PPS values are 18k–37k but /200 normalisation caused LEAST(1.0, 115)=1.0 for ALL countries — 20pts flat uplift, zero differentiation. Fix: /35000 creates real country spread: LU 20.0pts, DE 15.2pts, ES 12.8pts, GB 10.5pts (vs 20.0 everywhere before) Default for missing data 100→15000 (developing-market assumption, ~0.43). Header comment updated to document v2 formula behaviour. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 07:58:57 +01:00
Deeman	10266c3a24	fix(sql): opportunity_score — supply gap ceiling 4→8/100k + doc findings Raises supply gap ceiling from 4/100k to 8/100k in location_opportunity_profile.sql. The original 4/100k hard cliff truncated opportunity scores to 0 for any city with ≥4 courts/100k, but our data undercounts ~87% of real courts (FIP: 17,300 Spanish courts vs 2,239 in our DB). Raising to 8/100k gives a gentler gradient and fairer partial credit when density data is incomplete. Documents existing formula behaviour discovered during analysis: - Income PPS: country-level constants (18k-37k range) saturate the /200 ceiling — all EU countries get flat 20/20 pts until city-level income data lands. - Catchment NULL: DuckDB LEAST(1.0, NULL) = 1.0 (ignores nulls), so NULL nearest_padel_court_km already yields full 15 pts. COALESCE fallback is dead code but harmless. - Tennis courts within 25km: dim_locations data is empty (all 0 rows) — 10-court threshold is correct for when data arrives, contributes 0 pts everywhere for now. Effective score impact: minimal (99% of locations have 0 courts/100k, so supply gap was already at max). Only ~1,050 dense-court cities see a score increase (from 0 gap pts to partial gap pts). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 06:57:57 +01:00
Deeman	88ed17484b	feat(sql+templates): market_score v3 — log density + count gate Fixes ranking inversion where Germany (1/100k courts) outscored Spain (36/100k). Root causes: population/income were 55% of max before any padel signal, density ceiling saturated 73% of cities, small-town inflation (1 venue / 5k pop = 20/100k = full marks), and the saturation discount actively penalised mature markets. SQL (city_market_profile.sql): - Supply development 40pts: log-scaled density LN(d+1)/LN(21) × count gate min(1, count/5). Ceiling 20/100k. Count gate kills small-town inflation without hard cutoffs (1 venue = 20%, 5+ = 100%). - Demand evidence 25pts: occupancy if available; 40% density proxy otherwise. Separated from supply to avoid double-counting. - Addressable market 15pts: population as context, not maturity. - Economic context 10pts: income PPS (flat per country, low signal). - Data quality 10pts. - Removed saturation discount. High density = maturity. Verified spot-check scores: Málaga (46v, 7.77/100k): 70.1 [was 98.9] Barcelona (104v, 6.17/100k): 67.4 [was 100.0] Amsterdam (24v, 3.24/100k): 58.4 [was 93.7] Bernau bei Berlin (2v, 5.74/100k): 43.9 [was 92.7] Berlin (20v, 0.55/100k): 42.2 [was 74.1] London (66v, 0.74/100k): 44.1 [was 75.5] Templates (city-cost-de, country-overview, city-pricing): - Color coding: green >= 55 (was 65), amber >= 35 (was 40) - Intro/FAQ tiers: strong >= 55 (was 70), mid >= 35 (was 45) - Opportunity interplay: market_score < 40 (was < 50) for white-space Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 06:40:12 +01:00
Deeman	7186d4582a	feat(sql): thread opportunity_score from location_opportunity_profile into pSEO serving chain - dim_cities: add geoname_id to geonames_pop CTE and final SELECT Creates FK between dim_cities (city-with-padel-venues) and dim_locations (all GeoNames), enabling joins to location_opportunity_profile for the first time. - city_market_profile: pass geoname_id through base CTE and final SELECT - pseo_city_costs_de: LEFT JOIN location_opportunity_profile on (country_code, geoname_id), add opportunity_score to output columns - pseo_country_overview: add avg_opportunity_score, top_opportunity_score, top_opportunity_slugs, top_opportunity_names aggregates Cities with no GeoNames name match get opportunity_score = NULL; templates guard with {% if opportunity_score %}. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 20:29:57 +01:00
Deeman	cee2e9babc	merge: standardise recheck availability to JSONL + update docs	2026-02-25 15:45:23 +01:00
Deeman	b33dd51d76	feat: standardise recheck availability to JSONL output - extract_recheck() now writes availability_{date}_recheck_{HH}.jsonl.gz (one venue per line with date/captured_at_utc/recheck_hour injected); uses compress_jsonl_atomic; removes write_gzip_atomic import - stg_playtomic_availability: add recheck_jsonl CTE (newline_delimited read_json on *.jsonl.gz recheck files); include in all_venues UNION ALL; old recheck_blob CTE kept for transition - init_landing_seeds.py: add JSONL recheck seed alongside blob seed - Docs: README landing structure + data sources table updated; CHANGELOG availability bullets updated; data-sources-inventory paths corrected Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-25 14:52:47 +01:00
Deeman	a86f1ecd3a	fix(staging): enforce grain dedup in resources + opening_hours + skip old blob in tenants Both stg_playtomic_resources and stg_playtomic_opening_hours lacked QUALIFY ROW_NUMBER() dedup despite declaring a grain. When both tenants.json.gz (old) and tenants.jsonl.gz (new) exist for the same month, the UNION ALL produced exactly 2× rows. Fixes: - stg_playtomic_resources: QUALIFY ROW_NUMBER() OVER (PARTITION BY tenant_id, resource_id) - stg_playtomic_opening_hours: QUALIFY ROW_NUMBER() OVER (PARTITION BY tenant_id, day_of_week) - playtomic_tenants.py: skip if old blob OR new JSONL already exists for the month, preventing same-month dual-format writes that trigger the duplicate Row counts after fix: ~43.8K resources, ~93.4K opening_hours (was 87.6K, 186.8K). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-25 13:41:23 +01:00
Deeman	b5b8493543	feat(extract): regional overpass_tennis splitting + JSONL output Replace single global Overpass query (150K+ elements, times out) with 10 regional bbox queries (~10-40K elements each, 150s server / 180s client). - REGIONS: 10 bboxes covering all continents - Crash recovery: working.jsonl accumulates per-region results; already_seen_ids deduplication skips re-written elements on restart - Overlapping bbox elements deduped by OSM id across regions - Retry per region: up to 2 retries with 30s cooldown - Polite 5s inter-region delay - Skip if courts.jsonl.gz or courts.json.gz already exists for the month stg_tennis_courts: UNION ALL transition (jsonl_elements + blob_elements) - jsonl_elements: JSONL, explicit columns, COALESCE lat/lon with center coords (supports both node direct lat/lon and way/relation Overpass out center) - blob_elements: existing UNNEST(elements) pattern, unchanged - Removed osm_type='node' filter — ways/relations now usable via center coords - Dedup on (osm_id, extracted_date DESC) unchanged Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-25 12:19:37 +01:00
Deeman	a4f246d69a	feat(extract): convert geonames to JSONL output - cities_global.jsonl.gz replaces .json.gz (one city object per line) - Empty placeholder writes a minimal .jsonl.gz (null row, filtered in staging) - Eliminates the {"rows": [...]} blob wrapper and maximum_object_size workaround stg_population_geonames: UNION ALL transition (jsonl_rows + blob_rows) - jsonl_rows: read_json JSONL, explicit columns, no UNNEST - blob_rows: existing UNNEST(rows) pattern with 40MB size limit retained Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-25 12:16:59 +01:00
Deeman	7b03fd71f9	feat(extract): convert playtomic_availability to JSONL output - availability_{date}.jsonl.gz replaces .json.gz for morning snapshots - Each JSONL line = one venue object with date + captured_at_utc injected - Eliminates in-memory consolidation: working.jsonl IS the final file (compress_jsonl_atomic at end instead of write_gzip_atomic blob) - Crash recovery unchanged: working.jsonl accumulates via flush_partial_batch - _load_morning_availability tries .jsonl.gz first, falls back to .json.gz - Skip check covers both formats during transition - Recheck files stay blob format (small, infrequent) stg_playtomic_availability: UNION ALL transition (morning_jsonl + morning_blob + recheck_blob) - morning_jsonl: read_json JSONL, tenant_id direct column, no outer UNNEST - morning_blob / recheck_blob: subquery + LATERAL UNNEST (unchanged semantics) - All three produce (snapshot_date, captured_at_utc, snapshot_type, recheck_hour, tenant_id, slots_json) - Downstream raw_resources / raw_slots CTEs unchanged Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-25 12:14:38 +01:00
Deeman	9bef055e6d	feat(extract): convert playtomic_tenants to JSONL output - playtomic_tenants.py: write each tenant as a JSONL line after dedup, compress via compress_jsonl_atomic → tenants.jsonl.gz - playtomic_availability.py: update _load_tenant_ids() to prefer tenants.jsonl.gz, fall back to tenants.json.gz (transition) - stg_playtomic_venues.sql: UNION ALL jsonl+blob CTEs for transition; JSONL reads top-level columns directly, no UNNEST(tenants) needed - stg_playtomic_resources.sql: same UNION ALL pattern, single UNNEST for resources in JSONL path vs double UNNEST in blob path - stg_playtomic_opening_hours.sql: same UNION ALL pattern, opening_hours as top-level JSON column in JSONL path Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-25 12:07:53 +01:00
Deeman	55f179ba54	fix(transform): increase geonames object size limit and remove stale column ref - stg_population_geonames: add maximum_object_size=40MB to read_json() call; geonames cities_global.json.gz is ~30MB, exceeding DuckDB's 16MB default - dim_locations: remove stale 'population_year AS population_year' column ref; stg_population_geonames has ref_year, not population_year — caused BinderException Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-25 09:56:05 +01:00
Deeman	ebfdc84a94	feat(transform): add dim_locations + dual market scoring models dim_locations (foundation): - Seeded from stg_population_geonames (all locations, not venue-dependent) - Grain: (country_code, geoname_id) - Enriched with: padel venues within 5km, nearest court distance (ST_Distance_Sphere), tennis courts within 25km, country income - Covers zero-court Gemeinden for opportunity scoring location_opportunity_profile (serving) — Padelnomics Marktpotenzial-Score: - Answers "Where should I build?" — no padel_venue_count filter - Formula: population (25) + income (20) + supply gap inverted (30) + catchment gap (15) + tennis culture (10) = 100pts - Sorted by opportunity_score DESC city_market_profile (serving) — Padelnomics Marktreife-Score: - Add saturation discount (×0.85 when venues_per_100k > 8) - Update header comment to reference Marktreife-Score branding - Kept WHERE padel_venue_count > 0 (established markets only) - column name market_score unchanged (avoids downstream breakage) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-24 16:28:16 +01:00
Deeman	c109488d9d	feat(extract): expand GeoNames to cities1000 + add tennis court extractor GeoNames: - cities15000 → cities1000 (~140K global locations, pop ≥ 1K) - Add lat/lon, admin1_code, admin2_code to output (needed for dim_locations) - Expand feature codes to include PPLA3/4/5 (Gemeinden, cantons, etc.) - Remove MIN_POPULATION=50K floor — cities1000 already pre-filters to ≥1K - Update assertions for new scale (~100K+ expected) Tennis courts: - New overpass_tennis.py extractor (sport=tennis, 180s Overpass timeout) - Registered as extract-overpass-tennis, added to EXTRACTORS list - New stg_tennis_courts.sql staging model (grain: osm_id) stg_population_geonames: add lat, lon, admin1_code, admin2_code columns Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-24 16:15:20 +01:00
Deeman	0960990373	feat(data): Sprint 1-5 population pipeline — city labels, US/UK/Global extractors Part A: Data Layer — Sprints 1-5 Sprint 1 — Eurostat SDMX city labels (unblocks EU population): - New extractor: eurostat_city_labels.py — fetches ESTAT/CITIES codelist (city_code → city_name mapping) with ETag dedup - New staging model: stg_city_labels.sql — grain city_code - Updated dim_cities.sql — joins Eurostat population via city code lookup; replaces hardcoded 0::BIGINT population Sprint 2 — Market score formula v2: - city_market_profile.sql: 30pt population (LN/1M), 25pt income PPS (/200), 30pt demand (occupancy or density), 15pt data confidence - Moved venue_pricing_benchmarks join into base CTE so median_occupancy_rate is available to the scoring formula Sprint 3 — US Census ACS extractor: - New extractor: census_usa.py — ACS 5-year place population (vintage 2023) - New staging model: stg_population_usa.sql — grain (place_fips, ref_year) Sprint 4 — ONS UK extractor: - New extractor: ons_uk.py — 2021 Census LAD population via ONS beta API - New staging model: stg_population_uk.sql — grain (lad_code, ref_year) Sprint 5 — GeoNames global extractor: - New extractor: geonames.py — cities15000.zip bulk download, filtered to ≥50K pop - New staging model: stg_population_geonames.sql — grain geoname_id - dim_cities: 5-source population cascade (Eurostat > Census > ONS > GeoNames > 0) with case/whitespace-insensitive city name matching Registered all 4 new CLI entrypoints in pyproject.toml and all.py. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 00:07:08 +01:00
Deeman	ebba46f700	refactor: align transform layer with template methodology Three deviations from the quart_saas_boilerplate methodology corrected: 1. Fix dim_cities LIKE join (data quality bug) - Old: FROM eurostat_cities LEFT JOIN venue_counts LIKE '%country_code%' → cartesian product (2.6M rows vs ~5500 expected) - New: FROM venue_cities (dim_venues) as primary table, Eurostat for enrichment only. grain (country_code, city_slug). - Also fixes REGEXP_REPLACE to LOWER() before regex so uppercase city names aren't stripped to '-' 2. Rename fct_venue_capacity → dim_venue_capacity - Static venue attributes with no time key are a dimension, not a fact - No SQL logic changes; update fct_daily_availability reference 3. Add fct_availability_slot at event grain - New: grain (snapshot_date, tenant_id, resource_id, slot_start_time) - Recheck dedup logic moves here from fct_daily_availability - fct_daily_availability now reads fct_availability_slot (cleaner DAG) Downstream fixes: - city_market_profile, planner_defaults grain → (country_code, city_slug) - pseo_city_costs_de, pseo_city_pricing add city_key composite natural key (country_slug || '-' || city_slug) to avoid URL collisions across countries - planner_defaults join in pseo_city_costs_de uses both country_code + city_slug - Templates updated: natural_key city_slug → city_key Added transform/sqlmesh_padelnomics/CLAUDE.md documenting data modeling rules, conformed dimension map, and source integration architecture. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-23 21:17:04 +01:00
Deeman	e3a6b91bc0	fix(transform+content): unblock SQLMesh plan — three pipeline fixes stg_playtomic_availability: - Add maximum_object_size = 134217728 (128 MB) to both read_json calls; daily files exceed the 16 MB default as venue count grows - Add seed recheck file (1970-01-01_recheck_00.json.gz, gitignored with data/) to avoid READ_JSON IOException when no recheck files exist pseo_city_costs_de + pseo_city_pricing: - Add QUALIFY ROW_NUMBER() OVER (PARTITION BY city_slug ...) = 1 to deduplicate rows caused by dim_cities' loose LIKE join; reduces pseo_city_costs_de from 2.6M → 222 rows (one per unique city) content/__init__.py: - DuckDB lowercases all column names at rest ("ratePeak" → "ratepeak"), so calc_overrides dict comprehension never matched DEFAULTS keys. Fix: build case-insensitive reverse map {k.lower(): k} and normalise row keys before lookup. Applied in both generate_articles() and preview_article(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 18:51:53 +01:00
Deeman	b3afd414a4	feat(transform): add three pSEO serving models — city costs, country overview, city pricing - pseo_city_costs_de: unblocks city-cost-de template (~600 city pages), joins city_market_profile + planner_defaults, includes camelCase calc override columns (ratePeak, rateOffPeak, utilTarget, dblCourts, country) - pseo_country_overview: per-country hub aggregating from pseo_city_costs_de, includes top_city_slugs/names lists for internal linking - pseo_city_pricing: per-city pricing pages requiring >= 2 Playtomic venues, includes P25/P75 price range and occupancy Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 18:37:50 +01:00
Deeman	b517e3e58d	feat(transform): add country_name_en + country_slug to dim_cities, pass through city_market_profile Prerequisite for all pSEO serving models. Adds CASE-based country_name_en and URL-safe country_slug to foundation.dim_cities, then selects them through serving.city_market_profile so downstream models inherit them automatically. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 18:37:43 +01:00
Deeman	a1faddbed6	feat: Python supervisor + feature flags Supervisor (replaces supervisor.sh): - supervisor.py — cron-based pipeline orchestration, reads workflows.toml on every tick, runs due extractors in topological waves with parallel execution, then SQLMesh transform + serving export - workflows.toml — workflow registry: overpass (monthly), eurostat (monthly), playtomic_tenants (weekly), playtomic_availability (daily), playtomic_recheck (hourly 6–23) - padelnomics-supervisor.service — updated ExecStart to Python supervisor Extraction enhancements: - proxy.py — optional round-robin/sticky proxy rotation via PROXY_URLS env - playtomic_availability.py — parallel fetch (EXTRACT_WORKERS), recheck mode (main_recheck) re-queries imminent slots for accurate occupancy measurement - _shared.py — realistic browser User-Agent on all extractor sessions - stg_playtomic_availability.sql — reads morning + recheck snapshots, tags each - fct_daily_availability.sql — prefers recheck over morning for same slot Feature flags (replaces WAITLIST_MODE env var): - migration 0019 — feature_flags table, 5 initial flags: markets (on), payments/planner_export/supplier_signup/lead_unlock (off) - core.py — is_flag_enabled() + feature_gate() decorator - routes — payments, markets, planner_export, supplier_signup, lead_unlock gated - admin flags UI — /admin/flags toggle page + nav link - app.py — flag() injected as Jinja2 global Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 13:53:45 +01:00
Deeman	7737b79230	fix: DuckDB compat issues in Playtomic pipeline + export_serving - Add maximum_object_size=128MB to read_json for 14K-venue tenants file - Rewrite opening_hours to use UNION ALL unpivot (DuckDB struct dynamic access) - Add seed file guard for availability model (empty result on first run) - Fix snapshot_date VARCHAR→DATE comparison in venue_pricing_benchmarks - Fix export_serving to resolve SQLMesh physical tables from view definitions (SQLMesh views reference "local" catalog unavailable outside its context) - Add pyarrow dependency for Arrow-based cross-connection data transfer Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 01:27:51 +01:00
Deeman	13c86ebf84	Merge branch 'worktree-extraction-overhaul' # Conflicts: # transform/sqlmesh_padelnomics/models/foundation/dim_cities.sql # transform/sqlmesh_padelnomics/models/staging/stg_playtomic_venues.sql	2026-02-23 01:01:26 +01:00
Deeman	79f7fc6fad	feat: Playtomic pricing/occupancy pipeline + email i18n + audience restructure Three workstreams: 1. Playtomic full data extraction & transform pipeline: - Expand venue bounding boxes from 4 to 23 regions (global coverage) - New staging models for court resources, opening hours, and slot-level availability with real prices from the Playtomic API - Foundation fact tables for venue capacity and daily occupancy/revenue - City-level pricing benchmarks replacing hardcoded country estimates - Planner defaults now use 3-tier cascade: city data → country → fallback 2. Transactional email i18n: - _t() helper in worker.py with ~70 translation keys (EN + DE) - All 8 email handlers translated, lang passed in task payloads 3. Resend audiences restructured to 3 named audiences (free plan limit) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 00:54:53 +01:00
Deeman	5a1bb21624	fix: eurostat JSON-stat parsing + staging model corrections Eurostat JSON-stat format (4-7 dimension sparse dict with 583K values) causes DuckDB OOM — pre-process in extractor to flat records. Also fix dim_cities unused CTE bug and playtomic venue lat/lon path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 20:52:25 +01:00
Deeman	2db66efe77	feat: migrate transform to 3-layer architecture with per-layer schemas Remove raw/ layer — staging models now read landing JSON directly. Rename all model schemas from padelnomics.* to staging./foundation./serving.*. Web app queries updated to serving.planner_defaults via SERVING_DUCKDB_PATH. Supervisor gets daily sleep interval between pipeline runs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 19:04:40 +01:00
Deeman	18ee24818b	feat: copier update v0.9.0 — extraction docs, state tracking, architecture guides Sync template from 29ac25b → v0.9.0 (29 template commits). Due to template's _subdirectory migration, new files were manually rendered rather than auto-merged by copier. New files: - .claude/CLAUDE.md + coding_philosophy.md (agent instructions) - extract utils.py: SQLite state tracking for extraction runs - extract/transform READMEs: architecture & pattern documentation - infra/supervisor: systemd service + orchestration script - Per-layer model READMEs (raw, staging, foundation, serving) Also fixes copier-answers.yml (adds 4 feature toggles, removes stale payment_provider key) and scopes CLAUDE.md gitignore to root only. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 15:44:48 +01:00
Deeman	4ae00b35d1	refactor: flatten padelnomics/padelnomics/ → repo root git mv all tracked files from the nested padelnomics/ workspace directory to the git repo root. Merged .gitignore files. No code changes — pure path rename. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 00:44:40 +01:00
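
The scoring arithmetic described in commits 88ed17484b (market_score v3 supply development) and 9835176e87 (opportunity_score income ceiling) can be sketched in Python. This is a rough illustration reconstructed from the commit messages only, not the SQL in city_market_profile.sql or location_opportunity_profile.sql. The function names and parameter names are hypothetical, clamping negative densities to zero is an assumption, and the 20-point income weight is taken from the formula listed in ebfdc84a94.

```python
import math
from typing import Optional


def supply_development_points(venues_per_100k: float, venue_count: int) -> float:
    """Supply development component (max 40 pts), per the v3 commit message.

    Log-scaled density LN(d+1)/LN(21) with a ceiling of 20 venues per 100k,
    multiplied by a count gate min(1, count/5).
    """
    # Clamp to [0, 20]; the lower bound is an assumption, the ceiling is from the commit.
    density = min(max(venues_per_100k, 0.0), 20.0)
    log_density = math.log(density + 1) / math.log(21)
    # 1 venue -> 20% of the density score, 5 or more venues -> 100%.
    count_gate = min(1.0, venue_count / 5)
    return 40.0 * log_density * count_gate


def income_points(
    income_pps: Optional[float],
    ceiling: float = 35000.0,
    default_pps: float = 15000.0,
) -> float:
    """Income component of opportunity_score (max 20 pts).

    Equivalent of 20 * LEAST(1.0, pps / ceiling), with missing data falling
    back to default_pps (15000 after the fix).
    """
    pps = default_pps if income_pps is None else income_pps
    return 20.0 * min(1.0, pps / ceiling)


if __name__ == "__main__":
    # Old /200 ceiling: every PPS value in the 18k-37k range saturates at 20 pts.
    print(income_points(18000, ceiling=200), income_points(37000, ceiling=200))
    # New /35000 ceiling: the same range now spreads out.
    print(income_points(18000), income_points(37000))
    # Count gate: a single venue at the density ceiling keeps 20% of 40 pts.
    print(supply_development_points(20.0, 1))
```

With the old ceiling of 200 both calls in the first print return the full 20 points, which is the flat uplift the fix removes. The missing-data default of 15000 works out to about 0.43 of the ceiling, matching the figure quoted in the commit.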

27 Commits
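
The case-insensitive key fix described in commit e3a6b91bc0 (content/__init__.py) can be illustrated with a small Python sketch. It assumes rows arrive as dicts with lowercased column names, as that commit reports. The DEFAULTS values below and the function name calc_overrides_from_row are hypothetical, and skipping None values is an assumption rather than something the commit states. Only the {k.lower(): k} reverse-map idea comes from the commit message.

```python
from typing import Any, Dict

# Hypothetical subset of the planner DEFAULTS; the key names come from the
# camelCase override columns listed in commit b3afd414a4, the values are made up.
DEFAULTS: Dict[str, Any] = {
    "ratePeak": 0.0,
    "rateOffPeak": 0.0,
    "utilTarget": 0.0,
    "dblCourts": 0,
}


def calc_overrides_from_row(row: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Map lowercased row keys back to the canonical camelCase DEFAULTS keys."""
    # Reverse map: lowercased key -> canonical key, e.g. "ratepeak" -> "ratePeak".
    canonical = {key.lower(): key for key in defaults}
    overrides: Dict[str, Any] = {}
    for column, value in row.items():
        key = canonical.get(column.lower())
        # Ignore columns that are not calculator inputs and (by assumption) NULLs.
        if key is not None and value is not None:
            overrides[key] = value
    return overrides


if __name__ == "__main__":
    row = {"ratepeak": 42.0, "dblcourts": 4, "utiltarget": None, "city_slug": "berlin"}
    print(calc_overrides_from_row(row, DEFAULTS))
```

Normalising through a reverse map keeps DEFAULTS as the single source of truth for key casing, so new camelCase keys need no extra mapping code.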