padelnomics

Author	SHA1	Message	Date
Deeman	0960990373	feat(data): Sprint 1-5 population pipeline — city labels, US/UK/Global extractors Part A: Data Layer — Sprints 1-5 Sprint 1 — Eurostat SDMX city labels (unblocks EU population): - New extractor: eurostat_city_labels.py — fetches ESTAT/CITIES codelist (city_code → city_name mapping) with ETag dedup - New staging model: stg_city_labels.sql — grain city_code - Updated dim_cities.sql — joins Eurostat population via city code lookup; replaces hardcoded 0::BIGINT population Sprint 2 — Market score formula v2: - city_market_profile.sql: 30pt population (LN/1M), 25pt income PPS (/200), 30pt demand (occupancy or density), 15pt data confidence - Moved venue_pricing_benchmarks join into base CTE so median_occupancy_rate is available to the scoring formula Sprint 3 — US Census ACS extractor: - New extractor: census_usa.py — ACS 5-year place population (vintage 2023) - New staging model: stg_population_usa.sql — grain (place_fips, ref_year) Sprint 4 — ONS UK extractor: - New extractor: ons_uk.py — 2021 Census LAD population via ONS beta API - New staging model: stg_population_uk.sql — grain (lad_code, ref_year) Sprint 5 — GeoNames global extractor: - New extractor: geonames.py — cities15000.zip bulk download, filtered to ≥50K pop - New staging model: stg_population_geonames.sql — grain geoname_id - dim_cities: 5-source population cascade (Eurostat > Census > ONS > GeoNames > 0) with case/whitespace-insensitive city name matching Registered all 4 new CLI entrypoints in pyproject.toml and all.py. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 00:07:08 +01:00
Deeman	ebba46f700	refactor: align transform layer with template methodology Three deviations from the quart_saas_boilerplate methodology corrected: 1. Fix dim_cities LIKE join (data quality bug) - Old: FROM eurostat_cities LEFT JOIN venue_counts LIKE '%country_code%' → cartesian product (2.6M rows vs ~5500 expected) - New: FROM venue_cities (dim_venues) as primary table, Eurostat for enrichment only. grain (country_code, city_slug). - Also fixes REGEXP_REPLACE to LOWER() before regex so uppercase city names aren't stripped to '-' 2. Rename fct_venue_capacity → dim_venue_capacity - Static venue attributes with no time key are a dimension, not a fact - No SQL logic changes; update fct_daily_availability reference 3. Add fct_availability_slot at event grain - New: grain (snapshot_date, tenant_id, resource_id, slot_start_time) - Recheck dedup logic moves here from fct_daily_availability - fct_daily_availability now reads fct_availability_slot (cleaner DAG) Downstream fixes: - city_market_profile, planner_defaults grain → (country_code, city_slug) - pseo_city_costs_de, pseo_city_pricing add city_key composite natural key (country_slug || '-' || city_slug) to avoid URL collisions across countries - planner_defaults join in pseo_city_costs_de uses both country_code + city_slug - Templates updated: natural_key city_slug → city_key Added transform/sqlmesh_padelnomics/CLAUDE.md documenting data modeling rules, conformed dimension map, and source integration architecture. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-23 21:17:04 +01:00
Deeman	b517e3e58d	feat(transform): add country_name_en + country_slug to dim_cities, pass through city_market_profile Prerequisite for all pSEO serving models. Adds CASE-based country_name_en and URL-safe country_slug to foundation.dim_cities, then selects them through serving.city_market_profile so downstream models inherit them automatically. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 18:37:43 +01:00
Deeman	a1faddbed6	feat: Python supervisor + feature flags Supervisor (replaces supervisor.sh): - supervisor.py — cron-based pipeline orchestration, reads workflows.toml on every tick, runs due extractors in topological waves with parallel execution, then SQLMesh transform + serving export - workflows.toml — workflow registry: overpass (monthly), eurostat (monthly), playtomic_tenants (weekly), playtomic_availability (daily), playtomic_recheck (hourly 6–23) - padelnomics-supervisor.service — updated ExecStart to Python supervisor Extraction enhancements: - proxy.py — optional round-robin/sticky proxy rotation via PROXY_URLS env - playtomic_availability.py — parallel fetch (EXTRACT_WORKERS), recheck mode (main_recheck) re-queries imminent slots for accurate occupancy measurement - _shared.py — realistic browser User-Agent on all extractor sessions - stg_playtomic_availability.sql — reads morning + recheck snapshots, tags each - fct_daily_availability.sql — prefers recheck over morning for same slot Feature flags (replaces WAITLIST_MODE env var): - migration 0019 — feature_flags table, 5 initial flags: markets (on), payments/planner_export/supplier_signup/lead_unlock (off) - core.py — is_flag_enabled() + feature_gate() decorator - routes — payments, markets, planner_export, supplier_signup, lead_unlock gated - admin flags UI — /admin/flags toggle page + nav link - app.py — flag() injected as Jinja2 global Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 13:53:45 +01:00
Deeman	13c86ebf84	Merge branch 'worktree-extraction-overhaul' # Conflicts: # transform/sqlmesh_padelnomics/models/foundation/dim_cities.sql # transform/sqlmesh_padelnomics/models/staging/stg_playtomic_venues.sql	2026-02-23 01:01:26 +01:00
Deeman	79f7fc6fad	feat: Playtomic pricing/occupancy pipeline + email i18n + audience restructure Three workstreams: 1. Playtomic full data extraction & transform pipeline: - Expand venue bounding boxes from 4 to 23 regions (global coverage) - New staging models for court resources, opening hours, and slot-level availability with real prices from the Playtomic API - Foundation fact tables for venue capacity and daily occupancy/revenue - City-level pricing benchmarks replacing hardcoded country estimates - Planner defaults now use 3-tier cascade: city data → country → fallback 2. Transactional email i18n: - _t() helper in worker.py with ~70 translation keys (EN + DE) - All 8 email handlers translated, lang passed in task payloads 3. Resend audiences restructured to 3 named audiences (free plan limit) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 00:54:53 +01:00
Deeman	5a1bb21624	fix: eurostat JSON-stat parsing + staging model corrections Eurostat JSON-stat format (4-7 dimension sparse dict with 583K values) causes DuckDB OOM — pre-process in extractor to flat records. Also fix dim_cities unused CTE bug and playtomic venue lat/lon path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 20:52:25 +01:00
Deeman	2db66efe77	feat: migrate transform to 3-layer architecture with per-layer schemas Remove raw/ layer — staging models now read landing JSON directly. Rename all model schemas from padelnomics.* to staging./foundation./serving.*. Web app queries updated to serving.planner_defaults via SERVING_DUCKDB_PATH. Supervisor gets daily sleep interval between pipeline runs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 19:04:40 +01:00
Deeman	18ee24818b	feat: copier update v0.9.0 — extraction docs, state tracking, architecture guides Sync template from 29ac25b → v0.9.0 (29 template commits). Due to template's _subdirectory migration, new files were manually rendered rather than auto-merged by copier. New files: - .claude/CLAUDE.md + coding_philosophy.md (agent instructions) - extract utils.py: SQLite state tracking for extraction runs - extract/transform READMEs: architecture & pattern documentation - infra/supervisor: systemd service + orchestration script - Per-layer model READMEs (raw, staging, foundation, serving) Also fixes copier-answers.yml (adds 4 feature toggles, removes stale payment_provider key) and scopes CLAUDE.md gitignore to root only. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 15:44:48 +01:00
Deeman	4ae00b35d1	refactor: flatten padelnomics/padelnomics/ → repo root git mv all tracked files from the nested padelnomics/ workspace directory to the git repo root. Merged .gitignore files. No code changes — pure path rename. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 00:44:40 +01:00
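
Commit 0960990373 describes a 5-source population cascade (Eurostat > Census > ONS > GeoNames > 0) with case/whitespace-insensitive city name matching. The SQL in dim_cities.sql is not shown in this log, so the following is only a minimal Python sketch of the idea. The function names, argument names, and the exact normalization (collapsing whitespace, case-folding) are assumptions for illustration, not the repository's code.

```python
from typing import Optional


def normalize_city_name(name: str) -> str:
    """Hypothetical helper: collapse runs of whitespace and ignore case.

    Assumption: the real model may normalize differently (for example,
    only trimming leading and trailing whitespace).
    """
    return " ".join(name.split()).casefold()


def resolve_population(
    eurostat: Optional[int],
    census: Optional[int],
    ons: Optional[int],
    geonames: Optional[int],
) -> int:
    """Return the first available population in priority order.

    Mirrors the order stated in the commit message:
    Eurostat > Census > ONS > GeoNames > 0.
    """
    for value in (eurostat, census, ons, geonames):
        if value is not None:
            return value
    # No source has a figure for this city: fall back to 0.
    return 0
```

In SQL the same priority order is usually written as a single `COALESCE` over the joined source columns, which is presumably what the model does, though the log does not confirm it.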

10 Commits
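
Commit ebba46f700 mentions two related fixes: lowercasing city names before the regex replacement so uppercase names are not reduced to '-', and a composite `city_key` (country_slug || '-' || city_slug) to avoid URL collisions across countries. The sketch below illustrates both in Python. The regex pattern, the trimming of leading and trailing dashes, and the function names are assumptions; the repository implements this in SQL with REGEXP_REPLACE and its exact pattern is not shown here.

```python
import re

# Assumed pattern: anything that is not a lowercase letter or digit
# becomes a dash. The real SQL pattern is not shown in the log.
NON_SLUG_CHARS = re.compile(r"[^a-z0-9]+")


def slugify_buggy(name: str) -> str:
    """Applies the regex to the raw name, reproducing the described bug."""
    return NON_SLUG_CHARS.sub("-", name)


def slugify(name: str) -> str:
    """Lowercases first, then replaces non-slug characters with dashes."""
    return NON_SLUG_CHARS.sub("-", name.lower()).strip("-")


def city_key(country_slug: str, city_slug: str) -> str:
    """Composite natural key: country_slug || '-' || city_slug."""
    return f"{country_slug}-{city_slug}"


if __name__ == "__main__":
    print(slugify_buggy("MADRID"))         # every character is replaced
    print(slugify("MADRID"))
    print(slugify("Palma de Mallorca"))
    print(city_key("spain", slugify("Valencia")))
```

With a city-only key, two cities that share a name in different countries would map to the same URL; prefixing the country slug keeps them distinct.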