Commit Graph

4 Commits

Author SHA1 Message Date
Deeman
0960990373 feat(data): Sprint 1-5 population pipeline — city labels, US/UK/Global extractors
Part A: Data Layer — Sprints 1-5

Sprint 1 — Eurostat SDMX city labels (unblocks EU population):
- New extractor: eurostat_city_labels.py — fetches ESTAT/CITIES codelist
  (city_code → city_name mapping) with ETag dedup
- New staging model: stg_city_labels.sql — grain city_code
- Updated dim_cities.sql — joins Eurostat population via city code lookup;
  replaces hardcoded 0::BIGINT population

Sprint 2 — Market score formula v2:
- city_market_profile.sql: 30pt population (LN/1M), 25pt income PPS (/200),
  30pt demand (occupancy or density), 15pt data confidence
- Moved venue_pricing_benchmarks join into base CTE so median_occupancy_rate
  is available to the scoring formula

Sprint 3 — US Census ACS extractor:
- New extractor: census_usa.py — ACS 5-year place population (vintage 2023)
- New staging model: stg_population_usa.sql — grain (place_fips, ref_year)

Sprint 4 — ONS UK extractor:
- New extractor: ons_uk.py — 2021 Census LAD population via ONS beta API
- New staging model: stg_population_uk.sql — grain (lad_code, ref_year)

Sprint 5 — GeoNames global extractor:
- New extractor: geonames.py — cities15000.zip bulk download, filtered to ≥50K pop
- New staging model: stg_population_geonames.sql — grain geoname_id
- dim_cities: 5-source population cascade (Eurostat > Census > ONS > GeoNames > 0)
  with case/whitespace-insensitive city name matching

Registered all 4 new CLI entrypoints in pyproject.toml and all.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 00:07:08 +01:00
Deeman
a1faddbed6 feat: Python supervisor + feature flags
Supervisor (replaces supervisor.sh):
- supervisor.py — cron-based pipeline orchestration, reads workflows.toml
  on every tick, runs due extractors in topological waves with parallel
  execution, then SQLMesh transform + serving export
- workflows.toml — workflow registry: overpass (monthly), eurostat (monthly),
  playtomic_tenants (weekly), playtomic_availability (daily),
  playtomic_recheck (hourly 6–23)
- padelnomics-supervisor.service — updated ExecStart to Python supervisor

Extraction enhancements:
- proxy.py — optional round-robin/sticky proxy rotation via PROXY_URLS env
- playtomic_availability.py — parallel fetch (EXTRACT_WORKERS), recheck mode
  (main_recheck) re-queries imminent slots for accurate occupancy measurement
- _shared.py — realistic browser User-Agent on all extractor sessions
- stg_playtomic_availability.sql — reads morning + recheck snapshots, tags each
- fct_daily_availability.sql — prefers recheck over morning for same slot

Feature flags (replaces WAITLIST_MODE env var):
- migration 0019 — feature_flags table, 5 initial flags:
  markets (on), payments/planner_export/supplier_signup/lead_unlock (off)
- core.py — is_flag_enabled() + feature_gate() decorator
- routes — payments, markets, planner_export, supplier_signup, lead_unlock gated
- admin flags UI — /admin/flags toggle page + nav link
- app.py — flag() injected as Jinja2 global

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 13:53:45 +01:00
Deeman
53e9bbd66b feat: restructure extraction to one file per source
Split monolithic execute.py into per-source modules with separate CLI
entry points. Each extractor now uses the framework from utils.py:
- SQLite state tracking (start_run / end_run per extractor)
- Proper logging (replace print() with logger)
- Atomic gzip writes (write_gzip_atomic)
- Connection pooling (niquests.Session)
- Bounded pagination (MAX_PAGES_PER_BBOX = 500)

New entry points:
  extract              — run all 4 extractors sequentially
  extract-overpass     — OSM padel courts
  extract-eurostat     — city demographics (etag dedup)
  extract-playtomic-tenants      — venue listings
  extract-playtomic-availability — booking slots + pricing (NEW)

The availability extractor reads tenant IDs from the latest tenants.json.gz,
queries next-day slots for each venue, and stores daily consolidated snapshots.
Supports resumability via cursor and retry with backoff.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 18:56:41 +01:00
Deeman
4ae00b35d1 refactor: flatten padelnomics/padelnomics/ → repo root
git mv all tracked files from the nested padelnomics/ workspace
directory to the git repo root. Merged .gitignore files.
No code changes — pure path rename.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 00:44:40 +01:00