Assert landing_dir.is_dir() and year_month format (YYYY/MM) at the
entry point of each extract function — turning silent wrong-path bugs
into immediate AssertionError with a descriptive message.
Files changed:
- playtomic_availability.py: assert in _load_tenant_ids(), extract(),
extract_recheck()
- eurostat.py: assert in extract()
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds a two-tier proxy system for the Playtomic availability extractor:
- Primary tier (PROXY_URLS): datacenter proxies, cheap and fast
- Fallback tier (PROXY_URLS_FALLBACK): residential rotating gateway, reliable
Circuit breaker opens after CIRCUIT_BREAKER_THRESHOLD (default: 10) consecutive
failures, permanently switching to the fallback tier for the rest of the run.
No auto-recovery — avoids flapping. If circuit opens with no fallback configured,
logs an error and writes partial results rather than continuing on a dead proxy pool.
Parallel mode submits futures in PARALLEL_BATCH_SIZE=100 batches so the circuit
breaker can stop new submissions after it opens.
New env vars added to .env.dev.sops (blank defaults):
PROXY_URLS_FALLBACK — residential/rotating gateway URL
CIRCUIT_BREAKER_THRESHOLD — consecutive failures before switching (default 10)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add section 9 to data-sources-inventory.md covering live API quirks:
Eurostat SDMX city labels response shape, ONS CSV download path (observations
API 404s), US Census ACS place endpoint, GeoNames cities15000 bulk format
- Add population coverage summary table and DuckDB glob limitation note
- fix(extract): census_usa + geonames write empty placeholder when credentials
absent so SQLMesh staging models don't fail with "no files found"
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
eurostat_city_labels: API returns compact dimension JSON (category.label dict),
not SDMX 2.1 nested codelists structure. Fixed parser to read from
data["category"]["label"]. 1771 city codes fetched successfully.
ons_uk: observations endpoint (TS007A) is 404. Switched to CSV download via
/datasets/mid-year-pop-est/editions/mid-2022-england-wales — fetches ~68MB CSV,
filters to sex='all' + target year, aggregates population per LAD. 316 LADs ≥50K.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Part A: Data Layer — Sprints 1-5
Sprint 1 — Eurostat SDMX city labels (unblocks EU population):
- New extractor: eurostat_city_labels.py — fetches ESTAT/CITIES codelist
(city_code → city_name mapping) with ETag dedup
- New staging model: stg_city_labels.sql — grain city_code
- Updated dim_cities.sql — joins Eurostat population via city code lookup;
replaces hardcoded 0::BIGINT population
Sprint 2 — Market score formula v2:
- city_market_profile.sql: 30pt population (LN/1M), 25pt income PPS (/200),
30pt demand (occupancy or density), 15pt data confidence
- Moved venue_pricing_benchmarks join into base CTE so median_occupancy_rate
is available to the scoring formula
Sprint 3 — US Census ACS extractor:
- New extractor: census_usa.py — ACS 5-year place population (vintage 2023)
- New staging model: stg_population_usa.sql — grain (place_fips, ref_year)
Sprint 4 — ONS UK extractor:
- New extractor: ons_uk.py — 2021 Census LAD population via ONS beta API
- New staging model: stg_population_uk.sql — grain (lad_code, ref_year)
Sprint 5 — GeoNames global extractor:
- New extractor: geonames.py — cities15000.zip bulk download, filtered to ≥50K pop
- New staging model: stg_population_geonames.sql — grain geoname_id
- dim_cities: 5-source population cascade (Eurostat > Census > ONS > GeoNames > 0)
with case/whitespace-insensitive city name matching
Registered all 4 new CLI entrypoints in pyproject.toml and all.py.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Playtomic API ignores bbox params (min_latitude, etc.) and offset param.
Discovered that `page` param works correctly for global enumeration.
Result: 14,202 venues across 82 countries (was 100 with bbox approach).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three workstreams:
1. Playtomic full data extraction & transform pipeline:
- Expand venue bounding boxes from 4 to 23 regions (global coverage)
- New staging models for court resources, opening hours, and slot-level
availability with real prices from the Playtomic API
- Foundation fact tables for venue capacity and daily occupancy/revenue
- City-level pricing benchmarks replacing hardcoded country estimates
- Planner defaults now use 3-tier cascade: city data → country → fallback
2. Transactional email i18n:
- _t() helper in worker.py with ~70 translation keys (EN + DE)
- All 8 email handlers translated, lang passed in task payloads
3. Resend audiences restructured to 3 named audiences (free plan limit)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Eurostat JSON-stat format (4-7 dimension sparse dict with 583K values)
causes DuckDB OOM — pre-process in extractor to flat records.
Also fix dim_cities unused CTE bug and playtomic venue lat/lon path.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Playtomic tenants API recycles results past its internal limit —
stop after 3 consecutive pages with zero new unique IDs.
Calculator tests: replace hardcoded default values (6 courts, specific
sqm/capex) with DEFAULTS references so tests don't break when
defaults change.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove raw/ layer — staging models now read landing JSON directly.
Rename all model schemas from padelnomics.* to staging.*/foundation.*/serving.*.
Web app queries updated to serving.planner_defaults via SERVING_DUCKDB_PATH.
Supervisor gets daily sleep interval between pipeline runs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Split monolithic execute.py into per-source modules with separate CLI
entry points. Each extractor now uses the framework from utils.py:
- SQLite state tracking (start_run / end_run per extractor)
- Proper logging (replace print() with logger)
- Atomic gzip writes (write_gzip_atomic)
- Connection pooling (niquests.Session)
- Bounded pagination (MAX_PAGES_PER_BBOX = 500)
New entry points:
extract — run all 4 extractors sequentially
extract-overpass — OSM padel courts
extract-eurostat — city demographics (etag dedup)
extract-playtomic-tenants — venue listings
extract-playtomic-availability — booking slots + pricing (NEW)
The availability extractor reads tenant IDs from the latest tenants.json.gz,
queries next-day slots for each venue, and stores daily consolidated snapshots.
Supports resumability via cursor and retry with backoff.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pulls in template changes: export_serving.py for atomic DuckDB swap,
supervisor export step, SQLMesh glob macro, server provisioning script,
imprint template, and formatting improvements.
Template scaffold SQL models excluded (padelnomics has real models).
Web app routes/analytics unchanged (padelnomics-specific customizations).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sync template from 29ac25b → v0.9.0 (29 template commits). Due to
template's _subdirectory migration, new files were manually rendered
rather than auto-merged by copier.
New files:
- .claude/CLAUDE.md + coding_philosophy.md (agent instructions)
- extract utils.py: SQLite state tracking for extraction runs
- extract/transform READMEs: architecture & pattern documentation
- infra/supervisor: systemd service + orchestration script
- Per-layer model READMEs (raw, staging, foundation, serving)
Also fixes copier-answers.yml (adds 4 feature toggles, removes stale
payment_provider key) and scopes CLAUDE.md gitignore to root only.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
git mv all tracked files from the nested padelnomics/ workspace
directory to the git repo root. Merged .gitignore files.
No code changes — pure path rename.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>