PPS values are 18k–37k but /200 normalisation caused LEAST(1.0, 115)=1.0
for ALL countries — 20pts flat uplift, zero differentiation.
Fix: /35000 creates real country spread:
LU 20.0pts, DE 15.2pts, ES 12.8pts, GB 10.5pts (vs 20.0 everywhere before)
Default for missing data 100→15000 (developing-market assumption, ~0.43).
Header comment updated to document v2 formula behaviour.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Raises supply gap ceiling from 4/100k to 8/100k in
location_opportunity_profile.sql. The original 4/100k hard cliff
truncated opportunity scores to 0 for any city with ≥4 courts/100k,
but our data undercounts ~87% of real courts (FIP: 17,300 Spanish
courts vs 2,239 in our DB). Raising to 8/100k gives a gentler gradient
and fairer partial credit when density data is incomplete.
Documents existing formula behaviour discovered during analysis:
- Income PPS: country-level constants (18k-37k range) saturate the
/200 ceiling — all EU countries get flat 20/20 pts until city-level
income data lands.
- Catchment NULL: DuckDB LEAST(1.0, NULL) = 1.0 (ignores nulls), so
NULL nearest_padel_court_km already yields full 15 pts. COALESCE
fallback is dead code but harmless.
- Tennis courts within 25km: dim_locations data is empty (all 0 rows)
— 10-court threshold is correct for when data arrives, contributes
0 pts everywhere for now.
Effective score impact: minimal (99% of locations have 0 courts/100k,
so supply gap was already at max). Only ~1,050 dense-court cities
see a score increase (from 0 gap pts to partial gap pts).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- dim_cities: add geoname_id to geonames_pop CTE and final SELECT
Creates FK between dim_cities (city-with-padel-venues) and dim_locations (all GeoNames),
enabling joins to location_opportunity_profile for the first time.
- city_market_profile: pass geoname_id through base CTE and final SELECT
- pseo_city_costs_de: LEFT JOIN location_opportunity_profile on (country_code, geoname_id),
add opportunity_score to output columns
- pseo_country_overview: add avg_opportunity_score, top_opportunity_score, top_opportunity_slugs,
top_opportunity_names aggregates
Cities with no GeoNames name match get opportunity_score = NULL; templates guard with
{% if opportunity_score %}.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Both stg_playtomic_resources and stg_playtomic_opening_hours lacked QUALIFY ROW_NUMBER()
dedup despite declaring a grain. When both tenants.json.gz (old) and tenants.jsonl.gz (new)
exist for the same month, the UNION ALL produced exactly 2× rows.
Fixes:
- stg_playtomic_resources: QUALIFY ROW_NUMBER() OVER (PARTITION BY tenant_id, resource_id)
- stg_playtomic_opening_hours: QUALIFY ROW_NUMBER() OVER (PARTITION BY tenant_id, day_of_week)
- playtomic_tenants.py: skip if old blob OR new JSONL already exists for the month,
preventing same-month dual-format writes that trigger the duplicate
Row counts after fix: ~43.8K resources, ~93.4K opening_hours (was 87.6K, 186.8K).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace single global Overpass query (150K+ elements, times out) with
10 regional bbox queries (~10-40K elements each, 150s server / 180s client).
- REGIONS: 10 bboxes covering all continents
- Crash recovery: working.jsonl accumulates per-region results;
already_seen_ids deduplication skips re-written elements on restart
- Overlapping bbox elements deduped by OSM id across regions
- Retry per region: up to 2 retries with 30s cooldown
- Polite 5s inter-region delay
- Skip if courts.jsonl.gz or courts.json.gz already exists for the month
stg_tennis_courts: UNION ALL transition (jsonl_elements + blob_elements)
- jsonl_elements: JSONL, explicit columns, COALESCE lat/lon with center coords
(supports both node direct lat/lon and way/relation Overpass out center)
- blob_elements: existing UNNEST(elements) pattern, unchanged
- Removed osm_type='node' filter — ways/relations now usable via center coords
- Dedup on (osm_id, extracted_date DESC) unchanged
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- availability_{date}.jsonl.gz replaces .json.gz for morning snapshots
- Each JSONL line = one venue object with date + captured_at_utc injected
- Eliminates in-memory consolidation: working.jsonl IS the final file
(compress_jsonl_atomic at end instead of write_gzip_atomic blob)
- Crash recovery unchanged: working.jsonl accumulates via flush_partial_batch
- _load_morning_availability tries .jsonl.gz first, falls back to .json.gz
- Skip check covers both formats during transition
- Recheck files stay blob format (small, infrequent)
stg_playtomic_availability: UNION ALL transition (morning_jsonl + morning_blob + recheck_blob)
- morning_jsonl: read_json JSONL, tenant_id direct column, no outer UNNEST
- morning_blob / recheck_blob: subquery + LATERAL UNNEST (unchanged semantics)
- All three produce (snapshot_date, captured_at_utc, snapshot_type, recheck_hour, tenant_id, slots_json)
- Downstream raw_resources / raw_slots CTEs unchanged
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- playtomic_tenants.py: write each tenant as a JSONL line after dedup,
compress via compress_jsonl_atomic → tenants.jsonl.gz
- playtomic_availability.py: update _load_tenant_ids() to prefer
tenants.jsonl.gz, fall back to tenants.json.gz (transition)
- stg_playtomic_venues.sql: UNION ALL jsonl+blob CTEs for transition;
JSONL reads top-level columns directly, no UNNEST(tenants) needed
- stg_playtomic_resources.sql: same UNION ALL pattern, single UNNEST
for resources in JSONL path vs double UNNEST in blob path
- stg_playtomic_opening_hours.sql: same UNION ALL pattern, opening_hours
as top-level JSON column in JSONL path
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Part A: Data Layer — Sprints 1-5
Sprint 1 — Eurostat SDMX city labels (unblocks EU population):
- New extractor: eurostat_city_labels.py — fetches ESTAT/CITIES codelist
(city_code → city_name mapping) with ETag dedup
- New staging model: stg_city_labels.sql — grain city_code
- Updated dim_cities.sql — joins Eurostat population via city code lookup;
replaces hardcoded 0::BIGINT population
Sprint 2 — Market score formula v2:
- city_market_profile.sql: 30pt population (LN/1M), 25pt income PPS (/200),
30pt demand (occupancy or density), 15pt data confidence
- Moved venue_pricing_benchmarks join into base CTE so median_occupancy_rate
is available to the scoring formula
Sprint 3 — US Census ACS extractor:
- New extractor: census_usa.py — ACS 5-year place population (vintage 2023)
- New staging model: stg_population_usa.sql — grain (place_fips, ref_year)
Sprint 4 — ONS UK extractor:
- New extractor: ons_uk.py — 2021 Census LAD population via ONS beta API
- New staging model: stg_population_uk.sql — grain (lad_code, ref_year)
Sprint 5 — GeoNames global extractor:
- New extractor: geonames.py — cities15000.zip bulk download, filtered to ≥50K pop
- New staging model: stg_population_geonames.sql — grain geoname_id
- dim_cities: 5-source population cascade (Eurostat > Census > ONS > GeoNames > 0)
with case/whitespace-insensitive city name matching
Registered all 4 new CLI entrypoints in pyproject.toml and all.py.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three deviations from the quart_saas_boilerplate methodology corrected:
1. Fix dim_cities LIKE join (data quality bug)
- Old: FROM eurostat_cities LEFT JOIN venue_counts LIKE '%country_code%'
→ cartesian product (2.6M rows vs ~5500 expected)
- New: FROM venue_cities (dim_venues) as primary table, Eurostat for
enrichment only. grain (country_code, city_slug).
- Also fixes REGEXP_REPLACE to LOWER() before regex so uppercase city
names aren't stripped to '-'
2. Rename fct_venue_capacity → dim_venue_capacity
- Static venue attributes with no time key are a dimension, not a fact
- No SQL logic changes; update fct_daily_availability reference
3. Add fct_availability_slot at event grain
- New: grain (snapshot_date, tenant_id, resource_id, slot_start_time)
- Recheck dedup logic moves here from fct_daily_availability
- fct_daily_availability now reads fct_availability_slot (cleaner DAG)
Downstream fixes:
- city_market_profile, planner_defaults grain → (country_code, city_slug)
- pseo_city_costs_de, pseo_city_pricing add city_key composite natural key
(country_slug || '-' || city_slug) to avoid URL collisions across countries
- planner_defaults join in pseo_city_costs_de uses both country_code + city_slug
- Templates updated: natural_key city_slug → city_key
Added transform/sqlmesh_padelnomics/CLAUDE.md documenting data modeling rules,
conformed dimension map, and source integration architecture.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
stg_playtomic_availability:
- Add maximum_object_size = 134217728 (128 MB) to both read_json calls;
daily files exceed the 16 MB default as venue count grows
- Add seed recheck file (1970-01-01_recheck_00.json.gz, gitignored with data/)
to avoid READ_JSON IOException when no recheck files exist
pseo_city_costs_de + pseo_city_pricing:
- Add QUALIFY ROW_NUMBER() OVER (PARTITION BY city_slug ...) = 1 to
deduplicate rows caused by dim_cities' loose LIKE join; reduces
pseo_city_costs_de from 2.6M → 222 rows (one per unique city)
content/__init__.py:
- DuckDB lowercases all column names at rest ("ratePeak" → "ratepeak"),
so calc_overrides dict comprehension never matched DEFAULTS keys.
Fix: build case-insensitive reverse map {k.lower(): k} and normalise
row keys before lookup. Applied in both generate_articles() and
preview_article().
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Prerequisite for all pSEO serving models. Adds CASE-based country_name_en
and URL-safe country_slug to foundation.dim_cities, then selects them through
serving.city_market_profile so downstream models inherit them automatically.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add maximum_object_size=128MB to read_json for 14K-venue tenants file
- Rewrite opening_hours to use UNION ALL unpivot (DuckDB struct dynamic access)
- Add seed file guard for availability model (empty result on first run)
- Fix snapshot_date VARCHAR→DATE comparison in venue_pricing_benchmarks
- Fix export_serving to resolve SQLMesh physical tables from view definitions
(SQLMesh views reference "local" catalog unavailable outside its context)
- Add pyarrow dependency for Arrow-based cross-connection data transfer
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three workstreams:
1. Playtomic full data extraction & transform pipeline:
- Expand venue bounding boxes from 4 to 23 regions (global coverage)
- New staging models for court resources, opening hours, and slot-level
availability with real prices from the Playtomic API
- Foundation fact tables for venue capacity and daily occupancy/revenue
- City-level pricing benchmarks replacing hardcoded country estimates
- Planner defaults now use 3-tier cascade: city data → country → fallback
2. Transactional email i18n:
- _t() helper in worker.py with ~70 translation keys (EN + DE)
- All 8 email handlers translated, lang passed in task payloads
3. Resend audiences restructured to 3 named audiences (free plan limit)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Eurostat JSON-stat format (4-7 dimension sparse dict with 583K values)
causes DuckDB OOM — pre-process in extractor to flat records.
Also fix dim_cities unused CTE bug and playtomic venue lat/lon path.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove raw/ layer — staging models now read landing JSON directly.
Rename all model schemas from padelnomics.* to staging.*/foundation.*/serving.*.
Web app queries updated to serving.planner_defaults via SERVING_DUCKDB_PATH.
Supervisor gets daily sleep interval between pipeline runs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sync template from 29ac25b → v0.9.0 (29 template commits). Due to
template's _subdirectory migration, new files were manually rendered
rather than auto-merged by copier.
New files:
- .claude/CLAUDE.md + coding_philosophy.md (agent instructions)
- extract utils.py: SQLite state tracking for extraction runs
- extract/transform READMEs: architecture & pattern documentation
- infra/supervisor: systemd service + orchestration script
- Per-layer model READMEs (raw, staging, foundation, serving)
Also fixes copier-answers.yml (adds 4 feature toggles, removes stale
payment_provider key) and scopes CLAUDE.md gitignore to root only.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
git mv all tracked files from the nested padelnomics/ workspace
directory to the git repo root. Merged .gitignore files.
No code changes — pure path rename.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>