Commit Graph

10 Commits

Author SHA1 Message Date
Deeman
0960990373 feat(data): Sprint 1-5 population pipeline — city labels, US/UK/Global extractors
Part A: Data Layer — Sprints 1-5

Sprint 1 — Eurostat SDMX city labels (unblocks EU population):
- New extractor: eurostat_city_labels.py — fetches ESTAT/CITIES codelist
  (city_code → city_name mapping) with ETag dedup
- New staging model: stg_city_labels.sql — grain city_code
- Updated dim_cities.sql — joins Eurostat population via city code lookup;
  replaces hardcoded 0::BIGINT population

Sprint 2 — Market score formula v2:
- city_market_profile.sql: 30pt population (LN/1M), 25pt income PPS (/200),
  30pt demand (occupancy or density), 15pt data confidence
- Moved venue_pricing_benchmarks join into base CTE so median_occupancy_rate
  is available to the scoring formula

Sprint 3 — US Census ACS extractor:
- New extractor: census_usa.py — ACS 5-year place population (vintage 2023)
- New staging model: stg_population_usa.sql — grain (place_fips, ref_year)

Sprint 4 — ONS UK extractor:
- New extractor: ons_uk.py — 2021 Census LAD population via ONS beta API
- New staging model: stg_population_uk.sql — grain (lad_code, ref_year)

Sprint 5 — GeoNames global extractor:
- New extractor: geonames.py — cities15000.zip bulk download, filtered to ≥50K pop
- New staging model: stg_population_geonames.sql — grain geoname_id
- dim_cities: 5-source population cascade (Eurostat > Census > ONS > GeoNames > 0)
  with case/whitespace-insensitive city name matching

Registered all 4 new CLI entrypoints in pyproject.toml and all.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 00:07:08 +01:00
Deeman
ebba46f700 refactor: align transform layer with template methodology
Three deviations from the quart_saas_boilerplate methodology corrected:

1. Fix dim_cities LIKE join (data quality bug)
   - Old: FROM eurostat_cities LEFT JOIN venue_counts LIKE '%country_code%'
     → cartesian product (2.6M rows vs ~5500 expected)
   - New: FROM venue_cities (dim_venues) as primary table, Eurostat for
     enrichment only. grain (country_code, city_slug).
   - Also fixes REGEXP_REPLACE to LOWER() before regex so uppercase city
     names aren't stripped to '-'

2. Rename fct_venue_capacity → dim_venue_capacity
   - Static venue attributes with no time key are a dimension, not a fact
   - No SQL logic changes; update fct_daily_availability reference

3. Add fct_availability_slot at event grain
   - New: grain (snapshot_date, tenant_id, resource_id, slot_start_time)
   - Recheck dedup logic moves here from fct_daily_availability
   - fct_daily_availability now reads fct_availability_slot (cleaner DAG)

Downstream fixes:
- city_market_profile, planner_defaults grain → (country_code, city_slug)
- pseo_city_costs_de, pseo_city_pricing add city_key composite natural key
  (country_slug || '-' || city_slug) to avoid URL collisions across countries
- planner_defaults join in pseo_city_costs_de uses both country_code + city_slug
- Templates updated: natural_key city_slug → city_key

Added transform/sqlmesh_padelnomics/CLAUDE.md documenting data modeling rules,
conformed dimension map, and source integration architecture.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 21:17:04 +01:00
Deeman
e3a6b91bc0 fix(transform+content): unblock SQLMesh plan — three pipeline fixes
stg_playtomic_availability:
- Add maximum_object_size = 134217728 (128 MB) to both read_json calls;
  daily files exceed the 16 MB default as venue count grows
- Add seed recheck file (1970-01-01_recheck_00.json.gz, gitignored with data/)
  to avoid READ_JSON IOException when no recheck files exist

pseo_city_costs_de + pseo_city_pricing:
- Add QUALIFY ROW_NUMBER() OVER (PARTITION BY city_slug ...) = 1 to
  deduplicate rows caused by dim_cities' loose LIKE join; reduces
  pseo_city_costs_de from 2.6M → 222 rows (one per unique city)

content/__init__.py:
- DuckDB lowercases all column names at rest ("ratePeak" → "ratepeak"),
  so calc_overrides dict comprehension never matched DEFAULTS keys.
  Fix: build case-insensitive reverse map {k.lower(): k} and normalise
  row keys before lookup. Applied in both generate_articles() and
  preview_article().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 18:51:53 +01:00
Deeman
b3afd414a4 feat(transform): add three pSEO serving models — city costs, country overview, city pricing
- pseo_city_costs_de: unblocks city-cost-de template (~600 city pages),
  joins city_market_profile + planner_defaults, includes camelCase calc
  override columns (ratePeak, rateOffPeak, utilTarget, dblCourts, country)
- pseo_country_overview: per-country hub aggregating from pseo_city_costs_de,
  includes top_city_slugs/names lists for internal linking
- pseo_city_pricing: per-city pricing pages requiring >= 2 Playtomic venues,
  includes P25/P75 price range and occupancy

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 18:37:50 +01:00
Deeman
b517e3e58d feat(transform): add country_name_en + country_slug to dim_cities, pass through city_market_profile
Prerequisite for all pSEO serving models. Adds CASE-based country_name_en
and URL-safe country_slug to foundation.dim_cities, then selects them through
serving.city_market_profile so downstream models inherit them automatically.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 18:37:43 +01:00
Deeman
7737b79230 fix: DuckDB compat issues in Playtomic pipeline + export_serving
- Add maximum_object_size=128MB to read_json for 14K-venue tenants file
- Rewrite opening_hours to use UNION ALL unpivot (DuckDB struct dynamic access)
- Add seed file guard for availability model (empty result on first run)
- Fix snapshot_date VARCHAR→DATE comparison in venue_pricing_benchmarks
- Fix export_serving to resolve SQLMesh physical tables from view definitions
  (SQLMesh views reference "local" catalog unavailable outside its context)
- Add pyarrow dependency for Arrow-based cross-connection data transfer

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 01:27:51 +01:00
Deeman
79f7fc6fad feat: Playtomic pricing/occupancy pipeline + email i18n + audience restructure
Three workstreams:

1. Playtomic full data extraction & transform pipeline:
   - Expand venue bounding boxes from 4 to 23 regions (global coverage)
   - New staging models for court resources, opening hours, and slot-level
     availability with real prices from the Playtomic API
   - Foundation fact tables for venue capacity and daily occupancy/revenue
   - City-level pricing benchmarks replacing hardcoded country estimates
   - Planner defaults now use 3-tier cascade: city data → country → fallback

2. Transactional email i18n:
   - _t() helper in worker.py with ~70 translation keys (EN + DE)
   - All 8 email handlers translated, lang passed in task payloads

3. Resend audiences restructured to 3 named audiences (free plan limit)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 00:54:53 +01:00
Deeman
2db66efe77 feat: migrate transform to 3-layer architecture with per-layer schemas
Remove raw/ layer — staging models now read landing JSON directly.
Rename all model schemas from padelnomics.* to staging.*/foundation.*/serving.*.
Web app queries updated to serving.planner_defaults via SERVING_DUCKDB_PATH.
Supervisor gets daily sleep interval between pipeline runs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 19:04:40 +01:00
Deeman
18ee24818b feat: copier update v0.9.0 — extraction docs, state tracking, architecture guides
Sync template from 29ac25b → v0.9.0 (29 template commits). Due to
template's _subdirectory migration, new files were manually rendered
rather than auto-merged by copier.

New files:
- .claude/CLAUDE.md + coding_philosophy.md (agent instructions)
- extract utils.py: SQLite state tracking for extraction runs
- extract/transform READMEs: architecture & pattern documentation
- infra/supervisor: systemd service + orchestration script
- Per-layer model READMEs (raw, staging, foundation, serving)

Also fixes copier-answers.yml (adds 4 feature toggles, removes stale
payment_provider key) and scopes CLAUDE.md gitignore to root only.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 15:44:48 +01:00
Deeman
4ae00b35d1 refactor: flatten padelnomics/padelnomics/ → repo root
git mv all tracked files from the nested padelnomics/ workspace
directory to the git repo root. Merged .gitignore files.
No code changes — pure path rename.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 00:44:40 +01:00