16 Commits

Author SHA1 Message Date
Deeman
4e82907a70 refactor(transform): conform geographic dimension hierarchy via city_slug
Propagates the conformed city key (city_slug) from dim_venues through the
full pricing pipeline, eliminating 3 fragile LOWER(TRIM(...)) fuzzy string
joins with deterministic key joins.

Changes (cascading, task-by-task):
- dim_venues: add city_slug computed column (REGEXP_REPLACE slug derivation)
- dim_venue_capacity: join foundation.dim_venues instead of stg_playtomic_venues;
  carry city_slug alongside country_code/city
- fct_daily_availability: carry city_slug from dim_venue_capacity
- venue_pricing_benchmarks: carry city_slug from fct_daily_availability;
  add to venue_stats GROUP BY and final SELECT/GROUP BY
- city_market_profile: join vpb on city_slug = city_slug (was LOWER(TRIM))
- planner_defaults: add city_slug to city_benchmarks CTE; join on city_slug
- pseo_city_pricing: join city_market_profile on city_slug (was LOWER(TRIM))
- pipeline_routes._DAG: dim_venue_capacity now depends on dim_venues, not stg_playtomic_venues

Result: dim_venues.city_slug → dim_cities.(country_code, city_slug) forms a
fully conformed geographic hierarchy with no fuzzy string comparisons.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 13:23:03 +01:00
Deeman
9835176e87 fix(sql): opportunity_score income ceiling /200→/35000 (economic power)
PPS values are 18k–37k but /200 normalisation caused LEAST(1.0, 115)=1.0
for ALL countries — 20pts flat uplift, zero differentiation.

Fix: /35000 creates real country spread:
  LU 20.0pts, DE 15.2pts, ES 12.8pts, GB 10.5pts (vs 20.0 everywhere before)

Default for missing data 100→15000 (developing-market assumption, ~0.43).
Header comment updated to document v2 formula behaviour.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 07:58:57 +01:00
Deeman
10266c3a24 fix(sql): opportunity_score — supply gap ceiling 4→8/100k + doc findings
Raises supply gap ceiling from 4/100k to 8/100k in
location_opportunity_profile.sql. The original 4/100k hard cliff
truncated opportunity scores to 0 for any city with ≥4 courts/100k,
but our data undercounts ~87% of real courts (FIP: 17,300 Spanish
courts vs 2,239 in our DB). Raising to 8/100k gives a gentler gradient
and fairer partial credit when density data is incomplete.

Documents existing formula behaviour discovered during analysis:
- Income PPS: country-level constants (18k-37k range) saturate the
  /200 ceiling — all EU countries get flat 20/20 pts until city-level
  income data lands.
- Catchment NULL: DuckDB LEAST(1.0, NULL) = 1.0 (ignores nulls), so
  NULL nearest_padel_court_km already yields full 15 pts. COALESCE
  fallback is dead code but harmless.
- Tennis courts within 25km: dim_locations data is empty (all 0 rows)
  — 10-court threshold is correct for when data arrives, contributes
  0 pts everywhere for now.

Effective score impact: minimal (99% of locations have 0 courts/100k,
so supply gap was already at max). Only ~1,050 dense-court cities
see a score increase (from 0 gap pts to partial gap pts).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 06:57:57 +01:00
Deeman
88ed17484b feat(sql+templates): market_score v3 — log density + count gate
Fixes ranking inversion where Germany (1/100k courts) outscored Spain
(36/100k). Root causes: population/income were 55% of max before any
padel signal, density ceiling saturated 73% of cities, small-town
inflation (1 venue / 5k pop = 20/100k = full marks), and the saturation
discount actively penalised mature markets.

SQL (city_market_profile.sql):
- Supply development 40pts: log-scaled density LN(d+1)/LN(21) × count
  gate min(1, count/5). Ceiling 20/100k. Count gate kills small-town
  inflation without hard cutoffs (1 venue = 20%, 5+ = 100%).
- Demand evidence 25pts: occupancy if available; 40% density proxy
  otherwise. Separated from supply to avoid double-counting.
- Addressable market 15pts: population as context, not maturity.
- Economic context 10pts: income PPS (flat per country, low signal).
- Data quality 10pts.
- Removed saturation discount. High density = maturity.

Verified spot-check scores:
  Málaga (46v, 7.77/100k): 70.1  [was 98.9]
  Barcelona (104v, 6.17/100k): 67.4  [was 100.0]
  Amsterdam (24v, 3.24/100k): 58.4  [was 93.7]
  Bernau bei Berlin (2v, 5.74/100k): 43.9  [was 92.7]
  Berlin (20v, 0.55/100k): 42.2  [was 74.1]
  London (66v, 0.74/100k): 44.1  [was 75.5]

Templates (city-cost-de, country-overview, city-pricing):
- Color coding: green >= 55 (was 65), amber >= 35 (was 40)
- Intro/FAQ tiers: strong >= 55 (was 70), mid >= 35 (was 45)
- Opportunity interplay: market_score < 40 (was < 50) for white-space

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 06:40:12 +01:00
Deeman
7186d4582a feat(sql): thread opportunity_score from location_opportunity_profile into pSEO serving chain
- dim_cities: add geoname_id to geonames_pop CTE and final SELECT
  Creates FK between dim_cities (city-with-padel-venues) and dim_locations (all GeoNames),
  enabling joins to location_opportunity_profile for the first time.
- city_market_profile: pass geoname_id through base CTE and final SELECT
- pseo_city_costs_de: LEFT JOIN location_opportunity_profile on (country_code, geoname_id),
  add opportunity_score to output columns
- pseo_country_overview: add avg_opportunity_score, top_opportunity_score, top_opportunity_slugs,
  top_opportunity_names aggregates

Cities with no GeoNames name match get opportunity_score = NULL; templates guard with
{% if opportunity_score %}.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 20:29:57 +01:00
Deeman
ebfdc84a94 feat(transform): add dim_locations + dual market scoring models
dim_locations (foundation):
- Seeded from stg_population_geonames (all locations, not venue-dependent)
- Grain: (country_code, geoname_id)
- Enriched with: padel venues within 5km, nearest court distance (ST_Distance_Sphere),
  tennis courts within 25km, country income
- Covers zero-court Gemeinden for opportunity scoring

location_opportunity_profile (serving) — Padelnomics Marktpotenzial-Score:
- Answers "Where should I build?" — no padel_venue_count filter
- Formula: population (25) + income (20) + supply gap inverted (30) +
           catchment gap (15) + tennis culture (10) = 100pts
- Sorted by opportunity_score DESC

city_market_profile (serving) — Padelnomics Marktreife-Score:
- Add saturation discount (×0.85 when venues_per_100k > 8)
- Update header comment to reference Marktreife-Score branding
- Kept WHERE padel_venue_count > 0 (established markets only)
- column name market_score unchanged (avoids downstream breakage)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 16:28:16 +01:00
Deeman
0960990373 feat(data): Sprint 1-5 population pipeline — city labels, US/UK/Global extractors
Part A: Data Layer — Sprints 1-5

Sprint 1 — Eurostat SDMX city labels (unblocks EU population):
- New extractor: eurostat_city_labels.py — fetches ESTAT/CITIES codelist
  (city_code → city_name mapping) with ETag dedup
- New staging model: stg_city_labels.sql — grain city_code
- Updated dim_cities.sql — joins Eurostat population via city code lookup;
  replaces hardcoded 0::BIGINT population

Sprint 2 — Market score formula v2:
- city_market_profile.sql: 30pt population (LN/1M), 25pt income PPS (/200),
  30pt demand (occupancy or density), 15pt data confidence
- Moved venue_pricing_benchmarks join into base CTE so median_occupancy_rate
  is available to the scoring formula

Sprint 3 — US Census ACS extractor:
- New extractor: census_usa.py — ACS 5-year place population (vintage 2023)
- New staging model: stg_population_usa.sql — grain (place_fips, ref_year)

Sprint 4 — ONS UK extractor:
- New extractor: ons_uk.py — 2021 Census LAD population via ONS beta API
- New staging model: stg_population_uk.sql — grain (lad_code, ref_year)

Sprint 5 — GeoNames global extractor:
- New extractor: geonames.py — cities15000.zip bulk download, filtered to ≥50K pop
- New staging model: stg_population_geonames.sql — grain geoname_id
- dim_cities: 5-source population cascade (Eurostat > Census > ONS > GeoNames > 0)
  with case/whitespace-insensitive city name matching

Registered all 4 new CLI entrypoints in pyproject.toml and all.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 00:07:08 +01:00
Deeman
ebba46f700 refactor: align transform layer with template methodology
Three deviations from the quart_saas_boilerplate methodology corrected:

1. Fix dim_cities LIKE join (data quality bug)
   - Old: FROM eurostat_cities LEFT JOIN venue_counts LIKE '%country_code%'
     → cartesian product (2.6M rows vs ~5500 expected)
   - New: FROM venue_cities (dim_venues) as primary table, Eurostat for
     enrichment only. grain (country_code, city_slug).
   - Also fixes REGEXP_REPLACE to LOWER() before regex so uppercase city
     names aren't stripped to '-'

2. Rename fct_venue_capacity → dim_venue_capacity
   - Static venue attributes with no time key are a dimension, not a fact
   - No SQL logic changes; update fct_daily_availability reference

3. Add fct_availability_slot at event grain
   - New: grain (snapshot_date, tenant_id, resource_id, slot_start_time)
   - Recheck dedup logic moves here from fct_daily_availability
   - fct_daily_availability now reads fct_availability_slot (cleaner DAG)

Downstream fixes:
- city_market_profile, planner_defaults grain → (country_code, city_slug)
- pseo_city_costs_de, pseo_city_pricing add city_key composite natural key
  (country_slug || '-' || city_slug) to avoid URL collisions across countries
- planner_defaults join in pseo_city_costs_de uses both country_code + city_slug
- Templates updated: natural_key city_slug → city_key

Added transform/sqlmesh_padelnomics/CLAUDE.md documenting data modeling rules,
conformed dimension map, and source integration architecture.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 21:17:04 +01:00
Deeman
e3a6b91bc0 fix(transform+content): unblock SQLMesh plan — three pipeline fixes
stg_playtomic_availability:
- Add maximum_object_size = 134217728 (128 MB) to both read_json calls;
  daily files exceed the 16 MB default as venue count grows
- Add seed recheck file (1970-01-01_recheck_00.json.gz, gitignored with data/)
  to avoid READ_JSON IOException when no recheck files exist

pseo_city_costs_de + pseo_city_pricing:
- Add QUALIFY ROW_NUMBER() OVER (PARTITION BY city_slug ...) = 1 to
  deduplicate rows caused by dim_cities' loose LIKE join; reduces
  pseo_city_costs_de from 2.6M → 222 rows (one per unique city)

content/__init__.py:
- DuckDB lowercases all column names at rest ("ratePeak" → "ratepeak"),
  so calc_overrides dict comprehension never matched DEFAULTS keys.
  Fix: build case-insensitive reverse map {k.lower(): k} and normalise
  row keys before lookup. Applied in both generate_articles() and
  preview_article().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 18:51:53 +01:00
Deeman
b3afd414a4 feat(transform): add three pSEO serving models — city costs, country overview, city pricing
- pseo_city_costs_de: unblocks city-cost-de template (~600 city pages),
  joins city_market_profile + planner_defaults, includes camelCase calc
  override columns (ratePeak, rateOffPeak, utilTarget, dblCourts, country)
- pseo_country_overview: per-country hub aggregating from pseo_city_costs_de,
  includes top_city_slugs/names lists for internal linking
- pseo_city_pricing: per-city pricing pages requiring >= 2 Playtomic venues,
  includes P25/P75 price range and occupancy

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 18:37:50 +01:00
Deeman
b517e3e58d feat(transform): add country_name_en + country_slug to dim_cities, pass through city_market_profile
Prerequisite for all pSEO serving models. Adds CASE-based country_name_en
and URL-safe country_slug to foundation.dim_cities, then selects them through
serving.city_market_profile so downstream models inherit them automatically.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 18:37:43 +01:00
Deeman
7737b79230 fix: DuckDB compat issues in Playtomic pipeline + export_serving
- Add maximum_object_size=128MB to read_json for 14K-venue tenants file
- Rewrite opening_hours to use UNION ALL unpivot (DuckDB struct dynamic access)
- Add seed file guard for availability model (empty result on first run)
- Fix snapshot_date VARCHAR→DATE comparison in venue_pricing_benchmarks
- Fix export_serving to resolve SQLMesh physical tables from view definitions
  (SQLMesh views reference "local" catalog unavailable outside its context)
- Add pyarrow dependency for Arrow-based cross-connection data transfer

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 01:27:51 +01:00
Deeman
79f7fc6fad feat: Playtomic pricing/occupancy pipeline + email i18n + audience restructure
Three workstreams:

1. Playtomic full data extraction & transform pipeline:
   - Expand venue bounding boxes from 4 to 23 regions (global coverage)
   - New staging models for court resources, opening hours, and slot-level
     availability with real prices from the Playtomic API
   - Foundation fact tables for venue capacity and daily occupancy/revenue
   - City-level pricing benchmarks replacing hardcoded country estimates
   - Planner defaults now use 3-tier cascade: city data → country → fallback

2. Transactional email i18n:
   - _t() helper in worker.py with ~70 translation keys (EN + DE)
   - All 8 email handlers translated, lang passed in task payloads

3. Resend audiences restructured to 3 named audiences (free plan limit)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 00:54:53 +01:00
Deeman
2db66efe77 feat: migrate transform to 3-layer architecture with per-layer schemas
Remove raw/ layer — staging models now read landing JSON directly.
Rename all model schemas from padelnomics.* to staging.*/foundation.*/serving.*.
Web app queries updated to serving.planner_defaults via SERVING_DUCKDB_PATH.
Supervisor gets daily sleep interval between pipeline runs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 19:04:40 +01:00
Deeman
18ee24818b feat: copier update v0.9.0 — extraction docs, state tracking, architecture guides
Sync template from 29ac25b → v0.9.0 (29 template commits). Due to
template's _subdirectory migration, new files were manually rendered
rather than auto-merged by copier.

New files:
- .claude/CLAUDE.md + coding_philosophy.md (agent instructions)
- extract utils.py: SQLite state tracking for extraction runs
- extract/transform READMEs: architecture & pattern documentation
- infra/supervisor: systemd service + orchestration script
- Per-layer model READMEs (raw, staging, foundation, serving)

Also fixes copier-answers.yml (adds 4 feature toggles, removes stale
payment_provider key) and scopes CLAUDE.md gitignore to root only.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 15:44:48 +01:00
Deeman
4ae00b35d1 refactor: flatten padelnomics/padelnomics/ → repo root
git mv all tracked files from the nested padelnomics/ workspace
directory to the git repo root. Merged .gitignore files.
No code changes — pure path rename.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 00:44:40 +01:00