Files
padelnomics/transform/sqlmesh_padelnomics/models/serving/pseo_city_pricing.sql
Deeman ebba46f700 refactor: align transform layer with template methodology
Three deviations from the quart_saas_boilerplate methodology corrected:

1. Fix dim_cities LIKE join (data quality bug)
   - Old: FROM eurostat_cities LEFT JOIN venue_counts LIKE '%country_code%'
     → cartesian product (2.6M rows vs ~5500 expected)
   - New: FROM venue_cities (dim_venues) as primary table, Eurostat for
     enrichment only. grain (country_code, city_slug).
   - Also fixes REGEXP_REPLACE to LOWER() before regex so uppercase city
     names aren't stripped to '-'

2. Rename fct_venue_capacity → dim_venue_capacity
   - Static venue attributes with no time key are a dimension, not a fact
   - No SQL logic changes; update fct_daily_availability reference

3. Add fct_availability_slot at event grain
   - New: grain (snapshot_date, tenant_id, resource_id, slot_start_time)
   - Recheck dedup logic moves here from fct_daily_availability
   - fct_daily_availability now reads fct_availability_slot (cleaner DAG)

Downstream fixes:
- city_market_profile, planner_defaults grain → (country_code, city_slug)
- pseo_city_costs_de, pseo_city_pricing add city_key composite natural key
  (country_slug || '-' || city_slug) to avoid URL collisions across countries
- planner_defaults join in pseo_city_costs_de uses both country_code + city_slug
- Templates updated: natural_key city_slug → city_key

Added transform/sqlmesh_padelnomics/CLAUDE.md documenting data modeling rules,
conformed dimension map, and source integration architecture.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 21:17:04 +01:00

47 lines
1.5 KiB
SQL

-- pSEO article data: per-city padel court pricing.
-- One row per city — consumed by the city-pricing.md.jinja template.
-- Joins venue_pricing_benchmarks (real Playtomic data) with city_market_profile
-- (population, venue count, country metadata).
--
-- Stricter filter than pseo_city_costs_de: requires >= 2 venues with real
-- pricing data so pricing articles are always data-backed.
MODEL (
name serving.pseo_city_pricing,
kind FULL,
cron '@daily',
grain city_key
);
SELECT
-- Composite natural key: country_slug + city_slug ensures uniqueness across countries
c.country_slug || '-' || c.city_slug AS city_key,
-- City identity (from city_market_profile, which has the canonical city_slug)
c.city_slug,
c.city_name,
c.country_code,
c.country_name_en,
c.country_slug,
-- Market context
c.population,
c.padel_venue_count,
c.venues_per_100k,
c.market_score,
-- Pricing benchmarks (from Playtomic availability data)
vpb.median_hourly_rate,
vpb.median_peak_rate,
vpb.median_offpeak_rate,
vpb.hourly_rate_p25,
vpb.hourly_rate_p75,
vpb.median_occupancy_rate,
vpb.venue_count,
vpb.price_currency,
CURRENT_DATE AS refreshed_date
FROM serving.venue_pricing_benchmarks vpb
-- Join city_market_profile to get the canonical city_slug and country metadata
INNER JOIN serving.city_market_profile c
ON vpb.country_code = c.country_code
AND LOWER(TRIM(vpb.city)) = LOWER(TRIM(c.city_name))
-- Only cities with enough venues for meaningful pricing statistics
WHERE vpb.venue_count >= 2