refactor: align transform layer with template methodology

Three deviations from the quart_saas_boilerplate methodology corrected:

1. Fix dim_cities LIKE join (data quality bug)
   - Old: FROM eurostat_cities LEFT JOIN venue_counts LIKE '%country_code%'
     → cartesian product (2.6M rows vs ~5500 expected)
   - New: FROM venue_cities (dim_venues) as primary table, Eurostat for
     enrichment only. grain (country_code, city_slug).
   - Also fixes REGEXP_REPLACE to LOWER() before regex so uppercase city
     names aren't stripped to '-'

2. Rename fct_venue_capacity → dim_venue_capacity
   - Static venue attributes with no time key are a dimension, not a fact
   - No SQL logic changes; update fct_daily_availability reference

3. Add fct_availability_slot at event grain
   - New: grain (snapshot_date, tenant_id, resource_id, slot_start_time)
   - Recheck dedup logic moves here from fct_daily_availability
   - fct_daily_availability now reads fct_availability_slot (cleaner DAG)

Downstream fixes:
- city_market_profile, planner_defaults grain → (country_code, city_slug)
- pseo_city_costs_de, pseo_city_pricing add city_key composite natural key
  (country_slug || '-' || city_slug) to avoid URL collisions across countries
- planner_defaults join in pseo_city_costs_de uses both country_code + city_slug
- Templates updated: natural_key city_slug → city_key

Added transform/sqlmesh_padelnomics/CLAUDE.md documenting data modeling rules,
conformed dimension map, and source integration architecture.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Deeman
2026-02-23 21:17:04 +01:00
parent 4006d47a79
commit ebba46f700
12 changed files with 248 additions and 112 deletions

View File

@@ -10,12 +10,11 @@ MODEL (
name serving.city_market_profile,
kind FULL,
cron '@daily',
grain city_slug
grain (country_code, city_slug)
);
WITH base AS (
SELECT
c.city_code,
c.country_code,
c.country_name_en,
c.country_slug,
@@ -55,7 +54,6 @@ scored AS (
FROM base
)
SELECT
s.city_code,
s.country_code,
s.country_name_en,
s.country_slug,

View File

@@ -13,7 +13,7 @@ MODEL (
name serving.planner_defaults,
kind FULL,
cron '@daily',
grain city_slug
grain (country_code, city_slug)
);
WITH -- Real city-level benchmarks from Playtomic

View File

@@ -9,10 +9,13 @@ MODEL (
name serving.pseo_city_costs_de,
kind FULL,
cron '@daily',
grain city_slug
grain city_key
);
SELECT
-- Composite natural key: country_slug + city_slug ensures uniqueness across countries
-- (city_slug alone is not unique — 'valencia' exists in ES, VE, etc.)
c.country_slug || '-' || c.city_slug AS city_key,
-- City identity
c.city_slug,
c.city_name,
@@ -43,10 +46,8 @@ SELECT
CURRENT_DATE AS refreshed_date
FROM serving.city_market_profile c
LEFT JOIN serving.planner_defaults p
ON c.city_slug = p.city_slug
ON c.country_code = p.country_code
AND c.city_slug = p.city_slug
-- Only cities with actual padel presence and at least some rate data
WHERE c.padel_venue_count > 0
AND (p.rate_peak IS NOT NULL OR c.median_peak_rate IS NOT NULL)
-- dim_cities has a loose LIKE join that produces duplicates per city_slug;
-- take the row with the highest market_score to get canonical city data.
QUALIFY ROW_NUMBER() OVER (PARTITION BY c.city_slug ORDER BY c.market_score DESC NULLS LAST) = 1

View File

@@ -10,10 +10,12 @@ MODEL (
name serving.pseo_city_pricing,
kind FULL,
cron '@daily',
grain city_slug
grain city_key
);
SELECT
-- Composite natural key: country_slug + city_slug ensures uniqueness across countries
c.country_slug || '-' || c.city_slug AS city_key,
-- City identity (from city_market_profile, which has the canonical city_slug)
c.city_slug,
c.city_name,
@@ -42,6 +44,3 @@ INNER JOIN serving.city_market_profile c
AND LOWER(TRIM(vpb.city)) = LOWER(TRIM(c.city_name))
-- Only cities with enough venues for meaningful pricing statistics
WHERE vpb.venue_count >= 2
-- city_market_profile inherits duplicates from dim_cities' loose LIKE join;
-- take the highest market_score row as the canonical city record.
QUALIFY ROW_NUMBER() OVER (PARTITION BY c.city_slug ORDER BY c.market_score DESC NULLS LAST) = 1