refactor: align transform layer with template methodology

Three deviations from the quart_saas_boilerplate methodology corrected:
1. Fix dim_cities LIKE join (data quality bug)
   - Old: FROM eurostat_cities LEFT JOIN venue_counts LIKE '%country_code%'
     → cartesian product (2.6M rows vs ~5500 expected)
   - New: FROM venue_cities (dim_venues) as the primary table, with Eurostat
     used for enrichment only. Grain: (country_code, city_slug).
   - Also apply LOWER() before REGEXP_REPLACE so uppercase city names
     aren't stripped to '-'

2. Rename fct_venue_capacity → dim_venue_capacity
   - Static venue attributes with no time key are a dimension, not a fact
   - No SQL logic changes; update the fct_daily_availability reference

3. Add fct_availability_slot at event grain
   - New: grain (snapshot_date, tenant_id, resource_id, slot_start_time)
   - Recheck dedup logic moves here from fct_daily_availability
   - fct_daily_availability now reads fct_availability_slot (cleaner DAG)
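
The LOWER()-before-regex fix in item 1 can be illustrated outside SQL. The following is a minimal Python sketch, not the project's code: the pattern `[^a-z0-9]+` and both function names are assumptions, since the commit message does not show the actual REGEXP_REPLACE expression.

```python
import re

# Hypothetical slug pattern: anything that is not a lowercase letter or digit.
# The real pattern used in dim_cities is not shown in this commit.
NON_SLUG = re.compile(r"[^a-z0-9]+")


def slug_regex_first(name: str) -> str:
    """Old order (assumed): apply the regex to the raw name.

    Uppercase letters do not match [a-z0-9], so they are replaced too.
    """
    return NON_SLUG.sub("-", name)


def slug_lower_first(name: str) -> str:
    """Fixed order: lowercase first, then replace non-slug characters."""
    return NON_SLUG.sub("-", name.lower())


if __name__ == "__main__":
    for city in ["BERLIN", "Bad Homburg"]:
        print(city, "->", slug_regex_first(city), "|", slug_lower_first(city))
```

With an all-uppercase name the regex-first order collapses the whole string into a single '-', which is the symptom described in item 1; lowercasing first preserves the letters.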

Downstream fixes:
- city_market_profile, planner_defaults: grain → (country_code, city_slug)
- pseo_city_costs_de, pseo_city_pricing: add city_key composite natural key
  (country_slug || '-' || city_slug) to avoid URL collisions across countries
- planner_defaults join in pseo_city_costs_de uses both country_code and city_slug
- Templates updated: natural_key city_slug → city_key
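
Why city_slug alone is not a safe natural key can be shown with a small Python sketch. The sample rows and helper names below are made up for illustration and are not taken from the project's data; only the key formula (country_slug || '-' || city_slug) comes from this commit.

```python
from collections import Counter

# Made-up sample rows: two different cities share the slug "valencia".
ROWS = [
    {"country_slug": "spain", "city_slug": "valencia"},
    {"country_slug": "venezuela", "city_slug": "valencia"},
    {"country_slug": "germany", "city_slug": "berlin"},
]


def city_key(row: dict) -> str:
    """Composite natural key, mirroring country_slug || '-' || city_slug."""
    return f"{row['country_slug']}-{row['city_slug']}"


def duplicated(keys: list) -> list:
    """Return the keys that occur more than once, sorted."""
    counts = Counter(keys)
    return sorted(k for k, n in counts.items() if n > 1)


if __name__ == "__main__":
    print(duplicated([r["city_slug"] for r in ROWS]))  # slug-only key collides
    print(duplicated([city_key(r) for r in ROWS]))     # composite key does not
```

Keying templates on the composite value keeps one page per (country, city) pair instead of letting same-named cities overwrite each other's URL.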

Added transform/sqlmesh_padelnomics/CLAUDE.md documenting data modeling rules,
conformed dimension map, and source integration architecture.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -10,10 +10,12 @@ MODEL (
   name serving.pseo_city_pricing,
   kind FULL,
   cron '@daily',
-  grain city_slug
+  grain city_key
 );
 
 SELECT
+  -- Composite natural key: country_slug + city_slug ensures uniqueness across countries
+  c.country_slug || '-' || c.city_slug AS city_key,
   -- City identity (from city_market_profile, which has the canonical city_slug)
   c.city_slug,
   c.city_name,
@@ -42,6 +44,3 @@ INNER JOIN serving.city_market_profile c
   AND LOWER(TRIM(vpb.city)) = LOWER(TRIM(c.city_name))
 -- Only cities with enough venues for meaningful pricing statistics
 WHERE vpb.venue_count >= 2
--- city_market_profile inherits duplicates from dim_cities' loose LIKE join;
--- take the highest market_score row as the canonical city record.
-QUALIFY ROW_NUMBER() OVER (PARTITION BY c.city_slug ORDER BY c.market_score DESC NULLS LAST) = 1