refactor(transform): conform geographic dimension hierarchy via city_slug

Propagates the conformed city key (city_slug) from dim_venues through the full
pricing pipeline, replacing 3 fragile LOWER(TRIM(...)) fuzzy string joins with
deterministic key joins.

Changes (cascading, task-by-task):
- dim_venues: add city_slug computed column (REGEXP_REPLACE slug derivation)
- dim_venue_capacity: join foundation.dim_venues instead of stg_playtomic_venues;
  carry city_slug alongside country_code/city
- fct_daily_availability: carry city_slug from dim_venue_capacity
- venue_pricing_benchmarks: carry city_slug from fct_daily_availability; add to
  venue_stats GROUP BY and final SELECT/GROUP BY
- city_market_profile: join vpb on city_slug = city_slug (was LOWER(TRIM))
- planner_defaults: add city_slug to city_benchmarks CTE; join on city_slug
- pseo_city_pricing: join city_market_profile on city_slug (was LOWER(TRIM))
- pipeline_routes._DAG: dim_venue_capacity now depends on dim_venues, not
  stg_playtomic_venues

Result: dim_venues.city_slug → dim_cities.(country_code, city_slug) forms a fully
conformed geographic hierarchy with no fuzzy string comparisons.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
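The commit message names a REGEXP_REPLACE slug derivation but the expression itself is not shown in this diff. As a rough illustration only, a derivation of that kind could look like the Python sketch below. The function name `city_slug` and the normalization rules (lowercase, strip accents, collapse non-alphanumeric runs into hyphens) are assumptions, not the project's actual rule.

```python
import re
import unicodedata


def city_slug(city_name: str) -> str:
    """Hypothetical slug rule; the real dim_venues expression may differ."""
    # Decompose accented characters and drop the combining marks (é -> e).
    decomposed = unicodedata.normalize("NFKD", city_name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    hyphenated = re.sub(r"[^a-z0-9]+", "-", ascii_only.lower())
    # Remove hyphens left over from leading or trailing whitespace/punctuation.
    return hyphenated.strip("-")
```

Whatever the exact rule is, the point of the change is that it is computed once in dim_venues and carried downstream, so every model compares the same precomputed key instead of re-normalizing free-text names at join time.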
@@ -57,7 +57,7 @@ WITH base AS (
 FROM foundation.dim_cities c
 LEFT JOIN serving.venue_pricing_benchmarks vpb
 ON c.country_code = vpb.country_code
-AND LOWER(TRIM(c.city_name)) = LOWER(TRIM(vpb.city))
+AND c.city_slug = vpb.city_slug
 WHERE c.padel_venue_count > 0
 ),
 scored AS (
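To make the motivation for this join change concrete, here is a minimal sketch of how a LOWER(TRIM(...)) comparison can silently drop a match that a key join keeps. The rows are invented for illustration, and SQLite is used only because it ships with Python; the project's own engine and data may behave differently in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_cities (country_code TEXT, city_name TEXT, city_slug TEXT);
    CREATE TABLE venue_pricing_benchmarks (country_code TEXT, city TEXT, city_slug TEXT);
    INSERT INTO dim_cities VALUES ('ES', 'Palma de Mallorca', 'palma-de-mallorca');
    -- Same city, but the free-text name carries a doubled internal space,
    -- which LOWER(TRIM(...)) does not normalize.
    INSERT INTO venue_pricing_benchmarks VALUES ('ES', 'Palma  de Mallorca', 'palma-de-mallorca');
""")

# Old join condition: fuzzy comparison on the city name strings.
fuzzy_matches = conn.execute("""
    SELECT COUNT(*)
    FROM dim_cities c
    JOIN venue_pricing_benchmarks vpb
      ON c.country_code = vpb.country_code
     AND LOWER(TRIM(c.city_name)) = LOWER(TRIM(vpb.city))
""").fetchone()[0]

# New join condition: deterministic comparison on the conformed key.
key_matches = conn.execute("""
    SELECT COUNT(*)
    FROM dim_cities c
    JOIN venue_pricing_benchmarks vpb
      ON c.country_code = vpb.country_code
     AND c.city_slug = vpb.city_slug
""").fetchone()[0]

print(fuzzy_matches, key_matches)
```

In the LEFT JOIN used by this model, a missed match does not remove the city row; it leaves the benchmark columns NULL, which is harder to notice than a missing row.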
@@ -21,6 +21,7 @@ city_benchmarks AS (
 SELECT
 country_code,
 city,
+city_slug,
 median_peak_rate,
 median_offpeak_rate,
 median_occupancy_rate,
@@ -128,7 +129,7 @@ SELECT
 FROM city_profiles cp
 LEFT JOIN city_benchmarks cb
 ON cp.country_code = cb.country_code
-AND LOWER(TRIM(cp.city_name)) = LOWER(TRIM(cb.city))
+AND cp.city_slug = cb.city_slug
 LEFT JOIN country_benchmarks ctb
 ON cp.country_code = ctb.country_code
 LEFT JOIN hardcoded_fallbacks hf
@@ -41,6 +41,6 @@ FROM serving.venue_pricing_benchmarks vpb
 -- Join city_market_profile to get the canonical city_slug and country metadata
 INNER JOIN serving.city_market_profile c
 ON vpb.country_code = c.country_code
-AND LOWER(TRIM(vpb.city)) = LOWER(TRIM(c.city_name))
+AND vpb.city_slug = c.city_slug
 -- Only cities with enough venues for meaningful pricing statistics
 WHERE vpb.venue_count >= 2
@@ -17,6 +17,7 @@ WITH venue_stats AS (
 da.tenant_id,
 da.country_code,
 da.city,
+da.city_slug,
 da.price_currency,
 AVG(da.occupancy_rate) AS avg_occupancy_rate,
 MEDIAN(da.median_price) AS median_hourly_rate,
@@ -29,12 +30,13 @@ WITH venue_stats AS (
 WHERE TRY_CAST(da.snapshot_date AS DATE) >= CURRENT_DATE - INTERVAL '30 days'
 AND da.occupancy_rate IS NOT NULL
 AND da.occupancy_rate BETWEEN 0 AND 1.5
-GROUP BY da.tenant_id, da.country_code, da.city, da.price_currency
+GROUP BY da.tenant_id, da.country_code, da.city, da.city_slug, da.price_currency
 HAVING COUNT(DISTINCT da.snapshot_date) >= 3
 )
 SELECT
 country_code,
 city,
+city_slug,
 price_currency,
 COUNT(*) AS venue_count,
 -- Pricing benchmarks
@@ -54,4 +56,4 @@ SELECT
 SUM(days_observed) AS total_venue_days_observed,
 CURRENT_DATE AS refreshed_date
 FROM venue_stats
-GROUP BY country_code, city, price_currency
+GROUP BY country_code, city, city_slug, price_currency
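Adding city_slug to the GROUP BY keeps the original grain only if each (country_code, city) pair maps to exactly one slug. A quick sanity check along these lines could confirm that the group count is unchanged. The table contents below are invented, the helper `group_count` is hypothetical, and SQLite stands in for the project's engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE venue_stats (
        country_code TEXT, city TEXT, city_slug TEXT, price_currency TEXT
    );
    INSERT INTO venue_stats VALUES
        ('ES', 'Madrid',  'madrid',  'EUR'),
        ('ES', 'Madrid',  'madrid',  'EUR'),
        ('ES', 'Sevilla', 'sevilla', 'EUR');
""")


def group_count(columns: str) -> int:
    """Count the groups produced by GROUP BY over the given column list."""
    sql = f"SELECT COUNT(*) FROM (SELECT 1 FROM venue_stats GROUP BY {columns})"
    return conn.execute(sql).fetchone()[0]


# Grain before the change versus grain after city_slug is added.
old_grain = group_count("country_code, city, price_currency")
new_grain = group_count("country_code, city, city_slug, price_currency")
print(old_grain, new_grain)
```

If the two counts ever differ on real data, some city name is carrying more than one slug and the benchmark rows would be split.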