refactor(transform): conform geographic dimension hierarchy via city_slug

Propagates the conformed city key (city_slug) from dim_venues through the
full pricing pipeline, replacing 3 fragile LOWER(TRIM(...)) fuzzy string
joins with deterministic key joins.

Changes (cascading, task-by-task):
- dim_venues: add city_slug computed column (REGEXP_REPLACE slug derivation)
- dim_venue_capacity: join foundation.dim_venues instead of stg_playtomic_venues;
  carry city_slug alongside country_code/city
- fct_daily_availability: carry city_slug from dim_venue_capacity
- venue_pricing_benchmarks: carry city_slug from fct_daily_availability;
  add to venue_stats GROUP BY and final SELECT/GROUP BY
- city_market_profile: join venue_pricing_benchmarks (vpb) on city_slug
  (was LOWER(TRIM(...)))
- planner_defaults: add city_slug to city_benchmarks CTE; join on city_slug
- pseo_city_pricing: join city_market_profile on city_slug
  (was LOWER(TRIM(...)))
- pipeline_routes._DAG: dim_venue_capacity now depends on dim_venues, not stg_playtomic_venues
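
To illustrate the slug derivation and why it is sturdier than a
LOWER(TRIM(...)) comparison, here is a minimal sketch. The sample rows
and the exact REGEXP_REPLACE pattern are assumptions for illustration;
the expression actually used in dim_venues may differ. It assumes a
DuckDB-style dialect where REGEXP_REPLACE takes a 'g' option and TRIM
accepts a set of characters to strip.

```sql
-- Hypothetical sample rows; not taken from the real dim_venues model.
SELECT
    venue_id,
    country_code,
    city,
    -- Old-style comparison key: rows 1 and 2 still differ (double space).
    LOWER(TRIM(city)) AS fuzzy_key,
    -- Assumed slug rule: lowercase, collapse every run of characters
    -- outside a-z/0-9 into one hyphen, then strip edge hyphens.
    TRIM(
        REGEXP_REPLACE(LOWER(TRIM(city)), '[^a-z0-9]+', '-', 'g'),
        '-'
    ) AS city_slug
FROM (
    VALUES
        (1, 'ES', ' Palma de Mallorca '),
        (2, 'ES', 'palma  de mallorca'),
        (3, 'DE', 'Frankfurt (Main)')
) AS venues(venue_id, country_code, city);
```

Under this assumed rule, rows 1 and 2 share the slug palma-de-mallorca
even though their fuzzy_key values differ, so a join on city_slug matches
them where the string comparison would not. Note that this simple pattern
also replaces non-ASCII letters with hyphens, so a production rule would
likely need transliteration first.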

Result: dim_venues.city_slug → dim_cities.(country_code, city_slug) forms a
fully conformed geographic hierarchy with no fuzzy string comparisons.
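
One way to check the conformance claim is an anti-join that lists venues
whose (country_code, city_slug) pair has no match in dim_cities. This is
a sketch only: the inline rows and the column lists are hypothetical
stand-ins, not the real dim_venues or dim_cities schemas.

```sql
-- Hypothetical stand-ins for the two dimensions.
WITH dim_cities AS (
    SELECT * FROM (
        VALUES ('ES', 'palma-de-mallorca'),
               ('DE', 'berlin')
    ) AS t(country_code, city_slug)
),
dim_venues AS (
    SELECT * FROM (
        VALUES (1, 'ES', 'palma-de-mallorca'),
               (2, 'DE', 'berlin'),
               (3, 'DE', 'frankfurt-main')
    ) AS t(venue_id, country_code, city_slug)
)
-- Venues that fail to resolve to a city via the conformed key.
SELECT v.venue_id, v.country_code, v.city_slug
FROM dim_venues AS v
LEFT JOIN dim_cities AS c
    ON  v.country_code = c.country_code
    AND v.city_slug    = c.city_slug
WHERE c.city_slug IS NULL;
```

With these sample rows the query returns only venue 3, whose slug is
absent from the stand-in dim_cities. An empty result against the real
tables would support the "fully conformed" claim.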

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Deeman
2026-02-27 13:23:03 +01:00
parent 160c2c6f7b
commit 4e82907a70
8 changed files with 14 additions and 7 deletions

@@ -17,6 +17,7 @@ WITH venue_stats AS (
 da.tenant_id,
 da.country_code,
 da.city,
+da.city_slug,
 da.price_currency,
 AVG(da.occupancy_rate) AS avg_occupancy_rate,
 MEDIAN(da.median_price) AS median_hourly_rate,
@@ -29,12 +30,13 @@ WITH venue_stats AS (
 WHERE TRY_CAST(da.snapshot_date AS DATE) >= CURRENT_DATE - INTERVAL '30 days'
 AND da.occupancy_rate IS NOT NULL
 AND da.occupancy_rate BETWEEN 0 AND 1.5
-GROUP BY da.tenant_id, da.country_code, da.city, da.price_currency
+GROUP BY da.tenant_id, da.country_code, da.city, da.city_slug, da.price_currency
 HAVING COUNT(DISTINCT da.snapshot_date) >= 3
 )
 SELECT
 country_code,
 city,
+city_slug,
 price_currency,
 COUNT(*) AS venue_count,
 -- Pricing benchmarks
@@ -54,4 +56,4 @@ SELECT
 SUM(days_observed) AS total_venue_days_observed,
 CURRENT_DATE AS refreshed_date
 FROM venue_stats
-GROUP BY country_code, city, price_currency
+GROUP BY country_code, city, city_slug, price_currency