Propagates the conformed city key (city_slug) from dim_venues through the full pricing pipeline, eliminating 3 fragile LOWER(TRIM(...)) fuzzy string joins with deterministic key joins. Changes (cascading, task-by-task): - dim_venues: add city_slug computed column (REGEXP_REPLACE slug derivation) - dim_venue_capacity: join foundation.dim_venues instead of stg_playtomic_venues; carry city_slug alongside country_code/city - fct_daily_availability: carry city_slug from dim_venue_capacity - venue_pricing_benchmarks: carry city_slug from fct_daily_availability; add to venue_stats GROUP BY and final SELECT/GROUP BY - city_market_profile: join vpb on city_slug = city_slug (was LOWER(TRIM)) - planner_defaults: add city_slug to city_benchmarks CTE; join on city_slug - pseo_city_pricing: join city_market_profile on city_slug (was LOWER(TRIM)) - pipeline_routes._DAG: dim_venue_capacity now depends on dim_venues, not stg_playtomic_venues Result: dim_venues.city_slug → dim_cities.(country_code, city_slug) forms a fully conformed geographic hierarchy with no fuzzy string comparisons. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
60 lines
2.7 KiB
SQL
60 lines
2.7 KiB
SQL
-- Per-city pricing and occupancy benchmarks from Playtomic availability data.
|
|
-- Aggregates venue-level daily metrics (last 30 days) into city-level benchmarks.
|
|
-- Consumed by: planner defaults (pre-fill), city market profile, SEO articles.
|
|
--
|
|
-- Minimum data threshold: venues with >= 3 days of observations.
|
|
|
|
MODEL (
|
|
name serving.venue_pricing_benchmarks,
|
|
kind FULL,
|
|
cron '@daily',
|
|
grain (country_code, city)
|
|
);
|
|
|
|
WITH venue_stats AS (
|
|
-- Aggregate last 30 days per venue
|
|
SELECT
|
|
da.tenant_id,
|
|
da.country_code,
|
|
da.city,
|
|
da.city_slug,
|
|
da.price_currency,
|
|
AVG(da.occupancy_rate) AS avg_occupancy_rate,
|
|
MEDIAN(da.median_price) AS median_hourly_rate,
|
|
MEDIAN(da.median_price_peak) AS median_peak_rate,
|
|
MEDIAN(da.median_price_offpeak) AS median_offpeak_rate,
|
|
AVG(da.estimated_revenue_eur) AS avg_daily_revenue,
|
|
MAX(da.active_court_count) AS court_count,
|
|
COUNT(DISTINCT da.snapshot_date) AS days_observed
|
|
FROM foundation.fct_daily_availability da
|
|
WHERE TRY_CAST(da.snapshot_date AS DATE) >= CURRENT_DATE - INTERVAL '30 days'
|
|
AND da.occupancy_rate IS NOT NULL
|
|
AND da.occupancy_rate BETWEEN 0 AND 1.5
|
|
GROUP BY da.tenant_id, da.country_code, da.city, da.city_slug, da.price_currency
|
|
HAVING COUNT(DISTINCT da.snapshot_date) >= 3
|
|
)
|
|
SELECT
|
|
country_code,
|
|
city,
|
|
city_slug,
|
|
price_currency,
|
|
COUNT(*) AS venue_count,
|
|
-- Pricing benchmarks
|
|
ROUND(MEDIAN(median_hourly_rate), 2) AS median_hourly_rate,
|
|
ROUND(MEDIAN(median_peak_rate), 2) AS median_peak_rate,
|
|
ROUND(MEDIAN(median_offpeak_rate), 2) AS median_offpeak_rate,
|
|
ROUND(PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY median_hourly_rate), 2) AS hourly_rate_p25,
|
|
ROUND(PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY median_hourly_rate), 2) AS hourly_rate_p75,
|
|
-- Occupancy benchmarks
|
|
ROUND(MEDIAN(avg_occupancy_rate), 4) AS median_occupancy_rate,
|
|
ROUND(AVG(avg_occupancy_rate), 4) AS avg_occupancy_rate,
|
|
-- Revenue benchmarks (per venue per day)
|
|
ROUND(MEDIAN(avg_daily_revenue), 2) AS median_daily_revenue_per_venue,
|
|
-- Court mix
|
|
ROUND(MEDIAN(court_count), 0)::INTEGER AS median_court_count,
|
|
-- Data quality
|
|
SUM(days_observed) AS total_venue_days_observed,
|
|
CURRENT_DATE AS refreshed_date
|
|
FROM venue_stats
|
|
GROUP BY country_code, city, city_slug, price_currency
|