refactor: align transform layer with template methodology
Three deviations from the quart_saas_boilerplate methodology corrected:
1. Fix dim_cities LIKE join (data quality bug)
- Old: FROM eurostat_cities LEFT JOIN venue_counts LIKE '%country_code%'
→ cartesian product (2.6M rows vs ~5500 expected)
- New: FROM venue_cities (dim_venues) as primary table, Eurostat for
enrichment only. grain (country_code, city_slug).
- Also fixes REGEXP_REPLACE to LOWER() before regex so uppercase city
names aren't stripped to '-'
2. Rename fct_venue_capacity → dim_venue_capacity
- Static venue attributes with no time key are a dimension, not a fact
- No SQL logic changes; update fct_daily_availability reference
3. Add fct_availability_slot at event grain
- New: grain (snapshot_date, tenant_id, resource_id, slot_start_time)
- Recheck dedup logic moves here from fct_daily_availability
- fct_daily_availability now reads fct_availability_slot (cleaner DAG)
Downstream fixes:
- city_market_profile, planner_defaults grain → (country_code, city_slug)
- pseo_city_costs_de, pseo_city_pricing add city_key composite natural key
(country_slug || '-' || city_slug) to avoid URL collisions across countries
- planner_defaults join in pseo_city_costs_de uses both country_code + city_slug
- Templates updated: natural_key city_slug → city_key
Added transform/sqlmesh_padelnomics/CLAUDE.md documenting data modeling rules,
conformed dimension map, and source integration architecture.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -1,11 +1,9 @@
|
||||
-- Daily venue-level availability, pricing, occupancy, and revenue estimates.
|
||||
-- Aggregates slot-level data from stg_playtomic_availability into per-venue
|
||||
-- per-day statistics, then calculates occupancy by comparing available hours
|
||||
-- against total capacity from fct_venue_capacity.
|
||||
-- Aggregates fct_availability_slot (event-grain fact) into per-venue per-day
|
||||
-- statistics, then calculates occupancy against capacity from dim_venue_capacity.
|
||||
--
|
||||
-- Recheck-aware occupancy: for each (tenant, resource, slot_start_time),
|
||||
-- prefer the latest snapshot (recheck > morning). A slot present in morning
|
||||
-- but absent in the recheck = booked between snapshots → more accurate occupancy.
|
||||
-- Recheck-aware deduplication lives in fct_availability_slot — this model only
|
||||
-- reads the already-deduplicated best-snapshot slots.
|
||||
--
|
||||
-- Occupancy = 1 - (available_court_hours / capacity_court_hours_per_day)
|
||||
-- Revenue estimate = booked_court_hours × avg_price_of_available_slots
|
||||
@@ -19,26 +17,7 @@ MODEL (
|
||||
grain (snapshot_date, tenant_id)
|
||||
);
|
||||
|
||||
-- Prefer the latest snapshot for each slot:
|
||||
-- If a recheck exists for a (date, tenant, resource, start_time), use it.
|
||||
-- Otherwise fall back to the morning snapshot.
|
||||
WITH ranked_slots AS (
|
||||
SELECT
|
||||
a.*,
|
||||
ROW_NUMBER() OVER (
|
||||
PARTITION BY a.snapshot_date, a.tenant_id, a.resource_id, a.slot_start_time
|
||||
ORDER BY
|
||||
CASE WHEN a.snapshot_type = 'recheck' THEN 1 ELSE 2 END,
|
||||
a.captured_at_utc DESC
|
||||
) AS rn
|
||||
FROM staging.stg_playtomic_availability a
|
||||
WHERE a.price_amount IS NOT NULL
|
||||
AND a.price_amount > 0
|
||||
),
|
||||
latest_slots AS (
|
||||
SELECT * FROM ranked_slots WHERE rn = 1
|
||||
),
|
||||
slot_agg AS (
|
||||
WITH slot_agg AS (
|
||||
SELECT
|
||||
a.snapshot_date,
|
||||
a.tenant_id,
|
||||
@@ -51,19 +30,13 @@ slot_agg AS (
|
||||
ROUND(AVG(a.price_amount), 2) AS avg_price,
|
||||
MIN(a.price_amount) AS min_price,
|
||||
MAX(a.price_amount) AS max_price,
|
||||
-- Peak: 17:00–21:00
|
||||
ROUND(MEDIAN(a.price_amount) FILTER (
|
||||
WHERE a.slot_start_time::TIME >= '17:00:00'
|
||||
AND a.slot_start_time::TIME < '21:00:00'
|
||||
), 2) AS median_price_peak,
|
||||
-- Peak: 17:00–21:00 (is_peak flag computed once in fct_availability_slot)
|
||||
ROUND(MEDIAN(a.price_amount) FILTER (WHERE a.is_peak), 2) AS median_price_peak,
|
||||
-- Off-peak: everything outside 17:00–21:00
|
||||
ROUND(MEDIAN(a.price_amount) FILTER (
|
||||
WHERE a.slot_start_time::TIME < '17:00:00'
|
||||
OR a.slot_start_time::TIME >= '21:00:00'
|
||||
), 2) AS median_price_offpeak,
|
||||
ROUND(MEDIAN(a.price_amount) FILTER (WHERE NOT a.is_peak), 2) AS median_price_offpeak,
|
||||
MAX(a.price_currency) AS price_currency,
|
||||
MAX(a.captured_at_utc) AS captured_at_utc
|
||||
FROM latest_slots a
|
||||
FROM foundation.fct_availability_slot a
|
||||
GROUP BY a.snapshot_date, a.tenant_id
|
||||
)
|
||||
SELECT
|
||||
@@ -106,4 +79,4 @@ SELECT
|
||||
sa.price_currency,
|
||||
sa.captured_at_utc
|
||||
FROM slot_agg sa
|
||||
JOIN foundation.fct_venue_capacity cap ON sa.tenant_id = cap.tenant_id
|
||||
JOIN foundation.dim_venue_capacity cap ON sa.tenant_id = cap.tenant_id
|
||||
|
||||
Reference in New Issue
Block a user