Three deviations from the quart_saas_boilerplate methodology corrected:
1. Fix dim_cities LIKE join (data quality bug)
- Old: FROM eurostat_cities LEFT JOIN venue_counts LIKE '%country_code%'
→ cartesian product (2.6M rows vs ~5500 expected)
- New: FROM venue_cities (dim_venues) as primary table, Eurostat for
enrichment only. grain (country_code, city_slug).
- Also fixes REGEXP_REPLACE to LOWER() before regex so uppercase city
names aren't stripped to '-'
2. Rename fct_venue_capacity → dim_venue_capacity
- Static venue attributes with no time key are a dimension, not a fact
- No SQL logic changes; update fct_daily_availability reference
3. Add fct_availability_slot at event grain
- New: grain (snapshot_date, tenant_id, resource_id, slot_start_time)
- Recheck dedup logic moves here from fct_daily_availability
- fct_daily_availability now reads fct_availability_slot (cleaner DAG)
Downstream fixes:
- city_market_profile, planner_defaults grain → (country_code, city_slug)
- pseo_city_costs_de, pseo_city_pricing add city_key composite natural key
(country_slug || '-' || city_slug) to avoid URL collisions across countries
- planner_defaults join in pseo_city_costs_de uses both country_code + city_slug
- Templates updated: natural_key city_slug → city_key
Added transform/sqlmesh_padelnomics/CLAUDE.md documenting data modeling rules,
conformed dimension map, and source integration architecture.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
57 lines
1.8 KiB
SQL
57 lines
1.8 KiB
SQL
-- Slot-level availability fact: one row per deduplicated available slot.
|
||
-- Event grain: (snapshot_date, tenant_id, resource_id, slot_start_time).
|
||
--
|
||
-- "Available" means the slot was NOT booked at capture time.
|
||
-- Recheck-aware: for each (date, tenant, resource, start_time), prefer the
|
||
-- latest recheck snapshot over the morning snapshot. If a slot was present
|
||
-- in the morning but absent in the recheck, that means it was booked between
|
||
-- snapshots — and it will simply not appear in this model (correct behaviour:
|
||
-- unavailable slots are not in the available-slots fact).
|
||
--
|
||
-- is_peak: convenience flag for 17:00–21:00 slots (main evening rush).
|
||
-- Downstream models (fct_daily_availability) use this to avoid re-computing
|
||
-- the peak window condition on every aggregation.
|
||
|
||
MODEL (
|
||
name foundation.fct_availability_slot,
|
||
kind FULL,
|
||
cron '@daily',
|
||
grain (snapshot_date, tenant_id, resource_id, slot_start_time)
|
||
);
|
||
|
||
WITH deduped AS (
|
||
SELECT
|
||
snapshot_date,
|
||
tenant_id,
|
||
resource_id,
|
||
slot_start_time,
|
||
price_amount,
|
||
price_currency,
|
||
snapshot_type,
|
||
captured_at_utc,
|
||
-- Prefer recheck over morning; within same snapshot_type prefer latest capture
|
||
ROW_NUMBER() OVER (
|
||
PARTITION BY snapshot_date, tenant_id, resource_id, slot_start_time
|
||
ORDER BY
|
||
CASE WHEN snapshot_type = 'recheck' THEN 1 ELSE 2 END,
|
||
captured_at_utc DESC
|
||
) AS rn
|
||
FROM staging.stg_playtomic_availability
|
||
WHERE price_amount IS NOT NULL
|
||
AND price_amount > 0
|
||
)
|
||
SELECT
|
||
snapshot_date,
|
||
tenant_id,
|
||
resource_id,
|
||
slot_start_time,
|
||
price_amount,
|
||
price_currency,
|
||
snapshot_type,
|
||
captured_at_utc,
|
||
( slot_start_time::TIME >= '17:00:00'
|
||
AND slot_start_time::TIME < '21:00:00'
|
||
) AS is_peak
|
||
FROM deduped
|
||
WHERE rn = 1
|