Files
padelnomics/transform/sqlmesh_padelnomics/models/foundation/fct_availability_slot.sql
Deeman ebba46f700 refactor: align transform layer with template methodology
Three deviations from the quart_saas_boilerplate methodology corrected:

1. Fix dim_cities LIKE join (data quality bug)
   - Old: FROM eurostat_cities LEFT JOIN venue_counts LIKE '%country_code%'
     → cartesian product (2.6M rows vs ~5500 expected)
   - New: FROM venue_cities (dim_venues) as primary table, Eurostat for
     enrichment only. grain (country_code, city_slug).
   - Also fixes REGEXP_REPLACE to LOWER() before regex so uppercase city
     names aren't stripped to '-'

2. Rename fct_venue_capacity → dim_venue_capacity
   - Static venue attributes with no time key are a dimension, not a fact
   - No SQL logic changes; update fct_daily_availability reference

3. Add fct_availability_slot at event grain
   - New: grain (snapshot_date, tenant_id, resource_id, slot_start_time)
   - Recheck dedup logic moves here from fct_daily_availability
   - fct_daily_availability now reads fct_availability_slot (cleaner DAG)

Downstream fixes:
- city_market_profile, planner_defaults grain → (country_code, city_slug)
- pseo_city_costs_de, pseo_city_pricing add city_key composite natural key
  (country_slug || '-' || city_slug) to avoid URL collisions across countries
- planner_defaults join in pseo_city_costs_de uses both country_code + city_slug
- Templates updated: natural_key city_slug → city_key

Added transform/sqlmesh_padelnomics/CLAUDE.md documenting data modeling rules,
conformed dimension map, and source integration architecture.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 21:17:04 +01:00

57 lines
1.8 KiB
SQL
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
-- Slot-level availability fact: one row per deduplicated available slot.
-- Event grain: (snapshot_date, tenant_id, resource_id, slot_start_time).
--
-- "Available" means the slot was NOT booked at capture time.
-- Recheck-aware: for each (date, tenant, resource, start_time), prefer the
-- latest recheck snapshot over the morning snapshot. If a slot was present
-- in the morning but absent in the recheck, that means it was booked between
-- snapshots — and it will simply not appear in this model (correct behaviour:
-- unavailable slots are not in the available-slots fact).
--
-- is_peak: convenience flag for 17:0021:00 slots (main evening rush).
-- Downstream models (fct_daily_availability) use this to avoid re-computing
-- the peak window condition on every aggregation.
MODEL (
name foundation.fct_availability_slot,
kind FULL,
cron '@daily',
grain (snapshot_date, tenant_id, resource_id, slot_start_time)
);
WITH deduped AS (
SELECT
snapshot_date,
tenant_id,
resource_id,
slot_start_time,
price_amount,
price_currency,
snapshot_type,
captured_at_utc,
-- Prefer recheck over morning; within same snapshot_type prefer latest capture
ROW_NUMBER() OVER (
PARTITION BY snapshot_date, tenant_id, resource_id, slot_start_time
ORDER BY
CASE WHEN snapshot_type = 'recheck' THEN 1 ELSE 2 END,
captured_at_utc DESC
) AS rn
FROM staging.stg_playtomic_availability
WHERE price_amount IS NOT NULL
AND price_amount > 0
)
SELECT
snapshot_date,
tenant_id,
resource_id,
slot_start_time,
price_amount,
price_currency,
snapshot_type,
captured_at_utc,
( slot_start_time::TIME >= '17:00:00'
AND slot_start_time::TIME < '21:00:00'
) AS is_peak
FROM deduped
WHERE rn = 1