Files
padelnomics/transform/sqlmesh_padelnomics/models/foundation/fct_daily_availability.sql
Deeman ebba46f700 refactor: align transform layer with template methodology
Three deviations from the quart_saas_boilerplate methodology corrected:

1. Fix dim_cities LIKE join (data quality bug)
   - Old: FROM eurostat_cities LEFT JOIN venue_counts LIKE '%country_code%'
     → cartesian product (2.6M rows vs ~5500 expected)
   - New: FROM venue_cities (dim_venues) as primary table, Eurostat for
     enrichment only. grain (country_code, city_slug).
   - Also fixes REGEXP_REPLACE to LOWER() before regex so uppercase city
     names aren't stripped to '-'

2. Rename fct_venue_capacity → dim_venue_capacity
   - Static venue attributes with no time key are a dimension, not a fact
   - No SQL logic changes; update fct_daily_availability reference

3. Add fct_availability_slot at event grain
   - New: grain (snapshot_date, tenant_id, resource_id, slot_start_time)
   - Recheck dedup logic moves here from fct_daily_availability
   - fct_daily_availability now reads fct_availability_slot (cleaner DAG)

Downstream fixes:
- city_market_profile, planner_defaults grain → (country_code, city_slug)
- pseo_city_costs_de, pseo_city_pricing add city_key composite natural key
  (country_slug || '-' || city_slug) to avoid URL collisions across countries
- planner_defaults join in pseo_city_costs_de uses both country_code + city_slug
- Templates updated: natural_key city_slug → city_key

Added transform/sqlmesh_padelnomics/CLAUDE.md documenting data modeling rules,
conformed dimension map, and source integration architecture.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 21:17:04 +01:00

83 lines
3.2 KiB
SQL
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
-- Daily venue-level availability, pricing, occupancy, and revenue estimates.
-- Aggregates fct_availability_slot (event-grain fact) into per-venue per-day
-- statistics, then calculates occupancy against capacity from dim_venue_capacity.
--
-- Recheck-aware deduplication lives in fct_availability_slot — this model only
-- reads the already-deduplicated best-snapshot slots.
--
-- Occupancy = 1 - (available_court_hours / capacity_court_hours_per_day)
-- Revenue estimate = booked_court_hours × avg_price_of_available_slots
--
-- Peak hours defined as 17:0021:00 (captures main evening rush across markets).
MODEL (
name foundation.fct_daily_availability,
kind FULL,
cron '@daily',
grain (snapshot_date, tenant_id)
);
WITH slot_agg AS (
SELECT
a.snapshot_date,
a.tenant_id,
COUNT(*) AS available_slot_count,
COUNT(DISTINCT a.resource_id) AS courts_with_availability,
-- Each available start_time represents a 60-min bookable window
ROUND(COUNT(*) * 1.0, 2) AS available_court_hours,
-- Pricing stats (60-min slots only)
ROUND(MEDIAN(a.price_amount), 2) AS median_price,
ROUND(AVG(a.price_amount), 2) AS avg_price,
MIN(a.price_amount) AS min_price,
MAX(a.price_amount) AS max_price,
-- Peak: 17:0021:00 (is_peak flag computed once in fct_availability_slot)
ROUND(MEDIAN(a.price_amount) FILTER (WHERE a.is_peak), 2) AS median_price_peak,
-- Off-peak: everything outside 17:0021:00
ROUND(MEDIAN(a.price_amount) FILTER (WHERE NOT a.is_peak), 2) AS median_price_offpeak,
MAX(a.price_currency) AS price_currency,
MAX(a.captured_at_utc) AS captured_at_utc
FROM foundation.fct_availability_slot a
GROUP BY a.snapshot_date, a.tenant_id
)
SELECT
sa.snapshot_date,
sa.tenant_id,
cap.country_code,
cap.city,
cap.active_court_count,
cap.capacity_court_hours_per_day,
sa.available_slot_count,
sa.courts_with_availability,
sa.available_court_hours,
-- Occupancy: (capacity - available) / capacity
CASE
WHEN cap.capacity_court_hours_per_day > 0
THEN ROUND(
1.0 - (sa.available_court_hours / cap.capacity_court_hours_per_day),
4
)
ELSE NULL
END AS occupancy_rate,
-- Estimated booked court-hours
ROUND(
GREATEST(cap.capacity_court_hours_per_day - sa.available_court_hours, 0),
2
) AS booked_court_hours,
-- Estimated daily revenue: booked hours × avg price
ROUND(
GREATEST(cap.capacity_court_hours_per_day - sa.available_court_hours, 0)
* sa.avg_price,
2
) AS estimated_revenue_eur,
-- Pricing
sa.median_price,
sa.avg_price,
sa.min_price,
sa.max_price,
sa.median_price_peak,
sa.median_price_offpeak,
sa.price_currency,
sa.captured_at_utc
FROM slot_agg sa
JOIN foundation.dim_venue_capacity cap ON sa.tenant_id = cap.tenant_id