Files
padelnomics/transform/sqlmesh_padelnomics/models/serving/venue_pricing_benchmarks.sql
Deeman 2f47d1e589 fix(pipeline): make availability chain incremental + fix supervisor
Convert the availability chain (stg_playtomic_availability →
fct_availability_slot → fct_daily_availability) from FULL to
INCREMENTAL_BY_TIME_RANGE so sqlmesh run processes only new daily
intervals instead of re-reading all files.

Supervisor changes:
- run_transform(): plan prod --auto-apply → run prod (evaluates
  missing cron intervals, picks up new data)
- git_pull_and_sync(): add plan prod --auto-apply before re-exec
  so model code changes are applied on deploy
- supervisor.sh: same plan → run change

Staging model uses a date-scoped glob (@start_ds) to read only
the current interval's files. snapshot_date cast to DATE (was
VARCHAR) as required by time_column.

Clean up redundant TRY_CAST(snapshot_date AS DATE) in
venue_pricing_benchmarks since it's already DATE from foundation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 21:34:02 +01:00

60 lines
2.6 KiB
SQL

-- Per-city pricing and occupancy benchmarks from Playtomic availability data.
-- Aggregates venue-level daily metrics (last 30 days) into city-level benchmarks.
-- Consumed by: planner defaults (pre-fill), city market profile, SEO articles.
--
-- Minimum data threshold: venues with >= 3 days of observations.
MODEL (
name serving.venue_pricing_benchmarks,
kind FULL,
cron '@daily',
grain (country_code, city)
);
WITH venue_stats AS (
-- Aggregate last 30 days per venue
SELECT
da.tenant_id,
da.country_code,
da.city,
da.city_slug,
da.price_currency,
AVG(da.occupancy_rate) AS avg_occupancy_rate,
MEDIAN(da.median_price) AS median_hourly_rate,
MEDIAN(da.median_price_peak) AS median_peak_rate,
MEDIAN(da.median_price_offpeak) AS median_offpeak_rate,
AVG(da.estimated_revenue_eur) AS avg_daily_revenue,
MAX(da.active_court_count) AS court_count,
COUNT(DISTINCT da.snapshot_date) AS days_observed
FROM foundation.fct_daily_availability da
WHERE da.snapshot_date >= CURRENT_DATE - INTERVAL '30 days'
AND da.occupancy_rate IS NOT NULL
AND da.occupancy_rate BETWEEN 0 AND 1.5
GROUP BY da.tenant_id, da.country_code, da.city, da.city_slug, da.price_currency
HAVING COUNT(DISTINCT da.snapshot_date) >= 3
)
SELECT
country_code,
city,
city_slug,
price_currency,
COUNT(*) AS venue_count,
-- Pricing benchmarks
ROUND(MEDIAN(median_hourly_rate), 2) AS median_hourly_rate,
ROUND(MEDIAN(median_peak_rate), 2) AS median_peak_rate,
ROUND(MEDIAN(median_offpeak_rate), 2) AS median_offpeak_rate,
ROUND(PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY median_hourly_rate), 2) AS hourly_rate_p25,
ROUND(PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY median_hourly_rate), 2) AS hourly_rate_p75,
-- Occupancy benchmarks
ROUND(MEDIAN(avg_occupancy_rate), 4) AS median_occupancy_rate,
ROUND(AVG(avg_occupancy_rate), 4) AS avg_occupancy_rate,
-- Revenue benchmarks (per venue per day)
ROUND(MEDIAN(avg_daily_revenue), 2) AS median_daily_revenue_per_venue,
-- Court mix
ROUND(MEDIAN(court_count), 0)::INTEGER AS median_court_count,
-- Data quality
SUM(days_observed) AS total_venue_days_observed,
CURRENT_DATE AS refreshed_date
FROM venue_stats
GROUP BY country_code, city, city_slug, price_currency