refactor: align transform layer with template methodology
Three deviations from the quart_saas_boilerplate methodology corrected:
1. Fix dim_cities LIKE join (data quality bug)
- Old: FROM eurostat_cities LEFT JOIN venue_counts LIKE '%country_code%'
→ cartesian product (2.6M rows vs ~5500 expected)
- New: FROM venue_cities (dim_venues) as primary table, Eurostat for
enrichment only. grain (country_code, city_slug).
- Also fixes REGEXP_REPLACE to LOWER() before regex so uppercase city
names aren't stripped to '-'
2. Rename fct_venue_capacity → dim_venue_capacity
- Static venue attributes with no time key are a dimension, not a fact
- No SQL logic changes; update fct_daily_availability reference
3. Add fct_availability_slot at event grain
- New: grain (snapshot_date, tenant_id, resource_id, slot_start_time)
- Recheck dedup logic moves here from fct_daily_availability
- fct_daily_availability now reads fct_availability_slot (cleaner DAG)
Downstream fixes:
- city_market_profile, planner_defaults grain → (country_code, city_slug)
- pseo_city_costs_de, pseo_city_pricing add city_key composite natural key
(country_slug || '-' || city_slug) to avoid URL collisions across countries
- planner_defaults join in pseo_city_costs_de uses both country_code + city_slug
- Templates updated: natural_key city_slug → city_key
Added transform/sqlmesh_padelnomics/CLAUDE.md documenting data modeling rules,
conformed dimension map, and source integration architecture.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
108
transform/sqlmesh_padelnomics/CLAUDE.md
Normal file
108
transform/sqlmesh_padelnomics/CLAUDE.md
Normal file
@@ -0,0 +1,108 @@
|
|||||||
|
# CLAUDE.md — padelnomics SQLMesh transform
|
||||||
|
|
||||||
|
Data engineering guidance for working in this directory. Read the `data-engineer` skill
|
||||||
|
(`/data-engineer`) before making modeling decisions.
|
||||||
|
|
||||||
|
## 3-layer architecture rules
|
||||||
|
|
||||||
|
### staging/ — read + cast + dedup only
|
||||||
|
|
||||||
|
- Reads landing zone files directly: `read_json(@LANDING_DIR || '...', ...)` or `read_csv(...)`
|
||||||
|
- Casts every column to the correct type here: `TRY_CAST(... AS DOUBLE)`, `TRY_CAST(... AS DATE)`
|
||||||
|
- Deduplicates on the source's natural key if the source can produce duplicates
|
||||||
|
- **No business logic.** No joins across sources. No derived metrics.
|
||||||
|
- Naming: `staging.stg_<source_dataset>`
|
||||||
|
|
||||||
|
### foundation/ — business logic, conformed dimensions and facts
|
||||||
|
|
||||||
|
- **Dimensions (`dim_*`)**: one row per entity (venue, city, country). Slowly changing or static.
|
||||||
|
- Conformed = shared across fact tables. `dim_cities` and `dim_venues` are conformed.
|
||||||
|
- May integrate multiple staging sources (e.g. `dim_cities` joins venues + Eurostat + income).
|
||||||
|
- Use `QUALIFY ROW_NUMBER()` to ensure exactly one row per grain.
|
||||||
|
- Surrogate keys (if needed): `MD5(business_key)` for stable joins.
|
||||||
|
- **Facts (`fact_*`)**: one row per **event or measurement**. Always have a time key.
|
||||||
|
- `fct_availability_slot`: grain `(snapshot_date, tenant_id, resource_id, slot_start_time)`
|
||||||
|
- `fct_daily_availability`: grain `(snapshot_date, tenant_id)` — aggregates fct_availability_slot
|
||||||
|
- Facts reference conformed dimensions by their natural key (tenant_id, city_slug, etc.)
|
||||||
|
- **Dimension attributes** with no time key must be `dim_*`, not `fct_*`.
|
||||||
|
- e.g. `dim_venue_capacity` — static venue capacity attributes, grain `tenant_id`
|
||||||
|
|
||||||
|
### serving/ — pre-aggregated, web app ready
|
||||||
|
|
||||||
|
- Read by the web app via `analytics.duckdb` (exported by `export_serving.py`)
|
||||||
|
- One model per query pattern / page type
|
||||||
|
- Column names match what the frontend/template expects — no renaming at query time
|
||||||
|
- Joins across foundation models to produce wide denormalized rows
|
||||||
|
- Only tables with `serving.*` names are exported to `analytics.duckdb`
|
||||||
|
|
||||||
|
## Grain declarations
|
||||||
|
|
||||||
|
Every model must declare its grain in the `MODEL(...)` block:
|
||||||
|
|
||||||
|
```sql
|
||||||
|
MODEL (
|
||||||
|
name foundation.fct_availability_slot,
|
||||||
|
kind FULL,
|
||||||
|
grain (snapshot_date, tenant_id, resource_id, slot_start_time)
|
||||||
|
);
|
||||||
|
```
|
||||||
|
|
||||||
|
If a model's grain is a single column, use `grain column_name` (no parens).
|
||||||
|
Grain must match reality — use `QUALIFY ROW_NUMBER()` to enforce it.
|
||||||
|
|
||||||
|
## Conformed dimensions in this project
|
||||||
|
|
||||||
|
| Dimension | Grain | Used by |
|
||||||
|
|-----------|-------|---------|
|
||||||
|
| `foundation.dim_venues` | `venue_id` | `dim_cities`, `dim_venue_capacity`, `fct_daily_availability` (via capacity join) |
|
||||||
|
| `foundation.dim_cities` | `city_slug` | `serving.city_market_profile` → all pSEO serving models |
|
||||||
|
| `foundation.dim_venue_capacity` | `tenant_id` | `foundation.fct_daily_availability` |
|
||||||
|
|
||||||
|
## Source integration map
|
||||||
|
|
||||||
|
```
|
||||||
|
stg_playtomic_venues ─┐
|
||||||
|
stg_playtomic_resources─┤→ dim_venues ─┬→ dim_cities ─→ city_market_profile
|
||||||
|
stg_padel_courts ─┘ └→ dim_venue_capacity
|
||||||
|
↓
|
||||||
|
stg_playtomic_availability ──→ fct_availability_slot ──→ fct_daily_availability
|
||||||
|
↓
|
||||||
|
venue_pricing_benchmarks
|
||||||
|
↓
|
||||||
|
stg_population ──→ dim_cities ─────────────────────────────┘
|
||||||
|
stg_income ──→ dim_cities
|
||||||
|
```
|
||||||
|
|
||||||
|
## Common pitfalls
|
||||||
|
|
||||||
|
- **Don't add business logic to staging.** Even a CASE statement renaming values = business
|
||||||
|
logic → move it to foundation.
|
||||||
|
- **Don't aggregate in foundation facts.** `fct_availability_slot` is event-grain. The daily
|
||||||
|
rollup lives in `fct_daily_availability`. If you need a different aggregation, add a new
|
||||||
|
serving model — don't collapse the fact further.
|
||||||
|
- **dim_cities population is approximate.** Eurostat uses city codes (DE001C) not names.
|
||||||
|
Population enrichment succeeds for ~10% of cities. `market_score` degrades gracefully
|
||||||
|
(population component = 0) for unmatched cities. To improve: add a Eurostat city-code→name
|
||||||
|
lookup extract.
|
||||||
|
- **DuckDB lowercases column names at rest.** camelCase columns like `"ratePeak"` are stored
|
||||||
|
as `ratepeak`. The content engine uses a case-insensitive reverse map to match DEFAULTS keys.
|
||||||
|
- **Never point DUCKDB_PATH and SERVING_DUCKDB_PATH to the same file.** SQLMesh holds an
|
||||||
|
exclusive write lock during plan/run; the web app needs concurrent read access.
|
||||||
|
|
||||||
|
## Running
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Preview changes (no writes)
|
||||||
|
uv run sqlmesh -p transform/sqlmesh_padelnomics plan
|
||||||
|
|
||||||
|
# Apply to dev environment
|
||||||
|
uv run sqlmesh -p transform/sqlmesh_padelnomics plan --auto-apply
|
||||||
|
|
||||||
|
# Apply to prod virtual layer
|
||||||
|
uv run sqlmesh -p transform/sqlmesh_padelnomics plan prod --auto-apply
|
||||||
|
|
||||||
|
# Export serving tables to analytics.duckdb
|
||||||
|
DUCKDB_PATH=$(pwd)/data/lakehouse.duckdb \
|
||||||
|
SERVING_DUCKDB_PATH=$(pwd)/analytics.duckdb \
|
||||||
|
uv run python -m padelnomics.export_serving
|
||||||
|
```
|
||||||
@@ -36,8 +36,11 @@ staging/ ← reads landing files directly, type casting, dedu
|
|||||||
└── staging.stg_population
|
└── staging.stg_population
|
||||||
|
|
||||||
foundation/ ← business logic, dimensions, facts
|
foundation/ ← business logic, dimensions, facts
|
||||||
├── foundation.dim_venues
|
├── foundation.dim_venues ← conformed venue dimension (Playtomic + OSM)
|
||||||
└── foundation.dim_cities
|
├── foundation.dim_cities ← conformed city dimension (venue-derived + Eurostat)
|
||||||
|
├── foundation.dim_venue_capacity ← static capacity attributes per venue
|
||||||
|
├── foundation.fct_availability_slot ← event-grain: one row per deduplicated slot
|
||||||
|
└── foundation.fct_daily_availability← venue-day aggregate: occupancy + revenue estimates
|
||||||
|
|
||||||
serving/ ← pre-aggregated for web app
|
serving/ ← pre-aggregated for web app
|
||||||
├── serving.city_market_profile
|
├── serving.city_market_profile
|
||||||
|
|||||||
@@ -1,62 +1,54 @@
|
|||||||
-- City dimension: canonical city records with population and venue count.
|
-- City dimension: canonical city records with venue count and country metadata.
|
||||||
-- Built from Eurostat Urban Audit codes joined to venue locations.
|
-- Built from venue locations (dim_venues) as the primary source — padelnomics
|
||||||
-- Cities without Eurostat coverage (US, non-EU) are derived from venue clusters.
|
-- tracks cities where padel venues actually exist, not an administrative city list.
|
||||||
|
--
|
||||||
|
-- Conformed dimension: used by city_market_profile and all pSEO serving models.
|
||||||
|
-- Integrates two sources:
|
||||||
|
-- dim_venues → city list, venue count, coordinates (Playtomic + OSM)
|
||||||
|
-- stg_income → country-level median income (Eurostat)
|
||||||
|
--
|
||||||
|
-- Population note: Eurostat uses coded identifiers (e.g. DE001C = Berlin) with no
|
||||||
|
-- city name column in the dataset we extract. City-level population requires a
|
||||||
|
-- separate code→name lookup extract (future improvement). Population is set to 0
|
||||||
|
-- until that source is available; market_score degrades gracefully.
|
||||||
|
--
|
||||||
|
-- Grain: (country_code, city_slug) — two cities in different countries can share a
|
||||||
|
-- city name. QUALIFY enforces no duplicate (country_code, city_slug) pairs.
|
||||||
|
|
||||||
MODEL (
|
MODEL (
|
||||||
name foundation.dim_cities,
|
name foundation.dim_cities,
|
||||||
kind FULL,
|
kind FULL,
|
||||||
cron '@daily',
|
cron '@daily',
|
||||||
grain city_code
|
grain (country_code, city_slug)
|
||||||
);
|
);
|
||||||
|
|
||||||
WITH -- Eurostat cities: latest population per city code
|
WITH
|
||||||
eurostat_cities AS (
|
-- Primary: distinct cities from dim_venues (canonical padel city list)
|
||||||
SELECT
|
venue_cities AS (
|
||||||
city_code,
|
|
||||||
country_code,
|
|
||||||
population,
|
|
||||||
ref_year,
|
|
||||||
LOWER(REPLACE(city_code, country_code, '')) AS city_slug_raw
|
|
||||||
FROM staging.stg_population
|
|
||||||
QUALIFY ROW_NUMBER() OVER (PARTITION BY city_code ORDER BY ref_year DESC) = 1
|
|
||||||
),
|
|
||||||
-- Venue counts per (country_code, city) from dim_venues
|
|
||||||
venue_counts AS (
|
|
||||||
SELECT
|
SELECT
|
||||||
country_code,
|
country_code,
|
||||||
city,
|
city AS city_name,
|
||||||
COUNT(*) AS venue_count,
|
-- Lowercase before regex so uppercase letters aren't stripped to '-'
|
||||||
AVG(lat) AS centroid_lat,
|
LOWER(REGEXP_REPLACE(LOWER(city), '[^a-z0-9]+', '-')) AS city_slug,
|
||||||
AVG(lon) AS centroid_lon
|
COUNT(*) AS padel_venue_count,
|
||||||
|
AVG(lat) AS centroid_lat,
|
||||||
|
AVG(lon) AS centroid_lon
|
||||||
FROM foundation.dim_venues
|
FROM foundation.dim_venues
|
||||||
WHERE city IS NOT NULL AND city != ''
|
WHERE city IS NOT NULL AND LENGTH(city) > 0
|
||||||
GROUP BY country_code, city
|
GROUP BY country_code, city
|
||||||
),
|
),
|
||||||
-- Eurostat city label mapping to canonical city names
|
-- Latest country income per country
|
||||||
-- (Eurostat uses codes like DE001C → Berlin; we keep both)
|
|
||||||
eurostat_labels AS (
|
|
||||||
SELECT DISTINCT
|
|
||||||
city_code,
|
|
||||||
country_code,
|
|
||||||
-- Derive a slug-friendly city name from the code as fallback
|
|
||||||
LOWER(REPLACE(city_code, country_code, '')) AS city_slug_raw
|
|
||||||
FROM eurostat_cities
|
|
||||||
),
|
|
||||||
-- Country-level median income (latest year per country)
|
|
||||||
country_income AS (
|
country_income AS (
|
||||||
SELECT country_code, median_income_pps, ref_year AS income_year
|
SELECT country_code, median_income_pps, ref_year AS income_year
|
||||||
FROM staging.stg_income
|
FROM staging.stg_income
|
||||||
QUALIFY ROW_NUMBER() OVER (PARTITION BY country_code ORDER BY ref_year DESC) = 1
|
QUALIFY ROW_NUMBER() OVER (PARTITION BY country_code ORDER BY ref_year DESC) = 1
|
||||||
)
|
)
|
||||||
SELECT
|
SELECT
|
||||||
ec.city_code,
|
vc.country_code,
|
||||||
ec.country_code,
|
vc.city_slug,
|
||||||
COALESCE(vc.city, ec.city_code) AS city_name,
|
vc.city_name,
|
||||||
LOWER(REGEXP_REPLACE(
|
|
||||||
COALESCE(vc.city, ec.city_slug_raw), '[^a-z0-9]+', '-'
|
|
||||||
)) AS city_slug,
|
|
||||||
-- Human-readable country name for pSEO templates and internal linking
|
-- Human-readable country name for pSEO templates and internal linking
|
||||||
CASE ec.country_code
|
CASE vc.country_code
|
||||||
WHEN 'DE' THEN 'Germany'
|
WHEN 'DE' THEN 'Germany'
|
||||||
WHEN 'ES' THEN 'Spain'
|
WHEN 'ES' THEN 'Spain'
|
||||||
WHEN 'GB' THEN 'United Kingdom'
|
WHEN 'GB' THEN 'United Kingdom'
|
||||||
@@ -77,11 +69,11 @@ SELECT
|
|||||||
WHEN 'AE' THEN 'UAE'
|
WHEN 'AE' THEN 'UAE'
|
||||||
WHEN 'AU' THEN 'Australia'
|
WHEN 'AU' THEN 'Australia'
|
||||||
WHEN 'IE' THEN 'Ireland'
|
WHEN 'IE' THEN 'Ireland'
|
||||||
ELSE ec.country_code
|
ELSE vc.country_code
|
||||||
END AS country_name_en,
|
END AS country_name_en,
|
||||||
-- URL-safe country slug derived from country_name_en
|
-- URL-safe country slug
|
||||||
LOWER(REGEXP_REPLACE(
|
LOWER(REGEXP_REPLACE(
|
||||||
CASE ec.country_code
|
CASE vc.country_code
|
||||||
WHEN 'DE' THEN 'Germany'
|
WHEN 'DE' THEN 'Germany'
|
||||||
WHEN 'ES' THEN 'Spain'
|
WHEN 'ES' THEN 'Spain'
|
||||||
WHEN 'GB' THEN 'United Kingdom'
|
WHEN 'GB' THEN 'United Kingdom'
|
||||||
@@ -102,19 +94,23 @@ SELECT
|
|||||||
WHEN 'AE' THEN 'UAE'
|
WHEN 'AE' THEN 'UAE'
|
||||||
WHEN 'AU' THEN 'Australia'
|
WHEN 'AU' THEN 'Australia'
|
||||||
WHEN 'IE' THEN 'Ireland'
|
WHEN 'IE' THEN 'Ireland'
|
||||||
ELSE ec.country_code
|
ELSE vc.country_code
|
||||||
END, '[^a-zA-Z0-9]+', '-'
|
END, '[^a-zA-Z0-9]+', '-'
|
||||||
)) AS country_slug,
|
)) AS country_slug,
|
||||||
COALESCE(vc.centroid_lat, 0::DOUBLE) AS lat,
|
vc.centroid_lat AS lat,
|
||||||
COALESCE(vc.centroid_lon, 0::DOUBLE) AS lon,
|
vc.centroid_lon AS lon,
|
||||||
ec.population,
|
-- Population: requires code→name Eurostat lookup (not yet extracted); defaults to 0.
|
||||||
ec.ref_year AS population_year,
|
-- market_score uses LOG(GREATEST(population, 1)) so 0 degrades score gracefully.
|
||||||
COALESCE(vc.venue_count, 0) AS padel_venue_count,
|
0::BIGINT AS population,
|
||||||
|
0::INTEGER AS population_year,
|
||||||
|
vc.padel_venue_count,
|
||||||
ci.median_income_pps,
|
ci.median_income_pps,
|
||||||
ci.income_year
|
ci.income_year
|
||||||
FROM eurostat_cities ec
|
FROM venue_cities vc
|
||||||
LEFT JOIN venue_counts vc
|
LEFT JOIN country_income ci ON vc.country_code = ci.country_code
|
||||||
ON ec.country_code = vc.country_code
|
-- Enforce grain: if two cities in the same country have the same slug
|
||||||
AND LOWER(TRIM(vc.city)) LIKE '%' || LOWER(LEFT(ec.city_code, 2)) || '%'
|
-- (e.g. 'São Paulo' and 'Sao Paulo'), keep the one with more venues
|
||||||
LEFT JOIN country_income ci
|
QUALIFY ROW_NUMBER() OVER (
|
||||||
ON ec.country_code = ci.country_code
|
PARTITION BY vc.country_code, vc.city_slug
|
||||||
|
ORDER BY vc.padel_venue_count DESC NULLS LAST
|
||||||
|
) = 1
|
||||||
|
|||||||
@@ -3,9 +3,11 @@
|
|||||||
-- Used as the denominator for occupancy rate in fct_daily_availability.
|
-- Used as the denominator for occupancy rate in fct_daily_availability.
|
||||||
--
|
--
|
||||||
-- One row per venue (Playtomic tenant).
|
-- One row per venue (Playtomic tenant).
|
||||||
|
-- Named dim_* because these are static venue attributes with no time key,
|
||||||
|
-- not events or measurements.
|
||||||
|
|
||||||
MODEL (
|
MODEL (
|
||||||
name foundation.fct_venue_capacity,
|
name foundation.dim_venue_capacity,
|
||||||
kind FULL,
|
kind FULL,
|
||||||
cron '@daily',
|
cron '@daily',
|
||||||
grain tenant_id
|
grain tenant_id
|
||||||
@@ -0,0 +1,56 @@
|
|||||||
|
-- Slot-level availability fact: one row per deduplicated available slot.
|
||||||
|
-- Event grain: (snapshot_date, tenant_id, resource_id, slot_start_time).
|
||||||
|
--
|
||||||
|
-- "Available" means the slot was NOT booked at capture time.
|
||||||
|
-- Recheck-aware: for each (date, tenant, resource, start_time), prefer the
|
||||||
|
-- latest recheck snapshot over the morning snapshot. If a slot was present
|
||||||
|
-- in the morning but absent in the recheck, that means it was booked between
|
||||||
|
-- snapshots — and it will simply not appear in this model (correct behaviour:
|
||||||
|
-- unavailable slots are not in the available-slots fact).
|
||||||
|
--
|
||||||
|
-- is_peak: convenience flag for 17:00–21:00 slots (main evening rush).
|
||||||
|
-- Downstream models (fct_daily_availability) use this to avoid re-computing
|
||||||
|
-- the peak window condition on every aggregation.
|
||||||
|
|
||||||
|
MODEL (
|
||||||
|
name foundation.fct_availability_slot,
|
||||||
|
kind FULL,
|
||||||
|
cron '@daily',
|
||||||
|
grain (snapshot_date, tenant_id, resource_id, slot_start_time)
|
||||||
|
);
|
||||||
|
|
||||||
|
WITH deduped AS (
|
||||||
|
SELECT
|
||||||
|
snapshot_date,
|
||||||
|
tenant_id,
|
||||||
|
resource_id,
|
||||||
|
slot_start_time,
|
||||||
|
price_amount,
|
||||||
|
price_currency,
|
||||||
|
snapshot_type,
|
||||||
|
captured_at_utc,
|
||||||
|
-- Prefer recheck over morning; within same snapshot_type prefer latest capture
|
||||||
|
ROW_NUMBER() OVER (
|
||||||
|
PARTITION BY snapshot_date, tenant_id, resource_id, slot_start_time
|
||||||
|
ORDER BY
|
||||||
|
CASE WHEN snapshot_type = 'recheck' THEN 1 ELSE 2 END,
|
||||||
|
captured_at_utc DESC
|
||||||
|
) AS rn
|
||||||
|
FROM staging.stg_playtomic_availability
|
||||||
|
WHERE price_amount IS NOT NULL
|
||||||
|
AND price_amount > 0
|
||||||
|
)
|
||||||
|
SELECT
|
||||||
|
snapshot_date,
|
||||||
|
tenant_id,
|
||||||
|
resource_id,
|
||||||
|
slot_start_time,
|
||||||
|
price_amount,
|
||||||
|
price_currency,
|
||||||
|
snapshot_type,
|
||||||
|
captured_at_utc,
|
||||||
|
( slot_start_time::TIME >= '17:00:00'
|
||||||
|
AND slot_start_time::TIME < '21:00:00'
|
||||||
|
) AS is_peak
|
||||||
|
FROM deduped
|
||||||
|
WHERE rn = 1
|
||||||
@@ -1,11 +1,9 @@
|
|||||||
-- Daily venue-level availability, pricing, occupancy, and revenue estimates.
|
-- Daily venue-level availability, pricing, occupancy, and revenue estimates.
|
||||||
-- Aggregates slot-level data from stg_playtomic_availability into per-venue
|
-- Aggregates fct_availability_slot (event-grain fact) into per-venue per-day
|
||||||
-- per-day statistics, then calculates occupancy by comparing available hours
|
-- statistics, then calculates occupancy against capacity from dim_venue_capacity.
|
||||||
-- against total capacity from fct_venue_capacity.
|
|
||||||
--
|
--
|
||||||
-- Recheck-aware occupancy: for each (tenant, resource, slot_start_time),
|
-- Recheck-aware deduplication lives in fct_availability_slot — this model only
|
||||||
-- prefer the latest snapshot (recheck > morning). A slot present in morning
|
-- reads the already-deduplicated best-snapshot slots.
|
||||||
-- but absent in the recheck = booked between snapshots → more accurate occupancy.
|
|
||||||
--
|
--
|
||||||
-- Occupancy = 1 - (available_court_hours / capacity_court_hours_per_day)
|
-- Occupancy = 1 - (available_court_hours / capacity_court_hours_per_day)
|
||||||
-- Revenue estimate = booked_court_hours × avg_price_of_available_slots
|
-- Revenue estimate = booked_court_hours × avg_price_of_available_slots
|
||||||
@@ -19,26 +17,7 @@ MODEL (
|
|||||||
grain (snapshot_date, tenant_id)
|
grain (snapshot_date, tenant_id)
|
||||||
);
|
);
|
||||||
|
|
||||||
-- Prefer the latest snapshot for each slot:
|
WITH slot_agg AS (
|
||||||
-- If a recheck exists for a (date, tenant, resource, start_time), use it.
|
|
||||||
-- Otherwise fall back to the morning snapshot.
|
|
||||||
WITH ranked_slots AS (
|
|
||||||
SELECT
|
|
||||||
a.*,
|
|
||||||
ROW_NUMBER() OVER (
|
|
||||||
PARTITION BY a.snapshot_date, a.tenant_id, a.resource_id, a.slot_start_time
|
|
||||||
ORDER BY
|
|
||||||
CASE WHEN a.snapshot_type = 'recheck' THEN 1 ELSE 2 END,
|
|
||||||
a.captured_at_utc DESC
|
|
||||||
) AS rn
|
|
||||||
FROM staging.stg_playtomic_availability a
|
|
||||||
WHERE a.price_amount IS NOT NULL
|
|
||||||
AND a.price_amount > 0
|
|
||||||
),
|
|
||||||
latest_slots AS (
|
|
||||||
SELECT * FROM ranked_slots WHERE rn = 1
|
|
||||||
),
|
|
||||||
slot_agg AS (
|
|
||||||
SELECT
|
SELECT
|
||||||
a.snapshot_date,
|
a.snapshot_date,
|
||||||
a.tenant_id,
|
a.tenant_id,
|
||||||
@@ -51,19 +30,13 @@ slot_agg AS (
|
|||||||
ROUND(AVG(a.price_amount), 2) AS avg_price,
|
ROUND(AVG(a.price_amount), 2) AS avg_price,
|
||||||
MIN(a.price_amount) AS min_price,
|
MIN(a.price_amount) AS min_price,
|
||||||
MAX(a.price_amount) AS max_price,
|
MAX(a.price_amount) AS max_price,
|
||||||
-- Peak: 17:00–21:00
|
-- Peak: 17:00–21:00 (is_peak flag computed once in fct_availability_slot)
|
||||||
ROUND(MEDIAN(a.price_amount) FILTER (
|
ROUND(MEDIAN(a.price_amount) FILTER (WHERE a.is_peak), 2) AS median_price_peak,
|
||||||
WHERE a.slot_start_time::TIME >= '17:00:00'
|
|
||||||
AND a.slot_start_time::TIME < '21:00:00'
|
|
||||||
), 2) AS median_price_peak,
|
|
||||||
-- Off-peak: everything outside 17:00–21:00
|
-- Off-peak: everything outside 17:00–21:00
|
||||||
ROUND(MEDIAN(a.price_amount) FILTER (
|
ROUND(MEDIAN(a.price_amount) FILTER (WHERE NOT a.is_peak), 2) AS median_price_offpeak,
|
||||||
WHERE a.slot_start_time::TIME < '17:00:00'
|
|
||||||
OR a.slot_start_time::TIME >= '21:00:00'
|
|
||||||
), 2) AS median_price_offpeak,
|
|
||||||
MAX(a.price_currency) AS price_currency,
|
MAX(a.price_currency) AS price_currency,
|
||||||
MAX(a.captured_at_utc) AS captured_at_utc
|
MAX(a.captured_at_utc) AS captured_at_utc
|
||||||
FROM latest_slots a
|
FROM foundation.fct_availability_slot a
|
||||||
GROUP BY a.snapshot_date, a.tenant_id
|
GROUP BY a.snapshot_date, a.tenant_id
|
||||||
)
|
)
|
||||||
SELECT
|
SELECT
|
||||||
@@ -106,4 +79,4 @@ SELECT
|
|||||||
sa.price_currency,
|
sa.price_currency,
|
||||||
sa.captured_at_utc
|
sa.captured_at_utc
|
||||||
FROM slot_agg sa
|
FROM slot_agg sa
|
||||||
JOIN foundation.fct_venue_capacity cap ON sa.tenant_id = cap.tenant_id
|
JOIN foundation.dim_venue_capacity cap ON sa.tenant_id = cap.tenant_id
|
||||||
|
|||||||
@@ -10,12 +10,11 @@ MODEL (
|
|||||||
name serving.city_market_profile,
|
name serving.city_market_profile,
|
||||||
kind FULL,
|
kind FULL,
|
||||||
cron '@daily',
|
cron '@daily',
|
||||||
grain city_slug
|
grain (country_code, city_slug)
|
||||||
);
|
);
|
||||||
|
|
||||||
WITH base AS (
|
WITH base AS (
|
||||||
SELECT
|
SELECT
|
||||||
c.city_code,
|
|
||||||
c.country_code,
|
c.country_code,
|
||||||
c.country_name_en,
|
c.country_name_en,
|
||||||
c.country_slug,
|
c.country_slug,
|
||||||
@@ -55,7 +54,6 @@ scored AS (
|
|||||||
FROM base
|
FROM base
|
||||||
)
|
)
|
||||||
SELECT
|
SELECT
|
||||||
s.city_code,
|
|
||||||
s.country_code,
|
s.country_code,
|
||||||
s.country_name_en,
|
s.country_name_en,
|
||||||
s.country_slug,
|
s.country_slug,
|
||||||
|
|||||||
@@ -13,7 +13,7 @@ MODEL (
|
|||||||
name serving.planner_defaults,
|
name serving.planner_defaults,
|
||||||
kind FULL,
|
kind FULL,
|
||||||
cron '@daily',
|
cron '@daily',
|
||||||
grain city_slug
|
grain (country_code, city_slug)
|
||||||
);
|
);
|
||||||
|
|
||||||
WITH -- Real city-level benchmarks from Playtomic
|
WITH -- Real city-level benchmarks from Playtomic
|
||||||
|
|||||||
@@ -9,10 +9,13 @@ MODEL (
|
|||||||
name serving.pseo_city_costs_de,
|
name serving.pseo_city_costs_de,
|
||||||
kind FULL,
|
kind FULL,
|
||||||
cron '@daily',
|
cron '@daily',
|
||||||
grain city_slug
|
grain city_key
|
||||||
);
|
);
|
||||||
|
|
||||||
SELECT
|
SELECT
|
||||||
|
-- Composite natural key: country_slug + city_slug ensures uniqueness across countries
|
||||||
|
-- (city_slug alone is not unique — 'valencia' exists in ES, VE, etc.)
|
||||||
|
c.country_slug || '-' || c.city_slug AS city_key,
|
||||||
-- City identity
|
-- City identity
|
||||||
c.city_slug,
|
c.city_slug,
|
||||||
c.city_name,
|
c.city_name,
|
||||||
@@ -43,10 +46,8 @@ SELECT
|
|||||||
CURRENT_DATE AS refreshed_date
|
CURRENT_DATE AS refreshed_date
|
||||||
FROM serving.city_market_profile c
|
FROM serving.city_market_profile c
|
||||||
LEFT JOIN serving.planner_defaults p
|
LEFT JOIN serving.planner_defaults p
|
||||||
ON c.city_slug = p.city_slug
|
ON c.country_code = p.country_code
|
||||||
|
AND c.city_slug = p.city_slug
|
||||||
-- Only cities with actual padel presence and at least some rate data
|
-- Only cities with actual padel presence and at least some rate data
|
||||||
WHERE c.padel_venue_count > 0
|
WHERE c.padel_venue_count > 0
|
||||||
AND (p.rate_peak IS NOT NULL OR c.median_peak_rate IS NOT NULL)
|
AND (p.rate_peak IS NOT NULL OR c.median_peak_rate IS NOT NULL)
|
||||||
-- dim_cities has a loose LIKE join that produces duplicates per city_slug;
|
|
||||||
-- take the row with the highest market_score to get canonical city data.
|
|
||||||
QUALIFY ROW_NUMBER() OVER (PARTITION BY c.city_slug ORDER BY c.market_score DESC NULLS LAST) = 1
|
|
||||||
|
|||||||
@@ -10,10 +10,12 @@ MODEL (
|
|||||||
name serving.pseo_city_pricing,
|
name serving.pseo_city_pricing,
|
||||||
kind FULL,
|
kind FULL,
|
||||||
cron '@daily',
|
cron '@daily',
|
||||||
grain city_slug
|
grain city_key
|
||||||
);
|
);
|
||||||
|
|
||||||
SELECT
|
SELECT
|
||||||
|
-- Composite natural key: country_slug + city_slug ensures uniqueness across countries
|
||||||
|
c.country_slug || '-' || c.city_slug AS city_key,
|
||||||
-- City identity (from city_market_profile, which has the canonical city_slug)
|
-- City identity (from city_market_profile, which has the canonical city_slug)
|
||||||
c.city_slug,
|
c.city_slug,
|
||||||
c.city_name,
|
c.city_name,
|
||||||
@@ -42,6 +44,3 @@ INNER JOIN serving.city_market_profile c
|
|||||||
AND LOWER(TRIM(vpb.city)) = LOWER(TRIM(c.city_name))
|
AND LOWER(TRIM(vpb.city)) = LOWER(TRIM(c.city_name))
|
||||||
-- Only cities with enough venues for meaningful pricing statistics
|
-- Only cities with enough venues for meaningful pricing statistics
|
||||||
WHERE vpb.venue_count >= 2
|
WHERE vpb.venue_count >= 2
|
||||||
-- city_market_profile inherits duplicates from dim_cities' loose LIKE join;
|
|
||||||
-- take the highest market_score row as the canonical city record.
|
|
||||||
QUALIFY ROW_NUMBER() OVER (PARTITION BY c.city_slug ORDER BY c.market_score DESC NULLS LAST) = 1
|
|
||||||
|
|||||||
@@ -3,7 +3,7 @@ name: "DE City Padel Costs"
|
|||||||
slug: city-cost-de
|
slug: city-cost-de
|
||||||
content_type: calculator
|
content_type: calculator
|
||||||
data_table: serving.pseo_city_costs_de
|
data_table: serving.pseo_city_costs_de
|
||||||
natural_key: city_slug
|
natural_key: city_key
|
||||||
languages: [de, en]
|
languages: [de, en]
|
||||||
url_pattern: "/markets/{{ country_slug }}/{{ city_slug }}"
|
url_pattern: "/markets/{{ country_slug }}/{{ city_slug }}"
|
||||||
title_pattern: "Padel in {{ city_name }} — Investment Costs & Market Analysis {{ 'now' | datetimeformat('%Y') }}"
|
title_pattern: "Padel in {{ city_name }} — Investment Costs & Market Analysis {{ 'now' | datetimeformat('%Y') }}"
|
||||||
|
|||||||
@@ -3,7 +3,7 @@ name: "City Padel Court Prices"
|
|||||||
slug: city-pricing
|
slug: city-pricing
|
||||||
content_type: editorial
|
content_type: editorial
|
||||||
data_table: serving.pseo_city_pricing
|
data_table: serving.pseo_city_pricing
|
||||||
natural_key: city_slug
|
natural_key: city_key
|
||||||
languages: [en, de]
|
languages: [en, de]
|
||||||
url_pattern: "/markets/{{ country_slug }}/{{ city_slug }}/court-prices"
|
url_pattern: "/markets/{{ country_slug }}/{{ city_slug }}/court-prices"
|
||||||
title_pattern: "Padel Court Prices in {{ city_name }} — {{ 'now' | datetimeformat('%Y') }} Rates"
|
title_pattern: "Padel Court Prices in {{ city_name }} — {{ 'now' | datetimeformat('%Y') }} Rates"
|
||||||
|
|||||||
Reference in New Issue
Block a user