Merge branch 'worktree-extraction-overhaul'

# Conflicts:
#	transform/sqlmesh_padelnomics/models/foundation/dim_cities.sql
#	transform/sqlmesh_padelnomics/models/staging/stg_playtomic_venues.sql
Deeman
2026-02-23 01:01:26 +01:00
24 changed files with 1326 additions and 322 deletions

@@ -6,6 +6,53 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
## [Unreleased]
### Added
- **Playtomic full data extraction** — expanded venue bounding boxes from 4 regions
(ES, UK, DE, FR) to 23 globally (Italy, Portugal, NL, BE, AT, CH, Nordics, Mexico,
Argentina, Middle East, USA); PAGE_SIZE increased from 20 to 100; availability
extractor throttle reduced from 2s to 1s for ~4.5h runtime at 16K venues
- **Playtomic pricing & occupancy pipeline** — 3 new staging models:
`stg_playtomic_resources` (per-court: indoor/outdoor, surface type, size),
`stg_playtomic_opening_hours` (per-day: open/close times, hours_open),
`stg_playtomic_availability` (per-slot: 60-min bookable windows with real prices);
`stg_playtomic_venues` rewritten to extract all metadata (opening_hours, resources,
VAT rate, currency, timezone, booking settings)
- **Venue capacity & daily availability fact tables** — `fct_venue_capacity` derives
total bookable court-hours from court_count × opening_hours; `fct_daily_availability`
calculates occupancy rate (1 - available/capacity), booked hours, revenue estimate,
and pricing stats (median/peak/offpeak) per venue per day
- **Venue pricing benchmarks** — `venue_pricing_benchmarks.sql` aggregates last-30-day
venue metrics to city/country level: median hourly rate, peak/offpeak rates, P25/P75,
occupancy rate, estimated daily revenue, court count
- **Real data planner defaults** — `planner_defaults.sql` rewritten with 3-tier cascade:
city-level Playtomic data → country median → hardcoded fallback; replaces income-factor
estimation with actual market pricing; includes `data_source` and `data_confidence`
provenance columns
- **Eurostat income integration** (`stg_income.sql`) — staging model reads `ilc_di03`
(median equivalised net income in PPS) from landing zone; grain `(country_code, ref_year)`
- **Income columns in dim_cities and city_market_profile** — `median_income_pps`
and `income_year` passed through from staging to serving layer
- **Transactional email i18n** — all 8 email types now translated via locale
files; `_t()` helper in `worker.py` looks up `email_*` keys from `en.json` /
`de.json`; `_email_wrap()` accepts `lang` parameter for `<html lang>` tag and
translated footer; ~70 new translation keys (EN + DE); all task payloads now
carry `lang` from request context at enqueue time; payloads without `lang`
gracefully default to English
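
The capacity and occupancy definitions in the fact-table entry above (court_count × opening hours, and 1 - available/capacity) can be illustrated with a small Python sketch. This is not the project's SQL; the class, the function names, and the `available_hours` field are hypothetical and only mirror the arithmetic the entry describes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VenueDay:
    """One venue on one day. Field names are illustrative assumptions."""
    court_count: int        # number of courts at the venue
    hours_open: float       # opening hours on that day
    available_hours: float  # court-hours still bookable (hypothetical field)


def capacity_hours(day: VenueDay) -> float:
    # Total bookable court-hours: court_count x hours_open
    return day.court_count * day.hours_open


def occupancy_rate(day: VenueDay) -> Optional[float]:
    # Occupancy = 1 - available / capacity; undefined when capacity is zero
    capacity = capacity_hours(day)
    if capacity <= 0:
        return None
    return 1 - day.available_hours / capacity


def booked_hours(day: VenueDay) -> float:
    # Court-hours that are no longer available for booking
    return capacity_hours(day) - day.available_hours


day = VenueDay(court_count=4, hours_open=14.0, available_hours=21.0)
print(capacity_hours(day), occupancy_rate(day), booked_hours(day))
```

Returning `None` for a venue with zero capacity (for example, a closed day) is a design choice of this sketch, so that such days do not register as 0% or 100% occupied.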
### Changed
- **Resend audiences restructured** — replaced dynamic `waitlist-{blueprint}`
audience naming (up to 4 audiences) with 3 named audiences fitting free plan
limit: `suppliers` (supplier signups), `leads` (planner/quote users),
`newsletter` (auth/content/public catch-all); new `_audience_for_blueprint()`
mapping function in `core.py`
- **dim_venues enhanced** — now includes court_count, indoor/outdoor split,
timezone, VAT rate, and default currency from Playtomic venue metadata
- **city_market_profile enhanced** — includes median hourly rate, occupancy rate,
daily revenue estimate, and price currency from venue pricing benchmarks
- **Planner API route** — col_map updated to match new planner_defaults columns
(`rate_peak`, `rate_off_peak`, `avg_utilisation_pct`, `courts_typical`); adds
`_dataSource` and `_currency` metadata keys
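
The 3-tier cascade behind the planner defaults (city-level data, then country median, then hardcoded fallback) could look roughly like the Python sketch below. The actual model is SQL and is not shown in this diff; the function name, the fallback value, and the `data_source` labels are assumptions made for illustration.

```python
from typing import Optional, Tuple

# Placeholder fallback, NOT a value taken from the project
HARDCODED_RATE_PEAK = 40.0


def resolve_rate_peak(
    city_rate: Optional[float],
    country_median: Optional[float],
) -> Tuple[float, str]:
    """Return (rate, data_source) using the first tier that has data."""
    if city_rate is not None:
        return city_rate, "city"            # tier 1: city-level market data
    if country_median is not None:
        return country_median, "country"    # tier 2: country median
    return HARDCODED_RATE_PEAK, "fallback"  # tier 3: hardcoded default


print(resolve_rate_peak(32.0, 28.0))
print(resolve_rate_peak(None, 28.0))
print(resolve_rate_peak(None, None))
```

Returning the tier name alongside the value is what makes a provenance column such as `data_source` possible: each default carries a record of which tier produced it.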
### Changed
- **Extraction: one file per source** — replaced monolithic `execute.py` with per-source
modules (`overpass.py`, `eurostat.py`, `playtomic_tenants.py`, `playtomic_availability.py`);