padelnomics/transform/sqlmesh_padelnomics/models/serving/pseo_country_overview.sql
Deeman 544891611f
feat(transform): opportunity score v4 — market validation + population-weighted aggregation
Two targeted fixes for inflated country scores (ES 83, SE 77):

1. pseo_country_overview: replace AVG() with population-weighted averages
   for avg_opportunity_score and avg_market_score. Madrid/Barcelona now
   dominate Spain's average instead of hundreds of white-space towns of
   ~30K population. Expected ES drop from ~83 to ~55-65.

2. location_profiles: replace dead sports culture component (10 pts,
   tennis data all zeros) with market validation signal.
   Split scored CTE into: market_scored → country_market → scored.
   country_market aggregates AVG(market_score) per country from cities
   with padel courts (market_score > 0), so zero-court locations don't
   dilute the signal. ES (~60/100) → ~6 pts. SE (~35/100) → ~3.5 pts.
   NULL → 0.5 neutral → 5 pts (untested market, not penalised).
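
A minimal sketch of the three-CTE split described in fix 2, on toy rows. The column names (country_avg_market_score, market_validation_pts), the sample values, and the exact formula 10 * (country average / 100) are assumptions inferred from the numbers quoted above, not the actual location_profiles code:

```sql
-- Hypothetical sample rows standing in for the market_scored CTE.
WITH market_scored(country_code, city, market_score) AS (
    VALUES
        ('ES', 'city_1', 70.0),
        ('ES', 'city_2', 50.0),
        ('ES', 'city_3',  0.0),  -- no padel courts: excluded from the country signal
        ('SE', 'city_4', 35.0),
        ('XX', 'city_5',  0.0)   -- country with no courts at all: signal stays NULL
),
-- Country-level signal, averaged only over cities that have courts.
country_market AS (
    SELECT country_code, AVG(market_score) AS country_avg_market_score
    FROM market_scored
    WHERE market_score > 0
    GROUP BY country_code
),
-- Map the 0-100 signal onto a 10-point component; NULL falls back to 0.5 (neutral).
scored AS (
    SELECT
        m.country_code,
        m.city,
        10 * COALESCE(c.country_avg_market_score / 100.0, 0.5) AS market_validation_pts
    FROM market_scored AS m
    LEFT JOIN country_market AS c USING (country_code)
)
SELECT * FROM scored ORDER BY country_code, city;
```

With these rows every ES city gets 6 pts, the SE city 3.5 pts, and the XX city the neutral 5 pts. The LEFT JOIN is what lets a country with no courts fall through to the COALESCE default instead of dropping out.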

Score budget unchanged: 25+20+30+15+10 = 100 pts.
No new models, no new data sources, no cycles.
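
The effect of fix 1 can be checked on a toy example. The city names, populations, and scores below are made up for illustration; only the aggregation expression mirrors the one used in pseo_country_overview:

```sql
-- Hypothetical data: one large saturated city and three small white-space towns.
WITH cities(city, population, opportunity_score) AS (
    VALUES
        ('big_city', 3000000, 40.0),
        ('town_a',     30000, 90.0),
        ('town_b',     30000, 90.0),
        ('town_c',     30000, 90.0)
)
SELECT
    -- Every city counts equally, so the three small towns pull the average up.
    ROUND(AVG(opportunity_score), 1) AS plain_avg,
    -- Every resident counts equally, so the large city dominates.
    ROUND(SUM(opportunity_score * population) / NULLIF(SUM(population), 0), 1) AS weighted_avg
FROM cities;
-- plain_avg = 77.5, weighted_avg = 41.5
```

The weighted figure is 128,100,000 / 3,090,000, roughly 41.46, which is the same kind of drop the commit expects for ES.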

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 17:23:11 +01:00

-- pSEO article data: per-country padel market overview.
-- One row per country — consumed by the country-overview.md.jinja template.
-- Aggregates city-level data from pseo_city_costs_de.
--
-- top_city_slugs / top_city_names are ordered lists (up to 5) used to generate
-- internal links from the country hub to its top city pages.
MODEL (
  name serving.pseo_country_overview,
  kind FULL,
  cron '@daily',
  grain country_slug
);

SELECT
  country_code,
  country_name_en,
  country_slug,
  COUNT(*) AS city_count,
  SUM(padel_venue_count) AS total_venues,
  -- Population-weighted: large cities (Madrid, Barcelona) dominate, not hundreds of small towns.
  -- The FILTER keeps cities with a NULL score out of the denominator, mirroring how AVG() skips NULLs.
  ROUND(SUM(market_score * population)
    / NULLIF(SUM(population) FILTER (WHERE market_score IS NOT NULL), 0), 1) AS avg_market_score,
  MAX(market_score) AS top_city_market_score,
  -- Top 5 cities by venue count (prominence), then market score, for internal linking
  LIST(city_slug ORDER BY padel_venue_count DESC, market_score DESC NULLS LAST)[1:5] AS top_city_slugs,
  LIST(city_name ORDER BY padel_venue_count DESC, market_score DESC NULLS LAST)[1:5] AS top_city_names,
  -- Opportunity score aggregates (population-weighted: saturated megacities dominate, not hundreds of small towns).
  -- Same NULL handling as avg_market_score above.
  ROUND(SUM(opportunity_score * population)
    / NULLIF(SUM(population) FILTER (WHERE opportunity_score IS NOT NULL), 0), 1) AS avg_opportunity_score,
  MAX(opportunity_score) AS top_opportunity_score,
  -- Top 5 opportunity cities by population (prominence), then opportunity score
  LIST(city_slug ORDER BY population DESC, opportunity_score DESC NULLS LAST)[1:5] AS top_opportunity_slugs,
  LIST(city_name ORDER BY population DESC, opportunity_score DESC NULLS LAST)[1:5] AS top_opportunity_names,
  -- Pricing medians across cities (NULL when no Playtomic coverage in country)
  ROUND(MEDIAN(median_hourly_rate), 0) AS median_hourly_rate,
  ROUND(MEDIAN(median_peak_rate), 0) AS median_peak_rate,
  ROUND(MEDIAN(median_offpeak_rate), 0) AS median_offpeak_rate,
  -- One currency per country: MIN is deterministic and exact for single-currency countries.
  -- For a mixed-currency country it returns the alphabetically first code, not the most common one.
  MIN(price_currency) AS price_currency,
  SUM(population) AS total_population,
  CURRENT_DATE AS refreshed_date
FROM serving.pseo_city_costs_de
GROUP BY country_code, country_name_en, country_slug
-- Only countries with enough cities to be worth a hub page
HAVING COUNT(*) >= 2