From 10266c3a24d24d9689d0ad581af538d030782891 Mon Sep 17 00:00:00 2001
From: Deeman
Date: Fri, 27 Feb 2026 06:57:57 +0100
Subject: [PATCH] =?UTF-8?q?fix(sql):=20opportunity=5Fscore=20=E2=80=94=20s?=
 =?UTF-8?q?upply=20gap=20ceiling=204=E2=86=928/100k=20+=20doc=20findings?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Raises supply gap ceiling from 4/100k to 8/100k in
location_opportunity_profile.sql. The original 4/100k hard cliff zeroed
the supply gap component for any city with ≥4 courts/100k, but our data
is missing ~87% of real courts (FIP: 17,300 Spanish courts vs 2,239 in
our DB). Raising to 8/100k gives a gentler gradient and fairer partial
credit when density data is incomplete.

Documents existing formula behaviour discovered during analysis:

- Income PPS: country-level constants (18k-37k range) saturate the /200
  ceiling, so all EU countries get a flat 20/20 pts until city-level
  income data lands.
- Catchment NULL: DuckDB LEAST(1.0, NULL) = 1.0 (ignores nulls), so NULL
  nearest_padel_court_km already yields full 15 pts. COALESCE fallback
  is dead code but harmless.
- Tennis courts within 25km: dim_locations data is empty (all 0 rows).
  The 10-court threshold is correct for when data arrives; the component
  contributes 0 pts everywhere for now.

Effective score impact: minimal (99% of locations have 0 courts/100k, so
supply gap was already at max). Only ~1,050 dense-court cities see a
score increase (from 0 gap pts to partial gap pts).
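
For reviewers, the effect of the ceiling change on the supply gap term
can be sketched in Python. This is only an illustration of the SQL
expression in the diff, not code from the repository; the function name
supply_gap_points and its parameters are hypothetical.

```python
def supply_gap_points(venues_per_100k, ceiling=8.0, max_points=30.0):
    """Illustrative mirror of the SQL term
    30.0 * GREATEST(0.0, 1.0 - COALESCE(padel_venues_per_100k, 0) / ceiling).
    Hypothetical helper, not part of the model."""
    # COALESCE(..., 0): a missing density is treated as zero courts.
    density = 0.0 if venues_per_100k is None else venues_per_100k
    # GREATEST(0.0, ...): the component never goes negative.
    return max_points * max(0.0, 1.0 - density / ceiling)


# Compare the old ceiling (4/100k) with the new one (8/100k).
for density in (0.0, 2.0, 4.0, 6.0, 8.0):
    old = supply_gap_points(density, ceiling=4.0)
    new = supply_gap_points(density, ceiling=8.0)
    print(f"{density:>4.1f}/100k  old={old:5.2f}  new={new:5.2f}")
```

Under this sketch a city at 4 courts/100k moves from 0 to 15 gap pts,
and anything at or above 8/100k still scores 0.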

Co-Authored-By: Claude Sonnet 4.6
---
 .../serving/location_opportunity_profile.sql | 26 ++++++++++++++-----
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/transform/sqlmesh_padelnomics/models/serving/location_opportunity_profile.sql b/transform/sqlmesh_padelnomics/models/serving/location_opportunity_profile.sql
index e848db0..1258c30 100644
--- a/transform/sqlmesh_padelnomics/models/serving/location_opportunity_profile.sql
+++ b/transform/sqlmesh_padelnomics/models/serving/location_opportunity_profile.sql
@@ -1,7 +1,7 @@
 -- Per-location padel investment opportunity intelligence.
 -- Consumed by: Gemeinde-level pSEO pages, opportunity map, "top markets" lists.
 --
--- Padelnomics Marktpotenzial-Score (0–100):
+-- Padelnomics Marktpotenzial-Score v2 (0–100):
 -- Answers "Where should I build a padel court?"
 -- Covers ALL GeoNames locations (pop ≥ 1K) — NOT filtered to existing padel markets.
 -- Zero-court locations score highest on supply gap component (white space = opportunity).
@@ -9,9 +9,21 @@
 --   25 pts addressable market — log-scaled population, ceiling 500K
 --          (opportunity peaks in mid-size cities; megacities already served)
 --   20 pts economic power — country income PPS, normalised to 200
---   30 pts supply gap — INVERTED venue density; 0 courts/100K = full marks
---   15 pts catchment gap — distance to nearest padel court (>30km = full marks)
---   10 pts sports culture — tennis courts within 25km (≥10 = full marks)
+--          NOTE: PPS values are country-level constants in the range
+--          18k-37k — ALL EU countries saturate this component (20/20).
+--          Component is a flat uplift per country until city-level
+--          income data becomes available.
+--   30 pts supply gap — INVERTED venue density; 0 courts/100K = full marks.
+--          Ceiling raised to 8/100K (was 4) for a gentler gradient
+--          and to account for ~87% data undercount vs FIP totals.
+--          Linear: GREATEST(0, 1 - density/8)
+--   15 pts catchment gap — distance to nearest padel court.
+--          DuckDB LEAST ignores NULLs: LEAST(1.0, NULL/30) = 1.0,
+--          so NULL nearest_km = full marks (no court in bounding box
+--          = high opportunity). COALESCE fallback is dead code.
+--   10 pts sports culture — tennis courts within 25km (≥10 = full marks).
+--          NOTE: dim_locations tennis data is empty (all 0 rows).
+--          Component contributes 0 pts everywhere until data lands.
 
 MODEL (
   name serving.location_opportunity_profile,
@@ -50,9 +62,11 @@ SELECT
     + 20.0 * LEAST(1.0, COALESCE(l.median_income_pps, 100) / 200.0)
     -- Supply gap (30 pts): INVERTED venue density.
-    -- 0 courts/100K = full 30 pts (white space); ≥4/100K = 0 pts (served market).
+    -- 0 courts/100K = full 30 pts (white space); ≥8/100K = 0 pts (served market).
+    -- Ceiling raised from 4→8/100K for a gentler gradient and to account for data
+    -- undercount (~87% of real courts not in our data).
     -- This is the key signal that separates Marktpotenzial from Marktreife.
-    + 30.0 * GREATEST(0.0, 1.0 - COALESCE(l.padel_venues_per_100k, 0) / 4.0)
+    + 30.0 * GREATEST(0.0, 1.0 - COALESCE(l.padel_venues_per_100k, 0) / 8.0)
     -- Catchment gap (15 pts): distance to nearest existing padel court.
     -- >30km = full 15 pts (underserved catchment area).