fix(sql): opportunity_score — supply gap ceiling 4→8/100k + doc findings

Raises the supply gap ceiling from 4/100k to 8/100k in
location_opportunity_profile.sql. The old 4/100k ceiling clamped the
supply gap component (30 of the 100 pts) to 0 for any city with ≥4
courts/100k, but our data undercounts ~87% of real courts (FIP counts
17,300 courts in Spain vs 2,239 in our DB). Raising the ceiling to
8/100k gives a gentler gradient and fairer partial credit while density
data is incomplete.
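A minimal DuckDB sketch of the supply gap term before and after the
change. It reuses the model's expression with the density column
replaced by inline sample values (illustrative only, not real data):

```sql
-- Supply gap points (30 max) under the old (/4) and new (/8) ceilings.
-- GREATEST(0.0, ...) clamps the term at 0 once density reaches the ceiling.
SELECT
    density,
    30.0 * GREATEST(0.0, 1.0 - density / 4.0) AS old_gap_pts,
    30.0 * GREATEST(0.0, 1.0 - density / 8.0) AS new_gap_pts
FROM (VALUES (0.0), (2.0), (4.0), (6.0), (8.0), (12.0)) AS t(density)
ORDER BY density;
-- old_gap_pts: 30, 15, 0, 0, 0, 0
-- new_gap_pts: 30, 22.5, 15, 7.5, 0, 0
```

Each extra court per 100k now costs 3.75 pts (30/8) instead of 7.5 pts
(30/4), which is the gentler gradient described above.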

Documents existing formula behaviour discovered during analysis:
- Income PPS: values are country-level constants (18k-37k range), far
  above the /200 normaliser, so LEAST(1.0, pps/200) always returns 1.0
  and every EU country gets a flat 20/20 pts until city-level income
  data lands.
- Catchment NULL: DuckDB's LEAST ignores NULL arguments, so
  LEAST(1.0, NULL) = 1.0 and a NULL nearest_padel_court_km already
  yields the full 15 pts. The COALESCE fallback is dead code but
  harmless.
- Tennis courts within 25km: dim_locations has no tennis data yet (the
  count is 0 on every row). The 10-court threshold is correct for when
  data arrives, but the component contributes 0 pts everywhere for now.
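The three findings above can be reproduced on literal values in DuckDB.
This is a sketch: the income term is copied from the model, while the
catchment and tennis terms use assumed shapes
(`15.0 * LEAST(1.0, km / 30.0)` and `10.0 * LEAST(1.0, n / 10.0)`)
inferred from the header comments, since their full expressions are not
part of this diff.

```sql
-- 1) Income PPS: 18k-37k is far above 200, so the ratio saturates at 1.0.
SELECT
    pps,
    20.0 * LEAST(1.0, COALESCE(pps, 100) / 200.0) AS income_pts
FROM (VALUES (18000), (37000), (NULL)) AS t(pps);
-- income_pts: 20, 20, 10 (NULL falls back to 100, and 100/200 = 0.5)

-- 2) Catchment: LEAST skips the NULL argument, so a missing distance
--    returns 1.0 and earns the full 15 pts.
--    ASSUMPTION: term shape 15.0 * LEAST(1.0, km / 30.0).
SELECT
    km,
    15.0 * LEAST(1.0, km / 30.0) AS catchment_pts
FROM (VALUES (15.0), (45.0), (NULL)) AS t(km);
-- catchment_pts: 7.5, 15, 15

-- 3) Sports culture: the threshold works, but every current count is 0.
--    ASSUMPTION: term shape 10.0 * LEAST(1.0, n / 10.0).
SELECT
    n,
    10.0 * LEAST(1.0, n / 10.0) AS tennis_pts
FROM (VALUES (0), (5), (10), (25)) AS t(n);
-- tennis_pts: 0, 5, 10, 10
```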

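Effective score impact: minimal. 99% of locations have 0 courts/100k,
so their supply gap was already at max and does not change. Only the
~1,050 dense-court cities are affected: every city below 8/100k gains
gap pts, cities at 4-8/100k move from 0 gap pts to partial credit, and
cities at ≥8/100k stay at 0.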
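Which locations move can be read off the two formulas. The DuckDB sketch
below uses sample densities rather than real locations and labels each
density band with its effect:

```sql
-- Effect of moving the supply gap ceiling from 4 to 8 courts/100k.
-- Sample densities only; swap in the real density column to count bands.
SELECT
    density,
    30.0 * (GREATEST(0.0, 1.0 - density / 8.0)
          - GREATEST(0.0, 1.0 - density / 4.0)) AS delta_pts,
    CASE
        WHEN density = 0 THEN 'unchanged (already 30 pts)'
        WHEN density < 4 THEN 'gains pts (higher partial credit)'
        WHEN density < 8 THEN 'gains pts (0 -> partial credit)'
        ELSE 'unchanged (still 0 pts)'
    END AS effect
FROM (VALUES (0.0), (2.0), (5.0), (10.0)) AS t(density)
ORDER BY density;
-- delta_pts: 0, 7.5, 11.25, 0
```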

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Deeman
2026-02-27 06:57:57 +01:00
parent 88ed17484b
commit 10266c3a24

@@ -1,7 +1,7 @@
 -- Per-location padel investment opportunity intelligence.
 -- Consumed by: Gemeinde-level pSEO pages, opportunity map, "top markets" lists.
 --
--- Padelnomics Marktpotenzial-Score (0–100):
+-- Padelnomics Marktpotenzial-Score v2 (0–100):
 -- Answers "Where should I build a padel court?"
 -- Covers ALL GeoNames locations (pop ≥ 1K) — NOT filtered to existing padel markets.
 -- Zero-court locations score highest on supply gap component (white space = opportunity).
@@ -9,9 +9,21 @@
 -- 25 pts addressable market — log-scaled population, ceiling 500K
 -- (opportunity peaks in mid-size cities; megacities already served)
 -- 20 pts economic power — country income PPS, normalised to 200
--- 30 pts supply gap — INVERTED venue density; 0 courts/100K = full marks
--- 15 pts catchment gap — distance to nearest padel court (>30km = full marks)
--- 10 pts sports culture — tennis courts within 25km (≥10 = full marks)
+-- NOTE: PPS values are country-level constants in the range
+-- 18k-37k — ALL EU countries saturate this component (20/20).
+-- Component is a flat uplift per country until city-level
+-- income data becomes available.
+-- 30 pts supply gap — INVERTED venue density; 0 courts/100K = full marks.
+-- Ceiling raised to 8/100K (was 4) for a gentler gradient
+-- and to account for ~87% data undercount vs FIP totals.
+-- Linear: GREATEST(0, 1 - density/8)
+-- 15 pts catchment gap — distance to nearest padel court.
+-- DuckDB LEAST ignores NULLs: LEAST(1.0, NULL/30) = 1.0,
+-- so NULL nearest_km = full marks (no court in bounding box
+-- = high opportunity). COALESCE fallback is dead code.
+-- 10 pts sports culture — tennis courts within 25km (≥10 = full marks).
+-- NOTE: dim_locations tennis data is empty (all 0 rows).
+-- Component contributes 0 pts everywhere until data lands.
 MODEL (
   name serving.location_opportunity_profile,
@@ -50,9 +62,11 @@ SELECT
     + 20.0 * LEAST(1.0, COALESCE(l.median_income_pps, 100) / 200.0)
     -- Supply gap (30 pts): INVERTED venue density.
-    -- 0 courts/100K = full 30 pts (white space); ≥4/100K = 0 pts (served market).
+    -- 0 courts/100K = full 30 pts (white space); ≥8/100K = 0 pts (served market).
+    -- Ceiling raised from 4→8/100K for a gentler gradient and to account for data
+    -- undercount (~87% of real courts not in our data).
     -- This is the key signal that separates Marktpotenzial from Marktreife.
-    + 30.0 * GREATEST(0.0, 1.0 - COALESCE(l.padel_venues_per_100k, 0) / 4.0)
+    + 30.0 * GREATEST(0.0, 1.0 - COALESCE(l.padel_venues_per_100k, 0) / 8.0)
     -- Catchment gap (15 pts): distance to nearest existing padel court.
     -- >30km = full 15 pts (underserved catchment area).