feat(sql+templates): market_score v3 — log density + count gate

Fixes ranking inversion where Germany (1/100k courts) outscored Spain
(36/100k). Root causes: population/income were 55% of max before any
padel signal, density ceiling saturated 73% of cities, small-town
inflation (1 venue / 5k pop = 20/100k = full marks), and the saturation
discount actively penalised mature markets.

SQL (city_market_profile.sql):
- Supply development 40pts: log-scaled density LN(d+1)/LN(21) × count
  gate min(1, count/5). Ceiling 20/100k. Count gate kills small-town
  inflation without hard cutoffs (1 venue = 20%, 5+ = 100%).
- Demand evidence 25pts: occupancy if available; 40% density proxy
  otherwise. Separated from supply to avoid double-counting.
- Addressable market 15pts: population as context, not maturity.
- Economic context 10pts: income PPS (flat per country, low signal).
- Data quality 10pts.
- Removed saturation discount. High density = maturity.

Verified spot-check scores:
  Málaga (46v, 7.77/100k):            70.1 [was 98.9]
  Barcelona (104v, 6.17/100k):        67.4 [was 100.0]
  Amsterdam (24v, 3.24/100k):         58.4 [was 93.7]
  Bernau bei Berlin (2v, 5.74/100k):  43.9 [was 92.7]
  Berlin (20v, 0.55/100k):            42.2 [was 74.1]
  London (66v, 0.74/100k):            44.1 [was 75.5]

Templates (city-cost-de, country-overview, city-pricing):
- Color coding: green >= 55 (was 65), amber >= 35 (was 40)
- Intro/FAQ tiers: strong >= 55 (was 70), mid >= 35 (was 45)
- Opportunity interplay: market_score < 40 (was < 50) for white-space

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 06:40:12 +01:00
parent 0b3e1235fa
commit 88ed17484b
4 changed files with 45 additions and 42 deletions
--- a/transform/sqlmesh_padelnomics/models/serving/city_market_profile.sql
+++ b/transform/sqlmesh_padelnomics/models/serving/city_market_profile.sql
@@ -1,16 +1,18 @@
 -- One Big Table: per-city padel market intelligence.
 -- Consumed by: SEO article generation, planner city-select pre-fill, API endpoints.
 --
-- Padelnomics Marktreife-Score v2 (0–100):
+-- Padelnomics Marktreife-Score v3 (0–100):
 -- Answers "How mature/established is this padel market?"
 -- Only computed for cities with ≥1 padel venue (padel_venue_count > 0).
 -- For white-space opportunity scoring, see serving.location_opportunity_profile.
 --
--   30 pts  population  — log-scaled to 1M+ city ceiling
--   25 pts  income PPS  — normalised to 200 ceiling (covers CH/NO/LU outliers)
--   30 pts  demand      — observed occupancy if available, else venue density
--   15 pts  data quality — completeness discount, not a market signal
--   ×0.85   saturation  — discount when venues_per_100k > 8 (oversupplied market)
+--   40 pts  supply development — log-scaled density (LN ceiling 20/100k) × count gate
+--                                (min(1, count/5) kills small-town inflation)
+--   25 pts  demand evidence   — occupancy when available; 40% density proxy otherwise
+--   15 pts  addressable market — log-scaled population, ceiling 1M (context only)
+--   10 pts  economic context  — income PPS normalised to 200 ceiling
+--   10 pts  data quality      — completeness discount
+--   No saturation discount: high density = maturity, not a penalty

 MODEL (
  name serving.city_market_profile,
@@ -61,28 +63,29 @@ WITH base AS (
 scored AS (
  SELECT *,
    ROUND(
-      -- Population (30 pts): log-scale, 1M+ city = full marks.
-      -- LN(1) = 0 so unpopulated cities score 0 here — they still score on demand.
-      30.0 * LEAST(1.0, LN(GREATEST(population, 1)) / LN(1000000))
-      -- Economic power (25 pts): income PPS normalised to 200 ceiling.
-      -- 200 covers high-income outliers (CH ~190, NO ~180, LU ~200+).
-      -- Drives pricing power and willingness-to-pay directly.
-      + 25.0 * LEAST(1.0, COALESCE(median_income_pps, 100) / 200.0)
-      -- Demand evidence (30 pts): observed occupancy is the best signal
-      -- (proves real demand). If unavailable, venue density is the proxy
-      -- (proves market exists; caps at 4/100K to avoid penalising dense cities).
-      + 30.0 * CASE
+      -- Supply development (40 pts): THE maturity signal.
+      -- Log-scaled density: LN(density+1)/LN(21) → 20/100k ≈ full marks.
+      -- Count gate: min(1, count/5) — 1 venue=20%, 5+ venues=100%.
+      -- Kills small-town inflation (1 court / 5k pop = 20/100k) without hard cutoffs.
+      40.0 * LEAST(1.0, LN(COALESCE(venues_per_100k, 0) + 1) / LN(21))
+           * LEAST(1.0, padel_venue_count / 5.0)
+      -- Demand evidence (25 pts): occupancy when Playtomic data available.
+      -- Fallback: 40% of density score (avoids double-counting with supply component).
+      + 25.0 * CASE
          WHEN median_occupancy_rate IS NOT NULL
            THEN LEAST(1.0, median_occupancy_rate / 0.65)
-          ELSE LEAST(1.0, COALESCE(venues_per_100k, 0) / 4.0)
+          ELSE 0.4 * LEAST(1.0, LN(COALESCE(venues_per_100k, 0) + 1) / LN(21))
+                   * LEAST(1.0, padel_venue_count / 5.0)
        END
-      -- Data quality (15 pts): measures completeness, not market quality.
-      -- Reduced from 20pts — kept as confidence discount, not market signal.
-      + 15.0 * data_confidence
+      -- Addressable market (15 pts): population as context, not maturity signal.
+      -- LN(1) = 0 so zero-pop cities score 0 here.
+      + 15.0 * LEAST(1.0, LN(GREATEST(population, 1)) / LN(1000000))
+      -- Economic context (10 pts): country-level income PPS.
+      -- Flat per country — kept as context modifier, not primary signal.
+      + 10.0 * LEAST(1.0, COALESCE(median_income_pps, 100) / 200.0)
+      -- Data quality (10 pts): completeness discount.
+      + 10.0 * data_confidence
    , 1)
-    -- Saturation discount: venues_per_100k > 8 signals oversupply.
-    -- ~8/100K ≈ Spain-tier density; above this marginal return decreases.
-    * CASE WHEN venues_per_100k > 8 THEN 0.85 ELSE 1.0 END
                                 AS market_score
  FROM base
 )
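For reviewers who want to check the new weighting by hand, the v3 expression in the hunk above can be mirrored in a few lines of Python. This is a minimal sketch, not part of the commit: the function names `density_term` and `market_score_v3` are hypothetical, the keyword arguments simply reuse the SQL column names, and Python's `round` may differ from the SQL `ROUND` on exact .x5 ties.

```python
import math


def density_term(venues_per_100k, padel_venue_count):
    """Log-scaled density times count gate, in the range 0..1.

    Mirrors: LEAST(1.0, LN(COALESCE(venues_per_100k, 0) + 1) / LN(21))
             * LEAST(1.0, padel_venue_count / 5.0)
    """
    density = venues_per_100k if venues_per_100k is not None else 0.0
    log_density = min(1.0, math.log(density + 1) / math.log(21))
    count_gate = min(1.0, padel_venue_count / 5.0)
    return log_density * count_gate


def market_score_v3(venues_per_100k, padel_venue_count, population,
                    median_income_pps=None, median_occupancy_rate=None,
                    data_confidence=0.0):
    """Hypothetical Python mirror of the v3 market_score expression."""
    # Supply development (40 pts).
    supply = 40.0 * density_term(venues_per_100k, padel_venue_count)

    # Demand evidence (25 pts): occupancy if present, else 40% of the density term.
    if median_occupancy_rate is not None:
        demand = 25.0 * min(1.0, median_occupancy_rate / 0.65)
    else:
        demand = 25.0 * 0.4 * density_term(venues_per_100k, padel_venue_count)

    # Addressable market (15 pts): log-scaled population, ceiling 1M.
    market = 15.0 * min(1.0, math.log(max(population, 1)) / math.log(1_000_000))

    # Economic context (10 pts): income PPS, defaulting to 100 when missing.
    income = median_income_pps if median_income_pps is not None else 100
    economy = 10.0 * min(1.0, income / 200.0)

    # Data quality (10 pts).
    quality = 10.0 * data_confidence

    return round(supply + demand + market + economy + quality, 1)
```

With this sketch, a single venue at 20/100k earns 40 × 1.0 × 0.2 = 8 supply points, whereas five or more venues at the same density earn the full 40. That is the small-town case the count gate is meant to damp.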