merge: Market Score v4 + Opportunity Score v5

chore: update CHANGELOG + admin dependency graph for score v4/v5
- CHANGELOG.md: document Market Score v4 and Opportunity Score v5 changes
- pipeline_routes.py: add dim_countries to location_profiles dependency list

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 15:32:26 +01:00 · 2026-03-08 15:32:06 +01:00 · 2026-03-08 15:30:04 +01:00 · 2026-03-08 15:22:48 +01:00
3 changed files with 64 additions and 36 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,9 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 ## [Unreleased]

 ### Changed
+- **Market Score v3 → v4** — fixes Spain averaging 54 (should be 65-80). Four calibration changes: count gate threshold lowered from 5 → 3 venues (3 establishes a market pattern), density ceiling lowered from LN(21) → LN(11) (10/100k is reachable for mature markets), demand evidence fallback raised from 0.4 → 0.65 multiplier with 0.3 floor (existence of venues IS evidence of demand), economic context ceiling changed from income/200 → income/25000 (actual discrimination instead of free 10 pts for everyone).
+- **Opportunity Score v4 → v5** — fixes structural flaws: supply gap (30pts) + catchment gap (15pts) merged into single supply deficit (35pts, GREATEST of density gap and distance gap) eliminating ~80% correlated double-count. New sports culture signal (10pts) using tennis court density as racquet-sport adoption proxy. New construction affordability signal (5pts) using income relative to PLI construction costs from `dim_countries`. Economic power reduced from 20 → 15pts. New dependency on `foundation.dim_countries` for `pli_construction`.
+
 - **Unified `location_profiles` serving model** — merged `city_market_profile` and `location_opportunity_profile` into a single `serving.location_profiles` table at `(country_code, geoname_id)` grain. Both Marktreife-Score (Market Score) and Marktpotenzial-Score (Opportunity Score) are now computed per location. City data enriched via LEFT JOIN `dim_cities` on `geoname_id`. Downstream models (`planner_defaults`, `pseo_city_costs_de`, `pseo_city_pricing`) updated to query `location_profiles` directly. `city_padel_venue_count` (exact from dim_cities) distinguished from `padel_venue_count` (spatial 5km from dim_locations).
 - **Both scores on all map tooltips** — country map shows avg Market Score + avg Opportunity Score; city map shows Market Score + Opportunity Score per city; opportunity map shows Opportunity Score + Market Score per location. All score labels use the trademarked "Padelnomics Market Score" / "Padelnomics Opportunity Score" names.
 - **API endpoints** — `/api/markets/countries.json` adds `avg_opportunity_score`; `/api/markets/<country>/cities.json` adds `opportunity_score`; `/api/opportunity/<country>.json` adds `market_score`.
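
The four Market Score calibration changes in the entry above can be illustrated with a small Python sketch. It mirrors the v3 and v4 expressions in `location_profiles.sql` for three components only (supply development, the demand-evidence fallback, and economic context). The `Calibration` dataclass, the function names, and the example city are hypothetical and not part of the repository; the addressable-market and data-quality terms are deliberately left out, so the result is a partial score.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Calibration:
    """Constants of one Market Score version (illustrative names)."""
    density_ceiling: float  # venues per 100k at which supply saturates
    count_gate: float       # venue count at which the count gate reaches 1.0
    fallback_mult: float    # multiplier on the supply proxy when occupancy is missing
    fallback_floor: float   # minimum demand fraction when occupancy is missing
    income_ceiling: float   # income PPS that earns full economic points


# Values taken from the diff: v3 had no floor, which equals a floor of 0.0.
V3 = Calibration(20.0, 5.0, 0.40, 0.0, 200.0)
V4 = Calibration(10.0, 3.0, 0.65, 0.3, 25_000.0)


def supply_fraction(venues: int, population: int, cal: Calibration) -> float:
    """Log-scaled density (venues per 100k) times the count gate, in 0..1."""
    density = venues / population * 100_000 if population > 0 else 0.0
    log_part = min(1.0, math.log(density + 1) / math.log(cal.density_ceiling + 1))
    return log_part * min(1.0, venues / cal.count_gate)


def demand_fraction(venues: int, population: int,
                    occupancy: Optional[float], cal: Calibration) -> float:
    """Occupancy scaled to a 65% target, else the floored supply proxy."""
    if occupancy is not None:
        return min(1.0, occupancy / 0.65)
    proxy = cal.fallback_mult * supply_fraction(venues, population, cal)
    return max(cal.fallback_floor, proxy)


def partial_market_score(venues: int, population: int, occupancy: Optional[float],
                         income_pps: float, cal: Calibration) -> float:
    """Supply (40) + demand (25) + economic context (10); other terms omitted."""
    if venues <= 0:
        return 0.0  # the SQL also returns 0 when there are no venues
    return (40.0 * supply_fraction(venues, population, cal)
            + 25.0 * demand_fraction(venues, population, occupancy, cal)
            + 10.0 * min(1.0, income_pps / cal.income_ceiling))


# Hypothetical city: 4 venues, 100,000 inhabitants, no occupancy data, 20,000 PPS.
v3_score = partial_market_score(4, 100_000, None, 20_000, V3)
v4_score = partial_market_score(4, 100_000, None, 20_000, V4)
print(round(v3_score, 1), round(v4_score, 1))
```

For this made-up city the three components rise from roughly 31 to roughly 46 points: the supply and demand terms gain, while the economic term drops from a free 10 to 8 because the 25,000 ceiling now discriminates.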
--- a/transform/sqlmesh_padelnomics/models/serving/location_profiles.sql
+++ b/transform/sqlmesh_padelnomics/models/serving/location_profiles.sql
@@ -5,30 +5,36 @@
 --
 -- Two scores per location:
 --
-- Padelnomics Market Score (Marktreife-Score v3, 0–100):
+-- Padelnomics Market Score (Marktreife-Score v4, 0–100):
 --   "How mature/established is this padel market?"
 --   Only meaningful for locations matched to a dim_cities row (city_slug IS NOT NULL)
 --   with padel venues. 0 for all other locations.
 --
--   40 pts  supply development — log-scaled density (LN ceiling 20/100k) × count gate
--   25 pts  demand evidence   — occupancy when available; 40% density proxy otherwise
+--   v4 changes: lower count gate (5→3), lower density ceiling (LN(21)→LN(11)),
+--   better demand fallback (0.4→0.65 with 0.3 floor), economic context discrimination (200→25K).
+--
+--   40 pts  supply development — log-scaled density (LN ceiling 10/100k) × count gate (3)
+--   25 pts  demand evidence   — occupancy when available; 65% density proxy + 0.3 floor otherwise
 --   15 pts  addressable market — log-scaled population, ceiling 1M
--   10 pts  economic context  — income PPS normalised to 200 ceiling
+--   10 pts  economic context  — income PPS normalised to 25,000 ceiling
 --   10 pts  data quality      — completeness discount
 --
-- Padelnomics Opportunity Score (Marktpotenzial-Score v4, 0–100):
+-- Padelnomics Opportunity Score (Marktpotenzial-Score v5, 0–100):
 --   "Where should I build a padel court?"
--   Computed for ALL locations — zero-court locations score highest on supply gap.
--   H3 catchment methodology: addressable market and supply gap use a regional
+--   Computed for ALL locations — zero-court locations score highest on supply deficit.
+--   H3 catchment methodology: addressable market and supply deficit use a regional
 --   H3 catchment (res-5 cell + 6 neighbours, ~24km radius).
 --
--   25 pts  addressable market — log-scaled catchment population, ceiling 500K
--   20 pts  economic power     — income PPS, normalised to 35,000
--   30 pts  supply gap         — inverted catchment venue density; 0 courts = full marks
--   15 pts  catchment gap      — distance to nearest padel court
--   10 pts  market validation  — country-level avg market maturity (from market_scored CTE).
--                               Replaces sports culture proxy (v3: tennis data was all zeros).
--                               ES (~60/100) → ~6 pts, SE (~35/100) → ~3.5 pts, unknown → 5 pts.
+--   v5 changes: merge supply gap + catchment gap → single supply deficit (35 pts),
+--   add sports culture proxy (10 pts, tennis density), add construction affordability (5 pts),
+--   reduce economic power from 20 → 15 pts.
+--
+--   25 pts  addressable market        — log-scaled catchment population, ceiling 500K
+--   15 pts  economic power            — income PPS, normalised to 35,000
+--   35 pts  supply deficit            — max(density gap, distance gap); eliminates double-count
+--   10 pts  sports culture            — tennis court density as racquet-sport adoption proxy
+--    5 pts  construction affordability — income relative to construction costs (PLI)
+--   10 pts  market validation         — country-level avg market maturity (from market_scored CTE)
 --
 -- Consumers query directly with WHERE filters:
 --   cities API:       WHERE country_slug = ? AND city_slug IS NOT NULL
@@ -107,7 +113,7 @@ city_match AS (
    ORDER BY c.padel_venue_count DESC
  ) = 1
 ),
-- Pricing / occupancy from Playtomic (via city_slug) + H3 catchment
+-- Pricing / occupancy from Playtomic (via city_slug) + H3 catchment + country PLI
 with_pricing AS (
  SELECT
    b.*,
@@ -120,6 +126,7 @@ with_pricing AS (
    vpb.median_occupancy_rate,
    vpb.median_daily_revenue_per_venue,
    vpb.price_currency,
+    dc.pli_construction,
    COALESCE(ct.catchment_population, b.population)::BIGINT           AS catchment_population,
    COALESCE(ct.catchment_padel_courts, b.padel_venue_count)::INTEGER AS catchment_padel_courts
  FROM base b
@@ -131,6 +138,8 @@ with_pricing AS (
    AND cm.city_slug = vpb.city_slug
  LEFT JOIN catchment ct
    ON b.geoname_id = ct.geoname_id
+  LEFT JOIN foundation.dim_countries dc
+    ON b.country_code = dc.country_code
 ),
 -- Step 1: market score only — needed first so we can aggregate country averages.
 market_scored AS (
@@ -146,34 +155,38 @@ market_scored AS (
      WHEN population > 0 OR  COALESCE(city_padel_venue_count, 0) > 0 THEN 0.5
      ELSE 0.0
    END AS data_confidence,
-    -- ── Market Score (Marktreife-Score v3) ──────────────────────────────────
+    -- ── Market Score (Marktreife-Score v4) ──────────────────────────────────
    -- 0 when no city match or no venues (city_padel_venue_count NULL or 0)
    CASE WHEN COALESCE(city_padel_venue_count, 0) > 0 THEN
      ROUND(
        -- Supply development (40 pts)
+        -- density ceiling 10/100k (LN(11)), count gate 3 venues
        40.0 * LEAST(1.0, LN(
            COALESCE(
              CASE WHEN population > 0
                THEN COALESCE(city_padel_venue_count, 0)::DOUBLE / population * 100000
                ELSE 0 END
-            , 0) + 1) / LN(21))
-             * LEAST(1.0, COALESCE(city_padel_venue_count, 0) / 5.0)
+            , 0) + 1) / LN(11))
+             * LEAST(1.0, COALESCE(city_padel_venue_count, 0) / 3.0)
        -- Demand evidence (25 pts)
+        -- with occupancy: scale to 65% target. Without: 65% of supply proxy, floored at 0.3
+        -- (existence of venues IS evidence of demand)
        + 25.0 * CASE
            WHEN median_occupancy_rate IS NOT NULL
              THEN LEAST(1.0, median_occupancy_rate / 0.65)
-            ELSE 0.4 * LEAST(1.0, LN(
+            ELSE GREATEST(0.3, 0.65 * LEAST(1.0, LN(
                COALESCE(
                  CASE WHEN population > 0
                    THEN COALESCE(city_padel_venue_count, 0)::DOUBLE / population * 100000
                    ELSE 0 END
-                , 0) + 1) / LN(21))
-                     * LEAST(1.0, COALESCE(city_padel_venue_count, 0) / 5.0)
+                , 0) + 1) / LN(11))
+                     * LEAST(1.0, COALESCE(city_padel_venue_count, 0) / 3.0))
          END
        -- Addressable market (15 pts)
        + 15.0 * LEAST(1.0, LN(GREATEST(population, 1)) / LN(1000000))
        -- Economic context (10 pts)
-        + 10.0 * LEAST(1.0, COALESCE(median_income_pps, 100) / 200.0)
+        -- ceiling 25,000 PPS discriminates between wealthy and poorer markets
+        + 10.0 * LEAST(1.0, COALESCE(median_income_pps, 15000) / 25000.0)
        -- Data quality (10 pts)
        + 10.0 * CASE
            WHEN population > 0 AND COALESCE(city_padel_venue_count, 0) > 0 THEN 1.0
@@ -199,23 +212,35 @@ country_market AS (
 -- Step 3: add opportunity_score using country market validation signal.
 scored AS (
  SELECT ms.*,
-    -- ── Opportunity Score (Marktpotenzial-Score v4, H3 catchment) ──────────
+    -- ── Opportunity Score (Marktpotenzial-Score v5, H3 catchment) ──────────
    ROUND(
      -- Addressable market (25 pts): log-scaled catchment population, ceiling 500K
      25.0 * LEAST(1.0, LN(GREATEST(catchment_population, 1)) / LN(500000))
-      -- Economic power (20 pts): income PPS normalised to 35,000
-      + 20.0 * LEAST(1.0, COALESCE(median_income_pps, 15000) / 35000.0)
-      -- Supply gap (30 pts): inverted catchment venue density
-      + 30.0 * GREATEST(0.0, 1.0 - COALESCE(
-          CASE WHEN catchment_population > 0
-            THEN GREATEST(catchment_padel_courts, COALESCE(city_padel_venue_count, 0))::DOUBLE / catchment_population * 100000
-            ELSE 0.0
-          END, 0.0) / 8.0)
-      -- Catchment gap (15 pts): distance to nearest court
-      + 15.0 * COALESCE(LEAST(1.0, nearest_padel_court_km / 30.0), 0.5)
+      -- Economic power (15 pts): income PPS normalised to 35,000
+      + 15.0 * LEAST(1.0, COALESCE(median_income_pps, 15000) / 35000.0)
+      -- Supply deficit (35 pts): max of density gap and distance gap.
+      -- Merges old supply gap (30) + catchment gap (15) which were ~80% correlated.
+      + 35.0 * GREATEST(
+          -- density-based gap (H3 catchment): 0 courts = 1.0, 8/100k = 0.0
+          GREATEST(0.0, 1.0 - COALESCE(
+            CASE WHEN catchment_population > 0
+              THEN GREATEST(catchment_padel_courts, COALESCE(city_padel_venue_count, 0))::DOUBLE / catchment_population * 100000
+              ELSE 0.0
+            END, 0.0) / 8.0),
+          -- distance-based gap: 30km+ = 1.0, 0km = 0.0; NULL = 0.5
+          COALESCE(LEAST(1.0, nearest_padel_court_km / 30.0), 0.5)
+        )
+      -- Sports culture (10 pts): tennis density as racquet-sport adoption proxy.
+      -- Ceiling 50 courts within 25km. Harmless when tennis data is zero (contributes 0).
+      + 10.0 * LEAST(1.0, COALESCE(tennis_courts_within_25km, 0) / 50.0)
+      -- Construction affordability (5 pts): income purchasing power relative to build costs.
+      -- PLI construction is EU27=100 index. High income + low construction cost = high score.
+      + 5.0 * LEAST(1.0,
+          COALESCE(median_income_pps, 15000) / 35000.0
+          / GREATEST(0.5, COALESCE(pli_construction, 100.0) / 100.0)
+        )
      -- Market validation (10 pts): country-level avg market maturity.
-      -- Replaces sports culture (v3 tennis data was all zeros = dead code).
-      -- ES (~60/100): proven demand → ~6 pts. SE (~35/100): struggling → ~3.5 pts.
+      -- ES (~70/100): proven demand → ~7 pts. SE (~35/100): emerging → ~3.5 pts.
      -- NULL (no courts in country yet): 0.5 neutral → 5 pts (untested, not penalised).
      + 10.0 * COALESCE(cm.country_avg_market_score / 100.0, 0.5)
    , 1) AS opportunity_score
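
The v5 expression above can be restated as a minimal Python sketch, which makes the supply-deficit merge and the six weights easier to check by hand. The function and parameter names are hypothetical; `None` stands in for SQL `NULL`, and Python's `round` may treat exact ties differently from the SQL `ROUND`, so treat this as an approximation of the model rather than a port.

```python
import math
from typing import Optional


def supply_deficit(catchment_courts: int, city_venues: int,
                   catchment_population: int,
                   nearest_km: Optional[float]) -> float:
    """Max of the density gap and the distance gap, each in 0..1."""
    courts = max(catchment_courts, city_venues)
    density = (courts / catchment_population * 100_000
               if catchment_population > 0 else 0.0)
    density_gap = max(0.0, 1.0 - density / 8.0)   # 0 courts = 1.0, 8/100k = 0.0
    distance_gap = 0.5 if nearest_km is None else min(1.0, nearest_km / 30.0)
    return max(density_gap, distance_gap)


def opportunity_score_v5(catchment_population: int,
                         median_income_pps: Optional[float],
                         catchment_courts: int,
                         city_venues: int,
                         nearest_km: Optional[float],
                         tennis_courts_within_25km: Optional[int],
                         pli_construction: Optional[float],
                         country_avg_market_score: Optional[float]) -> float:
    """Sum of the six v5 components; weights add up to 100."""
    income = 15_000.0 if median_income_pps is None else median_income_pps
    pli = 100.0 if pli_construction is None else pli_construction
    tennis = tennis_courts_within_25km or 0
    validation = (0.5 if country_avg_market_score is None
                  else country_avg_market_score / 100.0)

    total = (
        25.0 * min(1.0, math.log(max(catchment_population, 1)) / math.log(500_000))
        + 15.0 * min(1.0, income / 35_000.0)
        + 35.0 * supply_deficit(catchment_courts, city_venues,
                                catchment_population, nearest_km)
        + 10.0 * min(1.0, tennis / 50.0)
        # income relative to build costs; PLI is an EU27 = 100 index
        + 5.0 * min(1.0, income / 35_000.0 / max(0.5, pli / 100.0))
        + 10.0 * validation
    )
    return round(total, 1)


# Hypothetical zero-court location in a country with no market score yet.
example = opportunity_score_v5(
    catchment_population=500_000, median_income_pps=35_000.0,
    catchment_courts=0, city_venues=0, nearest_km=None,
    tennis_courts_within_25km=50, pli_construction=100.0,
    country_avg_market_score=None,
)
print(example)
```

Taking the maximum rather than the sum means a location that is both under-supplied and far from the nearest court is credited once for that single underlying signal, which is the double-count the v5 change targets.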
--- a/web/src/padelnomics/admin/pipeline_routes.py
+++ b/web/src/padelnomics/admin/pipeline_routes.py
@@ -111,7 +111,7 @@ _DAG: dict[str, list[str]] = {
    "fct_daily_availability": ["fct_availability_slot", "dim_venue_capacity"],
    # Serving
    "venue_pricing_benchmarks": ["fct_daily_availability"],
-    "location_profiles": ["dim_locations", "dim_cities", "venue_pricing_benchmarks"],
+    "location_profiles": ["dim_locations", "dim_cities", "dim_countries", "venue_pricing_benchmarks"],
    "planner_defaults": ["venue_pricing_benchmarks", "location_profiles"],
    "pseo_city_costs_de": [
        "location_profiles", "planner_defaults",
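
A quick way to sanity-check a dependency map like `_DAG` is to sort it topologically. The sketch below is an assumption-laden illustration: `dag` is a trimmed copy of only the entries visible in this hunk (not the full `_DAG`), and it uses `graphlib.TopologicalSorter` from the Python standard library (3.9+), which the admin code itself may or may not use.

```python
from graphlib import TopologicalSorter

# Trimmed, hypothetical copy of the entries shown in the diff:
# each key maps a model to the upstream models it depends on.
dag: dict[str, list[str]] = {
    "fct_daily_availability": ["fct_availability_slot", "dim_venue_capacity"],
    "venue_pricing_benchmarks": ["fct_daily_availability"],
    "location_profiles": [
        "dim_locations", "dim_cities", "dim_countries", "venue_pricing_benchmarks",
    ],
    "planner_defaults": ["venue_pricing_benchmarks", "location_profiles"],
}

# static_order() yields every node after all of its dependencies and
# raises graphlib.CycleError if the map contains a cycle.
order = list(TopologicalSorter(dag).static_order())

for model in order:
    print(model)
```

With `dim_countries` listed as an upstream, any ordering produced this way places it before `location_profiles`, which matches the new `LEFT JOIN foundation.dim_countries` in the model.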
Author	SHA1	Message	Date
Deeman	c3847bb617	merge: Market Score v4 + Opportunity Score v5 (all checks successful: CI / test in 55s, CI / tag in 2s)	2026-03-08 15:32:26 +01:00
Deeman	fcef47cb22	chore: update CHANGELOG + admin dependency graph for score v4/v5 - CHANGELOG.md: document Market Score v4 and Opportunity Score v5 changes - pipeline_routes.py: add dim_countries to location_profiles dependency list Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 15:32:06 +01:00
Deeman	118c2c0fc7	feat(scoring): Opportunity Score v4 → v5 — fix correlated components - Merge supply gap (30pts) + catchment gap (15pts) → supply deficit (35pts, GREATEST) Eliminates ~80% correlated double-count on a single signal. - Add sports culture signal (10pts): tennis court density as racquet-sport adoption proxy. Ceiling 50 courts/25km. Harmless when tennis data is zero (contributes 0). - Add construction affordability (5pts): income relative to PLI construction costs. Joins dim_countries.pli_construction. High income + low build cost = high score. - Reduce economic power from 20 → 15pts to make room. New weights: addressable market 25, economic power 15, supply deficit 35, sports culture 10, construction affordability 5, market validation 10. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 15:30:04 +01:00
Deeman	cd6d950233	feat(scoring): Market Score v3 → v4 — fix Spain underscoring - Lower count gate threshold: 5 → 3 venues (3 establishes a market pattern) - Lower density ceiling: LN(21) → LN(11) (10/100k is reachable for mature markets) - Better demand fallback: 0.4 → 0.65 multiplier + 0.3 floor (venues = demand evidence) - Fix economic context: income/200 → income/25000 (actual discrimination vs free 10 pts) Expected: Spain avg market score rises from ~54 to ~65-75. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 15:22:48 +01:00