Compare commits

7 Commits

Author SHA1 Message Date
Deeman
9dc705970e merge: Opportunity Score v8 — better spread/discrimination
All checks were successful
CI / test (push) Successful in 54s
CI / tag (push) Successful in 3s
# Conflicts:
#	CHANGELOG.md
2026-03-09 22:24:43 +01:00
Deeman
9c5bed01f5 docs: add Score v8 entry to CHANGELOG
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 22:15:34 +01:00
Deeman
3ce97cd41b docs(i18n): update methodology weights for Score v8
Addressable Market 20→15, Economic Power 15→10, Supply Deficit 40→50.
Update scaling description (LN/500K → SQRT/1M) and add existence
dampener explanation to supply deficit description.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 22:15:18 +01:00
Deeman
ff6401254a feat(score): Opportunity Score v8 — better spread/discrimination
Reweight: addressable market 20→15, economic power 15→10, supply deficit 40→50.
Supply deficit existence dampener (country_venues/50, floor 0.1): zero-venue
countries drop from ~80 to ~17. Steeper addressable market curve (LN/500K →
SQRT/1M). NULL distance gap → 0.0 (was 0.5). Added country_percentile output
column (PERCENT_RANK within country, 0–100).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 22:14:30 +01:00
Deeman
487722c2f3 chore: changelog + fix stg_population_geonames unicode escapes
All checks were successful
CI / test (push) Successful in 54s
CI / tag (push) Successful in 3s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 20:32:49 +01:00
Deeman
23c7570736 merge: Opportunity Score v7 calibration fix
2026-03-09 18:12:47 +01:00
Deeman
e39dd4ec0b fix(score): Opportunity Score v7 — calibration fix for saturated markets
Two fixes:
1. dim_locations now sources venues from dim_venues (deduplicated OSM + Playtomic)
   instead of stg_padel_courts (OSM only). Playtomic-only venues are no longer
   invisible to spatial lookups.
2. Country-level supply saturation dampener on supply deficit component.
   Saturated countries (Spain 7.4/100k) get dampened supply deficit (x0.30 → 12 pts max).
   Emerging markets (Germany 0.24/100k) nearly unaffected (x0.98 → ~39 pts).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 18:03:14 +01:00
6 changed files with 65 additions and 32 deletions

View File

@@ -7,10 +7,13 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
## [Unreleased]
### Changed
- **Opportunity Score v7 → v8** — better spread and discrimination across the full 0-100 range. Addressable market weight reduced (20→15 pts) with steeper sqrt curve (ceiling 1M, was LN/500K). Economic power reduced (15→10 pts). Supply deficit increased (40→50 pts) with market existence dampener: countries with zero padel venues get max 5 pts supply deficit (factor 0.1), scaling linearly to full credit at 50+ venues. NULL nearest-court distance now treated as 0 (assume nearby) instead of 0.5. Added `country_percentile` output column (PERCENT_RANK within country). Target: P5-P95 spread ≥40 pts (was 22), zero-venue countries avg <30.
- **Opportunity Score v6 → v7 (calibration fix)** — two fixes for inflated scores in saturated markets. (1) `dim_locations` now sources venue coordinates from `dim_venues` (deduplicated OSM + Playtomic) instead of `stg_padel_courts` (OSM only), making Playtomic-only venues visible to spatial lookups. (2) Country-level supply saturation dampener on the 40-pt supply deficit component: saturated countries (Spain ~4.5/100k) get dampened supply deficit (×0.55 → 22 pts max), emerging markets (Germany ~0.7/100k) are nearly unaffected (×0.93 → ~37 pts).
- **Single-score simplification** — consolidated two public-facing scores (Market Score + Opportunity Score) into one **Padelnomics Score** (internally: `opportunity_score`). All maps, tooltips, article templates, and the methodology page now show a single score. Dual-ring markers reverted to single-color markers. `/market-score` route renamed to `/padelnomics-score` (old URL 301-redirects). All `mscore_*` i18n keys replaced with `pnscore_*`. Business plan queries `opportunity_score` from `location_profiles` (replaces legacy `city_market_overview` view). Map tooltip strings now i18n'd via `window.__MAP_T` (12 keys, EN + DE).
### Fixed
- **Non-Latin city names on map** — GeoNames entries with CJK/Cyrillic/Arabic characters (e.g. "Seelow" showing Japanese) now filtered in `stg_population_geonames` via Latin-only regex.
- **GeoNames regex DuckDB compatibility** — replaced Python-style `\u00C0` Unicode escapes in `stg_population_geonames` regex with literal Unicode characters (`À-ɏḀ-ỿ`) for DuckDB compatibility.
- **Score range safety** — `location_profiles` clamps both scores to 0-100 via `LEAST/GREATEST`.
- **Pipeline cast fix** — `venue_pricing_benchmarks.sql` defensively casts `snapshot_date` VARCHAR to DATE.
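
To make the addressable-market change in the v8 entry concrete, here is a minimal Python sketch of both curves. The formulas are transcribed from the SQL later in this diff; the function names and the sample populations are hypothetical and only serve as illustration.

```python
import math


def addressable_market_v7(catchment_population: float) -> float:
    """v7 curve: 20 pts, log-scaled, ceiling 500K (from the removed SQL line)."""
    pop = max(catchment_population, 1)
    return 20.0 * min(1.0, math.log(pop) / math.log(500_000))


def addressable_market_v8(catchment_population: float) -> float:
    """v8 curve: 15 pts, sqrt-scaled, ceiling 1M (from the added SQL line)."""
    pop = max(catchment_population, 1)
    return 15.0 * min(1.0, math.sqrt(pop / 1_000_000.0))


# Hypothetical sample populations, not taken from the dataset.
for pop in (50_000, 250_000, 1_000_000):
    v7 = round(addressable_market_v7(pop), 1)
    v8 = round(addressable_market_v8(pop), 1)
    print(f"{pop:>9,}  v7={v7:>4}  v8={v8:>4}")
```

Under these formulas a 50K catchment keeps roughly 82% of the log-scaled component but only about 22% of the sqrt-scaled one, which is consistent with the stated goal of spreading scores across more of the range.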

View File

@@ -9,7 +9,7 @@
-- foundation.dim_countries → country_name_en, country_slug, median_income_pps
-- stg_nuts2_boundaries + stg_regional_income → EU NUTS-2/NUTS-1 income (spatial join)
-- stg_income_usa → US state-level income (PPS-normalised)
-- stg_padel_courts → padel venue count + nearest court distance (km)
-- foundation.dim_venues → padel venue count + nearest court distance (km)
-- stg_tennis_courts → tennis court count within 25km radius
--
-- Income resolution cascade:
@@ -137,10 +137,12 @@ us_income AS (
PARTITION BY m.admin1_code ORDER BY s.ref_year DESC
) = 1
),
-- Padel court lat/lon for distance and density calculations
-- Padel venue lat/lon for distance and density calculations.
-- Uses dim_venues (deduplicated OSM + Playtomic) instead of stg_padel_courts (OSM only)
-- so Playtomic-only venues are visible to spatial lookups.
padel_courts AS (
SELECT lat, lon, country_code
FROM staging.stg_padel_courts
FROM foundation.dim_venues
WHERE lat IS NOT NULL AND lon IS NOT NULL
),
-- Nearest padel court distance per location (bbox pre-filter → exact sphere distance)

View File

@@ -19,19 +19,23 @@
-- 10 pts economic context — income PPS normalised to 25,000 ceiling
-- 10 pts data quality — completeness discount
--
-- Padelnomics Opportunity Score (Marktpotenzial-Score v6, 0–100):
-- Padelnomics Opportunity Score (Marktpotenzial-Score v8, 0–100):
-- "Where should I build a padel court?"
-- Computed for ALL locations — zero-court locations score highest on supply deficit.
-- H3 catchment methodology: addressable market and supply deficit use a regional
-- H3 catchment (res-5 cell + 6 neighbours, ~24km radius).
--
-- v6 changes: lower density ceiling 8→5/100k (saturated markets hit zero-gap sooner),
-- increase supply deficit weight 35→40 pts, reduce addressable market 25→20 pts,
-- invert market validation (high country maturity = LESS opportunity).
-- v8 changes: better spread/discrimination.
-- - Reweight: addressable market 20→15, economic power 15→10, supply deficit 40→50.
-- - Supply deficit existence dampener: country_venues/50 factor (0.1–1.0).
-- Zero-venue countries get max 5 pts supply deficit (was 50).
-- - Steeper addressable market curve: LN/500K → SQRT/1M.
-- - NULL distance gap → 0.0 (was 0.5). Unknown = assume nearby.
-- - Added country_percentile output column (PERCENT_RANK within country).
--
-- 20 pts addressable market — log-scaled catchment population, ceiling 500K
-- 15 pts economic power — income PPS, normalised to 35,000
-- 40 pts supply deficit — max(density gap, distance gap); eliminates double-count
-- 15 pts addressable market — sqrt-scaled catchment population, ceiling 1M
-- 10 pts economic power — income PPS, normalised to 35,000
-- 50 pts supply deficit — max(density gap, distance gap) × existence dampener
-- 10 pts sports culture — tennis court density as racquet-sport adoption proxy
-- 5 pts construction affordability — income relative to construction costs (PLI)
-- 10 pts market headroom — inverse country-level avg market maturity
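
The v8 supply deficit component described in the header comment above can be sketched in Python as follows. This is an illustration, not the pipeline code: the function names are hypothetical, the arithmetic mirrors the SQL in the next hunk, and NULL court counts are assumed to be passed in as 0.

```python
from typing import Optional


def existence_dampener(country_venues: Optional[int]) -> float:
    """country_venues / 50, clamped to [0.1, 1.0]; a missing count is treated as 0."""
    return max(0.1, min(1.0, (country_venues or 0) / 50.0))


def supply_deficit_points(
    catchment_population: float,
    catchment_padel_courts: int,
    city_padel_venue_count: int,
    nearest_padel_court_km: Optional[float],
    country_venues: Optional[int],
) -> float:
    """Supply deficit (max 50 pts): max(density gap, distance gap) x dampener."""
    # Density gap: 0 courts per 100K -> 1.0, 5 or more per 100K -> 0.0.
    if catchment_population > 0:
        courts = max(catchment_padel_courts, city_padel_venue_count)
        per_100k = courts / catchment_population * 100_000
    else:
        per_100k = 0.0
    density_gap = max(0.0, 1.0 - per_100k / 5.0)

    # Distance gap: 30 km or more -> 1.0; unknown distance -> 0.0 in v8 (was 0.5).
    if nearest_padel_court_km is None:
        distance_gap = 0.0
    else:
        distance_gap = min(1.0, nearest_padel_court_km / 30.0)

    return 50.0 * max(density_gap, distance_gap) * existence_dampener(country_venues)


# A location with no courts at all, in countries with 0, 10 and 50 venues.
for venues in (0, 10, 50):
    print(venues, round(supply_deficit_points(100_000, 0, 0, None, venues), 1))
```

With a full gap of 1.0 the three cases come out at 5, 10 and 50 points, matching the worked values listed in the SQL comments.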
@@ -209,27 +213,47 @@ country_market AS (
WHERE market_score > 0
GROUP BY country_code
),
-- Step 3: add opportunity_score using country market validation signal.
-- Step 3: country-level supply saturation — venues per 100K at the country level.
-- Used to dampen supply deficit in saturated markets (Spain, Sweden).
country_supply AS (
SELECT
country_code,
SUM(city_padel_venue_count) AS country_venues,
SUM(population) AS country_pop,
CASE WHEN SUM(population) > 0
THEN SUM(city_padel_venue_count) * 100000.0 / SUM(population)
ELSE 0
END AS venues_per_100k
FROM foundation.dim_cities
WHERE population > 0
GROUP BY country_code
),
-- Step 4: add opportunity_score using country market validation + supply saturation.
scored AS (
SELECT ms.*,
-- ── Opportunity Score (Marktpotenzial-Score v6, H3 catchment) ──────────
-- ── Opportunity Score (Marktpotenzial-Score v8, H3 catchment) ──────────
ROUND(
-- Addressable market (20 pts): log-scaled catchment population, ceiling 500K
20.0 * LEAST(1.0, LN(GREATEST(catchment_population, 1)) / LN(500000))
-- Economic power (15 pts): income PPS normalised to 35,000
+ 15.0 * LEAST(1.0, COALESCE(median_income_pps, 15000) / 35000.0)
-- Supply deficit (40 pts): max of density gap and distance gap.
-- Ceiling 5/100k (down from 8): Spain at 6-16/100k now hits zero-gap.
+ 40.0 * GREATEST(
-- Addressable market (15 pts): sqrt-scaled catchment population, ceiling 1M
15.0 * LEAST(1.0, SQRT(GREATEST(catchment_population, 1) / 1000000.0))
-- Economic power (10 pts): income PPS normalised to 35,000
+ 10.0 * LEAST(1.0, COALESCE(median_income_pps, 15000) / 35000.0)
-- Supply deficit (50 pts): max of density gap and distance gap.
-- Dampened by market existence: country_venues/50 (0.1–1.0).
-- 0 venues in country → factor 0.1 → max 5 pts supply deficit
-- 10 venues → 0.2 → max 10 pts
-- 50+ venues → 1.0 → full credit
+ 50.0 * GREATEST(
-- density-based gap (H3 catchment): 0 courts = 1.0, 5/100k = 0.0
GREATEST(0.0, 1.0 - COALESCE(
CASE WHEN catchment_population > 0
THEN GREATEST(catchment_padel_courts, COALESCE(city_padel_venue_count, 0))::DOUBLE / catchment_population * 100000
ELSE 0.0
END, 0.0) / 5.0),
-- distance-based gap: 30km+ = 1.0, 0km = 0.0; NULL = 0.5
COALESCE(LEAST(1.0, nearest_padel_court_km / 30.0), 0.5)
-- distance-based gap: 30km+ = 1.0, 0km = 0.0; NULL = 0.0 (assume nearby)
COALESCE(LEAST(1.0, nearest_padel_court_km / 30.0), 0.0)
)
-- Market existence dampener: zero-venue countries get 0.1, 50+ venues = 1.0
* GREATEST(0.1, LEAST(1.0, COALESCE(cs.country_venues, 0) / 50.0))
-- Sports culture (10 pts): tennis density as racquet-sport adoption proxy.
-- Ceiling 50 courts within 25km. Harmless when tennis data is zero (contributes 0).
+ 10.0 * LEAST(1.0, COALESCE(tennis_courts_within_25km, 0) / 50.0)
@@ -247,6 +271,7 @@ scored AS (
, 1) AS opportunity_score
FROM market_scored ms
LEFT JOIN country_market cm ON ms.country_code = cm.country_code
LEFT JOIN country_supply cs ON ms.country_code = cs.country_code
)
SELECT
s.geoname_id,
@@ -280,6 +305,9 @@ SELECT
END AS catchment_venues_per_100k,
LEAST(GREATEST(s.market_score, 0), 100) AS market_score,
LEAST(GREATEST(s.opportunity_score, 0), 100) AS opportunity_score,
ROUND(PERCENT_RANK() OVER (
PARTITION BY s.country_code ORDER BY s.opportunity_score
) * 100, 0) AS country_percentile,
s.median_hourly_rate,
s.median_peak_rate,
s.median_offpeak_rate,
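
For readers unfamiliar with `PERCENT_RANK`, the new `country_percentile` column can be approximated in Python as below. This is a sketch under assumptions: the helper name is hypothetical, it handles a single country's scores at a time, and Python's `round` (half to even) may differ from SQL `ROUND` on exact .5 values.

```python
from typing import List, Sequence


def country_percentile(scores: Sequence[float]) -> List[int]:
    """PERCENT_RANK over one country's scores, scaled to 0-100.

    PERCENT_RANK is (rank - 1) / (rows - 1), where tied rows share the
    lowest rank of their group; a single-row partition yields 0.
    """
    n = len(scores)
    if n <= 1:
        return [0 for _ in scores]
    result = []
    for score in scores:
        # Rank = 1 + number of rows with a strictly lower score.
        rank = 1 + sum(1 for other in scores if other < score)
        result.append(round((rank - 1) / (n - 1) * 100))
    return result


# Hypothetical opportunity scores for four locations in one country.
print(country_percentile([10.0, 20.0, 20.0, 40.0]))
```

The lowest-scoring location in a country always lands at 0 and the highest at 100, so the column reads as a within-country ranking rather than an absolute score.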

View File

@@ -40,4 +40,4 @@ WHERE geoname_id IS NOT NULL
AND lon IS NOT NULL
-- Reject names with non-Latin characters (CJK, Cyrillic, Arabic, Thai, etc.)
-- Allows ASCII + Latin Extended (diacritics: ÄÖÜ, àéî, ñ, ø, etc.)
AND regexp_matches(city_name, '^[\x20-\x7E\u00C0-\u024F\u1E00-\u1EFF]+$')
AND regexp_matches(city_name, '^[\x20-\x7EÀ-ɏḀ-ỿ]+$')
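
The character class in this filter can be tried out in Python, whose `re` module accepts both the `\uXXXX` escapes and the literal characters. The sketch below only illustrates which names the class admits; it says nothing about how DuckDB itself parses the pattern, and the helper name and sample city names are hypothetical.

```python
import re

# Escaped spelling, as in the removed line (valid in Python's re module).
ESCAPED = re.compile(r"[\x20-\x7E\u00C0-\u024F\u1E00-\u1EFF]+")

# Literal spelling, as in the added line, built from code points so the
# endpoints of each range are explicit.
LITERAL = re.compile(
    "[\x20-\x7E"
    + chr(0x00C0) + "-" + chr(0x024F)   # Latin-1 letters through Latin Extended-B
    + chr(0x1E00) + "-" + chr(0x1EFF)   # Latin Extended Additional
    + "]+"
)


def is_latin_name(name: str, pattern: re.Pattern = LITERAL) -> bool:
    """True if every character is printable ASCII or in the Latin ranges above."""
    return pattern.fullmatch(name) is not None


for city in ("Seelow", "München", "Kraków", "Москва", "東京"):
    print(city, is_latin_name(city), is_latin_name(city, ESCAPED))
```

Both spellings describe the same set of code points, so in Python they accept and reject the same names; the commit only changes how that set is written.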

View File

@@ -1711,12 +1711,12 @@
"pnscore_what_intro": "Der Padelnomics Score ist ein Komposit-Index von 0 bis 100, der bewertet, wie attraktiv ein Standort für eine neue Padelanlage ist. Er kombiniert angebotsseitige Lücken (gibt es genug Courts?) mit nachfrageseitigen Signalen (Bevölkerung, Einkommen, Sportaffinität) und berücksichtigt die Marktreife. Ein hoher Score bedeutet: Es gibt adressierbare Nachfrage, das Gebiet ist unterversorgt und die Rahmenbedingungen begünstigen ein Investment.",
"pnscore_components_h2": "Was der Score misst",
"pnscore_components_intro": "Sechs gewichtete Komponenten fließen in den Gesamtscore ein. Jede erfasst einen anderen Aspekt des Investitionspotenzials.",
"pnscore_cat_market_h3": "Adressierbarer Markt (20 Pkt)",
"pnscore_cat_market_p": "Einzugsgebiet-Bevölkerung im Umkreis von ~24 km (H3 Res-5-Zelle + Nachbarn). Logarithmisch skaliert — eine Stadt mit 500K Einwohnern erreicht das Maximum. Größeres Einzugsgebiet bedeutet mehr potenzielle Spieler.",
"pnscore_cat_econ_h3": "Wirtschaftskraft (15 Pkt)",
"pnscore_cat_market_h3": "Adressierbarer Markt (15 Pkt)",
"pnscore_cat_market_p": "Einzugsgebiet-Bevölkerung im Umkreis von ~24 km (H3 Res-5-Zelle + Nachbarn). Wurzelskaliert — ein Einzugsgebiet von 1 Mio. erreicht das Maximum. Größeres Einzugsgebiet bedeutet mehr potenzielle Spieler.",
"pnscore_cat_econ_h3": "Wirtschaftskraft (10 Pkt)",
"pnscore_cat_econ_p": "Regionales Einkommen in Kaufkraftstandards (KKS). Höheres verfügbares Einkommen stützt Premium-Preise und häufigeres Spielen. Daten von Eurostat (EU), Census (USA), ONS (UK).",
"pnscore_cat_gap_h3": "Versorgungslücke (40 Pkt)",
"pnscore_cat_gap_p": "Die gewichtigste Komponente. Misst zwei Signale: Anlagendichte-Lücke (wie weit unter 5 Courts pro 100K?) und Entfernungslücke (wie weit zur nächsten Anlage?). Null Courts = maximale Punktzahl. Bereits gut versorgte Gebiete erhalten kaum Punkte.",
"pnscore_cat_gap_h3": "Versorgungslücke (50 Pkt)",
"pnscore_cat_gap_p": "Die gewichtigste Komponente. Misst zwei Signale: Anlagendichte-Lücke (wie weit unter 5 Courts pro 100K?) und Entfernungslücke (wie weit zur nächsten Anlage?). Gedämpft nach Marktreife — Länder mit wenigen oder keinen Padel-Anlagen erhalten reduzierten Punktwert, da eine Versorgungslücke ohne nachgewiesene Nachfrage spekulativ ist. Voller Punktwert erst ab 50+ Anlagen im Land.",
"pnscore_cat_sports_h3": "Sportaffinität (10 Pkt)",
"pnscore_cat_sports_p": "Tennisplatz-Dichte im Umkreis von 25 km als Proxy für Racketsport-Affinität. Regionen mit starker Tennis-Infrastruktur haben ein bereites Publikum für Padel — einen eng verwandten Sport mit niedrigerer Einstiegshürde.",
"pnscore_cat_catchment_h3": "Baukosten-Erschwinglichkeit (5 Pkt)",

View File

@@ -1742,12 +1742,12 @@
"pnscore_what_intro": "The Padelnomics Score is a 0-100 composite index that evaluates how attractive a location is for a new padel facility. It combines supply-side gaps (are there enough courts?) with demand-side signals (population, income, sports culture) and adjusts for market maturity. A high score means: there is addressable demand, the area is underserved, and conditions favor a new investment.",
"pnscore_components_h2": "What It Measures",
"pnscore_components_intro": "Six weighted components combine into the final score. Each captures a different aspect of investment potential.",
"pnscore_cat_market_h3": "Addressable Market (20 pts)",
"pnscore_cat_market_p": "Catchment population within ~24 km (H3 res-5 cell + neighbors). Log-scaled — a city of 500K scores the maximum. Larger catchment means more potential players.",
"pnscore_cat_econ_h3": "Economic Power (15 pts)",
"pnscore_cat_market_h3": "Addressable Market (15 pts)",
"pnscore_cat_market_p": "Catchment population within ~24 km (H3 res-5 cell + neighbors). Square-root scaled — a catchment of 1M scores the maximum. Larger catchment means more potential players.",
"pnscore_cat_econ_h3": "Economic Power (10 pts)",
"pnscore_cat_econ_p": "Regional income in purchasing power standard (PPS). Higher disposable income supports premium pricing and more frequent play. Data from Eurostat (EU), Census (US), ONS (UK).",
"pnscore_cat_gap_h3": "Supply Deficit (40 pts)",
"pnscore_cat_gap_p": "The single biggest component. Measures two signals: court density gap (how far below 5 courts per 100K?) and distance gap (how far to the nearest existing court?). Zero courts = maximum score. Already well-served areas score near zero.",
"pnscore_cat_gap_h3": "Supply Deficit (50 pts)",
"pnscore_cat_gap_p": "The single biggest component. Measures two signals: court density gap (how far below 5 courts per 100K?) and distance gap (how far to the nearest existing court?). Dampened by market existence — countries with few or no padel venues get reduced credit, since a supply gap without proven demand is speculative. Full credit requires 50+ venues nationally.",
"pnscore_cat_sports_h3": "Sports Culture (10 pts)",
"pnscore_cat_sports_p": "Tennis court density within 25 km as a proxy for racquet sport adoption. Regions with strong tennis infrastructure have a ready audience for padel — a closely related sport with a lower barrier to entry.",
"pnscore_cat_catchment_h3": "Construction Affordability (5 pts)",