Compare commits

6 Commits

Author SHA1 Message Date
Deeman
bf811444ba merge: Score v6 — World Bank global economic data for non-EU countries
All checks were successful
CI / test (push) Successful in 56s
CI / tag (push) Successful in 3s
2026-03-08 19:40:57 +01:00
Deeman
3c135051fd feat(scoring): Score v6 — World Bank global economic data for non-EU countries
Non-EU countries (AR, MX, AE, AU, etc.) previously got NULL for
median_income_pps and pli_construction, falling back to EU-calibrated
defaults (15K PPS, PLI=100) that produced wrong scores.

New World Bank WDI extractor fetches GNI per capita PPP and price level
ratio for 215 countries. dim_countries uses Germany as calibration anchor
to scale WB values into the Eurostat range (dynamic ratio, self-corrects
as both sources update). EU countries keep exact Eurostat values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 18:17:33 +01:00
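The calibration described in this commit can be sketched in a few lines of Python. This is a minimal illustration rather than the project's code: the function names and the Germany anchor values are hypothetical, and only the two ratios follow the `dim_countries` logic shown in the diff further down.

```python
def scale_income_to_pps(gni_ppp, de_eurostat_pps, de_gni_ppp):
    """Scale a World Bank GNI PPP value into the Eurostat PPS range via the DE ratio."""
    if gni_ppp is None or de_eurostat_pps is None or not de_gni_ppp:
        return None  # mirrors the NULL fall-through when Germany is missing from a source
    return round(gni_ppp * (de_eurostat_pps / de_gni_ppp))


def scale_price_ratio_to_pli(price_level_ratio, de_price_level_ratio, de_pli_construction):
    """Scale a World Bank price level ratio into the Eurostat PLI construction range."""
    if price_level_ratio is None or de_pli_construction is None or not de_price_level_ratio:
        return None
    return round(price_level_ratio / de_price_level_ratio * de_pli_construction, 1)


# Hypothetical anchor values for Germany (illustrative only, not real data).
DE_EUROSTAT_PPS = 25_000.0
DE_GNI_PPP = 75_000.0
DE_PRICE_LEVEL_RATIO = 0.8
DE_PLI_CONSTRUCTION = 120.0

# A hypothetical non-EU country with GNI PPP 30,000 and price level ratio 0.4.
print(scale_income_to_pps(30_000.0, DE_EUROSTAT_PPS, DE_GNI_PPP))
print(scale_price_ratio_to_pli(0.4, DE_PRICE_LEVEL_RATIO, DE_PLI_CONSTRUCTION))
```

Because both anchor values are read from the latest data on each run, the ratio shifts as the sources update, which is presumably what the commit message means by "self-corrects". Note that Python's `round` and SQL `ROUND` can differ on exact .5 ties.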
Deeman
c3847bb617 merge: Market Score v4 + Opportunity Score v5
All checks were successful
CI / test (push) Successful in 55s
CI / tag (push) Successful in 2s
2026-03-08 15:32:26 +01:00
Deeman
fcef47cb22 chore: update CHANGELOG + admin dependency graph for score v4/v5
- CHANGELOG.md: document Market Score v4 and Opportunity Score v5 changes
- pipeline_routes.py: add dim_countries to location_profiles dependency list

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 15:32:06 +01:00
Deeman
118c2c0fc7 feat(scoring): Opportunity Score v4 → v5 — fix correlated components
- Merge supply gap (30pts) + catchment gap (15pts) → supply deficit (35pts, GREATEST)
  Eliminates ~80% correlated double-count on a single signal.
- Add sports culture signal (10pts): tennis court density as racquet-sport adoption proxy.
  Ceiling 50 courts/25km. Harmless when tennis data is zero (contributes 0).
- Add construction affordability (5pts): income relative to PLI construction costs.
  Joins dim_countries.pli_construction. High income + low build cost = high score.
- Reduce economic power from 20 → 15pts to make room.

New weights: addressable market 25, economic power 15, supply deficit 35,
sports culture 10, construction affordability 5, market validation 10.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 15:30:04 +01:00
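To see why merging the two gap components removes the double-count, here is a small Python sketch comparing the old additive form with the new `GREATEST` form. The function names are hypothetical; the 8 courts per 100k and 30 km ceilings, and the 0.5 default for a missing distance, are taken from the SQL in the `location_profiles` diff below.

```python
# v5 weights as listed in the commit message.
WEIGHTS_V5 = {
    "addressable_market": 25,
    "economic_power": 15,
    "supply_deficit": 35,
    "sports_culture": 10,
    "construction_affordability": 5,
    "market_validation": 10,
}


def density_gap(courts, catchment_population, ceiling_per_100k=8.0):
    """1.0 when the catchment has no courts, 0.0 at or above the density ceiling."""
    if catchment_population <= 0:
        return 1.0
    per_100k = courts / catchment_population * 100_000
    return max(0.0, 1.0 - per_100k / ceiling_per_100k)


def distance_gap(nearest_km, ceiling_km=30.0):
    """1.0 at 30 km or more from the nearest court; 0.5 when the distance is unknown."""
    if nearest_km is None:
        return 0.5
    return min(1.0, nearest_km / ceiling_km)


def supply_points_v4(courts, population, nearest_km):
    # Old form: two correlated signals added together (30 + 15 = up to 45 pts).
    return 30.0 * density_gap(courts, population) + 15.0 * distance_gap(nearest_km)


def supply_points_v5(courts, population, nearest_km):
    # New form: the stronger of the two signals, counted once (up to 35 pts).
    return 35.0 * max(density_gap(courts, population), distance_gap(nearest_km))


print(sum(WEIGHTS_V5.values()))
print(supply_points_v4(0, 100_000, 30.0), supply_points_v5(0, 100_000, 30.0))
```

A location with no courts nearby scores full marks on both gaps, so the additive form rewarded the same fact twice; taking the maximum counts it once.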
Deeman
cd6d950233 feat(scoring): Market Score v3 → v4 — fix Spain underscoring
- Lower count gate threshold: 5 → 3 venues (3 establishes a market pattern)
- Lower density ceiling: LN(21) → LN(11) (10/100k is reachable for mature markets)
- Better demand fallback: 0.4 → 0.65 multiplier + 0.3 floor (venues = demand evidence)
- Fix economic context: income/200 → income/25000 (actual discrimination vs free 10 pts)

Expected: Spain avg market score rises from ~54 to ~65-75.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 15:22:48 +01:00
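The four calibration changes are easier to compare side by side. The following Python sketch re-expresses the v3 and v4 terms as plain functions; the function names are hypothetical, and the constants are the ones quoted in the commit message and the SQL diff below.

```python
import math


def supply_factor(venues, population, ln_ceiling, count_gate):
    """Log-scaled venue density (per 100k) multiplied by the venue-count gate."""
    density = venues / population * 100_000 if population > 0 else 0.0
    density_term = min(1.0, math.log(density + 1) / math.log(ln_ceiling))
    return density_term * min(1.0, venues / count_gate)


def supply_v3(venues, population):
    return supply_factor(venues, population, ln_ceiling=21, count_gate=5.0)


def supply_v4(venues, population):
    return supply_factor(venues, population, ln_ceiling=11, count_gate=3.0)


def demand_fallback_v3(venues, population):
    # Used only when no occupancy data is available.
    return 0.4 * supply_v3(venues, population)


def demand_fallback_v4(venues, population):
    # 0.65 multiplier with a 0.3 floor: existing venues count as demand evidence.
    return max(0.3, 0.65 * supply_v4(venues, population))


def economic_context_v3(income_pps):
    # With incomes in the thousands, income / 200 saturates at 1.0 for every market.
    return min(1.0, income_pps / 200.0)


def economic_context_v4(income_pps):
    return min(1.0, income_pps / 25_000.0)


# A hypothetical city: 3 venues, 30,000 inhabitants (10 venues per 100k).
print(supply_v3(3, 30_000), supply_v4(3, 30_000))
print(economic_context_v3(15_000), economic_context_v4(15_000))
```

Under the v3 constants the hypothetical city is penalised twice (density below the 20/100k ceiling and fewer than 5 venues), while under v4 it reaches the full supply factor.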
9 changed files with 310 additions and 44 deletions

@@ -7,6 +7,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 ## [Unreleased]
 
 ### Changed
+- **Score v6: Global economic data** — `dim_countries.median_income_pps` and `pli_construction` now cover all target markets, not just EU. World Bank WDI indicators (GNI per capita PPP + price level ratio) fill gaps for non-EU countries (AR, MX, AE, AU, etc.) with values calibrated to the Eurostat scale using Germany as anchor. EU countries keep exact Eurostat values. New extractor (`worldbank.py`), staging model (`stg_worldbank_income`), and `dim_countries` fallback CTEs. No changes to scoring formulas — the fix is upstream in the data layer.
+- **Market Score v3 → v4** — fixes Spain averaging 54 (should be 65-80). Four calibration changes: count gate threshold lowered from 5 → 3 venues (3 establishes a market pattern), density ceiling lowered from LN(21) → LN(11) (10/100k is reachable for mature markets), demand evidence fallback raised from 0.4 → 0.65 multiplier with 0.3 floor (existence of venues IS evidence of demand), economic context ceiling changed from income/200 → income/25000 (actual discrimination instead of free 10 pts for everyone).
+- **Opportunity Score v4 → v5** — fixes structural flaws: supply gap (30pts) + catchment gap (15pts) merged into single supply deficit (35pts, GREATEST of density gap and distance gap) eliminating ~80% correlated double-count. New sports culture signal (10pts) using tennis court density as racquet-sport adoption proxy. New construction affordability signal (5pts) using income relative to PLI construction costs from `dim_countries`. Economic power reduced from 20 → 15pts. New dependency on `foundation.dim_countries` for `pli_construction`.
 - **Unified `location_profiles` serving model** — merged `city_market_profile` and `location_opportunity_profile` into a single `serving.location_profiles` table at `(country_code, geoname_id)` grain. Both Marktreife-Score (Market Score) and Marktpotenzial-Score (Opportunity Score) are now computed per location. City data enriched via LEFT JOIN `dim_cities` on `geoname_id`. Downstream models (`planner_defaults`, `pseo_city_costs_de`, `pseo_city_pricing`) updated to query `location_profiles` directly. `city_padel_venue_count` (exact from dim_cities) distinguished from `padel_venue_count` (spatial 5km from dim_locations).
 - **Both scores on all map tooltips** — country map shows avg Market Score + avg Opportunity Score; city map shows Market Score + Opportunity Score per city; opportunity map shows Opportunity Score + Market Score per location. All score labels use the trademarked "Padelnomics Market Score" / "Padelnomics Opportunity Score" names.
 - **API endpoints** — `/api/markets/countries.json` adds `avg_opportunity_score`; `/api/markets/<country>/cities.json` adds `opportunity_score`; `/api/opportunity/<country>.json` adds `market_score`.

@@ -22,6 +22,7 @@ extract-census-usa-income = "padelnomics_extract.census_usa_income:main"
 extract-ons-uk = "padelnomics_extract.ons_uk:main"
 extract-geonames = "padelnomics_extract.geonames:main"
 extract-gisco = "padelnomics_extract.gisco:main"
+extract-worldbank = "padelnomics_extract.worldbank:main"

 [build-system]
 requires = ["hatchling"]

@@ -7,7 +7,7 @@ A graphlib.TopologicalSorter schedules them: tasks with no unmet dependencies
 run immediately in parallel; each completion may unlock new tasks.

 Current dependency graph:
-  - All 9 non-availability extractors have no dependencies (run in parallel)
+  - All 10 non-availability extractors have no dependencies (run in parallel)
   - playtomic_availability depends on playtomic_tenants (starts as soon as
     tenants finishes, even if other extractors are still running)
 """
@@ -38,6 +38,8 @@ from .playtomic_availability import EXTRACTOR_NAME as AVAILABILITY_NAME
 from .playtomic_availability import extract as extract_availability
 from .playtomic_tenants import EXTRACTOR_NAME as TENANTS_NAME
 from .playtomic_tenants import extract as extract_tenants
+from .worldbank import EXTRACTOR_NAME as WORLDBANK_NAME
+from .worldbank import extract as extract_worldbank

 logger = setup_logging("padelnomics.extract")
@@ -54,6 +56,7 @@ EXTRACTORS: dict[str, tuple] = {
     GEONAMES_NAME: (extract_geonames, []),
     GISCO_NAME: (extract_gisco, []),
     TENANTS_NAME: (extract_tenants, []),
+    WORLDBANK_NAME: (extract_worldbank, []),
     AVAILABILITY_NAME: (extract_availability, [TENANTS_NAME]),
 }
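The module docstring says a `graphlib.TopologicalSorter` schedules the extractors. A minimal sketch of how the new `worldbank` entry fits into that graph is shown here; it assumes a simplified name-to-dependencies mapping with a hypothetical subset of extractor names, and it groups tasks into sequential waves instead of running them in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical subset of the EXTRACTORS mapping: name -> names it depends on.
# Only "worldbank" is confirmed by the diff; the other strings are assumed.
DEPENDENCIES = {
    "geonames": [],
    "gisco": [],
    "playtomic_tenants": [],
    "worldbank": [],
    "playtomic_availability": ["playtomic_tenants"],
}


def run_waves(dependencies):
    """Return extractor names grouped into waves that could run concurrently."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)  # marking tasks done may unlock new ones
    return waves


for number, wave in enumerate(run_waves(DEPENDENCIES), start=1):
    print(number, wave)
```

The wave grouping is coarser than the behaviour the docstring describes: a real scheduler would call `done()` per task as it finishes, so `playtomic_availability` could start as soon as `playtomic_tenants` completes, without waiting for the rest of the first wave.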

@@ -0,0 +1,153 @@
"""World Bank WDI extractor — GNI per capita PPP and price level ratio.
Fetches two indicators (one API call each, no key required):
- NY.GNP.PCAP.PP.CD — GNI per capita, PPP (international $)
- PA.NUS.PPPC.RF — Price level ratio (PPP conversion factor / exchange rate)
These provide global fallbacks behind Eurostat for dim_countries.median_income_pps
and dim_countries.pli_construction (see dim_countries.sql for calibration logic).
API: World Bank API v2 — https://api.worldbank.org/v2/
No API key required. No env vars.
Landing: {LANDING_DIR}/worldbank/{year}/{month}/wdi_indicators.json.gz
Output: {"rows": [{"country_code": "DE", "indicator": "NY.GNP.PCAP.PP.CD",
"ref_year": 2023, "value": 74200.0}, ...], "count": N}
"""
import json
import sqlite3
from pathlib import Path
import niquests
from ._shared import HTTP_TIMEOUT_SECONDS, run_extractor, setup_logging
from .utils import get_last_cursor, landing_path, write_gzip_atomic
logger = setup_logging("padelnomics.extract.worldbank")
EXTRACTOR_NAME = "worldbank"
INDICATORS = ["NY.GNP.PCAP.PP.CD", "PA.NUS.PPPC.RF"]
# 7 calendar years of data (2019-2025 inclusive); the latest year per country is
# selected downstream (latest_wb CTE in dim_countries.sql)
DATE_RANGE = "2019:2025"
MAX_PER_PAGE = 5000
MAX_PAGES = 3
WDI_BASE_URL = "https://api.worldbank.org/v2/country/all/indicator"
# WB aggregate codes that look like real 2-letter country codes.
# These are regional/income-group aggregates, not actual countries.
_WB_AGGREGATE_CODES = frozenset({
    "EU", "OE",
    "XC", "XD", "XE", "XF", "XG", "XH", "XI", "XJ", "XL", "XM",
    "XN", "XO", "XP", "XQ", "XR", "XS", "XT", "XU", "XV", "XY",
    "ZF", "ZG", "ZH", "ZI", "ZJ", "ZQ", "ZT",
    "V1", "V2", "V3", "V4",
})


def _normalize_country_code(wb_code: str) -> str | None:
    """Normalize WB country code to ISO alpha-2. Returns None for aggregates."""
    code = wb_code.strip().upper()
    if len(code) != 2:
        return None
    # Reject codes starting with a digit (e.g. "1W" for World)
    if code[0].isdigit():
        return None
    if code in _WB_AGGREGATE_CODES:
        return None
    return code


def _fetch_indicator(
    session: niquests.Session,
    indicator: str,
) -> list[dict]:
    """Fetch all records for one indicator. Returns list of row dicts."""
    rows: list[dict] = []
    page = 1

    while page <= MAX_PAGES:
        url = (
            f"{WDI_BASE_URL}/{indicator}"
            f"?format=json&date={DATE_RANGE}&per_page={MAX_PER_PAGE}&page={page}"
        )
        logger.info("GET %s page %d", indicator, page)
        resp = session.get(url, timeout=HTTP_TIMEOUT_SECONDS * 2)
        resp.raise_for_status()

        data = resp.json()
        assert isinstance(data, list) and len(data) == 2, (
            f"unexpected WB response shape for {indicator}: {type(data)}, len={len(data)}"
        )
        meta, records = data
        total_pages = meta.get("pages", 1)

        if records is None:
            logger.warning("WB returned null data for %s page %d", indicator, page)
            break

        for record in records:
            value = record.get("value")
            if value is None:
                continue
            country_code = _normalize_country_code(record["country"]["id"])
            if country_code is None:
                continue
            rows.append({
                "country_code": country_code,
                "indicator": indicator,
                "ref_year": int(record["date"]),
                "value": float(value),
            })

        if page >= total_pages:
            break
        page += 1

    return rows


def extract(
    landing_dir: Path,
    year_month: str,
    conn: sqlite3.Connection,
    session: niquests.Session,
) -> dict:
    """Fetch WDI indicators. Skips if already run this month."""
    last_cursor = get_last_cursor(conn, EXTRACTOR_NAME)
    if last_cursor == year_month:
        logger.info("already have data for %s — skipping", year_month)
        return {"files_written": 0, "files_skipped": 1, "bytes_written": 0}

    rows: list[dict] = []
    for indicator in INDICATORS:
        indicator_rows = _fetch_indicator(session, indicator)
        logger.info("%s: %d records", indicator, len(indicator_rows))
        rows.extend(indicator_rows)

    assert len(rows) >= 200, f"expected ≥200 WB records, got {len(rows)} — API may have changed"
    logger.info("total: %d WDI records", len(rows))

    year, month = year_month.split("/")
    dest_dir = landing_path(landing_dir, "worldbank", year, month)
    dest = dest_dir / "wdi_indicators.json.gz"
    payload = json.dumps({"rows": rows, "count": len(rows)}).encode()
    bytes_written = write_gzip_atomic(dest, payload)
    logger.info("written %s bytes compressed", f"{bytes_written:,}")

    return {
        "files_written": 1,
        "files_skipped": 0,
        "bytes_written": bytes_written,
        "cursor_value": year_month,
    }


def main() -> None:
    run_extractor(EXTRACTOR_NAME, extract)


if __name__ == "__main__":
    main()
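The record filtering in `_fetch_indicator` can be exercised offline, without calling the API. The sketch below is a standalone approximation: `parse_records`, the shortened aggregate set, and the sample payload are all hypothetical, and only the record shape (`country.id`, `date`, `value`) follows what the extractor reads.

```python
# Shortened, hypothetical stand-in for _WB_AGGREGATE_CODES.
AGGREGATES = frozenset({"EU", "OE", "XC", "ZJ"})


def parse_records(records, indicator):
    """Keep country-level rows with a value; drop aggregates and null values."""
    rows = []
    for record in records or []:  # a null page is treated as empty
        value = record.get("value")
        code = record["country"]["id"].strip().upper()
        if value is None:
            continue
        if len(code) != 2 or code[0].isdigit() or code in AGGREGATES:
            continue
        rows.append({
            "country_code": code,
            "indicator": indicator,
            "ref_year": int(record["date"]),
            "value": float(value),
        })
    return rows


# Hypothetical page content in the shape the extractor expects.
SAMPLE = [
    {"country": {"id": "DE"}, "date": "2023", "value": 74200},
    {"country": {"id": "1W"}, "date": "2023", "value": 22000},  # World aggregate
    {"country": {"id": "EU"}, "date": "2023", "value": 60000},  # regional aggregate
    {"country": {"id": "AR"}, "date": "2023", "value": None},   # no value published
]

print(parse_records(SAMPLE, "NY.GNP.PCAP.PP.CD"))
```

Separating parsing from fetching like this would make the filter rules unit-testable; whether the project wants that split is a design choice this diff does not settle.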

@@ -72,3 +72,8 @@ description = "UK local authority population estimates from ONS"
 module = "padelnomics_extract.gisco"
 schedule = "0 0 1 1 *"
 description = "EU geographic boundaries (NUTS2 polygons) from Eurostat GISCO"
+
+[worldbank]
+module = "padelnomics_extract.worldbank"
+schedule = "monthly"
+description = "GNI per capita PPP + price level ratio from World Bank WDI"

@@ -2,10 +2,14 @@
 --
 -- Consolidates data previously duplicated across dim_cities and dim_locations:
 -- - country_name_en / country_slug (was: ~50-line CASE blocks in both models)
--- - median_income_pps (was: country_income CTE in both models)
--- - energy prices, labour costs, PLI indices (new — from Eurostat datasets)
+-- - median_income_pps (Eurostat PPS preferred, World Bank GNI PPP fallback)
+-- - energy prices, labour costs, PLI indices (Eurostat, WB price level ratio fallback)
 -- - cost override columns for the financial calculator
 --
+-- World Bank fallback: for non-EU countries (AR, MX, AE, AU, etc.), income and PLI
+-- are derived from WB WDI indicators calibrated to the Eurostat scale using Germany
+-- as anchor. See de_calibration CTE. EU countries keep exact Eurostat values.
+--
 -- Used by: dim_cities, dim_locations, pseo_city_costs_de, planner_defaults.
 -- Grain: country_code (one row per ISO 3166-1 alpha-2 country code).
 -- Kind: FULL — small table (~40 rows), full refresh daily.
@@ -82,6 +86,26 @@ de_elec AS (
 de_gas AS (
     SELECT gas_eur_gj FROM latest_gas WHERE country_code = 'DE'
 ),
+-- Latest World Bank WDI per country (GNI PPP + price level ratio)
+latest_wb AS (
+    SELECT country_code, gni_ppp, price_level_ratio, ref_year AS wb_year
+    FROM staging.stg_worldbank_income
+    WHERE gni_ppp IS NOT NULL OR price_level_ratio IS NOT NULL
+    QUALIFY ROW_NUMBER() OVER (PARTITION BY country_code ORDER BY ref_year DESC) = 1
+),
+-- Germany calibration anchor: Eurostat PPS + WB GNI PPP + WB price ratio + Eurostat PLI construction.
+-- Used to scale World Bank values into Eurostat-comparable ranges.
+-- Single row; if DE is missing from any source, that ratio produces NULL (safe fallthrough).
+de_calibration AS (
+    SELECT
+        i.median_income_pps AS de_eurostat_pps,
+        wb.gni_ppp AS de_gni_ppp,
+        wb.price_level_ratio AS de_price_level_ratio,
+        p.construction AS de_pli_construction
+    FROM (SELECT median_income_pps FROM latest_income WHERE country_code = 'DE') i
+    CROSS JOIN (SELECT gni_ppp, price_level_ratio FROM latest_wb WHERE country_code = 'DE') wb
+    CROSS JOIN (SELECT construction FROM pli_pivoted WHERE country_code = 'DE') p
+),
 -- All distinct country codes from any source
 all_countries AS (
     SELECT country_code FROM latest_income
@@ -93,6 +117,8 @@ all_countries AS (
     SELECT country_code FROM latest_labour
     UNION
     SELECT country_code FROM pli_pivoted
+    UNION
+    SELECT country_code FROM latest_wb
     -- Ensure known padel markets appear even if Eurostat doesn't cover them yet
     UNION ALL
     SELECT unnest(['DE','ES','GB','FR','IT','PT','AT','CH','NL','BE','SE','NO','DK','FI',
@@ -149,15 +175,21 @@ SELECT
             ELSE ac.country_code
         END, '[^a-zA-Z0-9]+', '-'
     )) AS country_slug,
-    -- Income data
-    i.median_income_pps,
-    i.income_year,
+    -- Income: Eurostat PPS preferred, World Bank GNI PPP scaled to PPS as fallback
+    COALESCE(
+        i.median_income_pps,
+        ROUND(wb.gni_ppp * (de_cal.de_eurostat_pps / NULLIF(de_cal.de_gni_ppp, 0)), 0)
+    ) AS median_income_pps,
+    COALESCE(i.income_year, wb.wb_year) AS income_year,
     -- Raw energy and labour data (for reference / future staffed-scenario use)
     e.electricity_eur_kwh,
     g.gas_eur_gj,
     la.labour_cost_eur_hour,
-    -- PLI indices per category (EU27=100)
-    p.construction AS pli_construction,
+    -- PLI construction: Eurostat preferred, World Bank price level ratio scaled to PLI as fallback
+    COALESCE(
+        p.construction,
+        ROUND(wb.price_level_ratio / NULLIF(de_cal.de_price_level_ratio, 0) * de_cal.de_pli_construction, 1)
+    ) AS pli_construction,
     p.housing AS pli_housing,
     p.services AS pli_services,
     p.misc AS pli_misc,
@@ -278,8 +310,10 @@ LEFT JOIN latest_electricity e ON ac.country_code = e.country_code
 LEFT JOIN latest_gas g ON ac.country_code = g.country_code
 LEFT JOIN latest_labour la ON ac.country_code = la.country_code
 LEFT JOIN pli_pivoted p ON ac.country_code = p.country_code
+LEFT JOIN latest_wb wb ON ac.country_code = wb.country_code
 CROSS JOIN de_pli de_p
 CROSS JOIN de_elec de_e
 CROSS JOIN de_gas de_g
+CROSS JOIN de_calibration de_cal
 -- Enforce grain
 QUALIFY ROW_NUMBER() OVER (PARTITION BY ac.country_code ORDER BY ac.country_code) = 1

@@ -5,30 +5,36 @@
 --
 -- Two scores per location:
 --
--- Padelnomics Market Score (Marktreife-Score v3, 0–100):
+-- Padelnomics Market Score (Marktreife-Score v4, 0–100):
 -- "How mature/established is this padel market?"
 -- Only meaningful for locations matched to a dim_cities row (city_slug IS NOT NULL)
 -- with padel venues. 0 for all other locations.
 --
--- 40 pts supply development — log-scaled density (LN ceiling 20/100k) × count gate
--- 25 pts demand evidence — occupancy when available; 40% density proxy otherwise
+-- v4 changes: lower count gate (5→3), lower density ceiling (LN(21)→LN(11)),
+-- better demand fallback (0.4→0.65 with 0.3 floor), economic context discrimination (200→25K).
+--
+-- 40 pts supply development — log-scaled density (LN ceiling 10/100k) × count gate (3)
+-- 25 pts demand evidence — occupancy when available; 65% density proxy + 0.3 floor otherwise
 -- 15 pts addressable market — log-scaled population, ceiling 1M
--- 10 pts economic context — income PPS normalised to 200 ceiling
+-- 10 pts economic context — income PPS normalised to 25,000 ceiling
 -- 10 pts data quality — completeness discount
 --
--- Padelnomics Opportunity Score (Marktpotenzial-Score v4, 0–100):
+-- Padelnomics Opportunity Score (Marktpotenzial-Score v5, 0–100):
 -- "Where should I build a padel court?"
--- Computed for ALL locations — zero-court locations score highest on supply gap.
--- H3 catchment methodology: addressable market and supply gap use a regional
+-- Computed for ALL locations — zero-court locations score highest on supply deficit.
+-- H3 catchment methodology: addressable market and supply deficit use a regional
 -- H3 catchment (res-5 cell + 6 neighbours, ~24km radius).
 --
--- 25 pts addressable market — log-scaled catchment population, ceiling 500K
--- 20 pts economic power — income PPS, normalised to 35,000
--- 30 pts supply gap — inverted catchment venue density; 0 courts = full marks
--- 15 pts catchment gap — distance to nearest padel court
--- 10 pts market validation — country-level avg market maturity (from market_scored CTE).
--- Replaces sports culture proxy (v3: tennis data was all zeros).
--- ES (~60/100) → ~6 pts, SE (~35/100) → ~3.5 pts, unknown → 5 pts.
+-- v5 changes: merge supply gap + catchment gap → single supply deficit (35 pts),
+-- add sports culture proxy (10 pts, tennis density), add construction affordability (5 pts),
+-- reduce economic power from 20 → 15 pts.
+--
+-- 25 pts addressable market — log-scaled catchment population, ceiling 500K
+-- 15 pts economic power — income PPS, normalised to 35,000
+-- 35 pts supply deficit — max(density gap, distance gap); eliminates double-count
+-- 10 pts sports culture — tennis court density as racquet-sport adoption proxy
+-- 5 pts construction affordability — income relative to construction costs (PLI)
+-- 10 pts market validation — country-level avg market maturity (from market_scored CTE)
 --
 -- Consumers query directly with WHERE filters:
 -- cities API: WHERE country_slug = ? AND city_slug IS NOT NULL
@@ -107,7 +113,7 @@ city_match AS (
         ORDER BY c.padel_venue_count DESC
     ) = 1
 ),
--- Pricing / occupancy from Playtomic (via city_slug) + H3 catchment
+-- Pricing / occupancy from Playtomic (via city_slug) + H3 catchment + country PLI
 with_pricing AS (
     SELECT
         b.*,
@@ -120,6 +126,7 @@ with_pricing AS (
         vpb.median_occupancy_rate,
         vpb.median_daily_revenue_per_venue,
         vpb.price_currency,
+        dc.pli_construction,
         COALESCE(ct.catchment_population, b.population)::BIGINT AS catchment_population,
         COALESCE(ct.catchment_padel_courts, b.padel_venue_count)::INTEGER AS catchment_padel_courts
     FROM base b
@@ -131,6 +138,8 @@ with_pricing AS (
         AND cm.city_slug = vpb.city_slug
     LEFT JOIN catchment ct
         ON b.geoname_id = ct.geoname_id
+    LEFT JOIN foundation.dim_countries dc
+        ON b.country_code = dc.country_code
 ),
 -- Step 1: market score only — needed first so we can aggregate country averages.
 market_scored AS (
@@ -146,34 +155,38 @@ market_scored AS (
             WHEN population > 0 OR COALESCE(city_padel_venue_count, 0) > 0 THEN 0.5
             ELSE 0.0
         END AS data_confidence,
-        -- ── Market Score (Marktreife-Score v3) ──────────────────────────────────
+        -- ── Market Score (Marktreife-Score v4) ──────────────────────────────────
         -- 0 when no city match or no venues (city_padel_venue_count NULL or 0)
         CASE WHEN COALESCE(city_padel_venue_count, 0) > 0 THEN
             ROUND(
                 -- Supply development (40 pts)
+                -- density ceiling 10/100k (LN(11)), count gate 3 venues
                 40.0 * LEAST(1.0, LN(
                     COALESCE(
                         CASE WHEN population > 0
                              THEN COALESCE(city_padel_venue_count, 0)::DOUBLE / population * 100000
                              ELSE 0 END
-                    , 0) + 1) / LN(21))
-                * LEAST(1.0, COALESCE(city_padel_venue_count, 0) / 5.0)
+                    , 0) + 1) / LN(11))
+                * LEAST(1.0, COALESCE(city_padel_venue_count, 0) / 3.0)
                 -- Demand evidence (25 pts)
+                -- with occupancy: scale to 65% target. Without: 65% of supply proxy + 0.3 floor
+                -- (existence of venues IS evidence of demand)
                 + 25.0 * CASE
                     WHEN median_occupancy_rate IS NOT NULL
                     THEN LEAST(1.0, median_occupancy_rate / 0.65)
-                    ELSE 0.4 * LEAST(1.0, LN(
+                    ELSE GREATEST(0.3, 0.65 * LEAST(1.0, LN(
                         COALESCE(
                             CASE WHEN population > 0
                                  THEN COALESCE(city_padel_venue_count, 0)::DOUBLE / population * 100000
                                  ELSE 0 END
-                        , 0) + 1) / LN(21))
-                        * LEAST(1.0, COALESCE(city_padel_venue_count, 0) / 5.0)
+                        , 0) + 1) / LN(11))
+                        * LEAST(1.0, COALESCE(city_padel_venue_count, 0) / 3.0))
                 END
                 -- Addressable market (15 pts)
                 + 15.0 * LEAST(1.0, LN(GREATEST(population, 1)) / LN(1000000))
                 -- Economic context (10 pts)
-                + 10.0 * LEAST(1.0, COALESCE(median_income_pps, 100) / 200.0)
+                -- ceiling 25,000 PPS discriminates between wealthy and poorer markets
+                + 10.0 * LEAST(1.0, COALESCE(median_income_pps, 15000) / 25000.0)
                 -- Data quality (10 pts)
                 + 10.0 * CASE
                     WHEN population > 0 AND COALESCE(city_padel_venue_count, 0) > 0 THEN 1.0
@@ -199,23 +212,35 @@ country_market AS (
 -- Step 3: add opportunity_score using country market validation signal.
 scored AS (
     SELECT ms.*,
-        -- ── Opportunity Score (Marktpotenzial-Score v4, H3 catchment) ──────────
+        -- ── Opportunity Score (Marktpotenzial-Score v5, H3 catchment) ──────────
         ROUND(
             -- Addressable market (25 pts): log-scaled catchment population, ceiling 500K
             25.0 * LEAST(1.0, LN(GREATEST(catchment_population, 1)) / LN(500000))
-            -- Economic power (20 pts): income PPS normalised to 35,000
-            + 20.0 * LEAST(1.0, COALESCE(median_income_pps, 15000) / 35000.0)
-            -- Supply gap (30 pts): inverted catchment venue density
-            + 30.0 * GREATEST(0.0, 1.0 - COALESCE(
-                CASE WHEN catchment_population > 0
-                     THEN GREATEST(catchment_padel_courts, COALESCE(city_padel_venue_count, 0))::DOUBLE / catchment_population * 100000
-                     ELSE 0.0
-                END, 0.0) / 8.0)
-            -- Catchment gap (15 pts): distance to nearest court
-            + 15.0 * COALESCE(LEAST(1.0, nearest_padel_court_km / 30.0), 0.5)
+            -- Economic power (15 pts): income PPS normalised to 35,000
+            + 15.0 * LEAST(1.0, COALESCE(median_income_pps, 15000) / 35000.0)
+            -- Supply deficit (35 pts): max of density gap and distance gap.
+            -- Merges old supply gap (30) + catchment gap (15) which were ~80% correlated.
+            + 35.0 * GREATEST(
+                -- density-based gap (H3 catchment): 0 courts = 1.0, 8/100k = 0.0
+                GREATEST(0.0, 1.0 - COALESCE(
+                    CASE WHEN catchment_population > 0
+                         THEN GREATEST(catchment_padel_courts, COALESCE(city_padel_venue_count, 0))::DOUBLE / catchment_population * 100000
+                         ELSE 0.0
+                    END, 0.0) / 8.0),
+                -- distance-based gap: 30km+ = 1.0, 0km = 0.0; NULL = 0.5
+                COALESCE(LEAST(1.0, nearest_padel_court_km / 30.0), 0.5)
+            )
+            -- Sports culture (10 pts): tennis density as racquet-sport adoption proxy.
+            -- Ceiling 50 courts within 25km. Harmless when tennis data is zero (contributes 0).
+            + 10.0 * LEAST(1.0, COALESCE(tennis_courts_within_25km, 0) / 50.0)
+            -- Construction affordability (5 pts): income purchasing power relative to build costs.
+            -- PLI construction is EU27=100 index. High income + low construction cost = high score.
+            + 5.0 * LEAST(1.0,
+                COALESCE(median_income_pps, 15000) / 35000.0
+                / GREATEST(0.5, COALESCE(pli_construction, 100.0) / 100.0)
+            )
             -- Market validation (10 pts): country-level avg market maturity.
-            -- Replaces sports culture (v3 tennis data was all zeros = dead code).
-            -- ES (~60/100): proven demand → ~6 pts. SE (~35/100): struggling → ~3.5 pts.
+            -- ES (~70/100): proven demand → ~7 pts. SE (~35/100): emerging → ~3.5 pts.
             -- NULL (no courts in country yet): 0.5 neutral → 5 pts (untested, not penalised).
             + 10.0 * COALESCE(cm.country_avg_market_score / 100.0, 0.5)
         , 1) AS opportunity_score
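The two new v5 terms are small enough to check by hand. The sketch below restates them in Python using the constants and defaults from the SQL above; the function names are hypothetical.

```python
def sports_culture_points(tennis_courts_within_25km):
    """10 pts at 50 or more tennis courts within 25 km; missing data contributes 0."""
    courts = tennis_courts_within_25km or 0
    return 10.0 * min(1.0, courts / 50.0)


def construction_affordability_points(median_income_pps, pli_construction):
    """5 pts scaled by income (35,000 ceiling) divided by the relative build cost."""
    income = 15_000 if median_income_pps is None else median_income_pps
    pli = 100.0 if pli_construction is None else pli_construction
    # The 0.5 floor caps the boost that a very low construction PLI can give.
    cost_factor = max(0.5, pli / 100.0)
    return 5.0 * min(1.0, income / 35_000.0 / cost_factor)


print(sports_culture_points(25))
print(construction_affordability_points(35_000, 200.0))  # high income, expensive builds
print(construction_affordability_points(17_500, 50.0))   # half the income, half the cost
print(construction_affordability_points(None, None))     # both defaults apply
```

With both inputs missing, the defaults give 5 × 15,000 / 35,000, a little over 2 points, so a country without any economic data is still scored near the middle of this component.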

@@ -0,0 +1,41 @@
-- World Bank WDI indicators: GNI per capita PPP and price level ratio.
-- Pivoted to one row per (country_code, ref_year) with both indicators as columns.
--
-- Source: data/landing/worldbank/{year}/{month}/wdi_indicators.json.gz
-- Extracted by: worldbank.py
-- Used by: dim_countries (fallback behind Eurostat for non-EU countries)
MODEL (
name staging.stg_worldbank_income,
kind FULL,
cron '@daily',
grain (country_code, ref_year)
);
WITH parsed AS (
SELECT
row ->> 'country_code' AS country_code,
TRY_CAST(row ->> 'ref_year' AS INTEGER) AS ref_year,
row ->> 'indicator' AS indicator,
TRY_CAST(row ->> 'value' AS DOUBLE) AS value,
CURRENT_DATE AS extracted_date
FROM (
SELECT UNNEST(rows) AS row
FROM read_json(
@LANDING_DIR || '/worldbank/*/*/wdi_indicators.json.gz',
auto_detect = true
)
)
WHERE (row ->> 'country_code') IS NOT NULL
)
SELECT
country_code,
ref_year,
MAX(value) FILTER (WHERE indicator = 'NY.GNP.PCAP.PP.CD') AS gni_ppp,
MAX(value) FILTER (WHERE indicator = 'PA.NUS.PPPC.RF') AS price_level_ratio,
MAX(extracted_date) AS extracted_date
FROM parsed
WHERE value IS NOT NULL
AND value > 0
AND LENGTH(country_code) = 2
GROUP BY country_code, ref_year
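For readers less familiar with `MAX(...) FILTER (WHERE ...)`, the pivot in this staging model is roughly equivalent to the following Python sketch. The `pivot` function and the sample rows are hypothetical; the indicator codes and the three row filters come from the model above.

```python
COLUMN_BY_INDICATOR = {
    "NY.GNP.PCAP.PP.CD": "gni_ppp",
    "PA.NUS.PPPC.RF": "price_level_ratio",
}


def pivot(rows):
    """Group long-format rows into one record per (country_code, ref_year)."""
    out = {}
    for row in rows:
        value = row.get("value")
        # Same filters as the final WHERE clause: non-null, positive, 2-letter code.
        if value is None or value <= 0 or len(row["country_code"]) != 2:
            continue
        key = (row["country_code"], row["ref_year"])
        record = out.setdefault(key, {"gni_ppp": None, "price_level_ratio": None})
        column = COLUMN_BY_INDICATOR.get(row["indicator"])
        if column is not None:
            current = record[column]
            # MAX() collapses duplicates when several landing files overlap.
            record[column] = value if current is None else max(current, value)
    return out


# Hypothetical landing rows (values are illustrative only).
LANDING_ROWS = [
    {"country_code": "DE", "ref_year": 2023, "indicator": "NY.GNP.PCAP.PP.CD", "value": 74200.0},
    {"country_code": "DE", "ref_year": 2023, "indicator": "PA.NUS.PPPC.RF", "value": 0.8},
    {"country_code": "DE", "ref_year": 2023, "indicator": "NY.GNP.PCAP.PP.CD", "value": 74200.0},
    {"country_code": "AR", "ref_year": 2023, "indicator": "PA.NUS.PPPC.RF", "value": 0.0},
]

print(pivot(LANDING_ROWS))
```

The duplicate `DE` row stands in for the same month being read from two landing files; because the model globs every `wdi_indicators.json.gz`, an aggregate such as `MAX` is what keeps the grain at one row per country and year.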

@@ -111,7 +111,7 @@ _DAG: dict[str, list[str]] = {
"fct_daily_availability": ["fct_availability_slot", "dim_venue_capacity"], "fct_daily_availability": ["fct_availability_slot", "dim_venue_capacity"],
     # Serving
     "venue_pricing_benchmarks": ["fct_daily_availability"],
-    "location_profiles": ["dim_locations", "dim_cities", "venue_pricing_benchmarks"],
+    "location_profiles": ["dim_locations", "dim_cities", "dim_countries", "venue_pricing_benchmarks"],
     "planner_defaults": ["venue_pricing_benchmarks", "location_profiles"],
     "pseo_city_costs_de": [
         "location_profiles", "planner_defaults",