Compare commits
14 Commits
v202603071...v202603081
| Author | SHA1 | Date | |
|---|---|---|---|
| | bf811444ba | | |
| | 3c135051fd | | |
| | c3847bb617 | | |
| | fcef47cb22 | | |
| | 118c2c0fc7 | | |
| | cd6d950233 | | |
| | 28e44384ef | | |
| | b1e008a2a4 | | |
| | d556ceecee | | |
| | f215ea8e3a | | |
| | b2ffad055b | | |
| | 544891611f | | |
| | c30a7943aa | | |
| | b071199895 | | |
@@ -7,6 +7,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 ## [Unreleased]
 
 ### Changed
+- **Score v6: Global economic data**: `dim_countries.median_income_pps` and `pli_construction` now cover all target markets, not just EU. World Bank WDI indicators (GNI per capita PPP + price level ratio) fill gaps for non-EU countries (AR, MX, AE, AU, etc.) with values calibrated to the Eurostat scale using Germany as anchor. EU countries keep exact Eurostat values. New extractor (`worldbank.py`), staging model (`stg_worldbank_income`), and `dim_countries` fallback CTEs. No changes to scoring formulas; the fix is upstream in the data layer.
+- **Market Score v3 → v4**: fixes Spain averaging 54 (should be 65-80). Four calibration changes: count gate threshold lowered from 5 → 3 venues (3 establishes a market pattern), density ceiling lowered from LN(21) → LN(11) (10/100k is reachable for mature markets), demand evidence fallback raised from 0.4 → 0.65 multiplier with 0.3 floor (existence of venues IS evidence of demand), economic context ceiling changed from income/200 → income/25000 (actual discrimination instead of free 10 pts for everyone).
+- **Opportunity Score v4 → v5**: fixes structural flaws: supply gap (30pts) + catchment gap (15pts) merged into single supply deficit (35pts, GREATEST of density gap and distance gap) eliminating ~80% correlated double-count. New sports culture signal (10pts) using tennis court density as racquet-sport adoption proxy. New construction affordability signal (5pts) using income relative to PLI construction costs from `dim_countries`. Economic power reduced from 20 → 15pts. New dependency on `foundation.dim_countries` for `pli_construction`.
+
 - **Unified `location_profiles` serving model**: merged `city_market_profile` and `location_opportunity_profile` into a single `serving.location_profiles` table at `(country_code, geoname_id)` grain. Both Marktreife-Score (Market Score) and Marktpotenzial-Score (Opportunity Score) are now computed per location. City data enriched via LEFT JOIN `dim_cities` on `geoname_id`. Downstream models (`planner_defaults`, `pseo_city_costs_de`, `pseo_city_pricing`) updated to query `location_profiles` directly. `city_padel_venue_count` (exact from dim_cities) distinguished from `padel_venue_count` (spatial 5km from dim_locations).
 - **Both scores on all map tooltips**: country map shows avg Market Score + avg Opportunity Score; city map shows Market Score + Opportunity Score per city; opportunity map shows Opportunity Score + Market Score per location. All score labels use the trademarked "Padelnomics Market Score" / "Padelnomics Opportunity Score" names.
 - **API endpoints**: `/api/markets/countries.json` adds `avg_opportunity_score`; `/api/markets/<country>/cities.json` adds `opportunity_score`; `/api/opportunity/<country>.json` adds `market_score`.
@@ -26,6 +26,7 @@ RUN mkdir -p /app/data && chown -R appuser:appuser /app
 COPY --from=build --chown=appuser:appuser /app .
 COPY --from=css-build /app/web/src/padelnomics/static/css/output.css ./web/src/padelnomics/static/css/output.css
 COPY --chown=appuser:appuser infra/supervisor/workflows.toml ./infra/supervisor/workflows.toml
+COPY --chown=appuser:appuser content/ ./content/
 USER appuser
 ENV PYTHONUNBUFFERED=1
 ENV DATABASE_PATH=/app/data/app.db
@@ -22,6 +22,7 @@ extract-census-usa-income = "padelnomics_extract.census_usa_income:main"
 extract-ons-uk = "padelnomics_extract.ons_uk:main"
 extract-geonames = "padelnomics_extract.geonames:main"
 extract-gisco = "padelnomics_extract.gisco:main"
+extract-worldbank = "padelnomics_extract.worldbank:main"
 
 [build-system]
 requires = ["hatchling"]
@@ -7,7 +7,7 @@ A graphlib.TopologicalSorter schedules them: tasks with no unmet dependencies
 run immediately in parallel; each completion may unlock new tasks.
 
 Current dependency graph:
-  - All 9 non-availability extractors have no dependencies (run in parallel)
+  - All 10 non-availability extractors have no dependencies (run in parallel)
   - playtomic_availability depends on playtomic_tenants (starts as soon as
     tenants finishes, even if other extractors are still running)
 """
@@ -38,6 +38,8 @@ from .playtomic_availability import EXTRACTOR_NAME as AVAILABILITY_NAME
 from .playtomic_availability import extract as extract_availability
 from .playtomic_tenants import EXTRACTOR_NAME as TENANTS_NAME
 from .playtomic_tenants import extract as extract_tenants
+from .worldbank import EXTRACTOR_NAME as WORLDBANK_NAME
+from .worldbank import extract as extract_worldbank
 
 logger = setup_logging("padelnomics.extract")
@@ -54,6 +56,7 @@ EXTRACTORS: dict[str, tuple] = {
     GEONAMES_NAME: (extract_geonames, []),
     GISCO_NAME: (extract_gisco, []),
     TENANTS_NAME: (extract_tenants, []),
+    WORLDBANK_NAME: (extract_worldbank, []),
     AVAILABILITY_NAME: (extract_availability, [TENANTS_NAME]),
 }
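The scheduling rule in the docstring above (tasks with no unmet dependencies run first, each completion may unlock more) can be sketched with the standard-library `graphlib.TopologicalSorter`. This is a minimal illustration, not the project's runner: the dependency map below is a hypothetical subset of `EXTRACTORS` holding names only, and the sketch groups tasks into sequential waves instead of running them in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical subset of the EXTRACTORS mapping: name -> list of dependencies.
deps = {
    "geonames": [],
    "gisco": [],
    "worldbank": [],
    "playtomic_tenants": [],
    "playtomic_availability": ["playtomic_tenants"],
}


def schedule_waves(graph: dict[str, list[str]]) -> list[list[str]]:
    """Group tasks into waves; every task in a wave has all dependencies met."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # marking tasks done may unlock their dependents
    return waves


waves = schedule_waves(deps)
print(waves)
```

Waves are a simplification. Per the docstring, the real scheduler starts `playtomic_availability` as soon as `playtomic_tenants` finishes, without waiting for the other extractors in the first wave; that would mean calling `done()` per task as each one completes.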
153
extract/padelnomics_extract/src/padelnomics_extract/worldbank.py
Normal file
@@ -0,0 +1,153 @@
+"""World Bank WDI extractor: GNI per capita PPP and price level ratio.
+
+Fetches two indicators (one API call each, no key required):
+  - NY.GNP.PCAP.PP.CD: GNI per capita, PPP (international $)
+  - PA.NUS.PPPC.RF: Price level ratio (PPP conversion factor / exchange rate)
+
+These provide global fallbacks behind Eurostat for dim_countries.median_income_pps
+and dim_countries.pli_construction (see dim_countries.sql for calibration logic).
+
+API: World Bank API v2, https://api.worldbank.org/v2/
+No API key required. No env vars.
+
+Landing: {LANDING_DIR}/worldbank/{year}/{month}/wdi_indicators.json.gz
+Output: {"rows": [{"country_code": "DE", "indicator": "NY.GNP.PCAP.PP.CD",
+                   "ref_year": 2023, "value": 74200.0}, ...], "count": N}
+"""
+
+import json
+import sqlite3
+from pathlib import Path
+
+import niquests
+
+from ._shared import HTTP_TIMEOUT_SECONDS, run_extractor, setup_logging
+from .utils import get_last_cursor, landing_path, write_gzip_atomic
+
+logger = setup_logging("padelnomics.extract.worldbank")
+
+EXTRACTOR_NAME = "worldbank"
+
+INDICATORS = ["NY.GNP.PCAP.PP.CD", "PA.NUS.PPPC.RF"]
+# 7 years of data (2019-2025 inclusive); staging takes the latest non-null per country
+DATE_RANGE = "2019:2025"
+MAX_PER_PAGE = 5000
+MAX_PAGES = 3
+
+WDI_BASE_URL = "https://api.worldbank.org/v2/country/all/indicator"
+
+# WB aggregate codes that look like real 2-letter country codes.
+# These are regional/income-group aggregates, not actual countries.
+_WB_AGGREGATE_CODES = frozenset({
+    "EU", "OE",
+    "XC", "XD", "XE", "XF", "XG", "XH", "XI", "XJ", "XL", "XM",
+    "XN", "XO", "XP", "XQ", "XR", "XS", "XT", "XU", "XV", "XY",
+    "ZF", "ZG", "ZH", "ZI", "ZJ", "ZQ", "ZT",
+    "V1", "V2", "V3", "V4",
+})
+
+
+def _normalize_country_code(wb_code: str) -> str | None:
+    """Normalize WB country code to ISO alpha-2. Returns None for aggregates."""
+    code = wb_code.strip().upper()
+    if len(code) != 2:
+        return None
+    # Reject codes starting with a digit (e.g. "1W" for World)
+    if code[0].isdigit():
+        return None
+    if code in _WB_AGGREGATE_CODES:
+        return None
+    return code
+
+
+def _fetch_indicator(
+    session: niquests.Session,
+    indicator: str,
+) -> list[dict]:
+    """Fetch all records for one indicator. Returns list of row dicts."""
+    rows: list[dict] = []
+    page = 1
+
+    while page <= MAX_PAGES:
+        url = (
+            f"{WDI_BASE_URL}/{indicator}"
+            f"?format=json&date={DATE_RANGE}&per_page={MAX_PER_PAGE}&page={page}"
+        )
+        logger.info("GET %s page %d", indicator, page)
+        resp = session.get(url, timeout=HTTP_TIMEOUT_SECONDS * 2)
+        resp.raise_for_status()
+
+        data = resp.json()
+        assert isinstance(data, list) and len(data) == 2, (
+            f"unexpected WB response shape for {indicator}: {type(data)}, len={len(data)}"
+        )
+        meta, records = data
+        total_pages = meta.get("pages", 1)
+
+        if records is None:
+            logger.warning("WB returned null data for %s page %d", indicator, page)
+            break
+
+        for record in records:
+            value = record.get("value")
+            if value is None:
+                continue
+            country_code = _normalize_country_code(record["country"]["id"])
+            if country_code is None:
+                continue
+            rows.append({
+                "country_code": country_code,
+                "indicator": indicator,
+                "ref_year": int(record["date"]),
+                "value": float(value),
+            })
+
+        if page >= total_pages:
+            break
+        page += 1
+
+    return rows
+
+
+def extract(
+    landing_dir: Path,
+    year_month: str,
+    conn: sqlite3.Connection,
+    session: niquests.Session,
+) -> dict:
+    """Fetch WDI indicators. Skips if already run this month."""
+    last_cursor = get_last_cursor(conn, EXTRACTOR_NAME)
+    if last_cursor == year_month:
+        logger.info("already have data for %s - skipping", year_month)
+        return {"files_written": 0, "files_skipped": 1, "bytes_written": 0}
+
+    rows: list[dict] = []
+    for indicator in INDICATORS:
+        indicator_rows = _fetch_indicator(session, indicator)
+        logger.info("%s: %d records", indicator, len(indicator_rows))
+        rows.extend(indicator_rows)
+
+    assert len(rows) >= 200, f"expected ≥200 WB records, got {len(rows)} - API may have changed"
+    logger.info("total: %d WDI records", len(rows))
+
+    year, month = year_month.split("/")
+    dest_dir = landing_path(landing_dir, "worldbank", year, month)
+    dest = dest_dir / "wdi_indicators.json.gz"
+    payload = json.dumps({"rows": rows, "count": len(rows)}).encode()
+    bytes_written = write_gzip_atomic(dest, payload)
+    logger.info("written %s bytes compressed", f"{bytes_written:,}")
+
+    return {
+        "files_written": 1,
+        "files_skipped": 0,
+        "bytes_written": bytes_written,
+        "cursor_value": year_month,
+    }
+
+
+def main() -> None:
+    run_extractor(EXTRACTOR_NAME, extract)
+
+
+if __name__ == "__main__":
+    main()
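To make the record filtering in `_fetch_indicator` easier to follow, here is an offline sketch that applies the same rules (skip null values, skip aggregates and malformed codes) to a hand-written page. The sample page and its values are invented for illustration, the aggregate set is a trimmed copy of the one in the diff, and `rows_from_page` is a hypothetical helper that does not exist in the extractor.

```python
from typing import Optional

# Trimmed, illustrative copy of the aggregate set; the diff's frozenset is longer.
AGGREGATE_CODES = frozenset({"EU", "OE", "XC", "ZJ", "V1"})


def normalize_country_code(wb_code: str) -> Optional[str]:
    """Return an ISO alpha-2 code, or None for aggregates and malformed codes."""
    code = wb_code.strip().upper()
    if len(code) != 2 or code[0].isdigit() or code in AGGREGATE_CODES:
        return None
    return code


def rows_from_page(page: list, indicator: str) -> list[dict]:
    """Flatten one [meta, records] page into row dicts (hypothetical helper)."""
    _meta, records = page
    rows = []
    for record in records or []:  # the API can return null instead of a list
        if record.get("value") is None:
            continue
        code = normalize_country_code(record["country"]["id"])
        if code is None:
            continue
        rows.append({
            "country_code": code,
            "indicator": indicator,
            "ref_year": int(record["date"]),
            "value": float(record["value"]),
        })
    return rows


# Hand-written sample in the [meta, records] shape; all values are made up.
sample_page = [
    {"page": 1, "pages": 1},
    [
        {"country": {"id": "DE"}, "date": "2023", "value": 100.0},
        {"country": {"id": "1W"}, "date": "2023", "value": 50.0},  # digit prefix
        {"country": {"id": "XC"}, "date": "2023", "value": 60.0},  # aggregate
        {"country": {"id": "AR"}, "date": "2023", "value": None},  # null value
        {"country": {"id": "mx"}, "date": "2022", "value": 40},
    ],
]

rows = rows_from_page(sample_page, "NY.GNP.PCAP.PP.CD")
print([r["country_code"] for r in rows])
```

Only the `DE` and `mx` records survive; the latter is upper-cased and its integer value is coerced to a float, mirroring the `float(value)` call in the extractor.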
@@ -42,7 +42,7 @@ do
     # The web app detects the inode change on next query; no restart needed.
     DUCKDB_PATH="${DUCKDB_PATH:-/data/padelnomics/lakehouse.duckdb}" \
         SERVING_DUCKDB_PATH="${SERVING_DUCKDB_PATH:-/data/padelnomics/analytics.duckdb}" \
-        uv run python -m padelnomics.export_serving
+        uv run python src/padelnomics/export_serving.py
 
 ) || {
     if [ -n "${ALERT_WEBHOOK_URL:-}" ]; then
@@ -72,3 +72,8 @@ description = "UK local authority population estimates from ONS"
 module = "padelnomics_extract.gisco"
 schedule = "0 0 1 1 *"
 description = "EU geographic boundaries (NUTS2 polygons) from Eurostat GISCO"
+
+[worldbank]
+module = "padelnomics_extract.worldbank"
+schedule = "monthly"
+description = "GNI per capita PPP + price level ratio from World Bank WDI"
@@ -2,10 +2,14 @@
 --
 -- Consolidates data previously duplicated across dim_cities and dim_locations:
 --   - country_name_en / country_slug (was: ~50-line CASE blocks in both models)
---   - median_income_pps (was: country_income CTE in both models)
---   - energy prices, labour costs, PLI indices (new, from Eurostat datasets)
+--   - median_income_pps (Eurostat PPS preferred, World Bank GNI PPP fallback)
+--   - energy prices, labour costs, PLI indices (Eurostat, WB price level ratio fallback)
 --   - cost override columns for the financial calculator
 --
+-- World Bank fallback: for non-EU countries (AR, MX, AE, AU, etc.), income and PLI
+-- are derived from WB WDI indicators calibrated to the Eurostat scale using Germany
+-- as anchor. See de_calibration CTE. EU countries keep exact Eurostat values.
+--
 -- Used by: dim_cities, dim_locations, pseo_city_costs_de, planner_defaults.
 -- Grain: country_code (one row per ISO 3166-1 alpha-2 country code).
 -- Kind: FULL, small table (~40 rows), full refresh daily.
@@ -82,6 +86,26 @@ de_elec AS (
 de_gas AS (
     SELECT gas_eur_gj FROM latest_gas WHERE country_code = 'DE'
 ),
+-- Latest World Bank WDI per country (GNI PPP + price level ratio)
+latest_wb AS (
+    SELECT country_code, gni_ppp, price_level_ratio, ref_year AS wb_year
+    FROM staging.stg_worldbank_income
+    WHERE gni_ppp IS NOT NULL OR price_level_ratio IS NOT NULL
+    QUALIFY ROW_NUMBER() OVER (PARTITION BY country_code ORDER BY ref_year DESC) = 1
+),
+-- Germany calibration anchor: Eurostat PPS + WB GNI PPP + WB price ratio + Eurostat PLI construction.
+-- Used to scale World Bank values into Eurostat-comparable ranges.
+-- Single row; if DE is missing from any source, that ratio produces NULL (safe fallthrough).
+de_calibration AS (
+    SELECT
+        i.median_income_pps AS de_eurostat_pps,
+        wb.gni_ppp AS de_gni_ppp,
+        wb.price_level_ratio AS de_price_level_ratio,
+        p.construction AS de_pli_construction
+    FROM (SELECT median_income_pps FROM latest_income WHERE country_code = 'DE') i
+    CROSS JOIN (SELECT gni_ppp, price_level_ratio FROM latest_wb WHERE country_code = 'DE') wb
+    CROSS JOIN (SELECT construction FROM pli_pivoted WHERE country_code = 'DE') p
+),
 -- All distinct country codes from any source
 all_countries AS (
     SELECT country_code FROM latest_income
@@ -93,6 +117,8 @@ all_countries AS (
     SELECT country_code FROM latest_labour
     UNION
     SELECT country_code FROM pli_pivoted
+    UNION
+    SELECT country_code FROM latest_wb
     -- Ensure known padel markets appear even if Eurostat doesn't cover them yet
     UNION ALL
     SELECT unnest(['DE','ES','GB','FR','IT','PT','AT','CH','NL','BE','SE','NO','DK','FI',
@@ -149,15 +175,21 @@ SELECT
         ELSE ac.country_code
     END, '[^a-zA-Z0-9]+', '-'
     )) AS country_slug,
-    -- Income data
+    -- Income: Eurostat PPS preferred, World Bank GNI PPP scaled to PPS as fallback
+    COALESCE(
         i.median_income_pps,
-    i.income_year,
+        ROUND(wb.gni_ppp * (de_cal.de_eurostat_pps / NULLIF(de_cal.de_gni_ppp, 0)), 0)
+    ) AS median_income_pps,
+    COALESCE(i.income_year, wb.wb_year) AS income_year,
     -- Raw energy and labour data (for reference / future staffed-scenario use)
     e.electricity_eur_kwh,
     g.gas_eur_gj,
     la.labour_cost_eur_hour,
-    -- PLI indices per category (EU27=100)
-    p.construction AS pli_construction,
+    -- PLI construction: Eurostat preferred, World Bank price level ratio scaled to PLI as fallback
+    COALESCE(
+        p.construction,
+        ROUND(wb.price_level_ratio / NULLIF(de_cal.de_price_level_ratio, 0) * de_cal.de_pli_construction, 1)
+    ) AS pli_construction,
     p.housing AS pli_housing,
     p.services AS pli_services,
     p.misc AS pli_misc,
@@ -278,8 +310,10 @@ LEFT JOIN latest_electricity e ON ac.country_code = e.country_code
 LEFT JOIN latest_gas g ON ac.country_code = g.country_code
 LEFT JOIN latest_labour la ON ac.country_code = la.country_code
 LEFT JOIN pli_pivoted p ON ac.country_code = p.country_code
+LEFT JOIN latest_wb wb ON ac.country_code = wb.country_code
 CROSS JOIN de_pli de_p
 CROSS JOIN de_elec de_e
 CROSS JOIN de_gas de_g
+CROSS JOIN de_calibration de_cal
 -- Enforce grain
 QUALIFY ROW_NUMBER() OVER (PARTITION BY ac.country_code ORDER BY ac.country_code) = 1
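The two `COALESCE` fallbacks in the `dim_countries` hunks above scale World Bank values onto the Eurostat scale using Germany's ratio between the two sources. A minimal Python sketch of that arithmetic follows. The function names are hypothetical and the input numbers are invented round figures, not real Eurostat or World Bank values.

```python
from typing import Optional


def scale_income_to_pps(
    gni_ppp: Optional[float],
    de_eurostat_pps: Optional[float],
    de_gni_ppp: Optional[float],
) -> Optional[int]:
    """Mirror of ROUND(wb.gni_ppp * (de_eurostat_pps / NULLIF(de_gni_ppp, 0)), 0)."""
    if gni_ppp is None or de_eurostat_pps is None or not de_gni_ppp:
        return None  # NULL in, or division by zero guarded by NULLIF -> NULL out
    return round(gni_ppp * (de_eurostat_pps / de_gni_ppp))


def scale_price_ratio_to_pli(
    price_level_ratio: Optional[float],
    de_price_level_ratio: Optional[float],
    de_pli_construction: Optional[float],
) -> Optional[float]:
    """Mirror of ROUND(ratio / NULLIF(de_ratio, 0) * de_pli_construction, 1)."""
    if price_level_ratio is None or de_pli_construction is None or not de_price_level_ratio:
        return None
    return round(price_level_ratio / de_price_level_ratio * de_pli_construction, 1)


# Invented anchor values: Germany at 25,000 PPS (Eurostat) and 70,000 GNI PPP (WB).
income = scale_income_to_pps(gni_ppp=35_000, de_eurostat_pps=25_000, de_gni_ppp=70_000)

# Invented anchor values: Germany price ratio 0.8 (WB) and PLI construction 120 (Eurostat).
pli = scale_price_ratio_to_pli(0.4, de_price_level_ratio=0.8, de_pli_construction=120.0)

print(income, pli)
```

A country with half of Germany's GNI PPP lands at half of Germany's PPS value, and likewise for the price ratio. Two caveats on the sketch: Python's `round` and SQL `ROUND` can differ on exact .5 ties, and in SQL a `CROSS JOIN` against a subquery that returns no rows yields no rows at all, so the NULL fallthrough only applies when the Germany rows exist but hold NULL values.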
@@ -5,28 +5,36 @@
 --
 -- Two scores per location:
 --
--- Padelnomics Market Score (Marktreife-Score v3, 0–100):
+-- Padelnomics Market Score (Marktreife-Score v4, 0–100):
 --   "How mature/established is this padel market?"
 --   Only meaningful for locations matched to a dim_cities row (city_slug IS NOT NULL)
 --   with padel venues. 0 for all other locations.
 --
---   40 pts supply development: log-scaled density (LN ceiling 20/100k) × count gate
---   25 pts demand evidence: occupancy when available; 40% density proxy otherwise
+--   v4 changes: lower count gate (5→3), lower density ceiling (LN(21)→LN(11)),
+--   better demand fallback (0.4→0.65 with 0.3 floor), economic context discrimination (200→25K).
+--
+--   40 pts supply development: log-scaled density (LN ceiling 10/100k) × count gate (3)
+--   25 pts demand evidence: occupancy when available; 65% density proxy + 0.3 floor otherwise
 --   15 pts addressable market: log-scaled population, ceiling 1M
---   10 pts economic context: income PPS normalised to 200 ceiling
+--   10 pts economic context: income PPS normalised to 25,000 ceiling
 --   10 pts data quality: completeness discount
 --
--- Padelnomics Opportunity Score (Marktpotenzial-Score v3, 0–100):
+-- Padelnomics Opportunity Score (Marktpotenzial-Score v5, 0–100):
 --   "Where should I build a padel court?"
---   Computed for ALL locations; zero-court locations score highest on supply gap.
---   H3 catchment methodology: addressable market and supply gap use a regional
+--   Computed for ALL locations; zero-court locations score highest on supply deficit.
+--   H3 catchment methodology: addressable market and supply deficit use a regional
 --   H3 catchment (res-5 cell + 6 neighbours, ~24km radius).
 --
+--   v5 changes: merge supply gap + catchment gap → single supply deficit (35 pts),
+--   add sports culture proxy (10 pts, tennis density), add construction affordability (5 pts),
+--   reduce economic power from 20 → 15 pts.
+--
 --   25 pts addressable market: log-scaled catchment population, ceiling 500K
---   20 pts economic power: income PPS, normalised to 35,000
---   30 pts supply gap: inverted catchment venue density; 0 courts = full marks
---   15 pts catchment gap: distance to nearest padel court
---   10 pts sports culture: tennis courts within 25km
+--   15 pts economic power: income PPS, normalised to 35,000
+--   35 pts supply deficit: max(density gap, distance gap); eliminates double-count
+--   10 pts sports culture: tennis court density as racquet-sport adoption proxy
+--    5 pts construction affordability: income relative to construction costs (PLI)
+--   10 pts market validation: country-level avg market maturity (from market_scored CTE)
 --
 -- Consumers query directly with WHERE filters:
 --   cities API: WHERE country_slug = ? AND city_slug IS NOT NULL
@@ -105,7 +113,7 @@ city_match AS (
         ORDER BY c.padel_venue_count DESC
     ) = 1
 ),
--- Pricing / occupancy from Playtomic (via city_slug) + H3 catchment
+-- Pricing / occupancy from Playtomic (via city_slug) + H3 catchment + country PLI
 with_pricing AS (
     SELECT
         b.*,
@@ -118,6 +126,7 @@ with_pricing AS (
         vpb.median_occupancy_rate,
         vpb.median_daily_revenue_per_venue,
         vpb.price_currency,
+        dc.pli_construction,
         COALESCE(ct.catchment_population, b.population)::BIGINT AS catchment_population,
         COALESCE(ct.catchment_padel_courts, b.padel_venue_count)::INTEGER AS catchment_padel_courts
     FROM base b
@@ -129,9 +138,11 @@ with_pricing AS (
         AND cm.city_slug = vpb.city_slug
     LEFT JOIN catchment ct
         ON b.geoname_id = ct.geoname_id
+    LEFT JOIN foundation.dim_countries dc
+        ON b.country_code = dc.country_code
 ),
--- Both scores computed from the enriched base
-scored AS (
+-- Step 1: market score only, needed first so we can aggregate country averages.
+market_scored AS (
     SELECT *,
         -- City-level venue density (from dim_cities exact count, not dim_locations spatial 5km)
         CASE WHEN population > 0
@@ -144,34 +155,38 @@ scored AS (
             WHEN population > 0 OR COALESCE(city_padel_venue_count, 0) > 0 THEN 0.5
             ELSE 0.0
         END AS data_confidence,
-        -- ── Market Score (Marktreife-Score v3) ──────────────────────────────────
+        -- ── Market Score (Marktreife-Score v4) ──────────────────────────────────
         -- 0 when no city match or no venues (city_padel_venue_count NULL or 0)
         CASE WHEN COALESCE(city_padel_venue_count, 0) > 0 THEN
             ROUND(
                 -- Supply development (40 pts)
+                -- density ceiling 10/100k (LN(11)), count gate 3 venues
                 40.0 * LEAST(1.0, LN(
                     COALESCE(
                         CASE WHEN population > 0
                             THEN COALESCE(city_padel_venue_count, 0)::DOUBLE / population * 100000
                             ELSE 0 END
-                    , 0) + 1) / LN(21))
-                * LEAST(1.0, COALESCE(city_padel_venue_count, 0) / 5.0)
+                    , 0) + 1) / LN(11))
+                * LEAST(1.0, COALESCE(city_padel_venue_count, 0) / 3.0)
                 -- Demand evidence (25 pts)
+                -- with occupancy: scale to 65% target. Without: 65% of supply proxy + 0.3 floor
+                -- (existence of venues IS evidence of demand)
                 + 25.0 * CASE
                     WHEN median_occupancy_rate IS NOT NULL
                         THEN LEAST(1.0, median_occupancy_rate / 0.65)
-                    ELSE 0.4 * LEAST(1.0, LN(
+                    ELSE GREATEST(0.3, 0.65 * LEAST(1.0, LN(
                         COALESCE(
                             CASE WHEN population > 0
                                 THEN COALESCE(city_padel_venue_count, 0)::DOUBLE / population * 100000
                                 ELSE 0 END
-                        , 0) + 1) / LN(21))
-                    * LEAST(1.0, COALESCE(city_padel_venue_count, 0) / 5.0)
+                        , 0) + 1) / LN(11))
+                    * LEAST(1.0, COALESCE(city_padel_venue_count, 0) / 3.0))
                 END
                 -- Addressable market (15 pts)
                 + 15.0 * LEAST(1.0, LN(GREATEST(population, 1)) / LN(1000000))
                 -- Economic context (10 pts)
-                + 10.0 * LEAST(1.0, COALESCE(median_income_pps, 100) / 200.0)
+                -- ceiling 25,000 PPS discriminates between wealthy and poorer markets
+                + 10.0 * LEAST(1.0, COALESCE(median_income_pps, 15000) / 25000.0)
                 -- Data quality (10 pts)
                 + 10.0 * CASE
                     WHEN population > 0 AND COALESCE(city_padel_venue_count, 0) > 0 THEN 1.0
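The v3 → v4 recalibration of the supply and demand terms in the hunk above can be checked with a small Python transcription of the SQL. This is a sketch for experimenting with the thresholds, not the production model: the function names are hypothetical, and the example city (4 venues, 100,000 residents) is invented.

```python
import math
from typing import Optional


def supply_factor(
    venues: int,
    population: int,
    density_ceiling: float = 10.0,  # v4: LN(11); v3 used 20.0, i.e. LN(21)
    count_gate: float = 3.0,        # v4: 3 venues; v3 used 5
) -> float:
    """Log-scaled venues per 100k residents, multiplied by the venue-count gate."""
    density = venues / population * 100_000 if population > 0 else 0.0
    log_scaled = min(1.0, math.log(density + 1) / math.log(density_ceiling + 1))
    return log_scaled * min(1.0, venues / count_gate)


def demand_factor(occupancy: Optional[float], venues: int, population: int) -> float:
    """v4 demand term: occupancy against a 65% target, else 65% of supply with a 0.3 floor."""
    if occupancy is not None:
        return min(1.0, occupancy / 0.65)
    return max(0.3, 0.65 * supply_factor(venues, population))


# Invented example city: 4 venues, 100,000 residents, no occupancy data.
v4_supply = supply_factor(4, 100_000)
v3_supply = supply_factor(4, 100_000, density_ceiling=20.0, count_gate=5.0)
v4_demand = demand_factor(None, 4, 100_000)

print(round(40 * v3_supply, 1), round(40 * v4_supply, 1), round(25 * v4_demand, 1))
```

For the same inputs the v4 parameters yield a higher supply term than v3, which is the direction the changelog describes for markets like Spain. The 0.3 floor only takes effect when the density proxy is low, for example a single venue in a large city.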
@@ -180,25 +195,57 @@ scored AS (
                 END
             , 1)
         ELSE 0
-        END AS market_score,
-        -- ── Opportunity Score (Marktpotenzial-Score v3, H3 catchment) ──────────
+        END AS market_score
+    FROM with_pricing
+),
+-- Step 2: country-level avg market maturity, used as market validation signal (10 pts).
+-- Filter to market_score > 0 (cities with padel courts only) so zero-court locations
+-- don't dilute the country signal. ES proven demand → ~60, SE struggling → ~35.
+country_market AS (
+    SELECT
+        country_code,
+        ROUND(AVG(market_score), 1) AS country_avg_market_score
+    FROM market_scored
+    WHERE market_score > 0
+    GROUP BY country_code
+),
+-- Step 3: add opportunity_score using country market validation signal.
+scored AS (
+    SELECT ms.*,
+        -- ── Opportunity Score (Marktpotenzial-Score v5, H3 catchment) ──────────
         ROUND(
             -- Addressable market (25 pts): log-scaled catchment population, ceiling 500K
             25.0 * LEAST(1.0, LN(GREATEST(catchment_population, 1)) / LN(500000))
-            -- Economic power (20 pts): income PPS normalised to 35,000
-            + 20.0 * LEAST(1.0, COALESCE(median_income_pps, 15000) / 35000.0)
-            -- Supply gap (30 pts): inverted catchment venue density
-            + 30.0 * GREATEST(0.0, 1.0 - COALESCE(
+            -- Economic power (15 pts): income PPS normalised to 35,000
+            + 15.0 * LEAST(1.0, COALESCE(median_income_pps, 15000) / 35000.0)
+            -- Supply deficit (35 pts): max of density gap and distance gap.
+            -- Merges old supply gap (30) + catchment gap (15) which were ~80% correlated.
+            + 35.0 * GREATEST(
+                -- density-based gap (H3 catchment): 0 courts = 1.0, 8/100k = 0.0
+                GREATEST(0.0, 1.0 - COALESCE(
                     CASE WHEN catchment_population > 0
-                        THEN catchment_padel_courts::DOUBLE / catchment_population * 100000
+                        THEN GREATEST(catchment_padel_courts, COALESCE(city_padel_venue_count, 0))::DOUBLE / catchment_population * 100000
                         ELSE 0.0
-                    END, 0.0) / 8.0)
-            -- Catchment gap (15 pts): distance to nearest court
-            + 15.0 * COALESCE(LEAST(1.0, nearest_padel_court_km / 30.0), 0.5)
-            -- Sports culture (10 pts): tennis courts within 25km
-            + 10.0 * LEAST(1.0, tennis_courts_within_25km / 10.0)
+                    END, 0.0) / 8.0),
+                -- distance-based gap: 30km+ = 1.0, 0km = 0.0; NULL = 0.5
+                COALESCE(LEAST(1.0, nearest_padel_court_km / 30.0), 0.5)
+            )
+            -- Sports culture (10 pts): tennis density as racquet-sport adoption proxy.
+            -- Ceiling 50 courts within 25km. Harmless when tennis data is zero (contributes 0).
+            + 10.0 * LEAST(1.0, COALESCE(tennis_courts_within_25km, 0) / 50.0)
+            -- Construction affordability (5 pts): income purchasing power relative to build costs.
+            -- PLI construction is EU27=100 index. High income + low construction cost = high score.
+            + 5.0 * LEAST(1.0,
+                COALESCE(median_income_pps, 15000) / 35000.0
+                / GREATEST(0.5, COALESCE(pli_construction, 100.0) / 100.0)
+            )
+            -- Market validation (10 pts): country-level avg market maturity.
+            -- ES (~70/100): proven demand → ~7 pts. SE (~35/100): emerging → ~3.5 pts.
+            -- NULL (no courts in country yet): 0.5 neutral → 5 pts (untested, not penalised).
+            + 10.0 * COALESCE(cm.country_avg_market_score / 100.0, 0.5)
         , 1) AS opportunity_score
-    FROM with_pricing
+    FROM market_scored ms
+    LEFT JOIN country_market cm ON ms.country_code = cm.country_code
 )
 SELECT
     s.geoname_id,
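The merged supply-deficit term takes the larger of two gaps, so a location is never credited twice for what is largely the same signal. A minimal Python transcription of that expression is sketched below. The function name and the sample inputs are hypothetical; the constants (8 courts per 100k, 30 km, 0.5 for unknown distance) are the ones in the SQL above.

```python
import math
from typing import Optional


def supply_deficit(
    catchment_courts: int,
    city_venues: Optional[int],
    catchment_population: int,
    nearest_court_km: Optional[float],
) -> float:
    """Return max(density gap, distance gap) in [0, 1]; the score awards 35 pts times this."""
    if catchment_population > 0:
        courts = max(catchment_courts, city_venues or 0)
        density = courts / catchment_population * 100_000
    else:
        density = 0.0
    density_gap = max(0.0, 1.0 - density / 8.0)  # 0 courts -> 1.0, 8 per 100k -> 0.0
    if nearest_court_km is None:
        distance_gap = 0.5                        # unknown distance: neutral value
    else:
        distance_gap = min(1.0, nearest_court_km / 30.0)
    return max(density_gap, distance_gap)


# Invented locations: an empty catchment, and a saturated one with a court 3 km away.
empty = supply_deficit(0, None, 200_000, 45.0)
saturated = supply_deficit(16, 0, 200_000, 3.0)

print(empty, round(saturated, 3))
```

Under the previous additive scheme the empty catchment would have collected both the 30-point density gap and the 15-point distance gap; with the maximum, it tops out at the single 35-point term.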
@@ -18,13 +18,14 @@ SELECT
     country_slug,
     COUNT(*) AS city_count,
     SUM(padel_venue_count) AS total_venues,
-    ROUND(AVG(market_score), 1) AS avg_market_score,
+    -- Population-weighted: large cities (Madrid, Barcelona) dominate, not hundreds of small towns
+    ROUND(SUM(market_score * population) / NULLIF(SUM(population), 0), 1) AS avg_market_score,
     MAX(market_score) AS top_city_market_score,
     -- Top 5 cities by venue count (prominence), then score for internal linking
     LIST(city_slug ORDER BY padel_venue_count DESC, market_score DESC NULLS LAST)[1:5] AS top_city_slugs,
     LIST(city_name ORDER BY padel_venue_count DESC, market_score DESC NULLS LAST)[1:5] AS top_city_names,
-    -- Opportunity score aggregates (NULL-safe: cities without geoname_id match excluded from AVG)
-    ROUND(AVG(opportunity_score), 1) AS avg_opportunity_score,
+    -- Opportunity score aggregates (population-weighted: saturated megacities dominate, not hundreds of small towns)
+    ROUND(SUM(opportunity_score * population) / NULLIF(SUM(population), 0), 1) AS avg_opportunity_score,
     MAX(opportunity_score) AS top_opportunity_score,
     -- Top 5 opportunity cities by population (prominence), then opportunity score
     LIST(city_slug ORDER BY population DESC, opportunity_score DESC NULLS LAST)[1:5] AS top_opportunity_slugs,
@@ -36,6 +37,8 @@ SELECT
     -- Use the most common currency in the country (MIN is deterministic for single-currency countries)
     MIN(price_currency) AS price_currency,
     SUM(population) AS total_population,
+    ROUND(SUM(lat * population) / NULLIF(SUM(population), 0), 4) AS lat,
+    ROUND(SUM(lon * population) / NULLIF(SUM(population), 0), 4) AS lon,
     CURRENT_DATE AS refreshed_date
 FROM serving.pseo_city_costs_de
 GROUP BY country_code, country_name_en, country_slug
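The switch from `AVG(score)` to a population-weighted mean changes country averages considerably when one large city sits alongside many small towns. The sketch below reproduces the `SUM(score * population) / NULLIF(SUM(population), 0)` expression in Python; the helper name and the three sample cities are invented for illustration.

```python
from typing import Optional


def weighted_avg(rows: list[tuple[float, int]]) -> Optional[float]:
    """Population-weighted mean of (score, population) pairs; None if population sums to 0."""
    total_population = sum(population for _, population in rows)
    if total_population == 0:
        return None  # mirrors NULLIF(SUM(population), 0)
    weighted_sum = sum(score * population for score, population in rows)
    return round(weighted_sum / total_population, 1)


# Invented cities: one metropolis with a high score, two small towns with low scores.
cities = [(80.0, 3_000_000), (40.0, 20_000), (30.0, 10_000)]

simple = round(sum(score for score, _ in cities) / len(cities), 1)
weighted = weighted_avg(cities)

print(simple, weighted)
```

The simple mean treats the three places equally, while the weighted mean stays close to the metropolis score. The same weighting is applied to `lat` and `lon` in the hunk above, which places the country marker near the population centre rather than the geometric middle of the city list.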
@@ -0,0 +1,41 @@
+-- World Bank WDI indicators: GNI per capita PPP and price level ratio.
+-- Pivoted to one row per (country_code, ref_year) with both indicators as columns.
+--
+-- Source: data/landing/worldbank/{year}/{month}/wdi_indicators.json.gz
+-- Extracted by: worldbank.py
+-- Used by: dim_countries (fallback behind Eurostat for non-EU countries)
+
+MODEL (
+    name staging.stg_worldbank_income,
+    kind FULL,
+    cron '@daily',
+    grain (country_code, ref_year)
+);
+
+WITH parsed AS (
+    SELECT
+        row ->> 'country_code' AS country_code,
+        TRY_CAST(row ->> 'ref_year' AS INTEGER) AS ref_year,
+        row ->> 'indicator' AS indicator,
+        TRY_CAST(row ->> 'value' AS DOUBLE) AS value,
+        CURRENT_DATE AS extracted_date
+    FROM (
+        SELECT UNNEST(rows) AS row
+        FROM read_json(
+            @LANDING_DIR || '/worldbank/*/*/wdi_indicators.json.gz',
+            auto_detect = true
+        )
+    )
+    WHERE (row ->> 'country_code') IS NOT NULL
+)
+SELECT
+    country_code,
+    ref_year,
+    MAX(value) FILTER (WHERE indicator = 'NY.GNP.PCAP.PP.CD') AS gni_ppp,
+    MAX(value) FILTER (WHERE indicator = 'PA.NUS.PPPC.RF') AS price_level_ratio,
+    MAX(extracted_date) AS extracted_date
+FROM parsed
+WHERE value IS NOT NULL
+    AND value > 0
+    AND LENGTH(country_code) = 2
+GROUP BY country_code, ref_year
@@ -111,7 +111,7 @@ _DAG: dict[str, list[str]] = {
     "fct_daily_availability": ["fct_availability_slot", "dim_venue_capacity"],
     # Serving
     "venue_pricing_benchmarks": ["fct_daily_availability"],
-    "location_profiles": ["dim_locations", "dim_cities", "venue_pricing_benchmarks"],
+    "location_profiles": ["dim_locations", "dim_cities", "dim_countries", "venue_pricing_benchmarks"],
     "planner_defaults": ["venue_pricing_benchmarks", "location_profiles"],
     "pseo_city_costs_de": [
         "location_profiles", "planner_defaults",
@@ -8,6 +8,7 @@ daily when the pipeline runs).
 from quart import Blueprint, abort, jsonify
 
 from .analytics import fetch_analytics
+from .auth.routes import login_required
 from .core import fetch_all, is_flag_enabled
 
 bp = Blueprint("api", __name__)
@@ -26,6 +27,7 @@ async def _require_maps_flag() -> None:
 
 
 @bp.route("/markets/countries.json")
+@login_required
 async def countries():
     """Country-level aggregates for the markets hub map."""
     await _require_maps_flag()
@@ -96,23 +98,3 @@ async def city_venues(country_slug: str, city_slug: str):
     )
     return jsonify(rows), 200, _CACHE_HEADERS
 
-
-@bp.route("/opportunity/<country_slug>.json")
-async def opportunity(country_slug: str):
-    """Location-level opportunity scores for the opportunity map."""
-    await _require_maps_flag()
-    assert country_slug, "country_slug required"
-    rows = await fetch_analytics(
-        """
-        SELECT location_name, location_slug, lat, lon,
-               opportunity_score, market_score,
-               nearest_padel_court_km,
-               padel_venue_count, population
-        FROM serving.location_profiles
-        WHERE country_slug = ? AND opportunity_score > 0
-        ORDER BY opportunity_score DESC
-        LIMIT 500
-        """,
-        [country_slug],
-    )
-    return jsonify(rows), 200, _CACHE_HEADERS
@@ -9,6 +9,7 @@ from jinja2 import Environment, FileSystemLoader
|
||||
from markupsafe import Markup
|
||||
from quart import Blueprint, abort, g, redirect, render_template, request
|
||||
|
||||
+from ..analytics import fetch_analytics
|
||||
from ..core import (
|
||||
REPO_ROOT,
|
||||
capture_waitlist_email,
|
||||
@@ -203,6 +204,14 @@ async def markets():
|
||||
)
|
||||
|
||||
articles = await _filter_articles(q, country, region)
|
||||
+map_countries = await fetch_analytics("""
|
||||
+SELECT country_code, country_name_en, country_slug,
|
||||
+city_count, total_venues,
|
||||
+avg_market_score, avg_opportunity_score,
|
||||
+lat, lon
|
||||
+FROM serving.pseo_country_overview
|
||||
+ORDER BY total_venues DESC
|
||||
+""")
|
||||
|
||||
return await render_template(
|
||||
"markets.html",
|
||||
@@ -212,6 +221,7 @@ async def markets():
|
||||
current_q=q,
|
||||
current_country=country,
|
||||
current_region=region,
|
||||
+map_countries=map_countries,
|
||||
)
|
||||
|
||||
|
||||
|
||||
@@ -92,10 +92,8 @@
|
||||
});
|
||||
}
|
||||
|
||||
-fetch('/api/markets/countries.json')
|
||||
-.then(function(r) { return r.json(); })
|
||||
-.then(function(data) {
|
||||
-if (!data.length) return;
|
||||
+var data = {{ map_countries | tojson }};
|
||||
+if (data.length) {
|
||||
var maxV = Math.max.apply(null, data.map(function(d) { return d.total_venues; }));
|
||||
var lang = document.documentElement.lang || 'en';
|
||||
data.forEach(function(c) {
|
||||
@@ -112,7 +110,7 @@
|
||||
.on('click', function() { window.location = '/' + lang + '/markets/' + c.country_slug; })
|
||||
.addTo(map);
|
||||
});
|
||||
-});
|
||||
+}
|
||||
})();
|
||||
</script>
|
||||
{% endblock %}
|
||||
|
||||
@@ -87,6 +87,46 @@ async def opportunity_map():
|
||||
return await render_template("opportunity_map.html", countries=countries)
|
||||
|
||||
|
||||
+@bp.route("/opportunity-map/data")
|
||||
+async def opportunity_map_data():
|
||||
+"""HTMX partial: opportunity + reference data islands for Leaflet map."""
|
||||
+from ..core import is_flag_enabled
|
||||
+if not await is_flag_enabled("maps", default=True):
|
||||
+abort(404)
|
||||
+country_slug = request.args.get("country", "")
|
||||
+if not country_slug:
|
||||
+return ""
|
||||
+opp_points = await fetch_analytics(
|
||||
+"""
|
||||
+SELECT location_name, location_slug, lat, lon,
|
||||
+opportunity_score, market_score,
|
||||
+nearest_padel_court_km, padel_venue_count, population
|
||||
+FROM serving.location_profiles
|
||||
+WHERE country_slug = ? AND opportunity_score > 0
|
||||
+ORDER BY opportunity_score DESC
|
||||
+LIMIT 500
|
||||
+""",
|
||||
+[country_slug],
|
||||
+)
|
||||
+ref_points = await fetch_analytics(
|
||||
+"""
|
||||
+SELECT city_name, city_slug, lat, lon,
|
||||
+city_padel_venue_count AS padel_venue_count,
|
||||
+market_score, population
|
||||
+FROM serving.location_profiles
|
||||
+WHERE country_slug = ? AND city_slug IS NOT NULL
|
||||
+ORDER BY city_padel_venue_count DESC
|
||||
+LIMIT 200
|
||||
+""",
|
||||
+[country_slug],
|
||||
+)
|
||||
+return await render_template(
|
||||
+"partials/opportunity_map_data.html",
|
||||
+opp_points=opp_points,
|
||||
+ref_points=ref_points,
|
||||
+)
|
||||
|
||||
|
||||
@bp.route("/imprint")
|
||||
async def imprint():
|
||||
lang = g.get("lang", "en")
|
||||
|
||||
@@ -24,7 +24,10 @@
|
||||
|
||||
<div class="card mb-4" style="padding: 1rem 1.25rem;">
|
||||
<label class="form-label" for="opp-country-select" style="margin-bottom: 0.5rem; display:block;">Select a country</label>
|
||||
-<select id="opp-country-select" class="form-input" style="max-width: 280px;">
|
||||
+<select id="opp-country-select" name="country" class="form-input" style="max-width:280px;"
|
||||
+hx-get="{{ url_for('public.opportunity_map_data') }}"
|
||||
+hx-target="#map-data"
|
||||
+hx-trigger="change">
|
||||
<option value="">— choose country —</option>
|
||||
{% for c in countries %}
|
||||
<option value="{{ c.country_slug }}">{{ c.country_name_en }}</option>
|
||||
@@ -33,6 +36,7 @@
|
||||
</div>
|
||||
|
||||
<div id="opportunity-map"></div>
|
||||
+<div id="map-data" style="display:none;"></div>
|
||||
|
||||
<div class="mt-4 text-sm text-slate">
|
||||
<strong>Circle size:</strong> population |
|
||||
@@ -86,18 +90,27 @@
|
||||
: (p || '');
|
||||
}
|
||||
|
||||
-function loadCountry(slug) {
|
||||
+function renderMap() {
|
||||
oppLayer.clearLayers();
|
||||
refLayer.clearLayers();
|
||||
-if (!slug) return;
|
||||
+var oppEl = document.getElementById('opp-data');
|
||||
+var refEl = document.getElementById('ref-data');
|
||||
+if (!oppEl) return;
|
||||
+var oppData = JSON.parse(oppEl.textContent);
|
||||
+var refData = JSON.parse(refEl.textContent);
|
||||
|
||||
-fetch('/api/opportunity/' + slug + '.json')
|
||||
-.then(function(r) { return r.json(); })
|
||||
-.then(function(data) {
|
||||
-if (!data.length) return;
|
||||
-var maxPop = Math.max.apply(null, data.map(function(d) { return d.population || 1; }));
|
||||
+refData.forEach(function(c) {
|
||||
+if (!c.lat || !c.lon || !c.padel_venue_count) return;
|
||||
+L.marker([c.lat, c.lon], { icon: REF_ICON })
|
||||
+.bindTooltip(c.city_name + ' — ' + c.padel_venue_count + ' existing venues',
|
||||
+{ className: 'map-tooltip', direction: 'top', offset: [0, -7] })
|
||||
+.addTo(refLayer);
|
||||
+});
|
||||
|
||||
+if (!oppData.length) return;
|
||||
+var maxPop = Math.max.apply(null, oppData.map(function(d) { return d.population || 1; }));
|
||||
var bounds = [];
|
||||
-data.forEach(function(loc) {
|
||||
+oppData.forEach(function(loc) {
|
||||
if (!loc.lat || !loc.lon) return;
|
||||
var size = 8 + 40 * Math.sqrt((loc.population || 1) / maxPop);
|
||||
var color = oppColor(loc.opportunity_score);
|
||||
@@ -115,24 +128,10 @@
|
||||
bounds.push([loc.lat, loc.lon]);
|
||||
});
|
||||
if (bounds.length) map.fitBounds(bounds, { padding: [30, 30] });
|
||||
-});
|
||||
|
||||
-// Existing venues as small gray reference dots (drawn first = behind opp dots)
|
||||
-fetch('/api/markets/' + slug + '/cities.json')
|
||||
-.then(function(r) { return r.json(); })
|
||||
-.then(function(data) {
|
||||
-data.forEach(function(c) {
|
||||
-if (!c.lat || !c.lon || !c.padel_venue_count) return;
|
||||
-L.marker([c.lat, c.lon], { icon: REF_ICON })
|
||||
-.bindTooltip(c.city_name + ' — ' + c.padel_venue_count + ' existing venues',
|
||||
-{ className: 'map-tooltip', direction: 'top', offset: [0, -7] })
|
||||
-.addTo(refLayer);
|
||||
-});
|
||||
-});
|
||||
}
|
||||
|
||||
-document.getElementById('opp-country-select').addEventListener('change', function() {
|
||||
-loadCountry(this.value);
|
||||
+document.body.addEventListener('htmx:afterSwap', function(e) {
|
||||
+if (e.detail.target.id === 'map-data') renderMap();
|
||||
});
|
||||
})();
|
||||
</script>
|
||||
|
||||
@@ -0,0 +1,2 @@
|
||||
+<script id="opp-data" type="application/json">{{ opp_points | tojson }}</script>
|
||||
+<script id="ref-data" type="application/json">{{ ref_points | tojson }}</script>
|
||||