Compare commits

2 Commits

Author SHA1 Message Date
Deeman
544891611f feat(transform): opportunity score v4 — market validation + population-weighted aggregation
All checks were successful
CI / test (push) Successful in 57s
CI / tag (push) Successful in 2s
Two targeted fixes for inflated country scores (ES 83, SE 77):

1. pseo_country_overview: replace AVG() with population-weighted averages
   for avg_opportunity_score and avg_market_score. Madrid/Barcelona now
   dominate Spain's average instead of hundreds of ~30K-population
   white-space towns. Expected ES drop from ~83 to ~55-65.

2. location_profiles: replace dead sports culture component (10 pts,
   tennis data all zeros) with market validation signal.
   Split scored CTE into: market_scored → country_market → scored.
   country_market aggregates AVG(market_score) per country from cities
   with padel courts (market_score > 0), so zero-court locations don't
   dilute the signal. ES (~60/100) → ~6 pts. SE (~35/100) → ~3.5 pts.
   NULL → 0.5 neutral → 5 pts (untested market, not penalised).

Score budget unchanged: 25+20+30+15+10 = 100 pts.
No new models, no new data sources, no cycles.
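
A minimal sketch of why fix 1 pulls a country average down, assuming a handful of made-up cities. The `(score, population)` pairs and both helper names are hypothetical and are not taken from the repository; only the formula `SUM(score * population) / NULLIF(SUM(population), 0)` comes from the commit.

```python
def plain_average(cities):
    """Unweighted mean of city scores, i.e. what AVG() computes."""
    return sum(score for score, _ in cities) / len(cities)


def population_weighted_average(cities):
    """SUM(score * population) / SUM(population), or None if population is 0."""
    total_population = sum(pop for _, pop in cities)
    if total_population == 0:
        return None  # mirrors NULLIF(SUM(population), 0) yielding NULL
    return sum(score * pop for score, pop in cities) / total_population


# Invented data: two saturated large cities plus 200 high-scoring small towns.
cities = [(40.0, 3_000_000), (45.0, 1_600_000)] + [(90.0, 30_000)] * 200

print(round(plain_average(cities), 1))                # 89.5
print(round(population_weighted_average(cities), 1))  # 69.1
```

With the plain mean every town counts once, so the 200 small towns set the result. With population weights the two large cities carry about 43% of the total weight and drag the figure down.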

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 17:23:11 +01:00
Deeman
b071199895 fix(docker): copy content/ directory into image
All checks were successful
CI / test (push) Successful in 54s
CI / tag (push) Successful in 2s
content/articles/ holds the cornerstone .md source files which
_sync_static_articles() reads on every /admin/articles load.
Without this COPY they were absent from the container.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 15:03:44 +01:00
3 changed files with 36 additions and 12 deletions

View File

@@ -26,6 +26,7 @@ RUN mkdir -p /app/data && chown -R appuser:appuser /app
 COPY --from=build --chown=appuser:appuser /app .
 COPY --from=css-build /app/web/src/padelnomics/static/css/output.css ./web/src/padelnomics/static/css/output.css
 COPY --chown=appuser:appuser infra/supervisor/workflows.toml ./infra/supervisor/workflows.toml
+COPY --chown=appuser:appuser content/ ./content/
 USER appuser
 ENV PYTHONUNBUFFERED=1
 ENV DATABASE_PATH=/app/data/app.db

View File

@@ -16,7 +16,7 @@
 -- 10 pts economic context — income PPS normalised to 200 ceiling
 -- 10 pts data quality — completeness discount
 --
--- Padelnomics Opportunity Score (Marktpotenzial-Score v3, 0–100):
+-- Padelnomics Opportunity Score (Marktpotenzial-Score v4, 0–100):
 -- "Where should I build a padel court?"
 -- Computed for ALL locations — zero-court locations score highest on supply gap.
 -- H3 catchment methodology: addressable market and supply gap use a regional
@@ -26,7 +26,9 @@
 -- 20 pts economic power — income PPS, normalised to 35,000
 -- 30 pts supply gap — inverted catchment venue density; 0 courts = full marks
 -- 15 pts catchment gap — distance to nearest padel court
--- 10 pts sports culture — tennis courts within 25km
+-- 10 pts market validation — country-level avg market maturity (from market_scored CTE).
+-- Replaces sports culture proxy (v3: tennis data was all zeros).
+-- ES (~60/100) → ~6 pts, SE (~35/100) → ~3.5 pts, unknown → 5 pts.
 --
 -- Consumers query directly with WHERE filters:
 -- cities API: WHERE country_slug = ? AND city_slug IS NOT NULL
@@ -130,8 +132,8 @@ with_pricing AS (
 LEFT JOIN catchment ct
 ON b.geoname_id = ct.geoname_id
 ),
--- Both scores computed from the enriched base
-scored AS (
+-- Step 1: market score only — needed first so we can aggregate country averages.
+market_scored AS (
 SELECT *,
 -- City-level venue density (from dim_cities exact count, not dim_locations spatial 5km)
 CASE WHEN population > 0
@@ -180,8 +182,24 @@ scored AS (
 END
 , 1)
 ELSE 0
-END AS market_score,
--- ── Opportunity Score (Marktpotenzial-Score v3, H3 catchment) ──────────
+END AS market_score
+FROM with_pricing
+),
+-- Step 2: country-level avg market maturity — used as market validation signal (10 pts).
+-- Filter to market_score > 0 (cities with padel courts only) so zero-court locations
+-- don't dilute the country signal. ES proven demand → ~60, SE struggling → ~35.
+country_market AS (
+SELECT
+country_code,
+ROUND(AVG(market_score), 1) AS country_avg_market_score
+FROM market_scored
+WHERE market_score > 0
+GROUP BY country_code
+),
+-- Step 3: add opportunity_score using country market validation signal.
+scored AS (
+SELECT ms.*,
+-- ── Opportunity Score (Marktpotenzial-Score v4, H3 catchment) ──────────
 ROUND(
 -- Addressable market (25 pts): log-scaled catchment population, ceiling 500K
 25.0 * LEAST(1.0, LN(GREATEST(catchment_population, 1)) / LN(500000))
@@ -195,10 +213,14 @@ scored AS (
 END, 0.0) / 8.0)
 -- Catchment gap (15 pts): distance to nearest court
 + 15.0 * COALESCE(LEAST(1.0, nearest_padel_court_km / 30.0), 0.5)
--- Sports culture (10 pts): tennis courts within 25km
-+ 10.0 * LEAST(1.0, tennis_courts_within_25km / 10.0)
+-- Market validation (10 pts): country-level avg market maturity.
+-- Replaces sports culture (v3 tennis data was all zeros = dead code).
+-- ES (~60/100): proven demand → ~6 pts. SE (~35/100): struggling → ~3.5 pts.
+-- NULL (no courts in country yet): 0.5 neutral → 5 pts (untested, not penalised).
++ 10.0 * COALESCE(cm.country_avg_market_score / 100.0, 0.5)
 , 1) AS opportunity_score
-FROM with_pricing
+FROM market_scored ms
+LEFT JOIN country_market cm ON ms.country_code = cm.country_code
 )
 SELECT
 s.geoname_id,
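
The three-step CTE chain in this file (`market_scored` → `country_market` → `scored`) can be tried out in isolation. The sketch below runs a cut-down version on SQLite through Python's standard library. It assumes a tiny invented `with_pricing` table that already carries a `market_score` column, and it computes only the 10-point market validation component. The city rows and the `XX` country code are hypothetical, and the real model targets a different engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE with_pricing (city TEXT, country_code TEXT, market_score REAL);
    -- Invented rows: a score of 0 stands for a location without padel courts.
    INSERT INTO with_pricing VALUES
        ('A', 'ES', 70.0), ('B', 'ES', 50.0), ('C', 'ES', 0.0),
        ('D', 'SE', 35.0), ('E', 'SE', 0.0),
        ('F', 'XX', 0.0);
""")

rows = conn.execute("""
    WITH market_scored AS (            -- step 1: per-location market score
        SELECT * FROM with_pricing
    ),
    country_market AS (                -- step 2: country average, courts only
        SELECT country_code,
               ROUND(AVG(market_score), 1) AS country_avg_market_score
        FROM market_scored
        WHERE market_score > 0
        GROUP BY country_code
    ),
    scored AS (                        -- step 3: join the signal back per location
        SELECT ms.city,
               10.0 * COALESCE(cm.country_avg_market_score / 100.0, 0.5)
                   AS validation_pts
        FROM market_scored ms
        LEFT JOIN country_market cm ON ms.country_code = cm.country_code
    )
    SELECT city, validation_pts FROM scored ORDER BY city
""").fetchall()

validation_pts = dict(rows)
print(validation_pts)
```

In this toy data every ES location, including the zero-court city `C`, receives 6.0 points, every SE location 3.5, and the only `XX` location falls back to the neutral 5.0 because its country has no row in `country_market`.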

View File

@@ -18,13 +18,14 @@ SELECT
 country_slug,
 COUNT(*) AS city_count,
 SUM(padel_venue_count) AS total_venues,
-ROUND(AVG(market_score), 1) AS avg_market_score,
+-- Population-weighted: large cities (Madrid, Barcelona) dominate, not hundreds of small towns
+ROUND(SUM(market_score * population) / NULLIF(SUM(population), 0), 1) AS avg_market_score,
 MAX(market_score) AS top_city_market_score,
 -- Top 5 cities by venue count (prominence), then score for internal linking
 LIST(city_slug ORDER BY padel_venue_count DESC, market_score DESC NULLS LAST)[1:5] AS top_city_slugs,
 LIST(city_name ORDER BY padel_venue_count DESC, market_score DESC NULLS LAST)[1:5] AS top_city_names,
--- Opportunity score aggregates (NULL-safe: cities without geoname_id match excluded from AVG)
-ROUND(AVG(opportunity_score), 1) AS avg_opportunity_score,
+-- Opportunity score aggregates (population-weighted: saturated megacities dominate, not hundreds of small towns)
+ROUND(SUM(opportunity_score * population) / NULLIF(SUM(population), 0), 1) AS avg_opportunity_score,
 MAX(opportunity_score) AS top_opportunity_score,
 -- Top 5 opportunity cities by population (prominence), then opportunity score
 LIST(city_slug ORDER BY population DESC, opportunity_score DESC NULLS LAST)[1:5] AS top_opportunity_slugs,
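
One property of the weighted form is worth checking against real data. `AVG()` skips rows whose score is NULL, but `SUM(score * population) / SUM(population)` still counts the population of those rows in the denominator. The removed comment suggests `opportunity_score` can be NULL for cities without a `geoname_id` match. If that is the case, a sketch like the following shows the effect, together with one possible NULL-aware variant. It runs on SQLite with three invented rows; the table contents and the variant are assumptions, not part of the commit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (city TEXT, population INTEGER, opportunity_score REAL);
    -- Invented rows; 'unscored' has no opportunity_score.
    INSERT INTO cities VALUES
        ('big',      1000000, 40.0),
        ('small',      50000, 80.0),
        ('unscored',  500000, NULL);
""")

as_written, null_aware = conn.execute("""
    SELECT
        -- Same shape as the expression in the diff.
        ROUND(SUM(opportunity_score * population)
              / NULLIF(SUM(population), 0), 1),
        -- Variant: only scored cities contribute to the denominator.
        ROUND(SUM(opportunity_score * population)
              / NULLIF(SUM(CASE WHEN opportunity_score IS NOT NULL
                                THEN population END), 0), 1)
    FROM cities
""").fetchone()

print(as_written, null_aware)  # 28.4 41.9
```

Here the unscored city lowers the first figure below the score of every scored city, while the variant stays between 40 and 80. If the score is never NULL in `pseo_country_overview`, both expressions agree and no change is needed.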