docs: CHANGELOG + PROJECT.md for opportunity score data quality improvements

Documents Phase 0 (income ceiling fix), Phase 1b (overpass_tennis workflow),
and Phase 2b (dim_cities spatial population fallback, 70.5%→98.5% coverage).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Deeman
2026-02-27 08:48:16 +01:00
parent 3aa30ab419
commit e32f7ba4b8
2 changed files with 10 additions and 1 deletions

View File

@@ -7,6 +7,12 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
## [Unreleased]
### Changed
- **Opportunity Score v2 — income ceiling fix** (`location_opportunity_profile.sql`): income PPS normalisation changed from `/200.0` (caused LEAST(1.0, 115)=1.0 for ALL countries — no differentiation) to `/35000.0` with country-spread-matched ceiling. Default for missing data changed from 100 to 15000 (developing-market assumption). Country scores now reflect real PPS spread: LU 20.0, SE 14.3, DE 13.2, ES 10.7, GB 10.5 pts (was 20.0 everywhere).
- **dim_cities population coverage 70.5% → 98.5%** — added GeoNames spatial fallback CTE that finds the nearest GeoNames location within ~15 km when string name matching fails (~29% of cities). Fixes localization mismatches (Milano≠Milan, Wien≠Vienna, München≠Munich): Wien 0→1,691,468; Milano 0→1,371,498. Population cascade now: Eurostat EU > US Census > ONS UK > GeoNames string > GeoNames spatial > 0.
### Added
- **overpass_tennis** workflow added to `infra/supervisor/workflows.toml` — tennis courts extraction was only in the combined `all.py` extractor; now scheduled monthly by the supervisor so it runs automatically in production.
- **Market Score v3 (Marktreife-Score recalibration)** — fixes ranking inversion where early-stage markets (Germany 1/100k) outscored mature markets (Spain 36/100k):
- **Formula rewrite** (`city_market_profile.sql`): supply development now 40 pts (log-scaled density LN(d+1)/LN(21) × count gate min(1,count/5)); demand evidence 25 pts (occupancy or 40% density proxy); population reduced to 15 pts (context); income to 10 pts (context); data quality to 10 pts; saturation discount removed
- **Count gate** eliminates small-town inflation: a single venue in a 5k-resident town can no longer outscore Berlin (was 92.7 → now 43.9 for Bernau bei Berlin)

View File

@@ -1,7 +1,7 @@
# Padelnomics — Project Tracker
> Move tasks across columns as you work. Add new tasks at the top of the relevant column.
> Last updated: 2026-02-27.
> Last updated: 2026-02-27 (opportunity score data quality improvements).
---
@@ -89,6 +89,9 @@
- [x] **Opportunity Score integration**`opportunity_score` (Marktpotenzial) wired into city + country templates; `geoname_id` threaded through SQL chain (dim_cities → city_market_profile → pseo_city_costs_de); 71.4% city match rate; stats strip, intro paragraphs, market tables, and FAQ updated in both DE + EN
- [x] **Market Score v3 recalibration** — fixes ranking inversion (Germany 1/100k was outscoring Spain 36/100k); log-scaled density + count gate replaces linear formula; saturation discount removed; template thresholds updated across all 3 pSEO templates; verified: Málaga 70.1, Barcelona 67.4, Madrid 66.9, Amsterdam 58.4, Bernau 43.9 (was 92.7), Berlin 42.2, London 44.1
- [x] **Opportunity Score v2** — supply gap ceiling raised 4→8/100k (gentler gradient, accounts for 87% data undercount); formula documentation added (DuckDB LEAST NULL behaviour, income saturation, tennis data gap)
- [x] **Opportunity Score v2 — income ceiling fix** — PPS normalisation `/200.0``/35000.0`; economic power component now differentiates countries (DE 13.2, ES 10.7, SE 14.3 pts; was 20.0 everywhere)
- [x] **dim_cities population coverage 70.5% → 98.5%** — GeoNames spatial fallback CTE (ST_Distance_Sphere, 0.14° bbox) resolves localization mismatches (Wien→Vienna 1.69M, Milano→Milan 1.37M); population cascade: Eurostat > Census > ONS > GeoNames string > GeoNames spatial > 0
- [x] **overpass_tennis added to supervisor workflows** — monthly schedule in `workflows.toml`; was only in combined extractor
### Data Pipeline (DaaS)
- [x] Overpass API extractor (OSM padel courts)