merge: opportunity score data quality improvements
Phase 0 — income ceiling fix (opportunity_score): PPS normalisation /200→/35000; economic power now differentiates countries (DE 13.2, ES 10.7, SE 14.3 pts; was 20.0 everywhere) Phase 1b — overpass_tennis in workflows.toml: Monthly schedule added; was only in combined extractor Phase 2b — dim_cities spatial population fallback: GeoNames spatial CTE (ST_Distance_Sphere, 0.14° bbox) resolves localization mismatches: Wien→1.69M, Milano→1.37M, München→1.49M Coverage: 70.5% → 98.5% (5,401/5,481 cities with population)
This commit is contained in:
@@ -7,6 +7,12 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
|
||||
## [Unreleased]
|
||||
|
||||
### Changed
|
||||
- **Opportunity Score v2 — income ceiling fix** (`location_opportunity_profile.sql`): income PPS normalisation changed from `/200.0` (caused LEAST(1.0, 115)=1.0 for ALL countries — no differentiation) to `/35000.0` with country-spread-matched ceiling. Default for missing data changed from 100 to 15000 (developing-market assumption). Country scores now reflect real PPS spread: LU 20.0, SE 14.3, DE 13.2, ES 10.7, GB 10.5 pts (was 20.0 everywhere).
|
||||
- **dim_cities population coverage 70.5% → 98.5%** — added GeoNames spatial fallback CTE that finds the nearest GeoNames location within ~15 km when string name matching fails (~29% of cities). Fixes localization mismatches (Milano≠Milan, Wien≠Vienna, München≠Munich): Wien 0→1,691,468; Milano 0→1,371,498. Population cascade now: Eurostat EU > US Census > ONS UK > GeoNames string > GeoNames spatial > 0.
|
||||
|
||||
### Added
|
||||
- **overpass_tennis** workflow added to `infra/supervisor/workflows.toml` — tennis courts extraction was only in the combined `all.py` extractor; now scheduled monthly by the supervisor so it runs automatically in production.
|
||||
|
||||
- **Market Score v3 (Marktreife-Score recalibration)** — fixes ranking inversion where early-stage markets (Germany 1/100k) outscored mature markets (Spain 36/100k):
|
||||
- **Formula rewrite** (`city_market_profile.sql`): supply development now 40 pts (log-scaled density LN(d+1)/LN(21) × count gate min(1,count/5)); demand evidence 25 pts (occupancy or 40% density proxy); population reduced to 15 pts (context); income to 10 pts (context); data quality to 10 pts; saturation discount removed
|
||||
- **Count gate** eliminates small-town inflation: a single venue in a 5k-resident town can no longer outscore Berlin (was 92.7 → now 43.9 for Bernau bei Berlin)
|
||||
|
||||
Reference in New Issue
Block a user