merge: Phase 2a + 2b — EU NUTS-2 spatial join + US state income

Phase 2a: NUTS-1 regional income for Germany (16 Bundesländer via admin1→NUTS-1 mapping)
Phase 2b: EU-wide NUTS-2 via GISCO spatial join + US Census ACS state income
- All EU-27+EFTA+UK locations now auto-resolve to NUTS-2 via ST_Contains
- Germany gets sub-Bundesland (38 Regierungsbezirke) differentiation
- US gets state-level income with PPS normalisation
- Income cascade: NUTS-2 → NUTS-1 → US state → country-level
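
The cascade in the last bullet amounts to "first non-null value wins". A minimal Python sketch, assuming each level has already been looked up and is `None` when no match was found; `resolve_income` and its parameter names are hypothetical and do not come from the repository, where the same logic would more likely be a SQL `COALESCE` in `dim_locations.sql`:

```python
from typing import Optional, Tuple


def resolve_income(
    nuts2: Optional[float],
    nuts1: Optional[float],
    us_state: Optional[float],
    country: Optional[float],
) -> Optional[Tuple[str, float]]:
    """Return (source_level, income) for the most granular level available."""
    # Order mirrors the commit message: NUTS-2, NUTS-1, US state, country-level.
    candidates = (
        ("nuts2", nuts2),
        ("nuts1", nuts1),
        ("us_state", us_state),
        ("country", country),
    )
    for level, value in candidates:
        if value is not None:
            return level, value
    # No level produced a value for this location.
    return None
```

Returning the level alongside the value is a design choice of this sketch: it makes it easy to check which share of locations still falls through to the country-level figure.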

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Deeman
2026-02-27 11:11:36 +01:00
12 changed files with 511 additions and 11 deletions


@@ -1,7 +1,7 @@
# Padelnomics — Project Tracker
> Move tasks across columns as you work. Add new tasks at the top of the relevant column.
> Last updated: 2026-02-27 (opportunity score data quality improvements).
> Last updated: 2026-02-27 (Phase 2b — EU NUTS-2 spatial join + US state income).
---
@@ -92,6 +92,8 @@
- [x] **Opportunity Score v2 — income ceiling fix** — PPS normalisation `/200.0` → `/35000.0`; economic power component now differentiates countries (DE 13.2, ES 10.7, SE 14.3 pts; was 20.0 everywhere)
- [x] **dim_cities population coverage 70.5% → 98.5%** — GeoNames spatial fallback CTE (ST_Distance_Sphere, 0.14° bbox) resolves localization mismatches (Wien→Vienna 1.69M, Milano→Milan 1.37M); population cascade: Eurostat > Census > ONS > GeoNames string > GeoNames spatial > 0
- [x] **overpass_tennis added to supervisor workflows** — monthly schedule in `workflows.toml`; was only in combined extractor
- [x] **Phase 2a — NUTS-1 regional income** — `eurostat.py` adds `nama_10r_2hhinc` dataset + URL filter params; `stg_regional_income.sql` new staging model (NUTS-1 codes, `EL→GR`/`UK→GB` normalisation); `dim_locations.sql` wires German Bundesland income via 16-row `admin1_to_nuts1` VALUES CTE; verified income spread Bayern > Hamburg > Berlin > Sachsen
- [x] **Phase 2b — EU NUTS-2 spatial join + US state income** — all EU-27+EFTA+UK locations auto-resolve to NUTS-2 via `ST_Contains` on GISCO boundary polygons; Germany now uses 38 Regierungsbezirke; US state income from Census ACS with PPS normalisation (`income / 80610 × 30000`); replaces brittle admin1 mapping CTE with zero-config spatial join; new files: `download_gisco_nuts.py`, `stg_nuts2_boundaries.sql`, `census_usa_income.py`, `stg_income_usa.sql`
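
The US normalisation quoted in the Phase 2b entry (`income / 80610 × 30000`) is a linear rescaling. A minimal sketch, assuming the two constants act as a US reference income in USD and the PPS value that reference maps to; that reading is an interpretation rather than something the tracker states, and the names below are hypothetical:

```python
# Constants copied from the tracker entry; their meaning is assumed, not confirmed.
US_REFERENCE_INCOME_USD = 80610.0
PPS_ANCHOR = 30000.0


def us_income_to_pps(income_usd: float) -> float:
    """Rescale a US state income so the reference income maps to the PPS anchor."""
    return income_usd / US_REFERENCE_INCOME_USD * PPS_ANCHOR
```

Under this reading, a state at exactly the reference income lands on 30000 PPS, and other states scale proportionally, which puts them on roughly the same scale as the Eurostat figures used for European locations.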
### Data Pipeline (DaaS)
- [x] Overpass API extractor (OSM padel courts)
@@ -223,7 +225,8 @@
### Data & Intelligence
- [ ] Sports centre Overpass extract (`leisure=sports_centre`) — additional market signal for `dim_locations`
- [ ] City-level income enrichment (Eurostat NUTS-3 regional income — replaces country-level PPS proxy, higher granularity)
- [x] **Phase 2a — NUTS-1 regional income** — `nama_10r_2hhinc` extractor + `stg_regional_income` staging model + `admin1_to_nuts1` VALUES CTE in `dim_locations`; all 16 German Bundesländer mapped; Bayern ~29K vs Sachsen ~19K PPS differentiation; country-level fallback for ES/FR/IT/etc.
- [ ] Phase 2b — city-level income (NUTS-3 granularity) if NUTS-1 proves insufficient
- [ ] Interactive opportunity map / explorer in web app (map UI over `location_opportunity_profile` — bounding box queries via ST_Distance_Sphere)
- [ ] Multi-source data aggregation (add booking platforms beyond Playtomic)
- [ ] Google Maps signals (reviews, ratings)