feat(data): Phase 2a — NUTS-1 regional income for opportunity score

- eurostat.py: add nama_10r_2hhinc dataset config; append filter params to
  request URL so server pre-filters the large cube before download
- stg_regional_income.sql: new staging model — reads nama_10r_2hhinc.json.gz,
  filters to NUTS-1 codes (3-char), normalises EL→GR / UK→GB
- dim_locations.sql: add admin1_to_nuts1 VALUES CTE (16 German Bundesländer)
  + regional_income CTE; final SELECT uses COALESCE(regional, country) income
- init_landing_seeds.py: add empty seed for nama_10r_2hhinc.json.gz

Munich/Bayern now scores ~29K PPS vs Chemnitz/Sachsen ~19K PPS instead of
both inheriting the same national average (~25.5K PPS).
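
The staging filter and the income fallback described above can be sketched in Python. This is an illustration only, not the SQL in stg_regional_income.sql or dim_locations.sql: the function names and the dict-based lookup are hypothetical, the income figures are the approximate values quoted in this message, and the NUTS-1 codes DE2 (Bayern) and DED (Sachsen) stand in for the admin1_to_nuts1 mapping.

```python
# Hypothetical sketch of the logic described in the commit message.
# Names and data structures are assumptions, not the project's code.

# Eurostat uses EL for Greece and UK for the United Kingdom;
# ISO 3166-1 alpha-2 uses GR and GB.
COUNTRY_FIXES = {"EL": "GR", "UK": "GB"}


def is_nuts1(geo: str) -> bool:
    """NUTS-1 codes are the 2-letter country prefix plus one character."""
    return len(geo) == 3


def normalise_country(geo: str) -> str:
    """Return the ISO country code for a Eurostat geo code."""
    prefix = geo[:2]
    return COUNTRY_FIXES.get(prefix, prefix)


def resolve_income(nuts1_code, regional, national, country):
    """COALESCE(regional, country): prefer regional income, else national."""
    value = regional.get(nuts1_code) if nuts1_code else None
    return value if value is not None else national[country]


# Approximate PPS values quoted in the commit message.
regional = {"DE2": 29_000, "DED": 19_000}
national = {"DE": 25_500}

munich = resolve_income("DE2", regional, national, "DE")
chemnitz = resolve_income("DED", regional, national, "DE")
unmapped = resolve_income(None, regional, national, "DE")
```

A location without a NUTS-1 mapping still receives the national figure, which matches the behaviour before this change.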

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Deeman
2026-02-27 10:26:15 +01:00
parent e32f7ba4b8
commit 5ade38eeaf
4 changed files with 101 additions and 4 deletions

@@ -42,6 +42,15 @@ DATASETS: dict[str, dict] = {
         "geo_dim": "geo",
         "time_dim": "time",
     },
+    "nama_10r_2hhinc": {
+        "filters": {  # Net household income per inhabitant in PPS (NUTS-2 grain, contains NUTS-1)
+            "unit": "PPS_EU27_2020_HAB",
+            "na_item": "B6N",
+            "direct": "BAL",
+        },
+        "geo_dim": "geo",
+        "time_dim": "time",
+    },
 }
@@ -189,6 +198,8 @@ def extract(
     for dataset_code, config in DATASETS.items():
         url = f"{EUROSTAT_BASE_URL}/{dataset_code}?format=JSON&lang=EN"
+        for key, val in config.get("filters", {}).items():
+            url += f"&{key}={val}"
         dest_dir = landing_path(landing_dir, "eurostat", year, month)
         dest = dest_dir / f"{dataset_code}.json.gz"
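
Review note: the hunk above appends filter values to the URL without escaping them. The current values contain only letters, digits and underscores, so this is safe today. A possible alternative, sketched below, builds the query string with `urllib.parse.urlencode` so that a future filter value containing reserved characters would be escaped. `build_url` is a hypothetical helper that does not exist in eurostat.py, and the base URL is a placeholder for the real `EUROSTAT_BASE_URL`.

```python
from urllib.parse import urlencode

# Placeholder; the real EUROSTAT_BASE_URL is defined in eurostat.py.
EUROSTAT_BASE_URL = "https://example.invalid/eurostat"


def build_url(base_url: str, dataset_code: str, filters: dict[str, str]) -> str:
    """Build a dataset request URL with fixed params plus optional filters.

    Hypothetical helper: urlencode escapes each key and value and keeps
    the insertion order of the dict.
    """
    params = {"format": "JSON", "lang": "EN", **filters}
    return f"{base_url}/{dataset_code}?{urlencode(params)}"


url = build_url(
    EUROSTAT_BASE_URL,
    "nama_10r_2hhinc",
    {"unit": "PPS_EU27_2020_HAB", "na_item": "B6N", "direct": "BAL"},
)
```

For datasets without a `filters` entry, passing an empty dict gives the same URL as before this commit.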