feat(data): Phase 2a — NUTS-1 regional income for opportunity score

- eurostat.py: add nama_10r_2hhinc dataset config; append filter params to request URL so server pre-filters the large cube before download
- stg_regional_income.sql: new staging model: reads nama_10r_2hhinc.json.gz, filters to NUTS-1 codes (3-char), normalises EL→GR / UK→GB
- dim_locations.sql: add admin1_to_nuts1 VALUES CTE (16 German Bundesländer) + regional_income CTE; final SELECT uses COALESCE(regional, country) income
- init_landing_seeds.py: add empty seed for nama_10r_2hhinc.json.gz

Munich/Bayern now scores ~29K PPS vs Chemnitz/Sachsen ~19K PPS instead of both inheriting the same national average (~25.5K PPS).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
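
The staging and dimension logic described in the message lives in SQL models that are not part of the diff shown here. As a rough illustration only, the two rules (keep 3-character NUTS-1 codes with EL→GR / UK→GB normalisation, then prefer regional income over the national figure) might look like this in Python. The names `GEO_ALIASES`, `normalise_nuts1` and `income_for` are hypothetical, and applying the alias to the two-letter prefix of the NUTS code is an assumption about how the SQL behaves.

```python
from typing import Optional

# Assumption: the normalisation rewrites the two-letter country prefix
# of the Eurostat geo code to its ISO 3166-1 alpha-2 equivalent.
GEO_ALIASES = {"EL": "GR", "UK": "GB"}


def normalise_nuts1(geo: str) -> Optional[str]:
    """Return the NUTS-1 code with a normalised prefix, or None if geo is not 3 characters."""
    if len(geo) != 3:
        return None  # country (2 chars), NUTS-2 (4 chars), aggregates, etc.
    country, region = geo[:2], geo[2:]
    return GEO_ALIASES.get(country, country) + region


def income_for(regional: Optional[float], national: float) -> float:
    """Mirror COALESCE(regional, country): use the regional value when one exists."""
    return regional if regional is not None else national
```

With the approximate figures quoted above, `income_for(29000.0, 25500.0)` keeps the regional value, while `income_for(None, 25500.0)` falls back to the national average for locations without a NUTS-1 mapping.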
2026-02-27 10:26:15 +01:00
parent e32f7ba4b8
commit 5ade38eeaf
4 changed files with 101 additions and 4 deletions
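
The eurostat.py change below appends each filter to the request URL by string concatenation. That works for the values in this commit, which contain only letters, digits and underscores. A possible alternative, sketched here under assumptions, builds the query string with `urllib.parse.urlencode` so that any future filter value with reserved characters is escaped. `build_dataset_url` and `EXAMPLE_BASE_URL` are hypothetical names; the real `EUROSTAT_BASE_URL` is defined elsewhere in the module and is not shown in this diff.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode

# Placeholder only; not the real Eurostat endpoint used by the extractor.
EXAMPLE_BASE_URL = "https://example.invalid/eurostat/data"


def build_dataset_url(
    base_url: str,
    dataset_code: str,
    filters: Optional[Mapping[str, str]] = None,
) -> str:
    """Build the dataset URL with format/lang first, then any dimension filters."""
    params = {"format": "JSON", "lang": "EN"}
    params.update(filters or {})  # datasets without "filters" get no extra params
    return f"{base_url}/{dataset_code}?{urlencode(params)}"


url = build_dataset_url(
    EXAMPLE_BASE_URL,
    "nama_10r_2hhinc",
    {"unit": "PPS_EU27_2020_HAB", "na_item": "B6N", "direct": "BAL"},
)
```

For the filters in this commit the result matches what the concatenation loop produces, so this is a robustness choice rather than a behaviour change.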
--- a/extract/padelnomics_extract/src/padelnomics_extract/eurostat.py
+++ b/extract/padelnomics_extract/src/padelnomics_extract/eurostat.py
@@ -42,6 +42,15 @@ DATASETS: dict[str, dict] = {
        "geo_dim": "geo",
        "time_dim": "time",
    },
+    "nama_10r_2hhinc": {
+        "filters": {  # Net household income per inhabitant in PPS (NUTS-2 grain, contains NUTS-1)
+            "unit": "PPS_EU27_2020_HAB",
+            "na_item": "B6N",
+            "direct": "BAL",
+        },
+        "geo_dim": "geo",
+        "time_dim": "time",
+    },
 }


@@ -189,6 +198,8 @@ def extract(

    for dataset_code, config in DATASETS.items():
        url = f"{EUROSTAT_BASE_URL}/{dataset_code}?format=JSON&lang=EN"
+        for key, val in config.get("filters", {}).items():
+            url += f"&{key}={val}"
        dest_dir = landing_path(landing_dir, "eurostat", year, month)
        dest = dest_dir / f"{dataset_code}.json.gz"