feat(transform): individualise article costs with per-country Eurostat data

Add real per-country cost data to ~30 calculator fields so pSEO articles
show country-specific CAPEX/OPEX instead of hardcoded DE defaults.

Extractor:
- eurostat.py: add 8 new datasets (nrg_pc_205, nrg_pc_203, lc_lci_lev,
  5×prc_ppp_ind variants); add optional `dataset_code` field so multiple
  dict entries can share one Eurostat API endpoint

Staging (4 new models):
- stg_electricity_prices — EUR/kWh by country, semi-annual
- stg_gas_prices         — EUR/GJ by country, semi-annual
- stg_labour_costs       — EUR/hour by country, annual (future staffed scenario)
- stg_price_levels       — PLI indices (EU27=100) for 5 categories, annual

Foundation:
- dim_countries (new) — conformed country dimension; eliminates ~50-line CASE
  blocks duplicated in dim_cities/dim_locations; computes ~29 calculator cost
  override columns from PLI ratios and energy price ratios vs DE baseline;
  NULL for DE so calculator falls through to DEFAULTS unchanged
- dim_cities — replace country_name/slug CASE blocks + country_income CTE
  with JOIN dim_countries
- dim_locations — same refactor as dim_cities

Serving:
- pseo_city_costs_de — JOIN dim_countries; add 29 camelCase override columns
  auto-applied by calculator (electricity, heating, rentSqm, hallCostSqm, …)
- planner_defaults — JOIN dim_countries; same 29 cost columns flow through
  to /api/market-data endpoint

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Deeman
2026-03-04 10:09:48 +01:00
parent ce466e3f7f
commit 2e68cfbe4f
12 changed files with 681 additions and 128 deletions

View File

@@ -6,6 +6,17 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
## [Unreleased]
### Added
- **Individualised article financial calculations with real per-country cost data** — ~30 CAPEX/OPEX calculator fields now scale to each country's actual cost level via Eurostat data, eliminating the identical DE-hardcoded numbers shown for every city globally.
- **New Eurostat datasets extracted** (8 new landing files): electricity prices (`nrg_pc_205`), gas prices (`nrg_pc_203`), labour costs (`lc_lci_lev`), and 5 price level index categories from `prc_ppp_ind` (construction, housing, services, misc, government).
- `extract/padelnomics_extract/src/padelnomics_extract/eurostat.py`: added 8 dataset entries; added `dataset_code` field support so multiple dict entries can share one Eurostat API endpoint (needed for 5 prc_ppp_ind variants).
- **4 new staging models**: `stg_electricity_prices`, `stg_gas_prices`, `stg_labour_costs`, `stg_price_levels` — all read from landing zone with ISO code normalisation (EL→GR, UK→GB).
- **New `foundation.dim_countries`** — conformed country dimension (grain: `country_code`). Consolidates country names/slugs and income data previously duplicated in `dim_cities` and `dim_locations` as ~50-line CASE blocks. Computes ~29 calculator cost override columns from Eurostat PLI indices and energy prices relative to DE baseline.
- **Refactored `dim_cities`** — removed ~50-line CASE blocks and `country_income` CTE; JOIN `dim_countries` for `country_name_en`, `country_slug`, `median_income_pps`, `income_year`.
- **Refactored `dim_locations`** — same refactor as `dim_cities`; income cascade still cascades EU NUTS-2 → US state → `dim_countries` country-level.
- **Updated `serving.pseo_city_costs_de`** — JOIN `dim_countries`; 29 new camelCase override columns (`electricity`, `heating`, `rentSqm`, `hallCostSqm`, …, `permitsCompliance`) auto-applied by calculator.
- **Updated `serving.planner_defaults`** — JOIN `dim_countries`; same 29 cost columns flow through to the planner API `/api/market-data` endpoint.
### Changed
- **Per-proxy dead tracking in tiered cycler** — `make_tiered_cycler` now accepts a `proxy_failure_limit` parameter (default 3). Individual proxies that hit the limit are marked dead and permanently skipped by `next_proxy()`. If all proxies in the active tier are dead, `next_proxy()` auto-escalates to the next tier without needing the tier-level threshold. `record_failure(proxy_url)` and `record_success(proxy_url)` accept an optional `proxy_url` argument for per-proxy tracking; callers without `proxy_url` are fully backward-compatible. New `dead_proxy_count()` callable exposed for monitoring.
- `extract/padelnomics_extract/src/padelnomics_extract/proxy.py`: added per-proxy state (`proxy_failure_counts`, `dead_proxies`), updated `next_proxy`/`record_failure`/`record_success`, added `dead_proxy_count`