feat(extract): three-tier proxy system with Webshare auto-fetch
Replace two-tier proxy setup (PROXY_URLS / PROXY_URLS_FALLBACK) with N-tier escalation: free → datacenter → residential.

- proxy.py: fetch_webshare_proxies() auto-fetches the Webshare download API on each run (no more stale manually-copied lists). load_proxy_tiers() assembles tiers from WEBSHARE_DOWNLOAD_URL, PROXY_URLS_DATACENTER, PROXY_URLS_RESIDENTIAL. make_tiered_cycler() generalised to list[list[str]] with N-level escalation; is_fallback_active() replaced by is_exhausted(). Old load_proxy_urls() / load_fallback_proxy_urls() deleted.
- playtomic_availability.py: both extract() and extract_recheck() use load_proxy_tiers() + generalised cycler. _fetch_venues_parallel fallback_urls param removed. All is_fallback_active() checks → is_exhausted().
- playtomic_tenants.py: flattens tiers for simple round-robin.
- test_supervisor.py: TestLoadProxyUrls removed (function deleted). Added TestFetchWebshareProxies, TestLoadProxyTiers, TestTieredCyclerNTier (11 tests covering parse format, error handling, escalation, thread safety).

47 tests pass, ruff clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
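For readers without the diff of `proxy.py` at hand, the following is a minimal sketch of what a bounded, timeout-guarded fetch with stdlib `urllib` might look like. It is not the committed `fetch_webshare_proxies()`. The names `parse_webshare_lines` and `fetch_proxy_list`, the `MAX_BYTES` and `TIMEOUT_SECONDS` values, and the `ip:port:username:password` line format are all assumptions made for illustration.

```python
import urllib.request

MAX_BYTES = 1_000_000   # assumed cap; the commit does not state the real limit
TIMEOUT_SECONDS = 10    # assumed timeout


def parse_webshare_lines(text: str) -> list[str]:
    """Turn 'ip:port:user:password' lines into proxy URLs, skipping malformed lines."""
    urls = []
    for raw in text.splitlines():
        parts = raw.strip().split(":")
        if len(parts) != 4 or not all(parts):
            continue  # blank or malformed line
        ip, port, user, password = parts
        urls.append(f"http://{user}:{password}@{ip}:{port}")
    return urls


def fetch_proxy_list(download_url: str) -> list[str]:
    """Fetch and parse the list; return [] on any network or URL error."""
    try:
        with urllib.request.urlopen(download_url, timeout=TIMEOUT_SECONDS) as resp:
            body = resp.read(MAX_BYTES)  # bounded read: at most MAX_BYTES bytes
    except (OSError, ValueError):
        # URLError and socket timeouts are OSError subclasses;
        # ValueError covers strings that are not valid URLs.
        return []
    return parse_webshare_lines(body.decode("utf-8", errors="replace"))
```

Returning an empty list on failure lets a caller simply skip the free tier when the download is unavailable, which is one plausible reading of the "error handling" tests mentioned above.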
@@ -6,6 +6,15 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
## [Unreleased]
### Added
- **Three-tier proxy system** for extraction pipeline: free (Webshare auto-fetched) → datacenter (`PROXY_URLS_DATACENTER`) → residential (`PROXY_URLS_RESIDENTIAL`). Webshare free proxies are now auto-fetched from their download API on each run — no more manually copying stale proxy lists.
- `proxy.py`: added `fetch_webshare_proxies()` (stdlib urllib, bounded read + timeout), `load_proxy_tiers()` (assembles N tiers from env), generalised `make_tiered_cycler()` to accept `list[list[str]]` with N-level escalation. Exposes `is_exhausted()`, `active_tier_index()`, `tier_count()`.
- `playtomic_availability.py`: both `extract()` and `extract_recheck()` now use `load_proxy_tiers()` + N-tier cycler. `_fetch_venues_parallel` `fallback_urls` param removed. `is_fallback_active()` replaced by `is_exhausted()`.
- `playtomic_tenants.py`: uses `load_proxy_tiers()` flattened for simple round-robin.
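The N-tier escalation described in these entries can be sketched roughly as follows. This is a hypothetical illustration, not the code in `proxy.py`: the function name `make_tiered_cycler_sketch`, the `record_failure` / `record_success` hooks, and the rule "escalate after `failure_threshold` consecutive failures" are assumptions, since the changelog does not say what triggers escalation. Only `is_exhausted()`, `active_tier_index()` and `tier_count()` are names taken from the entry above.

```python
import threading
from typing import Callable, Optional


def make_tiered_cycler_sketch(
    tiers: list[list[str]], failure_threshold: int = 3
) -> dict[str, Callable]:
    """Round-robin within the active tier; escalate after repeated failures."""
    active_tiers = [list(t) for t in tiers if t]  # drop empty tiers up front
    lock = threading.Lock()  # guards `state` across worker threads
    state = {"tier": 0, "pos": 0, "failures": 0}

    def next_proxy() -> Optional[str]:
        with lock:
            if state["tier"] >= len(active_tiers):
                return None  # every tier has been used up
            tier = active_tiers[state["tier"]]
            url = tier[state["pos"] % len(tier)]
            state["pos"] += 1
            return url

    def record_failure() -> None:
        with lock:
            if state["tier"] >= len(active_tiers):
                return
            state["failures"] += 1
            if state["failures"] >= failure_threshold:
                # Escalate: move to the next tier and restart its rotation.
                state["tier"] += 1
                state["pos"] = 0
                state["failures"] = 0

    def record_success() -> None:
        with lock:
            state["failures"] = 0

    def is_exhausted() -> bool:
        with lock:
            return state["tier"] >= len(active_tiers)

    def active_tier_index() -> int:
        with lock:
            return state["tier"]

    def tier_count() -> int:
        return len(active_tiers)

    return {
        "next_proxy": next_proxy,
        "record_failure": record_failure,
        "record_success": record_success,
        "is_exhausted": is_exhausted,
        "active_tier_index": active_tier_index,
        "tier_count": tier_count,
    }
```

A caller that only needs simple round-robin, as described for `playtomic_tenants.py`, could instead flatten the tiers with `[url for tier in tiers for url in tier]` and cycle over the result.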
### Changed
- **Env vars replaced** (breaking): `PROXY_URLS` and `PROXY_URLS_FALLBACK` are removed. New vars: `WEBSHARE_DOWNLOAD_URL`, `PROXY_URLS_DATACENTER`, `PROXY_URLS_RESIDENTIAL`.
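As a rough guide to how the new variables might be consumed, here is a sketch that assembles tiers in the order free → datacenter → residential. It is not the committed `load_proxy_tiers()`. The function name `load_proxy_tiers_sketch`, the injected `fetch_free` callable, and the assumption that `PROXY_URLS_DATACENTER` and `PROXY_URLS_RESIDENTIAL` hold comma-separated URLs are all hypothetical.

```python
import os
from typing import Callable, Mapping


def load_proxy_tiers_sketch(
    env: Mapping[str, str] = os.environ,
    fetch_free: Callable[[str], list[str]] = lambda url: [],
) -> list[list[str]]:
    """Build [free, datacenter, residential], omitting tiers that are empty."""
    tiers: list[list[str]] = []

    # Tier 1: free proxies, fetched at run time from the download URL.
    download_url = env.get("WEBSHARE_DOWNLOAD_URL", "").strip()
    if download_url:
        free = fetch_free(download_url)
        if free:
            tiers.append(free)

    # Tiers 2 and 3: static lists (comma-separated is an assumption).
    for name in ("PROXY_URLS_DATACENTER", "PROXY_URLS_RESIDENTIAL"):
        urls = [u.strip() for u in env.get(name, "").split(",") if u.strip()]
        if urls:
            tiers.append(urls)

    return tiers
```

Deployments that still set only `PROXY_URLS` or `PROXY_URLS_FALLBACK` would get an empty tier list from a loader like this, which is why the change is flagged as breaking.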
### Added
- **Phase 2a — NUTS-1 regional income differentiation** (`opportunity_score`): Munich and Berlin no longer share the same income figure as Chemnitz.
- `eurostat.py`: added `nama_10r_2hhinc` dataset config (NUTS-2 cube with NUTS-1 entries); filter params now appended to API URL so the server pre-filters the large cube before download (also makes `ilc_di03` requests smaller).
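The idea of appending filter params so the server pre-filters the cube can be illustrated with a small URL builder. This is a sketch under assumptions: the helper name `build_filtered_url`, the placeholder base URL, and the `geo` / `format` parameter names are illustrative and are not taken from `eurostat.py`.

```python
from urllib.parse import urlencode


def build_filtered_url(base_url: str, filters: dict) -> str:
    """Append filters as query parameters; list values repeat the key."""
    if not filters:
        return base_url
    # doseq=True expands {"geo": ["DE1", "DE2"]} into geo=DE1&geo=DE2.
    return f"{base_url}?{urlencode(filters, doseq=True)}"


# Hypothetical request for two regions of one dataset.
url = build_filtered_url(
    "https://example.invalid/data/nama_10r_2hhinc",
    {"geo": ["DE1", "DE2"], "format": "JSON"},
)
```

With the filter in the query string the response contains only the requested slices, so the client downloads less than it would by fetching the full cube and filtering locally.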