From 1ef22770aa5637ee37c1524d86301183f4e54da8 Mon Sep 17 00:00:00 2001
From: Deeman
Date: Tue, 24 Feb 2026 22:31:19 +0100
Subject: [PATCH] docs: update CHANGELOG for extraction performance improvements

Co-Authored-By: Claude Sonnet 4.6
---
 CHANGELOG.md | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 8184e00..3006338 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -29,6 +29,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
   queries, geometry columns).
 - **SOPS secrets** — `GEONAMES_USERNAME=padelnomics` and `CENSUS_API_KEY` added to both
   `.env.dev.sops` and `.env.prod.sops`.
+- **Crash-safe partial JSONL** — `utils.load_partial_results()` and `flush_partial_batch()`
+  provide a generic opt-in mechanism for incremental progress flushing during long extractions.
+  Any extractor processing items one-by-one can flush every N records and resume from a
+  `.partial.jsonl` sidecar file after a crash.
 - **Methodology page updated** — `/en/market-score` now documents both scores with:
   Two Scores intro section, component cards for each score (4 Marktreife + 5 Marktpotenzial),
   score band interpretations, expanded FAQ (7 entries). Section headings use the padelnomics
@@ -42,6 +46,19 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
   First "padelnomics Market Score" mention in each article template now links to the methodology
   page (hub-and-spoke internal linking).
 
+### Changed
+- **`EXTRACT_WORKERS` env var removed** — worker count is now derived from `PROXY_URLS` length
+  (one worker per proxy). No proxies → single-threaded. No manual tuning needed.
+- **Playtomic tenants extractor** — parallel batch page fetching when proxies are configured.
+  Each page in a batch fires concurrently using its own session + proxy. Expected speedup:
+  ~2.5 min → ~15 s with 10 Webshare datacenter proxies.
+- **Playtomic availability extractor** — three performance changes:
+  1. No per-request `time.sleep()` on success when a proxy is active (throttle only when
+     running direct). Retry/backoff sleeps for 429 and 5xx responses are unchanged.
+  2. Worker count auto-detected from proxy count (drops `EXTRACT_WORKERS`).
+  3. True crash resumption via `.partial.jsonl` sidecar: progress flushed every 50 venues,
+     resume skips already-fetched venues and merges prior results into the final file.
+
 ### Fixed
 - **`datetime.utcnow()` deprecation warnings** — replaced all 94 occurrences across 22 files
   (source + tests) with `utcnow()` / `utcnow_iso()` helpers
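
Reviewer note: the sidecar-based resumption and proxy-derived worker count described in the entries above could look roughly like the sketch below. It is illustrative only. The function names `load_partial_results` and `flush_partial_batch` come from the changelog, but their signatures, the `venue_id` key, the `extract` driver, and `worker_count` are assumptions made for this example and are not taken from the actual `utils` module.

```python
import json
import os
from pathlib import Path


def worker_count(proxy_urls):
    """Assumed rule: one worker per proxy, single-threaded when none are set."""
    return max(1, len(proxy_urls))


def load_partial_results(partial_path, key="venue_id"):
    """Hypothetical signature: return (records, seen_keys) from the sidecar."""
    records, seen = [], set()
    path = Path(partial_path)
    if not path.exists():
        return records, seen
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                # A crash mid-write can leave a truncated line; skip it so
                # the affected item is simply fetched again.
                continue
            records.append(rec)
            seen.add(rec[key])
    return records, seen


def _ends_mid_line(path):
    """True if the file is non-empty and does not end with a newline."""
    if not path.exists() or path.stat().st_size == 0:
        return False
    with path.open("rb") as fh:
        fh.seek(-1, os.SEEK_END)
        return fh.read(1) != b"\n"


def flush_partial_batch(partial_path, batch):
    """Hypothetical signature: append a batch as JSON lines and fsync."""
    path = Path(partial_path)
    repair = _ends_mid_line(path)
    with path.open("a", encoding="utf-8") as fh:
        if repair:
            fh.write("\n")  # terminate a truncated line left by a crash
        for rec in batch:
            fh.write(json.dumps(rec) + "\n")
        fh.flush()
        os.fsync(fh.fileno())


def extract(venue_ids, fetch, partial_path, flush_every=50):
    """Fetch each venue once, flushing progress every `flush_every` records."""
    results, seen = load_partial_results(partial_path)
    batch = []
    for venue_id in venue_ids:
        if venue_id in seen:
            continue  # already fetched before the crash
        batch.append(fetch(venue_id))
        if len(batch) >= flush_every:
            flush_partial_batch(partial_path, batch)
            results.extend(batch)
            batch = []
    if batch:
        flush_partial_batch(partial_path, batch)
        results.extend(batch)
    return results  # prior results merged with newly fetched ones
```

In this sketch a crash loses at most the records still in the unflushed batch, and a rerun with the same `partial_path` fetches only the venues missing from the sidecar. Removing the sidecar after the final file is written is left out here.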