docs: update CHANGELOG for extraction performance improvements
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CHANGELOG.md (17 lines changed)
@@ -29,6 +29,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
queries, geometry columns).
- **SOPS secrets** — `GEONAMES_USERNAME=padelnomics` and `CENSUS_API_KEY` added to both
`.env.dev.sops` and `.env.prod.sops`.
- **Crash-safe partial JSONL** — `utils.load_partial_results()` and `flush_partial_batch()`
provide a generic opt-in mechanism for incremental progress flushing during long extractions.
Any extractor processing items one-by-one can flush every N records and resume from a
`.partial.jsonl` sidecar file after a crash.
- **Methodology page updated** — `/en/market-score` now documents both scores with:
Two Scores intro section, component cards for each score (4 Marktreife + 5 Marktpotenzial),
score band interpretations, expanded FAQ (7 entries). Section headings use the padelnomics
@@ -42,6 +46,19 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
First "padelnomics Market Score" mention in each article template now links
to the methodology page (hub-and-spoke internal linking).
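
As a rough illustration of the crash-safe partial JSONL entry in this section, the flush-and-resume pattern might look like the sketch below. The function names come from that entry; the signatures, the `key` parameter, and the truncated-line handling are assumptions made for illustration, not the project's actual `utils` code.

```python
import json
from pathlib import Path


def load_partial_results(partial_path: Path, key: str) -> tuple[list[dict], set]:
    """Return records flushed before a crash, plus the set of keys already done.

    Hypothetical signature: `key` names the field that identifies an item.
    """
    if not partial_path.exists():
        return [], set()
    records: list[dict] = []
    dropped = False
    for line in partial_path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            # A crash mid-write can leave a truncated line; drop it.
            dropped = True
    if dropped:
        # Rewrite the sidecar so later appends start on a clean line.
        partial_path.write_text(
            "".join(json.dumps(r) + "\n" for r in records), encoding="utf-8"
        )
    return records, {r[key] for r in records}


def flush_partial_batch(partial_path: Path, batch: list[dict]) -> None:
    """Append a batch of records to the sidecar, one JSON object per line."""
    with partial_path.open("a", encoding="utf-8") as fh:
        for record in batch:
            fh.write(json.dumps(record) + "\n")
```

An extractor using this pattern would call `load_partial_results()` once at startup, skip every item whose key is already in the returned set, and call `flush_partial_batch()` every N records.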

### Changed
- **`EXTRACT_WORKERS` env var removed** — worker count is now derived from `PROXY_URLS` length
(one worker per proxy). No proxies → single-threaded. No manual tuning needed.
- **Playtomic tenants extractor** — parallel batch page fetching when proxies are configured.
Each page in a batch fires concurrently using its own session + proxy. Expected speedup:
~2.5 min → ~15 s with 10 Webshare datacenter proxies.
- **Playtomic availability extractor** — three performance changes:
1. No per-request `time.sleep()` on success when a proxy is active (throttle only when
running direct). Retry/backoff sleeps for 429 and 5xx responses are unchanged.
2. Worker count auto-detected from proxy count (drops `EXTRACT_WORKERS`).
3. True crash resumption via `.partial.jsonl` sidecar: progress flushed every 50 venues,
resume skips already-fetched venues and merges prior results into the final file.
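
The worker-count and throttling rules in this section can be sketched as follows. This assumes `PROXY_URLS` is a comma-separated string (its format is not stated here). The names `parse_proxy_urls`, `worker_count`, `success_delay`, and `fetch_batch`, the `fetch_page(page, proxy)` callable, and the 1-second direct delay are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def parse_proxy_urls(raw: Optional[str]) -> list[str]:
    """Split a comma-separated proxy list, ignoring blanks (assumed format)."""
    return [u.strip() for u in (raw or "").split(",") if u.strip()]


def worker_count(proxy_urls: list[str]) -> int:
    """One worker per proxy; no proxies means a single worker."""
    return max(1, len(proxy_urls))


def success_delay(proxy_urls: list[str], direct_delay_s: float = 1.0) -> float:
    """Sleep after a successful request only when running without a proxy."""
    return 0.0 if proxy_urls else direct_delay_s


def fetch_batch(pages: list, proxy_urls: list[str], fetch_page: Callable) -> list:
    """Fetch one batch concurrently, pairing each page with its own proxy."""
    proxies = proxy_urls or [None]  # None stands for a direct connection
    if len(pages) > len(proxies):
        raise ValueError("a batch may hold at most one page per proxy")
    with ThreadPoolExecutor(max_workers=worker_count(proxy_urls)) as pool:
        # map() pairs pages[i] with proxies[i] and keeps results in page order.
        return list(pool.map(fetch_page, pages, proxies))
```

In this sketch the proxy list would come from `parse_proxy_urls(os.environ.get("PROXY_URLS"))`, and retry/backoff sleeps for 429 and 5xx responses would stay separate from `success_delay()`.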

### Fixed
- **`datetime.utcnow()` deprecation warnings** — replaced all 94 occurrences
across 22 files (source + tests) with `utcnow()` / `utcnow_iso()` helpers
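
For reference, a minimal sketch of what such helpers could look like. `datetime.utcnow()` is deprecated as of Python 3.12 in favour of the timezone-aware `datetime.now(timezone.utc)`. Whether the project's helpers return aware or naive values is not stated in this entry, so the bodies below are an assumption.

```python
from datetime import datetime, timezone


def utcnow() -> datetime:
    """Timezone-aware current UTC time (assumed behaviour of the helper)."""
    return datetime.now(timezone.utc)


def utcnow_iso() -> str:
    """Current UTC time as an ISO 8601 string with an explicit +00:00 offset."""
    return utcnow().isoformat()
```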