feat: restructure extraction to one file per source

Split monolithic execute.py into per-source modules with separate CLI
entry points. Each extractor now uses the framework from utils.py:
- SQLite state tracking (start_run / end_run per extractor)
- Proper logging (replace print() with logger)
- Atomic gzip writes (write_gzip_atomic)
- Connection pooling (niquests.Session)
- Bounded pagination (MAX_PAGES_PER_BBOX = 500)
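
For illustration, the atomic gzip write listed above could look roughly like
the sketch below. The signature of write_gzip_atomic is an assumption (the
commit names the helper but does not show it), and the temp-file-plus-rename
approach is one common way to get atomicity, not necessarily what utils.py does.

```python
import gzip
import os
import tempfile
from pathlib import Path


def write_gzip_atomic(path: Path, data: bytes) -> int:
    """Gzip `data` and write it to `path` so readers never see a partial file.

    Hypothetical signature; the real helper in utils.py may differ.
    Returns the size of the compressed file in bytes.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file next to the target so os.replace stays on one
    # filesystem, where the rename is atomic.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as raw:
            raw.write(gzip.compress(data))
            raw.flush()
            os.fsync(raw.fileno())
        # Swap the finished file into place in a single step.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file on any failure, then re-raise.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return path.stat().st_size
```

A crash before os.replace leaves the previous file untouched, so a reader of
the latest snapshot sees either the old or the new version, never a truncated one.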

New entry points:
  extract              — run all 4 extractors sequentially
  extract-overpass     — OSM padel courts
  extract-eurostat     — city demographics (etag dedup)
  extract-playtomic-tenants      — venue listings
  extract-playtomic-availability — booking slots + pricing (NEW)

The availability extractor reads tenant IDs from the latest tenants.json.gz,
queries next-day slots for each venue, and stores daily consolidated snapshots.
It can resume an interrupted run from a cursor and retries failed requests
with backoff.
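
A minimal sketch of how resuming from a cursor and retrying with backoff can
fit together. The function names, the exponential delay schedule, and the
idea that the cursor holds the last completed tenant ID are all assumptions
for illustration; the commit does not describe these details.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); after a failure wait base_delay * 2**n, then try again.

    Hypothetical helper. The last failure is re-raised to the caller.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


def tenants_after_cursor(tenant_ids: List[str], cursor: Optional[str]) -> List[str]:
    """Return the tenant IDs still to process.

    Assumes the cursor is the last tenant ID that completed successfully;
    an unknown or missing cursor means starting from the beginning.
    """
    if cursor is None or cursor not in tenant_ids:
        return list(tenant_ids)
    return tenant_ids[tenant_ids.index(cursor) + 1:]
```

With this shape, a run would persist the cursor after each venue and, on
restart, pass it to tenants_after_cursor to skip venues already fetched.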

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Deeman
2026-02-22 18:56:41 +01:00
parent ea86940b78
commit 53e9bbd66b
10 changed files with 625 additions and 223 deletions

uv.lock (generated, 2 lines changed)

@@ -1180,7 +1180,7 @@ requires-dist = [
[[package]]
name = "padelnomics-extract"
-version = "0.1.0"
+version = "0.2.0"
source = { editable = "extract/padelnomics_extract" }
dependencies = [
{ name = "niquests" },