Split monolithic execute.py into per-source modules with separate CLI entry points.

Each extractor now uses the framework from utils.py:
- SQLite state tracking (start_run / end_run per extractor)
- Proper logging (replace print() with logger)
- Atomic gzip writes (write_gzip_atomic)
- Connection pooling (niquests.Session)
- Bounded pagination (MAX_PAGES_PER_BBOX = 500)

New entry points:
- extract: run all 4 extractors sequentially
- extract-overpass: OSM padel courts
- extract-eurostat: city demographics (etag dedup)
- extract-playtomic-tenants: venue listings
- extract-playtomic-availability: booking slots + pricing (NEW)

The availability extractor reads tenant IDs from the latest tenants.json.gz, queries next-day slots for each venue, and stores daily consolidated snapshots. It supports resumability via a cursor and retries with backoff.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
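The commit message names `write_gzip_atomic` but does not show it. As a rough illustration of what an atomic gzip write usually involves, here is a minimal sketch. The signature, the return value, and the temp-file naming are assumptions for illustration only; the actual helper in utils.py may differ.

```python
import gzip
import os
import tempfile
from pathlib import Path


def write_gzip_atomic(path: Path, payload: bytes) -> int:
    """Hypothetical sketch: gzip `payload` and publish it at `path` atomically.

    Returns the size in bytes of the compressed file (an assumed convention).
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)

    # The temp file lives in the target directory so the final rename
    # never crosses a filesystem boundary.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "wb") as raw:
            raw.write(gzip.compress(payload))
            raw.flush()
            os.fsync(raw.fileno())  # make sure the bytes reach disk first
        # os.replace swaps the file in one step: readers see either the
        # old file or the complete new one, never a partial write.
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the partial temp file before re-raising.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
    return path.stat().st_size
```

Writing to a sibling temp file and renaming means an interrupted run leaves the previous snapshot untouched, which matters if a later extractor reads the latest tenants.json.gz.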
24 lines
687 B
TOML
[project]
name = "padelnomics_extract"
version = "0.2.0"
description = "Data extraction pipelines for padelnomics"
requires-python = ">=3.11"
dependencies = [
    "niquests>=3.14.0",
    "python-dotenv>=1.0.0",
]

[project.scripts]
extract = "padelnomics_extract.all:main"
extract-overpass = "padelnomics_extract.overpass:main"
extract-eurostat = "padelnomics_extract.eurostat:main"
extract-playtomic-tenants = "padelnomics_extract.playtomic_tenants:main"
extract-playtomic-availability = "padelnomics_extract.playtomic_availability:main"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["src/padelnomics_extract"]
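
The `extract` script maps to `padelnomics_extract.all:main`, which the commit message describes as running all four extractors sequentially. The module itself is not shown here, so the following is only a sketch of one way such a runner could look. The `run_all` helper, the stand-in callables, and the choice to continue after a failure are all assumptions, not the project's actual behaviour.

```python
import logging
import sys
from typing import Callable, Sequence

logger = logging.getLogger("padelnomics_extract.all")

# (name, callable) pairs; a hypothetical shape for the extractor registry.
Extractor = tuple[str, Callable[[], None]]


def run_all(extractors: Sequence[Extractor]) -> int:
    """Run each extractor in order and return the number that failed.

    Assumption: one failing source should not block the others.
    """
    failures = 0
    for name, run in extractors:
        logger.info("starting %s", name)
        try:
            run()
        except Exception:
            logger.exception("%s failed", name)
            failures += 1
        else:
            logger.info("finished %s", name)
    return failures


def main() -> None:
    logging.basicConfig(level=logging.INFO)
    # Stand-ins: the real package would presumably pass the main() of
    # overpass, eurostat, playtomic_tenants and playtomic_availability.
    extractors: list[Extractor] = [
        ("overpass", lambda: None),
        ("eurostat", lambda: None),
        ("playtomic_tenants", lambda: None),
        ("playtomic_availability", lambda: None),
    ]
    sys.exit(1 if run_all(extractors) else 0)
```

Once the package is installed, each key under `[project.scripts]` becomes a console command that imports the named module and calls its `main`, so `extract` would invoke a function like the one above.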