Files
padelnomics/extract/padelnomics_extract/pyproject.toml
Deeman 53e9bbd66b feat: restructure extraction to one file per source
Split monolithic execute.py into per-source modules with separate CLI
entry points. Each extractor now uses the framework from utils.py:
- SQLite state tracking (start_run / end_run per extractor)
- Proper logging (replace print() with logger)
- Atomic gzip writes (write_gzip_atomic)
- Connection pooling (niquests.Session)
- Bounded pagination (MAX_PAGES_PER_BBOX = 500)

New entry points:
  extract              — run all 4 extractors sequentially
  extract-overpass     — OSM padel courts
  extract-eurostat     — city demographics (etag dedup)
  extract-playtomic-tenants      — venue listings
  extract-playtomic-availability — booking slots + pricing (NEW)

The availability extractor reads tenant IDs from the latest tenants.json.gz,
queries next-day slots for each venue, and stores daily consolidated snapshots.
Supports resumability via cursor and retry with backoff.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 18:56:41 +01:00

24 lines
687 B
TOML

[project]
name = "padelnomics_extract"
version = "0.2.0"
description = "Data extraction pipelines for padelnomics"
requires-python = ">=3.11"
dependencies = [
"niquests>=3.14.0",
"python-dotenv>=1.0.0",
]
[project.scripts]
extract = "padelnomics_extract.all:main"
extract-overpass = "padelnomics_extract.overpass:main"
extract-eurostat = "padelnomics_extract.eurostat:main"
extract-playtomic-tenants = "padelnomics_extract.playtomic_tenants:main"
extract-playtomic-availability = "padelnomics_extract.playtomic_availability:main"
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[tool.hatch.build.targets.wheel]
packages = ["src/padelnomics_extract"]