feat(daas): add extract workspace member with Overpass, Eurostat, Playtomic extractors

Adds padelnomics_extract UV workspace member at extract/padelnomics_extract/.
Implements three extractors in execute.py:
- extract_overpass(): global OverpassQL query for sport=padel OSM features
- extract_eurostat(): urb_cpop1 (city population) + ilc_di03 (NUTS2 income), etag-dedup
- extract_playtomic_tenants(): unauthenticated tenant search across 4 market bboxes,
  paginated, deduplicated by tenant_id, throttled at 1 req/2s
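
The pagination, deduplication, and throttling described for extract_playtomic_tenants() can be sketched roughly as below. This is a minimal illustration, not the code in execute.py: `collect_tenants`, `fetch_page`, and the string bbox labels are hypothetical names, and the real request parameters of the Playtomic search are not shown in this commit message.

```python
import time
from typing import Callable, Iterable


def collect_tenants(
    fetch_page: Callable[[str, int], list],
    bboxes: Iterable[str],
    min_interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Walk every page of every bbox and keep one record per tenant_id.

    fetch_page(bbox, page) is a hypothetical callable that returns a list
    of tenant dicts, or an empty list once the pages are exhausted.
    """
    seen = {}  # tenant_id -> first record seen for that id
    first_request = True
    for bbox in bboxes:
        page = 0
        while True:
            # Throttle: wait min_interval_s before every request but the first,
            # which gives at most 1 request per 2 seconds by default.
            if not first_request:
                sleep(min_interval_s)
            first_request = False

            tenants = fetch_page(bbox, page)
            if not tenants:
                break  # empty page: this bbox is done
            for tenant in tenants:
                # Overlapping bboxes can return the same tenant twice.
                seen.setdefault(tenant["tenant_id"], tenant)
            page += 1
    return list(seen.values())
```

Passing `sleep` as a parameter is only a convenience for this sketch: it lets the throttle be exercised in a test without actually waiting.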

Landing zone at LANDING_DIR (default data/landing) with per-source subdirectories.
Entry point: `extract` script calls extract_dataset() for all three in sequence.
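
One way the landing zone layout could be resolved, assuming LANDING_DIR is read from the environment and the per-source subdirectories are `overpass/`, `eurostat/`, and `playtomic/`. `landing_dirs` and `SOURCES` are hypothetical names used only for this sketch; the actual helper in execute.py may differ.

```python
import os
from pathlib import Path

# Per-source subdirectory names (assumed from the changelog entry).
SOURCES = ("overpass", "eurostat", "playtomic")


def landing_dirs(env=None):
    """Return {source: Path}, creating each directory if it is missing.

    env is any mapping with a .get() method; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    root = Path(env.get("LANDING_DIR", "data/landing"))
    dirs = {}
    for source in SOURCES:
        path = root / source
        path.mkdir(parents=True, exist_ok=True)
        dirs[source] = path
    return dirs
```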

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Deeman
2026-02-21 21:38:55 +01:00
parent f18e788fc7
commit af09597930
7 changed files with 436 additions and 0 deletions


@@ -7,6 +7,9 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
## [Unreleased]
### Added
- `extract/padelnomics_extract` workspace member: Overpass API (padel courts via OSM), Eurostat city demographics (`urb_cpop1`, `ilc_di03`), and Playtomic unauthenticated tenant search extractors
- Landing zone structure at `data/landing/` with per-source subdirectories: `overpass/`, `eurostat/`, `playtomic/`
- `.env.example` entries for `DUCKDB_PATH` and `LANDING_DIR`
- content: `scripts/seed_content.py` — seeds two article templates (EN + DE) and 18 cities × 2 language rows into the database; run with `uv run python -m padelnomics.scripts.seed_content --generate` to produce 36 pre-built SEO articles covering Germany (8 cities), USA (6 cities), and UK (4 cities); each city has realistic per-market overrides for rates, rent, utilities, permits, and court configuration so the financial model produces genuinely unique output per article
- content: EN template (`city-padel-cost-en`) at `/padel-cost/{{ city_slug }}` and DE template (`city-padel-cost-de`) at `/padel-kosten/{{ city_slug }}` with Jinja2 Markdown bodies embedding `[scenario:slug:section]` cards for summary, CAPEX, operating, cashflow, and returns