merge: JSONL streaming landing format + regional overpass_tennis splitting

Converts extractors from large JSON blobs to streaming JSONL (.jsonl.gz),
eliminating in-memory accumulation, maximum_object_size workarounds, and
the playtomic availability consolidation step.

- compress_jsonl_atomic(): 1MB-chunk streaming compression, atomic rename
- playtomic_tenants → tenants.jsonl.gz (one tenant per line after dedup)
- playtomic_availability → availability_{date}.jsonl.gz (working file IS the output)
- geonames → cities_global.jsonl.gz (eliminates 30MB blob)
- overpass_tennis → 10 regional bbox queries + courts.jsonl.gz with crash recovery
- All modified staging SQL uses UNION ALL (JSONL + blob) for smooth transition
- init_landing_seeds.py: bootstrap seeds for both formats in 1970/01

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

# Conflicts:
#	CHANGELOG.md
This commit is contained in:
Deeman
2026-02-25 12:34:03 +01:00
14 changed files with 682 additions and 200 deletions

View File

@@ -93,6 +93,9 @@
- [x] `dim_venues` (OSM + Playtomic deduped), `dim_cities` (Eurostat population)
- [x] `city_market_profile` (market score OBT), `planner_defaults` (per-city calculator pre-fill)
- [x] DuckDB analytics reader in app lifecycle
- [x] **JSONL streaming landing format** — extractors write `.jsonl.gz` (one record per line); constant-memory compression via `compress_jsonl_atomic()`; eliminates `maximum_object_size` workarounds; all modified staging models use UNION ALL transition to support both formats
- [x] **Regional Overpass tennis splitting** — 10 regional bbox queries replace the single global 150K-element query that timed out; crash recovery via `working.jsonl` accumulation
- [x] **`init_landing_seeds.py`** — creates minimal seed files for both JSONL and blob formats so SQLMesh can run before real data arrives
### i18n
- [x] Full i18n across entire app (EN + DE)