merge: standardise recheck availability to JSONL + update docs
This commit is contained in:
@@ -36,7 +36,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
|
||||
- 50 new tests covering all template renders (EN + DE), registry structure, gallery routes (access control, list, preview, lang fallback), and compose preview endpoint
|
||||
- **JSONL streaming landing format** — extractors now write one JSON object per line (`.jsonl.gz`) instead of a single large blob, eliminating in-memory accumulation and `maximum_object_size` workarounds:
|
||||
- `playtomic_tenants.py` → `tenants.jsonl.gz` (one tenant per line; dedup still happens in memory before write)
|
||||
- `playtomic_availability.py` → `availability_{date}.jsonl.gz` (one venue per line with `date`/`captured_at_utc` injected; working file IS the final file — eliminates the consolidation step)
|
||||
- `playtomic_availability.py` → `availability_{date}.jsonl.gz` (morning) + `availability_{date}_recheck_{HH}.jsonl.gz` (recheck); one venue per line with `date`/`captured_at_utc`/`recheck_hour` injected
|
||||
- `geonames.py` → `cities_global.jsonl.gz` (one city per line; eliminates 30 MB blob and its `maximum_object_size` workaround)
|
||||
- `compress_jsonl_atomic(jsonl_path, dest_path)` utility added to `utils.py` — streams compression in 1 MB chunks, atomic `.tmp` rename, deletes source
|
||||
- **Regional Overpass splitting for tennis courts** — replaces single global query (150K+ elements, timed out) with 10 regional bbox queries (~10-40K elements each, 150s server / 180s client):
|
||||
@@ -49,7 +49,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
|
||||
### Changed
|
||||
- All modified staging SQL models use **UNION ALL transition CTEs** — both JSONL (new) and blob (old) formats are readable simultaneously; old `.json.gz` files in the landing zone continue working until they rotate out naturally:
|
||||
- `stg_playtomic_venues`, `stg_playtomic_resources`, `stg_playtomic_opening_hours` — JSONL top-level columns (no `UNNEST(tenants)`)
|
||||
- `stg_playtomic_availability` — JSONL morning files + blob morning files + blob recheck files
|
||||
- `stg_playtomic_availability` — JSONL morning + recheck files; blob morning + recheck kept for transition
|
||||
- `stg_population_geonames` — JSONL city rows (no `UNNEST(rows)`, no `maximum_object_size`)
|
||||
- `stg_tennis_courts` — JSONL elements with `COALESCE(lat, center.lat)` for way/relation centre coords; blob UNNEST kept for old files
|
||||
|
||||
|
||||
Reference in New Issue
Block a user