docs: update CHANGELOG and PROJECT.md for JSONL landing format

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Deeman
2026-02-25 12:28:43 +01:00
parent ec7f115f16
commit 683ca3fc24
2 changed files with 21 additions and 0 deletions

View File

@@ -7,6 +7,24 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
## [Unreleased] ## [Unreleased]
### Added ### Added
- **JSONL streaming landing format** — extractors now write one JSON object per line (`.jsonl.gz`) instead of a single large blob, eliminating in-memory accumulation and `maximum_object_size` workarounds:
- `playtomic_tenants.py``tenants.jsonl.gz` (one tenant per line; dedup still happens in memory before write)
- `playtomic_availability.py``availability_{date}.jsonl.gz` (one venue per line with `date`/`captured_at_utc` injected; working file IS the final file — eliminates the consolidation step)
- `geonames.py``cities_global.jsonl.gz` (one city per line; eliminates 30 MB blob and its `maximum_object_size` workaround)
- `compress_jsonl_atomic(jsonl_path, dest_path)` utility added to `utils.py` — streams compression in 1 MB chunks, atomic `.tmp` rename, deletes source
- **Regional Overpass splitting for tennis courts** — replaces single global query (150K+ elements, timed out) with 10 regional bbox queries (~10-40K elements each, 150s server / 180s client):
- Regions: europe\_west, europe\_central, europe\_east, north\_america, south\_america, asia\_east, asia\_west, oceania, africa, asia\_north
- Per-region retry (2 attempts, 30s cooldown) + 5s inter-region polite delay
- Crash recovery via `working.jsonl` accumulation — already-written element IDs skipped on restart; completed regions produce 0 new elements on re-query
- Output: `courts.jsonl.gz` (one OSM element per line)
- **`scripts/init_landing_seeds.py`** — creates minimal `.jsonl.gz` and `.json.gz` seed files in `1970/01/` so SQLMesh staging models can run before real extraction data arrives; idempotent
### Changed
- All modified staging SQL models use **UNION ALL transition CTEs** — both JSONL (new) and blob (old) formats are readable simultaneously; old `.json.gz` files in the landing zone continue working until they rotate out naturally:
- `stg_playtomic_venues`, `stg_playtomic_resources`, `stg_playtomic_opening_hours` — JSONL top-level columns (no `UNNEST(tenants)`)
- `stg_playtomic_availability` — JSONL morning files + blob morning files + blob recheck files
- `stg_population_geonames` — JSONL city rows (no `UNNEST(rows)`, no `maximum_object_size`)
- `stg_tennis_courts` — JSONL elements with `COALESCE(lat, center.lat)` for way/relation centre coords; blob UNNEST kept for old files
- **Marketplace admin dashboard** (`/admin/marketplace`) — single-screen health view for the two-sided market: - **Marketplace admin dashboard** (`/admin/marketplace`) — single-screen health view for the two-sided market:
- **Lead funnel** — total / verified-new (ready to unlock) / unlocked / won / conversion rate - **Lead funnel** — total / verified-new (ready to unlock) / unlocked / won / conversion rate
- **Credit economy** — total credits issued, consumed (lead unlocks), outstanding balance across all paid suppliers, 30-day burn rate - **Credit economy** — total credits issued, consumed (lead unlocks), outstanding balance across all paid suppliers, 30-day burn rate

View File

@@ -93,6 +93,9 @@
- [x] `dim_venues` (OSM + Playtomic deduped), `dim_cities` (Eurostat population) - [x] `dim_venues` (OSM + Playtomic deduped), `dim_cities` (Eurostat population)
- [x] `city_market_profile` (market score OBT), `planner_defaults` (per-city calculator pre-fill) - [x] `city_market_profile` (market score OBT), `planner_defaults` (per-city calculator pre-fill)
- [x] DuckDB analytics reader in app lifecycle - [x] DuckDB analytics reader in app lifecycle
- [x] **JSONL streaming landing format** — extractors write `.jsonl.gz` (one record per line); constant-memory compression via `compress_jsonl_atomic()`; eliminates `maximum_object_size` workarounds; all modified staging models use UNION ALL transition to support both formats
- [x] **Regional Overpass tennis splitting** — 10 regional bbox queries replace the single global 150K-element query that timed out; crash recovery via `working.jsonl` accumulation
- [x] **`init_landing_seeds.py`** — creates minimal seed files for both JSONL and blob formats so SQLMesh can run before real data arrives
### i18n ### i18n
- [x] Full i18n across entire app (EN + DE) - [x] Full i18n across entire app (EN + DE)