merge: JSONL streaming landing format + regional overpass_tennis splitting
Converts extractors from large JSON blobs to streaming JSONL (.jsonl.gz),
eliminating in-memory accumulation, maximum_object_size workarounds, and
the playtomic availability consolidation step.
- compress_jsonl_atomic(): 1MB-chunk streaming compression, atomic rename
- playtomic_tenants → tenants.jsonl.gz (one tenant per line after dedup)
- playtomic_availability → availability_{date}.jsonl.gz (working file IS the output)
- geonames → cities_global.jsonl.gz (eliminates 30MB blob)
- overpass_tennis → 10 regional bbox queries + courts.jsonl.gz with crash recovery
- All modified staging SQL uses UNION ALL (JSONL + blob) for smooth transition
- init_landing_seeds.py: bootstrap seeds for both formats in 1970/01
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# Conflicts:
# CHANGELOG.md
This commit is contained in:
19
CHANGELOG.md
19
CHANGELOG.md
@@ -16,10 +16,27 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
|
||||
- **Admin email gallery** (`/admin/emails/gallery`) — card grid of all email types; preview page with EN/DE language toggle renders each template in a sandboxed iframe (`srcdoc`); "View in sent log →" cross-link; gallery link added to admin sidebar
|
||||
- **Compose live preview** — two-column compose layout: form on the left, HTMX-powered preview iframe on the right; `hx-trigger="input delay:500ms"` on the textarea; `POST /admin/emails/compose/preview` endpoint supports plain body or branded wrapper via `wrap` checkbox
|
||||
- 50 new tests covering all template renders (EN + DE), registry structure, gallery routes (access control, list, preview, lang fallback), and compose preview endpoint
|
||||
- **JSONL streaming landing format** — extractors now write one JSON object per line (`.jsonl.gz`) instead of a single large blob, eliminating in-memory accumulation and `maximum_object_size` workarounds:
|
||||
- `playtomic_tenants.py` → `tenants.jsonl.gz` (one tenant per line; dedup still happens in memory before write)
|
||||
- `playtomic_availability.py` → `availability_{date}.jsonl.gz` (one venue per line with `date`/`captured_at_utc` injected; working file IS the final file — eliminates the consolidation step)
|
||||
- `geonames.py` → `cities_global.jsonl.gz` (one city per line; eliminates 30 MB blob and its `maximum_object_size` workaround)
|
||||
- `compress_jsonl_atomic(jsonl_path, dest_path)` utility added to `utils.py` — streams compression in 1 MB chunks, atomic `.tmp` rename, deletes source
|
||||
- **Regional Overpass splitting for tennis courts** — replaces single global query (150K+ elements, timed out) with 10 regional bbox queries (~10-40K elements each, 150s server / 180s client):
|
||||
- Regions: europe\_west, europe\_central, europe\_east, north\_america, south\_america, asia\_east, asia\_west, oceania, africa, asia\_north
|
||||
- Per-region retry (2 attempts, 30s cooldown) + 5s inter-region polite delay
|
||||
- Crash recovery via `working.jsonl` accumulation — already-written element IDs skipped on restart; completed regions produce 0 new elements on re-query
|
||||
- Output: `courts.jsonl.gz` (one OSM element per line)
|
||||
- **`scripts/init_landing_seeds.py`** — creates minimal `.jsonl.gz` and `.json.gz` seed files in `1970/01/` so SQLMesh staging models can run before real extraction data arrives; idempotent
|
||||
|
||||
### Changed
|
||||
- All modified staging SQL models use **UNION ALL transition CTEs** — both JSONL (new) and blob (old) formats are readable simultaneously; old `.json.gz` files in the landing zone continue working until they rotate out naturally:
|
||||
- `stg_playtomic_venues`, `stg_playtomic_resources`, `stg_playtomic_opening_hours` — JSONL top-level columns (no `UNNEST(tenants)`)
|
||||
- `stg_playtomic_availability` — JSONL morning files + blob morning files + blob recheck files
|
||||
- `stg_population_geonames` — JSONL city rows (no `UNNEST(rows)`, no `maximum_object_size`)
|
||||
- `stg_tennis_courts` — JSONL elements with `COALESCE(lat, center.lat)` for way/relation centre coords; blob UNNEST kept for old files
|
||||
|
||||
### Removed
|
||||
- `_email_wrap()` and `_email_button()` helper functions removed from `worker.py` — replaced by templates
|
||||
|
||||
- **Marketplace admin dashboard** (`/admin/marketplace`) — single-screen health view for the two-sided market:
|
||||
- **Lead funnel** — total / verified-new (ready to unlock) / unlocked / won / conversion rate
|
||||
- **Credit economy** — total credits issued, consumed (lead unlocks), outstanding balance across all paid suppliers, 30-day burn rate
|
||||
|
||||
Reference in New Issue
Block a user