feat: standardise recheck availability to JSONL output

- extract_recheck() now writes availability_{date}_recheck_{HH}.jsonl.gz
  (one venue per line with date/captured_at_utc/recheck_hour injected);
  uses compress_jsonl_atomic; removes write_gzip_atomic import
- stg_playtomic_availability: add recheck_jsonl CTE (newline_delimited
  read_json on *.jsonl.gz recheck files); include in all_venues UNION ALL;
  old recheck_blob CTE kept for transition
- init_landing_seeds.py: add JSONL recheck seed alongside blob seed
- Docs: README landing structure + data sources table updated; CHANGELOG
  availability bullets updated; data-sources-inventory paths corrected
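
The first bullet can be sketched roughly as below. This is only an illustration of the "one venue per line, metadata injected, atomic gzip write" idea: `blob_to_jsonl_lines` and `write_jsonl_gz_atomic` are hypothetical names, the latter stands in for the repo's `compress_jsonl_atomic` (whose signature is not shown in this commit), and the venue keys and the position of the injected fields are assumptions.

```python
import gzip
import json
import os
import tempfile
from pathlib import Path


def blob_to_jsonl_lines(blob: dict) -> list[str]:
    """Flatten a recheck blob into one JSON line per venue.

    The blob-level date, captured_at_utc and recheck_hour are copied
    into every venue record so each line is self-describing.
    """
    injected = {
        "date": blob["date"],
        "captured_at_utc": blob["captured_at_utc"],
        "recheck_hour": blob["recheck_hour"],
    }
    return [json.dumps({**venue, **injected}) for venue in blob.get("venues", [])]


def write_jsonl_gz_atomic(lines: list[str], dest: Path) -> None:
    """Hypothetical stand-in: write gzip JSONL to a temp file, then rename."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Temp file lives in the destination directory so the rename stays
    # on one filesystem and readers never see a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as raw, gzip.GzipFile(fileobj=raw, mode="wb") as gz:
            for line in lines:
                gz.write(line.encode() + b"\n")
        os.replace(tmp_name, dest)
    except BaseException:
        os.unlink(tmp_name)
        raise
```

A caller would build the destination as `availability_{date}_recheck_{HH}.jsonl.gz` under the landing directory and pass the flattened lines to the writer.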

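On the read side, the staging model filters out the seed's null venue with WHERE tenant_id IS NOT NULL. A Python analogue of that filter, offered only as a sketch: `read_jsonl_gz` and `drop_null_venues` are hypothetical helpers, and gzip-compressing the seed line here is an assumption, since the seed script's write step is not shown in this commit.

```python
import gzip
import io
import json

# Seed line copied from the init_landing_seeds.py change in this commit.
SEED_LINE = (
    b'{"tenant_id":null,"date":"1970-01-01",'
    b'"captured_at_utc":"1970-01-01T00:00:00Z","recheck_hour":0,"slots":null}\n'
)


def read_jsonl_gz(payload: bytes) -> list[dict]:
    """Parse gzip-compressed newline-delimited JSON into a list of dicts."""
    with gzip.open(io.BytesIO(payload), "rt", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def drop_null_venues(rows: list[dict]) -> list[dict]:
    """Keep only rows with a tenant_id, mirroring the SQL filter."""
    return [row for row in rows if row.get("tenant_id") is not None]


rows = read_jsonl_gz(gzip.compress(SEED_LINE))
venues = drop_null_venues(rows)  # the seed parses as one row and is then dropped
```

The seed therefore gives the glob a file to match while contributing no rows downstream.
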
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Deeman
2026-02-25 14:52:47 +01:00
parent 683ca3fc24
commit b33dd51d76
6 changed files with 63 additions and 31 deletions

@@ -51,7 +51,11 @@ def main() -> None:
 json.dumps({"date": "1970-01-01", "captured_at_utc": "1970-01-01T00:00:00Z",
 "venue_count": 0, "venues": []}).encode(),
-# --- Playtomic recheck (blob only, small format) ---
+# --- Playtomic recheck ---
+# JSONL: one null venue (filtered by WHERE tenant_id IS NOT NULL)
+"playtomic/1970/01/availability_1970-01-01_recheck_00.jsonl.gz":
+b'{"tenant_id":null,"date":"1970-01-01","captured_at_utc":"1970-01-01T00:00:00Z","recheck_hour":0,"slots":null}\n',
+# Blob: empty venues array (old format, kept for transition)
 "playtomic/1970/01/availability_1970-01-01_recheck_00.json.gz":
 json.dumps({"date": "1970-01-01", "captured_at_utc": "1970-01-01T00:00:00Z",
 "recheck_hour": 0, "venues": []}).encode(),