Commit Graph

2 Commits

Author SHA1 Message Date
Deeman
b33dd51d76 feat: standardise recheck availability to JSONL output
- extract_recheck() now writes availability_{date}_recheck_{HH}.jsonl.gz
  (one venue per line with date/captured_at_utc/recheck_hour injected);
  uses compress_jsonl_atomic; removes write_gzip_atomic import
- stg_playtomic_availability: add recheck_jsonl CTE (newline_delimited
  read_json on *.jsonl.gz recheck files); include in all_venues UNION ALL;
  old recheck_blob CTE kept for transition
- init_landing_seeds.py: add JSONL recheck seed alongside blob seed
- Docs: README landing structure + data sources table updated; CHANGELOG
  availability bullets updated; data-sources-inventory paths corrected

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-25 14:52:47 +01:00
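
The recheck output described in b33dd51d76 can be sketched roughly as below. This is an illustration only, not the repository's code: write_recheck_jsonl is a hypothetical stand-in for extract_recheck() plus compress_jsonl_atomic (whose real signatures are not shown here), and the zero-padded two-digit hour and string date are assumptions about the {HH} and {date} placeholders.

```python
import gzip
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_recheck_jsonl(venues, dest_dir, date, recheck_hour, captured_at_utc=None):
    """Hypothetical helper: write one venue per line to
    availability_{date}_recheck_{HH}.jsonl.gz, replacing the target atomically."""
    captured_at_utc = captured_at_utc or datetime.now(timezone.utc).isoformat()
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: {HH} is the recheck hour, zero-padded to two digits.
    target = dest_dir / f"availability_{date}_recheck_{recheck_hour:02d}.jsonl.gz"

    # Temp file lives in the same directory so os.replace is a same-filesystem rename.
    fd, tmp_name = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    os.close(fd)
    try:
        with gzip.open(tmp_name, "wt", encoding="utf-8") as gz:
            for venue in venues:
                # Inject the three fields named in the commit message into every line.
                record = {
                    **venue,
                    "date": date,
                    "captured_at_utc": captured_at_utc,
                    "recheck_hour": recheck_hour,
                }
                gz.write(json.dumps(record) + "\n")
        os.replace(tmp_name, target)
    except Exception:
        # Remove the partial temp file so readers never see a half-written output.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target
```

Because the file only appears under its final name after os.replace, a glob such as *.jsonl.gz on the landing directory should never pick up a partially written recheck file.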
Deeman
ec7f115f16 feat: add init_landing_seeds.py for empty-landing bootstrap
Creates minimal .jsonl.gz and .json.gz seed files so all SQLMesh staging
models can compile and run before real extraction data arrives.

Each seed holds a single null record that the staging model's WHERE clause
filters out (tenant_id IS NOT NULL, geoname_id IS NOT NULL, type IS NOT NULL, etc).

Covers both formats (JSONL + blob) for the UNION ALL transition CTEs:
  playtomic/1970/01/: tenants.{jsonl,json}.gz, availability seeds (morning + recheck)
  geonames/1970/01/: cities_global.{jsonl,json}.gz
  overpass_tennis/1970/01/: courts.{jsonl,json}.gz
  overpass/1970/01/: courts.json.gz (padel, unchanged format)
  eurostat/1970/01/: urb_cpop1.json.gz, ilc_di03.json.gz
  eurostat_city_labels/1970/01/: cities_codelist.json.gz
  ons_uk/1970/01/: lad_population.json.gz
  census_usa/1970/01/: acs5_places.json.gz

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-25 12:24:48 +01:00
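
The seed approach in ec7f115f16 might look something like the sketch below. It is not the contents of init_landing_seeds.py: write_seed, read_jsonl and keep_row are hypothetical helpers, and the blob shape ({"tenants": [...]}) is an assumption made purely for illustration. The idea shown is that a seed with one null record gives the reader a file and a schema, while a filter equivalent to WHERE tenant_id IS NOT NULL leaves zero rows.

```python
import gzip
import json
from pathlib import Path


def write_seed(path, record, jsonl):
    """Hypothetical helper: write a single-record gzip seed file.
    jsonl=True writes one newline-terminated line; jsonl=False writes one JSON blob."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = json.dumps(record) + ("\n" if jsonl else "")
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        fh.write(payload)
    return path


def read_jsonl(path):
    """Read a .jsonl.gz file back into a list of dicts."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def keep_row(row):
    """Python stand-in for the staging filter WHERE tenant_id IS NOT NULL."""
    return row.get("tenant_id") is not None


def seed_playtomic_tenants(landing_root):
    """Write both seed formats under playtomic/1970/01/ (blob shape is assumed)."""
    base = Path(landing_root) / "playtomic" / "1970" / "01"
    jsonl_path = write_seed(base / "tenants.jsonl.gz", {"tenant_id": None}, jsonl=True)
    blob_path = write_seed(base / "tenants.json.gz", {"tenants": [{"tenant_id": None}]}, jsonl=False)
    return jsonl_path, blob_path
```

Calling seed_playtomic_tenants on an empty landing directory and filtering read_jsonl's result with keep_row yields an empty list, which mirrors how the staging models can run before real extraction data arrives.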