Commit Graph

4 Commits

Author SHA1 Message Date
Deeman
7737b79230 fix: DuckDB compat issues in Playtomic pipeline + export_serving
- Add maximum_object_size=128MB to read_json for 14K-venue tenants file
- Rewrite opening_hours to use UNION ALL unpivot (DuckDB struct dynamic access)
- Add seed file guard for availability model (empty result on first run)
- Fix snapshot_date VARCHAR→DATE comparison in venue_pricing_benchmarks
- Fix export_serving to resolve SQLMesh physical tables from view definitions
  (SQLMesh views reference "local" catalog unavailable outside its context)
- Add pyarrow dependency for Arrow-based cross-connection data transfer

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 01:27:51 +01:00
Deeman
53e9bbd66b feat: restructure extraction to one file per source
Split monolithic execute.py into per-source modules with separate CLI
entry points. Each extractor now uses the framework from utils.py:
- SQLite state tracking (start_run / end_run per extractor)
- Proper logging (replace print() with logger)
- Atomic gzip writes (write_gzip_atomic)
- Connection pooling (niquests.Session)
- Bounded pagination (MAX_PAGES_PER_BBOX = 500)

New entry points:
  extract              — run all 4 extractors sequentially
  extract-overpass     — OSM padel courts
  extract-eurostat     — city demographics (etag dedup)
  extract-playtomic-tenants      — venue listings
  extract-playtomic-availability — booking slots + pricing (NEW)

The availability extractor reads tenant IDs from the latest tenants.json.gz,
queries next-day slots for each venue, and stores daily consolidated snapshots.
Supports resumability via cursor and retry with backoff.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 18:56:41 +01:00
Deeman
044dfd836b fix(deps): add duckdb to padelnomics production dependencies
analytics.py imports duckdb at the top level. The Dockerfile runs
`uv sync --package padelnomics` which only installs padelnomics deps —
duckdb was missing, so hypercorn failed to import padelnomics.app
entirely and never bound to port 5000. The health check timed out and
the container was marked unhealthy. Tests passed because uv sync in CI
syncs all workspace members (including transform/ which has duckdb).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 13:43:34 +01:00
Deeman
4ae00b35d1 refactor: flatten padelnomics/padelnomics/ → repo root
git mv all tracked files from the nested padelnomics/ workspace
directory to the git repo root. Merged .gitignore files.
No code changes — pure path rename.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 00:44:40 +01:00