feat: migrate transform to 3-layer architecture with per-layer schemas

Remove raw/ layer — staging models now read landing JSON directly.
Rename all model schemas from padelnomics.* to staging.*/foundation.*/serving.*.
Web app queries updated to serving.planner_defaults via SERVING_DUCKDB_PATH.
Supervisor gets daily sleep interval between pipeline runs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
Deeman
2026-02-22 19:04:40 +01:00
parent 53e9bbd66b
commit 2db66efe77
19 changed files with 306 additions and 301 deletions

View File

@@ -6,6 +6,32 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
## [Unreleased]
### Changed
- **Extraction: one file per source** — replaced monolithic `execute.py` with per-source
modules (`overpass.py`, `eurostat.py`, `playtomic_tenants.py`, `playtomic_availability.py`);
each module has its own CLI entry point (`extract-overpass`, `extract-eurostat`, etc.);
shared boilerplate extracted to `_shared.py` with `run_extractor()` wrapper that handles
SQLite state tracking, logging, and session management
- **Transform: 4-layer → 3-layer** — removed `raw/` layer; staging models now read landing
zone JSON files directly via `read_json()` with `@LANDING_DIR` variable; model schemas
renamed from `padelnomics.*` to per-layer namespaces (`staging.*`, `foundation.*`, `serving.*`)
- **Two-DuckDB architecture** — web app now reads from `SERVING_DUCKDB_PATH` (analytics.duckdb)
instead of `DUCKDB_PATH` (lakehouse.duckdb); `export_serving.py` atomically swaps serving
tables after each transform run
- Supervisor: added daily sleep interval between pipeline runs
### Added
- **Playtomic availability extractor** (`playtomic_availability.py`) — daily next-day booking
slot snapshots for occupancy rate estimation and pricing benchmarking; reads tenant IDs from
latest `tenants.json.gz`, queries `/v1/availability` per venue with 2s throttle, resumable
via cursor, bounded at 10K venues per run
- Template sync: copier update v0.9.0 → v0.10.0 — `export_serving.py` module,
`@padelnomics_glob()` macro, `setup_server.sh`, supervisor export_serving step
### Removed
- `extract/.../execute.py` — replaced by per-source modules
- `models/raw/` directory — raw layer eliminated; staging reads landing files directly
### Added
- Template sync: copier update from `29ac25b``v0.9.0` (29 template commits)
- `.claude/CLAUDE.md`: project-specific Claude Code instructions (skills, commands, architecture)