feat: migrate transform to 3-layer architecture with per-layer schemas

Remove raw/ layer; staging models now read landing JSON directly.
Rename all model schemas from padelnomics.* to staging.*/foundation.*/serving.*.
Web app queries updated to serving.planner_defaults via SERVING_DUCKDB_PATH.
Supervisor gets daily sleep interval between pipeline runs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -17,7 +17,7 @@ External APIs → extract → landing zone → SQLMesh transform → DuckDB →
 - `web/` — Quart + HTMX web application (auth, billing, dashboard)
 
 - `extract/padelnomics_extract/` — data extraction to local landing zone
-- `transform/sqlmesh_padelnomics/` — 4-layer SQL transformation (raw → staging → foundation → serving)
+- `transform/sqlmesh_padelnomics/` — 3-layer SQL transformation (staging → foundation → serving)
 - `src/padelnomics/` — CLI utilities, export_serving helper
 
 
@@ -27,10 +27,10 @@ External APIs → extract → landing zone → SQLMesh transform → DuckDB →
 
 Use the **`data-engineer`** skill for:
 - Designing or reviewing SQLMesh model logic
-- Adding a new data source (extract + raw + staging models)
+- Adding a new data source (extract + staging model)
 - Performance tuning DuckDB queries
 - Data modeling decisions (dimensions, facts, aggregates)
-- Understanding the 4-layer architecture
+- Understanding the 3-layer architecture
 
 ```
 /data-engineer (or ask Claude to invoke it)
@@ -79,16 +79,18 @@ DUCKDB_PATH=local.duckdb SERVING_DUCKDB_PATH=analytics.duckdb \
 | Topic | File |
 |-------|------|
 | Extraction patterns, state tracking, adding new sources | `extract/padelnomics_extract/README.md` |
-| 4-layer SQLMesh architecture, materialization strategy | `transform/sqlmesh_padelnomics/README.md` |
+| 3-layer SQLMesh architecture, materialization strategy | `transform/sqlmesh_padelnomics/README.md` |
 | Two-file DuckDB architecture (SQLMesh lock isolation) | `src/padelnomics/export_serving.py` docstring |
 
 ## Pipeline data flow
 
 ```
 data/landing/
-└── padelnomics/{year}/{etag}.csv.gz ← extraction output
+├── overpass/{year}/{month}/courts.json.gz
+├── eurostat/{year}/{month}/urb_cpop1.json.gz
+└── playtomic/{year}/{month}/tenants.json.gz
 
-local.duckdb ← SQLMesh exclusive (raw → staging → foundation → serving)
+data/lakehouse.duckdb ← SQLMesh exclusive (staging → foundation → serving)
 
 analytics.duckdb ← serving tables only, web app read-only
 └── serving.* ← atomically replaced by export_serving.py
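
The last hunk notes that `analytics.duckdb` is "atomically replaced by export_serving.py". As a rough illustration of how such a swap is commonly done, the sketch below writes the new content to a temporary file in the same directory and then renames it over the target with `os.replace`. The helper name `atomic_replace_file` and the byte payloads are hypothetical; this is not taken from `export_serving.py`, whose actual implementation may differ.

```python
import os
import tempfile
from pathlib import Path


def atomic_replace_file(target: Path, payload: bytes) -> None:
    """Write payload to a temp file beside target, then swap it into place.

    Hypothetical helper for illustration only; not the project's real code.
    """
    target.parent.mkdir(parents=True, exist_ok=True)
    # The temp file is created in the target's directory so that the final
    # rename stays on one filesystem (a cross-device rename is not atomic).
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())  # flush file contents to disk before the swap
        # On POSIX, a reader opening the path gets either the old file or the
        # new one, never a half-written file.
        os.replace(tmp_name, target)
    except BaseException:
        # Remove the temp file if anything failed before the swap.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        serving = Path(workdir) / "analytics.duckdb"
        atomic_replace_file(serving, b"old serving tables")
        atomic_replace_file(serving, b"new serving tables")
        print(serving.read_bytes())
```

Under these assumptions, the web app can keep reading the file named by `SERVING_DUCKDB_PATH` while the pipeline builds the next version elsewhere; only the final rename touches the path the reader uses.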