feat(extract): add OpenWeatherMap daily weather extractor
Adds extract/openweathermap package with daily weather extraction for 8
coffee-growing regions (Brazil, Vietnam, Colombia, Ethiopia, Honduras,
Guatemala, Indonesia). Feeds crop stress signal for commodity sentiment score.
Extractor:
- OWM One Call API 3.0 / Day Summary — one JSON.gz per (location, date)
- extract_weather: daily, fetches yesterday + today (16 calls max)
- extract_weather_backfill: fills 2020-01-01 to yesterday, capped at 500
  calls/run with resume cursor '{location_id}:{date}' for crash safety
- Full idempotency via file existence check; state tracking via extract_core
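A minimal sketch of how a capped, resumable backfill like the one described above could be planned. The function names, the location-then-date ordering, and the reading of the cursor as "last completed key" are assumptions for illustration, not the package's actual code.

```python
from datetime import date, timedelta

MAX_CALLS_PER_RUN = 500  # cap stated in the commit message


def format_cursor(location_id, day):
    """Build a '{location_id}:{date}' cursor (ISO date format is assumed)."""
    return f"{location_id}:{day.isoformat()}"


def parse_cursor(cursor):
    """Split a cursor back into (location_id, date)."""
    location_id, iso_day = cursor.rsplit(":", 1)
    return location_id, date.fromisoformat(iso_day)


def iter_backfill_keys(location_ids, start, end):
    """Yield (location_id, day) in a stable order: location first, then date."""
    for location_id in location_ids:
        day = start
        while day <= end:
            yield location_id, day
            day += timedelta(days=1)


def plan_backfill(location_ids, start, end, cursor=None, max_calls=MAX_CALLS_PER_RUN):
    """Return the next batch of keys to fetch, resuming after `cursor`.

    Assumes the cursor names the last key that completed; a cursor that
    matches no key yields an empty batch.
    """
    last_done = parse_cursor(cursor) if cursor else None
    resumed = last_done is None
    batch = []
    for key in iter_backfill_keys(location_ids, start, end):
        if not resumed:
            # Skip keys up to and including the one named by the cursor.
            resumed = key == last_done
            continue
        batch.append(key)
        if len(batch) == max_calls:
            break
    return batch
```

Persisting `format_cursor(*batch[-1])` after each successful call would let a crashed run pick up where it stopped, on top of the file existence check.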
SQLMesh:
- seeds.weather_locations (8 regions with lat/lon/variety)
- foundation.fct_weather_daily: INCREMENTAL_BY_TIME_RANGE, grain
  (location_id, observation_date), dedup via hash key, crop stress flags:
  is_frost (<2°C), is_heat_stress (>35°C), is_drought (<1 mm), in_growing_season
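The threshold flags above can be illustrated in Python, although the real logic lives in the SQLMesh model. This sketch assumes frost is judged on the daily minimum temperature and heat stress on the daily maximum; the field names are hypothetical, and `in_growing_season` is left out because its definition is not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DailyWeather:
    # Hypothetical field names; the model's actual columns may differ.
    temp_min_c: float
    temp_max_c: float
    precipitation_mm: float


def stress_flags(obs):
    """Apply the thresholds from the commit message to one daily observation."""
    return {
        "is_frost": obs.temp_min_c < 2.0,          # below 2 °C
        "is_heat_stress": obs.temp_max_c > 35.0,   # above 35 °C
        "is_drought": obs.precipitation_mm < 1.0,  # under 1 mm of rain
    }
```

All three comparisons are strict, so a day that bottoms out at exactly 2.0 °C is not flagged as frost.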
Landing path: LANDING_DIR/weather/{location_id}/{year}/{date}.json.gz
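A rough sketch of the landing path and the file existence check working together. It assumes `{date}` is an ISO `YYYY-MM-DD` string; `landing_path`, `write_if_missing`, and the temp-file rename are illustrative choices rather than the extractor's confirmed implementation.

```python
import gzip
import json
from datetime import date
from pathlib import Path


def landing_path(landing_dir, location_id, day):
    """Build LANDING_DIR/weather/{location_id}/{year}/{date}.json.gz."""
    return (
        Path(landing_dir)
        / "weather"
        / location_id
        / str(day.year)
        / f"{day.isoformat()}.json.gz"
    )


def write_if_missing(path, fetch):
    """Fetch and write one payload unless the file already exists.

    `fetch` is any zero-argument callable returning JSON-serializable data
    (a stand-in for the API call). Returns True when a file was written.
    """
    if path.exists():
        return False  # already landed: skip the API call entirely
    payload = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file first so a crash never leaves a partial .json.gz.
    tmp = path.with_name(path.name + ".tmp")
    with gzip.open(tmp, "wt", encoding="utf-8") as fh:
        json.dump(payload, fh)
    tmp.replace(path)
    return True
```

Because the check happens before `fetch` is called, re-running the extractor for a landed (location, date) pair costs no API calls.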
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
16
CLAUDE.md
@@ -44,23 +44,24 @@ uv run materia secrets get
 
 **Workspace packages** (`pyproject.toml` → `tool.uv.workspace`):
 - `extract/psdonline/` — Downloads USDA PSD Online data, normalizes ZIP→gzip CSV, writes to local landing directory
-- `transform/sqlmesh_materia/` — 4-layer SQL transformation pipeline (local DuckDB)
+- `extract/openweathermap/` — Daily weather for 8 coffee-growing regions (OWM One Call API 3.0)
+- `transform/sqlmesh_materia/` — 3-layer SQL transformation pipeline (local DuckDB)
 - `src/materia/` — CLI (Typer) for pipeline execution, worker management, secrets
 - `web/` — Future web frontend
 
 **Data flow:**
 ```
 USDA API → extract → /data/materia/landing/psd/{year}/{month}/{etag}.csv.gzip
+OWM API → extract → /data/materia/landing/weather/{location_id}/{year}/{date}.json.gz
 → rclone cron syncs landing/ to R2
-→ SQLMesh raw → staging → cleaned → serving → /data/materia/lakehouse.duckdb
+→ SQLMesh staging → foundation → serving → /data/materia/lakehouse.duckdb
 → Web app reads lakehouse.duckdb (read-only)
 ```
 
-**SQLMesh 4-layer model structure** (`transform/sqlmesh_materia/models/`):
-1. `raw/` — Immutable source reads (read_csv from landing directory)
-2. `staging/` — Type casting, lookup joins, basic cleansing
-3. `cleaned/` — Business logic, pivoting, integration
-4. `serving/` — Analytics-ready facts, dimensions, aggregates
+**SQLMesh 3-layer model structure** (`transform/sqlmesh_materia/models/`):
+1. `staging/` — Type casting, lookup joins, basic cleansing (reads landing directly)
+2. `foundation/` — Business logic, pivoting, dimensions, facts (also reads landing directly)
+3. `serving/` — Analytics-ready aggregates for the web app
 
 **CLI modules** (`src/materia/`):
 - `cli.py` — Typer app with subcommands: worker, pipeline, secrets, version
@@ -100,3 +101,4 @@ Read `coding_philosophy.md` for the full guide. Key points:
 |----------|---------|-------------|
 | `LANDING_DIR` | `data/landing` | Root directory for extracted landing data |
 | `DUCKDB_PATH` | `local.duckdb` | Path to the DuckDB lakehouse database |
+| `OPENWEATHERMAP_API_KEY` | — | OWM One Call API 3.0 key (required for weather extraction) |