feat(extract): add OpenWeatherMap daily weather extractor

Adds the extract/openweathermap package, which extracts daily weather for
8 coffee-growing regions across 7 countries (Brazil, Vietnam, Colombia,
Ethiopia, Honduras, Guatemala, Indonesia). Feeds the crop stress signal for
the commodity sentiment score.

Extractor:
- OWM One Call API 3.0 / Day Summary — one JSON.gz per (location, date)
- extract_weather: daily run, fetches yesterday + today
  (8 locations x 2 dates = 16 calls max)
- extract_weather_backfill: fills 2020-01-01 to yesterday, capped at 500
  calls/run with resume cursor '{location_id}:{date}' for crash safety
- Full idempotency via file existence check; state tracking via extract_core
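
The backfill behaviour described above can be sketched roughly as follows.
This is a minimal illustration, not the code in this commit: the names
backfill, iter_pending and fetch are hypothetical, the cursor is simply
returned to the caller (how extract_core persists state is not shown here),
and fetch stands in for whatever performs the actual OWM request.

```python
import gzip
from datetime import date, timedelta
from pathlib import Path


def iter_pending(location_ids, start, end, cursor=None):
    """Yield (location_id, day) in a stable order, skipping everything
    up to and including the '{location_id}:{date}' cursor."""
    resume_loc, resume_day = None, None
    if cursor:
        resume_loc, day_str = cursor.rsplit(":", 1)
        resume_day = date.fromisoformat(day_str)
    for loc in sorted(location_ids):
        if resume_loc is not None and loc < resume_loc:
            continue  # this location was finished in an earlier run
        day = start
        while day <= end:
            if not (loc == resume_loc and day <= resume_day):
                yield loc, day
            day += timedelta(days=1)


def backfill(landing_dir, location_ids, start, end, fetch,
             cursor=None, max_calls=500):
    """Fetch missing (location, date) files, at most max_calls per run.

    fetch(location_id, day) is a hypothetical callable returning raw JSON
    bytes. Returns (new_cursor, calls_made).
    """
    calls = 0
    for loc, day in iter_pending(location_ids, start, end, cursor):
        target = (Path(landing_dir) / "weather" / loc / str(day.year)
                  / f"{day.isoformat()}.json.gz")
        if target.exists():
            # Idempotency: a file already on disk is never fetched again.
            cursor = f"{loc}:{day.isoformat()}"
            continue
        if calls >= max_calls:
            break  # cap reached; the next run resumes from the cursor
        payload = fetch(loc, day)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(gzip.compress(payload))
        calls += 1
        cursor = f"{loc}:{day.isoformat()}"
    return cursor, calls
```

Advancing the cursor only after a file is written (or found on disk) means a
crash mid-run loses at most the in-flight call; the file existence check
covers the case where the cursor itself is lost.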

SQLMesh:
- seeds.weather_locations (8 regions with lat/lon/variety)
- foundation.fct_weather_daily: INCREMENTAL_BY_TIME_RANGE, grain
  (location_id, observation_date), dedup via hash key, crop stress flags:
  is_frost (<2°C), is_heat_stress (>35°C), is_drought (<1mm), in_growing_season
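
For reference, the crop stress flags can be expressed in Python roughly as
below. This is a sketch under assumptions, not the model's SQL: it assumes
frost is tested against the daily minimum temperature, heat stress against
the daily maximum, and drought against total daily precipitation, and it
takes the growing season as a pair of month numbers. The function and
parameter names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

FROST_BELOW_C = 2.0      # is_frost: temperature < 2 °C
HEAT_ABOVE_C = 35.0      # is_heat_stress: temperature > 35 °C
DROUGHT_BELOW_MM = 1.0   # is_drought: precipitation < 1 mm


@dataclass(frozen=True)
class StressFlags:
    is_frost: bool
    is_heat_stress: bool
    is_drought: bool
    in_growing_season: bool


def in_season(month, start_month, end_month):
    """True if month falls in [start_month, end_month], allowing the
    season to wrap around the year end (e.g. October to March)."""
    if start_month <= end_month:
        return start_month <= month <= end_month
    return month >= start_month or month <= end_month


def stress_flags(observation_date, temp_min_c, temp_max_c, precip_mm,
                 season_start_month, season_end_month):
    """Compute the four flags for one (location, observation_date) row."""
    return StressFlags(
        is_frost=temp_min_c < FROST_BELOW_C,
        is_heat_stress=temp_max_c > HEAT_ABOVE_C,
        is_drought=precip_mm < DROUGHT_BELOW_MM,
        in_growing_season=in_season(
            observation_date.month, season_start_month, season_end_month
        ),
    )
```

All three thresholds are strict comparisons, so a day at exactly 2 °C,
35 °C or 1 mm raises no flag.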

Landing path: LANDING_DIR/weather/{location_id}/{year}/{date}.json.gz
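
A small sketch of building and parsing this landing path, assuming
{year} is the four-digit year of {date} and {date} is ISO formatted
(YYYY-MM-DD). The helper names and the example location_id are
hypothetical.

```python
from datetime import date
from pathlib import Path


def landing_path(landing_dir, location_id, day):
    """Build LANDING_DIR/weather/{location_id}/{year}/{date}.json.gz."""
    return (Path(landing_dir) / "weather" / location_id
            / str(day.year) / f"{day.isoformat()}.json.gz")


def parse_landing_path(path):
    """Recover (location_id, date) from a landing file path."""
    p = Path(path)
    # The file name is '{date}.json.gz'; an ISO date contains no dots.
    day = date.fromisoformat(p.name.split(".", 1)[0])
    # Layout is .../{location_id}/{year}/{file}, so go two levels up.
    location_id = p.parent.parent.name
    return location_id, day


# Example with a made-up location_id:
example = landing_path("/data/landing", "brazil_minas_gerais", date(2024, 3, 5))
```

Because the path is fully determined by (location_id, date), checking that
the file exists is enough to decide whether a call can be skipped.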

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Deeman
2026-02-25 22:40:27 +01:00
parent c3c8333407
commit 08e74665bb
31 changed files with 1377 additions and 915 deletions


@@ -1,60 +1,51 @@
-- Serving mart: ICE certified Coffee C stock aging report, analytics-ready.
--
-- Shows the age distribution of certified stocks across delivery ports.
-- Age buckets represent how long coffee has been in certified storage.
-- Older stock approaching certificate limits is a supply quality signal.
--
-- Source: ICE Certified Stock Aging Report (monthly)
-- Grain: one row per (report_date, age_bucket).
MODEL (
name serving.ice_aging_stocks,
kind INCREMENTAL_BY_TIME_RANGE (
time_column report_date
),
grain (report_date, age_bucket),
start '2020-01-01',
cron '@daily'
);
WITH base AS (
SELECT
f.report_date,
f.age_bucket,
-- Parse age range from "0000 to 0120" format for correct sort order
TRY_CAST(split_part(f.age_bucket, ' to ', 1) AS int) AS age_bucket_start_days,
TRY_CAST(split_part(f.age_bucket, ' to ', 2) AS int) AS age_bucket_end_days,
f.antwerp_bags,
f.hamburg_bremen_bags,
f.houston_bags,
f.miami_bags,
f.new_orleans_bags,
f.new_york_bags,
f.total_bags,
f.source_file
FROM foundation.fct_ice_aging_stocks f
WHERE f.report_date BETWEEN @start_ds AND @end_ds
)
SELECT
b.report_date,
d.commodity_name,
d.ice_stock_report_code,
b.age_bucket,
b.age_bucket_start_days,
b.age_bucket_end_days,
b.antwerp_bags,
b.hamburg_bremen_bags,
b.houston_bags,
b.miami_bags,
b.new_orleans_bags,
b.new_york_bags,
b.total_bags,
b.source_file
FROM base b
CROSS JOIN foundation.dim_commodity d
WHERE d.ice_stock_report_code = 'COFFEE-C'
ORDER BY b.report_date, b.age_bucket_start_days
/* Serving mart: ICE certified Coffee C stock aging report, analytics-ready. */ /* Shows the age distribution of certified stocks across delivery ports. */ /* Age buckets represent how long coffee has been in certified storage. */ /* Older stock approaching certificate limits is a supply quality signal. */ /* Source: ICE Certified Stock Aging Report (monthly) */ /* Grain: one row per (report_date, age_bucket). */
MODEL (
name serving.ice_aging_stocks,
kind INCREMENTAL_BY_TIME_RANGE (
time_column report_date
),
grain (report_date, age_bucket),
start '2020-01-01',
cron '@daily'
);
WITH base AS (
SELECT
f.report_date,
f.age_bucket,
TRY_CAST(SPLIT_PART(f.age_bucket, ' to ', 1) AS INT) AS age_bucket_start_days, /* Parse age range from "0000 to 0120" format for correct sort order */
TRY_CAST(SPLIT_PART(f.age_bucket, ' to ', 2) AS INT) AS age_bucket_end_days,
f.antwerp_bags,
f.hamburg_bremen_bags,
f.houston_bags,
f.miami_bags,
f.new_orleans_bags,
f.new_york_bags,
f.total_bags,
f.source_file
FROM foundation.fct_ice_aging_stocks AS f
WHERE
f.report_date BETWEEN @start_ds AND @end_ds
)
SELECT
b.report_date,
d.commodity_name,
d.ice_stock_report_code,
b.age_bucket,
b.age_bucket_start_days,
b.age_bucket_end_days,
b.antwerp_bags,
b.hamburg_bremen_bags,
b.houston_bags,
b.miami_bags,
b.new_orleans_bags,
b.new_york_bags,
b.total_bags,
b.source_file
FROM base AS b
CROSS JOIN foundation.dim_commodity AS d
WHERE
d.ice_stock_report_code = 'COFFEE-C'
ORDER BY
b.report_date,
b.age_bucket_start_days