feat(extract): add OpenWeatherMap daily weather extractor
Adds extract/openweathermap package with daily weather extraction for 8
coffee-growing regions in producing countries including Brazil, Vietnam,
Colombia, Ethiopia, Honduras, Guatemala, and Indonesia. Feeds the crop-stress
signal for the commodity sentiment score.
Extractor:
- Source: OWM One Call API 3.0 (Day Summary); one JSON.gz per (location, date)
- extract_weather: daily, fetches yesterday + today (16 calls max)
- extract_weather_backfill: fills 2020-01-01 to yesterday, capped at 500
calls/run with resume cursor '{location_id}:{date}' for crash safety
- Full idempotency via file existence check; state tracking via extract_core
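Below is a minimal sketch of how the backfill bullets above could fit together. It walks (location, date) pairs from 2020-01-01 to yesterday, resumes just after a '{location_id}:{date}' cursor, skips days whose file already exists, and stops planning at the 500-call cap. Several parts are assumptions and do not come from the package: the location-major iteration order, the ISO date inside the cursor, and the hypothetical `plan_backfill` / `already_landed` names. Persisting the cursor (the commit says state goes through extract_core) is left to the caller.

```python
from datetime import date, timedelta
from typing import Callable, Iterator, List, Optional, Tuple

# Window start and per-run cap come from the commit message.
BACKFILL_START = date(2020, 1, 1)
MAX_CALLS_PER_RUN = 500

Task = Tuple[str, date]


def cursor_for(location_id: str, day: date) -> str:
    """Encode a task as '{location_id}:{date}' (ISO date format assumed)."""
    return f"{location_id}:{day.isoformat()}"


def iter_tasks(location_ids: List[str], yesterday: date) -> Iterator[Task]:
    """Yield every (location, date) pair, location-major (assumed order)."""
    for location_id in location_ids:
        day = BACKFILL_START
        while day <= yesterday:
            yield location_id, day
            day += timedelta(days=1)


def plan_backfill(
    location_ids: List[str],
    yesterday: date,
    cursor: Optional[str],
    already_landed: Callable[[str, date], bool],
    max_calls: int = MAX_CALLS_PER_RUN,
) -> List[Task]:
    """Return the next batch of (location, date) pairs that need an API call."""
    tasks = list(iter_tasks(location_ids, yesterday))
    start = 0
    if cursor is not None:
        keys = [cursor_for(loc, day) for loc, day in tasks]
        if cursor in keys:
            # Resume just after the last task a previous run completed.
            start = keys.index(cursor) + 1
    planned: List[Task] = []
    for location_id, day in tasks[start:]:
        if already_landed(location_id, day):
            continue  # file already on disk: no API call, no budget spent
        if len(planned) == max_calls:
            break  # per-run cap reached; a later run resumes from the cursor
        planned.append((location_id, day))
    return planned
```

If the cursor is unknown (for example, a location was renamed), this sketch falls back to the start of the window. The file-existence check keeps that fallback cheap, because already-landed days cost no calls.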
SQLMesh:
- seeds.weather_locations (8 regions with lat/lon/variety)
- foundation.fct_weather_daily: INCREMENTAL_BY_TIME_RANGE, grain
(location_id, observation_date), dedup via hash key, crop stress flags:
is_frost (<2°C), is_heat_stress (>35°C), is_drought (<1mm), in_growing_season
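The fct_weather_daily logic is sketched below in Python for illustration; the real model is SQL. It assumes each threshold applies to a specific reading: daily minimum temperature for frost, daily maximum for heat stress, and daily precipitation for drought. The `DailyWeather` field names, the SHA-256 grain hash, the "later row wins" dedup rule, and a per-location set of growing-season months are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Set

# Thresholds from the commit message; which reading each applies to is assumed.
FROST_BELOW_C = 2.0
HEAT_STRESS_ABOVE_C = 35.0
DROUGHT_BELOW_MM = 1.0


@dataclass(frozen=True)
class DailyWeather:
    # Hypothetical field names, not the actual fct_weather_daily columns.
    location_id: str
    observation_date: date
    temp_min_c: float
    temp_max_c: float
    precipitation_mm: float


def grain_hash(row: DailyWeather) -> str:
    """Hash the (location_id, observation_date) grain into a single key."""
    raw = f"{row.location_id}|{row.observation_date.isoformat()}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def dedup(rows: Iterable[DailyWeather]) -> List[DailyWeather]:
    """Keep one row per grain; a later row replaces an earlier one."""
    latest: Dict[str, DailyWeather] = {}
    for row in rows:
        latest[grain_hash(row)] = row
    return list(latest.values())


def crop_stress_flags(row: DailyWeather, growing_months: Set[int]) -> Dict[str, bool]:
    """Derive the four boolean flags for one location-day."""
    return {
        "is_frost": row.temp_min_c < FROST_BELOW_C,
        "is_heat_stress": row.temp_max_c > HEAT_STRESS_ABOVE_C,
        "is_drought": row.precipitation_mm < DROUGHT_BELOW_MM,
        "in_growing_season": row.observation_date.month in growing_months,
    }
```

All comparisons are strict, which matches the `<` and `>` in the commit message. A day at exactly 2°C, 35°C, or 1 mm raises no flag.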
Landing path: LANDING_DIR/weather/{location_id}/{year}/{date}.json.gz
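The landing layout and the file-existence idempotency check could look roughly like this. The sketch assumes {date} is rendered as an ISO YYYY-MM-DD string and takes a hypothetical `fetch` callable in place of the real OWM client. Writing to a temporary file and then renaming it is a design choice added here, not something the commit states. The point is that a crash cannot leave a partial file that the existence check would later treat as complete.

```python
import gzip
import json
from datetime import date
from pathlib import Path
from typing import Any, Callable, Dict


def landing_path(landing_dir: Path, location_id: str, day: date) -> Path:
    """LANDING_DIR/weather/{location_id}/{year}/{date}.json.gz (ISO date assumed)."""
    return landing_dir / "weather" / location_id / str(day.year) / f"{day.isoformat()}.json.gz"


def land_once(
    landing_dir: Path,
    location_id: str,
    day: date,
    fetch: Callable[[str, date], Dict[str, Any]],
) -> bool:
    """Fetch and write one day's payload unless its file already exists.

    Returns True if a file was written, False if the day was skipped.
    """
    target = landing_path(landing_dir, location_id, day)
    if target.exists():
        return False  # idempotent: a rerun never re-fetches a landed day
    payload = fetch(location_id, day)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file, then rename it into place, so an interrupted
    # write never leaves a partial .json.gz at the final path.
    tmp = target.with_name(target.name + ".tmp")
    with gzip.open(tmp, "wt", encoding="utf-8") as fh:
        json.dump(payload, fh)
    tmp.replace(target)
    return True
```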
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@@ -1,23 +1,15 @@
--- Commodity dimension: conforms identifiers across source systems.
---
--- This is the ontology. Each row is a commodity tracked by BeanFlows.
--- As new sources are added (ICO, futures prices, satellite), their
--- commodity identifiers are added as columns here — not as separate tables.
--- As new commodities are added (cocoa, sugar), rows are added here.
---
--- References:
--- usda_commodity_code → staging.psdalldata__commodity.commodity_code (numeric string, e.g. '0711100')
--- cftc_commodity_code → foundation.fct_cot_positioning.cftc_commodity_code (3-char, e.g. '083')
---
--- NOTE: Defined as FULL model (not SEED) to guarantee leading-zero preservation.
--- Pandas CSV loading converts '083' → 83 even with varchar column declarations.
+/* Commodity dimension: conforms identifiers across source systems. */ /* This is the ontology. Each row is a commodity tracked by BeanFlows. */ /* As new sources are added (ICO, futures prices, satellite), their */ /* commodity identifiers are added as columns here — not as separate tables. */ /* As new commodities are added (cocoa, sugar), rows are added here. */ /* References: */ /* usda_commodity_code → staging.psdalldata__commodity.commodity_code (numeric string, e.g. '0711100') */ /* cftc_commodity_code → foundation.fct_cot_positioning.cftc_commodity_code (3-char, e.g. '083') */ /* NOTE: Defined as FULL model (not SEED) to guarantee leading-zero preservation. */ /* Pandas CSV loading converts '083' → 83 even with varchar column declarations. */
 
 MODEL (
 name foundation.dim_commodity,
 kind FULL
 );
 
-SELECT
-usda_commodity_code,
-cftc_commodity_code,
-ticker,
-ice_stock_report_code,
-commodity_name,
-commodity_group
+SELECT usda_commodity_code, cftc_commodity_code, ticker, ice_stock_report_code, commodity_name, commodity_group
 FROM (VALUES
-('0711100', '083', 'KC=F', 'COFFEE-C', 'Coffee, Green', 'Softs')
-) AS t(usda_commodity_code, cftc_commodity_code, ticker, ice_stock_report_code, commodity_name, commodity_group)
+('0711100', '083', 'KC=F', 'COFFEE-C', 'Coffee, Green', 'Softs')) AS t(usda_commodity_code, cftc_commodity_code, ticker, ice_stock_report_code, commodity_name, commodity_group)