Commit Graph

216 Commits

Author SHA1 Message Date
Deeman
54dbb296dd fix(secrets): add secrets-updatekeys-prod target, use --input-type dotenv
sops updatekeys doesn't inherit --input-type from context, so calling it bare
on .env.prod.sops causes "Error unmarshalling input json" (guesses JSON from
the .sops extension). Explicit --input-type dotenv fixes it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 07:40:03 +01:00
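A minimal Python sketch of the same invocation, as a hypothetical wrapper might assemble it. The actual fix is a Makefile target, and the function name here is illustrative. The sketch only builds argv; running it needs sops on PATH and an age private key matching a recipient in `.sops.yaml`.

```python
import shlex


def sops_updatekeys_cmd(path: str, input_type: str = "dotenv") -> list:
    """Return argv that re-encrypts `path`'s data key for the current .sops.yaml recipients.

    --input-type is passed explicitly because, per the commit message, updatekeys
    otherwise guesses JSON for a `.sops` file and fails on dotenv content.
    """
    # Flags go before the positional file argument.
    return ["sops", "updatekeys", "--input-type", input_type, path]


if __name__ == "__main__":
    print(shlex.join(sops_updatekeys_cmd(".env.prod.sops")))
```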
Deeman
7d3263a39c chore: add server age key 2026-02-27 07:37:36 +01:00
Deeman
fd164ca66a chore: add server age key 2026-02-27 07:31:56 +01:00
Deeman
a3ce707a5b fix(infra): fix setup_server.sh summary — correct bootstrap command + sops format
- Detect server IP at runtime (hostname -I) and print real ssh command
- Replace misleading >- yaml block + '+' notation with correct comma-separated
  age key format: age: <dev-key>,<server-key>
- Label next steps as "(run from your workstation)"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 07:31:14 +01:00
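The comma-separated recipient format can be sketched as a small Python helper. The keys in the usage note are placeholders, not real age keys, and the `age1` prefix check is only a sanity test, not full Bech32 validation.

```python
def sops_age_recipients(*public_keys: str) -> str:
    """Join age public keys into the comma-separated value of a .sops.yaml `age:` field."""
    keys = [k.strip() for k in public_keys if k.strip()]
    for key in keys:
        # age X25519 recipients are Bech32 strings that start with "age1".
        if not key.startswith("age1"):
            raise ValueError(f"not an age public key: {key!r}")
    return ",".join(keys)
```

Usage: `print(f"age: {sops_age_recipients(dev_key, server_key)}")` yields a single line such as `age: age1...,age1...`, which replaces the misleading `>-` block and `+` notation.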
Deeman
d14b45f7d6 fix(infra): guard SSH config write, add ROTATE_KEYS for key rotation
setup_server.sh is now fully idempotent on re-runs:
- deploy key generation was already guarded; SSH config write was not
- SSH config now only written if it doesn't exist (content never changes)
- ROTATE_KEYS=1 deletes the old keypair before generation, prints new
  public key to add to GitLab

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 07:12:09 +01:00
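The idempotency rules above (write once, regenerate only on explicit rotation) can be sketched in Python. setup_server.sh itself is shell, so this is an illustration of the pattern, not the script. `generate` is a hypothetical callback; in practice it might wrap `ssh-keygen -t ed25519 -N "" -f <path>`.

```python
from pathlib import Path
from typing import Callable


def write_if_missing(path: Path, content: str) -> bool:
    """Write `content` only if `path` does not exist yet; return True when it was written."""
    if path.exists():
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content)
    return True


def ensure_keypair(private_key: Path, generate: Callable[[Path], None], rotate: bool = False) -> bool:
    """Create a keypair via `generate` unless one exists; rotate=True deletes the old pair first."""
    if rotate:
        private_key.unlink(missing_ok=True)
        private_key.with_name(private_key.name + ".pub").unlink(missing_ok=True)
    if private_key.exists():
        return False
    generate(private_key)
    return True
```

A caller would map the environment flag with something like `rotate=os.environ.get("ROTATE_KEYS") == "1"`, so a plain re-run never touches existing keys.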
Deeman
c778903264 merge(infra): consolidate tool installs in setup, strip bootstrap to essentials
Merges worktree-sops-supervisor-docs → master.

Summary of changes:
- setup_server.sh: now installs all tools (git, curl, age, sops, rclone, uv) —
  single source of truth for server provisioning
- bootstrap_supervisor.sh: stripped to ~45 lines — zero tool installs, only
  clone/fetch + decrypt + uv sync + systemd enable
- readme.md: updated descriptions to reflect new responsibilities

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 06:57:09 +01:00
Deeman
cf65fa16b6 refactor(infra): consolidate tool installs in setup, strip bootstrap to essentials
- setup_server.sh: add git/curl/ca-certificates apt install, add uv install
  as service user, fix SSH config write (write as root, then chown, instead of
  a sudo heredoc), remove log lines that set -e makes redundant
- bootstrap_supervisor.sh: remove all tool installs (apt, uv, sops, age) —
  setup_server.sh is now the single source of truth; strip to ~45 lines:
  age-key check, clone/fetch, tag checkout, decrypt, uv sync, systemd enable
- readme.md: update step 1 and step 3 descriptions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-26 22:25:31 +01:00
Deeman
0317cb885f feat(infra): use beanflows_service for supervisor
- materia-supervisor.service: User=root → User=beanflows_service,
  add PATH so uv (~/.local/bin) is found without a login shell
- setup_server.sh: full rewrite — creates beanflows_service (nologin),
  generates SSH deploy key + age keypair as service user at XDG path
  (~/.config/sops/age/keys.txt), installs age/sops/rclone as root,
  prints both public keys + numbered next-step instructions
- bootstrap_supervisor.sh: full rewrite — removes GITLAB_READ_TOKEN
  requirement, clones via SSH as service user, installs uv as service
  user, decrypts with SOPS auto-discovery, uv sync as service user,
  systemctl as root
- web/deploy.sh: remove self-contained sops/age install + keypair
  generation; replace with simple sops check (exit if missing) and
  SOPS auto-discovery decrypt (no explicit key file needed)
- infra/readme.md: update architecture diagram for beanflows_service
  paths, update setup steps to match new scripts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-26 21:33:31 +01:00
Deeman
b27f06d811 chore: remove stale ralph-loop session file
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 20:28:31 +01:00
Deeman
ba3994a63b chore: add secrets-encrypt-dev/prod targets to match template 2026-02-26 20:27:35 +01:00
Deeman
6d4921d1a6 chore: align Makefile with padelnomics (pinned tailwind version, help target, dev target, .PHONY) 2026-02-26 20:23:31 +01:00
Deeman
c469f585eb docs: add PROJECT.md with backlog (retry/backoff for ICE + yfinance) 2026-02-26 20:08:12 +01:00
Deeman
8f97c6b0c9 fix(positioning): prevent canvas collapse on type/range toggle
- Lock #positioning-canvas min-height to current offsetHeight before each
  HTMX swap, release it in htmx:afterSwap — prevents flash-to-zero during
  Chart.js initialization in the new content
- Add CSS min-height:200px fallback on all canvas containers so they never
  fully collapse even before JS runs
- Extract _swapCanvas() helper to deduplicate setRange/setType logic

Root cause of visual collapse: cot_positioning_combined table missing
(needs sqlmesh plan prod + export_serving to materialize).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 20:04:19 +01:00
Deeman
d3c9d95386 fix(analytics): wrap max_date in ANY_VALUE() in get_weather_stress_latest
When a SELECT contains aggregates but no GROUP BY, DuckDB requires every other
selected column to be an aggregate expression too. latest.max_date holds a
single value from a CTE, but the binder still rejects it unwrapped, so it is
wrapped in ANY_VALUE().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 19:50:27 +01:00
Deeman
6461c58957 fix(web): fix Chart.js sizing after HTMX swaps on all dashboard pages
Two-part fix for charts going tiny on range changes (especially 3m) and
staying broken after subsequent navigations:

1. dashboard_base.html: global htmx:beforeSwap handler destroys any Chart.js
   instances in the swap target before HTMX replaces the DOM. Without this,
   the old chart's ResizeObserver remains attached to the parent container and
   interferes with the newly created chart instance's dimension calculations.

2. All chart pages (positioning, supply, warehouse, weather): afterSwap handler
   now wraps chart resize in requestAnimationFrame, ensuring the browser has
   completed layout before Chart.js measures container dimensions. MA toggle
   state is also restored inside the rAF callback after resize.

Root cause: chart init scripts run synchronously during innerHTML swap, before
browser layout is complete. Fast server responses (e.g. 3m = small dataset)
gave even less time for layout, making the timing issue reproducible.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-26 15:16:13 +01:00
Deeman
411aea3954 merge: SOPS migration + Python supervisor + docs (3 repos) 2026-02-26 12:15:35 +01:00
Deeman
518b50d0f5 docs(claude+infra): expand CLAUDE.md + infra/readme.md for full architecture
CLAUDE.md additions:
- List all 6 extractor packages + extract_core
- Full data flow with all sources + dual-DuckDB
- Foundation-as-ontology: dim_commodity conforms cross-source identifiers
- Two-DuckDB architecture explanation (why not serving.duckdb)
- Extraction pattern: one-package-per-source, state SQLite, adding new source
- Supervisor: croniter scheduling, topological waves, tag-based deploy
- CI/CD: pull-based via git tags, no SSH
- Secrets management: SOPS+age section, file table, server key workflow
- uv workspace management section
- Remove Pulumi ESC references; update env vars table

infra/readme.md:
- Update architecture diagram (add analytics.duckdb, age-key.txt)
- Rewrite setup flow: setup_server.sh → add key to SOPS → bootstrap
- Secrets management section with file table
- Deploy model: pull-based (no SSH/CI credentials)
- Monitoring: add supervisor status + extraction state DB query

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 12:04:55 +01:00
Deeman
95f881827e feat(infra): replace Pulumi ESC with SOPS in bootstrap + setup scripts
- bootstrap_supervisor.sh: remove esc CLI + PULUMI_ACCESS_TOKEN; install
  sops+age; check age keypair exists; decrypt .env.prod.sops → .env;
  checkout latest release tag; use uv sync --all-packages
- setup_server.sh: add age keypair generation at /opt/materia/age-key.txt;
  install age binary; print public key with .sops.yaml instructions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 12:03:11 +01:00
Deeman
5d7d53a260 feat(supervisor): port Python supervisor from padelnomics + workflows.toml
Port padelnomics' schedule-aware Python supervisor to materia:
- src/materia/supervisor.py — croniter scheduling, topological wave
  execution (parallel independent workflows), tag-based git pull + deploy,
  status CLI subcommand
- infra/supervisor/workflows.toml — workflow registry (psd daily, cot
  weekly, prices daily, ice daily, weather daily)
- infra/supervisor/materia-supervisor.service — updated ExecStart to Python
  supervisor, added SUPERVISOR_GIT_PULL=1

Adaptations from padelnomics:
- Uses extract_core.state.open_state_db (not padelnomics_extract.utils)
- uv run sqlmesh -p transform/sqlmesh_materia run
- uv run materia pipeline run export_serving
- web/deploy.sh path (materia's deploy.sh is under web/)
- Removed proxy_mode (not used in materia)

Also: add croniter dependency to src/materia, delete old supervisor.sh.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 11:59:55 +01:00
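Topological wave execution can be sketched with the standard library's `graphlib`. This is not the supervisor's code: the dependency map below is hypothetical, and how workflows.toml actually declares dependencies is an assumption. Each wave contains only workflows whose dependencies finished in earlier waves, so a wave's members can run in parallel.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def topological_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group workflows into waves; `deps` maps each workflow to the workflows it depends on."""
    ts = TopologicalSorter(deps)
    ts.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        waves.append(ready)
        ts.done(*ready)  # mark the whole wave finished before asking for the next one
    return waves
```

With hypothetical inputs `{"psd": set(), "cot": set(), "prices": set(), "transform": {"psd", "cot", "prices"}, "export": {"transform"}}`, the three extractors form the first wave, followed by `transform` and then `export`.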
Deeman
64687d192c merge: CFTC COT combined (futures+options) report — extractor, transform, web toggle 2026-02-26 11:29:20 +01:00
Deeman
0326e5c83d feat(web): add F+O Combined toggle to positioning dashboard
- analytics.py: add _cot_table() helper; add combined=False param to
  get_cot_positioning_time_series(), get_cot_positioning_latest(),
  get_cot_index_trend(); add get_cot_options_delta() for MM net delta
  between combined and futures-only
- dashboard/routes.py: read ?type=fut|combined param; pass combined flag
  to analytics calls; conditionally fetch options_delta when combined
- api/routes.py: add ?type= param to /positioning and /positioning/latest
  endpoints; returned JSON includes type field
- positioning.html: add report type pill group (Futures / F+O Combined)
  with setType() JS; setRange() and popstate now preserve the type param
- positioning_canvas.html: sync type pills on HTMX swap; show Opt Δ badge
  on MM Net card when combined+options_delta available; conditional chart
  title and subtitle reflect which report variant is shown

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-26 11:25:05 +01:00
Deeman
b884bc2b4a feat(cot): add combined (futures+options) COT extractor and transform models
- extract/cftc_cot: refactor extract_cot_year() to accept url_template and
  landing_subdir params; add _extract_cot() shared loop; add extract_cot_combined()
  entry point using com_disagg_txt_{year}.zip → landing/cot_combined/
- pyproject.toml: add extract_cot_combined script entry point
- macros/__init__.py: add @cot_combined_glob() for cot_combined/**/*.csv.gzip
- fct_cot_positioning.sql: union cot_glob and cot_combined_glob in src CTE;
  add report_type column (FutOnly_or_Combined) to cast_and_clean + deduplicated;
  include FutOnly_or_Combined in hkey to avoid key collisions; add report_type to grain
- obt_cot_positioning.sql: add report_type = 'FutOnly' filter to preserve
  existing serving behavior
- obt_cot_positioning_combined.sql: new serving model filtered to report_type =
  'Combined'; identical analytics (COT index, net %, windows) on combined data
- pipelines.py: register extract_cot_combined; add to extract_all meta-pipeline

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-26 11:24:56 +01:00
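The hkey change can be illustrated in Python. The SQLMesh model computes the key in SQL; the hash function, separator, and column set here are assumptions. Without `report_type` in the key, a futures-only row and a combined row for the same date and market would hash identically and one would be dropped during deduplication.

```python
import hashlib


def cot_hkey(report_date: str, market: str, report_type: str) -> str:
    """Deterministic surrogate key over the grain columns (hypothetical column set)."""
    # A separator that cannot occur in the values keeps ("ab", "c") distinct from ("a", "bc").
    raw = "\x1f".join([report_date, market, report_type])
    return hashlib.md5(raw.encode("utf-8")).hexdigest()
```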
Deeman
520da2c920 feat(ci): switch to pull-based deploy via git tags
Replace push-based SSH deploy (deploy:web stage with SSH credentials +
individual env var injection) with tag-based pull deploy:

- Add `tag` stage: creates v${CI_PIPELINE_IID} tag using CI_JOB_TOKEN
- Remove all SSH variables (SSH_PRIVATE_KEY, SSH_KNOWN_HOSTS, DEPLOY_USER,
  DEPLOY_HOST) and all individual secret variables from CI
- Zero deploy secrets in CI — only CI_JOB_TOKEN (built-in) needed

Deployment is now handled by the on-server supervisor (src/materia/supervisor.py)
which polls for new v* tags every 60s and runs web/deploy.sh automatically.
Secrets live in .env.prod.sops (git-committed, age-encrypted), decrypted at
deploy time by deploy.sh — never stored in GitLab CI variables.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 11:10:06 +01:00
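Since tags are `v${CI_PIPELINE_IID}`, picking the newest one must compare the number, not the string, because `"v10" < "v9"` lexically. A minimal sketch of that check, assuming the tag list comes from something like `git tag --list 'v*'`; the 60-second polling loop and the call to web/deploy.sh are omitted.

```python
import re
from typing import Iterable, Optional

_RELEASE_TAG = re.compile(r"^v(\d+)$")


def latest_release_tag(tags: Iterable[str]) -> Optional[str]:
    """Return the highest v<N> tag, comparing N numerically (so v10 beats v9)."""
    best = None
    for tag in (t.strip() for t in tags):
        m = _RELEASE_TAG.match(tag)
        if m and (best is None or int(m.group(1)) > best[0]):
            best = (int(m.group(1)), tag)
    return best[1] if best else None


def needs_deploy(current: Optional[str], tags: Iterable[str]) -> Optional[str]:
    """Return the tag to deploy, or None when the newest tag is already checked out."""
    latest = latest_release_tag(tags)
    return latest if latest and latest != current else None
```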
Deeman
4c7e520804 fix(deploy): add analytics.duckdb bind-mount to docker-compose.prod.yml
App containers need access to the serving DuckDB populated by the
pipeline supervisor. Bind-mounts /data/materia/analytics.duckdb as
read-only and sets SERVING_DUCKDB_PATH in container environment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 10:59:33 +01:00
Deeman
f253e39c2c feat(deploy): port padelnomics deploy.sh improvements to web/deploy.sh
- Auto-install sops + age binaries to web/bin/ if not present
- Generate age keypair at repo root age-key.txt if missing (prints public
  key with instructions to add to .sops.yaml, then exits)
- Decrypt .env.prod.sops → web/.env at deploy time (no CI secrets needed)
- Backup SQLite DB before migration (timestamped, keeps last 3)
- Rollback on health check failure: dump logs + restore DB backup
- Reset nginx router to current slot before --wait to avoid upstream errors
- Remove web/scripts/deploy.sh (duplicate)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 10:59:07 +01:00
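The "backup before migration, keep last 3" step is done in deploy.sh; the Python sketch below shows the same idea using the standard `sqlite3` online backup API. The file naming scheme is an assumption. Timestamped names sort chronologically, so pruning is a simple slice.

```python
import sqlite3
from datetime import datetime
from pathlib import Path


def backup_sqlite(db_path: Path, backup_dir: Path, keep: int = 3) -> Path:
    """Snapshot db_path into a timestamped file, then delete all but the newest `keep` backups."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    target = backup_dir / f"{db_path.stem}-{stamp}.db"
    src, dst = sqlite3.connect(db_path), sqlite3.connect(target)
    try:
        src.backup(dst)  # consistent copy even while the app holds the DB open
    finally:
        dst.close()
        src.close()
    backups = sorted(backup_dir.glob(f"{db_path.stem}-*.db"))  # oldest first
    for old in backups[:-keep]:
        old.unlink()
    return target
```

On a failed health check, restoring means copying the newest backup back over the live file after the containers are stopped.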
Deeman
643c0b2db9 feat(secrets): update core.py dotenv to load from repo root .env
Load .env from repo root first (created by `make secrets-decrypt-dev`),
falling back to web/.env for legacy setups. Also fixes import sort order
and removes unused httpx import.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 10:47:34 +01:00
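The lookup order can be sketched as a helper that returns the first existing file. The function name is hypothetical; core.py would then hand the result to python-dotenv's `load_dotenv`.

```python
from pathlib import Path
from typing import Optional


def resolve_env_file(repo_root: Path) -> Optional[Path]:
    """Prefer <repo>/.env (from `make secrets-decrypt-dev`), fall back to legacy web/.env."""
    for candidate in (repo_root / ".env", repo_root / "web" / ".env"):
        if candidate.is_file():
            return candidate
    return None
```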
Deeman
6d716a83ae feat(secrets): rewrite secrets.py for SOPS, update cli.py
secrets.py: replace Pulumi ESC (esc CLI) with SOPS decrypt. Reads
.env.prod.sops via `sops --decrypt`, parses dotenv output. Same public
API: get_secret(), list_secrets(), test_connection().

cli.py: update secrets subcommand help text and test command messaging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 10:44:25 +01:00
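Parsing the decrypted dotenv output can be sketched as below. This is a deliberately minimal parser for illustration, not secrets.py's actual code. The input would be the text `sops --decrypt` prints for `.env.prod.sops`.

```python
from typing import Dict


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines: skip blanks and # comments, split on the first '=',
    strip one pair of matching surrounding quotes. No expansion, no multi-line values."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        env[key.strip()] = value
    return env
```

Functions such as `get_secret()` and `list_secrets()` then reduce to dictionary lookups over the parsed result.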
Deeman
9d0e6843f4 feat(secrets): add SOPS+age secret management infrastructure
- .sops.yaml: creation rules matching .env.{dev,prod}.sops (dotenv format)
- .env.dev.sops: encrypted dev defaults (blank API keys, local paths)
- .env.prod.sops: encrypted prod template (placeholder values to fill in)
- Makefile: root Makefile with secrets-decrypt-dev/prod, secrets-edit-dev/prod, css-build/watch
- .gitignore: add age-key.txt

Dev workflow: make secrets-decrypt-dev → .env (repo root) → web app picks it up.
Server: deploy.sh will auto-decrypt .env.prod.sops on each deploy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 10:36:14 +01:00
Deeman
b25b8780a7 docs: update inventory with ICE options research findings
- yfinance confirmed not viable (OPRA only, KC=F not covered)
- CFTC COT combined report is the free immediate path (URL change only)
- ICE Report Center settlement data viable with WebICE login automation
- Barchart OnDemand has correct coverage but requires paid subscription
- All OpenBB providers, Polygon.io, Nasdaq Data Link confirmed no KC=F coverage

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-26 10:16:50 +01:00
Deeman
70415e23b8 docs: add data sources inventory
Documents all 7 ingested sources (CFTC COT, Yahoo Finance KC=F, ICE stocks×3,
USDA PSD, Open-Meteo ERA5) plus planned sources (ICE options, COT combined,
World Bank Pink Sheet, FAO crop calendar). Matches padelnomics inventory format.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-26 09:57:46 +01:00
Deeman
4fae358f97 fix(extract,transform): fix COT/prices column name mismatches + OWM rate limit skip
- fct_cot_positioning: quote Swap__Positions_Short_All and Swap__Positions_Spread_All
  (CSV uses double underscore; DuckDB preserves header names exactly)
- fct_cot_positioning: quote Report_Date_as_YYYY-MM-DD (dashes preserved in header)
- fct_coffee_prices: quote "Adj Close" (space in CSV header)
- openmeteo/execute.py: skip API call in backfill when all daily files already exist
  (_count_existing_files pre-check prevents 429 rate limit on re-runs)
- dev_run.sh: open browser as admin@beanflows.coffee instead of pro@

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-26 09:46:34 +01:00
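Unquoted, `Report_Date_as_YYYY-MM-DD` parses as a subtraction and `Adj Close` as a column plus an alias. The double-underscore swap columns simply have to be spelled exactly as in the header. A small sketch of standard SQL identifier quoting, which DuckDB follows; the helper names are illustrative.

```python
from typing import Dict


def quote_ident(name: str) -> str:
    """Quote a column name with double quotes, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


def select_list(columns: Dict[str, str]) -> str:
    """Build `"Raw Header" AS clean_name` pairs for a SELECT list."""
    return ",\n  ".join(f"{quote_ident(src)} AS {dst}" for src, dst in columns.items())
```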
Deeman
611a4af966 fix(dev): restore execute permission on dev_run.sh 2026-02-26 02:56:49 +01:00
Deeman
a9fb0d38c1 merge: weather data integration — serving layer + web app + browser auto-open 2026-02-26 02:55:19 +01:00
Deeman
8628496881 feat(dev): open browser automatically on dev server ready
Polls /auth/dev-login until the app responds, then opens an incognito/private
window — same pattern as padelnomics. Tries flatpak Chrome → flatpak Firefox
→ system Chrome → Chromium → Firefox in that order.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 02:52:45 +01:00
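The "poll until the app responds" step can be sketched in Python with only the standard library. dev_run.sh is shell, and the URL and port in the usage note are hypothetical. The probe is injectable so the wait loop can be tested without a server.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """True if `url` answers with a non-error status; False on refusal, timeout or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:  # URLError, HTTPError and socket timeouts are all OSError subclasses
        return False


def wait_until_ready(probe: Callable[[], bool], timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Call `probe` every `interval` seconds until it returns True or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False
```

Usage: `wait_until_ready(lambda: http_ok("http://localhost:5000/auth/dev-login"))`, then open the browser. The stdlib `webbrowser.open()` is a simpler fallback, but it cannot request an incognito window, which is why the script tries specific browsers.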
Deeman
3ab0cd122f claude 2026-02-26 02:45:09 +01:00
Deeman
302ba07851 add untracked 2026-02-26 02:44:48 +01:00
Deeman
3629783bbf feat: add CMS/pSEO engine, feature flags, email log (template v0.17.0 backport) ... 2026-02-26 02:43:10 +01:00
Deeman
494f7ff1ee feat(web): integrate crop stress into Pulse page
- index() route: add get_weather_stress_latest() and get_weather_stress_trend(90d)
  to asyncio.gather; pass weather_stress_latest and weather_stress_trend to template
- pulse.html: add 5th metric card (Crop Stress Index, color-coded green/copper/danger)
- pulse.html: add 5th sparkline card (90d avg stress trend) linking to /dashboard/weather
- pulse.html: update spark-grid to auto-fit (minmax 280px) to accommodate 5 cards
- pulse.html: add Weather freshness badge to the freshness bar

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 02:39:29 +01:00
Deeman
89c9f89c8e feat(web): add weather API endpoints (locations, series, stress, alerts)
Adds 4 REST endpoints under /api/v1/weather/:
- GET /weather/locations — 12 locations with latest stress, sorted by severity
- GET /weather/locations/<id> — daily series for one location (?metrics, ?days)
- GET /weather/stress — global daily stress trend (?days)
- GET /weather/alerts — locations with active crop stress flags

All endpoints use @api_key_required(scopes=["read"]) and return {"data": ...}.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 02:39:24 +01:00
Deeman
a8cfd68eda feat(web): add Weather dashboard page with Leaflet map, location cards, and stress charts
- routes.py: add weather() route (range/location params, asyncio.gather, HTMX support)
- weather.html: page shell loading Leaflet + Chart.js, HTMX canvas scaffold
- weather_canvas.html: HTMX partial with overview (map, metric cards, global stress chart,
  alert table, location card grid) and detail view (stress+precip chart, temp+water chart)
- dashboard_base.html: add Weather to sidebar (after Warehouse) and mobile bottom nav
  (replaces Origins; Origins remains in desktop sidebar)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 02:39:19 +01:00
Deeman
127881f7d8 feat(web): add weather analytics query functions to analytics.py
Adds ALLOWED_WEATHER_METRICS frozenset and 5 new query functions:
- get_weather_locations(): 12 locations with latest stress index for map/cards
- get_weather_location_series(): time series for one location (dynamic metrics)
- get_weather_stress_latest(): global snapshot for Pulse metric card
- get_weather_stress_trend(): daily global avg/max for chart and sparkline
- get_weather_active_alerts(): locations with active stress flags

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 02:39:12 +01:00
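A whitelist such as ALLOWED_WEATHER_METRICS lets "dynamic metrics" become column names without opening an SQL injection hole. Below is a sketch in which the set's contents, the function name, and the query shape are assumptions. Column names come from the serving.weather_daily commit, and observation_date comes from the foundation grain.

```python
from typing import List

# Hypothetical contents; the real frozenset in analytics.py may differ.
ALLOWED_WEATHER_METRICS = frozenset({
    "precip_sum_7d", "precip_sum_30d", "temp_mean_30d", "temp_anomaly",
    "water_balance_7d", "drought_streak_days", "heat_streak_days",
    "vpd_streak_days", "crop_stress_index",
})


def location_series_sql(metrics: List[str]) -> str:
    if not metrics:
        raise ValueError("no metrics requested")
    unknown = sorted(set(metrics) - ALLOWED_WEATHER_METRICS)
    if unknown:
        raise ValueError(f"unsupported metrics: {unknown}")
    # Column names are interpolated only after the whitelist check; values stay ? parameters.
    cols = ", ".join(dict.fromkeys(metrics))  # de-duplicate, keep request order
    return (
        f"SELECT observation_date, {cols} FROM serving.weather_daily "
        "WHERE location_id = ? AND observation_date >= ? "
        "ORDER BY observation_date"
    )
```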
Deeman
07b813198a feat(transform): add serving.weather_daily with rolling analytics and crop stress index
Incremental serving model for 12 coffee-growing locations. Adds:
- Rolling aggregates: precip_sum_7d/30d, temp_mean_30d, temp_anomaly, water_balance_7d
- Gaps-and-islands streak counters: drought_streak_days, heat_streak_days, vpd_streak_days
- Composite crop_stress_index 0–100 (drought 30%, water deficit 25%, heat 20%, VPD 15%, frost 10%)
- lookback 90: ensures rolling windows and streak counters see sufficient history on daily runs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 02:39:07 +01:00
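The two analytics ideas can be sketched in Python. The SQL model uses gaps-and-islands window logic, and a running counter that resets on a non-stress day is the sequential equivalent. The weights below are the ones in the commit message. Component names and the assumption that each component is pre-scaled to 0..1 are illustrative.

```python
from typing import Dict, List

STRESS_WEIGHTS = {"drought": 0.30, "water_deficit": 0.25, "heat": 0.20, "vpd": 0.15, "frost": 0.10}


def streak_days(flags: List[bool]) -> List[int]:
    """Length of the current run of flagged days; resets to 0 on an unflagged day."""
    out, run = [], 0
    for flagged in flags:
        run = run + 1 if flagged else 0
        out.append(run)
    return out


def crop_stress_index(components: Dict[str, float]) -> float:
    """Weighted 0-100 composite of components assumed to be scaled to 0..1."""
    total = 0.0
    for name, weight in STRESS_WEIGHTS.items():
        score = min(max(components.get(name, 0.0), 0.0), 1.0)  # clamp to 0..1
        total += weight * score
    return round(100.0 * total, 2)
```

For example, a location in full drought with no other stress scores 30, and full stress on every component scores 100.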
Deeman
3ae8c7e98a merge: SQL fixes (cot_positioning SELECT *, fct_weather_daily src ref) 2026-02-26 01:32:19 +01:00
Deeman
690691ea36 fix(transform): expand SELECT * in cot_positioning, fix src ref in fct_weather_daily
- obt_cot_positioning.sql: replace final SELECT * with explicit column list
  so linter can resolve schema without foundation.fct_cot_positioning in DB
- fct_weather_daily.sql: fix HASH(location_id, src."date") → located."date"
  (cast_and_clean CTE references FROM located, not FROM src)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-26 01:32:16 +01:00
Deeman
8285daaa17 merge: Open-Meteo weather extractor (replaces OpenWeatherMap) 2026-02-26 01:01:29 +01:00
Deeman
9de3a3ba01 feat(extract): replace OpenWeatherMap with Open-Meteo weather extractor
Replaced the OWM extractor (8 locations, API key required, 14,600-call
backfill over 30+ days) with Open-Meteo (12 locations, no API key,
ERA5 reanalysis, full backfill in 12 API calls taking ~30 seconds).

- Rename extract/openweathermap → extract/openmeteo (git mv)
- Rewrite api.py: fetch_archive (ERA5, date-range) + fetch_recent (forecast,
  past_days=10 to cover ERA5 lag); 9 daily variables incl. et0 and VPD
- Rewrite execute.py: _split_and_write() unzips parallel arrays into per-day
  flat JSON; no cursor / rate limiting / call cap needed
- Update pipelines.py: --package openmeteo, timeout 120s (was 1200s)
- Update fct_weather_daily.sql: flat Open-Meteo field names (temperature_2m_*
  etc.), remove pressure_afternoon_hpa, add et0_mm + vpd_max_kpa + is_high_vpd
- Remove OPENWEATHERMAP_API_KEY from CLAUDE.md env vars table

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-26 00:59:54 +01:00
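Open-Meteo returns daily data column-wise (a `daily` object whose `time` array runs parallel to one array per variable). Below is a sketch of the split into one flat record per day, in the spirit of `_split_and_write()` but not its actual code. The `{year}/{date}.json.gz` layout is assumed to carry over from the earlier OpenWeatherMap landing path.

```python
import gzip
import json
from pathlib import Path
from typing import Dict


def split_daily(payload: dict) -> Dict[str, dict]:
    """Turn the column-oriented `daily` block into {date: flat record}."""
    daily = payload["daily"]
    dates = daily["time"]
    variables = {k: v for k, v in daily.items() if k != "time"}
    for name, values in variables.items():
        if len(values) != len(dates):
            raise ValueError(f"{name}: {len(values)} values for {len(dates)} dates")
    records = {}
    for i, day in enumerate(dates):
        records[day] = {"date": day, **{name: values[i] for name, values in variables.items()}}
    return records


def write_days(records: Dict[str, dict], out_dir: Path) -> int:
    """Write one gzipped JSON file per day, skipping days already on disk."""
    written = 0
    for day, record in records.items():
        path = out_dir / day[:4] / f"{day}.json.gz"  # hypothetical layout
        if path.exists():
            continue
        path.parent.mkdir(parents=True, exist_ok=True)
        with gzip.open(path, "wt", encoding="utf-8") as fh:
            json.dump(record, fh)
        written += 1
    return written
```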
Deeman
32c9d7ae07 merge: expand weather locations to 12 regions 2026-02-26 00:12:33 +01:00
Deeman
4817f7de2f feat(extract): add 4 weather locations (ES, PE, UG, CI)
Expands coverage from 8 to 12 coffee-growing regions:
- brazil_espirito_santo (Robusta/Conilon — largest BR Robusta state)
- peru_jaen (Arabica — fastest-growing origin, top-10 global producer)
- uganda_elgon (Robusta — 4th largest African producer)
- ivory_coast_daloa (Robusta — historically significant West African origin)

Now 8 Arabica + 4 Robusta regions = 12 calls/day (well within OWM free tier).
Backfill cost: ~21,900 calls in total for 12 locations (~7,300 more than for 8),
i.e. ~44 days at 500/run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-26 00:12:29 +01:00
Deeman
99055caaa2 merge: OpenWeatherMap daily weather extractor 2026-02-25 22:40:32 +01:00
Deeman
08e74665bb feat(extract): add OpenWeatherMap daily weather extractor
Adds extract/openweathermap package with daily weather extraction for 8
coffee-growing regions (Brazil, Vietnam, Colombia, Ethiopia, Honduras,
Guatemala, Indonesia). Feeds crop stress signal for commodity sentiment score.

Extractor:
- OWM One Call API 3.0 / Day Summary — one JSON.gz per (location, date)
- extract_weather: daily, fetches yesterday + today (16 calls max)
- extract_weather_backfill: fills 2020-01-01 to yesterday, capped at 500
  calls/run with resume cursor '{location_id}:{date}' for crash safety
- Full idempotency via file existence check; state tracking via extract_core

SQLMesh:
- seeds.weather_locations (8 regions with lat/lon/variety)
- foundation.fct_weather_daily: INCREMENTAL_BY_TIME_RANGE, grain
  (location_id, observation_date), dedup via hash key, crop stress flags:
  is_frost (<2°C), is_heat_stress (>35°C), is_drought (<1mm), in_growing_season

Landing path: LANDING_DIR/weather/{location_id}/{year}/{date}.json.gz

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-25 22:40:27 +01:00
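The capped, resumable backfill can be sketched as a batch planner. The iteration order (location-major) and the helper names are assumptions. The cursor format `{location_id}:{date}` is the one described above. Days whose file already exists are skipped without counting against the cap, so an unknown cursor can safely restart from the beginning.

```python
from datetime import date, timedelta
from typing import Callable, List, Optional, Tuple

Task = Tuple[str, date]


def backfill_batch(
    locations: List[str],
    start: date,
    end: date,
    cursor: Optional[str] = None,
    max_calls: int = 500,
    is_done: Callable[[str, date], bool] = lambda loc, day: False,
) -> List[Task]:
    """Next batch of (location_id, day) fetches, resuming just after `cursor`."""
    order = [
        (loc, start + timedelta(days=i))
        for loc in locations
        for i in range((end - start).days + 1)
    ]
    keys = [f"{loc}:{day.isoformat()}" for loc, day in order]
    begin = keys.index(cursor) + 1 if cursor in keys else 0
    batch: List[Task] = []
    for loc, day in order[begin:]:
        if len(batch) >= max_calls:
            break
        if not is_done(loc, day):  # file-existence check keeps reruns idempotent
            batch.append((loc, day))
    return batch


def next_cursor(batch: List[Task]) -> Optional[str]:
    return f"{batch[-1][0]}:{batch[-1][1].isoformat()}" if batch else None
```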