Commit Graph

31 Commits

Author SHA1 Message Date
Deeman
c8b86569ff chore: consolidate to single ruff config in root pyproject.toml
- Merge web ruff settings (select E/F/I/UP, line-length 100) into root config
- Remove [tool.ruff] section from web/pyproject.toml
- Remove "web" from root ruff exclude list
- Simplify pre-commit hook to one command: ruff check .
- Update CI to use: uv run ruff check . (from repo root)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 12:21:01 +01:00
Deeman
42c1309b20 chore: add pre-commit ruff hook with auto-fix
- scripts/hooks/pre-commit: runs ruff --fix for root and web/ (matching CI)
  and re-stages any auto-fixed files so they land in the commit
- Makefile: add install-hooks target (run once after clone)
- pyproject.toml: exclude web/ from root ruff (web has its own config)
- Fix remaining import sort warnings caught by the new hook

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 10:19:29 +01:00
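
A minimal sketch of the hook behavior described above: run ruff with --fix on the staged Python files, then re-stage whatever it rewrote. The actual hook is a shell script at scripts/hooks/pre-commit; the Python rendering, file filtering, and flags below are assumptions.

```python
# Hypothetical Python rendering of the pre-commit hook's behavior; the real
# hook is a shell script at scripts/hooks/pre-commit, so details are assumed.
import subprocess
import sys

def main() -> int:
    # Only act on Python files that are already staged for this commit.
    staged = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    py_files = [f for f in staged if f.endswith(".py")]
    if not py_files:
        return 0
    # Run ruff with auto-fix; a non-zero exit means unfixable violations remain.
    result = subprocess.run(["uv", "run", "ruff", "check", "--fix", *py_files])
    # Re-stage whatever ruff rewrote so the fixes land in the commit.
    subprocess.run(["git", "add", *py_files], check=True)
    return result.returncode

if __name__ == "__main__":
    sys.exit(main())
```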
Deeman
5d7d53a260 feat(supervisor): port Python supervisor from padelnomics + workflows.toml
Port padelnomics' schedule-aware Python supervisor to materia:
- src/materia/supervisor.py — croniter scheduling, topological wave
  execution (parallel independent workflows), tag-based git pull + deploy,
  status CLI subcommand
- infra/supervisor/workflows.toml — workflow registry (psd daily, cot
  weekly, prices daily, ice daily, weather daily)
- infra/supervisor/materia-supervisor.service — updated ExecStart to Python
  supervisor, added SUPERVISOR_GIT_PULL=1

Adaptations from padelnomics:
- Uses extract_core.state.open_state_db (not padelnomics_extract.utils)
- uv run sqlmesh -p transform/sqlmesh_materia run
- uv run materia pipeline run export_serving
- web/deploy.sh path (materia's deploy.sh is under web/)
- Removed proxy_mode (not used in materia)

Also: add croniter dependency to src/materia, delete old supervisor.sh.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 11:59:55 +01:00
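
The commit above combines croniter-based scheduling with topological "wave" execution, where independent workflows run in parallel and dependents wait for earlier waves. A small sketch of how those two pieces could fit together; the function names and data shapes are assumptions, not the actual supervisor.py API.

```python
# Illustrative sketch only; the real supervisor.py API may differ.
from datetime import datetime
from croniter import croniter

def is_due(cron_expr: str, last_run: datetime, now: datetime) -> bool:
    """True if the cron schedule has fired at least once since last_run."""
    next_fire = croniter(cron_expr, last_run).get_next(datetime)
    return next_fire <= now

def waves(deps: dict[str, set[str]]) -> list[set[str]]:
    """Group workflows into waves: each wave depends only on earlier waves,
    so members of one wave can run in parallel."""
    remaining = {name: set(d) for name, d in deps.items()}
    done: set[str] = set()
    result: list[set[str]] = []
    while remaining:
        ready = {n for n, d in remaining.items() if d <= done}
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        result.append(ready)
        done |= ready
        for n in ready:
            del remaining[n]
    return result

# Example: prices and weather are independent, export depends on both.
print(waves({"prices": set(), "weather": set(), "export": {"prices", "weather"}}))
# -> [{'prices', 'weather'}, {'export'}]
```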
Deeman
9de3a3ba01 feat(extract): replace OpenWeatherMap with Open-Meteo weather extractor
Replaced the OWM extractor (8 locations, API key required, 14,600-call
backfill over 30+ days) with Open-Meteo (12 locations, no API key,
ERA5 reanalysis, full backfill in 12 API calls, roughly 30 seconds).

- Rename extract/openweathermap → extract/openmeteo (git mv)
- Rewrite api.py: fetch_archive (ERA5, date-range) + fetch_recent (forecast,
  past_days=10 to cover ERA5 lag); 9 daily variables incl. et0 and VPD
- Rewrite execute.py: _split_and_write() unzips parallel arrays into per-day
  flat JSON; no cursor / rate limiting / call cap needed
- Update pipelines.py: --package openmeteo, timeout 120s (was 1200s)
- Update fct_weather_daily.sql: flat Open-Meteo field names (temperature_2m_*
  etc.), remove pressure_afternoon_hpa, add et0_mm + vpd_max_kpa + is_high_vpd
- Remove OPENWEATHERMAP_API_KEY from CLAUDE.md env vars table

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-26 00:59:54 +01:00
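
Open-Meteo's archive endpoint returns parallel arrays under a "daily" key (one list per variable), which is what the _split_and_write() step described above unzips into per-day flat JSON. A rough sketch of that flow; the real api.py requests more variables and uses different helpers.

```python
# Rough sketch of fetching ERA5 daily data and splitting the parallel arrays
# into per-day records; the URL and parameters follow Open-Meteo's public
# archive API, but the real api.py/execute.py may differ.
import niquests

ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"

def fetch_archive(lat: float, lon: float, start: str, end: str) -> dict:
    params = {
        "latitude": lat,
        "longitude": lon,
        "start_date": start,
        "end_date": end,
        "daily": "temperature_2m_max,temperature_2m_min,precipitation_sum",
        "timezone": "UTC",
    }
    resp = niquests.get(ARCHIVE_URL, params=params, timeout=60)
    resp.raise_for_status()
    return resp.json()

def split_days(payload: dict) -> list[dict]:
    """Turn {'daily': {'time': [...], 'temperature_2m_max': [...], ...}}
    into one flat dict per day."""
    daily = payload["daily"]
    return [
        {"date": d, **{k: v[i] for k, v in daily.items() if k != "time"}}
        for i, d in enumerate(daily["time"])
    ]
```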
Deeman
08e74665bb feat(extract): add OpenWeatherMap daily weather extractor
Adds the extract/openweathermap package with daily weather extraction for 8
coffee-growing regions across Brazil, Vietnam, Colombia, Ethiopia, Honduras,
Guatemala, and Indonesia. Feeds the crop stress signal for the commodity
sentiment score.

Extractor:
- OWM One Call API 3.0 / Day Summary — one JSON.gz per (location, date)
- extract_weather: daily, fetches yesterday + today (16 calls max)
- extract_weather_backfill: fills 2020-01-01 to yesterday, capped at 500
  calls/run with resume cursor '{location_id}:{date}' for crash safety
- Full idempotency via file existence check; state tracking via extract_core

SQLMesh:
- seeds.weather_locations (8 regions with lat/lon/variety)
- foundation.fct_weather_daily: INCREMENTAL_BY_TIME_RANGE, grain
  (location_id, observation_date), dedup via hash key, crop stress flags:
  is_frost (<2°C), is_heat_stress (>35°C), is_drought (<1mm), in_growing_season

Landing path: LANDING_DIR/weather/{location_id}/{year}/{date}.json.gz

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-25 22:40:27 +01:00
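
The backfill's crash safety described above rests on a resume cursor of the form '{location_id}:{date}' plus a per-file existence check. A sketch of how such a cursor could gate the loop; the helper names, call accounting, and loop shape are assumptions, not the actual execute.py.

```python
# Sketch of cursor-based resume for the capped backfill; details assumed.
from datetime import date, timedelta
from pathlib import Path

def backfill(locations: list[str], start: date, end: date, landing_dir: Path,
             last_cursor: str | None = None, call_cap: int = 500) -> str | None:
    """Walk (location, date) pairs; return a cursor to resume from if the
    call cap is hit, or None once the whole range is covered."""
    calls = 0
    skipping = last_cursor is not None
    done_cursor = last_cursor
    for loc in locations:
        day = start
        while day <= end:
            cursor = f"{loc}:{day.isoformat()}"
            if skipping:
                if cursor == last_cursor:
                    skipping = False    # everything up to here was done in a prior run
                day += timedelta(days=1)
                continue
            out = landing_dir / "weather" / loc / str(day.year) / f"{day}.json.gz"
            if not out.exists():        # idempotency: skip files already landed
                if calls >= call_cap:
                    return done_cursor  # persist this; next run resumes after it
                # fetch the day summary and write gzip JSON to `out` (omitted here)
                calls += 1
            done_cursor = cursor
            day += timedelta(days=1)
    return None                         # backfill complete; cursor can be cleared
```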
Deeman
80c1163a7f feat: extraction framework overhaul — extract_core shared package + SQLite state tracking
- Add extract/extract_core/ workspace package with three modules:
  - state.py: SQLite run tracking (open_state_db, start_run, end_run, get_last_cursor)
  - http.py: niquests session factory + etag normalization helpers
  - files.py: landing_path, content_hash, write_bytes_atomic (atomic gzip writes)
- State lives at {LANDING_DIR}/.state.sqlite — no extra env var needed
- SQLite chosen over DuckDB: state tracking is OLTP (row inserts/updates), not analytical
- Refactor all 4 extractors (psdonline, cftc_cot, coffee_prices, ice_stocks):
  - Replace inline boilerplate with extract_core helpers
  - Add start_run/end_run tracking to every extraction entry point
  - extract_cot_year returns int (bytes_written) instead of bool
- Update tests: assert result == 0 (not `is False`) for the return type change

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 14:37:50 +01:00
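
A hypothetical sketch of the SQLite run tracking described above (open_state_db, start_run, end_run, get_last_cursor); the schema and exact signatures are guesses, not the actual extract_core.state module.

```python
# Hypothetical sketch of SQLite-backed run tracking; the real
# extract_core.state schema and signatures may differ.
import sqlite3
import time
from pathlib import Path

def open_state_db(landing_dir: Path) -> sqlite3.Connection:
    conn = sqlite3.connect(landing_dir / ".state.sqlite")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS runs (
               id INTEGER PRIMARY KEY,
               extractor TEXT NOT NULL,
               started_at REAL NOT NULL,
               ended_at REAL,
               status TEXT,
               cursor TEXT
           )"""
    )
    return conn

def start_run(conn: sqlite3.Connection, extractor: str) -> int:
    cur = conn.execute(
        "INSERT INTO runs (extractor, started_at) VALUES (?, ?)",
        (extractor, time.time()),
    )
    conn.commit()
    return cur.lastrowid

def end_run(conn: sqlite3.Connection, run_id: int, status: str,
            cursor: str | None = None) -> None:
    conn.execute(
        "UPDATE runs SET ended_at = ?, status = ?, cursor = ? WHERE id = ?",
        (time.time(), status, cursor, run_id),
    )
    conn.commit()

def get_last_cursor(conn: sqlite3.Connection, extractor: str) -> str | None:
    row = conn.execute(
        "SELECT cursor FROM runs WHERE extractor = ? AND cursor IS NOT NULL "
        "ORDER BY started_at DESC LIMIT 1",
        (extractor,),
    ).fetchone()
    return row[0] if row else None
```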
Deeman
ff7301d6a8 ICE extraction overhaul: API discovery + aging report + historical backfill
- Replace brittle ICE_STOCKS_URL env var with API-based URL discovery via
  the private ICE Report Center JSON API (no auth required)
- Add rolling CSV → XLS fallback in extract_ice_stocks() using
  find_latest_report() from ice_api.py
- Add ice_api.py: fetch_report_listings(), find_latest_report() with
  pagination up to MAX_API_PAGES
- Add xls_parse.py: detect_file_format() (magic bytes), xls_to_rows()
  using xlrd for OLE2/BIFF XLS files
- Add extract_ice_aging(): monthly certified stock aging report by
  age bucket × port → ice_aging/ landing dir
- Add extract_ice_historical(): 30-year EOM by-port stocks from static
  ICE URL → ice_stocks_by_port/ landing dir
- Add xlrd>=2.0.1 (parse XLS), xlwt>=1.3.0 (dev, test fixtures)
- Add SQLMesh raw + foundation models for both new datasets
- Add ice_aging_glob(), ice_stocks_by_port_glob() macros
- Add extract_ice_aging + extract_ice_historical pipeline entries
- Add 12 unit tests (format detection, XLS roundtrip, API mock, CSV output)

Seed files (data/landing/ice_aging/seed/ and ice_stocks_by_port/seed/)
must be created locally — data/ is gitignored.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 21:13:18 +01:00
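
detect_file_format() in the commit above decides between CSV and legacy XLS by magic bytes. OLE2 compound files (the container for BIFF .xls) start with the signature D0 CF 11 E0 A1 B1 1A E1; the sketch below builds on that fact, while the return values and the xlrd round-trip are assumptions.

```python
# Sketch of magic-byte format detection for the rolling CSV -> XLS fallback;
# mirrors the detect_file_format() idea from the commit, details assumed.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # compound-file signature used by legacy .xls

def detect_file_format(payload: bytes) -> str:
    if payload.startswith(OLE2_MAGIC):
        return "xls"
    if payload[:2] == b"PK":     # ZIP container, i.e. modern .xlsx
        return "xlsx"
    return "csv"                 # assume delimited text otherwise

def xls_to_rows(payload: bytes) -> list[list]:
    """Read the first sheet of an OLE2/BIFF .xls into a list of rows via xlrd."""
    import xlrd  # xlrd>=2.0 still reads legacy .xls (only xlsx support was dropped)
    book = xlrd.open_workbook(file_contents=payload)
    sheet = book.sheet_by_index(0)
    return [sheet.row_values(i) for i in range(sheet.nrows)]
```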
Deeman
b167a0a9f4 Add scout MCP server for browser recon + msgspec workspace dep
- tools/scout/: browser automation MCP server using Pydoll (CDP, no WebDriver)
  - scout_visit, scout_elements (text-first), scout_click, scout_fill, scout_select
  - scout_scroll, scout_text, scout_screenshot (opt-in)
  - scout_har_start / scout_har_stop (asyncio task holds recording context open)
  - scout_analyze: HAR parsing with HarEntry/HarSummary msgspec structs
  - Standalone project (not workspace member — websockets conflict with prefect)
  - Runs via: uv run --directory tools/scout scout-server

- .mcp.json: registers scout as Claude Code MCP server (project scope)

- msgspec>=0.19 added to root project deps (workspace-wide struct/validation)

- coding_philosophy.md: document msgspec as approved dep, usage rules

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 15:44:02 +01:00
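
scout_analyze parses HAR recordings into msgspec structs. A minimal illustration of what HarEntry/HarSummary structs and a summarizer could look like; the field choices are assumptions, not the actual tools/scout definitions.

```python
# Illustrative msgspec structs for HAR analysis; the actual HarEntry/HarSummary
# fields in tools/scout are assumed here.
import msgspec

class HarEntry(msgspec.Struct):
    url: str
    method: str
    status: int
    mime_type: str | None = None
    body_size: int = 0

class HarSummary(msgspec.Struct):
    total_requests: int
    by_status: dict[int, int]
    json_endpoints: list[str]

def summarize(entries: list[HarEntry]) -> HarSummary:
    by_status: dict[int, int] = {}
    for e in entries:
        by_status[e.status] = by_status.get(e.status, 0) + 1
    json_endpoints = [
        e.url for e in entries
        if (e.mime_type or "").startswith("application/json")
    ]
    return HarSummary(len(entries), by_status, json_endpoints)
```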
Deeman
67c048485b Add Phase 1A-C + ICE warehouse stocks: prices, methodology, pipeline automation
Phase 1A — KC=F Coffee Futures Prices:
- New extract/coffee_prices/ package (yfinance): downloads KC=F daily OHLCV,
  stores as gzip CSV with SHA256-based idempotency
- SQLMesh models: raw/coffee_prices → foundation/fct_coffee_prices →
  serving/coffee_prices (with 20d/50d SMA, 52-week high/low, daily return %)
- Dashboard: 4 metric cards + dual-line chart (close, 20d MA, 50d MA)
- API: GET /commodities/<ticker>/prices

Phase 1B — Data Methodology Page:
- New /methodology route with full-page template (base.html)
- 6 anchored sections: USDA PSD, CFTC COT, KC=F price, ICE warehouse stocks,
  data quality model, update schedule table
- "Methodology" link added to marketing footer

Phase 1C — Automated Pipeline:
- supervisor.sh updated: runs extract_cot, extract_prices, extract_ice in
  sequence before transform
- Webhook failure alerting via ALERT_WEBHOOK_URL env var (ntfy/Slack/Telegram)

ICE Warehouse Stocks:
- New extract/ice_stocks/ package (niquests): normalizes ICE Report Center CSV
  to canonical schema, hash-based idempotency, soft-fail on 404 with guidance
- SQLMesh models: raw/ice_warehouse_stocks → foundation/fct_ice_warehouse_stocks
  → serving/ice_warehouse_stocks (30d avg, WoW change, 52w drawdown)
- Dashboard: 4 metric cards + line chart (certified bags + 30d avg)
- API: GET /commodities/<code>/stocks

Foundation:
- dim_commodity: added ticker (KC=F) and ice_stock_report_code (COFFEE-C) columns
- macros/__init__.py: added prices_glob() and ice_stocks_glob()
- pipelines.py: added extract_prices and extract_ice entries

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 11:41:43 +01:00
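
A rough sketch of the Phase 1A price pull described above: KC=F daily OHLCV via yfinance, written as gzip CSV with SHA256-based idempotency. The landing layout and function name are assumptions, not the actual coffee_prices package.

```python
# Rough sketch of the KC=F price extraction with hash-based idempotency;
# paths and helper names are assumed.
import gzip
import hashlib
from pathlib import Path

import yfinance as yf

def extract_prices(landing_dir: Path, ticker: str = "KC=F") -> Path | None:
    frame = yf.download(ticker, period="max", interval="1d", progress=False)
    payload = frame.to_csv().encode()
    digest = hashlib.sha256(payload).hexdigest()
    out = landing_dir / "prices" / ticker.replace("=", "_") / f"{digest}.csv.gz"
    if out.exists():
        return None                    # identical content already landed
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_bytes(gzip.compress(payload))
    return out
```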
Deeman
0a83b2cb74 Add CFTC COT data integration with foundation data model layer
- New extraction package (cftc_cot): downloads yearly Disaggregated Futures ZIPs
  from CFTC, etag-based dedup, dynamic inner filename discovery, gzip normalization
- SQLMesh 3-layer architecture: raw (technical) → foundation (business model) → serving (mart)
- dim_commodity seed: conformed dimension mapping USDA ↔ CFTC codes — the commodity ontology
- fct_cot_positioning: typed, deduplicated weekly positioning facts for all commodities
- obt_cot_positioning: Coffee C mart with COT Index (26w/52w), WoW delta, OI ratios
- Analytics functions + REST API endpoints: /commodities/<code>/positioning[/latest]
- Dashboard widget: Managed Money net, COT Index card, dual-axis Chart.js chart
- 23 passing tests (10 unit + 2 SQLMesh model + existing regression suite)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 23:28:10 +01:00
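
The cftc_cot extractor described above downloads yearly Disaggregated Futures ZIPs, skips unchanged files via ETag, discovers the inner filename at runtime, and normalizes to gzip. A sketch of that flow; the URL template, landing layout, and byte-count return follow the commit texts where given and are otherwise assumptions.

```python
# Sketch of the yearly COT ZIP pull with ETag dedup and dynamic inner-file
# discovery; the URL template and helper names are assumed.
import gzip
import io
import zipfile
from pathlib import Path

import niquests

def extract_cot_year(year: int, landing_dir: Path, last_etag: str | None = None) -> int:
    url = f"https://www.cftc.gov/files/dea/history/fut_disagg_txt_{year}.zip"  # assumed pattern
    resp = niquests.get(url, timeout=120)
    resp.raise_for_status()
    etag = resp.headers.get("ETag", "").strip('"')
    if etag and etag == last_etag:
        return 0                                   # unchanged since last run
    with zipfile.ZipFile(io.BytesIO(resp.content)) as zf:
        inner = zf.namelist()[0]                   # inner filename varies by year
        raw = zf.read(inner)
    out = landing_dir / "cot" / str(year) / f"{etag or 'no-etag'}.txt.gz"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_bytes(gzip.compress(raw))
    return len(raw)                                # bytes written, per the later refactor
```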
Deeman
d6d2aa8efe Merge remote-tracking branch 'origin/master'
# Conflicts:
#	infra/readme.md
2026-02-18 21:09:24 +01:00
Deeman
c1d00dcdc4 Refactor to local-first architecture on Hetzner NVMe
Remove distributed R2/Iceberg/SSH pipeline architecture in favor of
local subprocess execution with NVMe storage. Landing data backed up
to R2 via rclone timer.

- Strip Iceberg catalog, httpfs, boto3, paramiko, prefect, pyarrow
- Pipelines run via subprocess.run() with bounded timeouts
- Extract writes to {LANDING_DIR}/psd/{year}/{month}/{etag}.csv.gzip
- SQLMesh reads LANDING_DIR variable, writes to DUCKDB_PATH
- Delete unused provider stubs (ovh, scaleway, oracle)
- Add rclone systemd timer for R2 backup every 6h
- Update supervisor to run pipelines with env vars

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 19:50:19 +01:00
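
The local-first runner executes each pipeline via subprocess.run() with a bounded timeout and the LANDING_DIR / DUCKDB_PATH env vars the commit mentions. A minimal sketch; the command list and failure handling are assumptions.

```python
# Minimal sketch of the local-first pipeline runner described above; the exact
# command lists and failure handling are assumed.
import os
import subprocess

def run_pipeline(name: str, cmd: list[str], timeout_s: int,
                 landing_dir: str, duckdb_path: str) -> bool:
    env = {**os.environ, "LANDING_DIR": landing_dir, "DUCKDB_PATH": duckdb_path}
    try:
        result = subprocess.run(cmd, env=env, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        print(f"{name}: timed out after {timeout_s}s")
        return False
    return result.returncode == 0

# e.g. run_pipeline("transform",
#                   ["uv", "run", "sqlmesh", "-p", "transform/sqlmesh_materia", "run"],
#                   timeout_s=1800, landing_dir="/data/landing",
#                   duckdb_path="/data/materia.duckdb")
```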
Deeman
09ae88be19 cleanup and prefect service setup 2026-02-05 20:01:50 +01:00
Deeman
6d4377ccf9 cleanup and prefect service setup 2026-02-04 22:24:55 +01:00
Hendrik Dreesmann
b702e6565a Update SQLMesh for R2 data access & Convert psd data to gzip 2025-11-02 00:26:01 +01:00
Deeman
6c93021f2d remove stupid rules 2025-10-12 21:44:56 +02:00
Deeman
7e06eae5ac Add comprehensive ruff linting rules and migrate to uv build backend
- Configure ruff with strict linting rules (pycodestyle, pyflakes, isort, pylint, etc.)
- Exclude notebooks folder from linting
- Set line length to 88 characters and target Python 3.13
- Migrate build backend from hatchling to uv_build for better integration
- Add per-file ignores for __init__.py and scripts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-12 21:41:39 +02:00
Deeman
5ce112f44d Add comprehensive E2E tests for materia CLI
- Add pytest and pytest-cov for testing
- Add niquests for modern HTTP/2 support (keep requests for hcloud compatibility)
- Create 13 E2E tests covering CLI, workers, pipelines, and secrets (71% coverage)
- Fix Pulumi ESC environment path (beanflows/prod) and secret key names
- Update GitLab CI to run CLI tests with coverage reporting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-12 21:32:51 +02:00
Deeman
55bb84f0fa implement cli/infra update cicd 2025-10-12 21:00:41 +02:00
Deeman
77dd277ebf updates 2025-10-12 14:26:37 +02:00
Deeman
9baa0d185c testing sqlmesh 2025-07-27 00:18:03 +02:00
Deeman
bd65ddcac8 adding incremental load abilities 2025-07-26 21:10:02 +02:00
Deeman
b8ad73202c finish historical extraction 2025-07-13 23:20:50 +02:00
Deeman
c3c281fcd8 update structure 2025-07-08 22:41:59 +02:00
Deeman
5368c1e521 changes 2025-06-11 11:49:20 +02:00
Deeman
265250864c add dlt script to extract data from fas.usda.gov 2025-04-30 22:35:31 +02:00
Simon Dmsn
bff78f1e72 Psd 2025-04-30 18:56:39 +02:00
2839757cf8 add cicd and precommit 2025-03-01 18:23:56 +01:00
2a4e7fe668 update 2025-03-01 18:15:34 +01:00
32b7df714e add yfinance and more readme 2025-03-01 18:13:38 +01:00
96a6abf1e0 Initial commit 2025-03-01 18:11:57 +01:00