beanflows

Author	SHA1	Message	Date
Deeman	80c1163a7f	feat: extraction framework overhaul — extract_core shared package + SQLite state tracking - Add extract/extract_core/ workspace package with three modules: - state.py: SQLite run tracking (open_state_db, start_run, end_run, get_last_cursor) - http.py: niquests session factory + etag normalization helpers - files.py: landing_path, content_hash, write_bytes_atomic (atomic gzip writes) - State lives at {LANDING_DIR}/.state.sqlite — no extra env var needed - SQLite chosen over DuckDB: state tracking is OLTP (row inserts/updates), not analytical - Refactor all 4 extractors (psdonline, cftc_cot, coffee_prices, ice_stocks): - Replace inline boilerplate with extract_core helpers - Add start_run/end_run tracking to every extraction entry point - extract_cot_year returns int (bytes_written) instead of bool - Update tests: assert result == 0 (not `is False`) for the return type change Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 14:37:50 +01:00
Deeman	c92e5a8e07	ice_stocks: add backfill extractor for historical daily stocks The ICE API at /marketdata/api/reports/293/results stores all historical daily XLS reports date-descending. Previously the extractor only fetched the latest. New extract_ice_backfill entry point pages through the API and downloads all matching 'Daily Warehouse Stocks' reports. - ice_api.py: add find_all_reports() alongside find_latest_report() - execute.py: add extract_ice_stocks_backfill(max_pages=3) — default covers ~6 months; max_pages=20 fetches ~3 years of history - pyproject.toml: register extract_ice_backfill entry point Ran backfill: 131 files, 2025-08-15 → 2026-02-20 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 01:35:57 +01:00
Deeman	493ce64fde	fix ice_stocks XLS date parsing: handle 'Feb 20, 2026' format ICE changed the daily stocks XLS header from 'As of: 1/30/2026' to 'As of: Feb 20, 2026 1:35:39PM'. Expand _build_canonical_csv_from_xls to try multiple strptime formats (%m/%d/%Y, %b %d, %Y, etc.) on both single-token and three-token date candidates. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-21 22:18:17 +01:00
Deeman	ff896685d2	Add extract_ice_all command to run all three ICE extractors in sequence Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-21 21:59:08 +01:00
Deeman	ff7301d6a8	ICE extraction overhaul: API discovery + aging report + historical backfill - Replace brittle ICE_STOCKS_URL env var with API-based URL discovery via the private ICE Report Center JSON API (no auth required) - Add rolling CSV → XLS fallback in extract_ice_stocks() using find_latest_report() from ice_api.py - Add ice_api.py: fetch_report_listings(), find_latest_report() with pagination up to MAX_API_PAGES - Add xls_parse.py: detect_file_format() (magic bytes), xls_to_rows() using xlrd for OLE2/BIFF XLS files - Add extract_ice_aging(): monthly certified stock aging report by age bucket × port → ice_aging/ landing dir - Add extract_ice_historical(): 30-year EOM by-port stocks from static ICE URL → ice_stocks_by_port/ landing dir - Add xlrd>=2.0.1 (parse XLS), xlwt>=1.3.0 (dev, test fixtures) - Add SQLMesh raw + foundation models for both new datasets - Add ice_aging_glob(), ice_stocks_by_port_glob() macros - Add extract_ice_aging + extract_ice_historical pipeline entries - Add 12 unit tests (format detection, XLS roundtrip, API mock, CSV output) Seed files (data/landing/ice_aging/seed/ and ice_stocks_by_port/seed/) must be created locally — data/ is gitignored. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-21 21:13:18 +01:00
Deeman	67c048485b	Add Phase 1A-C + ICE warehouse stocks: prices, methodology, pipeline automation Phase 1A — KC=F Coffee Futures Prices: - New extract/coffee_prices/ package (yfinance): downloads KC=F daily OHLCV, stores as gzip CSV with SHA256-based idempotency - SQLMesh models: raw/coffee_prices → foundation/fct_coffee_prices → serving/coffee_prices (with 20d/50d SMA, 52-week high/low, daily return %) - Dashboard: 4 metric cards + dual-line chart (close, 20d MA, 50d MA) - API: GET /commodities/<ticker>/prices Phase 1B — Data Methodology Page: - New /methodology route with full-page template (base.html) - 6 anchored sections: USDA PSD, CFTC COT, KC=F price, ICE warehouse stocks, data quality model, update schedule table - "Methodology" link added to marketing footer Phase 1C — Automated Pipeline: - supervisor.sh updated: runs extract_cot, extract_prices, extract_ice in sequence before transform - Webhook failure alerting via ALERT_WEBHOOK_URL env var (ntfy/Slack/Telegram) ICE Warehouse Stocks: - New extract/ice_stocks/ package (niquests): normalizes ICE Report Center CSV to canonical schema, hash-based idempotency, soft-fail on 404 with guidance - SQLMesh models: raw/ice_warehouse_stocks → foundation/fct_ice_warehouse_stocks → serving/ice_warehouse_stocks (30d avg, WoW change, 52w drawdown) - Dashboard: 4 metric cards + line chart (certified bags + 30d avg) - API: GET /commodities/<code>/stocks Foundation: - dim_commodity: added ticker (KC=F) and ice_stock_report_code (COFFEE-C) columns - macros/__init__.py: added prices_glob() and ice_stocks_glob() - pipelines.py: added extract_prices and extract_ice entries Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-21 11:41:43 +01:00

6 Commits