beanflows

Author	SHA1	Message	Date
Deeman	c3c8333407	refactor(transform): remove raw layer, read landing zone directly - Delete 6 data raw models (coffee_prices, cot_disaggregated, ice_*, psd_data) — pure read_csv passthroughs with no added value - Move 3 PSD seed models raw/ → seeds/, rename schema raw.* → seeds.* - Update staging.psdalldata__commodity: read_csv(@psd_glob()) directly, join seeds.psd_* instead of raw.psd_* - Update 5 foundation models: inline read_csv() with src CTE, removing raw.* dependency (fct_coffee_prices, fct_cot_positioning, fct_ice_*) - Remove fixture-based SQLMesh test that depended on raw.cot_disaggregated (unit tests incompatible with inline read_csv; integration run covers this) - Update readme.md: 3-layer architecture (staging/foundation → serving) Landing files are immutable and content-addressed — the landing directory is the audit trail. A raw SQL layer duplicated file bytes into DuckDB with no added value. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 17:30:18 +01:00
Deeman	ff956b0138	ICE aging + by-port: serving models, API endpoints, dashboard integration - serving/ice_aging_stocks.sql: pass-through from foundation, parses age bucket string to start/end days ints for correct sort order - serving/ice_warehouse_stocks_by_port.sql: monthly by-port since 1996, adds MoM change, MoM %, 12-month rolling average - analytics.py: get_ice_aging_latest(), get_ice_aging_trend(), get_ice_stocks_by_port_trend(), get_ice_stocks_by_port_latest() - api/routes.py: GET /commodities/<code>/stocks/aging and GET /commodities/<code>/stocks/by-port with auth + rate limiting - dashboard/routes.py: add 3 new queries to asyncio.gather(), pass to template - index.html: aging stacked bar chart (age buckets × port) with 4 metric cards; by-port stacked area chart (30-year history) with 4 metric cards Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-21 21:52:35 +01:00
Deeman	ff7301d6a8	ICE extraction overhaul: API discovery + aging report + historical backfill - Replace brittle ICE_STOCKS_URL env var with API-based URL discovery via the private ICE Report Center JSON API (no auth required) - Add rolling CSV → XLS fallback in extract_ice_stocks() using find_latest_report() from ice_api.py - Add ice_api.py: fetch_report_listings(), find_latest_report() with pagination up to MAX_API_PAGES - Add xls_parse.py: detect_file_format() (magic bytes), xls_to_rows() using xlrd for OLE2/BIFF XLS files - Add extract_ice_aging(): monthly certified stock aging report by age bucket × port → ice_aging/ landing dir - Add extract_ice_historical(): 30-year EOM by-port stocks from static ICE URL → ice_stocks_by_port/ landing dir - Add xlrd>=2.0.1 (parse XLS), xlwt>=1.3.0 (dev, test fixtures) - Add SQLMesh raw + foundation models for both new datasets - Add ice_aging_glob(), ice_stocks_by_port_glob() macros - Add extract_ice_aging + extract_ice_historical pipeline entries - Add 12 unit tests (format detection, XLS roundtrip, API mock, CSV output) Seed files (data/landing/ice_aging/seed/ and ice_stocks_by_port/seed/) must be created locally — data/ is gitignored. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-21 21:13:18 +01:00
Deeman	67c048485b	Add Phase 1A-C + ICE warehouse stocks: prices, methodology, pipeline automation Phase 1A — KC=F Coffee Futures Prices: - New extract/coffee_prices/ package (yfinance): downloads KC=F daily OHLCV, stores as gzip CSV with SHA256-based idempotency - SQLMesh models: raw/coffee_prices → foundation/fct_coffee_prices → serving/coffee_prices (with 20d/50d SMA, 52-week high/low, daily return %) - Dashboard: 4 metric cards + dual-line chart (close, 20d MA, 50d MA) - API: GET /commodities/<ticker>/prices Phase 1B — Data Methodology Page: - New /methodology route with full-page template (base.html) - 6 anchored sections: USDA PSD, CFTC COT, KC=F price, ICE warehouse stocks, data quality model, update schedule table - "Methodology" link added to marketing footer Phase 1C — Automated Pipeline: - supervisor.sh updated: runs extract_cot, extract_prices, extract_ice in sequence before transform - Webhook failure alerting via ALERT_WEBHOOK_URL env var (ntfy/Slack/Telegram) ICE Warehouse Stocks: - New extract/ice_stocks/ package (niquests): normalizes ICE Report Center CSV to canonical schema, hash-based idempotency, soft-fail on 404 with guidance - SQLMesh models: raw/ice_warehouse_stocks → foundation/fct_ice_warehouse_stocks → serving/ice_warehouse_stocks (30d avg, WoW change, 52w drawdown) - Dashboard: 4 metric cards + line chart (certified bags + 30d avg) - API: GET /commodities/<code>/stocks Foundation: - dim_commodity: added ticker (KC=F) and ice_stock_report_code (COFFEE-C) columns - macros/__init__.py: added prices_glob() and ice_stocks_glob() - pipelines.py: added extract_prices and extract_ice entries Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-21 11:41:43 +01:00
Deeman	2962bf5e3b	Fix COT pipeline: TRY_CAST nulls, dim_commodity leading zeros, correct CFTC codes - config.yaml: remove ambiguousorinvalidcolumn linter rule (false positives on read_csv TVFs) - fct_cot_positioning: use TRY_CAST throughout — CFTC uses '.' as null in many columns - raw/cot_disaggregated: add columns() declaration for 33 varchar cols - dim_commodity: switch from SEED to FULL model with SQL VALUES to preserve leading zeros Pandas auto-converts '083' → 83 even with varchar column declarations in SEED models - seeds/dim_commodity.csv: correct cftc_commodity_code from '083731' (contract market code) to '083' (3-digit CFTC commodity code); add CSV quoting - test_cot_foundation.yaml: fix output key name, vars for time range, partial: true, and correct cftc_commodity_code to '083' - analytics.py: COFFEE_CFTC_CODE '083731' → '083' to match actual data Result: serving.cot_positioning has 685 rows (2013-01-08 to 2026-02-17), 23/23 tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-20 23:28:10 +01:00
Deeman	0a83b2cb74	Add CFTC COT data integration with foundation data model layer - New extraction package (cftc_cot): downloads yearly Disaggregated Futures ZIPs from CFTC, etag-based dedup, dynamic inner filename discovery, gzip normalization - SQLMesh 3-layer architecture: raw (technical) → foundation (business model) → serving (mart) - dim_commodity seed: conformed dimension mapping USDA ↔ CFTC codes — the commodity ontology - fct_cot_positioning: typed, deduplicated weekly positioning facts for all commodities - obt_cot_positioning: Coffee C mart with COT Index (26w/52w), WoW delta, OI ratios - Analytics functions + REST API endpoints: /commodities/<code>/positioning[/latest] - Dashboard widget: Managed Money net, COT Index card, dual-axis Chart.js chart - 23 passing tests (10 unit + 2 SQLMesh model + existing regression suite) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-20 23:28:10 +01:00
Deeman	423fb8c619	Fix extract and SQLMesh pipeline to build DuckDB lakehouse extract: wrap response.content in BytesIO before passing to normalize_zipped_csv, and call .read() on the returned BytesIO before write_bytes (two bugs: wrong type in, wrong type out) sqlmesh: {{ var() }} inside SQL string literals is not substituted by SQLMesh's Jinja (SQL parser treats them as opaque strings). Replace with a @psd_glob() macro that evaluates LANDING_DIR at render time and returns a quoted glob path string. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-20 17:02:59 +01:00
Deeman	c1d00dcdc4	Refactor to local-first architecture on Hetzner NVMe Remove distributed R2/Iceberg/SSH pipeline architecture in favor of local subprocess execution with NVMe storage. Landing data backed up to R2 via rclone timer. - Strip Iceberg catalog, httpfs, boto3, paramiko, prefect, pyarrow - Pipelines run via subprocess.run() with bounded timeouts - Extract writes to {LANDING_DIR}/psd/{year}/{month}/{etag}.csv.gzip - SQLMesh reads LANDING_DIR variable, writes to DUCKDB_PATH - Delete unused provider stubs (ovh, scaleway, oracle) - Add rclone systemd timer for R2 backup every 6h - Update supervisor to run pipelines with env vars Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-18 19:50:19 +01:00
Deeman	2748c606e9	Add BeanFlows MVP: coffee analytics dashboard, API, and web app - Fix pipeline granularity: add market_year to cleaned/serving SQL models - Add DuckDB data access layer with async query functions (analytics.py) - Build Chart.js dashboard: supply/demand, STU ratio, top producers, YoY table - Add country comparison page with multi-select picker - Replace items CRUD with read-only commodity API (list, metrics, countries, CSV) - Configure BeanFlows plan tiers (Free/Starter/Pro) with feature gating - Rewrite public pages for coffee market intelligence positioning - Remove boilerplate items schema, update health check for DuckDB - Add test suite: 139 tests passing (dashboard, API, billing) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-18 16:11:50 +01:00
Deeman	6d4377ccf9	cleanup and prefect service setup	2026-02-04 22:24:55 +01:00
Deeman	38897617e7	Refactor PSD extraction: simplify to latest-only + add R2 support ## Key Changes 1. Simplified extraction logic - Changed from downloading 220+ historical archives to checking only latest available month - Tries current month and falls back up to 3 months (handles USDA publication lag) - Architecture advisor insight: ETags naturally deduplicate, historical year/month structure was unnecessary 2. Flat storage structure - Old: `data/{year}/{month}/{etag}.zip` - New: `data/{etag}.zip` (local) or `psd/{etag}.zip` (R2) - Migrated 226 existing files to flat structure 3. Dual storage modes - Local mode: Downloads to local directory (development) - R2 mode: Uploads to Cloudflare R2 (production) - Mode determined by presence of R2 environment variables - Added boto3 dependency for S3-compatible R2 API 4. Updated raw SQLMesh model - Changed pattern from `**/*.zip` to `*.zip` to match flat structure ## Benefits - Simpler: Single file check instead of 220+ URL attempts - Efficient: ETag-based deduplication works naturally - Flexible: Supports both local dev and production R2 storage - Maintainable: Removed unnecessary complexity ## Testing - ✅ Local extraction works and respects ETags - ✅ Falls back correctly when current month unavailable - ✅ Linting passes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-20 22:02:15 +02:00
Deeman	025dda16c6	update dedupe logic -> much faster now	2025-10-07 22:32:45 +02:00
Deeman	da89c2bf6e	update staging pipeline	2025-10-07 22:20:48 +02:00
Deeman	0a409acbea	update path	2025-09-10 18:56:32 +02:00
Deeman	85704a4bf1	Change layer naming	2025-09-10 18:46:18 +02:00
Deeman	f5f2dbc7a5	refactor	2025-08-25 20:50:25 +02:00
Simon Dmsn	5588be152b	Update 3 files - /notebooks/03_Extraction.ipynb - /transform/sqlmesh_materia/models/staging/stg_psd_alldata_1_filter_silver_layer.sql - /transform/sqlmesh_materia/models/staging/stg_psd_alldata_2_filter_gold_layer.sql	2025-08-01 14:52:55 +00:00
Simon Dmsn	1c87488cc7	Update 4 files - /transform/sqlmesh_materia/models/staging/stg_psd_alldata.sql - /transform/sqlmesh_materia/models/staging/stg_psd_alldata_1_filter_silver_layer.sql - /transform/sqlmesh_materia/models/staging/stg_psd_alldata_2_filter_gold_layer.sql - /transform/sqlmesh_materia/models/staging/stg_psd_alldata_0.sql	2025-08-01 14:45:34 +00:00
Simon Dmsn	4ad4386ccc	Update 2 files - /transform/sqlmesh_materia/models/staging/Commodity Exchange Codes.xls - /transform/sqlmesh_materia/seeds/commodity_exchange_codes.csv	2025-08-01 14:24:26 +00:00
Simon Dmsn	918b0071b1	Update file Commodity Exchange Codes.xls	2025-08-01 14:22:01 +00:00
Deeman	91f8968990	remove comment	2025-07-31 19:48:18 +02:00
Deeman	641f794d61	fix seeds; update models	2025-07-27 22:49:37 +02:00
Deeman	c0d8f60d1c	add reference data	2025-07-27 18:28:30 +02:00
Deeman	8b5d05b3c2	raw ingest model	2025-07-27 15:40:41 +02:00
Deeman	f5c73e32c5	testing sqlmesh	2025-07-27 00:18:14 +02:00
Deeman	9baa0d185c	testing sqlmesh	2025-07-27 00:18:03 +02:00
Deeman	f0de8a505b	update projects to packages	2025-07-26 22:32:47 +02:00

27 Commits