beanflows

Author	SHA1	Message	Date
Deeman	0a83b2cb74	Add CFTC COT data integration with foundation data model layer - New extraction package (cftc_cot): downloads yearly Disaggregated Futures ZIPs from CFTC, etag-based dedup, dynamic inner filename discovery, gzip normalization - SQLMesh 3-layer architecture: raw (technical) → foundation (business model) → serving (mart) - dim_commodity seed: conformed dimension mapping USDA ↔ CFTC codes — the commodity ontology - fct_cot_positioning: typed, deduplicated weekly positioning facts for all commodities - obt_cot_positioning: Coffee C mart with COT Index (26w/52w), WoW delta, OI ratios - Analytics functions + REST API endpoints: /commodities/<code>/positioning[/latest] - Dashboard widget: Managed Money net, COT Index card, dual-axis Chart.js chart - 23 passing tests (10 unit + 2 SQLMesh model + existing regression suite) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-20 23:28:10 +01:00
Deeman	423fb8c619	Fix extract and SQLMesh pipeline to build DuckDB lakehouse. extract: wrap response.content in BytesIO before passing to normalize_zipped_csv, and call .read() on the returned BytesIO before write_bytes (two bugs: wrong type in, wrong type out). sqlmesh: {{ var() }} inside SQL string literals is not substituted by SQLMesh's Jinja (SQL parser treats them as opaque strings). Replace with a @psd_glob() macro that evaluates LANDING_DIR at render time and returns a quoted glob path string. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-20 17:02:59 +01:00
Deeman	c1d00dcdc4	Refactor to local-first architecture on Hetzner NVMe Remove distributed R2/Iceberg/SSH pipeline architecture in favor of local subprocess execution with NVMe storage. Landing data backed up to R2 via rclone timer. - Strip Iceberg catalog, httpfs, boto3, paramiko, prefect, pyarrow - Pipelines run via subprocess.run() with bounded timeouts - Extract writes to {LANDING_DIR}/psd/{year}/{month}/{etag}.csv.gzip - SQLMesh reads LANDING_DIR variable, writes to DUCKDB_PATH - Delete unused provider stubs (ovh, scaleway, oracle) - Add rclone systemd timer for R2 backup every 6h - Update supervisor to run pipelines with env vars Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-18 19:50:19 +01:00
Deeman	2748c606e9	Add BeanFlows MVP: coffee analytics dashboard, API, and web app - Fix pipeline granularity: add market_year to cleaned/serving SQL models - Add DuckDB data access layer with async query functions (analytics.py) - Build Chart.js dashboard: supply/demand, STU ratio, top producers, YoY table - Add country comparison page with multi-select picker - Replace items CRUD with read-only commodity API (list, metrics, countries, CSV) - Configure BeanFlows plan tiers (Free/Starter/Pro) with feature gating - Rewrite public pages for coffee market intelligence positioning - Remove boilerplate items schema, update health check for DuckDB - Add test suite: 139 tests passing (dashboard, API, billing) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-18 16:11:50 +01:00
Deeman	6d4377ccf9	cleanup and prefect service setup	2026-02-04 22:24:55 +01:00
Deeman	38897617e7	Refactor PSD extraction: simplify to latest-only + add R2 support ## Key Changes 1. Simplified extraction logic - Changed from downloading 220+ historical archives to checking only latest available month - Tries current month and falls back up to 3 months (handles USDA publication lag) - Architecture advisor insight: ETags naturally deduplicate, historical year/month structure was unnecessary 2. Flat storage structure - Old: `data/{year}/{month}/{etag}.zip` - New: `data/{etag}.zip` (local) or `psd/{etag}.zip` (R2) - Migrated 226 existing files to flat structure 3. Dual storage modes - Local mode: Downloads to local directory (development) - R2 mode: Uploads to Cloudflare R2 (production) - Mode determined by presence of R2 environment variables - Added boto3 dependency for S3-compatible R2 API 4. Updated raw SQLMesh model - Changed pattern from `*/.zip` to `*.zip` to match flat structure ## Benefits - Simpler: Single file check instead of 220+ URL attempts - Efficient: ETag-based deduplication works naturally - Flexible: Supports both local dev and production R2 storage - Maintainable: Removed unnecessary complexity ## Testing - ✅ Local extraction works and respects ETags - ✅ Falls back correctly when current month unavailable - ✅ Linting passes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-20 22:02:15 +02:00
Deeman	2d248a2eef	Fix SQLMesh config to use correct Pulumi ESC env var names - Update secret token: CLOUDFLARE_API_TOKEN → R2_ADMIN_API_TOKEN - Update warehouse name: R2_WAREHOUSE_NAME → ICEBERG_WAREHOUSE_NAME - Update endpoint: ICEBERG_REST_URI → ICEBERG_CATALOG_URI - Remove CREATE SCHEMA and USE statements - DuckDB has bug with Iceberg REST: missing Content-Type header - Schema creation via SQL currently not supported - Models will use fully-qualified table names instead Successfully tested with real R2 credentials: - Iceberg catalog attachment works ✓ - Plan dry-run executes ✓ - Only fails on missing source data (expected) ✓ 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-13 22:21:27 +02:00
Deeman	05ef15bfdf	Configure Iceberg catalog with proper secret reference - Add catalog ATTACH statement in before_all with SECRET parameter - References r2_secret created by connection configuration - Uses proper DuckDB ATTACH syntax per Cloudflare docs - Single-line format to avoid Jinja parsing issues - Remove manual CREATE SECRET from before_all hooks - Secret automatically created by SQLMesh from connection config - Cleaner separation: connection defines credentials, hooks use them Successfully tested - config validates without warnings. Only fails on missing env vars (expected locally). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-13 22:10:51 +02:00
Deeman	2ad344abf4	Refactor SQLMesh config to use connection-level secrets - Move Iceberg secret from before_all hook to connection.secrets - Fixes SQLMesh warning about unsupported @env_var syntax - Uses Jinja templating {{ env_var() }} instead of @env_var() - Remove database: ':memory:' (incompatible with catalogs) - DuckDB doesn't allow both database and catalogs config - Connection defaults to in-memory when no database specified - Simplify before_all hooks to only handle ATTACH and schema setup - Secret is now created automatically by SQLMesh - Cleaner separation: connection config vs runtime setup Based on: - https://developers.cloudflare.com/r2/data-catalog/config-examples/duckdb/ - https://sqlmesh.readthedocs.io/en/latest/integrations/engines/duckdb/ 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-13 22:04:25 +02:00
Deeman	120fef369a	Fix SQLMesh config and CI/CD deployment issues - Fix SQLMesh config: Add semicolons to SQL statements in before_all hooks - Resolves "unsupported syntax" warning for CREATE SECRET and ATTACH - DuckDB requires semicolons to terminate statements properly - Fix deploy:infra job: Update Pulumi authentication - Remove `pulumi login --token` (not supported in Docker image) - Use PULUMI_ACCESS_TOKEN environment variable directly - Chain commands with && to avoid "unknown command 'sh'" error - Fix deploy:supervisor job: Update esc login syntax - Change `esc login --token` to `esc login` (--token flag doesn't exist) - esc CLI reads token from PULUMI_ACCESS_TOKEN env var - Simplify Pulumi CLI installation (remove apk fallback logic) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-13 21:58:43 +02:00
Deeman	d2352c1876	Simplify SQLMesh to use single prod gateway with virtual environments - Remove dev gateway (local DuckDB file no longer needed) - Single prod gateway connects to R2 Iceberg catalog - Use virtual environments for dev isolation (e.g., dev_<username>) - Update CLAUDE.md with new workflow and environment strategy - Create comprehensive transform/sqlmesh_materia/README.md Benefits: - Simpler configuration (one gateway instead of two) - All environments use same R2 Iceberg catalog - SQLMesh handles environment isolation automatically - No need to maintain local 13GB materia_dev.db file - before_all hooks only run for prod gateway (no conditional logic needed) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-13 21:47:04 +02:00
Deeman	6536724e00	Fix SQLMesh config: remove invalid init_script parameter - Remove init_script from DuckDB connection config (not a valid parameter) - Move R2 Iceberg catalog initialization to before_all hooks - Hooks run before sqlmesh plan/run commands - Uses SQLMesh @env_var() macro syntax for environment variables Fixes CI/CD error: 'invalid duckdb connection config: invalid field init_script' 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-13 21:31:56 +02:00
Deeman	55bb84f0fa	implement cli/infra update cicd	2025-10-12 21:00:41 +02:00
Deeman	025dda16c6	update dedupe logic -> much faster now	2025-10-07 22:32:45 +02:00
Deeman	da89c2bf6e	update staging pipeline	2025-10-07 22:20:48 +02:00
Deeman	0a409acbea	update path	2025-09-10 18:56:32 +02:00
Deeman	85704a4bf1	Change layer naming	2025-09-10 18:46:18 +02:00
Deeman	f5f2dbc7a5	refactor	2025-08-25 20:50:25 +02:00
Simon Dmsn	5588be152b	Update 3 files - /notebooks/03_Extraction.ipynb - /transform/sqlmesh_materia/models/staging/stg_psd_alldata_1_filter_silver_layer.sql - /transform/sqlmesh_materia/models/staging/stg_psd_alldata_2_filter_gold_layer.sql	2025-08-01 14:52:55 +00:00
Simon Dmsn	1c87488cc7	Update 4 files - /transform/sqlmesh_materia/models/staging/stg_psd_alldata.sql - /transform/sqlmesh_materia/models/staging/stg_psd_alldata_1_filter_silver_layer.sql - /transform/sqlmesh_materia/models/staging/stg_psd_alldata_2_filter_gold_layer.sql - /transform/sqlmesh_materia/models/staging/stg_psd_alldata_0.sql	2025-08-01 14:45:34 +00:00
Simon Dmsn	82b27e7c55	Update 2 files - /transform/sqlmesh_materia/seeds/commodity_exchange_codes.csv - /transform/sqlmesh_materia/seeds/psd_codes_exchange_codes_merge.csv	2025-08-01 14:41:48 +00:00
Simon Dmsn	9d7cc4e1fb	Update file commodity_exchange_codes.csv	2025-08-01 14:26:19 +00:00
Simon Dmsn	4ad4386ccc	Update 2 files - /transform/sqlmesh_materia/models/staging/Commodity Exchange Codes.xls - /transform/sqlmesh_materia/seeds/commodity_exchange_codes.csv	2025-08-01 14:24:26 +00:00
Simon Dmsn	918b0071b1	Update file Commodity Exchange Codes.xls	2025-08-01 14:22:01 +00:00
Deeman	91f8968990	remove comment	2025-07-31 19:48:18 +02:00
Deeman	641f794d61	fix seeds; update models	2025-07-27 22:49:37 +02:00
Deeman	c0d8f60d1c	add reference data	2025-07-27 18:28:30 +02:00
Deeman	8b5d05b3c2	raw ingest model	2025-07-27 15:40:41 +02:00
Deeman	f5c73e32c5	testing sqlmesh	2025-07-27 00:18:14 +02:00
Deeman	9baa0d185c	testing sqlmesh	2025-07-27 00:18:03 +02:00
Deeman	f0de8a505b	update projects to packages	2025-07-26 22:32:47 +02:00

31 Commits
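
The extraction behaviour described in 38897617e7 (check the current month, fall back up to 3 months, let ETags deduplicate) and the landing layout from c1d00dcdc4 (`{LANDING_DIR}/psd/{year}/{month}/{etag}.csv.gzip`) can be sketched roughly as below. This is not the repository's code: the function names `candidate_months`, `landing_path`, and `needs_download` are hypothetical, and the zero-padded month and the stripping of ETag quotes are assumptions made for illustration.

```python
from datetime import date
from pathlib import Path


def candidate_months(today: date, max_fallback: int = 3) -> list[tuple[int, int]]:
    """Return (year, month) pairs to try: the current month first,
    then up to `max_fallback` earlier months (to cover publication lag)."""
    year, month = today.year, today.month
    months = []
    for _ in range(max_fallback + 1):
        months.append((year, month))
        month -= 1
        if month == 0:  # roll back across a year boundary
            year, month = year - 1, 12
    return months


def landing_path(landing_dir: Path, year: int, month: int, etag: str) -> Path:
    """Build the landing file path for one downloaded archive.

    Assumption: the ETag header value is cleaned by dropping a weak
    validator prefix (W/) and the surrounding double quotes, and the
    month directory is zero-padded.
    """
    clean_etag = etag.removeprefix("W/").strip('"')
    return landing_dir / "psd" / str(year) / f"{month:02d}" / f"{clean_etag}.csv.gzip"


def needs_download(landing_dir: Path, year: int, month: int, etag: str) -> bool:
    """ETag-based dedup: skip the download if a file for this ETag already exists."""
    return not landing_path(landing_dir, year, month, etag).exists()
```

A caller would loop over `candidate_months(date.today())`, issue a HEAD request for each month's archive, and download only when `needs_download(...)` is true for the returned ETag; the HTTP part is left out here because the source URL pattern is not given in the log.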