beanflows

Author	SHA1	Message	Date
Deeman	2962bf5e3b	Fix COT pipeline: TRY_CAST nulls, dim_commodity leading zeros, correct CFTC codes - config.yaml: remove ambiguousorinvalidcolumn linter rule (false positives on read_csv TVFs) - fct_cot_positioning: use TRY_CAST throughout — CFTC uses '.' as null in many columns - raw/cot_disaggregated: add columns() declaration for 33 varchar cols - dim_commodity: switch from SEED to FULL model with SQL VALUES to preserve leading zeros (Pandas auto-converts '083' → 83 even with varchar column declarations in SEED models) - seeds/dim_commodity.csv: correct cftc_commodity_code from '083731' (contract market code) to '083' (3-digit CFTC commodity code); add CSV quoting - test_cot_foundation.yaml: fix output key name, vars for time range, partial: true, and correct cftc_commodity_code to '083' - analytics.py: COFFEE_CFTC_CODE '083731' → '083' to match actual data. Result: serving.cot_positioning has 685 rows (2013-01-08 to 2026-02-17), 23/23 tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-20 23:28:10 +01:00
Deeman	c1d00dcdc4	Refactor to local-first architecture on Hetzner NVMe Remove distributed R2/Iceberg/SSH pipeline architecture in favor of local subprocess execution with NVMe storage. Landing data backed up to R2 via rclone timer. - Strip Iceberg catalog, httpfs, boto3, paramiko, prefect, pyarrow - Pipelines run via subprocess.run() with bounded timeouts - Extract writes to {LANDING_DIR}/psd/{year}/{month}/{etag}.csv.gzip - SQLMesh reads LANDING_DIR variable, writes to DUCKDB_PATH - Delete unused provider stubs (ovh, scaleway, oracle) - Add rclone systemd timer for R2 backup every 6h - Update supervisor to run pipelines with env vars Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-18 19:50:19 +01:00
Deeman	6d4377ccf9	cleanup and prefect service setup	2026-02-04 22:24:55 +01:00
Deeman	2d248a2eef	Fix SQLMesh config to use correct Pulumi ESC env var names - Update secret token: CLOUDFLARE_API_TOKEN → R2_ADMIN_API_TOKEN - Update warehouse name: R2_WAREHOUSE_NAME → ICEBERG_WAREHOUSE_NAME - Update endpoint: ICEBERG_REST_URI → ICEBERG_CATALOG_URI - Remove CREATE SCHEMA and USE statements - DuckDB has a bug with Iceberg REST: missing Content-Type header - Schema creation via SQL is currently not supported - Models will use fully-qualified table names instead Successfully tested with real R2 credentials: - Iceberg catalog attachment works ✓ - Plan dry-run executes ✓ - Only fails on missing source data (expected) ✓ 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-13 22:21:27 +02:00
Deeman	05ef15bfdf	Configure Iceberg catalog with proper secret reference - Add catalog ATTACH statement in before_all with SECRET parameter - References r2_secret created by connection configuration - Uses proper DuckDB ATTACH syntax per Cloudflare docs - Single-line format to avoid Jinja parsing issues - Remove manual CREATE SECRET from before_all hooks - Secret automatically created by SQLMesh from connection config - Cleaner separation: connection defines credentials, hooks use them Successfully tested - config validates without warnings. Only fails on missing env vars (expected locally). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-13 22:10:51 +02:00
Deeman	2ad344abf4	Refactor SQLMesh config to use connection-level secrets - Move Iceberg secret from before_all hook to connection.secrets - Fixes SQLMesh warning about unsupported @env_var syntax - Uses Jinja templating {{ env_var() }} instead of @env_var() - Remove database: ':memory:' (incompatible with catalogs) - DuckDB doesn't allow both database and catalogs config - Connection defaults to in-memory when no database specified - Simplify before_all hooks to only handle ATTACH and schema setup - Secret is now created automatically by SQLMesh - Cleaner separation: connection config vs runtime setup Based on: - https://developers.cloudflare.com/r2/data-catalog/config-examples/duckdb/ - https://sqlmesh.readthedocs.io/en/latest/integrations/engines/duckdb/ 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-13 22:04:25 +02:00
Deeman	120fef369a	Fix SQLMesh config and CI/CD deployment issues - Fix SQLMesh config: Add semicolons to SQL statements in before_all hooks - Resolves "unsupported syntax" warning for CREATE SECRET and ATTACH - DuckDB requires semicolons to terminate statements properly - Fix deploy:infra job: Update Pulumi authentication - Remove `pulumi login --token` (not supported in Docker image) - Use PULUMI_ACCESS_TOKEN environment variable directly - Chain commands with && to avoid "unknown command 'sh'" error - Fix deploy:supervisor job: Update esc login syntax - Change `esc login --token` to `esc login` (--token flag doesn't exist) - esc CLI reads token from PULUMI_ACCESS_TOKEN env var - Simplify Pulumi CLI installation (remove apk fallback logic) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-13 21:58:43 +02:00
Deeman	d2352c1876	Simplify SQLMesh to use single prod gateway with virtual environments - Remove dev gateway (local DuckDB file no longer needed) - Single prod gateway connects to R2 Iceberg catalog - Use virtual environments for dev isolation (e.g., dev_<username>) - Update CLAUDE.md with new workflow and environment strategy - Create comprehensive transform/sqlmesh_materia/README.md Benefits: - Simpler configuration (one gateway instead of two) - All environments use same R2 Iceberg catalog - SQLMesh handles environment isolation automatically - No need to maintain local 13GB materia_dev.db file - before_all hooks only run for prod gateway (no conditional logic needed) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-13 21:47:04 +02:00
Deeman	6536724e00	Fix SQLMesh config: remove invalid init_script parameter - Remove init_script from DuckDB connection config (not a valid parameter) - Move R2 Iceberg catalog initialization to before_all hooks - Hooks run before sqlmesh plan/run commands - Uses SQLMesh @env_var() macro syntax for environment variables Fixes CI/CD error: 'invalid duckdb connection config: invalid field init_script' 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-13 21:31:56 +02:00
Deeman	55bb84f0fa	implement cli/infra update cicd	2025-10-12 21:00:41 +02:00
Deeman	da89c2bf6e	update staging pipeline	2025-10-07 22:20:48 +02:00
Deeman	8b5d05b3c2	raw ingest model	2025-07-27 15:40:41 +02:00
Deeman	9baa0d185c	testing sqlmesh	2025-07-27 00:18:03 +02:00
Deeman	f0de8a505b	update projects to packages	2025-07-26 22:32:47 +02:00
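
Commit 2962bf5e3b depends on two behaviours: a cast that yields NULL instead of failing on the '.' placeholder CFTC uses for missing values, and commodity codes kept as text so that '083' does not turn into 83. The sketch below is a rough Python analogue of both, not the project's SQL. `try_cast_int` is a hypothetical helper that only mimics what TRY_CAST does in this case, and the sample values are illustrative.

```python
from typing import Optional


def try_cast_int(value: str) -> Optional[int]:
    """Return int(value), or None when the text is not an integer (TRY_CAST-like)."""
    try:
        return int(value.strip())
    except ValueError:
        # A plain cast would fail here; a TRY_CAST-style cast yields a null instead.
        return None


# Illustrative values only, not real CFTC data.
parsed_ok = try_cast_int("1250")      # a normal numeric field
parsed_null = try_cast_int(".")       # '.' stands in for a missing value

# Leading zeros survive only while the code stays a string.
code_as_text = "083"
code_as_number = int(code_as_text)    # numeric inference drops the zeros

print(parsed_ok, parsed_null, code_as_text, code_as_number)
```

Round-tripping the number back to text gives '83', which no longer matches the 3-digit code. That is consistent with the commit's choice to define the dimension with SQL VALUES rather than a type-inferred CSV seed.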

14 Commits
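
Commit c1d00dcdc4 says pipelines run via subprocess.run() with bounded timeouts. A minimal sketch of that pattern follows, assuming a hypothetical `run_pipeline` helper. The commands and timeout values are placeholders, not the ones the supervisor actually uses.

```python
import subprocess
import sys
from typing import Sequence


def run_pipeline(cmd: Sequence[str], timeout_seconds: float) -> bool:
    """Run one pipeline command; return True only on a clean exit within the limit."""
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout_seconds,  # the child is killed if it runs longer than this
            capture_output=True,
            text=True,
        )
    except subprocess.TimeoutExpired:
        # The bound was hit; report failure instead of hanging the supervisor.
        return False
    return result.returncode == 0


# Placeholder commands: a quick job that finishes, and a slow one that hits the bound.
ok = run_pipeline([sys.executable, "-c", "print('extract done')"], timeout_seconds=30)
timed_out = run_pipeline(
    [sys.executable, "-c", "import time; time.sleep(5)"], timeout_seconds=0.5
)
print(ok, timed_out)
```

Returning a boolean keeps the caller simple. A real supervisor would probably also log the captured stderr before deciding whether to retry.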