- New extraction package (cftc_cot): downloads yearly Disaggregated Futures ZIPs from CFTC, with ETag-based dedup, dynamic inner filename discovery, and gzip normalization
- SQLMesh 3-layer architecture: raw (technical) → foundation (business model) → serving (mart)
- dim_commodity seed: conformed dimension mapping USDA ↔ CFTC codes (the commodity ontology)
- fct_cot_positioning: typed, deduplicated weekly positioning facts for all commodities
- obt_cot_positioning: Coffee C mart with COT Index (26w/52w), week-over-week (WoW) delta, and OI ratios
- Analytics functions + REST API endpoints: /commodities/<code>/positioning[/latest]
- Dashboard widget: Managed Money net, COT Index card, dual-axis Chart.js chart
- 23 passing tests (10 unit + 2 SQLMesh model + existing regression suite)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
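The COT Index and WoW delta in obt_cot_positioning can be sketched in Python. This is a minimal sketch that assumes the commonly used COT Index definition: where the current Managed Money net position sits within its trailing-window min/max range, scaled to 0 to 100. The function names and the 50.0 fallback for a flat window are hypothetical; the actual SQL in the mart may differ.

```python
from typing import Optional, Sequence


def cot_index(net_positions: Sequence[float], window: int) -> list[Optional[float]]:
    """COT Index per week: 100 * (net - min) / (max - min) over the trailing `window` weeks.

    Returns None until a full window of history is available.
    Assumption: a flat window (max == min) yields 50.0 to avoid division by zero.
    """
    out: list[Optional[float]] = []
    for i, net in enumerate(net_positions):
        if i + 1 < window:
            out.append(None)  # not enough history yet
            continue
        trailing = net_positions[i + 1 - window : i + 1]  # includes the current week
        lo, hi = min(trailing), max(trailing)
        out.append(50.0 if hi == lo else 100.0 * (net - lo) / (hi - lo))
    return out


def wow_delta(net_positions: Sequence[float]) -> list[Optional[float]]:
    """Week-over-week change in net position; None for the first week."""
    return [None] + [b - a for a, b in zip(net_positions, net_positions[1:])]
```

Calling `cot_index(weekly_net, 26)` and `cot_index(weekly_net, 52)` would give the two index variants; a value near 100 means positioning is at the top of its recent range.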
Materia SQLMesh Transform Layer
Data transformation pipeline using SQLMesh and DuckDB, implementing a 4-layer architecture.
Quick Start
cd transform/sqlmesh_materia
# Local development (isolated SQLMesh virtual environment)
sqlmesh plan dev_<username>
# Production
sqlmesh plan prod
# Run tests
sqlmesh test
# Format SQL
sqlmesh format
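The `dev_<username>` convention can be filled in from the OS login. The helper below is hypothetical (not part of this project); restricting the name to lowercase letters, digits, and underscores is an assumption made so the result is safe as an identifier suffix, not a documented SQLMesh rule.

```python
import getpass
import re
from typing import Optional


def dev_env_name(username: Optional[str] = None) -> str:
    """Build a `dev_<username>` environment name from the given or current OS user."""
    user = username if username is not None else getpass.getuser()
    # Lowercase and collapse anything outside [a-z0-9_] into a single underscore.
    slug = re.sub(r"[^a-z0-9_]+", "_", user.lower()).strip("_")
    if not slug:
        raise ValueError(f"cannot derive environment name from {user!r}")
    return f"dev_{slug}"
```

The result is what you would pass to `sqlmesh plan`, e.g. `dev_jane_doe` for a login of `Jane.Doe`.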
Architecture
Gateway Configuration
Single Gateway: All environments connect to Cloudflare R2 Data Catalog (Apache Iceberg)
- Production: sqlmesh plan prod
- Development: sqlmesh plan dev_<username> (isolated virtual environment)
SQLMesh manages environment isolation automatically, so separate local databases are not needed.
4-Layer Data Model
See models/README.md for detailed architecture documentation:
- Raw - Immutable source data
- Staging - Schema, types, basic cleansing
- Cleaned - Business logic, integration
- Serving - Analytics-ready (facts, dimensions, aggregates)
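One way to keep the layers honest is to check that every model reads only from its own layer or an earlier one. The sketch below is a hypothetical lint, not something this project ships; the model names and the explicit model-to-layer mapping are illustrative assumptions.

```python
# Layers in dependency order: a model may read from the same or an earlier layer.
LAYER_ORDER = {"raw": 0, "staging": 1, "cleaned": 2, "serving": 3}


def layering_violations(
    model_layers: dict[str, str], deps: dict[str, list[str]]
) -> list[tuple[str, str]]:
    """Return (model, upstream) pairs where a model reads from a later layer."""
    violations = []
    for model, upstreams in deps.items():
        rank = LAYER_ORDER[model_layers[model]]
        for up in upstreams:
            if LAYER_ORDER[model_layers[up]] > rank:
                violations.append((model, up))
    return violations
```

For example, a staging model selecting from a serving table would be reported, while serving → cleaned → staging → raw chains pass.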
Configuration
Config: config.yaml
- DuckDB in-memory with R2 Iceberg catalog
- Extensions: httpfs, iceberg
- Auto-apply enabled (no prompts)
- Initialization hooks for R2 secret/catalog attachment
Commands
# Plan changes for dev environment
sqlmesh plan dev_yourname
# Plan changes for prod
sqlmesh plan prod
# Run tests
sqlmesh test
# Validate models
sqlmesh validate
# Run audits
sqlmesh audit
# Format SQL files
sqlmesh format
# Start web UI
sqlmesh ui
Environment Variables (Prod)
Required for production R2 Iceberg catalog:
- CLOUDFLARE_API_TOKEN - R2 API token
- ICEBERG_REST_URI - R2 catalog REST endpoint
- R2_WAREHOUSE_NAME - Warehouse name (default: "materia")
These are injected via Pulumi ESC (beanflows/prod) on the supervisor instance.
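A fail-fast check for these variables (for instance in a pre-flight script on the supervisor) could look like the sketch below. The dataclass and function are hypothetical; only the variable names and the "materia" default come from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class R2CatalogSettings:
    api_token: str
    rest_uri: str
    warehouse: str


def load_r2_settings(env: Mapping[str, str] = os.environ) -> R2CatalogSettings:
    """Read R2 Iceberg catalog settings, raising if a required variable is missing or empty."""
    missing = [name for name in ("CLOUDFLARE_API_TOKEN", "ICEBERG_REST_URI") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return R2CatalogSettings(
        api_token=env["CLOUDFLARE_API_TOKEN"],
        rest_uri=env["ICEBERG_REST_URI"],
        warehouse=env.get("R2_WAREHOUSE_NAME") or "materia",  # documented default
    )
```

Failing before `sqlmesh plan prod` runs gives a clearer error than a failed catalog attachment in the initialization hooks.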
Development Workflow
- Make changes to models in models/
- Test locally: sqlmesh test
- Plan changes: sqlmesh plan dev_yourname
- Review and apply changes
- Commit and push to trigger CI/CD
SQLMesh will handle environment isolation, table versioning, and incremental updates automatically.
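The local steps of this workflow can be chained in a small pre-push helper that stops at the first failure. This is a hypothetical sketch: it assumes it runs from the repository root and relies on auto-apply being enabled in config.yaml, so `sqlmesh plan` should not wait for a prompt. The `runner` parameter exists only so the steps can be exercised without SQLMesh installed.

```python
import subprocess
from typing import Callable, Optional, Sequence


def dev_workflow(env_name: str) -> list[list[str]]:
    """Local steps from the workflow above, in order: test, then plan into the dev environment."""
    return [
        ["sqlmesh", "test"],
        ["sqlmesh", "plan", env_name],
    ]


def run_steps(
    steps: Sequence[list[str]],
    cwd: str = "transform/sqlmesh_materia",
    runner: Optional[Callable[[list[str]], int]] = None,
) -> int:
    """Run each command in order; return the first non-zero exit code, or 0 if all succeed."""
    run = runner or (lambda cmd: subprocess.run(cmd, cwd=cwd).returncode)
    for cmd in steps:
        code = run(cmd)
        if code != 0:
            print(f"step failed (exit {code}): {' '.join(cmd)}")
            return code
    return 0
```

Running tests before planning means a broken model never reaches the dev environment, which keeps the review step focused on intended changes.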