# Materia SQLMesh Transform Layer

Data transformation pipeline using SQLMesh and DuckDB, implementing a 4-layer architecture.

## Quick Start

```bash
cd transform/sqlmesh_materia

# Development (isolated SQLMesh virtual environment)
sqlmesh plan dev_yourname

# Production
sqlmesh plan prod

# Run tests
sqlmesh test

# Format SQL
sqlmesh format
```

## Architecture

### Gateway Configuration

**Single gateway:** all environments connect to the Cloudflare R2 Data Catalog (Apache Iceberg).

- **Production:** `sqlmesh plan prod`
- **Development:** `sqlmesh plan dev_yourname` (isolated virtual environment)

SQLMesh manages environment isolation automatically, so separate local databases are not needed.

### 4-Layer Data Model

See `models/README.md` for detailed architecture documentation:

1. **Raw** - Immutable source data
2. **Staging** - Schema, types, basic cleansing
3. **Cleaned** - Business logic, integration
4. **Serving** - Analytics-ready (facts, dimensions, aggregates)

## Configuration

**Config:** `config.yaml`

- In-memory DuckDB connection with the R2 Iceberg catalog attached
- Extensions: `httpfs`, `iceberg`
- Auto-apply enabled (no prompts)
- Initialization hooks that create the R2 secret and attach the catalog

## Commands

```bash
# Plan changes for your dev environment
sqlmesh plan dev_yourname

# Plan changes for prod
sqlmesh plan prod

# Run tests
sqlmesh test

# Validate models
sqlmesh validate

# Run audits
sqlmesh audit

# Format SQL files
sqlmesh format

# Start the web UI
sqlmesh ui
```

## Environment Variables (Prod)

Required to connect to the production R2 Iceberg catalog:

- `CLOUDFLARE_API_TOKEN` - R2 API token
- `ICEBERG_REST_URI` - R2 catalog REST endpoint
- `R2_WAREHOUSE_NAME` - Warehouse name (default: "materia")

These are injected via Pulumi ESC (`beanflows/prod`) on the supervisor instance.

## Development Workflow

1. Make changes to models in `models/`.
2. Run the tests locally: `sqlmesh test`
3. Plan the changes against your dev environment: `sqlmesh plan dev_yourname`
4. Review the changes (auto-apply is enabled, so the plan is applied without a prompt).
5. Commit and push to trigger CI/CD.

SQLMesh handles environment isolation, table versioning, and incremental updates automatically.
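The production variables listed under Environment Variables are normally injected by Pulumi ESC on the supervisor instance. When running `sqlmesh plan prod` from anywhere else, a small pre-flight check can catch a missing variable before SQLMesh tries to attach the catalog. The sketch below is hypothetical (the `check_prod_env` function is not part of this repo) and assumes bash; it treats `CLOUDFLARE_API_TOKEN` and `ICEBERG_REST_URI` as mandatory and falls back to the documented default `materia` for `R2_WAREHOUSE_NAME`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of this repo): verifies that the
# production R2 Iceberg variables are exported before running `sqlmesh plan prod`.
check_prod_env() {
  # Fall back to the documented default warehouse name if unset or empty.
  : "${R2_WAREHOUSE_NAME:=materia}"
  export R2_WAREHOUSE_NAME

  local missing=0
  local var
  for var in CLOUDFLARE_API_TOKEN ICEBERG_REST_URI; do
    # printenv only sees exported variables, i.e. what sqlmesh will inherit.
    if [ -z "$(printenv "$var")" ]; then
      echo "missing required variable: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: only plan against prod when the check passes.
# check_prod_env && sqlmesh plan prod
```

Checking with `printenv` rather than plain shell expansion means a variable that is set but not exported is reported as missing, which matches what the `sqlmesh` process would actually see.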