beanflows/transform/sqlmesh_materia/readme.md
2026-02-05 20:01:50 +01:00

# Materia SQLMesh Transform Layer
Data transformation pipeline using SQLMesh and DuckDB, implementing a 4-layer architecture.
## Quick Start
```bash
cd transform/sqlmesh_materia
# Local development (isolated virtual environment; replace "yourname" with your username)
sqlmesh plan dev_yourname
# Production
sqlmesh plan prod
# Run tests
sqlmesh test
# Format SQL
sqlmesh format
```
## Architecture
### Gateway Configuration
**Single Gateway:** All environments connect to the Cloudflare R2 Data Catalog (Apache Iceberg).
- **Production:** `sqlmesh plan prod`
- **Development:** `sqlmesh plan dev_<username>` (isolated virtual environment)
SQLMesh manages environment isolation automatically, so there is no need for separate local databases.
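
The `dev_<username>` convention above can be derived from the shell user rather than typed by hand. The following is a minimal sketch, not part of this repository: `dev_env_name` is a hypothetical helper, and restricting the name to lowercase letters, digits, and underscores is a conservative assumption rather than a documented SQLMesh requirement.

```bash
# Hypothetical helper: build a dev environment name from a username.
# Uses the first argument if given, otherwise $USER.
dev_env_name() {
  user="${1:-${USER:-unknown}}"
  # Lower-case the name, then replace every character outside
  # [a-z0-9_] with an underscore (assumed-safe character set).
  clean="$(printf '%s' "$user" | tr '[:upper:]' '[:lower:]' | tr -c 'a-z0-9_' '_')"
  printf 'dev_%s\n' "$clean"
}
```

With this in place, `sqlmesh plan "$(dev_env_name)"` would target the current user's environment.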
### 4-Layer Data Model
See `models/README.md` for detailed architecture documentation:
1. **Raw** - Immutable source data
2. **Staging** - Schema, types, basic cleansing
3. **Cleaned** - Business logic, integration
4. **Serving** - Analytics-ready (facts, dimensions, aggregates)
## Configuration
**Config:** `config.yaml`
- DuckDB in-memory with R2 Iceberg catalog
- Extensions: httpfs, iceberg
- Auto-apply enabled (no prompts)
- Initialization hooks for R2 secret/catalog attachment
## Commands
```bash
# Plan changes for dev environment
sqlmesh plan dev_yourname
# Plan changes for prod
sqlmesh plan prod
# Run tests
sqlmesh test
# Validate models
sqlmesh validate
# Run audits
sqlmesh audit
# Format SQL files
sqlmesh format
# Start web UI
sqlmesh ui
```
## Environment Variables (Prod)
Required for the production R2 Iceberg catalog:
- `CLOUDFLARE_API_TOKEN` - R2 API token
- `ICEBERG_REST_URI` - R2 catalog REST endpoint
- `R2_WAREHOUSE_NAME` - Warehouse name (default: "materia")
These are injected via Pulumi ESC (`beanflows/prod`) on the supervisor instance.
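
When running outside the supervisor instance, it can help to confirm these variables are present before planning against prod. The sketch below is illustrative only: `check_r2_env` is a hypothetical function, not something shipped in this repository, and exporting the `materia` fallback simply mirrors the default documented above.

```bash
# Hypothetical pre-flight check for the variables listed above.
# Returns 0 when both required variables are set, 1 otherwise.
check_r2_env() {
  missing=""
  [ -n "${CLOUDFLARE_API_TOKEN:-}" ] || missing="$missing CLOUDFLARE_API_TOKEN"
  [ -n "${ICEBERG_REST_URI:-}" ] || missing="$missing ICEBERG_REST_URI"

  if [ -n "$missing" ]; then
    echo "missing required variables:$missing" >&2
    return 1
  fi

  # Assumption for illustration: fall back to the documented default name.
  export R2_WAREHOUSE_NAME="${R2_WAREHOUSE_NAME:-materia}"
}
```

A typical use would be `check_r2_env && sqlmesh plan prod`, so the plan never starts with an incomplete environment.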
## Development Workflow
1. Make changes to models in `models/`
2. Test locally: `sqlmesh test`
3. Plan changes: `sqlmesh plan dev_yourname`
4. Review and apply changes
5. Commit and push to trigger CI/CD
SQLMesh handles environment isolation, table versioning, and incremental updates automatically.
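
Steps 2 and 3 of the workflow can be chained so that a plan is only attempted when the tests pass. This is a rough sketch under stated assumptions: `dev_cycle` is a hypothetical wrapper, and the guard that rejects anything not starting with `dev_` is an added safety choice, not existing project behavior. Reviewing, committing, and pushing remain manual.

```bash
# Hypothetical wrapper: run tests, then plan a dev environment.
# Usage: dev_cycle dev_yourname
dev_cycle() {
  env_name="${1:-}"

  # Refuse names without the dev_ prefix so prod is never planned by accident.
  case "$env_name" in
    dev_?*) ;;
    *)
      echo "usage: dev_cycle dev_<username>" >&2
      return 2
      ;;
  esac

  # Plan only if the test suite succeeds.
  sqlmesh test && sqlmesh plan "$env_name"
}
```

For example, `dev_cycle dev_yourname` runs `sqlmesh test` followed by `sqlmesh plan dev_yourname`, while `dev_cycle prod` exits with status 2 without calling SQLMesh.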