# Materia SQLMesh Transform Layer

Data transformation pipeline using SQLMesh and DuckDB, implementing a 4-layer architecture.

## Quick Start

```bash
cd transform/sqlmesh_materia

# Development: isolated SQLMesh virtual environment (replace yourname with your username)
sqlmesh plan dev_yourname

# Production
sqlmesh plan prod

# Run tests
sqlmesh test

# Format SQL
sqlmesh format
```

## Architecture

### Gateway Configuration

**Single gateway:** all environments connect to the Cloudflare R2 Data Catalog (Apache Iceberg).

- **Production:** `sqlmesh plan prod`
- **Development:** `sqlmesh plan dev_<username>` (isolated virtual environment)

SQLMesh manages environment isolation automatically, so there is no need for separate local databases.
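The `dev_<username>` convention can be derived from the shell instead of typed by hand. The helper below is a minimal sketch under stated assumptions: `dev_env_name` is a hypothetical function, the username comes from `$USER` unless passed explicitly, and reducing it to lowercase letters, digits, and underscores is a defensive choice (SQLMesh uses the environment name in the names of the schemas it creates for that environment), not a documented SQLMesh requirement.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive a per-developer SQLMesh environment name.
# Assumption: the name is "dev_" plus $USER (or an explicit argument),
# reduced to lowercase letters, digits, and underscores.
dev_env_name() {
  local user="${1:-${USER:-}}"
  if [[ -z "$user" ]]; then
    echo "dev_env_name: no username given and \$USER is unset" >&2
    return 1
  fi
  local clean
  # Lowercase, then replace every character outside [a-z0-9_] with "_".
  clean="$(printf '%s' "$user" | tr '[:upper:]' '[:lower:]' | tr -c 'a-z0-9_' '_')"
  printf 'dev_%s\n' "$clean"
}

# Usage (from transform/sqlmesh_materia):
#   sqlmesh plan "$(dev_env_name)"
```

For example, a username such as `Jane.Doe` would map to `dev_jane_doe`.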

### 4-Layer Data Model

See `models/README.md` for detailed architecture documentation:

1. **Raw** - Immutable source data
2. **Staging** - Schema, types, basic cleansing
3. **Cleaned** - Business logic, integration
4. **Serving** - Analytics-ready (facts, dimensions, aggregates)

## Configuration

**Config:** `config.yaml`

- DuckDB in-memory with R2 Iceberg catalog
- Extensions: httpfs, iceberg
- Auto-apply enabled (no prompts)
- Initialization hooks for R2 secret/catalog attachment
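`config.yaml` itself is not reproduced here. As a hedged sketch of what the R2 initialization hooks listed above are assumed to do, the hypothetical helper `render_r2_init_sql` prints DuckDB statements that load both extensions, create an Iceberg secret from the R2 token, and attach the R2 catalog using DuckDB's Iceberg REST catalog syntax (`CREATE SECRET ... (TYPE ICEBERG, TOKEN ...)` and `ATTACH ... (TYPE ICEBERG, ENDPOINT ...)`). The aliases `r2_secret` and `r2_catalog` are illustrative, and the real hooks may differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the DuckDB statements the init hooks are assumed
# to run. Assumes the token and URI contain no single quotes.
render_r2_init_sql() {
  # Abort (in a non-interactive shell) if a required variable is missing.
  : "${CLOUDFLARE_API_TOKEN:?CLOUDFLARE_API_TOKEN is not set}"
  : "${ICEBERG_REST_URI:?ICEBERG_REST_URI is not set}"
  # Fall back to the documented default warehouse name.
  local warehouse="${R2_WAREHOUSE_NAME:-materia}"
  cat <<SQL
INSTALL httpfs;
LOAD httpfs;
INSTALL iceberg;
LOAD iceberg;
CREATE SECRET r2_secret (TYPE ICEBERG, TOKEN '${CLOUDFLARE_API_TOKEN}');
ATTACH '${warehouse}' AS r2_catalog (TYPE ICEBERG, ENDPOINT '${ICEBERG_REST_URI}');
SQL
}
```

For a manual connectivity check, the output could be piped into the DuckDB CLI (`render_r2_init_sql | duckdb`), optionally followed by a query against `r2_catalog`.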

## Commands

```bash
# Plan changes for dev environment
sqlmesh plan dev_yourname

# Plan changes for prod
sqlmesh plan prod

# Run tests
sqlmesh test

# Validate models
sqlmesh validate

# Run audits
sqlmesh audit

# Format SQL files
sqlmesh format

# Start web UI
sqlmesh ui
```

## Environment Variables (Prod)

Required for the production R2 Iceberg catalog:

- `CLOUDFLARE_API_TOKEN` - R2 API token
- `ICEBERG_REST_URI` - R2 catalog REST endpoint
- `R2_WAREHOUSE_NAME` - Warehouse name (optional, default: "materia")

These are injected via Pulumi ESC (`beanflows/prod`) on the supervisor instance.
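When running a production plan outside the supervisor instance, a preflight check can report every missing variable at once instead of failing on the first one. This is a minimal sketch; `check_r2_env` is a hypothetical helper, and it treats only the token and REST URI as required, exporting the documented default for `R2_WAREHOUSE_NAME` when that variable is unset.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check before `sqlmesh plan prod`.
check_r2_env() {
  local missing=()
  local var
  for var in CLOUDFLARE_API_TOKEN ICEBERG_REST_URI; do
    # ${!var} is bash indirect expansion: the value of the variable named by $var.
    if [[ -z "${!var:-}" ]]; then
      missing+=("$var")
    fi
  done
  if (( ${#missing[@]} > 0 )); then
    echo "missing required variables: ${missing[*]}" >&2
    return 1
  fi
  # R2_WAREHOUSE_NAME is optional; fall back to the documented default.
  export R2_WAREHOUSE_NAME="${R2_WAREHOUSE_NAME:-materia}"
}

# Usage:
#   check_r2_env && sqlmesh plan prod
```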

## Development Workflow

1. Make changes to models in `models/`
2. Test locally: `sqlmesh test`
3. Plan changes: `sqlmesh plan dev_yourname`
4. Review the plan output (auto-apply is enabled in `config.yaml`, so the plan is applied without a confirmation prompt)
5. Commit and push to trigger CI/CD

SQLMesh will handle environment isolation, table versioning, and incremental updates automatically.
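The local part of this loop (steps 2 and 3, preceded by `sqlmesh format`) can be wrapped in one function that stops at the first failure. This is a sketch, not part of the project: `dev_cycle` is hypothetical, it assumes it is called from `transform/sqlmesh_materia`, and the `SQLMESH` variable exists only so the sequence can be dry-run against a stub such as `echo`.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: format, test, then plan into a dev environment.
# SQLMESH defaults to the real CLI; point it at a stub for dry runs.
dev_cycle() {
  local env_name="$1"
  local sqlmesh="${SQLMESH:-sqlmesh}"
  "$sqlmesh" format || return   # return the failing command's exit status
  "$sqlmesh" test || return
  # With auto-apply enabled in config.yaml, this plan is applied without a prompt.
  "$sqlmesh" plan "$env_name"
}

# Usage:
#   cd transform/sqlmesh_materia && dev_cycle dev_yourname
#   SQLMESH=echo dev_cycle dev_yourname   # dry run: prints the subcommands
```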