Simplify SQLMesh to use single prod gateway with virtual environments
- Remove dev gateway (local DuckDB file no longer needed)
- Single prod gateway connects to R2 Iceberg catalog
- Use virtual environments for dev isolation (e.g., dev_<username>)
- Update CLAUDE.md with new workflow and environment strategy
- Create comprehensive transform/sqlmesh_materia/README.md

Benefits:
- Simpler configuration (one gateway instead of two)
- All environments use same R2 Iceberg catalog
- SQLMesh handles environment isolation automatically
- No need to maintain local 13GB materia_dev.db file
- before_all hooks only run for prod gateway (no conditional logic needed)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@@ -0,0 +1,92 @@
# Materia SQLMesh Transform Layer

Data transformation pipeline using SQLMesh and DuckDB, implementing a 4-layer architecture.

## Quick Start

```bash
cd transform/sqlmesh_materia

# Local development (virtual environment; replace <username> with your own name)
sqlmesh plan dev_<username>

# Production
sqlmesh plan prod

# Run tests
sqlmesh test

# Format SQL
sqlmesh format
```

## Architecture

### Gateway Configuration

**Single Gateway:** All environments connect to Cloudflare R2 Data Catalog (Apache Iceberg)
- **Production:** `sqlmesh plan prod`
- **Development:** `sqlmesh plan dev_<username>` (isolated virtual environment)

SQLMesh manages environment isolation automatically - no need for separate local databases.
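
The `dev_<username>` convention can be scripted so each developer gets a consistent environment name. The sketch below is illustrative and not part of the repository: the helper name `dev_env_name` and its sanitizing rule (lowercase, anything outside `a-z0-9_` replaced by an underscore) are assumptions; only the `dev_` prefix and the `sqlmesh plan` command come from this README.

```bash
# Hypothetical helper (not in the repository): build a name such as dev_alice.
dev_env_name() {
  # Use the first argument if given, otherwise fall back to the current user.
  local user="${1:-${USER:-$(id -un)}}"
  # Lowercase, then replace every character outside [a-z0-9_] with "_".
  user="$(printf '%s' "$user" | tr '[:upper:]' '[:lower:]' | tr -c 'a-z0-9_' '_')"
  printf 'dev_%s\n' "$user"
}
```

With the function defined, `sqlmesh plan "$(dev_env_name)"` would plan against your own virtual environment.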

### 4-Layer Data Model

See `models/README.md` for detailed architecture documentation:

1. **Raw** - Immutable source data
2. **Staging** - Schema, types, basic cleansing
3. **Cleaned** - Business logic, integration
4. **Serving** - Analytics-ready (facts, dimensions, aggregates)

## Configuration

**Config:** `config.yaml`
- DuckDB in-memory with R2 Iceberg catalog
- Extensions: httpfs, iceberg
- Auto-apply enabled (no prompts)
- Initialization hooks for R2 secret/catalog attachment

## Commands

```bash
# Plan changes for dev environment
sqlmesh plan dev_yourname

# Plan changes for prod
sqlmesh plan prod

# Run tests
sqlmesh test

# Validate models
sqlmesh validate

# Run audits
sqlmesh audit

# Format SQL files
sqlmesh format

# Start web UI
sqlmesh ui
```

## Environment Variables (Prod)

Required for production R2 Iceberg catalog:
- `CLOUDFLARE_API_TOKEN` - R2 API token
- `ICEBERG_REST_URI` - R2 catalog REST endpoint
- `R2_WAREHOUSE_NAME` - Warehouse name (default: "materia")

These are injected via Pulumi ESC (`beanflows/prod`) on the supervisor instance.
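
When running outside the supervisor instance, a quick pre-flight check can catch a missing variable before SQLMesh tries to attach the catalog. The following is a minimal sketch, not part of the repository: the function name `check_r2_env` is hypothetical, and treating `R2_WAREHOUSE_NAME` as optional is an assumption based on the documented default of "materia".

```bash
# Hypothetical pre-flight check (not in the repository).
check_r2_env() {
  local missing=0
  local name value
  for name in CLOUDFLARE_API_TOKEN ICEBERG_REST_URI; do
    # Look up the variable whose name is stored in $name (names are fixed literals).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  # Assumption: the warehouse name falls back to the documented default.
  echo "warehouse: ${R2_WAREHOUSE_NAME:-materia}"
  return "$missing"
}
```

Calling `check_r2_env && sqlmesh plan dev_yourname` would then stop early with a readable message instead of failing inside the initialization hooks.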

## Development Workflow

1. Make changes to models in `models/`
2. Test locally: `sqlmesh test`
3. Plan changes: `sqlmesh plan dev_yourname`
4. Review and apply changes
5. Commit and push to trigger CI/CD

SQLMesh will handle environment isolation, table versioning, and incremental updates automatically.
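
Steps 2 and 3 of the workflow above can be chained so that a plan only runs when the tests pass. This is a sketch under stated assumptions rather than a script shipped with the project: the wrapper name `plan_dev` and its refusal to target `prod` are hypothetical additions, motivated by auto-apply being enabled in `config.yaml`.

```bash
# Hypothetical wrapper around workflow steps 2 and 3 (not in the repository).
plan_dev() {
  local env_name="${1:-}"
  if [ -z "$env_name" ]; then
    echo "usage: plan_dev <environment>" >&2
    return 2
  fi
  # Assumption: production plans should be run deliberately, not via this helper.
  if [ "$env_name" = "prod" ]; then
    echo "refusing to plan prod from the dev helper" >&2
    return 1
  fi
  # Run the unit tests first; plan only if they succeed.
  sqlmesh test && sqlmesh plan "$env_name"
}
```

For example, `plan_dev dev_yourname` runs `sqlmesh test` and then `sqlmesh plan dev_yourname`, stopping at the first failure.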
@@ -1,18 +1,8 @@
 # --- Gateway Connection ---
+# Single gateway connecting to R2 Iceberg catalog
+# Local dev uses virtual environments (e.g., dev_<username>)
+# Production uses the 'prod' environment
 gateways:
-
-  dev:
-    connection:
-      # For more information on configuring the connection to your execution engine, visit:
-      # https://sqlmesh.readthedocs.io/en/stable/reference/configuration/#connection
-      # https://sqlmesh.readthedocs.io/en/stable/integrations/engines/duckdb/#connection-options
-      type: duckdb
-      database: materia_dev.db
-      extensions:
-        - name: zipfs
-        - name: httpfs
-        - name: iceberg
-
   prod:
     connection:
       type: duckdb
@@ -21,8 +11,7 @@ gateways:
         - name: httpfs
         - name: iceberg
 
-
-default_gateway: dev
+default_gateway: prod
 
 # --- Hooks ---
 # Run initialization SQL before all plans/runs