Simplify SQLMesh to use single prod gateway with virtual environments
- Remove dev gateway (local DuckDB file no longer needed)
- Single prod gateway connects to R2 Iceberg catalog
- Use virtual environments for dev isolation (e.g., dev_<username>)
- Update CLAUDE.md with new workflow and environment strategy
- Create comprehensive transform/sqlmesh_materia/README.md

Benefits:
- Simpler configuration (one gateway instead of two)
- All environments use same R2 Iceberg catalog
- SQLMesh handles environment isolation automatically
- No need to maintain local 13GB materia_dev.db file
- before_all hooks only run for prod gateway (no conditional logic needed)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
39
CLAUDE.md
@@ -55,8 +55,11 @@ SQLMesh project implementing a layered data architecture.
 ```bash
 cd transform/sqlmesh_materia
 
-# Plan changes (no prompts, auto-apply enabled in config)
-sqlmesh plan
+# Local development (creates virtual environment)
+sqlmesh plan dev_<username>
 
+# Production
+sqlmesh plan prod
+
 # Run tests
 sqlmesh test
@@ -76,10 +79,17 @@ sqlmesh ui
 
 **Configuration:**
 - Config: `transform/sqlmesh_materia/config.yaml`
-- Default gateway: `dev` (uses `materia_dev.db`)
-- Production gateway: `prod` (uses `materia_prod.db`)
+- Single gateway: `prod` (connects to R2 Iceberg catalog)
+- Uses virtual environments for dev isolation (e.g., `dev_deeman`)
+- Production uses `prod` environment
 - Auto-apply enabled, no interactive prompts
-- DuckDB extensions: zipfs, httpfs, iceberg
+- DuckDB extensions: httpfs, iceberg
 
+**Environment Strategy:**
+- All environments connect to the same R2 Iceberg catalog
+- Dev environments (e.g., `dev_deeman`) are isolated virtual environments
+- SQLMesh manages environment isolation and table versioning
+- No local DuckDB files needed
+
 ### 3. Core Package (`src/materia/`)
 Currently minimal; main logic resides in workspace packages.
@@ -254,10 +264,10 @@ Supervisor: uv run materia pipeline run <pipeline>
 ```
 
 #### 5. Data Storage
-- **Dev**: Local DuckDB file (`materia_dev.db`)
-- **Prod**: DuckDB in-memory + Cloudflare R2 Data Catalog (Iceberg REST API)
+- **All environments**: DuckDB in-memory + Cloudflare R2 Data Catalog (Iceberg REST API)
 - ACID transactions on object storage
 - No persistent database on workers
+- Virtual environments for dev isolation (e.g., `dev_deeman`)
 
 **Execution Flow:**
 1. Supervisor loop wakes up every 15 minutes
@@ -299,14 +309,15 @@ Supervisor: uv run materia pipeline run <pipeline>
 - Leverage SQLMesh's built-in time macros (`@start_ds`, `@end_ds`)
 - Keep raw layer thin, push transformations to staging+
 
-## Database Location
+## Data Storage
 
-- **Dev database:** `materia_dev.db` (13GB, in project root)
-- **Prod database:** `materia_prod.db` (not yet created)
-
-Note: The dev database is large and should not be committed to git (.gitignore already configured).
+All data is stored in Cloudflare R2 Data Catalog (Apache Iceberg) via REST API:
+- **Production environment:** `prod`
+- **Dev environments:** `dev_<username>` (virtual environments)
+- SQLMesh manages environment isolation and table versioning
+- No local database files needed
 - We use a monorepo with uv workspaces
 - The pulumi env is called beanflows/prod
 - NEVER hardcode secrets in plaintext
 - Never add ssh keys to the git repo!
 - If there is a simpler more direct solution and there is no other tradeoff, always choose the simpler solution
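The `dev_<username>` convention introduced in the CLAUDE.md changes above could be scripted so that every developer derives the environment name the same way. What follows is a minimal Python sketch and not part of this commit: `dev_environment` and `plan_command` are hypothetical helpers, and the rule that maps unusual characters to underscores is an assumption, not documented SQLMesh behavior.

```python
import getpass
import re


def dev_environment(username=None):
    """Return a per-developer environment name such as ``dev_deeman``."""
    # Fall back to the current OS user when no name is passed in.
    name = username if username is not None else getpass.getuser()
    # Assumption (not from the commit): keep only lowercase letters, digits
    # and underscores so the name stays a plain identifier.
    safe = re.sub(r"[^a-z0-9_]", "_", name.lower())
    return f"dev_{safe}"


def plan_command(environment):
    """Build the argument list for ``sqlmesh plan <environment>``."""
    return ["sqlmesh", "plan", environment]


# Show the command a developer named "deeman" would run.
print(" ".join(plan_command(dev_environment("deeman"))))
```

The sketch only prints the command, so it has no side effects. The list returned by `plan_command` could be handed to `subprocess.run` from inside `transform/sqlmesh_materia` if the plan should actually be executed.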
transform/sqlmesh_materia/README.md
@@ -0,0 +1,92 @@
+# Materia SQLMesh Transform Layer
+
+Data transformation pipeline using SQLMesh and DuckDB, implementing a 4-layer architecture.
+
+## Quick Start
+
+```bash
+cd transform/sqlmesh_materia
+
+# Local development (virtual environment)
+sqlmesh plan dev_<username>
+
+# Production
+sqlmesh plan prod
+
+# Run tests
+sqlmesh test
+
+# Format SQL
+sqlmesh format
+```
+
+## Architecture
+
+### Gateway Configuration
+
+**Single Gateway:** All environments connect to Cloudflare R2 Data Catalog (Apache Iceberg)
+- **Production:** `sqlmesh plan prod`
+- **Development:** `sqlmesh plan dev_<username>` (isolated virtual environment)
+
+SQLMesh manages environment isolation automatically - no need for separate local databases.
+
+### 4-Layer Data Model
+
+See `models/README.md` for detailed architecture documentation:
+
+1. **Raw** - Immutable source data
+2. **Staging** - Schema, types, basic cleansing
+3. **Cleaned** - Business logic, integration
+4. **Serving** - Analytics-ready (facts, dimensions, aggregates)
+
+## Configuration
+
+**Config:** `config.yaml`
+- DuckDB in-memory with R2 Iceberg catalog
+- Extensions: httpfs, iceberg
+- Auto-apply enabled (no prompts)
+- Initialization hooks for R2 secret/catalog attachment
+
+## Commands
+
+```bash
+# Plan changes for dev environment
+sqlmesh plan dev_yourname
+
+# Plan changes for prod
+sqlmesh plan prod
+
+# Run tests
+sqlmesh test
+
+# Validate models
+sqlmesh validate
+
+# Run audits
+sqlmesh audit
+
+# Format SQL files
+sqlmesh format
+
+# Start web UI
+sqlmesh ui
+```
+
+## Environment Variables (Prod)
+
+Required for production R2 Iceberg catalog:
+- `CLOUDFLARE_API_TOKEN` - R2 API token
+- `ICEBERG_REST_URI` - R2 catalog REST endpoint
+- `R2_WAREHOUSE_NAME` - Warehouse name (default: "materia")
+
+These are injected via Pulumi ESC (`beanflows/prod`) on the supervisor instance.
+
+## Development Workflow
+
+1. Make changes to models in `models/`
+2. Test locally: `sqlmesh test`
+3. Plan changes: `sqlmesh plan dev_yourname`
+4. Review and apply changes
+5. Commit and push to trigger CI/CD
+
+SQLMesh will handle environment isolation, table versioning, and incremental updates automatically.
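Because every environment now talks to the R2 Iceberg catalog, a missing variable from the "Environment Variables (Prod)" list above would only surface once SQLMesh tries to connect. A small pre-flight check can fail earlier with a clearer message. The Python sketch below is an illustration under assumptions, not code from this commit: `r2_settings` is a hypothetical helper, and it treats `R2_WAREHOUSE_NAME` as optional because the README lists a default of "materia".

```python
import os

# Variable names are taken from the README's "Environment Variables (Prod)" list.
REQUIRED = ("CLOUDFLARE_API_TOKEN", "ICEBERG_REST_URI")
DEFAULT_WAREHOUSE = "materia"


def r2_settings(environ=None):
    """Collect the R2 catalog settings, raising if a required one is unset."""
    env = os.environ if environ is None else environ
    # Treat empty strings the same as unset variables.
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "token": env["CLOUDFLARE_API_TOKEN"],
        "uri": env["ICEBERG_REST_URI"],
        # Fall back to the default warehouse name documented in the README.
        "warehouse": env.get("R2_WAREHOUSE_NAME") or DEFAULT_WAREHOUSE,
    }
```

Calling `r2_settings()` with no argument reads from the process environment, which is where the README says Pulumi ESC injects the values on the supervisor instance. Passing a plain dict instead keeps the check easy to unit test.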
transform/sqlmesh_materia/config.yaml
@@ -1,18 +1,8 @@
 # --- Gateway Connection ---
+# Single gateway connecting to R2 Iceberg catalog
+# Local dev uses virtual environments (e.g., dev_<username>)
+# Production uses the 'prod' environment
 gateways:
-
-  dev:
-    connection:
-      # For more information on configuring the connection to your execution engine, visit:
-      # https://sqlmesh.readthedocs.io/en/stable/reference/configuration/#connection
-      # https://sqlmesh.readthedocs.io/en/stable/integrations/engines/duckdb/#connection-options
-      type: duckdb
-      database: materia_dev.db
-      extensions:
-        - name: zipfs
-        - name: httpfs
-        - name: iceberg
-
   prod:
     connection:
       type: duckdb
@@ -21,8 +11,7 @@ gateways:
         - name: httpfs
         - name: iceberg
 
-
-default_gateway: dev
+default_gateway: prod
 
 # --- Hooks ---
 # Run initialization SQL before all plans/runs