# Materia
A commodity data analytics platform built on a modern data engineering stack. Extracts agricultural commodity data from USDA PSD Online, transforms it through a layered SQL pipeline using SQLMesh, and stores it in DuckDB + Cloudflare R2 for analysis.
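Because DuckDB stores the whole analytical database in a single file, results can be inspected locally with nothing beyond the Python client. A minimal sketch, assuming a local database file named `lakehouse.duckdb` (the path and table layout are illustrative, not project conventions):

```python
# Minimal sketch: inspect the analytical database from Python.
# "lakehouse.duckdb" is an assumed local path -- adjust to your checkout.
import duckdb

# read_only avoids taking DuckDB's exclusive write lock
con = duckdb.connect("lakehouse.duckdb", read_only=True)
tables = con.sql(
    "SELECT table_schema, table_name FROM information_schema.tables"
).fetchall()
print(tables)
con.close()
```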
## Tech Stack
- Python 3.13 with `uv` package manager
- SQLMesh for SQL transformation and orchestration
- DuckDB as the analytical database
- Cloudflare R2 (Iceberg) for data storage
- Pulumi ESC for secrets management
- Hetzner Cloud for infrastructure
## Quick Start
### 1. Install UV
UV is our Python package manager for faster, more reliable dependency management.
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
### 2. Install Dependencies
```bash
uv sync
```
This installs Python and all dependencies declared in `pyproject.toml`.
### 3. Set Up Pre-commit Hooks
```bash
pre-commit install
```
This enables automatic linting with `ruff` on every commit.
### 4. Install Pulumi ESC (for running with secrets)
```bash
# Install ESC CLI
curl -fsSL https://get.pulumi.com/esc/install.sh | sh

# Login
esc login
```
## Project Structure
This is a `uv` workspace with three main packages:
### Extract Layer (`extract/`)
`psdonline` - Extracts USDA PSD commodity data
```bash
# Local development (downloads to local directory)
uv run extract_psd

# Production (uploads to R2)
esc run beanflows/prod -- uv run extract_psd
```
### Transform Layer (`transform/sqlmesh_materia/`)
SQLMesh project implementing a 4-layer data architecture (raw → staging → cleaned → serving).
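The models themselves are SQL, but as a hedged illustration of how the serving layer sits on top of the cleaned layer, here is what one such model could look like using SQLMesh's Python model API (the table and column names are hypothetical, not models from this repo):

```python
# Hypothetical serving-layer model expressed as a SQLMesh Python model.
import typing as t
from datetime import datetime

import pandas as pd
from sqlmesh import ExecutionContext, model


@model(
    "serving.psd_exports",  # hypothetical serving-layer table
    columns={"commodity": "text", "market_year": "int", "exports_kt": "double"},
)
def execute(
    context: ExecutionContext,
    start: datetime,
    end: datetime,
    execution_time: datetime,
    **kwargs: t.Any,
) -> pd.DataFrame:
    # Aggregate the cleaned layer into an analysis-ready serving table.
    return context.fetchdf(
        """
        SELECT commodity, market_year, SUM(value) AS exports_kt
        FROM cleaned.psd
        WHERE attribute = 'Exports'
        GROUP BY commodity, market_year
        """
    )
```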
All commands run from the project root with `-p transform/sqlmesh_materia`:
```bash
# Local development
esc run beanflows/prod -- uv run sqlmesh -p transform/sqlmesh_materia plan dev_<username>

# Production
esc run beanflows/prod -- uv run sqlmesh -p transform/sqlmesh_materia plan prod

# Run tests (no secrets needed)
uv run sqlmesh -p transform/sqlmesh_materia test

# Format SQL
uv run sqlmesh -p transform/sqlmesh_materia format
```
### Core Package (`src/materia/`)
CLI for managing infrastructure and pipelines (currently minimal).
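For orientation, a sketch of the shape such a CLI could take; the subcommand names and the pipeline registry below are hypothetical, not the actual materia interface:

```python
# Hypothetical sketch of a minimal pipeline CLI; not the real materia API.
import argparse

# Registry mapping pipeline names to callables (placeholder implementations).
PIPELINES = {
    "extract_psd": lambda: print("running extract_psd..."),
}


def main() -> None:
    parser = argparse.ArgumentParser(prog="materia")
    sub = parser.add_subparsers(dest="command", required=True)
    run = sub.add_parser("run", help="Run a named pipeline")
    run.add_argument("pipeline", choices=PIPELINES)
    args = parser.parse_args()
    PIPELINES[args.pipeline]()


if __name__ == "__main__":
    main()
```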
## Development Workflow
### Adding Dependencies
For the workspace root:

```bash
uv add <package-name>
```

For a specific package:

```bash
uv add --package psdonline <package-name>
```
### Linting and Formatting
```bash
# Check for issues
ruff check .

# Auto-fix issues
ruff check --fix .

# Format code
ruff format .
```
### Running Tests
```bash
# Python tests
uv run pytest tests/ -v --cov=src/materia

# SQLMesh tests
uv run sqlmesh -p transform/sqlmesh_materia test
```
## Secrets Management
All secrets are managed via the Pulumi ESC environment `beanflows/prod`.
Load secrets into shell:
```bash
eval $(esc env open beanflows/prod --format shell)
```
Run commands with secrets:
```bash
# Single command
esc run beanflows/prod -- uv run extract_psd

# Multiple commands
esc run beanflows/prod -- bash -c "
  uv run extract_psd
  uv run sqlmesh -p transform/sqlmesh_materia plan prod
"
```
## Production Architecture
### Git-Based Deployment
- Supervisor (Hetzner CPX11): Always-on orchestrator that pulls the latest code every 15 minutes (see the sketch after this list)
- Workers (Ephemeral): Created on-demand for each pipeline run, destroyed after completion
- Storage: Cloudflare R2 Data Catalog (Apache Iceberg REST API)
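The supervisor's job reduces to a pull-and-run loop. A hedged Python sketch of that loop (the repo path and the pipeline command are assumptions for illustration):

```python
# Hedged sketch of the supervisor's pull-and-run loop.
# The repo path and the pipeline command are assumptions for illustration.
import subprocess
import time


def supervise(repo_dir: str = "/opt/materia") -> None:
    while True:
        # Deployment is just git: pick up whatever master currently points at.
        subprocess.run(["git", "-C", repo_dir, "pull", "--ff-only"], check=True)
        # Run the pipeline with secrets injected by Pulumi ESC.
        subprocess.run(
            ["esc", "run", "beanflows/prod", "--", "uv", "run", "extract_psd"],
            cwd=repo_dir,
            check=False,  # a failed run should not kill the supervisor
        )
        time.sleep(15 * 60)  # poll every 15 minutes


if __name__ == "__main__":
    supervise()
```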
### CI/CD Pipeline
GitLab CI runs on every push to master:
- Lint - `ruff check`
- Test - pytest + SQLMesh tests
- Deploy - Updates supervisor infrastructure and bootstraps if needed
No build artifacts - the supervisor pulls code directly from git!
## Architecture Principles
- Simplicity First - Avoid unnecessary abstractions
- Data-Oriented Design - Identify data by content, not metadata (see the sketch after this list)
- Cost Optimization - Ephemeral workers, minimal always-on infrastructure
- Inspectable - Easy to understand, test locally, and debug
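As a concrete reading of the data-oriented principle above: identity is derived from a hash of a file's bytes, not from its name or timestamp. A minimal sketch (an interpretation of the principle, not code from this repo):

```python
# Illustration of "identify data by content": hash the bytes, ignore the name.
# This is an interpretation of the principle, not code from this repo.
import hashlib
from pathlib import Path


def content_id(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Stream in 1 MiB chunks so large extracts don't load into memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Two extracts with identical bytes get the same id, however they are named.
```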
## Resources
- Architecture Plans: See `.claude/plans/` for design decisions
- UV Docs: https://docs.astral.sh/uv/
- SQLMesh Docs: https://sqlmesh.readthedocs.io/