# Materia
A commodity data analytics platform built on a modern data engineering stack. Extracts agricultural commodity data from USDA PSD Online, transforms it through a layered SQL pipeline using SQLMesh, and stores it in DuckDB + Cloudflare R2 for analysis.
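Because DuckDB stores the whole analytical database in a single file, results can be inspected locally with nothing beyond the Python client. A minimal sketch, assuming a local database file named `lakehouse.duckdb` (the path and table layout are illustrative, not project conventions):

```python
# Minimal sketch: inspect the analytical database from Python.
# "lakehouse.duckdb" is an assumed local path -- adjust to your checkout.
import duckdb

# read_only avoids taking DuckDB's exclusive write lock
con = duckdb.connect("lakehouse.duckdb", read_only=True)
tables = con.sql(
    "SELECT table_schema, table_name FROM information_schema.tables"
).fetchall()
print(tables)
con.close()
```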
## Tech Stack
- Python 3.13 with `uv` package manager
- SQLMesh for SQL transformation and orchestration
- DuckDB as the analytical database
- Cloudflare R2 (Iceberg) for data storage
- Pulumi ESC for secrets management
- Hetzner Cloud for infrastructure
## Quick Start
### 1. Install UV
UV is our Python package manager for faster, more reliable dependency management.
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
### 2. Install Dependencies
```bash
uv sync
```
This installs Python and all dependencies declared in `pyproject.toml`.
### 3. Set Up Pre-commit Hooks
```bash
pre-commit install
```
This enables automatic linting with `ruff` on every commit.
### 4. Install Pulumi ESC (for running with secrets)
```bash
# Install ESC CLI
curl -fsSL https://get.pulumi.com/esc/install.sh | sh

# Login
esc login
```
## Project Structure
This is a `uv` workspace with three main packages:
### Extract Layer (`extract/`)
`psdonline` - Extracts USDA PSD commodity data
```bash
# Local development (downloads to local directory)
uv run extract_psd

# Production (uploads to R2)
esc run beanflows/prod -- uv run extract_psd
```
### Transform Layer (`transform/sqlmesh_materia/`)
SQLMesh project implementing a 4-layer data architecture (raw → staging → cleaned → serving).
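The models themselves are SQL, but as a hedged illustration of how the serving layer sits on top of the cleaned layer, here is what one such model could look like using SQLMesh's Python model API (the table and column names are hypothetical, not models from this repo):

```python
# Hypothetical serving-layer model expressed as a SQLMesh Python model.
import typing as t
from datetime import datetime

import pandas as pd
from sqlmesh import ExecutionContext, model


@model(
    "serving.psd_exports",  # hypothetical serving-layer table
    columns={"commodity": "text", "market_year": "int", "exports_kt": "double"},
)
def execute(
    context: ExecutionContext,
    start: datetime,
    end: datetime,
    execution_time: datetime,
    **kwargs: t.Any,
) -> pd.DataFrame:
    # Aggregate the cleaned layer into an analysis-ready serving table.
    return context.fetchdf(
        """
        SELECT commodity, market_year, SUM(value) AS exports_kt
        FROM cleaned.psd
        WHERE attribute = 'Exports'
        GROUP BY commodity, market_year
        """
    )
```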
All commands run from the project root with `-p transform/sqlmesh_materia`:
```bash
# Local development
esc run beanflows/prod -- uv run sqlmesh -p transform/sqlmesh_materia plan dev_<username>

# Production
esc run beanflows/prod -- uv run sqlmesh -p transform/sqlmesh_materia plan prod

# Run tests (no secrets needed)
uv run sqlmesh -p transform/sqlmesh_materia test

# Format SQL
uv run sqlmesh -p transform/sqlmesh_materia format
```
### Core Package (`src/materia/`)
CLI for managing infrastructure and pipelines (currently minimal).
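For orientation, a sketch of the shape such a CLI could take; the subcommand names and the pipeline registry below are hypothetical, not the actual materia interface:

```python
# Hypothetical sketch of a minimal pipeline CLI; not the real materia API.
import argparse

# Registry mapping pipeline names to callables (placeholder implementations).
PIPELINES = {
    "extract_psd": lambda: print("running extract_psd..."),
}


def main() -> None:
    parser = argparse.ArgumentParser(prog="materia")
    sub = parser.add_subparsers(dest="command", required=True)
    run = sub.add_parser("run", help="Run a named pipeline")
    run.add_argument("pipeline", choices=PIPELINES)
    args = parser.parse_args()
    PIPELINES[args.pipeline]()


if __name__ == "__main__":
    main()
```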
## Development Workflow
### Adding Dependencies
For the workspace root:

```bash
uv add <package-name>
```

For a specific package:

```bash
uv add --package psdonline <package-name>
```
### Linting and Formatting
```bash
# Check for issues
ruff check .

# Auto-fix issues
ruff check --fix .

# Format code
ruff format .
```
### Running Tests
```bash
# Python tests
uv run pytest tests/ -v --cov=src/materia

# SQLMesh tests
uv run sqlmesh -p transform/sqlmesh_materia test
```
## Secrets Management
All secrets are managed via the Pulumi ESC environment `beanflows/prod`.
Load secrets into shell:
```bash
eval $(esc env open beanflows/prod --format shell)
```
Run commands with secrets:
```bash
# Single command
esc run beanflows/prod -- uv run extract_psd

# Multiple commands
esc run beanflows/prod -- bash -c "
  uv run extract_psd
  uv run sqlmesh -p transform/sqlmesh_materia plan prod
"
```
## Production Architecture
### Git-Based Deployment
- Supervisor (Hetzner CPX11): Always-on orchestrator that pulls the latest code every 15 minutes (see the sketch after this list)
- Workers (Ephemeral): Created on-demand for each pipeline run, destroyed after completion
- Storage: Cloudflare R2 Data Catalog (Apache Iceberg REST API)
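The supervisor's job reduces to a pull-and-run loop. A hedged Python sketch of that loop (the repo path and the pipeline command are assumptions for illustration):

```python
# Hedged sketch of the supervisor's pull-and-run loop.
# The repo path and the pipeline command are assumptions for illustration.
import subprocess
import time


def supervise(repo_dir: str = "/opt/materia") -> None:
    while True:
        # Deployment is just git: pick up whatever master currently points at.
        subprocess.run(["git", "-C", repo_dir, "pull", "--ff-only"], check=True)
        # Run the pipeline with secrets injected by Pulumi ESC.
        subprocess.run(
            ["esc", "run", "beanflows/prod", "--", "uv", "run", "extract_psd"],
            cwd=repo_dir,
            check=False,  # a failed run should not kill the supervisor
        )
        time.sleep(15 * 60)  # poll every 15 minutes


if __name__ == "__main__":
    supervise()
```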
### CI/CD Pipeline
GitLab CI runs on every push to master:
- Lint - `ruff check`
- Test - pytest + SQLMesh tests
- Deploy - Updates supervisor infrastructure and bootstraps if needed
No build artifacts - the supervisor pulls code directly from git!
## Architecture Principles
- Simplicity First - Avoid unnecessary abstractions
- Data-Oriented Design - Identify data by content, not metadata (see the sketch after this list)
- Cost Optimization - Ephemeral workers, minimal always-on infrastructure
- Inspectable - Easy to understand, test locally, and debug
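As a concrete reading of the data-oriented principle above: identity is derived from a hash of a file's bytes, not from its name or timestamp. A minimal sketch (an interpretation of the principle, not code from this repo):

```python
# Illustration of "identify data by content": hash the bytes, ignore the name.
# This is an interpretation of the principle, not code from this repo.
import hashlib
from pathlib import Path


def content_id(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Stream in 1 MiB chunks so large extracts don't load into memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Two extracts with identical bytes get the same id, however they are named.
```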
## Resources
- Architecture Plans: See `.claude/plans/` for design decisions
- UV Docs: https://docs.astral.sh/uv/
- SQLMesh Docs: https://sqlmesh.readthedocs.io/