
Deeman 0317cb885f feat(infra): use beanflows_service for supervisor

- materia-supervisor.service: User=root → User=beanflows_service,
  add PATH so uv (~/.local/bin) is found without a login shell
- setup_server.sh: full rewrite — creates beanflows_service (nologin),
  generates SSH deploy key + age keypair as service user at XDG path
  (~/.config/sops/age/keys.txt), installs age/sops/rclone as root,
  prints both public keys + numbered next-step instructions
- bootstrap_supervisor.sh: full rewrite — removes GITLAB_READ_TOKEN
  requirement, clones via SSH as service user, installs uv as service
  user, decrypts with SOPS auto-discovery, uv sync as service user,
  systemctl as root
- web/deploy.sh: remove self-contained sops/age install + keypair
  generation; replace with simple sops check (exit if missing) and
  SOPS auto-discovery decrypt (no explicit key file needed)
- infra/readme.md: update architecture diagram for beanflows_service
  paths, update setup steps to match new scripts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-02-26 21:33:31 +01:00

.gitlab

feat(ci): switch to pull-based deploy via git tags

2026-02-26 11:10:06 +01:00

assets

cleanup and prefect service setup

2026-02-04 22:24:55 +01:00

extract

feat(extract): replace OpenWeatherMap with Open-Meteo weather extractor

2026-02-26 00:59:54 +01:00

infra

feat(infra): use beanflows_service for supervisor

2026-02-26 21:33:31 +01:00

notebooks

cleanup and prefect service setup

2026-02-04 22:24:55 +01:00

research

add untracked

2026-02-26 02:44:48 +01:00

src/materia

feat(supervisor): port Python supervisor from padelnomics + workflows.toml

2026-02-26 11:59:55 +01:00

tests

feat: extraction framework overhaul — extract_core shared package + SQLite state tracking

2026-02-22 14:37:50 +01:00

transform/sqlmesh_materia

add untracked

2026-02-26 02:44:48 +01:00

web

feat(infra): use beanflows_service for supervisor

2026-02-26 21:33:31 +01:00

.env.dev.sops

feat(secrets): add SOPS+age secret management infrastructure

2026-02-26 10:36:14 +01:00

.env.prod.sops

feat(secrets): add SOPS+age secret management infrastructure

2026-02-26 10:36:14 +01:00

.gitignore

feat(secrets): add SOPS+age secret management infrastructure

2026-02-26 10:36:14 +01:00

.mcp.json

scout: extract to standalone repo at Projects/scout

2026-02-21 17:58:03 +01:00

.python-version

Initial commit

2025-03-01 18:11:57 +01:00

.sops.yaml

feat(secrets): add SOPS+age secret management infrastructure

2026-02-26 10:36:14 +01:00

CHANGELOG.md

changelog: bring up to date through Feb 2026

2026-02-21 23:22:04 +01:00

chatnotes.md

chat notes

2025-04-01 18:33:40 +02:00

CLAUDE.md

docs(claude+infra): expand CLAUDE.md + infra/readme.md for full architecture

2026-02-26 12:04:55 +01:00

coding_philosophy.md

Add scout MCP server for browser recon + msgspec workspace dep

2026-02-21 15:44:02 +01:00

Makefile

feat(secrets): add SOPS+age secret management infrastructure

2026-02-26 10:36:14 +01:00

materia.drawio

add prototype ui

2025-04-01 20:26:45 +02:00

pyproject.toml

feat(supervisor): port Python supervisor from padelnomics + workflows.toml

2026-02-26 11:59:55 +01:00

readme.md

cleanup and prefect service setup

2026-02-04 22:24:55 +01:00

single_server_arch.excalidraw

cleanup and prefect service setup

2026-02-04 22:24:55 +01:00

uv.lock

feat(supervisor): port Python supervisor from padelnomics + workflows.toml

2026-02-26 11:59:55 +01:00

vision.md

dashboard: JTBD-driven restructure — Pulse, Supply, Positioning, Warehouse

2026-02-22 01:27:44 +01:00

readme.md

Materia

A commodity data analytics platform built on a modern data engineering stack. It extracts agricultural commodity data from USDA PSD Online, transforms it through a layered SQL pipeline using SQLMesh, and stores it in DuckDB and Cloudflare R2 for analysis.

Tech Stack

Python 3.13 with uv package manager
SQLMesh for SQL transformation and orchestration
DuckDB as the analytical database
Cloudflare R2 (Iceberg) for data storage
Pulumi ESC for secrets management
Hetzner Cloud for infrastructure

Quick Start

1. Install UV

uv is the project's Python package manager, used for faster and more reliable dependency management.

curl -LsSf https://astral.sh/uv/install.sh | sh

📚 UV Documentation: https://docs.astral.sh/uv/

2. Install Dependencies

uv sync

This installs Python and all dependencies declared in pyproject.toml.

3. Setup Pre-commit Hooks

pre-commit install

This enables automatic linting with ruff on every commit.

4. Install Pulumi ESC (for running with secrets)

# Install ESC CLI
curl -fsSL https://get.pulumi.com/esc/install.sh | sh

# Login
esc login

Project Structure

This is a uv workspace with three main packages:

Extract Layer (`extract/`)

psdonline - Extracts USDA PSD commodity data

# Local development (downloads to local directory)
uv run extract_psd

# Production (uploads to R2)
esc run beanflows/prod -- uv run extract_psd

Transform Layer (`transform/sqlmesh_materia/`)

SQLMesh project implementing a 4-layer data architecture (raw → staging → cleaned → serving).

Run all commands from the project root, passing -p transform/sqlmesh_materia:

# Local development
esc run beanflows/prod -- uv run sqlmesh -p transform/sqlmesh_materia plan dev_<username>

# Production
esc run beanflows/prod -- uv run sqlmesh -p transform/sqlmesh_materia plan prod

# Run tests (no secrets needed)
uv run sqlmesh -p transform/sqlmesh_materia test

# Format SQL
uv run sqlmesh -p transform/sqlmesh_materia format
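The commands above all share the same prefix, so a small helper can assemble them. The following is a minimal sketch, not part of the repository: `sqlmesh_argv`, `PROJECT_DIR`, and `ESC_ENV` are hypothetical names, and only the command strings themselves come from this README.

```python
from typing import List

PROJECT_DIR = "transform/sqlmesh_materia"  # project path used in the commands above
ESC_ENV = "beanflows/prod"                 # ESC environment named in this README


def sqlmesh_argv(*args: str, with_secrets: bool = False) -> List[str]:
    """Build the argv list for a SQLMesh invocation from the project root."""
    argv = ["uv", "run", "sqlmesh", "-p", PROJECT_DIR, *args]
    if with_secrets:
        # Wrap the command so `esc run` injects secrets as environment variables.
        argv = ["esc", "run", ESC_ENV, "--", *argv]
    return argv


if __name__ == "__main__":
    # Tests need no secrets; plans against prod do.
    print(" ".join(sqlmesh_argv("test")))
    print(" ".join(sqlmesh_argv("plan", "prod", with_secrets=True)))
```

Returning a list rather than a shell string keeps the result safe to pass to `subprocess.run` without quoting concerns.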

Core Package (`src/materia/`)

CLI for managing infrastructure and pipelines (currently minimal).

Development Workflow

Adding Dependencies

For workspace root:

uv add <package-name>

For specific package:

uv add --package psdonline <package-name>

Linting and Formatting

# Check for issues
ruff check .

# Auto-fix issues
ruff check --fix .

# Format code
ruff format .

Running Tests

# Python tests
uv run pytest tests/ -v --cov=src/materia

# SQLMesh tests
uv run sqlmesh -p transform/sqlmesh_materia test

Secrets Management

All secrets are managed via the Pulumi ESC environment beanflows/prod.

Load secrets into shell:

eval "$(esc env open beanflows/prod --format shell)"

Run commands with secrets:

# Single command
esc run beanflows/prod -- uv run extract_psd

# Multiple commands
esc run beanflows/prod -- bash -c "
  set -e  # stop at the first failing command
  uv run extract_psd
  uv run sqlmesh -p transform/sqlmesh_materia plan prod
"
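Because `esc run` hands secrets to the child process as environment variables, a pipeline entry point can check for them before doing any work. This is a minimal sketch under assumptions: `missing_secrets` is a hypothetical helper, and the variable names in the demo are placeholders, since the actual keys in beanflows/prod are not listed here.

```python
import os
from typing import Iterable, List, Mapping


def missing_secrets(required: Iterable[str],
                    environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    # Placeholder names for illustration only; not the real ESC keys.
    demo_env = {"R2_ACCESS_KEY_ID": "abc"}
    print(missing_secrets(["R2_ACCESS_KEY_ID", "R2_SECRET_ACCESS_KEY"], demo_env))
```

Passing the environment as a parameter keeps the check testable without touching the real process environment.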

Production Architecture

Git-Based Deployment

Supervisor (Hetzner CPX11): Always-on orchestrator that pulls the latest code every 15 minutes
Workers (Ephemeral): Created on-demand for each pipeline run, destroyed after completion
Storage: Cloudflare R2 Data Catalog (Apache Iceberg REST API)
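The supervisor's pull cadence described above could be sketched roughly as follows. This is an illustration under assumptions, not the supervisor shipped in src/materia: `is_pull_due` and `pull_latest` are hypothetical names, and only the 15-minute interval comes from this README.

```python
import subprocess
from datetime import datetime, timedelta
from typing import Callable, List, Optional

PULL_INTERVAL = timedelta(minutes=15)  # interval stated in this README


def is_pull_due(last_pull: Optional[datetime], now: datetime,
                interval: timedelta = PULL_INTERVAL) -> bool:
    """Return True when no pull has happened yet or the interval has elapsed."""
    return last_pull is None or now - last_pull >= interval


def pull_latest(repo_dir: str,
                run: Callable[..., object] = subprocess.run) -> List[str]:
    """Fast-forward the checkout in repo_dir; `run` is injectable for testing."""
    argv = ["git", "-C", repo_dir, "pull", "--ff-only"]
    run(argv, check=True)  # raises CalledProcessError if git exits non-zero
    return argv
```

Using `--ff-only` is a design choice for this sketch: it refuses to create merge commits on the server, so a diverged checkout fails loudly instead of drifting.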

CI/CD Pipeline

GitLab CI runs on every push to master:

Lint - ruff check
Test - pytest + SQLMesh tests
Deploy - Updates supervisor infrastructure and bootstraps if needed

No build artifacts are produced: the supervisor pulls code directly from git.
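The lint and test stages can be reproduced locally with a small runner that stops at the first failure. This is a sketch, not the actual GitLab CI configuration: `STAGES` and `run_stages` are hypothetical names, the commands are taken from this README, and the Deploy stage is left out because its commands are not shown here.

```python
import subprocess
from typing import Callable, List, Optional, Sequence, Tuple

Stage = Tuple[str, List[List[str]]]

# Commands copied from this README; Deploy is intentionally omitted.
STAGES: List[Stage] = [
    ("lint", [["ruff", "check", "."]]),
    ("test", [
        ["uv", "run", "pytest", "tests/", "-v"],
        ["uv", "run", "sqlmesh", "-p", "transform/sqlmesh_materia", "test"],
    ]),
]


def _exit_code(argv: Sequence[str]) -> int:
    """Run one command and return its exit status."""
    return subprocess.run(argv).returncode


def run_stages(stages: Sequence[Stage],
               run: Callable[[Sequence[str]], int] = _exit_code) -> Optional[str]:
    """Run stages in order; return the first failing stage name, or None."""
    for name, commands in stages:
        for argv in commands:
            if run(argv) != 0:
                return name  # later stages are skipped
    return None
```

Calling `run_stages(STAGES)` from the project root returns `None` when every command exits with status 0.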

Architecture Principles

Simplicity First - Avoid unnecessary abstractions
Data-Oriented Design - Identify data by content, not metadata
Cost Optimization - Ephemeral workers, minimal always-on infrastructure
Inspectable - Easy to understand, test locally, and debug
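One way to read "identify data by content, not metadata" is to key each payload by a hash of its bytes, so that a re-download of unchanged data is recognized regardless of filename or timestamp. The sketch below is an assumption about how that principle might be applied, not the project's implementation; `content_key` and `is_new` are hypothetical names.

```python
import hashlib
from typing import Set


def content_key(payload: bytes) -> str:
    """Derive an identifier from the bytes themselves, not a name or timestamp."""
    return hashlib.sha256(payload).hexdigest()


def is_new(payload: bytes, seen: Set[str]) -> bool:
    """Record the payload's key and report whether it was unseen before."""
    key = content_key(payload)
    if key in seen:
        return False
    seen.add(key)
    return True


if __name__ == "__main__":
    seen: Set[str] = set()
    print(is_new(b"same bytes", seen))   # first sighting
    print(is_new(b"same bytes", seen))   # identical content, already recorded
```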

Resources

Architecture Plans: See .claude/plans/ for design decisions
UV Docs: https://docs.astral.sh/uv/
SQLMesh Docs: https://sqlmesh.readthedocs.io/

Languages

Python 50.8%

HTML 33.7%

Jupyter Notebook 8.3%

Shell 3.6%

CSS 2.9%

Other 0.7%
