deemanone/beanflows



Materia

A commodity data analytics platform built on a modern data engineering stack. Extracts agricultural commodity data from USDA PSD Online, transforms it through a layered SQL pipeline using SQLMesh, and stores it in DuckDB + Cloudflare R2 for analysis.

Tech Stack

Python 3.13 with the uv package manager
SQLMesh for SQL transformation and orchestration
DuckDB as the analytical database
Cloudflare R2 (Iceberg) for data storage
Pulumi ESC for secrets management
Hetzner Cloud for infrastructure

Quick Start

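1. Install uv

uv is the project's Python package manager. It installs dependencies quickly and records exact versions in `uv.lock` for reproducible environments.

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

📚 uv documentation: https://docs.astral.sh/uv/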

2. Install Dependencies

```bash
uv sync
```

This installs Python and all dependencies declared in pyproject.toml.

3. Set Up Pre-commit Hooks

```bash
pre-commit install
```

This enables automatic linting with ruff on every commit.

4. Install Pulumi ESC (for running with secrets)

```bash
# Install ESC CLI
curl -fsSL https://get.pulumi.com/esc/install.sh | sh

# Login
esc login
```

Project Structure

This is a uv workspace with three main packages:

Extract Layer (`extract/`)

psdonline - Extracts USDA PSD commodity data

```bash
# Local development (downloads to local directory)
uv run extract_psd

# Production (uploads to R2)
esc run beanflows/prod -- uv run extract_psd
```
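The two commands differ only in whether `esc run` injects the production environment. A minimal sketch of how an extractor could choose its landing target from that environment, assuming it switches on the presence of R2 settings. The variable names `R2_BUCKET`, `R2_ENDPOINT` and `LANDING_DIR`, the `data/landing` default and the function name are all hypothetical, not taken from `psdonline`:

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class LandingTarget:
    """Where an extractor writes its raw files."""
    kind: str      # "local" or "r2"
    location: str  # local directory path or R2 bucket name


def resolve_landing_target(env: Mapping[str, str] = os.environ) -> LandingTarget:
    # Hypothetical variable names: the production secrets injected by
    # `esc run beanflows/prod` are assumed to include an R2 bucket and endpoint.
    bucket = env.get("R2_BUCKET")
    endpoint = env.get("R2_ENDPOINT")
    if bucket and endpoint:
        return LandingTarget(kind="r2", location=bucket)
    # Without R2 settings, fall back to a local directory for development.
    local_dir = Path(env.get("LANDING_DIR", "data/landing"))
    return LandingTarget(kind="local", location=str(local_dir))


if __name__ == "__main__":
    print(resolve_landing_target())
```

Keying the decision on the environment, rather than on a `--prod` flag, keeps the command identical in both modes, which matches how the two commands above differ.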

Transform Layer (`transform/sqlmesh_materia/`)

SQLMesh project implementing a 4-layer data architecture (raw → staging → cleaned → serving).

Run all commands from the project root, passing `-p transform/sqlmesh_materia`:

```bash
# Local development
esc run beanflows/prod -- uv run sqlmesh -p transform/sqlmesh_materia plan dev_<username>

# Production
esc run beanflows/prod -- uv run sqlmesh -p transform/sqlmesh_materia plan prod

# Run tests (no secrets needed)
uv run sqlmesh -p transform/sqlmesh_materia test

# Format SQL
uv run sqlmesh -p transform/sqlmesh_materia format
```
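To make the four layers concrete, here is a toy version of the raw → staging → cleaned → serving chain as a sequence of views, each reading only from the layer below. It uses Python's built-in `sqlite3` purely so it runs without DuckDB or SQLMesh; the table names, columns and sample rows are made up and are not the project's actual models. In SQLMesh each layer would be its own model file.

```python
import sqlite3


def build_layers(conn: sqlite3.Connection) -> None:
    cur = conn.cursor()
    # raw: data exactly as extracted, every field stored as text
    cur.execute(
        "CREATE TABLE raw_psd (commodity TEXT, country TEXT, year TEXT, production TEXT)"
    )
    cur.executemany(
        "INSERT INTO raw_psd VALUES (?, ?, ?, ?)",
        [  # made-up sample rows
            ("coffee", "AA", "2024", "100"),
            ("coffee", "AA", "2024", "100"),  # duplicate row
            ("coffee", "BB", "2024", "50"),
            ("coffee", "CC", "2024", ""),     # missing value
        ],
    )
    # staging: rename and cast types, no business logic yet
    cur.execute("""
        CREATE VIEW stg_psd AS
        SELECT commodity,
               country,
               CAST(year AS INTEGER) AS market_year,
               CAST(NULLIF(production, '') AS REAL) AS production
        FROM raw_psd
    """)
    # cleaned: drop duplicates and rows without a value
    cur.execute("""
        CREATE VIEW cln_psd AS
        SELECT DISTINCT commodity, country, market_year, production
        FROM stg_psd
        WHERE production IS NOT NULL
    """)
    # serving: aggregate into the shape a dashboard consumes
    cur.execute("""
        CREATE VIEW srv_production_by_year AS
        SELECT commodity, market_year, SUM(production) AS total_production
        FROM cln_psd
        GROUP BY commodity, market_year
    """)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_layers(conn)
    for row in conn.execute("SELECT * FROM srv_production_by_year"):
        print(row)
```

Keeping type casting in staging and deduplication in the cleaned layer means each layer can be tested on its own, which is what `sqlmesh test` is for.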

Core Package (`src/materia/`)

CLI for managing infrastructure and pipelines (currently minimal).

Development Workflow

Adding Dependencies

For the workspace root:

```bash
uv add <package-name>
```

For a specific package:

```bash
uv add --package psdonline <package-name>
```

Linting and Formatting

```bash
# Check for issues
ruff check .

# Auto-fix issues
ruff check --fix .

# Format code
ruff format .
```

Running Tests

```bash
# Python tests
uv run pytest tests/ -v --cov=src/materia

# SQLMesh tests
uv run sqlmesh -p transform/sqlmesh_materia test
```

Secrets Management

All secrets are managed in the Pulumi ESC environment `beanflows/prod`.

Load secrets into the current shell:

```bash
eval "$(esc env open beanflows/prod --format shell)"
```

Run commands with secrets:

```bash
# Single command
esc run beanflows/prod -- uv run extract_psd

# Multiple commands (set -e stops at the first failure)
esc run beanflows/prod -- bash -c "
  set -e
  uv run extract_psd
  uv run sqlmesh -p transform/sqlmesh_materia plan prod
"
```

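Because `esc run` exposes the environment's configured variables to the child process as ordinary environment variables, a script can fail fast when it was started without them. A minimal sketch; the variable names in `REQUIRED_SECRETS` are placeholders, not the actual keys in `beanflows/prod`, and both helper functions are hypothetical:

```python
import os
import sys
from typing import Iterable, Mapping

# Placeholder names: replace with the keys your ESC environment actually defines.
REQUIRED_SECRETS = ("R2_ACCESS_KEY_ID", "R2_SECRET_ACCESS_KEY", "R2_ENDPOINT")


def missing_secrets(names: Iterable[str], env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or empty, in order."""
    return [name for name in names if not env.get(name)]


def require_secrets(names: Iterable[str] = REQUIRED_SECRETS) -> None:
    """Exit with a non-zero status if any required secret is missing."""
    missing = missing_secrets(names)
    if missing:
        # sys.exit(str) prints the message to stderr and exits with status 1,
        # so a calling shell script can stop here.
        sys.exit(
            f"missing secrets: {', '.join(missing)}; "
            "run via `esc run beanflows/prod -- ...`"
        )
```

Calling `require_secrets()` at the top of an entry point turns a confusing mid-run authentication error into an immediate, explicit message.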
Production Architecture

Git-Based Deployment

Supervisor (Hetzner CPX11): Always-on orchestrator that pulls the latest code every 15 minutes
Workers (Ephemeral): Created on demand for each pipeline run and destroyed after completion
Storage: Cloudflare R2 Data Catalog (Apache Iceberg REST API)
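The supervisor/worker split can be sketched in a few lines. This illustrates the pattern and is not the code in `src/materia`: the 15-minute interval comes from the description above, while `git pull --ff-only`, the injected `create`/`destroy` callables (standing in for cloud-provider API calls) and all function names are assumptions. The key properties are that a worker is destroyed even when its run fails, and that one failed cycle does not stop the always-on loop.

```python
import subprocess
import sys
import time
from contextlib import contextmanager
from typing import Callable, Iterator, Optional

PULL_INTERVAL_SECONDS = 15 * 60  # "pulls the latest code every 15 minutes"


def pull_latest(repo_dir: str) -> None:
    # Fast-forward only, so a diverged checkout fails loudly instead of merging.
    subprocess.run(["git", "-C", repo_dir, "pull", "--ff-only"], check=True)


@contextmanager
def ephemeral_worker(create: Callable[[], str],
                     destroy: Callable[[str], None]) -> Iterator[str]:
    """Create a worker for one run and always destroy it afterwards."""
    worker_id = create()  # hypothetical: e.g. a server-create API call
    try:
        yield worker_id
    finally:
        destroy(worker_id)  # runs even if the pipeline raised


def supervise(pull: Callable[[], None],
              run_pipelines: Callable[[], None],
              sleep: Callable[[float], None] = time.sleep,
              max_cycles: Optional[int] = None) -> int:
    """Pull, run, sleep; repeat. Returns the number of cycles attempted."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        try:
            pull()
            run_pipelines()
        except Exception as exc:  # keep the always-on loop alive
            print(f"cycle failed: {exc!r}", file=sys.stderr)
        cycles += 1
        sleep(PULL_INTERVAL_SECONDS)
    return cycles
```

Passing `pull`, `run_pipelines` and `sleep` in as functions keeps the loop testable without a real repository or a 15-minute wait.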

CI/CD Pipeline

GitLab CI runs on every push to master:

Lint - ruff check
Test - pytest + SQLMesh tests
Deploy - Updates supervisor infrastructure and bootstraps if needed

No build artifacts: the supervisor pulls code directly from git.
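With sequential stages, GitLab CI by default skips later stages once an earlier one fails, so a lint error blocks the deploy. Running the same checks locally before pushing saves a round trip. A minimal sketch that runs the lint and test commands listed in this README in order and stops at the first failure; the script and its function names are hypothetical, and the exact commands in `.gitlab-ci.yml` may differ:

```python
import subprocess
from typing import Callable, List, Optional, Sequence, Tuple

Stage = Tuple[str, List[str]]

# Commands taken from the sections above; the CI file may use different ones.
STAGES: List[Stage] = [
    ("lint", ["ruff", "check", "."]),
    ("test-python", ["uv", "run", "pytest", "tests/", "-v"]),
    ("test-sqlmesh", ["uv", "run", "sqlmesh", "-p", "transform/sqlmesh_materia", "test"]),
]


def run_command(cmd: List[str]) -> int:
    """Run one command and return its exit status."""
    return subprocess.run(cmd).returncode


def run_stages(stages: Sequence[Stage],
               run: Callable[[List[str]], int] = run_command) -> Optional[str]:
    """Run stages in order; return the first failing stage's name, or None."""
    for name, cmd in stages:
        print(f"==> {name}: {' '.join(cmd)}")
        if run(cmd) != 0:
            return name  # later stages are skipped, as in CI
    return None
```

Calling `run_stages(STAGES)` from a pre-push hook and exiting non-zero when it returns a stage name mirrors the CI gate locally.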

Architecture Principles

Simplicity First - Avoid unnecessary abstractions
Data-Oriented Design - Identify data by content, not metadata
Cost Optimization - Ephemeral workers, minimal always-on infrastructure
Inspectable - Easy to understand, test locally, and debug
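One common way to apply "identify data by content, not metadata" is content addressing: name each extracted file after a hash of its bytes, so re-downloading an unchanged file maps to the same key and can be skipped. A minimal sketch using SHA-256; the key layout (`<source>/<hash><suffix>`) and the function names are hypothetical, not taken from the extractors:

```python
import hashlib
from pathlib import Path
from typing import Tuple


def content_key(source: str, data: bytes, suffix: str = ".bin") -> str:
    """Build a storage key from the file's bytes rather than its name or date."""
    digest = hashlib.sha256(data).hexdigest()
    return f"{source}/{digest}{suffix}"


def store_if_new(root: Path, source: str, data: bytes,
                 suffix: str = ".bin") -> Tuple[str, bool]:
    """Write data under its content key; return (key, whether it was written)."""
    key = content_key(source, data, suffix)
    target = root / key
    if target.exists():
        return key, False  # identical content already landed, nothing to do
    target.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first, then rename, so a crash never
    # leaves a partial file under the final key.
    tmp = target.parent / (target.name + ".tmp")
    tmp.write_bytes(data)
    tmp.replace(target)
    return key, True
```

Because the key depends only on content, reruns are idempotent, and a changed upstream file shows up as a new key rather than silently overwriting the old one.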

Resources

Architecture Plans: See .claude/plans/ for design decisions
UV Docs: https://docs.astral.sh/uv/
SQLMesh Docs: https://sqlmesh.readthedocs.io/
