beanflows/CLAUDE.md
Deeman b222c01828 Add CLAUDE.md for Claude Code context
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 22:04:22 +01:00

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Materia is a commodity data analytics platform (product: BeanFlows.coffee) for coffee traders. It's a uv workspace monorepo with three packages: extraction (USDA PSD data), SQL transformation (SQLMesh + DuckDB), and a CLI for orchestrating cloud workers and pipelines.

Commands

# Install dependencies
uv sync

# Lint & format
ruff check .            # Check
ruff check --fix .      # Auto-fix
ruff format .           # Format

# Tests
uv run pytest tests/ -v --cov=src/materia         # CLI/Python tests
cd transform/sqlmesh_materia && uv run sqlmesh test  # SQLMesh model tests

# Run a single test
uv run pytest tests/test_cli.py::test_name -v

# Extract data
uv run extract_psd

# SQLMesh (from repo root)
uv run sqlmesh -p transform/sqlmesh_materia plan              # Plans to dev_<username> by default
uv run sqlmesh -p transform/sqlmesh_materia plan prod          # Production
uv run sqlmesh -p transform/sqlmesh_materia test               # Run model tests
uv run sqlmesh -p transform/sqlmesh_materia format             # Format SQL

# With production secrets
esc run beanflows/prod -- <command>

# CLI
uv run materia worker create|destroy|list
uv run materia pipeline run
uv run materia secrets get

Architecture

Workspace packages (pyproject.toml → tool.uv.workspace):

  • extract/psdonline/ — Downloads USDA PSD Online data, normalizes ZIP→gzip CSV, uploads to R2
  • transform/sqlmesh_materia/ — 4-layer SQL transformation pipeline (DuckDB + Iceberg)
  • src/materia/ — CLI (Typer) for worker management, pipeline orchestration, secrets
  • web/ — Future web frontend

Data flow:

USDA API → extract (psdonline) → R2/local CSV → SQLMesh transforms → DuckDB/Iceberg
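
The "ZIP→gzip CSV" normalization step in the flow above can be sketched with the standard library alone. This is a minimal illustration, not the code in extract/psdonline/. The function name zip_to_gzip_csv and the assumption that the downloaded archive contains a CSV member are hypothetical.

```python
import gzip
import io
import zipfile


def zip_to_gzip_csv(zip_bytes: bytes) -> bytes:
    """Re-pack the first CSV member of a ZIP archive as gzip-compressed bytes."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Assumption: the archive contains at least one .csv member.
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("no CSV member found in archive")
        csv_bytes = archive.read(csv_names[0])
    # mtime=0 keeps the output byte-identical across runs for the same input.
    return gzip.compress(csv_bytes, mtime=0)
```

Working on bytes in and bytes out keeps the function free of network and filesystem concerns, so the download and the R2 upload can stay as separate steps around it.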

SQLMesh 4-layer model structure (transform/sqlmesh_materia/models/):

  1. raw/ — Immutable source reads (read_csv from extracted files)
  2. staging/ — Type casting, lookup joins, basic cleansing
  3. cleaned/ — Business logic, pivoting, integration
  4. serving/ — Analytics-ready facts, dimensions, aggregates

CLI modules (src/materia/):

  • cli.py — Typer app with subcommands: worker, pipeline, secrets, version
  • workers.py — Ephemeral cloud instance management (Hetzner, with planned OVH/Scaleway/Oracle)
  • pipelines.py — SSH-based pipeline execution on workers (download artifact, run, destroy)
  • secrets.py — Pulumi ESC integration for environment secrets
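
The "download artifact, run, destroy" sequence for ephemeral workers could look roughly like the sketch below. It is not the code in workers.py or pipelines.py: the function name run_on_ephemeral_worker, the three callables, and the choice to destroy the worker even when the run fails are all assumptions made for illustration.

```python
from typing import Callable


def run_on_ephemeral_worker(
    create: Callable[[], dict],
    run: Callable[[dict], int],
    destroy: Callable[[dict], None],
) -> int:
    """Create a worker, run the pipeline on it, and always destroy it afterwards."""
    worker = create()
    try:
        # Hypothetical: `run` would download the artifact and execute it over SSH.
        return run(worker)
    finally:
        # Runs on success and on failure, so a crashed pipeline
        # does not leave a billed instance behind.
        destroy(worker)
```

Passing plain functions and a dict describing the worker keeps the flow procedural, and it lets tests substitute fakes without touching a cloud API.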

Infrastructure (infra/):

  • Pulumi IaC for Cloudflare R2 buckets and Hetzner compute
  • Supervisor systemd service for always-on orchestration (pulls git every 15 min)

Coding Philosophy

Read coding_philosophy.md for the full guide. Key points:

  • Simple, procedural code — Functions over classes, no inheritance hierarchies, no "Manager" patterns
  • Data-oriented — Use dicts/lists/tuples, not objects hiding data behind getters
  • Keep logic in SQL — Let DuckDB do the heavy lifting, don't pull data into Python to transform it
  • Build minimum that works — No premature abstraction, three examples before generalizing
  • Explicit over implicit — No framework magic, no metaprogramming, no hidden behavior
  • Question every dependency — Can you write it simply yourself? Are you using 5% of a large framework?
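
As a small illustration of "keep logic in SQL" and "data-oriented", the sketch below pushes an aggregation into the database and returns plain tuples. It uses the standard-library sqlite3 module purely as a stand-in so it runs without extra dependencies. The project itself uses DuckDB, and the production table, its columns, and the sample rows are hypothetical.

```python
import sqlite3


def total_by_year(conn: sqlite3.Connection) -> list[tuple[int, float]]:
    """Aggregate in SQL and return plain tuples, not wrapper objects."""
    query = """
        SELECT market_year, SUM(value) AS total
        FROM production
        GROUP BY market_year
        ORDER BY market_year
    """
    return conn.execute(query).fetchall()


# Hypothetical sample data, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE production (country TEXT, market_year INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO production VALUES (?, ?, ?)",
    [("BR", 2023, 10.0), ("VN", 2023, 5.0), ("BR", 2024, 12.0)],
)
print(total_by_year(conn))  # [(2023, 15.0), (2024, 12.0)]
```

The Python side stays a thin function around one query. Grouping and summing happen in the engine rather than in a loop over fetched rows.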

Key Configuration

  • Python 3.13 (.python-version)
  • Ruff: double quotes, spaces, E501 ignored (formatter handles line length)
  • SQLMesh: DuckDB dialect, @daily cron, start date 2025-07-07, default env dev_{{ user() }}
  • Storage: Cloudflare R2 with Iceberg catalog (zero egress cost)
  • Secrets: Pulumi ESC (esc run beanflows/prod -- <cmd>)
  • CI: GitLab CI (.gitlab/.gitlab-ci.yml) — runs pytest and sqlmesh test on push/MR
  • Pre-commit hooks: installed via pre-commit install
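
The esc run beanflows/prod -- <cmd> pattern listed under Secrets can be wrapped in a small helper when a script needs to launch a command with production secrets. This is a sketch under assumptions: the helper name with_esc is hypothetical and is not part of src/materia/secrets.py. It only builds the argument list and does not invoke esc.

```python
import shlex


def with_esc(environment: str, command: list[str]) -> list[str]:
    """Prefix a command with `esc run <environment> --`."""
    if not command:
        raise ValueError("command must not be empty")
    return ["esc", "run", environment, "--", *command]


argv = with_esc("beanflows/prod", ["uv", "run", "materia", "pipeline", "run"])
print(shlex.join(argv))  # esc run beanflows/prod -- uv run materia pipeline run
```

Returning a list rather than a string means the result can be passed to subprocess.run without shell=True, which avoids quoting problems in the wrapped command.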