From b222c0182889e49e1b01bf513d9b4c5aee27f98e Mon Sep 17 00:00:00 2001
From: Deeman <hendriknote@gmail.com>
Date: Tue, 17 Feb 2026 22:04:22 +0100
Subject: [PATCH] Add CLAUDE.md for Claude Code context

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 CLAUDE.md | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 CLAUDE.md
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..a0c9d0a
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,93 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+Materia is a commodity data analytics platform (product: **BeanFlows.coffee**) for coffee traders. It's a uv workspace monorepo with three packages: extraction (USDA PSD data), SQL transformation (SQLMesh + DuckDB), and a CLI for orchestrating cloud workers and pipelines.
+
+## Commands
+
+```bash
+# Install dependencies
+uv sync
+
+# Lint & format
+ruff check .            # Check
+ruff check --fix .      # Auto-fix
+ruff format .           # Format
+
+# Tests
+uv run pytest tests/ -v --cov=src/materia         # CLI/Python tests
+cd transform/sqlmesh_materia && uv run sqlmesh test  # SQLMesh model tests
+
+# Run a single test
+uv run pytest tests/test_cli.py::test_name -v
+
+# Extract data
+uv run extract_psd
+
+# SQLMesh (from repo root)
+uv run sqlmesh -p transform/sqlmesh_materia plan              # Plans to dev_<username> by default
+uv run sqlmesh -p transform/sqlmesh_materia plan prod          # Production
+uv run sqlmesh -p transform/sqlmesh_materia test               # Run model tests
+uv run sqlmesh -p transform/sqlmesh_materia format             # Format SQL
+
+# With production secrets
+esc run beanflows/prod -- <command>
+
+# CLI
+uv run materia worker <create|destroy|list>   # Pick one subcommand
+uv run materia pipeline run
+uv run materia secrets get
+```
+
+## Architecture
+
+**Workspace packages** (`pyproject.toml` → `tool.uv.workspace`):
+- `extract/psdonline/` — Downloads USDA PSD Online data, normalizes ZIP→gzip CSV, uploads to R2
+- `transform/sqlmesh_materia/` — 4-layer SQL transformation pipeline (DuckDB + Iceberg)
+- `src/materia/` — CLI (Typer) for worker management, pipeline orchestration, secrets
+- `web/` — Future web frontend
+
+**Data flow:**
+```
+USDA API → extract (psdonline) → R2/local CSV → SQLMesh transforms → DuckDB/Iceberg
+```
+
+**SQLMesh 4-layer model structure** (`transform/sqlmesh_materia/models/`):
+1. `raw/` — Immutable source reads (`read_csv` over the extracted files)
+2. `staging/` — Type casting, lookup joins, basic cleansing
+3. `cleaned/` — Business logic, pivoting, integration
+4. `serving/` — Analytics-ready facts, dimensions, aggregates
+
+**CLI modules** (`src/materia/`):
+- `cli.py` — Typer app with subcommands: worker, pipeline, secrets, version
+- `workers.py` — Ephemeral cloud instance management (Hetzner, with planned OVH/Scaleway/Oracle)
+- `pipelines.py` — SSH-based pipeline execution on workers (download artifact, run pipeline, destroy worker)
+- `secrets.py` — Pulumi ESC integration for environment secrets
+
+**Infrastructure** (`infra/`):
+- Pulumi IaC for Cloudflare R2 buckets and Hetzner compute
+- Supervisor systemd service for always-on orchestration (pulls the git repository every 15 min)
+
+## Coding Philosophy
+
+Read `coding_philosophy.md` for the full guide. Key points:
+
+- **Simple, procedural code** — Functions over classes, no inheritance hierarchies, no "Manager" patterns
+- **Data-oriented** — Use dicts/lists/tuples, not objects hiding data behind getters
+- **Keep logic in SQL** — Let DuckDB do the heavy lifting; don't pull data into Python to transform it
+- **Build the minimum that works** — No premature abstraction; wait for three concrete examples before generalizing
+- **Explicit over implicit** — No framework magic, no metaprogramming, no hidden behavior
+- **Question every dependency** — Can you write it simply yourself? Are you using 5% of a large framework?
+
+## Key Configuration
+
+- **Python 3.13** (`.python-version`)
+- **Ruff**: double quotes, space indentation, E501 ignored (formatter handles line length)
+- **SQLMesh**: DuckDB dialect, `@daily` cron, start date `2025-07-07`, default env `dev_{{ user() }}`
+- **Storage**: Cloudflare R2 with Iceberg catalog (zero egress cost)
+- **Secrets**: Pulumi ESC (`esc run beanflows/prod -- <cmd>`)
+- **CI**: GitLab CI (`.gitlab/.gitlab-ci.yml`) — runs `pytest` and `sqlmesh test` on every push and merge request
+- **Pre-commit hooks**: installed via `pre-commit install`