From b222c0182889e49e1b01bf513d9b4c5aee27f98e Mon Sep 17 00:00:00 2001
From: Deeman
Date: Tue, 17 Feb 2026 22:04:22 +0100
Subject: [PATCH] Add CLAUDE.md for Claude Code context

Co-Authored-By: Claude Opus 4.6
---
 CLAUDE.md | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 CLAUDE.md

diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..a0c9d0a
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,93 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+Materia is a commodity data analytics platform (product: **BeanFlows.coffee**) for coffee traders. It's a uv workspace monorepo with three packages: extraction (USDA PSD data), SQL transformation (SQLMesh + DuckDB), and a CLI for orchestrating cloud workers and pipelines.
+
+## Commands
+
+```bash
+# Install dependencies
+uv sync
+
+# Lint & format
+ruff check .                # Check
+ruff check --fix .          # Auto-fix
+ruff format .               # Format
+
+# Tests
+uv run pytest tests/ -v --cov=src/materia                # CLI/Python tests
+cd transform/sqlmesh_materia && uv run sqlmesh test      # SQLMesh model tests
+
+# Run a single test
+uv run pytest tests/test_cli.py::test_name -v
+
+# Extract data
+uv run extract_psd
+
+# SQLMesh (from repo root)
+uv run sqlmesh -p transform/sqlmesh_materia plan         # Plans to dev_ by default
+uv run sqlmesh -p transform/sqlmesh_materia plan prod    # Production
+uv run sqlmesh -p transform/sqlmesh_materia test         # Run model tests
+uv run sqlmesh -p transform/sqlmesh_materia format       # Format SQL
+
+# With production secrets
+esc run beanflows/prod --
+
+# CLI
+uv run materia worker create|destroy|list
+uv run materia pipeline run
+uv run materia secrets get
+```
+
+## Architecture
+
+**Workspace packages** (`pyproject.toml` → `tool.uv.workspace`):
+- `extract/psdonline/` — Downloads USDA PSD Online data, normalizes ZIP→gzip CSV, uploads to R2
+- `transform/sqlmesh_materia/` — 4-layer SQL transformation pipeline (DuckDB + Iceberg)
+- `src/materia/` — CLI (Typer) for worker management, pipeline orchestration, secrets
+- `web/` — Future web frontend
+
+**Data flow:**
+```
+USDA API → extract (psdonline) → R2/local CSV → SQLMesh transforms → DuckDB/Iceberg
+```
+
+**SQLMesh 4-layer model structure** (`transform/sqlmesh_materia/models/`):
+1. `raw/` — Immutable source reads (read_csv from extracted files)
+2. `staging/` — Type casting, lookup joins, basic cleansing
+3. `cleaned/` — Business logic, pivoting, integration
+4. `serving/` — Analytics-ready facts, dimensions, aggregates
+
+**CLI modules** (`src/materia/`):
+- `cli.py` — Typer app with subcommands: worker, pipeline, secrets, version
+- `workers.py` — Ephemeral cloud instance management (Hetzner, with planned OVH/Scaleway/Oracle)
+- `pipelines.py` — SSH-based pipeline execution on workers (download artifact, run, destroy)
+- `secrets.py` — Pulumi ESC integration for environment secrets
+
+**Infrastructure** (`infra/`):
+- Pulumi IaC for Cloudflare R2 buckets and Hetzner compute
+- Supervisor systemd service for always-on orchestration (pulls git every 15 min)
+
+## Coding Philosophy
+
+Read `coding_philosophy.md` for the full guide. Key points:
+
+- **Simple, procedural code** — Functions over classes, no inheritance hierarchies, no "Manager" patterns
+- **Data-oriented** — Use dicts/lists/tuples, not objects hiding data behind getters
+- **Keep logic in SQL** — Let DuckDB do the heavy lifting, don't pull data into Python to transform it
+- **Build minimum that works** — No premature abstraction, three examples before generalizing
+- **Explicit over implicit** — No framework magic, no metaprogramming, no hidden behavior
+- **Question every dependency** — Can you write it simply yourself? Are you using 5% of a large framework?
+
+## Key Configuration
+
+- **Python 3.13** (`.python-version`)
+- **Ruff**: double quotes, spaces, E501 ignored (formatter handles line length)
+- **SQLMesh**: DuckDB dialect, `@daily` cron, start date `2025-07-07`, default env `dev_{{ user() }}`
+- **Storage**: Cloudflare R2 with Iceberg catalog (zero egress cost)
+- **Secrets**: Pulumi ESC (`esc run beanflows/prod -- `)
+- **CI**: GitLab CI (`.gitlab/.gitlab-ci.yml`) — runs pytest and sqlmesh test on push/MR
+- **Pre-commit hooks**: installed via `pre-commit install`
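
The "normalizes ZIP→gzip CSV" step that the new CLAUDE.md attributes to `extract/psdonline/` could look roughly like the sketch below. It is an illustration only, written with the standard library and in the procedural, plain-data style the file asks for. The function names `zip_csv_to_gzip` and `build_sample_zip` and the sample file name are hypothetical and are not taken from the repository.

```python
import gzip
import io
import zipfile


def zip_csv_to_gzip(zip_bytes: bytes) -> bytes:
    """Re-compress the first CSV member of a ZIP archive as gzip (hypothetical helper)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("archive contains no CSV member")
        csv_bytes = archive.read(csv_names[0])
    # mtime=0 keeps the gzip header free of the current timestamp,
    # so identical input yields identical output.
    return gzip.compress(csv_bytes, mtime=0)


def build_sample_zip() -> bytes:
    """Build a tiny in-memory ZIP standing in for a downloaded archive (made-up data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("sample.csv", "commodity,year\ncoffee,2025\n")
    return buffer.getvalue()


gz_bytes = zip_csv_to_gzip(build_sample_zip())
print(gzip.decompress(gz_bytes).decode())
```

Working on bytes in and bytes out keeps the helper independent of where the archive came from or where the result is uploaded, which is one way to keep download, normalization, and upload as separate plain functions.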