Update README with comprehensive project documentation

Added complete project overview including:
- Tech stack and architecture overview
- Quick start guide with UV and Pulumi ESC setup
- Project structure (extract, transform, core packages)
- Development workflow (dependencies, linting, testing)
- Secrets management with ESC examples
- Production architecture explanation
- Architecture principles

Removed outdated content and references to CLAUDE.md (internal memory only).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-21 21:51:52 +02:00
parent d4e6c65f97
commit 3c7a99a699
1 changed files with 156 additions and 14 deletions
--- a/README.md
+++ b/README.md
@@ -1,39 +1,181 @@
-# Materia Environment Setup
+# Materia
-We use `uv` as our Python package manager for faster, more reliable dependency management.
+A commodity data analytics platform built on a modern data engineering stack. It extracts agricultural commodity data from USDA PSD Online, transforms it through a layered SQL pipeline built with SQLMesh, and stores the results in DuckDB and Cloudflare R2 for analysis.
 https://docs.astral.sh/uv/
-We recommend using vscode as your IDE.
+## Tech Stack
-https://code.visualstudio.com/
+
 - **Python 3.13** with `uv` package manager
 - **SQLMesh** for SQL transformation and orchestration
 - **DuckDB** as the analytical database
 - **Cloudflare R2** (Iceberg) for data storage
 - **Pulumi ESC** for secrets management
 - **Hetzner Cloud** for infrastructure
 ## Quick Start
 ### 1. Install UV
 UV is our Python package manager for faster, more reliable dependency management.
 ```bash
 curl -LsSf https://astral.sh/uv/install.sh | sh
 ```
-### 2. Setup the env
+📚 [UV Documentation](https://docs.astral.sh/uv/)
-Simply run:
+
 ### 2. Install Dependencies
 ```bash
 uv sync
 ```
 This will install python & the dependencies declared so far
-### 3. Setup pre-commit
+This installs Python and all dependencies declared in `pyproject.toml`.
 ### 3. Set Up Pre-commit Hooks
 ```bash
 pre-commit install
 ```
-### 4. Adding a dependency
+This enables automatic linting with `ruff` on every commit.
 ### 4. Install Pulumi ESC (for running with secrets)
 ```bash
-uv add requests
+# Install ESC CLI
 curl -fsSL https://get.pulumi.com/esc/install.sh | sh
 # Login
 esc login
 ```
-# Managing a project with uv
+## Project Structure
-https://docs.astral.sh/uv/guides/projects/#managing-dependencies
+This is a `uv` workspace with three main packages:
 ### Extract Layer (`extract/`)
-test
+**psdonline** - Extracts USDA PSD commodity data
 ```bash
 # Local development (downloads to local directory)
 uv run extract_psd
 # Production (uploads to R2)
 esc run beanflows/prod -- uv run extract_psd
 ```
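The local/production split above presumably depends on whether `esc run` has injected R2 credentials into the environment. Here is a minimal sketch of that decision, assuming the extractor checks environment variables. The variable names and `choose_destination` are hypothetical and are not taken from the `psdonline` code:

```python
import os
from collections.abc import Mapping

# Hypothetical variable names; the real extractor may use different ones.
R2_SECRET_VARS = ("R2_ACCESS_KEY_ID", "R2_SECRET_ACCESS_KEY", "R2_ENDPOINT")


def choose_destination(env: Mapping[str, str] = os.environ) -> str:
    """Return "r2" when every R2 credential is set, otherwise "local".

    Under `esc run beanflows/prod -- ...` the secrets arrive as environment
    variables; a plain `uv run` leaves them unset.
    """
    if all(env.get(name) for name in R2_SECRET_VARS):
        return "r2"
    return "local"


print(choose_destination({}))  # local
```

Passing the mapping explicitly keeps the function easy to test without touching the real process environment.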
 ### Transform Layer (`transform/sqlmesh_materia/`)
 SQLMesh project implementing a 4-layer data architecture (raw → staging → cleaned → serving).
 **Run all commands from the project root, passing `-p transform/sqlmesh_materia`:**
 ```bash
 # Local development
 esc run beanflows/prod -- uv run sqlmesh -p transform/sqlmesh_materia plan dev_<username>
 # Production
 esc run beanflows/prod -- uv run sqlmesh -p transform/sqlmesh_materia plan prod
 # Run tests (no secrets needed)
 uv run sqlmesh -p transform/sqlmesh_materia test
 # Format SQL
 uv run sqlmesh -p transform/sqlmesh_materia format
 ```
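SQLMesh models are SQL queries, and the four layers form a chain in which each layer selects from the one before it. The toy sketch below shows only that chaining idea. It uses Python's built-in `sqlite3` in place of DuckDB so it runs without extra dependencies. The table and column names are invented and do not reflect the real models:

```python
import sqlite3

# Illustration only: sqlite3 stands in for DuckDB, names are hypothetical.
# Each layer is a view over the layer before it.
LAYERS = [
    # raw: data as extracted, everything stored as text
    "CREATE TABLE raw_psd (commodity TEXT, year TEXT, value TEXT)",
    # staging: cast raw text columns to proper types
    """CREATE VIEW staging_psd AS
       SELECT commodity, CAST(year AS INTEGER) AS year,
              CAST(value AS REAL) AS value
       FROM raw_psd""",
    # cleaned: drop rows missing a commodity or a value
    """CREATE VIEW cleaned_psd AS
       SELECT * FROM staging_psd
       WHERE commodity IS NOT NULL AND value IS NOT NULL""",
    # serving: aggregate into the shape analysts query
    """CREATE VIEW serving_psd AS
       SELECT commodity, year, SUM(value) AS total
       FROM cleaned_psd GROUP BY commodity, year""",
]


def build_pipeline() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    for statement in LAYERS:
        conn.execute(statement)
    return conn


conn = build_pipeline()
conn.executemany(
    "INSERT INTO raw_psd VALUES (?, ?, ?)",
    [("Coffee", "2024", "10.5"), ("Coffee", "2024", "4.5"), (None, "2024", "1")],
)
print(conn.execute("SELECT * FROM serving_psd").fetchall())
# [('Coffee', 2024, 15.0)]
```

Because every downstream layer is a view, changing one layer's logic propagates automatically. SQLMesh adds versioning, environments, and tests on top of this basic dependency chain.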
 ### Core Package (`src/materia/`)
 CLI for managing infrastructure and pipelines (currently minimal).
 ## Development Workflow
 ### Adding Dependencies
 For the workspace root:
 ```bash
 uv add <package-name>
 ```
 For a specific package:
 ```bash
 uv add --package psdonline <package-name>
 ```
 ### Linting and Formatting
 ```bash
 # Check for issues
 ruff check .
 # Auto-fix issues
 ruff check --fix .
 # Format code
 ruff format .
 ```
 ### Running Tests
 ```bash
 # Python tests
 uv run pytest tests/ -v --cov=src/materia
 # SQLMesh tests
 uv run sqlmesh -p transform/sqlmesh_materia test
 ```
 ## Secrets Management
 All secrets are managed via **Pulumi ESC** environment `beanflows/prod`.
 ### Load secrets into shell:
 ```bash
 eval "$(esc env open beanflows/prod --format shell)"
 ```
 ### Run commands with secrets:
 ```bash
 # Single command
 esc run beanflows/prod -- uv run extract_psd
 # Multiple commands (set -e stops at the first failure)
 esc run beanflows/prod -- bash -c "
  set -e
  uv run extract_psd
  uv run sqlmesh -p transform/sqlmesh_materia plan prod
 "
 ```
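Conceptually, `esc run <env> -- <command>` opens the environment and starts the command with its variables set. The following is a simplified Python analogue of that behaviour, not ESC's implementation. `run_with_secrets` and `DEMO_TOKEN` are hypothetical, and the secrets are passed in directly rather than fetched from Pulumi:

```python
import os
import subprocess
import sys
from collections.abc import Mapping, Sequence


def run_with_secrets(cmd: Sequence[str], secrets: Mapping[str, str]) -> int:
    """Run cmd with secrets merged into a copy of the current environment.

    The parent process environment is left untouched, which is the main
    difference from loading secrets into the shell with `eval`.
    """
    env = {**os.environ, **secrets}
    return subprocess.run(cmd, env=env, check=False).returncode


if __name__ == "__main__":
    code = run_with_secrets(
        [sys.executable, "-c", "import os; print(os.environ['DEMO_TOKEN'])"],
        {"DEMO_TOKEN": "not-a-real-secret"},
    )
    print("exit code:", code)
```

This is why `esc run` is the safer default: the secrets exist only for the lifetime of the child process.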
 ## Production Architecture
 ### Git-Based Deployment
 - **Supervisor** (Hetzner CPX11): Always-on orchestrator that pulls the latest code every 15 minutes
 - **Workers** (Ephemeral): Created on-demand for each pipeline run, destroyed after completion
 - **Storage**: Cloudflare R2 Data Catalog (Apache Iceberg REST API)
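The ephemeral-worker model relies on teardown happening even when a pipeline run fails. Below is a minimal sketch of that guarantee using a context manager. `ephemeral_worker` and the create/destroy callables are hypothetical stand-ins, not the project's supervisor code:

```python
from collections.abc import Callable, Iterator
from contextlib import contextmanager


@contextmanager
def ephemeral_worker(
    create: Callable[[], str], destroy: Callable[[str], None]
) -> Iterator[str]:
    """Create a worker, yield its id, and always destroy it afterwards.

    `create` and `destroy` are placeholders for real cloud API calls.
    """
    worker_id = create()
    try:
        yield worker_id
    finally:
        # Runs on success and on failure, so no worker is left running.
        destroy(worker_id)


events: list[str] = []


def fake_create() -> str:
    events.append("create")
    return "worker-1"


def fake_destroy(worker_id: str) -> None:
    events.append(f"destroy {worker_id}")


try:
    with ephemeral_worker(fake_create, fake_destroy) as wid:
        events.append(f"run on {wid}")
        raise RuntimeError("pipeline failed")
except RuntimeError:
    pass

print(events)  # ['create', 'run on worker-1', 'destroy worker-1']
```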
 ### CI/CD Pipeline
 **GitLab CI** runs on every push to master:
 1. **Lint** - `ruff check`
 2. **Test** - pytest + SQLMesh tests
 3. **Deploy** - Updates the supervisor infrastructure and bootstraps it if needed
 No build artifacts: the supervisor pulls code directly from git.
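The three CI stages run in sequence, so a failing lint or test stage should prevent deployment. The sketch below models only that fail-fast ordering in Python. It is not GitLab CI configuration, and the stage callables are placeholders:

```python
from collections.abc import Callable, Sequence

Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: Sequence[Stage]) -> list[str]:
    """Run stages in order and stop at the first one that fails.

    Returns the names of the stages that ran, including the failing one.
    """
    ran: list[str] = []
    for name, step in stages:
        ran.append(name)
        if not step():
            break
    return ran


print(run_pipeline([("lint", lambda: True), ("test", lambda: False), ("deploy", lambda: True)]))
# ['lint', 'test']
```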
 ## Architecture Principles
 - **Simplicity First** - Avoid unnecessary abstractions
 - **Data-Oriented Design** - Identify data by content, not metadata
 - **Cost Optimization** - Ephemeral workers, minimal always-on infrastructure
 - **Inspectable** - Easy to understand, test locally, and debug
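One common reading of "identify data by content" is content addressing. A file's key is a hash of its bytes, so re-downloading an unchanged file can be detected without relying on timestamps or filenames. Here is a minimal sketch, assuming SHA-256 keys and an in-memory dict as the store. The project may implement this differently, and both function names are hypothetical:

```python
import hashlib


def content_key(payload: bytes) -> str:
    """Derive a storage key from the bytes themselves (SHA-256 hex digest)."""
    return hashlib.sha256(payload).hexdigest()


def store_if_new(payload: bytes, store: dict[str, bytes]) -> bool:
    """Store payload under its content key; return False if already present."""
    key = content_key(payload)
    if key in store:
        return False
    store[key] = payload
    return True


store: dict[str, bytes] = {}
print(store_if_new(b"psd export", store))  # True
print(store_if_new(b"psd export", store))  # False
```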
 ## Resources
 - **Architecture Plans**: See `.claude/plans/` for design decisions
 - **UV Docs**: https://docs.astral.sh/uv/
 - **SQLMesh Docs**: https://sqlmesh.readthedocs.io/