diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
index 609f411..eddca42 100644
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -59,7 +59,7 @@ ruff format .
 uv run pytest tests/ -v
 
 # Dev server
-./scripts/dev_run.sh
+make dev
 
 # Extract data
 LANDING_DIR=data/landing uv run extract
@@ -152,6 +152,37 @@ All env vars are defined in the sops files. See `.env.dev.sops` for the full lis
 | `RESEND_WEBHOOK_SECRET` | `""` | Resend webhook signature secret (skip verification if empty) |
 
 
+## uv workspace management
+
+```bash
+# Install everything (run from repo root)
+uv sync --all-packages --all-groups
+
+# Add a dependency to an existing package
+uv add --package padelnomics
+uv add --package padelnomics-web duckdb
+
+# Add a new extraction package (if splitting extract further)
+uv init --package extract/new_source
+uv add --package new_source padelnomics-extract niquests
+# Then add to [tool.uv.workspace] members in pyproject.toml
+```
+
+Always use `uv` CLI to manage dependencies — never edit `pyproject.toml` manually for dependency changes.
+
+## Data modeling patterns
+
+**Foundation layer is the ontology.** Dimension tables conform identifiers across all data sources:
+- `dim_venues` maps Overpass, Playtomic, and other source identifiers to a single row per venue
+- `dim_cities` conforms city/municipality identifiers across Eurostat, Overpass, and geocoding results
+- New data sources add columns to existing dims, not new tables
+- Facts join to dims via surrogate keys (MD5 hash keys generated in staging)
+
+**Extraction pattern:**
+- State tracked in SQLite (`{LANDING_DIR}/.state.sqlite`, WAL mode) — not DuckDB; it's OLTP
+- Landing zone is immutable and content-addressed: `{LANDING_DIR}/{source}/{partitions}/{hash}.ext`
+- Adding a new source: create package, add to workflows.toml, add staging + foundation models
+
 ## Coding philosophy
 
 - **Simple and procedural** — functions over classes, no "Manager" patterns
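
Note on the extraction pattern added in this diff: the content-addressed landing path and the WAL-mode SQLite state file could look roughly like the sketch below. This is only an illustration and not code from the repository. The helper names (`landing_path`, `write_immutable`, `open_state`), the use of SHA-256 for `{hash}`, the `landed` table, and the `2024/01` partition are all assumptions; the diff only specifies the path template and the `.state.sqlite` location.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def landing_path(landing_dir: Path, source: str, partitions: str,
                 payload: bytes, ext: str) -> Path:
    """Build {landing_dir}/{source}/{partitions}/{hash}.{ext} from the payload bytes."""
    # Assumption: SHA-256 of the raw payload; the diff does not name the hash function.
    digest = hashlib.sha256(payload).hexdigest()
    return landing_dir / source / partitions / f"{digest}.{ext}"


def write_immutable(path: Path, payload: bytes) -> bool:
    """Write the payload once; return False if that content has already landed."""
    if path.exists():
        return False  # same hash means same content, so never overwrite
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(payload)
    return True


def open_state(landing_dir: Path) -> sqlite3.Connection:
    """Open {landing_dir}/.state.sqlite in WAL mode."""
    landing_dir.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(landing_dir / ".state.sqlite")
    conn.execute("PRAGMA journal_mode=WAL")
    # Hypothetical state table; the real schema is not shown in the diff.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS landed (source TEXT, path TEXT PRIMARY KEY)"
    )
    return conn


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        payload = b'{"venues": []}'
        target = landing_path(root, "overpass", "2024/01", payload, "json")
        created = write_immutable(target, payload)
        conn = open_state(root)
        if created:
            conn.execute("INSERT INTO landed VALUES (?, ?)", ("overpass", str(target)))
            conn.commit()
        conn.close()
```

Because the file name is derived from the content, re-running an extraction that fetches identical bytes becomes a no-op, which is what keeps the landing zone immutable without extra bookkeeping.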