padelnomics/.claude/CLAUDE.md
Deeman 518a4e4fe2 docs(claude): add uv workspace management + data modeling patterns
- uv workspace section: sync all-packages, add deps, create new source package
- Data modeling patterns: foundation-as-ontology (dim_venues, dim_cities
  conform cross-source identifiers); extraction pattern notes (state SQLite)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 12:15:24 +01:00


# CLAUDE.md — Padelnomics
This file tells Claude Code how to work in this repository.
## Project Overview
Padelnomics is a SaaS application built with Quart (async Python), HTMX, and SQLite.
It includes a full data pipeline:
```
External APIs → extract → landing zone → SQLMesh transform → DuckDB → web app
```
**Packages** (uv workspace):
- `web/` — Quart + HTMX web application (auth, billing, dashboard)
- `extract/padelnomics_extract/` — data extraction to local landing zone
- `transform/sqlmesh_padelnomics/` — 3-layer SQL transformation (staging → foundation → serving)
- `src/padelnomics/` — CLI utilities, export_serving helper
## Skills: invoke these for domain tasks
### Working on extraction or transformation?
Use the **`data-engineer`** skill for:
- Designing or reviewing SQLMesh model logic
- Adding a new data source (extract + staging model)
- Performance tuning DuckDB queries
- Data modeling decisions (dimensions, facts, aggregates)
- Understanding the 3-layer architecture
```
/data-engineer (or ask Claude to invoke it)
```
### Working on the web app UI or frontend?
Use the **`frontend-design`** skill for UI components, templates, or dashboard layouts.
### Working on payments or subscriptions?
Use the **`paddle-integration`** skill for billing, webhooks, and subscription logic.
## Key commands
```bash
# Install all dependencies
uv sync --all-packages
# Lint & format
ruff check .
ruff format .
# Run tests
uv run pytest tests/ -v
# Dev server
make dev
# Extract data
LANDING_DIR=data/landing uv run extract
# SQLMesh plan: dev environment, then prod (run from repo root)
uv run sqlmesh -p transform/sqlmesh_padelnomics plan
uv run sqlmesh -p transform/sqlmesh_padelnomics plan prod
# Export serving tables (run after SQLMesh)
DUCKDB_PATH=local.duckdb SERVING_DUCKDB_PATH=analytics.duckdb \
uv run python src/padelnomics/export_serving.py
```
## Architecture documentation
| Topic | File |
|-------|------|
| Extraction patterns, state tracking, adding new sources | `extract/padelnomics_extract/README.md` |
| 3-layer SQLMesh architecture, materialization strategy | `transform/sqlmesh_padelnomics/README.md` |
| Two-file DuckDB architecture (SQLMesh lock isolation) | `src/padelnomics/export_serving.py` docstring |
| Email hub: delivery tracking, webhook handler, admin UI | `web/src/padelnomics/webhooks.py` docstring |
| User flows (all admin + public routes) | `docs/USER_FLOWS.md` |
## Pipeline data flow
```
data/landing/
├── overpass/{year}/{month}/courts.json.gz
├── eurostat/{year}/{month}/urb_cpop1.json.gz
└── playtomic/{year}/{month}/tenants.json.gz
data/lakehouse.duckdb ← SQLMesh exclusive (staging → foundation → serving)
analytics.duckdb ← serving tables only, web app read-only
└── serving.* ← atomically replaced by export_serving.py
```
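The "atomically replaced" step above can be sketched as follows. This is a minimal illustration of the swap technique, not the actual `export_serving.py` (the function and the `build` callback are assumptions): the new database is fully written to a temp file in the same directory, then swapped in with `os.replace()` so readers never observe a half-written file.

```python
import os
import pathlib
import tempfile

def atomic_export(build, target_path: str) -> None:
    """Write a fresh file next to target_path, then atomically swap it in.

    `build` receives the temp path and must fully write the new file.
    Readers holding the old file keep a valid handle; new opens see the
    new file. os.replace() is only atomic when both paths are on the same
    filesystem, which is why the temp file lives in the target directory.
    """
    directory = os.path.dirname(os.path.abspath(target_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".duckdb.tmp")
    os.close(fd)
    try:
        build(tmp_path)                    # e.g. COPY serving.* tables here
        os.replace(tmp_path, target_path)  # atomic swap
    except BaseException:
        os.unlink(tmp_path)
        raise

# usage sketch — a real build() would open tmp_path with duckdb and
# copy the serving tables across; here a byte-string stands in
target = os.path.join(tempfile.mkdtemp(), "analytics.duckdb")
atomic_export(lambda p: pathlib.Path(p).write_bytes(b"serving tables"), target)
```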
## Backup & disaster recovery
| Data | Tool | Target | Frequency |
|------|------|--------|-----------|
| `app.db` (auth, billing) | Litestream | R2 `padelnomics/app.db` | Continuous (WAL) |
| `.state.sqlite` (extraction state) | Litestream | R2 `padelnomics/state.sqlite` | Continuous (WAL) |
| `data/landing/` (JSON.gz files) | rclone sync | R2 `padelnomics/landing/` | Every 30 min (systemd timer) |
| `lakehouse.duckdb`, `analytics.duckdb` | N/A (derived) | Re-run pipeline | On demand |
Recovery:
```bash
# App database (auto-restored by Litestream container on startup)
litestream restore -config /etc/litestream.yml /app/data/app.db
# Extraction state (auto-restored by Litestream container on startup)
litestream restore -config /etc/litestream.yml /data/landing/.state.sqlite
# Landing zone files
source /opt/padelnomics/.env && bash infra/restore_landing.sh
```
## Secrets management (SOPS + age)
Secrets are stored encrypted in the repo using SOPS with age encryption:
| File | Purpose |
|------|---------|
| `.env.dev.sops` | Dev defaults (safe/blank values) |
| `.env.prod.sops` | Production secrets |
| `.sops.yaml` | Maps file patterns to age public keys |
```bash
# Decrypt dev secrets to .env (one-time, or after changes)
make secrets-decrypt-dev
# Edit prod secrets (opens in $EDITOR, re-encrypts on save)
make secrets-edit-prod
# deploy.sh auto-decrypts .env.prod.sops → .env on the server
```
All env vars are defined in the sops files. See `.env.dev.sops` for the full list
(decrypt with `make secrets-decrypt-dev` to read).
## Environment variables
| Variable | Default | Description |
|----------|---------|-------------|
| `LANDING_DIR` | `data/landing` | Landing zone root (extraction writes here) |
| `DUCKDB_PATH` | `local.duckdb` | SQLMesh pipeline DB (exclusive write) |
| `SERVING_DUCKDB_PATH` | `analytics.duckdb` | Read-only DB for web app |
| `RESEND_WEBHOOK_SECRET` | `""` | Resend webhook signature secret (skip verification if empty) |
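A helper that resolves these variables with the documented defaults could look like this (the helper itself is illustrative, not existing project code; only the names and defaults come from the table above):

```python
import os

def pipeline_config() -> dict[str, str]:
    """Resolve pipeline settings from the environment, falling back to
    the documented defaults from the table above."""
    return {
        "LANDING_DIR": os.environ.get("LANDING_DIR", "data/landing"),
        "DUCKDB_PATH": os.environ.get("DUCKDB_PATH", "local.duckdb"),
        "SERVING_DUCKDB_PATH": os.environ.get("SERVING_DUCKDB_PATH", "analytics.duckdb"),
        "RESEND_WEBHOOK_SECRET": os.environ.get("RESEND_WEBHOOK_SECRET", ""),
    }

cfg = pipeline_config()
```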
## uv workspace management
```bash
# Install everything (run from repo root)
uv sync --all-packages --all-groups
# Add a dependency to an existing package
uv add --package padelnomics <package>
uv add --package padelnomics-web duckdb
# Add a new extraction package (if splitting extract further)
uv init --package extract/new_source
uv add --package new_source padelnomics-extract niquests
# Then add to [tool.uv.workspace] members in pyproject.toml
```
Always use `uv` CLI to manage dependencies — never edit `pyproject.toml` manually for dependency changes.
## Data modeling patterns
**Foundation layer is the ontology.** Dimension tables conform identifiers across all data sources:
- `dim_venues` maps Overpass, Playtomic, and other source identifiers to a single row per venue
- `dim_cities` conforms city/municipality identifiers across Eurostat, Overpass, and geocoding results
- New data sources add columns to existing dims, not new tables
- Facts join to dims via surrogate keys (MD5 hash keys generated in staging)
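The MD5 surrogate keys could be generated along these lines (the separator and column order are assumptions — the real staging models define their own convention):

```python
import hashlib

def surrogate_key(*parts: str) -> str:
    """Build a deterministic MD5 hash key from natural-key columns.

    The same inputs always produce the same key, so staging runs are
    idempotent and facts can join to dims without sharing raw source IDs.
    A separator prevents collisions like ("ab", "c") vs ("a", "bc").
    """
    raw = "|".join(parts)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()

# hypothetical natural key: source system + its venue identifier
venue_key = surrogate_key("playtomic", "tenant-123")
```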
**Extraction pattern:**
- State is tracked in SQLite (`{LANDING_DIR}/.state.sqlite`, WAL mode), not DuckDB: state updates are small OLTP-style writes, which is what SQLite is built for
- Landing zone is immutable and content-addressed: `{LANDING_DIR}/{source}/{partitions}/{hash}.ext`
- Adding a new source: create package, add to workflows.toml, add staging + foundation models
## Coding philosophy
- **Simple and procedural** — functions over classes, no "Manager" patterns
- **Idempotent operations** — running twice produces the same result
- **Explicit assertions** — assert preconditions at function boundaries
- **Bounded operations** — set timeouts, page limits, buffer sizes
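A function written in this style might look like the following (a made-up example, not project code): procedural, asserted at its boundary, bounded by a hard page limit, and side-effect free so repeated calls agree.

```python
def fetch_pages(fetch, max_pages: int = 50, timeout_s: float = 10.0) -> list:
    """Collect paginated results in the house style.

    - explicit assertions on preconditions at the function boundary
    - a hard page limit bounds the loop, so it can never run forever
    - no state is mutated, so calling it twice yields the same result
    """
    assert max_pages > 0, "page limit must be positive"
    assert timeout_s > 0, "timeout must be positive"
    results = []
    for page in range(max_pages):          # bounded iteration
        batch = fetch(page, timeout_s)
        if not batch:                      # empty page means we're done
            break
        results.extend(batch)
    return results
```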
Read `coding_philosophy.md` (if present) for the full guide.