CLAUDE.md — Padelnomics

This file tells Claude Code how to work in this repository.

Project Overview

Padelnomics is a SaaS application built with Quart (async Python), HTMX, and SQLite.

It includes a full data pipeline:

External APIs → extract → landing zone → SQLMesh transform → DuckDB → web app

Packages (uv workspace):

  • web/ — Quart + HTMX web application (auth, billing, dashboard)

  • extract/padelnomics_extract/ — data extraction to local landing zone

  • transform/sqlmesh_padelnomics/ — 3-layer SQL transformation (staging → foundation → serving)

  • src/padelnomics/ — CLI utilities, export_serving helper

Skills: invoke these for domain tasks

Working on extraction or transformation?

Use the data-engineer skill for:

  • Designing or reviewing SQLMesh model logic
  • Adding a new data source (extract + staging model)
  • Performance tuning DuckDB queries
  • Data modeling decisions (dimensions, facts, aggregates)
  • Understanding the 3-layer architecture
/data-engineer  (or ask Claude to invoke it)

Working on the web app UI or frontend?

Use the frontend-design skill for UI components, templates, or dashboard layouts.

Working on payments or subscriptions?

Use the paddle-integration skill for billing, webhooks, and subscription logic.

Key commands

# Install all dependencies
uv sync --all-packages

# Lint & format
ruff check .
ruff format .

# Run tests
uv run pytest tests/ -v

# Dev server
make dev

# Extract data
LANDING_DIR=data/landing uv run extract

# SQLMesh plan (dev env by default; `plan prod` targets production) — run from repo root
uv run sqlmesh -p transform/sqlmesh_padelnomics plan
uv run sqlmesh -p transform/sqlmesh_padelnomics plan prod

# Export serving tables (run after SQLMesh)
DUCKDB_PATH=local.duckdb SERVING_DUCKDB_PATH=analytics.duckdb \
    uv run python src/padelnomics/export_serving.py

Architecture documentation

| Topic | File |
|---|---|
| Extraction patterns, state tracking, adding new sources | extract/padelnomics_extract/README.md |
| 3-layer SQLMesh architecture, materialization strategy | transform/sqlmesh_padelnomics/README.md |
| Two-file DuckDB architecture (SQLMesh lock isolation) | src/padelnomics/export_serving.py docstring |
| Email hub: delivery tracking, webhook handler, admin UI | web/src/padelnomics/webhooks.py docstring |
| User flows (all admin + public routes) | docs/USER_FLOWS.md |

Pipeline data flow

data/landing/
  ├── overpass/{year}/{month}/courts.json.gz
  ├── eurostat/{year}/{month}/urb_cpop1.json.gz
  └── playtomic/{year}/{month}/tenants.json.gz

data/lakehouse.duckdb       ← SQLMesh exclusive (staging → foundation → serving)

analytics.duckdb            ← serving tables only, web app read-only
  └── serving.*             ← atomically replaced by export_serving.py
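
The atomic serving hand-off above can be sketched in DuckDB SQL. This is a sketch of how export_serving.py plausibly works, not its actual code; the table name `venue_stats` is illustrative:

```sql
-- Open analytics.duckdb, then attach the SQLMesh lakehouse read-only
ATTACH 'data/lakehouse.duckdb' AS lake (READ_ONLY);
CREATE SCHEMA IF NOT EXISTS serving;

-- CREATE OR REPLACE swaps the table in one transaction, so web-app
-- readers never observe a half-written serving table
CREATE OR REPLACE TABLE serving.venue_stats AS
SELECT * FROM lake.serving.venue_stats;

DETACH lake;
```

The read-only attach is what keeps the web app's analytics.duckdb free of SQLMesh's write locks.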

Backup & disaster recovery

| Data | Tool | Target | Frequency |
|---|---|---|---|
| app.db (auth, billing) | Litestream | R2 padelnomics/app.db | Continuous (WAL) |
| .state.sqlite (extraction state) | Litestream | R2 padelnomics/state.sqlite | Continuous (WAL) |
| data/landing/ (JSON.gz files) | rclone sync | R2 padelnomics/landing/ | Every 30 min (systemd timer) |
| lakehouse.duckdb, analytics.duckdb | N/A (derived) | Re-run pipeline | On demand |
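
The 30-minute landing sync could be wired up roughly like this. These unit files are hypothetical (paths, unit names, and the `r2:` remote name are assumptions, not the deployed config):

```ini
# /etc/systemd/system/landing-sync.service (hypothetical)
[Unit]
Description=Sync landing zone to R2

[Service]
Type=oneshot
ExecStart=/usr/bin/rclone sync /opt/padelnomics/data/landing r2:padelnomics/landing

# /etc/systemd/system/landing-sync.timer (hypothetical)
[Unit]
Description=Run landing sync every 30 minutes

[Timer]
OnCalendar=*:0/30
Persistent=true

[Install]
WantedBy=timers.target
```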

Recovery:

# App database (auto-restored by Litestream container on startup)
litestream restore -config /etc/litestream.yml /app/data/app.db

# Extraction state (auto-restored by Litestream container on startup)
litestream restore -config /etc/litestream.yml /data/landing/.state.sqlite

# Landing zone files
source /opt/padelnomics/.env && bash infra/restore_landing.sh

Secrets management (SOPS + age)

Secrets are stored encrypted in the repo using SOPS with age encryption:

| File | Purpose |
|---|---|
| .env.dev.sops | Dev defaults (safe/blank values) |
| .env.prod.sops | Production secrets |
| .sops.yaml | Maps file patterns to age public keys |

# Decrypt dev secrets to .env (one-time, or after changes)
make secrets-decrypt-dev

# Edit prod secrets (opens in $EDITOR, re-encrypts on save)
make secrets-edit-prod

# deploy.sh auto-decrypts .env.prod.sops → .env on the server

All env vars are defined in the sops files. See .env.dev.sops for the full list (decrypt with make secrets-decrypt-dev to read).
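
For reference, a .sops.yaml that maps those file patterns to age recipients looks roughly like this (the age public key below is a placeholder, not the project's real key):

```yaml
# .sops.yaml (illustrative)
creation_rules:
  - path_regex: \.env\.(dev|prod)\.sops$
    age: age1exampleexampleexampleexampleexampleexampleexampleexampleex
```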

Environment variables

| Variable | Default | Description |
|---|---|---|
| LANDING_DIR | data/landing | Landing zone root (extraction writes here) |
| DUCKDB_PATH | local.duckdb | SQLMesh pipeline DB (exclusive write) |
| SERVING_DUCKDB_PATH | analytics.duckdb | Read-only DB for the web app |
| RESEND_WEBHOOK_SECRET | "" | Resend webhook signature secret (verification skipped if empty) |
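
Reading these in code follows the usual pattern (a sketch; only the variable names and defaults come from the table above):

```python
import os

# Defaults mirror the table above
LANDING_DIR = os.environ.get("LANDING_DIR", "data/landing")
DUCKDB_PATH = os.environ.get("DUCKDB_PATH", "local.duckdb")
SERVING_DUCKDB_PATH = os.environ.get("SERVING_DUCKDB_PATH", "analytics.duckdb")

# An empty secret means webhook signature verification is skipped
RESEND_WEBHOOK_SECRET = os.environ.get("RESEND_WEBHOOK_SECRET", "")
VERIFY_WEBHOOKS = bool(RESEND_WEBHOOK_SECRET)
```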

uv workspace management

# Install everything (run from repo root)
uv sync --all-packages --all-groups

# Add a dependency to an existing package
uv add --package padelnomics <package>
uv add --package padelnomics-web duckdb

# Add a new extraction package (if splitting extract further)
uv init --package extract/new_source
uv add --package new_source padelnomics-extract niquests
# Then add to [tool.uv.workspace] members in pyproject.toml

Always use uv CLI to manage dependencies — never edit pyproject.toml manually for dependency changes.
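
The final manual step above touches the workspace member list, which lives in the root pyproject.toml and looks roughly like this (the member entries shown are illustrative; check the actual file):

```toml
# Root pyproject.toml (illustrative member list)
[tool.uv.workspace]
members = [
    "web",
    "extract/padelnomics_extract",
    "transform/sqlmesh_padelnomics",
    "extract/new_source",   # added after `uv init --package`
]
```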

Data modeling patterns

Foundation layer is the ontology. Dimension tables conform identifiers across all data sources:

  • dim_venues maps Overpass, Playtomic, and other source identifiers to a single row per venue
  • dim_cities conforms city/municipality identifiers across Eurostat, Overpass, and geocoding results
  • New data sources add columns to existing dims, not new tables
  • Facts join to dims via surrogate keys (MD5 hash keys generated in staging)
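
The surrogate-key convention can be sketched in Python. This is illustrative only: the real keys are generated in staging SQL, and the separator and normalization shown here are assumptions:

```python
import hashlib

def surrogate_key(*natural_key_parts: str) -> str:
    """MD5 hash over the natural key, mirroring the staging-layer convention."""
    # Join with a separator so ("ab", "c") and ("a", "bc") hash differently
    raw = "||".join(p.strip().lower() for p in natural_key_parts)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()

# e.g. a venue key from source system + source identifier (hypothetical columns)
venue_sk = surrogate_key("playtomic", "tenant-123")
```

The same natural key always yields the same surrogate key, which is what lets facts and dims built in separate models join deterministically.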

Extraction pattern:

  • State tracked in SQLite ({LANDING_DIR}/.state.sqlite, WAL mode), not DuckDB: state tracking is an OLTP workload
  • Landing zone is immutable and content-addressed: {LANDING_DIR}/{source}/{partitions}/{hash}.ext
  • Adding a new source: create package, add to workflows.toml, add staging + foundation models
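
The first two points can be sketched together. This is a minimal illustration, not the extract package's actual code; the hash algorithm, table schema, and function names are assumptions:

```python
import gzip
import hashlib
import sqlite3
from pathlib import Path

def land(payload: bytes, landing_dir: Path, source: str, partitions: str) -> Path:
    """Write a content-addressed, immutable landing file; return its path."""
    digest = hashlib.sha256(payload).hexdigest()  # hash choice is an assumption
    path = landing_dir / source / partitions / f"{digest}.json.gz"
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():  # idempotent: same content -> same path, written once
        path.write_bytes(gzip.compress(payload))
    return path

def record_state(landing_dir: Path, source: str, cursor: str) -> None:
    """Track extraction progress in SQLite (WAL mode), not DuckDB."""
    con = sqlite3.connect(landing_dir / ".state.sqlite")
    con.execute("PRAGMA journal_mode=WAL")
    con.execute(
        "CREATE TABLE IF NOT EXISTS state (source TEXT PRIMARY KEY, cursor TEXT)"
    )
    con.execute(
        "INSERT INTO state VALUES (?, ?) "
        "ON CONFLICT(source) DO UPDATE SET cursor = excluded.cursor",
        (source, cursor),
    )
    con.commit()
    con.close()
```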

Coding philosophy

  • Simple and procedural — functions over classes, no "Manager" patterns
  • Idempotent operations — running twice produces the same result
  • Explicit assertions — assert preconditions at function boundaries
  • Bounded operations — set timeouts, page limits, buffer sizes
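
A hypothetical helper showing all four principles together (not from the codebase):

```python
import json
from pathlib import Path

def write_report(rows: list[dict], out_path: Path, max_rows: int = 10_000) -> Path:
    """Simple and procedural: one function, no Manager class."""
    # Explicit assertions at the function boundary
    assert rows, "rows must not be empty"
    # Bounded: refuse unexpectedly large inputs instead of growing without limit
    assert len(rows) <= max_rows, f"refusing to write more than {max_rows} rows"

    # Idempotent: the same input always produces the same file content
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(rows, sort_keys=True))
    return out_path
```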

Read coding_philosophy.md (if present) for the full guide.