CLAUDE.md — Padelnomics

This file tells Claude Code how to work in this repository.

Project Overview

Padelnomics is a SaaS application built with Quart (async Python), HTMX, and SQLite.

It includes a full data pipeline:

External APIs → extract → landing zone → SQLMesh transform → DuckDB → web app

Packages (uv workspace):

  • web/ — Quart + HTMX web application (auth, billing, dashboard)

  • extract/padelnomics_extract/ — data extraction to local landing zone

  • transform/sqlmesh_padelnomics/ — 3-layer SQL transformation (staging → foundation → serving)

  • src/padelnomics/ — CLI utilities, export_serving helper

Skills: invoke these for domain tasks

Working on extraction or transformation?

Use the data-engineer skill for:

  • Designing or reviewing SQLMesh model logic
  • Adding a new data source (extract + staging model)
  • Performance tuning DuckDB queries
  • Data modeling decisions (dimensions, facts, aggregates)
  • Understanding the 3-layer architecture
/data-engineer  (or ask Claude to invoke it)

Working on the web app UI or frontend?

Use the frontend-design skill for UI components, templates, or dashboard layouts.

Working on payments or subscriptions?

Use the paddle-integration skill for billing, webhooks, and subscription logic.

Key commands

# Install all dependencies
uv sync --all-packages

# Lint & format
ruff check .
ruff format .

# Run tests
uv run pytest tests/ -v

# Dev server
make dev

# Extract data
LANDING_DIR=data/landing uv run extract

# SQLMesh plan (dev env by default; `plan prod` targets production) — run from repo root
uv run sqlmesh -p transform/sqlmesh_padelnomics plan
uv run sqlmesh -p transform/sqlmesh_padelnomics plan prod

# Export serving tables (run after SQLMesh)
DUCKDB_PATH=local.duckdb SERVING_DUCKDB_PATH=analytics.duckdb \
    uv run python src/padelnomics/export_serving.py

Architecture documentation

| Topic | File |
|---|---|
| Extraction patterns, state tracking, adding new sources | extract/padelnomics_extract/README.md |
| 3-layer SQLMesh architecture, materialization strategy | transform/sqlmesh_padelnomics/README.md |
| Two-file DuckDB architecture (SQLMesh lock isolation) | src/padelnomics/export_serving.py docstring |
| Email hub: delivery tracking, webhook handler, admin UI | web/src/padelnomics/webhooks.py docstring |
| User flows (all admin + public routes) | docs/USER_FLOWS.md |

Pipeline data flow

data/landing/
  ├── overpass/{year}/{month}/courts.json.gz
  ├── eurostat/{year}/{month}/urb_cpop1.json.gz
  └── playtomic/{year}/{month}/tenants.json.gz

data/lakehouse.duckdb       ← SQLMesh exclusive (staging → foundation → serving)

analytics.duckdb            ← serving tables only, web app read-only
  └── serving.*             ← atomically replaced by export_serving.py
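
The atomic serving hand-off above can be sketched in DuckDB SQL. This is a sketch of how export_serving.py plausibly works, not its actual code; the table name `venue_stats` is illustrative:

```sql
-- Open analytics.duckdb, then attach the SQLMesh lakehouse read-only
ATTACH 'data/lakehouse.duckdb' AS lake (READ_ONLY);
CREATE SCHEMA IF NOT EXISTS serving;

-- CREATE OR REPLACE swaps the table in one transaction, so web-app
-- readers never observe a half-written serving table
CREATE OR REPLACE TABLE serving.venue_stats AS
SELECT * FROM lake.serving.venue_stats;

DETACH lake;
```

The read-only attach is what keeps the web app's analytics.duckdb free of SQLMesh's write locks.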

Backup & disaster recovery

| Data | Tool | Target | Frequency |
|---|---|---|---|
| app.db (auth, billing) | Litestream | R2 padelnomics/app.db | Continuous (WAL) |
| .state.sqlite (extraction state) | Litestream | R2 padelnomics/state.sqlite | Continuous (WAL) |
| data/landing/ (JSON.gz files) | rclone sync | R2 padelnomics/landing/ | Every 30 min (systemd timer) |
| lakehouse.duckdb, analytics.duckdb | N/A (derived) | Re-run pipeline | On demand |
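
The 30-minute landing sync could be wired up roughly like this. These unit files are hypothetical (paths, unit names, and the `r2:` remote name are assumptions, not the deployed config):

```ini
# /etc/systemd/system/landing-sync.service (hypothetical)
[Unit]
Description=Sync landing zone to R2

[Service]
Type=oneshot
ExecStart=/usr/bin/rclone sync /opt/padelnomics/data/landing r2:padelnomics/landing

# /etc/systemd/system/landing-sync.timer (hypothetical)
[Unit]
Description=Run landing sync every 30 minutes

[Timer]
OnCalendar=*:0/30
Persistent=true

[Install]
WantedBy=timers.target
```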

Recovery:

# App database (auto-restored by Litestream container on startup)
litestream restore -config /etc/litestream.yml /app/data/app.db

# Extraction state (auto-restored by Litestream container on startup)
litestream restore -config /etc/litestream.yml /data/landing/.state.sqlite

# Landing zone files
source /opt/padelnomics/.env && bash infra/restore_landing.sh

Secrets management (SOPS + age)

Secrets are stored encrypted in the repo using SOPS with age encryption:

| File | Purpose |
|---|---|
| .env.dev.sops | Dev defaults (safe/blank values) |
| .env.prod.sops | Production secrets |
| .sops.yaml | Maps file patterns to age public keys |

# Decrypt dev secrets to .env (one-time, or after changes)
make secrets-decrypt-dev

# Edit prod secrets (opens in $EDITOR, re-encrypts on save)
make secrets-edit-prod

# deploy.sh auto-decrypts .env.prod.sops → .env on the server

All env vars are defined in the sops files. See .env.dev.sops for the full list (decrypt with make secrets-decrypt-dev to read).
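
For reference, a .sops.yaml that maps those file patterns to age recipients looks roughly like this (the age public key below is a placeholder, not the project's real key):

```yaml
# .sops.yaml (illustrative)
creation_rules:
  - path_regex: \.env\.(dev|prod)\.sops$
    age: age1exampleexampleexampleexampleexampleexampleexampleexampleex
```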

Environment variables

| Variable | Default | Description |
|---|---|---|
| LANDING_DIR | data/landing | Landing zone root (extraction writes here) |
| DUCKDB_PATH | local.duckdb | SQLMesh pipeline DB (exclusive write) |
| SERVING_DUCKDB_PATH | analytics.duckdb | Read-only DB for the web app |
| RESEND_WEBHOOK_SECRET | "" | Resend webhook signature secret (verification skipped if empty) |
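
Reading these in code follows the usual pattern (a sketch; only the variable names and defaults come from the table above):

```python
import os

# Defaults mirror the table above
LANDING_DIR = os.environ.get("LANDING_DIR", "data/landing")
DUCKDB_PATH = os.environ.get("DUCKDB_PATH", "local.duckdb")
SERVING_DUCKDB_PATH = os.environ.get("SERVING_DUCKDB_PATH", "analytics.duckdb")

# An empty secret means webhook signature verification is skipped
RESEND_WEBHOOK_SECRET = os.environ.get("RESEND_WEBHOOK_SECRET", "")
VERIFY_WEBHOOKS = bool(RESEND_WEBHOOK_SECRET)
```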

uv workspace management

# Install everything (run from repo root)
uv sync --all-packages --all-groups

# Add a dependency to an existing package
uv add --package padelnomics <package>
uv add --package padelnomics-web duckdb

# Add a new extraction package (if splitting extract further)
uv init --package extract/new_source
uv add --package new_source padelnomics-extract niquests
# Then add to [tool.uv.workspace] members in pyproject.toml

Always use uv CLI to manage dependencies — never edit pyproject.toml manually for dependency changes.
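
The final manual step above touches the workspace member list, which lives in the root pyproject.toml and looks roughly like this (the member entries shown are illustrative; check the actual file):

```toml
# Root pyproject.toml (illustrative member list)
[tool.uv.workspace]
members = [
    "web",
    "extract/padelnomics_extract",
    "transform/sqlmesh_padelnomics",
    "extract/new_source",   # added after `uv init --package`
]
```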

Data modeling patterns

Foundation layer is the ontology. Dimension tables conform identifiers across all data sources:

  • dim_venues maps Overpass, Playtomic, and other source identifiers to a single row per venue
  • dim_cities conforms city/municipality identifiers across Eurostat, Overpass, and geocoding results
  • New data sources add columns to existing dims, not new tables
  • Facts join to dims via surrogate keys (MD5 hash keys generated in staging)
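
The surrogate-key convention can be sketched in Python. This is illustrative only: the real keys are generated in staging SQL, and the separator and normalization shown here are assumptions:

```python
import hashlib

def surrogate_key(*natural_key_parts: str) -> str:
    """MD5 hash over the natural key, mirroring the staging-layer convention."""
    # Join with a separator so ("ab", "c") and ("a", "bc") hash differently
    raw = "||".join(p.strip().lower() for p in natural_key_parts)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()

# e.g. a venue key from source system + source identifier (hypothetical columns)
venue_sk = surrogate_key("playtomic", "tenant-123")
```

The same natural key always yields the same surrogate key, which is what lets facts and dims built in separate models join deterministically.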

Extraction pattern:

  • State tracked in SQLite ({LANDING_DIR}/.state.sqlite, WAL mode), not DuckDB: state tracking is an OLTP workload
  • Landing zone is immutable and content-addressed: {LANDING_DIR}/{source}/{partitions}/{hash}.ext
  • Adding a new source: create package, add to workflows.toml, add staging + foundation models
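
The first two points can be sketched together. This is a minimal illustration, not the extract package's actual code; the hash algorithm, table schema, and function names are assumptions:

```python
import gzip
import hashlib
import sqlite3
from pathlib import Path

def land(payload: bytes, landing_dir: Path, source: str, partitions: str) -> Path:
    """Write a content-addressed, immutable landing file; return its path."""
    digest = hashlib.sha256(payload).hexdigest()  # hash choice is an assumption
    path = landing_dir / source / partitions / f"{digest}.json.gz"
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():  # idempotent: same content -> same path, written once
        path.write_bytes(gzip.compress(payload))
    return path

def record_state(landing_dir: Path, source: str, cursor: str) -> None:
    """Track extraction progress in SQLite (WAL mode), not DuckDB."""
    con = sqlite3.connect(landing_dir / ".state.sqlite")
    con.execute("PRAGMA journal_mode=WAL")
    con.execute(
        "CREATE TABLE IF NOT EXISTS state (source TEXT PRIMARY KEY, cursor TEXT)"
    )
    con.execute(
        "INSERT INTO state VALUES (?, ?) "
        "ON CONFLICT(source) DO UPDATE SET cursor = excluded.cursor",
        (source, cursor),
    )
    con.commit()
    con.close()
```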

Coding philosophy

  • Simple and procedural — functions over classes, no "Manager" patterns
  • Idempotent operations — running twice produces the same result
  • Explicit assertions — assert preconditions at function boundaries
  • Bounded operations — set timeouts, page limits, buffer sizes
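
A hypothetical helper showing all four principles together (not from the codebase):

```python
import json
from pathlib import Path

def write_report(rows: list[dict], out_path: Path, max_rows: int = 10_000) -> Path:
    """Simple and procedural: one function, no Manager class."""
    # Explicit assertions at the function boundary
    assert rows, "rows must not be empty"
    # Bounded: refuse unexpectedly large inputs instead of growing without limit
    assert len(rows) <= max_rows, f"refusing to write more than {max_rows} rows"

    # Idempotent: the same input always produces the same file content
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(rows, sort_keys=True))
    return out_path
```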

Read coding_philosophy.md (if present) for the full guide.