# CLAUDE.md — Padelnomics

This file tells Claude Code how to work in this repository.

## Project Overview

Padelnomics is a SaaS application built with Quart (async Python), HTMX, and SQLite.

It includes a full data pipeline:

```
External APIs → extract → landing zone → SQLMesh transform → DuckDB → web app
```

**Packages** (uv workspace):

- `web/` — Quart + HTMX web application (auth, billing, dashboard)
- `extract/padelnomics_extract/` — data extraction to local landing zone
- `transform/sqlmesh_padelnomics/` — 3-layer SQL transformation (staging → foundation → serving)
- `src/padelnomics/` — CLI utilities, `export_serving` helper
## Skills: invoke these for domain tasks

### Working on extraction or transformation?

Use the **`data-engineer`** skill for:

- Designing or reviewing SQLMesh model logic
- Adding a new data source (extract + staging model)
- Performance tuning DuckDB queries
- Data modeling decisions (dimensions, facts, aggregates)
- Understanding the 3-layer architecture

```
/data-engineer (or ask Claude to invoke it)
```

### Working on the web app UI or frontend?

Use the **`frontend-design`** skill for UI components, templates, or dashboard layouts.

### Working on payments or subscriptions?

Use the **`paddle-integration`** skill for billing, webhooks, and subscription logic.
## Key commands

```bash
# Install all dependencies
uv sync --all-packages

# Lint & format
ruff check .
ruff format .

# Run tests
uv run pytest tests/ -v

# Dev server
./scripts/dev_run.sh

# Extract data
LANDING_DIR=data/landing uv run extract

# SQLMesh plan + run (from repo root)
uv run sqlmesh -p transform/sqlmesh_padelnomics plan
uv run sqlmesh -p transform/sqlmesh_padelnomics plan prod

# Export serving tables (run after SQLMesh)
DUCKDB_PATH=local.duckdb SERVING_DUCKDB_PATH=analytics.duckdb \
  uv run python -m padelnomics.export_serving
```
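The extract, SQLMesh, and export commands above only make sense in that order. A minimal sketch of a wrapper that chains them and stops at the first failure is shown below. It is illustrative, not a script in the repo: the function name is hypothetical, it assumes the SQLMesh config reads `DUCKDB_PATH` (as the environment-variable table suggests), and it assumes `sqlmesh plan` accepts `--no-prompts --auto-apply` so it can run unattended (confirm with `sqlmesh plan --help`).

```bash
# Hypothetical wrapper (not part of the repo): run extract -> SQLMesh -> export
# in order and stop at the first step that fails.
run_local_pipeline() {
    # Defaults mirror the "Environment variables" table; export a variable to override it.
    local landing_dir="${LANDING_DIR:-data/landing}"
    local duckdb_path="${DUCKDB_PATH:-local.duckdb}"
    local serving_path="${SERVING_DUCKDB_PATH:-analytics.duckdb}"

    # 1. Extract raw JSON.gz files into the landing zone.
    LANDING_DIR="$landing_dir" uv run extract || return 1

    # 2. Plan and apply SQLMesh models without waiting for interactive input.
    #    The flags are an assumption; confirm with `sqlmesh plan --help`.
    DUCKDB_PATH="$duckdb_path" \
        uv run sqlmesh -p transform/sqlmesh_padelnomics plan --no-prompts --auto-apply || return 1

    # 3. Export serving tables into the separate, read-only DB used by the web app.
    DUCKDB_PATH="$duckdb_path" SERVING_DUCKDB_PATH="$serving_path" \
        uv run python -m padelnomics.export_serving || return 1
}
```

Source the file and call `run_local_pipeline`; to build into the `data/lakehouse.duckdb` file from the data-flow diagram instead of the default, export `DUCKDB_PATH=data/lakehouse.duckdb` first.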
## Architecture documentation

| Topic | File |
|-------|------|
| Extraction patterns, state tracking, adding new sources | `extract/padelnomics_extract/README.md` |
| 3-layer SQLMesh architecture, materialization strategy | `transform/sqlmesh_padelnomics/README.md` |
| Two-file DuckDB architecture (SQLMesh lock isolation) | `src/padelnomics/export_serving.py` docstring |
## Pipeline data flow

```
data/landing/
├── overpass/{year}/{month}/courts.json.gz
├── eurostat/{year}/{month}/urb_cpop1.json.gz
└── playtomic/{year}/{month}/tenants.json.gz

data/lakehouse.duckdb  ← SQLMesh exclusive (staging → foundation → serving)

analytics.duckdb       ← serving tables only, web app read-only
└── serving.*          ← atomically replaced by export_serving.py
```
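Every source follows the same `{source}/{year}/{month}/{dataset}.json.gz` layout under the landing root. The sketch below builds such a path in shell, for example when inspecting or backfilling files by hand. It is an illustration, not the extractor's implementation (see `extract/padelnomics_extract/README.md` for that); the function name, the `YYYY-MM` argument, and the zero-padded month are assumptions.

```bash
# Hypothetical helper (not the extractor's code): print the landing path for one file,
#   $LANDING_DIR/<source>/<year>/<month>/<dataset>.json.gz
# Usage: landing_path SOURCE DATASET [YYYY-MM]   # period defaults to the current UTC month
landing_path() {
    local src="${1:-}" name="${2:-}" period="${3:-$(date -u +%Y-%m)}"

    # Assert preconditions at the boundary instead of building a malformed path.
    if [ -z "$src" ] || [ -z "$name" ]; then
        echo "usage: landing_path SOURCE DATASET [YYYY-MM]" >&2
        return 2
    fi
    case "$period" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]) ;;  # assumed zero-padded month, e.g. 2024-03
        *) echo "landing_path: bad period '$period' (expected YYYY-MM)" >&2; return 2 ;;
    esac

    # Split YYYY-MM into the year and month directory levels.
    printf '%s/%s/%s/%s/%s.json.gz\n' \
        "${LANDING_DIR:-data/landing}" "$src" "${period%-*}" "${period#*-}" "$name"
}
```

With `LANDING_DIR` unset, `landing_path playtomic tenants 2024-05` prints `data/landing/playtomic/2024/05/tenants.json.gz`.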
## Backup & disaster recovery

| Data | Tool | Target | Frequency |
|------|------|--------|-----------|
| `app.db` (auth, billing) | Litestream | R2 `padelnomics/app.db` | Continuous (WAL) |
| `.state.sqlite` (extraction state) | Litestream | R2 `padelnomics/state.sqlite` | Continuous (WAL) |
| `data/landing/` (JSON.gz files) | rclone sync | R2 `padelnomics/landing/` | Every 30 min (systemd timer) |
| `lakehouse.duckdb`, `analytics.duckdb` | N/A (derived) | Re-run pipeline | On demand |

Recovery:

```bash
# App database (auto-restored by Litestream container on startup)
litestream restore -config /etc/litestream.yml /app/data/app.db

# Extraction state (auto-restored by Litestream container on startup)
litestream restore -config /etc/litestream.yml /data/landing/.state.sqlite

# Landing zone files
source /opt/padelnomics/.env && bash infra/restore_landing.sh
```
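The rclone job behind the systemd timer and the two Litestream restores above could be wrapped as shell functions along these lines. This is a sketch, not the contents of `infra/`: the function names and the rclone remote name `r2` are hypothetical. Limiting the sync to `*.json.gz` keeps `.state.sqlite` (which the restore path above suggests lives inside the landing directory) and its WAL files out of rclone, since Litestream already replicates that database and copying it mid-write is unsafe. `-if-db-not-exists` and `-if-replica-exists` are `litestream restore` flags that make the restore a no-op when the local file already exists or no replica is found, so the function can be re-run safely.

```bash
# Hypothetical sketch of the backup and restore jobs in the table above.

# Mirror the landing zone to R2 (what the 30-minute systemd timer would run).
# Assumes an rclone remote named "r2" configured for the existing R2 bucket.
sync_landing() {
    local landing_dir="${LANDING_DIR:-data/landing}"
    if [ ! -d "$landing_dir" ]; then
        echo "sync_landing: $landing_dir does not exist" >&2
        return 1
    fi
    # With --include, rclone skips every file that does not match, so the live
    # .state.sqlite and its WAL/Litestream files are never copied.
    rclone sync "$landing_dir" "r2:padelnomics/landing/" --include "*.json.gz"
}

# Restore both SQLite databases from their Litestream replicas, stopping on error.
restore_sqlite_dbs() {
    local db
    for db in /app/data/app.db /data/landing/.state.sqlite; do
        litestream restore -config /etc/litestream.yml \
            -if-db-not-exists -if-replica-exists "$db" || return 1
    done
}
```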
## Environment variables

| Variable | Default | Description |
|----------|---------|-------------|
| `LANDING_DIR` | `data/landing` | Landing zone root (extraction writes here) |
| `DUCKDB_PATH` | `local.duckdb` | SQLMesh pipeline DB (exclusive write) |
| `SERVING_DUCKDB_PATH` | `analytics.duckdb` | Read-only DB for web app |
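
In shell scripts, these defaults can be applied with `${VAR:=default}` expansion. The sketch below (the function name is hypothetical) also enforces the invariant behind the two-file design: if `DUCKDB_PATH` and `SERVING_DUCKDB_PATH` pointed at the same file, the web app would be reading the database that SQLMesh holds for exclusive writes.

```bash
# Hypothetical helper: apply the documented defaults and export them, then check
# the two-file invariant before anything opens a DuckDB file.
pipeline_env() {
    : "${LANDING_DIR:=data/landing}"
    : "${DUCKDB_PATH:=local.duckdb}"
    : "${SERVING_DUCKDB_PATH:=analytics.duckdb}"
    export LANDING_DIR DUCKDB_PATH SERVING_DUCKDB_PATH

    # SQLMesh and the web app must never share a DuckDB file.
    # Plain string comparison: it does not resolve "./" prefixes or symlinks.
    if [ "$DUCKDB_PATH" = "$SERVING_DUCKDB_PATH" ]; then
        echo "pipeline_env: DUCKDB_PATH and SERVING_DUCKDB_PATH must differ" >&2
        return 1
    fi
}
```

Calling it with `DUCKDB_PATH=data/lakehouse.duckdb` already set keeps that value and only fills in the other two.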
## Coding philosophy

- **Simple and procedural** — functions over classes, no "Manager" patterns
- **Idempotent operations** — running twice produces the same result
- **Explicit assertions** — assert preconditions at function boundaries
- **Bounded operations** — set timeouts, page limits, buffer sizes

Read `coding_philosophy.md` (if present) for the full guide.
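
The same rules carry over to the repo's shell scripts. As an illustration only (the helper is hypothetical, not taken from the codebase), a retry wrapper can check its arguments up front and cap its attempts so a failing step cannot loop forever:

```bash
# Hypothetical helper: run COMMAND at most MAX_ATTEMPTS times, sleeping
# DELAY_SECONDS between failed attempts. Returns 0 on the first success,
# otherwise the exit status of the last attempt (2 for bad arguments).
retry_bounded() {
    # Explicit assertions at the function boundary.
    if [ "$#" -lt 3 ]; then
        echo "usage: retry_bounded MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]" >&2
        return 2
    fi
    local max_attempts="$1" delay="$2"
    shift 2
    case "$max_attempts" in
        ''|*[!0-9]*) echo "retry_bounded: MAX_ATTEMPTS must be an integer" >&2; return 2 ;;
    esac
    case "$delay" in
        ''|*[!0-9]*) echo "retry_bounded: DELAY_SECONDS must be an integer" >&2; return 2 ;;
    esac
    if [ "$max_attempts" -lt 1 ]; then
        echo "retry_bounded: MAX_ATTEMPTS must be at least 1" >&2
        return 2
    fi

    # Bounded loop: the command never runs more than max_attempts times.
    local attempt=1 status=0
    while [ "$attempt" -le "$max_attempts" ]; do
        "$@" && return 0
        status=$?
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return "$status"
}
```

For example, `retry_bounded 3 10 uv run extract` gives a transient API failure two more tries, 10 seconds apart. Retrying like this is only safe because the wrapped step is expected to be idempotent.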