feat: copier update v0.9.0 → v0.10.0

Pulls in template changes: export_serving.py for atomic DuckDB swap,
supervisor export step, SQLMesh glob macro, server provisioning script,
imprint template, and formatting improvements.

Template scaffold SQL models excluded (padelnomics has real models).
Web app routes/analytics unchanged (padelnomics-specific customizations).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Deeman
2026-02-22 17:50:36 +01:00
parent 59306d5a90
commit ea86940b78
10 changed files with 236 additions and 16 deletions


@@ -5,18 +5,22 @@ This file tells Claude Code how to work in this repository.
## Project Overview

Padelnomics is a SaaS application built with Quart (async Python), HTMX, and SQLite.
It includes a full data pipeline:
```
External APIs → extract → landing zone → SQLMesh transform → DuckDB → web app
```

**Packages** (uv workspace):
- `web/` — Quart + HTMX web application (auth, billing, dashboard)
- `extract/padelnomics_extract/` — data extraction to local landing zone
- `transform/sqlmesh_padelnomics/` — 4-layer SQL transformation (raw → staging → foundation → serving)
- `src/padelnomics/` — CLI utilities, export_serving helper

## Skills: invoke these for domain tasks

### Working on extraction or transformation?
@@ -32,6 +36,7 @@ Use the **`data-engineer`** skill for:
/data-engineer (or ask Claude to invoke it)
```

### Working on the web app UI or frontend?
Use the **`frontend-design`** skill for UI components, templates, or dashboard layouts.
@@ -66,6 +71,7 @@ uv run sqlmesh -p transform/sqlmesh_padelnomics plan prod
# Export serving tables (run after SQLMesh)
DUCKDB_PATH=local.duckdb SERVING_DUCKDB_PATH=analytics.duckdb \
  uv run python -m padelnomics.export_serving
```

## Architecture documentation
@@ -96,6 +102,7 @@ analytics.duckdb ← serving tables only, web app read-only
| `DUCKDB_PATH` | `local.duckdb` | SQLMesh pipeline DB (exclusive write) |
| `SERVING_DUCKDB_PATH` | `analytics.duckdb` | Read-only DB for web app |

## Coding philosophy
- **Simple and procedural** — functions over classes, no "Manager" patterns


@@ -1,5 +1,5 @@
# Changes here will be overwritten by Copier; NEVER EDIT MANUALLY
-_commit: v0.9.0
+_commit: v0.10.0
_src_path: /home/Deeman/Projects/quart_saas_boilerplate
author_email: ''
author_name: ''

infra/setup_server.sh Normal file

@@ -0,0 +1,51 @@
#!/bin/bash
# One-time server setup: create app directory and GitLab deploy key.
# Run as root on a fresh server before deploying.
#
# Usage:
# bash infra/setup_server.sh
set -euo pipefail
APP_DIR="/opt/padelnomics"
KEY_PATH="$HOME/.ssh/padelnomics_deploy"
# Create app directory
mkdir -p "$APP_DIR"
echo "Created $APP_DIR"
# Generate deploy key if not already present
if [ ! -f "$KEY_PATH" ]; then
    mkdir -p "$HOME/.ssh"
    ssh-keygen -t ed25519 -f "$KEY_PATH" -N "" -C "padelnomics-server"
    chmod 700 "$HOME/.ssh"
    chmod 600 "$KEY_PATH"
    chmod 644 "$KEY_PATH.pub"

    # Configure SSH to use this key for gitlab.com
    if ! grep -q "# padelnomics" "$HOME/.ssh/config" 2>/dev/null; then
        cat >> "$HOME/.ssh/config" <<EOF
# padelnomics
Host gitlab.com
IdentityFile $KEY_PATH
EOF
        chmod 600 "$HOME/.ssh/config"
    fi

    echo "Generated deploy key: $KEY_PATH"
else
    echo "Deploy key already exists, skipping"
fi
echo ""
echo "=== Next steps ==="
echo "1. Add this deploy key to GitLab (Settings → Repository → Deploy Keys, read-only):"
echo ""
cat "$KEY_PATH.pub"
echo ""
echo "2. Clone the repo:"
echo " git clone git@gitlab.com:YOUR_USER/padelnomics.git $APP_DIR"
echo ""
echo "3. Deploy:"
echo " cd $APP_DIR && bash deploy.sh"


@@ -13,6 +13,7 @@ RestartSec=10
EnvironmentFile=/opt/padelnomics/.env
Environment=LANDING_DIR=/data/padelnomics/landing
Environment=DUCKDB_PATH=/data/padelnomics/lakehouse.duckdb
+Environment=SERVING_DUCKDB_PATH=/data/padelnomics/analytics.duckdb
LimitNOFILE=65536


@@ -5,7 +5,8 @@
#
# Environment variables (set in systemd EnvironmentFile or .env):
# LANDING_DIR — local path for extracted landing data
-# DUCKDB_PATH — path to DuckDB lakehouse file
+# DUCKDB_PATH — path to DuckDB lakehouse (pipeline DB, SQLMesh exclusive)
+# SERVING_DUCKDB_PATH — path to serving-only DuckDB (web app reads from here)
# ALERT_WEBHOOK_URL — optional ntfy.sh / Slack / Telegram webhook for failures

set -eu
@@ -37,6 +38,12 @@ do
DUCKDB_PATH="${DUCKDB_PATH:-/data/padelnomics/lakehouse.duckdb}" \
    uv run --package sqlmesh_padelnomics sqlmesh run --select-model "serving.*"
+# Export serving tables to analytics.duckdb (atomic swap).
+# The web app detects the inode change on next query — no restart needed.
+DUCKDB_PATH="${DUCKDB_PATH:-/data/padelnomics/lakehouse.duckdb}" \
+SERVING_DUCKDB_PATH="${SERVING_DUCKDB_PATH:-/data/padelnomics/analytics.duckdb}" \
+    uv run python -m padelnomics.export_serving
) || {
    if [ -n "${ALERT_WEBHOOK_URL:-}" ]; then
        curl -s -d "Padelnomics pipeline failed at $(date)" \


@@ -0,0 +1,79 @@
"""
Export serving tables from the pipeline DuckDB to the serving DuckDB (atomic swap).
Called by the supervisor after each SQLMesh transform run. Reads all tables in
the 'serving' schema from the pipeline DB (DUCKDB_PATH), writes them to a temp
file, then atomically renames it to the serving DB path (SERVING_DUCKDB_PATH).
The web app's analytics connection detects the inode change on the next query
and reopens the connection automatically — no restart or signal required.
Why two files?
SQLMesh holds an exclusive write lock on DUCKDB_PATH during plan/run.
The web app needs read-only access at all times. Two separate files allow
both to operate concurrently: SQLMesh writes to the pipeline DB, the web
app reads from the serving DB, and this script swaps them atomically.
The temp file is named _export.duckdb (not serving.duckdb.tmp) because DuckDB
names its catalog after the filename stem. A file named serving.* would create
a catalog named 'serving', which conflicts with the schema named 'serving'
inside the file, making all queries ambiguous.
Usage:
DUCKDB_PATH=local.duckdb SERVING_DUCKDB_PATH=analytics.duckdb \\
uv run python -m padelnomics.export_serving
"""
import logging
import os
import duckdb
logger = logging.getLogger(__name__)
def export_serving() -> None:
"""Copy all serving.* tables from the pipeline DB to the serving DB atomically."""
pipeline_path = os.getenv("DUCKDB_PATH", "")
serving_path = os.getenv("SERVING_DUCKDB_PATH", "")
assert pipeline_path, "DUCKDB_PATH must be set"
assert serving_path, "SERVING_DUCKDB_PATH must be set"
assert os.path.exists(pipeline_path), f"Pipeline DB not found: {pipeline_path}"
# Temp path in the same directory as the serving DB so rename() is atomic
# (rename across filesystems is not atomic on Linux).
tmp_path = os.path.join(os.path.dirname(os.path.abspath(serving_path)), "_export.duckdb")
src = duckdb.connect(pipeline_path, read_only=True)
try:
tables = src.sql(
"SELECT table_name FROM information_schema.tables"
" WHERE table_schema = 'serving' ORDER BY table_name"
).fetchall()
assert tables, f"No tables found in serving schema of {pipeline_path}"
logger.info(f"Exporting {len(tables)} serving tables: {[t[0] for t in tables]}")
dst = duckdb.connect(tmp_path)
try:
dst.execute("CREATE SCHEMA IF NOT EXISTS serving")
for (table,) in tables:
# Read via Arrow to avoid cross-connection catalog ambiguity.
arrow_data = src.sql(f"SELECT * FROM serving.{table}").arrow()
dst.register("_src", arrow_data)
dst.execute(f"CREATE OR REPLACE TABLE serving.{table} AS SELECT * FROM _src")
dst.unregister("_src")
row_count = dst.sql(f"SELECT count(*) FROM serving.{table}").fetchone()[0]
logger.info(f" serving.{table}: {row_count:,} rows")
finally:
dst.close()
finally:
src.close()
# Atomic rename — on Linux, rename() is atomic when src and dst are on the same filesystem.
os.rename(tmp_path, serving_path)
logger.info(f"Serving DB atomically updated: {serving_path}")
if __name__ == "__main__":
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(name)s %(levelname)s %(message)s")
export_serving()
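The "detects the inode change on the next query" behavior mentioned in the module docstring lives in the web app, which is not part of this diff. A minimal sketch of that detection pattern, assuming nothing about the actual app code (names like `make_reopen_guard` are hypothetical):

```python
import os
import tempfile


def make_reopen_guard(path: str):
    """Return a callable that reports whether `path` now points at a new inode,
    i.e. the file was atomically replaced and any open handle is stale."""
    state = {"ino": os.stat(path).st_ino}

    def file_was_swapped() -> bool:
        ino = os.stat(path).st_ino
        if ino != state["ino"]:
            state["ino"] = ino  # remember the new inode for the next check
            return True
        return False

    return file_was_swapped


# Demo: replace the file via the same rename() pattern export_serving uses.
with tempfile.TemporaryDirectory() as d:
    serving = os.path.join(d, "analytics.duckdb")
    open(serving, "w").close()
    swapped = make_reopen_guard(serving)

    tmp = os.path.join(d, "_export.duckdb")  # temp file on the same filesystem
    open(tmp, "w").close()
    os.rename(tmp, serving)  # atomic swap: path unchanged, inode changed

    result = swapped()  # a fresh inode means the reader should reopen

print(result)
```

Because `rename()` replaces the directory entry atomically, a reader either sees the old inode or the new one, never a half-written file.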


@@ -21,21 +21,21 @@ uv run sqlmesh -p transform/sqlmesh_padelnomics format
## 4-layer architecture
```
landing/      raw files (extraction output)
└── padelnomics/
    └── {year}/{etag}.csv.gz
raw/          reads files verbatim
└── raw.padelnomics
staging/      type casting, deduplication
└── staging.stg_padelnomics
foundation/   business logic, dimensions, facts
└── foundation.dim_category
serving/      pre-aggregated for web app
└── serving.padelnomics_metrics
```

### raw/ — verbatim source reads


@@ -0,0 +1,20 @@
import os

from sqlmesh import macro


@macro()
def padelnomics_glob(evaluator) -> str:
    """Return a quoted glob path for all padelnomics CSV gz files under LANDING_DIR.

    Used in raw models: SELECT * FROM read_csv(@padelnomics_glob(), ...)

    The LANDING_DIR variable is read from the SQLMesh config variables block first,
    then falls back to the LANDING_DIR environment variable, then to 'data/landing'.
    """
    landing_dir = evaluator.var("LANDING_DIR") or os.environ.get("LANDING_DIR", "data/landing")
    return f"'{landing_dir}/padelnomics/**/*.csv.gz'"


# Add one macro per landing zone subdirectory you create.
# Pattern: def {source}_glob(evaluator) → f"'{landing_dir}/{source}/**/*.csv.gz'"
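The fallback chain the docstring describes (config variable → environment variable → default) can be sketched standalone; `resolve_landing_dir` is a hypothetical helper mirroring the macro body, not part of the template:

```python
import os


def resolve_landing_dir(config_value, env=os.environ) -> str:
    """Mirror the macro's fallback chain: SQLMesh config var → env var → default."""
    return config_value or env.get("LANDING_DIR", "data/landing")


# With neither a config variable nor the env var set, the default applies.
# Note the single quotes inside the string: the macro returns a SQL string literal.
glob_literal = f"'{resolve_landing_dir(None, env={})}/padelnomics/**/*.csv.gz'"
print(glob_literal)  # 'data/landing/padelnomics/**/*.csv.gz'
```

A config-block value always wins, so per-environment overrides never require touching the macro itself.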


@@ -0,0 +1,55 @@
{% extends "base.html" %}
{% block title %}Imprint — {{ config.APP_NAME }}{% endblock %}
{% block head %}
<meta name="description" content="Legal imprint for {{ config.APP_NAME }} — company information and contact details as required by §5 DDG.">
<meta name="robots" content="noindex">
{% endblock %}
{% block content %}
<main class="container-page py-12">
<div class="card max-w-3xl mx-auto">
<h1 class="text-2xl mb-1">Imprint</h1>
<p class="text-sm text-slate mb-8">Legal disclosure pursuant to §5 DDG (Digitale-Dienste-Gesetz)</p>
<div class="space-y-6 text-slate-dark leading-relaxed">
<section>
<h2 class="text-lg mb-2">Service Provider</h2>
<p>
<!-- TODO: Your full name --><br>
<!-- TODO: Your address, city, country -->
</p>
</section>
<section>
<h2 class="text-lg mb-2">Contact</h2>
<p>Email: <a href="mailto:{{ config.EMAIL_FROM }}" class="underline">{{ config.EMAIL_FROM }}</a></p>
</section>
<section>
<h2 class="text-lg mb-2">VAT</h2>
<!-- TODO: choose one of:
Small business owner pursuant to §19 UStG. VAT is not charged and no VAT ID is issued.
OR: VAT identification number: DE...
-->
<p>Small business owner pursuant to §19 UStG (Umsatzsteuergesetz). VAT is not charged and no VAT identification number is issued.</p>
</section>
<section>
<h2 class="text-lg mb-2">Responsible for Content</h2>
<p>
<!-- TODO: Your full name and address (pursuant to §18 Abs. 2 MStV) -->
</p>
</section>
<section>
<h2 class="text-lg mb-2">Disclaimer</h2>
<p>Despite careful content control we assume no liability for the content of external links. The operators of linked pages are solely responsible for their content.</p>
</section>
</div>
</div>
</main>
{% endblock %}