feat: copier update v0.9.0 → v0.10.0
Pulls in template changes: export_serving.py for atomic DuckDB swap, supervisor export step, SQLMesh glob macro, server provisioning script, imprint template, and formatting improvements.

Template scaffold SQL models excluded (padelnomics has real models). Web app routes/analytics unchanged (padelnomics-specific customizations).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
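The headline change — the atomic DuckDB swap — boils down to two stdlib-level ideas: publish a new file by renaming a temp file written in the same directory, and let readers detect the swap by watching the inode. A minimal illustrative sketch (hypothetical helper names, not the app's actual code):

```python
import os
import tempfile


def publish_atomically(data: bytes, dest: str) -> None:
    """Write to a temp file in dest's directory, then rename over dest.

    rename() is atomic on POSIX when source and destination are on the same
    filesystem, so readers see either the old file or the new one, never a
    partial write.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest))
    fd, tmp = tempfile.mkstemp(dir=dest_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.rename(tmp, dest)


def needs_reopen(path: str, last_inode: int) -> bool:
    """A reader detects the swap by comparing the path's current inode number."""
    return os.stat(path).st_ino != last_inode
```

A reader that caches `st_ino` at connect time can call `needs_reopen` before each query and reconnect only when the file was actually replaced — which is the "no restart needed" behavior the commit message describes.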
@@ -5,18 +5,22 @@ This file tells Claude Code how to work in this repository.

## Project Overview

Padelnomics is a SaaS application built with Quart (async Python), HTMX, and SQLite.

It includes a full data pipeline:

```
External APIs → extract → landing zone → SQLMesh transform → DuckDB → web app
```

**Packages** (uv workspace):
- `web/` — Quart + HTMX web application (auth, billing, dashboard)
- `extract/padelnomics_extract/` — data extraction to local landing zone
- `transform/sqlmesh_padelnomics/` — 4-layer SQL transformation (raw → staging → foundation → serving)
- `src/padelnomics/` — CLI utilities, export_serving helper

## Skills: invoke these for domain tasks

### Working on extraction or transformation?
@@ -32,6 +36,7 @@ Use the **`data-engineer`** skill for:
/data-engineer (or ask Claude to invoke it)
```

### Working on the web app UI or frontend?

Use the **`frontend-design`** skill for UI components, templates, or dashboard layouts.
@@ -66,6 +71,7 @@ uv run sqlmesh -p transform/sqlmesh_padelnomics plan prod
# Export serving tables (run after SQLMesh)
DUCKDB_PATH=local.duckdb SERVING_DUCKDB_PATH=analytics.duckdb \
  uv run python -m padelnomics.export_serving
```

## Architecture documentation
@@ -96,6 +102,7 @@ analytics.duckdb ← serving tables only, web app read-only
| `DUCKDB_PATH` | `local.duckdb` | SQLMesh pipeline DB (exclusive write) |
| `SERVING_DUCKDB_PATH` | `analytics.duckdb` | Read-only DB for web app |

## Coding philosophy

- **Simple and procedural** — functions over classes, no "Manager" patterns
@@ -1,5 +1,5 @@
# Changes here will be overwritten by Copier; NEVER EDIT MANUALLY
_commit: v0.9.0
_commit: v0.10.0
_src_path: /home/Deeman/Projects/quart_saas_boilerplate
author_email: ''
author_name: ''
@@ -83,7 +83,7 @@ State table schema:
```
data/landing/
├── .state.sqlite              # extraction run history
└── padelnomics/               # one subdirectory per source
    └── {year}/
        └── {month:02d}/
            └── {etag}.csv.gz  # immutable, content-addressed files
```
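The `{etag}.csv.gz` naming above is content addressing: each landed file is named after an identifier of its content (the upstream ETag), so re-extracting the same data is a no-op and existing files are never rewritten. A minimal sketch of the idea (hypothetical helper, not the actual extract code; a content hash stands in for the HTTP ETag):

```python
import gzip
import hashlib
import os


def land_file(payload: bytes, landing_dir: str, year: int, month: int) -> str:
    """Write payload under a content-derived name; skip if it already exists."""
    etag = hashlib.sha256(payload).hexdigest()[:16]  # stand-in for an HTTP ETag
    dest_dir = os.path.join(landing_dir, "padelnomics", str(year), f"{month:02d}")
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, f"{etag}.csv.gz")
    if not os.path.exists(dest):  # immutable: never overwrite an existing file
        with gzip.open(dest, "wb") as f:
            f.write(payload)
    return dest
```

Because the name is derived from the content, a retried extraction run lands on the same path and skips the write, which is what makes the landing zone safe to re-run.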
51 infra/setup_server.sh Normal file
@@ -0,0 +1,51 @@
#!/bin/bash
# One-time server setup: create app directory and GitLab deploy key.
# Run as root on a fresh server before deploying.
#
# Usage:
#   bash infra/setup_server.sh

set -euo pipefail

APP_DIR="/opt/padelnomics"
KEY_PATH="$HOME/.ssh/padelnomics_deploy"

# Create app directory
mkdir -p "$APP_DIR"
echo "Created $APP_DIR"

# Generate deploy key if not already present
if [ ! -f "$KEY_PATH" ]; then
    mkdir -p "$HOME/.ssh"
    ssh-keygen -t ed25519 -f "$KEY_PATH" -N "" -C "padelnomics-server"
    chmod 700 "$HOME/.ssh"
    chmod 600 "$KEY_PATH"
    chmod 644 "$KEY_PATH.pub"

    # Configure SSH to use this key for gitlab.com
    if ! grep -q "# padelnomics" "$HOME/.ssh/config" 2>/dev/null; then
        cat >> "$HOME/.ssh/config" <<EOF

# padelnomics
Host gitlab.com
    IdentityFile $KEY_PATH
EOF
        chmod 600 "$HOME/.ssh/config"
    fi

    echo "Generated deploy key: $KEY_PATH"
else
    echo "Deploy key already exists, skipping"
fi

echo ""
echo "=== Next steps ==="
echo "1. Add this deploy key to GitLab (Settings → Repository → Deploy Keys, read-only):"
echo ""
cat "$KEY_PATH.pub"
echo ""
echo "2. Clone the repo:"
echo "   git clone git@gitlab.com:YOUR_USER/padelnomics.git $APP_DIR"
echo ""
echo "3. Deploy:"
echo "   cd $APP_DIR && bash deploy.sh"
@@ -13,6 +13,7 @@ RestartSec=10
EnvironmentFile=/opt/padelnomics/.env
Environment=LANDING_DIR=/data/padelnomics/landing
Environment=DUCKDB_PATH=/data/padelnomics/lakehouse.duckdb
Environment=SERVING_DUCKDB_PATH=/data/padelnomics/analytics.duckdb

LimitNOFILE=65536
@@ -4,9 +4,10 @@
# https://github.com/tigerbeetle/tigerbeetle/blob/main/src/scripts/cfo_supervisor.sh
#
# Environment variables (set in systemd EnvironmentFile or .env):
#   LANDING_DIR         — local path for extracted landing data
#   DUCKDB_PATH         — path to DuckDB lakehouse (pipeline DB, SQLMesh exclusive)
#   SERVING_DUCKDB_PATH — path to serving-only DuckDB (web app reads from here)
#   ALERT_WEBHOOK_URL   — optional ntfy.sh / Slack / Telegram webhook for failures

set -eu

@@ -37,6 +38,12 @@ do
    DUCKDB_PATH="${DUCKDB_PATH:-/data/padelnomics/lakehouse.duckdb}" \
        uv run --package sqlmesh_padelnomics sqlmesh run --select-model "serving.*"

    # Export serving tables to analytics.duckdb (atomic swap).
    # The web app detects the inode change on next query — no restart needed.
    DUCKDB_PATH="${DUCKDB_PATH:-/data/padelnomics/lakehouse.duckdb}" \
        SERVING_DUCKDB_PATH="${SERVING_DUCKDB_PATH:-/data/padelnomics/analytics.duckdb}" \
        uv run python -m padelnomics.export_serving

) || {
    if [ -n "${ALERT_WEBHOOK_URL:-}" ]; then
        curl -s -d "Padelnomics pipeline failed at $(date)" \
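The supervisor's error handling wraps the whole pipeline in a subshell and fires the webhook only when the run fails. The pattern in isolation (simulated step, no real webhook; note that `set -e` is suppressed inside the left-hand side of `||`, so the sketch chains steps explicitly with `&&`):

```shell
#!/bin/sh
set -u

run_step() {
    # Simulated failing pipeline step; the real loop runs extract/transform/export.
    return 1
}

(
    run_step &&
    echo "pipeline ok"
) || {
    # In the real supervisor this branch curls ALERT_WEBHOOK_URL.
    echo "ALERT: pipeline failed"
}
```

The subshell keeps the failure handling in one place: any step that breaks the `&&` chain makes the whole group exit non-zero, and only then does the alert branch run.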
79 src/padelnomics/export_serving.py Normal file
@@ -0,0 +1,79 @@
"""
|
||||
Export serving tables from the pipeline DuckDB to the serving DuckDB (atomic swap).
|
||||
|
||||
Called by the supervisor after each SQLMesh transform run. Reads all tables in
|
||||
the 'serving' schema from the pipeline DB (DUCKDB_PATH), writes them to a temp
|
||||
file, then atomically renames it to the serving DB path (SERVING_DUCKDB_PATH).
|
||||
|
||||
The web app's analytics connection detects the inode change on the next query
|
||||
and reopens the connection automatically — no restart or signal required.
|
||||
|
||||
Why two files?
|
||||
SQLMesh holds an exclusive write lock on DUCKDB_PATH during plan/run.
|
||||
The web app needs read-only access at all times. Two separate files allow
|
||||
both to operate concurrently: SQLMesh writes to the pipeline DB, the web
|
||||
app reads from the serving DB, and this script swaps them atomically.
|
||||
|
||||
The temp file is named _export.duckdb (not serving.duckdb.tmp) because DuckDB
|
||||
names its catalog after the filename stem. A file named serving.* would create
|
||||
a catalog named 'serving', which conflicts with the schema named 'serving'
|
||||
inside the file, making all queries ambiguous.
|
||||
|
||||
Usage:
|
||||
DUCKDB_PATH=local.duckdb SERVING_DUCKDB_PATH=analytics.duckdb \\
|
||||
uv run python -m padelnomics.export_serving
|
||||
"""
|
||||
|
||||
import logging
|
||||
import os
|
||||
|
||||
import duckdb
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def export_serving() -> None:
|
||||
"""Copy all serving.* tables from the pipeline DB to the serving DB atomically."""
|
||||
pipeline_path = os.getenv("DUCKDB_PATH", "")
|
||||
serving_path = os.getenv("SERVING_DUCKDB_PATH", "")
|
||||
assert pipeline_path, "DUCKDB_PATH must be set"
|
||||
assert serving_path, "SERVING_DUCKDB_PATH must be set"
|
||||
assert os.path.exists(pipeline_path), f"Pipeline DB not found: {pipeline_path}"
|
||||
|
||||
# Temp path in the same directory as the serving DB so rename() is atomic
|
||||
# (rename across filesystems is not atomic on Linux).
|
||||
tmp_path = os.path.join(os.path.dirname(os.path.abspath(serving_path)), "_export.duckdb")
|
||||
|
||||
src = duckdb.connect(pipeline_path, read_only=True)
|
||||
try:
|
||||
tables = src.sql(
|
||||
"SELECT table_name FROM information_schema.tables"
|
||||
" WHERE table_schema = 'serving' ORDER BY table_name"
|
||||
).fetchall()
|
||||
assert tables, f"No tables found in serving schema of {pipeline_path}"
|
||||
logger.info(f"Exporting {len(tables)} serving tables: {[t[0] for t in tables]}")
|
||||
|
||||
dst = duckdb.connect(tmp_path)
|
||||
try:
|
||||
dst.execute("CREATE SCHEMA IF NOT EXISTS serving")
|
||||
for (table,) in tables:
|
||||
# Read via Arrow to avoid cross-connection catalog ambiguity.
|
||||
arrow_data = src.sql(f"SELECT * FROM serving.{table}").arrow()
|
||||
dst.register("_src", arrow_data)
|
||||
dst.execute(f"CREATE OR REPLACE TABLE serving.{table} AS SELECT * FROM _src")
|
||||
dst.unregister("_src")
|
||||
row_count = dst.sql(f"SELECT count(*) FROM serving.{table}").fetchone()[0]
|
||||
logger.info(f" serving.{table}: {row_count:,} rows")
|
||||
finally:
|
||||
dst.close()
|
||||
finally:
|
||||
src.close()
|
||||
|
||||
# Atomic rename — on Linux, rename() is atomic when src and dst are on the same filesystem.
|
||||
os.rename(tmp_path, serving_path)
|
||||
logger.info(f"Serving DB atomically updated: {serving_path}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(name)s %(levelname)s %(message)s")
|
||||
export_serving()
|
||||
@@ -21,21 +21,21 @@ uv run sqlmesh -p transform/sqlmesh_padelnomics format
## 4-layer architecture

```
landing/     ← raw files (extraction output)
└── padelnomics/
    └── {year}/{etag}.csv.gz

raw/         ← reads files verbatim
└── raw.padelnomics

staging/     ← type casting, deduplication
└── staging.stg_padelnomics

foundation/  ← business logic, dimensions, facts
└── foundation.dim_category

serving/     ← pre-aggregated for web app
└── serving.padelnomics_metrics
```

### raw/ — verbatim source reads
20 transform/sqlmesh_padelnomics/macros/__init__.py Normal file
@@ -0,0 +1,20 @@
import os

from sqlmesh import macro


@macro()
def padelnomics_glob(evaluator) -> str:
    """Return a quoted glob path for all padelnomics CSV gz files under LANDING_DIR.

    Used in raw models: SELECT * FROM read_csv(@padelnomics_glob(), ...)

    The LANDING_DIR variable is read from the SQLMesh config variables block first,
    then falls back to the LANDING_DIR environment variable, then to 'data/landing'.
    """
    landing_dir = evaluator.var("LANDING_DIR") or os.environ.get("LANDING_DIR", "data/landing")
    return f"'{landing_dir}/padelnomics/**/*.csv.gz'"


# Add one macro per landing zone subdirectory you create.
# Pattern: def {source}_glob(evaluator) → f"'{landing_dir}/{source}/**/*.csv.gz'"
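The macro returns a quoted glob literal for DuckDB's `read_csv` to expand; the `**` matches any directory depth, so the `{year}/{month}` layout is picked up without the macro knowing about it. Python's `pathlib` uses the same recursive-glob semantics, which makes for a quick stdlib illustration of what the pattern matches (temp directory stands in for LANDING_DIR):

```python
import tempfile
from pathlib import Path

# Recreate the landing layout and see what 'padelnomics/**/*.csv.gz' matches.
landing = Path(tempfile.mkdtemp())
f = landing / "padelnomics" / "2024" / "05" / "abc123.csv.gz"
f.parent.mkdir(parents=True)
f.write_bytes(b"")
(landing / "padelnomics" / "notes.txt").write_text("ignored")

# Only .csv.gz files are matched, at any depth under padelnomics/.
matches = sorted(landing.glob("padelnomics/**/*.csv.gz"))
print(matches)
```

Adding a new year or month directory therefore needs no model change: the next pipeline run's glob simply picks up the new files.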
55 web/src/padelnomics/public/templates/imprint.html Normal file
@@ -0,0 +1,55 @@
{% extends "base.html" %}

{% block title %}Imprint — {{ config.APP_NAME }}{% endblock %}

{% block head %}
<meta name="description" content="Legal imprint for {{ config.APP_NAME }} — company information and contact details as required by §5 DDG.">
<meta name="robots" content="noindex">
{% endblock %}

{% block content %}
<main class="container-page py-12">
  <div class="card max-w-3xl mx-auto">
    <h1 class="text-2xl mb-1">Imprint</h1>
    <p class="text-sm text-slate mb-8">Legal disclosure pursuant to §5 DDG (Digitale-Dienste-Gesetz)</p>

    <div class="space-y-6 text-slate-dark leading-relaxed">

      <section>
        <h2 class="text-lg mb-2">Service Provider</h2>
        <p>
          <!-- TODO: Your full name --><br>
          <!-- TODO: Your address, city, country -->
        </p>
      </section>

      <section>
        <h2 class="text-lg mb-2">Contact</h2>
        <p>Email: <a href="mailto:{{ config.EMAIL_FROM }}" class="underline">{{ config.EMAIL_FROM }}</a></p>
      </section>

      <section>
        <h2 class="text-lg mb-2">VAT</h2>
        <!-- TODO: choose one of:
             Small business owner pursuant to §19 UStG. VAT is not charged and no VAT ID is issued.
             OR: VAT identification number: DE...
        -->
        <p>Small business owner pursuant to §19 UStG (Umsatzsteuergesetz). VAT is not charged and no VAT identification number is issued.</p>
      </section>

      <section>
        <h2 class="text-lg mb-2">Responsible for Content</h2>
        <p>
          <!-- TODO: Your full name and address (pursuant to §18 Abs. 2 MStV) -->
        </p>
      </section>

      <section>
        <h2 class="text-lg mb-2">Disclaimer</h2>
        <p>Despite careful content control we assume no liability for the content of external links. The operators of linked pages are solely responsible for their content.</p>
      </section>

    </div>
  </div>
</main>
{% endblock %}