feat: copier update v0.9.0 — extraction docs, state tracking, architecture guides

Sync template from 29ac25b → v0.9.0 (29 template commits). Due to
template's _subdirectory migration, new files were manually rendered
rather than auto-merged by copier.

New files:
- .claude/CLAUDE.md + coding_philosophy.md (agent instructions)
- extract utils.py: SQLite state tracking for extraction runs
- extract/transform READMEs: architecture & pattern documentation
- infra/supervisor: systemd service + orchestration script
- Per-layer model READMEs (raw, staging, foundation, serving)

Also fixes copier-answers.yml (adds 4 feature toggles, removes stale
payment_provider key) and scopes CLAUDE.md gitignore to root only.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
Deeman
2026-02-22 15:44:48 +01:00
parent b76e87a0b6
commit 18ee24818b
14 changed files with 1084 additions and 2 deletions

View File

@@ -0,0 +1,24 @@
[Unit]
Description=Padelnomics Supervisor — Pipeline Orchestration
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=root
WorkingDirectory=/opt/padelnomics
ExecStart=/opt/padelnomics/infra/supervisor/supervisor.sh
Restart=always
RestartSec=10
EnvironmentFile=/opt/padelnomics/.env
Environment=LANDING_DIR=/data/padelnomics/landing
Environment=DUCKDB_PATH=/data/padelnomics/lakehouse.duckdb
LimitNOFILE=65536
StandardOutput=journal
StandardError=journal
SyslogIdentifier=padelnomics-supervisor
[Install]
WantedBy=multi-user.target

View File

@@ -0,0 +1,47 @@
#!/bin/sh
# Padelnomics Supervisor — continuous pipeline orchestration.
# Inspired by TigerBeetle's CFO supervisor: simple, resilient, easy to understand.
# https://github.com/tigerbeetle/tigerbeetle/blob/main/src/scripts/cfo_supervisor.sh
#
# Environment variables (set in systemd EnvironmentFile or .env):
# LANDING_DIR — local path for extracted landing data
# DUCKDB_PATH — path to DuckDB lakehouse file
# ALERT_WEBHOOK_URL — optional ntfy.sh / Slack / Telegram webhook for failures
set -eu
readonly REPO_DIR="/opt/padelnomics"
while true
do
(
if ! [ -d "$REPO_DIR/.git" ]; then
echo "Repository not found at $REPO_DIR — bootstrap required!"
exit 1
fi
cd "$REPO_DIR"
# Pull latest code
git fetch origin master
git switch --discard-changes --detach origin/master
uv sync
# Extract
LANDING_DIR="${LANDING_DIR:-/data/padelnomics/landing}" \
DUCKDB_PATH="${DUCKDB_PATH:-/data/padelnomics/lakehouse.duckdb}" \
uv run --package padelnomics_extract extract
# Transform
LANDING_DIR="${LANDING_DIR:-/data/padelnomics/landing}" \
DUCKDB_PATH="${DUCKDB_PATH:-/data/padelnomics/lakehouse.duckdb}" \
uv run --package sqlmesh_padelnomics sqlmesh run --select-model "serving.*"
) || {
if [ -n "${ALERT_WEBHOOK_URL:-}" ]; then
curl -s -d "Padelnomics pipeline failed at $(date)" \
"$ALERT_WEBHOOK_URL" 2>/dev/null || true
fi
sleep 600 # back off 10 min on failure
}
done