beanflows/infra/supervisor/supervisor.sh
Deeman c1d00dcdc4 Refactor to local-first architecture on Hetzner NVMe
Remove distributed R2/Iceberg/SSH pipeline architecture in favor of
local subprocess execution with NVMe storage. Landing data backed up
to R2 via rclone timer.

- Strip Iceberg catalog, httpfs, boto3, paramiko, prefect, pyarrow
- Pipelines run via subprocess.run() with bounded timeouts
- Extract writes to {LANDING_DIR}/psd/{year}/{month}/{etag}.csv.gzip
- SQLMesh reads LANDING_DIR variable, writes to DUCKDB_PATH
- Delete unused provider stubs (ovh, scaleway, oracle)
- Add rclone systemd timer for R2 backup every 6h
- Update supervisor to run pipelines with env vars

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 19:50:19 +01:00
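
The extract layout named in the commit message can be made concrete with a small helper. This is a hypothetical sketch: `landing_path` is not part of materia, and the ETag `abc123` is a made-up value. It assumes year, month, and ETag are passed in already formatted (no zero-padding is applied), and it reuses the supervisor's `/data/materia/landing` fallback for `LANDING_DIR`.

```shell
#!/bin/sh
# Hypothetical helper, not part of materia: print where one extract file lands,
# following the layout {LANDING_DIR}/psd/{year}/{month}/{etag}.csv.gzip.
# Assumption: year, month and etag arrive already formatted (no zero-padding here).
set -eu

landing_path() {
    # ${LANDING_DIR:-...} falls back to the supervisor's default when
    # LANDING_DIR is unset or empty.
    printf '%s/psd/%s/%s/%s.csv.gzip\n' \
        "${LANDING_DIR:-/data/materia/landing}" "$1" "$2" "$3"
}

landing_path 2026 02 abc123
```

With `LANDING_DIR` unset, this prints `/data/materia/landing/psd/2026/02/abc123.csv.gzip`. Because it uses `${LANDING_DIR:-...}`, an empty `LANDING_DIR` also falls back to the default; the `${LANDING_DIR-...}` form would keep the empty string instead.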

#!/bin/sh
# Materia Supervisor - Continuous pipeline orchestration
# Inspired by TigerBeetle's CFO supervisor: simple, resilient, easy to understand
# https://github.com/tigerbeetle/tigerbeetle/blob/main/src/scripts/cfo_supervisor.sh
set -eu

readonly REPO_DIR="/opt/materia"

while true
do
    (
        # The repo must be cloned by the bootstrap step; fail fast if it is missing
        if ! [ -d "$REPO_DIR/.git" ]
        then
            echo "Repository not found, bootstrap required!" >&2
            exit 1
        fi
        cd "$REPO_DIR" || exit 1

        # Pipeline paths; either can be overridden from the supervisor's environment
        export LANDING_DIR="${LANDING_DIR:-/data/materia/landing}"
        export DUCKDB_PATH="${DUCKDB_PATH:-/data/materia/lakehouse.duckdb}"

        # `set -e` does not apply inside a subshell on the left of `||`, so the
        # steps are chained with `&&`: the first failure ends this iteration and
        # its non-zero status triggers the sleep below.
        git fetch origin master &&
            git switch --discard-changes --detach origin/master &&
            uv sync &&
            uv run materia pipeline run extract &&
            uv run materia pipeline run transform
    ) || sleep 600 # Sleep 10 min on failure to avoid busy-loop retries
done
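
The `||` after the subshell is where the top-level `set -eu` quietly stops protecting the loop body. POSIX suspends `set -e` for every command of an AND-OR list except the last, and that suspension covers everything inside a subshell on the left of `||`. The minimal demonstration below (the `demo` function name is illustrative) shows the rule in isolation, and why failing steps need to be chained with `&&` or checked explicitly if a failure is meant to reach `sleep 600`.

```shell
#!/bin/sh
# Minimal demonstration of POSIX errexit rules around `( ... ) || ...`.
set -eu

demo() {
    # Left side of `||`: errexit is suspended for the whole subshell,
    # so `false` does not stop it and the echo still runs (status 0).
    ( false; echo "still running after false" ) || echo "not reached"

    # Chained with `&&`: the echo is skipped, the subshell exits non-zero,
    # and the `||` branch runs.
    ( false && echo "not reached" ) || echo "failure detected"
}

demo
```

Under both `dash` and `bash` this prints `still running after false` followed by `failure detected`. The first subshell succeeds even though `false` failed, which is the failure mode a supervisor loop written this way would hit.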