Refactor to local-first architecture on Hetzner NVMe

Remove distributed R2/Iceberg/SSH pipeline architecture in favor of local subprocess execution with NVMe storage. Landing data backed up to R2 via rclone timer.

- Strip Iceberg catalog, httpfs, boto3, paramiko, prefect, pyarrow
- Pipelines run via subprocess.run() with bounded timeouts
- Extract writes to {LANDING_DIR}/psd/{year}/{month}/{etag}.csv.gzip
- SQLMesh reads LANDING_DIR variable, writes to DUCKDB_PATH
- Delete unused provider stubs (ovh, scaleway, oracle)
- Add rclone systemd timer for R2 backup every 6h
- Update supervisor to run pipelines with env vars

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
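
As a rough illustration of the two mechanisms named in the message (the landing path layout and subprocess execution with a bounded timeout), a minimal sketch might look like the following. The function names `landing_path` and `run_pipeline`, their parameters, and the `-1` return value on timeout are hypothetical and are not taken from the repository. Only the path template and the use of `subprocess.run()` with a timeout and environment variables come from the commit message.

```python
import os
import subprocess
import sys
from pathlib import Path


def landing_path(landing_dir: str, year: str, month: str, etag: str) -> Path:
    """Build {LANDING_DIR}/psd/{year}/{month}/{etag}.csv.gzip.

    Hypothetical helper: year and month are taken as strings so that no
    padding convention is assumed beyond what the commit message states.
    """
    return Path(landing_dir) / "psd" / year / month / f"{etag}.csv.gzip"


def run_pipeline(cmd: list[str], timeout_s: float, extra_env: dict[str, str]) -> int:
    """Run one pipeline as a local subprocess with a hard timeout.

    Returns the child's exit code, or -1 if the timeout expired
    (the -1 convention is an assumption made for this sketch).
    """
    # Pass the parent environment plus pipeline-specific variables
    # such as LANDING_DIR and DUCKDB_PATH.
    env = {**os.environ, **extra_env}
    try:
        result = subprocess.run(cmd, env=env, timeout=timeout_s, check=False)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising TimeoutExpired.
        return -1
    return result.returncode
```

With the values written to `.env` in this commit, a supervisor could pass `{"LANDING_DIR": "/data/materia/landing", "DUCKDB_PATH": "/data/materia/lakehouse.duckdb"}` as `extra_env`. Returning a sentinel instead of raising keeps one stuck pipeline from taking down the supervisor loop; whether the actual supervisor behaves this way is not shown in the diff.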
2026-02-18 18:05:41 +01:00
parent 910424c956
commit c1d00dcdc4
25 changed files with 231 additions and 1807 deletions
--- a/infra/bootstrap_supervisor.sh
+++ b/infra/bootstrap_supervisor.sh
@@ -79,6 +79,9 @@ else
    cd "$REPO_DIR"
 fi

+echo "--- Creating data directories ---"
+mkdir -p /data/materia/landing/psd
+
 echo "--- Installing Python dependencies ---"
 uv sync

@@ -88,6 +91,8 @@ cat > "$REPO_DIR/.env" <<EOF
 # Loaded from Pulumi ESC: beanflows/prod
 PULUMI_ACCESS_TOKEN=${PULUMI_ACCESS_TOKEN}
 PATH=/root/.cargo/bin:/root/.pulumi/bin:/usr/local/bin:/usr/bin:/bin
+LANDING_DIR=/data/materia/landing
+DUCKDB_PATH=/data/materia/lakehouse.duckdb
 EOF

 echo "--- Setting up systemd service ---"