Refactor to local-first architecture on Hetzner NVMe

Remove the distributed R2/Iceberg/SSH pipeline architecture in favor of
local subprocess execution with NVMe storage. Landing data is backed up
to R2 by an rclone timer.
- Strip Iceberg catalog, httpfs, boto3, paramiko, prefect, pyarrow
- Pipelines run via subprocess.run() with bounded timeouts
- Extract writes to {LANDING_DIR}/psd/{year}/{month}/{etag}.csv.gzip
- SQLMesh reads LANDING_DIR variable, writes to DUCKDB_PATH
- Delete unused provider stubs (ovh, scaleway, oracle)
- Add rclone systemd timer for R2 backup every 6h
- Update supervisor to run pipelines with env vars
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
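
The subprocess execution and landing layout described in the message can be sketched as follows. This is a minimal illustration, not the code from this commit: the helper names `landing_path` and `run_pipeline`, the zero-padded month, and the exit code 124 on timeout (borrowed from the `timeout(1)` convention) are all assumptions.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import Mapping, Optional, Sequence


def landing_path(landing_dir: str, year: int, month: int, etag: str) -> Path:
    """Build {LANDING_DIR}/psd/{year}/{month}/{etag}.csv.gzip (hypothetical helper)."""
    # Assumption: the month is zero-padded; the commit message does not specify this.
    return Path(landing_dir) / "psd" / str(year) / f"{month:02d}" / f"{etag}.csv.gzip"


def run_pipeline(
    command: Sequence[str],
    timeout_seconds: float,
    extra_env: Optional[Mapping[str, str]] = None,
) -> int:
    """Run one pipeline as a local subprocess with a bounded timeout (hypothetical helper)."""
    # Pass LANDING_DIR / DUCKDB_PATH style settings on top of the current environment.
    env = {**os.environ, **(extra_env or {})}
    try:
        completed = subprocess.run(
            list(command), env=env, timeout=timeout_seconds, check=False
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising; 124 is an assumed sentinel.
        return 124
    return completed.returncode


if __name__ == "__main__":
    env = {
        "LANDING_DIR": "/data/materia/landing",
        "DUCKDB_PATH": "/data/materia/lakehouse.duckdb",
    }
    # Stand-in command; the real pipeline entry points are not shown in this diff.
    code = run_pipeline([sys.executable, "-c", "print('extract ok')"], 60, env)
    print(code, landing_path(env["LANDING_DIR"], 2024, 3, "abc123"))
```

Returning an exit code rather than raising lets a supervisor loop treat a timeout like any other failed run.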
@@ -79,6 +79,9 @@ else
     cd "$REPO_DIR"
 fi
 
+echo "--- Creating data directories ---"
+mkdir -p /data/materia/landing/psd
+
 echo "--- Installing Python dependencies ---"
 uv sync
 
@@ -88,6 +91,8 @@ cat > "$REPO_DIR/.env" <<EOF
 # Loaded from Pulumi ESC: beanflows/prod
 PULUMI_ACCESS_TOKEN=${PULUMI_ACCESS_TOKEN}
 PATH=/root/.cargo/bin:/root/.pulumi/bin:/usr/local/bin:/usr/bin:/bin
+LANDING_DIR=/data/materia/landing
+DUCKDB_PATH=/data/materia/lakehouse.duckdb
 EOF
 
 echo "--- Setting up systemd service ---"