Remove distributed R2/Iceberg/SSH pipeline architecture in favor of
local subprocess execution with NVMe storage. Landing data backed up
to R2 via rclone timer.
- Strip Iceberg catalog, httpfs, boto3, paramiko, prefect, pyarrow
- Pipelines run via subprocess.run() with bounded timeouts
- Extract writes to {LANDING_DIR}/psd/{year}/{month}/{etag}.csv.gzip
- SQLMesh reads LANDING_DIR variable, writes to DUCKDB_PATH
- Delete unused provider stubs (ovh, scaleway, oracle)
- Add rclone systemd timer for R2 backup every 6h
- Update supervisor to run pipelines with env vars
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
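The subprocess-with-bounded-timeout change in the bullets above could look roughly like this; `run_pipeline`, the timeout value, and the example paths are illustrative, not the actual supervisor code — only the env var names (`LANDING_DIR`, `DUCKDB_PATH`) come from the commit:

```python
import os
import subprocess

def run_pipeline(cmd, extra_env=None, timeout_s=3600):
    """Run one pipeline step as a subprocess; return its exit code, or -1 on timeout."""
    env = dict(os.environ)
    if extra_env:
        env.update(extra_env)  # e.g. LANDING_DIR, DUCKDB_PATH
    try:
        # timeout gives each step a hard wall-clock bound; a hung step is killed
        return subprocess.run(cmd, env=env, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return -1  # signal a timed-out step to the caller
```

A SQLMesh run would then be invoked as something like `run_pipeline(["sqlmesh", "run"], extra_env={"LANDING_DIR": "/data/landing", "DUCKDB_PATH": "/data/warehouse.duckdb"})` (paths hypothetical).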
15 lines · 493 B · Plaintext
# Cloudflare R2 remote for landing data backup
# Copy to /root/.config/rclone/rclone.conf and fill in credentials
#
# Get credentials from: Cloudflare Dashboard → R2 → Manage R2 API Tokens
# Or from Pulumi ESC: esc env open beanflows/prod --format shell

[r2]
type = s3
provider = Cloudflare
access_key_id = <R2_ACCESS_KEY_ID>
secret_access_key = <R2_SECRET_ACCESS_KEY>
endpoint = https://<CLOUDFLARE_ACCOUNT_ID>.r2.cloudflarestorage.com
acl = private
no_check_bucket = true
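The "rclone systemd timer for R2 backup every 6h" from the commit could be a service/timer pair along these lines; the unit names, bucket name, and local path are placeholders, not taken from the repo — only the `r2` remote name matches the config above:

```ini
# /etc/systemd/system/r2-backup.service  (hypothetical name)
[Unit]
Description=Sync landing data to Cloudflare R2

[Service]
Type=oneshot
# "r2" is the remote defined in rclone.conf; bucket/path are placeholders
ExecStart=/usr/bin/rclone sync /data/landing r2:landing-backup

# /etc/systemd/system/r2-backup.timer  (hypothetical name)
[Unit]
Description=Run R2 landing backup every 6 hours

[Timer]
# fires at 00:00, 06:00, 12:00, 18:00
OnCalendar=*-*-* 00/6:00:00
Persistent=true

[Install]
WantedBy=timers.target
```

Enabling it with `systemctl enable --now r2-backup.timer` starts the schedule; `Persistent=true` makes a missed run fire on the next boot.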