5 Commits

Author SHA1 Message Date
Deeman
c1d00dcdc4 Refactor to local-first architecture on Hetzner NVMe
Remove the distributed R2/Iceberg/SSH pipeline architecture in favor of
local subprocess execution with NVMe storage. Landing data is backed up
to R2 via an rclone timer.

- Strip Iceberg catalog, httpfs, boto3, paramiko, prefect, pyarrow
- Pipelines run via subprocess.run() with bounded timeouts
- Extract writes to {LANDING_DIR}/psd/{year}/{month}/{etag}.csv.gzip
- SQLMesh reads LANDING_DIR variable, writes to DUCKDB_PATH
- Delete unused provider stubs (ovh, scaleway, oracle)
- Add rclone systemd timer for R2 backup every 6h
- Update supervisor to run pipelines with env vars
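The subprocess model above could look roughly like this in Python. This is a minimal sketch under assumptions, not the repository's code. The helper names `landing_path` and `run_pipeline`, the one-hour default timeout, the zero-padded month, the ETag cleanup (dropping a `W/` prefix and quotes) and the exit code 124 on timeout are all hypothetical. Only the path template and the `LANDING_DIR`/`DUCKDB_PATH` variables come from the commit message.

```python
import os
import subprocess
import sys
from pathlib import Path


def landing_path(landing_dir: str, year: int, month: int, etag: str) -> Path:
    """Build {LANDING_DIR}/psd/{year}/{month}/{etag}.csv.gzip.

    Assumptions (not confirmed by the commit): month is zero-padded, and the
    ETag's weak-validator prefix (W/) and surrounding quotes are stripped.
    """
    clean = etag.removeprefix("W/").strip('"')
    return Path(landing_dir) / "psd" / str(year) / f"{month:02d}" / f"{clean}.csv.gzip"


def run_pipeline(cmd: list[str], landing_dir: str, duckdb_path: str,
                 timeout_s: float = 3600) -> int:
    """Run one pipeline step as a child process with a hard timeout.

    LANDING_DIR and DUCKDB_PATH are passed through the environment;
    `cmd` and `timeout_s` are hypothetical placeholders.
    """
    env = {**os.environ, "LANDING_DIR": landing_dir, "DUCKDB_PATH": duckdb_path}
    try:
        result = subprocess.run(cmd, env=env, timeout=timeout_s, check=False)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child at this point.
        return 124  # hypothetical code, borrowed from coreutils timeout(1)
    return result.returncode


if __name__ == "__main__":
    print(landing_path("/data/landing", 2026, 2, '"abc123"'))
    print(run_pipeline([sys.executable, "-c", "print('pipeline ok')"],
                       "/data/landing", "/data/lake.duckdb"))
```

When the timeout expires, `subprocess.run` kills the child before raising `TimeoutExpired`. A hung extract therefore cannot stall the supervisor indefinitely.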

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 19:50:19 +01:00
Deeman
2fff895a73 Simplify supervisor architecture and automate bootstrap
- Simplify supervisor.sh following the TigerBeetle pattern
  - Remove complex functions, use a simple while loop
  - Add `|| sleep 600` so a crashed run backs off for 10 minutes before retrying
  - Use `git switch --discard-changes` for clean updates
  - Run pipelines every hour (SQLMesh handles scheduling)
  - Use POSIX sh instead of bash

- Remove /repo subdirectory nesting
  - Repository clones directly to /opt/materia
  - Simpler paths throughout

- Move the systemd service unit into the repo
  - Bootstrap copies it from the repo instead of hardcoding it
  - Service can be updated via git pull

- Automate bootstrap in CI/CD
  - deploy:supervisor now auto-bootstraps on first deploy
  - Waits for SSH to be ready (retry loop)
  - Injects secrets via SSH environment
  - Idempotent: detects if already bootstrapped

Result: push to master and the supervisor "just works"
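The control flow of that loop can be rendered in Python as a sketch. The actual supervisor.sh is POSIX sh and is not shown in this log. The `update` and `run_pipelines` hooks, the injectable `sleep` and the `max_iterations` parameter are hypothetical additions for illustration and testing. The 600-second backoff and the hourly cadence follow the commit message.

```python
import time
from typing import Callable, Optional


def supervise(update: Callable[[], None],
              run_pipelines: Callable[[], None],
              interval_s: float = 3600,
              backoff_s: float = 600,
              sleep: Callable[[float], None] = time.sleep,
              max_iterations: Optional[int] = None) -> None:
    """Loop: update the checkout, run the pipelines, wait, repeat.

    Similar in spirit to a sh loop whose body ends in `|| sleep 600`:
    any failure is logged and followed by a backoff instead of exiting.
    """
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        iteration += 1
        try:
            update()         # hypothetical hook, e.g. fetch + `git switch --discard-changes`
            run_pipelines()  # hypothetical hook, e.g. extract then SQLMesh
        except Exception as exc:
            print(f"iteration {iteration} failed: {exc!r}; sleeping {backoff_s}s")
            sleep(backoff_s)
            continue
        sleep(interval_s)
```

The loop sleeps a fixed interval after each run, so the period drifts by the run's duration. That is harmless here because SQLMesh, not the loop, decides which models are actually due.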

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-13 21:17:12 +02:00
Deeman
21f99767bf Use GitLab project access token instead of SSH deploy key
More secure approach:
- Uses HTTPS with a token instead of SSH keys
- Token can be rotated without touching infrastructure
- Scoped to read_repository only
- Token stored in Pulumi ESC (beanflows/prod)

Setup:
1. Create a project access token in GitLab with the read_repository scope
2. Add GITLAB_READ_TOKEN to Pulumi ESC
3. Bootstrap script will use it for git clone/pull
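Step 3 could build its clone URL roughly as follows. This is a sketch that assumes the token is read from the `GITLAB_READ_TOKEN` variable. It uses `oauth2` as a placeholder username, with the token acting as the HTTPS password. Check GitLab's documentation for which usernames it accepts with project access tokens. The repository URL below is hypothetical.

```python
import os
from urllib.parse import quote, urlsplit, urlunsplit


def authenticated_clone_url(repo_url: str, token: str, username: str = "oauth2") -> str:
    """Return repo_url with username:token embedded for HTTPS git clone/pull.

    `username` is a placeholder; the token acts as the password.
    """
    parts = urlsplit(repo_url)
    if parts.scheme != "https":
        raise ValueError("token auth needs an https:// URL")
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Percent-encode both halves so characters like '@' or ':' cannot break the URL.
    netloc = f"{quote(username, safe='')}:{quote(token, safe='')}@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def redact(url: str) -> str:
    """Replace the credentials in a URL with *** so it is safe to log."""
    parts = urlsplit(url)
    if "@" not in parts.netloc:
        return url
    host = parts.netloc.rsplit("@", 1)[1]
    return urlunsplit((parts.scheme, f"***@{host}", parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    # Hypothetical repository URL; the real one lives in the bootstrap script.
    url = authenticated_clone_url("https://gitlab.com/example/materia.git",
                                  os.environ.get("GITLAB_READ_TOKEN", "dummy-token"))
    print(redact(url))  # https://***@gitlab.com/example/materia.git
```

Git stores the remote URL as given, so a token embedded this way ends up in `.git/config`. Rotating the token then also means updating the remote, for example with `git remote set-url`.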
2025-10-13 20:37:28 +02:00
Deeman
f46fd53d38 Update bootstrap script with correct GitLab repo URL
2025-10-13 20:36:08 +02:00
Deeman
558829f70b Refactor to git-based deployment: simplify CI/CD and supervisor
Addresses GitLab PR comments:
1. Remove hardcoded secrets from Pulumi.prod.yaml, use ESC environment
2. Simplify deployment by using git pull instead of R2 artifacts
3. Add bootstrap script for one-time supervisor setup

Major changes:
- **Pulumi config**: Use ESC environment (beanflows/prod) for all secrets
- **Supervisor script**: Git-based deployment (git pull every 15 min)
  * No more artifact downloads from R2
  * Runs code directly via `uv run materia`
  * Self-updating from master branch
- **Bootstrap script**: New infra/bootstrap_supervisor.sh for initial setup
  * One-time script to clone the repo and set up the systemd service
  * Idempotent and simple
- **CI/CD simplification**: Remove build and R2 deployment stages
  * Eliminated build:extract, build:transform, build:cli jobs
  * Eliminated deploy:r2 job
  * Simplified deploy:supervisor to just check bootstrap status
  * Reduced from 4 stages to 3 stages (Lint → Test → Deploy)
- **Documentation**: Updated CLAUDE.md with new architecture
  * Git-based deployment flow
  * Bootstrap instructions
  * Simplified execution model
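The idempotency the bootstrap script relies on amounts to a guard like the one below. This Python rendering is only a sketch, because the real script is shell. The `.git` check, the `bootstrap` function name and the injectable `run` hook are assumptions for illustration. The target directory is a parameter, and later commits in this log place it at /opt/materia.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def _run_checked(cmd: Sequence[str]) -> None:
    # Raise CalledProcessError if git exits non-zero.
    subprocess.run(list(cmd), check=True)


def bootstrap(repo_dir: Path, clone_url: str,
              run: Callable[[list[str]], None] = _run_checked) -> bool:
    """Clone the repository once; later invocations are no-ops.

    Returns True if a clone was performed, False if repo_dir already
    holds a git checkout (i.e. the host is already bootstrapped).
    """
    if (repo_dir / ".git").is_dir():
        return False  # already bootstrapped: updates come from the supervisor's git pull
    repo_dir.parent.mkdir(parents=True, exist_ok=True)
    run(["git", "clone", clone_url, str(repo_dir)])
    # Installing and enabling the systemd unit would follow here (omitted).
    return True
```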

Benefits:
- No hardcoded secrets in config files
- Simpler deployment (no artifact builds)
- Easy to test locally (just git clone + uv sync)
- Auto-updates every 15 minutes
- Fewer CI/CD jobs (faster pipelines)
- Cleaner separation of concerns

Inspired by TigerBeetle's CFO supervisor pattern.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-13 20:31:38 +02:00