Refactor to git-based deployment: simplify CI/CD and supervisor

Addresses GitLab PR comments:
1. Remove hardcoded secrets from Pulumi.prod.yaml, use ESC environment
2. Simplify deployment by using git pull instead of R2 artifacts
3. Add bootstrap script for one-time supervisor setup

Major changes:
- **Pulumi config**: Use ESC environment (beanflows/prod) for all secrets
- **Supervisor script**: Git-based deployment (git pull every 15 min)
  * No more artifact downloads from R2
  * Runs code directly via `uv run materia`
  * Self-updating from master branch
- **Bootstrap script**: New infra/bootstrap_supervisor.sh for initial setup
  * One-time script to clone repo and setup systemd service
  * Idempotent and simple
- **CI/CD simplification**: Remove build and R2 deployment stages
  * Eliminated build:extract, build:transform, build:cli jobs
  * Eliminated deploy:r2 job
  * Simplified deploy:supervisor to just check bootstrap status
  * Reduced from 4 stages to 3 stages (Lint → Test → Deploy)
- **Documentation**: Updated CLAUDE.md with new architecture
  * Git-based deployment flow
  * Bootstrap instructions
  * Simplified execution model

Benefits:
- ✅ No hardcoded secrets in config files
- ✅ Simpler deployment (no artifact builds)
- ✅ Easy to test locally (just git clone + uv sync)
- ✅ Auto-updates every 15 minutes
- ✅ Fewer CI/CD jobs (faster pipelines)
- ✅ Cleaner separation of concerns

Inspired by TigerBeetle's CFO supervisor pattern.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
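
The self-updating behaviour described in this commit (fetch, compare against master, then `git pull` and `uv sync`) could be sketched roughly as follows. This is a minimal Python sketch for illustration, not the actual `supervisor.sh`: the names `run_command`, `update_if_needed`, and the injectable `run` callable are assumptions, and only the git commands and `uv sync` come from the commit itself.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical helper type: runs a command and returns its stdout as text.
Runner = Callable[[Sequence[str]], str]


def run_command(cmd: Sequence[str]) -> str:
    """Run a command, raise on a non-zero exit, and return stripped stdout."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def update_if_needed(run: Runner = run_command, branch: str = "master") -> bool:
    """Fetch the remote branch and update the checkout if it differs.

    Returns True when an update was applied, False otherwise.
    """
    run(["git", "fetch", "origin", branch])
    local = run(["git", "rev-parse", "HEAD"])
    remote = run(["git", "rev-parse", f"origin/{branch}"])
    if local == remote:
        return False
    # The remote branch differs from the checkout: pull and refresh dependencies.
    run(["git", "pull", "origin", branch])
    run(["uv", "sync"])
    return True
```

Passing the runner as a parameter is only a convenience for exercising the logic without a real repository; a supervisor would call `update_if_needed()` once per loop iteration from inside the cloned repo.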
86
CLAUDE.md
@@ -168,11 +168,11 @@ pytest --cov=./ --cov-report=xml
 
 ### CI/CD Pipeline (`.gitlab-ci.yml`)
 
-**4 Stages: Lint → Test → Build → Deploy**
+**3 Stages: Lint → Test → Deploy**
 
 #### 1. Lint Stage
-- Runs `ruff check` and `ruff format --check`
-- Validates code quality on every commit
+- Runs `ruff check` on every commit
+- Validates code quality
 
 #### 2. Test Stage
 - **`test:cli`**: Runs pytest on materia CLI with 71% coverage
@@ -182,53 +182,51 @@ pytest --cov=./ --cov-report=xml
   - Exports coverage reports to GitLab
 - **`test:sqlmesh`**: Runs SQLMesh model tests in transform layer
 
-#### 3. Build Stage (only on master branch)
-Creates separate artifacts for each workspace package:
-- **`build:extract`**: Builds `materia-extract-latest.tar.gz` (psdonline package)
-- **`build:transform`**: Builds `materia-transform-latest.tar.gz` (sqlmesh_materia package)
-- **`build:cli`**: Builds `materia-cli-latest.tar.gz` (materia management CLI)
-
-Each artifact is a self-contained tarball with all dependencies.
-
-#### 4. Deploy Stage (only on master branch)
-- **`deploy:r2`**: Uploads artifacts to Cloudflare R2 using rclone
-  - Loads secrets from Pulumi ESC (`beanflows/prod`)
-  - Only requires `PULUMI_ACCESS_TOKEN` in GitLab variables
-  - All other secrets (R2 credentials, SSH keys, API tokens) come from ESC
+#### 3. Deploy Stage (only on master branch)
 - **`deploy:infra`**: Runs `pulumi up` to ensure supervisor instance exists
-  - Runs on every master push (not just on infra changes)
-  - Creates/updates Hetzner CCX11 supervisor instance
-  - Configures Cloudflare R2 buckets (`beanflows-artifacts`, `beanflows-data-prod`)
-- **`deploy:supervisor`**: Deploys supervisor script and materia CLI
-  - Runs after `deploy:r2` and `deploy:infra`
-  - Copies `supervisor.sh` and systemd service to supervisor instance
-  - Downloads and installs latest materia CLI from R2
-  - Restarts supervisor service to pick up changes
+  - Runs on every master push
+  - Creates/updates Hetzner CPX11 supervisor instance (~€4.49/mo)
+  - Uses Pulumi ESC (`beanflows/prod`) for all secrets
+- **`deploy:supervisor`**: Checks supervisor status
+  - Verifies supervisor is bootstrapped
+  - Supervisor auto-updates via `git pull` every 15 minutes (no CI/CD deployment needed)
 
-### Production Architecture: Ephemeral Worker Model
+**Note:** No build artifacts! Supervisor pulls code directly from git and runs via `uv`.
+
+### Production Architecture: Git-Based Deployment with Ephemeral Workers
 
 **Design Philosophy:**
 - No always-on workers (cost optimization)
-- Supervisor instance dynamically creates/destroys workers on-demand
-- Language-agnostic artifacts enable future migration to C/Rust/Go
+- Supervisor pulls latest code from git (no artifact builds)
+- Supervisor dynamically creates/destroys workers on-demand
+- Simple, inspectable, easy to test locally
 - Multi-cloud abstraction for pricing optimization
 
 **Components:**
 
 #### 1. Supervisor Instance (Small Hetzner VM)
 - Runs `supervisor.sh` - continuous orchestration loop (inspired by TigerBeetle's CFO supervisor)
-- Hetzner CCX11: 2 vCPU, 4GB RAM (~€4/mo)
+- Hetzner CPX11: 2 vCPU (shared), 2GB RAM (~€4.49/mo)
 - Always-on, minimal resource usage
-- Checks for new CLI versions every hour (self-updating)
+- Git-based deployment: `git pull` every 15 minutes for auto-updates
 - Runs pipelines on schedule:
   - Extract: Daily at 2 AM UTC
   - Transform: Daily at 3 AM UTC
 - Uses systemd service for automatic restart on failure
-- Pulls secrets from Pulumi ESC and passes to workers
+- Pulls secrets from Pulumi ESC
+
+**Bootstrap (one-time):**
+```bash
+# Get supervisor IP from Pulumi
+cd infra && pulumi stack output supervisor_ip -s prod
+
+# Run bootstrap script
+export PULUMI_ACCESS_TOKEN=<your-token>
+ssh root@<supervisor-ip> 'bash -s' < infra/bootstrap_supervisor.sh
+```
 
 #### 2. Ephemeral Workers (On-Demand)
-- Created for each pipeline execution
-- Downloads pre-built artifacts from R2 (no git, no uv on worker)
+- Created for each pipeline execution by materia CLI
 - Receives secrets via SSH environment variable injection
 - Destroyed immediately after job completion
 - Different instance types per pipeline:
@@ -239,18 +237,20 @@ Each artifact is a self-contained tarball with all dependencies.
 ```
 Pulumi ESC (beanflows/prod)
 ↓
-Supervisor Instance (materia CLI)
+Supervisor Instance (via esc CLI)
 ↓
 Workers (injected as env vars via SSH)
 ```
 
-#### 4. Artifact Flow
+#### 4. Code Deployment Flow
 ```
-GitLab CI: uv build → tar.gz
+GitLab (master branch)
 ↓
-Cloudflare R2 (artifact storage)
+Supervisor: git pull origin master (every 15 min)
 ↓
-Worker: curl → extract → execute
+Supervisor: uv sync (update dependencies)
+↓
+Supervisor: uv run materia pipeline run <pipeline>
 ```
 
 #### 5. Data Storage
@@ -261,12 +261,12 @@ Worker: curl → extract → execute
 
 **Execution Flow:**
 1. Supervisor loop wakes up every 15 minutes
-2. Checks if current time matches pipeline schedule (e.g., 2 AM for extract)
-3. Checks for CLI updates (hourly) and self-updates if needed
-4. CLI runs: `materia pipeline run extract`
-5. Creates Hetzner worker with SSH key
-6. Worker downloads `materia-extract-latest.tar.gz` from R2
-7. CLI injects secrets via SSH: `export R2_ACCESS_KEY_ID=... && ./extract_psd`
+2. Runs `git fetch` and checks if new commits on master
+3. If updates available: `git pull && uv sync`
+4. Checks if current time matches pipeline schedule (e.g., 2 AM for extract)
+5. If scheduled: `uv run materia pipeline run extract`
+6. CLI creates Hetzner worker with SSH key
+7. CLI injects secrets via SSH and executes pipeline
 8. Pipeline executes, writes to R2 Iceberg catalog
 9. Worker destroyed (entire lifecycle ~5-10 minutes)
 10. Supervisor logs results and continues loop
 
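
The scheduling step of the Execution Flow above (wake every 15 minutes, run a pipeline when its scheduled hour arrives) can be sketched in Python. The schedule values (extract at 2 AM UTC, transform at 3 AM UTC) and the `uv run materia pipeline run <pipeline>` command come from the diffed CLAUDE.md; the function names and the once-per-day `last_run` bookkeeping are assumptions made for illustration, and the real `supervisor.sh` may handle this differently.

```python
from datetime import date, datetime, timezone

# Schedule hours (UTC) as documented in CLAUDE.md.
SCHEDULE_UTC = {"extract": 2, "transform": 3}


def due_pipelines(now: datetime, last_run: dict[str, date]) -> list[str]:
    """Return pipelines whose scheduled hour matches `now` and that have not
    already run on the current UTC date.

    `last_run` maps a pipeline name to the UTC date of its last run
    (hypothetical bookkeeping, not taken from the source).
    """
    now_utc = now.astimezone(timezone.utc)
    due = []
    for name, hour in SCHEDULE_UTC.items():
        if now_utc.hour == hour and last_run.get(name) != now_utc.date():
            due.append(name)
    return due


def command_for(pipeline: str) -> list[str]:
    """Build the CLI invocation shown in the Execution Flow."""
    return ["uv", "run", "materia", "pipeline", "run", pipeline]
```

With a 15-minute loop, the scheduled hour can match on up to four consecutive wake-ups, which is why this sketch records the last run date instead of comparing the exact minute.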