# Materia Infrastructure

Pulumi-managed infrastructure for BeanFlows.coffee.
## Stack Overview

- **Storage:** Cloudflare R2 buckets with Iceberg Data Catalog
- **Compute:** Hetzner Cloud CCX dedicated vCPU instances
- **Orchestration:** Custom Python scheduler (see `src/orchestrator/`)
## Prerequisites

1. **Cloudflare Account**
   - Sign up at https://dash.cloudflare.com
   - Create an API token with R2 + Data Catalog permissions
   - Get your Account ID from the dashboard
2. **Hetzner Cloud Account**
   - Sign up at https://console.hetzner.cloud
   - Create an API token with Read & Write permissions
3. **Pulumi Account** (optional, can use local state)
   - Sign up at https://app.pulumi.com
   - Or use local state with `pulumi login --local`
4. **SSH Key**
   - Generate if needed: `ssh-keygen -t ed25519 -C "materia-deploy"`
## Initial Setup

```bash
cd infra

# Log in to Pulumi (local or cloud)
pulumi login  # or: pulumi login --local

# Initialize the stack
pulumi stack init dev

# Configure secrets
pulumi config set --secret cloudflare:apiToken <your-cloudflare-token>
pulumi config set cloudflare_account_id <your-account-id>
pulumi config set --secret hcloud:token <your-hetzner-token>
pulumi config set --secret ssh_public_key "$(cat ~/.ssh/id_ed25519.pub)"

# Preview changes
pulumi preview

# Deploy infrastructure
pulumi up
```
## What Gets Provisioned

### Cloudflare R2 Buckets

- `materia-raw` - Raw data from extraction (immutable archives)
- `materia-lakehouse` - Iceberg tables for SQLMesh (ACID transactions)
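Because R2 exposes the S3 API, any S3 client can write to these buckets. Here is a hedged sketch of landing a raw extract in `materia-raw` with boto3; the environment variable names and object key layout are assumptions for illustration, not something this repo defines.

```python
import os


def r2_endpoint(account_id: str) -> str:
    """Build the S3-compatible endpoint URL for a Cloudflare account."""
    return f"https://{account_id}.r2.cloudflarestorage.com"


def make_r2_client(account_id: str, access_key: str, secret_key: str):
    """Create a boto3 S3 client pointed at R2 instead of AWS."""
    import boto3  # imported lazily so r2_endpoint works without boto3 installed

    return boto3.client(
        "s3",
        endpoint_url=r2_endpoint(account_id),
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        region_name="auto",  # R2 is region-less; "auto" is the convention
    )


if __name__ == "__main__":
    # Credential env var names are assumptions for this sketch.
    client = make_r2_client(
        os.environ["CLOUDFLARE_ACCOUNT_ID"],
        os.environ["R2_ACCESS_KEY_ID"],
        os.environ["R2_SECRET_ACCESS_KEY"],
    )
    client.put_object(
        Bucket="materia-raw",
        Key="extracts/orders.json",  # hypothetical key layout
        Body=b'{"example": true}',
    )
```

Keeping raw extracts immutable in `materia-raw` means the lakehouse can always be rebuilt from source.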
### Hetzner Cloud Servers

- **materia-scheduler** (CCX12: 2 vCPU, 8GB RAM)
  - Runs cron scheduler
  - Lightweight orchestration tasks
  - Always-on, low cost (~€6/mo)
- **materia-worker-01** (CCX22: 4 vCPU, 16GB RAM)
  - Heavy SQLMesh transformations
  - Can be stopped when not in use
  - Scale up to CCX32/CCX42 for larger workloads (~€24-90/mo)
- **materia-firewall**
  - SSH access (port 22)
  - All outbound traffic allowed
  - No inbound HTTP/HTTPS (we're not running web services yet)
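For orientation, the resources above can be condensed into a minimal Pulumi sketch using the `pulumi_hcloud` provider. This is not the repo's actual `infra/__main__.py`; the image name and SSH key reference are assumptions.

```python
"""Minimal sketch of the Hetzner resources (image and key names assumed)."""
import pulumi_hcloud as hcloud

# Always-on scheduler: small dedicated-vCPU box.
scheduler = hcloud.Server(
    "materia-scheduler",
    server_type="ccx12",       # 2 vCPU / 8 GB RAM
    image="ubuntu-24.04",      # image choice is an assumption
    ssh_keys=["materia-deploy"],
)

# SSH-only firewall: one inbound rule, everything else stays closed.
firewall = hcloud.Firewall(
    "materia-firewall",
    rules=[
        hcloud.FirewallRuleArgs(
            direction="in",
            protocol="tcp",
            port="22",                        # SSH only; no inbound HTTP(S)
            source_ips=["0.0.0.0/0", "::/0"],
        ),
    ],
)
```

Outbound traffic needs no rule: Hetzner firewalls allow all egress unless an explicit `direction="out"` rule is added.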
## Enabling R2 Data Catalog (Iceberg)

As of October 2025, R2 Data Catalog is in public beta. Enable it manually:

1. Go to Cloudflare Dashboard → R2
2. Select the `materia-lakehouse` bucket
3. Navigate to Settings → Data Catalog
4. Click "Enable Data Catalog"
Once enabled, you can connect DuckDB to the Iceberg REST catalog:

```python
import duckdb

# Get catalog URI from Pulumi outputs:
#   pulumi stack output duckdb_r2_config
conn = duckdb.connect()
conn.execute("INSTALL iceberg; LOAD iceberg;")
conn.execute("""
    ATTACH 'iceberg_rest://catalog.cloudflarestorage.com/<account_id>/r2-data-catalog'
    AS lakehouse (
        TYPE ICEBERG_REST,
        SECRET '<r2_api_token>'
    );
""")
```
## Server Access

Get server IPs from Pulumi outputs:

```bash
pulumi stack output scheduler_ip
pulumi stack output worker_ip
```

SSH into servers:

```bash
ssh root@<scheduler_ip>
ssh root@<worker_ip>
```
## Cost Estimates (Monthly)

| Resource | Type | Cost |
|---|---|---|
| R2 Storage | 10 GB | $0.15 |
| R2 Operations | 1M reads | $0.36 |
| R2 Egress | Unlimited | $0.00 (zero egress!) |
| Scheduler | CCX12 | €6.00 |
| Worker (on-demand) | CCX22 | €24.00 |
| **Total** | | ~€30 + ~$0.51 |
Compare to AWS equivalent: ~$300-500/mo with S3 + EC2 + egress fees.
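The R2 line items in the table follow directly from Cloudflare's published unit prices ($0.015 per GB-month of storage, $0.36 per million Class B reads, $0.00 egress). A small helper makes the arithmetic explicit:

```python
# Reproduce the R2 line items from the cost table above.
STORAGE_PER_GB_MONTH = 0.015   # USD per GB-month stored
READS_PER_MILLION = 0.36       # USD per million Class B (read) operations


def r2_monthly_cost(storage_gb: float, read_millions: float) -> float:
    """Estimated monthly R2 bill in USD (egress is always free on R2)."""
    return storage_gb * STORAGE_PER_GB_MONTH + read_millions * READS_PER_MILLION


# The table's 10 GB + 1M reads works out to $0.51/month.
print(f"${r2_monthly_cost(10, 1):.2f}")  # → $0.51
```

Storage dominates only at scale; at this volume the server cost (~€30/mo) is effectively the whole bill.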
## Scaling Workers

To add more worker capacity or different instance sizes:

1. Edit `infra/__main__.py` to add new server resources
2. Update the worker config in `src/orchestrator/workers.yaml`
3. Run `pulumi up` to provision
Example worker sizes:
- CCX12: 2 vCPU, 8GB RAM (light workloads)
- CCX22: 4 vCPU, 16GB RAM (medium workloads)
- CCX32: 8 vCPU, 32GB RAM (heavy workloads)
- CCX42: 16 vCPU, 64GB RAM (very heavy workloads)
## Destroying Infrastructure

```bash
cd infra
pulumi destroy
```

**Warning:** This will delete all buckets and servers. Back up data first!
## Next Steps

1. Deploy the orchestrator to the scheduler server (see `src/orchestrator/README.md`)
2. Configure SQLMesh to use the R2 lakehouse (see `transform/sqlmesh_materia/config.yaml`)
3. Set up the CI/CD pipeline to deploy on push (see `.gitlab-ci.yml`)