Deeman d14990bb01
refactor: rename materia → beanflows throughout codebase
- Rename src/materia/ → src/beanflows/ (Python package)
- Rename transform/sqlmesh_materia/ → transform/sqlmesh_beanflows/
- Rename infra/supervisor/materia-supervisor.service → beanflows-supervisor.service
- Rename infra/backup/materia-backup.{service,timer} → beanflows-backup.{service,timer}
- Update all path strings: /opt/materia → /opt/beanflows, /data/materia → /data/beanflows
- Update pyproject.toml: project name, CLI entrypoint, workspace source key
- Update all internal imports from materia.* → beanflows.*
- Update infra scripts: REPO_DIR, service names, systemctl references
- Fix docker-compose.prod.yml: /data/materia → /data/beanflows (bind mount path)

Intentionally left unchanged: Pulumi stack name (materia-infrastructure) and
Hetzner resource names ("materia-key", "managed_by: materia") — these reference
live cloud infrastructure and require separate cloud-side renames.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 23:00:52 +01:00

Materia Infrastructure

Single-server local-first setup for BeanFlows.coffee on Hetzner NVMe.

Architecture

Hetzner Server (NVMe)
├── beanflows_service (system user, nologin)
│   ├── ~/.ssh/beanflows_deploy         # ed25519 deploy key for Gitea read access
│   └── ~/.config/sops/age/keys.txt   # age keypair (auto-discovered by SOPS)
├── /opt/materia/                      # Git repo (owned by beanflows_service, latest release tag)
├── /opt/materia/.env                  # Decrypted from .env.prod.sops at deploy time
├── /data/materia/landing/             # Extracted raw data (immutable, content-addressed)
├── /data/materia/lakehouse.duckdb     # SQLMesh exclusive write
├── /data/materia/analytics.duckdb    # Read-only serving copy for web app
└── systemd services:
    ├── materia-supervisor             # Python supervisor: extract → transform → export → deploy
    └── materia-backup.timer          # rclone: syncs landing/ to R2 every 6 hours

Data Flow

  1. Extract — Supervisor runs due extractors per infra/supervisor/workflows.toml
  2. Transform — SQLMesh reads landing → writes lakehouse.duckdb
  3. Export — export_serving copies serving.* → analytics.duckdb (atomic rename)
  4. Backup — rclone syncs /data/materia/landing/ → R2 backup/materia/landing/
  5. Web — Web app reads analytics.duckdb read-only (per-thread connections)
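
The atomic rename in step 3 is what allows the web app to keep reading analytics.duckdb while a new copy is produced. A minimal sketch of that pattern follows, assuming the new file is built next to the target and then swapped in with os.replace. The name publish_atomically and the build callback are hypothetical and are not taken from export_serving.

```python
import os
from pathlib import Path
from typing import Callable


def publish_atomically(target: Path, build: Callable[[Path], None]) -> None:
    """Build a new file beside `target`, then swap it in with a single rename.

    Hypothetical helper: `build` stands in for whatever writes the serving copy.
    """
    # The temp path must be on the same filesystem as the target,
    # otherwise the final rename is not atomic.
    tmp_path = target.with_name(f"{target.name}.{os.getpid()}.tmp")
    try:
        build(tmp_path)
        # Readers see either the old file or the new one, never a partial write.
        os.replace(tmp_path, target)
    except BaseException:
        # Leave the existing target untouched and clean up the partial file.
        tmp_path.unlink(missing_ok=True)
        raise
```

If the build step fails, the previous analytics.duckdb stays in place, so a bad export does not take the web app down.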

Setup (new server)

1. Run setup_server.sh

bash infra/setup_server.sh

This creates the beanflows_service user and the data directories, installs all required tools (git, curl, age, sops, rclone, uv), and generates an ed25519 SSH deploy key and an age keypair (both as the service user). It prints both public keys.

2. Add keys to Gitea and SOPS

# Add the SSH deploy key to Gitea:
#   → git.padelnomics.io → beanflows repo → Settings → Deploy Keys → Add key (read-only)

# Add the server age public key to .sops.yaml on your workstation,
# then re-encrypt prod secrets to include the server key:
sops updatekeys .env.prod.sops
git add .sops.yaml .env.prod.sops
git commit -m "chore: add server age key"
git push

3. Bootstrap the supervisor

ssh root@<server_ip> 'bash -s' < infra/bootstrap_supervisor.sh

This clones the repo via SSH, decrypts secrets, installs Python dependencies, and starts the supervisor service. No access tokens required — access is via the SSH deploy key. (All tools must already be installed by setup_server.sh.)

If R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, and R2_ENDPOINT are present in .env.prod.sops, bootstrap also generates rclone.conf and enables materia-backup.timer automatically. No manual R2 setup step needed.
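
The conditional R2 setup can be pictured as follows. This is a sketch only: the function name render_rclone_conf and the remote name "r2" are assumptions, and the real bootstrap script may generate the file differently. The section layout (type = s3, provider = Cloudflare) is the standard rclone form for an R2 remote.

```python
from typing import Optional

R2_KEYS = ("R2_ACCESS_KEY_ID", "R2_SECRET_ACCESS_KEY", "R2_ENDPOINT")


def render_rclone_conf(env: dict[str, str], remote: str = "r2") -> Optional[str]:
    """Return rclone.conf text for an R2 remote, or None if any key is missing.

    The remote name "r2" is an assumption for illustration.
    """
    # Skip backup setup entirely unless all three values are present and non-empty.
    if not all(env.get(key) for key in R2_KEYS):
        return None
    return (
        f"[{remote}]\n"
        "type = s3\n"
        "provider = Cloudflare\n"
        f"access_key_id = {env['R2_ACCESS_KEY_ID']}\n"
        f"secret_access_key = {env['R2_SECRET_ACCESS_KEY']}\n"
        f"endpoint = {env['R2_ENDPOINT']}\n"
    )
```

A None result corresponds to the case where the backup timer is left disabled.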

Secrets management

Secrets are stored as SOPS-encrypted dotenv files in the repo root:

File             Purpose
.env.dev.sops    Dev defaults (safe values, local paths)
.env.prod.sops   Production secrets
.sops.yaml       Maps file patterns to age public keys

# Decrypt for local dev
make secrets-decrypt-dev

# Edit prod secrets
make secrets-edit-prod

bootstrap_supervisor.sh decrypts .env.prod.sops → /opt/materia/.env during setup. web/deploy.sh re-decrypts on every deploy (so secret rotations take effect automatically). SOPS auto-discovers the service user's age key at ~/.config/sops/age/keys.txt (XDG default).
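
A rough sketch of the decrypt step, assuming it shells out to the sops CLI. The helper names are hypothetical and the scripts in this repo may invoke sops differently. Because the file name ends in .sops rather than a format sops recognizes, the dotenv type is passed explicitly.

```python
import os
import subprocess
from pathlib import Path


def sops_decrypt_command(encrypted: str) -> list[str]:
    """Build the sops invocation for a dotenv file (hypothetical helper)."""
    return [
        "sops", "--decrypt",
        "--input-type", "dotenv",
        "--output-type", "dotenv",
        encrypted,
    ]


def decrypt_to_env(encrypted: str, target: Path) -> None:
    """Decrypt `encrypted` and write the plaintext to `target`."""
    result = subprocess.run(
        sops_decrypt_command(encrypted),
        check=True, capture_output=True, text=True,
    )
    # A newly created file gets owner-only permissions (0600);
    # an existing file keeps whatever mode it already has.
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(result.stdout)
```

Running this on every deploy, as web/deploy.sh is described as doing, means a rotated secret reaches the server with the next release.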

Deploy model (pull-based)

No SSH keys or deploy credentials in CI.

  1. CI runs tests (test-cli, test-sqlmesh, test-web)
  2. On master, CI creates the tag v${github.run_number} using the built-in github.token
  3. Supervisor polls for new tags every 60s
  4. When a new tag appears: git checkout --detach <tag> + uv sync --all-packages
  5. If web/ files changed: ./web/deploy.sh (Docker blue/green + health check)
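
The tag selection and the web/ check in steps 3 to 5 could look roughly like this. It is a sketch under assumptions: the function names are hypothetical, the tag list is assumed to come from something like git tag --list 'v*', and the changed paths from git diff --name-only between the old and new tag. Tags are compared numerically because v10 sorts before v9 as a string.

```python
import re
from typing import Iterable, Optional

# Matches the v<run_number> tags that CI creates.
TAG_PATTERN = re.compile(r"^v(\d+)$")


def latest_release_tag(tags: Iterable[str]) -> Optional[str]:
    """Return the tag with the highest run number, ignoring other tags."""
    best: Optional[tuple[int, str]] = None
    for raw in tags:
        tag = raw.strip()
        match = TAG_PATTERN.match(tag)
        if match is None:
            continue
        number = int(match.group(1))
        if best is None or number > best[0]:
            best = (number, tag)
    return best[1] if best else None


def needs_web_deploy(changed_paths: Iterable[str]) -> bool:
    """True if any changed file lives under web/."""
    return any(path.startswith("web/") for path in changed_paths)
```

For example, latest_release_tag(["v9", "v10", "nightly"]) returns "v10", and a release that only touches src/ skips ./web/deploy.sh.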

Monitoring

# Supervisor status and logs
systemctl status materia-supervisor
journalctl -u materia-supervisor -f

# Workflow status table
cd /opt/materia && sudo -u beanflows_service uv run python src/materia/supervisor.py status

# Backup timer status
systemctl list-timers materia-backup.timer
journalctl -u materia-backup -f

# Extraction state DB
sqlite3 /data/materia/landing/.state.sqlite \
  "SELECT extractor, status, finished_at FROM extraction_runs ORDER BY run_id DESC LIMIT 20"

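The same extraction-state query can be run from Python, for example inside a health check. This sketch assumes only the table and column names that appear in the sqlite3 command above; the function name recent_runs is hypothetical.

```python
import sqlite3


def recent_runs(db_path: str, limit: int = 20) -> list[tuple]:
    """Return (extractor, status, finished_at) rows for the most recent runs."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT extractor, status, finished_at FROM extraction_runs "
            "ORDER BY run_id DESC LIMIT ?",
            (limit,),
        ).fetchall()
    finally:
        # Close promptly so the monitoring query does not hold the file open.
        conn.close()
```

Pass the path of the state database, such as /data/materia/landing/.state.sqlite, as db_path.
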
Pulumi IaC

Pulumi still manages the Cloudflare R2 buckets:

cd infra
pulumi login
pulumi stack select prod
pulumi up

Cost

Resource         Type                    Cost
Hetzner Server   CCX22 (4 vCPU, 16 GB)   ~€24/mo
R2 Storage       Backup (~10 GB)         $0.15/mo
R2 Egress        Zero                    $0.00
Total                                    €24/mo ($26)