10 Commits

Deeman
0317cb885f feat(infra): use beanflows_service for supervisor
- materia-supervisor.service: User=root → User=beanflows_service,
  add PATH so uv (~/.local/bin) is found without a login shell
- setup_server.sh: full rewrite — creates beanflows_service (nologin),
  generates SSH deploy key + age keypair as service user at XDG path
  (~/.config/sops/age/keys.txt), installs age/sops/rclone as root,
  prints both public keys + numbered next-step instructions
- bootstrap_supervisor.sh: full rewrite — removes GITLAB_READ_TOKEN
  requirement, clones via SSH as service user, installs uv as service
  user, decrypts with SOPS auto-discovery, uv sync as service user,
  systemctl as root
- web/deploy.sh: remove self-contained sops/age install + keypair
  generation; replace with simple sops check (exit if missing) and
  SOPS auto-discovery decrypt (no explicit key file needed)
- infra/readme.md: update architecture diagram for beanflows_service
  paths, update setup steps to match new scripts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-26 21:33:31 +01:00
Deeman
5d7d53a260 feat(supervisor): port Python supervisor from padelnomics + workflows.toml
Port padelnomics' schedule-aware Python supervisor to materia:
- src/materia/supervisor.py — croniter scheduling, topological wave
  execution (parallel independent workflows; both sketched after this
  list), tag-based git pull + deploy, status CLI subcommand
- infra/supervisor/workflows.toml — workflow registry (psd daily, cot
  weekly, prices daily, ice daily, weather daily)
- infra/supervisor/materia-supervisor.service — updated ExecStart to Python
  supervisor, added SUPERVISOR_GIT_PULL=1
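
A rough sketch of the two scheduling ideas named above. The
`[workflows.<name>]` TOML shape with `cron` and `depends_on` fields, and the
helper names, are assumptions, not the ported code:

```python
# Hedged sketch: cron due-check + topological wave grouping.
import tomllib
from datetime import datetime, timezone
from croniter import croniter

def due_workflows(registry: dict, last_run: dict[str, datetime]) -> set[str]:
    """Workflows whose next cron fire time after their last run has passed."""
    now = datetime.now(timezone.utc)
    epoch = datetime.min.replace(tzinfo=timezone.utc)
    return {
        name
        for name, spec in registry.items()
        if croniter(spec["cron"], last_run.get(name, epoch)).get_next(datetime) <= now
    }

def waves(registry: dict) -> list[list[str]]:
    """Group workflows into waves; everything in one wave runs in parallel."""
    remaining = {n: set(s.get("depends_on", [])) for n, s in registry.items()}
    out: list[list[str]] = []
    while remaining:
        ready = sorted(n for n, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("dependency cycle in workflows.toml")
        out.append(ready)
        for n in ready:
            del remaining[n]
        for deps in remaining.values():
            deps.difference_update(ready)
    return out

with open("infra/supervisor/workflows.toml", "rb") as f:
    registry = tomllib.load(f)["workflows"]
```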

Adaptations from padelnomics:
- Uses extract_core.state.open_state_db (not padelnomics_extract.utils)
- uv run sqlmesh -p transform/sqlmesh_materia run
- uv run materia pipeline run export_serving
- web/deploy.sh path (materia's deploy.sh is under web/)
- Removed proxy_mode (not used in materia)

Also: add croniter dependency to src/materia, delete old supervisor.sh.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 11:59:55 +01:00
Deeman
9ee7a3d9d3 fix: export_serving — Arrow-based copy, rename to analytics.duckdb
Three fixes:

1. Cross-connection COPY: DuckDB doesn't support referencing another
   connection's tables as src.serving.table. Replace with Arrow as
   intermediate: src reads to Arrow, dst.register() + CREATE TABLE
   (sketched below).

2. Catalog/schema name collision: naming the export file serving.duckdb
   made DuckDB assign catalog name "serving" — same as the schema we
   create inside it. Every serving.table query became ambiguous. Rename
   to analytics.duckdb (catalog "analytics", schema "serving" = no clash).

   SERVING_DUCKDB_PATH values updated: serving.duckdb → analytics.duckdb
   in supervisor, service, bootstrap, dev_run.sh, .env.example, docker-compose.

3. Temp file: use _export.duckdb (not serving.duckdb.tmp) to avoid
   the same catalog collision during the write phase.

Verified: 6 tables exported, serving.* queries work read-only.
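
A minimal sketch of the Arrow-intermediate copy plus the final rename,
assuming the serving tables are enumerated via information_schema (the
commit confirms 6 tables exported, not how they were listed):

```python
# Hedged sketch, not the literal export_serving.py code. src → Arrow → dst
# sidesteps DuckDB's lack of cross-connection table references.
import os
import duckdb

src = duckdb.connect("lakehouse.duckdb", read_only=True)
dst = duckdb.connect("_export.duckdb")   # neutral temp name: no catalog clash
dst.execute("CREATE SCHEMA IF NOT EXISTS serving")

tables = src.execute(
    "SELECT table_name FROM information_schema.tables "
    "WHERE table_schema = 'serving'"
).fetchall()
for (name,) in tables:
    arrow_tbl = src.execute(f'SELECT * FROM serving."{name}"').arrow()
    dst.register("arrow_tbl", arrow_tbl)      # expose the Arrow table to dst
    dst.execute(f'CREATE TABLE serving."{name}" AS SELECT * FROM arrow_tbl')
    dst.unregister("arrow_tbl")

dst.close()
os.rename("_export.duckdb", "analytics.duckdb")  # atomic on one filesystem
```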

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 12:54:39 +01:00
Deeman
b899bcbad4 feat: DuckDB two-file architecture — resolve SQLMesh/web-app lock contention
Split the single lakehouse.duckdb into two files to eliminate the exclusive
write-lock conflict between SQLMesh (pipeline) and the Quart web app (reader):

  lakehouse.duckdb  — SQLMesh exclusive (all pipeline layers)
  serving.duckdb    — web app reads (serving tables only, atomically swapped)

Changes:

web/src/beanflows/analytics.py
- Replace persistent global _conn with per-thread connections (threading.local)
- Add _get_conn(): opens read_only=True on first call per thread, reopens
  automatically on inode change (~1μs os.stat) to pick up atomic file
  swaps (sketched below)
- Switch env var from DUCKDB_PATH → SERVING_DUCKDB_PATH
- Add module docstring documenting architecture + DuckLake migration path
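
The reader pattern, sketched under the names the commit gives (_get_conn,
_db_path, threading.local); the surrounding details are assumptions:

```python
# Sketch of the per-thread reopen-on-inode-change reader; exact code differs.
# read_only connections never take DuckDB's exclusive write lock.
import os
import threading
import duckdb

_db_path = os.environ["SERVING_DUCKDB_PATH"]
_local = threading.local()

def _get_conn() -> duckdb.DuckDBPyConnection:
    inode = os.stat(_db_path).st_ino          # ~1μs; changes on atomic swap
    if getattr(_local, "conn", None) is None or _local.inode != inode:
        if getattr(_local, "conn", None) is not None:
            _local.conn.close()
        _local.conn = duckdb.connect(_db_path, read_only=True)
        _local.inode = inode
    return _local.conn
```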

web/src/beanflows/app.py
- Startup check: use SERVING_DUCKDB_PATH
- Health check: use _db_path instead of _conn

src/materia/export_serving.py (new)
- Reads all serving.* tables from lakehouse.duckdb (read_only)
- Writes to serving_new.duckdb, then os.rename → serving.duckdb (atomic)
- ~50 lines; runs after each SQLMesh transform

src/materia/pipelines.py
- Add export_serving pipeline entry (uv run python -c ...)

infra/supervisor/supervisor.sh
- Add SERVING_DUCKDB_PATH env var comment
- Add export step: uv run materia pipeline run export_serving

infra/supervisor/materia-supervisor.service
- Add Environment=SERVING_DUCKDB_PATH=/data/materia/serving.duckdb

infra/bootstrap_supervisor.sh
- Add SERVING_DUCKDB_PATH to .env template

web/.env.example + web/docker-compose.yml
- Document both env vars; switch web service to SERVING_DUCKDB_PATH

web/src/beanflows/dashboard/templates/settings.html
- Minor settings page fix from prior session

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 11:06:55 +01:00
Deeman
67c048485b Add Phase 1A-C + ICE warehouse stocks: prices, methodology, pipeline automation
Phase 1A — KC=F Coffee Futures Prices:
- New extract/coffee_prices/ package (yfinance): downloads KC=F daily OHLCV,
  stores as gzip CSV with SHA256-based idempotency (sketched after this list)
- SQLMesh models: raw/coffee_prices → foundation/fct_coffee_prices →
  serving/coffee_prices (with 20d/50d SMA, 52-week high/low, daily return %)
- Dashboard: 4 metric cards + dual-line chart (close, 20d MA, 50d MA)
- API: GET /commodities/<ticker>/prices
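
An illustrative sketch of hash-based idempotent landing; the directory
layout and helper name are assumptions, not the extract/coffee_prices code:

```python
import gzip
import hashlib
from pathlib import Path

def land_csv(csv_bytes: bytes, landing_dir: Path) -> Path | None:
    digest = hashlib.sha256(csv_bytes).hexdigest()
    target = landing_dir / f"{digest[:16]}.csv.gzip"
    if target.exists():                  # same payload already landed: no-op
        return None
    landing_dir.mkdir(parents=True, exist_ok=True)
    target.write_bytes(gzip.compress(csv_bytes))
    return target
```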

Phase 1B — Data Methodology Page:
- New /methodology route with full-page template (base.html)
- 6 anchored sections: USDA PSD, CFTC COT, KC=F price, ICE warehouse stocks,
  data quality model, update schedule table
- "Methodology" link added to marketing footer

Phase 1C — Automated Pipeline:
- supervisor.sh updated: runs extract_cot, extract_prices, extract_ice in
  sequence before transform
- Webhook failure alerting via ALERT_WEBHOOK_URL env var (ntfy/Slack/Telegram)
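
The alerting step itself lives in supervisor.sh; the idea in Python, assuming
an ntfy-style endpoint that accepts a plain-text POST body:

```python
import os
import niquests

def alert(message: str) -> None:
    url = os.environ.get("ALERT_WEBHOOK_URL")
    if url:                              # alerting is opt-in via env var
        niquests.post(url, data=message.encode(), timeout=10)
```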

ICE Warehouse Stocks:
- New extract/ice_stocks/ package (niquests): normalizes ICE Report Center CSV
  to canonical schema, hash-based idempotency, soft-fail on 404 with guidance
  (sketched after this list)
- SQLMesh models: raw/ice_warehouse_stocks → foundation/fct_ice_warehouse_stocks
  → serving/ice_warehouse_stocks (30d avg, WoW change, 52w drawdown)
- Dashboard: 4 metric cards + line chart (certified bags + 30d avg)
- API: GET /commodities/<code>/stocks
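
A hedged sketch of the soft-fail fetch; the URL is a placeholder and the
guidance text is illustrative:

```python
import niquests

REPORT_URL = "https://example.invalid/ice/COFFEE-C.csv"  # placeholder

resp = niquests.get(REPORT_URL, timeout=60)
if resp.status_code == 404:
    print("ICE report not found; verify the Report Center URL for COFFEE-C "
          "and retry after the next publication window.")
    raise SystemExit(0)    # soft fail: the run is treated as a no-op
resp.raise_for_status()
```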

Foundation:
- dim_commodity: added ticker (KC=F) and ice_stock_report_code (COFFEE-C) columns
- macros/__init__.py: added prices_glob() and ice_stocks_glob()
- pipelines.py: added extract_prices and extract_ice entries

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 11:41:43 +01:00
Deeman
c1d00dcdc4 Refactor to local-first architecture on Hetzner NVMe
Remove distributed R2/Iceberg/SSH pipeline architecture in favor of
local subprocess execution with NVMe storage. Landing data backed up
to R2 via rclone timer.

- Strip Iceberg catalog, httpfs, boto3, paramiko, prefect, pyarrow
- Pipelines run via subprocess.run() with bounded timeouts (sketched below)
- Extract writes to {LANDING_DIR}/psd/{year}/{month}/{etag}.csv.gzip
- SQLMesh reads LANDING_DIR variable, writes to DUCKDB_PATH
- Delete unused provider stubs (ovh, scaleway, oracle)
- Add rclone systemd timer for R2 backup every 6h
- Update supervisor to run pipelines with env vars
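
A minimal sketch of bounded local execution; the function name and the
default timeout are illustrative, not the committed code:

```python
import os
import subprocess

def run_pipeline(args: list[str], extra_env: dict[str, str],
                 timeout_s: int = 3600) -> bool:
    try:
        subprocess.run(args, env={**os.environ, **extra_env},
                       check=True, timeout=timeout_s)
        return True
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as exc:
        print(f"pipeline failed: {exc}")   # supervisor logs and moves on
        return False
```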

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 19:50:19 +01:00
Deeman
6d4377ccf9 cleanup and prefect service setup
2026-02-04 22:24:55 +01:00
Deeman
2fff895a73 Simplify supervisor architecture and automate bootstrap
- Simplify supervisor.sh following TigerBeetle pattern
  - Remove complex functions, use simple while loop
  - Add || sleep 600 for resilience against crashes
  - Use git switch --discard-changes for clean updates
  - Run pipelines every hour (SQLMesh handles scheduling)
  - Use POSIX sh instead of bash

- Remove /repo subdirectory nesting
  - Repository clones directly to /opt/materia
  - Simpler paths throughout

- Move systemd service to repo
  - Bootstrap copies from repo instead of hardcoding
  - Service can be updated via git pull

- Automate bootstrap in CI/CD
  - deploy:supervisor now auto-bootstraps on first deploy
  - Waits for SSH to be ready (retry loop)
  - Injects secrets via SSH environment
  - Idempotent: detects if already bootstrapped

Result: Push to master and supervisor "just works"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-13 21:17:12 +02:00
Deeman
558829f70b Refactor to git-based deployment: simplify CI/CD and supervisor
Addresses GitLab PR comments:
1. Remove hardcoded secrets from Pulumi.prod.yaml, use ESC environment
2. Simplify deployment by using git pull instead of R2 artifacts
3. Add bootstrap script for one-time supervisor setup

Major changes:
- **Pulumi config**: Use ESC environment (beanflows/prod) for all secrets
- **Supervisor script**: Git-based deployment (git pull every 15 min)
  * No more artifact downloads from R2
  * Runs code directly via `uv run materia`
  * Self-updating from master branch
- **Bootstrap script**: New infra/bootstrap_supervisor.sh for initial setup
  * One-time script to clone repo and setup systemd service
  * Idempotent and simple
- **CI/CD simplification**: Remove build and R2 deployment stages
  * Eliminated build:extract, build:transform, build:cli jobs
  * Eliminated deploy:r2 job
  * Simplified deploy:supervisor to just check bootstrap status
  * Reduced from 4 stages to 3 stages (Lint → Test → Deploy)
- **Documentation**: Updated CLAUDE.md with new architecture
  * Git-based deployment flow
  * Bootstrap instructions
  * Simplified execution model

Benefits:
- No hardcoded secrets in config files
- Simpler deployment (no artifact builds)
- Easy to test locally (just git clone + uv sync)
- Auto-updates every 15 minutes
- Fewer CI/CD jobs (faster pipelines)
- Cleaner separation of concerns

Inspired by TigerBeetle's CFO supervisor pattern.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-13 20:31:38 +02:00
Deeman
f207fb441d Add supervisor deployment with continuous pipeline orchestration
Implements automated supervisor instance deployment that runs scheduled
pipelines using a TigerBeetle-inspired continuous orchestration pattern.

Infrastructure changes:
- Update Pulumi to use existing R2 buckets (beanflows-artifacts, beanflows-data-prod)
- Rename scheduler → supervisor, optimize to CCX11 (€4/mo)
- Remove always-on worker (workers are now ephemeral only)
- Add artifacts bucket resource for CLI/pipeline packages

Supervisor architecture:
- supervisor.sh: Continuous loop checking schedules every 15 minutes
- Self-updating: Checks for new CLI versions hourly
- Fixed schedules: Extract at 2 AM UTC, Transform at 3 AM UTC
- systemd service for automatic restart on failure
- Logs to systemd journal for observability

CI/CD changes:
- deploy:infra now runs on every master push (not just on changes)
- New deploy:supervisor job:
  * Deploys supervisor.sh and systemd service
  * Installs latest materia CLI from R2
  * Configures environment with Pulumi ESC secrets
  * Restarts supervisor service

Future enhancements documented:
- SQLMesh-aware scheduling (check models before running)
- Model tags for worker sizing (heavy/distributed hints)
- Multi-pipeline support, distributed execution
- Cost optimization with multi-cloud spot pricing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-12 22:23:55 +02:00