Commit Graph

227 Commits

Author SHA1 Message Date
Hendrik Dreesmann
70854394c3 Merge branch 'feature/supervisor-deployment' into 'master'
Add supervisor deployment with continuous pipeline orchestration

See merge request deemanone/materia!7
2025-10-13 21:51:05 +02:00
Deeman
d2352c1876 Simplify SQLMesh to use single prod gateway with virtual environments
- Remove dev gateway (local DuckDB file no longer needed)
- Single prod gateway connects to R2 Iceberg catalog
- Use virtual environments for dev isolation (e.g., dev_<username>)
- Update CLAUDE.md with new workflow and environment strategy
- Create comprehensive transform/sqlmesh_materia/README.md

Benefits:
- Simpler configuration (one gateway instead of two)
- All environments use same R2 Iceberg catalog
- SQLMesh handles environment isolation automatically
- No need to maintain local 13GB materia_dev.db file
- before_all hooks only run for prod gateway (no conditional logic needed)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-13 21:47:04 +02:00
Deeman
6536724e00 Fix SQLMesh config: remove invalid init_script parameter
- Remove init_script from DuckDB connection config (not a valid parameter)
- Move R2 Iceberg catalog initialization to before_all hooks
- Hooks run before sqlmesh plan/run commands
- Uses SQLMesh @env_var() macro syntax for environment variables

Fixes CI/CD error: 'invalid duckdb connection config: invalid field init_script'

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-13 21:31:56 +02:00
Deeman
2fff895a73 Simplify supervisor architecture and automate bootstrap
- Simplify supervisor.sh following TigerBeetle pattern
  - Remove complex functions, use simple while loop
  - Add || sleep 600 for resilience against crashes
  - Use git switch --discard-changes for clean updates
  - Run pipelines every hour (SQLMesh handles scheduling)
  - Use POSIX sh instead of bash

- Remove /repo subdirectory nesting
  - Repository clones directly to /opt/materia
  - Simpler paths throughout

- Move systemd service to repo
  - Bootstrap copies from repo instead of hardcoding
  - Service can be updated via git pull

- Automate bootstrap in CI/CD
  - deploy:supervisor now auto-bootstraps on first deploy
  - Waits for SSH to be ready (retry loop)
  - Injects secrets via SSH environment
  - Idempotent: detects if already bootstrapped

Result: Push to master and supervisor "just works"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-13 21:17:12 +02:00
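
The loop this commit describes (update, run, sleep longer after a failure) can be sketched roughly as below. This is a Python rendering for illustration only: the real supervisor.sh is POSIX sh and its contents are not shown in this log. The function name `supervise`, the injected `update` and `run_pipelines` callables, and the `max_cycles` parameter are hypothetical; only the hourly cadence and the 600-second failure sleep come from the commit message.

```python
import time
from typing import Callable, Optional


def supervise(
    update: Callable[[], None],
    run_pipelines: Callable[[], None],
    sleep: Callable[[float], None] = time.sleep,
    interval_s: float = 3600.0,        # "Run pipelines every hour"
    failure_sleep_s: float = 600.0,    # mirrors "|| sleep 600"
    max_cycles: Optional[int] = None,  # hypothetical: lets the loop stop in tests
) -> int:
    """Run update + pipelines in a loop; return the number of failed cycles."""
    failures = 0
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        cycle += 1
        try:
            update()         # e.g. fetch and switch to the latest master
            run_pipelines()  # e.g. invoke the materia CLI
        except Exception:
            # Any failure: back off, then try the whole cycle again.
            failures += 1
            sleep(failure_sleep_s)
            continue
        sleep(interval_s)
    return failures
```

Injecting `sleep` keeps the loop testable without waiting; with the defaults and `max_cycles=None` it runs until the process is stopped, which is the case a systemd service would cover.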
Deeman
21f99767bf Use GitLab project access token instead of SSH deploy key
More secure approach:
- Uses HTTPS with token instead of SSH keys
- Token can be rotated without touching infrastructure
- Scoped to read_repository only
- Token stored in Pulumi ESC (beanflows/prod)

Setup:
1. Create project access token in GitLab with read_repository scope
2. Add GITLAB_READ_TOKEN to Pulumi ESC
3. Bootstrap script will use it for git clone/pull
2025-10-13 20:37:28 +02:00
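
As a minimal sketch of the token-over-HTTPS approach above, the clone URL could be assembled like this. The helper name `token_clone_url`, the host `gitlab.example.com`, and the default username `oauth2` are assumptions for illustration; the bootstrap script itself is not shown in this log, and the real host is not stated here.

```python
from urllib.parse import quote


def token_clone_url(
    host: str,
    project_path: str,
    token: str,
    username: str = "oauth2",  # assumption: placeholder username for token auth
) -> str:
    """Build an HTTPS clone URL that authenticates with an access token."""
    if not token:
        raise ValueError("token must not be empty")
    # Percent-encode credentials so special characters cannot break the URL.
    user = quote(username, safe="")
    secret = quote(token, safe="")
    return f"https://{user}:{secret}@{host}/{project_path.strip('/')}.git"
```

In practice the token would be read from the environment (for example the `GITLAB_READ_TOKEN` variable named in the setup steps) rather than hardcoded, and the resulting URL should be kept out of logs because it embeds the secret.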
Deeman
f46fd53d38 Update bootstrap script with correct GitLab repo URL 2025-10-13 20:36:08 +02:00
Deeman
558829f70b Refactor to git-based deployment: simplify CI/CD and supervisor
Addresses GitLab PR comments:
1. Remove hardcoded secrets from Pulumi.prod.yaml, use ESC environment
2. Simplify deployment by using git pull instead of R2 artifacts
3. Add bootstrap script for one-time supervisor setup

Major changes:
- **Pulumi config**: Use ESC environment (beanflows/prod) for all secrets
- **Supervisor script**: Git-based deployment (git pull every 15 min)
  * No more artifact downloads from R2
  * Runs code directly via `uv run materia`
  * Self-updating from master branch
- **Bootstrap script**: New infra/bootstrap_supervisor.sh for initial setup
  * One-time script to clone repo and set up systemd service
  * Idempotent and simple
- **CI/CD simplification**: Remove build and R2 deployment stages
  * Eliminated build:extract, build:transform, build:cli jobs
  * Eliminated deploy:r2 job
  * Simplified deploy:supervisor to just check bootstrap status
  * Reduced from 4 stages to 3 stages (Lint → Test → Deploy)
- **Documentation**: Updated CLAUDE.md with new architecture
  * Git-based deployment flow
  * Bootstrap instructions
  * Simplified execution model

Benefits:
- No hardcoded secrets in config files
- Simpler deployment (no artifact builds)
- Easy to test locally (just git clone + uv sync)
- Auto-updates every 15 minutes
- Fewer CI/CD jobs (faster pipelines)
- Cleaner separation of concerns

Inspired by TigerBeetle's CFO supervisor pattern.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-13 20:31:38 +02:00
Deeman
60989675b0 Add Pulumi prod stack config file 2025-10-12 23:19:10 +02:00
Deeman
719aa8edd9 Remove R2 bucket management from Pulumi, use cpx11 for supervisor
- R2 buckets (beanflows-artifacts, beanflows-data-prod) managed manually in Cloudflare UI
- R2 API tokens don't work with Cloudflare Pulumi provider
- Use cpx11 (€4.49/mo) instead of non-existent ccx11
- Import existing SSH key (deeman@DeemanPC)
- Successfully deployed supervisor at 49.13.231.178
2025-10-12 23:18:52 +02:00
Deeman
da17a29987 Rename Pulumi resource names to match actual R2 bucket names 2025-10-12 22:31:59 +02:00
Deeman
f207fb441d Add supervisor deployment with continuous pipeline orchestration
Implements automated supervisor instance deployment that runs scheduled
pipelines using a TigerBeetle-inspired continuous orchestration pattern.

Infrastructure changes:
- Update Pulumi to use existing R2 buckets (beanflows-artifacts, beanflows-data-prod)
- Rename scheduler → supervisor, optimize to CCX11 (€4/mo)
- Remove always-on worker (workers are now ephemeral only)
- Add artifacts bucket resource for CLI/pipeline packages

Supervisor architecture:
- supervisor.sh: Continuous loop checking schedules every 15 minutes
- Self-updating: Checks for new CLI versions hourly
- Fixed schedules: Extract at 2 AM UTC, Transform at 3 AM UTC
- systemd service for automatic restart on failure
- Logs to systemd journal for observability

CI/CD changes:
- deploy:infra now runs on every master push (not just on changes)
- New deploy:supervisor job:
  * Deploys supervisor.sh and systemd service
  * Installs latest materia CLI from R2
  * Configures environment with Pulumi ESC secrets
  * Restarts supervisor service

Future enhancements documented:
- SQLMesh-aware scheduling (check models before running)
- Model tags for worker sizing (heavy/distributed hints)
- Multi-pipeline support, distributed execution
- Cost optimization with multi-cloud spot pricing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-12 22:23:55 +02:00
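
The fixed-schedule check from this commit (a loop that wakes every 15 minutes and fires Extract at 2 AM UTC and Transform at 3 AM UTC) might look something like the sketch below. The names `SCHEDULE_UTC`, `CHECK_INTERVAL`, and `due_pipelines` are hypothetical, and the windowing logic is one possible way to avoid running a pipeline twice; the original supervisor.sh is not reproduced in this log.

```python
from datetime import datetime, timedelta, timezone

# Hours (UTC) taken from the commit message; the dict layout is an assumption.
SCHEDULE_UTC = {"extract": 2, "transform": 3}
CHECK_INTERVAL = timedelta(minutes=15)


def due_pipelines(now: datetime, last_check: datetime) -> list[str]:
    """Return pipelines whose scheduled time falls in (last_check, now].

    Both arguments must be timezone-aware datetimes.
    """
    now_utc = now.astimezone(timezone.utc)
    due = []
    for name, hour in SCHEDULE_UTC.items():
        # Today's scheduled run time for this pipeline, in UTC.
        scheduled = now_utc.replace(hour=hour, minute=0, second=0, microsecond=0)
        if last_check < scheduled <= now_utc:
            due.append(name)
    return due
```

A caller would invoke `due_pipelines(now, now - CHECK_INTERVAL)` on each wake-up. Using a half-open window means each scheduled time is matched by exactly one check, provided consecutive checks use adjoining windows.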
Deeman
7e6ff29dea add claude memory update 2025-10-12 21:52:39 +02:00
Deeman
6c93021f2d remove stupid rules 2025-10-12 21:44:56 +02:00
Deeman
7e06eae5ac Add comprehensive ruff linting rules and migrate to uv build backend
- Configure ruff with strict linting rules (pycodestyle, pyflakes, isort, pylint, etc.)
- Exclude notebooks folder from linting
- Set line length to 88 characters and target Python 3.13
- Migrate build backend from hatchling to uv_build for better integration
- Add per-file ignores for __init__.py and scripts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-12 21:41:39 +02:00
Deeman
ce1cad4c41 fix 2025-10-12 21:36:32 +02:00
Deeman
5ce112f44d Add comprehensive E2E tests for materia CLI
- Add pytest and pytest-cov for testing
- Add niquests for modern HTTP/2 support (keep requests for hcloud compatibility)
- Create 13 E2E tests covering CLI, workers, pipelines, and secrets (71% coverage)
- Fix Pulumi ESC environment path (beanflows/prod) and secret key names
- Update GitLab CI to run CLI tests with coverage reporting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-12 21:32:51 +02:00
Deeman
ca308a7275 delete todos 2025-10-12 21:05:21 +02:00
Deeman
55bb84f0fa implement cli/infra update cicd 2025-10-12 21:00:41 +02:00
Deeman
790e802edd updates 2025-10-12 14:26:55 +02:00
Deeman
77dd277ebf updates 2025-10-12 14:26:37 +02:00
Deeman
ac9b23af17 Add CLAUDE.md documentation for AI-assisted development
Comprehensive guide covering project architecture, SQLMesh workflow,
data layer conventions, and development commands for the Materia
commodity analytics platform.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-12 13:21:13 +02:00
Deeman
025dda16c6 update dedupe logic -> much faster now 2025-10-07 22:32:45 +02:00
Deeman
da89c2bf6e update staging pipeline 2025-10-07 22:20:48 +02:00
Deeman
0a409acbea update path 2025-09-10 18:56:32 +02:00
Deeman
85704a4bf1 Change layer naming 2025-09-10 18:46:18 +02:00
Deeman
f5f2dbc7a5 refactor 2025-08-25 20:50:25 +02:00
Hendrik Dreesmann
a2ffc96aa3 Merge branch 'CEC' into 'master'
Update file Commodity Exchange Codes.xls

See merge request deemanone/materia!6
2025-08-01 20:03:27 +02:00
Simon Dmsn
5588be152b Update 3 files
- /notebooks/03_Extraction.ipynb
- /transform/sqlmesh_materia/models/staging/stg_psd_alldata_1_filter_silver_layer.sql
- /transform/sqlmesh_materia/models/staging/stg_psd_alldata_2_filter_gold_layer.sql
2025-08-01 14:52:55 +00:00
Simon Dmsn
1c87488cc7 Update 4 files
- /transform/sqlmesh_materia/models/staging/stg_psd_alldata.sql
- /transform/sqlmesh_materia/models/staging/stg_psd_alldata_1_filter_silver_layer.sql
- /transform/sqlmesh_materia/models/staging/stg_psd_alldata_2_filter_gold_layer.sql
- /transform/sqlmesh_materia/models/staging/stg_psd_alldata_0.sql
2025-08-01 14:45:34 +00:00
Simon Dmsn
82b27e7c55 Update 2 files
- /transform/sqlmesh_materia/seeds/commodity_exchange_codes.csv
- /transform/sqlmesh_materia/seeds/psd_codes_exchange_codes_merge.csv
2025-08-01 14:41:48 +00:00
Simon Dmsn
9d7cc4e1fb Update file commodity_exchange_codes.csv 2025-08-01 14:26:19 +00:00
Simon Dmsn
4ad4386ccc Update 2 files
- /transform/sqlmesh_materia/models/staging/Commodity Exchange Codes.xls
- /transform/sqlmesh_materia/seeds/commodity_exchange_codes.csv
2025-08-01 14:24:26 +00:00
Simon Dmsn
918b0071b1 Update file Commodity Exchange Codes.xls 2025-08-01 14:22:01 +00:00
Deeman
91f8968990 remove comment 2025-07-31 19:48:18 +02:00
Deeman
641f794d61 fix seeds; update models 2025-07-27 22:49:37 +02:00
Deeman
c0d8f60d1c add reference data 2025-07-27 18:28:30 +02:00
Deeman
ff283b62ff exclude dbs 2025-07-27 15:41:34 +02:00
Deeman
8b5d05b3c2 raw ingest model 2025-07-27 15:40:41 +02:00
Deeman
f5c73e32c5 testing sqlmesh 2025-07-27 00:18:14 +02:00
Deeman
9baa0d185c testing sqlmesh 2025-07-27 00:18:03 +02:00
Deeman
f0de8a505b update projects to packages 2025-07-26 22:32:47 +02:00
Deeman
0bbbd25b68 update projects to packages 2025-07-26 22:32:37 +02:00
Deeman
00fffb2089 more simplification 2025-07-26 22:19:33 +02:00
Deeman
1c3455a906 more simplification 2025-07-26 22:18:47 +02:00
Deeman
4fd1b96114 simplify using etags 2025-07-26 22:08:35 +02:00
Deeman
bd65ddcac8 adding incremental load abilities 2025-07-26 21:10:02 +02:00
Deeman
0a60bf8746 finish historical extraction 2025-07-13 23:20:55 +02:00
Deeman
b8ad73202c finish historical extraction 2025-07-13 23:20:50 +02:00
Deeman
70bd8a52db async is requesting stuff too fast 2025-07-13 18:08:25 +02:00
Deeman
8143c6ed8e async is requesting stuff too fast 2025-07-13 18:08:19 +02:00