- Remove import of get_user_with_subscription (function was removed)
- Use g.user and g.subscription from eager loading instead
- Fixes ImportError in dashboard routes
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
- Record v0.4.0 commit in .copier-answers.yml
- Apply flattened paths in docker-compose.prod.yml
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
- Load .env via python-dotenv in core.py
- Skip analytics DB open if file doesn't exist
- Guard dashboard analytics calls when DB not available
- Namespace admin templates under admin/ to avoid blueprint conflicts
- Add dev-login routes for user and admin (DEBUG only)
- Update .copier-answers.yml src_path to GitLab remote
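
A minimal sketch of the "skip open if the file doesn't exist" guard and the guarded dashboard call. The function names, the `events` table, and the use of `sqlite3` as a stand-in engine are assumptions for illustration, not the project's actual code:

```python
import sqlite3
from pathlib import Path
from typing import Optional


def open_analytics_db(path: Path) -> Optional[sqlite3.Connection]:
    """Return a read-only connection, or None when the DB file is absent."""
    if not path.is_file():
        # Skip the open entirely so a missing file is never created as a side effect.
        return None
    # sqlite3 is only a stand-in here; the real analytics engine may differ.
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def dashboard_stats(conn: Optional[sqlite3.Connection]) -> dict:
    """Guarded call site: fall back to empty stats when analytics is unavailable."""
    if conn is None:
        return {"available": False, "events": 0}
    # "events" is a hypothetical table name.
    (count,) = conn.execute("SELECT COUNT(*) FROM events").fetchone()
    return {"available": True, "events": count}
```

Returning `None` instead of raising keeps the dashboard route usable on a fresh checkout where no analytics file has been written yet.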
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove distributed R2/Iceberg/SSH pipeline architecture in favor of
local subprocess execution with NVMe storage. Landing data backed up
to R2 via rclone timer.
- Strip Iceberg catalog, httpfs, boto3, paramiko, prefect, pyarrow
- Pipelines run via subprocess.run() with bounded timeouts
- Extract writes to {LANDING_DIR}/psd/{year}/{month}/{etag}.csv.gzip
- SQLMesh reads LANDING_DIR variable, writes to DUCKDB_PATH
- Delete unused provider stubs (ovh, scaleway, oracle)
- Add rclone systemd timer for R2 backup every 6h
- Update supervisor to run pipelines with env vars
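
A rough sketch of the two mechanics described above: running a pipeline through `subprocess.run()` with a bounded timeout, and building the landing path. `run_pipeline` and `landing_path` are hypothetical helper names, and zero-padding the month is an assumption:

```python
import subprocess
from pathlib import Path
from typing import Dict, Optional, Sequence


def landing_path(landing_dir: Path, year: int, month: int, etag: str) -> Path:
    """Build {LANDING_DIR}/psd/{year}/{month}/{etag}.csv.gzip."""
    # Zero-padded month is an assumption; the commit only gives the template.
    return landing_dir / "psd" / str(year) / f"{month:02d}" / f"{etag}.csv.gzip"


def run_pipeline(
    argv: Sequence[str],
    timeout_seconds: float,
    env: Optional[Dict[str, str]] = None,
) -> bool:
    """Run one pipeline as a child process; True only on a clean exit in time."""
    try:
        # subprocess.run kills the child and raises TimeoutExpired at the bound.
        result = subprocess.run(list(argv), env=env, timeout=timeout_seconds, check=False)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

When passing `env`, merge it with the parent environment (for example `{**os.environ, "LANDING_DIR": "..."}`), since `subprocess.run` replaces the child's environment rather than extending it.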
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Corrected SQLMesh commands to show proper usage:
- Run from project root (not from transform/sqlmesh_materia/)
- Use -p flag to specify project directory
- Use uv run for all commands
- Use esc run for commands requiring secrets (plan, audit, ui)
- Clarified which commands need secrets vs local-only
This aligns with the actual working pattern and Pulumi ESC integration.
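
The rules above can be sketched as a small argv builder. This is illustrative only: `sqlmesh_argv` is a hypothetical helper, and the `beanflows/prod` ESC environment name is taken from other commits in this history rather than from this change:

```python
from typing import List

PROJECT_DIR = "transform/sqlmesh_materia"  # passed via -p, run from project root
ESC_ENVIRONMENT = "beanflows/prod"         # assumed ESC environment name
NEEDS_SECRETS = {"plan", "audit", "ui"}    # commands listed as requiring secrets


def sqlmesh_argv(command: str, *args: str) -> List[str]:
    """Build the command line for a SQLMesh command, wrapping in esc run if needed."""
    base = ["uv", "run", "sqlmesh", "-p", PROJECT_DIR, command, *args]
    if command in NEEDS_SECRETS:
        # `esc run <environment> -- <command>` injects the secrets for one command.
        return ["esc", "run", ESC_ENVIRONMENT, "--", *base]
    return base
```

Commands outside `NEEDS_SECRETS` are returned without the `esc run` prefix, matching the local-only case.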
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Documents the complete analysis, implementation, and results of the
PSD extraction refactoring that followed the architecture advisor's recommendations.
Includes:
- Problem statement and key insights
- Architecture analysis (data-oriented approach)
- Implementation phases and results
- Testing outcomes and metrics
- Metrics: 227 files migrated, ~40 lines of code removed, requests cut from 220+ to 1-4
Co-Authored-By: Claude <noreply@anthropic.com>
## Changes
1. **Support ESC environment variable names**
- Fallback to R2_ADMIN_ACCESS_KEY_ID if R2_ACCESS_KEY not set
- Fallback to R2_ADMIN_SECRET_ACCESS_KEY if R2_SECRET_KEY not set
- Allows script to work with Pulumi ESC (beanflows/prod) variables
2. **Use landing bucket path**
- Changed R2 path from `psd/{etag}.zip` to `landing/psd/{etag}.zip`
- All extracted data goes to landing bucket for consistent organization
3. **Updated Pulumi ESC environment**
- Added R2_BUCKET=beanflows-data-prod
- Fixed R2_ENDPOINT to remove bucket path (now just account URL)
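
A minimal sketch of the variable fallback and the new object key, assuming helper names (`resolve_r2_credentials`, `r2_object_key`) that are not in the script itself:

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_r2_credentials(
    env: Mapping[str, str] = os.environ,
) -> Optional[Tuple[str, str]]:
    """Prefer the original names, then fall back to the ESC (admin) names."""
    access_key = env.get("R2_ACCESS_KEY") or env.get("R2_ADMIN_ACCESS_KEY_ID")
    secret_key = env.get("R2_SECRET_KEY") or env.get("R2_ADMIN_SECRET_ACCESS_KEY")
    if access_key and secret_key:
        return access_key, secret_key
    # No usable pair: the caller can stay in local mode without credentials.
    return None


def r2_object_key(etag: str) -> str:
    """Object key under the landing prefix (previously psd/{etag}.zip)."""
    return f"landing/psd/{etag}.zip"
```

Using `or` rather than a default argument means an empty string in the primary variable also triggers the fallback.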
## Testing
- ✅ R2 upload works: Uploaded to landing/psd/316039e2612edc1_0.zip
- ✅ R2 deduplication works: Skips upload if file exists
- ✅ Local mode still works without credentials
Co-Authored-By: Claude <noreply@anthropic.com>
## Changes
1. **Added Pulumi ESC section**
- How to login and load secrets into shell
- `esc run` command for running commands with secrets
- List of available secrets in `beanflows/prod` environment
- Examples for common use cases
2. **Fixed supervisor bootstrap documentation**
- Clarified that bootstrapping happens automatically in CI/CD
- Pipeline checks if supervisor is already bootstrapped
- Runs bootstrap script automatically only if needed
- Removed misleading "one-time" manual bootstrap instructions
- Added note that it's only needed manually in exceptional cases
3. **Updated deploy:supervisor stage description**
- More accurate description of the bootstrap check logic
- Explains the conditional execution (bootstrap vs status check)
These updates make the documentation more accurate and helpful for both
local development (with ESC) and understanding the production deployment.
Co-Authored-By: Claude <noreply@anthropic.com>
## Key Changes
1. **Simplified extraction logic**
- Changed from downloading 220+ historical archives to checking only latest available month
- Tries current month and falls back up to 3 months (handles USDA publication lag)
- Architecture advisor insight: ETags naturally deduplicate, historical year/month structure was unnecessary
2. **Flat storage structure**
- Old: `data/{year}/{month}/{etag}.zip`
- New: `data/{etag}.zip` (local) or `psd/{etag}.zip` (R2)
- Migrated 226 existing files to flat structure
3. **Dual storage modes**
- **Local mode**: Downloads to local directory (development)
- **R2 mode**: Uploads to Cloudflare R2 (production)
- Mode determined by presence of R2 environment variables
- Added boto3 dependency for S3-compatible R2 API
4. **Updated raw SQLMesh model**
- Changed pattern from `**/*.zip` to `*.zip` to match flat structure
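
The "current month, then fall back up to 3 months" rule from item 1 can be sketched as follows. `candidate_months` and `latest_available` are hypothetical names, and `is_published` stands in for whatever availability check the extractor performs:

```python
from datetime import date
from typing import Callable, List, Optional, Tuple


def candidate_months(today: date, max_fallback: int = 3) -> List[Tuple[int, int]]:
    """Current month first, then up to max_fallback earlier months."""
    year, month = today.year, today.month
    candidates = []
    for _ in range(max_fallback + 1):
        candidates.append((year, month))
        month -= 1
        if month == 0:
            # Wrap from January back to December of the previous year.
            year, month = year - 1, 12
    return candidates


def latest_available(
    today: date, is_published: Callable[[int, int], bool]
) -> Optional[Tuple[int, int]]:
    """Return the newest month for which is_published reports data, else None."""
    for year, month in candidate_months(today):
        if is_published(year, month):
            return (year, month)
    return None
```

With the default of three fallbacks this makes between one and four checks per run.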
## Benefits
- Simpler: Single file check instead of 220+ URL attempts
- Efficient: ETag-based deduplication works naturally
- Flexible: Supports both local dev and production R2 storage
- Maintainable: Removed unnecessary complexity
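
As an illustration of why the flat layout deduplicates on its own, here is a sketch of the local-mode write path. `store_if_new` is a hypothetical helper, and the ETag is assumed to be already safe to use as a file name:

```python
from pathlib import Path


def store_if_new(data_dir: Path, etag: str, payload: bytes) -> bool:
    """Write payload to data/{etag}.zip unless that ETag is already stored."""
    target = data_dir / f"{etag}.zip"
    if target.exists():
        # Same ETag means same content, so there is nothing to do.
        return False
    data_dir.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so a crash never leaves a partial .zip.
    partial = target.with_name(target.name + ".part")
    partial.write_bytes(payload)
    partial.replace(target)
    return True
```

Because the file name is the ETag, no year/month bookkeeping is needed to decide whether an archive has been seen before.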
## Testing
- ✅ Local extraction works and respects ETags
- ✅ Falls back correctly when current month unavailable
- ✅ Linting passes
Co-Authored-By: Claude <noreply@anthropic.com>
- Update secret token: CLOUDFLARE_API_TOKEN → R2_ADMIN_API_TOKEN
- Update warehouse name: R2_WAREHOUSE_NAME → ICEBERG_WAREHOUSE_NAME
- Update endpoint: ICEBERG_REST_URI → ICEBERG_CATALOG_URI
- Remove CREATE SCHEMA and USE statements
  - DuckDB has bug with Iceberg REST: missing Content-Type header
  - Schema creation via SQL currently not supported
  - Models will use fully-qualified table names instead
Successfully tested with real R2 credentials:
- Iceberg catalog attachment works ✓
- Plan dry-run executes ✓
- Only fails on missing source data (expected) ✓
Co-Authored-By: Claude <noreply@anthropic.com>
- Add catalog ATTACH statement in before_all with SECRET parameter
  - References r2_secret created by connection configuration
  - Uses proper DuckDB ATTACH syntax per Cloudflare docs
  - Single-line format to avoid Jinja parsing issues
- Remove manual CREATE SECRET from before_all hooks
  - Secret automatically created by SQLMesh from connection config
  - Cleaner separation: connection defines credentials, hooks use them
Successfully tested - config validates without warnings.
Only fails on missing env vars (expected locally).
Co-Authored-By: Claude <noreply@anthropic.com>
- Move Iceberg secret from before_all hook to connection.secrets
  - Fixes SQLMesh warning about unsupported @env_var syntax
  - Uses Jinja templating {{ env_var() }} instead of @env_var()
- Remove database: ':memory:' (incompatible with catalogs)
  - DuckDB doesn't allow both database and catalogs config
  - Connection defaults to in-memory when no database specified
- Simplify before_all hooks to only handle ATTACH and schema setup
  - Secret is now created automatically by SQLMesh
  - Cleaner separation: connection config vs runtime setup
Based on:
- https://developers.cloudflare.com/r2/data-catalog/config-examples/duckdb/
- https://sqlmesh.readthedocs.io/en/latest/integrations/engines/duckdb/
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove dev gateway (local DuckDB file no longer needed)
- Single prod gateway connects to R2 Iceberg catalog
- Use virtual environments for dev isolation (e.g., dev_<username>)
- Update CLAUDE.md with new workflow and environment strategy
- Create comprehensive transform/sqlmesh_materia/README.md
Benefits:
- Simpler configuration (one gateway instead of two)
- All environments use same R2 Iceberg catalog
- SQLMesh handles environment isolation automatically
- No need to maintain local 13GB materia_dev.db file
- before_all hooks only run for prod gateway (no conditional logic needed)
Co-Authored-By: Claude <noreply@anthropic.com>
- Simplify supervisor.sh following TigerBeetle pattern
  - Remove complex functions, use simple while loop
  - Add || sleep 600 for resilience against crashes
  - Use git switch --discard-changes for clean updates
  - Run pipelines every hour (SQLMesh handles scheduling)
  - Use POSIX sh instead of bash
- Remove /repo subdirectory nesting
  - Repository clones directly to /opt/materia
  - Simpler paths throughout
- Move systemd service to repo
  - Bootstrap copies from repo instead of hardcoding
  - Service can be updated via git pull
- Automate bootstrap in CI/CD
  - deploy:supervisor now auto-bootstraps on first deploy
  - Waits for SSH to be ready (retry loop)
  - Injects secrets via SSH environment
  - Idempotent: detects if already bootstrapped
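
The CI/CD bootstrap flow (wait for SSH, then bootstrap only if needed) could look roughly like this. The real job is presumably shell in the CI config; this Python sketch only shows the control flow, and all four callables are hypothetical stand-ins for the SSH commands:

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    attempts: int = 30,
    delay_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry loop: call probe until it succeeds or attempts run out."""
    for attempt in range(1, attempts + 1):
        if probe():
            return True
        if attempt < attempts:
            sleep(delay_seconds)
    return False


def deploy_supervisor(
    ssh_ready: Callable[[], bool],
    is_bootstrapped: Callable[[], bool],
    bootstrap: Callable[[], None],
    check_status: Callable[[], None],
) -> str:
    """Idempotent deploy step: bootstrap on first deploy, status check afterwards."""
    if not wait_until_ready(ssh_ready):
        raise TimeoutError("SSH never became ready")
    if is_bootstrapped():
        check_status()
        return "status-check"
    bootstrap()
    return "bootstrapped"
```

Running the step twice is safe: the second run sees the bootstrap marker and only checks status.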
Result: Push to master and supervisor "just works"
Co-Authored-By: Claude <noreply@anthropic.com>
More secure approach:
- Uses HTTPS with token instead of SSH keys
- Token can be rotated without touching infrastructure
- Scoped to read_repository only
- Token stored in Pulumi ESC (beanflows/prod)
Setup:
1. Create project access token in GitLab with read_repository scope
2. Add GITLAB_READ_TOKEN to Pulumi ESC
3. Bootstrap script will use it for git clone/pull
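
For step 3, a sketch of how a clone URL with an embedded token might be built. The helper name and the `oauth2` username are assumptions (GitLab accepts any non-blank username with a project access token), and the repository URL in the usage note is a placeholder:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def authenticated_clone_url(https_url: str, token: str, username: str = "oauth2") -> str:
    """Embed a read-only token into an HTTPS clone URL."""
    parts = urlsplit(https_url)
    if parts.scheme != "https":
        raise ValueError("token auth should only be sent over HTTPS")
    # Percent-encode both fields so special characters cannot break the URL.
    netloc = f"{quote(username, safe='')}:{quote(token, safe='')}@{parts.hostname}"
    if parts.port:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

Usage would be along the lines of `authenticated_clone_url("https://gitlab.com/<group>/<project>.git", os.environ["GITLAB_READ_TOKEN"])`, with the result passed to `git clone`. Avoid logging the returned URL, since it contains the token.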
Addresses GitLab PR comments:
1. Remove hardcoded secrets from Pulumi.prod.yaml, use ESC environment
2. Simplify deployment by using git pull instead of R2 artifacts
3. Add bootstrap script for one-time supervisor setup
Major changes:
- **Pulumi config**: Use ESC environment (beanflows/prod) for all secrets
- **Supervisor script**: Git-based deployment (git pull every 15 min)
* No more artifact downloads from R2
* Runs code directly via `uv run materia`
* Self-updating from master branch
- **Bootstrap script**: New infra/bootstrap_supervisor.sh for initial setup
* One-time script to clone repo and setup systemd service
* Idempotent and simple
- **CI/CD simplification**: Remove build and R2 deployment stages
* Eliminated build:extract, build:transform, build:cli jobs
* Eliminated deploy:r2 job
* Simplified deploy:supervisor to just check bootstrap status
* Reduced from 4 stages to 3 stages (Lint → Test → Deploy)
- **Documentation**: Updated CLAUDE.md with new architecture
* Git-based deployment flow
* Bootstrap instructions
* Simplified execution model
Benefits:
- ✅ No hardcoded secrets in config files
- ✅ Simpler deployment (no artifact builds)
- ✅ Easy to test locally (just git clone + uv sync)
- ✅ Auto-updates every 15 minutes
- ✅ Fewer CI/CD jobs (faster pipelines)
- ✅ Cleaner separation of concerns
Inspired by TigerBeetle's CFO supervisor pattern.
Co-Authored-By: Claude <noreply@anthropic.com>
Implements automated supervisor instance deployment that runs scheduled
pipelines using a TigerBeetle-inspired continuous orchestration pattern.
Infrastructure changes:
- Update Pulumi to use existing R2 buckets (beanflows-artifacts, beanflows-data-prod)
- Rename scheduler → supervisor, optimize to CCX11 (€4/mo)
- Remove always-on worker (workers are now ephemeral only)
- Add artifacts bucket resource for CLI/pipeline packages
Supervisor architecture:
- supervisor.sh: Continuous loop checking schedules every 15 minutes
- Self-updating: Checks for new CLI versions hourly
- Fixed schedules: Extract at 2 AM UTC, Transform at 3 AM UTC
- systemd service for automatic restart on failure
- Logs to systemd journal for observability
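
The fixed-schedule check that the loop performs every 15 minutes can be sketched like this. supervisor.sh itself is a shell script; this Python version only illustrates the decision logic, and the "run once per UTC day, catch up if the hour was missed" behaviour is an assumption:

```python
from datetime import date, datetime, timezone
from typing import Dict, List

# Fixed schedule from this commit: Extract at 2 AM UTC, Transform at 3 AM UTC.
SCHEDULE_UTC_HOURS = {"extract": 2, "transform": 3}


def due_pipelines(now: datetime, last_run: Dict[str, date]) -> List[str]:
    """Return pipelines whose scheduled hour has passed and that have not run today.

    `now` must be timezone-aware; `last_run` maps pipeline name to the UTC date
    of its most recent run.
    """
    now = now.astimezone(timezone.utc)
    due = []
    for name, hour in SCHEDULE_UTC_HOURS.items():
        # ">=" lets a pipeline catch up if the loop missed the exact hour.
        if now.hour >= hour and last_run.get(name) != now.date():
            due.append(name)
    return due
```

Tracking the last run date per pipeline keeps a 15-minute polling loop from starting the same pipeline four times within its scheduled hour.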
CI/CD changes:
- deploy:infra now runs on every master push (not just on changes)
- New deploy:supervisor job:
* Deploys supervisor.sh and systemd service
* Installs latest materia CLI from R2
* Configures environment with Pulumi ESC secrets
* Restarts supervisor service
Future enhancements documented:
- SQLMesh-aware scheduling (check models before running)
- Model tags for worker sizing (heavy/distributed hints)
- Multi-pipeline support, distributed execution
- Cost optimization with multi-cloud spot pricing
Co-Authored-By: Claude <noreply@anthropic.com>
- Configure ruff with strict linting rules (pycodestyle, pyflakes, isort, pylint, etc.)
- Exclude notebooks folder from linting
- Set line length to 88 characters and target Python 3.13
- Migrate build backend from hatchling to uv_build for better integration
- Add per-file ignores for __init__.py and scripts
Co-Authored-By: Claude <noreply@anthropic.com>