_conn.execute() is not safe to call concurrently from multiple threads.
The analytics queries run under asyncio.gather, and each one is sent to
the thread pool via asyncio.to_thread, so the shared connection raced
and silently returned empty result sets. _conn.cursor() creates an
independent cursor that is safe to use from separate threads
simultaneously.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SQLMesh normalizes unquoted identifiers to lowercase in physical tables,
so commodity_metrics columns are e.g. 'production' not 'Production'.
Update ALLOWED_METRICS, all analytics.py SQL queries, dashboard routes,
and both dashboard templates (Jinja + JS chart references) to use
lowercase column names consistently.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
extract: wrap response.content in BytesIO before passing to
normalize_zipped_csv, and call .read() on the returned BytesIO before
write_bytes (two bugs: wrong type in, wrong type out)
sqlmesh: {{ var() }} inside SQL string literals is not substituted by
SQLMesh's Jinja (SQL parser treats them as opaque strings). Replace with
a @psd_glob() macro that evaluates LANDING_DIR at render time and returns
a quoted glob path string.
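
The extract fix above (BytesIO in, bytes out) can be sketched as follows.
The real normalize_zipped_csv signature is not shown in this change, so
normalize_zipped_csv_stub is a hypothetical stand-in that only mimics the
type contract described; the file name and CSV content are illustrative.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def normalize_zipped_csv_stub(buffer: io.BytesIO) -> io.BytesIO:
    """Hypothetical stand-in: read the first zip member and return it as BytesIO."""
    with zipfile.ZipFile(buffer) as archive:
        first_member = archive.namelist()[0]
        return io.BytesIO(archive.read(first_member))


def save_extract(content: bytes, dest: Path) -> None:
    # Bug 1: raw bytes (e.g. response.content) must be wrapped in BytesIO first.
    normalized = normalize_zipped_csv_stub(io.BytesIO(content))
    # Bug 2: write_bytes needs bytes, so drain the returned BytesIO with .read().
    dest.write_bytes(normalized.read())


# Build a tiny in-memory zip to play the role of response.content.
raw = io.BytesIO()
with zipfile.ZipFile(raw, "w") as archive:
    archive.writestr("psd.csv", "commodity,production\ncoffee,1\n")

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "psd.csv"
    save_extract(raw.getvalue(), out_path)
    written = out_path.read_text()
```

Passing the BytesIO object straight to write_bytes raises a TypeError,
because BytesIO does not expose the buffer protocol.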
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- admin_required now accepts users with 'admin' role (via g.user) in
addition to the password-based is_admin session flag, so both auth
methods grant access
- impersonate stores the admin's user_id (not True) in admin_impersonating
so stop-impersonating can restore the correct session
- stop_impersonating restores user_id from admin_impersonating instead of
just popping it
- remove s.stripe_customer_id from get_user_by_id (Paddle project, no
stripe_customer_id column in subscriptions)
Fixes 3 test_roles.py failures: test_admin_index_accessible_with_admin_role,
test_impersonate_stores_admin_id, test_stop_impersonating_restores_admin
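
A minimal sketch of the impersonation round trip described above, with a
plain dict standing in for the framework session. Only the user_id and
admin_impersonating keys come from this change; the function signatures
and the ids are illustrative assumptions.

```python
def impersonate(session: dict, target_user_id: int) -> None:
    # Store the admin's own id (not True) so it can be restored later.
    session["admin_impersonating"] = session["user_id"]
    session["user_id"] = target_user_id


def stop_impersonating(session: dict) -> None:
    # Restore the admin's id instead of just discarding the marker.
    admin_id = session.pop("admin_impersonating", None)
    if admin_id is not None:
        session["user_id"] = admin_id


session = {"user_id": 1}          # admin is user 1 (illustrative id)
impersonate(session, 42)          # act as user 42
during = dict(session)
stop_impersonating(session)       # back to the admin's own session
```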
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
g.subscription is explicitly set to None in load_user, so
g.get("subscription", {}) returns None (key exists), not {}.
Use (g.get(...) or {}) to coalesce None to an empty dict.
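
The difference can be reproduced with a plain dict, which here stands in
for g (an assumption for illustration; dict.get likewise applies its
default only when the key is missing). The "plan" field is hypothetical.

```python
# load_user sets the key explicitly to None, so the key exists.
g_like = {"subscription": None}

# The default is ignored because the key is present: this is None, not {}.
broken = g_like.get("subscription", {})

# Coalescing with `or` turns None into an empty dict.
fixed = g_like.get("subscription") or {}

# Safe to chain now; "plan" is a hypothetical field name.
plan = fixed.get("plan", "free")
```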
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The subscriptions table still had paddle_subscription_id but the new
code references provider_subscription_id. Renamed the DB column and
updated all queries in billing/routes.py to match.
Also removed unused get_subscription import from dashboard/routes.py.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove import of get_user_with_subscription (function was removed)
- Use g.user and g.subscription from eager loading instead
- Fixes ImportError in dashboard routes
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
- Record v0.4.0 commit in .copier-answers.yml
- Apply flattened paths in docker-compose.prod.yml
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
- Load .env via python-dotenv in core.py
- Skip analytics DB open if file doesn't exist
- Guard dashboard analytics calls when DB not available
- Namespace admin templates under admin/ to avoid blueprint conflicts
- Add dev-login routes for user and admin (DEBUG only)
- Update .copier-answers.yml src_path to GitLab remote
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove distributed R2/Iceberg/SSH pipeline architecture in favor of
local subprocess execution with NVMe storage. Landing data backed up
to R2 via rclone timer.
- Strip Iceberg catalog, httpfs, boto3, paramiko, prefect, pyarrow
- Pipelines run via subprocess.run() with bounded timeouts
- Extract writes to {LANDING_DIR}/psd/{year}/{month}/{etag}.csv.gzip
- SQLMesh reads LANDING_DIR variable, writes to DUCKDB_PATH
- Delete unused provider stubs (ovh, scaleway, oracle)
- Add rclone systemd timer for R2 backup every 6h
- Update supervisor to run pipelines with env vars
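
A minimal sketch of running a pipeline step via subprocess.run() with a
bounded timeout, as described above. The helper name, the timeout values,
and the placeholder commands are assumptions for illustration, not the
project's actual pipeline invocations.

```python
import subprocess
import sys

PIPELINE_TIMEOUT_SECONDS = 5  # illustrative bound, not the project's value


def run_pipeline(command: list[str], timeout: float) -> bool:
    """Run one pipeline step; return True on success, False on failure or timeout."""
    try:
        completed = subprocess.run(command, timeout=timeout, check=False)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising TimeoutExpired.
        return False
    return completed.returncode == 0


# Placeholder commands: one that finishes quickly, one that exceeds its bound.
ok = run_pipeline([sys.executable, "-c", "print('extract done')"], PIPELINE_TIMEOUT_SECONDS)
timed_out = run_pipeline([sys.executable, "-c", "import time; time.sleep(10)"], 0.5)
```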
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Corrected SQLMesh commands to show proper usage:
- Run from project root (not from transform/sqlmesh_materia/)
- Use -p flag to specify project directory
- Use uv run for all commands
- Use esc run for commands requiring secrets (plan, audit, ui)
- Clarified which commands need secrets vs local-only
This aligns with the actual working pattern and Pulumi ESC integration.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Documents the complete analysis, implementation, and results of the PSD
extraction refactoring that followed the architecture advisor's
recommendations.
Includes:
- Problem statement and key insights
- Architecture analysis (data-oriented approach)
- Implementation phases and results
- Testing outcomes and metrics
- 227 files migrated, ~40 lines reduced, 220+ → 1-4 requests
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
## Changes
1. **Support ESC environment variable names**
- Fallback to R2_ADMIN_ACCESS_KEY_ID if R2_ACCESS_KEY not set
- Fallback to R2_ADMIN_SECRET_ACCESS_KEY if R2_SECRET_KEY not set
- Allows script to work with Pulumi ESC (beanflows/prod) variables
2. **Use landing bucket path**
- Changed R2 path from `psd/{etag}.zip` to `landing/psd/{etag}.zip`
- All extracted data goes to landing bucket for consistent organization
3. **Updated Pulumi ESC environment**
- Added R2_BUCKET=beanflows-data-prod
- Fixed R2_ENDPOINT to remove bucket path (now just account URL)
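
A minimal sketch of the variable-name fallback from item 1. The four
variable names come from this change; the helper functions and the choice
to treat an empty value as unset are assumptions.

```python
import os


def first_env(*names: str, environ=os.environ):
    """Return the first non-empty value among the given variable names, else None."""
    for name in names:
        value = environ.get(name)
        if value:
            return value
    return None


def r2_credentials(environ=os.environ):
    # Prefer the script's own names, then fall back to the ESC names.
    access_key = first_env("R2_ACCESS_KEY", "R2_ADMIN_ACCESS_KEY_ID", environ=environ)
    secret_key = first_env("R2_SECRET_KEY", "R2_ADMIN_SECRET_ACCESS_KEY", environ=environ)
    return access_key, secret_key


# Simulated ESC environment with placeholder values.
esc_env = {"R2_ADMIN_ACCESS_KEY_ID": "esc-id", "R2_ADMIN_SECRET_ACCESS_KEY": "esc-secret"}
creds = r2_credentials(esc_env)
```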
## Testing
- ✅ R2 upload works: Uploaded to landing/psd/316039e2612edc1_0.zip
- ✅ R2 deduplication works: Skips upload if file exists
- ✅ Local mode still works without credentials
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
## Changes
1. **Added Pulumi ESC section**
- How to login and load secrets into shell
- `esc run` command for running commands with secrets
- List of available secrets in `beanflows/prod` environment
- Examples for common use cases
2. **Fixed supervisor bootstrap documentation**
- Clarified that bootstrapping happens automatically in CI/CD
- Pipeline checks if supervisor is already bootstrapped
- Runs bootstrap script automatically only if needed
- Removed misleading "one-time" manual bootstrap instructions
- Added note that it's only needed manually in exceptional cases
3. **Updated deploy:supervisor stage description**
- More accurate description of the bootstrap check logic
- Explains the conditional execution (bootstrap vs status check)
These updates make the documentation more accurate and helpful for both
local development (with ESC) and understanding the production deployment.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
## Key Changes
1. **Simplified extraction logic**
- Changed from downloading 220+ historical archives to checking only latest available month
- Tries current month and falls back up to 3 months (handles USDA publication lag)
- Architecture advisor insight: ETags naturally deduplicate, historical year/month structure was unnecessary
2. **Flat storage structure**
- Old: `data/{year}/{month}/{etag}.zip`
- New: `data/{etag}.zip` (local) or `psd/{etag}.zip` (R2)
- Migrated 226 existing files to flat structure
3. **Dual storage modes**
- **Local mode**: Downloads to local directory (development)
- **R2 mode**: Uploads to Cloudflare R2 (production)
- Mode determined by presence of R2 environment variables
- Added boto3 dependency for S3-compatible R2 API
4. **Updated raw SQLMesh model**
- Changed pattern from `**/*.zip` to `*.zip` to match flat structure
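
The month fallback from item 1 can be sketched as below. It assumes that
"up to 3 months" means the current month plus three earlier ones; the
function names and the is_available callback are hypothetical and stand
in for the actual check against the USDA download URL.

```python
from datetime import date


def candidate_months(today: date, max_fallback: int = 3) -> list[tuple[int, int]]:
    """Return (year, month) pairs: the current month, then up to max_fallback earlier ones."""
    months = []
    year, month = today.year, today.month
    for _ in range(max_fallback + 1):
        months.append((year, month))
        month -= 1
        if month == 0:
            # Roll back across the year boundary.
            year, month = year - 1, 12
    return months


def latest_available(today: date, is_available):
    """Return the newest (year, month) for which is_available is true, else None."""
    for year, month in candidate_months(today):
        if is_available(year, month):
            return year, month
    return None


candidates = candidate_months(date(2025, 2, 10))
```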
## Benefits
- Simpler: Single file check instead of 220+ URL attempts
- Efficient: ETag-based deduplication works naturally
- Flexible: Supports both local dev and production R2 storage
- Maintainable: Removed unnecessary complexity
## Testing
- ✅ Local extraction works and respects ETags
- ✅ Falls back correctly when current month unavailable
- ✅ Linting passes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Update secret token: CLOUDFLARE_API_TOKEN → R2_ADMIN_API_TOKEN
- Update warehouse name: R2_WAREHOUSE_NAME → ICEBERG_WAREHOUSE_NAME
- Update endpoint: ICEBERG_REST_URI → ICEBERG_CATALOG_URI
- Remove CREATE SCHEMA and USE statements
  - DuckDB has a bug with Iceberg REST: missing Content-Type header
  - Schema creation via SQL is currently not supported
  - Models will use fully-qualified table names instead
Successfully tested with real R2 credentials:
- Iceberg catalog attachment works ✓
- Plan dry-run executes ✓
- Only fails on missing source data (expected) ✓
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add catalog ATTACH statement in before_all with SECRET parameter
  - References r2_secret created by connection configuration
  - Uses proper DuckDB ATTACH syntax per Cloudflare docs
  - Single-line format to avoid Jinja parsing issues
- Remove manual CREATE SECRET from before_all hooks
  - Secret automatically created by SQLMesh from connection config
  - Cleaner separation: connection defines credentials, hooks use them
Successfully tested - config validates without warnings.
Only fails on missing env vars (expected locally).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Move Iceberg secret from before_all hook to connection.secrets
  - Fixes SQLMesh warning about unsupported @env_var syntax
  - Uses Jinja templating {{ env_var() }} instead of @env_var()
- Remove database: ':memory:' (incompatible with catalogs)
  - DuckDB doesn't allow both database and catalogs config
  - Connection defaults to in-memory when no database specified
- Simplify before_all hooks to only handle ATTACH and schema setup
  - Secret is now created automatically by SQLMesh
  - Cleaner separation: connection config vs runtime setup
Based on:
- https://developers.cloudflare.com/r2/data-catalog/config-examples/duckdb/
- https://sqlmesh.readthedocs.io/en/latest/integrations/engines/duckdb/
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove dev gateway (local DuckDB file no longer needed)
- Single prod gateway connects to R2 Iceberg catalog
- Use virtual environments for dev isolation (e.g., dev_<username>)
- Update CLAUDE.md with new workflow and environment strategy
- Create comprehensive transform/sqlmesh_materia/README.md
Benefits:
- Simpler configuration (one gateway instead of two)
- All environments use same R2 Iceberg catalog
- SQLMesh handles environment isolation automatically
- No need to maintain local 13GB materia_dev.db file
- before_all hooks only run for prod gateway (no conditional logic needed)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Simplify supervisor.sh following TigerBeetle pattern
  - Remove complex functions, use simple while loop
  - Add || sleep 600 for resilience against crashes
  - Use git switch --discard-changes for clean updates
  - Run pipelines every hour (SQLMesh handles scheduling)
  - Use POSIX sh instead of bash
- Remove /repo subdirectory nesting
  - Repository clones directly to /opt/materia
  - Simpler paths throughout
- Move systemd service to repo
  - Bootstrap copies from repo instead of hardcoding
  - Service can be updated via git pull
- Automate bootstrap in CI/CD
  - deploy:supervisor now auto-bootstraps on first deploy
  - Waits for SSH to be ready (retry loop)
  - Injects secrets via SSH environment
  - Idempotent: detects if already bootstrapped
Result: Push to master and supervisor "just works"
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
More secure approach:
- Uses HTTPS with token instead of SSH keys
- Token can be rotated without touching infrastructure
- Scoped to read_repository only
- Token stored in Pulumi ESC (beanflows/prod)
Setup:
1. Create project access token in GitLab with read_repository scope
2. Add GITLAB_READ_TOKEN to Pulumi ESC
3. Bootstrap script will use it for git clone/pull