Use return_exceptions=True so a CatalogException from a single query
(e.g. a table not yet populated in a fresh env) degrades that one widget
gracefully instead of crashing the whole dashboard render.
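A minimal sketch of the pattern (the metric names, query coroutines, and the CatalogException stand-in are illustrative, not the app's real code):

```python
import asyncio

class CatalogException(Exception):
    """Stand-in for the engine error raised when a table is missing."""

async def fetch_metric(name: str):
    # Hypothetical per-widget query; one fails in a fresh environment.
    if name == "cot_positioning":
        raise CatalogException(f"table {name} not yet populated")
    return {name: 42}

async def render_dashboard():
    results = await asyncio.gather(
        fetch_metric("production"),
        fetch_metric("cot_positioning"),
        return_exceptions=True,  # failures come back as values, not raised
    )
    # Keep the widgets that succeeded; the failed one degrades to nothing.
    return [r for r in results if not isinstance(r, Exception)]

ok = asyncio.run(render_dashboard())
```

Without return_exceptions=True, the first raised exception propagates out of gather and the whole render fails.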
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- config.yaml: remove ambiguousorinvalidcolumn linter rule (false positives on read_csv TVFs)
- fct_cot_positioning: use TRY_CAST throughout — CFTC uses '.' as null in many columns
- raw/cot_disaggregated: add columns() declaration for 33 varchar cols
- dim_commodity: switch from SEED to FULL model with SQL VALUES to preserve leading zeros
Pandas auto-converts '083' → 83 even with varchar column declarations in SEED models
- seeds/dim_commodity.csv: correct cftc_commodity_code from '083731' (contract market code)
to '083' (3-digit CFTC commodity code); add CSV quoting
- test_cot_foundation.yaml: fix output key name, vars for time range, partial: true,
and correct cftc_commodity_code to '083'
- analytics.py: COFFEE_CFTC_CODE '083731' → '083' to match actual data
Result: serving.cot_positioning has 685 rows (2013-01-08 to 2026-02-17), 23/23 tests pass.
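The '083' → 83 loss above is plain pandas type inference at read time, which runs before any SEED column declaration applies (the CSV snippet below is illustrative):

```python
import io
import pandas as pd

csv_text = "cftc_commodity_code,commodity_name\n083,COFFEE\n"

# Default type inference parses the column as int64: the leading zero is
# gone before any downstream varchar declaration can see it.
df = pd.read_csv(io.StringIO(csv_text))
lost = df["cftc_commodity_code"].iloc[0]       # 83

# An explicit string dtype at read time is what preserves it.
df2 = pd.read_csv(io.StringIO(csv_text), dtype={"cftc_commodity_code": str})
kept = df2["cftc_commodity_code"].iloc[0]      # '083'
```

Moving dim_commodity to a FULL model with SQL VALUES sidesteps the CSV-parsing step entirely.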
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Admin flow:
- Remove /admin/login (password-based) and /admin/dev-login routes entirely
- admin_required now checks only the 'admin' role; redirects to auth.login
- auth/dev-login with an ADMIN_EMAILS address redirects directly to /admin/
- .env.example: replace ADMIN_PASSWORD with ADMIN_EMAILS=admin@beanflows.coffee
Dev seeding:
- Add dev_seed.py: idempotent upsert of 4 fixed accounts (admin, free,
starter, pro) so every access tier is testable after dev_run.sh
- dev_run.sh: seed after migrations, show all 4 login shortcuts
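The idempotent-upsert shape can be sketched like this (sqlite3 stands in for the app's DB; only admin@beanflows.coffee is named in the commit, the other emails and the schema are illustrative):

```python
import sqlite3

# Hypothetical schema; the real dev_seed.py targets the app's own tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY, role TEXT)")

SEED_ACCOUNTS = [
    ("admin@beanflows.coffee", "admin"),
    ("free@example.test", "free"),
    ("starter@example.test", "starter"),
    ("pro@example.test", "pro"),
]

def seed(conn: sqlite3.Connection) -> None:
    # ON CONFLICT makes the seed idempotent: re-running after dev_run.sh
    # updates rows in place instead of failing on the primary key.
    conn.executemany(
        "INSERT INTO users (email, role) VALUES (?, ?) "
        "ON CONFLICT(email) DO UPDATE SET role = excluded.role",
        SEED_ACCOUNTS,
    )

seed(conn)
seed(conn)  # second run is a no-op upsert, not an error
count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```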
Regression tests (37 passing):
- test_analytics.py: concurrent fetch_analytics calls return correct row
counts (cursor thread-safety regression), column names are lowercase
- test_roles.py TestAdminAuthFlow: password login routes return 404,
admin_required redirects to auth.login, dev-login grants admin role
and redirects to admin panel when email is in ADMIN_EMAILS
- conftest.py: add mock_analytics fixture (fixes 7 pre-existing dashboard
test errors); fix assertion text and lowercase metric param in tests
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DuckDB's _conn.execute() is not safe for concurrent calls from multiple
threads. asyncio.gather submits each analytics query to the thread pool
via asyncio.to_thread, causing race conditions that silently returned
empty result sets. _conn.cursor() creates an independent cursor that is
safe to use from separate threads simultaneously.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SQLMesh normalizes unquoted identifiers to lowercase in physical tables,
so commodity_metrics columns are e.g. 'production' not 'Production'.
Update ALLOWED_METRICS, all analytics.py SQL queries, dashboard routes,
and both dashboard templates (Jinja + JS chart references) to use
lowercase column names consistently.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
extract: wrap response.content in BytesIO before passing to
normalize_zipped_csv, and call .read() on the returned BytesIO before
write_bytes (two bugs: wrong type in, wrong type out)
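Both fixes in one sketch (normalize_zipped_csv here is a trivial stand-in for the real helper, which is assumed to take a file-like object and return a BytesIO):

```python
import io
from pathlib import Path

def normalize_zipped_csv(buf: io.BytesIO) -> io.BytesIO:
    # Hypothetical stand-in: the real helper unzips and normalizes a CSV.
    return io.BytesIO(buf.read().upper())

response_content = b"raw,zip,bytes"  # stands in for response.content

# Bug 1 fix: wrap the raw bytes in BytesIO so a file-like object goes in.
normalized = normalize_zipped_csv(io.BytesIO(response_content))

# Bug 2 fix: the helper returns a BytesIO, so .read() out the bytes
# before handing them to write_bytes (which wants bytes, not a stream).
out = Path("normalized.csv")
out.write_bytes(normalized.read())
```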
sqlmesh: {{ var() }} inside SQL string literals is not substituted by
SQLMesh's Jinja (the SQL parser treats them as opaque strings). Replace
with a @psd_glob() macro that evaluates LANDING_DIR at render time and
returns a quoted glob path string.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- admin_required now accepts users with 'admin' role (via g.user) in
addition to the password-based is_admin session flag, so both auth
methods grant access
- impersonate stores the admin's user_id (not True) in admin_impersonating
so stop-impersonating can restore the correct session
- stop_impersonating restores user_id from admin_impersonating instead of
just popping it
- remove s.stripe_customer_id from get_user_by_id (Paddle project, no
stripe_customer_id column in subscriptions)
Fixes 3 test_roles.py failures: test_admin_index_accessible_with_admin_role,
test_impersonate_stores_admin_id, test_stop_impersonating_restores_admin
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
g.subscription is explicitly set to None in load_user, so
g.get("subscription", {}) returns None (key exists), not {}.
Use (g.get(...) or {}) to coalesce None to an empty dict.
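The distinction in one sketch (a plain dict stands in for Flask's g, which exposes the same .get semantics):

```python
# Minimal stand-in for Flask's g after load_user: the key exists but
# holds None for users without a subscription.
g = {"subscription": None}

# .get's default only applies when the key is MISSING, not when it is None:
assert g.get("subscription", {}) is None

# Coalescing with `or` turns None into an empty dict:
sub = g.get("subscription") or {}
plan = sub.get("plan", "free")
```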
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The subscriptions table still had paddle_subscription_id but the new
code references provider_subscription_id. Renamed the DB column and
updated all queries in billing/routes.py to match.
Also removed unused get_subscription import from dashboard/routes.py.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove import of get_user_with_subscription (function was removed)
- Use g.user and g.subscription from eager loading instead
- Fixes ImportError in dashboard routes
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
- Record v0.4.0 commit in .copier-answers.yml
- Apply flattened paths in docker-compose.prod.yml
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
- Load .env via python-dotenv in core.py
- Skip analytics DB open if file doesn't exist
- Guard dashboard analytics calls when DB not available
- Namespace admin templates under admin/ to avoid blueprint conflicts
- Add dev-login routes for user and admin (DEBUG only)
- Update .copier-answers.yml src_path to GitLab remote
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove distributed R2/Iceberg/SSH pipeline architecture in favor of
local subprocess execution with NVMe storage. Landing data backed up
to R2 via rclone timer.
- Strip Iceberg catalog, httpfs, boto3, paramiko, prefect, pyarrow
- Pipelines run via subprocess.run() with bounded timeouts
- Extract writes to {LANDING_DIR}/psd/{year}/{month}/{etag}.csv.gzip
- SQLMesh reads LANDING_DIR variable, writes to DUCKDB_PATH
- Delete unused provider stubs (ovh, scaleway, oracle)
- Add rclone systemd timer for R2 backup every 6h
- Update supervisor to run pipelines with env vars
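The bounded-timeout subprocess pattern, sketched (the command and limit are illustrative, not the real pipeline invocation):

```python
import subprocess
import sys

def run_pipeline(cmd: list[str], timeout_s: int = 30) -> int:
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout_s,     # kill the child if it exceeds the bound
            capture_output=True,
            check=False,
        )
        return result.returncode
    except subprocess.TimeoutExpired:
        return -1  # a hung pipeline fails fast instead of stalling forever

rc = run_pipeline([sys.executable, "-c", "print('ok')"])
```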
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Corrected SQLMesh commands to show proper usage:
- Run from project root (not from transform/sqlmesh_materia/)
- Use -p flag to specify project directory
- Use uv run for all commands
- Use esc run for commands requiring secrets (plan, audit, ui)
- Clarified which commands need secrets vs local-only
This aligns with the actual working pattern and Pulumi ESC integration.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Documents the complete analysis, implementation, and results of the PSD
extraction refactoring driven by the architecture advisor's recommendations.
Includes:
- Problem statement and key insights
- Architecture analysis (data-oriented approach)
- Implementation phases and results
- Testing outcomes and metrics
- 227 files migrated, ~40 lines reduced, 220+ → 1-4 requests
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
## Changes
1. **Support ESC environment variable names**
- Fallback to R2_ADMIN_ACCESS_KEY_ID if R2_ACCESS_KEY not set
- Fallback to R2_ADMIN_SECRET_ACCESS_KEY if R2_SECRET_KEY not set
- Allows script to work with Pulumi ESC (beanflows/prod) variables
2. **Use landing bucket path**
- Changed R2 path from `psd/{etag}.zip` to `landing/psd/{etag}.zip`
- All extracted data goes to landing bucket for consistent organization
3. **Updated Pulumi ESC environment**
- Added R2_BUCKET=beanflows-data-prod
- Fixed R2_ENDPOINT to remove bucket path (now just account URL)
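The fallback chain in (1) can be sketched as (the "esc-key" value is illustrative):

```python
import os

# Prefer the short names, fall back to the Pulumi ESC (beanflows/prod)
# admin variable names.
def r2_credentials():
    access_key = os.environ.get("R2_ACCESS_KEY") or os.environ.get("R2_ADMIN_ACCESS_KEY_ID")
    secret_key = os.environ.get("R2_SECRET_KEY") or os.environ.get("R2_ADMIN_SECRET_ACCESS_KEY")
    return access_key, secret_key

os.environ.pop("R2_ACCESS_KEY", None)
os.environ["R2_ADMIN_ACCESS_KEY_ID"] = "esc-key"  # illustrative value
key, _secret = r2_credentials()
```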
## Testing
- ✅ R2 upload works: Uploaded to landing/psd/316039e2612edc1_0.zip
- ✅ R2 deduplication works: Skips upload if file exists
- ✅ Local mode still works without credentials
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
## Changes
1. **Added Pulumi ESC section**
- How to login and load secrets into shell
- `esc run` command for running commands with secrets
- List of available secrets in `beanflows/prod` environment
- Examples for common use cases
2. **Fixed supervisor bootstrap documentation**
- Clarified that bootstrapping happens automatically in CI/CD
- Pipeline checks if supervisor is already bootstrapped
- Runs bootstrap script automatically only if needed
- Removed misleading "one-time" manual bootstrap instructions
- Added note that it's only needed manually in exceptional cases
3. **Updated deploy:supervisor stage description**
- More accurate description of the bootstrap check logic
- Explains the conditional execution (bootstrap vs status check)
These updates make the documentation more accurate and helpful for both
local development (with ESC) and understanding the production deployment.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
## Key Changes
1. **Simplified extraction logic**
- Changed from downloading 220+ historical archives to checking only latest available month
- Tries current month and falls back up to 3 months (handles USDA publication lag)
- Architecture advisor insight: ETags naturally deduplicate, historical year/month structure was unnecessary
2. **Flat storage structure**
- Old: `data/{year}/{month}/{etag}.zip`
- New: `data/{etag}.zip` (local) or `psd/{etag}.zip` (R2)
- Migrated 226 existing files to flat structure
3. **Dual storage modes**
- **Local mode**: Downloads to local directory (development)
- **R2 mode**: Uploads to Cloudflare R2 (production)
- Mode determined by presence of R2 environment variables
- Added boto3 dependency for S3-compatible R2 API
4. **Updated raw SQLMesh model**
- Changed pattern from `**/*.zip` to `*.zip` to match flat structure
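The month-fallback in (1) can be sketched as (candidate enumeration only; the real extractor then probes each month's URL):

```python
from datetime import date

def candidate_months(today: date, max_back: int = 3):
    # Current month first, then up to 3 months back (USDA publication lag).
    out = []
    year, month = today.year, today.month
    for _ in range(max_back + 1):
        out.append((year, month))
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return out

months = candidate_months(date(2025, 2, 14))
```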
## Benefits
- Simpler: Single file check instead of 220+ URL attempts
- Efficient: ETag-based deduplication works naturally
- Flexible: Supports both local dev and production R2 storage
- Maintainable: Removed unnecessary complexity
## Testing
- ✅ Local extraction works and respects ETags
- ✅ Falls back correctly when current month unavailable
- ✅ Linting passes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Update secret token: CLOUDFLARE_API_TOKEN → R2_ADMIN_API_TOKEN
- Update warehouse name: R2_WAREHOUSE_NAME → ICEBERG_WAREHOUSE_NAME
- Update endpoint: ICEBERG_REST_URI → ICEBERG_CATALOG_URI
- Remove CREATE SCHEMA and USE statements
- DuckDB has bug with Iceberg REST: missing Content-Type header
- Schema creation via SQL currently not supported
- Models will use fully-qualified table names instead
Successfully tested with real R2 credentials:
- Iceberg catalog attachment works ✓
- Plan dry-run executes ✓
- Only fails on missing source data (expected) ✓
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add catalog ATTACH statement in before_all with SECRET parameter
- References r2_secret created by connection configuration
- Uses proper DuckDB ATTACH syntax per Cloudflare docs
- Single-line format to avoid Jinja parsing issues
- Remove manual CREATE SECRET from before_all hooks
- Secret automatically created by SQLMesh from connection config
- Cleaner separation: connection defines credentials, hooks use them
Successfully tested - config validates without warnings.
Only fails on missing env vars (expected locally).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Move Iceberg secret from before_all hook to connection.secrets
- Fixes SQLMesh warning about unsupported @env_var syntax
- Uses Jinja templating {{ env_var() }} instead of @env_var()
- Remove database: ':memory:' (incompatible with catalogs)
- DuckDB doesn't allow both database and catalogs config
- Connection defaults to in-memory when no database specified
- Simplify before_all hooks to only handle ATTACH and schema setup
- Secret is now created automatically by SQLMesh
- Cleaner separation: connection config vs runtime setup
Based on:
- https://developers.cloudflare.com/r2/data-catalog/config-examples/duckdb/
- https://sqlmesh.readthedocs.io/en/latest/integrations/engines/duckdb/
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove dev gateway (local DuckDB file no longer needed)
- Single prod gateway connects to R2 Iceberg catalog
- Use virtual environments for dev isolation (e.g., dev_<username>)
- Update CLAUDE.md with new workflow and environment strategy
- Create comprehensive transform/sqlmesh_materia/README.md
Benefits:
- Simpler configuration (one gateway instead of two)
- All environments use same R2 Iceberg catalog
- SQLMesh handles environment isolation automatically
- No need to maintain local 13GB materia_dev.db file
- before_all hooks only run for prod gateway (no conditional logic needed)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>