Commit Graph

16 Commits

Author SHA1 Message Date
Deeman
09ae88be19 cleanup and prefect service setup 2026-02-05 20:01:50 +01:00
Hendrik Dreesmann
b702e6565a Update SQLMesh for R2 data access & Convert psd data to gzip 2025-11-02 00:26:01 +01:00
Deeman
d30ec9b66b Add R2 upload support with landing bucket path
## Changes

1. **Support ESC environment variable names**
   - Fallback to R2_ADMIN_ACCESS_KEY_ID if R2_ACCESS_KEY not set
   - Fallback to R2_ADMIN_SECRET_ACCESS_KEY if R2_SECRET_KEY not set
   - Allows script to work with Pulumi ESC (beanflows/prod) variables
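
The fallback described above can be sketched as follows. This is a minimal illustration, not the script's actual code: the function name `resolve_r2_credentials` is hypothetical, and only the four environment variable names come from the commit message.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_r2_credentials(
    env: Mapping[str, str] = os.environ,
) -> Optional[Tuple[str, str]]:
    """Return (access_key, secret_key), or None if either is missing.

    Hypothetical helper: the primary names are tried first, then the
    Pulumi ESC names named in the commit message.
    """
    access_key = env.get("R2_ACCESS_KEY") or env.get("R2_ADMIN_ACCESS_KEY_ID")
    secret_key = env.get("R2_SECRET_KEY") or env.get("R2_ADMIN_SECRET_ACCESS_KEY")
    if access_key and secret_key:
        return access_key, secret_key
    return None
```

Passing the environment as a mapping keeps the lookup easy to exercise without touching real credentials.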

2. **Use landing bucket path**
   - Changed R2 path from `psd/{etag}.zip` to `landing/psd/{etag}.zip`
   - All extracted data goes to landing bucket for consistent organization

3. **Updated Pulumi ESC environment**
   - Added R2_BUCKET=beanflows-data-prod
   - Fixed R2_ENDPOINT to remove bucket path (now just account URL)

## Testing

- R2 upload works: Uploaded to landing/psd/316039e2612edc1_0.zip
- R2 deduplication works: Skips upload if file exists
- Local mode still works without credentials
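
A rough sketch of the landing path and the skip-if-exists behaviour follows. The commit does not say how the existence check is done; using `head_object` is an assumption, and `landing_key` and `upload_if_missing` are hypothetical names. The `client` argument is expected to behave like a boto3 S3 client pointed at R2.

```python
def landing_key(etag: str) -> str:
    """Build the object key under the landing prefix."""
    return f"landing/psd/{etag}.zip"


def upload_if_missing(client, bucket: str, etag: str, body: bytes) -> bool:
    """Upload the archive unless the key already exists.

    Returns True if an upload happened, False if it was skipped.
    Assumption: a missing object surfaces as an exception carrying a
    boto3-style `response` dict with a not-found error code.
    """
    key = landing_key(etag)
    try:
        client.head_object(Bucket=bucket, Key=key)
        return False  # same ETag already stored, nothing to do
    except Exception as exc:
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code not in ("404", "NoSuchKey", "NotFound"):
            raise  # unrelated failure (auth, network, ...): do not hide it
    client.put_object(Bucket=bucket, Key=key, Body=body)
    return True
```

Because the key is derived from the ETag alone, re-running the extraction against an unchanged upstream file results in no new upload.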

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-20 22:45:30 +02:00
Deeman
38897617e7 Refactor PSD extraction: simplify to latest-only + add R2 support
## Key Changes

1. **Simplified extraction logic**
   - Changed from downloading 220+ historical archives to checking only the latest available month
   - Tries the current month and falls back up to 3 months (handles USDA publication lag)
   - Architecture advisor insight: ETags naturally deduplicate, so the historical year/month structure was unnecessary
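
The month fallback might look like the sketch below. It reads "falls back up to 3 months" as the current month plus three earlier ones, which is an interpretation; `candidate_months` is a hypothetical name.

```python
from datetime import date
from typing import List, Tuple


def candidate_months(today: date, max_fallback: int = 3) -> List[Tuple[int, int]]:
    """List (year, month) pairs to try, newest first.

    Starts at the month of `today` and steps back `max_fallback`
    times, rolling over year boundaries.
    """
    months = []
    year, month = today.year, today.month
    for _ in range(max_fallback + 1):
        months.append((year, month))
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return months
```

The caller would request each candidate in order and stop at the first month for which the archive exists.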

2. **Flat storage structure**
   - Old: `data/{year}/{month}/{etag}.zip`
   - New: `data/{etag}.zip` (local) or `psd/{etag}.zip` (R2)
   - Migrated 226 existing files to flat structure
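
A one-off migration to the flat layout could be sketched like this. The commit does not show how the 226 files were moved, so the function name and the collision handling are assumptions.

```python
from pathlib import Path


def flatten_archives(data_dir: Path) -> int:
    """Move data/{year}/{month}/{etag}.zip up to data/{etag}.zip.

    Returns the number of files moved. If a file with the same name
    already exists at the top level, the nested copy is left in place
    (assumption: a human decides what to do with collisions).
    """
    moved = 0
    for src in sorted(data_dir.glob("*/*/*.zip")):
        dest = data_dir / src.name
        if dest.exists():
            continue
        src.rename(dest)
        moved += 1
    return moved
```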

3. **Dual storage modes**
   - **Local mode**: Downloads to local directory (development)
   - **R2 mode**: Uploads to Cloudflare R2 (production)
   - Mode determined by presence of R2 environment variables
   - Added boto3 dependency for S3-compatible R2 API
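
Mode selection by environment could be as simple as the sketch below. Which variables must be present is not stated in the commit; the `REQUIRED_R2_VARS` tuple and the function name are assumptions for illustration.

```python
import os
from typing import Mapping

# Assumed set of variables; the commit only says "R2 environment variables".
REQUIRED_R2_VARS = ("R2_ENDPOINT", "R2_BUCKET", "R2_ACCESS_KEY", "R2_SECRET_KEY")


def select_storage_mode(env: Mapping[str, str] = os.environ) -> str:
    """Return "r2" when every required variable is set and non-empty,
    otherwise "local"."""
    if all(env.get(name) for name in REQUIRED_R2_VARS):
        return "r2"
    return "local"
```

Defaulting to local mode when anything is missing keeps development runs from needing credentials.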

4. **Updated raw SQLMesh model**
   - Changed pattern from `**/*.zip` to `*.zip` to match flat structure

## Benefits

- Simpler: Single file check instead of 220+ URL attempts
- Efficient: ETag-based deduplication works naturally
- Flexible: Supports both local dev and production R2 storage
- Maintainable: Removed unnecessary complexity

## Testing

- Local extraction works and respects ETags
- Falls back correctly when current month unavailable
- Linting passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-20 22:02:15 +02:00
Deeman
6c93021f2d remove stupid rules 2025-10-12 21:44:56 +02:00
Deeman
f5f2dbc7a5 refactor 2025-08-25 20:50:25 +02:00
Deeman
9baa0d185c testing sqlmesh 2025-07-27 00:18:03 +02:00
Deeman
0bbbd25b68 update projects to packages 2025-07-26 22:32:37 +02:00
Deeman
00fffb2089 more simplification 2025-07-26 22:19:33 +02:00
Deeman
1c3455a906 more simplification 2025-07-26 22:18:47 +02:00
Deeman
4fd1b96114 simplify using etags 2025-07-26 22:08:35 +02:00
Deeman
bd65ddcac8 adding incremental load abilities 2025-07-26 21:10:02 +02:00
Deeman
b8ad73202c finish historical extraction 2025-07-13 23:20:50 +02:00
Deeman
70bd8a52db async is requesting stuff too fast 2025-07-13 18:08:25 +02:00
Deeman
8143c6ed8e async is requesting stuff too fast 2025-07-13 18:08:19 +02:00
Deeman
c3c281fcd8 update structure 2025-07-08 22:41:59 +02:00