## Key Changes
1. **Simplified extraction logic**
- Changed from downloading 220+ historical archives to checking only the latest available month
- Tries the current month, then falls back up to 3 months (handles USDA publication lag)
- Architecture advisor insight: ETags naturally deduplicate, so the historical year/month structure was unnecessary
2. **Flat storage structure**
- Old: `data/{year}/{month}/{etag}.zip`
- New: `data/{etag}.zip` (local) or `psd/{etag}.zip` (R2)
- Migrated 226 existing files to the flat structure
3. **Dual storage modes**
- **Local mode**: Downloads to local directory (development)
- **R2 mode**: Uploads to Cloudflare R2 (production)
- Mode determined by presence of R2 environment variables
- Added boto3 dependency for S3-compatible R2 API
4. **Updated raw SQLMesh model**
- Changed pattern from `**/*.zip` to `*.zip` to match flat structure
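
The month fallback and flat naming described above can be sketched roughly as follows. This is an illustration, not the code from this change: `candidate_months` and `flat_key` are hypothetical names, "up to 3 months" is assumed to mean the current month plus three earlier ones, and stripping quotes and the weak-validator prefix from the ETag is also an assumption.

```python
from datetime import date


def candidate_months(today: date, max_fallback: int = 3) -> list[tuple[int, int]]:
    """Return (year, month) pairs to try, newest first.

    Assumption: the current month plus up to `max_fallback` earlier months.
    """
    months = []
    year, month = today.year, today.month
    for _ in range(max_fallback + 1):
        months.append((year, month))
        month -= 1
        if month == 0:  # wrap from January to December of the previous year
            year, month = year - 1, 12
    return months


def flat_key(etag: str, prefix: str = "") -> str:
    """Build the flat object name, e.g. 'psd/<etag>.zip' when prefix is 'psd/'.

    Assumption: surrounding quotes and a leading 'W/' are dropped from the ETag.
    """
    clean = etag.strip().removeprefix("W/").strip('"')
    return f"{prefix}{clean}.zip"
```

For example, `candidate_months(date(2025, 2, 15))` yields February 2025, January 2025, December 2024 and November 2024, in that order. The extractor would stop at the first month whose archive is available.
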
## Benefits
- Simpler: Single file check instead of 220+ URL attempts
- Efficient: ETag-based deduplication works naturally
- Flexible: Supports both local dev and production R2 storage
- Maintainable: Removed unnecessary complexity
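
A minimal sketch of how mode selection and ETag deduplication might fit together is shown here. The environment variable names are placeholders (the change only says "R2 environment variables"), requiring all of them to be set is an assumption, and the helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Placeholder names; the actual variables used by the extractor are not stated.
R2_ENV_VARS = ("R2_ENDPOINT_URL", "R2_ACCESS_KEY_ID", "R2_SECRET_ACCESS_KEY", "R2_BUCKET")


def storage_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Return 'r2' when every R2 variable is set and non-empty, else 'local'."""
    env = os.environ if env is None else env
    return "r2" if all(env.get(name) for name in R2_ENV_VARS) else "local"


def stored_locally(data_dir: Path, etag: str) -> bool:
    """True if data/<etag>.zip already exists, so the download can be skipped."""
    return (data_dir / f"{etag}.zip").exists()


def stored_in_r2(client, bucket: str, etag: str) -> bool:
    """True if psd/<etag>.zip exists in the bucket.

    `client` is a boto3 S3 client created with the R2 endpoint_url.
    """
    from botocore.exceptions import ClientError  # lazy import; ships with boto3

    try:
        client.head_object(Bucket=bucket, Key=f"psd/{etag}.zip")
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] in ("404", "NoSuchKey", "NotFound"):
            return False
        raise
```

Because the file name is the ETag, an unchanged upstream archive maps to a name that already exists, and both modes can skip it without any year/month bookkeeping.
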
## Testing
- ✅ Local extraction works and respects ETags
- ✅ Falls back correctly when the current month is unavailable
- ✅ Linting passes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

MODEL (
  name raw.psd_alldata,
  kind FULL,
  grain (commodity_code, country_code, market_year, calendar_year, month, attribute_id, unit_id),
  start '2006-08-01',
  cron '@daily',
  columns (
    commodity_code varchar,
    commodity_description varchar,
    country_code varchar,
    country_name varchar,
    market_year varchar,
    calendar_year varchar,
    month varchar,
    attribute_id varchar,
    attribute_description varchar,
    unit_id varchar,
    unit_description varchar,
    value varchar,
    filename varchar
  )
);

SELECT *
FROM read_csv(
  'zip://extract/psdonline/src/psdonline/data/*.zip/*.csv',
  header = true,
  union_by_name = true,
  filename = true,
  names = ['commodity_code', 'commodity_description', 'country_code', 'country_name', 'market_year', 'calendar_year', 'month', 'attribute_id', 'attribute_description', 'unit_id', 'unit_description', 'value'],
  all_varchar = true
)