Commit Graph

25 Commits

Author SHA1 Message Date
Deeman
32132974b2 Clean up web changes and add favicon
- Update uv.lock dependencies
- Remove web/CLAUDE.md (moved to root)
- Update base.html template
- Add favicon.svg

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
2026-02-19 22:46:33 +01:00
Deeman
d6d2aa8efe Merge remote-tracking branch 'origin/master'
# Conflicts:
#	infra/readme.md
2026-02-18 21:09:24 +01:00
Deeman
c1d00dcdc4 Refactor to local-first architecture on Hetzner NVMe
Remove distributed R2/Iceberg/SSH pipeline architecture in favor of
local subprocess execution with NVMe storage. Landing data backed up
to R2 via rclone timer.

- Strip Iceberg catalog, httpfs, boto3, paramiko, prefect, pyarrow
- Pipelines run via subprocess.run() with bounded timeouts
- Extract writes to {LANDING_DIR}/psd/{year}/{month}/{etag}.csv.gzip
- SQLMesh reads LANDING_DIR variable, writes to DUCKDB_PATH
- Delete unused provider stubs (ovh, scaleway, oracle)
- Add rclone systemd timer for R2 backup every 6h
- Update supervisor to run pipelines with env vars

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 19:50:19 +01:00
Deeman
2748c606e9 Add BeanFlows MVP: coffee analytics dashboard, API, and web app
- Fix pipeline granularity: add market_year to cleaned/serving SQL models
- Add DuckDB data access layer with async query functions (analytics.py)
- Build Chart.js dashboard: supply/demand, STU ratio, top producers, YoY table
- Add country comparison page with multi-select picker
- Replace items CRUD with read-only commodity API (list, metrics, countries, CSV)
- Configure BeanFlows plan tiers (Free/Starter/Pro) with feature gating
- Rewrite public pages for coffee market intelligence positioning
- Remove boilerplate items schema, update health check for DuckDB
- Add test suite: 139 tests passing (dashboard, API, billing)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 16:11:50 +01:00
Deeman
09ae88be19 cleanup and prefect service setup 2026-02-05 20:01:50 +01:00
Deeman
6d4377ccf9 cleanup and prefect service setup 2026-02-04 22:24:55 +01:00
Hendrik Dreesmann
b702e6565a Update SQLMesh for R2 data access & Convert psd data to gzip 2025-11-02 00:26:01 +01:00
Deeman
38897617e7 Refactor PSD extraction: simplify to latest-only + add R2 support
## Key Changes

1. **Simplified extraction logic**
   - Changed from downloading 220+ historical archives to checking only latest available month
   - Tries current month and falls back up to 3 months (handles USDA publication lag)
   - Architecture advisor insight: ETags naturally deduplicate, historical year/month structure was unnecessary

2. **Flat storage structure**
   - Old: `data/{year}/{month}/{etag}.zip`
   - New: `data/{etag}.zip` (local) or `psd/{etag}.zip` (R2)
   - Migrated 226 existing files to flat structure

3. **Dual storage modes**
   - **Local mode**: Downloads to local directory (development)
   - **R2 mode**: Uploads to Cloudflare R2 (production)
   - Mode determined by presence of R2 environment variables
   - Added boto3 dependency for S3-compatible R2 API

4. **Updated raw SQLMesh model**
   - Changed pattern from `**/*.zip` to `*.zip` to match flat structure

## Benefits

- Simpler: Single file check instead of 220+ URL attempts
- Efficient: ETag-based deduplication works naturally
- Flexible: Supports both local dev and production R2 storage
- Maintainable: Removed unnecessary complexity

## Testing

-  Local extraction works and respects ETags
-  Falls back correctly when current month unavailable
-  Linting passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-20 22:02:15 +02:00
Deeman
5ce112f44d Add comprehensive E2E tests for materia CLI
- Add pytest and pytest-cov for testing
- Add niquests for modern HTTP/2 support (keep requests for hcloud compatibility)
- Create 13 E2E tests covering CLI, workers, pipelines, and secrets (71% coverage)
- Fix Pulumi ESC environment path (beanflows/prod) and secret key names
- Update GitLab CI to run CLI tests with coverage reporting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-12 21:32:51 +02:00
Deeman
55bb84f0fa implement cli/infra update cicd 2025-10-12 21:00:41 +02:00
Deeman
77dd277ebf updates 2025-10-12 14:26:37 +02:00
Deeman
f5f2dbc7a5 refactor 2025-08-25 20:50:25 +02:00
Deeman
8b5d05b3c2 raw ingest model 2025-07-27 15:40:41 +02:00
Deeman
9baa0d185c testing sqlmesh 2025-07-27 00:18:03 +02:00
Deeman
4fd1b96114 simplify using etags 2025-07-26 22:08:35 +02:00
Deeman
bd65ddcac8 adding incremental load abilities 2025-07-26 21:10:02 +02:00
Deeman
b8ad73202c finish historical extraction 2025-07-13 23:20:50 +02:00
Deeman
8143c6ed8e async is requesting stuff too fast 2025-07-13 18:08:19 +02:00
Deeman
c3c281fcd8 update structure 2025-07-08 22:41:59 +02:00
Deeman
5368c1e521 changes
'
2025-06-11 11:49:20 +02:00
Deeman
265250864c add dlt script to extract data from fas.usda.gov 2025-04-30 22:35:31 +02:00
Simon Dmsn
bff78f1e72 Psd 2025-04-30 18:56:39 +02:00
2839757cf8 add cicd and precommit 2025-03-01 18:23:56 +01:00
2a4e7fe668 update 2025-03-01 18:15:34 +01:00
f5b39d5ee0 add yfinance and more readme 2025-03-01 18:13:45 +01:00