# SaaS Frontend Architecture Plan: beanflows.coffee

**Date**: 2025-10-21
**Status**: Planning
**Product**: beanflows.coffee - Coffee market analytics platform

## Project Vision

**beanflows.coffee** - A specialized coffee market analytics platform built on USDA PSD data, providing traders, roasters, and market analysts with actionable insights into global coffee production, trade flows, and supply chain dynamics.

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────┐
│             Robyn Web App (beanflows.coffee)                │
│                                                             │
│  Landing Page (Jinja2 + htmx) ─┬─> Auth (JWT + SQLite)      │
│                                └─> /dashboards/* routes     │
│                                          │                  │
│                                          ▼                  │
│                                Serve Evidence /build/       │
└─────────────────────────────────────────────────────────────┘
                          │
                          ▼
             ┌──────────────────────────┐
             │ Evidence.dev Dashboards  │
             │ (coffee market focus)    │
             │                          │
             │ Queries: Local DuckDB ←──┼─── Export from Iceberg
             │ Builds: On data updates  │
             └──────────────────────────┘
```

## Technical Decisions

### Data Flow
- **Source:** Iceberg catalog (R2)
- **Export:** Local DuckDB file for Evidence dashboards
- **Trigger:** Rebuild Evidence after SQLMesh updates data
- **Serving:** Robyn serves Evidence static build output

### Auth System
- **User data:** SQLite database
- **Auth method:** JWT tokens (Robyn built-in support)
- **Consideration:** Evaluate hosted auth services (Clerk, Auth0)
- **POC approach:** Simple email/password with JWT

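The email/password POC implies a password hashing scheme for the `password_hash` column. A minimal sketch using the stdlib's `hashlib.pbkdf2_hmac` (the module name, helper names, and iteration count are illustrative assumptions, not part of the plan):

```python
# web/security.py (hypothetical module) - password hashing for the POC auth flow
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumption: a common PBKDF2-SHA256 work factor


def hash_password(password: str) -> str:
    """Return 'salt_hex$hash_hex' for storage in users.password_hash."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(candidate.hex(), digest_hex)
```

A dedicated password hashing library (argon2, bcrypt) would also fit, but PBKDF2 keeps the POC dependency-free.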
### Payments
- **Provider:** Stripe
- **Integration:** Webhook-based (Stripe.js on client, webhooks to Robyn)
- **Rationale:** Simplest integration, no need for complex server-side API calls

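The webhook endpoint must verify Stripe's signature before trusting a payload. In Python this is normally one call to the SDK's `stripe.Webhook.construct_event`; as a sketch of what that check does under the hood, per Stripe's documented `Stripe-Signature: t=<ts>,v1=<hex>` scheme (the function name is ours):

```python
# Sketch of Stripe's webhook signature check: HMAC-SHA256 over "<timestamp>.<body>"
import hashlib
import hmac


def verify_stripe_signature(payload: bytes, sig_header: str, secret: str) -> bool:
    """Return True if the v1 signature in sig_header matches the payload."""
    parts = dict(item.split("=", 1) for item in sig_header.split(","))
    signed_payload = f"{parts['t']}.".encode() + payload
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    # Stripe may send multiple v1 signatures during secret rotation; one is checked here
    return hmac.compare_digest(expected, parts["v1"])
```

A production handler should also reject timestamps outside a tolerance window (the SDK defaults to 5 minutes) to limit replay attacks.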
### Project Structure
```
materia/
├── web/                      # NEW: Robyn web application
│   ├── app.py                # Robyn entry point
│   ├── routes/
│   │   ├── landing.py        # Marketing page
│   │   ├── auth.py           # Login/signup (JWT)
│   │   └── dashboards.py     # Serve Evidence /build/
│   ├── templates/            # Jinja2 + htmx
│   │   ├── base.html
│   │   ├── landing.html
│   │   └── login.html
│   ├── middleware/
│   │   └── auth.py           # JWT verification
│   ├── models.py             # SQLite schema (users table)
│   └── static/               # CSS, htmx.js
├── dashboards/               # NEW: Evidence.dev project
│   ├── pages/                # Dashboard markdown files
│   │   ├── index.md          # Global coffee overview
│   │   ├── production.md     # Production trends
│   │   ├── trade.md          # Trade flows
│   │   └── supply.md         # Supply/demand balance
│   ├── sources/              # Data source configs
│   ├── data/                 # Local DuckDB exports
│   │   └── coffee_data.duckdb
│   └── package.json
```

## How It Works: Robyn + Evidence Integration

### 1. Evidence Build Process
```bash
cd dashboards
npm run build
# Outputs static HTML/JS/CSS to dashboards/build/
```

### 2. Robyn Serves Evidence Output
```python
# web/routes/dashboards.py
from pathlib import Path


@app.get("/dashboards/*")
@requires_jwt  # Custom middleware (web/middleware/auth.py); redirects to /login on failure
def serve_dashboard(request):
    # Strip /dashboards/ prefix
    path = request.path.removeprefix("/dashboards/") or "index.html"

    # Serve from Evidence build directory
    file_path = Path("dashboards/build") / path

    # Fall back to the SPA entry point for unknown paths
    if not file_path.exists():
        file_path = Path("dashboards/build/index.html")

    return FileResponse(file_path)
```

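One caveat with a wildcard route like the one above: joining the request path onto the build directory will happily follow `../` segments. A hedged sketch of a containment check (the helper name is ours, not a Robyn API):

```python
# Reject path traversal before serving files from the Evidence build directory
from pathlib import Path
from typing import Optional

BUILD_DIR = Path("dashboards/build").resolve()


def resolve_safe(requested: str) -> Optional[Path]:
    """Resolve a request path inside BUILD_DIR; None if it escapes the directory."""
    candidate = (BUILD_DIR / requested).resolve()
    # is_relative_to (Python 3.9+) confirms containment after ".." resolution
    if candidate.is_relative_to(BUILD_DIR):
        return candidate
    return None
```

The route handler would call this first and return 404 (or the index fallback) when it yields `None`.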
### 3. User Flow
1. User visits `beanflows.coffee` (landing page)
2. User signs up / logs in (Robyn auth system)
3. Stripe checkout for subscription (using Stripe.js)
4. User navigates to `beanflows.coffee/dashboards/`
5. Robyn checks JWT authentication
6. If authenticated: serves Evidence static files
7. If not: redirects to login

## Phase 1: Evidence.dev POC

**Goal:** Get Evidence working with coffee data

### Tasks

1. Create Evidence project in `dashboards/`
   ```bash
   mkdir dashboards && cd dashboards
   npm init evidence@latest .
   ```

2. Create SQLMesh export model for coffee data
   ```sql
   -- models/exports/export_coffee_analytics.sql
   -- Note: DuckDB's COPY ... TO writes CSV/Parquet/JSON, not .duckdb files;
   -- attach the target database and create a table inside it instead.
   ATTACH 'dashboards/data/coffee_data.duckdb' AS coffee_db;
   CREATE OR REPLACE TABLE coffee_db.coffee_metrics AS
   SELECT * FROM serving.obt_commodity_metrics
   WHERE commodity_name ILIKE '%coffee%';
   ```

3. Build simple coffee production dashboard
   - Single dashboard showing coffee production trends
   - Test Evidence build process
   - Validate DuckDB query performance

4. Test local Evidence dev server
   ```bash
   npm run dev
   ```

**Deliverable:** Working Evidence dashboard querying local DuckDB

## Phase 2: Robyn Web App

### Tasks

1. Set up Robyn project in `web/`
   ```bash
   mkdir web && cd web
   uv add robyn jinja2
   ```

2. Implement SQLite user database
   ```python
   # web/models.py
   import sqlite3

   def init_db():
       conn = sqlite3.connect('users.db')
       conn.execute('''
           CREATE TABLE IF NOT EXISTS users (
               id INTEGER PRIMARY KEY,
               email TEXT UNIQUE NOT NULL,
               password_hash TEXT NOT NULL,
               stripe_customer_id TEXT,
               subscription_status TEXT,
               created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
           )
       ''')
       conn.commit()
       conn.close()
   ```

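The Stripe webhook handler will need to flip `subscription_status` on this table when a checkout or cancellation event arrives. A sketch against the schema above (the helper name is ours):

```python
# Update a user's subscription status from a webhook event
import sqlite3


def update_subscription(db_path: str, email: str, status: str) -> bool:
    """Set subscription_status for a user; returns False if the email is unknown."""
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute(
            "UPDATE users SET subscription_status = ? WHERE email = ?",
            (status, email),
        )
        conn.commit()
        return cur.rowcount == 1
    finally:
        conn.close()
```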
3. Add JWT authentication
   ```python
   # web/middleware/auth.py
   import os
   from functools import wraps

   import jwt  # PyJWT
   from robyn import Request

   SECRET_KEY = os.environ["JWT_SECRET"]

   def requires_jwt(func):
       @wraps(func)
       def wrapper(request: Request):
           token = request.headers.get("Authorization")
           if not token:
               return redirect("/login")

           try:
               # Header is typically "Bearer <token>"
               payload = jwt.decode(
                   token.removeprefix("Bearer "), SECRET_KEY, algorithms=["HS256"]
               )
               request.user = payload
               return func(request)
           except jwt.InvalidTokenError:
               return redirect("/login")

       return wrapper
   ```

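In practice tokens would be issued with PyJWT's `jwt.encode`. Purely to illustrate what the middleware above is validating, here is the HS256 token anatomy in stdlib terms (claim names are illustrative):

```python
# Anatomy of an HS256 JWT: base64url(header).base64url(payload).base64url(signature)
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def make_hs256_token(claims: dict, secret: str) -> str:
    """Build the same three-part token jwt.encode(..., algorithm="HS256") produces."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}, separators=(",", ":")).encode())
    payload = b64url(json.dumps(claims, separators=(",", ":")).encode())
    signing_input = f"{header}.{payload}".encode()
    signature = b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{signature}"
```

The login route would call this with a claims dict like `{"sub": email, "exp": ...}` and hand the token to the client.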
4. Create landing page (Jinja2 + htmx)
   - Marketing copy
   - Feature highlights
   - Pricing section
   - Sign up CTA

5. Add dashboard serving route
   - Protected by JWT middleware
   - Serves Evidence `build/` directory

**Deliverable:** Authenticated web app serving Evidence dashboards

## Phase 3: Coffee Market Dashboards

### Dashboard Ideas

1. **Global Coffee Production Overview**
   - Top producing countries (Brazil, Vietnam, Colombia, Ethiopia, Honduras)
   - Arabica vs Robusta production split
   - Year-over-year production changes
   - Production volatility trends

2. **Supply & Demand Balance**
   - Stock-to-use ratios by country
   - Export/import flows (trade network visualization)
   - Consumption trends by region
   - Inventory levels (ending stocks)

3. **Market Volatility**
   - Production volatility (weather impacts, climate change signals)
   - Trade flow disruptions (sudden changes in export patterns)
   - Stock drawdown alerts (countries depleting reserves)

4. **Historical Trends**
   - 10-year production trends by country
   - Market share shifts (which countries are gaining/losing)
   - Climate impact signals (correlation with weather events)
   - Long-term supply/demand balance

5. **Trade Flow Analysis**
   - Top exporters → top importers (Sankey diagram if possible)
   - Net trade position by country
   - Import dependency ratios
   - Trade balance trends

### Data Requirements

- Filter PSD data for coffee commodity codes
- May need new serving layer models:
  - `fct_coffee_trade_flows` - Origin/destination trade flows
  - `dim_coffee_varieties` - Arabica vs Robusta (if data available)
  - `agg_coffee_regional_summary` - Regional aggregates

**Deliverable:** Production-ready coffee analytics dashboards

## Phase 4: Deployment & Automation

### Evidence Build Trigger

Rebuild Evidence dashboards after SQLMesh updates data:

```python
# In SQLMesh post-hook or separate script
import subprocess


def rebuild_dashboards():
    # Export fresh data from Iceberg to a local DuckDB file.
    # Note: COPY ... TO cannot write a .duckdb file, so attach the
    # target database and create the table inside it.
    subprocess.run([
        "duckdb", "-c",
        "ATTACH 'iceberg_catalog' AS iceberg; "
        "ATTACH 'dashboards/data/coffee_data.duckdb' AS coffee_db; "
        "CREATE OR REPLACE TABLE coffee_db.coffee_metrics AS "
        "SELECT * FROM iceberg.serving.obt_commodity_metrics "
        "WHERE commodity_name ILIKE '%coffee%';"
    ], check=True)

    # Rebuild Evidence
    subprocess.run(["npm", "run", "build"], cwd="dashboards", check=True)

    # Optional: Restart Robyn to pick up new files
    # (or use file watching in development)
```

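To avoid rebuilding Evidence when nothing changed, the trigger could compare the export's mtime against a marker touched on the last successful build. A minimal sketch (paths and the function name are assumptions):

```python
# Skip the Evidence rebuild when the exported data hasn't changed
import os


def needs_rebuild(data_file: str, build_marker: str) -> bool:
    """True if the exported DuckDB file is newer than the last build marker."""
    if not os.path.exists(build_marker):
        return True  # never built
    if not os.path.exists(data_file):
        return False  # nothing to build from
    return os.path.getmtime(data_file) > os.path.getmtime(build_marker)
```

`rebuild_dashboards()` would then touch `build_marker` after `npm run build` succeeds.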
**Trigger:** Run after SQLMesh `plan prod` completes successfully

### Deployment Strategy

- **Robyn app:** Deploy to supervisor instance or dedicated worker
- **Evidence builds:** Built on deploy (run `npm run build` in CI/CD)
- **DuckDB file:** Exported from Iceberg during deployment

**Deployment flow:**
```
GitLab master push
        ↓
CI/CD: Export coffee data from Iceberg → DuckDB
        ↓
CI/CD: Build Evidence dashboards (npm run build)
        ↓
Deploy Robyn app + Evidence build/ to supervisor/worker
        ↓
Robyn serves landing page + authenticated dashboards
```

**Deliverable:** Automated pipeline: SQLMesh → Export → Evidence Rebuild → Deployment

## Alternative Architecture: nginx + FastCGI C

### Evaluation

**Current plan:** Robyn (Python web framework)
**Alternative:** nginx + FastCGI C + kcgi library

### How It Would Work

```
nginx (static files + Evidence dashboards)
        ↓
FastCGI C programs (auth, user management, Stripe webhooks)
        ↓
SQLite (user database)
```

### Authentication Options

**Option 1: nginx JWT Module**
- Use open-source JWT module (`kjdev/nginx-auth-jwt`)
- nginx validates JWT before passing to FastCGI
- FastCGI receives `REMOTE_USER` variable
- **Complexity:** Medium (compile nginx with the module)

**Option 2: FastCGI C Auth Service**
- Separate FastCGI program validates JWT
- nginx uses `auth_request` directive
- Auth service returns 200 (valid) or 401 (invalid)
- **Complexity:** Medium (needs the `libjwt` library)

**Option 3: FastCGI Handles Everything**
- Main FastCGI program validates JWT inline
- Uses `libjwt` for token parsing
- **Complexity:** Medium (the simplest of the three architectures)

### Required C Libraries

- **FastCGI:** `kcgi` (modern, secure CGI/FastCGI library)
- **JWT:** `libjwt` (JWT creation/validation)
- **HTTP client:** `libcurl` (for Stripe API calls)
- **JSON:** `json-c` or `cJSON` (parsing Stripe webhook payloads)
- **Database:** `libsqlite3` (user storage)
- **Templating:** Manual string building (no C equivalent to Jinja2)

### Payment Integration

**Challenge:** No official Stripe C library

**Solutions:**

1. **Webhook-based approach (RECOMMENDED)**
   - Frontend uses Stripe.js (client-side checkout)
   - Stripe sends webhook to FastCGI endpoint
   - C program verifies webhook signature (HMAC-SHA256)
   - Updates user database (subscription status)
   - **Complexity:** Medium (simpler than full API integration)

2. **Direct API calls with libcurl**
   - Make HTTP POST to Stripe API
   - Build JSON payloads manually
   - Parse JSON responses with `json-c`
   - **Complexity:** High (manual HTTP/JSON handling)

### Development Time Estimate

| Task | Robyn (Python) | FastCGI (C) |
|------|----------------|-------------|
| Basic auth | 2-3 days | 5-7 days |
| Payment integration | 3-5 days | 7-10 days |
| Template rendering | 1-2 days | 5-7 days |
| Debugging/testing | 1-2 days | 3-5 days |
| **Total POC** | **1-2 weeks** | **3-4 weeks** |

### Performance Comparison

**Robyn (Python):** ~1,000-5,000 req/sec
**nginx + FastCGI C:** ~10,000-50,000 req/sec

**Reality check:** For beanflows.coffee with <1,000 users, even 100 req/sec is plenty.

### Pros & Cons

**Pros of C approach:**
- 10-50x faster than Python
- Lower memory footprint (~5-10MB vs 50-100MB)
- Simpler deployment (compiled binary + nginx config)
- More direct, no framework magic
- Data-oriented, performance-first design

**Cons of C approach:**
- 2-3x longer development time
- More complex debugging (no interactive REPL)
- Manual memory management (potential for leaks/bugs)
- No templating library (build HTML with sprintf/snprintf)
- Stripe integration requires manual HTTP/JSON handling
- Steeper learning curve for team members

### Recommendation

**Start with Robyn, plan a migration path to C:**

**Phase 1 (Now):** Build with Robyn
- Fast development (1-2 weeks to POC)
- Prove product-market fit
- Get paying customers
- Measure actual performance needs

**Phase 2 (After launch):** Evaluate performance
- Monitor Robyn performance under real load
- If Robyn handles <1,000 users easily → stay with it
- If hitting bottlenecks → profile to find hot paths

**Phase 3 (Optional, if needed):** Incremental C migration
- Rewrite hot paths only (e.g., auth service)
- Keep Evidence dashboards static (nginx serves them directly)
- Hybrid architecture: nginx → C (auth) → Robyn (business logic)

### Hybrid Architecture (Best of Both Worlds)

```
nginx
  ↓
  ├─> Static files (Evidence dashboards)   [nginx serves directly]
  ├─> Auth endpoints (/login, /signup)     [FastCGI C - future optimization]
  └─> Business logic (/api/*, /webhooks)   [Robyn - for flexibility]
```

**When to migrate:**
- When Robyn becomes a measurable bottleneck (>80% CPU under normal load)
- When response times exceed targets (>100ms p95)
- When memory usage becomes a concern (>500MB for a simple app)

**Philosophy:** Measure first, optimize second. A data-oriented approach means we don't guess about performance; we measure and optimize only when needed.

## Implementation Order

1. **Week 1:** Evidence POC + local DuckDB export
   - Create Evidence project
   - Export coffee data from Iceberg
   - Build simple production dashboard
   - Validate local dev workflow

2. **Week 2:** Robyn app + basic auth + Evidence embedding
   - Set up Robyn project
   - SQLite user database
   - JWT authentication
   - Landing page (Jinja2 + htmx)
   - Serve Evidence dashboards at `/dashboards/*`

3. **Week 3:** Coffee-specific dashboards + Stripe
   - Build 3-4 core coffee dashboards
   - Integrate Stripe checkout
   - Webhook handling for subscriptions
   - Basic user account page

4. **Week 4:** Automated rebuild pipeline + deployment
   - Automate Evidence rebuild after SQLMesh runs
   - CI/CD pipeline for deployment
   - Deploy to supervisor or dedicated worker
   - Monitoring and analytics

## Open Questions

1. **Hosted auth:** Evaluate Clerk vs Auth0 vs rolling our own
   - Clerk: $25/mo for 1,000 MAU, nice DX
   - Auth0: Free tier up to 7,500 MAU, more enterprise-oriented
   - Roll our own: $0, full control, more code
   - **Decision:** Start with roll-our-own JWT (simplest), migrate to a hosted service if auth becomes complex

2. **DuckDB sync:** How often to export from Iceberg?
   - Option A: Daily (after SQLMesh runs)
   - Option B: After every SQLMesh plan
   - **Decision:** Daily for now, automate after SQLMesh completion in production

3. **Evidence build time:** If builds are slow, we need a caching strategy
   - Monitor build times in Phase 1
   - If >60s, investigate Evidence cache options
   - May need incremental builds

4. **Multi-commodity future:** How to expand beyond coffee?
   - Code structure should be generic (parameterize the commodity filter)
   - Could launch cocoa.flows, wheat.supply, etc.
   - Evidence supports parameterized pages (easy to expand)

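Parameterizing the commodity filter could be as small as generating the export statements from a commodity name. A sketch, reusing the table names from the Phase 1 export model (the builder function is ours; the ATTACH/CREATE TABLE form is how DuckDB writes into a `.duckdb` file):

```python
# Commodity-generic variant of the Phase 1 export model
def build_export_sql(commodity: str, target_db: str) -> str:
    """Generate attach-and-export SQL for one commodity."""
    safe = commodity.replace("'", "''")  # naive escaping for the SQL literal
    return (
        f"ATTACH '{target_db}' AS export_db; "
        f"CREATE OR REPLACE TABLE export_db.commodity_metrics AS "
        f"SELECT * FROM serving.obt_commodity_metrics "
        f"WHERE commodity_name ILIKE '%{safe}%';"
    )
```

Each vertical (coffee, cocoa, wheat) would then differ only in the commodity string and target database path passed to the export step.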
5. **C migration decision point:** What metrics trigger a rewrite?
   - CPU >80% sustained under normal load
   - Response times >100ms p95
   - Memory >500MB for a simple app
   - User complaints about slowness

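The >100ms p95 threshold implies collecting response-time samples and computing percentiles. A minimal sketch using the nearest-rank method (helper name is ours):

```python
# Nearest-rank percentile over collected response-time samples
def percentile(samples: list, pct: float) -> float:
    """Return the pct-th percentile (0 < pct <= 100) of samples, nearest-rank."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # nearest-rank: ceil(pct/100 * n), converted to a 0-based index
    rank = max(1, -(-len(ordered) * pct // 100))  # ceiling division
    return ordered[int(rank) - 1]
```

In production, `statistics.quantiles` or the monitoring stack's own percentile aggregation would do this; the point is only that the migration triggers above are cheap to compute from raw samples.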
## Success Metrics

**Phase 1 (POC):**
- Evidence site builds successfully
- Coffee data loads from DuckDB (<2s)
- One dashboard renders with real data
- Local dev server runs without errors

**Phase 2 (MVP):**
- Robyn app runs and serves Evidence dashboards
- JWT auth works (login/signup flow)
- Landing page loads in <2s
- Dashboard access restricted to authenticated users

**Phase 3 (Launch):**
- Stripe integration works (test payment succeeds)
- 3-4 coffee dashboards functional
- Automated deployment pipeline working
- Monitoring in place (uptime, errors, performance)

**Phase 4 (Growth):**
- User signups (track conversion rate)
- Active subscribers (MRR growth)
- Dashboard usage (which insights are most valuable)
- Performance metrics (response times, error rates)

## Cost Analysis

**Current costs (data pipeline):**
- Supervisor: €4.49/mo (Hetzner CPX11)
- Workers: €0.01-0.05/day (ephemeral)
- R2 storage: ~€0.10/mo (Iceberg catalog)
- **Total: ~€5/mo**

**Additional costs (SaaS frontend):**
- Domain: €10/year (beanflows.coffee)
- Robyn hosting: €0 (runs on supervisor) or a dedicated worker at €4.49/mo
- Stripe fees: 2.9% + €0.30 per transaction
- **Total: ~€5-10/mo base cost**

**Scaling costs:**
- If a dedicated worker is needed for Robyn: +€4.49/mo
- If migrating to C: no additional cost (same infrastructure)
- Stripe fees scale with revenue (a good problem to have)

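The Stripe fee line implies a net payout of price × (1 − 0.029) − €0.30 per charge, which matters when picking a subscription price. A quick sketch (fee rates copied from the cost table above; actual rates vary by card type and region):

```python
# Net payout per charge after Stripe's percentage + fixed fee
def stripe_net(amount_eur: float, pct: float = 0.029, fixed_eur: float = 0.30) -> float:
    """Amount received after Stripe's 2.9% + €0.30 fee, rounded to cents."""
    return round(amount_eur * (1 - pct) - fixed_eur, 2)
```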
## Next Steps (When Ready)

1. Create `dashboards/` directory and initialize Evidence.dev
2. Create SQLMesh export model for coffee data
3. Build simple coffee production dashboard
4. Set up Robyn project structure
5. Implement basic JWT auth
6. Integrate Evidence dashboards into Robyn

**Decision point:** After the Phase 1 POC, re-evaluate C migration based on Evidence.dev capabilities and development experience.

## References

- Evidence.dev: https://docs.evidence.dev/
- Robyn: https://github.com/sparckles/robyn
- kcgi (C CGI library): https://kristaps.bsd.lv/kcgi/
- libjwt: https://github.com/benmcollins/libjwt
- nginx auth_request: https://nginx.org/en/docs/http/ngx_http_auth_request_module.html
- Stripe webhooks: https://stripe.com/docs/webhooks