# SaaS Frontend Architecture Plan: beanflows.coffee

**Date**: 2025-10-21
**Status**: Planning
**Product**: beanflows.coffee - Coffee market analytics platform

## Project Vision

**beanflows.coffee** - A specialized coffee market analytics platform built on USDA PSD data, providing traders, roasters, and market analysts with actionable insights into global coffee production, trade flows, and supply chain dynamics.

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────┐
│          Robyn Web App (beanflows.coffee)                   │
│                                                             │
│  Landing Page (Jinja2 + htmx) ─┬─> Auth (JWT + SQLite)      │
│                                └─> /dashboards/* routes     │
│                                        │                    │
│                                        ▼                    │
│                              Serve Evidence /build/         │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
                  ┌──────────────────────────┐
                  │ Evidence.dev Dashboards  │
                  │ (coffee market focus)    │
                  │                          │
                  │ Queries: Local DuckDB ←──┼─── Export from Iceberg
                  │ Builds: On data updates  │
                  └──────────────────────────┘
```
## Technical Decisions

### Data Flow
- **Source:** Iceberg catalog (R2)
- **Export:** Local DuckDB file for Evidence dashboards
- **Trigger:** Rebuild Evidence after SQLMesh updates data
- **Serving:** Robyn serves Evidence static build output

### Auth System
- **User data:** SQLite database
- **Auth method:** JWT tokens (Robyn built-in support)
- **Consideration:** Evaluate hosted auth services (Clerk, Auth0)
- **POC approach:** Simple email/password with JWT

### Payments
- **Provider:** Stripe
- **Integration:** Webhook-based (Stripe.js on client, webhooks to Robyn)
- **Rationale:** Simplest integration, no need for complex server-side API calls
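
With a webhook-based integration, signature verification is the main server-side responsibility. Stripe's `v1` scheme is an HMAC-SHA256 over `"{timestamp}.{raw_body}"` keyed by the endpoint secret; a stdlib sketch (the handler wiring and secret handling around it are assumptions):

```python
import hashlib
import hmac
import time

def verify_stripe_signature(payload: bytes, sig_header: str, secret: str,
                            tolerance: int = 300) -> bool:
    """Verify a Stripe-Signature header (v1 scheme, HMAC-SHA256)."""
    # Header looks like: "t=1700000000,v1=abcdef..."
    pairs = dict(item.split("=", 1) for item in sig_header.split(","))
    timestamp = pairs.get("t", "0")
    expected = hmac.new(
        secret.encode(),
        f"{timestamp}.".encode() + payload,
        hashlib.sha256,
    ).hexdigest()
    if not hmac.compare_digest(expected, pairs.get("v1", "")):
        return False
    # Reject replayed events outside the tolerance window
    return abs(time.time() - int(timestamp)) <= tolerance
```

In production the official `stripe` Python package provides this check via its webhook helper; the sketch shows what it does underneath.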

### Project Structure
```
materia/
├── web/                      # NEW: Robyn web application
│   ├── app.py                # Robyn entry point
│   ├── routes/
│   │   ├── landing.py        # Marketing page
│   │   ├── auth.py           # Login/signup (JWT)
│   │   └── dashboards.py     # Serve Evidence /build/
│   ├── templates/            # Jinja2 + htmx
│   │   ├── base.html
│   │   ├── landing.html
│   │   └── login.html
│   ├── middleware/
│   │   └── auth.py           # JWT verification
│   ├── models.py             # SQLite schema (users table)
│   └── static/               # CSS, htmx.js
├── dashboards/               # NEW: Evidence.dev project
│   ├── pages/                # Dashboard markdown files
│   │   ├── index.md          # Global coffee overview
│   │   ├── production.md     # Production trends
│   │   ├── trade.md          # Trade flows
│   │   └── supply.md         # Supply/demand balance
│   ├── sources/              # Data source configs
│   ├── data/                 # Local DuckDB exports
│   │   └── coffee_data.duckdb
│   └── package.json
```

## How It Works: Robyn + Evidence Integration

### 1. Evidence Build Process
```bash
cd dashboards
npm run build
# Outputs static HTML/JS/CSS to dashboards/build/
```

### 2. Robyn Serves Evidence Output
```python
# web/routes/dashboards.py
from pathlib import Path

BUILD_DIR = Path("dashboards/build").resolve()

@app.get("/dashboards/*")
@requires_jwt  # Custom middleware; redirects to /login if the JWT is missing/invalid
def serve_dashboard(request):
    # Strip /dashboards/ prefix
    path = request.path.removeprefix("/dashboards/") or "index.html"

    # Serve from the Evidence build directory, refusing ../ escapes
    file_path = (BUILD_DIR / path).resolve()
    if not file_path.is_relative_to(BUILD_DIR) or not file_path.exists():
        file_path = BUILD_DIR / "index.html"

    return FileResponse(file_path)
```

### 3. User Flow
1. User visits `beanflows.coffee` (landing page)
2. User signs up / logs in (Robyn auth system)
3. Stripe checkout for subscription (using Stripe.js)
4. User navigates to `beanflows.coffee/dashboards/`
5. Robyn checks JWT authentication
6. If authenticated: serves Evidence static files
7. If not: redirects to login

## Phase 1: Evidence.dev POC

**Goal:** Get Evidence working with coffee data

### Tasks
1. Create Evidence project in `dashboards/`
   ```bash
   mkdir dashboards && cd dashboards
   npm init evidence@latest .
   ```

2. Create SQLMesh export model for coffee data
   ```sql
   -- models/exports/export_coffee_analytics.sql
   -- Note: DuckDB's COPY ... TO writes CSV/Parquet/JSON, not .duckdb files.
   -- To produce a database file, attach it and materialize a table instead:
   ATTACH 'dashboards/data/coffee_data.duckdb' AS coffee;
   CREATE OR REPLACE TABLE coffee.obt_commodity_metrics AS
   SELECT * FROM serving.obt_commodity_metrics
   WHERE commodity_name ILIKE '%coffee%';
   ```

3. Build simple coffee production dashboard
   - Single dashboard showing coffee production trends
   - Test Evidence build process
   - Validate DuckDB query performance

4. Test local Evidence dev server
   ```bash
   npm run dev
   ```

**Deliverable:** Working Evidence dashboard querying local DuckDB
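
For orientation, an Evidence page is Markdown with named SQL blocks feeding chart components. A sketch of what `pages/production.md` might look like — the source name and column names (`country_name`, `market_year`, `value`, `attribute_name`) are assumptions about the exported table:

````markdown
# Coffee Production Trends

```sql production_by_year
select country_name, market_year, sum(value) as production
from coffee_data.obt_commodity_metrics
where attribute_name = 'Production'
group by country_name, market_year
order by market_year
```

<LineChart
  data={production_by_year}
  x=market_year
  y=production
  series=country_name
/>
````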

## Phase 2: Robyn Web App

### Tasks

1. Set up Robyn project in `web/`
   ```bash
   mkdir web && cd web
   uv add robyn jinja2
   ```

2. Implement SQLite user database
   ```python
   # web/models.py
   import sqlite3

   def init_db():
       conn = sqlite3.connect('users.db')
       conn.execute('''
           CREATE TABLE IF NOT EXISTS users (
               id INTEGER PRIMARY KEY,
               email TEXT UNIQUE NOT NULL,
               password_hash TEXT NOT NULL,
               stripe_customer_id TEXT,
               subscription_status TEXT,
               created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
           )
       ''')
       conn.commit()
       conn.close()
   ```
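
The `password_hash` column implies a key-derivation choice. A stdlib sketch using `hashlib.scrypt` — the cost parameters and `salt$hash` storage format are illustrative, not a decided standard:

```python
import hashlib
import hmac
import secrets

def hash_password(password: str) -> str:
    """Derive a salted scrypt hash; stored as 'salt_hex$digest_hex'."""
    salt = secrets.token_bytes(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt.hex() + "$" + digest.hex()

def verify_password(password: str, stored: str) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$", 1)
    digest = hashlib.scrypt(password.encode(), salt=bytes.fromhex(salt_hex),
                            n=2**14, r=8, p=1)
    return hmac.compare_digest(digest.hex(), digest_hex)
```

A dedicated library (argon2, bcrypt) would also work; the point is to never store plaintext or unsalted hashes.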

3. Add JWT authentication
   ```python
   # web/middleware/auth.py
   import functools
   import os

   import jwt  # PyJWT
   from robyn import Request

   # Env var name is illustrative; redirect() is a small helper
   # returning a redirect Response to /login (defined elsewhere).
   SECRET_KEY = os.environ["JWT_SECRET"]

   def requires_jwt(func):
       @functools.wraps(func)
       def wrapper(request: Request):
           token = request.headers.get("Authorization")
           if not token:
               return redirect("/login")

           try:
               payload = jwt.decode(
                   token.removeprefix("Bearer "),
                   SECRET_KEY,
                   algorithms=["HS256"],
               )
               request.user = payload
               return func(request)
           except jwt.InvalidTokenError:
               return redirect("/login")

       return wrapper
   ```

4. Create landing page (Jinja2 + htmx)
   - Marketing copy
   - Feature highlights
   - Pricing section
   - Sign up CTA

5. Add dashboard serving route
   - Protected by JWT middleware
   - Serves Evidence `build/` directory

**Deliverable:** Authenticated web app serving Evidence dashboards

## Phase 3: Coffee Market Dashboards

### Dashboard Ideas

1. **Global Coffee Production Overview**
   - Top producing countries (Brazil, Vietnam, Colombia, Ethiopia, Honduras)
   - Arabica vs Robusta production split
   - Year-over-year production changes
   - Production volatility trends

2. **Supply & Demand Balance**
   - Stock-to-use ratios by country
   - Export/import flows (trade network visualization)
   - Consumption trends by region
   - Inventory levels (ending stocks)

3. **Market Volatility**
   - Production volatility (weather impacts, climate change signals)
   - Trade flow disruptions (sudden changes in export patterns)
   - Stock drawdown alerts (countries depleting reserves)

4. **Historical Trends**
   - 10-year production trends by country
   - Market share shifts (which countries gaining/losing)
   - Climate impact signals (correlation with weather events)
   - Long-term supply/demand balance

5. **Trade Flow Analysis**
   - Top exporters → top importers (Sankey diagram if possible)
   - Net trade position by country
   - Import dependency ratios
   - Trade balance trends
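
Two of the metrics above are simple ratios once the PSD attributes are in hand. An illustrative Python sketch — the numbers are made up, not real PSD values:

```python
def stock_to_use(ending_stocks: float, consumption: float) -> float:
    """Ending stocks as a fraction of annual consumption; lower = tighter market."""
    return ending_stocks / consumption

def yoy_change(series: list[float]) -> list[float]:
    """Fractional year-over-year change for each consecutive pair of years."""
    return [(curr - prev) / prev for prev, curr in zip(series, series[1:])]

# Illustrative figures (thousand 60-kg bags), not real PSD data
ratio = stock_to_use(ending_stocks=2_000, consumption=23_000)
changes = yoy_change([60_000, 66_000, 59_400])
```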

### Data Requirements

- Filter PSD data for coffee commodity codes
- May need new serving layer models:
  - `fct_coffee_trade_flows` - Origin/destination trade flows
  - `dim_coffee_varieties` - Arabica vs Robusta (if data available)
  - `agg_coffee_regional_summary` - Regional aggregates
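
As a sketch of what one such model could look like — the `attribute_name`/`value` columns are assumptions about the OBT layout, and net trade here is country-level (PSD data may not carry true origin/destination pairs):

```sql
-- Sketch: net trade position per country and year
SELECT
    country_name,
    market_year,
    SUM(CASE WHEN attribute_name = 'Exports' THEN value ELSE 0 END)
  - SUM(CASE WHEN attribute_name = 'Imports' THEN value ELSE 0 END)
      AS net_trade
FROM serving.obt_commodity_metrics
WHERE commodity_name ILIKE '%coffee%'
GROUP BY country_name, market_year;
```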
**Deliverable:** Production-ready coffee analytics dashboards

## Phase 4: Deployment & Automation

### Evidence Build Trigger

Rebuild Evidence dashboards after SQLMesh updates data:

```python
# In SQLMesh post-hook or separate script
import subprocess

def rebuild_dashboards():
    # Export fresh data from Iceberg to a local DuckDB file.
    # (COPY ... TO cannot write .duckdb files, so attach and materialize.)
    subprocess.run([
        "duckdb", "-c",
        "ATTACH 'iceberg_catalog' AS iceberg; "
        "ATTACH 'dashboards/data/coffee_data.duckdb' AS coffee; "
        "CREATE OR REPLACE TABLE coffee.obt_commodity_metrics AS "
        "SELECT * FROM iceberg.serving.obt_commodity_metrics "
        "WHERE commodity_name ILIKE '%coffee%';"
    ], check=True)

    # Rebuild Evidence
    subprocess.run(["npm", "run", "build"], cwd="dashboards", check=True)

    # Optional: restart Robyn to pick up new files
    # (or use file watching in development)
```

**Trigger:** Run after SQLMesh `plan prod` completes successfully

### Deployment Strategy

- **Robyn app:** Deploy to supervisor instance or dedicated worker
- **Evidence builds:** Built on deploy (run `npm run build` in CI/CD)
- **DuckDB file:** Exported from Iceberg during deployment

**Deployment flow:**
```
GitLab master push
        ↓
CI/CD: Export coffee data from Iceberg → DuckDB
        ↓
CI/CD: Build Evidence dashboards (npm run build)
        ↓
Deploy Robyn app + Evidence build/ to supervisor/worker
        ↓
Robyn serves landing page + authenticated dashboards
```
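
The flow above could be expressed as a GitLab CI pipeline. A sketch only — the job names, images, and script paths are all assumptions:

```yaml
# .gitlab-ci.yml (sketch)
stages: [export, build, deploy]

export_coffee_data:
  stage: export
  image: python:3.12-slim
  script:
    - pip install duckdb
    - python scripts/export_coffee_data.py   # Iceberg -> dashboards/data/
  artifacts:
    paths: [dashboards/data/]

build_dashboards:
  stage: build
  image: node:20
  script:
    - cd dashboards && npm ci && npm run build
  artifacts:
    paths: [dashboards/build/]

deploy:
  stage: deploy
  script:
    - ./scripts/deploy.sh   # e.g. rsync app + build/ to the supervisor host
  only: [master]
```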
**Deliverable:** Automated pipeline: SQLMesh → Export → Evidence Rebuild → Deployment

## Alternative Architecture: nginx + FastCGI C

### Evaluation

**Current plan:** Robyn (Python web framework)
**Alternative:** nginx + FastCGI C + kcgi library

### How It Would Work

```
nginx (static files + Evidence dashboards)
        ↓
FastCGI C programs (auth, user management, Stripe webhooks)
        ↓
SQLite (user database)
```

### Authentication Options

**Option 1: nginx JWT Module**
- Use open-source JWT module (`kjdev/nginx-auth-jwt`)
- nginx validates JWT before passing to FastCGI
- FastCGI receives `REMOTE_USER` variable
- **Complexity:** Medium (compile nginx with module)

**Option 2: FastCGI C Auth Service**
- Separate FastCGI program validates JWT
- nginx uses `auth_request` directive
- Auth service returns 200 (valid) or 401 (invalid)
- **Complexity:** Medium (need `libjwt` library)

**Option 3: FastCGI Handles Everything**
- Main FastCGI program validates JWT inline
- Uses `libjwt` for token parsing
- **Complexity:** Medium (simplest architecture)

### Required C Libraries

- **FastCGI:** `kcgi` (modern, secure CGI/FastCGI library)
- **JWT:** `libjwt` (JWT creation/validation)
- **HTTP client:** `libcurl` (for Stripe API calls)
- **JSON:** `json-c` or `cjson` (parsing Stripe webhook payloads)
- **Database:** `libsqlite3` (user storage)
- **Templating:** Manual string building (no C equivalent to Jinja2)

### Payment Integration

**Challenge:** No official Stripe C library

**Solutions:**

1. **Webhook-based approach (RECOMMENDED)**
   - Frontend uses Stripe.js (client-side checkout)
   - Stripe sends webhook to FastCGI endpoint
   - C program verifies webhook signature (HMAC-SHA256)
   - Updates user database (subscription status)
   - **Complexity:** Medium (simpler than full API integration)

2. **Direct API calls with libcurl**
   - Make HTTP POST to Stripe API
   - Build JSON payloads manually
   - Parse JSON responses with `json-c`
   - **Complexity:** High (manual HTTP/JSON handling)

### Development Time Estimate

| Task | Robyn (Python) | FastCGI (C) |
|------|----------------|-------------|
| Basic auth | 2-3 days | 5-7 days |
| Payment integration | 3-5 days | 7-10 days |
| Template rendering | 1-2 days | 5-7 days |
| Debugging/testing | 1-2 days | 3-5 days |
| **Total POC** | **1-2 weeks** | **3-4 weeks** |

### Performance Comparison

**Robyn (Python):** ~1,000-5,000 req/sec
**nginx + FastCGI C:** ~10,000-50,000 req/sec

**Reality check:** For beanflows.coffee with <1,000 users, even 100 req/sec is plenty.

### Pros & Cons

**Pros of C approach:**
- 10-50x faster than Python
- Lower memory footprint (~5-10MB vs 50-100MB)
- Simpler deployment (compiled binary + nginx config)
- More direct, no framework magic
- Data-oriented, performance-first design

**Cons of C approach:**
- 2-3x longer development time
- More complex debugging (no interactive REPL)
- Manual memory management (potential for leaks/bugs)
- No templating library (build HTML with sprintf/snprintf)
- Stripe integration requires manual HTTP/JSON handling
- Steeper learning curve for team members

### Recommendation

**Start with Robyn, plan migration path to C:**

**Phase 1 (Now):** Build with Robyn
- Fast development (1-2 weeks to POC)
- Prove product-market fit
- Get paying customers
- Measure actual performance needs

**Phase 2 (After launch):** Evaluate performance
- Monitor Robyn performance under real load
- If Robyn handles <1,000 users easily → stay with it
- If hitting bottlenecks → profile to find hot paths

**Phase 3 (Optional, if needed):** Incremental C migration
- Rewrite hot paths only (e.g., auth service)
- Keep Evidence dashboards static (nginx serves directly)
- Hybrid architecture: nginx → C (auth) → Robyn (business logic)
### Hybrid Architecture (Best of Both Worlds)
|
|
|
|
```
|
|
nginx
|
|
↓
|
|
├─> Static files (Evidence dashboards) [nginx serves directly]
|
|
├─> Auth endpoints (/login, /signup) [FastCGI C - future optimization]
|
|
└─> Business logic (/api/*, /webhooks) [Robyn - for flexibility]
|
|
```
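
A sketch of the corresponding nginx config — paths, ports, and the auth upstream are assumptions; the auth check uses the `auth_request` directive discussed above:

```nginx
server {
    listen 443 ssl;
    server_name beanflows.coffee;

    # Evidence static build, gated by the auth subrequest
    location /dashboards/ {
        auth_request /auth;
        alias /srv/beanflows/dashboards/build/;
        try_files $uri $uri/ /dashboards/index.html;
    }

    # Internal auth check (FastCGI C service later; Robyn during Phase 1)
    location = /auth {
        internal;
        proxy_pass http://127.0.0.1:8081/verify;
        proxy_pass_request_body off;
        proxy_set_header Content-Length "";
    }

    # Everything else: Robyn business logic
    location / {
        proxy_pass http://127.0.0.1:8080;
    }
}
```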

**When to migrate:**
- When Robyn becomes a measurable bottleneck (>80% CPU under normal load)
- When response times exceed targets (>100ms p95)
- When memory usage becomes a concern (>500MB for a simple app)

**Philosophy:** Measure first, optimize second. A data-oriented approach means we don't guess about performance; we measure and optimize only when needed.

## Implementation Order

1. **Week 1:** Evidence POC + local DuckDB export
   - Create Evidence project
   - Export coffee data from Iceberg
   - Build simple production dashboard
   - Validate local dev workflow

2. **Week 2:** Robyn app + basic auth + Evidence embedding
   - Set up Robyn project
   - SQLite user database
   - JWT authentication
   - Landing page (Jinja2 + htmx)
   - Serve Evidence dashboards at `/dashboards/*`

3. **Week 3:** Coffee-specific dashboards + Stripe
   - Build 3-4 core coffee dashboards
   - Integrate Stripe checkout
   - Webhook handling for subscriptions
   - Basic user account page

4. **Week 4:** Automated rebuild pipeline + deployment
   - Automate Evidence rebuild after SQLMesh runs
   - CI/CD pipeline for deployment
   - Deploy to supervisor or dedicated worker
   - Monitoring and analytics

## Open Questions

1. **Hosted auth:** Evaluate Clerk vs Auth0 vs rolling our own
   - Clerk: $25/mo for 1,000 MAU, nice DX
   - Auth0: Free tier 7,500 MAU, more enterprise
   - Roll our own: $0, full control, more code
   - **Decision:** Start with roll-our-own JWT (simplest); migrate to hosted if auth becomes complex

2. **DuckDB sync:** How often to export from Iceberg?
   - Option A: Daily (after SQLMesh runs)
   - Option B: After every SQLMesh plan
   - **Decision:** Daily for now; automate after SQLMesh completion in production

3. **Evidence build time:** If builds are slow, we need a caching strategy
   - Monitor build times in Phase 1
   - If >60s, investigate Evidence cache options
   - May need incremental builds

4. **Multi-commodity future:** How to expand beyond coffee?
   - Code structure should be generic (parameterize the commodity filter)
   - Could launch cocoa.flows, wheat.supply, etc.
   - Evidence supports parameterized pages (easy to expand)

5. **C migration decision point:** What metrics trigger a rewrite?
   - CPU >80% sustained under normal load
   - Response times >100ms p95
   - Memory >500MB for a simple app
   - User complaints about slowness

## Success Metrics

**Phase 1 (POC):**
- Evidence site builds successfully
- Coffee data loads from DuckDB (<2s)
- One dashboard renders with real data
- Local dev server runs without errors

**Phase 2 (MVP):**
- Robyn app runs and serves Evidence dashboards
- JWT auth works (login/signup flow)
- Landing page loads in <2s
- Dashboard access restricted to authenticated users

**Phase 3 (Launch):**
- Stripe integration works (test payment succeeds)
- 3-4 coffee dashboards functional
- Automated deployment pipeline working
- Monitoring in place (uptime, errors, performance)

**Phase 4 (Growth):**
- User signups (track conversion rate)
- Active subscribers (MRR growth)
- Dashboard usage (which insights are most valuable)
- Performance metrics (response times, error rates)

## Cost Analysis

**Current costs (data pipeline):**
- Supervisor: €4.49/mo (Hetzner CPX11)
- Workers: €0.01-0.05/day (ephemeral)
- R2 Storage: ~€0.10/mo (Iceberg catalog)
- **Total: ~€5/mo**

**Additional costs (SaaS frontend):**
- Domain: €10/year (beanflows.coffee)
- Robyn hosting: €0 on the supervisor, or €4.49/mo on a dedicated worker
- Stripe fees: 2.9% + €0.30 per transaction
- **Total: ~€5-10/mo base cost**

**Scaling costs:**
- If a dedicated worker is needed for Robyn: +€4.49/mo
- If migrating to C: no additional cost (same infrastructure)
- Stripe fees scale with revenue (a good problem to have)

## Next Steps (When Ready)

1. Create `dashboards/` directory and initialize Evidence.dev
2. Create SQLMesh export model for coffee data
3. Build simple coffee production dashboard
4. Set up Robyn project structure
5. Implement basic JWT auth
6. Integrate Evidence dashboards into Robyn

**Decision point:** After the Phase 1 POC, re-evaluate C migration based on Evidence.dev capabilities and development experience.

## References

- Evidence.dev: https://docs.evidence.dev/
- Robyn: https://github.com/sparckles/robyn
- kcgi (C CGI library): https://kristaps.bsd.lv/kcgi/
- libjwt: https://github.com/benmcollins/libjwt
- nginx auth_request: https://nginx.org/en/docs/http/ngx_http_auth_request_module.html
- Stripe webhooks: https://stripe.com/docs/webhooks