add untracked

Deeman
2026-02-26 02:44:48 +01:00
parent 3629783bbf
commit 302ba07851
6 changed files with 2089 additions and 104 deletions

.gitignore (4 changes)

@@ -183,3 +183,7 @@ cython_debug/
data/
.claude/worktrees/
.bedrock-state
.bedrockapikey
toggle-bedrock.sh

File diff suppressed because it is too large


@@ -0,0 +1,639 @@
# BeanFlows — Strategic Analysis
> Coffee commodity intelligence platform: USDA fundamentals + CFTC positioning + AIS physical flows → single clean API for trading desks.
---
## 1. Jobs-to-Be-Done Analysis
### Primary Job Statement
```
When I need to form a view on the coffee market before committing capital,
I want to quickly see the full fundamental picture — supply, demand,
positioning, and physical flows — in one place I can trust,
so I can make high-conviction trading decisions faster than
the other side of my trade.
```
**Altitude check:** This is the right level. Not too abstract ("be a profitable trader") and not a task ("download the WASDE PDF"). This job exists independently of any product.
### The Three Job Layers
**Functional Job:**
> "Get clean, normalized, query-ready coffee fundamental data into my models within minutes of release — not hours of manual wrangling."
**Emotional Job:**
> "Feel confident that my market view is built on complete, accurate data — that I'm not missing a signal my competitor caught."
**Social Job:**
> "Be the analyst on the desk who always has the numbers ready first. Be seen as rigorous and well-sourced by portfolio managers and senior traders."
**Key insight for BeanFlows:** The emotional and social jobs here are enormous. Trading is a status game. The analyst who pulls up a clean, instant view of USDA revisions while a competitor is still reformatting spreadsheets *looks competent to their PM*. That feeling of preparedness and speed is worth paying for even when the underlying data is technically public. You're not selling data — you're selling the feeling of being the best-informed person in the room.
### Struggling Moments
**Struggling Moment 1 — The WASDE Drop**
```
A junior coffee analyst at a trading house was trying to update their
supply/demand model when the USDA released the monthly WASDE report,
causing 30-45 minutes of frantic copy-pasting and reformatting into Excel,
making them realize their manual pipeline was too slow to inform
the desk's immediate trading response.
```
**Struggling Moment 2 — The Position Puzzle**
```
A portfolio manager at a commodity hedge fund was trying to understand
whether speculative positioning in coffee had become crowded when the
weekly CFTC COT report came out in a different format than expected,
causing their Python parsing script to break and miss the signal,
making them realize stitching together CFTC + USDA + their own models
was a fragile, high-risk process.
```
**Struggling Moment 3 — The Invisible Cargo**
```
A physical coffee trader was trying to assess whether Brazilian exports
were running ahead or behind seasonal norms when conflicting port
reports and shipping data made the picture unclear, causing uncertainty
about whether to hedge their forward book, making them realize they
had no reliable, real-time view of actual physical flows.
```
**Struggling Moment 4 — The New Hire**
```
A newly hired analyst at a commodity fund was trying to get up to speed
on coffee market fundamentals when they discovered the desk's "data
infrastructure" was a folder of brittle scripts written by someone
who left 18 months ago, causing two weeks of reverse-engineering
instead of analysis, making them realize there was no institutional
data layer for coffee.
```
**Signal strength:** Struggling Moments 1 and 2 validate V1 (USDA + CFTC cleanup). Struggling Moment 3 validates the AIS roadmap. Struggling Moment 4 validates the "whole product" play — becoming the institutional data layer that survives employee turnover.
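Struggling Moment 2 hinges on a parser that breaks without warning when the COT layout shifts. Below is a minimal sketch of the opposite behavior. It assumes a CSV with hypothetical column names (`noncomm_long`, `noncomm_short`, `open_interest`); actual CFTC files use their own, longer headers. The parser checks the header before reading any rows and fails loudly if the header changed. Only then does it compute net speculative positioning as a share of open interest.

```python
import csv
import io

# Hypothetical column names; map them to whatever the actual CFTC file uses.
REQUIRED = ("report_date", "noncomm_long", "noncomm_short", "open_interest")


class SchemaError(ValueError):
    """Raised when the incoming file no longer matches the expected layout."""


def parse_cot(text: str) -> list:
    reader = csv.DictReader(io.StringIO(text))
    missing = [col for col in REQUIRED if col not in (reader.fieldnames or [])]
    if missing:
        # Fail loudly instead of silently producing an empty or shifted series.
        raise SchemaError(f"COT layout changed; missing columns: {missing}")
    rows = []
    for raw in reader:
        long_pos, short_pos = int(raw["noncomm_long"]), int(raw["noncomm_short"])
        open_interest = int(raw["open_interest"])
        rows.append({
            "report_date": raw["report_date"],
            "net_spec": long_pos - short_pos,
            # Net speculative position as a percentage of open interest.
            "net_spec_pct_oi": round(100 * (long_pos - short_pos) / open_interest, 2),
        })
    return rows
```

The design choice is to refuse to parse rather than guess. A broken feed that raises an error costs minutes. A silently wrong series can cost a position.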
### Four Forces of Switching
```
DRIVING SWITCH RESISTING SWITCH
┌─────────────────────────────┐ ┌─────────────────────────────┐
│ PUSH (current pain) │ │ ANXIETY │
│ │ │ │
│ • WASDE drops break my │ │ • "What if the data has an │
│ workflow every month │ │ error and I trade on it?"│
│ • CFTC data requires hours │ │ • "What if this startup │
│ of reformatting │ │ disappears in 6 months?" │
│ • Internal scripts are │ │ • "Can I trust a one-person │
│ fragile, undocumented │ │ shop with my models?" │
│ • No visibility on physical │ │ • "What if pricing changes │
│ flows without paying │ │ after we're locked in?" │
│ $100K+ for Kpler/Bloomberg│ │ │
│ │ ├─────────────────────────────┤
├─────────────────────────────┤ │ HABIT │
│ PULL (BeanFlows promise) │ │ │
│ │ │ • "I've already built my │
│ • One API call = complete │ │ own scripts for this" │
│ fundamental picture │ │ • "My Excel models reference│
│ • Data ready in minutes, │ │ specific file formats" │
│ not hours after release │ │ • "Bloomberg is expensive │
│ • AIS shipping data at a │ │ but it's the standard" │
│ fraction of Kpler's price │ │ • "Switching cost of re- │
│ • Coffee-specific models │ │ piping my entire data │
│ and normalization │ │ stack feels high" │
└─────────────────────────────┘ └─────────────────────────────┘
```
**Analysis: Push is strong, Pull is strong, but Anxiety is VERY high.**
This is the defining challenge of DaaS in trading. One bad data point in a model that drives a $5M position = catastrophic. Your go-to-market must center on anxiety reduction, not feature selling.
### Anxiety Reduction Playbook (Critical for BeanFlows)
| Anxiety | Mitigation | Priority |
|---------|-----------|----------|
| "Data might have errors" | Publish methodology docs. Show data lineage for every field. Offer a "compare to source" view so they can audit. Run automated quality checks and publish accuracy scores. | **P0 — must have at launch** |
| "Startup might disappear" | Offer annual billing with data export guarantees. Open-source the schema. Publish your roadmap. Be transparent about financials if possible. | P1 |
| "Can't trust a small shop" | Pilot program with refund guarantee. Named customer testimonials (even 1-2 early). Published SLAs for uptime and data freshness. | P1 |
| "Switching cost is high" | Offer multiple delivery formats (JSON, CSV, Parquet, direct DB connection). Build Excel add-in. Match Bloomberg field naming conventions where possible. | P2 |
**The single most important page on your website isn't pricing — it's your data methodology page.** Traders will read it. If it's thorough and transparent, they'll trust you. If it's missing, they won't.
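The "compare to source" view and the published accuracy scores both reduce to one routine: re-extract values from the source and diff them against what the API serves. Below is a minimal sketch. It assumes both sides are already loaded into dicts keyed by a record key. The function and field names are hypothetical, and the figures in any test data are placeholders. A real pipeline would also record which source file and release each value came from.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mismatch:
    key: tuple
    column: str
    source_value: float
    published_value: Optional[float]


def reconcile(source: dict, published: dict, tolerance: float = 1e-9):
    """Compare published values with values re-extracted from the source.

    Both arguments map a record key, e.g. (country, marketing_year), to a
    dict of column -> numeric value. Returns (accuracy_by_column, mismatches).
    """
    checked: dict = {}
    wrong: dict = {}
    mismatches = []
    for key, src_fields in source.items():
        pub_fields = published.get(key, {})
        for column, src_val in src_fields.items():
            checked[column] = checked.get(column, 0) + 1
            pub_val = pub_fields.get(column)
            # A missing value counts as an error, as does any numeric drift.
            if pub_val is None or abs(pub_val - src_val) > tolerance:
                wrong[column] = wrong.get(column, 0) + 1
                mismatches.append(Mismatch(key, column, src_val, pub_val))
    # Share of checked values per column that matched the source.
    accuracy = {c: 1 - wrong.get(c, 0) / n for c, n in checked.items()}
    return accuracy, mismatches
```

The per-column accuracy dict is what a public quality score would display. The mismatch list is what an internal alert would page on.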
### Habit Reduction Playbook
| Habit | Bridge Strategy |
|-------|----------------|
| "I have my own scripts" | Offer a migration guide: "Currently pulling WASDE manually? Here's how to replace your pipeline with one API call." Show the before/after. |
| "My models expect specific formats" | Support CSV, JSON, Parquet. Offer a "Bloomberg-compatible" field mapping. Let them request custom column naming. |
| "Bloomberg is the default" | Don't fight Bloomberg head-on. Position as complementary: "Bloomberg for broad markets, BeanFlows for coffee depth." Many desks already supplement Bloomberg. |
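Both playbooks rely on delivering the same data in whatever shape the desk's models already expect. Below is a minimal sketch using only the Python standard library. It renames columns through a per-customer mapping, then serializes the rows to CSV or JSON. The mapping and field names are hypothetical and are not real Bloomberg mnemonics. Parquet is left out because it would need a third-party library such as pyarrow.

```python
import csv
import io
import json

# Hypothetical per-customer mapping: BeanFlows field name -> the column name a
# desk's existing models already reference. Not real Bloomberg mnemonics.
CUSTOMER_MAPPING = {"marketing_year": "MY", "production_k_bags": "PROD_KBAGS"}


def rename(rows, mapping):
    # Fields without an entry in the mapping keep their original names.
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]


def export(rows, fmt):
    """Serialize a list of flat dicts as 'json' or 'csv'."""
    if fmt == "json":
        return json.dumps(rows)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=list(rows[0]) if rows else [])
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt!r}")
```

Keeping renaming separate from serialization means a custom column request touches only the mapping, never the export code.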
### JTBD Competitive Map
```
SERVES FUNCTIONAL JOB WELL
OVERSERVED | WELL-SERVED
Bloomberg, | Kpler (oil/gas focus,
Refinitiv | coffee = afterthought)
(everything but |
coffee-specific) |
←───────────────────────┼───────────────────────→
DOESN'T SERVE | SERVES
EMOTIONAL/SOCIAL | EMOTIONAL/SOCIAL
|
UNDERSERVED | ★ BEANFLOWS TARGET ★
(no affordable | "Functional enough for V1,
coffee-specific | nails the emotional job
data solution) | of speed + confidence"
DOESN'T SERVE FUNCTIONAL JOB
```
**BeanFlows starts in the bottom-right quadrant** — you won't match Bloomberg's breadth, but you'll serve the emotional job (speed, confidence, looking sharp) better for coffee-specific work. As you add AIS data, you move up toward "well-served" on functional while keeping the emotional advantage.
### Job Canvas — Summary
```
┌──────────────────────────────────────────────────────────────────────┐
│ JOB CANVAS — BeanFlows │
├──────────────────────────────────────────────────────────────────────┤
│ TARGET CUSTOMER: Commodity analysts and traders at hedge funds, │
│ trading houses, and physical coffee companies who need to form │
│ market views quickly when government data drops. │
│ │
│ CORE JOB: When I need to form a view on the coffee market before │
│ committing capital, I want to see the full fundamental picture in │
│ one place I trust, so I can make high-conviction decisions faster │
│ than competitors. │
│ │
│ FUNCTIONAL: Get clean, normalized, query-ready coffee data into │
│ my models within minutes of release. │
│ EMOTIONAL: Feel confident I'm not missing signals. Feel prepared. │
│ SOCIAL: Be the analyst who always has the numbers first. │
│ │
│ STRUGGLING MOMENT: WASDE/COT report drops and the analyst's │
│ manual pipeline breaks or takes 30-60 min to update. │
│ │
│ CURRENT SOLUTIONS: │
│ • Bloomberg Terminal — hired for breadth, fired for coffee depth │
│ and $24K/yr/seat cost │
│ • Internal scripts — hired for customization, fired because fragile, │
│ undocumented, breaks on format changes │
│ • Manual Excel work — hired because "free," fired because slow and │
│ error-prone, makes analyst look behind │
│ • Kpler — hired for cargo intelligence, fired because coffee is a │
│ secondary commodity for them, pricing starts at enterprise level │
│ • Doing nothing — because "we've always done it this way" │
│ │
│ FORCES: │
│ Push [HIGH — fragile pipelines, time waste, missed signals] │
│ Pull [HIGH — one API, instant access, coffee-specific] │
│ Anxiety [VERY HIGH — data accuracy, startup risk, switching cost] │
│ Habit [MEDIUM — existing scripts, Bloomberg inertia] │
│ │
│ KEY INSIGHT: The job is never "I need data." The job is "I need to │
│ make a $10M decision with confidence in 30 minutes." Anxiety about │
│ data accuracy is the #1 blocker to adoption — more than price, │
│ more than features. Trust is the product. │
│ │
│ → PRODUCT: Start with USDA + CFTC via clean API. Add AIS for │
│ physical flow intelligence. Publish data lineage for every field. │
│ → MARKETING: Target the struggling moment. "WASDE drops in 10 │
│ minutes. Is your pipeline ready?" Show before/after. │
│ → PRICING: Anchor to Bloomberg ($24K/yr) and time saved (8-10 │
│ hrs/mo × $100/hr = $9.6-12K/yr). Price at $6-24K/yr feels like a │
│ bargain relative to both. │
└──────────────────────────────────────────────────────────────────────┘
```
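The time-saved anchor is plain arithmetic. A sketch using the canvas's own inputs (8-10 hrs/mo at $100/hr) shows the range runs from $9,600 to $12,000 a year. The Analyst tier, annualized, sits below even the low end. The function names are illustrative.

```python
def annual_time_cost(hours_per_month: float, hourly_rate: float) -> float:
    """Yearly cost of analyst hours spent on manual data wrangling."""
    return hours_per_month * hourly_rate * 12


def annual_price(monthly_price: float) -> float:
    """Monthly subscription price expressed per year."""
    return monthly_price * 12


low, high = annual_time_cost(8, 100), annual_time_cost(10, 100)  # 9,600 and 12,000
analyst_tier = annual_price(499)  # 5,988
```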
---
## 2. Lean Canvas
```
┌─────────────────────┬──────────────────────┬─────────────────────┐
│ 2. PROBLEM │ 4. SOLUTION │ 1. CUSTOMER │
│ │ │ SEGMENTS │
│ P1: Coffee │ S1: Single API for │ │
│ fundamental data │ all USDA coffee │ EARLY ADOPTERS: │
│ (USDA, CFTC) is │ supply/demand + │ Junior-to-mid │
│ fragmented across │ CFTC positioning │ coffee/softs │
│ formats, painful │ data, cleaned and │ analysts at: │
│ to normalize │ normalized │ • Commodity hedge │
│ │ │ funds (50-200 │
│ P2: Internal data │ S2: AIS-based │ employees) │
│ pipelines are │ physical coffee │ • Physical trading │
│ fragile, break on │ flow tracking │ houses │
│ format changes, │ (Brazil, Vietnam, │ • Coffee hedging │
│ owned by one person │ Colombia → import │ desks at roasters │
│ who might leave │ ports) │ │
│ P3: No affordable │ │ Specifically: │
│ way to track │ S3: Data quality │ the analyst who │
│ physical coffee │ layer — lineage, │ currently maintains │
│ flows in real-time │ methodology docs, │ the desk's brittle │
│ │ accuracy scoring, │ data scripts and │
│ EXISTING │ source transparency │ hates it │
│ ALTERNATIVES: ├──────────────────────┤ │
│ • Bloomberg ($24K+) │ 3. UNIQUE VALUE PROP │ │
│ • Internal scripts │ │ │
│ • Manual Excel │ "The complete coffee │ │
│ • Kpler ($$$, │ fundamental data │ │
│ coffee is a │ stack — USDA, │ │
│ secondary focus) │ CFTC, and physical │ │
│ │ flows — in one clean │ │
│ │ API. Set up in │ │
│ │ minutes, not months."│ │
├─────────────────────┼──────────────────────┼─────────────────────┤
│ 8. KEY METRICS │ 5. CHANNELS │ 6. REVENUE STREAMS │
│ │ │ │
│ THE ONE METRIC: │ • Direct outreach │ Analyst: $499/mo │
│ # of desks with │ (LinkedIn, email │ (1 seat, USDA + │
│ BeanFlows piped │ to named analysts) │ CFTC, API access) │
│ into production │ • Coffee trading │ │
│ models (not trials │ conferences (ICO, │ Desk: $1,499/mo │
│ — production use) │ NCA, SCA events) │ (5 seats, + AIS │
│ │ • Weekly "BeanFlows │ flows, historical) │
│ Supporting: │ Coffee Data Brief" │ │
│ • API calls/day │ newsletter (free │ Enterprise: $3-5K/mo│
│ (engagement) │ content marketing) │ (unlimited seats, │
│ • Data freshness │ • Referrals from │ custom feeds, │
│ (latency to │ existing customers │ bulk export, │
│ source release) │ (tight community) │ priority support) │
│ • Error rate │ • Commodity data │ │
│ (trust metric) │ Twitter/X accounts │ MODEL: Annual │
│ │ and communities │ contracts preferred,│
│ │ │ monthly available │
├─────────────────────┼──────────────────────┼─────────────────────┤
│ 7. COST STRUCTURE │ 9. UNFAIR ADVANTAGE │
│ │ │
│ FIXED: │ TODAY: │
│ • Hetzner server: ~$50/mo │ • Capital efficiency│
│ • AIS data licensing: $500-2K/mo │ (Hetzner + DuckDB │
│ (once added) │ = near-zero │
│ • Domain, Paddle fees, tooling: ~$100/mo │ marginal cost) │
│ • Your time (biggest real cost) │ • Coffee-specific │
│ │ domain focus │
│ VARIABLE: │ │
│ • Support time per customer │ BUILDING TOWARD: │
│ • Data quality monitoring │ • Historical depth │
│ │ (time-series │
│ TOTAL: Can run for 12 months at <$3K/mo │ competitors can't │
│ with zero revenue. Very capital efficient. │ replicate) │
│ │ • AIS + fundamentals│
│ │ in one place │
│ │ (unique combo) │
│ │ • Workflow │
│ │ integration │
│ │ (switching costs) │
└────────────────────────────────────────────┴─────────────────────┘
```
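Two of the supporting metrics, data freshness and error rate, only build trust if they are measured consistently. Below is a minimal sketch of the freshness side. It assumes each release records the source's official publication time and the time the cleaned data became available through the API. The 15-minute target is a hypothetical SLA, not a figure from this analysis.

```python
from datetime import datetime, timedelta, timezone
from statistics import median

SLA = timedelta(minutes=15)  # hypothetical freshness target, not from this analysis


def freshness_report(events):
    """events: list of (source_released_at, available_at) timezone-aware datetimes."""
    latencies = [available - released for released, available in events]
    minutes = [lat.total_seconds() / 60 for lat in latencies]
    return {
        "median_minutes": median(minutes),
        "worst_minutes": max(minutes),
        # Count of releases whose latency exceeded the SLA.
        "sla_breaches": sum(lat > SLA for lat in latencies),
    }


t0 = datetime(2024, 1, 12, 17, 0, tzinfo=timezone.utc)
report = freshness_report([(t0, t0 + timedelta(minutes=m)) for m in (4, 6, 20)])
```

Reporting the worst case next to the median matters here. A desk remembers the one release that arrived late, not the typical one.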
### Lean Canvas — Key Assumptions to Test
| # | Assumption | Risk | Test |
|---|-----------|------|------|
| 1 | Coffee analysts spend 8-10+ hrs/mo on data wrangling | HIGH — if this is only 2 hrs, the pain isn't enough | Ask in first 5 demos: "Walk me through what happens when WASDE drops" |
| 2 | Trading desks will pay $500-1,500/mo for cleaned public data | HIGH — this is the core revenue assumption | Offer paid pilot at $299/mo with 3-month commitment. Credit card or PO = validated |
| 3 | You can reach 20+ decision-makers within 60 days | HIGH — if distribution is broken, nothing else matters | Track: outreach sent, responses received, demos booked. Need 10%+ response rate |
| 4 | AIS data can be acquired and licensed at viable margins | MEDIUM — licensing costs could eat margins | Get 3 AIS provider quotes before committing to the roadmap |
| 5 | Data accuracy will be high enough to maintain trust | CRITICAL — one error = lost customer forever | Build automated reconciliation against source. Publish accuracy scores |
---
## 3. Blue Ocean Strategy Canvas
### Competing Factors in Coffee Market Data
| Factor | Bloomberg | Internal Scripts | Manual Excel | Kpler | BeanFlows |
|--------|:---------:|:----------------:|:------------:|:-----:|:---------:|
| Breadth of data (commodities covered) | 5 | 1 | 1 | 4 | 1 |
| Coffee-specific depth | 2 | 3 | 2 | 2 | **5** |
| Data freshness / speed | 4 | 3 | 1 | 4 | **5** |
| API / programmatic access | 4 | 4 | 1 | 4 | **5** |
| Physical flow tracking | 2 | 0 | 0 | 5 | **4** (roadmap) |
| Setup time / ease of use | 2 | 1 | 4 | 2 | **5** |
| Price (inverted: 5=cheapest) | 1 | 5 | 5 | 1 | **4** |
| Data transparency / methodology | 2 | 1 | 1 | 3 | **5** |
| Maintenance burden on user | 2 | 1 | 1 | 3 | **5** |
| Historical time-series depth | 5 | 2 | 1 | 4 | 3 (growing) |
| Multi-asset analytics | 5 | 1 | 1 | 4 | 1 |
| Enterprise support / SLAs | 5 | 1 | 1 | 4 | 2 |
### Four Actions Framework
**ELIMINATE:**
- Multi-commodity breadth — don't try to cover 40 commodities. Coffee only.
- Enterprise sales theater — no 6-month RFP processes, no custom SOWs for V1
- Complex UI/dashboard features — lead with API, not a Bloomberg-clone interface
**REDUCE:**
- Enterprise support overhead — async support, documentation-first
- Feature count — fewer things, done perfectly. API + basic dashboard + data docs
- Historical depth initially — start with 5 years, build toward 20+
**RAISE:**
- Coffee-specific depth — every USDA table, every CFTC category, origin-level granularity
- Data freshness — minutes after source release, not hours
- Data transparency — full methodology docs, source lineage, accuracy scores
- Setup time — from first API call to data in their model in under 30 minutes
- Maintenance burden reduction — they never worry about format changes again
**CREATE:**
- Combined fundamentals + positioning + physical flows for coffee (nobody does this)
- "Data quality score" — transparent accuracy metrics per field, per source
- WASDE alert system — instant notification + pre-formatted data on release
- Migration guides from Bloomberg/manual workflows
- Coffee-specific data models (origin-level S&D, arabica vs. robusta splits)
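At its simplest, a WASDE alert detects that a source has published something new and then fires a notification. Below is a minimal sketch. It assumes the release document has already been fetched as bytes. The `notify` callback is a hypothetical stand-in for email, Slack, or webhook delivery, and the class name is illustrative.

```python
import hashlib
from typing import Callable, Optional


class ReleaseWatcher:
    """Calls `notify` whenever a polled document differs from the previous poll."""

    def __init__(self, notify: Callable[[str], None]):
        self.notify = notify
        self.last_digest: Optional[str] = None

    def check(self, payload: bytes) -> bool:
        digest = hashlib.sha256(payload).hexdigest()
        if self.last_digest is None:
            # First poll only establishes a baseline; nothing to compare yet.
            self.last_digest = digest
            return False
        if digest == self.last_digest:
            return False  # unchanged since the last poll
        self.last_digest = digest
        self.notify(digest)
        return True
```

Hashing the whole payload treats any byte change as a new release. A production version would likely parse the release date from the document instead, so that cosmetic page edits do not trigger alerts.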
### The BeanFlows Value Curve
```
High 5 │ ·     ★     ★     ★           ★           ★     ★     ·     ·     ·
     4 │             ·     ·     ★           ★
     3 │                                                       ★
     2 │       ·                 ·     ·           ·     ·                 ★
     1 │ ★                                   ·                       ★
     0 └──────────────────────────────────────────────────────────────────────────
         Brdth Coff  Frsh  API   Phys  Ease  Prce  Trns  Mnt   Hist  MltA  Ent
         data  depth       acc   flow        (inv)             depth asst  supp
  ★ = BeanFlows   · = Bloomberg   (Kpler and internal scripts omitted for clarity)
```
**Positioning statement:**
> "Unlike Bloomberg which covers everything broadly, or internal scripts which break constantly, BeanFlows is the complete coffee data stack — fundamentals, positioning, and physical flows in one trusted API. Set up in minutes, always current, never breaks."
---
## 4. Wardley Map
### Value Chain — Coffee Trading Intelligence
```
Genesis Custom Product Commodity
(novel) (bespoke) (off-shelf) (utility)
│ │ │ │
VISIBLE User Need: │ │ │ │
(to user) "Make │ │ │ │
profitable │ │ │ │
coffee │ │ │ │
trades" ────┤ │ │ │
│ │ │ │
Trading ────┤ │ │ │
Decision │ │ │ │
Support │ │ │ │
│ │ │ │
Coffee- │ │ │ │
Specific ────┼──────────────┤ │ │
Intelligence │ ★ BUILD │ │ │
Layer │ HERE │ │ │
│ │ │ │
AIS Coffee ──┤ │ │ │
Flow ───┤ ★ BUILD │ │ │
Tracking │ HERE │ │ │
│ │ │ │
USDA/CFTC │ │ │ │
Data ────────┼──────────────┼──────────────┤ │
Aggregation │ │ ★ BUILD │ │
& Cleaning │ │ (fast, │ │
│ │ before │ │
│ │ commodit.) │ │
│ │ │ │
INVISIBLE API Layer ───┼──────────────┼──────────────┤ │
(REST/ │ │ │ │
GraphQL) │ │ │ │
│ │ │ │
DuckDB / │ │ │ │
SQLMesh ────┼──────────────┼──────────────┤ │
(transforms) │ │ │ │
│ │ │ │
Auth / │ │ │ │
Billing ────┼──────────────┼──────────────┼──────────────┤
(Paddle) │ │ │ USE (utility)│
│ │ │ │
Cloud │ │ │ │
Hosting ────┼──────────────┼──────────────┼──────────────┤
(Hetzner) │ │ │ USE (utility)│
│ │ │ │
Internet ────┼──────────────┼──────────────┼──────────────┤
│ │ │ USE (utility)│
```
### Strategic Reads from the Map
**1. USDA/CFTC aggregation is moving toward commodity.**
This is your V1, but it's not defensible long-term. Someone else can clean USDA data. The value here is speed-to-market and execution quality, not novelty. You must move up the value chain before this component commoditizes.
**Timeline pressure:** You have 12-18 months before a motivated competitor or an intern at a trading house replicates the basic USDA/CFTC cleanup. Use this window to add AIS and build historical depth.
**2. AIS coffee flow tracking is still genesis/custom.**
Nobody is doing coffee-specific physical flow intelligence well. Kpler does it for oil/gas/LNG. This is where your moat lives. Building this before anyone else gives you a time advantage that compounds (historical flow data can't be recreated retroactively).
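Coffee flow tracking starts by turning raw AIS position reports into port events. Below is a minimal sketch with three assumptions:

- Pings arrive as `(vessel_id, timestamp, lat, lon)` tuples.
- A port is a circle around a single reference point.
- A departure means a vessel moving from inside that circle to outside it.

The Santos coordinates are approximate and purely illustrative. Real port geofences are usually polygons, and the names here are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0

# Illustrative, approximate reference point; not an official port boundary.
PORTS = {"Santos": (-23.96, -46.31)}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def departures(pings, port, radius_km=10.0):
    """Yield (vessel_id, timestamp) when a vessel leaves the port radius."""
    plat, plon = PORTS[port]
    inside = {}
    # Process each vessel's pings in time order.
    for vessel, ts, lat, lon in sorted(pings, key=lambda p: (p[0], p[1])):
        now_inside = haversine_km(lat, lon, plat, plon) <= radius_km
        if inside.get(vessel) and not now_inside:
            yield vessel, ts
        inside[vessel] = now_inside
```

Counting departures per origin port per week already gives a crude export-pace series. Matching vessels to cargo type and destination is the harder, coffee-specific layer on top.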
**3. The intelligence layer is where long-term value lives.**
Raw data (even clean raw data) trends toward commodity. The strategic play is to climb from "data aggregation" to "coffee-specific intelligence":
```
DATA AGGREGATION (V1)
DATA + PHYSICAL FLOWS (V2) ← You are planning this
INTELLIGENCE LAYER (V3) ← This is where $100M ARR lives
• Anomaly detection (unusual flow patterns)
• Supply disruption early warnings
• Seasonal pattern analysis
• Cross-reference signals (positioning vs. physical flows)
• Predictive models (not price prediction — flow/supply prediction)
```
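The first intelligence-layer items can start very simply. Below is a minimal sketch of seasonal anomaly detection. It assumes monthly volumes keyed by `(year, month)`. The latest value is compared with the same calendar month in earlier years and flagged when its z-score exceeds a threshold. The 3.0 threshold is an arbitrary illustrative choice, and the function names are hypothetical.

```python
from statistics import mean, stdev


def seasonal_zscore(history, year, month):
    """history: {(year, month): volume}. Compare one month to the same month in prior years."""
    baseline = [v for (y, m), v in history.items() if m == month and y < year]
    if len(baseline) < 2:
        raise ValueError("need at least two prior years for a seasonal baseline")
    sd = stdev(baseline)  # sample standard deviation
    if sd == 0:
        raise ValueError("baseline has zero variance")
    return (history[(year, month)] - mean(baseline)) / sd


def is_anomalous(history, year, month, threshold=3.0):
    return abs(seasonal_zscore(history, year, month)) > threshold
```

Comparing against the same month only is what makes this seasonal. A plain trailing average would flag every harvest peak as an anomaly.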
**4. Build vs. Buy decisions from the map:**
| Component | Decision | Reasoning |
|-----------|----------|-----------|
| Cloud hosting | BUY (Hetzner) | Commodity. Never build your own. |
| Auth/billing | BUY (Paddle) | Commodity. Don't waste time here. |
| Data transforms | BUILD (DuckDB + SQLMesh) | Product-stage but your core competency. Own this. |
| USDA/CFTC ingestion | BUILD (but fast) | Moving toward commodity. Build it quickly, move on. |
| AIS data | BUY raw + BUILD processing | Buy the raw AIS feed, build the coffee-specific intelligence on top. |
| Dashboard/UI | BUILD (minimal) | Keep lightweight (HTMX). The API is the product. |
| Coffee-specific ML/analytics | BUILD (future) | This is genesis. This is where your long-term moat lives. |
---
## 5. Demand-Side Sales — How Coffee Analysts Buy
### The Buying Timeline for BeanFlows
```
PASSIVE LOOKING         ACTIVE LOOKING          DECIDING                CONSUMING
(3-12 months)           (2-6 weeks)             (1-4 weeks)             (ongoing)
"Ugh, my WASDE          "What's out there       "OK, BeanFlows vs.      "Is this actually
script broke again.     for coffee data?        Bloomberg data vs.      better than what
There has to be a       Let me look around."    our internal stuff.     I had before?"
better way..."                                  Is it accurate?"
│                       │                       │                       │
▼                       ▼                       ▼                       ▼
YOUR MOVE:              YOUR MOVE:              YOUR MOVE:              YOUR MOVE:
Content that names      Be findable. SEO for    Methodology docs.       Fast onboarding.
their pain. "The        "coffee market data     Pilot program. "Try     Quick wins in
Hidden Cost of Manual   API", "USDA coffee      it free for 2 weeks     Week 1. "Your
Coffee Data Pipelines"  data feed". Direct      with your actual        model is now auto-
blog post. Weekly       outreach with a         data stack." Named      updating" moment.
data brief newsletter.  specific struggling     reference customers.    Celebrate their
Conference talks.       moment hook.            Refund guarantee.       time saved.
```
**Critical insight:** The buying cycle in commodity trading is **relationship-driven and trust-heavy**. A cold landing page won't close a $500+/mo deal with a trading desk. The sales motion is:
1. **Content → Credibility** (newsletter, conference presence, Twitter/X)
2. **Warm intro or direct outreach → Demo**
3. **Demo → Pilot (free or reduced rate)**
4. **Pilot → Production use → Annual contract**
This is a 2-4 month cycle for your first 5 customers, shortening to 2-4 weeks via referrals after that.
### Demand-Side Pricing Anchors
| Anchor | Value | BeanFlows Price Position |
|--------|-------|--------------------------|
| Bloomberg Terminal | $24,000/yr/seat | BeanFlows at $6-18K/yr is a fraction — and deeper on coffee |
| Analyst time wasted | 8-10 hrs/mo × $100-150/hr = $9.6-18K/yr | BeanFlows pays for itself in time saved alone |
| Kpler subscription | $50-100K+/yr for enterprise | BeanFlows AIS for coffee at $18-36K/yr is a fraction |
| Cost of one bad trade from stale data | $50K-$500K+ | Insurance framing: "What's one missed signal worth?" |
| Cost of building internally | 1 engineer × 3 months = $50-75K + ongoing maintenance | BeanFlows at $18K/yr is roughly 64-76% cheaper in year one, with zero maintenance |
**Pricing confidence:** At $499-1,499/mo, BeanFlows is a rounding error for any desk that manages $10M+ in coffee positions. The price objection won't be "too expensive" — it'll be "can I trust it?"
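The build-internally row compares first-year costs. Below is that arithmetic as a sketch, using the table's own inputs: $50-75K to build versus $18K/yr for the subscription. The saving lands between 64% and 76% before any ongoing maintenance is counted. The function name is illustrative.

```python
def first_year_saving(build_cost: float, subscription_per_year: float) -> float:
    """Fraction saved in year one by subscribing instead of building internally."""
    return 1 - subscription_per_year / build_cost


for build in (50_000, 75_000):
    print(f"build ${build:,}: {first_year_saving(build, 18_000):.0%} cheaper")
```

Maintenance widens the gap every year after the first. The build cost recurs as upkeep, while the subscription price stays flat.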
---
## 6. Crossing the Chasm — Beachhead Strategy
### The Beachhead Segment
**Don't target:** "Commodity traders" (too broad)
**Don't target:** "Coffee market participants" (still too broad)
**Target:** Quantitative commodity analysts at mid-size hedge funds ($200M-$2B AUM) that trade soft commodities, have 2-5 people on the softs desk, and currently maintain internal data scripts for USDA/CFTC data.
**Why this beachhead:**
- They have the pain (maintaining data scripts isn't their job, but they're stuck doing it)
- They have the budget ($500-1,500/mo is trivial relative to AUM)
- They're technically sophisticated enough to value an API (vs. a dashboard-first buyer)
- They talk to each other (commodity analyst community is small and tight)
- They can make purchasing decisions without a 6-month procurement process
- Winning 10-15 of these funds = credible reference base for expanding to larger shops and physical traders
### Bowling Pin Sequence
```
Pin 1: Quant analysts at mid-size commodity hedge funds (softs focus)
↓ (referrals within the community)
Pin 2: Fundamental analysts at larger multi-strat hedge funds with softs exposure
↓ (credibility established)
Pin 3: Risk/hedging desks at physical coffee trading houses (Volcafe, Sucafina, etc.)
↓ (AIS data becomes the hook)
Pin 4: Hedging desks at large coffee roasters (Nestlé, JDE Peet's, Lavazza)
↓ (enterprise contracts, higher ACV)
Pin 5: Expand to cocoa, sugar, other soft commodities
```
### Whole Product for the Beachhead
For Pin 1 (quant analysts at mid-size hedge funds), the whole product is:
| Component | Status | Notes |
|-----------|--------|-------|
| Clean USDA coffee data via API | BUILD (V1) | Core product |
| Clean CFTC positioning via API | BUILD (V1) | Core product |
| Python client library | BUILD (V1) | `pip install beanflows` — critical for this segment |
| Data methodology documentation | BUILD (V1) | Trust = the product. Non-negotiable. |
| Example Jupyter notebooks | BUILD (V1) | Show how to pipe data into common model frameworks |
| Slack/email support (responsive) | YOU (V1) | Personal touch matters early. Be fast. |
| AIS physical flow data | BUILD (V2) | Differentiator that locks in the segment |
| Historical backfill (5+ years) | BUILD (ongoing) | Compounds over time. Start building day 1. |
| Excel add-in | BUILD (V3) | For the non-Python users on the desk |
| Community (Slack/Discord) | CONSIDER (V2) | Small enough community that this could be powerful |
**The "whole product" for V1 is: API + Python library + methodology docs + example notebooks + responsive support.** That's enough to win the beachhead segment. Everything else comes after you have 5-10 paying customers.
---
## 7. Synthesis — Strategic Roadmap
### Phase 1: Prove It (Months 1-3) — Target: 5 Paying Customers
**Goal:** Validate that coffee trading desks will pay for cleaned fundamental data.
- Ship V1: USDA + CFTC data via clean REST API
- Ship Python client (`pip install beanflows`)
- Publish data methodology docs (your trust moat)
- Direct outreach to 30+ named analysts at mid-size commodity funds
- Offer 2-week free pilot → $499/mo Analyst tier
- Success metric: 5 desks with BeanFlows in production models
**Key risk to test:** Can you reach and close these buyers without a warm network?
### Phase 2: Differentiate (Months 4-8) — Target: $15K MRR
**Goal:** Add AIS data to create a moat that cleaned USDA data alone can't provide.
- Secure AIS data licensing
- Build coffee-specific vessel tracking (origin ports → destination ports)
- Launch Desk tier ($1,499/mo) with AIS + historical data
- Upgrade existing customers, acquire new ones on the strength of AIS
- Publish weekly "BeanFlows Coffee Data Brief" (content marketing + credibility)
- Attend 1-2 commodity trading conferences for face-to-face relationship building
- Success metric: 10-15 customers, $15K+ MRR, 2+ customers on Desk tier
**Key risk to test:** Does AIS data for coffee justify 3x pricing? Will customers upgrade?
### Phase 3: Dominate Coffee (Months 9-18) — Target: $50K MRR
**Goal:** Become the default coffee data infrastructure for the beachhead segment.
- Build intelligence layer (anomaly detection, seasonal analysis, signal cross-referencing)
- Add Excel add-in for non-API users
- Expand to larger multi-strat funds and physical trading houses (Pins 2-3 in the bowling pin sequence)
- Build historical depth (every month of data you accumulate = moat deepening)
- Consider Enterprise tier ($3-5K/mo) for larger shops
- Success metric: 25-35 customers, $50K+ MRR, <5% monthly churn, 120%+ NRR
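The churn and NRR targets are easiest to track with fixed definitions. Below is a minimal sketch using the common definition of net revenue retention. NRR is the starting cohort's MRR after expansion, contraction, and churn, divided by that cohort's starting MRR; it is usually measured over a trailing 12 months. Logo churn is customers lost over customers at the start of the month. The figures in the usage line are hypothetical.

```python
def net_revenue_retention(start_mrr, expansion, contraction, churned):
    """NRR for a starting cohort: (start + expansion - contraction - churned) / start."""
    if start_mrr <= 0:
        raise ValueError("start_mrr must be positive")
    return (start_mrr + expansion - contraction - churned) / start_mrr


def monthly_logo_churn(customers_at_start, customers_lost):
    """Share of customers at the start of the month who cancelled during it."""
    return customers_lost / customers_at_start


nrr = net_revenue_retention(10_000, 3_000, 500, 500)  # hypothetical cohort
```

NRR above 100% means upgrades from existing desks, such as Analyst-to-Desk tier moves, outweigh lost revenue. That is why the Desk tier upgrade path matters as much as new logos.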
### Phase 4: Expand (Month 18+) — Target: Path to $100K+ MRR
**Goal:** Replicate the model for adjacent soft commodities.
- Add cocoa, then sugar, then other softs
- Cross-sell existing customers (most trade multiple softs)
- Consider acquiring niche data sources
- Build toward the Kpler playbook: commodity intelligence platform for soft commodities
- At this point: evaluate whether to take capital for faster M&A consolidation
### Critical Assumptions Log
| # | Assumption | Status | How to Test | Kill Criteria |
|---|-----------|--------|-------------|---------------|
| 1 | Analysts spend 8+ hrs/mo on coffee data wrangling | UNTESTED | Ask in first 5 demos | If <3 hrs, pain is insufficient |
| 2 | Mid-size commodity funds will pay $499+/mo | UNTESTED | Paid pilot offers | If 0 of first 10 prospects convert to paid |
| 3 | You can reach 20+ decision-makers in 60 days | UNTESTED | Track outreach metrics | If <5% response rate on 50+ outreaches |
| 4 | AIS data licensing is viable at your margins | UNTESTED | Get 3 provider quotes | If licensing alone exceeds $3K/mo |
| 5 | Data accuracy is high enough for trading decisions | UNTESTED | Automated reconciliation vs. source | If error rate exceeds 0.1% |
| 6 | AIS addition justifies 3x pricing increase | UNTESTED | Customer reaction in demos | If <30% of existing customers upgrade |
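Kill criteria are most useful when they are checked mechanically against tracked numbers rather than re-argued each month. Below is a minimal sketch encoding the five threshold-style criteria from the log; assumption 2 is count-based and is left out. The metric names are hypothetical. The "50+ outreaches" precondition on the response-rate check is left to the caller.

```python
from typing import Callable, NamedTuple


class KillCriterion(NamedTuple):
    assumption: str
    metric: str
    is_killed: Callable[[float], bool]


# Thresholds come from the assumptions log; the metric names are hypothetical.
CRITERIA = [
    KillCriterion("Analysts spend 8+ hrs/mo wrangling data", "hours_per_month", lambda v: v < 3),
    KillCriterion("Reach 20+ decision-makers in 60 days", "response_rate", lambda v: v < 0.05),
    KillCriterion("AIS licensing viable at our margins", "ais_cost_per_month", lambda v: v > 3_000),
    KillCriterion("Accuracy high enough for trading", "error_rate", lambda v: v > 0.001),
    KillCriterion("AIS justifies 3x pricing", "upgrade_share", lambda v: v < 0.30),
]


def triggered(observed: dict) -> list:
    """Return assumptions whose kill criterion is met; unmeasured metrics are skipped."""
    return [
        c.assumption
        for c in CRITERIA
        if c.metric in observed and c.is_killed(observed[c.metric])
    ]
```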
---
## Key Strategic Insights
1. **Trust is the product, data is the delivery mechanism.** Your methodology docs, accuracy scores, and data lineage transparency aren't "nice to have" — they ARE the product for a trading audience. Budget 20% of your development time for trust infrastructure.
2. **The V1 moat is thin, and that's OK.** Cleaned USDA/CFTC data is replicable. Your moat in V1 is execution speed and being first with a coffee-specific offering. The real moat builds in V2 (AIS) and compounds in V3+ (historical depth + intelligence layer). You're racing to add layers before anyone copies V1.
3. **Distribution is your #1 existential risk.** The product can be perfect and it won't matter if you can't get 5 demos in the first month. Solve distribution before you polish features. If you don't have warm relationships in commodity trading, finding a way in (advisor, conference, content) is job #1.
4. **The Kpler playbook is your North Star, but be patient.** Kpler bootstrapped for 8 years. They started with one commodity flow type. They were cashflow positive in the first quarter. Copy their discipline: prove it on coffee, prove the economics, then expand deliberately.
5. **Sell the unfair advantage, not the data.** Nobody buys "clean data." They buy "I saw the Brazilian export surge 3 days before the market priced it in." Every piece of marketing, every demo, every conversation should be anchored to the trading decision the data enables, not the data itself.


@@ -1,103 +0,0 @@
# Data Engineering Pipeline Layers & Naming Conventions
This document outlines the standard layered architecture and model naming conventions for our data platform. Adhering to these standards is crucial for maintaining a clean, scalable, and understandable project.
---
## Data Pipeline Layers
Each layer has a distinct purpose, transforming data from its raw state into a curated, analysis-ready format.
### 1. Raw Layer
The initial landing zone for all data ingested from source systems.
* **Purpose:** To create a permanent, immutable archive of source data.
* **Key Activities:**
* Data is ingested and stored in its original, unaltered format.
* Serves as the definitive source of truth, enabling reprocessing of the entire pipeline if needed.
* No transformations or schema enforcement occur at this stage.
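One way to make the raw layer append-only in practice is to store each payload unaltered under a content-addressed path and refuse to overwrite. Below is a minimal sketch that assumes a local filesystem. The directory layout and the `usda_psd` source name are hypothetical; an object store with write-once settings would play the same role.

```python
import hashlib
import tempfile
from datetime import date
from pathlib import Path


def archive_raw(root: Path, source: str, payload: bytes, ingested_on: date) -> Path:
    """Store source bytes unaltered under a content-addressed path; never overwrite."""
    digest = hashlib.sha256(payload).hexdigest()
    path = root / source / ingested_on.isoformat() / f"{digest}.raw"
    path.parent.mkdir(parents=True, exist_ok=True)
    try:
        # "xb" = exclusive create, binary: raises if the file already exists.
        with open(path, "xb") as fh:
            fh.write(payload)
    except FileExistsError:
        pass  # same hash, same bytes: already archived, leave it untouched
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        first = archive_raw(Path(tmp), "usda_psd", b"raw bytes", date(2024, 1, 12))
        again = archive_raw(Path(tmp), "usda_psd", b"raw bytes", date(2024, 1, 12))
        print(first == again, first.read_bytes())
```

Because the file name is the content hash, re-ingesting the same release is idempotent. Any change at the source lands as a new file beside the old one instead of replacing it.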
### 2. Staging Layer
A workspace for initial data preparation and technical validation.
* **Purpose:** To convert raw data into a structured, technically sound format.
* **Key Activities:**
* **Schema Application:** A schema is applied to the raw data.
* **Data Typing:** Columns are cast to their correct data types (e.g., string to timestamp, integer to decimal).
* **Basic Cleansing:** Handles technical errors like malformed records and standardizes null values.
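
The three staging activities can be sketched in plain Python. The `SCHEMA` mapping, the set of null tokens, and the Stripe-like column names are hypothetical; in this project the equivalent logic would live in a `stg_` SQL model rather than in Python.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Hypothetical schema for one source table: column name -> cast function.
SCHEMA = {
    "charge_id": str,
    "created_at": datetime.fromisoformat,
    "amount": Decimal,
}

# Raw spellings of "missing" that get standardized to None (an assumption).
NULL_TOKENS = {"", "NULL", "null", "N/A", "NA"}


def stage_rows(raw_rows):
    """Apply SCHEMA to raw string rows; return (typed_rows, rejected_rows)."""
    typed, rejected = [], []
    for row in raw_rows:
        try:
            out = {}
            for col, cast in SCHEMA.items():
                value = row.get(col)
                if value is None or value.strip() in NULL_TOKENS:
                    out[col] = None  # every null spelling becomes None
                else:
                    out[col] = cast(value.strip())  # data typing
            typed.append(out)
        except (ValueError, InvalidOperation):
            # Malformed record: set aside for inspection instead of failing the load.
            rejected.append(row)
    return typed, rejected
```

Rejected rows are returned rather than raised so that one malformed record does not block the load; whether to quarantine or fail hard is a project decision.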
### 3. Cleaned Layer
The integrated core of the data platform, designed to create a "single version of the facts."
* **Purpose:** To integrate data from various sources into a unified, consistent, and historically accurate model.
* **Key Activities:**
* **Business Logic:** Complex business rules are applied to conform and validate the data.
* **Integration:** Data from different sources is combined using business keys.
* **Core Modeling:** Data is structured into a robust, integrated model (e.g., a Data Vault) that represents core business processes.
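
Integration by business key is often implemented with deterministic hash keys, as in Data Vault 2.0 hubs. The sketch below assumes trim plus upper-case as the normalization rule and MD5 as the hash; both are common choices but assumptions here, and `hub_hash_key` and `integrate` are hypothetical helpers.

```python
import hashlib


def hub_hash_key(*business_key_parts: str, delimiter: str = "||") -> str:
    """Deterministic hub key from one or more business key parts."""
    # Normalize so formatting differences between systems don't split one entity.
    normalized = [part.strip().upper() for part in business_key_parts]
    return hashlib.md5(delimiter.join(normalized).encode("utf-8")).hexdigest()


def integrate(sources: dict, key_field: str) -> dict:
    """Group records from several sources under the hub key of their business key."""
    hub = {}
    for source_name, records in sources.items():
        for record in records:
            key = record[key_field]
            entry = hub.setdefault(
                hub_hash_key(key),
                {"business_key": key.strip().upper(), "sources": set()},
            )
            entry["sources"].add(source_name)
    return hub
```

Because both sources normalize to the same key, `cust-001` from one system and `CUST-001 ` from another resolve to a single hub entry.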
### 4. Serving Layer
The final, presentation-ready layer optimized for analytics, reporting, and business intelligence.
* **Purpose:** To provide high-performance, easy-to-query data for end-users.
* **Key Activities:**
* **Analytics Modeling:** Data from the Cleaned Layer is transformed into user-friendly models, such as **Fact and Dimension tables** (star schemas).
* **Aggregation:** Key business metrics and KPIs are pre-calculated to accelerate queries.
* **Consumption:** This layer feeds dashboards, reports, and analytical tools. It is often loaded into a dedicated Data Warehouse for optimal performance.
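
Pre-aggregation means computing a grouped metric once so that dashboards read a small table instead of scanning facts. A minimal Python sketch of an `agg_monthly_revenue_by_region`-style model, assuming hypothetical `fct_orders` rows with `order_date`, `region`, and `revenue` fields:

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal


def agg_monthly_revenue_by_region(fct_orders):
    """Pre-aggregate order facts into one row per (month, region)."""
    totals = defaultdict(Decimal)  # Decimal() starts each bucket at 0
    for row in fct_orders:
        month = row["order_date"].replace(day=1)  # bucket by first day of month
        totals[(month, row["region"])] += row["revenue"]
    # Sorted output keeps the serving table deterministic for consumers.
    return [
        {"month": m, "region": r, "revenue": v}
        for (m, r), v in sorted(totals.items())
    ]
```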
---
## Model Naming Conventions
A consistent naming convention helps us understand a model's purpose at a glance.
### Guiding Principles
1. **Be Explicit:** Names should clearly state the layer, source, and entity.
2. **Be Consistent:** Use the same patterns and abbreviations everywhere.
3. **Use Prefixes:** Start filenames and model names with the layer to group them logically.
### Layer-by-Layer Naming Scheme
#### 1. Raw / Sources Layer
This layer is for defining sources, not models. The convention is to name the source after the system it comes from.
* **Source Name:** `[source_system]` (e.g., `salesforce`, `google_ads`)
* **Table Name:** `[original_table_name]` (e.g., `account`, `ads_performance`)
#### 2. Staging Layer
Staging models have a 1:1 relationship with a source table.
* **Pattern:** `stg_[source_system]__[entity_name]`
* **Examples:**
* `stg_stripe__charges.sql`
* `stg_google_ads__campaigns.sql`
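
The double underscore keeps the pattern unambiguous when either part contains single underscores, as in `google_ads`. A small Python parser illustrates this; restricting names to lower-case snake_case is an assumption, and `parse_staging_name` is a hypothetical helper.

```python
import re

# Source and entity may each contain single underscores; only "__" separates them.
STG_PATTERN = re.compile(
    r"^stg_(?P<source>[a-z0-9]+(?:_[a-z0-9]+)*)__(?P<entity>[a-z0-9]+(?:_[a-z0-9]+)*)$"
)


def parse_staging_name(model_name: str) -> tuple:
    """Split a staging model name into (source_system, entity_name)."""
    stem = model_name.removesuffix(".sql")
    match = STG_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"not a staging model name: {model_name!r}")
    return match["source"], match["entity"]
```

With a single separator, `stg_google_ads_campaigns` could split as `google` / `ads_campaigns` or as `google_ads` / `campaigns`; the double underscore removes that ambiguity.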
#### 3. Cleaned Layer
This is the integration layer for building unified business entities or a Data Vault.
* **Pattern (Integrated Entity):** `cln_[entity_name]`
* **Pattern (Data Vault):** `cln_[vault_component]_[entity_name]`, where `[vault_component]` is the Data Vault object type (`hub`, `link`, or `sat`)
* **Examples:**
* `cln_customers.sql`
* `cln_hub_customers.sql`
* `cln_sat_customer_details.sql`
#### 4. Serving Layer
This layer contains business-friendly models for consumption.
* **Pattern (Dimension):** `dim_[entity_name]`
* **Pattern (Fact):** `fct_[business_process]`
* **Pattern (Aggregate):** `agg_[aggregation_description]`
* **Examples:**
* `dim_customers.sql`
* `fct_orders.sql`
* `agg_monthly_revenue_by_region.sql`
### Summary Table
| Layer | Purpose | Filename / Model Name Example | Notes |
| :------ | :---------------------- | :---------------------------------------- | :---------------------------------------------- |
| Raw | Source Declaration | `sources.yml` (for `stripe`, `charges`) | No models, just declarations. |
| Staging | Basic Cleansing & Typing | `stg_stripe__charges.sql` | 1:1 with source tables. |
| Cleaned | Integration & Core Models | `cln_customers.sql` or `cln_hub_customers.sql` | Integrates sources. Your Data Vault lives here. |
| Serving | Analytics & BI | `dim_customers.sql` or `fct_orders.sql` | Business-facing, optimized for queries. |
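
The conventions in the table are regular enough to check automatically, for example in CI. A minimal Python sketch, assuming lower-case snake_case segments and `.sql` filenames; `classify_model` and `LAYER_PATTERNS` are hypothetical names.

```python
import re
from typing import Optional

# Lower-case snake_case segment (an assumption; the conventions don't fix a character set).
SEGMENT = r"[a-z0-9]+(?:_[a-z0-9]+)*"

# One regex per model layer, transcribed from the naming scheme.
LAYER_PATTERNS = {
    "staging": re.compile(rf"^stg_{SEGMENT}__{SEGMENT}$"),
    "cleaned": re.compile(rf"^cln_{SEGMENT}$"),
    "serving": re.compile(rf"^(?:dim|fct|agg)_{SEGMENT}$"),
}


def classify_model(filename: str) -> Optional[str]:
    """Return the layer a model file belongs to, or None if it breaks the conventions."""
    stem = filename.removesuffix(".sql")
    for layer, pattern in LAYER_PATTERNS.items():
        if pattern.match(stem):
            return layer
    return None
```

Raw sources are deliberately absent because they are declared in `sources.yml`, not built as models. The `cln_` pattern also accepts Data Vault names such as `cln_hub_customers`, since the vault component is simply the first segment.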

@@ -64,7 +64,7 @@ serving/ ← pre-aggregated for web app
 **seeds/** — Static lookup tables (commodity codes, attribute codes, unit of measure) loaded from `seeds/*.csv`. Referenced by staging.
-**foundation/** — All other sources (prices, COT, ICE): reads landing CSVs directly via glob macros, casts types, deduplicates. Uses INCREMENTAL_BY_TIME_RANGE. Also holds `dim_commodity` (the cross-source identity mapping).
+**foundation/** — All other sources (prices, COT, ICE): reads landing data (e.g. CSVs) directly via glob macros, casts types, deduplicates. Uses INCREMENTAL_BY_TIME_RANGE. Also holds `dim_commodity` (the cross-source identity mapping).
 **serving/** — Analytics-ready aggregates consumed by the web app via `analytics.duckdb`. Pre-computes moving averages, COT indices, MoM changes. These are the only tables the web app reads.