add untracked

This commit is contained in:
Deeman
2026-02-26 02:44:48 +01:00
parent 3629783bbf
commit 302ba07851
6 changed files with 2089 additions and 104 deletions

@@ -0,0 +1,639 @@
# BeanFlows — Strategic Analysis
> Coffee commodity intelligence platform: USDA fundamentals + CFTC positioning + AIS physical flows → single clean API for trading desks.
---
## 1. Jobs-to-Be-Done Analysis
### Primary Job Statement
```
When I need to form a view on the coffee market before committing capital,
I want to quickly see the full fundamental picture — supply, demand,
positioning, and physical flows — in one place I can trust,
so I can make high-conviction trading decisions faster than
the other side of my trade.
```
**Altitude check:** This is the right level. Not too abstract ("be a profitable trader") and not a task ("download the WASDE PDF"). This job exists independently of any product.
### The Three Job Layers
**Functional Job:**
> "Get clean, normalized, query-ready coffee fundamental data into my models within minutes of release — not hours of manual wrangling."
**Emotional Job:**
> "Feel confident that my market view is built on complete, accurate data — that I'm not missing a signal my competitor caught."
**Social Job:**
> "Be the analyst on the desk who always has the numbers ready first. Be seen as rigorous and well-sourced by portfolio managers and senior traders."
**Key insight for BeanFlows:** The emotional and social jobs here are enormous. Trading is a status game. The analyst who pulls up a clean, instant view of USDA revisions while a competitor is still reformatting spreadsheets *looks competent to their PM*. That feeling of preparedness and speed is worth paying for even when the underlying data is technically public. You're not selling data — you're selling the feeling of being the best-informed person in the room.
### Struggling Moments
**Struggling Moment 1 — The WASDE Drop**
```
A junior coffee analyst at a trading house was trying to update their
supply/demand model when the USDA released the monthly WASDE report,
causing 30-45 minutes of frantic copy-pasting and reformatting into Excel,
making them realize their manual pipeline was too slow to inform
the desk's immediate trading response.
```
**Struggling Moment 2 — The Position Puzzle**
```
A portfolio manager at a commodity hedge fund was trying to understand
whether speculative positioning in coffee had become crowded when the
weekly CFTC COT report came out in a different format than expected,
causing their Python parsing script to break and miss the signal,
making them realize stitching together CFTC + USDA + their own models
was a fragile, high-risk process.
```
**Struggling Moment 3 — The Invisible Cargo**
```
A physical coffee trader was trying to assess whether Brazilian exports
were running ahead or behind seasonal norms when conflicting port
reports and shipping data made the picture unclear, causing uncertainty
about whether to hedge their forward book, making them realize they
had no reliable, real-time view of actual physical flows.
```
**Struggling Moment 4 — The New Hire**
```
A newly hired analyst at a commodity fund was trying to get up to speed
on coffee market fundamentals when they discovered the desk's "data
infrastructure" was a folder of brittle scripts written by someone
who left 18 months ago, causing two weeks of reverse-engineering
instead of analysis, making them realize there was no institutional
data layer for coffee.
```
**Signal strength:** Struggling Moments 1 and 2 validate V1 (USDA + CFTC cleanup). Struggling Moment 3 validates the AIS roadmap. Struggling Moment 4 validates the "whole product" play — becoming the institutional data layer that survives employee turnover.
### Four Forces of Switching
```
DRIVING SWITCH RESISTING SWITCH
┌─────────────────────────────┐ ┌─────────────────────────────┐
│ PUSH (current pain) │ │ ANXIETY │
│ │ │ │
│ • WASDE drops break my │ │ • "What if the data has an │
│ workflow every month │ │ error and I trade on it?"│
│ • CFTC data requires hours │ │ • "What if this startup │
│ of reformatting │ │ disappears in 6 months?" │
│ • Internal scripts are │ │ • "Can I trust a one-person │
│ fragile, undocumented │ │ shop with my models?" │
│ • No visibility on physical │ │ • "What if pricing changes │
│ flows without paying │ │ after we're locked in?" │
│ $100K+ for Kpler/Bloomberg│ │ │
│ │ ├─────────────────────────────┤
├─────────────────────────────┤ │ HABIT │
│ PULL (BeanFlows promise) │ │ │
│ │ │ • "I've already built my │
│ • One API call = complete │ │ own scripts for this" │
│ fundamental picture │ │ • "My Excel models reference│
│ • Data ready in minutes, │ │ specific file formats" │
│ not hours after release │ │ • "Bloomberg is expensive │
│ • AIS shipping data at a │ │ but it's the standard" │
│ fraction of Kpler's price │ │ • "Switching cost of re- │
│ • Coffee-specific models │ │ piping my entire data │
│ and normalization │ │ stack feels high" │
└─────────────────────────────┘ └─────────────────────────────┘
```
**Analysis: Push is strong, Pull is strong, but Anxiety is VERY high.**
This is the defining challenge of DaaS in trading. One bad data point in a model that drives a $5M position = catastrophic. Your go-to-market must center on anxiety reduction, not feature selling.
### Anxiety Reduction Playbook (Critical for BeanFlows)
| Anxiety | Mitigation | Priority |
|---------|-----------|----------|
| "Data might have errors" | Publish methodology docs. Show data lineage for every field. Offer a "compare to source" view so they can audit. Run automated quality checks and publish accuracy scores. | **P0 — must have at launch** |
| "Startup might disappear" | Offer annual billing with data export guarantees. Open-source the schema. Publish your roadmap. Be transparent about financials if possible. | P1 |
| "Can't trust a small shop" | Pilot program with refund guarantee. Named customer testimonials (even 1-2 early). Published SLAs for uptime and data freshness. | P1 |
| "Switching cost is high" | Offer multiple delivery formats (JSON, CSV, Parquet, direct DB connection). Build Excel add-in. Match Bloomberg field naming conventions where possible. | P2 |
**The single most important page on your website isn't pricing — it's your data methodology page.** Traders will read it. If it's thorough and transparent, they'll trust you. If it's missing, they won't.
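The "compare to source" view and the published accuracy scores could rest on a reconciliation check along the lines of the sketch below. It assumes both the cleaned dataset and a fresh parse of the source can be expressed as mappings from a record key to a numeric value. The key layout, the tolerance, and the sample figures are illustrative placeholders, not real USDA values.

```python
from dataclasses import dataclass


@dataclass
class ReconciliationReport:
    checked: int       # number of source records compared
    mismatches: list   # keys whose cleaned value is missing or differs
    accuracy: float    # share of source records reproduced within tolerance


def reconcile(source: dict, cleaned: dict, rel_tol: float = 1e-9) -> ReconciliationReport:
    """Compare cleaned values against the source of truth, key by key."""
    mismatches = []
    for key, src_value in source.items():
        got = cleaned.get(key)
        # A missing record counts as an error, the same as a wrong value.
        if got is None or abs(got - src_value) > rel_tol * max(abs(src_value), 1.0):
            mismatches.append(key)
    checked = len(source)
    accuracy = 1.0 if checked == 0 else (checked - len(mismatches)) / checked
    return ReconciliationReport(checked, mismatches, accuracy)


# Placeholder records: (country, marketing year, attribute) -> value.
source = {
    ("BR", 2024, "production"): 100.0,
    ("BR", 2024, "exports"): 40.0,
    ("VN", 2024, "production"): 30.0,
    ("VN", 2024, "exports"): 25.0,
}
cleaned = dict(source)
cleaned[("VN", 2024, "exports")] = 52.0  # simulated digit-transposition error

report = reconcile(source, cleaned)
print(f"accuracy={report.accuracy:.2%}, mismatches={report.mismatches}")
```

Publishing `accuracy` per source and per release, together with the list of mismatched keys, is one way to make the methodology page auditable rather than a bare claim.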
### Habit Reduction Playbook
| Habit | Bridge Strategy |
|-------|----------------|
| "I have my own scripts" | Offer a migration guide: "Currently pulling WASDE manually? Here's how to replace your pipeline with one API call." Show the before/after. |
| "My models expect specific formats" | Support CSV, JSON, Parquet. Offer a "Bloomberg-compatible" field mapping. Let them request custom column naming. |
| "Bloomberg is the default" | Don't fight Bloomberg head-on. Position as complementary: "Bloomberg for broad markets, BeanFlows for coffee depth." Many desks already supplement Bloomberg. |
### JTBD Competitive Map
```
SERVES FUNCTIONAL JOB WELL
OVERSERVED | WELL-SERVED
Bloomberg, | Kpler (oil/gas focus,
Refinitiv | coffee = afterthought)
(everything but |
coffee-specific) |
←───────────────────────┼───────────────────────→
DOESN'T SERVE | SERVES
EMOTIONAL/SOCIAL | EMOTIONAL/SOCIAL
|
UNDERSERVED | ★ BEANFLOWS TARGET ★
(no affordable | "Functional enough for V1,
coffee-specific | nails the emotional job
data solution) | of speed + confidence"
DOESN'T SERVE FUNCTIONAL JOB
```
**BeanFlows starts in the bottom-right quadrant** — you won't match Bloomberg's breadth, but you'll serve the emotional job (speed, confidence, looking sharp) better for coffee-specific work. As you add AIS data, you move up toward "well-served" on functional while keeping the emotional advantage.
### Job Canvas — Summary
```
┌──────────────────────────────────────────────────────────────────────┐
│ JOB CANVAS — BeanFlows │
├──────────────────────────────────────────────────────────────────────┤
│ TARGET CUSTOMER: Commodity analysts and traders at hedge funds, │
│ trading houses, and physical coffee companies who need to form │
│ market views quickly when government data drops. │
│ │
│ CORE JOB: When I need to form a view on the coffee market before │
│ committing capital, I want to see the full fundamental picture in │
│ one place I trust, so I can make high-conviction decisions faster │
│ than competitors. │
│ │
│ FUNCTIONAL: Get clean, normalized, query-ready coffee data into │
│ my models within minutes of release. │
│ EMOTIONAL: Feel confident I'm not missing signals. Feel prepared. │
│ SOCIAL: Be the analyst who always has the numbers first. │
│ │
│ STRUGGLING MOMENT: WASDE/COT report drops and the analyst's │
│ manual pipeline breaks or takes 30-45 min to update. │
│ │
│ CURRENT SOLUTIONS: │
│ • Bloomberg Terminal — hired for breadth, fired for coffee depth │
│ and $24K/yr/seat cost │
│ • Internal scripts — hired for customization, fired because fragile, │
│ undocumented, breaks on format changes │
│ • Manual Excel work — hired because "free," fired because slow and │
│ error-prone, makes analyst look behind │
│ • Kpler — hired for cargo intelligence, fired because coffee is a │
│ secondary commodity for them, pricing starts at enterprise level │
│ • Doing nothing — because "we've always done it this way" │
│ │
│ FORCES: │
│ Push [HIGH — fragile pipelines, time waste, missed signals] │
│ Pull [HIGH — one API, instant access, coffee-specific] │
│ Anxiety [VERY HIGH — data accuracy, startup risk, switching cost] │
│ Habit [MEDIUM — existing scripts, Bloomberg inertia] │
│ │
│ KEY INSIGHT: The job is never "I need data." The job is "I need to │
│ make a $10M decision with confidence in 30 minutes." Anxiety about │
│ data accuracy is the #1 blocker to adoption — more than price, │
│ more than features. Trust is the product. │
│ │
│ → PRODUCT: Start with USDA + CFTC via clean API. Add AIS for │
│ physical flow intelligence. Publish data lineage for every field. │
│ → MARKETING: Target the struggling moment. "WASDE drops in 10 │
│ minutes. Is your pipeline ready?" Show before/after. │
│ → PRICING: Anchor to Bloomberg ($24K/yr) and time saved (8-10 │
│ hrs/mo × $100/hr = $9.6-12K/yr). Price at $6-18K/yr feels like a │
│ bargain relative to both. │
└──────────────────────────────────────────────────────────────────────┘
```
---
## 2. Lean Canvas
```
┌─────────────────────┬──────────────────────┬─────────────────────┐
│ 2. PROBLEM │ 4. SOLUTION │ 1. CUSTOMER │
│ │ │ SEGMENTS │
│ P1: Coffee │ S1: Single API for │ │
│ fundamental data │ all USDA coffee │ EARLY ADOPTERS: │
│ (USDA, CFTC) is │ supply/demand + │ Junior-to-mid │
│ fragmented across │ CFTC positioning │ coffee/softs │
│ formats, painful │ data, cleaned and │ analysts at: │
│ to normalize │ normalized │ • Commodity hedge │
│ │ │ funds (50-200 │
│ P2: Internal data │ S2: AIS-based │ employees) │
│ pipelines are │ physical coffee │ • Physical trading │
│ fragile, break on │ flow tracking │ houses │
│ format changes, │ (Brazil, Vietnam, │ • Coffee hedging │
│ owned by one person │ Colombia → import │ desks at roasters │
│ who might leave │ ports) │ │
│ P3: No affordable │ │ Specifically: │
│ way to track │ S3: Data quality │ the analyst who │
│ physical coffee │ layer — lineage, │ currently maintains │
│ flows in real-time │ methodology docs, │ the desk's brittle │
│ │ accuracy scoring, │ data scripts and │
│ EXISTING │ source transparency │ hates it │
│ ALTERNATIVES: ├──────────────────────┤ │
│ • Bloomberg ($24K+) │ 3. UNIQUE VALUE PROP │ │
│ • Internal scripts │ │ │
│ • Manual Excel │ "The complete coffee │ │
│ • Kpler ($$$, │ fundamental data │ │
│ coffee is a │ stack — USDA, │ │
│ secondary focus) │ CFTC, and physical │ │
│ │ flows — in one clean │ │
│ │ API. Set up in │ │
│ │ minutes, not months."│ │
├─────────────────────┼──────────────────────┼─────────────────────┤
│ 8. KEY METRICS │ 5. CHANNELS │ 6. REVENUE STREAMS │
│ │ │ │
│ THE ONE METRIC: │ • Direct outreach │ Analyst: $499/mo │
│ # of desks with │ (LinkedIn, email │ (1 seat, USDA + │
│ BeanFlows piped │ to named analysts) │ CFTC, API access) │
│ into production │ • Coffee trading │ │
│ models (not trials │ conferences (ICO, │ Desk: $1,499/mo │
│ — production use) │ NCA, SCA events) │ (5 seats, + AIS │
│ │ • Weekly "BeanFlows │ flows, historical) │
│ Supporting: │ Coffee Data Brief" │ │
│ • API calls/day │ newsletter (free │ Enterprise: $3-5K/mo│
│ (engagement) │ content marketing) │ (unlimited seats, │
│ • Data freshness │ • Referrals from │ custom feeds, │
│ (latency to │ existing customers │ bulk export, │
│ source release) │ (tight community) │ priority support) │
│ • Error rate │ • Commodity data │ │
│ (trust metric) │ Twitter/X accounts │ MODEL: Annual │
│ │ and communities │ contracts preferred,│
│ │ │ monthly available │
├─────────────────────┼──────────────────────┼─────────────────────┤
│ 7. COST STRUCTURE │ 9. UNFAIR ADVANTAGE │
│ │ │
│ FIXED: │ TODAY: │
│ • Hetzner server: ~$50/mo │ • Capital efficiency│
│ • AIS data licensing: $500-2K/mo │ (Hetzner + DuckDB │
│ (once added) │ = near-zero │
│ • Domain, Paddle fees, tooling: ~$100/mo │ marginal cost) │
│ • Your time (biggest real cost) │ • Coffee-specific │
│ │ domain focus │
│ VARIABLE: │ │
│ • Support time per customer │ BUILDING TOWARD: │
│ • Data quality monitoring │ • Historical depth │
│ │ (time-series │
│ TOTAL: Can run for 12 months at <$3K/mo │ competitors can't │
│ with zero revenue. Very capital efficient. │ replicate) │
│ │ • AIS + fundamentals│
│ │ in one place │
│ │ (unique combo) │
│ │ • Workflow │
│ │ integration │
│ │ (switching costs) │
└────────────────────────────────────────────┴─────────────────────┘
```
### Lean Canvas — Key Assumptions to Test
| # | Assumption | Risk | Test |
|---|-----------|------|------|
| 1 | Coffee analysts spend 8-10+ hrs/mo on data wrangling | HIGH — if this is only 2 hrs, the pain isn't enough | Ask in first 5 demos: "Walk me through what happens when WASDE drops" |
| 2 | Trading desks will pay $500-1,500/mo for cleaned public data | HIGH — this is the core revenue assumption | Offer paid pilot at $299/mo with 3-month commitment. Credit card or PO = validated |
| 3 | You can reach 20+ decision-makers within 60 days | HIGH — if distribution is broken, nothing else matters | Track: outreach sent, responses received, demos booked. Need 10%+ response rate |
| 4 | AIS data can be acquired and licensed at viable margins | MEDIUM — licensing costs could eat margins | Get 3 AIS provider quotes before committing to the roadmap |
| 5 | Data accuracy will be high enough to maintain trust | CRITICAL — one error = lost customer forever | Build automated reconciliation against source. Publish accuracy scores |
---
## 3. Blue Ocean Strategy Canvas
### Competing Factors in Coffee Market Data
| Factor | Bloomberg | Internal Scripts | Manual Excel | Kpler | BeanFlows |
|--------|:---------:|:----------------:|:------------:|:-----:|:---------:|
| Breadth of data (commodities covered) | 5 | 1 | 1 | 4 | 1 |
| Coffee-specific depth | 2 | 3 | 2 | 2 | **5** |
| Data freshness / speed | 4 | 3 | 1 | 4 | **5** |
| API / programmatic access | 4 | 4 | 1 | 4 | **5** |
| Physical flow tracking | 2 | 0 | 0 | 5 | **4** (roadmap) |
| Setup time / ease of use | 2 | 1 | 4 | 2 | **5** |
| Price (inverted: 5=cheapest) | 1 | 5 | 5 | 1 | **4** |
| Data transparency / methodology | 2 | 1 | 1 | 3 | **5** |
| Maintenance burden on user | 2 | 1 | 1 | 3 | **5** |
| Historical time-series depth | 5 | 2 | 1 | 4 | 3 (growing) |
| Multi-asset analytics | 5 | 1 | 1 | 4 | 1 |
| Enterprise support / SLAs | 5 | 1 | 1 | 4 | 2 |
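To see at a glance where the curve diverges from the incumbent, the scores can be differenced directly. This is a small sketch using only the Bloomberg and BeanFlows columns from the table above. Note that the physical flow score of 4 is a roadmap figure, not a shipped capability.

```python
# Scores copied from the table above as (Bloomberg, BeanFlows).
scores = {
    "Breadth of data": (5, 1),
    "Coffee-specific depth": (2, 5),
    "Data freshness / speed": (4, 5),
    "API / programmatic access": (4, 5),
    "Physical flow tracking": (2, 4),  # BeanFlows value is roadmap, not shipped
    "Setup time / ease of use": (2, 5),
    "Price (inverted)": (1, 4),
    "Data transparency / methodology": (2, 5),
    "Maintenance burden on user": (2, 5),
    "Historical time-series depth": (5, 3),
    "Multi-asset analytics": (5, 1),
    "Enterprise support / SLAs": (5, 2),
}

# Positive gap: BeanFlows scores higher. Negative gap: Bloomberg scores higher.
gaps = {factor: beanflows - bloomberg for factor, (bloomberg, beanflows) in scores.items()}

leads = sorted((f for f, g in gaps.items() if g > 0), key=lambda f: -gaps[f])
trails = sorted((f for f, g in gaps.items() if g < 0), key=lambda f: gaps[f])

print("BeanFlows leads on:")
for factor in leads:
    print(f"  {factor}: +{gaps[factor]}")
print("BeanFlows trails on:")
for factor in trails:
    print(f"  {factor}: {gaps[factor]}")
```

The four factors where BeanFlows trails are the ones the Four Actions Framework deliberately eliminates or reduces.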
### Four Actions Framework
**ELIMINATE:**
- Multi-commodity breadth — don't try to cover 40 commodities. Coffee only.
- Enterprise sales theater — no 6-month RFP processes, no custom SOWs for V1
- Complex UI/dashboard features — lead with API, not a Bloomberg-clone interface
**REDUCE:**
- Enterprise support overhead — async support, documentation-first
- Feature count — fewer things, done perfectly. API + basic dashboard + data docs
- Historical depth initially — start with 5 years, build toward 20+
**RAISE:**
- Coffee-specific depth — every USDA table, every CFTC category, origin-level granularity
- Data freshness — minutes after source release, not hours
- Data transparency — full methodology docs, source lineage, accuracy scores
- Setup time — from first API call to data in their model in under 30 minutes
- Maintenance burden reduction — they never worry about format changes again
**CREATE:**
- Combined fundamentals + positioning + physical flows for coffee (nobody does this)
- "Data quality score" — transparent accuracy metrics per field, per source
- WASDE alert system — instant notification + pre-formatted data on release
- Migration guides from Bloomberg/manual workflows
- Coffee-specific data models (origin-level S&D, arabica vs. robusta splits)
### The BeanFlows Value Curve
```
Score (0-5)               BeanFlows   Bloomberg
Breadth of data           ★           ·····
Coffee-specific depth     ★★★★★       ··
Data freshness            ★★★★★       ····
API access                ★★★★★       ····
Physical flow (roadmap)   ★★★★        ··
Ease of setup             ★★★★★       ··
Price (inverted)          ★★★★        ·
Transparency              ★★★★★       ··
Maintenance burden        ★★★★★       ··
Historical depth          ★★★         ·····
Multi-asset analytics     ★           ·····
Enterprise support        ★★          ·····
★ = BeanFlows · = Bloomberg (Kpler and internal scripts omitted for clarity)
```
**Positioning statement:**
> "Unlike Bloomberg which covers everything broadly, or internal scripts which break constantly, BeanFlows is the complete coffee data stack — fundamentals, positioning, and physical flows in one trusted API. Set up in minutes, always current, never breaks."
---
## 4. Wardley Map
### Value Chain — Coffee Trading Intelligence
```
Genesis Custom Product Commodity
(novel) (bespoke) (off-shelf) (utility)
│ │ │ │
VISIBLE User Need: │ │ │ │
(to user) "Make │ │ │ │
profitable │ │ │ │
coffee │ │ │ │
trades" ────┤ │ │ │
│ │ │ │
Trading ────┤ │ │ │
Decision │ │ │ │
Support │ │ │ │
│ │ │ │
Coffee- │ │ │ │
Specific ────┼──────────────┤ │ │
Intelligence │ ★ BUILD │ │ │
Layer │ HERE │ │ │
│ │ │ │
AIS Coffee ──┤ │ │ │
Flow ───┤ ★ BUILD │ │ │
Tracking │ HERE │ │ │
│ │ │ │
USDA/CFTC │ │ │ │
Data ────────┼──────────────┼──────────────┤ │
Aggregation │ │ ★ BUILD │ │
& Cleaning │ │ (fast, │ │
│ │ before │ │
│ │ commodit.) │ │
│ │ │ │
INVISIBLE API Layer ───┼──────────────┼──────────────┤ │
(REST/ │ │ │ │
GraphQL) │ │ │ │
│ │ │ │
DuckDB / │ │ │ │
SQLMesh ────┼──────────────┼──────────────┤ │
(transforms) │ │ │ │
│ │ │ │
Auth / │ │ │ │
Billing ────┼──────────────┼──────────────┼──────────────┤
(Paddle) │ │ │ USE (utility)│
│ │ │ │
Cloud │ │ │ │
Hosting ────┼──────────────┼──────────────┼──────────────┤
(Hetzner) │ │ │ USE (utility)│
│ │ │ │
Internet ────┼──────────────┼──────────────┼──────────────┤
│ │ │ USE (utility)│
```
### Strategic Reads from the Map
**1. USDA/CFTC aggregation is moving toward commodity.**
This is your V1, but it's not defensible long-term. Someone else can clean USDA data. The value here is speed-to-market and execution quality, not novelty. You must move up the value chain before this component commoditizes.
**Timeline pressure:** You have 12-18 months before a motivated competitor or an intern at a trading house replicates the basic USDA/CFTC cleanup. Use this window to add AIS and build historical depth.
**2. AIS coffee flow tracking is still genesis/custom.**
Nobody is doing coffee-specific physical flow intelligence well. Kpler does it for oil/gas/LNG. This is where your moat lives. Building this before anyone else gives you a time advantage that compounds (historical flow data can't be recreated retroactively).
**3. The intelligence layer is where long-term value lives.**
Raw data (even clean raw data) trends toward commodity. The strategic play is to climb from "data aggregation" to "coffee-specific intelligence":
```
DATA AGGREGATION (V1)
DATA + PHYSICAL FLOWS (V2) ← You are planning this
INTELLIGENCE LAYER (V3) ← This is where $100M ARR lives
• Anomaly detection (unusual flow patterns)
• Supply disruption early warnings
• Seasonal pattern analysis
• Cross-reference signals (positioning vs. physical flows)
• Predictive models (not price prediction — flow/supply prediction)
```
**4. Build vs. Buy decisions from the map:**
| Component | Decision | Reasoning |
|-----------|----------|-----------|
| Cloud hosting | BUY (Hetzner) | Commodity. Never build your own. |
| Auth/billing | BUY (Paddle) | Commodity. Don't waste time here. |
| Data transforms | BUILD (DuckDB + SQLMesh) | Product-stage but your core competency. Own this. |
| USDA/CFTC ingestion | BUILD (but fast) | Moving toward commodity. Build it quickly, move on. |
| AIS data | BUY raw + BUILD processing | Buy the raw AIS feed, build the coffee-specific intelligence on top. |
| Dashboard/UI | BUILD (minimal) | Keep lightweight (HTMX). The API is the product. |
| Coffee-specific ML/analytics | BUILD (future) | This is genesis. This is where your long-term moat lives. |
---
## 5. Demand-Side Sales — How Coffee Analysts Buy
### The Buying Timeline for BeanFlows
| Stage | Duration | What the buyer is thinking | Your move |
|-------|----------|----------------------------|-----------|
| PASSIVE LOOKING | 3-12 months | "Ugh, my WASDE script broke again. There has to be a better way..." | Content that names their pain. "The Hidden Cost of Manual Coffee Data Pipelines" blog post. Weekly data brief newsletter. Conference talks. |
| ACTIVE LOOKING | 2-6 weeks | "What's out there for coffee data? Let me look around." | Be findable. SEO for "coffee market data API", "USDA coffee data feed". Direct outreach with a specific struggling moment hook. |
| DECIDING | 1-4 weeks | "OK, BeanFlows vs. Bloomberg data vs. our internal stuff. Is it accurate?" | Methodology docs. Pilot program. "Try it free for 2 weeks with your actual data stack." Named reference customers. Refund guarantee. |
| CONSUMING | ongoing | "Is this actually better than what I had before?" | Fast onboarding. Quick wins in Week 1. "Your model is now auto-updating" moment. Celebrate their time saved. |
**Critical insight:** The buying cycle in commodity trading is **relationship-driven and trust-heavy**. A cold landing page won't close a $500+/mo deal with a trading desk. The sales motion is:
1. **Content → Credibility** (newsletter, conference presence, Twitter/X)
2. **Warm intro or direct outreach → Demo**
3. **Demo → Pilot (free or reduced rate)**
4. **Pilot → Production use → Annual contract**
This is a 2-4 month cycle for your first 5 customers, shortening to 2-4 weeks via referrals after that.
### Demand-Side Pricing Anchors
| Anchor | Value | BeanFlows Price Position |
|--------|-------|--------------------------|
| Bloomberg Terminal | $24,000/yr/seat | BeanFlows at $6-18K/yr is a fraction — and deeper on coffee |
| Analyst time wasted | 8-10 hrs/mo × $100-150/hr = $9.6-18K/yr | BeanFlows pays for itself in time saved alone |
| Kpler subscription | $50-100K+/yr for enterprise | BeanFlows AIS for coffee at $18-36K/yr is a fraction |
| Cost of one bad trade from stale data | $50K-$500K+ | Insurance framing: "What's one missed signal worth?" |
| Cost of building internally | 1 engineer × 3 months = $50-75K + ongoing maintenance | BeanFlows at $18K/yr is 64-76% cheaper than the initial build alone, with zero maintenance |
**Pricing confidence:** At $499-1,499/mo, BeanFlows is a rounding error for any desk that manages $10M+ in coffee positions. The price objection won't be "too expensive" — it'll be "can I trust it?"
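The anchor arithmetic is easy to get slightly wrong, so here is a minimal check using only the figures quoted in this section and the tier prices from the Lean Canvas. The low end of the time-saved range works out to 8 × $100 × 12 = $9,600 per year.

```python
MONTHS = 12


def annualize(per_month: float) -> float:
    """Convert a monthly dollar amount to an annual one."""
    return per_month * MONTHS


# Analyst time anchor: 8-10 hrs/mo at $100-150/hr.
time_saved_low = annualize(8 * 100)
time_saved_high = annualize(10 * 150)

# Tier list prices from the Revenue Streams box.
analyst_tier = annualize(499)
desk_tier = annualize(1499)

bloomberg_seat = 24_000                 # $/yr/seat, as quoted above
build_low, build_high = 50_000, 75_000  # one engineer for three months

# Saving of the Desk tier against the initial internal build only
# (ongoing maintenance would widen the gap further).
saving_vs_build_low = 1 - desk_tier / build_low
saving_vs_build_high = 1 - desk_tier / build_high

print(f"Time saved:   ${time_saved_low:,.0f}-${time_saved_high:,.0f}/yr")
print(f"Analyst tier: ${analyst_tier:,.0f}/yr, Desk tier: ${desk_tier:,.0f}/yr")
print(f"Desk vs. Bloomberg seat: {desk_tier / bloomberg_seat:.0%} of the cost")
print(f"Desk vs. internal build: {saving_vs_build_low:.0%}-{saving_vs_build_high:.0%} cheaper")
```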
---
## 6. Crossing the Chasm — Beachhead Strategy
### The Beachhead Segment
**Don't target:** "Commodity traders" (too broad)
**Don't target:** "Coffee market participants" (still too broad)
**Target:** Quantitative commodity analysts at mid-size hedge funds ($200M-$2B AUM) that trade soft commodities, have 2-5 people on the softs desk, and currently maintain internal data scripts for USDA/CFTC data.
**Why this beachhead:**
- They have the pain (maintaining data scripts isn't their job, but they're stuck doing it)
- They have the budget ($500-1,500/mo is trivial relative to AUM)
- They're technically sophisticated enough to value an API (vs. a dashboard-first buyer)
- They talk to each other (commodity analyst community is small and tight)
- They can make purchasing decisions without a 6-month procurement process
- Winning 10-15 of these funds = credible reference base for expanding to larger shops and physical traders
### Bowling Pin Sequence
```
Pin 1: Quant analysts at mid-size commodity hedge funds (softs focus)
↓ (referrals within the community)
Pin 2: Fundamental analysts at larger multi-strat hedge funds with softs exposure
↓ (credibility established)
Pin 3: Risk/hedging desks at physical coffee trading houses (Volcafe, Sucafina, etc.)
↓ (AIS data becomes the hook)
Pin 4: Hedging desks at large coffee roasters (Nestlé, JDE Peet's, Lavazza)
↓ (enterprise contracts, higher ACV)
Pin 5: Expand to cocoa, sugar, other soft commodities
```
### Whole Product for the Beachhead
For Pin 1 (quant analysts at mid-size hedge funds), the whole product is:
| Component | Status | Notes |
|-----------|--------|-------|
| Clean USDA coffee data via API | BUILD (V1) | Core product |
| Clean CFTC positioning via API | BUILD (V1) | Core product |
| Python client library | BUILD (V1) | `pip install beanflows` — critical for this segment |
| Data methodology documentation | BUILD (V1) | Trust = the product. Non-negotiable. |
| Example Jupyter notebooks | BUILD (V1) | Show how to pipe data into common model frameworks |
| Slack/email support (responsive) | YOU (V1) | Personal touch matters early. Be fast. |
| AIS physical flow data | BUILD (V2) | Differentiator that locks in the segment |
| Historical backfill (5+ years) | BUILD (ongoing) | Compounds over time. Start building day 1. |
| Excel add-in | BUILD (V3) | For the non-Python users on the desk |
| Community (Slack/Discord) | CONSIDER (V2) | Small enough community that this could be powerful |
**The "whole product" for V1 is: API + Python library + methodology docs + example notebooks + responsive support.** That's enough to win the beachhead segment. Everything else comes after you have 5-10 paying customers.
---
## 7. Synthesis — Strategic Roadmap
### Phase 1: Prove It (Month 1-3) — Target: 5 Paying Customers
**Goal:** Validate that coffee trading desks will pay for cleaned fundamental data.
- Ship V1: USDA + CFTC data via clean REST API
- Ship Python client (`pip install beanflows`)
- Publish data methodology docs (your trust moat)
- Direct outreach to 30+ named analysts at mid-size commodity funds
- Offer 2-week free pilot → $499/mo Analyst tier
- Success metric: 5 desks with BeanFlows in production models
**Key risk to test:** Can you reach and close these buyers without a warm network?
### Phase 2: Differentiate (Month 4-8) — Target: $15K MRR
**Goal:** Add AIS data to create a moat that cleaned USDA data alone can't provide.
- Secure AIS data licensing
- Build coffee-specific vessel tracking (origin ports → destination ports)
- Launch Desk tier ($1,499/mo) with AIS + historical data
- Upgrade existing customers, acquire new ones on the strength of AIS
- Publish weekly "BeanFlows Coffee Data Brief" (content marketing + credibility)
- Attend 1-2 commodity trading conferences for face-to-face relationship building
- Success metric: 10-15 customers, $15K+ MRR, 2+ customers on Desk tier
**Key risk to test:** Does AIS data for coffee justify 3x pricing? Will customers upgrade?
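A quick sanity check on the Phase 2 target: the sketch below works out which tier mix reaches $15K MRR at the list prices from the Lean Canvas, assuming no Enterprise customers and no discounts. At those prices, two Desk customers out of 15 would fall well short, so the upgrade rate matters more than the "2+ on Desk tier" floor suggests.

```python
ANALYST, DESK = 499, 1499  # monthly list prices from the Revenue Streams box


def mrr(analyst_customers: int, desk_customers: int) -> int:
    """Monthly recurring revenue for a given tier mix, at list price."""
    return analyst_customers * ANALYST + desk_customers * DESK


def min_desk_customers(total_customers: int, target_mrr: int):
    """Smallest number of Desk customers (the rest on Analyst) that hits the target.

    Returns None if even an all-Desk customer base falls short.
    """
    for desk in range(total_customers + 1):
        if mrr(total_customers - desk, desk) >= target_mrr:
            return desk
    return None


print(mrr(13, 2))                      # 15 customers, 2 on Desk
print(min_desk_customers(15, 15_000))  # Desk customers needed with 15 total
print(min_desk_customers(10, 15_000))  # None: 10 customers cannot reach the target
```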
### Phase 3: Dominate Coffee (Month 9-18) — Target: $50K MRR
**Goal:** Become the default coffee data infrastructure for the beachhead segment.
- Build intelligence layer (anomaly detection, seasonal analysis, signal cross-referencing)
- Add Excel add-in for non-API users
- Expand to larger multi-strat funds and physical trading houses (Pins 2-3 in the bowling pin sequence)
- Build historical depth (every month of data you accumulate = moat deepening)
- Consider Enterprise tier ($3-5K/mo) for larger shops
- Success metric: 25-35 customers, $50K+ MRR, <5% monthly churn, 120%+ NRR
### Phase 4: Expand (Month 18+) — Target: Path to $100K+ MRR
**Goal:** Replicate the model for adjacent soft commodities.
- Add cocoa, then sugar, then other softs
- Cross-sell existing customers (most trade multiple softs)
- Consider acquiring niche data sources
- Build toward the Kpler playbook: commodity intelligence platform for soft commodities
- At this point: evaluate whether to take capital for faster M&A consolidation
### Critical Assumptions Log
| # | Assumption | Status | How to Test | Kill Criteria |
|---|-----------|--------|-------------|---------------|
| 1 | Analysts spend 8+ hrs/mo on coffee data wrangling | UNTESTED | Ask in first 5 demos | If <3 hrs, pain is insufficient |
| 2 | Mid-size commodity funds will pay $499+/mo | UNTESTED | Paid pilot offers | If 0 of first 10 prospects convert to paid |
| 3 | You can reach 20+ decision-makers in 60 days | UNTESTED | Track outreach metrics | If <5% response rate on 50+ outreaches |
| 4 | AIS data licensing is viable at your margins | UNTESTED | Get 3 provider quotes | If licensing alone exceeds $3K/mo |
| 5 | Data accuracy is high enough for trading decisions | UNTESTED | Automated reconciliation vs. source | If error rate exceeds 0.1% |
| 6 | AIS addition justifies 3x pricing increase | UNTESTED | Customer reaction in demos | If <30% of existing customers upgrade |
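The kill criteria are only useful if they are checked mechanically rather than argued about after the fact. Below is a minimal sketch of how the thresholds in the table could be encoded. The metric names are hypothetical labels chosen for this example, and the observed values are made-up inputs, not results.

```python
import operator

# name -> (comparison, threshold); a criterion fires when
# comparison(observed_value, threshold) is True.
KILL_CRITERIA = {
    "hours_wrangling_per_month": (operator.lt, 3),      # assumption 1
    "paid_conversions_of_first_10": (operator.eq, 0),   # assumption 2
    "outreach_response_rate": (operator.lt, 0.05),      # assumption 3 (needs 50+ outreaches)
    "ais_licensing_usd_per_month": (operator.gt, 3000), # assumption 4
    "error_rate": (operator.gt, 0.001),                 # assumption 5 (0.1%)
    "desk_upgrade_rate": (operator.lt, 0.30),           # assumption 6
}


def triggered(observed: dict) -> list:
    """Return the names of kill criteria tripped by the observed values."""
    return [
        name
        for name, (compare, threshold) in KILL_CRITERIA.items()
        if name in observed and compare(observed[name], threshold)
    ]


# Made-up observations for illustration only.
observed = {
    "hours_wrangling_per_month": 9,
    "paid_conversions_of_first_10": 2,
    "outreach_response_rate": 0.04,
    "ais_licensing_usd_per_month": 1500,
    "error_rate": 0.0005,
    "desk_upgrade_rate": 0.40,
}
print(triggered(observed))
```

Metrics that have not been measured yet are simply skipped, which keeps the UNTESTED rows from firing by default.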
---
## Key Strategic Insights
1. **Trust is the product, data is the delivery mechanism.** Your methodology docs, accuracy scores, and data lineage transparency aren't "nice to have" — they ARE the product for a trading audience. Budget 20% of your development time on trust infrastructure.
2. **The V1 moat is thin, and that's OK.** Cleaned USDA/CFTC data is replicable. Your moat in V1 is execution speed and being first with a coffee-specific offering. The real moat builds in V2 (AIS) and compounds in V3+ (historical depth + intelligence layer). You're racing to add layers before anyone copies V1.
3. **Distribution is your #1 existential risk.** The product can be perfect and it won't matter if you can't get 5 demos in the first month. Solve distribution before you polish features. If you don't have warm relationships in commodity trading, finding a way in (advisor, conference, content) is job #1.
4. **The Kpler playbook is your North Star, but be patient.** Kpler bootstrapped for 8 years. They started with one commodity flow type. They were cashflow positive in the first quarter. Copy their discipline: prove it on coffee, prove the economics, then expand deliberately.
5. **Sell the unfair advantage, not the data.** Nobody buys "clean data." They buy "I saw the Brazilian export surge 3 days before the market priced it in." Every piece of marketing, every demo, every conversation should be anchored to the trading decision the data enables, not the data itself.

research/market_overview.md Normal file

@@ -0,0 +1,248 @@
# Comprehensive Alternative Data Sources for Coffee Futures Trading Analytics
The coffee futures trading landscape extends far beyond basic price data, encompassing a rich ecosystem of alternative data spanning regulatory reports, maritime intelligence, satellite monitoring, weather analytics, production statistics, trade flows, and emerging data types. This comprehensive analysis identifies 150+ data sources across seven critical categories, providing traders with actionable intelligence from farm to futures contract.
**Core finding**: Free government and international organization sources provide robust baseline data (CFTC COT reports, Sentinel-2 satellite imagery, NOAA weather, UN Comtrade trade data, ICO statistics), while premium commercial platforms offer real-time intelligence and predictive analytics that justify their cost through speed and integration advantages. The optimal strategy combines free foundational data with selective premium services targeting specific informational edges.
## Commitment of Traders (COT) reports reveal positioning dynamics
The CFTC publishes free weekly COT reports every Friday at 3:30 PM Eastern (reflecting Tuesday positions) covering Coffee C futures (CFTC Code 083731). The official CFTC website and Public Reporting Environment provide legacy reports dating back to January 1986 and disaggregated reports from June 2006, accessible via web interface, downloadable CSV files, and REST API with no authentication required. **This represents the authoritative free source for trader positioning data**.
Third-party platforms significantly enhance usability. Barchart offers free interactive COT charts with historical visualization and multiple report types including proprietary COT Index calculations. Tradingster provides clean web interfaces for both legacy and disaggregated formats. For serious analysis, **COTbase stands out at $16.50/month**, delivering corrected historical data, API access, options-only data, and NinjaTrader 8 integration—features unavailable from free sources.
TradingView integrates COT data through multiple community scripts overlaying trader positioning directly on price charts, with basic access free and premium features from $12.95-$59.95/month. For programmatic access, the Python cot_reports library (open source, free) fetches data directly from CFTC, while Quandl/Nasdaq Data Link offers RESTful API access with a free tier (50 calls/day) and premium plans starting at $49.50/month for unlimited calls.
**Institutional recommendation**: Use free CFTC data via API for baseline positioning analysis, supplement with COTbase premium subscription for corrected historical analysis and advanced features. TradingView provides excellent integration for discretionary traders overlaying positioning on technical charts.
| Source | Type | Frequency | Coverage | Access | Cost |
|--------|------|-----------|----------|--------|------|
| CFTC Official | Legacy, Disaggregated, Supplemental COT | Weekly (Friday 3:30 PM ET) | Coffee C futures (083731) | Web, CSV, API | Free |
| CFTC Public Reporting Environment | All COT types, REST API | Weekly | Coffee C, customizable queries | API, CSV/JSON/XML | Free |
| Barchart | COT charts, COT Index | Weekly (Friday 3:00 PM CT) | Coffee (KC symbol) | Web interface, interactive charts | Free basic, Premium subscription |
| COTbase | Corrected data, options-only, API | Weekly | Coffee with historical depth | Web, API, NinjaTrader integration | $16.50/month |
| TradingView | COT indicators/scripts | Weekly | Coffee (KC) via multiple scripts | Platform indicators | Free basic, $12.95-$59.95/month premium |
| Quandl/Nasdaq Data Link | Historical COT (Legacy) | Weekly | Coffee futures (CFTC/KC) | REST API, Python/R packages | Free (50 calls/day), $49.50+/month |
| Python cot_reports library | All CFTC COT types | On-demand | Coffee included | Python library (pip install) | Free (open source) |
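
A COT Index of the kind mentioned above is commonly computed as a min-max percentile of the latest net position over a lookback window. The sketch below assumes that common definition (Barchart's proprietary calculation may differ), a hypothetical 156-week default lookback, and made-up weekly non-commercial long/short figures.

```python
from typing import Sequence


def net_position(longs: int, shorts: int) -> int:
    """Net speculative position: long contracts minus short contracts."""
    return longs - shorts


def cot_index(net_positions: Sequence[float], lookback: int = 156) -> float:
    """Rank the latest net position within the lookback window's range (0-100).

    The 156-week (about three years) default is an illustrative assumption.
    """
    window = list(net_positions)[-lookback:]
    if not window:
        raise ValueError("net_positions must not be empty")
    lo, hi = min(window), max(window)
    if hi == lo:
        return 50.0  # flat window: treat as neutral
    return 100.0 * (window[-1] - lo) / (hi - lo)


# Hypothetical weekly (long, short) non-commercial positions, oldest first.
weekly = [
    (52_000, 30_000),
    (55_000, 28_000),
    (61_000, 25_000),
    (58_000, 27_000),
    (57_000, 29_000),
]
nets = [net_position(longs, shorts) for longs, shorts in weekly]
print(nets)
print(round(cot_index(nets, lookback=5), 1))
```

Readings near 100 suggest speculators are close to their most net-long level in the window, readings near 0 the opposite; the thresholds a desk acts on are a separate modelling choice.
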
## Maritime intelligence tracks physical coffee movements globally
AIS data and shipping intelligence provide leading indicators of supply movements before official trade statistics. **Kpler emerges as the premium institutional choice**, offering near real-time AIS tracking (\<1 minute latency via satellite), cargo flow analysis, and port call data specifically for agricultural commodities including coffee. Their platform integrates 13,000+ AIS receivers tracking 300,000 vessels daily, accessible via API, Python SDK, Excel plugin, and Snowflake integration (enterprise pricing, contact for quotes).
For free baseline vessel tracking, MarineTraffic and VesselFinder provide global coverage with real-time positions and historical AIS data back to 2009. Both offer free basic access with paid subscriptions for advanced features and API access. AISHub delivers completely free real-time AIS data via community-contributed receivers with JSON/XML/CSV API access requiring no authentication.
Bill of lading data proves critical for detailed cargo intelligence. **ImportGenius offers the most accessible entry point**, providing U.S. customs records updated daily with 18 years of historical data, covering 23+ countries including major coffee producers (Colombia, Vietnam, Mexico, India). Plans start at approximately $149/month for basic access, with annual subscriptions offering 36% savings. The platform includes AI-powered company profiling and unlimited search capabilities.
Panjiva (S&P Global) covers 30+ data sources with U.S. data updated weekly and international data monthly (2-month delay). PIERS provides 100% U.S. waterborne import/export coverage with 6x daily updates and 17 million BOLs annually, though pricing requires S&P Global contact. Descartes Datamyne covers 230 markets (75% of world trade) with daily U.S. updates and 500M+ annual shipment records, offering ISO 9001 certified data quality.
**Coffee-specific platforms**: TradeInt specializes in coffee supply chain data with global trade filtering by port, exporter, product type, and timeframe. Eximpedia focuses exclusively on coffee (Robusta and Arabica) with HS code tracking, offering subscription access with free samples.
Freight rate indices provide cost context: Freightos Baltic Index (FBX) offers free daily container rate updates across 12 global lanes, while Xeneta's XSI-C provides daily 40-foot container benchmarks. Both are EU-compliant and publicly accessible.
| Source | Type | Frequency | Coverage | Access | Cost |
|--------|------|-----------|----------|--------|------|
| Kpler | Real-time AIS, cargo flows, port calls, ag commodities | Real-time (\<1 min), daily | Global, 14,500 dry bulkers, 10,300 tankers | API, Python SDK, Excel, Snowflake | Enterprise (contact) |
| MarineTraffic | Real-time AIS, historical (2009+), port calls | Real-time, historical | 550,000+ vessels globally | Web, API, mobile apps | Free basic, Paid advanced |
| VesselFinder | Real-time AIS, historical (2009+), voyage analysis | Real-time, historical | Global terrestrial/satellite | Web, API (JSON/XML/CSV) | Free basic, Paid reports |
| ImportGenius | U.S./23+ country customs records, BOL data | Daily (U.S.), varies internationally | U.S. + 23 countries (Colombia, Vietnam) | Web platform, API, Excel export | ~$149/month, annual plans |
| Panjiva (S&P Global) | Shipment/customs records, 30+ sources | Weekly (U.S.), monthly (international) | 30+ countries, 10M+ companies | Web platform, API, alerts | Enterprise (contact S&P) |
| PIERS (S&P Global) | Bill of lading, waterborne trade | 6x daily (U.S. imports), monthly (non-U.S.) | 100% U.S. waterborne, 15+ countries | Web platform, API | Enterprise (contact S&P) |
| Descartes Datamyne | BOL database, 230 markets | Daily (U.S. maritime), regular updates | 230 markets, 180+ countries | Web platform, API, downloads | Annual subscription (contact) |
| TradeInt | Coffee-specific supply chain data | Historical and recent | Global coffee trade | Web platform, filtering tools | Subscription (contact) |
| Eximpedia | Coffee import/export trade data | Regular updates | Global (Robusta/Arabica) | Web platform, search/filtering | Subscription, free samples |
| Freightos Baltic Index (FBX) | Container freight rates | Daily | 12 regional lanes | Public index, API | Free index, platform subscriptions |
| AISHub | Real-time AIS, community data | Real-time | Global (community-based) | Free API (JSON/XML/CSV) | Free |
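
To turn shipment-level bill of lading records into something comparable with official statistics, one option is to convert net weights into 60-kg bag equivalents and aggregate by month and origin. The records, field names, and values below are hypothetical and do not reflect any provider's export schema.

```python
from collections import defaultdict

KG_PER_BAG = 60  # standard green-coffee bag used in ICO and USDA statistics

# Hypothetical BOL records; real provider exports use their own field names.
shipments = [
    {"arrival": "2025-01-14", "origin": "Colombia", "net_kg": 19_200},
    {"arrival": "2025-01-28", "origin": "Vietnam", "net_kg": 38_400},
    {"arrival": "2025-02-03", "origin": "Colombia", "net_kg": 21_000},
    {"arrival": "2025-02-19", "origin": "Colombia", "net_kg": 19_200},
]


def monthly_bags_by_origin(records):
    """Sum 60-kg bag equivalents per (YYYY-MM, origin) pair."""
    totals = defaultdict(float)
    for rec in records:
        month = rec["arrival"][:7]  # ISO date string -> "YYYY-MM"
        totals[(month, rec["origin"])] += rec["net_kg"] / KG_PER_BAG
    return dict(totals)


totals = monthly_bags_by_origin(shipments)
for (month, origin), bags in sorted(totals.items()):
    print(f"{month} {origin}: {bags:,.0f} bags")
```
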
## Satellite imagery enables crop health monitoring and yield prediction
**Sentinel-2 satellites provide the optimal free baseline** for coffee plantation monitoring, delivering 10-meter multispectral imagery (13 bands) with 5-day revisit frequency covering all global coffee regions. The European Space Agency's Copernicus program offers unlimited free access via multiple platforms: Copernicus Data Space Ecosystem, Sentinel Hub, Google Earth Engine, and AWS Open Data Registry. Sentinel-2 data enables NDVI monitoring, crop health assessment, plantation mapping, and has demonstrated 90.5% accuracy in coffee classification when combined with DEM data.
For cloud-prone tropical regions, **Sentinel-1 SAR provides all-weather monitoring** with 6-12 day revisit at 5-25 meter resolution depending on mode. C-band synthetic aperture radar penetrates clouds, enabling continuous monitoring of soil moisture, crop structure, and flooding in coffee areas. Both Sentinel-1 and Sentinel-2 data integrate seamlessly through the same free platforms.
NASA's Landsat 8/9 constellation complements Sentinel with 30-meter resolution (100m thermal), 8-day combined revisit, and critically, 50+ years of historical archive enabling long-term change detection. Studies show Landsat NDVI achieves R² = 0.85 for coffee leaf water potential estimation. MODIS provides regional-scale monitoring with 250-meter NDVI/EVI products updated every 8-16 days, ideal for biennial yield pattern analysis across large coffee regions.
**Google Earth Engine stands out as the premier integration platform**, providing free cloud computing access to the complete Sentinel, Landsat, and MODIS archives plus the new Forest Data Partnership coffee probability model (2020-2023). The Python and JavaScript APIs enable large-scale time-series analysis and machine learning classification. This is free for research, education, and nonprofit use, with commercial licensing available.
Commercial satellite imagery offers superior resolution when needed. **Planet Labs delivers daily global coverage** at 3-5 meter resolution (PlanetScope) with sub-meter SkySat imagery (50cm), plus upcoming 40cm Pelican constellation. Studies using Planet/RapidEye data combined with nutrient data achieved R² = 0.88 for coffee yield prediction. Access requires subscription (contact for pricing) via Planet Insights Platform with API and Google Earth Engine integration.
Maxar (WorldView, GeoEye) provides 30-50cm imagery via on-demand tasking through the Maxar Discovery Platform. Airbus Pléiades Neo delivers 30cm daily revisit capability. BlackSky specializes in high-revisit imaging with near real-time delivery. All require commercial licensing with contact-for-pricing models.
**Recommended workflow**: Use free Sentinel-2 (10m, 5-day) plus Landsat (30m, 8-day) via Google Earth Engine for baseline monitoring. Supplement with Sentinel-1 SAR during cloudy seasons. Deploy Planet daily imagery for intensive monitoring of priority plantation areas. Historical Landsat archive provides long-term expansion tracking and biennial pattern analysis.
| Source | Type | Resolution | Frequency | Coverage | Access | Cost |
|--------|------|-----------|-----------|----------|--------|------|
| Sentinel-2 | Optical multispectral (13 bands), NDVI, EVI | 10m (visible/NIR), 20m (red edge) | 5-day revisit | Global, all coffee regions | Copernicus Hub, Sentinel Hub, GEE, AWS | Free |
| Sentinel-1 | C-band SAR, all-weather | 5-25m (mode-dependent) | 6-12 day revisit | Global land/coastal | Copernicus Hub, GEE, AWS | Free |
| Landsat 8/9 | Optical multispectral (11 bands), thermal | 30m (optical), 100m (thermal) | 8-day combined revisit | Global, 1972+ archive | USGS EarthExplorer, GEE, AWS | Free |
| MODIS | NDVI, EVI, LST, GPP | 250m (NDVI), 500m-1km | Daily obs, 8-16 day composites | Global | NASA Earthdata, GEE, LANCE | Free |
| Google Earth Engine | Multi-petabyte catalog, cloud computing | Varies (250m to \<1m) | Continuous updates | Global, coffee models included | Python/JavaScript API, Code Editor | Free (research/education) |
| Sentinel Hub | Sentinel, Landsat, MODIS, commercial data | 10m-1km (source-dependent) | Daily to 16-day | Global | RESTful APIs, QGIS plugin, Python | Free tier, paid advanced |
| Planet Labs | PlanetScope optical, SkySat high-res | 3-5m (PlanetScope), 50cm (SkySat) | Daily global coverage | Global coffee regions | Platform, API, GEE integration | Subscription (contact) |
| Maxar | WorldView, GeoEye very high-res | 30-50cm | On-demand tasking | Global | Maxar Discovery, SecureWatch, ArcGIS | Commercial (contact) |
| Airbus | Pléiades Neo, SPOT | 30cm (Pléiades), 1.5-6m (SPOT) | Daily revisit capability | Global | OneAtlas platform, API | Commercial (contact) |
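
The NDVI monitoring referred to above reduces to a per-pixel ratio of near-infrared and red reflectance (bands B8 and B4 on Sentinel-2). A minimal sketch, using hypothetical reflectance values rather than real imagery:

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    denom = nir + red
    if denom == 0:
        return 0.0  # avoid division by zero on no-data pixels
    return (nir - red) / denom


# Hypothetical surface-reflectance pairs (B8 NIR, B4 red) for three pixels:
# dense canopy, moderate canopy, and sparse or stressed vegetation.
pixels = [(0.45, 0.05), (0.30, 0.10), (0.20, 0.18)]
values = [round(ndvi(nir, red), 3) for nir, red in pixels]
print(values)
```

In practice the same formula would be applied to whole rasters in Google Earth Engine or a similar platform, with cloud masking applied first; that pipeline is outside the scope of this sketch.
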
## Weather data services provide critical production forecasting inputs
**Visual Crossing Weather API delivers the best all-around package**, offering current conditions, 15-day forecasts, sub-hourly resolution, 50+ years of historical data, and agriculture-specific elements (soil temperature, soil moisture, evapotranspiration) through a single-endpoint REST API. The free tier provides 1,000 records/day with metered pricing at $0.0001/record beyond that, making it extremely cost-effective. Global coverage includes all major coffee regions with 100+ weather elements in JSON/CSV format.
For agricultural specialization, **aWhere stands out with purpose-built agronomic models**. Their platform provides daily observations, 8-day forecasts, agronomic indices (PET, GDD, P/PET ratios), and 3-5 years of historical data at 9km grid resolution globally. Free access is available for South Asia and parts of Africa via the weADAPT platform, with commercial licenses for other regions. The REST API includes OAuth2 authentication and an aWherePy Python package, delivering field-level data specifically designed for crop monitoring.
**ECMWF provides the world's leading weather forecasts** through the IFS HRES model at 9km resolution with 15-day forecasts updated 6-hourly, plus the new AIFS AI weather model. As of October 2025, ECMWF open-data is free under CC-BY 4.0 license, accessible via Open-Meteo's free REST API (no key required), the ecmwf-opendata Python package, or MARS API. This represents exceptional value for global forecast data.
IBM Environmental Intelligence Suite (The Weather Company) delivers hyper-local 4km resolution from 250,000+ stations globally, with agriculture-specific APIs for frost potential, evapotranspiration, and soil moisture/temperature. The platform offers 15-day forecasts with 15-minute precipitation updates and seasonal/sub-seasonal forecasts. Free trial available (30 days) with Standard tier requiring minimum 200,000 calls/month. This premium service justifies cost through accuracy and agriculture specialization.
For coffee-specific frost monitoring (critical for Brazilian arabica), AWIS Frost/Freeze Forecast Services provides 7-day frost forecasts with 24/7 email/text alerts customized to specific locations and crops, backed by 30+ years of agricultural weather experience. Affordable custom pricing makes this accessible for operational monitoring.
OpenWeatherMap remains a solid general choice with free tier (1,000 calls/day) and pay-as-you-call model ($0.0001 per call), covering current weather, forecasts, and 46+ years of historical data. DTN Weather API offers agriculture-specific hyper-local forecasts with 0.1° gridded weather (15-day), historical data (2013+), and proprietary meteorologist team, though pricing requires contact.
**Free government sources**: NOAA's National Centers for Environmental Information provides comprehensive climate data archives through the Climate Data Online tool (free; 229+ TB archived monthly). NOAA's National Weather Service API offers real-time U.S. data via a free REST API. Copernicus Climate Data Store provides ERA5 reanalysis and seasonal forecasts (free, registration required).
**Drought monitoring**: GRIDMET provides SPI, EDDI, SPEI, and PDSI drought indices at 4km resolution for CONUS via Google Earth Engine (free). NOAA publishes global SPI from CMORPH for international coverage (free).
| Source | Type | Frequency | Coverage | Access | Cost |
|--------|------|-----------|----------|--------|------|
| Visual Crossing | Current, 15-day forecast, 50+ years historical, ag elements | Real-time, sub-hourly, daily | Global | REST API (JSON/CSV), Web Query Builder | Free (1K records/day), $0.0001/record |
| aWhere | Daily obs, 8-day forecast, agronomic models (PET, GDD) | Daily updates | Global (9km), South Asia/Africa free | REST API, aWherePy Python, web platform | Free (South Asia/Africa), commercial |
| ECMWF/Open-Meteo | IFS HRES (9km), AIFS AI forecasts, 15-day | 6-hourly model runs, 1-hour output | Global | Free REST API (no key), Python package | Free (CC-BY 4.0) |
| IBM Environmental Intelligence | 15-day forecast, ag APIs (frost, ET, soil), alerts | Real-time, 15-min precip, hourly/daily | Global (4km), 250K+ stations | REST API, Weather Company API | Free trial (30 days), Standard tier |
| OpenWeatherMap | Current, forecasts, 46+ years historical | Real-time, hourly, daily | Global coverage | REST API (JSON/XML), bulk downloads | Free (1K calls/day), $0.0001/call PAYG |
| DTN Weather | Current, 15-day forecast, historical (2013+), gridded | Near real-time, 3-hour updates | Global (0.1° gridded) | REST API, SDKs (Python/JS/Java), webhooks | Subscription (contact) |
| Meteomatics | 2000+ parameters, ag-specific, 90m resolution | Real-time, hourly, daily | Global, EURO1k, US1k | REST API (Meteocache) | Trial available (contact) |
| AWIS Frost Services | Frost/freeze forecasts, alerts | Real-time, 7-day forecasts | US and global customizable | Email/text alerts, web, API | Affordable custom (contact) |
| NOAA NCEI | Climate Data Online, Climate Normals, 176-year record | Monthly reports, daily archives | Global, extensive US | Web interface, FTP, downloads | Free |
| Copernicus CDS/ADS | ERA5 reanalysis, seasonal forecasts, climate data | Various (reanalysis, seasonal) | Global | CDS API (Python), web interface | Free (registration required) |
| GRIDMET Drought Indices | SPI, EDDI, SPEI, PDSI at 4km | Daily updates | CONUS | Google Earth Engine API | Free |
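
As an illustration of frost monitoring on free forecast data, the sketch below builds a request URL for Open-Meteo's generic forecast endpoint and flags days whose forecast minimum falls at or below a threshold. The 2 °C threshold, the coordinates (approximately Varginha, Minas Gerais), and the sample payload are assumptions for illustration; no network call is made, and which model backs the generic endpoint is Open-Meteo's choice.

```python
from urllib.parse import urlencode

OPEN_METEO_FORECAST = "https://api.open-meteo.com/v1/forecast"


def build_forecast_url(lat: float, lon: float, days: int = 7) -> str:
    """Build a daily minimum-temperature forecast request URL."""
    params = {
        "latitude": lat,
        "longitude": lon,
        "daily": "temperature_2m_min",
        "forecast_days": days,
        "timezone": "auto",
    }
    return f"{OPEN_METEO_FORECAST}?{urlencode(params)}"


def frost_risk_days(payload: dict, threshold_c: float = 2.0) -> list:
    """Return dates whose forecast minimum is at or below threshold_c."""
    daily = payload["daily"]
    return [
        day
        for day, tmin in zip(daily["time"], daily["temperature_2m_min"])
        if tmin is not None and tmin <= threshold_c
    ]


url = build_forecast_url(-21.55, -45.43)

# Hypothetical response fragment shaped like the endpoint's "daily" block.
sample = {
    "daily": {
        "time": ["2025-07-01", "2025-07-02", "2025-07-03"],
        "temperature_2m_min": [6.4, 1.8, -0.5],
    }
}
print(url)
print(frost_risk_days(sample))
```
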
## Production and inventory statistics establish supply fundamentals
**The International Coffee Organization (ICO) maintains the definitive coffee statistics database**, covering trade volumes/values, production, consumption, inventories, and prices for 192 consuming countries and 54 producing countries since October 1963. The World Coffee Statistics Database launched in January 2022 with monthly updates. The free Coffee Market Report is released monthly, while the comprehensive Quarterly Statistical Bulletin and full database access require paid subscriptions (minimum £250 per request for non-members, free for ICO members). The bi-annual Coffee Report and Outlook costs £500. This represents the gold standard for official coffee statistics.
**USDA Foreign Agricultural Service provides the best free government data**, publishing the comprehensive "Coffee: World Markets and Trade" report bi-annually (June and December) with global production volumes, consumption, trade statistics, stocks, and country-specific analysis. The PSD Online database offers interactive access to historical and forecast data. All USDA FAS data is free with no restrictions, making it essential for baseline supply/demand analysis.
For country-specific intelligence:
**Brazil (world's largest producer)**: CONAB (Companhia Nacional de Abastecimento) issues official production forecasts multiple times per harvest season (4+ reports annually, first in January) covering Arabica and Robusta/Conilon with state-by-state breakdowns, planted area, productivity estimates, and export data. Free access via conab.gov.br makes this the authoritative source for Brazilian supply.
**Colombia (3rd largest producer)**: The Colombian Coffee Growers Federation (FNC) publishes regular production data, domestic reference prices, export volumes, and quality standards at national, departmental, and municipal levels. The National Coffee Register tracks detailed farm data. Free public access via federaciondecafeteros.org.
**Vietnam (2nd largest producer)**: USDA FAS Vietnam reports provide more detailed analysis than local sources, though the General Statistics Office reports annual production of 1,763,500 tons. The Vietnam Coffee-Cocoa Association (VICOFA) offers an industry perspective.
**Stock data**: ICE (Intercontinental Exchange) publishes daily certified Arabica and Robusta coffee stocks at approved warehouses globally, accessible free via the ICE Report Center with CSV downloads. **As of 2024, arabica stocks hit 509,300 bags (1.5-year low)**, making this critical for supply tightness analysis. The European Coffee Federation published bi-monthly stock reports for major European ports but suspended this in 2023. The Green Coffee Association discontinued U.S. port warehouse stock reports in May 2023, creating a significant data gap.
**Private research**: Volcafe (ED&F Man) publishes free market reviews with production forecasts covering 92% of origin countries. Rabobank releases quarterly coffee outlook reports with price forecasts (some public, full access requires subscription). Euromonitor International offers detailed market analysis for 78+ countries with retail sales, consumption trends, and market share data (premium pricing, annual updates).
| Source | Type | Frequency | Coverage | Access | Cost |
|--------|------|-----------|----------|--------|------|
| ICO (International Coffee Organization) | Trade, production, consumption, inventories, prices | Monthly (CMR), Quarterly (QSB), Bi-annual | 192 consuming, 54 producing countries | World Coffee Statistics Database, reports | Free (CMR), Paid (WCSD £250+ min, QSB, Coffee Report £500) |
| USDA Foreign Agricultural Service | Production, consumption, trade, stocks, forecasts | Bi-annual (June/December) | Global + country-specific | Website, PSD Online database, PDF downloads | Free |
| CONAB (Brazil) | Production forecasts (Arabica/Robusta), harvest estimates | Multiple per season (4+ reports) | Brazil (national and state-level) | conab.gov.br, downloadable reports | Free |
| Colombian Coffee Growers Federation (FNC) | Production, domestic prices, exports, quality data | Regular updates | Colombia (national, departmental, municipal) | federaciondecafeteros.org, reports | Free |
| ICE Certified Stocks | Certified Arabica/Robusta stocks, warehouse inventory | Daily | ICE-approved warehouses globally | ICE Report Center, CSV downloads | Free basic, Paid premium |
| Volcafe (ED&F Man) | Production forecasts, market outlook | Weekly/periodic, seasonal forecasts | Global (92% of origin countries) | volcafe.com/pages/reports, downloads | Free reports online |
| Rabobank | Market forecasts, price forecasts, supply/demand | Quarterly coffee outlook, monthly commodity | Global and regional | research.rabobank.com | Some public, subscription for full |
| Euromonitor International | Market size, retail sales, consumption trends | Annual updates | Global (78+ countries) | Passport database, reports | Premium subscription/purchase |
| FAO (Food and Agriculture Organization) | Production volumes, area harvested, yield | Annual (published March) | Global (278 products) | FAOSTAT database, UNdata | Free |
| European Coffee Federation | European imports, stock levels (suspended 2023), trade | Annual report, stocks bi-monthly (suspended) | Europe (EU27 + UK, CH, NO, IS) | ecf-coffee.org, downloadable reports | Free (annual report) |
| Statista | Market data aggregation, production, trade | Regular updates as sources available | Global | statista.com, database platform | Limited free, Premium from $2,388/year |
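
The sources above report in different units: tons for Vietnam's General Statistics Office, bags for ICE certified stocks. A small conversion helper makes the figures comparable. The sketch assumes metric tons and 60-kg bags throughout, which should be checked against each source's definitions.

```python
KG_PER_BAG = 60  # assumed standard bag weight
KG_PER_TONNE = 1_000


def tonnes_to_bags(tonnes: float) -> float:
    """Convert metric tons of green coffee to 60-kg bag equivalents."""
    return tonnes * KG_PER_TONNE / KG_PER_BAG


def bags_to_tonnes(bags: float) -> float:
    """Convert 60-kg bags to metric tons."""
    return bags * KG_PER_BAG / KG_PER_TONNE


# Figures quoted in this section.
vietnam_bags = tonnes_to_bags(1_763_500)  # GSO annual production
ice_tonnes = bags_to_tonnes(509_300)      # ICE certified arabica stocks

print(f"Vietnam production: {vietnam_bags / 1e6:.2f} million bags")
print(f"ICE arabica stocks: {ice_tonnes:,.0f} tonnes")
```
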
## Export, import, and customs data track global trade flows
**UN Comtrade stands as the authoritative free source**, covering 220+ countries with monthly updates and data from 1962 onwards. Coffee trade data is accessible by HS code 0901 (coffee whether or not roasted or decaffeinated) at 2, 4, or 6-digit levels. The new ComtradePlus interface (comtradeplus.un.org) provides improved access via API, web interface, and bulk downloads; basic access is free, with premium subscriptions available for heavier use. This represents the standard baseline for international trade statistics.
**ITC Trade Map complements Comtrade** with enhanced analytics, offering annual trade flows with mirror data, export performance indicators, international demand metrics, and critically, a company directory with 10M+ businesses. Coverage includes 220+ countries, 5,300 products, and historical data since 2001. Free access (supported by World Bank and EU) makes this essential for identifying trading partners and analyzing market share. The jointly developed ITC/WTO/UNCTAD platform excels at comparative analysis.
**World Bank WITS (World Integrated Trade Solution)** integrates UN Comtrade, UNCTAD TRAINS, and WTO data with added value through tariff analysis, non-tariff measures, and competitiveness indicators. Free access via wits.worldbank.org with API, bulk CSV downloads, and interactive visualization tools covering 200+ countries from 1962.
For detailed shipment-level intelligence, **bill of lading providers offer granular cargo tracking**:
**ImportGenius provides the most accessible entry** with U.S. customs records updated daily (258M+ import shipments, 5.6M+ export shipments), 18 years of U.S. historical data, and coverage of 23+ countries including major coffee producers (Colombia, Vietnam, India, Mexico). The AI-powered platform includes unlimited company profiling, Excel/CSV exports, and enterprise API. Plans start around $149/month with annual subscriptions offering 36% savings, making it cost-effective for SMEs.
**Panjiva (S&P Global Market Intelligence)** covers 30+ data sources with U.S. maritime data updated weekly (within 1 week of customs filing) and international data monthly (2-month delay). The platform provides 10M+ company profiles with supplier-buyer relationships searchable by HS code, company name, DUNS number, and location. Xpressfeed API enables CRM integration. Enterprise pricing requires S&P Global contact.
**PIERS (Port Import/Export Reporting Service)** delivers 100% U.S. waterborne trade coverage with 6x daily updates and 17 million BOLs annually. Historical data from 1950 provides long-term trend analysis. The platform integrates with Global Trade Atlas and includes commodity descriptions, tonnage, TEUs, and estimated values. Part of S&P Global Trade Analytics Suite (enterprise pricing).
**Descartes Datamyne** covers 230 markets (75% of world import-export trade) with daily U.S. maritime updates (~26,000 records/day, 500M+ annual shipments). ISO 9001 certified data includes master/house BOL information, container details, NVOCC/VOCC data, and company contacts across 180+ countries. The platform supports Excel exports (10,000 records), Massive Download (500,000 records), and API access (annual subscription, contact for quote).
**Coffee-specific platforms**: TradeInt specializes in coffee supply chain data with filtering by port, exporter, product type, and timeframe for past global trades. Eximpedia focuses exclusively on coffee (Robusta/Arabica) with HS code tracking and buyer/supplier information.
**Government sources**: U.S. Census Bureau's USA Trade Online provides U.S. import/export statistics by HS level with state-level and port breakdowns (paid subscription, monthly free reports). Eurostat offers EU coffee trade data (intra and extra-EU) with monthly/annual updates, bulk CSV downloads, and data from January 1988 (free). USDA FAS bi-annual Coffee World Markets reports include bean exports by country (free).
**Alternative platforms**: Volza covers 209 countries (90 complete data, 119 mirror data) with 3 billion+ shipment records including 82,467+ active coffee buyers and 556,489 trades in 2023. Pay-per-use pricing with 7-day trial. Tendata covers 91 countries with real-time customs data access tracking 42,084 coffee importers in 2023 worth $7.45B trade value.
| Source | Type | Frequency | Coverage | Access | Cost |
|--------|------|-----------|----------|--------|------|
| UN Comtrade | Import/export volumes, values, bilateral flows, HS codes | Monthly updates | 220+ countries, 1962+ | Web (comtradeplus.un.org), API, downloads | Free basic, premium subscriptions |
| ITC Trade Map | Annual trade flows, mirror data, market indicators | Annual, monthly/quarterly | 220+ countries, 10M+ companies | Web (trademap.org), Excel export | Free (registration required) |
| World Bank WITS | Merchandise trade, tariffs, NTM, competitiveness | Annual updates | 200+ countries, 1962+ | Web (wits.worldbank.org), API, CSV | Free |
| ImportGenius | U.S./23+ country customs records, BOL | Daily (U.S.), varies internationally | U.S. + 23 countries (Colombia, Vietnam, India) | Web platform, Enterprise API, Excel/CSV | ~$149/month, annual savings |
| Panjiva (S&P Global) | Shipment/customs records, 30+ sources | Weekly (U.S.), monthly (international) | 30+ countries, 10M+ companies | Web, Xpressfeed API, alerts | Enterprise (contact S&P) |
| PIERS (S&P Global) | BOL, 17M annually, 100% U.S. waterborne | 6x daily (U.S. imports), monthly (non-U.S.) | 100% U.S. waterborne, 15+ countries | Web, Global Trade Atlas integration | Enterprise (contact S&P) |
| Descartes Datamyne | BOL database, 500M+ annually | Daily (U.S. maritime, 26K records/day) | 230 markets, 180+ countries | Web, API, Massive Download | Annual subscription (contact) |
| TradeInt | Coffee-specific supply chain data | Historical and recent | Global coffee trade | Web platform, filtering tools | Subscription (contact) |
| Eximpedia | Coffee import/export trade data | Regular updates | Global (Robusta/Arabica) | Web platform | Subscription, free samples |
| Volza | 3B+ shipment records, 82,467+ coffee buyers (2023) | Regular updates, real-time alerts | 209 countries (90 complete, 119 mirror) | Web platform, API, dashboards | Pay-per-use, 7-day trial |
| U.S. Census Bureau | U.S. import/export statistics, state/port level | Monthly releases | U.S. with 200+ partners | USA Trade Online, FT900/FT920 reports | USA Trade Online: Paid, Reports: Free |
| Eurostat | EU coffee trade (intra/extra-EU), HS 0901 | Monthly and annual | 27 EU members, global partners | Web (ec.europa.eu/eurostat), Comext, CSV | Free |
| ICO World Coffee Statistics Database | Coffee trade volumes/values, detailed statistics | Monthly (MTS), Quarterly (QSB) | 192 consuming, 54 producing countries | WCSD platform, email delivery | Limited free, subscriptions |
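
The 2, 4, and 6-digit HS levels mentioned for UN Comtrade nest by prefix, so 6-digit flows can be rolled up to heading 0901 or chapter 09 by truncating the code. The trade values below are hypothetical; the tea code 090210 is included only to show that chapter 09 is broader than coffee.

```python
from collections import defaultdict

# Hypothetical export values (USD thousands) keyed by 6-digit HS code.
flows = {
    "090111": 1200.0,  # coffee, not roasted, not decaffeinated
    "090112": 50.0,    # coffee, not roasted, decaffeinated
    "090121": 300.0,   # coffee, roasted, not decaffeinated
    "090122": 20.0,    # coffee, roasted, decaffeinated
    "090190": 5.0,     # other (husks, skins, substitutes containing coffee)
    "090210": 80.0,    # green tea: same chapter, different heading
}


def rollup(values: dict, digits: int) -> dict:
    """Aggregate HS6 values to the 2-, 4-, or 6-digit level by code prefix."""
    if digits not in (2, 4, 6):
        raise ValueError("digits must be 2, 4, or 6")
    out = defaultdict(float)
    for code, value in values.items():
        out[code[:digits]] += value
    return dict(out)


by_heading = rollup(flows, 4)
by_chapter = rollup(flows, 2)
print(by_heading)
print(by_chapter)
```

Filtering at the 4-digit level (0901) is therefore the safest default for coffee-only totals, while the 6-digit split separates green from roasted flows.
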
## Alternative data expands analytical possibilities
Beyond traditional categories, emerging alternative data sources provide predictive edges through sentiment analysis, supply chain transparency, auction pricing, consumer trends, and sustainability metrics.
**Sentiment and news analytics**: RavenPack delivers institutional-grade news sentiment with 80+ fields describing entities, 20+ sentiment indicators, and real-time updates from 40,000+ web and social media sources in 13 languages. The platform covers commodities including Robusta coffee with sentiment scores (0-100, 50=neutral) and event sentiment scores updated 24/7. Historical database extends 6+ years. Paid subscription (contact for pricing) targets institutional investors. StockPulse provides emotional data intelligence with real-time 24/7 monitoring and historical data since 2012 using proprietary LLMs.
Social media monitoring platforms (Sprout Social $249+/month, Brand24 with variable pricing, Hootsuite $99+/month) track Twitter/X, Instagram, Facebook, TikTok, and LinkedIn sentiment. Free alternatives include Python libraries for building a custom pipeline: VADER for sentiment scoring, with BeautifulSoup and Selenium for collecting the text. Studies show coffee tweets are typically neutral or positive (45-47%).
**Supply chain transparency**: TraceX Technologies offers blockchain-based farm-to-cup traceability with real-time IoT sensor data, GPS mapping, and sustainability metrics. Implemented with 3,500+ farmers in India's Araku Valley, the platform tracks deforestation risks and certification compliance (enterprise pricing). Sourcemap maps approximately 25% of the world's coffee supply in Latin America, Africa, and Southeast Asia with end-to-end supply chain visualization, due diligence data, and compliance tracking (subscription-based).
INA-Trace provides open-source traceability (GitHub) with QR code consumer access, tracking pre-processing, post-harvest, storage, and payments in Rwanda and Honduras. Free open-source access enables customization. Trace by Fairfood combines NFC Farmer Cards with blockchain ledgers for real-time transaction recording in coffee, cocoa, spice, and fruit sectors.
**Coffee auction data reveals quality premiums**: Cup of Excellence auctions provide transparent pricing for top-quality lots with detailed quality scores, farm information, and buyer data. Colombia 2021 averaged $30.79/lb (top lot $135.10/lb), Ethiopia 2020 averaged $28/lb (top lot $445/lb from Angelino's), with 28 winning lots per auction scoring 87+ points. M-Cultivo private auctions set records—2025 Ethiopian auction reached $1,739/kg (Alo Coffee), Faysel Abdosh auction hit $1,604/kg (Sidama Keramo), generating 6,000+ bids. Ethiopian Coffee Exchange tracks weekly export price adjustments and daily trading for the world's 5th largest exporter ($1.7B earnings in 2023/24).
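The auction figures above mix $/lb (Cup of Excellence) and $/kg (M-Cultivo), so they need a common unit before they can be compared. The following is a small sketch of that conversion; the function name is illustrative, and the only constant used is the exact definition of the pound (0.45359237 kg).

```python
KG_PER_LB = 0.45359237  # exact: one avoirdupois pound in kilograms


def usd_per_kg_to_usd_per_lb(price_per_kg: float) -> float:
    """Convert a price quoted in USD per kilogram to USD per pound."""
    # One pound is KG_PER_LB kilograms, so a pound costs that fraction
    # of the per-kilogram price.
    return price_per_kg * KG_PER_LB


if __name__ == "__main__":
    # Record prices quoted in the text, in USD/kg.
    for name, per_kg in [("Alo Coffee", 1739.0), ("Sidama Keramo", 1604.0)]:
        print(f"{name}: ${usd_per_kg_to_usd_per_lb(per_kg):,.2f}/lb")
```

On that basis the $1,739/kg lot works out to roughly $789/lb and the $1,604/kg lot to roughly $728/lb, which puts both on the same footing as the per-pound Cup of Excellence prices.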
**Consumer demand indicators**: National Coffee Association's National Coffee Data Trends (NCDT) report provides authoritative U.S. consumption data annually, showing 66% of American adults drink coffee daily and 46% consumed specialty coffee in the past day (2025). Full data requires purchasing the report. Specialty Coffee Association publishes the NCDT Breakout Report with detailed specialty coffee analysis (available to members). Tastewise's AI platform analyzes trillions of data points across social media, eRetail, and menus for real-time trend tracking (14-day trial, then subscription).
Mintel tracked a 30% increase in caffeine-free coffee launches (2022-2023) through its Global New Products Database, which is updated on an ongoing basis and supplemented by periodic market reports (subscription required). Deloitte's 2024 Coffee Study surveyed 7,000 coffee drinkers across 13 countries, examining consumption patterns, sustainability concerns, and specialty trends (free public report).
**Sustainability and certification**: Rainforest Alliance publishes annual certification reports covering 400,000+ certified coffee producers across ~1M hectares, showing 179% higher earnings for certified farms compared to non-certified. Interactive PowerBI reports provide global/regional/country breakdowns (free). Fairtrade International tracks 870,000+ coffee farmers with Fair Trade premiums and minimum price data (free public reports, certification costs $500-$3,000 annually). Specialty Coffee Association's Q Grader program provides quality certifications using the 100-point scale (80+ points = specialty grade, course ~$1,500-2,000).
**Weather and satellite data for forecasting**: Studies demonstrate combining Landsat/Sentinel NDVI data with weather variables achieves R² up to 0.88 for coffee yield prediction. Research shows multi-temporal NDVI from July-August provides highest correlation with yield, while weather explains up to 36% of yield variation in Vietnam's Dak Lak region with 3-6 months advance forecast capability. BR-DWGD (Brazilian Daily Weather Gridded Data), CLIMBra, and ERA5 reanalysis provide the necessary weather inputs (mostly free), while Sentinel-2 and Landsat provide optical data (free). RapidEye/PlanetScope commercial imagery improves resolution when combined with nutrient data.
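For reference, the two quantities named above can be computed as follows. This is a minimal standard-library sketch, not the method of any cited study: NDVI uses the standard definition (NIR - Red) / (NIR + Red), and R² is the usual coefficient of determination. The function names are illustrative.

```python
from statistics import mean
from typing import Sequence


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index from two reflectances.

    For Sentinel-2 these are bands B8 (NIR) and B4 (red).
    """
    total = nir + red
    if total == 0:
        raise ValueError("NIR + Red is zero; NDVI is undefined")
    return (nir - red) / total


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination between observed and predicted yields."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    mean_obs = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R² is undefined")
    return 1.0 - ss_res / ss_tot
```

With illustrative reflectances of 0.5 (NIR) and 0.1 (red), `ndvi` returns about 0.67. An R² of 0.88 means the model's predictions leave 12% of the variance in observed yield unexplained.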
**Market intelligence platforms**: ICE (Intercontinental Exchange) provides daily Coffee C and Robusta futures prices, volume, open interest, and inventory levels—the global benchmark for price discovery (market data fees required). Bloomberg Terminal (~$2,000+/month) and Refinitiv (enterprise pricing) integrate comprehensive coffee futures, news, analytics, and alternative data feeds. CoffeeBI specializes in coffee market research serving major industry players with out-of-home insights, machine market data, and industry trends (paid subscriptions).
| Source | Type | Frequency | Coverage | Access | Cost |
|--------|------|-----------|----------|--------|------|
| RavenPack | News sentiment, ESS, CSS, entity analysis | Real-time 24/7, 6+ years historical | 40K+ sources, 13 languages, Robusta coffee | API, web dashboards, MATLAB integration | Paid (contact) |
| StockPulse | Social media sentiment, emotional intelligence | Real-time 24/7, historical since 2012 | Global markets, commodities including coffee | Web software, API endpoints | Paid (contact) |
| Sprout Social | Social media sentiment, engagement metrics | Real-time | Twitter/X, Instagram, Facebook, LinkedIn | Platform dashboard, API | $249+/month |
| TraceX Technologies | Blockchain traceability, IoT sensors, GPS | Real-time | Global, 3,500+ farmers (India Araku Valley) | Platform, QR codes, API | Enterprise (contact) |
| Sourcemap | End-to-end supply chain mapping | Real-time | 25% of world's coffee (Latin America, Africa, SE Asia) | Platform interface, API | Subscription |
| INA-Trace | Pre-processing, post-harvest, storage, payments | Real-time | Rwanda, Honduras | GitHub open source, mobile app | Free (open source) |
| Cup of Excellence | Auction prices, quality scores, farm info | Seasonal auctions | Multiple countries (Colombia, Ethiopia) | Online auction platform, public results | Free to view, fees to participate |
| M-Cultivo | Private auction prices, bidding data | Seasonal/ad-hoc | Ethiopia (Echoes of Peak, Faysel Abdosh) | Online auction platform | Free to view, registration for participation |
| Ethiopian Coffee Exchange | Export prices, volume, minimum price | Weekly price adjustments, daily trading | Ethiopia (5th largest exporter) | Official reports, government data | Free public data |
| NCA NCDT Report | Consumption patterns, consumer preferences | Annual (Spring) | United States | Purchasable reports, press releases | Report purchase required |
| SCA Specialty Coffee Breakout | Specialty consumption, preparation methods | Annual (partnership with NCA) | United States | SCA membership/purchase | Members/purchase |
| Tastewise | Consumer trends, flavor pairings, social/eRetail | Real-time trend tracking | Global food & beverage | AI platform with GenAI | Subscription (14-day trial) |
| Mintel | Product launches, consumer trends, market sizing | Ongoing database updates, periodic reports | Global, country-specific | Subscription platform, reports | Subscription (tiered) |
| Deloitte Coffee Study | Consumption patterns, sustainability, preferences | Periodic (2024 edition) | Global - 13 countries, 7K drinkers | Published reports online | Free (public report) |
| Rainforest Alliance | Certified farm data, sustainability metrics | Annual reports | 400K+ producers, ~1M hectares | Interactive PowerBI, PDFs | Free public reports |
| Fairtrade International | Certified producer data, Fair Trade premiums | Annual reports | 870K+ coffee farmers globally | fairtrade.net, reports | Free public reports |
| SCA Q Grader Program | Quality certifications, cupping scores | Ongoing certifications | Global specialty coffee | Certification programs | Course ~$1,500-2,000 |
| ICE Coffee Futures | Futures prices, volume, open interest, inventory | Real-time during trading hours, daily | Global benchmark (Coffee C, Robusta) | ICE platform, market data vendors | Market data fees required |
| Bloomberg Terminal | Real-time prices, news, alt data feeds | Real-time | Global commodities including coffee | Bloomberg Terminal | ~$2,000+/month |
| Refinitiv | Futures, spot prices, news, analytics | Real-time | Global coffee markets | Refinitiv platform | Enterprise (contact) |
| Databento | Broad market data | Historical and real-time | - | - | One-time or subscription (developer-friendly) |
| CoffeeBI | Coffee market research, OOH insights | Ongoing news, periodic reports | Global coffee and machine industry | Subscription platform, reports | Paid subscriptions |
## Strategic recommendations for data source selection
**For comprehensive baseline coverage at zero cost**: Combine CFTC COT reports (weekly trader positioning), Sentinel-2/Landsat satellite imagery via Google Earth Engine (crop monitoring and yield prediction), Visual Crossing or ECMWF/Open-Meteo weather data (production forecasting), USDA FAS reports (supply/demand fundamentals), UN Comtrade and ITC Trade Map (trade flows), and ICE certified stocks (inventory tightness). This free foundation covers all essential data categories with sufficient quality for fundamental analysis.
**For institutional-grade analytics**: Add Kpler maritime intelligence (enterprise pricing) for leading indicators of physical movements, Planet Labs daily satellite imagery (subscription) for intensive plantation monitoring, IBM Weather or aWhere (subscription) for agriculture-specific weather models, ICO World Coffee Statistics Database (£250+ minimum) for the most comprehensive official statistics, ImportGenius or Panjiva ($149+/month up to enterprise pricing) for granular shipment tracking, and RavenPack (paid, contact for pricing) for sentiment analysis. Bloomberg (~$2,000+/month) or Refinitiv (enterprise pricing) terminals provide integrated access to multiple premium feeds.
**For specific analytical edges**: Deploy coffee-specific platforms like TradeInt for supply chain intelligence, Cup of Excellence and M-Cultivo auction data for quality premium trends, TraceX or Sourcemap for traceability and sustainability verification, NCA NCDT and Tastewise for consumer demand shifts, and specialized frost monitoring (AWIS) for Brazilian arabica risk assessment. These targeted sources address specific informational gaps competitors may overlook.
**Frequency optimization**: Real-time sources (AIS tracking, weather APIs, sentiment analysis, futures prices) provide short-term tactical advantages. Daily sources (ICE stocks, satellite imagery, customs data) enable responsive positioning. Weekly/monthly sources (COT reports, trade statistics, production forecasts) inform medium-term strategy. Annual reports (consumer trends, sustainability metrics, long-term production forecasts) guide strategic allocation.
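The cadence-to-use mapping above can be written down as a small lookup, which is handy when deciding which feeds to wire up for a given trading horizon. This is a sketch only: the enum, dictionary names, and `sources_for` helper are hypothetical, and the entries simply restate the groupings in the paragraph above.

```python
from enum import Enum


class Cadence(Enum):
    REAL_TIME = "real-time"
    DAILY = "daily"
    WEEKLY_MONTHLY = "weekly/monthly"
    ANNUAL = "annual"


# What each update cadence is used for, as described in the text.
USE_BY_CADENCE = {
    Cadence.REAL_TIME: "short-term tactical",
    Cadence.DAILY: "responsive positioning",
    Cadence.WEEKLY_MONTHLY: "medium-term strategy",
    Cadence.ANNUAL: "strategic allocation",
}

# Source categories grouped by cadence, as listed in the text.
SOURCE_CADENCE = {
    "AIS tracking": Cadence.REAL_TIME,
    "weather APIs": Cadence.REAL_TIME,
    "sentiment analysis": Cadence.REAL_TIME,
    "futures prices": Cadence.REAL_TIME,
    "ICE stocks": Cadence.DAILY,
    "satellite imagery": Cadence.DAILY,
    "customs data": Cadence.DAILY,
    "COT reports": Cadence.WEEKLY_MONTHLY,
    "trade statistics": Cadence.WEEKLY_MONTHLY,
    "production forecasts": Cadence.WEEKLY_MONTHLY,
    "consumer trends": Cadence.ANNUAL,
    "sustainability metrics": Cadence.ANNUAL,
    "long-term production forecasts": Cadence.ANNUAL,
}


def sources_for(use: str) -> list[str]:
    """Return the source categories whose cadence serves the given use."""
    return sorted(
        name
        for name, cadence in SOURCE_CADENCE.items()
        if USE_BY_CADENCE[cadence] == use
    )
```

For example, `sources_for("medium-term strategy")` returns the weekly/monthly group: COT reports, production forecasts, and trade statistics.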
**Coverage completeness**: Ensure data spans all major coffee origins (Brazil, at 40% of global arabica; Vietnam, at 40% of robusta; Colombia; Indonesia; Ethiopia; Honduras) and consumption markets (U.S., Europe, Japan, emerging markets). Cross-reference free and paid sources to validate critical data points. Monitor data gaps, such as the discontinued GCA warehouse stocks, and adapt by using alternative indicators.
The optimal strategy layers free foundational data with selective premium services targeting specific informational advantages, adjusted to trading timeframe, risk tolerance, and capital allocation. Systematic integration of alternative data beyond basic prices creates sustainable analytical edges in increasingly competitive coffee futures markets.