docs(inventory): pipeline tracker, scores, impl notes, FX section

- Replace Priority Summary Table with Pipeline Status Tracker: status
  (✅/🔲/⏸/—), score (1-5), credential requirements, and extractor refs
  for all 30+ sources
- Add implementation notes to §1.1 (Overpass), §1.2 (Playtomic tenants +
  availability), §5.1 (Eurostat urb_cpop1 + ilc_di03), §5.2 (Census), §5.3 (ONS)
- Update §8 DuckDB integration table with extractor names and status
- Add §10 FX / Currency Rates: ECB SDMX endpoint and Frankfurter.app wrapper,
  proposed landing format and stg_fx_rates staging model design

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Deeman
2026-02-24 01:33:32 +01:00
parent 86539e72b9
commit 8bb00ea9b0


@@ -1,43 +1,51 @@
# Padel Market Intelligence — Data Sources Inventory # Padel Market Intelligence — Data Sources Inventory
Compiled: 2026-02-21 Compiled: 2026-02-21 · Updated: 2026-02-24
Purpose: Identify data sources to feed a DuckDB analytics pipeline for padel business intelligence. Purpose: Identify and track data sources feeding the Padelnomics DuckDB analytics pipeline.
--- ---
## Priority Summary Table ## Pipeline Status Tracker
Sorted by Priority (High first), then by category. **Status:** ✅ Ingested — extractor + staging model live in `master` | 🔲 Planned — worth building | ⏸ On hold — blocked on cost/access | — Not targeted
| Source | Category | Access Method | Priority | Notes | **Score (1–5):** Overall ingestion priority. Weighs data value to Padelnomics (market scores, financial planner, pSEO content) against implementation effort and access barriers. 5 = core infrastructure already ingested, 1 = marginal or inaccessible.
|--------|----------|---------------|----------|-------|
| OpenStreetMap / Overpass API | Court Locations | Public API | High | Free, global, `sport=padel` tag, no auth | | Source | Category | Status | Score | Credentials | Pipeline refs |
| Playtomic API (read-only) | Court Locations / Pricing | Public API | High | Some endpoints unauthenticated; official API needs club credentials | |--------|----------|--------|-------|-------------|---------------|
| Eurostat Statistics API | Demographics | Public API | High | Free, no auth, NUTS city-level data | | OpenStreetMap / Overpass | Court locations | ✅ Ingested | 5 | None | `extract-overpass` → `stg_padel_courts` |
| US Census Bureau API | Demographics | Public API | High | Free with API key, comprehensive | | Playtomic — tenants | Court locations | ✅ Ingested | 5 | None | `extract-playtomic-tenants` → `stg_playtomic_venues/resources/opening_hours` |
| ONS Beta API | Demographics | Public API | High | Free, no auth, 120 req/10 s limit | | Playtomic — availability | Pricing / utilisation | ✅ Ingested | 5 | None | `extract-playtomic-availability` → `stg_playtomic_availability` |
| FIP World Padel Report | Market Reports | Open Download | High | Free PDF; 2024 and 2025 editions available | | Eurostat `urb_cpop1` | Demographics — EU city population | ✅ Ingested | 5 | None | `extract-eurostat` → `stg_population` |
| Playtomic Global Padel Report | Market Reports | Open Download | High | Free PDF; co-produced with PwC/Strategy& | | Eurostat `ilc_di03` | Demographics — EU income | ✅ Ingested | 5 | None | `extract-eurostat` → `stg_income` |
| Sport England Active Lives | Demographics | Open Download | High | Free download; UK sports participation data | | Eurostat SDMX city labels | Demographics — EU city lookup | ✅ Ingested | 4 | None | `extract-eurostat-city-labels` → `stg_city_labels` |
| USPA Court Directory | Court Locations | Scrape | High | Website scrape; 100+ member clubs listed | | ONS UK mid-year estimates | Demographics — UK population | ✅ Ingested | 4 | None | `extract-ons-uk` → `stg_population_uk` |
| DPV Standorte (Germany) | Court Locations | Scrape | High | German federation venue page, small dataset | | US Census ACS 5-year | Demographics — US population | ✅ Ingested† | 3 | `CENSUS_API_KEY` (free) | `extract-census-usa` → `stg_population_usa` |
| LTA Padel Venue Finder | Court Locations | Scrape | High | UK venue registry; The Padel Directory also available | | GeoNames cities15000 | Demographics — global fallback | ✅ Ingested† | 3 | `GEONAMES_USERNAME` (free) | `extract-geonames` → `stg_population_geonames` |
| PadelAPI.org | Tournament Data | Public API | High | Free tier: 50k req, last 6 months of data | | ECB / Frankfurter.app | FX rates | 🔲 Planned | 4 | None | `extract-fx` → `stg_fx_rates` (proposed) |
| padelapi.org MCP server | Tournament Data | Public API | High | AI-accessible padel tournament & player stats | | FIP World Padel Report | Market reports | 🔲 Planned | 4 | None (PDF) | Annual seed table |
| Google Maps Places API | Court Locations | Public API | Medium | $200 free/mo credit; text search for padel courts | | PadelAPI.org | Tournament data | 🔲 Planned | 3 | Free-tier token | 50k req/mo |
| Playtomic third-party API | Pricing / Bookings | Public API | Medium | Club credential required; read-only; 1 req/min | | Sport England Active Lives | Demographics — UK participation | 🔲 Planned | 3 | None (CSV) | Annual download |
| ImmoScout24 API | Real Estate | Public API | Medium | Developer portal; commercial use; auth required | | DPV Standorte | Court locations | 🔲 Planned | 2 | None (scrape) | DE federation registry |
| Immowelt API | Real Estate | Public API | Medium | API documented; aggregator EstateSync also available | | LTA Padel Venue Finder | Court locations | 🔲 Planned | 2 | None (scrape) | UK venue registry |
| planning.data.gov.uk | Regulatory | Public API | Medium | UK planning data portal; some endpoints open | | USPA Club Directory | Court locations | 🔲 Planned | 2 | None (scrape) | US member clubs |
| FEP (Spanish federation) | Market Reports | Manual | Medium | Annual statistics published as press releases | | UK Planning Data Portal | Regulatory | 🔲 Planned | 2 | None | Planning permissions, sports use |
| Statista (padel topic page) | Market Reports | Subscription | Low | Some charts free; full data requires subscription | | Google Maps Places API | Court locations | ⏸ On hold | 2 | Paid ($200/mo credit) | Gap-fill for US/DE; data storage license required |
| Playskan.com | Pricing / Bookings | Scrape | Low | No public API; consumer site; ToS unclear | | ImmoScout24 API | Real estate — DE | ⏸ On hold | 2 | Partner account | Commercial rent benchmarks |
| CoStar / LoopNet | Real Estate | Subscription | Low | No public API; subscription only; scraping violates ToS | | Immowelt API | Real estate — DE | ⏸ On hold | 2 | Partner account | Commercial rent |
| Rightmove Commercial API | Real Estate | Subscription | Low | ADF partner program only; not open to arbitrary developers | | Rightmove Commercial | Real estate — UK | — | 1 | ADF partner only | Not accessible without partner agreement |
| JLL / CBRE Reports | Real Estate | Manual | Low | Published reports only; no API | | LoopNet / CoStar | Real estate — US/UK | — | 1 | Subscription | ToS prohibits scraping |
| Court Metrics | Pricing / Utilisation | Subscription | Low | Aggregated padel club competitive intelligence platform | | JLL / CBRE reports | Real estate | — | 1 | Manual (PDF) | Annual benchmark seed table only |
| Shovels.ai | Regulatory | Subscription | Low | US building permit intelligence; paid | | Statista | Market reports | — | 1 | Subscription | Primary data available from FIP/Playtomic for free |
| Matchi | Court Locations | Scrape | Low | No documented public API; consumer app | | Playskan | Pricing | — | 1 | No public API | Aggregates Playtomic/Matchi; go direct |
| Court Metrics | Pricing | — | 1 | Subscription | Derived from Playtomic signals |
| World Padel Rating | Tournament data | — | 1 | Scrape | Tournament venues only; limited utility |
| Matchi | Court locations | — | 1 | No public API | ToS prohibits scraping |
| GovData Germany | Regulatory | — | 1 | CKAN | Only aggregate permit counts available |
| Shovels.ai | Regulatory | — | 1 | Subscription | US only |
| Padel Biz Magazine | Market reports | — | 1 | Manual | No structured data |
† Extractor and staging model are live; placeholder file written when credentials absent. Set `CENSUS_API_KEY` / `GEONAMES_USERNAME` env vars to activate real data.
--- ---
@@ -70,6 +78,14 @@ Limitations: coverage is community-driven and incomplete in newer markets (Germa
OSM wiki: https://wiki.openstreetmap.org/wiki/Tag:sport=padel OSM wiki: https://wiki.openstreetmap.org/wiki/Tag:sport=padel
**Pipeline implementation:** ✅ Ingested
- Extractor: `extract-overpass` — single global query (all nodes/ways/relations with `sport=padel`), writes raw OSM JSON
- Landing: `data/landing/overpass/{year}/{month}/courts.json.gz`
- Staging: `staging.stg_padel_courts`, grain `osm_id`
- Columns: `osm_type, osm_id, lat, lon, name, country_code, city_tag, postcode, operator_name, opening_hours, fee`
- Cadence: monthly (OSM community changes are incremental; full re-query is cheap at ~1.5 MB response)
- No auth; query timeout set to 60 s in extractor
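A minimal sketch of how the global query and the flattening into the staging columns could look. The Overpass QL string, the function name `flatten_element`, and the tag-to-column mapping (`addr:country`, `addr:city`, `operator`, ...) are assumptions for illustration, not the code of `extract-overpass`.

```python
# Hypothetical sketch, not the actual extract-overpass code.
# `nwr` matches nodes, ways and relations; timeout mirrors the 60 s noted above.
OVERPASS_QUERY = """
[out:json][timeout:60];
nwr["sport"="padel"];
out tags center;
"""


def flatten_element(el: dict) -> dict:
    """Map one raw Overpass element to the stg_padel_courts columns."""
    tags = el.get("tags", {})
    # Nodes carry lat/lon directly; ways and relations expose a centroid
    # under "center" when the query asks for `out center`.
    point = el if el.get("type") == "node" else el.get("center", {})
    return {
        "osm_type": el.get("type"),
        "osm_id": el.get("id"),
        "lat": point.get("lat"),
        "lon": point.get("lon"),
        "name": tags.get("name"),
        "country_code": tags.get("addr:country"),  # assumed tag mapping
        "city_tag": tags.get("addr:city"),         # assumed tag mapping
        "postcode": tags.get("addr:postcode"),     # assumed tag mapping
        "operator_name": tags.get("operator"),
        "opening_hours": tags.get("opening_hours"),
        "fee": tags.get("fee"),
    }
```

Missing tags come back as `None`, which keeps the staging model free to decide how to treat sparse OSM records.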
--- ---
### 1.2 Playtomic API ### 1.2 Playtomic API
@@ -99,6 +115,22 @@ External API docs (Notion): https://playtomicio.notion.site/Playtomic-External-A
Playtomic covers 16,000+ courts globally. The platform is dominant in Spain, UK, France, Germany, and expanding in the US. Playtomic covers 16,000+ courts globally. The platform is dominant in Spain, UK, France, Germany, and expanding in the US.
**Pipeline implementation (tenants):** ✅ Ingested
- Extractor: `extract-playtomic-tenants` — paginated global scrape of `GET /v1/tenants?sport_ids=PADEL`, page size 100, up to 500 pages
- Landing: `data/landing/playtomic/{year}/{month}/tenants.json.gz` (~14K venues as of Feb 2026)
- Throttle: 2 s between pages; deduplicates on `tenant_id`
- Staging models (all grain `tenant_id` or `(tenant_id, resource_id)`):
- `stg_playtomic_venues` — venue metadata: name, address, city, country, coordinates, booking type, status
- `stg_playtomic_resources` — court resources per venue: resource type, sport, surface, indoor/outdoor
- `stg_playtomic_opening_hours` — operating hours per venue per day of week
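The pagination, throttle, and deduplication described above can be sketched roughly as follows. `iter_tenants` and the injected `fetch_page` callable are hypothetical names; the real extractor's HTTP layer and stop condition may differ.

```python
import time
from typing import Callable, Iterator


def iter_tenants(
    fetch_page: Callable[[int, int], list],
    page_size: int = 100,
    max_pages: int = 500,
    throttle_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield each tenant once, walking pages until one comes back short or empty.

    `fetch_page(page, size)` is a hypothetical callable that performs the
    GET /v1/tenants request and returns the decoded list of tenants.
    """
    seen = set()
    for page in range(max_pages):
        batch = fetch_page(page, page_size)
        if not batch:
            break
        for tenant in batch:
            tenant_id = tenant["tenant_id"]
            if tenant_id not in seen:  # deduplicate on tenant_id
                seen.add(tenant_id)
                yield tenant
        if len(batch) < page_size:  # short page: assume it is the last one
            break
        sleep(throttle_s)  # 2 s pause between pages
```

Injecting `fetch_page` and `sleep` keeps the loop testable without network access or real delays.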
**Pipeline implementation (availability):** ✅ Ingested
- Extractor: `extract-playtomic-availability` — reads tenant IDs from latest tenants file, queries `GET /v1/availability` for next-day slots per venue
- Landing: `data/landing/playtomic/{year}/{month}/{date}/availability_morning.json.gz` + `availability_recheck.json.gz`
- Recheck mode: re-queries slots starting within 90 min (controlled by `RECHECK_WINDOW_MINUTES`); captures near-real-time fill rates
- Parallelism: `EXTRACT_WORKERS` env var; `PROXY_URLS` for distributed rate limiting; throttle 1 s per venue per worker
- Staging: `stg_playtomic_availability`, grain `(snapshot_date, tenant_id, resource_id, slot_start_time, snapshot_type, captured_at_utc)`
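As an illustration of the recheck window, the selection of slots to re-query might look like the sketch below. The slot dictionaries and the `slots_to_recheck` helper are hypothetical; only the `RECHECK_WINDOW_MINUTES` variable and the 90-minute default come from the notes above.

```python
import os
from datetime import datetime, timedelta

# Default of 90 minutes, overridable through the environment.
RECHECK_WINDOW_MINUTES = int(os.environ.get("RECHECK_WINDOW_MINUTES", "90"))


def slots_to_recheck(slots: list, now: datetime,
                     window_minutes: int = RECHECK_WINDOW_MINUTES) -> list:
    """Keep slots that start between `now` and `now + window_minutes`.

    Each slot is assumed to be a dict with a timezone-aware
    `slot_start_time` datetime (hypothetical shape).
    """
    cutoff = now + timedelta(minutes=window_minutes)
    return [s for s in slots if now <= s["slot_start_time"] <= cutoff]
```

Slots that have already started are skipped, since their fill state can no longer change in a way that matters for utilisation.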
--- ---
### 1.3 DPV — Deutscher Padel Verband Standorte ### 1.3 DPV — Deutscher Padel Verband Standorte
@@ -430,6 +462,16 @@ Key datasets:
The R `eurostat` package and Python `eurostat` library provide typed wrappers. Data is queryable at NUTS2/NUTS3 and city level using `geoLevel=city`. The R `eurostat` package and Python `eurostat` library provide typed wrappers. Data is queryable at NUTS2/NUTS3 and city level using `geoLevel=city`.
**Pipeline implementation:** ✅ Ingested
- Extractor: `extract-eurostat` — ETag deduplication (304 Not Modified skips the write; most runs are fast no-ops)
- Landing: `data/landing/eurostat/{year}/{month}/{dataset}.json.gz`
- Base URL: `https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/{datasetCode}`
- Datasets fetched:
- `urb_cpop1` (city population): filter `indic_ur=DE1001V` (Population on 1 January, total), `geoLevel=city` → staging `stg_population`, grain `(city_code, ref_year)`. City codes are Eurostat format (`DE001C`).
- `ilc_di03` (median income): filter `indic_il=MED_E`, `unit=PPS`, `sex=T`, `age=TOTAL` → staging `stg_income`, grain `(country_code, ref_year)`. Income in Purchasing Power Standards for cross-country comparability.
- City-code bridge: `extract-eurostat-city-labels` / `stg_city_labels` maps `DE001C → Berlin`. See §9.1 for the live implementation details (compact JSON response, not SDMX 2.1 spec).
- Used in: `foundation.dim_cities` (Eurostat population + income joined via city labels → market score)
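The ETag deduplication can be sketched with the standard library alone. The helper names and the idea of passing the previous ETag in from the caller are assumptions; how `extract-eurostat` actually persists ETags is not described here.

```python
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data"


def build_url(dataset: str, filters: dict) -> str:
    """Compose the dataset URL with its dimension filters as query parameters."""
    return f"{BASE_URL}/{dataset}?{urllib.parse.urlencode(filters)}"


def conditional_headers(previous_etag):
    """Send If-None-Match only when an ETag from an earlier run is known."""
    return {"If-None-Match": previous_etag} if previous_etag else {}


def fetch_if_changed(url: str, previous_etag=None, timeout: float = 60.0):
    """Return (body, etag); body is None when the server answers 304."""
    req = urllib.request.Request(url, headers=conditional_headers(previous_etag))
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read(), resp.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: skip the landing write
            return None, previous_etag
        raise
```

For example, `build_url("urb_cpop1", {"indic_ur": "DE1001V", "geoLevel": "city"})` yields the city-population request listed above.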
--- ---
### 5.2 US Census Bureau API ### 5.2 US Census Bureau API
@@ -447,6 +489,8 @@ The American Community Survey (ACS) API provides city and tract-level demographi
Relevant for US market expansion analysis. Relevant for US market expansion analysis.
**Pipeline implementation:** ✅ Ingested† — see §9.4 for full implementation details (endpoint, response format, place name parsing). Staging: `stg_population_usa`, grain `(place_fips, ref_year)`. Requires `CENSUS_API_KEY` env var; writes empty placeholder when absent.
--- ---
### 5.3 ONS Beta API (UK) ### 5.3 ONS Beta API (UK)
@@ -462,6 +506,8 @@ Relevant for US market expansion analysis.
The ONS Beta API at `https://api.beta.ons.gov.uk/v1` is open and unauthenticated. Rate limit: 120 requests/10 s, 200/min. Datasets include population estimates, deprivation indices, and 2021 census variables at MSOA/LAD level. Sports participation specifically comes from Sport England (see 5.4), not ONS directly. The ONS Beta API at `https://api.beta.ons.gov.uk/v1` is open and unauthenticated. Rate limit: 120 requests/10 s, 200/min. Datasets include population estimates, deprivation indices, and 2021 census variables at MSOA/LAD level. Sports participation specifically comes from Sport England (see 5.4), not ONS directly.
**Pipeline implementation:** ✅ Ingested — see §9.3 for full details (CSV download path, LAD code filtering, observations endpoint 404 bug). Staging: `stg_population_uk`, grain `(lad_code, ref_year)`. No credentials required.
--- ---
### 5.4 Sport England — Active Lives Survey ### 5.4 Sport England — Active Lives Survey
@@ -557,17 +603,23 @@ Token-based REST API. Free tier includes 50k requests/month and last 6 months of
### Recommended ingestion patterns ### Recommended ingestion patterns
| Source | Ingestion Pattern | | Source | Ingestion Pattern | Extractor |
|--------|------------------| |--------|------------------|-----------|
| Eurostat API | `httpfs` + JSON → staging table; run weekly | | Overpass / OSM | Single global query → JSON.gz; run monthly | `extract-overpass` |
| Overpass API / OSM | Bulk `.osm.pbf` download via Geofabrik → DuckDB spatial extension; run monthly | | Playtomic tenants | Paginated global scrape → JSON.gz; run monthly | `extract-playtomic-tenants` |
| Playtomic unauthenticated API | Paginated scraper per city bounding box → Parquet; run nightly | | Playtomic availability | Per-venue slot query → JSON.gz; run daily | `extract-playtomic-availability` |
| FIP / Playtomic PDFs | Manual parse → CSV seed files; run annually | | Eurostat `urb_cpop1` + `ilc_di03` | SDMX REST + ETag dedup → JSON.gz; run monthly | `extract-eurostat` |
| US Census ACS | `httpfs` REST → staging; run annually | | Eurostat SDMX city labels | Codelist fetch + ETag dedup → JSON.gz; run monthly | `extract-eurostat-city-labels` |
| ONS Beta API | `httpfs` REST → staging; run annually | | ONS UK mid-year estimates | CSV download (~68 MB) → JSON.gz; run annually | `extract-ons-uk` |
| Sport England CSV | Manual download → seed file; run annually | | US Census ACS 5-year | REST → JSON.gz; run annually | `extract-census-usa` ✅† |
| ImmoScout24 / Immowelt | API → staging (requires partner account); run monthly | | GeoNames cities15000 | Bulk zip download → JSON.gz; run monthly | `extract-geonames` ✅† |
| planning.data.gov.uk | REST API → staging; run weekly for new permissions | | ECB / Frankfurter.app FX | REST → JSON.gz; run daily or monthly | `extract-fx` 🔲 planned |
| FIP / Playtomic PDFs | Manual parse → CSV seed files; run annually | — |
| Sport England CSV | Manual download → seed file; run annually | — |
| ImmoScout24 / Immowelt | API → staging (requires partner account); run monthly | — |
| planning.data.gov.uk | REST API → staging; run weekly for new permissions | — |
† Placeholder file written when credentials absent; set `CENSUS_API_KEY` / `GEONAMES_USERNAME` to activate.
### Key technical constraints ### Key technical constraints
@@ -721,6 +773,81 @@ Population cascade in `dim_cities`: Eurostat → US Census → ONS → GeoNames
--- ---
## 10. FX / Currency Rates
Needed for two purposes:
1. **Cross-market normalisation** — Playtomic venue prices are in local currency (GBP for UK, USD for US, EUR for eurozone). Benchmarking court rates across countries requires a common base.
2. **Financial planner display** — the planner currently shows symbols (€/£/$) per country but applies no conversion. FX rates would let users toggle a "view in EUR" mode, or auto-convert EUR benchmark figures to the investor's local currency.
---
### 10.1 European Central Bank (ECB) Data Portal
| Field | Value |
|-------|-------|
| URL | https://data-api.ecb.europa.eu/service/data/EXR |
| Data Type | Daily exchange rates, EUR as base currency |
| Access Method | Public SDMX REST API |
| Credentials | None |
| Update Frequency | Daily (business days) |
| License | Public domain |
| Score | **4** |
| Status | 🔲 Planned |
ECB publishes official daily reference rates for ~30 currencies against EUR via SDMX. Free, unauthenticated, stable.
```
GET https://data-api.ecb.europa.eu/service/data/EXR/D.USD+GBP+CHF+SEK+AED.EUR.SP00.A
?format=jsondata&lastNObservations=1
```
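Returns the most recent observation per currency pair. The SDMX JSON response is nested; rates live at `dataSets[0].series["{key}"].observations["0"][0]`, where `{key}` encodes the index positions of the series dimensions in key order (frequency, currency, denominator, rate type, suffix). Only the currency index varies in this query, so the keys are `0:0:0:0:0`, `0:1:0:0:0`, …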
Key series for Padelnomics:
- `D.USD.EUR.SP00.A` — EUR/USD
- `D.GBP.EUR.SP00.A` — EUR/GBP
- `D.CHF.EUR.SP00.A` — EUR/CHF (Switzerland)
- `D.SEK.EUR.SP00.A` — EUR/SEK (Sweden)
- `D.AED.EUR.SP00.A` — EUR/AED (UAE)
**Note:** ECB only provides EUR-base rates. Cross rates must be derived from two EUR legs, e.g. USD/GBP (GBP per 1 USD): `rate = eur_gbp / eur_usd`.
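A rough sketch of unpacking the SDMX JSON response and deriving a cross rate. It assumes the payload lists its series dimensions under `structure.dimensions.series` with a `CURRENCY` entry; `parse_ecb_rates` and `cross_rate` are hypothetical helper names.

```python
def parse_ecb_rates(payload: dict) -> dict:
    """Return {currency: units of that currency per 1 EUR} from an SDMX JSON payload."""
    series_dims = payload["structure"]["dimensions"]["series"]
    # Locate the CURRENCY dimension instead of hard-coding its position.
    cur_pos = next(i for i, d in enumerate(series_dims) if d["id"] == "CURRENCY")
    currencies = [v["id"] for v in series_dims[cur_pos]["values"]]

    rates = {}
    for key, series in payload["dataSets"][0]["series"].items():
        cur_index = int(key.split(":")[cur_pos])
        # With lastNObservations=1 each series holds a single observation;
        # the rate is the first element of its value array.
        observation = next(iter(series["observations"].values()))
        rates[currencies[cur_index]] = float(observation[0])
    return rates


def cross_rate(rates: dict, base: str, quote: str) -> float:
    """Units of `quote` per 1 unit of `base`, derived from two EUR-base rates."""
    return rates[quote] / rates[base]
```

With EUR/USD and EUR/GBP loaded, `cross_rate(rates, "USD", "GBP")` computes `eur_gbp / eur_usd`, i.e. GBP per 1 USD.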
---
### 10.2 Frankfurter.app
| Field | Value |
|-------|-------|
| URL | https://api.frankfurter.app |
| Data Type | Daily exchange rates (ECB data re-served) |
| Access Method | Public REST API |
| Credentials | None |
| Update Frequency | Daily |
| License | MIT (open source) |
| Score | **4** |
| Status | 🔲 Planned |
Frankfurter is an open-source wrapper around ECB data with a simpler interface than the raw SDMX endpoint. No auth, no documented rate limit. Preferred for implementation simplicity; self-host the open-source version if uptime SLA becomes a concern.
```
GET https://api.frankfurter.app/latest?from=EUR&to=USD,GBP,CHF,SEK,AED
```
Response:
```json
{"amount": 1.0, "base": "EUR", "date": "2026-02-24",
"rates": {"USD": 1.0531, "GBP": 0.8412, "CHF": 0.9374, "SEK": 10.932, "AED": 3.8669}}
```
**Proposed pipeline:**
- Landing: `data/landing/fx/{year}/{month}/{date}/rates.json.gz`
- Format: `{"date": "2026-02-24", "base": "EUR", "rates": {"USD": 1.05, "GBP": 0.84, ...}}`
- Cadence: daily (or monthly — rates change daily but the pipeline only needs monthly snapshots for historical benchmarking)
- Staging: `staging.stg_fx_rates`, grain `(date, quote_currency)` — columns: `date, base_currency ('EUR'), quote_currency, rate`
- Downstream: join to `stg_playtomic_availability` price column to normalise to EUR; expose latest rate to planner for display conversion
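One possible shape for the staging and conversion logic, sketched in Python under the proposed grain `(date, quote_currency)`. `to_fx_rows` and `to_eur` are hypothetical helpers; the actual `stg_fx_rates` model has not been built yet.

```python
def to_fx_rows(payload: dict) -> list:
    """Flatten one Frankfurter-style response into (date, quote_currency) rows."""
    return [
        {
            "date": payload["date"],
            "base_currency": payload["base"],
            "quote_currency": quote,
            "rate": float(rate),
        }
        for quote, rate in sorted(payload["rates"].items())
    ]


def to_eur(amount: float, currency: str, rates: dict) -> float:
    """Convert a local-currency amount to EUR.

    `rates` holds units of local currency per 1 EUR, so the local amount
    is divided by the rate.
    """
    if currency == "EUR":
        return amount
    return amount / rates[currency]
```

Because the rates are quoted per 1 EUR, converting a GBP court price to EUR is a division by the GBP rate, not a multiplication.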
---
## Sources ## Sources
- [Reverse Engineering Playtomic](https://mattrighetti.com/2025/03/03/reverse-engineering-playtomic) - [Reverse Engineering Playtomic](https://mattrighetti.com/2025/03/03/reverse-engineering-playtomic)