padelnomics/docs/data-sources-inventory.md
Deeman b33dd51d76 feat: standardise recheck availability to JSONL output
- extract_recheck() now writes availability_{date}_recheck_{HH}.jsonl.gz
  (one venue per line with date/captured_at_utc/recheck_hour injected);
  uses compress_jsonl_atomic; removes write_gzip_atomic import
- stg_playtomic_availability: add recheck_jsonl CTE (newline_delimited
  read_json on *.jsonl.gz recheck files); include in all_venues UNION ALL;
  old recheck_blob CTE kept for transition
- init_landing_seeds.py: add JSONL recheck seed alongside blob seed
- Docs: README landing structure + data sources table updated; CHANGELOG
  availability bullets updated; data-sources-inventory paths corrected

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-25 14:52:47 +01:00


Padel Market Intelligence — Data Sources Inventory

Compiled: 2026-02-21 · Updated: 2026-02-24
Purpose: Identify and track data sources feeding the Padelnomics DuckDB analytics pipeline.


Pipeline Status Tracker

Status: Ingested — extractor + staging model live in master | 🔲 Planned — worth building | ⏸ On hold — blocked on cost/access | — Not targeted

Score (1–5): Overall ingestion priority. Weighs data value to Padelnomics (market scores, financial planner, pSEO content) against implementation effort and access barriers. 5 = core infrastructure already ingested, 1 = marginal or inaccessible.

| Source | Category | Status | Score | Credentials | Pipeline refs |
|---|---|---|---|---|---|
| OpenStreetMap / Overpass (padel) | Court locations | Ingested | 5 | None | extract-overpass → stg_padel_courts |
| OpenStreetMap / Overpass (tennis) | Court locations | Ingested | 4 | None | extract-overpass-tennis → stg_tennis_courts |
| Playtomic — tenants | Court locations | Ingested | 5 | None | extract-playtomic-tenants → stg_playtomic_venues/resources/opening_hours |
| Playtomic — availability | Pricing / utilisation | Ingested | 5 | None | extract-playtomic-availability → stg_playtomic_availability |
| Eurostat urb_cpop1 | Demographics — EU city population | Ingested | 5 | None | extract-eurostat → stg_population |
| Eurostat ilc_di03 | Demographics — EU income | Ingested | 5 | None | extract-eurostat → stg_income |
| Eurostat SDMX city labels | Demographics — EU city lookup | Ingested | 4 | None | extract-eurostat-city-labels → stg_city_labels |
| ONS UK mid-year estimates | Demographics — UK population | Ingested | 4 | None | extract-ons-uk → stg_population_uk |
| US Census ACS 5-year | Demographics — US population | Ingested† | 3 | CENSUS_API_KEY (free) | extract-census-usa → stg_population_usa |
| GeoNames cities1000 | Demographics — global locations ≥1K pop | Ingested† | 4 | GEONAMES_USERNAME=padelnomics (free) | extract-geonames → stg_population_geonames → dim_locations |
| ECB / Frankfurter.app | FX rates | 🔲 Planned | 4 | None | extract-fx → stg_fx_rates (proposed) |
| FIP World Padel Report | Market reports | 🔲 Planned | 4 | None (PDF) | Annual seed table |
| PadelAPI.org | Tournament data | 🔲 Planned | 3 | Free-tier token | 50k req/mo |
| Sport England Active Lives | Demographics — UK participation | 🔲 Planned | 3 | None (CSV) | Annual download |
| DPV Standorte | Court locations | 🔲 Planned | 2 | None (scrape) | DE federation registry |
| LTA Padel Venue Finder | Court locations | 🔲 Planned | 2 | None (scrape) | UK venue registry |
| USPA Club Directory | Court locations | 🔲 Planned | 2 | None (scrape) | US member clubs |
| UK Planning Data Portal | Regulatory | 🔲 Planned | 2 | None | Planning permissions, sports use |
| Google Maps Places API | Court locations | ⏸ On hold | 2 | Paid ($200/mo credit) | Gap-fill for US/DE; data storage license required |
| ImmoScout24 API | Real estate — DE | ⏸ On hold | 2 | Partner account | Commercial rent benchmarks |
| Immowelt API | Real estate — DE | ⏸ On hold | 2 | Partner account | Commercial rent |
| Rightmove Commercial | Real estate — UK | — | 1 | ADF partner only | Not accessible without partner agreement |
| LoopNet / CoStar | Real estate — US/UK | — | 1 | Subscription | ToS prohibits scraping |
| JLL / CBRE reports | Real estate | — | 1 | Manual (PDF) | Annual benchmark seed table only |
| Statista | Market reports | — | 1 | Subscription | Primary data available from FIP/Playtomic for free |
| Playskan | Pricing | — | 1 | No public API | Aggregates Playtomic/Matchi; go direct |
| Court Metrics | Pricing | — | 1 | Subscription | Derived from Playtomic signals |
| World Padel Rating | Tournament data | — | 1 | Scrape | Tournament venues only; limited utility |
| Matchi | Court locations | — | 1 | No public API | ToS prohibits scraping |
| GovData Germany | Regulatory | — | 1 | CKAN | Only aggregate permit counts available |
| Shovels.ai | Regulatory | — | 1 | Subscription | US only |
| Padel Biz Magazine | Market reports | — | 1 | Manual | No structured data |

† Extractor and staging model are live; placeholder file written when credentials absent. Set CENSUS_API_KEY / GEONAMES_USERNAME env vars to activate real data.


1. Court Location & Registry Data

1.1 OpenStreetMap — Overpass API

Field Value
URL https://overpass-turbo.eu / https://overpass-api.de/api/
Data Type Geographic — padel court locations, geometry, names, addresses
Access Method Public API
Update Frequency Continuous (community-edited)
License / TOS ODbL — open use with attribution
Priority High

OSM uses the tag sport=padel on leisure=sports_centre or leisure=pitch nodes. The Overpass API is free, unauthenticated, and globally scoped. Query example:

[out:json];
(
  node["sport"="padel"];
  way["sport"="padel"];
  relation["sport"="padel"];
);
out body;

Limitations: coverage is community-driven and incomplete in newer markets (Germany, US). Spain and UK coverage is reasonable. Data can be downloaded in bulk as .osm.pbf files from Geofabrik for a full DuckDB load.

OSM wiki: https://wiki.openstreetmap.org/wiki/Tag:sport=padel

Pipeline implementation: Ingested

  • Extractor: extract-overpass — single global query (all nodes/ways/relations with sport=padel), writes raw OSM JSON
  • Landing: data/landing/overpass/{year}/{month}/courts.json.gz
  • Staging: staging.stg_padel_courts, grain osm_id
  • Columns: osm_type, osm_id, lat, lon, name, country_code, city_tag, postcode, operator_name, opening_hours, fee
  • Cadence: monthly (OSM community changes are incremental; full re-query is cheap at ~1.5 MB response)
  • No auth; query timeout set to 60 s in extractor
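
As a rough illustration of how the raw Overpass JSON might be flattened toward the staging columns listed above, here is a minimal sketch. The function name `flatten_overpass` and the tag-to-column mapping (`addr:postcode`, `opening_hours`, `fee`) are assumptions for illustration, not the actual staging logic. Note that ways and relations only carry coordinates if the query requests them (for example with `out center;`).

```python
def flatten_overpass(payload: dict) -> list[dict]:
    """Flatten Overpass JSON elements into one row per OSM object.

    Hypothetical helper: the column mapping is an assumption, not the
    real stg_padel_courts definition.
    """
    rows = []
    for el in payload.get("elements", []):
        tags = el.get("tags", {})
        # Nodes carry lat/lon directly; ways and relations only have a
        # "center" object when the query asks for it (e.g. `out center;`).
        center = el.get("center", {})
        rows.append({
            "osm_type": el["type"],
            "osm_id": el["id"],
            "lat": el.get("lat", center.get("lat")),
            "lon": el.get("lon", center.get("lon")),
            "name": tags.get("name"),
            "postcode": tags.get("addr:postcode"),
            "opening_hours": tags.get("opening_hours"),
            "fee": tags.get("fee"),
        })
    return rows
```

Rows without coordinates come back with `lat`/`lon` set to `None`, which makes gaps easy to filter downstream.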

1.2 Playtomic API

Field Value
URL https://third-party.playtomic.io / https://api.playtomic.io/v1
Data Type Venues (tenants), court availability, pricing slots
Access Method Public API (some endpoints); club credentials (official third-party API)
Update Frequency Real-time
License / TOS Playtomic ToS — data may not be redistributed; read-only API
Priority High (unauthenticated availability), Medium (official API)

Two access tiers exist:

Unauthenticated endpoints (confirmed via reverse engineering, March 2025):

  • GET /v1/availability?sport_id=PADEL&start_min=...&start_max=...&tenant_id=... — max 25 h window per request
  • GET /v1/tenants?sport_ids=PADEL&... — tenant (venue) search by geo-bounds; no auth required

Official Third-Party API (credential-based):

  • Credentials generated in Playtomic Manager → Settings → Developer Tools
  • Requires Champion or Master subscription plan per tenant
  • Read-only; rate limit ~1 call/minute
  • Auth: POST https://api.playtomic.io/oauth/token with client_id, client_secret, grant_type

External API docs (Notion): https://playtomicio.notion.site/Playtomic-External-API-Documentation-v1-5-57430603e8324c7c9f69bb2c9327eb98

Playtomic covers 16,000+ courts globally. The platform is dominant in Spain, UK, France, Germany, and expanding in the US.

Pipeline implementation (tenants): Ingested

  • Extractor: extract-playtomic-tenants — paginated global scrape of GET /v1/tenants?sport_ids=PADEL, page size 100, up to 500 pages
  • Landing: data/landing/playtomic/{year}/{month}/tenants.jsonl.gz (~14K venues as of Feb 2026)
  • Throttle: 2 s between pages; deduplicates on tenant_id
  • Staging models (all grain tenant_id or (tenant_id, resource_id)):
    • stg_playtomic_venues — venue metadata: name, address, city, country, coordinates, booking type, status
    • stg_playtomic_resources — court resources per venue: resource type, sport, surface, indoor/outdoor
    • stg_playtomic_opening_hours — operating hours per venue per day of week
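
The tenant extractor's paging behaviour (page size 100, up to 500 pages, 2 s throttle, deduplication on tenant_id) could be sketched roughly as follows. This is not the extractor's code: `fetch_page` is a hypothetical callable standing in for the HTTP request, and stopping on the first empty page is an assumption.

```python
import time
from typing import Callable, Iterator


def iter_tenants(
    fetch_page: Callable[[int, int], list],
    page_size: int = 100,
    max_pages: int = 500,
    throttle_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield each tenant once, walking pages until one comes back empty."""
    seen = set()
    for page in range(max_pages):
        batch = fetch_page(page, page_size)  # hypothetical HTTP wrapper
        if not batch:
            break  # assumption: an empty page means no more results
        for tenant in batch:
            tenant_id = tenant["tenant_id"]
            if tenant_id not in seen:  # drop repeats across pages
                seen.add(tenant_id)
                yield tenant
        sleep(throttle_s)  # pause between pages
```

Passing `sleep` as a parameter keeps the loop testable without real delays.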

Pipeline implementation (availability): Ingested

  • Extractor: extract-playtomic-availability — reads tenant IDs from latest tenants file, queries GET /v1/availability for next-day slots per venue
  • Landing: data/landing/playtomic/{year}/{month}/availability_{date}.jsonl.gz (morning) + availability_{date}_recheck_{HH}.jsonl.gz (recheck)
  • Old blob format (.json.gz) retained in landing zone alongside JSONL; staging reads both
  • Recheck mode: re-queries slots starting within RECHECK_WINDOW_MINUTES (default 30); captures near-real-time fill rates
  • Parallelism: worker count derived from PROXY_URLS length; throttle 1 s per venue per worker
  • Staging: stg_playtomic_availability, grain (snapshot_date, tenant_id, resource_id, slot_start_time, snapshot_type, captured_at_utc)
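
To make the recheck landing format concrete, the sketch below writes one venue per line to availability_{date}_recheck_{HH}.jsonl.gz with date, captured_at_utc, and recheck_hour injected into each record. It is only an approximation under stated assumptions: `write_recheck_jsonl` is a hypothetical name, the integer type of `recheck_hour` is a guess, and the temp-file-plus-rename step stands in for the project's own atomic write helper, whose behaviour is not shown in this document.

```python
import gzip
import json
import os
from datetime import datetime
from pathlib import Path


def write_recheck_jsonl(venues: list, landing_dir: Path,
                        date: str, captured_at: datetime) -> Path:
    """Write one venue per line, gzip-compressed, via a temp file."""
    hour = f"{captured_at.hour:02d}"
    target = landing_dir / f"availability_{date}_recheck_{hour}.jsonl.gz"
    tmp = target.with_name(target.name + ".tmp")
    with gzip.open(tmp, "wt", encoding="utf-8") as fh:
        for venue in venues:
            record = {
                **venue,
                "date": date,
                "captured_at_utc": captured_at.isoformat(),
                "recheck_hour": captured_at.hour,  # assumed to be an int
            }
            fh.write(json.dumps(record) + "\n")
    # Rename is atomic on the same filesystem, so readers never see a
    # half-written file.
    os.replace(tmp, target)
    return target
```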

1.3 DPV — Deutscher Padel Verband Standorte

Field Value
URL https://www.dpv-padel.de/standorte-2/
Data Type German federation-registered padel venues
Access Method Scrape
Update Frequency Periodic (federation-managed)
License / TOS No explicit open data license; scrape for internal use only
Priority High

The DPV "Standorte" page lists DPV-registered venues in Germany. No API exists. The dataset is small (Germany has ~875 courts as of end 2025) and can be scraped as a one-time or periodic snapshot.


1.4 LTA Padel Venue Finder (UK)

Field Value
URL https://www.ltapadel.org.uk/play/find-a-padel-court/
Data Type UK padel venue registry; court count, location, facilities
Access Method Scrape
Update Frequency Ongoing (LTA maintains registration program)
License / TOS No public API or data license; scrape for internal use
Priority High

The LTA runs a venue registration program. As of July 2025, the UK has 1,000+ courts across 325 venues. The Padel Directory (https://www.thepadeldirectory.co.uk/) is an alternative aggregator with filtering. Neither offers a public API.


1.5 USPA US Padel Club Directory

Field Value
URL https://padelusa.org/us-padel-clubs/
Data Type US padel club registry; name, city, state
Access Method Scrape
Update Frequency Periodic
License / TOS No explicit license; internal use scrape
Priority High

The USPA lists 100+ member clubs with city/state. The dataset is small enough for a one-time scrape plus quarterly refresh. Only USPA member clubs are listed — not comprehensive for all US courts.


1.6 Google Maps Places API

Field Value
URL https://developers.google.com/maps/documentation/places/web-service/overview
Data Type Business name, address, coordinates, ratings, opening hours
Access Method Public API (paid)
Update Frequency Real-time
License / TOS Google Maps Platform ToS — data cannot be stored beyond caching limits without a license
Priority Medium

Text Search ("padel court" + city) returns POI data including address and rating. Pricing from March 2025: Essentials tier free up to 10,000 events/month; Text Search (Basic) ~$0.04/request beyond that. Storing results in a database requires a Maps Data Export license agreement.

Use case: gap-fill where OSM or federation data is absent, particularly for US venues.


1.7 World Padel Rating

Field Value
URL https://app.worldpadelrating.com/tournaments
Data Type Player rankings, tournament venues
Access Method Scrape
Update Frequency After each tournament
License / TOS No public API documented
Priority Low

Tournament venue data, not a comprehensive court registry. Limited utility for location intelligence.


1.8 Matchi (Racket Sports Booking)

Field Value
URL https://www.matchi.se
Data Type Venue listings, court availability (Scandinavia, some EU)
Access Method Scrape
Update Frequency Real-time
License / TOS No public API; ToS prohibits scraping
Priority Low

Matchi is a Playtomic competitor popular in Sweden and northern Europe. Used by Playskan as a data source. No documented public API found.


2. Pricing & Revenue Data

2.1 Playtomic — Public Availability & Pricing

(See 1.2 above for API details.)

The unauthenticated /v1/availability endpoint returns time slots with prices visible to consumers. This enables per-venue, per-city price benchmarking without club credentials. Max 25 h window per request; rate limit must be respected.
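
Because of the 25 h cap, any longer date range has to be split into several requests. A minimal sketch of that splitting is shown below; `split_windows` is a hypothetical helper, and how the window bounds are serialised into start_min / start_max is left to the caller because the parameter format is not given here.

```python
from datetime import datetime, timedelta

MAX_WINDOW = timedelta(hours=25)  # per-request cap stated above


def split_windows(start: datetime, end: datetime,
                  max_window: timedelta = MAX_WINDOW) -> list:
    """Split [start, end) into consecutive windows of at most max_window."""
    windows = []
    cursor = start
    while cursor < end:
        stop = min(cursor + max_window, end)
        windows.append((cursor, stop))
        cursor = stop
    return windows
```

A 48-hour range, for instance, yields one 25 h window followed by one 23 h window.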


2.2 Playskan.com

Field Value
URL https://www.playskan.com
Data Type Aggregated court availability + price ranges across Playtomic, Matchi, Padel Mates (UK-focused)
Access Method Scrape
Update Frequency Real-time (consumer UI)
License / TOS No public API; built on FastAPI + DynamoDB internally; ToS unclear
Priority Low

Playskan is the world's first padel booking aggregator (UK + some EU), described as "Skyscanner for padel courts." The platform itself aggregates Playtomic, Matchi, and Padel Mates data. It offers a calendar view with availability counts and price ranges. No public API; the backend is internal. Going directly to source APIs (Playtomic, Matchi) is preferred.


2.3 Court Metrics

Field Value
URL https://courtmetrics.io
Data Type Estimated booking revenue, pricing, utilisation signals, Google Maps reputation per club
Access Method Subscription
Update Frequency Ongoing
License / TOS Commercial SaaS; data derived from public booking platform signals
Priority Low

Court Metrics is a padel-specific competitive intelligence SaaS that aggregates publicly visible pricing and availability data plus Google Maps signals. It provides estimated revenue per competitor club. Useful as a benchmark check but adds cost as a dependency. Data source is ultimately the same public Playtomic/booking signals.


3. Market Growth & Industry Reports

3.1 FIP World Padel Report

Field Value
URL https://www.padelfip.com/world-padel-report-2025/
Data Type Global court counts, player numbers, federation membership by country, tournament stats
Access Method Open Download (PDF)
Update Frequency Annual (December)
License / TOS FIP copyright; citations permitted
Priority High

Published annually by the International Padel Federation. The 2025 edition (released December 2025) reports 77,300 courts globally (+15.2%), 35M+ players, 100 member federations. The 2024 edition PDF is available directly from the Danish Padel Federation and other mirrors.

Direct PDF (2024): https://padelfip.com/pdf/WORLD_PADEL_REPORT_2024_FIP.pdf


3.2 Playtomic Global Padel Report

Field Value
URL https://playtomic.com/global-padel-report
Data Type Global court counts, club growth rates, booking patterns, country-level breakdowns
Access Method Open Download (PDF)
Update Frequency Annual
License / TOS Playtomic/PwC copyright; citations permitted
Priority High

Co-produced with PwC's Strategy& arm. The 2025 report covers 50,000+ courts globally, 3,282 new clubs in 2024 (avg 9/day), 92% player return rate. Free download at playtomic.com/global-padel-report.

PDF mirror: https://www.padeladdict.com/wp-content/uploads/2025/07/PLAYTOMIC_GLOBAL-_PADEL_REPORT_2025.pdf


3.3 Playtomic Global Padel Report 2023 (Deloitte)

Field Value
URL https://www.scribd.com/document/714549135/202306-Global-Padel-Report-2023
Data Type Market valuation (€1.775B global padel club market as of 2023), growth projections
Access Method Open Download (PDF)
Update Frequency One-time (2023 edition)
License / TOS Playtomic/Deloitte copyright
Priority Medium

Earlier edition co-produced with Deloitte. Contains baseline market valuation data useful for DuckDB time series.


3.4 Statista — Padel Topic Page

Field Value
URL https://www.statista.com/topics/12528/padel/
Data Type Market size, equipment revenue, country stats, consumer survey results
Access Method Subscription (some charts free)
Update Frequency Irregular
License / TOS Statista commercial license required for data export
Priority Low

The padel topic page aggregates third-party data (FIP, Playtomic, national federations). The underlying data is available from primary sources for free; Statista adds presentation but no original data collection.


3.5 Padel Biz Magazine Newsletter

Field Value
URL https://newsletter.padelbusinessmagazine.com
Data Type Market statistics digests, court count updates, industry news
Access Method Open (newsletter / web)
Update Frequency Weekly
License / TOS Editorial content; no API
Priority Medium

Regularly aggregates and re-publishes FIP/Playtomic data with added context. Useful for tracking new report releases. No structured data.


3.6 Misitrano Consulting — State of Padel in the US 2025

Field Value
URL https://www.misitranoconsulting.com/us-padel-report
Data Type US market size, court counts by state, investment trends
Access Method Manual (form download)
Update Frequency Annual
License / TOS Commercial consulting report
Priority Medium

US-specific market sizing. Gated but free download. Useful supplement to FIP for the US segment.


4. Commercial Real Estate Data

4.1 ImmoScout24 API (Germany)

Field Value
URL https://api.immobilienscout24.de
Data Type Commercial property listings; rent/price per sqm by location
Access Method Public API (requires registration + commercial use agreement)
Update Frequency Real-time (listings)
License / TOS Commercial use only; API ToS prohibits redistribution
Priority Medium

ImmoScout24 has a documented developer portal with Import/Export API, search API, and a Market Data endpoint. Sandbox access is available via registration. The Market Data API provides price/rent indices. Authentication via OAuth 1.0/2.0. Primarily targets real estate agents and portals, but market data endpoints are accessible to analytics users with a commercial account.


4.2 Immowelt API (Germany)

Field Value
URL https://www.immowelt.de/anbieten/gewerbe/apitechdoku
Data Type Commercial rental listings; warehouse and industrial space
Access Method Public API (requires partner registration)
Update Frequency Real-time
License / TOS Partner/commercial agreement required
Priority Medium

Immowelt documents an API for commercial listings. EstateSync (https://estatesync.com/en/) provides a wrapper REST API for both ImmoScout24 and Immowelt. Useful for tracking warehouse/industrial rental rates as a proxy for padel hall fit-out costs.


4.3 Rightmove Commercial Listings API (UK)

Field Value
URL https://api-docs.rightmove.co.uk/docs/property-feed-api-product/1/overview
Data Type Commercial property listings; rents, sizes, locations
Access Method Subscription (ADF partner program)
Update Frequency Real-time
License / TOS Rightmove ADF partner agreement; not open to arbitrary developers
Priority Low

Rightmove's API (ADF format) is restricted to partner estate agents and portals. Contact adfsupport@rightmove.co.uk for access. Not practically accessible for a startup analytics pipeline without a commercial relationship.


4.4 LoopNet / CoStar (US + UK)

Field Value
URL https://www.loopnet.com / https://www.costar.com
Data Type Commercial real estate listings, market analytics
Access Method Subscription
Update Frequency Real-time
License / TOS Proprietary; no public API; scraping violates ToS
Priority Low

CoStar acquired LoopNet and operates both as subscription services. No public API exists. A previous government-mandated data sharing arrangement (post-LoopNet acquisition) with Xceligent collapsed after copyright violations. For US commercial rent benchmarks, manual extraction or a CoStar institutional subscription is the only legitimate path.


4.5 JLL / CBRE Market Reports

Field Value
URL https://www.jll.com/en-de/insights / https://www.cbre.com/insights
Data Type Commercial real estate market indices; industrial/warehouse rents; European market outlook
Access Method Manual (PDF reports)
Update Frequency Quarterly
License / TOS Copyright; no API; PDF/HTML reports only
Priority Low

JLL and CBRE publish free quarterly market reports for Germany, UK, and other EU markets covering industrial/warehouse rents. These are manually downloaded PDFs — no structured data export. Useful for one-time benchmarks to seed DuckDB reference tables.

JLL Germany Q4 2025 Investment Market: https://www.jll.com/en-de/insights/market-dynamics/germany-investment


5. Demographics & Socioeconomics

5.1 Eurostat Statistics API

Field Value
URL https://ec.europa.eu/eurostat/web/user-guides/data-browser/api-data-access/api-introduction
Data Type Population, income, sports participation (EHIS), NUTS city-level
Access Method Public API
Update Frequency Annual to multi-year (survey-dependent)
License / TOS CC BY 4.0 — free use with attribution
Priority High

Eurostat's Statistics API (SDMX 3.0 + REST) is free and unauthenticated. Base URL: https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/{datasetCode}.

Key datasets:

  • sprt_pcs — sport participation by country (from EHIS; wave 4 expected 2025/26)
  • urb_cpop1 — city population statistics (NUTS LAU)
  • ilc_di03 — median equivalised net income by NUTS2

The R eurostat package and Python eurostat library provide typed wrappers. Data is queryable at NUTS2/NUTS3 and city level using geoLevel=city.

Pipeline implementation: Ingested

  • Extractor: extract-eurostat — ETag deduplication (304 Not Modified skips the write; most runs are fast no-ops)
  • Landing: data/landing/eurostat/{year}/{month}/{dataset}.json.gz
  • Base URL: https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/{datasetCode}
  • Datasets fetched:
    • urb_cpop1 (city population): filter indic_ur=DE1001V (Population on 1 January, total), geoLevel=city → staging stg_population, grain (city_code, ref_year). City codes are Eurostat format (DE001C).
    • ilc_di03 (median income): filter indic_il=MED_E, unit=PPS, sex=T, age=TOTAL → staging stg_income, grain (country_code, ref_year). Income in Purchasing Power Standards for cross-country comparability.
  • City-code bridge: extract-eurostat-city-labels / stg_city_labels maps DE001C → Berlin. See §9.1 for the live implementation details (compact JSON response, not SDMX 2.1 spec).
  • Used in: foundation.dim_cities (Eurostat population + income joined via city labels → market score)
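
The two dataset requests and the ETag check described above can be sketched as follows. The helper names (`dataset_url`, `conditional_headers`) are hypothetical; the base URL and filter values are the ones listed in this section. The idea is that a stored ETag is sent back as `If-None-Match`, and a 304 response then means the landing write can be skipped.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data"


def dataset_url(dataset_code: str, filters: dict) -> str:
    """Build a Statistics API URL with dimension filters as query params."""
    return f"{BASE_URL}/{dataset_code}?{urlencode(filters)}"


def conditional_headers(stored_etag: Optional[str]) -> dict:
    """Send the previous ETag so the server can answer 304 Not Modified."""
    return {"If-None-Match": stored_etag} if stored_etag else {}


POPULATION_URL = dataset_url(
    "urb_cpop1", {"indic_ur": "DE1001V", "geoLevel": "city"})
INCOME_URL = dataset_url(
    "ilc_di03", {"indic_il": "MED_E", "unit": "PPS", "sex": "T", "age": "TOTAL"})
```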

5.2 US Census Bureau API

Field Value
URL https://www.census.gov/data/developers.html
Data Type Population, income, age distribution (ACS 5-year), geographic boundaries
Access Method Public API (free API key)
Update Frequency Annual (ACS)
License / TOS Public domain (US federal government data)
Priority High

The American Community Survey (ACS) API provides city and tract-level demographics. Free API key required (no cost). Endpoint pattern: https://api.census.gov/data/2023/acs/acs5?get=B01003_001E,NAME&for=place:*&in=state:12.

Relevant for US market expansion analysis.

Pipeline implementation: Ingested† — see §9.4 for full implementation details (endpoint, response format, place name parsing). Staging: stg_population_usa, grain (place_fips, ref_year). Requires CENSUS_API_KEY env var; writes empty placeholder when absent.


5.3 ONS Beta API (UK)

Field Value
URL https://developer.ons.gov.uk
Data Type UK city/MSOA demographics, income, population
Access Method Public API
Update Frequency Annual
License / TOS Open Government Licence v3 — free use
Priority High

The ONS Beta API at https://api.beta.ons.gov.uk/v1 is open and unauthenticated. Rate limit: 120 requests/10 s, 200/min. Datasets include population estimates, deprivation indices, and 2021 census variables at MSOA/LAD level. Sports participation specifically comes from Sport England (see 5.4), not ONS directly.

Pipeline implementation: Ingested — see §9.3 for full details (CSV download path, LAD code filtering, observations endpoint 404 bug). Staging: stg_population_uk, grain (lad_code, ref_year). No credentials required.


5.4 Sport England — Active Lives Survey

Field Value
URL https://www.sportengland.org/research-and-data/data/active-lives
Data Type UK sports participation rates by sport, age, geography (local authority level)
Access Method Open Download (CSV/Excel)
Update Frequency Annual (April publication)
License / TOS Open Government Licence v3
Priority High

Active Lives is the UK's primary sports participation survey (~200,000 respondents/year). The November 2023–24 report was published April 2025. Data tables are downloadable from Sport England's website. The UK Data Service also holds microdata for detailed analysis. Sports classification does not yet include padel as a standalone category, but racket sports and physical activity levels at local authority level are relevant for site selection.

Interactive explorer: https://activelives.sportengland.org/


5.5 Statista (Sports Market Data)

(See 3.4 above — same platform, subscription required for export.)


6. Regulatory & Zoning

6.1 UK Planning Data Portal

Field Value
URL https://www.planning.data.gov.uk
Data Type Planning applications, permissions, land use data (England)
Access Method Public API
Update Frequency Ongoing
License / TOS Open Government Licence v3
Priority Medium

MHCLG's Planning Data service provides an API for planning applications across England. The API documentation is at https://www.planning.data.gov.uk/docs. Third-party services like Landhawk (https://www.landhawk.uk/api/planning-application-data/) and Searchland provide enhanced APIs with historical data back to 1990. The London Planning Datahub (https://www.london.gov.uk/programmes-strategies/planning/digital-planning/planning-london-datahub) provides London-specific real-time planning data.

Use case: identify commercial/industrial sites with planning permission for sports use, or track padel-related applications.


6.2 GovData — Germany Open Data Portal

Field Value
URL https://www.govdata.de
Data Type German federal/state open data (CKAN catalog); building permits at aggregate level
Access Method Public API (CKAN REST)
Update Frequency Varies by dataset
License / TOS CKAN ODbL / individual dataset licenses
Priority Medium

GovData hosts 1,200+ high-value datasets from German federal, state, and local governments. Building permit data is available at aggregate statistical level (monthly counts by type, from Destatis). Individual permit records are not centralised — they remain with local Bauämter (building offices). The CEIC database publishes aggregated Germany Building Permits indicators: https://www.ceicdata.com/en/indicator/germany/building-permits.


6.3 Shovels.ai (US Building Permits)

Field Value
URL https://www.shovels.ai
Data Type US building permit records; commercial construction activity
Access Method Subscription
Update Frequency Ongoing
License / TOS Commercial SaaS
Priority Low

Shovels aggregates US local building permit databases into a searchable API. Relevant for tracking new sports facility construction in the US. Paid subscription; pricing not publicly listed.


7. Tournament & Professional Circuit Data

7.1 PadelAPI.org

Field Value
URL https://padelapi.org / https://docs.padelapi.org
Data Type Professional tournament draws, results, player stats, rankings (Premier Padel + WPT archive)
Access Method Public API
Update Frequency Real-time during tournaments
License / TOS Free tier available (50k requests, last 6 months); paid tiers for full history
Priority High

Token-based REST API. Free tier includes 50k requests/month and last 6 months of match data. Covers Premier Padel and 2023 WPT events. Includes an MCP server for AI assistant integration. Useful for correlating major tournament venues with local market demand signals.


8. DuckDB Integration Notes

| Source | Ingestion Pattern | Extractor |
|---|---|---|
| Overpass / OSM | Single global query → JSON.gz; run monthly | extract-overpass |
| Playtomic tenants | Paginated global scrape → JSON.gz; run monthly | extract-playtomic-tenants |
| Playtomic availability | Per-venue slot query → JSON.gz; run daily | extract-playtomic-availability |
| Eurostat urb_cpop1 + ilc_di03 | SDMX REST + ETag dedup → JSON.gz; run monthly | extract-eurostat |
| Eurostat SDMX city labels | Codelist fetch + ETag dedup → JSON.gz; run monthly | extract-eurostat-city-labels |
| ONS UK mid-year estimates | CSV download (~68 MB) → JSON.gz; run annually | extract-ons-uk |
| US Census ACS 5-year | REST → JSON.gz; run annually | extract-census-usa |
| GeoNames cities15000 | Bulk zip download → JSON.gz; run monthly | extract-geonames |
| ECB / Frankfurter.app FX | REST → JSON.gz; run daily or monthly | extract-fx 🔲 planned |
| FIP / Playtomic PDFs | Manual parse → CSV seed files; run annually | |
| Sport England CSV | Manual download → seed file; run annually | |
| ImmoScout24 / Immowelt | API → staging (requires partner account); run monthly | |
| planning.data.gov.uk | REST API → staging; run weekly for new permissions | |

† Placeholder file written when credentials absent; set CENSUS_API_KEY / GEONAMES_USERNAME to activate.

Key technical constraints

  • Playtomic: availability endpoint limited to 25 h windows per call; ~1 req/min recommended on the official API. The unauthenticated tenant search endpoint has no documented rate limit but should be throttled (1 req/2 s).
  • Eurostat: no rate limit documented for the Statistics API; the SDMX API supports bulk dataset downloads.
  • ONS Beta API: 120 req/10 s hard limit; back off on 429.
  • Google Maps Places: storage of query results beyond the caching window requires a Maps Data Export license.
  • CoStar/LoopNet/Rightmove: no legitimate automated access path without partner agreements. Avoid scraping — ToS explicitly prohibit it and CoStar has a history of pursuing copyright enforcement.
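
For the "back off on 429" constraint noted for the ONS Beta API, one possible shape is an exponential backoff loop like the sketch below. `send` is a hypothetical callable returning `(status_code, body)`; the attempt count and base delay are illustrative values, not settings taken from the pipeline.

```python
import time
from typing import Callable, Tuple


def get_with_backoff(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Retry while the server answers 429, doubling the wait each time."""
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            # Other statuses (including errors) are returned to the caller
            # unchanged; this sketch handles rate limiting only.
            return body
        sleep(base_delay_s * 2 ** attempt)
    raise RuntimeError("still rate-limited after retries")
```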

9. Live Implementation Findings (Feb 2026)

Findings from implementing the population pipeline extractors and staging models. Pipeline stack: Python extractors → landing zone JSON.gz → SQLMesh / DuckDB.


9.1 Eurostat SDMX City Labels Codelist

Field Value
Endpoint https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/codelist/ESTAT/CITIES?format=JSON
Credentials None required
Response size ~190 KB
Dedup ETag header — only re-downloads when changed

The SDMX 2.1 codelist endpoint for ESTAT/CITIES returns a compact dimension JSON, not the SDMX 2.1 full XML/JSON structure. The useful content is:

{"category": {"label": {"DE001C": "Berlin", "DE002C": "Hamburg", ...}}}

This is a flat {city_code: city_name} dict with ~1,800 entries covering EU cities.

Critical finding: The response does NOT match the SDMX 2.1 spec's data["structure"]["codelists"] path. The correct path is data["category"]["label"].

Country-level entries (e.g. "DE", "FR") appear in the dict without digits — filter them by requiring any(c.isdigit() for c in city_code) to keep only proper city codes like DE001C.
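
Put together, the path lookup and the digit filter described above amount to something like this minimal sketch (the function name `extract_city_labels` is hypothetical):

```python
def extract_city_labels(data: dict) -> dict:
    """Return {city_code: city_name}, dropping country-level entries."""
    labels = data["category"]["label"]  # not data["structure"]["codelists"]
    return {
        code: name
        for code, name in labels.items()
        if any(c.isdigit() for c in code)  # "DE001C" passes, "DE" does not
    }
```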

Result: 1,771 city codes extracted. Provides the missing city_code → city_name mapping needed to join Eurostat population data (urb_cpop1, which uses city codes) against dim_cities (which uses city names).


9.2 Eurostat Population — urb_cpop1

(Existing extractor eurostat.py; no changes required.)

The urb_cpop1 dataset provides city-level population estimates at the geoLevel=city dimension. Data is keyed by city code (e.g. DE001C), not city name — the SDMX city labels codelist (9.1) provides the bridge.

Population pipeline join: stg_city_labels + stg_population → dim_cities populates ~75% of EU/UK cities matched by name. Cities without a matching city code in Eurostat (smaller/newer cities) fall through to the GeoNames fallback.


9.3 ONS UK — Mid-Year Population Estimates

Field Value
Dataset mid-year-pop-est
Edition mid-2022-england-wales
Download URL https://api.beta.ons.gov.uk/v1/datasets/mid-year-pop-est/editions/mid-2022-england-wales/versions/{N}/downloads/csv/href
Credentials None required
File size ~68 MB uncompressed CSV
Reference year 2022 (mid-year estimate)

Critical finding: The ONS observations API endpoint (/observations?geography=*&age=0) returns 404 for the datasets documented in the developer portal. The correct approach for bulk data is the CSV download path, reached by:

  1. GET /v1/datasets/mid-year-pop-est/editions/mid-2022-england-wales/versions — list versions, pick max(version)
  2. GET versions[latest]["downloads"]["csv"]["href"] — download the ~68 MB CSV
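
The two steps above can be sketched as a small selection function. It assumes the versions response has already been decoded into a list of version objects, each with a numeric `version` field and the `downloads.csv.href` path quoted above; how that list is wrapped in the HTTP response is not specified here, and `latest_csv_href` is a hypothetical name.

```python
def latest_csv_href(versions: list) -> str:
    """Pick the highest version number and return its CSV download link."""
    latest = max(versions, key=lambda v: v["version"])
    return latest["downloads"]["csv"]["href"]
```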

CSV format: one row per (year × LAD × sex × age-group). Filter:

  • sex = 'all' (aggregate row — do not sum individual sex rows as that double-counts)
  • calendar-years = '2022' (target year from edition name)
  • LAD codes starting with E0, W0, S1, N0 (English/Welsh/Scottish/NI districts; excludes region/country aggregate codes)

Sum the v4_0 column per administrative-geography (LAD code) to get total population.
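
The filter-and-sum logic above might look like the sketch below. The column names v4_0, calendar-years, administrative-geography and sex are taken from the description; the age column name and the sample rows are made up. For the real ~68 MB file, pass an open file handle to csv.DictReader instead of an in-memory string.

```python
import csv
import io
from collections import defaultdict

# District-level code prefixes; region/country aggregates are excluded.
LAD_PREFIXES = ("E0", "W0", "S1", "N0")


def population_by_lad(csv_text: str, year: str = "2022") -> dict[str, int]:
    """Sum v4_0 per LAD code for the aggregate 'all' sex rows of one year."""
    totals: dict[str, int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["sex"] != "all" or row["calendar-years"] != year:
            continue  # skip per-sex rows (double-counting) and other years
        lad = row["administrative-geography"]
        if not lad.startswith(LAD_PREFIXES):
            continue  # skip aggregate geographies
        totals[lad] += int(row["v4_0"])
    return dict(totals)


# Made-up rows for illustration only.
sample_csv = """v4_0,calendar-years,administrative-geography,sex,age
100,2022,E06000001,all,0
150,2022,E06000001,all,1
60,2022,E06000001,male,0
999,2022,E92000001,all,0
40,2021,E06000001,all,0
70,2022,W06000001,all,0
"""
lad_population = population_by_lad(sample_csv)
```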

Result: 316 UK Local Authority Districts with population ≥ 50,000, ref_year = 2022.


9.4 US Census Bureau — ACS 5-Year Place Population

| Field | Value |
|---|---|
| Endpoint | `https://api.census.gov/data/2023/acs/acs5?get=B01003_001E,NAME&for=place:*&in=state:*` |
| Credentials | CENSUS_API_KEY (free; register at https://api.census.gov/data/key_signup.html) |
| Variable | B01003_001E = total population (ACS concept: Total Population) |
| Vintage | 2023 (released late 2024) |
| Coverage | ~30,000 Census places across all 50 states + DC |

Response is a JSON array: first row = headers ["B01003_001E", "NAME", "state", "place"], subsequent rows = data.

Place names follow the pattern "Los Angeles city, California". To get the city name, take the part before the first comma, then strip the trailing place-type suffix (city, town, CDP, borough, village, municipality).

Filtered to population ≥ 50,000: ~1,500 US cities.
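
A sketch of parsing the response and cleaning place names, assuming the header row shown above. The function name, output field names and the population figures in the sample are illustrative, not taken from the extractor.

```python
import re

# Trailing place-type suffixes listed above, e.g. "Los Angeles city".
PLACE_SUFFIX = re.compile(r"\s+(city|town|CDP|borough|village|municipality)$")


def parse_places(response: list[list[str]], min_population: int = 50_000) -> list[dict]:
    """Convert the header + rows array into filtered city records."""
    header, *rows = response
    places = []
    for row in rows:
        rec = dict(zip(header, row))
        if rec["B01003_001E"] is None:
            continue  # assumption: missing estimates may arrive as null
        population = int(rec["B01003_001E"])
        if population < min_population:
            continue
        base = rec["NAME"].split(",", 1)[0]  # part before the first comma
        places.append({
            "city_name": PLACE_SUFFIX.sub("", base),
            "state_fips": rec["state"],
            "place_fips": rec["place"],
            "population": population,
        })
    return places


# Illustrative rows; the population values are not real estimates.
sample_response = [
    ["B01003_001E", "NAME", "state", "place"],
    ["3800000", "Los Angeles city, California", "06", "44000"],
    ["1200", "Smallville town, Kansas", "20", "00001"],
]
places = parse_places(sample_response)
```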

Status: Extractor implemented. Requires CENSUS_API_KEY env var. Without it, a {"rows": [], "count": 0} placeholder is written so the SQLMesh staging model does not fail. US population data will be empty until the key is added.


9.5 GeoNames — cities15000 Global Bulk

| Field | Value |
|---|---|
| Download URL | https://download.geonames.org/export/dump/cities15000.zip |
| Credentials | GEONAMES_USERNAME (free; register at https://www.geonames.org/login) |
| File size | ~1.5 MB compressed, ~26,000 entries (all cities ≥ 15,000 pop) |
| Update frequency | Monthly (GeoNames updates continuously) |
| License | CC BY 4.0 |

The username is passed as ?username=... in the URL query string (signals ToS acceptance to GeoNames; no auth gate).

Tab-separated format (19 columns). Relevant columns:

  • col 0: geoname_id — stable numeric ID
  • col 1: name — Unicode name
  • col 2: asciiname — ASCII transliteration (preferred for matching)
  • col 7: feature_code — filter to {PPLC, PPLA, PPLA2, PPL} (excludes airports, parks)
  • col 8: country_code — ISO 2-letter
  • col 14: population

Filtered to pop ≥ 50,000 and valid feature codes: ~7,0009,000 cities globally.
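
The column selection and filters above can be sketched as below. The column indexes and feature codes come from the list above; the function names, output keys and sample rows are hypothetical.

```python
VALID_FEATURE_CODES = {"PPLC", "PPLA", "PPLA2", "PPL"}


def parse_geonames(lines, min_population: int = 50_000) -> list[dict]:
    """Parse tab-separated cities15000 rows into filtered city records."""
    cities = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 19:
            continue  # malformed row
        if cols[7] not in VALID_FEATURE_CODES:
            continue
        population = int(cols[14] or 0)
        if population < min_population:
            continue
        cities.append({
            "geoname_id": int(cols[0]),
            "name": cols[1],
            "asciiname": cols[2],
            "country_code": cols[8],
            "population": population,
        })
    return cities


def make_row(geoname_id, name, feature_code, country, population) -> str:
    """Build a 19-column sample row (test helper, not part of the extractor)."""
    cols = [""] * 19
    cols[0], cols[1], cols[2] = str(geoname_id), name, name
    cols[7], cols[8], cols[14] = feature_code, country, str(population)
    return "\t".join(cols)


# Made-up rows for illustration only.
sample_lines = [
    make_row(1, "Alphaville", "PPLA", "ES", 120000),
    make_row(2, "Betatown", "PPL", "ES", 20000),      # below threshold
    make_row(3, "Gamma Airport", "AIRP", "ES", 90000),  # wrong feature code
]
cities = parse_geonames(sample_lines)
```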

Status: Extractor implemented. Requires GEONAMES_USERNAME env var. Without it, a {"rows": [], "count": 0} placeholder is written so staging models do not fail. Acts as the final fallback in dim_cities for any city not matched by Eurostat, US Census, or ONS.


9.6 DuckDB read_json() glob limitation

Finding: DuckDB's glob() is a table function (returns rows), not a scalar function. It cannot be used as an argument to read_json() as read_json(glob('/path/*')) or via a subquery read_json((SELECT list(file) FROM glob(...))).

Workaround used: Extractors that skip due to missing credentials write a {"rows": [], "count": 0} placeholder file, ensuring at least one file always exists for each source's glob pattern. The SQLMesh staging models use string glob patterns like:

read_json(@LANDING_DIR || '/census_usa/*/*/acs5_places.json.gz', auto_detect = true)

This pattern requires the files to physically exist. The placeholder approach is cleaner than conditional SQL in the model.
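
A minimal sketch of the placeholder write, assuming a landing layout that matches the glob above. write_placeholder is a hypothetical helper, not the repository's own gzip writer; it writes to a temporary name and renames so a reader never sees a half-written file.

```python
import gzip
import json
import os
import tempfile
from pathlib import Path

PLACEHOLDER = {"rows": [], "count": 0}


def write_placeholder(path: Path) -> None:
    """Write the empty placeholder as gzip-compressed JSON, atomically."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    with gzip.open(tmp, "wb") as f:
        f.write(json.dumps(PLACEHOLDER).encode("utf-8"))
    os.replace(tmp, path)  # atomic rename on the same filesystem


# Demo in a throwaway directory; the year/month segments are illustrative.
with tempfile.TemporaryDirectory() as landing_dir:
    target = Path(landing_dir) / "census_usa" / "2026" / "02" / "acs5_places.json.gz"
    write_placeholder(target)
    with gzip.open(target, "rt", encoding="utf-8") as f:
        restored = json.load(f)
```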


9.7 Population Coverage Summary (Feb 2026)

| Source | Region | Cities extracted | Credentials needed |
|---|---|---|---|
| Eurostat urb_cpop1 + SDMX city labels | EU-27 + EEA | ~1,400 cities | None |
| ONS mid-year estimates | England & Wales | 316 LADs | None |
| US Census ACS 5-year | United States | ~1,500 places | CENSUS_API_KEY (free) |
| GeoNames cities15000 | Global fallback | ~7,500 cities | GEONAMES_USERNAME (free) |

Population cascade in dim_cities: Eurostat → US Census → ONS → GeoNames → 0.


10. FX / Currency Rates

Needed for two purposes:

  1. Cross-market normalisation — Playtomic venue prices are in local currency (GBP for UK, USD for US, EUR for eurozone). Benchmarking court rates across countries requires a common base.
  2. Financial planner display — the planner currently shows symbols (€/£/$) per country but applies no conversion. FX rates would let users toggle a "view in EUR" mode, or auto-convert EUR benchmark figures to the investor's local currency.

10.1 European Central Bank (ECB) Data Portal

| Field | Value |
|---|---|
| URL | https://data-api.ecb.europa.eu/service/data/EXR |
| Data Type | Daily exchange rates, EUR as base currency |
| Access Method | Public SDMX REST API |
| Credentials | None |
| Update Frequency | Daily (business days) |
| License | Public domain |
| Score | 4 |
| Status | 🔲 Planned |

ECB publishes official daily reference rates for ~30 currencies against EUR via SDMX. Free, unauthenticated, stable.

GET https://data-api.ecb.europa.eu/service/data/EXR/D.USD+GBP+CHF+SEK+AED.EUR.SP00.A
    ?format=jsondata&lastNObservations=1

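Returns the most recent observation per currency pair. The SDMX JSON response is nested; rates live at dataSets[0].series["{key}"].observations["0"][0], where {key} is a colon-separated list of index positions, one per series dimension in key order (FREQ, CURRENCY, CURRENCY_DENOM, EXR_TYPE, EXR_SUFFIX). Because the currency is the second dimension, the request above yields keys such as 0:0:0:0:0, 0:1:0:0:0, …; resolve each index against structure.dimensions.series to recover the currency code.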
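
A sketch of unpacking such a response into a flat currency-to-rate mapping. It locates the CURRENCY dimension by id in structure.dimensions.series rather than hard-coding its position, and takes the highest-indexed observation per series. The function name is hypothetical and the sample payload is a trimmed, hand-written stand-in for a real response.

```python
def parse_ecb_rates(payload: dict) -> dict[str, float]:
    """Map each series key back to its currency code and latest rate."""
    series_dims = payload["structure"]["dimensions"]["series"]
    # Position of the CURRENCY dimension inside the colon-separated key.
    pos = next(i for i, d in enumerate(series_dims) if d["id"] == "CURRENCY")
    currencies = [v["id"] for v in series_dims[pos]["values"]]

    rates = {}
    for key, series in payload["dataSets"][0]["series"].items():
        currency = currencies[int(key.split(":")[pos])]
        observations = series["observations"]
        latest = observations[max(observations, key=int)]
        rates[currency] = float(latest[0])  # first element is the value
    return rates


# Hand-written stand-in payload; real responses carry more fields.
sample = {
    "structure": {"dimensions": {"series": [
        {"id": "FREQ", "values": [{"id": "D"}]},
        {"id": "CURRENCY", "values": [{"id": "USD"}, {"id": "GBP"}]},
        {"id": "CURRENCY_DENOM", "values": [{"id": "EUR"}]},
        {"id": "EXR_TYPE", "values": [{"id": "SP00"}]},
        {"id": "EXR_SUFFIX", "values": [{"id": "A"}]},
    ]}},
    "dataSets": [{"series": {
        "0:0:0:0:0": {"observations": {"0": [1.0531]}},
        "0:1:0:0:0": {"observations": {"0": [0.8412]}},
    }}],
}
rates = parse_ecb_rates(sample)
```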

Key series for Padelnomics:

  • D.USD.EUR.SP00.A — EUR/USD
  • D.GBP.EUR.SP00.A — EUR/GBP
  • D.CHF.EUR.SP00.A — EUR/CHF (Switzerland)
  • D.SEK.EUR.SP00.A — EUR/SEK (Sweden)
  • D.AED.EUR.SP00.A — EUR/AED (UAE)

Note: ECB only provides EUR-base rates. Cross rates (e.g. USD/GBP) require computation: rate = eur_gbp / eur_usd.
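
The cross-rate computation in the note can be written as a small helper. The function name is hypothetical, and the input values are simply illustrative EUR-base quotes.

```python
def cross_rate(eur_rates: dict[str, float], base: str, quote: str) -> float:
    """Units of `quote` per one unit of `base`, derived from EUR-base rates."""
    rates = {"EUR": 1.0, **eur_rates}
    return rates[quote] / rates[base]


eur_rates = {"USD": 1.0531, "GBP": 0.8412}
usd_gbp = cross_rate(eur_rates, "USD", "GBP")  # eur_gbp / eur_usd
```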


10.2 Frankfurter.app

| Field | Value |
|---|---|
| URL | https://api.frankfurter.app |
| Data Type | Daily exchange rates (ECB data re-served) |
| Access Method | Public REST API |
| Credentials | None |
| Update Frequency | Daily |
| License | MIT (open source) |
| Score | 4 |
| Status | 🔲 Planned |

Frankfurter is an open-source wrapper around ECB data with a simpler interface than the raw SDMX endpoint. No auth, no documented rate limit. Preferred for implementation simplicity; self-host the open-source version if uptime SLA becomes a concern.

GET https://api.frankfurter.app/latest?from=EUR&to=USD,GBP,CHF,SEK,AED

Response:

{"amount": 1.0, "base": "EUR", "date": "2026-02-24",
 "rates": {"USD": 1.0531, "GBP": 0.8412, "CHF": 0.9374, "SEK": 10.932, "AED": 3.8669}}

Proposed pipeline:

  • Landing: data/landing/fx/{year}/{month}/{date}/rates.json.gz
  • Format: {"date": "2026-02-24", "base": "EUR", "rates": {"USD": 1.05, "GBP": 0.84, ...}}
  • Cadence: daily (or monthly — rates change daily but the pipeline only needs monthly snapshots for historical benchmarking)
  • Staging: staging.stg_fx_rates, grain (date, quote_currency) — columns: date, base_currency ('EUR'), quote_currency, rate
  • Downstream: join to stg_playtomic_availability price column to normalize to EUR; expose latest rate to planner for display conversion
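
The proposed landing path and staging grain could be sketched as follows. Both function names are hypothetical, the zero-padded month is an assumption about the {month} segment, and the response is the example shown above.

```python
from datetime import date
from pathlib import PurePosixPath


def landing_path(rate_date: date, root: str = "data/landing/fx") -> str:
    """Build {root}/{year}/{month}/{date}/rates.json.gz for one snapshot."""
    return str(
        PurePosixPath(root)
        / f"{rate_date:%Y}"
        / f"{rate_date:%m}"  # assumption: month is zero-padded
        / rate_date.isoformat()
        / "rates.json.gz"
    )


def to_staging_rows(response: dict) -> list[dict]:
    """Flatten one response to the (date, quote_currency) grain."""
    return [
        {
            "date": response["date"],
            "base_currency": response["base"],
            "quote_currency": currency,
            "rate": rate,
        }
        for currency, rate in sorted(response["rates"].items())
    ]


response = {
    "amount": 1.0, "base": "EUR", "date": "2026-02-24",
    "rates": {"USD": 1.0531, "GBP": 0.8412, "CHF": 0.9374, "SEK": 10.932, "AED": 3.8669},
}
path = landing_path(date.fromisoformat(response["date"]))
rows = to_staging_rows(response)
```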

Sources