docs: update docs and PROJECT.md for dual score pipeline

Task 8: documentation updates for the dual market score feature.

- CHANGELOG.md: comprehensive [Unreleased] entries for all additions
  (Marktpotenzial-Score, tennis courts, dim_locations, GeoNames expansion,
  DuckDB spatial, SOPS secrets, methodology page updates)
- docs/data-sources-inventory.md: add tennis courts Overpass row, update
  GeoNames entry (cities1000, username=padelnomics, higher score)
- transform/sqlmesh_padelnomics/CLAUDE.md: add dim_locations to conformed
  dimensions table, update source integration map with new pipeline branch,
  document ST_Distance_Sphere bounding-box pattern
- PROJECT.md: add dual score to In Progress, add Gemeinde pSEO + top-50
  ranking page to Next Up, add data backlog items (sports_centre, NUTS-3,
  opportunity map), add Decisions Log entry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
26
CHANGELOG.md
@@ -7,6 +7,32 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 ## [Unreleased]
 
 ### Added
+- **Dual market score system** — split the single market score into two branded scores:
+  - **padelnomics Marktreife-Score™** (market maturity): existing score, refined — only for cities
+    with ≥1 padel venue. Adds ×0.85 saturation discount when `venues_per_100k > 8`.
+  - **padelnomics Marktpotenzial-Score™** (investment opportunity): new score covering ALL
+    GeoNames locations globally (pop ≥1K), including zero-court locations. Rewards supply gaps,
+    underserved catchment areas, and racket sport culture via inverted venue density signal.
+- **Tennis court Overpass extractor** — `extract-overpass-tennis` downloads all OSM
+  `sport=tennis` nodes/ways/relations globally (~150K+ features). Lands at
+  `overpass_tennis/{year}/{month}/courts.json.gz`. Staged in `stg_tennis_courts`.
+- **`foundation.dim_locations`** — new conformed dimension seeded from GeoNames (all locations
+  ≥1K pop), not from padel venues. Grain `(country_code, geoname_id)`. Enriched with:
+  - `nearest_padel_court_km` via `ST_Distance_Sphere` (DuckDB spatial extension)
+  - `padel_venue_count` / `padel_venues_per_100k` (venues within 5km)
+  - `tennis_courts_within_25km` (courts within 25km)
+- **GeoNames expanded** — extractor switched from `cities15000` (50K+ filter, ~24K rows) to
+  `cities1000` (~140K locations, pop ≥1K). Added `lat`, `lon`, `admin1_code`, `admin2_code`
+  to output. Expanded feature codes to include `PPLA3/4/5` (Gemeinden/cantons).
+- **DuckDB spatial extension** — `extensions: [spatial]` added to `config.yaml`. Enables
+  `ST_Distance_Sphere` for great-circle distance and future map features (bounding box
+  queries, geometry columns).
+- **SOPS secrets** — `GEONAMES_USERNAME=padelnomics` and `CENSUS_API_KEY` added to both
+  `.env.dev.sops` and `.env.prod.sops`.
+- **Methodology page updated** — `/en/market-score` now documents both scores with:
+  Two Scores intro section, component cards for each score (4 Marktreife + 5 Marktpotenzial),
+  score band interpretations, expanded FAQ (7 entries). Section headings use the padelnomics
+  wordmark span (Bricolage Grotesque). Bilingual EN + DE (native-quality German, no calques).
 - **Market Score methodology page** — standalone page at `/{lang}/market-score`
   explaining the padelnomics Market Score (Zillow Zestimate-style). Reveals four
   input categories (demographics, economic strength, demand evidence, data
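The saturation rule in the Marktreife-Score entry (×0.85 when `venues_per_100k > 8`) can be illustrated with a small sketch. This is not the pipeline's code: the function names, the default arguments, and the assumption that the discount is a plain multiplication of the final score are all hypothetical. The changelog states only the factor and the threshold.

```python
def venues_per_100k(venue_count: int, population: int) -> float:
    """Venue density per 100,000 inhabitants (hypothetical helper)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return venue_count / population * 100_000


def apply_saturation_discount(
    score: float,
    density_per_100k: float,
    threshold: float = 8.0,   # from the changelog: venues_per_100k > 8
    factor: float = 0.85,     # from the changelog: x0.85 discount
) -> float:
    """Scale the score by `factor` once density strictly exceeds `threshold`.

    Assumption: the discount multiplies the score directly; the real model
    may apply it to an intermediate component instead.
    """
    if density_per_100k > threshold:
        return score * factor
    return score


# A city at exactly the threshold is not discounted (strict comparison).
at_threshold = apply_saturation_discount(80.0, 8.0)
above_threshold = apply_saturation_discount(80.0, 9.0)
```

The comparison is strict, matching the `>` in the changelog entry, so a density of exactly 8 leaves the score unchanged.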
13
PROJECT.md
@@ -135,7 +135,7 @@
 
 ## In Progress 🔄
 
-_Move here when you start working on it._
+- [ ] **Dual market score system** — Marktreife-Score + Marktpotenzial-Score + expanded data pipeline (merging to master)
 
 ---
 
@@ -155,6 +155,13 @@ _Move here when you start working on it._
 | Submit sitemap to Google Search Console | Set up Google Search Console + Bing Webmaster Tools (SEO hub ready — just add env vars) |
 | Verify Litestream R2 backup running on prod | |
 
+### Gemeinde-level pSEO (follow-up from dual score work)
+
+| 🛠 Tech |
+|--------|
+| Gemeinde-level pSEO article template — consumes `location_opportunity_profile` data, targets "Padel in [Ort]" + "Padel bauen in [Ort]" queries (zero SERP competition confirmed) |
+| "Top 50 underserved locations" ranking page — high-value SEO content, fully programmatic from `location_opportunity_profile` ORDER BY opportunity_score DESC |
+
 ### Week 1–2 — First Revenue
 
 | 🛠 Tech | 📣 Business |
@@ -196,6 +203,9 @@ _Move here when you start working on it._
 - [ ] Padel Hall Accelerator (€999 — report + call + supplier intros)
 
 ### Data & Intelligence
+- [ ] Sports centre Overpass extract (`leisure=sports_centre`) — additional market signal for `dim_locations`
+- [ ] City-level income enrichment (Eurostat NUTS-3 regional income — replaces country-level PPS proxy, higher granularity)
+- [ ] Interactive opportunity map / explorer in web app (map UI over `location_opportunity_profile` — bounding box queries via ST_Distance_Sphere)
 - [ ] Multi-source data aggregation (add booking platforms beyond Playtomic)
 - [ ] Google Maps signals (reviews, ratings)
 - [ ] Weather + demographic overlays
@@ -246,3 +256,4 @@ _Move here when you start working on it._
 | 2026-02-22 | Credit system over pay-per-lead blast | Suppliers self-select → higher quality perception; scales without manual intervention |
 | 2026-02-22 | No soft email gate on planner | Planner already captures emails at natural points (scenario save → login, quote wizard step 9). Gate would add friction without meaningful list value. Revisit if data shows a gap. |
 | 2026-02-22 | Wipe test suppliers before launch | 5 `example.com` entries from seed_dev_data.py — empty directory with "Be the first" CTA is better than obviously fake data |
+| 2026-02-24 | Split market score into two branded scores | Marktreife-Score (existing market maturity, cities with ≥1 venue) vs Marktpotenzial-Score (greenfield opportunity, all GeoNames locations globally). SERP analysis confirmed zero competition for hyperlocal Gemeinde-level market intelligence pages. |
docs/data-sources-inventory.md
@@ -13,7 +13,8 @@ Purpose: Identify and track data sources feeding the Padelnomics DuckDB analytic
 
 | Source | Category | Status | Score | Credentials | Pipeline refs |
 |--------|----------|--------|-------|-------------|---------------|
-| OpenStreetMap / Overpass | Court locations | ✅ Ingested | 5 | None | `extract-overpass` → `stg_padel_courts` |
+| OpenStreetMap / Overpass (padel) | Court locations | ✅ Ingested | 5 | None | `extract-overpass` → `stg_padel_courts` |
+| OpenStreetMap / Overpass (tennis) | Court locations | ✅ Ingested | 4 | None | `extract-overpass-tennis` → `stg_tennis_courts` |
 | Playtomic — tenants | Court locations | ✅ Ingested | 5 | None | `extract-playtomic-tenants` → `stg_playtomic_venues/resources/opening_hours` |
 | Playtomic — availability | Pricing / utilisation | ✅ Ingested | 5 | None | `extract-playtomic-availability` → `stg_playtomic_availability` |
 | Eurostat `urb_cpop1` | Demographics — EU city population | ✅ Ingested | 5 | None | `extract-eurostat` → `stg_population` |
@@ -21,7 +22,7 @@ Purpose: Identify and track data sources feeding the Padelnomics DuckDB analytic
 | Eurostat SDMX city labels | Demographics — EU city lookup | ✅ Ingested | 4 | None | `extract-eurostat-city-labels` → `stg_city_labels` |
 | ONS UK mid-year estimates | Demographics — UK population | ✅ Ingested | 4 | None | `extract-ons-uk` → `stg_population_uk` |
 | US Census ACS 5-year | Demographics — US population | ✅ Ingested† | 3 | `CENSUS_API_KEY` (free) | `extract-census-usa` → `stg_population_usa` |
-| GeoNames cities15000 | Demographics — global fallback | ✅ Ingested† | 3 | `GEONAMES_USERNAME` (free) | `extract-geonames` → `stg_population_geonames` |
+| GeoNames cities1000 | Demographics — global locations ≥1K pop | ✅ Ingested† | 4 | `GEONAMES_USERNAME=padelnomics` (free) | `extract-geonames` → `stg_population_geonames` → `dim_locations` |
 | ECB / Frankfurter.app | FX rates | 🔲 Planned | 4 | None | `extract-fx` → `stg_fx_rates` (proposed) |
 | FIP World Padel Report | Market reports | 🔲 Planned | 4 | None (PDF) | Annual seed table |
 | PadelAPI.org | Tournament data | 🔲 Planned | 3 | Free-tier token | 50k req/mo |
transform/sqlmesh_padelnomics/CLAUDE.md
@@ -55,15 +55,16 @@ Grain must match reality — use `QUALIFY ROW_NUMBER()` to enforce it.
 | Dimension | Grain | Used by |
 |-----------|-------|---------|
 | `foundation.dim_venues` | `venue_id` | `dim_cities`, `dim_venue_capacity`, `fct_daily_availability` (via capacity join) |
-| `foundation.dim_cities` | `city_slug` | `serving.city_market_profile` → all pSEO serving models |
+| `foundation.dim_cities` | `(country_code, city_slug)` | `serving.city_market_profile` → all pSEO serving models |
+| `foundation.dim_locations` | `(country_code, geoname_id)` | `serving.location_opportunity_profile` — all GeoNames locations (pop ≥1K), incl. zero-court locations |
 | `foundation.dim_venue_capacity` | `tenant_id` | `foundation.fct_daily_availability` |
 
 ## Source integration map
 
 ```
 stg_playtomic_venues   ─┐
-stg_playtomic_resources─┤→ dim_venues ─┬→ dim_cities ─→ city_market_profile
-stg_padel_courts       ─┘              └→ dim_venue_capacity
+stg_playtomic_resources─┤→ dim_venues ─┬→ dim_cities ──────────────→ city_market_profile
+stg_padel_courts       ─┘              └→ dim_venue_capacity         (Marktreife-Score)
                                                            ↓
 stg_playtomic_availability ──→ fct_availability_slot ──→ fct_daily_availability
                                                            ↓
@@ -71,8 +72,33 @@ stg_playtomic_availability ──→ fct_availability_slot ──→ fct_daily_a
                                                            ↓
 stg_population ──→ dim_cities ─────────────────────────────┘
 stg_income ──→ dim_cities
+
+stg_population_geonames ─┐
+stg_padel_courts        ─┤→ dim_locations ──→ location_opportunity_profile
+stg_tennis_courts       ─┤                    (Marktpotenzial-Score)
+stg_income              ─┘
 ```
 
+## Distance calculation pattern (ST_Distance_Sphere)
+
+Use a bounding-box pre-filter before calling `ST_Distance_Sphere` to avoid full cross-joins:
+
+```sql
+-- Nearest padel court (km) per location
+SELECT l.geoname_id,
+       MIN(ST_Distance_Sphere(
+           ST_Point(l.lon, l.lat), ST_Point(p.lon, p.lat)
+       ) / 1000.0) AS nearest_km
+FROM locations l
+JOIN padel_courts p
+  ON ABS(l.lat - p.lat) < 0.5 -- ~55km pre-filter
+ AND ABS(l.lon - p.lon) < 0.5
+GROUP BY l.geoname_id
+```
+
+Requires `extensions: [spatial]` in `config.yaml` (already set). DuckDB spatial must
+`INSTALL spatial; LOAD spatial;` before `ST_Distance_Sphere` / `ST_Point` are available.
+
 ## Common pitfalls
 
 - **Don't add business logic to staging.** Even a CASE statement renaming values = business
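The bounding-box pattern documented in the CLAUDE.md hunk can be mirrored outside the database for spot checks. The sketch below is an illustration under stated assumptions, not the project's code: `haversine_km`, `nearest_court_km`, the tuple layout `(lat, lon)`, and the Earth radius constant are all hypothetical choices. It uses the haversine formula for the great-circle distance.

```python
import math

# Assumption: mean Earth radius in km; the SQL function may use a slightly different value.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two WGS84 points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_court_km(location, courts, box_deg: float = 0.5):
    """Nearest court distance, considering only courts inside the lat/lon box.

    Returns None when no court falls inside the box, which corresponds to the
    location dropping out of the inner JOIN in the SQL version.
    """
    lat, lon = location
    candidates = [
        haversine_km(lat, lon, c_lat, c_lon)
        for c_lat, c_lon in courts
        # Cheap pre-filter: skip the trigonometry for far-away courts.
        if abs(lat - c_lat) < box_deg and abs(lon - c_lon) < box_deg
    ]
    return min(candidates) if candidates else None


# Two hypothetical courts: one 0.1 degrees north (kept), one 1 degree north (filtered out).
nearest = nearest_court_km((48.0, 11.0), [(48.1, 11.0), (49.0, 11.0)])
no_match = nearest_court_km((48.0, 11.0), [(50.0, 11.0)])
```

Here `nearest` comes out at about 11.1 km and `no_match` is `None`. Two properties of the pre-filter are worth keeping in mind. The 0.5° box spans roughly 55 km north to south, but its east-west extent shrinks with the cosine of the latitude, so the effective search radius is smaller away from the equator. Locations with no court inside the box produce no row in the inner-join SQL, so a `LEFT JOIN` or a wider box would be needed if such locations must keep a (null) distance.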