docs: update docs and PROJECT.md for dual score pipeline

Task 8: documentation updates for the dual market score feature.

- CHANGELOG.md: comprehensive [Unreleased] entries for all additions
  (Marktpotenzial-Score, tennis courts, dim_locations, GeoNames expansion,
  DuckDB spatial, SOPS secrets, methodology page updates)
- docs/data-sources-inventory.md: add tennis courts Overpass row, update
  GeoNames entry (cities1000, username=padelnomics, higher score)
- transform/sqlmesh_padelnomics/CLAUDE.md: add dim_locations to conformed
  dimensions table, update source integration map with new pipeline branch,
  document ST_Distance_Sphere bounding-box pattern
- PROJECT.md: add dual score to In Progress, add Gemeinde pSEO + top-50
  ranking page to Next Up, add data backlog items (sports_centre, NUTS-3,
  opportunity map), add Decisions Log entry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Deeman
2026-02-24 17:12:22 +01:00
parent caec0c4410
commit 405efcfd19
4 changed files with 70 additions and 6 deletions


@@ -7,6 +7,32 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
## [Unreleased]
### Added
- **Dual market score system** — split the single market score into two branded scores:
  - **padelnomics Marktreife-Score™** (market maturity): existing score, refined — only for cities
    with ≥1 padel venue. Adds ×0.85 saturation discount when `venues_per_100k > 8`.
  - **padelnomics Marktpotenzial-Score™** (investment opportunity): new score covering ALL
    GeoNames locations globally (pop ≥1K), including zero-court locations. Rewards supply gaps,
    underserved catchment areas, and racket sport culture via inverted venue density signal.
- **Tennis court Overpass extractor** — `extract-overpass-tennis` downloads all OSM
`sport=tennis` nodes/ways/relations globally (~150K+ features). Lands at
`overpass_tennis/{year}/{month}/courts.json.gz`. Staged in `stg_tennis_courts`.
- **`foundation.dim_locations`** — new conformed dimension seeded from GeoNames (all locations
  ≥1K pop), not from padel venues. Grain `(country_code, geoname_id)`. Enriched with:
  - `nearest_padel_court_km` via `ST_Distance_Sphere` (DuckDB spatial extension)
  - `padel_venue_count` / `padel_venues_per_100k` (venues within 5km)
  - `tennis_courts_within_25km` (courts within 25km)
- **GeoNames expanded** — extractor switched from `cities15000` (50K+ filter, ~24K rows) to
`cities1000` (~140K locations, pop ≥1K). Added `lat`, `lon`, `admin1_code`, `admin2_code`
to output. Expanded feature codes to include `PPLA3/4/5` (Gemeinden/cantons).
- **DuckDB spatial extension** — `extensions: [spatial]` added to `config.yaml`. Enables
`ST_Distance_Sphere` for great-circle distance and future map features (bounding box
queries, geometry columns).
- **SOPS secrets** — `GEONAMES_USERNAME=padelnomics` and `CENSUS_API_KEY` added to both
`.env.dev.sops` and `.env.prod.sops`.
- **Methodology page updated** — `/en/market-score` now documents both scores with:
Two Scores intro section, component cards for each score (4 Marktreife + 5 Marktpotenzial),
score band interpretations, expanded FAQ (7 entries). Section headings use the padelnomics
wordmark span (Bricolage Grotesque). Bilingual EN + DE (native-quality German, no calques).
- **Market Score methodology page** — standalone page at `/{lang}/market-score`
explaining the padelnomics Market Score (Zillow Zestimate-style). Reveals four
input categories (demographics, economic strength, demand evidence, data


@@ -135,7 +135,7 @@
## In Progress 🔄
_Move here when you start working on it._
- [ ] **Dual market score system** — Marktreife-Score + Marktpotenzial-Score + expanded data pipeline (merging to master)
---
@@ -155,6 +155,13 @@ _Move here when you start working on it._
| Submit sitemap to Google Search Console | Set up Google Search Console + Bing Webmaster Tools (SEO hub ready — just add env vars) |
| Verify Litestream R2 backup running on prod | |
### Gemeinde-level pSEO (follow-up from dual score work)
| 🛠 Tech |
|--------|
| Gemeinde-level pSEO article template — consumes `location_opportunity_profile` data, targets "Padel in [Ort]" + "Padel bauen in [Ort]" queries (zero SERP competition confirmed) |
| "Top 50 underserved locations" ranking page — high-value SEO content, fully programmatic from `location_opportunity_profile` ORDER BY opportunity_score DESC |
### Week 12 — First Revenue
| 🛠 Tech | 📣 Business |
@@ -196,6 +203,9 @@ _Move here when you start working on it._
- [ ] Padel Hall Accelerator (€999 — report + call + supplier intros)
### Data & Intelligence
- [ ] Sports centre Overpass extract (`leisure=sports_centre`) — additional market signal for `dim_locations`
- [ ] City-level income enrichment (Eurostat NUTS-3 regional income — replaces country-level PPS proxy, higher granularity)
- [ ] Interactive opportunity map / explorer in web app (map UI over `location_opportunity_profile` — bounding box queries via ST_Distance_Sphere)
- [ ] Multi-source data aggregation (add booking platforms beyond Playtomic)
- [ ] Google Maps signals (reviews, ratings)
- [ ] Weather + demographic overlays
@@ -246,3 +256,4 @@ _Move here when you start working on it._
| 2026-02-22 | Credit system over pay-per-lead blast | Suppliers self-select → higher quality perception; scales without manual intervention |
| 2026-02-22 | No soft email gate on planner | Planner already captures emails at natural points (scenario save → login, quote wizard step 9). Gate would add friction without meaningful list value. Revisit if data shows a gap. |
| 2026-02-22 | Wipe test suppliers before launch | 5 `example.com` entries from seed_dev_data.py — empty directory with "Be the first" CTA is better than obviously fake data |
| 2026-02-24 | Split market score into two branded scores | Marktreife-Score (existing market maturity, cities with ≥1 venue) vs Marktpotenzial-Score (greenfield opportunity, all GeoNames locations globally). SERP analysis confirmed zero competition for hyperlocal Gemeinde-level market intelligence pages. |


@@ -13,7 +13,8 @@ Purpose: Identify and track data sources feeding the Padelnomics DuckDB analytic
| Source | Category | Status | Score | Credentials | Pipeline refs |
|--------|----------|--------|-------|-------------|---------------|
| OpenStreetMap / Overpass | Court locations | ✅ Ingested | 5 | None | `extract-overpass` → `stg_padel_courts` |
| OpenStreetMap / Overpass (padel) | Court locations | ✅ Ingested | 5 | None | `extract-overpass` → `stg_padel_courts` |
| OpenStreetMap / Overpass (tennis) | Court locations | ✅ Ingested | 4 | None | `extract-overpass-tennis` → `stg_tennis_courts` |
| Playtomic — tenants | Court locations | ✅ Ingested | 5 | None | `extract-playtomic-tenants` → `stg_playtomic_venues/resources/opening_hours` |
| Playtomic — availability | Pricing / utilisation | ✅ Ingested | 5 | None | `extract-playtomic-availability` → `stg_playtomic_availability` |
| Eurostat `urb_cpop1` | Demographics — EU city population | ✅ Ingested | 5 | None | `extract-eurostat` → `stg_population` |
@@ -21,7 +22,7 @@ Purpose: Identify and track data sources feeding the Padelnomics DuckDB analytic
| Eurostat SDMX city labels | Demographics — EU city lookup | ✅ Ingested | 4 | None | `extract-eurostat-city-labels` → `stg_city_labels` |
| ONS UK mid-year estimates | Demographics — UK population | ✅ Ingested | 4 | None | `extract-ons-uk` → `stg_population_uk` |
| US Census ACS 5-year | Demographics — US population | ✅ Ingested† | 3 | `CENSUS_API_KEY` (free) | `extract-census-usa` → `stg_population_usa` |
| GeoNames cities15000 | Demographics — global fallback | ✅ Ingested† | 3 | `GEONAMES_USERNAME` (free) | `extract-geonames` → `stg_population_geonames` |
| GeoNames cities1000 | Demographics — global locations ≥1K pop | ✅ Ingested† | 4 | `GEONAMES_USERNAME=padelnomics` (free) | `extract-geonames` → `stg_population_geonames` → `dim_locations` |
| ECB / Frankfurter.app | FX rates | 🔲 Planned | 4 | None | `extract-fx` → `stg_fx_rates` (proposed) |
| FIP World Padel Report | Market reports | 🔲 Planned | 4 | None (PDF) | Annual seed table |
| PadelAPI.org | Tournament data | 🔲 Planned | 3 | Free-tier token | 50k req/mo |


@@ -55,15 +55,16 @@ Grain must match reality — use `QUALIFY ROW_NUMBER()` to enforce it.
| Dimension | Grain | Used by |
|-----------|-------|---------|
| `foundation.dim_venues` | `venue_id` | `dim_cities`, `dim_venue_capacity`, `fct_daily_availability` (via capacity join) |
| `foundation.dim_cities` | `city_slug` | `serving.city_market_profile` → all pSEO serving models |
| `foundation.dim_cities` | `(country_code, city_slug)` | `serving.city_market_profile` → all pSEO serving models |
| `foundation.dim_locations` | `(country_code, geoname_id)` | `serving.location_opportunity_profile` — all GeoNames locations (pop ≥1K), incl. zero-court locations |
| `foundation.dim_venue_capacity` | `tenant_id` | `foundation.fct_daily_availability` |
## Source integration map
```
stg_playtomic_venues ─┐
stg_playtomic_resources─┤→ dim_venues ─┬→ dim_cities ─→ city_market_profile
stg_padel_courts ─┘ └→ dim_venue_capacity
stg_playtomic_resources─┤→ dim_venues ─┬→ dim_cities ──────────────→ city_market_profile
stg_padel_courts ─┘ └→ dim_venue_capacity (Marktreife-Score)
stg_playtomic_availability ──→ fct_availability_slot ──→ fct_daily_availability
@@ -71,8 +72,33 @@ stg_playtomic_availability ──→ fct_availability_slot ──→ fct_daily_a
stg_population ──→ dim_cities ─────────────────────────────┘
stg_income ──→ dim_cities
stg_population_geonames ─┐
stg_padel_courts        ─┤→ dim_locations ──→ location_opportunity_profile
stg_tennis_courts       ─┤                    (Marktpotenzial-Score)
stg_income              ─┘
```
## Distance calculation pattern (ST_Distance_Sphere)
Use a bounding-box pre-filter before calling `ST_Distance_Sphere` to avoid full cross-joins:
```sql
-- Nearest padel court (km) per location
SELECT l.geoname_id,
       MIN(ST_Distance_Sphere(
           ST_Point(l.lon, l.lat), ST_Point(p.lon, p.lat)
       ) / 1000.0) AS nearest_km
FROM locations l
JOIN padel_courts p
  ON ABS(l.lat - p.lat) < 0.5  -- ~55km pre-filter
 AND ABS(l.lon - p.lon) < 0.5
GROUP BY l.geoname_id
```
Requires `extensions: [spatial]` in `config.yaml` (already set). The DuckDB spatial extension
must be installed and loaded (`INSTALL spatial; LOAD spatial;`) before `ST_Distance_Sphere` and
`ST_Point` are available.
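The same pre-filter idea can be sanity-checked outside the warehouse. The sketch below is a plain-Python illustration only, not the pipeline's implementation: `haversine_km`, `nearest_court_km`, and the `EARTH_RADIUS_KM` constant are hypothetical names introduced here, and the radius DuckDB uses internally may differ slightly. It applies the ±0.5° box first and computes great-circle distance only for the surviving candidates.

```python
import math

# Assumption: mean Earth radius in km; DuckDB's internal constant may differ slightly.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_court_km(location, courts, box_deg=0.5):
    """Nearest court distance in km, or None if no court falls inside the box.

    location: (lat, lon) tuple; courts: iterable of (lat, lon) tuples.
    box_deg mirrors the 0.5-degree pre-filter used in the SQL pattern.
    """
    lat, lon = location
    # Cheap bounding-box pre-filter: skip the trigonometry for far-away courts.
    candidates = [
        (clat, clon)
        for clat, clon in courts
        if abs(lat - clat) < box_deg and abs(lon - clon) < box_deg
    ]
    if not candidates:
        return None
    return min(haversine_km(lat, lon, clat, clon) for clat, clon in candidates)
```

Two caveats apply to the box itself. First, 0.5° of longitude covers less ground away from the equator (it shrinks with the cosine of the latitude), so the box is narrower east-west than north-south at European latitudes. Second, the sketch returns `None` when no court lies inside the box, whereas the inner `JOIN` in the SQL pattern drops such locations from the result entirely.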
## Common pitfalls
- **Don't add business logic to staging.** Even a CASE statement renaming values = business