docs: update docs and PROJECT.md for dual score pipeline
Task 8: documentation updates for the dual market score feature.

- CHANGELOG.md: comprehensive [Unreleased] entries for all additions
  (Marktpotenzial-Score, tennis courts, dim_locations, GeoNames expansion,
  DuckDB spatial, SOPS secrets, methodology page updates)
- docs/data-sources-inventory.md: add tennis courts Overpass row, update
  GeoNames entry (cities1000, username=padelnomics, higher score)
- transform/sqlmesh_padelnomics/CLAUDE.md: add dim_locations to conformed
  dimensions table, update source integration map with new pipeline branch,
  document ST_Distance_Sphere bounding-box pattern
- PROJECT.md: add dual score to In Progress, add Gemeinde pSEO + top-50
  ranking page to Next Up, add data backlog items (sports_centre, NUTS-3,
  opportunity map), add Decisions Log entry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
26
CHANGELOG.md
@@ -7,6 +7,32 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
## [Unreleased]

### Added
- **Dual market score system** — split the single market score into two branded scores:
  - **padelnomics Marktreife-Score™** (market maturity): existing score, refined — only for cities
    with ≥1 padel venue. Adds ×0.85 saturation discount when `venues_per_100k > 8`.
  - **padelnomics Marktpotenzial-Score™** (investment opportunity): new score covering ALL
    GeoNames locations globally (pop ≥1K), including zero-court locations. Rewards supply gaps,
    underserved catchment areas, and racket sport culture via inverted venue density signal.
- **Tennis court Overpass extractor** — `extract-overpass-tennis` downloads all OSM
  `sport=tennis` nodes/ways/relations globally (~150K+ features). Lands at
  `overpass_tennis/{year}/{month}/courts.json.gz`. Staged in `stg_tennis_courts`.
- **`foundation.dim_locations`** — new conformed dimension seeded from GeoNames (all locations
  ≥1K pop), not from padel venues. Grain `(country_code, geoname_id)`. Enriched with:
  - `nearest_padel_court_km` via `ST_Distance_Sphere` (DuckDB spatial extension)
  - `padel_venue_count` / `padel_venues_per_100k` (venues within 5km)
  - `tennis_courts_within_25km` (courts within 25km)
- **GeoNames expanded** — extractor switched from `cities15000` (50K+ filter, ~24K rows) to
  `cities1000` (~140K locations, pop ≥1K). Added `lat`, `lon`, `admin1_code`, `admin2_code`
  to output. Expanded feature codes to include `PPLA3/4/5` (Gemeinden/cantons).
- **DuckDB spatial extension** — `extensions: [spatial]` added to `config.yaml`. Enables
  `ST_Distance_Sphere` for great-circle distance and future map features (bounding box
  queries, geometry columns).
- **SOPS secrets** — `GEONAMES_USERNAME=padelnomics` and `CENSUS_API_KEY` added to both
  `.env.dev.sops` and `.env.prod.sops`.
- **Methodology page updated** — `/en/market-score` now documents both scores with:
  Two Scores intro section, component cards for each score (4 Marktreife + 5 Marktpotenzial),
  score band interpretations, expanded FAQ (7 entries). Section headings use the padelnomics
  wordmark span (Bricolage Grotesque). Bilingual EN + DE (native-quality German, no calques).
- **Market Score methodology page** — standalone page at `/{lang}/market-score`
  explaining the padelnomics Market Score (Zillow Zestimate-style). Reveals four
  input categories (demographics, economic strength, demand evidence, data

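
The saturation discount in the Marktreife-Score entry above can be read as a simple threshold rule. The following is a minimal sketch of that rule only, assuming the discount is a plain multiplier on an already computed score; the function names and the sample figures are hypothetical, and the changelog does not say where in the real scoring model the step is applied.

```python
# Values taken from the changelog entry; everything else is illustrative.
SATURATION_THRESHOLD = 8.0  # venues per 100k residents
SATURATION_DISCOUNT = 0.85  # multiplier applied above the threshold


def venues_per_100k(venue_count: int, population: int) -> float:
    """Venue density normalised to 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return venue_count * 100_000 / population


def apply_saturation_discount(raw_score: float, density: float) -> float:
    """Scale the score by 0.85 only when density is strictly above 8."""
    if density > SATURATION_THRESHOLD:
        return raw_score * SATURATION_DISCOUNT
    return raw_score


# Hypothetical city: 30 venues for 250,000 residents.
density = venues_per_100k(30, 250_000)
discounted = apply_saturation_discount(80.0, density)
```

A city sitting exactly at 8 venues per 100k keeps its full score in this sketch, because the changelog states the condition as `> 8`.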
13
PROJECT.md
@@ -135,7 +135,7 @@

## In Progress 🔄

_Move here when you start working on it._
- [ ] **Dual market score system** — Marktreife-Score + Marktpotenzial-Score + expanded data pipeline (merging to master)

---

@@ -155,6 +155,13 @@ _Move here when you start working on it._
| Submit sitemap to Google Search Console | Set up Google Search Console + Bing Webmaster Tools (SEO hub ready — just add env vars) |
| Verify Litestream R2 backup running on prod | |

### Gemeinde-level pSEO (follow-up from dual score work)

| 🛠 Tech |
|--------|
| Gemeinde-level pSEO article template — consumes `location_opportunity_profile` data, targets "Padel in [Ort]" + "Padel bauen in [Ort]" queries (zero SERP competition confirmed) |
| "Top 50 underserved locations" ranking page — high-value SEO content, fully programmatic from `location_opportunity_profile` ORDER BY opportunity_score DESC |

### Week 1–2 — First Revenue

| 🛠 Tech | 📣 Business |
@@ -196,6 +203,9 @@ _Move here when you start working on it._
- [ ] Padel Hall Accelerator (€999 — report + call + supplier intros)

### Data & Intelligence
- [ ] Sports centre Overpass extract (`leisure=sports_centre`) — additional market signal for `dim_locations`
- [ ] City-level income enrichment (Eurostat NUTS-3 regional income — replaces country-level PPS proxy, higher granularity)
- [ ] Interactive opportunity map / explorer in web app (map UI over `location_opportunity_profile` — bounding box queries via ST_Distance_Sphere)
- [ ] Multi-source data aggregation (add booking platforms beyond Playtomic)
- [ ] Google Maps signals (reviews, ratings)
- [ ] Weather + demographic overlays
@@ -246,3 +256,4 @@ _Move here when you start working on it._
| 2026-02-22 | Credit system over pay-per-lead blast | Suppliers self-select → higher quality perception; scales without manual intervention |
| 2026-02-22 | No soft email gate on planner | Planner already captures emails at natural points (scenario save → login, quote wizard step 9). Gate would add friction without meaningful list value. Revisit if data shows a gap. |
| 2026-02-22 | Wipe test suppliers before launch | 5 `example.com` entries from seed_dev_data.py — empty directory with "Be the first" CTA is better than obviously fake data |
| 2026-02-24 | Split market score into two branded scores | Marktreife-Score (existing market maturity, cities with ≥1 venue) vs Marktpotenzial-Score (greenfield opportunity, all GeoNames locations globally). SERP analysis confirmed zero competition for hyperlocal Gemeinde-level market intelligence pages. |

@@ -13,7 +13,8 @@ Purpose: Identify and track data sources feeding the Padelnomics DuckDB analytic

| Source | Category | Status | Score | Credentials | Pipeline refs |
|--------|----------|--------|-------|-------------|---------------|
| OpenStreetMap / Overpass | Court locations | ✅ Ingested | 5 | None | `extract-overpass` → `stg_padel_courts` |
| OpenStreetMap / Overpass (padel) | Court locations | ✅ Ingested | 5 | None | `extract-overpass` → `stg_padel_courts` |
| OpenStreetMap / Overpass (tennis) | Court locations | ✅ Ingested | 4 | None | `extract-overpass-tennis` → `stg_tennis_courts` |
| Playtomic — tenants | Court locations | ✅ Ingested | 5 | None | `extract-playtomic-tenants` → `stg_playtomic_venues/resources/opening_hours` |
| Playtomic — availability | Pricing / utilisation | ✅ Ingested | 5 | None | `extract-playtomic-availability` → `stg_playtomic_availability` |
| Eurostat `urb_cpop1` | Demographics — EU city population | ✅ Ingested | 5 | None | `extract-eurostat` → `stg_population` |
@@ -21,7 +22,7 @@ Purpose: Identify and track data sources feeding the Padelnomics DuckDB analytic
| Eurostat SDMX city labels | Demographics — EU city lookup | ✅ Ingested | 4 | None | `extract-eurostat-city-labels` → `stg_city_labels` |
| ONS UK mid-year estimates | Demographics — UK population | ✅ Ingested | 4 | None | `extract-ons-uk` → `stg_population_uk` |
| US Census ACS 5-year | Demographics — US population | ✅ Ingested† | 3 | `CENSUS_API_KEY` (free) | `extract-census-usa` → `stg_population_usa` |
| GeoNames cities15000 | Demographics — global fallback | ✅ Ingested† | 3 | `GEONAMES_USERNAME` (free) | `extract-geonames` → `stg_population_geonames` |
| GeoNames cities1000 | Demographics — global locations ≥1K pop | ✅ Ingested† | 4 | `GEONAMES_USERNAME=padelnomics` (free) | `extract-geonames` → `stg_population_geonames` → `dim_locations` |
| ECB / Frankfurter.app | FX rates | 🔲 Planned | 4 | None | `extract-fx` → `stg_fx_rates` (proposed) |
| FIP World Padel Report | Market reports | 🔲 Planned | 4 | None (PDF) | Annual seed table |
| PadelAPI.org | Tournament data | 🔲 Planned | 3 | Free-tier token | 50k req/mo |

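
The GeoNames `cities1000` row feeds `dim_locations` with locations of at least 1K population plus `lat`, `lon`, `admin1_code` and `admin2_code`. As a rough illustration of that filter and field selection, here is a sketch that assumes the input is the standard 19-column, tab-separated GeoNames dump layout. The function name, the output keys and the sample rows are hypothetical, and the real `extract-geonames` job may obtain and shape the data differently.

```python
import csv

# Column positions in the tab-separated GeoNames dump (assumed layout).
COL_GEONAME_ID, COL_NAME = 0, 1
COL_LAT, COL_LON = 4, 5
COL_FEATURE_CODE, COL_COUNTRY = 7, 8
COL_ADMIN1, COL_ADMIN2 = 10, 11
COL_POPULATION = 14
EXPECTED_COLUMNS = 19


def parse_locations(lines, min_population=1_000):
    """Yield one dict per location whose population is >= min_population."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < EXPECTED_COLUMNS:
            continue  # skip malformed or truncated rows
        population = int(row[COL_POPULATION] or 0)
        if population < min_population:
            continue
        yield {
            "geoname_id": int(row[COL_GEONAME_ID]),
            "name": row[COL_NAME],
            "country_code": row[COL_COUNTRY],
            "feature_code": row[COL_FEATURE_CODE],
            "admin1_code": row[COL_ADMIN1],
            "admin2_code": row[COL_ADMIN2],
            "lat": float(row[COL_LAT]),
            "lon": float(row[COL_LON]),
            "population": population,
        }


# Two made-up rows: one above and one below the 1K population cut-off.
sample = [
    "\t".join(["1", "Exampleville", "Exampleville", "", "48.1", "11.5", "P", "PPLA4",
               "DE", "", "02", "091", "", "", "4200", "", "520", "Europe/Berlin", "2024-01-01"]),
    "\t".join(["2", "Tinyham", "Tinyham", "", "47.9", "11.2", "P", "PPL",
               "DE", "", "02", "091", "", "", "800", "", "510", "Europe/Berlin", "2024-01-01"]),
]
rows = list(parse_locations(sample))
```

Keying the result on `(country_code, geoname_id)` matches the grain stated for `dim_locations`.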
@@ -55,15 +55,16 @@ Grain must match reality — use `QUALIFY ROW_NUMBER()` to enforce it.
| Dimension | Grain | Used by |
|-----------|-------|---------|
| `foundation.dim_venues` | `venue_id` | `dim_cities`, `dim_venue_capacity`, `fct_daily_availability` (via capacity join) |
| `foundation.dim_cities` | `city_slug` | `serving.city_market_profile` → all pSEO serving models |
| `foundation.dim_cities` | `(country_code, city_slug)` | `serving.city_market_profile` → all pSEO serving models |
| `foundation.dim_locations` | `(country_code, geoname_id)` | `serving.location_opportunity_profile` — all GeoNames locations (pop ≥1K), incl. zero-court locations |
| `foundation.dim_venue_capacity` | `tenant_id` | `foundation.fct_daily_availability` |

## Source integration map

```
stg_playtomic_venues   ─┐
stg_playtomic_resources─┤→ dim_venues ─┬→ dim_cities ─→ city_market_profile
stg_padel_courts       ─┘              └→ dim_venue_capacity
stg_playtomic_resources─┤→ dim_venues ─┬→ dim_cities ──────────────→ city_market_profile
stg_padel_courts       ─┘              └→ dim_venue_capacity         (Marktreife-Score)
↓
stg_playtomic_availability ──→ fct_availability_slot ──→ fct_daily_availability
↓
@@ -71,8 +72,33 @@ stg_playtomic_availability ──→ fct_availability_slot ──→ fct_daily_a
↓
stg_population ──→ dim_cities ─────────────────────────────┘
stg_income ──→ dim_cities

stg_population_geonames ─┐
stg_padel_courts        ─┤→ dim_locations ──→ location_opportunity_profile
stg_tennis_courts       ─┤                    (Marktpotenzial-Score)
stg_income              ─┘
```

## Distance calculation pattern (ST_Distance_Sphere)

Use a bounding-box pre-filter before calling `ST_Distance_Sphere` to avoid full cross-joins:

```sql
-- Nearest padel court (km) per location
SELECT l.geoname_id,
       MIN(ST_Distance_Sphere(
           ST_Point(l.lon, l.lat), ST_Point(p.lon, p.lat)
       ) / 1000.0) AS nearest_km
FROM locations l
JOIN padel_courts p
  ON ABS(l.lat - p.lat) < 0.5   -- ~55km pre-filter
 AND ABS(l.lon - p.lon) < 0.5
GROUP BY l.geoname_id
```

Requires `extensions: [spatial]` in `config.yaml` (already set). DuckDB spatial must
`INSTALL spatial; LOAD spatial;` before `ST_Distance_Sphere` / `ST_Point` are available.
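
To make the pre-filter idea concrete outside the database, the sketch below reproduces the same two steps in plain Python: a cheap degree-window test first, then a great-circle distance only for the candidates that survive it. It is an illustration under stated assumptions, not the pipeline's code. `haversine_km` and `nearest_court_km` are hypothetical helpers, the haversine formula with a 6371 km mean Earth radius stands in for `ST_Distance_Sphere`, and the coordinates are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_court_km(location, courts, window_deg=0.5):
    """Nearest court distance, or None if no court falls inside the window."""
    lat, lon = location
    best = None
    for c_lat, c_lon in courts:
        # Cheap bounding-box test, mirroring the ABS(...) < 0.5 join condition.
        if abs(lat - c_lat) >= window_deg or abs(lon - c_lon) >= window_deg:
            continue
        # Only the surviving candidates pay for the trigonometry.
        d = haversine_km(lat, lon, c_lat, c_lon)
        if best is None or d < best:
            best = d
    return best


# One court a few km away, one several hundred km away (skipped by the window).
nearest = nearest_court_km((48.137, 11.575), [(48.20, 11.60), (52.52, 13.40)])
```

Two properties of the window are worth keeping in mind when reusing the pattern. Half a degree of latitude is roughly 55 km everywhere, but half a degree of longitude shrinks with the cosine of the latitude, so the east-west window is narrower away from the equator. A location with no court inside the window returns `None` here; in the SQL, an inner `JOIN` drops such locations from the result, whereas a `LEFT JOIN` would keep them with a `NULL` distance.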

## Common pitfalls

- **Don't add business logic to staging.** Even a CASE statement renaming values = business