docs: update docs and PROJECT.md for dual score pipeline

Task 8: documentation updates for the dual market score feature.

- CHANGELOG.md: comprehensive [Unreleased] entries for all additions
  (Marktpotenzial-Score, tennis courts, dim_locations, GeoNames expansion,
  DuckDB spatial, SOPS secrets, methodology page updates)
- docs/data-sources-inventory.md: add tennis courts Overpass row, update
  GeoNames entry (cities1000, username=padelnomics, higher score)
- transform/sqlmesh_padelnomics/CLAUDE.md: add dim_locations to conformed
  dimensions table, update source integration map with new pipeline branch,
  document ST_Distance_Sphere bounding-box pattern
- PROJECT.md: add dual score to In Progress, add Gemeinde pSEO + top-50
  ranking page to Next Up, add data backlog items (sports_centre, NUTS-3,
  opportunity map), add Decisions Log entry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@@ -55,15 +55,16 @@ Grain must match reality — use `QUALIFY ROW_NUMBER()` to enforce it.
 | Dimension | Grain | Used by |
 |-----------|-------|---------|
 | `foundation.dim_venues` | `venue_id` | `dim_cities`, `dim_venue_capacity`, `fct_daily_availability` (via capacity join) |
-| `foundation.dim_cities` | `city_slug` | `serving.city_market_profile` → all pSEO serving models |
+| `foundation.dim_cities` | `(country_code, city_slug)` | `serving.city_market_profile` → all pSEO serving models |
+| `foundation.dim_locations` | `(country_code, geoname_id)` | `serving.location_opportunity_profile` — all GeoNames locations (pop ≥1K), incl. zero-court locations |
 | `foundation.dim_venue_capacity` | `tenant_id` | `foundation.fct_daily_availability` |

 ## Source integration map

 ```
 stg_playtomic_venues   ─┐
-stg_playtomic_resources─┤→ dim_venues ─┬→ dim_cities ─→ city_market_profile
-stg_padel_courts       ─┘              └→ dim_venue_capacity
+stg_playtomic_resources─┤→ dim_venues ─┬→ dim_cities ──────────────→ city_market_profile
+stg_padel_courts       ─┘              └→ dim_venue_capacity         (Marktreife-Score)
                                                            ↓
 stg_playtomic_availability ──→ fct_availability_slot ──→ fct_daily_availability
                                                            ↓
@@ -71,8 +72,33 @@ stg_playtomic_availability ──→ fct_availability_slot ──→ fct_daily_a
                                                            ↓
 stg_population ──→ dim_cities ─────────────────────────────┘
 stg_income     ──→ dim_cities
+
+stg_population_geonames ─┐
+stg_padel_courts        ─┤→ dim_locations ──→ location_opportunity_profile
+stg_tennis_courts       ─┤                     (Marktpotenzial-Score)
+stg_income              ─┘
 ```

+## Distance calculation pattern (ST_Distance_Sphere)
+
+Use a bounding-box pre-filter before calling `ST_Distance_Sphere` to avoid full cross-joins:
+
+```sql
+-- Nearest padel court (km) per location
+SELECT l.geoname_id,
+       MIN(ST_Distance_Sphere(
+           ST_Point(l.lon, l.lat), ST_Point(p.lon, p.lat)
+       ) / 1000.0) AS nearest_km
+FROM locations l
+JOIN padel_courts p
+  ON ABS(l.lat - p.lat) < 0.5     -- ~55km pre-filter
+  AND ABS(l.lon - p.lon) < 0.5
+GROUP BY l.geoname_id
+```
+
+Requires `extensions: [spatial]` in `config.yaml` (already set). DuckDB spatial must
+`INSTALL spatial; LOAD spatial;` before `ST_Distance_Sphere` / `ST_Point` are available.
+
 ## Common pitfalls

 - **Don't add business logic to staging.** Even a CASE statement renaming values = business
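
The bounding-box pre-filter that this commit documents for `ST_Distance_Sphere` can be illustrated outside the database. The following is a minimal Python sketch, not the project's implementation: the function names, the input shapes (a dict of `geoname_id` to `(lat, lon)` and a list of court coordinates), and the Earth-radius constant are assumptions made for illustration. It uses the haversine formula as a stand-in for the great-circle distance.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant for this sketch)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_km(locations, courts, box_deg=0.5):
    """Nearest court distance per location, using a bounding-box pre-filter.

    locations: dict mapping a hypothetical geoname_id -> (lat, lon)
    courts:    list of (lat, lon) tuples
    """
    result = {}
    for geoname_id, (lat, lon) in locations.items():
        # Cheap comparisons first; the trigonometry only runs on candidates
        # that fall inside the +/- box_deg square around the location.
        candidates = [
            haversine_km(lat, lon, clat, clon)
            for clat, clon in courts
            if abs(lat - clat) < box_deg and abs(lon - clon) < box_deg
        ]
        # Like an inner JOIN, a location with no court in the box gets no row.
        if candidates:
            result[geoname_id] = min(candidates)
    return result


# Illustrative data only: two locations, two courts.
demo = nearest_km(
    {1: (52.52, 13.405), 2: (48.137, 11.575)},
    [(52.50, 13.40), (53.55, 9.99)],
)
```

Two properties of the pattern show up in the sketch. A location with no court inside the box is absent from the result, so a caller that needs every location would have to add those back separately. The box is also defined in degrees: 0.5° of latitude is about 55 km everywhere, but 0.5° of longitude shrinks with the cosine of the latitude, to roughly 36 km at 50°N.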