# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).

## [Unreleased]

### Changed

- **Market Score v3 (Marktreife-Score recalibration)** — fixes ranking inversion where early-stage markets (Germany 1/100k) outscored mature markets (Spain 36/100k):
  - **Formula rewrite** (`city_market_profile.sql`): supply development now 40 pts (log-scaled density `LN(d+1)/LN(21)` × count gate `min(1, count/5)`); demand evidence 25 pts (occupancy or 40% density proxy); population reduced to 15 pts (context); income to 10 pts (context); data quality to 10 pts; saturation discount removed
  - **Count gate** eliminates small-town inflation: a single venue in a 5k-resident town can no longer outscore Berlin (was 92.7 → now 43.9 for Bernau bei Berlin)
  - **LN ceiling at 20/100k** (was linear 4/100k) gives meaningful differentiation from 0 to 20: Málaga 70.1, Barcelona 67.4, Madrid 66.9, Amsterdam 58.4, Berlin 42.2, London 44.1
  - **Template thresholds updated** across all 3 pSEO templates (city-cost-de, country-overview, city-pricing): color coding green ≥55 (was ≥65) / amber ≥35 (was ≥40); intro/FAQ tiers strong ≥55 (was ≥70) / mid ≥35 (was ≥45); white-space signal interplay market_score < 40 (was < 50)
- **Opportunity Score supply gap ceiling raised 4→8/100k** (`location_opportunity_profile.sql`) — gentler gradient for partially-served markets; accounts for ~87% data undercount vs FIP real-world totals.
  Documents discovered formula behaviour: DuckDB `LEAST(1.0, NULL)=1.0` means NULL catchment already yields full 15 pts; income PPS saturates for all EU countries; tennis courts data currently empty (formula correct, data pending)

### Added

- **Opportunity Score integration** — second scoring dimension (`Marktpotenzial`) now visible in city and country articles:
  - **SQL chain**: `dim_cities` now carries `geoname_id` (from the existing GeoNames LEFT JOIN); threaded through `city_market_profile` → `pseo_city_costs_de` which LEFT JOINs `location_opportunity_profile` on `(country_code, geoname_id)`; `pseo_country_overview` gains `avg_opportunity_score`, `top_opportunity_score`, `top_opportunity_slugs`, `top_opportunity_names`
  - **71.4% match rate** — 3,350 of 4,693 cities matched to a GeoNames `geoname_id`; unmatched cities gracefully show no Opportunity Score
  - **City articles** (`city-cost-de.md.jinja`) — `{% if opportunity_score %}` guard adds: 5th stats-strip item with green/amber/red color coding (≥65/≥40/<40), contextual intro sentence explaining the score interplay, table row in Market Overview, score explainer FAQ (DE + EN)
  - **Country overview articles** (`country-overview.md.jinja`) — adds: `avg_opportunity_score` as 5th stats-strip item, opportunity interplay paragraph in market landscape section, "Top Locations by Investment Potential" table (distinct from top Market Score cities), score explainer FAQ (DE + EN)
  - **CSS**: stats-strip changed from `repeat(4, 1fr)` to `repeat(auto-fit, minmax(140px, 1fr))` — supports 4-item country and 5-item city strips without layout breakage
- **Pipeline Console admin section** — full operational visibility into the data engineering pipeline at `/admin/pipeline/`:
  - **Overview tab** — extraction status grid (one card per workflow with status dot, schedule, last-run timestamp, error preview), serving table row counts from `_serving_meta.json`, landing zone file stats (per-source file count + total size)
  - **Extractions tab**
    — filterable, paginated run history table from `.state.sqlite` (extractor + status dropdowns, HTMX live filter); stale "running" row detection (amber highlight) with "Mark Failed" button; "Run All Extractors" button enqueues `run_extraction` task
  - **Catalog tab** — accordion list of serving tables with row count badges; click-to-expand lazy-loads column schema + 10-row sample data per table
  - **Query editor tab** — dark-themed SQL textarea (`Commit Mono`, navy background, electric blue focus glow); schema sidebar (collapsible table/column list with types); Tab-key indent and Cmd/Ctrl+Enter submit; results table with sticky headers + row count + elapsed time; query security (read-only DuckDB, blocklist regex, 10k char limit, 1000 row cap, 10s timeout)
  - **`analytics.execute_user_query()`** — new function returning `(columns, rows, error, elapsed_ms)` for admin query editor
  - **`worker.run_extraction` task** — background handler shells out to `uv run extract` from repo root (2h timeout)
  - 29 new tests covering all routes, data access helpers, security checks, and `execute_user_query()`
- **Outreach follow-up scheduling + activity timeline** — extends the outreach pipeline (migration 0024):
  - **Migration 0025** — adds `follow_up_at TEXT DEFAULT NULL` to `suppliers` and `noindex INTEGER NOT NULL DEFAULT 0` to `articles`
  - **Follow-up date picker** (`POST /admin/outreach//follow-up`) — HTMX date input on each outreach row; sets/clears `follow_up_at`; returns updated row via outerHTML swap
  - **Follow-up due banner** on `/admin/outreach` — amber alert banner shows count of overdue follow-ups with "Show them" link (`?follow_up=due` filter)
  - **`?follow_up=due` / `?follow_up=set` filters** in `get_outreach_suppliers()` — querystring params passed through dashboard and results partial
  - **`get_follow_up_due_count()`** query function counts suppliers with `follow_up_at <= date('now')`
  - **Activity timeline** on `/admin/suppliers/` — merges sent outreach emails (`email_log
    WHERE email_type='outreach'`) and received emails (`inbound_emails`) matched by `contact_email`; sorted by date descending; max 50 entries; empty state shown when no history
  - 29 new tests (follow-up CRUD, due count, due filter, timeline with sent+received, timeline empty state)
- **pSEO article noindex** — prevents thin-data articles from diluting crawl budget and index quality:
  - **`NOINDEX_THRESHOLDS` dict** in `content/__init__.py` — per-template lambda: `city-pricing` (venue_count < 3), `city-cost-de` (data_confidence < 1.0), `country-overview` (total_venues < 5)
  - **`generate_articles()` upsert** now evaluates the threshold and stores `noindex = 1` for articles that fail it; existing articles are updated on re-generation
  - **Robots meta tag** injected in `article_detail.html` head block when `article.noindex` is truthy
  - **Sitemap exclusion** — `sitemap.py` articles query adds `AND noindex = 0`; thin-data articles excluded from `sitemap.xml`
  - **pSEO dashboard noindex card** — 4th summary card shows count of noindex articles (amber highlight when > 0)
  - **Article row noindex badge** — amber pill badge on `partials/article_row.html` when `a.noindex`
  - 20 new tests (threshold unit tests per template, sitemap exclusion, article detail robots meta tag)
- **Outreach pipeline** — cold B2B supplier outreach isolated from transactional emails:
  - **Separate sending domain** (`hello.padelnomics.io`) — added `"outreach"` key to `EMAIL_ADDRESSES`; reputation isolated from `notifications.padelnomics.io` magic-link/lead-forward traffic (manual DNS step: add domain in Resend dashboard)
  - **Migration 0024** — 4 new columns on `suppliers`: `outreach_status`, `outreach_notes`, `last_contacted_at`, `outreach_sequence_step`; `NULL` status = not in pipeline (no backfill needed for existing suppliers)
  - **Admin outreach pipeline tab** (`/admin/outreach`) — 6 pipeline cards (prospect → contacted → replied → signed_up → declined → not_interested) with click-to-filter; HTMX-powered supplier table
    with inline status dropdown + note editing; sidebar link added
  - **HTMX endpoints** — `POST /admin/outreach//status` returns updated row; `POST /admin/outreach//note` returns truncated note text
  - **Bulk add-to-pipeline** — checkbox column on `/admin/suppliers`, "Add to Outreach Pipeline" form action → `POST /admin/outreach/add-prospects`; skips suppliers already in pipeline
  - **CSV import** (`GET/POST /admin/outreach/import`) — uploads CSV (`name`, `contact_email` required; `country_code`, `category`, `website` optional); creates new supplier rows as `prospect`; auto-generates slug; deduplicates by `contact_email`; capped at 500 rows
  - **Compose integration** — `GET /admin/emails/compose` now accepts `?from_key=outreach&email_type=outreach&supplier_id=` query params; pre-selects outreach from-address and unchecks HTML wrap (plain text best practice for cold email); on successful send with `email_type=outreach` + `supplier_id`, auto-updates supplier: `prospect→contacted`, `last_contacted_at=now`, `outreach_sequence_step+1`
  - **Supplier detail outreach card** — shown when supplier is in the outreach pipeline; displays status, step, last contact date, notes, and "Send Outreach Email" compose link
  - 44 new tests in `web/tests/test_outreach.py`
- **Email template system** — all 11 transactional emails migrated from inline f-string HTML in `worker.py` to Jinja2 templates:
  - **Standalone renderer** (`email_templates.py`) — `render_email_template()` uses a module-level `jinja2.Environment` with `autoescape=True`, works outside Quart request context (worker process); `tformat` filter mirrors the one in `app.py`
  - **`_base.html`** — branded shell (dark header, 3px blue accent, white card body, footer with tagline + copyright); replaces the old `_email_wrap()` helper
  - **`_macros.html`** — reusable Jinja2 macros: `email_button`, `heat_badge`, `heat_badge_sm`, `section_heading`, `info_box`
  - **11 email templates**: `magic_link`, `quote_verification`, `welcome`,
    `waitlist_supplier`, `waitlist_general`, `lead_matched`, `lead_forward`, `lead_match_notify`, `weekly_digest`, `business_plan`, `admin_compose`
  - **`EMAIL_TEMPLATE_REGISTRY`** — dict mapping slug → `{template, label, description, email_type, sample_data}` with realistic sample data callables for each template
  - **Admin email gallery** (`/admin/emails/gallery`) — card grid of all email types; preview page with EN/DE language toggle renders each template in a sandboxed iframe (`srcdoc`); "View in sent log →" cross-link; gallery link added to admin sidebar
  - **Compose live preview** — two-column compose layout: form on the left, HTMX-powered preview iframe on the right; `hx-trigger="input delay:500ms"` on the textarea; `POST /admin/emails/compose/preview` endpoint supports plain body or branded wrapper via `wrap` checkbox
  - 50 new tests covering all template renders (EN + DE), registry structure, gallery routes (access control, list, preview, lang fallback), and compose preview endpoint
- **JSONL streaming landing format** — extractors now write one JSON object per line (`.jsonl.gz`) instead of a single large blob, eliminating in-memory accumulation and `maximum_object_size` workarounds:
  - `playtomic_tenants.py` → `tenants.jsonl.gz` (one tenant per line; dedup still happens in memory before write)
  - `playtomic_availability.py` → `availability_{date}.jsonl.gz` (morning) + `availability_{date}_recheck_{HH}.jsonl.gz` (recheck); one venue per line with `date`/`captured_at_utc`/`recheck_hour` injected
  - `geonames.py` → `cities_global.jsonl.gz` (one city per line; eliminates 30 MB blob and its `maximum_object_size` workaround)
  - `compress_jsonl_atomic(jsonl_path, dest_path)` utility added to `utils.py` — streams compression in 1 MB chunks, atomic `.tmp` rename, deletes source
- **Regional Overpass splitting for tennis courts** — replaces single global query (150K+ elements, timed out) with 10 regional bbox queries (~10-40K elements each, 150s server / 180s client):
  - Regions: europe\_west, europe\_central, europe\_east, north\_america, south\_america, asia\_east, asia\_west, oceania, africa, asia\_north
  - Per-region retry (2 attempts, 30s cooldown) + 5s inter-region polite delay
  - Crash recovery via `working.jsonl` accumulation — already-written element IDs skipped on restart; completed regions produce 0 new elements on re-query
  - Output: `courts.jsonl.gz` (one OSM element per line)
- **`scripts/init_landing_seeds.py`** — creates minimal `.jsonl.gz` and `.json.gz` seed files in `1970/01/` so SQLMesh staging models can run before real extraction data arrives; idempotent

### Changed

- All modified staging SQL models use **UNION ALL transition CTEs** — both JSONL (new) and blob (old) formats are readable simultaneously; old `.json.gz` files in the landing zone continue working until they rotate out naturally:
  - `stg_playtomic_venues`, `stg_playtomic_resources`, `stg_playtomic_opening_hours` — JSONL top-level columns (no `UNNEST(tenants)`)
  - `stg_playtomic_availability` — JSONL morning + recheck files; blob morning + recheck kept for transition
  - `stg_population_geonames` — JSONL city rows (no `UNNEST(rows)`, no `maximum_object_size`)
  - `stg_tennis_courts` — JSONL elements with `COALESCE(lat, center.lat)` for way/relation centre coords; blob UNNEST kept for old files

### Removed

- `_email_wrap()` and `_email_button()` helper functions removed from `worker.py` — replaced by templates

### Added

- **Marketplace admin dashboard** (`/admin/marketplace`) — single-screen health view for the two-sided market:
  - **Lead funnel** — total / verified-new (ready to unlock) / unlocked / won / conversion rate
  - **Credit economy** — total credits issued, consumed (lead unlocks), outstanding balance across all paid suppliers, 30-day burn rate
  - **Supplier engagement** — active paid supplier count, avg lead unlocks per supplier, forward response rate
  - **Feature flag toggles** — `lead_unlock` and `supplier_signup` flags togglable inline; sidebar nav entry
    added
  - **Live activity stream** (HTMX partial) — last 50 events across leads, unlocks, and credit ledger in a single feed
- **Lead matching notifications** (`notify_matching_suppliers` worker task) — on quote verification, finds growth/pro suppliers whose `service_area` includes the lead's country and sends an instant alert email; bounded to 20 suppliers per lead
- **Weekly lead digest** (`send_weekly_lead_digest` worker task) — every Monday at 08:00 UTC, sends paid suppliers a summary table of new matching leads from the past 7 days they haven't unlocked yet (max 5 rows per email)
- **One-click CTA token** — lead-forward emails now include a "Mark as contacted" footer link backed by a unique `cta_token`; clicking it sets the forward status to `contacted` and redirects to the supplier dashboard; token stored on `lead_forwards` after send
- **Supplier `lead_respond` endpoint** — HTMX status update for forwarded leads: `sent / viewed / contacted / quoted / won / lost / no_response`
- **Supplier `lead_cta_contacted` endpoint** (`/suppliers/leads/cta/`) — one-click email handler; idempotent (only advances from `sent` → `contacted`)
- **Migration 0022** — adds `status_updated_at`, `supplier_note`, `cta_token` to `lead_forwards`; unique partial index on `cta_token`
- **Admin leads list improvements** — summary cards (total / new+unverified / hot pipeline credits / forward rate); text search across name, email, company; period filter pills (Today / 7d / 30d / All); `get_leads()` now returns `(rows, total_count)` and supports `search` + `days` params
- **Admin lead detail — HTMX inline actions** — status change returns an updated status badge partial; forward-to-supplier form returns an updated forward history table; no full-page reload
- **Quote form extended** — captures `build_context`, `glass_type`, `lighting_type`, `location_status`, `financing_status`, `services_needed`, `additional_info`; displayed in lead detail view
- **pSEO Engine admin tab** (`/admin/pseo`) —
  operational visibility for the programmatic SEO system:
  - **Content gap detection** — queries DuckDB serving tables vs SQLite articles to find rows with no matching article per language; per-template HTMX-loaded gap list
  - **Data freshness signals** — compares `_serving_meta.json` export timestamp vs `MAX(updated_at)` in articles; per-template status: 🟢 Fresh / 🟡 Stale / 🟣 No articles / ⚫ No data
  - **Article health checks** (HTMX partial) — hreflang orphans (EN exists, DE missing), missing HTML build files, broken `[scenario:slug]` references in article markdown
  - **Generation job monitoring** — live progress bars polling every 2s while jobs run; stops polling on completion; error drilldown via `
    `; dedicated `/admin/pseo/jobs` list page
  - **`_serving_meta.json`** — written by `export_serving.py` after atomic rename; records `exported_at_utc` and per-table row counts; drives freshness signals in pSEO Engine dashboard
  - **Progress tracking columns** on `tasks` table (migration 0021): `progress_current`, `progress_total`, `error_log`; `generate_articles()` writes progress every 50 articles and on completion
  - 45 new tests covering all health functions + pSEO routes (access control, rendering, gap detection, generate-gaps POST, job status HTMX polling)
- **Dual market score system** — split the single market score into two branded scores:
  - **padelnomics Marktreife-Score™** (market maturity): existing score, refined — only for cities with ≥1 padel venue. Adds ×0.85 saturation discount when `venues_per_100k > 8`.
  - **padelnomics Marktpotenzial-Score™** (investment opportunity): new score covering ALL GeoNames locations globally (pop ≥1K), including zero-court locations. Rewards supply gaps, underserved catchment areas, and racket sport culture via inverted venue density signal.
- **Tennis court Overpass extractor** — `extract-overpass-tennis` downloads all OSM `sport=tennis` nodes/ways/relations globally (~150K+ features). Lands at `overpass_tennis/{year}/{month}/courts.json.gz`. Staged in `stg_tennis_courts`.
- **`foundation.dim_locations`** — new conformed dimension seeded from GeoNames (all locations ≥1K pop), not from padel venues. Grain `(country_code, geoname_id)`. Enriched with:
  - `nearest_padel_court_km` via `ST_Distance_Sphere` (DuckDB spatial extension)
  - `padel_venue_count` / `padel_venues_per_100k` (venues within 5km)
  - `tennis_courts_within_25km` (courts within 25km)
- **GeoNames expanded** — extractor switched from `cities15000` (50K+ filter, ~24K rows) to `cities1000` (~140K locations, pop ≥1K). Added `lat`, `lon`, `admin1_code`, `admin2_code` to output. Expanded feature codes to include `PPLA3/4/5` (Gemeinden/cantons).
- **DuckDB spatial extension** — `extensions: [spatial]` added to `config.yaml`. Enables `ST_Distance_Sphere` for great-circle distance and future map features (bounding box queries, geometry columns).
- **SOPS secrets** — `GEONAMES_USERNAME=padelnomics` and `CENSUS_API_KEY` added to both `.env.dev.sops` and `.env.prod.sops`.
- **Crash-safe partial JSONL** — `utils.load_partial_results()` and `flush_partial_batch()` provide a generic opt-in mechanism for incremental progress flushing during long extractions. Any extractor processing items one-by-one can flush every N records and resume from a `.partial.jsonl` sidecar file after a crash.
- **Methodology page updated** — `/en/market-score` now documents both scores with: Two Scores intro section, component cards for each score (4 Marktreife + 5 Marktpotenzial), score band interpretations, expanded FAQ (7 entries). Section headings use the padelnomics wordmark span (Bricolage Grotesque). Bilingual EN + DE (native-quality German, no calques).
- **Market Score methodology page** — standalone page at `/{lang}/market-score` explaining the padelnomics Market Score (Zillow Zestimate-style). Reveals four input categories (demographics, economic strength, demand evidence, data completeness) and score band interpretations without exposing weights or formulas. Full JSON-LD (WebPage + FAQPage + BreadcrumbList), OG tags, and bilingual content (EN professional, DE Du-form). Added to sitemap and footer. First "padelnomics Market Score" mention in each article template now links to the methodology page (hub-and-spoke internal linking).

### Changed

- **`EXTRACT_WORKERS` env var removed** — worker count is now derived from `PROXY_URLS` length (one worker per proxy). No proxies → single-threaded. No manual tuning needed.
- **Playtomic tenants extractor** — parallel batch page fetching when proxies are configured. Each page in a batch fires concurrently using its own session + proxy.
  Expected speedup: ~2.5 min → ~15 s with 10 Webshare datacenter proxies.
- **Playtomic availability extractor** — three performance changes:
  1. No per-request `time.sleep()` on success when a proxy is active (throttle only when running direct). Retry/backoff sleeps for 429 and 5xx responses are unchanged.
  2. Worker count auto-detected from proxy count (drops `EXTRACT_WORKERS`).
  3. True crash resumption via `.partial.jsonl` sidecar: progress flushed every 50 venues, resume skips already-fetched venues and merges prior results into the final file.
- **Lead-Back Guarantee** — suppliers can claim credits back for non-responding leads with one click after 3 business days. Route `POST /suppliers/leads//guarantee-claim`, `refund_lead_guarantee()` in credits.py, "Lead didn't respond" button on unlocked lead cards (visible 3–30 days after unlock). Migration 0020 adds `guarantee_claimed_at` and `guarantee_contact_method` columns to `lead_forwards`.
- **Supplier page CRO restructure** — `/suppliers` page reordered to lead with value before pricing (Why Padelnomics → Lead-Back Guarantee → lead preview → social proof → pricing). All CTAs changed from "See Plans & Pricing" to "Get Started Free".
- **Static ROI line** — one-sentence ROI callout near pricing grounded in `research/padel-hall-economics.md` data (4-court project = €30K+ contractor profit).
- **Credits-only callout** — below pricing grid: "Not ready for a subscription? Buy a credit pack and unlock leads one at a time."

### Fixed

- **`datetime.utcnow()` deprecation warnings** — replaced all 94 occurrences across 22 files (source + tests) with `utcnow()` / `utcnow_iso()` helpers from `core.py`. `utcnow_iso()` produces `YYYY-MM-DD HH:MM:SS` (space separator) matching SQLite's `datetime('now')` format so lexicographic SQL comparisons stay correct. `datetime.utcfromtimestamp()` in `seo/_bing.py` also replaced with `datetime.fromtimestamp(ts, tz=UTC)`. Zero deprecation warnings remain.
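The format constraint in the `datetime.utcnow()` fix can be illustrated with a minimal sketch. The helper names `utcnow()` and `utcnow_iso()` come from the entry above, but the bodies below are assumptions for illustration, not the code in `core.py`. The sketch only shows why a space-separated `YYYY-MM-DD HH:MM:SS` string lines up with SQLite's `datetime('now')` while a `T`-separated string does not.

```python
import sqlite3
from datetime import datetime, timezone


def utcnow() -> datetime:
    """Timezone-aware replacement for the deprecated datetime.utcnow()."""
    return datetime.now(timezone.utc)


def utcnow_iso() -> str:
    """Assumed body: format as 'YYYY-MM-DD HH:MM:SS', the shape SQLite uses."""
    return utcnow().strftime("%Y-%m-%d %H:%M:%S")


# Ask SQLite for its own timestamp so the two shapes can be compared.
conn = sqlite3.connect(":memory:")
sqlite_now = conn.execute("SELECT datetime('now')").fetchone()[0]
conn.close()

stamp = utcnow_iso()
same_shape = len(stamp) == len(sqlite_now) and stamp[10] == sqlite_now[10] == " "

# Mixing separators breaks string ordering: 'T' (0x54) sorts after ' ' (0x20),
# so an earlier instant written with 'T' compares as later than a space-form one.
t_form = "2025-01-01T00:00:00"      # earlier instant, ISO 'T' separator
space_form = "2025-01-01 23:59:59"  # later instant, SQLite-style separator
misordered = t_form > space_form

# Aware replacement for datetime.utcfromtimestamp(ts).
epoch = datetime.fromtimestamp(0, tz=timezone.utc)
```

Because SQLite compares `TEXT` timestamps byte by byte, keeping every stored value in one shape is what keeps filters such as `follow_up_at <= date('now')` meaningful.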
- **Credit ledger ordering** — `get_ledger()` now uses `ORDER BY created_at DESC, id DESC` to preserve insertion order when multiple credits are added within the same second.
- **Double language prefix in article URLs** — articles were served at `/en/en/markets/italy` (double prefix) because `generate_articles()` stored `url_path` with the lang prefix baked in, but the blueprint is already mounted at `/`. Now `url_path` is stored without the prefix; canonical URLs, breadcrumbs, sitemap, and admin links all generate correct single-prefix URLs.
- **`/markets` removed from RESERVED_PREFIXES** — pSEO articles live under `/markets/` and the explicit `/markets` route takes priority over the catch-all, so the reservation was blocking article generation.
- **`country-overview` schema_type** — changed from `[Article]` to `[Article, FAQPage]` to enable FAQ rich results for existing FAQ content.

### Added

- **Bilingual pSEO templates (DE + EN)** — all 3 article templates (`city-cost-de`, `city-pricing`, `country-overview`) now generate proper German prose via `{% if language == "de" %}` conditionals. German text uses informal "Du/Dein", natural business German (not calque translation), and localized labels/units (€/Std, Hauptzeit/Nebenzeit, etc.).
- **Expanded English pSEO content** — all 3 templates expanded from ~400–900 words to ~1300–1500 words each. Added: Market Context/Landscape sections, analytical commentary after scenario markers, cross-template links (cost ↔ pricing ↔ country), planner links in FAQ answers, second CTA at bottom of each article, 2 additional FAQ questions per template.
- **Scenario cross-reference** — `city-pricing` template now embeds `[scenario:city-cost-de-{{ city_key }}:operating]` to show operating cost data from the investment analysis template.
- **CMS admin improvement** — articles list now has HTMX filter bar (search, status, template, language), pagination (50/page), and stats strip (total/live/scheduled/draft counts).
  Article actions (publish/unpublish, delete) are inline HTMX operations — no full page reload. "View" link opens live articles on the public site. Article generation and rebuild-all now enqueue to the background worker instead of blocking the HTTP request. Markdown source is written to disk during generation so the edit form shows content. Sitemap cache is invalidated when articles are published, deleted, or created. Fixed broken "Scheduled"/"Published" status display (was always showing "Scheduled") and stale `template_data_id` column reference.

### Changed

- **Visual test overhaul** — consolidated 3 separate Playwright server processes (ports 5111/5112/5113) into 1 session-scoped fixture in `conftest.py`; 77 tests pass in ~59 seconds (was ~3× slower with 3 independent servers). Fixed `init_db` mock bypass (must patch `padelnomics.app.init_db`, not `core.init_db`, since `from .core import init_db` creates a local binding). Forces `RESEND_API_KEY=""` and `WAITLIST_MODE=false` in subprocess so visual tests never send real emails or render waitlist pages. Added sections J–N: pricing, checkout, supplier signup, supplier dashboard, business plan export.
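The `init_db` mock bypass noted in the visual test entry follows from how `from .core import init_db` binds names. A minimal sketch, using two throwaway in-memory modules (`demo_core` and `demo_app`, hypothetical stand-ins for `padelnomics.core` and `padelnomics.app`), shows why the patch target has to be the importing module:

```python
import sys
import types
from unittest import mock

# Stand-in for core.py: defines the real init_db.
demo_core = types.ModuleType("demo_core")
exec("def init_db():\n    return 'real'", demo_core.__dict__)
sys.modules["demo_core"] = demo_core

# Stand-in for app.py: the from-import copies the name into demo_app's namespace.
demo_app = types.ModuleType("demo_app")
exec(
    "from demo_core import init_db\n"
    "def create_app():\n"
    "    return init_db()",
    demo_app.__dict__,
)
sys.modules["demo_app"] = demo_app

# Patching the defining module leaves demo_app's own binding untouched,
# so create_app() still calls the real function.
with mock.patch("demo_core.init_db", return_value="mocked"):
    via_core_patch = demo_app.create_app()

# Patching the importing module replaces the name create_app() actually looks up.
with mock.patch("demo_app.init_db", return_value="mocked"):
    via_app_patch = demo_app.create_app()
```

Here `via_core_patch` stays `'real'` while `via_app_patch` becomes `'mocked'`, which mirrors the fix of patching `padelnomics.app.init_db`.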
### Added

- **SOPS + age encrypted secrets** — `.env.dev.sops` and `.env.prod.sops` replace `.env.example` and GitLab CI/CD variables; age keypair for encryption/decryption; `deploy.sh` auto-decrypts on server; `infra/setup_server.sh` installs sops + age and generates server keypair; Makefile targets: `secrets-decrypt-dev`, `secrets-decrypt-prod`, `secrets-edit-dev`, `secrets-edit-prod`

### Removed

- `.env.example` — replaced by `.env.dev.sops` (decrypt with `make secrets-decrypt-dev`)
- GitLab CI heredoc that wrote `.env` via SSH — deploy.sh now handles decryption
- Dead `ADMIN_PASSWORD` CI variable reference
- Deprecated `WAITLIST_MODE` from env files (replaced by DB-backed feature flags)

### Added

- **Python supervisor** (`src/padelnomics/supervisor.py`) — replaces `supervisor.sh`; reads `infra/supervisor/workflows.toml` (module, schedule, entry, depends_on, proxy_mode); runs due workflows in topological waves (parallel within each wave); croniter-based `is_due()` check; systemd service updated to use `uv run python`
- **`workflows.toml` workflow registry** — 5 extractors registered: overpass, eurostat, playtomic_tenants, playtomic_availability, playtomic_prices; cron presets: hourly/daily/weekly/monthly; `playtomic_availability` depends on `playtomic_tenants`
- **`proxy.py` proxy rotation** (`extract/padelnomics_extract/proxy.py`) — reads `PROXY_URLS` env var; `make_round_robin_cycler()` for thread-safe round-robin; `make_sticky_selector()` for consistent per-tenant proxy assignment (hash-based)
- **DB-backed feature flags** — `feature_flags` table (migration 0019); `is_flag_enabled(name, default)` helper; `feature_gate(flag, template)` decorator replaces `WAITLIST_MODE`/`waitlist_gate`; 5 flags seeded: `markets` (on), `payments`, `planner_export`, `supplier_signup`, `lead_unlock` (all off)
- **Admin feature flags UI** — `/admin/flags` lists all flags with toggle; `POST /admin/flags/toggle` flips enabled bit; requires admin role; flash message on unknown flag
- **`lead_unlock` gate** — `unlock_lead` route returns HTTP 403 when `lead_unlock` flag is disabled
- **SEO/GEO admin hub** — syncs search performance data from Google Search Console (service account auth), Bing Webmaster Tools (API key), and Umami (bearer token) into 3 new SQLite tables (`seo_search_metrics`, `seo_analytics_metrics`, `seo_sync_log`); daily background sync via worker scheduler at 6am UTC; admin dashboard at `/admin/seo` with three HTMX tab views: search performance (top queries, top pages, country/device breakdown), full funnel (impressions → clicks → pageviews → visitors → planner users → leads), and per-article scorecard with attention flags (low CTR, no clicks); manual "Sync Now" button; 12-month data retention with automatic cleanup; all data sources optional (skip silently if not configured)
- **Landing zone backup to R2** — append-only landing files (`data/landing/*.json.gz`) synced to Cloudflare R2 every 30 minutes via systemd timer + rclone; extraction state DB (`.state.sqlite`) continuously replicated via Litestream (second DB entry in existing config); auto-restore on container startup for both `app.db` and `.state.sqlite`; `infra/restore_landing.sh` script for disaster recovery of landing files; `infra/landing-backup/` systemd service + timer units; rclone installation added to `infra/setup_server.sh`; reuses existing R2 bucket and credentials (no new env vars)
- **Admin Email Hub** (`/admin/emails`) — full email management dashboard with: sent log (filterable by type/event/search, HTMX partial updates), email detail with Resend API enrichment for HTML preview, inbound inbox with unread badges and inline reply, compose form with branded template wrapping, and Resend audience management with contact list/remove
- **Email delivery tracking** — `email_log` table records every outgoing email with resend_id; Resend webhook handler (`/webhooks/resend`) updates delivery events (delivered, bounced, opened, clicked, complained) in real-time;
  `inbound_emails` table stores received messages with full body
- **send_email() returns resend_id** — changed return type from `bool` to `str | None` (backward-compatible: truthy string works like True); all 9 worker handlers now pass `email_type=` for per-type filtering in the log
- **Playtomic full data extraction** — expanded venue bounding boxes from 4 regions (ES, UK, DE, FR) to 23 globally (Italy, Portugal, NL, BE, AT, CH, Nordics, Mexico, Argentina, Middle East, USA); PAGE_SIZE increased from 20 to 100; availability extractor throttle reduced from 2s to 1s for ~4.5h runtime at 16K venues
- **Playtomic pricing & occupancy pipeline** — 4 new staging models: `stg_playtomic_resources` (per-court: indoor/outdoor, surface type, size), `stg_playtomic_opening_hours` (per-day: open/close times, hours_open), `stg_playtomic_availability` (per-slot: 60-min bookable windows with real prices); `stg_playtomic_venues` rewritten to extract all metadata (opening_hours, resources, VAT rate, currency, timezone, booking settings)
- **Venue capacity & daily availability fact tables** — `fct_venue_capacity` derives total bookable court-hours from court_count × opening_hours; `fct_daily_availability` calculates occupancy rate (1 - available/capacity), booked hours, revenue estimate, and pricing stats (median/peak/offpeak) per venue per day
- **Venue pricing benchmarks** — `venue_pricing_benchmarks.sql` aggregates last-30-day venue metrics to city/country level: median hourly rate, peak/offpeak rates, P25/P75, occupancy rate, estimated daily revenue, court count
- **Real data planner defaults** — `planner_defaults.sql` rewritten with 3-tier cascade: city-level Playtomic data → country median → hardcoded fallback; replaces income-factor estimation with actual market pricing; includes `data_source` and `data_confidence` provenance columns
- **Eurostat income integration** (`stg_income.sql`) — staging model reads `ilc_di03` (median equivalised net income in PPS) from landing zone;
  grain `(country_code, ref_year)`
- **Income columns in dim_cities and city_market_profile** — `median_income_pps` and `income_year` passed through from staging to serving layer
- **Transactional email i18n** — all 8 email types now translated via locale files; `_t()` helper in `worker.py` looks up `email_*` keys from `en.json` / `de.json`; `_email_wrap()` accepts `lang` parameter for `` tag and translated footer; ~70 new translation keys (EN + DE); all task payloads now carry `lang` from request context at enqueue time; payloads without `lang` gracefully default to English
- **Email design & copy upgrade** — redesigned `_email_wrap()`: replaced monogram header with lowercase wordmark matching website, added 3px blue accent border, preheader text support (hidden preview in email clients), HR separators between heading and body; `_email_button()` now full-width block for mobile tap targets; rewrote copy for all 9 emails with improved subject lines, urgency cues, quick-start links in welcome email, styled project recap cards in quote verification, heat badges on lead forward emails, "what happens next" section in lead matched notifications, and secondary CTAs; ~30 new/updated translation keys in both EN and DE

### Changed

- **Resend audiences restructured** — replaced dynamic `waitlist-{blueprint}` audience naming (up to 4 audiences) with 3 named audiences fitting free plan limit: `suppliers` (supplier signups), `leads` (planner/quote users), `newsletter` (auth/content/public catch-all); new `_audience_for_blueprint()` mapping function in `core.py`
- **dim_venues enhanced** — now includes court_count, indoor/outdoor split, timezone, VAT rate, and default currency from Playtomic venue metadata
- **city_market_profile enhanced** — includes median hourly rate, occupancy rate, daily revenue estimate, and price currency from venue pricing benchmarks
- **Planner API route** — col_map updated to match new planner_defaults columns (`rate_peak`, `rate_off_peak`,
`avg_utilisation_pct`, `courts_typical`); adds `_dataSource` and `_currency` metadata keys

### Changed

- **pSEO CMS: SSG architecture** — templates now live in git as `.md.jinja` files with YAML frontmatter (slug, data_table, url_pattern, etc.) instead of SQLite `article_templates` table; data comes directly from DuckDB serving tables instead of intermediary `template_data` table; admin template views are read-only (edit in git, preview/generate in admin)
- **pSEO CMS: SEO pipeline** — article generation bakes canonical URLs, hreflang links (EN + DE), JSON-LD structured data (Article, FAQPage, BreadcrumbList), and Open Graph tags into each article's `seo_head` column at generation time; articles stored with `template_slug`, `language`, and `date_modified` columns for regeneration and freshness tracking

### Removed

- `article_templates` and `template_data` SQLite tables (migration 0018) — replaced by git template files + direct DuckDB reads; `template_data_id` FK removed from `articles` and `published_scenarios` tables
- Admin template CRUD routes (create/edit/delete) and CSV upload — replaced by read-only views with generate/regenerate/preview actions
- `template_form.html` and `template_data.html` admin templates

### Changed

- **Extraction: one file per source** — replaced monolithic `execute.py` with per-source modules (`overpass.py`, `eurostat.py`, `playtomic_tenants.py`, `playtomic_availability.py`); each module has its own CLI entry point (`extract-overpass`, `extract-eurostat`, etc.); shared boilerplate extracted to `_shared.py` with `run_extractor()` wrapper that handles SQLite state tracking, logging, and session management
- **Transform: 4-layer → 3-layer** — removed `raw/` layer; staging models now read landing zone JSON files directly via `read_json()` with `@LANDING_DIR` variable; model schemas renamed from `padelnomics.*` to per-layer namespaces (`staging.*`, `foundation.*`, `serving.*`)
- **Two-DuckDB architecture** — web app now reads from
`SERVING_DUCKDB_PATH` (analytics.duckdb) instead of `DUCKDB_PATH` (lakehouse.duckdb); `export_serving.py` atomically swaps serving tables after each transform run
- Supervisor: added daily sleep interval between pipeline runs

### Added

- **Sitemap: hreflang alternates + caching** — extracted sitemap generation to `sitemap.py`; each URL entry now includes `xhtml:link` hreflang alternates (en, de, x-default) for correct international SEO signaling; supplier detail pages now listed in both EN and DE (were EN-only); removed misleading "today" lastmod from static pages; added 1-hour in-memory TTL cache with `Cache-Control: public, max-age=3600` response header
- **Playtomic availability extractor** (`playtomic_availability.py`) — daily next-day booking slot snapshots for occupancy rate estimation and pricing benchmarking; reads tenant IDs from latest `tenants.json.gz`, queries `/v1/availability` per venue with 2s throttle, resumable via cursor, bounded at 10K venues per run
- Template sync: copier update v0.9.0 → v0.10.0 — `export_serving.py` module, `@padelnomics_glob()` macro, `setup_server.sh`, supervisor export_serving step

### Fixed

- **Eurostat JSON-stat parsing** — API returns 4-7 dimension sparse dictionaries (583K values) that caused DuckDB OOM; extractor now pre-processes JSON-stat into flat records with configurable dimension filters per dataset
- **Playtomic venue lat/lon** — staging model used wrong JSON path (`address.coordinate_lat` vs actual `address.coordinate.lat`)
- **dim_cities CTE** — unused `eurostat_labels` CTE caused `city_slug_raw` column not found error

### Removed

- `extract/.../execute.py` — replaced by per-source modules
- `models/raw/` directory — raw layer eliminated; staging reads landing files directly

### Added

- Template sync: copier update from `29ac25b` → `v0.9.0` (29 template commits)
- `.claude/CLAUDE.md`: project-specific Claude Code instructions (skills, commands, architecture)
- `.claude/coding_philosophy.md`: engineering
principles guide
- `extract/padelnomics_extract/README.md`: extraction patterns & state tracking docs
- `extract/padelnomics_extract/src/padelnomics_extract/utils.py`: SQLite state tracking (`open_state_db`, `start_run`, `end_run`, `get_last_cursor`) + file I/O helpers (`landing_path`, `content_hash`, `write_gzip_atomic`)
- `transform/sqlmesh_padelnomics/README.md`: 4-layer SQLMesh architecture guide
- Per-layer model READMEs (raw, staging, foundation, serving)
- `infra/supervisor/`: systemd service + supervisor script for pipeline orchestration
- Copier answers file now includes `enable_daas`, `enable_cms`, `enable_directory`, `enable_i18n` toggles (prevents accidental deletion on future copier updates)
- Expanded programmatic SEO city coverage from 18 to 40 cities (+22 cities across ES, FR, IT, NL, AT, CH, SE, PT, BE, AE, AU, IE) — generates 80 articles (40 cities × EN + DE)
- `scripts/refresh_from_daas.py`: syncs template_data rows from DuckDB `planner_defaults` serving table; supports `--dry-run` and `--generate` flags; graceful no-op when DuckDB unavailable

### Added

- `analytics.py`: DuckDB read-only reader (`open_analytics_db`, `close_analytics_db`, `fetch_analytics`) registered in app lifecycle (startup/shutdown)
- `GET /planner/api/market-data?city_slug=`: returns per-city planner defaults from DuckDB `planner_defaults` serving table; falls back to `{}` when analytics DB unavailable

### Added

- `transform/sqlmesh_padelnomics` workspace member: SQLMesh 4-layer model pipeline over DuckDB
  - Raw: `raw_overpass_courts`, `raw_playtomic_tenants`, `raw_eurostat_population`
  - Staging: `stg_padel_courts`, `stg_playtomic_venues`, `stg_population`
  - Foundation: `dim_venues` (OSM + Playtomic deduped), `dim_cities` (with Eurostat population)
  - Serving: `city_market_profile` (market score OBT), `planner_defaults` (per-city calculator pre-fill)
- `extract/padelnomics_extract` workspace member: Overpass API (padel courts via OSM), Eurostat city demographics (`urb_cpop1`,
`ilc_di03`), and Playtomic unauthenticated tenant search extractors
- Landing zone structure at `data/landing/` with per-source subdirectories: `overpass/`, `eurostat/`, `playtomic/`
- `.env.example` entries for `DUCKDB_PATH` and `LANDING_DIR`
- content: `scripts/seed_content.py` — seeds two article templates (EN + DE) and 18 cities × 2 language rows into the database; run with `uv run python -m padelnomics.scripts.seed_content --generate` to produce 36 pre-built SEO articles covering Germany (8 cities), USA (6 cities), and UK (4 cities); each city has realistic per-market overrides for rates, rent, utilities, permits, and court configuration so the financial model produces genuinely unique output per article
- content: EN template (`city-padel-cost-en`) at `/padel-cost/{{ city_slug }}` and DE template (`city-padel-cost-de`) at `/padel-kosten/{{ city_slug }}` with Jinja2 Markdown bodies embedding `[scenario:slug:section]` cards for summary, CAPEX, operating, cashflow, and returns

### Fixed

- content: `bake_scenario_cards()` now accepts a `lang` parameter and passes it to scenario partial templates; previously `lang` was always `undefined`, causing all cards to render with English labels even for German articles
- admin: `_generate_from_template()` extracts `language` from data row and passes it to `calc()` and `bake_scenario_cards()` so German scenario cards use translated CAPEX/OPEX item names
- admin: `_generate_from_template()` now derives `article_slug` as `{template_slug}-{city_slug}` instead of bare `city_slug`; bare slugs caused UNIQUE constraint collisions when multiple templates generated articles for the same city
- admin: `_rebuild_article()` passes `lang` from data row (or `"en"` for manual articles) to `bake_scenario_cards()` so rebuilt articles render correct language labels
- content: removed unused `g` import from `content/routes.py`

### Changed

- planner: full HTMX refactor — replaced 847-line SPA `planner.js` with server-rendered Jinja2 tab
partials; planner now uses `hx-post /planner/calculate` + form state; all tab content (CAPEX, Operating, Cash Flow, Returns, Metrics) rendered server-side; Chart.js data embedded as `