Files
padelnomics/docs/CMS.md
Deeman 488e47b4b4 docs: rewrite CMS.md for SSG architecture
Replace the old CSV-upload CMS documentation with the new SSG system:
git templates, DuckDB data sources, generation pipeline, SEO pipeline
(hreflang, JSON-LD, canonical, OG), admin routes, and step-by-step
guide for adding new pSEO ideas.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 13:03:30 +01:00

431 lines
16 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# CMS & Programmatic SEO
How the content management system works: git-based template definitions, DuckDB data sources, bulk article generation, scenario embedding, SEO pipeline, and how content is served.
---
## Overview
The CMS is an **SSG-inspired programmatic content system**. The core idea:
1. **Templates live in git**`.md.jinja` files with YAML frontmatter define how articles look
2. **Data lives in DuckDB** — one serving table per pSEO idea provides what articles say
3. **Generation** renders templates with data, bakes in SEO metadata, writes HTML to disk
4. **SQLite** stores routing state only — `articles` and `published_scenarios` tables
5. Articles are **served** by a catch-all route reading pre-built HTML from disk
```
Git repo (content shape) DuckDB (content data) SQLite (routing/state)
──────────────────────── ───────────────────── ──────────────────────
content/templates/ serving.pseo_city_costs_de articles
city-cost-de.md.jinja → city_slug, population, → url_path, slug
market-compare.md.jinja venue_count, rates... → status, published_at
→ template_slug
serving.pseo_market_compare
"How articles look" "What articles say" published_scenarios
→ slug, state_json
→ calc_json
"Where things live"
```
---
## Template file format
Templates are `.md.jinja` files in `web/src/padelnomics/content/templates/`. Each file has YAML frontmatter delimited by `---`, followed by a Jinja2 Markdown body.
### Example
```markdown
---
name: "DE City Padel Costs"
slug: city-cost-de
content_type: calculator
data_table: serving.pseo_city_costs_de
natural_key: city_slug
languages: [de, en]
url_pattern: "/markets/{{ country_name_en | lower | slugify }}/{{ city_slug }}"
title_pattern: "Padel in {{ city_name }} — Market Analysis & Costs"
meta_description_pattern: "How much does it cost to build a padel center in {{ city_name }}? {{ padel_venue_count }} venues, pricing data & financial model."
schema_type: [Article, FAQPage]
priority_column: population
---
# Padel in {{ city_name }}
{{ city_name }} has {{ padel_venue_count }} padel venues...
[scenario:{{ scenario_slug }}:capex]
## FAQ
**How much does a padel court cost in {{ city_name }}?**
Based on our financial model, a {{ courts_typical }}-court center requires...
```
### Frontmatter fields
| Field | Required | Description |
|-------|----------|-------------|
| `name` | Yes | Human-readable template name (shown in admin) |
| `slug` | Yes | URL-safe identifier, also the filename stem |
| `content_type` | Yes | `calculator` (has financial scenario) or `editorial` |
| `data_table` | Yes | DuckDB serving table, schema-qualified (e.g. `serving.pseo_city_costs_de`) |
| `natural_key` | Yes | Column used as unique row identifier (e.g. `city_slug`) |
| `languages` | Yes | List of language codes to generate (e.g. `[de, en]`) |
| `url_pattern` | Yes | Jinja2 pattern for article URL path |
| `title_pattern` | Yes | Jinja2 pattern for `<title>` and `og:title` |
| `meta_description_pattern` | Yes | Jinja2 pattern for meta description (aim for 120-155 chars) |
| `schema_type` | No | JSON-LD type(s): `Article` (default), `FAQPage`, or list. See [Structured data](#structured-data) |
| `priority_column` | No | Column to sort by for publish order (highest value first) |
| `related_template_slugs` | No | Other template slugs for cross-linking (future) |
### Body template
Everything after the closing `---` is the body. It's Jinja2 Markdown with access to:
- **All columns** from the DuckDB row (e.g. `{{ city_name }}`, `{{ population }}`)
- **`language`** — current language code (`en`, `de`)
- **`scenario_slug`** — auto-derived slug for `[scenario:]` markers (calculator type only)
The `slugify` filter is available: `{{ country_name_en | slugify }}`.
---
## Database schema
After migration 0018, only two content tables remain in SQLite:
### articles
| Column | Type | Notes |
|--------|------|-------|
| `url_path` | TEXT UNIQUE | e.g. `/en/markets/germany/berlin` |
| `slug` | TEXT UNIQUE | e.g. `city-cost-de-en-berlin` |
| `title` | TEXT | Rendered from `title_pattern` |
| `meta_description` | TEXT | Rendered from `meta_description_pattern` |
| `country` | TEXT | For markets hub grouping |
| `region` | TEXT | For markets hub grouping |
| `status` | TEXT | `'published'` or `'draft'` |
| `published_at` | DATETIME | Future dates suppress the article |
| `template_slug` | TEXT | Links back to git template for regeneration |
| `language` | TEXT | Language code (`en`, `de`) for hreflang |
| `date_modified` | TEXT | ISO timestamp, updated on regeneration |
| `seo_head` | TEXT | Pre-computed HTML: canonical, hreflang, JSON-LD, OG tags |
FTS5 virtual table `articles_fts` syncs `title`, `meta_description`, `country`, `region` via triggers.
### published_scenarios
| Column | Type | Notes |
|--------|------|-------|
| `slug` | TEXT UNIQUE | Referenced in `[scenario:slug]` markers |
| `title` | TEXT | City name |
| `location` | TEXT | City name |
| `country` | TEXT | |
| `venue_type` | TEXT | `indoor` / `outdoor` |
| `ownership` | TEXT | `rent` / `own` |
| `court_config` | TEXT | e.g. `4 double + 2 single` |
| `state_json` | JSON | Calculator input state |
| `calc_json` | JSON | Calculator output (all sections) |
---
## Generation pipeline
### Full flow
```
1. Create serving model in SQLMesh
models/serving/pseo_city_costs_de.sql → sqlmesh plan prod → analytics.duckdb
2. Create template file in git
content/templates/city-cost-de.md.jinja
3. Preview locally
Admin → Templates → pick template → pick city → preview rendered article
4. Commit + push + deploy
5. Admin → Templates → "DE City Padel Costs"
sees data_table, row count from DuckDB, generated count
6. Click "Generate"
start_date, articles_per_day → for each DuckDB row × language:
• render URL, title, meta patterns
• create/update published_scenario (calculator type)
• render body.md.jinja → Markdown → HTML
• bake [scenario:] markers into HTML
• inject SEO head (canonical, hreflang, JSON-LD, OG)
• write HTML to data/content/_build/{lang}/{slug}.html
• upsert article row in SQLite
• stagger published_at dates
7. Articles live at /en/markets/germany/berlin
with links to /planner?scenario=city-cost-de-berlin
8. Data refresh → "Regenerate" in admin
reads fresh DuckDB data, updates HTML + scenarios in-place
```
### What happens per row
For each DuckDB row × each language in `config.languages`:
1. Render `url_pattern`, `title_pattern`, `meta_description_pattern` with row data
2. If `content_type == "calculator"`: extract calc fields → `validate_state()``calc()` → upsert `published_scenarios`
3. Render body template with row data + `scenario_slug`
4. Convert Markdown → HTML via `mistune`
5. Replace `[scenario:slug:section]` markers with rendered HTML cards
6. Build SEO head: canonical, hreflang links, JSON-LD, OG tags
7. Write HTML to `data/content/_build/{lang}/{article_slug}.html`
8. Upsert `articles` row in SQLite
9. Stagger `published_at` based on `articles_per_day`
### Scenario markers
Use `[scenario:slug]` or `[scenario:slug:section]` in the body template:
| Section | Shows |
|---------|-------|
| _(none)_ | Default summary card with CTA link to planner |
| `capex` | Capital expenditure breakdown |
| `operating` | Operating cost breakdown |
| `cashflow` | Annual cash flow projection |
| `returns` | ROI / returns summary |
| `full` | All sections combined |
Always use `{{ scenario_slug }}` — never hardcode a slug.
### Scenario slug derivation
```
scenario_slug = template_slug + "-" + natural_key_value
```
Example: template `city-cost-de` + row `city_slug=berlin``city-cost-de-berlin`
### Article slug derivation
```
article_slug = template_slug + "-" + language + "-" + natural_key_value
```
Example: `city-cost-de-en-berlin`, `city-cost-de-de-berlin`
---
## SEO pipeline
All SEO metadata is baked into the `seo_head` column at generation time. No runtime computation.
### URL structure
```
/{lang}/markets/{country_name}/{city_slug}
```
- Language prefix for hreflang (`/de/...`, `/en/...`)
- Full country name in path (`/germany/`, not `/de/`) — avoids confusion with lang prefix
- `slugify` filter converts to lowercase hyphenated form
### Hreflang
For each article, links to all language variants + `x-default` (points to English):
```html
<link rel="alternate" hreflang="de" href="https://padelnomics.io/de/markets/germany/berlin" />
<link rel="alternate" hreflang="en" href="https://padelnomics.io/en/markets/germany/berlin" />
<link rel="alternate" hreflang="x-default" href="https://padelnomics.io/en/markets/germany/berlin" />
```
### Canonical URLs
Self-referencing canonical on every article:
```html
<link rel="canonical" href="https://padelnomics.io/en/markets/germany/berlin" />
```
### Structured data
JSON-LD `<script>` blocks injected in `seo_head`. The `schema_type` frontmatter controls which types are generated:
| Schema | When | Data source |
|--------|------|-------------|
| `BreadcrumbList` | Always | Auto-generated from URL segments |
| `Article` | Default (or `schema_type: Article`) | Template patterns + row data |
| `FAQPage` | `schema_type: FAQPage` | Parsed from `## FAQ` section — bold questions + answer paragraphs |
**`Article`** includes: `headline`, `datePublished`, `dateModified`, `author` (Padelnomics), `publisher`, `description`, `inLanguage`.
**`FAQPage`** wraps bold-question/answer pairs from the `## FAQ` heading into `Question`/`acceptedAnswer` pairs.
`dateModified` updates on every regeneration (freshness signal for Google).
### Open Graph
```html
<meta property="og:title" content="..." />
<meta property="og:description" content="..." />
<meta property="og:url" content="..." />
<meta property="og:type" content="article" />
```
### Drip publishing
- `articles_per_day` parameter (default 3) staggers `published_at` dates
- `priority_column` sorts rows so high-value cities publish first
- Articles with future `published_at` are invisible to the catch-all route
---
## Article serving
The `content` blueprint registers a catch-all route:
```python
@bp.route("/<lang>/<path:url_path>")
async def article_page(lang, url_path):
```
Reserved prefixes are short-circuited:
```python
RESERVED_PREFIXES = (
"/admin", "/auth", "/planner", "/billing", "/dashboard",
"/directory", "/leads", "/suppliers", "/health",
"/sitemap", "/static", "/markets", "/features", "/feedback",
)
```
Lookup: find the article by `url_path`, check `status = 'published'` and `published_at <= now`, then read the pre-built HTML from `data/content/_build/{lang}/{slug}.html`.
---
## Markets hub
`/{lang}/markets` is the discovery index for all published articles.
- Groups articles by `country` and `region`
- Full-text search via FTS5 across `title`, `meta_description`, `country`, `region`
- HTMX partial at `/{lang}/markets/results` handles live filtering
---
## Admin interface
Templates are **read-only** in the admin UI — edit them in git, preview and generate in admin.
### Routes
| Route | Method | Purpose |
|-------|--------|---------|
| `/admin/templates` | GET | List templates scanned from disk |
| `/admin/templates/<slug>` | GET | Template detail: config, DuckDB columns, sample data |
| `/admin/templates/<slug>/preview/<row_key>` | GET | Preview one article rendered in-memory |
| `/admin/templates/<slug>/generate` | GET/POST | Generate form + action |
| `/admin/templates/<slug>/regenerate` | POST | Re-generate all articles with fresh DuckDB data |
Scenario and article management routes are unchanged (create, edit, delete, publish toggle, rebuild).
---
## Directory structure
```
web/src/padelnomics/
├── content/
│ ├── __init__.py # Core engine: discover, load, generate, preview, SEO
│ ├── routes.py # Catch-all serving, markets hub, scenario baking
│ └── templates/ # Git-based .md.jinja template files
│ └── city-cost-de.md.jinja
├── admin/
│ ├── routes.py # Read-only template views + generate/regenerate
│ └── templates/admin/
│ ├── templates.html # Template list (scanned from disk)
│ ├── template_detail.html # Config view, columns, sample data
│ ├── template_preview.html # Article preview
│ ├── generate_form.html # Schedule form
│ ├── articles.html
│ └── article_form.html
└── migrations/versions/
└── 0018_pseo_cms_refactor.py # Drops old tables, recreates articles + scenarios
data/content/
├── _build/ # Generated HTML (served at runtime)
│ ├── en/
│ │ └── {slug}.html
│ └── de/
│ └── {slug}.html
└── articles/ # Markdown source backup (manual articles)
└── {slug}.md
```
---
## Adding a new pSEO idea
### Step 1 — Create the DuckDB serving model
Write a SQLMesh model at `transform/sqlmesh_padelnomics/models/serving/pseo_your_idea.sql` that produces one row per article to generate. Must include a `natural_key` column (e.g. `city_slug`).
```bash
uv run sqlmesh -p transform/sqlmesh_padelnomics plan prod
```
### Step 2 — Create the template file
Create `web/src/padelnomics/content/templates/your-idea.md.jinja`:
```markdown
---
name: "Your Idea Name"
slug: your-idea
content_type: calculator
data_table: serving.pseo_your_idea
natural_key: city_slug
languages: [en, de]
url_pattern: "/markets/{{ country_name_en | lower | slugify }}/{{ city_slug }}"
title_pattern: "Your Title for {{ city_name }}"
meta_description_pattern: "Your description for {{ city_name }}. Max 155 chars."
schema_type: [Article, FAQPage]
priority_column: population
---
# Your Title for {{ city_name }}
Article body with {{ template_variables }}...
[scenario:{{ scenario_slug }}:capex]
## FAQ
**Your question about {{ city_name }}?**
Your answer with data from the row.
```
### Step 3 — Preview and iterate
Start the dev server, go to Admin → Templates → your template → pick a row → Preview. Iterate on the body until it looks right.
### Step 4 — Commit, deploy, generate
Push the template file. In production admin, click Generate with a start date and articles/day cadence.
### Calculator field overrides
Any DuckDB column whose name matches a key in `DEFAULTS` (e.g. `electricity`, `ratePeak`, `dblCourts`) is automatically used as a calculator override. Name your serving model columns accordingly.
---
## Gotchas
- **Reserved prefixes** — content URLs must not collide with blueprint routes. The `is_reserved_path()` check rejects them during generation.
- **Slug uniqueness** — article slugs include template slug + language + natural key. Collisions are caught as DB UNIQUE constraint violations.
- **Future publish dates** — articles with `published_at` in the future are invisible to the catch-all route and markets hub. They exist in the DB and can be previewed via admin.
- **FTS5 sync** — triggers keep FTS in sync. If you manually insert into `articles`, run `INSERT INTO articles_fts(articles_fts) VALUES('rebuild')`.
- **Template edits** — editing a `.md.jinja` file in git doesn't automatically update existing articles. Use "Regenerate" in admin after deploying template changes.
- **DuckDB read-only** — all DuckDB access uses `read_only=True`. No write risk.
- **Table name validation** — `data_table` is validated against `^[a-z_][a-z0-9_.]*$` to prevent SQL injection.