Replace the old CSV-upload CMS documentation with the new SSG system: git templates, DuckDB data sources, generation pipeline, SEO pipeline (hreflang, JSON-LD, canonical, OG), admin routes, and step-by-step guide for adding new pSEO ideas. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
431 lines
16 KiB
Markdown
431 lines
16 KiB
Markdown
# CMS & Programmatic SEO
|
||
|
||
How the content management system works: git-based template definitions, DuckDB data sources, bulk article generation, scenario embedding, SEO pipeline, and how content is served.
|
||
|
||
---
|
||
|
||
## Overview
|
||
|
||
The CMS is an **SSG-inspired programmatic content system**. The core idea:
|
||
|
||
1. **Templates live in git** — `.md.jinja` files with YAML frontmatter define how articles look
|
||
2. **Data lives in DuckDB** — one serving table per pSEO idea provides what articles say
|
||
3. **Generation** renders templates with data, bakes in SEO metadata, writes HTML to disk
|
||
4. **SQLite** stores routing state only — `articles` and `published_scenarios` tables
|
||
5. Articles are **served** by a catch-all route reading pre-built HTML from disk
|
||
|
||
```
|
||
Git repo (content shape) DuckDB (content data) SQLite (routing/state)
|
||
──────────────────────── ───────────────────── ──────────────────────
|
||
content/templates/ serving.pseo_city_costs_de articles
|
||
city-cost-de.md.jinja → city_slug, population, → url_path, slug
|
||
market-compare.md.jinja venue_count, rates... → status, published_at
|
||
→ template_slug
|
||
serving.pseo_market_compare
|
||
"How articles look" "What articles say" published_scenarios
|
||
→ slug, state_json
|
||
→ calc_json
|
||
"Where things live"
|
||
```
|
||
|
||
---
|
||
|
||
## Template file format
|
||
|
||
Templates are `.md.jinja` files in `web/src/padelnomics/content/templates/`. Each file has YAML frontmatter delimited by `---`, followed by a Jinja2 Markdown body.
|
||
|
||
### Example
|
||
|
||
```markdown
|
||
---
|
||
name: "DE City Padel Costs"
|
||
slug: city-cost-de
|
||
content_type: calculator
|
||
data_table: serving.pseo_city_costs_de
|
||
natural_key: city_slug
|
||
languages: [de, en]
|
||
url_pattern: "/markets/{{ country_name_en | lower | slugify }}/{{ city_slug }}"
|
||
title_pattern: "Padel in {{ city_name }} — Market Analysis & Costs"
|
||
meta_description_pattern: "How much does it cost to build a padel center in {{ city_name }}? {{ padel_venue_count }} venues, pricing data & financial model."
|
||
schema_type: [Article, FAQPage]
|
||
priority_column: population
|
||
---
|
||
# Padel in {{ city_name }}
|
||
|
||
{{ city_name }} has {{ padel_venue_count }} padel venues...
|
||
|
||
[scenario:{{ scenario_slug }}:capex]
|
||
|
||
## FAQ
|
||
|
||
**How much does a padel court cost in {{ city_name }}?**
|
||
Based on our financial model, a {{ courts_typical }}-court center requires...
|
||
```
|
||
|
||
### Frontmatter fields
|
||
|
||
| Field | Required | Description |
|
||
|-------|----------|-------------|
|
||
| `name` | Yes | Human-readable template name (shown in admin) |
|
||
| `slug` | Yes | URL-safe identifier, also the filename stem |
|
||
| `content_type` | Yes | `calculator` (has financial scenario) or `editorial` |
|
||
| `data_table` | Yes | DuckDB serving table, schema-qualified (e.g. `serving.pseo_city_costs_de`) |
|
||
| `natural_key` | Yes | Column used as unique row identifier (e.g. `city_slug`) |
|
||
| `languages` | Yes | List of language codes to generate (e.g. `[de, en]`) |
|
||
| `url_pattern` | Yes | Jinja2 pattern for article URL path |
|
||
| `title_pattern` | Yes | Jinja2 pattern for `<title>` and `og:title` |
|
||
| `meta_description_pattern` | Yes | Jinja2 pattern for meta description (aim for 120-155 chars) |
|
||
| `schema_type` | No | JSON-LD type(s): `Article` (default), `FAQPage`, or list. See [Structured data](#structured-data) |
|
||
| `priority_column` | No | Column to sort by for publish order (highest value first) |
|
||
| `related_template_slugs` | No | Other template slugs for cross-linking (future) |
|
||
|
||
### Body template
|
||
|
||
Everything after the closing `---` is the body. It's Jinja2 Markdown with access to:
|
||
|
||
- **All columns** from the DuckDB row (e.g. `{{ city_name }}`, `{{ population }}`)
|
||
- **`language`** — current language code (`en`, `de`)
|
||
- **`scenario_slug`** — auto-derived slug for `[scenario:]` markers (calculator type only)
|
||
|
||
The `slugify` filter is available: `{{ country_name_en | slugify }}`.
|
||
|
||
---
|
||
|
||
## Database schema
|
||
|
||
After migration 0018, only two content tables remain in SQLite:
|
||
|
||
### articles
|
||
|
||
| Column | Type | Notes |
|
||
|--------|------|-------|
|
||
| `url_path` | TEXT UNIQUE | e.g. `/en/markets/germany/berlin` |
|
||
| `slug` | TEXT UNIQUE | e.g. `city-cost-de-en-berlin` |
|
||
| `title` | TEXT | Rendered from `title_pattern` |
|
||
| `meta_description` | TEXT | Rendered from `meta_description_pattern` |
|
||
| `country` | TEXT | For markets hub grouping |
|
||
| `region` | TEXT | For markets hub grouping |
|
||
| `status` | TEXT | `'published'` or `'draft'` |
|
||
| `published_at` | DATETIME | Future dates suppress the article |
|
||
| `template_slug` | TEXT | Links back to git template for regeneration |
|
||
| `language` | TEXT | Language code (`en`, `de`) for hreflang |
|
||
| `date_modified` | TEXT | ISO timestamp, updated on regeneration |
|
||
| `seo_head` | TEXT | Pre-computed HTML: canonical, hreflang, JSON-LD, OG tags |
|
||
|
||
FTS5 virtual table `articles_fts` syncs `title`, `meta_description`, `country`, `region` via triggers.
|
||
|
||
### published_scenarios
|
||
|
||
| Column | Type | Notes |
|
||
|--------|------|-------|
|
||
| `slug` | TEXT UNIQUE | Referenced in `[scenario:slug]` markers |
|
||
| `title` | TEXT | City name |
|
||
| `location` | TEXT | City name |
|
||
| `country` | TEXT | |
|
||
| `venue_type` | TEXT | `indoor` / `outdoor` |
|
||
| `ownership` | TEXT | `rent` / `own` |
|
||
| `court_config` | TEXT | e.g. `4 double + 2 single` |
|
||
| `state_json` | JSON | Calculator input state |
|
||
| `calc_json` | JSON | Calculator output (all sections) |
|
||
|
||
---
|
||
|
||
## Generation pipeline
|
||
|
||
### Full flow
|
||
|
||
```
|
||
1. Create serving model in SQLMesh
|
||
models/serving/pseo_city_costs_de.sql → sqlmesh plan prod → analytics.duckdb
|
||
|
||
2. Create template file in git
|
||
content/templates/city-cost-de.md.jinja
|
||
|
||
3. Preview locally
|
||
Admin → Templates → pick template → pick city → preview rendered article
|
||
|
||
4. Commit + push + deploy
|
||
|
||
5. Admin → Templates → "DE City Padel Costs"
|
||
sees data_table, row count from DuckDB, generated count
|
||
|
||
6. Click "Generate"
|
||
start_date, articles_per_day → for each DuckDB row × language:
|
||
• render URL, title, meta patterns
|
||
• create/update published_scenario (calculator type)
|
||
• render body.md.jinja → Markdown → HTML
|
||
• bake [scenario:] markers into HTML
|
||
• inject SEO head (canonical, hreflang, JSON-LD, OG)
|
||
• write HTML to data/content/_build/{lang}/{slug}.html
|
||
• upsert article row in SQLite
|
||
• stagger published_at dates
|
||
|
||
7. Articles live at /en/markets/germany/berlin
|
||
with links to /planner?scenario=city-cost-de-berlin
|
||
|
||
8. Data refresh → "Regenerate" in admin
|
||
reads fresh DuckDB data, updates HTML + scenarios in-place
|
||
```
|
||
|
||
### What happens per row
|
||
|
||
For each DuckDB row × each language in `config.languages`:
|
||
|
||
1. Render `url_pattern`, `title_pattern`, `meta_description_pattern` with row data
|
||
2. If `content_type == "calculator"`: extract calc fields → `validate_state()` → `calc()` → upsert `published_scenarios`
|
||
3. Render body template with row data + `scenario_slug`
|
||
4. Convert Markdown → HTML via `mistune`
|
||
5. Replace `[scenario:slug:section]` markers with rendered HTML cards
|
||
6. Build SEO head: canonical, hreflang links, JSON-LD, OG tags
|
||
7. Write HTML to `data/content/_build/{lang}/{article_slug}.html`
|
||
8. Upsert `articles` row in SQLite
|
||
9. Stagger `published_at` based on `articles_per_day`
|
||
|
||
### Scenario markers
|
||
|
||
Use `[scenario:slug]` or `[scenario:slug:section]` in the body template:
|
||
|
||
| Section | Shows |
|
||
|---------|-------|
|
||
| _(none)_ | Default summary card with CTA link to planner |
|
||
| `capex` | Capital expenditure breakdown |
|
||
| `operating` | Operating cost breakdown |
|
||
| `cashflow` | Annual cash flow projection |
|
||
| `returns` | ROI / returns summary |
|
||
| `full` | All sections combined |
|
||
|
||
Always use `{{ scenario_slug }}` — never hardcode a slug.
|
||
|
||
### Scenario slug derivation
|
||
|
||
```
|
||
scenario_slug = template_slug + "-" + natural_key_value
|
||
```
|
||
|
||
Example: template `city-cost-de` + row `city_slug=berlin` → `city-cost-de-berlin`
|
||
|
||
### Article slug derivation
|
||
|
||
```
|
||
article_slug = template_slug + "-" + language + "-" + natural_key_value
|
||
```
|
||
|
||
Example: `city-cost-de-en-berlin`, `city-cost-de-de-berlin`
|
||
|
||
---
|
||
|
||
## SEO pipeline
|
||
|
||
All SEO metadata is baked into the `seo_head` column at generation time. No runtime computation.
|
||
|
||
### URL structure
|
||
|
||
```
|
||
/{lang}/markets/{country_name}/{city_slug}
|
||
```
|
||
|
||
- Language prefix for hreflang (`/de/...`, `/en/...`)
|
||
- Full country name in path (`/germany/`, not `/de/`) — avoids confusion with lang prefix
|
||
- `slugify` filter converts to lowercase hyphenated form
|
||
|
||
### Hreflang
|
||
|
||
For each article, links to all language variants + `x-default` (points to English):
|
||
|
||
```html
|
||
<link rel="alternate" hreflang="de" href="https://padelnomics.io/de/markets/germany/berlin" />
|
||
<link rel="alternate" hreflang="en" href="https://padelnomics.io/en/markets/germany/berlin" />
|
||
<link rel="alternate" hreflang="x-default" href="https://padelnomics.io/en/markets/germany/berlin" />
|
||
```
|
||
|
||
### Canonical URLs
|
||
|
||
Self-referencing canonical on every article:
|
||
|
||
```html
|
||
<link rel="canonical" href="https://padelnomics.io/en/markets/germany/berlin" />
|
||
```
|
||
|
||
### Structured data
|
||
|
||
JSON-LD `<script>` blocks injected in `seo_head`. The `schema_type` frontmatter controls which types are generated:
|
||
|
||
| Schema | When | Data source |
|
||
|--------|------|-------------|
|
||
| `BreadcrumbList` | Always | Auto-generated from URL segments |
|
||
| `Article` | Default (or `schema_type: Article`) | Template patterns + row data |
|
||
| `FAQPage` | `schema_type: FAQPage` | Parsed from `## FAQ` section — bold questions + answer paragraphs |
|
||
|
||
**`Article`** includes: `headline`, `datePublished`, `dateModified`, `author` (Padelnomics), `publisher`, `description`, `inLanguage`.
|
||
|
||
**`FAQPage`** wraps bold-question/answer pairs from the `## FAQ` heading into `Question`/`acceptedAnswer` pairs.
|
||
|
||
`dateModified` updates on every regeneration (freshness signal for Google).
|
||
|
||
### Open Graph
|
||
|
||
```html
|
||
<meta property="og:title" content="..." />
|
||
<meta property="og:description" content="..." />
|
||
<meta property="og:url" content="..." />
|
||
<meta property="og:type" content="article" />
|
||
```
|
||
|
||
### Drip publishing
|
||
|
||
- `articles_per_day` parameter (default 3) staggers `published_at` dates
|
||
- `priority_column` sorts rows so high-value cities publish first
|
||
- Articles with future `published_at` are invisible to the catch-all route
|
||
|
||
---
|
||
|
||
## Article serving
|
||
|
||
The `content` blueprint registers a catch-all route:
|
||
|
||
```python
|
||
@bp.route("/<lang>/<path:url_path>")
|
||
async def article_page(lang, url_path):
|
||
```
|
||
|
||
Reserved prefixes are short-circuited:
|
||
|
||
```python
|
||
RESERVED_PREFIXES = (
|
||
"/admin", "/auth", "/planner", "/billing", "/dashboard",
|
||
"/directory", "/leads", "/suppliers", "/health",
|
||
"/sitemap", "/static", "/markets", "/features", "/feedback",
|
||
)
|
||
```
|
||
|
||
Lookup: find the article by `url_path`, check `status = 'published'` and `published_at <= now`, then read the pre-built HTML from `data/content/_build/{lang}/{slug}.html`.
|
||
|
||
---
|
||
|
||
## Markets hub
|
||
|
||
`/{lang}/markets` is the discovery index for all published articles.
|
||
|
||
- Groups articles by `country` and `region`
|
||
- Full-text search via FTS5 across `title`, `meta_description`, `country`, `region`
|
||
- HTMX partial at `/{lang}/markets/results` handles live filtering
|
||
|
||
---
|
||
|
||
## Admin interface
|
||
|
||
Templates are **read-only** in the admin UI — edit them in git, preview and generate in admin.
|
||
|
||
### Routes
|
||
|
||
| Route | Method | Purpose |
|
||
|-------|--------|---------|
|
||
| `/admin/templates` | GET | List templates scanned from disk |
|
||
| `/admin/templates/<slug>` | GET | Template detail: config, DuckDB columns, sample data |
|
||
| `/admin/templates/<slug>/preview/<row_key>` | GET | Preview one article rendered in-memory |
|
||
| `/admin/templates/<slug>/generate` | GET/POST | Generate form + action |
|
||
| `/admin/templates/<slug>/regenerate` | POST | Re-generate all articles with fresh DuckDB data |
|
||
|
||
Scenario and article management routes are unchanged (create, edit, delete, publish toggle, rebuild).
|
||
|
||
---
|
||
|
||
## Directory structure
|
||
|
||
```
|
||
web/src/padelnomics/
|
||
├── content/
|
||
│ ├── __init__.py # Core engine: discover, load, generate, preview, SEO
|
||
│ ├── routes.py # Catch-all serving, markets hub, scenario baking
|
||
│ └── templates/ # Git-based .md.jinja template files
|
||
│ └── city-cost-de.md.jinja
|
||
│
|
||
├── admin/
|
||
│ ├── routes.py # Read-only template views + generate/regenerate
|
||
│ └── templates/admin/
|
||
│ ├── templates.html # Template list (scanned from disk)
|
||
│ ├── template_detail.html # Config view, columns, sample data
|
||
│ ├── template_preview.html # Article preview
|
||
│ ├── generate_form.html # Schedule form
|
||
│ ├── articles.html
|
||
│ └── article_form.html
|
||
│
|
||
└── migrations/versions/
|
||
└── 0018_pseo_cms_refactor.py # Drops old tables, recreates articles + scenarios
|
||
|
||
data/content/
|
||
├── _build/ # Generated HTML (served at runtime)
|
||
│ ├── en/
|
||
│ │ └── {slug}.html
|
||
│ └── de/
|
||
│ └── {slug}.html
|
||
└── articles/ # Markdown source backup (manual articles)
|
||
└── {slug}.md
|
||
```
|
||
|
||
---
|
||
|
||
## Adding a new pSEO idea
|
||
|
||
### Step 1 — Create the DuckDB serving model
|
||
|
||
Write a SQLMesh model at `transform/sqlmesh_padelnomics/models/serving/pseo_your_idea.sql` that produces one row per article to generate. Must include a `natural_key` column (e.g. `city_slug`).
|
||
|
||
```bash
|
||
uv run sqlmesh -p transform/sqlmesh_padelnomics plan prod
|
||
```
|
||
|
||
### Step 2 — Create the template file
|
||
|
||
Create `web/src/padelnomics/content/templates/your-idea.md.jinja`:
|
||
|
||
```markdown
|
||
---
|
||
name: "Your Idea Name"
|
||
slug: your-idea
|
||
content_type: calculator
|
||
data_table: serving.pseo_your_idea
|
||
natural_key: city_slug
|
||
languages: [en, de]
|
||
url_pattern: "/markets/{{ country_name_en | lower | slugify }}/{{ city_slug }}"
|
||
title_pattern: "Your Title for {{ city_name }}"
|
||
meta_description_pattern: "Your description for {{ city_name }}. Max 155 chars."
|
||
schema_type: [Article, FAQPage]
|
||
priority_column: population
|
||
---
|
||
# Your Title for {{ city_name }}
|
||
|
||
Article body with {{ template_variables }}...
|
||
|
||
[scenario:{{ scenario_slug }}:capex]
|
||
|
||
## FAQ
|
||
|
||
**Your question about {{ city_name }}?**
|
||
Your answer with data from the row.
|
||
```
|
||
|
||
### Step 3 — Preview and iterate
|
||
|
||
Start the dev server, go to Admin → Templates → your template → pick a row → Preview. Iterate on the body until it looks right.
|
||
|
||
### Step 4 — Commit, deploy, generate
|
||
|
||
Push the template file. In production admin, click Generate with a start date and articles/day cadence.
|
||
|
||
### Calculator field overrides
|
||
|
||
Any DuckDB column whose name matches a key in `DEFAULTS` (e.g. `electricity`, `ratePeak`, `dblCourts`) is automatically used as a calculator override. Name your serving model columns accordingly.
|
||
|
||
---
|
||
|
||
## Gotchas
|
||
|
||
- **Reserved prefixes** — content URLs must not collide with blueprint routes. The `is_reserved_path()` check rejects them during generation.
|
||
- **Slug uniqueness** — article slugs include template slug + language + natural key. Collisions are caught as DB UNIQUE constraint violations.
|
||
- **Future publish dates** — articles with `published_at` in the future are invisible to the catch-all route and markets hub. They exist in the DB and can be previewed via admin.
|
||
- **FTS5 sync** — triggers keep FTS in sync. If you manually insert into `articles`, run `INSERT INTO articles_fts(articles_fts) VALUES('rebuild')`.
|
||
- **Template edits** — editing a `.md.jinja` file in git doesn't automatically update existing articles. Use "Regenerate" in admin after deploying template changes.
|
||
- **DuckDB read-only** — all DuckDB access uses `read_only=True`. No write risk.
|
||
- **Table name validation** — `data_table` is validated against `^[a-z_][a-z0-9_.]*$` to prevent SQL injection.
|