feat: DuckDB two-file architecture — resolve SQLMesh/web-app lock contention
Split the single lakehouse.duckdb into two files to eliminate the exclusive write-lock conflict between SQLMesh (pipeline) and the Quart web app (reader): lakehouse.duckdb — SQLMesh exclusive (all pipeline layers) serving.duckdb — web app reads (serving tables only, atomically swapped) Changes: web/src/beanflows/analytics.py - Replace persistent global _conn with per-thread connections (threading.local) - Add _get_conn(): opens read_only=True on first call per thread, reopens automatically on inode change (~1μs os.stat) to pick up atomic file swaps - Switch env var from DUCKDB_PATH → SERVING_DUCKDB_PATH - Add module docstring documenting architecture + DuckLake migration path web/src/beanflows/app.py - Startup check: use SERVING_DUCKDB_PATH - Health check: use _db_path instead of _conn src/materia/export_serving.py (new) - Reads all serving.* tables from lakehouse.duckdb (read_only) - Writes to serving_new.duckdb, then os.rename → serving.duckdb (atomic) - ~50 lines; runs after each SQLMesh transform src/materia/pipelines.py - Add export_serving pipeline entry (uv run python -c ...) infra/supervisor/supervisor.sh - Add SERVING_DUCKDB_PATH env var comment - Add export step: uv run materia pipeline run export_serving infra/supervisor/materia-supervisor.service - Add Environment=SERVING_DUCKDB_PATH=/data/materia/serving.duckdb infra/bootstrap_supervisor.sh - Add SERVING_DUCKDB_PATH to .env template web/.env.example + web/docker-compose.yml - Document both env vars; switch web service to SERVING_DUCKDB_PATH web/src/beanflows/dashboard/templates/settings.html - Minor settings page fix from prior session Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -93,6 +93,7 @@ PULUMI_ACCESS_TOKEN=${PULUMI_ACCESS_TOKEN}
|
||||
PATH=/root/.cargo/bin:/root/.pulumi/bin:/usr/local/bin:/usr/bin:/bin
|
||||
LANDING_DIR=/data/materia/landing
|
||||
DUCKDB_PATH=/data/materia/lakehouse.duckdb
|
||||
SERVING_DUCKDB_PATH=/data/materia/serving.duckdb
|
||||
EOF
|
||||
|
||||
echo "--- Setting up systemd service ---"
|
||||
|
||||
@@ -13,6 +13,7 @@ RestartSec=10
|
||||
EnvironmentFile=/opt/materia/.env
|
||||
Environment=LANDING_DIR=/data/materia/landing
|
||||
Environment=DUCKDB_PATH=/data/materia/lakehouse.duckdb
|
||||
Environment=SERVING_DUCKDB_PATH=/data/materia/serving.duckdb
|
||||
|
||||
# Resource limits
|
||||
LimitNOFILE=65536
|
||||
|
||||
@@ -4,9 +4,10 @@
|
||||
# https://github.com/tigerbeetle/tigerbeetle/blob/main/src/scripts/cfo_supervisor.sh
|
||||
#
|
||||
# Environment variables (set in systemd EnvironmentFile):
|
||||
# LANDING_DIR — local path for extracted landing data
|
||||
# DUCKDB_PATH — path to DuckDB lakehouse file
|
||||
# ALERT_WEBHOOK_URL — optional ntfy.sh / Slack / Telegram webhook for failure alerts
|
||||
# LANDING_DIR — local path for extracted landing data
|
||||
# DUCKDB_PATH — path to DuckDB lakehouse file (SQLMesh pipeline DB)
|
||||
# SERVING_DUCKDB_PATH — path to serving-only DuckDB (web app reads from here)
|
||||
# ALERT_WEBHOOK_URL — optional ntfy.sh / Slack / Telegram webhook for failure alerts
|
||||
|
||||
set -eu
|
||||
|
||||
@@ -51,6 +52,13 @@ do
|
||||
DUCKDB_PATH="${DUCKDB_PATH:-/data/materia/lakehouse.duckdb}" \
|
||||
uv run materia pipeline run transform
|
||||
|
||||
# Export serving tables to serving.duckdb (atomic swap).
|
||||
# The web app reads from SERVING_DUCKDB_PATH and picks up the new file
|
||||
# automatically via inode-based connection reopen — no restart needed.
|
||||
DUCKDB_PATH="${DUCKDB_PATH:-/data/materia/lakehouse.duckdb}" \
|
||||
SERVING_DUCKDB_PATH="${SERVING_DUCKDB_PATH:-/data/materia/serving.duckdb}" \
|
||||
uv run materia pipeline run export_serving
|
||||
|
||||
) || {
|
||||
# Notify on failure if webhook is configured, then sleep to avoid busy-loop
|
||||
if [ -n "${ALERT_WEBHOOK_URL:-}" ]; then
|
||||
|
||||
Reference in New Issue
Block a user