Deeman 2962bf5e3b Fix COT pipeline: TRY_CAST nulls, dim_commodity leading zeros, correct CFTC codes
- config.yaml: remove ambiguousorinvalidcolumn linter rule (false positives on read_csv TVFs)
- fct_cot_positioning: use TRY_CAST throughout — CFTC uses '.' as null in many columns
- raw/cot_disaggregated: add columns() declaration for 33 varchar cols
- dim_commodity: switch from SEED to FULL model with SQL VALUES to preserve leading zeros
  Pandas auto-converts '083' → 83 even with varchar column declarations in SEED models
- seeds/dim_commodity.csv: correct cftc_commodity_code from '083731' (contract market code)
  to '083' (3-digit CFTC commodity code); add CSV quoting
- test_cot_foundation.yaml: fix output key name, vars for time range, partial: true,
  and correct cftc_commodity_code to '083'
- analytics.py: COFFEE_CFTC_CODE '083731' → '083' to match actual data

Result: serving.cot_positioning has 685 rows (2013-01-08 to 2026-02-17), 23/23 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 23:28:10 +01:00
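
The two data issues named in the commit message can be illustrated without DuckDB or pandas. The sketch below is plain Python, not code from this repository: `try_cast_int` is a hypothetical helper that mimics the effect of SQL `TRY_CAST` for the cases shown (a null result instead of an error when a value such as '.' cannot be converted), and the last lines show why a commodity code like '083' has to stay text.

```python
from typing import Optional


def try_cast_int(value: Optional[str]) -> Optional[int]:
    """Hypothetical stand-in for TRY_CAST(value AS INTEGER): None on failure."""
    if value is None:
        return None
    try:
        return int(value.strip())
    except ValueError:
        # Unparseable input (for example the '.' placeholder) becomes null.
        return None


# A '.' placeholder or an empty field yields None instead of raising.
raw_values = ["1200", ".", " 45 ", ""]
parsed = [try_cast_int(v) for v in raw_values]

# Numeric parsing drops leading zeros, so codes must be kept as strings.
code_as_text = "083"
code_as_number = int(code_as_text)
```

Here `parsed` keeps the valid integers and turns the placeholder and the empty string into `None`, while `code_as_number` no longer round-trips to the original three-character code.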


# --- Gateway Connection ---
# Single local DuckDB gateway
# Local dev uses virtual environments (e.g., dev_<username>)
# Production uses the 'prod' environment
gateways:
  duckdb:
    connection:
      type: duckdb
      catalogs:
        local: '{{ env_var("DUCKDB_PATH", "local.duckdb") }}'
default_gateway: duckdb
# --- Variables ---
variables:
  LANDING_DIR: '{{ env_var("LANDING_DIR", "data/landing") }}'
# --- Model Defaults ---
# https://sqlmesh.readthedocs.io/en/stable/reference/model_configuration/#model-defaults
model_defaults:
  dialect: duckdb
  start: 2025-07-07 # Start date for backfill history
  cron: '@daily' # Run models daily at 12am UTC (can override per model)
# --- Linting Rules ---
# https://sqlmesh.readthedocs.io/en/stable/guides/linter/
linter:
  enabled: true
  rules:
    # ambiguousorinvalidcolumn removed: sqlglot cannot introspect read_csv() TVF
    # schemas at lint time, causing false positives on all raw models. Cross-model
    # column validation is handled by SQLMesh at plan time via columns() declarations.
    - invalidselectstarexpansion
# --- Default Target Environment ---
# Prevents accidentally applying plans to prod during local development.
# https://sqlmesh.readthedocs.io/en/stable/guides/configuration/#default-target-environment
default_target_environment: dev_{{ user() }}
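
As a rough illustration of what the templated values in this config resolve to, the Python sketch below mirrors them with the standard library. It assumes that `env_var(name, default)` behaves like an environment lookup with a fallback and that `user()` yields the current login name. The helper names and the example username are hypothetical, and this is not SQLMesh's implementation.

```python
import os


def env_var(name: str, default: str) -> str:
    """Hypothetical equivalent of the env_var() call used in the config templates."""
    return os.environ.get(name, default)


def target_environment(username: str) -> str:
    """Assumed expansion of 'dev_{{ user() }}' into a per-developer environment name."""
    return f"dev_{username}"


# Falls back to the defaults from the config when the variables are unset.
duckdb_path = env_var("DUCKDB_PATH", "local.duckdb")
landing_dir = env_var("LANDING_DIR", "data/landing")

# 'alice' is an example username, not a value taken from the repository.
example_env = target_environment("alice")
```

Under these assumptions, a developer with no overrides works against `local.duckdb` and `data/landing`, and plans are applied to a personal `dev_<username>` environment rather than `prod`.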