refactor(transform): remove raw layer, read landing zone directly

- Delete 6 raw data models (coffee_prices, cot_disaggregated, ice_*,
  psd_data): pure read_csv passthroughs with no added value
- Move 3 PSD seed models raw/ → seeds/, rename schema raw.* → seeds.*
- Update staging.psdalldata__commodity: read_csv(@psd_glob()) directly,
  join seeds.psd_* instead of raw.psd_*
- Update 5 foundation models: inline read_csv() with src CTE, removing
  raw.* dependency (fct_coffee_prices, fct_cot_positioning, fct_ice_*)
- Remove fixture-based SQLMesh test that depended on raw.cot_disaggregated
  (unit tests incompatible with inline read_csv; integration run covers this)
- Update readme.md: 3-layer architecture (staging/foundation → serving)
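
For readers unfamiliar with the read_csv options used by the inlined src CTE, the following is a rough Python sketch of what they do, not the project's code. The function names (`read_landing_csvs`, `try_cast`, `cast_and_clean`) are hypothetical, and Python's `date.fromisoformat` and `float` only approximate DuckDB's TRY_CAST rules.

```python
import csv
import glob
import gzip
from datetime import date


def read_landing_csvs(pattern):
    """Approximate read_csv(glob, compression='gzip', header=true,
    union_by_name=true, filename=true, all_varchar=true)."""
    rows, columns = [], []
    for path in sorted(glob.glob(pattern)):
        with gzip.open(path, "rt", newline="") as handle:
            for record in csv.DictReader(handle):  # header row gives column names
                for name in record:
                    if name not in columns:
                        columns.append(name)
                record["filename"] = path  # filename = true
                rows.append(record)
    columns.append("filename")
    # union_by_name: every row gets every column; missing ones become None (NULL).
    # Values stay strings throughout, like all_varchar = true.
    return [{name: row.get(name) for name in columns} for row in rows]


def try_cast(value, caster):
    """Return None instead of raising, similar in spirit to TRY_CAST."""
    try:
        return caster(value)
    except (TypeError, ValueError):
        return None


def cast_and_clean(rows):
    """Keep only rows whose Date and Close both parse."""
    cleaned = []
    for row in rows:
        trade_date = try_cast(row.get("Date"), date.fromisoformat)
        close = try_cast(row.get("Close"), float)
        if trade_date is None or close is None:
            continue
        cleaned.append(
            {"trade_date": trade_date, "close": close, "filename": row["filename"]}
        )
    return cleaned
```

Because every value arrives as a string and is cast afterwards, a malformed row is dropped by the filter instead of failing the whole read.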

Landing files are immutable and content-addressed — the landing directory
is the audit trail. A raw SQL layer duplicated file bytes into DuckDB
with no added value.
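
As an illustration of what "immutable and content-addressed" can mean in practice, here is a minimal Python sketch. The commit does not show how landing files are actually named or written, so the function `land_bytes`, the SHA-256 choice, and the `<source>/<digest>.csv.gz` layout are all assumptions.

```python
import gzip
import hashlib
from pathlib import Path


def land_bytes(landing_dir, source_name, payload):
    """Write payload once, under a file name derived from its SHA-256 digest.

    Hypothetical layout: <landing_dir>/<source_name>/<digest>.csv.gz
    """
    digest = hashlib.sha256(payload).hexdigest()
    target = Path(landing_dir) / source_name / f"{digest}.csv.gz"
    if target.exists():
        # Identical content has already landed; existing files are never rewritten.
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    tmp = target.with_suffix(".tmp")
    tmp.write_bytes(gzip.compress(payload))
    tmp.replace(target)  # move into place only after the write has finished
    return target
```

Under a scheme like this, re-fetching identical data produces no new file, and any changed payload lands beside the old one. That is the property that lets the landing directory itself serve as the audit trail.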

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Deeman
2026-02-22 17:30:18 +01:00
parent 1814a76e74
commit c3c8333407
18 changed files with 266 additions and 643 deletions


@@ -1,6 +1,7 @@
-- Foundation fact: daily KC=F Coffee C futures prices.
--
--- Casts raw varchar columns to proper types and deduplicates via hash key.
+-- Reads directly from the landing zone, casts varchar columns to proper types,
+-- and deduplicates via hash key.
-- Covers all available history from the landing directory.
--
-- Grain: one row per trade_date.
@@ -17,7 +18,18 @@ MODEL (
cron '@daily'
);
-WITH cast_and_clean AS (
+WITH src AS (
+SELECT * FROM read_csv(
+@prices_glob(),
+compression = 'gzip',
+header = true,
+union_by_name = true,
+filename = true,
+all_varchar = true
+)
+),
+cast_and_clean AS (
SELECT
TRY_CAST(Date AS date) AS trade_date,
TRY_CAST(Open AS double) AS open,
@@ -32,7 +44,7 @@ WITH cast_and_clean AS (
-- Dedup key: trade date + close price
hash(Date, Close) AS hkey
-FROM raw.coffee_prices
+FROM src
WHERE TRY_CAST(Date AS date) IS NOT NULL
AND TRY_CAST(Close AS double) IS NOT NULL
),