- Delete 6 raw data models (coffee_prices, cot_disaggregated, ice_*, psd_data): pure read_csv passthroughs with no added value
- Move 3 PSD seed models from raw/ to seeds/; rename schema raw.* to seeds.*
- Update staging.psdalldata__commodity: call read_csv(@psd_glob()) directly and join seeds.psd_* instead of raw.psd_*
- Update 5 foundation models (fct_coffee_prices, fct_cot_positioning, fct_ice_*): inline read_csv() in a src CTE, removing the raw.* dependency
- Remove the fixture-based SQLMesh test that depended on raw.cot_disaggregated (unit tests are incompatible with inline read_csv; the integration run covers this)
- Update readme.md: 3-layer architecture (staging/foundation → serving)

Landing files are immutable and content-addressed; the landing directory is the audit trail. A raw SQL layer duplicated file bytes into DuckDB with no added value.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
70 lines
1.9 KiB
SQL
-- Foundation fact: daily KC=F Coffee C futures prices.
--
-- Reads directly from the landing zone, casts varchar columns to proper types,
-- and deduplicates via hash key.
-- Covers all available history from the landing directory.
--
-- Grain: one row per trade_date.
-- Dedup: hash of (trade_date, close); if Yahoo Finance corrects a price,
-- the new hash triggers a re-ingest on the next incremental run.

MODEL (
  name foundation.fct_coffee_prices,
  kind INCREMENTAL_BY_TIME_RANGE (
    time_column trade_date
  ),
  grain (trade_date),
  start '1971-08-16',
  cron '@daily'
);

WITH src AS (
  SELECT * FROM read_csv(
    @prices_glob(),
    compression = 'gzip',
    header = true,
    union_by_name = true,
    filename = true,
    all_varchar = true
  )
),

cast_and_clean AS (
  SELECT
    TRY_CAST(Date AS date) AS trade_date,
    TRY_CAST(Open AS double) AS open,
    TRY_CAST(High AS double) AS high,
    TRY_CAST(Low AS double) AS low,
    TRY_CAST(Close AS double) AS close,
    TRY_CAST(Adj_Close AS double) AS adj_close,
    TRY_CAST(Volume AS bigint) AS volume,

    -- Filename encodes the content hash; use it as the ingest identifier
    filename AS source_file,

    -- Dedup key: trade date + close price
    hash(Date, Close) AS hkey
  FROM src
  WHERE TRY_CAST(Date AS date) IS NOT NULL
    AND TRY_CAST(Close AS double) IS NOT NULL
),

deduplicated AS (
  SELECT
    any_value(trade_date) AS trade_date,
    any_value(open) AS open,
    any_value(high) AS high,
    any_value(low) AS low,
    any_value(close) AS close,
    any_value(adj_close) AS adj_close,
    any_value(volume) AS volume,
    any_value(source_file) AS source_file,
    hkey
  FROM cast_and_clean
  GROUP BY hkey
)

SELECT *
FROM deduplicated
WHERE trade_date BETWEEN @start_ds AND @end_ds
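The hash-based dedup can be illustrated with a standalone DuckDB snippet. The file names and price values here are hypothetical; only the `hash(Date, Close)` / `GROUP BY hkey` / `any_value` pattern mirrors the model above:

```sql
-- Three hypothetical landed rows for the same trade date:
-- an original, an exact byte-duplicate, and a vendor correction.
WITH landed AS (
    SELECT * FROM (VALUES
        ('2024-01-02', '185.50', 'prices_ab12.csv.gz'),  -- original ingest
        ('2024-01-02', '185.50', 'prices_cd34.csv.gz'),  -- exact duplicate: same hash
        ('2024-01-02', '186.00', 'prices_ef56.csv.gz')   -- corrected close: new hash
    ) AS t(Date, Close, filename)
)
SELECT
    any_value(Date)       AS trade_date,
    any_value(Close)      AS close,
    any_value(filename)   AS source_file,
    hash(Date, Close)     AS hkey
FROM landed
GROUP BY hkey;
```

Grouping by the hash collapses the exact duplicate into one row while the corrected price survives as a distinct row, which is why a Yahoo Finance correction triggers a re-ingest rather than being silently dropped.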