User16502773013
Databricks Employee
Databricks Employee

Hello JD,

Here is a suggested approach:

Within the single pipeline that owns  dim_customer :
1. Declare the streaming table with  __START_AT  and  __END_AT .
2. Add a backfill flow:
CREATE FLOW backfill_dim_customer AS
INSERT INTO ONCE dim_customer BY NAME
SELECT
customer_id,
...,
MIN(order_date) AS __START_AT,
CAST(NULL AS TIMESTAMP) AS __END_AT
FROM raw_orders
GROUP BY customer_id, ...;
3. Add an AUTO CDC SCD2 flow using your CDC source and a proper  SEQUENCE BY  column that reflects change/effective time.
That will give you SCD2 history where the first version for each customer starts in 2019 (or whenever the first order occurred), and subsequent changes will be tracked automatically going forward.