SCD-2 backfilling with streaming tables
02-16-2026 11:38 AM
Hi there,
I'm new to Databricks and trying to build an SCD2 table using the AUTO CDC approach. While it is quite simple to create an SCD2 table, I'm unable to do a backfill.
Full context:
I have raw data (orders and customer info) going back to 2019, and I am creating a customer dimension (dim_customer). When I first ran the pipeline, dim_customer was populated correctly, but __start_date was equal to current_timestamp and __end_date was null (as expected). In this scenario the __start_date (the start of the effective period) should not be in 2026, right? Ideally it should be when we first received an order from the customer (i.e. somewhere in 2019). As a general approach, I tried to use min(order date) from the raw data to populate __start_date, but that failed.
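To make the expected result concrete, here is a minimal plain-Python sketch (not Databricks code) of the history I would like dim_customer to hold. The sample rows, the city attribute, and the build_scd2 helper are all made up for illustration; the only point is that the start date comes from the order date rather than from the time the pipeline ran.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter

# Hypothetical raw order rows: (customer_id, order_date, customer_city).
raw_orders = [
    ("c1", date(2019, 3, 1), "Berlin"),
    ("c1", date(2019, 7, 15), "Berlin"),
    ("c1", date(2021, 2, 10), "Munich"),
    ("c2", date(2019, 5, 20), "Paris"),
]


def build_scd2(rows):
    """Return SCD2 rows (customer_id, city, start_date, end_date),
    sequenced by order_date instead of processing time."""
    out = []
    ordered = sorted(rows, key=itemgetter(0, 1))
    for customer_id, group in groupby(ordered, key=itemgetter(0)):
        versions = []
        for _, order_date, city in group:
            # Open a new version only when the tracked attribute changes.
            if not versions or versions[-1][1] != city:
                versions.append((order_date, city))
        for i, (start, city) in enumerate(versions):
            # A version ends when the next one starts; the latest stays open.
            end = versions[i + 1][0] if i + 1 < len(versions) else None
            out.append((customer_id, city, start, end))
    return out


dim_customer = build_scd2(raw_orders)
for row in dim_customer:
    print(row)
```

With this sample, customer c1 gets one version starting at the first order in 2019 and a second, still-open version starting at the 2021 order where the city changed.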
I tried a few approaches:
- Tried creating the streaming table from a notebook: failed with an error along the lines of "cannot create from Unity Catalog".
- Created a separate pipeline for backfills: failed again, with an error that dim_customer is managed by a separate pipeline.
So folks, I'm reaching out to understand the standard way to solve this problem. I would appreciate it if you could share any links, blogs, or your own experience that could help me with this.
TIA,
JD.
#databricks
#dlt
#scd2
#auto_cdc