SCD2 table migration using LakeFlow
A source SQL DB of an operational system delivers daily snapshots to a legacy DWH with SCD2 logic enabled. Now we are migrating to Databricks. Let's look at the table "customer" (SCD2, with customer_id, valid_from and valid_to columns). On migration day t-1 I copy the table to Databricks; valid_from can be used as the sequence column.
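For context, the SCD2 semantics I need to preserve can be sketched in plain Python (a simplified model, not the actual pipeline; the helper name, the `attrs` field and the open-end sentinel are illustrative):

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # sentinel for "currently valid" rows

def apply_snapshot(history, snapshot, snapshot_date):
    """Apply one daily full snapshot to an SCD2 history.

    history:  list of dicts with customer_id, attrs, valid_from, valid_to
    snapshot: dict customer_id -> attrs (the complete daily snapshot)
    """
    current = {r["customer_id"]: r for r in history if r["valid_to"] == OPEN_END}
    # Close open rows whose attributes changed or whose key disappeared.
    for cid, row in current.items():
        if cid not in snapshot or snapshot[cid] != row["attrs"]:
            row["valid_to"] = snapshot_date
    # Open a new row for each new or changed key.
    for cid, attrs in snapshot.items():
        if cid not in current or current[cid]["attrs"] != attrs:
            history.append({"customer_id": cid, "attrs": attrs,
                            "valid_from": snapshot_date, "valid_to": OPEN_END})
    return history
```

For example, a customer whose name changes between two snapshots ends up with two rows: the old one closed at the second snapshot date and a new open one, which is why valid_from works as the sequence column.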
After the migration, the source SQL DB will deliver daily snapshots to Databricks directly, and I want to process them with dp.auto_cdc_from_snapshot_flow(), which requires dp.create_streaming_table(). However, if I pre-load this streaming table with the state of the customer SCD2 table from day t-1 and then run dp.auto_cdc_from_snapshot_flow(), it fails because the streaming table is not empty.
How can I prepare this scenario so that dp.auto_cdc_from_snapshot_flow() works while keeping all the historical data in the target table?
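For reference, this is roughly the pipeline definition I am attempting (a simplified sketch that only runs inside a Databricks pipeline; I am assuming the `pyspark.pipelines` import path and the name `create_auto_cdc_from_snapshot_flow`, and the snapshot helper is purely illustrative):

```python
# Runs only inside a Databricks LakeFlow Declarative Pipeline, not standalone.
from pyspark import pipelines as dp

# Target streaming table; this is the one I pre-loaded with the t-1 SCD2 state.
dp.create_streaming_table("customer")

def next_customer_snapshot(latest_version):
    # Illustrative helper: return (DataFrame, version) for the next daily
    # snapshot after `latest_version`, or None when no newer snapshot exists.
    ...

dp.create_auto_cdc_from_snapshot_flow(
    target="customer",
    source=next_customer_snapshot,
    keys=["customer_id"],
    stored_as_scd_type=2,  # keep valid_from / valid_to style history
)
```

The question is how the pre-migration history fits into this setup, since the flow apparently expects to start from an empty target table.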