Hello everyone,
I’m working with two CDC tables:
table_x: 23,467,761 rows (and growing)
table_y: 27,868,173,722 rows
My goal is to build an SCD Type 2 table (table_z) using the Auto CDC API.
The workflow I’d like to achieve is:
Initial Load: Populate table_z using the historical data from table_y (used only once).
Ongoing Updates: Once the initial load is complete, keep table_z updated using the incremental hourly data from table_x.
Constraints / Challenges:
I can usually handle append_flow + streaming tables, but I haven’t found a way to implement it cleanly with the Auto CDC API.
I don’t want to create a merged copy of table_x and table_y with append_flows just for the one-time bootstrap, since table_y is extremely large and would be wasteful to duplicate.
Has anyone set up a similar pattern? using a one-time full load from a large CDC table, followed by incremental Auto CDC updates from another table? If so, how did you approach it?
Thanks in advance!