Hello,
I'm building a Delta Live Tables (DLT) pipeline to load data from a cloud source into an on-premises warehouse. My source tables have Change Data Feed (CDF) enabled, and the pipeline logic is fairly complex, involving joins across multiple Slowly Changing Dimension (SCD) tables.
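For context, here is a simplified sketch of what the pipeline looks like today. Table names, column names, and the "is_current" flag are placeholders, and I'm assuming the SCD dimensions are defined earlier in the same pipeline (so dlt.read works):

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(
    name="customer_wide",
    # I have also set this so the output table publishes its own change feed.
    table_properties={"delta.enableChangeDataFeed": "true"},
)
def customer_wide():
    # The source SCD tables all have CDF enabled, but here they are read as plain batch tables.
    customers = dlt.read("dim_customer_scd2")
    addresses = dlt.read("dim_address_scd2")
    plans = dlt.read("dim_plan_scd2")
    return (
        customers
        .join(addresses, "customer_id", "left")
        .join(plans, "plan_id", "left")
        .where(F.col("is_current"))  # placeholder flag for the active SCD2 row
    )
```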
The pipeline is intended to perform an incremental load, but it is reading and processing significantly more rows than expected on each run, which makes the runs inefficient.
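To illustrate the kind of incremental read I'm aiming for, this is a sketch of reading only the new CDF commits from one source table as a stream (placeholder names; I'm not sure this is the right pattern, especially once the SCD joins are involved):

```python
import dlt

@dlt.table(name="customer_changes")
def customer_changes():
    # Stream only new CDF commits from the source instead of re-reading the full table.
    return (
        spark.readStream
        .option("readChangeFeed", "true")
        .table("my_catalog.my_schema.dim_customer_scd2")  # placeholder source table
    )
```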
I also need to capture DLT-generated metadata, specifically the change type (_change_type) and commit version (_commit_version) columns, from the Change Data Feed of the final DLT output table, not from the source tables.
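This is roughly how I imagined reading that metadata from the output table's change feed in a separate notebook or job (table, schema, and column names are placeholders):

```python
# Read the Change Data Feed of the DLT output table, keeping the CDF
# metadata columns alongside the business key.
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 1)  # placeholder; I would track the last processed version
    .table("my_catalog.my_schema.customer_wide")
    .select("customer_id", "_change_type", "_commit_version", "_commit_timestamp")
)
changes.show(truncate=False)
```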
Could you please provide guidance on how to configure the DLT pipeline for a truly incremental load while also ensuring I can capture this essential metadata from the Change Data Feed of the DLT table itself?