Databricks Community

Ovasheli · ‎08-08-2025

Hello,

I'm building a Delta Live Tables (DLT) pipeline to load data from a cloud source into an on-premise warehouse. My source tables have Change Data Feed (CDF) enabled, and my pipeline code is complex, involving joins of multiple Slowly Changing Dimensions (SCDs).

The pipeline is intended to perform an incremental load, but I've noticed it's reading and processing significantly more rows than expected. This is leading to inefficient pipeline runs.

I also need to capture DLT-generated metadata, specifically the change type (_change type) and commit version (_commit version) from the final DLT output table, not the source tables.

Could you please provide guidance on how to configure the DLT pipeline for a truly incremental load while also ensuring I can capture this essential metadata from the Change Data Feed of the DLT table itself?

szymon_dybczak · ‎08-08-2025

Hi @Ovasheli ,

The thing is with Declarative Pipelines (former DLT) you can't always force incremental load. For example, if you're using materialized views in your pipeline there is an optimizer called Enzyme that can selectively incrementally load materialized views when the optimizer determines that an incremental update is a more optimal strategy than a full update. Enzyme chooses an incremental strategy when a number of factors are true (for example what operator you use in pipeline etc) . If you have a complex pipeline then Enzyme can estimate that it's better to perform full refresh instead of incremental one.

You can read more about it here:

Incremental refresh for materialized views - Azure Databricks | Microsoft Learn