XP
Databricks Employee

I'll try to clear up some misunderstandings surrounding the incremental load feature of materialized views:

There isn't a feature to force materialized views to update incrementally. Instead, an optimizer called Enzyme updates a materialized view incrementally only when it determines that an incremental update is a better strategy than a full update. Enzyme chooses the incremental strategy only when a number of conditions hold.

@ismaelhenzel, in your case Enzyme determined that a full load was the better strategy, as indicated by:

"issue_type": "INCREMENTAL_PLAN_REJECTED_BY_COST_MODEL"
and the reason is indicated by:
"cost_model_rejection_subtype": "CHANGESET_SIZE_THRESHOLD_EXCEEDED"

This rejection subtype indicates that the size of the change set exceeds the threshold allowed for incremental loading. The current default threshold is fairly conservative but will be relaxed over time. The threshold is potentially tunable, depending on your query pattern.
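
If you want to pull these two fields out of an event payload programmatically, here is a minimal Python sketch. It assumes you already have the event details as a JSON string. The only field names taken from the output above are `issue_type` and `cost_model_rejection_subtype`; the surrounding structure in the sample (`planning_information`, `issues`) and the helper names are hypothetical, so the code searches the whole document rather than relying on a fixed path.

```python
import json

# Field names quoted from the log output above.
WANTED = {"issue_type", "cost_model_rejection_subtype"}


def find_keys(node, wanted):
    """Recursively collect the first value seen for each wanted key
    in a structure of nested dicts and lists."""
    found = {}
    if isinstance(node, dict):
        for key, value in node.items():
            if key in wanted:
                found.setdefault(key, value)
            for k, v in find_keys(value, wanted).items():
                found.setdefault(k, v)
    elif isinstance(node, list):
        for item in node:
            for k, v in find_keys(item, wanted).items():
                found.setdefault(k, v)
    return found


def explain_refresh_decision(details_json):
    """Return the rejection fields found in an event details JSON string."""
    return find_keys(json.loads(details_json), WANTED)


# Hypothetical sample payload: the nesting here is an assumption,
# only the two field names and their values come from the log above.
sample = json.dumps({
    "planning_information": {
        "issues": [
            {
                "issue_type": "INCREMENTAL_PLAN_REJECTED_BY_COST_MODEL",
                "cost_model_rejection_subtype": "CHANGESET_SIZE_THRESHOLD_EXCEEDED",
            }
        ]
    }
})

print(explain_refresh_decision(sample))
```

Searching by key name keeps the sketch independent of the exact nesting, which may differ from the sample shown here.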

@lucassvrielink, you might be getting tripped up by taking the cost outputs from the log and comparing them with a simple inequality. The calculation that Enzyme uses to determine which strategy to execute is much more involved than that. It's important to remember that the goal here isn't to incrementally load a materialized view; it's to make the loading of materialized views faster. The optimizer assumes you wouldn't want to force an incremental load if it were slower to do so.

The goal of DLT more broadly is to simplify ETL by abstracting and automating away some of the inherent complexity. While this might feel like a black box to some, the goal isn't to obscure things. If you want more control over the process, it is possible to implement a similar pattern using other tools in the Databricks platform.
