12-03-2024 05:43 AM
I have a very basic view with 3 inner joins that will only do a full refresh. Is there a limit to the number of joins you can have and still get an incremental refresh?
"incrementalization_issues": [
{
"issue_type": "INCREMENTAL_PLAN_REJECTED_BY_COST_MODEL",
"prevent_incrementalization": true,
"cost_model_rejection_subtype": "NUM_JOINS_THRESHOLD_EXCEEDED"
}
]
Thanks!
12-04-2024 05:50 AM
12-04-2024 06:07 AM
@GregTyndall Yes, the current default limit is 2 joins. You can raise it to up to 5 by adding the flag below to the pipeline settings.
pipelines.enzyme.numberOfJoinsThreshold 5
04-17-2025 08:52 AM
I have the same issue.
What exactly do you mean by "added to the pipeline settings"? How can I set it?
06-23-2025 05:54 AM
@PotnuruSiva I set pipelines.enzyme.numberOfJoinsThreshold to 5 for an MV with 4 joins, but I am still getting:
"incrementalization_issues": [
{
"issue_type": "INCREMENTAL_PLAN_REJECTED_BY_COST_MODEL",
"prevent_incrementalization": true,
"cost_model_rejection_subtype": "NUM_JOINS_THRESHOLD_EXCEEDED"
}
]
06-23-2025 06:21 AM
@maarko- I'd create a separate thread for your issue. That said, I had the same problem: the Enzyme number-of-joins threshold was seemingly not being respected. I reported it to Databricks support, and they transferred the ticket to the Databricks Spark team to investigate. So far they have not given me an answer.
04-21-2025 12:28 AM
Hey @TheSmike
In the top right corner of the DLT pipeline page, click Settings, scroll down to Advanced, click Add Configuration, and enter `pipelines.enzyme.numberOfJoinsThreshold` as the key and 5 as the value.
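If you edit the pipeline settings as JSON rather than through the UI form, the same key/value pair should end up in the settings' `configuration` map. Here is a minimal Python sketch of that, assuming the Advanced configuration entries are stored as string values under `configuration`. The helper name `with_join_threshold` and the pipeline name `my_pipeline` are made up for illustration; only the configuration key comes from this thread.

```python
import json


def with_join_threshold(settings: dict, threshold: int = 5) -> dict:
    """Return a copy of the settings with the join threshold added.

    Assumption: Advanced configuration entries live under the
    "configuration" key and their values are stored as strings.
    """
    updated = dict(settings)
    configuration = dict(updated.get("configuration", {}))
    configuration["pipelines.enzyme.numberOfJoinsThreshold"] = str(threshold)
    updated["configuration"] = configuration
    return updated


# "my_pipeline" is a hypothetical pipeline name used only for this example.
settings = with_join_threshold({"name": "my_pipeline"})
print(json.dumps(settings, indent=2))
```

The helper copies the input instead of mutating it, so any configuration entries you already have are kept alongside the new one.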
Hope this helps.
04-24-2025 12:50 AM
Thanks, it works.
05-14-2025 08:56 AM
@GregTyndall- how did you get that level of detail (incrementalization_issues) for the MV build?
05-27-2025 08:29 AM
To determine which refresh strategy is being used (incremental vs full), refer to the final section of the documentation: https://docs.databricks.com/aws/en/optimizations/incremental-refresh#determine-the-refresh-type-of-a....
According to the docs:
To determine the technique used, query the DLT event log where the "event_type" is "planning_information"...
Note:
There's a typo in the official documentation. To query the event log correctly, use FROM event_log_[NORMALIZED_DLT_ID], where [NORMALIZED_DLT_ID] is your pipeline ID with "_" instead of "-".
Look at the "details" column: if a **full refresh** is triggered, it often contains helpful insights into the reason.
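To make that concrete, here is a rough Python sketch of the two steps: building the event log table name from the pipeline ID, and pulling the `incrementalization_issues` entries out of a "details" value. It is only a sketch under some assumptions: the pipeline ID is invented, the table name is left without a catalog or schema qualifier (you may need one), and because the exact nesting inside "details" is not shown in this thread, the code simply searches the whole JSON document for the key. The `planning_information` wrapper in the sample is a guess; the issue object itself is copied from the original post.

```python
import json


def event_log_table(pipeline_id: str) -> str:
    """Build the event log table name: pipeline ID with "-" replaced by "_"."""
    return "event_log_" + pipeline_id.replace("-", "_")


def planning_query(pipeline_id: str) -> str:
    """SQL text selecting the planning events, newest first."""
    return (
        f"SELECT timestamp, details FROM {event_log_table(pipeline_id)} "
        "WHERE event_type = 'planning_information' ORDER BY timestamp DESC"
    )


def find_issues(node) -> list:
    """Collect entries under any "incrementalization_issues" key, at any depth."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "incrementalization_issues" and isinstance(value, list):
                found.extend(value)
            else:
                found.extend(find_issues(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_issues(item))
    return found


# Hypothetical pipeline ID, for illustration only.
pipeline_id = "0a1b2c3d-1111-2222-3333-444455556666"
query = planning_query(pipeline_id)

# Sample "details" value. The outer "planning_information" key is an assumed
# wrapper; the issue object is the one posted at the top of this thread.
sample_details = json.dumps({
    "planning_information": {
        "incrementalization_issues": [
            {
                "issue_type": "INCREMENTAL_PLAN_REJECTED_BY_COST_MODEL",
                "prevent_incrementalization": True,
                "cost_model_rejection_subtype": "NUM_JOINS_THRESHOLD_EXCEEDED",
            }
        ]
    }
})

# Keep only the issues that actually block an incremental refresh.
blocking = [
    issue for issue in find_issues(json.loads(sample_details))
    if issue.get("prevent_incrementalization")
]
print(query)
print(blocking)
```

In a notebook you could pass `query` to `spark.sql(...)` and run `find_issues` over each returned "details" string after `json.loads`.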