Materialized View Refresh - NUM_JOINS_THRESHOLD_EX...

GregTyndall · ‎12-03-2024

I have a very basic view with 3 inner joins that will only do a full refresh. Is there a limit to the number of joins you can have and still get an incremental refresh?

"incrementalization_issues": [
{
"issue_type": "INCREMENTAL_PLAN_REJECTED_BY_COST_MODEL",
"prevent_incrementalization": true,
"cost_model_rejection_subtype": "NUM_JOINS_THRESHOLD_EXCEEDED"
}
]

Thanks!

MuthuLakshmi · ‎12-04-2024

@GregTyndall We have a limit of 2 joins

View solution in original post

PotnuruSiva · ‎12-04-2024

@GregTyndall Yes, the current limit is 2 by default. But we can increase up to 5 with the below flag added to the pipeline settings.

pipelines.enzyme.numberOfJoinsThreshold 5

TheSmike · ‎04-17-2025

I have the same issue.

What do you mean exactly with "added to the pipeline settings"? How can I set it?

maarko · ‎06-23-2025

@PotnuruSiva I set pipelines.enzyme.numberOfJoinsThreshold 5 for a MV with 4 joins. But still I am getting
"incrementalization_issues": [
{
"issue_type": "INCREMENTAL_PLAN_REJECTED_BY_COST_MODEL",
"prevent_incrementalization": true,
"cost_model_rejection_subtype": "NUM_JOINS_THRESHOLD_EXCEEDED"
}
]

_DatabricksUser · ‎06-23-2025

@maarko- I'd create a separate thread for your issue. But I can tell you though that I had the same problem- the Enzyme number of joins threshold was seemingly not being respected. I reported to Databricks support and they transferred the ticket to the Databricks Spark team to investigate. Currently they do not have any answer for me.

RiyazAliM · ‎04-21-2025

Hey @TheSmike

In the DLT Pipeline's top right cornor, you can click on settings and scroll down to Advanced and click on Add Configuration and give the key as `pipelines.enzyme.numberOfJoinsThreshold` and value as 5.

Hope this helps.

Riz

TheSmike · ‎04-24-2025

Thanks, it works.

_DatabricksUser · ‎05-14-2025

@GregTyndall- how did you get those level of details (incrementalization_issues) for the MV build?

TheSmike · ‎05-27-2025

To determine which refresh strategy is being used (incremental vs full), refer to the final section of the documentation: https://docs.databricks.com/aws/en/optimizations/incremental-refresh#determine-the-refresh-type-of-a....

According to the docs:

To determine the technique used, query the DLT event log where the "event_type" is "planning_information"...

Note:
There's a typo in the official documentation. To query the event log correctly, use FROM event_log_[NORMALIZED_DLT_ID]. Where [NORMALIZED_DLT_ID] is your pipeline ID with "_" instead of "-".

Look at the "details" column — if a **full refresh** is triggered, it often contains helpful insights into the reason.

Materialized View Refresh - NUM_JOINS_THRESHOLD_EXCEEDED?