Materialized View Refresh - NUM_JOINS_THRESHOLD_EXCEEDED?

GregTyndall
New Contributor II

I have a very basic view with 3 inner joins that will only do a full refresh. Is there a limit to the number of joins you can have and still get an incremental refresh?

"incrementalization_issues": [
{
"issue_type": "INCREMENTAL_PLAN_REJECTED_BY_COST_MODEL",
"prevent_incrementalization": true,
"cost_model_rejection_subtype": "NUM_JOINS_THRESHOLD_EXCEEDED"
}
]

Thanks!

MuthuLakshmi
Databricks Employee
Databricks Employee

@GregTyndall We have a limit of 2 joins 

View solution in original post

PotnuruSiva
Databricks Employee
Databricks Employee

@GregTyndall Yes, the current limit is 2 by default. But we can increase up to 5 with the below flag added to the pipeline settings.

pipelines.enzyme.numberOfJoinsThreshold 5

TheSmike
New Contributor III

I have the same issue.

What do you mean exactly with "added to the pipeline settings"? How can I set it? 

maarko
New Contributor II

@PotnuruSiva I set pipelines.enzyme.numberOfJoinsThreshold 5 for a MV with 4 joins. But still I am getting
"incrementalization_issues": [
{
"issue_type": "INCREMENTAL_PLAN_REJECTED_BY_COST_MODEL",
"prevent_incrementalization": true,
"cost_model_rejection_subtype": "NUM_JOINS_THRESHOLD_EXCEEDED"
}
]

@maarko- I'd create a separate thread for your issue. But I can tell you though that I had the same problem- the Enzyme number of joins threshold was seemingly not being respected. I reported to Databricks support and they transferred the ticket to the Databricks Spark team to investigate. Currently they do not have any answer for me.

RiyazAliM
Honored Contributor

Hey @TheSmike 

In the DLT Pipeline's top right cornor, you can click on settings and scroll down to Advanced and click on Add Configuration and give the key as `pipelines.enzyme.numberOfJoinsThreshold` and value as 5.

Hope this helps.

Riz

TheSmike
New Contributor III

Thanks, it works.

_DatabricksUser
New Contributor III

@GregTyndall- how did you get those level of details (incrementalization_issues) for the MV build?

To determine which refresh strategy is being used (incremental vs full), refer to the final section of the documentation: https://docs.databricks.com/aws/en/optimizations/incremental-refresh#determine-the-refresh-type-of-a....

According to the docs:

To determine the technique used, query the DLT event log where the "event_type" is "planning_information"...

Note:
There's a typo in the official documentation. To query the event log correctly, use FROM event_log_[NORMALIZED_DLT_ID]. Where [NORMALIZED_DLT_ID] is your pipeline ID with "_" instead of "-".

Look at the "details" column — if a **full refresh** is triggered, it often contains helpful insights into the reason.