Dynamic Partition Pruning override

pantelis_mare
Contributor III

Hello everybody,

I have another strange issue, and I would like someone to confirm whether this is a bug or expected behaviour:

I'm joining a large dataset with a dimension table and, as expected, DPP is activated.

I was trying to deactivate the feature, as it changes which partitions are read, so I disabled it through spark.sql.optimizer.dynamicPartitionPruning.enabled and spark.databricks.optimizer.dynamicPartitionPruning, but I STILL had dynamic partition pruning.

Finally, I discovered that by raising spark.databricks.optimizer.deltaTableFilesThreshold to a large number, I could get my SQL query to stop using DPP.
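The settings described above can be applied from a notebook roughly as follows. This is a minimal sketch, assuming an active SparkSession named `spark` (as Databricks notebooks provide). `DISABLE_DYNAMIC_PRUNING` and `apply_conf` are hypothetical names, and the threshold value is an arbitrary large number because the post does not say which value was used. `spark.databricks.optimizer.dynamicFilePruning` is the separate switch Databricks documents for dynamic file pruning; it is included here as an assumption, not as something tested in this thread.

```python
# Sketch: apply the settings discussed in this thread to a SparkSession.
# Assumes `spark` is an active pyspark.sql.SparkSession (e.g. in a Databricks notebook).
# DISABLE_DYNAMIC_PRUNING and apply_conf are hypothetical names.

DISABLE_DYNAMIC_PRUNING = {
    # The two flags the original post set to "false" (key names as given in the post).
    "spark.sql.optimizer.dynamicPartitionPruning.enabled": "false",
    "spark.databricks.optimizer.dynamicPartitionPruning": "false",
    # Assumption: Databricks' documented on/off switch for dynamic file pruning (DFP).
    "spark.databricks.optimizer.dynamicFilePruning": "false",
    # The workaround from the post: raise the probe-side file-count threshold.
    # The exact value is arbitrary; the post only says "a big number".
    "spark.databricks.optimizer.deltaTableFilesThreshold": str(2**31 - 1),
}


def apply_conf(spark, settings):
    """Set each key/value pair on the session's runtime configuration."""
    for key, value in settings.items():
        spark.conf.set(key, value)
```

After `apply_conf(spark, DISABLE_DYNAMIC_PRUNING)`, comparing the query's physical plan before and after (for example via `DataFrame.explain()`) is one way to check whether the pruning filter is still present.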

Is this behaviour expected? I would say no, given the DPP documentation here.

Tested on both DBR 9.1 and DBR 10.

Accepted Solution

pantelis_mare
Contributor III

Hello @Kaniz Fatma,

Thank you for taking the time to answer.

The issue in this case was that spark.databricks.optimizer.deltaTableFilesThreshold was activating DPP even though it had been formally deactivated by setting all available "enabled" properties to false (see my initial post).


3 Replies

Anonymous
Not applicable

Hello there! Thanks for your question. I'd like to give this a bit longer to see what the community comes up with. Otherwise, I'll bump this to the SMEs.

Kaniz_Fatma
Community Manager

Hi @Pantelis Maroudis, spark.databricks.optimizer.deltaTableFilesThreshold (default 10 in Databricks Runtime 8.4 and above, 1000 in Databricks Runtime 8.3 and below) represents the number of files the Delta table on the probe side of the join must have to trigger dynamic file pruning. When the probe-side table contains fewer files than the threshold value, dynamic file pruning is not triggered: if a table has only a few files, it is probably not worthwhile to enable dynamic file pruning.
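To make the rule in this reply concrete, here is a small pure-Python model of the file-count condition only. It is a sketch based solely on the quoted text: `default_files_threshold` and `meets_files_threshold` are hypothetical helpers, not Databricks APIs, and meeting the condition does not by itself mean pruning will be applied, since the optimizer checks other conditions as well.

```python
# Simplified model of ONE condition quoted above: the probe-side file-count
# threshold for dynamic file pruning (DFP). Every other optimizer check is ignored.
# Both helpers are hypothetical, written only to illustrate the quoted rule.


def default_files_threshold(dbr_major: int, dbr_minor: int) -> int:
    """Default of spark.databricks.optimizer.deltaTableFilesThreshold as quoted:
    10 on Databricks Runtime 8.4 and above, 1000 on 8.3 and below."""
    return 10 if (dbr_major, dbr_minor) >= (8, 4) else 1000


def meets_files_threshold(probe_side_files: int, threshold: int) -> bool:
    """False when the probe side has fewer files than the threshold,
    i.e. the case in which DFP is not triggered."""
    return probe_side_files >= threshold


if __name__ == "__main__":
    threshold = default_files_threshold(9, 1)        # DBR 9.1 -> 10
    print(meets_files_threshold(500, threshold))     # True: condition met
    print(meets_files_threshold(500, 2**31 - 1))     # False: threshold raised
```

With the DBR 9.1 default of 10, a hypothetical probe-side table of 500 files clears the bar; raising the threshold far above the table's file count, as in the original post's workaround, makes the condition fail, which is consistent with the pruning disappearing.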

Source: https://docs.databricks.com/delta/optimizations/dynamic-file-pruning.html