Community Platform Discussions
Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.

Dynamic Bloom Filters for Inner Joins

tomvogel01
New Contributor II

I have a question about combining Bloom filters with Liquid Clustering to further reduce the data read during a join/merge, on top of dynamic file pruning. In testing, the two worked extremely well together for point queries. However, having Bloom filters on a table removed dynamic file pruning entirely and led to the entire table being read when doing a join/merge, both with and without Photon.

Do Bloom filters work alongside dynamic file pruning? If so, any thoughts as to what might be going wrong?

If not, is there a plan to support this functionality? It would be amazing to have, as it reduced the amount of data read by a factor of 20.

3 REPLIES

Could you point me to the specific online resources that speak of this? My research has yielded very little in terms of guidance which is why I am reaching out here.

NandiniN
Databricks Employee

We do not recommend Bloom filter indexes on Delta tables, as they have to be maintained manually.

If you prefer Photon, please try predictive I/O with Liquid Clustering.

merryray
New Contributor

Bloom filters and dynamic file pruning should ideally work together, but it seems that in your case, they are interfering with each other, likely due to the way file pruning is being handled in the presence of Bloom filters. The most likely causes could be conflicting filtering logic or the system prioritizing one optimization over the other. You can try adjusting the order of operations or checking for specific configuration settings. If this functionality isn't supported yet, it would be worth submitting a feature request for improved integration, as it could significantly reduce data reads, as you’ve already observed.
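For anyone following the thread, here is a rough, purely conceptual sketch of why a per-file Bloom filter can cut the data read for a point lookup or a small set of join keys. It is not how Databricks or Delta Lake implements Bloom filter indexes or dynamic file pruning; the `BloomFilter` class, the file names, and the `files_to_read` helper are all hypothetical and only illustrate the general idea.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


# Hypothetical data files, each holding a few values of the join key.
files = {
    "file_a": [1, 2, 3],
    "file_b": [100, 200, 300],
    "file_c": [7, 8, 9],
}

# One filter per file, built when the file is written.
filters = {}
for name, keys in files.items():
    bf = BloomFilter()
    for k in keys:
        bf.add(k)
    filters[name] = bf


def files_to_read(probe_keys):
    """Return the files that might contain at least one probe key."""
    return sorted(
        name
        for name, bf in filters.items()
        if any(bf.might_contain(k) for k in probe_keys)
    )


# A point lookup probes with one key; a join could probe with the
# (small) set of keys coming from the other side of the join.
print(files_to_read([200]))
print(files_to_read([2, 8]))
```

A file is skipped only when its filter rules out every probe key, so a file that really holds a matching key is never dropped. Whether the engine actually probes the filters during a join/merge, rather than only for literal point predicates, is exactly the open question in this thread.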
