
When should I create a Bloom Filter Index on my Delta table?

User16826992666
Valued Contributor
 
1 ACCEPTED SOLUTION


Ryan_Chynoweth
Honored Contributor III

A Bloom filter index is a space-efficient data structure that enables data skipping on chosen columns, particularly for fields containing arbitrary text. For each data file, the filter reports either that a value is definitely not in the file or that it is probably in the file, with a defined false positive probability (FPP).
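To make the "definitely not / probably yes" behavior concrete, here is a minimal toy Bloom filter in plain Python. It is only a sketch of the general technique, not how Delta implements its index: the class name `BloomFilter`, the SHA-256 double-hashing scheme, and the sample values are all assumptions chosen for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Toy Bloom filter sized for an expected item count and a target FPP."""

    def __init__(self, num_items: int, fpp: float):
        # Textbook sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(1, math.ceil(-num_items * math.log(fpp) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / num_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, value: str):
        # Derive k bit positions from two halves of one SHA-256 digest (double hashing).
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value))


# Pretend these are the values of one column in one data file (hypothetical data).
bf = BloomFilter(num_items=1000, fpp=0.01)
for i in range(1000):
    bf.add(f"user-{i}")

# Values that were added are never reported as absent (no false negatives).
all_present = all(bf.might_contain(f"user-{i}") for i in range(1000))

# Values that were never added are occasionally reported as "probably present".
probes = 10_000
false_positives = sum(bf.might_contain(f"other-{i}") for i in range(probes))
false_positive_rate = false_positives / probes
```

A file whose filter answers `False` for the queried value can be skipped without being read; a `True` answer only means the file has to be scanned to find out.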

The main reason to use a Bloom filter index is that you often filter on a specific set of columns. A typical use case is a large table from which you query only a small subset of the data: the index helps with "needle in a haystack" queries.
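As a rough sketch of what creating such an index could look like, the snippet below builds a Databricks `CREATE BLOOMFILTER INDEX` statement as a string. The table name `events`, the column `device_id`, the helper `bloom_index_statement`, and the `fpp` and `numItems` values are hypothetical placeholders; check the options against the documentation for your runtime before using them.

```python
def bloom_index_statement(table: str, column: str, fpp: float, num_items: int) -> str:
    """Build a CREATE BLOOMFILTER INDEX statement for one column (illustrative helper)."""
    return (
        f"CREATE BLOOMFILTER INDEX ON TABLE {table} "
        f"FOR COLUMNS({column} OPTIONS (fpp={fpp}, numItems={num_items}))"
    )


# Hypothetical table and column; numItems is the expected number of distinct values.
statement = bloom_index_statement("events", "device_id", fpp=0.1, num_items=50_000_000)

# On a Databricks cluster, where a `spark` session is available, you would run:
# spark.sql(statement)
```

A lower `fpp` lets more files be skipped but makes the index larger, so the value is a trade-off between storage and read savings.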


