When should I create a Bloom Filter Index on my Delta table?

User16826992666 · ‎06-16-2021

Ryan_Chynoweth · ‎06-17-2021

A Bloom filter index is a space-efficient data structure that enables data skipping on chosen columns, particularly for fields containing arbitrary text. For a given value, the Bloom filter reports either that the value is definitely not in the file or that it is probably in the file, with a defined false positive probability (FPP).
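To make the "definitely not" versus "probably" behavior concrete, here is a minimal Python sketch of a generic Bloom filter. It is a toy illustration, not how Delta implements its index: the class name `BloomFilter`, the SHA-256 double-hashing scheme, and the sample values are all assumptions chosen for the example. The sizing uses the standard textbook formulas for a target FPP.

```python
import hashlib
import math


class BloomFilter:
    """Toy Bloom filter: answers 'definitely absent' or 'probably present'."""

    def __init__(self, expected_items: int, fpp: float):
        # Standard sizing: m = -n*ln(p)/(ln 2)^2 bits, k = (m/n)*ln 2 hash functions.
        self.num_bits = max(1, math.ceil(-expected_items * math.log(fpp) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, value: str):
        # Derive k bit positions from two halves of one SHA-256 digest (double hashing).
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: str) -> bool:
        # False means definitely absent; True means present or a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value))


# Hypothetical values stored in one data file.
file_values = [f"user-{i}" for i in range(1000)]
bf = BloomFilter(expected_items=1000, fpp=0.01)
for v in file_values:
    bf.add(v)

# Values that were added are always reported as possibly present (no false negatives).
all_found = all(bf.might_contain(v) for v in file_values)

# Values that were never added are mostly rejected; a small fraction are false positives.
probes = [f"other-{i}" for i in range(10000)]
false_positive_rate = sum(bf.might_contain(p) for p in probes) / len(probes)
print(all_found, false_positive_rate)
```

A lower FPP means fewer files read unnecessarily, at the cost of a larger filter per file.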

The biggest reason to use a Bloom filter index is that you often filter on a specific set of columns. A typical use case is a large table from which you query a small subset of the data: the index helps these “needle in a haystack” queries by letting the engine skip files that cannot contain the value.
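As a rough sketch of what creating the index might look like, the helper below builds a `CREATE BLOOMFILTER INDEX` statement as I understand the Databricks SQL syntax. Please check it against the current documentation before relying on it. The helper name `bloom_index_ddl`, the table `events`, the column `session_id`, and the option values are hypothetical, picked only for illustration.

```python
def bloom_index_ddl(table: str, column: str, fpp: float, num_items: int) -> str:
    """Build a CREATE BLOOMFILTER INDEX statement for a single column."""
    if not 0.0 < fpp < 1.0:
        raise ValueError("fpp must be strictly between 0 and 1")
    return (
        f"CREATE BLOOMFILTER INDEX ON TABLE {table} "
        f"FOR COLUMNS({column} OPTIONS (fpp={fpp}, numItems={num_items}))"
    )


# Hypothetical table and column; numItems is the expected number of distinct values per file.
ddl = bloom_index_ddl("events", "session_id", fpp=0.1, num_items=50_000_000)
print(ddl)

# On a Databricks cluster you would then run it with an active SparkSession:
# spark.sql(ddl)
```

Pick the column you actually filter on in point lookups, such as an ID or hash, since that is where skipping files pays off.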


When should I create a Bloom Filter Index on my Delta table?