When should I create a Bloom Filter Index on my Delta table?

User16826992666 — Thu, 17 Jun 2021 03:57:38 GMT

Re: When should I create a Bloom Filter Index on my Delta table?

Ryan_Chynoweth — Fri, 18 Jun 2021 00:00:40 GMT

A Bloom filter index is a space-efficient data structure that enables data skipping on chosen columns, particularly for fields containing arbitrary text. For each file, the Bloom filter reports either that a value is definitely not in the file or that it is probably in the file, with a defined false positive probability (FPP).
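To make the "definitely not" versus "probably" behavior concrete, here is a minimal toy Bloom filter in Python. It is only a sketch of the general data structure, not how Delta or Databricks implements its index; the class name, the SHA-256 double hashing, and the sample values are all assumptions made for illustration. The sizing uses the standard textbook formulas for a target FPP.

```python
import hashlib
import math


class BloomFilter:
    """Toy Bloom filter sized from an expected item count and a target FPP."""

    def __init__(self, num_items, fpp):
        # Standard sizing formulas: m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2
        self.num_bits = max(1, math.ceil(-num_items * math.log(fpp) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / num_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, value):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value))


# Hypothetical data: 1,000 stored values, then 10,000 lookups of absent values.
bf = BloomFilter(num_items=1000, fpp=0.01)
for i in range(1000):
    bf.add(f"user-{i}")

absent = [f"other-{i}" for i in range(10000)]
false_positives = sum(bf.might_contain(v) for v in absent)
observed_fpp = false_positives / len(absent)
print(f"observed false positive rate: {observed_fpp:.4f}")
```

Every value that was added is always reported as possibly present, so there are no false negatives. The observed false positive rate should land near the configured FPP; lowering the FPP makes the filter larger.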

The biggest reason to use a Bloom filter index is that you often filter on a specific set of columns. A typical use case is a large table from which you query only a small subset of the data: the index helps with “needle in a haystack” queries.
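As a rough illustration of why this helps "needle in a haystack" lookups, the sketch below simulates file skipping in plain Python: one small filter per file, and a point lookup that only scans files whose filter says the value might be there. The file names, ID format, and the filter size and hash count are hypothetical; a real Delta table does this inside the engine rather than in user code.

```python
import hashlib

NUM_BITS = 4096   # illustrative filter size per file (assumption)
NUM_HASHES = 3    # illustrative number of hash positions (assumption)


def bit_positions(value):
    # Take NUM_HASHES 4-byte slices of a SHA-256 digest as bit positions.
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % NUM_BITS
            for i in range(NUM_HASHES)]


def build_filter(values):
    # The filter is a Python int used as a bitmask.
    bits = 0
    for value in values:
        for pos in bit_positions(value):
            bits |= 1 << pos
    return bits


def might_contain(bits, value):
    return all((bits >> pos) & 1 for pos in bit_positions(value))


# Hypothetical table: 50 files, each holding 100 arbitrary text IDs.
files = {f"part-{n:03d}": [f"id-{n}-{i}" for i in range(100)] for n in range(50)}
filters = {name: build_filter(rows) for name, rows in files.items()}

# Point lookup: only files whose filter reports "probably present" get scanned.
needle = "id-37-42"
files_to_scan = [name for name, bits in filters.items() if might_contain(bits, needle)]
skipped = len(files) - len(files_to_scan)
print(f"scanning {len(files_to_scan)} of {len(files)} files, skipped {skipped}")
```

The file that actually holds the value is always in the scan list. Other files appear only as occasional false positives, which is why the benefit is largest for selective filters over large tables.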
