When should I use a bloom filter index vs. a z-order index? Any best practices around it?

User16826992724 — Thu, 16 Sep 2021 00:18:50 GMT

Re: When should I use a bloom filter index vs. a z-order index? Any best practices around it?

User16826992724 — Thu, 16 Sep 2021 00:31:16 GMT

Just like B-tree indices in the traditional EDW world, Z-order indexing can be used on high-cardinality columns like Primary Key columns and high-cardinality joins like facts and dimension tables joins. Z-order indexes can be created only on the first 32 columns of a table.

Bloom filter index - can be used where z-order index cannot be used ( meaning columns on wide tables with more than 32 columns- for e.g. columns at the end of the tables like MM,YYYY, datetime, quarter etc.
Bloom filters are also be used for less-used low or high cardinality fields that users typically use in WHERE conditions. ( You don't want to use too many columns in one z-ordex index as its efficiency may reduce).
Bloom filter index can be used as a mechanism to avoid over-partitioning. In Databricks, we don’t recommend over-partition - each partition should have around 1 GB data (compressed) at least, In such cases, when the table is not too big and you don't want to partition, putting a bloom filter index on date or yyyy-mm columns - can achieve the benefits of faster queries when a predicate is put on the date columns. Similarly, if partitioning is done on the Year column then a bloom filter on Month-Year can help.
Even when a column has a z-order Index, we can put an additional bloom filter index on it for better performance.

topic Re: When should I use a bloom filter index vs. a z-order index? Any best practices around it? in Data Engineering

When should I use a bloom filter index vs. a z-order index? Any best practices around it?

Re: When should I use a bloom filter index vs. a z-order index? Any best practices around it?