What's the difference between Z-Ordering and Partitioning?

User16790091296
Databricks Employee

sajith_appukutt
Databricks Employee

Partitioning distributes the data by key so that each query scans only the partitions it needs. This improves performance and helps avoid conflicts between concurrent operations.

General rules of thumb for choosing the right partition columns:

  •   The cardinality of the column should not be very high
  •   The amount of data in each partition should meet a minimum threshold
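
To make the two rules of thumb concrete, here is a minimal pure-Python sketch of partition pruning. It is only an illustration, not how Delta stores data: the `events` table, its column names, and the helpers `partition_rows` and `scan` are all hypothetical.

```python
from collections import defaultdict


def partition_rows(rows, key):
    """Group rows by the value of `key`, like one directory per partition value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


def scan(partitions, wanted_value):
    """Read only the partition matching the filter; return (rows, rows_scanned)."""
    rows = partitions.get(wanted_value, [])
    return rows, len(rows)


# Hypothetical events table: 3 dates, 4 rows per date, unique event_id per row.
events = []
for day in (1, 2, 3):
    for user in range(4):
        events.append(
            {"event_id": len(events), "event_date": f"2024-01-0{day}", "user_id": user}
        )

# Low-cardinality key: few partitions, each holding a useful amount of data.
by_date = partition_rows(events, "event_date")
matched, scanned = scan(by_date, "2024-01-02")
print(f"partitions={len(by_date)}, rows scanned={scanned} of {len(events)}")

# High-cardinality key: one tiny partition per row, which is what the rules warn against.
by_id = partition_rows(events, "event_id")
print(f"partitions={len(by_id)}, largest partition={max(len(p) for p in by_id.values())}")
```

Partitioning by `event_date` lets a date filter skip two thirds of the rows, while partitioning by `event_id` produces as many partitions as there are rows, each too small to be worth a separate directory.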

Delta also supports a feature called data skipping, which speeds up queries by using file-level statistics to skip files that cannot match a filter.
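
The idea behind data skipping can be sketched in a few lines of Python. This assumes per-file min/max statistics on a column; the file contents and the helper names `file_stats` and `files_to_read` are made up for illustration and are not the Delta API.

```python
def file_stats(rows, column):
    """Return the (min, max) of `column` for one file's rows."""
    values = [row[column] for row in rows]
    return min(values), max(values)


def files_to_read(files, column, value):
    """Return indexes of files whose [min, max] range could contain `value`."""
    selected = []
    for index, rows in enumerate(files):
        low, high = file_stats(rows, column)
        if low <= value <= high:
            selected.append(index)
    return selected


# Three hypothetical data files with non-overlapping id ranges.
files = [
    [{"id": 1}, {"id": 5}, {"id": 9}],
    [{"id": 10}, {"id": 14}, {"id": 19}],
    [{"id": 20}, {"id": 25}, {"id": 29}],
]

hits = files_to_read(files, "id", 14)      # only the middle file can match
misses = files_to_read(files, "id", 50)    # no file's range contains 50
print(hits, misses)
```

Skipping works well here only because each file covers a narrow, non-overlapping range of `id`. If every file held values spread across the whole range, the statistics would rule nothing out.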

Z-ordering is a multi-dimensional clustering approach that colocates related information in the same set of files, so that the Databricks data-skipping algorithms can dramatically reduce the amount of data that needs to be read. It works somewhat like a secondary index in terms of improving query read performance.
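
A rough way to see why this helps is to compare a plain sort with a Z-order sort on a toy dataset. The sketch below uses the classic Morton code (bit interleaving) as a stand-in; the post does not describe how Databricks implements Z-ordering internally, so treat the function names, the 8x8 grid of `(x, y)` points, and the 16-rows-per-file split as assumptions.

```python
def interleave_bits(x, y, bits=3):
    """Morton code: x bits go to even positions, y bits to odd positions."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def split_into_files(rows, rows_per_file):
    """Cut an ordered list of rows into consecutive fixed-size files."""
    return [rows[i:i + rows_per_file] for i in range(0, len(rows), rows_per_file)]


def count_files_read(files, column, value):
    """Count files whose min/max range on `column` (0 = x, 1 = y) may hold `value`."""
    count = 0
    for rows in files:
        values = [row[column] for row in rows]
        if min(values) <= value <= max(values):
            count += 1
    return count


# Every (x, y) pair on an 8x8 grid, written as 4 files of 16 rows each.
points = [(x, y) for x in range(8) for y in range(8)]
linear_files = split_into_files(sorted(points), 16)
z_files = split_into_files(sorted(points, key=lambda p: interleave_bits(*p)), 16)

for name, files in (("sorted by (x, y)", linear_files), ("z-ordered", z_files)):
    on_x = count_files_read(files, 0, 5)
    on_y = count_files_read(files, 1, 5)
    print(f"{name}: filter on x reads {on_x} files, filter on y reads {on_y} files")
```

With the plain sort, a filter on `x` is very selective but a filter on `y` has to read every file. With the Z-order sort, both filters can skip half of the files, which is the multi-dimensional trade-off. On Databricks the clustering itself is applied with `OPTIMIZE <table> ZORDER BY (<columns>)`.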