Databricks Community

kasiviss42 · ‎02-03-2025

I have join on two large tables.

If i apply Z ordering on 3 columns for both the tables ,

I am joining two tables on the basis on the same 3 columns used for Z ordering ,

Will i get any benefit of Z ordering on performance when i use Joins here .

So as per my current knowledge Z ordering is specifically useful for faster query data retrieval i.e. select *from table1 where zordercolumn1 =X

Does Z ordering also helps Joins ? if not how to improve join performance here

kasiviss42 · ‎02-03-2025

I have asked this query because w.r.t Predicate query push down to storage .

So here as part of joins . Data needs to be loaded into memory first and then join is performed . So How does Z order help here if it can't skip the data being fetched from storage .

saurabh18cs · ‎02-04-2025

hi @kasiviss42 indeed z-order helps in file level data skipping . Now answer is 2 fold.

1) if you're joining 2 dataframes then you can make use of data skipping a.k.a z-ordering by performing filters first on those dataframes which will be then used for joins. (predicate push down). your dataframes are already lighter and in-memory is optimized.

2) you can also specify merge predicate during your join if you're using delta native merge . This merge predicate consists of partition pruning predicate + data skipping predicate + join predicate. So the combination of predicate will be applied to filter down to relevant partitions , relevant files if they are z-ordered.

example: