<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Query related to Z ordering in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/query-related-to-z-ordering/m-p/108550#M43080</link>
    <description>&lt;P&gt;I have join on two large tables.&lt;/P&gt;&lt;P&gt;If i apply Z ordering on 3 columns for both the tables ,&lt;/P&gt;&lt;P&gt;I am joining two tables on the basis on the same 3 columns used for Z ordering ,&lt;/P&gt;&lt;P&gt;Will i get any benefit of Z ordering on performance when i use &lt;STRONG&gt;Joins&lt;/STRONG&gt; here .&lt;/P&gt;&lt;P&gt;So as per my current knowledge Z ordering is specifically useful for faster query data retrieval i.e. select *from table1 where zordercolumn1 =X&lt;/P&gt;&lt;P&gt;Does Z ordering also helps Joins ? if not how to improve join performance here&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Mon, 03 Feb 2025 10:52:10 GMT</pubDate>
    <dc:creator>kasiviss42</dc:creator>
    <dc:date>2025-02-03T10:52:10Z</dc:date>
    <item>
      <title>Query related to Z ordering</title>
      <link>https://community.databricks.com/t5/data-engineering/query-related-to-z-ordering/m-p/108550#M43080</link>
      <description>&lt;P&gt;I have join on two large tables.&lt;/P&gt;&lt;P&gt;If i apply Z ordering on 3 columns for both the tables ,&lt;/P&gt;&lt;P&gt;I am joining two tables on the basis on the same 3 columns used for Z ordering ,&lt;/P&gt;&lt;P&gt;Will i get any benefit of Z ordering on performance when i use &lt;STRONG&gt;Joins&lt;/STRONG&gt; here .&lt;/P&gt;&lt;P&gt;So as per my current knowledge Z ordering is specifically useful for faster query data retrieval i.e. select *from table1 where zordercolumn1 =X&lt;/P&gt;&lt;P&gt;Does Z ordering also helps Joins ? if not how to improve join performance here&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 03 Feb 2025 10:52:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/query-related-to-z-ordering/m-p/108550#M43080</guid>
      <dc:creator>kasiviss42</dc:creator>
      <dc:date>2025-02-03T10:52:10Z</dc:date>
    </item>
    <item>
      <title>Re: Query related to Z ordering</title>
      <link>https://community.databricks.com/t5/data-engineering/query-related-to-z-ordering/m-p/108551#M43081</link>
      <description>&lt;P&gt;I have asked this query because w.r.t Predicate query push down to storage .&lt;/P&gt;&lt;P&gt;So here as part of joins . Data needs to be loaded into memory first and then join is performed . So How does Z order help here if it can't skip the data being fetched from storage .&lt;/P&gt;</description>
      <pubDate>Mon, 03 Feb 2025 10:57:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/query-related-to-z-ordering/m-p/108551#M43081</guid>
      <dc:creator>kasiviss42</dc:creator>
      <dc:date>2025-02-03T10:57:32Z</dc:date>
    </item>
    <item>
      <title>Re: Query related to Z ordering</title>
      <link>https://community.databricks.com/t5/data-engineering/query-related-to-z-ordering/m-p/108722#M43130</link>
      <description>&lt;P&gt;hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/139168"&gt;@kasiviss42&lt;/a&gt;&amp;nbsp;indeed z-order helps in file level data skipping . Now answer is 2 fold.&lt;/P&gt;&lt;P&gt;1) if you're joining 2 dataframes then you can make use of data skipping a.k.a z-ordering by performing filters first on those dataframes which will be then used for joins. (predicate push down). your dataframes are already lighter and in-memory is optimized.&lt;/P&gt;&lt;P&gt;2) you can also specify &lt;STRONG&gt;merge predicate&lt;/STRONG&gt; during your join if you're using delta native merge . This merge predicate consists of partition pruning predicate + data skipping predicate + join predicate. So the combination of predicate will be applied to filter down to relevant partitions , relevant files if they are z-ordered.&lt;/P&gt;&lt;P&gt;example:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;STRONG&gt;join_predicate&lt;/STRONG&gt;&lt;/SPAN&gt; &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;" AND "&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;join&lt;/SPAN&gt;&lt;SPAN&gt;([&lt;/SPAN&gt;&lt;SPAN&gt;f&lt;/SPAN&gt;&lt;SPAN&gt;"s.&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;&lt;SPAN&gt;i&lt;/SPAN&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;SPAN&gt; &amp;lt;=&amp;gt; t.&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;&lt;SPAN&gt;i&lt;/SPAN&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt; &lt;SPAN&gt;for&lt;/SPAN&gt; &lt;SPAN&gt;i&lt;/SPAN&gt; &lt;SPAN&gt;in&lt;/SPAN&gt; &lt;SPAN&gt;record_keys&lt;/SPAN&gt;&lt;SPAN&gt;])&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;STRONG&gt;merge_predicate &lt;/STRONG&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; (&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;f&lt;/SPAN&gt;&lt;SPAN&gt;"(&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;&lt;SPAN&gt;target_prune_predicate&lt;/SPAN&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;) AND (&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;&lt;SPAN&gt;time_filter_skipping_predicate&lt;/SPAN&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;) AND (&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;&lt;SPAN&gt;join_predicate&lt;/SPAN&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;)"&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;if&lt;/SPAN&gt; &lt;SPAN&gt;target_prune&lt;/SPAN&gt; &lt;SPAN&gt;!=&lt;/SPAN&gt; &lt;SPAN&gt;""&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;else&lt;/SPAN&gt; &lt;SPAN&gt;join_predicate&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; )&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; target&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;alias&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"t"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;merge&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;source_df&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;alias&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"s"&lt;/SPAN&gt;&lt;SPAN&gt;), &lt;/SPAN&gt;&lt;SPAN&gt;merge_predicate&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;whenMatchedUpdate&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;set&lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;&lt;SPAN&gt;f&lt;/SPAN&gt;&lt;SPAN&gt;"t.`&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;&lt;SPAN&gt;i&lt;/SPAN&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;`"&lt;/SPAN&gt;&lt;SPAN&gt;: &lt;/SPAN&gt;&lt;SPAN&gt;f&lt;/SPAN&gt;&lt;SPAN&gt;"s.`&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;&lt;SPAN&gt;i&lt;/SPAN&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;`"&lt;/SPAN&gt; &lt;SPAN&gt;for&lt;/SPAN&gt; &lt;SPAN&gt;i&lt;/SPAN&gt; &lt;SPAN&gt;in&lt;/SPAN&gt; &lt;SPAN&gt;target_columns&lt;/SPAN&gt;&lt;SPAN&gt;})&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;whenNotMatchedInsert&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;values&lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;&lt;SPAN&gt;i&lt;/SPAN&gt;&lt;SPAN&gt;: &lt;/SPAN&gt;&lt;SPAN&gt;f&lt;/SPAN&gt;&lt;SPAN&gt;"s.`&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;&lt;SPAN&gt;i&lt;/SPAN&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;`"&lt;/SPAN&gt; &lt;SPAN&gt;for&lt;/SPAN&gt; &lt;SPAN&gt;i&lt;/SPAN&gt; &lt;SPAN&gt;in&lt;/SPAN&gt; &lt;SPAN&gt;target_columns&lt;/SPAN&gt;&lt;SPAN&gt;})&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; .&lt;/SPAN&gt;&lt;SPAN&gt;execute&lt;/SPAN&gt;&lt;SPAN&gt;()&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;also , good to know this:&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&lt;A href="https://www.databricks.com/blog/2020/04/30/faster-sql-queries-on-delta-lake-with-dynamic-file-pruning.html" target="_blank"&gt;https://www.databricks.com/blog/2020/04/30/faster-sql-queries-on-delta-lake-with-dynamic-file-pruning.html&lt;/A&gt;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Tue, 04 Feb 2025 08:30:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/query-related-to-z-ordering/m-p/108722#M43130</guid>
      <dc:creator>saurabh18cs</dc:creator>
      <dc:date>2025-02-04T08:30:11Z</dc:date>
    </item>
  </channel>
</rss>

