<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: What's the difference between Z-Ordering and Partitioning? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/what-s-the-difference-between-z-ordering-and-partitioning/m-p/26657#M18680</link>
    <description>&lt;P&gt;Partitioning distributes data by key so that each query scans less data, which can &lt;A href="https://kb.databricks.com/delta/delta-merge-into.html#how-to-improve-performance-of-delta-lake-merge-into-queries-using-partition-pruning" alt="https://kb.databricks.com/delta/delta-merge-into.html#how-to-improve-performance-of-delta-lake-merge-into-queries-using-partition-pruning" target="_blank"&gt;improve performance&lt;/A&gt; and help &lt;A href="https://docs.databricks.com/delta/concurrency-control.html?_ga=2.46048486.352253067.1624558321-2086263156.1624558321#avoid-conflicts-using-partitioning-and-disjoint-command-conditions" alt="https://docs.databricks.com/delta/concurrency-control.html?_ga=2.46048486.352253067.1624558321-2086263156.1624558321#avoid-conflicts-using-partitioning-and-disjoint-command-conditions" target="_blank"&gt;avoid conflicts&lt;/A&gt; between concurrent operations.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;General rules of thumb for choosing partition columns:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;The cardinality of the column should not be very high.&lt;/LI&gt;&lt;LI&gt;The amount of data in each partition should meet a minimum threshold.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Delta also supports a feature called &lt;A href="https://docs.databricks.com/delta/optimizations/file-mgmt.html#data-skipping" alt="https://docs.databricks.com/delta/optimizations/file-mgmt.html#data-skipping" target="_blank"&gt;data skipping to speed up queries&lt;/A&gt;.
&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/delta/optimizations/file-mgmt.html#z-ordering-multi-dimensional-clustering" alt="https://docs.databricks.com/delta/optimizations/file-mgmt.html#z-ordering-multi-dimensional-clustering" target="_blank"&gt;Z-Ordering&lt;/A&gt; is a multi-dimensional clustering technique that colocates related information in the same set of files, so that Databricks data-skipping algorithms can dramatically reduce the amount of data that needs to be read. In terms of improving query read performance, it works somewhat like a secondary index.&lt;/P&gt;</description>
    <pubDate>Thu, 24 Jun 2021 22:02:47 GMT</pubDate>
    <dc:creator>sajith_appukutt</dc:creator>
    <dc:date>2021-06-24T22:02:47Z</dc:date>
    <item>
      <title>What's the difference between Z-Ordering and Partitioning?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-s-the-difference-between-z-ordering-and-partitioning/m-p/26656#M18679</link>
      <description />
      <pubDate>Fri, 28 May 2021 19:22:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-s-the-difference-between-z-ordering-and-partitioning/m-p/26656#M18679</guid>
      <dc:creator>User16790091296</dc:creator>
      <dc:date>2021-05-28T19:22:29Z</dc:date>
    </item>
    <item>
      <title>Re: What's the difference between Z-Ordering and Partitioning?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-s-the-difference-between-z-ordering-and-partitioning/m-p/26657#M18680</link>
      <description>&lt;P&gt;Partitioning distributes data by key so that each query scans less data, which can &lt;A href="https://kb.databricks.com/delta/delta-merge-into.html#how-to-improve-performance-of-delta-lake-merge-into-queries-using-partition-pruning" alt="https://kb.databricks.com/delta/delta-merge-into.html#how-to-improve-performance-of-delta-lake-merge-into-queries-using-partition-pruning" target="_blank"&gt;improve performance&lt;/A&gt; and help &lt;A href="https://docs.databricks.com/delta/concurrency-control.html?_ga=2.46048486.352253067.1624558321-2086263156.1624558321#avoid-conflicts-using-partitioning-and-disjoint-command-conditions" alt="https://docs.databricks.com/delta/concurrency-control.html?_ga=2.46048486.352253067.1624558321-2086263156.1624558321#avoid-conflicts-using-partitioning-and-disjoint-command-conditions" target="_blank"&gt;avoid conflicts&lt;/A&gt; between concurrent operations.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;General rules of thumb for choosing partition columns:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;The cardinality of the column should not be very high.&lt;/LI&gt;&lt;LI&gt;The amount of data in each partition should meet a minimum threshold.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Delta also supports a feature called &lt;A href="https://docs.databricks.com/delta/optimizations/file-mgmt.html#data-skipping" alt="https://docs.databricks.com/delta/optimizations/file-mgmt.html#data-skipping" target="_blank"&gt;data skipping to speed up queries&lt;/A&gt;.
&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/delta/optimizations/file-mgmt.html#z-ordering-multi-dimensional-clustering" alt="https://docs.databricks.com/delta/optimizations/file-mgmt.html#z-ordering-multi-dimensional-clustering" target="_blank"&gt;Z-Ordering&lt;/A&gt; is a multi-dimensional clustering technique that colocates related information in the same set of files, so that Databricks data-skipping algorithms can dramatically reduce the amount of data that needs to be read. In terms of improving query read performance, it works somewhat like a secondary index.&lt;/P&gt;</description>
      <pubDate>Thu, 24 Jun 2021 22:02:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-s-the-difference-between-z-ordering-and-partitioning/m-p/26657#M18680</guid>
      <dc:creator>sajith_appukutt</dc:creator>
      <dc:date>2021-06-24T22:02:47Z</dc:date>
    </item>
  </channel>
</rss>

