<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic What’s the best instance type to run OPTIMIZE (bin-packing and Z-Ordering) on? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/what-s-the-best-instance-type-to-run-optimize-bin-packing-and-z/m-p/26774#M18786</link>
    <description>&lt;P&gt;I've been researching how to optimize data storage while implementing Delta. However, I'm not sure which instance type would be best for running OPTIMIZE.&lt;/P&gt;</description>
    <pubDate>Fri, 21 May 2021 18:40:49 GMT</pubDate>
    <dc:creator>User16790091296</dc:creator>
    <dc:date>2021-05-21T18:40:49Z</dc:date>
    <item>
      <title>What’s the best instance type to run OPTIMIZE (bin-packing and Z-Ordering) on?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-s-the-best-instance-type-to-run-optimize-bin-packing-and-z/m-p/26774#M18786</link>
      <description>&lt;P&gt;I've been researching how to optimize data storage while implementing Delta. However, I'm not sure which instance type would be best for running OPTIMIZE.&lt;/P&gt;</description>
      <pubDate>Fri, 21 May 2021 18:40:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-s-the-best-instance-type-to-run-optimize-bin-packing-and-z/m-p/26774#M18786</guid>
      <dc:creator>User16790091296</dc:creator>
      <dc:date>2021-05-21T18:40:49Z</dc:date>
    </item>
    <item>
      <title>Re: What’s the best instance type to run OPTIMIZE (bin-packing and Z-Ordering) on?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-s-the-best-instance-type-to-run-optimize-bin-packing-and-z/m-p/26775#M18787</link>
      <description>&lt;P&gt;OPTIMIZE, as you alluded, has two operations: bin-packing and &lt;B&gt;multi-dimensional clustering (Z-Ordering)&lt;/B&gt;.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Bin-packing optimization is &lt;I&gt;idempotent&lt;/I&gt;, meaning that if it is run twice on the same dataset, the second run has no effect.&lt;/LI&gt;&lt;LI&gt;Z-Ordering is &lt;I&gt;not idempotent&lt;/I&gt; but aims to be an incremental operation. If no new data was added to a partition that was just Z-Ordered, another Z-Ordering of that partition will not have any effect.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;With that context, you'd want to check whether your OPTIMIZE job is CPU-bound or memory-bound. Anecdotal evidence suggests that the Esv3 series on Azure and r5a/r5d instances on AWS offer a good CPU-to-memory ratio and work well for OPTIMIZE, but really &lt;B&gt;YMMV&lt;/B&gt;.&lt;/P&gt;</description>
      <pubDate>Fri, 18 Jun 2021 05:26:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-s-the-best-instance-type-to-run-optimize-bin-packing-and-z/m-p/26775#M18787</guid>
      <dc:creator>sajith_appukutt</dc:creator>
      <dc:date>2021-06-18T05:26:45Z</dc:date>
    </item>
  </channel>
</rss>

