<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: delta table autooptimize vs optimize command in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/delta-table-autooptimize-vs-optimize-command/m-p/34130#M24911</link>
    <description>&lt;P&gt;Auto optimize is sufficient unless you run into performance issues.&lt;/P&gt;&lt;P&gt;In that case I would trigger an OPTIMIZE. This generates files of about 1 GB (larger than the default target size of auto optimize), and you can add Z-ordering if necessary.&lt;/P&gt;&lt;P&gt;The suggestion to run OPTIMIZE is probably a proposal to apply Z-ordering, because your notebook uses a highly selective filter.&lt;/P&gt;&lt;P&gt;Z-ordering is a very useful optimization technique, but you should check which ordering works best; depending on the case it may or may not help.&lt;/P&gt;&lt;P&gt;Auto optimize does not apply Z-ordering.&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.microsoft.com/en-us/azure/databricks/delta/optimizations/auto-optimize" alt="https://docs.microsoft.com/en-us/azure/databricks/delta/optimizations/auto-optimize" target="_blank"&gt;https://docs.microsoft.com/en-us/azure/databricks/delta/optimizations/auto-optimize&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 01 Dec 2021 10:13:41 GMT</pubDate>
    <dc:creator>-werners-</dc:creator>
    <dc:date>2021-12-01T10:13:41Z</dc:date>
    <item>
      <title>delta table autooptimize vs optimize command</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-table-autooptimize-vs-optimize-command/m-p/34128#M24909</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I have several Delta tables on an Azure ADLS Gen2 storage account, running Databricks Runtime 7.3. The tables only see write and read operations; there are no updates or deletes.&lt;/P&gt;&lt;P&gt;As part of the release pipeline, the commands below are executed in a new notebook in the workspace, on a new cluster:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;spark.sql('set spark.databricks.delta.properties.defaults.autoOptimize.optimizeWrite = true;')
spark.sql('set spark.databricks.delta.properties.defaults.autoOptimize.autoCompact = true;')&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;All my application jobs are triggered from different notebooks on different clusters.&lt;/P&gt;&lt;P&gt;Questions:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Is the autoOptimize setting above sufficient to optimize all the Delta tables, or should I periodically run OPTIMIZE {tableName} for each table?&lt;/LI&gt;&lt;LI&gt;Is there a way to verify whether autoOptimize is working? When I run a query on my Delta tables, it suggests running OPTIMIZE.&lt;/LI&gt;&lt;/OL&gt;</description>
      <pubDate>Wed, 01 Dec 2021 06:03:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-table-autooptimize-vs-optimize-command/m-p/34128#M24909</guid>
      <dc:creator>guruv</dc:creator>
      <dc:date>2021-12-01T06:03:59Z</dc:date>
    </item>
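The Databricks documentation for Delta table properties is worth checking on one point. According to it, `spark.databricks.delta.properties.defaults.*` only sets defaults for new tables created in the same Spark session. Running it in a separate release notebook may therefore leave existing tables unchanged.

The sketch below makes a few assumptions:
- `spark` is an active SparkSession (any object with a `.sql()` method works).
- The table names are placeholders.

It sets the two auto-optimize properties on existing tables, then reads them back with SHOW TBLPROPERTIES. That read-back is one way to confirm auto optimize is enabled.

```python
# Databricks table properties that enable optimized writes and auto compaction.
AUTO_OPTIMIZE_PROPS = {
    "delta.autoOptimize.optimizeWrite": "true",
    "delta.autoOptimize.autoCompact": "true",
}


def enable_auto_optimize(spark, table_names):
    """Set the auto-optimize properties on each existing Delta table.

    `spark` is assumed to be a SparkSession; table names are placeholders.
    """
    props = ", ".join(f"'{k}' = '{v}'" for k, v in AUTO_OPTIMIZE_PROPS.items())
    for name in table_names:
        spark.sql(f"ALTER TABLE {name} SET TBLPROPERTIES ({props})")


def auto_optimize_enabled(spark, table_name):
    """Return True if both auto-optimize properties are 'true' on the table."""
    # SHOW TBLPROPERTIES returns one row per property, with key and value columns.
    rows = spark.sql(f"SHOW TBLPROPERTIES {table_name}").collect()
    props = {row["key"]: row["value"] for row in rows}
    return all(props.get(k) == v for k, v in AUTO_OPTIMIZE_PROPS.items())
```

In a notebook this might be called as `enable_auto_optimize(spark, ["mydb.events", "mydb.orders"])`, followed by `auto_optimize_enabled(spark, "mydb.events")`. The database and table names there are hypothetical.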
    <item>
      <title>Re: delta table autooptimize vs optimize command</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-table-autooptimize-vs-optimize-command/m-p/34130#M24911</link>
      <description>&lt;P&gt;Auto optimize is sufficient unless you run into performance issues.&lt;/P&gt;&lt;P&gt;In that case I would trigger an OPTIMIZE. This generates files of about 1 GB (larger than the default target size of auto optimize), and you can add Z-ordering if necessary.&lt;/P&gt;&lt;P&gt;The suggestion to run OPTIMIZE is probably a proposal to apply Z-ordering, because your notebook uses a highly selective filter.&lt;/P&gt;&lt;P&gt;Z-ordering is a very useful optimization technique, but you should check which ordering works best; depending on the case it may or may not help.&lt;/P&gt;&lt;P&gt;Auto optimize does not apply Z-ordering.&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.microsoft.com/en-us/azure/databricks/delta/optimizations/auto-optimize" alt="https://docs.microsoft.com/en-us/azure/databricks/delta/optimizations/auto-optimize" target="_blank"&gt;https://docs.microsoft.com/en-us/azure/databricks/delta/optimizations/auto-optimize&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 01 Dec 2021 10:13:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-table-autooptimize-vs-optimize-command/m-p/34130#M24911</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2021-12-01T10:13:41Z</dc:date>
    </item>
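The periodic OPTIMIZE suggested in this reply could be scheduled as a small job. This is a minimal sketch under two assumptions: `spark` is a SparkSession, and the table and column names are hypothetical. Pick Z-order columns only after checking which filters your queries actually use.

```python
def optimize_tables(spark, tables):
    """Run OPTIMIZE on each table, adding ZORDER BY when columns are given.

    `tables` maps a table name to a (possibly empty) list of Z-order columns.
    Returns the statements that were executed.
    """
    statements = []
    for name, zorder_cols in tables.items():
        stmt = f"OPTIMIZE {name}"
        if zorder_cols:
            # Z-order only on columns that appear in selective filters.
            stmt += f" ZORDER BY ({', '.join(zorder_cols)})"
        spark.sql(stmt)
        statements.append(stmt)
    return statements
```

For example, `optimize_tables(spark, {"mydb.orders": ["customer_id"], "mydb.events": []})` compacts both tables but Z-orders only the first.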
    <item>
      <title>Re: delta table autooptimize vs optimize command</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-table-autooptimize-vs-optimize-command/m-p/34131#M24912</link>
      <description>&lt;P&gt;Thanks for the confirmation.&lt;/P&gt;&lt;P&gt;Is there a way to verify that autoOptimize is actually optimizing the tables?&lt;/P&gt;&lt;P&gt;I was expecting DESCRIBE HISTORY {tableName} to show some operation for the auto optimize runs. But in my case all the Delta tables show only 1 day of history (we have not set anything explicitly), and it contains only "Write" operations.&lt;/P&gt;</description>
      <pubDate>Wed, 01 Dec 2021 21:32:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-table-autooptimize-vs-optimize-command/m-p/34131#M24912</guid>
      <dc:creator>guruv</dc:creator>
      <dc:date>2021-12-01T21:32:35Z</dc:date>
    </item>
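DESCRIBE HISTORY returns one row per retained commit, with an `operation` column. The small sketch below tallies those operations. It assumes `spark` is a SparkSession and the table name is a placeholder.

An explicit OPTIMIZE appears as its own OPTIMIZE commit. The file sizing done by optimized writes happens inside the write commit itself. Whether auto compaction gets a separate entry may depend on the runtime version.

```python
from collections import Counter


def history_operation_counts(spark, table_name):
    """Count each operation type in the table's retained history."""
    history = spark.sql(f"DESCRIBE HISTORY {table_name}")
    # One row per commit; operation holds values such as WRITE or OPTIMIZE.
    rows = history.select("operation").collect()
    return Counter(row["operation"] for row in rows)
```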
    <item>
      <title>Re: delta table autooptimize vs optimize command</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-table-autooptimize-vs-optimize-command/m-p/34132#M24913</link>
      <description>&lt;P&gt;The optimization runs as part of the write, so it does not show up as a separate operation in DESCRIBE HISTORY.&lt;/P&gt;&lt;P&gt;The cost is slower writes (but faster reads afterwards). There is always a price to pay.&lt;/P&gt;&lt;P&gt;You can check the sizes of the current data files. They should be roughly the same (128 MB or 32 MB are the defaults, depending on the version).&lt;/P&gt;</description>
      <pubDate>Fri, 03 Dec 2021 09:24:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-table-autooptimize-vs-optimize-command/m-p/34132#M24913</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2021-12-03T09:24:10Z</dc:date>
    </item>
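File sizes can be checked without listing storage. On a Delta table, DESCRIBE DETAIL returns `numFiles` and `sizeInBytes` for the current version. This rough sketch assumes `spark` is a SparkSession.

An average near the target size (128 MB or 32 MB, per the reply above) suggests compaction is taking effect. An average can still hide a skewed size distribution.

```python
def average_file_size_mb(spark, table_name):
    """Average size, in MB, of the data files in the table's current version.

    Uses the numFiles and sizeInBytes columns returned by DESCRIBE DETAIL.
    """
    detail = spark.sql(f"DESCRIBE DETAIL {table_name}").collect()[0]
    num_files = detail["numFiles"]
    if num_files == 0:
        return 0.0
    return detail["sizeInBytes"] / num_files / (1024 * 1024)
```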
    <item>
      <title>Re: delta table autooptimize vs optimize command</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-table-autooptimize-vs-optimize-command/m-p/34133#M24914</link>
      <description>&lt;P&gt;Hi @guruv,&lt;/P&gt;&lt;P&gt;@Werner Stinckens is correct. Auto optimize tries to create files of 128 MB within each partition. An explicit OPTIMIZE, on the other hand, compacts further and creates files of 1 GB each (the default value). You can customize this default to fit your use case.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Dec 2021 00:40:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-table-autooptimize-vs-optimize-command/m-p/34133#M24914</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2021-12-07T00:40:31Z</dc:date>
    </item>
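To customize the 1 GB default mentioned in this reply, Databricks documents the `spark.databricks.delta.optimize.maxFileSize` setting, in bytes. This minimal sketch assumes `spark` is a SparkSession. The value is set with `spark.conf.set`, so it applies only to OPTIMIZE runs in that session. The 256 MB figure is just an illustrative value.

```python
# Databricks setting for the target file size written by OPTIMIZE, in bytes.
# Its documented default is 1073741824 (1 GB).
OPTIMIZE_MAX_FILE_SIZE = "spark.databricks.delta.optimize.maxFileSize"


def set_optimize_target_size(spark, size_mb):
    """Set the OPTIMIZE target file size for the current Spark session."""
    size_bytes = size_mb * 1024 * 1024
    spark.conf.set(OPTIMIZE_MAX_FILE_SIZE, str(size_bytes))
    return size_bytes
```

A typical sequence would be `set_optimize_target_size(spark, 256)` followed by `spark.sql("OPTIMIZE mydb.events")`, where the table name is hypothetical.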
  </channel>
</rss>

