<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic best practice for optimizedWrites and Optimize in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/best-practice-for-optimizedwrites-and-optimize/m-p/21210#M14427</link>
    <description>&lt;P&gt;What is the best practice for a Delta pipeline with very high throughput to avoid the small-files problem and also reduce the need to run an external OPTIMIZE frequently?&lt;/P&gt;</description>
    <pubDate>Wed, 23 Jun 2021 21:16:18 GMT</pubDate>
    <dc:creator>User16783853501</dc:creator>
    <dc:date>2021-06-23T21:16:18Z</dc:date>
    <item>
      <title>best practice for optimizedWrites and Optimize</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practice-for-optimizedwrites-and-optimize/m-p/21210#M14427</link>
      <description>&lt;P&gt;What is the best practice for a Delta pipeline with very high throughput to avoid the small-files problem and also reduce the need to run an external OPTIMIZE frequently?&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jun 2021 21:16:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practice-for-optimizedwrites-and-optimize/m-p/21210#M14427</guid>
      <dc:creator>User16783853501</dc:creator>
      <dc:date>2021-06-23T21:16:18Z</dc:date>
    </item>
    <item>
      <title>Re: best practice for optimizedWrites and Optimize</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practice-for-optimizedwrites-and-optimize/m-p/21211#M14428</link>
      <description>&lt;P&gt;A better approach I can think of is:&lt;/P&gt;&lt;P&gt;Enable optimized writes (it will automatically target files of around 128 MB).&lt;/P&gt;&lt;P&gt;Enable auto compaction.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;delta.autoOptimize.optimizeWrite = true
delta.autoOptimize.autoCompact = true&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Complete guide:&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.microsoft.com/en-us/azure/databricks/delta/optimizations/auto-optimize" target="_blank"&gt;https://docs.microsoft.com/en-us/azure/databricks/delta/optimizations/auto-optimize&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 24 Jun 2021 01:28:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practice-for-optimizedwrites-and-optimize/m-p/21211#M14428</guid>
      <dc:creator>User16826994223</dc:creator>
      <dc:date>2021-06-24T01:28:02Z</dc:date>
    </item>
    <item>
      <title>Re: best practice for optimizedWrites and Optimize</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practice-for-optimizedwrites-and-optimize/m-p/21212#M14429</link>
      <description>&lt;P&gt;As Kunal mentioned, delta.autoOptimize.optimizeWrite aims to create 128 MB files. If you have very high write throughput and need low-latency inserts, consider disabling auto compaction by setting "delta.autoOptimize.autoCompact = false".&lt;/P&gt;&lt;P&gt;This pattern is convenient if you have the table partitioned by day and an append-heavy pipeline: you could run a manual OPTIMIZE with a filter condition that excludes the current day to reduce write conflicts.&lt;/P&gt;</description>
      <pubDate>Thu, 24 Jun 2021 03:35:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practice-for-optimizedwrites-and-optimize/m-p/21212#M14429</guid>
      <dc:creator>sajith_appukutt</dc:creator>
      <dc:date>2021-06-24T03:35:30Z</dc:date>
    </item>
    <item>
      <title>Re: best practice for optimizedWrites and Optimize</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practice-for-optimizedwrites-and-optimize/m-p/21213#M14430</link>
      <description>&lt;P&gt;The general practice is to enable only optimized writes and disable auto-compaction. Optimized writes introduce an extra shuffle step, which increases the latency of the write operation, and auto-compaction adds further latency to the write, specifically in the commit operation. Running an OPTIMIZE command on a daily basis is therefore a common practice.&lt;/P&gt;</description>
      <pubDate>Thu, 24 Jun 2021 05:21:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practice-for-optimizedwrites-and-optimize/m-p/21213#M14430</guid>
      <dc:creator>brickster_2018</dc:creator>
      <dc:date>2021-06-24T05:21:44Z</dc:date>
    </item>
    <item>
      <title>Re: best practice for optimizedWrites and Optimize</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practice-for-optimizedwrites-and-optimize/m-p/119786#M45972</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;Can anyone who has solved this challenge confirm whether the settings below increase write latency and avoid creating smaller files? Based on a POC I did, I don't see that behaviour replicated, so I am just wondering. Many thanks.&lt;/P&gt;</description>
      <pubDate>Tue, 20 May 2025 15:36:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practice-for-optimizedwrites-and-optimize/m-p/119786#M45972</guid>
      <dc:creator>rajkve</dc:creator>
      <dc:date>2025-05-20T15:36:31Z</dc:date>
    </item>
  </channel>
</rss>

