<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How can I run OPTIMIZE on a table if I am streaming to it 24/7? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-can-i-run-optimize-on-a-table-if-i-am-streaming-to-it-24-7/m-p/19594#M13151</link>
    <description>&lt;P&gt;If the streaming job is making blind appends to the Delta table, then it's perfectly fine to run an OPTIMIZE query in parallel.&lt;/P&gt;&lt;P&gt;However, if the streaming job is performing MERGE or UPDATE operations, it can conflict with OPTIMIZE. In such cases, custom logic can be written within the streaming job to run OPTIMIZE as part of the job itself, for example once every 100 batches.&lt;/P&gt;&lt;P&gt;Check here for the list of conflicting operations:&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/delta/concurrency-control.html#write-conflicts" target="test_blank"&gt;https://docs.databricks.com/delta/concurrency-control.html#write-conflicts&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 25 Jun 2021 21:28:20 GMT</pubDate>
    <dc:creator>brickster_2018</dc:creator>
    <dc:date>2021-06-25T21:28:20Z</dc:date>
    <item>
      <title>How can I run OPTIMIZE on a table if I am streaming to it 24/7?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-run-optimize-on-a-table-if-i-am-streaming-to-it-24-7/m-p/19593#M13150</link>
      <description>&lt;P&gt;I have a table that I need to stream into continuously. I know it's best practice to run OPTIMIZE on my tables periodically. But if I never stop writing to the table, how and when can I run OPTIMIZE against it?&lt;/P&gt;</description>
      <pubDate>Fri, 25 Jun 2021 16:15:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-run-optimize-on-a-table-if-i-am-streaming-to-it-24-7/m-p/19593#M13150</guid>
      <dc:creator>User16826992666</dc:creator>
      <dc:date>2021-06-25T16:15:20Z</dc:date>
    </item>
    <item>
      <title>Re: How can I run OPTIMIZE on a table if I am streaming to it 24/7?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-run-optimize-on-a-table-if-i-am-streaming-to-it-24-7/m-p/19594#M13151</link>
      <description>&lt;P&gt;If the streaming job is making blind appends to the Delta table, then it's perfectly fine to run an OPTIMIZE query in parallel.&lt;/P&gt;&lt;P&gt;However, if the streaming job is performing MERGE or UPDATE operations, it can conflict with OPTIMIZE. In such cases, custom logic can be written within the streaming job to run OPTIMIZE as part of the job itself, for example once every 100 batches.&lt;/P&gt;&lt;P&gt;Check here for the list of conflicting operations:&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/delta/concurrency-control.html#write-conflicts" target="test_blank"&gt;https://docs.databricks.com/delta/concurrency-control.html#write-conflicts&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 25 Jun 2021 21:28:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-run-optimize-on-a-table-if-i-am-streaming-to-it-24-7/m-p/19594#M13151</guid>
      <dc:creator>brickster_2018</dc:creator>
      <dc:date>2021-06-25T21:28:20Z</dc:date>
    </item>
  </channel>
</rss>

