<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Delta Streaming and Optimize in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/delta-streaming-and-optimize/m-p/18908#M12606</link>
    <description>&lt;P&gt;I have a master Delta table that is continuously written to by a streaming job. I have optimized writes enabled, and in addition I run the OPTIMIZE command every 3 hours.&lt;/P&gt;&lt;P&gt;However, I think the downstream streaming jobs that read from the master Delta table do not get any benefit from the OPTIMIZE job. I tried stopping the downstream job and restarting it after the OPTIMIZE command completed, but the downstream job still reads the un-optimized files.&lt;/P&gt;</description>
    <pubDate>Fri, 25 Jun 2021 19:24:56 GMT</pubDate>
    <dc:creator>brickster_2018</dc:creator>
    <dc:date>2021-06-25T19:24:56Z</dc:date>
    <item>
      <title>Delta Streaming and Optimize</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-streaming-and-optimize/m-p/18908#M12606</link>
      <description>&lt;P&gt;I have a master Delta table that is continuously written to by a streaming job. I have optimized writes enabled, and in addition I run the OPTIMIZE command every 3 hours.&lt;/P&gt;&lt;P&gt;However, I think the downstream streaming jobs that read from the master Delta table do not get any benefit from the OPTIMIZE job. I tried stopping the downstream job and restarting it after the OPTIMIZE command completed, but the downstream job still reads the un-optimized files.&lt;/P&gt;</description>
      <pubDate>Fri, 25 Jun 2021 19:24:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-streaming-and-optimize/m-p/18908#M12606</guid>
      <dc:creator>brickster_2018</dc:creator>
      <dc:date>2021-06-25T19:24:56Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Streaming and Optimize</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-streaming-and-optimize/m-p/18909#M12607</link>
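      <description>&lt;P&gt;This is working as expected. A Delta streaming source reads the data files that were originally written to the table. The compacted files produced by OPTIMIZE are not picked up by the downstream streaming job, because OPTIMIZE only rewrites existing data and its commits are marked as containing no data change.&lt;/P&gt;&lt;P&gt;This is also why running VACUUM with a short retention period is not recommended: it can delete the original files that the downstream stream still needs to read.&lt;/P&gt;</description>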
      <pubDate>Fri, 25 Jun 2021 19:26:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-streaming-and-optimize/m-p/18909#M12607</guid>
      <dc:creator>brickster_2018</dc:creator>
      <dc:date>2021-06-25T19:26:24Z</dc:date>
    </item>
  </channel>
</rss>

