<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic OutputMode “complete” unable to replace the entire table in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/outputmode-complete-unable-to-replace-the-entire-table/m-p/77061#M35381</link>
    <description>&lt;P&gt;&lt;SPAN&gt;According to the documentation at&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://docs.databricks.com/en/structured-streaming/delta-lake.html#complete-mode" target="_blank"&gt;&lt;SPAN&gt;https://docs.databricks.com/en/structured-streaming/delta-lake.html#complete-mode&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN&gt;, the “complete” option is supposed to “replace the entire table with every batch”. However, that is not what happens in my case.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Here is how I reproduce the issue:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;First, I prepared a single file in ADLS named `employee_01.csv`. Then I used the following Python code to read data from it and generate a table:&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;outputMode = 'complete'

default_spark_options = {
    "cloudFiles.format": "csv",
    "delimiter": "\x01",
    "inferSchema": "true"
}

@dlt.table(
    name = table_01,
)
def create_raw_table():
    path = source_path
    df = (spark.readStream
        .outputMode(outputMode)
        .format("cloudFiles")
        .options(**spark_options)
        .load(path))
    return df
&lt;/LI-CODE&gt;&lt;P&gt;&lt;SPAN&gt;I can load the data and create the table successfully.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Then I uploaded another file to ADLS and triggered the DLT pipeline again.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;However, when the DLT pipeline finished running, the table appeared to contain the results of both runs together.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Am I misunderstanding the `complete` outputMode?&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 08 Jul 2024 05:02:16 GMT</pubDate>
    <dc:creator>guangyi</dc:creator>
    <dc:date>2024-07-08T05:02:16Z</dc:date>
    <item>
      <title>OutputMode “complete” unable to replace the entire table</title>
      <link>https://community.databricks.com/t5/data-engineering/outputmode-complete-unable-to-replace-the-entire-table/m-p/77061#M35381</link>
      <description>&lt;P&gt;&lt;SPAN&gt;According to the documentation at&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://docs.databricks.com/en/structured-streaming/delta-lake.html#complete-mode" target="_blank"&gt;&lt;SPAN&gt;https://docs.databricks.com/en/structured-streaming/delta-lake.html#complete-mode&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN&gt;, the “complete” option is supposed to “replace the entire table with every batch”. However, that is not what happens in my case.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Here is how I reproduce the issue:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;First, I prepared a single file in ADLS named `employee_01.csv`. Then I used the following Python code to read data from it and generate a table:&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;outputMode = 'complete'

default_spark_options = {
    "cloudFiles.format": "csv",
    "delimiter": "\x01",
    "inferSchema": "true"
}

@dlt.table(
    name = table_01,
)
def create_raw_table():
    path = source_path
    df = (spark.readStream
        .outputMode(outputMode)
        .format("cloudFiles")
        .options(**spark_options)
        .load(path))
    return df
&lt;/LI-CODE&gt;&lt;P&gt;&lt;SPAN&gt;I can load the data and create the table successfully.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Then I uploaded another file to ADLS and triggered the DLT pipeline again.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;However, when the DLT pipeline finished running, the table appeared to contain the results of both runs together.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Am I misunderstanding the `complete` outputMode?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 08 Jul 2024 05:02:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/outputmode-complete-unable-to-replace-the-entire-table/m-p/77061#M35381</guid>
      <dc:creator>guangyi</dc:creator>
      <dc:date>2024-07-08T05:02:16Z</dc:date>
    </item>
    <item>
      <title>Re: OutputMode “complete” unable to replace the entire table</title>
      <link>https://community.databricks.com/t5/data-engineering/outputmode-complete-unable-to-replace-the-entire-table/m-p/77077#M35388</link>
      <description>&lt;P&gt;I have already figured it out. I cannot find the delete button, so please ignore this post.&lt;/P&gt;</description>
      <pubDate>Mon, 08 Jul 2024 06:54:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/outputmode-complete-unable-to-replace-the-entire-table/m-p/77077#M35388</guid>
      <dc:creator>guangyi</dc:creator>
      <dc:date>2024-07-08T06:54:49Z</dc:date>
    </item>
  </channel>
</rss>

