<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to use cloudFiles to completely overwrite the target in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-use-cloudfiles-to-completely-overwrite-the-target/m-p/12231#M7073</link>
    <description>&lt;P&gt;Hi @Brad Sheridan,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Just a friendly follow-up: did any of the responses help you resolve your question? If one did, please mark it as best. Otherwise, please let us know if you still need help.&lt;/P&gt;</description>
    <pubDate>Wed, 17 Aug 2022 21:03:47 GMT</pubDate>
    <dc:creator>jose_gonzalez</dc:creator>
    <dc:date>2022-08-17T21:03:47Z</dc:date>
    <item>
      <title>How to use cloudFiles to completely overwrite the target</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-use-cloudfiles-to-completely-overwrite-the-target/m-p/12223#M7065</link>
      <description>&lt;P&gt;Hey there Community!! I have a client that will produce a CSV file daily that needs to be moved from Bronze -&amp;gt; Silver. Unfortunately, this source file will always be a full set of data, not incremental. I was thinking of using Auto Loader/cloudFiles to take advantage of the checkpointLocation, and I would just do Trigger Once. However, I need to ensure that all of the parquet files in the Silver S3 bucket are completely deleted/overwritten on each run. What is the .option to use in .writeStream to do this?&lt;/P&gt;</description>
      <pubDate>Wed, 27 Jul 2022 13:13:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-use-cloudfiles-to-completely-overwrite-the-target/m-p/12223#M7065</guid>
      <dc:creator>BradSheridan</dc:creator>
      <dc:date>2022-07-27T13:13:27Z</dc:date>
    </item>
    <item>
      <title>Re: How to use cloudFiles to completely overwrite the target</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-use-cloudfiles-to-completely-overwrite-the-target/m-p/12224#M7066</link>
      <description>&lt;P&gt;Is &lt;A href="https://docs.databricks.com/delta/delta-streaming.html#complete-mode" alt="https://docs.databricks.com/delta/delta-streaming.html#complete-mode" target="_blank"&gt;this what you are looking for&lt;/A&gt;?&lt;/P&gt;</description>
      <pubDate>Wed, 27 Jul 2022 13:32:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-use-cloudfiles-to-completely-overwrite-the-target/m-p/12224#M7066</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-07-27T13:32:18Z</dc:date>
    </item>
    <item>
      <title>Re: How to use cloudFiles to completely overwrite the target</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-use-cloudfiles-to-completely-overwrite-the-target/m-p/12225#M7067</link>
      <description>&lt;P&gt;Thanks @Werner Stinckens... yes, that would work if I were using Delta, but I'm using .writeStream.format('parquet') and get the error "Data source parquet does not support Complete output mode". The reason I'm not using Delta is that once the parquet files are written to S3, I then crawl them with AWS Glue. I guess the alternative is to use Delta as the output, do .outputMode("complete"), and then just create a manifest file for Athena queries and skip the Glue crawler?&lt;/P&gt;</description>
      <pubDate>Wed, 27 Jul 2022 13:54:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-use-cloudfiles-to-completely-overwrite-the-target/m-p/12225#M7067</guid>
      <dc:creator>BradSheridan</dc:creator>
      <dc:date>2022-07-27T13:54:55Z</dc:date>
    </item>
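The Delta-plus-manifest alternative floated above might look roughly like the following sketch. It only runs end-to-end on Databricks (cloudFiles, Delta, S3 credentials), so the pipeline is behind a guard; the bucket paths and the "state" column are made up for illustration. One caveat worth knowing: `outputMode("complete")` is only supported for streaming queries that contain an aggregation, so the example aggregates before writing.

```python
# Non-portable sketch of the Delta + symlink-manifest route. Flip the
# guard on a real Databricks cluster; everything below it is illustrative.
ON_DATABRICKS = False

silver_delta = "s3://example-silver/daily_delta/"   # hypothetical path
checkpoint = "s3://example-silver/_cp/"

if ON_DATABRICKS:
    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    agg = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("cloudFiles.schemaLocation", checkpoint)
        .load("s3://example-bronze/daily_csv/")
        .groupBy("state")
        .count()  # complete mode requires a streaming aggregation
    )

    (
        agg.writeStream.format("delta")
        .outputMode("complete")                 # rewrites the whole result each trigger
        .option("checkpointLocation", checkpoint)
        .trigger(once=True)
        .start(silver_delta)
    )

    # Regenerate the manifest Athena reads, instead of running a Glue crawler:
    DeltaTable.forPath(spark, silver_delta).generate("symlink_format_manifest")
```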
    <item>
      <title>Re: How to use cloudFiles to completely overwrite the target</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-use-cloudfiles-to-completely-overwrite-the-target/m-p/12226#M7068</link>
      <description>&lt;P&gt;Or use good old batch instead of streaming?&lt;/P&gt;</description>
      <pubDate>Wed, 27 Jul 2022 13:55:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-use-cloudfiles-to-completely-overwrite-the-target/m-p/12226#M7068</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-07-27T13:55:56Z</dc:date>
    </item>
    <item>
      <title>Re: How to use cloudFiles to completely overwrite the target</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-use-cloudfiles-to-completely-overwrite-the-target/m-p/12227#M7069</link>
      <description>&lt;P&gt;Yeah, I tried that initially, but here's the issue: csv1 gets processed into the Silver bucket, and all is good. Then the next day csv2 lands in the same Bronze S3 bucket as csv1, and it contains all the rows from csv1 plus possibly some new or updated data. The next time the batch runs, it reads both files and therefore duplicates data in Silver. That's why I tried Auto Loader: it keeps track of which files it has already processed in the source. I'm getting ready to try the manifest file idea now... more soon.&lt;/P&gt;</description>
      <pubDate>Wed, 27 Jul 2022 14:11:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-use-cloudfiles-to-completely-overwrite-the-target/m-p/12227#M7069</guid>
      <dc:creator>BradSheridan</dc:creator>
      <dc:date>2022-07-27T14:11:22Z</dc:date>
    </item>
    <item>
      <title>Re: How to use cloudFiles to completely overwrite the target</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-use-cloudfiles-to-completely-overwrite-the-target/m-p/12228#M7070</link>
      <description>&lt;P&gt;If you have any influence on the name of the incoming file (or its location), you could add a date to the filename or put each file in a yyyy/mm/dd subdirectory. That is how I organize my bronze layer.&lt;/P&gt;</description>
      <pubDate>Wed, 27 Jul 2022 14:13:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-use-cloudfiles-to-completely-overwrite-the-target/m-p/12228#M7070</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-07-27T14:13:36Z</dc:date>
    </item>
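The yyyy/mm/dd layout suggested above can be captured in a small path helper: each day's full extract lands in its own dated subdirectory, and the batch job then reads only today's subdirectory, so yesterday's full extract can never be double-counted. The bucket name here is hypothetical.

```python
# Helper for the dated bronze layout: one subdirectory per day.
from datetime import date


def bronze_drop_path(bucket: str, d: date) -> str:
    """Return the yyyy/mm/dd subdirectory a daily file should land in."""
    return f"s3://{bucket}/bronze/{d.year:04d}/{d.month:02d}/{d.day:02d}/"


path = bronze_drop_path("example-bucket", date(2022, 7, 27))
# path is "s3://example-bucket/bronze/2022/07/27/"
```

The batch job then does a full overwrite from that single directory, e.g. `spark.read.csv(bronze_drop_path(bucket, date.today()))`.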
    <item>
      <title>Re: How to use cloudFiles to completely overwrite the target</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-use-cloudfiles-to-completely-overwrite-the-target/m-p/12229#M7071</link>
      <description>&lt;P&gt;Hmmm @Werner Stinckens... I hadn't even thought about the most obvious/easiest approach. Love that! Will keep this thread posted on my outcome.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Wed, 27 Jul 2022 14:34:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-use-cloudfiles-to-completely-overwrite-the-target/m-p/12229#M7071</guid>
      <dc:creator>BradSheridan</dc:creator>
      <dc:date>2022-07-27T14:34:48Z</dc:date>
    </item>
    <item>
      <title>Re: How to use cloudFiles to completely overwrite the target</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-use-cloudfiles-to-completely-overwrite-the-target/m-p/12230#M7072</link>
      <description>&lt;P&gt;I "up voted'" all of @werners suggestions b/c they are all very valid ways of addressing my need (the true power/flexibility of the Databricks UDAP!!!).  However, turns out I'm going to end up getting incremental data afterall :).  So now the flow will go like this: Salesforce -&amp;gt; AWS AppFlow -&amp;gt; S3 Bronze -&amp;gt; Databricks DLT w/AutoLoader -&amp;gt; S3 Silver.  thanks again @werners !&lt;/P&gt;</description>
      <pubDate>Fri, 12 Aug 2022 17:44:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-use-cloudfiles-to-completely-overwrite-the-target/m-p/12230#M7072</guid>
      <dc:creator>BradSheridan</dc:creator>
      <dc:date>2022-08-12T17:44:42Z</dc:date>
    </item>
    <item>
      <title>Re: How to use cloudFiles to completely overwrite the target</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-use-cloudfiles-to-completely-overwrite-the-target/m-p/12231#M7073</link>
      <description>&lt;P&gt;Hi @Brad Sheridan,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Just a friendly follow-up: did any of the responses help you resolve your question? If one did, please mark it as best. Otherwise, please let us know if you still need help.&lt;/P&gt;</description>
      <pubDate>Wed, 17 Aug 2022 21:03:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-use-cloudfiles-to-completely-overwrite-the-target/m-p/12231#M7073</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2022-08-17T21:03:47Z</dc:date>
    </item>
    <item>
      <title>Re: How to use cloudFiles to completely overwrite the target</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-use-cloudfiles-to-completely-overwrite-the-target/m-p/12232#M7074</link>
      <description>&lt;P&gt;Morning Jose. I just marked the first answer as best. Thanks!&lt;/P&gt;</description>
      <pubDate>Thu, 18 Aug 2022 12:42:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-use-cloudfiles-to-completely-overwrite-the-target/m-p/12232#M7074</guid>
      <dc:creator>BradSheridan</dc:creator>
      <dc:date>2022-08-18T12:42:17Z</dc:date>
    </item>
  </channel>
</rss>

