<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Autoloader move file to archive immediately after processing in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/autoloader-move-file-to-archive-immediately-after-processing/m-p/120699#M46228</link>
    <description>&lt;P&gt;cloudFiles.cleanSource.retentionDuration&lt;/P&gt;&lt;P&gt;Type: Interval String&lt;/P&gt;&lt;P&gt;The amount of time to wait before processed files become candidates for archival with cleanSource. Must be greater than 7 days for DELETE; there is no minimum restriction for MOVE.&lt;/P&gt;&lt;P&gt;Available in Databricks Runtime 16.4 and above. &lt;STRONG&gt;Default value: 30 days&lt;/STRONG&gt;&lt;/P&gt;&lt;DIV&gt;&lt;STRONG&gt;Alternative solutions:&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;1. Use an Azure Storage lifecycle management policy.&lt;/DIV&gt;&lt;DIV&gt;2. Create a Databricks job with auto-cleanup (a Delta log tracker is required to make sure files have been processed before they are moved).&lt;/DIV&gt;&lt;DIV&gt;3. Use Azure Event Grid to trigger a move operation as each file is ingested, with an Azure Function that listens for files in the source directory and moves them immediately after Auto Loader ingestion.&lt;/DIV&gt;&lt;DIV&gt;4. Instead of moving files, map the source folder to a temporary directory via a Databricks external location, and let an Azure Storage tiering rule move the files automatically after Auto Loader ingestion.&lt;/DIV&gt;</description>
    <pubDate>Mon, 02 Jun 2025 10:00:04 GMT</pubDate>
    <dc:creator>vaibhavs120</dc:creator>
    <dc:date>2025-06-02T10:00:04Z</dc:date>
    <item>
      <title>Autoloader move file to archive immediately after processing</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-move-file-to-archive-immediately-after-processing/m-p/120692#M46224</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;We are using Auto Loader with Spark streaming (Databricks file detection mode) and want to move files from the source to an archive folder immediately after they are processed. However, I cannot reduce the retention window below 7 days.&lt;/P&gt;&lt;P&gt;Code:&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.option("cloudFiles.cleanSource", "move")&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.option("cloudFiles.cleanSource.moveDestination", archive_path_monthly)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.option("cloudFiles.cleanSource.retentionDuration", "interval 1 minutes")&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Please suggest an alternate way to achieve this.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Note: I don't want to do this manually in code; I want to configure it through Auto Loader.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Mon, 02 Jun 2025 08:40:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-move-file-to-archive-immediately-after-processing/m-p/120692#M46224</guid>
      <dc:creator>kumar_soneta</dc:creator>
      <dc:date>2025-06-02T08:40:18Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader move file to archive immediately after processing</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-move-file-to-archive-immediately-after-processing/m-p/120699#M46228</link>
      <description>&lt;P&gt;cloudFiles.cleanSource.retentionDuration&lt;/P&gt;&lt;P&gt;Type: Interval String&lt;/P&gt;&lt;P&gt;The amount of time to wait before processed files become candidates for archival with cleanSource. Must be greater than 7 days for DELETE; there is no minimum restriction for MOVE.&lt;/P&gt;&lt;P&gt;Available in Databricks Runtime 16.4 and above. &lt;STRONG&gt;Default value: 30 days&lt;/STRONG&gt;&lt;/P&gt;&lt;DIV&gt;&lt;STRONG&gt;Alternative solutions:&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;1. Use an Azure Storage lifecycle management policy.&lt;/DIV&gt;&lt;DIV&gt;2. Create a Databricks job with auto-cleanup (a Delta log tracker is required to make sure files have been processed before they are moved).&lt;/DIV&gt;&lt;DIV&gt;3. Use Azure Event Grid to trigger a move operation as each file is ingested, with an Azure Function that listens for files in the source directory and moves them immediately after Auto Loader ingestion.&lt;/DIV&gt;&lt;DIV&gt;4. Instead of moving files, map the source folder to a temporary directory via a Databricks external location, and let an Azure Storage tiering rule move the files automatically after Auto Loader ingestion.&lt;/DIV&gt;</description>
      <pubDate>Mon, 02 Jun 2025 10:00:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-move-file-to-archive-immediately-after-processing/m-p/120699#M46228</guid>
      <dc:creator>vaibhavs120</dc:creator>
      <dc:date>2025-06-02T10:00:04Z</dc:date>
    </item>
  </channel>
</rss>