<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Databricks autoloader with manual file delete? in Data Governance</title>
    <link>https://community.databricks.com/t5/data-governance/databricks-autoloader-with-manual-file-delete/m-p/148529#M2761</link>
    <description>&lt;P&gt;While we evaluate moving our many Auto Loader configurations to use &lt;SPAN&gt;&lt;CODE&gt;cloudFiles.cleanSource&lt;/CODE&gt;, we're wondering if we can instead implement a lifecycle policy &lt;EM&gt;outside of Databricks&lt;/EM&gt; that deletes files older than 30 days.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Is there a problem with doing this? For example, if the Azure storage account lifecycle policy deletes files older than 30 days while Auto Loader is running, is that a problem? (Our Auto Loader configuration is in directory full-listing mode; with thousands of files arriving per day, we only just realized how long Auto Loader spends re-listing files it has already processed, not to mention the storage costs we're paying.)&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;We're migrating to the cleanSource option, but in the meantime it is much faster for us to apply a lifecycle policy across all of our storage accounts. Is that a viable solution while we work on moving to the built-in cleanSource capability?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thank you~&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Nathan&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 16 Feb 2026 16:20:28 GMT</pubDate>
    <dc:creator>ctech932</dc:creator>
    <dc:date>2026-02-16T16:20:28Z</dc:date>
    <item>
      <title>Databricks autoloader with manual file delete?</title>
      <link>https://community.databricks.com/t5/data-governance/databricks-autoloader-with-manual-file-delete/m-p/148529#M2761</link>
      <description>&lt;P&gt;While we evaluate moving our many Auto Loader configurations to use &lt;SPAN&gt;&lt;CODE&gt;cloudFiles.cleanSource&lt;/CODE&gt;, we're wondering if we can instead implement a lifecycle policy &lt;EM&gt;outside of Databricks&lt;/EM&gt; that deletes files older than 30 days.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Is there a problem with doing this? For example, if the Azure storage account lifecycle policy deletes files older than 30 days while Auto Loader is running, is that a problem? (Our Auto Loader configuration is in directory full-listing mode; with thousands of files arriving per day, we only just realized how long Auto Loader spends re-listing files it has already processed, not to mention the storage costs we're paying.)&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;We're migrating to the cleanSource option, but in the meantime it is much faster for us to apply a lifecycle policy across all of our storage accounts. Is that a viable solution while we work on moving to the built-in cleanSource capability?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thank you~&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Nathan&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 16 Feb 2026 16:20:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-governance/databricks-autoloader-with-manual-file-delete/m-p/148529#M2761</guid>
      <dc:creator>ctech932</dc:creator>
      <dc:date>2026-02-16T16:20:28Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks autoloader with manual file delete?</title>
      <link>https://community.databricks.com/t5/data-governance/databricks-autoloader-with-manual-file-delete/m-p/148562#M2762</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/177679"&gt;@ctech932&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Short answer: yes, you can use an Azure Storage lifecycle policy to delete files older than 30 days.&lt;/P&gt;&lt;P&gt;In &lt;STRONG&gt;directory listing mode&lt;/STRONG&gt;, Auto Loader works like this:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;P&gt;Lists files in the directory&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Filters out already-processed files using the checkpoint state&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Processes new files&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Updates the checkpoint&lt;/P&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;So the real question is: can you guarantee that every file is processed well within 30 days?&lt;/P&gt;&lt;P&gt;If yes, a lifecycle policy is safe.&lt;BR /&gt;If no, you risk silent data loss.&lt;/P&gt;&lt;P&gt;For instance, if your processing job doesn't run for 30 days for some reason, the Azure policy will delete files that were never loaded. But if you actively monitor your production jobs and are sure that won't happen, you can use the Azure lifecycle policy instead of Auto Loader's cleanup.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Feb 2026 06:49:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-governance/databricks-autoloader-with-manual-file-delete/m-p/148562#M2762</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2026-02-17T06:49:43Z</dc:date>
    </item>
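    <!-- Editorial note: the reply above reduces to a simple arithmetic check. A minimal Python sketch of that safety condition (the function name and the 7-day safety margin are illustrative assumptions, not from the thread):

    ```python
    # Sketch of the safety condition from the reply above: an external
    # lifecycle policy is only safe if every file is guaranteed to be
    # processed well before it reaches the deletion age.
    def lifecycle_policy_is_safe(max_processing_lag_days: float,
                                 retention_days: float,
                                 safety_margin_days: float = 7.0) -> bool:
        """Return True if the worst-case gap between file arrival and
        processing leaves a comfortable margin before deletion."""
        return max_processing_lag_days + safety_margin_days <= retention_days

    # A stream that may fall 10 days behind is safe with 30-day deletion:
    print(lifecycle_policy_is_safe(10, 30))   # True
    # A stream that could stall for 28 days is not:
    print(lifecycle_policy_is_safe(28, 30))   # False
    ```
    -->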
    <item>
      <title>Re: Databricks autoloader with manual file delete?</title>
      <link>https://community.databricks.com/t5/data-governance/databricks-autoloader-with-manual-file-delete/m-p/149187#M2764</link>
      <description>&lt;P&gt;Hi, why are you not planning to move away from directory listing mode to &lt;SPAN&gt;useManagedFileEvents? Execution will be faster, and directories are no longer scanned on every run.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;File events use a single Azure Databricks-managed file notification queue for all streams that process files from any given external location defined in Unity Catalog. It is a managed service offered by Databricks on top of Unity Catalog external locations, and it provides the following benefits:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Makes it easier to set up file notifications for Auto Loader; specifically, it enables incremental file discovery with notification-like performance.&lt;/LI&gt;&lt;LI&gt;Improves the efficiency and capacity of file arrival triggers for jobs.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;This solves:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Long startup times for DLT and Auto Loader&lt;/LI&gt;&lt;LI&gt;Letting Auto Loader read from shared Unity Catalog locations&lt;/LI&gt;&lt;LI&gt;Scaling file arrival triggers in jobs to storage of arbitrary size&lt;/LI&gt;&lt;LI&gt;Running many streams that read from the same storage&lt;/LI&gt;&lt;LI&gt;Removing the need to create a queue per Auto Loader stream, which makes it easier to stay under the cloud provider's notification limits&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;When file event notifications are first enabled on an external location, Databricks performs a full scan of the location to discover all files that exist and stores them in the managed file events service. Notifications that are consumed and stored by the managed file events service are automatically removed from the cloud queue, and they are retained in the managed service for 7 days. If Auto Loader does not consume the notifications within that period, it falls back to directory listing to guarantee completeness.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 24 Feb 2026 14:55:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-governance/databricks-autoloader-with-manual-file-delete/m-p/149187#M2764</guid>
      <dc:creator>saurabh18cs</dc:creator>
      <dc:date>2026-02-24T14:55:08Z</dc:date>
    </item>
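    <!-- Editorial note: the cleanSource migration discussed in this thread could look like the following minimal PySpark sketch. Option names follow the Databricks Auto Loader documentation; the source path, checkpoint location, and table name are hypothetical placeholders, and the snippet only builds the stream rather than representing a complete production job:

    ```python
    # Sketch of the built-in cleanSource cleanup discussed in the thread.
    # Option names follow the Databricks Auto Loader documentation; the
    # function name and all paths/tables are hypothetical placeholders.
    def start_autoloader_with_clean_source(spark, source_path, checkpoint, table):
        """Start an Auto Loader stream that lets Databricks delete processed
        source files after a retention window, replacing the external
        Azure lifecycle policy."""
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            # Have Auto Loader remove files it has fully processed ...
            .option("cloudFiles.cleanSource", "DELETE")
            # ... but only once they are older than the retention window,
            # so slow or re-run consumers are not starved:
            .option("cloudFiles.cleanSource.retentionDuration", "30 days")
            .load(source_path)
            .writeStream.option("checkpointLocation", checkpoint)
            .toTable(table)
        )
    ```
    -->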
  </channel>
</rss>

