<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Auto Loader File Notification Mode not working with ADLS Gen2 and files written as a stream in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/auto-loader-file-notification-mode-not-working-with-adls-gen2/m-p/97606#M39512</link>
    <description>&lt;P&gt;Dear,&lt;/P&gt;&lt;P&gt;I am working on a real-time use case and am therefore using Auto Loader in file notification mode to ingest JSON files from a Gen2 Azure Storage Account in real time. Full refreshes of my table work fine, but I noticed that Auto Loader was not picking up new files landing in the storage account. I checked the Queue Storage queue and it stays empty. However, when I manually add a file, a message is added to the queue and the file is processed as expected.&lt;/P&gt;&lt;P&gt;After some digging I found out that the external system was writing these files to the storage account as a stream (when I inspect the properties of the files written by the external system, I see "application/octet-stream" as CONTENT-TYPE, whereas when I manually add a file I see "application/json"). This event type is not matched by default by the event subscription created by Databricks.&lt;/P&gt;&lt;P data-unlink="true"&gt;I tried to add it to the advanced filters of the event subscription (with key-value pair data.api: CreateFile).
This generates messages in the queue, but because the Microsoft.Storage.BlobCreated event is triggered when the CopyBlob operation is initiated and not when the Block Blob is completely committed, and because &lt;A href="https://learn.microsoft.com/en-us/rest/api/storageservices/create-file" target="_self"&gt;the Create File API call first initiates the file and only then adds content to it&lt;/A&gt;, the contentLength field of the corresponding message in the queue is set to 0 and Auto Loader considers the file to be empty, even though it is not.&lt;/P&gt;&lt;P&gt;Is there a solution or workaround, or is this a limitation of file notification mode? Thanks in advance!&lt;/P&gt;</description>
    <pubDate>Mon, 04 Nov 2024 20:03:28 GMT</pubDate>
    <dc:creator>rvo19941</dc:creator>
    <dc:date>2024-11-04T20:03:28Z</dc:date>
    <item>
      <title>Auto Loader File Notification Mode not working with ADLS Gen2 and files written as a stream</title>
      <link>https://community.databricks.com/t5/data-engineering/auto-loader-file-notification-mode-not-working-with-adls-gen2/m-p/97606#M39512</link>
      <description>&lt;P&gt;Dear,&lt;/P&gt;&lt;P&gt;I am working on a real-time use case and am therefore using Auto Loader in file notification mode to ingest JSON files from a Gen2 Azure Storage Account in real time. Full refreshes of my table work fine, but I noticed that Auto Loader was not picking up new files landing in the storage account. I checked the Queue Storage queue and it stays empty. However, when I manually add a file, a message is added to the queue and the file is processed as expected.&lt;/P&gt;&lt;P&gt;After some digging I found out that the external system was writing these files to the storage account as a stream (when I inspect the properties of the files written by the external system, I see "application/octet-stream" as CONTENT-TYPE, whereas when I manually add a file I see "application/json"). This event type is not matched by default by the event subscription created by Databricks.&lt;/P&gt;&lt;P data-unlink="true"&gt;I tried to add it to the advanced filters of the event subscription (with key-value pair data.api: CreateFile).
This generates messages in the queue, but because the Microsoft.Storage.BlobCreated event is triggered when the CopyBlob operation is initiated and not when the Block Blob is completely committed, and because &lt;A href="https://learn.microsoft.com/en-us/rest/api/storageservices/create-file" target="_self"&gt;the Create File API call first initiates the file and only then adds content to it&lt;/A&gt;, the contentLength field of the corresponding message in the queue is set to 0 and Auto Loader considers the file to be empty, even though it is not.&lt;/P&gt;&lt;P&gt;Is there a solution or workaround, or is this a limitation of file notification mode? Thanks in advance!&lt;/P&gt;</description>
      <pubDate>Mon, 04 Nov 2024 20:03:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/auto-loader-file-notification-mode-not-working-with-adls-gen2/m-p/97606#M39512</guid>
      <dc:creator>rvo19941</dc:creator>
      <dc:date>2024-11-04T20:03:28Z</dc:date>
    </item>
    <item>
      <title>Re: Auto Loader File Notification Mode not working with ADLS Gen2 and files written as a stream</title>
      <link>https://community.databricks.com/t5/data-engineering/auto-loader-file-notification-mode-not-working-with-adls-gen2/m-p/99333#M39970</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110238"&gt;@rvo19941&lt;/a&gt;&amp;nbsp;-&amp;nbsp;Can you share your Auto Loader config?&lt;/P&gt;</description>
      <pubDate>Tue, 19 Nov 2024 13:31:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/auto-loader-file-notification-mode-not-working-with-adls-gen2/m-p/99333#M39970</guid>
      <dc:creator>Panda</dc:creator>
      <dc:date>2024-11-19T13:31:59Z</dc:date>
    </item>
    <item>
      <title>Re: Auto Loader File Notification Mode not working with ADLS Gen2 and files written as a stream</title>
      <link>https://community.databricks.com/t5/data-engineering/auto-loader-file-notification-mode-not-working-with-adls-gen2/m-p/139321#M51157</link>
      <description>&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Auto Loader file notification in Databricks relies on Azure Event Grid’s&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;BlobCreated&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;event to trigger notifications for newly created files in Azure Data Lake Gen2. The issue you’re experiencing is a known limitation when files are written via certain methods—such as streamed writes or the Create File API—especially when they use Content-Type&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;application/octet-stream&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;and trigger creation events before the file is fully committed.&lt;/P&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Issue Explanation&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;When files are written with the Create File API or via streaming, the BlobCreated event is triggered&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;as soon as the file is initiated&lt;/STRONG&gt;, not when it is completely written.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;As a result, the corresponding Event Grid message may have&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;contentLength = 0&lt;/CODE&gt;, so Auto Loader sees the file as empty and ignores it.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;When files are uploaded manually (e.g., via Azure Portal/Storage Explorer), the event fires after the file is fully committed, the content type is often set to&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;application/json&lt;/CODE&gt;, and the file is ingested correctly.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Workarounds and Solutions&lt;/H2&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;1.&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Poll Mode Instead of File Notification&lt;/STRONG&gt;&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Switching Auto Loader to directory listing (polling) mode makes it scan the input path periodically and pick up files that have finished writing, regardless of when the creation event fired or which content type was set. This adds latency compared with notifications but is more robust to file commit timing issues.&lt;/P&gt;
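&lt;P&gt;A minimal sketch of the reader options involved, assuming a JSON source. &lt;CODE&gt;cloudFiles.format&lt;/CODE&gt; and &lt;CODE&gt;cloudFiles.useNotifications&lt;/CODE&gt; are Auto Loader options; the helper name and the path shown in the comments are hypothetical placeholders.&lt;/P&gt;

```python
def directory_listing_options(file_format="json"):
    """Build Auto Loader reader options that select directory listing mode.

    Hypothetical helper for illustration; only the option keys belong to Auto Loader.
    """
    return {
        "cloudFiles.format": file_format,
        # "false" turns off file notifications, so Auto Loader lists the directory instead.
        "cloudFiles.useNotifications": "false",
    }


options = directory_listing_options()
print(options)

# In a Databricks notebook (not runnable outside one) the options would be applied
# roughly like this; a schema or cloudFiles.schemaLocation is also needed in practice:
# df = (spark.readStream.format("cloudFiles")
#       .options(**options)
#       .load("abfss://CONTAINER@ACCOUNT.dfs.core.windows.net/landing/"))  # placeholder path
```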
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;2.&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Change External System's Write Method&lt;/STRONG&gt;&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;If possible, update the external system to upload each file in a single operation, so that the BlobCreated event fires only after the full file is committed, and to set the appropriate content type (&lt;CODE&gt;application/json&lt;/CODE&gt;).&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Alternatively, the system could upload to a temporary location, then move the fully written file into the target directory when complete.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;3.&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Event Subscription Advanced Filters&lt;/STRONG&gt;&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Your workaround to filter on additional event details (e.g.,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;data.api: CreateFile&lt;/CODE&gt;) helps to catch more events but does not resolve the core issue, since Event Grid may still fire events for empty/partially committed files.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;No direct configuration on the Event Grid side can guarantee that only fully committed, non-empty files trigger an event.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
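&lt;P&gt;To make the event fields concrete, here is a small consumer-side sketch that inspects a BlobCreated event body. &lt;CODE&gt;data.api&lt;/CODE&gt; and &lt;CODE&gt;data.contentLength&lt;/CODE&gt; are fields of the Event Grid Blob Storage event. Treating &lt;CODE&gt;FlushWithClose&lt;/CODE&gt;, &lt;CODE&gt;PutBlob&lt;/CODE&gt; and &lt;CODE&gt;PutBlockList&lt;/CODE&gt; as the calls that mark a finished write is an assumption based on Microsoft's Event Grid documentation and should be verified for your storage account. The function name and sample events are hypothetical.&lt;/P&gt;

```python
# API names assumed to mark a finished write; verify against the Event Grid docs.
COMMIT_APIS = {"PutBlob", "PutBlockList", "FlushWithClose"}


def looks_committed(event):
    """Return True if a BlobCreated event appears to describe a fully written file."""
    data = event.get("data", {})
    return data.get("api") in COMMIT_APIS and data.get("contentLength", 0) > 0


# Hypothetical sample events, trimmed to the fields used here.
create_file_event = {
    "eventType": "Microsoft.Storage.BlobCreated",
    "data": {"api": "CreateFile", "contentLength": 0},
}
flush_event = {
    "eventType": "Microsoft.Storage.BlobCreated",
    "data": {"api": "FlushWithClose", "contentLength": 2048},
}

print(looks_committed(create_file_event))  # False
print(looks_committed(flush_event))        # True
```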
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;4.&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Post-Processing Validation&lt;/STRONG&gt;&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;If notification mode is required, you might need to build a post-processing validation in your pipeline. For example, before ingesting files, validate their size/content to avoid processing empty files created by incomplete writes.&lt;/P&gt;
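&lt;P&gt;One way to sketch such a check in plain Python, assuming the pipeline already has a listing of paths and sizes (for example from a storage SDK or a file-system listing). The listing below is made-up data and the function name is hypothetical.&lt;/P&gt;

```python
def split_by_size(listing):
    """Separate (path, size_in_bytes) entries into ingestable and empty files."""
    ready, empty = [], []
    for path, size in listing:
        (ready if size > 0 else empty).append(path)
    return ready, empty


# Made-up listing: the second file was created but has no content yet.
listing = [
    ("landing/orders_001.json", 1532),
    ("landing/orders_002.json", 0),
]
ready, empty = split_by_size(listing)
print(ready)  # ['landing/orders_001.json']
print(empty)  # ['landing/orders_002.json']
```

&lt;P&gt;Paths in the empty list could be rechecked on a later pass rather than discarded, since the writer may still be filling them.&lt;/P&gt;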
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;5.&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;File Locking or Marker Files&lt;/STRONG&gt;&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Implement a marker file strategy: the external system writes a&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;.tmp&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;file or appends a special suffix, then renames or moves the file once the write is complete. Auto Loader can be configured to process only files without the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;.tmp&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;suffix or marker.&lt;/P&gt;
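&lt;P&gt;A rough illustration of the suffix idea, using Python's &lt;CODE&gt;fnmatch&lt;/CODE&gt; as a stand-in for the glob matching that a reader option such as &lt;CODE&gt;pathGlobFilter&lt;/CODE&gt; performs. The two are not guaranteed to behave identically, and the file names are invented.&lt;/P&gt;

```python
from fnmatch import fnmatch

GLOB = "*.json"  # pattern one might pass as the pathGlobFilter reader option


def finished_files(names, pattern=GLOB):
    """Keep only names matching the pattern, which skips in-progress .tmp files."""
    return [name for name in names if fnmatch(name, pattern)]


names = ["events_01.json", "events_02.json.tmp", "events_03.json"]
print(finished_files(names))  # ['events_01.json', 'events_03.json']
```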
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Limitations&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;This is primarily a limitation of Azure’s event generation logic and how the storage API triggers these events, not Databricks Auto Loader itself. Some updates to Azure Event Grid and Auto Loader are in progress to improve this scenario, but no instant fix currently exists.&lt;/P&gt;</description>
      <pubDate>Mon, 17 Nov 2025 12:00:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/auto-loader-file-notification-mode-not-working-with-adls-gen2/m-p/139321#M51157</guid>
      <dc:creator>mark_ott</dc:creator>
      <dc:date>2025-11-17T12:00:03Z</dc:date>
    </item>
    <item>
      <title>Re: Auto Loader File Notification Mode not working with ADLS Gen2 and files written as a stream</title>
      <link>https://community.databricks.com/t5/data-engineering/auto-loader-file-notification-mode-not-working-with-adls-gen2/m-p/149321#M53073</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/82205"&gt;@mark_ott&lt;/a&gt;&amp;nbsp;Any news regarding "&lt;SPAN&gt;updates to Azure Event Grid and Auto Loader are in progress to improve this scenario"?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 25 Feb 2026 23:44:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/auto-loader-file-notification-mode-not-working-with-adls-gen2/m-p/149321#M53073</guid>
      <dc:creator>awhorton</dc:creator>
      <dc:date>2026-02-25T23:44:09Z</dc:date>
    </item>
  </channel>
</rss>

