<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Autoloader with file notification mode sleeps for 5000ms multiple times in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101591#M40737</link>
    <description>&lt;P&gt;You can find it in here:&lt;BR /&gt;&lt;A href="https://docs.azure.cn/en-us/databricks/spark/latest/structured-streaming/aqs" target="_blank"&gt;https://docs.azure.cn/en-us/databricks/spark/latest/structured-streaming/aqs&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 10 Dec 2024 12:44:45 GMT</pubDate>
    <dc:creator>VZLA</dc:creator>
    <dc:date>2024-12-10T12:44:45Z</dc:date>
    <item>
      <title>Autoloader with file notification mode sleeps for 5000ms multiple times</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101497#M40692</link>
      <description>&lt;P&gt;Using DBR 15.4, I'm ingesting streaming data from ADLS using Auto Loader with file notification mode enabled. This is older code that uses a foreachBatch sink to process the data before merging it into tables in Delta Lake.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Issue&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;The streaming job uses the available-now trigger, but rather than processing the data in one go, it sleeps for 5000ms multiple times before closing the stream.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="AbdulMannan_0-1733760650416.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/13408i7B14F6916771BB32/image-size/medium?v=v2&amp;amp;px=400" role="button" title="AbdulMannan_0-1733760650416.png" alt="AbdulMannan_0-1733760650416.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Expectation&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;With the available-now trigger, it should process the available data and then close the stream, rather than waiting for 5000ms multiple times (5 to 6 times), which creates an undesired execution delay.&lt;/P&gt;&lt;P&gt;Here are the Auto Loader options used with the streaming job:&lt;/P&gt;&lt;PRE&gt;{
    'cloudFiles.format': 'json', 
    'cloudFiles.includeExistingFiles': 'false', 
    'cloudFiles.maxFilesPerTrigger': 1000, 
    'cloudFiles.maxBytesPerTrigger': '2g', 
    'cloudFiles.useNotifications': 'true', 
    'cloudFiles.subscriptionId': 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 
    'cloudFiles.tenantId': 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 
    'cloudFiles.clientId': 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 
    'cloudFiles.clientSecret': 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 
    'cloudFiles.resourceGroup': 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 
    'cloudFiles.fetchParallelism': 10, 
    'cloudFiles.resourceTag.streaming_job_autoloader_stream_id': 'databricks-event-xxxxxxxxxxxxxxx', 
    'cloudFiles.queueName': 'databricks-event-xxxxxxxxxxxxxxx', 
    'pathGlobfilter': '*.json'
}&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;Question&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Is this the default behaviour with file notification mode in autoloader?&lt;BR /&gt;Is it possible to customize/remove the delay?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Mon, 09 Dec 2024 16:13:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101497#M40692</guid>
      <dc:creator>Abdul-Mannan</dc:creator>
      <dc:date>2024-12-09T16:13:38Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader with file notification mode sleeps for 5000ms multiple times</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101515#M40706</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/121787"&gt;@Abdul-Mannan&lt;/a&gt;&amp;nbsp;thanks for your question!&lt;/P&gt;
&lt;P&gt;To control the 5000ms default value, you can use the &lt;STRONG&gt;cloudFiles.queueFetchInterval&lt;/STRONG&gt; option. This option allows you to specify the interval at which Auto Loader fetches messages from the queueing service.&lt;/P&gt;
&lt;P&gt;Here is an example of how you can set this option in your Auto Loader configuration:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.queueFetchInterval", "500ms")  # Set the desired interval here
      .load("path/to/source"))

# Parentheses let the chained calls span several lines in Python
(df.writeStream
   .format("delta")
   .option("checkpointLocation", "path/to/checkpoint")
   .start("path/to/destination"))&lt;/LI-CODE&gt;
&lt;P&gt;In this example, the &lt;CODE&gt;cloudFiles.queueFetchInterval&lt;/CODE&gt; is set to &lt;CODE&gt;500ms&lt;/CODE&gt;, but you can adjust this value to meet your specific requirements. This setting controls how frequently Auto Loader fetches new messages from the queue, which can help in reducing the delay you are experiencing.&lt;/P&gt;
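&lt;P&gt;For a job that uses the available-now trigger with a foreachBatch sink, as in the question, the same option could be wired in roughly as follows. This is a minimal sketch, not a verified fix: it assumes cloudFiles.queueFetchInterval behaves as described above, and the helper names build_autoloader_options and run_available_now, the path arguments, and the process_batch callback are all hypothetical.&lt;/P&gt;

```python
# Sketch only: helper names, paths, and the callback are hypothetical.


def build_autoloader_options(schema_location, fetch_interval="500ms"):
    """Return Auto Loader reader options for file notification mode."""
    return {
        "cloudFiles.format": "json",
        "cloudFiles.useNotifications": "true",
        # Where Auto Loader stores the inferred schema for the JSON source.
        "cloudFiles.schemaLocation": schema_location,
        # Option suggested in this thread; check it against your runtime.
        "cloudFiles.queueFetchInterval": fetch_interval,
    }


def run_available_now(spark, source_path, schema_location,
                      checkpoint_path, process_batch):
    """Start an availableNow stream and block until it stops.

    process_batch is called as process_batch(batch_df, batch_id)
    for every micro-batch, as foreachBatch requires.
    """
    stream = (spark.readStream
              .format("cloudFiles")
              .options(**build_autoloader_options(schema_location))
              .load(source_path))
    query = (stream.writeStream
             .foreachBatch(process_batch)
             .option("checkpointLocation", checkpoint_path)
             .trigger(availableNow=True)
             .start())
    query.awaitTermination()  # returns once the stream has stopped
    return query
```

&lt;P&gt;Keeping the options in one helper makes it easy to try different fetch intervals. Whether a shorter interval actually shortens the wait before the stream closes is something to measure on your own workload.&lt;/P&gt;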
&lt;P&gt;Hope it helps!&lt;/P&gt;</description>
      <pubDate>Mon, 09 Dec 2024 18:17:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101515#M40706</guid>
      <dc:creator>VZLA</dc:creator>
      <dc:date>2024-12-09T18:17:51Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader with file notification mode sleeps for 5000ms multiple times</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101551#M40717</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34618"&gt;@VZLA&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for your reply.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I could not find this option in the &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/ingestion/cloud-object-storage/auto-loader/options" target="_self"&gt;autoloader docs&lt;/A&gt;, where can I find more details on this option?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Tue, 10 Dec 2024 07:43:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101551#M40717</guid>
      <dc:creator>Abdul-Mannan</dc:creator>
      <dc:date>2024-12-10T07:43:49Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader with file notification mode sleeps for 5000ms multiple times</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101574#M40728</link>
      <description>&lt;P&gt;Is it possible to close the stream on 1st try when there is no data in queue?&lt;BR /&gt;&lt;BR /&gt;Please suggest if there is a config which can do it.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Tue, 10 Dec 2024 09:38:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101574#M40728</guid>
      <dc:creator>Abdul-Mannan</dc:creator>
      <dc:date>2024-12-10T09:38:37Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader with file notification mode sleeps for 5000ms multiple times</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101577#M40730</link>
      <description>&lt;P&gt;I tried using the option&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;cloudFiles.queueFetchInterval&lt;/LI-CODE&gt;&lt;P&gt;but it is still taking a minute to process the stream even though there is no data.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="AbdulMannan_0-1733823590639.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/13421i742E1F014EBE433C/image-size/medium?v=v2&amp;amp;px=400" role="button" title="AbdulMannan_0-1733823590639.png" alt="AbdulMannan_0-1733823590639.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 10 Dec 2024 09:41:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101577#M40730</guid>
      <dc:creator>Abdul-Mannan</dc:creator>
      <dc:date>2024-12-10T09:41:35Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader with file notification mode sleeps for 5000ms multiple times</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101591#M40737</link>
      <description>&lt;P&gt;You can find it in here:&lt;BR /&gt;&lt;A href="https://docs.azure.cn/en-us/databricks/spark/latest/structured-streaming/aqs" target="_blank"&gt;https://docs.azure.cn/en-us/databricks/spark/latest/structured-streaming/aqs&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 10 Dec 2024 12:44:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101591#M40737</guid>
      <dc:creator>VZLA</dc:creator>
      <dc:date>2024-12-10T12:44:45Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader with file notification mode sleeps for 5000ms multiple times</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101592#M40738</link>
      <description>&lt;P&gt;Can you please try setting &lt;CODE&gt;spark.databricks.cloudFiles.useAsyncFetch true&lt;/CODE&gt; at the cluster level?&lt;/P&gt;
&lt;P&gt;If restarting the cluster is not possible, try the session-level config instead, although I'm not sure it will still be applied that way:&lt;BR /&gt;&lt;CODE&gt;spark.conf.set("spark.databricks.cloudFiles.useAsyncFetch", "true")&lt;/CODE&gt;&lt;/P&gt;
&lt;P&gt;When enabled, Auto Loader uses an optimized async client for fetching messages.&amp;nbsp;This allows the FileEventFetcher to interact with the queueing service asynchronously, potentially reducing delays.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Dec 2024 12:49:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101592#M40738</guid>
      <dc:creator>VZLA</dc:creator>
      <dc:date>2024-12-10T12:49:35Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader with file notification mode sleeps for 5000ms multiple times</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101604#M40744</link>
      <description>&lt;P&gt;Unfortunately, I don't think this is possible or configurable. With the "available now" trigger, Auto Loader checks multiple times before closing if it finds no data. Reducing the cloudFiles.queueFetchInterval and enabling async fetch are the main options to minimize the delay.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Dec 2024 13:19:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101604#M40744</guid>
      <dc:creator>VZLA</dc:creator>
      <dc:date>2024-12-10T13:19:57Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader with file notification mode sleeps for 5000ms multiple times</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101625#M40752</link>
      <description>&lt;P&gt;I tried the following options:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# Auto Loader options added to the reader
autoloader_options = {
    "cloudFiles.fetchParallelism": 10,
    "cloudFiles.queueFetchInterval": "500ms",
}
# Set at the start of notebook execution
spark.conf.set("spark.databricks.cloudFiles.useAsyncFetch", "true")&lt;/LI-CODE&gt;&lt;P&gt;With these settings, the stream seems to be stuck and is not making any progress.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="AbdulMannan_0-1733843250269.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/13428i6E0B34FF1C0C1783/image-size/medium?v=v2&amp;amp;px=400" role="button" title="AbdulMannan_0-1733843250269.png" alt="AbdulMannan_0-1733843250269.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;There is no data in the ADLS queue for this stream, but it was stuck there for more than 40 minutes before I cancelled the task.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="AbdulMannan_1-1733843394562.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/13429i38E40586D4226659/image-size/medium?v=v2&amp;amp;px=400" role="button" title="AbdulMannan_1-1733843394562.png" alt="AbdulMannan_1-1733843394562.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;If I disable or don't set this property&lt;/P&gt;&lt;LI-CODE lang="python"&gt;spark.databricks.cloudFiles.useAsyncFetch&lt;/LI-CODE&gt;&lt;P&gt;it processes the stream but still takes a minute even though the queue is empty.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Dec 2024 15:12:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101625#M40752</guid>
      <dc:creator>Abdul-Mannan</dc:creator>
      <dc:date>2024-12-10T15:12:27Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader with file notification mode sleeps for 5000ms multiple times</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101638#M40754</link>
      <description>&lt;P&gt;The logging behavior appears normal and is influenced by the sync/async property configuration. However, the 40+ minute runtime is unusual and could indicate delays related to producer/consumer states, ADLS queue fetches, or metadata cleanup tasks.&lt;/P&gt;
&lt;P&gt;To investigate further, I recommend raising a support ticket with the Driver logs and Driver Thread Dumps attached for a detailed root cause analysis.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Dec 2024 16:51:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101638#M40754</guid>
      <dc:creator>VZLA</dc:creator>
      <dc:date>2024-12-10T16:51:48Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader with file notification mode sleeps for 5000ms multiple times</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101866#M40861</link>
      <description>&lt;P&gt;Thank you&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34618"&gt;@VZLA&lt;/a&gt;&amp;nbsp; for your support. I'll proceed with next steps.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Dec 2024 07:52:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101866#M40861</guid>
      <dc:creator>Abdul-Mannan</dc:creator>
      <dc:date>2024-12-12T07:52:32Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader with file notification mode sleeps for 5000ms multiple times</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101888#M40868</link>
      <description>&lt;P&gt;This documentation gives the impression of being about an old deprecated feature (e.g. the line "The ABS-AQS source is deprecated. For new streams, we recommend using Auto Loader instead."). If these config options are still relevant for autoloader I recommend that you update the auto loader documentation to mention them &lt;span class="lia-unicode-emoji" title=":winking_face:"&gt;😉&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 12 Dec 2024 10:06:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101888#M40868</guid>
      <dc:creator>Erik</dc:creator>
      <dc:date>2024-12-12T10:06:43Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader with file notification mode sleeps for 5000ms multiple times</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101920#M40889</link>
      <description>&lt;P&gt;Thanks &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/23894"&gt;@Erik&lt;/a&gt;&amp;nbsp;Absolutely, I agree with respect to updating the autoloader documentation.&lt;/P&gt;
&lt;P&gt;Not on the Azure website, but on the Databricks site you can use &lt;A href="mailto:doc-feedback@databricks.com" target="_blank"&gt;doc-feedback@databricks.com&lt;/A&gt;, which is linked in each documentation section, to provide your feedback. The Documentation team will gladly review it and take care of adding, updating, or removing content as needed.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Dec 2024 14:03:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101920#M40889</guid>
      <dc:creator>VZLA</dc:creator>
      <dc:date>2024-12-12T14:03:07Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader with file notification mode sleeps for 5000ms multiple times</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101922#M40891</link>
      <description>&lt;P&gt;Sure, thanks for helping improve our product, looking forward to assisting you through our support channel.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Dec 2024 14:07:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/101922#M40891</guid>
      <dc:creator>VZLA</dc:creator>
      <dc:date>2024-12-12T14:07:03Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader with file notification mode sleeps for 5000ms multiple times</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/102108#M40967</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34618"&gt;@VZLA&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;I just tested it, and this Auto Loader behaviour with the available-now trigger and file notification enabled stays the same in a DLT pipeline: it sleeps 7 times, for 5000ms each time, before finally closing the stream, even though there is no data in the queue. Each stream takes at least 1 minute even when the queue used for file notification has no data.&lt;/P&gt;&lt;P&gt;Is there any other way to avoid this behaviour?&lt;/P&gt;</description>
      <pubDate>Fri, 13 Dec 2024 17:11:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-with-file-notification-mode-sleeps-for-5000ms/m-p/102108#M40967</guid>
      <dc:creator>Abdul-Mannan</dc:creator>
      <dc:date>2024-12-13T17:11:45Z</dc:date>
    </item>
  </channel>
</rss>

