<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Databricks Autoloader File Notification Not Working As Expected in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databricks-autoloader-file-notification-not-working-as-expected/m-p/70115#M33999</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/90818"&gt;@Sambit_S&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;File notification mode only picks up newly arriving files, so it will not load existing data unless you configure a backfill interval. Alternatively, you could do the initial backfill in directory listing mode and then switch to file notification for newly arriving files; both can run against the same checkpoint.&lt;/P&gt;
&lt;P&gt;The logs you shared show no outstanding bytes or files, which could indicate that the existing data is not being picked up.&lt;/P&gt;
&lt;P&gt;Let me know if it helps.&lt;/P&gt;</description>
    <pubDate>Tue, 21 May 2024 12:58:53 GMT</pubDate>
    <dc:creator>matthew_m</dc:creator>
    <dc:date>2024-05-21T12:58:53Z</dc:date>
    <item>
      <title>Databricks Autoloader File Notification Not Working As Expected</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-autoloader-file-notification-not-working-as-expected/m-p/69954#M33938</link>
      <description>&lt;P&gt;Hello everyone,&lt;/P&gt;&lt;P&gt;In my project I am using Databricks Auto Loader to&amp;nbsp;&lt;SPAN&gt;incrementally and efficiently process new data files as they arrive in cloud storage.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I am using&amp;nbsp;&lt;/SPAN&gt;file notification mode, with an Event Grid and Queue Storage setup in an Azure storage account &lt;SPAN&gt;that subscribes to file events from the input directory.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;My file size is 65KB and I have received 3 million file events.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;My code to process the files is below.&lt;/SPAN&gt;&lt;/P&gt;&lt;PRE&gt;raw_payload_df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.queueName", queue_name)
    .option("cloudFiles.connectionString", queue_conn_string)
    .option("cloudFiles.fetchParallelism", 10)
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", schema_path)
    # .option("cloudFiles.maxFilesPerTrigger", 50000)
    .option("cloudFiles.maxBytesPerTrigger", "10g")
    .option("multiline", "true")
    .load(pl_path)
    .withColumn("FilePath", input_file_name())
    .withColumn("AppId", lit(app_id))
    .withColumn("SchemaId", lit(schema_id))
    .withColumn("SchemaVersion", lit(schema_version))
    .withColumn("Priority", lit(priority))
)

payloadCompressStream = (
    obs_df.writeStream
    .foreachBatch(forEachBatch)
    .trigger(availableNow=True)
    .option("checkpointLocation", checkpoint_path)
    .start()
)&lt;/PRE&gt;&lt;P&gt;&lt;U&gt;&lt;STRONG&gt;Problem&lt;/STRONG&gt;&lt;/U&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;When I set the option&amp;nbsp;&lt;STRONG&gt;cloudFiles.maxFilesPerTrigger to 50000,&lt;/STRONG&gt; only 2000 to 5000 files were triggered per batch.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;The same is true when I set the option&amp;nbsp;&lt;STRONG&gt;cloudFiles.maxBytesPerTrigger to 10g.&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;PRE&gt;One record is one JSON file here; I monitor the numInputRows attribute of each batch to check how many files are processed per batch.&lt;BR /&gt;{
  "id" : "d49e1e9c-ac58-4da0-8ccb-9aa1790a7e40",
  "runId" : "746a885c-c2d8-4ec8-91c6-6c7126c85e64",
  "name" : null,
  "timestamp" : "2024-05-16T09:36:49.295Z",
  "batchId" : 1319,
  "batchDuration" : 3349,
  "numInputRows" : 952,
  "inputRowsPerSecond" : 161.19200812732814,
  "processedRowsPerSecond" : 284.2639593908629,
  "durationMs" : {
    "addBatch" : 2627,
    "commitOffsets" : 153,
    "getBatch" : 42,
    "latestOffset" : 352,
    "queryPlanning" : 9,
    "triggerExecution" : 3349,
    "walCommit" : 147
  },
  "stateOperators" : [ ],
  "sources" : [ {
    "description" : "CloudFilesSource[abfss://5f540a60-11bb-4db9-9246-14bb346f1ad2@dtmsplztestscudlsdvc001.dfs.core.windows.net/data/15012/1/]",
    "startOffset" : {
      "seqNum" : 7835887,
      "sourceVersion" : 1,
      "lastBackfillStartTimeMs" : 1715782085477,
      "lastBackfillFinishTimeMs" : 1715782478832,
      "lastInputPath" : "abfss://5f540a60-11bb-4db9-9246-14bb346f1ad2@dtmsplztestscudlsdvc001.dfs.core.windows.net/data/15012/1/"
    },
    "endOffset" : {
      "seqNum" : 7837532,
      "sourceVersion" : 1,
      "lastBackfillStartTimeMs" : 1715782085477,
      "lastBackfillFinishTimeMs" : 1715782478832,
      "lastInputPath" : "abfss://5f540a60-11bb-4db9-9246-14bb346f1ad2@dtmsplztestscudlsdvc001.dfs.core.windows.net/data/15012/1/"
    },
    "latestOffset" : null,
    "numInputRows" : &lt;FONT color="#993366"&gt;&lt;STRONG&gt;952&lt;/STRONG&gt;&lt;/FONT&gt;,
    "inputRowsPerSecond" : 161.19200812732814,
    "processedRowsPerSecond" : 284.2639593908629,
    "metrics" : {
      "approximateQueueSize" : "0",
      "numBytesOutstanding" : "0",
      "numFilesOutstanding" : "0"
    }
  } ],
  "sink" : {
    "description" : "ForeachBatchSink",
    "numOutputRows" : -1
  },
  "observedMetrics" : {
    "15012_5f540a60-11bb-4db9-9246-14bb346f1ad2_compress" : {
      "PipelineRunID" : "fabf2d53-29d6-4ef6-ab7d-665f99227c9c",
      "BatchStatus" : "Fail",
      "AppId" : "5f540a60-11bb-4db9-9246-14bb346f1ad2",
      "SchemaId" : 15012,
      "SchemaVersion" : 1,
      "Priority" : 10,
      "InputRecordCount" : 952,
      "OutputRecordCount" : 952
    }
  }
}&lt;/PRE&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;U&gt;&lt;STRONG&gt;Cluster Configuration&lt;/STRONG&gt;&lt;/U&gt;&lt;/P&gt;&lt;P&gt;Driver: Standard_F32s_v2 · Workers: Standard_F16s_v2 · 30-50 workers · 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12)&lt;/P&gt;</description>
      <pubDate>Mon, 20 May 2024 10:57:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-autoloader-file-notification-not-working-as-expected/m-p/69954#M33938</guid>
      <dc:creator>Sambit_S</dc:creator>
      <dc:date>2024-05-20T10:57:13Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Autoloader File Notification Not Working As Expected</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-autoloader-file-notification-not-working-as-expected/m-p/70099#M33986</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;A class="" href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9" target="_self"&gt;&lt;SPAN class=""&gt;Kaniz&lt;/SPAN&gt;&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;Thank you for your response.&lt;/P&gt;&lt;P&gt;I have read through all your points from the official documentation page.&lt;/P&gt;&lt;P&gt;Can you let me know how I can achieve the below?&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;I have 3 million JSON files, each of 65KB, so the total size is 186GB. The data is in an ADLS Gen2 storage container, and the queue contains all the event notifications.&lt;/LI&gt;&lt;LI&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Sambit_S_2-1716288865124.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/7776i5E0EF1ECC6AA958C/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="Sambit_S_2-1716288865124.png" alt="Sambit_S_2-1716288865124.png" /&gt;&lt;/span&gt;&lt;/LI&gt;&lt;LI&gt;I want to process 10GB of data files per Auto Loader streaming micro-batch with file event notification set to true, so the total would come to around 20 batches.&lt;/LI&gt;&lt;LI&gt;What cluster configuration should I use, and what other config should I set besides&amp;nbsp;&lt;STRONG&gt;cloudFiles.maxBytesPerTrigger to 10g&lt;/STRONG&gt;?&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;The readstream code is as below.&lt;/STRONG&gt;&lt;UL&gt;&lt;LI&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Sambit_S_1-1716288754052.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/7775i78375FB9B41AB976/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="Sambit_S_1-1716288754052.png" alt="Sambit_S_1-1716288754052.png" /&gt;&lt;/span&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Tue, 21 May 2024 10:54:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-autoloader-file-notification-not-working-as-expected/m-p/70099#M33986</guid>
      <dc:creator>Sambit_S</dc:creator>
      <dc:date>2024-05-21T10:54:57Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Autoloader File Notification Not Working As Expected</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-autoloader-file-notification-not-working-as-expected/m-p/70115#M33999</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/90818"&gt;@Sambit_S&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;File notification mode only picks up newly arriving files, so it will not load existing data unless you configure a backfill interval. Alternatively, you could do the initial backfill in directory listing mode and then switch to file notification for newly arriving files; both can run against the same checkpoint.&lt;/P&gt;
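A minimal sketch of the relevant options (option names per the Auto Loader docs; the input path and the "1 day" interval are illustrative placeholders, not recommendations):

```python
# Sketch: Auto Loader in file notification mode plus a periodic backfill,
# so files that predate the queue subscription (or whose events were
# dropped) are still picked up under the same checkpoint.
autoloader_options = {
    "cloudFiles.format": "json",
    "cloudFiles.useNotifications": "true",   # file notification mode
    "cloudFiles.backfillInterval": "1 day",  # periodic directory listing
}

# Applied to a stream roughly as:
#   reader = spark.readStream.format("cloudFiles")
#   for key, value in autoloader_options.items():
#       reader = reader.option(key, value)
#   df = reader.load(input_path)
print(sorted(autoloader_options))
```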
&lt;P&gt;The logs you shared show no outstanding bytes or files, which could indicate that the existing data is not being picked up.&lt;/P&gt;
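That check can also be done programmatically from the streaming query's progress (field names as they appear in the logs in this thread; the `backlog` helper is a hypothetical name for illustration):

```python
# Rough sketch: read the Auto Loader backlog out of a streaming query's
# progress dict (e.g. query.lastProgress). The metrics arrive as strings.
def backlog(progress):
    """Return (outstanding_bytes, outstanding_files) for the first source."""
    metrics = progress["sources"][0]["metrics"]
    return int(metrics["numBytesOutstanding"]), int(metrics["numFilesOutstanding"])

# Example using the values from the log above:
progress = {"sources": [{"metrics": {
    "approximateQueueSize": "0",
    "numBytesOutstanding": "0",
    "numFilesOutstanding": "0",
}}]}
print(backlog(progress))  # -> (0, 0): nothing left to ingest
```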
&lt;P&gt;Let me know if it helps.&lt;/P&gt;</description>
      <pubDate>Tue, 21 May 2024 12:58:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-autoloader-file-notification-not-working-as-expected/m-p/70115#M33999</guid>
      <dc:creator>matthew_m</dc:creator>
      <dc:date>2024-05-21T12:58:53Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Autoloader File Notification Not Working As Expected</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-autoloader-file-notification-not-working-as-expected/m-p/70131#M34009</link>
      <description>&lt;P&gt;&lt;SPAN&gt;"File notification would only impact any new arriving files"&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Yes, all 3 million files are newly arriving files, as I generate synthetic data files for performance testing.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;"From the logs you shared it shows that there are no outstanding bytes or files."&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I think that was the last batch of the stream, but look at the metrics below, where&amp;nbsp;"numBytesOutstanding" : "2605258897" and "numFilesOutstanding" : "60310". These numbers seem wrong to me, because they increase/decrease when the next batch runs.&lt;/SPAN&gt;&lt;/P&gt;&lt;PRE&gt;{
  "id" : "d49e1e9c-ac58-4da0-8ccb-9aa1790a7e40",
  "runId" : "6d3ac39b-42c1-4bd8-99c7-8477817b278e",
  "name" : null,
  "timestamp" : "2024-05-21T14:16:10.155Z",
  "batchId" : 1478,
  "batchDuration" : 35581,
  "numInputRows" : 998,
  "inputRowsPerSecond" : 0.0,
  "processedRowsPerSecond" : 28.04867766504595,
  "durationMs" : {
    "addBatch" : 25634,
    "commitOffsets" : 202,
    "getBatch" : 246,
    "latestOffset" : 2295,
    "queryPlanning" : 5687,
    "triggerExecution" : 35568,
    "walCommit" : 180
  },
  "stateOperators" : [ ],
  "sources" : [ {
    "description" : "CloudFilesSource[abfss://5f540a60-11bb-4db9-9246-14bb346f1ad2@dtmsplztestscudlsdvc001.dfs.core.windows.net/data/15012/1/]",
    "startOffset" : {
      "seqNum" : 18875879,
      "sourceVersion" : 1,
      "lastBackfillStartTimeMs" : 1715782085477,
      "lastBackfillFinishTimeMs" : 1715782478832,
      "lastInputPath" : "abfss://5f540a60-11bb-4db9-9246-14bb346f1ad2@dtmsplztestscudlsdvc001.dfs.core.windows.net/data/15012/1/"
    },
    "endOffset" : {
      "seqNum" : 18877451,
      "sourceVersion" : 1,
      "lastBackfillStartTimeMs" : 1715782085477,
      "lastBackfillFinishTimeMs" : 1715782478832,
      "lastInputPath" : "abfss://5f540a60-11bb-4db9-9246-14bb346f1ad2@dtmsplztestscudlsdvc001.dfs.core.windows.net/data/15012/1/"
    },
    "latestOffset" : null,
    "numInputRows" : 998,
    "inputRowsPerSecond" : 0.0,
    "processedRowsPerSecond" : 28.04867766504595,
    "metrics" : {
      "approximateQueueSize" : "5869492",
      "numBytesOutstanding" : "2605258897",
      "numFilesOutstanding" : "60310"
    }
  } ],
  "sink" : {
    "description" : "ForeachBatchSink",
    "numOutputRows" : -1
  },
  "observedMetrics" : {
    "15012_5f540a60-11bb-4db9-9246-14bb346f1ad2_compress" : {
      "PipelineRunID" : "5ba763c1-4e6f-4b8e-8555-9f42935fd6a3",
      "BatchStatus" : "Fail",
      "AppId" : "5f540a60-11bb-4db9-9246-14bb346f1ad2",
      "SchemaId" : 15012,
      "SchemaVersion" : 1,
      "Priority" : 10,
      "InputRecordCount" : 998,
      "OutputRecordCount" : 998
    }
  }
}&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 21 May 2024 14:20:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-autoloader-file-notification-not-working-as-expected/m-p/70131#M34009</guid>
      <dc:creator>Sambit_S</dc:creator>
      <dc:date>2024-05-21T14:20:44Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Autoloader File Notification Not Working As Expected</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-autoloader-file-notification-not-working-as-expected/m-p/70133#M34011</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;SPAN class=""&gt;&lt;A class="" href="https://community.databricks.com/t5/user/viewprofilepage/user-id/32098" target="_self"&gt;matthew_m&lt;/A&gt;,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN class=""&gt;Please check my comments below and let me know if you find anything.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 21 May 2024 14:34:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-autoloader-file-notification-not-working-as-expected/m-p/70133#M34011</guid>
      <dc:creator>Sambit_S</dc:creator>
      <dc:date>2024-05-21T14:34:13Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Autoloader File Notification Not Working As Expected</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-autoloader-file-notification-not-working-as-expected/m-p/70136#M34013</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/90818"&gt;@Sambit_S&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;From that last log I can see that it ingested 998 files, which is close to the default cloudFiles.maxFilesPerTrigger of 1000 (your code does not set it, so the default applies). maxBytesPerTrigger and maxFilesPerTrigger work together: whichever limit is reached first defines the soft upper bound of a micro-batch.&lt;/P&gt;
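As a back-of-the-envelope model (a simplification for illustration, not Auto Loader's actual implementation): a micro-batch admits files until either limit is hit, so with 65KB files the file-count cap binds long before a 10g byte cap does.

```python
# Simplified model: a micro-batch admits files until EITHER
# maxFilesPerTrigger or maxBytesPerTrigger is reached.
def files_admitted(file_size_bytes, max_files, max_bytes):
    """How many equally sized files fit in one micro-batch under both caps."""
    return min(max_files, max_bytes // file_size_bytes)

# 65KB files, default maxFilesPerTrigger (1000), maxBytesPerTrigger "10g":
print(files_admitted(65 * 1024, 1000, 10 * 1024**3))  # -> 1000 (file count binds)
```

Under this model, batches of roughly 1000 files are expected regardless of the byte cap until maxFilesPerTrigger is raised as well.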
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 21 May 2024 14:42:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-autoloader-file-notification-not-working-as-expected/m-p/70136#M34013</guid>
      <dc:creator>matthew_m</dc:creator>
      <dc:date>2024-05-21T14:42:47Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Autoloader File Notification Not Working As Expected</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-autoloader-file-notification-not-working-as-expected/m-p/70139#M34014</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;SPAN class=""&gt;&lt;A class="" href="https://community.databricks.com/t5/user/viewprofilepage/user-id/32098" target="_self"&gt;matthew_m&lt;/A&gt;,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;I have only set&amp;nbsp;&lt;SPAN&gt;maxBytesPerTrigger to 10g, and the metrics for the next batch look like the below.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Sambit_S_0-1716303042890.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/7781i6FAE9E4E1E6BA407/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="Sambit_S_0-1716303042890.png" alt="Sambit_S_0-1716303042890.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 21 May 2024 14:51:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-autoloader-file-notification-not-working-as-expected/m-p/70139#M34014</guid>
      <dc:creator>Sambit_S</dc:creator>
      <dc:date>2024-05-21T14:51:57Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Autoloader File Notification Not Working As Expected</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-autoloader-file-notification-not-working-as-expected/m-p/70236#M34034</link>
      <description>&lt;P&gt;Found some documentation on Azure Queue Storage throughput which might be the reason.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Sambit_S_0-1716372928961.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/7794i4F8C164058130D0B/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="Sambit_S_0-1716372928961.png" alt="Sambit_S_0-1716372928961.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Any idea how we can maximize the throughput? Is there some Spark Auto Loader configuration that can help increase the streaming batch size?&lt;/P&gt;</description>
      <pubDate>Wed, 22 May 2024 10:17:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-autoloader-file-notification-not-working-as-expected/m-p/70236#M34034</guid>
      <dc:creator>Sambit_S</dc:creator>
      <dc:date>2024-05-22T10:17:10Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Autoloader File Notification Not Working As Expected</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-autoloader-file-notification-not-working-as-expected/m-p/70265#M34041</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/90818"&gt;@Sambit_S&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;I misread inputRows as inputFiles, which aren't the same thing. Considering the limitation on the Azure queue, if you are already at that limit then you may need to consider switching to an event source such as Kafka or Event Hubs to get better ingestion performance.&lt;/P&gt;</description>
      <pubDate>Wed, 22 May 2024 12:23:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-autoloader-file-notification-not-working-as-expected/m-p/70265#M34041</guid>
      <dc:creator>matthew_m</dc:creator>
      <dc:date>2024-05-22T12:23:11Z</dc:date>
    </item>
  </channel>
</rss>

