<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic maxFilesPerTrigger not working in bronze to silver layer in missing-QuestionPost</title>
    <link>https://community.databricks.com/t5/missing-questionpost/maxfilespertrigger-not-working-in-bronze-to-silver-layer/m-p/3124#M49</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am using the medallion architecture, where Auto Loader picks up files from AWS S3 and saves them to Delta Lake. The next layer picks up the changes from Delta Lake and does some processing. I am able to set the batch size in Auto Loader, and it works. But in the bronze-to-silver layer I am unable to set a batch limit; it picks up all files in one go. Here is my code for the bronze-to-silver layer:&lt;/P&gt;&lt;P&gt;(spark.readStream.format("delta")&lt;/P&gt;&lt;P&gt;.option("useNotification","true")&lt;/P&gt;&lt;P&gt;.option("includeExistingFiles","true")&lt;/P&gt;&lt;P&gt;.option("allowOverwrites",True)&lt;/P&gt;&lt;P&gt;.option("ignoreMissingFiles",True)&lt;/P&gt;&lt;P&gt;.option("maxFilesPerTrigger", 100)&lt;/P&gt;&lt;P&gt;.load(bronze_path)&lt;/P&gt;&lt;P&gt;.writeStream&lt;/P&gt;&lt;P&gt;.option("checkpointLocation", silver_checkpoint_path)&lt;/P&gt;&lt;P&gt;.trigger(processingTime="1 minute")&lt;/P&gt;&lt;P&gt;.foreachBatch(foreachBatchFunction)&lt;/P&gt;&lt;P&gt;.start()&lt;/P&gt;&lt;P&gt;)&lt;/P&gt;&lt;P&gt;I would appreciate any help.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Sanjay&lt;/P&gt;</description>
    <pubDate>Wed, 14 Jun 2023 09:40:04 GMT</pubDate>
    <dc:creator>sanjay</dc:creator>
    <dc:date>2023-06-14T09:40:04Z</dc:date>
    <item>
      <title>maxFilesPerTrigger not working in bronze to silver layer</title>
      <link>https://community.databricks.com/t5/missing-questionpost/maxfilespertrigger-not-working-in-bronze-to-silver-layer/m-p/3124#M49</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am using the medallion architecture, where Auto Loader picks up files from AWS S3 and saves them to Delta Lake. The next layer picks up the changes from Delta Lake and does some processing. I am able to set the batch size in Auto Loader, and it works. But in the bronze-to-silver layer I am unable to set a batch limit; it picks up all files in one go. Here is my code for the bronze-to-silver layer:&lt;/P&gt;&lt;P&gt;(spark.readStream.format("delta")&lt;/P&gt;&lt;P&gt;.option("useNotification","true")&lt;/P&gt;&lt;P&gt;.option("includeExistingFiles","true")&lt;/P&gt;&lt;P&gt;.option("allowOverwrites",True)&lt;/P&gt;&lt;P&gt;.option("ignoreMissingFiles",True)&lt;/P&gt;&lt;P&gt;.option("maxFilesPerTrigger", 100)&lt;/P&gt;&lt;P&gt;.load(bronze_path)&lt;/P&gt;&lt;P&gt;.writeStream&lt;/P&gt;&lt;P&gt;.option("checkpointLocation", silver_checkpoint_path)&lt;/P&gt;&lt;P&gt;.trigger(processingTime="1 minute")&lt;/P&gt;&lt;P&gt;.foreachBatch(foreachBatchFunction)&lt;/P&gt;&lt;P&gt;.start()&lt;/P&gt;&lt;P&gt;)&lt;/P&gt;&lt;P&gt;I would appreciate any help.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Sanjay&lt;/P&gt;</description>
      <pubDate>Wed, 14 Jun 2023 09:40:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/missing-questionpost/maxfilespertrigger-not-working-in-bronze-to-silver-layer/m-p/3124#M49</guid>
      <dc:creator>sanjay</dc:creator>
      <dc:date>2023-06-14T09:40:04Z</dc:date>
    </item>
    <item>
      <title>Re: maxFilesPerTrigger not working in bronze to silver layer</title>
      <link>https://community.databricks.com/t5/missing-questionpost/maxfilespertrigger-not-working-in-bronze-to-silver-layer/m-p/3125#M50</link>
      <description>&lt;P&gt;Hi @Sanjay Jain,&lt;/P&gt;&lt;P&gt;Great to meet you, and thanks for your question!&lt;/P&gt;&lt;P&gt;Let's see if your peers in the community have an answer to your question. Thanks.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Jun 2023 06:08:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/missing-questionpost/maxfilespertrigger-not-working-in-bronze-to-silver-layer/m-p/3125#M50</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-06-16T06:08:31Z</dc:date>
    </item>
    <item>
      <title>Re: maxFilesPerTrigger not working in bronze to silver layer</title>
      <link>https://community.databricks.com/t5/missing-questionpost/maxfilespertrigger-not-working-in-bronze-to-silver-layer/m-p/3126#M51</link>
      <description>&lt;P&gt;Hi @Sanjay Jain, could you try a fresh checkpoint location, if you have not already? Also, could you please check the logs to see the size of the micro-batch currently being processed?&lt;/P&gt;</description>
      <pubDate>Fri, 16 Jun 2023 11:30:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/missing-questionpost/maxfilespertrigger-not-working-in-bronze-to-silver-layer/m-p/3126#M51</guid>
      <dc:creator>Lakshay</dc:creator>
      <dc:date>2023-06-16T11:30:25Z</dc:date>
    </item>
    <item>
      <title>Re: maxFilesPerTrigger not working in bronze to silver layer</title>
      <link>https://community.databricks.com/t5/missing-questionpost/maxfilespertrigger-not-working-in-bronze-to-silver-layer/m-p/3127#M52</link>
      <description>&lt;P&gt;Hi Lakshay,&lt;/P&gt;&lt;P&gt;I tried a new checkpoint location, but it is still not working. It takes the whole data set in one go and does not respect the batch size specified in the code.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Sanjay&lt;/P&gt;</description>
      <pubDate>Fri, 16 Jun 2023 11:49:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/missing-questionpost/maxfilespertrigger-not-working-in-bronze-to-silver-layer/m-p/3127#M52</guid>
      <dc:creator>sanjay</dc:creator>
      <dc:date>2023-06-16T11:49:13Z</dc:date>
    </item>
  </channel>
</rss>

