<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Delta Live Tables: control microbatch size in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-control-microbatch-size/m-p/82420#M36645</link>
    <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/115097"&gt;@skolukmar&lt;/a&gt;, yes, you can control the size of microbatches in Delta Live Tables on Databricks with the same rate-limit options that Spark Structured Streaming uses for a Delta source, set on the streaming read of the source table. &lt;CODE&gt;maxBytesPerTrigger&lt;/CODE&gt; limits how much data is processed per microbatch, and &lt;CODE&gt;maxFilesPerTrigger&lt;/CODE&gt; limits how many new files are considered in each trigger. For example, &lt;CODE&gt;.option("maxBytesPerTrigger", 104857600)&lt;/CODE&gt; sets a soft limit of about 100 MB per microbatch (a batch can exceed it when the smallest input unit is larger than the limit), while &lt;CODE&gt;.option("maxFilesPerTrigger", 100)&lt;/CODE&gt; restricts it to 100 files. If both are set, a microbatch stops growing when either limit is reached. These settings keep the amount of work per microbatch bounded, which helps when a large backlog would otherwise be processed in one batch. Is there anything specific you’re trying to achieve with these settings? Maybe I can help further!&lt;/P&gt;</description>
    <pubDate>Thu, 08 Aug 2024 15:37:09 GMT</pubDate>
    <dc:creator>Retired_mod</dc:creator>
    <dc:date>2024-08-08T15:37:09Z</dc:date>
    <item>
      <title>Delta Live Tables: control microbatch size</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-control-microbatch-size/m-p/82118#M36525</link>
      <description>&lt;P&gt;A Delta Live Tables pipeline reads a Delta table on Databricks. Is it possible to limit the size of a microbatch during data transformation?&lt;/P&gt;&lt;P&gt;I am thinking of the approach used in Spark Structured Streaming, which controls the batch size with:&lt;/P&gt;&lt;PRE&gt;.option("maxBytesPerTrigger", 104857600)
.option("maxFilesPerTrigger", 100)&lt;/PRE&gt;&lt;P&gt;Is any similar option applicable here?&lt;/P&gt;</description>
      <pubDate>Wed, 07 Aug 2024 05:53:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-control-microbatch-size/m-p/82118#M36525</guid>
      <dc:creator>skolukmar</dc:creator>
      <dc:date>2024-08-07T05:53:51Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Tables: control microbatch size</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-control-microbatch-size/m-p/82420#M36645</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/115097"&gt;@skolukmar&lt;/a&gt;, yes, you can control the size of microbatches in Delta Live Tables on Databricks with the same rate-limit options that Spark Structured Streaming uses for a Delta source, set on the streaming read of the source table. &lt;CODE&gt;maxBytesPerTrigger&lt;/CODE&gt; limits how much data is processed per microbatch, and &lt;CODE&gt;maxFilesPerTrigger&lt;/CODE&gt; limits how many new files are considered in each trigger. For example, &lt;CODE&gt;.option("maxBytesPerTrigger", 104857600)&lt;/CODE&gt; sets a soft limit of about 100 MB per microbatch (a batch can exceed it when the smallest input unit is larger than the limit), while &lt;CODE&gt;.option("maxFilesPerTrigger", 100)&lt;/CODE&gt; restricts it to 100 files. If both are set, a microbatch stops growing when either limit is reached. These settings keep the amount of work per microbatch bounded, which helps when a large backlog would otherwise be processed in one batch. Is there anything specific you’re trying to achieve with these settings? Maybe I can help further!&lt;/P&gt;</description>
      <pubDate>Thu, 08 Aug 2024 15:37:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-control-microbatch-size/m-p/82420#M36645</guid>
      <dc:creator>Retired_mod</dc:creator>
      <dc:date>2024-08-08T15:37:09Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Tables: control microbatch size</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-control-microbatch-size/m-p/82431#M36648</link>
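      <description>&lt;P&gt;One other thought: if you are considering the pandas_udf API, there is a way to control the batch size there as well. See the &lt;A href="https://docs.databricks.com/en/udf/pandas.html#usage" target="_self"&gt;pandas_udf guide&lt;/A&gt;, and note the comments there about the Arrow batch size parameters (for example &lt;CODE&gt;spark.sql.execution.arrow.maxRecordsPerBatch&lt;/CODE&gt;). These control how many rows are passed to the UDF at a time, not the size of the streaming microbatch.&lt;/P&gt;</description>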
      <pubDate>Thu, 08 Aug 2024 17:07:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-control-microbatch-size/m-p/82431#M36648</guid>
      <dc:creator>lprevost</dc:creator>
      <dc:date>2024-08-08T17:07:41Z</dc:date>
    </item>
  </channel>
</rss>

