
Delta Live Tables: control microbatch size

skolukmar
New Contributor

A Delta Live Tables pipeline reads a Delta table on Databricks. Is it possible to limit the size of each microbatch during the data transformation?

I am thinking of the approach used by Spark Structured Streaming, which enables control of the batch size using:

.option("maxBytesPerTrigger", 104857600)
.option("maxFilesPerTrigger", 100) 

Is any similar option applicable?

2 REPLIES

Retired_mod
Esteemed Contributor III

Hi @skolukmar, Yes, you can control the size of microbatches in Delta Live Tables on Databricks with the same options Spark Structured Streaming uses: you set them on the streaming read inside your table definition. Use **`maxBytesPerTrigger`** to cap the amount of data processed per microbatch at a maximum byte size, and **`maxFilesPerTrigger`** to cap the number of files considered in each trigger. For example, `.option("maxBytesPerTrigger", 104857600)` sets a 100 MB limit per microbatch, while `.option("maxFilesPerTrigger", 100)` restricts it to 100 files. These settings help manage workload and optimize pipeline performance. Is there anything specific you're trying to achieve with these settings? Maybe I can help further!
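
For example, a minimal sketch of how this could look in a DLT Python notebook (the table names `events_raw` and `events_cleaned` are placeholders; `spark` and `dlt` are supplied by the pipeline runtime):

import dlt
from pyspark.sql import functions as F

@dlt.table(
    name="events_cleaned",  # placeholder target table name
    comment="Streaming table with a bounded microbatch size",
)
def events_cleaned():
    return (
        spark.readStream
        .option("maxBytesPerTrigger", 104857600)  # ~100 MB of new data per microbatch
        .option("maxFilesPerTrigger", 100)        # at most 100 files per microbatch
        .table("events_raw")                      # placeholder upstream Delta table
        .withColumn("ingested_at", F.current_timestamp())
    )

Note that when both options are set, the microbatch stops growing as soon as either limit is reached.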

lprevost
Contributor II

One other thought -- if you are considering the pandas_udf API, there is a way to control batch size there too: see the pandas_udf guide, and note its comments about the Arrow batch size parameter (`spark.sql.execution.arrow.maxRecordsPerBatch`).
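
A minimal sketch of that knob (assuming a Databricks notebook where `spark` is predefined; the UDF and column are illustrative):

import pandas as pd
from pyspark.sql.functions import pandas_udf

# Cap each Arrow batch handed to the pandas UDF at 5,000 rows
# (Spark's default is 10,000).
spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", "5000")

@pandas_udf("double")
def times_two(v: pd.Series) -> pd.Series:
    # Each call receives one Arrow batch as a pandas Series.
    return v * 2.0

df = spark.range(100_000).withColumn("doubled", times_two("id"))

Keep in mind this bounds the rows per UDF invocation, not the size of a streaming microbatch, so it complements rather than replaces the trigger options above.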