Delta Live Tables: control microbatch size

skolukmar
New Contributor

A Delta Live Tables pipeline reads a Delta table on Databricks. Is it possible to limit the size of each microbatch during data transformation?

I am thinking of something like the options Spark Structured Streaming provides for controlling batch size:

.option("maxBytesPerTrigger", 104857600)
.option("maxFilesPerTrigger", 100) 

Is a similar option available in Delta Live Tables?

Retired_mod
Esteemed Contributor III

Hi @skolukmar, yes, you can control the size of microbatches in Delta Live Tables on Databricks with the same options Spark Structured Streaming uses, set on the streaming read of the source Delta table. **`maxBytesPerTrigger`** limits how much data is processed in each microbatch, and **`maxFilesPerTrigger`** limits how many new files are considered in each trigger. For example, `.option("maxBytesPerTrigger", 104857600)` sets a limit of roughly 100 MB per microbatch, while `.option("maxFilesPerTrigger", 100)` caps it at 100 files. Note that `maxBytesPerTrigger` is a soft maximum: a batch can exceed it when the smallest input unit is larger than the limit. When both options are set, the microbatch stops at whichever limit is reached first. Together they keep the amount of work per microbatch bounded. Is there anything specific you're trying to achieve with these settings? Maybe I can help further!
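
To make the placement of these options concrete, here is a minimal sketch, assuming the pipeline reads the source Delta table as a stream. The helper `limited_stream`, the table name `events_limited`, and the source name `my_catalog.my_schema.events` are hypothetical and only for illustration; the two option names and values are the ones from the question.

```python
# Sketch only: the helper and table names here are hypothetical.

def limited_stream(spark, source_table,
                   max_bytes=100 * 1024 * 1024,  # 104857600 bytes, about 100 MB
                   max_files=100):
    """Return a rate-limited streaming read of a Delta table."""
    return (
        spark.readStream
        .option("maxBytesPerTrigger", max_bytes)  # soft cap on bytes per microbatch
        .option("maxFilesPerTrigger", max_files)  # cap on new files per microbatch
        .table(source_table)
    )

# Inside a DLT pipeline notebook, where `dlt` and `spark` are provided,
# the helper would be wired in roughly like this:
#
#   import dlt
#
#   @dlt.table(name="events_limited")
#   def events_limited():
#       return limited_stream(spark, "my_catalog.my_schema.events")
```

The options go on the reader, before `.table(...)`, because they are source options of the stream; they have no effect on a batch read of the same table.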

lprevost
Contributor III

One other thought: if you are considering the pandas_udf API, there is a way to control batch size there as well. See the pandas_udf guide, and note its comments about the Arrow batch size parameters.
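
As a rough illustration of that suggestion, the sketch below assumes the parameter meant is the Spark setting `spark.sql.execution.arrow.maxRecordsPerBatch`, which caps how many rows go into each Arrow batch passed to a pandas UDF. The helper name `set_arrow_batch_rows` is hypothetical. This setting is separate from the streaming microbatch size: it only changes how the rows inside a batch are chunked for the UDF.

```python
# Sketch only: the helper name is hypothetical; the config key is a Spark SQL setting.
ARROW_BATCH_CONF = "spark.sql.execution.arrow.maxRecordsPerBatch"

def set_arrow_batch_rows(spark, rows):
    """Cap the number of rows in each Arrow batch handed to a pandas UDF."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    spark.conf.set(ARROW_BATCH_CONF, str(rows))
    return spark.conf.get(ARROW_BATCH_CONF)

# Example, given an active SparkSession named `spark`:
#   set_arrow_batch_rows(spark, 5000)
```

In a DLT pipeline it may be more practical to put the same key in the pipeline's Spark configuration rather than set it from code; that is an assumption about the setup, not something stated in the thread.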