Databricks Community

skolukmar · 08-06-2024

A Delta Live Tables pipeline reads a Delta table on Databricks. Is it possible to limit the size of each microbatch during data transformation?

I am thinking of something like what Spark Structured Streaming offers, where batch size can be controlled using:

.option("maxBytesPerTrigger", 104857600)
.option("maxFilesPerTrigger", 100)

Is there a similar option that applies to Delta Live Tables?
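For reference, here is a minimal sketch of how these two rate-limit options attach to a Structured Streaming read of a Delta table. The helper name `with_rate_limits` and the table path are hypothetical, and nothing in this thread confirms whether a Delta Live Tables pipeline honors the options; that is the open question here.

```python
def with_rate_limits(reader, max_bytes_per_trigger=None, max_files_per_trigger=None):
    """Apply streaming-source rate limits to a reader object.

    `reader` is anything with a chainable .option(key, value) method,
    such as spark.readStream. Hypothetical helper, not part of any library.
    """
    if max_bytes_per_trigger is not None:
        reader = reader.option("maxBytesPerTrigger", max_bytes_per_trigger)
    if max_files_per_trigger is not None:
        reader = reader.option("maxFilesPerTrigger", max_files_per_trigger)
    return reader


# 100 MiB expressed in bytes, the same value as in the question.
HUNDRED_MIB = 100 * 1024 * 1024  # 104857600

# Assumed usage in a notebook where `spark` already exists
# (the path below is a placeholder):
#   reader = with_rate_limits(spark.readStream.format("delta"), HUNDRED_MIB, 100)
#   df = reader.load("/path/to/delta/table")
```

Passing `None` for either limit leaves that option unset, so the two limits can be tried one at a time.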

lprevost · 08-08-2024

One other thought: if you are considering the pandas_udf API, there is a way to control batch size there. See the pandas_udf guide, and note its comments about the Arrow batch size parameters.
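The Arrow batch size mentioned here is governed by the Spark SQL setting `spark.sql.execution.arrow.maxRecordsPerBatch`, which caps the number of rows in each Arrow record batch handed to a pandas UDF. It bounds the chunks a UDF sees and does not change how much data a streaming trigger pulls in. A minimal sketch follows; the helper name is hypothetical and a live `spark` session is assumed.

```python
ARROW_BATCH_CONF = "spark.sql.execution.arrow.maxRecordsPerBatch"


def set_arrow_batch_size(spark, max_records):
    """Set the maximum rows per Arrow batch on a Spark session.

    `spark` is assumed to expose spark.conf.set(key, value), as a
    SparkSession does. Hypothetical helper, not part of any library.
    """
    # Store the value as a string of the integer row count.
    spark.conf.set(ARROW_BATCH_CONF, str(int(max_records)))
    return spark


# Assumed usage in a notebook where `spark` already exists:
#   set_arrow_batch_size(spark, 5000)
```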


Delta Live Tables: control microbatch size
