
Best practice for optimizeWrite and OPTIMIZE

User16783853501
New Contributor II

What is the best practice for a Delta pipeline with very high throughput to avoid the small-files problem and to reduce how often an external OPTIMIZE needs to run?

3 REPLIES

User16826994223
Honored Contributor III

A better approach I can think of is:

Enable optimized writes (it will automatically create files of roughly 128 MB).

Enable auto compaction.

delta.autoOptimize.optimizeWrite = true
delta.autoOptimize.autoCompact = true
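A minimal sketch of setting these, either per table or as session defaults for newly created tables (the table name events is a placeholder):

-- Per table:
ALTER TABLE events SET TBLPROPERTIES (
  'delta.autoOptimize.optimizeWrite' = 'true',
  'delta.autoOptimize.autoCompact' = 'true'
);

-- Or as session defaults applied to tables created in this session:
SET spark.databricks.delta.properties.defaults.autoOptimize.optimizeWrite = true;
SET spark.databricks.delta.properties.defaults.autoOptimize.autoCompact = true;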

Complete guide:

https://docs.microsoft.com/en-us/azure/databricks/delta/optimizations/auto-optimize


sajith_appukutt
Honored Contributor II

As Kunal mentioned, delta.autoOptimize.optimizeWrite aims to create 128 MB files. If you have very high write throughput and need low-latency inserts, consider disabling auto compaction by setting delta.autoOptimize.autoCompact = false.

This pattern is convenient if the table is partitioned by day and the pipeline is append-heavy: you can run a manual OPTIMIZE with a filter condition that excludes the current day, which reduces write conflicts. See the sketch below.
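A minimal sketch of such a scheduled OPTIMIZE, assuming a table events partitioned by a date column event_date (both names are placeholders):

-- Compact all partitions except today's, which is still receiving appends
OPTIMIZE events
WHERE event_date < current_date();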

brickster_2018
Esteemed Contributor

The general practice is to enable only optimized writes and disable auto compaction. Optimized writes already introduce an extra shuffle step, which increases the latency of the write operation, and auto compaction would add further latency on top of that, specifically in the commit operation. So the common pattern is to run an OPTIMIZE command on a daily schedule instead.
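A minimal sketch of that configuration, again using the placeholder table name events:

-- Enable only optimized writes; leave auto compaction off
ALTER TABLE events SET TBLPROPERTIES (
  'delta.autoOptimize.optimizeWrite' = 'true',
  'delta.autoOptimize.autoCompact' = 'false'
);

-- Run separately as a daily scheduled job
OPTIMIZE events;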
