
Delta Optimized Write vs Repartitioning: which is recommended?

saipujari_spark
Valued Contributor

When streaming to a Delta table, both repartitioning on the partition column and optimized write can help to avoid small files.

Which is recommended: Delta Optimized Write or repartitioning?

Thanks,
Saikrishna Pujari
Sr. Spark Technical Solutions Engineer, Databricks

saipujari_spark
Valued Contributor

Optimized write is recommended over repartitioning for the following reasons:

* The key part of Optimized Writes is that it is an adaptive shuffle. If you have a streaming ingest use case where input data rates change over time, the adaptive shuffle adjusts itself to the incoming data rate of each micro-batch. If your code calls coalesce(n) or repartition(n) just before writing out the stream, you can remove those calls.

* Databricks dynamically optimizes Spark partition sizes based on the actual data and attempts to write out 128 MB files for each table partition. This is an approximate size and can vary depending on dataset characteristics.

* Repartitioning on a partition column can produce Spark partitions of very different sizes when the data is skewed, because all rows for a given partition value land in the same task. The result is file sizes that are far from optimal.
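
To make the skew point concrete, here is a small pure-Python model. It is not PySpark and not Databricks' actual implementation, only a sketch under stated assumptions: a micro-batch is described as hypothetical (partition_value, size_mb) chunks, repartitioning on the partition column is modeled as one file per table partition, and optimized write is modeled as splitting each table partition into roughly 128 MB files.

```python
import math
from collections import defaultdict

TARGET_FILE_MB = 128  # approximate target size mentioned above


def _total_per_partition(rows):
    """Sum the (hypothetical) chunk sizes for each table partition value."""
    sizes = defaultdict(float)
    for partition_value, size_mb in rows:
        sizes[partition_value] += size_mb
    return sizes


def files_from_repartition_by_column(rows):
    """Simplified model of df.repartition("date") before a write.

    All rows with the same partition value hash to the same task,
    so each table partition ends up as a single file holding all its data.
    """
    return {value: [total] for value, total in _total_per_partition(rows).items()}


def files_from_size_targeting(rows, target_mb=TARGET_FILE_MB):
    """Simplified model of a write that splits each table partition into
    roughly target-sized files (an illustration, not the real algorithm)."""
    result = {}
    for value, total in _total_per_partition(rows).items():
        n_files = max(1, math.ceil(total / target_mb))
        result[value] = [total / n_files] * n_files
    return result


# Hypothetical skewed micro-batch: one hot date holds almost all of the data.
skewed_batch = [("2023-01-01", 100.0)] * 10 + [("2023-01-02", 5.0), ("2023-01-03", 3.0)]

print(files_from_repartition_by_column(skewed_batch))
# {'2023-01-01': [1000.0], '2023-01-02': [5.0], '2023-01-03': [3.0]}

print({v: len(f) for v, f in files_from_size_targeting(skewed_batch).items()})
# {'2023-01-01': 8, '2023-01-02': 1, '2023-01-03': 1}
```

In this model the hot partition becomes a single 1000 MB file under repartitioning, while size targeting writes eight files of 125 MB each. Partitions that simply have little data still produce small files in both cases.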

The bottom line is that optimized write is not fundamentally different from repartitioning. Simply put, optimized write is a repartition in which the number of partitions is chosen adaptively, on the fly, based on the data.
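
The "adaptive partition count" idea can also be sketched for varying input rates. This is another simplified pure-Python model, assuming a single table partition, evenly spread data, and a hypothetical hard-coded repartition(16). Real optimized writes decide from shuffle statistics, which this does not reproduce.

```python
import math

TARGET_FILE_MB = 128  # approximate target size mentioned above
FIXED_N = 16          # hypothetical value hard-coded as repartition(16)


def file_size_fixed(batch_mb, n=FIXED_N):
    """repartition(n): every micro-batch is split into n files,
    regardless of how much data arrived (assumes an even spread)."""
    return batch_mb / n


def adaptive_num_partitions(batch_mb, target_mb=TARGET_FILE_MB):
    """Pick the partition count from the data size so files land near the target."""
    return max(1, math.ceil(batch_mb / target_mb))


def file_size_adaptive(batch_mb, target_mb=TARGET_FILE_MB):
    return batch_mb / adaptive_num_partitions(batch_mb, target_mb)


# Hypothetical micro-batch sizes: quiet period, normal load, peak load.
for mb in [64.0, 512.0, 4096.0]:
    print(mb, file_size_fixed(mb), adaptive_num_partitions(mb), file_size_adaptive(mb))
# 64.0 4.0 1 64.0
# 512.0 32.0 4 128.0
# 4096.0 256.0 32 128.0
```

With a fixed count, file size tracks the input rate (4 MB files when quiet, 256 MB at peak), whereas choosing the count per batch keeps files close to the target whenever there is enough data to fill them.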

Reference: https://docs.databricks.com/delta/optimizations/auto-optimize.html#auto-compaction

Thanks,
Saikrishna Pujari
Sr. Spark Technical Solutions Engineer, Databricks