Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by pantelis_mare (Contributor III)
  • 5080 Views
  • 6 replies
  • 1 kudos

Delta merge file size control

Hello community! I have a rather weird issue where a Delta merge is writing very big files (~1 GB) that slow down my pipeline. Here is some context: I have a dataframe containing updates for several dates in the past. Current and last day contain the vast...

Latest Reply
pantelis_mare
Contributor III
  • 1 kudos

Hello Jose, I just went with splitting the merge in two: one merge that touches many partitions but only a few rows per file, and a second that touches 2-3 partitions but contains the bulk of the data.
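A minimal sketch of how the updates could be split into those two batches, assuming each update row carries a `date` field and that "recent" means the current day plus the day before. The helper `split_updates_by_recency` and the row layout are hypothetical, not from the thread; it only shows the separation step that happens before each batch is merged on its own.

```python
from datetime import date, timedelta
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def split_updates_by_recency(
    rows: Iterable[Row], as_of: date, recent_days: int = 2, date_key: str = "date"
) -> Tuple[List[Row], List[Row]]:
    """Hypothetical helper: split update rows into (historical, recent) batches.

    Rows dated within the last `recent_days` days up to `as_of` (inclusive)
    go to the recent batch, which holds the bulk of the data. Older rows go
    to the historical batch, which spans many partitions with few rows each.
    """
    # With recent_days=2, the cutoff is the day before `as_of`.
    cutoff = as_of - timedelta(days=recent_days - 1)
    historical: List[Row] = []
    recent: List[Row] = []
    for row in rows:
        (recent if row[date_key] >= cutoff else historical).append(row)
    return historical, recent
```

In PySpark, each batch would then become the source of its own `DeltaTable.merge(...)` call, so the large recent batch and the sparse historical batch are written by separate operations instead of one merge that mixes both shapes.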

by William_Scardua (Valued Contributor)
  • 3611 Views
  • 4 replies
  • 4 kudos

Resolved! Small/big file problem: how do you fix it?

How do you go about fixing the small/big file problem? What do you suggest?

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

What Jose said. If you cannot use Delta or do not want to: using coalesce and repartition/partitioning is the way to define the file size. There is no one ideal file size. It all depends on the use case, available cluster size, data flow downstrea...
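To turn a chosen file size into a partition count for `repartition` or `coalesce`, a rough rule is the estimated output size divided by the desired file size, rounded up. This is a sketch assuming you already have an estimate of the output size in bytes; the function name `files_for_target_size` and the 128 MB figure in the example are illustrative choices, not a recommendation from the thread (as the reply says, there is no one ideal size).

```python
def files_for_target_size(total_bytes: int, target_file_bytes: int, min_files: int = 1) -> int:
    """Hypothetical helper: number of output files giving roughly target-sized files.

    `total_bytes` is an estimate of the data to be written; the result is
    ceil(total_bytes / target_file_bytes), but never below `min_files`.
    """
    if total_bytes < 0 or target_file_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and target_file_bytes > 0")
    # Integer ceiling division avoids float rounding on large byte counts.
    n = -(-total_bytes // target_file_bytes)
    return max(min_files, n)


# Example: ~1 GiB of output with an assumed 128 MiB target size.
n_files = files_for_target_size(1024**3, 128 * 1024**2)
```

The result would be passed to `df.repartition(n)` (full shuffle; can raise or lower the partition count) or `df.coalesce(n)` (merges existing partitions without a full shuffle; can only lower the count) before the write.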
