Data Engineering

Forum Posts

pantelis_mare
by Contributor III
  • 2606 Views
  • 6 replies
  • 1 kudos

Delta merge file size control

Hello community! I have a rather weird issue where a Delta merge is writing very big files (~1 GB) that slow down my pipeline. Here is some context: I have a dataframe containing updates for several dates in the past. Current and last day contain the vast...

Latest Reply
pantelis_mare
Contributor III
  • 1 kudos

Hello Jose, I just went with splitting the merge in two: one merge that touches many partitions but few rows per file, and a second that touches 2-3 partitions but contains the bulk of the data.

5 More Replies
William_Scardua
by Valued Contributor
  • 1570 Views
  • 5 replies
  • 4 kudos

Resolved! Small/big file problem, how do you fix it?

How do you go about fixing the small/big file problem? What do you suggest?

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

What Jose said. If you cannot use Delta or do not want to: the use of coalesce and repartition/partitioning is the way to define the file size. There is no one ideal file size. It all depends on the use case, available cluster size, data flow downstrea...
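
As a rough illustration of the coalesce/repartition advice in this reply, here is a minimal sketch of picking a partition count from a target file size. The helper name `target_partition_count`, the 128 MiB default, and the dataframe `df` in the comments are assumptions made for the example, not something the posters specified. On-disk sizes will also differ from in-memory estimates because of compression and encoding, so treat the result as a starting point.

```python
import math

# Hypothetical helper: estimate how many output files (partitions) are needed
# so that each file is roughly `target_file_bytes` in size.
def target_partition_count(total_bytes: int,
                           target_file_bytes: int = 128 * 1024 * 1024) -> int:
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    if target_file_bytes <= 0:
        raise ValueError("target_file_bytes must be positive")
    # Always write at least one file, even for an empty or tiny dataset.
    return max(1, math.ceil(total_bytes / target_file_bytes))


# About 1 GiB of data at a 128 MiB target gives 8 files.
n = target_partition_count(1024 ** 3)
print(n)  # → 8

# With a Spark dataframe `df` (assumed, not defined here), the count would be
# applied before the write, for example:
#   df.repartition(n).write.parquet(path)   # full shuffle, evenly sized files
#   df.coalesce(n).write.parquet(path)      # no shuffle, can only reduce partitions
```

`repartition` costs a shuffle but balances the output, while `coalesce` is cheaper and only merges existing partitions, so which one fits depends on the use case, as the reply says.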

4 More Replies