Optimize operation with big increase in numRemovedFiles/numRemovedBytes/numAddedFiles/numAddedBytes

khh2023
New Contributor

Hello,

I have a daily loading process for a Delta table that ends with an 'optimize table' step. The optimize operation used to take about 5 minutes, but it now takes about 3.5 hours. One thing I noticed in 'describe history' is that operationMetrics shows an increase in numRemovedFiles/numRemovedBytes/numAddedFiles/numAddedBytes (highlighted in red in my screenshot).

I checked the source files, and they are about the same size as before. Where should I look to find out what caused the increase in numRemovedFiles/numRemovedBytes/numAddedFiles/numAddedBytes?

Thank you.
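One way to see where the extra work comes from is to line up the OPTIMIZE entries from 'describe history' and compare their metrics run by run. The sketch below is a minimal illustration. It assumes the history rows have already been collected as plain dicts, for example with something like `[r.asDict() for r in spark.sql("DESCRIBE HISTORY my_table").collect()]` in PySpark, where `my_table` is a hypothetical table name. The helpers `optimize_metrics` and `growth` are hypothetical names, not part of any Databricks API.

```python
from typing import Any

# The four operationMetrics keys discussed in the question.
METRICS = ("numRemovedFiles", "numRemovedBytes", "numAddedFiles", "numAddedBytes")


def optimize_metrics(history: list[dict[str, Any]]) -> list[dict[str, int]]:
    """Pull the OPTIMIZE metrics out of DESCRIBE HISTORY rows, oldest version first.

    Each row is assumed to be a dict with 'version', 'operation' and
    'operationMetrics' keys. operationMetrics values are stored as strings,
    so they are converted to int here; a missing metric is treated as 0.
    """
    optimize_rows = sorted(
        (row for row in history if row["operation"] == "OPTIMIZE"),
        key=lambda row: row["version"],
    )
    result = []
    for row in optimize_rows:
        metrics = row.get("operationMetrics") or {}
        entry = {"version": int(row["version"])}
        entry.update({key: int(metrics.get(key, 0)) for key in METRICS})
        result.append(entry)
    return result


def growth(runs: list[dict[str, int]], key: str) -> list[float]:
    """Ratio of `key` in each OPTIMIZE run to the previous run (inf if the previous value was 0)."""
    return [
        cur[key] / prev[key] if prev[key] else float("inf")
        for prev, cur in zip(runs, runs[1:])
    ]
```

A sharp jump in `growth(runs, "numRemovedFiles")` that lines up with the slowdown would suggest OPTIMIZE is now rewriting many more files per run, which is consistent with the explanation in the reply.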

mathan_pillai
Valued Contributor

This is most likely because more files became eligible for compaction by OPTIMIZE. By default there is a threshold of roughly 50 files per partition: a partition with fewer files than that does not qualify for OPTIMIZE, and only partitions with 50 or more files have their files compacted. If the number of new files written to each partition has surged recently, most partitions may now have crossed that threshold, so far more files are eligible for OPTIMIZE than before. The same thing can happen if the number of files or the overall data volume has grown recently, even though individual file sizes have stayed the same. To verify this, check how many files have been added to each partition recently.
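The eligibility rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the 50-file threshold is the figure quoted in this reply (treat it as approximate and confirm it for your runtime), the table is assumed to use Hive-style `column=value` partition directories, and all helper names are hypothetical. The list of data-file paths might come from PySpark's `DataFrame.inputFiles()` on the Delta table, which returns a best-effort snapshot of the files behind a DataFrame; reading an older version with `.option("versionAsOf", n)` would give a "before" list to compare against.

```python
from collections import Counter
from posixpath import dirname

# Threshold quoted in the reply ("50 files or so"). Treat it as an
# assumption to confirm for your Databricks Runtime, not a fixed constant.
MIN_FILES_PER_PARTITION = 50


def partition_of(path: str) -> str:
    """Hive-style partition of a data file, e.g. 'date=2023-05-01' ('' if unpartitioned)."""
    return "/".join(part for part in dirname(path).split("/") if "=" in part)


def files_per_partition(paths: list[str]) -> Counter:
    """Number of data files in each partition."""
    return Counter(partition_of(path) for path in paths)


def eligible_partitions(paths: list[str], min_files: int = MIN_FILES_PER_PARTITION) -> list[str]:
    """Partitions with at least `min_files` files, i.e. those the reply says OPTIMIZE would compact."""
    counts = files_per_partition(paths)
    return sorted(part for part, n in counts.items() if n >= min_files)


def newly_added(before: list[str], after: list[str]) -> Counter:
    """Files present in `after` but not in `before`, counted per partition."""
    return files_per_partition(sorted(set(after) - set(before)))
```

Running `newly_added` on the file lists from the versions immediately before and after a daily load shows which partitions received a surge of new files; comparing around a load rather than around an OPTIMIZE avoids counting the compacted output as new files.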
