Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Optimize operation with big increase in numRemovedFiles/numRemovedBytes/numAddedFiles/numAddedBytes

khh2023
New Contributor

Hello,

I have a daily loading process for a Delta table, with an 'optimize table' step at the end. The optimize operation used to take about 5 minutes but now takes about 3.5 hours. One thing I noticed in the 'describe history' output is that operationMetrics shows an increase in numRemovedFiles/numRemovedBytes/numAddedFiles/numAddedBytes (highlighted in red below).

[image.png]

I checked the source files, which are similar in size to before. Where should I look to find out what caused the increase in numRemovedFiles/numRemovedBytes/numAddedFiles/numAddedBytes?

Thank you.

1 REPLY

mathan_pillai
Databricks Employee

This is most likely because more files became eligible for compaction (OPTIMIZE). By default there is a threshold of roughly 50 files per partition, below which the partition does not qualify for OPTIMIZE; only when a partition holds 50 or more files do its files become eligible. There may have been a recent surge in the number of new files written to each partition, pushing most partitions past 50 files, so the operation picked up many more eligible files than before. It could also be that the number of files or the overall data volume has increased recently, even though individual file sizes are unchanged. You can check the number of files that were recently added to verify this.
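
One way to run that check is to compare, for each OPTIMIZE commit in the table history, how many files the loads wrote since the previous OPTIMIZE. The sketch below is only an illustration under stated assumptions: the helper name `files_written_between_optimizes` is hypothetical, the sample rows are made up, and it assumes the daily load appears as commits whose operationMetrics carry a `numFiles` entry (as WRITE commits typically do; MERGE and other operations report different keys, so adjust accordingly).

```python
def files_written_between_optimizes(history_rows):
    """For each OPTIMIZE commit, report how many files were written by
    the commits since the previous OPTIMIZE.

    history_rows: iterable of dicts with 'version', 'operation' and
    'operationMetrics' (a dict of string values), shaped like the rows
    returned by DESCRIBE HISTORY.
    """
    pending_files = 0  # files written since the last OPTIMIZE
    report = []
    # History is usually returned newest first; process oldest first.
    for row in sorted(history_rows, key=lambda r: r["version"]):
        metrics = row.get("operationMetrics") or {}
        if row["operation"] == "OPTIMIZE":
            report.append({
                "version": row["version"],
                "files_written_since_last_optimize": pending_files,
                "numRemovedFiles": int(metrics.get("numRemovedFiles", 0)),
                "numAddedFiles": int(metrics.get("numAddedFiles", 0)),
            })
            pending_files = 0
        else:
            # Assumption: load commits expose 'numFiles'; others count as 0.
            pending_files += int(metrics.get("numFiles", 0))
    return report


# Made-up sample history, for illustration only.
sample_history = [
    {"version": 5, "operation": "OPTIMIZE",
     "operationMetrics": {"numRemovedFiles": "215", "numAddedFiles": "4"}},
    {"version": 4, "operation": "WRITE", "operationMetrics": {"numFiles": "95"}},
    {"version": 3, "operation": "WRITE", "operationMetrics": {"numFiles": "120"}},
    {"version": 2, "operation": "OPTIMIZE",
     "operationMetrics": {"numRemovedFiles": "8", "numAddedFiles": "1"}},
    {"version": 1, "operation": "WRITE", "operationMetrics": {"numFiles": "8"}},
]

report = files_written_between_optimizes(sample_history)
for entry in report:
    print(entry)
```

On a cluster, the input rows could be collected with something like `[r.asDict() for r in spark.sql("DESCRIBE HISTORY my_table").collect()]`, where `my_table` is a placeholder name. If the count of files written between OPTIMIZE runs jumped at the same time as numRemovedFiles, the load itself started producing more (smaller) files, and that is the step to investigate.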
