
Optimize operation with big increase in numRemovedFiles/numRemovedBytes/numAddedFiles/numAddedBytes

khh2023
New Contributor

Hello,

I have a daily loading process for a Delta table that ends with an 'optimize table' step. The optimize operation used to take about 5 minutes, but now takes about 3.5 hours. In the output of 'describe history', I noticed that operationMetrics shows an increase in numRemovedFiles/numRemovedBytes/numAddedFiles/numAddedBytes (highlighted in red below).

[image.png]

I checked the source files, which are of similar size as before. Where should I look to find out what caused the increase in numRemovedFiles/numRemovedBytes/numAddedFiles/numAddedBytes?

Thank you.

1 REPLY

mathan_pillai
Valued Contributor

This is most likely because more files became eligible for compaction (optimize). By default there is a threshold of about 50 files per partition: a partition with fewer files does not qualify, and only partitions with 50 or more files have their files compacted. There may have been a recent surge in the number of new files written to each partition, pushing most partitions past 50 files, so the optimize operation now has many more eligible files to process. This can happen when the number of files or the overall data volume increases, even though individual file sizes stay the same. To verify, check how many files were added recently.
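One way to run that check is to total the file counts recorded in the table history per day. The sketch below is only an illustration: `summarize_history` is a hypothetical helper, and it assumes the daily load appears in the history as an operation whose operationMetrics include `numFiles` (as a plain WRITE does), while OPTIMIZE entries report `numRemovedFiles`. Other operation types, such as MERGE, report files under different metric names and would need their own handling.

```python
from collections import defaultdict


def summarize_history(rows):
    """Total files written and files compacted per day.

    Each row is a dict with 'timestamp', 'operation' and 'operationMetrics'.
    operationMetrics is a string-to-string map, so counts are parsed with int().
    """
    summary = defaultdict(lambda: {"files_written": 0, "files_compacted": 0})
    for row in rows:
        # str() works for both datetime objects and 'YYYY-MM-DD ...' strings.
        day = str(row["timestamp"])[:10]
        metrics = row.get("operationMetrics") or {}
        if row["operation"] == "OPTIMIZE":
            # Files that OPTIMIZE rewrote into larger files.
            summary[day]["files_compacted"] += int(metrics.get("numRemovedFiles", 0))
        else:
            # Assumption: the load reports its new files as 'numFiles'.
            summary[day]["files_written"] += int(metrics.get("numFiles", 0))
    return dict(sorted(summary.items()))


# On a cluster, the rows could come from the history itself
# ('my_table' is a placeholder name):
#   rows = [r.asDict() for r in spark.sql("DESCRIBE HISTORY my_table").collect()]
#   for day, counts in summarize_history(rows).items():
#       print(day, counts)
```

If files_written jumps on the days when the optimize step became slow, the load itself started producing more files, which points to the explanation above.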
