How do I get the size of files cleaned up by a VACUUM for a Delta table?

User16826987838
Contributor
 

Ryan_Chynoweth
Honored Contributor III

The output of the OPTIMIZE command produces the following metrics:

  • number of files added
  • number of files removed
  • min, max, avg, total number, and total size of files added
  • min, max, avg, total number, and total size of files removed
  • number of partitions optimized
  • Z-order stats
  • number of batches
  • total files considered
  • total files skipped
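These metrics come back as the result of the OPTIMIZE call itself, so they can be inspected programmatically rather than read off the notebook output. A minimal sketch, assuming a Databricks runtime with Delta Lake (`my_table` is a placeholder, and the exact metric field names may vary by runtime version):

```scala
// OPTIMIZE returns a DataFrame with a nested `metrics` struct
// holding the counts and sizes listed above.
val result = spark.sql("OPTIMIZE my_table")
result.select("metrics.numFilesAdded", "metrics.numFilesRemoved").show()
```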

If that information does not provide the details you need, you would have to scan the file system before and after running the command and compare the results yourself.
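The before-and-after scan can be sketched in plain Scala for a driver-local table path; this is an illustration only, and for tables on cloud storage you would list files with `dbutils.fs.ls` instead of `java.io.File`:

```scala
import java.io.File

// Recursively sum the sizes (in bytes) of all files under a directory.
// Calling this on the table path before and after VACUUM, and subtracting,
// gives the number of bytes the VACUUM reclaimed.
def dirSize(f: File): Long =
  if (f.isFile) f.length
  else Option(f.listFiles).map(_.map(dirSize).sum).getOrElse(0L)
```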

sajith_appukutt
Honored Contributor II
def getVacuumSize(table: String): Long = {
  // VACUUM ... DRY RUN lists the files that would be deleted, without removing them
  val listFiles = spark.sql(s"VACUUM $table DRY RUN").select("path").collect().map(_(0)).toList
  var sum = 0L
  // Look up each listed file and add its size in bytes
  listFiles.foreach(x => sum += dbutils.fs.ls(x.toString)(0).size)
  sum
}

getVacuumSize("<your-table-name>")

You could use this function to get the total size of the files that a VACUUM would clean up.
