Data Engineering

How do I get the size of files cleaned up by a VACUUM for a Delta table?

User16826987838
Contributor
 
2 REPLIES

Ryan_Chynoweth
Esteemed Contributor

The output of the OPTIMIZE command includes the following metrics (a sketch of reading them programmatically follows the list):

  • number of files added
  • number of files removed
  • min, max, avg, total files, and total size of files added
  • min, max, avg, total files, and total size of files removed
  • number of partitions optimized
  • Z-order stats
  • number of batches
  • total files considered
  • total files skipped
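
If you want these numbers programmatically, OPTIMIZE returns a DataFrame whose metrics struct column carries the statistics above. A minimal sketch, assuming the standard Delta Lake output schema (my_table is a placeholder for your table name):

// Run OPTIMIZE and read selected metrics from its result DataFrame.
// `my_table` is a placeholder; the `metrics` column is a struct whose
// fields mirror the list above (numFilesAdded, filesAdded.totalSize, ...).
val result = spark.sql("OPTIMIZE my_table")
result.select(
  "metrics.numFilesAdded",
  "metrics.numFilesRemoved",
  "metrics.filesAdded.totalSize",
  "metrics.filesRemoved.totalSize"
).show()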

If that information does not provide the details you need, you would have to scan the file system before and after running the command and analyze the data yourself.
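
As a hedged sketch of that before/after approach, assuming a Databricks notebook where dbutils and spark are in scope (the table path below is a placeholder):

// Recursively total the size of every file under a directory.
// Directories in dbutils.fs.ls output have names ending in "/".
def dirSize(path: String): Long =
  dbutils.fs.ls(path).map { f =>
    if (f.name.endsWith("/")) dirSize(f.path) else f.size
  }.sum

val tablePath = "dbfs:/path/to/delta-table"  // placeholder path
val sizeBefore = dirSize(tablePath)
spark.sql(s"VACUUM delta.`$tablePath`")      // the command under discussion
val sizeAfter = dirSize(tablePath)
println(s"VACUUM reclaimed ${sizeBefore - sizeAfter} bytes")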

sajith_appukutt
Honored Contributor II
def getVacuumSize(table: String): Long = {
  // VACUUM ... DRY RUN lists the files that would be deleted without removing them
  val filesToDelete = spark.sql(s"VACUUM $table DRY RUN")
    .select("path").collect().map(_.getString(0)).toList
  // Sum the size of each file the vacuum would remove
  filesToDelete.map(path => dbutils.fs.ls(path).head.size).sum
}
 
getVacuumSize("<your-table-name>")

You could use this function to get the total size of the files a VACUUM would clean up. Note that it relies on DRY RUN, so run it before the actual VACUUM, while the files still exist.
