Show Vacuum operation result (files deleted) without DRY RUN

alejandrofm
Valued Contributor

Hi, I'm running some scheduled VACUUM jobs and would like to know how many files were deleted, without doing all the computation twice (once with DRY RUN and once without). Is there a way to accomplish this?

Thanks!

Accepted Solution

Hubert-Dudek
Esteemed Contributor III
SELECT * FROM (DESCRIBE HISTORY table)x WHERE operation IN ('VACUUM END', 'VACUUM START');

The VACUUM START and VACUUM END rows it returns contain the required information.
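To turn those history rows into a single number, the deleted-file count can be read from the operationMetrics map of each VACUUM END row. This is a minimal pure-Scala sketch so it can run outside a notebook: HistoryRow is a hypothetical stand-in for one DESCRIBE HISTORY row, the example rows are made up, and the metric key numDeletedFiles is assumed from the Delta Lake operation-metrics documentation (VACUUM START rows are documented with numFilesToDelete and sizeOfDataToDelete). Check which keys your runtime actually writes.

```scala
// Hypothetical stand-in for one row of DESCRIBE HISTORY output;
// only the version, operation and operationMetrics columns are modelled.
final case class HistoryRow(version: Long, operation: String, operationMetrics: Map[String, String])

object VacuumMetrics {
  // Assumed metric key on VACUUM END rows (per the Delta Lake docs; verify on your runtime).
  val DeletedFilesKey = "numDeletedFiles"

  /** (history version, files deleted) for every VACUUM END row that reports the metric. */
  def deletedFilesPerRun(history: Seq[HistoryRow]): Seq[(Long, Long)] =
    history.collect {
      case HistoryRow(version, "VACUUM END", metrics) if metrics.contains(DeletedFilesKey) =>
        version -> metrics(DeletedFilesKey).toLong
    }

  /** Total files deleted across all vacuums in the given history. */
  def totalDeletedFiles(history: Seq[HistoryRow]): Long =
    deletedFilesPerRun(history).map(_._2).sum

  def main(args: Array[String]): Unit = {
    // Made-up example rows, newest first as DESCRIBE HISTORY returns them.
    val history = Seq(
      HistoryRow(12, "VACUUM END", Map("numDeletedFiles" -> "40", "numVacuumedDirectories" -> "3")),
      HistoryRow(11, "VACUUM START", Map("numFilesToDelete" -> "40", "sizeOfDataToDelete" -> "1048576")),
      HistoryRow(10, "WRITE", Map("numFiles" -> "5"))
    )
    println(totalDeletedFiles(history)) // 40
  }
}
```

In a notebook, the rows could be built from spark.sql("DESCRIBE HISTORY my_table").select("version", "operation", "operationMetrics").collect(), mapping each Row with getLong(0), getString(1) and getMap[String, String](2).toMap (guarding against a null operationMetrics). Here my_table is a placeholder.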


RKNutalapati
Valued Contributor

Hi @Alejandro Martinez:

I don't think there is any command that reports the statistics before and after a vacuum; at least, I haven't come across one.

If you want to capture more details, you could write a function that captures the statistics yourself, as below.

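Before the vacuum:

// Root directory of the Delta table (placeholder)
val tablePath = "<Your Table Path>"
// dbutils.fs.ls is not recursive: files inside partition subdirectories are not included,
// and directory entries such as _delta_log/ are filtered out here
val dataFiles = dbutils.fs.ls(tablePath).filterNot(_.isDir)

// Data files count
val dataFileCount = dataFiles.size

// Data files size (bytes)
val dataFileSize = dataFiles.map(_.size).sum

After the vacuum:

Repeat the above and compare the two results.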
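The before/after comparison can be sketched in plain Scala. ListedFile is a hypothetical stand-in for the entries dbutils.fs.ls returns, so the logic runs outside Databricks, and the listings in main are made up. The difference only equals the number of vacuumed files if nothing writes to the table between the two snapshots, and it only sees what the listing contains (dbutils.fs.ls does not descend into partition subdirectories).

```scala
// Hypothetical stand-in for a dbutils.fs.ls entry (path, size in bytes, directory flag).
final case class ListedFile(path: String, size: Long, isDir: Boolean)

// File count and total size of one snapshot.
final case class FileStats(count: Long, totalBytes: Long)

object VacuumDiff {
  /** Count and total size of the plain files in one directory listing. */
  def stats(listing: Seq[ListedFile]): FileStats = {
    val files = listing.filterNot(_.isDir)
    FileStats(files.size.toLong, files.map(_.size).sum)
  }

  /** Files and bytes that disappeared between the "before" and "after" snapshots. */
  def removed(before: FileStats, after: FileStats): FileStats =
    FileStats(before.count - after.count, before.totalBytes - after.totalBytes)

  def main(args: Array[String]): Unit = {
    // Made-up listings taken before and after a vacuum.
    val before = Seq(
      ListedFile("part-0000.parquet", 100L, isDir = false),
      ListedFile("part-0001.parquet", 250L, isDir = false),
      ListedFile("_delta_log/", 0L, isDir = true)
    )
    val after = before.filterNot(_.path == "part-0001.parquet")
    println(removed(stats(before), stats(after))) // FileStats(1,250)
  }
}
```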

Let's see if other community members have better ideas.


alejandrofm
Valued Contributor

Thank you! It's not quite the solution I was looking for, but it seems nothing better exists yet, so I'm going with that.

Thanks!!!

RKNutalapati
Valued Contributor

Vacuum logging has to be enabled for the VACUUM operations to be recorded in the table history:

spark.conf.set("spark.databricks.delta.vacuum.logging.enabled","true")
