03-06-2022 05:09 PM
Hi, I'm running some scheduled VACUUM jobs and would like to know how many files were deleted without doing all the computation twice (once with DRY RUN and once without). Is there a way to accomplish this?
Thanks!
03-07-2022 02:44 AM
SELECT * FROM (DESCRIBE HISTORY table) AS h WHERE operation IN ('VACUUM START', 'VACUUM END');

That gives the required information: the operationMetrics column of the VACUUM END row reports numDeletedFiles and numVacuumedDirectories.
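If you want the count programmatically rather than eyeballing the history output, the operationMetrics map of the VACUUM END row can be parsed. A minimal sketch (the helper name is ours; it assumes the documented numDeletedFiles key is present in the map):

```scala
// Sketch: pull the number of deleted files out of a VACUUM END row's
// operationMetrics map (key name per Delta Lake's history documentation).
// Returns None when the key is absent, e.g. for other operation types.
def deletedFiles(operationMetrics: Map[String, String]): Option[Long] =
  operationMetrics.get("numDeletedFiles").map(_.toLong)
```

For example, `deletedFiles(Map("numDeletedFiles" -> "42"))` yields `Some(42L)`.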
03-06-2022 09:51 PM
Hi @Alejandro Martinez:
I don't think there is any command to get the statistics before and after VACUUM; at least I haven't come across one.
If you want to capture more details, you could write a function to collect the statistics yourself, as below.
Before:

Data files count:
val getDataFileCount = dbutils.fs.ls("<Your Table Path>").size

Data files size:
var getDataFileSize = 0L
dbutils.fs.ls("<Your Table Path>").foreach { file =>
  getDataFileSize += file.size
}

After: repeat the above and compare the two sets of numbers.
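The before/after comparison can be factored into a small helper. A sketch under assumptions: FileInfo here is a hypothetical stand-in for the file entries returned by dbutils.fs.ls, and vacuumStats is our own name:

```scala
// Hypothetical sketch: diff two file listings (taken before and after
// VACUUM) to get how many files were removed and how many bytes were freed.
case class FileInfo(path: String, size: Long)

def vacuumStats(before: Seq[FileInfo], after: Seq[FileInfo]): (Int, Long) = {
  val remaining = after.map(_.path).toSet            // paths that survived VACUUM
  val removed = before.filterNot(f => remaining.contains(f.path))
  (removed.size, removed.map(_.size).sum)            // (files deleted, bytes freed)
}
```

Comparing by path rather than by raw counts also catches the case where files were added between the two listings.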
Let's see if other community members have better ideas on this.
03-07-2022 06:13 AM
Thank you! Not the solution I was looking for, but it seems nothing better exists yet, so I'm going with that.
Thanks!!!
03-07-2022 06:22 AM
Note that you have to enable logging so that VACUUM operations are recorded in the table history:
spark.conf.set("spark.databricks.delta.vacuum.logging.enabled","true")