cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

What exactly does the Vacuum? Does cleaning the old versions of the Delta table?

BorislavBlagoev
Valued Contributor III
 
1 ACCEPTED SOLUTION

Accepted Solutions

Kaniz
Community Manager
Community Manager

Hi @Borislav Blagoev​ ,

Vacuum cleans up files associated with a table.

Note:-

This command works differently depending on whether you’re working on a Delta or Apache Spark table.

Vacuum a Delta table (Delta Lake on Databricks)

Recursively vacuum directories associated with the Delta table and remove data files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold. Files are deleted according to the time they have been logically removed from Delta’s transaction log + retention hours, not their modification timestamps on the storage system. The default threshold is 7 days.

Vacuum a Spark table (Apache Spark)

Recursively vacuums directories associated with the Spark table and remove uncommitted files older than a retention threshold. The default threshold is 7 days.

On Spark tables, Databricks automatically triggers 

VACUUM operations as data are written.

Source

View solution in original post

5 REPLIES 5

Kaniz
Community Manager
Community Manager

Hi @Borislav Blagoev​ ,

Vacuum cleans up files associated with a table.

Note:-

This command works differently depending on whether you’re working on a Delta or Apache Spark table.

Vacuum a Delta table (Delta Lake on Databricks)

Recursively vacuum directories associated with the Delta table and remove data files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold. Files are deleted according to the time they have been logically removed from Delta’s transaction log + retention hours, not their modification timestamps on the storage system. The default threshold is 7 days.

Vacuum a Spark table (Apache Spark)

Recursively vacuums directories associated with the Spark table and remove uncommitted files older than a retention threshold. The default threshold is 7 days.

On Spark tables, Databricks automatically triggers 

VACUUM operations as data are written.

Source

BorislavBlagoev
Valued Contributor III

Thanks!

Hi @Borislav Blagoev​ , If that answers your question would you like to mark it as the best answer?

I can't see where is the button for that

Hi @Borislav Blagoev​ , Shared the SS. Screenshot 2022-01-13 at 6.50.20 PM