Data Engineering

What exactly does VACUUM do? Does it clean up the old versions of a Delta table?

BorislavBlagoev
Valued Contributor III
 
1 ACCEPTED SOLUTION


Kaniz
Community Manager

Hi @Borislav Blagoev,

Vacuum cleans up files associated with a table.

Note:

This command works differently depending on whether you're working on a Delta or Apache Spark table.

Vacuum a Delta table (Delta Lake on Databricks)

Recursively vacuum directories associated with the Delta table and remove data files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold. Files are deleted based on the time they were logically removed from Delta's transaction log plus the retention hours, not on their modification timestamps on the storage system. The default threshold is 7 days.
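To make the retention rule concrete, here is a minimal Python sketch of how a file becomes eligible for removal. It is only a simplified model of the rule described above, not Delta Lake's actual implementation: the `DataFile` class, the `vacuum_candidates` function, the file names, and the `events` table name are all hypothetical. The small `vacuum_sql` helper only builds a statement string in the `VACUUM table_name [RETAIN num HOURS] [DRY RUN]` form; it does not run anything.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

DEFAULT_RETENTION_HOURS = 7 * 24  # the 7-day default threshold


@dataclass
class DataFile:
    """Hypothetical record for one data file in the table directory."""
    path: str
    # Time the transaction log recorded the file as removed.
    # None means the file is still part of the latest table state.
    removed_at: Optional[datetime]


def vacuum_candidates(
    files: List[DataFile],
    now: datetime,
    retention_hours: int = DEFAULT_RETENTION_HOURS,
) -> List[str]:
    """Return paths that the simplified retention rule would delete.

    A file qualifies only if it was logically removed from the log AND
    that removal happened before (now - retention_hours). Storage-level
    modification timestamps play no part in this model.
    """
    cutoff = now - timedelta(hours=retention_hours)
    return [
        f.path
        for f in files
        if f.removed_at is not None and f.removed_at < cutoff
    ]


def vacuum_sql(
    table: str,
    retention_hours: Optional[int] = None,
    dry_run: bool = False,
) -> str:
    """Build a VACUUM statement string (illustrative helper, runs nothing)."""
    stmt = f"VACUUM {table}"
    if retention_hours is not None:
        stmt += f" RETAIN {retention_hours} HOURS"
    if dry_run:
        stmt += " DRY RUN"
    return stmt


# Example with made-up files and a fixed "current" time.
now = datetime(2022, 1, 13, 12, 0)
files = [
    DataFile("part-0001.parquet", None),                      # still live
    DataFile("part-0002.parquet", now - timedelta(days=10)),  # removed 10 days ago
    DataFile("part-0003.parquet", now - timedelta(days=2)),   # removed 2 days ago
]

candidates = vacuum_candidates(files, now)
statement = vacuum_sql("events", DEFAULT_RETENTION_HOURS, dry_run=True)
```

With the default threshold, only `part-0002.parquet` is a candidate: the live file is never touched, and the file removed two days ago is still inside the retention window. A `DRY RUN` statement like the one built here is a cautious first step, since it lists files rather than deleting them.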

Vacuum a Spark table (Apache Spark)

Recursively vacuum directories associated with the Spark table and remove uncommitted files older than a retention threshold. The default threshold is 7 days.

On Spark tables, Databricks automatically triggers VACUUM operations as data is written.

Source

5 REPLIES 5

BorislavBlagoev
Valued Contributor III

Thanks!

Hi @Borislav Blagoev, if that answers your question, would you like to mark it as the best answer?

I can't see where the button for that is.

Hi @Borislav Blagoev, I've shared a screenshot (Screenshot 2022-01-13 at 6.50.20 PM).