
What exactly does VACUUM do? Does it clean up the old versions of a Delta table?

BorislavBlagoev
Valued Contributor III
 
Accepted Solutions

Kaniz
Community Manager

Hi @Borislav Blagoev,

Vacuum cleans up files associated with a table.

Note: This command works differently depending on whether you're working on a Delta or Apache Spark table.

Vacuum a Delta table (Delta Lake on Databricks)

VACUUM recursively cleans the directories associated with the Delta table, removing data files that are no longer in the latest state of the table's transaction log and are older than a retention threshold. Files are deleted based on when they were logically removed from Delta's transaction log plus the retention hours, not on their modification timestamps in the storage system. The default threshold is 7 days.
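To make the retention rule concrete, here is a minimal Python sketch of the selection logic described above. It is a simplified model and not Delta Lake's actual implementation: the `DataFile` class, its fields, and `eligible_for_vacuum` are hypothetical names invented for illustration. The only facts taken from the answer are the 7-day default and the rule that the logical removal time, not the storage modification time, decides eligibility.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Default threshold stated in the answer above: 7 days (168 hours).
DEFAULT_RETENTION = timedelta(days=7)


@dataclass
class DataFile:
    """Hypothetical stand-in for a data file tracked by a table's log."""
    path: str
    # When the transaction log recorded the file as removed.
    # None means the file is still part of the latest table state.
    removed_at: Optional[datetime]
    # Modification time on the storage system; intentionally unused below.
    modified_at: datetime


def eligible_for_vacuum(
    files: List[DataFile],
    now: datetime,
    retention: timedelta = DEFAULT_RETENTION,
) -> List[str]:
    """Return paths that were logically removed before the retention cutoff."""
    cutoff = now - retention
    return [
        f.path
        for f in files
        if f.removed_at is not None and f.removed_at < cutoff
    ]


now = datetime(2022, 1, 13, 12, 0)
files = [
    # Still referenced by the latest table state: never eligible.
    DataFile("part-0000.parquet", None, now - timedelta(days=30)),
    # Logically removed 10 days ago: past the 7-day threshold.
    DataFile("part-0001.parquet", now - timedelta(days=10), now - timedelta(days=40)),
    # Logically removed 2 days ago: kept, even though its modification time is old.
    DataFile("part-0002.parquet", now - timedelta(days=2), now - timedelta(days=40)),
]

print(eligible_for_vacuum(files, now))  # ['part-0001.parquet']
```

On a real Delta table the equivalent operation is the SQL command itself, for example `VACUUM my_table RETAIN 168 HOURS`, where `my_table` is a placeholder name; appending `DRY RUN` lists the files that would be deleted without removing them.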

Vacuum a Spark table (Apache Spark)

VACUUM recursively cleans the directories associated with the Spark table and removes uncommitted files older than a retention threshold. The default threshold is 7 days.

On Spark tables, Databricks automatically triggers VACUUM operations as data is written.

Source



BorislavBlagoev
Valued Contributor III

Thanks!

Hi @Borislav Blagoev, if that answers your question, would you like to mark it as the best answer?

I can't see where the button for that is.

Hi @Borislav Blagoev, I've shared a screenshot. [Screenshot 2022-01-13 at 6.50.20 PM]
