Optimize and Vacuum Command

Ramakrishnan83
New Contributor III

Hi team,

I am running a weekly purge process from Databricks notebooks that cleans up a chunk of records from tables used for audit purposes. The tables are external tables. I need clarification on the items below:

1. Do I need to run the OPTIMIZE and VACUUM commands? Very minimal read queries are executed against the audit tables.

2. If I do need to run them, should I add the OPTIMIZE and VACUUM commands in the same notebook to shrink the storage layer?

3. What scenarios should I look for when deciding to run OPTIMIZE and VACUUM on tables involved in the purge process?

4. No action. Will Databricks and the Apache Spark framework take care of optimization internally?
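For context, the weekly purge step described above might look like the following sketch. The table name, timestamp column, and 90-day window are placeholder assumptions, not details from the post:

```python
from datetime import date, timedelta

def build_purge_sql(table, ts_column, keep_days=90):
    """Build a DELETE statement removing audit rows older than keep_days.

    The table/column names and the retention window are illustrative
    placeholders; adjust them to the actual audit schema.
    """
    cutoff = date.today() - timedelta(days=keep_days)
    return f"DELETE FROM {table} WHERE {ts_column} < '{cutoff.isoformat()}'"

# In a Databricks notebook this string would be executed with spark.sql(...):
print(build_purge_sql("audit_db.audit_log", "event_ts"))
```

On a Delta table, this DELETE only rewrites the affected data files and records the change in the transaction log; the old files remain on storage until a VACUUM removes them, which is what the question is getting at.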

2 REPLIES

Hkesharwani
New Contributor III

Hi Ramakrishnan83,
1. The VACUUM command only works with Delta tables; it deletes Parquet files older than the retention period, which is 7 days by default. OPTIMIZE, on the other hand, compacts small files, and if a ZORDER BY column is specified it also co-locates related data.
2. Ideally, per the Databricks recommendation, if there is continuous data writing then the OPTIMIZE command should be executed daily.

3. The two commands optimize in different ways:

  • OPTIMIZE collocates the data based on patterns in the dataset.

  • VACUUM deletes unreferenced Parquet files from the storage layer.
Please refer to these articles for more details:
https://docs.databricks.com/en/delta/optimize.html
https://docs.databricks.com/en/sql/language-manual/delta-optimize.html
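As a concrete sketch of running both commands in the same purge notebook, the statements could be generated and then executed with `spark.sql(...)`. The table name, ZORDER column, and retention window below are placeholders, not values from the thread:

```python
def build_maintenance_sql(table, zorder_col=None, retain_hours=168):
    """Build the OPTIMIZE and VACUUM statements for a Delta table.

    retain_hours defaults to 168 (7 days), matching Delta's default
    retention period. Table and column names are placeholders.
    """
    optimize = f"OPTIMIZE {table}"
    if zorder_col:
        optimize += f" ZORDER BY ({zorder_col})"
    vacuum = f"VACUUM {table} RETAIN {retain_hours} HOURS"
    return [optimize, vacuum]

# In a Databricks notebook, run after the purge DELETE:
#   for stmt in build_maintenance_sql("audit_db.audit_log", "event_ts"):
#       spark.sql(stmt)
for stmt in build_maintenance_sql("audit_db.audit_log", "event_ts"):
    print(stmt)
```

Running OPTIMIZE before VACUUM in this order means the compacted, no-longer-referenced small files can be cleaned up once they age past the retention window.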

Kaniz
Community Manager

Hey there! Thanks a bunch for being part of our awesome community! 🎉 

We love having you around and appreciate all your questions. Take a moment to check out the responses – you'll find some great info. Your input is valuable, so pick the best solution for you. And remember, if you ever need more help , we're here for you! 

Keep being awesome! 😊🚀