Optimize and Vacuum Command
02-04-2024 06:11 AM
Hi team,
I am running a weekly purge process from Databricks notebooks that cleans up a chunk of records from tables used for audit purposes. The tables are external tables. I need clarification on the items below:
1. Do I need to run the OPTIMIZE and VACUUM commands at all? Only very minimal read queries are executed against the audit tables.
2. If I do need to run them, should I add the OPTIMIZE and VACUUM commands to the same notebook as the purge, to shrink the storage layer?
3. What scenarios should I look for when deciding to run OPTIMIZE and VACUUM on tables involved in a purge process?
4. Or is no action needed — will Databricks and the Apache Spark framework take care of optimization internally?
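For context, the weekly purge-plus-maintenance notebook described above could be sketched as follows. This is a minimal sketch, assuming the tables are Delta tables; the table name `audit_log`, the `event_date` column, and the 90-day purge predicate are all placeholders, not details from the thread:

```sql
-- Hypothetical weekly maintenance cell for an audit table
-- (table name, column, and retention window are placeholders).

-- Purge old audit records.
DELETE FROM audit_log
WHERE event_date < current_date() - INTERVAL 90 DAYS;

-- Compact the small files left behind by the delete.
OPTIMIZE audit_log;

-- Physically remove data files no longer referenced by the table,
-- honoring the default 7-day retention window for time travel.
VACUUM audit_log;
```

Running OPTIMIZE before VACUUM in the same notebook is a common pattern: the delete leaves unreferenced files behind, OPTIMIZE rewrites what remains into fewer files, and VACUUM then reclaims the storage.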
02-05-2024 10:21 AM
Hi Ramakrishnan83,
1. The VACUUM command only works with Delta tables. VACUUM deletes data files that are no longer referenced by the table and are older than the retention period, which is 7 days by default. OPTIMIZE, by contrast, compacts small files into larger ones, and can additionally co-locate related data if a ZORDER BY clause is provided.
2. Ideally, per the Databricks recommendation, if there is continuous data writing then the OPTIMIZE command should be executed daily.
3. The two commands optimize in different ways:
- OPTIMIZE compacts small files and can co-locate data based on patterns in the dataset (via Z-ordering).
- VACUUM deletes unreferenced Parquet files from the storage layer.
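To see the two effects side by side, you can inspect the table's file count before and after each command. A minimal sketch, assuming a Delta table; the name `audit_log` is a placeholder:

```sql
-- numFiles in the output shows how many data files back the table.
DESCRIBE DETAIL audit_log;

-- Compacts small files into larger ones; numFiles drops,
-- but the old small files still exist on storage for time travel.
OPTIMIZE audit_log;
DESCRIBE DETAIL audit_log;

-- Physically deletes the unreferenced files once they age past
-- the retention period (7 days by default), reclaiming storage.
VACUUM audit_log;
```

So OPTIMIZE changes how the live data is laid out, while VACUUM is what actually shrinks the storage layer.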
Please refer to these articles for more details:
- https://docs.databricks.com/en/delta/optimize.html
- https://docs.databricks.com/en/sql/language-manual/delta-optimize.html
Data engineer at Rsystema

