02-05-2024 10:21 AM
Hi Ramakrishnan83,
1. The VACUUM command only works with Delta tables. It deletes Parquet files that are no longer referenced by the table and are older than the retention period, which is 7 days by default. OPTIMIZE, on the other hand, combines small files into larger ones, and co-locates related data if an additional clause (such as ZORDER BY) is provided.
2. As per the Databricks recommendation, if data is being written continuously, the OPTIMIZE command should ideally be run daily.
3. OPTIMIZE and VACUUM both improve the table, but in different ways:
- OPTIMIZE co-locates the data based on patterns in the dataset.
- VACUUM deletes Parquet files that are no longer needed from the storage layer.
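To make the difference concrete, here is a rough sketch in Python that only builds the SQL statements for both commands. The table name `sales.events`, the column `event_date`, and the helper names `optimize_sql` and `vacuum_sql` are made up for illustration; on Databricks you would pass each string to `spark.sql(...)`, which is assumed here and not called.

```python
from typing import Optional, Sequence

# 7 days expressed in hours, matching the default retention period mentioned above.
DEFAULT_RETENTION_HOURS = 7 * 24


def optimize_sql(table: str, zorder_cols: Optional[Sequence[str]] = None) -> str:
    """Build an OPTIMIZE statement; ZORDER BY is appended only when columns are given."""
    stmt = f"OPTIMIZE {table}"
    if zorder_cols:
        stmt += f" ZORDER BY ({', '.join(zorder_cols)})"
    return stmt


def vacuum_sql(table: str,
               retain_hours: int = DEFAULT_RETENTION_HOURS,
               dry_run: bool = False) -> str:
    """Build a VACUUM statement; DRY RUN lists the files instead of deleting them."""
    stmt = f"VACUUM {table} RETAIN {retain_hours} HOURS"
    if dry_run:
        stmt += " DRY RUN"
    return stmt


# Hypothetical daily maintenance: compact first, preview the cleanup, then clean up.
statements = [
    optimize_sql("sales.events", ["event_date"]),
    vacuum_sql("sales.events", dry_run=True),
    vacuum_sql("sales.events"),
]

for statement in statements:
    print(statement)
    # On Databricks (assumed): spark.sql(statement)
```

Running the DRY RUN variant first is a cautious choice, since it shows which files would be removed before anything is actually deleted.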
Please refer to the articles for more details.
https://docs.databricks.com/en/delta/optimize.html
https://docs.databricks.com/en/sql/language-manual/delta-optimize.html
Data engineer at Rsystema