- 13454 Views
- 6 replies
- 17 kudos
Optimize -> Vacuum or Vacuum -> Optimize
Latest Reply
What about REORG TABLE (https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/delta-reorg-table)? Does it help or make sense to add REORG, then Optimize -> Vacuum, every week? Reorganize a Delta Lake table by rewriting files to purge ...
5 More Replies
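For context, a weekly maintenance pass combining the three commands might look like the sketch below. This is an illustrative sequence only, assuming a table named `my_table` (a placeholder) and the default 7-day retention; whether REORG is worthwhile depends on how much soft-deleted data the table actually carries:

```sql
-- Hypothetical weekly maintenance job on a Delta table (my_table is a placeholder).
-- REORG rewrites files to purge soft-deleted data, OPTIMIZE compacts small files,
-- and VACUUM then removes files no longer referenced by the table.
REORG TABLE my_table APPLY (PURGE);
OPTIMIZE my_table;
VACUUM my_table RETAIN 168 HOURS;
```

Note that files rewritten by REORG and OPTIMIZE only become eligible for deletion once they age past the retention window, so the storage benefit of the VACUUM step lags the rewrite steps by up to a week.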
- 3558 Views
- 3 replies
- 7 kudos
We have a table containing records from the last 2-3 years. The table size is around 7.5 TBytes (67 billion rows). Because there are periodic updates on historical records and daily optimizations of this table, we have tried repeatedly to execute a m...
Latest Reply
Hi @EDDatabricks, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that ...
2 More Replies
- 2408 Views
- 1 reply
- 1 kudos
I have seen a few instances where users reported that they run OPTIMIZE on the past week's worth of data and follow with VACUUM with a RETAIN of 168 HOURS (for example), yet the old files aren't being deleted: "VACUUM is not removing old files from the tab...
Latest Reply
Hello @Venkatesh Kottapalli, VACUUM removes all files from the table directory that are not managed by Delta, as well as data files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold. ...
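A common source of confusion here is that OPTIMIZE *rewrites* data into new files, so the rewritten files are recent and sit inside the retention window even though the rows in them are old. The sketch below is not Databricks internals, just an illustration of the two conditions a file must meet to be deleted; the file names and timestamps are hypothetical:

```python
# Illustrative sketch (not Databricks internals): why VACUUM RETAIN 168 HOURS
# keeps files that were recently rewritten by OPTIMIZE. A file is only deleted
# when it is BOTH unreferenced by the current table state AND older than the
# retention cutoff.
from datetime import datetime, timedelta

RETENTION_HOURS = 168  # 7 days, as in the question

def vacuum_candidates(files, now):
    """Return file names eligible for deletion. Each entry in `files` is a
    (name, last_modified, referenced_by_current_version) tuple."""
    cutoff = now - timedelta(hours=RETENTION_HOURS)
    return [name for name, modified, referenced in files
            if not referenced and modified < cutoff]

now = datetime(2023, 6, 15)
files = [
    ("part-0001.parquet", now - timedelta(hours=200), False),  # old, unreferenced -> deleted
    ("part-0002.parquet", now - timedelta(hours=24), False),   # rewritten yesterday -> kept
    ("part-0003.parquet", now - timedelta(hours=500), True),   # still referenced -> kept
]
print(vacuum_candidates(files, now))  # -> ['part-0001.parquet']
```

So files produced by last week's OPTIMIZE only become deletable once they age past the 168-hour threshold, which is the behavior users in the thread were observing.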
by
Kash
• Contributor III
- 1524 Views
- 2 replies
- 6 kudos
Hi there, I've had horrible experiences vacuuming tables in the past and losing tons of data, so I wanted to confirm a few things about Vacuuming and Z-Order. Background: Each day we run an ETL job that appends data in a table and stores the data in S3 b...
Latest Reply
Hi @Avkash Kana, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks...
1 More Replies
- 3989 Views
- 6 replies
- 14 kudos
Want to know the best process for removing files on ADLS after an Optimize and a Vacuum dry run have completed.
Latest Reply
Hi @Ravikanth Narayanabhatla, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear fr...
5 More Replies
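Worth noting for this question: the DRY RUN variant only *previews* the deletions, and the plain VACUUM does the actual file removal itself, so no separate manual cleanup on ADLS should be needed. A minimal sketch, assuming a placeholder table name `my_table` and the default 7-day retention:

```sql
-- Preview which files VACUUM would delete, then run it for real
-- (my_table is a placeholder; 168 hours = 7 days).
VACUUM my_table RETAIN 168 HOURS DRY RUN;  -- lists files to be deleted, removes nothing
VACUUM my_table RETAIN 168 HOURS;          -- actually deletes the unreferenced files
```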
by
Dicer
• Valued Contributor
- 7025 Views
- 2 replies
- 1 kudos
I tried to VACUUM a Delta table, but there is a syntax error. Here is the code:
%sql
set spark.databricks.delta.retentionDurationCheck.enabled = False
VACUUM test_deltatable
Latest Reply
@Cheuk Hin Christophe Poon Missing semicolon at the end of line 2?
%sql
set spark.databricks.delta.retentionDurationCheck.enabled = False;
VACUUM test_deltatable
1 More Replies
by
elgeo
• Valued Contributor II
- 5092 Views
- 3 replies
- 5 kudos
Hello! I am trying to understand the time travel feature. I see with "DESCRIBE HISTORY" that all the transaction history on a specific table is recorded by version and timestamp. However, I understand that this occupies a lot of storage, especiall...
Latest Reply
elgeo
Valued Contributor II
Thank you @Werner Stinckens for your reply. However, I still haven't managed to delete history even after setting the below. The number of history rows remains the same when running "DESCRIBE HISTORY". SET spark.databricks.delta.retentionDurationCheck...
2 More Replies
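For readers hitting the same thing: the history shown by DESCRIBE HISTORY is governed by table properties rather than by VACUUM alone, and log entries are pruned lazily (at checkpoint time), not immediately when the property is set. A sketch, assuming a placeholder table `my_table`:

```sql
-- Sketch (my_table is a placeholder): shorten how long history is kept.
-- delta.logRetentionDuration controls how long transaction-log entries
-- (the rows shown by DESCRIBE HISTORY) are kept; they are cleaned up
-- lazily when a new checkpoint is written, not immediately.
ALTER TABLE my_table SET TBLPROPERTIES (
  'delta.logRetentionDuration' = 'interval 7 days',
  'delta.deletedFileRetentionDuration' = 'interval 7 days'
);
VACUUM my_table;  -- removes unreferenced data files older than the threshold
```

This would explain why the DESCRIBE HISTORY row count stays the same right after changing the settings.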
by
AP
• New Contributor III
- 4604 Views
- 5 replies
- 3 kudos
So Databricks gives us a great toolkit in the form of OPTIMIZE and VACUUM. But in terms of operationalizing them, I am really confused about the best practice. Should we enable "optimized writes" by setting the following at a workspace level? spark.conf.set...
Latest Reply
@AKSHAY PALLERLA Just checking in to see if you got a solution to the issue you shared above. Let us know! Thanks to @Werner Stinckens for jumping in, as always!
4 More Replies
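One option discussed for questions like this is scoping optimized writes per table instead of workspace-wide, via table properties. A minimal sketch, assuming a placeholder table `my_table`:

```sql
-- Sketch: enable optimized writes and auto compaction for one table
-- rather than globally (my_table is a placeholder).
ALTER TABLE my_table SET TBLPROPERTIES (
  'delta.autoOptimize.optimizeWrite' = 'true',
  'delta.autoOptimize.autoCompact'   = 'true'
);
```

Per-table properties travel with the table, so every writer benefits regardless of which cluster or session configuration it runs with.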
- 5478 Views
- 2 replies
- 3 kudos
Hi, in line with my question about OPTIMIZE, this is the next step. With a retention of 7 days, I could execute VACUUM on all tables once a week; is this a recommended procedure? How can I know if I'll be getting any benefit from VACUUM, without DRY RU...
Latest Reply
Ideally 7 days is recommended, but discuss with data stakeholders to identify what's suitable: 7, 14, or 28 days. Before using VACUUM, first run some analytics on the behaviour of your data. Identify the % of operations that perform updates and deletes vs insert operati...
1 More Replies
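The operation mix the reply suggests analyzing can be estimated from the table's history. The sketch below uses a hypothetical sample list of operation names; on Databricks you would collect them from the output of `DESCRIBE HISTORY my_table` instead:

```python
# Sketch: estimate the share of rewrite operations (UPDATE/DELETE/MERGE) vs
# other operations in a table's history. The `history` list is hypothetical
# sample data standing in for the `operation` column of DESCRIBE HISTORY.
from collections import Counter

REWRITE_OPS = {"UPDATE", "DELETE", "MERGE"}

def rewrite_ratio(operations):
    """Fraction of history entries that rewrite existing files."""
    counts = Counter(operations)
    rewrites = sum(counts[op] for op in REWRITE_OPS)
    total = sum(counts.values())
    return rewrites / total if total else 0.0

history = ["WRITE", "WRITE", "MERGE", "WRITE", "DELETE", "OPTIMIZE", "WRITE", "UPDATE"]
print(f"{rewrite_ratio(history):.0%} of operations rewrite files")
```

A mostly append-only table (low ratio) produces few unreferenced files, so VACUUM yields little benefit; a high ratio suggests regular VACUUM runs will reclaim meaningful storage.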
- 2964 Views
- 2 replies
- 0 kudos
Is it safe to run VACUUM on a Delta Lake table while data is being added to it at the same time? Will it impact the job result/performance?
Latest Reply
In the vast majority of cases, yes, it is safe to run VACUUM while data is concurrently being appended to or updated in the same table. This is because VACUUM deletes data files no longer referenced by a Delta table's transaction log and does not affect...
1 More Replies
- 3214 Views
- 2 replies
- 0 kudos
If I don't run VACUUM on a Delta Lake table, will that make my read performance slower?
Latest Reply
VACUUM has no effect on read/write performance to that table. Never running VACUUM on a table will not make read/write performance to a Delta Lake table any slower. If you run VACUUM very infrequently, your VACUUM runtimes themselves may be pretty hig...
1 More Replies