Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

EDDatabricks
by Contributor
  • 3027 Views
  • 3 replies
  • 7 kudos

Resolved! Unable to perform VACUUM on Delta table

We have a table containing records from the last 2-3 years. The table size is around 7.5 TB (67 billion rows). Because there are periodic updates on historical records and daily optimizations of this table, we have tried repeatedly to execute a m...

Latest Reply
Anonymous
Not applicable
  • 7 kudos

Hi @EDDatabricks, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that ...

2 More Replies
User16783853906
by Contributor III
  • 1845 Views
  • 1 replies
  • 1 kudos

Understanding file retention with Vacuum

I have seen a few instances where users report that they run OPTIMIZE on the past week's worth of data and follow it with VACUUM with RETAIN 168 HOURS (for example), yet the old files aren't deleted: "VACUUM is not removing old files from the tab...

Latest Reply
Priyanka_Biswas
Databricks Employee
  • 1 kudos

Hello @Venkatesh Kottapalli​ VACUUM removes all files from the table directory that are not managed by Delta, as well as data files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold. ...
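The retention rule described in this reply can be sketched as a toy model in plain Python. This is an illustration only, not Delta Lake's implementation: the `DataFile` class, its fields, and `vacuum_candidates` are hypothetical names, and using a single `timestamp` per file is a simplifying assumption about how a file's age is measured. Files in the table directory that Delta does not manage are left out of the model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class DataFile:
    path: str
    in_latest_state: bool  # still part of the latest table state in the log?
    timestamp: datetime    # assumed reference time for measuring the file's age


def vacuum_candidates(
    files: List[DataFile], now: datetime, retain_hours: int = 168
) -> List[str]:
    """Return paths that are out of the latest state AND older than the threshold."""
    cutoff = now - timedelta(hours=retain_hours)
    return [
        f.path
        for f in files
        if not f.in_latest_state and f.timestamp < cutoff
    ]


now = datetime(2023, 1, 31)
files = [
    DataFile("current.parquet", True, now - timedelta(days=30)),
    DataFile("old_unreferenced.parquet", False, now - timedelta(days=10)),
    DataFile("recent_unreferenced.parquet", False, now - timedelta(days=1)),
]

print(vacuum_candidates(files, now))  # ['old_unreferenced.parquet']
```

In this model a file must satisfy both conditions to be a candidate, so a file that dropped out of the latest state only a day ago is kept under a 168-hour threshold.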

Kash
by Contributor III
  • 1349 Views
  • 2 replies
  • 6 kudos

Will Vacuum delete previous folders of data if we z-ordered by as_of_date each day?

Hi there, I've had horrible experiences vacuuming tables in the past and losing tons of data, so I wanted to confirm a few things about vacuuming and Z-Order. Background: Each day we run an ETL job that appends data in a table and stores the data in S3 b...

Latest Reply
Anonymous
Not applicable
  • 6 kudos

Hi @Avkash Kana, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks...

1 More Replies
ravikanthranjit
by New Contributor III
  • 3298 Views
  • 6 replies
  • 14 kudos

Vacuum on external tables that we mount on ADLS

I want to know the best process for removing files on ADLS after OPTIMIZE and a VACUUM DRY RUN have completed.

Latest Reply
Anonymous
Not applicable
  • 14 kudos

Hi @Ravikanth Narayanabhatla, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear fr...

5 More Replies
Dicer
by Valued Contributor
  • 6433 Views
  • 2 replies
  • 1 kudos

Resolved! PARSE_SYNTAX_ERROR: Syntax error at or near 'VACUUM'

I tried to VACUUM a Delta table, but there is a syntax error. Here is the code:
%sql
set spark.databricks.delta.retentionDurationCheck.enabled = False
VACUUM test_deltatable

Latest Reply
Ravi
Databricks Employee
  • 1 kudos

@Cheuk Hin Christophe Poon Missing semicolon at the end of line 2?
%sql
set spark.databricks.delta.retentionDurationCheck.enabled = False;
VACUUM test_deltatable
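A possible way to sidestep statement separators altogether is to issue each statement as its own call from Python. The sketch below is a hedged illustration: `vacuum_without_retention_check` is a hypothetical helper, and `run_sql` stands for any callable that executes one SQL string (in a Databricks notebook this would presumably be `spark.sql`, which is an assumption about the environment). The demo at the bottom uses a plain list as a recorder so it runs without Spark.

```python
from typing import Callable, List


def vacuum_without_retention_check(
    run_sql: Callable[[str], object], table: str
) -> List[str]:
    """Run the SET and the VACUUM as two separate statements, in order."""
    statements = [
        "SET spark.databricks.delta.retentionDurationCheck.enabled = false",
        f"VACUUM {table}",
    ]
    for statement in statements:
        run_sql(statement)  # one statement per call, so no ';' is needed
    return statements


# Demo without a cluster: record the statements instead of executing them.
executed: List[str] = []
vacuum_without_retention_check(executed.append, "test_deltatable")
for statement in executed:
    print(statement)
```

In a notebook the call would look like `vacuum_without_retention_check(spark.sql, "test_deltatable")`, assuming a `spark` session is available.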

1 More Replies
elgeo
by Valued Contributor II
  • 4228 Views
  • 3 replies
  • 5 kudos

Resolved! Delta Table - Reduce time travel storage size

Hello! I am trying to understand the time travel feature. I see with the "DESCRIBE HISTORY" command that all the transaction history on a specific table is recorded by version and timestamp. However, I understand that this occupies a lot of storage especiall...

Latest Reply
elgeo
Valued Contributor II
  • 5 kudos

Thank you @Werner Stinckens for your reply. However, I still haven't managed to delete history even after setting the below. The number of history rows remains the same when running "DESCRIBE HISTORY". SET spark.databricks.delta.retentionDurationCheck...

2 More Replies
AP
by New Contributor III
  • 4025 Views
  • 5 replies
  • 3 kudos

Resolved! AutoOptimize, OPTIMIZE command and Vacuum command : Order, production implementation best practices

So Databricks gives us a great toolkit in the form of optimization and vacuum. But in terms of operationalizing them, I am really confused about the best practice. Should we enable "optimized writes" by setting the following at a workspace level? spark.conf.set...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@AKSHAY PALLERLA Just checking in to see if you got a solution to the issue you shared above. Let us know! Thanks to @Werner Stinckens for jumping in, as always!

4 More Replies
alejandrofm
by Valued Contributor
  • 4670 Views
  • 2 replies
  • 3 kudos

Resolved! Running vacuum on each table

Hi, in line with my question about optimize, this is the next step. With a retention of 7 days I could execute vacuum on all tables once a week; is this a recommended procedure? How can I know if I'll be getting any benefit from vacuum, without DRY RU...

Latest Reply
AmanSehgal
Honored Contributor III
  • 3 kudos

Ideally 7 days is recommended, but discuss with data stakeholders to identify what's suitable: 7, 14, or 28 days. To use VACUUM, first run some analytics on the behaviour of your data. Identify the % of operations that perform updates and deletes vs insert operati...
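A weekly pass over several tables could be prepared along the following lines. This is a minimal sketch, not a recommended procedure: `build_vacuum_statements` and the table names are hypothetical, and the 168-hour default simply mirrors the 7-day retention discussed in this thread. The helper only builds the SQL text, so it runs without a cluster; with `dry_run=True` each statement asks VACUUM to list files rather than delete them.

```python
from typing import Iterable, List


def build_vacuum_statements(
    tables: Iterable[str], retain_hours: int = 168, dry_run: bool = True
) -> List[str]:
    """Build one VACUUM statement per table; DRY RUN is on by default."""
    if retain_hours < 0:
        raise ValueError("retain_hours must be non-negative")
    suffix = " DRY RUN" if dry_run else ""
    return [
        f"VACUUM {table} RETAIN {retain_hours} HOURS{suffix}"
        for table in tables
    ]


# Hypothetical table names, for illustration only.
for statement in build_vacuum_statements(["sales.orders", "sales.customers"]):
    print(statement)
```

Reviewing the dry-run output per table before switching to `dry_run=False` is one way to see whether a given table would benefit at all.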

1 More Replies
Hubert-Dudek
by Esteemed Contributor III
  • 10497 Views
  • 5 replies
  • 17 kudos

Resolved! Optimize and Vacuum - which is the best order of operations?

Optimize -> Vacuum or Vacuum -> Optimize

Latest Reply
-werners-
Esteemed Contributor III
  • 17 kudos

I optimize first, as Delta Lake knows which files are relevant for the optimize. That way my optimized data is available faster. Then I run a vacuum. It seemed logical to me, but I might be wrong; I never actually thought about it.
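The order described in this reply (OPTIMIZE first, then VACUUM) can be written down as a small maintenance routine. This is a sketch of that one opinion, which the author themselves hedges, not an official recommendation. `optimize_then_vacuum` is a hypothetical helper, `run_sql` is any callable that executes a single SQL string (presumably `spark.sql` in a notebook, which is an assumption), and the table name is made up.

```python
from typing import Callable, List


def optimize_then_vacuum(
    run_sql: Callable[[str], object], table: str, retain_hours: int = 168
) -> List[str]:
    """Compact the table first, then clean up files past the retention window."""
    plan = [
        f"OPTIMIZE {table}",
        f"VACUUM {table} RETAIN {retain_hours} HOURS",
    ]
    for statement in plan:
        run_sql(statement)  # statements run strictly in plan order
    return plan


# Demo without a cluster: collect the statements instead of executing them.
log: List[str] = []
optimize_then_vacuum(log.append, "events")
print(log)
```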

4 More Replies
brickster_2018
by Databricks Employee
  • 1072 Views
  • 1 replies
  • 1 kudos
Latest Reply
brickster_2018
Databricks Employee
  • 1 kudos

At a high level, a VACUUM operation on a Delta table has two steps: 1) identifying the stale files based on the VACUUM command triggered; 2) deleting the files identified in step 1. Step 1 is performed by triggering a Spark job and hence utilizes the resource o...

User16783853906
by Contributor III
  • 2525 Views
  • 2 replies
  • 0 kudos

VACUUM during read/write

Is it safe to run VACUUM on a Delta Lake table while data is being added to it at the same time?  Will it impact the job result/performance?

Latest Reply
User16783853906
Contributor III
  • 0 kudos

In the vast majority of cases, yes, it is safe to run VACUUM while data is concurrently being appended to or updated in the same table. This is because VACUUM deletes data files no longer referenced by a Delta table's transaction log and does not affect...

1 More Replies
User16783853906
by Contributor III
  • 2822 Views
  • 2 replies
  • 0 kudos

How does running VACUUM on Delta Lake tables affect read/write performance?

If I don't run VACUUM on a Delta Lake table, will that make my read performance slower?

Latest Reply
User16783853906
Contributor III
  • 0 kudos

VACUUM has no effect on read/write performance to that table. Never running VACUUM on a table will not make read/write performance to a Delta Lake table any slower. If you run VACUUM very infrequently, your VACUUM runtimes themselves may be pretty hig...

1 More Replies
Anonymous
by Not applicable
  • 1111 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16783855117
Contributor II
  • 0 kudos

It really depends on your business intentions! You can remove files that are no longer referenced by a Delta table and are older than the retention threshold by running the VACUUM command on the table. VACUUM is not triggered automatically. The default retent...
