Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

EDDatabricks
by Contributor
  • 2823 Views
  • 3 replies
  • 7 kudos

Resolved! Unable to perform VACUUM on Delta table

We have a table containing records from the last 2-3 years. The table size is around 7.5 TB (67 billion rows). Because there are periodic updates on historical records and daily optimizations of this table, we have tried repeatedly to execute a m...

Latest Reply
Anonymous
Not applicable
  • 7 kudos

Hi @EDDatabricks Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that ...

2 More Replies
User16783853906
by Contributor III
  • 1617 Views
  • 1 reply
  • 1 kudos

Understanding file retention with Vacuum

I have seen a few instances where users report that they run OPTIMIZE on the past week's worth of data, follow it with VACUUM with RETAIN 168 HOURS (for example), and the old files aren't being deleted: "VACUUM is not removing old files from the tab...

Latest Reply
Priyanka_Biswas
Valued Contributor
  • 1 kudos

Hello @Venkatesh Kottapalli​ VACUUM removes all files from the table directory that are not managed by Delta, as well as data files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold. ...
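
The retention behavior described in this reply can be sketched as follows (the table name `my_table` is hypothetical). Note that VACUUM only deletes files that are both unreferenced in the latest table state and older than the retention window:

```sql
-- Compact recent data; the original small files become stale, unreferenced copies
OPTIMIZE my_table WHERE event_date >= current_date() - INTERVAL 7 DAYS;

-- Remove unreferenced files older than 168 hours (7 days, the default).
-- Files written within the last 168 hours are kept even if unreferenced.
VACUUM my_table RETAIN 168 HOURS;
```

This is why the scenario in the question behaves as observed: files made stale by an OPTIMIZE run during the past week are still younger than the 168-hour threshold, so the VACUUM above leaves them in place.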

Kash
by Contributor III
  • 1263 Views
  • 2 replies
  • 6 kudos

Will Vacuum delete previous folders of data if we z-ordered by as_of_date each day?

Hi there, I've had horrible experiences vacuuming tables in the past and losing tons of data, so I wanted to confirm a few things about VACUUM and Z-ORDER. Background: each day we run an ETL job that appends data to a table and stores the data in S3 b...

Latest Reply
Anonymous
Not applicable
  • 6 kudos

Hi @Avkash Kana Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks...

1 More Replies
ravikanthranjit
by New Contributor III
  • 3051 Views
  • 6 replies
  • 14 kudos

Vacuum on external tables that we mount on ADLS

I want to know the best process for removing files on ADLS after OPTIMIZE and a VACUUM dry run are completed.

Latest Reply
Anonymous
Not applicable
  • 14 kudos

Hi @Ravikanth Narayanabhatla​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear fr...

5 More Replies
Dicer
by Valued Contributor
  • 6125 Views
  • 2 replies
  • 1 kudos

Resolved! PARSE_SYNTAX_ERROR: Syntax error at or near 'VACUUM'

I tried to VACUUM a Delta table, but there is a syntax error. Here is the code: %sql set spark.databricks.delta.retentionDurationCheck.enabled = False VACUUM test_deltatable

Latest Reply
Ravi
Valued Contributor
  • 1 kudos

@Cheuk Hin Christophe Poon Missing semicolon at the end of line 2? %sql set spark.databricks.delta.retentionDurationCheck.enabled = False; VACUUM test_deltatable
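
Formatted as a notebook cell, the corrected statements from this reply would look like the following (the semicolon terminates the SET statement so the parser no longer treats `VACUUM` as part of it; `test_deltatable` is the table name from the original post):

```sql
SET spark.databricks.delta.retentionDurationCheck.enabled = false;
VACUUM test_deltatable;
```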

1 More Replies
elgeo
by Valued Contributor II
  • 3797 Views
  • 3 replies
  • 5 kudos

Resolved! Delta Table - Reduce time travel storage size

Hello! I am trying to understand the time travel feature. I see with the "DESCRIBE HISTORY" command that all the transaction history on a specific table is recorded by version and timestamp. However, I understand that this occupies a lot of storage, especiall...

Latest Reply
elgeo
Valued Contributor II
  • 5 kudos

Thank you @Werner Stinckens for your reply. However, I still haven't managed to delete history even after setting the below. The number of history rows remains the same when running "DESCRIBE HISTORY". SET spark.databricks.delta.retentionDurationCheck...
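
For reference, history and stale-file retention can be controlled per table via table properties; a sketch under the assumption of a table named `my_table` (note that DESCRIBE HISTORY entries are only trimmed lazily when log checkpoints are rewritten, not immediately after the property changes, which is consistent with the behavior reported above):

```sql
ALTER TABLE my_table SET TBLPROPERTIES (
  'delta.logRetentionDuration' = 'interval 7 days',        -- how long time-travel history is kept
  'delta.deletedFileRetentionDuration' = 'interval 7 days' -- how long stale data files are kept
);

-- Physically remove stale data files past the retention threshold
VACUUM my_table;
```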

2 More Replies
AP
by New Contributor III
  • 3721 Views
  • 5 replies
  • 3 kudos

Resolved! AutoOptimize, OPTIMIZE command and Vacuum command : Order, production implementation best practices

So Databricks gives us a great toolkit in the form of optimization and vacuum. But, in terms of operationalizing them, I am really confused about the best practice. Should we enable "optimized writes" by setting the following at a workspace level? spark.conf.set...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@AKSHAY PALLERLA Just checking in to see if you got a solution to the issue you shared above. Let us know! Thanks to @Werner Stinckens for jumping in, as always!

4 More Replies
alejandrofm
by Valued Contributor
  • 4243 Views
  • 2 replies
  • 3 kudos

Resolved! Running vacuum on each table

Hi, in line with my question about OPTIMIZE, this is the next step. With a retention of 7 days I could execute VACUUM on all tables once a week; is this a recommended procedure? How can I know if I'll be getting any benefit from VACUUM, without DRY RU...

Latest Reply
AmanSehgal
Honored Contributor III
  • 3 kudos

Ideally 7 days is recommended, but discuss with data stakeholders to identify what's suitable: 7/14/28 days. To use VACUUM, first run some analytics on the behaviour of your data. Identify the % of operations that perform updates and deletes vs. insert operati...
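
One way to run the analysis this reply suggests is to aggregate the table's history by operation type; a sketch, assuming a table named `my_table` (on Databricks, DESCRIBE HISTORY output can be queried as a subquery):

```sql
-- Count how often each operation type (WRITE, MERGE, DELETE, UPDATE, OPTIMIZE, ...)
-- has run against the table, to gauge how quickly stale files accumulate
SELECT operation, COUNT(*) AS num_ops
FROM (DESCRIBE HISTORY my_table)
GROUP BY operation
ORDER BY num_ops DESC;
```

A table dominated by appends produces few stale files and gains little from frequent VACUUM; heavy update/delete workloads accumulate stale files quickly and justify a shorter cadence.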

1 More Replies
Hubert-Dudek
by Esteemed Contributor III
  • 9760 Views
  • 5 replies
  • 17 kudos

Resolved! Optimize and Vacuum - which is the best order of operations?

Optimize -> Vacuum, or Vacuum -> Optimize?

Latest Reply
-werners-
Esteemed Contributor III
  • 17 kudos

I optimize first, as Delta Lake knows which files are relevant for the optimize. That way I have my optimized data available faster. Then a vacuum. Seemed logical to me, but I might be wrong; never actually thought about it.
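
The order argued for in this reply can be sketched as follows (table name hypothetical). OPTIMIZE rewrites small files into larger ones, turning the originals into stale files that a subsequent VACUUM can later reclaim:

```sql
OPTIMIZE my_table;  -- compact first: readers get the optimized layout sooner
VACUUM my_table;    -- then clean up stale files past the retention threshold
```

Running VACUUM first would simply miss the files that OPTIMIZE is about to make stale; those would have to wait for the next VACUUM run.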

4 More Replies
brickster_2018
by Esteemed Contributor
  • 1022 Views
  • 1 reply
  • 1 kudos
Latest Reply
brickster_2018
Esteemed Contributor
  • 1 kudos

At a high level, a VACUUM operation on a Delta table has 2 steps: 1) identifying the stale files based on the VACUUM command triggered, and 2) deleting the files identified in step 1. Step 1 is performed by triggering a Spark job and hence utilizes the resource o...

User16783853906
by Contributor III
  • 2367 Views
  • 2 replies
  • 0 kudos

VACUUM during read/write

Is it safe to run VACUUM on a Delta Lake table while data is being added to it at the same time?  Will it impact the job result/performance?

Latest Reply
User16783853906
Contributor III
  • 0 kudos

In the vast majority of cases, yes, it is safe to run VACUUM while data is concurrently being appended or updated on the same table. This is because VACUUM deletes data files no longer referenced by a Delta table's transaction log and does not affect...

1 More Replies
User16783853906
by Contributor III
  • 2677 Views
  • 2 replies
  • 0 kudos

How does running VACUUM on Delta Lake tables affect read/write performance?

If I don't run VACUUM on a Delta Lake table, will that make my read performance slower?

Latest Reply
User16783853906
Contributor III
  • 0 kudos

VACUUM has no effect on read/write performance for that table. Never running VACUUM on a table will not make read/write performance to a Delta Lake table any slower. If you run VACUUM very infrequently, your VACUUM runtimes themselves may be pretty hig...

1 More Replies
Anonymous
by Not applicable
  • 1047 Views
  • 1 reply
  • 0 kudos
Latest Reply
User16783855117
Contributor II
  • 0 kudos

It really depends on your business intentions! You can remove files that are no longer referenced by a Delta table and are older than the retention threshold by running the VACUUM command on the table. VACUUM is not triggered automatically. The default retent...
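
A minimal sketch of the manual trigger described above (the default retention threshold is 7 days; `my_table` is a hypothetical name):

```sql
VACUUM my_table;                   -- uses the default 7-day retention threshold
VACUUM my_table RETAIN 240 HOURS;  -- or specify a longer window explicitly
```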
