cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Will Vacuum delete previous folders of data if we z-ordered by as_of_date each day?

Kash
Contributor III

Hi there,

I've had horrible experiences Vacuuming tables in the past and losing tons of data so I wanted to confirm a few things about Vacuuming and Z-Order.

Background:

Each day we run an ETL job that appends data in a table and stores the data in S3 by date (as_of_date).

When we Optimize Z-ORDER we do so with a WHERE as_of_date=current_date. This optimizes and z-orders the data for the folder we've just written and tens to double the size of the folder.

Questions:

1. When should we Vacuums? Each day?

2 - If we vacuum will it delete previous as_of_date folders in S3 since the current version is today's as_of_date?

3 - Should we not optimize with the where clause?

Thanks in advance for your help. It doesn't look like anyone has asked this question.

2 REPLIES 2

Hubert-Dudek
Esteemed Contributor III
  • If you append data to the partition folder, which is the date, then it makes sense to Optimize only that folder (so WHERE is equal to appended date), but if it is not partitioned, I will skip WHERE option,
  • VACUUM cleans the history of the table. So you can adjust the retention period. In some cases, like Bronze data, I VACUUM daily with RETAIN 0, but essential data, especially data under some regulations, can never be VACUUM. So every case can be different.

Anonymous
Not applicable

Hi @Avkash Kana​ 

Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. 

We'd love to hear from you.

Thanks!

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group