
DLT optimize and vacuum

Gil
New Contributor III

We were finally able to get DLT pipelines to run OPTIMIZE and VACUUM automatically, and we verified this via the table history. However, I am still able to query versions older than 7 days. Has anyone else experienced this, and how were you able to fix it?
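
For reference, this is roughly how I am checking it (the table name and version number are just placeholders):

    -- Confirm that OPTIMIZE and VACUUM operations show up in the table history
    DESCRIBE HISTORY my_schema.my_dlt_table;

    -- Time travel to a version older than 7 days; if VACUUM had removed the
    -- underlying data files, I would expect this query to fail
    SELECT * FROM my_schema.my_dlt_table VERSION AS OF 42;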


NathanSundarara
Contributor

Can you please tell me how you verified that vacuum and optimize are running automatically? I couldn't figure it out, so I'm running the OPTIMIZE and VACUUM commands manually every night. Any help would be appreciated.
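
For context, these are roughly the commands I run manually each night (the table name is a placeholder):

    -- Nightly manual maintenance
    OPTIMIZE my_schema.my_dlt_table;
    VACUUM my_schema.my_dlt_table;   -- uses the default 7-day retention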

karthik_p
Esteemed Contributor

@Gil what retention period are you setting on your VACUUM command? By default it is 7 days, but it is still recommended to set the retention explicitly.
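
For example, something along these lines (the table name and retention value are just illustrations):

    -- Keep 7 days (168 hours) of history explicitly rather than relying on the default
    VACUUM my_schema.my_dlt_table RETAIN 168 HOURS;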

Gil
New Contributor III

We left the default, so I believe it's 7 days. Thanks

Gil
New Contributor III

If I recall correctly, I can query versions older than 30 days.

NathanSundarara
Contributor

In our case as well, the default 7-day retention didn't seem to work based on what I saw, which is why I'm running the commands manually. If @Gil or someone else can explain how to validate it, I can stop my job and see whether the automatic vacuum process is actually working.

@NathanSundarara it looks like vacuum and optimize are part of the maintenance tasks; these tasks get triggered only within 24 hours of a table being updated.

[Screenshot attached: karthik_p_0-1688143460140.png]

That's what I thought as well, but when I checked, the number of files didn't reduce; only after adding the manual job did it show fewer, compacted files. That's why I asked @Gil for verification. Here is how I checked: for one of our tables we get around 24 files every hour. One day I noticed there were about 300 files in total, and I assumed that even with another 24 files added the next day, the count should go down after compaction, but it kept increasing. After I created the manual job, the table now shows around 4 or 5 files when I look in the morning; as the day progresses files get added, and the next day it comes back down to 4 or 5 files.
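
In case it helps, this is roughly how I check the file count (the table name is a placeholder):

    -- numFiles in the output shows how many data files currently back the table
    DESCRIBE DETAIL my_schema.my_dlt_table;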

Gil
New Contributor III

I am verifying that optimize and vacuum are running by looking at the table history. I am also checking which older versions I am able to query, and I have found I can still query versions older than 7 days. If vacuum were working, I should not be able to query versions older than 7 days.
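
One thing I am also double-checking is the two retention settings, since history retention and data file retention are controlled separately (the values shown are the Delta defaults as I understand them, and the table name is a placeholder):

    -- delta.deletedFileRetentionDuration controls what VACUUM is allowed to delete;
    -- delta.logRetentionDuration controls how long history entries are kept
    ALTER TABLE my_schema.my_dlt_table SET TBLPROPERTIES (
      'delta.deletedFileRetentionDuration' = 'interval 7 days',
      'delta.logRetentionDuration' = 'interval 30 days'
    );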

Anonymous
Not applicable

Hi @Gil 

Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs.

Please help us select the best solution by clicking on "Select As Best" if it does.

Your feedback will help us ensure that we are providing the best possible service to you. Thank you!

Anonymous
Not applicable

Hi @Gil 

Hope all is well! Just wanted to check in to see if you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!
