01-18-2024 12:38 AM
Hi Everyone,
I need your help/suggestion.
We are using a DLT framework for our ELT process, data is received from the Source to the RAW layer in parquet format. This raw data is loaded to the Bronze layer which acts like a history table. From the BRONZE layer data is loaded to Silver using Merge login(upsert) in Delta format.
As part of cost savings, we cleaned up unwanted/old data from Silver tables(records older than 2017)
we used the delete query to delete these records and we ran VACCUME after clean-up to claim the free space.
we saw very good space savings after this activity.
But a few weeks later we observed above silver table's ADLS space was the same as before the cleanup
But when I query the table I don't see the deleted records.
Can someone please help me what is the reason for the space increase and how to get free space permanently after the cleanup?
01-24-2024 03:30 AM
Can someone please help on this
01-24-2024 03:37 AM
Hi @Azure_Data_Bric , Did you run the Vacuum command on the same day as the delete query? Did you run the Vacuum command with default retention period of 7 days? And are you running vacuum command regularly or you have run it only once?
01-30-2024 08:29 AM
Hi Thanks a lot for the response.
I did run VACCUM with 0-hour retention(VACUUM <table_name> RETAIN 0 HOURS;) immedialty after deletion query.
Also, we have default VACCUM running every day once.
Please let me know if you need any more information.
01-30-2024 09:34 AM
Hi @Azure_Data_Bric , Thank you for responding. Are you also running optimize command before vacuum. If not, I would advise you to so.
01-30-2024 10:29 AM
Hi,
I see Optimize and VACUUM are running on all tables once per day automatically.
that day when we performed historical deletion, we deleted the data first, and then we just ran VACUUM with zero hour retention.
After some time Optimize and VACUUM (with 7-day retention) ran automatically which is scheduled daily
after deletion activity space of all these tables is reduced by 80%.
when I checked after a week I saw all the space was occupied again. but when i query the table i don't see deleted records and new data addition also not so much, still i see same space is occupied these tables as before deletion activity.
Please help what could be the reason
01-30-2024 10:50 AM
Hi @Azure_Data_Bric , As you mentioned you are not seeing the deleted records, my understanding of the issue is that new data has been loaded in the table since the day the historical data was deleted. It also suggests to me that you are not performing delete operations after that day.
If this is the situation, then the only space we can free up(using vacuum operation) is the space taken up by stale files. Please note that when you deleted the historical data and ran the vacuum, a large amount of space was released because you had deleted lot of the data.
In addition to this, we can also look for below 2 problems:
1. Over partitioning: If your table is over partitioned, then the optimize will not be able to generate a lot of stale files which the vacuum command clears later.
2. Look for any problem at the cloud provider side if the vacuum is actually deleting the data or not. Or is it facing any issue there.
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.
If there isn’t a group near you, start one and help create a community that brings people together.
Request a New Group