How do I reduce the size of a Hive table's S3 bucket?

dotan
New Contributor II

I have a Hive table in Delta format with over 1B rows. When I check the Data Explorer in the SQL section of Databricks, it reports a table size of 139.3 GiB across 401 files. But when I check the S3 location where the files are stored (dbfs:/user/hive/warehouse/large_table), it holds over 110 TB and more than 100K files.

Is it possible to reduce the size of the S3 bucket without losing any data in the table?

Accepted Solution

apingle
Contributor

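When you run updates, deletes, and similar operations on a Delta table, new data files are written. The old files are not deleted automatically, because features such as time travel need them to read earlier versions of the table. This likely explains the numbers in the question: Data Explorer reports the size of the current table version only, while the storage location still holds the files from earlier versions.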
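
The mechanism described above can be sketched with a toy model. This is not Delta Lake's actual implementation: the `ToyDeltaTable` class, the file names, and the sizes are all made up for illustration, and the `vacuum` method here ignores the retention threshold that the real command applies.

```python
# Toy model of how rewritten files pile up in storage.
# NOT Delta Lake's real implementation; names and sizes are hypothetical.

class ToyDeltaTable:
    def __init__(self):
        self.storage = {}   # file name -> size (everything sitting in the bucket)
        self.live = set()   # files referenced by the current table version

    def write(self, name, size):
        """Add a new data file to the current version."""
        self.storage[name] = size
        self.live.add(name)

    def rewrite(self, old_name, new_name, new_size):
        """Model an UPDATE/DELETE: write a new file, unreference the old one.

        The old file stays in storage so earlier versions remain readable.
        """
        self.live.discard(old_name)
        self.write(new_name, new_size)

    def table_size(self):
        """Size of the current version (what a table browser would report)."""
        return sum(self.storage[name] for name in self.live)

    def bucket_size(self):
        """Size of everything in storage, including unreferenced files."""
        return sum(self.storage.values())

    def vacuum(self):
        """Delete every file the current version no longer references.

        Simplification: the real command only removes files older than a
        retention threshold.
        """
        for name in list(self.storage):
            if name not in self.live:
                del self.storage[name]


table = ToyDeltaTable()
table.write("part-0", 100)
table.rewrite("part-0", "part-1", 90)
table.rewrite("part-1", "part-2", 80)

print(table.table_size(), table.bucket_size())  # 80 270
table.vacuum()
print(table.table_size(), table.bucket_size())  # 80 80
```

The table size stays at the size of the latest file, while the bucket keeps growing with every rewrite until the unreferenced files are removed.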

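To delete files that the Delta table no longer references, use the VACUUM command. It removes only unreferenced files older than the retention threshold (7 days by default), so the current table data is kept. The trade-off is that you can no longer time travel to versions older than that threshold.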

https://docs.databricks.com/sql/language-manual/delta-vacuum.html
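
As a minimal sketch of how the VACUUM command mentioned above might be used, the helper below builds the SQL statements as strings. The helper name `vacuum_sql` is hypothetical, and the table name `large_table` is assumed from the path in the question. Running a `DRY RUN` first is a cautious way to see which files would be deleted before removing anything.

```python
def vacuum_sql(table_name, retain_hours=None, dry_run=False):
    """Build a Delta VACUUM statement as a string (hypothetical helper)."""
    parts = [f"VACUUM {table_name}"]
    if retain_hours is not None:
        if retain_hours < 0:
            raise ValueError("retain_hours must be non-negative")
        parts.append(f"RETAIN {retain_hours} HOURS")
    if dry_run:
        parts.append("DRY RUN")
    return " ".join(parts)


# Assumed table name, taken from the path in the question.
TABLE = "large_table"

# Preview: lists files that would be deleted, without deleting them.
preview = vacuum_sql(TABLE, dry_run=True)

# Cleanup: 168 hours spells out the 7-day default retention explicitly.
cleanup = vacuum_sql(TABLE, retain_hours=168)

print(preview)  # VACUUM large_table DRY RUN
print(cleanup)  # VACUUM large_table RETAIN 168 HOURS

# In a Databricks notebook, where a SparkSession named `spark` is predefined,
# the statements could be run like this:
#   spark.sql(preview).show(truncate=False)
#   spark.sql(cleanup)
```

Keeping the retention at or above the default is the safer choice, since a shorter window can remove files that long-running readers or writers still need.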

dotan
New Contributor II

That's great, thanks. It reduced the size of the bucket from 110 TB to 7 TB.
