Data Engineering

How to reduce storage space consumed by delta with many updates

Greg
New Contributor III

I have one Delta table that I continuously append events into, and a second Delta table that I continuously MERGE into, streamed from the first table. The second table has unique IDs whose properties are updated from the events (an ID represents a unique thing that receives many events). The actual data size of the second table is ≈ 400 MB, but because of Delta versioning it consumes ≈ 40 GB, roughly 100 times as much. I have added a VACUUM every hour to the streaming process just to keep it this low. Any suggestions on how I can reduce this storage consumption further? I do not need the versioning. Ideally there would be some way to disable it while retaining the ability to MERGE.
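
For readers following along, here is a minimal sketch of the kind of hourly maintenance step described above. It is not the poster's actual code. It only builds the SQL strings, so it runs without a cluster. The function name `delta_maintenance_sql`, its parameters, and the table name in the usage note are hypothetical. The table properties `delta.deletedFileRetentionDuration` and `delta.logRetentionDuration`, the `VACUUM ... RETAIN n HOURS` command, and the `spark.databricks.delta.retentionDurationCheck.enabled` setting are standard Delta Lake features, but whether a one-hour window is safe depends on the workload.

```python
from typing import List


def delta_maintenance_sql(table: str, retain_hours: int = 1) -> List[str]:
    """Build SQL statements that shorten Delta retention and vacuum old files.

    `table` and `retain_hours` are hypothetical parameters for illustration.
    """
    if retain_hours < 0:
        raise ValueError("retain_hours must be non-negative")
    interval = f"interval {retain_hours} hours"
    return [
        # Allow VACUUM below the default 7-day (168 hour) safety threshold.
        "SET spark.databricks.delta.retentionDurationCheck.enabled = false",
        # Shorten how long removed data files and log entries are kept.
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        f"'delta.deletedFileRetentionDuration' = '{interval}', "
        f"'delta.logRetentionDuration' = '{interval}')",
        # Physically delete unreferenced files older than the retention window.
        f"VACUUM {table} RETAIN {retain_hours} HOURS",
    ]
```

With an active SparkSession named `spark`, the statements could be run as `for stmt in delta_maintenance_sql("events_by_id"): spark.sql(stmt)`. A short retention window removes the ability to time travel beyond it, and it can break concurrent readers or streams that still reference the deleted files, so the window should stay longer than the longest-running job that reads the table.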

1 REPLY

Jb11
New Contributor II

Have you already solved this problem?
