Impact of VACUUM Operations on Shallow Clones in Databricks

chsoni12
New Contributor II

I ran a POC to check whether we can create a new Delta table that contains only a particular version of a normal (source) Delta table's data, without copying that data, such that changes or operations on the source table (insert/delete/truncate of records) or running VACUUM on it do not impact the new Delta table.

I used a Databricks shallow clone to create the new Delta table from the source Delta table. This did not copy the data of the chosen version into the new table; the clone references the same files that the source table uses. I then performed insert, delete, truncate, and VACUUM operations on the source table, and none of them impacted the new table created with the shallow clone.
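For reference, a shallow clone pinned to one version of the source can be created with a single SQL statement. The sketch below only builds that statement as a string, so it runs without Spark; the table names are hypothetical, and in a Databricks notebook you would pass the result to `spark.sql(...)`.

```python
from typing import Optional


def shallow_clone_sql(target: str, source: str,
                      version: Optional[int] = None,
                      timestamp: Optional[str] = None) -> str:
    """Build a CREATE TABLE ... SHALLOW CLONE statement.

    With neither `version` nor `timestamp`, the clone points at the
    source table's latest version. Table names are inlined as-is, so
    they must come from trusted input.
    """
    if version is not None and timestamp is not None:
        raise ValueError("pass either version or timestamp, not both")
    sql = f"CREATE TABLE {target} SHALLOW CLONE {source}"
    if version is not None:
        # Time travel by Delta version number
        sql += f" VERSION AS OF {int(version)}"
    elif timestamp is not None:
        # Time travel by timestamp, inlined as a SQL string literal
        sql += f" TIMESTAMP AS OF '{timestamp}'"
    return sql


# Hypothetical names: clone version 5 of main.poc.orders
stmt = shallow_clone_sql("main.poc.orders_v5", "main.poc.orders", version=5)
print(stmt)
```

Only the clone's metadata is written; its data files remain the source table's files, which is why the VACUUM question below matters.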

The Databricks concepts say:
1) Data in the clone table (the new Delta table) may be impacted or deleted if we perform an operation (such as VACUUM) on the source table.
2) As long as a file is referenced by any Delta table, it won't be deleted.
3) Running VACUUM does not delete data that is still referenced by a Delta table, even after the retention period, because it deletes only old data that is no longer referenced by any Delta table.

So my question is: does VACUUM, or any other operation on the source table, impact the clone table or not? Concept 2 says it won't. Is my understanding correct?

2 REPLIES

Isi
Contributor III

Hey @chsoni12,


Using Unity Catalog (UC):

Databricks tracks metadata dependencies between the source and the clone. This means:

  • VACUUM on the source table will not delete any data files if they are still referenced by the shallow clone.

  • Even after the retention period, VACUUM only removes files not needed by any Delta table, including the clone.

  • So your clone remains safe even if the source table is modified (insert/delete/truncate) or vacuumed.

 

Not using UC:

  • Shallow clones are not protected the same way.

  • If a file is no longer needed by the source table and you run VACUUM, it can be deleted, even if the clone still references it.

  • That would break your clone (queries could fail due to missing files).
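To make the difference between the two cases concrete, here is a toy model in plain Python. It is not how Databricks implements VACUUM: it ignores the retention period and time travel, treats each table as just a set of file names, and the `clone_aware` flag is a hypothetical switch standing in for the UC behaviour described above versus the non-UC case.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical stand-in for a Delta table: the data files it still needs."""
    name: str
    files: set[str] = field(default_factory=set)


def toy_vacuum(source: ToyTable, storage: set[str],
               clones: list[ToyTable], clone_aware: bool) -> set[str]:
    """Delete, from `storage`, the files a VACUUM of `source` would remove.

    Toy simplification: any file the source no longer needs is eligible.
    clone_aware=True also protects files referenced by shallow clones
    (the UC case as described); False protects only the source's files.
    Mutates `storage` in place and returns the deleted file names.
    """
    protected = set(source.files)
    if clone_aware:
        for clone in clones:
            protected |= clone.files
    deleted = storage - protected
    storage -= deleted
    return deleted


# After deletes/truncate, the source only needs part-2,
# while the clone (pinned to an older version) still needs part-0 and part-1.
uc_storage = {"part-0", "part-1", "part-2"}
legacy_storage = set(uc_storage)
source = ToyTable("orders", {"part-2"})
clone = ToyTable("orders_v5", {"part-0", "part-1"})

uc_deleted = toy_vacuum(source, uc_storage, [clone], clone_aware=True)
legacy_deleted = toy_vacuum(source, legacy_storage, [clone], clone_aware=False)
print(sorted(uc_deleted))
print(sorted(legacy_deleted))
print(clone.files <= legacy_storage)  # False: the clone now points at missing files
```

In the clone-aware run nothing is deleted; in the other run the clone's files disappear, which is the "queries could fail due to missing files" situation.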

 

Some limitations:

  • Shallow clones on external tables must be external tables. Shallow clones on managed tables must be managed tables.
  • You cannot use REPLACE or CREATE OR REPLACE to overwrite an existing shallow clone. Instead, DROP the shallow clone and run a new CREATE statement.
  • You cannot nest shallow clones, meaning you cannot make a shallow clone from a shallow clone.
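If you create clones from a script, the three limitations above can be checked up front. This is a minimal sketch: `TableInfo` and `shallow_clone_violations` are hypothetical helpers, and the caller is assumed to already know whether each table is managed, whether the source is itself a shallow clone, and whether the statement would replace an existing clone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableInfo:
    """Hypothetical description of a table, supplied by the caller."""
    name: str
    managed: bool                 # True = managed table, False = external table
    is_shallow_clone: bool = False


def shallow_clone_violations(source: TableInfo, target: TableInfo,
                             replaces_existing_clone: bool) -> list[str]:
    """Return the listed limitations that a planned shallow clone would hit."""
    problems = []
    if source.managed != target.managed:
        problems.append("source and clone must both be managed or both be external")
    if source.is_shallow_clone:
        problems.append("cannot make a shallow clone from a shallow clone")
    if replaces_existing_clone:
        problems.append("cannot REPLACE a shallow clone; DROP it and run CREATE again")
    return problems


src = TableInfo("main.poc.orders", managed=True)
ok = shallow_clone_violations(src, TableInfo("main.poc.orders_v5", managed=True), False)
bad = shallow_clone_violations(
    TableInfo("main.poc.orders_v5", managed=True, is_shallow_clone=True),
    TableInfo("ext_copy", managed=False),
    True,
)
print(ok)
print(len(bad))
```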

 

References: Databricks Docs 

Hope this helps, 🙂

Isi

 

chsoni12
New Contributor II

Thanks 🙂, it really helps a lot. But there is also a limitation with shallow clones: we can clone only the full table data, or the data of a particular Delta version (using a timestamp or version), from the source table. We cannot clone the table data by applying a filter condition on a particular column; for that we need to copy the data. There is no native Databricks approach to achieve this without copying data. Is my understanding correct?
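For comparison, the "copy the data" route for a filtered subset can be a CREATE TABLE AS SELECT over a time-travelled read of the source. The sketch below only builds the SQL string; the table names and the `region = 'EU'` predicate are hypothetical, and the predicate is inlined as raw SQL, so it must come from trusted input.

```python
from typing import Optional


def filtered_copy_sql(target: str, source: str, predicate: str,
                      version: Optional[int] = None) -> str:
    """Build a CTAS that physically copies only the rows matching `predicate`.

    Unlike a shallow clone, running this writes new data files for `target`,
    so later VACUUMs on `source` cannot affect it.
    """
    time_travel = f" VERSION AS OF {int(version)}" if version is not None else ""
    return (f"CREATE TABLE {target} AS "
            f"SELECT * FROM {source}{time_travel} WHERE {predicate}")


copy_stmt = filtered_copy_sql("main.poc.orders_eu_v5", "main.poc.orders",
                              "region = 'EU'", version=5)
print(copy_stmt)
```

The trade-off is the one described in the question: the filtered table is independent of the source, but only because its rows have been copied.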
