
Is it a bug in DEEP CLONE?

norbitek
New Contributor II

Hi,

I'm trying to modify a Delta table using the following approach:

  1. Shallow clone the table (source_table)
  2. Modify the clone (clonned_table)
  3. Deep clone the modified table back over the source table

The source Delta table has 26 752 rows. The current Delta table version is 123.

This is my code:


CREATE TABLE clonned_table SHALLOW CLONE source_table;

ALTER TABLE clonned_table ALTER COLUMN column_1 COMMENT 'a comment';
ALTER TABLE clonned_table SET TBLPROPERTIES ('schema_version' = '8');

REPLACE TABLE source_table DEEP CLONE clonned_table;

I ran the last command 13 times.

After that, I verified the number of rows before and after every replace.

before:

SELECT COUNT(*) FROM source_table VERSION AS OF 123 -> result 26 752 rows - expected value 

after:

SELECT COUNT(*) FROM source_table VERSION AS OF 124 -> result 26 752 rows - expected value 

SELECT COUNT(*) FROM source_table VERSION AS OF 125 -> result 26 752 rows - expected value 

SELECT COUNT(*) FROM source_table VERSION AS OF 126 -> result 0 rows - !!!

SELECT COUNT(*) FROM source_table VERSION AS OF 127 -> result 26 752 rows - expected value 

SELECT COUNT(*) FROM source_table VERSION AS OF 128 -> result 26 752 rows - expected value 

SELECT COUNT(*) FROM source_table VERSION AS OF 129 -> result 0 rows - !!!

SELECT COUNT(*) FROM source_table VERSION AS OF 130 -> result 26 752 rows - expected value 

SELECT COUNT(*) FROM source_table VERSION AS OF 131 -> result 0 rows - !!!

SELECT COUNT(*) FROM source_table VERSION AS OF 132 -> result 26 752 rows - expected value 

SELECT COUNT(*) FROM source_table VERSION AS OF 133 -> result 0 rows - !!!

SELECT COUNT(*) FROM source_table VERSION AS OF 134 -> result 26 752 rows - expected value 

SELECT COUNT(*) FROM source_table VERSION AS OF 135 -> result 0 rows - !!!

SELECT COUNT(*) FROM source_table VERSION AS OF 136 -> result 26 752 rows - expected value 

I tested it on DBR 12.2 LTS, but the last 3 executions used 13.3 LTS.

It looks like DEEP CLONE sometimes doesn't work properly.

What is most interesting: one time DEEP CLONE copied only a few records.

In the Delta log I can see the file attached to each version.

Sometimes it is just an "add" action, and sometimes it is a "remove" and an "add" of the same file, but there is no correlation between these two log patterns and the result.
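
To correlate each of those versions with the operation the engine actually recorded, DESCRIBE HISTORY can be run on the table. A minimal sketch (same table name as above); the operation and operationMetrics columns should show, per version, whether files were copied or only metadata was rewritten:

-- Sketch: list every version of the table with its operation and metrics,
-- so the versions returning 0 rows can be compared against the ones
-- returning 26 752 rows.
DESCRIBE HISTORY source_table;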

2 REPLIES

Kaniz_Fatma
Community Manager

Hi @norbitek, the issue you are experiencing is likely due to the nature of "shallow" cloning. A "shallow" clone in Delta Lake does not copy the data files to the clone target; instead, it creates pointers to the existing data files. This means that if the source data files are deleted or moved, the cloned table cannot access them, resulting in a table with 0 rows.

When you perform a "deep" clone, Delta Lake copies the source table's data to the clone target in addition to the metadata, making the cloned table independent of the source table. That could explain why you see the expected number of rows after performing a deep clone. Something may be happening to the data files in the source table between your "shallow" clone and "deep" clone operations, causing the discrepancy in row counts.
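
One way to sanity-check that theory is to compare the physical layout of the two tables. A minimal sketch using the table names from the original post; DESCRIBE DETAIL reports location, numFiles and sizeInBytes for each table:

-- Sketch: compare the source table and its shallow clone.
-- If the shallow clone can no longer resolve the source's data files,
-- its numFiles / sizeInBytes should make that visible.
DESCRIBE DETAIL source_table;
DESCRIBE DETAIL clonned_table;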

Hi @Kaniz_Fatma, thanks for the response.

I do not modify any data in the source or cloned table between executions of the "deep" clone operation.

I only modify metadata before the deep clone.

No Parquet files are deleted from storage, and no new files appear either.

It looks like this is a metadata-only operation.

Every time, 3 new files are generated in the Delta log: JSON, CRC, and checkpoint.

As I mentioned, I executed the same command each time, but the Delta log differs.

For example:

  • For version 128, which returns 26 752 rows:

"remove":{"path":"part-00000-72d1d631-a961-479a-a7eb-1580e8665d78-c000.snappy.parquet"

"add":{"path":"part-00000-72d1d631-a961-479a-a7eb-1580e8665d78-c000.snappy.parquet"

and all detailed information (including statistics)

 

  • For version 135, which returns 0 rows:

"remove":{"path":"part-00000-72d1d631-a961-479a-a7eb-1580e8665d78-c000.snappy.parquet"

"add":{"path":"part-00000-72d1d631-a961-479a-a7eb-1580e8665d78-c000.snappy.parquet"

and all detailed information (including statistics)

 

  • For version 136, which returns 26 752 rows:

"add":{"path":"part-00000-72d1d631-a961-479a-a7eb-1580e8665d78-c000.snappy.parquet"

and all detailed information (including statistics)
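
For comparing the raw commit files directly, each Delta log entry can also be read as JSON from the table directory. A minimal sketch, assuming a hypothetical location /path/to/source_table (substitute the real one, e.g. from DESCRIBE DETAIL) and the 20-digit zero-padded version number:

-- Sketch: read one commit file of the Delta log directly (path is hypothetical).
-- Each line is one action; the "add" struct carries a stats field (a JSON
-- string containing numRecords), which should explain why a given version
-- exposes 0 rows or 26 752 rows.
SELECT * FROM json.`/path/to/source_table/_delta_log/00000000000000000135.json`;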
