cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Is it a bug in DEEP CLONE?

norbitek
New Contributor II

Hi,

I'm trying to modify a delta table using following approach:

  1. Shallow clone of the table (source_table)
  2. Modification of the the clone (clonned_table)
  3. Deep clone of the modified table to the source table.

Source delta table has 26 752 rows. Current Delta table version is: 123

This is my code:


CREATE TABLE clonned_table SHALLOW CLONE source_table;

ALTER TABLE clonned_table ALTER COLUMN column_1 COMMENT 'a comment';
ALTER TABLE clonned_table SET TBLPROPERTIES ('schema_version' = '8');

REPLACE TABLE source_table DEEP CLONE clonned_table;
I run last command 13 times

After that I verified number of rows before and after every replace.

before:

SELECT COUNT(*) FROM source_table VERSION AS OF 123 -> result 26 752 rows - expected value 

after:

SELECT COUNT(*) FROM source_table VERSION AS OF 124 -> result 26 752 rows - expected value 

SELECT COUNT(*) FROM source_table VERSION AS OF 125 -> result 26 752 rows - expected value 

SELECT COUNT(*) FROM source_table VERSION AS OF 126 -> result 0 rows - !!!

SELECT COUNT(*) FROM source_table VERSION AS OF 127 -> result 26 752 rows - expected value 

SELECT COUNT(*) FROM source_table VERSION AS OF 128 -> result 26 752 rows - expected value 

SELECT COUNT(*) FROM source_table VERSION AS OF 129 -> result 0 rows - !!!

SELECT COUNT(*) FROM source_table VERSION AS OF 130 -> result 26 752 rows - expected value 

SELECT COUNT(*) FROM source_table VERSION AS OF 131 -> result 0 rows - !!!

SELECT COUNT(*) FROM source_table VERSION AS OF 132 -> result 26 752 rows - expected value 

SELECT COUNT(*) FROM source_table VERSION AS OF 133 -> result 0 rows - !!!

SELECT COUNT(*) FROM source_table VERSION AS OF 134 -> result 26 752 rows - expected value 

SELECT COUNT(*) FROM source_table VERSION AS OF 135 -> result 0 rows - !!!

SELECT COUNT(*) FROM source_table VERSION AS OF 136 -> result 26 752 rows - expected value 

I tested it on DB 12.2 LTS but last 3 execution using 13.3 LTS

It looks like DEEP CLONE sometime doesn't work properly.

What is the most interesting one time DEEP CLONE copied a few records.

In the delta log file I can see file attached to the version

Sometime it is just "add" command but sometime it is "remove" and "add" the same file but there is no relation between these two types of log and result.

1 REPLY 1

Hi @Retired_mod thanks for the response.

I do not modify data in the source and cloned table between executions of "deep" clone operations.

I modify metadata only before deep cloning

Parquet file is not deleted from the storage at all there is no new files as well.

It looks like this is metadata operation.

Every time 3 new files are generated in the delta log: json, crc and checkpoint

As I mentioned I executed the same command but delta log is different

for example:

  • for version 128 that returns 26 752 rows

"remove":{"path":"part-00000-72d1d631-a961-479a-a7eb-1580e8665d78-c000.snappy.parquet"

"add":{"path":"part-00000-72d1d631-a961-479a-a7eb-1580e8665d78-c000.snappy.parquet"

and all detailed information (including statistics)

 

  • for version 135 that returns 0 rows

"remove":{"path":"part-00000-72d1d631-a961-479a-a7eb-1580e8665d78-c000.snappy.parquet"

"add":{"path":"part-00000-72d1d631-a961-479a-a7eb-1580e8665d78-c000.snappy.parquet"

and all detailed information (including statistics)

 

  • for version 136 that returns 26 752 rows 

"add":{"path":"part-00000-72d1d631-a961-479a-a7eb-1580e8665d78-c000.snappy.parquet"

and all detailed information (including statistics)

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group