Is it a bug in DEEP CLONE?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-06-2023 03:14 AM
Hi,
I'm trying to modify a delta table using following approach:
- Shallow clone of the table (source_table)
- Modification of the the clone (clonned_table)
- Deep clone of the modified table to the source table.
Source delta table has 26 752 rows. Current Delta table version is: 123
This is my code:
CREATE TABLE clonned_table SHALLOW CLONE source_table;
ALTER TABLE clonned_table ALTER COLUMN column_1 COMMENT 'a comment';
ALTER TABLE clonned_table SET TBLPROPERTIES ('schema_version' = '8');
REPLACE TABLE source_table DEEP CLONE clonned_table;
I run last command 13 times
After that I verified number of rows before and after every replace.
before:
SELECT COUNT(*) FROM source_table VERSION AS OF 123 -> result 26 752 rows - expected value
after:
SELECT COUNT(*) FROM source_table VERSION AS OF 124 -> result 26 752 rows - expected value
SELECT COUNT(*) FROM source_table VERSION AS OF 125 -> result 26 752 rows - expected value
SELECT COUNT(*) FROM source_table VERSION AS OF 126 -> result 0 rows - !!!
SELECT COUNT(*) FROM source_table VERSION AS OF 127 -> result 26 752 rows - expected value
SELECT COUNT(*) FROM source_table VERSION AS OF 128 -> result 26 752 rows - expected value
SELECT COUNT(*) FROM source_table VERSION AS OF 129 -> result 0 rows - !!!
SELECT COUNT(*) FROM source_table VERSION AS OF 130 -> result 26 752 rows - expected value
SELECT COUNT(*) FROM source_table VERSION AS OF 131 -> result 0 rows - !!!
SELECT COUNT(*) FROM source_table VERSION AS OF 132 -> result 26 752 rows - expected value
SELECT COUNT(*) FROM source_table VERSION AS OF 133 -> result 0 rows - !!!
SELECT COUNT(*) FROM source_table VERSION AS OF 134 -> result 26 752 rows - expected value
SELECT COUNT(*) FROM source_table VERSION AS OF 135 -> result 0 rows - !!!
SELECT COUNT(*) FROM source_table VERSION AS OF 136 -> result 26 752 rows - expected value
I tested it on DB 12.2 LTS but last 3 execution using 13.3 LTS
It looks like DEEP CLONE sometime doesn't work properly.
What is the most interesting one time DEEP CLONE copied a few records.
In the delta log file I can see file attached to the version
Sometime it is just "add" command but sometime it is "remove" and "add" the same file but there is no relation between these two types of log and result.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-06-2023 07:42 AM
Hi @Retired_mod thanks for the response.
I do not modify data in the source and cloned table between executions of "deep" clone operations.
I modify metadata only before deep cloning
Parquet file is not deleted from the storage at all there is no new files as well.
It looks like this is metadata operation.
Every time 3 new files are generated in the delta log: json, crc and checkpoint
As I mentioned I executed the same command but delta log is different
for example:
- for version 128 that returns 26 752 rows
"remove":{"path":"part-00000-72d1d631-a961-479a-a7eb-1580e8665d78-c000.snappy.parquet"
"add":{"path":"part-00000-72d1d631-a961-479a-a7eb-1580e8665d78-c000.snappy.parquet"
and all detailed information (including statistics)
- for version 135 that returns 0 rows
"remove":{"path":"part-00000-72d1d631-a961-479a-a7eb-1580e8665d78-c000.snappy.parquet"
"add":{"path":"part-00000-72d1d631-a961-479a-a7eb-1580e8665d78-c000.snappy.parquet"
and all detailed information (including statistics)
- for version 136 that returns 26 752 rows
"add":{"path":"part-00000-72d1d631-a961-479a-a7eb-1580e8665d78-c000.snappy.parquet"
and all detailed information (including statistics)