Difference between "deep clone ..." and as "select * from ..."

camilo_s — Mon, 28 Oct 2024 08:42:18 GMT

Hi all,

I was trying to deep clone one of the sample tables provided with a parametrized query:

create table if not exists IDENTIFIER(:target_catalog || :target_schema || :table_name) DEEP CLONE IDENTIFIER('samples.tpch.' || :table_name)

But Databricks SQL throws the error:

[DELTA_CLONE_UNSUPPORTED_SOURCE] Unsupported DEEP clone source ''PlanWithUnresolvedIdentifier concat(samples.tpch., ), org.apache.spark.sql.catalyst.parser.AstBuilder$$Lambda$12333/0x00007f4304d55708@4e795225 ', whose format is Unknown. The supported formats are 'delta', 'iceberg' and 'parquet'.

For testing purposes, I also tried hard-coding the source table's full name (e.g. deep clone samples.tpch.partsupp), to no avail (for the record: the query works when both source and target table full names are hard-coded)

I came up with the following workaround, which does work without problems

create table if not exists IDENTIFIER(:target_catalog || '.' || :target_schema || '.' || :table_name) as select * from IDENTIFIER('samples.tpch.' || :table_name)

This leads me to the question: What is the difference between using deep clone an as select * from for cloning tables?

Re: Difference between "deep clone ..." and as "select * from ..."

skarpeck — Mon, 28 Oct 2024 10:19:35 GMT

Deep clone will also clone all the metadata (e.g. indexes, properties, history, etc.), while SELECT * will create a new, fresh Delta Table, with it's own history and properties.

topic Difference between "deep clone ..." and as "select * from ..." in Data Engineering

Difference between "deep clone ..." and as "select * from ..."

Re: Difference between "deep clone ..." and as "select * from ..."