Update: Hey moderator, I've removed the link to the Bobby Tables XKCD to make clear that this post is not spam 😉
Hi, I'm somehow unable to write a parametrized query to create a DEEP CLONE. I'm trying really hard to avoid using string interpolation (to prevent SQL injection), to no avail.
I'm able to run this unparametrized query:
CREATE OR REPLACE TABLE catalog_1.schema_2.table_1
DEEP CLONE catalog_1.schema_1.table_1
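For context, wrapping this query with plain f-string interpolation would be trivial, and it's exactly what I'm trying to avoid. A minimal sketch of that unsafe version (function name and parameters are just my placeholders):

```python
# What I'm trying NOT to do: build the query via string interpolation.
# If any of these name parts ever came from untrusted input, this would be
# an SQL injection vector (hello, Bobby Tables).
def clone_table_unsafe(source_catalog, source_schema, source_table,
                       target_catalog, target_schema, target_table):
    return (
        f"CREATE OR REPLACE TABLE {target_catalog}.{target_schema}.{target_table} "
        f"DEEP CLONE {source_catalog}.{source_schema}.{source_table}"
    )

# Would then be executed as:
# spark.sql(clone_table_unsafe("catalog_1", "schema_1", "table_1",
#                              "catalog_1", "schema_2", "table_1"))
```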
What I'd really like to do is parametrize the table identifiers involved, so I can wrap this in a function that clones based on the parameters I provide.
According to the documentation, since the query is DDL, I should use an IDENTIFIER clause with named parameter markers to reference the tables dynamically. I've extrapolated from the third example in the linked documentation, but it just doesn't work 😱
def clone_table(
    source_catalog,
    source_schema,
    source_table,
    target_catalog,
    target_schema,
    target_table,
):
    _clone_table_query = """
        CREATE OR REPLACE TABLE IDENTIFIER(:target_catalog || '.' || :target_schema || '.' || :target_table)
        DEEP CLONE IDENTIFIER(:source_catalog || '.' || :source_schema || '.' || :source_table)
    """
    spark.sql(
        _clone_table_query,
        {
            "source_catalog": source_catalog,
            "source_schema": source_schema,
            "source_table": source_table,
            "target_catalog": target_catalog,
            "target_schema": target_schema,
            "target_table": target_table,
        },
    )
Then I execute the function with the same values I had in the unparametrized query, but I get an error:
clone_table("catalog_1", "schema_1", "table_1", "catalog_1", "schema_2", "table_1")
[DELTA_CLONE_UNSUPPORTED_SOURCE] Unsupported DEEP clone source ''PlanWithUnresolvedIdentifier concat(concat(concat(concat(catalog_1, .), schema_1), .), table_1), org.apache.spark.sql.catalyst.parser.AstBuilder$$Lambda$5725/1148071365@723da234
', whose format is Unknown.
The supported formats are 'delta', 'iceberg' and 'parquet'. SQLSTATE: 0AKDC
Is it just me, or is there really no way around string interpolation and Bobby tables?