Parametrizing query for DEEP CLONE

camilo_s — Thu, 27 Jun 2024 14:43:59 GMT

Update: Hey moderator, I've removed the link to the Bobby tables XKCD to reassure that this post is not spam 😉

Hi, I'm somehow unable to write a parametrized query to create a DEEP CLONE. I'm trying really hard to avoid using string interpolation (to prevent SQL injection), to no avail.

I'm able to run this unparametrized query:

CREATE OR REPLACE TABLE catalog_1.schema_2.table_1 DEEP CLONE catalog_1.schema_1.table_1

Now, what I'd really like to do would be to parametrize the entities involved, so I can wrap this with a function that clones based on the parameters I provide.

Now, according to the documentation, since the query uses DDL, I should use an IDENTIFIER clause with named parameter markers to enable a dynamic reference to the tables. I've extrapolated the third example in the linked documentation, but it just doesn't work 😱

def clone_table( source_catalog, source_schema, source_table, target_catalog, target_schema, target_table, ): _clone_table_query = """ CREATE OR REPLACE TABLE IDENTIFIER(:target_catalog || '.' || :target_schema || '.' || :target_table) DEEP CLONE IDENTIFIER(:source_catalog || '.' || :source_schema || '.' || :source_table) """ spark.sql( _clone_table_query, { "source_catalog": source_catalog, "source_schema": source_schema, "source_table": source_table, "target_catalog": target_catalog, "target_schema": target_schema, "target_table": target_table, }, )

Then I try to execute the function with the values I had in the unparametrized query, but I get an error:

clone_table("catalog_1", "schema_1", "table_1", "catalog_1", "schema_2", "table_1")

[DELTA_CLONE_UNSUPPORTED_SOURCE] Unsupported DEEP clone source ''PlanWithUnresolvedIdentifier concat(concat(concat(concat(catalog_1, .), schema_1), .), table_1), org.apache.spark.sql.catalyst.parser.AstBuilder$$Lambda$5725/1148071365@723da234 ', whose format is Unknown. The supported formats are 'delta', 'iceberg' and 'parquet'. SQLSTATE: 0AKDC

Is it just me, or is there really no way around string interpolation and Bobby tables?

topic Parametrizing query for DEEP CLONE in Data Engineering

Parametrizing query for DEEP CLONE