Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Parametrizing query for DEEP CLONE

camilo_s
Contributor

Update: Hey moderator, I've removed the link to the Bobby Tables XKCD to reassure you that this post is not spam 😉

Hi, I'm somehow unable to write a parametrized query that creates a DEEP CLONE. I'm trying hard to avoid string interpolation (to prevent SQL injection), but to no avail.

I'm able to run this unparametrized query:

CREATE OR REPLACE TABLE catalog_1.schema_2.table_1
DEEP CLONE catalog_1.schema_1.table_1

What I'd really like is to parametrize the entities involved, so I can wrap this in a function that clones based on the parameters I provide.

According to the documentation, since this statement is DDL, I should use an IDENTIFIER clause with named parameter markers to reference the tables dynamically. I've extrapolated from the third example in the linked documentation, but it just doesn't work 😱
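For context, the documented IDENTIFIER pattern I'm extrapolating from binds the whole table name through a single parameter marker, roughly like this (a sketch; :mytab is my own placeholder name, bound client-side):

```sql
-- Parametrized table reference via a single named marker,
-- which the docs show working for e.g. SELECT:
SELECT * FROM IDENTIFIER(:mytab)
```

My attempt below concatenates the three name parts inside IDENTIFIER instead.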

def clone_table(
    source_catalog,
    source_schema,
    source_table,
    target_catalog,
    target_schema,
    target_table,
):
    _clone_table_query = """
        CREATE OR REPLACE TABLE IDENTIFIER(:target_catalog || '.' || :target_schema || '.' || :target_table)
        DEEP CLONE IDENTIFIER(:source_catalog || '.' || :source_schema || '.' || :source_table)
    """
    spark.sql(
        _clone_table_query,
        {
            "source_catalog": source_catalog,
            "source_schema": source_schema,
            "source_table": source_table,
            "target_catalog": target_catalog,
            "target_schema": target_schema,
            "target_table": target_table,
        },
    )

Then I call the function with the same values I had in the unparametrized query, but I get an error:

clone_table("catalog_1", "schema_1", "table_1", "catalog_1", "schema_2", "table_1")
[DELTA_CLONE_UNSUPPORTED_SOURCE] Unsupported DEEP clone source ''PlanWithUnresolvedIdentifier concat(concat(concat(concat(catalog_1, .), schema_1), .), table_1), org.apache.spark.sql.catalyst.parser.AstBuilder$$Lambda$5725/1148071365@723da234
', whose format is Unknown.
The supported formats are 'delta', 'iceberg' and 'parquet'. SQLSTATE: 0AKDC

Is it just me, or is there really no way around string interpolation and Bobby Tables?
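For now, the only injection-safe fallback I can come up with is validating each name part before interpolating — a sketch (the regex allow-list is my own and stricter than what Databricks actually permits for backtick-quoted identifiers):

```python
import re

# Strict allow-list for one identifier part (my assumption; backtick-quoted
# Databricks identifiers can contain more characters than this).
_IDENT_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def safe_qualified_name(catalog: str, schema: str, table: str) -> str:
    """Validate each part, then join them with backtick quoting."""
    parts = (catalog, schema, table)
    for part in parts:
        if not _IDENT_RE.match(part):
            raise ValueError(f"invalid identifier part: {part!r}")
    return ".".join(f"`{p}`" for p in parts)

# The clone statement would then be built by interpolating the validated,
# quoted names -- which is exactly the pattern I was hoping to avoid:
# spark.sql(f"CREATE OR REPLACE TABLE {safe_qualified_name(...)} "
#           f"DEEP CLONE {safe_qualified_name(...)}")
```

It works, but it's still interpolation with extra steps, so I'd prefer a genuinely parametrized solution.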
