Parametrizing query for DEEP CLONE

camilo_s
New Contributor III

Update: Hey moderator, I've removed the link to the Bobby Tables XKCD to reassure you that this post is not spam 😉

Hi, I'm somehow unable to write a parametrized query to create a DEEP CLONE. I'm trying really hard to avoid using string interpolation (to prevent SQL injection), to no avail.

I'm able to run this unparametrized query:

CREATE OR REPLACE TABLE catalog_1.schema_2.table_1
DEEP CLONE catalog_1.schema_1.table_1

What I'd really like to do is parametrize the entities involved, so I can wrap this in a function that clones based on the parameters I provide.

According to the documentation, since the query uses DDL, I should use an IDENTIFIER clause with named parameter markers to enable a dynamic reference to the tables. I've extrapolated from the third example in the linked documentation, but it just doesn't work 😱

def clone_table(
    source_catalog,
    source_schema,
    source_table,
    target_catalog,
    target_schema,
    target_table,
):
    _clone_table_query = """
        CREATE OR REPLACE TABLE IDENTIFIER(:target_catalog || '.' || :target_schema || '.' || :target_table)
        DEEP CLONE IDENTIFIER(:source_catalog || '.' || :source_schema || '.' || :source_table)
    """
    spark.sql(
        _clone_table_query,
        {
            "source_catalog": source_catalog,
            "source_schema": source_schema,
            "source_table": source_table,
            "target_catalog": target_catalog,
            "target_schema": target_schema,
            "target_table": target_table,
        },
    )

Then I try to execute the function with the values I had in the unparametrized query, but I get an error:

clone_table("catalog_1", "schema_1", "table_1", "catalog_1", "schema_2", "table_1")

[DELTA_CLONE_UNSUPPORTED_SOURCE] Unsupported DEEP clone source ''PlanWithUnresolvedIdentifier concat(concat(concat(concat(catalog_1, .), schema_1), .), table_1), org.apache.spark.sql.catalyst.parser.AstBuilder$$Lambda$5725/1148071365@723da234', whose format is Unknown.
The supported formats are 'delta', 'iceberg' and 'parquet'. SQLSTATE: 0AKDC

Is it just me, or is there really no way around string interpolation and Bobby tables?

1 REPLY

Kaniz_Fatma
Community Manager

Hi @camilo_s, I appreciate your efforts to avoid string interpolation and prevent SQL injection. The approach you’ve taken using the IDENTIFIER clause with named parameter markers is indeed the right direction. However, there’s a subtle issue in your code snippet.

The issue lies in how you're constructing the table identifiers within the query. Instead of concatenating the name parts with || inside IDENTIFIER(), build the fully qualified name outside the SQL text and pass it as a single parameter marker.
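For example, here's a minimal sketch of what I mean (assuming a Databricks Runtime where IDENTIFIER accepts a named parameter marker, e.g. 13.3 LTS or later; the function and parameter names are illustrative):

def clone_table(source_table: str, target_table: str) -> None:
    # Each argument is a fully qualified name like "catalog.schema.table".
    # IDENTIFIER resolves the parameter as an identifier, not as SQL text,
    # so there is no string interpolation (and no Bobby Tables) involved.
    spark.sql(
        """
        CREATE OR REPLACE TABLE IDENTIFIER(:target_table)
        DEEP CLONE IDENTIFIER(:source_table)
        """,
        {
            "source_table": source_table,
            "target_table": target_table,
        },
    )

clone_table("catalog_1.schema_1.table_1", "catalog_1.schema_2.table_1")

If you'd rather keep the six separate parameters, you can join them with "." in Python before the call; the concatenation then happens outside the SQL statement, so each parameter marker still carries a single constant string.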
