Databricks Community

camilo_s · ‎06-27-2024

Update: Hey moderator, I've removed the link to the Bobby tables XKCD to reassure that this post is not spam 😉

Hi, I'm somehow unable to write a parametrized query to create a DEEP CLONE. I'm trying really hard to avoid using string interpolation (to prevent SQL injection), to no avail.

I'm able to run this unparametrized query:

CREATE OR REPLACE TABLE catalog_1.schema_2.table_1
DEEP CLONE catalog_1.schema_1.table_1

Now, what I'd really like to do would be to parametrize the entities involved, so I can wrap this with a function that clones based on the parameters I provide.

Now, according to the documentation, since the query uses DDL, I should use an IDENTIFIER clause with named parameter markers to enable a dynamic reference to the tables. I've extrapolated the third example in the linked documentation, but it just doesn't work 😱

def clone_table(
    source_catalog,
    source_schema,
    source_table,
    target_catalog,
    target_schema,
    target_table,
):
    _clone_table_query = """
        CREATE OR REPLACE TABLE IDENTIFIER(:target_catalog || '.' || :target_schema || '.' || :target_table)
        DEEP CLONE IDENTIFIER(:source_catalog || '.' || :source_schema || '.' || :source_table)
    """
    spark.sql(
        _clone_table_query,
        {
            "source_catalog": source_catalog,
            "source_schema": source_schema,
            "source_table": source_table,
            "target_catalog": target_catalog,
            "target_schema": target_schema,
            "target_table": target_table,
        },
    )

Then I try to execute the function with the values I had in the unparametrized query, but I get an error:

clone_table("catalog_1", "schema_1", "table_1", "catalog_1", "schema_2", "table_1")

[DELTA_CLONE_UNSUPPORTED_SOURCE] Unsupported DEEP clone source ''PlanWithUnresolvedIdentifier concat(concat(concat(concat(catalog_1, .), schema_1), .), table_1), org.apache.spark.sql.catalyst.parser.AstBuilder$$Lambda$5725/1148071365@723da234
', whose format is Unknown.
The supported formats are 'delta', 'iceberg' and 'parquet'. SQLSTATE: 0AKDC