Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Using parameters in a SQL Notebook and COPY INTO statement

SamGreene
Contributor
Hi,
 
My scenario is that an export of a table is dropped in ADLS every day.  I would like to load this data into a UC table and then repeat the process every day, replacing the data.  This seems to rule out DLT, as it is meant for incremental processing, and if I remember correctly, it didn't detect the new file or even attempt to load it.  I switched to a SQL notebook and got the code working with a hardcoded 'catalog.schema'.  Now I suppose I need to parameterize this, unless there is some other way to set the schema context through the workflow/job.  We used parameter markers/widgets and the first few statements work, but the COPY INTO statement throws this error: [DELTA_COPY_INTO_TARGET_FORMAT] COPY INTO target must be a Delta table.  Thanks for your help.
 
CREATE TABLE IF NOT EXISTS IDENTIFIER(:catalog_name || '.' || :schema_name || '.' || 'my_table_raw');
 
DELETE FROM IDENTIFIER(:catalog_name || '.' || :schema_name || '.' || 'my_table_raw');

COPY INTO IDENTIFIER(:catalog_name || '.' || :schema_name || '.' || 'my_table_raw')
FROM '/Volumes/path/to/file/my_table_export.parquet'
FILEFORMAT = PARQUET
FORMAT_OPTIONS ('mergeSchema' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true','force' = 'true');
 
[DELTA_COPY_INTO_TARGET_FORMAT] COPY INTO target must be a Delta table.

4 REPLIES

daniel_sahal
Esteemed Contributor

@SamGreene 
Simply write your SQL queries as Python variables and then run them through

spark.sql(qry)
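
For example, something along these lines (a rough sketch only; the widget names, table name, and file path simply mirror the ones in the original post):

%python
# Read the target catalog/schema from notebook widgets (a job can override these values).
catalog = dbutils.widgets.get("catalog")
schema = dbutils.widgets.get("schema")
table = f"{catalog}.{schema}.my_table_raw"

# Build each statement as a Python string and run it with spark.sql().
spark.sql(f"CREATE TABLE IF NOT EXISTS {table}")
spark.sql(f"DELETE FROM {table}")
spark.sql(f"""
  COPY INTO {table}
  FROM '/Volumes/path/to/file/my_table_export.parquet'
  FILEFORMAT = PARQUET
  FORMAT_OPTIONS ('mergeSchema' = 'true')
  COPY_OPTIONS ('mergeSchema' = 'true', 'force' = 'true')
""")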

Thanks for the suggestion, but we are using SQL in these notebooks, and the Databricks documentation says COPY INTO supports the IDENTIFIER function.  I need to find a way to parameterize SQL notebooks so they can run against different catalogs/schemas. 

Cary
Databricks Employee

I would use widgets in the notebook, which will work when the notebook runs in a Job.  SQL in notebooks can use parameters, as can the SQL in jobs, since parameterized queries are now supported.
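
For instance (a sketch only; the widget names and defaults mirror the original post, and job parameters can override the defaults):

-- Declare text widgets directly in a SQL cell.
CREATE WIDGET TEXT catalog_name DEFAULT "my_business_app";
CREATE WIDGET TEXT schema_name DEFAULT "dev";

-- Later cells can read the widget values as named parameter markers.
SELECT * FROM IDENTIFIER(:catalog_name || '.' || :schema_name || '.' || 'my_table_raw') LIMIT 10;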

SamGreene
Contributor

The solution that worked was adding this Python cell to the notebook:

 

%python
from pyspark.dbutils import DBUtils

dbutils = DBUtils(spark)

dbutils.widgets.text("catalog", "my_business_app")
dbutils.widgets.text("schema", "dev")
 
Then in the SQL Cell:
 
CREATE TABLE IF NOT EXISTS ${catalog}.${schema}.my_table_name;
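
Presumably the remaining statements from the original question follow the same pattern, with ${catalog}.${schema} substituted as plain text before the statement is parsed, so COPY INTO sees a literal Delta table name (a sketch, not tested; the table name and path are placeholders from the earlier posts):

DELETE FROM ${catalog}.${schema}.my_table_name;

COPY INTO ${catalog}.${schema}.my_table_name
FROM '/Volumes/path/to/file/my_table_export.parquet'
FILEFORMAT = PARQUET
FORMAT_OPTIONS ('mergeSchema' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true', 'force' = 'true');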
