Data Engineering

Using parameters in a SQL Notebook and COPY INTO statement

SamGreene
Contributor
Hi,
 
My scenario is that an export of a table is dropped into ADLS every day. I would like to load this data into a UC table, replacing its contents, and repeat the process daily. This seems to rule out DLT, which is meant for incremental processing; if I remember correctly, it didn't detect the new file or even attempt to load it. I switched to a SQL notebook and got the code working with a hardcoded 'catalog.schema'. Now I suppose I need to parameterize this, unless there is some other way to set the schema context through the workflow/job. We used the parameter markers/widgets and the first few statements work, but the COPY INTO statement throws this error: [DELTA_COPY_INTO_TARGET_FORMAT] COPY INTO target must be a Delta table. Thanks for your help.
 
CREATE TABLE IF NOT EXISTS IDENTIFIER(:catalog_name || '.' || :schema_name || '.' || 'my_table_raw');
 
DELETE FROM IDENTIFIER(:catalog_name || '.' || :schema_name || '.' || 'my_table_raw');

COPY INTO IDENTIFIER(:catalog_name || '.' || :schema_name || '.' || 'my_table_raw')
FROM '/Volumes/path/to/file/my_table_export.parquet'
FILEFORMAT = PARQUET
FORMAT_OPTIONS ('mergeSchema' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true','force' = 'true');
 
[DELTA_COPY_INTO_TARGET_FORMAT] COPY INTO target must be a Delta table.

4 REPLIES

daniel_sahal
Esteemed Contributor

@SamGreene 
Simply write your SQL queries as Python variables and then run them through

spark.sql(qry)
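
A minimal sketch of this approach (not part of the original reply), assuming widgets named catalog_name and schema_name with hypothetical defaults, and reusing the table name and file path from the question:

%python
# Define widgets (a job run can override these defaults by passing notebook
# parameters with the same names), then build the COPY INTO statement as a
# plain string and run it with spark.sql(). The substitution happens before
# parsing, so the target is an ordinary table reference, not an IDENTIFIER() call.
dbutils.widgets.text("catalog_name", "my_catalog")  # hypothetical default
dbutils.widgets.text("schema_name", "dev")          # hypothetical default

catalog = dbutils.widgets.get("catalog_name")
schema = dbutils.widgets.get("schema_name")

qry = f"""
COPY INTO {catalog}.{schema}.my_table_raw
FROM '/Volumes/path/to/file/my_table_export.parquet'
FILEFORMAT = PARQUET
FORMAT_OPTIONS ('mergeSchema' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true', 'force' = 'true')
"""
spark.sql(qry)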

SamGreene
Contributor

Thanks for the suggestion, but we are using SQL in these notebooks, and the Databricks documentation says COPY INTO supports using the IDENTIFIER function. I need to find a way to parameterize SQL notebooks so I can run them against different catalogs/schemas.

Cary
New Contributor III (Accepted Solution)

I would use widgets in the notebook, which will also work when it runs in Jobs. SQL in notebooks can use parameters, and so can SQL in jobs now that parameterized queries are supported.

SamGreene
Contributor (Accepted Solution)

The solution that worked was adding this Python cell to the notebook:

 

%python
from pyspark.dbutils import DBUtils

dbutils = DBUtils(spark)

# Create text widgets with default values; a job can override them by passing
# notebook parameters with the same names.
dbutils.widgets.text("catalog", "my_business_app")
dbutils.widgets.text("schema", "dev")
 
Then in the SQL Cell:
 
CREATE TABLE IF NOT EXISTS ${catalog}.${schema}.my_table_name;
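
Presumably the same ${catalog}.${schema} substitution then carries through to the remaining statements from the original post, for example the COPY INTO; a sketch combining the accepted answer with the question's statement, not separately verified:

COPY INTO ${catalog}.${schema}.my_table_raw
FROM '/Volumes/path/to/file/my_table_export.parquet'
FILEFORMAT = PARQUET
FORMAT_OPTIONS ('mergeSchema' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true', 'force' = 'true');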
