cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Databricks sql variables and if/else workflow

JKR
Contributor

I have 2 tasks in databricks job workflow first task is of type SQL and SQL task is query.In that query I've declared 2 variables and SET the values by running query.e.g:

DECLARE VARIABLE max_timestamp TIMESTAMP DEFAULT '1970-01-01';
SET VARIABLE max_timestamp = (select max(timestamp) from Table);
select max_timestamp ;

Can I pass this max_timestamp variable to the next if/else task to check if it is null or is there any value in this variable and pass that value of max_timestamp to another task. or is there any workaround to do this? I want to use SQL warehouse for this.

I created a SQL task and assign max timestamp value from subquery to the SQL variable and also created if/else task and in condition I've added {{tasks. [task_name]. values.[value_name]}}, but the job throw below error on the if/else task.

run failed with error message

Failed to resolve references: Task value referenced by 'tasks.task_name.values.max_timestamp' was not found.

Query based dropdown list option in databricks sql editor will execute the query multiple times if I use the variable at multiple places in a big sql query ? If yes then it will be executing query to max_timestamp everytime I refer in the query.

1 REPLY 1

Kaniz_Fatma
Community Manager
Community Manager

Hi @JKR, You can indeed pass variables between tasks in a Databricks job workflow. This is done using the taskValues subutility in Databricks Utilities. This utility allows tasks to output values that can be referenced in subsequent tasks.

 

However, it’s important to note that the error message you’re seeing, Failed to resolve references: Task value referenced by 'tasks.task_name.values.max_timestamp' was not found, suggests that the max_timestamp value was not set correctly in the SQL task, or there might be a mismatch in the task name or value name.

 

Regarding your question about the query-based dropdown list option in Databricks SQL editor, it appe.... If performance is a concern due to the repeated execution of the query, you might want to consider alternative strategies for managing your data.

 

Please note that the use of SQL variables like DECLARE and SET is only supported in Databricks runti.... If you’re using SQL Data Warehouse, you might need to use a Databricks cluster with a SQL notebook4. Also, the {{tasks.[task_name].values.[value_name]}} syntax is supported only in Python notebooks.

Join 100K+ Data Experts: Register Now & Grow with Us!

Excited to expand your horizons with us? Click here to Register and begin your journey to success!

Already a member? Login and join your local regional user group! If there isn’t one near you, fill out this form and we’ll create one for you to join!