cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

SET configuration in SQL DLT pipeline does not work

MarkD
New Contributor II

Hi,

I'm trying to set a dynamic value to use in a DLT query, and the code from the example documentation does not work.

SET startDate='2020-01-01';

CREATE OR REFRESH LIVE TABLE filtered
AS SELECT * FROM my_table
WHERE created_at > ${startDate};

It is getting an error because it plugs in an empty value for the variable in the query.

[PARSE_SYNTAX_ERROR] Syntax error at or near end of input. SQLSTATE: 42601 (line 3, pos 19)

== SQL ==
CREATE OR REFRESH LIVE TABLE filtered
AS SELECT * FROM my_table
WHERE created_at > 
-------------------^^^

It seems like the documentation is wrong. Is there a different way to do this?

Thanks!

8 REPLIES 8

anardinelli
Databricks Employee
Databricks Employee

Hi @MarkD how are you?

The `SET` command alone is typically used to change the settings of a session, such as configuration parameters.

To define a variable inside a SQL statement, you can use SET VAR <variablename> = <variablevalue> or SET VARIABLE <variablename> = <variablevalue> to define, as on the example below:

-- Set a SQL variable to a value using SET VAR
> SET VAR myvar1 = 5;
> VALUES (myvar1);
  5

-- Set a SQL variable to 7 using the SET VARIABLE command
> SET VARIABLE myvar1 = 7;
> VALUES (myvar1);
  7

Please relate to the SET VAR documentation for further reference: https://docs.databricks.com/ja/sql/language-manual/sql-ref-syntax-aux-set-variable.html

Take care!

MarkD
New Contributor II

The example I gave is directly from Databrick's DLT documentation, but as you say does not appear the be the right way to set a variable and doesn't work.

Unfortunately SET VAR is not supported in Delta Live Tables. 

anardinelli
Databricks Employee
Databricks Employee

For DLT pipelines, please use the "Advanced" panel, on the pipeline menu, to set your variables:

anardinelli_0-1715891788808.png

That will set up a new configuration value that can be referenced inside your code as a variable:

CREATE OR REFRESH LIVE TABLE filtered
AS SELECT * FROM my_table
WHERE created_at > ${startDate};

Would this solve your issue? Also, can you please share the documentation from DLT that states that the SET command can be used to create variables?

Alessandro

MarkD
New Contributor II

I've used the advanced configuration option for other variables, but in this case I wanted something dynamic. I want to `SET startDate = current_time()` and then use `startDate` through all my queries to make sure they are all using the same time comparison in their queries.

Here's the documentation that shows using "SET" as I did in the initial question. https://docs.databricks.com/en/delta-live-tables/sql-ref.html#set-configuration-values-for-a-table-o...

Icassatti
New Contributor III

@MarkD I beleve that you don't need use a var like that.

What do you think about use only a function current_time(), like that:

CREATE OR REFRESH LIVE TABLE filtered
AS SELECT * FROM my_table
WHERE created_at > current_time();

MarkD
New Contributor II

The reason I want to use a variable is so that I can use the same cut off time in multiple queries to make sure to make sure that I don't get records in one query with a later date than another.

For example

CREATE OR REFRESH LIVE TABLE filtered_one
AS SELECT * FROM my_table_one
WHERE created_at > current_time();

CREATE OR REFRESH LIVE TABLE filtered_two
AS SELECT * FROM my_table_two
WHERE created_at > current_time();

If I want to compare records in `filtered_one` to the results in `filtered_two` I can't rely on them having used the same value for `current_time()`.

Icassatti
New Contributor III

Have you considered use checkpoint?
If you do this maybe empty dataset comes becouse there are no date in future... created_at > current_time().

Read more about [Change Data Feed] Enable it in your base table and then capture only new changes feed.

DLT Doc 

Stream Apply Changes

Hkesharwani
Contributor II

Hi @MarkD ,
You may use 

 

set variable_name.var= '1900-01-01'

to set the value of variable and in order to use the value of variable 
use 

${automated_date.var}

 


Example:

 

set automated_date.var= '1800-01-01'
select * from my table where date = CAST(${automated_date.var} as DATE)

 

Harshit Kesharwani
Data engineer at Rsystema

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group