cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

SET configuration in SQL DLT pipeline does not work

MarkD
New Contributor II

Hi,

I'm trying to set a dynamic value to use in a DLT query, and the code from the example documentation does not work.

SET startDate='2020-01-01';

CREATE OR REFRESH LIVE TABLE filtered
AS SELECT * FROM my_table
WHERE created_at > ${startDate};

It is getting an error because it plugs in an empty value for the variable in the query.

[PARSE_SYNTAX_ERROR] Syntax error at or near end of input. SQLSTATE: 42601 (line 3, pos 19)

== SQL ==
CREATE OR REFRESH LIVE TABLE filtered
AS SELECT * FROM my_table
WHERE created_at > 
-------------------^^^

It seems like the documentation is wrong. Is there a different way to do this?

Thanks!

8 REPLIES 8

anardinelli
New Contributor III
New Contributor III

Hi @MarkD how are you?

The `SET` command alone is typically used to change the settings of a session, such as configuration parameters.

To define a variable inside a SQL statement, you can use SET VAR <variablename> = <variablevalue> or SET VARIABLE <variablename> = <variablevalue> to define, as on the example below:

-- Set a SQL variable to a value using SET VAR
> SET VAR myvar1 = 5;
> VALUES (myvar1);
  5

-- Set a SQL variable to 7 using the SET VARIABLE command
> SET VARIABLE myvar1 = 7;
> VALUES (myvar1);
  7

Please relate to the SET VAR documentation for further reference: https://docs.databricks.com/ja/sql/language-manual/sql-ref-syntax-aux-set-variable.html

Take care!

MarkD
New Contributor II

The example I gave is directly from Databrick's DLT documentation, but as you say does not appear the be the right way to set a variable and doesn't work.

Unfortunately SET VAR is not supported in Delta Live Tables. 

anardinelli
New Contributor III
New Contributor III

For DLT pipelines, please use the "Advanced" panel, on the pipeline menu, to set your variables:

anardinelli_0-1715891788808.png

That will set up a new configuration value that can be referenced inside your code as a variable:

CREATE OR REFRESH LIVE TABLE filtered
AS SELECT * FROM my_table
WHERE created_at > ${startDate};

Would this solve your issue? Also, can you please share the documentation from DLT that states that the SET command can be used to create variables?

Alessandro

MarkD
New Contributor II

I've used the advanced configuration option for other variables, but in this case I wanted something dynamic. I want to `SET startDate = current_time()` and then use `startDate` through all my queries to make sure they are all using the same time comparison in their queries.

Here's the documentation that shows using "SET" as I did in the initial question. https://docs.databricks.com/en/delta-live-tables/sql-ref.html#set-configuration-values-for-a-table-o...

Icassatti
New Contributor II

@MarkD I beleve that you don't need use a var like that.

What do you think about use only a function current_time(), like that:

CREATE OR REFRESH LIVE TABLE filtered
AS SELECT * FROM my_table
WHERE created_at > current_time();

MarkD
New Contributor II

The reason I want to use a variable is so that I can use the same cut off time in multiple queries to make sure to make sure that I don't get records in one query with a later date than another.

For example

CREATE OR REFRESH LIVE TABLE filtered_one
AS SELECT * FROM my_table_one
WHERE created_at > current_time();

CREATE OR REFRESH LIVE TABLE filtered_two
AS SELECT * FROM my_table_two
WHERE created_at > current_time();

If I want to compare records in `filtered_one` to the results in `filtered_two` I can't rely on them having used the same value for `current_time()`.

Icassatti
New Contributor II

Have you considered use checkpoint?
If you do this maybe empty dataset comes becouse there are no date in future... created_at > current_time().

Read more about [Change Data Feed] Enable it in your base table and then capture only new changes feed.

DLT Doc 

Stream Apply Changes

Hkesharwani
Contributor II

Hi @MarkD ,
You may use 

 

set variable_name.var= '1900-01-01'

to set the value of variable and in order to use the value of variable 
use 

${automated_date.var}

 


Example:

 

set automated_date.var= '1800-01-01'
select * from my table where date = CAST(${automated_date.var} as DATE)

 

Harshit Kesharwani
Self-taught Data Engineer | Seeking Remote Full-time Opportunities
Join 100K+ Data Experts: Register Now & Grow with Us!

Excited to expand your horizons with us? Click here to Register and begin your journey to success!

Already a member? Login and join your local regional user group! If there isn’t one near you, fill out this form and we’ll create one for you to join!