SET configuration in SQL DLT pipeline does not work
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
05-10-2024 09:25 AM
Hi,
I'm trying to set a dynamic value to use in a DLT query, and the code from the example documentation does not work.
SET startDate='2020-01-01';
CREATE OR REFRESH LIVE TABLE filtered
AS SELECT * FROM my_table
WHERE created_at > ${startDate};
It is getting an error because it plugs in an empty value for the variable in the query.
[PARSE_SYNTAX_ERROR] Syntax error at or near end of input. SQLSTATE: 42601 (line 3, pos 19)
== SQL ==
CREATE OR REFRESH LIVE TABLE filtered
AS SELECT * FROM my_table
WHERE created_at >
-------------------^^^
It seems like the documentation is wrong. Is there a different way to do this?
Thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
05-16-2024 12:34 PM
Hi @MarkD how are you?
The `SET` command alone is typically used to change the settings of a session, such as configuration parameters.
To define a variable inside a SQL statement, you can use SET VAR <variablename> = <variablevalue> or SET VARIABLE <variablename> = <variablevalue> to define, as on the example below:
-- Set a SQL variable to a value using SET VAR > SET VAR myvar1 = 5; > VALUES (myvar1); 5 -- Set a SQL variable to 7 using the SET VARIABLE command > SET VARIABLE myvar1 = 7; > VALUES (myvar1); 7
Please relate to the SET VAR documentation for further reference: https://docs.databricks.com/ja/sql/language-manual/sql-ref-syntax-aux-set-variable.html
Take care!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
05-16-2024 12:42 PM
The example I gave is directly from Databrick's DLT documentation, but as you say does not appear the be the right way to set a variable and doesn't work.
Unfortunately SET VAR is not supported in Delta Live Tables.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
05-16-2024 01:39 PM
For DLT pipelines, please use the "Advanced" panel, on the pipeline menu, to set your variables:
That will set up a new configuration value that can be referenced inside your code as a variable:
CREATE OR REFRESH LIVE TABLE filtered
AS SELECT * FROM my_table
WHERE created_at > ${startDate};
Would this solve your issue? Also, can you please share the documentation from DLT that states that the SET command can be used to create variables?
Alessandro
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
05-16-2024 02:33 PM
I've used the advanced configuration option for other variables, but in this case I wanted something dynamic. I want to `SET startDate = current_time()` and then use `startDate` through all my queries to make sure they are all using the same time comparison in their queries.
Here's the documentation that shows using "SET" as I did in the initial question. https://docs.databricks.com/en/delta-live-tables/sql-ref.html#set-configuration-values-for-a-table-o...
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
05-16-2024 05:12 PM
@MarkD I beleve that you don't need use a var like that.
What do you think about use only a function current_time(), like that:
CREATE OR REFRESH LIVE TABLE filtered
AS SELECT * FROM my_table
WHERE created_at > current_time();
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
05-16-2024 06:18 PM
The reason I want to use a variable is so that I can use the same cut off time in multiple queries to make sure to make sure that I don't get records in one query with a later date than another.
For example
CREATE OR REFRESH LIVE TABLE filtered_one
AS SELECT * FROM my_table_one
WHERE created_at > current_time();
CREATE OR REFRESH LIVE TABLE filtered_two
AS SELECT * FROM my_table_two
WHERE created_at > current_time();
If I want to compare records in `filtered_one` to the results in `filtered_two` I can't rely on them having used the same value for `current_time()`.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
05-17-2024 10:16 AM
Have you considered use checkpoint?
If you do this maybe empty dataset comes becouse there are no date in future... created_at > current_time().
Read more about [Change Data Feed] Enable it in your base table and then capture only new changes feed.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
05-17-2024 06:29 AM
Hi @MarkD ,
You may use
set variable_name.var= '1900-01-01'
to set the value of variable and in order to use the value of variable
use
${automated_date.var}
Example:
set automated_date.var= '1800-01-01'
select * from my table where date = CAST(${automated_date.var} as DATE)
Data engineer at Rsystema
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
11-28-2024 09:16 AM
@anardinelli Can you please help with a solution? I am am having issue with setting a variable in delta live table pipeline and use it with APPLY CHANGES INTO syntax.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
12-05-2024 06:15 AM
@smit_tw Have you tried setting it on the "Advanced" tab as my previous suggests?

