cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

How to make FOR cycle and dynamic SQL and variables work together

smoortema
New Contributor III

I am working on a testing notebook where the table that is tested can be given as a widget. I wanted to write it in SQL. The notebook does the following steps in a cycle that should run 10 times:

1. Store the starting version of a delta table in a variable

2. Do some data transformation

3. Restore the table to its original version

It seems to me that the above process cannot be written in SQL currently. If I try to give a value to a variable by a dynamic SQL within the FOR loop, the variable returns 0.

Example below (I did not include the widgets in the example, but widgets are the reason the SET sql statement is run by EXECUTE IMMEDIATE, and not directly). The result should be 5 but instead it shows null.

It is also remarkable that within a BEGIN END statement, SET VARIABLE should not be accepted, it should only work with SET sql =. However, if I omit the VARIABLE word from the definition of sql, then it throws an error.

 

BEGIN
  DECLARE sum INT DEFAULT 0;
  DECLARE start_version INT;
  DECLARE sql STRING;
  testCycle: FOR SELECT * FROM range(1, 2) DO

    SET sql = 'SET VARIABLE start_version = 5';
    EXECUTE IMMEDIATE sql;
    SELECT start_version;

  END FOR testCycle;
END;
1 REPLY 1

mmayorga
Databricks Employee
Databricks Employee

Hi @smoortema ,

Thank you for reaching out! You are very close to getting the “start_version”; you just need to include “INTO start_version” after the “EXECUTE IMMEDIATE”. Here is the updated code

BEGIN
  DECLARE sum INT DEFAULT 0;
  DECLARE start_version INT;
  DECLARE sql STRING;
  testCycle: FOR SELECT * FROM range(1, 2) DO

    SET sql = 'SELECT 5';
    EXECUTE IMMEDIATE sql iNTO start_version;
    SELECT start_version;

  END FOR testCycle;
END;

Now, If I understand correctly, the idea is to leverage a widget within a notebook that holds a table name. Then use this table name to get a specific version, run transformations, and then restore it again.

First, we define our widget

dbutils.widgets.text(
    name='table_name',
    defaultValue='',
)

Then declare a cursor, build your sql while leveraging the widget's value using ":table_name" to get latest or first version of your table, here is a starter script:

%sql
BEGIN
  -- Declare variables if needed
  DECLARE sum INT DEFAULT 0;
  DECLARE table_version INT DEFAULT 0;
  DECLARE dynamic_sql STRING;

  -- Declare a cursor
  testCycle: FOR row AS
    SELECT num FROM range(1, 10) AS t(num)
  DO
    -- Reference columns using the cursor variable (here 'row')
    SET sum = sum + row.num;

    -- Get Specific Table Version
    SET dynamic_sql = CONCAT('(SELECT version FROM (DESCRIBE HISTORY ',:table_name,' ) ORDER BY version LIMIT 1)');
    EXECUTE IMMEDIATE dynamic_sql INTO table_version;

    -- Transformations Section Here

    -- Restore table using another sql string variable Here
    
   
  END FOR testCycle;

  VALUES (sum, table_version, :table_name);
END;

I hope this helps! And if this solution works for you, I would appreciate it if you could please click the "Accept as Solution" button.

Thank you!

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local community—sign up today to get started!

Sign Up Now