Hi all,
As part of an ongoing exercise to refactor existing T-SQL code into Databricks, we've run into an issue that we can't seem to overcome with Spark SQL.
Currently we use dynamic SQL to loop through a number of tables, using parameters to specify field names among other elements. We're attempting to achieve the same result with the EXECUTE IMMEDIATE functionality combined with session variables.
However, Databricks does not seem to interpret the variables as expected. See the example code below:
DECLARE varfield_names1 STRING;
SET VAR varfield_names1 = 'field1 STRING';
DECLARE varsqlstring1 STRING;
SET VAR varsqlstring1 = 'CREATE OR REPLACE TABLE test (PrimaryKey STRING, Table STRING, :varfield_names1)';
EXECUTE IMMEDIATE varsqlstring1 USING varfield_names1;
The same issue occurs when we try to use Hive variables instead:
1) Set the field variable:
spark.sql(""" SET var.field_name = 'test1' """)
2) Use CONCAT to build the dynamic SQL string:
spark.sql(""" SET var.sql_string = CONCAT('CREATE OR REPLACE TABLE Table_1 (PrimaryKey STRING, Table STRING, ', ${var.field_name}, ')') """)
3) Execute the string (this is where it fails):
spark.sql(""" EXECUTE IMMEDIATE ${var.sql_string} """)
Does anyone have any idea how we can get this to work? In this example we're looking to pass a field definition as a variable into an executable SQL string. The code runs as part of a loop over tables, hence the string must be built dynamically.
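For context, here is a plain-Python sketch of the kind of loop we are trying to express. The table and column names are made up for illustration; in the real job the generated DDL would be passed to spark.sql rather than printed:

```python
# Illustrative only: build a per-table DDL string in a loop, injecting a
# dynamically supplied column list alongside fixed columns. Table and
# field names below are hypothetical examples.
tables = {
    "Table_1": "field1 STRING",
    "Table_2": "field2 INT, field3 DATE",
}

def build_ddl(table_name: str, extra_fields: str) -> str:
    # Fixed columns (PrimaryKey, Table) plus the per-table column list.
    return (
        f"CREATE OR REPLACE TABLE {table_name} "
        f"(PrimaryKey STRING, Table STRING, {extra_fields})"
    )

for name, fields in tables.items():
    ddl = build_ddl(name, fields)
    # In Databricks this would be executed with: spark.sql(ddl)
    print(ddl)
```

This is essentially what our T-SQL dynamic SQL does today, and what we hoped EXECUTE IMMEDIATE with session variables would let us express natively in Spark SQL.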