Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Parameterized spark.sql() not working

Michael_Appiah
Contributor II

Spark 3.4 introduced parameterized SQL queries, and Databricks also discussed this new functionality in a recent blog post (https://www.databricks.com/blog/parameterized-queries-pyspark).

Problem: I cannot run any of the examples provided in the PySpark documentation (https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.SparkSession.s...)

I literally just copy-pasted the examples into a notebook and tried running them, but got the following error message (I'm running DBR 14.1):

[Screenshot: the error raised when running the parameterized spark.sql() example]

If I use regular f-formatted strings it works:

[Screenshot: the same query succeeding with an f-formatted string]

Regardless of which example from the PySpark docs or the Databricks blog post I tried, they all resulted in the error shown above; it would only work with f-formatted strings. But the whole idea of this new functionality is that we no longer have to use f-formatted strings, since they can present a SQL injection vulnerability.
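To illustrate the injection risk the parameterized API is meant to close (using plain `sqlite3` rather than Spark, just to keep the sketch self-contained and runnable): interpolating input with an f-string lets the input rewrite the query, while a bound parameter is treated as an opaque value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret'), ('bob', 'hunter2')")

malicious = "nobody' OR '1'='1"

# f-string splicing: the input closes the string literal and injects
# OR '1'='1', so every row comes back.
leaked = conn.execute(f"SELECT name FROM users WHERE name = '{malicious}'").fetchall()
print(leaked)  # [('alice',), ('bob',)]

# Parameter binding: the same input is compared as a literal string, no match.
safe = conn.execute("SELECT name FROM users WHERE name = ?", (malicious,)).fetchall()
print(safe)  # []
```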

Did anyone get this new parameterized SQL functionality to work, or am I missing something here?

16 REPLIES

adriennn
Valued Contributor

@Michael_Appiah circling back on this because the forEachBatch functionality has landed in Lakeflow:

https://docs.databricks.com/aws/en/ldp/for-each-batch

So now you can do this in "DLT". But there's going to be a lot of boilerplate, like error handling, full-refresh cleanup, and logging, that needs to be done manually compared to the normal functionality.
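The kind of manual boilerplate meant here can be sketched as a wrapper around the batch handler (plain Python with hypothetical names, deliberately independent of the Spark/Lakeflow APIs; the actual hook is in the linked docs):

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("for_each_batch")

def with_batch_boilerplate(process):
    """Wrap a (batch_df, batch_id) handler with the logging and error
    handling you'd otherwise repeat inline in every foreachBatch function."""
    def wrapper(batch_df, batch_id):
        log.info("starting batch %s", batch_id)
        try:
            process(batch_df, batch_id)
        except Exception:
            log.exception("batch %s failed", batch_id)
            raise  # re-raise so the stream/pipeline surfaces the failure
        log.info("finished batch %s", batch_id)
    return wrapper

# Usage: the wrapped function is what you would hand to foreachBatch.
@with_batch_boilerplate
def upsert_batch(batch_df, batch_id):
    # hypothetical sink logic; batch_df would be a DataFrame in real use
    pass
```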


Malthe
Contributor III

@adriennn this has nothing to do with DLT; it's about Databricks providing a different session implementation here than regular Spark.