
Parameterized spark.sql() not working

Michael_Appiah
Contributor

Spark 3.4 introduced parameterized SQL queries, and Databricks discussed this new functionality in a recent blog post (https://www.databricks.com/blog/parameterized-queries-pyspark).

Problem: I cannot run any of the examples provided in the PySpark documentation (https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.SparkSession.s...).

I literally just copy-pasted the examples into a notebook and tried running them, but got the following error message (I'm running DBR 14.1):

[Screenshot: error message (Michael_Appiah_0-1704459542967.png)]

If I use regular f-strings, it works:

[Screenshot: working f-string query (Michael_Appiah_1-1704459570498.png)]

Regardless of which example from the PySpark docs or the Databricks blog post I tried, they all resulted in the error shown above. It only worked when I used f-strings. But the whole point of this new functionality is that we no longer have to use f-strings, since they can introduce a SQL injection vulnerability.

Did anyone get this new parameterized SQL functionality to work? Or am I missing something here?
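
For readers comparing the two styles mentioned above, here is a minimal sketch. It assumes the named-parameter form of `spark.sql()` (a `:name` marker in the SQL text plus an `args` dictionary) as described in the PySpark documentation; the helper names `ids_above_fstring` and `ids_above_parameterized` are made up for illustration and are not taken from the docs or the blog post.

```python
def ids_above_fstring(spark, min_id):
    # f-string version: the value is spliced into the SQL text itself,
    # so whatever min_id contains becomes part of the statement.
    return spark.sql(f"SELECT * FROM range(10) WHERE id > {min_id}")


def ids_above_parameterized(spark, min_id):
    # Parameterized version: the SQL text stays constant and the value is
    # passed separately through `args`, bound to the :minId marker.
    return spark.sql(
        "SELECT * FROM range(10) WHERE id > :minId",
        args={"minId": min_id},
    )


# In a notebook, where `spark` already exists:
#   ids_above_parameterized(spark, 7).show()
```

With the f-string helper, an input such as `"0 OR 1=1"` changes the meaning of the query, which is the injection risk described above. With the parameterized helper the same input is passed as a value and never becomes part of the SQL text.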

1 ACCEPTED SOLUTION


Kaniz_Fatma
Community Manager

Hi @Michael_Appiah, it's indeed a great feature that enhances query reusability and mitigates the risk of SQL injection ...

The issue you're experiencing might be due to a compatibility issue with Databricks Runtime (DBR) 14.1. While Spark 3.4 does support parameterized SQL queries, it's possible that DBR 14.1 does not fully support this feature yet.

As a workaround, you could continue using f-strings for now, but I understand your concern about SQL injection vulnerabilities. It's always a good practice to avoid directly incorporating user input into SQL queries.

I hope this helps, and I'm here if you have any more questions! 😊
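
If f-strings have to stay for the time being, one common stopgap is to validate every value before it reaches the SQL text. The sketch below is only an illustration of that idea, not a substitute for parameter binding: `safe_int`, `safe_identifier`, and `build_query` are hypothetical helpers, and the table name `events` is an assumed example.

```python
import re

# Conservative pattern for a plain, unquoted SQL identifier.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def safe_int(value):
    # int() raises ValueError for input such as "0 OR 1=1".
    return int(value)


def safe_identifier(name, allowed):
    # Accept only names that are on an explicit allowlist and look like
    # a plain identifier; reject everything else.
    if name not in allowed or not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"identifier not allowed: {name!r}")
    return name


def build_query(table, min_id, allowed_tables):
    # Only validated pieces are interpolated into the SQL text.
    table = safe_identifier(table, allowed_tables)
    return f"SELECT * FROM {table} WHERE id > {safe_int(min_id)}"
```

The resulting string can then be handed to `spark.sql()` as before. This narrows the attack surface for numeric values and table names only; free-form string values are much harder to sanitize by hand, which is the case parameterized queries are meant to cover.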


7 REPLIES


Michael_Appiah
Contributor

Hi @Kaniz_Fatma , 

Thank you for your quick answer.

DBR 14.1 actually includes Spark 3.5.0. However, I will test with DBR 13.3 LTS and see if that solves the problem. Maybe the issue is caused by Spark Connect, which was introduced for shared clusters in DBR 14.0. Thanks for the hint.


Hi @Kaniz_Fatma 

I just ran a couple of tests with the parameterized spark.sql() query examples from the documentation (https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.SparkSession.s...), with the following results:

Parameterized spark.sql() works for:
- single user clusters with DBR 13.3 LTS, 14.0 and 14.2
- shared access mode clusters with DBR 13.3 LTS 

Parameterized spark.sql() does NOT work for:
- shared access mode clusters with DBR 14.0 and 14.2

So it seems that shared access mode does not fully support parameterized queries yet. If I remember correctly, shared access mode clusters started using Spark Connect with DBR 14.0, so I assume that this is the issue. Are there plans to support parameterized spark.sql() for shared access mode clusters with DBR > 13.3 in the future?
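
For anyone who wants to repeat this kind of check on their own cluster, a rough probe could look like the sketch below. The function name `supports_parameterized_sql` is hypothetical, and because the exact error from the screenshot is not reproduced in this thread, the probe deliberately catches any exception rather than a specific one.

```python
def supports_parameterized_sql(spark):
    """Return True if a trivial named-parameter query runs on this session."""
    try:
        # Bind a single value through `args` and read it back.
        rows = spark.sql("SELECT :probe AS value", args={"probe": 1}).collect()
    except Exception:
        # The failure mode is not known here, so treat any error as
        # "not supported on this cluster".
        return False
    return len(rows) == 1 and rows[0][0] == 1


# In a notebook, where `spark` already exists:
#   print(supports_parameterized_sql(spark))
```

Running it once per cluster configuration (access mode and DBR version) gives a quick yes/no answer for a compatibility list like the one above.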

Cas
New Contributor III

Thanks for the clarification @Michael_Appiah, very helpful! Is there already a timeline for when this will be supported in DBR 14.x? The alternatives are not SQL-injection-proof enough for us.

Michael_Appiah
Contributor

@Cas Unfortunately I do not have any information on this. However, I have seen that DBR 14.3 and 15.0 introduced some changes to spark.sql(). I have not checked whether those changes resolve the issue outlined here. Your best bet is probably to try DBR 15.1 (or 15.2, which is in Beta). Maybe @Kaniz_Fatma has more information on any plans to support parameterized spark.sql() for shared access mode clusters with DBR > 13.3?

adriennn
Contributor II

Can confirm it's working again, tested on a job cluster with DBR 15.4 LTS. It failed on 14.3 LTS.
