01-05-2024 05:10 AM
Spark 3.4 introduced parameterized SQL queries and Databricks also discussed this new functionality in a recent blog post (https://www.databricks.com/blog/parameterized-queries-pyspark)
Problem: I cannot run any of the examples provided in the PySpark documentation: (https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.SparkSession.s...)
I literally just copy-pasted the examples into a notebook and tried running them, but got the following error message (I'm running DBR 14.1):
If I use regular f-formatted strings it works:
Regardless of which example in the PySpark Doc or from the Databricks blog post I tried - they all resulted in the error shown above. It would only work when I used f-formatted strings. But the whole idea of this new functionality is that we do not have to use f-formatted strings anymore as they can present a SQL injection vulnerability.
Did anyone get this new parameterized SQL functionality to work? Or am I missing something here?
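For readers comparing the two approaches, here is a minimal sketch of the difference. It assumes `spark` is the SparkSession that a Databricks notebook predefines, and it uses the named parameter markers (`:name` plus the `args` dictionary) that `spark.sql()` accepts since Spark 3.4. The function names and the `min_id` parameter are hypothetical and chosen only for illustration.

```python
def query_with_fstring(spark, min_id):
    # The value is spliced into the SQL text itself, so an untrusted
    # value such as "0 OR 1=1" changes the structure of the query.
    return spark.sql(f"SELECT * FROM range(10) WHERE id > {min_id}")


def query_with_named_params(spark, min_id):
    # The SQL text stays constant; the value travels separately in `args`
    # and is bound to the :min_id marker as a literal.
    return spark.sql(
        "SELECT * FROM range(10) WHERE id > :min_id",
        args={"min_id": min_id},
    )
```

In a notebook, `query_with_named_params(spark, 7).show()` would be the form that fails on the affected clusters, while the f-string variant runs.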
01-09-2024 12:37 AM
Hi @Michael_Appiah, It's indeed a great feature that enhances query reusability and mitigates the risk of SQL injection ...
The issue you're experiencing might be due to a compatibility issue with Databricks Runtime (DBR) 14.1. While Spark 3.4 does support parameterized SQL queries, it's possible that DBR 14.1 might not fully support this feature yet.
As a workaround, you could continue using f-strings for now, but I understand your concern about SQL injection vulnerabilities. It's always a good practice to avoid directly incorporating user input into SQL queries.
I hope this helps, and I'm here if you have any more questions!
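If f-strings have to stay for now, one way to narrow the risk is to coerce the input to a strict type before it reaches the SQL text. The sketch below is only a stopgap under stated assumptions: it covers integer values only, the function name `build_min_id_query` is hypothetical, and it is not a substitute for real parameter binding.

```python
def build_min_id_query(min_id):
    # int() raises ValueError for strings that are not a plain integer
    # (for example "0 OR 1=1"), so only a number reaches the SQL text.
    # Note that int() truncates floats, e.g. int(7.9) becomes 7.
    safe_min_id = int(min_id)
    return f"SELECT * FROM range(10) WHERE id > {safe_min_id}"
```

The returned string can then be passed to `spark.sql()` as before. String-valued inputs need a different approach, such as checking them against an allow-list of known values.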
01-09-2024 08:29 AM
Hi @Kaniz_Fatma ,
thank you for your quick answer.
So DBR14.1 actually includes Spark 3.5.0. I will test with DBR13.3 LTS however and see if that solves the problem. Maybe the issue is caused by Spark Connect which was introduced for shared clusters in DBR14.0. Thanks for the hint.
01-18-2024 02:17 AM
Thank you for posting your question in our community! We are happy to assist you.
To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?
This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!
01-20-2024 03:12 PM
Hi @Kaniz_Fatma
I just ran a couple of tests with the parameterized spark.sql() (https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.SparkSession.s...) query examples from the documentation, with the following results:
Parameterized spark.sql() works for:
- single user clusters with DBR 13.3 LTS, 14.0 and 14.2
- shared access mode clusters with DBR 13.3 LTS
Parameterized spark.sql() does NOT work for:
- shared access mode clusters with DBR 14.0 and 14.2
So it seems as if shared access mode does not fully support parameterized queries yet. If I remember correctly, shared access mode clusters started using Spark Connect with DBR 14.0, so I assume that this is the issue. Are there plans to support parameterized spark.sql() for shared access mode clusters with DBR > 13.3 in the future?
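To repeat this kind of check on another cluster, a small probe along the following lines could be used. It is a sketch, not an official diagnostic: `supports_parameterized_sql` is a hypothetical helper, `spark` is assumed to be the notebook's predefined SparkSession, and the broad `except Exception` is deliberate because the exact error type raised on the failing clusters is not shown in this thread.

```python
def supports_parameterized_sql(spark):
    """Return True if a trivial query with a named parameter runs on this cluster."""
    try:
        # collect() forces execution, so a lazily evaluated session
        # cannot defer the failure past this point.
        rows = spark.sql("SELECT :x AS x", args={"x": 1}).collect()
    except Exception as exc:  # assumption: error type unknown, so catch broadly
        print(f"Parameterized spark.sql() failed: {type(exc).__name__}: {exc}")
        return False
    return len(rows) == 1 and rows[0]["x"] == 1
```

Running `supports_parameterized_sql(spark)` once per cluster type and DBR version gives a quick yes/no answer for comparing configurations.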
05-10-2024 12:43 AM
Thanks for the clarification @Michael_Appiah, very helpful! Is there already a timeline for when this will be supported in DBR 14.x? The alternatives are not SQL-injection-proof enough for us.
05-15-2024 03:45 AM
@Cas Unfortunately I do not have any information on this. However, I have seen that DBR 14.3 and 15.0 introduced some changes to spark.sql(). I have not checked whether those changes resolve the issue outlined here. Your best bet is probably to go ahead and try DBR 15.1 (or 15.2, which is in Beta). Maybe @Kaniz_Fatma has more information on any future plans to support parameterized spark.sql() for shared access mode clusters with DBR > 13.3?
an hour ago
Can confirm it's working again, tested on a job cluster with DBR 15.4 LTS. It failed on 14.3 LTS.