01-05-2024 05:10 AM
Spark 3.4 introduced parameterized SQL queries, and Databricks also discussed this new functionality in a recent blog post (https://www.databricks.com/blog/parameterized-queries-pyspark).
Problem: I cannot run any of the examples provided in the PySpark documentation: (https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.SparkSession.s...)
I literally just copy-pasted the examples into a notebook and tried running them, but got the following error message (I'm running DBR 14.1):
If I use regular f-strings, it works:
Regardless of which example from the PySpark docs or the Databricks blog post I tried, they all resulted in the error shown above. It only worked when I used f-strings. But the whole idea of this new functionality is that we no longer have to use f-strings, as they can present a SQL injection vulnerability.
Has anyone gotten this new parameterized SQL functionality to work, or am I missing something here?
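For reference, the two styles compared in this post look roughly like the sketch below. It is based on the `range(10)` example from the PySpark documentation; the helper names `query_fstring` and `query_parameterized` are made up for illustration, and `spark` is assumed to be an existing `SparkSession` (as in a Databricks notebook) on Spark 3.4 or later.

```python
def query_fstring(spark, min_id):
    # The value is interpolated directly into the SQL text, so a
    # malicious string passed as min_id would become part of the query.
    return spark.sql(f"SELECT * FROM range(10) WHERE id > {min_id}")


def query_parameterized(spark, min_id):
    # The named marker :minId is bound through the args dictionary
    # (available in spark.sql() since Spark 3.4), so the value is
    # never spliced into the SQL text itself.
    return spark.sql(
        "SELECT * FROM range(10) WHERE id > :minId",
        args={"minId": min_id},
    )
```

In a notebook, `query_parameterized(spark, 7).show()` would be the call that fails in the setup described here, while `query_fstring(spark, 7).show()` runs.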
01-09-2024 08:29 AM
Hi @Retired_mod ,
thank you for your quick answer.
So DBR 14.1 actually includes Spark 3.5.0. I will test with DBR 13.3 LTS, however, and see whether that solves the problem. Maybe the issue is caused by Spark Connect, which was introduced for shared clusters in DBR 14.0. Thanks for the hint.
01-20-2024 03:12 PM
Hi @Retired_mod
I just ran a couple of tests with the parameterized spark.sql() (https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.SparkSession.s...) query examples from the documentation, with the following results:
Parameterized spark.sql() works for:
- single user clusters with DBR 13.3 LTS, 14.0 and 14.2
- shared access mode clusters with DBR 13.3 LTS
Parameterized spark.sql() does NOT work for:
- shared access mode clusters with DBR 14.0 and 14.2
So it seems that shared access mode does not fully support parameterized queries yet. If I remember correctly, shared access mode clusters started using Spark Connect with DBR 14.0, so I assume that this is the cause. Are there plans to support parameterized spark.sql() on shared access mode clusters with DBR > 13.3 in the future?
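A quick way to repeat this kind of check on another cluster is a small probe like the sketch below. The function name `supports_parameterized_sql` is hypothetical, `spark` is assumed to be the session that the notebook provides, and the query is the `range(10)` example from the PySpark documentation. It only reports whether the call succeeded; it does not say why a failure happened.

```python
def supports_parameterized_sql(spark):
    """Return (True, None) if a parameterized query runs, else (False, error text)."""
    try:
        rows = spark.sql(
            "SELECT * FROM range(10) WHERE id > :minId",
            args={"minId": 7},
        ).collect()
        # range(10) holds ids 0..9, so exactly two rows (8 and 9) should match.
        return (len(rows) == 2, None)
    except Exception as exc:
        # Deliberately broad: the exception type seen on an affected
        # cluster is not stated in this thread, so anything counts as a failure.
        return (False, f"{type(exc).__name__}: {exc}")
```

Running `print(supports_parameterized_sql(spark))` on each cluster type gives one line per configuration that can be compared against the list above.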
05-10-2024 12:43 AM
Thanks for the clarification @Michael_Appiah, very helpful! Is there already a timeline for when this will be supported in DBR 14.x? The alternatives are not SQL-injection-proof enough for us.
05-15-2024 03:45 AM
@Cas Unfortunately, I do not have any information on this. However, I have seen that DBR 14.3 and 15.0 introduced some changes to spark.sql(). I have not checked whether those changes resolve the issue outlined here. Your best bet is probably to go ahead and try with DBR 15.1 (or 15.2, which is in Beta). Maybe @Retired_mod has more information on any future plans to support parameterized spark.sql() on shared access mode clusters with DBR > 13.3?
09-18-2024 08:51 AM
Can confirm it's working again, tested on a job cluster with DBR 15.4 LTS. It failed on 14.3 LTS.