How can I call a stored procedure in Spark SQL?
11-10-2016 08:44 AM
I have seen the following code:
val url = "jdbc:mysql://yourIP:yourPort/test?user=yourUsername&password=yourPassword"
val df = sqlContext
  .read
  .format("jdbc")
  .option("url", url)
  .option("dbtable", "people")
  .load()
But I need to run a stored procedure. When I use an exec command for the dbtable option above, it gives me this error:
com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near the keyword 'exec'.
- Labels: Spark SQL
10-29-2018 04:09 AM
Hi.
From the docs:
The JDBC table that should be read. Note that anything that is valid in a FROM clause of a SQL query can be used. For example, instead of a full table you could also use a subquery in parentheses.
So it has to be a subquery. Alternatively, you can use table functions to achieve the same result as a stored procedure.
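As a rough sketch of the subquery approach, the helper below wraps a SELECT statement so that it is valid in a FROM clause, which is what the dbtable option expects. The function name dbo.GetPeople and the alias src are made up for illustration, and the Spark call is shown only as a comment because it assumes a sqlContext and a JDBC url like the ones in the question.

```scala
object JdbcSubquery {
  // Wrap a SELECT statement in parentheses with an alias so it can be
  // used wherever a table name is valid in a FROM clause.
  // The default alias "src" is an arbitrary choice for this sketch.
  def asDbTable(select: String, alias: String = "src"): String =
    s"(${select.trim.stripSuffix(";")}) AS $alias"
}

// Hypothetical usage with Spark (dbo.GetPeople is an assumed
// table-valued function, not something from the original question):
//
// val df = sqlContext.read
//   .format("jdbc")
//   .option("url", url)
//   .option("dbtable", JdbcSubquery.asDbTable("SELECT * FROM dbo.GetPeople(42)"))
//   .load()
```

This only works for logic that can be expressed as a query. A procedure that must be invoked with exec still cannot be passed through dbtable.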
10-29-2018 02:28 PM
You can use a user-defined function.
07-23-2019 01:22 AM
Hi, could you please elaborate? My understanding is that unless you bury some dynamic SQL in a UDF, you can't do anything other than select data and return it.
Chris
06-03-2019 08:34 PM
This doesn't seem to be supported. There is an alternative, but it requires using pyodbc and adding it to your init script. Details can be found here:
https://datathirst.net/blog/2018/10/12/executing-sql-server-stored-procedures-on-databricks-pyspark
I have tested this myself and it works fine. If anyone has any alternative methods, please let me know.
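For anyone working in Scala rather than Python, a comparable driver-side call can be sketched with plain java.sql instead of pyodbc. This is not the linked article's method, only a minimal sketch that assumes the database's JDBC driver is on the driver's classpath. The object and method names here are hypothetical. Like the pyodbc approach, it runs on the driver only and does not return a DataFrame.

```scala
import java.sql.{CallableStatement, Connection, DriverManager}

object StoredProcCall {
  // Build the standard JDBC call escape, e.g. "{call dbo.my_proc(?, ?)}".
  def callSyntax(procedure: String, paramCount: Int): String =
    s"{call $procedure(${Seq.fill(paramCount)("?").mkString(", ")})}"

  // Run the procedure over a plain JDBC connection opened from the driver.
  // Returns the result of CallableStatement.execute(), which is true when
  // the first result is a ResultSet.
  def execute(url: String, procedure: String, params: Seq[Any]): Boolean = {
    val conn: Connection = DriverManager.getConnection(url)
    try {
      val stmt: CallableStatement =
        conn.prepareCall(callSyntax(procedure, params.size))
      try {
        // JDBC parameter indexes start at 1.
        params.zipWithIndex.foreach { case (p, i) =>
          stmt.setObject(i + 1, p.asInstanceOf[AnyRef])
        }
        stmt.execute()
      } finally stmt.close()
    } finally conn.close()
  }
}
```

A call might look like StoredProcCall.execute(url, "dbo.refresh_people", Seq(42)), where dbo.refresh_people is a made-up procedure name.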
07-23-2019 01:23 AM
Thanks. I found this article as well. I was concerned about it running on the driver and blocking all worker nodes. This sounds quite bad if you have many concurrent jobs running or need to call stored procedures frequently. Are you still using this approach, or did you find another one?
08-15-2019 10:35 PM
Hi @Christian Bracchi, we're still using this approach and haven't experienced any issues so far, although we only have one production job running at the moment!

