11-03-2015 04:01 AM
I'd like to access a table on a Microsoft SQL Server. Is that possible from Databricks?
To my understanding, the syntax is something like this (in a SQL Notebook):
CREATE TEMPORARY TABLE jdbcTable
USING org.apache.spark.sql.jdbc
OPTIONS ( url "jdbc:sqlserver://aaaa.database.windows.net;DatabaseName=bbbb;user=xxx;password=yyy",
dbtable "SalesLT.Product" )
The following error occurs: Error in SQL statement: SQLException: No suitable driver found for jdbc:sqlserver
Some research brought me here: https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#jdbc-to-other-databases
Based on this sentence: "To get started you will need to include the JDBC driver for your particular database on the spark classpath," I uploaded sqljdbc4.jar as a library and attached it to the cluster, but unfortunately the error still occurs.
I'd very much like to do this in either R or SQL; unfortunately I don't know Java or Scala that well, and my Python is so-so.
I'm relatively new to Spark - any help is much appreciated!
Labels: Data Ingestion & connectivity, Spark, SQL
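For readers hitting the same error: one hedged guess at the missing piece (as the thread's replies go on to confirm) is an explicit driver option, so that Spark doesn't have to rely on URL-based driver lookup. A sketch, keeping the placeholders from the question and assuming the driver jar is attached to the cluster:

```sql
CREATE TEMPORARY TABLE jdbcTable
USING org.apache.spark.sql.jdbc
OPTIONS (
  url "jdbc:sqlserver://aaaa.database.windows.net;DatabaseName=bbbb;user=xxx;password=yyy",
  dbtable "SalesLT.Product",
  driver "com.microsoft.sqlserver.jdbc.SQLServerDriver"
)
```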
Accepted Solutions
11-06-2015 02:48 PM
Hi Vida,
Thank you for your response and for your suggestion to go to this particular notebook. I got the connection now!
For future reference, these were the obstacles for me:
- "You just need to use the right JDBC driver." --> Where could I find 'the right JDBC driver'? I found sqljdbc4.jar, uploaded it via 'Create Library', and attached it before starting (or restarting) the cluster.
- Azure has a JDBC connection string that you can copy, there you can clearly find the parameters you need. I used this complete connection string in the "url"
- don't forget to whitelist the IP address in Azure
- I wasn't sure what to use for the 'driver'. This worked for me: "driver" -> "com.microsoft.sqlserver.jdbc.SQLServerDriver"
val resultsDF = sqlContext.load("jdbc", Map(
  "driver" -> "com.microsoft.sqlserver.jdbc.SQLServerDriver",
  "url" -> s"jdbc:sqlserver://xxxxx.database.windows.net:1433;database=yyyydb01;user=someuser@xxxxx;password=secret;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;",
  "dbtable" -> s"$t_table"))
And in a SQL notebook:
CREATE TEMPORARY TABLE jdbcTable
USING org.apache.spark.sql.jdbc
OPTIONS ( url "jdbc:sqlserver://xxxxx.database.windows.net;DatabaseName=yyyydb01;user=someuser;password=secret", dbtable "SalesLT.Product" )
Thanks again!
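As the second bullet above notes, the Azure connection string already contains every parameter you need: it is just semicolon-separated key=value pairs after the server part. A small illustrative sketch that pulls those parameters out (the class name is mine, and the URL is the placeholder one from above):

```java
import java.util.HashMap;
import java.util.Map;

public class JdbcUrl {
    // Split a sqlserver JDBC URL into its server part and a property map.
    static Map<String, String> properties(String url) {
        String prefix = "jdbc:sqlserver://";
        if (!url.startsWith(prefix))
            throw new IllegalArgumentException("not a sqlserver JDBC URL");
        Map<String, String> props = new HashMap<>();
        String[] parts = url.substring(prefix.length()).split(";");
        props.put("server", parts[0]);
        for (int i = 1; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq > 0)
                props.put(parts[i].substring(0, eq), parts[i].substring(eq + 1));
        }
        return props;
    }
}
```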
11-06-2015 09:51 AM
Hi Tamara,
Please go to the Databricks Guide >> Accessing Data >> Databases & Datasources >> JDBC for SQL databases notebook.
Did you try that? At which step did you get stuck? There is code in that notebook to verify that you have the correct driver for your SQL Server type installed, using a regular Java JDBC connection. Can you get that to work first? It's best to troubleshoot that before trying to get Spark to connect to your database.
-Vida
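The driver check Vida describes boils down to a plain Java JDBC connection attempt: load the driver class, then open a connection. A minimal sketch (class and method names are mine; host, database, and credentials are placeholders, and `check` will only succeed against a reachable server with the driver jar on the classpath):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class JdbcCheck {
    // Assemble a SQL Server JDBC URL from its parts (all values are placeholders).
    static String buildUrl(String host, int port, String db, String user, String pass) {
        return String.format(
            "jdbc:sqlserver://%s:%d;database=%s;user=%s;password=%s;encrypt=true;",
            host, port, db, user, pass);
    }

    // Throws ClassNotFoundException if the driver jar is missing from the classpath,
    // and SQLException if the server is unreachable or the login is rejected.
    static void check(String url) throws ClassNotFoundException, SQLException {
        Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("connected: " + !conn.isClosed());
        }
    }
}
```

If `Class.forName` already fails, Spark was never going to find the driver either, which is exactly the "No suitable driver" symptom from the original question.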
05-31-2016 07:51 PM
A few days ago I had to write some programs to connect to MS SQL Server.
Here is a quick snippet - enjoy!
It imports the data from MS SQL Server and exports a CSV to the file system (which could be a mount pointing to S3 or any other location).
By default, a Databricks cluster doesn't include a JDBC driver for MS SQL Server, so download the jar file (sqljdbc42.jar) from the Microsoft site https://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=11774 and attach it as a library.
// Reference the correct driver class
Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver")
import com.microsoft.sqlserver.jdbc.SQLServerDriver
import java.sql.DriverManager
import org.apache.spark.sql.{SQLContext, SaveMode}
import sqlContext.implicits._

// MS SQL JDBC connection string
val jdbcSqlConnStr = "jdbc:sqlserver://ServerIP:1433;databaseName=YourDBName;user=user1223;password=xxxxx;"

// Load the MS SQL table via the SQL context into a DataFrame
val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> jdbcSqlConnStr,
    "driver" -> "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "dbtable" -> "yourtablename")).load()

// Register the temp table so that we can run SQL-like queries against it
jdbcDF.registerTempTable("yourtablename")

// Select only the top 10 rows here, but you can use any SQL statement
val yourdata = sqlContext.sql("SELECT * FROM yourtablename LIMIT 10")

// Display the data
yourdata.show()

// Simple write to a mount location, which could point to S3 or any other storage
// If you plan to overwrite the same file, it's important to use SaveMode.Overwrite
yourdata.write.format("com.databricks.spark.csv").mode(SaveMode.Overwrite).option("header", "true").save("/yourmount/yourdata.csv")
09-06-2018 10:41 AM
Is there a way to connect with AAD (Azure Active Directory) instead of a SQL user id/password?
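This went unanswered in the thread. As a hedged pointer (my assumption, not confirmed by this thread): newer Microsoft JDBC drivers (mssql-jdbc 6.x and later) accept an authentication connection property such as authentication=ActiveDirectoryPassword, which also requires the Azure AD authentication library (ADAL4J) on the classpath. A sketch of such a URL, with all values as placeholders:

```
jdbc:sqlserver://xxxxx.database.windows.net:1433;database=yyyydb01;user=someuser@contoso.com;password=secret;authentication=ActiveDirectoryPassword;encrypt=true;
```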
11-03-2016 01:19 PM
sqlContext.load() no longer works; use sqlContext.read instead:
var df1 = sqlContext.read.format("jdbc")
  .option("url", "jdbc:sqlserver://localhost:1433;database=")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .option("dbtable", "tablename")
  .option("user", "")
  .option("password", "**")
  .load()
df1.show()
11-04-2016 09:09 AM
You're pointing at a local database - Databricks won't be able to read from localhost.
11-04-2016 01:14 PM
Actually, what I wanted to mention is the change in the syntax; I am using my server name instead of localhost.

