Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
05-31-2016 07:51 PM
Few days ago I had to write some programs to connect with MS SQL..
Here is quick snippet.. Enjoy!
Import the data from MS SQL Server and Export the CSV to file system (could be a mount pointing to S3 or any other location)
By default databrick cluster doesn't any jdbc driver for MS SQL Server by default so download the jar file (sqljdbc42.jar) from microsoft site https://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=11774
>// Reference to correct driver classClass.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver")import com.microsoft.sqlserver.jdbc.SQLServerDriverimport java.sql.DriverManagerimport org.apache.spark.sql.SQLContextimport sqlContext.implicits._ // MS SQL JDBC Connection String ... val jdbcSqlConnStr = "jdbc:sqlserver://ServerIP:1433;databaseName=YourDBName;user=user1223;password=xxxxx;" // Loading the ms sql table via spark context into dataframeval jdbcDF = sqlContext.read.format("jdbc").options( Map("url" -> jdbcSqlConnStr, "driver" -> "com.microsoft.sqlserver.jdbc.SQLServerDriver", "dbtable" -> "yourtablename")).load() // Registering the temp table so that we can SQL like query against the table jdbcDF.registerTempTable("yourtablename")// selecting only top 10 rows here but you can use any sql statementval yourdata = sqlContext.sql("SELECT * FROM yourtablename LIMIT 10") // display the data yourdata.show() // Simple write to mount location which could be pointing to S3 or any other storage // If you planning to overwrite the same file then its important that you use overwritecustomerSumtop10.write.format("com.databricks.spark.csv").mode(SaveMode.Overwrite).option("header", "true").save("/yourmount/yourdata.csv")