cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

SQL Server To Databricks Table Migration

YS1
Contributor

Hello,

Is there an equivalent SQL code for the following Pyspark code? I'm trying to copy a table from SQL Server to Databricks and save it as a managed delta table.

jdbcHostname = "your_sql_server_hostname"
jdbcPort = 1433
jdbcDatabase = "your_database_name"
jdbcUsername = "your_username"
jdbcPassword = "your_password"

# JDBC URL format for SQL Server
jdbcUrl = f"jdbc:sqlserver://{jdbcHostname}:{jdbcPort};database={jdbcDatabase}"

# Connection properties
connectionProperties = {
  "user" : jdbcUsername,
  "password" : jdbcPassword,
  "driver" : "com.microsoft.sqlserver.jdbc.SQLServerDriver"
}
df = spark.read.jdbc(url=jdbcUrl, table=query, properties=connectionProperties)
df.write.format("delta").mode("overwrite").saveAsTable("table_name")
1 ACCEPTED SOLUTION

Accepted Solutions

ranged_coop
Valued Contributor II

I hope you are asking for a SQL version of the pyspark code ? Can you please explain the advantages of having it in SQL as compared to pyspark ? There are some options, best would be federated queries against sql server ? select as if it were a databricks table and write it to the target ? alternatively you could create a view against the sql server table and then use that as insert into the databricks table...but in my limited understanding (could be wrong)  all of it would basically get optimized similarly in the background and serves no additional benefits to be rewritten in SQL...

View solution in original post

4 REPLIES 4

ranged_coop
Valued Contributor II

I hope you are asking for a SQL version of the pyspark code ? Can you please explain the advantages of having it in SQL as compared to pyspark ? There are some options, best would be federated queries against sql server ? select as if it were a databricks table and write it to the target ? alternatively you could create a view against the sql server table and then use that as insert into the databricks table...but in my limited understanding (could be wrong)  all of it would basically get optimized similarly in the background and serves no additional benefits to be rewritten in SQL...

@ranged_coop Yes, I'm asking for a SQL version of my Pyspark code. The only reason is to give it to person who only code in SQL which would make it easier for them to understand. Thanks for the suggested solution!

jacovangelder
Honored Contributor

The only option to have it in Databricks SQL is lakehouse federation with a SQL Server connection. 

Thank you @jacovangelder 

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group