cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

SQL Server To Databricks Table Migration

YS1
Contributor

Hello,

Is there an equivalent SQL code for the following Pyspark code? I'm trying to copy a table from SQL Server to Databricks and save it as a managed delta table.

jdbcHostname = "your_sql_server_hostname"
jdbcPort = 1433
jdbcDatabase = "your_database_name"
jdbcUsername = "your_username"
jdbcPassword = "your_password"

# JDBC URL format for SQL Server
jdbcUrl = f"jdbc:sqlserver://{jdbcHostname}:{jdbcPort};database={jdbcDatabase}"

# Connection properties
connectionProperties = {
  "user" : jdbcUsername,
  "password" : jdbcPassword,
  "driver" : "com.microsoft.sqlserver.jdbc.SQLServerDriver"
}
df = spark.read.jdbc(url=jdbcUrl, table=query, properties=connectionProperties)
df.write.format("delta").mode("overwrite").saveAsTable("table_name")
1 ACCEPTED SOLUTION

Accepted Solutions

ranged_coop
Valued Contributor II

I hope you are asking for a SQL version of the pyspark code ? Can you please explain the advantages of having it in SQL as compared to pyspark ? There are some options, best would be federated queries against sql server ? select as if it were a databricks table and write it to the target ? alternatively you could create a view against the sql server table and then use that as insert into the databricks table...but in my limited understanding (could be wrong)  all of it would basically get optimized similarly in the background and serves no additional benefits to be rewritten in SQL...

View solution in original post

4 REPLIES 4

ranged_coop
Valued Contributor II

I hope you are asking for a SQL version of the pyspark code ? Can you please explain the advantages of having it in SQL as compared to pyspark ? There are some options, best would be federated queries against sql server ? select as if it were a databricks table and write it to the target ? alternatively you could create a view against the sql server table and then use that as insert into the databricks table...but in my limited understanding (could be wrong)  all of it would basically get optimized similarly in the background and serves no additional benefits to be rewritten in SQL...

@ranged_coop Yes, I'm asking for a SQL version of my Pyspark code. The only reason is to give it to person who only code in SQL which would make it easier for them to understand. Thanks for the suggested solution!

jacovangelder
Contributor III

The only option to have it in Databricks SQL is lakehouse federation with a SQL Server connection. 

Thank you @jacovangelder 

Join 100K+ Data Experts: Register Now & Grow with Us!

Excited to expand your horizons with us? Click here to Register and begin your journey to success!

Already a member? Login and join your local regional user group! If there isn’t one near you, fill out this form and we’ll create one for you to join!