SQL Server to Databricks Migration

Avinash_Narala
Valued Contributor II

Hi,

I want to build a Python function to migrate SQL Server tables to Databricks.

Are there any guides or best practices on how to do so?

It would be really helpful if there are.

Regards,

Avinash N

7 REPLIES

Edward357J
New Contributor II

Hello!

To migrate SQL Server tables to Databricks, ensure your SQL Server meets the version requirements and enable change tracking or CDC. Set up a dedicated database user for Databricks ingestion and use Databricks' LakeFlow Connect or other connectors to establish a connection between SQL Server and Databricks. Create an ingestion pipeline to transfer data and schedule it, monitoring its performance to ensure smooth data transfer. For detailed steps and best practices, refer to the official Databricks documentation and community forums.
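If you prefer to script the transfer yourself, here is a minimal sketch using the standard Spark JDBC reader (the hostname, database, credentials, and table names below are placeholders; in practice read the credentials from a secret scope):

# Hypothetical connection details -- replace with your own environment's values
jdbc_url = "jdbc:sqlserver://your-sql-server.example.com:1433;databaseName=your_database"
connection_props = {
    "user": "ingest_user",
    "password": "your_password",
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}

# Read one SQL Server table over JDBC and write it as a Unity Catalog table
df = spark.read.jdbc(url=jdbc_url, table="dbo.your_table", properties=connection_props)
df.write.mode("overwrite").saveAsTable("catalog_name.schema_name.your_table")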

I hope this helps, Avinash!

filipniziol
Contributor III

Hi @Avinash_Narala ,

Can you share in detail what you are planning to do?
Do you want to:
1. Create only the table structure, or create the tables and migrate the data?

2. If it is a data migration, is it a one-time operation, or do you want your SQL Server to be the data source for regular data loads?

Hi @filipniziol,

I want to create the table structure and migrate the data to Databricks. It is lift and shift, so only a one-time operation.

filipniziol
Contributor III

Hi @Avinash_Narala ,

If it is lift and shift, then try this:

1. Set up Lakehouse Federation to SQL Server
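For example, the connection and foreign catalog for step 1 can be created from a notebook. A minimal sketch, assuming placeholder connection name, host, database, and credentials (in practice the password should come from a secret scope):

# Create the Unity Catalog connection to SQL Server (placeholder host/credentials)
spark.sql("""
  CREATE CONNECTION sql_server_conn TYPE sqlserver
  OPTIONS (
    host 'your-sql-server.example.com',
    port '1433',
    user 'ingest_user',
    password 'your_password'
  )
""")

# Expose the SQL Server database as a foreign catalog
spark.sql("""
  CREATE FOREIGN CATALOG sql_server_catalog_name
  USING CONNECTION sql_server_conn
  OPTIONS (database 'your_database')
""")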

2. Use CTAS statements to copy each table into Unity Catalog

 

CREATE TABLE catalog_name.schema_name.table_name
AS 
SELECT *
FROM sql_server_catalog_name.sql_server_schema_name.your_sql_server_table

 

3. Loop through a list of tables in Python for automation

 

tables_to_migrate = [
    # (source schema, source table, destination schema, destination table)
    ("sql_server_schema_name", "SQL_TABLE_A", "unity_catalog_schema", "DEST_TABLE_A"),
    ("sql_server_schema_name", "SQL_TABLE_B", "unity_catalog_schema", "DEST_TABLE_B")
]

# Run one CTAS per table from the federated SQL Server catalog into Unity Catalog
for src_schema, src_table, dest_schema, dest_table in tables_to_migrate:
    spark.sql(f"""
      CREATE TABLE my_catalog.{dest_schema}.{dest_table}
      AS
      SELECT *
      FROM sql_server_catalog_name.{src_schema}.{src_table}
    """)

 

Is there any way to first create only the metadata (table structure) and then migrate the data?

What you can do is, for example, just add a WHERE condition to the CTAS, like:

CREATE TABLE ...
AS 
SELECT *
FROM 
  ...
WHERE 1 = 0

Since no records meet the condition, only the table structure is created.
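Putting the two phases together, a minimal sketch (the catalog, schema, and table names mirror the hypothetical ones from the loop above):

# Phase 1: create only the table structure (the WHERE 1 = 0 filter returns no rows)
spark.sql("""
  CREATE TABLE my_catalog.unity_catalog_schema.DEST_TABLE_A
  AS
  SELECT *
  FROM sql_server_catalog_name.sql_server_schema_name.SQL_TABLE_A
  WHERE 1 = 0
""")

# Phase 2: migrate the data later with a plain INSERT
spark.sql("""
  INSERT INTO my_catalog.unity_catalog_schema.DEST_TABLE_A
  SELECT *
  FROM sql_server_catalog_name.sql_server_schema_name.SQL_TABLE_A
""")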

Also, this is a simplistic solution that should be considered a base for your script.
In the real world you may want to add liquid clustering, so your mapping list would include another field with the clustering columns, and the script would be modified to add a CLUSTER BY clause where needed.
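A sketch of that extension (the clustering column names are placeholders):

# Mapping extended with an optional list of clustering columns per table
tables_to_migrate = [
    ("sql_server_schema_name", "SQL_TABLE_A", "unity_catalog_schema", "DEST_TABLE_A", ["order_date"]),
    ("sql_server_schema_name", "SQL_TABLE_B", "unity_catalog_schema", "DEST_TABLE_B", []),
]

for src_schema, src_table, dest_schema, dest_table, cluster_cols in tables_to_migrate:
    # Add a CLUSTER BY clause only when clustering columns are defined for the table
    cluster_by = f"CLUSTER BY ({', '.join(cluster_cols)})" if cluster_cols else ""
    spark.sql(f"""
      CREATE TABLE my_catalog.{dest_schema}.{dest_table}
      {cluster_by}
      AS
      SELECT *
      FROM sql_server_catalog_name.{src_schema}.{src_table}
    """)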

 

It looks good, but doesn't it take more time to execute? That is, if we have a huge table and need only the table structure, won't it scan the whole table to evaluate the condition 1 = 0?
