cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Migrating data from synapse to databricks

Krizofe
New Contributor II

Hello team,

I have a requirement of moving all the table from Azure Synapse (dedicated sql pool) to databricks.

we have a data coming up from source to azure data lake frequently.

we have Azure data factory to load data (data flow does the basic transformation ) to Synapse.

we are looking to migrate data(tables) from synapse to databricks as a requirement.

And the new data from azure data lake to be ingested directly to databricks bypassing synapse.

can you please let me know the best approach to perform this action.

your insights are valuable!! Thanks

1 ACCEPTED SOLUTION

Accepted Solutions

Rishabh-Pandey
Esteemed Contributor

@Krizofe  You can use Azure Databricks to directly query and load data from Azure Synapse using Apache Spark. Databricks has built-in connectors that allow it to read from Synapse (or its underlying SQL Data Warehouse).

load it from synapse and then write it to ADLS .

Rishabh Pandey

View solution in original post

6 REPLIES 6

Rishabh-Pandey
Esteemed Contributor

Use Azure Data Factory (ADF):

  • Configure a pipeline in ADF to copy data from Synapse SQL to Azure Data Lake. Set up an ADF Copy Activity to handle this data transfer.

  • Source Dataset: Azure Synapse SQL table.

  • Sink Dataset: Azure Data Lake Storage or Azure Blob Storage.

  • Data Movement Activity: Use the Copy Data activity in ADF.

 

Load Data into Databricks

  1. Create Delta Tables in Databricks:

    • Use Delta Lake for storing and managing data in Databricks. Delta Lake provides ACID transactions and supports efficient queries.
  2. Automate Data Ingestion:

    • Set up Databricks jobs or Delta Live Tables to automate the ingestion of data from Azure Data Lake into Delta tables.
  3. Update Data Pipeline

    1. Adjust ADF Pipelines:

      • Update your Azure Data Factory pipelines to load new data directly into Databricks instead of Synapse.

      • Source Dataset: Azure Data Lake Storage.

      • Sink Dataset: Databricks Delta table.

Rishabh Pandey

Hello Rishabh Pandey,

Thanks for the reply but just have some concerns.

The data volume is huge like in billions and data gets deleted once loaded to Synapse.

Pushing all those data back to Azure Storage account from Synapse will be time consuming and costing more on storage. (Note: we have 3k tables with billions of rows)

So, we have any approach to directly move data to Databricks without staging it to Azure storage.

Thanks for the reply, mate.

Your insights will be valuable.

 

Rishabh-Pandey
Esteemed Contributor

@Krizofe  You can use Azure Databricks to directly query and load data from Azure Synapse using Apache Spark. Databricks has built-in connectors that allow it to read from Synapse (or its underlying SQL Data Warehouse).

load it from synapse and then write it to ADLS .

Rishabh Pandey

ebm
New Contributor II

Hi Rishabh,

Thanks for the explanation, it looks a little oversimplified. I feel like copying tables from one place to another is not a big deal, however, the real challenge is to update the stored procedures and deal with system integrations. What is your recommendation for that?

 

Stefan-Koch
Contributor III

Hi @Krizofe 

Instead of copying all the data with ADF into files in the Data lake, you could use Lakehouse Federation in Databricks. Create a connection to Azure Synapse, and then you can query your data from Synapse directly in a catalog in Unity Catalog.

thelogicplus
New Contributor III

Hi @Krizofe , Just gone through you deatils and thought our similar experience  with  Azure Synapse to databrick migration. We faced a similar situation and were initially hesitant, One of the my colleague recommanded to use Travinto Technologies accelator but after using Travinto Technologies' ETL migrator solution and services, we were blown away.Their accelerator helped us seamlessly migrate data and ETL jobs for our clients, saving us time and effort. We highly recommend Travinto Technologies.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group