Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Incremental Data copy from one SQL DB to another DB

sudhanshu1
New Contributor III

Hi All,

I have 20 tables in a source SQL DB, and we need to create a pipeline to incrementally load the data into a target database.

Can someone please suggest the best approach to achieve this using Azure Databricks?

Should I use MERGE INTO? COPY INTO? Or something else?

Please note that all tables have a column I can use to identify any changes happening in the source.

7 REPLIES

-werners-
Esteemed Contributor III

I have some questions:

What are the source and target databases?

Do you apply transformations?

How much data are we talking about?

sudhanshu1
New Contributor III

My source database is Azure PostgreSQL. We have 20 tables in that database which we need to bring into another database (incremental loads).

Table volume is not big; they are medium-sized tables.

Also, no transformations as of now. Just a simple copy from one DB to another, but doing incremental loads.

-werners-
Esteemed Contributor III

I would not use Databricks for that.

In fact, what you describe is a mere move of data.

Data Factory/Synapse pipelines are cheaper and better for that kind of thing.

Hey, thanks for the suggestion. I agree with you. I am just checking: if we need to do this in Databricks, how should we approach it?

I am comfortable creating this pipeline in ADF using the watermark column method, but I am not sure what the best approach is in Databricks.

-werners-
Esteemed Contributor III

Well, IMO there is no best approach, as there is no real use case for Spark here.

Spark is for distributed data processing; you need neither distribution nor processing (transformations).

If you really want to do it using Databricks, I'd open a JDBC connection to the source and the target, read the data, and write it to the target.

But as I already said, I would not do that.
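For reference, the JDBC read-and-write approach described above boils down to the watermark pattern the OP mentions: find the highest value of the change column already copied to the target, then pull only source rows above it and upsert them. Here is a minimal stand-alone sketch of that logic, using the stdlib `sqlite3` module in place of real JDBC connections; the table and column names (`orders`, `change_ts`, `payload`) are made up for illustration:

```python
import sqlite3

def incremental_copy(src: sqlite3.Connection, tgt: sqlite3.Connection,
                     table: str, watermark_col: str) -> int:
    """Copy only rows newer than the target's current watermark; return row count."""
    # Highest change-column value already present in the target (None on first run).
    row = tgt.execute(f"SELECT MAX({watermark_col}) FROM {table}").fetchone()
    watermark = row[0] if row[0] is not None else 0

    # Pull only the new or changed rows from the source.
    new_rows = src.execute(
        f"SELECT id, {watermark_col}, payload FROM {table} "
        f"WHERE {watermark_col} > ?",
        (watermark,),
    ).fetchall()

    # Upsert into the target; SQLite's ON CONFLICT plays the role of MERGE INTO.
    tgt.executemany(
        f"INSERT INTO {table} (id, {watermark_col}, payload) VALUES (?, ?, ?) "
        f"ON CONFLICT(id) DO UPDATE SET "
        f"{watermark_col}=excluded.{watermark_col}, payload=excluded.payload",
        new_rows,
    )
    tgt.commit()
    return len(new_rows)

# Demo with in-memory databases standing in for the two SQL DBs.
src = sqlite3.connect(":memory:")
tgt = sqlite3.connect(":memory:")
ddl = "CREATE TABLE orders (id INTEGER PRIMARY KEY, change_ts INTEGER, payload TEXT)"
src.execute(ddl)
tgt.execute(ddl)

src.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 100, "a"), (2, 150, "b")])
src.commit()
copied_first = incremental_copy(src, tgt, "orders", "change_ts")   # initial load: 2 rows

src.execute("INSERT INTO orders VALUES (3, 200, 'c')")
src.execute("UPDATE orders SET change_ts = 250, payload = 'b2' WHERE id = 2")
src.commit()
copied_second = incremental_copy(src, tgt, "orders", "change_ts")  # only the 2 changed rows
```

In Databricks the same pattern would read via `spark.read.jdbc` with the watermark predicate pushed into the query, and write with a Delta `MERGE INTO` keyed on the primary key; the surrounding logic is identical.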

Thanks, Werners. Your explanation is really helpful.

Vartika
Databricks Employee

Hi @SUDHANSHU RAJ,

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 
