04-25-2023 06:28 AM
Hi All,
I have 20 tables in a source SQL database, and we need to create a pipeline to incrementally load the data into a target database.
Can someone please suggest the best approach to achieve this using Azure Databricks?
Should I use MERGE INTO, COPY INTO, or something else?
Please note that all tables have a column I can use to identify changes in the source.
04-25-2023 07:25 AM
I have some questions:
What are the source and target databases?
Do you apply transformations?
How much data are we talking about?
04-25-2023 08:18 AM
My source database is an Azure PostgreSQL database. It has 20 tables that we need to bring into another database (incremental loads).
The table volume is not big; they are medium-sized tables.
Also, there are no transformations as of now. It is just a simple copy from one DB to another, but done as an incremental load.
04-25-2023 08:20 AM
I would not use Databricks for that.
What you are doing is merely moving data.
Data Factory/Synapse pipelines are cheaper and better for that kind of thing.
04-25-2023 08:24 AM
Hey, thanks for the suggestion. I agree with you. I am just checking: if we need to do this in Databricks, how should we approach it?
I am comfortable creating this pipeline in ADF using the watermark column method, but I am not sure what the best approach is in Databricks.
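For readers unfamiliar with the watermark column method mentioned above, here is a minimal sketch of the idea in plain Python. It is not ADF or Databricks code: in-memory `sqlite3` databases stand in for the source and target, and the `orders` table, the `watermarks` control table, and the `incremental_copy` function are all hypothetical names chosen for illustration. It also assumes the target table has the same schema and a primary key, so that `INSERT OR REPLACE` behaves as an upsert.

```python
import sqlite3


def incremental_copy(src, tgt, table, watermark_col):
    """Copy rows changed since the stored watermark from src to tgt.

    Returns the number of rows copied. Table and column names are
    interpolated into SQL, so only pass trusted identifiers.
    """
    # Control table (hypothetical) holding the last watermark per table.
    tgt.execute(
        "CREATE TABLE IF NOT EXISTS watermarks "
        "(table_name TEXT PRIMARY KEY, last_value TEXT)"
    )
    row = tgt.execute(
        "SELECT last_value FROM watermarks WHERE table_name = ?", (table,)
    ).fetchone()
    last_value = row[0] if row else ""  # empty string sorts before any timestamp

    # Read only the rows that changed after the previous run.
    cur = src.execute(
        f"SELECT * FROM {table} WHERE {watermark_col} > ?", (last_value,)
    )
    cols = [d[0] for d in cur.description]
    rows = cur.fetchall()
    if not rows:
        return 0

    # Upsert into the target (relies on the target table's primary key).
    placeholders = ", ".join("?" for _ in cols)
    tgt.executemany(
        f"INSERT OR REPLACE INTO {table} ({', '.join(cols)}) "
        f"VALUES ({placeholders})",
        rows,
    )

    # Advance the watermark to the highest value just copied.
    new_value = max(r[cols.index(watermark_col)] for r in rows)
    tgt.execute(
        "INSERT OR REPLACE INTO watermarks VALUES (?, ?)", (table, new_value)
    )
    tgt.commit()
    return len(rows)


# Demo with stand-in databases.
src = sqlite3.connect(":memory:")
tgt = sqlite3.connect(":memory:")
ddl = "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
src.execute(ddl)
tgt.execute(ddl)
src.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2023-04-24 09:00:00"), (2, 20.0, "2023-04-25 06:00:00")],
)

first_run = incremental_copy(src, tgt, "orders", "updated_at")

# A later change in the source moves the row's watermark column forward.
src.execute(
    "UPDATE orders SET amount = 25.0, updated_at = '2023-04-25 08:00:00' "
    "WHERE id = 2"
)
second_run = incremental_copy(src, tgt, "orders", "updated_at")
```

The first run copies every row because no watermark is stored yet; the second run copies only the row whose `updated_at` moved past the stored value. The same loop could be repeated for each of the 20 tables.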
04-25-2023 08:27 AM
Well, IMO there is no best approach, as there is no use case for Spark here.
Spark is distributed data processing; you need neither distribution nor processing (transformations).
If you really want to do it using Databricks, I'd open a JDBC connection to the source and target, read the data, and write it to the target.
But, as I already said, I would not do that.
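For anyone who does need to go the JDBC route described above, a rough PySpark sketch might look like the following. It assumes a Databricks notebook where `spark` is already defined and the PostgreSQL JDBC driver is available. The function names, the URLs, the `props` dictionary, and the way the last watermark is passed in are all hypothetical; storing and retrieving the watermark between runs is left out.

```python
def build_incremental_query(table, watermark_col, last_value):
    """Build a JDBC subquery selecting only rows changed after last_value.

    Identifiers are interpolated directly, so only pass trusted
    table and column names.
    """
    escaped = str(last_value).replace("'", "''")  # escape single quotes
    return f"(SELECT * FROM {table} WHERE {watermark_col} > '{escaped}') AS src"


def copy_increment(spark, source_url, target_url, props,
                   table, watermark_col, last_value):
    """Read changed rows from the source over JDBC and append them to the target.

    Returns the new watermark, or last_value if nothing changed.
    """
    query = build_incremental_query(table, watermark_col, last_value)

    # The filter is part of the query sent to the source database.
    df = spark.read.jdbc(url=source_url, table=query, properties=props)
    df.cache()  # reuse the same rows for the aggregate and the write

    # Highest watermark in this batch; None when the batch is empty.
    new_value = df.agg({watermark_col: "max"}).first()[0]
    if new_value is not None:
        # Append only inserts rows. Rows that were updated in the source
        # would need an upsert on the target side, which is not shown here.
        df.write.jdbc(url=target_url, table=table,
                      mode="append", properties=props)

    df.unpersist()
    return new_value if new_value is not None else last_value


example_query = build_incremental_query(
    "orders", "updated_at", "2023-04-25 00:00:00"
)
```

Note that `mode="append"` does not handle rows that were modified after an earlier load. If the target were a Delta table rather than a database reached over JDBC, `MERGE INTO` would be the usual way to upsert them.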
04-25-2023 08:29 AM
Thanks, Werners. Your explanation is really nice.
04-25-2023 11:12 PM
Hi @SUDHANSHU RAJ,
Thank you for posting your question in our community! We are happy to assist you.
To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?
This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!