Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Updating tables from SQL Server to Databricks

YS1
Contributor

Hi,

I have SQL Server tables that are the primary location where all live transactions happen. Currently I read them through PySpark as DataFrames and overwrite them every day to keep the latest copy in Databricks. The problem is that it takes a long time to copy all tables from SQL Server to Databricks, and the tables don't have a primary key like an id, though some tables have timestamps. The question is: how can I read only the changes and write them to my Databricks tables efficiently, without having to rewrite the whole table again and again?

3 REPLIES

Stephen678
New Contributor II

Do you have any audit column in the SQL Server table that can be used for incremental loading?

Is the SQL Server table fully loaded each time?

For the first question, some tables have a pkid or timestamp column, but some don't.

For the second question, some tables only get appended to, while others are truncated and recreated (on SQL Server); on Databricks we currently overwrite them every day.
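For the tables that do have a timestamp or pkid column, one common approach is a watermark-based incremental read: push a filter down to SQL Server so only rows changed since the last load are fetched, then upsert them in Databricks. Below is a minimal sketch; the names `ModifiedDate`, `dbo.Orders`, and `jdbc_url` are hypothetical placeholders, not from this thread.

```python
# Sketch of a watermark-based incremental load (hypothetical names).

def build_incremental_query(table: str, ts_col: str, last_ts: str) -> str:
    """Build a JDBC pushdown subquery selecting only rows whose
    timestamp column is newer than the stored watermark, so the
    filter is evaluated on SQL Server rather than in Databricks."""
    return f"(SELECT * FROM {table} WHERE {ts_col} > '{last_ts}') AS incr"

# Only rows modified after the last successful load are read.
query = build_incremental_query("dbo.Orders", "ModifiedDate",
                                "2024-01-01 00:00:00")

# In a Databricks notebook the subquery would be passed to the JDBC
# reader so SQL Server does the filtering, e.g.:
#   df = (spark.read.format("jdbc")
#         .option("url", jdbc_url)
#         .option("dbtable", query)
#         .load())
# The fetched rows can then be upserted into the Delta table with
# MERGE, keyed on whatever natural key the table has, and the
# watermark advanced to max(ModifiedDate) of the batch.
```

For append-only tables this reduces to inserting the new rows; tables that are truncated and recreated, or that have neither a key nor a timestamp, still need a full overwrite (or a server-side mechanism such as SQL Server Change Tracking/CDC).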

Anonymous
Not applicable

Hi @YS1 

Hope you are well. Just wanted to see if you were able to find an answer to your question, and if so, would you like to mark an answer as best? It would be really helpful for the other members too.

Cheers!
