Updating tables from SQL Server to Databricks
07-20-2023 03:24 PM
Hi,
I have SQL Server tables that are the primary location where all live transactions happen. Currently I read them through PySpark as DataFrames and overwrite them every day to keep the latest copy in Databricks. The problem is that it takes a long time to copy all the tables from SQL Server to Databricks, and the tables don't have a primary key like an id, although some have timestamp columns. The question is: how can I read only the changes and write them to my Databricks tables efficiently, without rewriting the whole table again and again?
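
For context, the daily full refresh described above probably looks roughly like the sketch below. The JDBC URL, credentials, and the table names `dbo.transactions` / `bronze.transactions` are placeholders, not the actual names from the question.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder connection details for SQL Server.
jdbc_url = "jdbc:sqlserver://<host>:1433;databaseName=<db>"
connection_props = {
    "user": "<user>",
    "password": "<password>",
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}

# Read the entire SQL Server table ...
df = spark.read.jdbc(url=jdbc_url, table="dbo.transactions",
                     properties=connection_props)

# ... and overwrite the Databricks copy every day.
df.write.mode("overwrite").saveAsTable("bronze.transactions")
```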
07-22-2023 04:46 PM
Do you have an audit column in the SQL Server tables that can be used for incremental loading?
Is the SQL Server table a full load each time?
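
If such an audit column exists, the filter can be pushed down to SQL Server so only new or changed rows cross the wire. A hedged sketch, assuming a `last_modified` audit column and the same placeholder names as above:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

jdbc_url = "jdbc:sqlserver://<host>:1433;databaseName=<db>"  # placeholder

# Highest watermark already present in the Databricks copy.
last_loaded = spark.sql(
    "SELECT MAX(last_modified) AS wm FROM bronze.transactions"
).collect()[0]["wm"]

# Wrap a filtered query as the JDBC "dbtable" so the predicate runs on SQL Server.
incremental_query = (
    f"(SELECT * FROM dbo.transactions "
    f"WHERE last_modified > '{last_loaded}') AS src"
)

changes_df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", incremental_query)
    .option("user", "<user>")
    .option("password", "<password>")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .load()
)
```

Note that this watermark approach only works if the audit column is reliably increasing for new or updated rows.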
07-22-2023 09:09 PM - edited 07-22-2023 09:10 PM
For the first question, some tables have pkid or timestamp columns, but some don't.
For the second question, some tables only get appended to, while others are truncated and recreated (on SQL Server); on Databricks we currently overwrite all of them every day.
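
Given that mix, one hedged sketch of how the landed changes could be applied: tables with a `pkid` can be upserted with a Delta MERGE, append-only tables can simply be appended, and tables with neither a key nor a timestamp can keep the daily overwrite. This continues the previous sketch and reuses `changes_df` from it; all table and column names are assumptions.

```python
from delta.tables import DeltaTable

# Tables with a primary key (pkid): upsert only the changed rows.
target = DeltaTable.forName(spark, "bronze.transactions")
(
    target.alias("t")
    .merge(changes_df.alias("s"), "t.pkid = s.pkid")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)

# Append-only tables: append the new rows instead of rewriting everything.
changes_df.write.mode("append").saveAsTable("bronze.append_only_table")

# Tables with no key and no timestamp: the daily full overwrite remains the fallback.
```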
07-22-2023 09:37 PM
Hi @YS1
Hope you are well. Just wanted to see if you were able to find an answer to your question, and if so, would you mind marking it as best? It would be really helpful for the other members too.
Cheers!

