Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Question on best method to deliver Azure SQL Server data into Databricks Bronze and Silver.

Nick_Pacey
New Contributor III

Hi,

We have an Azure SQL Server database (replicated from an on-prem SQL Server) whose data is required in Databricks Bronze and beyond.

This database has hundreds of tables that are all required. Table sizes vary from very small up to 100 million+ rows for the biggest tables, and the biggest tables can change by around 10,000 rows per hour.

So far, we've been using Lakehouse Federation and materialised view generation via DLT pipelines to deliver SQL data into Databricks, but this scale and rate of change is bigger. We don't believe we can use incremental updates with this method (the source doesn't have row tracking available), so we would have to bring a full load of data into the materialised view on every refresh. Is this correct?
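
For context, our current pattern per table looks roughly like this inside a DLT pipeline (the federated catalog and table names below are placeholders for our actual setup):

```sql
-- Sketch of our current approach (placeholder names).
-- The materialised view reads through Lakehouse Federation, and without
-- row tracking on the source, every refresh recomputes from a full scan.
CREATE OR REFRESH MATERIALIZED VIEW orders_bronze
AS SELECT *
FROM azure_sql_federated.dbo.orders;
```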

We're also looking again at native SQL Server CDC options. This still seems to have the same limitations as when we last looked, i.e. you have to set it up for every table (as above, we have over 500 tables), and handling schema drift takes a fair bit of code and management.
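
For reference, the per-table setup on the SQL Server side looks roughly like this (schema and table names are placeholders), which is why scripting it across 500+ tables is a chunk of work:

```sql
-- Enable CDC at the database level first (run once, in the source database).
EXEC sys.sp_cdc_enable_db;

-- Then enable CDC per table; this has to be repeated for every table.
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name = N'orders',
    @role_name = NULL,           -- no gating role on the change data
    @supports_net_changes = 1;   -- requires a primary key or unique index
```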

We'd welcome thoughts and the latest ideas on the best way to handle this from the Databricks end. Do you think our usual method will cope okay at this scale? Are we missing something on MV incremental loads or CDC?

As always, thanks in advance!

Nick

2 REPLIES

ilir_nuredini
Honored Contributor

Hello @Nick_Pacey ,

Databricks has recently introduced Lakeflow Connect, and it supports ingesting data from SQL Server. I have done some small experiments, which all went well, though not at scale. It looks like a very promising option; note that it is still in public preview.

You can refer to the documentation here: Managed connectors in Lakeflow Connect | Databricks Documentation
And if you want a more detailed doc for SQL Server, you can check this article:
Efficient Data Ingestion from SQL Server with Lake... - Databricks Community
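
One thing worth planning for: as I understand the docs, the connector reads changes via Microsoft change tracking or Microsoft CDC, so one of those still has to be enabled on the source. A minimal sketch of the change tracking route (database and table names are placeholders):

```sql
-- Enable change tracking at the database level (placeholder names).
ALTER DATABASE my_replica_db
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 3 DAYS, AUTO_CLEANUP = ON);

-- Then enable it per table; change tracking needs a primary key on the table.
ALTER TABLE dbo.orders
ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = OFF);
```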

Hope that helps, and I would definitely give Lakeflow Connect a shot.

Best, Ilir

kerem
Contributor

Hey Nick,

Have you tried the SQL Server connector in Lakeflow Connect? It should provide a native connection to your SQL Server, potentially allowing for incremental updates and CDC setup.

https://learn.microsoft.com/en-us/azure/databricks/ingestion/lakeflow-connect/sql-server-pipeline

I haven't tried this connector before, but it seems like a good first thing to try for your case.

Kerem Durak
