Batch reading from SQL Server tables with CDC enabled
05-18-2025 04:38 PM
Hi all,
I need to do a batch load from SQL Server into Databricks. I have CDC enabled on some tables. The simplest approach appears to be to union the CDC data with the regular table to get a single set of records to load, but this seems fraught with risk: changes could be applied out of sequence, and the final state of the table in Databricks might not reflect the true state of the source table. Also, does anyone know how I can use SQL Server table-valued functions? My query appeared to fail because of the function in the query when run from Databricks.
10-23-2025 03:06 AM
For Q1,
Consider the approach described at the link below, or a watermark-based approach:
https://learn.microsoft.com/en-us/azure/databricks/ldp/cdc
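To illustrate the watermark idea, here is a minimal sketch in plain Python (not Spark) of applying CDC rows on top of a snapshot in a deterministic order. It assumes the change rows carry the standard SQL Server CDC columns `__$start_lsn`, `__$seqval`, and `__$operation` (1 = delete, 2 = insert, 3 = update before-image, 4 = update after-image). The names `Change`, `apply_changes`, and `last_lsn` are made up for this example, and the in-memory dict stands in for the target table.

```python
from dataclasses import dataclass
from typing import Any

# SQL Server CDC __$operation codes
OP_DELETE, OP_INSERT, OP_UPDATE_BEFORE, OP_UPDATE_AFTER = 1, 2, 3, 4


@dataclass(frozen=True)
class Change:
    """One CDC row (hypothetical container for this sketch)."""
    start_lsn: bytes  # __$start_lsn: commit LSN of the transaction
    seqval: bytes     # __$seqval: order of the change within the transaction
    operation: int    # __$operation
    key: Any          # primary key of the source row
    row: dict         # column values captured for the row


def apply_changes(snapshot, changes, last_lsn):
    """Apply changes newer than last_lsn to a copy of snapshot.

    Returns the new state and the highest LSN applied, which becomes
    the watermark for the next batch.
    """
    state = dict(snapshot)
    high = last_lsn
    # Sort by (LSN, seqval) so the outcome does not depend on arrival order.
    for c in sorted(changes, key=lambda c: (c.start_lsn, c.seqval)):
        if c.start_lsn <= last_lsn:
            continue  # already covered by a previous batch
        if c.operation == OP_DELETE:
            state.pop(c.key, None)
        elif c.operation in (OP_INSERT, OP_UPDATE_AFTER):
            state[c.key] = c.row
        # OP_UPDATE_BEFORE rows only carry the old image, so they are skipped.
        high = max(high, c.start_lsn)
    return state, high
```

The point of the sketch is that the last state per key is decided by LSN order rather than by the order rows happen to be read, which is the risk with a plain union. Persisting the returned watermark between runs keeps each batch from re-applying older changes.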
For Q2,
You have a few options: push the query down via a JDBC query, or use a SQL view instead if you want to avoid replicating the logic in Spark.
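As a rough sketch of the pushdown option, the helper below builds the options for Spark's JDBC reader so that the table-valued function call is sent to SQL Server as part of the `query` option. The helper name `tvf_jdbc_options`, the function `dbo.fn_orders_since`, and the connection URL are all hypothetical placeholders; whether this resolves the original failure depends on how the query was being issued.

```python
def tvf_jdbc_options(jdbc_url, tvf_call, columns=("*",)):
    """Build Spark JDBC reader options that select from a SQL Server TVF.

    The whole SELECT is passed via the `query` option, so SQL Server
    evaluates the function and Spark only sees the result set.
    """
    query = f"SELECT {', '.join(columns)} FROM {tvf_call}"
    return {"url": jdbc_url, "query": query}


# Hypothetical server, database, and function names.
opts = tvf_jdbc_options(
    "jdbc:sqlserver://myserver.example.com:1433;databaseName=sales",
    "dbo.fn_orders_since('2025-01-01')",
)

# In a Databricks notebook, where `spark` is already defined:
# df = (spark.read.format("jdbc")
#       .options(**opts)
#       .option("user", "<user>")
#       .option("password", "<password>")
#       .load())
```

Note that Spark does not allow `query` together with `dbtable` or `partitionColumn`. If a partitioned read is needed, wrapping the TVF in a SQL Server view and reading the view through `dbtable` is the alternative mentioned above.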
10-23-2025 06:38 PM
Yes, you can use TVFs on Databricks. Please check the following link: https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-qry-select-tvf#gsc.tab=0
Can you please elaborate on how you are loading the SQL Server data into Databricks? Hopefully, you are using the Lakeflow connector for SQL Server to ingest it.
https://docs.databricks.com/aws/en/ingestion/lakeflow-connect/sql-server-pipeline#gsc.tab=0