
DLT pipeline python stop scanning all databases in source

turagittech
Contributor

Hi All,

I have set up a DLT pipeline for SQL Server to use CDC as per these instructions: https://learn.microsoft.com/en-us/azure/databricks/ingestion/lakeflow-connect/sql-server-pipeline I have it working in principle; however, it scans all databases associated with the server. The account can read from any database on that server.

How do I restrict it to only one database? Setting source_catalog doesn't appear to restrict it.

Will I need to create a service principal, or other account and connection, with only read access in the one database?

2 REPLIES

Renu_
Valued Contributor II

Hi @turagittech, to prevent the Databricks CDC pipeline from scanning all databases on your SQL Server, try setting up a new account with read access only to the specific database. Just ensure this account doesn't have permissions to any other databases on the server, and use its credentials when setting up the connection in Databricks.

turagittech
Contributor

I thought I would follow up on this after getting it all working with the help of my local Databricks office. As the CDC feature is currently built, it scans metadata for the whole server that you connect to. This may change in a future release; I have no idea as to the benefits of either approach. It does the scan once at the initial start of the cdc_gateway, and it may do it again periodically at some later time. It appears relatively benign to both the server and Databricks. Because of the permissions that CDC requires (and it will fail if you don't have them right), the scan can't be limited to only the database your connection is for. The product seems good in public preview. This behaviour is a bit unnerving for initial deployment, but appears to cause no issues.
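
For anyone landing here later, this is a minimal sketch of where source_catalog sits in an ingestion pipeline spec. The field names follow the pattern in the linked setup guide as I understand it, so treat them as an assumption and check them against the current docs. The helper names and every value (database, schema, table, gateway ID) are hypothetical placeholders. As described in this thread, source_catalog only selects which objects get ingested and does not limit the gateway's server-wide metadata scan.

```python
import json


def build_table_object(source_catalog, source_schema, source_table,
                       destination_catalog, destination_schema):
    """Return one table entry for the pipeline's list of objects.

    source_catalog is the SQL Server database name. Field names are an
    assumption based on the linked setup guide.
    """
    return {
        "table": {
            "source_catalog": source_catalog,
            "source_schema": source_schema,
            "source_table": source_table,
            "destination_catalog": destination_catalog,
            "destination_schema": destination_schema,
        }
    }


def build_ingestion_spec(name, gateway_id, table_objects):
    """Assemble the ingestion pipeline spec as a plain dict (hypothetical helper)."""
    return {
        "name": name,
        "ingestion_definition": {
            "ingestion_gateway_id": gateway_id,
            "objects": list(table_objects),
        },
    }


# All values below are placeholders, not taken from a real workspace.
spec = build_ingestion_spec(
    name="sqlserver_cdc_ingest",
    gateway_id="<gateway-pipeline-id>",
    table_objects=[
        build_table_object("sales_db", "dbo", "orders", "main", "bronze"),
    ],
)

# Print the spec as JSON so it can be compared with what the pipeline expects.
print(json.dumps(spec, indent=2))
```

Every table entry here points at one database, yet the gateway still read metadata for the whole server in my case, so the spec alone does not appear to be the place to restrict the scan.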