Lakeflow Connect Data ingestion from SQL Server and PostgreSQL to Databricks with CDC

shan-databricks
Databricks Partner
We have a requirement to use Lakeflow Connect for data ingestion from SQL Server and PostgreSQL into Databricks with CDC, along with Lakehouse Federation. I would like to understand the pros and cons of Lakeflow Connect in the following areas:
 
Firewall/gateway considerations
CDC capabilities
Reliability
Overall success of Lakeflow implementation
Overall success of Lakehouse federation

ziafazal
Databricks Partner

Hi @shan-databricks 

You should first set up PostgreSQL for ingestion via Lakeflow Connect. Once Postgres logical replication is ready, you create two pipelines: an ingestion gateway and an ingestion pipeline. The gateway is a continuous pipeline that pulls changed data from the source Postgres database and stores it in your staging catalog in Databricks. The gateway should run on compute that resides in your Databricks VPC, and that VPC should be whitelisted in your firewall so that it can reach the database. The second pipeline uses serverless compute to move the changed data from the staging catalog to the bronze schema of your target catalog.
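Before creating the gateway, it may help to confirm that the firewall rule actually lets compute in your Databricks VPC reach the database. Below is a minimal sketch using only the Python standard library, meant to be run from a notebook attached to classic compute in the same VPC as the gateway. The function name `can_reach` and the host name in the usage note are hypothetical placeholders, not part of Lakeflow Connect. A successful TCP connection only shows that the port is open; it says nothing about credentials or replication settings.

```python
import socket


def can_reach(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s.

    This checks network reachability only (firewall, routing, DNS).
    It does not authenticate against the database.
    """
    try:
        # create_connection resolves the host name and opens a TCP socket;
        # the context manager closes the socket again right away.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `can_reach("postgres.example.internal", 5432)` would test the default PostgreSQL port on a hypothetical host, and port 1433 would be the usual choice for SQL Server. If it returns False from the gateway's VPC, the firewall or routing is the first thing to check.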

Thanks