Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Automate Lakeflow Connect to ingest 300 tables without creating pipelines manually

muaaz
New Contributor II

I have data in PostgreSQL and I'm using Lakeflow Connect via the UI to ingest it into Databricks streaming tables.

Currently, each Lakeflow Connect pipeline only allows connecting one PostgreSQL table. I have around 300 tables, and creating pipelines manually for each table is time-consuming.

I'm looking for a way to automate this process, where I can provide a PostgreSQL connection and table names (or a list/schema), and automatically generate and deploy the required Lakeflow Connect pipelines.

I explored Asset Bundles and YAML-based definitions, but it seems Lakeflow Connect resources are not fully supported there yet.

What would be a scalable or recommended approach to design this setup in Databricks?

1 REPLY

balajij8
Contributor III

Configuring Databricks Lakeflow Connect for PostgreSQL is a multi-step process, and you can ingest multiple tables within a single pipeline.

You can follow the steps below.

Selecting Multiple Tables via the UI

In the pipeline creation wizard, you select your tables in the Source step ("Specify what data to ingest", the 3rd step).

  • You can check the boxes for all the tables you want to include.

  • For each selected table, you can individually configure specific settings such as Primary Keys and History Tracking (SCD behavior). Ensure the PostgreSQL schema and tables are configured before creating a pipeline in Lakeflow Connect.

Scalability & Limits

  • Table Limits: Databricks recommends configuring 250 or fewer tables per pipeline to ensure optimal performance and manageability. If you need to ingest more than 250 tables, you can split them across multiple pipelines, grouping them by domain or schema. More details here

  • Data Volume: There is no limit on the number of rows or columns supported within these tables.
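For around 300 tables, the split described above can be planned with a few lines of Python. This is only a minimal sketch: the function name `plan_pipelines`, the `(schema, table)` tuple format, and the demo table names are hypothetical, and the 250 ceiling is the recommendation quoted above rather than a hard limit enforced by the code.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable

# Recommended ceiling quoted in this reply; adjust if the guidance changes.
MAX_TABLES_PER_PIPELINE = 250


def plan_pipelines(
    tables: Iterable[tuple[str, str]],
    max_tables: int = MAX_TABLES_PER_PIPELINE,
) -> list[list[tuple[str, str]]]:
    """Group (schema, table) pairs by schema, then cut each schema
    into batches of at most max_tables. One batch = one pipeline."""
    if max_tables < 1:
        raise ValueError("max_tables must be at least 1")

    by_schema: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for schema, table in tables:
        by_schema[schema].append((schema, table))

    batches: list[list[tuple[str, str]]] = []
    for schema in sorted(by_schema):
        members = sorted(by_schema[schema])
        # Slice the schema's tables into consecutive chunks.
        for start in range(0, len(members), max_tables):
            batches.append(members[start:start + max_tables])
    return batches


if __name__ == "__main__":
    # Hypothetical input: 300 tables that all live in one schema.
    demo = [("public", f"table_{i:03d}") for i in range(300)]
    for number, batch in enumerate(plan_pipelines(demo), start=1):
        print(f"pipeline {number}: {len(batch)} tables")
```

With 300 tables in a single schema this yields two batches (250 and 50 tables). If your tables are spread over several schemas, each schema gets its own pipeline or pipelines, which matches the "group by domain or schema" suggestion.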

YAML

You can configure your multi-table Lakeflow pipelines using YAML if you prefer a configuration-based setup that ensures reproducibility. More details here
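If you would rather generate the configuration than write it by hand, a small script can build one pipeline definition per group of tables. Treat the sketch below as an assumption-heavy starting point: the field names (`ingestion_definition`, `ingestion_gateway_id`, `objects`, `table`, `source_catalog`, `source_schema`, `source_table`, `destination_catalog`, `destination_schema`) are my understanding of the pipeline spec for database connectors and should be checked against the linked documentation, and the gateway ID, catalog names, and table names are placeholders. It prints JSON so it needs only the standard library; convert to YAML with whatever tooling you already use.

```python
from __future__ import annotations

import json


def build_pipeline_spec(
    name: str,
    gateway_id: str,
    tables: list[tuple[str, str]],
    source_catalog: str,
    dest_catalog: str,
    dest_schema: str,
) -> dict:
    """Build one multi-table ingestion pipeline definition as a plain dict.

    The key names are assumed, not confirmed by this thread; verify them
    against the Lakeflow Connect documentation before deploying.
    """
    objects = [
        {
            "table": {
                "source_catalog": source_catalog,
                "source_schema": schema,
                "source_table": table,
                "destination_catalog": dest_catalog,
                "destination_schema": dest_schema,
            }
        }
        for schema, table in tables
    ]
    return {
        "name": name,
        "ingestion_definition": {
            "ingestion_gateway_id": gateway_id,
            "objects": objects,
        },
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    spec = build_pipeline_spec(
        name="pg_ingest_public_1",
        gateway_id="<gateway-pipeline-id>",
        tables=[("public", "orders"), ("public", "customers")],
        source_catalog="<postgres-database>",
        dest_catalog="main",
        dest_schema="bronze",
    )
    print(json.dumps(spec, indent=2))
```

Calling this once per group of tables gives you one definition per pipeline, which you can then review and deploy through whichever mechanism the documentation supports for your workspace.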