topic Re: Automate Lakeflow connect to ingest 300 tables not manually in Data Engineering

Automate Lakeflow connect to ingest 300 tables not manually

muaaz — Mon, 01 Jun 2026 15:06:56 GMT

I have data in PostgreSQL and I’m using Lakeflow Connect via UI to ingest it into Databricks streaming tables.

Currently, each Lakeflow Connect pipeline only allows connecting one PostgreSQL table. I have around 300 tables, and creating pipelines manually for each table is time-consuming.

I’m looking for a way to automate this process, where I can provide a PostgreSQL connection and table names (or a list/schema), and automatically generate and deploy the required Lakeflow Connect pipelines.

I explored Asset Bundles and YAML-based definitions, but it seems Lakeflow Connect resources are not fully supported there yet.

What would be a scalable or recommended approach to design this setup in Databricks?

Re: Automate Lakeflow connect to ingest 300 tables not manually

balajij8 — Mon, 01 Jun 2026 16:53:07 GMT

Configuring Databricks Lake flow Connect for PostgreSQL is a streamlined, multi-step seamless process and you can ingest multiple tables within a single pipeline.

You can follow below

Selecting Multiple Tables via the UI

In the pipeline creation wizard where you will select your tables in the Source step ("Specify what data to ingest" - 3rd step).

You can check the boxes for all the tables you want to include.
For each selected table, you can individually configure specific settings such as Primary Keys and History Tracking (SCD behavior). Ensure the post gres schema & tables are configured before creating a pipeline in Lakeflow Connect

Scalability & Limits

Table Limits: Databricks recommends configuring 250 or fewer tables per pipeline to ensure optimal performance and manageability. If you need to ingest more than 250 tables, you can split them across multiple pipelines grouping by domain or schema. More details here
Data Volume: There is no limit on the number of rows or columns supported within these tables.

YAML

You can configure your multi table Lake flow pipelines using YAML configuration if you prefer configuration to ensure reproducibility. More details here

Re: Automate Lakeflow connect to ingest 300 tables not manually

muaaz — Tue, 02 Jun 2026 08:16:51 GMT

Thanks @balajij8 for your reply.

My goal is to find a solution that works across all Lakeflow Connect connectors. For the proof of concept, I am using PostgreSQL, but the approach should ideally be connector agnostic.

We would prefer not to use the UI because we don't want to manually configure hundreds of tables and pipelines. Instead, we are looking for an Infrastructure-as-Code or YAML-based approach where connector configurations and Lakeflow Connect pipeline definitions can be managed declaratively.

Is there a way to define the connector configuration and Lakeflow Connect pipeline once, and then dynamically onboard multiple tables without manually creating through the UI?

Any guidance, examples, or recommended patterns would be greatly appreciated.

Re: Automate Lakeflow connect to ingest 300 tables not manually

szymon_dybczak — Tue, 02 Jun 2026 08:37:37 GMT

Hi @muaaz ,

Still you can achieve that. You can use Databricks Automation Bundles (DABs) to implement dynamic behaviour:

resources: pipelines: gateway: name: <gateway-name> gateway_definition: connection_id: <connection-id> gateway_storage_catalog: <destination-catalog> gateway_storage_schema: <destination-schema> gateway_storage_name: <destination-schema> target: <destination-schema> catalog: <destination-catalog> pipeline_sqlserver: name: <pipeline-name> catalog: <target-catalog-1> # Location of the pipeline event log schema: <target-schema-1> # Location of the pipeline event log ingestion_definition: connection_name: <connection-name> objects: - table: source_schema: <source-schema-1> source_table: <source-table-1> destination_catalog: <target-catalog-1> # Location of this table destination_schema: <target-schema-1> # Location of this table - table: source_schema: <source-schema-2> source_table: <source-table-2> destination_catalog: <target-catalog-2> # Location of this table destination_schema: <target-schema-2> # Location of this table

Check following documentation for details:

Create multi-destination pipelines | Databricks on AWS

If my answer was helpful, please consider marking it as accepted solution.

Re: Automate Lakeflow connect to ingest 300 tables not manually

balajij8 — Tue, 02 Jun 2026 08:41:50 GMT

You can seamlessly execute the things done via UI in the DABs. You can configure your multi table Lake flow pipelines using YAML configuration if you prefer configuration to ensure reproducibility. More details for Post gre sql ingestion here

You can manage Lakeflow Connect pipelines as code using Asset Bundles for sql server by adding few files like below and use similar approach for other databases

Workflow file that controls the frequency of data ingestion (sqlserver.yml).

variables: # Common variables used multiple places in the DAB definition. gateway_name: default: sqlserver01-gateway-pipeline dest_catalog: default: main dest_schema: default: sqlserver01 resources: pipelines: gateway: name: ${var.gateway_name} gateway_definition: connection_name: rebel gateway_storage_catalog: main gateway_storage_schema: sqlserver01 gateway_storage_name: sqlserver01-gateway-pipeline catalog: main target: sqlserver01 pipeline_sqlserver: name: sqlserver-ingestion-pipeline ingestion_definition: ingestion_gateway_id: ${resources.pipelines.gateway.id} objects: - schema: # Ingest all tables in the sqlserver01.dbo schema to main.dest_schema. The destination table name will be drivers, the same as it is on the source. source_catalog: sqlserver01 source_schema: dbo destination_catalog: main destination_schema: sqlserver01 target: sqlserver01 catalog: main

Pipeline Job definition file (sqlserver_pipeline.yml).

resources: jobs: sqlserver_dab_job: name: sqlserver-ingestion-pipeline job trigger: periodic: interval: 8 unit: HOURS email_notifications: on_failure: - user email tasks: - task_key: refresh_pipeline pipeline_task: pipeline_id: ${resources.pipelines.pipeline_sqlserver.id}

Re: Automate Lakeflow connect to ingest 300 tables not manually

muaaz — Tue, 02 Jun 2026 13:04:27 GMT

Thanks @balajij8 for your support. This YAML approach is working for me.

Re: Automate Lakeflow connect to ingest 300 tables not manually

muaaz — Tue, 02 Jun 2026 13:06:37 GMT

Thanks @szymon_dybczak for your support.