<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Automate Lakeflow connect to ingest 300 tables not manually in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/automate-lakeflow-connect-to-ingest-300-tables-not-manually/m-p/158134#M54666</link>
    <description>&lt;P&gt;Thanks &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/210897"&gt;@balajij8&lt;/a&gt;&amp;nbsp;for your support. This YAML approach is working for me.&lt;/P&gt;</description>
    <pubDate>Tue, 02 Jun 2026 13:04:27 GMT</pubDate>
    <dc:creator>muaaz</dc:creator>
    <dc:date>2026-06-02T13:04:27Z</dc:date>
    <item>
      <title>Automate Lakeflow connect to ingest 300 tables not manually</title>
      <link>https://community.databricks.com/t5/data-engineering/automate-lakeflow-connect-to-ingest-300-tables-not-manually/m-p/158058#M54655</link>
      <description>&lt;P&gt;I have data in PostgreSQL and I’m using Lakeflow Connect via UI to ingest it into Databricks streaming tables.&lt;/P&gt;&lt;P&gt;Currently, each Lakeflow Connect pipeline only allows connecting one PostgreSQL table. I have around 300 tables, and creating pipelines manually for each table is time-consuming.&lt;/P&gt;&lt;P&gt;I’m looking for a way to automate this process, where I can provide a PostgreSQL connection and table names (or a list/schema), and automatically generate and deploy the required Lakeflow Connect pipelines.&lt;/P&gt;&lt;P&gt;I explored Asset Bundles and YAML-based definitions, but it seems Lakeflow Connect resources are not fully supported there yet.&lt;/P&gt;&lt;P&gt;What would be a scalable or recommended approach to design this setup in Databricks?&lt;/P&gt;</description>
      <pubDate>Mon, 01 Jun 2026 15:06:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/automate-lakeflow-connect-to-ingest-300-tables-not-manually/m-p/158058#M54655</guid>
      <dc:creator>muaaz</dc:creator>
      <dc:date>2026-06-01T15:06:56Z</dc:date>
    </item>
    <item>
      <title>Re: Automate Lakeflow connect to ingest 300 tables not manually</title>
      <link>https://community.databricks.com/t5/data-engineering/automate-lakeflow-connect-to-ingest-300-tables-not-manually/m-p/158067#M54656</link>
      <description>&lt;P&gt;Configuring &lt;STRONG&gt;Databricks Lakeflow Connect for PostgreSQL&lt;/STRONG&gt; is a multi-step process, and you can ingest multiple tables within a single pipeline.&lt;/P&gt;&lt;P&gt;You can follow the steps below.&lt;/P&gt;&lt;H3&gt;Selecting Multiple Tables via the UI&lt;/H3&gt;&lt;P&gt;In the pipeline creation wizard, you select your tables in the &lt;STRONG&gt;Source&lt;/STRONG&gt; step (&lt;I&gt;"Specify what data to ingest" - 3rd step&lt;/I&gt;).&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;You can check the boxes for all the tables you want to include.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;For each selected table, you can individually configure specific settings such as &lt;STRONG&gt;Primary Keys&lt;/STRONG&gt; and &lt;STRONG&gt;History Tracking&lt;/STRONG&gt; (SCD behavior). Ensure the PostgreSQL schema and tables are configured before creating a pipeline in Lakeflow Connect.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;H3&gt;Scalability &amp;amp; Limits&lt;/H3&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Table Limits:&lt;/STRONG&gt; Databricks recommends configuring &lt;STRONG&gt;250 or fewer tables per pipeline&lt;/STRONG&gt; to ensure optimal performance and manageability. If you need to ingest more than 250 tables, you can split them across multiple pipelines, grouping by domain or schema.&amp;nbsp;&lt;SPAN&gt;More details &lt;/SPAN&gt;&lt;A style="background-color: #ffffff;" href="https://learn.microsoft.com/en-us/azure/databricks/ingestion/lakeflow-connect/postgresql-limits#tables" target="_blank" rel="noopener"&gt;here&lt;/A&gt;.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Data Volume:&lt;/STRONG&gt; There is &lt;STRONG&gt;no limit&lt;/STRONG&gt; on the number of rows or columns supported within these tables.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;H3&gt;YAML&lt;/H3&gt;&lt;P&gt;You can also define your multi-table Lakeflow Connect pipelines in &lt;STRONG&gt;YAML configuration&amp;nbsp;&lt;/STRONG&gt;if you prefer a declarative setup that ensures reproducibility. More details &lt;A href="https://dzone.com/articles/lakeflow-connect-postgresql-integration-tutorial" target="_self"&gt;here&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Mon, 01 Jun 2026 16:53:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/automate-lakeflow-connect-to-ingest-300-tables-not-manually/m-p/158067#M54656</guid>
      <dc:creator>balajij8</dc:creator>
      <dc:date>2026-06-01T16:53:07Z</dc:date>
    </item>
    <item>
      <title>Re: Automate Lakeflow connect to ingest 300 tables not manually</title>
      <link>https://community.databricks.com/t5/data-engineering/automate-lakeflow-connect-to-ingest-300-tables-not-manually/m-p/158109#M54660</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/210897"&gt;@balajij8&lt;/a&gt;&amp;nbsp;for your reply.&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;My goal is to find a solution that works across all Lakeflow Connect connectors. For the proof of concept, I am using PostgreSQL, but the approach should ideally be connector-agnostic.&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;We would prefer not to use the UI because we don't want to manually configure hundreds of tables and pipelines. Instead, we are looking for an Infrastructure-as-Code or YAML-based approach where connector configurations and Lakeflow Connect pipeline definitions can be managed declaratively.&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;Is there a way to define the connector configuration and Lakeflow Connect pipeline once, and then dynamically onboard multiple tables without manually creating them through the UI?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Any guidance, examples, or recommended patterns would be greatly appreciated.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jun 2026 08:16:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/automate-lakeflow-connect-to-ingest-300-tables-not-manually/m-p/158109#M54660</guid>
      <dc:creator>muaaz</dc:creator>
      <dc:date>2026-06-02T08:16:51Z</dc:date>
    </item>
    <item>
      <title>Re: Automate Lakeflow connect to ingest 300 tables not manually</title>
      <link>https://community.databricks.com/t5/data-engineering/automate-lakeflow-connect-to-ingest-300-tables-not-manually/m-p/158111#M54661</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/226843"&gt;@muaaz&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;You can still achieve that. Databricks Automation Bundles (DABs) let you define the gateway and the ingestion pipeline, including its list of tables, declaratively:&lt;/P&gt;&lt;LI-CODE lang="ruby"&gt;resources:
  pipelines:
    gateway:
      name: &amp;lt;gateway-name&amp;gt;
      gateway_definition:
        connection_id: &amp;lt;connection-id&amp;gt;
        gateway_storage_catalog: &amp;lt;destination-catalog&amp;gt;
        gateway_storage_schema: &amp;lt;destination-schema&amp;gt;
        gateway_storage_name: &amp;lt;destination-schema&amp;gt;
      target: &amp;lt;destination-schema&amp;gt;
      catalog: &amp;lt;destination-catalog&amp;gt;

    pipeline_sqlserver:
      name: &amp;lt;pipeline-name&amp;gt;
      catalog: &amp;lt;target-catalog-1&amp;gt; # Location of the pipeline event log
      schema: &amp;lt;target-schema-1&amp;gt; # Location of the pipeline event log
      ingestion_definition:
        connection_name: &amp;lt;connection-name&amp;gt;
        objects:
          - table:
              source_schema: &amp;lt;source-schema-1&amp;gt;
              source_table: &amp;lt;source-table-1&amp;gt;
              destination_catalog: &amp;lt;target-catalog-1&amp;gt; # Location of this table
              destination_schema: &amp;lt;target-schema-1&amp;gt; # Location of this table
          - table:
              source_schema: &amp;lt;source-schema-2&amp;gt;
              source_table: &amp;lt;source-table-2&amp;gt;
              destination_catalog: &amp;lt;target-catalog-2&amp;gt; # Location of this table
              destination_schema: &amp;lt;target-schema-2&amp;gt; # Location of this table&lt;/LI-CODE&gt;&lt;P&gt;Check following documentation for details:&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/aws/en/ingestion/lakeflow-connect/multi-destination-pipeline#sql-server" target="_blank" rel="noopener"&gt;Create multi-destination pipelines | Databricks on AWS&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;If my answer was helpful, please consider marking it as accepted solution.&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jun 2026 08:37:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/automate-lakeflow-connect-to-ingest-300-tables-not-manually/m-p/158111#M54661</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2026-06-02T08:37:37Z</dc:date>
    </item>
    <item>
      <title>Re: Automate Lakeflow connect to ingest 300 tables not manually</title>
      <link>https://community.databricks.com/t5/data-engineering/automate-lakeflow-connect-to-ingest-300-tables-not-manually/m-p/158112#M54662</link>
      <description>&lt;P&gt;The steps you perform in the UI can also be expressed in DABs.&amp;nbsp;&lt;SPAN&gt;You can configure your multi-table Lakeflow Connect pipelines using&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;YAML configuration&amp;nbsp;&lt;/STRONG&gt;&lt;SPAN&gt;if you prefer a declarative setup that ensures reproducibility. More details&amp;nbsp;for PostgreSQL ingestion are&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://dzone.com/articles/lakeflow-connect-postgresql-integration-tutorial" target="_blank" rel="nofollow noopener noreferrer"&gt;here&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;You can manage Lakeflow Connect pipelines as code using Asset Bundles. The example below is for SQL Server and adds two files; use a similar approach for other databases.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Pipeline definition file that declares the gateway and the ingestion pipeline (sqlserver_pipeline.yml).&lt;/LI&gt;&lt;/UL&gt;&lt;LI-CODE lang="python"&gt;variables:
  # Common variables used multiple places in the DAB definition.
  gateway_name:
    default: sqlserver01-gateway-pipeline
  dest_catalog:
    default: main
  dest_schema:
    default: sqlserver01

resources:
  pipelines:
    gateway:
      name: ${var.gateway_name}
      gateway_definition:
        connection_name: rebel
        gateway_storage_catalog: main
        gateway_storage_schema: sqlserver01
        gateway_storage_name: sqlserver01-gateway-pipeline
      catalog: main
      target: sqlserver01

    pipeline_sqlserver:
      name: sqlserver-ingestion-pipeline
      ingestion_definition:
        ingestion_gateway_id: ${resources.pipelines.gateway.id}
        objects:
          - schema:
              # Ingest all tables in the sqlserver01.dbo schema to main.sqlserver01. Each destination table keeps the name it has on the source.
              source_catalog: sqlserver01
              source_schema: dbo
              destination_catalog: main
              destination_schema: sqlserver01
      target: sqlserver01
      catalog: main&lt;/LI-CODE&gt;&lt;UL&gt;&lt;LI&gt;Job definition file that controls the frequency of data ingestion (sqlserver.yml).&lt;/LI&gt;&lt;/UL&gt;&lt;LI-CODE lang="python"&gt;resources:
  jobs:
    sqlserver_dab_job:
      name: sqlserver-ingestion-pipeline job

      trigger:
        periodic:
          interval: 8
          unit: HOURS

      email_notifications:
        on_failure:
          - &lt;user-email&gt; # Placeholder: replace with the address to notify

      tasks:
        - task_key: refresh_pipeline
          pipeline_task:
            pipeline_id: ${resources.pipelines.pipeline_sqlserver.id}&lt;/LI-CODE&gt;</description>
      <pubDate>Tue, 02 Jun 2026 08:41:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/automate-lakeflow-connect-to-ingest-300-tables-not-manually/m-p/158112#M54662</guid>
      <dc:creator>balajij8</dc:creator>
      <dc:date>2026-06-02T08:41:50Z</dc:date>
    </item>
    <item>
      <title>Re: Automate Lakeflow connect to ingest 300 tables not manually</title>
      <link>https://community.databricks.com/t5/data-engineering/automate-lakeflow-connect-to-ingest-300-tables-not-manually/m-p/158134#M54666</link>
      <description>&lt;P&gt;Thanks &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/210897"&gt;@balajij8&lt;/a&gt;&amp;nbsp;for your support. This YAML approach is working for me.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jun 2026 13:04:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/automate-lakeflow-connect-to-ingest-300-tables-not-manually/m-p/158134#M54666</guid>
      <dc:creator>muaaz</dc:creator>
      <dc:date>2026-06-02T13:04:27Z</dc:date>
    </item>
    <item>
      <title>Re: Automate Lakeflow connect to ingest 300 tables not manually</title>
      <link>https://community.databricks.com/t5/data-engineering/automate-lakeflow-connect-to-ingest-300-tables-not-manually/m-p/158135#M54667</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt;&amp;nbsp;for your support.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jun 2026 13:06:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/automate-lakeflow-connect-to-ingest-300-tables-not-manually/m-p/158135#M54667</guid>
      <dc:creator>muaaz</dc:creator>
      <dc:date>2026-06-02T13:06:37Z</dc:date>
    </item>
  </channel>
</rss>

