<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: DELTA LIVE TABLE -Parallel processing in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/delta-live-table-parallel-processing/m-p/141073#M51618</link>
    <description>&lt;P&gt;Maybe a better approach would be to use DLT-META:&amp;nbsp;&lt;A href="https://databrickslabs.github.io/dlt-meta/index.html" target="_blank"&gt;https://databrickslabs.github.io/dlt-meta/index.html&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 03 Dec 2025 20:42:17 GMT</pubDate>
    <dc:creator>Hubert-Dudek</dc:creator>
    <dc:date>2025-12-03T20:42:17Z</dc:date>
    <item>
      <title>DELTA LIVE TABLE -Parallel processing</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-parallel-processing/m-p/101856#M40859</link>
      <description>&lt;P&gt;How can we process multiple tables in parallel within a Delta Live Tables pipeline, passing the table names as parameters?&lt;/P&gt;</description>
      <pubDate>Thu, 12 Dec 2024 07:09:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-parallel-processing/m-p/101856#M40859</guid>
      <dc:creator>JUMAN4422</dc:creator>
      <dc:date>2024-12-12T07:09:36Z</dc:date>
    </item>
    <item>
      <title>Re: DELTA LIVE TABLE -Parallel processing</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-parallel-processing/m-p/101909#M40881</link>
      <description>&lt;P class="p1"&gt;To process multiple tables within a Delta Live Table (DLT) pipeline in parallel using table names as parameters, you can leverage the flexibility of the DLT Python API. Here’s a step-by-step guide on how to achieve this:&lt;/P&gt;
&lt;P class="p2"&gt;&amp;nbsp;&lt;/P&gt;
&lt;OL class="ol1"&gt;
&lt;LI class="li1"&gt;&lt;STRONG&gt;Define the Tables Dynamically&lt;/STRONG&gt;:&lt;BR /&gt;&lt;BR /&gt;Use the &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/97035"&gt;@Dlt&lt;/a&gt;.table decorator to define your tables. You can create a function that takes table names as parameters and dynamically generates the required tables.&lt;A href="https://docs.databricks.com/en/delta-live-tables/python-ref.html" target="_blank"&gt;&lt;SPAN class="s1"&gt;1&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;
&lt;LI class="li1"&gt;&lt;STRONG&gt;Use the dlt.read or spark.read.table Functions&lt;/STRONG&gt;:&lt;BR /&gt;&lt;BR /&gt;These functions allow you to read from other tables within the same pipeline. Use the LIVE keyword to reference tables defined in the same pipeline.&lt;A href="https://docs.databricks.com/en/delta-live-tables/python-ref.html" target="_blank"&gt;&lt;SPAN class="s1"&gt;1&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;
&lt;LI class="li1"&gt;&lt;STRONG&gt;Parallel Processing&lt;/STRONG&gt;: While DLT manages the orchestration of tasks, you can define multiple tables in your pipeline, and DLT will handle their dependencies and execution order. Ensure that your tables are defined in a way that allows DLT to infer the dependencies correctly.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P class="p2"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="p1"&gt;Here’s an example of how you can define multiple tables dynamically:&lt;/P&gt;
&lt;P class="p2"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="p1"&gt;import dlt&lt;/P&gt;
&lt;P class="p1"&gt;from pyspark.sql.functions import col&lt;/P&gt;
&lt;P class="p2"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="p1"&gt;# Function to create a table&lt;/P&gt;
&lt;P class="p1"&gt;def create_table(table_name):&lt;/P&gt;
&lt;P class="p1"&gt;&lt;SPAN class="Apple-converted-space"&gt;&amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/97035"&gt;@Dlt&lt;/a&gt;.table(name=table_name)&lt;/P&gt;
&lt;P class="p1"&gt;&lt;SPAN class="Apple-converted-space"&gt;&amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;def table_def():&lt;/P&gt;
&lt;P class="p1"&gt;&lt;SPAN class="Apple-converted-space"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;return spark.read.table(f"source_database.{table_name}")&lt;/P&gt;
&lt;P class="p2"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="p1"&gt;# List of table names to process&lt;/P&gt;
&lt;P class="p1"&gt;table_names = ["table1", "table2", "table3"]&lt;/P&gt;
&lt;P class="p2"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="p1"&gt;# Create tables dynamically&lt;/P&gt;
&lt;P class="p1"&gt;for table_name in table_names:&lt;/P&gt;
&lt;P class="p1"&gt;&lt;SPAN class="Apple-converted-space"&gt;&amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;create_table(table_name)&lt;/P&gt;</description>
      <pubDate>Thu, 12 Dec 2024 12:54:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-parallel-processing/m-p/101909#M40881</guid>
      <dc:creator>Alberto_Umana</dc:creator>
      <dc:date>2024-12-12T12:54:17Z</dc:date>
    </item>
    <item>
      <title>Re: DELTA LIVE TABLE -Parallel processing</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-parallel-processing/m-p/102193#M41012</link>
      <description>&lt;P&gt;If we use a for loop to pass table names, will they be handled one by one?&lt;BR /&gt;If so, can you suggest another method? I need to process 'n' tables at a time.&lt;/P&gt;</description>
      <pubDate>Mon, 16 Dec 2024 04:49:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-parallel-processing/m-p/102193#M41012</guid>
      <dc:creator>JUMAN4422</dc:creator>
      <dc:date>2024-12-16T04:49:02Z</dc:date>
    </item>
    <item>
      <title>Re: DELTA LIVE TABLE -Parallel processing</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-parallel-processing/m-p/102454#M41121</link>
      <description>&lt;P&gt;Can we run a DLT pipeline multiple times concurrently with different parameters, using REST API calls with asyncio?&lt;BR /&gt;&lt;BR /&gt;I have created a function to start the pipeline via the REST API.&lt;BR /&gt;When calling the function with asyncio, I am getting a [409 Conflict] error.&lt;/P&gt;</description>
      <pubDate>Wed, 18 Dec 2024 07:28:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-parallel-processing/m-p/102454#M41121</guid>
      <dc:creator>JUMAN4422</dc:creator>
      <dc:date>2024-12-18T07:28:30Z</dc:date>
    </item>
    <item>
      <title>Re: DELTA LIVE TABLE -Parallel processing</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-parallel-processing/m-p/111153#M43813</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/106294"&gt;@Alberto_Umana&lt;/a&gt;&amp;nbsp;where you define the list "&lt;SPAN&gt;table_names = ["table1", "table2", "table3"]", can I replace this with the row values from a DLT view?&amp;nbsp;&lt;BR /&gt;When I've tried using @dlt.view, I run into an error saying I need to iterate within the confines of a DLT structure, and if I use the rows from a @dlt.table, I run into a "table not found" error, which I think is a limitation of how DLT sets up the DAG/relationships before actual processing?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 25 Feb 2025 17:33:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-parallel-processing/m-p/111153#M43813</guid>
      <dc:creator>ChantellevdWalt</dc:creator>
      <dc:date>2025-02-25T17:33:27Z</dc:date>
    </item>
    <item>
      <title>Re: DELTA LIVE TABLE -Parallel processing</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-parallel-processing/m-p/141026#M51608</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/135897"&gt;@JUMAN4422&lt;/a&gt;&amp;nbsp;, if you have found a solution to this, please post it.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Dec 2025 14:52:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-parallel-processing/m-p/141026#M51608</guid>
      <dc:creator>swatkat</dc:creator>
      <dc:date>2025-12-03T14:52:52Z</dc:date>
    </item>
    <item>
      <title>Re: DELTA LIVE TABLE -Parallel processing</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-parallel-processing/m-p/141030#M51609</link>
      <description>&lt;P&gt;You can use a for loop; the tables will be processed in parallel based on the cluster size. Define the DLT logic in a function, then pass your table names to it in a loop:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;def dlt_logic(table_name):
    ........  # your DLT table definition

table_names = ["table1", "table2", "table3"]
for table_name in table_names:
    dlt_logic(table_name)&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 03 Dec 2025 15:03:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-parallel-processing/m-p/141030#M51609</guid>
      <dc:creator>JUMAN4422</dc:creator>
      <dc:date>2025-12-03T15:03:18Z</dc:date>
    </item>
    <item>
      <title>Re: DELTA LIVE TABLE -Parallel processing</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-parallel-processing/m-p/141060#M51617</link>
      <description>&lt;P class="qt3gz91 paragraph"&gt;DLT analyzes your code to build a dependency graph (DAG) and schedules independent flows concurrently up to the available compute; you don’t have to orchestrate parallelism yourself if flows don’t depend on each other.&lt;/P&gt;
&lt;H3&gt;Parameterise a list of table names and generate per‑table flows (Python)&lt;/H3&gt;
&lt;P&gt;Use a pipeline configuration parameter (for example, table_list) and read it from your notebook. Then, create DLT tables in a loop using a small function factory so each table gets its own definition, which DLT will parallelize when they’re independent.&lt;/P&gt;
&lt;DIV class="go8b9g1 _7pq7t6cl" data-ui-element="code-block-container"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-python qt3gz9e hljs language-python _1ymogdh2"&gt;&lt;SPAN class="hljs-comment"&gt;# Python (DLT)&lt;/SPAN&gt;
&lt;SPAN class="hljs-keyword"&gt;import&lt;/SPAN&gt; dlt
&lt;SPAN class="hljs-keyword"&gt;from&lt;/SPAN&gt; pyspark.sql.functions &lt;SPAN class="hljs-keyword"&gt;import&lt;/SPAN&gt; *

&lt;SPAN class="hljs-comment"&gt;# 1) Read list of tables from pipeline parameter "table_list", e.g., "customers,orders,products"&lt;/SPAN&gt;
tables = [t.strip() &lt;SPAN class="hljs-keyword"&gt;for&lt;/SPAN&gt; t &lt;SPAN class="hljs-keyword"&gt;in&lt;/SPAN&gt; spark.conf.get(&lt;SPAN class="hljs-string"&gt;"table_list"&lt;/SPAN&gt;).split(&lt;SPAN class="hljs-string"&gt;","&lt;/SPAN&gt;)]

&lt;SPAN class="hljs-comment"&gt;# 2) Use a function factory to avoid late-binding issues in loops&lt;/SPAN&gt;
&lt;SPAN class="hljs-keyword"&gt;def&lt;/SPAN&gt; &lt;SPAN class="hljs-title function_"&gt;define_bronze&lt;/SPAN&gt;(&lt;SPAN class="hljs-params"&gt;name: &lt;SPAN class="hljs-built_in"&gt;str&lt;/SPAN&gt;&lt;/SPAN&gt;):
&lt;SPAN class="hljs-meta"&gt;    @dlt.table(&lt;SPAN class="hljs-params"&gt;name=&lt;SPAN class="hljs-string"&gt;f"&lt;SPAN class="hljs-subst"&gt;{name}&lt;/SPAN&gt;_bronze"&lt;/SPAN&gt;, comment=&lt;SPAN class="hljs-string"&gt;f"Bronze ingestion for &lt;SPAN class="hljs-subst"&gt;{name}&lt;/SPAN&gt;"&lt;/SPAN&gt;&lt;/SPAN&gt;)&lt;/SPAN&gt;
    &lt;SPAN class="hljs-keyword"&gt;def&lt;/SPAN&gt; &lt;SPAN class="hljs-title function_"&gt;_bronze&lt;/SPAN&gt;():
        &lt;SPAN class="hljs-comment"&gt;# Example: Auto Loader per-table path; adapt format/path/options to your sources&lt;/SPAN&gt;
        &lt;SPAN class="hljs-keyword"&gt;return&lt;/SPAN&gt; (
            spark.readStream.&lt;SPAN class="hljs-built_in"&gt;format&lt;/SPAN&gt;(&lt;SPAN class="hljs-string"&gt;"cloudFiles"&lt;/SPAN&gt;)
            .option(&lt;SPAN class="hljs-string"&gt;"cloudFiles.format"&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;"json"&lt;/SPAN&gt;)
            .option(&lt;SPAN class="hljs-string"&gt;"inferSchema"&lt;/SPAN&gt;, &lt;SPAN class="hljs-literal"&gt;True&lt;/SPAN&gt;)
            .load(&lt;SPAN class="hljs-string"&gt;f"/mnt/data/&lt;SPAN class="hljs-subst"&gt;{name}&lt;/SPAN&gt;"&lt;/SPAN&gt;)  &lt;SPAN class="hljs-comment"&gt;# e.g., one folder per table name&lt;/SPAN&gt;
        )
    &lt;SPAN class="hljs-keyword"&gt;return&lt;/SPAN&gt; _bronze

&lt;SPAN class="hljs-keyword"&gt;def&lt;/SPAN&gt; &lt;SPAN class="hljs-title function_"&gt;define_silver&lt;/SPAN&gt;(&lt;SPAN class="hljs-params"&gt;name: &lt;SPAN class="hljs-built_in"&gt;str&lt;/SPAN&gt;&lt;/SPAN&gt;):
&lt;SPAN class="hljs-meta"&gt;    @dlt.table(&lt;SPAN class="hljs-params"&gt;name=&lt;SPAN class="hljs-string"&gt;f"&lt;SPAN class="hljs-subst"&gt;{name}&lt;/SPAN&gt;_silver"&lt;/SPAN&gt;, comment=&lt;SPAN class="hljs-string"&gt;f"Silver cleansing for &lt;SPAN class="hljs-subst"&gt;{name}&lt;/SPAN&gt;"&lt;/SPAN&gt;&lt;/SPAN&gt;)&lt;/SPAN&gt;
    &lt;SPAN class="hljs-keyword"&gt;def&lt;/SPAN&gt; &lt;SPAN class="hljs-title function_"&gt;_silver&lt;/SPAN&gt;():
        &lt;SPAN class="hljs-comment"&gt;# Example transformation; replace with your logic&lt;/SPAN&gt;
        &lt;SPAN class="hljs-keyword"&gt;return&lt;/SPAN&gt; dlt.read_stream(&lt;SPAN class="hljs-string"&gt;f"&lt;SPAN class="hljs-subst"&gt;{name}&lt;/SPAN&gt;_bronze"&lt;/SPAN&gt;).select(&lt;SPAN class="hljs-string"&gt;"*"&lt;/SPAN&gt;)
    &lt;SPAN class="hljs-keyword"&gt;return&lt;/SPAN&gt; _silver

&lt;SPAN class="hljs-comment"&gt;# 3) Instantiate a bronze+silver flow for each table name&lt;/SPAN&gt;
&lt;SPAN class="hljs-keyword"&gt;for&lt;/SPAN&gt; n &lt;SPAN class="hljs-keyword"&gt;in&lt;/SPAN&gt; tables:
    define_bronze(n)
    define_silver(n)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/DIV&gt;
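&lt;P&gt;The function factory above matters because of Python's late binding of closure variables, not anything DLT-specific. A minimal plain-Python sketch (hypothetical names) of the pitfall and the fix:&lt;/P&gt;

```python
# Closures capture variables, not values: every lambda created in the
# loop below shares the same 'n', so all of them see its final value.
def make_closures_late():
    fns = []
    for n in ["a", "b", "c"]:
        fns.append(lambda: n)  # late binding: 'n' is looked up at call time
    return fns

# A factory function binds the current value per call, because 'name'
# is a fresh local variable in each invocation of factory().
def make_closures_factory(names):
    def factory(name):
        return lambda: name
    return [factory(n) for n in names]

late = [f() for f in make_closures_late()]                  # ["c", "c", "c"]
bound = [f() for f in make_closures_factory(["a", "b", "c"])]  # ["a", "b", "c"]
```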
&lt;P class="qt3gz91 paragraph"&gt;Because DLT evaluates decorators lazily, you must create datasets inside separate functions when looping; otherwise, you’ll accidentally capture the last loop variable value for all tables.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Dec 2025 18:56:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-parallel-processing/m-p/141060#M51617</guid>
      <dc:creator>iyashk-DB</dc:creator>
      <dc:date>2025-12-03T18:56:06Z</dc:date>
    </item>
    <item>
      <title>Re: DELTA LIVE TABLE -Parallel processing</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-parallel-processing/m-p/141073#M51618</link>
      <description>&lt;P&gt;Maybe a better approach would be to use DLT-META:&amp;nbsp;&lt;A href="https://databrickslabs.github.io/dlt-meta/index.html" target="_blank"&gt;https://databrickslabs.github.io/dlt-meta/index.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 03 Dec 2025 20:42:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-parallel-processing/m-p/141073#M51618</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2025-12-03T20:42:17Z</dc:date>
    </item>
  </channel>
</rss>

