<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Delta Live Tables are refreshed in parallel rather than sequentially in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-are-refreshed-in-parallel-rather-than/m-p/112046#M44087</link>
    <description>&lt;P&gt;Yes, it is. All of the code is in one notebook. However, the code of the sample-DLT-pipeline-notebook is also in one notebook, and that run is sequential.&lt;/P&gt;</description>
    <pubDate>Fri, 07 Mar 2025 20:47:02 GMT</pubDate>
    <dc:creator>BobCat62</dc:creator>
    <dc:date>2025-03-07T20:47:02Z</dc:date>
    <item>
      <title>Delta Live Tables are refreshed in parallel rather than sequentially</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-are-refreshed-in-parallel-rather-than/m-p/112025#M44073</link>
      <description>&lt;P&gt;Hi experts,&lt;/P&gt;&lt;P&gt;I have defined my DLT pipeline as follows:&lt;/P&gt;&lt;LI-CODE lang="sql"&gt;-- Define a streaming table to ingest data from a volume
CREATE OR REFRESH STREAMING TABLE pumpdata_bronze
TBLPROPERTIES ("myCompanyPipeline.quality" = "bronze")
AS SELECT * FROM cloud_files("abfss://xxx@xxx.dfs.core.windows.net/xxx/*/*/*/*/*.JSON", "JSON");

-- Define a streaming table that cleans and partitions the bronze data
CREATE OR REFRESH STREAMING TABLE pumpdata_silver
PARTITIONED BY (extracted_date)
COMMENT "The cleaned sales orders with valid order_number(s) and partitioned by order_datetime."
TBLPROPERTIES ("myCompanyPipeline.quality" = "silver")
AS SELECT
  DATE(EnqueuedTimeUtc) AS extracted_date,
  DATE_FORMAT(EnqueuedTimeUtc, 'HH:mm:ss') AS extracted_time,
  ROUND(Body:distance, 2) AS distance
FROM STREAM(bstdwh.pumpdata_bronze)
WHERE Body IS NOT NULL;&lt;/LI-CODE&gt;&lt;P&gt;When I start this pipeline, I expect the Bronze table to refresh first, followed by the Silver table after its completion. However, both run in parallel, causing the Silver table to miss the latest data.&lt;/P&gt;&lt;P&gt;Did I miss some settings?&lt;/P&gt;</description>
      <pubDate>Fri, 07 Mar 2025 16:03:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-are-refreshed-in-parallel-rather-than/m-p/112025#M44073</guid>
      <dc:creator>BobCat62</dc:creator>
      <dc:date>2025-03-07T16:03:55Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Tables are refreshed in parallel rather than sequentially</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-are-refreshed-in-parallel-rather-than/m-p/112027#M44075</link>
      <description>&lt;P&gt;Is all of this code in the same notebook? If so, this sounds like the expected behavior; it's a performance optimization. If you need sequential execution, put the code into two notebooks and make a pipeline.&lt;/P&gt;</description>
      <pubDate>Fri, 07 Mar 2025 16:57:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-are-refreshed-in-parallel-rather-than/m-p/112027#M44075</guid>
      <dc:creator>Rjdudley</dc:creator>
      <dc:date>2025-03-07T16:57:47Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Tables are refreshed in parallel rather than sequentially</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-are-refreshed-in-parallel-rather-than/m-p/112046#M44087</link>
      <description>&lt;P&gt;Yes, it is. All of the code is in one notebook. However, the code of the sample-DLT-pipeline-notebook is also in one notebook, and that run is sequential.&lt;/P&gt;</description>
      <pubDate>Fri, 07 Mar 2025 20:47:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-are-refreshed-in-parallel-rather-than/m-p/112046#M44087</guid>
      <dc:creator>BobCat62</dc:creator>
      <dc:date>2025-03-07T20:47:02Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Tables are refreshed in parallel rather than sequentially</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-are-refreshed-in-parallel-rather-than/m-p/112058#M44091</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/66116"&gt;@BobCat62&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;DLT now has two publishing modes: direct publishing mode and classic (legacy) mode. See this page for more details:&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/release-notes/product/2025/january#dlt-now-supports-publishing-to-tables-in-multiple-schemas-and-catalogs" target="_blank" rel="noopener"&gt;https://docs.databricks.com/aws/en/release-notes/product/2025/january#dlt-now-supports-publishing-to-tables-in-multiple-schemas-and-catalogs&lt;/A&gt;&lt;/P&gt;&lt;P&gt;1. In legacy mode (the target variable is defined in the pipeline configuration; it is basically the default schema of the pipeline), DLT expects you to reference live.pumpdata_bronze in the definition of pumpdata_silver, the table you want to depend on pumpdata_bronze. That dependency ensures the refresh of the dependent table starts only after the bronze refresh is done, so the silver table gets the latest records. This method is legacy now, though, and it's best practice to follow the latest advancements.&lt;/P&gt;&lt;P&gt;2. In direct publishing mode (you use the schema variable instead of the target variable in the pipeline configuration; both serve the same purpose but are mutually exclusive, so only one can be set), your pipeline automatically runs in the latest mode: the live prefix is not required, and DLT handles all the dependencies itself.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="ashraf1395_0-1741408720854.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/15294i377045C40E187A28/image-size/medium?v=v2&amp;amp;px=400" role="button" title="ashraf1395_0-1741408720854.png" alt="ashraf1395_0-1741408720854.png" /&gt;&lt;/span&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="ashraf1395_1-1741408744958.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/15295i3345218F6D5ADBD5/image-size/medium?v=v2&amp;amp;px=400" role="button" title="ashraf1395_1-1741408744958.png" alt="ashraf1395_1-1741408744958.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I haven't used sequencing in direct publishing mode myself, but the link above should have some guidelines on it.&lt;/P&gt;</description>
      <pubDate>Sat, 08 Mar 2025 04:41:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-are-refreshed-in-parallel-rather-than/m-p/112058#M44091</guid>
      <dc:creator>ashraf1395</dc:creator>
      <dc:date>2025-03-08T04:41:59Z</dc:date>
    </item>
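    <!--
    A minimal sketch of the legacy-mode fix described in the reply above (an XML comment, so feed
    readers ignore it; assumes classic publishing mode with a target schema configured): rewriting
    the silver table from the original question so its FROM clause reads the bronze table through
    the LIVE virtual schema, which is what lets DLT build the dependency graph and refresh
    pumpdata_silver only after pumpdata_bronze completes.

    ```sql
    CREATE OR REFRESH STREAMING TABLE pumpdata_silver
    PARTITIONED BY (extracted_date)
    TBLPROPERTIES ("myCompanyPipeline.quality" = "silver")
    AS SELECT
      DATE(EnqueuedTimeUtc) AS extracted_date,
      DATE_FORMAT(EnqueuedTimeUtc, 'HH:mm:ss') AS extracted_time,
      ROUND(Body:distance, 2) AS distance
    FROM STREAM(live.pumpdata_bronze)
    WHERE Body IS NOT NULL;
    ```

    The only change from the question's code is STREAM(live.pumpdata_bronze) in place of
    STREAM(bstdwh.pumpdata_bronze); in direct publishing mode the plain table name works and DLT
    infers the dependency itself.
    -->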
  </channel>
</rss>

