Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Delta Live Tables are refreshed in parallel rather than sequentially

BobCat62
New Contributor III

Hi experts,

I have defined my DLT Pipeline as follows:

-- Define a streaming table to ingest data from a volume
CREATE OR REFRESH STREAMING TABLE pumpdata_bronze
TBLPROPERTIES ("myCompanyPipeline.quality" = "bronze")
AS SELECT *
FROM cloud_files("abfss://xxx@xxx.dfs.core.windows.net/xxx/*/*/*/*/*.JSON", "JSON");

-- Define a streaming table to ingest data from a volume
CREATE OR REFRESH STREAMING TABLE pumpdata_silver
PARTITIONED BY (extracted_date)
COMMENT "The cleaned sales orders with valid order_number(s) and partitioned by order_datetime."
TBLPROPERTIES ("myCompanyPipeline.quality" = "silver")
AS SELECT
  DATE(EnqueuedTimeUtc) AS extracted_date,
  DATE_FORMAT(EnqueuedTimeUtc, 'HH:mm:ss') AS extracted_time,
  ROUND(Body:distance, 2) AS distance
FROM STREAM(bstdwh.pumpdata_bronze)
WHERE Body IS NOT NULL;

When I start this pipeline, I expect the Bronze table to refresh first, followed by the Silver table after its completion. However, both run in parallel, causing the Silver table to miss the latest data.

 

Did I miss some settings?

 

 

 

Accepted Solutions

ashraf1395
Honored Contributor

Hi @BobCat62 ,

DLT now has two publishing modes: direct publishing mode and classic (legacy) mode. See the release notes for details on the modes: https://docs.databricks.com/aws/en/release-notes/product/2025/january#dlt-now-supports-publishing-to...


1. Legacy mode: the pipeline settings define a target variable (basically the default schema of the pipeline). In this mode DLT expects you to use the live keyword wherever one table should depend on another in the same pipeline, so the silver table's query should read from live.pumpdata_bronze rather than bstdwh.pumpdata_bronze. That ensures the refresh of the dependent table starts only after the bronze refresh is done, so it picks up the latest records.

This method is legacy now, though, and it's best practice to follow the latest approach.
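
As a rough sketch of what point 1 would look like for the silver table from the question (assuming the pipeline really is in legacy mode, i.e. a target is set in the pipeline settings), only the FROM clause changes:

```sql
-- Sketch only: assumes legacy publishing mode (pipeline settings use "target").
-- Reading from LIVE.pumpdata_bronze tells DLT that this table depends on the
-- bronze table defined in the same pipeline.
CREATE OR REFRESH STREAMING TABLE pumpdata_silver
PARTITIONED BY (extracted_date)
TBLPROPERTIES ("myCompanyPipeline.quality" = "silver")
AS SELECT
  DATE(EnqueuedTimeUtc) AS extracted_date,
  DATE_FORMAT(EnqueuedTimeUtc, 'HH:mm:ss') AS extracted_time,
  ROUND(Body:distance, 2) AS distance
FROM STREAM(LIVE.pumpdata_bronze)
WHERE Body IS NOT NULL;
```

With the original bstdwh.pumpdata_bronze reference, a legacy-mode pipeline would likely treat bronze as an ordinary external table instead of an upstream dataset, which would match the parallel refresh described in the question.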


2. Direct publishing mode: if your pipeline settings use the schema variable instead of the target variable (both serve the same purpose but are mutually exclusive, so only one can be used), the pipeline is automatically in the newer mode. The live keyword is then not required, and DLT handles all the dependencies itself.
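
A minimal sketch for point 2, assuming the pipeline settings use schema (direct publishing mode) and that the pipeline's default schema is bstdwh as in the question. As far as I understand it, an unqualified table name then resolves against the pipeline's default catalog and schema, so no live prefix is needed:

```sql
-- Sketch only: assumes direct publishing mode (pipeline settings use "schema")
-- and that bstdwh is the pipeline's default schema (an assumption taken from
-- the table reference in the question).
CREATE OR REFRESH STREAMING TABLE pumpdata_silver
PARTITIONED BY (extracted_date)
TBLPROPERTIES ("myCompanyPipeline.quality" = "silver")
AS SELECT
  DATE(EnqueuedTimeUtc) AS extracted_date,
  DATE_FORMAT(EnqueuedTimeUtc, 'HH:mm:ss') AS extracted_time,
  ROUND(Body:distance, 2) AS distance
FROM STREAM(pumpdata_bronze)  -- no LIVE prefix; DLT resolves the dependency
WHERE Body IS NOT NULL;
```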


I haven't used sequential execution in direct publishing mode myself, but the link above should have some guidelines on it.


Replies

Rjdudley
Honored Contributor

Is all of this code in the same notebook? If so, this sounds like the expected behavior; it's a performance optimization. If you need sequential execution, you can put the code into two notebooks and make a pipeline.

BobCat62
New Contributor III

Yes, it is. All the code is in one notebook. But the code of the sample-DLT-pipeline-notebook is also in one notebook, and that run is sequential.

 

