
Delta Live Tables executing repeatedly and returning empty DF

FD_MR
New Contributor II

Still relatively new to Spark, and even more so to Delta Live Tables, so apologies if I've missed something fundamental, but here goes.

We are trying to run a notebook via Delta Live Tables that contains two functions, each decorated with `@dlt.table` and each returning a Spark DataFrame, as required. The first decorated function pulls from an external database, does a bit of processing within the function, and then returns the result for the downstream function to consume. However, when we start the DLT run and look at the logs, the notebook appears to be executed 4 times, and on the last 3 executions the Spark DataFrame consumed by the downstream function has 0 rows.

# DLT seems to execute this at least 4 times for a single run

import dlt

@dlt.table()
def load_from_external():
    # e.g. a JDBC read; connection options elided (spark.read is a
    # DataFrameReader property, so it isn't callable directly)
    input_df = spark.read.format("jdbc").load()  # contains 500 rows
    out_df = input_df  # do some transformations here
    return out_df  # always contains 500 rows

@dlt.table()
def downstream_etl():
    input_df = dlt.read("load_from_external")  # 500 rows on first execution, 0 on executions 2-4
    out_df = input_df  # do some transformations here
    return out_df
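
In case it helps anyone reproduce, here is a minimal self-contained sketch of the same pattern with a dummy in-memory source standing in for the external database (the `_repro` table names and the `spark.range` source are placeholders, not our real code):

import dlt

# Dummy source table: spark.range stands in for the external database read
@dlt.table()
def load_from_external_repro():
    return spark.range(500).withColumnRenamed("id", "value")  # always 500 rows

# Downstream table consuming the dummy source, same shape as the real pipeline
@dlt.table()
def downstream_etl_repro():
    input_df = dlt.read("load_from_external_repro")
    return input_df.filter("value >= 0")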

Is this intended behaviour? If so, is there any way to disable it and only have the notebook execute once?

Thanks in advance for your help.

