Delta Live Tables executing repeatedly and returning empty DF

FD_MR
New Contributor II

I'm still relatively new to Spark, and even more so to Delta Live Tables, so apologies if I've missed something fundamental, but here goes.

We are trying to run a notebook via Delta Live Tables that contains two functions, each decorated with the `@dlt.table` decorator and each returning a Spark DataFrame as required. The first decorated function pulls from an external database, does a bit of processing within the function, and then returns a DataFrame for the downstream function to consume. However, when we start the DLT run and look at the logs, the notebook appears to be executed 4 times, and on the last 3 executions the Spark DataFrame consumed by the downstream function has 0 rows.

# DLT seems to execute this at least 4 times for a single run

import dlt

@dlt.table()
def load_from_external():
    input_df = spark.read.load(...)  # contains 500 rows
    # do some transformations
    out_df = input_df  # stand-in for the transformed DataFrame
    return out_df  # always contains 500 rows

@dlt.table()
def downstream_etl():
    input_df = dlt.read("load_from_external")  # 500 rows on the first execution, 0 on executions 2-4
    # do some transformations
    out_df = input_df  # stand-in for the transformed DataFrame
    return out_df
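
For anyone wanting to reproduce the symptom, here is a minimal, self-contained sketch of the same pipeline shape with logging added. The spark.range(500) source is just a hypothetical stand-in for our external database read, and the count()/print calls are debug instrumentation we used to see the per-execution row counts in the driver logs (they are not part of the real pipeline):

import dlt

@dlt.table()
def load_from_external():
    # Hypothetical stand-in for the external read; our real source is a database
    out_df = spark.range(500)
    print(f"load_from_external: {out_df.count()} rows")  # logged once per execution
    return out_df

@dlt.table()
def downstream_etl():
    input_df = dlt.read("load_from_external")
    print(f"downstream_etl input: {input_df.count()} rows")  # 500 on the first execution, 0 afterwards
    return input_df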

Is this intended behaviour? If so, is there any way to disable it and only have the notebook execute once?

Thanks in advance for your help.
