Still relatively new to Spark, and even more so to Delta Live Tables, so apologies if I've missed something fundamental, but here goes.
We are trying to run a notebook via Delta Live Tables. It contains two functions, each decorated with `@dlt.table` and each returning a Spark DataFrame, as required. The first decorated function pulls from an external database, does a bit of processing inside the function, and returns the result for the downstream function to consume. However, when we start the DLT run and look at the logs, the notebook appears to be executed 4 times, and on the last 3 executions the DataFrame consumed by the downstream function has 0 rows.
```python
import dlt

# DLT seems to execute this at least 4 times for a single run
@dlt.table()
def load_from_external():
    input_df = spark.read.jdbc(...)  # pulls from the external database; contains 500 rows
    # do some transformations to produce out_df
    return out_df  # always contains 500 rows

@dlt.table()
def downstream_etl():
    input_df = dlt.read("load_from_external")  # 500 rows on the first execution, 0 rows on executions 2-4
    # do some transformations to produce out_df
    return out_df
```
Is this intended behaviour? If so, is there any way to disable it and only have the notebook execute once?
Thanks in advance for your help.