In a Delta Live Tables (DLT) continuous pipeline, does it matter that df_dim_prev (loaded in cell 1) is only read once, when the pipeline starts?
For example, if df_dim_prev is initialized as:
# Cell 1: Read dim_table once
df_dim_prev = spark.read.table("dim_table")
Then used in a streaming join inside a DLT table:
# Cell 2: DLT table with a streaming source
import dlt

@dlt.table
def joined_table():
    # Use the dimension table preloaded in cell 1
    dim_df = df_dim_prev
    fact_df = spark.readStream.table("fact_stream")
    return fact_df.join(dim_df, "id", "left")
Would this mean that dim_df stays frozen at the snapshot taken when the pipeline started, until the pipeline is restarted or fully refreshed, rather than picking up changes to dim_table as they happen?
And is there a better way to handle this if we want the join to pick up dim_table updates periodically in a continuous pipeline? One variant I've considered is sketched below.
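For context, the alternative would be to re-read dim_table inside the table function itself, in the hope that each update/micro-batch sees a more recent snapshot. This is only a sketch of that idea (the name joined_table_v2 is mine), and I'm not sure whether it actually behaves differently from the cell 1/cell 2 version in continuous mode:

import dlt

@dlt.table
def joined_table_v2():
    # Read the dimension table here, inside the table function,
    # instead of relying on the DataFrame created once in cell 1
    dim_df = spark.read.table("dim_table")
    fact_df = spark.readStream.table("fact_stream")
    return fact_df.join(dim_df, "id", "left")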