DLT: Only STREAMING tables can have multiple queries.

coltonflowers
New Contributor III

I am trying to do a one-time backfill on a DLT table, following the example here:

 

import dlt
from pyspark.sql import functions as F

@dlt.table()
def test():
    # providing a starting timestamp for the change feed
    return (spark.readStream.format("delta")
        .option("readChangeFeed", "true")
        .option("startingTimestamp", "2024-01-7 05:00:00")
        .table("LIVE.concepts_to_flag")
        .select("group_id","cui","cel_label",F.array([F.lit("fake")]),F.lit(True))
    )

@dlt.append_flow(target = "test")
def backfill():
    return spark.readStream.option("endingTimestamp", "2024-01-7 05:00:00").table("hive_metastore.gold.flagged_entities")

 

However, when I validate this pipeline, I get the following error:

 

org.apache.spark.sql.AnalysisException: 'test' contains multiple queries 'test,backfill'. Only STREAMING tables can have multiple queries.

 

coltonflowers
New Contributor III

I should also add that when I drop the `backfill` function, validation succeeds and I get the following pipeline DAG: