Databricks Community

coltonflowers · ‎02-08-2024

I am trying to do a one-time backfill on a DLT table, following the example here:

import dlt
from pyspark.sql import functions as F

@dlt.table()
def test():
    # read the change feed, providing a starting timestamp
    return (spark.readStream.format("delta")
        .option("readChangeFeed", "true")
        .option("startingTimestamp", "2024-01-7 05:00:00")
        .table("LIVE.concepts_to_flag")
        .select("group_id","cui","cel_label",F.array([F.lit("fake")]),F.lit(True))
    )

@dlt.append_flow(target = "test")
def backfill():
    return spark.readStream.option("endingTimestamp", "2024-01-7 05:00:00").table("hive_metastore.gold.flagged_entities")

However, when I validate this pipeline, I get the following error:

org.apache.spark.sql.AnalysisException: 'test' contains multiple queries 'test,backfill'. Only STREAMING tables can have multiple queries.
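
The error says that only a streaming table may be fed by more than one query. One pattern that may be worth trying, sketched here as an assumption and not as a confirmed fix for this pipeline, is to declare the target explicitly with `dlt.create_streaming_table` and attach each source as its own `@dlt.append_flow`. The wrapper `define_backfill_pipeline`, the flow name `live_changes`, and the `cutoff` parameter are hypothetical names introduced for illustration. The `select` from the post is omitted for brevity, so the two flows would still need matching schemas, and the `endingTimestamp` option is copied from the post without verifying that streaming reads honour it.

```python
def define_backfill_pipeline(dlt, spark, cutoff="2024-01-7 05:00:00"):
    """Register one streaming table fed by two append flows.

    `dlt` and `spark` are passed in as arguments (an assumption made so the
    sketch can be exercised outside a pipeline); in a DLT notebook they would
    be the imported `dlt` module and the ambient `spark` session.
    """
    # Declare the target as a streaming table with no query of its own.
    dlt.create_streaming_table(name="test")

    # Flow 1: ongoing changes from the upstream table, starting at the cutoff.
    @dlt.append_flow(target="test", name="live_changes")
    def live_changes():
        return (
            spark.readStream.format("delta")
            .option("readChangeFeed", "true")
            .option("startingTimestamp", cutoff)
            .table("LIVE.concepts_to_flag")
        )

    # Flow 2: the one-time backfill from the existing gold table.
    # The endingTimestamp option is kept as written in the post (unverified).
    @dlt.append_flow(target="test", name="backfill")
    def backfill():
        return (
            spark.readStream
            .option("endingTimestamp", cutoff)
            .table("hive_metastore.gold.flagged_entities")
        )
```

In a pipeline notebook this would be invoked once at module level, after `import dlt`, as `define_backfill_pipeline(dlt, spark)`. Neither flow function is called directly; the decorators only register them against the target table.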

coltonflowers · ‎02-08-2024

I should also add that when I drop the `backfill` function, validation succeeds and we get the following pipeline DAG:

DLT: Only STREAMING tables can have multiple queries.
