DLT: Only STREAMING tables can have multiple queries.
02-08-2024 07:56 AM - edited 02-08-2024 08:13 AM
I am trying to do a one-time backfill on a DLT table, following the example here:
import dlt
from pyspark.sql import functions as F

@dlt.table()
def test():
    # providing a starting version
    return (spark.readStream.format("delta")
        .option("readChangeFeed", "true")
        .option("startingTimestamp", "2024-01-7 05:00:00")
        .table("LIVE.concepts_to_flag")
        .select("group_id", "cui", "cel_label", F.array([F.lit("fake")]), F.lit(True))
    )

@dlt.append_flow(target = "test")
def backfill():
    # one-time backfill of everything up to the same timestamp
    return spark.readStream.option("endingTimestamp", "2024-01-7 05:00:00").table("hive_metastore.gold.flagged_entities")
However, when I validate this pipeline, I get the following error:
org.apache.spark.sql.AnalysisException: 'test' contains multiple queries 'test,backfill'. Only STREAMING tables can have multiple queries.
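For comparison, here is a minimal sketch of an alternative layout: declare the target explicitly with `dlt.create_streaming_table` and feed it from two `@dlt.append_flow` functions, so that neither query is the table's own defining query. Whether this avoids the error above is an assumption, not a confirmed fix. The wrapper `define_backfill_pipeline` and the flow name `live_changes` are hypothetical names introduced for illustration, and the `select` from the original query is left out, so both flows would still need to produce matching schemas.

```python
def define_backfill_pipeline(dlt, spark, cutoff="2024-01-7 05:00:00"):
    """Register one explicitly declared streaming table fed by two append flows.

    `dlt` and `spark` are passed in rather than imported so the sketch can be
    loaded outside a pipeline. Inside a DLT notebook, call
    define_backfill_pipeline(dlt, spark). This wrapper is hypothetical.
    """
    # Declare the target as a streaming table with no defining query of its own.
    dlt.create_streaming_table(name="test")

    # Ongoing flow: change feed from the cutoff timestamp onward.
    @dlt.append_flow(target="test")
    def live_changes():
        return (
            spark.readStream.format("delta")
            .option("readChangeFeed", "true")
            .option("startingTimestamp", cutoff)
            .table("LIVE.concepts_to_flag")
        )

    # One-time flow: historical rows up to the cutoff timestamp.
    @dlt.append_flow(target="test")
    def backfill():
        return (
            spark.readStream
            .option("endingTimestamp", cutoff)
            .table("hive_metastore.gold.flagged_entities")
        )
```

In this layout both queries are append flows into `test`, rather than one `@dlt.table` query plus one flow.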
1 REPLY
02-08-2024 08:53 AM
I should also add that when I drop the `backfill` function, validation succeeds and we get the following pipeline DAG:

