Hi @WearBeard, by default streaming tables require append-only sources. The error you encountered was caused by an update or delete operation on 'streaming_table_test'. To fix this, perform a Full Refresh of the 'streaming_table_test' table.
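A full refresh can be triggered from the pipeline UI (the full refresh option on the Start button) or programmatically through the Delta Live Tables Pipelines REST API. Below is a minimal sketch of the API call, with placeholder values for the workspace URL, token, and pipeline ID:

import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"  # placeholder
PIPELINE_ID = "<pipeline-id>"  # placeholder

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/pipelines/{PIPELINE_ID}/updates",
    headers={"Authorization": f"Bearer {TOKEN}"},
    # full_refresh_selection limits the full refresh to the listed tables;
    # {"full_refresh": True} full-refreshes the whole pipeline instead.
    json={"full_refresh_selection": ["streaming_table_test"]},
)
resp.raise_for_status()
print(resp.json())  # contains the update_id of the triggered run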
You can use the @append_flow decorator to write to a streaming table from multiple streaming sources, which lets you do the following:
- Add and remove streaming sources that append data to an existing streaming table without requiring a full refresh. For example, you might have a table that combines regional data from every region you operate in. As new regions are rolled out, you can add the new region's data to the table without performing a full refresh (see the example after the syntax block below).
- Update a streaming table by appending missing historical data (backfilling). For example, you have an existing streaming table that is written to by an Apache Kafka topic, and you also have historical data stored in a table that needs to be inserted exactly once into the streaming table. You cannot stream that data because it requires a complex aggregation before insertion (this is the backfill flow in the example below).
The following is the syntax for @append_flow:
import dlt

dlt.create_streaming_table("<target-table-name>")

@dlt.append_flow(
    target = "<target-table-name>",
    name = "<flow-name>",  # optional, defaults to the function name
    spark_conf = {"<key>": "<value>", "<key>": "<value>"},  # optional
    comment = "<comment>")  # optional
def <function-name>():
    return (<streaming query>)
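Putting the two scenarios together, here is a sketch of what this could look like: two regional Kafka topics appending to one target table, plus a one-time backfill from a historical table. The target table 'orders', the Kafka brokers and topics, and the table 'historical_orders' are placeholder assumptions, not names from your pipeline; 'spark' is the session provided by the DLT runtime.

import dlt

dlt.create_streaming_table("orders")

# Regional Kafka sources: each region is an independent append flow, so new
# regions can be added (or old ones removed) later without a full refresh.
@dlt.append_flow(target = "orders")
def orders_us():
    return (
        spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "<us-broker>:9092")  # placeholder
            .option("subscribe", "orders_us")  # placeholder topic
            .load()  # in a real pipeline you would parse the Kafka value column here
    )

@dlt.append_flow(target = "orders")
def orders_eu():
    return (
        spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "<eu-broker>:9092")  # placeholder
            .option("subscribe", "orders_eu")  # placeholder topic
            .load()
    )

# Backfill flow: streams the already-aggregated historical table. The streaming
# checkpoint managed by the pipeline ensures these rows are appended exactly
# once and are not re-inserted on later updates.
@dlt.append_flow(target = "orders")
def orders_backfill():
    return spark.readStream.table("historical_orders")  # placeholder table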
If you plan to delete or update rows in the source table in the future, consider converting 'streaming_table_test' to a live table (materialized view) instead, since live tables are fully recomputed on each update and therefore tolerate changes in their sources.
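For reference, a minimal sketch of the live table alternative, assuming the mutable source is a table named 'raw_source_test' (a placeholder):

import dlt

# Returning a batch DataFrame from @dlt.table defines a live table
# (materialized view); it is fully recomputed on every pipeline update,
# so updates and deletes in 'raw_source_test' are simply reflected in the next run.
@dlt.table(name = "streaming_table_test")
def streaming_table_test():
    return spark.read.table("raw_source_test")  # 'spark' is provided by the DLT runtime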
Ref: https://docs.databricks.com/en/delta-live-tables/python-ref.html