I have a DLT pipeline where all tables are non-streaming (materialized views) except the last one, which needs to be append-only and is therefore defined as a streaming table.
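For reference, the structure looks roughly like this. This is a simplified sketch, not my actual code; the table names (bronze_source, silver_agg, final_append_only) and the aggregation are placeholders:

```python
import dlt
from pyspark.sql import functions as F

# Upstream: ordinary (non-streaming) DLT tables, i.e. materialized views.
# bronze_source is assumed to be another table defined earlier in the pipeline.
@dlt.table
def silver_agg():
    return (
        dlt.read("bronze_source")  # batch read of an upstream pipeline table
        .groupBy("key")
        .agg(F.sum("value").alias("total"))
    )

# Final table: declared as a streaming table so that it is append-only.
@dlt.table
def final_append_only():
    return dlt.read_stream("silver_agg")  # streaming read of a non-streaming source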
The pipeline runs successfully on the first run. However, on the second run it fails with:
```
org.apache.spark.sql.streaming.StreamingQueryException: [STREAM_FAILED] Query [id = 48f8dad4-1ae6-4203-9bd1-bcda239db9c3, runId = 023d9d7f-33e0-4301-ae39-5c041a392ea5] terminated with exception: [DELTA_SOURCE_TABLE_IGNORE_CHANGES] Detected a data update (for example part-00000-4ad8ffe0-5732-406e-b1b1-fd76107ab0a4-c000.snappy.parquet) in the source table at version 26. This is currently not supported. If you'd like to ignore updates, set the option 'skipChangeCommits' to 'true'. If you would like the data update to be reflected, please restart this query with a fresh checkpoint directory. The source table can be found at path abfss://sustainability
```
Setting the skipChangeCommits flag to true doesn't work either: any changes in the second-to-last table are simply ignored and the last table remains unchanged. It seems that any streaming table (append-only) in DLT requires a streaming source, but none of the other tables in the pipeline need to be append-only. I do not wish to change the logic in all upstream tables to make them streaming just so that the final table can be append-only.
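This is the variant I tried, following the hint in the error message (again a sketch with the same placeholder table names):

```python
import dlt

# Final table again, but this time reading the upstream materialized view
# with skipChangeCommits so that update commits don't fail the stream.
# `spark` is the session provided by the DLT runtime.
@dlt.table
def final_append_only():
    return (
        spark.readStream
        .option("skipChangeCommits", "true")
        .table("LIVE.silver_agg")  # placeholder upstream table
    )
```

As far as I can tell, the problem is that refreshing a materialized view rewrites its underlying files, so every commit it produces is a change commit; with skipChangeCommits set, they are all skipped and the new rows never reach the final table.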
All I am trying to do is have an append-only table at the very end of a DLT pipeline, and only at the end.