Data Engineering

Append-only table from non-streaming source in Delta Live Tables

Oliver_Angelil
Valued Contributor II

I have a DLT pipeline in which all tables are non-streaming (materialized views), except for the last one, which needs to be append-only and is therefore defined as a streaming table.
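Roughly, the pipeline looks like this (table names and transformations are simplified placeholders, not my actual pipeline):

```python
import dlt
from pyspark.sql import functions as F

# Upstream tables are materialized views: fully recomputed on each update.
@dlt.table(name="cleaned_events")
def cleaned_events():
    return spark.read.table("raw_events").where(F.col("value").isNotNull())

# The final table is defined as a streaming table so that it is append-only.
# It reads the upstream materialized view as a stream, which is what fails
# on the second run: the MV rewrites its files instead of appending.
@dlt.table(name="append_only_history")
def append_only_history():
    return dlt.read_stream("cleaned_events")
```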

The pipeline runs successfully the first time, but fails on the second run:

org.apache.spark.sql.streaming.StreamingQueryException: [STREAM_FAILED] Query [id = 48f8dad4-1ae6-4203-9bd1-bcda239db9c3, runId = 023d9d7f-33e0-4301-ae39-5c041a392ea5] terminated with exception: [DELTA_SOURCE_TABLE_IGNORE_CHANGES] Detected a data update (for example part-00000-4ad8ffe0-5732-406e-b1b1-fd76107ab0a4-c000.snappy.parquet) in the source table at version 26. This is currently not supported. If you'd like to ignore updates, set the option 'skipChangeCommits' to 'true'. If you would like the data update to be reflected, please restart this query with a fresh checkpoint directory. The source table can be found at path abfss://sustainability

Setting the skipChangeCommits flag to true doesn't work: any changes in the second-to-last table are simply ignored and the last table remains unchanged. It seems that any streaming (append-only) table in DLT requires a streaming source, but none of the other tables in the pipeline need to be append-only. I don't want to rewrite all the upstream tables as streaming tables just so that the final table can be append-only.
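For reference, this is how I set the flag on the streaming read (again a simplified sketch; skipChangeCommits is a documented option for Delta streaming sources):

```python
import dlt

@dlt.table(name="append_only_history")
def append_only_history():
    # skipChangeCommits makes the stream silently skip commits that rewrite
    # existing files (updates/deletes). Since a materialized view refresh
    # rewrites its files, every refresh is skipped and no rows ever arrive.
    return (
        spark.readStream
        .option("skipChangeCommits", "true")
        .table("LIVE.cleaned_events")
    )
```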

All I am trying to do is have an append-only table at the very end of a DLT pipeline, and only at the end.

2 REPLIES

nkarwa
New Contributor II

@Oliver_Angelil - did you ever find a solution? I have a similar use case: I want to create an archive table with DLT from a non-streaming source (a materialized view), and I would prefer a DLT solution. I was able to get it to work with a traditional merge approach (non-DLT).
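For anyone interested, the non-DLT version is roughly this insert-only merge (table and column names are placeholders):

```python
from delta.tables import DeltaTable

# Append rows from the materialized view that are not yet in the archive;
# existing archive rows are left untouched, so the archive stays append-only.
archive = DeltaTable.forName(spark, "main.gold.events_archive")

(
    archive.alias("a")
    .merge(
        spark.read.table("main.gold.events_mv").alias("s"),
        "a.event_id = s.event_id",
    )
    .whenNotMatchedInsertAll()
    .execute()
)
```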

Oliver_Angelil
Valued Contributor II

@nkarwa we moved away from Delta Live Tables entirely. It was a lousy, inflexible framework from Databricks.

Now we just use regular Spark in jobs, and if we need append-only tables, we use Spark Structured Streaming.
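In case it helps, the replacement looks roughly like this (paths and table names are placeholders):

```python
# A plain Structured Streaming job that appends new rows from a Delta
# source table into an append-only Delta table. Scheduled as a job with
# availableNow, it processes whatever is new and then stops.
(
    spark.readStream
    .table("main.silver.source_table")
    .writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/checkpoints/append_only_history")
    .trigger(availableNow=True)
    .toTable("main.gold.append_only_history")
)
```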
