Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Append-only table from non-streaming source in Delta Live Tables

Oliver_Angelil
Valued Contributor II

I have a DLT pipeline, where all tables are non-streaming (materialized views), except for the last one, which needs to be append-only, and is therefore defined as a streaming table.

The pipeline runs successfully on the first run. However, on the second run it fails:

org.apache.spark.sql.streaming.StreamingQueryException: [STREAM_FAILED] Query [id = 48f8dad4-1ae6-4203-9bd1-bcda239db9c3, runId = 023d9d7f-33e0-4301-ae39-5c041a392ea5] terminated with exception: [DELTA_SOURCE_TABLE_IGNORE_CHANGES] Detected a data update (for example part-00000-4ad8ffe0-5732-406e-b1b1-fd76107ab0a4-c000.snappy.parquet) in the source table at version 26. This is currently not supported. If you'd like to ignore updates, set the option 'skipChangeCommits' to 'true'. If you would like the data update to be reflected, please restart this query with a fresh checkpoint directory. The source table can be found at path abfss://sustainability

Setting the skipChangeCommits option to true doesn't work: any changes in the second-to-last table are simply ignored and the last table remains unchanged. It seems that any streaming (append-only) table in DLT requires a streaming source, but none of the other tables in the pipeline need to be append-only. I do not wish to change the logic in all upstream tables so that they are streaming just so that the final table can be append-only.
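To make the symptom above concrete, here is a toy, pure-Python model of how an append-only stream might treat commits in its source table. It is not Databricks code: `Commit`, `read_stream`, `SourceChangedError`, and the `rewrites_existing_files` flag are hypothetical names invented for illustration, and the sketch assumes that a refresh of the upstream materialized view appears to the stream as a commit that rewrites existing files rather than as a pure append.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    """One commit in the source table's log (toy model, not a Delta API)."""
    version: int
    added_rows: list = field(default_factory=list)
    rewrites_existing_files: bool = False  # True for updates/overwrites


class SourceChangedError(Exception):
    """Stands in for DELTA_SOURCE_TABLE_IGNORE_CHANGES in this sketch."""


def read_stream(commits, skip_change_commits=False):
    """Return the rows an append-only stream would emit for these commits."""
    emitted = []
    for commit in commits:
        if commit.rewrites_existing_files:
            if not skip_change_commits:
                raise SourceChangedError(
                    f"data update detected at version {commit.version}"
                )
            # The whole commit is skipped, including any new rows it carries.
            continue
        emitted.extend(commit.added_rows)
    return emitted


# Run 1 appends rows; run 2 recomputes the table and rewrites its files.
log = [
    Commit(version=1, added_rows=["a", "b"]),
    Commit(version=2, added_rows=["a", "b", "c"], rewrites_existing_files=True),
]

rows_with_skip = read_stream(log, skip_change_commits=True)
```

In this model the new row "c" never reaches the final table when skipping is on, which matches the behaviour described above; with skipping off, the second run raises instead.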

All I am trying to do is have an append-only table at the very end of a DLT pipeline, and only at the end.

1 REPLY

Kaniz_Fatma
Community Manager

Hi @Oliver_Angelil, it appears that you’re encountering an issue with your DLT (Delta Live Tables) pipeline, specifically with having an append-only table at the end of the pipeline.

Let’s explore some potential solutions:

  1. Streaming Live Table with Append-Only Behavior:

    • DLT has a concept of a streaming live table that is append-only by default. You can define your pipeline as triggered, which would be equivalent to using Trigger.Once.
    • Here’s an example of how you can create an append-only table using DLT:
      import dlt

      # Streaming table: rows read from the streaming source are only appended.
      @dlt.table
      def append_only():
          return spark.readStream.format("xyz").load()
      
    • Replace "xyz" with the actual format you’re using for your streaming source.
  2. Full Refresh Instead of Incremental Load:

  3. skipChangeCommits Option:

Remember that streaming tables in DLT are designed to be append-only, and any changes to the source table can impact data consistency. Experiment with the above approaches to find the best solution for your specific use case. If you encounter any further issues, feel free to ask for more assistance! 😊
