cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Read Data from Updating Streaming Table

smit_tw
New Contributor III

 

We are working on a project with the following setup:

  • The Bronze table is append-only.
  • The Silver table uses "APPLY CHANGES INTO" from the Bronze table and implements SCD Type 1, meaning records can be updated at any time.

We are trying to use the Silver table as a source for the Gold table in a streaming pipeline. However, when the pipeline runs for the second time and records in the Silver table are updated, we encounter the following error:

Flow 'gold_table' has FAILED fatally. An error occurred because we detected an update or delete to one or more rows in the source table. Streaming tables may only use append-only streaming sources. If you expect to delete or update rows to the source table in the future, please convert table gold_table to a materialized view instead of a streaming table. If you are fine to skip changes, set the option 'skipChangeCommits' to 'true'. If you only have a one-time change in the source table and will only have append-only commits from now on, you can perform a full refresh to table gold_table to resolve the issue. A Full Refresh will attempt to clear all data from table gold_table and then load all data from the streaming source. The non-append change can be found at version 265. Operation: MERGE Source table name: silver_table

 We cannot use skipChangeCommits as we want to track this updates and move those to Gold table. 

How to resolve this issue with given configuration and architecture?

Thank you in advance.

1 REPLY 1

Alberto_Umana
Databricks Employee
Databricks Employee

Hi @smit_tw,

you can consider the following approach:

  1. Materialized View: Convert the Gold table to a materialized view instead of a streaming table. This approach will allow you to handle updates and deletes in the Silver table without causing errors in the Gold table. Materialized views do not have the append-only restriction that streaming tables do.
  2. Full Refresh: If you expect only a one-time change in the Silver table and subsequent operations will be append-only, you can perform a full refresh of the Gold table. This will clear all data from the Gold table and reload all data from the Silver table, resolving the issue caused by the non-append change.
  3. Change Data Feed (CDF): Utilize the Delta Lake Change Data Feed (CDF) feature to track changes in the Silver table and apply those changes to the Gold table. This approach allows you to capture updates and deletes and propagate them to the Gold table efficiently

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group