cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Error : . If you expect to delete or update rows to the source table in the future.......

rt-slowth
Contributor

Flow 'user_silver' has FAILED fatally. An error occurred because we detected an update or delete to one or more rows in the source table. Streaming tables may only use append-only streaming sources. If you expect to delete or update rows to the source table in the future, please convert table user_silver to a live table instead of a streaming live table. To resolve this issue, perform a Full Refresh to table user_silver. A Full Refresh will attempt to clear all data from table user_silver and then load all data from the streaming source.

The non-append change can be found at version 11.
Operation: MERGE
Username: doheekim

 

 

When I create a silver table in Databricks that uses the table to which dlt.apply_changes is applied as the source data, why does the first run work fine, but an error is thrown saying that only appned is possible from the second run?
Source table name: user

5 REPLIES 5

Kaniz_Fatma
Community Manager
Community Manager

Hi @rt-slowth, When you encounter the error message, it means that there was a problem updating or deleting rows from the source table while using a streaming table. To explain further, streaming tables in Databricks are specifically designed to work with append-only sources. This means that you can only add data to the table, but cannot make modifications or deletions. Therefore, when changes are applied to a streaming table, the expectation is that new data will be appended rather than updated or deleted.

 

In the scenario you described, you proposed the creation of a silver table on Databricks that relies on the table where dlt.apply_changes is applied for its source data. The initial run was successful as it successfully appended data to the silver table. However, subsequent runs encountered an error due to attempts to modify or delete rows in the source table, which is prohibited for streaming tables.

 

In order to tackle this issue, let's outline the following actions for consideration: 

 

1) Convert to a Live Table: Rather than using a streaming live table, switch to a live table option. This will enable updates and deletions to be made. 

2) Perform a Full Refresh: A full refresh on your silver table is necessary. This procedure involves clearing all current data from the table and then reloading all data from the streaming source.

 

Version 11 of the table has encountered a non-append change, as stated by the error message. This means that the problem happened while making the 11th update or alteration. The cause of the error is a MERGE operation. It's worth noting that while streaming tables excel in processing real-time data, they do have restrictions. To ensure consistency with your source data, it may be beneficial to use live tables or perform full refreshes when updating or deleting rows.

 

@Kaniz_Fatma 
Can you give me some example code on how to switch to live tables using pyspark?

Palash01
Valued Contributor

Hey @rt-slowth 

Adding to @Kaniz_Fatma's points and refereeing to your bronze and silver code you posted in questions-about-the-design-of-bronze-silver-and-gold-for-live post. Looks like you are doing a SCD operation on your bronze layer which explains why the pipeline is erroring out in the subsequent run. To over come this you can try as explained by Kaniz in her post.

Also, the code to switch to live tables is using readStream and writeStream functions in your code.

For writing the streaming table: 

query = transformed_stream \
    .writeStream \
    .format("delta") \
    .option("checkpointLocation", "/databricks/dbfs/checkpoints") \
    .trigger(processingTime="5 minutes") \
    .start("/databricks/dbfs/live_tables/my_live_table")

For reading a streaming table:

source_stream = spark.readStream \
    .format("delta") \
    .option("path", "/databricks/dbfs/mnt/landing_zone") \
    .option("processingTime", "1 minute") \
    .load()

 

Leave a like if this helps! Kudos,
Palash

Kaniz_Fatma
Community Manager
Community Manager

Hey there! Thanks a bunch for being part of our awesome community! 🎉 

We love having you around and appreciate all your questions. Take a moment to check out the responses – you'll find some great info. Your input is valuable, so pick the best solution for you. And remember, if you ever need more help , we're here for you! 

Keep being awesome! 😊🚀

 

Palash01
Valued Contributor

Hey @rt-slowth 

Just checking in if the provided solution was helpful to you. If yes, please accept this as a Best Solution so that this thread can be considered closed.

Leave a like if this helps! Kudos,
Palash

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group