Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

error after updating delta table com.databricks.sql.transaction.tahoe.DeltaUnsupportedOperationException: Detected a data update

sanjay
Valued Contributor II

Hi,

I have a streaming pipeline running. I updated one file in a Delta table that has already been processed, and now I am getting this error:

com.databricks.sql.transaction.tahoe.DeltaUnsupportedOperationException: Detected a data update. This is currently not supported. If you'd like to ignore updates, set the option 'ignoreChanges' to 'true'.

I have set ignoreChanges to true but am still getting the same error:

spark.readStream.format("delta")
  .option("ignoreChanges", "true")
  .load("/tmp/delta/user_events")

Regards,

Sanjay

8 REPLIES

karthik_p
Esteemed Contributor

@Sanjay Jain which runtime version are you using? The above should work if you have updated the required field and it is consumed by downstream consumers. Please check the article below:

https://docs.databricks.com/structured-streaming/delta-lake.html#ignore-updates-and-deletes

sanjay
Valued Contributor II

I am using 11.3 LTS. I updated one field and this data is consumed by a downstream consumer. The challenge is that the consumer also receives the other, unchanged files along with the updated file.

Sandeep
Contributor III

@Sanjay Jain how did you update the file? Can you elaborate on the steps, please?

sanjay
Valued Contributor II

Using a merge:

MERGE INTO table_a a
USING table_b b
ON a.key = b.key  -- join condition not shown in the original post
WHEN MATCHED THEN UPDATE SET a.name = b.name

sanjay
Valued Contributor II

I can see that ignoreChanges: true emits all the updates and also re-emits the un-updated files from the same partition. Per the documentation, duplicates need to be handled downstream. Can you suggest how to handle the duplicate files?

This is from databrick documents. (https://docs.databricks.com/structured-streaming/delta-lake.html#ignore-updates-and-deletes)

"The semantics for ignoreChanges differ greatly from skipChangeCommits. With ignoreChanges enabled, rewritten data files in the source table are re-emitted after a data changing operation such as UPDATE, MERGE INTO, DELETE (within partitions), or OVERWRITE. Unchanged rows are often emitted alongside new rows, so downstream consumers must be able to handle duplicates. Deletes are not propagated downstream. ignoreChanges subsumes ignoreDeletes."

Anonymous
Not applicable

Hi @Sanjay Jain

Hope everything is going great.

Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can help you. 

Cheers!

sanjay
Valued Contributor II

Hi Vidula,

Sorry, I am still looking for a solution. I would appreciate any help you can provide.

Regards,

Sanjay

Sanjeev_Chauhan
New Contributor II

Hi Sanjay, 
You can try adding .option("overwriteSchema", "true")
