error after updating delta table com.databricks.sql.transaction.tahoe.DeltaUnsupportedOperationException: Detected a data update

sanjay
Valued Contributor II

Hi,

I have a pipeline running. I updated one file in a Delta table that was already processed. Now I am getting this error:

com.databricks.sql.transaction.tahoe.DeltaUnsupportedOperationException: Detected a data update. This is currently not supported. If you'd like to ignore updates, set the option 'ignoreChanges' to 'true'.

I have set ignoreChanges to true but am still getting the same error.

spark.readStream.format("delta")
  .option("ignoreChanges", "true")
  .load("/tmp/delta/user_events")

Regards,

Sanjay

8 REPLIES

karthik_p
Esteemed Contributor

@Sanjay Jain​ which runtime version are you using? The above should work if you have updated the required field and it is consumed by downstream consumers. Please check the article below:

https://docs.databricks.com/structured-streaming/delta-lake.html#ignore-updates-and-deletes

sanjay
Valued Contributor II

I am using 11.3 LTS. I updated one field, and this data is consumed by a downstream consumer. The challenge is that the consumer also receives the other, unchanged files along with the updated file.

Sandeep
Contributor III

@Sanjay Jain​ how did you update the file? Can you elaborate on the steps, please?

sanjay
Valued Contributor II

Using:

MERGE INTO table_a a
USING table_b b
WHEN MATCHED THEN UPDATE SET a.name = b.name

sanjay
Valued Contributor II

I can see that ignoreChanges = true emits all the updates, plus the unchanged files in the same partition. Per the documentation, duplicates need to be handled downstream. Can you suggest how to handle the duplicate files?

This is from the Databricks documentation (https://docs.databricks.com/structured-streaming/delta-lake.html#ignore-updates-and-deletes):

"The semantics for ignoreChanges differ greatly from skipChangeCommits. With ignoreChanges enabled, rewritten data files in the source table are re-emitted after a data changing operation such as UPDATE, MERGE INTO, DELETE (within partitions), or OVERWRITE. Unchanged rows are often emitted alongside new rows, so downstream consumers must be able to handle duplicates. Deletes are not propagated downstream. ignoreChanges subsumes ignoreDeletes."
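Since the documentation says downstream consumers must tolerate re-emitted unchanged rows, one common pattern is to deduplicate inside the consumer before applying the batch, keeping only rows whose key/version actually changed since the last delivery. Here is a minimal plain-Python sketch of that dedup logic (the `user_id` key and `last_updated` version field are hypothetical, not from the thread; in a real pipeline the same idea would live inside a Structured Streaming foreachBatch handler):

```python
def dedupe_batch(records, seen):
    """Drop rows already delivered downstream.

    records: list of dicts, each with a unique key ("user_id") and a
             version marker ("last_updated").
    seen:    dict mapping key -> last version emitted downstream.
    Rows re-emitted unchanged by ignoreChanges (same key, same version)
    are filtered out; only new or genuinely updated rows pass through.
    """
    fresh = []
    for rec in records:
        key, ver = rec["user_id"], rec["last_updated"]
        if seen.get(key) != ver:  # new key, or version changed
            seen[key] = ver
            fresh.append(rec)
    return fresh

# One real update plus two unchanged rows re-emitted from the
# same rewritten data file:
seen = {"u1": 1, "u2": 1, "u3": 1}
batch = [
    {"user_id": "u1", "last_updated": 2, "name": "new"},   # real update
    {"user_id": "u2", "last_updated": 1, "name": "same"},  # duplicate
    {"user_id": "u3", "last_updated": 1, "name": "same"},  # duplicate
]
print(dedupe_batch(batch, seen))  # only the u1 row survives
```

The same effect can be achieved statefully in Spark by writing each micro-batch with foreachBatch and a MERGE INTO the target keyed on the id, so reapplying an identical row is a no-op.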

Anonymous
Not applicable

Hi @Sanjay Jain​ 

Hope everything is going great.

Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can help you. 

Cheers!

sanjay
Valued Contributor II

Hi Vidula,

Sorry, I am still looking for a solution. I would appreciate any help you can provide.

Regards,

Sanjay

Sanjeev_Chauhan
New Contributor II

Hi Sanjay,
You can try adding .option("overwriteSchema", "true")
