error after updating delta table com.databricks.sql.transaction.tahoe.DeltaUnsupportedOperationException: Detected a data update

sanjay
Valued Contributor II

Hi,

I have a pipeline running. I updated one file in a Delta table that was already processed. Now I am getting this error:

com.databricks.sql.transaction.tahoe.DeltaUnsupportedOperationException: Detected a data update. This is currently not supported. If you'd like to ignore updates, set the option 'ignoreChanges' to 'true'.

I have set ignoreChanges to true but am still getting the same error.

spark.readStream.format("delta")
  .option("ignoreChanges", "true")
  .load("/tmp/delta/user_events")

Regards,

Sanjay

8 REPLIES

karthik_p
Esteemed Contributor

@Sanjay Jain​ which runtime version are you using? The above should work if you have updated the required field and it is consumed by downstream consumers. Please check the article below:

https://docs.databricks.com/structured-streaming/delta-lake.html#ignore-updates-and-deletes

sanjay
Valued Contributor II

I am using 11.3 LTS. I updated one field, and this data is consumed by a downstream consumer. The challenge is that the consumer also receives the other, unchanged files along with the updated file.

Sandeep
Contributor III

@Sanjay Jain​ how did you update the file? Can you elaborate on the steps, please?

sanjay
Valued Contributor II

Using:

MERGE INTO table_a a
USING table_b b
WHEN MATCHED THEN UPDATE SET a.name = b.name

sanjay
Valued Contributor II

I can see that ignoreChanges = true emits all the updates, plus the unchanged files in the same partition. Per the documentation, duplicates need to be handled downstream. Can you suggest how to handle the duplicate files?

This is from the Databricks documentation (https://docs.databricks.com/structured-streaming/delta-lake.html#ignore-updates-and-deletes):

"The semantics for ignoreChanges differ greatly from skipChangeCommits. With ignoreChanges enabled, rewritten data files in the source table are re-emitted after a data changing operation such as UPDATE, MERGE INTO, DELETE (within partitions), or OVERWRITE. Unchanged rows are often emitted alongside new rows, so downstream consumers must be able to handle duplicates. Deletes are not propagated downstream. ignoreChanges subsumes ignoreDeletes."
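Since the documentation says downstream consumers must tolerate re-emitted unchanged rows, one common pattern is to deduplicate inside the consumer before applying the batch, keeping only rows whose key/version actually changed since the last delivery. Here is a minimal plain-Python sketch of that dedup logic (the `user_id` key and `last_updated` version field are hypothetical, not from the thread; in a real pipeline the same idea would live inside a Structured Streaming foreachBatch handler):

```python
def dedupe_batch(records, seen):
    """Drop rows already delivered downstream.

    records: list of dicts, each with a unique key ("user_id") and a
             version marker ("last_updated").
    seen:    dict mapping key -> last version emitted downstream.
    Rows re-emitted unchanged by ignoreChanges (same key, same version)
    are filtered out; only new or genuinely updated rows pass through.
    """
    fresh = []
    for rec in records:
        key, ver = rec["user_id"], rec["last_updated"]
        if seen.get(key) != ver:  # new key, or version changed
            seen[key] = ver
            fresh.append(rec)
    return fresh

# One real update plus two unchanged rows re-emitted from the
# same rewritten data file:
seen = {"u1": 1, "u2": 1, "u3": 1}
batch = [
    {"user_id": "u1", "last_updated": 2, "name": "new"},   # real update
    {"user_id": "u2", "last_updated": 1, "name": "same"},  # duplicate
    {"user_id": "u3", "last_updated": 1, "name": "same"},  # duplicate
]
print(dedupe_batch(batch, seen))  # only the u1 row survives
```

The same effect can be achieved statefully in Spark by writing each micro-batch with foreachBatch and a MERGE INTO the target keyed on the id, so reapplying an identical row is a no-op.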

Anonymous
Not applicable

Hi @Sanjay Jain​ 

Hope everything is going great.

Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can help you. 

Cheers!

sanjay
Valued Contributor II

Hi Vidula,

Sorry, I am still looking for a solution. I would appreciate any help you can provide.

Regards,

Sanjay

Sanjeev_Chauhan
New Contributor II

Hi Sanjay,
You can try adding .option("overwriteSchema", "true")
