03-30-2023 05:02 AM
Hi,
I have a pipeline running. I updated one file in a Delta table that was already processed, and now I am getting this error:
com.databricks.sql.transaction.tahoe.DeltaUnsupportedOperationException: Detected a data update. This is currently not supported. If you'd like to ignore updates, set the option 'ignoreChanges' to 'true'.
I have set ignoreChanges to true but still get the same error:
spark.readStream.format("delta")
  .option("ignoreChanges", "true")
  .load("/tmp/delta/user_events")
Regards,
Sanjay
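For reference, a fuller sketch of how this read option is typically wired into a complete stream. The checkpoint location and output path below are hypothetical placeholders, not paths from the original post:

```python
# Sketch of the full streaming pipeline, assuming Databricks/PySpark with
# Delta Lake. All paths here are hypothetical placeholders.
def start_user_events_stream(spark):
    """Read the Delta table as a stream, keeping rewritten files from
    UPDATE/MERGE/DELETE, and write the results to a downstream location."""
    return (
        spark.readStream.format("delta")
        .option("ignoreChanges", "true")  # rewritten files are re-emitted; dedupe downstream
        .load("/tmp/delta/user_events")
        .writeStream.format("delta")
        .option("checkpointLocation", "/tmp/delta/user_events_checkpoint")  # hypothetical
        .start("/tmp/delta/user_events_out")  # hypothetical output path
    )
```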
03-30-2023 05:16 AM
@Sanjay Jain, which runtime version are you using? The above should work if you have updated the required field and it is consumed by downstream consumers. Please check the article below:
https://docs.databricks.com/structured-streaming/delta-lake.html#ignore-updates-and-deletes
03-30-2023 05:20 AM
I am using 11.3 LTS. I updated one field, and this data is consumed by a downstream consumer. The challenge is that the consumer also receives the other, unchanged files along with the updated one.
03-30-2023 06:50 AM
@Sanjay Jain how did you update the file? Can you elaborate on the steps, please?
03-30-2023 06:57 AM
Using a merge:
MERGE INTO table a
USING table b
WHEN MATCHED THEN UPDATE SET a.name = b.name
03-31-2023 05:33 AM
I can see that ignoreChanges true emits all the updates but also re-emits the un-updated files in the same partition. According to the documentation, duplicates need to be handled downstream. Can you suggest how to handle the duplicate files?
This is from the Databricks documentation (https://docs.databricks.com/structured-streaming/delta-lake.html#ignore-updates-and-deletes):
"The semantics for ignoreChanges differ greatly from skipChangeCommits. With ignoreChanges enabled, rewritten data files in the source table are re-emitted after a data changing operation such as UPDATE, MERGE INTO, DELETE (within partitions), or OVERWRITE. Unchanged rows are often emitted alongside new rows, so downstream consumers must be able to handle duplicates. Deletes are not propagated downstream. ignoreChanges subsumes ignoreDeletes."
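One common way to absorb those re-emitted duplicate rows, sketched below under the assumption that each row has a unique key column (the key `user_id` and the target path are hypothetical), is to write with foreachBatch and MERGE each micro-batch into the target, so unchanged rows upsert onto themselves instead of duplicating:

```python
# Sketch of deduplicating re-emitted rows with foreachBatch + MERGE.
# Assumes Delta Lake on Databricks; the key column `user_id` and the
# target path are hypothetical placeholders.
def upsert_batch(batch_df, batch_id):
    from delta.tables import DeltaTable  # imported here to keep the sketch self-contained

    spark = batch_df.sparkSession
    target = DeltaTable.forPath(spark, "/tmp/delta/user_events_target")  # hypothetical

    # Drop duplicates within the micro-batch itself, then upsert by key:
    # re-emitted unchanged rows simply overwrite their existing copies.
    (
        target.alias("t")
        .merge(batch_df.dropDuplicates(["user_id"]).alias("s"),
               "t.user_id = s.user_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

# Wiring it into the stream (hypothetical checkpoint path):
# (spark.readStream.format("delta")
#   .option("ignoreChanges", "true")
#   .load("/tmp/delta/user_events")
#   .writeStream.foreachBatch(upsert_batch)
#   .option("checkpointLocation", "/tmp/delta/_dedup_checkpoint")
#   .start())
```

Because the merge is idempotent with respect to unchanged rows, the duplicates that ignoreChanges re-emits never reach the final table twice.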
03-31-2023 07:19 PM
Hi @Sanjay Jain
Hope everything is going great.
Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can help you.
Cheers!
04-03-2023 12:01 AM
Hi Vidula,
Sorry, I am still looking for a solution. I would appreciate any help you can provide.
Regards,
Sanjay
01-01-2024 12:10 PM
Hi Sanjay,
You can try adding .option("overwriteSchema", "true")