
Dealing with updates to a delta table being used as a streaming source

Confused
New Contributor III

Hi All

I have a requirement to perform updates on a delta table that is the source for a streaming query.

I would like to be able to update the table and have the stream continue to work while also not ending up with duplicates.

From my research it seems that the ignoreDeletes option will not work as I am not going to be updating/deleting based on the partition column. The ignoreChanges option also looks unsuitable as it will generate duplicates of not only the rows I update, but also any other rows in the same files.
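To make the duplicate concern concrete, here is a toy model in plain Python, not Delta itself. It assumes copy-on-write behaviour: an update rewrites every data file that contains an affected row, and a stream reading with ignoreChanges re-emits all rows of each rewritten file. The lists of dicts and the `update_row` helper are hypothetical stand-ins for data files and an UPDATE statement.

```python
# Toy model only: each inner list stands in for one Delta data file.

def update_row(files, row_id, new_value):
    """Rewrite every file containing row_id, as a copy-on-write UPDATE would.

    Returns (all_files_after_update, files_that_were_rewritten).
    """
    new_files, rewritten = [], []
    for f in files:
        if any(r["id"] == row_id for r in f):
            # The whole file is rewritten, including rows that did not change.
            new_f = [
                {**r, "value": new_value} if r["id"] == row_id else dict(r)
                for r in f
            ]
            new_files.append(new_f)
            rewritten.append(new_f)
        else:
            new_files.append(f)
    return new_files, rewritten


files = [
    [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}],  # file 1
    [{"id": 3, "value": "c"}],                           # file 2
]

# The stream first emits every existing row once.
emitted = [r for f in files for r in f]

# Update only row 1; file 1 is rewritten as a whole.
files, rewritten = update_row(files, 1, "a2")

# With ignoreChanges, all rows of the rewritten file flow downstream again.
emitted += [r for f in rewritten for r in f]

ids = [r["id"] for r in emitted]
print(ids)  # [1, 2, 3, 1, 2]
```

Row 2 was never updated, yet it reaches the consumer twice because it shares a file with row 1. Row 3 sits in an untouched file and is emitted only once.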

Does anyone have any suggestions or procedures they've used for a similar case in the past?

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions

Manjunath
New Contributor III

Hi @Leszek

In your case the ignoreChanges option will work, but your streaming application needs to handle the resulting duplicates when it writes to the sink. If your sink is a Delta table, you can use a streaming merge, i.e. run a Delta MERGE on each micro-batch.

https://docs.databricks.com/_static/notebooks/merge-in-streaming.html
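A minimal sketch of that approach in PySpark, assuming a Delta sink and a set of key columns that uniquely identify a row. The paths, the key column names, and the helpers `merge_condition`, `make_upsert` and `start_stream` are hypothetical names for illustration; they are not taken from the linked notebook. It also assumes PySpark 3.3 or later, where `DataFrame.sparkSession` is available.

```python
def merge_condition(key_columns, target_alias="t", source_alias="s"):
    """Build the MERGE join condition from the key column names."""
    return " AND ".join(
        f"{target_alias}.{c} = {source_alias}.{c}" for c in key_columns
    )


def make_upsert(target_path, key_columns):
    """Return a function that can be passed to DataStreamWriter.foreachBatch."""
    condition = merge_condition(key_columns)

    def upsert_batch(micro_batch_df, batch_id):
        # Imported lazily so this sketch can be loaded without Spark installed.
        from delta.tables import DeltaTable

        # MERGE fails when several source rows match the same target row,
        # so collapse repeats of a key within the micro-batch first.
        # dropDuplicates keeps an arbitrary row per key; if ordering matters,
        # pick the latest row by a timestamp or version column instead.
        deduped = micro_batch_df.dropDuplicates(key_columns)

        target = DeltaTable.forPath(micro_batch_df.sparkSession, target_path)
        (target.alias("t")
            .merge(deduped.alias("s"), condition)
            .whenMatchedUpdateAll()      # re-emitted rows overwrite themselves
            .whenNotMatchedInsertAll()   # genuinely new rows are inserted
            .execute())

    return upsert_batch


def start_stream(spark, source_path, target_path, checkpoint_path, key_columns):
    """Read a Delta source with ignoreChanges and upsert into a Delta sink."""
    return (spark.readStream.format("delta")
        .option("ignoreChanges", "true")
        .load(source_path)
        .writeStream
        .foreachBatch(make_upsert(target_path, key_columns))
        .option("checkpointLocation", checkpoint_path)
        .start())
```

Because matched rows are updated rather than appended, rows that are re-emitted unchanged after a file rewrite leave the sink as it was, and updated rows replace their previous version. Calling `start_stream(spark, "/path/to/source", "/path/to/sink", "/path/to/checkpoint", ["id"])` would start the query, with all three paths and the `id` key being placeholders.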


5 REPLIES

Kaniz
Community Manager

Hi @Mathew Walters! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer first; otherwise I will get back to you soon. Thanks.

Kaniz
Community Manager

Hi @Mathew Walters, Delta Lake supports several statements for deleting data from and updating data in Delta tables.

Please go through the guide:

https://docs.databricks.com/delta/delta-update.html

Leszek
Contributor

Maybe merging the data from the updated Delta table into the next Delta table in the stream would work?

https://www.youtube.com/watch?v=2Iy5S0Hf4XM


Anonymous
Not applicable

Hey @Mathew Walters

Hope you are doing great.

Just wanted to check in: were you able to resolve your issue, and if so, would you be happy to share the solution? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!
