08-08-2022 08:14 PM
I am running a Delta Live Tables pipeline that explodes JSON docs into small Delta Live Tables. The docs can receive multiple updates over the lifecycle of the transaction. I am curating the data via a medallion architecture. When I call the API /update with
{"full_refresh":"true"}
it resets checkpoints and runs fine, but when I try to perform an incremental update I get the following error:
org.apache.spark.sql.streaming.StreamingQueryException: Query dlt_fulfillment_tickets [id = 7c256d93-6271-4013-9d5d-fe356c18511f, runId = 1aba276e-1118-43c1-b1fa-85e688bf523b] terminated with exception: Detected a data update (for example part-00000-ba0db042-39f9-450b-ad19-3f05afb52830-c000.snappy.parquet) in the source table at version 10. This is currently not supported. If you'd like to ignore updates, set the option 'ignoreChanges' to 'true'. If you would like the data update to be reflected, please restart this query with a fresh checkpoint directory.
Is there a way to set the above option via SQL? My entire pipeline is in SQL.
08-09-2022 01:56 AM
As per the docs, there is Scala code to do this, but I couldn't find a SQL alternative. I have raised this with the product team to see whether a SQL equivalent can be provided, and I will let you know once I get a response from them.
08-09-2022 07:27 AM
@Danny Aguirre I had a discussion with the product team, and they mentioned that streaming only supports processing append-only changes. If you expect updates to the input, you should use normal (non-streaming) live tables. The error message is not appropriate for this issue, and they will be fixing it so that customers are not pointed in the wrong direction when investigating.
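For anyone reading along, here is a rough sketch of what "use normal live tables" could look like in a SQL pipeline. The table names (bronze_tickets, silver_tickets) are hypothetical and not taken from the original pipeline; the idea is only to drop the STREAMING keyword and the STREAM() wrapper on the table whose source receives updates.

```sql
-- Streaming variant: expects an append-only source. If the source
-- table (hypothetical name: bronze_tickets) is updated in place,
-- an incremental run can fail with "Detected a data update ...".
CREATE OR REFRESH STREAMING LIVE TABLE silver_tickets
AS SELECT *
FROM STREAM(LIVE.bronze_tickets);

-- Non-streaming variant of the same table: reads the current state
-- of the source on each pipeline update, so updated rows upstream
-- are simply picked up. Use one definition or the other, not both.
CREATE OR REFRESH LIVE TABLE silver_tickets
AS SELECT *
FROM LIVE.bronze_tickets;
```

The trade-off, as far as I understand it, is that the non-streaming table is recomputed from its source on each update instead of processing only new rows, so it may cost more on large inputs.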
09-07-2022 05:58 AM
Hey there @Danny Aguirre
Does @Prabakar Ammeappin's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly?
We'd love to hear from you.
Thanks!
09-07-2022 06:10 AM
Hi @Vidula Khanna - The response was not a solution for my issue; it was more an acknowledgement that there is/was a gap in the documentation, as the error was not pointing customers to the correct path to solve this issue. Hopefully this has been taken care of by Databricks.
I had to refactor some SQL code to find a workaround.
Thanks for the follow-up.