08-08-2022 08:14 PM
I am running a Delta Live Tables pipeline that explodes JSON docs into small Delta Live Tables. The docs can receive multiple updates over the lifecycle of the transaction, and I am curating the data via a medallion architecture. When I run an API /update with
{"full_refresh": "true"}, it resets the checkpoints and runs fine, but when I try to perform an INCREMENTAL update I get the following error:
org.apache.spark.sql.streaming.StreamingQueryException: Query dlt_fulfillment_tickets [id = 7c256d93-6271-4013-9d5d-fe356c18511f, runId = 1aba276e-1118-43c1-b1fa-85e688bf523b] terminated with exception: Detected a data update (for example part-00000-ba0db042-39f9-450b-ad19-3f05afb52830-c000.snappy.parquet) in the source table at version 10. This is currently not supported. If you'd like to ignore updates, set the option 'ignoreChanges' to 'true'. If you would like the data update to be reflected, please restart this query with a fresh checkpoint directory.
Is there a way to set the above option via SQL? My entire pipeline is in SQL.
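For reference, this is roughly how I trigger the updates via the REST API (a minimal Python sketch, assuming the standard Pipelines "start an update" endpoint; the host, token, and pipeline ID are placeholders):

import requests

# Placeholders - substitute your own workspace URL, token, and pipeline ID.
HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"
PIPELINE_ID = "<pipeline-id>"

# POST /api/2.0/pipelines/{pipeline_id}/updates starts a pipeline update.
# full_refresh=True resets the checkpoints and reprocesses everything;
# full_refresh=False runs incrementally, which is where the error appears.
resp = requests.post(
    f"{HOST}/api/2.0/pipelines/{PIPELINE_ID}/updates",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"full_refresh": False},  # True for the full refresh that works
)
resp.raise_for_status()
print(resp.json())  # the response identifies the update that was started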
08-09-2022 01:56 AM
As per the docs there is Scala code to do this, but I can't find a SQL alternative. However, I have raised this with the product team to see if we can get SQL support for the same. I shall let you know once I get a response from them.
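For reference, the documented snippet sets the option on the streaming reader. A Python equivalent would look roughly like this (a sketch; "source_table" is a placeholder, and spark is the ambient SparkSession in a notebook):

# ignoreChanges tells the Delta streaming source not to fail on updates or
# deletes in the source table. Rewritten files are re-emitted, so anything
# downstream must be able to tolerate duplicate records.
df = (
    spark.readStream
    .option("ignoreChanges", "true")
    .table("source_table")  # placeholder: the source Delta table
)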
08-09-2022 07:27 AM
@Danny Aguirre I had a discussion with the product team, and they mentioned that streaming only supports processing append-only changes. If you expect updates to the input, you should use normal (non-streaming) live tables instead. The error message is not appropriate for this issue, and they will be fixing it so that customers are not pointed in the wrong direction when investigating.
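To illustrate the distinction, here is a minimal Python sketch with placeholder table names: dlt.read_stream consumes append-only changes from the source, while dlt.read recomputes the table from the full input on every pipeline update.

import dlt

# Streaming live table: consumes append-only changes, so an update or
# delete in bronze_docs raises the "Detected a data update" error above.
@dlt.table
def silver_tickets_streaming():
    return dlt.read_stream("bronze_docs")  # placeholder upstream table

# Normal live table: fully recomputed on each pipeline update, so updated
# rows in bronze_docs are handled without error.
@dlt.table
def silver_tickets():
    return dlt.read("bronze_docs")

In SQL, the same distinction is between CREATE STREAMING LIVE TABLE ... AS SELECT ... FROM STREAM(live.bronze_docs) and CREATE LIVE TABLE ... AS SELECT ... FROM live.bronze_docs.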
09-07-2022 05:58 AM
Hey there @Danny Aguirre
Does @Prabakar Ammeappin's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly?
We'd love to hear from you.
Thanks!
09-07-2022 06:10 AM
Hi @Vidula Khanna - The response was not a solution for my issue; it was more an acknowledgement that there is (or was) a gap in the documentation, as the error was not pointing customers to the correct path to solve this issue. Hopefully this has been taken care of by Databricks.
I had to refactor some SQL code to find a workaround.
Thanks for the follow-up.