Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Delete on streaming table and starting startingVersion

6502
New Contributor III

I deleted some records from a streaming table by mistake and, as expected, the streaming job stopped working.

So I restored the table to the version before the delete, and tried to restart the job with startingVersion set to the new version. On the first attempt I did not delete the checkpoint, and the job failed again. On the second attempt I deleted the checkpoint, and the job still did not start; somehow the code was still detecting the deleted rows. Can someone explain why this happened?

Deleting the checkpoint and not passing the startingVersion works, of course. But I see that the checkpoint file reports: 

 

{"sourceVersion":1,"reservoirId":"963e2797-2f22-449a-91c6-c3e3972e4ea5","reservoirVersion":1254,"index":8,"isStartingVersion":true}

 

Why does it say that isStartingVersion is true? Did it pick up the startingVersion I passed? If so, why did the job not start when startingVersion was provided?
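
For readers following along, the offset line quoted above is plain JSON and can be inspected directly. The sketch below only parses the values from this post; `describe_offset` is a hypothetical helper, not part of any Delta or Spark API. The reading of `isStartingVersion` as "the stream is still reading the table's initial snapshot" (rather than "the `startingVersion` option was applied") is based on how the open-source Delta source describes that field, and should be treated as an assumption here.

```python
import json

# Offset line copied from the checkpoint, as quoted in the post.
offset_line = (
    '{"sourceVersion":1,"reservoirId":"963e2797-2f22-449a-91c6-c3e3972e4ea5",'
    '"reservoirVersion":1254,"index":8,"isStartingVersion":true}'
)
offset = json.loads(offset_line)


def describe_offset(offset: dict) -> str:
    """Summarize a Delta source offset (hypothetical helper for illustration)."""
    # Assumption: isStartingVersion marks the initial-snapshot phase of the stream,
    # not whether the startingVersion reader option was supplied.
    phase = "initial snapshot" if offset["isStartingVersion"] else "incremental changes"
    return (
        f"table version {offset['reservoirVersion']}, "
        f"file index {offset['index']}, reading {phase}"
    )


print(describe_offset(offset))
```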

 

1 REPLY

raphaelblg
Honored Contributor

Hello @6502,

It appears you've used the `startingVersion` option in your streaming query, which makes the stream begin processing from a version before the DELETE operation. The DELETE commit is then still processed in order, which can cause the query to fail.

To resolve this issue, consider the following options:

1. Roll back your table version to the version before the DELETE operation using time travel.

(https://docs.databricks.com/en/delta/history.html#restore-a-delta-table-to-an-earlier-state)

or


2. Add the `ignoreDeletes` or `skipChangeCommits` parameter to your query. You can find more information on this in the Databricks documentation.

(https://docs.databricks.com/en/structured-streaming/delta-lake.html#ignore-updates-and-deletes)
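
The two options above can be sketched in PySpark as follows. This is a minimal sketch under stated assumptions, not a definitive fix: the table names, checkpoint path, and the helper names `restore_statement`, `delta_stream_options`, and `start_stream` are hypothetical, and an active `spark` session is assumed. Per the Delta streaming documentation, `startingVersion` only takes effect when a query starts with a fresh checkpoint; once progress is recorded in the checkpoint, the option is ignored.

```python
from typing import Dict, Optional


def restore_statement(table: str, version: int) -> str:
    """Option 1: SQL that rolls the table back to an earlier version."""
    return f"RESTORE TABLE {table} TO VERSION AS OF {version}"


def delta_stream_options(
    skip_change_commits: bool = True,
    starting_version: Optional[int] = None,
) -> Dict[str, str]:
    """Option 2: reader options for a Delta streaming source."""
    options: Dict[str, str] = {}
    if skip_change_commits:
        # Skip commits that delete or rewrite existing rows instead of failing on them.
        options["skipChangeCommits"] = "true"
    if starting_version is not None:
        # Only honored when the query starts with a new (empty) checkpoint.
        options["startingVersion"] = str(starting_version)
    return options


def start_stream(spark, source_table: str, target_table: str,
                 checkpoint_path: str, **option_kwargs):
    """Start the stream with the options above (all names are placeholders)."""
    reader = spark.readStream.options(**delta_stream_options(**option_kwargs))
    return (
        reader.table(source_table)
        .writeStream.option("checkpointLocation", checkpoint_path)
        .toTable(target_table)
    )
```

With a live session, usage might look like `spark.sql(restore_statement("my_catalog.my_schema.events", 1253))` for the rollback, or `start_stream(spark, "my_catalog.my_schema.events", "my_catalog.my_schema.events_out", "/tmp/checkpoints/events")` for the streaming read; the version number and names here are illustrative only.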

 

Should you have any questions or concerns, please don't hesitate to respond to this message. I'm here to help!

Best regards,

Raphael Balogo
Sr. Technical Solutions Engineer
Databricks