Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Delete on a streaming table and startingVersion

6502
New Contributor III

I deleted some records from a streaming table by mistake and, of course, the streaming job stopped working.

So I restored the table to the version before the delete and attempted to restart the job with `startingVersion` set to the new version. On the first attempt I did not delete the checkpoint, and the job failed again. On the second attempt I deleted the checkpoint, but the job still did not start; somehow the code was still detecting the deleted rows. Can someone explain why this happened?

Deleting the checkpoint and not passing the startingVersion works, of course. But I see that the checkpoint file reports: 

 

{"sourceVersion":1,"reservoirId":"963e2797-2f22-449a-91c6-c3e3972e4ea5","reservoirVersion":1254,"index":8,"isStartingVersion":true}

 

Why does it say that isStartingVersion is true? Did it pick up the `startingVersion` I passed? If so, why did the job not start when `startingVersion` was provided?

 

1 REPLY

raphaelblg
Databricks Employee

Hello @6502,

It appears you've set the `startingVersion` option in your streaming query so that the stream begins processing from a version prior to the DELETE operation. Because the stream then processes every later commit in order, it still reaches the DELETE commit, which can cause the query to fail.

To resolve this issue, consider the following options:

1. Roll back your table to the version before the DELETE operation using time travel.

(https://docs.databricks.com/en/delta/history.html#restore-a-delta-table-to-an-earlier-state)

or

2. Add the `ignoreDeletes` or `skipChangeCommits` parameter to your query. You can find more information on this in the Databricks documentation.

(https://docs.databricks.com/en/structured-streaming/delta-lake.html#ignore-updates-and-deletes)
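Both options can be sketched in PySpark roughly as follows. This is only an illustration under assumptions: the table name `main.demo.events`, the version numbers, and the helper functions are hypothetical and not taken from your job. Per the documentation linked above, `ignoreDeletes` only covers deletes at partition boundaries, while `skipChangeCommits` skips any commit that deletes or modifies existing records, so the latter is used here.

```python
from typing import Dict, Optional


def restore_statement(table_name: str, version: int) -> str:
    """Option 1: build the SQL that restores a Delta table to an earlier version."""
    return f"RESTORE TABLE {table_name} TO VERSION AS OF {int(version)}"


def delta_stream_options(starting_version: Optional[int] = None,
                         skip_change_commits: bool = True) -> Dict[str, str]:
    """Option 2: build the reader options for a Delta streaming source."""
    opts: Dict[str, str] = {}
    if skip_change_commits:
        # Skip commits that delete or modify existing records.
        opts["skipChangeCommits"] = "true"
    if starting_version is not None:
        # Read changes starting from this table version (inclusive).
        opts["startingVersion"] = str(starting_version)
    return opts


def read_delta_stream(spark, table_name: str, options: Dict[str, str]):
    """Open the streaming read; `spark` is an existing SparkSession."""
    return spark.readStream.options(**options).table(table_name)


# Hypothetical usage inside a notebook or job where `spark` is defined:
#   spark.sql(restore_statement("main.demo.events", 100))
#   df = read_delta_stream(spark, "main.demo.events", delta_stream_options(101))
```

Source options such as `startingVersion` only take effect when a query starts from a fresh checkpoint; a query that resumes from an existing checkpoint continues from the offset stored there.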

 

Should you have any questions or concerns, please don't hesitate to respond to this message. I'm here to help!

Best regards,

Raphael Balogo
Sr. Technical Solutions Engineer
Databricks
