Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

Spark Streaming: Checkpoint corrupted

mriccardi
New Contributor II

Hi Everyone!

Today four streaming jobs suddenly started failing with: StreamingQueryException: [STREAM_FAILED] Query [id = ####, runId = ####] terminated with exception: dbfs:/mnt/path/my_table/sources/0/0 doesn't exist (latestId: 8, compactInterval: 10).

  • These streams have been running for more than a year.
  • The only change we made was adding one more column to the schema in March.
  • The streams read Parquet data from S3 and run once daily.
  • To keep track of the files already loaded, we have a checkpoint path defined for each table.

What we found:

  • In the path sources/0, the file 0 does not exist.
  • We do find the file 711, which was created on May 23.
  • For some reason, on May 24 the stream failed to pick up the latest batchId state and restarted the batchId at 0. It also stopped writing files to the sources, offsets, and commits folders of the checkpoint location.
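To double-check findings like the ones above, a small script can report the highest batch ID present in each checkpoint folder. This is only a minimal sketch: it assumes the checkpoint is reachable as a local path (on Databricks, `dbfs:/mnt/...` is usually visible as `/dbfs/mnt/...`), and the function name `latest_batch_ids` is made up for illustration.

```python
import re
from pathlib import Path

# Log files are named "<batchId>" or "<batchId>.compact"; anything else
# (hidden ".crc" files, temp files, the "metadata" file) is ignored.
_BATCH_FILE = re.compile(r"^(\d+)(\.compact)?$")


def latest_batch_ids(checkpoint_dir):
    """Return the highest batch ID found in sources/0, offsets and commits.

    A folder that is missing or holds no batch files maps to None.
    """
    result = {}
    for sub in ("sources/0", "offsets", "commits"):
        folder = Path(checkpoint_dir) / sub
        ids = []
        if folder.is_dir():
            for entry in folder.iterdir():
                match = _BATCH_FILE.match(entry.name)
                if match:
                    ids.append(int(match.group(1)))
        result[sub] = max(ids) if ids else None
    return result


if __name__ == "__main__":
    # Hypothetical local view of the checkpoint path from the error message.
    print(latest_batch_ids("/dbfs/mnt/path/my_table"))
```

If the three folders report different highest IDs, or the IDs stop on the day the jobs last succeeded, that narrows down when the checkpoint stopped being written.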

Root cause:

  • As I understand it, Spark Streaming somehow lost the last state of the checkpoint and also stopped writing to the checkpoint.
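The `latestId` and `compactInterval` values in the error message hint at why Spark asks for file 0. The sketch below mirrors, as far as I can tell from the open-source Spark code (`CompactibleFileStreamLog`), how the file source log decides which batch files it has to read; treat it as an approximation of that logic, not the actual implementation. The function names are made up for illustration.

```python
def is_compaction_batch(batch_id, compact_interval):
    """A batch is written as "<batchId>.compact" when (id + 1) is a multiple of the interval."""
    return (batch_id + 1) % compact_interval == 0


def batches_needed(latest_id, compact_interval):
    """Batch IDs that must be readable to rebuild the source log up to latest_id.

    Reading starts at the most recent compaction batch (which already contains
    every earlier entry) or at batch 0 if no compaction has happened yet.
    """
    start = max(0, (latest_id + 1) // compact_interval * compact_interval - 1)
    return list(range(start, latest_id + 1))


if __name__ == "__main__":
    print(batches_needed(8, 10))    # values from the error message
    print(batches_needed(711, 10))  # the last file found in sources/0
```

With `latestId: 8` and `compactInterval: 10` no compaction batch exists yet, so the reader needs batches 0 through 8, which is consistent with the complaint about `sources/0/0`. For a log whose latest batch is 711, only `709.compact`, `710`, and `711` would be needed.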

Has anyone experienced something like this? How did you manage to recover without processing all the files again?
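One possible way to avoid reprocessing everything is to work out which files the old checkpoint had already recorded, then load only the remaining ones under a fresh checkpoint. The sketch below assumes the file source log format I have seen in practice: a version line such as `v1` followed by one JSON object per line with a `path` field. That format is an assumption, not a documented contract, and the function names are hypothetical.

```python
import json


def loaded_paths(log_file):
    """Collect the "path" values recorded in one sources/0 log file."""
    paths = set()
    with open(log_file, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            # Skip the version header (e.g. "v1") and blank lines.
            if not line.startswith("{"):
                continue
            paths.add(json.loads(line)["path"])
    return paths


def files_still_to_load(all_files, log_files):
    """Return the files from all_files that appear in none of the given logs."""
    seen = set()
    for log_file in log_files:
        seen |= loaded_paths(log_file)
    return sorted(set(all_files) - seen)
```

In this setup, `log_files` would be the surviving files in `sources/0` (for example the last `.compact` file plus the batches after it), and `all_files` would come from a listing of the S3 prefix. Paths in the log may be URI-encoded, so both sides should be normalized the same way before comparing.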

Thanks in advance!

1 ACCEPTED SOLUTION


Kaniz
Community Manager

Hi @Martin Riccardi, make sure you are using the latest stable version of Apache Spark™. Checkpoint-related issues are sometimes fixed in newer releases, so upgrading to a more recent version of Spark might resolve the problem you're facing.


2 REPLIES

Vartika
Moderator

Hi @Martin Riccardi,

We haven't heard from you since the last response from @Kaniz Fatma, and I was checking back to see if her suggestions helped you.

Otherwise, if you have found a solution, please share it with the community, as it may be helpful to others.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.

Thanks!
