
Spark Streaming: Checkpoint corrupted

mriccardi
New Contributor II

Hi Everyone!

Today, 4 streaming jobs started failing out of nowhere with: StreamingQueryException: [STREAM_FAILED] Query [id = ####, runId = ####] terminated with exception: dbfs:/mnt/path/my_table/sources/0/0 doesn't exist (latestId: 8, compactInterval: 10).

  • These streams have been running for over a year.
  • The only change we made was adding one more column to the schema in March.
  • These streams read Parquet data from S3 and run once daily.
  • To keep track of the files already loaded, we have a checkpoint path defined for each table.

What we found:

  • When I go to the path sources/0, the file 0 does not exist.
  • We do find the file 711, which was created on May 23.
  • For some reason, on May 24 the stream failed to read the latest batchId state and restarted the batchId at 0. It also stopped writing files to the sources, offsets, and commits folders of the checkpoint location.
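For context on why the error asks for file 0: the file source log is compacted every compactInterval batches, and a reader needs every file from the most recent compaction batch up to latestId. The sketch below reproduces that arithmetic as I understand it from Spark's CompactibleFileStreamLog; the function names are hypothetical and this is not Spark's own code. With latestId 8 and compactInterval 10 no compaction has happened yet, so files 0 through 8 are all required.

```python
def is_compaction_batch(batch_id, compact_interval):
    # Assumption: a batch is a compaction batch when (batch_id + 1)
    # is a multiple of the interval, e.g. 9, 19, 29 for interval 10.
    return (batch_id + 1) % compact_interval == 0


def required_log_files(latest_id, compact_interval):
    """List the log file names needed to rebuild state up to latest_id."""
    # Start at the most recent compaction batch, or at 0 if none exists yet.
    start = max(0, (latest_id + 1) // compact_interval * compact_interval - 1)
    names = []
    for batch_id in range(start, latest_id + 1):
        if is_compaction_batch(batch_id, compact_interval):
            names.append(f"{batch_id}.compact")
        else:
            names.append(str(batch_id))
    return names


if __name__ == "__main__":
    # Values taken from the error message: latestId 8, compactInterval 10.
    print(required_log_files(8, 10))
    # Values for the last good state reported in the post: file 711.
    print(required_log_files(711, 10))
```

Under the same assumptions, a log whose latest entry is 711 would only need 709.compact, 710, and 711, which is consistent with file 0 having been cleaned up long ago and only becoming a problem once the batchId restarted.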

Root cause:

  • My understanding is that, for some reason, Spark Streaming lost the last checkpoint state and also stopped writing to the checkpoint.
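One way to check this is to compare the highest batch ID present in each checkpoint folder. This is a minimal sketch, assuming the checkpoint is reachable as a regular directory (on Databricks, possibly through the /dbfs mount); the helper name latest_batch_ids is hypothetical.

```python
from pathlib import Path


def latest_batch_ids(checkpoint_dir):
    """Return the highest numeric batch ID found in each checkpoint log folder."""
    result = {}
    for sub in ("sources/0", "offsets", "commits"):
        folder = Path(checkpoint_dir) / sub
        ids = []
        if folder.is_dir():
            for entry in folder.iterdir():
                # "711" and "709.compact" both count; hidden files such as
                # ".711.crc" have an empty first component and are skipped.
                stem = entry.name.split(".")[0]
                if stem.isdigit():
                    ids.append(int(stem))
        result[sub] = max(ids) if ids else None
    return result
```

For example, latest_batch_ids("/dbfs/mnt/path/my_table") would return a dictionary keyed by folder. If the three values disagree, or all stop at the same ID while the job kept running, that narrows down when the checkpoint stopped being updated.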

Has anyone experienced something like this? How did you manage to recover without processing all the files again?

Thanks in advance!

Accepted Solutions

Kaniz_Fatma
Community Manager

Hi @Martin Riccardi​, Ensure you are using the latest stable version of Apache Spark™. Sometimes, checkpoint-related issues are addressed and fixed in newer releases. Upgrading to a more recent version of Spark might resolve the problem you're facing.

Replies

Vartika
Moderator

Hi @Martin Riccardi​,

We haven't heard from you since the last response from @Kaniz Fatma, and I was checking back to see if her suggestions helped you.

Otherwise, if you have found a solution, please share it with the community, as it can be helpful to others.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.

Thanks!
