
Spark Streaming: Checkpoint corrupted

mriccardi
New Contributor II

Hi Everyone!

Today 4 streaming jobs suddenly started failing with the following error: StreamingQueryException: [STREAM_FAILED] Query [id = ####, runId = ####] terminated with exception: dbfs:/mnt/path/my_table/sources/0/0 doesn't exist (latestId: 8, compactInterval: 10).

  • These streams have been running for over a year.
  • The only change we made was adding one more column to the schema in March.
  • The streams read Parquet data from S3 and run once daily.
  • To keep track of the files already loaded, we have a checkpoint path defined for each table.

What we found:

  • In the sources/0 path, the file 0 does not exist.
  • We do find the file 711, which was created on May 23.
  • For some reason, on May 24 the stream failed to get the latest batchId state and restarted the batchId at 0. It also stopped writing files to the sources, offsets, and commits folders of the checkpoint location.
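
To make the error message easier to read, here is a minimal sketch of which batch files the log appears to need for a given latestId and compactInterval. It is based on my understanding of how Spark's compacting metadata log picks files (everything since the last `.compact` batch, where a batch is compacted when `(batchId + 1) % compactInterval == 0`); treat that rule as an assumption and check it against your Spark version. The function names are hypothetical helpers, not part of any Spark or Databricks API. The file names would come from listing the sources/0 folder yourself (on Databricks, for example, from the `name` field of `dbutils.fs.ls` results).

```python
import re

# Matches log file names such as "711" or "709.compact";
# hidden files like ".711.crc" or temp files are ignored.
_BATCH_FILE = re.compile(r"^(\d+)(\.compact)?$")


def parse_batch_ids(file_names):
    """Return the sorted batch ids found among the given file names."""
    ids = set()
    for name in file_names:
        match = _BATCH_FILE.match(name)
        if match:
            ids.add(int(match.group(1)))
    return sorted(ids)


def required_batch_ids(latest_id, compact_interval):
    """Batch ids that must be readable to rebuild the full file list.

    Assumption: the log reads from the most recent compaction batch
    (or from batch 0 if no compaction has happened yet) up to latest_id.
    """
    start = max(0, (latest_id + 1) // compact_interval * compact_interval - 1)
    return list(range(start, latest_id + 1))


def missing_batch_ids(file_names, latest_id, compact_interval):
    """Required batch ids that are not present in the directory listing."""
    present = set(parse_batch_ids(file_names))
    return [b for b in required_batch_ids(latest_id, compact_interval)
            if b not in present]
```

Under that assumption, latestId 8 with compactInterval 10 means no compaction has happened yet in the restarted sequence, so batches 0 through 8 are all needed, and a missing file 0 is enough to fail the query. With latestId 711, only 709 (the compacted batch), 710, and 711 would be needed.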

Root cause:

  • As I understand it, for some reason Spark Streaming lost the last state of the checkpoint and also stopped writing to the checkpoint.

Has anyone experienced something like this? How did you manage to recover without processing all the files again?

Thanks in advance!

1 ACCEPTED SOLUTION

Kaniz
Community Manager

Hi @Martin Riccardi, make sure you are using the latest stable version of Apache Spark™. Checkpoint-related issues are sometimes fixed in newer releases, so upgrading to a more recent version of Spark might resolve the problem you're facing.


2 REPLIES

Vartika
Moderator

Hi @Martin Riccardi,

We haven't heard from you since the last response from @Kaniz Fatma, and I was checking back to see if her suggestions helped you.

Otherwise, if you have found a solution, please share it with the community, as it can be helpful to others.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.

Thanks!
