Data Engineering

Has anyone else seen state files disappear in low-volume delta tables?

JordanYaker
Contributor

I have some Delta tables in our dev environment that started popping up with the following error today:

py4j.protocol.Py4JJavaError: An error occurred while calling o670.execute.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 104 in stage 1145.0 failed 4 times, most recent failure: Lost task 104.3 in stage 1145.0 (TID 2949) (10.111.21.215 executor 1): java.lang.IllegalStateException: Error reading streaming state file of HDFSStateStoreProvider[id = (op=0,part=104),dir = s3a://###################/offers-stage-1/checkpoints/offers-silver-stage1-pipeline/state/0/104]: s3a://###################/offers-stage-1/checkpoints/offers-silver-stage1-pipeline/state/0/104/1.delta does not exist. If the stream job is restarted with a new or updated state operation, please create a new checkpoint location or clear the existing checkpoint location.

These tables don't have a particularly high write volume, and just two weeks ago I reset the entire data lake in our dev/stage environment to deploy some new logic. That two-week gap happens to line up with our current vacuum retention policy (i.e., 14 days).

This feels like more than a coincidence.

Is there a known issue with using vacuum on tables without a high write volume?
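
A minimal sketch of the kind of notebook checks this involves, with placeholder paths standing in for the redacted bucket/table names above; the VACUUM statement is a dry run only, so nothing gets deleted:

# Placeholder paths -- the real bucket/table names are redacted above.
state_dir = "s3a://<bucket>/offers-stage-1/checkpoints/offers-silver-stage1-pipeline/state/0/104"
table = "delta.`s3a://<bucket>/offers-stage-1/offers-silver`"

# 1. Confirm the state file really is gone (rules out a listing/permissions issue).
display(dbutils.fs.ls(state_dir))  # per the error, 1.delta should be listed here

# 2. Check the table's retention settings; VACUUM honours
#    delta.deletedFileRetentionDuration (defaults to 7 days).
spark.sql(f"SHOW TBLPROPERTIES {table}").show(truncate=False)

# 3. See exactly which files VACUUM would remove right now, without deleting anything.
spark.sql(f"VACUUM {table} DRY RUN").show(truncate=False)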

3 REPLIES

@Kaniz Fatma​

  1. The file is indeed gone. Our permissions have not changed and everything is appropriate.
  2. The checkpoint locations have not changed and are still accessible with the proper permissions as I mentioned in item 1.
  3. Clearing the existing checkpoint locations is the only thing that works. However, this isn't an acceptable long-term strategy, because it means each of these pipelines has to be reprocessed and I'll be forever chasing my tail, deleting checkpoints whenever issues appear.
  4. I'm already managing the checkpoint locations manually in S3 (see the sketch just after this list).
  5. I haven't manipulated the state provider configuration. It's all the default values.
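
To make items 4 and 5 concrete, here's a rough sketch with placeholder paths (not our actual pipeline code) of how the checkpoint location is wired up, plus a check that the state store provider is still the Spark default. As I understand it, VACUUM only removes files under the Delta table's own directory (and skips underscore-prefixed folders such as _delta_log), so a checkpoint kept under a separate prefix shouldn't be within its reach.

# Placeholder paths, not the actual pipeline code.
table_path = "s3a://<bucket>/offers-stage-1/offers-silver"
checkpoint_path = "s3a://<bucket>/offers-stage-1/checkpoints/offers-silver-stage1-pipeline"

# Placeholder source; the state/ directory from the error lives under checkpoint_path.
source_df = spark.readStream.format("delta").load("s3a://<bucket>/offers-stage-1/offers-bronze")

(
    source_df.writeStream
    .format("delta")
    .option("checkpointLocation", checkpoint_path)
    .outputMode("append")
    .start(table_path)
)

# Item 5: confirm the state store provider is the default HDFS-backed one.
print(spark.conf.get(
    "spark.sql.streaming.stateStore.providerClass",
    "org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider",
))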

@Kaniz Fatma I'm using DBR 11.3, which means PySpark 3.3.0.
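
For completeness, the quick way to confirm those versions from a notebook:

import pyspark

print(spark.version)        # Spark version bundled with the runtime -- 3.3.0 on DBR 11.3
print(pyspark.__version__)  # matching PySpark version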

Additionally, the full stack trace that I'm getting is attached to this reply.

Anonymous
Not applicable

Hi @Jordan Yaker​ 

We haven't heard from you since the last response from @Kaniz Fatma, and I was checking back to see if her suggestions helped you.

Otherwise, if you have found a solution, please share it with the community, as it can be helpful to others.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.
