I have some Delta tables in our dev environment that started throwing the following error today:
```
py4j.protocol.Py4JJavaError: An error occurred while calling o670.execute.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 104 in stage 1145.0 failed 4 times, most recent failure: Lost task 104.3 in stage 1145.0 (TID 2949) (10.111.21.215 executor 1): java.lang.IllegalStateException: Error reading streaming state file of HDFSStateStoreProvider[id = (op=0,part=104),dir = s3a://###################/offers-stage-1/checkpoints/offers-silver-stage1-pipeline/state/0/104]: s3a://###################/offers-stage-1/checkpoints/offers-silver-stage1-pipeline/state/0/104/1.delta does not exist. If the stream job is restarted with a new or updated state operation, please create a new checkpoint location or clear the existing checkpoint location.
```
These tables don't have a particularly high write volume, and just two weeks ago I reset the entire data lake in our dev/stage environment to deploy some new logic. That reset happens to line up exactly with our current vacuum retention policy of 14 days.
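For reference, the vacuum is applied by a maintenance job that looks roughly like the sketch below (the table path and job wiring here are placeholders, since the real bucket is redacted above; 336 hours corresponds to the 14-day window):

```python
# Rough sketch of the vacuum maintenance job (placeholder path, not the real
# bucket); 336 hours == the 14-day retention policy mentioned above.
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.appName("offers-vacuum-maintenance").getOrCreate()

# Placeholder location standing in for the actual table path.
table = DeltaTable.forPath(spark, "s3a://<bucket>/offers-stage-1/offers-silver")

# Delete data files that are no longer referenced by the Delta log
# and are older than the 14-day retention window.
table.vacuum(336)
```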
This doesn't feel like a coincidence.
Is there a known issue with running vacuum on tables that don't have a high write volume?