DLT Fails with Exception: CANNOT_READ_STREAMING_STATE_FILE
yesterday
I have several DLT pipelines writing to a schema in a Unity Catalog. The storage location of the Unity Catalog is managed by the Databricks deployment (on AWS).
The schema and the DLT pipelines are managed via Databricks Asset Bundles. I did not change any storage location configuration, and I used the default metastore.
For one of my DLT tables, I get an error message saying that it cannot read the streaming state file (full message below). Here are the things I have tried, without success:
- run `databricks bundle destroy` and then `databricks bundle deploy` again.
- go to the AWS console and delete the checkpoint files manually
- go to the AWS console and delete everything under the S3 location for the relevant schema
- double- and triple-check that there is no naming conflict for the table. There is none.
Does anyone have suggestions on how to fix this?
Greetings, Daniel
In case it helps: I am running the DLT runtime version 16.1.1. Here is the full error message:
org.apache.spark.sql.streaming.StreamingQueryException: [STREAM_FAILED] Query [id = 8e614f5a-cdb7-4942-962d-6cdcee920df7, runId = 8a2f8254-82ab-409d-82a1-2e745cfcbace] terminated with exception: org.apache.spark.SparkException: [CANNOT_LOAD_STATE_STORE.CANNOT_READ_STREAMING_STATE_FILE] An error occurred during loading state. Error reading streaming state file of HDFSStateStoreProvider[id = (op=4,part=0),dir = s3://databricks-workspace-stack-876d9-bucket/unity-catalog/520995832158046/dev/__unitystorage/schemas/07975d9e-97e1-42c8-96a5-a90498e75223/tables/f6fc5371-9617-4cb2-a48b-2f3aee236c1e/_dlt_metadata/checkpoints/***/0/state/4/0]: s3://databricks-workspace-stack-876d9-bucket/unity-catalog/520995832158046/dev/__unitystorage/schemas/07975d9e-97e1-42c8-96a5-a90498e75223/tables/f6fc5371-9617-4cb2-a48b-2f3aee236c1e/_dlt_metadata/checkpoints/***/0/state/4/0/1.delta does not exist. If the stream job is restarted with a new or updated state operation, please create a new checkpoint location or clear the existing checkpoint location. SQLSTATE: 58030 SQLSTATE: XXKST
As a final remark: I checked, and the state file s3://<...>/checkpoints/***/0/state/4/0/1.delta indeed does not exist. However, the following file is there: s3://<...>/checkpoints/***/0/state/4/1.delta
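To make the mismatch between the two paths concrete, here is a minimal Python sketch that splits a state file path into its components. It assumes the layout `state/<operator>/<partition>/<version>.delta`, which is inferred from the `op=4,part=0` in the error message and the path it reports. The names `StateFile` and `parse_state_file` are hypothetical helpers, not part of Spark or Databricks, and the elided parts of the paths are kept exactly as written above.

```python
import re
from typing import NamedTuple, Optional


class StateFile(NamedTuple):
    """Components of a state store file path (hypothetical helper type)."""
    operator_id: int
    partition_id: Optional[int]  # None if the path has no partition directory
    version: int
    kind: str  # "delta" or "snapshot"


# Assumed layout: .../state/<operator>/<partition>/<version>.delta
# The partition directory is optional here so that the unexpected path
# from the post can still be parsed and compared.
_PATTERN = re.compile(
    r"/state/(?P<op>\d+)/(?:(?P<part>\d+)/)?(?P<version>\d+)\.(?P<kind>delta|snapshot)$"
)


def parse_state_file(path: str) -> Optional[StateFile]:
    """Return the parsed components, or None if the path does not match."""
    match = _PATTERN.search(path)
    if match is None:
        return None
    part = match.group("part")
    return StateFile(
        operator_id=int(match.group("op")),
        partition_id=int(part) if part is not None else None,
        version=int(match.group("version")),
        kind=match.group("kind"),
    )


# The path Spark tried to read, and the path that actually exists.
expected = parse_state_file("s3://<...>/checkpoints/***/0/state/4/0/1.delta")
found = parse_state_file("s3://<...>/checkpoints/***/0/state/4/1.delta")

print("expected:", expected)
print("found:   ", found)
if expected and found and found.partition_id is None:
    print("The existing file sits directly under the operator directory, "
          "without a partition directory.")
```

Under that assumed layout, the file Spark looks for belongs to operator 4, partition 0, version 1, while the file that exists has the same operator and version but no partition directory at all.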
Labels: Delta Lake, Spark

