<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic DLT Fails with Exception: CANNOT_READ_STREAMING_STATE_FILE in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/dlt-fails-with-exception-cannot-read-streaming-state-file/m-p/113172#M44451</link>
    <description>&lt;P&gt;I have several DLT pipelines writing to schemas in a Unity Catalog. The storage location of the Unity Catalog is managed by the Databricks deployment (on AWS).&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The schemas and the DLT pipelines are managed via Databricks Asset Bundles. I did not change any storage-location configuration and used the default metastore.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;For one of my DLT tables, I get an error message that it cannot read the streaming state file (full message below). Here are the things I have tried, without success:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;run `databricks bundle destroy` and then `databricks bundle deploy` again&lt;/LI&gt;&lt;LI&gt;go to the AWS console and delete the checkpoint files manually&lt;/LI&gt;&lt;LI&gt;go to the AWS console and delete everything inside the S3 object for the relevant schema&lt;/LI&gt;&lt;LI&gt;double- and triple-checked that there is no naming conflict for the table; there is none&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Does anyone have suggestions on how to fix this?&lt;/P&gt;&lt;P&gt;Greetings, Daniel&lt;/P&gt;&lt;P&gt;If it helps: I run with DLT runtime 16.1.1. Here is the full error message:&lt;/P&gt;&lt;P&gt;&lt;FONT face="terminal,monaco" size="2"&gt;&lt;SPAN&gt;org.apache.spark.sql.streaming.StreamingQueryException: [STREAM_FAILED] Query [id = 8e614f5a-cdb7-4942-962d-6cdcee920df7, runId = 8a2f8254-82ab-409d-82a1-2e745cfcbace] terminated with exception: org.apache.spark.SparkException: [CANNOT_LOAD_STATE_STORE.CANNOT_READ_STREAMING_STATE_FILE] An error occurred during loading state. 
Error reading streaming state file of HDFSStateStoreProvider[id = (op=4,part=0),dir = s3://databricks-workspace-stack-876d9-bucket/unity-catalog/520995832158046/dev/__unitystorage/schemas/07975d9e-97e1-42c8-96a5-a90498e75223/tables/f6fc5371-9617-4cb2-a48b-2f3aee236c1e/_dlt_metadata/checkpoints/***/0/state/4/0]: s3://databricks-workspace-stack-876d9-bucket/unity-catalog/520995832158046/dev/__unitystorage/schemas/07975d9e-97e1-42c8-96a5-a90498e75223/tables/f6fc5371-9617-4cb2-a48b-2f3aee236c1e/_dlt_metadata/checkpoints/***/0/state/4/0/1.delta does not exist. If the stream job is restarted with a new or updated state operation, please create a new checkpoint location or clear the existing checkpoint location. SQLSTATE: 58030 SQLSTATE: XXKST&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;As a final remark: I checked, and the state file &lt;FONT face="terminal,monaco" size="2"&gt;&lt;SPAN&gt;s3://&amp;lt;...&amp;gt;/checkpoints/***/0/state/4/0/1.delta&lt;/SPAN&gt;&lt;/FONT&gt; indeed does not exist. But the following file is there: &lt;FONT face="terminal,monaco" size="2"&gt;&lt;SPAN&gt;s3://&amp;lt;...&amp;gt;/checkpoints/***/0/state/4/1.delta&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 20 Mar 2025 15:59:19 GMT</pubDate>
    <dc:creator>DaPo</dc:creator>
    <dc:date>2025-03-20T15:59:19Z</dc:date>
    <item>
      <title>DLT Fails with Exception: CANNOT_READ_STREAMING_STATE_FILE</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-fails-with-exception-cannot-read-streaming-state-file/m-p/113172#M44451</link>
      <description>&lt;P&gt;I have several DLT pipelines writing to schemas in a Unity Catalog. The storage location of the Unity Catalog is managed by the Databricks deployment (on AWS).&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The schemas and the DLT pipelines are managed via Databricks Asset Bundles. I did not change any storage-location configuration and used the default metastore.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;For one of my DLT tables, I get an error message that it cannot read the streaming state file (full message below). Here are the things I have tried, without success:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;run `databricks bundle destroy` and then `databricks bundle deploy` again&lt;/LI&gt;&lt;LI&gt;go to the AWS console and delete the checkpoint files manually&lt;/LI&gt;&lt;LI&gt;go to the AWS console and delete everything inside the S3 object for the relevant schema&lt;/LI&gt;&lt;LI&gt;double- and triple-checked that there is no naming conflict for the table; there is none&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Does anyone have suggestions on how to fix this?&lt;/P&gt;&lt;P&gt;Greetings, Daniel&lt;/P&gt;&lt;P&gt;If it helps: I run with DLT runtime 16.1.1. Here is the full error message:&lt;/P&gt;&lt;P&gt;&lt;FONT face="terminal,monaco" size="2"&gt;&lt;SPAN&gt;org.apache.spark.sql.streaming.StreamingQueryException: [STREAM_FAILED] Query [id = 8e614f5a-cdb7-4942-962d-6cdcee920df7, runId = 8a2f8254-82ab-409d-82a1-2e745cfcbace] terminated with exception: org.apache.spark.SparkException: [CANNOT_LOAD_STATE_STORE.CANNOT_READ_STREAMING_STATE_FILE] An error occurred during loading state. 
Error reading streaming state file of HDFSStateStoreProvider[id = (op=4,part=0),dir = s3://databricks-workspace-stack-876d9-bucket/unity-catalog/520995832158046/dev/__unitystorage/schemas/07975d9e-97e1-42c8-96a5-a90498e75223/tables/f6fc5371-9617-4cb2-a48b-2f3aee236c1e/_dlt_metadata/checkpoints/***/0/state/4/0]: s3://databricks-workspace-stack-876d9-bucket/unity-catalog/520995832158046/dev/__unitystorage/schemas/07975d9e-97e1-42c8-96a5-a90498e75223/tables/f6fc5371-9617-4cb2-a48b-2f3aee236c1e/_dlt_metadata/checkpoints/***/0/state/4/0/1.delta does not exist. If the stream job is restarted with a new or updated state operation, please create a new checkpoint location or clear the existing checkpoint location. SQLSTATE: 58030 SQLSTATE: XXKST&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;As a final remark: I checked, and the state file &lt;FONT face="terminal,monaco" size="2"&gt;&lt;SPAN&gt;s3://&amp;lt;...&amp;gt;/checkpoints/***/0/state/4/0/1.delta&lt;/SPAN&gt;&lt;/FONT&gt; indeed does not exist. But the following file is there: &lt;FONT face="terminal,monaco" size="2"&gt;&lt;SPAN&gt;s3://&amp;lt;...&amp;gt;/checkpoints/***/0/state/4/1.delta&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 20 Mar 2025 15:59:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-fails-with-exception-cannot-read-streaming-state-file/m-p/113172#M44451</guid>
      <dc:creator>DaPo</dc:creator>
      <dc:date>2025-03-20T15:59:19Z</dc:date>
    </item>
    <item>
      <title>Re: DLT Fails with Exception: CANNOT_READ_STREAMING_STATE_FILE</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-fails-with-exception-cannot-read-streaming-state-file/m-p/118792#M45713</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/117671"&gt;@DaPo&lt;/a&gt;, have you made any code changes to your streaming query? There are limitations on which changes to a streaming query are allowed between restarts from the same checkpoint location. Refer to this &lt;A href="https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#recovery-semantics-after-changes-in-a-streaming-query" target="_blank"&gt;documentation&lt;/A&gt;.&lt;/P&gt;
&lt;P class="p1"&gt;The checkpoint location appears to be corrupted, as some files are missing. You can try performing a FULL REFRESH on the pipeline.&lt;/P&gt;</description>
      <pubDate>Sun, 11 May 2025 10:19:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-fails-with-exception-cannot-read-streaming-state-file/m-p/118792#M45713</guid>
      <dc:creator>mani_22</dc:creator>
      <dc:date>2025-05-11T10:19:02Z</dc:date>
    </item>
    <item>
      <title>Re: DLT Fails with Exception: CANNOT_READ_STREAMING_STATE_FILE</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-fails-with-exception-cannot-read-streaming-state-file/m-p/119311#M45833</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/89888"&gt;@mani_22&lt;/a&gt;, the issue was hidden somewhere in my code. (If I remember correctly: I was using an internal library which created a Spark DataFrame "on the fly" via &lt;FONT face="andale mono,times"&gt;spark.createDataFrame([some, data])&lt;/FONT&gt;. That DataFrame was not backed by a table in Unity Catalog. The logic worked fine in batch workflows, but not in a streaming DLT pipeline.) My solution was to save that DataFrame as a table and load that table instead.&lt;/P&gt;</description>
      <pubDate>Thu, 15 May 2025 11:49:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-fails-with-exception-cannot-read-streaming-state-file/m-p/119311#M45833</guid>
      <dc:creator>DaPo</dc:creator>
      <dc:date>2025-05-15T11:49:19Z</dc:date>
    </item>
  </channel>
</rss>

