<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Read Structured Streaming state information in Warehousing &amp; Analytics</title>
    <link>https://community.databricks.com/t5/warehousing-analytics/read-structured-streaming-state-information/m-p/65516#M1257</link>
    <description>Forum thread: reading Structured Streaming state information with the statestore and state-metadata readers fails on a Shared access mode cluster (Databricks Runtime 14.3).</description>
    <pubDate>Thu, 04 Apr 2024 12:54:47 GMT</pubDate>
    <dc:creator>jcozar</dc:creator>
    <dc:date>2024-04-04T12:54:47Z</dc:date>
    <item>
      <title>Read Structured Streaming state information</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/read-structured-streaming-state-information/m-p/65516#M1257</link>
      <description>&lt;P&gt;Hi!&lt;/P&gt;&lt;P&gt;I am exploring the read-state functionality in Spark Structured Streaming: &lt;A href="https://docs.databricks.com/en/structured-streaming/read-state.html" target="_blank" rel="noopener"&gt;https://docs.databricks.com/en/structured-streaming/read-state.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;When I start a streaming query like this:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;(
    ...
    .writeStream
    .option("checkpointLocation", f"{CHECKPOINTS_PATH}/experiment_2_2/")
    .outputMode("append")
    .format("memory")
    .queryName("display_experiment_2_2")
    .start()
)&lt;/LI-CODE&gt;&lt;P&gt;I see that it works correctly: I can query the table `display_experiment_2_2` and see the query's recentProgress information.&lt;/P&gt;&lt;P&gt;However, when I run this code to read the state store:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;(
    spark
    .read
    .format("statestore")
    .load(f"{CHECKPOINTS_PATH}/experiment_2_2/")
)&lt;/LI-CODE&gt;&lt;P&gt;it raises the following error:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;2024-04-04 12:41:01,082 1522 ERROR _handle_rpc_error GRPC Error received
Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/sql/connect/client/core.py", line 1339, in _analyze
    resp = self._stub.AnalyzePlan(req, metadata=self._builder.metadata())
  File "/databricks/python/lib/python3.10/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/databricks/python/lib/python3.10/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: &amp;lt;_InactiveRpcError of RPC that terminated with:
	status = StatusCode.INTERNAL
	details = "[STDS_FAILED_TO_READ_STATE_SCHEMA] Failed to read the state schema. Either the file does not exist, or the file is corrupted. options: StateSourceOptions(checkpointLocation=dbfs:/tmp/checkpoints/experiment_2_2, batchId=1, operatorId=0, storeName=default, joinSide=none).
Rerun the streaming query to construct the state schema, and report to the corresponding communities or vendors if the error persists. SQLSTATE: 42K03"
	debug_error_string = "UNKNOWN:Error received from peer unix:/databricks/sparkconnect/grpc.sock {grpc_message:"[STDS_FAILED_TO_READ_STATE_SCHEMA] Failed to read the state schema. Either the file does not exist, or the file is corrupted. options: StateSourceOptions(checkpointLocation=dbfs:/tmp/checkpoints/experiment_2_2, batchId=1, operatorId=0, storeName=default, joinSide=none).\nRerun the streaming query to construct the state schema, and report to the corresponding communities or vendors if the error persists. SQLSTATE: 42K03", grpc_status:13, created_time:"2024-04-04T12:41:01.081634178+00:00"}"&lt;/LI-CODE&gt;&lt;P&gt;The same happens with state-metadata. When I run this code:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;(
    spark
    .read
    .format("state-metadata")
    .load(f"{CHECKPOINTS_PATH}/experiment_2_2/")
)&lt;/LI-CODE&gt;&lt;P&gt;it raises the following error:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;2024-04-04 12:41:03,988 1522 ERROR _handle_rpc_error GRPC Error received
Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/sql/connect/client/core.py", line 1339, in _analyze
    resp = self._stub.AnalyzePlan(req, metadata=self._builder.metadata())
  File "/databricks/python/lib/python3.10/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/databricks/python/lib/python3.10/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: &amp;lt;_InactiveRpcError of RPC that terminated with:
	status = StatusCode.INTERNAL
	details = "Data source V2 relation is not supported on table acl or credential passthrough clusters. RelationV2[operatorId#27178L, operatorName#27179, stateStoreName#27180, numPartitions#27181, minBatchId#27182L, maxBatchId#27183L]  state-metadata-table"
	debug_error_string = "UNKNOWN:Error received from peer unix:/databricks/sparkconnect/grpc.sock {created_time:"2024-04-04T12:41:03.988523804+00:00", grpc_status:13, grpc_message:"Data source V2 relation is not supported on table acl or credential passthrough clusters. RelationV2[operatorId#27178L, operatorName#27179, stateStoreName#27180, numPartitions#27181, minBatchId#27182L, maxBatchId#27183L]  state-metadata-table"}"&lt;/LI-CODE&gt;&lt;P&gt;I am using a cluster with the Shared Compute policy, Shared access mode, and Databricks Runtime 14.3.&lt;/P&gt;&lt;P&gt;Is this a bug, or am I doing something wrong?&lt;/P&gt;&lt;P&gt;Thank you very much!&lt;/P&gt;</description>
      <pubDate>Thu, 04 Apr 2024 12:54:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/read-structured-streaming-state-information/m-p/65516#M1257</guid>
      <dc:creator>jcozar</dc:creator>
      <dc:date>2024-04-04T12:54:47Z</dc:date>
    </item>
    <item>
      <title>Re: Read Structured Streaming state information</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/read-structured-streaming-state-information/m-p/65716#M1268</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;, thank you for your time!&lt;/P&gt;&lt;P&gt;I tried again, using both format("memory") and format("delta") just in case, but the results are the same.&lt;/P&gt;&lt;P&gt;Error traceback using `dbfs:/tmp/checkpoints/experiment_2_2` as the checkpoint location with format("statestore"):&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;2024-04-07 09:11:50,672 1956 ERROR _handle_rpc_error GRPC Error received
Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/sql/connect/client/core.py", line 1339, in _analyze
    resp = self._stub.AnalyzePlan(req, metadata=self._builder.metadata())
  File "/databricks/python/lib/python3.10/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/databricks/python/lib/python3.10/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: &amp;lt;_InactiveRpcError of RPC that terminated with:
	status = StatusCode.INTERNAL
	details = "[STDS_FAILED_TO_READ_STATE_SCHEMA] Failed to read the state schema. Either the file does not exist, or the file is corrupted. options: StateSourceOptions(checkpointLocation=dbfs:/tmp/experiment_2_2, batchId=9, operatorId=0, storeName=default, joinSide=none).
Rerun the streaming query to construct the state schema, and report to the corresponding communities or vendors if the error persists. SQLSTATE: 42K03"
	debug_error_string = "UNKNOWN:Error received from peer unix:/databricks/sparkconnect/grpc.sock {grpc_message:"[STDS_FAILED_TO_READ_STATE_SCHEMA] Failed to read the state schema. Either the file does not exist, or the file is corrupted. options: StateSourceOptions(checkpointLocation=dbfs:/tmp/experiment_2_2, batchId=9, operatorId=0, storeName=default, joinSide=none).\nRerun the streaming query to construct the state schema, and report to the corresponding communities or vendors if the error persists. SQLSTATE: 42K03", grpc_status:13, created_time:"2024-04-07T09:11:50.671796067+00:00"}"
&amp;gt;&lt;/LI-CODE&gt;&lt;P&gt;Error traceback using an Azure storage account path as the checkpoint location with format("statestore"):&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;2024-04-07 09:06:55,381 1956 ERROR _handle_rpc_error GRPC Error received
Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/sql/connect/client/core.py", line 1339, in _analyze
    resp = self._stub.AnalyzePlan(req, metadata=self._builder.metadata())
  File "/databricks/python/lib/python3.10/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/databricks/python/lib/python3.10/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: &amp;lt;_InactiveRpcError of RPC that terminated with:
	status = StatusCode.INTERNAL
	details = "[STDS_FAILED_TO_READ_STATE_SCHEMA] Failed to read the state schema. Either the file does not exist, or the file is corrupted. options: StateSourceOptions(checkpointLocation=abfss://unitycatalog@mastercatalogdev.dfs.core.windows.net/tutorials/checkpoints/experiment_2_2, batchId=9, operatorId=0, storeName=default, joinSide=none).
Rerun the streaming query to construct the state schema, and report to the corresponding communities or vendors if the error persists. SQLSTATE: 42K03"
	debug_error_string = "UNKNOWN:Error received from peer unix:/databricks/sparkconnect/grpc.sock {grpc_message:"[STDS_FAILED_TO_READ_STATE_SCHEMA] Failed to read the state schema. Either the file does not exist, or the file is corrupted. options: StateSourceOptions(checkpointLocation=abfss://unitycatalog@mastercatalogdev.dfs.core.windows.net/tutorials/checkpoints/experiment_2_2, batchId=9, operatorId=0, storeName=default, joinSide=none).\nRerun the streaming query to construct the state schema, and report to the corresponding communities or vendors if the error persists. SQLSTATE: 42K03", grpc_status:13, created_time:"2024-04-07T09:06:55.381219715+00:00"}"
&amp;gt;&lt;/LI-CODE&gt;&lt;P&gt;If you need more details about the code, the stream, or the storage accounts, just let me know.&lt;/P&gt;&lt;P&gt;Thank you very much!&lt;/P&gt;</description>
      <pubDate>Sun, 07 Apr 2024 09:12:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/read-structured-streaming-state-information/m-p/65716#M1268</guid>
      <dc:creator>jcozar</dc:creator>
      <dc:date>2024-04-07T09:12:59Z</dc:date>
    </item>
  </channel>
</rss>

