Replaying a stream after converting to liquid cluster fails
06-04-2025 06:42 AM
I have a problem replaying a stream.
I need to replay it because the conversion from liquid cluster
to partition doesn't work. I see a lot of garbage collection
and memory maxes out immediately. Then the driver restarts.
To debug the problem, I tried to force only 1 record to be read, so I ran:
reader = reader.load(bronze_path).select("*", "_metadata").limit(1)
print("going to collect reader. Only 1 record")
reader.collect()
This gives the following error:
File /databricks/spark/python/pyspark/sql/connect/client/core.py:2155, in SparkConnectClient._handle_rpc_error(self, rpc_error)
2140 raise Exception(
2141 "Python versions in the Spark Connect client and server are different. "
2142 "To execute user-defined functions, client and server should have the "
(...)
2151 "https://docs.databricks.com/en/release-notes/serverless.html."
2152 )
2153 # END-EDGE
-> 2155 raise convert_exception(
2156 info,
2157 status.message,
2158 self._fetch_enriched_error(info),
2159 self._display_server_stack_trace(),
2160 ) from None
2162 raise SparkConnectGrpcException(status.message) from None
2163 else:
And this is the explanation ChatGPT gives, which I find strange since it's running on Databricks:
This error usually means there's a mismatch between the Python versions (or environments) used by your Spark Connect client and the Spark server. When you call collect(), Spark Connect tries to run user‐defined code, and if the Python versions differ the operation fails.
To fix this you should ensure that:
• The Python version in your client environment (where you're running the code) exactly matches the Python version configured on the Spark server/cluster.
• All relevant libraries are consistent between client and server.
Once they match, the collect() call should work without the RPC error.
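To check the client half of that suggestion, something like the following could be run in the same notebook or session. It is only a minimal sketch: the helper name `describe_client_environment` and the package list are my own assumptions, and it reports the client side only, so the server-side Python version would still have to be looked up separately.

```python
import platform
from importlib import metadata


def describe_client_environment(packages=("pyspark", "databricks-connect")):
    """Return the client-side Python version and selected package versions.

    Hypothetical helper for illustration; the package names are assumptions
    about what might be installed on the client.
    """
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            info[name] = None
    return info


print(describe_client_environment())
```

If the client version reported here already matches the cluster's, the version-mismatch explanation would not account for the error.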