03-17-2025 08:08 AM
Hey there,
In our local development flow we rely heavily on Databricks Asset Bundles and Databricks Connect. Recently, locally run workflows (i.e. plain PySpark Python files) have begun to fail frequently with the following gRPC error:
pyspark.errors.exceptions.connect.SparkConnectGrpcException: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.INTERNAL
details = "Cannot operate on a handle that is closed."
debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Cannot operate on a handle that is closed.", grpc_status:13, created_time:"2025-03-17T15:51:24.396549+01:00"}"
This error is non-deterministic, and cluster restarts sometimes allow us to run workflows once or twice before the error appears again. It might be coincidental given the non-deterministic nature, but some PySpark code seems to fail with this error more often than other code.
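For reference, the failing scripts are just plain PySpark files executed locally through Databricks Connect, roughly along these lines (the table name is illustrative, not our actual code):

from databricks.connect import DatabricksSession

# Connect to the cluster configured via the bundle / .databrickscfg profile.
spark = DatabricksSession.builder.getOrCreate()

# A typical action that intermittently dies with the gRPC INTERNAL error.
df = spark.read.table("dev.bronze.events")  # hypothetical table
print(df.count())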
databricks-connect version: 15.4.7
databricks-sdk: 0.29.0
cluster runtime: 15.4 LTS (includes Apache Spark 3.5.0, Scala 2.12)
Researching this error returns basically zero results, so I'm asking whether someone else has encountered and solved this before, or whether this is a known issue?
Thanks!
03-21-2025 02:03 AM
@marcelhfm Were you able to find a solution to this error?
03-24-2025 05:47 AM - edited 03-24-2025 05:49 AM
No, unfortunately not. Have you encountered similar behavior before?
03-27-2025 08:23 AM
Hey @marcelhfm
Two questions:
- Was your script working before with the same configuration?
- What are you trying to do?
03-28-2025 02:01 AM
Hey,
Was your script working before with the same configuration?
- Yes, this error has only been coming up recently. And it is very non-deterministic; it comes up a couple of times per week.
What are you trying to do?
- We're using Asset Bundles and Databricks Connect to develop PySpark tasks. More specifically, to speed up development flows, we develop PySpark tasks and execute them locally. Later they will be turned into DLT Workflows.
Any more information I can provide you?
03-27-2025 10:15 AM - edited 03-27-2025 10:16 AM
I've had the same issue happen to me today: a previously working workflow on serverless compute that performs a streaming foreachBatch operation. It sort of silently failed and ended up timing out.
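For anyone comparing notes, the shape of the job is roughly the following (table names and checkpoint path are illustrative, not the actual code):

from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()

def process_batch(batch_df, batch_id):
    # Append each micro-batch to a Delta table (hypothetical target).
    batch_df.write.mode("append").saveAsTable("dev.silver.events")

(
    spark.readStream.table("dev.bronze.events")
    .writeStream
    .foreachBatch(process_batch)
    .option("checkpointLocation", "/Volumes/dev/checkpoints/events")
    .start()
    .awaitTermination()
)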
03-31-2025 01:06 AM
I am encountering the exact same error as marcelhfm on my side as well. I had only very rarely encountered this issue in the past, and usually everything worked fine upon a re-run.
Since Friday (note: no code or environment changes were made), I'm encountering this issue on almost everything I try to run with Databricks Connect, which is a major problem for us, as it renders most of our local workflows unusable.
03-31-2025 01:49 AM - edited 03-31-2025 01:53 AM
@marcelhfm it might be a Spark Connect issue.
I would say it is the same for the rest of you.
There is not much to do until the situation is fixed by Databricks.
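In the meantime, one possible stopgap (an untested suggestion, not an official workaround) is to retry just this specific transient gRPC failure from the driver script:

import time

from databricks.connect import DatabricksSession
from pyspark.errors.exceptions.connect import SparkConnectGrpcException

def run_with_retries(action, attempts=3, backoff_seconds=10):
    # Retry only the known transient failure; re-raise everything else.
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except SparkConnectGrpcException as exc:
            if "handle that is closed" not in str(exc) or attempt == attempts:
                raise
            time.sleep(backoff_seconds * attempt)

spark = DatabricksSession.builder.getOrCreate()
# Example: retry a count that intermittently hits the INTERNAL error.
# Depending on what exactly closes the handle, a fresh session per attempt
# may be needed; this sketch assumes getOrCreate() suffices. Table name
# is hypothetical.
row_count = run_with_retries(lambda: spark.read.table("dev.bronze.events").count())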