03-17-2025 08:08 AM
Hey there,
In our local development flow we rely heavily on Databricks Asset Bundles and Databricks Connect. Recently, locally run workflows (i.e., plain PySpark Python files) have begun to fail frequently with the following gRPC error:
pyspark.errors.exceptions.connect.SparkConnectGrpcException: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.INTERNAL
details = "Cannot operate on a handle that is closed."
debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Cannot operate on a handle that is closed.", grpc_status:13, created_time:"2025-03-17T15:51:24.396549+01:00"}"
This error is non-deterministic; a cluster restart sometimes lets us run workflows once or twice before the error appears again. It might be coincidental given the non-deterministic nature, but some PySpark code seems to fail with this error more often than other code.
databricks-connect version: 15.4.7
databricks-sdk: 0.29.0
cluster runtime: 15.4 LTS (includes Apache Spark 3.5.0, Scala 2.12)
Researching this error returns basically zero results, so I'm asking whether someone else has run into and solved this before, or whether it's a known issue.
Thanks!
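(Editor's note: until the root cause is known, one blunt workaround is to retry the failing call, since re-runs often succeed per the reports in this thread. Below is a minimal, hedged sketch of a generic retry helper; the `SparkConnectGrpcException` class mentioned in the usage comment comes from the traceback above, but the helper itself is deliberately library-agnostic and is an assumption, not a confirmed fix.)

```python
import time


def retry(fn, attempts=3, delay=5.0, exceptions=(Exception,)):
    """Call fn(); on one of the listed exceptions, wait and retry.

    Re-raises the last exception if all `attempts` calls fail.
    """
    for i in range(attempts):
        try:
            return fn()
        except exceptions:
            if i == attempts - 1:
                raise  # out of attempts, propagate the error
            time.sleep(delay)


# Hypothetical usage against the error in this thread (not runnable without
# a Databricks Connect session; `df` is assumed to be a Spark DataFrame):
#
#   from pyspark.errors.exceptions.connect import SparkConnectGrpcException
#   retry(lambda: df.count(), exceptions=(SparkConnectGrpcException,))
```

This only papers over the flakiness; it does not address why the Spark Connect handle is being closed.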
03-21-2025 02:03 AM
@marcelhfm Were you able to find a solution to this error?
03-24-2025 05:47 AM - edited 03-24-2025 05:49 AM
No, unfortunately not. Have you encountered similar behavior before?
03-27-2025 08:23 AM
Hey @marcelhfm
Two questions:
- Was your script working before with the same configuration?
- What are you trying to do?
03-28-2025 02:01 AM
Hey,
Was your script working before with the same configuration?
- Yes, this error has only started coming up recently. And it is very non-deterministic; it appears a couple of times per week.
What are you trying to do?
- We're using Asset Bundles and Databricks Connect to develop PySpark tasks. More specifically, to speed up the development flow, we develop PySpark tasks and execute them locally. Later they are turned into DLT workflows.
Any more information I can provide you?
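(Editor's note: for readers unfamiliar with the workflow described above, this is roughly what a locally executed Databricks Connect task looks like. This is a hedged sketch, not runnable without a configured workspace; the profile name and table are placeholders, and the `DatabricksSession` builder is the standard entry point shipped with databricks-connect.)

```python
# Sketch of the local development pattern: the script runs on the laptop,
# while Spark operations execute remotely on the configured cluster.
# Requires databricks-connect and a configured auth profile; will not run
# without a workspace. "DEFAULT" and the table name are illustrative.
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.profile("DEFAULT").getOrCreate()

df = spark.read.table("samples.nyctaxi.trips")  # placeholder table
df.limit(5).show()
```

Every remote action (`show`, `count`, writes) goes over the Spark Connect gRPC channel, which is where the "Cannot operate on a handle that is closed" error in this thread surfaces.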
03-27-2025 10:15 AM - edited 03-27-2025 10:16 AM
I've had the same issue happen to me today: a previously working workflow on serverless compute that runs a streaming foreachBatch operation. It sort of silently failed and ended up timing out.
03-31-2025 01:06 AM
I am encountering the exact same error as marcelhfm on my side as well. I had only very rarely encountered this issue in the past, and a re-run usually worked fine.
Since Friday (note: no code or environment changes made), I'm encountering this issue on almost everything I try to run with databricks connect, which is a major problem for us as it renders most of our local workflows unusable.
03-31-2025 01:49 AM - edited 03-31-2025 01:53 AM
@marcelhfm It might be a Spark Connect issue.
I would guess it's the same for the rest of you.
There isn't much to do until this is fixed by Databricks.