03-17-2025 08:08 AM
Hey there,
In our local development flow we rely heavily on Databricks Asset Bundles and Databricks Connect. Recently, locally run workflows (i.e. plain PySpark Python files) have begun to fail frequently with the following gRPC error:
pyspark.errors.exceptions.connect.SparkConnectGrpcException: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.INTERNAL
details = "Cannot operate on a handle that is closed."
debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Cannot operate on a handle that is closed.", grpc_status:13, created_time:"2025-03-17T15:51:24.396549+01:00"}"
This error is non-deterministic, and cluster restarts sometimes allow us to run workflows once or twice before the error appears again. It might be coincidental given the non-deterministic nature, but some PySpark code seems to fail with this error more often than other code.
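For reference, the failing scripts are just plain PySpark files executed locally through Databricks Connect, roughly along these lines (the table name is illustrative, not our actual code):

from databricks.connect import DatabricksSession

# Connect to the cluster configured via the bundle / .databrickscfg profile.
spark = DatabricksSession.builder.getOrCreate()

# A typical action that intermittently dies with the gRPC INTERNAL error.
df = spark.read.table("dev.bronze.events")  # hypothetical table
print(df.count())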
databricks-connect version: 15.4.7
databricks-sdk: 0.29.0
cluster runtime: 15.4 LTS (includes Apache Spark 3.5.0, Scala 2.12)
Researching this error returns basically zero results, so I'm asking whether someone else has encountered and solved this before, or whether this is a known issue?
Thanks!
03-21-2025 02:03 AM
@marcelhfm Were you able to find a solution to this error?
03-24-2025 05:47 AM - edited 03-24-2025 05:49 AM
No, unfortunately not. Have you encountered similar behavior before?
03-27-2025 08:23 AM
Hey @marcelhfm
Two questions:
- Was your script working before with the same configuration?
- What are you trying to do?
03-28-2025 02:01 AM
Hey,
Was your script working before with the same configuration?
- Yes, this error has only been coming up recently. And it is very non-deterministic; it comes up a couple of times per week.
What are you trying to do?
- We're using Asset Bundles and Databricks Connect to develop PySpark tasks. More specifically, to speed up development flows, we develop PySpark tasks and execute them locally. Later they will be turned into DLT Workflows.
Any more information I can provide you?
03-27-2025 10:15 AM - edited 03-27-2025 10:16 AM
I've had the same issue happen to me today: a previously working workflow on serverless compute that performs a streaming foreachBatch operation. It sort of silently failed and ended up timing out.
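For anyone comparing notes, the shape of the job is roughly the following (table names and checkpoint path are illustrative, not the actual code):

from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()

def process_batch(batch_df, batch_id):
    # Append each micro-batch to a Delta table (hypothetical target).
    batch_df.write.mode("append").saveAsTable("dev.silver.events")

(
    spark.readStream.table("dev.bronze.events")
    .writeStream
    .foreachBatch(process_batch)
    .option("checkpointLocation", "/Volumes/dev/checkpoints/events")
    .start()
    .awaitTermination()
)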
03-31-2025 01:06 AM
I am encountering the exact same error as marcelhfm on my side as well. I had only very rarely encountered this issue in the past, and usually everything worked fine upon a re-run.
Since Friday (note: no code or environment changes were made), I'm encountering this issue on almost everything I try to run with Databricks Connect, which is a major problem for us, as it renders most of our local workflows unusable.
03-31-2025 01:49 AM - edited 03-31-2025 01:53 AM
@marcelhfm it might be a Spark Connect issue.
I would say it is the same for the rest of you.
There is not much to do until the situation is fixed by Databricks.
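In the meantime, one possible stopgap (an untested suggestion, not an official workaround) is to retry just this specific transient gRPC failure from the driver script:

import time

from databricks.connect import DatabricksSession
from pyspark.errors.exceptions.connect import SparkConnectGrpcException

def run_with_retries(action, attempts=3, backoff_seconds=10):
    # Retry only the known transient failure; re-raise everything else.
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except SparkConnectGrpcException as exc:
            if "handle that is closed" not in str(exc) or attempt == attempts:
                raise
            time.sleep(backoff_seconds * attempt)

spark = DatabricksSession.builder.getOrCreate()
# Example: retry a count that intermittently hits the INTERNAL error.
# Depending on what exactly closes the handle, a fresh session per attempt
# may be needed; this sketch assumes getOrCreate() suffices. Table name
# is hypothetical.
row_count = run_with_retries(lambda: spark.read.table("dev.bronze.events").count())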